S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different...
Transcript of S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different...
![Page 1: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/1.jpg)
S.V.N. Vishwanathan: Graph Kernels, Page 1
Graph KernelsS.V.N. Vishwanathan
[email protected]://www.stat.purdue.edu/~vishy
Purdue University
Joint work with Karsten Borgwardt, Nic Schraudolph, andRisi Kondor
![Page 2: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/2.jpg)
Graphs are Everywhere
S.V.N. Vishwanathan: Graph Kernels, Page 2
Two protein molecules The Internet
Comparing Graphs:How similar are two graphs?
Comparing Nodes:How similar are two nodes of a graph?
![Page 3: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/3.jpg)
Adjacency Matrix
S.V.N. Vishwanathan: Graph Kernels, Page 3
1
23
4
5
6
7 8
vertices/nodes edges
Undirected Graph G(V, E)
A =
0 1 0 0 0 0 1 01 0 1 0 0 0 0 00 1 0 1 0 0 0 10 0 1 0 1 1 0 10 0 0 1 0 1 0 00 0 0 1 1 0 1 01 0 0 0 0 1 0 10 0 1 1 0 0 1 0
sub-matrix of A = a subgraph of G
![Page 4: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/4.jpg)
Degree Matrix
S.V.N. Vishwanathan: Graph Kernels, Page 4
1
23
4
5
6
7 8
D =
2 0 0 0 0 0 0 00 2 0 0 0 0 0 00 0 3 0 0 0 0 00 0 0 4 0 0 0 00 0 0 0 2 0 0 00 0 0 0 0 3 0 00 0 0 0 0 0 3 00 0 0 0 0 0 0 3
Normalized Adjacency matrix A = D−1A is a stochastic ma-trix (each row sums to one)
![Page 5: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/5.jpg)
Graph Laplacian
S.V.N. Vishwanathan: Graph Kernels, Page 5
1
23
4
5
6
7 8
L =
2 −1 0 0 0 0 −1 0−1 2 −1 0 0 0 0 0
0 −1 3 −1 0 0 0 −10 0 −1 4 −1 −1 0 −10 0 0 −1 2 −1 0 00 0 0 −1 −1 3 −1 0−1 0 0 0 0 −1 3 −1
0 0 −1 −1 0 0 −1 3
L = D − A
Normalized version
L = D−12(D − A)D−
12
Spectrum bounded between 0 and 2
![Page 6: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/6.jpg)
Random Walk
S.V.N. Vishwanathan: Graph Kernels, Page 6
1
23
4
5
6
7 8
?
From a vertex i randomly jump to any adjacent vertex j
Probability of jumping to j proportional to Aij
![Page 7: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/7.jpg)
Walks of Length 2
S.V.N. Vishwanathan: Graph Kernels, Page 7
1
23
4
5
6
7 8
A2 =
2 0 1 0 0 1 0 10 2 0 1 0 0 1 11 0 3 1 1 1 1 10 1 1 4 1 1 2 10 0 1 1 2 1 1 11 0 1 1 1 3 0 20 1 1 2 1 0 3 01 1 1 1 1 2 0 3
Entries of A2 = number of length 2 walks
Entries of A2 = probability of length 2 walks
![Page 8: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/8.jpg)
Idea!
S.V.N. Vishwanathan: Graph Kernels, Page 8
1
23
4
5
6
7 8
Count walks?
Count number ofwalks between twonodes
Two nodes are similarif they are connectedby many walks
Does not work :(
If graph has cycles then number of walks goes to∞
![Page 9: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/9.jpg)
A Better Idea!
S.V.N. Vishwanathan: Graph Kernels, Page 9
1
23
4
5
6
7 8
Count walks? Discount contributionof longer walks
Count number ofwalks between twonodes
Two nodes are similarif they are connectedby many walks
Works if discounting factor chosen appropriately!
![Page 10: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/10.jpg)
Diffusion Kernels
S.V.N. Vishwanathan: Graph Kernels, Page 10
Discounting Factor:Discount a k length walk by λk/k! for 0 ≤ λ ≤ 1
Similarity:Similarity defined as
k(i, j) =
[∑k
λk
k!Ak
]ij
= [exp(λA)]ij
Kondor and Lafferty:Work with diffusion and hence the graph Laplacian
k(i, j) =
[∑k
λk
k!Lk
]ij
= [exp(λL)]ij
They show that this is a valid p.s.d kernel
![Page 11: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/11.jpg)
Extensions
S.V.N. Vishwanathan: Graph Kernels, Page 11
Laplacian as a regularizer:For any real-valued function f on the vertices of a graph
〈f, Lf〉 = f>Lf = −1
2
∑i∼j
(fi − fj)2
Can regularize differently if we replace L by
r(L) :=∑i
r(ρi)lil>i
Any monotonically increasing function of ρ admissible
![Page 12: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/12.jpg)
Smola and Kondor
S.V.N. Vishwanathan: Graph Kernels, Page 12
Other Kernels:
r(ρ) = 1 + σ2ρ, K = (I + σ2L)−1 regularized Laplacianr(ρ) = (1− λρ)−p, K = (I − λL)p p-step random walk
![Page 13: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/13.jpg)
Comparing Graphs
S.V.N. Vishwanathan: Graph Kernels, Page 13
3
4
5
6
1
2
7 8
Count matching walks?
3'1'
2'
4'5'
Count number ofmatching walks intwo graphs
Discount contributionof longer walks
Two graphs are simi-lar if many walks arematching
Three Questions:How to formalize this intuition?How to compute this efficiently?How is this related to diffusion kernels?
![Page 14: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/14.jpg)
Direct Product Graph
S.V.N. Vishwanathan: Graph Kernels, Page 14
3'1'
2'
4'
31
2
11' 21'31'
12'
32'
22'
13'23'
33'14'
24'
34'
X
Formal Definition
V×(G×G′) ={(v, v′) : v ∈ V, v′ ∈ V ′}E×(G×G′) ={((v, v′), (w,w′)) : (v, w) ∈ E, (v′, w′) ∈ E ′}
![Page 15: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/15.jpg)
Key Insight
S.V.N. Vishwanathan: Graph Kernels, Page 15
Random Walk on Product Graph:Equivalent to simultaneous random walk on input graphs
Kernel Definition:
k(G,G′) =1
|G| |G′|∑k
λk
k!e>Ak
× e =1
|G| |G′|e> exp(λA×) e
![Page 16: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/16.jpg)
Extensions
S.V.N. Vishwanathan: Graph Kernels, Page 16
Different Decay Factor (Gärtner et al.):Using a λk decay
k(G,G′) =1
|G||G′|∑k
λk e>Ak× e
=1
|G||G′|e>(I−λA×)−1 e
Taking expectations:Instead of summing, take expectations
k(G,G′) =∑k
λk q>×Ak×p× = q>×(I−λA×)−1p×
p× and q× are initial and stopping probabilities resp.
![Page 17: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/17.jpg)
Efficient Computation
S.V.N. Vishwanathan: Graph Kernels, Page 17
Product Graph is Huge:If G and G′ have n vertices then product graph has n2
verticesAdjacency matrix A× is of size n2 × n2
Houston we have a problem:Kernel computation involves
k(G,G′) = q>× exp(λA×)︸ ︷︷ ︸O(n6) !
p×
or
k(G,G′) = q>× (I−λA×)−1︸ ︷︷ ︸O(n6) !
p×
![Page 18: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/18.jpg)
Kronecker Products
S.V.N. Vishwanathan: Graph Kernels, Page 18
Definition (by example):
A =
[1 00 1
]and B =
2 5 25 2 51 5 2
then
A⊗B =
2 5 2 0 0 05 2 5 0 0 01 5 2 0 0 00 0 0 2 5 20 0 0 5 2 50 0 0 1 5 2
![Page 19: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/19.jpg)
Key Insight
S.V.N. Vishwanathan: Graph Kernels, Page 19
The adjacency matrix of the product graph
A× = A⊗ A′
Can compute exp(A×) as
exp(A×) = exp(A)︸ ︷︷ ︸O(n3)
⊗ exp(A′)︸ ︷︷ ︸O(n3)
Computing (I−λA)−1 involves a bit more work . . .
![Page 20: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/20.jpg)
Sylvester Equations
S.V.N. Vishwanathan: Graph Kernels, Page 20
Claim:Computing the Gärtner et. al kernel is no harder thansolving
X = A′XA> + P
Sylvester Equations:The above equation is called a Sylvester equationWell studied in control theoryEfficiently solvable in O(n3) timedylap method in Matlab
![Page 21: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/21.jpg)
Before the proof . . .
S.V.N. Vishwanathan: Graph Kernels, Page 21
vec operator:
B =
2 5 25 2 51 5 2
and vec(B) =
251525252
Key Equation:
vec(ABC)= (C> ⊗ A) vec(B)
![Page 22: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/22.jpg)
The Proof
S.V.N. Vishwanathan: Graph Kernels, Page 22
Rewrite the Sylvester equation as
vec(X) = vec(A′XA>) + vec(P )
Apply key equation
vec(X) = (A⊗ A′) vec(X) + vec(P )
Rearrange
(I−A⊗ A′) vec(X) = vec(P )
or equivalently
vec(X) = (I−A⊗ A′)−1 vec(P )
Let p× = vec(P ) and multiply both sides by q×
q>× vec(X) = q>×(I−A⊗ A′)−1p× = K(G,G′)
![Page 23: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/23.jpg)
Other Schemes
S.V.N. Vishwanathan: Graph Kernels, Page 23
Basic Idea:
vec(A′XA>)︸ ︷︷ ︸O(n3)
= (A⊗ A′) vec(X)︸ ︷︷ ︸O(n4)
Can exploit sparsity of A and A′ to speed up things
Fixed Point Iteration:Solve for a fixed point (Kashima et. al):
(I−A⊗ A′) vec(X∞) = vec(X∞)
Conjugate Gradient:Fast matrix-vector multiplication to speed up CG solverConvergence depends on spectrum of A and A′
![Page 24: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/24.jpg)
Relation to Diffusion Kernels
S.V.N. Vishwanathan: Graph Kernels, Page 24
Laplacian of the Direct Product Graph:In general L× 6= L1 ⊗ L2 :(But there is a fix ...
Cartesian Product of Graphs:
V� ={(v, v′) : v ∈ V, v′ ∈ V ′}E� ={((v, v′), (w,w′)) : (v, w) ∈ E, (v′, w′) ∈ E ′}
For Cartesian products
A� = A1 ⊕ A2 := A1 ⊗ I + I ⊗ A2
L� = L1 ⊕ L2
All our efficient computation tricks apply!Is the kernel PSD?
![Page 25: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/25.jpg)
Experiments
S.V.N. Vishwanathan: Graph Kernels, Page 25
Scaling Behavior - I:Begin with empty graphs of size 2k where k = 1, . . . 10
Randomly insert edges untilavg. degree at least 2 orgraph is full
Generate 10 random graphs and compute kernel matrix
![Page 26: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/26.jpg)
Experiments
S.V.N. Vishwanathan: Graph Kernels, Page 26
Scaling Behavior - II:Begin with empty graphs of size 32
Randomly insert edges untilavg. fill-in of adjacency matrix is 10% . . . 100% andGraph is connected
Generate 10 random graphs and compute kernel matrix
![Page 27: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/27.jpg)
Experiments
S.V.N. Vishwanathan: Graph Kernels, Page 27
Impact of the vec-trick:Same graphs as the runtime vs nodes experimentUse the vec trick in the fixed point iterationCompare to original fixed point iteration
![Page 28: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/28.jpg)
Experiments
S.V.N. Vishwanathan: Graph Kernels, Page 28
Unlabeled Graphs: We computed graph kernels on fourdatasets for molecular function prediction: Mutag andPtc (chemical compounds), Enzyme and Protein (proteinstructures). We report runtimes for computing a 100 × 100kernel matrix.
dataset Mutag Ptc Enzyme Proteinnodes/graph 17.7 26.7 32.6 38.6edges/node 2.2 1.9 3.8 3.7
Direct 18’09" 142’53" 31h* 36d*Sylvester 25.9" 73.8" 48.3" 69’15"
Conjugate 42.1" 58.4" 44.6" 55.3"Fixed-Point 12.3" 32.4" 13.6" 31.1"
![Page 29: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/29.jpg)
Experiments
S.V.N. Vishwanathan: Graph Kernels, Page 29
Labeled Graphs: We repeated the above graph kernel com-putation, now using either a linear or delta kernel betweennode labels as well.
kernel delta lineardataset Mutag Ptc Enzyme Protein
Direct 7.2h 1.4d* 2.4d* 5.3d*Sylvester 3.9d* 2.7d* 89.8" 25’24"
Conjugate 2’35" 3’20" 124.4" 3’01"Fixed Point 1’05" 1’31" 50.1" 1’47"
![Page 30: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/30.jpg)
What I did not talk about
S.V.N. Vishwanathan: Graph Kernels, Page 30
Random walks on other semirings e.g. (min,+)
Why (min,+) does not yield p.s.d kernels
Differences between A, A, L, and L
Kernels on vertices (yields marginal graph kernels ofKashima et. al)
Extensions to trajectories of ARMA models (joint work withRené Vidal and Alex Smola)
General theory using Binet-Cauchy theorem (joint workwith Alex Smola)
Connections to Rational kernels of Cortes et. al
Connections to R-Convolution kernels of Haussler
![Page 31: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/31.jpg)
Overview of my Research
S.V.N. Vishwanathan: Graph Kernels, Page 31
Structured Input:StringsGraphsARMA models
Structured Output:Exponential families in feature space
Optimization for Machine Learning:Bundle methodssubBFGS
TheoryFundamental limitations of kernelsRates of convergence of boosting algorithms
![Page 32: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/32.jpg)
Conclusion
S.V.N. Vishwanathan: Graph Kernels, Page 32
First unifying view of
Diffusion kernelsRegularization on graphsGeometric and random walk kernelsMarginal graph kernels
Efficient computation by exploiting Kronecker products
Papers at http://www.stat.purdue.edu/~vishy
![Page 33: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/33.jpg)
Big Open Question
S.V.N. Vishwanathan: Graph Kernels, Page 33
Comparing paths in two different graphs is polynomial
Subgraph isomorphism is known to be NP-hard
Computing the so-called universal graph kernel whichcounts all common subgraphs of two graphs is harder thansubgraph isomorphism
When we compare any other subgraphs e.g.
simple paths (where vertices do not repeat)cyclestrees
we seem to lose polynomial run-time
Are there other subgraphs for which efficient computationis possible?
![Page 34: S.V.N. Vishwanathanvishy/talks/Graphs.pdf · S.V.N. Vishwanathan:Graph Kernels, Page 16 Different Decay Factor (Gärtner et al.): Using a kdecay k(G;G0) = 1 jGjjG0j X k k e>Ak e =](https://reader034.fdocuments.in/reader034/viewer/2022050210/5f5cbbd26433ff40f21bd32c/html5/thumbnails/34.jpg)
References
S.V.N. Vishwanathan: Graph Kernels, Page 34
Journal Papers[1] S. V. N. Vishwanathan, Karsten Borgwardt, Nicol N. Schraudolph, and Imre Risi Kondor. On graph kernels. J. Mach. Learn. Res.,
2008. submitted.
[2] S. V. N. Vishwanathan, A. J. Smola, and R. Vidal. Binet-Cauchy kernels on dynamical systems and its application to the analysis ofdynamic scenes. International Journal of Computer Vision, 73(1):95–119, 2007.
Conference Papers[1] S. V. N. Vishwanathan, Karsten Borgwardt, and Nicol N. Schraudolph. Fast computation of graph kernels. Technical report, NICTA,
2006.
[2] S. V. N. Vishwanathan and A. J. Smola. Binet-Cauchy kernels. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in NeuralInformation Processing Systems 17, pages 1441–1448, Cambridge, MA, 2005. MIT Press.
Applications to Bioinformatics[1] Karsten M. Borgwardt, H.-P. Kriegel, S. V. N. Vishwanathan, and N. Schraudolph. Graph kernels for disease outcome prediction from
protein-protein interaction networks. In Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Tiffany Murray, and Teri E Klein, editors,Proceedings of the Pacific Symposium of Biocomputing 2007, Maui Hawaii, January 2007. World Scientific.
[2] Karsten M. Borgwardt, S. V. N. Vishwanathan, and H.-P. Kriegel. Class prediction from time series gene expression profiles us-ing dynamical systems kernels. In Russ B. Altman, A. Keith Dunker, Lawrence Hunter, Tiffany Murray, and Teri E Klein, editors,Proceedings of the Pacific Symposium of Biocomputing 2006, pages 547–558, Maui Hawaii, January 2006. World Scientific.
[3] K. M. Borgwardt, C. S. Ong, S. Schönauer, S. V. N. Vishwanathan, A. J. Smola, and H.P. Kriegel. Protein function prediction viagraph kernels. In Proceedings of Intelligent Systems in Molecular Biology (ISMB), Detroit, USA, 2005.