Bipartite Edge Prediction via Transductive Learning over Product Graphs
Hanxiao Liu, Yiming Yang
School of Computer Science, Carnegie Mellon University
July 8, 2015
ICML 2015
Outline
1 Problem Description
2 The Proposed Framework
3 Formulation
   Product Graph Construction
   Graph-based Transductive Learning
4 Optimization
5 Experiment
6 Conclusion
Problem Description
Many applications involve predicting the edges of a bipartite graph.
[Figure: a bipartite graph between vertices I, II of Graph G and A, B, C of Graph H; two edges carry observed labels (-2 and +5), the rest are unlabeled (?)]
1 Recommender System
2 Host-Pathogen Interaction
3 Question-Answering Mapping
4 Citation Network
. . .
Sometimes, the vertex sets on both sides are intrinsically structured.
Heterogeneous info: G + H + partial observations.
Can we combine them to make better edge predictions?
The Proposed Framework
Transductive learning should be effective:
1 Labeled edges (red) are highly sparse
2 Unlabeled edges (gray) are massively available
Assumption: similar edges should have similar labels.
Prerequisite: a similarity measure among the edges, i.e. a "Graph of Edges" (not directly provided).
It can be induced from G and H via the Graph Product!
The Proposed Framework
The “Graph of Edges” can be induced by taking the product of G and H
In the product graph G ◦ H:
   Each vertex ∼ an edge (in the original bipartite graph)
   Each edge ∼ an edge-edge similarity
The adjacency matrix of the product graph is defined by "◦" (to be discussed later).
The Proposed Framework
Problem Mapping
Edge Prediction (Original Problem): given G, H and labeled edges, predict the unlabeled edges.
Vertex Prediction (Equivalent Problem): given G ◦ H and labeled vertices, predict the unlabeled vertices.
The six product-graph vertices and their labels: (I, A) = -2, (I, B) = ?, (I, C) = ?, (II, A) = ?, (II, B) = +5, (II, C) = ?
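The mapping can be sketched as follows, assuming the toy labels from the figure (-2 and +5) and using NaN to mark unlabeled entries (an illustrative convention, not the paper's):

```python
import numpy as np

# Sketch of the problem mapping: the partially observed edge labels of the
# bipartite graph become labels on vertices of the product graph G ◦ H.
G_vertices = ["I", "II"]
H_vertices = ["A", "B", "C"]
F = np.array([[-2.0, np.nan, np.nan],     # row I:  only (I, A) = -2 observed
              [np.nan, 5.0, np.nan]])     # row II: only (II, B) = +5 observed

# Each product-graph vertex is a pair (i, j); its label is F[i, j].
labels = {(g, h): F[i, j]
          for i, g in enumerate(G_vertices)
          for j, h in enumerate(H_vertices)}
print(labels[("I", "A")], labels[("II", "B")])   # -2.0 5.0
```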
Product Graph Construction
Q: When should vertex (i, j) ∼ (i′, j′) in the product graph?
   Tensor GP: i ∼ i′ in G AND j ∼ j′ in H
   Cartesian GP: (i ∼ i′ in G AND j = j′) OR (i = i′ AND j ∼ j′ in H)
Both can be trivially generalized to weighted graphs.
Computing the adjacency matrices of the PG:
   G ◦_Tensor H = G ⊗ H (the Kronecker, a.k.a. Tensor, product)
   G ◦_Cartesian H = G ⊗ I + I ⊗ H = G ⊕ H (the Kronecker sum)
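As a concrete illustration of the two constructions, a minimal NumPy sketch (the toy adjacency matrices are assumptions, not the paper's data):

```python
import numpy as np

# Toy adjacency matrices: G (m = 2 vertices I, II) and H (n = 3 vertices A, B, C).
G = np.array([[0, 1],
              [1, 0]])
H = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
m, n = G.shape[0], H.shape[0]

# Tensor (Kronecker) graph product: (i,j) ~ (i',j') iff i~i' in G AND j~j' in H.
A_tensor = np.kron(G, H)

# Cartesian graph product: Kronecker sum G ⊕ H = G ⊗ I + I ⊗ H.
A_cartesian = np.kron(G, np.eye(n)) + np.kron(np.eye(m), H)

print(A_tensor.shape)      # (6, 6): one product-graph vertex per bipartite edge
print(A_cartesian.shape)   # (6, 6)
```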
Product Graph Construction
Both GPs can be written in the form of a spectral decomposition:

G ◦_Tensor H = Σ_{i,j} (λ_i × μ_j) (u_i ⊗ v_j)(u_i ⊗ v_j)^T    (1)    [λ_i × μ_j: soft AND]

G ◦_Cartesian H = Σ_{i,j} (λ_i + μ_j) (u_i ⊗ v_j)(u_i ⊗ v_j)^T    (2)    [λ_i + μ_j: soft OR]

The interplay of the graphs is captured by the interplay of their spectra!

Generalization: Spectral Graph Product

G ◦ H := Σ_{i,j} (λ_i ◦ μ_j) (u_i ⊗ v_j)(u_i ⊗ v_j)^T    (3)

where "◦" can be an arbitrary binary operator ("×", "+", . . . ).

Commutative property: G ◦ H and H ◦ G are isomorphic.
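The spectral form (3) can be checked numerically. A sketch assuming small symmetric G and H (toy matrices, not the paper's data), where the "×" and "+" operators must recover the Kronecker product and Kronecker sum:

```python
import numpy as np

def spectral_graph_product(G, H, op):
    """Build G ◦ H from the spectra of G and H, as in Eq. (3)."""
    lam, U = np.linalg.eigh(G)   # G = sum_i lam_i u_i u_i^T
    mu, V = np.linalg.eigh(H)    # H = sum_j mu_j v_j v_j^T
    mn = G.shape[0] * H.shape[0]
    A = np.zeros((mn, mn))
    for i in range(len(lam)):
        for j in range(len(mu)):
            w = np.kron(U[:, i], V[:, j])          # u_i ⊗ v_j
            A += op(lam[i], mu[j]) * np.outer(w, w)
    return A

G = np.array([[0., 1.], [1., 0.]])
H = np.array([[0., 1.], [1., 1.]])

# "×" (soft AND) recovers the tensor product; "+" (soft OR) the Cartesian one.
A_tensor = spectral_graph_product(G, H, lambda a, b: a * b)
A_cart = spectral_graph_product(G, H, lambda a, b: a + b)

assert np.allclose(A_tensor, np.kron(G, H))
assert np.allclose(A_cart, np.kron(G, np.eye(2)) + np.kron(np.eye(2), H))
```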
Graph-based Transductive Learning
With the product graph A := G ◦ H constructed, we solve a standard graph-based transductive learning problem over A.

Learning Objective:

min_f  ℓ(f) + λ f^T A^{-1} f    (4)

where ℓ(f) is the loss function and λ f^T A^{-1} f is the graph regularization.

f_i: the system-predicted value for vertex i in A.
ℓ(f) quantifies the gap between f and the partially observed labels.
λ f^T A^{-1} f quantifies the smoothness of f over the graph.
Underlying assumption: f ∼ N(0, A).
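A minimal sketch of objective (4) under a squared loss on the labeled vertices; the tiny kernel A, the label positions, and λ are illustrative assumptions, and the paper's actual loss and solver may differ:

```python
import numpy as np

n = 6
A = np.eye(n) + 0.3 * np.ones((n, n))            # a toy positive-definite "product graph" kernel
y = np.zeros(n); y[0], y[3] = -2.0, 5.0          # observed labels (cf. the -2 / +5 edges)
S = np.zeros((n, n)); S[0, 0] = S[3, 3] = 1.0    # selects the two labeled vertices

lam = 0.1
# First-order condition of  min_f ||S(f - y)||^2 + lam * f^T A^{-1} f:
#   (S + lam * A^{-1}) f = S y
f = np.linalg.solve(S + lam * np.linalg.inv(A), S @ y)

print(np.round(f, 3))   # labeled entries stay near -2 and +5; the rest are smoothed
```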
Graph-based Transductive Learning
The enhanced learning objective

min_f  ℓ(f) + λ f^T κ(A)^{-1} f    (5)

incorporates a variety of graph transduction patterns:

k-step Random Walk:    κ(A) = A^k
Regularized Laplacian: κ(A) = (εI − A)^{-1} = (1/ε)(I + A/ε + A²/ε² + . . . )
Diffusion Process:     κ(A) = exp(A) ≡ I + A + A²/2! + A³/3! + · · ·

All can be viewed as transforms of the spectrum of A := Σ_i θ_i u_i u_i^T:

A^k = Σ_i θ_i^k u_i u_i^T    (εI − A)^{-1} = Σ_i 1/(ε − θ_i) u_i u_i^T    exp(A) = Σ_i e^{θ_i} u_i u_i^T
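These spectral transforms can be verified on a toy symmetric A (an assumption for illustration): each kernel is obtained by filtering the eigenvalues of A.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
theta, U = np.linalg.eigh(A)             # A = sum_i theta_i u_i u_i^T

def spectral(filter_fn):
    """Apply a scalar filter to the spectrum: sum_i filter(theta_i) u_i u_i^T."""
    return (U * filter_fn(theta)) @ U.T

eps = 3.0                                # eps larger than max |theta_i|
assert np.allclose(spectral(lambda t: t ** 3), np.linalg.matrix_power(A, 3))
assert np.allclose(spectral(lambda t: 1.0 / (eps - t)),
                   np.linalg.inv(eps * np.eye(3) - A))
assert np.allclose(spectral(np.exp), expm(A))
```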
Optimization
Transductive Learning over Product Graph:

min_f  ℓ(f) + λ f^T κ(A)^{-1} f,  with r(f) := λ f^T κ(A)^{-1} f    (6)

Challenge: κ(A) = κ(G ◦ H), with G of size m × m and H of size n × n, is a huge mn × mn matrix!
   Prohibitive to load it into memory → no need to store κ(A)
   Prohibitive to compute its inverse → no matrix inverse needed
   Even if κ(A)^{-1} were given, computing ∇r(f) naively is expensive → it can be performed much more efficiently
Keys for complexity reduction:
1 Instead of matrices: κ only manipulates the eigenvalues, and ◦ only manipulates the interplay of the eigenvalues.
2 The "vec" trick. The bottleneck is the multiplication (X ⊗ Y)f, where f = vec(F) and F_ij := the system-predicted score for edge (i, j):

(X ⊗ Y)f = (X ⊗ Y) vec(F) ≡ vec(X F Y^T)    (7)

The left-hand side takes O(m²n²) time/space; the right-hand side takes O(mn(m + n)) time and O((m + n)²) space.
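The identity (7) is easy to check with NumPy, whose row-major flatten matches the vec convention used here; the random sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
X = rng.standard_normal((m, m))
Y = rng.standard_normal((n, n))
F = rng.standard_normal((m, n))
f = F.flatten()                    # f = vec(F), row-major

naive = np.kron(X, Y) @ f          # O(m^2 n^2): materializes the mn x mn matrix
fast = (X @ F @ Y.T).flatten()     # O(mn(m + n)): never forms X ⊗ Y

assert np.allclose(naive, fast)
```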
Optimization with Low-rank Constraint
Further speedup is possible by factorizing F into two low-rank matrices.
The cost of each alternating gradient step is proportional to rank(F) · rank(Σ).
Σ: a "Characteristic Matrix" where Σ_ij = 1/κ(λ_i ◦ μ_j).
An interesting observation: rank(Σ) is usually a small constant!
Example: the diffusion process over the Cartesian PG:

Σ = [e^{−(λ_i + μ_j)}]_{i=1..m, j=1..n} = [e^{−λ_1}, . . . , e^{−λ_m}]^T [e^{−μ_1}, . . . , e^{−μ_n}]  ⟹  rank(Σ) = 1
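The rank-1 factorization above can be checked directly; the eigenvalue vectors below are illustrative assumptions:

```python
import numpy as np

# Characteristic matrix for diffusion over the Cartesian PG:
# Sigma_ij = 1/kappa(lam_i + mu_j) = exp(-(lam_i + mu_j)).
lam = np.array([0.0, 0.5, 1.2, 2.0])            # spectrum of G (m = 4)
mu = np.array([0.1, 0.9, 1.5])                  # spectrum of H (n = 3)

Sigma = np.exp(-(lam[:, None] + mu[None, :]))   # m x n characteristic matrix
outer = np.outer(np.exp(-lam), np.exp(-mu))     # rank-1 outer-product form

assert np.allclose(Sigma, outer)
assert np.linalg.matrix_rank(Sigma) == 1
```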
Datasets and Baselines
Datasets:

Dataset          G             H
Movielens-100K   Users         Movies
Cora             Publications  Publications
Courses          Courses       Prerequisite Courses

Baselines:
MC    Matrix Completion. Ignores the info of G and H.
TK    Tensor Kernel. Implicitly constructs the PG; no transduction.
GRMC  Graph Regularized Matrix Completion. Transduction over G and H; no PG constructed.
Results
Performance of several interesting combinations of ◦ and κ:

Dataset    Graph Transduction  Graph Product  MAP    AUC    ndcg@3
Courses    Random Walk         Tensor         0.488  0.827  0.461
Courses    Diffusion           Cartesian      0.518  0.872  0.500
Courses    von-Neumann         Tensor         0.472  0.861  0.449
Courses    von-Neumann         Cartesian      0.366  0.531  0.359
Courses    Sigmoid             Cartesian      0.443  0.617  0.431
Cora       Random Walk         Tensor         0.222  0.764  0.205
Cora       Diffusion           Cartesian      0.256  0.884  0.232
Cora       von-Neumann         Tensor         0.230  0.853  0.211
Cora       von-Neumann         Cartesian      0.218  0.633  0.212
Cora       Sigmoid             Cartesian      0.192  0.443  0.188
MovieLens  Random Walk         Tensor         -      -      0.7695
MovieLens  Diffusion           Cartesian      -      -      0.7702
MovieLens  von-Neumann         Tensor         -      -      0.7720
MovieLens  von-Neumann         Cartesian      -      -      0.7624
MovieLens  Sigmoid             Cartesian      -      -      0.7650
Results
Proposed method (Diffusion + Cartesian GP) vs. baselines:

Dataset    Method    MAP    AUC    ndcg@3
Courses    MC        0.319  0.758  0.294
Courses    GRMC      0.366  0.777  0.343
Courses    TK        0.449  0.810  0.446
Courses    Proposed  0.490  0.838  0.473
Cora       MC        0.101  0.697  0.086
Cora       GRMC      0.115  0.702  0.101
Cora       TK        0.248  0.872  0.231
Cora       Proposed  0.268  0.894  0.243
MovieLens  MC        -      -      0.748
MovieLens  GRMC      -      -      0.752
MovieLens  TK        -      -      0.718
MovieLens  Proposed  -      -      0.765
Conclusion
Summary
Problem: predicting the missing edges of a bipartite graph with graph-structured vertex sets on both sides.
Contribution: a novel approach via transductive learning over the product graph, an efficient algorithmic solution, and good empirical results.

On-going Work
Extend to k graphs (k > 2):
   Bipartite graph → k-partite graph
   Edge → hyperedge
Determine the "optimal" graph product for any given problem.