Cross-Graph Learning of Multi-Relational Associations

Hanxiao Liu, Yiming Yang
Carnegie Mellon University
{hanxiaol, yiming}@cs.cmu.edu

June 22, 2016
Outline
Task Description
New Contributions
Framework
Scalable Inference
Empirical Evaluation
Summary
Task Description
Goal: Predict associations among heterogeneous graphs.
(a) Drug-Target Interaction: a compound graph (structure similarity) and a protein graph (sequence similarity), linked by "interact" associations.

(b) Citation Network: an author graph (coauthorship), a paper graph (citation), and a venue graph (shared foci), linked by write, publish, and attend associations.
Example: "John published a reinforcement learning paper at ICML." ↦ (John, RL Paper, ICML)
New Contributions
- A unified framework for integrating heterogeneous information from multiple graphs.
- Transductive learning that leverages both labeled data (sparse) and unlabeled data (massive).
- A convex approximation enabling scalable inference over the combinatorially many candidate tuples.
Framework
Notation
- G^(1), G^(2), ..., G^(J) are the individual graphs;
- n_j is the number of nodes in G^(j);
- (i_1, i_2, ..., i_J) is a tuple (a multi-relation);
- f_{i_1, i_2, ..., i_J} is the predicted score for the tuple;
- f is a tensor in R^{n_1 × n_2 × ··· × n_J}.
Framework

Product graph P induced from G^(1), ..., G^(J).

Tensor product: P(G^(1), G^(2), G^(3)) = G^(1) ⊗ G^(2) ⊗ G^(3)
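As a concrete illustration (my own sketch, not from the slides), here is a minimal numpy example of the tensor (Kronecker) graph product for J = 2; the adjacency matrices A_G and A_H are hypothetical toy inputs.

```python
import numpy as np

# Toy adjacency matrices for two small graphs G and H (hypothetical inputs).
A_G = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)  # 3-node path graph
A_H = np.array([[0, 1],
                [1, 0]], dtype=float)     # 2-node edge

# Tensor (Kronecker) product graph: node (u, v) is adjacent to (u', v')
# iff u ~ u' in G and v ~ v' in H.
A_P = np.kron(A_G, A_H)
print(A_P.shape)  # (6, 6): one node per (G-node, H-node) pair
```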
Framework
Why product graph?
- Maps the heterogeneous graphs onto a single unified graph for label propagation (transductive learning).
Framework

Assuming

\[ \mathrm{vec}(f) \sim \mathcal{N}(0, \mathcal{P}) \tag{1} \]

which implies

\[ -\log p(f \mid \mathcal{P}) \propto \mathrm{vec}(f)^\top \mathcal{P}^{-1}\, \mathrm{vec}(f) =: \|f\|_{\mathcal{P}}^2 \tag{2} \]

Optimization problem:

\[ \min_f \; \ell_O(f) + \frac{\gamma}{2} \|f\|_{\mathcal{P}}^2 \tag{3} \]
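To make the semi-norm in (2) concrete, a brute-force sketch (my own, feasible only for tiny graphs) that materializes a positive-definite stand-in for P and evaluates the quadratic form directly; the later slides replace this with a spectral shortcut.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 2
f = rng.standard_normal((n1, n2))   # toy score tensor (J = 2)

# Any symmetric positive-definite matrix can stand in for P here.
M = rng.standard_normal((n1 * n2, n1 * n2))
P = M @ M.T + np.eye(n1 * n2)

vec_f = f.reshape(-1)                            # vec(f)
seminorm_sq = vec_f @ np.linalg.solve(P, vec_f)  # vec(f)^T P^{-1} vec(f)
print(seminorm_sq)
```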
Framework

For computational tractability, we focus on the spectral graph product family of P.

Spectral Graph Product (SGP): the eigensystem of P_κ(G^(1), ..., G^(J)) is parametrized by the eigensystems of the individual graphs, i.e.,

\[ \Big\{ \kappa(\lambda_{i_1}, \dots, \lambda_{i_J}),\; \bigotimes_j v_{i_j} \Big\}_{i_1, \dots, i_J} \tag{4} \]

where λ_{i_j} / v_{i_j} is the i_j-th eigenvalue/eigenvector of the j-th graph.
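A small numpy sketch (my own, J = 2) of how an SGP eigensystem is assembled from the factor graphs per (4); with κ(x, y) = x · y it reproduces the Kronecker-product graph exactly, which the final assertion checks.

```python
import numpy as np

# Symmetric toy adjacency matrices for two graphs (hypothetical inputs).
A_G = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A_H = np.array([[0., 1.], [1., 0.]])

lam_G, V_G = np.linalg.eigh(A_G)   # eigensystem of G
lam_H, V_H = np.linalg.eigh(A_H)   # eigensystem of H

def kappa(x, y):
    return x * y                   # the tensor-product choice, eq. (5)

# SGP eigenvalues: kappa over every pair of factor eigenvalues, eq. (4).
lam_P = kappa(lam_G[:, None], lam_H[None, :]).reshape(-1)
# SGP eigenvectors: Kronecker products of the factor eigenvectors.
V_P = np.kron(V_G, V_H)

# With this kappa, the SGP is exactly the Kronecker-product graph.
A_P = V_P @ np.diag(lam_P) @ V_P.T
assert np.allclose(A_P, np.kron(A_G, A_H))
```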
Framework

Nice properties of SGP:

Subsuming basic operations:

\[ \kappa(x, y) = x \times y \implies P_\kappa(G, H) = G \otimes H \quad \text{(tensor product)} \tag{5} \]
\[ \kappa(x, y) = x + y \implies P_\kappa(G, H) = G \oplus H \quad \text{(Cartesian product)} \tag{6} \]

Supporting graph diffusions:

\[ \sigma_{\mathrm{heat}}(P_\kappa) = I + P_\kappa + \tfrac{1}{2} P_\kappa^2 + \cdots = P_{e^\kappa} \tag{7} \]
\[ \sigma_{\mathrm{von\,Neumann}}(P_\kappa) = I + P_\kappa + P_\kappa^2 + \cdots = P_{\frac{1}{1-\kappa}} \tag{8} \]

Order-insensitivity: if κ is commutative, then the SGP is commutative (up to graph isomorphism).
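A numerical check (my own sketch, not from the slides) of property (7): running heat diffusion on P_κ is the same as building the SGP with e^κ in place of κ, because the diffusion only transforms the spectrum elementwise.

```python
import numpy as np
from scipy.linalg import expm

A_G = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A_H = np.array([[0., 1.], [1., 0.]])

lam_G, V_G = np.linalg.eigh(A_G)
lam_H, V_H = np.linalg.eigh(A_H)
V_P = np.kron(V_G, V_H)                               # shared eigenvectors

kappa = np.multiply.outer(lam_G, lam_H).reshape(-1)   # kappa(x, y) = x * y

P_kappa = V_P @ np.diag(kappa) @ V_P.T                # the SGP itself
P_heat = V_P @ np.diag(np.exp(kappa)) @ V_P.T         # SGP built with e^kappa

# Heat diffusion I + P + P^2/2! + ... equals the e^kappa SGP, eq. (7).
assert np.allclose(expm(P_kappa), P_heat)
```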
Scalable Inference

For a general graph product, the semi-norm is computed as

\[ \|f\|_{\mathcal{P}}^2 = \mathrm{vec}(f)^\top \mathcal{P}^{-1}\, \mathrm{vec}(f) \tag{9} \]

For an SGP, P_κ no longer has to be explicitly computed:

\[ \|f\|_{P_\kappa}^2 = \sum_{i_1, i_2, \dots, i_J}^{n_1, n_2, \dots, n_J} \frac{f(v_{i_1}, \dots, v_{i_J})^2}{\kappa(\lambda_{i_1}, \dots, \lambda_{i_J})} \tag{10} \]

- f(v_{i_1}, v_{i_2}, ..., v_{i_J}) = f ×_1 v_{i_1} ×_2 v_{i_2} ··· ×_J v_{i_J} (mode products)
- However, even evaluating (10) is expensive: it ranges over all ∏_j n_j eigenvalue combinations.
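A numpy sketch (my own, J = 2) of evaluating (10) without ever forming P_κ: rotate f into each factor's eigenbasis, then weight the squared spectral coefficients by 1/κ; the assertion compares against the explicit quadratic form (9). The +3 eigenvalue shift is my own choice to keep κ positive.

```python
import numpy as np

rng = np.random.default_rng(0)
A_G = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A_H = np.array([[0., 1.], [1., 0.]])

lam_G, V_G = np.linalg.eigh(A_G)
lam_H, V_H = np.linalg.eigh(A_H)

f = rng.standard_normal((3, 2))        # toy score tensor

# Spectral coefficients f(v_{i1}, v_{i2}): mode products with each basis.
F = V_G.T @ f @ V_H

# Shifted product kernel so every kappa value is positive (my choice).
kappa = np.multiply.outer(lam_G + 3.0, lam_H + 3.0)

seminorm_sq = np.sum(F**2 / kappa)     # eq. (10), no P_kappa materialized

# Sanity check against the explicit quadratic form of eq. (9).
P = np.kron(V_G, V_H) @ np.diag(kappa.reshape(-1)) @ np.kron(V_G, V_H).T
vec_f = f.reshape(-1)
assert np.isclose(seminorm_sq, vec_f @ np.linalg.solve(P, vec_f))
```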
Scalable Inference
Using a low-rank SGP:

- f lies in the linear span of the eigenvectors of P.
- Eigenvectors of high volatility can be pruned away.

Figure: Eigenvectors of G (blue), H (red), and P(G, H).
Scalable Inference

Restrict f to the linear span of the "smooth" bases of P:

\[ f(\alpha) = \sum_{i_1, i_2, \dots, i_J = 1}^{d_1, d_2, \dots, d_J} \alpha_{i_1, i_2, \dots, i_J} \bigotimes_j v_{i_j} \tag{11} \]

where the core tensor α ∈ R^{d_1 × d_2 × ··· × d_J}, with d_j ≪ n_j.

The semi-norm becomes

\[ \|f(\alpha)\|_{P_\kappa}^2 = \sum_{i_1, i_2, \dots, i_J = 1}^{d_1, d_2, \dots, d_J} \frac{\alpha_{i_1, i_2, \dots, i_J}^2}{\kappa(\lambda_{i_1}, \lambda_{i_2}, \dots, \lambda_{i_J})} \tag{12} \]

We then optimize w.r.t. α instead of f. Parameter count: ∏_j n_j → ∏_j d_j.
Scalable Inference
Figure: Tucker decomposition, where α is the core tensor.
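A sketch (my own, J = 2, hypothetical sizes) of the low-rank restriction (11)-(12): keep only the d_j smoothest bases per graph and carry all the parameters in the small core tensor α. QR factors stand in for the retained eigenvectors, and the κ-values are assumed positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, d1, d2 = 100, 80, 10, 8   # d_j << n_j (hypothetical sizes)

# Stand-ins for the d_j "smoothest" eigenpairs of each graph.
V1 = np.linalg.qr(rng.standard_normal((n1, d1)))[0]
V2 = np.linalg.qr(rng.standard_normal((n2, d2)))[0]
lam1 = np.linspace(2.0, 1.0, d1)
lam2 = np.linspace(2.0, 1.0, d2)

alpha = rng.standard_normal((d1, d2))   # core tensor: only d1*d2 parameters

# Eq. (11): f(alpha) = sum_{ij} alpha_ij (v_i ⊗ v_j) = V1 @ alpha @ V2^T.
f = V1 @ alpha @ V2.T

# Eq. (12): the semi-norm touches only the small core tensor.
kappa = np.multiply.outer(lam1, lam2)
seminorm_sq = np.sum(alpha**2 / kappa)

print(f.shape, alpha.size)   # (100, 80) scores from just 80 parameters
```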
Scalable Inference

Revised optimization objective:

\[ \min_{\alpha \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_J}} \; \ell_O(f(\alpha)) + \frac{\gamma}{2} \|f(\alpha)\|_{P_\kappa}^2 \tag{13} \]

Ranking loss function (a squared hinge over observed tuples O and unobserved tuples Ō):

\[ \ell_O(f) = \frac{1}{|O \times \bar{O}|} \sum_{\substack{(i_1, \dots, i_J) \in O \\ (i'_1, \dots, i'_J) \in \bar{O}}} \big( f_{i'_1 \dots i'_J} - f_{i_1 \dots i_J} \big)_+^2 \tag{14} \]

\[ \nabla_\alpha = \frac{\partial \ell_O}{\partial f} \left( \frac{\partial f_{i'_1, \dots, i'_J}}{\partial \alpha} - \frac{\partial f_{i_1, \dots, i_J}}{\partial \alpha} \right) + \gamma\, \alpha \oslash \kappa \tag{15} \]

where ⊘ denotes element-wise division by the tensor of κ-values (cf. (12)). Tensor algebra is carried out on the GPU.
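A minimal end-to-end training sketch (my own, J = 2, plain numpy rather than GPU tensor algebra) of objective (13) with the ranking loss (14) and gradients in the spirit of (15); the observed/negative index lists and the unit margin are hypothetical additions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, d1, d2 = 50, 40, 8, 6
gamma, lr = 0.1, 0.5

# Stand-ins for the retained eigenpairs of the two graphs.
V1 = np.linalg.qr(rng.standard_normal((n1, d1)))[0]
V2 = np.linalg.qr(rng.standard_normal((n2, d2)))[0]
kappa = np.multiply.outer(np.linspace(2.0, 1.0, d1),
                          np.linspace(2.0, 1.0, d2))

observed = [(0, 1), (3, 5), (7, 2)]    # tuples in O (hypothetical)
negatives = [(4, 4), (9, 0), (1, 3)]   # tuples sampled from O-bar

alpha = np.zeros((d1, d2))
for step in range(500):
    f = V1 @ alpha @ V2.T                      # scores f(alpha), eq. (11)
    g = np.zeros_like(alpha)
    for i, j in observed:
        for p, q in negatives:
            # Unit margin added (my choice) so the toy model learns from
            # the all-zero initialization; eq. (14) is the margin-free case.
            slack = 1.0 + f[p, q] - f[i, j]
            if slack > 0:                      # squared hinge is active
                # df[p,q]/dalpha - df[i,j]/dalpha are rank-1 outer products.
                g += 2 * slack * (np.outer(V1[p], V2[q])
                                  - np.outer(V1[i], V2[j]))
    g /= len(observed) * len(negatives)        # the |O x O-bar| average
    alpha -= lr * (g + gamma * alpha / kappa)  # gradient step on eq. (13)

f = V1 @ alpha @ V2.T
print(f[0, 1] - f[4, 4])   # observed tuple now outscores this negative
```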
Empirical Evaluation
Datasets:

- Enzyme: 445 compounds, 664 proteins.
- DBLP: 34K authors, 11K papers, 22 venues.

Representative baselines:

- TF/GRTF: Tensor Factorization / Graph-Regularized TF
- NN: one-class Nearest Neighbor
- RSVM: Ranking SVMs
- LTKM: Low-Rank Tensor Kernel Machines
Empirical Evaluation
Our method: “TOP” (blue).
Figure: Performance on Enzyme (above) and DBLP (below). Each panel plots MAP, AUC, or Hits@5 (%) against training size (12.5%, 25%, 50%, and 100%) for TOP, LTKM, NN, RSVM, TF, and GRTF.
Summary
Contributions:

- A unified framework for integrating heterogeneous information from multiple graphs.
- Transductive learning that leverages both labeled data (sparse) and unlabeled data (massive).
- A convex approximation enabling scalable inference over the combinatorially many candidate tuples.

Future/Ongoing Work:

- Learning structured associations.
- Larger problems: Microsoft Academic Graph (37 GB).
Thank You