Jean-Philippe Vert [email protected]/.../alicante.pdf ·...

62
Graph kernels and applications in chemoinformatics Jean-Philippe Vert [email protected] Center for Computational Biology Ecole des Mines de Paris, ParisTech International Workshop on Graph-based Representations in Pattern Recognition (Gbr’2007), Alicante, Spain, June 13, 2007. Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 1 / 49

Transcript of Jean-Philippe Vert [email protected]/.../alicante.pdf ·...

Page 1: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Graph kernels and applications in chemoinformatics

Jean-Philippe [email protected]

Center for Computational BiologyEcole des Mines de Paris, ParisTech

International Workshop on Graph-based Representations inPattern Recognition (Gbr’2007), Alicante, Spain, June 13, 2007.

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 1 / 49

Page 2: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Outline

1 Introduction

2 Complexity vs expressiveness trade-off

3 Walk kernels

4 Extensions

5 Applications

6 Conclusion

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 2 / 49

Page 3: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Outline

1 Introduction

2 Complexity vs expressiveness trade-off

3 Walk kernels

4 Extensions

5 Applications

6 Conclusion

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 3 / 49

Page 4: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Ligand-Based Virtual Screening

ObjectiveBuild models to predict biochemical properties of small molecules fromtheir structures.

Structures

C15H14ClN3O3

Propertiesbinding to a therapeutic target,pharmacokinetics (ADME),toxicity...

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 4 / 49

Page 5: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Example

inactive

active

active

active

inactive

inactive

NCI AIDS screen results (from http://cactus.nci.nih.gov).Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 5 / 49

Page 6: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Image retrieval and classification

From Harchaoui and Bach (2007).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 6 / 49

Page 7: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Formalization

The problemGiven a set of training instances (x1, y1), . . . , (xn, yn), where xi ’sare graphs and yi ’s are continuous or discrete variables of interest,Estimate a function

y = f (x)

where x is any graph to be labeled.This is a classical regression or pattern recognition problem overthe set of graphs.

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 7 / 49

Page 8: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Classical approaches

Classical approaches1 Map each molecule to a vector of fixed dimension.2 Apply an algorithm for regression or pattern recognition over

vectors.

Example: 2D structural keys in chemoinformaticsA vector indexed by a limited set of informative stuctures

O

N

O

O

OO

N N N

O O

O

+ NN, PLS, decision tree, ...Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 8 / 49

Page 9: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Classical approaches

Classical approaches1 Map each molecule to a vector of fixed dimension.2 Apply an algorithm for regression or pattern recognition over

vectors.

Example: 2D structural keys in chemoinformaticsA vector indexed by a limited set of informative stuctures

O

N

O

O

OO

N N N

O O

O

+ NN, PLS, decision tree, ...Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 8 / 49

Page 10: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Difficulties

O

N

O

O

OO

N N N

O O

O

Expressiveness of the features (which features are relevant?)Large dimension of the vector representation (memory storage,speed, statistical issues)

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 9 / 49

Page 11: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

The kernel trick

KernelLet Φ(x) be a vector representation of the graph xThe kernel between two graphs is defined by:

K (x , x ′) = Φ(x)>Φ(x ′) .

The trickMany linear algorithms for regression or pattern recognition canbe expressed only in terms of inner products between vectorsComputing the kernel is often more efficient than computing Φ(x),especially in high or infinite dimensions!

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 10 / 49

Page 12: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

The kernel trick

KernelLet Φ(x) be a vector representation of the graph xThe kernel between two graphs is defined by:

K (x , x ′) = Φ(x)>Φ(x ′) .

The trickMany linear algorithms for regression or pattern recognition canbe expressed only in terms of inner products between vectorsComputing the kernel is often more efficient than computing Φ(x),especially in high or infinite dimensions!

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 10 / 49

Page 13: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Kernel trick example: computing distances in thefeature space

φX F

x1

x2

x1

x2φ( )

φ( )d(x1,x2)

dK (x1, x2)2 = ‖Φ (x1)− Φ (x2) ‖2

H

= 〈Φ (x1)− Φ (x2) ,Φ (x1)− Φ (x2)〉H= 〈Φ (x1) ,Φ (x1)〉H + 〈Φ (x2) ,Φ (x2)〉H − 2 〈Φ (x1) ,Φ (x2)〉H

dK (x1, x2)2 = K (x1, x1) + K (x2, x2)− 2K (x1, x2)

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 11 / 49

Page 14: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Kernel trick example: computing distances in thefeature space

φX F

x1

x2

x1

x2φ( )

φ( )d(x1,x2)

dK (x1, x2)2 = ‖Φ (x1)− Φ (x2) ‖2

H

= 〈Φ (x1)− Φ (x2) ,Φ (x1)− Φ (x2)〉H= 〈Φ (x1) ,Φ (x1)〉H + 〈Φ (x2) ,Φ (x2)〉H − 2 〈Φ (x1) ,Φ (x2)〉H

dK (x1, x2)2 = K (x1, x1) + K (x2, x2)− 2K (x1, x2)

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 11 / 49

Page 15: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Positive Definite (p.d.) Kernels

DefinitionA positive definite (p.d.) kernel on a set X is a function K : X × X → Rsymmetric:

∀(x, x′

)∈ X 2, K

(x, x′

)= K

(x′, x

),

and which satisfies, for all N ∈ N, (x1, x2, . . . , xN) ∈ XN et(a1, a2, . . . , aN) ∈ RN :

N∑i=1

N∑j=1

aiajK(xi , xj

)≥ 0 .

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 12 / 49

Page 16: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

P.d. kernels are inner products

Theorem (Aronszajn, 1950)K is a p.d. kernel on the set X if and only if there exists a Hilbert spaceH and a mapping

Φ : X 7→ H ,

such that, for any x, x′ in X :

K(x, x′

)=⟨Φ (x) ,Φ

(x′)⟩H .

φX H

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 13 / 49

Page 17: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Graph kernels

DefinitionA graph kernel K (x , x ′) is a p.d. kernel over the set of (labeled)graphs.It is equivalent to an embedding Φ : X 7→ H of the set of graphs toa Hilbert space through the relation:

K (x , x ′) = Φ(x)>Φ(x ′) .

φX H

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 14 / 49

Page 18: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Summary

The problemRegression and pattern recognition over labeled graphsClassical vector representation is both statistically andcomputationally challenging

The kernel approachBy defining a graph kernel we work implicitly in large (potentiallyinfinite!) dimensions:

Allows to consider a large number of potentially importantfeatures.No need to store explicitly the vectors (no problem of memorystorage or hash clashes) thanks to the kernel trickUse of regularized statistical algorithm (SVM, kernel PLS, kernelperceptron...)to handle the statistical problem of large dimension

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 15 / 49

Page 19: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Summary

The problemRegression and pattern recognition over labeled graphsClassical vector representation is both statistically andcomputationally challenging

The kernel approachBy defining a graph kernel we work implicitly in large (potentiallyinfinite!) dimensions:

Allows to consider a large number of potentially importantfeatures.No need to store explicitly the vectors (no problem of memorystorage or hash clashes) thanks to the kernel trickUse of regularized statistical algorithm (SVM, kernel PLS, kernelperceptron...)to handle the statistical problem of large dimension

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 15 / 49

Page 20: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Outline

1 Introduction

2 Complexity vs expressiveness trade-off

3 Walk kernels

4 Extensions

5 Applications

6 Conclusion

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 16 / 49

Page 21: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Expressiveness vs Complexity

Definition: Complete graph kernelsA graph kernel is complete if it separates non-isomorphic graphs, i.e.:

∀G1, G2 ∈ X , dK (G1, G2) = 0 =⇒ G1 ' G2 .

Equivalently, Φ(G1) 6= Φ(G1) if G1 and G2 are not isomorphic.

Expressiveness vs Complexity trade-offIf a graph kernel is not complete, then there is no hope to learn allpossible functions over X : the kernel is not expressive enough.On the other hand, kernel computation must be tractable, i.e., nomore than polynomial (with small degree) for practicalapplications.Can we define tractable and expressive graph kernels?

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 17 / 49

Page 22: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Expressiveness vs Complexity

Definition: Complete graph kernelsA graph kernel is complete if it separates non-isomorphic graphs, i.e.:

∀G1, G2 ∈ X , dK (G1, G2) = 0 =⇒ G1 ' G2 .

Equivalently, Φ(G1) 6= Φ(G1) if G1 and G2 are not isomorphic.

Expressiveness vs Complexity trade-offIf a graph kernel is not complete, then there is no hope to learn allpossible functions over X : the kernel is not expressive enough.On the other hand, kernel computation must be tractable, i.e., nomore than polynomial (with small degree) for practicalapplications.Can we define tractable and expressive graph kernels?

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 17 / 49

Page 23: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Complexity of complete kernels

Proposition (Gärtner et al., 2003)Computing any complete graph kernel is at least as hard as the graphisomorphism problem.

ProofFor any kernel K the complexity of computing dK is the same asthe complexity of computing K , because:

dK (G1, G2)2 = K (G1, G1) + K (G2, G2)− 2K (G1, G2) .

If K is a complete graph kernel, then computing dK solves thegraph isomorphism problem (dK (G1, G2) = 0 iff G1 ' G2). �

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 18 / 49

Page 24: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Complexity of complete kernels

Proposition (Gärtner et al., 2003)Computing any complete graph kernel is at least as hard as the graphisomorphism problem.

ProofFor any kernel K the complexity of computing dK is the same asthe complexity of computing K , because:

dK (G1, G2)2 = K (G1, G1) + K (G2, G2)− 2K (G1, G2) .

If K is a complete graph kernel, then computing dK solves thegraph isomorphism problem (dK (G1, G2) = 0 iff G1 ' G2). �

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 18 / 49

Page 25: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Subgraphs

DefinitionA subgraph of a graph (V , E) is a connected graph (V ′, E ′) withV ′ ⊂ V and E ′ ⊂ E .

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 19 / 49

Page 26: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Subgraph kernel

DefinitionLet (λG)G∈X a set or nonnegative real-valued weightsFor any graph G ∈ X , let

∀H ∈ X , ΦH(G) =∣∣ {G′ is a subgraph of G : G′ ' H

} ∣∣ .

The subgraph kernel between any two graphs G1 and G2 ∈ X isdefined by:

Ksubgraph(G1, G2) =∑H∈X

λHΦH(G1)ΦH(G2) .

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 20 / 49

Page 27: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Subgraph kernel complexity

Proposition (Gärtner et al., 2003)Computing the subgraph kernel is NP-hard.

Proof (1/2)Let Pn be the path graph with n vertices.Subgraphs of Pn are path graphs:

Φ(Pn) = neP1 + (n − 1)eP2 + . . . + ePn .

The vectors Φ(P1), . . . ,Φ(Pn) are linearly independent, therefore:

ePn =n∑

i=1

αiΦ(Pi) ,

where the coefficients αi can be found in polynomial time (solvinga n × n triangular system).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 21 / 49

Page 28: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Subgraph kernel complexity

Proposition (Gärtner et al., 2003)Computing the subgraph kernel is NP-hard.

Proof (1/2)Let Pn be the path graph with n vertices.Subgraphs of Pn are path graphs:

Φ(Pn) = neP1 + (n − 1)eP2 + . . . + ePn .

The vectors Φ(P1), . . . ,Φ(Pn) are linearly independent, therefore:

ePn =n∑

i=1

αiΦ(Pi) ,

where the coefficients αi can be found in polynomial time (solvinga n × n triangular system).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 21 / 49

Page 29: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Subgraph kernel complexity

Proposition (Gärtner et al., 2003)Computing the subgraph kernel is NP-hard.

Proof (2/2)If G is a graph with n vertices, then it has a path that visits eachnode exactly once (Hamiltonian path) if and only if Φ(G)>en > 0,i.e.,

Φ(G)>

(n∑

i=1

αiΦ(Pi)

)=

n∑i=1

αiKsubgraph(G, Pi) > 0 .

The decision problem whether a graph has a Hamiltonian path isNP-complete. �

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 22 / 49

Page 30: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Paths

DefinitionA path of a graph (V , E) is sequence of distinct verticesv1, . . . , vn ∈ V (i 6= j =⇒ vi 6= vj ) such that (vi , vi+1) ∈ E fori = 1, . . . , n − 1.Equivalently the paths are the linear subgraphs.

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 23 / 49

Page 31: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Path kernel

DefinitionThe path kernel is the subgraph kernel restricted to paths, i.e.,

Kpath(G1, G2) =∑H∈P

λHΦH(G1)ΦH(G2) ,

where P ⊂ X is the set of path graphs.

Proposition (Gärtner et al., 2003)Computing the path kernel is NP-hard.

ProofSame as the subgraph kernel. �

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 24 / 49

Page 32: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Path kernel

DefinitionThe path kernel is the subgraph kernel restricted to paths, i.e.,

Kpath(G1, G2) =∑H∈P

λHΦH(G1)ΦH(G2) ,

where P ⊂ X is the set of path graphs.

Proposition (Gärtner et al., 2003)Computing the path kernel is NP-hard.

ProofSame as the subgraph kernel. �

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 24 / 49

Page 33: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Path kernel

DefinitionThe path kernel is the subgraph kernel restricted to paths, i.e.,

Kpath(G1, G2) =∑H∈P

λHΦH(G1)ΦH(G2) ,

where P ⊂ X is the set of path graphs.

Proposition (Gärtner et al., 2003)Computing the path kernel is NP-hard.

ProofSame as the subgraph kernel. �

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 24 / 49

Page 34: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Summary

Expressiveness vs Complexity trade-offIt is intractable to compute complete graph kernels.It is intractable to compute the subgraph kernels.Restricting subgraphs to be linear does not help: it is alsointractable to compute the path kernel.One approach to define polynomial time computable graph kernelsis to have the feature space be made up of graphs homomorphicto subgraphs, e.g., to consider walks instead of paths.

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 25 / 49

Page 35: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Outline

1 Introduction

2 Complexity vs expressiveness trade-off

3 Walk kernels

4 Extensions

5 Applications

6 Conclusion

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 26 / 49

Page 36: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Walks

DefinitionA walk of a graph (V , E) is sequence of v1, . . . , vn ∈ V such that(vi , vi+1) ∈ E for i = 1, . . . , n − 1.We note Wn(G) the set of walks with n vertices of the graph G,and W(G) the set of all walks.

etc...

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 27 / 49

Page 37: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Paths and walks

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 28 / 49

Page 38: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Walk kernel

DefinitionLet Sn denote the set of all possible label sequences of walks oflength n (including vertices and edges labels), and S = ∪n≥1Sn.For any graph X let a weight λG(w) be associated to each walkw ∈ W(G).Let the feature vector Φ(G) = (Φs(G))s∈S be defined by:

Φs(G) =∑

w∈W(G)

λG(w)1 (s is the label sequence of w) .

A walk kernel is a graph kernel defined by:

Kwalk (G1, G2) =∑s∈S

Φs(G1)Φs(G2) .

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 29 / 49

Page 39: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Walk kernel

DefinitionLet Sn denote the set of all possible label sequences of walks oflength n (including vertices and edges labels), and S = ∪n≥1Sn.For any graph X let a weight λG(w) be associated to each walkw ∈ W(G).Let the feature vector Φ(G) = (Φs(G))s∈S be defined by:

Φs(G) =∑

w∈W(G)

λG(w)1 (s is the label sequence of w) .

A walk kernel is a graph kernel defined by:

Kwalk (G1, G2) =∑s∈S

Φs(G1)Φs(G2) .

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 29 / 49

Page 40: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Walk kernel examples

ExamplesThe nth-order walk kernel is the walk kernel with λG(w) = 1 if thelength of w is n, 0 otherwise. It compares two graphs through theircommon walks of length n.The random walk kernel is obtained with λG(w) = PG(w), wherePG is a Markov random walk on G. In that case we have:

K (G1, G2) = P(label(W1) = label(W2)) ,

where W1 and W2 are two independant random walks on G1 andG2, respectively (Kashima et al., 2003).The geometric walk kernel is obtained (when it converges) withλG(w) = β length(w), for β > 0. In that case the feature space is ofinfinite dimension (Gärtner et al., 2003).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 30 / 49

Page 41: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Walk kernel examples

ExamplesThe nth-order walk kernel is the walk kernel with λG(w) = 1 if thelength of w is n, 0 otherwise. It compares two graphs through theircommon walks of length n.The random walk kernel is obtained with λG(w) = PG(w), wherePG is a Markov random walk on G. In that case we have:

K (G1, G2) = P(label(W1) = label(W2)) ,

where W1 and W2 are two independant random walks on G1 andG2, respectively (Kashima et al., 2003).The geometric walk kernel is obtained (when it converges) withλG(w) = β length(w), for β > 0. In that case the feature space is ofinfinite dimension (Gärtner et al., 2003).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 30 / 49

Page 42: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Walk kernel examples

ExamplesThe nth-order walk kernel is the walk kernel with λG(w) = 1 if thelength of w is n, 0 otherwise. It compares two graphs through theircommon walks of length n.The random walk kernel is obtained with λG(w) = PG(w), wherePG is a Markov random walk on G. In that case we have:

K (G1, G2) = P(label(W1) = label(W2)) ,

where W1 and W2 are two independant random walks on G1 andG2, respectively (Kashima et al., 2003).The geometric walk kernel is obtained (when it converges) withλG(w) = β length(w), for β > 0. In that case the feature space is ofinfinite dimension (Gärtner et al., 2003).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 30 / 49

Page 43: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Computation of walk kernels

PropositionThese three kernels (nth-order, random and geometric walk kernels)can be computed efficiently in polynomial time.

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 31 / 49

Page 44: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Product graph

DefinitionLet G1 = (V1, E1) and G2 = (V2, E2) be two graphs with labeledvertices. The product graph G = G1 ×G2 is the graph G = (V , E) with:

1 V = {(v1, v2) ∈ V1 × V2 : v1 and v2 have the same label} ,2 E ={(

(v1, v2), (v ′1, v ′2))∈ V × V : (v1, v ′1) ∈ E1 and (v2, v ′2) ∈ E2

}.

G1 x G2

c

d e43

2

1 1b 2a 1d

1a 2b

3c

4c

2d

3e

4e

G1 G2

a b

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 32 / 49

Page 45: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Walk kernel and product graph

LemmaThere is a bijection between:

1 The pairs of walks w1 ∈ Wn(G1) and w2 ∈ Wn(G2) with the samelabel sequences,

2 The walks on the product graph w ∈ Wn(G1 ×G2).

Corollary

Kwalk (G1, G2) =∑s∈S

Φs(G1)Φs(G2)

=∑

(w1,w2)∈W(G1)×W(G1)

λG1(w1)λG2(w2)1(l(w1) = l(w2))

=∑

w∈W(G1×G2)

λG1×G2(w) .

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 33 / 49

Page 46: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Walk kernel and product graph

LemmaThere is a bijection between:

1 The pairs of walks w1 ∈ Wn(G1) and w2 ∈ Wn(G2) with the samelabel sequences,

2 The walks on the product graph w ∈ Wn(G1 ×G2).

Corollary

Kwalk (G1, G2) =∑s∈S

Φs(G1)Φs(G2)

=∑

(w1,w2)∈W(G1)×W(G1)

λG1(w1)λG2(w2)1(l(w1) = l(w2))

=∑

w∈W(G1×G2)

λG1×G2(w) .

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 33 / 49

Page 47: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Computation of the nth-order walk kernel

For the nth-order walk kernel we have λG1×G2(w) = 1 if the lengthof w is n, 0 otherwise.Therefore:

Knth−order (G1, G2) =∑

w∈Wn(G1×G2)

1 .

Let A be the adjacency matrix of G1 ×G2. Then we get:

Knth−order (G1, G2) =∑i,j

[An]i,j = 1>An1 .

Computation in O(n|G1||G2|d1d2), where di is the maximumdegree of Gi .

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 34 / 49

Page 48: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Computation of random and geometric walk kernels

In both cases λG(w) for a walk w = v1 . . . vn can be decomposedas:

λG(v1 . . . vn) = λi(v1)n∏

i=2

λt(vi−1, vi) .

Let Λi be the vector of λi(v) and Λt be the matrix of λt(v , v ′):

Kwalk (G1, G2) =∞∑

n=1

∑w∈Wn(G1×G2)

λi(v1)n∏

i=2

λt(vi−1, vi)

=∞∑

n=0

ΛiΛnt 1

= Λi (I − Λt)−1 1

Computation in O(|G1|3|G2|3)

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 35 / 49

Page 49: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Outline

1 Introduction

2 Complexity vs expressiveness trade-off

3 Walk kernels

4 Extensions

5 Applications

6 Conclusion

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 36 / 49

Page 50: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Extensions 1: label enrichment

Atom relabebling with the Morgan index

Order 2 indices

N

O

O

1

1

1

1

1

1

1

1

1

N

O

O

2

2

2

3

2

3

1

1

2

N

O

O

4

4

5

7

5

5

3

3

4

No Morgan Indices Order 1 indices

Compromise between fingerprints and structural keys features.Other relabeling schemes are possible (graph coloring).Faster computation with more labels (less matches implies asmaller product graph).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 37 / 49

Page 51: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Extension 2: Non-tottering walk kernel

Tottering walksA tottering walk is a walk w = v1 . . . vn with vi = vi+2 for some i .

Tottering

Non−tottering

Tottering walks seem irrelevant for many applicationsFocusing on non-tottering walks is a way to get closer to the pathkernel (e.g., equivalent on trees).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 38 / 49

Page 52: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Computation of the non-tottering walk kernel (Mahé etal., 2005)

Second-order Markov random walk to prevent tottering walksWritten as a first-order Markov random walk on an augmentedgraphNormal walk kernel on the augmented graph (which is always adirected graph).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 39 / 49

Page 53: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Extension 2: Subtree kernels

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 40 / 49

Page 54: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Example: Tree-like fragments of molecules

.

.

.

.

.

.

.

.

.N

N

C

CO

C

.

.

. C

O

C

N

C

N O

C

N CN C C

N

N

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 41 / 49

Page 55: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Computation of the subtree kernel

Like the walk kernel, amounts to compute the (weighted) numberof subtrees in the product graph.Recursion: if T (v , n) denotes the weighted number of subtrees ofdepth n rooted at the vertex v , then:

T (v , n + 1) =∑

R⊂N (v)

∏v ′∈R

λt(v , v ′)T (v ′, n) ,

where N (v) is the set of neighbors of v .Can be combined with the non-tottering graph transformation aspreprocessing to obtain the non-tottering subtree kernel.

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 42 / 49

Page 56: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Outline

1 Introduction

2 Complexity vs expressiveness trade-off

3 Walk kernels

4 Extensions

5 Applications

6 Conclusion

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 43 / 49

Page 57: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Chemoinformatics (Mahé et al., 2004)

MUTAG datasetaromatic/hetero-aromatic compoundshigh mutagenic activity /no mutagenic activity, assayed inSalmonella typhimurium.188 compouunds: 125 + / 63 -

Results10-fold cross-validation accuracy

Method AccuracyProgol1 81.4%2D kernel 91.2%

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 44 / 49

Page 58: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Subtree kernels

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 176

77

78

79

80

81

82

83

84

85

86

lambda

AU

C

BRANCH−BASED, NO−TOTTERING

h = 2h = 3h = 4h = 5h = 6h = 7h = 8h = 9h = 10

AUC as a function of the branching factors for different tree depths(from Mahé et al., 2007).

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 45 / 49

Page 59: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Image classification (Harchaoui and Bach, 2007)

COREL14 dataset1400 natural images in 14 classesCompare kernel between histograms (H), walk kernel (W), subtreekernel (TW), weighted subtree kernel (wTW), and a combination(M).

H W TW wTW M

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0.12

Tes

t err

or

Kernels

Performance comparison on Corel14

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 46 / 49

Page 60: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Outline

1 Introduction

2 Complexity vs expressiveness trade-off

3 Walk kernels

4 Extensions

5 Applications

6 Conclusion

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 47 / 49

Page 61: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

Conclusion

What we sawExtension of machine learning algorithms to graph data throughthe definition of positive definite kernels for graphsThe 2D kernel for molecule extends classical fingerprint-basedapproches. It solves the problem of bit clashes, allows infinitefingerprints and various extensions.Increasingly used in real-world applications.

Open questionHow to design / choose / learn a kernel for a given application inpractice?How to improve scalability of kernel methods + graph kernels tolarge datasets?

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 48 / 49

Page 62: Jean-Philippe Vert Jean-Philippe.Vert@ensmpmembers.cbio.mines-paristech.fr/.../alicante.pdf · 2014. 10. 10. · Graph kernels and applications in chemoinformatics Jean-Philippe Vert

References

Kashima, H., Tsuda, K., and Inokuchi, A. Marginalized kernels between labeledgraphs. Proceedings of the 20th ICML, 2003, pp. 321-328.

T. Gärtner, P. Flach, and S. Wrobel. On graph kernels: hardness results andefficient alternatives. Proceedings of COLT, p.129–143, Springer, 2003.

J. Ramon and T. Gärtner. Expressivity versus Efficiency of Graph Kernels. FirstInternational Workshop on Mining Graphs, Trees and Sequences, 2003.

P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert. Graph kernels formolecular structure-activity relationship analysis with SVM. J. Chem. Inf. Model.,45(4):939-951, 2005.

P. Mahé and J.-P. Vert. Graph kernels based on tree patterns for molecules.Technical report HAL:ccsd-00095488, 2006.

P. Mahé. Kernel design for virtual screening of small molecules with supportvector machines. PhD thesis, Ecole des Mines de Paris, 2006.

Z. Harchaoui and F. Bach. Image classification with segmentation graph kernels.IEEE Computer Society Conference on Computer Vision and PatternRecognition (CVPR), 2007.

Open-source kernels for chemoinformatics:http://chemcpp.sourceforge.net/

Jean-Philippe Vert (ParisTech) Graph kernels and chemoinformatics Gbr’2007 49 / 49