A Binary Linear Programming Formulation of the Graph Edit Distance

Presented by Shihao Ji

Duke University Machine Learning Group

July 17, 2006

Authors: Derek Justice & Alfred Hero (PAMI 2006)

• Introduction to Graph Matching

• Proposed Method (binary linear program)

• Experimental Results (chemical graph matching)

Outline

Graph Matching

• Objective: matching a sample input graph to a database of known prototype graphs.

Graph Matching (cont’d)

• A real example: face identification


Key issues: (1) representative graph generation

(a) facial graph representations

(b) chemical graphs

Maximum Common Subgraph (MCS)

Graph Edit Distance (GED): enumeration procedures (for small graphs); probabilistic models (MAP estimates); binary linear programming (BLP)


Key issues: (2) graph distance metrics

• Basic idea: define graph edit operations (such as insertion or deletion or relabeling of a vertex) along with costs associated with each operation.

• The GED between two graphs is the total cost of the least costly series of edit operations needed to make the two graphs isomorphic.

• Key issues: how to find the least costly series of edit operations? how to define edit costs?

Graph Edit Distance
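The definition above can be made concrete for very small graphs by enumerating every vertex correspondence. The following is a minimal sketch, not the binary linear program of the paper: it assumes unit costs, vertex-labeled undirected graphs without self-loops, and unlabeled edges, and the function name `graph_edit_distance` is chosen here for illustration only.

```python
from itertools import permutations

def graph_edit_distance(labels0, edges0, labels1, edges1):
    """Brute-force GED with unit costs (illustrative assumption).

    labels0/labels1: vertex labels, indexed by vertex number.
    edges0/edges1: iterables of (u, v) pairs, undirected, no self-loops.
    """
    n0, n1 = len(labels0), len(labels1)
    size = n0 + n1  # room to delete every vertex of G0 and insert every vertex of G1
    # None stands for "no vertex"; matching a label with None is a deletion or insertion.
    padded0 = list(labels0) + [None] * n1
    padded1 = list(labels1) + [None] * n0
    target_edges = {frozenset(e) for e in edges1}

    best = float("inf")
    for perm in permutations(range(size)):
        # perm[i] = j matches slot i of G0 with slot j of G1.
        # Unit vertex cost: relabeling, deletion and insertion each cost 1.
        cost = sum(1 for i, j in enumerate(perm) if padded0[i] != padded1[j])
        # Edges of G0 carried over to G1's numbering by the correspondence.
        mapped_edges = {frozenset((perm[u], perm[v])) for u, v in edges0}
        # Edges present on only one side must be deleted or inserted.
        cost += len(mapped_edges ^ target_edges)
        best = min(best, cost)
    return best

# Two three-vertex paths that differ in one vertex label.
print(graph_edit_distance(["C", "C", "O"], [(0, 1), (1, 2)],
                          ["C", "N", "O"], [(0, 1), (1, 2)]))  # prints 1
```

The loop visits (n0 + n1)! correspondences, so this is only usable for a handful of vertices, which is the limitation that motivates the other formulations.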

Graph Edit Distance (cont’d)

• How to compute the distance between G0 and G1?

• Edit Grid

• Isomorphisms of G0 on the edit grid

• State Vectors


standard placement

• Definition: (if the cost function c is a metric)

• Objective function: binary linear program (NP-hard!!!)

Graph Edit Distance (cont’d)

• Lower bound: linear program (polynomial time)

• Upper bound: assignment problem (polynomial time)
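To illustrate why an assignment problem yields an upper bound: any vertex correspondence describes a valid series of edits, so its total cost can never be below the GED. The sketch below assumes unit costs and assumes the assignment is solved on vertex labels only, with the edge cost evaluated afterwards; the slides do not spell out the exact construction, so treat this as one plausible reading. `solve_assignment` is a plain-Python Hungarian-style solver and `ged_upper_bound` is a name introduced here.

```python
def solve_assignment(cost):
    """Minimum-cost assignment for a square matrix in O(n^3).

    Returns a list `match` where row i is assigned to column match[i].
    """
    n = len(cost)
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials
    v = [0] * (n + 1)    # column potentials
    p = [0] * (n + 1)    # p[j] = row currently assigned to column j (1-based)
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the assignments along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match

def ged_upper_bound(labels0, edges0, labels1, edges1):
    """Upper bound on the unit-cost GED from a label-only assignment."""
    n0, n1 = len(labels0), len(labels1)
    padded0 = list(labels0) + [None] * n1  # None = "no vertex"
    padded1 = list(labels1) + [None] * n0
    label_cost = [[0 if a == b else 1 for b in padded1] for a in padded0]
    match = solve_assignment(label_cost)   # edges are ignored at this step
    vertex_cost = sum(label_cost[i][j] for i, j in enumerate(match))
    mapped_edges = {frozenset((match[u], match[v])) for u, v in edges0}
    target_edges = {frozenset(e) for e in edges1}
    return vertex_cost + len(mapped_edges ^ target_edges)

# G1 is G0 plus one extra vertex and one extra edge.
print(ged_upper_bound(["C", "N"], [(0, 1)],
                      ["C", "N", "O"], [(0, 1), (1, 2)]))  # prints 2
```

Because the assignment step ignores edges, the bound can be loose when many vertices share a label and the solver breaks ties without regard to structure.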


Edit Cost Selection

• Goal: suppose there is a set of prototype graphs {Gi}, i=1,…,N, and we classify a sample graph G0 by a nearest neighbor classifier in the metric space defined by the graph edit distance.

• Prior information: the prototypes should be roughly uniformly distributed in the metric space of graphs.

• Why: it minimizes the worst case classification error since it equalizes the probability of error under a nearest neighbor classifier.
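The classification setup described above can be sketched as follows. The classifier takes any distance function as a parameter; in the setting of the slides that would be the graph edit distance. To keep the example self-contained, `label_multiset_distance` is a hypothetical stand-in that only compares vertex labels and is not the GED, and the prototype names "A" and "B" are made up.

```python
from collections import Counter

def nearest_prototype(sample, prototypes, distance):
    """Return (class_name, distance) of the prototype closest to the sample.

    prototypes: list of (class_name, graph) pairs.
    distance: any function taking two graphs and returning a number.
    """
    best_name, best_dist = None, float("inf")
    for name, proto in prototypes:
        d = distance(sample, proto)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist

def label_multiset_distance(labels_a, labels_b):
    """Stand-in distance (assumption): labels present in one graph but not the other."""
    a, b = Counter(labels_a), Counter(labels_b)
    return sum((a - b).values()) + sum((b - a).values())

prototypes = [("A", ["C", "C", "O"]), ("B", ["C", "N"])]
print(nearest_prototype(["C", "C", "N"], prototypes, label_multiset_distance))
# prints ('B', 1)
```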

Edit Cost Selection (cont’d)

• Objective: minimize the variance of pairwise NN distances

• Define unit cost function, i.e., c(0,1)=1, c(α,β)=1 for α≠β, c(α,α)=0

• Solve the BLP (with unit cost) and find the NN pair

• Construct H(k,i) = the number of times the i-th edit operation occurs for the k-th NN pair

• Objective function: (convex optimization)
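The objective itself did not survive in this transcript, so the following is only a sketch of the quantity being minimized, under the assumption that the k-th NN distance is the count-weighted sum of edit costs, d_k = sum over i of H(k,i) * c_i. Since each d_k is linear in the costs, the variance is a convex quadratic in c. The function name `nn_distance_variance` and the example matrix are invented for illustration.

```python
def nn_distance_variance(H, costs):
    """Variance of NN distances d_k = sum_i H[k][i] * costs[i].

    H: list of rows, one per nearest-neighbor pair; H[k][i] counts how often
       edit operation i is used for pair k.
    costs: one cost per edit operation.
    """
    distances = [sum(h * c for h, c in zip(row, costs)) for row in H]
    mean = sum(distances) / len(distances)
    return sum((d - mean) ** 2 for d in distances) / len(distances)

# Pair 0 uses operation 0 twice; pair 1 uses operation 1 once.
H = [[2, 0], [0, 1]]
print(nn_distance_variance(H, [1, 1]))  # prints 0.25 (distances 2 and 1)
print(nn_distance_variance(H, [1, 2]))  # prints 0.0  (distances 2 and 2)
```

Setting every cost to zero also gives zero variance, so any actual optimization needs a constraint that rules this out (for example fixing the scale of the cost vector); which constraint the authors use is not recoverable from the slides.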

Experimental Results

• Chemical Graph Recognition

1. edge edit
2. vertex deletion
3. vertex insertion
4. vertex relabeling
5. random

(a) original graph

Experimental Results (cont’d)

(b) example perturbed graphs


• Optimal Edit Costs

A: GEDo  B: GEDu  C: MCS1  D: MCS2


• Classification Results

• Present a binary linear programming formulation of the graph edit distance;

• Offer a minimum variance method for choosing a cost metric;

• Demonstrate the utility of the new method in the context of chemical graph recognition.

Conclusion
