SPIN: Mining Maximal Frequent Subgraphs from Graph Databases
Jun Huan, Wei Wang, Jan Prins, Jiong Yang
KDD 2004


Introduction

►Graphs model relations among data
   Inter-disciplinary research
►Huge number of recurring patterns
►Mine only maximal frequent subgraphs
   A frequent subgraph is maximal if none of its supergraphs is frequent

Advantages

►Reduces the total number of mined subgraphs
   Saves space and analysis effort
►Reduces mining time
►Non-maximal frequent subgraphs can be reconstructed.
►Maximal frequent subgraphs are of most interest in some applications.

Algorithm

►Mine all frequent trees from a general graph database.
   Tree normalization is simpler than graph normalization.
   In certain applications, most of the frequent subgraphs are really trees.
   Use a current subgraph mining algorithm
   Mining subtrees from a forest

Algorithm

►Reconstruct all maximal subgraphs from the mined trees.
   For each frequent tree T, find all frequent subgraphs whose canonical spanning trees are isomorphic to T
   Enumerate the equivalence class of a tree T
   Maximal subgraph mining

Tree-based Equivalence Classes

►A subtree T is a spanning tree of G if T contains all nodes in G.
   Maximal one: canonical spanning tree
►Group all frequent subgraphs into equivalence classes based on spanning trees.
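The spanning-tree condition above can be checked mechanically. The following is a minimal sketch, assuming undirected graphs given as node lists and edge pairs, with labels ignored and no self-loops; the function name `is_spanning_tree` is illustrative and not taken from the slides. It does not attempt to pick the canonical spanning tree, which requires a tree ordering the slides do not spell out.

```python
def is_spanning_tree(tree_edges, graph_nodes, graph_edges):
    """Return True if tree_edges form a spanning tree of the graph.

    Edges are undirected pairs (u, v); labels are ignored in this sketch.
    """
    graph_edge_set = {frozenset(e) for e in graph_edges}
    tree_edge_set = {frozenset(e) for e in tree_edges}
    # Every tree edge must also be an edge of the graph.
    if not tree_edge_set <= graph_edge_set:
        return False
    # A tree on n nodes has exactly n - 1 edges.
    if len(tree_edge_set) != len(graph_nodes) - 1:
        return False
    # With n - 1 edges, reaching every node implies the edges are acyclic.
    adjacency = {n: set() for n in graph_nodes}
    for u, v in (tuple(e) for e in tree_edge_set):
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(graph_nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == set(graph_nodes)


nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
spans = is_spanning_tree([(1, 2), (2, 3), (3, 4)], nodes, edges)
misses_node = is_spanning_tree([(1, 2), (2, 3), (1, 3)], nodes, edges)
```

Here `spans` is true because the path touches all four nodes, while `misses_node` is false because the triangle leaves node 4 out.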

Spanning tree

Tree-based Equivalence Classes

[Figure: example graphs with labels a, b, x, y; annotation: 12 singleton groups]

Enumerating Graphs from Trees

►Graph G with candidate set C: {e1, e2, …, en}
   If G + e is frequent, edge e is in C (candidate set)
►Search space of G: G:C = {G + y | y ∈ 2^C}
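The search space G:C is simply G extended by every subset of the candidate set. A minimal sketch, assuming graphs are represented as sets of edge pairs (labels omitted) and using the illustrative name `search_space`:

```python
from itertools import chain, combinations


def search_space(graph_edges, candidate_edges):
    """Return every graph G + y, where y ranges over all subsets of C."""
    base = frozenset(graph_edges)
    candidates = list(candidate_edges)
    subsets = chain.from_iterable(
        combinations(candidates, k) for k in range(len(candidates) + 1)
    )
    return [base | frozenset(y) for y in subsets]


G = [(1, 2), (2, 3), (3, 4)]   # a tree (here a path) on four nodes
C = [(1, 3), (2, 4), (1, 4)]   # hypothetical candidate edges
space = search_space(G, C)
print(len(space))  # 2 ** len(C) graphs, including G itself
```

The size grows as 2^|C|, which is why the optimizations on the following slides try to avoid enumerating most of it.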


Optimizations

►Remove sets of frequent subgraphs that cannot be maximal from the search space
►Locally maximal: a frequent subgraph G is maximal in its equivalence class
►Globally maximal: maximal frequent in a graph database
►Avoid enumerating subgraphs which are not locally maximal.

Bottom-up Pruning

►G’ = G ∪ C
   If G’ is frequent: every other graph in the search space is a subgraph of G’ and not maximal
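Bottom-up pruning amounts to one frequency test on the largest graph in the search space. The sketch below assumes a toy support oracle in which node identities are fixed, so support is plain edge-set containment rather than subgraph isomorphism; `norm`, `make_is_frequent`, and `bottom_up_prune` are hypothetical names, and the database is made up for illustration.

```python
def norm(edges):
    """Normalize undirected edges so (u, v) and (v, u) compare equal."""
    return frozenset(frozenset(e) for e in edges)


def make_is_frequent(database, min_support):
    """Toy oracle: a database graph supports a pattern if it contains all
    of the pattern's edges (no isomorphism test, a deliberate shortcut)."""
    db = [norm(g) for g in database]

    def is_frequent(edges):
        return sum(1 for g in db if edges <= g) >= min_support

    return is_frequent


def bottom_up_prune(graph_edges, candidate_edges, is_frequent):
    """Return G' = G ∪ C if it is frequent, otherwise None.

    When G' is returned, the rest of the search space of G can be skipped.
    """
    full = norm(graph_edges) | norm(candidate_edges)
    return full if is_frequent(full) else None


database = [
    [(1, 2), (2, 3), (1, 3)],
    [(1, 2), (2, 3), (1, 3), (3, 4)],
    [(1, 2), (2, 3)],
]
G = [(1, 2), (2, 3)]
C = [(1, 3)]
pruned = bottom_up_prune(G, C, make_is_frequent(database, min_support=2))
not_pruned = bottom_up_prune(G, C, make_is_frequent(database, min_support=3))
```

With a support threshold of 2 the full graph is frequent and is returned; with a threshold of 3 the shortcut does not apply and the search space would have to be explored normally.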

Tail Shrink

►An embedding of G in G’ is a subgraph isomorphism f from G to G’
   Two embeddings of L in P:
   l1->P1, l2->P2, l3->P3, l4->P4
   l1->P1, l2->P3, l3->P2, l4->P4
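To make the notion of multiple embeddings concrete, here is a brute-force sketch that enumerates label-preserving subgraph isomorphisms. The two graphs below are hypothetical (the figure from the slides is not recoverable); they are chosen so that exactly two embeddings exist, differing only in how the two middle nodes are mapped. The function name `embeddings` and the dictionary representation are assumptions of this sketch.

```python
from itertools import permutations


def embeddings(pattern_labels, pattern_edges, target_labels, target_edges):
    """Enumerate injective node maps that preserve node and edge labels."""
    target_edge_label = {
        frozenset((u, v)): lab for (u, v), lab in target_edges.items()
    }
    p_nodes = list(pattern_labels)
    found = []
    # Try every injective assignment of pattern nodes to target nodes.
    for image in permutations(target_labels, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        if any(pattern_labels[n] != target_labels[f[n]] for n in p_nodes):
            continue
        # Every pattern edge must map to a target edge with the same label.
        if all(
            target_edge_label.get(frozenset((f[u], f[v]))) == lab
            for (u, v), lab in pattern_edges.items()
        ):
            found.append(f)
    return found


L_labels = {"l1": "a", "l2": "b", "l3": "b", "l4": "a"}
L_edges = {("l1", "l2"): "x", ("l1", "l3"): "x",
           ("l2", "l4"): "y", ("l3", "l4"): "y"}
P_labels = {"p1": "a", "p2": "b", "p3": "b", "p4": "a"}
P_edges = {("p1", "p2"): "x", ("p1", "p3"): "x",
           ("p2", "p4"): "y", ("p3", "p4"): "y"}
result = embeddings(L_labels, L_edges, P_labels, P_edges)
```

The search is factorial in the number of nodes, so this is only meant for tiny examples; a real miner would track embeddings incrementally.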


Tail Shrink

►A candidate edge (i, j, e_l) is associative to a graph G if
   it appears in every embedding of G in a graph database
►If a tree T has a set of associative edges, any maximal frequent graph G that is a supergraph of T must contain all associative edges.
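The associativity test can be sketched as a loop over embeddings. This assumes the embeddings of G have already been computed and are given as (graph id, node map) pairs, and that each database graph is a dictionary from undirected edges to edge labels; the name `is_associative` and the toy data are illustrative only.

```python
def is_associative(candidate, embeddings, database):
    """Check that edge (i, j, label) is present under every embedding."""
    i, j, edge_label = candidate
    for graph_id, f in embeddings:
        edges = database[graph_id]
        if edges.get(frozenset((f[i], f[j]))) != edge_label:
            return False
    return True


# Hypothetical database: two graphs, edges keyed by unordered node pairs.
database = {
    "g1": {frozenset((1, 2)): "x", frozenset((2, 3)): "y",
           frozenset((1, 3)): "x"},
    "g2": {frozenset((1, 2)): "x", frozenset((2, 3)): "y",
           frozenset((1, 3)): "x", frozenset((3, 4)): "y"},
}
# Assumed precomputed embeddings of a 3-node path with pattern nodes 0, 1, 2.
path_embeddings = [("g1", {0: 1, 1: 2, 2: 3}), ("g2", {0: 1, 1: 2, 2: 3})]

closes_triangle = is_associative((0, 2, "x"), path_embeddings, database)
wrong_label = is_associative((0, 2, "y"), path_embeddings, database)
```

In this toy data the edge between pattern nodes 0 and 2 with label x occurs wherever the path occurs, so it is associative; the same edge with label y is not.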

Tail Shrink

►Remove associative edges from the candidate set and add them to T without missing any maximal graphs
   Reduces the search space
   Prunes the entire equivalence class in certain cases
►A set of associative edges C of a tree T is lethal if
   G’ = T ∪ C has a canonical spanning tree different from that of T
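The tail-shrink step can be sketched as a partition of the candidate set followed by the lethal check. Both `is_associative` and `canonical_spanning_tree` are passed in as callables because the slides do not define how they are computed; the stubs in the usage lines are placeholders, not real implementations. Treating a lethal set as grounds for pruning the whole equivalence class is this sketch's reading of the "certain cases" bullet.

```python
def tail_shrink(tree_edges, candidates, is_associative, canonical_spanning_tree):
    """Return (augmented_graph, remaining_candidates), or None if lethal."""
    tree = frozenset(tree_edges)
    associative, remaining = [], []
    for edge in candidates:
        (associative if is_associative(edge) else remaining).append(edge)
    # Associative edges are added to T up front instead of being enumerated.
    augmented = tree | frozenset(associative)
    # Lethal set: T plus its associative edges belongs to another tree's
    # equivalence class, so this class is assumed to hold no maximal graph.
    if associative and canonical_spanning_tree(augmented) != tree:
        return None
    return augmented, remaining


tree = [(0, 1), (1, 2)]
candidates = [(0, 2)]
always_present = {(0, 2)}  # stub: pretend this edge is associative

kept = tail_shrink(
    tree, candidates,
    lambda e: e in always_present,
    lambda g: frozenset(tree),                # stub: canonical tree unchanged
)
lethal = tail_shrink(
    tree, candidates,
    lambda e: e in always_present,
    lambda g: frozenset([(0, 1), (0, 2)]),    # stub: canonical tree differs
)
```

In the first call the search continues from the augmented graph with an empty candidate set; in the second the class is dropped.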


External-Edge Pruning

►Remove one equivalence class without any knowledge about its candidate edges
►External edge for a graph G: it connects a node in G and a node not in G
►(i, e_l, v_l) is associative to a graph G if, for every embedding f of G in a graph G’:
   G’ has a node v with the label v_l
   v connects to the node f(i) with an edge labeled e_l in G’
   There is no node j ∈ V[G] such that v = f(j)
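The three conditions for an associative external edge translate directly into a check over embeddings. This is a minimal sketch assuming precomputed embeddings and a database that maps each graph id to a pair (node labels, edge labels); the name `has_associative_external_edge` and the data are hypothetical.

```python
def has_associative_external_edge(i, edge_label, node_label, embeddings, database):
    """Check whether (i, edge_label, node_label) holds under every embedding."""
    for graph_id, f in embeddings:
        node_labels, edge_labels = database[graph_id]
        image = set(f.values())
        found = any(
            v not in image                       # v is outside the embedding
            and node_labels[v] == node_label     # v carries the node label
            and edge_labels.get(frozenset((f[i], v))) == edge_label
            for v in node_labels
        )
        if not found:
            return False
    return True


database = {
    "g1": ({1: "a", 2: "c", 3: "b"},
           {frozenset((1, 2)): "x", frozenset((2, 3)): "y"}),
    "g2": ({10: "a", 11: "c", 12: "b", 13: "a"},
           {frozenset((10, 11)): "x", frozenset((11, 12)): "y",
            frozenset((12, 13)): "x"}),
}
# Assumed embeddings of a single edge a -x- c, with pattern nodes 0 and 1.
edge_embeddings = [("g1", {0: 1, 1: 2}), ("g2", {0: 10, 1: 11})]

from_c = has_associative_external_edge(1, "y", "b", edge_embeddings, database)
from_a = has_associative_external_edge(0, "y", "b", edge_embeddings, database)
```

Here every occurrence of the pattern can be extended from its c node by a y-labeled edge to a new b node, so `from_c` is true, while no such extension exists from the a node.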

Associative external edges

Experiments

►2.8GHz Pentium Xeon
►512KB L2 cache, 2GB main memory
►Red Hat Linux 7.3
►C++ programming language

Synthetic Dataset
D10KT30L200I11V4E4

DTP CA data set

DTP CM data set