Frequent Subgraph Pattern Mining on Uncertain Graph Data
Zhaonian Zou, Jianzhong Li, Hong Gao, Shuo Zhang
Harbin Institute of Technology, China
CIKM’09, Hong Kong, Nov 4, 2009
Outline
Background
Problem Definition
Algorithm
Experimental Results
Conclusions
Background
Graph mining has played an important role in a range of real-world applications:
Medicine: structures of molecules
Bioinformatics: biological networks
Technology: the WWW
Social science: social networks
Many others
Directions of Graph Mining
Patterns of graphs, e.g., [Yan et al. ICDM’02]
Privacy of graphs, e.g., [Zou et al. VLDB’09]
Uncertainties of graphs
Models of graphs, e.g., [Leskovec et al. KDD’05]
Evolution of graphs, e.g., [Faloutsos et al. SIGMOD’07]
Uncertainties of Graphs: Example I
Protein-Protein Interaction (PPI) Networks
Vertices: proteins
Edges: interactions between proteins
Uncertainties: probabilities of interactions really existing
The data are taken from the STRING Database (http://string-db.org).
[Figure: a PPI network over the proteins NTG1, FET3, TIF34, SMT3, RPC40, and RAD59, with edges labeled by the interaction probabilities 0.375, 0.639, 0.651, 0.147, 0.651, 0.639, 0.867, and 0.698.]
Uncertainties of Graphs: Example II
Topologies of wireless sensor networks (WSNs)
Vertices: sensor nodes
Edges: wireless links between sensor nodes
Uncertainties: probabilities of wireless links functioning at any given time
[Figure: a WSN topology with links labeled by the probabilities 0.75, 0.92, 0.88, 0.95, and 0.69.]
Preliminaries
The support of a subgraph pattern S in a graph database is
$$\mathrm{support}(S) = \frac{\text{the number of graphs containing } S}{\text{the total number of graphs}}$$
[Figure: a graph database with graph G1 (vertex labels A, B, B, B, B; edge labels x, x, y, y) and graph G2 (vertex labels A, B, B; edge labels x, y, z), next to two subgraph patterns on vertex labels B, A, B: one with edge labels x, y (support = 1.0) and one with edge labels x, x (support = 0.5).]
Frequent Subgraph Pattern Mining Problem
Input: a graph database D and a support threshold minsup
Output: all subgraph patterns with support no less than minsup
FSP mining on biological networks (e.g., PPI networks) is an important tool for discovering functional modules [Koyutürk et al. Bioinformatics 04, Turanalp et al. BMC Bioinformatics 08].
PPI networks are subject to uncertainties. How do we define support?
Model of Uncertain Graphs
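[Figure: an uncertain graph with vertex labels A, B, B, B, B and four edges labeled x, x, y, y, carrying the existence probabilities 0.5, 0.6, 0.7, and 0.8, shown with two of its implicated graphs, each missing one edge.]
An uncertain graph can exist in the form of any of its implicated graphs. The probability of one such form multiplies p for every edge that is present and (1 – p) for every edge that is absent. For the two implicated graphs shown:
(1 – 0.5) * 0.6 * 0.7 * 0.8 = 0.168
0.5 * (1 – 0.6) * 0.7 * 0.8 = 0.112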
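The two products on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes edges exist independently of each other, and the edge names `e1` to `e4` and the function name are hypothetical labels that do not appear in the slides.

```python
from math import prod


def implicated_graph_probability(edge_probs, present_edges):
    """Probability that the uncertain graph exists as one implicated graph.

    Assumes independent edges: multiply p for each present edge and
    (1 - p) for each absent edge.
    """
    return prod(
        p if edge in present_edges else 1.0 - p
        for edge, p in edge_probs.items()
    )


# Hypothetical edge names; the probabilities are the ones on the slide.
edge_probs = {"e1": 0.5, "e2": 0.6, "e3": 0.7, "e4": 0.8}

first = implicated_graph_probability(edge_probs, {"e2", "e3", "e4"})   # e1 absent
second = implicated_graph_probability(edge_probs, {"e1", "e3", "e4"})  # e2 absent
print(round(first, 3), round(second, 3))  # 0.168 0.112
```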
Model of Uncertain Graphs (Cont’d)
Theorem: An uncertain graph represents a probability distribution over all its implicated graphs.
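One way to see why the theorem holds, sketched under the assumption that edges exist independently of each other. The symbols $P(e)$ (existence probability of edge $e$), $E(G)$ (edge set of the uncertain graph $G$), and $E(I)$ (edge set of an implicated graph $I$) are notation introduced here, not taken from the slides.

$$\Pr(G \text{ implicates } I) = \prod_{e \in E(I)} P(e) \prod_{e \in E(G) \setminus E(I)} \bigl(1 - P(e)\bigr)$$

Each value is non-negative, and summing over all implicated graphs (all subsets of $E(G)$) factorizes edge by edge:

$$\sum_{I} \Pr(G \text{ implicates } I) = \prod_{e \in E(G)} \bigl(P(e) + (1 - P(e))\bigr) = 1$$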
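Uncertain Graph Databases
[Figure: an uncertain graph database with uncertain graph G1 (vertex labels A, B, B, B, B; edges labeled x, x, y, y with probabilities 0.5, 0.6, 0.7, 0.8) and uncertain graph G2 (vertex labels A, B, B; edges labeled x, y, z with probabilities 0.8, 0.1, 0.7), shown with one implicated graph database made up of an implicated graph of G1 and an implicated graph of G2.]
In total, there are 2^4 * 2^3 = 128 implicated graph databases.
Theorem: An uncertain graph DB represents a probability distribution over all its implicated graph DBs.
The uncertain graph DB exists in the form of the implicated graph database shown with probability
((1 – 0.5) * 0.6 * 0.7 * 0.8) * (0.8 * 0.1 * (1 – 0.7)) = 4.032 * 10^-3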
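The count and the probability on this slide can be checked numerically. The sketch below assumes that edges are independent within a graph and that the graphs in the database are independent of each other; the edge names and helper name are hypothetical.

```python
from math import prod


def implicated_graph_probability(edge_probs, present_edges):
    """p for each present edge, (1 - p) for each absent edge (independent edges)."""
    return prod(p if e in present_edges else 1.0 - p for e, p in edge_probs.items())


# Hypothetical edge names; the probabilities are the ones on the slide.
g1 = {"a": 0.5, "b": 0.6, "c": 0.7, "d": 0.8}
g2 = {"x": 0.8, "y": 0.1, "z": 0.7}
database = [g1, g2]

# Every edge is either present or absent, so a graph with m edges
# has 2^m implicated graphs, and the counts multiply across graphs.
num_implicated_dbs = prod(2 ** len(g) for g in database)

# One implicated database: G1 without edge "a", G2 without edge "z".
chosen = [{"b", "c", "d"}, {"x", "y"}]
db_probability = prod(
    implicated_graph_probability(g, present)
    for g, present in zip(database, chosen)
)
print(num_implicated_dbs, db_probability)
```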
Expected Support
[Figure: the uncertain graph DB D implicating the graph DBs d1, d2, ..., dn.]
p1 = Pr(D implicates d1), p2 = Pr(D implicates d2), ..., pn = Pr(D implicates dn)
s1 = support of S in d1, s2 = support of S in d2, ..., sn = support of S in dn
The expected support of S is
$$\mathrm{esup}(S) = \sum_{i=1}^{n} p_i s_i$$
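For a very small database the sum can be evaluated directly by enumerating every implicated graph database. The sketch below is a brute-force illustration, not the mining algorithm from the talk. It assumes independent edges and independent graphs, and it sidesteps subgraph isomorphism by assuming that the embeddings of S in each uncertain graph are already known as sets of edges. The edge names and the embedding lists are illustrative assumptions.

```python
from itertools import combinations, product
from math import prod


def implicated_graphs(edge_probs):
    """Yield (present_edges, probability) for every implicated graph."""
    edges = list(edge_probs)
    for r in range(len(edges) + 1):
        for present in combinations(edges, r):
            present = frozenset(present)
            prob = prod(p if e in present else 1.0 - p for e, p in edge_probs.items())
            yield present, prob


def contains_pattern(present, embeddings):
    """S is contained if all edges of at least one embedding are present."""
    return any(emb <= present for emb in embeddings)


def expected_support(database):
    """database: list of (edge_probs, embeddings); returns the sum of p_i * s_i."""
    per_graph = [list(implicated_graphs(edge_probs)) for edge_probs, _ in database]
    total = 0.0
    for choice in product(*per_graph):  # one implicated graph per uncertain graph
        p_i = prod(prob for _, prob in choice)
        hits = sum(
            contains_pattern(present, embeddings)
            for (present, _), (_, embeddings) in zip(choice, database)
        )
        s_i = hits / len(database)  # support of S in this implicated database
        total += p_i * s_i
    return total


# Assumed embeddings: four in G1, one in G2.
g1 = ({"x1": 0.5, "x2": 0.6, "x3": 0.7, "x4": 0.8},
      [{"x1", "x2"}, {"x1", "x4"}, {"x2", "x3"}, {"x3", "x4"}])
g2 = ({"x": 0.8, "y": 0.1, "z": 0.7}, [{"x", "y"}])

esup = expected_support([g1, g2])
print(round(esup, 3))  # 0.431
```

The enumeration visits all 128 implicated databases here and doubles with every additional edge, which is why it only works for toy inputs.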
FSP Mining Problem on Uncertain Graphs
Input: an uncertain graph database D, and an expected support threshold minsup
Output: all subgraph patterns with expected support no less than minsup
It is #P-hard to count the number of frequent subgraph patterns.
Reduction from the problem of counting the number of satisfying truth assignments of a monotone k-CNF formula.
The FSP mining problem on uncertain graphs is NP-hard.
Approximation Method
It is #P-hard to compute the expected support of a subgraph pattern.
We develop an approximation method to find an approximate set of frequent subgraph patterns.
Let e (0 < e < 1) be a relative error tolerance.
[Figure: the expected-support axis from 0 to 1, divided at (1-e)*minsup and minsup into the regions Discard, Arbitrary, and Output.]
Objective I
Difficulty I: The number of frequent subgraph patterns is exponentially large.
Objective I: Examine subgraph patterns as efficiently as possible to find all frequent ones.
Method for Objective I
Step 1: Build a search tree T of subgraph patterns.
Step 2: Examine subgraph patterns in T in depth-first order. If S is infrequent, then all its descendants can be pruned.
[Figure: uncertain graph G1 (edge probabilities 0.5, 0.6, 0.7, 0.8) and uncertain graph G2 (edge probabilities 0.8, 0.1, 0.7), together with the expected-support axis from 0 to 1 divided at (1-e)*minsup and minsup into the regions Discard, Arbitrary, and Output.]
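The two steps can be sketched as a depth-first walk that never descends below an infrequent pattern. This is a toy illustration rather than the authors' implementation: the search tree, the pattern names, and the expected-support values are invented, and the pruning is valid only under the assumption that a descendant never has higher expected support than its ancestor.

```python
def mine_depth_first(root, children, is_frequent):
    """Walk the pattern search tree depth-first, pruning below infrequent patterns.

    Returns (frequent_patterns, examined_patterns) in visiting order.
    """
    frequent, examined = [], []
    stack = [root]
    while stack:
        pattern = stack.pop()
        examined.append(pattern)
        if not is_frequent(pattern):
            continue  # prune: the descendants are never pushed
        frequent.append(pattern)
        stack.extend(reversed(children(pattern)))  # keep left-to-right order
    return frequent, examined


# Hypothetical search tree and expected supports, for illustration only.
tree = {
    "A": ["A-B", "A-C"],
    "A-B": ["A-B-C"],
    "A-C": ["A-C-D"],
    "A-B-C": [],
    "A-C-D": [],
}
esup = {"A": 0.9, "A-B": 0.6, "A-C": 0.2, "A-B-C": 0.5, "A-C-D": 0.1}
minsup = 0.4

frequent, examined = mine_depth_first(
    "A", lambda p: tree[p], lambda p: esup[p] >= minsup
)
print(frequent)   # ['A', 'A-B', 'A-B-C']
print(examined)   # ['A', 'A-B', 'A-B-C', 'A-C']
```

The pattern "A-C-D" is never examined because its parent "A-C" is infrequent.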
Objective II
Difficulty II: It is #P-hard to compute the expected support esup(S) of a subgraph pattern S.
Objective II: Make the following judgments without computing esup(S) exactly.
If esup(S) is surely not in the green region, then discard.
If esup(S) is probably in the green region and surely not in the red region, then output.
[Figure: the expected-support axis from 0 to 1, divided at (1-e)*minsup and minsup into the regions Discard, Arbitrary, and Output; the green and red regions referred to above are color-coded on the original slide.]
Method for Objective II
Step 1: Approximate esup(S) by an interval [l, u] such that esup(S) ∈ [l, u].
Step 2: Decide whether S can be output or not by testing the following conditions.
[Figure: positions of the interval [l, u] on the expected-support axis from 0 to 1, marked at (1-e)*minsup and minsup, that lead to the decisions Output, Discard, and Shrink.]
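The exact test conditions are drawn in the figure and do not survive in the text, so the following is only one plausible reading. It assumes the "green" region is [minsup, 1] (the Output region) and the "red" region is [0, (1-e)*minsup) (the Discard region). The function name and the thresholds used in the example are hypothetical.

```python
def decide(l, u, minsup, e):
    """Classify a pattern from an interval [l, u] that contains esup(S).

    Assumed reading of the slide:
      - discard if the interval lies entirely below minsup,
      - output if the interval reaches minsup and stays out of [0, (1-e)*minsup),
      - otherwise the interval is too wide and has to be shrunk.
    """
    if u < minsup:
        return "discard"
    if l >= (1.0 - e) * minsup:
        return "output"
    return "shrink"


print(decide(0.10, 0.20, minsup=0.4, e=0.1))  # discard
print(decide(0.37, 0.41, minsup=0.4, e=0.1))  # output
print(decide(0.20, 0.50, minsup=0.4, e=0.1))  # shrink
```

Under this reading, an interval of width at most e*minsup never needs shrinking: if u >= minsup, then l >= u - e*minsup >= (1-e)*minsup, so the pattern is output.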
Approximating esup(S) by [l,u]
$$\mathrm{esup}(S) = \frac{1}{|D|} \sum_{i=1}^{|D|} \Pr(S \text{ occurs in } G_i)$$
$$\Pr(S \text{ occurs in } G) = \sum_{I:\ I \text{ is implicated by } G \text{ and } S \text{ is contained in } I} \Pr(G \text{ implicates } I)$$
A subgraph pattern S occurs in an uncertain graph G if S is contained in at least one implicated graph of G.
Algorithm: Approximate esup(S) by [l,u]
Step 1: For each uncertain graph Gi in D, approximate Pr(S occurs in Gi) by an interval [li, ui] of width at most e*minsup.
Step 2: Compute
$$l = \frac{1}{|D|} \sum_{i=1}^{|D|} l_i, \qquad u = \frac{1}{|D|} \sum_{i=1}^{|D|} u_i$$
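Step 2 is a plain average of the per-graph bounds, which a short sketch makes concrete. The interval values below are invented for illustration (they would come from Step 1), and the function name is hypothetical.

```python
def aggregate_intervals(intervals):
    """Average per-graph bounds [l_i, u_i] into bounds [l, u] on esup(S)."""
    n = len(intervals)
    l = sum(lo for lo, _ in intervals) / n
    u = sum(hi for _, hi in intervals) / n
    return l, u


# Hypothetical per-graph intervals, each of width 0.04 = e * minsup
# for e = 0.1 and minsup = 0.4.
intervals = [(0.76, 0.80), (0.06, 0.10)]
l, u = aggregate_intervals(intervals)
print(round(l, 2), round(u, 2))  # 0.41 0.45
```

Because u - l is the average of the widths u_i - l_i, the aggregated interval is also at most e*minsup wide whenever every per-graph interval is.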
Approximate Pr(S occurs in Gi) by [li, ui]
[Figure: uncertain graph Gi, whose four edges are assigned the variables x1, x2, x3, x4 and carry the existence probabilities 0.5, 0.6, 0.7, 0.8, and pattern S with vertex labels B, A, B and edge labels x, y.]
Step 1: Find all embeddings of S in Gi. (4 embeddings)
Step 2: Assign Boolean variables to the edges in the embeddings. Pr(x1) = 0.5, Pr(x2) = 0.6, Pr(x3) = 0.7, Pr(x4) = 0.8.
Step 3: Construct a conjunctive formula for each embedding. C1 = (x1 ∧ x2), C2 = (x1 ∧ x4), C3 = (x2 ∧ x3), C4 = (x3 ∧ x4).
Step 4: Construct a DNF formula. F = C1 ∨ C2 ∨ C3 ∨ C4.
Step 5: Estimate Pr(F = TRUE) by p using Karp & Luby’s Markov-Chain Monte-Carlo method with absolute error e*minsup/2 and confidence d (d ∈[0,1]).
Step 6: [li, ui] = [p - e*minsup/2, p + e*minsup/2].
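Steps 4 to 6 can be tried out on the example formula. The sketch below first computes Pr(F = TRUE) exactly by enumeration (feasible only because there are four variables) and then estimates it with a coverage-style Monte Carlo estimator of the kind usually attributed to Karp and Luby. It is a simplified illustration, not the authors' implementation: the sample count is fixed arbitrarily rather than derived from the error e*minsup/2 and the confidence d, the values e = 0.1 and minsup = 0.4 are hypothetical, and edges are assumed independent.

```python
import random
from itertools import product
from math import prod


def exact_dnf_probability(clauses, probs):
    """Pr(F = TRUE) by enumerating all truth assignments (tiny inputs only)."""
    variables = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if any(all(assignment[v] for v in clause) for clause in clauses):
            total += prod(
                probs[v] if assignment[v] else 1.0 - probs[v] for v in variables
            )
    return total


def coverage_estimate(clauses, probs, num_samples, rng):
    """Coverage-style Monte Carlo estimate of Pr(F = TRUE) for a monotone DNF."""
    weights = [prod(probs[v] for v in clause) for clause in clauses]
    total_weight = sum(weights)
    hits = 0
    for _ in range(num_samples):
        # Pick a clause with probability proportional to Pr(clause is true).
        j = rng.choices(range(len(clauses)), weights=weights)[0]
        # Sample an assignment conditioned on clause j being true.
        assignment = {
            v: (v in clauses[j]) or (rng.random() < probs[v]) for v in probs
        }
        # Count the sample only if j is the first clause it satisfies,
        # so every satisfying assignment is credited to exactly one clause.
        first = next(
            i for i, c in enumerate(clauses) if all(assignment[v] for v in c)
        )
        hits += first == j
    return total_weight * hits / num_samples


probs = {"x1": 0.5, "x2": 0.6, "x3": 0.7, "x4": 0.8}
clauses = [("x1", "x2"), ("x1", "x4"), ("x2", "x3"), ("x3", "x4")]

exact = exact_dnf_probability(clauses, probs)
p = coverage_estimate(clauses, probs, num_samples=50_000, rng=random.Random(0))

e, minsup = 0.1, 0.4  # hypothetical values, not from the slides
interval = (p - e * minsup / 2, p + e * minsup / 2)
print(round(exact, 3))  # 0.782
print(interval)
```

For this particular formula the exact value can also be checked by hand, since F is equivalent to (x1 ∨ x3) ∧ (x2 ∨ x4), which gives (1 – 0.5 * 0.3) * (1 – 0.4 * 0.2) = 0.782.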
Time Efficiency
Approximation Quality
Scalability
Conclusions
A new model of uncertain graph data has been proposed.
The frequent subgraph pattern mining problem on uncertain graph data has been formalized.
The problem has been formally proved to be NP-hard.
An approximate mining algorithm has been proposed.
The proposed algorithm has high efficiency, high approximation quality, and high scalability.
Thank you