Overlapping Community Search
Graph Data Management Lab, School of Computer Science
GDM@FUDAN
www.gdm.fudan.edu.cn
Email: [email protected]
Online Search of Overlapping Communities
Wanyun Cui, Fudan University
Yanghua Xiao, Fudan University
Haixun Wang, Microsoft Research Asia
Yiqi Lu, Fudan University
Wei Wang, Fudan University
Presenter: Wanyun Cui
Outline
• Motivation
• Model
• Algorithm
• Experiments
• Applications
Complex networks
Complex networks are everywhere.
• Internet
• Social network
• Protein network
Community structures
Complex networks are everywhere. Most real-life networks have community structure.
• The graph can be divided into groups such that vertices within each group are densely connected and vertices in different groups are sparsely connected.
Examples: Internet, social network, protein network
Overlapping community structure
Overlapping community: a vertex may belong to multiple communities.
• C1: small boat
• C2: meaning of bucket
• C3: big boat
• C4: tableware
Finding community structures
Two possible ways to find the community structure:
• OCD: overlapping community detection
• OCS: overlapping community search
OCD vs. OCS
OCD: divides the entire network to find all of its communities.
OCD vs. OCS
Disadvantages of OCD
• Too costly
• Global criterion
• Unfriendly to dynamic graphs
Too costly: the Facebook network has over 800 million nodes and 100 billion links.
Algorithm                  Complexity
Girvan–Newman algorithm    O(|E|^3)
LPA                        Almost linear
LA                         O(|C||E| + |V|)
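To make the cost argument concrete, here is a rough back-of-the-envelope calculation using the Facebook figure quoted above. It ignores constant factors, and the machine speed `OPS_PER_SECOND` is an assumption chosen only for illustration.

```python
# Rough operation counts at Facebook scale. Constant factors are ignored;
# OPS_PER_SECOND is an assumed machine speed, not a figure from the slides.
NUM_EDGES = 100 * 10**9          # "100 billion links"
OPS_PER_SECOND = 10**9           # assumption: one billion basic steps per second
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_needed(operations: int) -> float:
    """Convert an abstract operation count into years at OPS_PER_SECOND."""
    return operations / OPS_PER_SECOND / SECONDS_PER_YEAR


cubic_ops = NUM_EDGES ** 3       # O(|E|^3) row of the table
linear_ops = NUM_EDGES           # "almost linear" row of the table

print(f"cubic:  {cubic_ops:.1e} steps, about {years_needed(cubic_ops):.1e} years")
print(f"linear: {linear_ops:.1e} steps, about {linear_ops / OPS_PER_SECOND:.0f} seconds")
```

Even under this generous assumption, a cubic-time method is far out of reach at this scale, while a near-linear pass over the whole graph is affordable but still has to touch every edge.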
Global criterion: a fixed parameter or criterion is not appropriate for all vertices and queries.
• Communities of a student
• Communities of Barack Obama
Unfriendly to dynamic graphs:
• Real-life graphs are always evolving over time.
• We cannot afford to rerun OCD very frequently.
• OCD results lose their freshness and effectiveness.
As a result, OCD is usually performed offline.
OCS: problem definition
• Given: a graph G and a query vertex v
• Return: all communities that v belongs to
OCD vs. OCS
Advantages of OCS
• More efficient
• Personalized criterion
• Lightweight
More efficient: we only need to find communities within the local neighborhood of the query vertex. Our OCS solution needs only several milliseconds to answer a query.
OCD vs. OCS
Advantages of OCS
• More efficient
• Personalized criterion
• Friendly to dynamic graphs
OCS is therefore a good choice for finding communities online.
Applications of OCS
• Friend recommendation on Facebook
• Semantic expansion
• Infectious disease control
• Etc.
Challenges of OCS
• Modeling
• Complexity and scalability
Modeling requirements:
• A community should be dense enough.
• The model should be overlap-aware.
• The model should be general.
Complexity and scalability:
• In the worst case, OCS may need to enumerate an exponential number of valid communities.
• The problem is computationally hard.
• This motivates an approximate approach.
Outline
• Introduction
• Model
• Algorithm
• Experiments
• Applications
Model
• Community structure awareness
• Overlapping awareness
• Generality
Community structure awareness: the inner edges of a community should be dense, so the clique is used as the unit of a community.
Figure: a clique of 6 vertices.
Overlapping awareness:
• Two k-cliques are adjacent if they share k-1 vertices.
• A community is a connected component of the k-clique graph.
Figure: original graph and its clique graph (k=4).
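The k-clique model above can be sketched in a few lines of Python. This is a brute-force illustration for tiny graphs only, not the method used in the talk; the helper names `build_graph`, `k_cliques`, and `k_clique_communities` and the toy edge list are assumptions made for this example.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_cliques(adj, k):
    """Return every k-clique as a frozenset (brute force; small graphs only)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(b in adj[a] for a, b in combinations(c, 2))]


def k_clique_communities(adj, k):
    """Connected components of the k-clique graph, each as a set of vertices."""
    cliques = k_cliques(adj, k)
    unvisited = set(range(len(cliques)))
    communities = []
    while unvisited:
        stack = [unvisited.pop()]
        members = set()
        while stack:
            i = stack.pop()
            members |= cliques[i]
            # Two k-cliques are adjacent if they share k-1 vertices.
            neighbours = [j for j in unvisited
                          if len(cliques[i] & cliques[j]) == k - 1]
            for j in neighbours:
                unvisited.remove(j)
                stack.append(j)
        communities.append(members)
    return communities


# Toy graph (hypothetical): triangles abc and bcd share an edge,
# triangle def touches them only at vertex d.
adj = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
                   ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")])
communities = k_clique_communities(adj, 3)
print(communities)
```

With k=3 the toy graph yields two communities, {a, b, c, d} and {d, e, f}; vertex d belongs to both, which is the overlap the model is meant to capture.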
Generality: relax the strict constraints on clique density and clique adjacency.
• γ-quasi-clique: it is acceptable if a few edges are missing from the clique.
• α-adjacency
Generality: relax the strict constraints on cliques and adjacency.
• γ-quasi-clique
• α-adjacency: if two cliques share at least α vertices, they are α-adjacent.
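The two relaxations can be written as small predicates. The sketch below assumes that a γ-quasi-clique on k vertices needs at least floor(γ·k(k-1)/2) edges; the slides only say that a few edges may be missing, so that threshold, the function names, and the toy graph are assumptions.

```python
from itertools import combinations
from math import floor


def is_quasi_clique(adj, vertices, gamma):
    """True if the vertices induce at least floor(gamma * k(k-1)/2) edges.

    The exact threshold is an assumption made for this sketch.
    """
    k = len(vertices)
    edges = sum(1 for a, b in combinations(vertices, 2) if b in adj[a])
    return edges >= floor(gamma * k * (k - 1) / 2)


def alpha_adjacent(clique_a, clique_b, alpha):
    """True if the two cliques share at least alpha vertices."""
    return len(set(clique_a) & set(clique_b)) >= alpha


# Toy graph (hypothetical): four vertices with 5 of the 6 possible edges.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}

print(is_quasi_clique(adj, [0, 1, 2, 3], 0.8))          # one missing edge is tolerated
print(is_quasi_clique(adj, [0, 1, 2, 3], 1.0))          # a strict clique is required
print(alpha_adjacent({0, 1, 2, 3}, {1, 2, 3, 4}, 3))    # three shared vertices
```

Setting γ=1 and α=k-1 recovers the strict k-clique model, so the relaxed model is a generalization of it.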
Figure: original graph and clique graph (=1).
(α, γ)-OCS
Given a graph G, a query vertex v, and parameters k, α, and γ, find all connected quasi-clique components containing v.
Example: k=4
Example: k=3
Parameter selection and k
• In general, larger k leads to larger
• Has an upper bound and a lower bound corresponding
to and k
Outline
• Introduction
• Model
• Algorithm
• Experiments
• Applications
Algorithm
• Exact algorithm
• Approximate algorithm
Exact Algorithm
Example
• k=4, (3,1)-OCS
• Query vertex = Bob
Drawback
• Exponential enumeration
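The slide's example graph is not recoverable from the transcript, so the sketch below uses a hypothetical friendship graph around Bob. It is a brute-force illustration of exact (α, γ)-OCS, not the authors' procedure: it lists every candidate k-subset, which is exactly the exponential enumeration named as the drawback. The edge threshold floor(γ·k(k-1)/2) and all names other than Bob are assumptions.

```python
from itertools import combinations
from math import floor


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def quasi_cliques(adj, k, gamma):
    """All k-vertex sets with at least floor(gamma * k(k-1)/2) edges (assumed rule)."""
    need = floor(gamma * k * (k - 1) / 2)
    found = []
    for combo in combinations(sorted(adj), k):
        edges = sum(1 for a, b in combinations(combo, 2) if b in adj[a])
        if edges >= need:
            found.append(frozenset(combo))
    return found


def exact_ocs(adj, query, k, alpha, gamma):
    """Return the vertex set of every clique-graph component that contains query."""
    cliques = quasi_cliques(adj, k, gamma)
    seen = set()
    communities = []
    for start in cliques:
        if query not in start or start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            current = stack.pop()
            members |= current
            for other in cliques:
                # Expand to every unseen clique sharing at least alpha vertices.
                if other not in seen and len(current & other) >= alpha:
                    seen.add(other)
                    stack.append(other)
        communities.append(members)
    return communities


# Hypothetical graph: Bob sits in two separate groups of friends.
group_one = list(combinations(["Bob", "Ann", "Cat", "Dan"], 2))
group_two = list(combinations(["Bob", "Fay", "Gus", "Hal"], 2))
extra = [("Eve", "Ann"), ("Eve", "Cat"), ("Eve", "Dan")]
adj = build_graph(group_one + group_two + extra)

communities = exact_ocs(adj, "Bob", k=4, alpha=3, gamma=1)
print(communities)
```

On this toy graph the query returns two communities for Bob: {Ann, Bob, Cat, Dan, Eve}, where Eve is reached through the adjacent clique {Ann, Cat, Dan, Eve}, and {Bob, Fay, Gus, Hal}.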
Approximate Algorithm
Example
• k=4, (3,1)-OCS
• Query vertex = Bob
Approximate
• The new clique must contain at least one new vertex.
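The pruning rule can be sketched as follows: during expansion, an adjacent clique is followed only if it adds at least one vertex that is not yet in the community. This toy version still lists every candidate clique up front, so it illustrates the rule rather than the speed-up; the graph, the names other than Bob, and the edge threshold floor(γ·k(k-1)/2) are assumptions.

```python
from itertools import combinations
from math import floor


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def quasi_cliques(adj, k, gamma):
    """All k-vertex sets with at least floor(gamma * k(k-1)/2) edges (assumed rule)."""
    need = floor(gamma * k * (k - 1) / 2)
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if sum(1 for a, b in combinations(c, 2) if b in adj[a]) >= need]


def approximate_ocs(adj, query, k, alpha, gamma):
    """Expand only through cliques that contribute at least one new vertex."""
    cliques = quasi_cliques(adj, k, gamma)
    communities = []
    for start in cliques:
        if query not in start:
            continue
        if any(start <= found for found in communities):
            continue  # this clique adds no new vertex to a known community
        members = set(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for other in cliques:
                adjacent = len(current & other) >= alpha
                if adjacent and not other <= members:
                    members |= other       # at least one new vertex joins
                    stack.append(other)
        communities.append(members)
    return communities


# Hypothetical graph: Bob sits in two separate groups of friends.
group_one = list(combinations(["Bob", "Ann", "Cat", "Dan"], 2))
group_two = list(combinations(["Bob", "Fay", "Gus", "Hal"], 2))
extra = [("Eve", "Ann"), ("Eve", "Cat"), ("Eve", "Dan")]
adj = build_graph(group_one + group_two + extra)

communities = approximate_ocs(adj, "Bob", k=4, alpha=3, gamma=1)
print(communities)
```

Every expansion step adds a vertex, so the inner loop runs at most |V| times per community. The price is that vertices reachable only through cliques that add nothing new can be missed, which is why the result is approximate.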
Outline
• Introduction
• Model
• Algorithm
• Experiments
• Applications
Experiments
Setup
• Intel Core2 2.13 GHz
• 4 GB memory
• 64-bit Windows 7
Experiments
Dataset
Dataset        |V|        |E|
WordNet        82676      133445
DBLP           560851     1816613
Google         916427     4322051
LiveJournal    4847572    42851237
Effectiveness
The model successfully unveils a researcher's multiple research interests.
Example
• Query vertex: Jiawei Han
• k=6
Communities of Jiawei Han:
• C1: multimedia data mining
• C2: stream data mining
• C3: information network
Effectiveness
Our model is flexible enough to support different parameters.
Example
• Query vertex: Jiawei Han
• k=9
Effectiveness
For most vertices, the OCS model finds non-trivial results.
Performance
OCS is more efficient than OCD.
Competitors
• LA: "Identification of overlapping community structure in complex networks using fuzzy c-means clustering"
• OSLOM: "Finding statistically significant communities in networks"
Amortized time
• (Total time of OCD)/n
Performance: influence of parameters
• For the same k and α, a smaller γ costs more time.
• For the same k and γ, a smaller α costs more time.
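One plausible reason a looser density threshold γ costs more time is that more vertex sets qualify as quasi-cliques and have to be examined. The sketch below counts qualifying 4-vertex sets on a hypothetical 5-vertex graph for three values of γ; the edge threshold floor(γ·k(k-1)/2) and the graph are assumptions, and this is not a measurement from the talk.

```python
from itertools import combinations
from math import floor


def count_quasi_cliques(adj, k, gamma):
    """Count k-vertex sets with at least floor(gamma * k(k-1)/2) edges (assumed rule)."""
    need = floor(gamma * k * (k - 1) / 2)
    return sum(
        1
        for combo in combinations(sorted(adj), k)
        if sum(1 for a, b in combinations(combo, 2) if b in adj[a]) >= need
    )


# Hypothetical graph: a 4-clique on vertices 0..3, plus vertex 4 linked to 0 and 1.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0, 1}}

counts = {gamma: count_quasi_cliques(adj, 4, gamma) for gamma in (1.0, 0.9, 0.8)}
print(counts)
```

On this graph the number of candidates grows from 1 at γ=1.0 to 3 at γ=0.9 and 5 at γ=0.8, so the search has more cliques to expand as γ decreases.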
Accuracy of approximate algorithm
The approximate algorithm consistently achieves more than 70% accuracy, and in some cases almost 90%.
Outline
• Introduction
• Model
• Algorithm
• Experiments
• Applications
Diversity-based Social Network Analysis
• What is the distribution of diversity?
• Can we find people with really large diversity?
Name disambiguation
• Ambiguous names that refer to a significant number of entities also have a large number of communities.
• A real person belongs to fewer communities than these ambiguous names do.
Contributions
• Problem definition
• Model
• Guide for parameter selection
• Algorithms
• Extensive experiments and applications
Q&A
Thank you!