AutoPart: Parameter-Free Graph Partitioning and Outlier Detection
Deepayan Chakrabarti (deepay@cs.cmu.edu)

Problem Definition
[Figure: a People × People matrix rearranged into People Groups × People Groups]
Group people in a social network, species in a food web, or proteins in a protein interaction graph …

Reminder
[Figure: adjacency matrix with People on both axes]
Graph: N nodes and E directed edges

Problem Definition
Goals:
• [#1] Find groups (of people, species, proteins, etc.)
• [#2] Find outlier edges (“bridges”)
• [#3] Compute inter-group “distances” (how similar are two groups of proteins?)

Problem Definition
Properties:
• Fully Automatic (estimate the number of groups)
• Scalable
• Allow incremental updates

Related Work
Graph Partitioning:
• METIS (Karypis+/1998)
• Spectral partitioning (Ng+/2001)
Clustering Techniques:
• K-means and variants (Pelleg+/2000, Hamerly+/2003)
• Information-theoretic co-clustering (Dhillon+/2003)
• LSI (Deerwester+/1990): choosing the number of “concepts”
Drawbacks noted:
• Measure of imbalance between clusters, OR number of partitions
• Rows and columns are considered separately, OR not fully automatic

Outline
• Problem Definition
• Related Work
• Finding clusters in graphs
  • What is a good clustering?
  • How can we find such a clustering?
• Outliers and inter-group distances
• Experiments
• Conclusions

What is a “good” clustering?
[Figure: the same matrix under two Node Groups × Node Groups arrangements, one versus the other]
Why is this better?
Good Clustering
1. Similar nodes are grouped together
2. As few groups as necessary
This yields a few, homogeneous blocks, which implies Good Compression.

Main Idea
[Figure: binary matrix with Node groups on both axes]
Good Compression implies Good Clustering
$$p_i^1 = \frac{n_i^1}{n_i^1 + n_i^0}$$
$$\text{Total Encoding Cost} = \underbrace{\sum_i (n_i^1 + n_i^0)\, H(p_i^1)}_{\text{Code Cost}} + \underbrace{\sum_i \big(\text{cost of describing } n_i^1,\ n_i^0 \text{ and groups}\big)}_{\text{Description Cost}}$$
(Here $n_i^1$ and $n_i^0$ are the numbers of 1s and 0s in block $i$.)
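
A minimal Python sketch of the code-cost term, the sum over blocks of (n1 + n0) · H(p1). It assumes that n1 and n0 count the 1s and 0s in a block and that a single grouping is applied to both rows and columns. The function names are illustrative only, and the description cost is left out because the slides do not spell out its form.

```python
import math
from collections import defaultdict


def binary_entropy(p):
    """Shannon entropy H(p), in bits, of a source emitting 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def block_counts(matrix, groups):
    """Count the 1s and the total cells in every (row group, column group) block."""
    ones = defaultdict(int)
    cells = defaultdict(int)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            block = (groups[r], groups[c])
            cells[block] += 1
            ones[block] += value
    return ones, cells


def code_cost(matrix, groups):
    """Sum over blocks of (n1 + n0) * H(p1), in bits."""
    ones, cells = block_counts(matrix, groups)
    return sum(cells[b] * binary_entropy(ones[b] / cells[b]) for b in cells)
```

For a 4-node matrix made of two 2-node cliques, grouping the nodes by clique gives homogeneous blocks and a code cost of zero, while putting all four nodes in one group gives a single half-full block and a code cost of 16 bits.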

Examples
$$\text{Total Encoding Cost} = \underbrace{\sum_i (n_i^1 + n_i^0)\, H(p_i^1)}_{\text{Code Cost}} + \underbrace{\sum_i \big(\text{cost of describing } n_i^1,\ n_i^0 \text{ and groups}\big)}_{\text{Description Cost}}$$
• One node group: Code Cost high, Description Cost low
• n node groups: Code Cost low, Description Cost high

What is a “good” clustering?
[Figure: the same matrix under two Node Groups × Node Groups arrangements, one versus the other]
Why is this better? Both terms of the total encoding cost are low:
$$\text{Total Encoding Cost} = \underbrace{\sum_i (n_i^1 + n_i^0)\, H(p_i^1)}_{\text{Code Cost}} + \underbrace{\sum_i \big(\text{cost of describing } n_i^1,\ n_i^0 \text{ and groups}\big)}_{\text{Description Cost}}$$
• Code Cost: low
• Description Cost: low

Algorithms
[Figure: matrix rearranged into k = 5 node groups]

Algorithms
1. Start with initial matrix
2. Find good groups for fixed k
3. Choose better values for k
4. Final grouping
Steps 2 and 3 lower the encoding cost.

Fixed number of groups k
[Figure: matrix with Node groups on both axes]
Reassign: for each node, reassign it to the group which minimizes the code cost.
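
The Reassign step can be sketched in Python as below. This is a brute-force illustration under stated assumptions, not the O(E) procedure from the talk: it recomputes the full code cost for every candidate move, applies one grouping to both rows and columns, and stops when a full pass moves no node. The names `reassign`, `code_cost` and `max_passes` are hypothetical.

```python
import math
from collections import defaultdict


def entropy(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def code_cost(matrix, groups):
    """Sum over blocks of (cells in block) * H(fraction of 1s in block)."""
    ones = defaultdict(int)
    cells = defaultdict(int)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            block = (groups[r], groups[c])
            cells[block] += 1
            ones[block] += value
    return sum(cells[b] * entropy(ones[b] / cells[b]) for b in cells)


def reassign(matrix, groups, k, max_passes=20):
    """Move each node to the group that gives the lowest code cost, for fixed k."""
    groups = list(groups)
    for _ in range(max_passes):
        moved = False
        for node in range(len(matrix)):
            current = groups[node]
            best_group, best_cost = current, code_cost(matrix, groups)
            for candidate in range(k):
                if candidate == current:
                    continue
                groups[node] = candidate  # tentative move
                cost = code_cost(matrix, groups)
                if cost < best_cost - 1e-12:
                    best_group, best_cost = candidate, cost
            groups[node] = best_group  # keep the best move, or revert
            if best_group != current:
                moved = True
        if not moved:
            break
    return groups
```

Because a move is accepted only when it strictly lowers the cost, the code cost never increases from one pass to the next.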

Choosing k
Split:
1. Find the group R with the maximum entropy per node
2. Choose the nodes in R whose removal reduces the entropy per node in R
3. Send these nodes to the new group, and set k=k+1
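
A Python sketch of the three Split steps above, under assumptions the slides leave open: the "entropy per node" of a group is taken to be the code cost of that group's rows (against every column group) divided by the group size, nodes are tested one after another with the baseline updated after each accepted move, and the group is never emptied. All function names are hypothetical.

```python
import math
from collections import defaultdict


def entropy(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def entropy_per_node(matrix, groups, g):
    """Code cost of the rows in group g, divided by the size of g (assumed definition)."""
    members = [n for n, grp in enumerate(groups) if grp == g]
    if not members:
        return 0.0
    ones = defaultdict(int)
    cells = defaultdict(int)
    for r in members:
        for c, value in enumerate(matrix[r]):
            cells[groups[c]] += 1
            ones[groups[c]] += value
    bits = sum(cells[j] * entropy(ones[j] / cells[j]) for j in cells)
    return bits / len(members)


def split(matrix, groups, k):
    """Split the group with maximum entropy per node; returns (groups, new k)."""
    groups = list(groups)
    # Step 1: the group R with the maximum entropy per node.
    worst = max(range(k), key=lambda g: entropy_per_node(matrix, groups, g))
    baseline = entropy_per_node(matrix, groups, worst)
    members = [n for n, grp in enumerate(groups) if grp == worst]
    moved = 0
    for node in members:
        if len(members) - moved <= 1:
            break  # never empty the group being split
        # Steps 2 and 3: send the node to new group k if that lowers R's entropy per node.
        groups[node] = k
        trial = entropy_per_node(matrix, groups, worst)
        if trial < baseline - 1e-12:
            baseline = trial
            moved += 1
        else:
            groups[node] = worst  # revert
    return (groups, k + 1) if moved else (groups, k)
```

Starting from a single group over a matrix of two 3-node cliques, this sketch peels the first clique off into a new group and returns k = 2. The result depends on the order in which nodes are visited.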

Algorithms
1. Start with initial matrix
2. Find good groups for fixed k (Reassign)
3. Choose better values for k (Splits)
4. Final grouping
Steps 2 and 3 lower the encoding cost.

Algorithms
Properties:
• Fully Automatic: the number of groups is found automatically
• Scalable: O(E) time
• Allow incremental updates: reassign the new node/edge to the group with least cost, and continue…

Outlier Edges
[Figure: Nodes × Nodes matrix rearranged into Node Groups × Node Groups, with the outlier edges marked]
Outliers are deviations from “normality”, which lower the quality of the compression.
Find edges whose removal maximally reduces cost.
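
One way to score edges by how much their removal reduces cost is sketched below in Python. It is an illustration under assumptions: only the code-cost term is used (the description cost is not specified on the slides), the grouping is held fixed, and each edge is scored as if it were removed on its own. The name `outlier_scores` is hypothetical.

```python
import math
from collections import defaultdict


def entropy(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def outlier_scores(matrix, groups):
    """Map each edge (r, c) to the drop in code cost obtained by deleting it."""
    ones = defaultdict(int)
    cells = defaultdict(int)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            block = (groups[r], groups[c])
            cells[block] += 1
            ones[block] += value
    scores = {}
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if not value:
                continue
            block = (groups[r], groups[c])
            n, n1 = cells[block], ones[block]
            before = n * entropy(n1 / n)
            after = n * entropy((n1 - 1) / n)  # same block with one fewer 1
            scores[(r, c)] = before - after
    return scores
```

Deleting an edge only changes the count of 1s in its own block, so every edge in the same block gets the same score. A lone edge in an otherwise empty block scores highest, which matches the intuition of a "bridge" between groups.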

Inter-cluster distances
[Figure: Nodes × Nodes matrix rearranged into Node Groups × Node Groups, with groups Grp1, Grp2 and Grp3]
Two groups are “close” if merging them does not increase cost by much.
distance(i,j) = relative increase in cost on merging i and j
[Figure: Grp1, Grp2 and Grp3 drawn as a triangle whose edges carry the pairwise distances 5.5, 4.5 and 5.1]
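
The distance definition can be sketched in Python as the relative increase in cost when group j is folded into group i. As an assumption, the code cost alone stands in for the total cost here, since the slides do not give the form of the description cost. The name `merge_distance` is hypothetical.

```python
import math
from collections import defaultdict


def entropy(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def code_cost(matrix, groups):
    """Sum over blocks of (cells in block) * H(fraction of 1s in block)."""
    ones = defaultdict(int)
    cells = defaultdict(int)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            block = (groups[r], groups[c])
            cells[block] += 1
            ones[block] += value
    return sum(cells[b] * entropy(ones[b] / cells[b]) for b in cells)


def merge_distance(matrix, groups, i, j):
    """Relative increase in code cost after merging group j into group i."""
    before = code_cost(matrix, groups)
    if before == 0.0:
        raise ValueError("relative increase is undefined when the current cost is zero")
    merged = [i if g == j else g for g in groups]
    return (code_cost(matrix, merged) - before) / before
```

With this definition, two groups whose nodes connect to the rest of the graph in nearly the same way get a small distance, because merging them leaves the blocks almost as homogeneous as before.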
Experiments
“Quasi block-diagonal” graph with noise=10%
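
For readers who want to try a similar input, the Python sketch below builds a block-diagonal binary matrix and perturbs it. How the talk defines "noise" is not stated, so flipping each cell independently with probability `noise` is purely an assumption, as are the function name and its parameters.

```python
import random


def quasi_block_diagonal(block_sizes, noise=0.10, seed=0):
    """Return (matrix, labels): a block-diagonal 0/1 matrix with randomly flipped cells."""
    rng = random.Random(seed)
    # labels[n] is the index of the diagonal block that node n belongs to.
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    matrix = []
    for r in range(n):
        row = []
        for c in range(n):
            cell = 1 if labels[r] == labels[c] else 0
            if rng.random() < noise:  # assumed noise model: independent cell flips
                cell = 1 - cell
            row.append(cell)
        matrix.append(row)
    return matrix, labels
```

The returned `labels` give the planted grouping, which can be compared with whatever grouping a partitioning method recovers.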

Experiments
DBLP dataset
[Figure: Authors × Authors matrix]
• 6,090 authors in:
  • SIGMOD
  • ICDE
  • VLDB
  • PODS
  • ICDT
• 175,494 “dots”, one “dot” per co-citation

Experiments
[Figure: Authors × Authors matrix rearranged into Author groups × Author groups; one group is labeled “Stonebraker, DeWitt, Carey”]
k=8 author groups found

Experiments
Inter-group distances
[Figure: Author groups × Author groups matrix, with inter-group distances shown for Grp1 to Grp8]

Experiments
Epinions dataset
• 75,888 users
• 508,960 “dots”, one “dot” per “trust” relationship
[Figure: User groups × User groups matrix]
k=19 groups found
Small dense “core”

Experiments
[Figure: running time (in seconds) versus number of “dots”]
Linear in the number of “dots”: scalable

Conclusions
Goals:
• Find groups
• Find outliers
• Compute inter-group “distances”
Properties:
• Fully Automatic
• Scalable
• Allow incremental updates