Structural Measures of Clustering
Quality on Graph Samples
Jianpeng Zhang, Yulong Pei
George Fletcher, Mykola Pechenizkiy
Motivation
New Structural Measures
Main Issues of Evaluation Metrics
Experiment and Analysis
Conclusion & Future Work
Outline
1. Motivation
• Most graphs in practice are large-scale and/or
streaming. They are too large and can not be
clustered unless we sample a representative
subgraph.
• Challenge: how to evaluate the clustering quality
in the samples of the graph?
3
a) Co-authorship network of
the Los Alamos Condensed
Matter archive[1]
b) Facebook100 graph
of UNC[2] 3
Key Question
Fig. 1. Problem setting. Let S be a sampled subgraph of a
graph G and π(G) be a valid ground-truth clustering. Given
clustering π(S) of S induced by process P, what is the quality
of π(S) with respect to π(G)?
4
Evaluationmetrics
Sampling
G
SΡ
π(G)
π(S)Clustering process
Are results
representative?
Seeking and defining ground-truth
2、 Main Issues of Evaluation Metrics
How to assess the clustering quality by using ground-truth
clusters (i.e., common external properties that the members of
given clusters share ) is an essential procedure.
Unsupervised quality metrics (e.g., cut-size) are used as metrics
of clustering quality, but we are not sure whether the quality
metric gives the expected answers compared with the ground-
truth clusters.
Little attention has been paid to evaluation measures for
clustering quality on samples of graphs.
(1) Ground-truth clusters
5
Several classic quality metrics were proposed in the literatures.
However, there is no consensus on their quality and how well
they perform on different kinds of graphs.
These metrics try to identify good clusters by quantifying the
value of metrics which may not be the most meaningful
interpretation for what a good cluster is.
(2) The validity of quality metrics
Proposed solution: defining two novel quality
metrics called σ-precision and σ-recall based on
ground-truth clusters.
2、 Main Issues of Evaluation Metrics
6
The goal: the clusters in sampling graph S are good
representations of the clusters of the original graph G.
• b is a cluster-set of nodes in π(G)
• bs is a cluster-set of nodes in π(S)
• δ is a predefined match threshold, e.g., 90%.
1) Firstly, we proposed a basic metric based on the
set-matching, i.e., δ-coverage.
3、 New Structural Measures
7
The coverage of the clustering π(G) is given as the number of
clusters in π(G) which are represented by clusters in π(S).
Higher values of coverage mean the clusters in π(S) are more
consistent with and reflective of the ground-truth clusters in π(G).
6
5
7
3 2
1
0
8 6
5
4
7
3 2
1
0
8
G S
{1}, {0,2,3} ∈ {0,1,2,3}
{5,6,7,8} ∈ {4,5,6,7,8} δ-coverage =2
An illustration example
(δ=1)
8
Assume A and i belong to one partition
block (in red color) while B and j belong
to another block (in blue color).
After sampling and clustering on the
sampled graph, we have two clusters:
(1) A’ and j; (2) B’ and i, where A’∈A,
B’∈B.
Intuitively, the cluster blocks in the
sampled graph can represent the
original graph well
A good example why
we need parameter δ?
The main effect of δ-coverage is that the
measure is more lenient.
Based on δ-coverage, we design our new
metrics.
3、 New Structural Measures
9
A B
i
j
ij
A` B`
G
S
2) δ-precision and δ-recall
Higher values of δ-precision mean that the clusters in
S are more precisely representative of the clusters in G.
Higher values of δ-recall indicate that clusters in S
more successfully cover clusters in G.
3、 New Structural Measures
10
6
5
7
3 2
1
0
8 6
5
4
7
3 2
1
0
8
G S
{1}, {0,2,3} ∈ {0,1,2,3}
{5,6,7,8} ∈ {4,5,6,7,8}
δ-coverage =2 δ-precision =2/3
δ-recall =2/2
δ=1 Infer whether results
are representative?
11
An illustration example
• Case study: streaming graph(a sequence of consecutively
arriving nodes/edges)
• Clustering algorithm: Structural-sampler[3], EAC[4] and METIS [5].
a) Quality test: we compare our new metrics compared with
the classic metrics ,i.e., cut-size and NMI. We run each
algorithm ten times, and then compare the average value of
those metrics.
b) Tuning test: The task is to evaluate the impact of the
random sampling threshold p on the clustering quality in
sampled graph S. We sample and set all edges with a
random value as the same manner as Structural-sampler.
4、Experiments and Analysis
12
Figure shows that when the degree ratio μ changes from 0 to 0.5,
the value of the new metrics follows a downward trend.
The δ-precision and δ-recall metrics are more insightful
expression of clusters structure than the supervised metric NMI.
The unsupervised metric cut-size just estimates the edge-cut
value between the clusters.
4、Experiments and Analysis
a) Test quality on the benchmark datasets
13
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-pre
cis
ion(
=0.9
)
METIS
EAC
Structural-sampler
μ
δ-precision(δ=0.9)
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-rec
all(
=0.9
)
METIS
EAC
Structural-sampler
δ-recall(δ=0.9)
μ0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
NM
I
METIS
EAC
Structural-sampler
NMI
μ0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
0
100
200
300
400
500
600
700
800
900
1000
Cut
-siz
e
METIS
EAC
Structural-sampler
Cut-size
μ
b) Tuning test on the benchmark graphs which have
relatively distinct cluster structure (μ = 0.2)
For the online algorithms, the moderate sampling on the graph
with distinct cluster structure makes the cluster structure more
clear and obtain higher value of the δ-precision and δ-recall.
4、Experiments and Analysis
The cut-size decreases gradually while NMI metric just has a
slight increase when the sampling threshold p becomes smaller.
They can not capture the structure change appropriately. 14
δ-precision(δ=0.9)
0.4 0.5 0.6 0.7 0.8 0.9 1
Sample threshold p
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
METIS
EAC
Structural-sampler
δ-p
reci
son
(δ=0
.9)
δ-recall(δ=0.9)
0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
METIS
EAC
Structural-sampler
δ-r
eca
ll(δ
=0.9
)
Sample threshold p
NMI
0.4 0.5 0.6 0.7 0.8 0.9 10
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
METIS
EAC
Structural-sampler
NM
I
Sample threshold p
Cut-size
0.4 0.5 0.6 0.7 0.8 0.9 10
100
200
300
400
500
600
700
800
900
METIS
EAC
Structural-sampler
Cu
t-si
ze
Sample threshold p
We proposed two new structural measures, and they are
effectively reflect the match quality of the clusters in the
sampled graph with respect to the ground-truth clusters
in the original graph.
The experimental results indicate that classic metrics do
not share a common view of what a true clustering should
look like.
Our new metrics have a more insightful results and that
could be helpful when used as a complementary standard
measures.
Conclusion & Future Work
15
Conclusion & Future Work
Our future work
We want to generalize our evaluation framework
and give concrete advise on different sampling
strategies and clustering approaches,
We also plan to extend our study of the fidelity of
the evaluation measures.
16
Dimension
reduciton
Classification Clustering
Big data
analytics
Streaming
data
Deep learning Manfold
learning
Predictive
analytics
Change
detection
Object
Recognition
Thanks for your attention!!!
Jianpeng Zhang
Top Related