Graph Partitioning and Clustering


Page 1:

Graph Partitioning and Clustering

Page 2:

Similarity Graph

Represent the dataset as a weighted graph G(V,E).

Example dataset: $\{x_1, x_2, \ldots, x_6\}$

V = {x_i}: Set of n vertices representing data points.
E = {w_ij}: Set of weighted edges indicating pair-wise similarity between points.

[Figure: similarity graph on vertices 1 to 6 with edge weights 0.1, 0.2, 0.8, 0.7, 0.6, 0.8, 0.8, 0.8]
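One possible way to hold such a graph in memory is a symmetric adjacency map. The sketch below is illustrative only: the eight weights are the ones shown on the slide, but the pairing of each weight with a specific pair of vertices is an assumption, since the figure itself is not recoverable from the transcript. The names `EDGES` and `build_graph` are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge list: the weights come from the slide, but which pair of
# vertices each weight joins is an assumption made for illustration.
EDGES = {
    (1, 2): 0.8, (1, 3): 0.6, (2, 3): 0.8,
    (4, 5): 0.8, (4, 6): 0.8, (5, 6): 0.7,
    (1, 5): 0.1, (3, 4): 0.2,
}


def build_graph(edges):
    """Return a symmetric adjacency map: graph[i][j] == graph[j][i] == w_ij."""
    graph = defaultdict(dict)
    for (i, j), w in edges.items():
        graph[i][j] = w
        graph[j][i] = w
    return dict(graph)


graph = build_graph(EDGES)
vertices = sorted(graph)  # V = {x_1, ..., x_6}
print(vertices)           # [1, 2, 3, 4, 5, 6]
print(graph[3])           # neighbours of vertex 3 with their weights
```

Storing each weight in both directions reflects that similarity is symmetric: w_ij and w_ji are the same edge.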

Page 3:

Graph Partitioning

Clustering can be viewed as partitioning a similarity graph.

Bi-partitioning task: Divide the vertices into two disjoint groups (A,B).

[Figure: vertices 1 to 6 divided into two groups, A and B]

Relevant issues:
How can we define a “good” partition of the graph?
How can we efficiently identify such a partition?

Page 4:

Clustering Objectives

Traditional definition of a “good” clustering:
1. Points assigned to the same cluster should be highly similar.
2. Points assigned to different clusters should be highly dissimilar.

Apply these objectives to our graph representation:
1. Maximise the weight of within-group connections.
2. Minimise the weight of between-group connections.

[Figure: similarity graph on vertices 1 to 6 with edge weights 0.1, 0.2, 0.8, 0.7, 0.6, 0.8, 0.8, 0.8]
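The two graph objectives are complementary: for a fixed graph, every edge is either within a group or between the groups, so the within-group and between-group weights always sum to the total edge weight. The sketch below checks this on the six-vertex example. The edge placement and the split A = {1, 2, 3}, B = {4, 5, 6} are assumptions for illustration, and the helper names are hypothetical.

```python
import math

# Hypothetical edge placement for the slide's eight weights (an assumption).
EDGES = {
    (1, 2): 0.8, (1, 3): 0.6, (2, 3): 0.8,
    (4, 5): 0.8, (4, 6): 0.8, (5, 6): 0.7,
    (1, 5): 0.1, (3, 4): 0.2,
}
A, B = {1, 2, 3}, {4, 5, 6}  # assumed grouping


def within(group, edges):
    """Total weight of edges with both endpoints inside the group."""
    return sum(w for (i, j), w in edges.items() if i in group and j in group)


def between(a, b, edges):
    """Total weight of edges with one endpoint in a and the other in b."""
    return sum(
        w for (i, j), w in edges.items()
        if (i in a and j in b) or (i in b and j in a)
    )


total = sum(EDGES.values())
w_in = within(A, EDGES) + within(B, EDGES)
w_out = between(A, B, EDGES)

# Every edge is counted exactly once, so the two parts add up to the total.
assert math.isclose(w_in + w_out, total)
print(round(w_in, 2), round(w_out, 2))  # 4.5 0.3
```

Because the total is fixed, raising the within-group weight necessarily lowers the between-group weight, which is why a single quantity can describe both objectives.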

Page 5:

Graph Cuts

Express partitioning objectives as a function of the “edge cut” of the partition.

Cut: Set of edges with one endpoint in each group. Its weight is the sum of the weights of those edges:

$$\mathrm{cut}(A,B) = \sum_{i \in A,\, j \in B} w_{ij}$$

[Figure: similarity graph on vertices 1 to 6, divided into groups A and B]

cut(A,B) = 0.3
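The cut weight can be computed directly as a double sum over a symmetric weight matrix. In this minimal sketch the matrix `W` is hypothetical: it uses the slide's eight weights, but their placement and the grouping A = {1, 2, 3}, B = {4, 5, 6} are assumptions chosen so that the two crossing edges are the ones weighted 0.1 and 0.2.

```python
# Hypothetical symmetric weight matrix for vertices 1..6 (index 0 is vertex 1).
# The weights are from the slide; their placement is an assumption.
W = [
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.8],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.8, 0.7, 0.0],
]


def cut(a, b, w):
    """Sum of w[i][j] over all i in group a and j in group b."""
    return sum(w[i][j] for i in a for j in b)


A = [0, 1, 2]  # vertices 1, 2, 3 (assumed grouping)
B = [3, 4, 5]  # vertices 4, 5, 6
print(round(cut(A, B, W), 2))  # 0.3
```

Since `W` is symmetric, `cut(A, B, W)` and `cut(B, A, W)` give the same value.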

Page 6:

Graph Cut Criteria

Criterion: Minimum-cut
Minimise the weight of connections between groups.

$$\min \mathrm{cut}(A,B)$$

Problem:
Only considers external cluster connections.
Does not consider internal cluster density.

Degenerate case:
[Figure: a graph on which the minimum cut differs from the optimal cut]
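A small, invented example can illustrate the degenerate behaviour. The sketch below takes an assumed placement of the slide's weights on six vertices and adds a hypothetical seventh vertex attached to vertex 6 by a weak edge of weight 0.05. Vertex 7 and its weight are not from the slides. A brute-force search over all bipartitions then shows that the minimum cut simply isolates that one vertex.

```python
from itertools import combinations

# Assumed placement of the slide's weights, plus an invented outlier (vertex 7).
EDGES = {
    (1, 2): 0.8, (1, 3): 0.6, (2, 3): 0.8,
    (4, 5): 0.8, (4, 6): 0.8, (5, 6): 0.7,
    (1, 5): 0.1, (3, 4): 0.2,
    (6, 7): 0.05,  # hypothetical weak link, not from the slides
}
VERTICES = range(1, 8)


def cut(a, edges):
    """Weight of the edges with exactly one endpoint in group a."""
    return sum(w for (i, j), w in edges.items() if (i in a) != (j in a))


def min_cut_bipartition(vertices, edges):
    """Exhaustively search all bipartitions and return (weight, group A)."""
    vertices = list(vertices)
    anchor, rest = vertices[0], vertices[1:]
    best = None
    # Fixing the first vertex in A visits each bipartition exactly once;
    # k < len(rest) guarantees that group B is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            a = {anchor, *extra}
            score = cut(a, edges)
            if best is None or score < best[0]:
                best = (score, a)
    return best


score, a = min_cut_bipartition(VERTICES, EDGES)
b = set(VERTICES) - a
print(score, sorted(a), sorted(b))  # 0.05 [1, 2, 3, 4, 5, 6] [7]
```

The split between the two dense triangles costs 0.3, so the minimum-cut criterion prefers cutting off the single weakly attached vertex instead.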

Page 7:

Graph Cut Criteria (continued)

Criterion: Normalised-cut (Shi & Malik, ’97)
Consider the connectivity between groups relative to the density of each group.

Normalise the association between groups by volume.
vol(A): The total weight of the edges originating from group A.

$$\min \mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(B)}$$

Why use this criterion?
Minimising the normalised cut is equivalent to maximising normalised association.
Produces more balanced partitions.
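The equivalence can be checked numerically. The sketch below assumes the usual reading of vol(A) as the sum of the degrees of the vertices in A, so an edge inside A is counted once from each endpoint. Under that reading vol(A) = assoc(A,A) + cut(A,B), which gives Ncut(A,B) = 2 - Nassoc(A,B). The edge placement, the grouping, and the helper names are assumptions for illustration.

```python
import math

# Hypothetical edge placement for the slide's eight weights (an assumption).
EDGES = {
    (1, 2): 0.8, (1, 3): 0.6, (2, 3): 0.8,
    (4, 5): 0.8, (4, 6): 0.8, (5, 6): 0.7,
    (1, 5): 0.1, (3, 4): 0.2,
}
A, B = {1, 2, 3}, {4, 5, 6}  # assumed grouping


def cut(a, edges):
    """Weight of the edges with exactly one endpoint in group a."""
    return sum(w for (i, j), w in edges.items() if (i in a) != (j in a))


def vol(group, edges):
    """Sum of vertex degrees in the group (internal edges count twice)."""
    return sum(w * ((i in group) + (j in group)) for (i, j), w in edges.items())


def assoc(group, edges):
    """Internal weight, counted from both endpoints to match vol()."""
    return sum(2 * w for (i, j), w in edges.items() if i in group and j in group)


def ncut(a, b, edges):
    c = cut(a, edges)
    return c / vol(a, edges) + c / vol(b, edges)


def nassoc(a, b, edges):
    return assoc(a, edges) / vol(a, edges) + assoc(b, edges) / vol(b, edges)


print(round(vol(A, EDGES), 2), round(vol(B, EDGES), 2))  # 4.7 4.9
print(round(ncut(A, B, EDGES), 3))                       # 0.125

# Minimising Ncut is the same as maximising Nassoc: they differ by a constant.
assert math.isclose(ncut(A, B, EDGES), 2 - nassoc(A, B, EDGES))
```

Dividing by the volume is what penalises tiny groups: a group with little total edge weight has a small denominator, so its term in the sum becomes large.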

Page 8:

How do we efficiently identify a “good” partition?

Problem: Computing an optimal cut, i.e. one that minimises Ncut(A,B), is NP-hard.
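To make the cost concrete, the sketch below performs an exhaustive search for the bipartition with the smallest normalised cut. A graph with n vertices has 2^(n-1) - 1 bipartitions, so this approach is only workable for very small graphs such as the six-vertex example. The edge placement is an assumption, vol is taken as the sum of vertex degrees, and the helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical edge placement for the slide's eight weights (an assumption).
EDGES = {
    (1, 2): 0.8, (1, 3): 0.6, (2, 3): 0.8,
    (4, 5): 0.8, (4, 6): 0.8, (5, 6): 0.7,
    (1, 5): 0.1, (3, 4): 0.2,
}
VERTICES = [1, 2, 3, 4, 5, 6]


def ncut(a, vertices, edges):
    """Normalised cut of the bipartition (a, vertices - a)."""
    b = set(vertices) - a
    c = sum(w for (i, j), w in edges.items() if (i in a) != (j in a))
    vol_a = sum(w * ((i in a) + (j in a)) for (i, j), w in edges.items())
    vol_b = sum(w * ((i in b) + (j in b)) for (i, j), w in edges.items())
    return c / vol_a + c / vol_b


def bipartitions(vertices):
    """Yield group A for every split into two non-empty groups, once each."""
    anchor, rest = vertices[0], vertices[1:]
    for k in range(len(rest)):  # k < len(rest) keeps group B non-empty
        for extra in combinations(rest, k):
            yield {anchor, *extra}


candidates = list(bipartitions(VERTICES))
best = min(candidates, key=lambda a: ncut(a, VERTICES, EDGES))
print(len(candidates))  # 31, i.e. 2**(6 - 1) - 1
print(sorted(best))     # [1, 2, 3]
```

Each additional vertex roughly doubles the number of candidates, which is why exhaustive search does not scale.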