Unsupervised Learning Clustering Algorithms - IT -...

42
1 1 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana Fred Outline Part 1: Basic Concepts of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties and open problems Part 2: Clustering Algorithms Hierarchical methods : Single-link : Complete-link : Clustering Based on Dissimilarity Increments Criteria From Single Clustering to Ensemble Methods - April 2009 2 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana Fred Hierarchical Clustering Use proximity matrix: nxn : D(i,j): proximity (similarity or distance) between patterns i and j Step 0 Step 1 Step 2 Step 3 Step 4 b d c e a a b d e c d e a b c d e Step 4 Step 3 Step 2 Step 1 Step 0 agglomerative divisive From Single Clustering to Ensemble Methods - April 2009

Transcript of Unsupervised Learning Clustering Algorithms - IT -...

Page 1: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

1

1

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Outline

Part 1: Basic Concepts of data clustering

Non-Supervised Learning and Clustering

: Problem formulation – cluster analysis

: Taxonomies of Clustering Techniques

: Data types and Proximity Measures

: Difficulties and open problems

Part 2: Clustering Algorithms

Hierarchical methods

: Single-link

: Complete-link

: Clustering Based on Dissimilarity Increments Criteria

From Single Clustering to Ensemble Methods - April 2009

2

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Hierarchical Clustering

Use proximity matrix: nxn

: D(i,j): proximity (similarity or distance) between patterns i and j

Step 0 Step 1 Step 2 Step 3 Step 4

b

d

c

e

aa b

d e

c d e

a b c d e

Step 4 Step 3 Step 2 Step 1 Step 0

agglomerative

divisive

From Single Clustering to Ensemble Methods - April 2009

Page 2: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

2

3

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Hierarchical Clustering: Agglomerative Methods

1. Start with n clusters containing one object

2. Find the most similar pair of clusters Ci e Cj from the proximity

matrix and merge them into a single cluster

3. Update the proximity matrix (reduce its order by one, by replacing

the individual clusters with the merged cluster)

4. Repeat steps (2) e (3) until a single cluster is obtained (i.e. N-1

times)

From Single Clustering to Ensemble Methods - April 2009

4

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Hierarchical Clustering: Agglomerative Methods

Similarity measures between clusters:

:Well known similarity measures can be written using the Lance-Williams

formula, expressing the distance between cluster k and cluster i+j,

obtained by the merging of clusters i and j:

Single-link

Complete-link

Centroid

Median

(Average link)

Ward’s Method

(minimum variance)

),(),,(min, -0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),,(max, 0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),( 0

2 kji

ji

ji

ji

j

j

ji

ii dkjidc

nn

nnb

nn

na

nn

na

baa ji 0c ; 25.0 ; 5.0

ji CbCaji

ji

ji

j

j

ji

ii bad

nnCCdcb

nn

na

nn

na

,

),(1

),( 0

0

c

nnn

nb

nnn

nna

nnn

nna

jik

k

jik

jk

j

jik

iki

),(),(),(),(),(),( kjdkidcjibdkjdakidakjid ji

From Single Clustering to Ensemble Methods - April 2009

Page 3: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

3

5

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Hierarchical Clustering: Agglomerative Methods

Single-link

Complete-link

Centroid

Median

(Average link)

Ward’s Method

(minimum variance)

),(),,(min, -0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),,(max, 0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),( 0

2 kji

ji

ji

ji

j

j

ji

ii dkjidc

nn

nnb

nn

na

nn

na

baa ji 0c ; 25.0 ; 5.0

ji CbCaji

ji

ji

j

j

ji

ii bad

nnCCdcb

nn

na

nn

na

,

),(1

),( 0

0

c

nnn

nb

nnn

nna

nnn

nna

jik

k

jik

jk

j

jik

iki

Single Link: Distance between two clusters

is the distance between the closest points.

Also called “neighbor joining.”

From Single Clustering to Ensemble Methods - April 2009

6

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Hierarchical Clustering: Agglomerative Methods

Single-link

Complete-link

Centroide

Median

(Average link)

Ward’s Method

(minimum variance)

),(),,(min, -0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),,(max, 0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),( 0

2 kji

ji

ji

ji

j

j

ji

ii dkjidc

nn

nnb

nn

na

nn

na

baa ji 0c ; 25.0 ; 5.0

ji CbCaji

ji

ji

j

j

ji

ii bad

nnCCdcb

nn

na

nn

na

,

),(1

),( 0

0

c

nnn

nb

nnn

nna

nnn

nna

jik

k

jik

jk

j

jik

iki

Complete Link: Distance between clusters

is distance between farthest pair of points.

From Single Clustering to Ensemble Methods - April 2009

Page 4: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

4

7

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Hierarchical Clustering: Agglomerative Methods

Single-link

Complete-link

Centroide

Median

(Average link)

Ward’s Method

(minimum variance)

),(),,(min, -0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),,(max, 0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),( 0

2 kji

ji

ji

ji

j

j

ji

ii dkjidc

nn

nnb

nn

na

nn

na

baa ji 0c ; 25.0 ; 5.0

ji CbCaji

ji

ji

j

j

ji

ii bad

nnCCdcb

nn

na

nn

na

,

),(1

),( 0

0

c

nnn

nb

nnn

nna

nnn

nna

jik

k

jik

jk

j

jik

iki

Centroid: Distance between clusters is

distance between centroids.

From Single Clustering to Ensemble Methods - April 2009

8

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Hierarchical Clustering: Agglomerative Methods

Single-link

Complete-link

Centroid

Median

(Average link)

Ward’s Method

(minimum variance)

),(),,(min, -0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),,(max, 0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),( 0

2 kji

ji

ji

ji

j

j

ji

ii dkjidc

nn

nnb

nn

na

nn

na

baa ji 0c ; 25.0 ; 5.0

ji CbCaji

ji

ji

j

j

ji

ii bad

nnCCdcb

nn

na

nn

na

,

),(1

),( 0

0

c

nnn

nb

nnn

nna

nnn

nna

jik

k

jik

jk

j

jik

iki

Average Link: Distance between clusters is

average distance between the cluster points.

From Single Clustering to Ensemble Methods - April 2009

Page 5: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

5

9

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Ward’s Link: Minimizes the sum-of-squares

criterion (measure of heterogenity)

Hierarchical Clustering: Agglomerative Methods

Single-link

Complete-link

Centroid

Median

(Average link)

Ward’s Method

(minimum variance)

),(),,(min, -0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),,(max, 0.5c ; 0 ; 5.0 kjdkidkjid baa ji

),(),( 0

2 kji

ji

ji

ji

j

j

ji

ii dkjidc

nn

nnb

nn

na

nn

na

baa ji 0c ; 25.0 ; 5.0

ji CbCaji

ji

ji

j

j

ji

ii bad

nnCCdcb

nn

na

nn

na

,

),(1

),( 0

0

c

nnn

nb

nnn

nna

nnn

nna

jik

k

jik

jk

j

jik

iki

From Single Clustering to Ensemble Methods - April 2009

10

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Single Linkage: ),(min,

,badCCd

ji CbCaji

x y

1 4 4

2 8 4

3 15 8

4 24 4

5 24 121 2 3 4 5

1 - 4 11.7 20 21.5

2 4 - 8.1 16 17.9

3 11.7 8.1 - 9.8 9.8

4 20 16 9.8 - 8

5 21.5 17.9 9.8 8 -

1 2

35

4

1 2

35

4

1,2 3 4 5

1,2 - 8.1 16 17.9

3 8.1 - 9.8 9.8

4 16 9.8 - 8

5 17.9 9.8 8 -

1 2

35

4

From Single Clustering to Ensemble Methods - April 2009

Page 6: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

6

11

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Single Linkage:

1,2 3 4,5

1,2 - 8.1 16

3 8.1 - 9.8

4,5 16 9.8 -

1 2

35

4

1,2,3 4,5

1,2,3 - 9.8

4,5 9.8 -

1 2

35

4

Dendrogram

),(min,,

badCCdji CbCa

ji

From Single Clustering to Ensemble Methods - April 2009

12

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Complete-Link: ),(max,,

badCCdji CbCa

ji

1 2 3 4 5

1 - 4 11.7 20 21.5

2 4 - 8.1 16 17.9

3 11.7 8.1 - 9.8 9.8

4 20 16 9.8 - 8

5 21.5 17.9 9.8 8 -

1 2

35

4

1 2

35

4

1,2 3 4 5

1,2 - 11.7 20 21.5

3 11.7 - 9.8 9.8

4 20 9.8 - 8

5 21.5 9.8 8 -

1 2

35

4

1,2 3 4,5

1,2 - 11.7 21.5

3 11.7 - 9.8

4,5 21.5 9.8 -

From Single Clustering to Ensemble Methods - April 2009

Page 7: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

7

13

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Complete-Link: ),(max,,

badCCdji CbCa

ji

1 2

35

4

1,2,3 4,5

1,2,3 - 21.5

4,5 21.5 -

Dendrogram

From Single Clustering to Ensemble Methods - April 2009

14

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

1 2

35

4

Single-link

1 2

35

4

Complete-link

Single-link and Complete-Link

: A clustering of the data objects is obtained by cutting the dendrogram at

the desired level, then each connected component forms a cluster.

Favours connectedness Favours compactness

From Single Clustering to Ensemble Methods - April 2009

Page 8: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

8

15

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Single-link and Complete-Link

SL algorithm:

: Favors connectedness

Equivalent to building a MST

And cutting at weak links

2 3 4 5 6 7 8 9 101

2

3

4

5

6

7

8

9

10

2 3 4 5 6 7 8 9 101

2

3

4

5

6

7

8

9

10

2 3 4 5 6 7 8 9 101

2

3

4

5

6

7

8

9

10

2 3 4 5 6 7 8 9 101

2

3

4

5

6

7

8

9

10

From Single Clustering to Ensemble Methods - April 2009

16

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Single-link and Complete-Link

SL algorithm:

: Favors connectedness

: Detects arbitrary-shaped clusters with even densities

: Cannot handle distinct density clusters

. Single-link method, th=0.49

From Single Clustering to Ensemble Methods - April 2009

Page 9: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

9

17

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

-5 0 5 10-4

-2

0

2

4

-5 0 5 10-4

-2

0

2

4

Single-link and Complete-Link

SL algorithm:

: Favors connectedness

: Detects arbitrary-shaped clusters with even densities

: Cannot handle distinct density clusters

: Is sensitive to in-between patterns

-5 0 5 10-4

-2

0

2

4

From Single Clustering to Ensemble Methods - April 2009

18

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Single-link and Complete-Link

SL algorithm:

: Favors connectedness

: Detects arbitrary-shaped clusters with even densities

: Cannot handle distinct density clusters

: Is sensitive to in-between patterns

: Needs criteria to set the final number of clusters

CL algorithm:

: Favors compactness

-5 0 5 10-4

-2

0

2

4

-5 0 5 10-4

-2

0

2

4

From Single Clustering to Ensemble Methods - April 2009

Page 10: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

10

19

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Single-link and Complete-Link

SL algorithm:

: Favors connectedness

: Detects arbitrary-shaped clusters with even densities

: Cannot handle distinct density clusters

: Is sensitive to in-between patterns

: Needs criteria to set the final number of clusters

CL algorithm:

: Favors compactness

: Imposes spherical-shaped clusters on data

-2 -1 0 1 2-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

-2 -1 0 1 2-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

From Single Clustering to Ensemble Methods - April 2009

20

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Single-link and Complete-Link

SL algorithm:

: Favors connectedness

: Detects arbitrary-shaped clusters with even densities

: Cannot handle distinct density clusters

: Is sensitive to in-between patterns

: Needs criteria to set the final number of clusters

CL algorithm:

: Favors compactness

: Imposes spherical-shaped clusters on data

: Needs criteria to set the final number of clusters

From Single Clustering to Ensemble Methods - April 2009

Page 11: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

11

21

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Hierarchical Clustering

Weakness

: do not scale well: time complexity of O(n2), where n is the number

of total objects

: can never undo what was done previously

Integration of hierarchical with distance-based clustering

: BIRCH (Zhang, Ramakrishnan, Livny, 1996): uses a

ClusteringFeature-tree and incrementally adjusts the quality of sub-

clusters

: CURE (Guha, Rastogi & Shim, 1998): selects well-scattered points

from the cluster and then shrinks them towards the center of the

cluster by a specified fraction

: CHAMELEON (G. Karypis, E.H. Han and V. Kumar, 1999):

hierarchical clustering using dynamic modeling

1. Use a graph partitioning algorithm: cluster objects into a large

number of relatively small sub-clusters

2. Use an agglomerative hierarchical clustering algorithm: find the

genuine clusters by repeatedly combining these sub-clusters

From Single Clustering to Ensemble Methods - April 2009

22

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Smoothness Hypothesis:

: A cluster is a set of patterns sharing important characteristics in a

given context

: A dissimilarity measure encapsulates the notion of pattern

resemblance

: Higher resemblance patterns are more likely to belong to the same

cluster and should be associated first

: Dissimilarity between neighboring patterns within a cluster should

not occur with abrupt changes

: The merging of well separated clusters results in abrupt changes

in dissimilarity values

A Fred, J Leitão, A New Cluster Isolation Criterion Based on Dissimilarity Increments, IEEE PAMI, 2003.

From Single Clustering to Ensemble Methods - April 2009

Page 12: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

12

23

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Dissimilarity Increments:

Dissimilarity increment:

, , - nearest neighbors

: argmin , ,

: argmin , ,

i j k

j l il

k l jl

x x x

x j d x x l i

x k d x x l i

, , , ,inc i j k i j j kd x x x d x x d x x

ixjx

kx

,i jd x x

,i jd x x

incd

From Single Clustering to Ensemble Methods - April 2009

24

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Distribution of Dissimilarity Increments:

: Uniformly distributed data

From Single Clustering to Ensemble Methods - April 2009

Page 13: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

13

25

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Distribution of Dissimilarity Increments:

: 2D Gaussian data

From Single Clustering to Ensemble Methods - April 2009

26

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Distribution of Dissimilarity Increments:

: Ring-shaped data

From Single Clustering to Ensemble Methods - April 2009

Page 14: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

14

27

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Distribution of Dissimilarity Increments:

: Directional expanding data

From Single Clustering to Ensemble Methods - April 2009

28

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Distribution of Dissimilarity Increments:

: Exponential distribution: ( ) expx

p x

From Single Clustering to Ensemble Methods - April 2009

Page 15: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

15

29

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Exponential distribution:

: Higher density patterns -> higher

: Well separated clusters -> dinc on the tail of p(x)

From Single Clustering to Ensemble Methods - April 2009

30

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Gap between clusters: ,i i j t igap d C C d C

12 3

45

6

7

8

9

10

11

12

13

14

15

16

17

18d

t(C

i) d(C

i,C

j)

dt(C

j)

gapj

dt(C

j)

dt(C

i)gap

i

Ci

Cj

From Single Clustering to Ensemble Methods - April 2009

Page 16: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

16

31

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Gap between clusters: ,i i j t igap d C C d C

Ci

Cj

gapi

gapj

dt(C

i )d

t(C

j ) d(C

i ,C

j )

From Single Clustering to Ensemble Methods - April 2009

32

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Cluster Isolation criterion:

Let Ci, Ck be two clusters which are candidates for merging, and let i , k

be the respective mean values of the dissimilarity increments in each

cluster. Compute the increments for each cluster, gapi and gapk. If

gapi i (gapk k), isolate cluster Ci (Ck) and proceed the

clustering strategy with the remaining patterns. If neither cluster exceeds

the gap limit, merge them.

From Single Clustering to Ensemble Methods - April 2009

Page 17: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

17

33

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Setting the Isolation Criterion Parameter :

: Result: the crossings of the tangential line, at points which are multiple of

the distribution mean value, /, with the x axis, is given by (+1)/

: 3,5 cover the significant part of the distribution

exponential distribution, =.05

tangential lines at points multiple of

=.05

From Single Clustering to Ensemble Methods - April 2009

34

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Hierarchical Clustering Algorithm:

: A statistic of the dissimilarity increments within a cluster is maintained and

updated during cluster merging

: Clusters are obtained by comparing dissimilarity increments with a dynamic

threshold, i , based on cluster statistics

From Single Clustering to Ensemble Methods - April 2009

Page 18: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

18

35

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Results:

:Ring-Shaped Clusters

From Single Clustering to Ensemble Methods - April 2009

36

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Results:

:Ring-Shaped Clusters

. Single-link method, th=0.49

From Single Clustering to Ensemble Methods - April 2009

Page 19: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

19

37

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Results:

:Ring-Shaped Clusters

. Dissimilarity Increments-base method:

d1

d2

d1

d2d2

g1

g2

3/

3/1

3/3

From Single Clustering to Ensemble Methods - April 2009

38

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Results:

:Ring-Shaped Clusters

. Dissimilarity Increments-base method:

d1

d2d2

g1

g2

3/

3/1

3/3

From Single Clustering to Ensemble Methods - April 2009

Page 20: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

20

39

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Results:

:2-D Patterns with Complex Structure

Dissimilarity Increments-base method Single-link

2 < < 9

Single Link

th=1.1

From Single Clustering to Ensemble Methods - April 2009

40

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering of Contour Images

: The data set is composed by 634 contour images of 15 types of hardware tools: t1 to t15.

: When counting each pose as a distinct sub-class in the object type, we obtain a total of 24 classes.

From Single Clustering to Ensemble Methods - April 2009

Page 21: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

21

41

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering of Contour Images

Contour extraction

: the object boundary is sampled at 50 equally spaced points

: the angle between consecutive segments is quantized in 8 levels.

Contour

Extraction

String Contour

Description

Based on a

thresholding

method

8 directional

differential

chain code

From Single Clustering to Ensemble Methods - April 2009

42

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering of Contour Images

From Single Clustering to Ensemble Methods - April 2009

Page 22: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

22

43

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering of Contour Images

t1

t2

t3

t4

t5

t1

t2

t3

t4

t5

Single-link Dissimilarity Increments-based method

(string-edit distance)

From Single Clustering to Ensemble Methods - April 2009

44

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Clustering Based on Dissimilarity Increments Criteria

Description:

: Hierarchical agglomerative algorithm adopting a cluster isolation

criterion based on dissimilarity increments

Strength

: Method is not conditioned by a particular dissimilarity measure

(examples used Euclidean and string-edit distances)

: Ability to identify arbitrary shaped and sized clusters

: The number of clusters is intrinsically found

Weakness

: Sensitive to in-between points connecting touching clusters

From Single Clustering to Ensemble Methods - April 2009

Page 23: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

23

45

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Outline

Partitional Methods

: K-Means

: Spectral Clustering

: EM-based Gaussian Mixture Decomposition

Part 3.: Validation of clustering solutions

Cluster Validity Measures

Part 4.: Ensemble Methods

Evidence Accumulation Clustering

From Single Clustering to Ensemble Methods - April 2009

46

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Partitional Methods

K-Means: Minimizes the cost function:

: Algorithm:

. Input: k, the number of clusters; data set

1. Randomly select k seed points from the data set, and take them as initial centroids

2. Partition the data into k clusters by assigning each object to the cluster with the nearest centroid.

3. Compute centroids of the clusters of the current partition. The centroid is the center (mean point) of the cluster.

4. Go back to step 2 or stop when no more new assignment.

From Single Clustering to Ensemble Methods - April 2009

Page 24: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

24

47

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Partitional Methods: K-Means

Favors compactness

-5 0 5 10-4

-2

0

2

4

-5 0 5 10-4

-2

0

2

4

From Single Clustering to Ensemble Methods - April 2009

48

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Partitional Methods: K-Means

Favors compactness

Strength

: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)

: Scalability

: Often terminates at a local optimum.

Weakness

: Imposes spherical-shaped clusters

K-means clustering of uniform data (k=4)

From Single Clustering to Ensemble Methods - April 2009

Page 25: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

25

49

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Partitional Methods: K-Means

Favors compactness

Strength

: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)

: Scalability

: Often terminates at a local optimum.

Weakness

: Imposes spherical-shaped clusters

: Is sensitive to the number of objects in clusters0 2 4 6 8 100

2

4

6

8

10

0 2 4 6 8 100

2

4

6

8

10

From Single Clustering to Ensemble Methods - April 2009

50

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Partitional Methods: K-Means

Favors compactness

Strength

: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)

: Scalability

: Often terminates at a local optimum.

Weakness

: Imposes spherical-shaped clusters

: Is sensitive to the number of objects in clusters

: Dependence on initialization

From Single Clustering to Ensemble Methods - April 2009

Page 26: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

26

51

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Favors compactness

Strength

: Fast

: Scalability

: Robust to noise

Weakness

: Imposes spherical-shaped clusters

: Is sensitive to the number of objects in clusters

: Dependence on initialization

0 2 4 6 8 100

2

4

6

8

10

0 2 4 6 8 100

2

4

6

8

10

0 2 4 6 8 100

2

4

6

8

10

From Single Clustering to Ensemble Methods - April 2009

52

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Partitional Methods: K-Means

Favors compactness

Strength

: Fast algorithm (O(tkn) – t is the number of iterations; normally, k, t << n.)

: Scalability

: Often terminates at a local optimum.

Weakness

: Imposes spherical-shaped clusters

: Is sensitive to the number of objects in clusters

: Dependence on initialization

: Needs criteria to set the final number of clusters

: Applicable only when mean is defined (what about categorical

data?)

From Single Clustering to Ensemble Methods - April 2009

Page 27: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

27

53

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Variations of K-Means Method

K-Means(MacQueen’67): each cluster is represented by the center

of the cluster

A few variants of the k-means which differ in

: Selection of the initial k means

: Dissimilarity calculations (Mahalanobis distance -> elliptic clusters)

: Strategies to calculate cluster means

. Medoid - each cluster is represented by one of the objects in the cluster

: Fuzzy version: Fuzzy K-Means

Handling categorical data: k-modes (Huang’98)

: Replacing means of clusters with modes

: Using new dissimilarity measures to deal with categorical objects

: Using a frequency-based method to update modes of clusters

From Single Clustering to Ensemble Methods - April 2009

54

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering

For a given data set, X, Spectral Clustering finds a set of data clusters on the basis of spectral analysis of a similarity graph

The clustering problem is defined in terms of a complete graph, G, with vertices V={1, …, N}, corresponding to the data points in the data set, and each edge between two vertices is weighted by the similarity between them.

The weight matrix is also called the affinity matrix or the similarity matrix.

2

2

2

ji xx

ij eA

Gaussian Kernel

From Single Clustering to Ensemble Methods - April 2009

Page 28: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

28

55

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering

Cutting edges of G we obtain disjoint subgraphs of G as the clusters of X

The goal of clustering is to organize the dataset into disjoint subsets with high intra-cluster similarity and low inter-cluster similarity

The resulting clusters should be as compact and isolated as possible

From Single Clustering to Ensemble Methods - April 2009

56

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering

The graph partitioning for data clustering can be interpreted as aminimization problem of an objective function, in which the compactnessand isolation are quantified by the subset sums of edge weights.

Common objective functions

Ratio cut (Rcut)

Normalised cut (Ncut)

Min-max cut (Mcut)

• cut(A, B) is the sum of the edge weights between ∀p ∈ A and ∀p ∈ B

• P\Cl is the complement of Cl ⊂ X,

• card Cl denotes the number of points in Cl

k

l ll

llk

k

l l

llk

k

l l

llk

CC

CXCCC

XC

CXCCC

C

CXCCC

1

1

1

1

1

1

),cut(

)\,cut(:),,Mcut(

),cut(

)\,cut(:),,Ncut(

card

)\,cut(:),,Rcut(

From Single Clustering to Ensemble Methods - April 2009

Page 29: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

29

57

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering

The solution of the minimization problem of any of the previous objectivefunctions is obtained from the matrix of the first k eigenvectors of a matrixderived from the affinity matrix (Laplacian matrix)

The eigenvectors for Ncut and Mcut are identical, and obtained from thesymmetrical Laplacian

Another common choice is

Distinct algorithms differ on the way of producing and using the eigenvectors and how to derive clusters from them.: Some use each eigenvector one at a time

: Other, use top k eigenvectors simultaneously

Closely related with spectral graph partitioning, in which the second eigenvector of a graph’s Laplacian is used to define a semi-optimal cut; the second eigenvector solves a relaxation of an NP-hard discrete graph partitioning problem, giving an approximation to the optimal cut.

2/12/1: ADDLL sym

• D is a diagonal matrix whose ii-th entry is the sum of the i-th row of A

ADDLL rw 1:

From Single Clustering to Ensemble Methods - April 2009

58

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering (Ng et al, 2001)

Maps the feature space into a new space, Y, based on the

eigenvectors of a matrix derived from an affinity matrix associated

with the data set.

The data partition is obtained by applying the K-means algorithm

on the new space.

(NG. and Al. 2001)

2

2

2

ji xx

ij eA

Original feature

space

Eigen vector

Feature space

A. Y. Ng and M. I. Jordan and Y. Weiss, On Spectral Clustering: Analysis and an algorithm, NIPS 2001

From Single Clustering to Ensemble Methods - April 2009

Page 30: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

30

59

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering (Ng et al, 2001)

Algorithm:

From Single Clustering to Ensemble Methods - April 2009

60

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering (Ng et al, 2001)

From Single Clustering to Ensemble Methods - April 2009

Page 31: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

31

61

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering (Ng et al, 2001)

Results strongly depend on parameter values: k e σ

K=2, σ=0.1 K=2, σ=0.4

From Single Clustering to Ensemble Methods - April 2009

62

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering (Ng et al, 2001)

Results strongly depend on parameter values: k e σ

K=2, σ=0.1 K=2, σ=0.4

From Single Clustering to Ensemble Methods - April 2009

Page 32: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

32

63

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering (Ng et al, 2001)

Results strongly depend on parameter values: k e σ

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

-1.5

-1

-0.5

0

0.5

1

1.5

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

-1.5

-1

-0.5

0

0.5

1

1.5

K=3, σ=0.1 K=3, σ=0.45

From Single Clustering to Ensemble Methods - April 2009

64

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering (Ng et al, 2001)

Results strongly depend on parameter values: k e σ

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

-1.5

-1

-0.5

0

0.5

1

1.5

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2

-1.5

-1

-0.5

0

0.5

1

1.5

K=3, σ=0.1 K=3, σ=0.45

From Single Clustering to Ensemble Methods - April 2009

Page 33: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

33

65

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering

Selection of parameter values:

MSE:

Eigengap

Rcut

k

i

n

j

i

j

iY

i

myN

MSE1 1

21

1

21)(

A

i j ij

K

k

K

kll Sj Sj jk

KA

ARcut k l1 ,1

From Single Clustering to Ensemble Methods - April 2009

66

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Selection of Parameters: Global Results on selecting σ and K

None of the studied methods is suitable for the automatic selection of the spectral clustering parameters

A majority voting decision did not significantly improve the results

Percentage of correct classification:

σ

From Single Clustering to Ensemble Methods - April 2009

Page 34: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

34

67

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Spectral Clustering

Strength

: Detects arbitrary-shaped clusters.

: By using an adequate similarity measure between patterns, can be

applied to all types of data

Weakness

: Computationally heavy

: Needs criteria to set the final number of clusters and scaling factor

From Single Clustering to Ensemble Methods - April 2009

68

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Model-Based Clustering: Finite Mixtures

k random sources, with probability density functions fi(x), i=1,…,k

f1(x)

f2(x)

fi(x)

fk(x)

Choose at random

X random variable

f (x|source i) = fi (x)Conditional:

f (x and source i) = fi (x) iJoint:

f(x) = Unconditional: f (x and source i) sourcesall

k

1i

ii )x(f

From Single Clustering to Ensemble Methods - April 2009

Page 35: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

35

69

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Model-Based Clustering: Finite Mixtures

each component models one cluster

clustering = mixture fitting

f1(x)

f2(x)

f3(x)

k

i i

i=1

f(x|Θ) = α f(x|θ )

From Single Clustering to Ensemble Methods - April 2009

70

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Gaussian Mixture Decomposition

Mixture Model:

:.

.

k

i i

i=1

f(x|Θ) = α f(x|θ )Component densities

Mixing probabilities:

k

1i

i 1iα 0 and

f ( | )ix Gaussian

Arbitrary covariances: ),|x(N)|x(f iii Cμ

},...,,,...,,,,...,,{ 1k21k21k21 CCCμμμ

Common covariance: ),|x(N)|x(f ii Cμ

},...,,,,...,,{ 1k21k21 Cμμμ

From Single Clustering to Ensemble Methods - April 2009

Page 36: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

36

71

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Mixture Model Fitting

n independent observations

Mixture density model:

Estimate that maximizes (log)likelihood (ML estimate of ):

k

i i

i=1

f(x|Θ) = α f(x|θ )

}x,...,x,x{ )n()2()1(x

),x(Lmaxargˆ

n

1j

k

1i

i

)j(

i )|(flog x

mixture

n

1j

)j()|x(flog),x(L

From Single Clustering to Ensemble Methods - April 2009

72

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Gaussian Mixture Model Fitting

Problem: the likelihood function is unbounded as

. There is no global maximum

. Unusual goal: a “good” local maximum

Example: a 2-component Gaussian mixture

some data points

0)det( i C

2

)x(

σ2

)x(

2

2

21

22

21

21

e2

1e

σ2),σ,μ,μ|x(f

}x,...,x,x{ n21

n

2j

2

)x(

2log(...)e

2

1

σ2log),x(L

221

0σas, 2

11 xμ

From Single Clustering to Ensemble Methods - April 2009

Page 37: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

37

73

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Mixture Model Fitting

ML estimate has no closed-form solution

Standard alternative: expectation-maximization (EM) algorithm:

: Missing data problem:

. Observed data: }x,...,x,x{ )n()2()1(x

From Single Clustering to Ensemble Methods - April 2009

74

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Mixture Model Fitting

ML estimate has no closed-form solution

Standard alternative: expectation-maximization (EM) algorithm:

: Missing data problem:

. Observed data:

. Missing data:

Missing labels (“colors”)

“1” at position i x( j) generated by component i

. Complete log-likelihood function:

}x,...,x,x{ )n()2()1(x},...,,{ )n()2()1(

zzzz

T)j(

k

)j(

2

)j(

1

)j( 0...010...0,z...,,z,z z

n

1j

k

1i

i

)j(

ii

)j(

ic )|(flogz),,(L xzx

)|z,x(flog )j()j(

From Single Clustering to Ensemble Methods - April 2009

Page 38: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

38

75

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

The EM Algorithm

The E-step: compute the expected value of

The M-step: update parameter estimates

),,(Lc zx

)ˆ,(Q]ˆ,x|),,(L[E )t()t(

c zx

)ˆ,(Qmaxargˆ )t()1t(

From Single Clustering to Ensemble Methods - April 2009

76

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

EM-Algorithm for Mixture of Gaussian

Iterative procedure:

The E-step:

The M-step:

,...ˆ,ˆ,...,ˆ,ˆ )1t()t()1()0(

( ) ( )( , )

( ) ( )

1

ˆˆ ( | )

ˆˆ ( | )

j tj t i i

i kj t

n n

n

f xw

f x

)t,j(

iw Estimate, at iteration t, of the probability that x( j) was

produced by component i

n

1j

)t,j(

i

)1t(

i wn

n

1j

)t,j(

i

n

1j

)j()t,j(

i

)1t(

i

w

xw

ˆ

n

1j

)t,j(

i

n

1j

T)1t(

i

)j()1t(

i

)j()t,j(

i

)1t(

i

w

)ˆx()ˆx(w

C

From Single Clustering to Ensemble Methods - April 2009

Page 39: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

39

77

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Mixture Gaussian Decomposition: Model Selection

How many components?

: The maximized likelihood never decreases when k increases

:

: Usually:

: Criteria in this cathegory:

. Minimum description length (MDL), Rissanen and Ristad, 1992.

. Akaike’s information criterion (AIC), Whindham and Cutler, 1992.

. Schwarz´s Bayesian inference criterion (BIC), Fraley and Raftery, 1998.

: Resampling-based techniques

. Bootstrap for clustering, Jain and Moreau, 1987.

. Bootstrap for Gaussian mixtures, McLachlan, 1987.

. Cross validation, Smyth, 1998.

maxminmin)k( k,...,1k,kk),ˆ(minargk C

)ˆ(P)ˆ,(L)ˆ( )k()k()k( xC

From Single Clustering to Ensemble Methods - April 2009

78

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Mixture Gaussian Decomposition: Model Selection

Given (k) , shortest code-length for x (Shannon’s):

MDL criterion: parameter code length

: Total code-length (two part code):

: MDL criterion:

)|(flog)|(L )k()k( xx

)(L)|(flog),(L )k()k()k( xx

Parameter

code-length

)(L)|(flogminargˆ)k()k()k(

)k(

x

L(each component of (k) ) = )'nlog(2

1

'n Amount of data from which the parameter is estimated

From Single Clustering to Ensemble Methods - April 2009

Page 40: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

40

79

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Mixture Gaussian Decomposition: Model Selection

Classical MDL: n’ = n

Mixtures MDL (MMDL) (Figueiredo, 2002)

: Using EM and redefining the M-Step

)nlog(2

k)|(flogminargˆ

)k()k(

)k( x

k

1m

m

pp

)k()k( )log(2

N)nlog(

2

)1N(k)|(flogminargˆ

)k(

x

2

Nw

pn

1j

)t,j(

ii

y

y+

k

1m

m

i)1t(

This M-step may annihilate components

M. Figueiredo and A. K. Jain, Unsupervised Learning of Finite Mixture Models, IEEE TPAMI, 2002

Np is the number of parameters

of each component.

Gaussian, arbitrary covariances

Np = d + d(d+1)/2

Gaussian, common covariance:

Np = d

From Single Clustering to Ensemble Methods - April 2009

80

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Gaussian Mixture Decomposition – EM MMDL

Examples

From Single Clustering to Ensemble Methods - April 2009

Page 41: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

41

81

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Gaussian Mixture Decomposition – EM MMDL

Examples

From Single Clustering to Ensemble Methods - April 2009

82

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Gaussian Mixture Decomposition

Strength

: Model-based approach

: Good for Gaussian data

: Handles touching clusters

Weakness

: Unable to detect arbitrary shaped clusters

: Dependence on initialization

From Single Clustering to Ensemble Methods - April 2009

Page 42: Unsupervised Learning Clustering Algorithms - IT - websiteafred/tutorials/B_Clustering_Algorithms.pdf · 2 3 Unsupervised Learning Clustering Algorithms Unsupervised Learning -- Ana

42

83

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

Gaussian Mixture Decomposition

It is a local (greedy) algorithm (likelihood never decreases)

=> Initialization dependent

74 iterations 270 iterations

From Single Clustering to Ensemble Methods - April 2009

84

Unsupervised Learning Clustering Algorithms

Unsupervised Learning -- Ana Fred

References

A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.

A.K. Jain and M. N. Murty and P.J. Flynn, Data Clustering: A Review, ACM Computing Surveys, vol 31. No

3,pp 264-323, 1999.

Data Mining: Concepts and Techniques, J. Han and M. Kamber, Morgan Kaufmann Pusblishers, 2001.

A. Y. Ng and M. I. Jordan and Y. Weiss, On Spectral Clustering: Analysis and an algorithm, in Advances in

Neural Information Processing Systems 14, T. G. Dietterich and S. Becker and Z. Ghahramani, MIT

Press, 2002,

M. Figueiredo and A. K. Jain, Unsupervised Learning of Finite Mixture Models, IEEE TPAMI, 2002

A. L. N. Fred, J. M. N. Leitão, A New Cluster Isolation Criterion Based on Dissimilarity Increments, IEEE

Trans. On Pattern Analysis and Machine Intelligence, vol 25, NO. 8, pp 2003.

From Single Clustering to Ensemble Methods - April 2009