Clustering Validity Adriano Joaquim de O Cruz ©2006 NCE/UFRJ [email protected].

Clustering Validity


Clustering Validity

The number of clusters is not always known in advance.

In many problems the number of classes is known, but it is not necessarily the best configuration.

It is therefore necessary to study methods that indicate and/or validate the number of classes.


Clustering Validity Example 1

Consider the problem of digit recognition.

It is known that there are 10 classes (the 10 digits).

The number of clusters, however, may be greater than 10.

This happens because the same digit can be written in different handwriting styles.


Clustering Validity Example 2

Consider the problem of segmenting a thermal image of a room.

It is known that there are 2 classes of temperature: body temperature and room temperature.

This is a problem where the number of classes is well defined.


Clustering Validity Problem

First, the data are partitioned into different numbers of clusters.

It is also important to try different initial conditions for the same number of partitions.

Validity measures are applied to these partitions to estimate their quality.

The quality must be estimated both when the number of partitions changes and, for the same number of partitions, when the initial conditions differ.

Clustering Validity

L-Clusters


Initial Definitions

d(e_i, e_k) is the dissimilarity between elements e_i and e_k.

Euclidean distance is an example of a dissimilarity measure.


L-Cluster Definition

C is an L-cluster if, for each object e_i belonging to C:

$$\max_{e_k \in C} d(e_i, e_k) < \min_{e_h \notin C} d(e_i, e_h)$$

That is, the maximum distance between e_i and any element e_k of C is smaller than the minimum distance between e_i and any element e_h from another cluster.
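The L-cluster condition can be checked directly by brute force. The sketch below is illustrative only: it assumes points are coordinate tuples, uses Euclidean distance (`math.dist`, Python 3.8+) as the dissimilarity d, and the function name `is_l_cluster` is hypothetical. When no point lies outside C, the minimum over the empty set is treated as +infinity.

```python
import math

def is_l_cluster(points, members, d=math.dist):
    """Check the L-cluster condition for the index set `members`.

    For every e_i in C, the largest dissimilarity d(e_i, e_k) to a member
    e_k of C must be strictly smaller than the smallest dissimilarity
    d(e_i, e_h) to a point e_h outside C.
    """
    inside = set(members)
    outside = [h for h in range(len(points)) if h not in inside]
    if not outside:
        return True  # minimum over an empty set taken as +infinity
    for i in inside:
        max_inside = max(d(points[i], points[k]) for k in inside)
        min_outside = min(d(points[i], points[h]) for h in outside)
        if not max_inside < min_outside:
            return False
    return True

# Three collinear members and one outsider above the middle point.
pts = [(0, 0), (2, 0), (1, 0), (1, 1.9)]
print(is_l_cluster(pts, [0, 1, 2]))  # every member satisfies its own test
print(is_l_cluster(pts, [0, 3]))     # (0, 0) is closer to (1, 0) than to (1, 1.9)
```

Each member is tested against its own nearest outsider, so a central member may sit close to other clusters as long as its own within-cluster distances are smaller.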


L-cluster

[Figure: an L-cluster C]


L*-Cluster Definition

C is an L*-cluster if, for each object e_i belonging to C:

$$\max_{e_k \in C} d(e_i, e_k) < \min_{e_l \in C,\; e_h \notin C} d(e_l, e_h)$$

Every within-cluster distance must be smaller than the smallest distance between any member of C and any non-member, so every L*-cluster is also an L-cluster.
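Because the L* condition must hold for every e_i against one global minimum, it reduces to comparing the diameter of C with the smallest member-to-non-member distance. A minimal sketch, assuming coordinate tuples and Euclidean distance; `is_lstar_cluster` is a hypothetical name. The three collinear points (0,0), (2,0), (1,0) with an outsider at (1, 1.9) satisfy the L-cluster test (each member passes its own check) but fail the stricter L* test, because the diameter 2 is not smaller than the distance 1.9 from (1,0) to the outsider.

```python
import math

def is_lstar_cluster(points, members, d=math.dist):
    """Check the L*-cluster condition for the index set `members`."""
    inside_set = set(members)
    inside = sorted(inside_set)
    outside = [h for h in range(len(points)) if h not in inside_set]
    if not outside:
        return True
    # Smallest dissimilarity between any member e_l and any non-member e_h.
    min_between = min(d(points[l], points[h]) for l in inside for h in outside)
    # The condition must hold for every e_i, so it suffices to test the
    # largest within-cluster dissimilarity (the diameter of C).
    diameter = max(d(points[i], points[k]) for i in inside for k in inside)
    return diameter < min_between

pts = [(0, 0), (2, 0), (1, 0), (1, 1.9)]
print(is_lstar_cluster(pts, [0, 1, 2]))                      # diameter 2 vs 1.9
print(is_lstar_cluster([(0, 0), (1, 0), (10, 0)], [0, 1]))   # diameter 1 vs 9
```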


L*-cluster

[Figure: an L*-cluster C]

Clustering Validity

Silhouettes


Introduction

P.J. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis", Journal of Computational and Applied Mathematics, 1987.

Each cluster is represented by one silhouette, showing which objects lie well within the cluster.

The user can compare the quality of the clusters.


Method - I

Consider a cluster A. For each element e_i ∈ A, calculate the average dissimilarity to all other objects of A, a(e_i) = d(e_i, A).

Because a(e_i) is an average over the other members of A, A cannot be a singleton. Euclidean distance is an example of dissimilarity.


Method - II

Consider all clusters C_k different from A.

Calculate d_k(e_i, C_k), the average dissimilarity of e_i to all elements of C_k.

Select

$$b(e_i) = \min_{C_k \neq A} d_k(e_i, C_k)$$

Let us call B the cluster that attains this minimum dissimilarity b(e_i).

B is the second-best choice for e_i.


Method - III

The silhouette s(e_i) is equal to

$$s(e_i) = \begin{cases} 1 - a(e_i)/b(e_i) & \text{if } a(e_i) < b(e_i) \\ 0 & \text{if } a(e_i) = b(e_i) \\ b(e_i)/a(e_i) - 1 & \text{if } a(e_i) > b(e_i) \end{cases}$$

or, equivalently,

$$s(e_i) = \frac{b(e_i) - a(e_i)}{\max\{a(e_i),\, b(e_i)\}}$$

so that -1 ≤ s(e_i) ≤ +1.
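The three-case definition and the compact max form give the same value; a quick numerical check makes the equivalence concrete. This is a small illustrative sketch with hypothetical function names; note that the compact form divides by zero when a = b = 0 (duplicate points), a case the three-case form maps to 0.

```python
def silhouette_piecewise(a, b):
    """s(e_i) from the three-case definition."""
    if a < b:
        return 1 - a / b
    if a == b:
        return 0.0
    return b / a - 1

def silhouette_compact(a, b):
    """s(e_i) = (b - a) / max(a, b); undefined when a = b = 0."""
    return (b - a) / max(a, b)

# a(e_i) smaller than, equal to and larger than b(e_i)
for a, b in [(1.0, 4.0), (3.0, 3.0), (4.0, 1.0)]:
    print(a, b, silhouette_piecewise(a, b), silhouette_compact(a, b))
```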


Understanding s(e_i)

s(e_i) ≈ 1: the within-cluster dissimilarity a(e_i) is much smaller than b(e_i), so e_i is well classified.

s(e_i) ≈ 0: a(e_i) ≈ b(e_i), so e_i may belong to either cluster.

s(e_i) ≈ -1: the within-cluster dissimilarity a(e_i) is much larger than b(e_i), so e_i is misclassified and should belong to B.


Silhouette

The silhouette of cluster A is the plot of all s(e_i), for e_i ∈ A, ranked in decreasing order.

The average of s(e_i) over all elements in the cluster is called the average silhouette.
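Methods I to III, the ranked silhouette and the average silhouette can be put together in a short sketch. It assumes Euclidean dissimilarity and integer cluster labels; `silhouettes` and `cluster_silhouette` are hypothetical names, and a singleton cluster raises an error because a(e_i) is undefined there. This is not the MATLAB `silhouette` function used in the examples, only an illustration of the definitions.

```python
import math
from collections import defaultdict

def silhouettes(points, labels):
    """Return s(e_i) for every element, using Euclidean dissimilarity."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("b(e_i) needs at least two clusters")
    s = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) < 2:
            raise ValueError("a(e_i) is undefined for a singleton cluster")
        # a(e_i): average dissimilarity to the other members of its cluster A
        a = sum(math.dist(p, points[k]) for k in own if k != i) / (len(own) - 1)
        # b(e_i): smallest average dissimilarity to any other cluster C_k
        b = min(
            sum(math.dist(p, points[k]) for k in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        s.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return s

def cluster_silhouette(s, labels, cluster):
    """Silhouette of one cluster (values in decreasing order) and its average."""
    ranked = sorted((v for v, lab in zip(s, labels) if lab == cluster), reverse=True)
    return ranked, sum(ranked) / len(ranked)

pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
labels = [0, 0, 1, 1]
s = silhouettes(pts, labels)
for c in (0, 1):
    ranked, avg = cluster_silhouette(s, labels, c)
    print(c, [round(v, 3) for v in ranked], round(avg, 3))
```

With two compact, distant groups every s(e_i) is close to 0.8; swapping labels so that each cluster mixes both groups drives some values below zero.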


Example of use I

QTY = 100;
% Two Gaussian clouds of QTY points centred at (0.5, 0.5) and (-0.5, -0.5)
X = [randn(QTY,2) + 0.5*ones(QTY,2); randn(QTY,2) - 0.5*ones(QTY,2)];
opts = statset('Display','final');
% k-means with city-block distance, keeping the best of 5 replicates
[cidx, ctrs] = kmeans(X, 2, 'Distance','city', 'Replicates',5, 'Options',opts);
figure;
% Cluster 1 in red, cluster 2 in blue, centroids as black crosses
plot(X(cidx==1,1), X(cidx==1,2), 'r.', ...
     X(cidx==2,1), X(cidx==2,2), 'b.', ...
     ctrs(:,1), ctrs(:,2), 'kx');
figure;
% Silhouette plot of the partition, using squared Euclidean distance
[s, h] = silhouette(X, cidx, 'sqeuclid');


[Figure: Ex Silhouette 1]

[Figure: Ex Silhouette 2]


Example of use II

QTY = 100;
% Two Gaussian clouds of QTY points, now centred at (2, 2) and (-2, -2)
X = [randn(QTY,2) + 2*ones(QTY,2); randn(QTY,2) - 2*ones(QTY,2)];
opts = statset('Display','final');
% k-means with city-block distance, keeping the best of 5 replicates
[cidx, ctrs] = kmeans(X, 2, 'Distance','city', 'Replicates',5, 'Options',opts);
figure;
% Cluster 1 in red, cluster 2 in blue, centroids as black crosses
plot(X(cidx==1,1), X(cidx==1,2), 'r.', ...
     X(cidx==2,1), X(cidx==2,2), 'b.', ...
     ctrs(:,1), ctrs(:,2), 'kx');
figure;
% Silhouette plot of the partition, using squared Euclidean distance
[s, h] = silhouette(X, cidx, 'sqeuclid');


[Figure: Ex Silhouette 3]

[Figure: Ex Silhouette 4]

Cluster Validity

Partition Coefficient



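This coefficient is defined as

$$F_c = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{2}, \qquad \frac{1}{c} \le F_c \le 1$$

where μ_ij is the membership degree of element j in cluster i, c is the number of clusters and n is the number of elements.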


Partition Coefficient comments

F tends to decrease as the number of clusters increases.

F is therefore not appropriate for finding the best number of partitions.

F is best suited to validate the best partition among those with the same number of clusters.



When F = 1/c the partition is entirely fuzzy, since every element belongs to all clusters with the same degree of membership.

When F = 1 the partition is rigid (crisp) and membership values are either 1 or 0.

This measure can only be applied to fuzzy partitions.


Partition Coefficient Example

The partition matrix, with elements w1 to w4 as columns, is

$$U = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$

$$F = \frac{1^2 + 1^2 + 1^2 + 1^2}{4} = 1$$




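For the partition matrix with elements w1 to w4 as columns

$$U = \begin{bmatrix} 0.5 & 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 & 0.5 \end{bmatrix}$$

$$F = \frac{8 \times 0.5^2}{4} = \frac{1}{2} = \frac{1}{c}$$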




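For the partition matrix with elements X1 to X6 as columns

$$U = \begin{bmatrix} 0.8 & 0.7 & 0.1 & 1 & 0 & 0.5 \\ 0.2 & 0.3 & 0.9 & 0 & 1 & 0.5 \end{bmatrix}$$

$$F = \frac{0.8^2 + 0.7^2 + 0.1^2 + 1^2 + 0.5^2 + 0.2^2 + 0.3^2 + 0.9^2 + 1^2 + 0.5^2}{6} = \frac{4.58}{6} \approx 0.7633$$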
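The three worked examples can be reproduced with a few lines. A minimal sketch, assuming U is stored as a list of c rows (clusters) by n columns (elements); `partition_coefficient` is a hypothetical name, and the column-sum check encodes the usual fuzzy-partition constraint that each element's memberships add up to 1.

```python
import math

def partition_coefficient(U):
    """F = (1/n) * sum_i sum_j mu_ij^2 for a c x n partition matrix U."""
    n = len(U[0])
    for j in range(n):
        column = sum(row[j] for row in U)
        if not math.isclose(column, 1.0):
            raise ValueError(f"memberships of element {j} sum to {column}, not 1")
    return sum(mu ** 2 for row in U for mu in row) / n

U1 = [[1, 1, 0, 0], [0, 0, 1, 1]]
U2 = [[0.5] * 4, [0.5] * 4]
U3 = [[0.8, 0.7, 0.1, 1, 0, 0.5], [0.2, 0.3, 0.9, 0, 1, 0.5]]
for U in (U1, U2, U3):
    print(round(partition_coefficient(U), 4))
```

The loop prints 1.0 (rigid), 0.5 (= 1/c, entirely fuzzy) and 0.7633.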

Cluster Validity

Partition Entropy


Partition Entropy

Partition Entropy is defined as

$$H_c = -\frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}\log(\mu_{ij}), \qquad 0 \le H_c \le \log c$$

with the convention 0 log 0 = 0.

When H = 0 the partition is rigid. When H = log(c) the fuzziness is maximum. The partition coefficient and the entropy satisfy 0 ≤ 1 - F ≤ H.
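Partition entropy can be computed on the same three partition matrices used for F, which also lets us check the bounds 0 ≤ 1 - F ≤ H ≤ log c. A minimal sketch, assuming the natural logarithm (the slides do not state the base) and the convention 0 log 0 = 0; the function names are hypothetical.

```python
import math

def partition_entropy(U):
    """H = -(1/n) * sum_i sum_j mu_ij * ln(mu_ij), with 0 * ln 0 taken as 0."""
    n = len(U[0])
    return sum(-mu * math.log(mu) for row in U for mu in row if mu > 0) / n

def partition_coefficient(U):
    """F = (1/n) * sum_i sum_j mu_ij^2."""
    n = len(U[0])
    return sum(mu ** 2 for row in U for mu in row) / n

U_crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]
U_flat = [[0.5] * 4, [0.5] * 4]
U_mixed = [[0.8, 0.7, 0.1, 1, 0, 0.5], [0.2, 0.3, 0.9, 0, 1, 0.5]]
for U in (U_crisp, U_flat, U_mixed):
    H = partition_entropy(U)
    F = partition_coefficient(U)
    print(round(H, 4), round(1 - F, 4))
```

The rigid matrix gives H = 0, the uniform one reaches log 2, and the mixed one lies in between and above 1 - F.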


Partition Entropy comments

Partition Entropy (H) tends to increase with the number of partitions.

H is better suited to validate the best partition among several runs of an algorithm.

H is strictly a fuzzy measure: for any crisp partition H = 0.

Cluster Validity

Compactness and Separation



CS is defined as

$$CS = \frac{J_m}{n\,(d_{min})^2}$$

J_m is the objective function minimized by the FCM algorithm. n is the number of elements. d_min is the minimum Euclidean distance between the centres of two clusters.



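The minimum distance is defined as

$$d_{min} = \min_{i \neq j} \lVert v_i - v_j \rVert$$

The complete formula is

$$CS = \frac{\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\,\lVert x_j - v_i \rVert^2}{n\,\min_{i \neq j} \lVert v_i - v_j \rVert^2}$$

where v_i is the centre of cluster i and x_j is the j-th element.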
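The complete CS formula translates directly into code. A minimal sketch, assuming the data X, the centres V and the c x n membership matrix U are already available (for example from an FCM run) and using the fuzzifier m = 2 by default; `cs_index` is a hypothetical name. Smaller values correspond to compact clusters whose centres are far apart.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cs_index(X, V, U, m=2.0):
    """CS = J_m / (n * d_min^2) for data X, centres V and memberships U."""
    n, c = len(X), len(V)
    # J_m: the FCM objective, sum_i sum_j mu_ij^m * ||x_j - v_i||^2
    j_m = sum(U[i][j] ** m * squared_distance(X[j], V[i])
              for i in range(c) for j in range(n))
    # d_min^2: smallest squared distance between two distinct centres
    d_min_sq = min(squared_distance(V[i], V[k])
                   for i in range(c) for k in range(i + 1, c))
    return j_m / (n * d_min_sq)

X = [(0, 0), (0, 2), (10, 0), (10, 2)]
V = [(0, 1), (10, 1)]
crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]
fuzzy = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
print(cs_index(X, V, crisp), cs_index(X, V, fuzzy))
```

For the crisp partition J_m = 4, n = 4 and d_min² = 100, so CS = 0.01; spreading membership onto the distant cluster raises J_m and therefore CS.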



This is a very complete validation measure.

It validates the number of clusters and also checks the separation among clusters.

In our experiments it worked well even when the degree of overlap between clusters was high.

Cluster Validity

Fuzzy Linear Discriminant


Fisher Linear Discriminant

Fisher's Linear Discriminant (FLD) is an important technique used in pattern recognition problems to evaluate the compactness and separation of the partitions produced by crisp clustering techniques.



Classification problems are easier to handle when the sampled data have few characteristics (features).

So it is important to reduce the dimensionality of the problem.

When FLD is applied to a crisply partitioned space, it produces an operator W that maps the original set (R^p) into a new set (R^k), where k < p.



[Figure: projection of samples from 2 classes (axes x1, x2) onto a line W by Fisher's Linear Discriminant]
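For two classes in R^2 the operator W reduces to a single direction (k = 1). The sketch below uses the textbook two-class Fisher direction w = S_W^{-1}(m_1 - m_2), which is a standard result rather than anything specific to these slides; the function names and the sample points are hypothetical, and the 2 x 2 inverse is written out by hand to avoid external libraries.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(2)]

def scatter(vectors, centre):
    """2 x 2 scatter matrix: sum of (x - centre)(x - centre)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        u = (v[0] - centre[0], v[1] - centre[1])
        for r in range(2):
            for c in range(2):
                s[r][c] += u[r] * u[c]
    return s

def fisher_direction(class1, class2):
    """Two-class Fisher direction w = S_W^{-1} (m1 - m2) in two dimensions."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[r][c] + s2[r][c] for c in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("S_W is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    """Map a point of R^2 onto the line spanned by w (R^1)."""
    return w[0] * x[0] + w[1] * x[1]

class1 = [(0, 0), (2, 0), (0, 2), (2, 2)]
class2 = [(4, 0), (6, 0), (4, 2), (6, 2)]
w = fisher_direction(class1, class2)
print(w, [project(w, x) for x in class1], [project(w, x) for x in class2])
```

On this data the direction is parallel to the x1 axis, and the projected values of the two classes do not overlap on the line.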


FLD

FLD measures the compactness and separation of all categories when crisp partitions are created.

FLD uses two matrices:

S_B: Between-Classes Scatter Matrix

S_W: Within-Classes Scatter Matrix


FLD – S_B Matrix

Measures the quality of separation between classes

$$S_B = \sum_{i=1}^{c} n_i\,(m_i - m)(m_i - m)^T$$

$$m = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad m_i = \frac{1}{n_i}\sum_{x_j \in c_i} x_j$$


FLD – S_B Matrix

m is the average of all samples.

m_i is the average of all samples belonging to cluster i.

n is the number of samples; n_i is the number of samples belonging to cluster i.


FLD – S_W Matrix

Measures the compactness of all classes.

It is the sum of all internal scattering:

$$S_W = \sum_{i=1}^{c}\sum_{x_j \in c_i}(x_j - m_i)(x_j - m_i)^T$$


Total Scattering

The total scattering is the sum of the internal scattering and the scattering between the classes:

$$S_T = S_W + S_B$$

In an optimal partition the separation between classes (S_B) must be maximum and the scattering within the classes (S_W) minimum.
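The decomposition S_T = S_W + S_B can be verified numerically for a crisp partition. A minimal sketch with plain nested lists for the p x p matrices; the helper names are hypothetical and the four sample points are made up for illustration.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def add_outer(S, u, weight=1.0):
    """S += weight * u u^T (in place)."""
    for r in range(len(u)):
        for c in range(len(u)):
            S[r][c] += weight * u[r] * u[c]

def scatter_matrices(X, labels):
    """Return (S_B, S_W, S_T) for data X with crisp cluster labels."""
    p = len(X[0])

    def zeros():
        return [[0.0] * p for _ in range(p)]

    S_B, S_W, S_T = zeros(), zeros(), zeros()
    m = mean(X)
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    for members in groups.values():
        m_i = mean(members)
        # n_i (m_i - m)(m_i - m)^T
        add_outer(S_B, [a - b for a, b in zip(m_i, m)], weight=len(members))
        for x in members:
            # (x_j - m_i)(x_j - m_i)^T
            add_outer(S_W, [a - b for a, b in zip(x, m_i)])
    for x in X:
        add_outer(S_T, [a - b for a, b in zip(x, m)])
    return S_B, S_W, S_T

X = [(0, 0), (0, 2), (4, 0), (4, 2)]
S_B, S_W, S_T = scatter_matrices(X, [0, 0, 1, 1])
print(S_B, S_W, S_T)
```

Here the clusters are separated along x1 (all of S_T's x1 variance is in S_B) and spread along x2 (all of its x2 variance is in S_W).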


J criterion

Fisher defined the J criterion, which must be maximized (using determinants):

$$J = \frac{|S_B|}{|S_W|}$$

A simplified way to evaluate J is

$$J = \frac{\operatorname{trace}(S_B)}{\operatorname{trace}(S_W)}$$
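The trace version of J does not need the full matrices, because trace(u uᵀ) = ‖u‖². A minimal sketch, assuming crisp labels; `fisher_j_trace` is a hypothetical name. On four points forming two columns, the partition that follows the columns scores J = 4, while a partition that cuts across them scores 0.25.

```python
def squared_norm(u):
    return sum(a * a for a in u)

def fisher_j_trace(X, labels):
    """J = trace(S_B) / trace(S_W) for a crisp partition."""
    n, p = len(X), len(X[0])
    m = [sum(x[d] for x in X) / n for d in range(p)]
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    tr_b = tr_w = 0.0
    for members in groups.values():
        m_i = [sum(x[d] for x in members) / len(members) for d in range(p)]
        # trace of n_i (m_i - m)(m_i - m)^T is n_i * ||m_i - m||^2
        tr_b += len(members) * squared_norm([a - b for a, b in zip(m_i, m)])
        # trace of (x - m_i)(x - m_i)^T is ||x - m_i||^2
        tr_w += sum(squared_norm([a - b for a, b in zip(x, m_i)]) for x in members)
    if tr_w == 0:
        raise ValueError("trace(S_W) is zero: every cluster collapses to a point")
    return tr_b / tr_w

X = [(0, 0), (0, 2), (4, 0), (4, 2)]
print(fisher_j_trace(X, [0, 0, 1, 1]), fisher_j_trace(X, [0, 1, 0, 1]))
```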


J comments

J may vary in the interval 0 ≤ J ≤ ∞.

J is strictly rigid: it applies only to crisp partitions.

J loses precision as the overlap between samples increases.


EFLD

EFLD measures the compactness and separation of all categories when fuzzy partitions are created.

EFLD uses two matrices:

S_Be: Between-Classes Scatter Matrix

S_We: Within-Classes Scatter Matrix


EFLD – S_Be Matrix

Measures the quality of separation between classes

$$m = \frac{1}{n}\sum_{j=1}^{n} x_j$$

$$S_{Be} = \sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}\,(m_{ei} - m)(m_{ei} - m)^T, \qquad m_{ei} = \frac{\sum_{j=1}^{n}\mu_{ij}\,x_j}{\sum_{j=1}^{n}\mu_{ij}}$$


EFLD – S_We Matrix

Measures the compactness of all classes.

It is the sum of all internal scattering:

$$S_{We} = \sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}\,(x_j - m_{ei})(x_j - m_{ei})^T$$


Total Scattering

The total scattering is the sum of the internal scattering and the scattering between the classes:

$$S_{Te} = S_{We} + S_{Be}$$

In an optimal partition the separation between classes (S_Be) must be maximum and the scattering within the classes (S_We) minimum.


J_e criterion

J_e: criterion that must be maximized (using determinants)

$$J_e = \frac{|S_{Be}|}{|S_{We}|}$$

A simplified way to evaluate J_e is

$$J_e = \frac{\operatorname{trace}(S_{Be})}{\operatorname{trace}(S_{We})}$$


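Simplifying the J_e criterion

It can be proved that the trace of the total scattering is constant (it does not depend on the partition) and equal to

$$S_T = \operatorname{trace}(S_T) = \sum_{j=1}^{n}\lVert x_j - m \rVert^2$$

Hence, writing S_Be and S_We for the traces of the corresponding matrices,

$$J_e = \frac{S_{Be}}{S_{We}} = \frac{S_{Be}}{S_T - S_{Be}}$$

so only S_Be needs to be computed for each partition.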
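The shortcut can be checked numerically: with fuzzy memberships that sum to 1 for every element, trace(S_Be) + trace(S_We) equals Σ‖x_j - m‖² regardless of the partition, so J_e can be obtained from S_Be alone. A minimal sketch with hypothetical function names and made-up data; it follows the S_Be and S_We definitions given for EFLD (memberships μ_ij, no fuzzifier exponent).

```python
def sq_norm(u):
    return sum(a * a for a in u)

def efld_traces(X, U):
    """Return (s_Be, s_We, s_T): traces of S_Be, S_We and the total scatter."""
    n, p, c = len(X), len(X[0]), len(U)
    m = [sum(x[d] for x in X) / n for d in range(p)]
    s_be = s_we = 0.0
    for i in range(c):
        weight = sum(U[i])  # sum over j of mu_ij
        m_ei = [sum(U[i][j] * X[j][d] for j in range(n)) / weight for d in range(p)]
        # trace of sum_j mu_ij (m_ei - m)(m_ei - m)^T
        s_be += weight * sq_norm([a - b for a, b in zip(m_ei, m)])
        # trace of sum_j mu_ij (x_j - m_ei)(x_j - m_ei)^T
        s_we += sum(U[i][j] * sq_norm([a - b for a, b in zip(X[j], m_ei)])
                    for j in range(n))
    s_t = sum(sq_norm([a - b for a, b in zip(x, m)]) for x in X)
    return s_be, s_we, s_t

def efld_je(X, U):
    """J_e = s_Be / (s_T - s_Be): only s_Be depends on the partition."""
    s_be, _, s_t = efld_traces(X, U)
    return s_be / (s_t - s_be)

X = [(0, 0), (0, 2), (4, 0), (4, 2)]
U = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8]]
s_be, s_we, s_t = efld_traces(X, U)
print(s_be, s_we, s_t, efld_je(X, U))
```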


J_e comments

J_e may vary in the interval 0 ≤ J_e ≤ ∞.

Unlike J, J_e can be applied to fuzzy partitions.

J_e loses precision as the overlap between samples increases.


Applying EFLD

EFLD values by number of categories:

| Samples \ Number of categories | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| X1 | 4.6815 | 4.9136 | 0.2943 | 0.2559 | 0.3157 |
| X2 | 0.3271 | 0.8589 | 0.8757 | 0.9608 | 1.0674 |

Cluster Validity

Inter Class Contrast


Comments

EFLD

Increases as the number of clusters rises.

Increases when classes have a high degree of overlap.

Reaches its maximum for a wrong number of clusters.


ICC

Evaluates both crisp and fuzzy clustering algorithms.

Measures partition compactness and partition separation.

ICC must be maximized.


ICC

s_Be (the trace of S_Be) estimates the quality of the placement of the centres.

1/n is a scale factor that compensates for the influence of the number of points in s_Be.

$$ICC = \frac{s_{Be}}{n}\, D_{min}\, \sqrt{c}$$
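A sketch of ICC under two assumptions that should be checked against the original ICC paper: the formula is taken in the form ICC = (s_Be / n) · D_min · √c, and the centres used for D_min are the membership-weighted means m_ei from the EFLD definitions. The function name and data are hypothetical. Splitting one natural class into two clusters shrinks D_min, which lowers ICC.

```python
import math

def icc(X, U):
    """ICC = (s_Be / n) * D_min * sqrt(c); larger values are better.

    Assumption: the cluster centres used for D_min are the
    membership-weighted means m_ei.
    """
    n, p, c = len(X), len(X[0]), len(U)
    if c < 2:
        raise ValueError("D_min needs at least two clusters")
    m = [sum(x[d] for x in X) / n for d in range(p)]
    centres, s_be = [], 0.0
    for i in range(c):
        weight = sum(U[i])
        m_ei = [sum(U[i][j] * X[j][d] for j in range(n)) / weight for d in range(p)]
        centres.append(m_ei)
        # trace of sum_j mu_ij (m_ei - m)(m_ei - m)^T
        s_be += weight * sum((a - b) ** 2 for a, b in zip(m_ei, m))
    d_min = min(math.dist(centres[a], centres[b])
                for a in range(c) for b in range(a + 1, c))
    return s_be / n * d_min * math.sqrt(c)

X = [(0, 0), (0, 2), (4, 0), (4, 2)]
two = [[1, 1, 0, 0], [0, 0, 1, 1]]
three = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]  # left class split in two
print(icc(X, two), icc(X, three))
```

With two clusters ICC = 16√2; splitting the left class gives D_min = 2 and ICC = 9√3, which is smaller.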


ICC - 2

D_min is the minimum Euclidean distance between all pairs of centres.

It neutralizes the tendency of s_Be to grow, preventing the maximum from being reached for a number of clusters greater than the ideal value. When 2 or more clusters represent one class, D_min decreases abruptly.


ICC Fuzzy Application

Five classes with 500 points each.

No class overlapping.

X1: classes centred at (1, 2), (6, 2), (1, 6), (6, 6), (3.5, 9); standard deviation 0.3.

Apply FCM for m = 2 and c = 2 ... 10.


ICC Fuzzy Application Results

| Measures | Optimum | c = 2 | c = 3 | c = 4 | c = 5 |
|---|---|---|---|---|---|
| ICC | M | 7.596 | 41.99 | 51.92 | 96.70 |
| ICCTra | M | 7.596 | 41.99 | 51.92 | 96.70 |
| ICCDet | M | IND | 154685 | 259791 | 673637 |
| EFLD | M | 0.185 | 0.986 | 1.877 | 13.65 |
| EFLDTra | M | 0.185 | 0.986 | 1.877 | 13.65 |
| EFLDDet | M | IND | 0.955 | 3.960 | 182.70 |
| CS | m | 0.350 | 0.096 | 0.070 | 0.011 |
| F | M | 0.705 | 0.713 | 0.795 | 0.943 |
| MinHT | M | 0.647 | 0.572 | 2.124 | 1.994 |
| MeanHT | M | 0.519 | 0.496 | 1.327 | 1.887 |
| MinRF | 0 | 0.100 | 0.316 | 0 | 0 |

M: the maximum value indicates the best partition; m: the minimum value does; 0: the best value is zero.


ICC Fuzzy Application Time

| Time / Number of categories | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| ICC | 0.0061 | 0.0069 | 0.0082 | 0.0091 (4) |
| ICCTra | 0.0078 | 0.0060 | 0.0088 | 0.0110 |
| ICCDet | 0.0110 | 0.0088 | 0.0110 | 0.0132 |
| EFLD | 0.0053 | 0.0071 | 0.0063 | 0.0080 |
| EFLDTra | 0.7678 | 1.0870 | 1.4780 | 1.8982 |
| EFLDDet | 0.7800 | 1.1392 | 1.5510 | 2.0160 |
| CS | 0.0226 | 0.0261 | 0.0382 | 0.0476 |
| NFI | 0.0061 | 0.0056 | 0.0058 | 0.0060 (3) |
| F | 0.0044 | 0.0045 | 0.0049 | 0.0049 (1) |
| FPI | 0.0061 | 0.0045 | 0.0049 | 0.0053 (2) |


Application with Overlapping

Five classes with 500 points each.

High cluster overlapping.

X1: classes centred at (1, 2), (6, 2), (1, 6), (6, 6), (3.5, 9); standard deviation 0.3.

Apply FCM for m = 2 and c = 2 ... 10.


Application Overlapping Results

| Measures | Optimum | c = 2 | c = 3 | c = 4 | c = 5 | c = 10 |
|---|---|---|---|---|---|---|
| ICC | M | 5.065 | 4.938 | 6.191 | 7.829 | 5.69 |
| ICCTra | M | 5.065 | 4.938 | 6.191 | 7.829 | 5.69 |
| ICCDet | M | IND | 715.19 | 3572 | 7048 | 6024 |
| EFLD | M | 0.450 | 0.585 | 0.839 | 1.095 | 1.344 |
| EFLDTra | M | 0.450 | 0.585 | 0.839 | 1.095 | 1.344 |
| EFLDDet | M | IND | 0.049 | 0.315 | 0.743 | 1.200 |
| CS | m | 0.164 | 0.225 | 0.191 | 0.122 | 0.223 |
| F | M | 0.754 | 0.621 | 0.591 | 0.586 | 0.439 |
| MeanHT | M | 0.632 | 0.485 | 0.550 | 0.597 | 0.429 |
| MinRF | 0 | 0.170 | 0.294 | 0.194 | 0.210 | 0.402 |
| MPE | m | 0.568 | 0.601 | 0.561 | 0.525 | 0.565 |


Application Time Results

| Time / Number of clusters | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| ICC | 0.0060 | 0.0064 | 0.0077 | 0.0088 (1) |
| ICCTra | 0.0066 | 0.0060 | 0.0098 | 0.0110 |
| ICCDet | 0.0110 | 0.0078 | 0.0110 | 0.0120 |
| EFLD | 0.0063 | 0.0088 | 0.0096 | 0.0110 |
| EFLDTra | 0.7930 | 2.1038 | 1.7598 | 2.2584 |
| EFLDDet | 0.9720 | 1.2580 | 1.6090 | 1.8450 |
| CS | 0.0220 | 0.0283 | 0.0362 | 0.0590 (3) |
| F | 0.0112 | 0.0121 | 0.0061 | 0.0164 |
| MPE | 0.0167 | 0.0271 | 0.0319 | 0.0397 (2) |


ICC conclusions

Fast and efficient.

Works with fuzzy and crisp partitions.

Efficient even with highly overlapping clusters.

High rate of correct results.
