Clustering Validity Adriano Joaquim de O Cruz ©2006 NCE/UFRJ [email protected].
*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 2
Clustering Validity
The number of clusters is not always known in advance.
In many problems the number of classes is known, but it is not necessarily the best configuration.
Methods are therefore needed to indicate and/or validate the number of classes.
Clustering Validity Example 1
Consider the problem of recognizing handwritten digits.
It is known that there are 10 classes (the 10 digits).
The number of clusters, however, may be greater than 10.
This happens because the same digit can be written in different handwriting styles.
Clustering Validity Example 2
Consider the problem of segmenting a thermal image of a room.
It is known that there are 2 classes of temperature: body temperature and room temperature.
This is a problem where the number of classes is well defined.
Clustering Validity Problem
First, the data is partitioned into different numbers of clusters.
It is also important to try different initial conditions for the same number of clusters.
Validity measures are then applied to these partitions to estimate their quality.
Quality must be estimated both when the number of clusters changes and, for a fixed number, when the initial conditions differ.
Clustering Validity
L-Clusters
Initial Definitions
$d(e_i, e_k)$ is the dissimilarity between elements $e_i$ and $e_k$.
Euclidean distance is an example of a dissimilarity measure.
L-Cluster Definition
$C$ is an L-cluster if, for each object $e_i$ belonging to $C$:
$$\max_{e_k \in C} d(e_i, e_k) < \min_{e_h \notin C} d(e_i, e_h)$$
That is, the maximum distance between $e_i$ and any element $e_k$ of $C$ is smaller than the minimum distance between $e_i$ and any element $e_h$ of another cluster.
[Figure: example of an L-cluster $C$]
L* Definition
$C$ is an L*-cluster if, for each object $e_i$ belonging to $C$:
$$\max_{e_k \in C} d(e_i, e_k) < \min_{e_l \in C,\; e_h \notin C} d(e_l, e_h)$$
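The two definitions can be checked directly on small point sets. The sketch below assumes Euclidean distance as the dissimilarity and represents a cluster as a list of coordinate tuples; the function names `is_l_cluster` and `is_l_star_cluster` are illustrative, not taken from the slides.

```python
import math
from itertools import product


def is_l_cluster(cluster, others, d=math.dist):
    """L-cluster test: for every e_i in the cluster, its largest distance to a
    cluster member must be smaller than its smallest distance to an outsider."""
    for e_i in cluster:
        max_inside = max(d(e_i, e_k) for e_k in cluster)
        min_outside = min(d(e_i, e_h) for e_h in others)
        if not max_inside < min_outside:
            return False
    return True


def is_l_star_cluster(cluster, others, d=math.dist):
    """L*-cluster test: the largest within-cluster distance of every e_i must be
    smaller than the smallest distance between ANY member and ANY outsider."""
    gap = min(d(e_l, e_h) for e_l, e_h in product(cluster, others))
    return all(max(d(e_i, e_k) for e_k in cluster) < gap for e_i in cluster)


# A cluster that satisfies the L condition but not the stricter L* condition:
# the central point (0, 0) sits close to the outsider, which shrinks the gap.
cluster = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]
others = [(0.0, 1.9)]
print(is_l_cluster(cluster, others), is_l_star_cluster(cluster, others))
```

The L* condition is the stricter of the two because it compares every element against one global gap instead of that element's own nearest outsider.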
[Figure: example of an L*-cluster $C$]
Clustering Validity
Silhouettes
Introduction
P.J. Rousseeuw, "Silhouettes: a graphical aid to the interpretation and validation of cluster analysis", Journal of Computational and Applied Mathematics, 1987.
Each cluster is represented by one silhouette, showing which objects lie well within the cluster.
The user can compare the quality of the clusters.
Method - I
Consider a cluster $A$.
For each element $e_i \in A$, calculate the average dissimilarity to all other objects of $A$: $a(e_i) = d(e_i, A)$.
Therefore, $A$ cannot be a singleton.
Euclidean distance is an example of dissimilarity.
Method - II
Consider all clusters $C_k$ different from $A$.
Calculate $d_k(e_i, C_k)$, the average dissimilarity of $e_i$ to all elements of $C_k$.
Select $b(e_i) = \min_k d_k(e_i, C_k)$.
Let $B$ be the cluster whose average dissimilarity is $b(e_i)$.
$B$ is the second-best choice for $e_i$.
Method - III
The silhouette $s(e_i)$ is equal to
$$s(e_i) = \begin{cases} 1 - a(e_i)/b(e_i) & \text{if } a(e_i) < b(e_i) \\ 0 & \text{if } a(e_i) = b(e_i) \\ b(e_i)/a(e_i) - 1 & \text{if } a(e_i) > b(e_i) \end{cases}$$
or, equivalently,
$$s(e_i) = \frac{b(e_i) - a(e_i)}{\max(a(e_i), b(e_i))}$$
so that $-1 \le s(e_i) \le +1$.
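The three steps of the method can be put together in a few lines. This is a minimal sketch that assumes Euclidean distance as the dissimilarity and a list of integer labels as the partition; the function name `silhouette_values` is hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Return s(e_i) for every point, following Method I-III.

    a(e_i): average distance to the other members of its own cluster A.
    b(e_i): smallest average distance to the members of any other cluster.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    values = []
    for i, lab in enumerate(labels):
        own = [k for k in clusters[lab] if k != i]
        if not own:
            raise ValueError("cluster A cannot be a singleton")
        a = sum(math.dist(points[i], points[k]) for k in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[k]) for k in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        # s = 0 when a == b (this also avoids 0/0 for coincident points)
        values.append(0.0 if a == b else (b - a) / max(a, b))
    return values


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_values(points, [0, 0, 1, 1]))  # well-separated labelling
print(silhouette_values(points, [0, 1, 0, 1]))  # deliberately bad labelling
```

With the sensible labelling every value is close to +1; with the bad labelling every value is negative, which signals misclassified elements.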
Understanding $s(e_i)$
$s(e_i) \approx 1$: the within dissimilarity $a(e_i) \ll b(e_i)$, so $e_i$ is well classified.
$s(e_i) \approx 0$: $a(e_i) \approx b(e_i)$, so $e_i$ may belong to either cluster.
$s(e_i) \approx -1$: the within dissimilarity $a(e_i) \gg b(e_i)$, so $e_i$ is misclassified and should belong to $B$.
Silhouette
The silhouette of the cluster $A$ is the plot of all $s(e_i)$ ranked in decreasing order.
The average of $s(e_i)$ over all elements in the cluster is called the average silhouette.
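Given silhouette values that have already been computed, the ranked silhouette of one cluster and its average take only a sort and a mean. The helper name `cluster_silhouette` and the sample values are illustrative only.

```python
def cluster_silhouette(values, labels, cluster):
    """Return (s values of `cluster` ranked in decreasing order, their average)."""
    ranked = sorted(
        (s for s, lab in zip(values, labels) if lab == cluster), reverse=True
    )
    return ranked, sum(ranked) / len(ranked)


# Made-up silhouette values for four elements, three of them in cluster "A".
values = [0.9, 0.2, 0.5, -0.1]
labels = ["A", "A", "A", "B"]
ranked, average = cluster_silhouette(values, labels, "A")
print(ranked, average)
```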
Example of use I
QTY = 100;
% Two Gaussian clouds centred at (0.5, 0.5) and (-0.5, -0.5): heavy overlap
X = [randn(QTY,2) + 0.5*ones(QTY,2); randn(QTY,2) - 0.5*ones(QTY,2)];
opts = statset('Display','final');
% k-means with 2 clusters, city-block distance, best of 5 replicates
[cidx, ctrs] = kmeans(X, 2, 'Distance','cityblock', ...
    'Replicates',5, 'Options',opts);
figure;
% Cluster 1 in red, cluster 2 in blue, centres as black crosses
plot(X(cidx==1,1), X(cidx==1,2), 'r.', ...
     X(cidx==2,1), X(cidx==2,2), 'b.', ...
     ctrs(:,1), ctrs(:,2), 'kx');
figure;
% Silhouette plot computed with squared Euclidean distance
[s, h] = silhouette(X, cidx, 'sqEuclidean');
[Figure: Ex Silhouette 1]
[Figure: Ex Silhouette 2]
Example of use II
QTY = 100;
% Two Gaussian clouds centred at (2, 2) and (-2, -2): well separated
X = [randn(QTY,2) + 2*ones(QTY,2); randn(QTY,2) - 2*ones(QTY,2)];
opts = statset('Display','final');
% k-means with 2 clusters, city-block distance, best of 5 replicates
[cidx, ctrs] = kmeans(X, 2, 'Distance','cityblock', ...
    'Replicates',5, 'Options',opts);
figure;
% Cluster 1 in red, cluster 2 in blue, centres as black crosses
plot(X(cidx==1,1), X(cidx==1,2), 'r.', ...
     X(cidx==2,1), X(cidx==2,2), 'b.', ...
     ctrs(:,1), ctrs(:,2), 'kx');
figure;
% Silhouette plot computed with squared Euclidean distance
[s, h] = silhouette(X, cidx, 'sqEuclidean');
[Figure: Ex Silhouette 3]
[Figure: Ex Silhouette 4]
Cluster Validity
Partition Coefficient
This coefficient is defined as
$$F = \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^2, \qquad \frac{1}{c} \le F \le 1$$
where $\mu_{ij}$ is the degree of membership of element $j$ in cluster $i$, $c$ is the number of clusters and $n$ is the number of elements.
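The coefficient is a mean of squared memberships, so it is a one-liner once the partition matrix is available. The sketch assumes the matrix is stored as a list of rows, one row per cluster and one column per element; the name `partition_coefficient` is illustrative.

```python
def partition_coefficient(U):
    """F = (1/n) * sum over clusters i and elements j of mu_ij squared.

    U is a c x n membership matrix given as a list of rows (one row per cluster).
    """
    n = len(U[0])
    return sum(mu * mu for row in U for mu in row) / n


crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]          # every membership is 0 or 1
uniform = [[0.5] * 4, [0.5] * 4]              # every element split evenly
print(partition_coefficient(crisp))           # upper bound, F = 1
print(partition_coefficient(uniform))         # lower bound, F = 1/c = 0.5
```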
Partition Coefficient comments
$F$ tends to decrease as the number of clusters grows (its lower bound is $1/c$).
$F$ is not appropriate for finding the best number of clusters.
$F$ is best suited to choosing the best partition among those with the same number of clusters.
Partition Coefficient
When $F = 1/c$ the partition is entirely fuzzy, since every element belongs to all clusters with the same degree of membership.
When $F = 1$ the partition is rigid and membership values are either 1 or 0.
This measure can only be applied to fuzzy partitions.
Partition Coefficient Example
The partition matrix for the elements $w_1, w_2, w_3, w_4$ is
$$U = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$
$$F = \frac{1^2 + 1^2 + 1^2 + 1^2}{4} = 1$$
Partition Coefficient Example
The partition matrix for the elements $w_1, w_2, w_3, w_4$ is
$$U = \begin{bmatrix} 0.5 & 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 & 0.5 \end{bmatrix}$$
$$F = \frac{8 \times 0.5^2}{4} = 0.5 = \frac{1}{2} = \frac{1}{c}$$
Partition Coefficient Example
The partition matrix for the elements $X_1, \dots, X_6$ is
$$U = \begin{bmatrix} 0.8 & 0.7 & 0.1 & 1 & 0 & 0.5 \\ 0.2 & 0.3 & 0.9 & 0 & 1 & 0.5 \end{bmatrix}$$
$$F = \frac{0.8^2 + 0.7^2 + 0.1^2 + 1^2 + 0.5^2 + 0.2^2 + 0.3^2 + 0.9^2 + 1^2 + 0.5^2}{6} = 0.763$$
Cluster Validity
Partition Entropy
Partition Entropy is defined as
$$H = -\frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} \log(\mu_{ij}), \qquad 0 \le H \le \log c$$
When $H = 0$ the partition is rigid.
When $H = \log(c)$ the fuzziness is maximum.
$0 \le 1 - F \le H$
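A small sketch of the entropy computation, under two assumptions that the slide does not state: the natural logarithm is used, and terms with zero membership contribute nothing (the usual convention that 0 log 0 = 0). The function name `partition_entropy` is illustrative.

```python
import math


def partition_entropy(U):
    """H = -(1/n) * sum_i sum_j mu_ij * log(mu_ij), skipping zero memberships.

    U is a c x n membership matrix given as a list of rows (one row per cluster).
    """
    n = len(U[0])
    total = sum(mu * math.log(mu) for row in U for mu in row if mu > 0.0)
    return -total / n


crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]
uniform = [[0.5] * 4, [0.5] * 4]
print(partition_entropy(crisp))    # rigid partition: H is zero
print(partition_entropy(uniform))  # maximum fuzziness: H equals log(c)
```

For any partition matrix the value can be compared with the partition coefficient $F$ to check the bound $0 \le 1 - F \le H$.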
Partition Entropy comments
Partition Entropy ($H$) tends to increase with the number of clusters.
$H$ is more appropriate for choosing the best partition among several runs of an algorithm.
$H$ is strictly a fuzzy measure.
Cluster Validity
Compactness and Separation
CS is defined as
$$CS = \frac{J_m}{n \, (d_{min})^2}$$
$J_m$ is the objective function minimized by the FCM algorithm.
$n$ is the number of elements.
$d_{min}$ is the minimum Euclidean distance between the centers of two clusters.
Compactness and Separation
The minimum distance is defined as
$$d_{min} = \min_{i,j,\; i \ne j} \lVert v_i - v_j \rVert$$
The complete formula is
$$CS = \frac{\sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \lVert x_j - v_i \rVert^2}{n \, \min_{i,j,\; i \ne j} \lVert v_i - v_j \rVert^2}$$
where $v_i$ is the center of cluster $i$ and $x_j$ is element $j$.
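The complete formula can be evaluated from the data, the cluster centers and the membership matrix. The sketch below assumes Euclidean distance and a fuzzifier default of m = 2; the argument names `X`, `V`, `U` and the function name are hypothetical.

```python
import math


def compactness_separation(X, V, U, m=2.0):
    """CS = J_m / (n * d_min^2).

    X: list of n points, V: list of c centers, U: c x n membership matrix,
    m: fuzzifier used in the FCM objective J_m (assumed default 2).
    """
    n, c = len(X), len(V)
    # Numerator: FCM objective, memberships raised to m times squared distances
    j_m = sum(
        (U[i][j] ** m) * math.dist(X[j], V[i]) ** 2
        for i in range(c)
        for j in range(n)
    )
    # Denominator: squared minimum distance between two distinct centers
    d_min_sq = min(
        math.dist(V[i], V[k]) ** 2 for i in range(c) for k in range(i + 1, c)
    )
    return j_m / (n * d_min_sq)


U = [[1, 1, 0, 0], [0, 0, 1, 1]]
far = compactness_separation(
    [(0, 0), (0, 2), (10, 0), (10, 2)], [(0, 1), (10, 1)], U
)
near = compactness_separation(
    [(0, 0), (0, 2), (3, 0), (3, 2)], [(0, 1), (3, 1)], U
)
print(far, near)  # the better separated partition gives the smaller CS
```

Smaller values are better: the numerator shrinks for compact clusters and the denominator grows for well-separated centers.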
Compactness and Separation
This is a very complete validation measure.
It validates the number of clusters and checks the separation among clusters.
In our experiments it works well even when the degree of overlap is high.
Cluster Validity
Fuzzy Linear Discriminant
Fisher Linear Discriminant
Fisher's Linear Discriminant (FLD) is an important technique used in pattern recognition problems to evaluate the compactness and separation of the partitions produced by crisp clustering techniques.
Fisher Linear Discriminant
Classification problems are easier to handle when the sampled data has few features.
It is therefore important to reduce the dimensionality of the problem.
When FLD is applied to a crisply partitioned space, it produces an operator ($W$) that maps the original space ($R^p$) into a new space ($R^k$), where $k < p$.
Fisher Linear Discriminant
[Figure: projection of samples arranged in 2 classes, plotted on axes $x_1$ and $x_2$, onto a line $W$ by Fisher's Linear Discriminant]
FLD
FLD measures the compactness and separation of all categories when crisp partitions are created.
FLD uses two matrices:
$S_B$: Between Classes Scatter Matrix
$S_W$: Within Classes Scatter Matrix
FLD - $S_B$ Matrix
Measures the quality of separation between classes.
$$S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T$$
$$m = \frac{1}{n} \sum_{j=1}^{n} x_j$$
$$m_i = \frac{1}{n_i} \sum_{x_j \in c_i} x_j$$
$m$ is the average of all samples.
$m_i$ is the average of all samples belonging to cluster $i$.
$n$ is the number of samples.
$n_i$ is the number of samples belonging to cluster $i$.
FLD - $S_W$ Matrix
Measures the compactness of all classes.
It is the sum of the internal scattering of every class.
$$S_{W_i} = \sum_{x_j \in c_i} (x_j - m_i)(x_j - m_i)^T$$
$$S_W = \sum_{i=1}^{c} \sum_{x_j \in c_i} (x_j - m_i)(x_j - m_i)^T$$
Total Scattering
The total scattering is the sum of the internal scattering and the scattering between the classes:
$S_T = S_W + S_B$
In an optimal partition the separation between classes ($S_B$) must be maximum and the scattering within the classes ($S_W$) minimum.
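The decomposition can be verified by splitting each deviation from the global mean around its class mean. The derivation below assumes that $S_T$ denotes the scatter of all samples around $m$, which the slide does not spell out.

```latex
\begin{aligned}
S_T &= \sum_{i=1}^{c} \sum_{x_j \in c_i} (x_j - m)(x_j - m)^T \\
    &= \sum_{i=1}^{c} \sum_{x_j \in c_i}
       \bigl[(x_j - m_i) + (m_i - m)\bigr]\bigl[(x_j - m_i) + (m_i - m)\bigr]^T \\
    &= \underbrace{\sum_{i=1}^{c} \sum_{x_j \in c_i} (x_j - m_i)(x_j - m_i)^T}_{S_W}
     + \underbrace{\sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T}_{S_B}
\end{aligned}
```

The cross terms vanish because $\sum_{x_j \in c_i} (x_j - m_i) = 0$ for every class, by the definition of $m_i$.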
J criterion
Fisher defined the $J$ criterion, which must be maximized:
$$J = \frac{|S_B|}{|S_W|}$$
A simplified way to evaluate $J$ is
$$J = \frac{\operatorname{trace}(S_B)}{\operatorname{trace}(S_W)}$$
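The trace form never needs the full scatter matrices, because the trace of an outer product $a a^T$ is the squared norm of $a$. The sketch below uses that shortcut for a crisp partition given as a label list; the function name `fisher_j_trace` is hypothetical.

```python
def fisher_j_trace(X, labels):
    """J = trace(S_B) / trace(S_W) for a crisp partition.

    trace((a)(a)^T) equals the squared norm of a, so only squared
    distances to the class means and to the global mean are accumulated.
    """
    n, dim = len(X), len(X[0])
    mean = [sum(x[d] for x in X) / n for d in range(dim)]

    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)

    trace_sb = 0.0
    trace_sw = 0.0
    for members in groups.values():
        n_i = len(members)
        m_i = [sum(x[d] for x in members) / n_i for d in range(dim)]
        trace_sb += n_i * sum((m_i[d] - mean[d]) ** 2 for d in range(dim))
        trace_sw += sum((x[d] - m_i[d]) ** 2 for x in members for d in range(dim))
    return trace_sb / trace_sw  # undefined if every class is a single point


X = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(fisher_j_trace(X, [0, 0, 1, 1]))  # natural grouping: large J
print(fisher_j_trace(X, [0, 1, 0, 1]))  # poor grouping: small J
```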
J comments
$J$ may vary in the interval $0 \le J \le \infty$.
$J$ is strictly rigid: it applies only to crisp partitions.
$J$ loses precision as the overlap between samples increases.
EFLD
EFLD measures the compactness and separation of all categories when fuzzy partitions are created.
EFLD uses two matrices:
$S_{Be}$: Between Classes Scatter Matrix
$S_{We}$: Within Classes Scatter Matrix
EFLD - $S_{Be}$ Matrix
Measures the quality of separation between classes.
$$S_{Be} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (m_{ei} - m)(m_{ei} - m)^T$$
$$m = \frac{1}{n} \sum_{j=1}^{n} x_j$$
$$m_{ei} = \frac{\sum_{j=1}^{n} \mu_{ij} x_j}{\sum_{j=1}^{n} \mu_{ij}}$$
EFLD - $S_{We}$ Matrix
Measures the compactness of all classes.
It is the sum of the internal scattering of every class.
$$S_{We} = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} (x_j - m_{ei})(x_j - m_{ei})^T$$
Total Scattering
The total scattering is the sum of the internal scattering and the scattering between the classes:
$S_{Te} = S_{We} + S_{Be}$
In an optimal partition the separation between classes ($S_{Be}$) must be maximum and the scattering within the classes ($S_{We}$) minimum.
$J_e$ criterion
$J_e$ is the criterion that must be maximized:
$$J_e = \frac{|S_{Be}|}{|S_{We}|}$$
A simplified way to evaluate $J_e$ is
$$J_e = \frac{\operatorname{trace}(S_{Be})}{\operatorname{trace}(S_{We})}$$
Simplifying the $J_e$ criterion
It can be proved that $S_T$ is constant, with trace equal to
$$s_T = \operatorname{trace}(S_T) = \sum_{j=1}^{n} \lVert x_j - m \rVert^2$$
so a simplified way to evaluate $J_e$ is
$$J_e = \frac{s_{Be}}{s_{We}} = \frac{s_{Be}}{s_T - s_{Be}}$$
where $s_{Be} = \operatorname{trace}(S_{Be})$ and $s_{We} = \operatorname{trace}(S_{We})$.
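The simplification can be checked numerically: the traces of the fuzzy between-class and within-class scatter matrices should add up to the total scatter of the data, whatever the memberships are (as long as each column of the membership matrix sums to 1). The sketch below is illustrative; the function name `efld_traces` and the sample data are assumptions.

```python
def efld_traces(X, U):
    """Return (s_Be, s_We, s_T): traces of the fuzzy scatter matrices.

    X: list of n points; U: c x n membership matrix whose columns sum to 1.
    """
    n, dim, c = len(X), len(X[0]), len(U)
    m = [sum(x[d] for x in X) / n for d in range(dim)]
    s_t = sum((x[d] - m[d]) ** 2 for x in X for d in range(dim))

    s_be = 0.0
    s_we = 0.0
    for i in range(c):
        weight = sum(U[i])
        # Membership-weighted mean of class i
        m_ei = [sum(U[i][j] * X[j][d] for j in range(n)) / weight for d in range(dim)]
        s_be += weight * sum((m_ei[d] - m[d]) ** 2 for d in range(dim))
        s_we += sum(
            U[i][j] * (X[j][d] - m_ei[d]) ** 2 for j in range(n) for d in range(dim)
        )
    return s_be, s_we, s_t


X = [(0, 0), (0, 2), (10, 0), (10, 2)]
U = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8]]
s_be, s_we, s_t = efld_traces(X, U)
print(s_be + s_we, s_t)             # the two totals agree
print(s_be / (s_t - s_be))          # J_e without computing s_We explicitly
```

Since $s_T$ depends only on the data, evaluating $J_e$ for many candidate partitions requires recomputing only $s_{Be}$.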
$J_e$ comments
$J_e$ may vary in the interval $0 \le J_e \le \infty$.
$J_e$ is strictly rigid.
$J_e$ loses precision as the overlap between samples increases.
Applying EFLD
EFLD by number of categories
Samples   2       3       4       5       6
X1        4.6815  4.9136  0.2943  0.2559  0.3157
X2        0.3271  0.8589  0.8757  0.9608  1.0674
Cluster Validity
Inter Class Contrast
Comments
EFLD:
Increases as the number of clusters rises.
Increases when classes have a high degree of overlap.
Reaches its maximum for a wrong number of clusters.
ICC
Evaluates both crisp and fuzzy clustering algorithms.
Measures partition compactness and partition separation.
ICC must be maximized.
ICC
$$ICC = \frac{s_{Be}}{n} \, D_{min} \, \sqrt{c}$$
$s_{Be}$ estimates the quality of the placement of the centres.
$1/n$ is a scale factor that compensates for the influence of the number of points on $s_{Be}$.
ICC - 2
$D_{min}$ is the minimum Euclidean distance between all pairs of centres.
It neutralizes the tendency of $s_{Be}$ to grow, preventing the maximum from being reached for a number of clusters greater than the ideal value.
When 2 or more clusters represent a single class, $D_{min}$ decreases abruptly.
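A minimal sketch of the index, under two stated assumptions: the formula is read as ICC = (s_Be / n) * D_min * sqrt(c), and the membership-weighted class means are used as the centres for both s_Be and D_min. The function name `icc` and the sample data are illustrative only.

```python
import math


def icc(X, U):
    """ICC = (s_Be / n) * D_min * sqrt(c), with centres taken as weighted means.

    X: list of n points; U: c x n membership matrix (crisp or fuzzy).
    """
    n, dim, c = len(X), len(X[0]), len(U)
    m = [sum(x[d] for x in X) / n for d in range(dim)]

    centres = []
    s_be = 0.0
    for i in range(c):
        weight = sum(U[i])
        centre = [sum(U[i][j] * X[j][d] for j in range(n)) / weight for d in range(dim)]
        centres.append(centre)
        s_be += weight * sum((centre[d] - m[d]) ** 2 for d in range(dim))

    # Minimum Euclidean distance between all pairs of centres
    d_min = min(
        math.dist(centres[i], centres[k]) for i in range(c) for k in range(i + 1, c)
    )
    return (s_be / n) * d_min * math.sqrt(c)


X = [(0, 0), (0, 2), (10, 0), (10, 2)]
two = icc(X, [[1, 1, 0, 0], [0, 0, 1, 1]])
three = icc(X, [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])  # one class split in two
print(two, three)
```

Splitting one class into two clusters raises $s_{Be}$ slightly but collapses $D_{min}$, so the index falls, which is the behaviour described for an excessive number of clusters.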
ICC Fuzzy Application
Five classes with 500 points each.
No class overlapping.
X1: (1, 2), (6, 2), (1, 6), (6, 6), (3.5, 9); Std 0.3.
Apply FCM for m = 2 and c = 2 ... 10.
ICC Fuzzy Application Results
Goal: M = the measure should be maximized, m = minimized.
                     Number of clusters
Measures  Goal  2       3       4       5
ICC       M     7.596   41.99   51.92   96.70
ICCTra    M     7.596   41.99   51.92   96.70
ICCDet    M     IND     154685  259791  673637
EFLD      M     0.185   0.986   1.877   13.65
EFLDTra   M     0.185   0.986   1.877   13.65
EFLDDet   M     IND     0.955   3.960   182.70
CS        m     0.350   0.096   0.070   0.011
F         M     0.705   0.713   0.795   0.943
MinHT     M     0.647   0.572   2.124   1.994
MeanHT    M     0.519   0.496   1.327   1.887
MinRF     0     0.100   0.316   0       0
ICC Fuzzy Application Time
               Number of categories
Time      2       3       4       5       Rank
ICC       0.0061  0.0069  0.0082  0.0091  4
ICCTra    0.0078  0.0060  0.0088  0.0110
ICCDet    0.0110  0.0088  0.0110  0.0132
EFLD      0.0053  0.0071  0.0063  0.0080
EFLDTra   0.7678  1.0870  1.4780  1.8982
EFLDDet   0.7800  1.1392  1.5510  2.0160
CS        0.0226  0.0261  0.0382  0.0476
NFI       0.0061  0.0056  0.0058  0.0060  3
F         0.0044  0.0045  0.0049  0.0049  1
FPI       0.0061  0.0045  0.0049  0.0053  2
Application with Overlapping
Five classes with 500 points each.
High cluster overlapping.
X1: (1, 2), (6, 2), (1, 6), (6, 6), (3.5, 9); Std 0.3.
Apply FCM for m = 2 and c = 2 ... 10.
Application Overlapping Results
Goal: M = the measure should be maximized, m = minimized.
                     Number of clusters
Measures  Goal  2       3       4       5       10
ICC       M     5.065   4.938   6.191   7.829   5.69
ICCTra    M     5.065   4.938   6.191   7.829   5.69
ICCDet    M     IND     715.19  3572    7048    6024
EFLD      M     0.450   0.585   0.839   1.095   1.344
EFLDTra   M     0.450   0.585   0.839   1.095   1.344
EFLDDet   M     IND     0.049   0.315   0.743   1.200
CS        m     0.164   0.225   0.191   0.122   0.223
F         M     0.754   0.621   0.591   0.586   0.439
MeanHT    M     0.632   0.485   0.550   0.597   0.429
MinRF     0     0.170   0.294   0.194   0.210   0.402
MPE       m     0.568   0.601   0.561   0.525   0.565
Application Time Results
               Number of clusters
Time      2       3       4       5       Rank
ICC       0.0060  0.0064  0.0077  0.0088  1
ICCTra    0.0066  0.0060  0.0098  0.0110
ICCDet    0.0110  0.0078  0.0110  0.0120
EFLD      0.0063  0.0088  0.0096  0.0110
EFLDTra   0.7930  2.1038  1.7598  2.2584
EFLDDet   0.9720  1.2580  1.6090  1.8450
CS        0.0220  0.0283  0.0362  0.0590  3
F         0.0112  0.0121  0.0061  0.0164
MPE       0.0167  0.0271  0.0319  0.0397  2
ICC conclusions
Fast and efficient.
Works with fuzzy and crisp partitions.
Efficient even with highly overlapping clusters.
High rate of correct results.