Research Article

An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO

Jian Zhang 1 and Ling Shen 2

1 School of Mechanical Engineering, Tongji University, Shanghai 200092, China
2 Precision Medical Device Department, University of Shanghai for Science and Technology, Shanghai 200093, China

Correspondence should be addressed to Jian Zhang; [email protected]

Received 8 June 2014; Revised 30 September 2014; Accepted 14 October 2014; Published 12 November 2014

Academic Editor: Daoqiang Zhang

Copyright © 2014 J. Zhang and L. Shen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Publishing Corporation, Computational Intelligence and Neuroscience, Volume 2014, Article ID 368628, 10 pages, http://dx.doi.org/10.1155/2014/368628

To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence in conventional fuzzy clustering, utilizes the vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. The new method uses the Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range, with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.

1. Introduction

Clustering is the process of assigning a homogeneous group of objects into subsets called clusters, so that objects in each cluster are more similar to each other than to objects from different clusters, based on the values of their attributes [1]. Clustering techniques have been studied extensively in data mining [2], pattern recognition [3], and machine learning [4].

Clustering algorithms can be generally grouped into two main classes, namely, supervised clustering and unsupervised clustering, where the parameters of the classifier are optimized. Many unsupervised clustering algorithms have been developed. One such algorithm is k-means, which assigns n objects to k clusters by minimizing the sum of squared Euclidean distances between the objects in each cluster and the cluster center. The main drawback of the k-means algorithm is that the result is sensitive to the selection of the initial cluster centroids and may converge to local optima [5].

For handling randomly distributed data sets, soft computing has been introduced into clustering [6]; it exploits the tolerance for imprecision and uncertainty in order to achieve tractability and robustness. Fuzzy sets and rough sets have been incorporated into the c-means framework to develop the fuzzy c-means (FCM) [7] and rough c-means (RCM) [8] algorithms.

Fuzzy algorithms can assign data objects partially to multiple clusters and handle overlapping partitions. The degree of membership in the fuzzy clusters depends on the closeness of the data object to the cluster centers. The most popular fuzzy clustering algorithm is FCM, which was introduced by Bezdek [9] and is now widely used. FCM is an effective algorithm, but the random selection of center points makes the iterative process fall into saddle points or local optimal solutions easily. Furthermore, if the data sets contain severe noise or are high dimensional, as in bioinformatics [10], the alternating optimization often fails to find the global optimum. In these cases, the probability of finding the global optimum can be increased by stochastic methods such as evolutionary or swarm-based methods. Bezdek and Hathaway [11] optimized the hard c-means (HCM) model with a genetic algorithm. Runkler [12] introduced an ant colony optimization algorithm which explicitly minimizes the HCM and FCM cluster models. Al-Sultan and Selim [13] proposed the simulated annealing algorithm (SA) to overcome some of these limits and obtained promising results.


PSO is a population-based optimization tool developed by Eberhart and Kennedy [14], which can be implemented and applied easily to solve various function optimization problems. Runkler and Katz [15] introduced two new methods for minimizing the reformulated objective functions of the FCM clustering model by PSO: PSO-V and PSO-U. In order to overcome the shortcomings of FCM, a PSO-based fuzzy clustering algorithm was discussed [16]; this algorithm uses the global search capacity of PSO to overcome the shortcomings of FCM. For finding more appropriate cluster centers, a generalized FCM optimized by the PSO algorithm [17] was proposed.

Shadowed sets are considered a conceptual and algorithmic bridge between rough sets and fuzzy sets; they thereby incorporate the generic merits of both and have been successfully used for unsupervised learning. Shadowed sets introduce a (0, 1) interval to denote the belongingness of clustered points, and the uncertainty among patterns lying in the shadowed region is efficiently handled in terms of membership. Thus, in order to disambiguate and capture the essence of a distribution, the concept of shadowed sets has recently been introduced [18]; it can also raise the efficiency of the iteration process for the new prototypes by eliminating some "bad points" that have a bad influence on the cluster structure [19, 20]. Compared with FCM, the capability of shadowed c-means is enhanced when dealing with outliers [21].

Although many clustering algorithms based on FCM, PSO, or shadowed sets have been proposed, most of them need the preestimated cluster number C as input. To obtain desirable cluster partitions in a given data set, C is commonly set manually, and this is a very subjective and somewhat arbitrary process. A number of approaches have been proposed to select the appropriate C. Bezdek et al. [22] suggested the rule of thumb C ≤ N^{1/2}, where the upper bound must be determined based on knowledge of, or applications involving, the data. Another approach is to use a cluster validity index as a measure criterion for the data partition, such as the Davies-Bouldin (DB) [23], Xie-Beni (XB) [24], and Dunn [25] indices. These indices often follow the principle that the distance between objects in the same cluster should be as small as possible and the distance between objects in different clusters should be as large as possible. They have also been used to acquire the optimal number of clusters C according to their maximum or minimum value.

Therefore, we wish to find the best C in some range, obtain cluster partitions by considering compactness and intercluster separation, and reduce the sensitivity to initial values. Here we propose a modified algorithm, named SP-FCM, which integrates the merits of PSO and interleaves shadowed sets between stabilization iterations. It can automatically estimate the optimal cluster number, with a faster initialization than our previous approach.

The structure of the paper is as follows. Section 2 outlines all necessary prerequisites. In Section 3, a new clustering approach called SP-FCM is presented for automatically finding the optimal cluster number. Section 4 includes the results of experiments involving UCI data sets, yeast gene expression data sets, and a real data set. In Section 5, the main conclusions are covered.

2. Related Clustering Algorithms

In this section, we briefly describe some basic concepts of FCM, PSO, shadowed sets, and the XB validity index, and review the PSO-based clustering method.

2.1. FCM. We define X = {x_1, ..., x_N} as the universe of a clustering data set, B = {β_1, ..., β_C} as the prototypes of C clusters, and U = [u_{ij}]_{N×C} as a fuzzy partition matrix, where u_{ij} ∈ [0, 1] is the membership of x_i in the cluster with prototype β_j; x_i, β_j ∈ R^P, where P is the data dimensionality, 1 ≤ i ≤ N, and 1 ≤ j ≤ C. The FCM algorithm is derived by minimizing the objective function [22]

    J_FCM(U, B; X) = Σ_{j=1}^{C} Σ_{i=1}^{N} u_{ij}^m d_{ij}^2(x_i, β_j),    (1)

where m > 1.0 is the weighting exponent on each fuzzy membership and d_{ij} is the Euclidean distance from data vector x_i to cluster center β_j, subject to

    Σ_{j=1}^{C} u_{ij} = 1,  ∀i = 1, 2, ..., N,
    0 < Σ_{i=1}^{N} u_{ij} < N,  ∀j = 1, 2, ..., C,
    d_{ij} = ‖x_i − β_j‖.    (2)

This produces the following update equations:

    u_{ij} = ( Σ_{k=1}^{C} ( d(x_i, β_j) / d(x_i, β_k) )^{2/(m−1)} )^{−1},    (3)

    β_j = ( Σ_{i=1}^{N} (u_{ij})^m x_i ) / ( Σ_{i=1}^{N} (u_{ij})^m ).    (4)

After computing the memberships of all the objects, the new prototypes of the clusters are calculated. The process stops when the prototypes stabilize, that is, when the prototypes from the previous iteration are in close proximity to those generated in the current iteration, normally within an error threshold.
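The two coupled updates (3) and (4) vectorize naturally. The sketch below is illustrative only; the function name `fcm_update` and the NumPy layout are ours, not the authors' implementation:

```python
import numpy as np

def fcm_update(X, B, m=2.0):
    """One FCM iteration: membership update, Eq. (3), then prototype update, Eq. (4).

    X: (N, P) data matrix; B: (C, P) current prototypes; m: fuzzifier (> 1).
    """
    # d[i, j] = Euclidean distance from x_i to prototype beta_j
    d = np.linalg.norm(X[:, None, :] - B[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # guard against division by zero at exact centers
    # Eq. (3): u_ij = ( sum_k (d_ij / d_ik)^(2/(m-1)) )^(-1)
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    U = 1.0 / ratio.sum(axis=2)
    # Eq. (4): beta_j = sum_i u_ij^m x_i / sum_i u_ij^m
    Um = U ** m
    B_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return U, B_new
```

Iterating this pair until the prototypes move less than an error threshold reproduces the stopping rule described above.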

2.2. PSO. PSO was originally introduced in terms of the social and cognitive behavior of bird flocking and fish schooling. The potential solutions, called particles, fly through the problem space by following the current best particles. Each particle keeps track of its coordinates in the problem space, which are associated with the best solution it has achieved so far; this fitness value is stored and is called p_best. Another best value tracked by PSO is the best value obtained so far by any particle in the swarm; this global best is called g_best. The search for better positions follows the rule

    V(t + 1) = wV(t) + c_1 r_1 (p_best(t) − P(t)) + c_2 r_2 (g_best(t) − P(t)),
    P(t + 1) = P(t) + V(t + 1),    (5)

where P and V are the position and velocity vectors of a particle, respectively; w is the inertia weight; c_1 and c_2 are positive constants called acceleration coefficients, which control the influence of p_best and g_best in the search process; and r_1 and r_2 are random values in the range [0, 1]. The fitness value of each particle's position is determined by a fitness function, and PSO is usually executed with repeated application of (5) until a specified number of iterations has been exceeded or the velocity updates are close to zero over a number of iterations.
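Equation (5) maps directly onto array operations. A minimal sketch of one swarm update (the helper name is ours; drawing fresh r_1, r_2 per dimension is one common convention the paper does not spell out):

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(P, V, p_best, g_best, w=0.72, c1=1.49, c2=1.49):
    """One velocity/position update per Eq. (5) for a swarm of L particles.

    P, V, p_best: (L, D) arrays; g_best: (D,) swarm-best position.
    """
    r1 = rng.random(P.shape)  # random values in [0, 1]
    r2 = rng.random(P.shape)
    V_new = w * V + c1 * r1 * (p_best - P) + c2 * r2 * (g_best - P)
    P_new = P + V_new
    return P_new, V_new
```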

2.3. PSO-Based FCM. In this algorithm [26], each particle Part_l represents a cluster center vector, which is constructed as

    Part_l = (P_{l1}, ..., P_{lj}, ..., P_{lC}),    (6)

where l represents the lth particle, l = 1, 2, ..., L; L is the number of particles, with L < N; and P_{lj} is the jth cluster center of particle Part_l. Therefore, a swarm represents a number of candidate cluster centers for the data vectors. Each data vector belongs to a cluster according to its membership function, and thus a fuzzy membership is assigned to each data vector. Each cluster has a cluster center per iteration and presents a solution, which gives a vector of cluster centers. This method determines the position vector Part_l for every particle, updates it, and then changes the position of the cluster centers. The fitness function for evaluating the generalized solutions is stated as

    F(P) = 1 / J_FCM.    (7)

The smaller J_FCM is, the better the clustering effect and the higher the fitness F(P).
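The fitness (7) is just the reciprocal of the FCM objective (1). A small illustrative helper (the function names are ours, and this is a sketch rather than the cited implementation):

```python
import numpy as np

def jfcm(X, B, U, m=2.0):
    """Objective (1): sum_j sum_i u_ij^m * ||x_i - beta_j||^2."""
    d2 = np.linalg.norm(X[:, None, :] - B[None, :, :], axis=2) ** 2
    return float(((U ** m) * d2).sum())

def fitness(X, B, U, m=2.0):
    """Eq. (7): smaller J_FCM yields larger fitness."""
    return 1.0 / jfcm(X, B, U, m)
```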

2.4. Shadowed Sets. Conventional uncertainty models like fuzzy sets tend to capture vagueness through membership values and associate precise numeric values of membership with vague concepts. By introducing an α-cut [19], a fuzzy set can be converted into a classical set. Shadowed sets map each element of a given fuzzy set into 0, 1, and the unit interval [0, 1], namely, excluded, included, and uncertain, respectively.

For constructing a shadowed set, Mitra et al. [21] proposed an optimization based on the balance of vagueness. By elevating the membership values of some regions to 1 and, at the same time, reducing the membership values of other regions to 0, the uncertainty in these regions can be eliminated. To keep the balance of the total uncertainty regions, these changes need to be compensated by the construction of uncertain regions, namely shadowed sets, which absorb the previous elimination of partial membership at the low and high ranges. The shadowed sets induced by a fuzzy membership function are shown in Figure 1.

[Figure 1: Shadowed sets induced by fuzzy function f(x); the regions Ω_1, Ω_2, and Ω_3 are delimited by the threshold levels α and 1 − α.]

Here, x denotes the objects and f(x) ∈ [0, 1] is the continuous membership function of the objects belonging to a cluster. The symbol Ω_1 shows the reduction of membership, Ω_2 depicts the elevation of membership, and Ω_3 shows the formation of shadows. In order to balance the total uncertainty, the retention of balance translates into the following dependency:

    Ω_1 + Ω_2 = Ω_3.    (8)

The integral forms are given as

    Ω_1 = ∫_{x : f(x) ≤ α} f(x) dx,
    Ω_2 = ∫_{x : f(x) ≥ 1 − α} (1 − f(x)) dx,    (9)
    Ω_3 = ∫_{x : α < f(x) < 1 − α} dx.

The thresholds for reducing and elevating are α and 1 − α (α ∈ (0, 0.5)). The optimal value of α is acquired by translating it into the minimization of the following objective function:

    O(α) = |Ω_1 + Ω_2 − Ω_3|.    (10)


For a fuzzy set with a discrete membership function, the balance equation is modified as

    O(α_j) = | Σ_{u_{ij} ≤ α_j} u_{ij} + Σ_{u_{ij} ≥ u_{j,max} − α_j} (u_{j,max} − u_{ij}) − card{u_{ij} | α_j < u_{ij} < u_{j,max} − α_j} |.    (11)

In order to find the best α_j, it should satisfy the following optimization problem:

    α_j = arg min_{α_j} O(α_j),    (12)

where u_{ij} ∈ [0, 1] is the membership of x_i in the cluster with prototype β_j; u_{j,max} and u_{j,min} denote the highest and lowest membership values of the jth cluster; and α_j is the threshold of the jth cluster. The range of feasible values of the threshold α_j is [u_{j,min}, (u_{j,min} + u_{j,max})/2] [19].

This approach considers all membership values with respect to a fixed cluster when updating the prototype of that cluster. The main merits of shadowed sets are the optimization mechanism for choosing a separate threshold for each cluster and the reduction of the burden of plain numeric computation.
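Since (12) is a one-dimensional problem per cluster over a bounded feasible range, it can be approximated by a simple grid search. This is our own sketch under that assumption; the paper does not specify which solver it uses:

```python
import numpy as np

def optimal_alpha(u):
    """Grid-search the threshold alpha_j minimizing the discrete balance (11)
    for one column u = (u_1j, ..., u_Nj) of the partition matrix.

    Feasible range of alpha_j is [u_min, (u_min + u_max) / 2] [19].
    """
    u = np.asarray(u, dtype=float)
    u_min, u_max = u.min(), u.max()
    best_alpha, best_obj = u_min, np.inf
    for alpha in np.linspace(u_min, (u_min + u_max) / 2.0, 200):
        reduced = u[u <= alpha].sum()                     # low memberships pushed to 0
        elevated = (u_max - u[u >= u_max - alpha]).sum()  # high memberships pushed to u_max
        shadow = np.count_nonzero((u > alpha) & (u < u_max - alpha))
        obj = abs(reduced + elevated - shadow)
        if obj < best_obj:
            best_alpha, best_obj = alpha, obj
    return best_alpha
```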

2.5. XB Clustering Validity Index. The clustering algorithms described above require prespecification of the number of clusters, and the partition results depend on the choice of C. Validity indices exist to evaluate the goodness of clustering for a given number of clusters, so they can be used to acquire the optimal value of C [27].

The XB index presents a fuzzy-validity criterion based on a validity function which identifies overall compact and separate fuzzy c-partitions. This function depends on the data set, the geometric distance measure, the distance between cluster centroids, and the fuzzy partition, irrespective of the fuzzy algorithm used. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account. For the FCM algorithm with m = 2.0, the Xie-Beni index can be shown to be

    S_XB = J_FCM / (N d_min^2),    (13)

where d_min = min_{i≠j} ‖β_i − β_j‖ is the minimum distance between cluster centroids. The more separate the clusters, the larger d_min and the smaller S_XB.
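Equation (13) can be computed directly from the data, prototypes, and partition matrix; a small sketch (the function name is ours):

```python
import numpy as np

def xie_beni(X, B, U, m=2.0):
    """Eq. (13): S_XB = J_FCM / (N * d_min^2), with d_min the smallest
    pairwise distance between cluster centroids."""
    d2 = np.linalg.norm(X[:, None, :] - B[None, :, :], axis=2) ** 2
    j_fcm = ((U ** m) * d2).sum()
    centroid_dist = np.linalg.norm(B[:, None, :] - B[None, :, :], axis=2)
    d_min = centroid_dist[~np.eye(len(B), dtype=bool)].min()  # exclude diagonal zeros
    return j_fcm / (len(X) * d_min ** 2)
```

Lower values indicate a partition that is both compact and well separated, which is why SP-FCM selects the C minimizing this index.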

3. Shadowed Sets-Based PSO-Fuzzy Clustering: SP-FCM

FCM strives to find C compact clusters in X, where C is one of the specified parameters. But the process of selecting and adjusting C manually to obtain desirable cluster partitions in a given data set is very subjective and somewhat arbitrary. To seek the optimal cluster structure, C is usually allowed to be overestimated [28], in which case the distances between some clusters are not big enough, or the membership values of some objects in different clusters are adjacent and ambiguous; in that case, the modification of prototypes through long iterations is meaningless.

The main subject of cluster validation is the evaluation of clustering results to find the partitioning that best fits the data set. Based on the foregoing algorithms, we wish to find cluster partitions that contain compact and well-separated clusters. In our algorithm, C is also overestimated and the clusters compete for data membership. We can set [C_min, C_max] as the reasonable range of cluster number based on knowledge of the data. This provides a more transparent and tractable process of cluster number reduction. Considering the fuzzy partition matrix U = [u_{ij}]_{N×C}, each column is comprised of the membership values of all feature vectors x_i with a single cluster center. Thus, an optimal threshold α_j (j = 1, 2, ..., C) for each column should be found to create a harder partition by (12). The amount of data assigned a membership value equal to 1 is identified as the cardinality of the corresponding cluster. According to α_j, the cardinality of the jth column is

    M_j = card{u_{ij} | u_{ij} ≥ u_{j,max} − α_j}.    (14)

Here, the threshold is not subjectively user-defined; it is established on the balance of uncertainty and can be adjusted automatically during the clustering process. This property of shadowed sets can be used to reduce the cluster number. In order to control the convergence speed, the decision to delete clusters can be based on some thresholds; different threshold values should be set for different data sets, depending on the cluster structure and the size of the data set. Here, a threshold ε and an attrition rate ρ (0 < ρ < 1) are set. The decision to delete clusters in SP-FCM is based solely on cluster cardinality and the threshold ε. If ε is too small, C is reduced more slowly and the process may stop prematurely before the optimal cluster number is found; on the other hand, if ε is too large, C may be reduced too drastically. In our method, clusters whose cardinalities satisfy M_j < ε are considered as "candidates" for removal, and we can remove up to ⌊ρ × C⌋ clusters having the lowest cardinality from the pool of candidates specified by ε. Limiting the number of clusters that can be removed at one time prevents C from being reduced too drastically when ε is set too high for a given data set. This automatically estimates the best cluster number while also utilizing a faster, consistent, and repeatable initialization technique. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account; hence, the XB index is adopted.

For each C in the range [C_min, C_max], a set of cluster validity indices is calculated, where C_max is the initial cluster number, set to be much larger than the expected cluster number. The partition matrix with the C clusters having the best aggregate validity index is selected as the final cluster partition. The SP-FCM algorithm is summarized in Algorithm 1.


Algorithm 1: SP-FCM.

(1) Initialize C_max and C_min; let C = C_max; set the real number m, iteration counter k = 0, iteration counter t = 0, and maximum iteration number T of PSO.
(2) Initialize the population size L, the initial velocity and position of the particles, c_1, c_2, w, the threshold ε, and the attrition rate ρ.
(3) DO
      Repeat
        (a) Update partition matrix U(k) for all particles by (3).
        (b) Calculate the cluster center for each particle by (4).
        (c) Calculate the fitness value for each particle by (7).
        (d) Calculate p_best for each particle.
        (e) Calculate g_best for the swarm.
        (f) Update the velocity and position of each particle by (5).
        (g) t = t + 1.
      Until the PSO termination condition is met (*)
      (i) Calculate the optimal threshold α_j (j = 1, 2, ..., C) for each column of partition matrix U(k) by (12), and relocate u_{ij} (1 ≤ i ≤ N) of the jth cluster according to α_j.
      (ii) Calculate the cardinality M_j (1 ≤ j ≤ C) for each cluster on the basis of the number of data whose membership value equals 1, by (14).
      (iii) Remove all clusters whose M_j < ε and whose M_j is among the ⌊ρ × C⌋ lowest cardinalities.
      (iv) Update the cluster number C.
      (v) Calculate the cluster validity index S_XB(k) by (13).
      (vi) Update the iteration counter k = k + 1.
    While the termination condition is not met (**)

(*) The PSO termination condition in this method is t ≥ T (the maximum number of iterations is reached) or the velocity updates are close to zero over a number of iterations.
(**) The algorithm terminates under either of the following two conditions: (1) the prototype parameters in B stabilize within some threshold δ; (2) the number of clusters has reached the minimum limit C_min.

Here, if ⌊ρ × C⌋ equals 0, we let it be 1; this means that the cluster with the lowest cardinality may be removed. The initial C_max cluster prototypes can be initialized using exemplars selected from the data points by β_j = x_{⌊(N/C_max) j⌋}. After termination, the B and U from the C ∈ [C_min, C_max] with the best cluster validity index S_XB are selected as the final cluster prototypes and partition.
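The cardinality-based removal rule just described (candidates with M_j < ε, at most ⌊ρ × C⌋ removed per cycle, lifted to 1 when the floor is 0) can be sketched as follows; the helper name and data layout are our own:

```python
import math
import numpy as np

def reduce_clusters(cardinalities, epsilon, rho):
    """Return indices of clusters to remove in one SP-FCM reduction cycle.

    Candidates are clusters with M_j < epsilon; at most floor(rho * C)
    (at least 1) of the lowest-cardinality candidates are removed.
    """
    M = np.asarray(cardinalities)
    C = len(M)
    budget = max(1, math.floor(rho * C))   # floor(rho * C), lifted to 1 if it is 0
    candidates = np.flatnonzero(M < epsilon)
    # take the lowest-cardinality candidates first, within the budget
    order = candidates[np.argsort(M[candidates])]
    return order[:budget].tolist()
```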

4. Experimental Results

In this section, the performance of the FCM, RCM, shadowed c-means (SCM) [21], shadowed rough c-means (SRCM) [19], and SP-FCM algorithms is presented on four UCI data sets, four yeast gene expression data sets, and real data. For evaluating the convergence effect, the fundamental criterion can be described as follows: the distance between different objects in the same cluster should be as small as possible, and the distance between objects in different clusters should be as large as possible. Here we use the DB index and the Dunn index to evaluate the clustering effect. For a given data set and C value, the higher the similarity within the clusters and the intercluster separation, the lower the DB index value; a good clustering procedure should make the DB index as low as possible. Conversely, higher values of the Dunn index indicate better clustering, in the sense that the clusters are well separated and relatively compact. The details of the experiments are given below.
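For reference, a hard-partition Dunn index can be sketched as below. Several variants of this index exist, and the paper does not state which inter-cluster distance and diameter definitions it uses, so the single-linkage separation and complete diameter chosen here are assumptions:

```python
import numpy as np

def dunn_index(clusters):
    """Dunn index: minimum inter-cluster distance divided by the maximum
    intra-cluster diameter (one common convention among several)."""
    def diam(A):
        if len(A) < 2:
            return 0.0
        d = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)
        return d.max()

    def link(A, B):
        d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return d.min()

    k = len(clusters)
    min_sep = min(link(clusters[i], clusters[j])
                  for i in range(k) for j in range(i + 1, k))
    max_diam = max(diam(c) for c in clusters)
    return min_sep / max_diam
```

Larger values mean the clusters are well separated relative to their spread, matching the criterion used in the tables below.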

4.1. UCI Data Sets. In our experiments, four UCI data sets are used in total: the 4-dimensional Iris, 13-dimensional Wine, 10-dimensional Glass, and 34-dimensional Ionosphere sets. There are 3 clusters in the Iris data set, each of which has 50 data patterns; 3 clusters in the Wine data set, which have 50, 60, and 68 data patterns; 6 clusters in the Glass data set, which have 30, 35, 40, 42, 36, and 31 patterns, respectively; and 2 clusters in the Ionosphere data set, which have 226 and 125 data patterns. The validity indices of each method are compared in Table 1. SP-FCM can identify compact groups compared to the other algorithms when given the cluster number C. It can also be seen that SRCM and SP-FCM have more obvious advantages than FCM, RCM, and SCM. SP-FCM performs slightly better than SRCM in most cases due to its global search ability, which enables it to converge to an optimum or near-optimum solution. Moreover, the shadowed set- and rough set-based clustering methods, namely SP-FCM, SRCM, RCM, and SCM, perform better than FCM. This implies that the partition into approximation regions can reveal the nature of the data structure, and that only the lower bound and boundary region of each cluster contribute positively to the process of updating the prototypes.


Table 1: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on four UCI data sets.

DB index:
  Algorithm   Iris     Wine     Glass    Ionosphere
  FCM         0.7642   0.8803   2.2971   2.0587
  RCM         0.6875   0.5692   1.9635   1.5434
  SCM         0.6862   0.5327   1.8495   1.4763
  SRCM        0.6613   0.4436   1.5804   1.3971
  SP-FCM      0.6574   0.4328   1.5237   1.4066

Dunn index:
  Algorithm   Iris     Wine     Glass    Ionosphere
  FCM         2.3106   2.5834   0.1142   0.8381
  RCM         2.7119   2.8157   0.2637   1.0233
  SCM         2.4801   2.7992   0.3150   1.0319
  SRCM        3.0874   3.1342   0.5108   1.1924
  SP-FCM      3.3254   3.1764   0.4921   1.2605

[Figure 2: XB validity index of the four UCI data sets as the cluster number C decreases: (a) Iris (C from 12 down to 2), (b) Wine (C from 13 down to 2), (c) Glass (C from 14 down to 2), (d) Ionosphere (C from 16 down to 2).]

As usual, the number of clusters is implied by the nature of the problem; here, with shadowed sets involved, one can anticipate that the optimal number of clusters will be found. The fuzzification coefficient m can be optimized; however, it is common to assume a fixed value of 2.0, which is associated with the form of the membership functions of the generated clusters. For testing the SP-FCM algorithm, the rule C ≤ N^{1/2} is adopted, and the range of the expected cluster number is set as: (1) Iris, [C_min = 2, C_max = 12]; (2) Wine, [C_min = 2, C_max = 13]; (3) Glass, [C_min = 2, C_max = 14]; (4) Ionosphere, [C_min = 2, C_max = 16]. The swarm size is set as L = 20, the maximum iteration number of PSO is T = 50, and, for cluster reduction, the cluster cardinality threshold is ε = 10 and the attrition rate is ρ = 0.1. In each cycle we get the distribution of every cluster, remove part of them according to their cardinality, and calculate the XB index, while the cluster number C varies from C_max to C_min. After the loop ends, the partition with the lowest value is selected as the final result. Figure 2 presents the validity indices in the process of generating the optimal cluster number. Smaller


Table 2 Performance of FCM RCM SCM SRCM and SP-FCM on four yeast expression data sets

Index        Algorithm   GDS608   GDS2003  GDS2267  GDS2712

DB index     FCM         2.0861   2.4671   1.5916   1.9526
             RCM         1.6109   2.2104   1.0274   1.2058
             SCM         1.5938   2.1346   0.8946   1.0965
             SRCM        1.3274   1.9523   0.7438   0.7326
             SP-FCM      1.2958   1.8946   0.7962   0.6843

Dunn index   FCM         0.2647   0.2976   0.4208   0.3519
             RCM         0.3789   0.3981   0.7164   0.6074
             SCM         0.3865   0.3775   0.8439   0.6207
             SRCM        0.5126   0.4953   0.9759   0.8113
             SP-FCM      0.5407   0.5026   0.9182   0.8049

values indicate more compact and well-separated clusters. The validity indices reach their minimum values at C = 3, 3, 6, and 2, respectively, which correspond to the final cluster prototypes and the best partitions. Through the shadowed sets and PSO approaches, the influence of each boundary region on the formation of the prototypes and the clusters can be properly resolved. Although more computing time is required to run SP-FCM, reasonable results can be acquired when processing overlapping and vague data patterns.

4.2. Yeast Gene Expression Data Sets. Four yeast gene expression data sets are used in the experiments: GDS608, GDS2003, GDS2267, and GDS2712, downloaded from the Gene Expression Omnibus. GDS608 contains 26 classes and 6303 samples; GDS2003 contains 23 classes and 5617 samples; GDS2267 contains 14 classes and 9275 samples; and GDS2712 contains 15 classes and 9275 samples. Table 2 presents the validity indices of the different methods after the cluster number C was given. SP-FCM and SRCM obtain the same effect and perform better than the other clustering algorithms. The improvement can be attributed to the fact that the global search capacity of PSO is conducive to finding more appropriate cluster centers while escaping from local optima.

For obtaining the optimum C automatically, we let m = 2.0, c1 = 1.49, c2 = 1.49, and w = 0.72, and the rule C ≤ N^(1/2) is adopted. The swarm size is set as L = 20 and the maximum iteration number of PSO as T = 80; for cluster reduction, the range of the expected cluster number, the cluster cardinality threshold ε, and the attrition rate ρ are set as follows: (1) GDS608: [C_min = 20, C_max = 80], ε = 20, ρ = 0.05; (2) GDS2003: [C_min = 20, C_max = 75], ε = 20, ρ = 0.05; (3) GDS2267: [C_min = 10, C_max = 96], ε = 20, ρ = 0.08; (4) GDS2712: [C_min = 10, C_max = 96], ε = 20, ρ = 0.08. In each cycle we obtain the distribution of every cluster, remove part of them according to their cardinality, and calculate the XB index, while the cluster number C varies from C_max to C_min. The partition with the lowest value is selected as the

final result after the loop ends. As seen in Figure 3, for GDS608 the cluster number decreases at a faster rate at the beginning: it takes 26 iterations to reduce the cluster number from C = 80 to C = 30 and 4 iterations from C = 30 to C = 26, and the XB index begins to increase when the cluster number C < 26. For GDS2003 it takes 24 iterations to reduce the cluster number from C = 75 to C = 30 and 7 iterations from C = 30 to C = 23, and the XB index begins to increase when C < 23. For GDS2267 it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 6 iterations from C = 20 to C = 14, and the XB index begins to increase when C < 14. For GDS2712 it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 5 iterations from C = 20 to C = 15, and the XB index begins to increase when C < 15. Here the advantages of fuzzy sets, PSO, and shadowed sets are integrated in SP-FCM, making the algorithm applicable to overlapping partitions and to the uncertainty and vagueness arising from the boundary regions; moreover, the optimization process in the shadowed sets makes this method robust to outliers, so that the approximation regions of each cluster can be determined accurately and the obtained prototypes approach the desired locations.

4.3. Real Data. In this experiment, a total of 10 different packages are tested. Each package is represented by 100 frames captured from different angles by a camera, and from each frame SIFT feature points are extracted, which are used for training a recognition system. Figure 4 shows some images with their SIFT keypoints. This data set comprises 248,150 descriptors. We let m = 2.0, c1 = 1.49, c2 = 1.49, w = 0.72, L = 20, ε = 30, and ρ = 0.01 for the SP-FCM and choose the reasonable range [C_min = 200, C_max = 360] according to the number of package categories and the distribution of keypoints in each image. Eighty iterations of PSO are run on each given C to produce the cluster prototypes B and partition matrix U as the starting point for the shadowed sets. Longer PSO stabilization is needed to obtain more stable cluster partitions.


Figure 3: XB validity index of four yeast gene expression data sets with cluster number C: (a) GDS608; (b) GDS2003; (c) GDS2267; (d) GDS2712.

Table 3: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on the package data set.

Index        FCM       RCM       SCM       SRCM      SP-FCM

DB index     18.4569   15.9671   14.3194   12.4038   10.7826
Dunn index    9.2647   11.6298   12.5376   16.9422   16.7313

Within each cluster, the optimal α_j decides the cardinality and realizes cluster reduction, and the XB index is calculated. Each C-partition is ranked using this index, and the one with the smallest index value, indicating the most compact and well-separated clusters, is selected as the final output. At the beginning the cluster number decreases at a faster speed: it takes 26 iterations to reduce the cluster number from C = 360 to C = 289 and 20 iterations from C = 289 to C = 267. The XB index increases at a relatively faster rate when the cluster number C < 267. Figure 5 shows the XB index for C ∈ [267, 289]. The index reaches its minimum value at C = 276, which means the best partition for this data set is 276 clusters. Table 3 exhibits the comparative analysis of the convergence effect. As expected, SP-FCM provides sound results for the real data; the performance is assessed by these validity indices.

5. Conclusions

This paper presents a modified fuzzy c-means algorithm based on particle swarm optimization and shadowed sets to perform unsupervised feature clustering. This algorithm, called SP-FCM, utilizes the global search property of PSO and the vagueness balance property of shadowed sets, such that it can estimate the optimal cluster number as it runs through its alternating optimization process. SP-FCM, as a randomized approach, has the capability to alleviate the problems faced by FCM, which suffers from sensitivity to initialization and from falling into local minima. Moreover, this algorithm avoids the subjective and somewhat arbitrary trials to estimate the appropriate value of the cluster number, and it enhances the capability to find the optimal cluster number within a specific


Figure 4: Ten package images with SIFT features: (a) 644, (b) 460, (c) 742, (d) 442, (e) 313, (f) 602, (g) 373, (h) 724, (i) 539, and (j) 124 features.

Figure 5: XB validity index of the bag data set with cluster number C.

range, using cluster validity measures as indicators. The use of the XB validity index allows the algorithm to find the optimum cluster number with cluster partitions that provide compact and well-separated clusters. The experiments have shown that the SP-FCM algorithm produces good results with reference to the DB and Dunn indices, especially for high-dimension and large data cases.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant no. 61105089, NSFC).

References

[1] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.

[2] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering," ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 1, article 1, 2009.

[3] P. Melin and O. Castillo, "A review on type-2 fuzzy logic applications in clustering, classification and pattern recognition," Applied Soft Computing Journal, vol. 21, pp. 568–577, 2014.

[4] M. Yuwono, S. W. Su, B. D. Moulton, and H. T. Nguyen, "Data clustering using variants of rapid centroid estimation," IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 366–377, 2014.

[5] S. C. Satapathy, G. Pradhan, S. Pattnaik, J. V. R. Murthy, and P. V. G. D. P. Reddy, "Performance comparisons of PSO based clustering," InterJRI Computer Science and Networking, vol. 1, no. 1, pp. 18–23, 2009.

[6] G. Peters and R. Weber, "Dynamic clustering with soft computing," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 226–236, 2012.

[7] D. T. Anderson, J. C. Bezdek, M. Popescu, and J. M. Keller, "Comparing fuzzy, probabilistic, and possibilistic partitions," IEEE Transactions on Fuzzy Systems, vol. 18, no. 5, pp. 906–918, 2010.

[8] P. Maji and S. Paul, "Robust rough-fuzzy C-means algorithm: design and applications in coding and non-coding RNA expression data clustering," Fundamenta Informaticae, vol. 124, no. 1-2, pp. 153–174, 2013.

[9] J. Bezdek, Fuzzy mathematics in pattern classification [Ph.D. thesis], Cornell University, New York, NY, USA, 1974.

[10] V. Olman, F. Mao, H. Wu, and Y. Xu, "Parallel clustering algorithm for large data sets with applications in bioinformatics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 344–352, 2009.

[11] J. C. Bezdek and R. J. Hathaway, "Optimization of fuzzy clustering criteria using genetic algorithms," in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 589–594, June 1994.

[12] T. A. Runkler, "Ant colony optimization of clustering models," International Journal of Intelligent Systems, vol. 20, no. 12, pp. 1233–1251, 2005.

[13] K. S. Al-Sultan and S. Z. Selim, "A global algorithm for the fuzzy clustering problem," Pattern Recognition, vol. 26, no. 9, pp. 1357–1361, 1993.

[14] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43, Nagoya, Japan, October 1995.

[15] T. A. Runkler and C. Katz, "Fuzzy clustering by particle swarm optimization," in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 601–608, July 2006.

[16] C.-F. Juang, C.-M. Hsiao, and C.-H. Hsu, "Hierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimization," IEEE Transactions on Fuzzy Systems, vol. 18, no. 1, pp. 14–26, 2010.

[17] H. Izakian and A. Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem," Expert Systems with Applications, vol. 38, no. 3, pp. 1835–1838, 2011.

[18] W. Pedrycz, "Shadowed sets: representing and processing fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 103–109, 1998.

[19] J. Zhou, W. Pedrycz, and D. Miao, "Shadowed sets in the characterization of rough-fuzzy clustering," Pattern Recognition, vol. 44, no. 8, pp. 1738–1749, 2011.

[20] W. Pedrycz, "From fuzzy sets to shadowed sets: interpretation and computing," International Journal of Intelligent Systems, vol. 24, no. 1, pp. 48–61, 2009.

[21] S. Mitra, W. Pedrycz, and B. Barman, "Shadowed c-means: integrating fuzzy and rough clustering," Pattern Recognition, vol. 43, no. 4, pp. 1282–1291, 2010.

[22] J. C. Bezdek, J. M. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, vol. 4 of The Handbooks of Fuzzy Sets, Springer, New York, NY, USA, 1999.

[23] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224–227, 1978.

[24] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[25] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.

[26] S. C. Satapathy, S. K. Patnaik, C. D. P. Dash, and S. Sahoo, "Data clustering using modified fuzzy-PSO (MFPSO)," in Proceedings of the 5th Multi-Disciplinary International Workshop on Artificial Intelligence, Hyderabad, India, 2011.

[27] M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.

[28] R. Krishnapuram and J. M. Keller, "The possibilistic C-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385–393, 1996.



PSO is a population-based optimization tool developed by Eberhart and Kennedy [14], which can be implemented and applied easily to solve various function optimization problems. Runkler and Katz [15] introduced two new methods, PSO-V and PSO-U, for minimizing the reformulated objective functions of the FCM clustering model by PSO. In order to overcome the shortcomings of FCM, a PSO-based fuzzy clustering algorithm was discussed [16]; this algorithm uses the global search capacity of PSO to overcome the shortcomings of FCM. For finding more appropriate cluster centers, a generalized FCM optimized by the PSO algorithm [17] was proposed.

Shadowed sets are considered as a conceptual and algorithmic bridge between rough sets and fuzzy sets, thereby incorporating the generic merits of both, and have been successfully used for unsupervised learning. Shadowed sets introduce the (0, 1) interval to denote the belongingness of the clustered points, and the uncertainty among patterns lying in the shadowed region is efficiently handled in terms of membership. Thus, in order to disambiguate and capture the essence of a distribution, the concept of shadowed sets has recently been introduced [18], which can also raise the efficiency of the iteration process for the new prototypes by eliminating some "bad points" that have a bad influence on the cluster structure [19, 20]. Compared with FCM, the capability of shadowed c-means is enhanced when dealing with outliers [21].

Although many clustering algorithms based on FCM, PSO, or shadowed sets have been proposed, most of them need the preestimated cluster number C as input. To obtain desirable cluster partitions in a given data set, C is commonly set manually, and this is a very subjective and somewhat arbitrary process. A number of approaches have been proposed to select the appropriate C. Bezdek et al. [22] suggested the rule of thumb C ≤ N^(1/2), where the upper bound must be determined based on knowledge of, or applications about, the data. Another approach is to use a cluster validity index as a measure criterion of the data partition, such as the Davies-Bouldin (DB) [23], Xie-Beni (XB) [24], and Dunn [25] indices. These indices often follow the principle that the distance between objects in the same cluster should be as small as possible and the distance between objects in different clusters should be as large as possible. They have also been used to acquire the optimal number of clusters C according to their maximum or minimum value.

Therefore, we wish to find the best C in some range, obtain cluster partitions by considering compactness and intercluster separation, and reduce the sensitivity to initial values. Here we propose a modified algorithm named SP-FCM, which integrates the merits of PSO and interleaves shadowed sets between stabilization iterations. It can automatically estimate the optimal cluster number, with a faster initialization than our previous approach.

The structure of the paper is as follows. Section 2 outlines all necessary prerequisites. In Section 3, a new clustering approach called SP-FCM is presented for automatically finding the optimal cluster number. Section 4 includes the results of experiments involving UCI data sets, yeast gene expression data sets, and a real data set. Main conclusions are covered in Section 5.

2. Related Clustering Algorithms

In this section, we briefly describe some basic concepts of FCM, PSO, shadowed sets, and the XB validity index, and review the PSO-based clustering method.

2.1. FCM. We define X = {x_1, ..., x_N} as the universe of a clustering data set, B = {β_1, ..., β_C} as the prototypes of C clusters, and U = [u_ij]_{N×C} as a fuzzy partition matrix, where u_ij ∈ [0, 1] is the membership of x_i in a cluster with prototype β_j; x_i, β_j ∈ R^P, where P is the data dimensionality, 1 ≤ i ≤ N, and 1 ≤ j ≤ C. The FCM algorithm is derived by minimizing the objective function [22]

$$J_{\mathrm{FCM}}(U, B, X) = \sum_{j=1}^{C} \sum_{i=1}^{N} u_{ij}^{m}\, d_{ij}^{2}\left(x_i, \beta_j\right), \quad (1)$$

where m > 1.0 is the weighting exponent on each fuzzy membership and d_ij is the Euclidean distance from data vector x_i to cluster center β_j, subject to

$$\sum_{j=1}^{C} u_{ij} = 1, \quad \forall i = 1, 2, \ldots, N,$$
$$0 < \sum_{i=1}^{N} u_{ij} < N, \quad \forall j = 1, 2, \ldots, C,$$
$$d_{ij} = \left\| x_i - \beta_j \right\|. \quad (2)$$

This produces the following update equations:

$$u_{ij} = \left( \sum_{k=1}^{C} \left( \frac{d\left(x_i, \beta_j\right)}{d\left(x_i, \beta_k\right)} \right)^{2/(m-1)} \right)^{-1}, \quad (3)$$

$$\beta_j = \frac{\sum_{i=1}^{N} \left(u_{ij}\right)^m x_i}{\sum_{i=1}^{N} \left(u_{ij}\right)^m}. \quad (4)$$

After computing the memberships of all the objects, the new prototypes of the clusters are calculated. The process stops when the prototypes stabilize, that is, when the prototypes from the previous iteration are of close proximity to those generated in the current iteration, normally within an error threshold.
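The alternating application of (3) and (4) can be sketched in a few lines of NumPy; this is an illustrative reading of the update equations, not the authors' code, and the function name is our own:

```python
import numpy as np

def fcm_step(X, B, m=2.0, eps=1e-10):
    """One alternating-optimization step of FCM:
    update U by Eq. (3), then the prototypes by Eq. (4).
    X: (N, P) data matrix; B: (C, P) current prototypes."""
    # d[i, j] = ||x_i - beta_j||; eps guards against division by zero
    d = np.linalg.norm(X[:, None, :] - B[None, :, :], axis=2) + eps
    # Eq. (3): u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    # Eq. (4): beta_j = sum_i u_ij^m x_i / sum_i u_ij^m
    Um = U ** m
    B_new = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return U, B_new
```

Iterating `fcm_step` until `np.linalg.norm(B_new - B)` falls below the error threshold reproduces the stopping criterion described above.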

2.2. PSO. PSO was originally introduced in terms of the social and cognitive behavior of bird flocking and fish schooling. The potential solutions are called particles, which fly through the problem space by following the current best particles. Each particle keeps track of its coordinates in the problem space, which are associated with the best solution it has achieved so far. This solution is evaluated by the fitness value, which is also stored; this value is called p_best. Another


best value that is tracked by the PSO is the best value obtained so far by any particle in the swarm. This value is a global best and is called g_best. The search for better positions follows the rule

$$V(t+1) = wV(t) + c_1 r_1 \left(p_{\mathrm{best}}(t) - P(t)\right) + c_2 r_2 \left(g_{\mathrm{best}}(t) - P(t)\right),$$
$$P(t+1) = P(t) + V(t+1), \quad (5)$$

where P and V are the position and velocity vectors of a particle, respectively, w is the inertia weight, c_1 and c_2 are positive constants called acceleration coefficients, which control the influence of p_best and g_best in the search process, and r_1 and r_2 are random values in the range [0, 1]. The fitness value of each particle's position is determined by a fitness function, and PSO is usually executed with repeated application of (5) until a specified number of iterations has been exceeded or the velocity updates are close to zero over a number of iterations.

2.3. PSO-Based FCM. In this algorithm [26], each particle Part_l represents a cluster center vector, which is constructed as

$$\mathrm{Part}_l = \left(P_{l1}, \ldots, P_{lj}, \ldots, P_{lC}\right), \quad (6)$$

where l denotes the lth particle, l = 1, 2, ..., L, L is the number of particles (L < N), and P_lj is the jth cluster center of particle Part_l. Therefore, a swarm represents a number of candidate cluster centers for the data vectors. Each data vector belongs to a cluster according to its membership function, and thus a fuzzy membership is assigned to each data vector. Each cluster has a cluster center per iteration and presents a solution, which gives a vector of cluster centers. This method determines the position vector Part_l for every particle, updates it, and then changes the position of the cluster centers. The fitness function for evaluating the generalized solutions is stated as

$$F(P) = \frac{1}{J_{\mathrm{FCM}}}. \quad (7)$$

The smaller J_FCM is, the better the clustering effect and the higher the fitness function F(P).
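The particle encoding of (6) and the fitness of (7) can be sketched together; a hedged illustration (our own names, not the authors' code) in which one particle is a (C, P) array of candidate centers:

```python
import numpy as np

def fitness(part, X, m=2.0, eps=1e-10):
    """Fitness of one particle (Eq. (7)): F = 1 / J_FCM, where
    part = (P_l1, ..., P_lC) encodes C candidate cluster centers, shape (C, P)."""
    # distances from every data point to every encoded center
    d = np.linalg.norm(X[:, None, :] - part[None, :, :], axis=2) + eps
    # memberships induced by the encoded centers via Eq. (3)
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    j_fcm = np.sum((U ** m) * d ** 2)  # Eq. (1)
    return 1.0 / j_fcm
```

Centers that sit near the true cluster cores yield a smaller J_FCM and hence a larger fitness, which is exactly the ordering PSO exploits when updating p_best and g_best.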

2.4. Shadowed Sets. Conventional uncertainty models like fuzzy sets tend to capture vagueness through membership values and associate precise numeric values of membership with vague concepts. By introducing the α-cut [19], a fuzzy set can be converted into a classical set. Shadowed sets map each element of a given fuzzy set into 0, 1, and the unit interval [0, 1], namely, excluded, included, and uncertain, respectively.

For constructing a shadowed set, Mitra et al. [21] proposed an optimization based on the balance of vagueness. By elevating the membership values of some regions to 1 and at the same time reducing the membership values of other regions to 0, the uncertainty in these regions can be eliminated. To keep the balance of the total uncertainty regions, it

Figure 1: Shadowed sets induced by fuzzy membership function f(x).

needs to compensate these changes through the construction of uncertain regions, namely shadowed sets, that absorb the previous elimination of partial membership at the low and high ranges. The shadowed sets induced by a fuzzy membership function are shown in Figure 1.

Here x denotes the objects and f(x) ∈ [0, 1] is the continuous membership function of the objects belonging to a cluster. The symbol Ω_1 shows the reduction of membership, the symbol Ω_2 depicts the elevation of membership, and the symbol Ω_3 shows the formation of shadows. In order to balance the total uncertainty, the retention of balance translates into the following dependency:

$$\Omega_1 + \Omega_2 = \Omega_3. \quad (8)$$

The integral forms are given as

$$\Omega_1 = \int_{x:\, f(x) \le \alpha} f(x)\, dx, \qquad \Omega_2 = \int_{x:\, f(x) \ge 1-\alpha} \left(1 - f(x)\right) dx,$$
$$\Omega_3 = \int_{x:\, \alpha < f(x) < 1-\alpha} dx. \quad (9)$$

The thresholds of reducing and elevating are α and 1 − α (α ∈ (0, 0.5)). The optimal value of α should be acquired by translating the balance into the minimization of the following objective function:

$$O(\alpha) = \left| \Omega_1 + \Omega_2 - \Omega_3 \right|. \quad (10)$$


For a fuzzy set with a discrete membership function, the balance equation is modified as

$$O\left(\alpha_j\right) = \left| \sum_{u_{ij} \le \alpha_j} u_{ij} + \sum_{u_{ij} \ge u_{j\max} - \alpha_j} \left(u_{j\max} - u_{ij}\right) - \mathrm{card}\left\{ u_{ij} \mid \alpha_j < u_{ij} < u_{j\max} - \alpha_j \right\} \right|. \quad (11)$$

In order to find the best α_j, the following optimization problem should be solved:

$$\alpha_j = \arg\min_{\alpha_j} O\left(\alpha_j\right), \quad (12)$$

where u_ij ∈ [0, 1] is the membership of x_i in a cluster with prototype β_j, u_jmax and u_jmin denote the highest and lowest membership values of the jth cluster, and α_j is the threshold of the jth cluster. The range of feasible values of the threshold α_j is [u_jmin, (u_jmin + u_jmax)/2] [19].

This approach considers all membership values with respect to a fixed cluster when updating the prototype of this cluster. The main merits of shadowed sets are the optimization mechanism for choosing a separate threshold for each cluster and the reduction of the burden of plain numeric computations.

2.5. XB Clustering Validity Index. The clustering algorithms described above require prespecification of the number of clusters, and the partition results depend on the choice of C. Validity indices exist to evaluate the goodness of clustering for a given number of clusters; therefore, these validity indices can be used to acquire the optimal value of C [27].

The XB index presents a fuzzy-validity criterion based on a validity function which identifies overall compact and separate fuzzy c-partitions. This function depends upon the data set, the geometric distance measure, the distance between cluster centroids, and the fuzzy partition, irrespective of the fuzzy algorithm used. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account. For the FCM algorithm with m = 2.0, the Xie-Beni index can be shown to be

$$S_{\mathrm{XB}} = \frac{J_{\mathrm{FCM}}}{N d_{\min}^2}, \quad (13)$$

where d_min = min_{i≠j} ||β_i − β_j|| is the minimum distance between cluster centroids. The more separate the clusters, the larger d_min and the smaller S_XB.

3. Shadowed Sets-Based PSO-Fuzzy Clustering: SP-FCM

FCM strives to find C compact clusters in X, where C is one of the specified parameters. But the process of selecting and adjusting C manually to obtain desirable cluster partitions in a given data set is very subjective and somewhat arbitrary.

To seek the optimal cluster structure, C is always allowed to be overestimated [28], such that the distances between some clusters are not large enough, or the membership values of some objects in different clusters are adjacent and ambiguous in a given data set. In this case, the modification of prototypes through long iterations is meaningless.

The main subject of cluster validation is the evaluation of clustering results to find the partitioning that best fits the data set. Based on the foregoing algorithms, we wish to find cluster partitions that contain compact and well-separated clusters. In our algorithm, C is also overestimated, and the clusters compete for data membership. We can set [C_min, C_max] as the reasonable range of cluster numbers based on knowledge of the data. This provides a more transparent and tractable process of cluster number reduction. Considering the fuzzy partition matrix U = [u_ij]_{N×C}, each column comprises the membership values of all feature vectors x_i with a single cluster center. Thus an optimal threshold α_j (j = 1, 2, ..., C) for each column should be found by (12) to create a harder partition. The amount of data assigned a membership value equal to 1 is identified as the cardinality of the corresponding cluster. According to α_j, the cardinality of the jth column is

$$M_j = \mathrm{card}\left\{ u_{ij} \mid u_{ij} \ge u_{j\max} - \alpha_j \right\}. \quad (14)$$

Here the threshold is not subjectively user-defined; it is established on the balance of uncertainty and can be adjusted automatically in the clustering process. This property of shadowed sets can be used to reduce the cluster number. In order to control the convergence speed, the decision to delete clusters can be based on some thresholds. Different threshold values should be set for different data sets, depending on the cluster structure and the size of the data set. Here a threshold ε and an attrition rate ρ (0 < ρ < 1) are set. The decision to delete clusters in SP-FCM is based solely on cluster cardinality and the threshold ε. If ε is too small, C is reduced more slowly, and the reduction may stop prematurely before the optimal cluster number is found. On the other hand, if ε is too large, C may be reduced too drastically. In our method, clusters whose cardinalities M_j < ε are considered as "candidates" for removal, and we can remove up to ⌊ρ × C⌋ clusters having the lowest cardinality from the pool of candidates specified by ε. Limiting the number of clusters that can be removed at one time prevents C from being reduced too drastically when ε is set too high for a given data set. This automatically estimates the best cluster number while also utilizing a faster, consistent, and repeatable initialization technique. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account; hence, the XB index is adopted.
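The cardinality-based reduction step can be sketched as follows; this is an illustrative reading of the rule (candidates with M_j < ε, at most ⌊ρ·C⌋ removals, floored at one as noted after Algorithm 1), not the authors' code:

```python
import numpy as np

def reduce_clusters(B, M, eps, rho):
    """Drop low-cardinality clusters.
    B: (C, P) prototypes; M: (C,) cardinalities from Eq. (14).
    Candidates satisfy M_j < eps; at most max(1, floor(rho * C)) of the
    lowest-cardinality candidates are removed per cycle."""
    C = len(B)
    limit = max(1, int(np.floor(rho * C)))
    candidates = np.where(M < eps)[0]
    # among candidates, remove those with the lowest cardinality first
    doomed = candidates[np.argsort(M[candidates])][:limit]
    keep = np.setdiff1d(np.arange(C), doomed)
    return B[keep], keep
```

When no cluster falls below ε, the candidate pool is empty and C is left unchanged, which is what lets the reduction settle at a stable cluster number.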

For each C in the range [C_min, C_max], a set of cluster validity indices is calculated, where C_max is the initial cluster number, set to be much larger than the expected cluster number. The partition matrix with C clusters having the best aggregate validity index is selected as the final cluster partition. The SP-FCM algorithm is summarized in Algorithm 1.


(1) Initialize C_max and C_min; let C = C_max. Set the fuzzifier m, the iteration counters k = 0 and t = 0, and the maximum PSO iteration number T.
(2) Initialize the population size L, the initial velocities and positions of the particles, the coefficients c1, c2, and w, the threshold ε, and the attrition rate ρ.
(3) DO
      Repeat
        (a) Update the partition matrix U(k) for all particles by (3).
        (b) Calculate the cluster centers for each particle by (4).
        (c) Calculate the fitness value of each particle by (7).
        (d) Calculate pbest for each particle.
        (e) Calculate gbest for the swarm.
        (f) Update the velocity and position of each particle by (5).
        (g) t = t + 1.
      Until the PSO termination condition is met (*).
      (i) Calculate the optimal threshold α_j (j = 1, 2, ..., C) for each column of the partition matrix U(k) by (12), and relocate the memberships u_ij (1 ≤ i ≤ N) of the jth cluster according to α_j.
      (ii) Calculate the cardinality M_j (1 ≤ j ≤ C) of each cluster by (14), based on the number of data points whose relocated membership value equals 1.
      (iii) Remove all clusters whose M_j < ε and whose M_j is among the ⌊ρ × C⌋ lowest cardinalities.
      (iv) Update the cluster number C.
      (v) Calculate the cluster validity index S_XB(k) by (13).
      (vi) Update the iteration counter: k = k + 1.
    WHILE the termination condition is not met (**).

(*) The PSO termination condition is t ≥ T (the maximum number of iterations is reached) or the velocity updates are close to zero over a number of iterations.
(**) The algorithm can terminate under either of two conditions: (1) the prototype parameters in B stabilize within some threshold δ, or (2) the number of clusters has reached the minimum limit C_min.

Algorithm 1: SP-FCM.

Here, if ⌊ρ × C⌋ equals 0, we let it be 1, which means the cluster with the lowest cardinality may be removed. The initial C_max cluster prototypes can be initialized using exemplars from data points selected by β_j = x_⌊(N/C_max)·j⌋. After termination, the B and U from C ∈ [C_min, C_max] with the best cluster validity index S_XB are selected as the final cluster prototype and partition.
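The cardinality-based reduction step can be sketched in a few lines. This is our own illustrative sketch, not the authors' code; the array layout (a membership matrix `U` of shape N×C and a per-cluster threshold vector `alphas`) is an assumption:

```python
import numpy as np

def reduce_clusters(U, alphas, eps, rho):
    """One SP-FCM reduction step: compute shadowed-set cardinalities M_j
    (equation (14)) and remove up to floor(rho * C) of the candidate
    clusters whose M_j < eps, lowest cardinality first."""
    N, C = U.shape
    u_max = U.max(axis=0)                      # u_jmax for each cluster column
    M = (U >= u_max - alphas).sum(axis=0)      # M_j, equation (14)
    candidates = np.where(M < eps)[0]          # "candidates" for removal
    k = max(int(np.floor(rho * C)), 1)         # remove at most this many (>= 1)
    drop = candidates[np.argsort(M[candidates])][:k]
    keep = np.setdiff1d(np.arange(C), drop)
    return U[:, keep], keep
```

Note that the cap ⌊ρ × C⌋ (raised to 1 when it rounds to zero) is what keeps a single cycle from collapsing C when ε is set too high.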

4. Experimental Results

In this section, the performance of the FCM, RCM, shadowed c-means (SCM) [21], shadowed rough c-means (SRCM) [19], and SP-FCM algorithms is presented on four UCI data sets, four yeast gene-expression data sets, and real data. For evaluating the convergence effect, the fundamental criterion can be described as follows: the distance between objects in the same cluster should be as small as possible, and the distance between objects in different clusters should be as large as possible. Here we use the DB index and the Dunn index to evaluate the clustering effect. For a given data set and C value, the higher the within-cluster similarity and the intercluster separation, the lower the DB index value; a good clustering procedure should make the DB index as low as possible. Conversely, higher values of the Dunn index indicate better clustering, in the sense that the clusters are well separated and relatively compact. The details of the experiments are given below.
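Neither index is written out explicitly in the text. For reference, a standard implementation of the Davies-Bouldin [23] and Dunn definitions on crisp labels might look like the following (our own sketch, not the authors' code):

```python
import numpy as np

def db_index(X, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio of
    within-cluster scatter to between-centroid distance. Lower is better."""
    ids = np.unique(labels)
    cents = np.array([X[labels == i].mean(axis=0) for i in ids])
    # S_i: mean distance of cluster members to their own centroid
    S = np.array([np.linalg.norm(X[labels == i] - c, axis=1).mean()
                  for i, c in zip(ids, cents)])
    C = len(ids)
    R = [max((S[i] + S[j]) / np.linalg.norm(cents[i] - cents[j])
             for j in range(C) if j != i) for i in range(C)]
    return float(np.mean(R))

def dunn_index(X, labels):
    """Dunn index: minimum intercluster distance over maximum cluster
    diameter. Higher is better."""
    groups = [X[labels == i] for i in np.unique(labels)]
    diam = max(np.linalg.norm(g[:, None] - g[None, :], axis=-1).max()
               for g in groups)
    sep = min(np.linalg.norm(a[:, None] - b[None, :], axis=-1).min()
              for i, a in enumerate(groups) for b in groups[i + 1:])
    return sep / diam
```

On two tight, well-separated blobs, `db_index` should be close to 0 and `dunn_index` large, matching the interpretation above.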

4.1. UCI Data Sets. In our experiments, four UCI data sets are used: the 4-dimensional Iris, 13-dimensional Wine, 10-dimensional Glass, and 34-dimensional Ionosphere. Iris contains 3 clusters of 50 data patterns each; Wine contains 3 clusters with 50, 60, and 68 patterns; Glass contains 6 clusters with 30, 35, 40, 42, 36, and 31 patterns, respectively; and Ionosphere contains 2 clusters with 226 and 125 patterns. The validity indices of each method are compared in Table 1. SP-FCM can identify compact groups compared to the other algorithms when given the cluster number C. It can also be seen that SRCM and SP-FCM have more obvious advantages than FCM, RCM, and SCM. SP-FCM performs slightly better than SRCM in most cases due to its global search ability, which enables it to converge to an optimum or near-optimum solution. Moreover, the shadowed-set- and rough-set-based clustering methods, namely SP-FCM, SRCM, RCM, and SCM, perform better than FCM. This implies that the partition into approximation regions can reveal the nature of the data structure, and that only the lower bound and boundary region of each cluster contribute positively to updating the prototypes.


Table 1: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on four UCI data sets.

Index        Algorithm   Iris      Wine      Glass     Ionosphere
DB index     FCM         0.7642    0.8803    2.2971    2.0587
             RCM         0.6875    0.5692    1.9635    1.5434
             SCM         0.6862    0.5327    1.8495    1.4763
             SRCM        0.6613    0.4436    1.5804    1.3971
             SP-FCM      0.6574    0.4328    1.5237    1.4066
Dunn index   FCM         2.3106    2.5834    0.1142    0.8381
             RCM         2.7119    2.8157    0.2637    1.0233
             SCM         2.4801    2.7992    0.3150    1.0319
             SRCM        3.0874    3.1342    0.5108    1.1924
             SP-FCM      3.3254    3.1764    0.4921    1.2605

Figure 2: XB validity index of the four UCI data sets with cluster number C: (a) Iris; (b) Wine; (c) Glass; (d) Ionosphere.

As usual, the number of clusters is implied by the nature of the problem; here, with shadowed sets involved, one can anticipate that the optimal number of clusters will be found. The fuzzification coefficient m can be optimized, but it is common to assume a fixed value of 2.0, which is associated with the form of the membership functions of the generated clusters. For testing the SP-FCM algorithm, the rule C ≤ N^(1/2) is adopted, and the range of the expected cluster number is set as follows: (1) Iris, [C_min = 2, C_max = 12]; (2) Wine, [C_min = 2, C_max = 13]; (3) Glass, [C_min = 2, C_max = 14]; (4) Ionosphere,

[C_min = 2, C_max = 16]. The swarm size is set to L = 20, the maximum iteration number of PSO to T = 50, and, for cluster reduction, the cluster cardinality threshold to ε = 10 and the attrition rate to ρ = 0.1. In each cycle we obtain the distribution of every cluster, remove part of them according to their cardinality, and calculate the XB index, so the cluster number C varies from C_max down to C_min. After the loop ends, the partition with the lowest index value is selected as the final result. Figure 2 presents the validity indices in the process of generating the optimal cluster number. Smaller


Table 2: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on four yeast expression data sets.

Index        Algorithm   GDS608    GDS2003   GDS2267   GDS2712
DB index     FCM         2.0861    2.4671    1.5916    1.9526
             RCM         1.6109    2.2104    1.0274    1.2058
             SCM         1.5938    2.1346    0.8946    1.0965
             SRCM        1.3274    1.9523    0.7438    0.7326
             SP-FCM      1.2958    1.8946    0.7962    0.6843
Dunn index   FCM         0.2647    0.2976    0.4208    0.3519
             RCM         0.3789    0.3981    0.7164    0.6074
             SCM         0.3865    0.3775    0.8439    0.6207
             SRCM        0.5126    0.4953    0.9759    0.8113
             SP-FCM      0.5407    0.5026    0.9182    0.8049

values indicate more compact and well-separated clusters. The validity indices reach their minimum values at C = 3, 3, 6, and 2, respectively, which correspond to the final cluster prototypes and the best partitions. Through the shadowed-sets and PSO approaches, the influence of each boundary region on the formation of the prototypes and the clusters can be properly resolved. Although more computing time is required to run SP-FCM, reasonable results can be acquired when processing overlapping and vague data patterns.
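The final selection amounts to an argmin over the recorded indices. Schematically (the dictionary interface below is our own, not from the paper):

```python
def select_best_c(results):
    """Pick the cluster number whose partition has the lowest XB index.
    results: dict mapping C -> (S_XB, partition)."""
    best_c = min(results, key=lambda c: results[c][0])
    return best_c, results[best_c][1]
```

Because smaller S_XB means a more compact and better-separated partition, the minimizing C is returned together with its partition matrix.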

4.2. Yeast Gene Expression Data Sets. Four yeast gene-expression data sets are used in the experiments: GDS608, GDS2003, GDS2267, and GDS2712, downloaded from the Gene Expression Omnibus. GDS608 has 26 classes and 6303 samples; GDS2003 has 23 classes and 5617 samples; GDS2267 has 14 classes and 9275 samples; and GDS2712 has 15 classes and 9275 samples. Table 2 presents the validity indices of the different methods for a given cluster number C. SP-FCM and SRCM obtain comparable results and perform better than the other clustering algorithms. The improvement can be attributed to the fact that the global search capacity of PSO is conducive to finding more appropriate cluster centers while escaping from local optima.

To obtain the optimum C automatically, we let m = 2.0, c1 = 1.49, c2 = 1.49, and w = 0.72, and the rule C ≤ N^(1/2) is adopted. The swarm size is set to L = 20, the maximum iteration number of PSO is T = 80, and, for cluster reduction, the range of the expected cluster number, the cluster cardinality threshold ε, and the attrition rate ρ are set as follows: (1) GDS608, [C_min = 20, C_max = 80], ε = 20, ρ = 0.05; (2) GDS2003, [C_min = 20, C_max = 75], ε = 20, ρ = 0.05; (3) GDS2267, [C_min = 10, C_max = 96], ε = 20, ρ = 0.08; (4) GDS2712, [C_min = 10, C_max = 96], ε = 20, ρ = 0.08. In each cycle we obtain the distribution of every cluster, remove part of them according to their cardinality, and calculate the XB index, so the cluster number C varies from C_max down to C_min. The partition with the lowest value is selected as the

final result after the loop ends. As seen in Figure 3, for GDS608 the cluster number decreases at a faster rate at the beginning: it takes 26 iterations to reduce the cluster number from C = 80 to C = 30 and 4 iterations from C = 30 to C = 26, and the XB index begins to increase when the cluster number C < 26. For GDS2003 it takes 24 iterations to reduce the cluster number from C = 75 to C = 30 and 7 iterations from C = 30 to C = 23, and the XB index begins to increase when C < 23. For GDS2267 it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 6 iterations from C = 20 to C = 14, and the XB index begins to increase when C < 14. For GDS2712 it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 5 iterations from C = 20 to C = 15, and the XB index begins to increase when C < 15. Here the advantages of fuzzy sets, PSO, and shadowed sets are integrated in SP-FCM, making the algorithm applicable to overlapping partitions and to the uncertainty and vagueness arising from boundary regions. The optimization process in the shadowed sets makes this method robust to outliers, so that the approximation regions of each cluster can be determined accurately and the obtained prototypes approach the desired locations.

4.3. Real Data. In this experiment, 10 different packages are tested. Each package is represented by 100 frames captured from different angles by a camera, and SIFT feature points are extracted from each frame for training a recognition system. Figure 4 shows some images with their SIFT keypoints. This data set comprises 248,150 descriptors. We let m = 2.0, c1 = 1.49, c2 = 1.49, w = 0.72, L = 20, ε = 30, and ρ = 0.01 for SP-FCM, and choose the reasonable range [C_min = 200, C_max = 360] according to the number of package categories and the distribution of keypoints in each image. Eighty iterations of PSO are run for each given C to produce the cluster prototype B and partition matrix U as the starting point for the shadowed sets. Longer PSO stabilization is needed to obtain more stable cluster partitions.


Figure 3: XB validity index of the four yeast gene expression data sets with cluster number C: (a) GDS608; (b) GDS2003; (c) GDS2267; (d) GDS2712.

Table 3: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on the package data set.

Index        FCM       RCM       SCM       SRCM      SP-FCM
DB index     18.4569   15.9671   14.3194   12.4038   10.7826
Dunn index   9.2647    11.6298   12.5376   16.9422   16.7313

Within each cluster, the optimal α_j determines the cardinality and realizes cluster reduction, and the XB index is calculated. Each C-partition is ranked using this index, and the partition with the smallest index value, indicating the most compact and well-separated clusters, is selected as the final output. At the beginning the cluster number decreases at a faster speed: it takes 26 iterations to reduce the cluster number from C = 360 to C = 289 and 20 iterations from C = 289 to C = 267. The XB index increases at a relatively faster rate when the cluster number C < 267. Figure 5 shows the XB index for C ∈ [267, 289]; the index reaches its minimum at C = 276, which means the best partition for this data set has 276 clusters. Table 3 exhibits the comparative analysis of the convergence effect. As expected, SP-FCM provides sound results for the real data; the performance is assessed by those validity indices.

5. Conclusions

This paper presents a modified fuzzy c-means algorithm based on particle swarm optimization and shadowed sets to perform unsupervised feature clustering. The algorithm, called SP-FCM, utilizes the global search property of PSO and the vagueness balance property of shadowed sets, so that it can estimate the optimal cluster number as it runs through its alternating optimization process. As a randomized approach, SP-FCM has the capability to alleviate the problems faced by FCM, which suffers from sensitivity to initialization and from falling into local minima. Moreover, this algorithm avoids the subjective and somewhat arbitrary trials otherwise needed to estimate the appropriate cluster number, and it enhances the capability of finding the optimal cluster number within a specific


Figure 4: Ten package images with SIFT features: (a) 644 features; (b) 460 features; (c) 742 features; (d) 442 features; (e) 313 features; (f) 602 features; (g) 373 features; (h) 724 features; (i) 539 features; (j) 124 features.

Figure 5: XB validity index of the bag data set with cluster number C.

range, using cluster validity measures as indicators. The use of the XB validity index allows the algorithm to find the optimum cluster number with cluster partitions that provide compact and well-separated clusters. The experiments show that the SP-FCM algorithm produces good results with reference to the DB and Dunn indices, especially in the high-dimension and large-data cases.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant no. 61105089, NSFC).

References

[1] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.

[2] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering," ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 1, article 1, 2009.

[3] P. Melin and O. Castillo, "A review on type-2 fuzzy logic applications in clustering, classification and pattern recognition," Applied Soft Computing Journal, vol. 21, pp. 568–577, 2014.

[4] M. Yuwono, S. W. Su, B. D. Moulton, and H. T. Nguyen, "Data clustering using variants of rapid centroid estimation," IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 366–377, 2014.

[5] S. C. Satapathy, G. Pradhan, S. Pattnaik, J. V. R. Murthy, and P. V. G. D. P. Reddy, "Performance comparisons of PSO based clustering," InterJRI Computer Science and Networking, vol. 1, no. 1, pp. 18–23, 2009.

[6] G. Peters and R. Weber, "Dynamic clustering with soft computing," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 226–236, 2012.

[7] D. T. Anderson, J. C. Bezdek, M. Popescu, and J. M. Keller, "Comparing fuzzy, probabilistic, and possibilistic partitions," IEEE Transactions on Fuzzy Systems, vol. 18, no. 5, pp. 906–918, 2010.

[8] P. Maji and S. Paul, "Robust rough-fuzzy C-means algorithm: design and applications in coding and non-coding RNA expression data clustering," Fundamenta Informaticae, vol. 124, no. 1-2, pp. 153–174, 2013.

[9] J. Bezdek, Fuzzy mathematics in pattern classification [Ph.D. thesis], Cornell University, New York, NY, USA, 1974.

[10] V. Olman, F. Mao, H. Wu, and Y. Xu, "Parallel clustering algorithm for large data sets with applications in bioinformatics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 344–352, 2009.

[11] J. C. Bezdek and R. J. Hathaway, "Optimization of fuzzy clustering criteria using genetic algorithms," in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 589–594, June 1994.

[12] T. A. Runkler, "Ant colony optimization of clustering models," International Journal of Intelligent Systems, vol. 20, no. 12, pp. 1233–1251, 2005.

[13] K. S. Al-Sultan and S. Z. Selim, "A global algorithm for the fuzzy clustering problem," Pattern Recognition, vol. 26, no. 9, pp. 1357–1361, 1993.

[14] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43, Nagoya, Japan, October 1995.

[15] T. A. Runkler and C. Katz, "Fuzzy clustering by particle swarm optimization," in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 601–608, July 2006.

[16] C.-F. Juang, C.-M. Hsiao, and C.-H. Hsu, "Hierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimization," IEEE Transactions on Fuzzy Systems, vol. 18, no. 1, pp. 14–26, 2010.

[17] H. Izakian and A. Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem," Expert Systems with Applications, vol. 38, no. 3, pp. 1835–1838, 2011.

[18] W. Pedrycz, "Shadowed sets: representing and processing fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 103–109, 1998.

[19] J. Zhou, W. Pedrycz, and D. Miao, "Shadowed sets in the characterization of rough-fuzzy clustering," Pattern Recognition, vol. 44, no. 8, pp. 1738–1749, 2011.

[20] W. Pedrycz, "From fuzzy sets to shadowed sets: interpretation and computing," International Journal of Intelligent Systems, vol. 24, no. 1, pp. 48–61, 2009.

[21] S. Mitra, W. Pedrycz, and B. Barman, "Shadowed c-means: integrating fuzzy and rough clustering," Pattern Recognition, vol. 43, no. 4, pp. 1282–1291, 2010.

[22] J. C. Bezdek, J. M. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, vol. 4 of The Handbooks of Fuzzy Sets, Springer, New York, NY, USA, 1999.

[23] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224–227, 1979.

[24] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[25] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.

[26] S. C. Satapathy, S. K. Patnaik, C. D. P. Dash, and S. Sahoo, "Data clustering using modified fuzzy-PSO (MFPSO)," in Proceedings of the 5th Multi-Disciplinary International Workshop on Artificial Intelligence, Hyderabad, India, 2011.

[27] M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.

[28] R. Krishnapuram and J. M. Keller, "The possibilistic C-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385–393, 1996.



best value that is tracked by the PSO is the best value obtained so far by any particle in the swarm; this best value is a global best, called g_best. The search for better positions follows the rule:

\[
\begin{aligned}
V(t+1) &= wV(t) + c_1 r_1 \bigl(p_{\mathrm{best}}(t) - P(t)\bigr) + c_2 r_2 \bigl(g_{\mathrm{best}}(t) - P(t)\bigr),\\
P(t+1) &= P(t) + V(t+1),
\end{aligned} \tag{5}
\]

where P and V are the position and velocity vectors of a particle, respectively; w is the inertia weight; c_1 and c_2 are positive constants called acceleration coefficients, which control the influence of p_best and g_best in the search process; and r_1 and r_2 are random values in the range [0, 1]. The fitness value of each particle's position is determined by a fitness function, and PSO is usually executed with repeated application of (5) until a specified number of iterations has been exceeded or the velocity updates are close to zero over a number of iterations.
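A single update of equation (5) can be sketched as follows. This is a minimal illustration; the vectorized layout over L particles is our own choice, and the coefficient defaults c1 = c2 = 1.49, w = 0.72 are the values the paper uses in its experiments:

```python
import numpy as np

def pso_step(P, V, pbest, gbest, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One velocity/position update per equation (5).
    P, V, pbest: arrays of shape (L, D); gbest: array of shape (D,)."""
    if rng is None:
        rng = np.random.default_rng()
    r1, r2 = rng.random(P.shape), rng.random(P.shape)  # fresh r1, r2 each step
    V_new = w * V + c1 * r1 * (pbest - P) + c2 * r2 * (gbest - P)
    return P + V_new, V_new
```

Starting from rest at the origin with both attractors at (1, 1), every velocity component is nonnegative and the particles move toward the attractors.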

2.3. PSO-Based FCM. In this algorithm [26], each particle Part_l represents a cluster center vector, constructed as

\[
\mathrm{Part}_l = (P_{l1}, \ldots, P_{lj}, \ldots, P_{lC}), \tag{6}
\]

where l = 1, 2, ..., L indexes the particles, L is the number of particles (L < N), and P_lj is the jth cluster center of particle Part_l. Therefore a swarm represents a number of candidate cluster centers for the data vectors. Each data vector belongs to a cluster according to its membership function, and thus a fuzzy membership is assigned to each data vector. Each cluster has a cluster center per iteration and presents a solution, which gives a vector of cluster centers. This method determines the position vector Part_l for every particle, updates it, and then changes the position of the cluster center. The fitness function for evaluating the generalized solutions is stated as

\[
F(P) = \frac{1}{J_{\mathrm{FCM}}}. \tag{7}
\]

The smaller J_FCM is, the better the clustering effect and the higher the fitness value F(P).
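For concreteness, the objective and the resulting fitness can be sketched as follows (our own sketch, assuming the standard FCM objective J_FCM = Σ_i Σ_j u_ij^m ‖x_i − β_j‖², with an array layout of our choosing):

```python
import numpy as np

def fcm_objective(X, U, centers, m=2.0):
    """J_FCM = sum_i sum_j u_ij^m * ||x_i - beta_j||^2."""
    # d2[i, j]: squared distance from sample i to center j
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return (U ** m * d2).sum()

def fitness(X, U, centers, m=2.0):
    """Equation (7): F(P) = 1 / J_FCM, so better partitions score higher."""
    return 1.0 / fcm_objective(X, U, centers, m)
```

With a crisp partition of two tight pairs of points around their two centers, the objective is just the sum of squared point-to-center distances.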

2.4. Shadowed Sets. Conventional uncertainty models like fuzzy sets tend to capture vagueness through membership values, associating precise numeric membership values with vague concepts. By introducing an α-cut [19], a fuzzy set can be converted into a classical set. Shadowed sets map each element of a given fuzzy set into 0, 1, or the unit interval [0, 1], namely excluded, included, and uncertain, respectively.

For constructing a shadowed set, Mitra et al. [21] proposed an optimization based on the balance of vagueness. By elevating the membership values of some regions to 1 and at the same time reducing the membership values of other regions to 0, the uncertainty in these regions can be eliminated. To keep the balance of the total uncertainty regions, it

Figure 1: Shadowed sets induced by the fuzzy membership function f(x).

needs to compensate for these changes through the construction of uncertain regions, namely shadows, that absorb the previous elimination of partial membership at the low and high ranges. The shadowed sets induced by a fuzzy membership function are shown in Figure 1.

Here x denotes the objects, and f(x) ∈ [0, 1] is the continuous membership function of the objects belonging to a cluster. The symbol Ω_1 shows the reduction of membership, Ω_2 depicts the elevation of membership, and Ω_3 shows the formation of shadows. In order to balance the total uncertainty, the retention of balance translates into the following dependency:

\[
\Omega_1 + \Omega_2 = \Omega_3. \tag{8}
\]

The integral forms are given as

\[
\Omega_1 = \int_{x:\, f(x) \le \alpha} f(x)\,dx, \qquad
\Omega_2 = \int_{x:\, f(x) \ge 1-\alpha} \bigl(1 - f(x)\bigr)\,dx, \qquad
\Omega_3 = \int_{x:\, \alpha < f(x) < 1-\alpha} dx. \tag{9}
\]

The thresholds for reducing and elevating are α and 1 − α (α ∈ (0, 0.5)). The optimal value of α is acquired by translating the balance into the minimization of the following objective function:

\[
O(\alpha) = \bigl| \Omega_1 + \Omega_2 - \Omega_3 \bigr|. \tag{10}
\]
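A direct way to see this balance at work is to discretize the integrals in (9) and search α ∈ (0, 0.5) numerically. For a symmetric triangular membership function the analytic minimizer is √2 − 1 ≈ 0.414; the grid search below is our own sketch, not part of the paper:

```python
import numpy as np

def balance_objective(f, xs, alpha):
    """O(alpha) = |Omega1 + Omega2 - Omega3| of eqs. (9)-(10),
    approximated by Riemann sums on the grid xs for membership function f."""
    y = f(xs)
    dx = xs[1] - xs[0]
    omega1 = y[y <= alpha].sum() * dx                       # reduced mass
    omega2 = (1 - y[y >= 1 - alpha]).sum() * dx             # elevated mass
    omega3 = np.count_nonzero((y > alpha) & (y < 1 - alpha)) * dx  # shadow size
    return abs(omega1 + omega2 - omega3)

# example: triangular membership function peaking at x = 1 on [0, 2]
tri = lambda x: np.maximum(0.0, 1.0 - np.abs(x - 1.0))
xs = np.linspace(0.0, 2.0, 4001)
grid = np.linspace(0.01, 0.49, 200)
alpha_star = min(grid, key=lambda a: balance_objective(tri, xs, a))
```

Solving Ω_1 + Ω_2 = Ω_3 analytically for this triangle gives α² + α² = 2(1 − 2α), i.e. α = √2 − 1, which the grid search recovers.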


For a fuzzy set with a discrete membership function, the balance equation is modified as

\[
O(\alpha_j) = \left| \sum_{u_{ij} \le \alpha_j} u_{ij} \;+ \sum_{u_{ij} \ge u_{j\max} - \alpha_j} \bigl(u_{j\max} - u_{ij}\bigr) \;-\; \mathrm{card}\bigl\{\, u_{ij} \mid \alpha_j < u_{ij} < u_{j\max} - \alpha_j \,\bigr\} \right|. \tag{11}
\]

In order to find the best α_j, it should satisfy the following optimization problem:

\[
\alpha_j = \arg\min_{\alpha_j} O(\alpha_j), \tag{12}
\]

where u_ij ∈ [0, 1] is the membership of x_i in the cluster with prototype β_j; u_jmax and u_jmin denote the highest and lowest membership values of the jth cluster; and α_j is the threshold of the jth cluster. The range of feasible values of the threshold α_j is [u_jmin, (u_jmin + u_jmax)/2] [19].
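The one-dimensional search of (12) over this feasible range is straightforward to sketch (our own grid-search illustration for one column u of the partition matrix, matching the discrete balance (11) as reconstructed above):

```python
import numpy as np

def optimal_alpha(u):
    """Minimize the discrete balance objective O(alpha_j) of eqs. (11)-(12)
    for one cluster column u; the feasible range per the text is
    [u_min, (u_min + u_max) / 2]."""
    u_min, u_max = u.min(), u.max()
    grid = np.linspace(u_min, (u_min + u_max) / 2, 400)

    def O(a):
        reduced = u[u <= a].sum()                          # mass pushed to 0
        elevated = (u_max - u[u >= u_max - a]).sum()       # mass pulled to u_max
        shadow = np.count_nonzero((u > a) & (u < u_max - a))
        return abs(reduced + elevated - shadow)

    return min(grid, key=O)
```

A finer grid (or a sweep over the sorted membership values themselves) can replace the fixed 400-point grid if more precision is needed.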

This approach considers all membership values with respect to a fixed cluster when updating the prototype of that cluster. The main merits of shadowed sets are the optimization mechanism for choosing a separate threshold for each cluster and the reduction of the burden of plain numeric computations.

2.5. XB Clustering Validity Index. The clustering algorithms described above require prespecification of the number of clusters, and the partition results depend on the choice of C. There exist validity indices that evaluate the goodness of clustering for a given number of clusters; these indices can therefore be used to acquire the optimal value of C [27].

The XB index presents a fuzzy-validity criterion based on a validity function which identifies overall compact and separate fuzzy c-partitions. This function depends upon the data set, the geometric distance measure, the distance between cluster centroids, and the fuzzy partition, irrespective of the fuzzy algorithm used. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account. For the FCM algorithm with m = 2.0, the Xie-Beni index can be shown to be

\[
S_{\mathrm{XB}} = \frac{J_{\mathrm{FCM}}}{N\, d_{\min}^{2}}, \tag{13}
\]

where d_min = min_{i≠j} ‖β_i − β_j‖ is the minimum distance between cluster centroids. The more separated the clusters, the larger d_min and the smaller S_XB.
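Equation (13) can be sketched directly (our own array layout: X holds the N samples, U is the N×C partition matrix, and `centers` holds the β_j):

```python
import numpy as np

def xb_index(X, U, centers, m=2.0):
    """Xie-Beni index (13): compactness J_FCM divided by N times the
    squared minimum distance between cluster centers. Lower is better."""
    # J_FCM = sum_i sum_j u_ij^m * ||x_i - beta_j||^2
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    J = (U ** m * d2).sum()
    # d_min^2: smallest squared distance between distinct centers
    diff = centers[:, None, :] - centers[None, :, :]
    cd2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(cd2, np.inf)      # ignore zero self-distances
    return J / (len(X) * cd2.min())
```

For two tight pairs of points with crisp memberships, J_FCM is the sum of squared point-to-center distances, and the index is that sum over N·d_min².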

3. Shadowed-Sets-Based PSO Fuzzy Clustering: SP-FCM

FCM strives to find C compact clusters in X, where C is one of the specified parameters. But the process of selecting and adjusting C manually to obtain desirable cluster partitions in a given data set is very subjective and somewhat arbitrary.

To seek the optimal cluster structure, C is always allowed to be overestimated [28]; in that case the distances between some clusters are not large enough, or the membership values of some objects in different clusters are adjacent and ambiguous in the given data set, and the modification of prototypes through long iterations becomes meaningless.

The main subject of cluster validation is the evaluation of clustering results to find the partitioning that best fits the data set. Based on the foregoing algorithms, we wish to find cluster partitions that contain compact and well-separated clusters. In our algorithm C is also overestimated, and the clusters compete for data membership. We can set [C_min, C_max] as the reasonable range of cluster numbers based on knowledge of the data. This provides a more transparent and tractable process of cluster number reduction. Considering the fuzzy partition matrix U = [u_ij]_{N×C}, each column comprises the membership values of all feature vectors x_i with respect to a single cluster center. Thus an optimal threshold α_j (j = 1, 2, ..., C) for each column should be found to create a harder partition by (12). The number of data points assigned a membership value equal to 1 is identified as the cardinality of the corresponding cluster. According to α_j, the cardinality of the jth column is

\[
M_j = \mathrm{card}\bigl\{\, u_{ij} \mid u_{ij} \ge u_{j\max} - \alpha_j \,\bigr\}. \tag{14}
\]

Here the threshold is not subjectively user-defined; it is established on the balance of uncertainty and can be adjusted automatically during the clustering process. This property of shadowed sets can be used to reduce the cluster number. To control the convergence speed, the decision to delete clusters can be based on thresholds, whose values should be chosen per data set depending on its cluster structure and size. Here a threshold ε and an attrition rate ρ (0 < ρ < 1) are set. The decision to delete clusters in SP-FCM is based solely on cluster cardinality and the threshold ε. If ε is too small, C is reduced slowly and the reduction may stop prematurely before the optimal cluster number is found; if ε is too large, C may be reduced too drastically. In our method, clusters whose cardinalities satisfy M_j < ε are considered "candidates" for removal, and we remove up to ⌊ρ × C⌋ clusters having the lowest cardinality from the pool of candidates specified by ε. Limiting the number of clusters that can be removed at one time prevents C from being reduced too drastically when ε is set too high for a given data set. This automatically estimates the best cluster number while also providing a faster, consistent, and repeatable initialization technique. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account; hence the XB index is adopted.
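The cardinality-based reduction step described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the names `reduce_clusters`, `epsilon`, and `rho` are ours, and the threshold α_j is approximated here by a fixed fraction of the column maximum, whereas SP-FCM optimizes it through the balance equation (12).

```python
import numpy as np

def reduce_clusters(U, epsilon, rho):
    """One cluster-reduction step of SP-FCM (illustrative sketch).

    U is the N x C fuzzy partition matrix. Clusters whose shadowed-set
    cardinality M_j (eq. (14)) falls below epsilon become candidates,
    and at most floor(rho * C) of them (the lowest-cardinality ones)
    are removed. alpha_j is a placeholder fraction of the column
    maximum here; SP-FCM instead optimizes it via eq. (12)."""
    N, C = U.shape
    u_max = U.max(axis=0)                     # u_jmax for each column
    alpha = 0.3 * u_max                       # placeholder threshold
    M = (U >= (u_max - alpha)).sum(axis=0)    # cardinality M_j, eq. (14)
    candidates = np.where(M < epsilon)[0]     # clusters eligible for removal
    budget = max(int(np.floor(rho * C)), 1)   # at least one removal slot
    doomed = candidates[np.argsort(M[candidates])][:budget]
    keep = np.setdiff1d(np.arange(C), doomed)
    return U[:, keep], keep
```

The `budget` line implements the rule that when ⌊ρ × C⌋ is 0 it is raised to 1, so the single lowest-cardinality candidate can still be removed.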

For each C in the range [C_min, C_max], a set of cluster validity indices is calculated, where C_max is the initial cluster number, set to be much larger than the expected cluster number. The partition matrix with C clusters that yields the best aggregate validity index is selected as the final cluster partition. The SP-FCM algorithm is summarized in Algorithm 1.
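To illustrate how a validity index selects the cluster number, the following self-contained sketch runs a plain FCM, rather than the PSO-refined inner loop of SP-FCM, for each candidate C and keeps the partition with the smallest Xie-Beni value. All function names are ours, and the data set is synthetic.

```python
import numpy as np

def fcm(X, C, m=2.0, iters=100, seed=0):
    """Plain FCM (alternating updates of U and the prototypes B).
    SP-FCM replaces this inner optimization with PSO plus
    shadowed-set cluster reduction; this is only an illustration."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), C))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        B = (Um.T @ X) / Um.sum(axis=0)[:, None]           # prototype update
        d = np.linalg.norm(X[:, None] - B[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))                      # membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    return B, U

def xie_beni(X, B, U, m=2.0):
    """Xie-Beni index, eq. (13): S_XB = J_FCM / (N * d_min^2)."""
    d2 = ((X[:, None] - B[None]) ** 2).sum(axis=2)
    J = ((U ** m) * d2).sum()
    sep = ((B[:, None] - B[None]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)
    return J / (len(X) * sep.min())

# Demo: three well-separated Gaussian blobs; the index should be
# smallest near the true cluster count.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(50, 2)) for c in [(0, 0), (3, 0), (0, 3)]])
scores = {C: xie_beni(X, *fcm(X, C)) for C in range(2, 7)}
best_C = min(scores, key=scores.get)
```

Overestimating C and then scanning the validity index downward, as SP-FCM does, follows the same principle while avoiding a separate full run per candidate.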

Computational Intelligence and Neuroscience 5

(1) Initialize C_max and C_min; let C = C_max; set the fuzzifier m (a real number), the iteration counter k = 0, the iteration counter t = 0, and the maximum iteration number T of PSO.
(2) Initialize the population size L, the initial velocity of the particles, the initial position of the particles, c_1, c_2, w, the threshold ε, and the attrition rate ρ.
(3) DO
      Repeat
         (a) Update the partition matrix U^(k) for all particles by (3).
         (b) Calculate the cluster center for each particle by (4).
         (c) Calculate the fitness value for each particle by (7).
         (d) Calculate pbest for each particle.
         (e) Calculate gbest for the swarm.
         (f) Update the velocity and position of each particle by (5).
         (g) t = t + 1.
      Until the PSO termination condition is met (*).
      (i) Calculate the optimal threshold α_j (j = 1, 2, ..., C) for each column of the partition matrix U^(k) by (12), and relocate u_ij (1 ≤ i ≤ N) of the jth cluster according to α_j.
      (ii) Calculate the cardinality M_j for each cluster (1 ≤ j ≤ C) on the basis of the number of data whose membership value equals 1, by (14).
      (iii) Remove all clusters for which M_j < ε and M_j is among the ⌊ρ × C⌋ lowest cardinalities.
      (iv) Update the cluster number C.
      (v) Calculate the cluster validity index S_XB(k) by (13).
      (vi) Update the iteration counter: k = k + 1.
    While the termination condition is not met (**).

(*) The termination condition of PSO in this method is t ≥ T (the maximum number of iterations is reached) or the velocity updates are close to zero over a number of iterations.
(**) The algorithm can terminate under either of the following two conditions: (1) the prototype parameters in B stabilize within some threshold δ; (2) the number of clusters has reached the minimum limit C_min.

Algorithm 1: SP-FCM.
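Step (f) of Algorithm 1 is the standard PSO update referenced as (5). A minimal sketch follows, with the coefficient values w = 0.72 and c1 = c2 = 1.49 that are used later in the experiments; the function name `pso_step` is ours.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One PSO velocity/position update, step (f) of Algorithm 1.

    pos, vel, pbest have shape (L, D) for L particles; gbest has
    shape (D,). Each particle is pulled toward its personal best and
    the swarm's global best with random strengths r1, r2."""
    if rng is None:
        rng = np.random.default_rng()
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel
```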

Here, if ⌊ρ × C⌋ equals 0, we let it be 1; this means that the cluster with the lowest cardinality may be removed. The initial C_max cluster prototypes can be initialized using exemplars from data points selected by β_j = x_⌊(N/C_max)j⌋. After termination, the B and U from the C ∈ [C_min, C_max] with the best cluster validity index S_XB are selected as the final cluster prototype and partition.
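The exemplar-based initialization β_j = x_⌊(N/C_max)j⌋ can be sketched as below; with 0-based arrays the last index must be clamped, and the name `init_prototypes` is ours.

```python
import numpy as np

def init_prototypes(X, c_max):
    """Deterministic exemplar-based initialization:
    beta_j = x_floor((N / C_max) * j) for j = 1..C_max.
    The last index is clamped to N-1 for 0-based arrays."""
    N = len(X)
    idx = [min(int(np.floor((N / c_max) * j)), N - 1)
           for j in range(1, c_max + 1)]
    return X[idx]
```

Because the exemplars are spread evenly through the data, this initialization is repeatable, which is the property the text relies on.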

4. Experimental Results

In this section, the performance of FCM, RCM, shadowed c-means (SCM) [21], shadowed rough c-means (SRCM) [19], and the SP-FCM algorithm is presented on four UCI data sets, four yeast gene expression data sets, and real data. For evaluating the convergence effect, the fundamental criterion can be described as follows: the distance between objects in the same cluster should be as small as possible, and the distance between objects in different clusters should be as large as possible. Here we use the DB index and the Dunn index to evaluate the clustering effect. For a given data set and C value, the higher the similarity within clusters and the intercluster separation, the lower the DB index value; a good clustering procedure should make the DB index as low as possible. Conversely, higher values of the Dunn index indicate better clustering, in the sense that the clusters are well separated and relatively compact. The details of the experiments are given below.
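For reference, the two evaluation criteria can be computed as in the following sketch, which applies the usual definitions of the Davies-Bouldin and Dunn indices to a crisp labeling; the function names are ours.

```python
import numpy as np

def db_index(X, labels):
    """Davies-Bouldin index (lower is better): average over clusters
    of the worst ratio (s_i + s_j) / ||c_i - c_j||, where s_i is the
    mean distance of cluster i's points to its centroid c_i."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatt = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                      for k, c in zip(ks, cents)])
    worst = []
    for i in range(len(ks)):
        worst.append(max((scatt[i] + scatt[j])
                         / np.linalg.norm(cents[i] - cents[j])
                         for j in range(len(ks)) if j != i))
    return float(np.mean(worst))

def dunn_index(X, labels):
    """Dunn index (higher is better): minimum inter-cluster distance
    divided by the maximum cluster diameter."""
    groups = [X[labels == k] for k in np.unique(labels)]
    diam = max(np.linalg.norm(g[:, None] - g[None], axis=2).max()
               for g in groups)
    inter = min(np.linalg.norm(a[:, None] - b[None], axis=2).min()
                for i, a in enumerate(groups) for b in groups[i + 1:])
    return float(inter / diam)
```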

4.1. UCI Data Set. In our experiments four UCI data sets are used: 4-dimensional Iris, 13-dimensional Wine, 10-dimensional Glass, and 34-dimensional Ionosphere. There are 3 clusters in the Iris data set, each with 50 data patterns; 3 clusters in Wine, with 50, 60, and 68 data patterns; 6 clusters in Glass, with 30, 35, 40, 42, 36, and 31 patterns, respectively; and 2 clusters in Ionosphere, with 226 and 125 data patterns. The validity indices of each method are compared in Table 1. SP-FCM can identify compact groups compared to the other algorithms when given the cluster number C. It can also be seen that SRCM and SP-FCM have more obvious advantages than FCM, RCM, and SCM. SP-FCM performs slightly better than SRCM in most cases due to its global search ability, which enables it to converge to an optimum or near-optimum solution. Moreover, the shadowed set- and rough set-based clustering methods, namely SP-FCM, SRCM, RCM, and SCM, perform better than FCM. This implies that the partition of approximation regions can reveal the nature of the data structure, and that only the lower bound and boundary region of each cluster contribute positively to updating the prototypes.


Table 1: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on four UCI data sets.

Index        Algorithm   Iris     Wine     Glass    Ionosphere
DB index     FCM         0.7642   0.8803   2.2971   2.0587
             RCM         0.6875   0.5692   1.9635   1.5434
             SCM         0.6862   0.5327   1.8495   1.4763
             SRCM        0.6613   0.4436   1.5804   1.3971
             SP-FCM      0.6574   0.4328   1.5237   1.4066
Dunn index   FCM         2.3106   2.5834   0.1142   0.8381
             RCM         2.7119   2.8157   0.2637   1.0233
             SCM         2.4801   2.7992   0.3150   1.0319
             SRCM        3.0874   3.1342   0.5108   1.1924
             SP-FCM      3.3254   3.1764   0.4921   1.2605

[Figure 2: XB validity index of the four UCI data sets versus cluster number C. Panels: (a) Iris, (b) Wine, (c) Glass, (d) Ionosphere.]

As usual, the number of clusters is implied by the nature of the problem. Here, with shadowed sets involved, one can anticipate that the optimal number of clusters will be found. The fuzzification coefficient m can be optimized; however, it is common to assume the fixed value 2.0, which is associated with the form of the membership functions of the generated clusters. For testing the SP-FCM algorithm, the rule C ≤ N^(1/2) is adopted, and the range of the expected cluster number is set as: (1) Iris, [C_min = 2, C_max = 12]; (2) Wine, [C_min = 2, C_max = 13]; (3) Glass, [C_min = 2, C_max = 14]; (4) Ionosphere, [C_min = 2, C_max = 16]. The swarm size is set to L = 20, the maximum iteration number of PSO to T = 50, and, for cluster reduction, the cluster cardinality threshold to ε = 10 and the attrition rate to ρ = 0.1. In each cycle we obtain the distribution of every cluster, remove part of the clusters according to their cardinality, and calculate the XB index; the cluster number C varies from C_max to C_min. After the loop ends, the partition with the lowest index value is selected as the final result. Figure 2 presents the validity indices in the process of generating the optimal cluster number.


Table 2: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on four yeast expression data sets.

Index        Algorithm   GDS608   GDS2003   GDS2267   GDS2712
DB index     FCM         2.0861   2.4671    1.5916    1.9526
             RCM         1.6109   2.2104    1.0274    1.2058
             SCM         1.5938   2.1346    0.8946    1.0965
             SRCM        1.3274   1.9523    0.7438    0.7326
             SP-FCM      1.2958   1.8946    0.7962    0.6843
Dunn index   FCM         0.2647   0.2976    0.4208    0.3519
             RCM         0.3789   0.3981    0.7164    0.6074
             SCM         0.3865   0.3775    0.8439    0.6207
             SRCM        0.5126   0.4953    0.9759    0.8113
             SP-FCM      0.5407   0.5026    0.9182    0.8049

Smaller values indicate more compact and well-separated clusters. The validity indices reach their minimum values at C = 3, 3, 6, and 2, respectively, which correspond to the final cluster prototypes and the best partitions. Through the shadowed sets and PSO approaches, the influence of each boundary region on the formation of the prototypes and the clusters can be properly resolved. Although more computing time is required to run SP-FCM, reasonable results can be acquired for processing overlapping and vague data patterns.

4.2. Yeast Gene Expression Data Set. Four yeast gene expression data sets are used in the experiments: GDS608, GDS2003, GDS2267, and GDS2712, downloaded from the Gene Expression Omnibus. GDS608 has 26 classes and 6303 samples; GDS2003 has 23 classes and 5617 samples; GDS2267 has 14 classes and 9275 samples; and GDS2712 has 15 classes and 9275 samples. Table 2 presents the validity indices of the different methods after the cluster number C was given. SP-FCM and SRCM obtain the same effect and perform better than the other clustering algorithms. The improvement can be attributed to the fact that the global search capacity of PSO is conducive to finding more appropriate cluster centers while escaping from local optima.

To obtain the optimal C automatically, we let m = 2.0, c_1 = 1.49, c_2 = 1.49, and w = 0.72, and the rule C ≤ N^(1/2) is adopted. The swarm size is set to L = 20 and the maximum iteration number of PSO to T = 80; for cluster reduction, the range of the expected cluster number, the cluster cardinality threshold ε, and the attrition rate ρ are set as: (1) GDS608, [C_min = 20, C_max = 80], ε = 20, ρ = 0.05; (2) GDS2003, [C_min = 20, C_max = 75], ε = 20, ρ = 0.05; (3) GDS2267, [C_min = 10, C_max = 96], ε = 20, ρ = 0.08; (4) GDS2712, [C_min = 10, C_max = 96], ε = 20, ρ = 0.08. In each cycle we obtain the distribution of every cluster, remove part of the clusters according to their cardinality, and calculate the XB index; the cluster number C varies from C_max to C_min. The partition with the lowest value is selected as the

final result after the loop ends. As seen in Figure 3, for GDS608 the cluster number decreases at a faster rate at the beginning: it takes 26 iterations to reduce the cluster number from C = 80 to C = 30 and 4 iterations from C = 30 to C = 26, and the XB index begins to increase when the cluster number C < 26. For GDS2003 it takes 24 iterations to reduce the cluster number from C = 75 to C = 30 and 7 iterations from C = 30 to C = 23, and the XB index begins to increase when C < 23. For GDS2267 it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 6 iterations from C = 20 to C = 14, and the XB index begins to increase when C < 14. For GDS2712 it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 5 iterations from C = 20 to C = 15, and the XB index begins to increase when C < 15. Here the advantages of fuzzy sets, PSO, and shadowed sets are integrated in SP-FCM, making the algorithm applicable to overlapping partitions and to the uncertainty and vagueness arising from the boundary regions. Moreover, the optimization process in the shadowed sets makes the method robust to outliers, so that the approximation regions of each cluster can be determined accurately and the obtained prototypes approach the desired locations.

4.3. Real Data. In this experiment 10 different packages are tested. Each package is represented by 100 frames captured from different angles by camera, and SIFT feature points, used for training a recognition system, are extracted from each frame. Figure 4 shows some images with their SIFT keypoints. This data set comprises 248,150 descriptors. We let m = 2.0, c_1 = 1.49, c_2 = 1.49, w = 0.72, L = 20, ε = 30, and ρ = 0.01 for SP-FCM, and choose the reasonable range [C_min = 200, C_max = 360] according to the number of package categories and the distribution of keypoints in each image. Eighty iterations of PSO are run for each given C to produce the cluster prototype B and partition matrix U as the starting point for the shadowed sets. Longer PSO stabilization is needed to obtain more stable cluster partitions.


[Figure 3: XB validity index of the four yeast gene expression data sets versus cluster number C. Panels: (a) GDS608, (b) GDS2003, (c) GDS2267, (d) GDS2712.]

Table 3: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on the package data set.

Index        FCM       RCM       SCM       SRCM      SP-FCM
DB index     18.4569   15.9671   14.3194   12.4038   10.7826
Dunn index    9.2647   11.6298   12.5376   16.9422   16.7313

Within each cluster, the optimal α_j decides the cardinality and realizes cluster reduction, and the XB index is calculated. Each C-partition is ranked using this index, and the partition with the smallest index value, indicating the most compact and well-separated clusters, is selected as the final output. At the beginning the cluster number decreases at a faster speed: it takes 26 iterations to reduce the cluster number from C = 360 to C = 289, and 20 iterations from C = 289 to C = 267. The XB index increases at a relatively faster rate when the cluster number C < 267. Figure 5 shows the XB index for C ∈ [267, 289]. The index reaches its minimum value at C = 276, which means the best partition for this data set has 276 clusters. Table 3 exhibits a comparative analysis of the convergence effect. As expected, SP-FCM provides sound results for the real data; the performance is assessed by those validity indices.

5. Conclusions

This paper presents a modified fuzzy c-means algorithm based on particle swarm optimization and shadowed sets to perform unsupervised feature clustering. The algorithm, called SP-FCM, utilizes the global search property of PSO and the vagueness balance property of shadowed sets, so that it can estimate the optimal cluster number as it runs through its alternating optimization process. SP-FCM, as a randomized approach, can alleviate the problems faced by FCM, which suffers from sensitivity to initialization and from falling into local minima. Moreover, the algorithm avoids subjective and somewhat arbitrary trials to estimate the appropriate cluster number, and it enhances the capability to find the optimal cluster number within a specific


[Figure 4: Ten package images with SIFT features; panels (a)-(j) contain 644, 460, 742, 442, 313, 602, 373, 724, 539, and 124 feature points, respectively.]

[Figure 5: XB validity index of the bag data set versus cluster number C, for C ∈ [267, 289].]

range, using cluster validity measures as indicators. The use of the XB validity index allows the algorithm to find the optimal cluster number with cluster partitions that provide compact and well-separated clusters. The experiments show that the SP-FCM algorithm produces good results with reference to the DB and Dunn indices, especially in the high-dimensional and large-data cases.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant no. 61105089, NSFC).

References

[1] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.

[2] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering," ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 1, article 1, 2009.

[3] P. Melin and O. Castillo, "A review on type-2 fuzzy logic applications in clustering, classification and pattern recognition," Applied Soft Computing Journal, vol. 21, pp. 568–577, 2014.

[4] M. Yuwono, S. W. Su, B. D. Moulton, and H. T. Nguyen, "Data clustering using variants of rapid centroid estimation," IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 366–377, 2014.

[5] S. C. Satapathy, G. Pradhan, S. Pattnaik, J. V. R. Murthy, and P. V. G. D. P. Reddy, "Performance comparisons of PSO based clustering," InterJRI Computer Science and Networking, vol. 1, no. 1, pp. 18–23, 2009.

[6] G. Peters and R. Weber, "Dynamic clustering with soft computing," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 226–236, 2012.

[7] D. T. Anderson, J. C. Bezdek, M. Popescu, and J. M. Keller, "Comparing fuzzy, probabilistic, and possibilistic partitions," IEEE Transactions on Fuzzy Systems, vol. 18, no. 5, pp. 906–918, 2010.

[8] P. Maji and S. Paul, "Robust rough-fuzzy C-means algorithm: design and applications in coding and non-coding RNA expression data clustering," Fundamenta Informaticae, vol. 124, no. 1-2, pp. 153–174, 2013.

[9] J. Bezdek, Fuzzy mathematics in pattern classification [Ph.D. thesis], Cornell University, Ithaca, NY, USA, 1974.

[10] V. Olman, F. Mao, H. Wu, and Y. Xu, "Parallel clustering algorithm for large data sets with applications in bioinformatics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 344–352, 2009.

[11] J. C. Bezdek and R. J. Hathaway, "Optimization of fuzzy clustering criteria using genetic algorithms," in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 589–594, June 1994.

[12] T. A. Runkler, "Ant colony optimization of clustering models," International Journal of Intelligent Systems, vol. 20, no. 12, pp. 1233–1251, 2005.

[13] K. S. Al-Sultan and S. Z. Selim, "A global algorithm for the fuzzy clustering problem," Pattern Recognition, vol. 26, no. 9, pp. 1357–1361, 1993.

[14] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43, Nagoya, Japan, October 1995.

[15] T. A. Runkler and C. Katz, "Fuzzy clustering by particle swarm optimization," in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 601–608, Canada, July 2006.

[16] C.-F. Juang, C.-M. Hsiao, and C.-H. Hsu, "Hierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimization," IEEE Transactions on Fuzzy Systems, vol. 18, no. 1, pp. 14–26, 2010.

[17] H. Izakian and A. Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem," Expert Systems with Applications, vol. 38, no. 3, pp. 1835–1838, 2011.

[18] W. Pedrycz, "Shadowed sets: representing and processing fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 103–109, 1998.

[19] J. Zhou, W. Pedrycz, and D. Miao, "Shadowed sets in the characterization of rough-fuzzy clustering," Pattern Recognition, vol. 44, no. 8, pp. 1738–1749, 2011.

[20] W. Pedrycz, "From fuzzy sets to shadowed sets: interpretation and computing," International Journal of Intelligent Systems, vol. 24, no. 1, pp. 48–61, 2009.

[21] S. Mitra, W. Pedrycz, and B. Barman, "Shadowed c-means: integrating fuzzy and rough clustering," Pattern Recognition, vol. 43, no. 4, pp. 1282–1291, 2010.

[22] J. C. Bezdek, J. M. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, vol. 4 of The Handbooks of Fuzzy Sets, Springer, New York, NY, USA, 1999.

[23] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224–227, 1979.

[24] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[25] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.

[26] S. C. Satapathy, S. K. Patnaik, C. D. P. Dash, and S. Sahoo, "Data clustering using modified fuzzy-PSO (MFPSO)," in Proceedings of the 5th Multi-Disciplinary International Workshop on Artificial Intelligence, Hyderabad, India, 2011.

[27] M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.

[28] R. Krishnapuram and J. M. Keller, "The possibilistic C-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385–393, 1996.



For a fuzzy set with a discrete membership function, the balance equation is modified as

$$O(\alpha_j) = \Biggl|\, \sum_{u_{ij} \le \alpha_j} u_{ij} \;+ \sum_{u_{ij} \ge u_{j\max} - \alpha_j} \bigl(u_{j\max} - u_{ij}\bigr) \;-\; \mathrm{card}\bigl\{u_{ij} \mid \alpha_j < u_{ij} < u_{j\max} - \alpha_j\bigr\} \,\Biggr|. \tag{11}$$

To find the best α_j, the following optimization problem should be solved:

$$\alpha_j = \arg\min_{\alpha_j} O(\alpha_j), \tag{12}$$

where u_ij ∈ [0, 1] is the membership of x_i in the cluster with prototype β_j, u_jmax and u_jmin denote the highest and lowest membership values of the jth cluster, and α_j is the threshold of the jth cluster. The range of feasible values of the threshold α_j is [u_jmin, (u_jmin + u_jmax)/2] [19].

This approach considers all membership values with respect to a fixed cluster when updating the prototype of that cluster. The main merits of shadowed sets are the optimization mechanism for choosing a separate threshold for each cluster and the reduction of the burden of plain numeric computations.
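The threshold optimization of (12) can be approximated by a simple grid search over the feasible range. The sketch below assumes the discrete balance (11) is evaluated per column of the partition matrix; `optimal_alpha` and its step count are our names, not the authors' implementation.

```python
import numpy as np

def optimal_alpha(u, steps=200):
    """Grid search for the threshold alpha_j minimizing the discrete
    balance O(alpha_j) of eq. (11) for one column u of the partition
    matrix. Feasible range: [u_min, (u_min + u_max) / 2]."""
    u_max, u_min = u.max(), u.min()
    best_a, best_o = u_min, np.inf
    for a in np.linspace(u_min, (u_min + u_max) / 2, steps):
        reduced = u[u <= a].sum()                     # memberships sent to 0
        elevated = (u_max - u[u >= u_max - a]).sum()  # cost of raising to u_jmax
        shadow = np.sum((u > a) & (u < u_max - a))    # size of the shadow
        o = abs(reduced + elevated - shadow)
        if o < best_o:
            best_a, best_o = a, o
    return best_a
```

The balance drives α_j to the point where the total membership mass eliminated or elevated equals the size of the shadow region, which is the sense in which the threshold is "established on the balance of uncertainty."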

2.5. XB Clustering Validity Index. The clustering algorithms described above require prespecification of the number of clusters, and the partition results depend on the choice of C. There exist validity indices to evaluate the goodness of clustering for a given number of clusters; these indices can therefore be used to acquire the optimal value of C [27].

The XB index presents a fuzzy-validity criterion based on a validity function that identifies overall compact and separate fuzzy c-partitions. This function depends upon the data set, the geometric distance measure, the distances between cluster centroids, and the fuzzy partition, irrespective of the fuzzy algorithm used. For evaluating the goodness of the data partition, both cluster compactness and intercluster separation should be taken into account. For the FCM algorithm with m = 2.0, the Xie-Beni index can be shown to be

$$S_{\mathrm{XB}} = \frac{J_{\mathrm{FCM}}}{N \, d_{\min}^{2}}, \tag{13}$$

where $d_{\min} = \min_{i \ne j} \|\beta_i - \beta_j\|$ is the minimum distance between cluster centroids. The more separate the clusters, the larger d_min and the smaller S_XB.

3. Shadowed Sets-Based PSO-Fuzzy Clustering: SP-FCM

FCM strives to find C compact clusters in X, where C is one of the specified parameters. But the process of selecting and adjusting C manually to obtain desirable cluster partitions in a given data set is very subjective and somewhat arbitrary. To seek the optimal cluster structure, C is always allowed to be overestimated [28], such that the distances between some clusters are not large enough, or the membership values of some objects in different clusters are adjacent and ambiguous; in this case, modifying the prototypes through lengthy iteration is meaningless.

The main subject of cluster validation is the evaluation ofclustering results to find the partitioning that best fits the dataset Based on the foregoing algorithms we wish to find clusterpartitions that contain compact and well-separated clustersIn our algorithm 119862 is also overestimated and the clusterscompete for data membership We can set [119862min 119862max] as thereasonable range of cluster number based on the knowledgeof the data This provides a more transparent and tractableprocess of cluster number reduction Considering the fuzzypartition matrix 119880 = [119906

119894119895]119873times119862

each column is comprisedof the membership values of all feature vectors 119909

119894with a

single cluster center Thus an optimal threshold 120572119895(119895 =

1 2 119862) for each column should be found to create a harderpartition by (12) The amount of data which are assignedmembership value equal to 1 is identified as the cardinalityof corresponding cluster According to 120572

119895 the cardinality of

the 119895th column is

119872119895= card 119906

119894119895| 119906119894119895ge 119906119895maxminus 120572119895 (14)

Here the threshold is not subjectively user-defined butit is established on the balance of uncertainty and canbe adjusted automatically in the clustering process Thisproperty of shadowed sets can be used to reduce the clusternumber In order to control the convergence speed thedecision to delete clusters can be based on some thresholdsDifferent threshold values should be set for different data setsdepending on the cluster structure and size of data sets Herea threshold 120576 and attrition rate 120588 (0 lt 120588 lt 1) are set Thedecision to delete clusters in SP-FCM is based solely on clustercardinality and the threshold 120576 If 120576 is too small 119862 is reducedmore slowly and it may stop prematurely before the optimalcluster number is found On the other hand if 120576 is too large119862 may be reduced too drastically In our method clusterswhose cardinalities 119872

119895lt 120576 are considered as ldquocandidatesrdquo

for removal And we can remove up to lfloor120588 times 119862rfloor clustershaving the lowest cardinality from the pool of candidatesspecified by 120576 Limiting the number of clusters that can beremoved at one time prevents 119862 from being reduced toodrastically when 120576 is set too high for a given data set Thiswould automatically estimate the best cluster number whilealso utilizing a faster consistent and repeatable initializationtechnique For evaluating the goodness of the data partitionboth cluster compactness and intercluster separation shouldbe taken into account Hence the XB index is adopted

For each 119862 in the range of [119862min 119862max] a set of clustervalidity indexes were calculated where 119862max is the initialcluster number which is set to be much larger than theexpected cluster numberThe partitionmatrix with119862 clusterswith the best aggregate validity index is selected as the finalcluster partitionThe SP-FCM algorithm is summarized as inAlgorithm 1

Computational Intelligence and Neuroscience 5

(1) Initialize C_max and C_min; let C = C_max; set the real number m (the fuzzifier), the iteration counters k = 0 and t = 0, and the maximum iteration number T of PSO.
(2) Initialize the population size L, the initial velocities of the particles, the initial positions of the particles, c1, c2, w, the threshold ε, and the attrition rate ρ.
(3) DO
      Repeat
        (a) Update the partition matrix U(k) for all particles by (3)
        (b) Calculate the cluster centers for each particle by (4)
        (c) Calculate the fitness value of each particle by (7)
        (d) Calculate pbest for each particle
        (e) Calculate gbest for the swarm
        (f) Update the velocity and position of each particle by (5)
        (g) t = t + 1
      Until the PSO termination condition is met (*)
      (i) Calculate the optimal threshold α_j (j = 1, 2, ..., C) for each column of the partition matrix U(k) by (12), and relocate u_ij (1 ≤ i ≤ N) of the jth cluster according to α_j
      (ii) Calculate the cardinality M_j (1 ≤ j ≤ C) of each cluster by (14), on the basis of the number of data points whose membership value equals 1
      (iii) Remove every cluster whose M_j < ε and whose M_j is among the ⌊ρ × C⌋ lowest cardinalities
      (iv) Update the cluster number C
      (v) Calculate the cluster validity index S_XB(k) by (13)
      (vi) Update the iteration counter k = k + 1
    While the termination condition is not met (**)

(*) The PSO termination condition is t ≥ T (the maximum number of iterations is reached) or the velocity updates remaining close to zero over a number of iterations.
(**) The algorithm terminates under either of the following two conditions: (1) the prototype parameters in B stabilize within some threshold δ; (2) the number of clusters reaches the minimum limit C_min.

Algorithm 1: SP-FCM.
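Step (f) of Algorithm 1 is the standard PSO velocity/position update referenced as (5). The sketch below is an illustrative implementation, not the authors' code; the array shapes and the function name `pso_step` are assumptions, and the constants follow the settings reported later in the paper (w = 0.72, c1 = c2 = 1.49).

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(positions, velocities, pbest, gbest, w=0.72, c1=1.49, c2=1.49):
    """One velocity/position update for a swarm of candidate prototype sets.

    positions, velocities, pbest: arrays of shape (L, C, d) -- L particles,
    each encoding C cluster centers in d dimensions; gbest: shape (C, d),
    broadcast across the swarm. Parameter values are the paper's settings.
    """
    r1 = rng.random(positions.shape)  # independent uniform randoms per component
    r2 = rng.random(positions.shape)
    velocities = (w * velocities
                  + c1 * r1 * (pbest - positions)
                  + c2 * r2 * (gbest - positions))
    positions = positions + velocities
    return positions, velocities
```

With zero initial velocity and a particle already at both its personal and global best, the update leaves it in place, as expected of the canonical rule.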

Here, if ⌊ρ × C⌋ equals 0, we set it to 1, which means that the cluster with the lowest cardinality may still be removed. The initial C_max cluster prototypes can be initialized using exemplars drawn from the data points, selected by β_j = x_⌊(N/C_max)j⌋.
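The exemplar initialization and the cardinality-based pruning rule can be sketched as follows. `init_prototypes` and `prune_clusters` are hypothetical helper names, and the 0-based reading of β_j = x_⌊(N/C_max)j⌋ is an assumption about the indexing convention.

```python
import numpy as np

def init_prototypes(X, c_max):
    """Pick evenly spaced exemplars beta_j = x_{floor((N/c_max)*j)} as the
    initial C_max cluster prototypes (0-based indexing assumed)."""
    N = len(X)
    idx = [int(np.floor(N / c_max * j)) for j in range(c_max)]
    return X[idx]

def prune_clusters(cardinality, eps, rho):
    """Indices of surviving clusters: drop a cluster only if its cardinality
    M_j < eps AND it ranks among the floor(rho*C) lowest; floor(rho*C) is
    treated as at least 1, so the weakest cluster can always be removed."""
    C = len(cardinality)
    k = max(1, int(np.floor(rho * C)))
    lowest = set(np.argsort(cardinality)[:k])  # indices of the k smallest M_j
    return [j for j in range(C) if not (cardinality[j] < eps and j in lowest)]
```

For example, with cardinalities [5, 50, 2, 40], ε = 10, and ρ = 0.25, only the weakest cluster (index 2) is removed even though cluster 0 is also below the threshold.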

After termination, the B and U from C ∈ [C_min, C_max] with the best cluster validity index S_XB are selected as the final cluster prototype and partition.
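Equation (12) is not reproduced in this excerpt; steps (i) and (ii) of Algorithm 1 can nonetheless be sketched under Pedrycz's standard balance criterion for shadowed sets [18, 20], in which α balances the membership mass lost by reduction and gained by elevation against the size of the shadow region. The grid search and function names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def optimal_alpha(u, grid=200):
    """Hedged stand-in for Eq. (12): choose alpha in (0, 0.5) minimizing
    | sum_{u_i <= a} u_i + sum_{u_i >= 1-a} (1 - u_i) - #{i: a < u_i < 1-a} |,
    i.e., Pedrycz's vagueness-balance criterion, via a simple grid search."""
    u = np.asarray(u, dtype=float)
    candidates = np.linspace(1e-6, 0.5 - 1e-6, grid)
    def balance(a):
        reduced = u[u <= a].sum()                      # mass removed
        elevated = (1.0 - u[u >= 1.0 - a]).sum()       # mass added
        shadow = np.count_nonzero((u > a) & (u < 1.0 - a))
        return abs(reduced + elevated - shadow)
    return min(candidates, key=balance)

def relocate(u, alpha):
    """Step (i): elevate memberships >= 1-alpha to 1, reduce those <= alpha
    to 0, and leave the shadow region untouched; the cluster cardinality
    M_j of step (ii) is then the count of memberships equal to 1."""
    u = np.asarray(u, dtype=float).copy()
    u[u >= 1.0 - alpha] = 1.0
    u[u <= alpha] = 0.0
    return u
```

Applied column-wise to U, `relocate` yields the crisp core whose size is the cardinality M_j used for pruning.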

4 Experimental Results

In this section the performance of FCM, RCM, shadowed c-means (SCM) [21], shadowed rough c-means (SRCM) [19], and the SP-FCM algorithm is presented on four UCI data sets, four yeast gene expression data sets, and real data. For evaluating the convergence effect, the fundamental criterion can be described as follows: the distance between objects in the same cluster should be as small as possible, and the distance between objects in different clusters should be as large as possible. Here we use the DB index and the Dunn index to evaluate the clustering effect. For a given data set and C value, the higher the similarity within the clusters and the intercluster separation, the lower the DB index value; a good clustering procedure should make the DB index as low as possible. Conversely, higher values of the Dunn index indicate better clustering, in the sense that the clusters are well separated and relatively compact. The details of the experiments follow.
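For reference, the two indices can be computed from a crisp labeling as follows, using their standard definitions [23]; the function names and the centroid argument are ours, not the paper's notation.

```python
import numpy as np
from itertools import combinations

def db_index(X, labels, centers):
    """Davies-Bouldin index: mean over clusters of the worst ratio of
    within-cluster scatter sum to between-center distance. Lower is better."""
    C = len(centers)
    s = np.array([np.mean(np.linalg.norm(X[labels == j] - centers[j], axis=1))
                  for j in range(C)])  # average scatter of each cluster
    worst = []
    for j in range(C):
        ratios = [(s[j] + s[k]) / np.linalg.norm(centers[j] - centers[k])
                  for k in range(C) if k != j]
        worst.append(max(ratios))
    return float(np.mean(worst))

def dunn_index(X, labels):
    """Dunn index: minimum inter-cluster distance divided by maximum
    cluster diameter. Higher is better."""
    clusters = [X[labels == j] for j in np.unique(labels)]
    def min_dist(A, B):
        return np.min(np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2))
    min_sep = min(min_dist(A, B) for A, B in combinations(clusters, 2))
    max_diam = max(np.max(np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2))
                   for A in clusters)
    return float(min_sep / max_diam)
```

On two tight, well-separated clusters the DB index is small and the Dunn index large, matching the interpretation above.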

4.1. UCI Data Sets. In our experiments four UCI data sets are used: the 4-dimensional Iris, 13-dimensional Wine, 10-dimensional Glass, and 34-dimensional Ionosphere. Iris contains 3 clusters of 50 patterns each; Wine contains 3 clusters of 50, 60, and 68 patterns; Glass contains 6 clusters of 30, 35, 40, 42, 36, and 31 patterns; and Ionosphere contains 2 clusters of 226 and 125 patterns. The validity indices of each method are compared in Table 1. SP-FCM identifies compact groups compared to the other algorithms when given the cluster number C. It can also be seen that SRCM and SP-FCM have more obvious advantages than FCM, RCM, and SCM. SP-FCM performs slightly better than SRCM in most cases due to its global search ability, which enables it to converge to an optimal or near-optimal solution. Moreover, the shadowed set- and rough set-based clustering methods, namely SP-FCM, SRCM, RCM, and SCM, perform better than FCM. This implies that the partition into approximation regions can reveal the nature of the data structure, and that only the lower bound and boundary region of each cluster contribute positively to updating the prototypes.


Table 1: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on four UCI data sets.

Index       Algorithm   Iris     Wine     Glass    Ionosphere
DB index    FCM         0.7642   0.8803   2.2971   2.0587
            RCM         0.6875   0.5692   1.9635   1.5434
            SCM         0.6862   0.5327   1.8495   1.4763
            SRCM        0.6613   0.4436   1.5804   1.3971
            SP-FCM      0.6574   0.4328   1.5237   1.4066
Dunn index  FCM         2.3106   2.5834   0.1142   0.8381
            RCM         2.7119   2.8157   0.2637   1.0233
            SCM         2.4801   2.7992   0.3150   1.0319
            SRCM        3.0874   3.1342   0.5108   1.1924
            SP-FCM      3.3254   3.1764   0.4921   1.2605

Figure 2: XB validity index of the four UCI data sets versus cluster number C: (a) Iris, (b) Wine, (c) Glass, (d) Ionosphere. (Each panel plots Xie-Beni validity against the number of clusters.)

As usual, the number of clusters is implied by the nature of the problem. Here, with shadowed sets involved, one can anticipate that the optimal number of clusters will be found. The fuzzification coefficient m can be optimized; however, it is common to assume a fixed value of 2.0, which is associated with the form of the membership functions of the generated clusters. For testing the SP-FCM algorithm, the rule C ≤ N^(1/2) is adopted, and the range of the expected cluster number is set as: (1) Iris, [C_min = 2, C_max = 12]; (2) Wine, [C_min = 2, C_max = 13]; (3) Glass, [C_min = 2, C_max = 14]; (4) Ionosphere, [C_min = 2, C_max = 16]. The swarm size is L = 20, the maximum PSO iteration number is T = 50, and, for cluster reduction, the cluster cardinality threshold is ε = 10 and the attrition rate is ρ = 0.1. In each cycle we obtain the distribution of every cluster, remove part of them according to their cardinality, and calculate the XB index, so the cluster number C varies from C_max to C_min. After the loop ends, the partition with the lowest value is selected as the final result. Figure 2 presents the validity indices during the process of generating the optimal cluster number. Smaller


Table 2: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on four yeast expression data sets.

Index       Algorithm   GDS608   GDS2003  GDS2267  GDS2712
DB index    FCM         2.0861   2.4671   1.5916   1.9526
            RCM         1.6109   2.2104   1.0274   1.2058
            SCM         1.5938   2.1346   0.8946   1.0965
            SRCM        1.3274   1.9523   0.7438   0.7326
            SP-FCM      1.2958   1.8946   0.7962   0.6843
Dunn index  FCM         0.2647   0.2976   0.4208   0.3519
            RCM         0.3789   0.3981   0.7164   0.6074
            SCM         0.3865   0.3775   0.8439   0.6207
            SRCM        0.5126   0.4953   0.9759   0.8113
            SP-FCM      0.5407   0.5026   0.9182   0.8049

values indicate more compact and well-separated clusters. The validity indices reach their minimum values at C = 3, 3, 6, and 2, respectively, which correspond to the final cluster prototypes and the best partitions. Through the shadowed set and PSO approaches, the influence of each boundary region on the formation of the prototypes and the clusters can be properly resolved. Although more computing time is required to run SP-FCM, reasonable results can be acquired when processing overlapping and vague data patterns.
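The selection criterion S_XB is, in its standard form [24], the ratio of fuzzy compactness to minimum prototype separation. A minimal sketch, with our own function name and matrix conventions:

```python
import numpy as np

def xie_beni(X, U, V, m=2.0):
    """Xie-Beni validity index: sum_j sum_i u_ij^m ||x_i - v_j||^2 divided
    by N * min_{j != k} ||v_j - v_k||^2. Smaller values indicate more
    compact and better-separated partitions.
    X: (N, d) data, U: (C, N) fuzzy membership matrix, V: (C, d) prototypes."""
    d2 = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2  # (C, N)
    compactness = np.sum((U ** m) * d2)
    C = len(V)
    sep = min(np.sum((V[j] - V[k]) ** 2)
              for j in range(C) for k in range(C) if j != k)
    return float(compactness / (len(X) * sep))
```

A perfectly crisp partition of well-separated points scores 0; softening the memberships raises the index, which is why the minimum over C marks the best partition.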

4.2. Yeast Gene Expression Data Sets. Four yeast gene expression data sets are used in the experiments: GDS608, GDS2003, GDS2267, and GDS2712, downloaded from the Gene Expression Omnibus. GDS608 has 26 classes and 6303 samples; GDS2003 has 23 classes and 5617 samples; GDS2267 has 14 classes and 9275 samples; and GDS2712 has 15 classes and 9275 samples. Table 2 presents the validity indices of the different methods once the cluster number C is given. SP-FCM and SRCM achieve comparable results and perform better than the other clustering algorithms. The improvement can be attributed to the fact that the global search capacity of PSO is conducive to finding more appropriate cluster centers while escaping from local optima.

For finding the optimum C automatically, we let m = 2.0, c1 = 1.49, c2 = 1.49, and w = 0.72, and the rule C ≤ N^(1/2) is adopted. The swarm size is L = 20, the maximum PSO iteration number is T = 80, and, for cluster reduction, the range of the expected cluster number, the cluster cardinality threshold ε, and the attrition rate ρ are set as: (1) GDS608, [C_min = 20, C_max = 80], ε = 20, ρ = 0.05; (2) GDS2003, [C_min = 20, C_max = 75], ε = 20, ρ = 0.05; (3) GDS2267, [C_min = 10, C_max = 96], ε = 20, ρ = 0.08; (4) GDS2712, [C_min = 10, C_max = 96], ε = 20, ρ = 0.08. In each cycle we obtain the distribution of every cluster, remove part of them according to their cardinality, and calculate the XB index, so the cluster number C varies from C_max to C_min. The partition with the lowest value is selected as the

final result after the loop ends. As seen in Figure 3, for GDS608 the cluster number decreases at a faster rate at the beginning: it takes 26 iterations to reduce the cluster number from C = 80 to C = 30 and 4 iterations from C = 30 to C = 26, and the XB index begins to increase when C < 26. For GDS2003 it takes 24 iterations to reduce the cluster number from C = 75 to C = 30 and 7 iterations from C = 30 to C = 23, and the XB index begins to increase when C < 23. For GDS2267 it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 6 iterations from C = 20 to C = 14, and the XB index begins to increase when C < 14. For GDS2712 it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 5 iterations from C = 20 to C = 15, and the XB index begins to increase when C < 15. Here the advantages of fuzzy sets, PSO, and shadowed sets are integrated in SP-FCM, making the algorithm applicable to overlapping partitions and to the uncertainty and vagueness arising from the boundary regions; moreover, the optimization process in the shadowed sets makes the method robust to outliers, so that the approximation regions of each cluster can be determined accurately and the obtained prototypes approach the desired locations.

4.3. Real Data. In this experiment, 10 different packages are tested. Each package is represented by 100 frames captured from different angles by a camera, and from each frame SIFT feature points are extracted and used for training a recognition system. Figure 4 shows some images with their SIFT keypoints. This data set comprises 248150 descriptors. We let m = 2.0, c1 = 1.49, c2 = 1.49, w = 0.72, L = 20, ε = 30, and ρ = 0.01 for SP-FCM, and choose the reasonable range [C_min = 200, C_max = 360] according to the number of package categories and the distribution of keypoints in each image. Eighty iterations of PSO are run for each given C to produce the cluster prototype B and partition matrix U as the starting point for the shadowed sets. Longer PSO stabilization is needed to obtain more stable cluster partitions.


Figure 3: XB validity index of the four yeast gene expression data sets versus cluster number C: (a) GDS608, (b) GDS2003, (c) GDS2267, (d) GDS2712. (Each panel plots Xie-Beni validity against the number of clusters.)

Table 3: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on the package data set.

Index        FCM       RCM       SCM       SRCM      SP-FCM
DB index     18.4569   15.9671   14.3194   12.4038   10.7826
Dunn index    9.2647   11.6298   12.5376   16.9422   16.7313

Within each cluster the optimal α_j determines the cardinality and realizes cluster reduction, and the XB index is calculated. Each C-partition is ranked using this index, and the one with the smallest index value, indicating the most compact and well-separated clusters, is selected as the final output. At the beginning the cluster number decreases at a faster speed: it takes 26 iterations to reduce the cluster number from C = 360 to C = 289 and 20 iterations from C = 289 to C = 267. The XB index increases at a relatively faster rate when the cluster number C < 267. Figure 5 shows the XB index for C ∈ [267, 289]. The index reaches its minimum value at C = 276, which means the best partition for this data set is 276 clusters. Table 3 exhibits a comparative analysis of the convergence effect. As expected, SP-FCM provides sound results for the real data; the performance is assessed by those validity indices.

5 Conclusions

This paper presents a modified fuzzy c-means algorithm based on particle swarm optimization and shadowed sets to perform unsupervised feature clustering. This algorithm, called SP-FCM, utilizes the global search property of PSO and the vagueness balance property of shadowed sets, so that it can estimate the optimal cluster number as it runs through its alternating optimization process. As a randomized approach, SP-FCM can alleviate the problems faced by FCM, namely its sensitivity to initialization and its tendency to fall into local minima. Moreover, the algorithm avoids subjective and somewhat arbitrary trials to estimate the appropriate cluster number, and it enhances the capability to find the optimal cluster number within a specific


Figure 4: Ten package images with SIFT features ((a) 644, (b) 460, (c) 742, (d) 442, (e) 313, (f) 602, (g) 373, (h) 724, (i) 539, and (j) 124 features).

Figure 5: XB validity index of the bag data set versus cluster number C, for C from 267 to 289.

range using cluster validity measures as indicators. The use of the XB validity index allows the algorithm to find the optimal cluster number with cluster partitions that provide compact and well-separated clusters. The experiments show that the SP-FCM algorithm produces good results with respect to the DB and Dunn indices, especially for high-dimensional and large data cases.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant no. 61105089, NSFC).

References

[1] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.

[2] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering," ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 1, article 1, 2009.

[3] P. Melin and O. Castillo, "A review on type-2 fuzzy logic applications in clustering, classification and pattern recognition," Applied Soft Computing Journal, vol. 21, pp. 568–577, 2014.

[4] M. Yuwono, S. W. Su, B. D. Moulton, and H. T. Nguyen, "Data clustering using variants of rapid centroid estimation," IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 366–377, 2014.

[5] S. C. Satapathy, G. Pradhan, S. Pattnaik, J. V. R. Murthy, and P. V. G. D. P. Reddy, "Performance comparisons of PSO based clustering," InterJRI Computer Science and Networking, vol. 1, no. 1, pp. 18–23, 2009.

[6] G. Peters and R. Weber, "Dynamic clustering with soft computing," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 226–236, 2012.

[7] D. T. Anderson, J. C. Bezdek, M. Popescu, and J. M. Keller, "Comparing fuzzy, probabilistic, and possibilistic partitions," IEEE Transactions on Fuzzy Systems, vol. 18, no. 5, pp. 906–918, 2010.

[8] P. Maji and S. Paul, "Robust rough-fuzzy C-means algorithm: design and applications in coding and non-coding RNA expression data clustering," Fundamenta Informaticae, vol. 124, no. 1-2, pp. 153–174, 2013.

[9] J. Bezdek, Fuzzy mathematics in pattern classification [Ph.D. thesis], Cornell University, New York, NY, USA, 1974.

[10] V. Olman, F. Mao, H. Wu, and Y. Xu, "Parallel clustering algorithm for large data sets with applications in bioinformatics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 344–352, 2009.

[11] J. C. Bezdek and R. J. Hathaway, "Optimization of fuzzy clustering criteria using genetic algorithms," in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 589–594, June 1994.

[12] T. A. Runkler, "Ant colony optimization of clustering models," International Journal of Intelligent Systems, vol. 20, no. 12, pp. 1233–1251, 2005.

[13] K. S. Al-Sultan and S. Z. Selim, "A global algorithm for the fuzzy clustering problem," Pattern Recognition, vol. 26, no. 9, pp. 1357–1361, 1993.

[14] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43, Nagoya, Japan, October 1995.

[15] T. A. Runkler and C. Katz, "Fuzzy clustering by particle swarm optimization," in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 601–608, July 2006.

[16] C.-F. Juang, C.-M. Hsiao, and C.-H. Hsu, "Hierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimization," IEEE Transactions on Fuzzy Systems, vol. 18, no. 1, pp. 14–26, 2010.

[17] H. Izakian and A. Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem," Expert Systems with Applications, vol. 38, no. 3, pp. 1835–1838, 2011.

[18] W. Pedrycz, "Shadowed sets: representing and processing fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 103–109, 1998.

[19] J. Zhou, W. Pedrycz, and D. Miao, "Shadowed sets in the characterization of rough-fuzzy clustering," Pattern Recognition, vol. 44, no. 8, pp. 1738–1749, 2011.

[20] W. Pedrycz, "From fuzzy sets to shadowed sets: interpretation and computing," International Journal of Intelligent Systems, vol. 24, no. 1, pp. 48–61, 2009.

[21] S. Mitra, W. Pedrycz, and B. Barman, "Shadowed c-means: integrating fuzzy and rough clustering," Pattern Recognition, vol. 43, no. 4, pp. 1282–1291, 2010.

[22] J. C. Bezdek, J. M. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, vol. 4 of The Handbooks of Fuzzy Sets, Springer, New York, NY, USA, 1999.

[23] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224–227, 1978.

[24] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[25] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.

[26] S. C. Satapathy, S. K. Patnaik, C. D. P. Dash, and S. Sahoo, "Data clustering using modified fuzzy-PSO (MFPSO)," in Proceedings of the 5th Multi-Disciplinary International Workshop on Artificial Intelligence, Hyderabad, India, 2011.

[27] M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.

[28] R. Krishnapuram and J. M. Keller, "The possibilistic C-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385–393, 1996.


Page 5: Research Article An Improved Fuzzy c-Means Clustering ...

Computational Intelligence and Neuroscience 5

(1) Initialize 119862max and 119862min let 119862 = 119862max the real number119898 iteration counter 119896 = 0 iteration counter 119905 = 0 maximumiteration number 119879 of PSO

(2) Initialize the population size 119871 the initial velocity of particles the initial position of particles 1198881 1198882 119908 the threshold 120576

and attrition rate 120588(3)DO

Repeat (a) Update partition matrix 119880(119896)for all particles by (3)(b) Calculate the cluster center for each particle by (4)(c) Calculate the fitness value for each particle by (7)(d) Calculate 119901119887119890119904119905 for each particle(e) Calculate 119892119887119890119904119905 for the swarm(f) Update the velocity and position of each particle by (5)(g) 119905 = 119905 + 1

Until PSO termination condition is met (lowast)(i) Calculate the optimal threshold 120572

119895(119895 = 1 2 119862) for each column of partition matrix 119880(119896) by (12)

and relocate 119906119894119895(1 le 119894 le 119873) of 119895th cluster according to 120572

119895

(ii) Calculate cardinality119872119895for each cluster on the basis of the number of data whose membership value equal to 1 by (14)

1 le 119895 le 119862

(iii) Remove all clusters whose119872119895lt 120576 and119872

119895is among lfloor120588 times 119862rfloor lowest cardinality

(iv) Update cluster number C(v) Calculate cluster validity index 119878XB(119896) by (13)(vi) Update iteration counter 119896 = 119896 + 1

While termination condition is not met (lowastlowast)(lowast) The termination condition of PSO in this method is 119905 ge 119879 (reach the maximum number of iterations) or the velocity

updates are close to zero over a number of iterations(lowastlowast) The algorithm can terminate under either of the following two conditions(1) The prototype parameters in 119861 stabilize within some threshold 120575(2) The number of clusters has reached the minimum limit 119862min

Algorithm 1 SP-FCM

Here if lfloor120588 times 119862rfloor is equal to 0 we can let it to be1 This means that the cluster with the lowest cardinalitymay be removed The initial 119862max cluster prototypes canbe initialized using exemplars from data points selected by120573119895= 119909lfloor(119873119862max)119895rfloor

After termination the 119861 and 119880 from119862 isin [119862min 119862max] with the best cluster validity index 119878XB areselected as the final cluster prototype and partition

4 Experimental Results

In this section the performance of FCM RCM shadowed 119888-means (SCM) [21] shadowed rough 119888-means (SRCM) [19]and SP-FCM algorithms is presented on four UCI datasetsfour yeast gene expression datasets and real data Forevaluating the convergence effect the fundamental criterioncan be described as follows the distance between differentobjects in the same cluster should be as close as possible thedistance between different objects in different cluster shouldbe as far as possible Here we use DB index and Dunn indexto evaluate the clustering effect For a given data set and 119862value the higher the similarity values within the clusters andthe intercluster separation the lower the DB index value Agood clustering procedure shouldmake the value ofDB indexas low as possible Reversely higher values of the Dunn indexindicate better clustering in the sense that the clusters are well

separated and relatively compact The details of experimentsare mentioned below

41 UCI Data Set In our experiments totally four UCI datasets are used including 4-dimensional Iris 13-dimensionalWine 10-dimensional Glass and 34-dimensional Iono-sphere There are 3 clusters in data set of Iris each of whichhas 50 data patterns 3 clusters in data set of Wine whichhave 50 60 and 68 data patterns 6 clusters in data set ofGlass which have 30 35 40 42 36 and 31 separately and2 clusters in data set of Ionosphere which have 226 and125 data patterns The validity indices of each method arecompared in Table 1 SP-FCM can identify compact groupscompared to other algorithmswhen given the cluster number119862 It can also be seen that SRCM and SP-FCM have moreobvious advantages than FCM RCM and SCM SP-FCMperforms slightly better than SRCM in most cases due tothe global search ability which enables it to converge to anoptimum or near optimum solutions Moreover shadowedset- and rough set-based clustering methods namely SP-FCM SRCM RCM and SCM perform better than FCM Itimplies that the partition of approximation regions can revealthe nature of data structure and only the lower bound andboundary region of each cluster have positive contribution inthe process of updating the prototypes

6 Computational Intelligence and Neuroscience

Table 1 Performance of FCM RCM SCM SRCM and SP-FCM on four UCI data sets

Different indices Algorithm Data setsIris Wine Glass Ionosphere

DB index

FCM 07642 08803 22971 20587RCM 06875 05692 19635 15434SCM 06862 05327 18495 14763SRCM 06613 04436 15804 13971SP-FCM 06574 04328 15237 14066

Dunn index

FCM 23106 25834 01142 08381RCM 27119 28157 02637 10233SCM 24801 27992 03150 10319SRCM 30874 31342 05108 11924SP-FCM 33254 31764 04921 12605

0

05

1

15

2

25

3

12 11 10 9 8 7 6 5 4 3 2Number of clusters

Xie-

Beni

val

idity

(a) Iris

0

1

2

3

4

5

6

13 12 11 10 9 8 7 6 5 4 3 2Number of clusters

Xie-

Beni

val

idity

(b) Wine

0

2

4

6

8

10

14 1213 11 10 9 8 7 6 5 4 3 2Number of clusters

Xie-

Beni

val

idity

(c) Glass

0

5

10

15

20

25

16 15 14 13 12 11 10 9 8 7 6 5 4 3 2Number of clusters

Xie-

Beni

val

idity

(d) Ionosphere

Figure 2 XB validity index of four UCI data sets with cluster number C

As usual the number of clusters is implied by the nature ofthe problem Here with the shadowed sets involved one cananticipate that the optimal number of clusters could be foundThe fuzzification coefficient 119898 can be optimized however itis common to assume a fixed value of 20 which associateswith the form of the membership functions of the generatedclusters For testing the SP-FCMalgorithm the rule119862 le 11987312is adopted and the range of the expected cluster number canbe set as (1) Iris [119862min = 2 119862max = 12] (2) Wine [119862min = 2119862max = 13] (3) Glass [119862min = 2 119862max = 14] (4) Ionosphere

[119862min = 2 119862max = 16] The swarm size is set as 119871 = 20the maximum iteration number of PSO 119879 = 50 and forcluster reduction the cluster cardinality threshold 120576 = 10and the attrition rate 120588 = 01 In each cycle we get thedistribution of every cluster remove part of them accordingto their cardinality and calculate the XB index and thecluster number 119862 varies from 119862max to 119862min After ending thecirculation the partition with the lowest value is selected asthe final result Figure 2 presents the validity indices in theprocess of generating the optimal cluster number Smaller

Computational Intelligence and Neuroscience 7

Table 2 Performance of FCM RCM SCM SRCM and SP-FCM on four yeast expression data sets

Different indices Algorithm Data setsGDS608 GDS2003 GDS2267 GDS2712

DB index

FCM 20861 24671 15916 19526RCM 16109 22104 10274 12058SCM 15938 21346 08946 10965SRCM 13274 19523 07438 07326SP-FCM 12958 18946 07962 06843

Dunn index

FCM 02647 02976 04208 03519RCM 03789 03981 07164 06074SCM 03865 03775 08439 06207SRCM 05126 04953 09759 08113SP-FCM 05407 05026 09182 08049

values indicate more compact and well-separated clustersThe validity indices reach their minimum value at 119862 = 33 6 and 2 separately which correspond to the final clusterprototype and the best partition Through the shadowedsets and PSO approaches the influence of each boundaryregion to the formation of the prototypes and the clusterscan be properly resolved Although more computing timeis required to run SP-FCM the reasonable result can beacquired for processing the overlapping and vagueness datapatterns

42 Yeast Gene Expression Data Set There are four yeastgene expression data sets used in the experiments includingGDS608 GDS2003 GDS2267 and GDS2712 downloadedfrom Gene Expression Omnibus The number of classesand samples of GDS608 is 26 and 6303 for GDS2003 thenumber of classes and samples is 23 and 5617 for GDS2267is 14 and 9275 and for GDS2712 is 15 and 9275 Table 2presents the validity indices of different methods after thecluster number 119862 was given The SP-FCM and SRCM obtainthe same effect and perform better than other clusteringalgorithms The improvement can be attributed to the factthat the global search capacity of PSO is conducive to findingmore appropriate cluster centers while escaping from localoptima

For getting the optimum 119862 automatically we let119898 = 201198881= 149 119888

2= 149 and 119908 = 072 and the rule 119862 le 11987312

is adopted The swarm size is set as 119871 = 20 the maximumiteration number of PSO is 119879 = 80 and for clusterreduction the range of the expected cluster number thecluster cardinality threshold 120576 and the attrition rate 120588 can beset as (1) GDS608 [119862min = 20 119862max = 80] 120576 = 20 120588 = 005(2) GDS2003 [119862min = 20 119862max = 75] 120576 = 20 120588 = 005(3) GDS2267 [119862min = 10 119862max = 96] 120576 = 20 120588 = 008(4) GDS2712 [119862min = 10 119862max = 96] 120576 = 20 120588 = 008In each cycle we get the distribution of every cluster removepart of them according to their cardinality and calculate theXB index and the cluster number 119862 varies from 119862max to119862min The partition with the lowest value is selected as the

final result after the loop is ended As seen in Figure 3 forGDS608 at the beginning the cluster number decreases at afaster rate it takes 26 iterations to reduce the cluster numberfrom 119862 = 80 to 119862 = 30 and 4 iterations from 119862 = 30

to 119862 = 26 and the XB index begins to increase when thecluster number 119862 lt 26 For GDS2003 it takes 24 iterationsto reduce the cluster number from 119862 = 75 to 119862 = 30 and 7iterations from 119862 = 30 to 119862 = 23 and the XB index beginsto increase when the cluster number 119862 lt 23 For GDS2267 ittakes 23 iterations to reduce the cluster number from 119862 = 96to 119862 = 20 and 6 iterations from 119862 = 20 to 119862 = 14 andthe XB index begins to increase when the cluster number119862 lt 14 For GDS2712 it takes 23 iterations to reduce thecluster number from 119862 = 96 to 119862 = 20 and 5 iterationsfrom 119862 = 20 to 119862 = 15 and the XB index begins to increasewhen the cluster number 119862 lt 15 Here the advantagesof fuzzy sets PSO and shadowed sets are integrated in theSP-FCM and make this algorithm applicable to deal withoverlapping partitions the uncertainty and vagueness arisingfrom the boundary regions and the optimization processin the shadowed sets makes this method robust to outliersso that the approximation regions of each cluster can bedetermined accurately and the obtained prototypes approachthe desired locations

4.3. Real Data. In this experiment a total of 10 different packages are tested. Each package is represented by 100 frames captured by a camera from different angles, and SIFT feature points are extracted from each frame to train a recognition system. Figure 4 shows some of the images with their SIFT keypoints. The resulting data set comprises 248,150 descriptors. We let m = 2.0, c1 = 1.49, c2 = 1.49, w = 0.72, L = 20, ε = 30, and ρ = 0.01 for SP-FCM, and choose the range [C_min = 200, C_max = 360] according to the number of package categories and the distribution of keypoints in each image. Eighty iterations of PSO are run for each given C to produce the cluster prototype B and partition matrix U that serve as the starting point for the shadowed sets; longer PSO stabilization is needed to obtain more stable cluster partitions.
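The PSO stage that produces the starting prototypes B can be sketched as a standard global-best PSO over whole prototype sets, using the stated coefficients (w = 0.72, c1 = c2 = 1.49). This is a generic sketch, not the authors' implementation; `fcm_cost` is an assumed fitness based on the FCM objective.

```python
import numpy as np

def fcm_cost(X, centers, m=2.0):
    """FCM objective J_m with memberships induced by the given centers."""
    d2 = ((X[:, None] - centers[None]) ** 2).sum(-1) + 1e-12
    U = d2 ** (-1.0 / (m - 1))
    U /= U.sum(axis=1, keepdims=True)
    return ((U ** m) * d2).sum()

def pso_prototypes(X, C, L=20, T=80, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO: each particle encodes a full set of C prototypes."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pos = X[rng.choice(n, (L, C))]                  # (L, C, d) particles
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pcost = np.array([fcm_cost(X, p) for p in pos])
    g = pbest[pcost.argmin()].copy()                # global best prototype set
    for _ in range(T):
        r1, r2 = rng.random((2, L, 1, 1))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        cost = np.array([fcm_cost(X, p) for p in pos])
        better = cost < pcost
        pbest[better], pcost[better] = pos[better], cost[better]
        g = pbest[pcost.argmin()].copy()
    return g
```

Encoding a complete prototype set per particle (rather than one center per particle) is what lets the swarm escape the local minima that trap plain FCM initialization.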

Figure 3: XB validity index of the four yeast gene expression data sets versus the number of clusters C: (a) GDS608, (b) GDS2003, (c) GDS2267, (d) GDS2712. Each panel plots the Xie-Beni validity over the final range of C (30 down to 20 for (a) and (b); 20 down to 10 for (c) and (d)).

Table 3: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on the package data set.

Index        FCM       RCM       SCM       SRCM      SP-FCM
DB index     1.84569   1.59671   1.43194   1.24038   1.07826
Dunn index   0.92647   1.16298   1.25376   1.69422   1.67313
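Table 3 reports DB and Dunn indices. As a minimal illustration of how the two are computed on a crisp labeling (our own helper names, not the evaluation code used in the paper):

```python
import numpy as np

def db_index(X, labels, centers):
    """Davies-Bouldin: mean over clusters of the worst (S_i + S_j) / d(v_i, v_j).
    S_i is the mean distance of cluster i's points to its center; lower is better."""
    C = len(centers)
    S = np.array([np.linalg.norm(X[labels == i] - centers[i], axis=1).mean()
                  for i in range(C)])
    M = np.linalg.norm(centers[:, None] - centers[None], axis=2)
    R = (S[:, None] + S[None, :]) / (M + np.eye(C))  # eye avoids 0/0 on the diagonal
    np.fill_diagonal(R, -np.inf)                     # exclude i == j from the max
    return R.max(axis=1).mean()

def dunn_index(X, labels):
    """Dunn: minimum inter-cluster distance over maximum cluster diameter;
    higher is better."""
    ids = np.unique(labels)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    diam = max(D[np.ix_(labels == i, labels == i)].max() for i in ids)
    sep = min(D[np.ix_(labels == i, labels == j)].min()
              for i in ids for j in ids if i < j)
    return sep / diam
```

Lower DB and higher Dunn indicate better partitions, matching the direction of improvement from FCM to SP-FCM in the table.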

Within each cluster, the optimal α_j decides the cardinality and realizes cluster reduction, and the XB index is calculated. Each C-partition is ranked by this index, and the partition with the smallest value, indicating the most compact and well-separated clusters, is selected as the final output. At the beginning the cluster number decreases at a faster speed: it takes 26 iterations to reduce the cluster number from C = 360 to C = 289 but 20 iterations from C = 289 to C = 267, and the XB index increases at a relatively faster rate once the cluster number drops below C = 267. Figure 5 shows the XB index for C ∈ [267, 289]. The index reaches its minimum at C = 276, which means the best partition for this data set has 276 clusters. Table 3 gives a comparative analysis of the convergence effect. As expected, SP-FCM provides sound results for the real data, with performance assessed by these validity indices.
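For reference, the Xie-Beni (XB) index used as the selection criterion throughout this section, following Xie and Beni [24], is the ratio of fuzzy compactness to prototype separation:

```latex
V_{XB}(U, V; X) \;=\;
\frac{\sum_{i=1}^{C} \sum_{k=1}^{n} u_{ik}^{m}\, \lVert x_k - v_i \rVert^2}
     {n \cdot \min_{i \neq j} \lVert v_i - v_j \rVert^2}
```

Smaller values indicate more compact and better-separated clusters, which is why the C-partition with the lowest value is kept.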

5. Conclusions

This paper presents a modified fuzzy c-means algorithm, SP-FCM, based on particle swarm optimization and shadowed sets, for unsupervised feature clustering. SP-FCM utilizes the global search property of PSO and the vagueness balance property of shadowed sets so that it can estimate the optimal cluster number as it runs through its alternating optimization process. As a randomized approach, SP-FCM can alleviate the problems faced by FCM, which suffers from sensitivity to initialization and from falling into local minima. Moreover, the algorithm avoids the subjective and somewhat arbitrary trials otherwise needed to estimate the appropriate cluster number, and it strengthens the capability to find the optimal cluster number within a specific

Figure 4: Ten package images with SIFT features; per-image feature counts: (a) 644, (b) 460, (c) 742, (d) 442, (e) 313, (f) 602, (g) 373, (h) 724, (i) 539, (j) 124.

Figure 5: XB validity index of the bag data set versus the number of clusters C, for C from 289 down to 267.

range using cluster validity measures as indicators. The use of the XB validity index allows the algorithm to find the optimum cluster number with cluster partitions that provide compact and well-separated clusters. The experiments show that the SP-FCM algorithm produces good results with respect to the DB and Dunn indices, especially for high-dimensional and large data sets.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant no. 61105089, NSFC).

References

[1] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
[2] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering," ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 1, article 1, 2009.
[3] P. Melin and O. Castillo, "A review on type-2 fuzzy logic applications in clustering, classification and pattern recognition," Applied Soft Computing Journal, vol. 21, pp. 568–577, 2014.
[4] M. Yuwono, S. W. Su, B. D. Moulton, and H. T. Nguyen, "Data clustering using variants of rapid centroid estimation," IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 366–377, 2014.
[5] S. C. Satapathy, G. Pradhan, S. Pattnaik, J. V. R. Murthy, and P. V. G. D. P. Reddy, "Performance comparisons of PSO based clustering," InterJRI Computer Science and Networking, vol. 1, no. 1, pp. 18–23, 2009.
[6] G. Peters and R. Weber, "Dynamic clustering with soft computing," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 226–236, 2012.
[7] D. T. Anderson, J. C. Bezdek, M. Popescu, and J. M. Keller, "Comparing fuzzy, probabilistic, and possibilistic partitions," IEEE Transactions on Fuzzy Systems, vol. 18, no. 5, pp. 906–918, 2010.
[8] P. Maji and S. Paul, "Robust rough-fuzzy C-means algorithm: design and applications in coding and non-coding RNA expression data clustering," Fundamenta Informaticae, vol. 124, no. 1-2, pp. 153–174, 2013.
[9] J. Bezdek, Fuzzy mathematics in pattern classification [Ph.D. thesis], Cornell University, New York, NY, USA, 1974.
[10] V. Olman, F. Mao, H. Wu, and Y. Xu, "Parallel clustering algorithm for large data sets with applications in bioinformatics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 344–352, 2009.
[11] J. C. Bezdek and R. J. Hathaway, "Optimization of fuzzy clustering criteria using genetic algorithms," in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 589–594, June 1994.
[12] T. A. Runkler, "Ant colony optimization of clustering models," International Journal of Intelligent Systems, vol. 20, no. 12, pp. 1233–1251, 2005.
[13] K. S. Al-Sultan and S. Z. Selim, "A global algorithm for the fuzzy clustering problem," Pattern Recognition, vol. 26, no. 9, pp. 1357–1361, 1993.
[14] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43, Nagoya, Japan, October 1995.
[15] T. A. Runkler and C. Katz, "Fuzzy clustering by particle swarm optimization," in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 601–608, Canada, July 2006.
[16] C.-F. Juang, C.-M. Hsiao, and C.-H. Hsu, "Hierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimization," IEEE Transactions on Fuzzy Systems, vol. 18, no. 1, pp. 14–26, 2010.
[17] H. Izakian and A. Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem," Expert Systems with Applications, vol. 38, no. 3, pp. 1835–1838, 2011.
[18] W. Pedrycz, "Shadowed sets: representing and processing fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 103–109, 1998.
[19] J. Zhou, W. Pedrycz, and D. Miao, "Shadowed sets in the characterization of rough-fuzzy clustering," Pattern Recognition, vol. 44, no. 8, pp. 1738–1749, 2011.
[20] W. Pedrycz, "From fuzzy sets to shadowed sets: interpretation and computing," International Journal of Intelligent Systems, vol. 24, no. 1, pp. 48–61, 2009.
[21] S. Mitra, W. Pedrycz, and B. Barman, "Shadowed c-means: integrating fuzzy and rough clustering," Pattern Recognition, vol. 43, no. 4, pp. 1282–1291, 2010.
[22] J. C. Bezdek, J. M. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, vol. 4 of The Handbooks of Fuzzy Sets, Springer, New York, NY, USA, 1999.
[23] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224–227, 1979.
[24] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.
[25] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.
[26] S. C. Satapathy, S. K. Patnaik, C. D. P. Dash, and S. Sahoo, "Data clustering using modified fuzzy-PSO (MFPSO)," in Proceedings of the 5th Multi-Disciplinary International Workshop on Artificial Intelligence, Hyderabad, India, 2011.
[27] M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.
[28] R. Krishnapuram and J. M. Keller, "The possibilistic C-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385–393, 1996.



is adopted The swarm size is set as 119871 = 20 the maximumiteration number of PSO is 119879 = 80 and for clusterreduction the range of the expected cluster number thecluster cardinality threshold 120576 and the attrition rate 120588 can beset as (1) GDS608 [119862min = 20 119862max = 80] 120576 = 20 120588 = 005(2) GDS2003 [119862min = 20 119862max = 75] 120576 = 20 120588 = 005(3) GDS2267 [119862min = 10 119862max = 96] 120576 = 20 120588 = 008(4) GDS2712 [119862min = 10 119862max = 96] 120576 = 20 120588 = 008In each cycle we get the distribution of every cluster removepart of them according to their cardinality and calculate theXB index and the cluster number 119862 varies from 119862max to119862min The partition with the lowest value is selected as the

final result after the loop is ended As seen in Figure 3 forGDS608 at the beginning the cluster number decreases at afaster rate it takes 26 iterations to reduce the cluster numberfrom 119862 = 80 to 119862 = 30 and 4 iterations from 119862 = 30

to 119862 = 26 and the XB index begins to increase when thecluster number 119862 lt 26 For GDS2003 it takes 24 iterationsto reduce the cluster number from 119862 = 75 to 119862 = 30 and 7iterations from 119862 = 30 to 119862 = 23 and the XB index beginsto increase when the cluster number 119862 lt 23 For GDS2267 ittakes 23 iterations to reduce the cluster number from 119862 = 96to 119862 = 20 and 6 iterations from 119862 = 20 to 119862 = 14 andthe XB index begins to increase when the cluster number119862 lt 14 For GDS2712 it takes 23 iterations to reduce thecluster number from 119862 = 96 to 119862 = 20 and 5 iterationsfrom 119862 = 20 to 119862 = 15 and the XB index begins to increasewhen the cluster number 119862 lt 15 Here the advantagesof fuzzy sets PSO and shadowed sets are integrated in theSP-FCM and make this algorithm applicable to deal withoverlapping partitions the uncertainty and vagueness arisingfrom the boundary regions and the optimization processin the shadowed sets makes this method robust to outliersso that the approximation regions of each cluster can bedetermined accurately and the obtained prototypes approachthe desired locations

43 Real Data In this experiment totally 10 different pack-ages are tested Each package is represented by 100 framescaptured from different angles by camera and each frame isextracted SIFT feature points which are used for training arecognition system Figure 4 shows some images with theirSIFT keypoints And this data set is comprised of 248150descriptors We let 119898 = 20 119888

1= 149 119888

2= 149 119908 = 072

119871 = 20 120576 = 30 and 120588 = 001 for the SP-FCM and choose thereasonable range [119862min = 200 119862max = 360] according to thecategory amount of packages and distribution of keypoints ineach image Eighty iterations of PSO are run on each given119862 to produce the cluster prototype 119861 and partition matrix119880 as the starting point for the shadowed sets Longer PSOstabilization is needed to obtainmore stable cluster partitions

8 Computational Intelligence and Neuroscience

0

05

1

15

2

25

3

35

30 29 28 27 26 25 24 23 22 21 20Number of clusters

Xie-

Beni

val

idity

(a) GDS608

0

1

2

3

4

5

6

Xie-

Beni

val

idity

30 29 28 27 26 25 24 23 22 21 20Number of clusters

(b) GDS2003

0

1

2

3

4

5

6

7

20 19 18 17 16 15 14 13 12 11 10Number of clusters

Xie-

Beni

val

idity

(c) GDS2267

0

1

2

3

4

5

6

20 19 18 17 16 15 14 13 12 11 10Number of clusters

Xie-

Beni

val

idity

(d) GDS2712

Figure 3 XB validity index of four yeast gene expression data sets with cluster number 119862

Table 3 Performance of FCM RCM SCM SRCM and SP-FCM on package datasets

Different indices AlgorithmsFCM RCM SCM SRCM SP-FCM

DB index 184569 159671 143194 124038 107826Dunn index 92647 116298 125376 169422 167313

Within each cluster the optimal 120572119895decides the cardinality

and realizes cluster reduction and XB index is calculatedEach 119862-partition is ranked using this index and selected asthe final output by the smallest index value that indicates thebest compact and well-separated clusters At the beginningthe cluster number decreases at a faster speed it takes 26iterations to reduce the cluster number from 119862 = 360 to 119862 =289 and 20 iterations from119862 = 289 to119862 = 267 The XB indexincreases at a relatively faster rate when the cluster number119862 lt 267 Figure 5 shows the XB index for 119862 isin [267 289]The index reaches its minimum value at 119862 = 276 that meansthe best partition for this data set is 276 clusters Table 3exhibits the comparative analysis of convergence effect Asexpected SP-FCMcan provide sound results for the real datathe performance is assessed by those validity indices

5 Conclusions

This paper presents a modified fuzzy 119888-means algorithmbased on the particle swarm optimization and shadowed setsto perform unsupervised feature clustering This algorithmcalled SP-FCMutilizes the global search property of PSO andvagueness balance property of shadowed sets such that it canestimate the optimal cluster number as it runs through itsalternating optimization process SP-FCM as a randomizedbased approach has the capability to alleviate the problemsfaced by FCM which has some demerits of initializationand falling in local minima Moreover this algorithm avoidsthe subjective and somewhat arbitrary trials to estimate theappropriate value of cluster number and it enhances thiscapability to find the optimal cluster number within a specific

Computational Intelligence and Neuroscience 9

(a) 644 features (b) 460 features (c) 742 features

(d) 442 features (e) 313 features (f) 602 features

(g) 373 features (h) 724 features (i) 539 features (j) 124 features

Figure 4 Ten package images with SIFT features

0100200300400500600700800900

1000

289

285

282

280

278

276

274

272

270

268

Number of clusters

Xie-

Beni

val

idity

287

283

281

279

277

275

273

271

269

267

Figure 5 XB validity index of bag data set with cluster number 119862

range using cluster validity measures as indicatorsThe use ofXB validity index allows the algorithm to find the optimumcluster number with cluster partitions that provide compactand well-separated clusters From the experiments we haveshown that the SP-FCMalgorithmproduces good resultswithreference to DB and Dunn indices especially to the highdimension and large data cases

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

10 Computational Intelligence and Neuroscience

Acknowledgment

This research was supported by the National Natural ScienceFoundation of China (Grant no 61105089 NSFC)

References

[1] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010

[2] H-P Kriegel P Kroger and A Zimek ldquoClustering high-dimensional data a survey on subspace clustering pattern-based clustering and correlation clusteringrdquoACMTransactionson Knowledge Discovery from Data vol 3 no 1 article 1 2009

[3] P Melin and O Castillo ldquoA review on type-2 fuzzy logic appli-cations in clustering classification and pattern recognitionrdquoApplied Soft Computing Journal vol 21 pp 568ndash577 2014

[4] M Yuwono S W Su B D Moulton and H T Nguyen ldquoDataclustering using variants of rapid centroid estimationrdquo IEEETransactions on Evolutionary Computation vol 18 no 3 pp366ndash377 2014

[5] S C Satapathy G Pradhan S Pattnaik J V R Murthy andP V G D P Reddy ldquoPerformance comparisons of PSO basedclusteringrdquo InterJRI Computer Science and Networking vol 1no 1 pp 18ndash23 2009

[6] G Peters and RWeber ldquoDynamic clustering with soft comput-ingrdquo Wiley Interdisciplinary Reviews Data Mining and Knowl-edge Discovery vol 2 no 3 pp 226ndash236 2012

[7] D T Anderson J C Bezdek M Popescu and J M KellerldquoComparing fuzzy probabilistic and possibilistic partitionsrdquoIEEE Transactions on Fuzzy Systems vol 18 no 5 pp 906ndash9182010

[8] P Maji and S Paul ldquoRobust rough-fuzzy C-means algorithmdesign and applications in coding and non-coding RNA expres-sion data clusteringrdquo Fundamenta Informaticae vol 124 no 1-2pp 153ndash174 2013

[9] J Bezdek Fuzzy mathematics in pattern classification [PhDthesis] Cornell University New york NY USA 1974

[10] V Olman F Mao H Wu and Y Xu ldquoParallel clusteringalgorithm for large data sets with applications in bioinformat-icsrdquo IEEEACM Transactions on Computational Biology andBioinformatics vol 6 no 2 pp 344ndash352 2009

[11] J C Bezdek and R J Hathaway ldquoOptimization of fuzzyclustering criteria using genetic algorithmsrdquo in Proceedings ofthe 1st IEEE Conference on Evolutionary Computation pp 589ndash594 June 1994

[12] T A Runkler ldquoAnt colony optimization of clustering modelsrdquoInternational Journal of Intelligent Systems vol 20 no 12 pp1233ndash1251 2005

[13] K S Al-Sultan and S Z Selim ldquoA global algorithm for theFuzzy Clustering problemrdquo Pattern Recognition vol 26 no 9pp 1357ndash1361 1993

[14] R Eberhart and J Kennedy ldquoNew optimizer using particleswarm theoryrdquo in Proceedings of the 6th International Sympo-sium onMicroMachine and Human Science pp 39ndash43 NagoyaJapan October 1995

[15] T A Runkler and C Katz ldquoFuzzy clustering by particleswarm optimizationrdquo in Proceedings of the IEEE InternationalConference on Fuzzy Systems pp 601ndash608 can July 2006

[16] C-F Juang C-M Hsiao and C-H Hsu ldquoHierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimizationrdquo IEEE Transactions on Fuzzy Systems vol18 no 1 pp 14ndash26 2010

[17] H Izakian and A Abraham ldquoFuzzy C-means and fuzzy swarmfor fuzzy clustering problemrdquo Expert Systems with Applicationsvol 38 no 3 pp 1835ndash1838 2011

[18] W Pedrycz ldquoShadowed sets representing and processing fuzzysetsrdquo IEEE Transactions on Systems Man and Cybernetics BCybernetics vol 28 no 1 pp 103ndash109 1998

[19] J Zhou W Pedrycz and D Miao ldquoShadowed sets in the char-acterization of rough-fuzzy clusteringrdquoPattern Recognition vol44 no 8 pp 1738ndash1749 2011

[20] W Pedrycz ldquoFrom fuzzy sets to shadowed sets interpretationand computingrdquo International Journal of Intelligent Systems vol24 no 1 pp 48ndash61 2009

[21] S Mitra W Pedrycz and B Barman ldquoShadowed c-meansintegrating fuzzy and rough clusteringrdquo Pattern Recognitionvol 43 no 4 pp 1282ndash1291 2010

[22] J C Bezdek J M Keller R Krishnapuram and N R PalFuzzy Models and Algorithms for Pattern Recognition and ImageProcessing vol 4 ofTheHandbooks of Fuzzy Sets Springer NewYork NY USA 1999

[23] D L Davies and DW Bouldin ldquoA cluster separation measurerdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 1 no 2 pp 224ndash227 1978

[24] X L Xie and G Beni ldquoA validity measure for fuzzy clusteringrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 13 no 8 pp 841ndash847 1991

[25] J C Bezdek and N R Pal ldquoSome new indexes of clustervalidityrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 28 no 3 pp 301ndash315 1998

[26] S C Satapathy S K Patnaik C D P Dash and S Sahoo ldquoDataclustering using modified fuzzy-PSO (MFPSO)rdquo in Proceedingsof the 5th Multi-Disciplinary Int Workshop on Artificial Intelli-gence Hyderabad India 2011

[27] M K Pakhira S Bandyopadhyay and U Maulik ldquoValidityindex for crisp and fuzzy clustersrdquo Pattern Recognition vol 37no 3 pp 487ndash501 2004

[28] R Krishnapuram and J M Keller ldquoThe possibilistic C-meansalgorithm insights and recommendationsrdquo IEEE Transactionson Fuzzy Systems vol 4 no 3 pp 385ndash393 1996

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article An Improved Fuzzy c-Means Clustering ...

Computational Intelligence and Neuroscience 7

Table 2 Performance of FCM RCM SCM SRCM and SP-FCM on four yeast expression data sets

Different indices Algorithm Data setsGDS608 GDS2003 GDS2267 GDS2712

DB index

FCM 20861 24671 15916 19526RCM 16109 22104 10274 12058SCM 15938 21346 08946 10965SRCM 13274 19523 07438 07326SP-FCM 12958 18946 07962 06843

Dunn index

FCM 02647 02976 04208 03519RCM 03789 03981 07164 06074SCM 03865 03775 08439 06207SRCM 05126 04953 09759 08113SP-FCM 05407 05026 09182 08049

values indicate more compact and well-separated clustersThe validity indices reach their minimum value at 119862 = 33 6 and 2 separately which correspond to the final clusterprototype and the best partition Through the shadowedsets and PSO approaches the influence of each boundaryregion to the formation of the prototypes and the clusterscan be properly resolved Although more computing timeis required to run SP-FCM the reasonable result can beacquired for processing the overlapping and vagueness datapatterns

42 Yeast Gene Expression Data Set There are four yeastgene expression data sets used in the experiments includingGDS608 GDS2003 GDS2267 and GDS2712 downloadedfrom Gene Expression Omnibus The number of classesand samples of GDS608 is 26 and 6303 for GDS2003 thenumber of classes and samples is 23 and 5617 for GDS2267is 14 and 9275 and for GDS2712 is 15 and 9275 Table 2presents the validity indices of different methods after thecluster number 119862 was given The SP-FCM and SRCM obtainthe same effect and perform better than other clusteringalgorithms The improvement can be attributed to the factthat the global search capacity of PSO is conducive to findingmore appropriate cluster centers while escaping from localoptima

To find the optimum C automatically, we let m = 2.0, c1 = 1.49, c2 = 1.49, and w = 0.72, and the rule C ≤ N^(1/2) is adopted. The swarm size is set as L = 20, and the maximum iteration number of PSO is T = 80. For cluster reduction, the range of the expected cluster number, the cluster cardinality threshold ε, and the attrition rate ρ are set as follows: (1) GDS608: [Cmin = 20, Cmax = 80], ε = 20, ρ = 0.05; (2) GDS2003: [Cmin = 20, Cmax = 75], ε = 20, ρ = 0.05; (3) GDS2267: [Cmin = 10, Cmax = 96], ε = 20, ρ = 0.08; (4) GDS2712: [Cmin = 10, Cmax = 96], ε = 20, ρ = 0.08. In each cycle we get the distribution of every cluster, remove part of the clusters according to their cardinality, and calculate the XB index, with the cluster number C varying from Cmax to Cmin. The partition with the lowest index value is selected as the final result after the loop ends.

As seen in Figure 3, for GDS608 the cluster number decreases at a faster rate at the beginning: it takes 26 iterations to reduce the cluster number from C = 80 to C = 30 and 4 iterations from C = 30 to C = 26, and the XB index begins to increase when C < 26. For GDS2003, it takes 24 iterations to reduce the cluster number from C = 75 to C = 30 and 7 iterations from C = 30 to C = 23, and the XB index begins to increase when C < 23. For GDS2267, it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 6 iterations from C = 20 to C = 14, and the XB index begins to increase when C < 14. For GDS2712, it takes 23 iterations to reduce the cluster number from C = 96 to C = 20 and 5 iterations from C = 20 to C = 15, and the XB index begins to increase when C < 15. The advantages of fuzzy sets, PSO, and shadowed sets are thus integrated in SP-FCM, making the algorithm applicable to overlapping partitions and to the uncertainty and vagueness arising from boundary regions. Moreover, the optimization process in the shadowed sets makes the method robust to outliers, so the approximation regions of each cluster can be determined accurately and the obtained prototypes approach the desired locations.
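The reduction cycle described above can be sketched compactly. In the sketch below (a hedged illustration under our own assumptions: a plain FCM alternating update stands in for the PSO-refined stage, and always dropping at least one weakest cluster per cycle is our simplification of the cardinality rule):

```python
import numpy as np

def fcm_step(X, V, m=2.0, iters=20):
    """Plain FCM alternating updates, standing in for the PSO stage."""
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-9
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        um = U ** m
        V = um.T @ X / um.sum(axis=0)[:, None]
    return U, V

def xb_index(X, V, U, m=2.0):
    """Xie-Beni index: compactness over n times minimum prototype separation."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)
    return ((U ** m) * d2).sum() / (len(X) * sep.min())

def sp_fcm_reduce(X, c_max, c_min, eps=2.0, rho=0.2, m=2.0, seed=0):
    """Shrink C from c_max toward c_min, dropping low-cardinality clusters
    each cycle and keeping the partition with the smallest XB index."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c_max, replace=False)]
    best_xb, best_V = np.inf, None
    while len(V) >= c_min:
        U, V = fcm_step(X, V, m)
        xb = xb_index(X, V, U, m)
        if xb < best_xb:
            best_xb, best_V = xb, V.copy()
        card = U.sum(axis=0)                 # fuzzy cluster cardinalities
        weak = np.argsort(card)              # weakest clusters first
        n_drop = max(1, int(rho * len(V)))   # attrition rate rho
        drop = [i for i in weak[:n_drop] if card[i] < eps] or [weak[0]]
        V = np.delete(V, drop, axis=0)
    return best_xb, best_V
```

Each pass mirrors one cycle of the loop in the text: refine the partition, score it with XB, and prune weak clusters until Cmin is reached, returning the best-scoring partition seen.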

4.3. Real Data. In this experiment, 10 different packages are tested in total. Each package is represented by 100 frames captured from different angles by a camera, and SIFT feature points are extracted from each frame to train a recognition system. Figure 4 shows some images with their SIFT keypoints. This data set comprises 248,150 descriptors. We let m = 2.0, c1 = 1.49, c2 = 1.49, w = 0.72, L = 20, ε = 30, and ρ = 0.01 for SP-FCM and choose the range [Cmin = 200, Cmax = 360] according to the number of package categories and the distribution of keypoints in each image. Eighty iterations of PSO are run for each given C to produce the cluster prototype B and partition matrix U as the starting point for the shadowed sets; longer PSO stabilization is needed to obtain more stable cluster partitions.
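With the inertia weight w = 0.72 and acceleration coefficients c1 = c2 = 1.49 given above, one PSO velocity and position update over a swarm of prototype sets looks like the following (a generic sketch of the standard PSO step, not the authors' code; the array shapes are our assumption):

```python
import numpy as np

def pso_update(pos, vel, pbest, gbest, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One PSO step. pos, vel, pbest: (L, C, d) arrays for a swarm of L
    particles, each encoding C prototypes in d dimensions; gbest: (C, d)."""
    rng = rng or np.random.default_rng()
    r1 = rng.random(pos.shape)   # cognitive random factor
    r2 = rng.random(pos.shape)   # social random factor
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel
```

The inertia term w preserves exploration momentum while the pbest and gbest terms pull each particle toward its own and the swarm's best prototype sets, which is the global-search behavior SP-FCM relies on to escape local optima.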

8 Computational Intelligence and Neuroscience

[Figure 3: four panels, (a) GDS608, (b) GDS2003, (c) GDS2267, and (d) GDS2712, each plotting the Xie-Beni validity index against the number of clusters.]

Figure 3: XB validity index of the four yeast gene expression data sets versus cluster number C.

Table 3: Performance of FCM, RCM, SCM, SRCM, and SP-FCM on the package data set.

Index        FCM       RCM       SCM       SRCM      SP-FCM
DB index     1.84569   1.59671   1.43194   1.24038   1.07826
Dunn index   0.92647   1.16298   1.25376   1.69422   1.67313

Within each cluster, the optimal α_j decides the cardinality and realizes cluster reduction, and the XB index is calculated. Each C-partition is ranked using this index, and the one with the smallest value, indicating the most compact and well-separated clusters, is selected as the final output. At the beginning the cluster number decreases at a faster speed: it takes 26 iterations to reduce the cluster number from C = 360 to C = 289 and 20 iterations from C = 289 to C = 267. The XB index increases at a relatively faster rate when C < 267. Figure 5 shows the XB index for C ∈ [267, 289]. The index reaches its minimum value at C = 276, which means that the best partition for this data set has 276 clusters. Table 3 exhibits a comparative analysis of the convergence effect. As expected, SP-FCM provides sound results for the real data; the performance is assessed by these validity indices.
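For reference, the Davies-Bouldin and Dunn indices reported in Table 3 can be computed on a crisp labeling as follows (a standard formulation of both indices, not the authors' code; X, labels, and V are illustrative names):

```python
import numpy as np

def davies_bouldin(X, labels, V):
    """DB: mean over clusters of the worst (s_i + s_j) / d(v_i, v_j);
    lower is better."""
    C = len(V)
    s = np.array([np.linalg.norm(X[labels == i] - V[i], axis=1).mean()
                  for i in range(C)])                      # cluster scatters
    dist = np.linalg.norm(V[:, None] - V[None, :], axis=2)  # center distances
    np.fill_diagonal(dist, np.inf)
    R = (s[:, None] + s[None, :]) / dist
    return np.max(R, axis=1).mean()

def dunn(X, labels):
    """Dunn: minimum inter-cluster distance over maximum cluster diameter;
    higher is better."""
    ids = np.unique(labels)
    diam = max(np.linalg.norm(X[labels == i][:, None] - X[labels == i][None, :],
                              axis=2).max() for i in ids)
    sep = min(np.linalg.norm(X[labels == i][:, None] - X[labels == j][None, :],
                             axis=2).min()
              for i in ids for j in ids if i < j)
    return sep / diam
```

A lower DB value and a higher Dunn value both favor compact, well-separated partitions, which matches the direction of the comparisons in Table 3.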

5. Conclusions

This paper presents a modified fuzzy c-means algorithm based on particle swarm optimization and shadowed sets to perform unsupervised feature clustering. This algorithm, called SP-FCM, utilizes the global search property of PSO and the vagueness balance property of shadowed sets, such that it can estimate the optimal cluster number as it runs through its alternating optimization process. As a randomized approach, SP-FCM can alleviate the problems faced by FCM, namely sensitivity to initialization and convergence to local minima. Moreover, the algorithm avoids the subjective and somewhat arbitrary trials otherwise needed to estimate an appropriate cluster number, and it is able to find the optimal cluster number within a specific


Figure 4: Ten package images with SIFT features: (a) 644, (b) 460, (c) 742, (d) 442, (e) 313, (f) 602, (g) 373, (h) 724, (i) 539, and (j) 124 features.


Figure 5: XB validity index of the bag data set versus cluster number C.

range using cluster validity measures as indicators. The use of the XB validity index allows the algorithm to find the optimum cluster number, with cluster partitions that provide compact and well-separated clusters. The experiments show that the SP-FCM algorithm produces good results with reference to the DB and Dunn indices, especially for high-dimensional and large data sets.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research was supported by the National Natural Science Foundation of China (NSFC Grant no. 61105089).

References

[1] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.
[2] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering," ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 1, article 1, 2009.
[3] P. Melin and O. Castillo, "A review on type-2 fuzzy logic applications in clustering, classification and pattern recognition," Applied Soft Computing Journal, vol. 21, pp. 568–577, 2014.
[4] M. Yuwono, S. W. Su, B. D. Moulton, and H. T. Nguyen, "Data clustering using variants of rapid centroid estimation," IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 366–377, 2014.
[5] S. C. Satapathy, G. Pradhan, S. Pattnaik, J. V. R. Murthy, and P. V. G. D. P. Reddy, "Performance comparisons of PSO based clustering," InterJRI Computer Science and Networking, vol. 1, no. 1, pp. 18–23, 2009.
[6] G. Peters and R. Weber, "Dynamic clustering with soft computing," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 226–236, 2012.
[7] D. T. Anderson, J. C. Bezdek, M. Popescu, and J. M. Keller, "Comparing fuzzy, probabilistic, and possibilistic partitions," IEEE Transactions on Fuzzy Systems, vol. 18, no. 5, pp. 906–918, 2010.
[8] P. Maji and S. Paul, "Robust rough-fuzzy C-means algorithm: design and applications in coding and non-coding RNA expression data clustering," Fundamenta Informaticae, vol. 124, no. 1-2, pp. 153–174, 2013.
[9] J. Bezdek, Fuzzy mathematics in pattern classification [Ph.D. thesis], Cornell University, New York, NY, USA, 1974.
[10] V. Olman, F. Mao, H. Wu, and Y. Xu, "Parallel clustering algorithm for large data sets with applications in bioinformatics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 344–352, 2009.
[11] J. C. Bezdek and R. J. Hathaway, "Optimization of fuzzy clustering criteria using genetic algorithms," in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 589–594, June 1994.
[12] T. A. Runkler, "Ant colony optimization of clustering models," International Journal of Intelligent Systems, vol. 20, no. 12, pp. 1233–1251, 2005.
[13] K. S. Al-Sultan and S. Z. Selim, "A global algorithm for the fuzzy clustering problem," Pattern Recognition, vol. 26, no. 9, pp. 1357–1361, 1993.
[14] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43, Nagoya, Japan, October 1995.
[15] T. A. Runkler and C. Katz, "Fuzzy clustering by particle swarm optimization," in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 601–608, Canada, July 2006.
[16] C.-F. Juang, C.-M. Hsiao, and C.-H. Hsu, "Hierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimization," IEEE Transactions on Fuzzy Systems, vol. 18, no. 1, pp. 14–26, 2010.
[17] H. Izakian and A. Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem," Expert Systems with Applications, vol. 38, no. 3, pp. 1835–1838, 2011.
[18] W. Pedrycz, "Shadowed sets: representing and processing fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 103–109, 1998.
[19] J. Zhou, W. Pedrycz, and D. Miao, "Shadowed sets in the characterization of rough-fuzzy clustering," Pattern Recognition, vol. 44, no. 8, pp. 1738–1749, 2011.
[20] W. Pedrycz, "From fuzzy sets to shadowed sets: interpretation and computing," International Journal of Intelligent Systems, vol. 24, no. 1, pp. 48–61, 2009.
[21] S. Mitra, W. Pedrycz, and B. Barman, "Shadowed c-means: integrating fuzzy and rough clustering," Pattern Recognition, vol. 43, no. 4, pp. 1282–1291, 2010.
[22] J. C. Bezdek, J. M. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, vol. 4 of The Handbooks of Fuzzy Sets, Springer, New York, NY, USA, 1999.
[23] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224–227, 1978.
[24] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.
[25] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.
[26] S. C. Satapathy, S. K. Patnaik, C. D. P. Dash, and S. Sahoo, "Data clustering using modified fuzzy-PSO (MFPSO)," in Proceedings of the 5th Multi-Disciplinary International Workshop on Artificial Intelligence, Hyderabad, India, 2011.
[27] M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.
[28] R. Krishnapuram and J. M. Keller, "The possibilistic C-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385–393, 1996.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article An Improved Fuzzy c-Means Clustering ...

8 Computational Intelligence and Neuroscience

0

05

1

15

2

25

3

35

30 29 28 27 26 25 24 23 22 21 20Number of clusters

Xie-

Beni

val

idity

(a) GDS608

0

1

2

3

4

5

6

Xie-

Beni

val

idity

30 29 28 27 26 25 24 23 22 21 20Number of clusters

(b) GDS2003

0

1

2

3

4

5

6

7

20 19 18 17 16 15 14 13 12 11 10Number of clusters

Xie-

Beni

val

idity

(c) GDS2267

0

1

2

3

4

5

6

20 19 18 17 16 15 14 13 12 11 10Number of clusters

Xie-

Beni

val

idity

(d) GDS2712

Figure 3 XB validity index of four yeast gene expression data sets with cluster number 119862

Table 3 Performance of FCM RCM SCM SRCM and SP-FCM on package datasets

Different indices AlgorithmsFCM RCM SCM SRCM SP-FCM

DB index 184569 159671 143194 124038 107826Dunn index 92647 116298 125376 169422 167313

Within each cluster the optimal 120572119895decides the cardinality

and realizes cluster reduction and XB index is calculatedEach 119862-partition is ranked using this index and selected asthe final output by the smallest index value that indicates thebest compact and well-separated clusters At the beginningthe cluster number decreases at a faster speed it takes 26iterations to reduce the cluster number from 119862 = 360 to 119862 =289 and 20 iterations from119862 = 289 to119862 = 267 The XB indexincreases at a relatively faster rate when the cluster number119862 lt 267 Figure 5 shows the XB index for 119862 isin [267 289]The index reaches its minimum value at 119862 = 276 that meansthe best partition for this data set is 276 clusters Table 3exhibits the comparative analysis of convergence effect Asexpected SP-FCMcan provide sound results for the real datathe performance is assessed by those validity indices

5 Conclusions

This paper presents a modified fuzzy 119888-means algorithmbased on the particle swarm optimization and shadowed setsto perform unsupervised feature clustering This algorithmcalled SP-FCMutilizes the global search property of PSO andvagueness balance property of shadowed sets such that it canestimate the optimal cluster number as it runs through itsalternating optimization process SP-FCM as a randomizedbased approach has the capability to alleviate the problemsfaced by FCM which has some demerits of initializationand falling in local minima Moreover this algorithm avoidsthe subjective and somewhat arbitrary trials to estimate theappropriate value of cluster number and it enhances thiscapability to find the optimal cluster number within a specific

Computational Intelligence and Neuroscience 9

(a) 644 features (b) 460 features (c) 742 features

(d) 442 features (e) 313 features (f) 602 features

(g) 373 features (h) 724 features (i) 539 features (j) 124 features

Figure 4 Ten package images with SIFT features

0100200300400500600700800900

1000

289

285

282

280

278

276

274

272

270

268

Number of clusters

Xie-

Beni

val

idity

287

283

281

279

277

275

273

271

269

267

Figure 5 XB validity index of bag data set with cluster number 119862

range using cluster validity measures as indicatorsThe use ofXB validity index allows the algorithm to find the optimumcluster number with cluster partitions that provide compactand well-separated clusters From the experiments we haveshown that the SP-FCMalgorithmproduces good resultswithreference to DB and Dunn indices especially to the highdimension and large data cases

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

10 Computational Intelligence and Neuroscience

Acknowledgment

This research was supported by the National Natural ScienceFoundation of China (Grant no 61105089 NSFC)

References

[1] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010

[2] H-P Kriegel P Kroger and A Zimek ldquoClustering high-dimensional data a survey on subspace clustering pattern-based clustering and correlation clusteringrdquoACMTransactionson Knowledge Discovery from Data vol 3 no 1 article 1 2009

[3] P Melin and O Castillo ldquoA review on type-2 fuzzy logic appli-cations in clustering classification and pattern recognitionrdquoApplied Soft Computing Journal vol 21 pp 568ndash577 2014

[4] M Yuwono S W Su B D Moulton and H T Nguyen ldquoDataclustering using variants of rapid centroid estimationrdquo IEEETransactions on Evolutionary Computation vol 18 no 3 pp366ndash377 2014

[5] S C Satapathy G Pradhan S Pattnaik J V R Murthy andP V G D P Reddy ldquoPerformance comparisons of PSO basedclusteringrdquo InterJRI Computer Science and Networking vol 1no 1 pp 18ndash23 2009

[6] G Peters and RWeber ldquoDynamic clustering with soft comput-ingrdquo Wiley Interdisciplinary Reviews Data Mining and Knowl-edge Discovery vol 2 no 3 pp 226ndash236 2012

[7] D T Anderson J C Bezdek M Popescu and J M KellerldquoComparing fuzzy probabilistic and possibilistic partitionsrdquoIEEE Transactions on Fuzzy Systems vol 18 no 5 pp 906ndash9182010

[8] P Maji and S Paul ldquoRobust rough-fuzzy C-means algorithmdesign and applications in coding and non-coding RNA expres-sion data clusteringrdquo Fundamenta Informaticae vol 124 no 1-2pp 153ndash174 2013

[9] J Bezdek Fuzzy mathematics in pattern classification [PhDthesis] Cornell University New york NY USA 1974

[10] V Olman F Mao H Wu and Y Xu ldquoParallel clusteringalgorithm for large data sets with applications in bioinformat-icsrdquo IEEEACM Transactions on Computational Biology andBioinformatics vol 6 no 2 pp 344ndash352 2009

[11] J C Bezdek and R J Hathaway ldquoOptimization of fuzzyclustering criteria using genetic algorithmsrdquo in Proceedings ofthe 1st IEEE Conference on Evolutionary Computation pp 589ndash594 June 1994

[12] T A Runkler ldquoAnt colony optimization of clustering modelsrdquoInternational Journal of Intelligent Systems vol 20 no 12 pp1233ndash1251 2005

[13] K S Al-Sultan and S Z Selim ldquoA global algorithm for theFuzzy Clustering problemrdquo Pattern Recognition vol 26 no 9pp 1357ndash1361 1993

[14] R Eberhart and J Kennedy ldquoNew optimizer using particleswarm theoryrdquo in Proceedings of the 6th International Sympo-sium onMicroMachine and Human Science pp 39ndash43 NagoyaJapan October 1995

[15] T A Runkler and C Katz ldquoFuzzy clustering by particleswarm optimizationrdquo in Proceedings of the IEEE InternationalConference on Fuzzy Systems pp 601ndash608 can July 2006

[16] C-F Juang C-M Hsiao and C-H Hsu ldquoHierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimizationrdquo IEEE Transactions on Fuzzy Systems vol18 no 1 pp 14ndash26 2010

[17] H Izakian and A Abraham ldquoFuzzy C-means and fuzzy swarmfor fuzzy clustering problemrdquo Expert Systems with Applicationsvol 38 no 3 pp 1835ndash1838 2011

[18] W Pedrycz ldquoShadowed sets representing and processing fuzzysetsrdquo IEEE Transactions on Systems Man and Cybernetics BCybernetics vol 28 no 1 pp 103ndash109 1998

[19] J Zhou W Pedrycz and D Miao ldquoShadowed sets in the char-acterization of rough-fuzzy clusteringrdquoPattern Recognition vol44 no 8 pp 1738ndash1749 2011

[20] W Pedrycz ldquoFrom fuzzy sets to shadowed sets interpretationand computingrdquo International Journal of Intelligent Systems vol24 no 1 pp 48ndash61 2009

[21] S Mitra W Pedrycz and B Barman ldquoShadowed c-meansintegrating fuzzy and rough clusteringrdquo Pattern Recognitionvol 43 no 4 pp 1282ndash1291 2010

[22] J C Bezdek J M Keller R Krishnapuram and N R PalFuzzy Models and Algorithms for Pattern Recognition and ImageProcessing vol 4 ofTheHandbooks of Fuzzy Sets Springer NewYork NY USA 1999

[23] D L Davies and DW Bouldin ldquoA cluster separation measurerdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 1 no 2 pp 224ndash227 1978

[24] X L Xie and G Beni ldquoA validity measure for fuzzy clusteringrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 13 no 8 pp 841ndash847 1991

[25] J C Bezdek and N R Pal ldquoSome new indexes of clustervalidityrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 28 no 3 pp 301ndash315 1998

[26] S C Satapathy S K Patnaik C D P Dash and S Sahoo ldquoDataclustering using modified fuzzy-PSO (MFPSO)rdquo in Proceedingsof the 5th Multi-Disciplinary Int Workshop on Artificial Intelli-gence Hyderabad India 2011

[27] M K Pakhira S Bandyopadhyay and U Maulik ldquoValidityindex for crisp and fuzzy clustersrdquo Pattern Recognition vol 37no 3 pp 487ndash501 2004

[28] R Krishnapuram and J M Keller ldquoThe possibilistic C-meansalgorithm insights and recommendationsrdquo IEEE Transactionson Fuzzy Systems vol 4 no 3 pp 385ndash393 1996

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article An Improved Fuzzy c-Means Clustering ...

Computational Intelligence and Neuroscience 9

(a) 644 features (b) 460 features (c) 742 features

(d) 442 features (e) 313 features (f) 602 features

(g) 373 features (h) 724 features (i) 539 features (j) 124 features

Figure 4 Ten package images with SIFT features

0100200300400500600700800900

1000

289

285

282

280

278

276

274

272

270

268

Number of clusters

Xie-

Beni

val

idity

287

283

281

279

277

275

273

271

269

267

Figure 5 XB validity index of bag data set with cluster number 119862

range using cluster validity measures as indicatorsThe use ofXB validity index allows the algorithm to find the optimumcluster number with cluster partitions that provide compactand well-separated clusters From the experiments we haveshown that the SP-FCMalgorithmproduces good resultswithreference to DB and Dunn indices especially to the highdimension and large data cases

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

10 Computational Intelligence and Neuroscience

Acknowledgment

This research was supported by the National Natural ScienceFoundation of China (Grant no 61105089 NSFC)

References

[1] A K Jain ldquoData clustering 50 years beyond K-meansrdquo PatternRecognition Letters vol 31 no 8 pp 651ndash666 2010

[2] H-P Kriegel P Kroger and A Zimek ldquoClustering high-dimensional data a survey on subspace clustering pattern-based clustering and correlation clusteringrdquoACMTransactionson Knowledge Discovery from Data vol 3 no 1 article 1 2009

[3] P Melin and O Castillo ldquoA review on type-2 fuzzy logic appli-cations in clustering classification and pattern recognitionrdquoApplied Soft Computing Journal vol 21 pp 568ndash577 2014

[4] M Yuwono S W Su B D Moulton and H T Nguyen ldquoDataclustering using variants of rapid centroid estimationrdquo IEEETransactions on Evolutionary Computation vol 18 no 3 pp366ndash377 2014

[5] S C Satapathy G Pradhan S Pattnaik J V R Murthy andP V G D P Reddy ldquoPerformance comparisons of PSO basedclusteringrdquo InterJRI Computer Science and Networking vol 1no 1 pp 18ndash23 2009

[6] G Peters and RWeber ldquoDynamic clustering with soft comput-ingrdquo Wiley Interdisciplinary Reviews Data Mining and Knowl-edge Discovery vol 2 no 3 pp 226ndash236 2012

[7] D T Anderson J C Bezdek M Popescu and J M KellerldquoComparing fuzzy probabilistic and possibilistic partitionsrdquoIEEE Transactions on Fuzzy Systems vol 18 no 5 pp 906ndash9182010

[8] P Maji and S Paul ldquoRobust rough-fuzzy C-means algorithmdesign and applications in coding and non-coding RNA expres-sion data clusteringrdquo Fundamenta Informaticae vol 124 no 1-2pp 153ndash174 2013

[9] J Bezdek Fuzzy mathematics in pattern classification [PhDthesis] Cornell University New york NY USA 1974

[10] V Olman F Mao H Wu and Y Xu ldquoParallel clusteringalgorithm for large data sets with applications in bioinformat-icsrdquo IEEEACM Transactions on Computational Biology andBioinformatics vol 6 no 2 pp 344ndash352 2009

[11] J C Bezdek and R J Hathaway ldquoOptimization of fuzzyclustering criteria using genetic algorithmsrdquo in Proceedings ofthe 1st IEEE Conference on Evolutionary Computation pp 589ndash594 June 1994

[12] T A Runkler ldquoAnt colony optimization of clustering modelsrdquoInternational Journal of Intelligent Systems vol 20 no 12 pp1233ndash1251 2005

[13] K S Al-Sultan and S Z Selim ldquoA global algorithm for theFuzzy Clustering problemrdquo Pattern Recognition vol 26 no 9pp 1357ndash1361 1993

[14] R Eberhart and J Kennedy ldquoNew optimizer using particleswarm theoryrdquo in Proceedings of the 6th International Sympo-sium onMicroMachine and Human Science pp 39ndash43 NagoyaJapan October 1995

[15] T A Runkler and C Katz ldquoFuzzy clustering by particleswarm optimizationrdquo in Proceedings of the IEEE InternationalConference on Fuzzy Systems pp 601ndash608 can July 2006

[16] C-F Juang C-M Hsiao and C-H Hsu ldquoHierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimizationrdquo IEEE Transactions on Fuzzy Systems vol18 no 1 pp 14ndash26 2010

[17] H Izakian and A Abraham ldquoFuzzy C-means and fuzzy swarmfor fuzzy clustering problemrdquo Expert Systems with Applicationsvol 38 no 3 pp 1835ndash1838 2011

[18] W Pedrycz ldquoShadowed sets representing and processing fuzzysetsrdquo IEEE Transactions on Systems Man and Cybernetics BCybernetics vol 28 no 1 pp 103ndash109 1998

[19] J Zhou W Pedrycz and D Miao ldquoShadowed sets in the char-acterization of rough-fuzzy clusteringrdquoPattern Recognition vol44 no 8 pp 1738ndash1749 2011

[20] W Pedrycz ldquoFrom fuzzy sets to shadowed sets interpretationand computingrdquo International Journal of Intelligent Systems vol24 no 1 pp 48ndash61 2009

[21] S Mitra W Pedrycz and B Barman ldquoShadowed c-meansintegrating fuzzy and rough clusteringrdquo Pattern Recognitionvol 43 no 4 pp 1282ndash1291 2010

[22] J C Bezdek J M Keller R Krishnapuram and N R PalFuzzy Models and Algorithms for Pattern Recognition and ImageProcessing vol 4 ofTheHandbooks of Fuzzy Sets Springer NewYork NY USA 1999

[23] D L Davies and DW Bouldin ldquoA cluster separation measurerdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 1 no 2 pp 224ndash227 1978

[24] X L Xie and G Beni ldquoA validity measure for fuzzy clusteringrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 13 no 8 pp 841ndash847 1991

[25] J C Bezdek and N R Pal ldquoSome new indexes of clustervalidityrdquo IEEE Transactions on Systems Man and CyberneticsPart B Cybernetics vol 28 no 3 pp 301ndash315 1998

[26] S C Satapathy S K Patnaik C D P Dash and S Sahoo ldquoDataclustering using modified fuzzy-PSO (MFPSO)rdquo in Proceedingsof the 5th Multi-Disciplinary Int Workshop on Artificial Intelli-gence Hyderabad India 2011

[27] M K Pakhira S Bandyopadhyay and U Maulik ldquoValidityindex for crisp and fuzzy clustersrdquo Pattern Recognition vol 37no 3 pp 487ndash501 2004

[28] R Krishnapuram and J M Keller ldquoThe possibilistic C-meansalgorithm insights and recommendationsrdquo IEEE Transactionson Fuzzy Systems vol 4 no 3 pp 385ndash393 1996

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article An Improved Fuzzy c-Means Clustering ...

10 Computational Intelligence and Neuroscience

Acknowledgment

This research was supported by the National Natural ScienceFoundation of China (Grant no 61105089 NSFC)

References

[1] A. K. Jain, "Data clustering: 50 years beyond K-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 2010.

[2] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering," ACM Transactions on Knowledge Discovery from Data, vol. 3, no. 1, article 1, 2009.

[3] P. Melin and O. Castillo, "A review on type-2 fuzzy logic applications in clustering, classification and pattern recognition," Applied Soft Computing Journal, vol. 21, pp. 568–577, 2014.

[4] M. Yuwono, S. W. Su, B. D. Moulton, and H. T. Nguyen, "Data clustering using variants of rapid centroid estimation," IEEE Transactions on Evolutionary Computation, vol. 18, no. 3, pp. 366–377, 2014.

[5] S. C. Satapathy, G. Pradhan, S. Pattnaik, J. V. R. Murthy, and P. V. G. D. P. Reddy, "Performance comparisons of PSO based clustering," InterJRI Computer Science and Networking, vol. 1, no. 1, pp. 18–23, 2009.

[6] G. Peters and R. Weber, "Dynamic clustering with soft computing," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 226–236, 2012.

[7] D. T. Anderson, J. C. Bezdek, M. Popescu, and J. M. Keller, "Comparing fuzzy, probabilistic, and possibilistic partitions," IEEE Transactions on Fuzzy Systems, vol. 18, no. 5, pp. 906–918, 2010.

[8] P. Maji and S. Paul, "Robust rough-fuzzy C-means algorithm: design and applications in coding and non-coding RNA expression data clustering," Fundamenta Informaticae, vol. 124, no. 1-2, pp. 153–174, 2013.

[9] J. Bezdek, Fuzzy Mathematics in Pattern Classification [Ph.D. thesis], Cornell University, New York, NY, USA, 1974.

[10] V. Olman, F. Mao, H. Wu, and Y. Xu, "Parallel clustering algorithm for large data sets with applications in bioinformatics," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 344–352, 2009.

[11] J. C. Bezdek and R. J. Hathaway, "Optimization of fuzzy clustering criteria using genetic algorithms," in Proceedings of the 1st IEEE Conference on Evolutionary Computation, pp. 589–594, June 1994.

[12] T. A. Runkler, "Ant colony optimization of clustering models," International Journal of Intelligent Systems, vol. 20, no. 12, pp. 1233–1251, 2005.

[13] K. S. Al-Sultan and S. Z. Selim, "A global algorithm for the fuzzy clustering problem," Pattern Recognition, vol. 26, no. 9, pp. 1357–1361, 1993.

[14] R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," in Proceedings of the 6th International Symposium on Micro Machine and Human Science, pp. 39–43, Nagoya, Japan, October 1995.

[15] T. A. Runkler and C. Katz, "Fuzzy clustering by particle swarm optimization," in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 601–608, Canada, July 2006.

[16] C.-F. Juang, C.-M. Hsiao, and C.-H. Hsu, "Hierarchical cluster-based multispecies particle-swarm optimization for fuzzy-system optimization," IEEE Transactions on Fuzzy Systems, vol. 18, no. 1, pp. 14–26, 2010.

[17] H. Izakian and A. Abraham, "Fuzzy C-means and fuzzy swarm for fuzzy clustering problem," Expert Systems with Applications, vol. 38, no. 3, pp. 1835–1838, 2011.

[18] W. Pedrycz, "Shadowed sets: representing and processing fuzzy sets," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 1, pp. 103–109, 1998.

[19] J. Zhou, W. Pedrycz, and D. Miao, "Shadowed sets in the characterization of rough-fuzzy clustering," Pattern Recognition, vol. 44, no. 8, pp. 1738–1749, 2011.

[20] W. Pedrycz, "From fuzzy sets to shadowed sets: interpretation and computing," International Journal of Intelligent Systems, vol. 24, no. 1, pp. 48–61, 2009.

[21] S. Mitra, W. Pedrycz, and B. Barman, "Shadowed c-means: integrating fuzzy and rough clustering," Pattern Recognition, vol. 43, no. 4, pp. 1282–1291, 2010.

[22] J. C. Bezdek, J. M. Keller, R. Krishnapuram, and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, vol. 4 of The Handbooks of Fuzzy Sets, Springer, New York, NY, USA, 1999.

[23] D. L. Davies and D. W. Bouldin, "A cluster separation measure," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224–227, 1978.

[24] X. L. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

[25] J. C. Bezdek and N. R. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.

[26] S. C. Satapathy, S. K. Patnaik, C. D. P. Dash, and S. Sahoo, "Data clustering using modified fuzzy-PSO (MFPSO)," in Proceedings of the 5th Multi-Disciplinary International Workshop on Artificial Intelligence, Hyderabad, India, 2011.

[27] M. K. Pakhira, S. Bandyopadhyay, and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.

[28] R. Krishnapuram and J. M. Keller, "The possibilistic C-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385–393, 1996.
