International Journal of Advances in Computing & Communications

Volume * No.*, ___________ 2013

79 | Page www.ijacc.org Mohita Bansal, Meenakshi Chaudhary, Swasti Sharma

A COMPARATIVE ANALYSIS BETWEEN K-MEANS AND FUZZY C-MEANS CLUSTERING ALGORITHMS

MOHITA BANSAL MEENAKSHI CHAUDHARY SWASTI SHARMA

([email protected]) ([email protected]) ([email protected])

ASSISTANCE BY: RAHUL SHARMA([email protected])

Abstract: Clustering is the partitioning of data into groups of similar objects. Clustering is one of the methods used for segmentation. Segmentation of an image entails dividing the image into regions such that each region has similar attributes and different regions have dissimilar attributes.

Several clustering methods and numerous clustering algorithms are available in existing software packages, and new ones frequently appear in the literature. These methods and algorithms vary depending on how the similarity between observations is defined or on other assumptions about shapes of clusters, distributions of variables, etc. The objective of this paper is to study and compare two data clustering algorithms: K-means and Fuzzy C-means.

KEYWORDS: K-means clustering, Fuzzy C-means clustering.

1. INTRODUCTION

Clustering analysis is a technique that collects patterns and forms clusters using only the information found in the data that describes the patterns and their relationships; patterns placed in the same cluster should share similar features. Patterns within a cluster are more similar to each other than to patterns belonging to a different cluster. The greater the homogeneity within a cluster and the greater the difference between clusters, the more distinct and better the clustering. Clustering generates groups of persons, products or events which can be used to determine managerial strategy, or which are commonly the target of further analysis. Clustering analysis deals with finding structure in a collection of unlabeled data. It is important to understand the difference between clustering and discriminant analysis: clustering works on unlabeled data, whereas discriminant analysis learns from patterns that are already labeled. Loosely defined, clustering is the process in which objects that have similar characteristics or behaviour are grouped together; objects with dissimilar behaviour (not included in any group) are called outliers.

Clustering has been used in different areas such as engineering, data mining, medicine and biology. Clustering is also useful in various grouping, decision-making, exploratory pattern-analysis, and machine-learning situations, including document retrieval, image segmentation, and pattern classification.

Fig 1: Patterns of different data items (top), clustering of patterns (bottom)

COMPONENTS OF CLUSTERING:

1. Representation of the patterns, which involves feature extraction (the use of one or more transformations of the input features to produce new salient features) and feature selection (the process of identifying the most effective subset of the original features to use in clustering).

2. Pattern proximity is measured by a distance function defined on pairs of patterns. A distance measure such as the Euclidean distance is used to show the dissimilarity between patterns.

3. Grouping is the process of placing similar patterns together. It can be done in many ways, such as hierarchical, partitional and agglomerative clustering, and many additional techniques.

4. Data abstraction is the process of extracting a compact representation of the patterns.

5. Output results in the formation of clusters.

Fig 2: Components of clustering

2. K-MEANS

The basic idea of k-means clustering is that clusters of patterns with the same target category are recognized, and forecasts for new data items are made by assuming they are of the same type as the nearest cluster centre. The method can also be described in terms of k centroids, one for each cluster. First, patterns are selected at random to serve as the initial centroids. Then the distance of each pattern to the centroid of each group is compared, and the pattern is merged into the group whose centroid is closest. This process is carried on until no pattern is left that does not belong to one of the groups. When this process is completed, we re-compute the positions of the k new centroids. After this, a new binding has to be done between the same patterns and the closest new centroids. This process is repeated until there is no change in the location of the k centroids (the centroids do not move).

In k-means clustering, the distance between the patterns and the centroids is measured in terms of the Euclidean distance. The Euclidean distance between two multi-dimensional data points A = {a1, a2, ..., an} and B = {b1, b2, ..., bn} is described as:

D(A,B) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2}

where D is the Euclidean distance.
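As a minimal sketch, the Euclidean distance between two points of equal dimension can be computed in Python as follows. The function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the paper.

```python
import math

def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```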

The k-means method aims to minimize the sum of squared distances between all points and their cluster centres. It is well suited to generating globular clusters. The k-means method is numerical, unsupervised, non-deterministic and iterative.
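The quantity being minimized can be written out directly. The sketch below assumes that each point already carries a cluster label; the function name `sum_of_squared_distances` and the sample data are hypothetical.

```python
import math

def sum_of_squared_distances(points, centroids, labels):
    """Sum over all points of the squared distance to the assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical data: two points near the first centroid, one on the second.
points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0)]
centroids = [(1.0, 2.0), (8.0, 8.0)]
labels = [0, 0, 1]
print(sum_of_squared_distances(points, centroids, labels))  # 2.0
```

A lower value means the points sit closer to their assigned centres, which is what each k-means iteration tries to achieve.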

Fig 3: Using the k-means algorithm to find three clusters in sample data: (a) Iteration 1, (b) Iteration 2, (c) Iteration 3, (d) Iteration 4.

3. ALGORITHM OF K-MEANS CLUSTERING

STEP 1: Specify the number of clusters (k in k-means).

STEP 2: For each cluster, select a centroid.

STEP 3: Assign each object to the group (having similar behaviour) based on the closest centroid.

STEP 4: Recalculate the positions of the k new centroids.

STEP 5: Repeat the above two steps until the centroids no longer change their position.
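The steps above can be sketched in plain Python. This is an illustrative sketch rather than the authors' implementation: it assumes the initial centroids are k data points drawn at random, that an empty cluster keeps its previous centroid, and that names such as `kmeans`, `max_iter` and `seed` are free choices.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # STEP 2: pick k data points at random as the initial centroids (assumption).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # STEP 3: assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # STEP 4: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # STEP 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Which group receives label 0 depends on the random initial centroids, so only the grouping itself is meaningful, not the label values.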

The k-means algorithm does not necessarily find the arrangement that minimizes the sum of squared distances. It is also considerably sensitive to the initial randomly selected centroids. The k-means algorithm can be run multiple times to reduce this effect. The k-means algorithm can be better understood with the help of a simple example:

Let us consider n sample feature vectors a1, a2, ..., an which belong to the same class. They all lie in k compact clusters, where k




A large fuzziness exponent n results in smaller memberships u_ij and hence fuzzier clusters. In the limit n = 1, the memberships u_ij converge to 0 or 1, which implies a crisp partitioning. If not, then compute again by taking the centroids.
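The effect of the exponent can be illustrated numerically. The sketch below assumes the standard fuzzy c-means membership update, u_j = 1 / sum_k (d_j / d_k)^(2 / (n - 1)), since the paper's own formula is not shown here; the parameter `fuzzifier` stands for the exponent n in the text, and the distances are hypothetical.

```python
def memberships(distances, fuzzifier):
    """Membership of one point in each cluster, given its distance to each centroid.

    Assumes the standard fuzzy c-means update with fuzzifier > 1 and
    strictly positive distances.
    """
    exponent = 2.0 / (fuzzifier - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]

distances = [1.0, 2.0]  # hypothetical distances from one point to two centroids
for fuzzifier in (1.1, 2.0, 5.0):
    print(fuzzifier, [round(u, 3) for u in memberships(distances, fuzzifier)])
```

With an exponent close to 1 the point belongs almost entirely to the nearer cluster, while larger exponents pull the two memberships toward equal values.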

Fig 4: Fuzzy c-means clustering

5. ALGORITHM OF FUZZY C-MEANS CLUSTERING

STEP 1: Specify the number of clusters (c).

STEP 2: Assign a random degree of membership in each cluster to every point.

STEP 3: Compute the cluster centroids.

STEP 4: Group the objects on the basis of the criterion.

STEP 5: Compute the Euclidean distance.

STEP 6: Assign each object to the group in which it has the highest degree of membership.

STEP 7: Repeat until the criterion is met.
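A rough Python sketch of these steps follows. It assumes the standard fuzzy c-means formulation: centroids are means weighted by membership raised to the fuzzifier, memberships are recomputed from inverse distance ratios, and the stopping criterion is a small maximum change in membership. The names `fuzzy_c_means`, `harden`, `fuzzifier` and `tol` are illustrative and not taken from the paper.

```python
import math
import random

def fuzzy_c_means(points, c, fuzzifier=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centroids, u); u[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    # STEP 2: random memberships, normalised so each point's degrees sum to 1.
    u = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([value / total for value in row])
    dims = len(points[0])
    exponent = 2.0 / (fuzzifier - 1.0)
    centroids = []
    for _ in range(max_iter):
        # STEP 3: centroids are means weighted by membership ** fuzzifier.
        centroids = []
        for j in range(c):
            weights = [row[j] ** fuzzifier for row in u]
            total = sum(weights)
            centroids.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))
        # STEP 5: Euclidean distances, floored to avoid division by zero,
        # then memberships from inverse distance ratios.
        new_u = []
        for p in points:
            dists = [max(math.dist(p, cj), 1e-12) for cj in centroids]
            new_u.append([
                1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists
            ])
        # STEP 7: stop when no membership changed by more than tol (assumed criterion).
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centroids, u

def harden(u):
    """STEP 6: give each point the label of its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]

# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, u = fuzzy_c_means(points, c=2)
print(harden(u))
```

Unlike k-means, every point keeps a membership in every cluster; `harden` is only needed when a single label per point is required.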

Fig 5: Multi-dimensional fuzzy c-means clustering

6. RESULT

The paper compares k-means and fuzzy c-means clustering, which are very similar in approach. After analysing the algorithms, we have come to the conclusion that:

Both algorithms show some ambiguity on some data when clustering.

K-means and fuzzy c-means clustering algorithms are recommended for huge data sets.

K-means and fuzzy c-means are very sensitive to noise in the dataset. This noise makes it difficult for the algorithms to cluster an object into its suitable cluster.

The difference is that,

K-means clustering produces fairly higher accuracy and requires less computation. Fuzzy c-means clustering produces results close to those of k-means clustering, yet it requires more computation time than k-means because of the fuzzy measure calculations involved in the algorithm.

Fuzzy c-means will tend to run slower than k-means, since it is actually doing more work. Each point is evaluated with each cluster, and more operations are involved in each evaluation.

K-means just needs to do a distance calculation, whereas fuzzy c-means needs to do a full inverse-distance weighting. In fuzzy c-means clustering, each point has a weighting associated with a particular cluster, so a point doesn't sit "in a cluster" as much as it has a weak or strong association to the cluster.

