Compare K,Fcm Imp


    International Journal of Advances in Computing & Communications

    Volume * No.*, ___________ 2013

    79 | Page    www.ijacc.org    Mohita Bansal, Meenakshi Chaudhary, Swasti Sharma

    A COMPARATIVE ANALYSIS BETWEEN K-MEANS AND

    FUZZY C-MEANS CLUSTERING ALGORITHMS

    MOHITA BANSAL MEENAKSHI CHAUDHARY SWASTI SHARMA

    ASSISTANCE BY: RAHUL SHARMA

    Abstract: Clustering is the partitioning of data into groups of similar objects. Clustering is one of the methods used for segmentation. Segmentation of an image entails the division of the image into groups of regions, such that regions within a group have similar attributes and regions in different groups have dissimilar attributes.

    Several clustering methods and numerous clustering algorithms are available in existing software packages, and new ones frequently appear in the literature. These methods and algorithms vary depending on how the similarity between observations is defined or on other assumptions about shapes of clusters, distributions of variables, etc. The objective of this paper is to study and compare data clustering algorithms; specifically, it compares K-means and Fuzzy C-means clustering.

    KEYWORDS: K-means clustering, Fuzzy C-means clustering.

    1. INTRODUCTION

    Cluster analysis collects patterns and forms clusters using only the information found in the data that describes the patterns and their relationships. Patterns within a cluster are more similar to each other than to patterns belonging to a different cluster. The greater the homogeneity within a cluster and the greater the difference between clusters, the more distinct and better the clustering. Clustering generates groups of persons, products or events which can be used to determine managerial strategy, or which are commonly the target of further analysis. Cluster analysis deals with finding structure in a collection of unlabeled data. It is important to understand the difference between clustering and discriminant analysis: clustering is unsupervised and works on unlabeled data, whereas discriminant analysis is supervised and learns from patterns that are already labeled. Loosely defined, clustering is the process in which objects that have similar characteristics or behaviour are grouped together; objects with dissimilar behaviour that are not included in any group are called outliers.

    Clustering has been used in different areas such as engineering, data mining, medicine and biology. Clustering is also useful in various exploratory pattern-analysis, grouping, decision-making and machine-learning situations, including document retrieval, image segmentation and pattern classification.

    Fig 1: Patterns of different data items (top), clustering of patterns (bottom)

    COMPONENTS OF CLUSTERING:

    1. Representation of the patterns, which involves feature extraction (the use of one or more transformations of the input features to produce new salient features) and feature


    selection (the process of identifying the most effective subset of the original features to use in clustering).

    2. Pattern proximity is measured by a distance function defined on pairs of patterns. A distance measure such as the Euclidean distance is used to express the dissimilarity between patterns.

    3. Grouping is the process of placing similar patterns together. It can be done in many ways, such as hierarchical, partitional and agglomerative clustering, along with many additional techniques.

    4. Data abstraction is the process of extracting a compact representation of the patterns.

    5. Output results in the formation of clusters.

    Fig 2: Components of clustering

    2. K-MEANS

    The basic idea of k-means clustering is that clusters of patterns with the same target category are recognized, and forecasts for new data items are made by assuming they are of the same type as the nearest cluster centre. It can also be described in terms of k centroids, one for each cluster. First, random patterns are selected as the initial centroids. Then the distance of each pattern to the centroid of each group is compared, and the pattern is merged into the group whose centroid is closest. This process is carried on until no pattern is left that does not belong to any of the groups. When this process is completed, the positions of the k new centroids are re-computed. After this, a new binding has to be done between the same patterns and the closest new centroid. This process is repeated until there is no change in the location of the k centroids (the centroids do not move).

    In k-means clustering, the distance between a pattern and a centroid is measured in terms of the Euclidean distance. The Euclidean distance between two multi-dimensional data points A = {a1, a2, ..., an} and B = {b1, b2, ..., bn} is defined as:

    $$D(A,B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

    where D is the Euclidean distance.
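    As a minimal sketch, the Euclidean distance D(A,B) between two points can be computed as follows. The function name `euclidean_distance` is chosen here for illustration and does not come from the paper.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# A 3-4-5 right triangle in the plane.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```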

    The k-means method aims to minimize the sum of squared distances between all points and their cluster centres. It is well suited to generating globular clusters. The k-means method is numerical, unsupervised, non-deterministic and iterative.
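    The quantity that k-means tries to minimize can be written out directly. The sketch below assumes each point already carries the index of its assigned centroid; the names `sum_of_squared_distances`, `points`, `centroids` and `labels` are illustrative only.

```python
def sum_of_squared_distances(points, centroids, labels):
    """Sum, over all points, of the squared Euclidean distance to the assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centre = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centre))
    return total


points = [[1.0, 1.0], [1.0, 3.0], [8.0, 8.0], [10.0, 8.0]]
centroids = [[1.0, 2.0], [9.0, 8.0]]
labels = [0, 0, 1, 1]  # every point sits at distance 1 from its centroid
print(sum_of_squared_distances(points, centroids, labels))  # 4.0
```

    A worse assignment of the same points to the same centroids gives a larger value, which is why the objective can be used to compare candidate clusterings.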

    Fig 3: Using the k-means algorithm to find three clusters in sample data: (a) iteration 1, (b) iteration 2, (c) iteration 3, (d) iteration 4.

    3. ALGORITHM OF K-MEANS CLUSTERING

    STEP 1: Specify the number of clusters (k in k-means).

    STEP 2: For each cluster, select a centroid.

    STEP 3: Assign each object to the group (having similar behaviour) with the closest centroid.

    STEP 4: Recalculate the positions of the k centroids.

    STEP 5: Repeat steps 3 and 4 until the centroids no longer change their position.
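    The five steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: initial centroids are drawn from the data points, an empty cluster keeps its previous centroid, and the names `k_means`, `max_iter` and `seed` are introduced here.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) for the given points. STEP 1 is the choice of k."""
    rng = random.Random(seed)
    # STEP 2: pick k data points at random as the initial centroids (assumption).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # STEP 3: assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # STEP 4: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # STEP 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


data = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]]
centres, assignment = k_means(data, k=2)
print(centres)
print(assignment)
```

    Which cluster receives index 0 depends on the random initial centroids, so only the grouping, not the label numbering, is meaningful.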

    The k-means algorithm does not necessarily find the optimal arrangement, that is, the one that minimizes the sum-of-squared-distances function. It is also considerably sensitive to the initial randomly selected centroids. The k-means algorithm can be run multiple times to reduce this effect. The k-means algorithm can be better understood with the help of a simple example:

    Let us consider n sample feature vectors a1, a2, ..., an which belong to the same class. They all lie in k compact clusters, where k


    n results in smaller uij and hence fuzzier clusters. In the limit n = 1, the uij converge to 0 or 1, which implies a crisp partitioning. If not, then compute again by taking the centroids.
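    The effect of the fuzzifier n on the memberships uij can be illustrated numerically. The membership formula used below is the standard fuzzy c-means update, which is an assumption here because the paper's own formula did not survive in this text; the function name `memberships` and the sample distances are illustrative.

```python
def memberships(distances, n):
    """Membership of one point in each cluster, from its distances to the centroids.

    Assumes the standard fuzzy c-means update with fuzzifier n > 1
    and strictly positive distances.
    """
    exponent = 2.0 / (n - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# One point that is twice as far from the second centroid as from the first.
distances = [1.0, 2.0]
for n in (1.1, 2.0, 5.0):
    print(n, [round(u, 3) for u in memberships(distances, n)])
```

    With n close to 1 the memberships are nearly 1 and 0 (an almost crisp partition), while larger n pulls them towards equal shares.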

    Fig 4: Fuzzy c-means clustering

    5. ALGORITHM OF FUZZY C-MEANS CLUSTERING

    STEP 1: Specify the number of clusters (c).

    STEP 2: Assign a random degree of membership in each cluster to each point.

    STEP 3: Compute the cluster centroids.

    STEP 4: Group the objects on the basis of the criterion.

    STEP 5: Compute the Euclidean distance.

    STEP 6: Assign each object to the group in which it has the highest degree of membership.

    STEP 7: Repeat until the criterion is met.
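    A compact sketch of these steps in plain Python is given below. It assumes the standard fuzzy c-means update rules (membership-weighted centroids and inverse-distance memberships) and takes "the criterion" to be a small change in memberships between iterations; the names `fuzzy_c_means`, `tol`, `max_iter` and `seed` are introduced for illustration.

```python
import random


def fuzzy_c_means(points, c, n=2.0, max_iter=100, tol=1e-6, seed=0):
    """Return (centroids, memberships, labels). STEP 1 is the choice of c."""
    rng = random.Random(seed)
    dim = len(points[0])
    # STEP 2: random memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centroids = []
    exponent = 2.0 / (n - 1.0)
    for _ in range(max_iter):
        # STEP 3: centroids are means weighted by membership ** n.
        centroids = []
        for j in range(c):
            weights = [row[j] ** n for row in u]
            total = sum(weights)
            centroids.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total
                 for d in range(dim)]
            )
        # STEPS 4-5: Euclidean distances, then updated memberships.
        new_u = []
        for p in points:
            dists = [
                max(sum((x - y) ** 2 for x, y in zip(p, ctr)) ** 0.5, 1e-12)
                for ctr in centroids
            ]
            new_u.append(
                [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]
            )
        # STEP 7: stop when no membership changes by more than tol (assumed criterion).
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    # STEP 6: hard label = cluster with the highest membership.
    labels = [max(range(c), key=lambda j: row[j]) for row in u]
    return centroids, u, labels


data = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [8.0, 8.0], [9.0, 9.0], [8.5, 9.5]]
centres, degrees, assignment = fuzzy_c_means(data, c=2)
print(assignment)
print([[round(v, 3) for v in row] for row in degrees])
```

    Unlike the k-means sketch, every point keeps a degree of membership in every cluster; the hard labels are only derived at the end.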

    Fig 5: Multi-dimensional fuzzy c-means clustering

    6. RESULT

    The paper compares k-means and fuzzy c-means clustering, which are very similar in approach. After analysing the algorithms we have come to the following conclusions:

    Both algorithms show some ambiguity on some data when clustering.

    K-means and fuzzy c-means clustering algorithms are recommended for large data sets.

    K-means and fuzzy c-means are very sensitive to noise in the dataset. This noise makes it difficult for the algorithms to assign an object to its suitable cluster.

    The difference is that:

    K-means clustering produces fairly high accuracy and requires less computation. Fuzzy c-means clustering produces results close to those of k-means clustering, yet it requires more computation time than k-means because of the fuzzy measure calculations involved in the algorithm.

    Fuzzy c-means will tend to run slower than k-means, since it is actually doing more work. Each point is evaluated with each cluster, and more operations are involved in each evaluation.

    K-means just needs to do a distance calculation, whereas fuzzy c-means needs to do a full inverse-distance weighting. In fuzzy c-means clustering, each point has a weighting associated with each cluster, so a point does not sit "in a cluster" so much as have a weak or strong association with the cluster.
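    The difference in per-point work can be seen by placing the two assignment rules side by side. This is a sketch only: the soft rule assumes the standard fuzzy c-means inverse-distance weighting with fuzzifier 2, and the names `hard_assignment` and `soft_assignment` are illustrative.

```python
import math


def hard_assignment(point, centroids):
    """k-means style: the index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def soft_assignment(point, centroids, n=2.0):
    """Fuzzy c-means style: one inverse-distance weight per cluster, summing to 1."""
    dists = [max(math.dist(point, ctr), 1e-12) for ctr in centroids]
    exponent = 2.0 / (n - 1.0)
    # Each weight needs a ratio against every cluster, not just a comparison.
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


centroids = [[0.0, 0.0], [4.0, 0.0]]
point = [1.0, 0.0]
print(hard_assignment(point, centroids))  # 0
print(soft_assignment(point, centroids))  # approximately [0.9, 0.1]
```

    The hard rule returns one cluster index, while the soft rule returns a strong association with the first cluster and a weak one with the second.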
