Fuzzy C Means (Overlapping Clustering)

Ravi Prakash Gupta, PGDB-10-10-22, Institute of Bioinformatics and Applied Biotechnology, Bangalore.

Transcript of Fuzzy C Means (Overlapping Clustering)

Page 1: Fuzzy C Means (Overlapping Clustering)


Page 2: Fuzzy C Means (Overlapping Clustering)

Types of Clustering

Exclusive Clustering (K-Means)

Overlapping Clustering (Fuzzy C-Means)

Hierarchical Clustering

Mixture of Gaussians or Probabilistic Clustering

Page 3: Fuzzy C Means (Overlapping Clustering)

“Fuzzy clustering is a process of assigning these membership levels, and then using them to assign data elements to one or more clusters.”


Page 4: Fuzzy C Means (Overlapping Clustering)

A fuzzy set is a pair (A, m) where A is a set and m: A → [0, 1].

For each x ∈ A, m(x) is called the grade of membership of x in (A, m). For a finite set A = {x1, ..., xn}, the fuzzy set (A, m) is often denoted by {m(x1) / x1, ..., m(xn) / xn}.

Let x ∈ A. Then x is called not included in the fuzzy set (A, m) if m(x) = 0, x is called fully included if m(x) = 1, and x is called a fuzzy member if 0 < m(x) < 1. The set {x ∈ A | m(x) > 0} is called the support of (A, m), and the set {x ∈ A | m(x) = 1} is called its kernel.
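The definitions above can be illustrated with a small sketch. It assumes a finite fuzzy set stored as a Python dictionary that maps each element to its grade of membership; the function names and the example grades are made up for illustration and do not come from the slides.

```python
def support(fuzzy_set):
    """Elements whose grade of membership is strictly greater than 0."""
    return {x for x, grade in fuzzy_set.items() if grade > 0}


def kernel(fuzzy_set):
    """Elements whose grade of membership is exactly 1."""
    return {x for x, grade in fuzzy_set.items() if grade == 1}


def inclusion(fuzzy_set, x):
    """Name the kind of inclusion of x, following the three cases above."""
    grade = fuzzy_set[x]
    if grade == 0:
        return "not included"
    if grade == 1:
        return "fully included"
    return "fuzzy member"


# Hypothetical fuzzy set: element -> grade of membership in [0, 1].
example = {"x1": 0.0, "x2": 0.3, "x3": 1.0, "x4": 0.8}

print(support(example))          # elements with grade > 0
print(kernel(example))           # elements with grade == 1
print(inclusion(example, "x2"))  # 0 < 0.3 < 1
```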

Page 5: Fuzzy C Means (Overlapping Clustering)

Fuzzy c-means (FCM) is a clustering method that allows one piece of data to belong to two or more clusters, with the degree of belonging computed from the Euclidean distance to each cluster.

This method was developed by Dunn in 1973 and improved by Bezdek in 1981 and it is frequently used in pattern recognition.


Page 6: Fuzzy C Means (Overlapping Clustering)

Step-1: Start

Step-2: Take n similar data points.

Step-3: Assign a threshold value for each cluster.

Step-4: Take the ‘point coefficient’, i.e. the degree to which each point is associated with each cluster.


Page 7: Fuzzy C Means (Overlapping Clustering)

Step-5: Compute the point coefficient of each point from the magnitude of its distances to the clusters.

Step-6: Repeat step-5 to compute: a) the centroid of each cluster; b) the point coefficient of each point being in the cluster.

Step-7: Repeat step-6 until the algorithm has minimized the intra-cluster variance.

Step-8: Stop
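The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the presenter's implementation: it uses the standard Bezdek-style update rules, the parameter names (`c`, `fuzzifier`, `tol`, `max_iter`, `seed`) are chosen here, and the "threshold" of Step-3 is interpreted as a single stopping tolerance on the change in point coefficients.

```python
import numpy as np


def fuzzy_c_means(X, c, fuzzifier=2.0, tol=1e-5, max_iter=300, seed=0):
    """Cluster the rows of X into c fuzzy clusters.

    Returns (centers, U) where U[i, j] is the point coefficient
    (degree of membership) of point i in cluster j.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Start from random point coefficients; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        # Centroid of each cluster: mean of all points weighted by U**fuzzifier.
        W = U ** fuzzifier
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance from every point to every centroid, shape (n, c).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero

        # New coefficients: inverse-distance based, normalised per point.
        inv = dist ** (-2.0 / (fuzzifier - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        # Stop once the coefficients change by less than the tolerance.
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    # centers come from the last centroid update performed in the loop.
    return centers, U


# Made-up data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(U, 3))
```

Each row of `U` sums to 1, so a point near one group gets a coefficient close to 1 for that cluster and close to 0 for the other.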


Page 8: Fuzzy C Means (Overlapping Clustering)

With fuzzy c-means, the centroid of a cluster is computed as the mean of all points, weighted by their degree of belonging to the cluster.

The degree of being in a certain cluster (degree of membership) is related to the inverse of the distance to the cluster.

By iteratively updating the cluster centers and the membership grades for each data point, FCM moves the cluster centers to the "right" location within a data set.
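For reference, the two update rules described above are usually written as follows in the standard (Bezdek) formulation. The slides do not give these formulas, so the notation is an assumption: $x_i$ are the $N$ data points, $c_j$ the $C$ cluster centers, $u_{ij}$ the degree of membership of $x_i$ in cluster $j$, and $m > 1$ the fuzzifier exponent (a parameter, not the membership function m(x) of a fuzzy set).

```latex
% Objective minimised by FCM (weighted intra-cluster variance)
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{\,m} \, \lVert x_i - c_j \rVert^2

% Centroid: mean of all points weighted by their degree of membership
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{\,m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{\,m}}

% Membership: related to the inverse of the distance to the cluster center
u_{ij} = \frac{1}{\sum_{k=1}^{C}
         \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}
```

With these definitions the memberships of each point sum to 1 over the clusters, and a smaller distance to $c_j$ gives a larger $u_{ij}$.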


Page 9: Fuzzy C Means (Overlapping Clustering)

Medical diagnostics: for example, MRI.

Libraries: book ordering.

Marketing: finding groups of customers with similar behavior given a large database.

Insurance: identifying groups of motor insurance policy holders with a high average claim cost; identifying fraud.


Page 10: Fuzzy C Means (Overlapping Clustering)

CASE STUDY

In this work, unsupervised clustering methods were used to cluster the patients into three clusters, using thyroid gland data obtained by Dr. Coomans.

Here, the Fuzzy C-Means (FCM) and Hard C-Means (HCM) algorithms are used as unsupervised clustering methods to cluster the patients.

As a result of the clustering algorithms, each patient's status is classified as normal, hyperthyroid function, or hypothyroid function.

Page 11: Fuzzy C Means (Overlapping Clustering)

The application of Fuzzy C-Means makes class membership a relative one, and an object can belong to several classes at the same time but with different degrees. This is an important feature for medical diagnostic systems, as it increases sensitivity.

The graphical representation of the results is shown below

Page 12: Fuzzy C Means (Overlapping Clustering)

Samples with membership degrees close to 0.5 are the suspicious cases (shaded area in the table), because they are hard to assign to a single cluster. Therefore fuzzy c-means clustering is more reliable than hard clustering for medical diagnostic systems.

In medical diagnostic systems, the fuzzy c-means algorithm gives better results than the hard k-means algorithm.
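A short sketch of how such suspicious cases could be flagged from a membership matrix. The membership values below are invented for illustration and are not the thyroid results; the 0.4 to 0.6 band used for "close to 0.5" and the function names are also assumptions.

```python
import numpy as np


def hard_labels(U):
    """Hard assignment: each sample goes to its highest-membership cluster."""
    return U.argmax(axis=1)


def suspicious_cases(U, low=0.4, high=0.6):
    """Indices of samples whose highest membership lies in [low, high]."""
    top = U.max(axis=1)
    return np.flatnonzero((top >= low) & (top <= high))


# Hypothetical memberships: rows are samples, columns are three clusters.
U = np.array([
    [0.95, 0.03, 0.02],   # clearly cluster 0
    [0.52, 0.45, 0.03],   # ambiguous between clusters 0 and 1
    [0.05, 0.90, 0.05],   # clearly cluster 1
    [0.10, 0.42, 0.48],   # ambiguous between clusters 1 and 2
])

print(hard_labels(U))        # hard clustering hides the ambiguity
print(suspicious_cases(U))   # fuzzy memberships expose it
```

A hard assignment gives samples 1 and 3 a definite label, while the fuzzy memberships show that these are exactly the cases that deserve a closer look.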

Page 13: Fuzzy C Means (Overlapping Clustering)

Fuzzy Means clustering in Medical Diagnostics.

www.endocrineweb.com/thyroid.html

www.mathworks.com

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletFCM.html