Computer Architecture F07 - UH



COSC 6339

Big Data Analytics

Fuzzy Clustering

Some slides based on a lecture by Prof. Shishir Shah

Edgar Gabriel

Spring 2017

Clustering

• Clustering is a technique for finding similarity groups in data, called clusters. i.e.,

– it groups data instances that are similar to (near) each other in one cluster and data instances that are very different (far away) from each other into different clusters.

• Clustering is often called an unsupervised learning task, as no class values denoting an a priori grouping of the data instances are given.


K-means algorithm

• Given k, the k-means algorithm works as follows:

1) Randomly choose k data points (seeds) to be the initial centroids (cluster centers).

2) Assign each data point to the closest centroid.

3) Re-compute the centroids using the current cluster memberships.

4) If a convergence criterion is not met, go to 2).
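The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the Euclidean distance, the "no re-assignments" stopping rule, and the toy data are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on an (n, d) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids (seeds).
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when no point changes its cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its current members.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Toy data (hypothetical): two well-separated groups of three points each.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(data, k=2)
```

On this toy data the two groups end up in different clusters; which group receives label 0 depends on the random seeds.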

Stopping/convergence criterion

1. no (or minimum) re-assignments of data points to different clusters,

2. no (or minimum) change of centroids, or

3. minimum decrease in the sum of squared error (SSE),

$$SSE = \sum_{j=1}^{k} \sum_{\mathbf{x} \in C_j} dist(\mathbf{x}, \mathbf{m}_j)^2$$

– Cj is the jth cluster, mj is the centroid of cluster Cj (the mean vector of all the data points in Cj), and dist(x, mj) is the distance between data point x and centroid mj.
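As a small worked example, the SSE can be computed directly from a cluster assignment. The helper name `sse`, the use of Euclidean distance for dist, and the numbers below are assumptions chosen for illustration.

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared Euclidean distances of each point to its cluster centroid."""
    diffs = points - centroids[labels]  # x - m_j, with j the cluster of x
    return float(np.sum(diffs ** 2))

# Hypothetical one-dimensional layout embedded in 2-D.
points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [14.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.0, 0.0], [12.0, 0.0]])
total = sse(points, labels, centroids)  # 1 + 1 + 4 + 4 = 10
```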


Strengths of k-means

• Strengths:

– Simple: easy to understand and to implement

– Efficient: Time complexity: O(tkn),

where n is the number of data points,

k is the number of clusters, and

t is the number of iterations.

– Since both k and t are usually small, k-means is considered a linear algorithm (linear in the number of data points n).

• K-means is the most popular clustering algorithm.

• Note that: it terminates at a local optimum if SSE is

used. The global optimum is hard to find due to

complexity.

Weaknesses of k-means

• The algorithm is only applicable if the mean is defined.

– For categorical data, use k-modes: the centroid is represented by the most frequent values.

• The user needs to specify k.

• The algorithm is sensitive to outliers

– Outliers are data points that are very far away from other

data points.

– Outliers could be errors in the data recording or some

special data points with very different values.


Weaknesses of k-means: problems with outliers

Weaknesses of k-means: outliers

• One method is to remove some data points in the

clustering process that are much further away from

the centroids than other data points.

– To be safe, we may want to monitor these possible

outliers over a few iterations and then decide to remove

them.

• Another method is to perform random sampling.

Since in sampling we only choose a small subset of

the data points, the chance of selecting an outlier

is very small.

– Assign the rest of the data points to the clusters by

distance or similarity comparison, or classification
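The first method (dropping points that lie much further from their centroid than the others) could look roughly like the sketch below. The function name `flag_outliers` and the rule "distance greater than 3 times the median distance" are illustrative assumptions; the slides do not prescribe a threshold.

```python
import numpy as np

def flag_outliers(points, labels, centroids, factor=3.0):
    """Return a boolean mask that is True for points unusually far from their centroid.

    The rule 'distance > factor * median distance' is an assumption for this example.
    """
    dists = np.linalg.norm(points - centroids[labels], axis=1)
    return dists > factor * np.median(dists)

# Hypothetical data: four points around (0.5, 0.5) and one point far away.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]])
labels = np.zeros(len(points), dtype=int)
centroids = np.array([[0.5, 0.5]])  # centroid taken as given for the example
mask = flag_outliers(points, labels, centroids)
```

In line with the advice above, such a mask would be monitored over a few iterations before any point is actually removed.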


Weaknesses of k-means (cont …)

• The algorithm is sensitive to initial seeds.

• If we use different seeds, we can get good results

• There are some methods to help choose good seeds

Weaknesses of k-means (cont …)


• The k-means algorithm is not suitable for

discovering clusters that are not hyper-ellipsoids

(or hyper-spheres).


Weaknesses of k-means (cont…)

• The membership of a point in a single cluster is not always clear

-> Fuzzy clustering can help with that


Boolean Logic

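• In Boolean logic, an object is either a member of a set or is not, i.e. its membership function can be expressed as

$$\mu_A(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A \end{cases}$$

• In Boolean logic, a set and its complement do not overlap and together cover the universe U:

$$A \cap {\sim}A = \emptyset, \qquad A \cup {\sim}A = U$$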

• A set is a collection of objects that share a common property

• A Boolean set is also referred to as a crisp set
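A crisp membership function is easy to check on a small example. The sketch below uses a hypothetical universe of the integers 0 to 9 and a hypothetical set A; it confirms that no element belongs to both A and its complement, and that every element belongs to one of them.

```python
def crisp_membership(x, members):
    """Return 1 if x is in the crisp set, else 0."""
    return 1 if x in members else 0

universe = set(range(10))  # hypothetical universe U
A = {2, 3, 5, 7}           # hypothetical crisp set
not_A = universe - A       # complement of A within U

# Membership in an intersection is the min, in a union the max, of the memberships.
in_both = [min(crisp_membership(x, A), crisp_membership(x, not_A)) for x in universe]
in_either = [max(crisp_membership(x, A), crisp_membership(x, not_A)) for x in universe]
```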

Fuzzy Logic

• Logic based on continuous variables

• Provides the ability to represent intrinsic ambiguity

• Fuzzification: the process of finding the membership

value of a (scalar) number in a fuzzy set

• Defuzzification: the process of converting the outcome

of a fuzzy set to a single representative number


Fuzzy Sets

• The membership function can take values other than just 0 and 1

– 0 indicates no membership

– 1 indicates complete set membership

– Values strictly between 0 and 1 indicate partial membership

• Superset of Boolean Logic

• Fuzzy set has three principal components

– Degree of membership

– Possible Domain values

– Membership function: a continuous function that

connects a domain value to its degree of membership in

the set

Fuzzy Numbers

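[Figure: membership function of a fuzzy number. Vertical axis: grade of membership μ(x), from 0 to 1.0. Horizontal axis: domain, with the support set marked.]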

• Fuzzy number: a fuzzy set representing an approximation to a number


Fuzzy number ‘About 20’

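[Figure: fuzzy number 'About 20'. Vertical axis: grade of membership μ(x), from 0 to 1.0. Horizontal axis: domain values from 14 to 26. The spread of the curve is labelled 'Expectancy'.]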

• Expectancy e: degree of spread

• e=0: normal scalar value
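A fuzzy number such as 'About 20' can be sketched as a membership function centred on 20. The triangular shape, the function name `about`, and the reading of the expectancy e as the half-width of the triangle are assumptions for this example; the slides do not fix the shape.

```python
def about(center, expectancy):
    """Build a triangular membership function for the fuzzy number 'about center'.

    expectancy is treated here as the half-width of the triangle (an assumption).
    """
    def membership(x):
        if expectancy == 0:
            # With e = 0 the fuzzy number degenerates to an ordinary scalar.
            return 1.0 if x == center else 0.0
        return max(0.0, 1.0 - abs(x - center) / expectancy)
    return membership

about_20 = about(20, 6)
grades = [about_20(x) for x in (14, 17, 20, 23, 26)]  # [0.0, 0.5, 1.0, 0.5, 0.0]
exactly_20 = about(20, 0)
```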

Other fuzzy sets

[Figure, two panels. Fuzzy set of tall men: grade of membership μ(x), from 0 to 1.0, against height in ft (4.5 to 7.5). Fuzzy set for long project: grade of membership μ(x), from 0 to 1.0, against project duration in weeks (4 to 16).]


Collection of Fuzzy Sets

[Figure: fuzzy sets Child, Teen, Young adult, Middle aged and Senior over client age (in years, 10 to 70). Vertical axis: grade of membership μ(x), from 0 to 1.0.]

• Each underlying fuzzy set defines a portion of the variable's domain

• A portion is not necessarily uniquely defined

Hedges: Fuzzy set transformers

• A hedge acts on a fuzzy set the same way an adjective

acts on a noun

– Increase or decrease the expectancy of a fuzzy number

– Intensify or dilute the membership of a fuzzy set

– Change the shape of a fuzzy set through contrast or

restriction


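| Hedge | Mathematical expression |
|---|---|
| A little | $[\mu_A(x)]^{1.3}$ |
| Slightly | $[\mu_A(x)]^{1.7}$ |
| Very | $[\mu_A(x)]^{2}$ |
| Extremely | $[\mu_A(x)]^{3}$ |
| Very very | $[\mu_A(x)]^{4}$ |
| More or less | $\sqrt{\mu_A(x)}$ |
| Indeed | $2[\mu_A(x)]^{2}$ if $0 \le \mu_A \le 0.5$; $1 - 2[1 - \mu_A(x)]^{2}$ if $0.5 < \mu_A \le 1$ |
| Somewhat | $\sqrt{\mu_A(x)}$ |

[Graphical representation column: plots not reproduced.]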
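The hedge expressions in the table can be applied directly to a membership grade. The sketch below is illustrative only: the dictionary name `HEDGES` and the example grade are hypothetical, and the square root used for "more or less" and "somewhat" is an assumption, since the operator is not legible in the extracted slide text.

```python
import math

# Each hedge maps a membership grade in [0, 1] to a modified grade in [0, 1].
HEDGES = {
    "a little": lambda mu: mu ** 1.3,
    "slightly": lambda mu: mu ** 1.7,
    "very": lambda mu: mu ** 2,
    "extremely": lambda mu: mu ** 3,
    "very very": lambda mu: mu ** 4,
    # Square root as the dilution hedge is an assumption (see the lead-in).
    "more or less": lambda mu: math.sqrt(mu),
    "somewhat": lambda mu: math.sqrt(mu),
    # 'indeed' sharpens the contrast around the grade 0.5.
    "indeed": lambda mu: 2 * mu ** 2 if mu <= 0.5 else 1 - 2 * (1 - mu) ** 2,
}

tall = 0.8  # hypothetical grade of membership in 'tall men'
very_tall = HEDGES["very"](tall)                  # intensified: lower grade
more_or_less_tall = HEDGES["more or less"](tall)  # diluted: higher grade
```

Powers above 1 intensify the set (grades shrink), while the root dilutes it (grades grow), which matches the "intensify or dilute" role of hedges described earlier.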


Alpha Cut Threshold

• An Alpha cut threshold defines a minimum truth

membership level for a fuzzy set

[Figure: fuzzy set for long project, grade of membership μ(x) from 0 to 1.0 against project duration in weeks (4 to 16), with the alpha cut threshold µ[.15] marked.]
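An alpha cut can be sketched as a simple filter on membership grades. The grades for 'long project' below are invented for the example, and the helper name `alpha_cut` is hypothetical; only the threshold 0.15 is taken from the slide.

```python
def alpha_cut(memberships, alpha):
    """Keep only the domain values whose grade reaches the threshold alpha."""
    return {x: mu for x, mu in memberships.items() if mu >= alpha}

# Hypothetical grades for 'long project', keyed by duration in weeks.
long_project = {4: 0.0, 6: 0.05, 8: 0.1, 10: 0.3, 12: 0.6, 14: 0.9, 16: 1.0}
kept = alpha_cut(long_project, 0.15)  # durations with a grade of at least 0.15
```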

Fuzzy AND Operator

• Example: region produced by proposition of Young

Adult and Middle Aged

• Mathematical representation

$$\mu_T(x_i) = \min(\mu_A(x_i), \mu_B(x_i))$$

[Figure: fuzzy sets Young adult and Middle aged over client age (in years, 10 to 70). Vertical axis: grade of membership μ(x), from 0 to 1.0.]


Fuzzy OR Operator

• Example: region produced by proposition of Young

Adult or Middle Aged

• Mathematical representation

$$\mu_T(x_i) = \max(\mu_A(x_i), \mu_B(x_i))$$

[Figure: fuzzy sets Young adult and Middle aged over client age (in years, 10 to 70). Vertical axis: grade of membership μ(x), from 0 to 1.0.]

Fuzzy NOT Operator

• Example: region produced by proposition of NOT Middle

Aged

• Mathematical representation

$$\mu_T(x_i) = 1 - \mu_A(x_i)$$

[Figure: fuzzy set Middle aged over client age (in years, 10 to 70). Vertical axis: grade of membership μ(x), from 0 to 1.0.]
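The three fuzzy operators (AND as minimum, OR as maximum, NOT as complement) can be tried out on two overlapping sets. The trapezoidal shapes and their breakpoints below are hypothetical stand-ins for 'Young adult' and 'Middle aged'; they are not read off the slides.

```python
def trapezoid(a, b, c, d):
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return mu

# Hypothetical shapes; the breakpoints are assumptions for this example.
young_adult = trapezoid(20, 25, 35, 45)
middle_aged = trapezoid(35, 45, 55, 65)

def fuzzy_and(mu_a, mu_b):
    return lambda x: min(mu_a(x), mu_b(x))

def fuzzy_or(mu_a, mu_b):
    return lambda x: max(mu_a(x), mu_b(x))

def fuzzy_not(mu_a):
    return lambda x: 1.0 - mu_a(x)

age = 38  # young adult to degree 0.7, middle aged to degree 0.3
both = fuzzy_and(young_adult, middle_aged)(age)
either = fuzzy_or(young_adult, middle_aged)(age)
not_middle = fuzzy_not(middle_aged)(age)
```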


Fuzzy Clustering: Motivation

• Crisp clustering allows each data point to be a member of exactly one cluster

• Fuzzy clustering assigns each data point a membership value for each cluster

– The membership value might be zero for some points

Fuzzy Clustering Concepts

• Each data point will have an associated degree of

membership for each cluster center in the range of

[0,1]


Fuzzy clustering concepts

• Fuzzification parameter m

– m = 1: clusters do not overlap

– m > 1: clusters overlap

Fuzzy c-means clustering

• Extension of the k-means algorithm

• Two steps:

– Calculation of cluster centers

– Assignment of points to the clusters with varying degrees of membership

• Constraint on the fuzzy membership function associated with each point:

$$\sum_{j=1}^{p} \mu_j(x_i) = 1, \quad i = 1, \dots, n$$

– p: number of clusters

– n: number of data points

– xi: ith data point

– µj(): function returning the membership value of xi in the jth cluster


Fuzzy c-means clustering

• Minimization of standard loss function

$$\sum_{k=1}^{p} \sum_{i=1}^{n} \mu_k(x_i)^m \, \lVert x_i - c_k \rVert^2$$

• Basic algorithm

Initialize p = number of clusters
           m = fuzzification parameter
           cj = cluster centers
Repeat
    for all data points: calculate distance dji to all centers cj
    for i=1 to n: update µj(xi) using cj
    for j=1 to p: update cj using current µj(xi)
Until cj estimates stabilize

Fuzzy c-means clustering

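• With

$$\mu_j(x_i) = \frac{\left(\frac{1}{d_{ji}}\right)^{\frac{1}{m-1}}}{\sum_{k=1}^{p} \left(\frac{1}{d_{ki}}\right)^{\frac{1}{m-1}}}$$

dji being the distance of xi to cluster center cj (e.g. Euclidean distance)

• and

$$c_j = \frac{\sum_{i} \mu_j(x_i)^m \, x_i}{\sum_{i} \mu_j(x_i)^m}$$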
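The membership and center updates can be combined into a short NumPy sketch of the basic fuzzy c-means loop. Several choices here are assumptions rather than part of the slides: the function name `fuzzy_c_means`, passing the initial centers as an argument, the stopping tolerance, and in particular taking d as the squared Euclidean distance so that the updates match the squared norm in the loss function.

```python
import numpy as np

def fuzzy_c_means(points, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means sketch; returns (centers, memberships).

    memberships[j, i] is the membership of point i in cluster j; m must be > 1.
    """
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(max_iter):
        # d[j, i]: squared Euclidean distance of point i to center j (assumption).
        d = ((points[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        inv = (1.0 / d) ** (1.0 / (m - 1.0))
        memberships = inv / inv.sum(axis=0, keepdims=True)  # sums to 1 per point
        # Update each center as the mean of the points weighted by membership^m.
        weights = memberships ** m
        new_centers = (weights @ points) / weights.sum(axis=1, keepdims=True)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:  # center estimates have stabilized
            break
    return centers, memberships

# Toy data (hypothetical): two well-separated groups of three points each.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
# Hypothetical starting centers, one near each group.
centers, memberships = fuzzy_c_means(data, init_centers=[[0.0, 0.0], [10.0, 10.0]])
hard_labels = memberships.argmax(axis=0)  # crisp labels, if needed
```

Each column of `memberships` sums to 1, which is the constraint on µj(xi) stated earlier; taking the arg-max per point recovers a crisp clustering when one is needed.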


Fuzzy c-means clustering

• Problem with c-means clustering:

– Outlier data points still have to be assigned to a cluster

Fuzzy Adaptive Clustering

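• Alternative formulation for the constraint on membership

$$\sum_{j=1}^{p} \sum_{i=1}^{n} \mu_j(x_i) = n$$

– The membership values summed over all sample points and all clusters equal n

– An individual point could have a total membership value of < 1

=>

$$\mu_j(x_i) = \frac{n \left(\frac{1}{d_{ji}}\right)^{\frac{1}{m-1}}}{\sum_{k=1}^{p} \sum_{z=1}^{n} \left(\frac{1}{d_{kz}}\right)^{\frac{1}{m-1}}}$$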
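The effect of the relaxed constraint can be seen by evaluating the membership formula once for fixed centers. The sketch below is illustrative only: the helper name `fac_memberships`, the data, the centers, and the use of the squared Euclidean distance for d are all assumptions.

```python
import numpy as np

def fac_memberships(points, centers, m=2.0):
    """Memberships normalized so that they sum to n over all points and clusters."""
    n = len(points)
    # d[j, i]: squared Euclidean distance of point i to center j (assumption).
    d = ((points[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    d = np.maximum(d, 1e-12)  # guard against division by zero
    inv = (1.0 / d) ** (1.0 / (m - 1.0))
    return n * inv / inv.sum()

# Hypothetical data: two pairs of nearby points plus one far-away point.
points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0], [50.0, 50.0]])
centers = np.array([[0.5, 0.0], [10.5, 0.0]])  # centers taken as given
u = fac_memberships(points, centers)
outlier_total = u[:, 4].sum()  # total membership of the far-away point
```

Here the far-away point ends up with a total membership well below 1, while the memberships over all points and clusters still add up to n.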