Machine Learning
Lecture # 11: Clustering
Cluster Analysis
What is Cluster Analysis?
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
What is Cluster Analysis?
• Cluster analysis is an important human activity
• Early in childhood, we learn how to distinguish between cats and dogs
• Unsupervised learning: no predefined classes
• Typical applications
– As a stand-alone tool to get insight into data distribution
– As a preprocessing step for other algorithms
Clustering
• Hard vs. Soft
– Hard: each object can belong to only a single cluster
– Soft: an object can belong to multiple clusters
Clustering
• Flat vs. Hierarchical
– Flat: clusters form a single, unnested partition of the data
– Hierarchical: clusters form a tree
• Agglomerative
• Divisive
Clustering: Rich Applications and Multidisciplinary Efforts
• Pattern Recognition
• Spatial Data Analysis
– Create thematic maps in GIS by clustering feature spaces
– Detect spatial clusters and use them in other spatial mining tasks
• Image Processing
• Economic Science (especially market research)
• WWW
– Document classification
– Cluster Weblog data to discover groups of similar access patterns
Quality: What Is Good Clustering?
• A good clustering method will produce high-quality clusters with
– high intra-class similarity (objects are similar to one another within the same cluster)
– low inter-class similarity (objects are dissimilar to the objects in other clusters)
What is Cluster Analysis?
• Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
(Figure: two groups of points; inter-cluster distances are maximized while intra-cluster distances are minimized.)
Similarity and Dissimilarity Between Objects
• Distances are normally used to measure the similarity or dissimilarity between two data objects
• Some popular ones include the Minkowski distance:

$$d(i,j) = \left( |x_{i1}-x_{j1}|^q + |x_{i2}-x_{j2}|^q + \cdots + |x_{ip}-x_{jp}|^q \right)^{1/q}$$

where i = (x_{i1}, x_{i2}, …, x_{ip}) and j = (x_{j1}, x_{j2}, …, x_{jp}) are two p-dimensional data objects, and q is a positive integer
• If q = 1, d is the Manhattan distance:

$$d(i,j) = |x_{i1}-x_{j1}| + |x_{i2}-x_{j2}| + \cdots + |x_{ip}-x_{jp}|$$
Similarity and Dissimilarity Between Objects (Cont.)
• If q = 2, d is the Euclidean distance:

$$d(i,j) = \sqrt{ |x_{i1}-x_{j1}|^2 + |x_{i2}-x_{j2}|^2 + \cdots + |x_{ip}-x_{jp}|^2 }$$

• Also, one can use weighted distance, parametric Pearson correlation, or other dissimilarity measures
Major Clustering Approaches
• Partitioning approach:
– Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
– Typical methods: k-means, k-medoids, CLARANS
• Hierarchical approach:
– Create a hierarchical decomposition of the set of data (or objects) using some criterion
– Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
• Density-based approach:
– Based on connectivity and density functions
– Typical methods: DBSCAN, OPTICS, DenClue
Clustering Approaches
1. Partitioning Methods
2. Hierarchical Methods
3. Density-Based Methods
Partitioning Algorithms: Basic Concept
• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
• k-means and k-medoids algorithms
• k-means (MacQueen'67): each cluster is represented by the center of the cluster
• k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw'87): each cluster is represented by one of the objects in the cluster
The K-Means Clustering Method
• Given k, the k-means algorithm is implemented in four steps (a minimal sketch follows below):
1. Partition the objects into k nonempty subsets
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)
3. Assign each object to the cluster with the nearest seed point
4. Go back to Step 2; stop when no new assignments are made
The K-Means Clustering Method
(Figure, K = 2: arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign and repeat until the assignments stabilize.)
Example
• Run k-means clustering with 3 clusters (initial centroids: 3, 16, 25) for at least 2 iterations on the data: 2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30
Example
• Centroids:
3 → cluster {2, 3, 4, 7, 9}, new centroid: 5
16 → cluster {10, 11, 12, 16, 18, 19}, new centroid: 14.33
25 → cluster {23, 24, 25, 30}, new centroid: 25.5
Example
• Centroids:
5 → cluster {2, 3, 4, 7, 9}, new centroid: 5
14.33 → cluster {10, 11, 12, 16, 18, 19}, new centroid: 14.33
25.5 → cluster {23, 24, 25, 30}, new centroid: 25.5
(The assignments no longer change, so the algorithm has converged.)
In class Practice
• Run K-means clustering with 3 clusters (initial centroids: 3, 12, 19) for at least 2 iterations
Hierarchical Clustering
• Two main types of hierarchical clustering:
– Agglomerative:
• Start with the points as individual clusters
• At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left
(Matlab, Statistics Toolbox: clusterdata performs all these steps: pdist, linkage, cluster)
– Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster until each cluster contains a single point (or there are k clusters)
• Traditional hierarchical algorithms use a similarity or distance matrix
– Merge or split one cluster at a time
– Image segmentation mostly uses simultaneous merge/split
Hierarchical clustering
• Agglomerative (Bottom-up):
– Compute all pairwise pattern-pattern similarity coefficients
– Place each of the n patterns into a class of its own
– Merge the two most similar clusters into one
• Replace the two clusters with the new cluster
• Recompute inter-cluster similarity scores w.r.t. the new cluster
– Repeat the above step until there are k clusters left (k can be 1)
Hierarchical clustering
• Agglomerative (Bottom-up)
(Figures: the merge sequence over successive iterations 1, 2, 3, 4, 5, … until k clusters remain.)
Hierarchical clustering
• Divisive (Top-down)
– Start at the top with all patterns in one cluster
– The cluster is split using a flat clustering algorithm
– This procedure is applied recursively until each pattern is in its own singleton cluster
(Figure: divisive clustering, splitting top-down from one all-inclusive cluster toward singletons.)
Hierarchical Clustering: The Algorithm
• Hierarchical clustering takes as input a set of points
• It creates a tree in which the points are leaves and the internal nodes reveal the similarity structure of the points.
– The tree is often called a "dendrogram."
• The method is summarized below (a SciPy sketch follows):

Place all points into their own clusters
While there is more than one cluster, do
    Merge the closest pair of clusters

The behavior of the algorithm depends on how "closest pair of clusters" is defined
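This loop is available off the shelf in SciPy; a sketch (`method='single'` corresponds to the single-link definition of "closest pair" discussed later):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).random((6, 2))        # six example 2-D points
Z = linkage(X, method='single')                    # repeatedly merge the closest pair
print(Z)                                           # each row: [cluster_i, cluster_j, distance, size]
print(fcluster(Z, t=2, criterion='maxclust'))      # cut the dendrogram into k = 2 clusters
```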
Hierarchical Clustering: Example
(Figure: six points A–F and the dendrogram produced by clustering them.)
This example illustrates single-link clustering in Euclidean space on 6 points.
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
– A tree like diagram that records the sequences of merges or splits
(Figure: six points and the corresponding dendrogram; the vertical axis records the merge heights, roughly 0.05–0.2, in the order the merges occurred.)
Strengths of Hierarchical Clustering
• Do not have to assume any particular number of clusters
– Any desired number of clusters can be obtained by 'cutting' the dendrogram at the proper level
Hierarchical Clustering: Merging Clusters
Single Link: the distance between two clusters is the distance between their closest points. Also called "neighbor joining."
Average Link: the distance between two clusters is the average of the pairwise distances between their points; a closely related variant (centroid linkage) uses the distance between the cluster centroids.
Complete Link: the distance between two clusters is the distance between their farthest pair of points.
How to Define Inter-Cluster Similarity

(Figure: two groups of points p1–p5 and their proximity matrix.)

• MIN
• MAX
• Group Average
• Distance Between Centroids
• Other methods driven by an objective function
– Ward's Method uses squared error
An example
Let us consider a gene measured in a set of 5 experiments: A, B, C, D, and E. The values measured in the 5 experiments are:
A=100 B=200 C=500 D=900 E=1100
We will construct the hierarchical clustering of these values using Euclidean distance, centroid linkage, and an agglomerative approach.
An example
SOLUTION:
• The closest two values are 100 and 200 => the centroid of these two values is 150.
• Now we are clustering the values: 150, 500, 900, 1100
• The closest two values are 900 and 1100 => the centroid of these two values is 1000.
• The remaining values to be joined are: 150, 500, 1000.
• The closest two values are 150 and 500 => the centroid of these two values is 325.
• Finally, the two resulting subtrees are joined at the root of the tree. (A SciPy check follows below.)
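The same merge sequence can be checked in SciPy; note that the centroid-of-centroids arithmetic above (150 and 500 averaging to 325) corresponds to SciPy's 'median' (WPGMC) linkage, which replaces a merged pair by the midpoint of the two cluster centroids:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# expression values of the gene in experiments A..E
X = np.array([[100.0], [200.0], [500.0], [900.0], [1100.0]])
Z = linkage(X, method='median')
print(Z)   # merge order: (A,B), then (D,E), then ((A,B),C), then the root
```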
An example: two hierarchical clusterings of the expression values of a single gene measured in 5 experiments.

(Figure: two dendrograms over the leaves A=100, B=200, C=500, D=900, E=1100, drawn with different leaf orderings.)

The dendrograms are identical: both diagrams show that
• A is most similar to B
• C is most similar to the group (A, B)
• D is most similar to E

In the left dendrogram, A and E are plotted far from each other; in the right dendrogram, A and E are immediate neighbors.

THE PROXIMITY IN A HIERARCHICAL CLUSTERING DOES NOT NECESSARILY CORRESPOND TO SIMILARITY
What Is the Problem of the K-Means Method?
• The k-means algorithm is sensitive to outliers!
– An object with an extremely large value may substantially distort the distribution of the data.
• K-Medoids: instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster. (A tiny illustration follows below.)
(Figure: two 2-D point sets on a 0–10 grid, contrasting a mean-based center with a medoid-based center.)
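A tiny illustration of that sensitivity on made-up 1-D data — the mean is dragged toward the outlier while the medoid stays with the bulk of the data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 100.0])     # one extreme outlier
mean = x.mean()                           # 26.5, far from most of the data
# the medoid is the data point with the smallest total distance to all others
medoid = x[np.argmin(np.abs(x[:, None] - x[None, :]).sum(axis=1))]
print(mean, medoid)                       # 26.5 vs. 2.0
```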
Limitations of K-means: Differing Sizes
Original Points K-means (3 Clusters)
Limitations of K-means: Differing Density
Original Points K-means (3 Clusters)
Limitations of K-means: Non-globular Shapes
Original Points K-means (2 Clusters)
The K-Medoids Clustering Method
• Find representative objects, called medoids, in clusters
• Medoids are located in the center of the clusters.
– Given data points, how do we find the medoid?
Categorical Values
• Handling categorical data: k-modes (Huang'98)
– Replaces the means of clusters with modes
• Mode of an attribute: its most frequent value
• Mode of instances: for an attribute A, mode(A) = the most frequent value of A
• k-modes is the k-means analogue for categorical data (a sketch follows below)
– Uses a frequency-based method to update the modes of clusters
– For a mixture of categorical and numerical data: the k-prototype method
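A small sketch of the mode-based update (the categorical records are hypothetical):

```python
from collections import Counter

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]

# the k-modes "center": the most frequent value of each attribute
mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))
print(mode)                                # ('red', 'small')

# dissimilarity = number of mismatching attributes (simple matching)
def mismatch(a, b):
    return sum(x != y for x, y in zip(a, b))

print(mismatch(("blue", "large"), mode))   # 2
```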
K-medoids example

Data points:
X1 (2, 6)
X2 (3, 4)
X3 (3, 8)
X4 (4, 7)
X5 (6, 2)
X6 (6, 4)
X7 (7, 3)
X8 (7, 4)
X9 (8, 5)
X10 (7, 6)
• Initialize k medoids
• Let us assume c1 = (3, 4) and c2 = (7, 4)
• Calculate distances (Manhattan distance here) so as to associate each data object with its nearest medoid.
Costs (distances) to c1 = (3, 4):
X1 (2, 6): 3
X3 (3, 8): 4
X4 (4, 7): 4
X5 (6, 2): 5
X6 (6, 4): 3
X7 (7, 3): 5
X9 (8, 5): 6
X10 (7, 6): 6

Costs (distances) to c2 = (7, 4):
X1 (2, 6): 7
X3 (3, 8): 8
X4 (4, 7): 6
X5 (6, 2): 3
X6 (6, 4): 1
X7 (7, 3): 1
X9 (8, 5): 2
X10 (7, 6): 2

Cluster1 = {(3,4), (2,6), (3,8), (4,7)}
Cluster2 = {(7,4), (6,2), (6,4), (7,3), (8,5), (7,6)}
• Select one of the non-medoids O′. Let us assume O′ = (7, 3). Now the medoids are c1 (3, 4) and O′ (7, 3).

Costs (distances) to c1 = (3, 4):
X1 (2, 6): 3
X3 (3, 8): 4
X4 (4, 7): 4
X5 (6, 2): 5
X6 (6, 4): 3
X8 (7, 4): 4
X9 (8, 5): 6
X10 (7, 6): 6

Costs (distances) to O′ = (7, 3):
X1 (2, 6): 8
X3 (3, 8): 9
X4 (4, 7): 7
X5 (6, 2): 2
X6 (6, 4): 2
X8 (7, 4): 1
X9 (8, 5): 3
X10 (7, 6): 3

• Do not change the medoid, since S > 0: the total cost rises from 20 to 22, so S = 22 − 20 = 2 (see the sketch below).
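A short sketch that reproduces the swap test above (Manhattan distance, the two medoid configurations from the tables):

```python
import numpy as np

X = np.array([[2, 6], [3, 4], [3, 8], [4, 7], [6, 2],
              [6, 4], [7, 3], [7, 4], [8, 5], [7, 6]])

def total_cost(medoids):
    # assign every point to its nearest medoid under Manhattan distance
    d = np.abs(X[:, None, :] - np.array(medoids)[None, :, :]).sum(axis=2)
    return d.min(axis=1).sum()

before = total_cost([(3, 4), (7, 4)])     # 20
after = total_cost([(3, 4), (7, 3)])      # 22
print(after - before)                     # S = 2 > 0, so the swap is rejected
```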
Fuzzy C-means Clustering
• Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters.
• This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition.
Fuzzy C-means Clustering
Tutorial: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
Fuzzy C-means Clustering

• For example: we have initial centroids 3 & 11 (with m = 2). For a point x_i, the membership in the cluster with centroid c_j is

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{|x_i - c_j|}{|x_i - c_k|} \right)^{\frac{2}{m-1}}}$$

• For node 2 (1st element):

$$u_{11} = \frac{1}{\left(\frac{|2-3|}{|2-3|}\right)^2 + \left(\frac{|2-3|}{|2-11|}\right)^2} = \frac{1}{1 + \frac{1}{81}} = \frac{81}{82} \approx 98.78\%$$

the membership of the first node in the first cluster

$$u_{12} = \frac{1}{\left(\frac{|2-11|}{|2-3|}\right)^2 + \left(\frac{|2-11|}{|2-11|}\right)^2} = \frac{1}{81 + 1} = \frac{1}{82} \approx 1.22\%$$

the membership of the first node in the second cluster
Fuzzy C-means Clustering
• For example: we have initial centroids 3 & 11 (with m = 2)
• For node 3 (2nd element), which coincides with centroid 3:
u21 = 100% — the membership of the second node in the first cluster
u22 = 0% — the membership of the second node in the second cluster
Fuzzy C-means Clustering

• For example: we have initial centroids 3 & 11 (with m = 2)
• For node 4 (3rd element):

$$u_{31} = \frac{1}{\left(\frac{|4-3|}{|4-3|}\right)^2 + \left(\frac{|4-3|}{|4-11|}\right)^2} = \frac{1}{1 + \frac{1}{49}} = \frac{49}{50} = 98\%$$

the membership of the third node in the first cluster

$$u_{32} = \frac{1}{\left(\frac{|4-11|}{|4-3|}\right)^2 + \left(\frac{|4-11|}{|4-11|}\right)^2} = \frac{1}{49 + 1} = 2\%$$

the membership of the third node in the second cluster
Fuzzy C-means Clustering

• For example: we have initial centroids 3 & 11 (with m = 2)
• For node 7 (4th element):

$$u_{41} = \frac{1}{\left(\frac{|7-3|}{|7-3|}\right)^2 + \left(\frac{|7-3|}{|7-11|}\right)^2} = \frac{1}{1 + 1} = 50\%$$

the membership of the fourth node in the first cluster

$$u_{42} = \frac{1}{\left(\frac{|7-11|}{|7-3|}\right)^2 + \left(\frac{|7-11|}{|7-11|}\right)^2} = \frac{1}{1 + 1} = 50\%$$

the membership of the fourth node in the second cluster
Fuzzy C-means Clustering

• The new centroid is the membership-weighted mean of the points (weights u^m):

$$c_1 = \frac{(98.78\%)^2 \cdot 2 + (100\%)^2 \cdot 3 + (98\%)^2 \cdot 4 + (50\%)^2 \cdot 7 + \cdots}{(98.78\%)^2 + (100\%)^2 + (98\%)^2 + (50\%)^2 + \cdots}$$
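One full FCM iteration on these numbers, as a NumPy sketch (the fcm_step helper is ours, not a library function):

```python
import numpy as np

def fcm_step(x, c, m=2):
    """One fuzzy c-means iteration on 1-D data: update memberships, then centroids."""
    d = np.abs(x[:, None] - c[None, :])          # point-to-centroid distances
    d = np.fmax(d, 1e-12)                        # guard: a point sitting exactly on a centroid
    # u[i, j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
    u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    c_new = (u ** m).T @ x / (u ** m).sum(axis=0)   # membership-weighted means
    return u, c_new

x = np.array([2.0, 3.0, 4.0, 7.0])               # the nodes worked above
u, c = fcm_step(x, np.array([3.0, 11.0]))
print(np.round(u * 100, 2))   # rows ≈ [98.78, 1.22], [100, 0], [98, 2], [50, 50]
print(c)                      # the updated centroids
```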
FCM Soft Clustering
Gaussian Mixture Model
(Figures: two Gaussian component densities p(x) over x ∈ [−5, 10], and the mixture-model density they combine into.)
The EM Algorithm
• Dempster, Laird, and Rubin
– A general framework for likelihood-based parameter estimation with missing data:
• start with initial guesses of the parameters
• E step: estimate memberships given the parameters
• M step: estimate the parameters given the memberships
• repeat until convergence
– converges to a (local) maximum of the likelihood
– the E step and M step are often computationally simple
– generalizes to maximum a posteriori estimation (with priors)
(A scikit-learn sketch follows below.)
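In practice the whole E/M loop is available off the shelf; a sketch with scikit-learn on synthetic two-component data (not the anemia dataset shown next):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),     # component 1
               rng.normal(4, 1, (200, 2))])    # component 2

gm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM under the hood
print(gm.means_)                  # estimated component means
print(gm.predict_proba(X[:3]))    # soft memberships from the E step
```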
(Figures: anemia patients and controls plotted by red blood cell volume vs. red blood cell hemoglobin concentration, with the Gaussian-mixture fit shown at EM iterations 1, 3, 5, 10, 15, and 25.)
Gaussian Mixture
• The Gaussian mixture architecture estimates probability density functions (PDFs) for each class, and then performs classification based on Bayes' rule:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

where P(X | Ci) is the PDF of class i evaluated at X, P(Ci) is the prior probability of class i, and P(X) is the overall PDF evaluated at X.
Gaussian Mixture
• Unlike the unimodal Gaussian architecture, which assumes P(X | Cj) to be in the form of a single Gaussian, the Gaussian mixture model estimates P(X | Cj) as a weighted average of multiple Gaussians:

$$P(X \mid C_j) = \sum_{k=1}^{N_c} w_k G_k$$

where w_k is the weight of the k-th Gaussian G_k and the weights sum to one. One such PDF model is produced for each class.
Gaussian Mixture
• Each Gaussian component is defined as:

$$G_k = \frac{1}{(2\pi)^{n/2}\, |V_k|^{1/2}}\; e^{-\frac{1}{2}(X - M_k)^T V_k^{-1} (X - M_k)}$$

where M_k is the mean of the Gaussian and V_k is the covariance matrix of the Gaussian.
Gaussian Mixture
• Free parameters of the Gaussian mixture model consist of the means and covariance matrices of the Gaussian components and the weights indicating the contribution of each Gaussian to the approximation of P(X | Cj).
Composition of Gaussian Mixture
(Figure: Class 1 modeled as a composition of five weighted Gaussians G1,w1 … G5,w5.)

$$P(C_j \mid X) = \frac{P(X \mid C_j)\, P(C_j)}{P(X)} \qquad P(X \mid C_j) = \sum_{k=1}^{N_c} w_k G_k$$

$$G_i = p(X \mid G_i) = \frac{1}{(2\pi)^{d/2}\, |V_i|^{1/2}}\; e^{-\frac{1}{2}(X - \mu_i)^T V_i^{-1} (X - \mu_i)}$$

Variables: μi, Vi, wk.
We use the EM (estimate-maximize) algorithm to approximate these variables.
Gaussian Mixture
• These parameters are tuned using a complex iterative procedure called the estimate-maximize (EM) algorithm, which aims at maximizing the likelihood of the training set under the estimated PDF.
Gaussian Mixture Training Flow Chart (1)
Initialize the initial Gaussian means μi, i = 1, …, G using the k-means clustering algorithm. Initialize the covariance matrices Vi to the distance to the nearest cluster. Initialize the weights πi = 1/G so that all Gaussians are equally likely.

Present each pattern X of the training set and model each of the classes K as a weighted sum of Gaussians:

$$p_s(X) = \sum_{i=1}^{G} \pi_i\, p(X \mid G_i)$$

where G is the number of Gaussians, the πi's are the weights, and

$$p(X \mid G_i) = \frac{1}{(2\pi)^{d/2}\, |V_i|^{1/2}}\; e^{-\frac{1}{2}(X - \mu_i)^T V_i^{-1} (X - \mu_i)}$$

where Vi is the covariance matrix.
Gaussian Mixture Training Flow Chart (2)
Compute (E-step for EM):

$$\tau_{ip} = P(G_i \mid X_p, C_k) = \frac{\pi_i\, p(X_p \mid G_i, C_k)}{p(X_p \mid C_k)} = \frac{\pi_i\, p(X_p \mid G_i, C_k)}{\sum_{j=1}^{G} \pi_j\, p(X_p \mid G_j, C_k)}$$

Iteratively update the weights, means and covariances (M-step for EM):

$$\pi_i(t+1) = \frac{1}{N_c} \sum_{p=1}^{N_c} \tau_{ip}(t)$$

$$\mu_i(t+1) = \frac{1}{N_c\, \pi_i(t+1)} \sum_{p=1}^{N_c} \tau_{ip}(t)\, X_p$$

$$V_i(t+1) = \frac{1}{N_c\, \pi_i(t+1)} \sum_{p=1}^{N_c} \tau_{ip}(t)\, \big(X_p - \mu_i(t)\big)\big(X_p - \mu_i(t)\big)^T$$

Note: here is the PE which we did in the class. (A NumPy sketch of one E/M step follows below.)
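A NumPy sketch of one E step and one M step matching the updates above (single class, full covariances; it uses the freshly updated means in the covariance update, a common variant):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, V):
    """One EM iteration for a G-component Gaussian mixture on (N, d) data X."""
    N, G = len(X), len(pi)
    # E step: tau[p, i] = pi_i p(X_p | G_i) / sum_j pi_j p(X_p | G_j)
    tau = np.column_stack([pi[i] * multivariate_normal.pdf(X, mu[i], V[i])
                           for i in range(G)])
    tau /= tau.sum(axis=1, keepdims=True)
    # M step: re-estimate weights, means and covariances
    Nk = tau.sum(axis=0)                       # = N_c * pi_i(t+1)
    pi_new = Nk / N
    mu_new = (tau.T @ X) / Nk[:, None]
    V_new = np.array([(tau[:, i, None] * (X - mu_new[i])).T @ (X - mu_new[i]) / Nk[i]
                      for i in range(G)])
    return pi_new, mu_new, V_new
```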
Gaussian Mixture Training Flow Chart (3)
Recompute τip using the new weights, means and covariances. Stop training if

$$\big|\tau_{ip}(t+1) - \tau_{ip}(t)\big| < \text{threshold}$$

or the number of epochs reaches the specified value. Otherwise, continue the iterative updates.
Gaussian Mixture Test Flow Chart
Present each input pattern X and compute the confidence for each class j:

$$\text{confidence}_j = P(X \mid C_j)\, P(C_j)$$

where

$$P(C_j) = \frac{N_{c_j}}{N}$$

is the prior probability of class Cj, estimated by counting the number of training patterns. Classify pattern X as the class with the highest confidence.

Gaussian Mixture End
Acknowledgements

Material in these slides has been taken from the following resources:
• Introduction to Machine Learning, Alpaydın
• "Pattern Classification" by Duda et al., John Wiley & Sons
• Read GMM from "Automated Detection of Exudates in Colored Retinal Images for Diagnosis of Diabetic Retinopathy", Applied Optics, Vol. 51, No. 20, 4858–4866, 2012.