Transcript of "Clustering", ce.sharif.edu/courses/99-00/1/ce717-1
CE-717: Machine Learning Sharif University of Technology
Fall 2020
Soleymani
Clustering
Unsupervised learning
Clustering: partitioning of data into groups of similar
data points.
Density estimation
Parametric & non-parametric density estimation
Dimensionality reduction: data representation using a
smaller number of dimensions while preserving (perhaps
approximately) some properties of the data.
Clustering: Definition

We have a set of unlabeled data points {x^(i)}_{i=1}^N and we intend to find groups of similar objects (based on the observed features):
high intra-cluster similarity
low inter-cluster similarity
Clustering: Another Definition

Density-based definition:
Clusters are regions of high density that are separated from one another by regions of low density
Difficulties
Clustering is not as well-defined as classification
Clustering is subjective
Natural grouping may be ambiguous
Clustering Purpose
Preprocessing stage to index, compress, or reduce the data
Representing high-dimensional data in a low-dimensional space (e.g., for visualization purposes)
Knowledge discovery from data: a tool to understand the hidden structure in data or to group it
To gain insight into the structure of the data (prior to classifier design); provides information about the internal structure of the data
To group or partition the data when no label is available
Clustering Applications
Information retrieval (search and browsing)
Cluster text docs or images based on their content
Cluster groups of users based on their access patterns on webpages
Clustering of docs
Google news
Clustering Applications
Information retrieval (search and browsing)
Cluster text docs or images based on their content
Cluster groups of users based on their access patterns on webpages
Cluster users of social networks by interest
(community detection).
Social Network: Community Detection
Clustering Applications
Information retrieval (search and browsing)
Cluster text docs or images based on their content
Cluster groups of users based on their access patterns on webpages
Cluster users of social networks by interest (community
detection).
Bioinformatics
Cluster similar proteins together (similarity w.r.t. chemical structure and/or functionality, etc.)
or cluster similar genes according to microarray data
Gene clustering
Microarrays measure the expression of all genes
Clustering genes can help to determine new functions for unknown genes by grouping them with genes of known function
Clustering Applications
Information retrieval (search and browsing)
Cluster text docs or images based on their content
Cluster groups of users based on their access patterns on webpages
Cluster users of social networks by interest (community detection)
Bioinformatics
Cluster similar proteins together (similarity w.r.t. chemical structure and/or functionality, etc.) or similar genes according to microarray data
Market segmentation
Clustering customers based on their purchase history and their characteristics
Image segmentation
Many more applications
Categorization of Clustering Algorithms

Partitional algorithms: construct various partitions and then evaluate them by some criterion
the desired number of clusters K must be specified
Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion
Clustering methods we will discuss
Objective based clustering
K-means
EM-style algorithm for clustering with a mixture of Gaussians (in the next lecture)
Hierarchical clustering
Partitional Clustering

X = {x^(i)}_{i=1}^N
C = {C_1, C_2, …, C_K}
∀j, C_j ≠ ∅
∪_{j=1}^K C_j = X
∀i, j: C_i ∩ C_j = ∅ (disjoint partitioning for hard clustering)

Since the output is only one set of clusters, the user has to specify the desired number of clusters K.
Hard clustering: each data point can belong to one cluster only
Nonhierarchical: each instance is placed in exactly one of K non-overlapping clusters.
Partitioning Algorithms: Basic Concept

Construct a partition of a set of N objects into a set of K clusters
The number of clusters K is given in advance
Each object belongs to exactly one cluster in hard clustering methods
K-means is the most popular partitioning algorithm
Objective Based Clustering

Input: a set of N points, and a distance/dissimilarity measure
Output: a partition of the data

k-median: find centers μ_1, μ_2, …, μ_K to minimize Σ_{i=1}^N min_{j∈{1,…,K}} d(x^(i), μ_j)

k-means: find centers μ_1, μ_2, …, μ_K to minimize Σ_{i=1}^N min_{j∈{1,…,K}} d²(x^(i), μ_j)
Distance Measure

Let x and x′ be two objects from the universe of possible objects. The distance (dissimilarity) between x and x′ is a real number denoted by d(x, x′)
Specifying the distance d(x, x′) between pairs (x, x′):
E.g., for texts: # keywords in common, edit distance
Example: Euclidean distance in the space of features
K-means Clustering

Input: a set x^(1), …, x^(N) of data points (in a d-dimensional feature space) and an integer K
Output: a set of K representatives μ_1, μ_2, …, μ_K ∈ ℝ^d as the cluster representatives
Data points are assigned to clusters according to their distances to μ_1, μ_2, …, μ_K: each data point is assigned to the cluster whose representative is nearest to it
Objective: choose μ_1, μ_2, …, μ_K to minimize Σ_{i=1}^N min_{j∈{1,…,K}} d²(x^(i), μ_j)
Euclidean k-means Clustering

Input: a set x^(1), …, x^(N) of data points (in a d-dimensional feature space) and an integer K
Output: a set of K representatives μ_1, μ_2, …, μ_K ∈ ℝ^d as the cluster representatives
Data points are assigned to clusters according to their distances to μ_1, μ_2, …, μ_K: each data point is assigned to the cluster whose representative is nearest to it
Objective: choose μ_1, μ_2, …, μ_K to minimize Σ_{i=1}^N min_{j∈{1,…,K}} ‖x^(i) − μ_j‖²
(each point assigned to its closest cluster representative)
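The Euclidean k-means objective above translates directly into a few lines of Python. A minimal sketch (the name `kmeans_cost` is ours, not from the slides):

```python
def kmeans_cost(points, centers):
    """Euclidean k-means objective: sum over all points of the squared
    distance to the nearest of the given centers."""
    return sum(
        min(sum((x - m) ** 2 for x, m in zip(p, c)) for c in centers)
        for p in points
    )
```

With centers placed exactly on the points the cost is zero; a single center between two points pays the squared distance to each.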
Euclidean k-means Clustering: Computational Complexity

To find the optimal partition, we would need to exhaustively enumerate all partitions
In how many ways can we assign K labels to N observations?
NP-hard, even for K = 2 or d = 2
For K = 1: min_μ Σ_{i=1}^N ‖x^(i) − μ‖² is solved by μ = (1/N) Σ_{i=1}^N x^(i)
For d = 1, dynamic programming finds the optimum in time O(N²K).
Common Heuristic in Practice: Lloyd's Method

Input: a set X of N data points x^(1), …, x^(N) in ℝ^d
Initialize centers μ_1, μ_2, …, μ_K ∈ ℝ^d in any way.
Repeat until there is no further change in the cost:
For each j: C_j ← {x ∈ X | μ_j is the closest center to x}
For each j: μ_j ← mean of the members of C_j

That is, alternate two steps:
Holding the centers μ_1, μ_2, …, μ_K fixed, find the optimal assignments C_1, …, C_K of data points to clusters
Holding the cluster assignments C_1, …, C_K fixed, find the optimal centers μ_1, μ_2, …, μ_K
K-means Algorithm (Lloyd's Method)

Select K random points μ_1, μ_2, …, μ_K as the clusters' initial centroids.
Repeat until convergence (or another stopping criterion):
for i = 1 to N: assign x^(i) to the closest cluster, so that C_j contains all data points closer to μ_j than to any other center (assign data based on current centers)
for j = 1 to K: μ_j = (1/|C_j|) Σ_{x^(i)∈C_j} x^(i) (re-estimate centers based on the current assignment)
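The two alternating steps above can be sketched in plain Python; `kmeans`, `dist2`, and `mean` are our own hypothetical names, and this toy version (initial centroids sampled from the data) is an illustration under those assumptions, not a production implementation:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's method: alternate an assignment step and a mean-update
    step until the centers stop changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centroids from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda jj: dist2(p, centers[jj]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its members
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:  # converged: no further change
            break
        centers = new_centers
    return centers, clusters

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))
```

On two well-separated blobs, the centers settle at the blob means after a couple of iterations.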
Figure [Bishop]: alternating between assigning data to clusters and updating the means
Intra-cluster similarity view

k-means optimizes intra-cluster similarity:
J(C) = Σ_{j=1}^K Σ_{x^(i)∈C_j} ‖x^(i) − μ_j‖², where μ_j = (1/|C_j|) Σ_{x^(i)∈C_j} x^(i)

Σ_{x^(i)∈C_j} ‖x^(i) − μ_j‖² = (1/(2|C_j|)) Σ_{x^(i)∈C_j} Σ_{x^(i′)∈C_j} ‖x^(i) − x^(i′)‖²

i.e., it is proportional to the average distance to members of the same cluster
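The identity above (the within-cluster sum of squares equals the scaled sum of pairwise squared distances) is easy to check numerically; a quick sketch with our own helper names `within_ss` and `pairwise_form`:

```python
def within_ss(pts):
    """Sum over x in the cluster of ||x - mu||^2, with mu the cluster mean."""
    mu = [sum(c) / len(pts) for c in zip(*pts)]
    return sum(sum((x - m) ** 2 for x, m in zip(p, mu)) for p in pts)

def pairwise_form(pts):
    """(1 / (2|C|)) * sum over all ordered pairs (x, x') of ||x - x'||^2."""
    s = sum(
        sum((x - y) ** 2 for x, y in zip(p, q))
        for p in pts for q in pts
    )
    return s / (2 * len(pts))
```

For the cluster [(0, 0), (2, 0), (1, 3)] both expressions evaluate to 8.0.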
K-means: Convergence

It always converges.
Why should the K-means algorithm ever reach a state in which the clustering doesn't change?
The reassignment stage monotonically decreases J, since each vector is assigned to the closest centroid.
The centroid-update stage also minimizes, for each cluster, the sum of squared distances from the assigned points to the cluster center.
Sec. 16.4

Figure [Bishop]: the cost after each E-step and M-step
Local optimum
It always converges
but it may converge at a local optimum that is different
from the global optimum
may be arbitrarily worse in terms of the objective score.
Local optimum: every point is assigned to its nearest center and
every center is the mean value of its points.
K-means: Local Minimum Problem

Figure: original data; the obtained clustering vs. the optimal clustering
Lloyd's Method: Initialization

Initialization is crucial (it affects how fast the algorithm converges and the quality of the clustering)
Random centers from the data points
Multiple runs: select the best one
Initialize with the results of another method
Select good initial centers using a heuristic:
Furthest-point traversal
K-means++ (works well and has provable guarantees)
Another Initialization Idea: Furthest Point Heuristic

Choose μ_1 arbitrarily (or at random).
For j = 2, …, K:
Select as μ_j the data point among x^(1), …, x^(N) that is farthest from the previously chosen μ_1, …, μ_{j−1}
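The furthest-point traversal can be sketched as follows (the names `furthest_first` and `d2` are ours):

```python
def furthest_first(points, k):
    """Furthest-point traversal: greedily pick the point whose distance
    to its nearest already-chosen center is largest."""
    centers = [points[0]]  # arbitrary first center
    for _ in range(k - 1):
        nxt = max(points, key=lambda p: min(d2(p, c) for c in centers))
        centers.append(nxt)
    return centers

def d2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Note how a single outlier would be picked early, which is exactly the sensitivity the next slide points out.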
Another Initialization Idea: Furthest Point Heuristic

It is sensitive to outliers
K-means++ Initialization: D² Sampling [D. Arthur and S. Vassilvitskii, 2007]

Combines the random-initialization and furthest-point-initialization ideas
Let the probability of selecting a point be proportional to the squared distance between this point and its nearest already-chosen center:
the probability of selecting x is proportional to D²(x) = min_{j′<j} ‖x − μ_{j′}‖²

Choose μ_1 arbitrarily (or at random).
For j = 2, …, K: select μ_j among the data points x^(1), …, x^(N) according to the distribution
Pr(μ_j = x^(i)) ∝ min_{j′<j} ‖x^(i) − μ_{j′}‖²

Theorem: K-means++ always attains an O(log K) approximation to the optimal k-means solution in expectation.
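D² sampling is a one-line change from furthest-point traversal: instead of taking the max, sample with weights D²(x). A sketch (the names `kmeanspp_init` and `_sq` are ours):

```python
import random

def kmeanspp_init(points, k, rng=None):
    """k-means++ (D^2 sampling): pick the first center at random, then
    sample each next center with probability proportional to the squared
    distance to its nearest already-chosen center."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(_sq(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def _sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

An already-chosen center has weight D² = 0, so (for distinct points) it can never be drawn again.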
How Many Clusters?

The number of clusters K is given in advance in the k-means algorithm
However, finding the "right" number of clusters is part of the problem
Tradeoff between having better focus within each cluster and having too many clusters
How Many Clusters?

Heuristic: find a large gap between the (K−1)-means cost and the K-means cost ("knee finding" or "elbow finding").
Hold-out validation / cross-validation on an auxiliary task (e.g., a supervised learning task).
Optimization problem: penalize having lots of clusters; some criteria can be used to estimate K automatically
Penalize the number of bits you need to describe the extra parameters:
J′(K) = J(K) + K × log N
Hierarchical clustering
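The "knee finding" heuristic above can be sketched as: run k-means for a range of K, record the costs, and return the K where the drop in cost is largest. This is one simplistic reading of the heuristic, and the name `knee` is ours:

```python
def knee(costs):
    """costs[i] is the k-means cost using k = i + 1 clusters.
    Return the k whose cost drop relative to (k-1)-means is largest."""
    gaps = [costs[i] - costs[i + 1] for i in range(len(costs) - 1)]
    return gaps.index(max(gaps)) + 2
```

For costs [100, 30, 25, 24] the big drop happens going from 1 to 2 clusters, so the heuristic picks K = 2.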
K-means: Advantages and Disadvantages

Strengths
It is a simple method and easy to implement.
Relatively efficient: O(tKNd), where t is the number of iterations.
K-means typically converges quickly; usually t ≪ N.
Exponential number of rounds in the worst case [Andrea Vattani, 2009].

Weaknesses
Need to specify K, the number of clusters, in advance.
Often terminates at a local optimum; initialization is important.
Not suitable for discovering clusters with arbitrary shapes.
Works for numerical data; what about categorical data?
Noise and outliers can be considerable trouble for K-means.
K-means Algorithm: Limitation

In general, k-means is unable to find clusters of arbitrary shapes, sizes, and densities
except for very distant clusters
K-means

K-means was proposed nearly 60 years ago, and thousands of clustering algorithms have been published since then
However, K-means is still widely used
This speaks to the difficulty of designing a general-purpose clustering algorithm and to the ill-posed nature of the clustering problem

A.K. Jain, "Data Clustering: 50 Years Beyond K-means", 2010
K-means: Vector Quantization

Data compression
Vector quantization: construct a codebook using k-means; the cluster means serve as prototypes representing the examples assigned to their clusters.

Figure: reconstructions with K = 3, K = 5, and K = 15
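In its simplest one-dimensional form, vector quantization replaces each value by its nearest codebook prototype. A sketch (the name `quantize` is ours; here the codebook is given, whereas in practice it would come from k-means cluster means):

```python
def quantize(values, codebook):
    """Map each value to its nearest prototype in the codebook."""
    return [min(codebook, key=lambda c: abs(c - v)) for v in values]
```

With codebook [0.0, 1.0], the values [0.1, 0.9, 0.4] compress to [0.0, 1.0, 0.0]: each sample now needs only the index of its prototype.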
K-means: Image Segmentation

Figure [Bishop]
Hierarchical Clustering

The notion of a cluster can be ambiguous: how many clusters are there?
Hierarchical clustering: clusters contain sub-clusters, sub-clusters themselves can have sub-sub-clusters, and so on
Several levels of detail in the clustering
A hierarchy might be more natural: different levels of granularity
Categorization of Clustering Algorithms

Hierarchical (agglomerative or divisive) vs. partitional
Hierarchical Clustering

Agglomerative (bottom up):
Starts with each data point in a separate cluster
Repeatedly joins the closest pair of clusters, until there is only one cluster (or another stopping criterion is met)

Divisive (top down):
Starts with the whole data set as one cluster
Repeatedly divides one of the clusters, until there is only one data point in each cluster (or another stopping criterion is met)
Hierarchical Agglomerative Clustering (HAC) Algorithm

1. Maintain a set of clusters
2. Initially, each instance forms a cluster
3. While there is more than one cluster:
Pick the two closest clusters
Merge them into a new cluster
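The three-step HAC loop above can be sketched naively (here with single-link distance, and with our own names `hac_single_link` and `single_link`; a real implementation would cache distances rather than rescan all pairs):

```python
def hac_single_link(points, target_k):
    """Naive agglomerative clustering: start with singleton clusters and
    repeatedly merge the closest pair (single-link distance)."""
    clusters = [[p] for p in points]
    while len(clusters) > target_k:
        i, j = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: single_link(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i].extend(clusters[j])  # merge the two closest clusters
        del clusters[j]
    return clusters

def single_link(a, b):
    """Minimum squared distance over pairs, one point from each cluster."""
    return min(sum((x - y) ** 2 for x, y in zip(p, q)) for p in a for q in b)
```

Stopping at `target_k` clusters corresponds to cutting the dendrogram at the level that yields K clusters.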
Figure (dendrogram): the height represents the distance at which the merge occurs
Distances between Cluster Pairs

There are many variants for defining the distance between a pair of clusters:
Single-link: minimum distance between pairs of data points, one from each cluster
Complete-link: maximum distance between pairs of data points, one from each cluster
Centroid (Ward's): distance between centroids (centers of gravity)
Average-link: average distance between pairs of elements
Distances between Cluster Pairs

Single-link: dist_SL(C_i, C_j) = min_{x∈C_i, x′∈C_j} dist(x, x′)
Complete-link: dist_CL(C_i, C_j) = max_{x∈C_i, x′∈C_j} dist(x, x′)
Ward's: dist_Ward(C_i, C_j) = (n_i n_j / (n_i + n_j)) dist(μ_i, μ_j)
Average-link: dist_AL(C_i, C_j) = (1/|C_i ∪ C_j|) Σ_{x∈C_i∪C_j} Σ_{x′∈C_i∪C_j} dist(x, x′)
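The linkage variants can be sketched directly from their definitions (function names `dist_sl`, `dist_cl`, `dist_al`, `dist_ward` are ours; note that `dist_al` here averages over cross-cluster pairs, the common textbook variant, which differs slightly from the union-based form on the slide):

```python
import math

def dist_sl(a, b):
    return min(math.dist(p, q) for p in a for q in b)

def dist_cl(a, b):
    return max(math.dist(p, q) for p in a for q in b)

def dist_al(a, b):
    pairs = [(p, q) for p in a for q in b]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def dist_ward(a, b):
    na, nb = len(a), len(b)
    mu_a = tuple(sum(c) / na for c in zip(*a))
    mu_b = tuple(sum(c) / nb for c in zip(*b))
    return (na * nb) / (na + nb) * math.dist(mu_a, mu_b)
```

For a = [(0, 0)] and b = [(3, 0), (4, 0)], single-link gives 3, complete-link gives 4, and average-link gives 3.5.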
Single-Link

keeps the maximum bridge length as small as possible
Complete Link

keeps the maximum diameter as small as possible
Ward's Method

The distance between the centers of the two clusters (weighted to also consider the sizes of the clusters):
dist_Ward(C_i, C_j) = (n_i n_j / (n_i + n_j)) dist(μ_i, μ_j)
Merge the two clusters such that the increase in k-means
cost is as small as possible.
Works well in practice.
Distances between Clusters: Summary

Which distance is best?
Complete linkage prefers compact clusters.
Single linkage can produce long, stretched clusters.
The choice depends on what you need; expert opinion is helpful.
Data Matrix vs. Distance Matrix

Data (or pattern) matrix, N × d (features of the data):
X = [x_1^(1) … x_d^(1); ⋮ ⋱ ⋮; x_1^(N) … x_d^(N)]

Distance matrix, N × N (distances between each pair of patterns):
D = [d(x^(1), x^(1)) … d(x^(1), x^(N)); ⋮ ⋱ ⋮; d(x^(N), x^(1)) … d(x^(N), x^(N))]

Single-link, complete-link, and average-link need only the distance matrix.
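Building the N × N distance matrix is straightforward; a sketch (the name `distance_matrix` is ours):

```python
import math

def distance_matrix(X):
    """N x N matrix of pairwise Euclidean distances; single-link,
    complete-link, and average-link clustering need only this matrix."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
```

The matrix is symmetric with a zero diagonal, so in practice only the upper triangle needs to be stored.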
Computational Complexity

In the first iteration, all HAC methods compute the similarity of all pairs of N individual instances, which is O(N²) similarity computations.
In each of the N − 2 merging iterations, compute the distance between the most recently created cluster and all other existing clusters.
Done naively this is O(N³), but done more cleverly it is O(N² log N).
![Page 56: Clusteringce.sharif.edu/courses/99-00/1/ce717-1/resources/root/...Clustering Applications 13 Information retrieval (search and browsing) Cluster text docs or images based on their](https://reader033.fdocuments.in/reader033/viewer/2022053122/60a8be5703de322db73252bf/html5/thumbnails/56.jpg)
Dendrogram: Hierarchical Clustering
A clustering is obtained by cutting the dendrogram at a desired level:
cut at a pre-specified level of similarity,
cut where the gap between two successive combination similarities is largest, or
cut at the point that produces exactly K clusters.
Where to "cut" the dendrogram is user-determined.
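The three cutting strategies can be sketched with SciPy (toy 1-D data assumed):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.], [0.1], [0.2], [5.], [5.1], [10.]])
Z = linkage(X, method="single")

# 1) cut at a pre-specified distance level
labels_level = fcluster(Z, t=1.0, criterion="distance")

# 2) cut so that exactly K clusters are produced
labels_k = fcluster(Z, t=3, criterion="maxclust")

# 3) cut at the largest gap between successive merge distances
merge_d = Z[:, 2]                      # distance at which each merge happened
gap_idx = np.argmax(np.diff(merge_d))  # biggest jump in merge distances
threshold = (merge_d[gap_idx] + merge_d[gap_idx + 1]) / 2
labels_gap = fcluster(Z, t=threshold, criterion="distance")

print(labels_level, labels_k, labels_gap)
```

On this toy data all three strategies agree, producing the three groups {0, 0.1, 0.2}, {5, 5.1}, and {10}; on real data they can differ.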
![Page 57: Clusteringce.sharif.edu/courses/99-00/1/ce717-1/resources/root/...Clustering Applications 13 Information retrieval (search and browsing) Cluster text docs or images based on their](https://reader033.fdocuments.in/reader033/viewer/2022053122/60a8be5703de322db73252bf/html5/thumbnails/57.jpg)
Outliers
We can detect outliers (points that are very different from all the others) by finding isolated branches of the dendrogram.
![Page 58: Clusteringce.sharif.edu/courses/99-00/1/ce717-1/resources/root/...Clustering Applications 13 Information retrieval (search and browsing) Cluster text docs or images based on their](https://reader033.fdocuments.in/reader033/viewer/2022053122/60a8be5703de322db73252bf/html5/thumbnails/58.jpg)
K-means vs. Hierarchical
Time cost:
K-means is usually fast, while hierarchical methods do not scale well.
Human intuition:
The hierarchical structure provides a more natural output, compatible with human intuition in some domains.
Local-minimum problem:
It is very common for k-means.
Hierarchical methods, like other greedy heuristic search algorithms, also suffer from the local-optima problem, since they merge clusters greedily and can never undo what was done previously.
Choosing the number of clusters:
There is no need to specify the number of clusters in advance for hierarchical methods.
![Page 59: Clusteringce.sharif.edu/courses/99-00/1/ce717-1/resources/root/...Clustering Applications 13 Information retrieval (search and browsing) Cluster text docs or images based on their](https://reader033.fdocuments.in/reader033/viewer/2022053122/60a8be5703de322db73252bf/html5/thumbnails/59.jpg)
Clustering Validity
We need to determine whether the discovered clusters are real, or to compare different clustering methods.
What is a good clustering? We need a clustering quality measurement.
Main approaches:
Internal index: evaluates how well the clustering fits the data, without reference to external information.
External index: evaluates how well the clustering result agrees with known categories.
Assumption: ground-truth labels are available.
Sec. 16.3
![Page 60: Clusteringce.sharif.edu/courses/99-00/1/ce717-1/resources/root/...Clustering Applications 13 Information retrieval (search and browsing) Cluster text docs or images based on their](https://reader033.fdocuments.in/reader033/viewer/2022053122/60a8be5703de322db73252bf/html5/thumbnails/60.jpg)
Internal Index: Coherence
An internal criterion is usually based on coherence:
Compactness of the data within clusters
high intra-cluster similarity
closeness of cluster elements
Separability of distinct clusters
low inter-cluster similarity
Some internal indices: Davies-Bouldin (DB), Silhouette, Dunn, Bayesian Information Criterion (BIC), Calinski-Harabasz (CH)
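A hedged example of using two of these indices (scikit-learn and synthetic blobs are assumptions, not part of the slides): the silhouette score lies in [-1, 1] and rewards both compactness and separability, while Davies-Bouldin is lower-is-better.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

# three synthetic Gaussian blobs; no labels are used by the indices
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

scores = {}
for k in (2, 3, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)                 # in [-1, 1], higher is better
    print(k, scores[k], davies_bouldin_score(X, labels))    # DB: lower is better
```

Scanning several candidate K values like this and picking the one with the best internal index is a common way to choose the number of clusters.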
![Page 61: Clusteringce.sharif.edu/courses/99-00/1/ce717-1/resources/root/...Clustering Applications 13 Information retrieval (search and browsing) Cluster text docs or images based on their](https://reader033.fdocuments.in/reader033/viewer/2022053122/60a8be5703de322db73252bf/html5/thumbnails/61.jpg)
Internal Index: Stability
Evaluate cluster stability under minor perturbations of the data.
For example, evaluate a clustering result by comparing it with the result obtained after subsampling the data (e.g., subsampling 80% of the data).
Measuring stability requires a measure of similarity between two k-clusterings, similar to the external indices that compare a clustering result with the ground truth.
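A minimal sketch of such a stability check, assuming scikit-learn, synthetic data, and the adjusted Rand index as the similarity measure between the two k-clusterings (all assumptions, not prescribed by the slides):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
rng = np.random.default_rng(0)

def subsample_labels(X, frac=0.8, k=3, seed=0):
    """Cluster a random 80% subsample; return the chosen indices and labels."""
    idx = np.sort(rng.choice(len(X), size=int(frac * len(X)), replace=False))
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[idx])
    return idx, labels

idx_a, lab_a = subsample_labels(X, seed=1)
idx_b, lab_b = subsample_labels(X, seed=2)

# restrict both clusterings to the points they share before comparing
shared, pos_a, pos_b = np.intersect1d(idx_a, idx_b, return_indices=True)
stability = adjusted_rand_score(lab_a[pos_a], lab_b[pos_b])
print(stability)   # near 1.0 when the clustering is stable
```

The adjusted Rand index is permutation-invariant, so it does not matter that the two k-means runs may assign different label numbers to the same blob.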
![Page 62: Clusteringce.sharif.edu/courses/99-00/1/ce717-1/resources/root/...Clustering Applications 13 Information retrieval (search and browsing) Cluster text docs or images based on their](https://reader033.fdocuments.in/reader033/viewer/2022053122/60a8be5703de322db73252bf/html5/thumbnails/62.jpg)
External Index: Rand Index and Clustering F-measure
Sec. 16.3
$$RI = \frac{TP + TN}{TP + TN + FP + FN}$$
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
$$F_\beta = \frac{(\beta^2 + 1)\,P R}{\beta^2 P + R}$$
F measure in addition supports differential weighting of P and R.
$$\text{Jaccard} = \frac{TP}{TP + FP + FN}$$
Pair counts for a clustering $C$ against a reference clustering $\hat{C}$:

                   Same in $\hat{C}$   Different in $\hat{C}$
Same in $C$              TP                   FN
Different in $C$         FP                   TN

TP: # pairs that cluster together in both $C$ and $\hat{C}$
TN: # pairs that are in separate clusters in both $C$ and $\hat{C}$
FN: # pairs that cluster together in $C$ but not in $\hat{C}$
FP: # pairs that cluster together in $\hat{C}$ but not in $C$
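The pair-counting definitions above can be transcribed directly (illustrative code; the two toy partitions are assumptions, not from the slides):

```python
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    """Count TP, FN, FP, TN over all pairs of points."""
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1          # together in both clusterings
        elif same_true:
            fn += 1          # together in the reference only
        elif same_pred:
            fp += 1          # together in the candidate only
        else:
            tn += 1          # separated in both
    return tp, fn, fp, tn

C  = [0, 0, 0, 1, 1, 1]   # reference partition
Ch = [0, 0, 1, 1, 1, 1]   # candidate clustering

tp, fn, fp, tn = pair_counts(C, Ch)
ri = (tp + tn) / (tp + tn + fp + fn)
p, r = tp / (tp + fp), tp / (tp + fn)
f1 = 2 * p * r / (p + r)          # F_beta with beta = 1
jaccard = tp / (tp + fp + fn)
print(ri, p, r, f1, jaccard)
```

The O(N^2) pair loop is fine for small N; scikit-learn's `rand_score` computes the same quantity from the contingency table for large N.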
![Page 63: Clusteringce.sharif.edu/courses/99-00/1/ce717-1/resources/root/...Clustering Applications 13 Information retrieval (search and browsing) Cluster text docs or images based on their](https://reader033.fdocuments.in/reader033/viewer/2022053122/60a8be5703de322db73252bf/html5/thumbnails/63.jpg)
Major Dilemma [Jain, 2010]
What is a cluster?
What features should be used?
Should the data be normalized?
How do we define the pair-wise similarity?
Which clustering method should be used?
How many clusters are present in the data?
Does the data contain any outliers?
Does the data have any clustering tendency?
Are the discovered clusters and partition valid?