Clustering (luthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means...)
![Page 1: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/1.jpg)
Clustering
CS498
![Page 2: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/2.jpg)
Today’s lecture
• Clustering and unsupervised learning
• Hierarchical clustering
• K-means, K-medoids, VQ
![Page 3: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/3.jpg)
Unsupervised learning
• Supervised learning – Use labeled data to do something smart
• What if the labels don’t exist?
![Page 4: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/4.jpg)
Some inspiration
El Capitan, Yosemite National Park
![Page 5: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/5.jpg)
The way we’ll see it
[Figure: 3-D scatter plot; axes Red, Green, Blue, each from 0 to 1]
![Page 6: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/6.jpg)
A new question
• I see classes, but …
• How do I find them? – Can I automate this? – How many are there?
• Answer: Clustering
[Figure: 3-D scatter plot; axes Red, Green, Blue, each from 0 to 1]
![Page 7: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/7.jpg)
Clustering
• Discover classes in data – Divide data into sensible clusters
• Fundamentally ill-defined problem – There is often no correct solution
• Relies on many user choices
![Page 8: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/8.jpg)
Clustering process
• Describe your data using features – What’s your objective?
• Define a proximity measure – How is the feature space shaped?
• Define a clustering criterion – When do samples make a cluster?
![Page 9: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/9.jpg)
Know what you want
• Features & objective matter – Which are the two classes?
![Page 10: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/10.jpg)
Know what you want
• Features & objective matter – Which are the two classes?
[Figure: "Basketball player recruiting"; horizontal axis: player's height; vertical axis: player's knowledge of entomology]
![Page 11: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/11.jpg)
Know your space
• Define a sensible proximity measure
![Page 12: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/12.jpg)
Know your space
• Define a sensible proximity measure
[Figure: angle of incidence; axis ticks at 0, 2π, 4π]
![Page 13: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/13.jpg)
Know your cluster type
• What forms a cluster in your space?
![Page 14: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/14.jpg)
Know your cluster type
• What forms a cluster in your space?
[Figure: "Planes near airports"; horizontal axis: speed; vertical axis: altitude]
![Page 15: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/15.jpg)
Know your cluster type
• What forms a cluster in your space?
[Figure: "Sound bouncing off a wall"; horizontal axis: wavefront position; vertical axis: intensity]
![Page 16: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/16.jpg)
Know your cluster type
• What forms a cluster in your space?
[Figure: "Ants departing colony"; compass directions North, South, East, West]
![Page 17: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/17.jpg)
How many clusters?
• The deeper you look the more you’ll get
![Page 18: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/18.jpg)
There are no right answers!
• Part of clustering is an art
• You need to experiment to get there
• But some good starting points exist
![Page 19: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/19.jpg)
How to cluster
• Tons of methods
• We can use step-by-step logical rules – e.g., find the two closest points and merge, repeat
• Or formulate a global criterion
![Page 20: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/20.jpg)
Hierarchical methods
• Agglomerative algorithms – Keep pairing up your data
• Divisive algorithms
– Keep breaking up your data
![Page 21: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/21.jpg)
Agglomerative Approach
• Look at your data points and form pairs – Keep at it
![Page 22: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/22.jpg)
More formally
• Represent data as vectors: $X = \{x_i,\ i = 1,\dots,N\}$
• Represent clusters by: $C_j$
• Represent the clustering by: $R = \{C_j,\ j = 1,\dots,m\}$, e.g. $R = \{\{x_1,x_3\},\{x_2\},\{x_4,x_5,x_6\}\}$
• Define a distance measure: $d(C_j, C_i)$
![Page 23: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/23.jpg)
Agglomerative clustering
• Choose: $R_0 = \{C_i = \{x_i\},\ i = 1,\dots,N\}$
• For t = 1,…
– Among all clusters in $R_{t-1}$, find cluster pair $\{C_i, C_j\}$ such that: $\arg\min_{i,j} d(C_i, C_j)$
– Form new cluster and replace pair: $C_q = C_i \cup C_j$, $R_t = (R_{t-1} - \{C_i, C_j\}) \cup \{C_q\}$
• Until we have only one cluster
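The loop above can be sketched in a few lines of Python. This is a minimal, naive version for illustration only: the function names, the toy points, and the choice of closest-pair distance between clusters are assumptions, not something the slides prescribe.

```python
from itertools import combinations
import math

def closest_pair_distance(a, b):
    """d(C_i, C_j), assumed here to be the distance of the closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, cluster_dist=closest_pair_distance):
    """Return the clusterings R_0, R_1, ... down to a single cluster."""
    clustering = [[p] for p in points]           # R_0: every point is its own cluster
    history = [list(clustering)]
    while len(clustering) > 1:
        # Find the pair (i, j) that minimises d(C_i, C_j).
        i, j = min(combinations(range(len(clustering)), 2),
                   key=lambda ij: cluster_dist(clustering[ij[0]], clustering[ij[1]]))
        merged = clustering[i] + clustering[j]   # C_q = C_i U C_j
        # R_t = (R_{t-1} - {C_i, C_j}) U {C_q}
        clustering = [c for k, c in enumerate(clustering) if k not in (i, j)] + [merged]
        history.append(list(clustering))
    return history

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 7.0), (20.0, 0.0)]
history = agglomerate(points)
for t, clustering in enumerate(history):
    print(f"R{t}: {clustering}")
```

Each pass rescans every pair of clusters, which is what makes the method expensive on large data sets.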
![Page 24: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/24.jpg)
Pretty picture version
• Dendrogram
[Figure: dendrogram over x1, …, x5 with levels R0 to R4; vertical axis: similarity]
![Page 25: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/25.jpg)
Pretty picture two
• Venn diagram
[Figure: Venn diagram of x1, …, x5 grouped into nested regions R1 to R4]
![Page 26: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/26.jpg)
Cluster distance?
• Complete linkage – Merge clusters that result in the smallest diameter
• Single linkage – Merge clusters with two closest data points
• Group average – Use average of distances
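As a rough illustration, the three choices can be written as functions of the cross-cluster point distances. The function names are made up for this sketch, and complete linkage is written in its common form (largest cross-cluster distance), which is an assumption about what "diameter" means on the slide.

```python
import math

def cross_distances(a, b):
    """All Euclidean distances between a point of cluster a and a point of cluster b."""
    return [math.dist(p, q) for p in a for q in b]

def single_linkage(a, b):
    # Distance of the two closest data points.
    return min(cross_distances(a, b))

def complete_linkage(a, b):
    # Distance of the two farthest data points (assumed reading of "diameter").
    return max(cross_distances(a, b))

def group_average(a, b):
    # Average of all cross-cluster distances.
    d = cross_distances(a, b)
    return sum(d) / len(d)

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(a, b), complete_linkage(a, b), group_average(a, b))
```

Any of these can stand in for d(Ci, Cj) in the agglomerative loop; the choice changes which clusters get merged first.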
![Page 27: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/27.jpg)
What’s involved
• At level t we have N − t clusters
• At level t+1 the pairs we consider are: $\binom{N-t}{2} = \frac{(N-t)(N-t-1)}{2}$
• Overall comparisons: $\sum_{t=0}^{N-1} \binom{N-t}{2} = \frac{(N-1)N(N+1)}{6}$
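A quick way to convince yourself of the closed form is to compare it with the term-by-term sum for small N. The helper names below are illustrative only.

```python
from math import comb

def comparisons_by_sum(n):
    """Sum of C(n - t, 2) for t = 0, ..., n - 1."""
    return sum(comb(n - t, 2) for t in range(n))

def comparisons_closed_form(n):
    """(n - 1) * n * (n + 1) / 6, always an integer."""
    return (n - 1) * n * (n + 1) // 6

for n in (2, 4, 10, 100):
    print(n, comparisons_by_sum(n), comparisons_closed_form(n))
```

The count grows like N³/6, so doubling the number of points multiplies the work by roughly eight.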
![Page 28: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/28.jpg)
Not good for our case
• El Capitan picture has 63,140 pixels
• How many cluster comparisons is that? $\sum_{t=0}^{N-3} \binom{N-t}{2} = 41{,}946{,}968{,}141{,}536$
– Thanks, but no thanks …
![Page 29: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/29.jpg)
Divisive Clustering
• Works the other way around
• Start with all data in one cluster
• Start dividing into sub-clusters
![Page 30: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/30.jpg)
Divisive Clustering
• Choose: $R_0 = \{X\}$
• For t = 1,…
– For k = 1,…,t
  • Find least similar sub-clusters in each cluster
– Pick the least similar of all: $\arg\max_{k,i,j} d(C_{k,i}, C_{k,j})$
– New clustering is now: $R_t = (R_{t-1} - \{C_t\}) \cup \{C_{t,i}, C_{t,j}\}$
• Until each point is a cluster
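A brute-force sketch of this procedure, usable only on tiny data sets, might look as follows. The exhaustive search over two-way splits and the group-average distance between the two halves are assumptions made for this example; the slides do not fix either choice.

```python
from itertools import combinations
import math

def average_distance(a, b):
    """Assumed dissimilarity between two sub-clusters: mean cross-cluster distance."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def best_split(cluster, cluster_dist):
    """Try every two-way split of one cluster; return (distance, part_a, part_b)."""
    best = None
    first, rest = cluster[0], cluster[1:]
    # Keeping `first` in part_a avoids testing each split twice.
    for r in range(len(rest)):                   # r = how many of `rest` join part_a
        for chosen in combinations(range(len(rest)), r):
            part_a = [first] + [rest[k] for k in chosen]
            part_b = [rest[k] for k in range(len(rest)) if k not in chosen]
            d = cluster_dist(part_a, part_b)
            if best is None or d > best[0]:
                best = (d, part_a, part_b)
    return best

def divide(points, cluster_dist=average_distance):
    """Return the clusterings R_0, R_1, ... until every point is its own cluster."""
    clustering = [list(points)]                  # R_0 = {X}
    history = [list(clustering)]
    while any(len(c) > 1 for c in clustering):
        # Least similar pair of sub-clusters inside each splittable cluster.
        candidates = [(best_split(c, cluster_dist), k)
                      for k, c in enumerate(clustering) if len(c) > 1]
        (_, part_a, part_b), k = max(candidates, key=lambda item: item[0][0])
        clustering = clustering[:k] + clustering[k + 1:] + [part_a, part_b]
        history.append(list(clustering))
    return history

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
history = divide(points)
```

The number of candidate splits of a cluster doubles with every added point, which is the complicated search step that makes divisive clustering slow.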
![Page 31: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/31.jpg)
Comparison
• Which one is faster? – Agglomerative
• Divisive has a complicated search step
• Which one gives better results? – Divisive
• Agglomerative makes only local observations
![Page 32: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/32.jpg)
Using cost functions
• Given a set of data $x_i$
• Define a cost function: $J(\theta, U) = \sum_i \sum_j u_{ij}\, d(x_i, \theta_j)$
– θ are the cluster parameters
– $U \in \{0,1\}$ is an assignment matrix
– d() is a distance function
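To make the notation concrete, here is a small NumPy sketch that evaluates J for a given assignment. The squared Euclidean distance, the function name, and the toy data are assumptions for illustration; the cost function itself allows any d().

```python
import numpy as np

def clustering_cost(x, theta, u):
    """J(theta, U) = sum_i sum_j u_ij * d(x_i, theta_j), with d assumed squared Euclidean."""
    # dist[i, j] = ||x_i - theta_j||^2, computed by broadcasting
    dist = ((x[:, None, :] - theta[None, :, :]) ** 2).sum(axis=2)
    return float((u * dist).sum())

x = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])   # three samples
theta = np.array([[1.0, 0.0], [10.0, 0.0]])           # two cluster parameters
u = np.array([[1, 0], [1, 0], [0, 1]])                # each row selects one cluster
cost = clustering_cost(x, theta, u)
print(cost)
```

Because every row of U contains a single 1, only the distance from each sample to its own cluster contributes to the cost.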
![Page 33: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/33.jpg)
An iterative solution
• We can’t use a gradient method – The assignment matrix is binary-valued
• We have two parameters to find: θ, U – Fix one and find the other, then swap roles and repeat – Iterate until happy
![Page 34: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/34.jpg)
Overall process
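• Initialize θ and iterate:
– Estimate U: $u_{ij} = \begin{cases} 1, & \text{if } d(x_i,\theta_j) = \min_k d(x_i,\theta_k) \\ 0, & \text{otherwise} \end{cases}$
– Estimate θ: $\sum_i u_{ij} \frac{\partial d(x_i,\theta_j)}{\partial \theta_j} = 0$
– Repeat until satisfied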
![Page 35: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/35.jpg)
K-means
• Standard and super-popular algorithm
• Finds clusters in terms of region centers
• Optimizes squared Euclidean distance
$d(x,\theta) = \|x - \theta\|^2$
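Plugging the squared Euclidean distance into the θ-update condition of the iterative scheme shows where the name comes from. A short worked derivation, assuming cluster j has at least one assigned sample:

```latex
\sum_i u_{ij}\,\frac{\partial}{\partial\theta_j}\,\lVert x_i-\theta_j\rVert^2
  = -2\sum_i u_{ij}\,(x_i-\theta_j) = 0
\quad\Longrightarrow\quad
\theta_j = \frac{\sum_i u_{ij}\,x_i}{\sum_i u_{ij}}
```

So the optimal parameter for each cluster is simply the mean of the samples assigned to it.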
![Page 36: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/36.jpg)
K-means algorithm
• Initialize k means µ
• Iterate
– Assign samples xi to closest mean µ
– Estimate µ from assigned samples xi
• Repeat until convergence
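The steps above can be sketched with NumPy as follows. Initialising from k randomly chosen samples, the iteration cap, and the handling of empty clusters are assumptions of this sketch rather than part of the algorithm as stated.

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    """Plain k-means on the rows of x; returns (means, labels)."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct samples picked at random.
    mu = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each sample to its closest mean (squared Euclidean distance).
        dist = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Re-estimate each mean from its assigned samples; keep the old one if empty.
        new_mu = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):              # converged: the means stopped moving
            break
        mu = new_mu
    return mu, labels

x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
mu, labels = kmeans(x, k=2)
print(mu, labels)
```

On well-separated data like this toy set the loop settles after a handful of iterations.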
![Page 37: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/37.jpg)
Example run – step 1
[Figure: 2-D scatter plot; x axis −6 to 8, y axis −6 to 2]
![Page 38: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/38.jpg)
Example run – step 2
[Figure: 2-D scatter plot; x axis −6 to 8, y axis −6 to 2]
![Page 39: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/39.jpg)
Example run – step 3
[Figure: 2-D scatter plot; x axis −6 to 8, y axis −6 to 2]
![Page 40: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/40.jpg)
Example run – step 4
[Figure: 2-D scatter plot; x axis −6 to 8, y axis −6 to 2]
![Page 41: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/41.jpg)
Example run – step 5
[Figure: 2-D scatter plot; x axis −6 to 8, y axis −6 to 2]
![Page 42: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/42.jpg)
How well does it work?
• Converges to a (local) minimum of the cost function – Not for all distances though!
• Is heavily biased by starting positions – Various initialization tricks
• It’s pretty fast!
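One simple initialization trick, shown here purely as an illustration, is to run k-means from several random starts and keep the run with the lowest cost. The function names, the number of restarts, and the fixed iteration count are all assumptions of this sketch.

```python
import numpy as np

def kmeans_once(x, k, rng, n_iter=50):
    """One k-means run from a random start; returns (means, labels, cost)."""
    mu = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave the mean of an empty cluster alone
                mu[j] = x[labels == j].mean(axis=0)
    dist = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Cost = sum of squared distances from each sample to its closest mean.
    return mu, dist.argmin(axis=1), float(dist.min(axis=1).sum())

def kmeans_restarts(x, k, n_restarts=10, seed=0):
    """Keep the lowest-cost result over several random starts."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(x, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

x = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
best_mu, best_labels, best_cost = kmeans_restarts(x, k=3)
print(best_cost)
```

Restarts cost more time but reduce the chance of ending up in a poor solution caused by an unlucky start.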
![Page 43: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/43.jpg)
K-Means on El Capitan
[Figure: two 3-D scatter plots; axes Red, Green, Blue, each from 0 to 1]
![Page 44: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/44.jpg)
K-means on El Capitan
![Page 45: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/45.jpg)
K-means on El Capitan
[Figure: two 3-D scatter plots; axes Red, Green, Blue, each from 0 to 1]
![Page 46: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/46.jpg)
K-means on El Capitan
![Page 47: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/47.jpg)
One problem
• K-means struggles with outliers
[Figure: two 2-D scatter plots; x axis −5 to 15, y axis −30 to 40]
![Page 48: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/48.jpg)
K-medoids
• Medoid: – Least dissimilar data point to all others – Not as influenced by outliers as the mean
• Replace means with medoids – Redesign k-means as k-medoids
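A minimal sketch of that redesign: the assignment step stays the same, and the update step picks, inside each cluster, the data point with the smallest total distance to the other members. The alternating scheme, the Euclidean distance, and the names used are assumptions; other k-medoids variants exist.

```python
import numpy as np

def kmedoids(x, k, n_iter=100, seed=0):
    """Alternating k-medoids; returns (indices of the medoid samples, labels)."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all samples.
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    medoids = rng.choice(len(x), size=k, replace=False)
    for _ in range(n_iter):
        # Assign each sample to its closest medoid.
        labels = d[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members) > 0:
                # Medoid = member with the smallest total distance to the other members.
                within = d[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):  # converged: medoids unchanged
            break
        medoids = new_medoids
    return medoids, labels

x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [100, 100]], dtype=float)
medoids, labels = kmedoids(x, k=2)
print(x[medoids], labels)
```

Unlike a mean, a medoid is always one of the data points, so a single far-away sample cannot drag it into empty space. The price is the pairwise distance matrix, which grows quadratically with the number of samples.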
![Page 49: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/49.jpg)
K-medoids
[Figure: three panels, "Input data", "k-means", "k-medoids"; x axis −10 to 20, y axis −30 to 40]
![Page 50: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/50.jpg)
Vector Quantization
• Use of clustering for compression – Keep a codebook (≈ k-means) – Transmit nearest codebook vector instead of current sample
• We transmit only the cluster index, not the entire data for each sample
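A small sketch of the encode and decode steps, assuming the codebook has already been learned (for example by k-means). The function names and the toy codebook are illustrative only.

```python
import numpy as np

def vq_encode(x, codebook):
    """Replace each sample by the index of its nearest codebook vector."""
    dist = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def vq_decode(indices, codebook):
    """Rebuild an approximation of the samples from the transmitted indices."""
    return codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])   # assumed given
samples = np.array([[0.1, -0.2], [3.7, 4.2], [0.9, 1.3], [0.2, 0.1]])
indices = vq_encode(samples, codebook)       # this is what gets transmitted
reconstructed = vq_decode(indices, codebook)
print(indices, reconstructed)
```

The receiver needs the same codebook, and the reconstruction is lossy: every sample is replaced by its nearest codebook vector.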
![Page 51: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/51.jpg)
Simple example
[Figure: two 2-D scatter plots; axes span roughly −6 to 6]
![Page 52: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/52.jpg)
Vector Quantization in Audio
[Figure: three panels over time: "Input sequence" (frequency vs. time), cluster index (1 to 4) vs. time, and "Coded sequence" (frequency vs. time)]
![Page 53: Clusteringluthuli.cs.uiuc.edu/~daf/courses/CS-498-DAF-PS/lecture 12 - k-means... · • Clustering and unsupervised learning • Hierarchical clustering • K-means, K-medoids, VQ](https://reader030.fdocuments.in/reader030/viewer/2022040309/5f182f0451f1e87fb57ca2c4/html5/thumbnails/53.jpg)
Recap
• Hierarchical clustering – Agglomerative, Divisive – Issues with performance
• K-means – Fast and easy – K-medoids for more robustness (but slower)