Unsupervised Learning I: K-Means Clustering

Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp. 487-515, 532-541, 546-552 (http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf)

Transcript of Unsupervised Learning I: K-Means Clustering

Page 1:

Unsupervised Learning I:

K-Means Clustering

Reading:

Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp. 487-515, 532-541, 546-552

(http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf)

Page 2:

Unsupervised learning = No labels on training examples!

Main approach: Clustering

Page 3:

Example: Optdigits data set

Page 4:

Optdigits features: each digit is an 8×8 grid of pixel values, read off as 64 features f1, f2, ..., f64 (f1 through f8 in the first row, f9 starting the second row, and so on).

x = (f1, f2, ..., f64) = (0, 2, 13, 16, 16, 16, 2, 0, 0, ...), etc.
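As a concrete sketch (assuming scikit-learn is available; its bundled digits dataset is a small variant of Optdigits), each 8×8 image flattens into exactly this kind of 64-dimensional feature vector:

```python
# Minimal sketch: each 8x8 digit image becomes a 64-dimensional feature vector.
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.images.shape)  # (1797, 8, 8)  -- raw 8x8 pixel grids
print(digits.data.shape)    # (1797, 64)    -- the same images, flattened
x = digits.data[0]          # x = (f1, f2, ..., f64)
print(x[:8])                # first row of pixel intensities (values 0-16)
```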

Page 5:

Partitional Clustering of Optdigits

[Figure: points in a 64-dimensional space (axes shown: Feature 1, Feature 2, Feature 3) divided into disjoint clusters]


Page 7:

Hierarchical Clustering of Optdigits

[Figure: the same 64-dimensional points (axes shown: Feature 1, Feature 2, Feature 3) grouped into nested clusters, i.e., clusters of clusters]


Page 10:

Issues for clustering algorithms

•  How to measure distance between pairs of instances?

•  How many clusters to create?

•  Should clusters be hierarchical? (i.e., clusters of clusters)

•  Should clustering be “soft”? (i.e., an instance can belong to multiple clusters, with “weighted belonging”)

Page 11:

Most commonly used (and simplest) clustering algorithm:

K-Means Clustering

Page 12:

[Figure sequence: step-by-step K-means clustering demo]

Adapted from Andrew Moore, http://www.cs.cmu.edu/~awm/tutorials
Page 17:

K-means clustering algorithm

Page 18:

Typically, the mean of the points in a cluster is used as the centroid.

Page 19:

Distance metric: chosen by the user. For numerical attributes, the L2 (Euclidean) distance is often used:

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

The centroid of a cluster here refers to the mean of the points in the cluster.
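The algorithm itself alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points. Below is a minimal NumPy sketch of this loop (Lloyd's algorithm); the initialization, convergence test, and empty-cluster handling are illustrative choices, not the slides' specification:

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Minimal K-means (Lloyd's algorithm) sketch with Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Initialize: pick K distinct data instances as the initial centroids.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points;
        # if a cluster came up empty, reseed it with a random data point.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(len(X))]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, labels
```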

Page 20:

Example: Image segmentation by K-means clustering on color (from http://vitroz.com/Documents/Image%20Segmentation.pdf)

[Figure: segmentation results for K=5 and K=10 in RGB space]


Page 23:

Clustering text documents

•  A text document is represented as a feature vector of word frequencies.

•  The similarity between two documents is measured by the cosine of the angle between their corresponding feature vectors.
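For example, here is a minimal sketch of cosine similarity between two word-frequency vectors (the vectors and counts are illustrative):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word-frequency vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

doc1 = np.array([3, 0, 1, 2])  # illustrative counts for four vocabulary terms
doc2 = np.array([1, 1, 0, 2])
print(cosine_similarity(doc1, doc2))        # near 1.0 => very similar documents
print(1.0 - cosine_similarity(doc1, doc2))  # a common cosine *distance* variant
```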

Page 24:

Figure 4. Two-dimensional map of the PMRA cluster solution, representing nearly 29,000 clusters and over two million articles.

Boyack KW, Newman D, Duhon RJ, Klavans R, et al. (2011) Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches. PLoS ONE 6(3): e18029. doi:10.1371/journal.pone.0018029 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0018029

Page 25:

Exercise 1

Page 26:

How to evaluate clusters produced by K-means?

•  Unsupervised evaluation

•  Supervised evaluation

Page 27:

Unsupervised Cluster Evaluation

We don’t know the classes of the data instances

Let C denote a clustering (i.e., the set of K clusters that results from a clustering algorithm), let ci denote a cluster in C, and let |ci| denote the number of elements in ci.

How can we quantify how good a clustering this is?

[Figure: a clustering C of 21 points into three clusters, with |c1| = 9, |c2| = 6, |c3| = 6]

Page 28:

We want to maximize the coherence of each cluster c, i.e., minimize the distance between the elements of c and the centroid $\mu_c$. That is, minimize the Mean Square Error (mse):

$$\mathrm{mse}(c) = \frac{\sum_{x \in c} d(x, \mu_c)^2}{|c|}$$

$$\text{average mse}(C) = \frac{\sum_{c \in C} \mathrm{mse}(c)}{K}$$

Note: The assigned reading uses sum squared error rather than mean square error.


Page 31:

We also want to maximize the pairwise separation of the clusters. That is, maximize the Mean Square Separation (mss):

$$\mathrm{mss}(C) = \frac{\sum_{\text{all distinct pairs } i, j \in C \; (i \neq j)} d(\mu_i, \mu_j)^2}{K(K-1)/2}$$
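Both criteria are straightforward to compute from cluster assignments. A minimal sketch, assuming `labels` and `centroids` in the form returned by the earlier `kmeans` sketch and no empty clusters:

```python
import numpy as np
from itertools import combinations

def average_mse(X, labels, centroids):
    """Mean over clusters of the mean squared distance to the centroid."""
    K = len(centroids)
    mses = [np.mean(np.sum((X[labels == k] - centroids[k])**2, axis=1))
            for k in range(K)]
    return np.mean(mses)

def mean_square_separation(centroids):
    """Average squared distance over all K(K-1)/2 distinct centroid pairs."""
    pairs = list(combinations(centroids, 2))
    return np.mean([np.sum((a - b)**2) for a, b in pairs])
```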


Page 33:

Exercises 2-3

Page 34:

Supervised Cluster Evaluation

Suppose we know the classes of the data instances: black, blue, green

[Figure: clusters c1, c2, c3 with instances colored by class (black, blue, green)]

Entropy of a cluster: the degree to which a cluster consists of objects of a single class.

Page 36:

$$\mathrm{entropy}(c_i) = -\sum_{j=1}^{|\text{Classes}|} p_{i,j} \log_2 p_{i,j}$$

where $p_{i,j}$ is the probability that a member of cluster i belongs to class j:

$$p_{i,j} = \frac{m_{i,j}}{m_i}$$

where $m_{i,j}$ is the number of instances in cluster i with class j, and $m_i$ is the number of instances in cluster i.

For the example clustering:

$$\mathrm{entropy}(c_1) = -\left(\tfrac{3}{9}\log_2\tfrac{3}{9} + \tfrac{4}{9}\log_2\tfrac{4}{9} + \tfrac{2}{9}\log_2\tfrac{2}{9}\right) = 1.74$$

$$\mathrm{entropy}(c_2) = -\left(\tfrac{1}{6}\log_2\tfrac{1}{6} + \tfrac{4}{6}\log_2\tfrac{4}{6} + \tfrac{1}{6}\log_2\tfrac{1}{6}\right) = 0.78$$

$$\mathrm{entropy}(c_3) = -\left(\tfrac{4}{6}\log_2\tfrac{4}{6} + \tfrac{2}{6}\log_2\tfrac{2}{6}\right) = 0.45$$

Page 37:

Mean entropy of a clustering: the average entropy over all clusters in the clustering, weighted by the number of elements in each cluster:

$$\text{mean entropy}(C) = \sum_{i=1}^{K} \frac{m_i}{m}\, \mathrm{entropy}(c_i)$$

where $m_i$ is the number of instances in cluster $c_i$ and $m$ is the total number of instances in the clustering.

Page 39:

For the example clustering above:

$$\text{mean entropy}(C) = \tfrac{9}{21}(1.74) + \tfrac{6}{21}(0.78) + \tfrac{6}{21}(0.45) = 1.1$$
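Both quantities are easy to compute from each cluster's per-class instance counts. A minimal sketch (the function names are illustrative):

```python
import numpy as np

def entropy(class_counts):
    """Entropy of one cluster, from its per-class instance counts."""
    p = np.array(class_counts, dtype=float)
    p = p[p > 0] / p.sum()            # drop empty classes, then normalize
    return -np.sum(p * np.log2(p))

def mean_entropy(clusters):
    """Per-cluster entropies, weighted by cluster size."""
    sizes = np.array([sum(c) for c in clusters], dtype=float)
    ents = np.array([entropy(c) for c in clusters])
    return np.sum(sizes / sizes.sum() * ents)
```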

Page 40:

What is the mean entropy of this clustering?

[Figure: another clustering of class-labeled points into clusters c1, c2, c3]

Page 41:

Entropy Exercise

Page 42:

Issues for K-means

•  The algorithm is only applicable if the mean is defined.
   –  For categorical data, use K-modes: the centroid is represented by the most frequent values (see the sketch after this list).

•  The user needs to specify K.

•  The algorithm is sensitive to outliers.
   –  Outliers are data points that are very far away from other data points.
   –  Outliers could be errors in the data recording, or special data points with very different values.

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
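As an illustrative sketch of the K-modes idea (not the slides' code), the centroid of a categorical cluster is the per-attribute mode:

```python
from collections import Counter

def mode_centroid(cluster):
    """K-modes centroid: the most frequent value of each categorical attribute."""
    n_attrs = len(cluster[0])
    return tuple(Counter(row[j] for row in cluster).most_common(1)[0][0]
                 for j in range(n_attrs))

# Illustrative categorical instances (color, size, shape):
cluster = [("red", "small", "round"),
           ("red", "large", "round"),
           ("blue", "small", "round")]
print(mode_centroid(cluster))  # ('red', 'small', 'round')
```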

Page 47:

Issues for K-means: Problems with outliers

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt

Page 48:

Dealing with outliers

•  One method is to remove, during the clustering process, data points that are much further away from the centroids than other data points.
   –  Expensive
   –  Not always a good idea!

•  Another method is to perform random sampling. Since sampling chooses only a small subset of the data points, the chance of selecting an outlier is very small.
   –  Assign the remaining data points to clusters by distance or similarity comparison, or by classification.

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt

Page 50:

Issues for K-means (cont.)

•  The algorithm is sensitive to initial seeds.

[Figure: two runs on the same data; different initial seed positions (marked +) lead to different clusterings]

•  If we use different seeds, we may get good results. One can often improve K-means results by doing several random restarts (see the sketch below).

•  It is often useful to select instances from the data as initial seeds.

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
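A sketch of the random-restart strategy, reusing the earlier `kmeans` and `average_mse` sketches and keeping the run with the lowest error:

```python
def kmeans_with_restarts(X, K, n_restarts=10):
    """Run K-means from several random seeds; keep the lowest-error clustering."""
    best = None
    for seed in range(n_restarts):
        centroids, labels = kmeans(X, K, seed=seed)  # kmeans sketch from above
        err = average_mse(X, labels, centroids)
        if best is None or err < best[0]:
            best = (err, centroids, labels)
    return best[1], best[2]
```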

Page 54:

Issues for K-means (cont.)

•  The K-means algorithm is not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres).

[Figure: a non-ellipsoidal cluster, with its K-means centroid marked +]

Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt

Page 55:

Other Issues

•  What if a cluster is empty?
   –  Choose a replacement centroid:
      •  at random, or
      •  from the cluster that has the highest mean square error.

•  How to choose K?

•  The assigned reading discusses several methods for improving a clustering with “postprocessing”.

Page 56:

Choosing the K in K-Means

•  Hard problem! Often there is no “correct” answer for unlabeled data.

•  Many methods have been proposed! Here are a few:

•  Try several values of K and see which is best, via cross-validation.
   –  Metrics: mean square error, mean square separation, a penalty for too many clusters [why?]

•  Start with K = 2, then try splitting each cluster.
   –  The new means are one sigma away from the cluster center, in the direction of greatest variation.
   –  Use metrics similar to those above.

Page 57:

•  “Elbow” method:
   –  Plot average MSE vs. K. Choose the K at which the MSE (or other metric) stops decreasing abruptly (the “elbow” of the plot).
   –  However, sometimes there is no clear “elbow”.
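A sketch of how such a plot could be produced, again reusing the earlier `kmeans` and `average_mse` sketches (matplotlib assumed; the function name is illustrative):

```python
import matplotlib.pyplot as plt

def elbow_plot(X, k_max=10):
    """Plot average MSE vs. K; look for the K where the curve flattens out."""
    ks = range(1, k_max + 1)
    errors = []
    for k in ks:
        centroids, labels = kmeans(X, k, seed=0)  # kmeans sketch from earlier
        errors.append(average_mse(X, labels, centroids))
    plt.plot(ks, errors, marker="o")
    plt.xlabel("K")
    plt.ylabel("average MSE")
    plt.show()
```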