Unsupervised Learning: Clustering
![Page 1: Unsupervised Learning: Clustering](https://reader035.fdocuments.in/reader035/viewer/2022081603/56813515550346895d9c6bea/html5/thumbnails/1.jpg)
Unsupervised Learning: Clustering
Some material adapted from slides by Andrew Moore, CMU.
Visit http://www.autonlab.org/tutorials/ for Andrew’s repository of Data Mining tutorials.
![Page 2: Unsupervised Learning: Clustering](https://reader035.fdocuments.in/reader035/viewer/2022081603/56813515550346895d9c6bea/html5/thumbnails/2.jpg)
Unsupervised Learning
• Supervised learning used labeled data pairs (x, y) to learn a function f : X→Y.
  • But what if we don’t have labels?
• No labels = unsupervised learning
• Only some points are labeled = semi-supervised learning
  • Labels may be expensive to obtain, so we only get a few.
• Clustering is the unsupervised grouping of data points. It can be used for knowledge discovery.
![Page 3: Unsupervised Learning: Clustering](https://reader035.fdocuments.in/reader035/viewer/2022081603/56813515550346895d9c6bea/html5/thumbnails/3.jpg)
Clustering Data
![Page 4: Unsupervised Learning: Clustering](https://reader035.fdocuments.in/reader035/viewer/2022081603/56813515550346895d9c6bea/html5/thumbnails/4.jpg)
K-Means Clustering
K-Means ( k , data )
• Randomly choose k cluster center locations (centroids).
• Loop until convergence
  • Assign each point to the cluster of the closest centroid.
  • Reestimate the cluster centroids based on the data assigned to each.
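The K-Means ( k , data ) steps above can be sketched in plain Python. This is a minimal illustration, not the implementation behind the slides. It assumes Euclidean distance, picks the initial centroids from the data points themselves, treats "no assignment changed" as convergence, and keeps the old centroid when a cluster ends up empty. The names `closest`, `k_means`, `seed`, and `max_iters` are made up for this example.

```python
import math
import random


def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def k_means(k, data, seed=0, max_iters=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Randomly choose k cluster center locations (assumption: k distinct data points).
    centroids = rng.sample(data, k)
    assignment = None
    for _ in range(max_iters):
        # Assign each point to the cluster of the closest centroid.
        new_assignment = [closest(p, centroids) for p in data]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Re-estimate each centroid as the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(data, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = k_means(2, data)
```

On this toy data the two groups should end up in different clusters. Which group gets label 0 depends on the random initial choice.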
![Page 7: Unsupervised Learning: Clustering](https://reader035.fdocuments.in/reader035/viewer/2022081603/56813515550346895d9c6bea/html5/thumbnails/7.jpg)
K-Means Animation
Example generated by Andrew Moore using Dan Pelleg’s super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases 1999.
![Page 8: Unsupervised Learning: Clustering](https://reader035.fdocuments.in/reader035/viewer/2022081603/56813515550346895d9c6bea/html5/thumbnails/8.jpg)
Problems with K-Means
• Very sensitive to the initial points.
  • Do many runs of k-Means, each with different initial centroids.
  • Seed the centroids using a better method than random (e.g., farthest-first sampling).
• Must manually choose k.
  • Learn the optimal k for the clustering. (Note that this requires a performance measure.)
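As a rough illustration of the farthest-first seeding mentioned above, the sketch below picks one seed at random and then repeatedly adds the data point farthest from all seeds chosen so far. It assumes Euclidean distance and data stored as numeric tuples. The name `farthest_first` and its parameters are made up for this example.

```python
import math
import random


def farthest_first(k, data, seed=0):
    """Pick k initial centroids by farthest-first traversal."""
    rng = random.Random(seed)
    centroids = [rng.choice(data)]  # the first seed is still random
    # Distance from every point to its nearest chosen seed so far.
    nearest = [math.dist(p, centroids[0]) for p in data]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        i = max(range(len(data)), key=lambda idx: nearest[idx])
        centroids.append(data[i])
        # Update each point's distance to its nearest seed.
        nearest = [min(d, math.dist(p, data[i])) for p, d in zip(data, nearest)]
    return centroids


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = farthest_first(2, data)
```

On this toy data the two seeds land in different groups whichever point is drawn first. The method always chases the most distant point, so it can pick outliers as seeds. Combining it with several runs, as the slide suggests, is a reasonable safeguard.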
![Page 9: Unsupervised Learning: Clustering](https://reader035.fdocuments.in/reader035/viewer/2022081603/56813515550346895d9c6bea/html5/thumbnails/9.jpg)
Problems with K-Means
• How do you tell it which clustering you want?
  • Constrained clustering techniques
    • Same-cluster constraint (must-link)
    • Different-cluster constraint (cannot-link)
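One way to picture the two constraint types is a check that a clustering algorithm could run before placing a point in a cluster. The sketch below is only an assumption about how such a check might look, not a method taken from the slides. Points are referred to by index, and the names `partner` and `violates` are made up for this example.

```python
def partner(pair, point):
    """Return the other member of pair if point is in it, else None."""
    a, b = pair
    if a == point:
        return b
    if b == point:
        return a
    return None


def violates(point, cluster, assignment, must_link, cannot_link):
    """True if placing point in cluster breaks a constraint.

    assignment maps already-placed point indices to cluster ids;
    must_link and cannot_link are lists of (i, j) index pairs.
    """
    for pair in must_link:
        other = partner(pair, point)
        # Must-link: an already-placed partner has to be in the same cluster.
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for pair in cannot_link:
        other = partner(pair, point)
        # Cannot-link: an already-placed partner must not be in this cluster.
        if other is not None and other in assignment and assignment[other] == cluster:
            return True
    return False


assignment = {0: 0, 1: 1}   # point 0 is in cluster 0, point 1 is in cluster 1
must_link = [(0, 2)]        # points 0 and 2 belong together
cannot_link = [(1, 3)]      # points 1 and 3 must be separated
```

With these constraints, point 2 may only join cluster 0, and point 3 may not join cluster 1.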