Introduction to Machine Learning
Introduction to Machine Learning Amo G. Tong 1
Lecture 13: Unsupervised Learning
• K-means Framework
• Cut-based Framework
• Agglomerative Framework
• Divisive Framework
• Some materials are courtesy of Vibhav Gogate, Carlos Guestrin, Dan Klein & Luke Zettlemoyer, Eric Xing, and Hastie.
• All pictures belong to their creators.
Machine Learning
• Supervised Learning: learn 𝒇(𝒙)
  • Parametric. Regression vs. classification (continuous vs. discrete output); linear vs. non-linear.
    • Methods: linear regression, decision trees, neural networks, …
  • Non-parametric. Instance-based learning, e.g., kNN.
• Reinforcement Learning
• Unsupervised Learning
  • Clustering
Clustering
• Input: some data
• Goal: infer group information
• E.g., grouping emails, grouping search results, detecting styles.
source: http://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/_images/plot_cluster_comparison_11.png
source: Edge Foci Interest Points, DOI: 10.1109/ICCV.2011.6126263
• Clustering is subjective (Eric Xing).
• What does "similar" mean here? The choice of similarity measure is part of the problem.
• Output: a partition of the data such that some pattern reflects the group information.
• We have data, but there are no labels.
• We do not know how many clusters there are.
• We do not know which data point belongs to which cluster.
• We do not even know whether a hidden pattern exists.
• BUT we never give up.
• Two families of approaches:
  • Partition-based frameworks
  • Hierarchical clustering frameworks
K-means Framework
• We have some data.
• We can define (a) the similarity between two instances and (b) the center of a set of instances.
• E.g., Euclidean space (real vectors):
  • Distance: ‖𝒙₁ − 𝒙₂‖₂
  • Similarity: 1/distance
  • Center of 𝑥₁, …, 𝑥ₙ: 𝒙̄ = (Σᵢ 𝒙ᵢ)/𝑛
• Suppose there are 𝑘 clusters:
  • Randomly select 𝑘 centers.
  • Repeat:
    • Assign each instance to the closest center (now we have 𝑘 clusters).
    • Recompute the center of each cluster.
  • Until convergence, or until some other criterion is met.
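As a concrete sketch, the framework above fits in a few lines of Python with NumPy. The function name, the initialization-by-sampling, and the toy blob data are my own choices for illustration:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize: randomly pick k distinct instances as the centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: each instance joins its closest center (Euclidean).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: recompute each center as the mean of its cluster
        # (keep the old center if a cluster happens to become empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):    # centers stopped moving: converged
            break
        centers = new
    return labels, centers

# Usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels, centers = kmeans(X, k=2)
```

On well-separated data this typically recovers the blobs; with an unlucky initialization it can converge to a poorer local optimum, one of the weaknesses discussed later in the lecture.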
K-means Framework (Bishop)
• Example (Euclidean space). Suppose 𝑘 = 2.
  • Step 1: randomly pick two centers.
  • Step 2: assign each point to the closest center.
  • Step 3: calculate the center of each cluster.
  • Step 4: assign each point to the closest center again.
  • Repeat until convergence.
• Example (image segmentation). Represent each pixel by its color and cluster the pixels.
  • Formally: partition an image into regions, each of which has a reasonably homogeneous visual appearance.
  • Informally: identify the main elements in an image.
K-means Framework
• Repeat
  • Update the assignment.
  • Update the means (centers).
• Until convergence, or until some other criterion is met.
• Given an assignment 𝐶, let 𝐶(𝑥) be the mean (center) of the cluster containing 𝑥, and consider the squared Euclidean distance 𝑑𝑖𝑠𝑡(𝒙₁, 𝒙₂) = ‖𝒙₁ − 𝒙₂‖₂².
• Will it converge? Yes!
  • Consider the potential function 𝑓 = Σ_{𝑥∈𝐷} 𝑑𝑖𝑠𝑡(𝑥, 𝐶(𝑥)).
  • Updating the assignment will not increase 𝒇: each point moves to a center that is at least as close.
  • Recalculating the means will not increase 𝒇: for a fixed cluster, the mean is the point that minimizes the sum of squared distances to the cluster's members (try the Lagrange multiplier method yourself).
  • 𝑓 never increases and 𝑓 is bounded below by 0 => it will converge.
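The non-increase of 𝑓 can be checked numerically. The toy script below (my own setup; it uses the squared Euclidean distance, for which the mean is the minimizer) performs one assignment update and one mean update and evaluates 𝑓 after each:

```python
import numpy as np

# Toy check: the potential f = Σ_x dist(x, C(x)) (squared Euclidean)
# does not increase under either k-means step.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
labels = rng.integers(0, 2, size=30)              # random initial assignment, k = 2

def potential(X, labels, centers):
    return sum(np.sum((x - centers[l]) ** 2) for x, l in zip(X, labels))

def cluster_means(X, labels, k, fallback):
    # Mean of each cluster; fall back if a cluster is empty.
    return np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                     else fallback[j] for j in range(k)])

centers = cluster_means(X, labels, 2, np.zeros((2, 2)))
f_before = potential(X, labels, centers)

# Assignment update: each point moves to its closest center.
d = np.linalg.norm(X[:, None] - centers[None], axis=2)
labels = d.argmin(axis=1)
f_mid = potential(X, labels, centers)             # f_mid <= f_before

# Mean update: recompute the centers as cluster means.
centers = cluster_means(X, labels, 2, centers)
f_after = potential(X, labels, centers)           # f_after <= f_mid
```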
K-means Framework
• Simple.
• Intuitive: (implicitly) minimizes Σ_{𝑥∈𝐷} dist(𝑥, 𝐶(𝑥)).
• Not time consuming: O(𝑡𝑘𝑛) (𝑘: number of clusters, 𝑡: number of iterations, 𝑛: number of instances).
• BUT:
  • K-means may converge to a local optimum.
  • How many clusters are there?
  • How do we measure the distance between clusters?
  • How do we define the mean? What if the attributes are not real numbers?
  • Cannot handle noise.
  • Not suitable for non-convex patterns (recall the decision pattern of kNN).
Cut-based Clustering
• Two intuitions behind a good clustering:
  • (a) weak connections between objects in different clusters
  • (b) strong connections between objects within a cluster
• Ground set 𝑈 = {𝑣₁, …, 𝑣ₙ}; a similarity 𝑠𝑖𝑚(𝑣ᵢ, 𝑣ⱼ) between every two elements; a partition 𝐶₁, …, 𝐶ₖ of 𝑈.
• Inner-sim(𝐶ᵢ) = Σ_{𝑢,𝑣∈𝐶ᵢ} 𝑠𝑖𝑚(𝑢, 𝑣)
• Inter-sim(𝐶ᵢ) = Σ_{𝑢∈𝐶ᵢ, 𝑣∉𝐶ᵢ} 𝑠𝑖𝑚(𝑢, 𝑣) (the cut)
• How to measure the goodness of a clustering? Cost of a clustering 𝐶₁, …, 𝐶ₖ:
  cost = Σᵢ Inter-sim(𝐶ᵢ) / Inner-sim(𝐶ᵢ)
• Goal: find a clustering that minimizes cost = Σᵢ Inter-sim(𝐶ᵢ) / Inner-sim(𝐶ᵢ).
• An optimal solution exists, but it is hard to find. Enumerate all partitions? Far too many. A polynomial-time algorithm?
• A heuristic algorithm:
  • Initialize 𝐶₁, …, 𝐶ₖ randomly.
  • Repeat until convergence:
    • Unlock all elements.
    • Repeat until all elements are locked:
      • Randomly select one cluster 𝐶ᵢ.
      • Randomly select one unlocked element 𝑣 ∈ 𝐶ᵢ, if any.
      • Move 𝑣 to the cluster such that the 𝒄𝒐𝒔𝒕 is maximally decreased.
      • Lock 𝑣.
• Example (𝑘 = 2). Three elements 𝑎, 𝑏, 𝑐 with 𝑠𝑖𝑚(𝑎, 𝑏) = 1, 𝑠𝑖𝑚(𝑏, 𝑐) = 2, 𝑠𝑖𝑚(𝑎, 𝑐) = 3.
  • Convention: Inner-sim(𝐶ᵢ) = ∞ if |𝐶ᵢ| = 1, so a singleton contributes 0 to the cost. Alternatively, you can do some smoothing by assigning a base similarity.
  • Start with clusters {𝑏, 𝑐} and {𝑎}: cost = 0 + (3 + 1)/2 = 2.
  • If we move 𝑐 to 𝑎's cluster: cost = (1 + 2)/3 + 0 = 1.
  • If we move 𝑏 to 𝑎's cluster: cost = (3 + 2)/1 + 0 = 5.
  • So moving 𝑐 is the cost-minimizing move.
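The arithmetic in this example can be reproduced with a short script (the helper names are mine; the similarity values 𝑠𝑖𝑚(𝑎, 𝑏) = 1, 𝑠𝑖𝑚(𝑏, 𝑐) = 2, 𝑠𝑖𝑚(𝑎, 𝑐) = 3 are read off the slide's figure):

```python
from itertools import combinations

# Pairwise similarities for the three-element example.
sim = {frozenset("ab"): 1, frozenset("bc"): 2, frozenset("ac"): 3}

def inner_sim(cluster):
    return sum(sim[frozenset(p)] for p in combinations(sorted(cluster), 2))

def inter_sim(cluster, universe):
    return sum(sim[frozenset((u, v))] for u in cluster for v in universe - cluster)

def cost(clusters):
    universe = set().union(*clusters)
    total = 0.0
    for c in clusters:
        if len(c) <= 1:          # singleton: inner-sim treated as infinite -> term is 0
            continue
        total += inter_sim(c, universe) / inner_sim(c)
    return total

print(cost([{"b", "c"}, {"a"}]))      # 2.0  ({b,c}: (3+1)/2, {a}: 0)
print(cost([{"a", "c"}, {"b"}]))      # 1.0  (after moving c)
print(cost([{"a", "b"}, {"c"}]))      # 5.0  (after moving b)
```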
• This is a heuristic algorithm: the result may not be optimal.
• Is the solution good? Reasonable: the cost is iteratively decreased.
• Does it converge? Yes.
  • The inner loop terminates: every element gets locked after one pass.
  • The outer loop terminates: the cost decreases whenever a move is made, and there are only finitely many partitions, so the cost cannot decrease forever.
• Are there other choices within this framework? Yes.
  • For example, instead of picking an unlocked element at random, you may select the one that, after being considered, can maximally decrease the cost.
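For concreteness, here is a sketch of the whole lock-and-move heuristic in Python. The naming and structure are my own; it assumes positive pairwise similarities given as a dict over pairs, and treats a singleton's inner-sim as infinite, as in the example above:

```python
import random
from itertools import combinations

def cut_cost(clusters, sim, universe):
    """cost = Σ inter-sim(C)/inner-sim(C); a singleton (or empty) cluster
    contributes 0, since its inner-sim is treated as infinite."""
    total = 0.0
    for c in clusters:
        if len(c) <= 1:
            continue
        inner = sum(sim[frozenset(p)] for p in combinations(c, 2))
        inter = sum(sim[frozenset((u, v))] for u in c for v in universe - c)
        total += inter / inner
    return total

def cut_cluster(universe, sim, k, seed=0, max_rounds=50):
    rng = random.Random(seed)
    elems = list(universe)
    clusters = [set() for _ in range(k)]
    for v in elems:                              # random initialization
        clusters[rng.randrange(k)].add(v)
    for _ in range(max_rounds):
        improved = False
        rng.shuffle(elems)                       # "unlock all": visit each element once
        for v in elems:
            src = next(c for c in clusters if v in c)
            best, best_cost = src, cut_cost(clusters, sim, universe)
            for dst in clusters:                 # try every possible move of v
                if dst is src:
                    continue
                src.remove(v); dst.add(v)
                c = cut_cost(clusters, sim, universe)
                if c < best_cost:
                    best, best_cost = dst, c
                dst.remove(v); src.add(v)        # undo the tentative move
            if best is not src:                  # commit the best move, then "lock" v
                src.remove(v)
                best.add(v)
                improved = True
        if not improved:                         # a full pass made no move: converged
            break
    return clusters

universe = {"a", "b", "c"}
sim = {frozenset("ab"): 1, frozenset("bc"): 2, frozenset("ac"): 3}
clusters = cut_cluster(universe, sim, k=2)
```

Note that on this tiny instance the heuristic happily merges everything into one cluster (cost 0), leaving the other cluster empty; this degenerate behavior is one motivation for adding size constraints, as in the equal-sized exercise at the end of the lecture.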
• Compared to k-means:
  • Both assume the number of clusters is known in advance.
  • Both need some initialization.
  • Both iteratively improve the solution.
  • Cut-based clustering considers both the inter-cluster and the inner-cluster similarity; k-means considers only the inner-cluster similarity.
Agglomerative Clustering
• Idea: combine small clusters.
• Framework:
  • Maintain a set of clusters.
  • Initially, each instance is its own cluster.
  • Repeat: merge the two closest clusters.
  • Until there is only one cluster left.
• Key question: how do we define the closeness of two clusters?
• First, define the closeness of each pair of instances. The closeness of two clusters can then be:
  • the closest pair (single-link clustering);
  • the farthest pair (complete-link clustering; the diameter);
  • the sum, or the average, of all pairs (average-link clustering);
  • Ward's method: if you can define a within-cluster distance, merge the pair of clusters that results in the minimum increase in within-cluster distance.
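The merge loop can be sketched naively in Python (my own code; `linkage=min` gives single-link and `linkage=max` complete-link, and stopping once 𝑘 clusters remain yields a flat clustering):

```python
import numpy as np

def agglomerative(X, k=1, linkage=min):
    """Merge the two closest clusters (under `linkage` over pairwise point
    distances) until only k clusters remain."""
    clusters = [[i] for i in range(len(X))]
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # all pairwise distances
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = linkage(d[p, q] for p in clusters[i] for q in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] += clusters[j]                         # merge the closest pair
        del clusters[j]
    return clusters

# Usage: two tight pairs of points; single-link with k = 2 separates them.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
groups = agglomerative(X, k=2)
```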
Agglomerative Clustering (Hastie)
• The result of agglomerative clustering is a hierarchy of clusters, drawn as a dendrogram.
• So what if we want 𝑘 clusters? Cut the dendrogram at the level where exactly 𝑘 branches remain.
• The dendrogram can also be used to detect outliers: a point that only merges into the hierarchy late, at a large distance, is a candidate outlier.
Divisive Clustering
• Idea: split a large cluster into two
• Framework:
  • Maintain a set of clusters.
  • Initially, all instances form one cluster.
  • Repeat: split one cluster into two.
  • Until each cluster is a singleton.
• Key questions: which cluster should we split, and how do we split it?
Divisive Clustering (Andrea)
• Which cluster should we split?
  • If we grow the entire dendrogram and the splitting rule is local, it does not matter.
  • Otherwise, you may select the cluster with the highest cost.
• How do we split it? There are many choices:
  • Partition the cluster into two equal halves such that the cost is minimized.
  • DIANA (DIvisive ANAlysis).
• DIANA: to divide the selected cluster, the algorithm first looks for its most disparate observation, i.e., the one with the largest average dissimilarity to the other observations of the selected cluster. This observation initiates the "splinter group". In subsequent steps, the algorithm reassigns observations that are closer to the "splinter group" than to the "old party". The result is a division of the selected cluster into two new clusters.
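The split step described above can be sketched as follows (my own code; taking "dissimilarity" to be Euclidean distance and reassigning by average distance to each group are both assumptions):

```python
import numpy as np

def diana_split(X, idx):
    """One DIANA split (sketch): seed the "splinter group" with the most
    disparate observation, then reassign observations that are closer, on
    average, to the splinter group than to the rest (the "old party")."""
    idx = list(idx)
    n = len(idx)
    d = np.linalg.norm(X[idx][:, None] - X[idx][None, :], axis=2)
    # Most disparate observation: largest average dissimilarity to the others.
    splinter = [int((d.sum(axis=1) / (n - 1)).argmax())]
    rest = [i for i in range(n) if i not in splinter]
    moved = True
    while moved:
        moved = False
        for i in list(rest):
            if len(rest) == 1:                 # never empty out the old party
                break
            to_rest = d[i, [j for j in rest if j != i]].mean()
            to_splinter = d[i, splinter].mean()
            if to_splinter < to_rest:          # closer to the splinter group: move
                rest.remove(i)
                splinter.append(i)
                moved = True
    return [idx[i] for i in splinter], [idx[i] for i in rest]

# Usage: two tight groups on a line; the split separates {0,1} from {2,3}.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
g1, g2 = diana_split(X, range(4))
```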
Hierarchical Clustering - Summary
• No need to specify the number of clusters in advance.
• Can be time consuming: the time complexity is at least O(𝑛²), where 𝑛 is the total number of objects.
• The hierarchical structure matches intuition in some domains.
• But the interpretation is subjective.
Summary
• K-means
• Cut-based measurements
• Agglomerative clustering
• Divisive clustering
Spring 2019 Amo G. Tong 52
Equal-sized k-clustering
• Recall cut-based 𝑘-clustering: cost = Σᵢ Inter-sim(𝐶ᵢ)/Inner-sim(𝐶ᵢ), minimized by the heuristic:
  • Initialize 𝐶₁, …, 𝐶ₖ randomly.
  • Repeat until convergence:
    • Unlock all elements.
    • Repeat until all elements are locked:
      • Randomly select one cluster 𝐶ᵢ and one unlocked element 𝑣 ∈ 𝐶ᵢ, if any.
      • Move 𝑣 to the cluster such that the 𝒄𝒐𝒔𝒕 is maximally decreased; lock 𝑣.
• Given a set of 𝑘 ⋅ 𝑚 elements, we want an equal-sized 𝑘-clustering; that is, each cluster has exactly 𝑚 elements.
• Please describe a cut-based algorithm for this purpose.
• Hint: how can you take the sizes of the clusters into account?