Nearest Neighbors-Based Clustering and Classification Methods
Mustafa Hajij

Page 1:

Nearest Neighbors-Based Clustering and Classification Methods

Mustafa Hajij

Page 2:

Application of K-nearest neighbors: K Neighbors Classifier

This is a supervised classifier that considers the k nearest points to an input point to determine the class that point belongs to. The idea is to use the majority vote of the neighbors of the input point.


Page 4:


Example: with K = 3 and the input point shown, the majority vote is orange, hence this point is classified as orange.

Page 5:


In sklearn:
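The sklearn screenshot from the slide is not preserved in this transcript. As a minimal sketch (assuming scikit-learn is installed; the toy data is made up for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D data: class 0 clusters near the origin, class 1 near (5, 5).
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# K = 3: each query point is labeled by the majority vote of its 3 nearest points.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

pred = clf.predict([[0.5, 0.5], [5.5, 5.5]])
print(pred)  # first query falls in the class-0 cluster, the second in class 1
```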


Page 6:


If k = 1, this classifier simply assigns the point to the class of its single nearest neighbor.


Page 7:


What happens when k is huge?
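One way to see the answer: if k is as large as the training set, every query point has the same neighborhood, so the majority vote always returns the globally most frequent class. A sketch with made-up data (assuming scikit-learn):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Four points of class 0 and two of class 1: class 0 is the global majority.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [9, 9], [10, 10]])
y = np.array([0, 0, 0, 0, 1, 1])

# k equal to the whole training set: every neighborhood is all of X.
big = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)

# Even a query sitting on top of the class-1 cluster receives the majority label.
print(big.predict([[9.5, 9.5]]))
```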

Page 8:


We can choose a different distance function for this classifier to obtain different results; depending on the data, this might be desirable.
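For instance, sklearn's `KNeighborsClassifier` exposes the distance function through its `metric` parameter (the data below is made up; on well-separated points the two metrics agree, but they can differ near class boundaries):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# Same data, two different distance functions via the `metric` parameter.
euclid = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
manhat = KNeighborsClassifier(n_neighbors=3, metric="manhattan").fit(X, y)

print(euclid.predict([[0.2, 0.2]]), manhat.predict([[0.2, 0.2]]))
```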


Page 9:


See this sklearn example

Page 10:

Application of ε-nearest neighbors: Radius Neighbors Classifier

The idea is very similar to before: instead of considering the k nearest neighbors to determine the class, we consider all instances within radius ε of the input point to determine its class.

In sklearn:
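The screenshot is not preserved; a sketch of the radius-based classifier (assuming scikit-learn; the data is made up, and `outlier_label` is the value returned when no training point lies inside the ball):

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# All training points within distance 2.0 of a query vote on its class.
rclf = RadiusNeighborsClassifier(radius=2.0, outlier_label=-1).fit(X, y)

rpred = rclf.predict([[0.5, 0.5], [20, 20]])
print(rpred)  # the second query has no neighbors in its ball
```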

Sklearn example

Page 11:

Neighborhood graphs and their relatives (review)

Page 12:

K-Nearest Neighbor Graph (KNN Graph)

Sklearn implementation

Suppose that we are given a set of points X = {p1, p2, …, pn} in R^d with a distance function d defined on them.

Page 13:


For a fixed integer k, connect the points x, y in X if either d(x, y) ≤ d(x, x_k) or d(x, y) ≤ d(y, y_k), where x_k and y_k are the k-th nearest neighbors of x and y respectively. Doing so for all points, we obtain a graph G(X, E), where E is the set of edges connecting the points in X as described above.


Page 14:


Examples of 1-NN, 2-NN, and 3-NN graphs.

The graph that we obtain is called the K-nearest neighbor graph or K-NN graph.
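The sklearn implementation referenced above is `sklearn.neighbors.kneighbors_graph`. Note that it returns a directed adjacency matrix (row i marks the neighbors of point i); taking the elementwise maximum with its transpose gives the undirected "either x or y" graph defined above. A small sketch with made-up points:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Four points on a line; each point is linked to its 2 nearest neighbors.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
A = kneighbors_graph(X, n_neighbors=2, mode="connectivity", include_self=False)

# Symmetrize the directed matrix to obtain the undirected K-NN graph.
A_sym = A.maximum(A.T)
print(A_sym.toarray())
```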



Page 17:

ε-neighborhood graph

For a fixed ε, connect the points x, y if d(x, y) ≤ ε. Doing so for all points, we obtain a graph G(X, E), where E is the set of edges connecting the points in X as described above.

Sklearn implementation
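As a sketch of the referenced sklearn implementation (`radius_neighbors_graph`; the points and radius are made up):

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

# Connect x and y whenever d(x, y) <= 1.5; the far point stays isolated.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
A_eps = radius_neighbors_graph(X, radius=1.5, mode="connectivity", include_self=False)
print(A_eps.toarray())
```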

Page 18:


This is equivalent to: insert an edge if the disks around the points intersect each other. Note that d(x, y) ≤ ε if and only if the two balls of radius ε/2 centered at x and y intersect.


Page 19:


Trying different values of ε



Page 21:

Euclidean Minimum Spanning Tree (EMST)


The EMST of X is a minimum spanning tree where the weight of the edge between each pair of points is the distance between those two points.
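There is no single-call EMST in sklearn; one common route (an assumption of this sketch, not from the slides) is scipy: build the dense pairwise-distance matrix and run `minimum_spanning_tree` on it.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

# Made-up point set; EMST edge weights are the Euclidean distances.
P = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])
D = squareform(pdist(P))          # dense pairwise-distance matrix
mst = minimum_spanning_tree(D)    # sparse matrix holding the n-1 tree edges

print(mst.getnnz())  # a spanning tree on n points has n - 1 edges
```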

(Figure: EMST of a point set. Image source: Wikipedia.)

Page 22:

Graph Based Clustering Algorithms : Zahn’s algorithm

Suppose that we are given a set of points X = {p1, p2, …, pn} in R^d with a distance function d defined on it.

1. Construct the EMST of X.
2. Remove the inconsistent edges to obtain a collection of connected components (clusters).
3. Repeat step (2) as long as the termination condition is not satisfied.

In this case, an edge in the tree is called inconsistent if its length is more than a certain given length L.

This definition of inconsistency is not always ideal!

Alternatively, we could simply take the desired number of clusters k as an input.

Note that deleting k edges from the spanning tree results in k+1 connected components. In particular, when k = 1, we obtain 2 subtrees.
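The variant above can be sketched as: build the EMST, delete the k−1 longest ("inconsistent") edges, and read off the connected components. The helper name `zahn_clusters` and the toy data are made up for illustration:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def zahn_clusters(X, k):
    """Cut the k-1 longest EMST edges; each remaining component is a cluster."""
    mst = minimum_spanning_tree(squareform(pdist(X))).toarray()
    for _ in range(k - 1):
        # Treat the longest remaining tree edge as the "inconsistent" one.
        i, j = np.unravel_index(np.argmax(mst), mst.shape)
        mst[i, j] = 0.0
    _, labels = connected_components(mst, directed=False)
    return labels

# Two well-separated blobs: asking for k = 2 clusters cuts the bridge edge.
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels = zahn_clusters(pts, 2)
print(labels)
```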


Page 28:

Graph Based Clustering Algorithms : Zahn’s algorithm

Input: a point cloud and the number of connected components k.

Page 29:


Construct the EMST.

Page 30:


Delete an edge


Page 33:


Repeat until the termination criterion is satisfied.

Page 34:

Graph Based Clustering Algorithms : ε-neighborhood graph

Choose an ε and construct the graph.

Consider the connected components

The points of each connected component form a cluster.
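The steps above can be sketched with sklearn and scipy (ε and the points are made-up values):

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph
from scipy.sparse.csgraph import connected_components

# Two well-separated groups; epsilon = 2 only connects points within a group.
X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10]])
A_eps = radius_neighbors_graph(X, radius=2.0, mode="connectivity", include_self=False)

# Each connected component of the epsilon-graph is one cluster.
n_comp, eps_labels = connected_components(A_eps, directed=False)
print(n_comp, eps_labels)
```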


Page 36:

Recall : connected components of a graph

What is a connected component of a graph?

How can we find the connected components of a graph? See this video for a review.

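As a review sketch, a breadth-first search finds the connected components of an adjacency-list graph (the helper `bfs_components` and the example graph are hypothetical):

```python
from collections import deque

def bfs_components(adj):
    """Return the connected components of an adjacency-list graph via BFS."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

# A graph with two components: {0, 1, 2} and {3, 4}.
g = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(bfs_components(g))
```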