Clustering Aggregation


Nir Geffen 021537980
Yotam Margolin 039719729

Supervisor Professor Zeev Volkovitch


ORT BRAUDE COLLEGE – SE DEPT.

9.12.2011


Introduction
◦ Goals
◦ Clustering
◦ Spectral Clustering
◦ Cluster Ensembles
◦ Consensus

Spectral Clustering Ensembles
◦ Abstract
◦ Steps
◦ Pseudo

Clustering Aggregation via Self Learning Approach - CASLA
◦ Abstract
◦ Steps
◦ Pseudo

SE Documents

Table of Contents


Our goal is to investigate the results of different clustering ensemble techniques and to show the distinction between the various cluster ensemble methods and clustering aggregation via a self-learning approach.

Introduction – Goals


Clustering is a method of unsupervised learning, aimed at partitioning a given data set into subsets named clusters, so that items belonging to the same cluster are similar to each other while items belonging to different clusters are not similar.

Introduction – Clustering


While classic clustering methods give solid results, they also need elaborate similarity functions and pre-configuration.

To make things easier, Spectral clustering approaches the clustering problem from a different angle. Instead of clustering the data as-is, we project it onto a space to which most noise will be perpendicular (orthogonal).

Finally, we cluster the projected data using a classic algorithm to obtain the required partition.
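The stages described above can be sketched roughly as follows. This is only an illustration under assumptions that the slides do not state: a Gaussian (RBF) affinity with a hypothetical width `sigma`, the symmetrically normalized affinity matrix as the projection, and a plain k-means with deterministic farthest-first initialization standing in for the "classic algorithm". The function names are illustrative.

```python
import numpy as np

def spectral_embed(X, k, sigma=1.0):
    """Project the rows of X onto the k leading eigenvectors of the normalized affinity."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise squared distances
    W = np.exp(-sq / (2.0 * sigma ** 2))                       # assumed RBF affinity
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    M = W / np.sqrt(np.outer(d, d))                            # D^(-1/2) W D^(-1/2)
    _, vecs = np.linalg.eigh(M)                                # eigenvalues in ascending order
    U = vecs[:, -k:]                                           # keep the k largest
    return U / np.linalg.norm(U, axis=1, keepdims=True)        # normalize each row

def kmeans(U, k, iters=50):
    """Plain k-means; centers start from a farthest-first sweep so the run is deterministic."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

# Two small, well-separated groups of points (made-up data).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 5.1], [5.1, 4.8]])
labels = kmeans(spectral_embed(X, k=2), k=2)
```

Clustering happens in the embedded space `U`, not on `X` itself, which is the point of the projection step.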

Introduction – Spectral Clustering


As no clustering algorithm is agreed to be superior on every data set, a common practice is to obtain several cluster partitions of the same data set.

Our next step will be to use a Consensus function to combine the resulting partitions into a new one, thereby increasing the robustness of the clustering process.
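One simple way to see how several partitions can be combined is a co-association matrix, which records for each pair of items the fraction of partitions that place them in the same cluster. The sketch below is an assumption-laden illustration, not one of the consensus functions named in these slides: the final step (thresholding at 0.5 and taking connected components) is a deliberately simple stand-in, and the input partitions are made-up data.

```python
def co_association(partitions):
    """S[a][b] = fraction of partitions in which items a and b share a cluster."""
    n = len(partitions[0])
    S = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for a in range(n):
            for b in range(n):
                if labels[a] == labels[b]:
                    S[a][b] += 1.0 / len(partitions)
    return S

def consensus_by_threshold(S, threshold=0.5):
    """Stand-in consensus: link items whose co-association exceeds the threshold,
    then label each connected component as one cluster."""
    n = len(S)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            a = stack.pop()
            for b in range(n):
                if labels[b] == -1 and S[a][b] > threshold:
                    labels[b] = current
                    stack.append(b)
        current += 1
    return labels

partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, with the label names swapped
    [0, 0, 1, 1, 2, 2],  # a noisier partition with three clusters
]
S = co_association(partitions)
consensus = consensus_by_threshold(S)
```

Because the matrix only asks whether two items share a cluster, it is unaffected by the arbitrary label names that differ between partitions.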

Introduction – Cluster Ensembles


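There are three main algorithms for joining partitions (or clusterings). Due to the long computing time involved, we will use only greedy algorithms.

These algorithms, also known as consensus functions, rely mostly on graph theory.

CSPA (Cluster-based Similarity Partitioning Algorithm) is considered the brute-force approach, with O(n²) time and space complexity in the number of data items n.

HGPA (HyperGraph Partitioning Algorithm) is stable but not always optimal.

MCLA (Meta-CLustering Algorithm) is a high-end solution that yields solid results and is a worthy competitor to HGPA.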

Introduction - Consensus


To make full use of the information included in a data set, a multiway spectral clustering algorithm with a joint model is applied to image segmentation.

To overcome the sensitivity of the joint-model-based multiway spectral clustering to the kernel parameter, and to produce robust and stable segmentation results, a spectral clustering ensemble algorithm is proposed.

Spectral Ensembles - Abstract


1. Produce r individual spectral partitions.
2. Use MCLA to obtain S_c^MCLA(x_i).
3. Use HGPA to obtain S_c^HGPA(x_i).
4. By the ANMI criterion, get the final decision S_c*(x_i) from S_c^MCLA(x_i) and S_c^HGPA(x_i).
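The final selection step can be illustrated with a small sketch of the ANMI (average normalized mutual information) criterion. It assumes the common definition NMI(a, b) = I(a, b) / sqrt(H(a) H(b)), averaged over the r individual partitions. MCLA and HGPA themselves are not implemented here: the two candidate label vectors are made-up placeholders for their outputs, and the function names are illustrative.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Normalized mutual information between two label vectors of equal length."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    ha, hb = entropy(a), entropy(b)
    return mi / sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def anmi(candidate, partitions):
    """Average NMI between one candidate partition and all individual partitions."""
    return sum(nmi(candidate, p) for p in partitions) / len(partitions)

def select_by_anmi(candidates, partitions):
    """Return the name of the candidate with the highest ANMI."""
    return max(candidates, key=lambda name: anmi(candidates[name], partitions))

# r = 3 individual partitions (made-up data).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
# Placeholder outputs standing in for the two consensus functions.
candidates = {
    "MCLA": [0, 0, 0, 1, 1, 1],
    "HGPA": [0, 0, 1, 1, 0, 1],
}
best = select_by_anmi(candidates, partitions)
```

The candidate that shares the most information with the individual partitions, on average, becomes the final decision.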

Spectral Ensembles - Steps


Because clustering is a central task in many research fields, numerous clustering algorithms have been developed and analyzed.

However, no clustering algorithm is agreed to be superior on every data set.

The performance of a clustering algorithm depends greatly on characteristics of the given data set and on parameters used by the algorithm, such as the desired number of clusters in a partition.

CASLA - Motivation


Use various partitions of the same data set in order to define a new metric on the data set.

Using the new metric as an enhanced input for a clustering algorithm will produce better and more robust partitions.

This process can be done repeatedly, where in each step the metric is updated using the original data as well as the new cluster partition.

CASLA - Abstract


1. Let R be an n x n distance matrix based on X (e.g., R = (X^T X)^(1/2) for the Euclidean distance).
2. Determine C, the desired number of clusters.
3. Create cluster C-partitions Π_1, …, Π_m using m clustering methods, with R as the metric.
4. Compute μ_j^i and Σ_j^i for any cluster π_j^i in any Π_i.
5. Recompute A using Equation (8).
6. Set R = X^T A X.
7. Repeat until R converges:
    8. Create a cluster partition Π of X using some clustering method, with R as the metric.
    9. Compute μ_j and Σ_j for any cluster π_j in Π.
    10. Recompute A using Equation (8) (for m = 1).
    11. Set R = X^T A X.
12. Output Π.

CASLA – Steps (exterior metric update)


1. Let R be an n x n distance matrix based on X.
2. Determine C, the desired number of clusters.
3. Initialize C random clusters.
4. Compute the cluster centroids c_1, …, c_C.
5. Repeat until R converges:
    6. Assign each data element x_r to the cluster π_j such that ||x_r – c_j||_R is minimized.
    7. Compute μ_j and Σ_j for any cluster π_j in Π.
    8. Recompute A using Equation (8) (for m = 1).
    9. Set R = X^T A X.
10. Output Π.
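The interior-update loop above can be sketched as follows. Equation (8) is not reproduced in these slides, so `update_metric` is a hypothetical stand-in that uses the inverse of the pooled within-cluster covariance (a Mahalanobis-style metric); the actual CASLA update may differ. The sketch also stores data points as rows of `X`, so the slide's R = X^T A X becomes `X @ A @ X.T`, and the optional `init_labels` argument is an addition for reproducibility.

```python
import numpy as np

def update_metric(X, labels, C, eps=1e-6):
    """Hypothetical stand-in for Equation (8): inverse pooled within-cluster covariance."""
    d = X.shape[1]
    pooled = np.zeros((d, d))
    for j in range(C):
        members = X[labels == j]
        if len(members) > 1:
            centered = members - members.mean(axis=0)   # subtract the cluster mean
            pooled += centered.T @ centered             # accumulate cluster scatter
    pooled = pooled / len(X) + eps * np.eye(d)          # small ridge keeps it invertible
    return np.linalg.inv(pooled)

def casla_interior(X, C, init_labels=None, iters=100, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = np.eye(d)                                       # start from the Euclidean metric
    R = X @ A @ X.T                                     # step 1 (rows of X are data points)
    if init_labels is None:
        labels = rng.integers(0, C, size=n)             # step 3: random clusters
    else:
        labels = np.asarray(init_labels)
    for _ in range(iters):                              # step 5
        # Step 4, refreshed every round: centroids are the current cluster means.
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else X[rng.integers(n)]
            for j in range(C)
        ])
        diff = X[:, None, :] - centroids[None, :, :]
        dist = np.einsum('ncd,de,nce->nc', diff, A, diff)  # step 6: (x - c)^T A (x - c)
        labels = np.argmin(dist, axis=1)
        A = update_metric(X, labels, C)                 # steps 7-8 (stand-in update)
        R_new = X @ A @ X.T                             # step 9
        if np.max(np.abs(R_new - R)) < tol:             # stop once R no longer changes
            break
        R = R_new
    return labels, A                                    # step 10

# Made-up data: two groups, with one point deliberately mislabeled at the start.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 0.0], [10.0, 1.0], [11.0, 0.0]])
labels, A = casla_interior(X, C=2, init_labels=[0, 0, 0, 1, 1, 0])
```

In this toy run the mislabeled point is reassigned in the first pass, after which the metric and the partition stop changing and the loop exits.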

CASLA – Steps (interior metric update)


Use Case


SE Documents


Input file (choose).
Run clustering by Zeev [1] and clustering by Xiuli et al. [2] in different threads.
Show the ANMI criterion for each.
Show a colored graph for each.
Show statistics: time evaluation per round, difference in ANMI between methods, standard deviation of cluster size.
History tab for previous results.

GUI - TODO


[1] Zeev article draft *
[2] Spectral Clustering Ensemble for Image Segmentation, Xiuli, Wanggen & Licheng.
[3] Eyal David
[4] Dhillon
[5] Strehl

References

THE END!
