Clustering Aggregation

Nir Geffen 021537980, Yotam Margolin 039719729. Supervisor: Professor Zeev Volkovitch. ORT BRAUDE COLLEGE – SE DEPT. 9.12.2011


Transcript of Clustering Aggregation

Page 1: Clustering Aggregation


Nir Geffen 021537980, Yotam Margolin 039719729

Supervisor Professor Zeev Volkovitch

Clustering Aggregation

ORT BRAUDE COLLEGE – SE DEPT.

9.12.2011

Page 2: Clustering Aggregation


Introduction
◦ Goals
◦ Clustering
◦ Spectral Clustering
◦ Cluster Ensembles
◦ Consensus

Spectral Clustering Ensembles
◦ Abstract
◦ Steps
◦ Pseudo

Clustering Aggregation via Self Learning Approach (CASLA)
◦ Abstract
◦ Steps
◦ Pseudo

SE Documents

Table of Contents

Page 3: Clustering Aggregation


Our goal is to investigate the results of different clustering ensemble techniques and to show how the various cluster ensemble methods differ from clustering aggregation via self-learning.

Introduction – Goals

Page 4: Clustering Aggregation


Clustering is a method of unsupervised learning, aimed at partitioning a given data set into subsets named clusters, so that items belonging to the same cluster are similar to each other while items belonging to different clusters are not similar.

Introduction – Clustering

Page 5: Clustering Aggregation


While classic clustering methods give solid results, they also need elaborate similarity functions and pre-configuration.

To make things easier, spectral clustering approaches the clustering problem from a different angle. Instead of clustering the data as-is, we project it onto a space to which most of the noise is perpendicular (orthogonal).

Finally, we cluster the projected data using a classic algorithm to obtain the required partition.

Introduction – Spectral Clustering
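The project-then-cluster idea can be sketched in Python with NumPy. This is only an illustration under assumptions that do not come from the slides: a Gaussian affinity with a hypothetical kernel width `sigma`, the symmetric normalized Laplacian, and exactly two clusters. For brevity the final "classic algorithm" step is swapped for a simple sign split of the second eigenvector; with more clusters one would typically run k-means on several eigenvectors instead.

```python
import numpy as np

def spectral_bipartition(X, sigma=2.0):
    """Split the rows of X into two clusters (illustrative helper)."""
    # Pairwise squared Euclidean distances between all points.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity matrix; sigma is an assumed kernel width.
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so column 1 belongs
    # to the second-smallest eigenvalue.
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, 1] * d_inv_sqrt  # undo the D^{1/2} scaling
    # Stand-in for the classic clustering step: split by sign.
    return (embedding > 0).astype(int)

# Two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = spectral_bipartition(X)
```

Which group receives label 0 and which receives label 1 depends on the sign of the eigenvector, so only the grouping itself is meaningful.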

Page 6: Clustering Aggregation


As no clustering algorithm is agreed to be superior for every data set, a common practice is to obtain several cluster partitions of the same data set.

Our next step is to use a consensus function to combine the resulting partitions into a new one, thereby increasing the robustness of the clustering process.

Introduction – Cluster Ensembles

Page 7: Clustering Aggregation


There are three main algorithms for joining partitions (or clusterings). Because of long computing times, we will only use greedy algorithms.

These algorithms, also known as consensus functions, mostly rely on graph theory.

CSPA (Cluster-based Similarity Partitioning Algorithm): considered the brute-force approach; O(n²) time and space complexity in the number of data points n.

HGPA (HyperGraph Partitioning Algorithm): stable, but not always optimal.

MCLA (Meta-CLustering Algorithm): high-end solution that yields solid results; a worthy competitor to HGPA.

Introduction - Consensus
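To see why a similarity-based consensus function such as CSPA is costly, the following Python sketch builds the pairwise similarity (co-association) matrix that such methods start from: entry (a, b) is the fraction of partitions that place objects a and b in the same cluster. The function name and the toy partitions are hypothetical, and the graph-partitioning step that would follow is not shown.

```python
def co_association(partitions):
    """Return the n x n matrix of pairwise co-clustering frequencies."""
    n = len(partitions[0])
    r = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for a in range(n):
            for b in range(n):
                # Only co-membership matters, not the label names.
                if labels[a] == labels[b]:
                    counts[a][b] += 1
    return [[c / r for c in row] for row in counts]

partitions = [
    [0, 0, 0, 1, 1],  # partition 1
    [1, 1, 0, 0, 0],  # partition 2 (label names are arbitrary)
    [0, 0, 1, 1, 1],  # partition 3
]
S = co_association(partitions)
```

The matrix has n × n entries regardless of how many clusters there are, which is where the quadratic growth in memory and time comes from.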

Page 8: Clustering Aggregation


To make full use of the information included in a data set, a multiway spectral clustering algorithm with a joint model is applied to image segmentation.

To overcome the sensitivity of joint-model-based multiway spectral clustering to the kernel parameter, and to produce robust and stable segmentation results, a spectral clustering ensemble algorithm is used.

Spectral Ensembles - Abstract

Page 9: Clustering Aggregation


1. Produce r individual spectral partitions.
2. Use MCLA to obtain $S_c^{MCLA}(x_i)$.
3. Use HGPA to obtain $S_c^{HGPA}(x_i)$.
4. By the ANMI criterion, get the final decision $S_c^{*}(x_i)$ from $S_c^{MCLA}(x_i)$ and $S_c^{HGPA}(x_i)$.

Spectral Ensembles - Steps
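The final selection step can be sketched in Python as follows, assuming that ANMI means the average normalized mutual information between a candidate consensus labeling and the r individual partitions, with NMI normalized by the geometric mean of the two entropies. The candidate labelings below are hypothetical stand-ins for the MCLA and HGPA outputs, not results of those algorithms.

```python
from collections import Counter
from math import log, sqrt

def nmi(a, b):
    """Normalized mutual information I(a;b) / sqrt(H(a) * H(b))."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in pa.values())
    h_b = -sum(c / n * log(c / n) for c in pb.values())
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # convention chosen here for single-cluster labelings
    mi = sum(c / n * log(c * n / (pa[x] * pb[y]))
             for (x, y), c in pab.items())
    return mi / sqrt(h_a * h_b)

def anmi(candidate, partitions):
    """Average NMI between one candidate and all individual partitions."""
    return sum(nmi(candidate, p) for p in partitions) / len(partitions)

partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, different label names
    [0, 0, 1, 1, 1, 1],
]
cand_mcla = [0, 0, 0, 1, 1, 1]  # hypothetical MCLA output
cand_hgpa = [0, 1, 0, 1, 0, 1]  # hypothetical HGPA output
best = max([cand_mcla, cand_hgpa], key=lambda c: anmi(c, partitions))
```

Because NMI ignores label names, the first two partitions count as identical, which is what makes the criterion usable across independently produced partitions.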

Page 10: Clustering Aggregation


Because clustering is a central task in many research fields, numerous clustering algorithms have been developed and analyzed.

However, no clustering algorithm is agreed to be superior for every data set.

The performance of a clustering algorithm depends greatly on characteristics of the given data set and on parameters used by the algorithm, such as the desired number of clusters in a partition.

CASLA - Motivation

Page 11: Clustering Aggregation


Use various partitions of the same data set in order to define a new metric on the data set.

Using the new metric as an enhanced input for a clustering algorithm will produce better and more robust partitions.

This process can be done repeatedly, where in each step the metric is updated using the original data as well as the new cluster partition.

CASLA - Abstract

Page 12: Clustering Aggregation


1. Let R be an n x n distance matrix based on X (e.g., $R = (X^T X)^{1/2}$ for the Euclidean distance).
2. Determine C, the desired number of clusters.
3. Create cluster C-partitions $\Pi_1, \ldots, \Pi_m$ using m clustering methods, with R as the metric.
4. Compute $\mu_i^j$ and $\Sigma_i^j$ for any cluster $\pi_i^j$ in any $\Pi_i$.
5. Recompute A using Equation (8).
6. Set $R = X^T A X$.
7. Repeat until R converges:
    8. Create a cluster partition $\Pi$ of X using some clustering method, with R as the metric.
    9. Compute $\mu_j$ and $\Sigma_j$ for any cluster $\pi_j$ in $\Pi$.
    10. Recompute A using Equation (8) (for m = 1).
    11. Set $R = X^T A X$.
12. Output $\Pi$.

CASLA – Steps (exterior metric update)

Page 13: Clustering Aggregation


1. Let R be an n x n distance matrix based on X.
2. Determine C, the desired number of clusters.
3. Initialize C random clusters.
4. Compute the cluster centroids $c_1, \ldots, c_C$.
5. Repeat until R converges:
    6. Assign each data element $x_r$ to the cluster $\pi_j$ such that $\|x_r - c_j\|_R$ is minimized.
    7. Compute $\mu_j$ and $\Sigma_j$ for any cluster $\pi_j$ in $\Pi$.
    8. Recompute A using Equation (8) (for m = 1).
    9. Set $R = X^T A X$.
10. Output $\Pi$.

CASLA – Steps (interior metric update)
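The interior metric update loop can be sketched in Python with NumPy. Equation (8) is not reproduced in these slides, so the sketch makes a loudly labeled assumption: A is taken to be the inverse of the (regularized) pooled within-cluster covariance, and the norm in the assignment step is read as the quadratic form induced by A. Rows of `X` are data points here, so the slides' $X^T A X$ becomes `X @ A @ X.T`. All function names and parameters are hypothetical.

```python
import numpy as np

def _centroids(X, labels, n_clusters, rng):
    # Mean of each cluster; an empty cluster is re-seeded with a random point.
    return np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j)
        else X[rng.integers(len(X))]
        for j in range(n_clusters)
    ])

def interior_metric_update(X, n_clusters, max_iter=50, reg=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = np.eye(d)                              # start from the Euclidean metric
    R = X @ A @ X.T                            # n x n matrix under metric A
    labels = rng.integers(n_clusters, size=n)  # random initial clusters
    centroids = _centroids(X, labels, n_clusters, rng)
    for _ in range(max_iter):
        # Assign each point to the nearest centroid under the metric A.
        diff = X[:, None, :] - centroids[None, :, :]
        dist = np.einsum('ncd,de,nce->nc', diff, A, diff)
        labels = dist.argmin(axis=1)
        # Recompute the cluster means and the pooled within-cluster covariance.
        centroids = _centroids(X, labels, n_clusters, rng)
        resid = X - centroids[labels]
        pooled = resid.T @ resid / n + reg * np.eye(d)
        # ASSUMPTION: stand-in for Equation (8), which the slides do not show.
        A = np.linalg.inv(pooled)
        new_R = X @ A @ X.T
        done = np.allclose(new_R, R)           # stop once R no longer changes
        R = new_R
        if done:
            break
    return labels, A, R

X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [4.0, 4.0], [4.2, 4.1], [4.1, 4.3]])
labels, A, R = interior_metric_update(X, n_clusters=2)
```

The regularization term `reg` is a design choice of this sketch that keeps the pooled covariance invertible when a cluster collapses onto its centroid; like k-means, the loop can settle in a local optimum that depends on the random initialization.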

Page 14: Clustering Aggregation


Use Case

Page 15: Clustering Aggregation


SE Documents

Page 16: Clustering Aggregation


◦ Input file (choose).
◦ Run CASLA (Zeev's method) and the spectral clustering ensemble (Xiuli, Wanggen & Licheng) in different threads.
◦ Show the ANMI criterion for each.
◦ Show a colored graph for each.
◦ Show statistics: time evaluation per round, difference in ANMI between the methods, standard deviation of cluster size.
◦ History tab for previous results.

GUI - TODO

Page 17: Clustering Aggregation


[1] Zeev article draft *
[2] Spectral Clustering Ensemble for Image Segmentation, Xiuli, Wanggen & Licheng.
[3] Eyal David
[4] Dhillon
[5] Strehl

References

Page 18: Clustering Aggregation

THE END!