Cluster labeling fcl_weeklymeeting30102013

Cluster Labeling with Double Application of SOM. Vahid Moosavi, Researcher at Future Cities Laboratory and PhD Student at the Chair for Computer Aided Architectural Design (CAAD), ETH Zurich, Ludger Hovestadt. SEC, http://www.ad-exchange.fr/. 1 November 2013.

Transcript of Cluster labeling fcl_weeklymeeting30102013

Page 1: Cluster labeling fcl_weeklymeeting30102013

http://www.ad-exchange.fr/

Cluster Labeling with Double Application of SOM

Vahid Moosavi

Researcher at Future Cities Laboratory

PhD Student at Chair for Computer Aided Architectural Design (CAAD), ETH Zurich, Ludger Hovestadt

1 November 2013

Page 2: Cluster labeling fcl_weeklymeeting30102013

Outline

• How to explain clustering and clusters?

• Cluster labeling problem

• Current methods and the proposed method

• The Case: Finding Thematic Research Areas within FCL

Page 3: Cluster labeling fcl_weeklymeeting30102013

How to explain clustering and clusters?

• Conceptually, clusters are:

– Meant to show temporal and evolving identities

– Meant to show concepts that emerge from lower-level concepts

– Built bottom-up from instances vs. from external references

– Decoupling identities from objects, things and instances, and creating new dimensions in between

• Brands

• Spoken languages and dialects

• Academic disciplines

• Genres of movies, music classes, ...

Page 4: Cluster labeling fcl_weeklymeeting30102013

How to explain clustering and clusters?

• And technically, clusters:

– Show the directions of the eigenvectors of the data matrix

– Are the result of transformations, which can be linear or nonlinear

• With new phenomena like Big Data and information sharing, externally defined (sometimes imposed!) references are no longer sufficient:

– Academic disciplines

– Software industries: Google Android vs. Microsoft Windows XP!

The goal: how to make the process of clustering and concept generation computationally practical for individuals?
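The eigenvector remark on this slide can be illustrated with a small sketch. It assumes NumPy and synthetic two-blob data, neither of which comes from the talk: when two clusters are well separated, the leading eigenvector of the covariance of the data points roughly along the line joining the cluster centers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic blobs separated along the direction (1, 1); hypothetical data.
a = rng.normal(loc=[-3.0, -3.0], scale=0.5, size=(200, 2))
b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(200, 2))
X = np.vstack([a, b])

# Eigen-decomposition of the covariance matrix of the centered data.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
leading = eigvecs[:, -1]                 # direction of largest variance

# Unit vector joining the two cluster centers.
between = b.mean(axis=0) - a.mean(axis=0)
between /= np.linalg.norm(between)

# Absolute cosine between the two directions; 1.0 would mean perfectly aligned.
alignment = abs(float(leading @ between))
print(round(alignment, 3))
```

This covers only the linear case. The SOM used later in the talk is a nonlinear transformation, which this sketch does not attempt to reproduce.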

Page 5: Cluster labeling fcl_weeklymeeting30102013

Cluster labeling problem

• Topic Modeling in Natural Language Processing

– Document Clustering

– Automatic Sentiment Analysis

• Market Segmentation and Customer Clustering

– CRM data

– City call center data (Mood of the City)

• Enterprise Knowledge Modeling Using Text Archives

Page 6: Cluster labeling fcl_weeklymeeting30102013

Topic Modeling in Natural Language Processing

The Expression of Emotions in 20th Century Books

Alberto Acerbi et al., 2013

Page 7: Cluster labeling fcl_weeklymeeting30102013

Topic Modeling in Natural Language Processing

The Expression of Emotions in 20th Century Books

Alberto Acerbi et al., 2013

Page 8: Cluster labeling fcl_weeklymeeting30102013

Clustering and Cluster Labeling in Terms of Geography (Andre Skupin, 2005)

Page 9: Cluster labeling fcl_weeklymeeting30102013

Clustering and Cluster Labeling

A Semantic Landscape of the Last.fm Music

Page 10: Cluster labeling fcl_weeklymeeting30102013

Current Methods

• Differential Cluster Labeling

– Mutual Information

– Chi-Squared Selection

• Cluster-Internal Labeling

– Centroid Labels

– Title Labels

– External knowledge labels
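As a minimal sketch of differential cluster labeling with mutual information, the code below scores each term by the mutual information between "document contains the term" and "document belongs to the cluster", and keeps the top-scoring terms as candidate labels. The toy documents, the function names and the rule of keeping only terms over-represented inside the cluster are illustrative assumptions, not taken from the talk.

```python
import math

def mutual_information(n11, n10, n01, n00):
    """MI (in bits) between a term indicator and a cluster indicator.

    n11: docs in the cluster with the term    n10: docs outside the cluster with the term
    n01: docs in the cluster without the term n00: docs outside the cluster without the term
    """
    n = n11 + n10 + n01 + n00
    total = 0.0
    # Each entry: (joint count, term marginal, cluster marginal).
    for joint, term_m, clus_m in [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]:
        if joint > 0:  # zero cells contribute nothing
            total += (joint / n) * math.log2(n * joint / (term_m * clus_m))
    return total

def label_clusters(docs, clusters, top_k=2):
    """docs: list of term sets; clusters: one cluster id per document."""
    vocab = sorted(set().union(*docs))
    labels = {}
    for c in sorted(set(clusters)):
        scores = []
        for term in vocab:
            n11 = sum(1 for d, k in zip(docs, clusters) if term in d and k == c)
            n10 = sum(1 for d, k in zip(docs, clusters) if term in d and k != c)
            n01 = sum(1 for d, k in zip(docs, clusters) if term not in d and k == c)
            n00 = len(docs) - n11 - n10 - n01
            # Keep only terms that are over-represented inside the cluster.
            if n11 * n00 > n10 * n01:
                scores.append((mutual_information(n11, n10, n01, n00), term))
        scores.sort(key=lambda s: (-s[0], s[1]))
        labels[c] = [t for _, t in scores[:top_k]]
    return labels

# Hypothetical toy corpus: two clusters of two documents each.
docs = [
    {"urban", "transport", "model"},
    {"urban", "transport", "data"},
    {"energy", "building", "model"},
    {"energy", "building", "data"},
]
clusters = [0, 0, 1, 1]
labels = label_clusters(docs, clusters)
print(labels)
```

Terms such as "model" and "data" occur equally often in both clusters, so they carry no information about cluster membership and are not proposed as labels. A chi-squared score could be computed from the same 2×2 count table.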

Page 11: Cluster labeling fcl_weeklymeeting30102013

The Proposed Method

• Use of SOM as a nonlinear data-clustering (transformation) and visualization technique

• Use of the concept of tensors to produce the data matrices required by SOM

Page 12: Cluster labeling fcl_weeklymeeting30102013

The Proposed Method

Tensors

(multi-aspect data representation)

Page 13: Cluster labeling fcl_weeklymeeting30102013

Figure: Objects × Features matrices, one per aspect (the surviving label is "Aspect A Features")

• Wavelet Decomposition

• One original object (one signal) is decomposed into several aspects (different scales or frequencies)
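The wavelet example can be sketched as follows, assuming NumPy and the Haar wavelet. The talk does not say which wavelet is meant, and the function name and signal values are hypothetical. One signal is split into a detail band per scale plus a final approximation, so a single object yields several aspects.

```python
import numpy as np

def haar_decompose(signal, levels):
    """Return [detail_1, ..., detail_levels, approximation] for the Haar wavelet.

    Assumes len(signal) is divisible by 2**levels.
    """
    approx = np.asarray(signal, dtype=float)
    aspects = []
    for _ in range(levels):
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # high-frequency part
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)  # low-frequency part
        aspects.append(detail)
    aspects.append(approx)
    return aspects

# Hypothetical signal with 8 samples, decomposed over 2 scales.
signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
aspects = haar_decompose(signal, levels=2)
for i, a in enumerate(aspects):
    print(i, a.shape)

# The orthonormal Haar transform preserves the signal energy.
energy_in = float(np.sum(signal ** 2))
energy_out = float(sum(np.sum(a ** 2) for a in aspects))
```

Applied to many signals, each scale would give its own Objects × Features matrix, which is one way to read the multi-aspect (tensor) representation on this slide.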

Page 14: Cluster labeling fcl_weeklymeeting30102013

The Proposed Method

SOM (as a nonlinear data transformation: here used for clustering and visualization)
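For readers unfamiliar with SOM, here is a minimal online-training sketch in NumPy. The grid size, learning-rate and neighborhood schedules, function names and synthetic data are all illustrative assumptions, and this is not the implementation used in the talk. Each unit of a 2-D grid holds a codebook vector. For every sample, the best matching unit (BMU) and its grid neighbors are pulled toward the sample.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    # Grid coordinates of each unit, shape (rows*cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the codebook with randomly chosen data points.
    codebook = data[rng.integers(0, n, size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighborhood shrinks toward 0.5
        x = data[rng.integers(0, n)]
        bmu = np.argmin(np.sum((codebook - x) ** 2, axis=1))
        grid_d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))    # Gaussian neighborhood on the grid
        codebook += lr * h[:, None] * (x - codebook)
    return codebook, grid

def best_matching_units(data, codebook):
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Hypothetical data: three blobs in three dimensions, 30 records each.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [5, 5, 0], [0, 5, 5]], dtype=float)
data = np.vstack([c + 0.3 * rng.normal(size=(30, 3)) for c in centers])

codebook, grid = train_som(data, rows=4, cols=4)
bmus = best_matching_units(data, codebook)

# Quantization error of the map vs. the error of using the global mean only.
qe = float(np.mean(np.linalg.norm(data - codebook[bmus], axis=1)))
baseline = float(np.mean(np.linalg.norm(data - data.mean(axis=0), axis=1)))
print(qe, baseline)
```

The map is a nonlinear transformation in the sense that each record is replaced by the grid position of its BMU. Clustering and visualization then operate on the grid.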

Page 15: Cluster labeling fcl_weeklymeeting30102013

SOM is a generic machine that normally works with matrices of data

10 records, 100+ dimensions

200+ records, 100+ dimensions

200+ records, 100+ dimensions, but with clear clusters

Page 16: Cluster labeling fcl_weeklymeeting30102013

The Proposed Method

Figure: flow of the proposed method

• X: original data set (Objects × Features), a second-order tensor

• SOM clustering of X gives Y (Objects × clusters vector)

• A tensor step combines X and Y into Z (clusters vector × Features)

• A second SOM is applied to Z

Visualization of the main concepts (potential labels) within each cluster
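The talk does not spell out the tensor operation that produces Z. One plausible reading, sketched here as an assumption, contracts X and Y over the shared Objects index: with Y as a one-hot membership matrix, Y^T X sums the feature vectors of each cluster, and dividing by the cluster sizes gives a mean feature profile per cluster. The numbers below are hypothetical.

```python
import numpy as np

# X: Objects × Features (6 objects, 4 features); hypothetical values.
X = np.array([
    [1.0, 0.0, 0.0, 2.0],
    [3.0, 0.0, 0.0, 2.0],
    [0.0, 4.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 7.0, 0.0],
])

# Clusters vector: one cluster id per object, e.g. from SOM + k-means.
cluster_of = np.array([0, 0, 1, 1, 2, 2])
n_clusters = int(cluster_of.max()) + 1

# Y: Objects × Clusters one-hot membership matrix.
Y = np.eye(n_clusters)[cluster_of]

# Z: Clusters × Features, contracting over the object index and
# normalizing by cluster size (assumed; the talk gives no formula).
Z = (Y.T @ X) / Y.sum(axis=0)[:, None]
print(Z)
```

Each row of Z describes one cluster in terms of all the features, which is the input a second SOM or a simple plot would need.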

Page 17: Cluster labeling fcl_weeklymeeting30102013

The Case: Finding Thematic Research Areas within FCL

Page 18: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

Each row vector shows one person's interest in the selected features

Figure: data matrix X (Objects × Features)

Page 19: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

First plot (each curve is one person)

Page 20: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

SOM

Page 21: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

SOM + K-means clustering

5 clusters detected
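A common way to combine the two steps, sketched here under assumptions, is to run k-means on the SOM codebook vectors and let each object inherit the cluster of its best matching unit. The codebook below is a synthetic stand-in for a trained map, the farthest-first initialization is chosen only to keep the example deterministic, and none of the names or numbers come from the talk.

```python
import numpy as np

def farthest_first(points, k):
    """Pick k initial centers: start at points[0], then repeatedly take the farthest point."""
    idx = [0]
    d2 = ((points - points[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((points - points[nxt]) ** 2).sum(axis=1))
    return points[idx].astype(float)

def kmeans(points, k, n_iter=50):
    centers = farthest_first(points, k)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(0)
group_centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]], dtype=float)

# Hypothetical stand-in for a trained SOM codebook: 4 units near each group.
codebook = np.vstack([c + 0.2 * rng.normal(size=(4, 2)) for c in group_centers])
unit_cluster, _ = kmeans(codebook, k=5)

# Hypothetical objects, 6 per group; each inherits the cluster of its BMU.
objects = np.vstack([c + 0.5 * rng.normal(size=(6, 2)) for c in group_centers])
true_group = np.repeat(np.arange(5), 6)
bmus = ((objects[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
Y = unit_cluster[bmus]   # clusters vector: one cluster id per object
print(Y)
```

Clustering the codebook rather than the raw records means k-means works on a smoothed, much smaller set of vectors. In practice the number of clusters would be chosen from the data rather than fixed in advance as it is here.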

Now: what are the main concepts within each cluster? How should these clusters be labeled?

Figure: clusters vector Y (Objects × clusters vector)

Page 22: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

Tensor based transformation

A simple visualization of each cluster with respect to all the features
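Once a Clusters × Features matrix is available, a simple way to read candidate labels off it is to rank, for each cluster, the features that stand out relative to the other clusters. The sketch below assumes a small hypothetical Z and hypothetical feature names. The contrast against the column mean is an illustrative choice, not the talk's method.

```python
import numpy as np

def candidate_labels(Z, feature_names, top_k=2):
    """For each cluster (row of Z), return up to top_k features above the cross-cluster mean."""
    contrast = Z - Z.mean(axis=0, keepdims=True)
    labels = []
    for row in contrast:
        order = np.argsort(-row)[:top_k]   # indices of the largest contrasts first
        labels.append([feature_names[i] for i in order if row[i] > 0])
    return labels

# Hypothetical feature names and Clusters × Features matrix.
feature_names = ["mobility", "energy", "housing", "simulation"]
Z = np.array([
    [2.0, 0.0, 0.0, 2.0],
    [0.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 6.0, 0.0],
])

labels = candidate_labels(Z, feature_names)
for c, names in enumerate(labels):
    print(c, names)
```

A plain ranking like this treats features independently. The second SOM shown on the following slides instead arranges the features on a map, so related concepts within a cluster appear near each other.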

Figure: Y (Objects × clusters vector) and X (Objects × Features) are combined into Z (clusters vector × Features)

Page 23: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

Tensor based transformation + another SOM
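The second application of SOM can be sketched as follows. The assumption here, which the slides do not state, is that each feature becomes a record described by its profile across clusters (the columns of Z), so features with similar cluster profiles land on the same or nearby map units. The SOM is a minimal NumPy version, and Z, the feature names and all parameters are hypothetical.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=1000, lr0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the codebook with randomly chosen records.
    codebook = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook, grid

# Hypothetical Clusters × Features matrix Z (3 clusters, 6 features).
feature_names = ["f0", "f1", "f2", "f3", "f4", "f5"]
Z = np.array([
    [5.0, 5.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 4.0, 4.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 6.0],
])

# Second SOM: one record per feature, one dimension per cluster.
profiles = Z.T
codebook, grid = train_som(profiles, rows=3, cols=3)
d2 = ((profiles[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
bmus = d2.argmin(axis=1)

for name, unit, dominant in zip(feature_names, bmus, Z.argmax(axis=0)):
    print(name, "map position", tuple(grid[unit].astype(int)), "dominant cluster", int(dominant))
```

Features that share a map region and a dominant cluster are natural candidates for that cluster's label. Features spread evenly over clusters, like `f4` here, say little about any single cluster.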

Page 24: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

Tensor based transformation + another SOM

Page 25: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

Tensor based transformation + another SOM

Page 26: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

Tensor based transformation + another SOM

Page 27: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

Tensor based transformation + another SOM

Page 28: Cluster labeling fcl_weeklymeeting30102013

Finding Thematic Research Areas within FCL

Tensor based transformation + another SOM

Page 29: Cluster labeling fcl_weeklymeeting30102013

Thanks!