Cluto presentation

Page 1: Cluto presentation

CLUTO: A Clustering Toolkit

By

Roseline Antai

Page 2: Cluto presentation

• CLUTO is a software package for clustering high-dimensional datasets and for analyzing the characteristics of the resulting clusters.

Page 3: Cluto presentation

Algorithms of CLUTO

• Vcluster

• Scluster

Major difference: Input

Vcluster: the actual multidimensional representation of the objects to be clustered.

Scluster: the similarity matrix (or graph) between these objects.
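
To make the difference in input concrete, here is a minimal Python sketch (not part of CLUTO) that turns a small object-by-feature matrix, the kind of data Vcluster starts from, into a pairwise cosine similarity matrix, the kind of data Scluster starts from. The function name and the toy data are illustrative assumptions, and CLUTO's own file formats are not shown.

```python
import math

def cosine_similarity_matrix(rows):
    """Return the pairwise cosine similarities between row vectors."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    sims = []
    for i, a in enumerate(rows):
        sim_row = []
        for j, b in enumerate(rows):
            dot = sum(x * y for x, y in zip(a, b))
            denom = norms[i] * norms[j]
            # An all-zero row has no direction; treat its similarity as 0.
            sim_row.append(dot / denom if denom else 0.0)
        sims.append(sim_row)
    return sims

# Hypothetical toy data: three objects described by three features.
objects = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
similarities = cosine_similarity_matrix(objects)
```

The first two objects point in the same direction, so their similarity is 1; the third shares no features with them, so its similarity to both is 0.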

Page 4: Cluto presentation

Calling Sequence

vcluster [optional parameters] MatrixFile NClusters

scluster [optional parameters] MatrixFile NClusters

Page 5: Cluto presentation

Optional Parameters

• Standard specification

-paramname or -paramname=value

• Three categories:

– Clustering algorithm parameters

– Reporting and Analysis parameters

– Cluster Visualization parameters
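
As an illustration of the calling sequence and the parameter syntax, the following Python sketch assembles an argument list of the form program [optional parameters] MatrixFile NClusters. The helper name `build_cluto_command` and the file name `data.mat` are hypothetical; the option names are the ones that appear in this presentation.

```python
def build_cluto_command(program, matrix_file, n_clusters, **options):
    """Assemble an argv list: program [optional parameters] MatrixFile NClusters."""
    argv = [program]
    for name, value in options.items():
        if value is True:
            # Parameter without a value, e.g. -fulltree
            argv.append(f"-{name}")
        elif value is not None and value is not False:
            # Parameter with a value, e.g. -clmethod=rb
            argv.append(f"-{name}={value}")
    argv += [matrix_file, str(n_clusters)]
    return argv

# "data.mat" is a placeholder input file, not one from the presentation.
command = build_cluto_command(
    "vcluster", "data.mat", 4, clmethod="rb", sim="cos", fulltree=True
)
```

If CLUTO is installed, a list like this could be handed to `subprocess.run(command)`; building a list rather than a single string avoids shell quoting problems with file paths.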

Page 6: Cluto presentation

Clustering algorithm parameters

• Control how CLUTO computes the clustering solution.

• Examples

1. -clmethod=string (rb, agglo, direct, graph, etc.)

2. -sim=string (cos, corr, dist, jacc)

3. -crfun=string (i1, i2, etc.)

4. -fulltree

Page 7: Cluto presentation

Reporting and Analysis Parameters

• Control the amount of information that vcluster and scluster report about the clusters, as well as the analysis performed on the discovered clusters.

• Examples

1. -clustfile=string (default is MatrixFile.clustering.NClusters, or GraphFile.clustering.NClusters)

2. -clabelfile=string (name of the file that stores the labels of the columns; used when -showfeatures, -showsummaries or -labeltree are used)

Page 8: Cluto presentation

3. -rlabelfile=string

4. -rclassfile=string (stores the labels of the rows, i.e. the objects to be clustered)

5. -showtree

6. -showfeatures (descriptive and discriminating)
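
The per-cluster class counts that CLUTO prints when a class file is supplied can be reproduced outside the tool. The sketch below assumes that the clustering file holds one cluster number per line, in row order, and that the class file holds one class label per line in the same order; the helper name and the toy data are hypothetical.

```python
from collections import Counter

def class_distribution(cluster_ids, class_labels):
    """Count, for each cluster, how many objects of each class it contains."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster ids and class labels must have the same length")
    table = {}
    for cid, label in zip(cluster_ids, class_labels):
        table.setdefault(cid, Counter())[label] += 1
    return table

# Hypothetical file contents, already split into lines.
cluster_lines = ["0", "1", "1", "0", "1"]
class_lines = ["Imp", "Evo", "Evo", "Imp", "Sem"]
table = class_distribution([int(c) for c in cluster_lines], class_lines)
```

Here cluster 0 contains two Imp objects, and cluster 1 contains two Evo objects and one Sem object.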

Page 9: Cluto presentation

Cluster Visualization Parameters

• Simple plots of the original input matrix which show how the different objects (rows) and features (columns) are clustered together.

• Examples

1. -plottree=string; gives a graphical representation of the entire hierarchical tree.

2. -plotmatrix=string; shows how the rows of the original matrix are clustered together.

Page 10: Cluto presentation

A practical example

– ../cluto/Linux/vcluster -clmethod=rb -sim=cos -fulltree -rlabelfile=Final_Results/rlabelfile -rclassfile=Final_Results/classfile -showtree -plotformat=gif -plottree=Final_Results/Images/PT-Final10d -plotmatrix=Final_Results/Images/PM-Final10d -plotclusters=Final_Results/Images/PC-Final10d -showfeatures Final_Results/FinalOutput10d-Vt.mat 4

Page 11: Cluto presentation

roselineantai@ubuntu:~/JLSI/jlsi$ ./clusterscript2.sh

********************************************************************************

vcluster (CLUTO 2.1.1) Copyright 2001-03, Regents of the University of Minnesota

Matrix Information -----------------------------------------------------------

Name: Final_Results2/FinalOutput50dFinal.mat, #Rows: 59, #Columns: 59, #NonZeros: 3481

Options ----------------------------------------------------------------------

CLMethod=RB, CRfun=I2, SimFun=Cosine, #Clusters: 4

RowModel=None, ColModel=None, GrModel=SY-DIR, NNbrs=40

Colprune=1.00, EdgePrune=-1.00, VtxPrune=-1.00, MinComponent=5

CSType=Best, AggloFrom=0, AggloCRFun=I2, NTrials=10, NIter=10

Solution ---------------------------------------------------------------------

------------------------------------------------------------------------

4-way clustering: [I2=5.65e+01] [59 of 59], Entropy: 0.438, Purity: 0.729

------------------------------------------------------------------------

cid Size ISim ISdev ESim ESdev Entpy Purty | Sem Imp Deo Evo

------------------------------------------------------------------------

0 2 +0.779 +0.000 +0.638 +0.222 0.000 1.000 | 0 2 0 0

1 15 +0.918 +0.024 +0.853 +0.021 0.453 0.800 | 1 0 2 12

2 22 +0.923 +0.015 +0.866 +0.019 0.416 0.818 | 1 18 3 0

3 20 +0.923 +0.015 +0.869 +0.034 0.496 0.550 | 9 0 11 0

------------------------------------------------------------------------

--------------------------------------------------------------------------------

4-way clustering solution - Descriptive & Discriminating Features...

--------------------------------------------------------------------------------

Cluster 0, Size: 2, ISim: 0.779, ESim: 0.638

Descriptive: col00010 32.4%, col00039 22.7%, col00013 5.6%, col00019 3.9%, col00047 2.2%

Discriminating: col00010 53.5%, col00039 22.3%, col00013 2.5%, col00020 2.1%, col00012 1.4%

Cluster 1, Size: 15, ISim: 0.918, ESim: 0.853

Descriptive: col00013 11.3%, col00006 8.2%, col00019 7.9%, col00058 4.4%, col00047 4.2%

Discriminating: col00006 14.8%, col00007 7.4%, col00031 7.1%, col00029 6.6%, col00053 6.2%

Cluster 2, Size: 22, ISim: 0.923, ESim: 0.866

Descriptive: col00013 14.3%, col00020 7.1%, col00014 6.4%, col00019 4.4%, col00047 4.4%

Discriminating: col00020 7.0%, col00014 5.9%, col00013 5.5%, col00006 5.1%, col00010 4.3%

Cluster 3, Size: 20, ISim: 0.923, ESim: 0.869

Descriptive: col00013 8.9%, col00047 6.6%, col00019 5.2%, col00020 4.4%, col00014 3.8%

Discriminating: col00015 8.0%, col00013 7.4%, col00046 5.2%, col00006 5.0%, col00042 4.3%

--------------------------------------------------------------------------------

------------------------------------------------------------------------------

Hierarchical Tree that optimizes the I2 criterion function...

------------------------------------------------------------------------------

Sem Imp Deo Evo

------------------------------------

6

|---4

| |---2 1 18 3 0

| |---3 9 0 11 0

|-5

|-----1 1 0 2 12

|-----0 0 2 0 0

------------------------------------

------------------------------------------------------------------------------

Timing Information -----------------------------------------------------------

I/O: 0.004 sec

Clustering: 0.008 sec

Reporting: 0.316 sec

********************************************************************************
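
The Entpy and Purty columns in the output above can be checked by hand from the class counts (Sem, Imp, Deo, Evo). The Python sketch below assumes the usual definitions: per-cluster entropy is the Shannon entropy of the class proportions divided by the log of the number of classes, purity is the share of the largest class, and the overall figures are size-weighted averages. Under these assumptions it reproduces the reported values; the function names are illustrative.

```python
import math

def cluster_entropy(counts, n_classes):
    """Shannon entropy of the class proportions, scaled to [0, 1]."""
    size = sum(counts)
    h = -sum((c / size) * math.log(c / size) for c in counts if c > 0)
    return h / math.log(n_classes)

def cluster_purity(counts):
    """Fraction of the cluster that belongs to its largest class."""
    return max(counts) / sum(counts)

def overall_entropy_purity(table):
    """Size-weighted averages of per-cluster entropy and purity."""
    total = sum(sum(counts) for counts in table)
    n_classes = len(table[0])
    entropy = sum(
        sum(counts) / total * cluster_entropy(counts, n_classes) for counts in table
    )
    purity = sum(max(counts) for counts in table) / total
    return entropy, purity

# Class counts per cluster (Sem, Imp, Deo, Evo), copied from the output above.
table = [[0, 2, 0, 0], [1, 0, 2, 12], [1, 18, 3, 0], [9, 0, 11, 0]]
entropy, purity = overall_entropy_purity(table)
print(f"Entropy: {entropy:.3f}, Purity: {purity:.3f}")  # Entropy: 0.438, Purity: 0.729
```

Cluster 3 illustrates why the two measures are read together: its purity is only 0.550 because it is split almost evenly between Sem and Deo, and its entropy (0.496) is the highest of the four clusters.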

Page 12: Cluto presentation

Classfile and rlabelfile

Evo Sem Imp Imp Deo Deo Imp Imp Deo Deo Imp Deo Deo Imp Sem Deo Sem Imp Imp Evo

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Page 13: Cluto presentation
Page 14: Cluto presentation
Page 15: Cluto presentation

The plot uses red to denote positive values and green to denote negative values. Bright red/green indicate large positive/negative values, whereas colors close to white indicate values close to zero.

Page 16: Cluto presentation

Sem 0, Imp 1, Deo 2, Deo 3, Sem 4, Evo 5, Evo 6, Sem 7, Imp 8, Imp 9, Deo 10, Deo 11, Imp 12, Imp 13, Deo 14, Deo 15, Imp 16, Deo 17, Deo 18, Imp 19, Sem 20