Cluto presentation
![Page 1: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/1.jpg)
CLUTO: A Clustering Toolkit
By
Roseline Antai
![Page 2: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/2.jpg)
• CLUTO is a software package for clustering high-dimensional datasets and for analyzing the characteristics of the resulting clusters.
![Page 3: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/3.jpg)
Algorithms of CLUTO
• vcluster
• scluster
Major difference: input
vcluster: the actual multidimensional representation of the objects to be clustered.
scluster: the similarity matrix (or graph) between these objects.
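To make the difference between the two inputs concrete, here is a minimal Python sketch that turns a small object-by-feature matrix (the kind of data vcluster takes) into an object-by-object cosine similarity matrix (the kind of data scluster takes). The toy vectors and the function name are made up for illustration, and the sketch does not write CLUTO's file formats.

```python
import math

def cosine_similarity_matrix(rows):
    """Return the pairwise cosine similarity matrix for a list of feature vectors."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]
    sim = []
    for i, a in enumerate(rows):
        sim_row = []
        for j, b in enumerate(rows):
            dot = sum(x * y for x, y in zip(a, b))
            denom = norms[i] * norms[j]
            # Guard against all-zero vectors, whose norm is 0.
            sim_row.append(dot / denom if denom else 0.0)
        sim.append(sim_row)
    return sim

# Hypothetical toy data: 3 objects described by 4 features (vcluster-style input).
objects = [
    [1.0, 0.0, 2.0, 0.0],
    [2.0, 0.0, 4.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
]

# The 3 x 3 object-by-object similarity matrix (scluster-style input).
similarities = cosine_similarity_matrix(objects)
for row in similarities:
    print(" ".join(f"{value:.3f}" for value in row))
```

The first two objects point in the same direction, so their similarity is 1; the third shares no features with them, so its similarity to both is 0.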
![Page 4: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/4.jpg)
Calling Sequence
vcluster [optional parameters] MatrixFile NClusters
scluster [optional parameters] GraphFile NClusters
![Page 5: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/5.jpg)
Optional Parameters
• Standard specification
-paramname or -paramname=value
• Three categories:
– Clustering algorithm parameters
– Reporting and Analysis parameters
– Cluster Visualization parameters
![Page 6: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/6.jpg)
Clustering algorithm parameters
• Control how CLUTO computes the clustering solution.
• Examples
1. -clmethod=string (rb, agglo, direct, graph, etc.)
2. -sim=string (cos, corr, dist, jacc)
3. -crfun=string (i1, i2, etc.)
4. -fulltree
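As an illustration of the calling sequence and the `-paramname=value` convention, the following Python sketch assembles an argument list for vcluster from keyword options. The helper name `build_cluto_command` and the input file `data.mat` are hypothetical; the sketch only builds the command and does not run it.

```python
def build_cluto_command(program, input_file, n_clusters, **options):
    """Assemble an argument list: program, options, input file, cluster count.

    Options set to True become bare flags (-fulltree);
    all other options become -name=value.
    """
    args = [program]
    for name, value in options.items():
        if value is True:
            args.append(f"-{name}")
        elif value is not None and value is not False:
            args.append(f"-{name}={value}")
    args += [input_file, str(n_clusters)]
    return args

# "data.mat" is a placeholder file name, not a file from the slides.
cmd = build_cluto_command(
    "vcluster", "data.mat", 4,
    clmethod="rb", sim="cos", crfun="i2", fulltree=True,
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)`, assuming the vcluster binary is installed and on the PATH.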
![Page 7: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/7.jpg)
Reporting and Analysis Parameters
• Control the amount of information that vcluster and scluster report about the clusters, as well as the analysis performed on the discovered clusters.
• Examples
1. -clustfile=string (default: MatrixFile.clustering.NClusters, or GraphFile.clustering.NClusters for scluster)
2. -clabelfile=string (name of the file that stores the labels of the columns; used when -showfeatures, -showsummaries, or -labeltree is given)
![Page 8: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/8.jpg)
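3. -rlabelfile=string (name of the file that stores the labels of the rows, i.e. the objects to be clustered)
4. -rclassfile=string (name of the file that stores the class label of each row; used when reporting entropy and purity)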
5. -showtree
6. -showfeatures (descriptive and discriminating)
![Page 9: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/9.jpg)
Cluster Visualization Parameters
• Simple plots of the original input matrix which show how the different objects (rows) and features (columns) are clustered together.
• Examples
1. -plottree = string; gives graphic representation of the entire hierarchical tree
2. -plotmatrix = string; shows how the rows of the original matrix are clustered together.
![Page 10: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/10.jpg)
A practical example
– ../cluto/Linux/vcluster -clmethod=rb -sim=cos -fulltree -rlabelfile=Final_Results/rlabelfile -rclassfile=Final_Results/classfile -showtree -plotformat=gif -plottree=Final_Results/Images/PT-Final10d -plotmatrix=Final_Results/Images/PM-Final10d -plotclusters=Final_Results/Images/PC-Final10d -showfeatures Final_Results/FinalOutput10d-Vt.mat 4
![Page 11: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/11.jpg)
roselineantai@ubuntu:~/JLSI/jlsi$ ./clusterscript2.sh
********************************************************************************
vcluster (CLUTO 2.1.1) Copyright 2001-03, Regents of the University of Minnesota
Matrix Information -----------------------------------------------------------
Name: Final_Results2/FinalOutput50dFinal.mat, #Rows: 59, #Columns: 59, #NonZeros: 3481
Options ----------------------------------------------------------------------
CLMethod=RB, CRfun=I2, SimFun=Cosine, #Clusters: 4
RowModel=None, ColModel=None, GrModel=SY-DIR, NNbrs=40
Colprune=1.00, EdgePrune=-1.00, VtxPrune=-1.00, MinComponent=5
CSType=Best, AggloFrom=0, AggloCRFun=I2, NTrials=10, NIter=10
Solution ---------------------------------------------------------------------
------------------------------------------------------------------------
4-way clustering: [I2=5.65e+01] [59 of 59], Entropy: 0.438, Purity: 0.729
------------------------------------------------------------------------
cid  Size   ISim  ISdev   ESim  ESdev  Entpy  Purty | Sem  Imp  Deo  Evo
------------------------------------------------------------------------
  0     2 +0.779 +0.000 +0.638 +0.222  0.000  1.000 |   0    2    0    0
  1    15 +0.918 +0.024 +0.853 +0.021  0.453  0.800 |   1    0    2   12
  2    22 +0.923 +0.015 +0.866 +0.019  0.416  0.818 |   1   18    3    0
  3    20 +0.923 +0.015 +0.869 +0.034  0.496  0.550 |   9    0   11    0
------------------------------------------------------------------------
--------------------------------------------------------------------------------
4-way clustering solution - Descriptive & Discriminating Features...
--------------------------------------------------------------------------------
Cluster 0, Size: 2, ISim: 0.779, ESim: 0.638
Descriptive: col00010 32.4%, col00039 22.7%, col00013 5.6%, col00019 3.9%, col00047 2.2%
Discriminating: col00010 53.5%, col00039 22.3%, col00013 2.5%, col00020 2.1%, col00012 1.4%
Cluster 1, Size: 15, ISim: 0.918, ESim: 0.853
Descriptive: col00013 11.3%, col00006 8.2%, col00019 7.9%, col00058 4.4%, col00047 4.2%
Discriminating: col00006 14.8%, col00007 7.4%, col00031 7.1%, col00029 6.6%, col00053 6.2%
Cluster 2, Size: 22, ISim: 0.923, ESim: 0.866
Descriptive: col00013 14.3%, col00020 7.1%, col00014 6.4%, col00019 4.4%, col00047 4.4%
Discriminating: col00020 7.0%, col00014 5.9%, col00013 5.5%, col00006 5.1%, col00010 4.3%
Cluster 3, Size: 20, ISim: 0.923, ESim: 0.869
Descriptive: col00013 8.9%, col00047 6.6%, col00019 5.2%, col00020 4.4%, col00014 3.8%
Discriminating: col00015 8.0%, col00013 7.4%, col00046 5.2%, col00006 5.0%, col00042 4.3%
--------------------------------------------------------------------------------
------------------------------------------------------------------------------
Hierarchical Tree that optimizes the I2 criterion function...
------------------------------------------------------------------------------
          Sem Imp Deo Evo
------------------------------------
6
|---4
| |---2     1  18   3   0
| |---3     9   0  11   0
|-5
|-----1     1   0   2  12
|-----0     0   2   0   0
------------------------------------
------------------------------------------------------------------------------
Timing Information -----------------------------------------------------------
I/O: 0.004 sec
Clustering: 0.008 sec
Reporting: 0.316 sec
********************************************************************************
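The Entpy and Purty columns in the solution table can be reproduced from the per-class counts (Sem, Imp, Deo, Evo). The sketch below assumes the usual definitions: purity is the fraction of a cluster taken by its largest class, and entropy is the class entropy of the cluster divided by the logarithm of the number of classes. The overall figures are size-weighted averages over the clusters. With these assumptions the numbers agree with the reported output to three decimals.

```python
import math

# Class counts (Sem, Imp, Deo, Evo) per cluster, copied from the solution table.
clusters = {
    0: [0, 2, 0, 0],
    1: [1, 0, 2, 12],
    2: [1, 18, 3, 0],
    3: [9, 0, 11, 0],
}

def purity(counts):
    """Fraction of the cluster that belongs to its largest class."""
    return max(counts) / sum(counts)

def entropy(counts):
    """Class entropy of the cluster, normalized by log(number of classes)."""
    n = sum(counts)
    h = sum((c / n) * math.log(n / c) for c in counts if c > 0)
    return h / math.log(len(counts))

total = sum(sum(counts) for counts in clusters.values())
for cid, counts in clusters.items():
    print(cid, f"{entropy(counts):.3f}", f"{purity(counts):.3f}")

# Overall values: per-cluster values weighted by cluster size.
overall_entropy = sum(sum(c) / total * entropy(c) for c in clusters.values())
overall_purity = sum(sum(c) / total * purity(c) for c in clusters.values())
print(f"Entropy: {overall_entropy:.3f}, Purity: {overall_purity:.3f}")
```

Cluster 3, for example, is split 9 to 11 between Sem and Deo, which is why its purity (0.550) is the lowest and its entropy (0.496) the highest of the four.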
![Page 12: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/12.jpg)
Classfile and rlabelfile
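Classfile entries: Evo, Sem, Imp, Imp, Deo, Deo, Imp, Imp, Deo, Deo, Imp, Deo, Deo, Imp, Sem, Deo, Sem, Imp, Imp, Evo
rlabelfile entries: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15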
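A minimal Python sketch of how such files could be generated, assuming the layout is one label per line in the same order as the rows of the input matrix. The helper name, the temporary directory, and the shortened label list are illustrative only.

```python
import os
import tempfile

def write_label_file(path, labels):
    """Write one label per line, in row order."""
    with open(path, "w") as handle:
        for label in labels:
            handle.write(f"{label}\n")

# First few class labels from the slide; a real file needs one entry per matrix row.
class_labels = ["Evo", "Sem", "Imp", "Imp", "Deo", "Deo"]
# Row labels here are simply the row indices.
row_labels = [str(i) for i in range(len(class_labels))]

out_dir = tempfile.mkdtemp()  # stand-in for a results directory
class_path = os.path.join(out_dir, "classfile")
label_path = os.path.join(out_dir, "rlabelfile")
write_label_file(class_path, class_labels)
write_label_file(label_path, row_labels)
print(class_path, label_path)
```

The two paths would then be passed to vcluster through -rclassfile and -rlabelfile.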
![Page 13: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/13.jpg)
![Page 14: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/14.jpg)
![Page 15: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/15.jpg)
The plot uses red to denote positive values and green to denote negative values. Bright red or green indicates a large positive or negative value, whereas colors close to white indicate values close to zero.
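The color convention can be sketched as a function from a matrix value to an RGB triple. The linear fade toward white used here is an assumption made for illustration; the exact scale CLUTO uses is not given in the slides.

```python
def value_to_rgb(value, max_abs):
    """Map a value to (r, g, b): red if positive, green if negative, white near zero.

    max_abs is the largest absolute value in the matrix; values beyond it are clipped.
    """
    if max_abs <= 0:
        return (255, 255, 255)
    t = max(-1.0, min(1.0, value / max_abs))
    # The other two channels fade from 255 (white) down to 0 (fully saturated).
    fade = int(round(255 * (1.0 - abs(t))))
    if t >= 0:
        return (255, fade, fade)
    return (fade, 255, fade)

for v in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(v, value_to_rgb(v, max_abs=1.0))
```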
![Page 16: Cluto presentation](https://reader038.fdocuments.in/reader038/viewer/2022100605/559e1b011a28abd75b8b4640/html5/thumbnails/16.jpg)
Sem 0, Imp 1, Deo 2, Deo 3, Sem 4, Evo 5, Evo 6, Sem 7, Imp 8, Imp 9, Deo 10, Deo 11, Imp 12, Imp 13, Deo 14, Deo 15, Imp 16, Deo 17, Deo 18, Imp 19, Sem 20