Clustering Algorithms Meta Applier (CAMA) Toolbox

Dmitry S. Shalymov, Kirill S. Skrygan, Dmitry A. Lyubimov

Clustering

• Goals
  – To detect the underlying structure in data
  – To reduce the size of a data set
  – To extract unique objects

• Usage
  – Data mining
  – Machine learning
  – Financial mathematics
  – Optimization
  – Statistics
  – Pattern recognition
  – Development of control strategies

SYRCoSE’09

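Clustering Problem

Given a set of objects $X = \{x_1, x_2, \dots, x_n\}$ with a distance function $\rho(x, x')$, build an algorithm $\mathrm{Alg}: X \to Y$ that assigns each object $x_i$ a cluster label $y_i \in Y$.

Clustering and Classification

Mean within-cluster distance:
$$W = \frac{\sum_{i<j} [y_i = y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i = y_j]} \to \min$$

Mean between-cluster distance:
$$B = \frac{\sum_{i<j} [y_i \neq y_j]\, \rho(x_i, x_j)}{\sum_{i<j} [y_i \neq y_j]} \to \max$$

Here $[\cdot]$ equals 1 if the condition holds and 0 otherwise.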
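Both functionals can be evaluated directly for a given labelling. The sketch below is a minimal illustration, assuming points are tuples of floats and defaulting $\rho$ to the Euclidean distance (`math.dist`); the function name `within_between` is hypothetical and not part of CAMA.

```python
import math
from itertools import combinations
from typing import Callable, Hashable, Sequence, Tuple

Point = Sequence[float]


def within_between(
    points: Sequence[Point],
    labels: Sequence[Hashable],
    rho: Callable[[Point, Point], float] = math.dist,  # assumed: Euclidean distance
) -> Tuple[float, float]:
    """Return (W, B) for a labelling of the points.

    W is the mean distance over pairs i < j with y_i == y_j,
    B is the mean distance over pairs i < j with y_i != y_j.
    A functional with no contributing pairs is returned as NaN.
    """
    w_sum, w_cnt, b_sum, b_cnt = 0.0, 0, 0.0, 0
    for i, j in combinations(range(len(points)), 2):  # all pairs with i < j
        d = rho(points[i], points[j])
        if labels[i] == labels[j]:
            w_sum += d
            w_cnt += 1
        else:
            b_sum += d
            b_cnt += 1
    W = w_sum / w_cnt if w_cnt else math.nan
    B = b_sum / b_cnt if b_cnt else math.nan
    return W, B


pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
W, B = within_between(pts, [0, 0, 1, 1])
```

For these four points, the split {(0,0), (0,1)} vs {(5,0), (5,1)} gives W = 1, while the labelling [0, 1, 0, 1] raises W to 5 and lowers B, so both criteria prefer the first split.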

Variety of Clustering Algorithms

• Hierarchical
  – Agglomerative
  – Partitioning

• Iterative
  – Hard (K-means, SVM, SPSA)
  – Fuzzy (FCM)

Important parameters:
  – Distance norm
  – Number of clusters
  – Initial values of cluster centers
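A minimal sketch of Lloyd's K-means, one of the hard iterative methods listed, makes the three parameters explicit: the distance function, the number of clusters (implied here by the number of initial centers), and the initial centers themselves. The function names `assign` and `kmeans` are hypothetical. Note that the mean update is the exact minimizer only for squared Euclidean distance; with another norm it is a heuristic.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def assign(points: Sequence[Point], centers: Sequence[Point], dist: Distance) -> List[int]:
    """Label each point with the index of its nearest center under `dist`."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k])) for p in points]


def kmeans(
    points: Sequence[Point],
    init_centers: Sequence[Point],  # initial cluster centers; K = len(init_centers)
    dist: Distance = math.dist,  # distance norm (Euclidean by default)
    max_iter: int = 100,
) -> Tuple[List[Point], List[int]]:
    centers = [tuple(map(float, c)) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers, dist)
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Mean of the cluster's points; an empty cluster keeps its old center.
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else old
            )
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return centers, assign(points, centers, dist)
```

On the points (0,0), (0,1), (5,0), (5,1), starting from centers (0,0) and (5,0) yields the labels [0, 0, 1, 1], while starting from (0,0) and (0,1) converges to the worse split [0, 1, 0, 1], which shows why the initial centers matter. A Manhattan norm can be passed as `dist=lambda a, b: sum(abs(x - y) for x, y in zip(a, b))`.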

Cluster Stability Algorithms

• Indexes

• Stability (similarity, merit) functions

• Probabilistic measures assessing the likelihood of a decision

• Density estimation approaches

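Stochastic Approximation

Recursive stochastic approximation: find $\theta^*$ such that $g(\theta^*) = 0$, where $g(\theta) = \partial L / \partial \theta$, using the recursion
$$\hat\theta_{k+1} = \hat\theta_k - a_k\, \hat g_k(\hat\theta_k),$$
where $\hat g_k$ estimates the gradient $g$ from measurements $y(\cdot)$ of $L$.

FDSA, for $i = 1, \dots, p$, with $e_i$ the $i$-th unit vector:
$$\hat g_{ki}(\hat\theta_k) = \frac{y(\hat\theta_k + c_k e_i) - y(\hat\theta_k - c_k e_i)}{2 c_k}$$

SPSA, with a random perturbation vector $\Delta_k = (\Delta_{k1}, \Delta_{k2}, \dots, \Delta_{kp})^T$:
$$\hat g_{ki}(\hat\theta_k) = \frac{y(\hat\theta_k + c_k \Delta_k) - y(\hat\theta_k - c_k \Delta_k)}{2 c_k \Delta_{ki}}$$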
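The sketch below contrasts the two gradient estimates and runs the recursion with SPSA. Assumptions not taken from the slides: $\Delta_k$ has independent ±1 entries, and the gains are $a_k = a/(k+1+A)^{\alpha}$ and $c_k = c/(k+1)^{\gamma}$ with $\alpha = 0.602$, $\gamma = 0.101$, a common practical choice. All function names and the quadratic `loss` are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]
Measure = Callable[[Vector], float]  # y(theta): (possibly noisy) measurement of L


def fdsa_gradient(y: Measure, theta: Vector, c: float) -> Vector:
    """FDSA estimate: perturb one coordinate at a time (2p measurements)."""
    grad = []
    for i in range(len(theta)):
        plus, minus = theta[:], theta[:]
        plus[i] += c
        minus[i] -= c
        grad.append((y(plus) - y(minus)) / (2 * c))
    return grad


def spsa_gradient(y: Measure, theta: Vector, c: float, rng: random.Random) -> Vector:
    """SPSA estimate: perturb all coordinates at once (2 measurements)."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # assumed: Bernoulli +/-1 entries
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = y(plus) - y(minus)
    return [diff / (2 * c * d) for d in delta]


def spsa_minimize(y: Measure, theta0: Vector, n_iter: int = 2000, a: float = 0.1,
                  c: float = 0.1, A: float = 10.0, alpha: float = 0.602,
                  gamma: float = 0.101, seed: int = 0) -> Vector:
    """Recursion theta_{k+1} = theta_k - a_k * g_k(theta_k) with SPSA gradients."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(n_iter):
        a_k = a / (k + 1 + A) ** alpha  # assumed gain sequences
        c_k = c / (k + 1) ** gamma
        g = spsa_gradient(y, theta, c_k, rng)
        theta = [t - a_k * gk for t, gk in zip(theta, g)]
    return theta


def loss(theta: Vector) -> float:
    """Hypothetical noise-free test loss with its minimum at (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2
```

FDSA spends $2p$ measurements of $y$ per iteration, whereas SPSA uses two regardless of the dimension $p$, which matters when each measurement is expensive. On the quadratic above, `spsa_minimize(loss, [0.0, 0.0])` ends close to (1, -2).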

Effectiveness of SPSA

Finding the number of clusters in a data set

• Run the SPSA algorithm for different numbers of clusters, $K$, and calculate the corresponding distortions $\hat d_K$

• Select a transformation power, $Y > 0$

• Calculate the “jumps” in transformed distortion: $J_K = \hat d_K^{-Y} - \hat d_{K-1}^{-Y}$, with $\hat d_0^{-Y} = 0$

• Estimate the number of clusters in the data set by $K^* = \arg\max_K J_K$
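Once the distortions for $K = 1, \dots, K_{\max}$ are available (from SPSA runs in the slides, or any other clusterer), the remaining steps take a few lines. The function name `jump_method` is hypothetical, and the distortion values used below are made-up illustrative numbers, not results. A frequently suggested transformation power is $Y = p/2$ for $p$ features, but that choice is an assumption here.

```python
from typing import Dict, Tuple


def jump_method(distortions: Dict[int, float], Y: float) -> Tuple[int, Dict[int, float]]:
    """Return (K*, jumps) where J_K = d_K^(-Y) - d_{K-1}^(-Y) and d_0^(-Y) = 0.

    `distortions` must map K = 1, 2, ..., K_max to positive distortions d_K.
    """
    ks = sorted(distortions)
    if not ks or ks != list(range(1, len(ks) + 1)):
        raise ValueError("distortions must be given for K = 1, 2, ..., K_max")
    if Y <= 0:
        raise ValueError("transformation power Y must be positive")
    jumps: Dict[int, float] = {}
    prev = 0.0  # transformed distortion for K = 0 is taken as 0
    for k in ks:
        cur = distortions[k] ** (-Y)
        jumps[k] = cur - prev
        prev = cur
    k_star = max(jumps, key=jumps.get)  # argmax_K J_K
    return k_star, jumps
```

With distortions 10, 6, 1, 0.9, 0.8 for $K = 1, \dots, 5$ and $Y = 1$, the largest jump occurs at $K = 3$.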

Detecting the structure of a data set

Examples

• Iris (3 clusters, 4 features, 150 instances)

• Wine (3 clusters, 13 features, 178 instances)

• Breast Cancer (2 clusters, 32 features, 569 instances)

• Image Segmentation (7 clusters, 19 features, 2310 instances)

Software Tools for Clustering Analysis

• Research
  – COMPACT
  – DCPR (Data Clustering & Pattern Recognition)
  – FCDA (Fuzzy Clustering and Data Analysis Toolbox)
  – ClusterPack Matlab Toolbox
  – The Curve Clustering Toolbox
  – SOM (Self-Organizing Map)
  – Spectral Clustering Toolbox
  – Yashil's FCM Clustering

• Commercial software
  – SPSS
  – STATISTICA

• Characteristics
  – Visualization
  – Effectiveness analysis with patterns
  – Tools to check performance

• Shortcomings
  – Limited number of data sets and algorithms
  – No way to load one's own algorithm
  – No online services
  – Dependence on MATLAB

Clustering Algorithms Meta Applier

CAMA. Kernel

CAMA Toolbox

http://ancient.punklan.net:8084/CAMA2/index.jsp

Thank you!
