K-means*: Clustering by Gradual Data Transformation
University of Eastern Finland, School of Computing, P.O. Box 111, FIN-80101 Joensuu, FINLAND. Tel. +358 13 251 7959, fax +358 13 251 7955, www.uef.fi/cs
Mikko Malinen and Pasi Fränti
Speech and Image Processing Unit
School of Computing
University of Eastern Finland
K-means* clustering: gradual transformation of data
Fit the data to a model.
[Figure: data and model, shown at intermediate and final stages of the transformation]
K-means clustering
Iterate between two steps:
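1. Assignment step: assign the points to the nearest centroids.
$$S_i^{(t)} = \{\, x_j : \lVert x_j - m_i^{(t)} \rVert \le \lVert x_j - m_{i^*}^{(t)} \rVert \ \ \forall\, i^* = 1, \dots, k \,\}$$
2. Update step: update the location of centroids.
$$m_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{x_j \in S_i^{(t)}} x_j$$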
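The two steps can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' code: the function name `kmeans`, the `max_iter` cap, the stopping rule (labels unchanged) and the choice to leave an empty cluster's centroid where it was are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, centroids, max_iter=100):
    """Plain k-means: alternate assignment and update steps until labels stop changing."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(centroids, dtype=float)  # copy, so the caller's array is untouched
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assumed stopping rule: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for i in range(len(centroids)):
            members = X[labels == i]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = members.mean(axis=0)
    return centroids, labels
```

Called as `kmeans(X, initial_centroids)`, it returns the final centroids and the cluster index of each point.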
K-means* clustering
Example of clustering (s2 dataset)
[Figure sequence: clustering of the s2 dataset shown at 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and 100% done]
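The progression from 0% to 100% can be illustrated with a rough Python sketch of the overall idea: start from artificial data that fits a model perfectly, move it back toward the original data in steps, and run k-means after each step. The slides do not give these details, so everything specific here is an assumption made for illustration: the names `kmeans_star` and `nearest`, the random choice of initial centroids, the round-robin placement of points on centroids, the linear interpolation between artificial and original data, the number of k-means iterations per step, and the random reseeding of empty clusters.

```python
import numpy as np

def nearest(X, C):
    """Index of the nearest centroid in C for every row of X."""
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans_star(X, k, steps=10, kmeans_iters=2, seed=0):
    """Sketch of clustering by gradual data transformation (details are assumptions)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Assumed initial model: k distinct data points chosen at random.
    C = X[rng.choice(n, size=k, replace=False)].copy()
    # Artificial data: every point sits exactly on a centroid, so the data
    # fits the model perfectly (0% done). Round-robin placement is an assumption.
    start = C[np.arange(n) % k]
    for s in range(1, steps + 1):
        # Move the artificial data a fraction s/steps of the way back to X
        # (assumed linear interpolation).
        Xs = start + (s / steps) * (X - start)
        for _ in range(kmeans_iters):
            labels = nearest(Xs, C)
            for i in range(k):
                members = Xs[labels == i]
                if len(members) > 0:
                    C[i] = members.mean(axis=0)
                else:
                    # Assumed empty-cluster fix: reseed on a random point.
                    C[i] = Xs[rng.integers(n)]
    # After the last step the transformed data is the original data again.
    return C, nearest(X, C)
```

With `steps=10`, the loop visits the same 10%, 20%, ..., 100% stages as the slide sequence.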
Empty clusters problem
Time Complexity
Phase                    k free           k = O(n)          k = O(sqrt(n))            k = O(1)
Initialization           O(n)             O(n)              O(n)                      O(n)
Data set transform       O(n)             O(n)              O(n)                      O(n)
Empty clusters removal   O(k^2 n)         O(n^3)            O(n^2)                    O(n)
K-means                  O(k n^(kd+1))    O(n^(O(n)d+2))    O(n^(O(sqrt(n))d+3/2))    O(n^(kd+1))
Algorithm total          O(k n^(kd+1))    O(n^(O(n)d+2))    O(n^(O(sqrt(n))d+3/2))    O(n^(kd+1))
Time Complexity: Fixed k-means
Phase                    k free      k = O(n)   k = O(sqrt(n))   k = O(1)
Initialization           O(n)        O(n)       O(n)             O(n)
Data set transform       O(n)        O(n)       O(n)             O(n)
Empty clusters removal   O(k^2 n)    O(n^3)     O(n^2)           O(n)
K-means                  O(kn)       O(n^2)     O(n^1.5)         O(n)
Algorithm total          O(k^2 n)    O(n^3)     O(n^2)           O(n)
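As a worked check, the per-column entries follow from substituting the assumed growth of k into the "k free" bounds. The substitution below is a reading of the table, not a derivation taken from the slides.

```latex
\begin{align*}
\text{Empty clusters removal: } O(k^2 n)
  &\;\xrightarrow{\,k = O(n)\,}\; O(n^2 \cdot n) = O(n^3), \\
  &\;\xrightarrow{\,k = O(\sqrt{n})\,}\; O(n \cdot n) = O(n^2), \\
  &\;\xrightarrow{\,k = O(1)\,}\; O(n). \\[4pt]
\text{Fixed k-means: } O(kn)
  &\;\xrightarrow{\,k = O(n)\,}\; O(n^2), \\
  &\;\xrightarrow{\,k = O(\sqrt{n})\,}\; O(n^{1.5}), \\
  &\;\xrightarrow{\,k = O(1)\,}\; O(n).
\end{align*}
```

In the fixed k-means case the empty clusters removal term dominates in every column, which is why the total equals that row.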
Datasets
Dataset   d    n       k
s1        2    5000    15
s2        2    5000    15
s3        2    5000    15
s4        2    5000    15
bridge    16   4096    256
missa     16   6480    256
house     3    34000   256
thyroid   5    215     2
iris      4    150     2
wine      13   178     3
Mean square error
Dataset k-means proposed GKM optimal
s1 1.85 1.01 0.89 0.89
s2 1.94 1.52 1.33 1.33
s3 1.97 1.71 1.69 1.69
s4 1.69 1.63 1.57 1.57
bridge 168.2 164.7 164.1 160.7
missa 5.33 5.15 5.34 5.12
house 9.88 9.48 5.94 5.86
thyroid 6.97 6.92 1.52 1.52
iris 3.70 3.70 2.02 2.02
wine 1.92 1.90 0.88 0.88
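For reference, a mean square error of a clustering can be computed as in the sketch below. The slides do not state the normalization used for the table, so this version, which averages the squared distance to the nearest centroid over the n points only, is an assumption; some conventions also divide by the dimension d. The function name `mean_square_error` is likewise hypothetical.

```python
import numpy as np

def mean_square_error(X, centroids):
    """Average squared distance from each point to its nearest centroid.

    Assumption: normalized by the number of points n only, not by n * d.
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(centroids, dtype=float)
    # Squared Euclidean distance from every point to every centroid.
    sq = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return sq.min(axis=1).mean()
```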
Mean square error vs. number of steps
[Figures: seven slides of plots with this title]
Number of incorrect clusters
Incorrect clusters   proposed   k-means
0 (all correct)      36%        14%
1                    64%        38%
2                    0%         34%
3                    0%         10%
Summary
• We have presented a clustering method based on gradual transformation of data and k-means. Instead of fitting the model to data, we fit the data to a model.
• In the experiments, the proposed method reaches a mean square error lower than or equal to that of k-means on every tested dataset.