Determining the number of clusters using information entropy for mixed data
Presenter: Hong-Yi Cai
Authors: Jiye Liang, Xingwang Zhao, Deyu Li, Fuyuan Cao, Chuangyin Dang
Pattern Recognition (PR), 2012
Motivation
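• Determining the initial parameters of a clustering algorithm, in particular the number of clusters, is one of the most difficult problems in cluster analysis.
• Few existing clustering algorithms can effectively cluster mixed data sets, which contain both numerical and categorical attributes.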
Objectives
• To propose a generalized mechanism for mixed data sets that integrates Rényi entropy (for numerical attributes) and complement entropy (for categorical attributes).
• To improve the k-prototypes algorithm with this new generalized mechanism.
Methodology
• A generalized mechanism for numerical data…
Rényi Entropy:
Parzen window density estimation:
By the convolution theorem…
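The formulas under these three headings did not survive the transcript. For reference, the standard quadratic Rényi entropy is H_2(X) = -log ∫ p(x)^2 dx. With a Gaussian Parzen window estimate p(x) ≈ (1/N) Σ_i G(x - x_i, σ²I), the convolution of two Gaussians is again a Gaussian whose variance is the sum of the two, so ∫ p(x)^2 dx = (1/N²) Σ_i Σ_j G(x_i - x_j, 2σ²I), and the entropy can be computed from pairwise differences alone. The Python sketch below implements that textbook estimator; the kernel width sigma and all function names are our assumptions, and the paper's exact notation may differ.

```python
import math

def gaussian_kernel(diff, var):
    """Isotropic Gaussian density G(diff, var * I) evaluated at the vector diff."""
    d = len(diff)
    sq = sum(v * v for v in diff)
    return math.exp(-sq / (2.0 * var)) / ((2.0 * math.pi * var) ** (d / 2.0))

def information_potential(X, Y, sigma):
    """(1 / (|X||Y|)) * sum_i sum_j G(x_i - y_j, 2 * sigma^2 * I).

    With X == Y this equals the integral of the squared Parzen estimate.
    """
    var = 2.0 * sigma ** 2  # convolving two sigma^2 kernels doubles the variance
    total = 0.0
    for x in X:
        for y in Y:
            total += gaussian_kernel([a - b for a, b in zip(x, y)], var)
    return total / (len(X) * len(Y))

def renyi_quadratic_entropy(X, sigma=1.0):
    """Parzen-window estimate of H_2(X) = -log of the integral of p(x)^2."""
    return -math.log(information_potential(X, X, sigma))
```

Points that are spread out yield a smaller information potential and therefore a larger entropy estimate, so a compact cluster scores low.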
Within-Cluster Entropy:
Between-Cluster Entropy:
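A common way to turn the Parzen estimator into cluster-level quantities, assumed here rather than read off the slides, is to apply it inside one cluster (within-cluster entropy) and across two clusters via the cross information potential (between-cluster entropy). The sketch below is self-contained; the function names and the default sigma are hypothetical.

```python
import math

def _cross_potential(A, B, sigma):
    """(1 / (|A||B|)) * sum over a in A, b in B of G(a - b, 2 * sigma^2 * I).

    Both clusters must be non-empty lists of equal-length numeric vectors.
    """
    var = 2.0 * sigma ** 2
    d = len(A[0])
    norm = (2.0 * math.pi * var) ** (d / 2.0)
    total = 0.0
    for a in A:
        for b in B:
            sq = sum((u - v) ** 2 for u, v in zip(a, b))
            total += math.exp(-sq / (2.0 * var)) / norm
    return total / (len(A) * len(B))

def within_cluster_entropy(cluster, sigma=1.0):
    """Assumed definition: Renyi quadratic entropy of the points in one cluster."""
    return -math.log(_cross_potential(cluster, cluster, sigma))

def between_cluster_entropy(c1, c2, sigma=1.0):
    """Assumed definition: -log of the cross information potential of two clusters."""
    return -math.log(_cross_potential(c1, c2, sigma))
```

For two separated toy clusters such as [[0, 0], [0.1, 0]] and [[5, 5], [5.1, 5]], the between-cluster entropy is larger than either within-cluster entropy, which is the pattern a good partition should show.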
Improved Entropy for numerical data:
Methodology
• A generalized mechanism for categorical data…
Indiscernibility relation…
Complement Entropy:
Within-Cluster Entropy:
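For categorical data, the indiscernibility relation IND(A) from rough set theory groups objects that take identical values on every attribute in A, giving the partition U/IND(A) = {X_1, ..., X_m}. The complement entropy of A is E(A) = Σ_i (|X_i|/|U|)(1 - |X_i|/|U|); it is 0 when all objects agree on A and reaches 1 - 1/|U| when all objects are distinct. A minimal Python sketch follows, with objects represented as dicts from attribute names to values (a representation chosen for illustration). Evaluating it on the members of a single cluster is one plausible reading of the within-cluster entropy heading, but that is an assumption.

```python
from collections import Counter
from fractions import Fraction

def indiscernibility_classes(objects, attrs):
    """Sizes |X_i| of the classes of objects that agree on every attribute in attrs."""
    counts = Counter(tuple(obj[a] for a in attrs) for obj in objects)
    return list(counts.values())

def complement_entropy(objects, attrs):
    """E(A) = sum_i (|X_i|/|U|) * (1 - |X_i|/|U|), computed exactly with Fractions."""
    n = len(objects)
    return sum(
        Fraction(size, n) * (1 - Fraction(size, n))
        for size in indiscernibility_classes(objects, attrs)
    )
```

Adding attributes can only refine the partition, so E grows (or stays equal) as A grows: in a toy table of four objects with colors red, red, blue, blue and sizes S, L, S, S, E({color}) = 1/2 and E({color, size}) = 5/8.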
Improved Entropy for categorical data:
Between-Cluster Entropy:
Huang Dissimilarity for categorical data:
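"Huang dissimilarity" most likely refers to the simple matching measure Huang used in k-modes and k-prototypes: the number of categorical attributes on which two objects differ. The k-prototypes algorithm adds this count, scaled by a weight γ, to the squared Euclidean distance on the numerical attributes. The sketch below shows both; argument names are ours, and the paper may use a variant.

```python
def simple_matching(x_cat, y_cat):
    """Simple matching dissimilarity: number of categorical attributes that differ."""
    return sum(1 for a, b in zip(x_cat, y_cat) if a != b)

def k_prototypes_dissimilarity(x_num, x_cat, y_num, y_cat, gamma):
    """Squared Euclidean distance on numerical parts plus gamma times simple matching."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    return numeric + gamma * simple_matching(x_cat, y_cat)
```

The weight gamma balances the two attribute types; a larger value makes categorical mismatches count more relative to numerical distance.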
Methodology
• Cluster validity index for mixed data…
For numerical data…
For categorical data…
For mixed data…
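The index formulas for the three cases are not in the transcript. Whatever their exact form, a common procedure for determining the number of clusters with a validity index, assumed here, is to cluster the data for each candidate k and keep the k with the best index value. The sketch below assumes a larger-is-better index and uses a hypothetical mixed index that weights the numerical and categorical indices by the share of attributes of each type; that weighting is an illustration only, not the paper's formula.

```python
def mixed_validity_index(num_index, cat_index, n_num_attrs, n_cat_attrs):
    """HYPOTHETICAL: weight each index by the proportion of attributes of its type."""
    m = n_num_attrs + n_cat_attrs
    return (n_num_attrs / m) * num_index + (n_cat_attrs / m) * cat_index

def choose_num_clusters(index_for_k, k_min=2, k_max=10):
    """Return the k in [k_min, k_max] whose validity index is largest.

    index_for_k is any callable mapping k to an index value, for example a
    function that runs k-prototypes with k clusters and scores the result.
    """
    return max(range(k_min, k_max + 1), key=index_for_k)
```

Because max returns the first maximum it meets, ties are broken in favor of the smaller k, which is a reasonable default when two partitions score equally.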
Conclusions
• The generalized mechanism and the improved k-prototypes algorithm cluster mixed data sets effectively and determine the optimal number of clusters for them.