Mean Shift-Based Clustering
Kuo-Lung Wua and Miin-Shen Yangb
aDepartment of Information Management, Kun Shan University of Technology, Yung-Kang,
Tainan, Taiwan 71023, ROC
bDepartment of Applied Mathematics, Chung Yuan Christian University, Chung-Li, Taiwan
32023, ROC
Abstract- In this paper, a mean shift-based clustering algorithm is proposed. The mean shift
is a kernel-type weighted mean procedure. Herein, we first discuss three classes of kernels, the
Gaussian, Cauchy and generalized Epanechnikov kernels, with their shadows. The robustness
properties of the mean shift based on these three kernels are then investigated. Based on the
mountain function concept, we propose a graphical method of correlation comparisons for
estimating the defined stabilization parameter. The proposed method solves the bandwidth
selection problem from a different point of view. Numerical examples and comparisons
demonstrate the advantages of the proposed method, including computational complexity,
cluster validity, and improvements of the mean shift on large continuous and discrete data sets.
We finally apply the mean shift-based clustering algorithm to image segmentation.
Index Terms: Kernel functions, mean shift, robust clustering, generalized Epanechnikov
kernel, bandwidth selection, parameter estimation, mountain method, noise.
1. INTRODUCTION
Kernel-based methods are widely used in many applications [1], [2]. There are two ways
of implementing kernel-based methods in supervised and unsupervised learning. One
way is to transform the data space into a high-dimensional feature space F where the inner
products in F can be represented by a Mercer kernel function defined on the data space (see
Refs. [1-5]). An alternative way is to find a kernel density estimate on the data space and then
search the modes of the estimated density [6]. The mean shift [7], [8] and the mountain
method [9] are two simple techniques that can be used to find the modes of a kernel density
estimate.
Fukunaga and Hostetler [7] proposed the mean shift procedure based on asymptotic
unbiasedness, consistency and uniform consistency of a nonparametric density function
gradient estimate using a generalized kernel approach. This technique has been applied in
image analysis [10], [11], texture segmentation [12], [13], object tracking [14], [15], [16]
and data fusion [17]. Cheng [8] clarified the relationship between mean shift and optimization
by introducing the concept of shadows. He showed that a mean shift is an instance of gradient
ascent with an adaptive step size. Cheng [8] also proved some of the convergence properties
of a blurring mean shift procedure and showed some peculiar behaviors of mean shift in
cluster analysis with the most used Gaussian kernel. Moreover, Fashing and Tomasi [18]
showed that mean shift is a bound optimization and is equivalent to Newton’s method in the
case of piecewise constant kernels.
Yager and Filev [9] proposed mountain methods to find the approximate modes of the
data set via the mountain function, which is equivalent to the kernel density estimate defined
on the grid nodes. Chiu [19] modified the mountain method by defining the mountain
function, not on the grid nodes, but on the data set. Recently, Yang and Wu [20] proposed a
modified mountain method to identify the number of modes of the mountain function. The
applications of the mountain method can be found in [21], [22]. In this paper, we will propose
a mean shift-based clustering method using the concept of mountain functions to solve the
bandwidth selection problem in the mean shift procedure.
The bandwidth selection for a kernel function directly affects the performance of the
density estimation. It also heavily influences the performance of the mean shift procedure.
The modes found by the mean shift do not adequately represent the dense areas of the data set
if a poor bandwidth estimate is used. Comaniciu and Meer [10] summarized four different
techniques according to statistical analysis-based and task-oriented points of view. One
practical bandwidth selection technique is related to the stability of decomposition for a
density shape estimate. The bandwidth is taken as the center of the largest operating range
over which the same number of clusters are obtained for the given data [23]. Similar concepts
are used by Beni and Liu [24] to estimate the resolution parameter of a least biased fuzzy
clustering method, and also to solve the cluster validity problem. In this paper, we will offer
an alternative method of solving this problem. We first normalize the kernel function and then
estimate the defined stabilization parameter. This estimation method will be discussed in
Section 3 based on the mountain function concept.
The properties of the mean shift procedures are reviewed in Section 2.1. We discuss
some special kernels with their shadows and then define generalized Epanechnikov kernels in
Section 2.2. Note that most mean shift-based algorithms do not consider the property of
robustness [25] that is often employed in clustering [26], [27], [28], [29], [30]. In
Section 3.1, we discuss the relation between the mean shift and robust statistics based on
the discussed kernel functions. In Section 3.2, we propose an alternative technique that
approaches the bandwidth selection problem from a different point of view. The purpose of
bandwidth selection is to find a suitable bandwidth (covariance) for the kernel function of
a data point so that a suitable kernel function will induce a good density
estimate. Our technique assigns a fixed sample variance for the kernel function so that a
suitable stabilization parameter can induce a satisfactory density estimate. We then propose a
technique to estimate the stabilization parameter using an adaptive mountain function in
Section 3.3. According to our analysis, we propose the mean shift-based clustering algorithm
based on the defined generalized Epanechnikov kernel in Section 3.4. Some numerical
examples, comparisons and applications are stated in Section 4. These include the
computational complexity, cluster validity, improvements in large continuous and discrete
data sets and image segmentation. Finally, conclusions are given in Section 5.
2. MEAN SHIFT
Let $X = \{x_1, \ldots, x_n\}$ be a data set in an s-dimensional Euclidean space $R^s$. Camastra
and Verri [3] and Girolami [4] have recently considered kernel-based clustering for X in the
feature space, where the data space is transformed into a high-dimensional feature space F
and the inner products in F are represented by a kernel function. On the other hand, the kernel
density estimation with the modes of the density estimate over X is another kernel-based
clustering method based on the data space [6]. The modes of a density estimate are equivalent
to the location of the densest area of the data set where these locations could be satisfactory
cluster center estimates. In the kernel density estimation, the mean shift is a simple gradient
technique used to find the modes of the kernel density estimate. We first review the mean
shift procedures in the next sub-section.
2.1. Mean Shift Procedures
Mean shift procedures are techniques for finding the modes of a kernel density estimate.
Let $H: X \to R$ be a kernel with $H(x) = h(\|x - x_j\|^2)$. The kernel density estimate is
given by

$$\hat f_H(x) = \sum_{j=1}^n h(\|x - x_j\|^2)\, w(x_j) \quad (1)$$

where $w(x_j)$ is a weight function. Based on a uniform weight, Fukunaga and Hostetler [7]
first gave the statistical properties, including the asymptotic unbiasedness, consistency and
uniform consistency, of the gradient of the density estimate, given by

$$\nabla \hat f_H(x) = 2\sum_{j=1}^n (x - x_j)\, h'(\|x - x_j\|^2)\, w(x_j).$$
Suppose that there exists a kernel $K: X \to R$ with $K(x) = k(\|x - x_j\|^2)$ such that
$h'(r) = c\,k(r)$, where c is a constant. The kernel H is termed a shadow of kernel K (see Ref.
[8]). Then

$$\begin{aligned}
\nabla \hat f_H(x) &= 2c\sum_{j=1}^n k(\|x - x_j\|^2)(x - x_j)\, w(x_j) \\
&= -2c\left[\sum_{j=1}^n k(\|x - x_j\|^2)\, w(x_j)\right]\left[\frac{\sum_{j=1}^n k(\|x - x_j\|^2)\, x_j\, w(x_j)}{\sum_{j=1}^n k(\|x - x_j\|^2)\, w(x_j)} - x\right] \\
&= -2c\,\hat f_K(x)\,[m_K(x) - x].
\end{aligned} \quad (2)$$
The term $m_K(x) - x = -\nabla \hat f_H(x)/[2c\,\hat f_K(x)]$ is called the generalized mean
shift, which is proportional to the density gradient estimate. Formulation (2) was first
remarked upon in [11] for the uniform weight case. Setting the gradient estimator
$\nabla \hat f_H(x)$ to zero, we derive a mode estimate as

$$x = m_K(x) = \frac{\sum_{j=1}^n k(\|x - x_j\|^2)\, x_j\, w(x_j)}{\sum_{j=1}^n k(\|x - x_j\|^2)\, w(x_j)} \quad (3)$$
where H is a shadow of kernel K. Equation (3) is also called the weighted sample mean with
kernel K. The mean shift has three kinds of implementation procedures. The first is to set all
data points as initial values and then update each data point $x_j$ with $m_K(x_j)$,
$j = 1, \ldots, n$ (i.e., $x_j \leftarrow m_K(x_j)$). This procedure is called a blurring mean
shift. In the blurring process, each data point $x_j$ is updated at each iteration; hence the
density estimate $\hat f_H(x)$ also changes. The second still sets all data points as initial
values, but updates only the point x with $m_K(x)$ (i.e., $x \leftarrow m_K(x)$), while the
original data points remain fixed. This procedure is called a nonblurring mean shift. In this
process, the data points and the density estimate are not updated. The third is to choose c
initial values, where c may be greater or smaller than n, and then update each point x with
$m_K(x)$ (i.e., $x \leftarrow m_K(x)$). This procedure is called a general mean shift. Cheng
[8] proved some convergence properties of the blurring mean shift process using (3). Moreover,
Comaniciu and Meer [11] also gave some properties for discrete data. They also discussed its
relation to the Nadaraya-Watson estimator from the
kernel regression and robust M-estimator points of view.
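As a concrete illustration of equation (3), the nonblurring update $x \leftarrow m_K(x)$ can be sketched in NumPy as follows (a minimal sketch with uniform weights; the kernel profile k is passed in as a function and is not tied to any particular kernel class):

```python
import numpy as np

def mean_shift_step(x, data, k, w=None):
    """One weighted-sample-mean update x <- m_K(x), equation (3)."""
    if w is None:
        w = np.ones(len(data))               # uniform weights w(x_j)
    d2 = np.sum((data - x) ** 2, axis=1)     # ||x - x_j||^2
    kv = k(d2) * w                           # k(||x - x_j||^2) w(x_j)
    return kv @ data / kv.sum()

def nonblurring_mean_shift(data, k, iters=100, tol=1e-8):
    """Iterate x <- m_K(x) from every data point. The sample points used
    in the kernel sums (and hence the density estimate) stay fixed."""
    modes = data.copy()
    for _ in range(iters):
        new = np.array([mean_shift_step(x, data, k) for x in modes])
        if np.max(np.abs(new - modes)) < tol:
            return new
        modes = new
    return modes
```

A blurring mean shift would instead replace `data` itself with the updated points at each iteration, so the density estimate changes as the procedure runs.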
If there is a shadow H of kernel K, then the mean shift procedure could ascertain modes
of a known density estimate )(ˆ xf H . This technique can be used to directly find modes of the
density shape estimate. If a shadow of kernel K does not exist or it has not been found yet,
the mean shift can be used to estimate the alternative modes (cluster centers) of the given data
set with an unknown density function.
2.2. Some Special Kernels and Their Shadows
In this subsection, we investigate special kernels together with their shadows. The most
commonly used kernels that are their own shadows are the Gaussian kernels $G_p(x)$, given by

$$G_p(x) = [g(\|x - x_j\|^2)]^p = \exp\{-p\beta^{-1}\|x - x_j\|^2\}$$

with their shadows $SG_p$ defined as

$$SG_p(x) = G_p(x), \quad p > 0.$$

This means that the mean shift procedures with $x \leftarrow m_{G_p}(x)$ are used to find the
modes of the density estimate $\hat f_{SG_p}(x)$. Cheng [8] showed some behaviors of mean
shift in cluster analysis with a Gaussian kernel. The maximum entropy clustering algorithm
[31], [32] is a Gaussian kernel-based mean shift with a special weight function. Chen and
Zhang [33] used the Gaussian kernel-induced distance measure to implement the spatially
constrained fuzzy c-means (FCM) [34] as a robust image segmentation method. Yang and Wu [35]
directly used $\hat f_{SG_p}(x)$ as a total similarity objective function. They then derived
a similarity-based clustering method (SCM) that could self-organize the cluster number and
volume according to the structure of the data.
Another important class of kernels is the Cauchy kernels, defined as

$$C_p(x) = [c(\|x - x_j\|^2)]^p = [1 + \beta^{-1}\|x - x_j\|^2]^{-p},$$

which are based on the Cauchy density function $f(x) = \pi^{-1}(1 + x^2)^{-1}$,
$-\infty < x < \infty$, with their shadows defined as

$$SC_p(x) = C_{p-1}(x), \quad p > 1.$$

The mean shift process with $x \leftarrow m_{C_p}(x)$ is used to find the modes of the density
estimate $\hat f_{SC_p}(x)$. There are few applications of the Cauchy kernels. To improve the
weakness of FCM in a noisy environment, Krishnapuram and Keller [36] first considered relaxing
the constraint that the fuzzy c-partition memberships sum to 1 and then proposed the so-called
possibilistic c-means (PCM) clustering algorithm. The resulting possibilistic membership
functions are exactly Cauchy kernels. This is the only application of the Cauchy kernels to
clustering that we could find.
The simplest kernel is the flat kernel, defined as

$$F(x) = \begin{cases} 1, & \text{if } \|x - x_j\|^2 \le 1 \\ 0, & \text{if } \|x - x_j\|^2 > 1 \end{cases}$$
with the Epanechnikov kernel E(x) as its shadow:

$$E(x) = \begin{cases} 1 - \|x - x_j\|^2, & \text{if } \|x - x_j\|^2 \le 1 \\ 0, & \text{if } \|x - x_j\|^2 > 1 \end{cases}$$
Moreover, the Epanechnikov kernel has the biweight kernel B(x) as its shadow:

$$B(x) = \begin{cases} (1 - \|x - x_j\|^2)^2, & \text{if } \|x - x_j\|^2 \le 1 \\ 0, & \text{if } \|x - x_j\|^2 > 1 \end{cases}$$
For a more general presentation, we extend the Epanechnikov kernel E(x) to the generalized
Epanechnikov kernels $K_E^p(x)$ with parameter p, defined as

$$K_E^p(x) = [k_E(\|x - x_j\|^2)]^p = \begin{cases} (1 - \beta^{-1}\|x - x_j\|^2)^p, & \text{if } \|x - x_j\|^2 \le \beta \\ 0, & \text{if } \|x - x_j\|^2 > \beta \end{cases}$$

Thus, the generalized Epanechnikov kernels $K_E^p(x)$ have the corresponding shadows
$SK_E^p(x)$ defined as

$$SK_E^p(x) = K_E^{p+1}(x), \quad p > 0.$$
Fig. 1. The kernel functions with different p values. (a) Gaussian kernels. (b) Cauchy kernels. (c) Generalized
Epanechnikov kernels.
Note that we have $K_E^0(x) = F(x)$, $K_E^1(x) = E(x)$ and $K_E^2(x) = B(x)$ when $\beta = 1$.
The mean shift process with $x \leftarrow m_{K_E^p}(x)$ is used to find the modes of the
density estimate $\hat f_{SK_E^p}(x)$. In total, we present three kernel classes with their
shadows: the Gaussian kernels $G_p(x)$ with shadows $SG_p(x) = G_p(x)$, the Cauchy kernels
$C_p(x)$ with shadows $SC_p(x) = C_{p-1}(x)$, and the generalized Epanechnikov kernels
$K_E^p(x)$ with shadows $SK_E^p(x) = K_E^{p+1}(x)$. The behaviors of these kernels with
different p are shown in Fig. 1. The mean shift procedures using these three classes of
kernels can easily find their corresponding density estimates. The parameters $\beta$ and p,
called the normalization and stabilization parameters, respectively, greatly influence the
performance of the mean shift procedures for the kernel density estimate. We will discuss
these in the next section.
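For reference, the three kernel classes can be transcribed directly into code. In this sketch r denotes the normalized squared distance $\beta^{-1}\|x - x_j\|^2$, and the function names are ours:

```python
import numpy as np

def gaussian(r, p):
    """Gaussian kernel G_p = [g(r)]^p = exp(-p r); its own shadow, SG_p = G_p."""
    return np.exp(-p * np.asarray(r, float))

def cauchy(r, p):
    """Cauchy kernel C_p = [c(r)]^p = (1 + r)^(-p); shadow SC_p = C_{p-1}, p > 1."""
    return (1.0 + np.asarray(r, float)) ** (-p)

def gen_epanechnikov(r, p):
    """Generalized Epanechnikov kernel K_E^p = (1 - r)^p on r <= 1, else 0;
    shadow SK_E^p = K_E^{p+1}, p > 0."""
    r = np.asarray(r, float)
    return np.where(r <= 1.0, np.clip(1.0 - r, 0.0, None) ** p, 0.0)
```

With $\beta = 1$, `gen_epanechnikov(r, 0)` reproduces the flat kernel F, `p=1` the Epanechnikov kernel E, and `p=2` the biweight kernel B, matching the identities noted above.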
3. MEAN SHIFT AS A ROBUST CLUSTERING
A suitable clustering method should have the ability to tolerate noise and detect outliers
in the data set. Many criteria such as the breakdown point, local-shift sensitivity, gross error
sensitivity and influence functions [25] can be used to measure the level of robustness.
Comaniciu and Meer [11] discussed some relationships of the mean shift and the
nonparametric M-estimator. Here, we provide a more detailed analysis of the robustness of
the mean shift procedures.
3.1. Analysis of Robustness
The mean shift mode estimate $m_K(x)$ can be related to the location M-estimator

$$\hat\theta = \arg\min_\theta \sum_{j=1}^n \rho(x_j - \theta)$$

where $\rho$ is an arbitrary loss function. $\hat\theta$ can be generated by solving the equation

$$\frac{\partial}{\partial\theta}\sum_{j=1}^n \rho(x_j - \theta) = -\sum_{j=1}^n \varphi(x_j - \theta) = 0$$

where $\varphi(x_j - \theta) = -\partial\rho(x_j - \theta)/\partial\theta$. If the kernel H is a reasonable similarity measure which
takes values on the interval [0, 1], then ( H1− ) could be a reasonable loss function. Thus, a
location M-estimator can be found by

$$\hat\theta = \arg\min_\theta \sum_{j=1}^n \rho(x_j - \theta) = \arg\min_\theta \sum_{j=1}^n [1 - h(\|x_j - \theta\|^2)] = \arg\max_\theta \sum_{j=1}^n h(\|x_j - \theta\|^2) = \arg\max_\theta \hat f_H(\theta).$$

This means that $m_K(x)$ is a location M-estimator if K is a kernel with shadow H.
The influence curve (IC) can help us to assess the relative influence of an individual
observation toward the value of an estimate. In the location problem, we have the influence
of an M-estimate given by

$$IC(y; F, \theta) = \frac{\varphi(y - \theta)}{\int \varphi'(y - \theta)\, dF_Y(y)}$$
where $F_Y(y)$ denotes the distribution function of Y. For an M-estimator, it has been shown
that $IC(y; F, \theta)$ is proportional to its $\varphi$ function [25]. If the influence
function of an estimator is unbounded, an outlier might cause trouble; the $\varphi$ function
thus denotes the degree of influence.
Suppose that $\{x_1, \ldots, x_n\}$ is a data set in $R^s$ and $SG_p(x)$ is a shadow of
$G_p(x)$. Then $m_{G_p}(x)$ is a location M-estimator with $\varphi$ function defined as

$$\varphi_{G_p}(x_j - x) = -\frac{d}{dx}[1 - SG_p(x)] = 2p\beta^{-1}(x_j - x)\,G_p(x).$$

By applying L'Hospital's rule, we derive $\lim_{x_j \to \pm\infty} \varphi_{G_p}(x_j - x) = 0$.
Thus, the influence curve of $m_{G_p}(x)$ satisfies $IC(x_j; F, x) = 0$ when $x_j$ tends to
positive or negative infinity. This means that the influence curve is bounded and the
influence of an extremely large or small $x_j$ on the mode estimator $m_{G_p}(x)$ is very
small. We can also find the location of $x_j$ which has maximum influence on $m_{G_p}(x)$ by
solving $\partial \varphi_{G_p}(x_j - x)/\partial x_j = 0$. Note that both $m_{C_p}(x)$ and
$m_{K_E^p}(x)$ also have the above properties, with $\varphi$ functions as follows:
$$\varphi_{C_p}(x_j - x) = 2(p-1)\beta^{-1}(x_j - x)\,C_p(x)$$

$$\varphi_{K_E^p}(x_j - x) = \begin{cases} 2(p+1)\beta^{-1}(x_j - x)\,K_E^p(x), & \text{if } \|x_j - x\|^2 \le \beta \\ 0, & \text{if } \|x_j - x\|^2 \ge \beta \end{cases}$$
The $\varphi$ functions for the mean shift-based mode-seeking estimators with $\beta = 1$ are
shown in Fig. 2. The influence of an individual $x_j$ on $m_{K_E^p}(x)$ is zero if
$\|x_j - x\|^2 \ge \beta$, so an extremely large or small $x_j$ has no effect on
$m_{K_E^p}(x)$. In contrast, the influence of an extremely large or small $x_j$ on
$m_{G_p}(x)$ and $m_{C_p}(x)$ is a monotone decreasing function of the stabilization
parameter p; such an outlier loses its influence only when p becomes large.
In this paper, we choose the generalized Epanechnikov kernels for our mean shift-based
clustering algorithm because they can suitably fit data sets even with extremely large or
small data points.
Fig. 2. The ϕ functions with different p values. (a) Gaussian kernels. (b) Cauchy kernels. (c) Generalized
Epanechnikov kernels.
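The three $\varphi$ functions can be evaluated numerically to confirm the bounded-influence behavior described above (a one-dimensional sketch with $t = x_j - x$; the expressions transcribe the formulas of this section):

```python
import numpy as np

def phi_gaussian(t, p, beta=1.0):
    """phi_{G_p}(t) = 2 p beta^{-1} t G_p: decays to zero as |t| grows."""
    return 2 * p / beta * t * np.exp(-p * t * t / beta)

def phi_cauchy(t, p, beta=1.0):
    """phi_{C_p}(t) = 2 (p-1) beta^{-1} t C_p: also decays to zero."""
    return 2 * (p - 1) / beta * t * (1 + t * t / beta) ** (-p)

def phi_epanechnikov(t, p, beta=1.0):
    """phi_{K_E^p}(t) = 2 (p+1) beta^{-1} t K_E^p inside t^2 <= beta,
    and exactly zero outside: an outlier has no influence at all."""
    r = t * t / beta
    return np.where(r <= 1.0,
                    2 * (p + 1) / beta * t * np.clip(1 - r, 0, None) ** p,
                    0.0)
```

For instance, `phi_epanechnikov(5.0, 2)` is exactly zero, while `phi_gaussian(5.0, 2)` is merely very small.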
3.2. Bandwidth Selection and Stabilization Parameter
Bandwidth selection greatly influences the performance of kernel density estimation.
Comaniciu and Meer [11] summarized four different techniques according to statistical
analysis-based and task-oriented points of view. Here, we take a new approach: we first
normalize the distance measure by dividing by the normalization parameter $\beta$ and then
focus on estimating the stabilization parameter p. The normalization parameter $\beta$ is set
to be the sample variance

$$\beta = \frac{1}{n}\sum_{j=1}^n \|x_j - \bar x\|^2, \quad \text{where } \bar x = \frac{1}{n}\sum_{j=1}^n x_j.$$
We have the properties

$$\lim_{p\to 0} m_{G_p}(x) = \lim_{p\to 0} \frac{\sum_{j=1}^n G_p(x)\, x_j}{\sum_{j=1}^n G_p(x)} = \frac{\sum_{j=1}^n x_j}{n} = \bar x \quad (4)$$

and

$$\lim_{p\to 0} m_{G_p}(x) = \lim_{p\to 0} m_{C_p}(x) = \lim_{p\to 0} m_{K_E^p}(x) = \bar x. \quad (5)$$
This means that when p tends to zero, the kernel density estimate has only one mode, at the
sample mean. Figure 3 shows the histogram of a three-cluster normal mixture data set with its
corresponding density estimates $\hat f_{SG_p}(x)$, $\hat f_{SC_p}(x)$ and
$\hat f_{SK_E^p}(x)$. In fact, a small p will lead to
. In fact, a small p will lead to
only one mode x in the estimated density, as shown in Fig. 3 for the case of p=1. However,
a too larger p will cause the density estimate to have too many modes as shown in Fig. 3 for
the case of p=50. This can be explained as follows. We denote )(max)(ˆ xGxGj
= and
11
0 10 20
0
5
10
15
(a)
Freq
uenc
y
0 5 10 15
0
10
20
30
40
50
60
70
80
(b)
P=1
P=5
P=10
p=50
fGp (x)^
0 5 10 15
0
10
20
30
40
50
60
70
80
90
100
(c)
fCp (x)^
P=1
P=5
P=10
p=50
0 5 10 15
0
10
20
30
40
50
60
70
(d)
fK p (x)^
P=1
P=5
P=10
p=50
E
)(ˆ)()( xGxGxG =′ . We then have
∑∑
∑∑
∑∑
=′
=′
=
=
∞→=
=
∞→∞→=
′
′==
1)(
1)(
1
1
1
1
1])([
])([lim
)(
)(lim)(lim
xG
xG j
n
jP
n
j jP
pn
jP
n
j jP
pGp
x
xG
xxG
xG
xxGxm P . (6)
This means that, as p tends to infinity, the data point which is the closest to the initial value
will become the peak. Hence, we have
)(lim)(lim)(lim xmxmxm PE
PP KpCpGp ∞→∞→∞→== (7)
In this case, each data point will become a mode in the blurring and nonblurring mean shift
procedures.
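Both limiting behaviors, equations (4)-(5) as p tends to zero and (6)-(7) as p tends to infinity, can be checked numerically; the following small illustration uses the Gaussian kernel on arbitrary sample values of our own choosing:

```python
import numpy as np

def m_gp(x, data, p, beta):
    """Weighted sample mean m_{G_p}(x) with weights exp(-p ||x - x_j||^2 / beta)."""
    w = np.exp(-p * (data - x) ** 2 / beta)
    return (w * data).sum() / w.sum()

data = np.array([1.0, 2.0, 4.0, 8.0])
beta = np.mean((data - data.mean()) ** 2)     # sample variance as normalization

small_p = m_gp(3.5, data, p=1e-8, beta=beta)  # -> sample mean 3.75, as in (4)
large_p = m_gp(3.5, data, p=1e3, beta=beta)   # -> closest data point 4.0, as in (6)
```

As p shrinks, all weights become equal and the update returns the sample mean; as p grows, only the data point nearest the initial value keeps a nonzero weight.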
Fig. 3. (a) The histogram of a three-cluster normal mixture data set. (b), (c) and (d) are its
corresponding density estimates $\hat f_{SG_p}(x)$, $\hat f_{SC_p}(x)$ and
$\hat f_{SK_E^p}(x)$ with different p.
The stabilization parameter p controls the performance of the density estimate. This
situation is somewhat similar to bandwidth selection, but with different merits. The
purpose of the bandwidth selection is to find a suitable bandwidth (covariance) for a kernel
function of a data point so that a suitable kernel function can induce a suitable density
estimate. However, our new technique assigns a fixed sample variance for the kernel function
and then uses a suitable p to induce a suitable density estimate. As shown in Fig. 3, different
p corresponds to different shapes of the density estimates. A suitable p will correspond to a
satisfactory kernel density estimate so that the modes found by the mean shift can present the
dense area of the data set.
3.3. A Stabilization Parameter Estimation Method
A practical bandwidth selection technique is related to the decomposition stability of the
density shape estimates. The bandwidth is taken as the center of the largest operating range
over which the same number of clusters are obtained for the given data [23]. This means that
the shapes of the estimated density are unchanged over this operating range. Although this
technique can yield a suitable bandwidth estimate, it needs to find all cluster centers (modes)
for each bandwidth over the chosen operating range. It thus requires a large computation. The
selected operating range is also performed case by case. In our stabilization parameter
estimation method, we adopt the above concept but skip the step of finding all modes for
each p. We describe the proposed method as follows.
We first define the following function

$$\hat f_H(x_i) = \sum_{j=1}^n h(\|x_i - x_j\|^2), \quad i = 1, \ldots, n. \quad (8)$$
This function gives the values of the estimated density shape at the data points and is
similar to the mountain function proposed by Yager and Filev [9], which is used to obtain
initial values for a clustering method. Note that the original mountain function is defined
on grid nodes, whereas equation (8) is defined on the data points. In equation (8), we can
set H equal to $SG_p$, $SC_p$ or $SK_E^p$. We now explain how to use equation (8) to find a
suitable stabilization parameter p. Suppose that the density shape estimates are unchanged
between p=10 and p=11; then the correlation of $\{\hat f_H(x_1), \ldots, \hat f_H(x_n)\}$
between p=10 and p=11 will be very close to one. In this situation, p=10 is a suitable
parameter estimate. In this way, we can skip the step of finding the modes of the density
estimate. In our experiments, a good operating range for p always falls between 1 and 50,
because we normalize the kernel function by dividing by the normalization parameter $\beta$.
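The procedure just described can be sketched in a few lines (our own illustrative rendering; the stability threshold of 0.999 on the correlation is an assumption for the sketch, not a value prescribed in the paper):

```python
import numpy as np

def density_shape(data, p, beta):
    """Equation (8) with H = SK_E^p = K_E^{p+1}: the estimated density
    shape evaluated at the data points themselves."""
    d2 = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1) / beta
    return np.sum(np.where(d2 <= 1.0,
                           np.clip(1.0 - d2, 0.0, None) ** (p + 1), 0.0), axis=1)

def estimate_p(data, p_max=50, M=1, threshold=0.999):
    """Return the smallest p whose shape estimate is already stable, i.e.
    the correlation between the shapes at p and p+M is close to one."""
    beta = np.mean(np.sum((data - data.mean(axis=0)) ** 2, axis=1))
    prev = density_shape(data, 1, beta)
    for p in range(1 + M, p_max + 1, M):
        cur = density_shape(data, p, beta)
        if np.corrcoef(prev, cur)[0, 1] >= threshold:
            return p - M          # shape no longer changes: p - M is suitable
        prev = cur
    return p_max
```

This loop plays the role of reading off the first solid circle point that comes very close to the dotted line in the correlation plots of Fig. 4, without ever computing the modes themselves.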
Example 1. We use the previously described graphical method of correlation comparisons
to find a suitable p for the data set shown in Fig. 3. The correlation values of
$\{\hat f_{SK_E^p}(x_1), \ldots, \hat f_{SK_E^p}(x_n)\}$ and
$\{\hat f_{SK_E^{p+M}}(x_1), \ldots, \hat f_{SK_E^{p+M}}(x_n)\}$ are shown in Fig. 4.
Figures 4(a), 4(b), 4(c) and 4(d) give the results for the increased shifts M=1, 2, 3 and 4,
respectively. In Fig. 4(a), the y-coordinate of the first solid circle point denotes the correlation
value between p=1 and p=2. The y-coordinate of the second solid circle point denotes the
correlation value between p=2 and p=3. Thus the first, second and third solid circle points,
etc., denote the correlation values of the respective pairs (p=1, p=2), (p=2, p=3) and
(p=3, p=4), etc. In Fig. 4(b), the y-coordinate of the first solid circle point denotes the
correlation value between p=1 and p=1+2=3. The y-coordinate of the second solid circle point
denotes the correlation value between p=3 and p=5. The first, second and third solid circle
points, etc., therefore denote the correlation values of the respective pairs (p=1, p=3),
(p=3, p=5) and (p=5, p=7), etc. The others can be similarly induced. Since the density shape
is unchanged when the correlation value is close to one, we may choose the p whose solid
circle point comes closest to the dotted line of value one.
Fig. 4. The correlation values of $\{\hat f_{SK_E^p}(x_1), \ldots, \hat f_{SK_E^p}(x_n)\}$ and
$\{\hat f_{SK_E^{p+M}}(x_1), \ldots, \hat f_{SK_E^{p+M}}(x_n)\}$ where (a) M=1, (b) M=2,
(c) M=3 and (d) M=4.
In Fig. 4(a), with the increased shift M=1, the 10th solid circle point is very close to the
dotted line (i.e., the correlation value between p=10 and p=11 is very close to one). Hence,
p=10 is a suitable estimate. We will explain later why we do not choose a point that lies
exactly on the dotted line. In Fig. 4(b), with the increased shift M=2, the 6th solid circle
point is close to the dotted line (i.e., the correlation value between p=11 and p=13 is close
to one). Hence, p=11 is a suitable estimate. In Fig. 4(c), with the increased shift M=3, the
correlation value between p=13 and p=16 is very close to one. Hence, p=13 is a suitable
estimate. Similarly,
p=13 is a suitable estimate, as shown in Fig. 4(d). Figure 5 illustrates the density estimates
using $\hat f_{SK_E^p}$ with p=10, 11, 12 and 13. We find that these density estimates closely
match the histogram of the data. The following is another example with two-dimensional data.
Fig. 5. The data histograms and density estimates using generalized
Epanechnikov kernel with p=10, 11, 12 and 13.
Example 2. We use a 2-dimensional data set in this example. Figure 6(a) shows a 16-cluster
data set where the data points in each group are uniformly generated from rectangles. Figures
6(b), 6(c) and 6(d) are the correlation values with the increased shifts M=1, 2 and 3,
respectively. In Fig. 6(b), the 13th point is close to the dotted line and hence p=13 is a good
estimate. In Fig. 6(c), the 7th point is close to the dotted line and hence we choose p=13.
Figure 6(d) also indicates that p=13 is a suitable estimate. In this example, our graphical
method of correlation comparisons with different increased shift M presents all the same
results. The density estimates using PESK with p=1, 5, 10 and 15 are shown in Figs. 7(a),
7(b), 7(c) and 7(d), respectively. In section 3.2, equations (4) and (5) show that a small p
value will cause the kernel density estimate to have only one mode with the sample mean.
Figures 3 and 7(a) also verify this point. The selected p=13 is between p=10 and p=15 whose
density shapes are shown in Figs. 7(c) and 7(d) that match the original data structure well.
Fig. 6. (a) The 16-cluster data set. (b), (c) and (d) present the correlation values with the increased shift M=1, 2
and 3, respectively.
Our graphical method of correlation comparisons can find a good density estimate. This
estimation method accomplishes the same tasks as the bandwidth selection method. However,
our method skips the step of finding all modes of the density estimate, so it is much simpler
and computationally cheaper. We estimate a suitable stabilization parameter p whose operating
range always lies between 1 and 50. Note that if the increased shift M is large, such as M=4
or 5, the method may miss a good estimate for p. On the other hand, a very small increased
shift, such as M<1, may take too much computation time. We suggest that taking M=1 for the
graphical method of correlation comparisons performs well in most simulations.
We now explain why we did not choose the point that lies exactly on the dotted line, for the
following two reasons. First, the stabilization parameter p is similar to the number of
the bars drawn on the data histogram. The density shape with a large p corresponds to the
16
200
300
p=1
12
34
56y
01
400
3 x2
1
70
65
4
7
(a)
85
95
105
115
p=5
12
34
56y
01
125
135
3 x2
1
70
65
4
7
(b)
60
70
80
p=10
12
34
56y
01
90
100
3 x2
1
70
65
4
7
(c)
45
55
65
75
p=15
12
34
56y
01
85
95
3 x2
1
70
65
4
7
(d)
histogram with a large number of bars and hence has too many modes. Equations (6) and (7)
also show this property. Second, the stabilization parameter p is the power of the kernel
function, which takes values between zero and one. With a large p, each data point has a value
of equation (8) close to one, and hence the correlation value becomes large. Figures 4 and 6
also show this tendency in the curvilinear tail. For these two reasons, we suggest choosing
the estimate of p at the point that comes very close to the dotted line.
Fig. 7. The density estimates of the 16-cluster data set, where (a), (b), (c) and (d) are the
$\hat f_{SK_E^p}$ values of each data point with p=1, 5, 10 and 15, respectively.
3.4. A Mean Shift-Based Clustering Algorithm
Since the generalized Epanechnikov kernel $K_E^p$ has the robustness property analyzed
in Section 3.1, we use the $K_E^p$ kernel in the mean shift procedure. Combining it with the
graphical method of correlation comparisons for estimating the stabilization parameter p, we
propose a mean shift-based clustering method (MSCM) with the $K_E^p$ kernel, called
MSCM($K_E^p$). The proposed MSCM($K_E^p$) procedure is constructed in four steps:
(1) select the kernel $K_E^p$; (2) estimate the stabilization parameter p; (3) use the mean
shift procedure; (4) identify the clusters. The MSCM($K_E^p$) procedure, with its diagram
shown in Fig. 8, is summarized as follows.
The MSCM($K_E^p$) procedure
Step 1. Choose the $K_E^p$ kernel.
Step 2. Use the graphical method of correlation comparisons to estimate p.
Step 3. Use the nonblurring mean shift procedure.
Step 4. Identify the clusters.
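The four steps can be condensed into a short sketch (our own illustrative rendering, not the authors' code; Step 2 is represented by passing in a pre-estimated p, and the merge radius used in Step 4 is an assumption of the sketch):

```python
import numpy as np

def mscm(data, p, iters=100, tol=1e-6, merge_tol=1e-2):
    """Mean shift-based clustering with the generalized Epanechnikov kernel
    K_E^p (Step 1). p is assumed already estimated, e.g. by the graphical
    method of correlation comparisons (Step 2)."""
    beta = np.mean(np.sum((data - data.mean(axis=0)) ** 2, axis=1))
    def k(d2):                                    # K_E^p with normalization beta
        r = d2 / beta
        return np.where(r <= 1.0, np.clip(1.0 - r, 0.0, None) ** p, 0.0)
    modes = data.copy()
    for _ in range(iters):                        # Step 3: nonblurring mean shift
        d2 = np.sum((modes[:, None, :] - data[None, :, :]) ** 2, axis=-1)
        kv = k(d2)
        new = kv @ data / kv.sum(axis=1, keepdims=True)
        if np.max(np.abs(new - modes)) < tol:
            modes = new
            break
        modes = new
    centers, labels = [], np.full(len(data), -1)  # Step 4: merge converged points
    for i, m in enumerate(modes):
        for c, ctr in enumerate(centers):
            if np.sum((m - ctr) ** 2) < merge_tol * beta:
                labels[i] = c
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return np.array(centers), labels
```

The cluster count emerges from the merge step rather than being fixed in advance, which is how the procedure addresses the cluster validity problem.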
Fig. 8. The diagram of the proposed MSCM($K_E^p$) procedure.
We first implement the MSCM($K_E^p$) procedure for the data set shown in Fig. 3.
According to Fig. 4(a), p=10 is a suitable estimate for the data set in Fig. 3(a). The results
of the MSCM($K_E^p$) procedure are shown in Fig. 9. The curve represents the density estimate
$\hat f_{SK_E^p}$ with p=10. The locations of the data points are illustrated by the
histograms. Figures 9(a)~(f) show these data histograms after implementing the MSCM($K_E^p$)
procedure
[Fig. 8 diagram: Data points -> Select a kernel: (1) choose the kernel $K_E^p$ (used in this
paper); (2) choose the kernel $G_p$ or $C_p$. -> Estimate the stabilization parameter p:
(1) use the proposed graphical method of correlation comparisons (used in this paper);
(2) directly assign a value. -> Use the mean shift procedure: (1) nonblurring mean shift
(used in this paper); (2) general mean shift; (3) blurring mean shift. -> Identify clusters:
(1) merge data; (2) other methods (discussed in Section 4).]
with the iterative times T=1, 5, 10, 20, 25 and 77. The MSCM(K_PE) procedure converges at T=77. The algorithm terminates when the locations of all data points are unchanged or when the iterative time reaches the maximum T=100. Figure 9(f) shows that all data points are centralized to three locations, which are the modes of the density estimate f̂_SK_PE. The MSCM(K_PE) procedure finds that these three clusters do indeed match the data structure.
Fig. 9. The density estimates using the MSCM(K_PE) procedure with p=10 and the histograms of the data points where (a) T=1, (b) T=5, (c) T=10, (d) T=20, (e) T=25 and (f) T=77.
We also implement the MSCM(K_PE) procedure on the data set in Fig. 6(a). According to the graphical method of correlation comparisons shown in Fig. 6(b), p=13 is a suitable estimate. The MSCM(K_PE) results after the iterative times T=1, 2 and 5 are shown in Figs. 10(a), 10(b) and 10(c), respectively. The data histograms show that all data points are centralized to 16 locations, which match the data structure, when the iterative time reaches T=5. We mention that no initialization problems occur with our method because the stabilization parameter p is estimated by the graphical method of correlation comparisons. We can also solve the cluster validity problem by merging similar data points, as in many progressive clustering methods [30], [37], [38].
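The merging step used to identify clusters can be sketched as follows: converged locations lying within a small tolerance of one another are treated as the same mode and share a label. The tolerance value here is an assumption for illustration.

```python
import numpy as np

def merge_labels(Z, tol=1e-3):
    """Label points by merging converged mean shift locations: two
    points get the same label when they ended on the same mode."""
    labels, modes = [], []
    for z in np.asarray(Z, dtype=float):
        for j, m in enumerate(modes):
            if np.abs(z - m).max() < tol:   # same mode as an earlier point
                labels.append(j)
                break
        else:
            modes.append(z)                 # a new mode found
            labels.append(len(modes) - 1)
    return labels, modes
```

The number of distinct modes found this way is the cluster number estimate.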
Fig. 10. (a), (b) and (c) are the histograms of the data locations after using the MSCM(K_PE) procedure with p=13 and iterative times T=1, 2 and 5 (convergence), respectively.
We have shown that the mean shift can be a robust clustering method by the definition of the M-estimator, and we have also discussed the ϕ function that denotes the degree of influence of an individual observation for the kernels G_P, C_P and K_PE. In comparisons of computational time, our graphical method of correlation comparisons for finding the stabilization parameter p is much faster than the bandwidth selection method. Finding all cluster centers with the bandwidth selection method requires computational complexity O(n²st₁) for the first selected bandwidth, where s and t₁ denote the data dimension and the algorithm iteration number, respectively. Suppose that we choose a maximum of 50 bandwidths over the operating range; the bandwidth selection method would then require computational complexity O(n²s Σ_{i=1}^{50} t_i) to find a suitable bandwidth. In the MSCM(K_PE) procedure, the stabilization parameter p always lies in [1, 50] because we normalize the kernel function. Thus, it requires O(50n²s) to find a suitable p when the increased shift is M=1. The MSCM(K_PE) procedure is faster still if we set the increased shift M to be greater than 1.
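As a back-of-envelope comparison, with the common O(n²s) factor dropped and assumed per-bandwidth iteration counts:

```python
# Rough operation-count comparison with the common O(n^2 s) factor
# dropped.  The per-bandwidth iteration counts t_i are assumptions.
t = [20] * 50                       # assumed iterations per tested bandwidth
bandwidth_selection_cost = sum(t)   # ~ sum_{i=1}^{50} t_i full mean shift runs
correlation_search_cost = 50        # one density evaluation per p in [1, 50]
```

Even with modest iteration counts per bandwidth, the correlation-comparison search is cheaper by the average iteration count per run.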
Finally, we discuss the MSCM(K_PE) procedure further. The first step is to choose a kernel function, where we recommend K_PE; the kernels G_P and C_P may also be chosen. The second step is to estimate the stabilization parameter p using the proposed graphical method of correlation comparisons. Of course, the user can also directly assign a value for p; in our experiments, a suitable p always fell between 10 and 30. The third step is to implement the mean shift procedure. Any of the three kinds of mean shift procedures mentioned in
Section 2 can be used; we suggest the nonblurring mean shift. The fourth step is to classify all data points. If the data set contains only round-shaped clusters, the nonblurring mean shift can label the data points by merging them. However, if the data set contains other cluster shapes, such as line or circle structures, the data points may not be centralized into small regions, and merging them will not work well. We discuss this problem next and also provide some numerical examples and comparisons. We then apply these methods to image processing.
4. EXAMPLES AND APPLICATIONS
Some numerical examples, comparisons and applications are presented in this section. We also consider the computational complexity, cluster validity, and improvements of the mean shift for large continuous and discrete data sets, with an application to image segmentation.
Fig. 11. (a) The data set. (b) The graph of correlation comparisons where p=20 is a suitable estimate. (c), (d), (e) and (f) are the data locations after using the MSCM(K_PE) procedure with iterative times T=1, 3, 5 and 11, respectively.
4.1. Numerical Examples with Comparisons
Example 3. Figure 11(a) presents a data set with a large number of clusters (32), to which we also add some uniformly distributed noisy points. The graphical plot of the correlation
comparisons for the data set is shown in Fig. 11(b), where p=20 is a suitable estimate. The MSCM(K_PE) results when T=1, 3, 5 and 11 are shown in Figs. 11(c), 11(d), 11(e) and 11(f), respectively. We can see that all data points are centralized to 32 locations that suitably match the structure of the data, and the results are not affected by the noisy points. The cluster number can easily be found by merging the data points that are centralized to the same locations. Thus, the MSCM(K_PE) procedure can be a simple and useful unsupervised clustering method. The superiority of our proposed method over other clustering methods is discussed below.
Fig. 12. (a), (b) and (c) present the PC, FS and XB validity indexes when the random initial values are assigned.
We also use the well-known fuzzy c-means (FCM) clustering algorithm [39] to cluster the same data set shown in Fig. 11(a). Suppose that we do not know the cluster number of the data set. We adopt validity indexes, such as the partition coefficient (PC) [40], Fukuyama and Sugeno (FS) [41], and Xie and Beni (XB) [42], to solve the validity problem when using FCM. We therefore need to run the FCM algorithm for each cluster number c=2, 3, … However, the first problem is to assign the locations of the initial values in FCM. We simulate two cases of assignment: random initial values and designed initial values. Figures 12(a), 12(b) and 12(c) present the PC, FS and XB validity indexes when random initial values are assigned. FCM cannot detect those 32 separated clusters very effectively, but it can when designed initial values are assigned, as shown in Fig. 13. Note that the PC index tends to take large values when the cluster number is small, so a local optimum of the PC index curve may present a good cluster number estimate [43]. In the case of the designed initial assignment, all validity measures have an optimal solution at c=32. However, this result is obtained only when the FCM algorithm can divide the data into 32 well-separated clusters. If the initializations are not properly chosen (for example, no centers are initialized in one of these 32 rectangles), we cannot ensure that good partitions are
found by the FCM algorithm. In fact, most clustering algorithms suffer from this initialization problem, which becomes even more difficult in high-dimensional cases. Here we find that our proposed MSCM(K_PE) procedure is robust to initialization.
Fig. 13. (a), (b) and (c) present the PC, FS and XB validity indexes when the designed initial values are
assigned.
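The PC index used above can be sketched as follows; this assumes a fuzzy membership matrix U of shape c×n whose columns sum to one, with Bezdek's partition coefficient PC = (1/n) Σ_i Σ_k u_ik².

```python
import numpy as np

def partition_coefficient(U):
    """Bezdek's partition coefficient PC = (1/n) sum_{i,k} u_ik^2 for a
    fuzzy membership matrix U (c x n, columns sum to 1).  PC equals 1
    for a crisp partition and 1/c for a totally fuzzy one."""
    U = np.asarray(U, dtype=float)
    return float((U ** 2).sum() / U.shape[1])
```

A crisp partition gives PC = 1, which is why the index's large-value tendency at small c must be read with care, as noted above.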
We mentioned that the computational complexity of finding the stabilization parameter estimate p in MSCM(K_PE) is O(50n²s). After estimating p, the computational complexity of implementing the MSCM(K_PE) procedure is O(n²s t_K), where t_K denotes the number of iterations. In FCM, the computational complexity is O(ncs t_C), where t_C is the number of iterations and c is the cluster number. To solve the validity problem, we need to run FCM from c=2 to c=C_max, where C_max is the maximum number of clusters. Thus, the computational complexity of FCM is O(ns Σ_{c=2}^{C_max} c t_c). Although the complexity of the MSCM(K_PE) procedure is high when the data set is large, it directly solves the validity problem in a simple way. To reduce the complexity of the MSCM(K_PE) procedure for large data sets, we propose a technique later.
4.2. Cluster Validity and Cluster Identification for the MSCM(K_PE) Procedure
Note that, after finding the MSCM(K_PE) clustering results, we simply identified the clusters and the cluster number by merging the data points that are centralized to the same location. However, if the data set contains clusters of different shapes, such as a line or a circle structure, the data points may not be centralized to the same small region. Moreover, the estimated density shape may have flat curves at the modes. In this case, the method
of identifying clusters by merging data points may not be an effective means of finding the clusters. If we use the Agglomerative Hierarchical Clustering (AHC) algorithm with linkage methods such as single linkage, complete linkage or Ward's method, we can identify clusters of different shapes from the MSCM(K_PE) clustering results. We demonstrate this identification method with AHC as follows.
Fig. 14. (a) The data set. (b) The graph of correlation comparisons where p=28 is a suitable estimate. (c) The data locations after using the MSCM(K_PE) procedure.
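The identification step can be sketched with SciPy's hierarchical clustering: single-linkage AHC on the shifted locations, cutting the tree at a small height. The cut height used here is an assumed threshold, and the toy input stands in for MSCM(K_PE) output.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy mean-shift output: two groups of nearly coincident shifted locations.
shifted = np.array([[0.00, 0.00], [0.01, 0.00], [0.02, 0.01],
                    [5.00, 5.00], [5.00, 5.02], [5.01, 4.99]])
tree = linkage(shifted, method="single")              # single-linkage AHC
labels = fcluster(tree, t=0.5, criterion="distance")  # cut the tree at 0.5
```

Single linkage chains together points that the mean shift has drawn close, so line- or circle-shaped groups that do not collapse to one location still receive a common label.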
Example 4. Figure 14(a) shows a three-cluster data set with different cluster shapes. The graphical plot of the correlation comparisons is shown in Fig. 14(b), where p=28 is a suitable stabilization parameter estimate. The locations of all data points after processing the MSCM(K_PE) are shown in Fig. 14(c). In this situation, the identification method of merging data points may cause difficulties. Since the mean shift procedure can be seen as a method that shifts similar data points to the same location, AHC with single linkage can help us to find clusters, because similar data points can easily be linked into the same cluster. We process AHC with single linkage on the MSCM(K_PE) clustering results shown in Fig. 14(c) and obtain the hierarchical clustering tree shown in Fig. 15(a). The hierarchical tree shows that the data set contains three well-separated clusters, which are shown in Fig. 15(b) with different symbols. In fact, most clustering algorithms are likely to fail when applied to the data set shown in Fig. 14(a).
By combining MSCM(K_PE) with AHC, we can detect clusters of different shapes for most data sets. This combination may also decrease the convergence time of the MSCM(K_PE) procedure. Note that the estimated density shape may have flat curves at the modes, so that the MSCM(K_PE) procedure becomes too time consuming, especially for a large data set. Since the AHC method can connect close data points into the same cluster, we can stop the MSCM(K_PE) procedure once the data points shift to
the location near the mode. Therefore, we can set a larger stopping threshold for the MSCM(K_PE) procedure or a smaller maximum number of iterations.
Fig. 15. (a) The hierarchical tree of the data set in Fig. 14(c). (b) The three identified clusters shown with different symbols.
4.3. Implementation for Large Continuous Data Sets
One important property of the MSCM(K_PE) procedure is that it takes all data points as initial values, so that these data points can be centralized to the locations of the modes; the cluster validity problem is solved simultaneously in this way. However, this process takes too much time for a large data set. One way to reduce the number of data points for the MSCM(K_PE) procedure is to first merge similar data points in the original data set. Although this technique reduces the number of data points, the MSCM(K_PE) clustering results will then differ from those of the original nonblurring mean shift procedure. Since a mode represents a location where most data are concentrated, nearby data points shift to the mode within a few mean shift iterations. However, these data points must continue the mean shift procedure until all data points are centralized to the modes, because the mean shift procedure is treated as a batch-type procedure.
Although all data points are taken as initial values, there are no constraints on these initial values when we process the mean shift clustering algorithm using equation (3). This means that we can treat the mean shift as a sequential clustering algorithm: the second data point is input only after the first data point has shifted to its mode, the third data point is input only after the second has shifted to its mode, and so on. In this sequential-type mean shift procedure, only the data points far away from the modes require more iterations, whereas most data points (near the modes) require
fewer iterations. The computational complexity of this sequential-type mean shift procedure is O(ns Σ_{i=1}^{n} t_i), where t_i denotes the iteration count of data point x_i. In the batch-type mean shift procedure, by contrast, every data point effectively runs for the largest iteration number, and the computational complexity is O(n²s t_K) with t_K = max{t_1, …, t_n}.
Note that this technique is feasible only for the nonblurring mean shift procedure, such as the proposed MSCM(K_PE) procedure. Since the blurring mean shift algorithm updates the data points at each iteration, the location of each data point keeps changing, so the sequential type is not feasible for the blurring mean shift procedure.
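The sequential variant can be sketched as follows, again with an assumed Epanechnikov-type profile on normalized squared distances. The key point is that each point runs only the t_i iterations it needs before the next point is processed.

```python
import numpy as np

def sequential_mean_shift(X, p=10, max_iter=100, tol=1e-6):
    """Sequential-type nonblurring mean shift: each point is shifted to
    its mode before the next point is processed, so a point near a mode
    stops after a few iterations instead of running for the largest
    iteration count as in the batch version."""
    X = np.asarray(X, dtype=float)
    d2_max = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1).max()
    out, iters = np.empty_like(X), []
    for i in range(len(X)):
        z = X[i].copy()
        for t in range(1, max_iter + 1):
            w = np.maximum(1.0 - ((z - X) ** 2).sum(-1) / d2_max, 0.0) ** p
            z_new = w @ X / w.sum()
            if np.abs(z_new - z).max() < tol:
                break
            z = z_new
        out[i] = z
        iters.append(t)          # t_i: iterations used by data point x_i
    return out, iters
```

The total work is proportional to Σ_i t_i rather than n · max_i t_i, matching the complexity comparison above.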
4.4. Implementation for Large Discrete Data Sets
Suppose that the data set {x_1, …, x_n} only takes values on the set {y_1, …, y_m} with corresponding counts {n_1, …, n_m}. That is, there are n_1 observations of {x_1, …, x_n} with value y_1, n_2 observations with value y_2, and so on. For example, a 128×128 grey level image has 16384 data points (pixels), but it only takes values on the grey level set {0, 1, …, 255}. The mean shift procedure has a large computational complexity in this n=16384 case. The following theorem provides a technique that can greatly reduce the computational complexity.
Theorem 1. Suppose that x ∈ {x_1, …, x_n}, y ∈ {y_1, …, y_m} and x = y. The mean shift of y is defined by

  m_K(y) = [ Σ_{i=1}^{m} k(||y_i − y||²) w(y_i) n_i y_i ] / [ Σ_{i=1}^{m} k(||y_i − y||²) w(y_i) n_i ]    (9)

Then we have m_K(x) = m_K(y). That is, the mean shift of x using equation (3) can be replaced by the mean shift of y using equation (9) with equivalent results.
Proof: Since {x_1, …, x_n} only takes values on the set {y_1, …, y_m} with corresponding counts {n_1, …, n_m}, we have Σ_{i=1}^{m} k(||y_i − y||²) w(y_i) n_i = Σ_{j=1}^{n} k(||x_j − x||²) w(x_j) and Σ_{i=1}^{m} k(||y_i − y||²) w(y_i) n_i y_i = Σ_{j=1}^{n} k(||x_j − x||²) w(x_j) x_j. Thus, m_K(x) = m_K(y). □
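Theorem 1 can be checked numerically: one weighted step over the distinct values matches a step over the raw observations. A minimal 1-D sketch, with w ≡ 1 and a Gaussian-type profile as simplifying assumptions:

```python
import numpy as np

def step(x, pts, counts, p=10):
    """One mean shift step in the form of Eq. (9): the kernel sums run
    over the m distinct values y_i weighted by their multiplicities n_i
    (w(y) = 1 and a Gaussian-type profile are assumptions here)."""
    w = np.exp(-p * (pts - x) ** 2) * counts
    return float(w @ pts / w.sum())

# Raw observations vs. their (value, count) representation.
raw = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 9.0])
vals, cnts = np.unique(raw, return_counts=True)
```

Calling `step` on `raw` with unit counts and on `(vals, cnts)` from the same query point gives the same shifted value, which is exactly the content of the theorem.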
We can take {y_1, …, y_m} as initial values and then process the mean shift procedure using equation (9) in the MSCM(K_PE) procedure. The final locations of {y_1, …, y_m} under equation (9) are equivalent to the final locations of {x_1, …, x_n} under equation (3). In this discrete data case, the computational complexity of the MSCM(K_PE) procedure becomes O(m²st), where t denotes the iteration count and m<n. The sequential type discussed above for the continuous data case can also be used to further reduce the computational complexity in this discrete case. The following is a simple application to image segmentation.
Fig. 16. (a) The Lenna image with size 128×128. (b) The results using the MSCM(K_PE) procedure with p=15 when T=5. (c) The results with p=15 when T=15. (d) The results with p=15 when T=98.
4.5. Application in Image Segmentation
Example 5. Figure 16(a) is the well-known Lenna image of size 128×128. The image contains 16384 pixel values with a maximum value of 237 and a minimum value of 33, so it only takes values on {33, 34, …, 237}; the count frequency of these pixel values is shown in Fig. 17(a). The computational complexity of the original nonblurring mean shift procedure is O(n²st) = O(16384²st). After applying the technique of equation (9), the computational complexity of the nonblurring mean shift procedure in this discrete case becomes O(m²st) = O(205²st), a very large reduction. The graph of correlation comparisons for the MSCM(K_PE) procedure is shown in Fig. 17(b), where p=15 is a suitable estimate. The estimated density shape with p=15 is shown in Fig. 17(c); it closely matches the pixel counts and has four modes. The MSCM(K_PE) clustering results when T=5, T=15 and T=98 are shown in Figs. 16(b), 16(c) and 16(d), respectively. Following convergence, all pixel values are centralized to four locations, indicated by the arrows on the x-coordinate in Fig. 17(d). Note that, for most c-means based clustering algorithms, c=8 is usually used as the cluster number for the Lenna data set. Combining such an algorithm with a validity index to search for a good cluster number estimate would make the computational complexity very large. In the MSCM(K_PE) procedure, a suitable cluster number estimate is found automatically, and the computational
complexity for this discrete large data set can be reduced by using the above-mentioned technique.
Fig. 17. (a) The pixel counts of the Lenna image. (b) The graph of correlation comparisons where p=15 is a suitable estimate. (c) The density estimate with p=15. (d) All pixel values are centralized to four locations after using the MSCM(K_PE) procedure with iterative time T=98.
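The size of the reduction in Example 5 is easy to check: per iteration, the batch procedure performs n² kernel evaluations on the raw pixels versus m² on the distinct grey levels.

```python
n = 128 * 128      # 16384 pixels in the Lenna image
m = 237 - 33 + 1   # 205 grey levels in the observed range {33, ..., 237}
reduction = n ** 2 // m ** 2   # per-iteration kernel evaluations saved
```

So the discrete form does roughly six thousand times fewer kernel evaluations per iteration.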
For color image data, each pixel is a three-dimensional data point whose dimensions each take values on the set {0, 1, …, 255}, and hence there are 256×256×256 possible pixel values. However, for a 128×128 color image, the worst case is 128×128 distinct pixel values. In general, image data contain many repeated pixel values, and hence the technique can also significantly reduce the computational complexity even for color images.
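For color images the same compression applies: the number of distinct pixel values m is bounded by the number of pixels, and real images repeat colours heavily. A toy count with NumPy (the image here is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 4, size=(128, 128, 3))   # toy image with few colours
pixels = img.reshape(-1, 3)                    # n = 16384 pixel vectors
vals, cnts = np.unique(pixels, axis=0, return_counts=True)
m = len(vals)    # distinct pixel values actually present, m << n
```

The mean shift of equation (9) would then run over the m rows of `vals` with multiplicities `cnts` instead of over all n pixels.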
4.6. Comparisons with Other Kernel-Based Clustering Algorithms
Comaniciu [10] proposed the variable-bandwidth mean shift with data-driven bandwidth selection. To demonstrate the bandwidth selection method, Comaniciu [10] used a data set drawn with equal probability from the normal distributions N(8,2), N(25,4), N(50,8) and N(100,16) with total n=400. Figure 18(a) shows the histogram of this data set. Comaniciu
[10] used 12 analysis bandwidths in the range 1.5–17.46, where four classes of bandwidth were detected (see Ref. [10], pages 284-285). Based on our proposed MSCM(K_PE), the graph of correlation comparisons for the data set of Fig. 18(a) gives p=8, and four clusters are obtained, as shown in Figs. 18(b) and 18(c), respectively.
Fig. 18. (a) Histogram of the data set. (b) The graph of correlation comparisons. (c) The hierarchical tree of the data after implementing the MSCM(K_PE) algorithm.
Furthermore, we use the data set of nonlinear structures with multiple scales from Comaniciu [10] for our next comparison, as shown in Fig. 19(a). The graph of correlation comparisons, the hierarchical tree of the data after implementing the MSCM(K_PE) algorithm, and the four identified clusters shown in different symbols are given in Figs. 19(b), 19(c) and 19(d), respectively. These four identified clusters from MSCM(K_PE) coincide with those of the variable-bandwidth mean shift in Comaniciu [10]. Note that the variable-bandwidth mean shift in Comaniciu [10] is a good technique for finding a suitable bandwidth for each data point, but it requires implementing the fixed-bandwidth mean shift with many different selected bandwidths and is hence time consuming. In our MSCM(K_PE) method, we do not focus on estimating a good bandwidth for each data point, but on estimating the stabilization parameter p so that the MSCM(K_PE) with this suitable p provides suitable final results. In this case, the mean shift process needs to be implemented only once.
Fig. 19. (a) The data set. (b) The graph of correlation comparisons. (c) The hierarchical tree of the data after implementing the MSCM(K_PE) algorithm. (d) The four identified clusters shown in different symbols.
The kernel k-means proposed by Girolami [4] is another kernel-based clustering algorithm. The cluster numbers of the data sets Iris (c=3), Wine (c=3) and Crabs (c=4) are correctly detected by kernel k-means (see Ref. [4], pages 783-784). We implement the proposed method on the Iris, Wine and Crabs data sets, with the MSCM(K_PE) results shown in Figs. 20, 21 and 22, respectively. The MSCM(K_PE) algorithm with p=10 indicates that c=2 is suitable for the Iris data. However, there are also some clues that c=3 is suitable for the Iris data according to the results in Fig. 20(b); this is because one of the Iris clusters is separable from the other two overlapping clusters. The results for the Wine data from the MSCM(K_PE), shown in Fig. 21(b), give the possible cluster numbers c=2, c=3 or c=5. The results from the MSCM(K_PE) for the Crabs data, shown in Fig. 22(b), are the same as those of kernel k-means with c=4.
Although the mean shift can move data points to the dense regions of the data, these regions may be very flat, so that the proposed MSCM(K_PE) with a linkage method, such as single linkage, may not give a clear cluster number estimate. This situation appears in some real
data applications. We recommend using additional linkage methods, such as complete linkage and Ward's method, to offer more clustering information. In our final example, we use the Cmc data set from the UCI machine learning repository [44] to compare the proposed MSCM(K_PE) with kernel k-means. The Cmc data set has 10 attributes, 1473 observations and three clusters. The MSCM(K_PE) results for the Cmc data set are shown in Fig. 23, where single linkage, complete linkage and Ward's method all indicate that the data set contains three clusters, consistent with the data structure. Although the kernel k-means using the eigenvalue decomposition of the kernel matrix proposed by Girolami [4] can estimate the correct cluster number for the Iris, Wine and Crabs data sets, the cluster number estimates depend strongly on the selected radial basis function (RBF) kernel width. The eigenvalue decomposition results with different RBF kernel widths for the Cmc data set, based on the kernel matrix, are shown in Fig. 24. The correct result c=3 shown in Fig. 24(a) depends on the suitably chosen RBF kernel width of 15; the other choices, 10 and 5, as shown in Figs. 24(b) and 24(c), cannot detect the cluster number c=3.
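The width sensitivity of the eigenvalue-decomposition estimate can be sketched as follows: count the dominant eigenvalues of an RBF kernel matrix for a toy three-cluster data set. The widths, the dominance threshold and the data are assumptions for illustration, not the Cmc setting.

```python
import numpy as np

def dominant_eigs(X, width, thresh=0.05):
    """Count eigenvalues of the RBF kernel matrix that carry more than
    `thresh` of the total spectrum -- a rough cluster-number estimate
    in the spirit of Girolami's eigenvalue decomposition."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * width ** 2))   # RBF kernel matrix
    lam = np.linalg.eigvalsh(K)            # all >= 0 since K is PSD
    return int((lam / lam.sum() > thresh).sum())

# Three well-separated toy clusters along the diagonal.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.2, (40, 2)) for c in (0.0, 4.0, 8.0)])
```

With a width matched to the cluster scale the count recovers three clusters, while a much larger width blurs the blocks of the kernel matrix together and the count changes, mirroring the width dependence observed for the Cmc data.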
Fig. 20. (a) The graph of correlation comparisons and (b) the hierarchical tree after implementing the MSCM(K_PE) algorithm for the Iris data.
Fig. 21. (a) The graph of correlation comparisons and (b) the hierarchical tree after implementing the MSCM(K_PE) algorithm for the Wine data.
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
05
1015
2025
Hei
ght
5040302010
1.000
0.995
0.990
Index
p=18
0.00
0.01
0.02
0.03
0.04
0.05
0.06
1 2 3 4 5 6 7 8 9 10
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
1 2 3 4 5 6 7 8 9 10
1 2 3 4 56 789 10 111213 14 1516 17 1819 202122 23 2425 26272829 30 313233 3435 36 3738 3940 41 4243 444546 47484950 51 52 53 545556 57 5859 60 6162 63646566 67686970 71 727374 75 7677 78 798081 8283 84 85 8687 888990 91 9293 9495 96 9798 99100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653 65
4
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
010
020
030
0
Hei
ght
Fig. 22. (a) The graph of correlation comparisons and (b) the hierarchical tree after implementing the MSCM(K_PE) algorithm for the Crabs data.
Fig. 23. (a) Estimated p for the Cmc data. (b), (c) and (d) are the single linkage, complete linkage and Ward's method results for the Cmc data after implementing the MSCM(K_PE) algorithm.
Fig. 24. The eigenvalue decomposition of the kernel matrix to estimate the cluster number of the Cmc data. (a), (b) and (c) are the results for RBF kernel widths of 15, 10 and 5, respectively.
Finally, we note that merging data points using AHC after the MSCM(K_PE) mean shift process not only provides the estimated cluster number, but also gives more information about the data structure. For example, observation 19 of the Wine data, shown in Fig. 21(b), and observation 141 of the Crabs data, shown in Fig. 22(b), can be seen as abnormal observations. Figure 18(c) also shows that observation 314 is far away from the other data points. In unsupervised clustering, our objective is not only to detect the correct cluster number for a data set with an unknown number of clusters, but also to discover a reasonable cluster structure for the data set.
5. CONCLUSIONS
In this paper, we proposed a mean shift-based clustering procedure, called MSCM(K_PE). The proposed MSCM(K_PE) is robust in three facets. In facet 1, since we used the mountain function concept to estimate the stabilization parameter, the bandwidth selection problem for the density estimation can be solved graphically. Moreover, the operating range of the stabilization parameter is fixed for various data sets. This makes the proposed method robust to initialization.
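As a rough illustration (not the paper's exact formulation), the mountain function of Yager and Filev [9], which underlies the stabilization-parameter estimation, scores a candidate point by the kernel-weighted data mass around it. The function name and the Gaussian weighting below are assumptions for the sketch:

```python
import numpy as np

def mountain(v, data, sigma):
    """Mountain function (Yager & Filev [9]): a kernel-density-like score
    of how much data mass surrounds candidate point v; a minimal sketch
    with a Gaussian weight and bandwidth sigma."""
    d2 = np.sum((data - v) ** 2, axis=1)      # squared distances to v
    return float(np.sum(np.exp(-d2 / (2.0 * sigma ** 2))))
```

Dense regions receive high mountain values, so sweeping sigma and comparing the resulting scores is one way to read off a suitable operating range for the parameter.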
In facet 2, we discussed the problems of cluster validity and cluster identification that the mean shift procedure faces. According to the properties of the nonblurring mean shift procedure, we suggested combining it with agglomerative hierarchical clustering (AHC) using single linkage to identify differently shaped clusters. This technique can also reduce the computational cost of the MSCM(K_PE) procedure by setting a larger stopping threshold or a smaller maximum iteration count. Thus, the proposed method is robust to different cluster shapes in the data set.
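The merging step above can be sketched as follows: converged mean shift modes whose single-linkage (nearest-neighbor) distance falls below a threshold are grouped into one cluster. The function name, the union-find implementation, and the threshold parameter are illustrative assumptions, not the paper's code:

```python
import numpy as np

def single_linkage_merge(points, threshold):
    """Group points whose pairwise distance is below `threshold`,
    chaining groups transitively (single linkage) via union-find.
    Returns one integer label per point; a sketch of merging
    converged mean-shift modes into clusters."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path-halving find: walk to the root, compressing as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < threshold:
                parent[find(i)] = find(j)   # union the two groups

    roots = {}
    labels = []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

Because single linkage only chains nearby modes, elongated or otherwise non-spherical clusters are merged correctly, which is the motivation stated above.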
In facet 3, we analyzed the robust properties of the mean shift procedure via the nonparametric M-estimator and the ϕ functions of three kernel classes. We then focused on the generalized Epanechnikov kernel, under which extremely large or small data points have no influence on the mean shift procedure. Our demonstrations showed that the proposed method is not influenced by noise and hence is robust to noise and outliers. We also provided numerical examples, comparisons and applications to illustrate the superiority of the proposed MSCM(K_PE) procedure, including computational complexity, cluster validity, improvements of the mean shift in large continuous and discrete data sets, and image segmentation.
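For concreteness, the nonblurring mean shift at the core of the procedure repeatedly replaces each point by a kernel-weighted mean of the original data until the points settle at density modes. The sketch below uses a Gaussian kernel for simplicity; the paper's generalized Epanechnikov kernel would replace the weight formula (and is what yields the hard cutoff on distant points):

```python
import numpy as np

def mean_shift(data, bandwidth, iters=50, tol=1e-6):
    """Nonblurring mean shift sketch: each point is iteratively moved to
    the kernel-weighted mean of the *original* data (the data itself is
    never modified). Gaussian weights stand in for the paper's
    generalized Epanechnikov kernel."""
    modes = data.astype(float).copy()
    for _ in range(iters):
        shifted = np.empty_like(modes)
        for k, x in enumerate(modes):
            d2 = np.sum((data - x) ** 2, axis=1)
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # kernel weights
            shifted[k] = w @ data / w.sum()           # weighted mean
        converged = np.max(np.abs(shifted - modes)) < tol
        modes = shifted
        if converged:
            break
    return modes
```

After convergence, points that started in the same cluster share (approximately) the same mode, which is what the subsequent AHC merging step exploits.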
Acknowledgements
The authors are grateful to the anonymous referees for their critical and constructive
comments and suggestions to improve the presentation of the paper. This work was supported
in part by the National Science Council of Taiwan, under Kuo-Lung Wu’s Grant:
NSC-95-2118-M-168-001 and Miin-Shen Yang’s Grant: NSC-95-2118-M-033-001-MY2.
References [1] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[2] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines,
Cambridge Univ. Press, Cambridge, 2000.
[3] F. Camastra and A. Verri, A novel kernel method for clustering, IEEE Trans. Pattern
Analysis and Machine Intelligence 27 (2005) 801-805.
[4] M. Girolami, Mercer kernel based clustering in feature space, IEEE Trans. Neural
Networks 13 (2002) 780-784.
[5] K.R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, An introduction to
kernel-based learning algorithms, IEEE Trans. Neural Networks 12 (2001) 181-202.
[6] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall,
New York, 1998.
[7] K. Fukunaga and L.D. Hostetler, The estimation of the gradient of a density function with
applications in pattern recognition, IEEE Trans. Information Theory 21 (1975) 32-40.
[8] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Analysis and
Machine Intelligence 17 (1995) 790-799.
[9] R.R. Yager and D.P. Filev, Approximate clustering via the mountain method, IEEE Trans.
Systems, Man and Cybernetics 24 (1994) 1279-1284.
[10] D. Comaniciu, An algorithm for data-driven bandwidth selection, IEEE Trans. Pattern
Analysis and Machine Intelligence 25 (2003) 281-288.
[11] D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature space analysis,
IEEE Trans. Pattern Analysis and Machine Intelligence 24 (2002) 603-619.
[12] K.I. Kim, K. Jung and J.H. Kim, Texture-based approach for text detection in images
using support vector machines and continuously adaptive mean shift algorithm, IEEE
Trans. Pattern Analysis and Machine Intelligence 25 (2003) 1631-1639.
[13] X. Yang and J. Liu, Unsupervised texture segmentation with one-step mean shift and
boundary markov random fields, Pattern Recognition Letters 22 (2001) 1073-1081.
[14] D. Comaniciu, V. Ramesh and P. Meer, Kernel-based object tracking, IEEE Trans.
Pattern Analysis and Machine Intelligence 25 (2003) 564-577.
[15] N.S. Peng, J. Yang and Z. Liu, Mean shift blob tracking with kernel histogram filtering
and hypothesis testing, Pattern Recognition Letter 26 (2005) 605-614.
[16] O. Debeir, P.V. Ham, R. Kiss and C. Decaestecker, Tracking of migrating cells under
phase-contrast video microscopy with combined mean-shift processes, IEEE Trans.
Medical Imaging 24 (2005) 697-711.
[17] H. Chen and P. Meer, Robust fusion of uncertain information, IEEE Trans. Systems,
Man and Cybernetics-Part B: Cybernetics 35 (2005) 578-586.
[18] M. Fashing and C. Tomasi, Mean shift is a bound optimization, IEEE Trans. Pattern
Analysis and Machine Intelligence 27 (2005) 471-474.
[19] S.L. Chiu, Fuzzy model identification based on cluster estimation, J. Intelligent and
Fuzzy Systems 2 (1994) 267-278.
[20] M.S. Yang and K.L. Wu, A modified mountain clustering algorithm, Pattern Analysis and
Applications 8 (2005) 125-138.
[21] N.R. Pal and D. Chakraborty, Mountain and subtractive clustering method:
Improvements and generalizations, Int. J. Intelligent Systems 15 (2000) 329-341.
[22] B.P. Velthuizen, L.O. Hall, L.P. Clarke and M.L. Silbiger, An investigation of mountain
method clustering for large data sets, Pattern Recognition 30 (1997) 1121-1135.
[23] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.
[24] G. Beni and X. Liu, A least biased fuzzy clustering method, IEEE Trans. Pattern Analysis
and Machine Intelligence 16 (1994) 954-960.
[25] P.J. Huber, Robust Statistics. Wiley, 1981.
[26] X. Zhuang, T. Wang, and P. Zhang, A highly robust estimator through partially likelihood
function modeling and its application in computer vision, IEEE Trans. Pattern Analysis
and Machine Intelligence 14 (1992) 19-35.
[27] K.L. Wu and M.S. Yang, Alternative c-means clustering algorithms, Pattern Recognition
35 (2002) 2267-2278.
[28] C.V. Stewart, Minpran: A new robust estimator for computer vision, IEEE Trans. Pattern
Analysis and Machine Intelligence 17 (1995) 925-938.
[29] R.N. Dave and R. Krishnapuram, Robust clustering methods: A unified view, IEEE
Trans. Fuzzy Systems 5 (1997) 270-293.
[30] H. Frigui and R. Krishnapuram, A robust competitive clustering algorithm with
applications in computer vision, IEEE Trans. Pattern Analysis and Machine Intelligence
21 (1999) 450-465.
[31] K. Rose, E. Gurewitz and G.C. Fox, Statistical mechanics and phase transitions in
clustering, Physical Review Letters 65 (1990) 945-948.
[32] K. Rose, E. Gurewitz, and G.C. Fox, A deterministic annealing approach to clustering,
Pattern Recognition Letters 11 (1990) 589-594.
[33] S. Chen and D. Zhang, Robust image segmentation using FCM with special constraints
based on new kernel-induced distance measure, IEEE Trans. Systems, Man and
Cybernetics-Part B: Cybernetics 34 (2004) 1907-1916.
[34] M.N. Ahmed, S.M. Yamany, N. Mohamed, A.A. Farag and T. Moriarty, A modified
fuzzy c-means algorithm for bias field estimation and segmentation of MRI data, IEEE
Trans. Medical Imaging 21 (2002) 193-199.
[35] M.S. Yang and K.L. Wu, A similarity-based robust clustering method, IEEE Trans.
Pattern Analysis and Machine Intelligence 26 (2004) 434-448.
[36] R. Krishnapuram and J.M. Keller, A possibilistic approach to clustering, IEEE Trans.
Fuzzy Systems 1 (1993) 98-110.
[37] R. Krishnapuram, H. Frigui, and O. Nasraoui, Fuzzy and possibilistic shell clustering
algorithm and their application to boundary detection and surface approximation, IEEE
Trans. Fuzzy Systems 3 (1995) 29-60.
[38] H. Frigui and R. Krishnapuram, Clustering by competitive agglomeration, Pattern
Recognition 30 (1997) 1223-1232.
[39] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum
Press, 1981.
[40] J.C. Bezdek, Cluster validity with fuzzy sets, J. Cybernetics 3 (1974) 58-73.
[41] Y. Fukuyama and M. Sugeno, A new method of choosing the number of clusters for
fuzzy c-means method, In: Proceedings of the 5th Fuzzy System Symposium (in
Japanese), pp. 247–250. 1989.
[42] X.L. Xie and G. Beni, A validity measure for fuzzy clustering, IEEE Trans. Pattern
Analysis and Machine Intelligence 13 (1991) 841-847.
[43] K.L. Wu and M.S. Yang, A cluster validity index for fuzzy clustering, Pattern
Recognition Letters 26 (2005) 1275-1291.
[44] C.L. Blake and C.J. Merz, UCI repository of machine learning databases, a collection
of artificial and real-world data sets, 1998. Available from:
http://www.ics.uci.edu/~mlearn/MLRepository.html.