Mean Shift-Based Clustering
Kuo-Lung Wua and Miin-Shen Yangb
aDepartment of Information Management, Kun Shan University of Technology, Yung-Kang,
Tainan, Taiwan 71023, ROC
bDepartment of Applied Mathematics, Chung Yuan Christian University, Chung-Li, Taiwan
32023, ROC
Abstract- In this paper, a mean shift-based clustering algorithm is proposed. The mean shift
is a kernel-type weighted mean procedure. Herein, we first discuss three classes of kernels, the
Gaussian, Cauchy and generalized Epanechnikov kernels, with their shadows. The robustness
properties of the mean shift based on these three kernels are then investigated. Based on the
mountain function concept, we propose a graphical method of correlation comparisons for
estimating the defined stabilization parameter. The proposed method solves the bandwidth
selection problem from a different point of view. Numerical examples and comparisons
demonstrate the advantages of the proposed method, including computational complexity,
cluster validity, and improvements of the mean shift on large continuous and discrete data sets.
We finally apply the mean shift-based clustering algorithm to image segmentation.
Index Terms: Kernel functions, mean shift, robust clustering, generalized Epanechnikov
kernel, bandwidth selection, parameter estimation, mountain method, noise.
1. INTRODUCTION
Kernel-based methods are widely used in many applications [1], [2]. There are two ways
of implementing kernel-based methods in supervised and unsupervised learning. One
way is to transform the data space into a high-dimensional feature space F where the inner
products in F can be represented by a Mercer kernel function defined on the data space (see
Refs. [1-5]). An alternative way is to find a kernel density estimate on the data space and then
search the modes of the estimated density [6]. The mean shift [7], [8] and the mountain
method [9] are two simple techniques that can be used to find the modes of a kernel density
estimate.
Fukunaga and Hostetler [7] proposed the mean shift procedure based on asymptotic
unbiasedness, consistency and uniform consistency of a nonparametric density function
gradient estimate using a generalized kernel approach. This technique has been applied in
image analysis [10], [11], texture segmentation [12], [13], object tracking [14], [15], [16]
and data fusion [17]. Cheng [8] clarified the relationship between mean shift and optimization
by introducing the concept of shadows. He showed that a mean shift is an instance of gradient
ascent with an adaptive step size. Cheng [8] also proved some of the convergence properties
of a blurring mean shift procedure and showed some peculiar behaviors of mean shift in
cluster analysis with the most used Gaussian kernel. Moreover, Fashing and Tomasi [18]
showed that mean shift is a bound optimization and is equivalent to Newton’s method in the
case of piecewise constant kernels.
Yager and Filev [9] proposed mountain methods to find the approximate modes of the
data set via the mountain function, which is equivalent to the kernel density estimate defined
on the grid nodes. Chiu [19] modified the mountain method by defining the mountain
function, not on the grid nodes, but on the data set. Recently, Yang and Wu [20] proposed a
modified mountain method to identify the number of modes of the mountain function. The
applications of the mountain method can be found in [21], [22]. In this paper, we will propose
a mean shift-based clustering method using the concept of mountain functions to solve the
bandwidth selection problem in the mean shift procedure.
The bandwidth selection for a kernel function directly affects the performance of the
density estimation. It also heavily influences the performance of the mean shift procedure.
The modes found by the mean shift do not adequately represent the dense areas of the data set
if a poor bandwidth estimate is used. Comaniciu and Meer [10] summarized four different
techniques according to statistical analysis-based and task-oriented points of view. One
practical bandwidth selection technique is related to the stability of decomposition for a
density shape estimate. The bandwidth is taken as the center of the largest operating range
over which the same number of clusters are obtained for the given data [23]. Similar concepts
are used by Beni and Liu [24] to estimate the resolution parameter of a least biased fuzzy
clustering method, and also to solve the cluster validity problem. In this paper, we will offer
an alternative method of solving this problem. We first normalize the kernel function and then
estimate the defined stabilization parameter. This estimation method will be discussed in
Section 3 based on the mountain function concept.
The properties of the mean shift procedures are reviewed in Section 2.1. We discuss
some special kernels with their shadows and then define generalized Epanechnikov kernels in
Section 2.2. Note that most mean shift-based algorithms do not consider the property of
robustness [25] that is often employed in clustering [26], [27], [28], [29], [30]. In
Section 3.1, we discuss the relation between the mean shift and robust statistics based on
the discussed kernel functions. In Section 3.2, we propose an alternative technique that
approaches the bandwidth selection problem from a different point of view. The purpose of
bandwidth selection is to find a suitable bandwidth (covariance) for the kernel function of
a data point so that a suitable kernel function will induce a good density
estimate. Our technique assigns a fixed sample variance for the kernel function so that a
suitable stabilization parameter can induce a satisfactory density estimate. We then propose a
technique to estimate the stabilization parameter using an adaptive mountain function in
Section 3.3. According to our analysis, we propose the mean shift-based clustering algorithm
based on the defined generalized Epanechnikov kernel in Section 3.4. Some numerical
examples, comparisons and applications are stated in Section 4. These include the
computational complexity, cluster validity, improvements in large continuous and discrete
data sets and image segmentation. Finally, conclusions are given in Section 5.
2. MEAN SHIFT
Let $X = \{x_1, \ldots, x_n\}$ be a data set in an s-dimensional Euclidean space $R^s$. Camastra
and Verri [3] and Girolami [4] have recently considered kernel-based clustering for X in the
feature space, where the data space is transformed into a high-dimensional feature space F
and the inner products in F are represented by a kernel function. On the other hand, the kernel
density estimation with the modes of the density estimate over X is another kernel-based
clustering method based on the data space [6]. The modes of a density estimate are equivalent
to the location of the densest area of the data set where these locations could be satisfactory
cluster center estimates. In the kernel density estimation, the mean shift is a simple gradient
technique used to find the modes of the kernel density estimate. We first review the mean
shift procedures in the next sub-section.
2.1. Mean Shift Procedures
Mean shift procedures are techniques for finding the modes of a kernel density estimate.
Let $H: X \to R$ be a kernel with $H(x) = h(\|x - x_j\|^2)$. The kernel density estimate is
given by

$$\hat f_H(x) = \sum_{j=1}^n h(\|x - x_j\|^2)\, w(x_j) \quad (1)$$

where $w(x_j)$ is a weight function. Based on a uniform weight, Fukunaga and Hostetler [7]
first gave the statistical properties, including the asymptotic unbiasedness, consistency and
uniform consistency, of the gradient of the density estimate, given by

$$\nabla \hat f_H(x) = 2\sum_{j=1}^n (x - x_j)\, h'(\|x - x_j\|^2)\, w(x_j).$$
Suppose that there exists a kernel $K: X \to R$ with $K(x) = k(\|x - x_j\|^2)$ such that
$h'(r) = c\,k(r)$, where c is a constant. The kernel H is termed a shadow of kernel K (see Ref.
[8]). Then

$$\begin{aligned}
\nabla \hat f_H(x) &= 2c\sum_{j=1}^n k(\|x - x_j\|^2)(x - x_j)\, w(x_j) \\
&= -2c\left[\sum_{j=1}^n k(\|x - x_j\|^2)\, w(x_j)\right]\left[\frac{\sum_{j=1}^n k(\|x - x_j\|^2)\, x_j\, w(x_j)}{\sum_{j=1}^n k(\|x - x_j\|^2)\, w(x_j)} - x\right] \\
&= -2c\,\hat f_K(x)\,[m_K(x) - x].
\end{aligned} \quad (2)$$
The term $m_K(x) - x = -\nabla \hat f_H(x)/[2c\,\hat f_K(x)]$ is called the generalized mean
shift, which is proportional to the density gradient estimate. Formulation (2) was first
remarked upon in [11] for the uniform weight case. Setting the gradient estimator
$\nabla \hat f_H(x)$ to zero, we derive a mode estimate as

$$x = m_K(x) = \frac{\sum_{j=1}^n k(\|x - x_j\|^2)\, x_j\, w(x_j)}{\sum_{j=1}^n k(\|x - x_j\|^2)\, w(x_j)} \quad (3)$$
where H is a shadow of kernel K. Equation (3) is also called the weighted sample mean with
kernel K. The mean shift has three kinds of implementation procedures. The first is to set all
data points as initial values and then update each data point $x_j$ with $m_K(x_j)$,
$j = 1, \ldots, n$ (i.e., $x_j \leftarrow m_K(x_j)$). This procedure is called a blurring mean
shift. In the blurring process, each data point $x_j$ is updated at each iteration; hence the
density estimate $\hat f_H(x)$ also changes. The second still sets all data points as initial
values, but updates only the point x with $m_K(x)$ (i.e., $x \leftarrow m_K(x)$), while the
original data points remain fixed. This procedure is called a nonblurring mean shift. In this
process, the data points and the density estimate are not updated. The third is to choose c
initial values, where c may be greater or smaller than n, and then update each point x with
$m_K(x)$ (i.e., $x \leftarrow m_K(x)$). This procedure is called a general mean shift. Cheng
[8] proved some convergence properties of the blurring mean shift process using (3). Moreover,
Comaniciu and Meer [11] also gave some properties for discrete data. They also discussed its
relation to the Nadaraya-Watson estimator from the
kernel regression and robust M-estimator points of view.
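As a concrete illustration of equation (3), the nonblurring update $x \leftarrow m_K(x)$ can be sketched in NumPy as follows (a minimal sketch with uniform weights; the kernel profile k is passed in as a function and is not tied to any particular kernel class):

```python
import numpy as np

def mean_shift_step(x, data, k, w=None):
    """One weighted-sample-mean update x <- m_K(x), equation (3)."""
    if w is None:
        w = np.ones(len(data))               # uniform weights w(x_j)
    d2 = np.sum((data - x) ** 2, axis=1)     # ||x - x_j||^2
    kv = k(d2) * w                           # k(||x - x_j||^2) w(x_j)
    return kv @ data / kv.sum()

def nonblurring_mean_shift(data, k, iters=100, tol=1e-8):
    """Iterate x <- m_K(x) from every data point. The sample points used
    in the kernel sums (and hence the density estimate) stay fixed."""
    modes = data.copy()
    for _ in range(iters):
        new = np.array([mean_shift_step(x, data, k) for x in modes])
        if np.max(np.abs(new - modes)) < tol:
            return new
        modes = new
    return modes
```

A blurring mean shift would instead replace `data` itself with the updated points at each iteration, so the density estimate changes as the procedure runs.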
If there is a shadow H of kernel K, then the mean shift procedure could ascertain modes
of a known density estimate )(ˆ xf H . This technique can be used to directly find modes of the
density shape estimate. If a shadow of kernel K does not exist or it has not been found yet,
the mean shift can be used to estimate the alternative modes (cluster centers) of the given data
set with an unknown density function.
2.2. Some Special Kernels and Their Shadows
In this subsection, we investigate special kernels together with their shadows. The most
commonly used kernels that are their own shadows are the Gaussian kernels $G_p(x)$, given by

$$G_p(x) = [g(\|x - x_j\|^2)]^p = \exp\{-p\beta^{-1}\|x - x_j\|^2\}$$

with their shadows $SG_p$ defined as

$$SG_p(x) = G_p(x), \quad p > 0.$$

This means that the mean shift procedures with $x \leftarrow m_{G_p}(x)$ are used to find the
modes of the density estimate $\hat f_{SG_p}(x)$. Cheng [8] showed some behaviors of mean
shift in cluster analysis with a Gaussian kernel. The maximum entropy clustering algorithm
[31], [32] is a Gaussian kernel-based mean shift with a special weight function. Chen and
Zhang [33] used the Gaussian kernel-induced distance measure to implement the spatially
constrained fuzzy c-means (FCM) [34] as a robust image segmentation method. Yang and Wu [35]
directly used $\hat f_{SG_p}(x)$ as a total similarity objective function. They then derived
a similarity-based clustering method (SCM) that could self-organize the cluster number and
volume according to the structure of the data.
Another important class of kernels is the Cauchy kernels, defined as

$$C_p(x) = [c(\|x - x_j\|^2)]^p = [1 + \beta^{-1}\|x - x_j\|^2]^{-p},$$

which are based on the Cauchy density function $f(x) = \pi^{-1}(1 + x^2)^{-1}$,
$-\infty < x < \infty$, with their shadows defined as

$$SC_p(x) = C_{p-1}(x), \quad p > 1.$$

The mean shift process with $x \leftarrow m_{C_p}(x)$ is used to find the modes of the density
estimate $\hat f_{SC_p}(x)$. There are few applications of the Cauchy kernels. To improve the
weakness of FCM in a noisy environment, Krishnapuram and Keller [36] first considered relaxing
the constraint that the fuzzy c-partition memberships sum to 1 and then proposed the so-called
possibilistic c-means (PCM) clustering algorithm. The resulting possibilistic membership
functions are exactly Cauchy kernels. This is the only application of the Cauchy kernels to
clustering that we could find.
The simplest kernel is the flat kernel, defined as

$$F(x) = \begin{cases} 1, & \text{if } \|x - x_j\|^2 \le 1 \\ 0, & \text{if } \|x - x_j\|^2 > 1 \end{cases}$$
with the Epanechnikov kernel E(x) as its shadow:

$$E(x) = \begin{cases} 1 - \|x - x_j\|^2, & \text{if } \|x - x_j\|^2 \le 1 \\ 0, & \text{if } \|x - x_j\|^2 > 1 \end{cases}$$
Moreover, the Epanechnikov kernel has the biweight kernel B(x) as its shadow:

$$B(x) = \begin{cases} (1 - \|x - x_j\|^2)^2, & \text{if } \|x - x_j\|^2 \le 1 \\ 0, & \text{if } \|x - x_j\|^2 > 1 \end{cases}$$
For a more general presentation, we extend the Epanechnikov kernel E(x) to the generalized
Epanechnikov kernels $K_E^p(x)$ with parameter p, defined as

$$K_E^p(x) = [k_E(\|x - x_j\|^2)]^p = \begin{cases} (1 - \beta^{-1}\|x - x_j\|^2)^p, & \text{if } \|x - x_j\|^2 \le \beta \\ 0, & \text{if } \|x - x_j\|^2 > \beta \end{cases}$$

Thus, the generalized Epanechnikov kernels $K_E^p(x)$ have the corresponding shadows
$SK_E^p(x)$ defined as

$$SK_E^p(x) = K_E^{p+1}(x), \quad p > 0.$$
Fig. 1. The kernel functions with different p values. (a) Gaussian kernels. (b) Cauchy kernels. (c) Generalized
Epanechnikov kernels.
Note that we have $K_E^0(x) = F(x)$, $K_E^1(x) = E(x)$ and $K_E^2(x) = B(x)$ when $\beta = 1$.
The mean shift process with $x \leftarrow m_{K_E^p}(x)$ is used to find the modes of the
density estimate $\hat f_{SK_E^p}(x)$. In total, we present three kernel classes with their
shadows: the Gaussian kernels $G_p(x)$ with shadows $SG_p(x) = G_p(x)$, the Cauchy kernels
$C_p(x)$ with shadows $SC_p(x) = C_{p-1}(x)$, and the generalized Epanechnikov kernels
$K_E^p(x)$ with shadows $SK_E^p(x) = K_E^{p+1}(x)$. The behaviors of these kernels with
different p are shown in Fig. 1. The mean shift procedures using these three classes of
kernels can easily find their corresponding density estimates. The parameters $\beta$ and p,
called the normalization and stabilization parameters, respectively, greatly influence the
performance of the mean shift procedures for the kernel density estimate. We will discuss
these in the next section.
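For reference, the three kernel classes can be transcribed directly into code. In this sketch r denotes the normalized squared distance $\beta^{-1}\|x - x_j\|^2$, and the function names are ours:

```python
import numpy as np

def gaussian(r, p):
    """Gaussian kernel G_p = [g(r)]^p = exp(-p r); its own shadow, SG_p = G_p."""
    return np.exp(-p * np.asarray(r, float))

def cauchy(r, p):
    """Cauchy kernel C_p = [c(r)]^p = (1 + r)^(-p); shadow SC_p = C_{p-1}, p > 1."""
    return (1.0 + np.asarray(r, float)) ** (-p)

def gen_epanechnikov(r, p):
    """Generalized Epanechnikov kernel K_E^p = (1 - r)^p on r <= 1, else 0;
    shadow SK_E^p = K_E^{p+1}, p > 0."""
    r = np.asarray(r, float)
    return np.where(r <= 1.0, np.clip(1.0 - r, 0.0, None) ** p, 0.0)
```

With $\beta = 1$, `gen_epanechnikov(r, 0)` reproduces the flat kernel F, `p=1` the Epanechnikov kernel E, and `p=2` the biweight kernel B, matching the identities noted above.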
3. MEAN SHIFT AS A ROBUST CLUSTERING
A suitable clustering method should have the ability to tolerate noise and detect outliers
in the data set. Many criteria such as the breakdown point, local-shift sensitivity, gross error
sensitivity and influence functions [25] can be used to measure the level of robustness.
Comaniciu and Meer [11] discussed some relationships of the mean shift and the
nonparametric M-estimator. Here, we provide a more detailed analysis of the robustness of
the mean shift procedures.
3.1. Analysis of Robustness
The mean shift mode estimate $m_K(x)$ can be related to the location M-estimator

$$\hat\theta = \arg\min_\theta \sum_{j=1}^n \rho(x_j - \theta)$$

where $\rho$ is an arbitrary loss function. $\hat\theta$ can be generated by solving the equation

$$\frac{\partial}{\partial\theta}\sum_{j=1}^n \rho(x_j - \theta) = -\sum_{j=1}^n \varphi(x_j - \theta) = 0$$

where $\varphi(x_j - \theta) = -\partial\rho(x_j - \theta)/\partial\theta$. If the kernel H is a reasonable similarity measure which
takes values on the interval [0, 1], then ( H1− ) could be a reasonable loss function. Thus, a
location M-estimator can be found by

$$\hat\theta = \arg\min_\theta \sum_{j=1}^n \rho(x_j - \theta) = \arg\min_\theta \sum_{j=1}^n [1 - h(\|x_j - \theta\|^2)] = \arg\max_\theta \sum_{j=1}^n h(\|x_j - \theta\|^2) = \arg\max_\theta \hat f_H(\theta).$$

This means that $m_K(x)$ is a location M-estimator if K is a kernel with shadow H.
The influence curve (IC) can help us to assess the relative influence of an individual
observation toward the value of an estimate. In the location problem, we have the influence
of an M-estimate given by

$$IC(y; F, \theta) = \frac{\varphi(y - \theta)}{\int \varphi'(y - \theta)\, dF_Y(y)}$$
where $F_Y(y)$ denotes the distribution function of Y. For an M-estimator, it has been shown
that $IC(y; F, \theta)$ is proportional to its $\varphi$ function [25]. If the influence
function of an estimator is unbounded, an outlier might cause trouble; the $\varphi$ function
thus denotes the degree of influence.
Suppose that $\{x_1, \ldots, x_n\}$ is a data set in $R^s$ and $SG_p(x)$ is a shadow of
$G_p(x)$. Then $m_{G_p}(x)$ is a location M-estimator with $\varphi$ function defined as

$$\varphi_{G_p}(x_j - x) = -\frac{d}{dx}[1 - SG_p(x)] = 2p\beta^{-1}(x_j - x)\,G_p(x).$$

By applying L'Hospital's rule, we derive $\lim_{x_j \to \pm\infty} \varphi_{G_p}(x_j - x) = 0$.
Thus, the influence curve of $m_{G_p}(x)$ satisfies $IC(x_j; F, x) = 0$ when $x_j$ tends to
positive or negative infinity. This means that the influence curve is bounded and the
influence of an extremely large or small $x_j$ on the mode estimator $m_{G_p}(x)$ is very
small. We can also find the location of $x_j$ which has maximum influence on $m_{G_p}(x)$ by
solving $\partial \varphi_{G_p}(x_j - x)/\partial x_j = 0$. Note that both $m_{C_p}(x)$ and
$m_{K_E^p}(x)$ also have the above properties, with $\varphi$ functions as follows:
$$\varphi_{C_p}(x_j - x) = 2(p-1)\beta^{-1}(x_j - x)\,C_p(x)$$

$$\varphi_{K_E^p}(x_j - x) = \begin{cases} 2(p+1)\beta^{-1}(x_j - x)\,K_E^p(x), & \text{if } \|x_j - x\|^2 \le \beta \\ 0, & \text{if } \|x_j - x\|^2 \ge \beta \end{cases}$$
The $\varphi$ functions for the mean shift-based mode-seeking estimators with $\beta = 1$ are
shown in Fig. 2. The influence of an individual $x_j$ on $m_{K_E^p}(x)$ is zero if
$\|x_j - x\|^2 \ge \beta$, so an extremely large or small $x_j$ has no effect on
$m_{K_E^p}(x)$. In contrast, the influence of an extremely large or small $x_j$ on
$m_{G_p}(x)$ and $m_{C_p}(x)$ is a monotone decreasing function of the stabilization
parameter p; such an outlier loses its influence only when p becomes large.
In this paper, we choose the generalized Epanechnikov kernels for our mean shift-based
clustering algorithm because they can suitably fit data sets even with extremely large or
small data points.
Fig. 2. The ϕ functions with different p values. (a) Gaussian kernels. (b) Cauchy kernels. (c) Generalized
Epanechnikov kernels.
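The three $\varphi$ functions can be evaluated numerically to confirm the bounded-influence behavior described above (a one-dimensional sketch with $t = x_j - x$; the expressions transcribe the formulas of this section):

```python
import numpy as np

def phi_gaussian(t, p, beta=1.0):
    """phi_{G_p}(t) = 2 p beta^{-1} t G_p: decays to zero as |t| grows."""
    return 2 * p / beta * t * np.exp(-p * t * t / beta)

def phi_cauchy(t, p, beta=1.0):
    """phi_{C_p}(t) = 2 (p-1) beta^{-1} t C_p: also decays to zero."""
    return 2 * (p - 1) / beta * t * (1 + t * t / beta) ** (-p)

def phi_epanechnikov(t, p, beta=1.0):
    """phi_{K_E^p}(t) = 2 (p+1) beta^{-1} t K_E^p inside t^2 <= beta,
    and exactly zero outside: an outlier has no influence at all."""
    r = t * t / beta
    return np.where(r <= 1.0,
                    2 * (p + 1) / beta * t * np.clip(1 - r, 0, None) ** p,
                    0.0)
```

For instance, `phi_epanechnikov(5.0, 2)` is exactly zero, while `phi_gaussian(5.0, 2)` is merely very small.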
3.2. Bandwidth Selection and Stabilization Parameter
Bandwidth selection greatly influences the performance of kernel density estimation.
Comaniciu and Meer [11] summarized four different techniques according to statistical
analysis-based and task-oriented points of view. Here, we take a new approach: we first
normalize the distance measure by dividing by the normalization parameter $\beta$ and then
focus on estimating the stabilization parameter p. The normalization parameter $\beta$ is set
to be the sample variance

$$\beta = \frac{1}{n}\sum_{j=1}^n \|x_j - \bar x\|^2, \quad \text{where } \bar x = \frac{1}{n}\sum_{j=1}^n x_j.$$
We have the properties

$$\lim_{p\to 0} m_{G_p}(x) = \lim_{p\to 0} \frac{\sum_{j=1}^n G_p(x)\, x_j}{\sum_{j=1}^n G_p(x)} = \frac{\sum_{j=1}^n x_j}{n} = \bar x \quad (4)$$

and

$$\lim_{p\to 0} m_{G_p}(x) = \lim_{p\to 0} m_{C_p}(x) = \lim_{p\to 0} m_{K_E^p}(x) = \bar x. \quad (5)$$
This means that when p tends to zero, the kernel density estimate has only one mode, at the
sample mean. Figure 3 shows the histogram of a three-cluster normal mixture data set with its
corresponding density estimates $\hat f_{SG_p}(x)$, $\hat f_{SC_p}(x)$ and
$\hat f_{SK_E^p}(x)$. In fact, a small p will lead to
. In fact, a small p will lead to
only one mode x in the estimated density, as shown in Fig. 3 for the case of p=1. However,
a too larger p will cause the density estimate to have too many modes as shown in Fig. 3 for
the case of p=50. This can be explained as follows. We denote )(max)(ˆ xGxGj
= and
11
0 10 20
0
5
10
15
(a)
Freq
uenc
y
0 5 10 15
0
10
20
30
40
50
60
70
80
(b)
P=1
P=5
P=10
p=50
fGp (x)^
0 5 10 15
0
10
20
30
40
50
60
70
80
90
100
(c)
fCp (x)^
P=1
P=5
P=10
p=50
0 5 10 15
0
10
20
30
40
50
60
70
(d)
fK p (x)^
P=1
P=5
P=10
p=50
E
)(ˆ)()( xGxGxG =′ . We then have
∑∑
∑∑
∑∑
=′
=′
=
=
∞→=
=
∞→∞→=
′
′==
1)(
1)(
1
1
1
1
1])([
])([lim
)(
)(lim)(lim
xG
xG j
n
jP
n
j jP
pn
jP
n
j jP
pGp
x
xG
xxG
xG
xxGxm P . (6)
This means that, as p tends to infinity, the data point which is the closest to the initial value
will become the peak. Hence, we have
)(lim)(lim)(lim xmxmxm PE
PP KpCpGp ∞→∞→∞→== (7)
In this case, each data point will become a mode in the blurring and nonblurring mean shift
procedures.
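Both limiting behaviors, equations (4)-(5) as p tends to zero and (6)-(7) as p tends to infinity, can be checked numerically; the following small illustration uses the Gaussian kernel on arbitrary sample values of our own choosing:

```python
import numpy as np

def m_gp(x, data, p, beta):
    """Weighted sample mean m_{G_p}(x) with weights exp(-p ||x - x_j||^2 / beta)."""
    w = np.exp(-p * (data - x) ** 2 / beta)
    return (w * data).sum() / w.sum()

data = np.array([1.0, 2.0, 4.0, 8.0])
beta = np.mean((data - data.mean()) ** 2)     # sample variance as normalization

small_p = m_gp(3.5, data, p=1e-8, beta=beta)  # -> sample mean 3.75, as in (4)
large_p = m_gp(3.5, data, p=1e3, beta=beta)   # -> closest data point 4.0, as in (6)
```

As p shrinks, all weights become equal and the update returns the sample mean; as p grows, only the data point nearest the initial value keeps a nonzero weight.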
Fig. 3. (a) The histogram of a three-cluster normal mixture data set. (b), (c) and (d) are its
corresponding density estimates $\hat f_{SG_p}(x)$, $\hat f_{SC_p}(x)$ and
$\hat f_{SK_E^p}(x)$ with different p.
The stabilization parameter p controls the performance of the density estimate. This
situation is somewhat similar to bandwidth selection, but with different merits. The
purpose of the bandwidth selection is to find a suitable bandwidth (covariance) for a kernel
function of a data point so that a suitable kernel function can induce a suitable density
estimate. However, our new technique assigns a fixed sample variance for the kernel function
and then uses a suitable p to induce a suitable density estimate. As shown in Fig. 3, different
p corresponds to different shapes of the density estimates. A suitable p will correspond to a
satisfactory kernel density estimate so that the modes found by the mean shift can present the
dense area of the data set.
3.3. A Stabilization Parameter Estimation Method
A practical bandwidth selection technique is related to the decomposition stability of the
density shape estimates. The bandwidth is taken as the center of the largest operating range
over which the same number of clusters are obtained for the given data [23]. This means that
the shapes of the estimated density are unchanged over this operating range. Although this
technique can yield a suitable bandwidth estimate, it needs to find all cluster centers (modes)
for each bandwidth over the chosen operating range. It thus requires a large computation. The
selected operating range is also performed case by case. In our stabilization parameter
estimation method, we adopt the above concept but skip the step of finding all modes for
each p. We describe the proposed method as follows.
We first define the following function

$$\hat f_H(x_i) = \sum_{j=1}^n h(\|x_i - x_j\|^2), \quad i = 1, \ldots, n. \quad (8)$$
This function gives the values of the estimated density shape at the data points and is
similar to the mountain function proposed by Yager and Filev [9], which is used to obtain
initial values for a clustering method. Note that the original mountain function is defined
on grid nodes, whereas equation (8) is defined on the data points. In equation (8), we can
set H equal to $SG_p$, $SC_p$ or $SK_E^p$. We now explain how to use equation (8) to find a
suitable stabilization parameter p. Suppose that the density shape estimates are unchanged
between p=10 and p=11; then the correlation of $\{\hat f_H(x_1), \ldots, \hat f_H(x_n)\}$
between p=10 and p=11 will be very close to one. In this situation, p=10 is a suitable
parameter estimate. In this way, we can skip the step of finding the modes of the density
estimate. In our experiments, a good operating range for p always falls between 1 and 50,
because we normalize the kernel function by dividing by the normalization parameter $\beta$.
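The procedure just described can be sketched in a few lines (our own illustrative rendering; the stability threshold of 0.999 on the correlation is an assumption for the sketch, not a value prescribed in the paper):

```python
import numpy as np

def density_shape(data, p, beta):
    """Equation (8) with H = SK_E^p = K_E^{p+1}: the estimated density
    shape evaluated at the data points themselves."""
    d2 = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1) / beta
    return np.sum(np.where(d2 <= 1.0,
                           np.clip(1.0 - d2, 0.0, None) ** (p + 1), 0.0), axis=1)

def estimate_p(data, p_max=50, M=1, threshold=0.999):
    """Return the smallest p whose shape estimate is already stable, i.e.
    the correlation between the shapes at p and p+M is close to one."""
    beta = np.mean(np.sum((data - data.mean(axis=0)) ** 2, axis=1))
    prev = density_shape(data, 1, beta)
    for p in range(1 + M, p_max + 1, M):
        cur = density_shape(data, p, beta)
        if np.corrcoef(prev, cur)[0, 1] >= threshold:
            return p - M          # shape no longer changes: p - M is suitable
        prev = cur
    return p_max
```

This loop plays the role of reading off the first solid circle point that comes very close to the dotted line in the correlation plots of Fig. 4, without ever computing the modes themselves.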
Example 1. We use the previously described graphical method of correlation comparisons
to find a suitable p for the data set shown in Fig. 3. The correlation values of
$\{\hat f_{SK_E^p}(x_1), \ldots, \hat f_{SK_E^p}(x_n)\}$ and
$\{\hat f_{SK_E^{p+M}}(x_1), \ldots, \hat f_{SK_E^{p+M}}(x_n)\}$ are shown in Fig. 4.
Figures 4(a), 4(b), 4(c) and 4(d) give the results for the increased shifts M=1, 2, 3 and 4,
respectively. In Fig. 4(a), the y-coordinate of the first solid circle point denotes the correlation
value between p=1 and p=2. The y-coordinate of the second solid circle point denotes the
correlation value between p=2 and p=3. Thus the first, second and third solid circle points,
etc., denote the correlation values of the respective pairs (p=1, p=2), (p=2, p=3) and
(p=3, p=4), etc. In Fig. 4(b), the y-coordinate of the first solid circle point denotes the
correlation value between p=1 and p=1+2=3. The y-coordinate of the second solid circle point
denotes the correlation value between p=3 and p=5. The first, second and third solid circle
points, etc., therefore denote the correlation values of the respective pairs (p=1, p=3),
(p=3, p=5) and (p=5, p=7), etc. The others can be similarly induced. Since the density shape
is unchanged when the correlation value is close to one, we may choose the p whose solid
circle point comes closest to the dotted line of value one.
Fig. 4. The correlation values of $\{\hat f_{SK_E^p}(x_1), \ldots, \hat f_{SK_E^p}(x_n)\}$ and
$\{\hat f_{SK_E^{p+M}}(x_1), \ldots, \hat f_{SK_E^{p+M}}(x_n)\}$ where (a) M=1, (b) M=2,
(c) M=3 and (d) M=4.
In Fig. 4(a), with the increased shift M=1, the 10th solid circle point is very close to the
dotted line (i.e., the correlation value between p=10 and p=11 is very close to one). Hence,
p=10 is a suitable estimate. We will explain later why we do not choose a point that lies
exactly on the dotted line. In Fig. 4(b), with the increased shift M=2, the 6th solid circle
point is close to the dotted line (i.e., the correlation value between p=11 and p=13 is close
to one). Hence, p=11 is a suitable estimate. In Fig. 4(c), with the increased shift M=3, the
correlation value between p=13 and p=16 is very close to one. Hence, p=13 is a suitable
estimate. Similarly,
p=13 is a suitable estimate, as shown in Fig. 4(d). Figure 5 illustrates the density estimates
using $\hat f_{SK_E^p}$ with p=10, 11, 12 and 13. We find that these density estimates closely
match the histogram of the data. The following is another example with two-dimensional data.
Fig. 5. The data histograms and density estimates using generalized
Epanechnikov kernel with p=10, 11, 12 and 13.
Example 2. We use a 2-dimensional data set in this example. Figure 6(a) shows a 16-cluster
data set where the data points in each group are uniformly generated from rectangles. Figures
6(b), 6(c) and 6(d) are the correlation values with the increased shifts M=1, 2 and 3,
respectively. In Fig. 6(b), the 13th point is close to the dotted line and hence p=13 is a good
estimate. In Fig. 6(c), the 7th point is close to the dotted line and hence we choose p=13.
Figure 6(d) also indicates that p=13 is a suitable estimate. In this example, our graphical
method of correlation comparisons with different increased shift M presents all the same
results. The density estimates using PESK with p=1, 5, 10 and 15 are shown in Figs. 7(a),
7(b), 7(c) and 7(d), respectively. In section 3.2, equations (4) and (5) show that a small p
value will cause the kernel density estimate to have only one mode with the sample mean.
Figures 3 and 7(a) also verify this point. The selected p=13 is between p=10 and p=15 whose
density shapes are shown in Figs. 7(c) and 7(d) that match the original data structure well.
Fig. 6. (a) The 16-cluster data set. (b), (c) and (d) present the correlation values with the increased shift M=1, 2
and 3, respectively.
Our graphical method of correlation comparisons can find a good density estimate. This
estimation method accomplishes the same tasks as the bandwidth selection method. However,
our method skips the step of finding all modes of the density estimate, so it is much simpler
and computationally cheaper. We estimate a suitable stabilization parameter p whose operating
range always lies between 1 and 50. Note that if the increased shift M is large, such as M=4
or 5, the method may miss a good estimate for p. On the other hand, a very small increased
shift, such as M<1, may take too much computation time. We suggest that taking M=1 for the
graphical method of correlation comparisons performs well in most simulations.
We now explain why we did not choose the point that lies exactly on the dotted line, for the
following two reasons. First, the stabilization parameter p is similar to the number of
the bars drawn on the data histogram. The density shape with a large p corresponds to the
16
200
300
p=1
12
34
56y
01
400
3 x2
1
70
65
4
7
(a)
85
95
105
115
p=5
12
34
56y
01
125
135
3 x2
1
70
65
4
7
(b)
60
70
80
p=10
12
34
56y
01
90
100
3 x2
1
70
65
4
7
(c)
45
55
65
75
p=15
12
34
56y
01
85
95
3 x2
1
70
65
4
7
(d)
histogram with a large number of bars and hence has too many modes. Equations (6) and (7)
also show this property. Second, the stabilization parameter p is the power of the kernel
function, which takes values between zero and one. With a large p, each data point has a value
of equation (8) close to one, and hence the correlation value becomes large. Figures 4 and 6
also show this tendency in the curvilinear tail. For these two reasons, we suggest choosing
the estimate of p at the point that comes very close to the dotted line.
Fig. 7. The density estimates of the 16-cluster data set, where (a), (b), (c) and (d) are the
$\hat f_{SK_E^p}$ values of each data point with p=1, 5, 10 and 15, respectively.
3.4. A Mean Shift-Based Clustering Algorithm
Since the generalized Epanechnikov kernel $K_E^p$ has the robustness property analyzed
in Section 3.1, we use the $K_E^p$ kernel in the mean shift procedure. Combining it with the
graphical method of correlation comparisons for estimating the stabilization parameter p, we
propose a mean shift-based clustering method (MSCM) with the $K_E^p$ kernel, called
MSCM($K_E^p$). The proposed MSCM($K_E^p$) procedure is constructed in four steps:
(1) select the kernel $K_E^p$; (2) estimate the stabilization parameter p; (3) use the mean
shift procedure; (4) identify the clusters. The MSCM($K_E^p$) procedure, with its diagram
shown in Fig. 8, is summarized as follows.
The MSCM($K_E^p$) procedure
Step 1. Choose the $K_E^p$ kernel.
Step 2. Use the graphical method of correlation comparisons to estimate p.
Step 3. Use the nonblurring mean shift procedure.
Step 4. Identify the clusters.
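The four steps can be condensed into a short sketch (our own illustrative rendering, not the authors' code; Step 2 is represented by passing in a pre-estimated p, and the merge radius used in Step 4 is an assumption of the sketch):

```python
import numpy as np

def mscm(data, p, iters=100, tol=1e-6, merge_tol=1e-2):
    """Mean shift-based clustering with the generalized Epanechnikov kernel
    K_E^p (Step 1). p is assumed already estimated, e.g. by the graphical
    method of correlation comparisons (Step 2)."""
    beta = np.mean(np.sum((data - data.mean(axis=0)) ** 2, axis=1))
    def k(d2):                                    # K_E^p with normalization beta
        r = d2 / beta
        return np.where(r <= 1.0, np.clip(1.0 - r, 0.0, None) ** p, 0.0)
    modes = data.copy()
    for _ in range(iters):                        # Step 3: nonblurring mean shift
        d2 = np.sum((modes[:, None, :] - data[None, :, :]) ** 2, axis=-1)
        kv = k(d2)
        new = kv @ data / kv.sum(axis=1, keepdims=True)
        if np.max(np.abs(new - modes)) < tol:
            modes = new
            break
        modes = new
    centers, labels = [], np.full(len(data), -1)  # Step 4: merge converged points
    for i, m in enumerate(modes):
        for c, ctr in enumerate(centers):
            if np.sum((m - ctr) ** 2) < merge_tol * beta:
                labels[i] = c
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return np.array(centers), labels
```

The cluster count emerges from the merge step rather than being fixed in advance, which is how the procedure addresses the cluster validity problem.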
Fig. 8. The diagram of the proposed MSCM($K_E^p$) procedure.
We first implement the MSCM($K_E^p$) procedure for the data set shown in Fig. 3.
According to Fig. 4(a), p=10 is a suitable estimate for the data set in Fig. 3(a). The results
of the MSCM($K_E^p$) procedure are shown in Fig. 9. The curve represents the density estimate
$\hat f_{SK_E^p}$ with p=10. The locations of the data points are illustrated by the
histograms. Figures 9(a)~(f) show these data histograms after implementing the MSCM($K_E^p$)
procedure
[Fig. 8 diagram: Data points -> Select a kernel: (1) choose the kernel $K_E^p$ (used in this
paper); (2) choose the kernel $G_p$ or $C_p$. -> Estimate the stabilization parameter p:
(1) use the proposed graphical method of correlation comparisons (used in this paper);
(2) directly assign a value. -> Use the mean shift procedure: (1) nonblurring mean shift
(used in this paper); (2) general mean shift; (3) blurring mean shift. -> Identify clusters:
(1) merge data; (2) other methods (discussed in Section 4).]
with the iterative times T=1, 5, 10, 20, 25 and 77. The MSCM(K_PE) procedure converges at T=77. The algorithm terminates when the locations of all data points are unchanged or when the iterative time reaches the maximum T=100. Figure 9(f) shows that all data points are centralized to three locations, which are the modes of the density estimate f̂_SK_PE. The MSCM(K_PE) procedure finds that these three clusters do indeed match the data structure.
Fig. 9. The density estimates using the MSCM(K_PE) procedure with p=10 and the histograms of the data points where (a) T=1, (b) T=5, (c) T=10, (d) T=20, (e) T=25 and (f) T=77.
We also implement the MSCM(K_PE) procedure on the data set in Fig. 6(a). According to the graphical method of correlation comparisons shown in Fig. 6(b), p=13 is a suitable estimate. The MSCM(K_PE) results after the iterative times T=1, 2 and 5 are shown in Figs. 10(a), 10(b) and 10(c), respectively. The data histograms show that all data points are centralized to 16 locations, which match the data structure, when the iterative time reaches T=5. We mention that no initialization problems occur with our method because the stabilization parameter p is estimated by the graphical method of correlation comparisons. We can also solve the cluster validity problem by merging similar data points, as in many progressive clustering methods [30], [37], [38].
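The merging step used to identify clusters can be sketched as follows: converged locations lying within a small tolerance of one another are treated as the same mode and share a label. The tolerance value here is an assumption for illustration.

```python
import numpy as np

def merge_labels(Z, tol=1e-3):
    """Label points by merging converged mean shift locations: two
    points get the same label when they ended on the same mode."""
    labels, modes = [], []
    for z in np.asarray(Z, dtype=float):
        for j, m in enumerate(modes):
            if np.abs(z - m).max() < tol:   # same mode as an earlier point
                labels.append(j)
                break
        else:
            modes.append(z)                 # a new mode found
            labels.append(len(modes) - 1)
    return labels, modes
```

The number of distinct modes found this way is the cluster number estimate.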
Fig. 10. (a), (b) and (c) are the histograms of the data locations after using the MSCM(K_PE) procedure with p=13 and iterative times T=1, 2 and 5 (convergence), respectively.
We have shown that the mean shift can be a robust clustering method by the definition of the M-estimator, and we have also discussed the ϕ function that denotes the degree of influence of an individual observation for the kernels G_P, C_P and K_PE. In comparisons of computational time, our graphical method of correlation comparisons for finding the stabilization parameter p is much faster than the bandwidth selection method. Finding all cluster centers with the bandwidth selection method requires computational complexity O(n²st₁) for the first selected bandwidth, where s and t₁ denote the data dimension and the algorithm iteration number, respectively. Suppose that we choose a maximum of 50 bandwidths over the operating range; the bandwidth selection method would then require computational complexity O(n²s Σ_{i=1}^{50} t_i) to find a suitable bandwidth. In the MSCM(K_PE) procedure, the stabilization parameter p always lies in [1, 50] because we normalize the kernel function. Thus, it requires O(50n²s) to find a suitable p when the increased shift is M=1. The MSCM(K_PE) procedure is faster still if we set the increased shift M to be greater than 1.
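As a back-of-envelope comparison, with the common O(n²s) factor dropped and assumed per-bandwidth iteration counts:

```python
# Rough operation-count comparison with the common O(n^2 s) factor
# dropped.  The per-bandwidth iteration counts t_i are assumptions.
t = [20] * 50                       # assumed iterations per tested bandwidth
bandwidth_selection_cost = sum(t)   # ~ sum_{i=1}^{50} t_i full mean shift runs
correlation_search_cost = 50        # one density evaluation per p in [1, 50]
```

Even with modest iteration counts per bandwidth, the correlation-comparison search is cheaper by the average iteration count per run.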
Finally, we discuss the MSCM(K_PE) procedure further. The first step is to choose a kernel function, where we recommend K_PE; the kernels G_P and C_P may also be chosen. The second step is to estimate the stabilization parameter p using the proposed graphical method of correlation comparisons. Of course, the user can also directly assign a value for p; in our experiments, a suitable p always fell between 10 and 30. The third step is to implement the mean shift procedure. Any of the three kinds of mean shift procedures mentioned in
Section 2 can be used; we suggest the nonblurring mean shift. The fourth step is to classify all data points. If the data set contains only round-shaped clusters, the nonblurring mean shift can label the data points by merging them. However, if the data set contains other cluster shapes, such as line or circle structures, the data points may not be centralized into small regions, and merging them will not work well. We discuss this problem next and also provide some numerical examples and comparisons. We then apply these methods to image processing.
4. EXAMPLES AND APPLICATIONS
Some numerical examples, comparisons and applications are presented in this section. We also consider the computational complexity, cluster validity, and improvements of the mean shift for large continuous and discrete data sets, with an application to image segmentation.
Fig. 11. (a) The data set. (b) The graph of correlation comparisons where p=20 is a suitable estimate. (c), (d), (e) and (f) are the data locations after using the MSCM(K_PE) procedure with iterative times T=1, 3, 5 and 11, respectively.
4.1. Numerical Examples with Comparisons
Example 3. Figure 11(a) presents a data set with a large number of clusters (32), to which we also add some uniformly distributed noisy points. The graphical plot of the correlation
comparisons for the data set is shown in Fig. 11(b), where p=20 is a suitable estimate. The MSCM(K_PE) results when T=1, 3, 5 and 11 are shown in Figs. 11(c), 11(d), 11(e) and 11(f), respectively. We can see that all data points are centralized to 32 locations that suitably match the structure of the data, and the results are not affected by the noisy points. The cluster number can easily be found by merging the data points that are centralized to the same locations. Thus, the MSCM(K_PE) procedure can be a simple and useful unsupervised clustering method. The superiority of our proposed method over other clustering methods is discussed below.
Fig. 12. (a), (b) and (c) present the PC, FS and XB validity indexes when the random initial values are assigned.
We also use the well-known fuzzy c-means (FCM) clustering algorithm [39] to cluster the same data set shown in Fig. 11(a). Suppose that we do not know the cluster number of the data set. We adopt validity indexes, such as the partition coefficient (PC) [40], Fukuyama and Sugeno (FS) [41], and Xie and Beni (XB) [42], to solve the validity problem when using FCM. We therefore need to run the FCM algorithm for each cluster number c=2, 3, … However, the first problem is to assign the locations of the initial values in FCM. We simulate two cases of assignment: random initial values and designed initial values. Figures 12(a), 12(b) and 12(c) present the PC, FS and XB validity indexes when random initial values are assigned. FCM cannot detect those 32 separated clusters very effectively, but it can when designed initial values are assigned, as shown in Fig. 13. Note that the PC index tends to take large values when the cluster number is small, so a local optimum of the PC index curve may present a good cluster number estimate [43]. In the case of the designed initial assignment, all validity measures have an optimal solution at c=32. However, this result is obtained only when the FCM algorithm can divide the data into 32 well-separated clusters. If the initializations are not properly chosen (for example, no centers are initialized in one of these 32 rectangles), we cannot ensure that good partitions are
found by the FCM algorithm. In fact, most clustering algorithms suffer from this initialization problem, which becomes even more difficult in high-dimensional cases. Here we find that our proposed MSCM(K_PE) procedure is robust to initialization.
Fig. 13. (a), (b) and (c) present the PC, FS and XB validity indexes when the designed initial values are
assigned.
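The PC index used above can be sketched as follows; this assumes a fuzzy membership matrix U of shape c×n whose columns sum to one, with Bezdek's partition coefficient PC = (1/n) Σ_i Σ_k u_ik².

```python
import numpy as np

def partition_coefficient(U):
    """Bezdek's partition coefficient PC = (1/n) sum_{i,k} u_ik^2 for a
    fuzzy membership matrix U (c x n, columns sum to 1).  PC equals 1
    for a crisp partition and 1/c for a totally fuzzy one."""
    U = np.asarray(U, dtype=float)
    return float((U ** 2).sum() / U.shape[1])
```

A crisp partition gives PC = 1, which is why the index's large-value tendency at small c must be read with care, as noted above.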
We mentioned that the computational complexity of finding the stabilization parameter estimate p in MSCM(K_PE) is O(50n²s). After estimating p, the computational complexity of implementing the MSCM(K_PE) procedure is O(n²s t_K), where t_K denotes the number of iterations. In FCM, the computational complexity is O(ncs t_C), where t_C is the number of iterations and c is the cluster number. To solve the validity problem, we need to run FCM from c=2 to c=C_max, where C_max is the maximum number of clusters. Thus, the computational complexity of FCM is O(ns Σ_{c=2}^{C_max} c t_c). Although the complexity of the MSCM(K_PE) procedure is high when the data set is large, it directly solves the validity problem in a simple way. To reduce the complexity of the MSCM(K_PE) procedure for large data sets, we propose a technique later.
4.2. Cluster Validity and Cluster Identification for the MSCM(K_PE) Procedure
Note that, after finding the MSCM(K_PE) clustering results, we simply identified the clusters and the cluster number by merging the data points that are centralized to the same location. However, if the data set contains clusters of different shapes, such as a line or a circle structure, the data points may not be centralized to the same small region. Moreover, the estimated density shape may have flat curves at the modes. In this case, the method
of identifying clusters by merging data points may not be an effective means of finding the clusters. If we use the Agglomerative Hierarchical Clustering (AHC) algorithm with linkage methods such as single linkage, complete linkage or Ward's method, we can identify clusters of different shapes from the MSCM(K_PE) clustering results. We demonstrate this identification method with AHC as follows.
Fig. 14. (a) The data set. (b) The graph of correlation comparisons where p=28 is a suitable estimate. (c) The data locations after using the MSCM(K_PE) procedure.
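The identification step can be sketched with SciPy's hierarchical clustering: single-linkage AHC on the shifted locations, cutting the tree at a small height. The cut height used here is an assumed threshold, and the toy input stands in for MSCM(K_PE) output.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy mean-shift output: two groups of nearly coincident shifted locations.
shifted = np.array([[0.00, 0.00], [0.01, 0.00], [0.02, 0.01],
                    [5.00, 5.00], [5.00, 5.02], [5.01, 4.99]])
tree = linkage(shifted, method="single")              # single-linkage AHC
labels = fcluster(tree, t=0.5, criterion="distance")  # cut the tree at 0.5
```

Single linkage chains together points that the mean shift has drawn close, so line- or circle-shaped groups that do not collapse to one location still receive a common label.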
Example 4. Figure 14(a) shows a three-cluster data set with different cluster shapes. The graphical plot of the correlation comparisons is shown in Fig. 14(b), where p=28 is a suitable stabilization parameter estimate. The locations of all data points after processing the MSCM(K_PE) are shown in Fig. 14(c). In this situation, the identification method of merging data points may cause difficulties. Since the mean shift procedure can be seen as a method that shifts similar data points to the same location, AHC with single linkage can help us to find clusters, because similar data points can easily be linked into the same cluster. We process AHC with single linkage on the MSCM(K_PE) clustering results shown in Fig. 14(c) and obtain the hierarchical clustering tree shown in Fig. 15(a). The hierarchical tree shows that the data set contains three well-separated clusters, which are shown in Fig. 15(b) with different symbols. In fact, most clustering algorithms are likely to fail when applied to the data set shown in Fig. 14(a).
By combining MSCM(K_PE) with AHC, we can detect clusters of different shapes for most data sets. This combination may also decrease the convergence time of the MSCM(K_PE) procedure. Note that the estimated density shape may have flat curves at the modes, so that the MSCM(K_PE) procedure becomes too time consuming, especially for a large data set. Since the AHC method can connect close data points into the same cluster, we can stop the MSCM(K_PE) procedure once the data points shift to
the location near the mode. Therefore, we can set a larger stopping threshold for the MSCM(K_PE) procedure or a smaller maximum number of iterations.
Fig. 15. (a) The hierarchical tree of the data set in Fig. 14(c). (b) The three identified clusters shown with different symbols.
4.3. Implementation for Large Continuous Data Sets
One important property of the MSCM(K_PE) procedure is that it takes all data points as initial values, so that these data points can be centralized to the locations of the modes; the cluster validity problem is solved simultaneously in this way. However, this process takes too much time for a large data set. One way to reduce the number of data points for the MSCM(K_PE) procedure is to first merge similar data points in the original data set. Although this technique reduces the number of data points, the MSCM(K_PE) clustering results will then differ from those of the original nonblurring mean shift procedure. Since a mode represents a location where most data are concentrated, nearby data points shift to the mode within a few mean shift iterations. However, these data points must continue the mean shift procedure until all data points are centralized to the modes, because the mean shift procedure is treated as a batch-type procedure.
Although all data points are taken as initial values, there are no constraints on these initial values when we process the mean shift clustering algorithm using equation (3). This means that we can treat the mean shift as a sequential clustering algorithm: the second data point is input only after the first data point has shifted to its mode, the third data point is input only after the second has shifted to its mode, and so on. In this sequential-type mean shift procedure, only the data points far away from the modes require more iterations, whereas most data points (near the modes) require
fewer iterations. The computational complexity of this sequential-type mean shift procedure is O(ns Σ_{i=1}^{n} t_i), where t_i denotes the iteration count of data point x_i. In the batch-type mean shift procedure, by contrast, every data point effectively runs for the largest iteration number, and the computational complexity is O(n²s t_K) with t_K = max{t_1, …, t_n}.
Note that this technique is feasible only for the nonblurring mean shift procedure, such as the proposed MSCM(K_PE) procedure. Since the blurring mean shift algorithm updates the data points at each iteration, the location of each data point keeps changing, so the sequential type is not feasible for the blurring mean shift procedure.
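The sequential variant can be sketched as follows, again with an assumed Epanechnikov-type profile on normalized squared distances. The key point is that each point runs only the t_i iterations it needs before the next point is processed.

```python
import numpy as np

def sequential_mean_shift(X, p=10, max_iter=100, tol=1e-6):
    """Sequential-type nonblurring mean shift: each point is shifted to
    its mode before the next point is processed, so a point near a mode
    stops after a few iterations instead of running for the largest
    iteration count as in the batch version."""
    X = np.asarray(X, dtype=float)
    d2_max = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1).max()
    out, iters = np.empty_like(X), []
    for i in range(len(X)):
        z = X[i].copy()
        for t in range(1, max_iter + 1):
            w = np.maximum(1.0 - ((z - X) ** 2).sum(-1) / d2_max, 0.0) ** p
            z_new = w @ X / w.sum()
            if np.abs(z_new - z).max() < tol:
                break
            z = z_new
        out[i] = z
        iters.append(t)          # t_i: iterations used by data point x_i
    return out, iters
```

The total work is proportional to Σ_i t_i rather than n · max_i t_i, matching the complexity comparison above.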
4.4. Implementation for Large Discrete Data Sets
Suppose that the data set {x_1, …, x_n} only takes values on the set {y_1, …, y_m} with corresponding counts {n_1, …, n_m}. That is, there are n_1 observations of {x_1, …, x_n} with value y_1, n_2 observations with value y_2, and so on. For example, a 128×128 grey level image has 16384 data points (pixels), but it only takes values on the grey level set {0, 1, …, 255}. The mean shift procedure has a large computational complexity in this n=16384 case. The following theorem provides a technique that can greatly reduce the computational complexity.
Theorem 1. Suppose that x ∈ {x_1, …, x_n}, y ∈ {y_1, …, y_m} and x = y. The mean shift of y is defined by

  m_K(y) = [ Σ_{i=1}^{m} k(||y_i − y||²) w(y_i) n_i y_i ] / [ Σ_{i=1}^{m} k(||y_i − y||²) w(y_i) n_i ]    (9)

Then we have m_K(x) = m_K(y). That is, the mean shift of x using equation (3) can be replaced by the mean shift of y using equation (9) with equivalent results.
Proof: Since {x_1, …, x_n} only takes values on the set {y_1, …, y_m} with corresponding counts {n_1, …, n_m}, we have Σ_{i=1}^{m} k(||y_i − y||²) w(y_i) n_i = Σ_{j=1}^{n} k(||x_j − x||²) w(x_j) and Σ_{i=1}^{m} k(||y_i − y||²) w(y_i) n_i y_i = Σ_{j=1}^{n} k(||x_j − x||²) w(x_j) x_j. Thus, m_K(x) = m_K(y). □
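Theorem 1 can be checked numerically: one weighted step over the distinct values matches a step over the raw observations. A minimal 1-D sketch, with w ≡ 1 and a Gaussian-type profile as simplifying assumptions:

```python
import numpy as np

def step(x, pts, counts, p=10):
    """One mean shift step in the form of Eq. (9): the kernel sums run
    over the m distinct values y_i weighted by their multiplicities n_i
    (w(y) = 1 and a Gaussian-type profile are assumptions here)."""
    w = np.exp(-p * (pts - x) ** 2) * counts
    return float(w @ pts / w.sum())

# Raw observations vs. their (value, count) representation.
raw = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 9.0])
vals, cnts = np.unique(raw, return_counts=True)
```

Calling `step` on `raw` with unit counts and on `(vals, cnts)` from the same query point gives the same shifted value, which is exactly the content of the theorem.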
We can take {y_1, …, y_m} as initial values and then process the mean shift procedure using equation (9) in the MSCM(K_PE) procedure. The final locations of {y_1, …, y_m} under equation (9) are equivalent to the final locations of {x_1, …, x_n} under equation (3). In this discrete data case, the computational complexity of the MSCM(K_PE) procedure becomes O(m²st), where t denotes the iteration count and m<n. The sequential type discussed above for the continuous data case can also be used to further reduce the computational complexity in this discrete case. The following is a simple application to image segmentation.
Fig. 16. (a) The Lenna image with size 128×128. (b) The results using the MSCM(K_PE) procedure with p=15 when T=5. (c) The results with p=15 when T=15. (d) The results with p=15 when T=98.
4.5. Application in Image Segmentation
Example 5. Figure 16(a) is the well-known Lenna image of size 128×128. The image contains 16384 pixel values with a maximum value of 237 and a minimum value of 33, so it only takes values on {33, 34, …, 237}; the count frequency of these pixel values is shown in Fig. 17(a). The computational complexity of the original nonblurring mean shift procedure is O(n²st) = O(16384²st). After applying the technique of equation (9), the computational complexity of the nonblurring mean shift procedure in this discrete case becomes O(m²st) = O(205²st), a very large reduction. The graph of correlation comparisons for the MSCM(K_PE) procedure is shown in Fig. 17(b), where p=15 is a suitable estimate. The estimated density shape with p=15 is shown in Fig. 17(c); it closely matches the pixel counts and has four modes. The MSCM(K_PE) clustering results when T=5, T=15 and T=98 are shown in Figs. 16(b), 16(c) and 16(d), respectively. Following convergence, all pixel values are centralized to four locations, indicated by the arrows on the x-coordinate in Fig. 17(d). Note that, for most c-means based clustering algorithms, c=8 is usually used as the cluster number for the Lenna data set. Combining such an algorithm with a validity index to search for a good cluster number estimate would make the computational complexity very large. In the MSCM(K_PE) procedure, a suitable cluster number estimate is found automatically, and the computational
complexity for this discrete large data set can be reduced by using the above-mentioned technique.
Fig. 17. (a) The pixel counts of the Lenna image. (b) The graph of correlation comparisons where p=15 is a suitable estimate. (c) The density estimate with p=15. (d) All pixel values are centralized to four locations after using the MSCM(K_PE) procedure with iterative time T=98.
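The size of the reduction in Example 5 is easy to check: per iteration, the batch procedure performs n² kernel evaluations on the raw pixels versus m² on the distinct grey levels.

```python
n = 128 * 128      # 16384 pixels in the Lenna image
m = 237 - 33 + 1   # 205 grey levels in the observed range {33, ..., 237}
reduction = n ** 2 // m ** 2   # per-iteration kernel evaluations saved
```

So the discrete form does roughly six thousand times fewer kernel evaluations per iteration.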
For color image data, each pixel is a three-dimensional data point whose dimensions each take values on the set {0, 1, …, 255}, and hence there are 256×256×256 possible pixel values. However, for a 128×128 color image, the worst case is 128×128 distinct pixel values. In general, image data contain many repeated pixel values, and hence the technique can also significantly reduce the computational complexity even for color images.
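For color images the same compression applies: the number of distinct pixel values m is bounded by the number of pixels, and real images repeat colours heavily. A toy count with NumPy (the image here is random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.integers(0, 4, size=(128, 128, 3))   # toy image with few colours
pixels = img.reshape(-1, 3)                    # n = 16384 pixel vectors
vals, cnts = np.unique(pixels, axis=0, return_counts=True)
m = len(vals)    # distinct pixel values actually present, m << n
```

The mean shift of equation (9) would then run over the m rows of `vals` with multiplicities `cnts` instead of over all n pixels.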
4.6. Comparisons with Other Kernel-Based Clustering Algorithms
Comaniciu [10] proposed the variable-bandwidth mean shift with data-driven bandwidth selection. To demonstrate the bandwidth selection method, Comaniciu [10] used a data set drawn with equal probability from the normal distributions N(8,2), N(25,4), N(50,8) and N(100,16) with total n=400. Figure 18(a) shows the histogram of this data set. Comaniciu
[10] used 12 analysis bandwidths in the range 1.5–17.46, where four classes of bandwidth were detected (see Ref. [10], pages 284-285). Based on our proposed MSCM(K_PE), the graph of correlation comparisons for the data set of Fig. 18(a) gives p=8, and four clusters are obtained, as shown in Figs. 18(b) and 18(c), respectively.
Fig. 18. (a) Histogram of the data set. (b) The graph of correlation comparisons. (c) The hierarchical tree of the data after implementing the MSCM(K_PE) algorithm.
Furthermore, we use the data set of nonlinear structures with multiple scales from Comaniciu [10] for our next comparison, as shown in Fig. 19(a). The graph of correlation comparisons, the hierarchical tree of the data after implementing the MSCM(K_PE) algorithm, and the four identified clusters shown in different symbols are given in Figs. 19(b), 19(c) and 19(d), respectively. These four identified clusters from MSCM(K_PE) coincide with those of the variable-bandwidth mean shift in Comaniciu [10]. Note that the variable-bandwidth mean shift in Comaniciu [10] is a good technique for finding a suitable bandwidth for each data point, but it requires implementing the fixed-bandwidth mean shift with many different selected bandwidths and is hence time consuming. In our MSCM(K_PE) method, we do not focus on estimating a good bandwidth for each data point, but on estimating the stabilization parameter p so that the MSCM(K_PE) with this suitable p provides suitable final results. In this case, the mean shift process needs to be implemented only once.
Fig. 19. (a) The data set. (b) The graph of correlation comparisons. (c) The hierarchical tree of the data after implementing the MSCM(K_PE) algorithm. (d) The four identified clusters shown in different symbols.
The kernel k-means proposed by Girolami [4] is another kernel-based clustering algorithm. The cluster numbers of the data sets Iris (c=3), Wine (c=3) and Crabs (c=4) are correctly detected by kernel k-means (see Ref. [4], pages 783-784). We implement the proposed method on the Iris, Wine and Crabs data sets, with the MSCM(K_PE) results shown in Figs. 20, 21 and 22, respectively. The MSCM(K_PE) algorithm with p=10 indicates that c=2 is suitable for the Iris data. However, there are also some clues that c=3 is suitable for the Iris data according to the results in Fig. 20(b); this is because one of the Iris clusters is separable from the other two overlapping clusters. The results for the Wine data from the MSCM(K_PE), shown in Fig. 21(b), give the possible cluster numbers c=2, c=3 or c=5. The results from the MSCM(K_PE) for the Crabs data, shown in Fig. 22(b), are the same as those of kernel k-means with c=4.
Although the mean shift can move data points to the dense regions of the data, these regions may be very flat, so that the proposed MSCM(K_PE) with a linkage method, such as single linkage, may not give a clear cluster number estimate. This situation appears in some real
data applications. We recommend using additional linkage methods, such as complete linkage and Ward's method, to offer more clustering information. In our final example, we use the Cmc data set from the UCI machine learning repository [44] to compare the proposed MSCM(K_PE) with kernel k-means. The Cmc data set has 10 attributes, 1473 observations and three clusters. The MSCM(K_PE) results for the Cmc data set are shown in Fig. 23, where single linkage, complete linkage and Ward's method all indicate that the data set contains three clusters, consistent with the data structure. Although the kernel k-means using the eigenvalue decomposition of the kernel matrix proposed by Girolami [4] can estimate the correct cluster number for the Iris, Wine and Crabs data sets, the cluster number estimates depend strongly on the selected radial basis function (RBF) kernel width. The eigenvalue decomposition results with different RBF kernel widths for the Cmc data set, based on the kernel matrix, are shown in Fig. 24. The correct result c=3 shown in Fig. 24(a) depends on the suitably chosen RBF kernel width of 15; the other choices, 10 and 5, as shown in Figs. 24(b) and 24(c), cannot detect the cluster number c=3.
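The width sensitivity of the eigenvalue-decomposition estimate can be sketched as follows: count the dominant eigenvalues of an RBF kernel matrix for a toy three-cluster data set. The widths, the dominance threshold and the data are assumptions for illustration, not the Cmc setting.

```python
import numpy as np

def dominant_eigs(X, width, thresh=0.05):
    """Count eigenvalues of the RBF kernel matrix that carry more than
    `thresh` of the total spectrum -- a rough cluster-number estimate
    in the spirit of Girolami's eigenvalue decomposition."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * width ** 2))   # RBF kernel matrix
    lam = np.linalg.eigvalsh(K)            # all >= 0 since K is PSD
    return int((lam / lam.sum() > thresh).sum())

# Three well-separated toy clusters along the diagonal.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.2, (40, 2)) for c in (0.0, 4.0, 8.0)])
```

With a width matched to the cluster scale the count recovers three clusters, while a much larger width blurs the blocks of the kernel matrix together and the count changes, mirroring the width dependence observed for the Cmc data.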
Fig. 20. (a) The graph of correlation comparisons and (b) the hierarchical tree after implementing the MSCM(K_PE) algorithm for the Iris data.
Fig. 21. (a) The graph of correlation comparisons and (b) the hierarchical tree after implementing the MSCM(K_PE) algorithm for the Wine data.
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
05
1015
2025
Hei
ght
5040302010
1.000
0.995
0.990
Index
p=18
0.00
0.01
0.02
0.03
0.04
0.05
0.06
1 2 3 4 5 6 7 8 9 10
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
1 2 3 4 5 6 7 8 9 10
1 2 3 4 56 789 10 111213 14 1516 17 1819 202122 23 2425 26272829 30 313233 3435 36 3738 3940 41 4243 444546 47484950 51 52 53 545556 57 5859 60 6162 63646566 67686970 71 727374 75 7677 78 798081 8283 84 85 8687 888990 91 9293 9495 96 9798 99100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653 65
4
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
010
020
030
0
Hei
ght
Fig. 22. (a) The graph of correlation comparisons and (b) the hierarchical tree after implementing the MSCM(K_PE) algorithm for the Crabs data.
Fig. 23. (a) Estimated p for the Cmc data. (b), (c) and (d) are the single linkage, complete linkage and Ward's method results for the Cmc data after implementing the MSCM(K_PE) algorithm.
Fig. 24. The eigenvalue decomposition of the kernel matrix to estimate the cluster number of the Cmc data. (a), (b) and (c) are the results for RBF kernel widths of 15, 10 and 5, respectively.
Finally, we note that merging data points using AHC after the MSCM(K_PE) mean shift process not only provides the estimated cluster number, but also gives more information about the data structure. For example, observation 19 of the Wine data, shown in Fig. 21(b), and observation 141 of the Crabs data, shown in Fig. 22(b), can be seen as abnormal observations. Figure 18(c) also shows that observation 314 is far away from the other data points. In unsupervised clustering, our objective is not only to detect the correct cluster number for a data set with an unknown number of clusters, but also to discover a reasonable cluster structure for the data set.
5. CONCLUSIONS
In this paper, we proposed a mean shift-based clustering procedure, called MSCM(K_PE). The proposed MSCM(K_PE) is robust in three facets. In facet 1, since we used the mountain function concept to estimate the stabilization parameter, the bandwidth selection problem for the density estimation can be solved graphically. Moreover, the operating range of the stabilization parameter is fixed for various data sets. This makes the proposed method robust to initialization.
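As a rough illustration (not the paper's exact formulation), the mountain function of Yager and Filev [9], which underlies the stabilization-parameter estimation, scores a candidate point by the kernel-weighted data mass around it. The function name and the Gaussian weighting below are assumptions for the sketch:

```python
import numpy as np

def mountain(v, data, sigma):
    """Mountain function (Yager & Filev [9]): a kernel-density-like score
    of how much data mass surrounds candidate point v; a minimal sketch
    with a Gaussian weight and bandwidth sigma."""
    d2 = np.sum((data - v) ** 2, axis=1)      # squared distances to v
    return float(np.sum(np.exp(-d2 / (2.0 * sigma ** 2))))
```

Dense regions receive high mountain values, so sweeping sigma and comparing the resulting scores is one way to read off a suitable operating range for the parameter.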
In facet 2, we discussed the problems of cluster validity and cluster identification that the mean shift procedure faces. According to the properties of the nonblurring mean shift procedure, we suggested combining it with agglomerative hierarchical clustering (AHC) using single linkage to identify differently shaped clusters. This technique can also reduce the computational cost of the MSCM(K_PE) procedure by setting a larger stopping threshold or a smaller maximum iteration count. Thus, the proposed method is robust to different cluster shapes in the data set.
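The merging step above can be sketched as follows: converged mean shift modes whose single-linkage (nearest-neighbor) distance falls below a threshold are grouped into one cluster. The function name, the union-find implementation, and the threshold parameter are illustrative assumptions, not the paper's code:

```python
import numpy as np

def single_linkage_merge(points, threshold):
    """Group points whose pairwise distance is below `threshold`,
    chaining groups transitively (single linkage) via union-find.
    Returns one integer label per point; a sketch of merging
    converged mean-shift modes into clusters."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path-halving find: walk to the root, compressing as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < threshold:
                parent[find(i)] = find(j)   # union the two groups

    roots = {}
    labels = []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

Because single linkage only chains nearby modes, elongated or otherwise non-spherical clusters are merged correctly, which is the motivation stated above.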
In facet 3, we analyzed the robust properties of the mean shift procedure via the nonparametric M-estimator and the ϕ functions of three kernel classes. We then focused on the generalized Epanechnikov kernel, under which extremely large or small data points have no influence on the mean shift procedure. Our demonstrations showed that the proposed method is not influenced by noise and hence is robust to noise and outliers. We also provided numerical examples, comparisons and applications to illustrate the superiority of the proposed MSCM(K_PE) procedure, including computational complexity, cluster validity, improvements of the mean shift in large continuous and discrete data sets, and image segmentation.
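For concreteness, the nonblurring mean shift at the core of the procedure repeatedly replaces each point by a kernel-weighted mean of the original data until the points settle at density modes. The sketch below uses a Gaussian kernel for simplicity; the paper's generalized Epanechnikov kernel would replace the weight formula (and is what yields the hard cutoff on distant points):

```python
import numpy as np

def mean_shift(data, bandwidth, iters=50, tol=1e-6):
    """Nonblurring mean shift sketch: each point is iteratively moved to
    the kernel-weighted mean of the *original* data (the data itself is
    never modified). Gaussian weights stand in for the paper's
    generalized Epanechnikov kernel."""
    modes = data.astype(float).copy()
    for _ in range(iters):
        shifted = np.empty_like(modes)
        for k, x in enumerate(modes):
            d2 = np.sum((data - x) ** 2, axis=1)
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # kernel weights
            shifted[k] = w @ data / w.sum()           # weighted mean
        converged = np.max(np.abs(shifted - modes)) < tol
        modes = shifted
        if converged:
            break
    return modes
```

After convergence, points that started in the same cluster share (approximately) the same mode, which is what the subsequent AHC merging step exploits.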
Acknowledgements
The authors are grateful to the anonymous referees for their critical and constructive
comments and suggestions to improve the presentation of the paper. This work was supported
in part by the National Science Council of Taiwan, under Kuo-Lung Wu’s Grant:
NSC-95-2118-M-168-001 and Miin-Shen Yang’s Grant: NSC-95-2118-M-033-001-MY2.
References [1] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[2] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines,
Cambridge Univ. Press, Cambridge, 2000.
[3] F. Camastra and A. Verri, A novel kernel method for clustering, IEEE Trans. Pattern
Analysis and Machine Intelligence 27 (2005) 801-805.
[4] M. Girolami, Mercer kernel based clustering in feature space, IEEE Trans. Neural
Networks 13 (2002) 780-784.
[5] K.R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, An introduction to
kernel-based learning algorithms, IEEE Trans. Neural Networks 12 (2001) 181-202.
[6] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman and Hall,
New York, 1998.
[7] K. Fukunaga and L.D. Hostetler, The estimation of the gradient of a density function with
applications in pattern recognition, IEEE Trans. Information Theory 21 (1975) 32-40.
[8] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Analysis and
Machine Intelligence 17 (1995) 790-799.
[9] R.R. Yager and D.P. Filev, Approximate clustering via the mountain method, IEEE Trans.
Systems, Man and Cybernetics 24 (1994) 1279-1284.
[10] D. Comaniciu, An algorithm for data-driven bandwidth selection, IEEE Trans. Pattern
Analysis and Machine Intelligence 25 (2003) 281-288.
[11] D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature space analysis,
IEEE Trans. Pattern Analysis and Machine Intelligence 24 (2002) 603-619.
[12] K.I. Kim, K. Jung and J.H. Kim, Texture-based approach for text detection in images
using support vector machines and continuously adaptive mean shift algorithm, IEEE
Trans. Pattern Analysis and Machine Intelligence 25 (2003) 1631-1639.
[13] X. Yang and J. Liu, Unsupervised texture segmentation with one-step mean shift and
boundary markov random fields, Pattern Recognition Letters 22 (2001) 1073-1081.
[14] D. Comaniciu, V. Ramesh and P. Meer, Kernel-based object tracking, IEEE Trans.
Pattern Analysis and Machine Intelligence 25 (2003) 564-577.
[15] N.S. Peng, J. Yang and Z. Liu, Mean shift blob tracking with kernel histogram filtering
and hypothesis testing, Pattern Recognition Letter 26 (2005) 605-614.
[16] O. Debeir, P.V. Ham, R. Kiss and C. Decaestecker, Tracking of migrating cells under
phase-contrast video microscopy with combined mean-shift processes, IEEE Trans.
Medical Imaging 24 (2005) 697-711.
[17] H. Chen and P. Meer, Robust fusion of uncertain information, IEEE Trans. Systems,
Man and Cybernetics-Part B: Cybernetics 35 (2005) 578-586.
[18] M. Fashing and C. Tomasi, Mean shift is a bound optimization, IEEE Trans. Pattern
Analysis and Machine Intelligence 27 (2005) 471-474.
[19] S.L. Chiu, Fuzzy model identification based on cluster estimation, J. Intelligent and
Fuzzy Systems 2 (1994) 267-278.
[20] M.S. Yang and K.L. Wu, A modified mountain clustering algorithm, Pattern Analysis and
Applications 8 (2005) 125-138.
[21] N.R. Pal and D. Chakraborty, Mountain and subtractive clustering method:
Improvements and generalizations, Int. J. Intelligent Systems 15 (2000) 329-341.
[22] B.P. Velthuizen, L.O. Hall, L.P. Clarke and M.L. Silbiger, An investigation of mountain
method clustering for large data sets, Pattern Recognition 30 (1997) 1121-1135.
[23] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, 1990.
[24] G. Beni and X. Liu, A least biased fuzzy clustering method, IEEE Trans. Pattern Analysis
and Machine Intelligence 16 (1994) 954-960.
[25] P.J. Huber, Robust Statistics. Wiley, 1981.
[26] X. Zhuang, T. Wang, and P. Zhang, A highly robust estimator through partially likelihood
function modeling and its application in computer vision, IEEE Trans. Pattern Analysis
and Machine Intelligence 14 (1992) 19-35.
[27] K.L. Wu and M.S. Yang, Alternative c-means clustering algorithms, Pattern Recognition
35 (2002) 2267-2278.
[28] C.V. Stewart, Minpran: A new robust estimator for computer vision, IEEE Trans. Pattern
Analysis and Machine Intelligence 17 (1995) 925-938.
[29] R.N. Dave and R. Krishnapuram, Robust clustering methods: A unified view, IEEE
Trans. Fuzzy Systems 5 (1997) 270-293.
[30] H. Frigui and R. Krishnapuram, A robust competitive clustering algorithm with
applications in computer vision, IEEE Trans. Pattern Analysis and Machine Intelligence
21 (1999) 450-465.
[31] K. Rose, E. Gurewitz and G.C. Fox, Statistical mechanics and phase transitions in
clustering, Physical Review Letters 65 (1990) 945-948.
[32] K. Rose, E. Gurewitz, and G.C. Fox, A deterministic annealing approach to clustering,
Pattern Recognition Letters 11 (1990) 589-594.
[33] S. Chen and D. Zhang, Robust image segmentation using FCM with special constraints
based on new kernel-induced distance measure, IEEE Trans. Systems, Man and
Cybernetics-Part B: Cybernetics 34 (2004) 1907-1916.
[34] M.N. Ahmed, S.M. Yamany, N. Mohamed, A.A. Farag and T. Moriarty, A modified
fuzzy c-means algorithm for bias field estimation and segmentation of MRI data, IEEE
Trans. Medical Imaging 21 (2002) 193-199.
[35] M.S. Yang and K.L. Wu, A similarity-based robust clustering method, IEEE Trans.
Pattern Analysis and Machine Intelligence 26 (2004) 434-448.
[36] R. Krishnapuram and J.M. Keller, A possibilistic approach to clustering, IEEE Trans.
Fuzzy Systems 1 (1993) 98-110.
[37] R. Krishnapuram, H. Frigui, and O. Nasraoui, Fuzzy and possibilistic shell clustering
algorithm and their application to boundary detection and surface approximation, IEEE
Trans. Fuzzy Systems 3 (1995) 29-60.
[38] H. Frigui and R. Krishnapuram, Clustering by competitive agglomeration, Pattern
Recognition 30 (1997) 1223-1232.
[39] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum
Press, 1981.
[40] J.C. Bezdek, Cluster validity with fuzzy sets, J. Cybernetics 3 (1974) 58-73.
[41] Y. Fukuyama and M. Sugeno, A new method of choosing the number of clusters for
fuzzy c-means method, In: Proceedings of the 5th Fuzzy System Symposium (in
Japanese), pp. 247–250. 1989.
[42] X.L. Xie and G. Beni, A validity measure for fuzzy clustering, IEEE Trans. Pattern
Analysis and Machine Intelligence 13 (1991) 841-847.
[43] K.L. Wu and M.S. Yang, A cluster validity index for fuzzy clustering, Pattern
Recognition Letters 26 (2005) 1275-1291.
[44] C.L. Blake and C.J. Merz, UCI repository of machine learning databases, a collection
of artificial and real-world data sets, 1998. Available from:
http://www.ics.uci.edu/~mlearn/MLRepository.html.