
A Mean Shift-Based Initialization Method for K-means

Iván Cabria
Departamento de Física Teórica, Atómica y Óptica
University of Valladolid, Valladolid, Spain
e-mail: [email protected]

Iker Gondra
Department of Mathematics, Statistics, Computer Science
St. Francis Xavier University, Antigonish, Nova Scotia, Canada
e-mail: [email protected]

Abstract—Because of its conceptual simplicity, k-means is one of the most commonly used clustering algorithms. However, its performance in terms of global optimality depends heavily on both the selection of k and the selection of the initial cluster centers. On the other hand, Mean Shift clustering does not rely upon a priori knowledge of the number of clusters. Furthermore, it finds the modes of the underlying probability density function of the observations, which would be a good choice of initial cluster centers for k-means. We present a Mean Shift-based initialization method for k-means. A comparative study of the proposed and other initialization methods is performed on two real-life problems with very large amounts of data: Facility Location and Molecular Dynamics. In the study, the proposed initialization method outperforms the other methods in terms of clustering performance.

Index Terms—clustering; k-means; mean shift; initialization; facility location; molecular dynamics

I. INTRODUCTION

Clustering differs from classification methods such as discriminant analysis in that classification involves a known number of groups and the operational objective is to assign a new observation to one of these groups. Cluster analysis, on the other hand, makes no assumption about the number of clusters and ideally seeks to find an optimal number of clusters based on some objective function.

In, e.g., Facility Location problems, the optimal number of clusters is usually part of the optimization process. For a given number of clusters k, the number of ways of dividing n observations into k non-empty clusters is known (see e.g., [1], [2]) to be a Stirling number of the second kind, which is given by

$$C_k^{(n)} = \frac{1}{k!}\sum_{j=0}^{k}(-1)^{k-j}\binom{k}{j}\, j^n. \qquad (1)$$

Adding the numbers $C_k^{(n)}$ for $k = 1, 2, \ldots, n$ gives the total number $N = \sum_{k=1}^{n} C_k^{(n)}$ of all possible ways of dividing n data points into groups of different sizes. The question then is: why not compute all possible groupings and select whichever is considered the best? The answer is that, for large values of n, this is computationally intractable and thus, in practice, we rarely examine all possible groupings.
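As a concrete illustration of this intractability, the count in (1) is easy to evaluate exactly in a few lines. The sketch below (in Python; the function name stirling2 is ours, not from the paper) shows how quickly N explodes:

    from math import comb, factorial

    def stirling2(n: int, k: int) -> int:
        # Eq. (1): number of ways to divide n observations into k
        # non-empty clusters (Stirling number of the second kind).
        s = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
        return s // factorial(k)  # the sum is always divisible by k!

    n = 25
    total = sum(stirling2(n, k) for k in range(1, n + 1))
    print(total)  # already ~4.6e18 groupings for only 25 points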

Clustering methods are divided into hierarchical and non-hierarchical. Hierarchical clustering algorithms proceed either by a series of merges or a series of successive divisions (see e.g., [1]). An example of hierarchical clustering is the agglomerative clustering algorithm [1], [3]. Non-hierarchical algorithms start from an initial partition of observations into clusters. It is known that the final assignment of observations depends, to some extent, on the initial partition [1]. An example of a non-hierarchical clustering method is k-means.

In contrast to other methods, k-means can handle the clustering of very large amounts of data in a relatively short amount of time. However, its clustering performance depends heavily on the selection of k and of the initial cluster centers. On the other hand, Mean Shift clustering does not rely upon prior knowledge of the number of clusters (i.e., the value of k in the case of the k-means method). Mean Shift also finds the modes of the underlying probability density function of the observations, which would be a very good choice of initial cluster centers for the k-means method. Both k-means and Mean Shift clustering have been used in a wide range of application domains, e.g., computer vision [4], [5], [6]. We present a Mean Shift-based initialization method for k-means.

The rest of this paper is organized as follows. Brief

overviews of k-means and Mean Shift clustering are given in Section II. The proposed Mean Shift-based k-means initialization is presented in Section III. The comparative study of the proposed and other initialization methods, performed on two real-life problems, is presented in Section IV. Finally, concluding remarks are given in Section V.

II. BACKGROUND

A. K-means

Because of its conceptual simplicity, k-means is arguably the best known and most commonly used clustering algorithm. The traditional k-means clustering algorithm finds the centers (means) of natural clusters in a given dataset $\{x_1, \ldots, x_n\}$, where $x_i \in \mathbb{R}^d$. Given k as the number of clusters, the algorithm partitions the dataset into k clusters by attempting to minimize the total sum of intra-cluster distances, defined as

$$I = \sum_{i=1}^{k}\sum_{x_j \in C_i} \|x_j - \bar{x}_i\|^2 \qquad (2)$$



where $\bar{x}_i$ is the mean vector of the ith cluster $C_i$. It is known that finding the globally optimal cluster centers $\{\bar{x}_1, \ldots, \bar{x}_k\}$ is an NP-hard problem. A problem is NP-hard if there exists an NP-complete problem that is polynomial-time Turing-reducible to it. Thus, in practice, the k-means algorithm is a commonly used heuristic for solving this minimization problem, which is based on the convergence of

$$\bar{x}_i = \frac{1}{|C_i|}\sum_{x_j \in C_i} x_j \qquad (3)$$

where $|C_i|$ is the cardinality, i.e., the number of samples, of the ith cluster. The algorithm iteratively relabels each data point using the Euclidean distance, defined as

$$d_E(x, y) = \left(\sum_{i=1}^{d}(x_i - y_i)^2\right)^{1/2} \qquad (4)$$

where $x = \{x_1, \ldots, x_d\}$ and $y = \{y_1, \ldots, y_d\}$. Algorithm 1 summarizes classical k-means clustering (with Euclidean distance).

Algorithm 1 KMEANS
Input: k; set of points {x1, . . . , xn}
Output: {C1, . . . , Ck}

1: Initialize centers {c1, . . . , ck}
2: for each i ∈ {1, 2, . . . , n} do
3:   for each j ∈ {1, 2, . . . , k} do
4:     Compute dE(xi, cj) using Eq. 4
5:   end for
6:   Assign xi to nearest cluster
7: end for
8: for each i ∈ {1, 2, . . . , k} do
9:   ci = (1/|Ci|) Σ_{xj∈Ci} xj
10: end for
11: Repeat (go to Step 2) until convergence

In order to obtain a more computationally efficient algorithm, the squared Euclidean distance can be used. The algorithm terminates after a finite number of iterations, L. The time complexity of KMEANS is O(Lnk), where n is the number of items in the dataset. However, its clustering performance in terms of global optimality depends heavily on the selection of the initial cluster centers. In practice, the algorithm most often converges to a local optimum.
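For illustration, Algorithm 1 translates almost line for line into NumPy. The sketch below is ours, not the authors' code; it uses the squared Euclidean distance for the assignment step, which yields the same nearest cluster as Eq. (4):

    import numpy as np

    def kmeans(X: np.ndarray, centers: np.ndarray, max_iter: int = 100):
        # X is (n, d); centers is (k, d), produced by some initialization (Step 1).
        for _ in range(max_iter):
            # Steps 2-7: assign every point to its nearest center.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Steps 8-10: recompute each center as the mean of its cluster (Eq. 3);
            # an empty cluster keeps its previous center.
            new_centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                    else centers[j] for j in range(len(centers))])
            # Step 11: stop when the centers no longer move.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers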

B. Mean Shift

The Parzen window approach (kernel density estimation) to estimating an unknown probability density function p(x) is the most popular density estimation method. As a nonparametric procedure, it can be used with arbitrary distributions and without the assumption that the shape of the underlying density is known.

Given a set of sample points $x_i$, $i = 1, \ldots, n$, in the d-dimensional space $\mathbb{R}^d$, the number of points falling in a d-dimensional hypercube with edge length h and centered at x is given by

$$\sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right)$$

where k(u) is the following window function (kernel)

$$k(u) = \begin{cases} 1 & |u_j| \le \frac{1}{2},\; j = 1, \ldots, d \\ 0 & \text{otherwise.} \end{cases}$$

Thus an estimate of the space-averaged density at a point x is given by

$$\hat{p}(x) = \frac{1}{nh^d}\sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right). \qquad (5)$$

Because the window width h determines the volume (i.e., $h^d$) of the hypercube, it has a strong effect on the accuracy of the estimate $\hat{p}(x)$: if h is small the volume may be empty, and if h is large important spatial variations of p(x) may be lost due to averaging [7].
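For concreteness, (5) with the hypercube kernel is a direct count-and-normalize operation. The sketch below is ours (names such as parzen_estimate are assumptions, not the paper's code):

    import numpy as np

    def parzen_estimate(x: np.ndarray, samples: np.ndarray, h: float) -> float:
        # Eq. (5): count the samples inside the hypercube of edge h centered
        # at x (k(u) = 1 iff every |u_j| <= 1/2) and normalize by n * h^d.
        n, d = samples.shape
        u = (x - samples) / h
        inside = np.all(np.abs(u) <= 0.5, axis=1)
        return inside.sum() / (n * h ** d)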

Although an estimate of p(x) can be obtained by (5), we are sometimes more interested in the modes (high-density areas) of p(x) than in p(x) itself. The mean shift procedure finds such modes without estimating p(x) in advance. Assuming a radially symmetric kernel, (5) can be rewritten as

$$\hat{p}(x) = \frac{c}{nh^d}\sum_{i=1}^{n} k\left(\left\|\frac{x - x_i}{h}\right\|^2\right) \qquad (6)$$

where c is a normalization constant [8]. The modes of p(x) are located among the zeros of its gradient $\nabla p(x)$, which is estimated by the gradient of (6)

$$\hat{\nabla} p(x) = \frac{2c}{nh^{d+2}}\sum_{i=1}^{n} (x - x_i)\, k'\left(\left\|\frac{x - x_i}{h}\right\|^2\right). \qquad (7)$$

Introducing $g(x) = -k'(x)$ into (7) yields

$$\hat{\nabla} p(x) = \frac{2c}{nh^{d+2}}\left[\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)\right]\left[\frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x\right]. \qquad (8)$$

The first term of the product in (8) is proportional to $\hat{p}(x)$ and the second term is the mean shift (i.e., the difference between the weighted mean and x, the center of the window)

$$m(x) = \frac{\sum_{i=1}^{n} x_i\, g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x. \qquad (9)$$

From (8) and (9), the following is obtained

$$m(x) = \frac{1}{2}\, h^2 c\, \frac{\hat{\nabla} p(x)}{\hat{p}(x)}$$

which shows that the mean shift at x is proportional to the density gradient estimate. Therefore, m(x) always points in the gradient-increasing direction of $\hat{p}(x)$, and can thus define a path leading to a stationary point (mode) of the estimated density. The mean shift procedure is obtained by successive computation of m(x) and translation of the window by m(x). An important property of mean shift, which distinguishes it from other gradient-based algorithms, is that there is no need to specify the step size explicitly when moving along such a path: in low-density regions the mean shift steps are large, while in high-density regions near local maxima the steps are small and the analysis is therefore more refined [8]. The fact that the mean shift procedure results in a walk along the direction of increasing gradient towards the nearest mode makes it an ideal tool for cluster analysis. The effectiveness of mean shift clustering depends strongly on the selected window width h: a small value easily results in an excessive number of clusters, while a large value might blur salient details or even incorrectly merge unrelated clusters.
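A minimal sketch of the procedure just described, assuming the Epanechnikov kernel so that g reduces to the flat window; function names are our own, as the paper gives no implementation details:

    import numpy as np

    def mean_shift_mode(x: np.ndarray, X: np.ndarray, h: float,
                        tol: float = 1e-3, max_iter: int = 100) -> np.ndarray:
        # Follow the mean shift vector m(x) of Eq. (9) until it (nearly) vanishes.
        # With the Epanechnikov profile, g is flat and the weighted mean in (9)
        # becomes the plain mean of the samples falling inside the window.
        x = x.astype(float)
        for _ in range(max_iter):
            window = np.linalg.norm((X - x) / h, axis=1) <= 1.0
            if not window.any():
                break                              # isolated point: nothing to climb
            shift = X[window].mean(axis=0) - x     # m(x)
            x = x + shift                          # translate the window by m(x)
            # no explicit step size: |m(x)| is large in low-density regions
            # and shrinks automatically near local maxima
            if np.linalg.norm(shift) < tol:
                break                              # near a mode: steps became small
        return x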

III. MEAN SHIFT-BASED INITIALIZATION METHOD

As previously discussed, in contrast to other methods, k-means can be very fast for very large amounts of data, but its performance depends on the selection of k and of the initial cluster centers. On the other hand, Mean Shift clustering does not rely on prior knowledge of the number of clusters (the value of k in the case of the k-means method). Moreover, the modes of the underlying probability density function of the observations found by Mean Shift clustering can serve as the initial cluster centers for the k-means method.

The proposed Mean Shift-based k-means is shown in Algorithm 2; a code sketch of its initialization phase follows the algorithm. The time complexity of MS-KMEANS is $O(Tn^2)$, where T is the number of iterations (in the loop of lines 3-5) until convergence and n is the number of items in the dataset. The computational bottleneck of the algorithm, which is responsible for its quadratic time complexity, is the calculation of the mean shift vector in line 3, which requires n distance computations and is repeated n times.

In some applications, the optimal value of k may be known a priori, or a particular value of k may have to be used (this is, for example, the case in the Molecular Dynamics problem that we present in Section IV). Thus, the proposed algorithm provides the flexibility of specifying the value of k as an input. Many local modes are distinct but so close together that they are practically the same point. In our tests, the differences between such nearby points disappear as the precision is increased, but the computational time grows. We use the parameter rm, the radius of modes, to determine the positions of the modes: if the distance between two candidate modes is smaller than rm, only one of them is kept as a mode.

IV. EXPERIMENTS

In a former paper we applied k-means to a Facility Location problem using four initialization methods [9]. In the present paper we compare the performance of MS-KMEANS with the performance of those four methods: random-based initialization (Algorithm 3), uniform-based initialization (Algorithm 4), density-based initialization (Algorithm 5) and sphere-based initialization (Algorithm 6).

Algorithm 2 MS-KMEANS
Input: k ≥ 1 (if known), k = −1 otherwise; set of points X = {x1, . . . , xn}, xi ∈ ℝ^d; h; rm
Output: {C1, C2, . . .}
/* Runs mean shift clustering on X. Optionally, the value of k is set to the number of generated modes. The initial centers are equal to the modes. k-means clustering then runs. */

1: M ← ∅
2: for each xi ∈ X do
3:   Compute mean shift vector m(x_i^t) using Eq. 9
4:   x_i^{t+1} = x_i^t + m(x_i^t)
5:   Repeat (Step 3) until convergence to mode mj
6:   M ← M ∪ mj
7:   Associate xi with mode mj
8: end for
9: for each mi ∈ M do
10:   for each mj ∈ M do
11:     Compute dE(mi, mj) using Eq. 4
12:     if dE(mi, mj) ≤ rm then
13:       M ← M − mj
14:       for each xl ∈ X associated with mj do
15:         Associate xl with mode mi
16:       end for
17:     end if
18:   end for
19: end for
20: if k ≥ 1 then
21:   if |M| > k then
22:     Sort M in descending order of mass density
23:     M ← M − m1, where m1 is the first mode in M
24:     for each xl ∈ X associated with m1 do
25:       Associate xl with nearest mode
26:     end for
27:     Repeat (Step 23) until |M| = k
28:   end if
29:   if |M| < k then
30:     Print "Not enough modes generated"
31:     Print "Run again with smaller h"
32:     Exit
33:   end if
34: end if
35: if k = −1 then
36:   k = |M|
37: end if
38: for each k-means cluster center ci do
39:   ci ← mi
40: end for
41: for each i ∈ {1, 2, . . . , n} do
42:   for each j ∈ {1, 2, . . . , k} do
43:     Compute dE(xi, cj) using Eq. 4
44:   end for
45:   Assign xi to nearest cluster
46: end for
47: for each i ∈ {1, 2, . . . , k} do
48:   ci = (1/|Ci|) Σ_{xj∈Ci} xj
49: end for
50: Repeat (go to Step 41) until convergence
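Read as code, the initialization phase of Algorithm 2 (lines 1-19) looks roughly as follows. This is our sketch with the flat g profile; the density-sorted trimming of lines 20-34 and the final k-means pass are omitted for brevity:

    import numpy as np

    def ms_kmeans_init(X: np.ndarray, h: float, rm: float) -> np.ndarray:
        # Lines 2-8: climb from every point to a mode; lines 9-19: merge any
        # two modes closer than rm, keeping one of them. The returned modes
        # are the initial k-means centers, with k = number of modes.
        modes = []
        for x0 in X:
            x = x0.astype(float)
            for _ in range(100):
                window = np.linalg.norm((X - x) / h, axis=1) <= 1.0
                if not window.any():
                    break
                shift = X[window].mean(axis=0) - x
                x = x + shift
                if np.linalg.norm(shift) < 1e-3:
                    break
            if all(np.linalg.norm(x - m) > rm for m in modes):
                modes.append(x)
        return np.asarray(modes)

    # Usage: centers = ms_kmeans_init(X, h=1.0, rm=0.5), then run k-means
    # (e.g., the kmeans sketch of Section II-A) from these centers.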


The comparison is performed on two real-life problems with very large amounts of data: Facility Location and Molecular Dynamics. MS-KMEANS is tested with nine kernel functions: normal (Gaussian), Epanechnikov, triangular, quartic, uniform, triweight, tricube, cosine and Lorentz.
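The paper does not spell out the nine profiles; in their common textbook forms (with $u = \|(x - x_i)/h\|^2$), a few of them would look as follows. The Lorentz form in particular is our assumption:

    import numpy as np

    profiles = {
        "uniform":      lambda u: np.where(u <= 1.0, 1.0, 0.0),
        "epanechnikov": lambda u: np.where(u <= 1.0, 1.0 - u, 0.0),
        "normal":       lambda u: np.exp(-0.5 * u),       # Gaussian
        "lorentz":      lambda u: 1.0 / (1.0 + u),        # assumed Cauchy-type form
    }
    # Mean shift weights use g = -k'; e.g. the Epanechnikov profile yields
    # the flat (uniform) g used in the sketches above.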

Algorithm 3 RA-KMEANS
Input: k ≥ 1; set of points X = {x1, . . . , xn}, xi ∈ ℝ^d; seed
Output: {C1, . . . , Ck}
/* The initial k centers are randomly generated. k-means clustering then runs. */

1: Seed random number generator with seed
2: for each k-means cluster center ci do
3:   Generate random vector ri
4:   ci ← ri
5: end for
6: for each i ∈ {1, 2, . . . , n} do
7:   for each j ∈ {1, 2, . . . , k} do
8:     Compute dE(xi, cj) using Eq. 4
9:   end for
10:   Assign xi to nearest cluster
11: end for
12: for each i ∈ {1, 2, . . . , k} do
13:   ci = (1/|Ci|) Σ_{xj∈Ci} xj
14: end for
15: Repeat (go to Step 6) until convergence

A. Facility Location

In Facility Location problems, location models have traditionally been used to locate a manufacturer, storage facility, restaurant or shop near the customers at minimum cost. In recent years these methods for location and distribution planning have been adapted to the location of recycling plants and to product-return planning. Many researchers have developed mathematical models for products at their end-of-life and applied them to case studies. Some of those models include sophisticated operations research models for planning the recycling of industrial by-products and the dismantling and recycling of products at the end of their lifetime [10]. Two well-known cases are the plan for recycling sand with a two-stage network [11] and the plan for recycling carpet waste [12], [13].

We have applied the k-means method to the Facility Location problem of organizing the recycling of electric and electronic equipment in Spain. According to the European Commission, the amount of Waste Electrical and Electronic Equipment (WEEE) is growing rapidly. These electronic appliances contain hazardous as well as valuable substances that must be treated correctly. For that reason, the directive of the European Parliament and the European Council on Waste Electrical and Electronic Equipment, the WEEE-directive [14],

Algorithm 4 BO-KMEANS
Input: k ≥ 1; set of points X = {x1, . . . , xn}, xi ∈ ℝ^d
Output: {C1, . . . , Ck}
/* Decomposes the d-dimensional space into k identical boxes. The initial k centers are the centers of mass of these boxes. k-means clustering then runs. */

1: for i = 1 to d do
2:   Sort {x1[i], . . . , xn[i]} in ascending order
3:   Save smallest and largest values along i
4: end for
5: Decompose space into k identical boxes
6: for each k-means cluster center ci do
7:   ci ← center of mass of ith box
8: end for
9: for each i ∈ {1, 2, . . . , n} do
10:   for each j ∈ {1, 2, . . . , k} do
11:     Compute dE(xi, cj) using Eq. 4
12:   end for
13:   Assign xi to nearest cluster
14: end for
15: for each i ∈ {1, 2, . . . , k} do
16:   ci = (1/|Ci|) Σ_{xj∈Ci} xj
17: end for
18: Repeat (go to Step 9) until convergence

Algorithm 5 DB-KMEANS
Input: k ≥ 1; set of points X = {x1, . . . , xn}, xi ∈ ℝ^d
Output: {C1, . . . , Ck}
/* Decomposes the space into q*^d identical boxes, with q* = arg min_{q∈ℕ} k ≤ q^d. The initial k centers are the centers of mass of the k boxes with the largest mass densities. k-means clustering then runs. */

1: for i = 1 to d do
2:   Sort {x1[i], . . . , xn[i]} in ascending order
3:   Save smallest and largest values along i
4: end for
5: q* ← arg min_{q∈ℕ} k ≤ q^d
6: Decompose space into q*^d identical boxes
7: Sort boxes in descending order of mass density
8: Keep the first k boxes
9: for each k-means cluster center ci do
10:   ci ← center of mass of ith box
11: end for
12: for each i ∈ {1, 2, . . . , n} do
13:   for each j ∈ {1, 2, . . . , k} do
14:     Compute dE(xi, cj) using Eq. 4
15:   end for
16:   Assign xi to nearest cluster
17: end for
18: for each i ∈ {1, 2, . . . , k} do
19:   ci = (1/|Ci|) Σ_{xj∈Ci} xj
20: end for
21: Repeat (go to Step 12) until convergence


Algorithm 6 DE-KMEANS
Input: k ≥ 1; set of points X = {x1, . . . , xn}, xi ∈ ℝ^d
Output: {C1, . . . , Ck}
/* Calculates the mass density on a sphere centered on every xi. The initial k centers are the centers of mass of the k spheres with the largest mass densities. k-means clustering then runs. */

1: for each xi ∈ X do
2:   if d mod 2 > 0 then
3:     Cd ← 2^((d+1)/2) · π^((d−1)/2) / d!!
4:   end if
5:   if d mod 2 = 0 then
6:     Cd ← π^(d/2) / (d/2)!
7:   end if
8:   for i = 1 to d do
9:     Sort {x1[i], . . . , xn[i]} in ascending order
10:     Save smallest and largest values along i
11:   end for
12:   V ← volume of the subspace covered by X
13:   r ← 2 · (V / (k · Cd))^(1/d)
14:   vi ← Cd · r^d
15:   mdi ← (mass inside sphere i) / vi
16: end for
17: Sort {md1, . . . , mdn} in descending order
18: Keep the first k spheres
19: for each k-means cluster center ci do
20:   ci ← center of mass of ith sphere
21: end for
22: for each i ∈ {1, 2, . . . , n} do
23:   for each j ∈ {1, 2, . . . , k} do
24:     Compute dE(xi, cj) using Eq. 4
25:   end for
26:   Assign xi to nearest cluster
27: end for
28: for each i ∈ {1, 2, . . . , k} do
29:   ci = (1/|Ci|) Σ_{xj∈Ci} xj
30: end for
31: Repeat (go to Step 22) until convergence

dated the 13th of February 2003, calls for the implementation of systems for the take-back and treatment of 4 kg of WEEE per inhabitant and year at the end-of-life phase. We have studied this recycling problem in a particular European country, Spain. For about 40 million inhabitants, that directive means 160000 tons of WEEE per year. Spain currently has only a few recycling plants for WEEE, with insufficient capacity to treat all the products. Therefore, a system of take-back and treatment must be installed in order to fulfill the requirements of the directive. The geo-demographic characteristics of the Spanish territory hinder the creation of an optimal structure of recycling plants: Spain is one of the largest countries in the European Union, its population density is low in comparison with other European states, and its population is scattered. This means that a few plants can treat all the waste and that the waste will have to be transported over very large distances. For that reason, the installation of a return system implies not only the installation of recycling plants, but also of storage centers. In that system, the WEEE is left by the consumers at the existing collection or clean points, then carried from the collection points to the storage centers, and stored there until it is transported to a recycling plant. There are three types of costs: transport, investment plus fixed costs, and staff. The largest share by weight of the total WEEE that must be collected, 58%, corresponds to the great household-electric appliances.

The organization of this recycling requires the optimization of two stages: the storage centers and the recycling plants. First, it is necessary to find the optimal locations of the storage centers, as well as the optimal number of storage centers, using the collection or clean points. Second, after that optimization, it is necessary to find the optimal locations and number of recycling plants, using the locations of the storage centers obtained in the first stage.

There are 532 existing collection points in Spain, where the waste is left by the consumers. The amount of waste is unknown because the necessary infrastructure for collecting the WEEE does not yet exist. Therefore, two different amounts of great household-electric waste are used in the calculations: 1) the minimum amount demanded by the WEEE-directive (2.32 kg/(inhabitant year), 92000 Tn/year) and 2) a larger amount predicted by some studies (4.79 kg/(inhabitant year)) [15]. We obtained the waste mass to be collected at the collection points of every council by multiplying the council population by the requirement of 2.32 (or 4.79) kg/(inhabitant year). Hence, every collection point has, in general, a different waste mass, and this is taken into account in the calculations done by the k-means algorithm.
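The per-council waste mass is a one-line computation; a sketch with made-up populations (the study uses the real populations behind the 532 collection points):

    # Hypothetical council populations; the real calculation covers all councils.
    populations = {"Valladolid": 300000, "Soria": 39000}
    RATE_KG = 2.32  # kg/(inhabitant year); use 4.79 for the larger scenario [15]

    waste_tn = {c: p * RATE_KG / 1000.0 for c, p in populations.items()}
    print(waste_tn)  # {'Valladolid': 696.0, 'Soria': 90.48} tonnes/year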

The maximum capacity of a single storage center is 10000 Tn/year and the maximum capacity of a recycling plant is estimated to be 30000 Tn/year, based on the design of the existing plants. These two constraints are also taken into account in the calculations done by the k-means algorithm. The costs of transport and the other costs are explained in reference [9].

Table I shows the best results obtained using the five initialization methods. The RA-KMEANS, BO-KMEANS, DB-KMEANS and DE-KMEANS results were already published in a former paper [9]. The RA-KMEANS results were obtained after running this algorithm eight to ten times. In the table we compare the Initial and Final total costs. The Initial costs are the costs calculated using the initial locations produced by the five initialization methods. The k-means method finds the locations that reduce the costs, starting from those initial locations. It can be seen in Table I that, for all the initialization methods, the k-means method yields Final costs much lower than the corresponding Initial costs.

tialization methods. The RA-KMEANS, BO-KMEANS, DB-KMEANS and DE-KMEANS results were already publishedin a former paper [9]. The RA-KMEANS results were obtainedafter running this algorithm eight to ten times. In the tablewe compare the Initial and Final total costs. The Initial costsare the costs calculated using the initial locations calculatedby the five initialization methods. The k-means method findsthe locations that reduce the costs, starting from those initiallocations. It can be seen in Table I that for all the initializationmethods the k-means method yields Final costs much lowerthan the corresponding Initial costs.MS-KMEANS results are better than those obtained with

the other algorithms. This algorithm yields the lowest Finaltotal cost as can be seen in the table. The table also showsthe optimal numbers of storage centers and recycling plantsobtained with the five algorithms as well as the results obtained


TABLE I
RESULTS AS A FUNCTION OF THE TOTAL WEEE AND THE INITIALIZATION METHOD. Ns AND Nr ARE THE OPTIMAL NUMBERS OF STORAGE CENTERS AND RECYCLING PLANTS, RESPECTIVELY. TOTAL COSTS ARE IN MILLIONS OF EUROS/YEAR AND WEEE IN TN/YEAR. S AND R STAND FOR STORAGE CENTERS AND RECYCLING PLANTS, RESPECTIVELY. INITIAL REFERS TO 0 ITERATIONS.

Initialization method | Ns | Nr | Initial total cost (S / R) | Final total cost (S / R)
Total WEEE 92000:
MS-KMEANS | 10 | 5 | 10.65 / 18.33 | 8.62 / 16.08
RA-KMEANS | 10 | 5 | 12.09 / 18.37 | 8.65 / 16.12
BO-KMEANS | 10 | 5 | 10.32 / 17.41 | 8.64 / 16.20
DB-KMEANS | 11 | 4 |  8.96 / 16.47 | 8.69 / 16.32
DE-KMEANS | 11 | 5 | 13.07 / 16.88 | 8.77 / 16.29
Total WEEE 190000:
MS-KMEANS | 20 | 7 | 30.08 / 47.19 | 16.77 / 30.05
RA-KMEANS | 20 | 7 | 26.63 / 35.76 | 16.77 / 30.18
BO-KMEANS | 20 | 7 | 19.77 / 33.83 | 16.91 / 30.17
DB-KMEANS | 22 | 7 | 17.77 / 30.96 | 16.86 / 30.27
DE-KMEANS | 23 | 7 | 23.81 / 35.35 | 17.11 / 30.12

[Figure 1 appears here: a map with X coordinate (km) versus Y coordinate (km), showing clean points, storage centers and recycling plants.]

Fig. 1. Distribution of storage centers and recycling plants in Spain, obtained using MS-KMEANS with two constraints: maximum capacity per storage center is 10000 Tn/year and 30000 Tn/year per recycling plant. The 532 collection or clean points used in the calculations are also plotted. The results for 92000 Tn/year (2.32 kg/(inhabitant year)) are shown.

The locations of the collection points and of the optimal storage centers and recycling plants obtained by MS-KMEANS are plotted in Fig. 1. An MS-KMEANS calculation takes about 40-70 seconds, while a calculation with the other four algorithms takes 2-4 seconds. Hence, the costs obtained by MS-KMEANS are the lowest, but its computing time is about 10-35 times larger.

B. Molecular Dynamics

Classical Molecular Dynamics is the simulation of

the movement of the atoms of a material due to the forces acting on them [16], [17], [18]. The atoms of the material are treated as point masses and Newton's force equations are solved to calculate their velocities and positions at every moment or timestep. These simulations are computationally expensive because of the large number of atoms and timesteps: many thousands or millions of atoms and tens or hundreds of thousands of timesteps are typically used to simulate picoseconds of time in liquids and solids.

There are three basic types of parallel algorithms for MD simulations [19], [20]: force, atom and spatial decomposition algorithms. Most of the computational cost of an MD simulation, about 90%, consists of the calculation of the forces between the atoms. These algorithms distribute the force calculations among the processors. For instance, atom decomposition algorithms, also named replicated data [21], [22], [23], assign a different set of atoms to every processor for the entire simulation, and the forces on those atoms are computed by the corresponding processor.

The spatial or domain decomposition algorithms [21], [24], [25], [26], [27] decompose the space into clusters, assigning the atoms of every cluster or region to a processor; the forces on the atoms of that cluster are calculated only by the corresponding processor. After a number of timesteps, too many atoms have moved away from their originally assigned clusters and it is necessary to decompose the space into clusters again. If we divide the set of atoms into np clusters, then the number of pairs of atoms in cluster i is $P_i = N_i(N_i - 1)/2$, where $N_i$ is the number of atoms of cluster i. The number of forces of that cluster, $NF_i$, is, in a first approximation, equal to $P_i$. In MD the force between two atoms depends mainly on the distance between them. If the atoms are farther apart than a so-called cutoff distance, the force is so negligible that it is not necessary to calculate it. Hence, the number of forces that need to be calculated in cluster i is, in fact, smaller than or equal to $P_i$. However, the atoms at the borders of cluster i also interact with atoms in the neighbouring clusters, which increases the value of $NF_i$ obtained considering only the cutoff distance. The result is that $P_i$ is an approximation of $NF_i$.
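In code, the all-pairs bound is a one-liner (a trivial sketch of ours):

    def pair_count(Ni: int) -> int:
        # Pi = Ni(Ni - 1)/2: upper bound on the force evaluations of cluster i.
        # The cutoff distance lowers the true NFi, while interactions across
        # cluster borders raise it, so Pi only approximates NFi.
        return Ni * (Ni - 1) // 2

    print(pair_count(4770))  # 11374065 pairs for the hemoglobin molecule of Sec. IV-B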

If np is the number of processors available, the ideal decomposition consists of np clusters with $NF_1 = NF_2 = \ldots = NF_{np}$. Every cluster is assigned to exactly one processor, and this processor calculates only the forces on the atoms of that cluster.

There are different parameters to measure the computational load-balance and the performance of a parallel MD simulation. We have studied two parameters. The first parameter is the largest $NF_i$ of the decomposition, $NF_{max} = \max_i(NF_i)$. This is the most important parameter for MD simulations, for the following reason. In a parallel MD simulation, every processor calculates its particular forces and then sends the results to the master node. This node receives the results from every processor and waits until all the processors have finished the calculations of their forces and have sent the results. This is repeated at every iteration or timestep of the simulation. Therefore, the cluster with the largest $NF_i$ value will be the last node to report its results, and the whole load-balance and calculation will depend on this parameter. The best decomposition is the one with the smallest value of $NF_{max}$.


Fig. 2. Hemoglobin, a molecule of 4770 atoms. Carbon, nitrogen, oxygen, sulphur and iron atoms are the black, red, blue, yellow and orange balls, respectively.

The second parameter is a measure of the goodness of the load-balance: $L = 100\,\langle D\rangle / \langle NF\rangle$, where $\langle NF\rangle$ is the average number of forces per processor:

$$\langle NF\rangle = \sum_{i=1}^{np} NF_i / np \qquad (10)$$

and $\langle D\rangle$ is given by

$$\langle D\rangle = \sum_{i=1}^{np} |NF_i - \langle NF\rangle| / np. \qquad (11)$$

A good load-balance implies low values of L. Applying k-means to MD simulations means that the number of clusters equals the number of available processors, np, and that the atoms of each cluster are assigned to one processor. That processor will calculate only the forces between the atoms of that cluster.
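Both quality measures are straightforward to compute; a sketch with hypothetical per-processor loads (our names, not the paper's code):

    import numpy as np

    def load_balance(NF: np.ndarray):
        # NF[i]: number of force computations assigned to processor i.
        # Returns NFmax (the timestep is paced by the slowest processor)
        # and L = 100 <D> / <NF> from Eqs. (10)-(11); lower L is better.
        nf_mean = NF.mean()                    # <NF>, Eq. (10)
        d_mean = np.abs(NF - nf_mean).mean()   # <D>,  Eq. (11)
        return NF.max(), 100.0 * d_mean / nf_mean

    NFmax, L = load_balance(np.array([54774, 52100, 53890, 51230]))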

We have applied MS-KMEANS to two sets of atoms: hemoglobin, a molecule of 4770 atoms, and a slab of an fcc lattice of 13500 atoms with a lattice parameter of 7 Å. Hemoglobin is an inhomogeneous system, as can be seen in Fig. 2. On the contrary, the fcc slab is an ordered set of points: the atoms form a network in which the distance between atoms is the same and the environment around every point is the same. This system of atoms is interesting because k-means is useful for MD simulations of inhomogeneous systems, and this slab is the opposite case. We have used three values of the number of processors, np, and we have searched for the parameters that yield the best load-balance, i.e., the smallest value of $NF_{max}$.

We have applied MS-KMEANS to these two systems of atoms, using different values of the rm and h parameters and testing nine different kernel functions or profiles: normal, Epanechnikov, triangular, quartic, uniform, triweight, tricube, cosine and Lorentz. The best results obtained for each value of np, with their corresponding parameters and profiles, are shown in Tables II and III. It can be seen in those tables that MS-KMEANS enormously reduces the initial $NF_{max}$ and load-balance L.

The comparisons between the initialization methods are shown in Tables IV and V. In those tables we have also included the computing time used by the initialization methods on the same host. MS-KMEANS yields the best results for MD simulations, i.e., the smallest value of $NF_{max}$, for the three numbers of processors studied. The setback, however, is that MS-KMEANS also has the highest computing time on the same host: about one order of magnitude higher than the computing time used by the other initialization methods. In the case of the slab of 13500 atoms, the computing time of MS-KMEANS is even two orders of magnitude higher for np = 32.

In a parallel MD simulation based on the present k-means spatial decomposition algorithm, the computing time used by the algorithm is obviously part of the computing time of the MD simulation. The high computing time of the MS-KMEANS algorithm would increase the computing time of the MD simulation but, on the other hand, its decomposition would reduce the computing time compared with the other initialization methods.

TABLE IV
RESULTS ON THE HEMOGLOBIN, A MOLECULE OF 4770 ATOMS. L IS IN % AND TIME IN SECONDS. INITIAL REFERS TO 0 ITERATIONS.

np | method | Initial NFmax | Initial L | Final NFmax | Final L | time
16 | MS-KMEANS | 105753 | 38 | 54774 | 11 | 24.0
16 | BO-KMEANS |  92425 | 42 | 61378 | 14 |  2.5
16 | DB-KMEANS |  57953 | 14 | 56244 | 11 |  2.5
16 | DE-KMEANS | 144070 | 80 | 57678 | 12 |  4.8
24 | MS-KMEANS |  45613 | 25 | 36887 |  9 | 18.2
24 | BO-KMEANS |  85214 | 49 | 46552 | 20 |  2.5
24 | DB-KMEANS |  46823 | 17 | 42918 | 11 |  2.5
24 | DE-KMEANS | 137609 | 97 | 45519 | 14 |  5.0
32 | MS-KMEANS |  47119 | 33 | 30092 | 13 | 30.6
32 | BO-KMEANS |  73476 | 66 | 32364 | 17 |  2.5
32 | DB-KMEANS |  29920 | 14 | 31448 | 14 |  2.5
32 | DE-KMEANS |  92974 | 87 | 29071 | 14 |  4.8

TABLE V
RESULTS ON A SLAB OF A FCC LATTICE OF 13500 ATOMS WITH A LATTICE PARAMETER OF 7 Å. L IS IN % AND TIME IN SECONDS. INITIAL REFERS TO 0 ITERATIONS.

np | method | Initial NFmax | Initial L | Final NFmax | Final L | time
16 | MS-KMEANS | 485615 |  25 | 294511 |  9 |  195
16 | BO-KMEANS | 295712 |  18 | 295712 | 18 |   15
16 | DB-KMEANS | 476065 |  42 | 471586 | 41 |   14
16 | DE-KMEANS | 383487 |  47 | 374502 | 14 |   34
24 | MS-KMEANS | 391208 |  60 | 207243 | 14 |  629
24 | BO-KMEANS | 295712 |  25 | 295712 | 25 |   14
24 | DB-KMEANS | 258854 |  33 | 240747 | 22 |   15
24 | DE-KMEANS | 661419 |  87 | 211936 |  9 |   33
32 | MS-KMEANS | 269000 |  41 | 154280 | 10 | 1313
32 | BO-KMEANS | 419388 |  26 | 190784 | 21 |   15
32 | DB-KMEANS | 165084 |  25 | 179833 | 15 |   15
32 | DE-KMEANS | 692286 | 108 | 169087 | 11 |   34


TABLE II
MS-KMEANS ON THE HEMOGLOBIN, A MOLECULE OF 4770 ATOMS. L IS IN % AND TIME IN SECONDS. INITIAL REFERS TO 0 ITERATIONS.

np | profile | rm | h | tolerance | itmax | Initial NFmax | Initial L | Final NFmax | Final L | time
16 | tricube | 4 | 20 | 1 | 100 | 105753 | 38 | 54774 | 11 | 24.0
24 | normal  | 8 |  1 | 1 | 100 |  45613 | 25 | 36887 |  9 | 18.2
32 | cosine  | 4 | 20 | 1 | 100 |  47119 | 33 | 30092 | 13 | 30.6

TABLE III
MS-KMEANS ON A SLAB OF A FCC LATTICE OF 13500 ATOMS WITH A LATTICE PARAMETER OF 7 Å. L IS IN % AND TIME IN SECONDS. INITIAL REFERS TO 0 ITERATIONS.

np | profile | rm | h | tolerance | itmax | Initial NFmax | Initial L | Final NFmax | Final L | time
16 | Epanechnikov |  4 | 20 | 10  | 100 | 485615 | 25 | 294511 |  9 |  195
24 | triangular   |  6 | 20 | 0.1 |  30 | 391208 | 60 | 207243 | 14 |  629
32 | Lorentz      | 10 | 20 | 10  |  30 | 269000 | 41 | 154280 | 10 | 1313

V. CONCLUSION

We presented MS-KMEANS, a k-means initialization method based on the modes found by Mean Shift clustering. The comparative study with other initialization methods showed that, in terms of the quality of the resulting clustering, MS-KMEANS outperforms the other methods. However, it also has the highest computational cost of all the methods studied. In some applications such as, e.g., Facility Location, this is not much of an issue because the clustering is only done once, but it remains an important weakness in other problems such as, e.g., Molecular Dynamics. This trade-off between the quality of the resulting clustering and the computational cost will be considered and studied in future research. We will also consider other clustering methods as part of our future work.

ACKNOWLEDGMENT

I. Cabria acknowledges support from the Ministerio de Educación y Ciencia and the University of Valladolid, Spain. I. Gondra acknowledges support from the Natural Sciences and Engineering Research Council (NSERC) of Canada.

REFERENCES

[1] R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis. Prentice Hall, 1988.

[2] M. Abramowitz and I. A. Stegun, Eds., A Handbook of Mathematical Functions. U.S. Department of Commerce, National Bureau of Standards Applied Mathematics Series, 1964.

[3] R. Duda, P. Hart, and D. Stork, Pattern Classification. John Wiley & Sons, 2001.

[4] Y. Wang, J. Yang, and N. Peng, "Unsupervised color-texture segmentation based on soft criterion with adaptive mean-shift clustering," Pattern Recognition Letters, vol. 27, pp. 386-392, April 2006.

[5] I. Y. H. Gu and V. Gui, "Colour image segmentation using adaptive mean shift filters," in Proceedings of the IEEE International Conference on Image Processing, vol. 1, 2001, pp. 726-729.

[6] I. Gondra and T. Xu, "Adaptive mean shift-based image segmentation using multiple instance learning," in Proceedings of the IEEE International Conference on Digital Information Management, 2008, pp. 716-721.

[7] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. Wiley, 2000, ch. 4.

[8] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, May 2002.

[9] I. Cabria and D. Queiruga, "Improvement of the Koradi parallel algorithm for molecular dynamics and application to the economic organization and optimization of recycling costs of waste electrical and electronic equipment," Europhys. Lett., vol. 71, pp. 845-851, 2005.

[10] T. Spengler, H. Puchert, T. Penkuhn, and O. Rentz, "Environmental integrated production and recycling management," Eur. J. Oper. Res., vol. 97, pp. 308-326, 1997.

[11] A. I. Barros, R. Dekker, and V. Scholten, "A two-level network for recycling sand: a case study," Eur. J. Oper. Res., vol. 110, pp. 199-214, 1998.

[12] M. J. Realff, J. C. Ammons, and D. Newton, "Carpet recycling: Determining the reverse production system design," Polymer-Plastics Technology and Engineering, vol. 38, pp. 547-567, 1999.

[13] D. Louwers, B. Kip, E. Peters, F. Souren, and S. Flapper, "A facility location allocation model for reusing carpet materials," Computers and Industrial Engineering, vol. 36, pp. 855-869, 1999.

[14] European Parliament and Council, "Directive 2002/96/EC of the European Parliament and Council of 27 January 2003 on waste electrical and electronic equipment (WEEE)."

[15] Asociación Nacional de Fabricantes de Electrodomésticos, "Gestión de los electrodomésticos de línea blanca al final de su uso," Madrid, 2000.

[16] M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids. Oxford: Oxford University Press, 1987.

[17] D. C. Rapaport, The Art of Molecular Dynamics Simulation. Cambridge: Cambridge University Press, 1995.

[18] K. Ohno, K. Esfarjani, and Y. Kawazoe, Computational Materials Science: From ab initio to Monte Carlo Methods. Heidelberg: Springer-Verlag, 1999.

[19] S. Plimpton, "Fast parallel algorithms for short-range molecular dynamics," J. Comput. Phys., vol. 117, p. 1, 1995.

[20] G. S. Heffelfinger, "Parallel atomistic simulations," Comput. Phys. Commun., vol. 128, p. 219, 2000.

[21] D. Fincham, "Parallel computers and molecular simulation," Mol. Simul., vol. 1, pp. 1-45, 1987.

[22] D. C. Rapaport, "Large-scale molecular dynamics simulation using vector and parallel computers," Comput. Phys. Rep., vol. 9, pp. 1-54, 1988.

[23] W. Smith, "Molecular dynamics on distributed memory (MIMD) parallel computers," Theor. Chim. Acta, vol. 84, pp. 385-398, 1993.

[24] M. Pütz and A. Kolb, "Optimization techniques for parallel molecular dynamics using domain decomposition," Comput. Phys. Commun., vol. 113, p. 145, 1998.

[25] R. A. McCoy and Y. Deng, "Parallel particle simulations of thin-film deposition," Int. J. High Perform. Comput. Appl., vol. 13, pp. 16-32, 1999.

[26] R. Koradi, M. Billeter, and P. Güntert, "Point-centered domain decomposition for parallel molecular dynamics simulation," Comput. Phys. Commun., vol. 124, p. 139, 2000.

[27] S. J. Stuart, Y. Li, O. Kum, J. W. Mintmire, and A. F. Voter, "Reactive bond-order simulations using both spatial and temporal approaches to parallelism," Struct. Chem., vol. 15, p. 479, 2004.
