Research Article

Optimized K-Means Algorithm

    Samir Brahim Belhaouari,1 Shahnawaz Ahmed,1 and Samer Mansour2

1Department of Mathematics & Computer Science, College of Science and General Studies, Alfaisal University, P.O. Box 50927, Riyadh, Saudi Arabia

    2Department of Software Engineering, College of Engineering, Alfaisal University, P.O. Box 50927, Riyadh, Saudi Arabia

    Correspondence should be addressed to Shahnawaz Ahmed; [email protected]

    Received 18 May 2014; Revised 9 August 2014; Accepted 13 August 2014; Published 7 September 2014

    Academic Editor: Gradimir Milovanović

Copyright © 2014 Samir Brahim Belhaouari et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The localization of the region of interest (ROI), which contains the face, is the first step in any automatic recognition system, which is a special case of the face detection. However, face localization from an input image is a challenging task due to possible variations in location, scale, pose, occlusion, illumination, facial expressions, and clutter background. In this paper we introduce a new optimized k-means algorithm that finds the optimal centers for each cluster which corresponds to the global minimum of the k-means cluster. This method was tested to locate the faces in the input image based on image segmentation. It separates the input image into two classes: faces and nonfaces. To evaluate the proposed algorithm, MIT-CBCL, BioID, and Caltech datasets are used. The results show significant localization accuracy.

    1. Introduction

The face detection problem is to determine the face existence in the input image and then return its location if it exists. But in the face localization problem it is given that the input image contains a face, and the goal is to determine the location of this face. The automatic recognition system, which is a special case of the face detection, locates, in its earliest phase, the region of interest (ROI) that contains the face. Face localization in an input image is a challenging task due to possible variations in location, scale, pose, occlusion, illumination, facial expressions, and clutter background. Various methods were proposed to detect and/or localize faces in an input image; however, there is still a need to improve the performance of localization and detection methods. A survey on some face recognition and detection techniques can be found in [1]. A more recent survey, mainly on face detection, was written by Yang et al. [2].

One of the ROI detection trends is the idea of segmenting the image pixels into a number of groups or regions based on similarity properties. It gained more attention in the recognition technologies, which rely on grouping the features vector to distinguish between the image regions, and then concentrate on a particular region, which contains the face. One of the earliest surveys on image segmentation techniques was done by Fu and Mui [3]. They classify the techniques into three classes: features shareholding (clustering), edge detection, and region extraction. A later survey by N. R. Pal and S. K. Pal [4] concentrated only on fuzzy, nonfuzzy, and colour images techniques. Many efforts had been done in ROI detection, which may be divided into two types: region growing and region clustering. The difference between the two types is that the region clustering searches for the clusters without prior information while the region growing needs initial points called seeds to detect the clusters. The main problem in the region growing approach is to find the suitable seeds, since the clusters will grow from the neighbouring pixels of these seeds based on a specified deviation. For seeds selection, Wan and Higgins [5] defined a number of criteria to select insensitive initial points for the subclass of region growing. To reduce the region growing time, Chang and Li [6] proposed a fast region growing method by using parallel segmentation.

As mentioned before, the region clustering approach searches for clusters without prior information. Pappas and Jayant [7] generalized the K-means algorithm to group the



image pixels based on spatial constraints and their intensity variations. These constraints include the following: first, the region intensity to be close to the data and, second, to have a spatial continuity. Their generalization algorithm is adaptive and includes the previous spatial constraints. Like the K-means clustering algorithm, their algorithm is iterative. These spatial constraints are included using a Gibbs random field model, and they concluded that each region is characterized by a slowly varying intensity function. Because of unimportant features, the algorithm is not that much accurate. Later, to improve their method, they proposed using a caricature of the original image to keep only the most significant features and remove the unimportant features [8]. Besides the problem of time costly segmentation, there are three basic problems that occur during the clustering process, which are center redundancy, dead centers, and trapped centers at local minima. Moving K-mean (MKM) proposed in [9] can overcome the three basic problems by minimizing the center redundancy and the dead center as well as reducing indirectly the effect of trapped centers at local minima. In spite of that, MKM is sensitive to noise, centers are not located in the middle or centroid of a group of data, and some members of the centers with the largest fitness are assigned as members of a center with the smallest fitness. To reduce the effects of these problems, Isa et al. [10] proposed three modified versions of the MKM clustering algorithm for image segmentation: fuzzy moving k-means, adaptive moving k-means, and adaptive fuzzy moving k-means. After that Sulaiman and Isa [11] introduced a new segmentation method based on a new clustering algorithm, which is the Adaptive Fuzzy k-means Clustering Algorithm (AFKM). On the other hand, the Fuzzy C-Means (FCM) algorithm was used in image segmentation, but it still has some drawbacks, namely, the lack of enough robustness to noise and outliers, the crucial parameter α being selected generally based on experience, and the time of segmenting an image being dependent on its size. To overcome the drawbacks of the FCM algorithm, Cai et al. [12] integrated both local spatial and gray information to propose a fast and robust Fuzzy C-means algorithm called Fast Generalized Fuzzy C-Means Algorithm (FGFCM) for image segmentation. Moreover, many researchers have provided a definition for some data characteristics that have a significant impact on the K-means clustering analysis, including the scattering of the data, noise, high dimensionality, the size of the data and outliers in the data, datasets, types of attributes, and scales of attributes [13]. However, there is a need for more investigation to understand how data distributions affect K-means clustering performance. In [14] Xiong et al. provided a formal and organized study of the effect of skewed data distributions on K-means clustering. Previous research found the impact of high dimensionality on K-means clustering performance. The traditional Euclidean notion of proximity is not very useful on real-world high-dimensional datasets, such as document datasets and gene-expression datasets. To overcome this challenge, researchers turn to make use of dimensionality reduction techniques, such as multidimensional scaling [15]. Second, K-means has difficulty in detecting the "natural" clusters with nonspherical shapes [13]. Modified K-means

clustering algorithm is a direction to solve this problem. Guha et al. [16] proposed the CURE method, which makes use of multiple representative points to obtain the "natural" clusters shape information. The problem of outliers and noise in the data can also reduce clustering algorithms performance [17], especially for prototype-based algorithms such as K-means. The direction to solve this kind of problem is by combining some outlier removal techniques before conducting K-means clustering. For example, a simple method [9] of detecting outliers is based on the distance measure. On the other hand, many modified K-means clustering algorithms that work well for small or medium-size datasets are unable to deal with large datasets. Bradley et al. in [18] introduced a discussion of scaling K-means clustering to large datasets.

Arthur and Vassilvitskii implemented a preliminary version, k-means++, in C++ to evaluate the performance of k-means. K-means++ [19] uses a careful seeding method to optimize the speed and the accuracy of k-means. Experiments were performed on four datasets and results showed that this algorithm is O(log k)-competitive with the optimal clustering. Experiments also proved its better speed (70% faster) and accuracy (potential value obtained is better by factors of 10 to 1000) than k-means. Kanungo et al. [20] also proposed an O(n^3 ε^{-d}) algorithm for the k-means problem. It is (9 + ε)-competitive but quite slow in practice. Xiong et al. investigated the measures that best reflect the performance of K-means clustering [14]. An organized study was performed to understand the impact of data distributions on the performance of K-means clustering. Research work also has improved the traditional K-means clustering so that it can handle datasets with large variation of cluster sizes. This formal study illustrates that the entropy sometimes provided misleading information on the clustering performance. Based on this, a coefficient of variation (CV) is proposed to validate the clustering outcome. Experimental results proved that, for datasets with great variation in "true" cluster sizes, K-means lessens (less than 1.0) the variation in resultant cluster sizes. For datasets with small variation in "true" cluster sizes, K-means increases (greater than 0.3) the variation in resultant cluster sizes.

Scalability of the k-means for large datasets is one of the major limitations. Chiang et al. [21] proposed a simple and easy to implement algorithm for reducing the computation time of k-means. This pattern reduction algorithm compresses and/or removes iteration patterns that are not likely to change their membership. The pattern is compressed by selecting one of the patterns to be removed and setting its value to the average of all patterns removed. If D[r_i^l] is the pattern to be removed, then we can write it as follows:

\[
D[r_i^l] = \sum_{j=1}^{|R_i^l|} D[r_{ij}^l], \quad \text{where } r_i^l \in R_i^l. \qquad (1)
\]

This algorithm can also be applied to other clustering algorithms like population-based clustering algorithms. The computation time of many clustering algorithms can be reduced using this technique, as it is especially effective for large and high-dimensional datasets. In [22] Bradley


et al. proposed Scalable k-means that uses buffering and a two-stage compression scheme to compress or remove patterns in order to improve the performance of k-means. The algorithm is slightly faster than k-means but does not always give the same result. The factors that degrade the performance of scalable k-means are the compression processes and compression ratio. Farnstrom et al. also proposed a simple single pass k-means algorithm [23], which reduces the computation time of k-means. Ordonez and Omiecinski proposed relational k-means [24] that uses the block and incremental concept to improve the stability of scalable k-means. The computation time of k-means can also be reduced using Parallel bisecting k-means [25] and triangle inequality [26] methods.

Although K-means is the most popular and simple clustering algorithm, the major difficulty with this process is that it cannot ensure the global optimum results because the initial cluster centers are selected randomly. Reference [27] presents a novel technique that selects the initial cluster centers by using a Voronoi diagram. This method automates the selection of the initial cluster centers. To validate the performance, experiments were performed on a range of artificial and real world datasets. The quality of the algorithm is assessed using the following error rate (ER) equation:

\[
\text{ER} = \frac{\text{Number of misclassified objects}}{\text{Total number of objects}} \times 100\%. \qquad (2)
\]

The lower the error rate, the better the clustering. The proposed algorithm is able to produce better clustering results than the traditional K-means. Experimental results proved a 60% reduction in iterations for defining the centroids.

All previous solutions and efforts to increase the performance of the K-means algorithm still need more investigation since they are all looking for a local minimum. Searching for the global minimum will certainly improve the performance of the algorithm. The rest of the paper is organized as follows. In Section 2, the proposed method is introduced that finds the optimal centers for each cluster which corresponds to the global minimum of the k-means cluster. Section 3 describes face localization using the proposed method, and in Section 4 the results are discussed and analyzed using two sets from MIT-CBCL as well as the BioID and Caltech datasets. Finally, in Section 5, conclusions are drawn.

    2. K-Means Clustering

K-means, an unsupervised learning algorithm first proposed by MacQueen, 1967 [28], is a popular method of cluster analysis. It is a process of organizing the specified objects into uniform classes called clusters based on similarities among objects according to certain criteria. It solves the well-known clustering problem by considering certain attributes and performing an iterative alternating fitting process. This algorithm partitions a dataset X into k disjoint clusters such that each observation belongs to the cluster with the nearest mean. Let x_i be the centroid of cluster c_i and let d(x_j, x_i) be the dissimilarity between x_i and object x_j ∈ c_i. Then the function minimized by the k-means is given by the following equation:

\[
\min_{x_1, \ldots, x_k} E = \sum_{i=1}^{k} \sum_{x_j \in c_i} d(x_j, x_i). \qquad (3)
\]

[Figure 1: Comparison of k-means: clustering error versus the number of clusters k for global k-means, fast global k-means, k-means, and min k-means.]

K-means clustering is utilized in a vast number of applications including machine learning, fault detection, pattern recognition, image processing, statistics, and artificial intelligence [11, 29, 30]. The k-means algorithm is considered as one of the fastest clustering algorithms, with a number of variants that are sensitive to the selection of initial points and are intended for solving many issues of k-means like the evaluation of the clusters number [31], the method of initialization of the clusters centroids [32], and the speed of the algorithm [33].

2.1. Modifications in K-Means Clustering. The global k-means algorithm [34] is a proposal to improve the global search properties of the k-means algorithm and its performance on very large datasets by computing clusters successively. The main weakness of k-means clustering, that is, its sensitivity to initial positions of the cluster centers, is overcome by the global k-means clustering algorithm, which consists of a deterministic global optimization technique. It does not select initial values randomly; instead, an incremental approach is applied to optimally add one new cluster center at each stage. Global k-means also proposes a method to reduce the computational load while maintaining the solution quality. Experiments were performed to compare the performance of k-means, global k-means, min k-means, and fast global k-means as shown in Figure 1. Numerical results show that the global k-means algorithm considerably outperforms other k-means algorithms.


Bagirov [35] proposed a new version of the global k-means algorithm for minimum sum-of-squares clustering problems. He also compared three different versions of the k-means algorithm to propose the modified version of the global k-means algorithm. The proposed algorithm computes clusters incrementally, and k − 1 cluster centers from the previous iteration are used to compute the k-partition of a dataset. The starting point of the kth cluster center is computed by minimizing an auxiliary cluster function. Given a finite set of points A in the n-dimensional space, the global k-means computes the centroid X_1 of the set A as in the following equation:

\[
X_1 = \frac{1}{m} \sum_{i=1}^{m} a_i, \quad a_i \in A, \ i = 1, \ldots, m. \qquad (4)
\]

Numerical experiments performed on 14 datasets demonstrate the efficiency of the modified global k-means algorithm in comparison to the multistart k-means (MS k-means) and the GKM algorithms when the number of clusters k > 5. The modified global k-means however requires more computational time than the global k-means algorithm. Xie and Jiang [36] also proposed a novel version of the global k-means algorithm. The method of creating the next cluster center in the global k-means algorithm is modified, and that showed a better execution time without affecting the performance of the global k-means algorithm. Another extension of the standard k-means clustering is the Global Kernel k-means [37], which optimizes the clustering error in feature space by employing kernel k-means as a local search procedure. A kernel function is used in order to solve the M-clustering problem, and near-optimal solutions are used to optimize the clustering error in the feature space. This incremental algorithm has high ability to identify nonlinearly separable clusters in input space and no dependence on cluster initialization. It can handle weighted data points, making it suitable for graph partitioning applications. Two major modifications were performed in order to reduce the computational cost with no major effect on the solution quality.

Video imaging and image segmentation are important applications for k-means clustering. Hung et al. [38] modified the k-means algorithm for color image segmentation, where a weight selection procedure in the W-k-means algorithm is proposed. The evaluation function F(I) is used for comparison:

    𝐹 (𝐼) = βˆšπ‘…

    𝑅

    βˆ‘

    𝑖=1

    (𝑒𝑖/256)2

    βˆšπ΄π‘–

    , (5)

    where 𝐼 = segmented image, 𝑅 = the number of regions in thesegmented image, 𝐴

    𝑖= the area, or the number of pixels of

    the 𝑖th region, and 𝑒𝑖= the color error of region 𝑖.

    𝑒𝑖is the sum of the Euclidean distance of the color vectors

    between the original image and the segmented image. Smallervalues of 𝐹(𝐼) give better segmentation results. Resultsfrom color image segmentation illustrate that the proposedprocedure produces better segmentation than the randominitialization. Maliatski and Yadid-Pecht [39] also proposean adaptive k-means clustering, hardware-driven approach.

[Figure 2: The separation points in two dimensions (classes C1 and C2 with separators Mx, Mx1, Mx2 and My, My1, My2 along the X and Y axes).]

The algorithm uses both pixel intensity and pixel distance in the clustering process. In [40] also an FPGA implementation of real-time k-means clustering is done for colored images. A filtering algorithm is used for hardware implementation. Suliman and Isa [11] presented a unique clustering algorithm called adaptive fuzzy-K-means clustering for image segmentation. It can be used for different types of images including medical images, microscopic images, and CCD camera images. The concepts of fuzziness and belongingness are applied to produce more adaptive clustering as compared to other conventional clustering algorithms. The proposed adaptive fuzzy-K-means clustering algorithm provides better segmentation and visual quality for a range of images.

2.2. The Proposed Methodology. In this method, the face location is determined automatically by using an optimized K-means algorithm. First, the input image is reshaped into vectors of pixel values, followed by clustering the pixels, based on applying a certain threshold determined by the algorithm, into two classes. Pixels with values below the threshold are put in the nonface class. The rest of the pixels are put in the face class. However, some unwanted parts may get clustered into this face class. Therefore, the algorithm is applied again to the face class to obtain only the face part.

2.2.1. Optimized K-Means Algorithm. The proposed K-means modified algorithm is intended to solve the limitations of the standard version by using differential equations to determine the optimum separation point. This algorithm finds the optimal centers for each cluster which corresponds to the global minimum of the k-means cluster. As an example, say we would like to cluster an image into two subclasses: face and nonface. So we look for a separator which will separate the image into two different clusters in two dimensions. Then the means of the clusters can be found easily. Figure 2 shows the separation points and the mean of each class. To find these


points we proposed the following modification in continuous and discrete cases.

Let {x_1, x_2, x_3, ..., x_N} be a dataset and let x_i be an observation. Let f(x) be the probability mass function (PMF) of x:

\[
f(x) = \sum_{i=-\infty}^{\infty} \frac{1}{N} \, \delta(x - x_i), \quad x_i \in \mathbb{R}, \qquad (6)
\]

    where 𝛿(π‘₯) is the Dirac function.When the size of the data is very large, the question of the

    cost of the minimization ofβˆ‘π‘˜π‘–=1βˆ‘π‘₯π‘–βˆˆπ‘ π‘–

    |π‘₯π‘–βˆ’ πœ‡π‘–|2 arises, where

    π‘˜ is the number of clusters and πœ‡π‘–is the mean of cluster 𝑠

    𝑖.

    Let

    Var π‘˜ =π‘˜

    βˆ‘

    𝑖=1

    βˆ‘

    π‘₯π‘–βˆˆπ‘ π‘–

    π‘₯π‘–βˆ’ πœ‡π‘–

    2. (7)

Then it is enough to minimize Var 1 without losing the generality since

\[
\min_{s_i} \sum_{i=1}^{k} \sum_{x_j \in s_i} \left\| x_j - \mu_i \right\|^2
\;\ge\;
\sum_{l=1}^{d} \min_{s_i} \left( \sum_{i=1}^{k} \sum_{x_j \in s_i} \left| x_j^l - \mu_i^l \right|^2 \right), \qquad (8)
\]

where d is the dimension of x_j = (x_j^1, ..., x_j^d) and μ_j = (μ_j^1, ..., μ_j^d).

2.2.2. Continuous Case. For the continuous case, f(x) is like the probability density function (PDF). Then for 2 classes we need to minimize the following equation:

\[
\operatorname{Var} 2 = \int_{-\infty}^{M} (x - \mu_1)^2 f(x)\, dx + \int_{M}^{\infty} (x - \mu_2)^2 f(x)\, dx. \qquad (9)
\]

If d ≥ 2 we can use (8):

\[
\min \operatorname{Var} 2 \ge \sum_{k=1}^{d} \min \left( \int_{-\infty}^{M} (x^k - \mu_1^k)^2 f(x^k)\, dx + \int_{M}^{\infty} (x^k - \mu_2^k)^2 f(x^k)\, dx \right). \qquad (10)
\]

    Let π‘₯ ∈ R be any random variable with probabilitydensity function 𝑓(π‘₯); we want to find πœ‡

    1and πœ‡

    2such that

    it minimizes (7) as follows:

    (πœ‡βˆ—

    1, πœ‡βˆ—

    2) = arg(πœ‡1 ,πœ‡2)

    [ min𝑀,πœ‡1,πœ‡2

    [∫

    𝑀

    βˆ’βˆž

    (π‘₯ βˆ’ πœ‡1)2𝑓 (π‘₯) 𝑑π‘₯

    +∫

    ∞

    𝑀

    (π‘₯ βˆ’ πœ‡2)2𝑓 (π‘₯) 𝑑π‘₯]] ,

    (11)

    min𝑀,πœ‡1,πœ‡2

    Var 2 = min𝑀

    [minπœ‡1

    ∫

    𝑀

    βˆ’βˆž

    (π‘₯ βˆ’ πœ‡1)2𝑓 (π‘₯) 𝑑π‘₯

    +minπœ‡2

    ∫

    ∞

    𝑀

    (π‘₯ βˆ’ πœ‡2)2𝑓 (π‘₯) 𝑑π‘₯] ,

    (12)

    min𝑀,πœ‡1,πœ‡2

    Var 2 = min𝑀

    [minπœ‡1

    ∫

    𝑀

    βˆ’βˆž

    (π‘₯ βˆ’ πœ‡1(𝑀))2𝑓 (π‘₯) 𝑑π‘₯

    +minπœ‡2

    ∫

    ∞

    𝑀

    (π‘₯ βˆ’ πœ‡2(𝑀))2𝑓 (π‘₯) 𝑑π‘₯] .

    (13)

We know that

\[
\min_{x} E\left[(X - x)^2\right] = E\left[(X - E(X))^2\right] = \operatorname{Var}(X). \qquad (14)
\]

    So we can find πœ‡1and πœ‡2in terms of𝑀. Let us define Var

    1

    and Var2as follows:

    Var1= ∫

    𝑀

    βˆ’βˆž

    (π‘₯ βˆ’ πœ‡1)2𝑓 (π‘₯) 𝑑π‘₯,

    Var2= ∫

    ∞

    𝑀

    (π‘₯ βˆ’ πœ‡2)2𝑓 (π‘₯) 𝑑π‘₯.

    (15)

After simplification we get

\[
\operatorname{Var}_1 = \int_{-\infty}^{M} x^2 f(x)\, dx + \mu_1^2 \int_{-\infty}^{M} f(x)\, dx - 2 \mu_1 \int_{-\infty}^{M} x f(x)\, dx,
\]
\[
\operatorname{Var}_2 = \int_{M}^{\infty} x^2 f(x)\, dx + \mu_2^2 \int_{M}^{\infty} f(x)\, dx - 2 \mu_2 \int_{M}^{\infty} x f(x)\, dx. \qquad (16)
\]

To find the minimum, Var_1 and Var_2 are both partially differentiated with respect to μ_1 and μ_2, respectively. After simplification, we conclude that the minimization occurs when

\[
\mu_1 = \frac{E(X^-)}{P(X^-)}, \qquad \mu_2 = \frac{E(X^+)}{P(X^+)}, \qquad (17)
\]

where X^- = X · 1_{X < M} and X^+ = X · 1_{X ≥ M}, so that P(X^-) = P(X < M) and P(X^+) = P(X ≥ M).


To find the minimum of Var 2, we need to find ∂μ/∂M:

\[
\mu_1 = \frac{\int_{-\infty}^{M} x f(x)\, dx}{\int_{-\infty}^{M} f(x)\, dx}
\;\Rightarrow\;
\frac{\partial \mu_1}{\partial M} = \frac{M f(M) P(X^-) - f(M) E(X^-)}{P^2(X^-)},
\]
\[
\mu_2 = \frac{\int_{M}^{\infty} x f(x)\, dx}{\int_{M}^{\infty} f(x)\, dx}
\;\Rightarrow\;
\frac{\partial \mu_2}{\partial M} = \frac{M f(M) P(X^+) - f(M) E(X^+)}{P^2(X^+)}. \qquad (18)
\]

After simplification we get

\[
\frac{\partial \mu_1}{\partial M} = \frac{f(M)}{P^2(X^-)} \left[ M P(X^-) - E(X^-) \right],
\qquad
\frac{\partial \mu_2}{\partial M} = \frac{f(M)}{P^2(X^+)} \left[ M P(X^+) - E(X^+) \right]. \qquad (19)
\]

In order to find the minimum of Var 2, we will find ∂Var_1/∂M and ∂Var_2/∂M separately and add them as follows:

\[
\frac{\partial}{\partial M} (\operatorname{Var} 2) = \frac{\partial}{\partial M} (\operatorname{Var}_1) + \frac{\partial}{\partial M} (\operatorname{Var}_2),
\]
\[
\frac{\partial \operatorname{Var}_1}{\partial M}
= \frac{\partial}{\partial M} \left( \int_{-\infty}^{M} x^2 f(x)\, dx + \mu_1^2 \int_{-\infty}^{M} f(x)\, dx - 2 \mu_1 \int_{-\infty}^{M} x f(x)\, dx \right)
\]
\[
= M^2 f(M) + 2 \mu_1 \frac{\partial \mu_1}{\partial M} P(X^-) + \mu_1^2 f(M) - 2 \frac{\partial \mu_1}{\partial M} E(X^-) - 2 \mu_1 M f(M). \qquad (20)
\]

After simplification we get

\[
\frac{\partial \operatorname{Var}_1}{\partial M} = f(M) \left[ M^2 + \frac{E^2(X^-)}{P(X^-)^2} - 2M \frac{E(X^-)}{P(X^-)} \right]. \qquad (21)
\]

Similarly,

\[
\frac{\partial \operatorname{Var}_2}{\partial M}
= -M^2 f(M) + 2 \mu_2 \frac{\partial \mu_2}{\partial M} P(X^+) - \mu_2^2 f(M) - 2 \frac{\partial \mu_2}{\partial M} E(X^+) + 2 \mu_2 M f(M). \qquad (22)
\]

After simplification we get

\[
\frac{\partial \operatorname{Var}_2}{\partial M} = f(M) \left[ -M^2 - \frac{E^2(X^+)}{P(X^+)^2} + 2M \frac{E(X^+)}{P(X^+)} \right]. \qquad (23)
\]

Adding both these differentials and after simplifying, it follows that

\[
\frac{\partial}{\partial M} (\operatorname{Var} 2) = f(M) \left[ \left( \frac{E^2(X^-)}{P(X^-)^2} - \frac{E^2(X^+)}{P(X^+)^2} \right) - 2M \left( \frac{E(X^-)}{P(X^-)} - \frac{E(X^+)}{P(X^+)} \right) \right]. \qquad (24)
\]

Equating this with 0 gives

\[
\frac{\partial}{\partial M} (\operatorname{Var} 2) = 0,
\]
\[
\left[ \left( \frac{E^2(X^-)}{P(X^-)^2} - \frac{E^2(X^+)}{P(X^+)^2} \right) - 2M \left( \frac{E(X^-)}{P(X^-)} - \frac{E(X^+)}{P(X^+)} \right) \right] = 0,
\]
\[
\left[ \frac{E(X^-)}{P(X^-)} - \frac{E(X^+)}{P(X^+)} \right] \left[ \frac{E(X^-)}{P(X^-)} + \frac{E(X^+)}{P(X^+)} - 2M \right] = 0. \qquad (25)
\]

    But [(𝐸(π‘‹βˆ’)/𝑃(𝑋

    βˆ’))βˆ’(𝐸(𝑋

    +)/𝑃(𝑋

    +))] ΜΈ= 0, as in that case

    πœ‡1= πœ‡2, which contradicts the initial assumption. Therefore

    [

    𝐸 (π‘‹βˆ’)

    𝑃 (π‘‹βˆ’)

    +

    𝐸 (𝑋+)

    𝑃 (𝑋+)

    βˆ’ 2𝑀] = 0

    β‡’ 𝑀 = [

    𝐸 (π‘‹βˆ’)

    𝑃 (π‘‹βˆ’)

    +

    𝐸 (𝑋+)

    𝑃 (𝑋+)

    ] β‹…

    1

    2

    ,

    𝑀 =

    πœ‡1+ πœ‡2

    2

    .

    (26)

Therefore, we have to find all values of M that satisfy (26), which is easier than minimizing Var 2 directly.
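As an illustration (assuming SciPy is available; the function name, integration bounds, and iteration scheme below are our own choices, not part of the paper), condition (26) can be solved numerically for an arbitrary density f by iterating M ← (μ1(M) + μ2(M))/2:

```python
import numpy as np
from scipy.integrate import quad

def two_class_separator(f, lo, hi, iters=200, tol=1e-10):
    """Find M satisfying (26), M = (mu1(M) + mu2(M)) / 2, for a density f
    whose mass is (numerically) contained in [lo, hi]."""
    def mu1(M):  # E(X^-) / P(X^-), the mean of the left class
        return quad(lambda x: x * f(x), lo, M)[0] / quad(f, lo, M)[0]
    def mu2(M):  # E(X^+) / P(X^+), the mean of the right class
        return quad(lambda x: x * f(x), M, hi)[0] / quad(f, M, hi)[0]
    M = 0.5 * (lo + hi)
    for _ in range(iters):
        M_new = 0.5 * (mu1(M) + mu2(M))
        if abs(M_new - M) < tol:
            break
        M = M_new
    return M, mu1(M), mu2(M)

# Hypothetical usage: standard normal density; expect M = 0 and centers near -sqrt(2/pi), +sqrt(2/pi).
pdf = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
print(two_class_separator(pdf, -8.0, 8.0))
```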

To find the minimum in terms of M, it is enough to derive

\[
\frac{\partial \operatorname{Var} 2 \left( \mu_1(M), \mu_2(M) \right)}{\partial M}
= f(M) \left[ \frac{E^2(X^-)}{P(X < M)^2} - \frac{E^2(X^+)}{P(X \ge M)^2}
- 2M \left[ \frac{E(X^-)}{P(X \le M)} - \frac{E(X^+)}{P(X > M)} \right] \right]. \qquad (27)
\]

Then we find

\[
M = \left[ \frac{E(X^+)}{P(X^+)} + \frac{E(X^-)}{P(X^-)} \right] \cdot \frac{1}{2}. \qquad (28)
\]

2.2.3. Finding the Centers of Some Common Probability Density Functions

(1) Uniform Distribution. The probability density function of the continuous uniform distribution is

\[
f(x) =
\begin{cases}
\dfrac{1}{b - a}, & a < x < b, \\[4pt]
0, & \text{otherwise},
\end{cases}
\qquad (29)
\]


    where π‘Ž and 𝑏 are the two parameters of the distribution.Theexpected values for𝑋 β‹… 1

    π‘₯β‰₯ 𝑀 and𝑋 β‹… 1

    π‘₯< 𝑀 are

    𝐸 (𝑋+) =

    1

    2

    (

    𝑏2βˆ’π‘€2

    𝑏 βˆ’ π‘Ž

    ) ,

    𝐸 (π‘‹βˆ’) =

    1

    2

    (

    𝑀2βˆ’ π‘Ž2

    𝑏 βˆ’ π‘Ž

    ) .

    (30)

Putting all these in (28) and solving, we get

\[
M = \frac{1}{2}(a + b), \qquad
\mu_1(M) = \frac{E(X^-)}{P(X^-)} = \frac{1}{4}(3a + b), \qquad
\mu_2(M) = \frac{E(X^+)}{P(X^+)} = \frac{1}{4}(a + 3b). \qquad (31)
\]

(2) Log-Normal Distribution. The probability density function of a log-normal distribution is

\[
f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \, e^{-(\ln x - \mu)^2 / 2\sigma^2}, \quad x > 0. \qquad (32)
\]

The probabilities in the left and right of M are, respectively,

\[
P(X^+) = \frac{1}{2} - \frac{1}{2} \operatorname{erf}\left( \frac{\ln M - \mu}{\sqrt{2}\, \sigma} \right), \qquad
P(X^-) = \frac{1}{2} + \frac{1}{2} \operatorname{erf}\left( \frac{\ln M - \mu}{\sqrt{2}\, \sigma} \right). \qquad (33)
\]

The expected values for X · 1_{x ≥ M} and X · 1_{x < M} are

\[
E(X^+) = \frac{1}{2} e^{\mu + \sigma^2/2} \left\{ 1 + \operatorname{erf}\left( \frac{\mu + \sigma^2 - \ln M}{\sqrt{2}\, \sigma} \right) \right\},
\]
\[
E(X^-) = \frac{1}{2} e^{\mu + \sigma^2/2} \left\{ 1 + \operatorname{erf}\left( \frac{-\mu - \sigma^2 + \ln M}{\sqrt{2}\, \sigma} \right) \right\}. \qquad (34)
\]

Putting all these in (28), assuming μ = 0 and σ = 1, and solving, we get

\[
M = 4.244942583, \qquad
\mu_1(M) = \frac{E(X^-)}{P(X^-)} = 1.196827816, \qquad
\mu_2(M) = \frac{E(X^+)}{P(X^+)} = 7.293057348. \qquad (35)
\]

(3) Normal Distribution. The probability density function of a normal distribution is

\[
f(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / 2\sigma^2}. \qquad (36)
\]

The probabilities in the left and right of M are, respectively,

\[
P(X^+) = \frac{1}{2} - \frac{1}{2} \operatorname{erf}\left( \frac{M - \mu}{\sqrt{2}\, \sigma} \right), \qquad
P(X^-) = \frac{1}{2} + \frac{1}{2} \operatorname{erf}\left( \frac{M - \mu}{\sqrt{2}\, \sigma} \right). \qquad (37)
\]

The expected values for X · 1_{x ≥ M} and X · 1_{x < M} are

\[
E(X^+) = -\frac{1}{2} \mu \left\{ -1 + \operatorname{erf}\left( \frac{M - \mu}{\sqrt{2}\, \sigma} \right) \right\} + \frac{\sigma}{\sqrt{2\pi}} \, e^{-(M - \mu)^2 / 2\sigma^2},
\]
\[
E(X^-) = \frac{1}{2} \mu \left\{ 1 + \operatorname{erf}\left( \frac{M - \mu}{\sqrt{2}\, \sigma} \right) \right\} - \frac{\sigma}{\sqrt{2\pi}} \, e^{-(M - \mu)^2 / 2\sigma^2}. \qquad (38)
\]

Putting all these in (28) and solving, we get M = μ. Assuming μ = 0 and σ = 1 and solving, we get

\[
M = 0, \qquad
\mu_1(M) = \frac{E(X^-)}{P(X^-)} = -\sqrt{\frac{2}{\pi}}, \qquad
\mu_2(M) = \frac{E(X^+)}{P(X^+)} = \sqrt{\frac{2}{\pi}}. \qquad (39)
\]

2.2.4. Discrete Case. Let X be a discrete random variable and assume we have a set of N observations from X. Let p_j be the mass density function for an observation x_j.

Let x_{i−1} < M_{i−1} < x_i and x_i < M_i < x_{i+1}. We define μ_{x ≤ x_i} as the mean for all x ≤ x_i, and μ_{x ≥ x_i} as the mean for all x ≥ x_i. Define Var 2(M_{i−1}) as the variance of the two centers forced by M_{i−1} as a separator:

    Var (π‘€π‘–βˆ’1) = βˆ‘

    π‘₯𝑗≀π‘₯π‘–βˆ’1

    (π‘₯π‘—βˆ’ πœ‡π‘₯𝑗≀π‘₯π‘–βˆ’1

    )

    2

    𝑝𝑗

    + βˆ‘

    π‘₯𝑗β‰₯π‘₯𝑖

    (π‘₯π‘—βˆ’ πœ‡π‘₯𝑗β‰₯π‘₯𝑖

    )

    2

    𝑝𝑗,

    (40)

    Var (𝑀𝑖) = βˆ‘

    π‘₯𝑗≀π‘₯𝑖

    (π‘₯π‘—βˆ’ πœ‡π‘₯𝑗≀π‘₯𝑖

    )

    2

    𝑝𝑗

    + βˆ‘

    π‘₯𝑗β‰₯π‘₯𝑖+1

    (π‘₯π‘—βˆ’ πœ‡π‘₯𝑗β‰₯π‘₯𝑖+1

    )

    2

    𝑝𝑗.

    (41)


    The first part of Var(π‘€π‘–βˆ’1) simplifies as follows:

    βˆ‘

    π‘₯𝑗≀π‘₯π‘–βˆ’1

    (π‘₯π‘—βˆ’ πœ‡π‘₯𝑗≀π‘₯π‘–βˆ’1

    )

    2

    𝑝𝑗

    = βˆ‘

    π‘₯𝑗≀π‘₯π‘–βˆ’1

    π‘₯2

    π‘—π‘π‘—βˆ’ 2πœ‡π‘₯𝑗≀π‘₯π‘–βˆ’1

    Γ— βˆ‘

    π‘₯𝑗≀π‘₯π‘–βˆ’1

    π‘₯𝑗𝑝𝑗+ πœ‡2

    π‘₯𝑗≀π‘₯π‘–βˆ’1βˆ‘

    π‘₯𝑗≀π‘₯π‘–βˆ’1

    𝑝𝑗

    = 𝐸 (𝑋2β‹… 1π‘₯𝑗≀π‘₯π‘–βˆ’1

    ) βˆ’

    𝐸2(𝑋 β‹… 1

    π‘₯𝑗≀π‘₯π‘–βˆ’1)

    𝑃 (𝑋 ≀ π‘₯π‘–βˆ’1)

    .

    (42)

Similarly,

\[
\sum_{x_j \ge x_i} \left( x_j - \mu_{x_j \ge x_i} \right)^2 p_j
= E\left( X^2 \cdot 1_{x_j \ge x_i} \right) - \frac{E^2\left( X \cdot 1_{x_j \ge x_i} \right)}{P(X \ge x_i)},
\]
\[
\operatorname{Var} 2 (M_{i-1}) = E\left( X^2 \cdot 1_{x_j \le x_{i-1}} \right) - \frac{E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})}
+ E\left( X^2 \cdot 1_{x_j \ge x_i} \right) - \frac{E^2\left( X \cdot 1_{x_j \ge x_i} \right)}{P(X \ge x_i)},
\]
\[
\operatorname{Var} 2 (M_i) = E\left( X^2 \cdot 1_{x_j \le x_i} \right) - \frac{E^2\left( X \cdot 1_{x_j \le x_i} \right)}{P(X \le x_i)}
+ E\left( X^2 \cdot 1_{x_j \ge x_{i+1}} \right) - \frac{E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})}. \qquad (43)
\]

Define ΔVar 2 = Var 2(M_i) − Var 2(M_{i−1}):

\[
\Delta \operatorname{Var} 2 =
\left[ - \frac{E^2\left( X \cdot 1_{x_j \le x_i} \right)}{P(X \le x_i)} - \frac{E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right]
- \left[ - \frac{E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} - \frac{E^2\left( X \cdot 1_{x_j \ge x_i} \right)}{P(X \ge x_i)} \right]. \qquad (44)
\]

Now, in order to simplify the calculation, we rearrange ΔVar 2 as follows:

\[
\Delta \operatorname{Var} 2 =
\left[ \frac{E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} - \frac{E^2\left( X \cdot 1_{x_j \le x_i} \right)}{P(X \le x_i)} \right]
+ \left[ \frac{E^2\left( X \cdot 1_{x_j \ge x_i} \right)}{P(X \ge x_i)} - \frac{E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right]. \qquad (45)
\]

The first part is simplified as follows:

\[
\frac{E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} - \frac{E^2\left( X \cdot 1_{x_j \le x_i} \right)}{P(X \le x_i)}
= \frac{E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})}
- \frac{\left[ E\left( X \cdot 1_{x_j \le x_{i-1}} \right) + x_i p_i \right]^2}{P(X \le x_{i-1}) + p_i}
\]
\[
= \frac{1}{P(X \le x_{i-1}) \left[ P(X \le x_{i-1}) + p_i \right]}
\left\{ E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right) P(X \le x_{i-1})
+ p_i E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right)
- E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right) P(X \le x_{i-1}) \right.
\]
\[
\left. \qquad\quad - 2 x_i p_i E\left( X \cdot 1_{x_j \le x_{i-1}} \right) P(X \le x_{i-1})
- x_i^2 p_i^2 P(X \le x_{i-1}) \right\}
\]
\[
= \frac{1}{P(X \le x_{i-1}) \left[ P(X \le x_{i-1}) + p_i \right]}
\left\{ p_i E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right)
- 2 x_i p_i E\left( X \cdot 1_{x_j \le x_{i-1}} \right) P(X \le x_{i-1})
- x_i^2 p_i^2 P(X \le x_{i-1}) \right\}. \qquad (46)
\]

Similarly, simplifying the second part yields

\[
\frac{E^2\left( X \cdot 1_{x_j \ge x_i} \right)}{P(X \ge x_i)} - \frac{E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})}
= \frac{\left[ E\left( X \cdot 1_{x_j \ge x_{i+1}} \right) + x_i p_i \right]^2}{P(X \ge x_{i+1}) + p_i}
- \frac{E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})}
\]
\[
= \frac{1}{P(X \ge x_{i+1}) \left[ P(X \ge x_{i+1}) + p_i \right]}
\left\{ E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right) P(X \ge x_{i+1})
- p_i E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right)
- E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right) P(X \ge x_{i+1}) \right.
\]
\[
\left. \qquad\quad + 2 x_i p_i E\left( X \cdot 1_{x_j \ge x_{i+1}} \right) P(X \ge x_{i+1})
+ x_i^2 p_i^2 P(X \ge x_{i+1}) \right\}
\]
\[
= \frac{1}{P(X \ge x_{i+1}) \left[ P(X \ge x_{i+1}) + p_i \right]}
\left\{ - p_i E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right)
+ 2 x_i p_i E\left( X \cdot 1_{x_j \ge x_{i+1}} \right) P(X \ge x_{i+1})
+ x_i^2 p_i^2 P(X \ge x_{i+1}) \right\}. \qquad (47)
\]

In general, these types of optimization arise when the dataset is very big.

If we consider N to be very large, then p_k ≪ max(∑_{i ≥ k+1} p_i; ∑_{i ≤ k−1} p_i), P(X ≤ x_{i−1}) + p_i ≈ P(X ≤ x_i), and P(X ≥ x_{i+1}) + p_i ≈ P(X ≥ x_i):

\[
\Delta \operatorname{Var} 2 \approx
\left[ \frac{E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right) p_i}{P^2(X \le x_{i-1})} - \frac{E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right) p_i}{P^2(X \ge x_{i+1})} \right]
- 2 p_i x_i \left[ \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} - \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right]
- p_i^2 x_i^2 \left[ \frac{1}{P(X \le x_{i-1})} - \frac{1}{P(X \ge x_{i+1})} \right], \qquad (48)
\]

and, as lim_{N→∞} { p_i² x_i² [ (1/P(X ≤ x_{i−1})) − (1/P(X ≥ x_{i+1})) ] } = 0, we can further simplify ΔVar 2 as follows:

\[
\Delta \operatorname{Var} 2 \approx
\left[ \frac{E^2\left( X \cdot 1_{x_j \le x_{i-1}} \right) p_i}{P^2(X \le x_{i-1})} - \frac{E^2\left( X \cdot 1_{x_j \ge x_{i+1}} \right) p_i}{P^2(X \ge x_{i+1})} \right]
- 2 p_i x_i \left[ \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} - \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right]
\]
\[
= p_i \left[ \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} - \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right]
\left[ \left( \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} + \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right) - 2 x_i \right]. \qquad (49)
\]

To find the minimum of ΔVar 2, let us locate when ΔVar 2 = 0:

\[
p_i \left[ \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} - \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right]
\left[ \left( \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} + \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right) - 2 x_i \right] = 0. \qquad (50)
\]

[Figure 3: Discrete random variables (the two-class variance over the observations x_1, x_2, ..., x_n with the separator M).]

But p_i ≠ 0, and [(E(X · 1_{x_j ≤ x_{i−1}})/P(X ≤ x_{i−1})) − (E(X · 1_{x_j ≥ x_{i+1}})/P(X ≥ x_{i+1}))] < 0, since x_i is an increasing sequence, which implies that μ_{i−1} < μ_i, where

\[
\mu_{i-1} = \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})},
\qquad
\mu_i = \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})}. \qquad (51)
\]

Therefore,

\[
\left[ \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} + \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right] - 2 x_i = 0. \qquad (52)
\]

Let us define

\[
F(x_i) = 2 x_i - \left[ \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} + \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right],
\]
\[
F(x_i) = 0
\;\Rightarrow\;
x_i = \left[ \frac{E\left( X \cdot 1_{x_j \le x_{i-1}} \right)}{P(X \le x_{i-1})} + \frac{E\left( X \cdot 1_{x_j \ge x_{i+1}} \right)}{P(X \ge x_{i+1})} \right] \cdot \frac{1}{2}
\;\Rightarrow\;
x_i = \frac{\mu_{i-1} + \mu_i}{2}, \qquad (53)
\]

    where πœ‡π‘–βˆ’1

    and πœ‡π‘–are the means of the first and second parts,

    respectively.To find the minimum for the total variation, it is enough

    to find the good separator𝑀 between π‘₯𝑖and π‘₯

    π‘–βˆ’1such that

    𝐹(π‘₯𝑖) β‹… 𝐹(π‘₯

    𝑖+1) < 0 and 𝐹(π‘₯

    𝑖) < 0 (see Figure 3).

    After finding few local minima, we can get the globalminimum by finding the smallest among them.
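To make the discrete procedure concrete, here is a minimal sketch for sorted one-dimensional data with equal weights p_j = 1/N (an illustrative reading of the search, not the authors' implementation): cumulative sums give every left/right mean in one pass, candidate separators are the splits where the midpoint of the two means falls between consecutive observations (the zero crossing of F in (53)), and the candidate with the smallest two-class variance is kept.

```python
import numpy as np

def optimal_two_split(x):
    """Global two-class split of 1-D data: scan the splits where
    M = (mu_left + mu_right) / 2 lies between consecutive sorted points
    and keep the split with the smallest value of Var 2."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    csum, csq = np.cumsum(x), np.cumsum(x ** 2)
    best_var, best_m = np.inf, None
    for i in range(1, n):                      # left part x[:i], right part x[i:]
        mu_l = csum[i - 1] / i
        mu_r = (csum[-1] - csum[i - 1]) / (n - i)
        m = 0.5 * (mu_l + mu_r)
        var2 = (csq[i - 1] - i * mu_l ** 2) + (csq[-1] - csq[i - 1] - (n - i) * mu_r ** 2)
        if x[i - 1] <= m <= x[i] and var2 < best_var:
            best_var, best_m = var2, m
    return best_m, best_var

# Hypothetical usage on a sample drawn from two well-separated components.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(40, 5, 500), rng.normal(180, 10, 500)])
print(optimal_two_split(data))
```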

    3. Face Localization Using Proposed Method

Now it becomes clear that the K-means modified algorithm is superior to the standard algorithm in terms of finding


    Figure 4: The original image.

    Figure 5: The face with shade only.

a global minimum, and we can explain our solution based on the modified algorithm. Suppose we have an image with pixel values from 0 to 255. First, we reshaped the input image into a vector of pixels; Figure 4 shows an input image before the operation of the K-means modified algorithm. The image contains a face part, a normal background, and a background with illumination. Then we split the image pixels into two classes: one from 0 to x_1 and another from x_1 to 255, by applying a certain threshold determined by the algorithm. Let [0, ..., x_1] be class 1, which represents the nonface part, and let [x_1, ..., 255] be class 2, which represents the face with the shade.

All pixels with values less than the threshold belong to the first class and will be assigned to 0 values. This will keep the pixels with values above the threshold, and these pixels belong to both the face part and the illuminated part of the background, as shown in Figure 5.

In order to remove the illuminated background part, another threshold will be applied to the pixels using the algorithm, meant to separate the face part and the illuminated background part. New classes will be obtained, from x_1 to x_2 and from x_2 to 255. Then, the pixels with values less than the threshold will be assigned a value of 0, and the nonzero pixels left represent the face part. Let [x_1, ..., x_2] be class 1, which represents the illuminated background part, and let [x_2, ..., 255] be class 2, which represents the face part.

The result of the removal of the illuminated background part is shown in Figure 6.

    Figure 6: The face without illumination.

    Figure 7: The filtered image.

One can see that some face pixels were assigned to 0 due to the effects of the illumination. Therefore, there is a need to return to the original image in Figure 4 to fully extract the image window corresponding to the face. A filtering process is then performed to reduce the noise, and the result is shown in Figure 7.

Figure 8 shows the final result, which is a window containing the face extracted from the original image.
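To make the two-pass procedure concrete, a minimal sketch of the localization pipeline is given below; the grayscale image array, the use of a median filter, the filter size, and the bounding-box crop are our own illustrative choices rather than details taken from the paper.

```python
import numpy as np
from scipy.ndimage import median_filter  # assumed available for the filtering step

def two_class_threshold(values):
    """Optimal two-class separator M = (mu1 + mu2) / 2 of 1-D values, cf. (26)/(53)."""
    v = np.sort(np.asarray(values, dtype=float).ravel())
    n = len(v)
    csum, csq = np.cumsum(v), np.cumsum(v ** 2)
    best_var, best_m = np.inf, float(v[n // 2])
    for i in range(1, n):
        mu_l, mu_r = csum[i - 1] / i, (csum[-1] - csum[i - 1]) / (n - i)
        var2 = (csq[i - 1] - i * mu_l ** 2) + (csq[-1] - csq[i - 1] - (n - i) * mu_r ** 2)
        if var2 < best_var:
            best_var, best_m = var2, 0.5 * (mu_l + mu_r)
    return best_m

def localize_face(gray):
    """Two-pass clustering of pixel values (0-255) followed by a crop of the face window."""
    t1 = two_class_threshold(gray)                  # pass 1: nonface vs. face + shade
    pass1 = np.where(gray > t1, gray, 0)
    t2 = two_class_threshold(pass1[pass1 > 0])      # pass 2: illuminated background vs. face
    face_mask = median_filter((pass1 > t2).astype(np.uint8), size=5)
    rows, cols = np.nonzero(face_mask)
    r0, r1, c0, c1 = rows.min(), rows.max(), cols.min(), cols.max()
    return gray[r0:r1 + 1, c0:c1 + 1]               # window from the original image
```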

    4. Results

In this section, the experimental results of the proposed algorithm will be analyzed and discussed. First, it starts with presenting a description of the datasets used in this work, followed by the results of the previous part of the work. Three popular databases were used to test the proposed face localization method. The MIT-CBCL database has image size 115 × 115, and 300 images were selected from this dataset based on different conditions such as light variation, pose, and background variations. The second dataset is the BioID dataset with image size 384 × 286, and 300 images from the database were selected based on different conditions such as indoor/outdoor. The last dataset is the Caltech dataset with image size 896 × 592, and the 300 images from the database were selected based on different conditions such as indoor/outdoor.

Figure 8: The corresponding window in the original image.

Figure 9: Samples from MIT-CBCL dataset.

4.1. MIT-CBCL Dataset. This dataset was established by the Center for Biological and Computational Learning (CBCL) at Massachusetts Institute of Technology (MIT) [41]. It has 10 persons with 200 images per person and an image size of 115 × 115 pixels. The images of each person are with different light conditions, clutter background, and scale and different poses. A few examples of these images are shown in Figure 9.

4.2. BioID Dataset. This dataset [42] consists of 1521 gray level images with a resolution of 384 × 286 pixels. Each one shows the frontal view of a face of one out of 23 different test persons. The images were taken under different lighting conditions, with a complex background, and contain tilted and rotated faces. This dataset is considered as the most difficult one for eye detection (see Figure 10).

4.3. Caltech Dataset. This database [43] has been collected by Markus Weber at the California Institute of Technology. The database contains 450 face images of 27 subjects under different lighting conditions, different face expressions, and complex backgrounds (see Figure 11).

4.4. Proposed Method Result. The proposed method is evaluated on these datasets, where the result was 100% with a localization time of 4 s.

    Figure 10: Sample of faces from BioID dataset for localization.

    Figure 11: Sample of faces from Caltech dataset for localization.

Table 1: Face localization using the K-mean modified algorithm on MIT-CBCL, BioID, and Caltech; the sets contain faces with different poses and faces with clutter background as well as indoors/outdoors.

Dataset      Images    Accuracy (%)
MIT-Set 1    150       100
MIT-Set 2    150       100
BioID        300       100
Caltech      300       100

Table 1 shows the proposed method results with the MIT-CBCL, Caltech, and BioID databases. In this test, we have focused on the use of images of faces from different angles, from 30 degrees left to 30 degrees right, and the variation in the background, as well as the cases of indoor and outdoor images. Two sets of images are selected from the MIT-CBCL dataset and two other sets from the other databases to evaluate the proposed method. The first set from MIT-CBCL contains the images with different poses and the second set contains images with different backgrounds. Each set contains 150 images. The BioID and Caltech sets contain images with indoor and outdoor settings. The results showed the efficiency of the proposed K-mean modified algorithm to locate the faces from the image with high background and poses changes. In addition, another parameter was considered in this test, which is the localization time. From the results, the proposed method can locate the face in a very short time because it has less complexity due to the use of differential equations.


    Figure 12: Examples of face localization using k-mean modified algorithm on MIT-CBCL dataset.


Figure 13: Example of face localization using k-mean modified algorithm on BioID dataset: (a) the original image, (b) face position, and (c) filtered image.

Figure 12 shows some examples of face localization on MIT-CBCL.

In Figure 13(a), an original image from the BioID dataset is shown. The localization position is shown in Figure 13(b). But there is a need to do a filtering operation in order to remove the unwanted parts, and this is shown in Figure 13(c); then the area of the ROI will be determined.

Figure 14 shows the face position on an image from the Caltech dataset.

    5. Conclusion

In this paper, we focus on developing the ROI detection by suggesting a new method for face localization. In this method of localization, the input image is reshaped into a vector and then the optimized k-means algorithm is applied twice to cluster the image pixels into classes. At the end, the block window corresponding to the class of pixels containing the face only is extracted from the input image. Three sets of images taken from the MIT, BioID, and Caltech datasets, differing in illumination and backgrounds as well as indoor/outdoor settings, were used to evaluate the proposed method. The results show that the proposed method achieved significant localization accuracy. Our future research direction includes finding the optimal centers for higher dimension data, which we believe is just a generalization of our approach to higher dimensions (8) and to higher subclusters.

    Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.



Figure 14: Example of face localization using k-mean modified algorithm on Caltech dataset: (a) the original image, (b) face position, and (c) filtered image.

    Acknowledgments

This study is a part of the Internal Grant Project no. 419011811122. The authors gratefully acknowledge the continued support from Alfaisal University and its Office of Research.

    References

[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proceedings of the IEEE, vol. 83, no. 5, pp. 705–740, 1995.

[2] M.-H. Yang, D. J. Kriegman, and N. Ahuja, "Detecting faces in images: a survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 1, pp. 34–58, 2002.

[3] K. S. Fu and J. K. Mui, "A survey on image segmentation," Pattern Recognition, vol. 13, no. 1, pp. 3–16, 1981.

[4] N. R. Pal and S. K. Pal, "A review on image segmentation techniques," Pattern Recognition, vol. 26, no. 9, pp. 1277–1294, 1993.

[5] S.-Y. Wan and W. E. Higgins, "Symmetric region growing," IEEE Transactions on Image Processing, vol. 12, no. 9, pp. 1007–1015, 2003.

[6] Y.-L. Chang and X. Li, "Fast image region growing," Image and Vision Computing, vol. 13, no. 7, pp. 559–571, 1995.

[7] T. N. Pappas and N. S. Jayant, "An adaptive clustering algorithm for image segmentation," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '89), vol. 3, pp. 1667–1670, May 1989.

[8] T. N. Pappas, "An adaptive clustering algorithm for image segmentation," IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 901–914, 1992.

[9] M. Y. Mashor, "Hybrid training algorithm for RBF network," International Journal of the Computer, the Internet and Management, vol. 8, no. 2, pp. 50–65, 2000.

[10] N. A. M. Isa, S. A. Salamah, and U. K. Ngah, "Adaptive fuzzy moving K-means clustering algorithm for image segmentation," IEEE Transactions on Consumer Electronics, vol. 55, no. 4, pp. 2145–2153, 2009.

[11] S. N. Sulaiman and N. A. M. Isa, "Adaptive fuzzy-K-means clustering algorithm for image segmentation," IEEE Transactions on Consumer Electronics, vol. 56, no. 4, pp. 2661–2668, 2010.

[12] W. Cai, S. Chen, and D. Zhang, "Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation," Pattern Recognition, vol. 40, no. 3, pp. 825–838, 2007.

[13] P. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Pearson Addison-Wesley, 2006.

[14] H. Xiong, J. Wu, and J. Chen, "K-means clustering versus validation measures: a data-distribution perspective," IEEE Transactions on Systems, Man, and Cybernetics B: Cybernetics, vol. 39, no. 2, pp. 318–331, 2009.

[15] I. Borg and P. J. F. Groenen, Modern Multidimensional Scaling: Theory and Applications, Springer Science + Business Media, 2005.

[16] S. Guha, R. Rastogi, and K. Shim, "CURE: an efficient clustering algorithm for large databases," ACM SIGMOD Record, vol. 27, no. 2, pp. 73–84, 1998.

[17] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," VLDB Journal, vol. 8, no. 3-4, pp. 237–253, 2000.

[18] P. S. Bradley, U. Fayyad, and C. Reina, "Scaling clustering algorithms to large databases," in Proceedings of the 4th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '98), pp. 9–15, 1998.

[19] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," in Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035, 2007.

[20] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "A local search approximation algorithm for k-means clustering," Computational Geometry: Theory and Applications, vol. 28, no. 2-3, pp. 89–112, 2004.


[21] M. C. Chiang, C. W. Tsai, and C. S. Yang, "A time-efficient pattern reduction algorithm for k-means clustering," Information Sciences, vol. 181, no. 4, pp. 716–731, 2011.

[22] P. S. Bradley, U. M. Fayyad, and C. Reina, "Scaling clustering algorithms to large databases," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '98), pp. 9–15, 1998.

[23] F. Farnstrom, J. Lewis, and C. Elkan, "Scalability for clustering algorithms revisited," SIGKDD Explorations, vol. 2, no. 1, pp. 51–57, 2000.

[24] C. Ordonez and E. Omiecinski, "Efficient disk-based K-means clustering for relational databases," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 8, pp. 909–921, 2004.

[25] Y. Li and S. M. Chung, "Parallel bisecting k-means with prediction clustering algorithm," Journal of Supercomputing, vol. 39, no. 1, pp. 19–37, 2007.

[26] C. Elkan, "Using the triangle inequality to accelerate k-means," in Proceedings of the 20th International Conference on Machine Learning, pp. 147–153, August 2003.

[27] D. Reddy and P. K. Jana, "Initialization for K-means clustering using Voronoi diagram," Procedia Technology, vol. 4, pp. 395–400, 2012.

[28] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, University of California Press, 1967.

[29] R. Ebrahimpour, R. Rasoolinezhad, Z. Hajiabolhasani, and M. Ebrahimi, "Vanishing point detection in corridors: using Hough transform and K-means clustering," IET Computer Vision, vol. 6, no. 1, pp. 40–51, 2012.

[30] P. S. Bishnu and V. Bhattacherjee, "Software fault prediction using quad tree-based K-means clustering algorithm," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 6, pp. 1146–1150, 2012.

[31] T. Elomaa and H. Koivistoinen, "On autonomous K-means clustering," in Proceedings of the International Symposium on Methodologies for Intelligent Systems, pp. 228–236, May 2005.

[32] J. M. Peña, J. A. Lozano, and P. Larrañaga, "An empirical comparison of four initialization methods for the K-means algorithm," Pattern Recognition Letters, vol. 20, no. 10, pp. 1027–1040, 1999.

[33] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient k-means clustering algorithm: analysis and implementation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881–892, 2002.

[34] A. Likas, N. Vlassis, and J. J. Verbeek, "The global k-means clustering algorithm," Pattern Recognition, vol. 36, no. 2, pp. 451–461, 2003.

[35] A. M. Bagirov, "Modified global k-means algorithm for minimum sum-of-squares clustering problems," Pattern Recognition, vol. 41, no. 10, pp. 3192–3199, 2008.

[36] J. Xie and S. Jiang, "A simple and fast algorithm for global K-means clustering," in Proceedings of the 2nd International Workshop on Education Technology and Computer Science (ETCS '10), vol. 2, pp. 36–40, Wuhan, China, March 2010.

[37] G. F. Tzortzis and A. C. Likas, "The global kernel k-means algorithm for clustering in feature space," IEEE Transactions on Neural Networks, vol. 20, no. 7, pp. 1181–1194, 2009.

[38] W.-L. Hung, Y.-C. Chang, and E. Stanley Lee, "Weight selection in W-K-means algorithm with an application in color image segmentation," Computers and Mathematics with Applications, vol. 62, no. 2, pp. 668–676, 2011.

[39] B. Maliatski and O. Yadid-Pecht, "Hardware-driven adaptive k-means clustering for real-time video imaging," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 164–166, 2005.

[40] T. Saegusa and T. Maruyama, "An FPGA implementation of real-time K-means clustering for color images," Journal of Real-Time Image Processing, vol. 2, no. 4, pp. 309–318, 2007.

[41] "CBCL Face Recognition Database," 2010, http://cbcl.mit.edu/software-datasets/heisele/facerecognition-database.html.

[42] BioID: BioID Support: Downloads: BioID Face Database, 2011.

[43] http://www.vision.caltech.edu/html-files/archive.html.
