[IEEE 2010 Second International Conferences on Advances in Multimedia - Athens, Greece...


Page 1: [IEEE 2010 Second International Conferences on Advances in Multimedia - Athens, Greece (2010.06.13-2010.06.19)] 2010 Second International Conferences on Advances in Multimedia - Image

Image Segmentation with Clustering K-Means and Watershed Transform

Adelina-Iulia Sarpe

University of Craiova,

Faculty of Automation, Computers and Electronics

Craiova, Romania

Email: [email protected]

Abstract: Image segmentation is a very important process for multimedia applications. Multimedia databases use segmentation for the storage and indexing of images. This paper presents a way to segment images by applying both a clustering method and the watershed transformation. It is well known that the major drawback of the watershed transformation is the oversegmentation it produces. For this reason, the image is first segmented with the K-Means clustering method. Another well-known fact is that the output of the K-Means algorithm contains a lot of noise, so the image is then filtered with a Gaussian blur filter. Finally, the watershed transformation is applied. Test results obtained on images from a segmentation evaluation database show that this particular combination of methods greatly reduces oversegmentation.

Keywords: K-Means clustering; image segmentation; watershed transformation.

I. INTRODUCTION

Image segmentation is the process of partitioning an image into multiple segments. The aim is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. The process consists of labeling all pixels in a given image based on similar characteristics, resulting in groups of pixels (also referred to as clusters). The ultimate purpose is to identify objects within images.

This paper presents a combination of methods intended to reduce the oversegmentation that results from applying the watershed transformation, thus allowing for a better identification of objects.

The methods used are (i) K-Means clustering, for a presegmentation of the image, (ii) a Gaussian blur filter, for reducing the image noise that results from applying K-Means clustering and for removing unnecessary details, and (iii) the watershed transformation. This combination of methods is necessary because the watershed transformation, if applied to noisy images, produces heavy oversegmentation in its output, rendering the resulting image useless.

This paper is organized as follows: Section II describes the clustering technique and outlines the K-Means algorithm. Section III presents the filtering process, with more details about the Gaussian blur filter. Section IV presents the watershed transformation. Section V describes the experiments and the results obtained, together with a comparison between these results. Finally, Section VI presents our conclusions.

II. CLUSTERING TECHNIQUE

Clustering is a grouping technique that uses a similarity measure to place similar items together in the same group and dissimilar items in different groups. The resulting groups are referred to as clusters, and the similarity measure by which they are generated is known as a distance measure.

Clustering is often considered the most important unsupervised learning technique. It is widely used in computer vision and image processing, and has found applications in a vast array of domains such as marketing, biology, libraries, and medical imaging.

A. K-Means Algorithm

In this paper, we use the K-Means clustering algorithm developed by MacQueen (1967) [5] and later refined by Hartigan and Wong (1979) [6]. The algorithm classifies or groups objects, based on their attributes/features, into K groups, where K is a positive integer. For the purpose of this paper, the objects are the pixels of the input image and their features are their grey-level values.

The algorithm aims at minimizing an objective function, in this case a squared error function:

$$J = \sum_{j=1}^{K} \sum_{i=1}^{N} \left\| x_i^{(j)} - c_j \right\|^2 \qquad (1)$$

where:

• $\left\| x_i^{(j)} - c_j \right\|^2$ is a chosen distance measure between the data point $x_i^{(j)}$ and the cluster centroid $c_j$.

In this paper the distance measure chosen is the Euclidean distance:

$$d(x_i, x_j) = \left( \sum_{k=1}^{N} (x_{ik} - x_{jk})^2 \right)^{1/2} \qquad (2)$$
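As a minimal illustration of the Euclidean distance (a sketch, not the implementation used for the experiments), the following Python function computes the distance between two feature vectors as the square root of the summed squared differences. The name `euclidean` is hypothetical. For grey-level pixels each vector has a single component, so the distance reduces to the absolute difference of the two grey values.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((ak - bk) ** 2 for ak, bk in zip(a, b)))

print(euclidean([0, 0], [3, 4]))   # two features per object
print(euclidean([120], [100]))     # one grey-level feature per pixel
```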

The algorithm has the following steps:

1) We choose the number of clusters, K;

2) We then randomly choose K pixels to serve as the initial group centroids;

2010 Second International Conferences on Advances in Multimedia

978-0-7695-4068-9/10 $26.00 © 2010 IEEE

DOI 10.1109/MMEDIA.2010.31


3) We assign each pixel to the group that has the closest centroid;

4) When all pixels have been assigned, we recalculate the positions of the K centroids;

5) We repeat Steps 3 and 4 until the centroids no longer move. This produces a separation of the pixels into groups from which the metric to be minimized can be calculated.
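The steps above can be sketched in a few lines of Python operating on a flat list of grey-level values. This is an illustrative sketch under stated assumptions, not the code used for the experiments: the names `kmeans_grey` and `objective` are hypothetical, the initial centroids are drawn from the distinct grey values (so that no two centroids coincide), an empty cluster keeps its previous centroid, and `max_iter` is only a safety bound. The fixed `seed` makes the initial centroids, and therefore the result, identical on every run.

```python
import random

def kmeans_grey(pixels, k, seed=0, max_iter=100):
    """Cluster grey-level values into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)  # fixed seed: same initial centroids on every run
    # Step 2: pick k initial centroids (assumption: among distinct grey values)
    centroids = [float(v) for v in rng.sample(sorted(set(pixels)), k)]
    labels = [0] * len(pixels)
    for _ in range(max_iter):
        # Step 3: assign each pixel to the cluster with the closest centroid
        labels = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
                  for p in pixels]
        # Step 4: recompute each centroid as the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members
                                 else centroids[j])
        # Step 5: stop when the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def objective(pixels, labels, centroids):
    """Squared-error objective J: sum of squared distances to own centroid."""
    return sum((p - centroids[lab]) ** 2 for p, lab in zip(pixels, labels))

pixels = [0, 1, 2, 100, 101, 102]
labels, centroids = kmeans_grey(pixels, k=2, seed=42)
print(labels, centroids, objective(pixels, labels, centroids))
```

K-Means only reaches a local minimum of the objective in general, so with other data or a different seed the resulting clusters may differ.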

To eliminate the drawback of K-Means not yielding the same result every time it is run, we used a seed-based randomization algorithm, so that every time the K-Means algorithm starts, the same centroids are generated.

Figure 1 shows an example of the K-Means clustering algorithm applied to the peppers image [8]. The original image is on the left (a), while the image segmented with K-Means using k = 5 is on the right (b).

(a) (b)

Figure 1. K-Means clustering: the original image (left) and the segmented image (right)

III. IMAGE FILTERING

In order to obtain better results when identifying objects in images, in most cases the input images must be preprocessed to remove noise and enhance contrast. These requirements also apply in our case, due to the well-known fact that the K-Means method generates a lot of noise in the resulting image. Therefore, in this paper we use a Gaussian blur filter [7] to remove undesired noise from the images.

A. Gaussian Blur

The Gaussian blur filter is a low-pass filter that attenuates high-frequency signals. It removes noise and unnecessary details from images by using a Gaussian function to compute a transformation that is applied to each pixel in the image. In 2D space the Gaussian distribution has the following formula [7]:

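$$G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (3)$$

where $x$ is the distance from the origin in the horizontal axis, $y$ is the distance from the origin in the vertical axis, and $\sigma$ is the standard deviation of the Gaussian distribution.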
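To make the formula concrete, the sketch below samples the 2D Gaussian on a small integer grid to build a discrete blur kernel. It is an illustration only: the names `gaussian_2d` and `gaussian_kernel` and the choice of `radius` are assumptions, and the standard normalising constant 1/(2πσ²) is used. Because the sampled kernel is rescaled so that its weights sum to 1 (which keeps the overall image brightness unchanged), the constant in front of the exponential has no effect on the final weights.

```python
import math

def gaussian_2d(x, y, sigma):
    """Value of the 2D Gaussian distribution at offset (x, y) from the origin."""
    return (math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            / (2.0 * math.pi * sigma ** 2))

def gaussian_kernel(radius, sigma):
    """(2*radius+1) x (2*radius+1) kernel sampled from G, rescaled to sum to 1."""
    offsets = range(-radius, radius + 1)
    kernel = [[gaussian_2d(x, y, sigma) for x in offsets] for y in offsets]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

for row in gaussian_kernel(radius=1, sigma=1.0):
    print(["%.4f" % v for v in row])
```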

To filter an image with a Gaussian blur, it is enough to filter it in the horizontal direction with a 1D filter and then apply the same filter to the result in the vertical direction. The order in which the two passes are applied is not important.
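The separable filtering described above can be sketched as two 1D passes over a list-of-lists image. The function names, the default `radius` and `sigma`, and the border handling (edge pixels are replicated) are assumptions made for this illustration and are not taken from the paper.

```python
import math

def kernel_1d(radius, sigma):
    """1D Gaussian weights, rescaled to sum to 1."""
    w = [math.exp(-(i * i) / (2.0 * sigma ** 2))
         for i in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]

def blur_rows(img, k):
    """Horizontal pass: filter every row, replicating the edge pixels."""
    r = len(k) // 2
    w = len(img[0])
    return [[sum(k[j + r] * row[min(max(x + j, 0), w - 1)]
                 for j in range(-r, r + 1))
             for x in range(w)]
            for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def blur_cols(img, k):
    """Vertical pass: the same 1D filter applied along the columns."""
    return transpose(blur_rows(transpose(img), k))

def gaussian_blur(img, radius=2, sigma=1.0):
    k = kernel_1d(radius, sigma)
    return blur_cols(blur_rows(img, k), k)

# A single bright pixel is spread over its neighbourhood by the blur.
img = [[0.0] * 5 for _ in range(5)]
img[2][2] = 1.0
for row in gaussian_blur(img, radius=1):
    print(["%.3f" % v for v in row])
```

Two 1D passes cost on the order of 2(2r+1) multiplications per pixel instead of (2r+1)² for a full 2D kernel of radius r, which is the practical reason for exploiting separability.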

Figure 2 shows an example of the Lena image [8] with salt and pepper noise, filtered with the Gaussian blur filter. The image with salt and pepper noise is on the left (a), while the image obtained after applying the Gaussian blur is on the right (b).

(a) (b)

Figure 2. Gaussian filter: the original image with salt and pepper noise (left) and the resulting image filtered with the Gaussian blur filter (right).

A major advantage of the Gaussian blur filter, for the purpose of this paper, is that its kernel has no sharp edges and therefore does not introduce ringing into the filtered image.

IV. WATERSHED TRANSFORMATION

The watershed transformation is a segmentation method from the class of region-based methods.

A. Watersheds and Catchment Basins

The terms watershed and catchment basin are well known in topography. A catchment basin is an extent of land where water drains downhill into a body of water, such as a river, lake, reservoir, estuary, wetland, sea or ocean; the watersheds are the separation lines between these catchment basins.

Figure 3. Watersheds

A watershed algorithm builds a partition of the image space in the following manner: it associates an influence zone B(M), called a catchment basin, with each minimum M of the image. The set B(M) is connected and contains M. The algorithm then produces a set of watershed lines that separate those catchment basins from one another.

In this paper, we used the immersion algorithm [1], since it is one of the most widely used watershed segmentation algorithms. It provides an efficient way to extract watershed lines by simulating the immersion process on the gradient image.


B. Vincent-Soille Algorithm

If x and y are two points in X ⊂ Z, the geodesic distance between x and y is the length of the shortest path (if any) included in X and linking x and y [4].

For any set A and any set B ⊂ A made of several connected components $B_i$, the geodesic influence zone $IZ_A(B_i)$ of $B_i$ in A is the locus of the points of A whose geodesic distance to $B_i$ is strictly smaller than their geodesic distance to any other component of B. The watershed transformation is defined by the following recursion [2]:

$$X_{h_{\min}+1} = F_{h_{\min}+1} = \mathrm{MIN}_{h_{\min}}, \qquad X_{h+1} = \mathrm{MIN}_h \cup IZ_{F_{h+1}}(X_h) \qquad (4)$$

where $h_{\min}$ is the lowest grey value of $F$, $IZ_{F_{h+1}}(X_h)$ is the union of the geodesic influence zones of the connected components of $X_h$ in $F_{h+1}$, and $\mathrm{MIN}_h$ is the union of the minima of $F$ with grey level equal to $h$. The watershed lines are the complement of $X_{h_{\max}+1}$, where $h_{\max}$ is the highest grey value of $F$.

However, the Vincent-Soille algorithm does not implement the above recursion directly; instead it uses a FIFO queue to flood the basins and to build the watershed lines. The algorithm has two steps:

1) sorting step: the pixels are sorted in ascending order of grey-level value, which gives direct access to the pixels of a given grey level;
2) flooding step: flooding starts at the minima and continues through the remaining grey levels.
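One simple way to realise the sorting step is a bucket sort over the grey levels, sketched below. This is an assumption made for illustration (the function name `sort_pixels_by_level` and the list-of-lists image layout are hypothetical): every pixel coordinate is appended to the bucket of its grey level, so the pixels of level h are available directly as `buckets[h]` and the flooding step can visit the levels in ascending order.

```python
def sort_pixels_by_level(img, levels=256):
    """Group pixel coordinates by grey level; buckets[h] lists the (y, x) at level h."""
    buckets = [[] for _ in range(levels)]
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            buckets[value].append((y, x))
    return buckets

img = [[3, 1],
       [1, 0]]
buckets = sort_pixels_by_level(img, levels=4)
for h, coords in enumerate(buckets):
    print(h, coords)
```

Each pixel is visited once, so this step runs in time linear in the number of pixels.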

The implementation uses a FIFO queue with the following operations:
1) add - adds a pixel at the end of the queue;
2) remove - removes the first element of the queue;
3) init - initializes an empty queue;
4) isEmpty - returns true if the queue is empty and false otherwise.
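The four queue operations map directly onto Python's `collections.deque`. The wrapper class below is a hypothetical sketch (the class name `PixelQueue` is not from the paper) that only gives the operations the names used in the list above.

```python
from collections import deque

class PixelQueue:
    """FIFO queue of pixel coordinates."""

    def __init__(self):
        # init: start with an empty queue
        self._items = deque()

    def add(self, pixel):
        # add: append a pixel at the end of the queue
        self._items.append(pixel)

    def remove(self):
        # remove: take the first (oldest) element off the queue and return it
        return self._items.popleft()

    def is_empty(self):
        # isEmpty: True if the queue holds no pixels, False otherwise
        return not self._items

q = PixelQueue()
q.add((0, 0))
q.add((0, 1))
print(q.remove(), q.remove(), q.is_empty())
```

A `deque` supports constant-time appends and pops at both ends, whereas removing the first element of a plain list takes time proportional to its length.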

Using a breadth-first search and repeated flooding, a unique label is assigned to each minimum and its associated basin. During the flooding step, a MASK label is assigned to all the graph nodes with grey level h. Next, all the nodes from the previous iteration are inserted into the queue; these nodes are then used to propagate the geodesic influence inside the MASK-labeled pixels.

If a pixel is the neighbor of two or more basins, it is considered a watershed pixel. If a pixel can be reached only from nodes with the same label, it is added to the corresponding basin. Finally, the pixels that still have the MASK value are grouped into a set of new minima at level h, whose connected components each receive a new label.

The time complexity of this algorithm is linear in the number of pixels of the input image.

C. Oversegmentation Phenomenon

This paper addresses the oversegmentation problem that usually appears when images are segmented with the watershed technique. An example of oversegmentation can be seen in Figure 4 for the peppers image [8]. The original image is on the left (a), while the oversegmented image is on the right (b).

(a) (b)

Figure 4. Watershed transformation: the original image (left) and the resulting image after applying the watershed transformation (right)

The main goal of this paper is to reduce this phenomenon by using a unique combination of methods aimed at reducing the number of basins.
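Since the watershed algorithm associates one catchment basin with each minimum of the image, counting the regional minima gives a rough estimate of how many basins a flooding-based watershed would produce. The sketch below is an illustration under stated assumptions, not the paper's code: the function name is hypothetical, the image is a list of lists of grey values, and pixels are 4-connected. A plateau of equal-valued pixels counts as one minimum if none of its neighbours is lower.

```python
from collections import deque

def count_regional_minima(img):
    """Count the connected plateaus that have no strictly lower 4-neighbour."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            level = img[sy][sx]
            is_minimum = True
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            # Breadth-first search over the plateau containing (sy, sx)
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w:
                        if img[ny][nx] < level:
                            is_minimum = False  # a lower neighbour exists
                        elif img[ny][nx] == level and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if is_minimum:
                count += 1
    return count

noisy = [[0, 9, 0],
         [9, 9, 9],
         [0, 9, 0]]
smooth = [[5, 5, 5],
          [5, 1, 5],
          [5, 5, 5]]
print(count_regional_minima(noisy), count_regional_minima(smooth))
```

In the toy `noisy` image every isolated dark pixel is its own minimum, which is the mechanism by which noise leads to many small basins.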

V. EXPERIMENTS

The images were processed as follows. First, the pre-segmentation step with the K-Means algorithm was applied for pixel-based segmentation. Following an extensive series of tests with various values of k, the number of clusters, we determined 5 to be the value that best avoided oversegmentation. We then generated 5 random pixels as the initial cluster centroids.

Each pixel of the input image was assigned to the cluster whose center (also called centroid) was nearest. Values in the output image represent the number of the cluster to which the original pixel was assigned. Each cluster is defined by its centroid in n-dimensional space.

A disadvantage of K-Means is that it does not yield the same result on each run, since the resulting clusters depend on the initial random assignments. For the purpose of this paper, we needed the K-Means algorithm to produce the same result on repeated runs, so that the overall result of our combination of methods is reproducible. To this end, the cluster centroids were determined using a randomization algorithm with a fixed seed. As a result, every time the process starts, the same centroids are generated and the same outcome is obtained from the K-Means phase of the segmentation technique used in this paper.

The output was then processed with the Gaussian blur filter in order to eliminate the noise from the image produced by K-Means.

The image produced by the Gaussian blur filter is then used as the input for the watershed transformation. Because the aforementioned algorithms are applied beforehand, in the order described, the output of the watershed transformation shows greatly reduced oversegmentation, which leads to a better identification of the objects within the image.


A. Database

For the experiments we used a segmentation evaluation database. This database is specially designed to avoid potential ambiguities by only incorporating images that clearly depict one object in the foreground that differs from its surroundings by either intensity, texture, or other low-level cues. The ground truth segmentations were obtained by asking human subjects to manually segment the images into two classes, foreground and background, with each image segmented by three different human subjects. A segmentation is evaluated by assessing its consistency with the ground truth segmentation and its amount of fragmentation [3].

B. Experimental Results

As stated above, we used a segmentation evaluation database that also contains images segmented by human subjects. We applied our combination of methods to the original images and compared the result both with the simple watershed transformation applied to the same original images and with the segmentation done by human subjects. Our combination of methods performed very well and significantly reduced oversegmentation: in over 70 percent of the cases where it was applied, the outcome was almost identical to the segmentation produced by human subjects.

(a) (b) (c) (d)

Figure 5. a) Original image b) Image segmented with our segmentation (79 basins) c) Human-segmented image d) Image with simple watershed (2315 basins)

(a) (b) (c) (d)

Figure 6. a) Original image b) Image segmented with our segmentation (7 basins) c) Human-segmented image d) Image with simple watershed (506 basins)

(a) (b) (c) (d)

Figure 7. a) Original image b) Image segmented with our segmentation (17 basins) c) Human-segmented image d) Image with simple watershed (404 basins)

As can be seen from Figure 5, Figure 6 and Figure 7, the segmentation proposed here performed better than the simple watershed transformation. The important gain is that oversegmentation is greatly reduced: from hundreds and even thousands of basins down to tens of basins or fewer. Thus the segmentation proposed in this paper identifies objects close to those identified by human subjects.
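The size of this gain can be read off the basin counts reported in the captions of Figures 5 to 7. The short calculation below uses only those three pairs of numbers; the helper name `reduction_percent` is hypothetical. It gives a reduction in the number of basins of roughly 96 to 99 percent for the three example images.

```python
# Basin counts from the captions of Figures 5-7:
# (simple watershed, proposed combination of methods)
basin_counts = {
    "Figure 5": (2315, 79),
    "Figure 6": (506, 7),
    "Figure 7": (404, 17),
}

def reduction_percent(before, after):
    """Relative reduction in the number of basins, in percent."""
    return 100.0 * (before - after) / before

for name, (before, after) in basin_counts.items():
    pct = reduction_percent(before, after)
    print(f"{name}: {before} -> {after} basins ({pct:.1f}% fewer)")
```

These figures describe only the three images shown and should not be read as an average over the whole database.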

VI. CONCLUSION AND FUTURE WORK

This paper presented a method to segment images by using a unique combination of image processing techniques.

When the watershed transformation is applied, image noise and unnecessary details typically cause the result to be affected by oversegmentation. For this reason, in this paper we pre-processed the images using first K-Means and then a Gaussian blur filter.

The results obtained on the segmentation evaluation database were compared with those of the simple watershed transformation and with the segmentation done by human subjects on the same images. Our segmentation greatly reduced oversegmentation, and the image objects were identified with a 70 percent success rate.

As future work, our goal is to identify the catchment basins that result from the watershed transformation and to label them. With the basins labeled, we can then research a method that uses them to detect, extract and analyze blobs and to attach semantic labels that will later be used in a multimedia search system.

REFERENCES

[1] L. Vincent and P. Soille, Watersheds in digital spaces: An efficient algorithm based on immersion simulations, IEEE PAMI, 1991, pp. 583-598.

[2] L. Najman and M. Couprie, Watershed algorithms and contrast preservation, Lecture Notes in Computer Science, 2003, pp. 64-65.

[3] S. Alpert, M. Galun, R. Basri, and A. Brandt, Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2007, pp. 6-8.

[4] C. Lantuejoul and S. Beucher, Geodesic distance and image analysis, Mikroskopie, vol. 37, 1980, pp. 138-142.

[5] J. B. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1967, pp. 281-297.

[6] J. A. Hartigan and M. A. Wong, Algorithm AS 136: A K-Means Clustering Algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics), JSTOR, vol. 28, 1979, pp. 100-108.


[7] L. G. Shapiro and G. C. Stockman, Computer Vision, Prentice Hall, 2001, pp. 137-150.

[8] ImageProcessingPlace.com http://www.imageprocessingplace.com/root files V3/image databases.htm 01.04.2010.
