[IEEE 2010 Second International Conferences on Advances in Multimedia - Athens, Greece...
Image Segmentation with Clustering K-Means and Watershed Transform
Adelina-Iulia Sarpe
University of Craiova,
Faculty of Automation, Computers and Electronics
Craiova, Romania
Email: [email protected]
Abstract-Image segmentation is a very important process for multimedia applications. Multimedia databases use segmentation for the storage and indexing of images. This paper presents a way to segment images by applying both a clustering method and the watershed transformation. It is well known that the major drawback of the watershed transformation is the oversegmentation it produces. For this reason the image is first segmented with the K-Means clustering method. Another well-known fact is that the output image of the K-Means algorithm contains a lot of noise. This is why the image is then filtered with a Gaussian blur filter. Finally, the watershed transformation is applied. Test results obtained on the images from a segmentation evaluation database show that this particular combination of methods greatly reduces oversegmentation.
Keywords-K-Means clustering; image segmentation; watershed transformation.
I. INTRODUCTION
Image segmentation is the process of partitioning an
image into multiple segments. The goal is to simplify and/or
change the representation of an image into something that is
more meaningful and easier to analyze. The process consists
of labeling all pixels in a given image based on similar
characteristics, resulting in groups of pixels (also referred
to as clusters). The ultimate purpose is to identify
objects within images.
This paper presents a combination of methods with the
purpose of reducing the oversegmentation resulting from
applying the watershed transformation method, thus allowing
for a better identification of objects.
The methods used are (i) K-Means clustering, for a
presegmentation of the image, (ii) a Gaussian blur filter, for
reducing the image noise resulting from the K-Means
clustering method and also removing unnecessary
details, and (iii) the watershed transformation. This
combination of methods is necessary because the
watershed transformation, if applied to images with noise,
produces a lot of oversegmentation in its output, rendering
the end image useless.
This paper is organized as follows: Section II describes the
clustering technique, outlining the K-Means algorithm.
Section III presents the filtering process, with more details about
the Gaussian blur filter. Section IV presents the watershed
transformation. Section V describes the experiments
and the results obtained, together with a comparison between
these results. Finally, Section VI presents our conclusions.
II. CLUSTERING TECHNIQUE
Clustering is a grouping technique that uses a similarity
measure based on which similar items are placed together in
the same group and dissimilar items are placed in different
groups. The resulting groups are referred to as clusters and
the similarity measure by which they were generated is in
fact known as a distance measure.
This technique is considered to be the most important
unsupervised learning technique. It is widely used in the field
of computer vision and image processing and, as a result,
has found application in a vast array of domains such as
marketing, biology, libraries, and medical imaging.
A. K-Means Algorithm
In this paper, we use the K-Means clustering algorithm
developed by MacQueen (1967) [5] and then refined by
Hartigan and Wong in 1979 [6]. This algorithm classifies
or groups objects, based on their attributes/features, into
K groups, where K is a positive integer.
For the purpose of this paper we consider the objects to be
the input image pixels and their features to be their grey-level
values.
The algorithm aims at minimizing an objective function,
in this case a squared error function. The objective function:
$$J = \sum_{j=1}^{K} \sum_{i=1}^{N} \left\| x_i^{(j)} - c_j \right\|^2 \tag{1}$$
where:
• $\left\| x_i^{(j)} - c_j \right\|^2$ is a chosen distance measure between the
data point $x_i^{(j)}$ and the cluster centroid $c_j$.
In this paper the distance measure chosen is the Euclidean
distance:
$$d(x_i, x_j) = \left( \sum_{k=1}^{N} (x_{ik} - x_{jk})^2 \right)^{1/2} \tag{2}$$
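As a small illustration of Eqs. (1) and (2), the following Python sketch computes the standard Euclidean distance (square root of the summed squared differences) and the squared-error objective for a given assignment of points to centroids. The function names `euclidean` and `objective` are hypothetical and chosen only for this example; points are represented as tuples so that grey-level pixels are simply 1-tuples.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ak - bk) ** 2 for ak, bk in zip(a, b)))

def objective(points, labels, centroids):
    """Squared-error objective J: sum of squared distances to the assigned centroid."""
    return sum(euclidean(p, centroids[l]) ** 2 for p, l in zip(points, labels))

# Three grey-level pixels, two clusters (illustrative values only).
points = [(10,), (12,), (200,)]
labels = [0, 0, 1]
centroids = [(11,), (200,)]
print(objective(points, labels, centroids))
```

A lower value of J means that the pixels lie closer to the centroids of the clusters they were assigned to.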
The algorithm has the following steps:
1) We choose the number of clusters, K;
2) We then randomly choose K pixels representing the
initial group centroids;
2010 Second International Conferences on Advances in Multimedia
978-0-7695-4068-9/10 $26.00 © 2010 IEEE
DOI 10.1109/MMEDIA.2010.31
3) We assign each pixel to the group that has the closest
centroid;
4) When all pixels have been assigned, we recalculate
the positions of the K centroids;
5) Repeat Steps 3 and 4 until the centroids no longer
move. This produces a separation of the pixels into
groups from which the metric to be minimized can be
calculated.
To eliminate a drawback of K-Means, namely that it does not
yield the same result every time it is run, we used a seed-based
randomization algorithm, so that every time the K-Means
algorithm starts, the same centroids are generated.
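The steps above, together with the seeded initialization, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `kmeans_grey` is hypothetical, pixels are given as a flat list of grey levels, and the initial centroids are drawn from the distinct grey values (an assumption made here so that no two initial centroids coincide). Reproducibility comes from `random.Random(seed)`.

```python
import random

def kmeans_grey(pixels, k, seed=0, max_iter=100):
    """Cluster grey levels into k groups; returns (centroids, labels).

    Assumes the image has at least k distinct grey levels.
    """
    rng = random.Random(seed)  # fixed seed: same initial centroids on every run
    centroids = [float(c) for c in rng.sample(sorted(set(pixels)), k)]
    labels = [0] * len(pixels)
    for _ in range(max_iter):
        # Assignment step: each pixel goes to the nearest centroid.
        labels = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
                  for p in pixels]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(pixels, labels) if l == j]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(sum(members) / len(members) if members
                                 else centroids[j])
        if new_centroids == centroids:  # centroids no longer move
            break
        centroids = new_centroids
    return centroids, labels

pixels = [10, 12, 11, 200, 205, 198]
print(kmeans_grey(pixels, k=2, seed=42))
```

Calling the function twice with the same seed returns identical centroids and labels, which is the property the seeded randomization is meant to provide.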
Below, we can see an example of the K-Means clustering
algorithm applied to the peppers image [8]. The original
image is on the left (a), while the resulting image, segmented
with K-Means using K=5, is on the right (b).
Figure 1. K-Means clustering: (a) the original image (left) and (b) the segmented image (right)
III. IMAGE FILTERING
In order to have better results in the process of identifying
objects in images, in most cases the input images must be
preprocessed in order to remove noise and enhance contrast.
These requirements also apply in our case, due to the well
known fact that the K-Means method generates a lot of noise
in the resulting image. As a result, in this paper, we use a
Gaussian blur filter [7] to remove undesired noise from the
images.
A. Gaussian Blur
The Gaussian blur filter is a low-pass filter that attenuates
high-frequency signals. It removes noise and unnecessary
details from images by using a Gaussian function to compute
a transformation that is applied to each pixel in the
image. In 2D, the Gaussian distribution has the
following formula [7]:
$$G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{3}$$
where x is the distance from the origin in the horizontal
axis, y is the distance from the origin in the vertical axis,
and σ is the standard deviation of the Gaussian distribution.
To filter an image with a Gaussian blur it is enough to filter
it in the horizontal direction with a 1D filter and then apply the
same filter to the result in the vertical direction. Because the
2D Gaussian is separable, the order in which the two passes
are applied is not important.
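The two-pass filtering described above can be sketched in plain Python as follows. This is an illustrative sketch, not the paper's implementation: the function names are hypothetical, the image is a list of rows, the kernel is truncated at a radius of about 3σ, and border pixels are handled by repeating the nearest edge value (both are assumptions made for this example).

```python
import math

def gaussian_kernel_1d(sigma, radius=None):
    """Sampled 1D Gaussian, normalized so the weights sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma + 0.5))  # assumed truncation at ~3 sigma
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[i + radius] * row[min(max(x + i, 0), width - 1)]
                 for i in range(-radius, radius + 1))
             for x in range(width)]
            for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def blur_cols(image, kernel):
    """Vertical pass, expressed as a horizontal pass on the transposed image."""
    return transpose(blur_rows(transpose(image), kernel))

def gaussian_blur(image, sigma):
    kernel = gaussian_kernel_1d(sigma)
    return blur_cols(blur_rows(image, kernel), kernel)

# A single bright pixel is spread over its neighbourhood.
image = [[0, 0, 0, 0, 0],
         [0, 0, 255, 0, 0],
         [0, 0, 0, 0, 0]]
for row in gaussian_blur(image, sigma=1.0):
    print([round(v, 1) for v in row])
```

Swapping the two passes (`blur_rows(blur_cols(...))`) gives the same result up to floating-point rounding.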
Below is an example of the Lena image [8] with salt-and-pepper
noise, filtered with the Gaussian blur filter. The image
with salt-and-pepper noise is on the left (a), while the image
resulting from the Gaussian blur is on the right (b):
Figure 2. Gaussian filter: (a) the original image with salt-and-pepper noise (left) and (b) the resulting image filtered with the Gaussian blur filter (right).
A big advantage of the Gaussian blur filter, for the purpose
of this paper, is that its kernel has no sharp edges and thus
does not introduce ringing into the filtered image.
IV. WATERSHED TRANSFORMATION
The watershed transformation is a segmentation method
from the class of region based methods.
A. Watersheds and Catchment Basins
The terms watershed and catchment basin are well
known in topography: a catchment basin is an extent of land
where water drains downhill into a body of water, such as
a river, lake, reservoir, estuary, wetland, sea or ocean, and the
watersheds are the separation lines between these catchment
basins.
Figure 3. Watersheds
A watershed algorithm builds a partition of the image
space in the following manner: it associates an influence
zone B(M) called catchment basin, to each minimum M of
the image. The set B(M) is connected and contains M; it
then produces a set of watershed lines which separates those
catchment basins into different sets.
In this paper, we used the immersion algorithm [1] since
it is one of the most widely used watershed segmentation
algorithms. It provides an efficient way to extract watershed
lines by simulating the immersion process on the gradient
image.
B. Vincent-Soille Algorithm
If x and y are two points in X ⊂ Z, the geodesic distance
between x and y is the length of the shortest path (if any)
included in X and linking x and y [4].
For any set A and any set B ⊂ A made of several
connected components $B_i$, the geodesic influence zone
$IZ_A(B_i)$ of $B_i$ in A is the locus of the points of A whose
geodesic distance to $B_i$ is strictly smaller than their geodesic
distance to any other component of B. Below is the recursion
defining the watershed transformation [2]:
$$X_{h_{min}+1} = F_{h_{min}+1} = MIN_{h_{min}}, \qquad X_{h+1} = MIN_h \cup IZ_{F_{h+1}}(X_h) \tag{4}$$
where $h_{min}$ is the lowest grey value of F, $IZ_{F_{h+1}}$
is the union of the geodesic influence zones of the connected
components of $X_h$ in $F_{h+1}$, and $MIN_h$ is the union
of the minima of F with grey level equal to h. The watershed
lines are the complement of $X_{h_{max}+1}$, where $h_{max}$ is the
highest grey value of F.
However the Vincent-Soille algorithm does not implement
the above recursion, but uses a FIFO queue to flood the
basins and to build the watershed lines. This algorithm has
two steps:
1) sort the pixels in ascending order of the grey level
value for a direct access to a certain grey level;
2) flood step starting with minima and continuing with
the other levels.
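The sorting step can be illustrated with a simple bucket (counting) sort, which groups pixel coordinates by grey level so that all pixels of a given level are reachable directly. This is a sketch under assumptions: the function name `sort_pixels_by_level` is hypothetical, and the image is taken to be a list of rows of integer grey levels in the range 0 to `levels - 1`.

```python
def sort_pixels_by_level(image, levels=256):
    """Return a list where entry h holds the (row, column) pairs with grey level h."""
    buckets = [[] for _ in range(levels)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            buckets[value].append((y, x))
    return buckets

image = [[2, 0],
         [1, 2]]
buckets = sort_pixels_by_level(image, levels=3)
print(buckets[2])  # all pixels at grey level 2
```

Each pixel is visited once, so this step runs in time proportional to the number of pixels plus the number of grey levels.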
The implementation uses a FIFO queue with the following
operations:
1) add - adds a pixel at the end of the queue;
2) remove - removes the first element of the queue;
3) init - initializes an empty queue;
4) isEmpty - returns true if the queue is empty and false
otherwise.
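The four queue operations listed above map directly onto Python's `collections.deque`. The wrapper class below is only a sketch; the class name `PixelQueue` and the method name `is_empty` are hypothetical, and the init operation is expressed as the constructor.

```python
from collections import deque

class PixelQueue:
    """FIFO queue of pixel coordinates."""

    def __init__(self):          # init: start with an empty queue
        self._items = deque()

    def add(self, pixel):        # add: append at the end of the queue
        self._items.append(pixel)

    def remove(self):            # remove: pop and return the first element
        return self._items.popleft()

    def is_empty(self):          # isEmpty: True when no pixels are queued
        return not self._items

queue = PixelQueue()
queue.add((0, 1))
queue.add((2, 3))
print(queue.remove(), queue.is_empty())
```

Both `append` and `popleft` run in constant time on a `deque`, which keeps the queue handling from dominating the flooding step.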
Using a breadth-first search and repeated flooding, a unique
label is assigned to each minimum and its associated basin.
During the flooding step, a MASK label is assigned to all the
graph nodes with grey level h. The next step is the insertion
into the queue of all the nodes from the previous iteration,
which are then used to propagate the geodesic influence
inside the MASK-labeled pixels.
If a pixel is the neighbor of two or more basins, it is
considered a watershed pixel. If a pixel can be reached
only from nodes with the same label, then it is added to
the corresponding basin. Finally, the pixels which still have
the MASK value are grouped into a set of new minima at
level h, whose connected components each get a new label.
The time complexity of this algorithm is linear with the
number of pixels of the input image.
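To make the flooding idea concrete, here is a deliberately simplified Python sketch. It is not the Vincent-Soille algorithm itself: it omits the geodesic distance bookkeeping of the original, so watershed lines may be thicker or placed slightly asymmetrically on plateaus. It assumes 4-connectivity and an image given as a list of rows of integer grey levels; the names `flood_watershed`, `WSHED` and `UNLABELED` are hypothetical.

```python
from collections import deque

WSHED = 0        # label given to watershed pixels
UNLABELED = -1   # pixel not flooded yet

def neighbours(y, x, height, width):
    """Yield the 4-connected neighbours of (y, x) that lie inside the image."""
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < height and 0 <= nx < width:
            yield ny, nx

def flood_watershed(image):
    height, width = len(image), len(image[0])
    labels = [[UNLABELED] * width for _ in range(height)]
    next_label = 1
    # Step 1: group pixel coordinates by grey level.
    by_level = {}
    for y in range(height):
        for x in range(width):
            by_level.setdefault(image[y][x], []).append((y, x))
    # Step 2: flood the levels in ascending order.
    for level in sorted(by_level):
        pixels = by_level[level]
        # Start from pixels of this level that touch an existing basin.
        queue = deque((y, x) for (y, x) in pixels
                      if any(labels[ny][nx] > 0
                             for ny, nx in neighbours(y, x, height, width)))
        queued = set(queue)
        while queue:
            y, x = queue.popleft()
            basins = {labels[ny][nx]
                      for ny, nx in neighbours(y, x, height, width)
                      if labels[ny][nx] > 0}
            # Exactly one adjacent basin: join it; otherwise mark as watershed.
            labels[y][x] = basins.pop() if len(basins) == 1 else WSHED
            for ny, nx in neighbours(y, x, height, width):
                if image[ny][nx] == level and (ny, nx) not in queued:
                    queued.add((ny, nx))
                    queue.append((ny, nx))
        # Pixels of this level that were not reached form new minima.
        for y, x in pixels:
            if labels[y][x] != UNLABELED:
                continue
            labels[y][x] = next_label
            fill = deque([(y, x)])
            while fill:
                cy, cx = fill.popleft()
                for ny, nx in neighbours(cy, cx, height, width):
                    if image[ny][nx] == level and labels[ny][nx] == UNLABELED:
                        labels[ny][nx] = next_label
                        fill.append((ny, nx))
            next_label += 1
    return labels

# Two minima separated by a plateau: one watershed pixel in the middle.
print(flood_watershed([[0, 1, 1, 1, 0]]))  # [[1, 1, 0, 2, 2]]
```

In the example, the two pixels at level 0 become basins 1 and 2, and the middle plateau pixel, which is adjacent to both basins when it is processed, is labeled as watershed.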
C. Oversegmentation Phenomenon
This paper addresses the oversegmentation problem that
usually appears when images are segmented with the
watershed technique. An example of oversegmentation can
be seen in the peppers image [8] below. On the left is
the original image (a), while on the right is the
oversegmented image (b):
Figure 4. Watershed transformation: (a) the original image (left) and (b) the resulting image after applying the watershed transformation (right)
The main goal of this paper is to reduce this phenomenon
by using a unique combination of methods aimed at
reducing the number of basins.
V. EXPERIMENTS
The images were processed as follows: first the pre-
segmentation step with K-Means algorithm was applied
for pixel-based segmentation. Following an extensive series
of tests for various values of k, the number of clusters,
we determined 5 to be the number that best avoided the
oversegmentation. We then generated 5 random pixels as
cluster centroids.
Each pixel from the input image was assigned to the
cluster whose center (also called centroid) was nearest.
Values in the output image represent the cluster number to
which the original pixel was assigned. Each cluster is defined
by its centroid in n-dimensional space.
A disadvantage of K-Means is that it does not yield the same
result with each run, since the resulting clusters depend
on the initial random assignments. For the purpose of this
paper we needed to ensure the same result on
repeated runs of the K-Means algorithm, and thus the
same overall result of our combination of methods. For this,
cluster centroids were determined using a fixed-seed
randomization algorithm. As a result, every time the process
starts, the same centroids are generated and the same
outcome is obtained from the K-Means phase of the image
segmentation technique used in this paper.
The output was then processed with the Gaussian blur filter
in order to eliminate the noise from the image
resulting from K-Means.
The image produced by the Gaussian blur filter is then
used as the input for the watershed transformation. Because
the aforementioned algorithms are applied beforehand, in the
order described, the output of the watershed transformation
shows greatly reduced oversegmentation,
which leads to a better identification of the objects within
the image.
A. Database
For the experiments we used a segmentation evaluation
database. This database is specially designed to avoid
potential ambiguities by only incorporating images that clearly
depict one object in the foreground that differs from its
surroundings by either intensity, texture, or other low-level
cues. The ground truth segmentations were obtained by
asking human subjects to manually segment the images into
two classes, foreground and background, with each image
segmented by three different human subjects. A segmentation
is evaluated by assessing its consistency with the ground
truth segmentation and its amount of fragmentation [3].
B. Experimental Results
As stated above, we used a segmentation evaluation
database that also contains images segmented by human
subjects. We applied our combination of methods to the
original images and compared the result with the simple
watershed transformation applied to the same original
images, as well as with the segmentation done by human subjects.
Our combination of methods performed very
well and the oversegmentation was significantly reduced.
In over 70 percent of the cases where our combination
was applied, the outcome was almost identical to the one
obtained by human subjects.
Figure 5. (a) Original image; (b) image segmented with our segmentation (79 basins); (c) human-segmented image; (d) image with simple watershed (2315 basins)
Figure 6. (a) Original image; (b) image segmented with our segmentation (7 basins); (c) human-segmented image; (d) image with simple watershed (506 basins)
Figure 7. (a) Original image; (b) image segmented with our segmentation (17 basins); (c) human-segmented image; (d) image with simple watershed (404 basins)
As can be seen from Figure 5, Figure 6 and Figure 7, the
segmentation proposed here performed better than the simple
watershed transformation. The important gain is that
the oversegmentation is highly reduced: from hundreds and
even thousands of basins we managed to reduce it to tens
of basins or fewer. Thus the segmentation proposed in this paper
identifies objects close to those identified by human subjects.
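The size of the reduction can be worked out from the basin counts reported in the captions of Figures 5 to 7. The short Python sketch below computes the percentage of basins removed relative to the simple watershed; the helper name `reduction_percent` is hypothetical, and the counts are copied from the captions.

```python
# (basins with the proposed combination, basins with simple watershed)
basin_counts = {
    "Figure 5": (79, 2315),
    "Figure 6": (7, 506),
    "Figure 7": (17, 404),
}

def reduction_percent(ours, plain):
    """Percentage of basins removed relative to the simple watershed."""
    return 100.0 * (1.0 - ours / plain)

for figure, (ours, plain) in basin_counts.items():
    print(f"{figure}: {reduction_percent(ours, plain):.1f}% fewer basins")
```

For all three figures the number of basins drops by more than 95 percent.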
VI. CONCLUSION AND FUTURE WORK
This paper presented a method to segment images by
using a unique combination of image processing techniques.
When applying the watershed transformation, due to
image noise and unnecessary details, the result is always
affected by oversegmentation. For this reason, in this paper
we pre-processed the images using first K-Means and then
Gaussian blur filter.
Experiments done on the segmentation evaluation
database were compared with the simple watershed transfor-
mation method and with the segmentation done by human
subjects on the same images. Our segmentation has highly
reduced the oversegmentation and the image objects were
identified with a 70 percent success rate.
In future work, our goal is to identify the catchment basins
resulting from the watershed transformation and to label
them. Having the basins labeled, we can then research a
method to use them to detect, extract and analyze blobs and
attach semantic labels that will later be used in a multimedia
search system.
REFERENCES
[1] L. Vincent and P. Soille, Watersheds in digital spaces: An efficient algorithm based on immersion simulations, IEEE PAMI, 1991, pp. 583-598.
[2] L. Najman and M. Couprie, Watershed algorithms and contrast preservation, Lecture Notes in Computer Science, 2003, pp. 64-65.
[3] S. Alpert, M. Galun, R. Basri, and A. Brandt, Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2007, pp. 6-8.
[4] C. Lantuejoul and S. Beucher, Geodesic distance and image analysis, Mikroskopie, vol. 37, 1980, pp. 138-142.
[5] J. B. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1967, pp. 281-297.
[6] J. A. Hartigan and M. A. Wong, Algorithm AS 136: A K-Means Clustering Algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics), JSTOR, vol. 28, 1979, pp. 100-108.
[7] L. G. Shapiro and G. C. Stockman, Computer Vision, Prentice Hall, 2001, pp. 137-150.
[8] ImageProcessingPlace.com http://www.imageprocessingplace.com/root files V3/image databases.htm 01.04.2010.