
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 40, NO. 4, APRIL 1992

An Adaptive Clustering Algorithm for Image Segmentation

Thrasyvoulos N. Pappas

Abstract: The problem of segmenting images of objects with smooth surfaces is considered. The algorithm we present is a generalization of the K-means clustering algorithm to include spatial constraints and to account for local intensity variations in the image. Spatial constraints are included by the use of a Gibbs random field model. Local intensity variations are accounted for in an iterative procedure involving averaging over a sliding window whose size decreases as the algorithm progresses. Results with an eight-neighbor Gibbs random field model applied to pictures of industrial objects, buildings, aerial photographs, optical characters, and faces, show that the algorithm performs better than the K-means algorithm and its nonadaptive extensions that incorporate spatial constraints by the use of Gibbs random fields. A hierarchical implementation is also presented and results in better performance and faster speed of execution.

The segmented images are caricatures of the originals which preserve the most significant features, while removing unimportant details. They can be used in image recognition and as crude representations of the image. The caricatures are easy to display or print using a few grey levels and can be coded very efficiently. In particular, segmentation of faces results in binary sketches which preserve the main characteristics of the face, so that it is easily recognizable.

I. INTRODUCTION

WE present a technique for segmenting a grey-scale image (typically 256 levels) into regions of uniform or slowly varying intensity. The segmented image consists of very few levels (typically 2-4), each denoting a different region, as shown in Fig. 1.

It is a sketch, or caricature, of the original image which preserves its most significant features, while removing unimportant details. It can thus be the first stage of an image recognition system. However, assuming that the segmented image retains the intelligibility of the original, it can also be used as a crude representation of the image. The caricature has the advantage that it is easy to display or print with very few grey levels. The number of levels is crucial for special display media like paper, cloth, and binary computer screens. Also, the caricature can be coded very efficiently, since we only have to code the transitions between a few grey levels.

We develop an algorithm that separates the pixels in the image into clusters based on both their intensity and their relative location.

Manuscript received April 5, 1990; revised February 18, 1991. The author is with the Signal Processing Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974. IEEE Log Number 9106031.

Fig. 1. Grey-scale image and its four-level segmentation.

The intensity of each region is assumed to be a slowly varying function plus noise. Thus we assume that we have images of objects with smooth surfaces, no texture. We make use of the spatial information by assuming that the distribution of regions is described by a Gibbs random field. The parameters of the Gibbs random field model the size and the shape of the regions.

A well-known clustering procedure is the K-means algorithm [1], [2]. When applied to image segmentation this approach has two problems: it uses no spatial constraints and assumes that each cluster is characterized by a constant intensity. A number of algorithms that were recently introduced by Geman and Geman [3], Besag [4], Derin and Elliott [5], Lakshmanan and Derin [6], and others, use Gibbs random fields and can be regarded as an extension of this algorithm to include spatial constraints. They all assume, however, that the intensity (or some other parameter) of each region is constant.

The technique we develop can be regarded as a generalization of the K-means clustering algorithm in two respects: it is adaptive and includes spatial constraints. Like the K-means clustering algorithm, our algorithm is iterative. Each region is characterized by a slowly varying intensity function. Given this intensity function, we define the a posteriori probability density function for the distribution of regions given the observed image. The density has two components; one constrains the region intensity to be close to the data and the other imposes spatial continuity. The algorithm alternates between maximizing the a posteriori probability density and estimating the intensity functions. Initially, the intensity functions are constant in each region and equal to the K-means cluster centers. As the algorithm progresses, the intensities are updated by averaging over a sliding window whose size progressively decreases. Thus the algorithm starts with global estimates and slowly adapts to the local characteristics of each region.
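For reference, the grey-level K-means baseline that our algorithm starts from can be sketched as follows; this is a minimal Lloyd-style iteration on intensities alone, with hypothetical names, not the author's code:

```python
# Minimal grey-level K-means sketch: cluster pixels by intensity alone,
# with no spatial constraints. The resulting cluster centers serve as
# the initial, spatially constant intensity functions.
import numpy as np

def kmeans_grey(y, K=4, iters=20):
    levels = np.linspace(float(y.min()), float(y.max()), K)  # initial centers
    for _ in range(iters):
        x = np.abs(y[..., None] - levels).argmin(axis=-1)    # nearest center
        for i in range(K):
            if np.any(x == i):
                levels[i] = y[x == i].mean()                 # update center i
    return x, levels
```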


Our algorithm produces excellent caricatures of a variety of images. They include pictures of industrial objects, buildings, optical characters, faces, and aerial photographs. The binary sketches of faces, in particular, preserve the main characteristics of the face, so that it is easily recognizable. They can thus be used both as representations of the face and as input to a face recognition system.

The performance of our technique is clearly superior to K-means clustering for the class of images we are considering, as we show with several examples. It is also better than the nonadaptive extensions of K-means clustering that incorporate spatial constraints by the use of Gibbs random fields. Our experimental results indicate that the performance of the algorithm is also superior to existing region growing techniques (see [7] for a survey). In the two-level segmentation case our technique compares favorably to other adaptive thresholding techniques that can be found in [8], [9].

We also compare our adaptive clustering algorithm with the edge detector of [10], [11]. Its advantage over the edge detection approach is that it works with regions. In the edge detection case it is often difficult to associate regions with the detected boundaries. In our case regions are always well defined. The comparison becomes particularly interesting when we consider the case of faces. We will show that it is usually easier to recognize a face from a bilevel caricature than from its edges.

We conclude our presentation of the adaptive clustering algorithm by considering a hierarchical multigrid implementation. We construct a pyramid of different resolutions of an image by successively filtering and decimating by two, starting from the highest resolution image. The development relies on understanding the behavior of the algorithm at different levels of the resolution pyramid. The hierarchical implementation results in better performance and faster speed of execution.

The model for the class of images we are considering is presented in Section II. The algorithm is presented in Section III. In Section IV we examine the behavior of the algorithm on a variety of images. In Section V we consider a hierarchical implementation of the algorithm. The paper's conclusions are summarized in Section VI.

II. MODEL

We model the grey-scale image as a collection of regions of uniform or slowly varying intensity. The only sharp transitions in grey level occur at the region boundaries. Let the observed grey-scale image be y. The intensity of a pixel at location s is denoted by y_s, which typically takes values between 0 and 255. A segmentation of the image into regions will be denoted by x, where x_s = i means that the pixel at s belongs to region i. The number of different region types (or classes) is K. In this section we develop a model for the a posteriori probability density function p(x | y). By Bayes' theorem

$$p(x \mid y) \propto p(y \mid x)\, p(x) \qquad (1)$$

where p(x) is the a priori density of the region process and p(y | x) is the conditional density of the observed image given the distribution of regions.

We model the region process by a Markov random field. That is, if N_s is a neighborhood of the pixel at s, then

$$p(x_s \mid x_q,\ \text{all } q \neq s) = p(x_s \mid x_q,\ q \in N_s). \qquad (2)$$

We consider images defined on the Cartesian grid and a neighborhood consisting of the 8 nearest pixels. According to the Hammersley-Clifford theorem [12] the density of x is given by a Gibbs density, which has the following form:

$$p(x) = \frac{1}{Z} \exp\Big(-\sum_{C} V_C(x)\Big) \qquad (3)$$

Here Z is a normalizing constant and the summation is over all cliques C. A clique is a set of points that are neighbors of each other. The clique potentials V_C depend only on the pixels that belong to clique C. Our model assumes that the only nonzero potentials are those that correspond to the one- and two-point cliques shown in Fig. 2. The two-point clique potentials are defined as follows:

$$V_C(x) = \begin{cases} -\beta, & \text{if } x_s = x_q \text{ and } s, q \in C \\ +\beta, & \text{if } x_s \neq x_q \text{ and } s, q \in C. \end{cases} \qquad (4)$$

The parameter β is positive, so that two neighboring pixels are more likely to belong to the same class than to different classes. Increasing the value of β has the effect of increasing the size of the regions and smoothing their boundaries. The one-point clique potentials are defined as follows:

$$V_C(x) = \alpha^i, \quad \text{if } x_s = i \text{ and } s \in C, \text{ all } i. \qquad (5)$$

The lower the value of α^i, the more likely it is that pixel s belongs to class i. Thus, the parameters α^i reflect our a priori knowledge of the relative likelihood of the different region types. In the remainder of this paper we will assume that all region types are equally likely and set α^i = 0 for all i. A more extensive discussion of Gibbs random fields can be found in [3]-[5].

The conditional density p(y | x) is modeled as a white Gaussian process with mean μ^i_s and variance σ².¹ Each region i is characterized by a different μ^i_s, which is a slowly varying function of s. Thus, the intensity of each region is modeled as a signal μ^i_s plus white Gaussian noise with variance σ². The combined probability density then has the form

$$p(x \mid y) \propto \exp\Big\{-\sum_{s} \frac{(y_s - \mu_s^{x_s})^2}{2\sigma^2} - \sum_{C} V_C(x)\Big\}. \qquad (6)$$

¹Note that the superscript i in μ^i and α^i is an index, not an exponent. The few instances in the text where an exponent is used, as in σ², should be obvious from the context.
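To make the model concrete, the following sketch evaluates the energy (the negative log of (6), up to a constant) for a given labeling. It is a minimal illustration under the stated assumptions (α^i = 0, only the two-point potentials of (4), 8-neighbor system); the function name and array layout are ours, not the paper's:

```python
# Sketch of the energy implied by (6): data term plus two-point clique
# potentials of the 8-neighbor Gibbs model (alpha^i = 0 assumed).
# Lower energy corresponds to higher a posteriori probability.
import numpy as np

def posterior_energy(y, x, mu, sigma, beta=0.5):
    # y: (H, W) image; x: (H, W) integer labels; mu: (K, H, W) intensities
    H, W = y.shape
    mu_x = np.take_along_axis(mu, x[None, :, :], axis=0)[0]  # mu_s^{x_s}
    e = np.sum((y - mu_x) ** 2) / (2.0 * sigma ** 2)         # data term
    # Each two-point clique is counted once via 4 offsets: E, S, SE, SW.
    for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        r0, r1 = max(dr, 0), H + min(dr, 0)
        c0, c1 = max(dc, 0), W + min(dc, 0)
        a = x[r0 - dr:r1 - dr, c0 - dc:c1 - dc]              # pixel s
        b = x[r0:r1, c0:c1]                                  # neighbor q
        e += np.where(a == b, -beta, beta).sum()             # V_C of (4)
    return e
```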


Fig. 2. Cliques for Gibbs probability density function.

We observe that the probability density function has two components. One constrains the region intensity to be close to the data. The other imposes spatial continuity. Note that if the intensity functions μ^i_s do not depend on s, then we get a model similar to those used in [3]-[5]. If we also set β = 0, then we get the K-means algorithm. Thus our technique can be regarded as a generalization of the K-means clustering algorithm. The Gibbs random field introduces the spatial constraints, and the pixel dependence of the intensity functions makes it adaptive. We found both of these additions to be essential for the proposed algorithm.

If we assume that the parameter β and the noise variance σ² are known, then we must estimate both the distribution of regions x and the intensity functions μ^i_s. This will be done in the following section. Note that the only constraint we impose on the functions μ^i_s is that they are slowly varying. The estimation procedure we present in the following section uses this fact and implicitly imposes the constraint. The problem with using a parametric model (e.g., polynomial) for the region intensities is that, since the regions and intensity functions must be estimated concurrently, the parametric model becomes sensitive to inaccurate and fragmented region estimates.

Alternative models for the conditional density are possible. For example, Chalmond [13] and Dinten [14] use Besag's autonormal model [4] in the context of image restoration, and Bouman and Liu [15], [16] use an autoregressive model for segmentation of textured images.

We would like to point out that the Gibbs field is a good model of the region process only if we have a good model for the conditional density. The Gibbs model by itself is not very useful. It is easy to see that, in the absence of an observation, the optimal segmentation is just one region. In their discussion of the Ising model, Kindermann and Snell [17] observe that the Gibbs field is ill behaved if there is no strong outside force (observed image with edges, in our case).

III. ALGORITHM

In this section we consider an algorithm for estimating the distribution of regions x and the intensity functions μ^i_s.

Note that the functions μ^i_s are defined on the same grid as the original grey-scale image y and the region distribution x. Like the K-means clustering algorithm, our algorithm is iterative. It alternates between estimating x and the intensity functions μ^i_s.

First, we consider the problem of estimating the local intensity functions. Given the region labels x, we estimate the intensity μ^i_s at each pixel s as follows.

    I" . . _ . _ . _ _ .'

Fig. 3. Local average estimation.

We average the grey levels of all the pixels that belong to region i and are inside a window of width W centered at pixel s, as shown in Fig. 3. When the number of pixels of type i inside the window centered at s is too small, the estimate μ^i_s is not very reliable. At these points the intensity μ^i_s is undefined. When this happens, x_s cannot be assigned to level i. Thus we have to pick the minimum number of pixels T_min that are necessary for estimating μ^i_s. The higher this threshold, the more robust the computation of μ^i_s. On the other hand, any small isolated regions with area less than this threshold may disappear. Thus the presence of this threshold strengthens the spatial constraints. A reasonable choice for the minimum number of pixels necessary for estimating μ^i_s is T_min = W, the window width. This value of the threshold guarantees that long one-pixel-wide regions will be preserved.
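A direct per-pixel version of this estimate can be sketched as follows; the function and the NaN convention for "undefined" are our illustration, assuming the suggested choice T_min = W:

```python
# Sketch of the local intensity estimate of Fig. 3: mu_s^i is the mean
# grey level of the pixels labeled i inside a W x W window centered at
# s, left undefined (NaN) when fewer than T_min such pixels exist.
import numpy as np

def local_intensity(y, x, i, W, T_min=None):
    if T_min is None:
        T_min = W                      # paper's suggested choice T_min = W
    h, (H, Wd) = W // 2, y.shape
    mu = np.full((H, Wd), np.nan)
    mask = (x == i)
    for r in range(H):
        for c in range(Wd):
            r0, r1 = max(r - h, 0), min(r + h + 1, H)
            c0, c1 = max(c - h, 0), min(c + h + 1, Wd)
            m = mask[r0:r1, c0:c1]
            if m.sum() >= T_min:       # enough type-i pixels in the window
                mu[r, c] = y[r0:r1, c0:c1][m].mean()
    return mu
```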

We have to obtain estimates of μ^i_s for all region types i and all pixels s. Clearly, the required computation is enormous, especially for large window sizes. Instead, we compute the estimates μ^i_s only on a grid of points and use bilinear interpolation to obtain the remaining values. The spacing of the grid points depends on the window size. We chose the spacing equal to half the window size in each dimension (50% overlap). Fig. 4 shows a window and the location of the centers of adjacent windows. The functions μ^i_s are smooth and therefore the interpolated values are good approximations of the values we would obtain by an exact calculation. Note that the larger the window, the smoother the functions; hence the spacing of the grid points increases with the window size. Note also that, because of the dependence of the grid spacing on the window size, the amount of computation is independent of window size. Fig. 5 shows an original image, its two-level segmentation, and the local intensity functions for the two levels.
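The grid-plus-interpolation shortcut could look like the following sketch. The names are ours, scipy's linear interpolator on a regular grid stands in for the bilinear step, and grid points where the estimate is undefined (too few type-i pixels) would need extra handling that we omit:

```python
# Sketch: estimate mu^i only at grid points spaced W/2 apart (50%
# window overlap) and bilinearly interpolate to all pixels.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def window_mean(y, mask, r, c, W, T_min):
    h = W // 2
    m = mask[max(r - h, 0):r + h + 1, max(c - h, 0):c + h + 1]
    v = y[max(r - h, 0):r + h + 1, max(c - h, 0):c + h + 1]
    return v[m].mean() if m.sum() >= T_min else np.nan

def interp_intensity(y, x, i, W):
    H, Wd = y.shape
    step = max(W // 2, 1)                        # grid spacing = W / 2
    rows, cols = np.arange(0, H, step), np.arange(0, Wd, step)
    grid = np.array([[window_mean(y, x == i, r, c, W, W)
                      for c in cols] for r in rows])
    f = RegularGridInterpolator((rows, cols), grid, method="linear",
                                bounds_error=False, fill_value=None)
    rr, cc = np.meshgrid(np.arange(H), np.arange(Wd), indexing="ij")
    pts = np.stack([rr.ravel(), cc.ravel()], axis=-1)
    return f(pts).reshape(H, Wd)                 # bilinear fill-in
```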


Fig. 4. Window locations for local average estimation.

Fig. 5. Local intensity functions. (a) Original image. (b) Two-level segmentation. (c) Level 0 intensity function. (d) Level 1 intensity function.

Second, we consider the problem of estimating the distribution of regions. Given the intensity functions μ^i_s, we want to maximize the a posteriori probability density (6) to obtain the MAP estimate of x. Finding the global maximum of this function requires a lot of computation. It can be done using the Gibbs sampler and simulated annealing [3]. Instead, we maximize the conditional density at each point x_s, given the data y and the current x at all


other points:

$$p(x_s \mid y, x_q,\ \text{all } q \neq s) = p(x_s \mid y_s, x_q,\ q \in N_s) \propto \exp\Big\{-\frac{(y_s - \mu_s^{x_s})^2}{2\sigma^2} - \sum_{C:\, s \in C} V_C(x)\Big\} \qquad (7)$$

The equality on the left follows from the Markovian property and the whiteness of the noise. The maximization is done at every point in the image and the cycle is repeated until convergence. This is the iterated conditional modes (ICM) approach proposed by Besag [4], sometimes also called the greedy algorithm [18]. It corresponds to instantaneous freezing in simulated annealing. Bouman and Liu use this greedy algorithm in their multiple resolution approach for image segmentation [15], [16]. The order in which the points are updated within a cycle depends on the computing environment [4]. In our implementation the updating was done in a raster scan. This procedure converges to a local maximum [4]. It stops when the number of pixels that change during a cycle is less than a threshold (typically M/10 for an M × M image). The convergence is very rapid, usually in less than five cycles, consistent with Besag's observations.

Now we consider the overall adaptive clustering algorithm. We obtain an initial estimate of x by the K-means algorithm [2]. Starting from this segmentation, our algorithm alternates between estimating x and estimating the intensity functions μ^i_s. We define an iteration to consist of one update of x and one update of the functions μ^i_s. The window size for the intensity function estimation is kept constant until this procedure converges, usually in less than ten iterations. (The number of iterations could vary with image content.) Our stopping criterion is that the last iteration converges in one cycle. The whole procedure is then repeated with a new window size. The adaptation is achieved by varying the window size W. Initially, the window for estimating the intensity functions is the whole image and thus the intensity functions of each region are constant. As the algorithm progresses, the window size decreases. The reason for this is that in the early stages of the algorithm, the segmentation is crude, and a large window is necessary for robust estimation of the intensity functions. As the algorithm progresses, the segmentation becomes better, and smaller windows give more reliable and accurate estimates. Thus the algorithm, starting from global estimates, slowly adapts to the local characteristics of each region. A detailed flowchart of our algorithm is given in Fig. 6. The algorithm stops when the minimum window size is reached. Typically we keep reducing the window size by a factor of two, until a minimum window size of W = 7 pixels.

If the window size is kept constant and equal to the whole image, then the result is the same as in the approach of [3]-[6]. This nonadaptive clustering algorithm is only an intermediate result of our adaptive clustering algorithm. It results in a segmentation similar to the K-means; only the edges are smoother.

Fig. 6. Adaptive clustering algorithm.
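The per-pixel maximization of (7) in a raster scan can be sketched as follows. The names are ours and compose with the helpers above; classes whose intensity is undefined (NaN) at a pixel are skipped, as required by the threshold T_min:

```python
# Sketch of one raster-scan ICM cycle for (7): at each pixel choose the
# class minimizing the local energy (data term plus the two-point
# clique potentials touching s). Returns the number of changed pixels.
import numpy as np

def icm_cycle(y, x, mu, sigma, beta=0.5):
    H, W = y.shape
    K = mu.shape[0]
    changed = 0
    for r in range(H):
        for c in range(W):
            best_i, best_e = x[r, c], np.inf
            for i in range(K):
                if np.isnan(mu[i, r, c]):
                    continue                     # level i undefined at s
                e = (y[r, c] - mu[i, r, c]) ** 2 / (2.0 * sigma ** 2)
                for dr in (-1, 0, 1):            # 8-neighbor cliques
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (dr or dc) and 0 <= rr < H and 0 <= cc < W:
                            e += -beta if x[rr, cc] == i else beta
                if e < best_e:
                    best_i, best_e = i, e
            if best_i != x[r, c]:
                x[r, c] = best_i                 # greedy in-place update
                changed += 1
    return changed
```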

The strength of the proposed approach lies in the variation of the window size, which allows the intensity functions to adjust to the local characteristics of the image. Note that the estimation procedure imposes the smoothness constraint on the local intensity functions. Also, there is no need to split our regions into sets of connected pixels as is done in [19]. The variable window size takes care of this problem.

We assume that the noise variance σ² is known or can be estimated independently of the algorithm. This is a reasonable assumption in many cases, especially when we have images obtained under similar conditions. From (6) it follows that increasing σ² is equivalent to increasing β. In fact, the parameter β of the Gibbs random field can be chosen so that the region model is weaker in the early stages (algorithm follows the data), and stronger in the later stages (algorithm follows the model) [4], [20]. In our implementations we chose the constant value β = 0.5 for all the iterations. Once the algorithm converges at the smallest window size, β could be increased to obtain smoother segment contours. Thus, given an image to be segmented, the most important parameter to choose is the noise variance. In Section V we show that the performance of the algorithm is reasonable over a wide range of noise variances. In fact, the noise variance controls the amount of detail detected by the algorithm.

We also have to choose the number of different regions K. We found that K = 4 works well for most images. Note that, in contrast to the K-means algorithm and some of the other adaptive schemes [7], [8], the choice of K is not critical, because the adaptation of the intensity functions allows the same region type to have different intensities in different parts of the image. In the following section we discuss the choice of K more extensively.
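Putting the pieces together, the overall loop might read as below. This is a simplified sketch that composes the hypothetical helpers sketched earlier (kmeans_grey, local_intensity, icm_cycle), collapses the paper's two-level convergence test into a single per-window loop, and halves the window size where the paper uses sizes down to W = 7:

```python
# Simplified sketch of the adaptive clustering loop: K-means start,
# then alternate intensity estimation and ICM while shrinking the
# window from the whole image toward the minimum size.
import numpy as np

def adaptive_clustering(y, K=4, sigma=7.0, beta=0.5, W_min=7):
    x, _ = kmeans_grey(y, K)              # initial segmentation
    W = max(y.shape)                      # first window: whole image
    while W >= W_min:
        for _ in range(10):               # usually < 10 iterations
            mu = np.stack([local_intensity(y, x, i, W) for i in range(K)])
            if icm_cycle(y, x, mu, sigma, beta) == 0:
                break                     # converged at this window size
        W //= 2                           # halve the window size
    return x
```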


Fig. 7. Example A. (a) Original image. (b) K-means algorithm (K = 4 levels). (c) Adaptive clustering (K = 4 levels).

The amount of computation for the estimation of x depends on the image dimensions, the number of different windows used, the number of classes and, of course, image content. As we pointed out earlier, the amount of computation for each local intensity calculation does not depend on the window size. We should also point out that both the local intensity calculation and each cycle of the probability maximization can be done in parallel. Thus, even though the algorithm is very slow on serial machines, it can be considerably faster on a parallel computer.

IV. EXAMPLES

In this section we examine the performance of the algorithm on a variety of images. We discuss the choice of the number of regions K, and compare adaptive clustering to K-means clustering and edge detection.

An original image is shown in Fig. 7(a); the resolution is 256 × 256 pixels and the grey-scale range is 0 to 255 (8 bits).

White Gaussian noise with σ = 7 grey levels has been added to this image. Fig. 7(b) shows the result of the K-means algorithm for K = 4 levels. The two problems we mentioned earlier are apparent. There are a lot of isolated pixels, and areas of uniform intensity (e.g., walls on the lower right) are not well separated. Fig. 7(c) shows the result of our adaptive algorithm for K = 4 levels. The superior performance of the adaptive algorithm is apparent. Both problems have been eliminated. We can understand the limitations of the K-means algorithm when we look at a scan line of the image. The dotted line in Fig. 8(a) shows a horizontal scan line in the bottom half of the image of Fig. 7(a), and the solid lines show the four characteristic levels selected by the K-means algorithm. The solid line of Fig. 8(b) shows the resulting segmentation. We observe that while μ⁰ and μ¹ are reasonable models of the local image intensity, μ² and μ³ are not. This is because the selection of the four levels is based on the histogram of the whole image and does not


Fig. 8. K-means algorithm. (a) Image profile and characteristic levels. (b) Image profile and model.

necessarily match the local characteristics. Fig. 9(a) shows the intensity functions of the adaptive algorithm and Fig. 9(b) shows the corresponding segmentation. The intensity functions have indeed adapted to the local intensities of the image.

Another original image is shown in Fig. 10(a); the resolution is 244 × 256 pixels and the grey-scale range is 0 to 255. The standard deviation of the noise was estimated to be σ = 7 grey levels. Fig. 10(b) shows the result of the K-means algorithm. Fig. 10(c) shows the result of our adaptive algorithm. Again we chose K = 4. The problems we mentioned above are again evident in the K-means result. The adaptive algorithm gives a much better segmentation, even though some problems are still present (e.g., the rings of the wrench in the lower right are not well defined). Overall, we could say that the adaptive algorithm produces a simple representation of the image, while retaining most of the important information.

One of the important parameters in the adaptive clustering algorithm is the number of classes K. In most clustering algorithms the choice of the number of classes is important and could have a significant effect on the quality of segmentation.

Fig. 9. Adaptive clustering algorithm. (a) Image profile and local intensity functions (15 × 15 window). (b) Image profile and model (15 × 15 window).

Thus, many authors consider estimating the number of classes as part of their segmentation algorithms, for example [21], [16]. In the case of the K-means algorithm, choosing the wrong number of classes can be disastrous, as we will see in the following examples. However, our adaptive algorithm is quite robust to the choice of K. This is because the characteristic levels of each class adapt to the local characteristics of the image, and thus regions of entirely different intensities can belong to the same class, as long as they are separated in space.

Fig. 11(a) shows the result of the K-means algorithm for K = 2. Because of the wide variation of the grey-scale intensity throughout the image, a lot of detail has been lost in the K-means case. Fig. 11(b) shows the result of our adaptive algorithm for K = 2. Here most of the detail in the image has been recovered.² We can argue that the segmentation is not perfect; indeed, some edges are missing (e.g., the sky in the upper right) and some spurious edges are present (e.g., around the windows in the lower left). Clearly, K = 4 results in a better segmentation.

²Note that the thin oblique lines of Figs. 11(a) and 7(b) have disappeared in Figs. 11(b) and 7(c), respectively. This is caused by the combination of the Gibbs constraints and the threshold T_min defined in Section III. The only one-pixel-wide features preserved are those of high contrast.


Fig. 10. Example B. (a) Original image. (b) K-means algorithm (K = 4 levels). (c) Adaptive clustering (K = 4 levels).

Fig. 11. Example A. (a) K-means algorithm (K = 2 levels). (b) Adaptive clustering (K = 2 levels).

In many images K = 4 is the best choice. This appears to be related to the four-color theorem [22], even though a proof would be very difficult. However, for K = 2 the adaptive algorithm results in a very good representation of the image, even though the individual segments don't always correspond to actual edges in the image. The value of K = 2 is especially important because it simplifies both the display (e.g., on paper or on binary computer screens) and coding of the image.

As another example, consider the original image shown in Fig. 12(a); the resolution is 512 × 512 pixels and the grey-scale range is 0 to 255. The standard deviation of the noise was assumed to be σ = 4 grey levels. Fig. 12(b) shows the result of the K-means algorithm for K = 2. Fig. 12(c) shows the result of our adaptive algorithm. The advantage of the adaptive algorithm is apparent. For example, an airstrip at the lower left of the image has disappeared in the K-means case, but is nicely preserved by the adaptive algorithm.


Fig. 12. Example C. (a) Original image. (b) K-means algorithm (K = 2 levels). (c) Adaptive clustering (K = 2 levels).

In some cases K is constrained by the statement of the problem. For example, consider the image of Fig. 13(a). Here K = 2, since each point of the image either belongs to the character or to the background. The resolution is 64 × 64 pixels and the grey-scale range is 0 to 255. This image was obtained by blurring the optical character "E" with a Gaussian filter (the standard deviation is 2.4 pixels) and adding white Gaussian noise with σ = 10 grey levels. Fig. 13(b) shows the result of the K-means algorithm. Fig. 13(c) shows the result of our adaptive algorithm. We observe that the adaptive algorithm recovers the hole in the center of the character. This could be crucial in the correct interpretation of the character. Thus our algorithm could be an important component of an optical character recognition system.

The performance of the two-level adaptive algorithm is particularly good on grey-scale images of human faces. Consider the original image shown in Fig. 14(a); the resolution is 512 × 512 pixels and the grey-scale range is 0 to 255.

The standard deviation of the noise was assumed to be σ = 7 grey levels. Fig. 14(b) shows the result of the K-means algorithm and Fig. 14(c) shows the result of our adaptive algorithm. The K-means algorithm produces an image that is too light and, therefore, barely gets some of the facial characteristics. The adaptive algorithm, on the other hand, gives a very good sketch of the face. Note that the nose is implied by a couple of black regions at its base, much like an artist might sketch it with a few brush strokes. In general, the K-means algorithm produces a binary image that may be too light or too dark, depending on the overall image content. The proposed algorithm adapts to the local characteristics and preserves the features of the face.

The binary sketches preserve the characteristics of the human faces, as we have observed over several examples obtained under different conditions (see Section V for more examples). As we discussed, the sketches can be used for displaying or printing on special media like paper, cloth, and binary computer screens.


Fig. 13. Example D. (a) Original image. (b) K-means algorithm (K = 2 levels). (c) Adaptive clustering (K = 2 levels).

Fig. 14. Example E. (a) Original image. (b) K-means algorithm (K = 2 levels). (c) Adaptive clustering (K = 2 levels). (d) Edge detection.


They can be coded with very few bits of data, since we only have to code the transitions between a few grey levels. In [23] we show that coding of typical face caricatures requires less than 0.1 bit/pixel, which is substantially lower than what waveform coding techniques require.

The dual of image segmentation is edge detection. It is thus interesting to compare our segmentation results with an edge detector. Here we use the edge detector of Boie and Cox [10], [11]. Fig. 14(d) shows the result of the edge detector with integrated line detection; their algorithm includes an automatic threshold selection. The algorithm is very successful in detecting the most important edges. We observe, however, that it is not as easy to recognize the person from the edges as it is from the binary sketches. This demonstrates the advantage of representing an image as a set of black and white regions, as opposed to a set of edges. Note the basic tradeoff between region segmentation and edge detection. As we mentioned, not all region transitions correspond to edges in the image. On the other hand, the detected edges do not necessarily define closed regions.

As an indication of the amount of computation required by the adaptive clustering algorithm, we look at the running time on a SUN SPARCstation 1+. The computation time for the four-level segmentation of Fig. 7(c) was 11 min of user time. The K-means clustering of Fig. 7(b) required only 10 s of user time. The computation time for the two-level segmentation of Fig. 11(b) was 7.6 min of user time, and for the K-means clustering of Fig. 11(a), 4 s of user time. The resolution of these images is 256 × 256 pixels. When the resolution increases to 512 × 512 pixels, as in Fig. 12, the computation times become 35 min for the adaptive algorithm and 13 s for the K-means algorithm. In our implementations we made no systematic attempt to make the code more efficient, and the choice of termination criteria was conservative.

V. HIERARCHICAL MULTIGRID IMPLEMENTATION

In this section we present a hierarchical implementation of our segmentation algorithm. The goal is to improve both its performance and speed of execution. The development of such a hierarchical approach relies on understanding the behavior of the algorithm at different levels of the resolution pyramid. In particular, we would like to know how the various parameters of the algorithm change with the resolution of the image.

We construct a pyramid of images at different resolutions in the following way. Starting from the highest resolution image, we obtain the next image in the pyramid by filtering and decimating by a factor of two.

The low-pass decimation filters we use have a sharp cutoff so that the sharpness of the edges is preserved at the lower resolutions.

The image model we described in Section II can be written as follows:

$$y_s = I_s + n_s \qquad (8)$$

where I_s is the piecewise continuous signal and n_s is white Gaussian noise with standard deviation σ. The signal at the next resolution has the same form

$$y'_s = I'_s + n'_s \qquad (9)$$

where I'_s, the filtered and decimated I_s, is approximately piecewise continuous and n'_s, the filtered and decimated n_s, is approximately white Gaussian with standard deviation σ' = σ/2. Thus, the model is still valid at the next resolution and only the noise standard deviation σ changes by a factor of two.

Based on our previous discussion of the role of the parameter β, it is reasonable to assume that it does not change with resolution. Bouman and Liu [15], [16] adopt a similar approach in their multiple resolution segmentation of textured images. An interesting alternative approach for the choice of the Gibbs parameters is proposed by Gidas [24], who obtains an approximation of β at different resolutions based on a probabilistic model for the whole pyramid. Given our choice of β and σ, the only remaining parameter is the detection threshold T_min. We assume that T_min = W, the window width, is a reasonable choice for the detection threshold at the highest level. As we lower the resolution we expect that significant features become smaller and, therefore, we should reduce the detection threshold T_min. We decrease T_min by a factor of two from one resolution to the next.

The hierarchical implementation of the algorithm is described in Fig. 15. We start at the lowest resolution image, and apply our segmentation algorithm as described in the previous sections, starting from the K-means clusters. When the minimum window size is reached, we move to the next level in the pyramid and use the current segmentation, expanded by a factor of two, as a starting point. Since we now have a good starting point, we need not iterate through the whole sequence of windows. We start where we left off at the previous resolution, which is twice the minimum window size of that resolution. This approach reduces the amount of computation significantly, since most iterations are at the lowest level of resolution. The improvement in performance is also significant, as the following example indicates. In our implementations we used a four-level pyramid.

An original image is shown in Fig. 16(a); the resolution is 512 × 512 pixels. We assume that the noise level of this image is σ = 8 grey levels. Fig. 16(b) shows the result of the adaptive algorithm for K = 2 levels, applied to the full-scale image. Fig. 16(c) shows the result of the hierarchical implementation of the two-level adaptive algorithm. Both result in very good segmentations. We can argue, however, that the hierarchical segmentation has better adjusted to the local image characteristics, as can be seen by the difference in the lips and the area above the hat.

As an indication of the potential computational savings of the hierarchical implementation we look at the running times of the two algorithms on a SUN SPARCstation 1+. The full-scale implementation of Fig. 16(b) required 36 min of user time, while the hierarchical implementation of Fig. 16(c) required only 8.2 min of user time. The first (lowest resolution) stage required 10 s of user time, the second 16 s, the third 118 s, and the last 348 s. Again, no systematic attempt was made in our implementations to improve the efficiency of the code.
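The coarse-to-fine control flow can be sketched as follows. The simple box decimation stands in for the paper's sharp-cutoff low-pass filter, adaptive_clustering is the sketch from Section III, and the bookkeeping for odd image dimensions and for resuming the window schedule at finer levels is glossed over:

```python
# Sketch of the hierarchical implementation: build a four-level
# pyramid by filtering and decimating by two, segment the coarsest
# level from scratch, then seed each finer level with the coarse
# segmentation expanded by two (sigma halves with each decimation).
import numpy as np

def decimate2(y):
    H, W = (y.shape[0] // 2) * 2, (y.shape[1] // 2) * 2
    y = y[:H, :W]
    return 0.25 * (y[0::2, 0::2] + y[0::2, 1::2] +
                   y[1::2, 0::2] + y[1::2, 1::2])

def hierarchical(y, K=2, sigma=8.0, levels=4):
    pyr = [y]
    for _ in range(levels - 1):
        pyr.append(decimate2(pyr[-1]))    # coarser image, noise ~ sigma/2
    x = None
    for lvl in reversed(range(levels)):
        s = sigma / 2 ** lvl              # noise level at this resolution
        if x is None:
            x = adaptive_clustering(pyr[lvl], K, s)       # coarsest level
        else:
            x = np.kron(x, np.ones((2, 2), dtype=int))    # expand by two
            x = x[:pyr[lvl].shape[0], :pyr[lvl].shape[1]]
            # ...then resume the window schedule from this start (omitted)
    return x
```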


Fig. 15. Hierarchical adaptive clustering algorithm.

We now look at the performance of the algorithm for different noise-level estimates. An original image is shown in Fig. 17(a); the resolution is 512 × 512 pixels. Figs. 17(b)-(e) show the result of the hierarchical implementation of the adaptive algorithm (K = 2) for noise-level estimates σ = 4, 8, 16, and 32 grey levels, respectively. The original image is the same in all cases. Observe how the amount of detail detected by the algorithm decreases as the noise variance increases. This example illustrates that the algorithm's performance is reasonable over a wide range of noise variances and, moreover, that we can control the amount of detail by choosing the noise variance. Thus, our knowledge of the level of noise determines the amount of detail in the segmented image.

VI. CONCLUSION

We presented an algorithm for segmenting images of objects with smooth surfaces. We used Gibbs random fields to model the region process. The parameters of the Gibbs random field model the size and the shape of the regions. The intensity of each region is modeled as a slowly varying function plus white Gaussian noise. The adaptive technique we developed can be regarded as a generalization of the K-means clustering algorithm. We applied our algorithm to pictures of industrial objects, buildings, aerial photographs, faces, and optical characters.

Fig. 16. Example F. (a) Original image. (b) Adaptive clustering (K = 2). (c) Hierarchical adaptive clustering (K = 2).

The segmented images are useful for image display,


Fig. 17. Example G: Hierarchical adaptive clustering (K = 2). (a) Original image. (b) σ = 4. (c) σ = 8. (d) σ = 16. (e) σ = 32.


coding, and recognition. We believe that our approach can be extended to the segmentation of images that contain regions with slowly varying textures, as well as to moving images.

ACKNOWLEDGMENT

The author would like to thank I. J. Cox for providing the code for the Boie-Cox edge detector.

REFERENCES

[1] J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles. Reading, MA: Addison-Wesley, 1974.
[2] R. M. Gray and Y. Linde, "Vector quantizers and predictive quantizers for Gauss-Markov sources," IEEE Trans. Commun., vol. COM-30, no. 2, pp. 381-389, Feb. 1982.
[3] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-6, no. 6, pp. 721-741, Nov. 1984.
[4] J. Besag, "On the statistical analysis of dirty pictures," J. Roy. Stat. Soc. B, vol. 48, no. 3, pp. 259-302, 1986.
[5] H. Derin and H. Elliott, "Modeling and segmentation of noisy and textured images using Gibbs random fields," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-9, no. 1, pp. 39-55, Jan. 1987.
[6] S. Lakshmanan and H. Derin, "Simultaneous parameter estimation and segmentation of Gibbs random fields using simulated annealing," IEEE Trans. Patt. Anal. Machine Intell., vol. 11, no. 8, pp. 799-813, Aug. 1989.
[7] R. M. Haralick and L. G. Shapiro, "Image segmentation techniques," Comput. Vision, Graph., Image Processing, vol. 29, pp. 100-132, 1985.
[8] P. K. Sahoo, S. Soltani, and A. K. C. Wong, "A survey of thresholding techniques," Comput. Vision, Graph., Image Processing, vol. 41, pp. 233-260, 1988.
[9] J. S. Weszka, "A survey of threshold selection techniques," Comput. Graph. Image Processing, vol. 7, pp. 259-265, 1978.
[10] R. A. Boie and I. J. Cox, "Two-dimensional optimum edge recognition using matched and Wiener filters for machine vision," in Proc. IEEE First Int. Conf. Comput. Vision (London, England), June 8-11, 1987, pp. 450-456.
[11] I. J. Cox, R. A. Boie, and D. A. Wallach, "Line recognition," in Proc. Tenth Int. Conf. Patt. Recogn. (Atlantic City, NJ), June 16-21, 1990, pp. 639-645.
[12] J. Besag, "Spatial interaction and the statistical analysis of lattice systems," J. Roy. Stat. Soc. B, vol. 36, no. 2, pp. 192-236, 1974.
[13] B. Chalmond, "Image restoration using an estimated Markov model," Signal Processing, vol. 15, no. 2, pp. 115-129, Sept. 1988.
[14] J. M. Dinten, "Tomographic reconstruction of axially symmetric objects: Regularization by a Markovian modelization," presented at the 10th Int. Congr. Patt. Recogn., 1990.
[15] C. Bouman and B. Liu, "Segmentation of textured images using a multiple resolution approach," in Proc. ICASSP-88 (New York), Apr. 1988, pp. 1124-1127.
[16] C. Bouman and B. Liu, "Multiple resolution segmentation of textured images," IEEE Trans. Patt. Anal. Machine Intell., vol. 13, no. 2, pp. 99-113, Feb. 1991.
[17] R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications. Providence, RI: American Mathematical Society, 1980.
[18] H. Derin, "Experimentations with simulated annealing in allocation optimization problems," in Proc. CISS-86, pp. 165-171.
[19] T. Aach, U. Franke, and R. Mester, "Top-down image segmentation using object detection and contour relaxation," in Proc. ICASSP-89 (Glasgow, U.K.), May 1989, pp. 1703-1706.
[20] G. Wolberg and T. Pavlidis, "Restoration of binary images using stochastic relaxation with annealing," Patt. Recogn. Lett., vol. 3, no. 6, pp. 375-387, Dec. 1985.
[21] J. Zhang and J. W. Modestino, "A model-fitting approach to cluster validation with application to stochastic model-based image segmentation," IEEE Trans. Patt. Anal. Machine Intell., vol. 12, no. 10, pp. 1009-1017, Oct. 1990.
[22] K. Appel and W. Haken, "Every planar map is four colorable: Parts I and II," Ill. J. Math., vol. 21, no. 3, pp. 429-567, Sept. 1977.
[23] T. N. Pappas, "Adaptive thresholding and sketch-coding of grey level images," Proc. SPIE Int. Soc. Opt. Eng., vol. 1199, Nov. 1989.
[24] B. Gidas, "A renormalization group approach to image processing problems," IEEE Trans. Patt. Anal. Machine Intell., vol. 11, no. 2, pp. 164-180, Feb. 1989.

Thrasyvoulos N. Pappas received the S.B., S.M., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, in 1979, 1982, and 1987, respectively. He is currently with the Signal Processing Research Department at AT&T Bell Laboratories, Murray Hill, NJ. His research interests are in image processing and computer vision.