Integration of local and global shape analysis for logo classification

9
Integration of local and global shape analysis for logo classification Jan Neumann, Hanan Samet * ,1 , Aya Soffer 2 Computer Science Department, Center for Automation Research, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA Abstract A comparison is made of global and local methods for the shape analysis of logos in an image database. The qualities of the methods are judged by using the shape signatures to define a similarity metric on the logos. As rep- resentatives for the two classes of methods, we use the negative shape method which is based on local shape information and a wavelet-based method which makes use of global information. We apply both methods to images with different kinds of degradations and examine how a particular degradation highlights the strengths and shortcomings of each method. Finally, we use these results to develop a new adaptive weighting scheme which is based on the relative performances of the two methods. This scheme gives rise to a new method that is much more robust with respect to all degradations examined and works by automatically predicting if the negative shape or wavelet method is performing better. Ó 2002 Elsevier Science B.V. All rights reserved. Keywords: Shape representation; Shape recognition; Image databases; Symbol recognition; Logos 1. Introduction We examine three different approaches for classifying images with several components in an image database. One approach uses a local method to represent the image, the second uses a global method, while the third combines both using an adaptive weighting scheme based on relative per- formance. The local method uses so-called nega- tive symbols, as described in Soffer and Samet (1998), to compute a number of statistical and morphological shape features for each connected component of an image foreground and back- ground. The global method uses a wavelet decom- position of the horizontal and vertical projections of the global image as described in Jaisimha (1996). As a sample application of well-defined multi-component images, we use logos. Several studies have reported results on some form of logo recognition. Each study used either global or local methods. These include local invariants (Doer- mann et al., 1996; Kliot and Rivlin, 1997), wavelet features (Jaisimha, 1996), neural networks (Cesa- rini et al., 1997), and graphical distribution features Pattern Recognition Letters 23 (2002) 1449–1457 www.elsevier.com/locate/patrec * Corresponding author. E-mail addresses: [email protected] (J. Neumann), [email protected] (H. Samet), [email protected] (A. Soffer). 1 The support of the National Science Foundation under Grants CDA-95-03994, IRI-97-12715, EIA-99-00268, and IIS- 00-86162 is gratefully acknowledged. 2 Currently at IBM Research Lab, Haifa 31905, Israel. 0167-8655/02/$ - see front matter Ó 2002 Elsevier Science B.V. All rights reserved. PII:S0167-8655(02)00105-8

Transcript of Integration of local and global shape analysis for logo classification

Integration of local and global shape analysis forlogo classification

Jan Neumann, Hanan Samet *,1, Aya Soffer 2

Computer Science Department, Center for Automation Research, Institute for Advanced Computer Studies, University of Maryland,

College Park, MD 20742, USA

Abstract

A comparison is made of global and local methods for the shape analysis of logos in an image database. The

qualities of the methods are judged by using the shape signatures to define a similarity metric on the logos. As rep-

resentatives for the two classes of methods, we use the negative shape method which is based on local shape information

and a wavelet-based method which makes use of global information. We apply both methods to images with different

kinds of degradations and examine how a particular degradation highlights the strengths and shortcomings of each

method. Finally, we use these results to develop a new adaptive weighting scheme which is based on the relative

performances of the two methods. This scheme gives rise to a new method that is much more robust with respect to all

degradations examined and works by automatically predicting if the negative shape or wavelet method is performing

better. � 2002 Elsevier Science B.V. All rights reserved.

Keywords: Shape representation; Shape recognition; Image databases; Symbol recognition; Logos

1. Introduction

We examine three different approaches forclassifying images with several components in animage database. One approach uses a local methodto represent the image, the second uses a globalmethod, while the third combines both using anadaptive weighting scheme based on relative per-

formance. The local method uses so-called nega-tive symbols, as described in Soffer and Samet(1998), to compute a number of statistical andmorphological shape features for each connectedcomponent of an image foreground and back-ground. The global method uses a wavelet decom-position of the horizontal and vertical projectionsof the global image as described in Jaisimha(1996). As a sample application of well-definedmulti-component images, we use logos. Severalstudies have reported results on some form of logorecognition. Each study used either global or localmethods. These include local invariants (Doer-mann et al., 1996; Kliot and Rivlin, 1997), waveletfeatures (Jaisimha, 1996), neural networks (Cesa-rini et al., 1997), and graphical distribution features

Pattern Recognition Letters 23 (2002) 1449–1457

www.elsevier.com/locate/patrec

* Corresponding author.

E-mail addresses: [email protected] (J. Neumann),

[email protected] (H. Samet), [email protected] (A. Soffer).1 The support of the National Science Foundation under

Grants CDA-95-03994, IRI-97-12715, EIA-99-00268, and IIS-

00-86162 is gratefully acknowledged.2 Currently at IBM Research Lab, Haifa 31905, Israel.

0167-8655/02/$ - see front matter � 2002 Elsevier Science B.V. All rights reserved.

PII: S0167-8655 (02 )00105-8

(Kato, 1992). The performance in case of certaindegradations was examined. In this paper wecompare the local and global methods under theinfluence of several image degradations. The per-formance measure is the ranking of the originallogo after inputing a degraded version of it intothe classifier. The results exhibit the advanta-ges and disadvantages of local methods, based onshape features, in contrast to global methods,rooted in signal processing. Finally, we present analgorithm that combines both methods into asingle, robust framework by adaptively weightingthe contributions of each method according to anestimate of their relative performance.

2. Wavelet method

We studied logos in the UMD-Logo-Databasewhich are gray-scale images that are scanned ver-sions of black and white logos (Fig. 1a). We as-sume that the logos have already been segmented

and binarized by a preprocessing step. The prob-lems of segmentation and threshold selection arebeyond the scope of this paper. The classificationmethods should be scale, translation, and rotationinvariant. To achieve this, we apply some nor-malizing steps to the input images before we startthe computation of any features (Fig. 1b). Aftershifting the image so that the centroid of the whitepixel locations is located at the image center,which gives us translational invariance, we rotatethe image around the centroid so that its majorprincipal axis is aligned with the horizontal. Thisgives us rotational invariance. Finally, we resizethe image so that the bounding box of the logosymbol is a given percentage of the image size.This accounts for changes in scale of the inputlogos. These transformations make it possible toperform the following computations without ref-erence to orientation, position, and scale.

Given a normalized image we compute thehorizontal and vertical projections (Fig. 1c–d) ofthis binary image which are defined as PðyÞ ¼

Fig. 1. The steps of the wavelet method: (a) original image, (b) normalized image, (c) horizontal projection, (d) vertical projection,

(e) low-pass wavelet coefficients of the horizontal projection and (f) low-pass wavelet coefficients of the vertical projection (x-axis: index

of coefficient, y-axis: coefficient magnitude).

1450 J. Neumann et al. / Pattern Recognition Letters 23 (2002) 1449–1457

Pnx¼1 Iðx; yÞ and PðxÞ ¼

Pmy¼1 Iðx; yÞ. This means

that we are counting the number of white pixelsfor each row and column. Next, we use a wavelettransform to low-pass filter the projections (Fig.1e–f). In our experiments, we used the Haar waveletand the Daubechies wavelet. This process is illus-trated in Fig. 1 for the Haar wavelet. These coeffi-cient vectors, called signatures, are used to comparedifferent logos among each other. We use the L1-Norm to compute the difference between theirsignatures because the L1-Norm is known to berobust against outliers (Strang and Nguyen, 1996).

3. The negative shape method

The novel idea of the negative shape method asdefined in Soffer and Samet (1998) for the repre-sentation of symbol-like data such as found inlogos is that we compute the shape features notjust for the components of the foreground thatconstitute the symbol itself, but also for the com-ponents that make up the background of the im-age containing the symbol.

We start by labeling the connected componentsof the image and its background. To each com-ponent of the labeled image, we first apply thepreprocessing steps described in Section 2 and thencompute the following shape features:

1. F1: Invariant moment: Normalized distance ofpixel positions to the centroid.

2. F2: Eccentricity: The ratio between the lengthsof the major and minor axes.

3. F3: Circularity: The ratio between the perimeterof the component and the perimeter of a circleof equivalent area.

4. F4: Rectangularity: The ratio between the areaof the component and the area of its boundingbox.

5. F5: Hole area ratio: The ratio between the areaof the holes inside the component and the areaof the solid part of the component.

6. F6, F7: Horizontal (Vertical) gap ratio: Thesquare of the number of pixels inside the com-ponent that have a right (bottom) neighbor thatdoes not belong to the component divided bythe area of the component.

For the negative shape method we define thedistance measure between two logos L1 and L2 asfollows. First, we normalize the value range foreach element of the feature vector over all thelogos in the dataset. Then, for each component ofL1 we find the component of L2 that has thesmallest Euclidean distance in feature space to it.Finally, the average of these minimal distancesover all the components of L1 yields a measure forthe distance between the two logos.

4. Comparison between the methods

All methods were implemented in MatlabTM

and were applied to the logos contained in theUMD-Logo-Database (123 logos, can be found athttp://documents.cfar.umd.edu/resources/database/umdlogo.html). The system was tested by provid-ing it with an input logo and ranking the logos inthe database based on their similarity to this logo.Below, we investigate the robustness of the meth-ods when the logos are corrupted using four dif-ferent image degradation methods. We choose twoglobal degradations as demonstrated in Fig. 2aand b, and two local degradations as shown in Fig.2c and d which we will describe in more detaillater. Global degradation means that the degra-dation influences all regions of the image in thesame way while leaving the large-scale image

Fig. 2. The two global (a–b) and two local (c–d) types of degradations.

J. Neumann et al. / Pattern Recognition Letters 23 (2002) 1449–1457 1451

structure intact. Local degradations either influ-ence only a localized image region (Fig. 2c) or leavethe small-scale image structure invariant (Fig. 2d).

For each method, we degrade the images in thedatabase, varying the amount of degradation inten equally spaced steps. We then classify the logosin the database based on the computed distance inthe feature space to the degraded input, before weexamine the rank––in terms of feature space dis-tance––of the corresponding uncompromised logo.To analyze the influence of a degradation on theperformance of a method, we plot the cumulativedistribution of the ranks of the original logoswhich is defined as follows. Let i be the index of alogo and n the number of logos in the database. Ifwe degrade logo i and input it into the classifier,the corresponding original, uncompromised logowill have rank rkðiÞ. The cumulative distributionof the ranks (CDR) is now defined as

CDRðRÞ ¼ 1

n

Xni¼1

/ðrkðiÞ;RÞ where

/ðrkðiÞ;RÞ ¼1 : rkðiÞ6R

0 : rkðiÞ > R:

�ð1Þ

This monotone function indicates the percentageof the logos for which the classifier has rankedtheir undegraded match in the database among thefirst R logos thereby giving us a means to measurethe performance of each classifier. In order to showthe difference in performance between the methodsthat we used, we also included graphs displayingthe difference of the respective CDRs. Note thatthe closer the CDR is to 1, the better the classifierperforms because the correct matches have beengiven high ranks.

4.1. Global degradations: optical sensor noise andreduced resolution

To simulate the image degradation that iscaused by processes such as fax transmissions orphotocopying, we use the well-known documentdegradation model by Kanungo et al. (2000). Itmodels the inversion of individual pixel (fromforeground to background and vice versa) due tolight intensity fluctuations, the sensitivity of the

sensor and the thresholding level used in the bi-narization, as well as the blurring caused by thepoint-spread function of the scanner optical sys-tem. We first computed the distance d of each pixelfrom the logo boundary, then we flipped each pixelvalue with the probability pð0j1; d; aÞ ¼ pð1j0; d; a;a0Þ ¼ a0e

�ad2. Finally, we correlated the value of

neighbouring pixels by performing a morphologi-cal closing operation with a disk structuring ele-ment of diameter k. For more details, see Kanungoet al. (2000). In our experiments, we set a0 ¼ 0:5and k ¼ 2, and a was varied logarithmic steps from0:05 to 1 as indicated on the x-axis in Fig. 3a–c. Anexample degraded image is given in Fig. 2a.

The wavelet method outperforms the negativeshape method noticeably as can be seen in Fig. 3c.Even for large amounts of noise, the waveletmethod classifies accurately as indicated by the flatCDR in Fig. 3b, while the performance of thenegative shape method worsens rapidly (Fig. 3a). Itis to be expected that the wavelet method outper-forms the negative shape method when we simulateoptical sensor noise because flipping foregroundand background pixels with equal probabilityshould have only a small effect on the distributionof white and black pixels in a row or column, andthus the projection should not change much. Onthe other hand, in the negative shape method, wecompute the feature vectors only on a small subsetof the pixels of each component. In this case, thenoise will change the spatial distribution of thepixels more drastically because of the smallernumber of pixels involved. Thus the negativeshape method is less robust towards optical sensornoise than the wavelet method.

To see how the methods handle differences inimage resolution, which is obviously not offset bythe scaling invariance since we work on digitizedimages, we reduce the size of the input imagesthrough sub-sampling using bilinear interpolation(e.g., Fig. 2b). The parameter value is the size ratiobetween the original and the sub-sampled imageas indicated on the x-axis in Fig. 3d–f.

As before, the wavelet method outperforms thenegative shape method (Fig. 3f), although thenegative shape method does not exhibit the samebreakdown in performance as in the case of ran-dom noise (Fig. 3d). Since we use the low-pass

1452 J. Neumann et al. / Pattern Recognition Letters 23 (2002) 1449–1457

wavelet coefficients for the classifier, the reducedresolution does not influence the performance ofthe wavelet method drastically. This is because sub-sampling an image by bilinear interpolation has asimilar effect as low-pass filtering the image. Thelow-pass wavelet coefficients of a low-pass filteredimage are in general very similar to the low-passcoefficients computed on the original image. Asbefore, the negative shape method is affected by thisdegradation because even when large scale changesare hardly visible, local shape features such ascircularity, rectangularity and gap ratios are moresusceptible to local changes due to a loss of detail.

4.2. Local degradations: occlusion and swirl

To model the occlusion of parts of a logo, weadd a component to the logo image which in thiscase is a black rectangle of varying size (see Fig.2c). The parameter here is the percentage of theimage that is occluded by the rectangle.

The performance graphs show that occlusionhas a greater effect on the wavelet method than thenegative shape method (Fig. 4a–c) although bothmethods are able to handle small occlusions well.Since the addition of an extra object or the omis-sion of parts of the image causes global changes tothe distribution of pixels in each row or column,the projections are strongly affected and thus soare the wavelet coefficients. Because of the localstructure of the shape features, the componentsthat are not occluded are not degraded at all andtheir feature values are unchanged. In the classifierwe average the best feature vector matches for allthe components in the input image. Since an oc-clusion is more likely to combine components intolarger aggregates than to break them into manynew ones, these few new components which do nothave a corresponding component in the originalimage, are influencing the feature distance only toa small extent. Except for very degenerate config-urations, the influence of the new components is

Fig. 3. Performance under global degradations: (a–c) show the results when we added optical sensor noise and (d–f) when we reduced

the resolution. We see in (a) and (d) that the shape method is very sensitive to global degradations, and is noticeably outperformed by

the wavelet method as seen in the difference plots in (c) and (f), where we subtract the CDR of the negative shape method from the

CDR of the wavelet method.

J. Neumann et al. / Pattern Recognition Letters 23 (2002) 1449–1457 1453

averaged out by the continuing good matches ofthe feature vectors of the remaining uninfluencedcomponents.

Swirling is a smooth deformation of an imagewhich can be used to model a non-isotropicstretching of a logo. The relative position of eachrow is shifted to the left or right by an offset givenby a smooth function, where the offset is limited toa certain percentage of the image width which isgiven as a parameter. This deforms the logo as ifwe would stretch a rubber sheet in different di-rections (see Fig. 2d).

This degradation has very different effects onthe two methods. The performance of the waveletmethod worsens rapidly with increasing swirl untilwe basically get a random ranking, as indicated bythe linear increase of the CDR in Fig. 4e. In con-trast, the negative shape method classifies muchbetter as seen in the difference of the CDRs in Fig.4f because it is possible to approximate this de-formation by localized similarity transforms sincethe deformation is smooth. The local features used

by the negative shape method are invariant tosimilarity transforms due to the component nor-malization. Therefore, it is much less affected bythis degradation than the wavelet method. Recallthat the wavelet method is only globally invariantto similarity transforms due to the global prepro-cessing, but not locally.

5. The best of both worlds: a combination of the two

methods

In Section 4 we saw that the wavelet and thenegative shape methods perform very differently ifthe input logo is corrupted by either local or globaldegradations. Thus, if we were able to detect whichmethod is performing better, then we could devisea switching mechanism that chooses the betterperforming method automatically.

To be able to compare both methods, we firstnormalize the featured distances by computing forboth methods the average of the pairwise feature

Fig. 4. Local degradations: (a–c) shows the results for the occlusion and (d–f) the results for the swirl. Notice how the local degra-

dations affect the performance of the wavelet method much more than the negative shape method as indicated in the difference plots (c)

and (f), where we subtract the CDR of the negative shape method from the CDR of the wavelet method.

1454 J. Neumann et al. / Pattern Recognition Letters 23 (2002) 1449–1457

distances between all uncompromised logos in thedatabase. We denote these normalization factorsby Nshape and Nwave. For each type of degradation,we vary the amount of degradation in 10 equally-spaced steps and compute the following measureover all logos in the database and for all amountsof degradations:

RrankdiffðDÞ :¼Yi2L

FshapeðiÞNshape

Nwave

FwaveðiÞ

!1=jLj

;

where L :¼ fijrkshapeðiÞ � rkwaveðiÞ ¼ Dg:

ð2Þ

FshapeðiÞ is the feature distance between a degradedversion of logo i and the undegraded version of ias computed by the negative shape method andFwaveðiÞ the feature distance as computed by thewavelet method, while rkshapeðiÞ and rkwaveðiÞ arethe corresponding ranks of the matching unde-graded logo (see Section 4). This measure corre-lates each difference in ranking with the geometricmean of the corresponding ratios between thefeature distances computed by the two methods.Therefore, we relate the performance of the clas-sifier with the relative feature space distances. Notethat we use the geometric mean instead of thearithmetic mean as Rrankdiff involves a product ofratios rather than a sum of values.

Fig. 5 shows that whenever one method per-forms better (positive rank difference if the waveletmethod is performing better, negative if the nega-

tive shape method performs better), this method ingeneral also computes a lower normalized featuredistance for the original logo (the ratio is largerthan 1 if the wavelet method computes the lowernormalized feature distance, the ratio is smallerthan 1 if the negative shape method computes thelower normalized feature distance). Understandingthis relationship between the change in the ratio ofthe normalized feature distances and the relativeperformance of the two methods when applied todegraded images enabled us to adaptively weightthe respective contributions of the two methodswhen combining them into a single distance mea-sure. Therefore, in order to take advantage of therespective strengths of both methods we devisedthe following performance-dependent weightingscheme.

Let the vector of feature distances of the de-graded logo i to all undegraded logos in the dat-abase be DshapeðiÞ for the negative shape methodand be DwaveðiÞ for the wavelet method. Moreover,denote their means by AshapeðiÞ and by AwaveðiÞ,respectively. Now, if the ratio

RanðiÞ ¼AshapeðiÞNshape

Nwave

AwaveðiÞð3Þ

indicates that one of the two methods is per-forming worse than expected, then we decreaseits weight in the final classification and increasethe weight of the other method. The combined

Fig. 5. Relationship between relative feature distance and classification performance. For each type of degradation, we plot the

difference D in the classification of the matching logo (for all logos i, we subtract the rkðiÞ as determined by the wavelet method from

rkðiÞ as determined by the shape method), and compare it to RrankdiffðDÞ, the geometric mean of the ratios as defined in Eq. (2).

J. Neumann et al. / Pattern Recognition Letters 23 (2002) 1449–1457 1455

feature distance vector DcombðiÞ for the degradedinput logo i is then a weighted sum of DwaveðiÞ andDshapeðiÞ:

DcombðiÞ ¼ RanðiÞDwaveðiÞNwave

þ RanðiÞ�1 DshapeðiÞNshape

ð4Þ

If the ratio Ran decreases (increases), this indi-cates that the degradation of the input logo causesthe wavelet method to compute feature distanceslarger (smaller) than the negative shape method.Consequently, the weight of Dwave with respect toDshape in Eq. (4) will be reduced (increased), be-cause we have less (more) confidence in the waveletmethod’s ability to classify the input logo correctlythan in the negative shape method’s ability to do so.

This adaptive weighting scheme increases therobustness of the classification dramatically. Com-paring the performance of the combined method

on the images degraded as described in Section 4,we see that our weighting scheme is able to detectthe change in relative performance and adjustthe weights to mimic the classification of the bet-ter performing method. Almost always the com-bined method equals the performance of the bettermethod (Fig. 6c–f) and surpasses by far the worsemethod (Fig. 6a, b, g and h) in terms of the CDR.This shows that our combined scheme is effectivein capturing global as well as local shape infor-mation and is thus able to deal well with all theimage degradations of the kind that we described.

6. Summary and future work

Both the wavelet as well as the negative shapemethod are well-suited for certain kinds of image

Fig. 6. Performance of the combined method: in (a)–(d) we subtract the CDR of the negative shape method from the CDR of the

combined method, and in (e)–(h) we subtract the CDR of the wavelet method from the CDR of the combined method.

1456 J. Neumann et al. / Pattern Recognition Letters 23 (2002) 1449–1457

degradations but are very sensitive to others. Thisdiscrepancy in performance can be explained bythe difference between local shape feature-basedmethods and global filter-based methods. On theone hand, we have the wavelet method that oper-ates on the global image and computes featuresthat are relatively invariant to degradations thatare isotropic. On the other hand, we have thenegative shape method which operates on localimage regions. Thus its features are relatively in-variant to changes that leave the image at otherlocations mostly intact such as occlusions or pre-serve the local image structure such as the swirldeformation. We take advantage of the fact thatboth basic methods perform very differently onimages that exhibit degradations of either local orglobal nature by devising a performance-depen-dent weighting scheme that combines the results ofboth methods. Our combined algorithm shows anoticeable improvement in the robustness of theclassification by combining the strengths andavoiding the weaknesses of the respective methods.This weighting scheme performs the better themore different the performances of the underlyingmethods are because this makes it easier to detectif one method is performing poorly with respect tothe other method. Therefore, the wavelet and thenegative shape methods are very well-suited to becombined by a performance-dependent weightingscheme. Of course, the overall performance of ouradaptive scheme is limited by the best performanceof either method, thus the application of thisweighting scheme to more advanced logo recog-nition methods is planned for future work. Inaddition, we will try to improve the synergy be-

tween the two methods by using local image in-formation to estimate how much an image regionis degraded and then use this locality informationto adaptively weight the feature vectors on thecomponent level.

References

Cesarini, F., Fracesconi, E., Gori, M., Marinai, S., Sheng, J.Q.,

Soda. G., 1997. A neural-based architecture for spot-noisy

logo recognition. Proceedings of the Fourth International

Conference on Document Analysis and Recognition, Ulm,

Germany, August 1997, pp. 175–179.

Doermann, D., Rivlin, E., Weiss, I., 1996. Applying algebraic

and differential invariants for logo recognition. Machine

Vision and Applications 9 (2), 73–86.

Jaisimha, M.Y., 1996. Wavelet features for similarity based

retrieval of logo images. Proceedings of the SPIE, Docu-

ment Recognition III, Vol. 2660, San Jose, CA, January

1996, pp. 89–100.

Kanungo, T., Haralick, R.M., Baird, H.S., Stuezle, W.,

Madigan, D., 2000. Statistical nonparametric methodology

for document degradation model validation. IEEE Trans-

actions on Pattern Analysis and Machine Intelligence 22

(11), 1209–1223.

Kato, T., 1992. Database architecture for content-based image

retrieval. Proceedings of the SPIE, Image Storage and

Retrieval Systems, Vol. 1662, San Jose, CA, February 1992,

pp. 112–123.

Kliot, M., Rivlin, E., 1997. Shape retrieval in pictorial

databases via geometric features. Technical Report

CIS9701, Technion––IIT, Computer Science Department,

Haifa, Israel, 1997.

Soffer, A., Samet, H., 1998. Using negative shape features for

logo similarity matching, Proceedings of the 14th Interna-

tional Conference on Pattern Recognition, Brisbane, Aus-

tralia, August 1998, pp. 571–573.

Strang, G., Nguyen, T., 1996. Wavelets and Filter Banks.

Wellesley-Cambridge Press, Wellesley, MA.

J. Neumann et al. / Pattern Recognition Letters 23 (2002) 1449–1457 1457