Pattern matching using the blur hit–miss transform

Journal of Electronic Imaging 9(2), 140–150 (April 2000).

Downl

Pattern matching using the blur hit–misstransformDan S. Bloomberg

Luc VincentXerox Palo Alto Research Center

Palo Alto, California 94304E-mail: [email protected]

oallonanch

pleithdi-

aly-to

ctedth-s.in

sult

te,ut,te-vo-

tic

henenver,are

andreing.late

andctur-

thattaintyold-ro-endis-h

oferatefea-

200

Abstract. The usefulness of the hit–miss transform (HMT) and re-lated transforms for pattern matching in document image applica-tions is examined. Although the HMT is sensitive to the types ofnoise found in scanned images, including both boundary and ran-dom noise, a simple extension, the blur HMT, is relatively robust.The noise immunity of the blur HMT derives from its ability to treatboth types of noise together, and to remove them by appropriatedilations. In analogy with the Hausdorff metric for the distance be-tween two sets, metric generalizations for special cases of the blurHMT are derived. Whereas Hausdorff uses both directions of thedirected distances between two sets, a metric derived from a specialcase of the blur HMT uses just one direction of the directed dis-tances between foreground and background components of twosets. For both foreground and background, the template is alwaysthe first of the directed sets. A less-restrictive metric generalization,where the disjoint foreground and background components of thetemplate need not be set complements, is also derived. For imageswith a random component of noise, the blur HMT is sensitive only tothe size of the noise, whereas Hausdorff matching is sensitive to itslocation. It is also shown how these metric functions can be derivedfrom the distance functions of the foreground (FG) and background(BG) of an image, using dilation by the appropriate templates. Theblur HMT can be used as a fast heuristic to avoid more expensiveinteger-based matching techniques, and it is implemented efficientlywith boolean image operations. The FG and BG images are dilatedwith structuring elements that depend on image noise and patternvariability, and the results are then eroded with templates derivedfrom patterns to be matched. Subsampling the patterns on a regulargrid can improve speed and maintain match quality, and examplesare given that indicate how to explore the parameter space. Trun-cated matches give the same result as full erosions, are muchfaster, and for some applications can be performed at a restrictedset of locations. © 2000 SPIE and IS&T. [S1017-9909(00)00802-3]

1 Introduction

Pattern matching techniques are critical for all aspectsthe analysis of document images. Documents are typicscanned into a binary image, and many of the operatisubsequently performed, both for page segmentationcharacter identification, use the pattern matching teniques~e.g.,erosionand its dual,dilation! of binary imagemorphology. There are several reasons: they are immented by fast boolean operations; they can be used efor extracting or extending pixel aggregations, both for

Paper 97032 received July 30, 1997; revised manuscript received Feb. 21,accepted for publication Mar. 10, 2000.1017-9909/2000/$15.00 © 2000 SPIE and IS&T.

140 / Journal of Electronic Imaging / April 2000 / Vol. 9(2)

oaded From: http://electronicimaging.spiedigitallibrary.org/ on 10/11/2013 T

fysd-

-er

rect use in later image processing and for subsequent ansis; they are translationally invariant; they can be usedtreat both foreground~FG! and background~BG! simulta-neously; and they can be used without regard to connecomponent analysis. Further, there exist a variety of meods for controlling the noise immunity of these operation

One of the most important uses of pattern matching isthe analysis of character shapes. For binary input, the reof image processing can be either binary or gray~integervalue! images. Binary results are much faster to compubut they contain less information. Even for binary outpthe internal operations can be integer or boolean. For inger operations, such as convolution and thresholded conlution ~rank order filters1!, some level of noise immunity isachieved, but at the price of doing expensive arithmeoperations on each pixel.

We use the termtemplateto refer to the pattern of FGand BG pixels that are to be matched in the image. Thit–miss transform~HMT! is a faster boolean operatiothat performs translationally invariant matching betweboth the FG and BG of template and image sets. Howeit is prone to error from noise because exact matchesrequired between image and template in both the FGBG. Because of the simplicity and power of the HMT, thehave been many attempts to use it for pattern matchThe usual approach is to choose a subset of the temppixels, typically, sparse. We cite a few examples.

Zhao and Daut2 gained noise immunity, relative toHMT, by using either boundary pixels of eroded FG aBG templates, or skeletons of these templates, as struing elements for the HMT. Wilson3 automated the designof the structuring elements through a training processsearched for the smallest subset of pixels that would atthe desired level of discrimination. Kraus and Dougher4

generated a sparse set of structuring elements by threshing a single gray-scale instance of each character. Apppriate choice of thresholds is the critical element: if chostoo conservatively, the subset is too sparse and lackscriminatory ability; if chosen too tightly, instances witatypical variation are missed. Gillies5 took a somewhat dif-ferent approach, accumulating statistics from instanceseach character, and thresholding the aggregates to gennonsparse structuring elements. From these, multipixel

0;

erms of Use: http://spiedl.org/terms

har

n

has, iT.

anofesixetcst

ricar

esenbeen-

opixar-is

erccu

isur-ise

pe.tely

atrettlethebeer-anch-

oftive

thes

ly.thetiontoortric

esowm-reof

eth

d,re-Ta-

eheinonary

co-on-the

m-agewe

andesic

h af

Pattern matching using the blur hit–miss transform

Downl

tures were extracted and used to train a classifier for cacter discrimination.

One characteristic that these methods have in commoan attempt to compensate forimage noiseby altering thetemplate, and leaving the image alone. We argue here talthough it is useful to choose a subset of template pixelis important to alter the image before performing the HMThe blur hit–miss transform~BHMT! has been introducedto do precisely this.6 Unlike the HMT, the BHMT performsthe match between template and image, for both FGBG, with a variable degree of tolerance to alignmentimage and template pixels. The ‘‘blur’’ parameter specifithe maximum distance allowed between a template pand the nearest image pixel, in order to constitute a mafor that template pixel. Stated this way, there is an intereing relation between the BHMT and the Hausdorff metfor the distance between two sets, but the differencesimportant for their uses in applications.

To understand the usefulness of the BHMT, it is necsary to consider the origin of noise in scanned documimages. We postulate a simple model, where variabilitytween instances in the image is caused by two differprocesses. One type isboundary noise, caused by the binarization process along the edge of an object. Dependingthe subpixel alignment of scanned objects with scannerels, considerable edge variation occurs. This boundnoise is typically restricted to a width of two pixels, including both FG and BG boundary pixels. The second typerandom noise, either generated in printing or due to scanndefects such as dirt on the platten. This is assumed to oindependently of the pixels in the scanned object, andmost often observed as isolated FG ‘‘pepper’’ pixels srounded by BG. It should be noted that both types of nooccur inboundary pixels, defined to be pixels of either FGor BG that are adjacent to a pixel of the opposite tyThus, operations that treat boundary pixels appropriawill influence both types of noise.

Even without random noise, boundary noise will defea HMT that uses boundary pixels in the template. Thefore, when computing matches, it is necessary to give lior no weight to the boundary pixels. On the other hand,nonboundary pixels, because of their high correlationtween template and image, are critical for matches. Diffences occurring between nonboundary pixels in imagetemplate, although relatively rare, will defeat both a mating technique like the HMT, that requires an exact matchall pixels, and a metric such as Hausdorff, that is sensito such ‘‘outliers.’’

The paper is organized as follows. In Sec. 2.1BHMT is defined, and the method in which it provideimmunity to both types of noise is described qualitativeIn Sec. 2.2, the Hausdorff metric is introduced, andconnection between this measure on sets and operausing morphology is made. Also, an illustration is givenshow why the Hausdorff metric is not appropriate fmatching templates to noisy images. In Sec. 2.3, two mefunctions are constructed that are related to special casthe BHMT. The same example is then used to show hthe BHMT succeeds in matching templates to noisy iages. Then, in Sec. 2.4, the BHMT metric functions aagain derived, this time from dilations by the templatethe distance function for the image. In Sec. 3, several m


-

is

tt

d

lh-

e

-t-t

n-y

r

-

-

d

s

of

-

ods for efficiently implementing the BHMT are describeincluding subsampling the template. Some experimentalsults are given in Sec. 4 to illustrate the use of the BHMin identifying characters, and the major findings of the pper are summarized in Sec. 5.

We end this section with an illustration in Fig. 1 of thfamily of rank and blur template matching operations. THMT generalizes the erosion to operations that matchboth FG and BG. The rank operations take thresholdsconvolutions, whereas the blur operations remove boundpixels appropriately before doing strict matching. TherankHMT, a relatively expensive integer operation, requireslocation of image and template pixels, but eases the cstraint on the number of matches. The rank HMT andBHMT can also be combined into therank BHMT, inwhich a match is accepted if only a given number of teplate pixels are within a given distance of the nearest impixel. In the sequel, we concentrate on the BHMT, butgive one example of the use of therank BHMT.

2 Blur HMT

2.1 Basic Definitions

We are strictly interested in the discrete case of setsfunctions defined onZ2, although extensions can be madto the continuous case or higher dimensions. The bamorphological operations areerosionanddilation. The ero-sion of a binary imageX by a structuring element~SE! B isthe set operation defined by

X*B5 ùbPB

X2b5$xPZ2uBx#X%, ~1!

whereX2b is the translation of imageX by 2b. The sec-ond definition states that erosion generates a set witnonempty result at every location where the translate oBfits entirely withinX. The dilation of an imageX is defined

X% B5 øbPB

Xb5 øxPX

Bx5$x1bPZ2uxPX,bPB%. ~2!

Fig. 1 Family of rank and blur template matching operations.

Journal of Electronic Imaging / April 2000 / Vol. 9(2) / 141


ate

n,its

se

oth

y tobluheelsthlslesge

thetem

inrg

hesenhens

n in

ir

SEionhe

la

to

e

owstheheet-ted

e

n,

e

s-

in-

n

ary

wn

anded

Bloomberg and Vincent

Downl

The first and second definitions state that dilation genera set composed of the union of translations ofX by ele-ments inB, and vice versa. Note that* and % arenot theoriginal definitions of Minkowski subtraction and additiorespectively, which require an inversion of the SE aboutcenter.7

The HMT is a morphological template matcher whodefinition is based on the erosion operator.7 The HMT of abinary imageX by a disjoint pair (t f ,tb) of SEs is definedas the set transformation

X^ ~t f ,tb!5~X*t f !ù~XC*tb!, ~3!

whereXC is the set of BG pixels ofX. The HMT generatesa set with a nonempty result at every location where bthe FG SEt f fits entirely withinX and the BG SEtb fitsentirely within XC, the complement ofX. It is common tospeak of the elements int f as hits, of elements intb asmisses, and elements not in their union asdon’t cares.

There are several methods for reducing the sensitivitboundary noise. We can erode the template SEs by theSEs, or dilate the image by the blur SEs, all prior to tHMT. Eroding the template removes its boundary pixfrom consideration, whereas dilating the image removesimage boundary pixels. The random noise pixels are aaffected: eroding the template opens up FG and BG hoso they are not included in the match; dilating the imacloses up holes~i.e., removes salt and pepper! in the FGand BG. The results of these operations~followed by theHMT! differ, and the choice must be made based onstatistics of expected noise. For document images, theplate is expected to be free of salt and pepper noisesituations where it can be generated by averaging a laset of instances. However, this isnot true for the image.Salt and pepper noise in the image will prevent matcbetween the FG and BG of the template, respectively. Gerally, it is imperative to remove noise pixels from both ttemplate and image before doing the HMT. For situatiowhere random noise is more frequent in the image thathe template, we thus define the BHMT of a binary imageXusing the SE pair (t f ,tb) for the template and the SE pa(b f ,bb) for blur as follows~also, see Ref. 6!:

X^ ~t f ,tb ;b f ,bb!5~X% b f !*t fù~XC% bb!*tb . ~4!

It can be noted that there is no requirement that theused for blur are symmetric about their center. Translatof the center of a SE simply results in translation of tdilated image. Also note thatt f and tb are typically non-overlapping. Take, for example, the case whereb f5o,bb5o, and t fùtbÞB: it corresponds to a traditionaHMT ~no blur!, using an overlapping pair of SEs. In suchcase, for any setX, one can easily verify thatX^ (t f ,tb)5B: it is indeed impossible to translate (t f ,tb) in such away that the translatedt f is included inX andthe translatedtb is included inXC. With some blur, that is when setsb f

and bb are not reduced to a single pixel, it is possibleobtain a nonempty BHMT result even whent fùtbÞB.



s

r

eo,

-

e

-

s

However, the practical interest of using overlappingt f andtb is extremely limited. In the sequel, we typically assumthat t fùtb5B.

2.2 Relation Between Hausdorff Metricand Morphology

The Hausdorff metric is a distance between sets that allone to define a topology on the set of all possible sets inplane.8 In image terminology, it is a distance between tFG of two images. The relation between the Hausdorff mric and a pair of blurred template matches has been nopreviously,9 and we present the connection here.

Define the distance function10 from a pointp to the near-est point in a setX to be d(p,X). If pPX, then d(p,X)50. For two setsT and I, define thedirected Hausdorffdistance11 from T⇒I as the maximum over the pixels in thsetT of the distance from the pixel inT to its nearest pixelin I:

D~T,I !5suptPTd~ t,I !. ~5!

For applications to document images, considerT to be aFG template and considerI to be awindowedsubset of theimageX with support equal to that of the template. Thefor each position in the plane represented byX, there existsa windowed subsetI ,X and a directed Hausdorff distancD(T,I ) betweenT and the co-locatedI. Suppose this dis-tance isd. If the setI is dilated by a disk of radiusd, thedistance between the dilated set andT will be zero. Conse-quently, an erosion of the dilatedI by T will give a non-empty result.

The Hausdorff metricDH is formed symmetrically be-tweenT and I, as the maximum of the two directed Haudorff distances:

DH~T,I !5max$D~T,I !,D~ I ,T!%. ~6!

Its relation to the blurred match between template and wdowed image subset is9

DH~T,I !5 inf$r>0uT#~ I % rB! and I #~T% rB!%, ~7!

whereB is the unit disk SE. This is a symmetrical relatiobetween FG sets. IfI andT are very similar, small dilationsact only to reduce the distance contributions from boundpixels. Thus, a single nonboundary noise pixel in eitherI orT can render the Hausdorff distance quite large.

The effects of noise are illustrated by the two sets shoin Fig. 2. Call the sets on the left and rightT andI, respec-tively. We have chosen the template to be less noisyslightly eroded with respect to the image. The direct

Fig. 2 Two sets used to illustrate the effect of noise in Hausdorffdistance. One percent of random noise was added to the set on theright.


n

Theion

,lesueofrgeorfe-

onsInet-a

thaG

d

,d-

erate

gs

oe ose

ets

km-

soint

FGnceT

and

are

eby

otwo

te

etedd

ed


Downl

Hausdorff distances generated by these sets are showFig. 3. The left frame is the directed distanceD(T(x,y),I )from T⇒I , evaluated at each possible location ofT withrespect toI. Darker values represent shorter distances.best match, with a distance of 0, is from the dark regnear the center. However, because of the noise inI, thedistanceD(I (x,y),T) from I⇒T, shown in the right framehas a very large value at that location. In fact, the smaldistances inD(I (x,y),T) are found near the boundaries, dto clipping. This clipping effect is another complicationusing a windowed directed Hausdorff distance from a laimage to a small template. For this example, the Hausddistance DH(T,I ) is identical to the directed distancD(I (x,y),T) for every translate ofI, because the match between the two sets is entirely obscured by the noise inI.

2.3 Blur HMT Metric

The BHMT produces a binary image representing locatiof a match with a given amount of FG and BG blurring.this section, we construct two BHMT-related distance mrics, in analogy with the Hausdorff metric. These aregeneralized distance between image and templatewhen thresholded, produce a BHMT for the value of Fand BG blur equal to the threshold.

When the template SET is located on some windowesubsetI of X, its center falls on coordinates~x,y!. Labeleach subsetI of X by this location ~x,y!. Then, form aFG/BG metricDFB , in analogy toDH , that measures thedirected distance between the FG and BG parts ofT and I.The obvious choice for direction isT⇒I . Indeed, since thetemplate is supposed to have been carefully chosenshould exhibit little boundary pixel noise and no salt-anpepper noise. Since the main purpose of the dilation options used as part of the BHMT operation is to eliminasuch noise before matching@see Eq.~4!#, it would not makesense to use directionI⇒T. For the same reason, usinHausdorff distanceDH typically does not work as well athis directed blur HMT metric because of adverse effectssalt-and-pepper-type noise. For example, mainly becauspepper noise, the Hausdorff distance between the twoof Fig. 2 would be very large. On the contrary, thedirectedblur HMT distance from Fig. 2~a! to 2~b! would be dra-matically smaller, which reflects the fact that these two ssimply are different instances of the same letter P.

Fig. 3 Directed Hausdorff distances generated from the sets in Fig.2. The left and right frames are the directed distances D(T(x,y),I)and D(I(x,y),T), respectively.


in

t

f

t,

it

-

ff

ts

Accordingly, we now dilate the full imageX and use the~x,y! translate ofT,T(x,y) , to compareT with each subsetIof X in computing the metric:

DFB~T~x,y!,X!5 inf$r>0uT~x,y!#~X% rb!

and T~x,y!C #~XC

% rB!% ~8!

5max$D~T~x,y!,X!,D~T~x,y!C ,XC!%. ~9!

To understand the relation betweenDFB and the BHMT,consider the BHMT in its most simple form, with two disSEs of equal radius for the blur and two SEs for the teplates that are set complements. Settingt f5T and tb

5TC, the BHMT is found by thresholdingDFB :

X^ ~T,TC;rB,rB !5$~x,y!PZ2uDFB~T~x,y!,X!<r %. ~10!

The restriction inDFB that the two SEs for the templateare set complements can easily be relaxed to the disjconstraint for SEs in the HMT, by requiring only thatt f#Tandtb#TC. Then, the metricDFB is generalized to

DBHMT~t f ~x,y!,tb~x,y!

,X!

5 inf$r>0ut f ~x,y!#~X% rB!

and tb~x,y!#~XC

% rB!% ~11!

5max$D~t f ~x,y!,X!,D~tb~x,y!

,XC!%, ~12!

with the thresholding relation

X^ ~t f ,tb;rB,rB !

5$~x,y!PZ2uDBHMT~t f ~x,y!,tb~x,y!

,X!<r %. ~13!

We use the notation ‘‘BHMT* ’’ to indicate the specialcase where the same dilation operator is used for bothand BG. For the general case there are two BHMT distametrics, one for the FG and one for the BG, and the BHMis derived from them by thresholding each separatelyAND-ing the results.

For reasons of both efficiency and effectiveness, weusually interested in the BHMT wheretbÞt f

C and b f

Þbb . Examples will be given in Sec. 4. In Sec. 2.4, warrive at a distance metric generalization for the BHMTa different route.

WhereasDH has bidirectional symmetry between twFG sets, and ignores the BG,DFB has FG/BG symmetry buimposes a directionality on the relation between the tsetsT andX. Unlike the Hausdorff metric,DFB andDBHMTare relatively immune to salt and pepper noise pixels inX.However, they are sensitive to noise in the templateT, so inpractice, one must ensure that the FG and BG of templaTis free of salt and pepper noise.

Referring to Fig. 4, illustrating the BHMT for the samexample as previously shown for Hausdorff, the direcdistanceD(T,I ) in the first frame is identical to the directeHausdorffD(T,I ) in Fig. 3. ~Although we now omit the~x,y! label onT, remember that these functions are defin



Gis

bhereay

d-

he

ere’ata

the-

loainra

in

f-

it

nl

oreol-

an

of

te-

sa

che-fits

re

ofa,

esseds

.r-

y,Eq.

c-

nc-trics.

nts’

uea

h.

seila-


Downl

over the set (x,y)PZ2 of translates ofT.! However, thesecond frame gives the directed distance for the BD(TC,I C). The effect of noise on this distance is small: itdetermined by the size of the noise inI, rather than thedistancefrom the noise toT. The contribution from the BGdoes affect the overall match to a small extent, as shownDFB in the third frame. This is the special case of tBHMT metric DBHMT where the FG and BG templates aset complements. Because the BHMT distances are alwdirectedT⇒X from the template to the large image, bounary effects occur only on the boundary of the imageX.

Geometrically, the BHMT as we have defined it has tfollowing interpretation: the ‘‘blur dilations’’ of imageXand its complementXC create a halo of pixels near imagboundaries. This halo can be regarded as the ‘‘don’t caregion of imageX, and one of its nice characteristics is thit typically contains all the salt and pepper pixels. InBHMT operation based on templateT, a match is obtainedwhen the FG of the template is located on FG pixels ofimage or on ‘‘halo pixels,’’ and when the BG of the template is located on BG pixels of the image or on ‘‘hapixels.’’ In other words, this BHMT operation introducesway to use ‘‘don’t care’’ pixels in the image as well asthe template used for matching. As such, it is a natuextension of the traditional hit–miss operation.

2.4 Blur HMT Metric Derived From DistanceFunction

Let us now consider a templateT and a binary imageX. Foreach translationT(x,y) of this template we are interestedcomputing the directed Hausdorff distanceD(T(x,y),X) be-tween the translated template andX. This Hausdorff dis-tance is equal to the smallest isotropic dilation size oXsuch thatT(x,y) is included in this dilated image. Specifically, if B represents the 333 isotropic ball of the eight-connected distance function~333 square!, we can write

D~T~x,y!,X!5min$n>0uT~x,y!#~X% nB!%, ~14!

whereX% nB represents then-fold dilation of X by B, orequivalently the dilation ofX by a (2n11)3(2n11)square structuring element, whose center is located ongeometric center.

Consider now for any pixel~x,y! the following distancefunction:

dX~x,y!5min$n>0u~x,y!P~X% nB!%. ~15!

Fig. 4 BHMT distances generated from the sets in Fig. 2. The firstand second frames are the directed distances D(T,I) and D(TC,IC),respectively. The third frame is DFB , the maximum of the two di-rected distances.



,

y

s

’

l

s

This distancedX is simply a traditional distance functiocomputed on the background ofX: it assigns to each pixeits eight-connected distance to the nearest pixel ofX. Ob-viously, any pixel~x,y! included inX is given a value of 0by this distance function. See Refs. 10 and 12 for minformation on distance functions and their use in morphogy.

Putting together the previous two equations we cwrite:

D~T~x,y!,X!5max$dX~ i , j !u~ i , j !PT~x,y!%. ~16!

In other words, the directed Hausdorff distance fromT(x,y)to X can be obtained by extracting the maximal valuedistance functiondX over pixels~i,j! belonging to the trans-lated templateT(x,y) . Therefore:

D~T~x,y!,X!5~dX% T!~x,y!, ~17!

that is, the directed Hausdorff distance between templaTtranslated to pixel~x,y! is equal to the value of the grayscale dilation of distance functiondX by templateT at pixel~x,y!.

The benefits of Eq.~17! are numerous. First, it provideus with a computationally attractive method to computemap of the quality of directed Hausdorff match at eapixel location: in this map, the pixels with value 0 corrspond to locations where the translated template exactlyinside imageX, pixels with value 1 are the locations whethe template fits inside a dilation of size 1 ofX, etc. Second,looking at this map as a gray-scale image, a numbertechniques can now be used to extract its local minimwhich provide us with the location of the local best matchof the template. In addition, the same method can be uwith TC and XC, thereby providing a map of the matchebetween template complement and image complement

We can go one step further: to improve the ‘‘granulaity’’ of this metric and speed up the algorithm significantlwe propose to use an asymmetric distance function in~17!. Instead of defining it based on a 333 structuringelementB, use a 232 square with the center of the struturing element at the upper left. UsingS for distance func-tions has two main advantages:

• The distance function based onS can be computed ina single raster-order pass through imageX instead ofthe two passes required by traditional distance futions. See Ref. 13 for more details on this asymmedistance and its use in fast morphological algorithm

• The granularity of Hausdorff distance measuremeis improved by a factor of 2. In the ‘‘match map’obtained through application of Eq.~17! using thisasymmetric distance, all the pixels with an odd val2p21 correspond to cases where a dilation with(2p)3(2p) square provided the Hausdorff matcUsing the distance based on 333 elementB, onewould not be able to differentiate between thematches and the lower-quality matches where a dtion by (2p11)3(2p11) was required for thematch.


ila-ecleftfor

fan

et-st,on

snce

. In

et-edTold

theeded


Downl

However, one caution in using such a nonsymmetric dtion is that the results are shifted by one pixel with respto the ones obtained using a symmetric dilation. Theand right frames of Fig. 5 show this distance functionthe two sets in Fig. 2.

Now, distanceDBHMT requires taking the maximum otwo such directed distances, one computed for the FG

Fig. 5 Distance function generated from the left and right sets inFig. 2, respectively.


t

d

one for the BG. Because the distance function is asymmric, the location of the result is translated to the southearelative toX, by an amount equal to the distance functiitself. ThresholdingDBHMT with some valuer generates theidentical set as using blur dilation with anr 113r 11 SEon the BG and FG before the HMT.

This set of relations is illustrated in Fig. 6, which showthe sequence of operations that generate BHMT distametrics and BHMT images. We start with the imageX andFG and BG templates. Grid spacings of 2 inx andy direc-tions are used for generating the FG and BG templatesthe FG, the distance function is found forX, and dilatedwith the FG template, giving the FG-directed distance mric. The dual process in the BG yields the BG-directdistance metric. The maximum of these gives the BHM*metric, and for this example it is possible to find a threshthat yields a single match in the BHMT* set. The sameresult can be derived using the threshold individually onFG and BG metrics, and AND-ing the result. With ththreshold chosen, it should be noted that the threshol

Fig. 6 Sequence of operations that generate BHMT distance metrics and BHMT images.



istchT.

shence

e oh-of

ned

a-esteres

madistheodp-th

ol-

thea

he

thethere-ndle-

ism-ttio

t is

m-in

ran-ularthelarrid-

bethe

her adionionsnged

pi-

arytedes,

isla-

ingachachosetion

ithege;

w.at

Gth

n-or

toatoveor

es, itxi-ssto

ave

debedle-

ro,thenes


Downl

FG metric yields matches in two locations, one of whichbetween the template and the ‘‘g’’ in the image. This mawas not seen in the BG, which removed it from the BHMDetails of the BHMT* metric are shown in Fig. 7.

In the general case, one would choose different threold values~blur SEs! in the FG and BG, in order to makthe matching process more robust. This is the differebetween the BHMT and the BHMT* . When an asymmetricdistance function is used, which is analogous to the usdifferent asymmetric SEs for FG and BG blur, the thresolded binary images, which have different translationsthe match with respect to the image, must be realigbefore being AND-ed.

3 Efficient Implementations

Our interest is in finding relatively efficient implementtions of the BHMT that are effective at locating matchwithout a large number of false positives. For characrecognition, for example, the purpose is not to use the bpossible pattern matchers, such as those used to estiprobabilities for templates in a maximum-likelihoocalculation.14 Instead, we might want information thatgood enough to be used as a heuristic for narrowingsearch space for more computationally intensive meththat do a better job of identifying characters. The rank oerations, such as the rank BHMT, are less efficient thanBHMT because they require two~integer! convolutions bySEs, followed by thresholding. The BHMT uses only boean operations.

We now give two approaches to the efficient use ofBHMT for identifying text characters in an image giventemplate.

3.1 Subsampled BHMT

To improve the efficiency in a direct implementation of tBHMT, it is possible to:

• Scale down both image and template. Because wematch all template pixels at each image position,total number of pixels to be matched varies asfourth power of the scaling parameter. The actualduction in computation will be between the secopower and the fourth power, depending on the impmentation.

• Subsample the template. Suppose each templatesubsampled by imposing a regular grid, with subsapling factorsnx andny . This has two effects. First, idecreases the computation required. The reducvaries from approximatelyny to the productnx3ny ,depending on the implementation. The second effec

Fig. 7 Detail of BHMT* metric for the example in Fig. 6.



-

f

tte

s

e

n

that the subsampling tends to reduce the overall teplate dimensions, effectively augmenting the blurthe image.

The template can also be subsampled by choosing adom subset of template pixels, rather than a rectanggrid, but for a given number of template pixels chosen,matching is significantly more accurate when a rectangugrid is chosen. Results using rectangular subsampled gding of the template are given in Sec. 4, where it willseen that some choices of subsampling greatly improveresults.

The first step is to perform the blur dilations on both tFG and BG of the image. This can then be used fomultiplicity of templates. The BHMT can be implementein several ways. For fully parallel methods, each eroscan be formed separately by the usual set of translatand ANDs, where the unit of operation can be anythifrom the pixel to the entire image. The BHMT is generatfrom the intersection of the FG and BG results.

Matches of a template to a character in the image tycally succeed at more than one~x,y! location. So that we donot overcount the matches, after the BHMT it is necessto identify the matches by labeling the eight-conneccomponents in the image. For the sparse BHMT imagthis is relatively fast compared to the BHMT itself.

3.2 Truncated BHMT

There is another implementation of the BHMT thatfaster, somewhat serialized, and largely circumvents thebeling process itself. The idea is to truncate the matchprocess in each location at the first instance of failure. Etemplate can be composed of an array of words, with eword representing the pixels in a template row. Suppboth FG and BG templates are to be tested at some loca~x,y!. Choose the FG template and align its first row wthe image to test for a match.~The test requires only threboolean operations: AND between template and imaXOR between this result and the template; test for 0.! If aline match is found, proceed to the next template roWhenever a line match is not found, quit the process~x,y! and move to the next image location. If a full Fmatch is found, repeat with the BG template. If bomatches succeed, record the location~the labeling process!.

Matches to a single image feature typically occur cotiguously within some region that is comparable tosmaller than the blur size. The serial aspect is requiredavoid recording multiple positions for an image feature thhas already been matched. When a match occurs, mseveral pixels away before looking for the next match. Fthe same reason, when scanning successive image linis useful to avoid regions where a match was found promally on lines above. Truncation of the matching procereduces the computation time by a factor proportionalthe number of lines in the FG and BG templates that hON pixels.

The rank BHMT defined earlier in the paper can provia finer control over the matching. The approach descriin the previous paragraph also provides an efficient impmentation of the rank BHMT. Rather than testing for zecount the ON pixels in the template line that are OFF inimage. Accumulate this sum over succesive template li


dedndhey

Tonca

s/tio

at50

aloforlyre

agec

lusm

segassif ingew

n ogere

h aca-wo

e odetli-d terstelles

hinidsthesees

anth

theassadvi-ew

thethenk

ecange,le

g,omn ain

htd.

thee,-

es

ndlurare’’sses

ed;e

eoutal

m-nsodarem-

reases;ifi-


Downl

until either the rank threshold for that template is exceeor the full template is matched. Do this for both FG aBG, which generally have different thresholds. As in tBHMT, avoid searching for matches in the vicinity of anlocation where a full match has already been found.

The efficiency of the truncated approach to the BHMcan be estimated within a factor of 2 or so, dependingimplementation details and the hardware. For most lotions of the template~s!, the match will fail on the first line,requiring about ten machine instructions~MIs!. Suppose onaverage that 20 MIs are required for each~x,y! location. A400 MIPS machine can then match 20 million positionsecond. For a document image where the vertical locaof text base lines is known within62 pixels, and where thetext linewidth is 2000 pixels, matches are required10 000 positions for each text line. For a full page withtext lines, the matching time is about 25 ms.

3.3 Use of Truncated Rank BHMT

The matching operation of the rank BHMT, performed intruncated fashion and only on a small subset of imagecations, can be used to build an efficient JBIG2 encoderbinary images. JBIG2 is a lossy encoding where similashaped connected components are replaced by a singleresentative template and a set of locations in the imwhere this template is to appear. The JBIG2 standard spfies the file format, but not the encoding method.

The basis of the JBIG2 encoder is an unsupervised ctering procedure, including the shape-matching algorithConsider a two-pass method, where the image is premented into eight-connected components. In the first peach component is examined sequentially to determineis sufficiently similar to the representative of an existiclass, and if not, it becomes the representative of a nclass. The comparison is done using a truncated versiothe rank BHMT, where both the template and the imacomponent to be compared are of comparable size. Toduce computation, it is preferable to evaluate the matcjust one relative location of template and image. This lotion can be chosen by aligning the centroids of the timages.15 With the rank BHMT, the FG and BG of theimage are dilated and the FG and BG of each templatsimilar dimensions are tested in a truncated way, asscribed in Sec. 3.2. Two thresholds, for FG and BG ouers, are set for each template. A template is considerematch an image component if both the FG and BG outlifall below their thresholds. If more than one templamatches an image component, the one with the smanumber of outlier pixels~i.e., the best fit! should be used.

Once the initial clustering is made, the instances witeach cluster are combined, by again aligning the centroto make a less noisy template. These templates areused for the second pass. All image components arequentially compared with the new templates, and the bfit is chosen. If an image component does not matchtemplate, it is used as the template for a new class, as infirst pass. The amount of image distortion, produced bysubstitution of the templates for each instance in the clis controlled by the blur size and the thresholds. Thevantage of the rank BHMT over the rank Hausdorff is edenced in the second pass, where the template noismuch less than the noise in the image components. As


-

n

-r

p-ei-

-.-,

t

f

-t

f-

o

t

,n-t

ye

,-

ise

have seen previously, small salt and pepper noise inimage components is removed by the dilations, sothresholds can be set lower for rank BHMT than for raHausdorff.

4 Illustrative Results for BHMT

In this section, the use of the BHMT for matching imagcharacters is briefly explored. A number of parametersbe varied independently: the FG and BG blur of the imathe x and y grid subsampling of the template, and scareduction of both image and template.

To demonstrate the effect of blur and template griddinan instance of the character ‘‘a’’ is chosen at random frthe image at the top of Fig. 9, and is subsampled oregular grid. A set of 25 gridded FG templates is shownFig. 8, where thenx subsampling increases to the rig~from 1 to 5!, and theny subsampling increases downwarThere is another set~not shown! for the BG templates. Inthe upper image in Fig. 9, there are 88 instances ofcharacter ‘‘a.’’ These are highlighted in the bottom imaghaving been identified by a BHMT using the blur parameters (b f ,bb)5(2,4) and grid spacings (nx ,ny)5(2,2).With this combination, as with several others, all instancwere identified and no false positives were found.

Results for five combinations of FG/BG blur factors a16 grid subsamplings are shown in Table 1. For each bfactor pair, the numbers of misses and false positivesgiven. For example, the two columns labeled ‘‘Blur 2,4used (b f ,bb)5(2,4). With strict matching parameter~fine gridding, small dilation!, there are no false positiveand a significant number of misses. A few false positivare seen for Blur 3,3, particularly for largernx . With Blur4,4, the number of false positives is significantly increasthis amount of dilation removes the differentiation of th‘‘a’’s from other similar characters. A few intermediatcombinations succeeded in finding all occurrences withany false positives. As seen from Table 1, the optimworking region is near the parameters (b f ,bb)5(2,4) and(nx ,ny)5(2,2).

When FG and BG blurs are not equal, there is an asymetry in the results. Table 1 shows particular combinatiowhere the BG blur is larger than the FG blur that give gomatches. When FG blur is larger, as for Blur 4,2, theretypically more misses and more false positives. The asymetry exists because the image has large solid white athat can give false positive matches to all BG templatconsequently, excessive FG dilation contributes signcantly to these errors.

Fig. 8 Gridded FG templates for (nx ,ny) varying from (1,1) in theupper-left template to (5,5) in the lower-right template. Grid spacingnx increases to the right; ny increases downward.




148 / Journ

Downloaded From

Fig. 9 Top: example image. Bottom: characters matched by the BHMT for the templates in Fig. 8 (andthe BG templates as well) have been highlighted.

Table 1 Use of BHMT to identify 88 instances of the character ‘‘a,’’ using a template derived bysubsampling one of those instances. For text of this size and thickness, the most stable region formatches is with parameters near FG and BG blurs (b f ,bb)5(2,4) and grid spacings (nx ,ny)5(2,2). For each set of FG/BG blurs, the numbers of misses and false positives are given for differentgrid spacings.

nx ny

Blur 2,2 Blur 3,3 Blur 4,4 Blur 2,4 Blur 2,5 Blur 4,2

nmiss nfp nmiss nfp nmiss nfp nmiss nfp nmiss nfs nmiss nfp

1 1 84 0 19 0 4 0 8 0 4 0 83 0

1 2 80 0 10 0 0 32 1 0 1 0 77 0

1 3 71 0 16 0 4 0 6 0 2 0 68 0

1 4 65 0 16 0 4 1 4 0 0 0 64 0

2 1 80 0 5 0 0 1 0 0 0 0 79 0

2 2 70 0 1 1 0 74 0 0 0 0 69 0

2 3 54 0 4 0 0 11 0 0 0 0 53 0

2 4 46 0 4 0 0 54 0 0 0 0 46 0

3 1 65 0 1 0 0 15 3 0 3 0 61 0

3 2 47 0 0 0 2 130 0 0 0 0 43 0

3 3 55 0 0 1 0 150 0 0 0 5 51 0

3 4 14 0 0 31 13 328 0 0 0 33 14 16

4 1 39 0 0 0 ¯ ¯ 4 0 4 0 34 0

4 2 0 0 0 32 ¯ ¯ 0 0 0 3 0 168

4 3 16 0 0 21 ¯ ¯ 1 0 1 6 9 100

4 4 1 0 6 193 ¯ ¯ 0 0 1 45 11 609

al of Electronic Imaging / April 2000 / Vol. 9(2)

: http://electronicimaging.spiedigitallibrary.org/ on 10/11/2013 Terms of Use: http://spiedl.org/terms

annd

edgs

ntom

teTer

inpixavnly

mup-erinM

tric

rffilaofthes

nd-

se.ry

foraninaFGax

rnertheoffaumfo

anioncathe

ir

cal

t-

ar-

-

ra-

ic-

m-

m-

ov


Downl

When matching multiple characters in a font and size,optimum pair of FG and BG blur SEs can be chosen, athe template grid spacings can be individually optimizfor each character, including use of different grid spacinfor FG and BG templates. The BHMT exhibits significaimmunity to random noise. For example, when randnoise at the 1% level shown in Fig. 2~b! is added to theimage, the numbers of missed and false positive characfrom the BHMT are not significantly changed. The BHMis typically more computationally efficient with coarsgrid spacing, particularly in they direction.

5 Summary

Translationally invariant methods for pattern matchingscanned document images have no dependencies onconnectedness in either the image or template. We hfocused on the most efficient techniques that require oboolean operations. The basic operation, the HMT~whichshould be called the hit-and-miss transform!, is maximallysensitive to noise inboth FG and BG.

Fortunately, there are ways to increase the noise imnity of the HMT. For extensions that use only boolean oerations~as opposed to linear convolution and rank-ordfilters!, and considering the nature of binary pixel noiseboth images and templates, we have argued that the BHis the best choice.

Considerable attention was devoted to distance methat can be derived from~special cases of! the BHMT. Webegan with the well-known relation between the Hausdometric and blurred template matches, and derived simmetrics for the BHMT.This distance provides a measurethe goodness of fit of the template at every location inimage. Comparing the Hausdorff and BHMT mechanismof action on noisy document images, we showed why~1!the bidirectional symmetry of Hausdorff is problematic a~2! the unidirectional but FG/BG-symmetric BHMT provides immunity to both boundary and random pixel noiAn intuitive presentation of these differences is a primagoal of this paper.

We also showed how the BHMT* metric can be derivedin the gray-scale regime starting with distance functionsthe FG and BG image. For most sensitivity, we chooseeight-connected asymmetric function that is generatedone raster scan and increments the distance to the southeast. These distance functions are then dilated with theand BG templates, and combined using the pixelwise Moperator.

The BHMT is useful for binary document image pattematching tasks. We have shown the results of an expment on pattern matching for characters, to illustrateeffects of FG and BG blur, and of regular subsamplingsthe templates. Regular gridding of the templates givesbetter results than using random subsets, for the same nber of elements chosen. We also discussed methodstruncating the matches; this is much more efficient thusing full erosions at every location. These truncatmethods are also applicable to rank operations, whichbe designed to have fewer matching failures thanBHMT itself.


rs

ele

-

T

s

r

nd

i-

r-r

n

References

1. P. Maragos and R. W. Schafer, ‘‘Morphological filters—Part II: Therelations to median, order-statistic, and stack filters,’’IEEE Trans.Acoust., Speech, Signal Process.ASSP-35, 1170–1184~1987!.

2. D. Zhao and D. G. Daut, ‘‘Shape recognition using morphologitransformation,’’ J. Visual Commun. Image Represent2, 230–243~1991!.

3. S. S. Wilson, ‘‘Training structuring elements in morphological neworks,’’ in Mathematical Morphology in Image Processing, E. R.Dougherty, Ed. Marcel Dekker, New York~1992!.

4. E. Kraus and E. Dougherty, ‘‘Segmentation-free morphological chacter recognition,’’Proc. SPIE2181, 14–23~1994!.

5. A. M. Gillies, ‘‘Automatic gemeration of morphological template features,’’ Proc. SPIE1350, 252–261~1990!.

6. D. S. Bloomberg and P. Maragos, ‘‘Generalized hit–miss opetions,’’ Proc. SPIE1350, 116–128~1990!.

7. J. Serra,Image Analysis and Mathematical Morphology, Academic,New York ~1982!.

8. G. Matheron,Random Sets and Integral Geometry, J. Wiley, NewYork ~1975!.

9. H. J. A. M. Heijmans,Morphological Image Operators, p. 296, Aca-demic, New York~1994!.

10. A. Rosenfeld and J. L. Pfalz, ‘‘Distance functions on digital ptures,’’ Pattern Recogn.1, 33–61~1968!.

11. D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, ‘‘Coparing images using the Hausdorff distance,’’IEEE Trans. Pattern.Anal. Mach. Intell.15, 850–863~1993!.

12. L. Vincent, ‘‘New trends in morphological algorithms,’’Proc. SPIE1451, 158–169~1991!.

13. L. Vincent, ‘‘Fast opening functions and morphological granuloetries,’’ Proc. SPIE2300, 253–267~1994!.

14. G. Kopec and P. Chou, ‘‘Document image decoding using Marksource models,’’IEEE Trans. Pattern. Anal. Mach. Intell.16, 602–618 ~1994!.

15. K. Popat~private communication!.

Dan S. Bloomberg received the A.B. de-gree in physics from the University of Cali-fornia at Berkeley, and the Ph.D. degree insolid state physics from Simon Fraser Uni-versity, British Columbia, in 1973. He hasbeen with the Xerox Corporation since1976, where he is currently a principal sci-entist at the Palo Alto Research Center. AtXerox he worked initially on theoretical andexperimental aspects of digital magneticand magnetooptical recording systems. His

current research interests are on techniques for, and applicationsenabled by, the analysis of scanned document images. The tech-niques include multiresolution image morphology for segmentationand maximum likelihood methods for document image decoding;typical applications are high accuracy OCR, the use of digital dataon paper, document image summarization, and image cleanup forprint, display and compression. Dr. Bloomberg is a member of theIEEE, SPIE, ACM, AAAS and Computer Professionals for SocialResponsibility.

Luc Vincent received the Engineering De-gree from Ecole Polytechnique, France, in1986, and the Doctorate in MathematicalMorphology from the Ecole des Mines deParis in 1990. Dr. Vincent then worked atHarvard University as a Postdoctoral Fel-low, and joined Xerox in 1991, where hehas worked on various aspects of OCRand document image analysis, with em-phasis on preprocessing, compression,and segmentation issues. From 1996 to

1998, he was Director of Software Development at Xerox DesktopDocument Systems, in Palo Alto, CA. He then joined Scansoft, Inc.,as Director of Advanced Development and his team played a keyrole in the development of core imaging technology for the Pagisproduct line. Luc Vincent joined Xerox’s Palo Alto Research Center(PARC) in 1999, and presently manages the Imaging Componentsgroup of the Advanced Systems Development Laboratory. Over thepast decade, Luc Vincent has been involved in a variety of image




Downl

processing projects, in such areas as medical applications, biology,industrial inspection, oil exploration, oceanography, etc. His mainresearch interests are in image segmentation, feature extraction,and algorithm design. He has published over fifty papers and book



chapters on such topics. Dr. Vincent has chaired over a dozen con-ferences in fields related to image analysis. He is chairman of theupcoming International Symposium on Mathematical Morphology(ISMM ’2000).


Pattern matching using the blur hit–miss transform

Documents

Transcript of Pattern matching using the blur hit–miss transform