Matching Shapes - Electrical Engineering & Computer Sciences

8
Eighth IEEE International Conference on Computer Vision (July 2001) Matching Shapes Serge Belongie, Jitendra Malik and Jan Puzicha Department of Electrical Engineering and Computer Sciences University of California, Berkeley, CA 94720, USA sjb,malik,puzicha @cs.berkeley.edu Abstract We present a novel approach to measuring similar- ity between shapes and exploit it for object recogni- tion. In our framework, the measurement of similar- ity is preceded by (1) solving for correspondences be- tween points on the two shapes, (2) using the correspon- dences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descrip- tor, the shape context, to each point. The shape con- text at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin–plate splines provide a flexi- ble class of transformation maps for this purpose. Dis- similarity between two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework. Results are presented for sil- houettes, trademarks, handwritten digits and the COIL dataset. 1 Introduction Consider the two 5’s in Figure 1. Regarded as vec- tors of pixel brightness values and compared using norms, they are very different. However, regarded as shapes they appear rather similar to a human observer. Our objective in this paper is to operationalize a notion of shape similarity, with the ultimate goal of using that as a basis for category-level recognition. We approach this as a three stage process: (1) solve the correspon- dence problem between the two shapes, (2) use the cor- respondences to estimate an aligning transform, and (3) compute the distance between the two shapes as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transformation. We wish to solve the problem in considerable gener- ality. Shapes are arbitrary 2D figures, e.g. derived from edges extracted in images of 3D objects, not just silhou- ettes. The family of aligning transforms include affine as well as non-rigid smooth transformations, parametrized using thin plate splines. Matching errors between cor- responding points are computed using both shape and local appearance differences. At the heart of our approach is a tradition of match- ing shapes by deformation that can be traced at least as far back as D’Arcy Thompson. In his classic work On Growth and Form [27], Thompson observed that re- lated but not identical shapes can often be deformed into alignment using simple coordinate transformations. Fis- chler and Elschlager [9] operationalized this approach using energy minimization in a mass-spring model. Grenander et al. [13] developed these ideas in a prob- abilistic setting. Yuille’s [31] version of the deformable template concept fitted hand-crafted parametrized mod- els, e.g. for eyes, in the image domain using gradient descent. Von der Malsburg and collaborators [19] used elastic graph matching for aligning faces. Our primary contribution is a simple and robust al- gorithm for finding correspondences between shapes. Shapes are represented by a set of points sampled from the shape contours (typically 100 or so pixel locations sampled from the output of an edge detector are used). There is nothing special about the points. They are not required to be landmarks or curvature extrema, etc.; as we use more samples we obtain ever better approxima- tions to the underlying shape. We introduce a shape de- scriptor, the shape context, to describe the coarse distri- bution of the rest of the shape with respect to a point on the shape. Finding correspondences between two shapes is then equivalent to finding for each sample point on one shape the sample point on the other shape that has the most similar shape context. Maximizing similarities and enforcing uniqueness naturally leads to a setup as a bipartite graph matching (equivalently, optimal assign- ment) problem. As desired, we can incorporate other

Transcript of Matching Shapes - Electrical Engineering & Computer Sciences

Page 1: Matching Shapes - Electrical Engineering & Computer Sciences

Eighth IEEE International Conference on Computer Vision (July 2001)

Matching Shapes

SergeBelongie,JitendraMalik andJanPuzichaDepartmentof ElectricalEngineeringandComputerSciences

Universityof California,Berkeley, CA 94720,USA�sjb,malik,puzicha� @cs.berkeley.edu

Abstract

We presenta novel approach to measuringsimilar-ity betweenshapesand exploit it for object recogni-tion. In our framework, the measurementof similar-ity is precededby (1) solving for correspondencesbe-tweenpointsonthetwoshapes,(2) usingthecorrespon-dencesto estimatean aligning transform. In order tosolvethecorrespondenceproblem,weattach a descrip-tor, the shapecontext, to each point. The shapecon-text at a referencepoint capturesthedistribution of theremainingpointsrelativeto it, thusoffering a globallydiscriminativecharacterization. Correspondingpointson two similar shapeswill havesimilar shapecontexts,enablingus to solvefor correspondencesasan optimalassignmentproblem. Giventhepoint correspondences,weestimatethe transformationthat bestaligns the twoshapes;regularizedthin–platesplinesprovide a flexi-ble classof transformationmapsfor this purpose. Dis-similarity betweentwo shapesis computedasa sumofmatchingerrorsbetweencorrespondingpoints,togetherwith a term measuringthe magnitudeof the aligningtransform. We treat recognition in a nearest-neighborclassificationframework. Resultsare presentedfor sil-houettes,trademarks,handwrittendigits and the COILdataset.

1 Intr oduction

Considerthe two 5’s in Figure1. Regardedasvec-tors of pixel brightnessvaluesandcomparedusing ���norms, they are very different. However, regardedasshapesthey appearrathersimilar to a humanobserver.Our objective in this paperis to operationalizea notionof shapesimilarity, with theultimategoalof usingthatasa basisfor category-level recognition. We approachthis as a threestageprocess:(1) solve the correspon-denceproblembetweenthetwo shapes,(2) usethecor-respondencesto estimateanaligningtransform,and(3)computethedistancebetweenthetwoshapesasasumofmatchingerrorsbetweencorrespondingpoints,together

with a term measuringthe magnitudeof the aligningtransformation.

We wish to solve theproblemin considerablegener-ality. Shapesarearbitrary2D figures,e.g.derivedfromedgesextractedin imagesof 3D objects,not justsilhou-ettes.Thefamily of aligningtransformsincludeaffineaswell asnon-rigidsmoothtransformations,parametrizedusingthin platesplines. Matchingerrorsbetweencor-respondingpoints are computedusing both shapeandlocalappearancedifferences.

At the heartof our approachis a traditionof match-ing shapesby deformationthat can be tracedat leastas far backasD’Arcy Thompson. In his classicworkOn GrowthandForm [27], Thompsonobservedthatre-latedbut not identicalshapescanoftenbedeformedintoalignmentusingsimplecoordinatetransformations.Fis-chler and Elschlager[9] operationalizedthis approachusing energy minimization in a mass-springmodel.Grenanderet al. [13] developedtheseideasin a prob-abilistic setting.Yuille’s [31] versionof thedeformabletemplateconceptfitted hand-craftedparametrizedmod-els, e.g. for eyes, in the imagedomainusing gradientdescent.Von derMalsburg andcollaborators[19] usedelasticgraphmatchingfor aligningfaces.

Our primary contribution is a simpleandrobust al-gorithm for finding correspondencesbetweenshapes.Shapesarerepresentedby a setof pointssampledfromthe shapecontours(typically 100 or so pixel locationssampledfrom the outputof an edgedetectorareused).Thereis nothingspecialaboutthepoints. They arenotrequiredto be landmarksor curvatureextrema,etc.; aswe usemoresampleswe obtainever betterapproxima-tionsto theunderlyingshape.We introducea shapede-scriptor, theshapecontext, to describethecoarsedistri-bution of therestof theshapewith respectto a point ontheshape.Findingcorrespondencesbetweentwo shapesis then equivalent to finding for eachsamplepoint ononeshapethe samplepoint on the othershapethathasthemostsimilar shapecontext. Maximizing similaritiesandenforcinguniquenessnaturallyleadsto a setupasabipartitegraphmatching(equivalently, optimal assign-ment) problem. As desired,we can incorporateother

Page 2: Matching Shapes - Electrical Engineering & Computer Sciences

Figure1. Examplesof two handwrittendigits.

sourcesof matchinginformationreadily, e.g.similarityof localappearanceat correspondingpoints.

Given the correspondencesat samplepoints,we ex-tendthe correspondenceto the completeshapeby esti-matingan aligning transformationthatmapsoneshapeontotheother. Thetransformationscanbepickedfromany of a numberof families– we have usedEuclidean,affineandregularizedthin platesplinesin variousappli-cations. Oncethe shapesarealigned,computingsim-ilarity scoresandrecognitionby � -NN classificationisrelatively straightforward.

We demonstrateobjectrecognitionin a wide varietyof settings.We dealwith 2D objects,e.g. the MNISTdatasetof handwrittendigits (Fig. 5), silhouettes,andtrademarks(Fig. 7), as well as 3D objects from theColumbiaCOIL dataset,modeledusingmultiple views(Fig. 6). Thesearewidely usedbenchmarksandourap-proachturnsout to be the leadingperformeron all theproblemsfor which thereis comparativedata.

Thestructureof this paperis asfollows. We discussrelatedwork in Section2. In Section3 we thendescribeour shapematchingmethodin detail. Our transforma-tion model is discussedin Section4. We thendiscussthe problemof measuringshapesimilarity in Section5anddemonstrateour proposedmeasureon a variety ofdatabasesincluding handwrittendigits and picturesof3D objects.Finally, we concludein Section6.

2 Prior Work on ShapeMatching

An extensive survey of shapematchingin computervision canbefoundin [28]. Broadlyspeaking,therearetwo approaches:(1) feature-based,and(2) brightness-based.

Feature-basedapproachesinvolve the useof spatialarrangementsof extracted featuressuch as edgesorjunctions. Silhouetteshave beendescribed(and com-pared)usingFourierdescriptors,e.g.[32], skeletonsde-rived usingBlum’s medialaxis transform [26], or di-rectly matchedusing dynamicprogramminge.g. [11].Since silhouettesare limited as shapedescriptorsforgeneral objects1, other approaches[14, 10] treat the

1They ignoreinternalcontoursandaredifficult to extractfrom realimages.

shapeasa setof points in the 2D image,extractedus-ing, say, an edgedetector. Amit and Geman[1] findkey pointsor landmarks,andrecognizeobjectsusingthespatialarrangementsof point sets.However not all ob-jectshave distinguishedkey points(think of a circle forinstance),andusingkey pointsalonesacrificestheshapeinformationavailablein smoothportionsof objectcon-tours. Most closelyrelatedto our approachis theworkof Rangarajanand collaborators[12, 7], which is dis-cussedin Section3.2.

Brightness-basedapproachesmake more direct useof pixel brightnessvalues.Severalapproaches[19, 29, 8]first attemptto find correspondencesbetweenthe twoimages,beforedoing the comparison. This turns outto bequitea challengeasdifferentialopticalflow tech-niquesdo not copewell with the large distortionsthatmustbehandleddueto pose/illuminationvariations.Er-rors in finding correspondencewill causedownstreamprocessingerrorsin the recognitionstage.As an alter-native, therearea numberof methodsthat build clas-sifiers without explicitly finding correspondences.Insuchapproaches,onereliesona learningalgorithmhav-ing enoughexamplesto acquirethe appropriateinvari-ances.Someexamplesinclude[21, 6] for handwrittendigit recognition,[22] for facerecognition,andisolated3D objectrecognition[24].

3 Matching with ShapeContexts

In our approach,a shapeis representedby a discretesetof pointssampledfrom the internalor externalcon-tourson the shape.Thesecanbe obtainedaslocationsof edgepixelsasfoundby anedgedetector, giving usaset ��� � � � � � � � ��� � , ��������� , of � points.They neednot,andtypically will not,correspondto key-pointssuchas maximaof curvatureor inflection points. We pre-fer to samplethe shapewith roughly uniform spacing,thoughthis is alsonot critical. Fig. 2(a,b)showssamplepointsfor two shapes.Assumingcontoursarepiecewisesmooth,we canobtainasgoodanapproximationto theunderlyingcontinuousshapesasdesiredby picking � tobesufficiently large.

For eachpoint ��� on thefirst shape,we want to findthe“best” matchingpoint � � on thesecondshape.Thisis acorrespondenceproblemsimilarto thatin stereopsis.Experiencetheresuggeststhatmatchingis easierif oneusesa rich local descriptor, e.g.a grayscalewindow oravectorof filter outputs,insteadof just thebrightnessata singlepixel or edgelocation. Rich descriptorsreducetheambiguityin matching.

As a key contribution we proposea descriptor, the

Page 3: Matching Shapes - Electrical Engineering & Computer Sciences

(a) (b) (c)

θ

log

r

θ

log

r

θ

log

r

(d) (e) (f) (g)

Figure2. Shapecontext computationandmatching.(a,b)Sampled

edgepointsof two shapes.(c) Diagramof log-polarhistogrambins

usedin computingtheshapecontexts. Weuse5 binsfor � � ��� and12

binsfor . (d-f) Exampleshapecontexts for referencesamplesmarked

by ! " # " $ in (a,b). Eachshapecontext is a log-polarhistogramof the

coordinatesof the restof the point setmeasuredusingthe reference

point asthe origin. (Dark=large value.) Note thevisual similarity of

the shapecontexts for ! and # , which werecomputedfor relatively

similar pointson thetwo shapes.By contrast,theshapecontext for $is quitedifferent.(g) Correspondencesfoundusingbipartitematching,

with costsdefinedby the % & distancebetweenhistograms.

shapecontext, that could play such a role in shapematching.Considerthesetof vectorsoriginatingfrom apoint to all othersamplepointson a shape.Thesevec-tors expressthe configurationof the entireshaperela-tive to the referencepoint. Obviously, this setof ')(+*vectorsis a rich description,sinceas ' getslarge, therepresentationof theshapebecomesexact.

Thefull setof vectorsasa shapedescriptoris muchtoo detailedsinceshapesandtheir sampledrepresenta-tion may vary from one instanceto anotherin a cate-gory. We identify thedistributionoverrelativepositionsasa morerobustandcompact,yet highly discriminativedescriptor. For a point ,�- on the shape,we computeacoarsehistogram.�- of therelativecoordinatesof there-maining '/(0* points,.�- 1 2 3547608 9�:4;,�-=<�1 9>(�,�- 3�? bin 1 2 3 @BA (1)

This histogramis definedto betheshapecontext of ,�- .The descriptorshouldbe moresensitive to differencesin nearbypixels. We thus proposeto usea log-polarcoordinatesystem.An exampleis shown in Fig. 2(c).

Considera point ,�- on the first shapeanda point 9 Con the secondshape. Let DE- C;4FDG1 ,�- H 9 C 3 denotethecost of matchingthesetwo points. As shapecontextsaredistributionsrepresentedashistograms,it is natural2

to usethe I�J teststatistic:

DE- C>4 *K LMN O�P Q .�- 1 2 3�(;. C 1 2 3 R J.�- 1 2 3TSU. C 1 2 3where .�- 1 2 3 and . C 1 2 3 denotethe V -bin normalizedhistogramat ,�- and 9 C , respectively.

Giventhesetof costsD�- C betweenall pairsof pointsWonthefirst shapeandX onthesecondshapewewantto

minimize the total costof matchingsubjectto the con-straint that the matchingbe one-to-one.This is an in-stanceof the squareassignment(or weightedbipartitematching)problem,whichcanbesolvedin YG1 Z)[ 3 timeusingtheHungarianmethod.In ourexperiments,weusethemoreefficientalgorithmof [17]. Theinput to theas-signmentproblemis a squarecost matrix with entriesD�- C . Theresultis a permutation\]1 W 3 suchthat thesum^ - D - _ ` a - b is minimized.

When the numberof sampleson two shapesis notequal, the cost matrix can be madesquareby adding“dummy” nodesto eachpointsetwith aconstantmatch-ing cost of c d . The sametechniquemay alsobe usedeven when the samplenumbersare equalto allow forrobusthandlingof outliers. In this case,a point will bematchedto a “dummy” whenever thereis no realmatchavailable at smallercost than c d . Thus, c d can be re-gardedasa thresholdparameterfor outlierdetection.

Thecost DE- C for matchingpointscaninclude,anad-ditional term basedon the local appearancesimilarityat points ,�- and 9 C . This is particularly useful whenwe are comparingshapesderived from gray-level im-agesinsteadof line drawings.For example,onecanaddacostbasedoncoloror texturesimilarity, SSDbetweensmallgray-scalepatches,distancebetweenvectorsof fil-teroutputs,similarity of tangentangles,andsoon.

3.1 Invarianceand Robustness

A matchingapproachshouldbe (1) invariantunderscalingandtranslation,and(2) robustundersmallaffinetransformations,occlusionandpresenceof outliers. Incertainapplications,onemaywantcompleteinvarianceunderrotation,or perhapseven the full groupof affinetransformations.We now evaluateshapecontext match-ing by thesecriteria.

2Alternatives includeBickel’s generalizationof the Kolmogorov-Smirnov testfor 2D distributions[4], whichdoesnot requirebinning.

Page 4: Matching Shapes - Electrical Engineering & Computer Sciences

Invarianceto translationis intrinsic to theshapecon-text definitionsinceall measurementsaretakenwith re-spectto pointsontheobject.To achievescaleinvariancewe normalizeall radial distancesby the meandistancee betweenthe fTg pointpairsin theshape.

Sinceshapecontexts areextremelyrich descriptors,they areinherentlyinsensitive to smallperturbationsofpartsof the shape.While we have no theoreticalguar-anteeshere,robustnessto small affine transformations,occlusionsandpresenceof outliersis evaluatedexperi-mentallyin Sect.4.1.

In the shapecontext framework, we canprovide forcompleterotationinvarianceif thisis desirablefor anap-plication. Insteadof usingtheabsoluteframefor com-puting the shapecontext at eachpoint, onecanusethetangentvectorat eachpoint as the positive h -axis. Inthis way the referenceframeturnswith the tangentan-gle, andtheresultis a completelyrotationinvariantde-scriptor. In the extendedversionof this paper[3] wedemonstratethis experimentallyusingthe datasetfromKimia andcollaborators[26].

3.2 Relatedwork

The most comprehensive body of work on shapecorrespondencein this generalsetting is the work ofRangarajanandcollaborators[12, 7]. They developedan iterative optimizationalgorithm to determinepointcorrespondencesandunderlyingimagetransformationsjointly, where typically some generic transformationclassis assumed,e.g.affine or thin platesplines. Thecostfunction that is beingminimizedis thesumof Eu-clideandistancesbetweena point on the transformedfirst shapeandthesecondshape.Thissetsupachicken-and-egg problem: the distancesmake senseonly whenthereis at leastaroughalignmentof shape.Jointestima-tion of correspondencesandshapetransformationleadsto a difficult, highly non-convex optimizationproblem,which is addressedusingdeterministicannealing[12].Theshapecontext is averydiscriminativepointdescrip-tor, facilitatingeasyandrobustcorrespondencerecoveryby incorporatingglobal shapeinformation into a localdescriptor.

As far aswe areawareof, theshapecontext descrip-tor and its usefor matching2D shapesis novel. Themostcloselyrelatedideain pastwork is thatdueto John-sonandHebert[16] in theirwork onrangeimages.Theyintroducedarepresentationfor matchingdensecloudsoforiented3D pointscalledthe “spin image”. A spin im-ageisa2D histogramformedbyspinningaplanearoundanormalvectoronthesurfaceof theobjectandcounting

thepointsthatfall insidebinsin theplane.

4 Modeling Transformations

Givena setof correspondencesbetweentwo shapes,onecanproceedto estimatea transformationthatmapsthemodelinto thetarget.For thispurposetherearesev-eraloptions;perhapsmostcommonis theaffine model.In this work, we usethe thin platespline(TPS)model,which is commonlyusedfor representingflexible co-ordinatetransformations[30, 25]. Bookstein[5], forexample, found it to be highly effective for modelingchangesin biologicalforms. Thethin platesplineis the2D generalizationof thecubicspline. In its regularizedform, which is discussedbelow, theTPSmodelincludestheaffine modelasa specialcase.We will now providesomebackgroundinformationon theTPSmodel.

Let i j denotethe target function values at corre-spondinglocations k�jlnm h j o p j q in the plane, withr lts o u o v v v o f . In particular, we will set i j equalto h wj and p wj in turn to obtainonecontinuoustransfor-mation for eachcoordinate. We assumethat the loca-tions m h�j o p j q areall differentandarenot collinear. TheTPSinterpolant x�m hTo p q minimizesthe bendingenergyy z l+{ {|x g} }E~ u x g} �E~ x g� � � h � p andhastheform:x�m h�o p q�l�� � ~ � } h ~ � � p~��� j ��� � j �Um � m h�j o p j q]�Um h�o p q � q (2)

where �Gm � q;l�� g � � � � . In order for x�m h�o p q to havesquareintegrablesecondderivatives,we requirethat�� j �T� � j)l�� and �� j ��� � j h j)l �� j �T� � j p j)l�� (3)

Togetherwith the interpolationconditions,x�m h�j o p j q�li j , this yieldsa linearsystemfor theTPScoefficients:��� ��>� ��� � � ��� l � i �U� (4)

where� j ��l�Gm � m h j o p j q��;m h�� o p � q � q , the

rth row of

�is m s o h j o p j q , � and i arecolumnvectorsformedfrom� j and i j , respectively, and � is thecolumnvectorwithelements� � o � } o � � . Wewill denotethe m f ~|� q ��m f ~|� qmatrix of this systemby � . As discussede.g.in [25], �is nonsingularandwe canfind thesolutionby inverting� . If wedenotetheupperleft f)�Gf blockof �E  � by ¡ ,thenit canbeshown that

y zG¢ i � ¡Ei|l � ��� � .

Page 5: Matching Shapes - Electrical Engineering & Computer Sciences

10 20 30 40 50 60 70

10

20

30

40

50

60

7010 20 30 40 50 60 70

10

20

30

40

50

60

7010 20 30 40 50 60 70

10

20

30

40

50

60

70

10 20 30 40 50 60 70

10

20

30

40

50

60

7010 20 30 40 50 60 70

10

20

30

40

50

60

7010 20 30 40 50 60 70

10

20

30

40

50

60

70

Figure3. Illustrationof thematchingprocessappliedto theexample

of Fig. 1. Top row: 1st iteration. Bottomrow: 5th iteration. Left col-

umn:estimatedcorrespondencesshown relative to transformedmodel,

with tangentvectorsshown. Middle column: estimatedcorrespon-

dencesshown relative to untransformedmodel. Right column: result

of transformingthemodelbasedon thecurrentcorrespondences;this

is theinput to thenext iteration.Thegrid pointsillustratetheinterpo-

latedtransformationover £�¤ . Herewe have useda regularizedTPS

modelwith ¥ ¦]§)¨ .When thereis noisein the specifiedvalues © ª , one

maywishto relaxtheexactinterpolationrequirementbymeansof regularization.This is accomplishedby mini-mizing «;¬ ­�® ¯°0±ª ²T³ ´ © ªTµ¶­ ´ · ª ¸ ¹ ª º º »�¼U½�¾ ¿ . Thereg-ularization parameter ½ , a positive scalar, controlstheamountof smoothing;thelimiting caseof ½�¯+À reducesto exact interpolation.As demonstratedin [30], we cansolve for theTPScoefficientsin theregularizedcasebyreplacingthematrix Á by Á�¼)½�¾ , where¾ is the Â/Ã|Âidentity matrix. It is interestingto notethat the highlyregularizedTPSmodeldegeneratesto the least-squaresaffinemodel.

To addressthe dependenceof ½ on the datascale,suppose · ª ¸ ¹ ª º and ´ ·�Ī ¸ ¹ Ī º arereplacedby ´ ÅT· ª ¸ Å ¹ ª ºand ´ ÅT·�Ī ¸ Å ¹ Ī º , respectively, for somepositive constantÅ . Then it can be shown that the parametersÆG¸ Ç�¸ ¾ ¿of theoptimal thin platesplineareunaffectedif ½ is re-placedby Å » ½ . This simplescalingbehavior suggestsa normalizeddefinitionof the regularizationparameter.Let Å againrepresentthe scaleof the point setasesti-matedby the meanedgelengthbetweentwo points inthe set. Thenwe candefine ½ in termsof Å and ½�È , ascale-independentregularizationparameter, via thesim-ple relation ½/¯ Å » ½�È .

Thecompletematchingalgorithmis obtainedby al-ternating betweenthe stepsof recovering correspon-dencesand estimating transformations(see Fig. 3).We usually employ a fixed numberof iterations,typi-cally threein largescaleexperiments,but morerefinedschemesarepossible.OnaregularPentiumIII 500MHz

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0.3 0.4 0.5 0.6 0.7 0.8 0.9 10.6

0.7

0.8

0.9

1

1.1

1.2

1.3

1.4

0.7 0.8 0.9 1 1.1 1.2 1.3 1.40.6

0.8

1

1.2

1.4

1.6

1.8

2

0.2 0.4 0.6 0.8 1 1.2 1.40.4

0.6

0.8

1

1.2

1.4

1.6

1.8

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.30

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−0.4

−0.3

−0.2

−0.1

0

0.1

0.2

0.3

0.4

−0.2 0 0.2 0.4 0.6 0.8 10.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

0.02 0.04 0.06 0.08−0.05

0

0.05

0.1

0.15

0.2

Degree of Deformation

Err

or

−0.02 0 0.02 0.04 0.06−0.05

0

0.05

0.1

0.15

Noise Level

Err

or

−1 −0.5 0 0.5 1 1.5 2 2.5 3−0.05

0

0.05

0.1

0.15

0.2

0.25

Outlier to Data Ratio

Err

or

0.02 0.04 0.06 0.08−0.05

0

0.05

0.1

0.15

0.2

Degree of Deformation

Err

or

−0.02 0 0.02 0.04 0.06−0.05

0

0.05

0.1

0.15

0.2

Noise Level

Err

or

−1 −0.5 0 0.5 1 1.5 2 2.5 3−0.05

0

0.05

0.1

0.15

0.2

Outlier to Data Ratio

Err

or

Figure 4. Empirical robustnessevaluation, following [7]. Two

model pointsetsare shown in the first column of rows 1 and 2.

Columns2-4 show examplesof point setsfor thedeformation,noise,

andoutlier tests.Row 3 shows errorasa functionof thedeformation,

noise,or outlier to dataratio for ourmethod( É ), [7]’smethod( Ê ) and

iteratedclosestpoint ( Ë ) for thefish shapein row 1. Row 4 shows the

resultsfor theChinesecharacterin row 2. Theerrorbarsindicatethe

std.dev. of theerrorover 100randomtrials.

workstationthis processtakesroughly200mswhentheshapeshave100samplepointseach.

4.1 Empirical RobustnessEvaluation

In order to study the robustnessof our proposedmethod,we performedthesyntheticpoint setmatchingexperimentsdescribedin [7]. Theexperimentsarebro-ken into threepartsdesignedto measurerobustnesstodeformation,noise,andoutliers. (The latter testseachincludea “moderate”amountof deformation.)In eachtest,wesubjectedthemodelpointsetto oneof theabovedistortionsto createa“target” pointset.Wethenranouralgorithm to find the bestwarping betweenthe modelandthetarget. Finally, theperformanceis quantifiedbycomputingtheaveragedistancebetweenthecoordinatesof thewarpedmodelandthoseof thetarget.Theresultsare shown in Fig. 4. More detailsof the experimentsmaybefoundin [2].

Page 6: Matching Shapes - Electrical Engineering & Computer Sciences

5 ShapeSimilarity and Recognition

We define the shape distance Ì)Í Î|Ï Ð�Ñ betweenshapesÎ and Ð asaweightedsumof threeterms:shapecontext distance,imageappearancedistanceandbend-ing energy. We will demonstratethe useof this dis-tancefor recognitionin a nearest-neighborclassifierfora numberof differentobjectrecognitionproblems.

We measureshapecontext distancebetweenshapesÎ and Ð as the symmetric sum of shape con-text matching costs over best matching points, i.e.Ì sc Í Î�Ï Ð�ÑÓÒÕÔÖG×0Ø Ù Ú�Û Ü Ý�Þ|ß à á Ù âGã Í äTÏ å¶Í æ Ñ Ñ;çÔè × á Ù â Û Ü Ý]ÞGß à Ø Ù Ú�ã Í äTÏ å;Í æ Ñ Ñ where å�Í é Ñ denotestheestimatedTPSshapetransformation.

Often there is additional appearanceinformationavailable that is not capturedby our notion of shape,e.g.local imagepatches,textural information,color, etc.As a key benefitof the shapematchingframework, thedistortedimagecanbewarpedbackinto a normalformafter recovery of the underlying2D imagetransforma-tion, thuscorrectingfor distortionsof theimageappear-ance. We useda term Ì ac Í Î�Ï Ð�Ñ for appearancecostwhichis thesumof squareddifferencesin Gaussianwin-dowsaroundcorrespondingpoints.

Thethird termcorrespondsto the ‘amount’ of trans-formationnecessaryto align theshapes.In theTPScasethe bendingenergy Ì be Í Î�Ï Ð�Ñ�ÒBê>ëTì)ê is a naturalmeasure.

5.1 Digit Recognition

We begin with resultson the well-known MNISTdatasetof handwrittendigits, which consistsof 60,000training and10,000testdigits[21]. Matchingused100point samplesselectedfrom the Canny edgesof eachdigit image.We employeda TPStransformationmodelandused3 iterationsof shapecontext matchingandTPSre-estimation.WeusedanearestneighborclassifierwithÌ/Í Î�Ï Ð�Ñ asdefinedabove.

Nearestneighborclassifiershave thepropertythatasthe numberof examples í in the training set îðï ,the1-NN error convergesto a value ñ�ò óGô , where óGôis the BayesRisk (for õ -NN, by making õîöï andõ�÷ í0î�ø , the error î�óGô ). However, what mattersinpracticeis theperformancefor small í , andthisgivesusawayto comparedifferentsimilarity/distancemeasures.In Fig. 5, our shapedistanceis comparedto SSD(sumof squareddifferencesbetweenpixel brightnessvaluesof imagesregardedasvectors).

On the MNIST dataset nearly 30 algorithmshave been compared (http://www.research.att.com/

0 2000 4000 6000 8000 100000

0.05

0.1

0.15

0.2

0.25

0.3

size of training set

test

set

err

or r

ate

SSDSD

103

104

0.01

0.02

0.03

0.04

0.05

0.06

size of training set

test

set

err

or r

ate

K=1K=3K=5

Figure5.Handwrittendigit recognitionontheMNIST dataset.Left:Testseterrorsof a1-NNclassifierusingSSDandShapeDistance(SD)measures.Right: Detail of performancecurve for ShapeDistance,includingresultswith trainingsetsizesof 15,000and20,000.Resultsareshown onasemilog-ù scalefor ú0û)ü ý þ ý ÿ nearestneighbors.

� yann/exdb/mnist/index.html). Thelowesttestseterrorratepublishedat this timeis ø � � � for aboostedLeNet-4with a training set of size � ø Ï ø ø ø���� ø syntheticdis-tortionsper training digit. Our error rateusing20,000trainingexamplesand -NN is ø � � � .

5.2 MPEG-7 ShapeSilhouetteDatabase

Our next experiment involves the MPEG-7 shapesilhouettedatabase,specificallyCore ExperimentCE-Shape-1 part B, which measuresperformance ofsimilarity-basedretrieval [15]. Thedatabaseconsistsof1400images:70 shapecategories,20 imagesper cate-gory. Theperformanceis measuredusingtheso-called“bullseye test,” in which eachimageis usedasa queryandonecountsthenumberof correctimagesin thetop40 matches.

As this experimentinvolves intricateshapeswe in-creasedthe numberof samplesfrom 100 to 300. Insomecategoriesthe shapesappearrotatedandflipped,which we addressusing a modified distancefunction.The distancedistÍ GÏ ��Ñ betweena referenceshapeanda queryshape� is definedas

distÍ �|Ï >Ñ�Ò Þ|ß à � distÍ �|Ï �� Ñ Ï distÍ �|Ï �� Ñ Ï distÍ �|Ï �� Ñ �where � Ï � and � denotethreeversionsof : un-changed,verticallyflipped,andhorizontallyflipped.

With thesechangesin placebut otherwiseusingthesameapproachasin the MNIST digit experiments,weobtainaretrieval rateof 76.51%.Currentlythebestpub-lished performanceis achieved by Latecki et al. [20],with a retrieval rateof 76.45%,followedby Mokhtarianetal. [23] at 75.44%.

Page 7: Matching Shapes - Electrical Engineering & Computer Sciences

0 2 4 6 8 10 120

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

average no. of prototypes per object

test

set

err

or r

ate

SSD SD SD−proto

Figure6. Left: 3D object recognitionusing the COIL-20 dataset.

Comparisonof testseterrorfor SSD,ShapeDistance(SD),andShape

Distancewith � -medoidprototypes(SD-proto)vs. numberof proto-

type views. For SSD and SD, we varied the numberof prototypes

uniformly for all objects.For SD-proto,thenumberof prototypesper

objectdependedonthewithin-objectvariationaswell asthebetween-

objectsimilarity. Right: � -medoidprototypeviews for two different

3D objects,usinganaverageof 4 viewsperobject.With thisapproach,

resourcesareallocatedadaptively dependingon thevisualcomplexity

of anobject. In this exampleweobserve thattheAnacinbox requires

twiceasmany views asthebabypowderbottle.

5.3 Columbia COIL-20 Database

Ournext experimentinvolvesthe20commonhouse-hold objectsfrom theCOIL-20 database[24]. Eachob-ject wasplacedon a turntableandphotographedevery� �

for a total of 72 views per object. We preparedourtraining setsby selectinga numberof equally spacedviews for eachobjectandusingtheremainingviews fortesting.Thematchingalgorithmandshapedistanceareexactly thesameasfor digits.

Fig. 6(a) shows the performanceof a � -NN classi-fier using our shapedistanceas well as SSD (sum ofsquareddifferences).SSD performsvery well on thiseasydatabasedueto thelackof variationin lighting [14](PCA just makesit faster).

In a companionpaper[2] we recentlydevelopedanovel editingalgorithmbasedon shapecontext similar-ity and � -medoidclustering.Theeditingalgorithmis il-lustratedin Fig.6(b). Moreviewsarechosenfor visuallycomplex categories.This ideais relatedto the“aspect”conceptas discussedin [18]. The curve marked SC-protoin Fig. 6(a)showstheimprovedclassificationper-formanceusingthis prototypeselectionstrategy insteadof equally-spacedviews. Notethatweobtaina2.4%er-ror ratewith anaverageof only 4 two-dimensionalviewsfor eachthree-dimensionalobject,thanksto theflexibil-ity providedby thematchingalgorithm.

5.4 Trademark Retrieval

The automaticidentificationof trademarkinfringe-ment is of commercialinterest. Currently, trademarksare broadly classifiedaccordingto the Vienna code,andinfringementsaredetectedby manuallylooking forcloseperceptualsimilarity in an appropriatecategory.Shape,togetherwith text and texture, is key in defin-ing perceptualsimilarity. Usingournotionof shapedis-tance,Fig. 7 depictsnearestneighborretrieval resultsfrom a databaseof 300 trademarks.We experimentedwith eightdifferentquerytrademarksfor eachof whichthe databasecontainedat leastone potential infringe-ment. It is clearlyseenthat thepotentialinfringementsareeasilydetectedandappearasmostsimilaronthetopranksdespitesubstantialvariationof the actualshapes.It hasbeenmanually verified that no visually similartrademarkhasbeenmissedby thealgorithm.

6 Conclusion

We havepresenteda new approachto theanalysisofshape. A key characteristicof our approachis the es-timationof shapesimilarity andcorrespondencesbasedon a novel descriptor, theshapecontext. In our experi-mentswe have demonstratedexcellentperformanceonawide varietyof datasets,bothof 2D and3D objects.

Acknowledgments This research is supported by(ARO) DAAH04-96-1-0341,the Digital Library GrantIRI-9411334,an NSF graduateFellowship for S.B andthe GermanResearchFoundationby DFG grant PU-165/1. We wish to thank H. Chui and A. Rangarajanfor providing thesynthetictestingdatausedin 4.1.

References

[1] Y. Amit, D. Geman,and K. Wilder. Joint induction of shapefeaturesandtreeclassifiers. IEEE Trans.PAMI, 19(11):1300–1305,November1997.

[2] S. Belongie,J. Malik, andJ. Puzicha. Shapecontext: A newdescriptorfor shapematchingandobjectrecognition. In NIPS,November2000.

[3] S.Belongie,J.Malik, andJ.Puzicha.Shapematchingandobjectrecognitionusingshapecontexts. TechnicalReportUCB//CSD-00-1128,UC Berkeley, January2001.

[4] P. J. Bickel. A distribution free versionof the Smirnov two-sampletest in the multivariatecase. Annalsof MathematicalStatistics, 40:1–23,1969.

[5] F. L. Bookstein.Morphometrictoolsfor landmarkdata: geom-etry andbiology. CambridgeUniv. Press,1991.

Page 8: Matching Shapes - Electrical Engineering & Computer Sciences

query 1: 0.086 2: 0.108 3: 0.109

query 1: 0.066 2: 0.073 3: 0.077

query 1: 0.046 2: 0.107 3: 0.114

query 1: 0.046 2: 0.107 3: 0.114

query 1: 0.117 2: 0.121 3: 0.129

query 1: 0.096 2: 0.147 3: 0.153

query 1: 0.078 2: 0.116 3: 0.122

query 1: 0.092 2: 0.10 3: 0.102

Figure7. Trademarkretrieval resultsbasedonadatabaseof 300dif-

ferentreal-world trademarks.Weusedanaffine transformationmodel

anda weightedcombinationof shapecontext similarity � sc andthe

sumover local tangentorientationdifferences.

[6] C. BurgesandB. Scholkopf. Improving theaccuracy andspeedof supportvectormachines.In NIPS, pages375–381,1997.

[7] H. ChuiandA. Rangarajan.A new algorithmfor non-rigidpointmatching.In CVPR, volume2, pages44–51,June2000.

[8] T. Cootes,D. Cooper, C. Taylor, andJ. Graham. Active shapemodels- their training and application. ComputerVision andImage Understanding(CVIU), 61(1):38–59,Jan.1995.

[9] M. FischlerandR. Elschlager. Therepresentationandmatchingof pictorial structures.IEEE Trans.Computers, C-22(1):67–92,1973.

[10] D. Gavrila andV. Philomin.Real-timeobjectdetectionfor smartvehicles.In Proc.7th Int. Conf. ComputerVision, pages87–93,1999.

[11] Y. GdalyahuandD. Weinshall. Flexible syntacticmatchingofcurvesandits applicationto automatichierarchicalclassificationof silhouettes.IEEE Trans.PAMI, 21(12):1312–1328,1999.

[12] S. Gold, A. Rangarajan,C.-P. Lu, S. Pappu,andE. Mjolsness.New algorithmsfor 2D and3D point matching:poseestimationandcorrespondence.PatternRecognition, 31(8),1998.

[13] U. Grenander, Y. Chow, and D. Keenan. HANDS: A PatternTheoretic StudyOf Biological Shapes. Springer, 1991.

[14] D. Huttenlocher, R. Lilien, andC. Olson. View-basedrecogni-tion usingan eigenspaceapproximationto the Hausdorff mea-sure.PAMI, 21(9):951–955,Sept.1999.

[15] S. JeanninandM. Bober. Descriptionof coreexperimentsforMPEG-7 motion/shape.TechnicalReport ISO/IEC JTC 1/SC29/WG11MPEG99/N2690,MPEG-7,Seoul,March1999.

[16] A. E. JohnsonandM. Hebert.Recognizingobjectsby matchingorientedpoints. In CVPR, pages684–689,1997.

[17] R. Jonker andA. Volgenant. A shortestaugmentingpathalgo-rithm for denseandsparselinearassignmentproblems.Comput-ing, 38:325–340,1987.

[18] J.J.KoenderinkandA. J.vanDoorn.Theinternalrepresentationof solid shapewith respectto vision. Biological Cybernetics,32:211–216,1979.

[19] M. Lades,C. Vorbruggen,J. Buhmann,J. Lange,C. von derMalsburg, R. Wurtz, and W. Konen. Distortion invariant ob-ject recognitionin the dynamiclink architecture. IEEE Trans.Computers, 42(3):300–311,March1993.

[20] L. J. Latecki, R. Lakamper, and U. Eckhardt. Shapedescrip-torsfor non-rigidshapeswith asingleclosedcontour. In CVPR,pages424–429,2000.

[21] Y. LeCun,L. Bottou,Y. Bengio,andP. Haffner. Gradient-basedlearningappliedto documentrecognition. Proceedingsof theIEEE, 86(11):2278–2324,November1998.

[22] B. Moghaddam,T. Jebara,and A. Pentland. Bayesianfacerecognition.PatternRecognition, 33(11):1771–1782,November2000.

[23] F. Mokhtarian,S. Abbasi, and J. Kittler. Efficient and robustretrieval by shapecontentthrough curvature scalespace. InA. W. M. SmeuldersandR. Jain,editors,Image DatabasesandMulti-MediaSearch, pages51–58.World Scientific,1997.

[24] H. MuraseandS. Nayar. Visual learningandrecognitionof 3-D objectsfrom appearance.Int. Journal of ComputerVision,14(1):5–24,Jan.1995.

[25] M. J.D. Powell. A thin platesplinemethodfor mappingcurvesinto curvesin two dimensions.In ComputationalTechniquesandApplications(CTAC95), Melbourne,Australia,1995.

[26] D. Sharvit,J.Chan,H. Tek, andB. Kimia. Symmetry-basedin-dexing of imagedatabases.J. VisualCommunicationandImageRepresentation, 1998.

[27] D. W. Thompson.OnGrowthandForm. Dover, 1917.

[28] R. C. Veltkampand M. Hagedoorn. Stateof the art in shapematching.TechnicalReportUU-CS-1999-27,Utrecht,1999.

[29] T. Vetter, M. J.Jones,andT. Poggio.A bootstrappingalgorithmfor learninglinear modelsof object classes. In CVPR, pages40–46,1997.

[30] G. Wahba.SplineModelsfor ObservationalData. SIAM, 1990.

[31] A. Yuille. Deformabletemplatesfor facerecognition.J. Cogni-tiveNeuroscience, 3(1):59–71,1991.

[32] C. Zahn and R. Roskies. Fourier descriptorsfor planeclosedcurves. IEEETrans.Computers, 21(3):269–281,March1972.