IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 34, NO. 5, MAY 2012, p. 943

Nonlinear Shape Registration without Correspondences

Csaba Domokos, Jozsef Nemeth, and Zoltan Kato, Senior Member, IEEE

Abstract—In this paper, we propose a novel framework to estimate the parameters of a diffeomorphism that aligns a known shape and its distorted observation. Classical registration methods first establish correspondences between the shapes and then compute the transformation parameters from these landmarks. Herein, we trace back the problem to the solution of a system of nonlinear equations which directly gives the parameters of the aligning transformation. The proposed method provides a generic framework to recover any diffeomorphic deformation without established correspondences. It is easy to implement, not sensitive to the strength of the deformation, and robust against segmentation errors. The method has been applied to several commonly used transformation models. The performance of the proposed framework has been demonstrated on large synthetic datasets as well as in the context of various applications.

Index Terms—Image registration, diffeomorphism, nonlinear transformation, planar homography, thin plate spline, shape matching.


1 INTRODUCTION

Registration is a fundamental problem in various fields of image processing where images taken from different views, at different times, or by different sensors need to be compared or combined. In a general setting, one is looking for a transformation which aligns two images such that one image (called the observation) becomes similar to the second one (called the template). Most of the existing approaches assume a linear transformation (rigid-body, similarity, affine) between the images, but in many applications nonlinear deformations [1] (e.g. projective, polynomial, elastic) need to be considered. Typical applications include visual inspection [2], object matching [3] and medical image analysis [4]. Good surveys can be found in [5], [6].

Registration methods can be divided into two main categories: Landmark-based methods and featureless (or area-based) approaches. Landmark-based methods rely on extracted corresponding landmarks [5], [7]; the aligning transformation is then recovered as a solution of a system of equations constructed from the established correspondences. Unfortunately, the correspondence problem itself is far from trivial, especially in the case of strong deformations. On

• Csaba Domokos, Jozsef Nemeth and Zoltan Kato are with the Department of Image Processing and Computer Graphics, University of Szeged, P.O. Box 652, H-6701 Szeged, Hungary. Fax: +36 62 546 397, Tel: +36 62 546 399, Email: {dcs,nemjozs,kato}@inf.u-szeged.hu.

• This work has been partially supported by the Hungarian Scientific Research Fund – OTKA K75637; the grant CNK80370 of the National Innovation Office (NIH) & the Hungarian Scientific Research Fund (OTKA); a PhD Fellowship of the University of Szeged, Hungary; the European Union and co-financed by the European Regional Development Fund within the projects TÁMOP-4.2.2/08/1/2008-0008 and TÁMOP-4.2.1/B-09/1/KONV-2010-0005; and by ContiTech Fluid Automotive Hungária Ltd. Prostate images were provided by Le2i-UMR CNRS 5158, Université de Bourgogne, Le Creusot, France and the Computer Vision and Robotics Group, Universitat de Girona, Girona, Spain.

Manuscript received 5 Mar. 2010; revised 9 June 2011; accepted 26 Sept. 2011; published online 8 Oct. 2011.

the other hand, many featureless approaches estimate the transformation parameters directly from image intensity values over corresponding regions [8] or define a cost function based on a similarity metric and find the solution via a complex nonlinear optimization procedure [9].

A common assumption of both approaches is that the strength of the transformation is limited or close to identity: The neighborhood of a landmark is searched for correspondences, while area-based methods may get stuck in local minima for strong deformations. Furthermore, both approaches rely on the availability of rich radiometric information: Landmark-based methods usually match local brightness patterns around salient points [10], while featureless methods make use of intensity correlation between image patches. In many cases, however, such information may not be available (e.g. binary shapes) or is very limited (e.g. prints, images of traffic signs). Another common problem is strong radiometric distortion (e.g. X-ray images, differently exposed images). Although there are some time-consuming methods to cope with brightness change across image pairs [11], such image degradations are difficult to handle. While these issues make classical brightness-based features unreliable, thus challenging current registration techniques, the segmentation of such images can be straightforward or readily available within a particular application. Therefore a valid alternative is to solve the registration problem using a binary representation (i.e. segmentation) of the images [12].

In this paper, based on our previous work [13], [14], we propose a novel framework to estimate nonlinear diffeomorphic transformations without establishing correspondences or restricting the strength of the deformation. The basic idea is to set up a system of nonlinear equations by integrating a set of nonlinear functions over the image domains and then solve it by the classical Levenberg-Marquardt algorithm [15]. If perfect graylevel images were available without any radiometric distortion, then the estimation of an aligning homeomorphism could be traced back to the solution of a linear system of

0000–0000/00$00.00 © 2012 IEEE


equations [16]. In real applications, however, such a strict requirement cannot be satisfied. Herein, we will show that registration can be solved without making use of any intensity information. The main contribution is a unifying framework, which provides the registration of planar shapes under various diffeomorphic deformations (e.g. planar homography, polynomial or thin plate spline transformations). We have conducted a comprehensive test on a large set of synthetic images to demonstrate the performance and robustness of the proposed approach. The method has been successfully applied to a variety of real images, e.g. alignment of hip prosthesis X-ray images, registering traffic signs and handwritten characters, or visual inspection of printed signs on tubular structures.

The paper is organized as follows. After a brief review of related methods, the general registration framework is presented in Section 2, followed by the description of commonly used deformation models in Section 3. Section 4 and Section 5 discuss numerical implementation issues. Finally, experimental results and comparative tests are presented in Section 6, and Section 7 concludes the paper.

1.1 State of the art

While registration of grayscale or color images is well studied [17], [18], [19], [20], [21], [22], [23], the alignment of binary shapes [24], [25], [3], [26], [27], [28], [29] has received less attention. Most of the current approaches are restricted to the affine group [24], [25], [30], [31]. In [24], Domokos and Kato showed that it is possible to trace back the affine matching problem to an exactly solvable polynomial system of equations. Moments and invariants also provide an efficient tool for recovering linear deformations [25]. A geometric, variational framework is introduced in [30], which uses active contours to simultaneously segment and register images. The method [30] is applied to medical image registration, where 2D and 3D rigid body transformations are considered. Another statistics-based algorithm is proposed in [31] for registration of edge-detected images, which utilizes edge pixel matching to determine the "best" translations. Then a statistical procedure, namely the McNemar test, is used to determine which candidate solutions are not significantly worse than the best ones. This allows for the construction of confidence regions in the registration parameters. Note that this method is limited to solving for 2D translations only [31].

In this paper, we are interested in nonlinear alignment of shapes, which is a more challenging problem. The most common nonlinear registration methods are based on point correspondences [7], [32], [3]. Although there are robust keypoint detectors like SIFT [10] or SURF [33], these rely on rich intensity patterns, thus their use is limited in binary registration. Landmark-based nonlinear shape matching has been addressed by Belongie et al. [3]. The method first searches for point correspondences between the two objects, then estimates the transformation using a generic thin plate spline model. The point matches are established using a novel similarity metric, called shape context, which consists of constructing a log-polar histogram of surrounding edge pixels. The advantage compared to traditional landmark-based approaches

is that landmarks need not be salient points and radiometric information is not involved. Basically, the method can be regarded as matching two point sets, each of them being a dense sample from the corresponding shape's boundary. Obviously, there is no guarantee that point pairs correspond exactly because of the sampling procedure. However, having a dense sample will certainly keep mismatch errors at a minimum. The correspondences are established by solving a linear assignment problem, which requires time-consuming optimization methods. For example, the complexity of the Hungarian method adopted in [3] is O(N^3).
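As a minimal sketch of the assignment step described above: the correspondence between two dense boundary samples can be posed as a linear assignment problem over a pairwise cost matrix. The shapes, the noise level, and the plain Euclidean cost below are all hypothetical stand-ins (the method in [3] uses shape-context costs); SciPy's `linear_sum_assignment` solves the same optimization the Hungarian method solves, via a modified Jonker-Volgenant algorithm.

```python
# Hypothetical example: match a permuted, noisy copy of a boundary sample
# back to the original by solving a linear assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
template_pts = rng.random((50, 2))                  # dense boundary sample
perm = rng.permutation(50)                          # unknown correspondence
observ_pts = template_pts[perm] + 0.001 * rng.standard_normal((50, 2))

# Cost matrix of all candidate pairings (Euclidean distance, for illustration).
cost = np.linalg.norm(template_pts[:, None, :] - observ_pts[None, :, :], axis=2)
row, col = linear_sum_assignment(cost)              # optimal one-to-one matching

# The optimal assignment cost never exceeds the cost of the true (noise-only)
# matching; for small noise it typically recovers the permutation as well.
true_cost = np.linalg.norm(template_pts[perm] - observ_pts, axis=1).sum()
print(cost[row, col].sum() <= true_cost + 1e-12, (perm[col] == row).mean())
```

For N points the cost matrix alone is O(N^2) memory and the solver is cubic in the worst case, which is the cost the text refers to.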

An important class of nonlinear transformations is the plane-to-plane homography, which aligns two images of the same planar object taken from different views. Lepetit and Fua proposed a method [17] for keypoint recognition on grayscale images. The main idea is to find keypoints during a training phase, using a set of projectively distorted images of the target object. Although the recognition of keypoints becomes very fast, the training phase is very time consuming. In [18], a Fourier domain based approach is presented using intensities for computing the image-to-image transformation. Images are transformed into the Fourier domain, where the transformation parameters are computed using cross correlation methods. In [26], planar homography is computed in the Fourier domain from a starting affine estimation using the shape contours. In [27], the concept of the characteristic line is employed to show some useful properties of a planar homography matrix, which relate to the Euler angles of the planar pattern.

Stochastic models with iterative optimization techniques are also quite popular in this domain: In [7], Guo et al. propose a method to register shapes which underwent diffeomorphic distortions, where simulated annealing is used to estimate point correspondences between the boundary points of the shapes. A Brownian motion model in the group of diffeomorphisms has been introduced in [19]. The authors exploit a prior for warps based on a simple invariance principle under warping. An estimation based on this prior guarantees an invertible, source-destination symmetric, and warp-invariant warp. The maximum-likelihood warp is then computed via a PDE scheme. [20] uses a Markov Random Field model to solve the registration problem. The deformation is described by a field of discrete variables, representing displacements of (blocks of) pixels. Exact maximum a posteriori inference is intractable, hence a linear programming relaxation technique is used. In [34], the registration problem is formulated as probabilistic inference using a generative model and the expectation-maximization algorithm. The authors define a data-driven technique which makes use of shape features. This gives a hybrid algorithm which combines generative and discriminative models. The measure of similarity is defined in terms of the amount of transformation required. The shapes are represented by sparse-point or continuous-contour representations depending on the form of the data. Klein et al. presented a stochastic gradient descent optimization method with adaptive step size prediction [21]. This method employs a stochastic subsampling technique to accelerate the optimization process. The selection mechanism for the method's free parameters takes into account the chosen similarity measure, the transformation model, and


the image content, in order to estimate proper values for the most important settings.

Bronstein et al. studied some fundamental problems in the analysis of non-rigid deformable shapes [28]. In particular, a novel similarity criterion for shape comparison and its extension to partial similarity has been proposed. They showed that the correspondence problem is also solvable using their similarity metric. In [32], Worz and Rohr proposed a novel approximation approach to register elastic deformations. This landmark-based method uses Gaussian elastic body splines. Other methods use variational techniques [22]. We note that these methods have a rather high computational demand. In [29], a non-rigid registration algorithm is proposed based on the L2 norm and information theory.

Another common approach is to approximate a nonlinear deformation via piecewise linear transformations: In [23], a novel framework to fuse local rigid or affine components into a global invertible transformation, called Log-Euclidean polyaffine, has been presented. A simple algorithm is proposed to compute such transformations and their inverses efficiently on regular grids.

2 REGISTRATION FRAMEWORK

In the general case, we want to recover the parameters of an arbitrary diffeomorphism φ : R^2 → R^2 which aligns a pair of shapes. Let us denote the point coordinates of the template and observation by x = [x1, x2]^T ∈ R^2 and y = [y1, y2]^T ∈ R^2, respectively. The following identity relation is assumed between the point coordinates of the shapes:

y = φ(x) ⇔ x = φ^{-1}(y),   (1)

where φ^{-1} : R^2 → R^2 is the corresponding inverse transformation. Note that φ^{-1} always exists since a diffeomorphism is a bijective function such that both the function and its inverse have continuous mixed partial derivatives. Suppose that shapes are represented by their characteristic function 1 : R^2 → {0, 1}, where 0 and 1 correspond to the background and foreground, respectively. If we denote the template by 1_t and the observation by 1_o, the following equality also holds

1_o(y) = 1_o(φ(x)) = 1_t(x),   (2)

since x and y are corresponding point coordinates.

Classical landmark-based approaches would now set up a system of equations from Eq. (1) using point correspondences. However, we are interested in a direct approach without solving the correspondence problem. As a consequence, we cannot directly use Eq. (1)–(2) because we do not have established point pairs. However, we can multiply these equations and then integrate out individual point correspondences, yielding

∫_{R^2} y 1_o(y) dy = ∫_{R^2} φ(x) 1_t(x) |J_φ(x)| dx,   (3)

where the integral transformation y = φ(x), dy = |J_φ(x)| dx has been applied. The Jacobian determinant |J_φ| : R^2 → R,

|J_φ(x)| = | ∂φ1/∂x1   ∂φ1/∂x2 |
           | ∂φ2/∂x1   ∂φ2/∂x2 |,   (4)

gives the measure of the transformation at each point. Note that in the case of affine (i.e. linear) transformations, the partial derivatives of the distortion are constants, hence the Jacobian is also constant and the transformation measure can be simply computed as the ratio of the shape areas. This property has been explored in [24]. Herein, however, the transformation is nonlinear, causing the Jacobian to become a non-constant function of the coordinates.

Fig. 1. The effect of various ω functions (panels: ω from Eq. (25), Eq. (26), and Eq. (27)). Top: the generated coloring of a binary shape. Bottom: the corresponding volumes.
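The constant-Jacobian property of the affine case can be checked numerically. The matrix, translation, and unit-disk template below are hypothetical choices for illustration only: the area of the transformed shape equals |det A| times the template area.

```python
# Hypothetical affine map and shape: for phi(x) = A x + t the Jacobian
# determinant is the constant |det A|, which equals the ratio of the
# observation and template areas (the property exploited in the affine case).
import numpy as np

A = np.array([[1.3, 0.4],
              [-0.2, 0.9]])                   # det A = 1.25
t = np.array([0.5, -0.1])

# Template: the unit disk; its affine image is an ellipse of area pi*|det A|.
theta = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
boundary = np.stack([np.cos(theta), np.sin(theta)], axis=1)
mapped = boundary @ A.T + t                   # phi applied to the boundary

# Shoelace formula for the area enclosed by the mapped boundary polygon.
x, y = mapped[:, 0], mapped[:, 1]
area_o = 0.5 * abs(np.sum(x * np.roll(y, -1) - y * np.roll(x, -1)))

ratio = area_o / np.pi                        # template (unit disk) area is pi
print(ratio, abs(np.linalg.det(A)))           # both close to 1.25
```

For a nonlinear φ no such single ratio exists, which is exactly why |J_φ(x)| must stay inside the integral in Eq. (3).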

Since multiplying with the characteristic functions essentially restricts the integral domains to the foreground regions F_t = {x ∈ R^2 | 1_t(x) = 1} and F_o = {y ∈ R^2 | 1_o(y) = 1}, we obtain the following finite integral equation:

∫_{F_o} y dy = ∫_{F_t} φ(x) |J_φ(x)| dx.   (5)

The diffeomorphism φ can be decomposed as

φ(x) = [φ1(x), φ2(x)]^T,   (6)

where φ1, φ2 : R^2 → R are coordinate functions. Hence Eq. (5), which is in vector form, can be decomposed into a system of two equations using these coordinate functions:

∫_{F_o} yi dy = ∫_{F_t} φi(x) |J_φ(x)| dx,   i = 1, 2.   (7)

The parameters of φ are the unknowns of these equations. Usually, φ has more than two unknown parameters, therefore a system of two equations is not enough to recover φ.
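Eq. (7) can be checked numerically for a known ground-truth deformation by discretizing both integrals as pixel sums. Everything below is a hypothetical example: the disk template, the map φ(x) = [a·x1, b·x2 + c·x1^2] (chosen because it has a closed-form inverse, so membership in F_o = φ(F_t) can be tested directly, and a constant Jacobian determinant a·b), and the grid resolution.

```python
# Hypothetical numeric check of Eq. (7): left side integrates y_i over the
# rasterized observation F_o, right side integrates phi_i(x)|J_phi| over the
# template F_t; the two pixel sums agree up to discretization error.
import numpy as np

a, b, c = 1.2, 0.8, 0.9
h = 1e-3
g = np.arange(-1, 1, h)
X1, X2 = np.meshgrid(g, g, indexing="ij")

in_t = X1**2 + X2**2 <= 0.4**2                    # template foreground F_t
U1 = X1 / a                                        # phi^{-1} of the grid points
U2 = (X2 - c * U1**2) / b
in_o = U1**2 + U2**2 <= 0.4**2                    # observation F_o = phi(F_t)

results = []
for yi, phii in ((X1, a * X1), (X2, b * X2 + c * X1**2)):
    lhs = (yi * in_o).sum() * h * h               # integral of y_i over F_o
    rhs = (phii * in_t).sum() * (a * b) * h * h   # integral of phi_i |J| over F_t
    results.append((lhs, rhs))
    print(lhs, rhs)                               # the two sides agree
```

This is only the pair of equations from Eq. (7); recovering more than two parameters needs the additional equations constructed next.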

2.1 Construction of the system of equations

First of all, let us notice that the identity relation in Eq. (1) remains valid when a function ω : R^2 → R is acting on both sides of the equation [24], [13], [14]. Indeed, for a properly chosen ω

ω(y) = ω(φ(x)) ⇔ ω(x) = ω(φ^{-1}(y)).   (8)

Thus the following integral equation is obtained from Eq. (5):

∫_{F_o} ω(y) dy = ∫_{F_t} ω(φ(x)) |J_φ(x)| dx.   (9)

The basic idea of the proposed framework is to generate sufficiently many equations using a set of nonlinear ω functions. Let the number of parameters of φ be denoted by k and let {ωi}, i = 1, ..., ℓ, ωi : R^2 → R denote the set of adopted nonlinear functions. In order to solve for all unknowns, we need at least k equations, hence ℓ ≥ k. We thus obtain the following system of equations

∫_{F_o} ωi(y) dy = ∫_{F_t} ωi(φ(x)) |J_φ(x)| dx,   i = 1, ..., ℓ,   (10)

where each ωi function provides one new equation. Note that the generated equations provide no new information; they simply impose additional constraints. Note also that these equations need to be algebraically independent. While this condition is difficult to verify in practice, it is also clear that linear independence of the ωi functions (which is easier to verify) is crucial, as linear dependency would result in algebraically dependent equations. Therefore, in practice, we always use a set of nonlinear ωi functions. The solution of the system gives the parameters of the aligning transformation. Intuitively, each ωi generates a consistent coloring of the shapes as shown in Fig. 1. From a geometric point of view, Eq. (5) simply matches the center of mass of the template and observation, while the new equations in Eq. (9) match the volumes over the shapes constructed by the nonlinear functions ωi (see Fig. 1).
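The overall scheme of Eq. (10) can be sketched end to end on a toy problem. Everything below is hypothetical and chosen only for illustration: a 3-parameter deformation φ(x) = [a·x1, b·x2 + c·x1^2] with |J_φ| = |a·b| (not one of the paper's models), a small set of polynomial ω functions (not the paper's Eq. (25)-(27)), and integrals discretized as pixel sums. The residuals of Eq. (10) are minimized with Levenberg-Marquardt, as the paper proposes, here via SciPy.

```python
# Hypothetical end-to-end sketch: build the system of Eq. (10) with
# polynomial omega functions and solve it by Levenberg-Marquardt.
import numpy as np
from scipy.optimize import least_squares

omegas = [lambda y1, y2: y1,
          lambda y1, y2: y2,
          lambda y1, y2: y1 * y2,
          lambda y1, y2: y1**2,
          lambda y1, y2: y2**2,
          lambda y1, y2: y1**2 * y2]

true = np.array([1.2, 0.8, 0.9])                  # ground-truth (a, b, c)
h = 2e-3
g = np.arange(-1, 1, h)
X1, X2 = np.meshgrid(g, g, indexing="ij")
tmpl = X1**2 + X2**2 <= 0.4**2                    # template foreground F_t

a, b, c = true                                     # rasterize F_o = phi(F_t)
obs = (X1 / a)**2 + ((X2 - c * (X1 / a)**2) / b)**2 <= 0.4**2

# Left-hand sides of Eq. (10): fixed integrals over the observation.
lhs = np.array([(w(X1, X2) * obs).sum() * h * h for w in omegas])

def residuals(p):
    a, b, c = p
    y1, y2 = a * X1, b * X2 + c * X1**2            # phi over the template
    jac = abs(a * b)                                # |J_phi| for this model
    return np.array([(w(y1, y2) * tmpl).sum() * jac * h * h
                     for w in omegas]) - lhs

sol = least_squares(residuals, x0=[1.0, 1.0, 0.0], method="lm")
print(sol.x)                                       # close to (1.2, 0.8, 0.9)
```

With ℓ = 6 equations for k = 3 unknowns the system is overdetermined, mirroring the ℓ ≥ k requirement in the text; the choice of ω set and starting point here is illustrative, not prescriptive.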

2.2 Discussion

Relation to moment-based approaches: Although the derivations in the previous section are not moment-based per se, it is interesting to analyze how the resulting equation of Eq. (10) is related to moments. Image moments and invariants [35] were introduced by Hu [36] for 2D pattern analysis. Since then, they became one of the most popular region-based descriptors because any shape can be reconstructed from its infinite set of moments [37]. Traditional two dimensional (p + q)th order moments of a function ϱ : R^2 → R are defined as

m_pq = ∫_{R^2} x1^p x2^q ϱ(x) dx,

where p, q ∈ N0. When ϱ is an image function, these moments are also referred to as image moments. In the binary case, where objects are represented by their silhouette, ϱ is a characteristic function, yielding m_pq = ∫_F x1^p x2^q dx with F = {x ∈ R^2 : ϱ(x) = 1}. This is often called a shape or geometric moment as it only uses polynomials of the coordinates. Generally, orthogonal moments, such as Legendre [37] or Zernike moments [38], are numerically more stable than regular moments. We remark, however, that orthogonal moments can be expressed by regular moments.

In this sense, we can recognize a 0th order function moment of ωi on the left hand side of Eq. (10) (just like any function integral can be regarded as the 0th order moment of the function itself). Similarly to Legendre or Zernike moments, our function moments could also be expressed in terms of shape moments whenever the adopted ωi functions are polynomials. When ωi is not polynomial, its Taylor expansion results in an approximating polynomial which in turn yields an infinite sum of shape moments. The right hand side of Eq. (10) is more complex as it includes the product of the unknown transformation φ(x) and its Jacobian determinant |J_φ(x)|, which are not necessarily polynomials. Therefore, independently of the choice of ωi, it can only be expressed in terms of shape moments by expanding it into a Taylor series.

It is thus clear that our system of equations outlined in Eq. (10) cannot be rewritten in terms of a finite set of classical shape moments, and hence not even in terms of orthogonal moments. This result corresponds to similar findings reported in [39], [40] in the context of projective invariants. What we propose in this paper is another approach, which, starting from the identity relation in Eq. (1), builds up a framework to generate an arbitrary set of equations.
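For reference, the geometric moments m_pq discussed above reduce to simple pixel sums for a silhouette. The rectangle shape and grid below are hypothetical; the snippet computes the area m_00 and the center of mass, the quantities that Eq. (5) matches between the two shapes.

```python
# Hypothetical binary shape: geometric moments m_pq of a silhouette as
# pixel sums; m_00 is the area, (m_10/m_00, m_01/m_00) the center of mass.
import numpy as np

h = 1e-3
g = np.arange(-0.5, 0.5, h)
X1, X2 = np.meshgrid(g, g, indexing="ij")
F = (abs(X1) <= 0.3) & (abs(X2) <= 0.2)      # a 0.6 x 0.4 rectangle

def m(p, q):
    """(p+q)th order geometric moment over the foreground F."""
    return (X1**p * X2**q * F).sum() * h * h

area = m(0, 0)                                # close to 0.6 * 0.4 = 0.24
cx, cy = m(1, 0) / area, m(0, 1) / area       # close to (0, 0) by symmetry
print(area, cx, cy)
```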

Invariance vs. covariance: Moment invariants [36], [39] are extensively studied as they provide a powerful tool for shape matching. Basically, invariants are functions immune to the action of a particular deformation. There is a well established theory on affine invariants [41], [35], but invariants of higher order deformations are hard to construct. Recently, important results on the existence of projective moment invariants [40] as well as on generalized invariants, called Implicit Moment Invariants [42], [35], have also been reported. Herein, we are not interested in constructing invariants as, being immune to the deformation, they do not provide constraints on the actual transformation parameters. Instead, we need covariant functions that vary with the transformation φ(x), hence constraining its parameters. Indeed, invariance and covariance play a complementary role: While invariants identify a shape regardless of its deformation, covariants identify the actual deformation.

Registration vs. matching: There is a fundamental difference between the problem of registration and shape matching [35]. In either case, we fix the family of possible transformations. In the case of matching, we need to determine whether two objects are from the same class or not. For that purpose, it is enough to ask whether there exists a transformation which aligns the objects (i.e. whether they are on the same orbit of the fixed transformation class), but the aligning transformation is not of interest. However, in the registration problem we always assume that there exists a transformation which aligns the objects and we need to estimate its parameters. This explains why multiple object matching algorithms often make use of invariants, ignoring the effect of the unknown transformation, and why covariance is used to solve the registration problem. Due to the difficulty in finding appropriate invariants under elastic deformations [42], [35], nonlinear shape matching (or recognition) is often solved by registering a given observation, representing the deformed shape to be recognized, to the templates stored in a database [3]. A similarity metric is then constructed using the strength of the deformation (e.g. bending energy), and the shape is recognized as the template with the minimal distortion.

3 MODELING DEFORMATION FIELDS

It is a quite common assumption in image registration that the deformation field is smooth and invertible, especially when the resulting deformation field is further analyzed (e.g. in deformation-based morphometry or construction of shape models). Diffeomorphisms provide a convenient mathematical framework to describe such deformations. Various parametric


models of diffeomorphic deformations have been proposed in the literature [1]. These are either based on a physical model (e.g. planar homography) or on a general parameterization using different basis functions (e.g. thin plate spline, B-spline). Herein, we will focus on some broadly used classes of deformations, but our framework can be applied to other nonlinear transformations as well (see Section 6.4, for instance). Essentially, all we need to apply our framework to a particular deformation model are the formulas of the adopted diffeomorphism φ(x) and its Jacobian |J_φ(x)|.

3.1 Planar homography

Perspective images of planar scenes are common in man-made environments. In such cases, a planar scene and its image are related by a plane-to-plane homography, also known as a plane projective transformation. Estimating the parameters of such transformations is a fundamental problem in computer vision with various applications.

Let us denote the homogeneous coordinates of the template and observation by x′ = [x′1, x′2, x′3]^T ∈ P² and y′ = [y′1, y′2, y′3]^T ∈ P², respectively. Planar homography is then a linear transformation in the projective plane P²:

   y′ = Hx′ ⇔ x′ = H⁻¹y′ ,   (11)

where H = {Hij} is the unknown 3 × 3 transformation matrix that we want to recover. Note that H has only 8 degrees of freedom, thus one of its 9 elements can be fixed. Herein we will set H33 = 1. Although H33 could be 0 or small in general, the coordinates of the input shapes are normalized before matching into [−0.5, 0.5] × [−0.5, 0.5] with the center of mass at the origin (see Section 5). Thus if H33 were 0, then H would map the origin [0, 0, x′3]^T of the template into [H13x′3, H23x′3, 0]^T on the observation (i.e. to infinity, causing e.g. an ellipse to become a parabola), which is quite unlikely to be observed in a real image pair. Similarly, if H33 is very small, then the origin is mapped to a distant point, implying extreme distortion, which is again unlikely in practice. These are close to degenerate situations for which a numerically stable solution may not exist anyway.

As usual, the inhomogeneous coordinates y = [y1, y2]^T ∈ R² of a homogeneous point y′ are obtained by projective division:

   y1 = y′1/y′3 = (H11x1 + H12x2 + H13) / (H31x1 + H32x2 + 1) ≡ χ1(x)
   y2 = y′2/y′3 = (H21x1 + H22x2 + H23) / (H31x1 + H32x2 + 1) ≡ χ2(x) ,   (12)

where χi : R² → R. Indeed, planar homography is a linear transformation in the projective plane P², but it becomes nonlinear within the Euclidean plane R². The nonlinear transformation corresponding to H is denoted by χ : R² → R², χ(x) = [χ1(x), χ2(x)]^T, and the Jacobian determinant |Jχ| : R² → R is given by

   |Jχ(x)| = ∂χ1/∂x1 · ∂χ2/∂x2 − ∂χ1/∂x2 · ∂χ2/∂x1 = |H| / (H31x1 + H32x2 + 1)³ .   (13)
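For illustration, the map of Eq. (12) and the Jacobian of Eq. (13) can be evaluated as follows. This is a minimal numpy sketch with our own function names, not the authors' Matlab implementation:

```python
import numpy as np

def chi(H, x):
    """Apply a planar homography H (3x3, with H[2, 2] = 1) to 2D points.

    x is an (N, 2) array of inhomogeneous coordinates; returns an (N, 2) array.
    """
    x = np.asarray(x, dtype=float)
    xh = np.column_stack([x, np.ones(len(x))])  # homogeneous coordinates x'
    yh = xh @ H.T                               # y' = H x'
    return yh[:, :2] / yh[:, 2:3]               # projective division, Eq. (12)

def jac_chi(H, x):
    """Jacobian determinant of Eq. (13): |H| / (H31*x1 + H32*x2 + 1)^3."""
    x = np.asarray(x, dtype=float)
    denom = H[2, 0] * x[:, 0] + H[2, 1] * x[:, 1] + 1.0
    return np.linalg.det(H) / denom**3
```

For the identity matrix, chi is the identity map and |Jχ| ≡ 1; for a general H, the closed form of Eq. (13) agrees with a finite-difference estimate of the Jacobian.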

3.2 Polynomial transformations

Polynomial transformations are important because they can be used to approximate other distortions (e.g. via a Taylor expansion, as shown in Section 3.2.1), and they allow for a more efficient numerical implementation of our method. Let p : R² → R² be a polynomial transformation with p(x) = [p1(x), p2(x)]^T. Without loss of generality, we can assume that d = deg(p1) = deg(p2); furthermore

   p1(x) = Σ_{i=0}^{d} Σ_{j=0}^{d−i} aij x1^i x2^j , and p2(x) = Σ_{i=0}^{d} Σ_{j=0}^{d−i} bij x1^i x2^j ,

where aij and bij are the unknown parameters of the transformation, and the number of these parameters is (d + 2)(d + 1). The Jacobian |Jp| : R² → R is also polynomial:

   |Jp(x)| = ( Σ_{i=1}^{d} Σ_{j=0}^{d−i} i aij x1^{i−1} x2^j ) ( Σ_{j=1}^{d} Σ_{i=0}^{d−j} j bij x1^i x2^{j−1} )
           − ( Σ_{j=1}^{d} Σ_{i=0}^{d−j} j aij x1^i x2^{j−1} ) ( Σ_{i=1}^{d} Σ_{j=0}^{d−i} i bij x1^{i−1} x2^j ) .
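For a fixed degree d, both p and |Jp| above are direct to evaluate; a hedged numpy sketch (the helper names and the nested-list coefficient layout are ours), where a[i][j] and b[i][j] hold the coefficients of x1^i x2^j:

```python
import numpy as np

def poly_map(a, b, x, d):
    """Evaluate p(x) = [p1(x), p2(x)] with coefficients a[i][j], b[i][j] of x1^i x2^j."""
    x1, x2 = x[..., 0], x[..., 1]
    p1 = sum(a[i][j] * x1**i * x2**j for i in range(d + 1) for j in range(d - i + 1))
    p2 = sum(b[i][j] * x1**i * x2**j for i in range(d + 1) for j in range(d - i + 1))
    return np.stack([p1, p2], axis=-1)

def poly_jac(a, b, x, d):
    """|Jp(x)| assembled from the four partial-derivative polynomials."""
    x1, x2 = x[..., 0], x[..., 1]
    d1p1 = sum(i * a[i][j] * x1**(i - 1) * x2**j for i in range(1, d + 1) for j in range(d - i + 1))
    d2p1 = sum(j * a[i][j] * x1**i * x2**(j - 1) for i in range(d + 1) for j in range(1, d - i + 1))
    d1p2 = sum(i * b[i][j] * x1**(i - 1) * x2**j for i in range(1, d + 1) for j in range(d - i + 1))
    d2p2 = sum(j * b[i][j] * x1**i * x2**(j - 1) for i in range(d + 1) for j in range(1, d - i + 1))
    return d1p1 * d2p2 - d2p1 * d1p2
```

For d = 1 with a10 = 1 and b01 = 1 (the identity map), the Jacobian evaluates to 1 everywhere.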

3.2.1 Taylor series expansion of a planar homography

In the case of planar homographies, the integrands in Eq. (10) can be approximated by a Taylor series expansion, yielding a system of polynomial equations. For example, consider the term ωi(χ(x)) |Jχ(x)|. If ωi is polynomial, then both the numerator and denominator of ωi(χ(x)) are polynomial, and the numerator remains a polynomial when multiplied by |Jχ(x)| from Eq. (13). Thus we have ωi ∘ χ |Jχ| = p(H11, …, H23) ξ(H31, H32), where p is a polynomial, while ξ(H31, H32) is not. For example, assuming the ωi set from Eq. (25), we get

   ξ(H31, H32) = 1 / (H31x1 + H32x2 + 1)^{3+n+m} ,

where n and m are the exponents in ωi. In order to have a polynomial approximation, let us rewrite ξ(H31, H32) in terms of its multivariate k-th order Taylor series about (0, 0), denoted T^k_{ξ(0,0)}. The expansion is performed about (0, 0) as it corresponds to the identity transformation. Note that this also gives an equally good approximation for all affine transformations (i.e. where H31 = H32 = 0). The degree of T^k_{ξ(0,0)} must be high enough to cover the expected range of projectivity. In our experiments, we found that k = 5 gives good results. We thus have the following polynomial approximation of the integrand:

   ωi ∘ χ |Jχ| ≈ p(H11, …, H23) T^k_{ξ(0,0)} .   (14)

The advantage of such a polynomial integrand is increasedcomputational efficiency, as discussed in Section 4.
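Since ξ depends on (H31, H32) only through u = H31x1 + H32x2, which is homogeneous of degree 1 in the unknowns, the k-th order multivariate Taylor polynomial of ξ about (0, 0) coincides with the binomial series of (1 + u)^−(3+n+m) truncated at u^k. A sketch of this observation (the helper name is ours):

```python
from math import comb

def xi_taylor(h31, h32, x1, x2, p, k):
    """k-th order Taylor polynomial about (0, 0) of xi = 1 / (h31*x1 + h32*x2 + 1)**p,
    via the binomial series (1 + u)**(-p) = sum_i C(-p, i) u**i, u = h31*x1 + h32*x2."""
    u = h31 * x1 + h32 * x2
    return sum((-1)**i * comb(p + i - 1, i) * u**i for i in range(k + 1))
```

Close to the identity (small H31, H32) the truncation error decays like u^(k+1), consistent with k = 5 being sufficient in the experiments reported below.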

3.3 Thin plate spline

Thin plate splines (TPS) [1], [43], [44] are widely used to approximate non-rigid deformations using radial basis functions. Given a set of control points ck ∈ R² and associated mapping coefficients aij, wki ∈ R with i = 1, 2, j = 1, 2, 3


and k = 1, …, K, the TPS interpolating the points ck is given by [1]

   ςi(x) = ai1x1 + ai2x2 + ai3 + Σ_{k=1}^{K} wki Q(∥ck − x∥) ,   (15)

where Q : R → R is the radial basis function

   Q(r) = r² log r² .

Note that the parameters include 6 global affine parameters aij and 2K local coefficients wki for the control points. In classical correspondence-based approaches, control points are placed at extracted point matches, i.e. we know the exact mapping at the control points, and the mappings of other points are interpolated using TPS. In our approach, however, TPS can be regarded as a parametric model to approximate the underlying free-form deformation. The parameters of this model are then estimated by our method. In order to capture deformations everywhere, we place the radial basis functions (i.e. the control points) on a uniform grid. Obviously, a finer grid allows recovery of finer details of the deformation field at the price of more equations. The physical interpretation of Eq. (15) is a thin plate deforming under point loads acting at the control points. Additional constraints are that the sum of the loads applied to the plate, as well as the moments with respect to both axes, should be 0. These are needed to ensure that the plate would not move or rotate under the imposition of the loads, thus remaining stationary [1]:

   Σ_{k=1}^{K} wki = 0 and Σ_{k=1}^{K} ckj wki = 0 ,   i, j = 1, 2 .   (16)

Another interpretation of the above constraints is that the plate at infinity behaves according to the affine term. Let ς : R² → R², ς(x) = [ς1(x), ς2(x)]^T be a TPS map with 6 + 2K parameters. The Jacobian |Jς(x)| of the transformation ς is composed of the following partial derivatives (i, j = 1, 2):

   ∂ςi/∂xj = aij − Σ_{k=1}^{K} 2wki (ckj − xj) (1 + log(∥ck − x∥²)) .   (17)
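Eqs. (15) and (17) can be sketched in numpy as follows (our own helper names; A holds the affine parameters aij as a 2×3 array, W the coefficients wki as a K×2 array, and C the control points):

```python
import numpy as np

def tps_map(A, W, C, x):
    """Evaluate the TPS of Eq. (15) at points x (N, 2); returns (N, 2)."""
    x = np.asarray(x, dtype=float)
    r2 = np.sum((x[:, None, :] - C[None, :, :])**2, axis=2)           # (N, K)
    Q = np.where(r2 > 0, r2 * np.log(np.maximum(r2, 1e-300)), 0.0)    # Q(r) = r^2 log r^2
    return x @ A[:, :2].T + A[:, 2] + Q @ W                           # affine + radial part

def tps_jac_det(A, W, C, x):
    """|J_sigma(x)| assembled from the partial derivatives of Eq. (17)."""
    x = np.asarray(x, dtype=float)
    d = C[None, :, :] - x[:, None, :]                                 # (N, K, 2): c_k - x
    f = 1.0 + np.log(np.maximum(np.sum(d**2, axis=2), 1e-300))        # (N, K): 1 + log r^2
    J = A[None, :, :2] - 2.0 * np.einsum('nk,ki,nkj->nij', f, W, d)   # (N, 2, 2)
    return J[:, 0, 0] * J[:, 1, 1] - J[:, 0, 1] * J[:, 1, 0]
```

With W = 0 the map reduces to its affine term; for nonzero W the determinant agrees with a finite-difference estimate away from the control points.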

4 CHOICE OF ω FUNCTIONS

Given φ(x) and its Jacobian |Jφ(x)| for a particular deformation model, the parameters of the aligning transformation are obtained as a solution of the system of equations Eq. (10). For constructing these equations, we need an appropriate set of functions {ωi}_{i=1}^ℓ. Theoretically, any nonlinear function satisfying Eq. (8) could be applied. In practice, however, there are two important considerations. First, our equations are always corrupted by errors arising from imperfect data (e.g. segmentation and discretization errors). Therefore the solution is obtained via least-squares minimization of the algebraic error. Since both sides of these equations contain an integral of the corresponding ωi function, the characteristics of ωi clearly influence the overall error. In particular, we expect an equal contribution from each equation in order to guarantee an unbiased error measure. Second, iterative least-squares minimization algorithms, like the Levenberg-Marquardt algorithm [15], require the evaluation of the equations at every iteration step. Thus the time complexity of the algorithm is considerably decreased if the integrals can be precomputed, hence avoiding scanning the image pixels at every iteration.

4.1 Normalization

The algebraic error of the system Eq. (10) is obtained as the sum of squared errors:

   Σ_{i=1}^{ℓ} ( ∫_{φ(Ft)} ωi(y) dy − ∫_{Fo} ωi(y) dy )² ,

where φ is the estimated transformation. On the other hand, the geometric error is measured as the absolute difference between the registered shapes:

   |G| = |φ(Ft) △ Fo| ,

where △ is the symmetric difference. Let G1 = φ(Ft)\Fo and G2 = Fo\φ(Ft), yielding G = G1 ∪ G2 and G1 ∩ G2 = ∅. Since the contributions over the overlap cancel, i.e. ∫_{φ(Ft)∩Fo} (ωi(y) − ωi(y)) dy = 0, the algebraic error can be expressed as

   Σ_{i=1}^{ℓ} ( ∫_{G1} ωi(y) dy − ∫_{G2} ωi(y) dy )² .   (18)

The i-th equation contributes to the error by the difference of the integrals of ωi over the non-overlapping domains G1 and G2. Thus the magnitude of the contributed value depends not only on the geometric error G but also on the values ωi takes over these domains. Large variations in the range of different ωi functions yield an uneven contribution of the different equations, which leads to a biased algebraic error or, in extreme cases, to numerical instability.

A usual remedy is to normalize the coordinates of both shapes into the unit square ([−0.5, 0.5] × [−0.5, 0.5] in our experiments), and to choose ωi with a range limited to a similar interval (e.g. [−1, 1]). This is achieved by dividing the integrals in Eq. (10) by an appropriate constant corresponding to the maximal magnitude of the integral. Since the integral of a given ωi depends on the integration domain (i.e. the actual transformed shape), a trivial upper bound would be the infinite integral ∫_{R²} |ωi|. Unfortunately, this integral may not be computable or may yield an infinite value, thus making this kind of normalization unfeasible. Therefore we need to find a finite domain which contains all intermediate shapes during the minimization process. We found experimentally (see Fig. 2) that the transformations occurring during the least-squares minimization process do not transform the shapes out of a circle centered at the origin with radius √2/2 (i.e. the circumscribed circle of the unit square). We thus adopt the following constant:

   Ni = ∫_{∥x∥ ≤ √2/2} |ωi(x)| dx ,   (19)

and the normalized version of Eq. (10) becomes

   (1/Ni) ∫_{Fo} ωi(y) dy = (1/Ni) ∫_{Ft} ωi(φ(x)) |Jφ(x)| dx ,   i = 1, …, ℓ .   (20)
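For a concrete ωi, the constant of Eq. (19) is easy to approximate by quadrature over the disk; a small sketch (the helper is our own, using a simple grid sum):

```python
import numpy as np

def normalization_constant(omega, radius=np.sqrt(2) / 2, n_grid=801):
    """Approximate N_i = integral of |omega(x1, x2)| over ||x|| <= radius, Eq. (19)."""
    t = np.linspace(-radius, radius, n_grid)
    x1, x2 = np.meshgrid(t, t)
    inside = x1**2 + x2**2 <= radius**2          # mask of the disk
    cell = (t[1] - t[0])**2                      # area of one grid cell
    return np.sum(np.abs(omega(x1, x2)) * inside) * cell
```

For ω ≡ 1 this returns the disk area π/2; for ω(x) = x1 it approaches the closed form 4r³/3.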


Fig. 2. Coverage of transformed shapes of ≈ 1500 synthetic observations during the minimization process. Pixel values represent the number of intermediate shapes that included a particular pixel. For reference, we also show the circle with radius √2/2 used for normalization.

4.2 Computational efficiency

The Levenberg-Marquardt algorithm requires the evaluation of the equations at every iteration step. Unfortunately, the integrands on the right hand side of Eq. (20) include the unknowns, implying that we have to recompute these integrals at each iteration, yielding a time complexity of O(k(N + M)), where k is the number of iterations (typically around 1000 in our experiments), while N and M are the numbers of foreground pixels of the template and observation, respectively. If we could eliminate the unknowns from the integrands, then the integrals could be precomputed and the runtime of the solver would become independent of the number of foreground pixels N + M. We will show that this can be achieved by applying polynomial ωi functions in Eq. (20).

Let us suppose that φ(x) = Σ_{i=1}^{n} ai ϕi(x), where ai ∈ R and ϕi : R² → R² are basis functions. Note that polynomial and thin plate spline deformations are of the above form, while other diffeomorphisms can be approximated by the first few terms of their Taylor expansion [16], yielding the above representation. Furthermore, let us denote a = [a1, …, an] and ϕ(x) = [ϕ1(x), …, ϕn(x)].

Definition 1: When a function f : R² → R² is such that

   f( Σ_{i=1}^{n} ai ϕi(x) ) = Σ_{j=1}^{m} gj(a) hj(ϕ(x)) ,

where m ∈ N and gj : Rⁿ → R, hj : R²ⁿ → R² for 1 ≤ j ≤ m, then we say f is separable with respect to a and ϕ(x).

The following theorem states that applying polynomial ωi functions in Eq. (20) results in a regular nonlinear equation with respect to the unknowns a1, …, an instead of an integral equation.

Theorem 1: When f : R² → R² is polynomial, then the following equality holds:

   ∫_F f(φ(x)) |Jφ(x)| dx = Σ_{i=1}^{m} gi(a) ∫_F hi(ϕ(x)) dx ,   (21)

where m ∈ N and gi : Rⁿ → R, hi : R²ⁿ → R² for 1 ≤ i ≤ m.

The proof can be found in Section 4.2.1. As a consequence, choosing a polynomial ωi function allows us to eliminate the unknowns a from the integrand. Hence ∫_F hi(ϕ(x)) dx has to be computed only once, and the time complexity of the solver becomes independent of the size of the input images.

4.2.1 Proof of Theorem 1

The statement follows from the next three lemmas.

Lemma 1: If f1 and f2 are separable with respect to (a, ϕ(x)), then the function F(x) = f1(x)f2(x) is also separable.

Proof: Since both f1 and f2 are separable, there exist two sets of functions g(1)_i, g(2)_j : Rⁿ → R and h(1)_i, h(2)_j : R²ⁿ → R² for 1 ≤ i ≤ s and 1 ≤ j ≤ t such that

   F(x) = Σ_{i=1}^{s} g(1)_i(a) h(1)_i(ϕ(x)) Σ_{j=1}^{t} g(2)_j(a) h(2)_j(ϕ(x))
        = Σ_{i=1}^{s} Σ_{j=1}^{t} g(1)_i(a) g(2)_j(a) h(1)_i(ϕ(x)) h(2)_j(ϕ(x)) .

Making use of the notations gl = g(1)_i g(2)_j and hl = h(1)_i h(2)_j with l = (i − 1)t + j, we get

   F(x) = Σ_{l=1}^{st} gl(a) hl(ϕ(x)) ,

which completes the proof.

Lemma 2: If φ(x) = Σ_{i=1}^{n} ai ϕi(x), then |Jφ(x)| is separable with respect to (a, ϕ(x)).

Proof: Let us denote the components of the basis functions as ϕi(x) = [ϕi1(x), ϕi2(x)]. The partial derivatives ∂lφk (k, l = 1, 2) of φ(x) are then given by

   ∂φk/∂xl = Σ_{i=1}^{n} ai ∂lϕik(x) ,   k, l = 1, 2 ,

from which the Jacobian determinant of Eq. (4) can be written as

   |Jφ(x)| = ( Σ_{i=1}^{n} ai ∂1ϕi1(x) ) ( Σ_{j=1}^{n} aj ∂2ϕj2(x) ) − ( Σ_{i=1}^{n} ai ∂2ϕi1(x) ) ( Σ_{j=1}^{n} aj ∂1ϕj2(x) )
            = Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj ( ∂1ϕi1(x) ∂2ϕj2(x) − ∂2ϕi1(x) ∂1ϕj2(x) ) .

Setting gl(a) = ai aj and hl(ϕ(x)) = ∂1ϕi1(x) ∂2ϕj2(x) − ∂2ϕi1(x) ∂1ϕj2(x) with l = (i − 1)n + j, we get

   |Jφ(x)| = Σ_{l=1}^{n²} gl(a) hl(ϕ(x)) .

The Jacobian is thus separable, which completes the proof.

Lemma 3: If φ(x) = Σ_{i=1}^{n} ai ϕi(x) and p(x) = [p1(x1, x2), p2(x1, x2)], where p1, p2 ∈ R[x1, x2] are polynomials with deg(p1) = d1 and deg(p2) = d2, then p(φ(x)) is separable with respect to (a, ϕ(x)).


Proof: Let us consider the first component p1 of

   p( Σ_{i=1}^{n} ai ϕi(x) ) = [ p1( Σ_{i=1}^{n} ai ϕi(x) ), p2( Σ_{i=1}^{n} ai ϕi(x) ) ] .

Since p1(x1, x2) = Σ_j cj x1^{qj} x2^{rj} is polynomial,

   p1( Σ_{i=1}^{n} ai ϕi1(x), Σ_{i=1}^{n} ai ϕi2(x) ) = Σ_j cj ( Σ_{i=1}^{n} ai ϕi1(x) )^{qj} ( Σ_{i=1}^{n} ai ϕi2(x) )^{rj} .

The factor ( Σ_{i=1}^{n} ai ϕi1(x) )^{qj} can be further expanded by making use of the multinomial theorem [45] as

   Σ (qj! / (s1! ⋯ sn!)) a1^{s1} ⋯ an^{sn} ϕ11(x)^{s1} ⋯ ϕn1(x)^{sn} ,

where s1, …, sn ∈ N0 and Σ_{i=1}^{n} si = qj. For the l-th term of the above sum, let us define gl1(a) = (qj! / (s1! ⋯ sn!)) a1^{s1} ⋯ an^{sn} and hl1(ϕ(x)) = Π_{i=1}^{n} ϕi1(x)^{si}, yielding

   ( Σ_{i=1}^{n} ai ϕi1(x) )^{qj} = Σ_l gl1(a) hl1(ϕ(x)) .

Hence ( Σ_{i=1}^{n} ai ϕi1(x) )^{qj} is separable, and similarly ( Σ_{i=1}^{n} ai ϕi2(x) )^{rj} is also separable. Furthermore, their product is also separable by Lemma 1; thus we have proved that p1 is separable. Similarly, it is easy to see that p2 is also separable, which completes the proof.

Now the statement of Theorem 1 is easily seen: f(φ(x)) and |Jφ(x)| are separable by Lemma 3 and Lemma 2, respectively. Hence their product f(φ(x)) |Jφ(x)| is also separable by Lemma 1, and using the basic properties of integral calculus, we get Eq. (21).
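Theorem 1 can be checked numerically on a toy example (entirely our own construction): take the linear map φ(x) = [a1x1 + a2x2, a3x1 + a4x2], for which |Jφ| = a1a4 − a2a3 is constant, and the polynomial f(y) = y1y2. The sum of f(φ(x))|Jφ(x)| over a point set then equals a combination of precomputed moments, weighted by functions of the unknowns a, exactly as in Eq. (21):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.uniform(-0.5, 0.5, size=(1000, 2))   # stand-in for the foreground pixel set

# Moments of the domain: computed once, independently of the unknowns a.
S20 = np.sum(F[:, 0]**2)
S11 = np.sum(F[:, 0] * F[:, 1])
S02 = np.sum(F[:, 1]**2)

def direct_sum(a):
    """Left-hand side of Eq. (21): sum of f(phi(x)) |J_phi(x)| evaluated directly."""
    a1, a2, a3, a4 = a
    y1 = a1 * F[:, 0] + a2 * F[:, 1]
    y2 = a3 * F[:, 0] + a4 * F[:, 1]
    return np.sum(y1 * y2) * (a1 * a4 - a2 * a3)   # |J_phi| is constant here

def separated_sum(a):
    """Right-hand side of Eq. (21): g_i(a) times the precomputed moments."""
    a1, a2, a3, a4 = a
    det = a1 * a4 - a2 * a3
    return det * (a1 * a3 * S20 + (a1 * a4 + a2 * a3) * S11 + a2 * a4 * S02)
```

Only the g_i(a) factors change during the iterations of the solver; the moments S20, S11, S02 are computed once, which is exactly the source of the speed-up discussed in Section 4.2.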

5 NUMERICAL IMPLEMENTATION

The equations in Eq. (20) are constructed in the continuum, but in practice we only have a limited-precision digital image. Consequently, the integrals can only be approximated by a discrete sum over the foreground pixels, introducing an inherent, although negligible, error into our computation. The continuous domains are represented as finite sets of foreground pixels, denoted by Ft and Fo, and Eq. (20) becomes

   (1/Ni) Σ_{y∈Fo} ωi(y) = (1/Ni) Σ_{x∈Ft} ωi(φ(x)) |Jφ(x)| ,   i = 1, …, ℓ .   (22)

In addition to the above equations, the final system to be solved may contain further equations depending on the adopted deformation model. These equations either improve numerical stability (as in the case of planar homography) or enforce additional constraints required by the model (e.g. thin plate spline).

Planar homography: Each equation of Eq. (22) can also be written in three alternative forms by making use of the corresponding inverse transformation χ⁻¹ and the reverse integral transformation x = χ⁻¹(y), dx = |Jχ⁻¹(y)| dy (see Eq. (24)):

   (1/Ni) Σ_{x∈Ft} ωi(x) = (1/Ni) Σ_{y∈Fo} ωi(χ⁻¹(y)) |Jχ⁻¹(y)|
   (1/Ni) Σ_{x∈Ft} ωi(x) |Jχ(x)| = (1/Ni) Σ_{y∈Fo} ωi(χ⁻¹(y))
   (1/Ni) Σ_{x∈Ft} ωi(χ(x)) = (1/Ni) Σ_{y∈Fo} ωi(y) |Jχ⁻¹(y)| .   (23)

Note that the above equations are mathematically equivalent to the original equation, so including them in the system is redundant in the mathematical sense. However, in practice they play an important role in ensuring the numerical stability of the final solution. Therefore, in our implementation we use all four equations for each ωi.

Thin plate spline: The number of required equations, hence the number of ωi functions, depends on how many control points we used. Furthermore, the constraints in Eq. (16) have to be included in the system of equations.

Solution and complexity: The obtained system of equations is then solved by iterative least-squares minimization using the Levenberg-Marquardt algorithm (LM) [15]. The simple pseudo code of the algorithm is shown in Algorithm 1. The time complexity of the algorithm is O(N + M) whenever we adopt a polynomial set {ωi}_{i=1}^ℓ. Note that LM finds a local minimum. However, our numerical experiments show that the solution found by LM is quite close to the geometrically correct one. A theoretical analysis would be far too complex, but intuitively we can argue as follows: to avoid geometrically wrong local minima, proper normalization is crucial. As explained in Section 4.1, the equations need to be balanced and the shapes must be normalized into the unit square. This guarantees that the shapes initially overlap, making the identity transform a good initialization, while balanced equations eliminate the undesirable bias during iterations caused by large coefficients in some equations. Finally, we have to remark that deformations with a higher degree of freedom (e.g. TPS) may have many geometrically correct solutions (i.e.

   χ⁻¹1(y) = x1 = [ (H22 − H32H23)y1 − (H12 − H32H13)y2 + H23H12 − H22H13 ] / D(y)
   χ⁻¹2(y) = x2 = [ −(H21 − H31H23)y1 + (H11 − H31H13)y2 − (H23H11 − H21H13) ] / D(y) ,   (24)

where D(y) = (H32H21 − H31H22)y1 − (H32H11 − H31H12)y2 + H22H11 − H21H12, and

   |Jχ⁻¹(y)| = |H|² / D(y)³ .


many transformations may produce an almost perfect alignment, due to the fact that deformations are only visible around the boundary of the shapes). Therefore, although the parameter space is of higher dimension, LM also has a higher chance of finding a local minimum close to one of these correct solutions.

Algorithm 1: Pseudo code of the proposed algorithm
Input: Binary images of the template and observation
Output: k parameters of the estimated transformation φ
1. Normalize each input shape into the unit square [−0.5, 0.5] × [−0.5, 0.5] such that the shape's center of mass becomes the origin.
2. Choose a set of ωi : R² → R (i = 1, …, ℓ ≥ k) functions as described in Section 4.
3. Construct the system of equations of Eq. (22) and include any additional equations if needed (e.g. Eq. (23)).
4. Find a least-squares solution of the system using a Levenberg-Marquardt algorithm. The solver is initialized with the parameters of the identity transformation.
5. Unnormalize the solution to obtain the parameters of the aligning transformation.
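The steps of Algorithm 1 can be sketched end to end for an affine deformation model. The following toy numpy re-implementation is our own (not the authors' Matlab code): it uses the power-function ω set, omits the Ni normalization, and, for simplicity, generates the observation-side sums of Eq. (22) at a known ground-truth transformation instead of from a segmented image:

```python
import numpy as np

rng = np.random.default_rng(1)
Ft = rng.uniform(-0.5, 0.5, size=(400, 2))   # normalized template "foreground pixels"
POWERS = [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2), (3, 0), (2, 1), (1, 2), (0, 3)]

def rhs(a, pts):
    """Right-hand side of Eq. (22) for the affine map
    phi(x) = [a0 x1 + a1 x2 + a2, a3 x1 + a4 x2 + a5], whose |J_phi| = a0 a4 - a1 a3."""
    y1 = a[0] * pts[:, 0] + a[1] * pts[:, 1] + a[2]
    y2 = a[3] * pts[:, 0] + a[4] * pts[:, 1] + a[5]
    det = a[0] * a[4] - a[1] * a[3]
    return np.array([np.sum(y1**n * y2**m) * det for n, m in POWERS])

a_true = np.array([1.05, 0.08, 0.03, -0.06, 0.97, -0.04])
b = rhs(a_true, Ft)                          # stands in for the sums over F_o

def residual(a):
    return rhs(a, Ft) - b

def levenberg_marquardt(a, max_iter=100, lam=1e-3):
    for _ in range(max_iter):
        r = residual(a)
        # numeric Jacobian of the residual vector
        J = np.column_stack([(residual(a + 1e-6 * e) - r) / 1e-6 for e in np.eye(len(a))])
        step = np.linalg.solve(J.T @ J + lam * np.eye(len(a)), -J.T @ r)
        if np.sum(residual(a + step)**2) < np.sum(r**2):
            a, lam = a + step, lam * 0.5     # accept the step, relax the damping
        else:
            lam *= 10.0                      # reject the step, increase the damping
        if np.linalg.norm(step) < 1e-12:
            break
    return a

a_est = levenberg_marquardt(np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))  # identity init
```

Starting from the identity, the damped iteration drives the residuals of the moment equations to zero and recovers the ground-truth parameters.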

6 EXPERIMENTAL RESULTS

The proposed method has been tested on various synthetic and real datasets. The performance of the algorithm has also been compared to two other nonlinear registration methods: Shape Context [3], which has been developed for general nonlinear registration, and homest [46], which implements a classical algorithm for homography estimation. The proposed algorithm has been implemented in Matlab R2008, and all tests have been run on a Pentium IV 3.2 GHz under a Linux operating system. The demo implementation of our method is available for download at http://www.inf.u-szeged.hu/∼kato/software/.

Registration results were quantitatively evaluated using two kinds of error measure. The first one (δ) is the absolute difference of the registered shapes, while ϵ measures the distance between the true transformation φ and the estimated transformation φ̂:

   δ = |Fr △ Fo| / (|Fr| + |Fo|) · 100% ,   ϵ = (1/|Ft|) Σ_{x∈Ft} ∥φ(x) − φ̂(x)∥ ,

where Ft, Fo, and Fr denote the set of foreground pixels of the template, observation, and registered template, respectively.

Intuitively, ϵ shows the average transformation error per pixel. Note that this measure can only be evaluated on synthetic images, where the applied transformation is known, while δ can always be computed. On the other hand, ϵ gives a better characterization of the transformation error, as it directly evaluates the mistransformation; δ sees only the percentage of non-overlapping area between the observation and the registered shape. Hence the value of δ also depends on the compactness, topology, and segmentation error of the shapes.
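Both measures are straightforward to compute; a minimal sketch (our own helper names; pixel sets are given as integer coordinate pairs and transformations as callables on (N, 2) arrays):

```python
import numpy as np

def delta_error(Fr, Fo):
    """delta of the text: symmetric difference of the two pixel sets, in percent."""
    Fr, Fo = set(map(tuple, Fr)), set(map(tuple, Fo))
    return 100.0 * len(Fr ^ Fo) / (len(Fr) + len(Fo))

def epsilon_error(Ft, phi_true, phi_est):
    """epsilon of the text: average distance between the true and estimated transforms."""
    d = phi_true(Ft) - phi_est(Ft)
    return float(np.mean(np.linalg.norm(d, axis=1)))
```

Identical pixel sets give δ = 0, disjoint ones δ = 100%, matching the normalization of the formula above.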

6.1 Comparison of various ω functions

According to our theoretical results presented in Section 4, we expect that the precision of the recovered transformation parameters is independent of the choice of the {ωi} set, as long as the equations are properly normalized. To verify these findings, we evaluated the registration quality of various {ωi} function sets. We considered power and trigonometric functions as well as polynomials, a total of 6 different sets (see Fig. 3):

Fig. 3. Plots of the tested {ωi} function sets of Eqs. (25)-(30).

1) Power functions

   ωi(x) = x1^ni x2^mi   (25)

with (ni, mi) ∈ {(0,0), (1,0), (0,1), (1,1), (2,0), (0,2), (2,1), (1,2), (2,2), (3,0), (0,3), (3,1), (1,3)}

2) Rotated power functions

   ωi(x) = (x1 cos αi − x2 sin αi)^ni (x1 sin αi + x2 cos αi)^mi   (26)

with αi ∈ {0, π/6, π/3} and (ni, mi) ∈ {(1,2), (2,1), (1,3), (3,1)}

3) Mixed trigonometric functions

   ωi(x) = sin(ni x1 π) cos(mi x2 π)   (27)

with (ni, mi) ∈ {(1,2), (2,1), (2,2), (1,3), (3,1), (2,3), (3,2), (3,3), (1,4), (4,1), (2,4), (4,2)}

4) Trigonometric functions

   ωi(x) = Qi(ni x1) Ri(mi x2)   (28)

with Qi(x), Ri(x) ∈ {sin(x), cos(x)} and (ni, mi) ∈ {(1,1), (1,2), (2,1)}

5) Polynomials

   ωi(x) = P_ni(x1) P_mi(x2)   (29)

with (ni, mi) ∈ {(1,2), (2,1), (1,3), (3,1), (2,3), (3,2), (1,4), (4,1), (2,4), (4,2), (3,4), (4,3)}, composed of the following random polynomials:

   P1(x) = 2x^2 − x − 1
   P2(x) = 2x^3 − x^2
   P3(x) = x^3 − 30x^2 + 3x + 2
   P4(x) = 3x^5 − x^2 + 5x − 1

6) Polynomials

   ωi(x) = L_ni(x1) L_mi(x2)   (30)


with (ni, mi) ∈ {(2,3), (3,2), (2,4), (4,2), (3,4), (4,3), (2,5), (5,2), (3,5), (5,3), (4,5), (5,4)}, composed of the following Legendre polynomials:

   L2(x) = (3x^2 − 1)/2
   L3(x) = (5x^3 − 3x)/2
   L4(x) = (35x^4 − 30x^2 + 3)/8
   L5(x) = (63x^5 − 70x^3 + 15x)/8

The quantitative evaluation of the above function sets is summarized in Table 1. Basically, all median δ errors lie between 0.1 and 0.2. Although the mean values have a slightly larger variance, this is mainly caused by a few outliers rather than by a systematic error. It is thus fair to say that the considered ωi functions perform equally well, which confirms our theoretical results.

The question therefore naturally arises: Which {ωi} set should be used? Or more generally: What properties should the {ωi} set have? From a theoretical point of view, there are only trivial restrictions on the applied functions: obviously, ωi must be integrable over the finite domains Fo and Ft. The functions also have to be rich enough, i.e. they have to produce a varying surface over the shape domain (e.g. see Fig. 3). For example, the constant function ω(x) ≡ c is clearly wrong, as it makes Eq. (8) always true independently of the underlying deformation. From a practical point of view, the picture is different: first of all, we have to numerically solve a system of integral equations. According to Theorem 1, we can reduce this problem to the solution of a nonlinear system of equations when the ωi functions are polynomial. The empirical results presented in this section show that registration quality is almost unaffected by the choice of ωi functions, but computational efficiency is clearly increased for a polynomial {ωi} set. Therefore we recommend using low-order polynomials for computational efficiency. In our experiments, we have used set 1), unless otherwise noted.
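For reference, the recommended set 1) is trivial to generate; a small sketch (our own helper, evaluating all 13 power functions of Eq. (25) at once):

```python
import numpy as np

# Exponent pairs (n_i, m_i) of set 1), Eq. (25).
EXPONENTS = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2),
             (2, 1), (1, 2), (2, 2), (3, 0), (0, 3), (3, 1), (1, 3)]

def omega_set1(x):
    """Evaluate omega_i(x) = x1^n * x2^m for all pairs; x is (N, 2), result (N, 13)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x[:, 0]**n * x[:, 1]**m for n, m in EXPONENTS])
```

Each column corresponds to one equation of the system; stacking them lets all ℓ sums of Eq. (22) be computed in a single pass over the foreground pixels.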

{ωi} set    δ (%)                  ϵ (pixel)
            m      µ      σ        m      µ      σ
1)          0.09   0.53   3.38     0.08   3.03   22.36
2)          0.11   1.01   5.01     0.10   4.40   24.14
3)          0.21   12.28  19.61    0.19   20.14  41.73
4)          0.12   1.52   6.25     0.11   6.02   25.79
5)          0.10   0.80   4.75     0.08   3.27   18.60
6)          0.10   0.99   4.84     0.08   4.17   20.78

TABLE 1
Quantitative comparison of various {ωi} function sets. m, µ, and σ denote the median, mean, and standard deviation.

6.2 Quantitative evaluation on synthetic data

Herein, we will focus on planar homography. Synthetic tests with other deformation models can be found in [14]. Our benchmark dataset contains 37 different shapes and their transformed versions, a total of ≈ 1500 images of size 256 × 256.

Fig. 4. Planar homographies: example images from the synthetic dataset (template, observation) and registration results obtained by Shape Context [3] and the proposed method. The observation and the registered template are overlaid; overlapping pixels are depicted in gray, whereas non-overlapping ones are shown in black.

     Runtime (sec.)          δ (%)                  ϵ (pixel)
     SC      P      T        SC     P      T        P      T
m    98.72   16.04  5.67     2.69   0.09   0.16     0.08   0.14
µ    102.78  27.04  6.52     4.41   0.54   0.88     2.97   3.79
σ    28.26   45.34  3.62     4.79   3.42   3.34     22.04  20.26

TABLE 2
Comparative tests of the proposed method on the synthetic dataset for recovering a planar homography. SC - Shape Context [3]; P - proposed method using Eq. (22)-(24); and T - using its Taylor expanded form. m, µ, and σ denote the median, mean, and standard deviation.

The applied plane projective transformations were randomly composed of scalings in [0.5, 1.5]; rotations in [−π/4, π/4] along the three axes; translations in [−1, 1] along both the x and y axes and in [0.5, 2.5] along the z axis; and a random focal length chosen from [0.5, 1.5]. Note that these are projective transformations mapping a template shape from a plane placed in 3D Euclidean space to the xy plane. Some typical examples of these images can be seen in Fig. 4, while a summary of registration results is presented in Table 2. We have also compared the performance of our method to that of Shape Context [3]. For testing, we used the program provided by the authors and set its parameters empirically to their optimal values (beta_init = 30, n_iter = 30, annealing rate r = 1). We remark that the program's only output is the registered shape, hence ϵ could not be computed. Finally, we demonstrate that by using the multivariate Taylor expanded form of the planar homography transformation presented in Section 3.2.1, CPU time can be considerably reduced at the price of a negligible loss in quality. In these tests, we have used the 5th order approximation of the integrands.


Fig. 5. Sample observations with various degradations: (a) missing pixels, (b) occlusion, (c) disocclusion, (d) boundary error.

(a) missing pixels           5%     10%    15%    20%
Shape Context [3]     m      21.85  24.91  26.38  27.2
                      σ      5.97   6.14   6.37   6.56
Proposed method       m      2.98   5.69   8.51   11.57
                      σ      4.13   5.23   6.09   6.74

(b) size of occlusion        1%     2.5%   5%     10%
Shape Context [3]     m      3.03   3.55   4.55   6.79
                      σ      4.79   4.79   5.09   7.03
Proposed method       m      1.41   3.40   6.19   11.27
                      σ      3.49   4.18   5.09   6.6

(c) size of disocclusion     1%     2.5%   5%     10%
Shape Context [3]     m      3.63   4.52   6.25   9.28
                      σ      5.19   5.61   6.84   7.78
Proposed method       m      1.93   4.54   8.28   13.62
                      σ      4.31   5.13   6.16   7.09

(d) size of boundary error   1%     5%     10%    20%
Shape Context [3]     m      2.86   3.78   4.68   6.92
                      σ      4.72   4.83   5.04   5.92
Proposed method       m      0.54   1.67   2.67   4.03
                      σ      3.28   3.5    3.9    4.47

TABLE 3
Median (m) and standard deviation (σ) of the δ error (%) vs. various types of segmentation errors, as shown in Fig. 5.

6.2.1 Robustness

In practice, segmentation never produces perfect shapes. Therefore we have also evaluated the robustness of the proposed approach against segmentation errors. Besides using various kinds of real images inherently subject to such errors, we have also conducted a systematic test on simulated data: in the first test case, 5%, …, 20% of the foreground pixels have been removed from the observation before registration. In the second case, we occluded continuous square-shaped regions of size equal to 1%, …, 10% of the shape, while in the third case we disoccluded a similar region. Finally, we randomly added or removed squares uniformly around the boundary, with a total size of 1%, …, 10% of the shape. Note that we do not include cases where erroneous foreground regions appear as disconnected regions, because such false regions can be efficiently removed by appropriate morphological filtering. We therefore concentrate on cases where segmentation errors cannot be filtered out. See samples of these errors in Fig. 5.

Table 3 shows that our method is quite robust whenever errors are uniformly distributed over the whole shape (first and fourth test cases). However, it becomes less stable in the case of larger localized errors, like occlusion and disocclusion. This is a usual behavior of area-based methods, because they rely on quantities obtained by integrating over the whole object area. Thus large missing parts drastically change these quantities, resulting in false registrations. Nevertheless, in many application areas one can take images under controlled conditions which guarantee that observations are not occluded (e.g. medical imaging, industrial inspection). Note also that Shape Context [3] is consistently outperformed by our method, except in the cases of occlusion and disocclusion.

6.3 Real images

Herein, we will demonstrate the relevance of our approach in various application domains using two common deformation models: planar homography and thin plate spline.

6.3.1 Planar homography

Traffic signs: Nowadays, modern cars include many safety systems. Automatic traffic sign recognition is a major challenge of such intelligent systems, where one of the key tasks is the matching of a projectively distorted sign with a template. Herein, we have used classical thresholding and some morphological operations for segmentation, but automatic detection/segmentation is also possible [47]. Fig. 6 and Fig. 8 show some registration results. Each image pair was taken from a different signboard. The main challenges were strong deformations, segmentation errors, and variations in the style of the depicted objects. For example, the observations in Fig. 6(f), Fig. 8(b), and Fig. 8(c) do not contain exactly the same shape as the object on the template. In particular, the STOP sign in Fig. 8(c) uses different fonts. In spite of these difficulties, our method was able to recover a quite accurate transformation (the average δ error was 12.66% on these images).

X-ray images: Traumatic hip replacement is a surgical procedure in which the hip joint is replaced by a prosthetic implant. In the short term post-operatively, infection is a major concern. An inflammatory process causes bone resorption and subsequent loosening or fracture, often requiring revision surgery. In current practice, clinicians assess loosening by inspecting a number of post-operative X-ray images taken over a period of time. Obviously, such an analysis requires the registration of the X-ray images, as shown in Fig. 7. Even visual inspection can benefit from registration, as clinically significant prosthesis movement can be very small [48]. Since one is looking for deformations of the bone surrounding the implant, alignment must be based on the implant, as it is the only imaged part which is guaranteed to remain unchanged from one image to the other. There are two main challenges here: one is the highly nonlinear radiometric distortion, which makes any graylevel-based method unstable [49]. Fortunately, the segmentation of the prosthetic implant is quite straightforward [50]; herein we used active contours to segment the implant [51]. The second problem is that the true transformation is not a plane projective one; it also depends on the position of the implant in 3D space. Indeed, there is a rigid-body transformation in 3D space between the implants, which becomes a nonlinear mapping between the X-ray images. Since the X-ray images are always taken in a well-defined standard position of the patient's leg, the planar homography transformation model was a good approximation here. Some registration results are presented in Fig. 7.



Fig. 6. Registration results on traffic signs. The first row shows the templates, while below them the corresponding observations with the overlaid contour of the registration results.

[Column δ errors in Fig. 7 — first row: SIFT [10] 18.65%, SC [3] 1.83%, Proposed 1.64%; second row: SIFT [10] 2.84%, SC [3] 10.23%, Proposed 1.32%.]

Fig. 7. Registration results on hip prosthesis X-ray images. The overlaid contours show the aligned contours of the corresponding images on the left. Images in the second column show the registration results obtained by SIFT [10]+homest [46], in the third column the results of Shape Context [3]+homest [46], while the last column contains the results of the proposed method.


Fig. 8. Registration results on traffic signs. The templates are in the first row, followed by the results obtained by SIFT [10]+homest [46] (second row); the third row shows the point correspondences between the images found by SIFT [10]. The results obtained by Shape Context [3]+homest [46] are in the fourth row, and those of the proposed method in the last row. The contours of the registered images are overlaid.

Comparison: Since the grayscale versions of the images were available, it was possible to compare our method to a feature-correspondence based solution. For that purpose, we have used homest [46], which implements a kind of "gold standard" algorithm composed of [52], [53]. The point correspondences have been extracted by the SIFT [10] method. As input, we provided the masked signboard region for traffic sign matching and the prosthesis region for medical registration. Furthermore, we have also extracted point correspondences established by Shape Context [3]. Here, the input was the binary mask itself, as for our method. Although the SIFT parameter called distRatio, controlling the number of extracted correspondences, has been manually fine-tuned, we could not get reliable results due to the lack of rich radiometric features. Fig. 7 shows two results on X-ray images, while on traffic signs (see Fig. 8), SIFT could not find enough correspondences in about half of the cases. As for Shape Context-based correspondences, we got somewhat better alignments (an average δ of 33.47% for the traffic sign images).
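For context, the correspondence-based baseline estimates a planar homography from the matched points. A minimal numpy sketch of the core direct linear transform (DLT) step is given below; the full "gold standard" pipeline in homest [46] additionally normalizes the points and refines the estimate nonlinearly, which this sketch omits.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) with dst ~ H @ src from >= 4
    point correspondences via the direct linear transform.
    src, dst: (N, 2) arrays. No Hartley normalization or nonlinear
    refinement -- the full gold-standard pipeline adds both."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector belonging to the
    # smallest singular value of A.
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Sanity check: recover a known homography from 5 exact matches.
H_true = np.array([[1.1, 0.02, 3.0],
                   [0.01, 0.9, -2.0],
                   [1e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 3]], float)
p = np.hstack([src, np.ones((5, 1))]) @ H_true.T
dst = p[:, :2] / p[:, 2:]
H = homography_dlt(src, dst)
```

With exact correspondences the true homography is recovered to machine precision; the comparison in the text shows how quickly this degrades when the correspondences themselves are unreliable.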


Fig. 9. Sample images from the MNIST dataset and registration results using a thin plate spline model. The first and second rows show the images used as templates and observations, while the 3rd and 4th rows show the registration results obtained by Shape Context [3] and the proposed method, respectively.

TABLE 4
Comparative results on 2000 image pairs from the MNIST database. m, µ, and σ stand for the median, mean, and standard deviation.

                      Runtime (sec.)           δ (%)
                      m      µ      σ       m      µ      σ
Shape Context [3]   35.02  34.43   7.58    7.86   9.40   4.71
Proposed method     10.00   9.81   1.47    7.66   8.93   4.22

6.3.2 Thin plate spline

Matching handwritten characters: The performance of our method has been evaluated on aligning handwritten digits from the MNIST dataset [54]. A standard approach to matching characters is to align the observation (to be recognized) with each of the digit templates, and recognize it as the template with the lowest deformation. A similar approach is used in [3], which can be applied in our case too. Herein, we will concentrate on the alignment of these characters. Since this is a free-form deformation, we used the thin plate spline model with 25 control points placed on a regular grid over the input shapes. The model has 2 · 25 + 6 = 56 parameters. The equations were generated using the function set Eq. (25) with parameters 0 ≤ mi, ni ≤ 8, mi + ni ≤ 8, resulting in an overdetermined system of 81 equations. The experiment consisted of ≈ 2000 test cases; some example images and registration results are shown in Fig. 9. Moreover, we also compared our results to Shape Context [3], which also uses a thin plate spline model, but its control points are placed on corresponding contour points. Comparative results in Table 4 show that our method provides slightly better matches within 1/3 of the CPU time.
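To illustrate how such a system is generated and solved, the following sketch sets up moment-type equations of the form ∫Fo ω(y) dy = ∫Ft ω(φ(x)) |Jφ(x)| dx for a point-sampled shape and solves them with a least-squares iteration. For brevity it uses an affine φ in place of the thin plate spline and a hand-rolled Gauss-Newton loop in place of Levenberg-Marquardt; the basis size and all names are our own choices.

```python
import numpy as np

# Moment-type basis omega_{m,n}(y) = y1^m * y2^n with m + n <= 3
# (a small stand-in for the function set Eq. (25)).
EXPS = [(m, n) for m in range(4) for n in range(4) if m + n <= 3]

def omega_sums(pts, jac):
    """Discrete version of the integral of omega_{m,n} * |J| over a
    point-sampled shape, for every basis function."""
    return np.array([jac * np.sum(pts[:, 0]**m * pts[:, 1]**n)
                     for m, n in EXPS])

def residuals(p, X, obs_moments):
    """Template-side integrals of omega(phi(x)) |J_phi| minus the
    observation-side moments, for an affine phi."""
    A, t = p[:4].reshape(2, 2), p[4:]
    return omega_sums(X @ A.T + t, np.linalg.det(A)) - obs_moments

# Synthetic test case: a template point cloud and a known affine map.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (400, 2))
A_true = np.array([[1.2, 0.3], [-0.2, 0.9]])
t_true = np.array([0.5, -0.4])
# Observation-side integrals, synthesized via change of variables.
obs = omega_sums(X @ A_true.T + t_true, np.linalg.det(A_true))

# Gauss-Newton on the overdetermined system (10 equations, 6 unknowns),
# starting from the identity transformation.
p = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
for _ in range(100):
    r = residuals(p, X, obs)
    J = np.empty((len(r), 6))
    for j in range(6):                       # forward-difference Jacobian
        dp = np.zeros(6)
        dp[j] = 1e-6
        J[:, j] = (residuals(p + dp, X, obs) - r) / 1e-6
    p = p - np.linalg.lstsq(J, r, rcond=None)[0]
# p now holds [a11, a12, a21, a22, t1, t2] of the aligning map.
```

Switching to the thin plate spline model only changes `residuals`: φ and |Jφ| would be evaluated from the TPS parameters instead of the 2×2 matrix and translation.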

Aligning multimodal prostate images: Transrectal Ultrasound (TRUS) guided needle biopsy is used to confirm the presence of cancerous tissues in the prostate [55], [56]. Unfortunately, the localization of malignant tissue regions is challenged by the rather low signal-to-noise ratio of Ultrasound (US) images. Therefore, in clinical practice, samples are collected from different zones of the prostate to maximize the chance of locating malignant tissues. The superior contrast of soft tissues of the prostate gland in MR images can considerably improve detection of cancerous tissues, but interventional MRI-guided biopsy is expensive and complicated. Therefore a viable solution is the fusion of the two modalities to exploit the high

Fig. 10. Alignment of MRI (left) and US (right) prostate images using a TPS deformation model. The contours of the registered MRI images are overlaid on the US images. δ errors are 2.12% (first row) and 1.88% (second row).

quality of MR images in TRUS interventional biopsies [55], [56]. An essential part of this procedure is the alignment of the segmented prostate regions in the two modalities. Since the prostate may undergo deformations due to the insertion of the endorectal probe through the rectum during the MR imaging, as well as the inflation of the endorectal balloon, nonlinear registration is needed for the multimodal alignment. Due to the rather different content of these modalities, radiometric information cannot be used reliably. Fortunately, the segmentation of the prostate is available in both modalities, which is efficiently obtained by an Active Appearance Model [55]. Based on these segmentations, point correspondences are established by making use of the prostate geometry in [56]. In our case, however, there is no need for further processing; the shapes can be directly aligned without established correspondences. Fig. 10 shows some examples of aligned prostate images obtained by our method.

6.4 Application in industrial inspection

An important step in hose manufacturing for the automotive industry is to print various signs on the hose surface in order to facilitate installation (see Fig. 11). The quality control of this process involves visual inspection of the printed signs. In an automated inspection system, this can be implemented by comparing images of the printed sign to its template, which requires the alignment of the template and observation shapes. The main challenges are segmentation errors and complex distortions. The physical model of the contact printing procedure is as follows:

1) The stamp (basically a planar template of the sign) is positioned on the hose surface. This can be described by a 2D rotation and scaling S : R2 → R2 of the template.

2) Then the stamp is pressed onto the surface, modeled as a transformation γ : R2 → R3 that maps the template's plane to a cylinder with radius r:

γ(x) = [r sin(x1/r), x2, −r cos(x1/r)]^T.


Fig. 11. Registration results of printed signs. Top: planar templates. Bottom: the corresponding observations with the overlaid contour of the registration results. The first image pair shows the segmented regions used for registration. Note the typical segmentation errors. (Images provided by ContiTech Fluid Automotive Hungaria Ltd.)

3) Finally, a picture is taken with a camera, which is described by a classical projective transformation P : R3 → R2 with an unknown camera matrix.

Thus the transformation

φ(x) = (P ◦ γ ◦ S)(x) (31)

acting between a planar template and its distorted observation has 11 parameters: S has 3 parameters, γ has one (r), and P has 7 (six extrinsic parameters and the focal length). The Jacobian |Jφ| is straightforward to compute, although it yields a lengthy formula that we omit here due to lack of space. Equations were generated by the function set Eq. (26) using all combinations of the parameters αi ∈ {0, π/6, π/3} and (ni, mi) ∈ {(1, 2), (2, 1), (1, 3), (3, 1)}, yielding a system of 12 equations. The method has been tested on more than 150 real images and it proved to be efficient in spite of segmentation errors and severe distortions.
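The three printing steps can be sketched directly as a composition of maps. The parameterization below is illustrative only: our S carries just a rotation angle and a scale, and P is a plain pinhole camera with pose (R, t) and focal length f; the exact parameterization used in the experiments is not spelled out in the text.

```python
import numpy as np

def S(x, theta, s):
    """Step 1: in-plane rotation and scaling of the planar template."""
    c, si = np.cos(theta), np.sin(theta)
    return s * (np.array([[c, -si], [si, c]]) @ x)

def gamma(x, r):
    """Step 2: wrap the plane onto a cylinder of radius r."""
    x1, x2 = x
    return np.array([r * np.sin(x1 / r), x2, -r * np.cos(x1 / r)])

def P(X, f, R, t):
    """Step 3: pinhole projection with focal length f and pose (R, t)."""
    Xc = R @ X + t
    return f * Xc[:2] / Xc[2]

def phi(x, theta, s, r, f, R, t):
    """Eq. (31): the full template-to-image map P o gamma o S."""
    return P(gamma(S(x, theta, s), r), f, R, t)
```

Note that γ is an isometry of the plane onto the cylinder: the x1 coordinate becomes arc length, so printed distances along the hose circumference are preserved before projection.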

7 CONCLUSION

We have proposed a general framework for recovering diffeomorphic deformations. The fundamental difference compared to classical image registration algorithms is that our model works without any landmarks, feature detection, or correspondences by adopting a novel idea where the transformation is obtained as a solution of a set of nonlinear equations. Experimental results show that the proposed method provides good alignment on both real and synthetic images. Furthermore, its robustness has been demonstrated on a large synthetic dataset as well as on real images. Although our method clearly dominates state-of-the-art correspondence-based methods, it has to be noted that, being calculated from the whole object, our equations are sensitive to partial occlusions. On the other hand, a common limitation of classical approaches is that they assume a deformation close to identity in order to establish reliable correspondences. Therefore we see our contribution as a complementary method rather than a replacement for all previous registration algorithms. Its superiority can be fully exploited in applications where occlusion can be kept at a minimum (e.g. medical imaging or industrial inspection), while feature-based methods can be more efficient when occlusions are common (e.g. surveillance). A rigorous theoretical analysis of selecting an optimal {ωi} set has also been presented, and our findings have been confirmed experimentally. A unique feature of the proposed framework is that it can be used not only with standard transformations but also with application-specific deformation models.

Finally, we would like to mention two open questions which might inspire further research. In this paper, we have been dealing with binary images, but the extension of our method to gray-level images is feasible. For example, assuming that the image functions f(y) and g(x) are covariant under the unknown transformation φ, i.e. g(x) = f(φ(x)) = f(y), we could rewrite Eq. (9) as

∫Fo y ω(f(y)) dy = ∫Ft φ(x) ω(g(x)) |Jφ(x)| dx.    (32)

The main advantage of this equation is that the nonlinear function ω acts on the image functions f and g, hence avoiding a nonlinear mapping of the unknown deformation parameters. Unfortunately, the above equation also implies that no radiometric distortion is allowed – something which is quite unlikely in any real imaging system when objects are subject to elastic deformations. Therefore the key challenge here is to recover the radiometric distortion.
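The identity in Eq. (32) is essentially the change-of-variables formula, which can be checked numerically. The sketch below does so for a simple diagonal affine φ (chosen so that both domains are rectangles that can be gridded independently) and an arbitrary smooth f with g(x) = f(φ(x)); both sides are approximated by midpoint-rule sums. The specific φ, f and ω are our own illustrative choices.

```python
import numpy as np

# Diagonal affine phi (a stand-in diffeomorphism chosen so that the
# deformed domain Fo = phi(Ft) is a rectangle we can grid directly).
a, b, c, d = 2.0, 0.7, 0.3, -0.1
det_J = a * b                                # |J_phi| is constant here

f = lambda y1, y2: np.sin(y1) + y2**2        # template image function
omega = lambda z: z**2                       # a nonlinear omega
# Covariance g(x) = f(phi(x)) means omega(g(x)) = omega(f(phi(x))).

# Right-hand side of Eq. (32): integrate over Ft = [0,1]^2.
n = 400
u = (np.arange(n) + 0.5) / n                 # midpoint grid
X1, X2 = np.meshgrid(u, u, indexing="ij")
Y1, Y2 = a * X1 + c, b * X2 + d              # phi(x)
w = omega(f(Y1, Y2)) * det_J / n**2
rhs = np.array([(Y1 * w).sum(), (Y2 * w).sum()])

# Left-hand side: integrate over Fo = [c, a+c] x [d, b+d] on its own,
# deliberately different, grid.
m = 557
v = (np.arange(m) + 0.5) / m
G1, G2 = np.meshgrid(c + a * v, d + b * v, indexing="ij")
ww = omega(f(G1, G2)) * (a * b) / m**2       # cell area = (a/m)*(b/m)
lhs = np.array([(G1 * ww).sum(), (G2 * ww).sum()])
```

The two vector-valued integrals agree up to discretization error, confirming that the equation holds without any radiometric distortion between f and g.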

The second open problem is the optimal choice of the ωi functions with respect to a given shape. Although in our experiments we did not observe this kind of dependency, it is definitely a valid and interesting question. Since the integration domains in Eq. (9) are determined by the shapes, the integrals are clearly influenced by the characteristics of the ωi functions over the shapes. The right answer may also depend on the interpretation of optimality. In this paper, we explored the case when optimality means computational efficiency. We have shown that a polynomial {ωi} set is the most favorable, as iterative LSE methods (like the Levenberg-Marquardt algorithm) do not need to scan through the input shapes at each iteration. Note that under certain conditions discussed in Section 2.2, such a function set yields the classical shape moments. However, if optimality means a minimum number of equations, then the optimal set may depend on the shape as well as on the actual deformation. Since we want to recover an aligning transformation, this latter is also an important factor, as the equations need to represent the deformation rather than the underlying shape. The analysis of this dependency is definitely interesting from a theoretical point of view.
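The computational advantage of a polynomial {ωi} set can be seen concretely: for a polynomial ω and, say, an affine φ, the template-side integral expands into a polynomial in the transformation parameters whose coefficients are geometric moments of the template, computed once. A small sketch (with ω(y) = y1 and names of our own choosing) compares this moment form against a per-pixel evaluation:

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, (1000, 2))          # template foreground pixels

# Geometric moments M_{mn} = sum x1^m x2^n, computed ONCE.
M = {(m, n): np.sum(pts[:, 0]**m * pts[:, 1]**n)
     for m in range(3) for n in range(3)}

def lhs_from_moments(a11, a12, t1, detA):
    """omega(y) = y1: integral of (a11*x1 + a12*x2 + t1)*|J| over the
    template, expanded into precomputed moments -- no pass over the
    pixels is needed inside the optimization loop."""
    return detA * (a11 * M[1, 0] + a12 * M[0, 1] + t1 * M[0, 0])

def lhs_direct(a11, a12, t1, detA):
    """The same quantity by scanning every pixel (what each iteration
    of a Levenberg-Marquardt solver would otherwise cost)."""
    return detA * np.sum(a11 * pts[:, 0] + a12 * pts[:, 1] + t1)
```

Both forms agree for arbitrary parameter values, but the moment form costs O(1) per equation per iteration instead of O(number of pixels).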

REFERENCES

[1] L. Zagorchev and A. Goshtasby, "A comparative study of transformation functions for nonrigid image registration," IEEE Transactions on Image Processing, vol. 15, pp. 529–538, March 2006.

[2] J. A. Ventura and W. Wan, "Accurate matching of two-dimensional shapes using the minimal tolerance zone error," Image and Vision Computing, vol. 15, pp. 889–899, December 1997.

[3] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape context," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 509–522, April 2002.

[4] D. L. G. Hill, P. G. Batchelor, M. Holden, and D. J. Hawkes, "Medical image registration," Physics in Medicine and Biology, vol. 46, pp. R1–R45, March 2001.

[5] B. Zitova and J. Flusser, "Image registration methods: A survey," Image and Vision Computing, vol. 21, pp. 977–1000, October 2003.

[6] J. B. A. Maintz and M. A. Viergever, "A survey of medical image registration," Medical Image Analysis, vol. 2, pp. 1–36, March 1998.

[7] H. Guo, A. Rangarajan, S. Joshi, and L. Younes, "Non-rigid registration of shapes via diffeomorphic point matching," in Proceedings of International Symposium on Biomedical Imaging: From Nano to Macro, vol. 1, (Arlington, VA, USA), pp. 924–927, IEEE, April 2004.

[8] S. Mann and R. W. Picard, "Video orbits of the projective group: A simple approach to featureless estimation of parameters," IEEE Transactions on Image Processing, vol. 6, pp. 1281–1295, September 1997.

[9] M. S. Hansen, M. F. Hansen, and R. Larsen, "Diffeomorphic statistical deformation models," in Proceedings of International Conference on Computer Vision, (Rio de Janeiro, Brazil), pp. 1–8, IEEE, October 2007.

[10] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, pp. 91–110, November 2004.

[11] S. Kaneko, Y. Satoh, and S. Igarashi, "Using selective correlation coefficient for robust image registration," Pattern Recognition, vol. 36, pp. 1165–1173, May 2003.

[12] K. M. Simonson, S. M. Drescher, and F. R. Tanner, "A statistics-based approach to binary image registration with uncertainty analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 112–125, January 2007.

[13] J. Nemeth, C. Domokos, and Z. Kato, "Recovering planar homographies between 2D shapes," in Proceedings of International Conference on Computer Vision, (Kyoto, Japan), pp. 2170–2176, IEEE, September 2009.

[14] J. Nemeth, C. Domokos, and Z. Kato, "Nonlinear registration of binary shapes," in Proceedings of International Conference on Image Processing, (Cairo, Egypt), pp. 1001–1004, IEEE, November 2009.

[15] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM Journal on Applied Mathematics, vol. 11, no. 2, pp. 431–441, 1963.

[16] J. M. Francos, R. Hagege, and B. Friedlander, "Estimation of multidimensional homeomorphisms for object recognition in noisy environments," in Proceedings of Conference on Signals, Systems and Computers, vol. 2, (Pacific Grove, CA, USA), pp. 1615–1619, November 2003.

[17] V. Lepetit and P. Fua, "Keypoint recognition using randomized trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1465–1479, September 2006.

[18] M. P. Kumar, S. Kuthirummal, C. V. Jawahar, and P. J. Narayanan, "Planar homography from Fourier domain representation," in Proceedings of International Conference on Signal Processing and Communications, (Bangalore, India), pp. 560–564, IEEE, December 2004.

[19] M. Nielsen, P. Johansen, A. D. Jackson, B. Lautrup, and S. Hauberg, "Brownian warps for non-rigid registration," Journal of Mathematical Imaging and Vision, vol. 31, pp. 221–231, July 2008.

[20] A. Shekhovtsov, I. Kovtun, and V. Hlavac, "Efficient MRF deformation model for non-rigid image matching," Computer Vision and Image Understanding, vol. 112, pp. 91–99, October 2008.

[21] S. Klein, J. P. Pluim, M. Staring, and M. A. Viergever, "Adaptive stochastic gradient descent optimisation for image registration," International Journal of Computer Vision, vol. 81, pp. 227–239, March 2009.

[22] S. Marsland and C. Twining, "Constructing diffeomorphic representations for the groupwise analysis of nonrigid registrations of medical images," IEEE Transactions on Medical Imaging, vol. 23, pp. 1006–1020, August 2004.

[23] V. Arsigny, O. Commowick, N. Ayache, and X. Pennec, "A fast and log-Euclidean polyaffine framework for locally linear registration," Journal of Mathematical Imaging and Vision, vol. 33, pp. 222–238, February 2009.

[24] C. Domokos and Z. Kato, "Parametric estimation of affine deformations of planar shapes," Pattern Recognition, vol. 43, pp. 569–578, March 2010.

[25] T. Suk and J. Flusser, "Affine normalization of symmetric objects," in Proceedings of International Conference on Advanced Concepts for Intelligent Vision Systems (J. Blanc-Talon, W. Philips, D. Popescu, and P. Scheunders, eds.), vol. 3708 of Lecture Notes in Computer Science, (Antwerp, Belgium), pp. 100–107, Springer, September 2005.

[26] P. Jain and C. V. Jawahar, "Homography estimation from planar contours," in Proceedings of International Symposium on 3D Data Processing, Visualization, and Transmission, (Chapel Hill, NC, USA), pp. 877–884, June 2006.

[27] J. Wang and Y. Liu, "Characteristic line of planar homography matrix and its applications in camera calibration," in Proceedings of International Conference on Pattern Recognition, vol. 1, (Hong Kong, China), pp. 147–150, IEEE, August 2006.

[28] A. M. Bronstein, M. M. Bronstein, A. M. Bruckstein, and R. Kimmel, "Analysis of two-dimensional non-rigid shapes," International Journal of Computer Vision, vol. 78, pp. 67–88, June 2008.

[29] H. D. Tagare, D. Groisser, and O. Skrinjar, "Symmetric non-rigid registration: A geometric theory and some numerical techniques," Journal of Mathematical Imaging and Vision, vol. 34, pp. 61–88, May 2009.

[30] A. Yezzi, L. Zollei, and T. Kapur, "A variational framework for joint segmentation and registration," in Proceedings of IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, (Kauai, HI, USA), pp. 44–51, IEEE, December 2001.

[31] K. M. Simonson, S. M. Drescher, and F. R. Tanner, "A statistics-based approach to binary image registration with uncertainty analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 112–125, January 2007.

[32] S. Worz and K. Rohr, "Physics-based elastic registration using non-radial basis functions and including landmark localization uncertainties," Computer Vision and Image Understanding, vol. 111, no. 3, pp. 263–274, 2008.

[33] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, pp. 346–359, August 2008.

[34] Z. Tu, S. Zheng, and A. Yuille, "Shape matching and registration by data-driven EM," Computer Vision and Image Understanding, vol. 109, pp. 290–304, March 2008.

[35] J. Flusser, T. Suk, and B. Zitova, Moments and Moment Invariants in Pattern Recognition. Wiley & Sons, October 2009.

[36] M.-K. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, pp. 179–187, February 1962.

[37] A. Foulonneau, P. Charbonnier, and F. Heitz, "Multi-reference shape priors for active contours," International Journal of Computer Vision, vol. 81, pp. 68–81, January 2009.

[38] M. R. Teague, "Image analysis via the general theory of moments," Journal of the Optical Society of America, vol. 70, pp. 920–930, August 1980.

[39] L. van Gool, T. Moons, E. Pauwels, and A. Oosterlinck, "Vision and Lie's approach to invariance," Image and Vision Computing, vol. 13, pp. 259–277, May 1995.

[40] J. Flusser and T. Suk, "Projective moment invariants," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, pp. 1364–1367, October 2004.

[41] J. Flusser and T. Suk, "Pattern recognition by affine moment invariants," Pattern Recognition, vol. 26, pp. 167–174, January 1993.

[42] J. Flusser, J. Kautsky, and F. Sroubek, "Implicit moment invariants," International Journal of Computer Vision, vol. 86, pp. 72–86, January 2010.

[43] A. Goshtasby, "Registration of images with geometric distortions," IEEE Transactions on Geoscience and Remote Sensing, vol. 26, pp. 60–64, January 1988.

[44] F. L. Bookstein, "Principal warps: Thin-plate splines and the decomposition of deformations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 567–585, June 1989.

[45] R. Merris, Combinatorics. Wiley-Interscience Series in Discrete Mathematics and Optimization, John Wiley & Sons, 2 ed., August 2003.


[46] M. Lourakis, "homest: A C/C++ library for robust, non-linear homography estimation." software, http://www.ics.forth.gr/~lourakis/homest/, 2008.

[47] C. F. Paulo and P. L. Correia, "Automatic detection and classification of traffic signs," in Proceedings of Workshop on Image Analysis for Multimedia Interactive Services (L. O'Conner, ed.), (Santorini, Greece), pp. 11–14, IEEE, June 2007.

[48] M. Downing, P. Undrill, P. Ashcroft, D. Hukins, and J. Hutchison, "Automated femoral measurement in total hip replacement radiographs," in Proceedings of International Conference on Image Processing and Its Applications, vol. 2, (Dublin, Ireland), pp. 843–847, IEEE, July 1997.

[49] C. Florea, C. Vertan, and L. Florea, "Logarithmic model-based dynamic range enhancement of hip X-ray images," in Proceedings of International Conference on Advanced Concepts for Intelligent Vision Systems (J. Blanc-Talon, W. Philips, D. Popescu, and P. Scheunders, eds.), vol. 4678 of Lecture Notes in Computer Science, (Delft, Netherlands), pp. 587–596, Springer, August 2007.

[50] A. Oprea and C. Vertan, "A quantitative evaluation of the hip prosthesis segmentation quality in X-ray images," in Proceedings of International Symposium on Signals, Circuits and Systems, vol. 1, (Iasi, Romania), pp. 1–4, IEEE, July 2007.

[51] T. Boudier, "The snake plugin for ImageJ." software, http://www.snv.jussieu.fr/~wboudier/softs/snake.html.

[52] Z. Zhang, R. Deriche, O. D. Faugeras, and Q. T. Luong, "A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry," Artificial Intelligence, vol. 78, pp. 87–119, October 1995.

[53] R. Hartley, "In defense of the eight-point algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 580–593, June 1997.

[54] Y. LeCun and C. Cortes, "The MNIST database of handwritten digits." database, http://yann.lecun.com/exdb/mnist/.

[55] S. Ghose, A. Oliver, R. Marti, X. Llado, J. Freixenet, J. C. Vilanova, and F. Meriaudeau, "Prostate segmentation with texture enhanced active appearance model," in Proceedings of International Conference on Signal-Image Technology and Internet Based Systems, (Kuala Lumpur, Malaysia), pp. 18–22, IEEE, December 2010.

[56] J. Mitra, A. Oliver, R. Marti, X. Llado, J. C. Vilanova, and F. Meriaudeau, "A thin-plate spline based multimodal prostate registration with optimal correspondences," in Proceedings of International Conference on Signal-Image Technology and Internet Based Systems, (Kuala Lumpur, Malaysia), pp. 7–11, IEEE, December 2010.

Csaba Domokos (SM07) received the MS degree in computer science and the MS degree in Mathematics from the University of Szeged, Hungary, in 2006 and 2010, respectively. He obtained his PhD degree from the University of Szeged in 2011. His current research interests include computer vision, image based rendering, image registration, image transformations and shape matching.

Jozsef Nemeth received the MS degree in computer science from the University of Szeged, Hungary, in 2009. He then started his PhD studies at the Department of Image Processing and Computer Graphics, University of Szeged. His current research interests include image segmentation, image registration, image transformations, shape matching and handwritten signature verification.

Zoltan Kato received the BS and MS degrees in computer science from the Jozsef Attila University, Szeged, Hungary, in 1988 and 1990, and the PhD degree from the University of Nice in 1994, doing his research at INRIA – Sophia Antipolis, France. Since then, he has been a visiting research associate at the Computer Science Department of the Hong Kong University of Science & Technology; an ERCIM postdoc fellow at CWI, Amsterdam; and a visiting fellow at the School of Computing, National University of Singapore.

In 2002, he joined the Institute of Informatics, University of Szeged, Hungary, where he is heading the Department of Image Processing and Computer Graphics. His research interests include image segmentation, statistical image models, Markov random fields, color, texture, motion, shape modeling, and variational and level set methods. He is a Senior Member of the IEEE and President of the Hungarian Association for Image Analysis and Pattern Recognition.