
AUTOMATIC CONSTRUCTION OF 3D PARTS+GEOMETRY MODELS

Medical Image Registration In Extreme Cases

Kasper Korsholm Marstal

IMM-M.Sc.-2013-????

Department of Applied Mathematics and Computer Science, DTU Compute

Technical University of Denmark

26 July 2013 – 1st edition

Kasper Korsholm Marstal: Automatic Construction of 3D Parts+Geometry Models, Medical Image Registration In Extreme Cases, © 26 July 2013

Handmade using environmentally friendly code.

DECLARATION

This thesis was prepared at the Department of Informatics and Mathematical Modelling at the Technical University of Denmark in fulfilment of the requirements for acquiring an M.Sc. in Medicine & Technology.

Kgs. Lyngby, 26-July-2013

Kasper Korsholm Marstal (s112392)

ABSTRACT

Image registration is a key algorithmic component of statistical analysis and machine learning in medical image processing. However, due to the variability of shape and appearance of complex structures in images, algorithms may reach different local minima more or less close to the optimal solution without proper initialization. Today, an affine transformation is commonly used as initialization to mitigate these issues. Affine initialization may recover global translation, orientation and scale but may be insufficient in regions where local deformation is required. A novel and fairly sophisticated method of obtaining initialization is the Parts+Geometry Model (PGM). In this thesis we extend the work of Zhang et al. on 2D Parts+Geometry Models to 3D images. A Parts+Geometry Model learns the spatial relationships between anatomical landmarks in images to constrain a local deformation model and find a sparse set of corresponding landmarks in consistent geometric configurations. The proposed method is compared to affine initialization, along with its effect on groupwise non-rigid registration, on several medical data sets of varying difficulty. However, the proposed method improved registration in only a few cases, and more development is needed for the algorithm to be generally applicable.

RESUMÉ (DANISH ABSTRACT)

Image registration is necessary for comparing information from different medical images. Large differences between structures, organs and tissue complicate the procedure and may cause registration to fail. Today, linear transformations are often applied as a preliminary step to better align objects in different images before the actual registration process. A linear transformation warps the whole image at once and can therefore not compensate for several objects in the same image at the same time. For this purpose, the parts+geometry paradigm was recently proposed by Zhang et al. for 2D images. The method learns how structures in images are positioned relative to each other and uses this information to find a set of point-wise correspondences between images. In this thesis the method is extended to 3D images. Our method is compared to the standard linear algorithm on several data sets of varying difficulty. Unfortunately, the 3D parts+geometry method improves the registration result in only a few cases, and the method must be developed further before it can be considered generally applicable.


Whenever I don’t know how to do something I just write the interface for it and pass it to a student.

— Tim Cootes

ACKNOWLEDGEMENTS

First of all, I would like to thank Rasmus Poulsen and Tim Cootes for giving me the opportunity to come to Manchester to work on the thesis and for their excellent guidance throughout the process. A special thanks to Tim Cootes for providing much of the existing code, for our weekly meetings and for introducing me to all of his kind colleagues, which made the start-up period very easy. Also thanks to Tim Cootes for providing MR images and segmentations of the head, thanks to Xin Cheng for providing CT images and segmentations of the wrist, thanks to the Parker Institute for providing MR images of knees and manual segmentations of cartilage, and many thanks to Rasmus Bouert at Frederiksberg Hospital for providing MR images of wrists, knees and heel pads.

Many thanks to Julia, David, Vivian, Elke, Ben, Tom and all the other friends I made over there who made my stay really great. I remember popping into the local pub on my way home after working late into the night juuuust in case you were there - and you were! Things like this made my stay fabulous!

Thanks to all my friends back in Denmark. Everyone from Bülowsvej, Rysensteen, Ry Højskole and Egmont Kollegiet. University would not have been the same without such a great group of friends outside academia. Thank you Nabo, your random Facebook posts reminded me of home while I was away. And a warm thanks to my friends in physics at the University of Copenhagen. University would not have been the same without such a great group of friends in academia!

Finally, I would like to thank my family. From t = 0 they have given me infinite support. Especially my mother, who may actually have given me infinity+1 support. The last three weeks of the write-up were done in her summerhouse under her caretaking (and careful supervision). I literally had three different kinds of homemade jam on the same day, and I doubt the thesis would have been finished in time without the subsequent sugar rush.


CONTENTS

i medical image registration

1 introduction
  1.1 Objectives
  1.2 Overview of The Proposed Method
  1.3 Contributions
  1.4 Experimental Design
  1.5 Outline of Thesis
  1.6 Mathematical Notation

2 literature review
  2.1 Image Registration
  2.2 Parts+Geometry Models
    2.2.1 Parts+Geometry Models For Object Recognition
    2.2.2 Parts+Geometry Models In Medical Image Analysis
  2.3 Mathematical Framework of Image Registration
    2.3.1 Pairwise Registration
    2.3.2 Groupwise Volumetric Registration
    2.3.3 Parts+Geometry Models in 2D
  2.4 Summary

ii 3d parts+geometry models

3 3d parts+geometry models
  3.1 Introduction
  3.2 Model Building Process
  3.3 Keypoint Detector
    3.3.1 Scale-Space Theory
    3.3.2 Scale-normalized Derivative Operators
    3.3.3 Discrete Scale-space
    3.3.4 Sub-voxel Keypoint Localization
  3.4 Keypoint Descriptor
    3.4.1 Mathematical Formulation
    3.4.2 Similarity Metric
    3.4.3 Implementation
  3.5 Tuning of Hyperparameters
  3.6 Summary

4 experimental design
  4.1 Experiments
  4.2 Data
  4.3 Performance Assessment
  4.4 Affine- And Groupwise Registration Implementation
  4.5 Generating Synthetic Anatomical Variation
    4.5.1 Method

5 discussion and conclusion
  5.1 Experiment 1
  5.2 Experiment 2
  5.3 Experiment 3
  5.4 Experiment 4
  5.5 Discussion
  5.6 Conclusion

bibliography

LIST OF FIGURES

Figure 1 Schematic depiction of the local minima problem in image registration. (a)-(c) show three synthetic images, each consisting of three smaller spheres at different positions relative to a larger, bounding sphere. (d) is the mean image of (a)-(c). (e) is the mean image after affine initialization. (f) is the mean image after multi-resolution groupwise non-rigid registration of the images in (e). The groupwise registration is clearly susceptible to the local minima in which the affine initialization was trapped.

Figure 2 Schematic depiction of geometrical definitions associated with an image grid and its voxels adopted in this work. Sketches adapted from the ITK software guide [47].

Figure 3 Schematic depiction of a typical pairwise registration system. Sketch inspired by [55].

Figure 4 Pairwise registration of two MR images of the head, one in red and one in blue. In (b) correspondences turn pink since red and blue contribute equally in RGB-space. Images had similar pose and did not need initialization.

Figure 5 Schematic depiction of a typical groupwise registration system. Sketch inspired by [55].

Figure 6 Groupwise registration of six images. In (b) correspondences turn grey since all colors contribute equally in RGB-space. Images had similar pose and did not need initialization.

Figure 7 Schematic depiction of the matching of a model to a new image. A weight ωij is assigned to every match candidate j of a keypoint i using (6), and the Viterbi algorithm finds the most likely configuration of keypoints on the target image in a maximum-likelihood sense. Arrows denote optimal correspondences estimated by the model. On the left, the black dots denote fixed image keypoints located by the keypoint detector and the black lines connecting the dots denote their geometrical relationship. On the right, circles denote the neighborhood on the moving image in which match candidates (grey circles) are defined on a regularly spaced grid. The double-stroke lines denote the default geometric relationships and dashed lines denote geometrical relationships estimated by the model.

Figure 8 Schematic depiction of the model building process of 3D PGMs.

Figure 9 Schematic depiction of the basic scale problem as it occurs in edge detection. The dots are grey level values, the dotted line represents the gradient on a fine scale, the dashed line represents the gradient on a medium scale and the solid line represents the gradient on a coarse scale. Sketch adapted from [12].

Figure 10 Schematic depiction of non-creation and non-enhancement of local extrema in the one-dimensional case. Sketch adapted from [12].

Figure 11 Schematic depiction of scale-space pyramids. Sketches adapted from [12].

Figure 12 Detected keypoints on an MRI of the head.

Figure 13 An example of a spin image from an MRI of the head. (a)-(c) show the gradient direction and magnitude projected onto the axial, coronal and sagittal planes respectively. (d) is the spin image at this position.

Figure 13 Example images from data sets and associated manual segmentations used in this work. Left column shows axial view, center column shows sagittal view and right column shows coronal view. The name used to identify each dataset throughout this section is given in monospaced text. (a)-(c) spheres are synthetic images used to illustrate the problem of local minima in the introduction, (d)-(f) cmamri are T1-weighted MRIs of the head, (g)-(h) wristct are CT images of the wrist, (j)-(l) heelpadmri are T1-weighted MRIs of the heel pad of healthy subjects, (m)-(o) kneemri are T2-weighted MRIs of knees of osteoarthritic patients, (p)-(r) wristmri are T2-weighted MRIs of the wrist of healthy subjects provided by the radiology department at Frederiksberg Hospital and (s)-(u) are T2-weighted MRIs of knees of severely osteoarthritic patients.

Figure 14 First three eigenmodes of variation for 18 T1-weighted MR images of heads. The first row (a)-(c) corresponds to the first eigenmode of variation, the second row (d)-(f) to the second mode of variation and the third row (g)-(i) to the third mode of variation, seen from axial, sagittal and coronal views respectively. The green shape corresponds to the mean shape, the red shape corresponds to the mean shape plus one standard deviation and the blue shape to the mean shape minus one standard deviation of the associated mode of variation. The first mode of variation seems to govern the size of the heads. The second mode seems to govern stretching of the skull. The third mode seems to govern the shape of the forehead and placement of the eyes (therefore image (i) shows a slice containing the eyes as opposed to the central slice).

Figure 15 Experiment 1 results. Affine and groupwise registration of original data. First column shows mean images before registration, second column shows images after affine initialization, third column shows groupwise registration of images without initialization and fourth column shows groupwise registration of images that have been initialized with an affine transform.

Figure 16 Experiment 2. DSC scores and error bars of affine initialization plotted against 3D PGM initialization. Error bars denote the one standard deviation mentioned in the table.

Figure 17 Experiment 2 results. PGM and groupwise registration of original data. First column shows mean images before registration, second column shows mean images after 3D PGM initialization, third column shows mean images after 3D PGM+groupwise registration and fourth column shows mean images after affine+3D PGM+groupwise registration.

Figure 18 DSC scores of groupwise registration vs affine+groupwise registration of all data sets for η = [1, 2, 3, 4]. Different levels of noise are encoded in colors (black, red, green and blue respectively) and different data sets are encoded in markers.

Figure 19 Experiment 3. DSC scores and error bars of groupwise registration plotted against affine+groupwise registration at minimum and maximum noise level. Error bars denote one standard deviation. Only five data sets are shown since registration of wristmri failed in this case.

Figure 20 Experiment 3 results. Affine and groupwise registration of synthetic images. First column shows mean images before registration, second column shows images after affine initialization, third column shows groupwise registration of images without initialization and fourth column shows groupwise registration of images that have been initialized with an affine transform. For each data set there are two rows. The first shows images for η = 1 and the second one shows images for η = 4.

Figure 21 DSC scores of affine+groupwise registration vs affine+3D PGM+groupwise registration of all data sets for η = [1, 2, 3, 4]. Different levels of noise are encoded in colors (black, red, green and blue respectively) and different data sets are encoded in different markers.

Figure 22 Experiment 4 results. Affine+PGM+groupwise registration of synthetic images. First column shows mean images before registration with η = 1, second column shows images after registration with η = 1, third column shows mean images before registration using η = 4 and fourth column shows mean images after registration using η = 4. Images of affine+groupwise registration are given in figure 20.

LIST OF TABLES

Table 1 Existing registration approaches covered in this section. Methods are sorted on transformation model, then on cost function and finally on optimization strategy. The different categories are explained elsewhere in this section. More than one checkmark in a category indicates that both approaches were tested. In this work, the method of [55] is used for affine initialization and the method of [72] is used for groupwise registration. The methods are publicly available through the elastix registration toolbox [55].

Table 2 Existing parts+geometry approaches covered in this section. Methods are sorted on type of graph and then on method of training. The different categories are explained elsewhere in this section. More than one checkmark in a category indicates that both approaches were tested. In this work, the method of [103] serves as the basis for the extension of PGMs to 3D.

Table 3 Experiment 1 results. DSC ± std. of affine-, groupwise- and affine+groupwise registration of original data. Baseline denotes unregistered images.

Table 4 Experiment 2 results. DSC ± std. of 3D PGM-, affine+3D PGM-, 3D PGM+groupwise- and affine+3D PGM+groupwise registration of original data. The baseline is the same as in experiment 1 and is included here for convenience.

Table 5 Experiment 3 results. Affine- and affine+groupwise registration of synthetic data.

Table 6 Experiment 4 results. PGM and affine+PGM+groupwise registration of synthetic data.


LIST OF ALGORITHMS

1 Genetic algorithm pseudo code. P is the set of keypoints on the reference image, which is assumed to have been computed in advance, I is the set of images, G is the collection of randomly sampled parts+geometry models and Gt is a particular parts+geometry model. getGraphMatch() uses the Viterbi algorithm to choose the globally optimal combination of match candidates that minimizes (3).


ACRONYMS

AAM Active Appearance Model

ASM Active Shape Model

CT Computed Tomography

DP Dynamic Programming

MRI Magnetic Resonance Imaging

PCA Principal Component Analysis

PET Positron Emission Tomography

PGM Parts+Geometry Model

SPECT Single Photon Emission Computed Tomography

TPS Thin Plate Spline


Part I

MEDICAL IMAGE REGISTRATION

Image registration is a key algorithmic component of statistical analysis and machine learning in medical image processing. However, obtaining anatomical correspondence across images with large pose differences and/or similar substructures requires careful initialization. Part I gives an overview of existing methods and highlights significant challenges often encountered in clinical environments. The part is concluded with a thorough treatment of the 2D Parts+Geometry models of Zhang et al. on which this thesis is based.

1 INTRODUCTION

In the past two decades the radiological sciences have witnessed revolutionary progress in computer-assisted diagnosis (CAD). The continuing development of imaging methods has inspired new computerized methods for the reconstruction, processing and analysis of medical images that facilitate non-invasive assessment of in-vivo pathologies.

Naturally, there is a widespread interest in relating information in different images for diagnosis and treatment [46]. Image registration is inevitable whenever images acquired from different subjects, at different time points or from different scanners need to be combined or compared for analysis or visualization. Image registration is therefore a prerequisite for a wide range of medical image analysis tasks, including automatic delineation and measurement of healthy anatomy and anomalies, model-based statistical analysis of biomarkers for monitoring disease progression, and semantic navigation and visualization. The registration goal - transforming images into a common coordinate system so corresponding pixels represent homologous biological points - is reached by specifying a mathematical model of image transformations and then determining the model parameters of the desired alignment. The parameters of the model define the space of possible solutions and reduce the registration procedure to an optimization problem. Registration techniques undergo continuous development and have been an active research topic in the past decade. Examples from the literature include automatic construction of active appearance models (AAMs) [9] and active shape models (ASMs) [11], neuroanatomical and functional analysis [29], automatic segmentation [60] and motion correction of functional Magnetic Resonance Imaging (fMRI) data [72]. The literature is extensive and many methods work well in controlled environments, see [86, 85, 10, 19, 55, 72] to mention just a few.

However, difficult data sets are sometimes encountered in clinical practice. Without proper initialization, algorithms may reach different local minima more or less close to the optimal solution. This may be due to pathology, variability of shape and appearance of complex structures, inter- and intra-subject pose differences and/or self-similar substructures. A preliminary initialization step is usually applied to mitigate the problem of local minima in the optimization process.

Today, an affine transformation is commonly used as initialization. Affine initialization may recover global translation, orientation and



scale but may be insufficient in regions where local deformation is required. This is illustrated in figure 1, where a sub-optimal solution is found because the initialization algorithm ends up in a local minimum. In figure 1(d) the Dice Similarity Coefficient (DSC) is 0.33 for the small spheres. In 1(e) the alignment of the outer spheres outweighs the cost of mis-alignment of the small spheres, which causes them to settle in suboptimal positions. In 1(f) the subsequently applied groupwise non-rigid registration algorithm [55] obtains a DSC of 0.46 ± 0.3 and fails to recover the global optimum due to bad initialization (DSC is defined on the range [0, 1] with 1 being complete overlap, see section 4.3 for details). This is a general problem on difficult data sets and motivates an automatic method of obtaining robust non-rigid initialization across groups of images.
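The DSC values quoted above can be computed directly from binary segmentation masks as 2|A ∩ B| / (|A| + |B|). A minimal sketch in Python/NumPy; the function name and the toy sphere masks are illustrative only, not part of the thesis code:

```python
import numpy as np

def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks.

    Returns a value in [0, 1]; 1 means complete overlap.
    """
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks overlap perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)

# Toy example: two overlapping spheres of radius 5 on a 20^3 grid,
# loosely mimicking the synthetic sphere images of figure 1.
grid = np.indices((20, 20, 20))
sphere = lambda c: ((grid - np.array(c).reshape(3, 1, 1, 1)) ** 2).sum(0) <= 25
print(dice_coefficient(sphere((10, 10, 10)), sphere((10, 10, 12))))
```

Identical masks give a DSC of 1 and disjoint masks give 0, matching the range stated above.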

A novel and fairly sophisticated method of obtaining initialization - to which this thesis is dedicated - is the Parts+Geometry Model (PGM) [106]. A PGM learns the spatial relationships between anatomical landmarks in images to constrain a local deformation model and find a sparse set of corresponding landmarks in consistent geometric configurations. This paradigm has been shown to significantly outperform standard affine initialization on three challenging data sets in 2D, in all likelihood due to its ability to deal with self-similar parts in complex arrangements. In this work we aim to extend PGMs from 2D to 3D, which may potentially provide robust initialization for 3D images where the standard affine approach is inadequate. Robust initialization will then be available for data acquired by 3D imaging systems such as Magnetic Resonance (MR) and Computed Tomography (CT).

The remainder of the chapter presents an overview of the structure of the thesis, the proposed method and the results. Details are left for the relevant chapters.

1.1 objectives

This thesis deals with a core problem within medical image analysis, namely obtaining robust anatomical correspondence across groups of images. In this work we aim to establish an initial deformation field on ill-posed data sets so that dense correspondence can subsequently be obtained using off-the-shelf algorithms. Specifically, we wish to derive a Parts+Geometry method in 3D for initializing groupwise non-rigid registration methods. The method should be fully automatic. The following main objectives are set forth,

(i) Design a generic extension to 3D.

(ii) Evaluate the 3D Parts+Geometry method on several medical data sets.



Figure 1: Schematic depiction of the local minima problem in image registration. (a)-(c) show three synthetic images, each consisting of three smaller spheres at different positions relative to a larger, bounding sphere. (d) is the mean image of (a)-(c). (e) is the mean image after affine initialization. (f) is the mean image after multi-resolution groupwise non-rigid registration of the images in (e). The groupwise registration is clearly susceptible to the local minima in which the affine initialization was trapped.

It is emphasized that perfect correspondences are not sought - just enough good ones to sufficiently align the images for subsequent volumetric registration.

1.2 overview of the proposed method

A parts+geometry model represents an object by an undirected graph. The vertices are defined by their positions and associated neighborhood descriptors (parts). The edges are defined by a set of arcs that describe the pairwise geometric relationships between vertices (geometry). 3D parts+geometry models are constructed to estimate a sparse set of groupwise correspondences. The method is inspired by the 2D PGM approach of Zhang et al. that more or less solves the groupwise initialization problem in 2D. However, the problem is more ill-posed in three dimensions because the extra degrees of freedom introduce a higher-dimensional solution space which significantly complicates the model building and fitting process. The key ideas of Zhang et al. extend naturally to 3D, although the extra degrees of freedom and computational complexity dictate alternate approaches in some areas.
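The graph structure just described can be sketched as a small data structure. The class and field names below are hypothetical illustrations of the parts (vertices with descriptors) and geometry (arcs with learned pairwise offsets), not the author's implementation; the geometry cost assumes a simple per-axis Gaussian model of the pairwise offsets:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Part:
    """A vertex: keypoint position plus a local appearance descriptor."""
    position: np.ndarray    # (3,) physical coordinates
    descriptor: np.ndarray  # e.g. a flattened spin image

@dataclass
class Arc:
    """An edge: learned displacement between two parts and its spread."""
    i: int
    j: int
    mean_offset: np.ndarray  # (3,) mean of position[j] - position[i]
    std_offset: np.ndarray   # (3,) per-axis standard deviation

@dataclass
class PartsGeometryModel:
    parts: list
    arcs: list

    def geometry_cost(self, positions) -> float:
        """Sum of squared, normalised deviations from the learned offsets."""
        cost = 0.0
        for a in self.arcs:
            d = positions[a.j] - positions[a.i]
            cost += float(np.sum(((d - a.mean_offset) / a.std_offset) ** 2))
        return cost
```

A candidate configuration whose pairwise offsets match the learned geometry exactly has zero geometry cost; deviations are penalised relative to the learned spread.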

To obtain the desired initial deformation field in 3D the following steps are performed.


(i) A pool of distinctive anatomical keypoints is automatically located on a reference image using a scale-space approach. Each keypoint is defined by a pose (position, scale and orientation) and a signature that encodes local neighborhood appearance. Spin images are used for this purpose since they are rotationally invariant with respect to local geometry, which addresses some of the extra degrees of freedom in 3D.

(ii) A genetic algorithm searches the combinatorial space of keypoints. For each combination, a Parts+Geometry Model is bootstrapped on a reference image and used to search for match candidates across the image set. A new Parts+Geometry Model is built from the best matches on the other images and used to estimate the correspondences across the other images.

(iii) Models are ranked by a groupwise image similarity measure and the best model is used to estimate the final correspondences.
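The sample-fit-rank loop of steps (ii)-(iii) can be sketched structurally. `fit_model` and `groupwise_similarity` below are stand-in placeholders for the Viterbi-based graph matching and the groupwise similarity measure of the real system, and the scalar "keypoints" are toys; only the control flow is meant to mirror the description above:

```python
import random
import statistics

def fit_model(parts, images):
    # Placeholder for the real fitting step, which matches each candidate
    # part set across the image set with Viterbi-based graph matching.
    return parts

def groupwise_similarity(model, images):
    # Placeholder groupwise score: prefer part sets whose toy scalar
    # keypoints are tightly clustered (higher score = better).
    return -statistics.pvariance(model)

def build_best_pgm(keypoints, images, n_candidates=50, model_size=3, seed=0):
    """Structural sketch: repeatedly sample a candidate set of parts,
    fit a model to the image set, and keep the best-scoring model
    under a groupwise similarity measure."""
    rng = random.Random(seed)
    best_model, best_score = None, float("-inf")
    for _ in range(n_candidates):
        candidate = rng.sample(keypoints, model_size)   # step (ii): sample
        model = fit_model(candidate, images)            # step (ii): fit
        score = groupwise_similarity(model, images)     # step (iii): rank
        if score > best_score:
            best_model, best_score = model, score
    return best_model

best = build_best_pgm([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], images=[])
```

The genetic algorithm of the thesis refines candidates across generations rather than sampling independently; this sketch only shows the sample-score-select skeleton that the pseudo code in Algorithm 1 elaborates.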

1.3 contributions

This thesis describes a complete system for constructing 3D parts+geometry models. Specifically, the contributions include

(i) An algorithm for unsupervised learning of parts+geometry models in 3D.

(ii) Application of the proposed system to a wide range of data sets of varying difficulty.

Our implementation of 3D parts+geometry models improves registration in a few cases but fails to recover useful information in the majority of experiments. In its current form, initialization with 3D PGMs should be applied with caution.

1.4 experimental design

The performance of the proposed system is demonstrated on 6 medical data sets of varying difficulty.

(i) T1-weighted MRIs of the head from 18 healthy subjects. Evaluated using manual segmentations of white matter, grey matter and the brain stem.

(ii) T2-weighted MRIs of the knees from 11 healthy subjects. Evaluated using manual segmentations of the femur and tibia.

(iii) T1-weighted MRIs of the heel pads from 7 healthy subjects. Evaluated using manual segmentations of the cuboid and calcaneus.


(iv) CT images of wrists from 24 healthy subjects. Evaluated using segmentations of the ulna, radius, lunate, scaphoid, triquetrum, pisiform, trapezium, trapezoid, capitate and hamate bones.

(v) T1-weighted MRIs of the hands from 11 healthy subjects. Evaluated using manual segmentations of the second, third and fourth metacarpals.

(vi) T2-weighted MRIs of the knees from 50 subjects with varying degrees of Osteoarthritis (OA). Evaluated using manual segmentations of the knee articular cartilage.

An extensive test suite was developed to evaluate the performance of the proposed method. All of the datasets above and the associated manual segmentations are registered, and the DSC is used to compare affine-, groupwise- and PGM registration and various combinations thereof. Further, models of anatomical variability are built from the original data and used to generate synthetic images with user-defined levels of random anatomical variation. This facilitates evaluation of registration performance in extreme cases.

1.5 outline of thesis

The thesis is structured into five chapters.

Chapter I: Introduction (this chapter) outlines the motivation, the proposed method and the structure of the thesis.

Chapter II: Literature Review introduces previous work on image registration and parts+geometry models. This chapter also outlines the series of papers on 2D Parts+Geometry models by Zhang et al. on which this thesis is based.

Chapter III: 3D Parts+Geometry Models introduces the main contributions of this work, namely extending 2D PGMs into the 3D domain.

Chapter IV: Experimental Design outlines the experiments, the noise model and the libraries used for affine- and groupwise registration.

Chapter V: Discussion and Conclusion evaluates the performance of 3D PGMs on several data sets.

1.6 mathematical notation

To ease reading and understanding, the notation used in this work is enumerated below.


Vectors are column vectors and typeset in non-italic lowercase boldface. Inline vectors use commas to separate elements,

v = [a, b, c, d]^T

Vector functions are typeset in non-italic lowercase boldface,

f(v) = v + v

Special functions such as a parts+geometry model or a cost function are typeset in calligraphic font, e.g. G and C.

Matrices are typeset in non-italic boldface capitals,

M = [a b; c d]

The dot-product operator is typeset as,

a · b = ∑_i a_i b_i

Sets are typeset as,

α, β, γ ∈ S

Unit vectors are typeset as,

1 = [1, . . . , 1]^T

An image is defined on the d-dimensional discrete domain Ω^d whose pixel values exist as Dirac delta functions located at the pixel centre,

I : x ∈ Ω^d ↦ R^+ (1)

where x denotes a position on the grid and I(x) denotes the value of image I at position x. The image origin is associated with the coordinates of the first pixel in the image (see figure 2a). A voxel (volume element) is the three-dimensional generalization of a pixel and is considered to be the cubic region surrounding the centre holding the data value (see figure 2b).


(a) Schematic depiction of the geometrical concepts associated with an image grid. In this example the image size is 7 × 6, the origin is located at (x0, y0) and the physical spacing between voxels is hx and hy, usually measured in units of millimetres.

(b) Schematic depiction of the geometrical concepts associated with a voxel (image origin, voxel coordinate, pixel coverage, Voronoi region, Delaunay region and linear interpolation region).

Figure 2: Schematic depiction of geometrical definitions associated with an image grid and its voxels adopted in this work. Sketches adapted from the ITK software guide [47].
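Under the origin/spacing convention of figure 2, a (possibly fractional) voxel index maps to physical coordinates as x = origin + spacing · index, applied per dimension. A minimal sketch of this mapping, assuming an axis-aligned grid (the direction cosines that ITK additionally supports are omitted for brevity):

```python
import numpy as np

def index_to_physical(index, origin, spacing):
    """Map a voxel index to physical coordinates on an axis-aligned grid.

    x = origin + spacing * index, elementwise per dimension.
    """
    return (np.asarray(origin, dtype=float)
            + np.asarray(spacing, dtype=float) * np.asarray(index, dtype=float))

# Voxel (2, 3) on a grid with origin (10.0, 20.0) mm and 0.5 x 2.0 mm spacing
print(index_to_physical((2, 3), origin=(10.0, 20.0), spacing=(0.5, 2.0)))  # [11. 26.]
```

This is why physical spacing matters for registration: the same index offset corresponds to different physical displacements along axes with anisotropic spacing.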

2 literature review

2.1 image registration

Image registration is a common but difficult problem in medical image processing. Numerous issues associated with image acquisition may complicate the task: positions of subjects in scanners may differ between scans, and images may be acquired by different scanners, stem from different subjects, time points or viewpoints, and contain noise and scanner-induced artefacts, all of which are factors that add to the complexity of the registration problem [44]. Obtaining robust correspondences is important in many CAD applications since machine learning methods rise and fall with the quality of available training data [44].

A common approach has been to manually segment groups of images. While manual labelling is a time-consuming but feasible solution for 2D models with a limited number of landmarks, it is highly impractical in the 3D domain. Not only is the required number of landmarks higher than in the 2D case, but it also becomes increasingly difficult to identify and pinpoint corresponding points, even for experts [44]. Manual labelling inevitably introduces bias in landmark selection and the resulting correspondences, which may lead to weak models [22]. In the past decade a significant amount of research has focused on methods for automatically aligning images to reduce time-consuming and error-prone human interaction [103]. In this section a selection of existing techniques is covered. The methods are distinguished by the basic components of a registration approach: the transformation model, cost function, and optimization strategy.

Several transformation models can be used to represent the deformation field. We discriminate between rigid transforms (e.g. similarity or affine transforms [55]) and non-rigid transforms (e.g. piece-wise affine warps [19], fluid deformation fields [50] or free-form deformation B-splines [86, 72]). The cost function is typically either intensity-based, e.g. [14, 10], or landmark-based, e.g. [70, 4]. Some methods employ a deformation regularisation term in the cost function which increases registration accuracy in some cases [90, 19]. Finally, we distinguish between pairwise and groupwise approaches. Pairwise methods match a moving image to a fixed image while the groupwise approach performs this optimization for all images simultaneously and thus avoids introducing bias towards a chosen reference image. This is desirable for subsequent statistical analysis.



Method Dim Transformation Model Cost Function Optimization Strategy

Rigid Non-rigid Landmark-based Intensity-based Pairwise Groupwise

Bhatia et al. [14] 3D X X X

Balci et al. [10] 3D X X X

Cootes et al. [19] 2D and 3D X X X

Wu et al. [101] 3D X X X

Metz et al. [72] 4D (3D+t) X X X

Jones et al. [50] 2D X X X

Bookstein et al. [15] 2D X X X

Arun et al. [4] 3D X X X

Evans et al. [29] 3D X X X

Mardia et al. [70] 3D X X X

Rohr et al. [84] 3D X X X

Johnson et al. [49] 2D X X X X

Hellier et al. [45] 2D X X X X

Hartkens et al. [43] 2D X X X X

Miller et al. [73] 2D X X X

Zollei et al. [108] 3D X X X

Pennec et al. [75] 3D X X X X

Jenkinson et al. [48] 3D X X X

Rueckert et al. [86] 3D X X X X

Schnabel et al. [88] 3D X X X X

Klein et al. [55] 2D and 3D X X X X X X

Table 1: Existing registration approaches covered in this section. Methods are sorted on transformation model, then on cost function and finally on optimization strategy. The different categories are explained elsewhere in this section. More than one checkmark in a category indicates that both approaches were tested. In this work, the method of [55] is used for affine initialization and the method of [72] is used for groupwise registration. The methods are publicly available through the elastix registration toolbox [55].

Details about categorizations and methods are outlined below. An overview is given in Table 1. Note that both intensity- and landmark-based cost functions are mentioned here since PGMs can be viewed as a hybrid approach that combines intensity- with landmark-based information (using the landmark information as a regularization term).

Early registration methods assume that between image acquisitions, the anatomical and pathological structures of interest do not deform or distort, e.g. [48]. This rigid body assumption simplifies the registration procedure but techniques that make this assumption have limited applicability. Many organs deform substantially, for example with the cardiac or respiratory cycles or as a result of a change in position. Further, imaging equipment is imperfect, so regardless of the organ being imaged the rigid body assumption can be violated as a result of scanner-induced geometrical distortions that differ between images [46]. Therefore, today these methods are mainly used in cases where the imaged objects are known to be rigid, e.g. intra-subject registration of bone, or for initialization. This is due to the few degrees of freedom (rotation, scaling, shearing, translation) which can compensate for global affine pose differences. A non-rigid method is then applied that can compensate for local tissue deformation. In the past decade there has been substantial progress in non-rigid registration approaches. Below, the developments most relevant to this work are outlined.

Non-rigid registration methods are capable of aligning images where correspondence cannot be achieved without localized deformations. They are most popular in the literature, generally fully automatic, and yield good results in controlled environments [34, 54]. A popular approach was pioneered by Rueckert et al. [86] who modelled local breast motion by a free-form deformation (FFD) based on B-splines. The algorithm was applied to the fully automatic registration of 3D breast MRI. [88] generalized this method and combined multi-resolution optimization with free-form deformations (FFDs) based on multi-level B-splines. B-splines are frequently encountered in state-of-the-art methods, e.g. [14, 10, 72], although alternatives exist. D'Agostino et al. [21] proposed to model image warps as a viscous fluid that deforms under the influence of forces derived from the gradient of the registration criterion. The method was evaluated for non-rigid inter-subject registration of MR brain images. Miller et al. [74] developed the mathematical metric theory of diffeomorphisms that has also been picked up by others, e.g. [51].

However, non-rigid registration methods are typically compute-intensive and there is no guarantee of one-to-one correspondence for important anatomical locations. Convergence is particularly elusive in cases of self-similar structures where many local minima are present (see figure 1). Landmark-based methods try to mitigate this issue by naturally incorporating expert knowledge in the registration scheme, either via manual labelling or by a carefully designed feature extractor. Early work concentrated on (i) selecting the corresponding landmarks manually and (ii) using an interpolating transformation model, e.g. [15, 29, 70]. The typical approach employs thin-plate spline warps that take into account localization errors by individually weighting the landmarks according to their localization uncertainty [84]. A drawback of these methods is that landmark extraction is prone to error and that the intensities of the images are neglected once landmarks have been identified.

Groupwise registration methods try to mitigate uncertainties associated with any one image by simultaneously registering all images in a population. A common approach is to calculate the group mean image and then register all images towards this image, see e.g. [73, 85, 14, 108, 18, 10, 19, 72, 89]. Such groupwise approaches incorporate as much image information in the registration process as possible, which eliminates bias towards a chosen reference frame. Miller et al. [73] were among the first to introduce an efficient groupwise registration method in which they consider the sum of univariate entropies along pixel stacks as a joint alignment criterion. Zollei et al. [108] successfully applied this method to groupwise registration of medical images using affine transforms and used a stochastic gradient descent algorithm for optimization. The method was extended by Balci et al. [10] to include a free-form deformation model based on B-splines. Bhatia et al. [14] use a gradient projection method under the constraint that the average deformation over images is zero to define an implicit frame of reference. This constraint ensures that the reference frame lies in the center of the population without ever having to calculate it explicitly. The idea has been picked up by other methods, e.g. [10, 72]. Metz et al. [72] proposed a method for motion estimation in dynamic medical imaging data. The method is based on a 3D (2D+time) or 4D (3D+time) free-form B-spline deformation model and a similarity metric that minimizes the variance of intensities. The software is publicly available as an extension to the registration package elastix and is used in this work to perform groupwise registration.

State-of-the-art non-rigid groupwise registration methods are able to register most well-posed data sets. Significant pose differences may undermine performance, however, at least in part due to a fuzzy mean image in the beginning of the registration procedure. Therefore, algorithms may fail without good initialization on challenging data. Although simple affine initialization may work for images of simple structures [19], it can fail hopelessly when registering images of complex structures that contain many local minima [103].

A number of authors [29, 62, 77, 2] have noticed the limitation of global affine transformations in registering complex objects. Some methods have been introduced to tackle this problem, e.g. [100, 101] who proposed to perform local affine registration independently on a set of parts of the object. The resulting local deformation fields are used to compose an initial global deformation field. Usually the parts are manually selected based on some prior (e.g., anatomical structures [62]), but automatic methods also exist, e.g. [77] and [2].

Another solution would be to introduce feature information into the voxel-based registration algorithms in order to incorporate higher-level information about the expected deformation. Several authors recognize that combining volumetric and landmark registration may provide the best of both worlds and have introduced such hybrid algorithms, e.g. [75, 43, 49, 45]. PGMs exploit both volumetric and landmark-based information by driving an intensity-guided model building process regularized by geometric constraints. PGMs may therefore be viewed as such a hybrid approach that aims to (i) minimize geometric distances between image patches (landmark-based) and (ii) minimize appearance differences between patches (intensity-based) in a graph framework. Zhang et al. introduced PGMs for groupwise initialization and showed that the PGM approach may cope with challenging pose differences and self-similarity in 2D. In this work we investigate whether PGMs may also cope with these problems in 3D.


Method Dim Type of Graph Training

Pairwise heuristic k-fan Tree-like Full Supervised Automatic

Fischler et al. [36] 2D X X

Fergus et al. [32] 2D X X

Donner et al. [24] 2D X X

Adeshina et al. [1] 2D X X

Fidler et al. [33] 2D X X

Donner et al. [25, 26] 3D X X

Schmidt et al. [87] 3D X X X

Felzenszwalb et al. [30] 2D X X

Zhu et al. [107] 2D X X

Babalola [6, 7] 3D X X

Zhang [103, 105, 104, 106] 3D X X

Weber et al. [98] 2D X X

Fergus et al. [31] 2D X X

Table 2: Existing parts+geometry approaches covered in this section. Methods are sorted on type of graph and then on method of training. The different categories are explained elsewhere in this section. More than one checkmark in a category indicates that both approaches were tested. In this work, the method of [103] serves as the basis for the extension of PGMs to 3D.

Before outlining the actual method, existing parts+geometry methods also need an introduction. The following section outlines the development of PGMs in the literature.

2.2 parts+geometry models

A parts+geometry model consists of a set of pairwise geometrically constrained landmarks and a signature of the local neighborhood of each landmark. PGMs are popular in the object recognition community, where they were originally introduced, and have recently been applied to medical image problems.

The model can be learned in either a supervised or unsupervised fashion. There are a variety of formulations yet all are underpinned by two fundamentals: (i) during the training phase parts are identified and the geometric relationships between them are configured, and (ii) application of the model to an image involves identifying potential match candidates for each part and then choosing the best candidates in a sense that optimizes both appearance and geometry correspondence criteria. We distinguish existing techniques by the basic components of a parts+geometry model: type of graph, optimization strategy and method of training. Details about these categorizations are outlined below. An overview is given in Table 2.

In the following section PGMs are described as they appear in the object recognition community, followed by a review of PGMs in medical imaging. As the literature is extensive the following sections are confined to developments most relevant to this work. A general overview of the field can be found in [79].


2.2.1 Parts+Geometry Models For Object Recognition

The original paper [36] introduced PGMs as an image matching algorithm that was capable of finding an object embedded in noisy images (e.g. faces of different persons). The main contribution was the idea of explicitly modelling objects by a collection of parts arranged in deformable configurations. Each part encoded local visual properties of the object and the deformable configuration constituted a joint probability density function of the spatial relationships between the parts that could be used to identify plausible matches of objects in images. The ability to deal with self-similar substructures is in all likelihood due to the combination of local and global optimization criteria: neighborhood descriptors measure the local fit and geometrical information regularizes the global geometrical structure of the local descriptors.

Weber et al. [98] extended the original work by automating the construction of the model for category-level object recognition using a fully connected graph. Useful parts were identified using a keypoint detector and k-means clustering. An initial model was found by a greedy exhaustive search and parameters were inferred by an Expectation-Maximization (EM) algorithm. Fergus et al. [31] further developed this work by considering the variation of appearance and large scale changes. Later in [32], they introduced a simple star model to reduce the complexity of the model. A star model only has edges between one vertex that is connected to all other vertices. In addition, the parts were localized using a combination of different feature detectors (e.g., the Kadir and Brady detector [52] and multi-scale Harris [42]) rather than a single one [31].

The graph formulation is particularly suited to combine information on appearance and spatial configuration, but searching an image using a fully connected graph was non-trivial due to computational complexity. Felzenszwalb and Huttenlocher [30] addressed the shortcomings of the original formulation, most notably that (i) the model had many parameters and (ii) the resulting energy minimization problem was hard to solve efficiently. They estimated model parameters from a set of labelled images using a maximum likelihood formulation and cast the objective function as an acyclic graph (tree-like structure) which allowed for the use of existing powerful solvers, i.e. a Dynamic Programming (DP) method. Crandall et al. [20] introduced a k-fan model. A k-fan model is the generalization of a star model and has k reference parts that are fully connected while other parts are connected to one of the reference parts. To minimize human intervention from labelling images, they used an expectation-maximization (EM) algorithm to learn the model. Zhu et al. [107] introduced hierarchical parts+geometry models that are constructed recursively by composing simple structures (i.e. edges) into larger and more meaningful parts (i.e. eye contours). Bergholdt et al. used a probabilistic representation based on second-order Markov Random Fields (MRFs). The probabilistic graphical model applied randomised classification trees to feature vectors such as SIFT and color features to identify parts. During image search these gave classification scores indicating the probability that features correspond to a particular part. Geometry was also modelled using classification trees with normalized edge lengths and orientations as inputs.

As the popularity of parts+geometry models grew, a number of authors applied PGMs to medical image data, some of which were extended to 3D. This development is outlined in the following section.

2.2.2 Parts+Geometry Models In Medical Image Analysis

Donner et al. [24] showed that careful selection of a small number of parts on a single image can be used to construct a parts+geometry model from the whole set, which can lead to good sparse correspondences. A similar approach can be found in [1]. Fidler et al. [33] described a method for automatically constructing hierarchical parts+geometry models from small numbers of training images of objects by building up compositional models from commonly occurring simple structures. Langs et al. [60, 59] described a method of constructing sparse shape models from unlabelled images by finding multiple interest points. An MDL principle was used to determine optimal correspondences and find the model which minimizes the description length of the feature points. Another related approach was developed by Karlsson et al. [53], who built patch models to minimize an MDL function, estimating the cost of explaining the whole of each image using the patches.

Some part-based approaches have been extended to 3D. Schmidt et al. [87] applied the method of [17] to the extraction of the spine from 3D MR images. The method embeds shape knowledge of anatomical structures in a probabilistic graphical model. Donner et al. [25, 26] extended the sparse appearance models of [24] to 3D. Parts and geometric relationships were manually identified on the training set. An elastic model expressed the confidence of locations of parts and the optimal choice of candidates was obtained using an MRF. Adeshina et al. [1] used a PGM to initialize a groupwise registration algorithm. They focused on objects with repeating structures (such as bones in the hands). Babalola [6] initialized 3D AAMs with a PGM that was built using the approach of [103].

Despite the many works on PGMs, few of them deal with the problem of consistent localization, focusing instead on whether a particular part was present on an object or not. Zhang et al. explicitly addressed this problem in a series of papers [103, 105, 104, 106] and showed that the derived models allowed for improved registration and automatic annotation of medical images in 2D even on challenging data sets. Babalola et al. [7] extended this method to pairwise registration of 3D images using PGMs and spin images. Parts+geometry models were used to automatically align images before registration and this initialization procedure was shown to increase overall registration accuracy. The PGM approach has thus proven itself useful for groupwise initialization in 2D [103] and was successfully extended to pairwise registration in 3D [7]. It seems obvious to try and extend these two methods to groupwise registration of 3D images. It may not be an easy task, as illustrated by the lack of 3D methods in Table 2. This may be due to the additional degrees of freedom in 3D volumetric images that make model building and fitting a challenging procedure. However, this gap in the literature holds potential for improvement and is the major concern of this thesis.

This concludes the verbal treatment of previous work. In the following sections the mathematical frameworks for image registration and parts+geometry models are introduced.

2.3 mathematical framework of image registration

2.3.1 Pairwise Registration

Registration is the act of deforming a moving image I_M(x) to fit a fixed image I_F(x). This corresponds to finding a coordinate transformation T(x) that makes I_M(T(x)) spatially aligned with I_F(x). Mathematically, the registration problem is formulated as an optimization problem in which the cost function C is minimized with respect to T. The optimizer can interface with the moving image by introducing a parametrization of the transformation. The optimization problem reads,

µ̂ = arg min_µ C(T_µ; I_F, I_M)    (2)

where µ indicates that the transform has been parameterized. This vector contains the transformation parameters [55]. The reader is referred to [41] for an overview of nonparametric methods.

The minimization problem in (2) is usually solved with an iterative optimization method embedded in a multi-resolution setting. Figure 4 gives an example of such a registration using the method of [72]. A schematic overview of the basic registration components and their relations is given in figure 3 and explained below.
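As a toy illustration of the minimization in (2), the sketch below estimates a one-dimensional translation by exhaustive search over an SSD cost. This is a stand-in for the iterative optimizers discussed below, not the elastix implementation, and all signals are made up:

```python
import numpy as np

# Toy instance of µ̂ = arg min_µ C(T_µ; I_F, I_M): find the integer
# translation µ that best aligns a moving 1D signal with a fixed one
# under a Sum of Squared Differences (SSD) cost.
def ssd(fixed, moving_warped):
    return float(np.sum((fixed - moving_warped) ** 2))

fixed = np.zeros(50); fixed[20:30] = 1.0    # fixed image I_F
moving = np.zeros(50); moving[25:35] = 1.0  # moving image I_M, shifted by 5

costs = {mu: ssd(fixed, np.roll(moving, mu)) for mu in range(-10, 11)}
mu_hat = min(costs, key=costs.get)          # best translation
```

Shifting the moving signal back by five samples makes the SSD vanish, so the exhaustive search recovers µ̂ = −5.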

Figure 3: Schematic depiction of a typical pairwise registration system. Sketch inspired by [55].

Figure 4: Pairwise registration of two MR images of the head, one in red and one in blue. (a) Sagittal slices of two MR images prior to registration. (b) Sagittal slices of the registered MR images; correspondences turn pink since red and blue contribute equally in RGB-space. Images had similar pose and did not need initialization.

The cost function C measures the degree of similarity between the moving and fixed image. Many options have been proposed in the literature. Commonly used intensity-based cost functions are the Sum of Squared Differences (SSD) [93, 58], normalized correlation (NC) [91, 76], mutual information (MI) [96, 68, 94, 78] and normalized mutual information (NMI) [82, 83, 92]. A regularization term is sometimes included in the cost function in order to penalize undesired deformations [90, 82].

The coordinate transformation T_µ determines the degrees of freedom of the deformation. In some applications it may be sufficient to consider only rigid transformations [96, 99], but often a more flexible transformation model is needed, allowing for local deformations [71, 86, 15, 8, 23, 40, 81, 21, 5, 35, 3, 27, 13, 95].

The optimizer estimates µ in an iterative fashion, usually taking derivatives of the cost function with respect to µ_k. Several possibilities for the optimization method are discussed in [68, 56, 55].

The sampler is responsible for selecting the voxels used to compute C and ∂C/∂µ. The most straightforward strategy is to use all voxels from the fixed image but this is time-consuming for large images. A common approach is to use a subset of voxels, selected on a uniform grid or sampled randomly.

Figure 5: Schematic depiction of a typical groupwise registration system. Sketch inspired by [55].

The interpolator is needed when I_M(T_µk(x)) is evaluated at non-voxel positions. Several methods for interpolation exist, varying in quality and speed, including nearest neighbor, linear and nth-order B-spline interpolation.

A multi-resolution pyramid strategy improves the capture range and robustness of the registration. The basic idea is to first estimate T(x) on a low resolution version of the images and then propagate the estimated deformation to higher resolutions. A pyramid of images is thus constructed in the registration process. An overview of multi-resolution strategies is given in [61].
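The interpolation step above can be illustrated in one dimension; np.interp performs linear interpolation, and the intensity values are made up:

```python
import numpy as np

# Linear interpolation at a non-voxel position, as needed when
# I_M(T_µ(x)) is evaluated between grid points. Higher-dimensional and
# B-spline interpolation follow the same principle with larger supports.
intensities = np.array([0.0, 10.0, 20.0, 30.0])  # values at voxel centres 0..3
value = float(np.interp(1.25, np.arange(4), intensities))
# a quarter of the way between voxels 1 and 2
```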

The reader is referred to either of [69, 46, 44, 55, 54] for an elaborate treatment of individual algorithmic components.

Pairwise methods, whether volumetric or landmark-based, are less suited to obtaining correspondence across a population of images. This would typically involve registering all images to a chosen reference image, which inevitably introduces bias. Groupwise registration offers unbiased image alignment, which is attractive for subsequent statistical analysis. This approach has gained significant attention in recent literature and is the method of choice for state-of-the-art algorithms [10, 72].

2.3.2 Groupwise Volumetric Registration

A common approach in groupwise registration is to iteratively update estimates of correspondences between each image and a group average [38, 39, 9, 19, 89]. Figure 6 illustrates an instance of groupwise registration. A schematic depiction of a typical groupwise registration system is given in figure 5 and explained below.

Figure 6: Groupwise registration of six images. (a) Sagittal slices of six MR images of the head prior to groupwise registration. (b) Sagittal slices of six groupwise registered MR images; correspondences turn grey since all colors contribute equally in RGB-space. Images had similar pose and did not need initialization.

The cost function typically has two components, one based on image intensities, that is, comparing the warped target image with the reference, and one based on shape, often used for regularisation. Similarity is typically measured by similarity functionals applied to corresponding reference frame locations across images subtracted from the mean image. Measures include SSD [89], Minimum Description Length (MDL) [18], variance [72] or joint entropy [73]. The shape term is usually a local measure to penalise excessive local deformation, such as a bending energy [86].

The coordinate transformation commonly consists of B-splines [85, 10, 72], piece-wise affine warps [18, 28] or dense flow fields [50, 51].

The optimizer, interpolator and pyramid strategy are not much different from the pairwise scheme (section 2.3.1 on page 18), except that the number of degrees of freedom in the optimization process has exploded.
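The intensity component described above can be sketched with a variance measure in the spirit of [72]: the cost is zero exactly when all images agree at every reference-frame position. The images below are made up for illustration:

```python
import numpy as np

# Groupwise intensity cost sketch: sum over positions of the variance of
# intensities across images. Perfectly aligned identical images give zero.
images = np.stack([
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 1.0, 2.0, 3.0],   # disagrees at the last position
])

cost = float(np.sum(np.var(images, axis=0)))
```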

While groupwise non-rigid registration works well in controlled environments, it is a challenge to localize robust correspondences on difficult data sets without considering geometric information [103]. The PGM approach has emerged in medical image processing with the intention of providing an initial rough deformation field that enables pure intensity-based algorithms to reach the global optimum in images exhibiting large pose differences. In the following section such a model, the 2D Parts+Geometry Model of Zhang et al., is reviewed. The discussion is confined to the ideas most relevant to our extension into 3D. For an elaborate treatment of individual algorithmic components the reader is referred to the original work [103, 105, 104, 106].


2.3.3 Parts+Geometry Models in 2D

A parts+geometry model represents an object by a graph G. The vertices ν are composed of positions x = (x, y)^T and associated local region descriptors p : R² ↦ R^m (parts). The edges ξ are defined by a set of arcs that describes the pairwise geometric relationship between vertices (geometry).

Let f(p_i, p_j) denote the part cost which measures the degree of similarity between a part i at position x_i and its match j at position x_j on another image. Let g(x_i, x_j) denote the geometric cost which measures the deformation mismatch between the arc formed by a pair of parts i and j on one image and the arc formed by their matches on another image. The best model G for a given image is the one that minimizes the objective function,

C(G) = ∑_{ν_i ∈ G} f(p_i, p_j) + α ∑_{ξ_ij ∈ G} g(x_i, x_j)    (3)

where α determines the amount of regularization by geometry. In the work of Zhang et al. the part cost is calculated as the Sum of Squared Differences (SSD) on a normalized intensity scale over fixed-size image patches,

f(p_i, p_j) = ‖p_i − p_j‖₂²    (4)

In this case, p contains normalized image intensities of fixed-size patches (see further below for the normalization procedure). The geometric cost g(x_i, x_j) is calculated as,

g(x_i, x_j) = ((x_j − x_i) − d_ij)^T S⁻¹ ((x_j − x_i) − d_ij)    (5)

where d_ij is the mean part separation and S is an estimate of the covariance matrix. Both are calculated automatically. Equation (5) assumes that the relative position of patches follows a Gaussian distribution N(d_ij, S_ij). Zhang et al. note that another functional can be used if this assumption is violated. In practice equation (5) seems to efficiently capture geometric discrepancies.
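Equations (4) and (5) can be sketched directly; the patch vectors, mean separation d_ij and covariance S below are invented for illustration:

```python
import numpy as np

# Part cost (4): SSD between normalized patch vectors.
def part_cost(p_i, p_j):
    return float(np.sum((p_i - p_j) ** 2))

# Geometric cost (5): Mahalanobis distance of the observed separation
# from the mean separation d_ij under the covariance estimate S.
def geometric_cost(x_i, x_j, d_ij, S):
    r = (x_j - x_i) - d_ij
    return float(r @ np.linalg.inv(S) @ r)

x_i, x_j = np.array([0.0, 0.0]), np.array([10.0, 1.0])
d_ij = np.array([10.0, 0.0])           # mean part separation
S = 4.0 * np.eye(2)                    # covariance estimate
g = geometric_cost(x_i, x_j, d_ij, S)  # small vertical deviation, cost 0.25
```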

Applying a model to an image involves searching a local neighborhood of each part for match candidates on the target image as depicted in figure 7. The best model is the combination of match candidates (one for each part) that minimizes (3). Edges are defined between a fixed image keypoint and all match candidates in a neighborhood on the moving images and assigned weights according to (6). This is a combinatoric problem and arbitrary graphical structures will lead the problem to be NP-hard [16]. Instead, if each node can be thought of as having at most two parents in a tree-like structure, the Viterbi algorithm can be used to quickly find the global optimum in a maximum-likelihood sense in O(nk³) time, where n is the number of vertices and k is the number of edges to a vertex. The choice of k = 2 seems to be a compromise between computational complexity and descriptive power of the graph.

Figure 7: Schematic depiction of the matching of a model to a new image. A weight ω_ij is assigned to every match candidate j of a keypoint i using (6) and the Viterbi algorithm finds the most likely configuration of keypoints on the target image in a maximum-likelihood sense. Arrows denote optimal correspondences estimated by the model. On the left, the black dots denote fixed image keypoints located by the keypoint detector and the black lines connecting the dots denote their geometrical relationship. On the right, circles denote the neighborhood on the moving image in which match candidates (grey circles) are defined on a regularly spaced grid. The double-stroke lines denote the default geometric relationships and dashed lines denote geometrical relationships estimated by the model.

Formally, the Viterbi algorithm is a dynamic programming algorithm that finds the most likely sequence of hidden states X = x_1, . . . , x_T, drawn from the states S = s_1, . . . , s_K, for a given sequence of observations Y = y_1, . . . , y_T, drawn from the states O = o_1, . . . , o_N. In this work an implementation provided by the University of Manchester is used. The candidates are then encoded in an acyclic graph with the weight of the edges given by,

w_ij = f(p_i, p_j) + α g(x_i, x_j)    (6)

and passed to the library. For an elaborate treatment of the Viterbi algorithm the reader is referred to the original paper [97].
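For intuition, the sketch below runs the dynamic-programming recursion on a simple chain of parts, where each part is connected only to its predecessor (k = 1) rather than the two-parent trees used in this work; all costs are invented:

```python
import numpy as np

# Chain Viterbi: pick one match candidate per part so that the sum of
# part costs (unary) and geometric costs between consecutive parts
# (pairwise) is minimal.
def chain_viterbi(unary, pairwise):
    """unary[t][s]: part cost of candidate s for part t.
    pairwise[t][s0][s1]: geometric cost between candidate s0 of part t-1
    and candidate s1 of part t (pairwise[0] is unused)."""
    n_parts, n_cand = unary.shape
    cost = unary[0].copy()
    back = np.zeros((n_parts, n_cand), dtype=int)
    for t in range(1, n_parts):
        total = cost[:, None] + pairwise[t] + unary[t][None, :]
        back[t] = np.argmin(total, axis=0)
        cost = np.min(total, axis=0)
    path = [int(np.argmin(cost))]
    for t in range(n_parts - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1], float(np.min(cost))

unary = np.array([[1.0, 5.0], [4.0, 1.0], [1.0, 3.0]])
pairwise = np.zeros((3, 2, 2)); pairwise[1:] = 0.5  # flat geometric costs
path, total = chain_viterbi(unary, pairwise)
```

With flat geometric costs the optimum simply picks the cheapest candidate per part, but the same recursion trades part cost against geometric cost when the pairwise terms vary.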

The following section explains the model building process of Zhang et al. in detail (getGraphMatch() on line 24 of algorithm 1), i.e. how to build a model G that minimizes (14) given a set of images.

Model Building. To bootstrap the procedure, a model is first built from a single image. Randomly selected keypoints define an initial geometry that is initialized using a variant of Prim's algorithm for the minimum spanning tree where each part is connected to the two closest parents. The geometric costs for each ξ ∈ G are initialized with Gaussians. The mean d_ij is given by the separation in the reference image and the standard deviation is set to 25% of the length of the arc,

Sij = (1/16)|dij|I (7)

where I is the identity matrix. This model is applied to the group of images as follows. Match candidates are identified using an exhaustive search for local optima for each part. During the search, resolution, orientations and scales of parts are systematically varied. For each image the globally optimal combination of match candidates is selected by minimizing (3) using the Viterbi algorithm. Resulting model matches are ranked by their fit value C and the best 50% are used to re-estimate the geometric distributions of the arcs, namely

Sij ∀ i, j. (8)

Identifying Landmarks (geometry). The method is bootstrapped on a chosen reference image on which a set of part candidates is obtained by ranking patches on a densely overlapping grid using a coarse-to-fine strategy according to a measure of "flatness" q,

qi = min_δx |ui(δx) − ui| (9)

where ui(δx) is the intensity vector of the region when displacing it by δx. If qi is small then part i is found to represent a flat region. If qi is large, part i is found to be well localized. The best 50% are retained for robustness to outliers and missing data. The process is repeated on all images and forward-backward matching from keypoints on the reference image to all other images is used to rank the keypoints. The assumption is that a point is more likely to pass a forward-backward matching test if it is well-localized. Again, the upper 50 percentile is retained, the set of which constitutes the input set P to algorithm 1.
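The forward-backward test just described can be sketched with mutual nearest-neighbour matching of descriptors. This is a sketch under assumptions: `forward_backward_keep` is a hypothetical helper and Euclidean descriptor distance stands in for whichever fit measure the original implementation uses.

```python
import numpy as np

def forward_backward_keep(desc_ref, desc_tgt):
    """Indices of reference keypoints whose nearest match in the target
    image maps back to them (forward-backward consistency)."""
    # pairwise descriptor distances, shape (n_ref, n_tgt)
    d = np.linalg.norm(desc_ref[:, None, :] - desc_tgt[None, :, :], axis=2)
    fwd = np.argmin(d, axis=1)  # reference -> target
    bwd = np.argmin(d, axis=0)  # target -> reference
    return [i for i in range(len(desc_ref)) if bwd[fwd[i]] == i]
```

A keypoint survives only if the match is mutual, which is exactly the property a well-localized point is assumed to have.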


Computing Landmark Descriptors (parts). Equation (3) requires local neighborhood descriptors for each landmark. Intensities between image patches are compared on a normalized intensity scale so differences between different pairs of patches can be compared. Normalized descriptors are obtained using the following equations,

pi = β Σ_{j=1..m} |gi − gij| / σij ,  pj = β Σ_{j=1..m} |gj − gij| / σij (10)

where gij is the mean intensity across patches, σij is the standard deviation of the intensity difference from the mean across patches and β is a normalization factor chosen so that the standard deviation of the fits across the training set is unity. The parameters gij, σij and β are bootstrapped on the chosen reference image and re-estimated when the initial parts+geometry model is applied to the other images.

Model Evaluation. The following procedure constitutes getCost() on line 26 of algorithm 1.

A model is evaluated using a groupwise SSD measure. Images are warped into a reference frame using the correspondences obtained by the model. Borders are augmented with points so the whole image can be warped using a piece-wise affine approach. The set of points is then aligned using Procrustes Analysis (PA), a triangulation of each image is warped into the reference frame of the mean image, and the cost of explaining the whole image set with this model is defined by the sum of absolute differences,

U = Σ_{k=1..N} Σ_{x∈R} |Ik(x) − Ī(W⁻¹(x))| (11)

where N is the number of images, R is the region of interest, Ik(x) is the target image intensity at x and W(x) is the transformation from the mean image Ī to image Ik. The method was tested on three data sets of varying difficulty and was shown to improve initialization over the standard affine approach in difficult cases. The reader is referred to the original work for images and results of Zhang et al. [103, 105, 104, 106].
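Once the images are warped into the reference frame, eqn. (11) is straightforward to compute. A minimal sketch (the warping itself is omitted and `groupwise_sad` is a hypothetical name):

```python
import numpy as np

def groupwise_sad(warped, mean_image, roi):
    """Sum of absolute differences to the mean image over a boolean
    region of interest, cf. eqn. (11)."""
    return float(sum(np.abs(ik[roi] - mean_image[roi]).sum() for ik in warped))
```

The lower this cost, the better the sparse correspondences of the model explain the whole image set.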

2.4 summary

A significant amount of research has focused on intensity-based image registration but large pose differences are challenging in some cases. Affine initialization is commonly employed but may lead to


input : P, I
output : The best parts+geometry model G∗

1  // Randomly generate Npop models of Nsiz parts from P
2  G ← getRandomSamples(P, m, n)
3  // Compute utility of each model using eqn. (3) and (14)
4  Ut ← getUtility(G, I)
5  G ← sort(G, Ut)
6  for i ← 2 to Ngen do
7      // Retain upper 50 percentile
8      G ← retainBest(G, Npop/2)
9      // Generate new 50 percentile
10     for j ← 1 to Npop/2 do
11         // Select two random models from G
12         G1, G2 ← getRandomModels(G)
13         // Get a random union of the two models
14         G ← getRandomUnion(G1, G2) ∪ G
15     Ut ← getUtility(G, I)
16     G ← sort(G, Ut)
17 G∗ ← retainBest(G, 1)
18
19 // Subroutine getUtility()
20 getUtility(G, I)
21     foreach Gt ∈ G do
22         foreach Ik ∈ I do
23             // Using eqn. (3) as explained in section 2.3.3
24             uk,t ← getGraphMatch(Gt, Ik)
25             // Using eqn. (14)
26             Ut ← getCost(uk,t)

Algorithm 1: Genetic algorithm pseudo code. P is the set of keypoints on the reference image which is assumed to have been computed in advance, I is the set of images, G is the collection of randomly sampled parts+geometry models and Gt is a particular parts+geometry model. getGraphMatch() uses the Viterbi algorithm to choose the globally optimal combination of match candidates that minimizes (3).

sub-optimal solutions. Zhang et al. introduced a method using 2D PGMs to initialize groups of images and showed that this paradigm improves the final registration result. The method was successfully extended to 3D pairwise registration. There is a potential for improving 3D groupwise registration using these approaches. To our knowledge, this is the first attempt to extend the methodology of PGMs to


groupwise initialization in 3D. In part two of the thesis the proposed method is described and applied to medical data sets.

Part II

3 D PA RT S + G E O M E T RY M O D E L S

This part of the thesis describes a method for automatic construction and evaluation of 3D Parts+Geometry Models. Effort has been put into designing a large testing suite and conducting experiments on several data sets of varying difficulty.

3 3 D PA RT S + G E O M E T RY M O D E L S

3.1 introduction

This section describes a complete system for constructing 3D PGMs. By "complete" we mean that the only input to the system is a set of unlabelled images and the output is a set of correspondences. All intermediate tasks are done automatically by the system given appropriate parameters.

The University of Manchester has kindly provided the source code for Zhang et al. [103] and for the spin images used by Babalola et al. [7]. A large part of the existing codebase operates independently of the underlying image dimension and can be re-used. For example, the genetic algorithm operates on an integer list of keypoints and the Viterbi algorithm operates on edge weights. This leaves the following components for extension to 3D:

Keypoint detector. The original approach to keypoint detection involved examining rectangles in a densely overlapping grid. This becomes computationally infeasible in 3D. Instead, the widely used scale-space approach is adopted in this work. Using this paradigm keypoints are efficiently located as extrema in scale-space.

Parts (neighborhood descriptor). Computing neighborhood similarity is difficult because the coordinate system in which to compare neighborhoods is undefined. The original approach involved an exhaustive search in a local neighborhood where position, resolution, orientation and scale were systematically varied. Such an exhaustive search becomes prohibitively expensive in 3D. To overcome this problem, spin images are used as neighborhood descriptors. A spin image is computed with respect to the normal about a point. This addresses two rotational degrees of freedom. Voxel intensities along the last degree of freedom are summed in bins of a histogram which makes up the spin image. The size of the region used to compute the spin image is proportional to the scale selected by the keypoint detector. This design of the neighborhood descriptor reduces the exhaustive search to position only and greatly decreases the computational cost of the original approach.

Geometry (point distances). Distance calculations extend trivially to 3D and will not be treated in detail.

The 3D model building process is complicated by roughly an order of magnitude more data that needs to be processed and extra degrees




Figure 8: Schematic depiction of the model building process of 3D PGMs.

of freedom that introduce a higher-dimensional solution space which is likely to contain more local minima. To mitigate these issues, the proposed method is embedded in a multi-resolution setting. Separate models are constructed for each level in the pyramid. As previously discussed, this is a common approach in medical image registration that allows low frequency information to be registered prior to fine structures and was found to greatly increase the capture range of the approach.

An overview of the 3D model building process is outlined in the next section. A schematic depiction is given in figure 8. Subsequently, terminology, implementation and mathematical treatment of the keypoint detector and keypoint descriptor are explained.

3.2 model building process

The original graph formulation is computationally very efficient and allows the 3D model formulation to follow closely that of the 2D model.

Bootstrapping. The method is bootstrapped on the median image of the population. The median image is defined as the image closest to the center of the population,

Im = arg min_i (1/N) Σ_{j=1..N, j≠i} ‖Ii − Ij‖₂² (12)

Three scale-space octaves are constructed on the median image at t = [0.5, 1, 1.5, 2, 2.5, 3]. Keypoints are identified using four-dimensional non-maximal suppression on gradient magnitude in each octave. A genetic algorithm is used to search the combinatorial space of possible models cf. algorithm 1, using 16 generations and 512 individuals per generation. For each individual the genetic algorithm chooses a random subset of keypoints to build an initial model. As with the 2D model, a variant of Prim's algorithm for the minimum spanning tree is used to connect parts to the closest two neighbors. The geometric cost for each ξ ∈ G is initialized with Gaussians. The mean dij is given


by the separation in the reference image and the standard deviation is set to 25% of the length of the arc,

Sij = (1/16)|dij|I (13)

where I is the identity matrix.
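The median-image selection of eqn. (12) can be sketched directly. `median_image_index` is a hypothetical helper operating on a list of equally sized arrays; the constant factor 1/N does not affect the arg min and is dropped.

```python
import numpy as np

def median_image_index(images):
    """Index of the image minimizing the summed squared distance to all
    other images, cf. eqn. (12)."""
    n = len(images)
    cost = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                # accumulate SSD between image pair (i, j)
                cost[i] += np.sum((images[i] - images[j]) ** 2)
    return int(np.argmin(cost))
```

For three constant images with intensities 0, 1 and 10 the middle image wins, matching the intuition of a population centre.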

Fitting the initial model to other images. Spin images are computed for match candidates on the other images. Match candidates are defined on a 9 × 9 × 9 regularly spaced grid centered on each keypoint. This was found to be a reasonable trade-off between computational complexity and flexibility of the model. The spacing of the grid depends on the pyramid level. In this implementation, three pyramid levels are used with isotropic spacing of 4 mm, 2 mm and 1 mm respectively. Due to the rotational invariance of spin images, only one neighborhood descriptor needs to be computed for each spatial location. The optimal correspondences are found on the graph using the same approach as in the 2D model, see figure 7. Resulting model matches are ranked by their fit value C, the best 50% are used to re-estimate the geometric distributions of the arcs, and the resulting model is refitted to the other images.

Evaluating keypoint subset. Each individual generated by the genetic algorithm is evaluated using a groupwise similarity measure using the same approach as in the 2D model. Images are warped into a reference frame using the correspondences obtained by the model. Borders are augmented with points so the whole image can be warped using a piece-wise affine approach. The set of points is then aligned using Procrustes Analysis, a triangulation of each image is warped into the reference frame of the mean image, and the cost is defined as the sum of stack variances along the fourth dimension,

U = (1/N) Σ_{k=1..N} Σ_{x∈R} (Ik(W(x)) − Im(x))² (14)

where N is the number of images, R is the region of overlap of all images, Ik(x) is the target image intensity at x and W(x) is the transformation from the median image Im to image Ik.

Multi-resolution implementation. The above process is repeated for each individual in algorithm 1 at every resolution level. Below the individual algorithmic components are treated in detail.

3.3 keypoint detector

A common approach to keypoint detection is to use local extrema of different combinations of scale-normalized derivatives as likely candidates for interesting structures. This allows for automatic scale selection which adapts the local scales of processing to the local image structure. Therefore we propose to utilize the scale-space to quickly locate anatomical keypoints.

Keypoints are identified according to the following steps.

Scale-space representation. In the spirit of Lowe's popular Scale Invariant Feature Transform (SIFT) [67] a hybrid multi-resolution and multi-scale approach is adopted. Each resolution (figure 11a) contains a logarithmic scale-space representation (figure 11b). This is achieved by (i) repeatedly convolving the image with the same filter to extract multiple scale representations of an image and then (ii) downsampling by a factor of 2. The process is repeated until the downsampled image becomes too small to be useful. Each multi-resolution image contains multiple scale representations and is referred to as an octave.

Non-maximal suppression on gradient magnitude. A pixel is discarded if the gradient magnitude increases in any of the 80 neighboring positions. This favors points on surfaces where curvature is high. The process is performed separately on all 4D octaves. This procedure constitutes the automatic scale selection mechanism and determines the support of the keypoint descriptor on the local neighborhood. The use of non-maximal suppression on gradient magnitude rather than on the conventional Laplacian of Gaussian is motivated by the use of spin images as keypoint descriptor (see section 3.4). Spin images are oriented with respect to the gradient to achieve rotation invariance. Keypoints where the gradient is well-defined are thus more likely to be recognized in different images.

Thresholding on gradient magnitude. Keypoints are thresholded using the median of gradient magnitudes, thus retaining the upper 50 percentile. The median was chosen as a conservative yet data-dependent threshold.

Keypoint localization. The spatial sub-pixel locations of the remaining keypoints are computed using equation (29) on the scale where the gradient is largest. While the process can be repeated for increased accuracy a single pass was sufficient for our purpose.

Pruning. Keypoint locations are compared across octaves and pruned if they are less than 2 mm from each other. The keypoint with the largest gradient magnitude is retained. Typically, on the order of 10³ hypotheses remain at this stage.
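The core of the detection steps above can be sketched with an off-the-shelf maximum filter. This is a sketch, not the thesis implementation: for brevity it operates on a (scale, y, x) stack rather than the full 4D (scale, z, y, x) stack, although the same calls work unchanged on a 4D array, and sub-voxel localization and cross-octave pruning are omitted.

```python
import numpy as np
from scipy import ndimage

def detect_keypoints(grad_mag):
    """Local maxima of gradient magnitude in a scale-space stack,
    thresholded at the median (upper 50 percentile)."""
    # a sample survives only if no neighbour in the full 3^d - 1
    # neighbourhood exceeds it (80 neighbours in the 4D case)
    local_max = grad_mag == ndimage.maximum_filter(grad_mag, size=3)
    strong = grad_mag > np.median(grad_mag)
    return np.argwhere(local_max & strong)
```

Each returned row is a (scale, spatial...) index; the scale index determines the support of the descriptor at that point.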

The following sections explain the scale-space implementation in detail.



Figure 9: Schematic depiction of the basic scale problem as it occurs in edge detection. The dots are grey level values, the dotted line represents the gradient on a fine scale, the dashed line represents the gradient on a medium scale and the solid line represents the gradient on a coarse scale. Sketch adapted from [12].

3.3.1 Scale-Space Theory

The scale of observation is critical to how structures are perceived in images. Anatomical structures come in all shapes and sizes, so the amount of information conveyed by an image depends on the scale of observation. For example, consider the fundamental problem of edge detection in image processing. Figure 9 shows a cross-section through an object boundary in a 2D image. The black dots represent the grey-level values sampled along the cross-section direction. In order to extract an edge, gradients have to be computed by discrete approximation of the first derivative. The different slopes of the straight lines indicate that the result of the first derivative estimation strongly depends on the size of the support of the difference operator and thus on the scale of observation.

The need for inference about scale has led to the development of the scale-space framework that represents an image as a family of gradually smoothed images where each component conveys information about a particular scale of observation. It is not possible to know a priori what scales are relevant and hence it is only reasonable to gauge all scales simultaneously. The smallest scale is defined as the smallest level-of-detail on which input data can be resolved, e.g. one image pixel, and the largest scale as the largest possible structure, e.g. a whole image. The derived scale-space framework allows for automatic scale selection and comparison of information content across different scales.

Principles from biological vision and physics suggest the use of a Gaussian kernel to construct the scale-space. Convolving with a Gaussian kernel closely resembles receptive field profiles registered in neurophysiological studies of the mammalian retina and the first stages in the visual cortex [66], and the Gaussian arises as the Green's function of the diffusion problem. Convolving a scalar field f(x) with a Gaussian of standard deviation σ is equivalent to solving the isotropic heat equation,

∂tL = (1/2)∇²L (15)

In these respects the scale-space framework can be seen as a theoretically well-founded paradigm for early-stage computer vision, which has been thoroughly developed and tested over the years by [57, 102, 37, 63, 65, 66] and many others. These papers establish a set of scale-space axioms describing properties of the scale-space representation.

Let L : Rd × R+ 7→ R be the scale space representation of a continuous scalar field f : Rd 7→ R. Smoothing with a Gaussian kernel g : Rd × R+ 7→ R of various widths t leads to the linear Gaussian scale-space,

L(x; t) = f(x) ∗ g(x; t) (16)

where x is the spatial coordinate and t = σ²/2 is the real-valued scale (t > 0) and time analogue of the heat equation. This family of functions defines the solution to the diffusion equation with the initial condition,

L(x; t = 0) = f(x) (17)

(the original data signal). The d-dimensional Gaussian kernel of standard deviation σ is given by,

g(x; t) = 1/(πt)^(d/2) exp(−x²/t) (18)

Many kernels facilitate construction of a scale-space but the Gaussian kernel is mathematically guaranteed to satisfy a number of useful properties. Let Tt be the scale-space operator (Ttf)(x) = g(x; t) ∗ f(x) at scale t. Besides linearity, shift invariance, isotropy and positivity the most important properties are the following.

Semi-group structure.

g(x; t1) ∗ g(x; t2) = g(x; t1 + t2) (19)

with the associated cascade smoothing property,

L(x; t2) = g(x, t2 − t1) ∗ L(x; t1) (20)

The set of smoothing functions commutes with respect to composition. A scale representation can be obtained by direct convolution at scale t or by recursively applying smaller filters whose scale parameters add up to t, which makes the convolution computationally less expensive.
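The cascade property (20) can be verified numerically: smoothing at scale t1 = 2 followed by t2 − t1 = 2 matches direct smoothing at t2 = 4, up to kernel truncation and discretization. Note that SciPy parameterizes by σ, so it is the variances, not the σ values, that add.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
f = rng.standard_normal(256)

# direct smoothing at t2 = 4, i.e. sigma = 2
direct = gaussian_filter1d(f, sigma=2.0)
# cascade: t1 = 2 followed by t2 - t1 = 2 (sigma = sqrt(2) twice)
cascade = gaussian_filter1d(gaussian_filter1d(f, sigma=np.sqrt(2.0)),
                            sigma=np.sqrt(2.0))

# agreement up to truncation/discretization of the sampled kernels
assert np.allclose(direct, cascade, atol=1e-2)
```

This is exactly why a scale pyramid can be built incrementally with small filters instead of convolving the original image with one large filter per level.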



Figure 10: Schematic depiction of non-creation and non-enhancement of local extrema in the one-dimensional case. Sketch adapted from [12].

Non-enhancement of local extrema in any number of dimensions.

∂tL(x; t) ≤ 0 at spatial maxima
∂tL(x; t) ≥ 0 at spatial minima (21)

If for some scale level a point is a non-degenerate local maximum then its value must not increase when the scale parameter increases. The physical analogy is straightforward. As depicted in figure 10, a hot spot in one-dimensional space will only cool down and a cold spot will only warm up with increasing time until it has reached the equilibrium temperature of the system.

Scale invariance. There is no preferred size of an image processing operator.

The Gaussian kernel is also separable in Cartesian coordinates. Convolution in d dimensions can be written as the convolution of d one-dimensional filters. For instance, the 2D Gaussian filter can be written as the outer product of two one-dimensional Gaussians,

g(x, y; t) = g(x; t) g(y; t)ᵀ (22)

It is considerably cheaper to convolve an image with d one-dimensional filters rather than convolving the image with one big d-dimensional filter, as the computational complexity of the convolution algorithm depends on the size of the filter.
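Separability can likewise be checked numerically: one 3D Gaussian filter equals three one-dimensional passes, one per axis. SciPy's `gaussian_filter` is itself implemented this way, so the two results agree to floating-point precision.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_filter1d

rng = np.random.default_rng(1)
img = rng.standard_normal((16, 16, 16))

# one d-dimensional Gaussian filter ...
full = gaussian_filter(img, sigma=2.0)

# ... versus d one-dimensional passes, one along each axis
separable = img
for axis in range(img.ndim):
    separable = gaussian_filter1d(separable, sigma=2.0, axis=axis)

assert np.allclose(full, separable)
```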

The Gaussian scale-space is the most common type of scale-space used in image processing and computer vision. In this work, the scale of the support of the local neighborhood descriptor is therefore determined by extrema in Gaussian scale-space. The scale-space of a volumetric image is a 4D space in which the zero-crossings of 3D hyper-surfaces constitute points of interest.

3.3.2 Scale-normalized derivative operators

A direct consequence of the non-enhancement property is that smoothing reduces the high frequency information in the image, causing the amplitudes of spatial derivatives to monotonically decrease with increasing scale. This is schematically depicted in figure 10 where the amplitude of signal variation decreases with scale. In order to devise a method which permits comparison across scales it is necessary to perform normalization with respect to scale.

Let Ln be the nth-order non-normalized derivative of an image at scale t and let ∂ρLn be the nth-order normalized derivative at the same scale. The scale normalized derivative can be expressed as

∂ρLn(x; √(2t)) = (2t)^(γn/2) Ln(x; √(2t)) (23)

which corresponds to a change of variables ∂ρ = t^(γ/2)∂x. The γ-normalized derivatives are dimensionless and thus scale-invariant for γ = 1. The spatial derivatives also satisfy the diffusion equation. In the one-dimensional case,

∂tLx = (1/2)∇²Lx (24)

The γ-normalized derivatives arise by necessity given the requirement that the scale selection mechanism should commute with rescalings of the image pattern.

Combining the output from such Gaussian derivative operators one obtains a multi-scale differential geometric representation that attains a maximum where the standard deviation of a Gaussian kernel best matches the intrinsic scale of the data point. This facilitates automatic scale selection for each keypoint by non-maximal suppression on the four-dimensional neighborhood.

3.3.3 Discrete Scale-space

The scale-space can be constructed from the formal definition in a straightforward manner by successively applying Gaussian kernels to the original data. However, the diffusion equation contains the first order temporal derivative and second spatial derivatives (the Laplacian) of the data signal, which assumes a continuous image domain. In practice, to compute scale-space levels in the discrete domain, discretized versions of the convolution operator or the heat equation are needed, respectively. On anisotropic structured grids, to which medical images are confined, there are at least three basically different methods for smoothing data with a Gaussian filter kernel: (i) convolving with discrete samples of Gaussian functions, which can be performed either in the spatial domain or in the frequency domain, (ii) applying for each dimension a binomial filter whose weights are (up to normalisation) the even rows of the Pascal triangle and (iii) discretising the Gaussian and using finite difference approximations of derivative operators, which corresponds to a discretization of the heat equation. In the one-dimensional case, the filter mask of the gradient is [−1, 0, 1]ᵀ and the Laplacian is of the form [1, −2, 1]ᵀ. Higher-dimensional operators are obtained by convolving the other dimensions with the same (anisotropically weighted) one-dimensional filter.

In this work we adopt method (iii). In the discrete domain the Green's function of the heat equation becomes the discrete Gaussian kernel,

T(n, t) = exp(−t)In(t) (25)

where In(t) is the modified Bessel function of integer order. Convolving the image with the discrete Gaussian kernel,

L(x; t) = Σ_{n=−∞..∞} f(x − n) T(n, t) (26)

one obtains the solution to the discrete diffusion equation (discrete space, continuous time), the analogue of the formal continuous solution. This ensures that the scale-space axioms carry over to the discrete domain [64].
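The kernel of eqn. (25) is directly available through SciPy's exponentially scaled Bessel function, since ive(n, t) = exp(−t)·In(t) for t > 0. `discrete_gaussian_kernel` is a hypothetical helper and the truncation radius is an assumption of this sketch.

```python
import numpy as np
from scipy.special import ive  # ive(n, t) = exp(-t) * I_n(t) for t > 0

def discrete_gaussian_kernel(t, radius):
    """Discrete Gaussian kernel T(n, t) = exp(-t) I_n(t), cf. eqn. (25),
    truncated at the given radius."""
    n = np.arange(-radius, radius + 1)
    return ive(n, t)

k = discrete_gaussian_kernel(t=1.0, radius=10)
```

Because Σn In(t) = exp(t), the untruncated kernel sums exactly to unity, and the symmetry I−n = In makes it an even filter, as expected of a smoothing kernel.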

3.3.4 Sub-voxel Keypoint Localization

Scale-space smoothing will cause extrema to drift as scale increases. This is in part due to the nature of the discrete domain in which the true location of extrema will "snap-to-grid". Hence, some kind of localization is required to shift points back to their true spatial location and scale value.

The true keypoint location x∗ = [x∗, y∗, z∗, t∗] can be approximated by a second-order Taylor expansion around the maximum on the grid x,

I(x + δ) ≈ I(x) + gᵀδ + (1/2)δᵀHδ (27)

where g and H are the gradient (4-vector of first order derivatives) and Hessian (4-by-4 matrix of second order derivatives) respectively. Since the hyperparaboloid achieves its maximum where its derivatives are equal to zero, we take the derivative of the right hand side of equation (27) with respect to δ and set it equal to zero,

g + Hδ = 0 (28)


(a) Schematic depiction of a multi-resolution image representation. Each level consists of multiple scale representations and is referred to as an octave.

(b) Schematic depiction of a multi-scale image representation.

Figure 11: Schematic depiction of scale-space pyramids. Sketches adapted from [12].


Figure 12: Detected keypoints on an MRI of the head.

From this equation the correction vector δ can be computed,

δ = −H−1g. (29)

The sub-voxel location is given by x∗ = x + δ. The gradient and Hessian are obtained using standard finite differencing. This procedure has been found to improve the stability of keypoint detectors in the literature and may, for our purpose, provide a more reliable estimate of the gradient and its direction.
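In one dimension the correction δ = −H⁻¹g of eqn. (29) reduces to a three-point quadratic fit. A minimal sketch (the 4D case replaces the scalars by the finite-difference gradient vector and Hessian matrix):

```python
def subvoxel_offset(f_minus, f_0, f_plus):
    """One-dimensional instance of delta = -H^{-1} g around a grid
    maximum, using central finite differences."""
    g = 0.5 * (f_plus - f_minus)       # first derivative
    h = f_plus - 2.0 * f_0 + f_minus   # second derivative
    return -g / h
```

For a parabola the recovered offset is exact: sampling f(x) = −(x − 0.3)² at x = −1, 0, 1 yields δ = 0.3.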

The above procedure results in a subset of scale-space extrema that are located in regions of high curvature. The assumption is that these will be edges of distinct anatomical features. See figure 12 for an example of the keypoint detector applied to an image.

3.4 keypoint descriptor

A robust neighborhood descriptor is needed for meaningful comparison of keypoints when images exhibit significant pose differences. Rotational invariance is a key requirement to efficiently compare keypoints across images. To this end, spin images are utilized.


Spin images are named after the image generation process that can be visualized as a sheet spinning about the normal of a point. A spin image is constructed from a single point basis that defines a reference point and orientation relative to which the position of other points in the local neighborhood can be described by two spatial coordinates. The feature vector s(a, r; t) is essentially a 2D histogram indexed by an axial distance a and a radial distance r, with the distances being proportional to t in this implementation. This allows for more appropriate comparison of neighborhood descriptors in different images since the local neighborhood is always compared with respect to the orientation of the local anatomy. Using gradient magnitude instead of the widely adopted Laplacian of Gaussian (LoG) to construct the scale space, keypoints are detected at points where the orientation assignment is well-defined.

3.4.1 Mathematical Formulation

The spin image is defined at a keypoint x∗ and associated with its gradient direction u. Any other point xj can be mapped to a 2D point (aj, rj) relative to this frame,

aj = (x∗ − xj) · u
rj = √(‖x∗ − xj‖₂² − aj²) (30)

aj is the distance along the normal u from x∗ and rj is the radial distance of xj from the line along u through x∗. This is invariant to rotation around the normal.
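The mapping of eqn. (30) can be sketched directly; `spin_coords` is a hypothetical helper taking a unit direction u.

```python
import numpy as np

def spin_coords(x_star, u, points):
    """Map points to (a, r) spin-image coordinates relative to keypoint
    x_star and unit direction u, cf. eqn. (30)."""
    diff = x_star - points                     # sign as in eqn. (30)
    a = diff @ u                               # axial distance
    # radial distance; clamp guards against tiny negative round-off
    r = np.sqrt(np.maximum(np.sum(diff**2, axis=1) - a**2, 0.0))
    return a, r
```

Rotating the input points about u leaves (a, r) unchanged, which is the rotational invariance exploited by the descriptor.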

A 2D histogram is constructed from a gradient magnitude image by (i) computing (ai, ri) from xi and (ii) updating the neighboring four bins using bilinear interpolation if ai < amax and ri < rmax. The process is repeated for all voxels xi in a cylinder about x∗ with its axis along u. Each bin of the histogram is normalized with respect to the swept volume. The histogram is further normalized so that its elements sum to unity. The histogram is populated using a gradient magnitude image which ensures equal contributions from bright and dark regions.

The size of the window used to populate the histogram should be large enough to capture local anatomical information. Therefore it should be proportional to the scale at which the keypoint was selected. Intuitively, σ in (18) is the standard deviation of the kernel that best matches the intrinsic pattern of the local neighborhood. The support region should therefore cover the extent of this kernel. The following relation is constructed to determine the extent of the support region,

amax = rmax = 3 · √(t/2) = 3σ (31)



Figure 13: An example of a spin image from an MRI of the head. (a)-(c) show the gradient direction and magnitude projected onto the axial, coronal and sagittal planes respectively. (d) is the spin image at this position.

or three times the standard deviation of the selected scale. This covers 99.7% of the area under the curve.

3.4.2 Similarity Metric

To compare two neighborhood descriptors the Bhattacharyya coefficient is used,

B(hi, hj) = Σk √(hik hjk) (32)

By construction B(hi, hj) is in the range [0, 1] with B(hi, hj) = 1 for hi = hj. Each element in the spin image is replaced with its square root so spin images can be compared simply by their dot products. The Bhattacharyya coefficient is an approximate measurement of the amount of overlap between two statistical samples and is widely used in machine learning applications for comparing frequency distributions [7].
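Eqn. (32) is a one-liner; storing √h instead of h turns the comparison into a plain dot product, which is the precomputation trick described above.

```python
import numpy as np

def bhattacharyya(h_i, h_j):
    """Bhattacharyya coefficient between two normalized histograms,
    cf. eqn. (32)."""
    return float(np.sqrt(h_i * h_j).sum())
```

Identical histograms yield 1 and histograms with disjoint support yield 0, spanning the full range of the coefficient.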

3.4.3 Implementation

Gradient magnitude image. The image is smoothed with σ = 2 and a gradient magnitude image is computed.

Reference frame. The gradient direction is computed at the selected scale for all keypoints.

Computation. The smallest bounding box containing the local neighborhood of all spin images is computed. We then loop over voxels inside the bounding box and update overlapping spin images accordingly. This ensures that each voxel in the image is only visited once.

Normalization. Spin images are normalized so they sum to unity and each element is replaced with its square root.


3.5 tuning of hyperparameters

The proposed method has two main hyperparameters that need to be specified: the number of parts in a model, n, and the weight between geometry cost and part similarity cost, α. This is a common problem of methods for automatic construction of parts+geometry models [20, 31, 98]. The proposed method was tested on the cmamri and wristct data sets using combinations of n = [16, 32, 64, 128] and α = [0.1, 0.5, 1, 2, 10] for different levels of noise η = [1, 2, 3, 4] and seemed fairly robust to the choice of n and α in the sense that no combination was particularly better than the others, i.e. one set of settings may be good on some images but fail on others. We therefore chose the simplest case of n = 16 and α = 1 for all experiments.

3.6 summary

The construction of 3D PGMs follows closely that of the 2D model. The method is extended in three key areas. A scale-space approach is adopted for keypoint detection, spin images are used for neighborhood description and the model building process is embedded in a multi-resolution setting.

The next section presents data and the experimental design.

4 E X P E R I M E N TA L D E S I G N

The following sections describe the experiments, data sets, method of quantification of performance and the method for inducing synthetic anatomical variation in images. The results are presented and discussed in section 5.

4.1 experiments

Four experiments are conducted.

Experiment 1 registers the original data sets using affine initialization and groupwise registration and sets the baseline for groupwise registration performance. It also studies the effect of affine initialization.

Experiment 2 registers original data sets using 3D PGM initializationand groupwise registration. Studies the effect of 3D PGM ini-tialization and the effect of including affine initialization before3D PGMs.

Experiment 3 registers synthetically generated data sets using affineinitialization and groupwise registration and sets the baselinefor registration performance in extreme cases. Studies the effectof affine initialization in extreme cases.

Experiment 4 registers synthetically generated data sets using 3D PGMinitialization and groupwise registration. Studies the effect of3D PGM initialization in extreme cases.

Experiments are performed on one core of a 2.66 GHz Intel Core i7 CPU with 8 GB RAM. Due to the limited amount of RAM, experiments are conducted on only 4 images from each data set. The keypoint detector and experiment bookkeeping are implemented in Matlab. The 3D PGM algorithm is implemented in C++.

4.2 data

Sample images and associated manual segmentations are given in figure 13. The medical data sets used in this work are the following.

cmamri T1-weighted MRIs of the head from 18 healthy subjects. Evaluated using manual segmentations of white matter, grey matter and the brain stem. Images and manual segmentations were kindly provided by Timothy Cootes of the University of Manchester. Heads rarely need initialization since they are topologically simple, rigid structures, and a multi-resolution approach is often able to cope with the pose differences, so this is considered a well-posed data set.

kneemri T2-weighted MRIs of the knees from 11 arthritic subjects. Evaluated using manual segmentations of femur and tibia. Images were kindly provided by the Department of Radiology, Frederiksberg Hospital. It is often difficult to perform inter-subject registration of knees because the amount of fat varies greatly between subjects. Using a T2-weighted MR sequence only complicates the data set since this sequence highlights fat.

heelpadmri T1-weighted MRIs of heel pads from 7 healthy subjects. Evaluated using manual segmentations of cuboid and calcaneus. Images were kindly provided by the Department of Radiology, Frederiksberg Hospital. The foot consists of many similar bones but is of simple topology.

wristct CT images of wrists from 24 healthy subjects. Evaluated using segmentations of the ulna, radius, lunate, scaphoid, triquetrum, pisiform, trapezium, trapezoid, capitate and hamate bones. Images and manual segmentations were kindly provided by Xin Cheng of the University of Manchester. The wrist is a very complicated joint consisting of many similar bones, which makes initialization important. Further, CT offers less contrast between bone and soft tissue compared to MRI, so this is considered a difficult data set.

handmri T2-weighted MRIs of the hands from 11 subjects with rheumatoid arthritis. Evaluated using manual segmentations of the three central metacarpals. Images were kindly provided by the Department of Radiology, Frederiksberg Hospital. This data set is considered slightly less difficult than the CT wrist data because of the greater contrast between tissues and because it is evaluated on the metacarpals, which are of somewhat simpler topology compared to the wrist joint.

kneemricarot T2-weighted MRIs of the knees from 50 severely osteoarthritic subjects. Evaluated using manual segmentations of knee articular cartilage. Images and manual segmentations from the CAROT study [80] were kindly provided by The Parker Institute, Frederiksberg Hospital. The small volume of cartilage means that even small misalignments will cause a significant drop in DSC. Registration is further complicated by severe pathology in these images, e.g. osteophytes (bone spur formations) and regions of degenerated cartilage.



Figure 13: Example images from data sets and associated manual segmentations used in this work. Left column shows axial view, center column shows sagittal view and right column shows coronal view. The name used to identify each data set throughout this section is given in mono-spaced text. (a)-(c) spheres are synthetic images used to illustrate the problem of local minima in the introduction, (d)-(f) cmamri are T1-weighted MRIs of the head, (g)-(i) wristct are CT images of the wrist, (j)-(l) heelpadmri are T1-weighted MRIs of the heel pads of healthy subjects, (m)-(o) kneemri are T2-weighted MRIs of knees of osteoarthritic patients, (p)-(r) wristmri are T2-weighted MRIs of the wrist of healthy subjects provided by the Department of Radiology, Frederiksberg Hospital, and (s)-(u) kneemricarot are T2-weighted MRIs of knees of severely osteoarthritic patients.


4.3 performance assessment

The image registration problem is ill-posed in the sense that there is no ground truth with which to compare results. Instead, the standard approach is to manually segment anatomical structures in images and apply the found transformations to the segmentations. This facilitates quantification of registration performance by measuring the overlap of transformed binary segmentations. For this purpose, the Dice Similarity Coefficient (DSC) is used. The DSC is defined on the interval [0; 1] as the ratio of joint volume to the total volume,

DSC = 2|X ∩ Y| / (|X| + |Y|) (33)

For multiple images, a reference image is chosen and the mean DSC is reported. When multiple segmented objects are present in an image the mean DSC of corresponding objects is reported.
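As an illustration, the DSC of equation (33) and the mean over objects can be computed as follows on flattened binary volumes; this is a sketch, not the evaluation code used in the thesis.

```python
def dsc(x, y):
    """Dice Similarity Coefficient of two binary volumes, given as flat
    sequences of 0/1 voxel labels (eq. 33)."""
    intersection = sum(a and b for a, b in zip(x, y))
    return 2.0 * intersection / (sum(x) + sum(y))

def mean_dsc(object_pairs):
    """Mean DSC over corresponding segmented objects."""
    scores = [dsc(x, y) for x, y in object_pairs]
    return sum(scores) / len(scores)
```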

4.4 affine- and groupwise registration implementation

To facilitate research on medical image registration and to simplify its application, the open source software package elastix has been developed [55]. The elastix software has a modular design, including several optimization methods, multi-resolution schemes, interpolators, transformation models and cost functions. This allows the user to quickly compare different registration methods in order to select a satisfactory configuration for a specific application. transformix was developed alongside elastix and allows the transformation found by elastix to be applied to another image. Both programs have command-line interfaces that enable automated processing of large numbers of data sets by means of scripting. This feature is heavily used throughout this work.
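Scripted batch processing of this kind might look like the sketch below, which builds one pairwise elastix invocation per subject against a median fixed image using elastix's standard -f/-m/-p/-out flags; the file layout and names are assumptions for illustration, not the actual scripts used in this work, and each command could then be launched with e.g. subprocess.run.

```python
from pathlib import Path

def elastix_commands(data_dir, out_dir, param_file):
    """Build one pairwise elastix command per subject image, registering
    it to a median fixed image (hypothetical file layout)."""
    fixed = Path(data_dir) / "median.mhd"
    commands = []
    for moving in sorted(Path(data_dir).glob("subject_*.mhd")):
        out = Path(out_dir) / moving.stem
        commands.append(["elastix", "-f", str(fixed), "-m", str(moving),
                         "-p", str(param_file), "-out", str(out)])
    return commands
```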

The software is built upon a widely used open source library for medical image processing, the Insight Toolkit (ITK) [47]. The use of ITK implies that the low-level functionality (image classes, memory allocation, etc.) is thoroughly tested. The library makes use of templating, operator overloading and various other fancy C++ tricks that allow for fast execution. Executables and source code are publicly available online under the BSD license.

In this work, a groupwise registration method implemented in elastix based on a 4D (3D + time) free-form B-spline deformation model is used [72]. The cost function is the sum of variances over the 4th dimension and the optimizer is Stochastic Gradient Descent (SGD). It is stochastic in the sense that it randomly samples a subset of voxels in the images to speed up computation. An implicit reference frame is obtained by requiring the sum of deformations to be zero. This eliminates the need to choose a reference image and avoids introducing bias towards any particular image when shape models are constructed. Four pyramid levels are used with grid spacings of 8 mm, 4 mm, 2 mm and 1 mm respectively. 2nd order B-spline interpolation is used during registration and the final image is sampled using 3rd order B-spline interpolation. All other parameters and settings are either estimated automatically from data by elastix or are kept at their default values.
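The two central ideas, the variance-over-images cost and the implicit reference frame, can be sketched in a few lines; voxel grids are flattened to lists, and the function names are mine for illustration, not elastix API.

```python
def zero_sum(deformations):
    """Enforce the implicit reference frame: subtract the mean so the
    deformations sum to zero across images."""
    n = len(deformations)
    dim = len(deformations[0])
    mean = [sum(d[i] for d in deformations) / n for i in range(dim)]
    return [[d[i] - mean[i] for i in range(dim)] for d in deformations]

def variance_cost(images):
    """Sum over voxels of the intensity variance across images, i.e.
    the variance over the 4th dimension."""
    n = len(images)
    cost = 0.0
    for voxel in zip(*images):
        mu = sum(voxel) / n
        cost += sum((v - mu) ** 2 for v in voxel) / n
    return cost
```

When the images are perfectly aligned the intensities agree at every voxel and the cost vanishes.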

Affine initialization is obtained in a straightforward, pairwise fashion [55] using the median image as the fixed image. The cost function is the sum of squared distances between intensity values and the optimizer is also SGD. Four pyramid levels are used with grid spacings of 8 mm, 4 mm, 2 mm and 1 mm respectively. 0th order B-spline interpolation is used to warp binary segmentations.

4.5 generating synthetic anatomical variation

4.5.1 Method

A shape model is employed to learn anatomical variation from the original data in order to construct synthetic images. The aim is to quantify the performance of the proposed method in extreme cases at user-defined levels of noise. Specifically, images are registered using the elastix non-rigid groupwise registration method and the keypoint detector is applied to the resulting mean image. The shape model is constructed from points propagated back to the original images using the inverse transform and thus captures the natural anatomical variation in the data. Significant pose differences can be obtained by synthesising a large amount of variation.

Note that the groupwise registration method may not obtain perfect correspondences since no initialization is used. Nonetheless, it is able to capture a sufficient level of variation for the purpose of inducing synthetic anatomical variation. The remainder of this section outlines the shape model formulation.

4.5.1.1 Mathematical Formulation

A classical statistical method of describing variation in data is the linear orthogonal transformation Principal Component Analysis (PCA). PCA forms the basis for many algorithms, including ASMs and AAMs. PCA projects feature vectors into a subspace spanned by the most significant eigendirections of the data space. Each principal component is a linear combination of the original variables. In essence, vectors are rotated around their mean such that the greatest variance by any projection of the data comes to lie on the first principal component, the second greatest variance on the second component, and so on. This facilitates modelling of shape variation. Conceptually, PCA is used here to construct a subspace of anatomical variation in the original images.


For each image in a data set, the coordinates of the associated keypoints are concatenated into a shape vector x,

x = [x1, y1, z1, . . . , xn, yn, zn] (34)

Each vector is 3n-dimensional. The PCA is performed as an eigenanalysis of the covariance matrix of the shape vectors. The unbiased estimate of the covariance matrix Σ is given by

Σ = (1/(N − 1)) ∑_{i=1}^{N} (x_i − x̄)(x_i − x̄)^T (35)

where x̄ is the mean shape, estimated as

x̄ = (1/N) ∑_{i=1}^{N} x_i (36)

The −1 in the denominator of (35) stems from the one degree of freedom used to calculate the mean shape. The principal directions of variation in the data space are now described by the eigendecomposition of the covariance matrix,

ΣΦ = ΦΛ (37)

where the columns of Φ are the eigenvectors of the covariance matrix,

Φ = [φ_1 . . . φ_m] (38)

and Λ is the diagonal matrix of corresponding eigenvalues,

Λ = diag(λ_1, . . . , λ_m) (39)

A synthetic image can then be generated at η times the level of variation in the original images by deforming the mean image with a shape given by the following equation,

x = x̄ + ηΦ√Λ r (40)

where r is a vector of Normally and Independently Distributed (NID) random numbers with mean 0 and standard deviation 1, and √Λ is the matrix of square roots of the eigenvalues. The square roots are more convenient than the eigenvalues themselves, since the square root of the variance, i.e. the standard deviation, is in units of the original data. Equation (40) facilitates generation of synthetic data sets at user-specified levels of noise.
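Equation (40) translates directly into code; in the sketch below the mean shape, eigenvector columns and eigenvalues are assumed toy values, and r is fixed rather than drawn from N(0, 1) so the example is reproducible.

```python
import math

def synthesize(mean, modes, eigenvalues, r, eta):
    """Generate a shape x = mean + eta * Phi * sqrt(Lambda) * r, with
    `modes` given as a list of eigenvector columns (eq. 40)."""
    out = list(mean)
    for column, lam, rj in zip(modes, eigenvalues, r):
        scale = eta * math.sqrt(lam) * rj
        for i in range(len(out)):
            out[i] += scale * column[i]
    return out
```

With mean [0, 0], identity modes, eigenvalues [4, 1], r = [1, −1] and η = 2 this yields [4.0, −2.0].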

Figure 14 on page 53 shows the first three modes of variation in a data set of T1-weighted MR images of heads. Eigenvectors explaining up to 95% of the variation are retained and the rest is discarded as noise. Thus, to retain p percent of the combined variation in the training set, t modes can be chosen satisfying:

∑_{i=1}^{t} λ_i > (p/100) ∑_{i=1}^{n} λ_i (41)
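The selection rule in equation (41) amounts to a running sum over sorted eigenvalues; a sketch follows (using a non-strict inequality, which differs from the strict one above only in degenerate cases).

```python
def modes_to_retain(eigenvalues, p=95.0):
    """Smallest t whose leading eigenvalues hold at least p percent of
    the total variance; eigenvalues assumed sorted in descending order."""
    target = p / 100.0 * sum(eigenvalues)
    running = 0.0
    for t, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running >= target:
            return t
    return len(eigenvalues)
```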

This step is essentially a regularization of the solution space.

Shapes are usually aligned prior to application of PCA using Generalized Procrustes Analysis (GPA) in order to obtain a more compact model, and thus a more compact search space for the benefit of e.g. ASM and AAM optimizers. In these experiments, however, the shape model is used to induce noise, so it is not necessarily desirable to align images using GPA, which effectively removes degrees of freedom from the data. While this approach leads to a less compact model, it facilitates a unified model of variation, which is simpler to control in an experimental setting than e.g. a model of affine variations plus a model of pure shape variation. Note that the first few principal components will therefore tend to govern affine variations.



Figure 14: First three eigenmodes of variation for 18 T1-weighted MR images of heads. The first row (a)-(c) corresponds to the first eigenmode of variation, the second row (d)-(f) to the second mode of variation and the third row (g)-(i) to the third mode of variation, seen from axial, sagittal and coronal views respectively. The green shape corresponds to the mean shape, the red shape to the mean shape plus one standard deviation and the blue shape to the mean shape minus one standard deviation of the associated mode of variation. The first mode of variation seems to govern the size of the heads. The second mode seems to govern stretching of the skull. The third mode seems to govern the shape of the forehead and the placement of the eyes (therefore image (i) shows a slice containing the eyes as opposed to the central slice).

5 discussion and conclusion

Data set      Baseline        Affine         Groupwise      Affine+Groupwise
cmamri        0.69 ± 0.10     0.61 ± 0.13    0.78 ± 0.077   0.78 ± 0.073
kneemri       0.56 ± 0.08     0.77 ± 0.06    0.87 ± 0.05    0.88 ± 0.042
heelpadmri    0.078 ± 0.11    0.83 ± 0.04    0.90 ± 0.030   0.90 ± 0.031
wristct       0.31 ± 0.21     0.34 ± 0.24    0.44 ± 0.27    0.42 ± 0.26
wristmri      0.37 ± 0.18     0.67 ± 0.027   0.86 ± 0.02    0.85 ± 0.0087
kneemricarot  0.0071 ± 0.011  0.12 ± 0.15    0.23 ± 0.20    0.24 ± 0.20

Table 3: Experiment 1 results. DSC ± std. of affine-, groupwise- and affine+groupwise registration of original data. Baseline denotes unregistered images.

5.1 experiment 1

This experiment studies the registration problem as it typically appears in clinical practice. Some data sets are well-behaved and some pose challenges. Table 3 shows the DSC and standard deviation of the registration results and images are shown in figure 15.

Almost every data set (heelpadmri being the exception) contains only marginal pose differences, wherefore affine initialization makes little difference to the groupwise registration approach. In some cases, affine registration seems to perform well on its own. This is due to the simple topology of, for example, the heel. Note, however, that the segmentations of kneemri, heelpadmri and wristmri are all of bone in the central part of the image, which are precisely the kind of structures suited for affine registration. Looking at the images of registration results in figure 15, affine registration does not perform as well as the DSC scores convey. For example, the fingers in wristmri are not as well aligned as the scores imply.

Two data sets stand out. The wristct data set is particularly challenging due to less contrast between tissues, as previously discussed. The kneemricarot data set is evaluated on knee articular cartilage, which takes up a volume orders of magnitude smaller than the bone segmentations in kneemri. Thus, even small deviations between images can cause a significant drop in DSC. While this makes segmentation of articular cartilage less suited for evaluation of registration algorithms, it offers an example of the challenges registration systems face in clinical practice - not all biomarkers are easily processed computationally.

The next experiment compares the performance of 3D PGMs to affine initialization.




Figure 15: Experiment 1 results. Affine and groupwise registration of original data. First column shows mean images before registration, second column shows images after affine initialization, third column shows groupwise registration of images without initialization and fourth column shows groupwise registration of images that have been initialized with an affine transform.


Data set      Baseline        PGM              Affine+PGM     PGM+Groupwise  Affine+PGM+Groupwise
cmamri        0.69 ± 0.10     0.41 ± 0.16      0.50 ± 0.12    0.76 ± 0.076   0.77 ± 0.07
kneemri       0.56 ± 0.08     0.61 ± 0.14      0.74 ± 0.056   0.86 ± 0.043   0.87 ± 0.040
heelpadmri    0.078 ± 0.11    0.26 ± 0.20      0.58 ± 0.15    0.84 ± 0.039   0.88 ± 0.036
wristct       0.31 ± 0.21     0.28 ± 0.21      0.22 ± 0.19    0.42 ± 0.24    0.38 ± 0.24
wristmri      0.37 ± 0.18     0.36 ± 0.15      0.54 ± 0.33    0.84 ± 0.01    0.83 ± 0.002
kneemricarot  0.0071 ± 0.011  0.0014 ± 0.0034  0.045 ± 0.092  0.22 ± 0.19    0.218 ± 0.19

Table 4: Experiment 2 results. DSC ± std. of 3D PGM-, affine+3D PGM-, 3D PGM+groupwise- and affine+3D PGM+groupwise registration of original data. The baseline is the same as in experiment 1 and is included here for convenience.

5.2 experiment 2

This experiment examines the performance of 3D PGMs on typical data sets as they would appear in clinical practice. Table 4 shows the DSC and standard deviation of registration and images are shown in figure 17. Comparison with affine initialization is shown in figure 16.

On its own, 3D PGM initialization performs significantly worse than affine initialization and actually induces additional pose differences in the data. This is of course very undesirable for an algorithm that was developed with robustness in mind. A combination of two issues is suspected to be the root of the problem: (i) the capture range of the model is defined by the size of the match candidate neighborhoods and the resolution of the match candidate grid and (ii) the algorithm does not have a mechanism for allowing missing nodes. The algorithm may then choose spurious match candidates, for example in cases where translation and scale differences make meaningful correspondences harder to find. Large grid spacings between match candidates are dictated by the computational complexity of the model on coarse pyramid levels, which may limit the number of good match candidates. This is even more problematic in the case of groupwise registration, where a keypoint may have meaningful correspondences on a subset of images that cause the keypoint to be selected for the final model even though correspondences are not meaningful on all images. An initial affine registration mitigates parts of these issues, as shown by table 4. For the rest of the experiments, 3D PGMs will therefore be initialized with an affine transformation.

Looking at the results after groupwise registration, there seems to be no significant difference from including PGMs in the initialization procedure. If anything, the results obtained using affine+groupwise are slightly better than when using affine+3D PGM+groupwise registration. The groupwise registration procedure is able to compensate for most if not all of the variation induced by the 3D PGMs.

The experiment shows that no benefit is gained from applying 3D PGMs in their current form to well-posed images. The next experiments investigate whether 3D PGMs can help the groupwise registration procedure in cases of extreme pose differences where affine initialization is insufficient on its own.


(a) Affine vs 3D PGM. The linear regression coefficients are y = 0.54x + 0.04, showing a clear tendency for affine registration to be superior to 3D PGMs in their current form.

(b) Affine+Groupwise from experiment 1 vs 3D PGM+Groupwise. The linear regression coefficients are y = 0.98x + 0.01, showing no clear tendency for differences between the two approaches.

Figure 16: Experiment 2. DSC scores and error bars of affine initialization plotted against 3D PGM initialization. Error bars denote one standard deviation, as given in the table.



Figure 17: Experiment 2 results. PGM and groupwise registration of original data. First column shows mean images before registration, second column shows mean images after 3D PGM initialization, third column shows mean images after 3D PGM+groupwise registration and fourth column shows mean images after affine+3D PGM+groupwise registration.



Figure 18: DSC scores of groupwise registration vs affine+groupwise registration of all data sets for η = [1, 2, 3, 4]. Different levels of noise are encoded in colors (black, red, green and blue respectively) and different data sets are encoded in markers.

5.3 experiment 3

This experiment studies the effect of affine initialization in cases of extreme pose differences. The experiment uses a model of anatomical variation to induce synthetic noise in images and will serve as the baseline for assessing the performance of 3D PGMs in extreme cases (experiment 4). Table 5 shows the DSC and standard deviation of registration and images are shown in figure 20. The effect of affine initialization at the minimum and maximum noise levels is shown in figure 19.

For η = 1, i.e. close to the amount of anatomical variation in the original images, an extra initialization step makes very little difference to the registration result, since the groupwise registration algorithm is already initialized sufficiently close to the global optimum in the search space. For η = 4 the registration system fails for all data sets except cmamri. This is visualized in figure 18, where all blue data points (η = 4) are concentrated in the area of low scores except for cmamri, denoted by the diamond marker. This may be due to the simple topology of the head, which is easily registered using an affine transformation and even more so by a non-rigid approach (aided by the multi-resolution setting). Registration of kneemricarot (squares) results in low scores regardless of the level of noise, as seen in figure 18. This is because the evaluation is based on the small volume of articular cartilage, in addition to the pose differences, as previously discussed. Registration of the wristct data set fails hopelessly, but this is also the most difficult data set of them all.

It is interesting to note that the registration procedure seems to break down on multiple data sets going from η = 3 to η = 4 and that an extra affine initialization step does not seem to mitigate the problem. The reason may be that the groupwise registration procedure fails, at least in part, due to the amount of non-rigid variation induced in the images, which no affine registration procedure is going to make up for.

Figure 18 visualizes some of the other tendencies previously discussed. Some data sets are well-posed and some are ill-posed. This is seen from the colors (corresponding to different data sets registered at the same noise level) spanning the entire scale. Affine initialization does not seem to help the groupwise registration approach, as seen from the data points lying close to the unit slope regardless of data set and level of noise.

From figures 19 and 18 it is evident that affine initialization slightly improves the final registration result in some cases and slightly degrades it in others, except for η = 4 where both groupwise and affine+groupwise registration fail hopelessly.

The table and images with results are shown on the next pages.


Noise level (η)   Groupwise        Affine+Groupwise

cmamri
1                 0.78 ± 0.072     0.77 ± 0.074
2                 0.78 ± 0.077     0.76 ± 0.076
3                 0.77 ± 0.071     0.77 ± 0.077
4                 0.78 ± 0.071     0.72 ± 0.083

kneemri
1                 0.88 ± 0.045     0.88 ± 0.041
2                 0.88 ± 0.044     0.88 ± 0.043
3                 0.86 ± 0.043     0.87 ± 0.044
4                 0.15 ± 0.17      0.37 ± 0.19

heelpadmri
1                 0.89 ± 0.033     0.89 ± 0.034
2                 0.88 ± 0.036     0.89 ± 0.030
3                 0.87 ± 0.037     0.89 ± 0.036
4                 0.12 ± 0.13      0.05 ± 0.06

wristct
1                 0.40 ± 0.26      0.40 ± 0.27
2                 0.42 ± 0.27      0.42 ± 0.26
3                 0.39 ± 0.27      0.36 ± 0.26
4                 0.002 ± 0.0077   0.09 ± 0.14

wristmri
1                 0.86 ± 0.012     0.84 ± 0.013
2                 0.85 ± 0.011     0.83 ± 0.0073
3                 0.83 ± 0.0067    0.84 ± 0.012
4                 0.28 ± 0.43      Failed

kneemricarot
1                 0.23 ± 0.19      0.24 ± 0.21
2                 0.25 ± 0.21      0.21 ± 0.18
3                 0.18 ± 0.17      0.17 ± 0.18
4                 0.19 ± 0.16      0.16 ± 0.16

Table 5: Experiment 3 results. Groupwise and affine+groupwise registration of synthetic data.


(a) Groupwise vs Affine+Groupwise (noise level η = 1)

(b) Groupwise vs Affine+Groupwise (noise level η = 4)

Figure 19: Experiment 3. DSC scores and error bars of groupwise registration plotted against affine+groupwise registration at the minimum and maximum noise levels. Error bars denote one standard deviation. Only five data sets are shown at η = 4 since registration of wristmri failed in that case.



Figure 20: Experiment 3 results. Affine and groupwise registration of synthetic images. First column shows mean images before registration, second column shows images after affine initialization, third column shows groupwise registration of images without initialization and fourth column shows groupwise registration of images that have been initialized with an affine transform. For each data set there are two rows: the first shows images for η = 1 and the second shows images for η = 4.



Figure 21: DSC scores of affine+groupwise registration vs affine+3D PGM+groupwise registration of all data sets for η = [1, 2, 3, 4]. Different levels of noise are encoded in colors (black, red, green and blue respectively) and different data sets are encoded in different markers.

5.4 experiment 4

This experiment studies the effect of including 3D PGMs during the initialization of non-rigid groupwise registration in cases of extreme pose differences and is the main experiment of the thesis. The experiment uses a model of anatomical variation to induce synthetic noise in images. Table 6 shows the DSC and standard deviation of registration and images are shown in figure 22. The comparison with affine initialization at different levels of noise from experiment 3 is shown in figure 21.

For data sets of simple structures there seems to be no significant difference from including 3D PGMs in the registration procedure. Since 3D PGMs were shown to sometimes induce additional pose differences in images in experiment 2, 3D PGMs should not be used on images where the standard affine approach is adequate. 3D PGMs improve registration over affine initialization in 2 out of 23 cases. However, while the improvement is significant, these data points may very well be considered outliers. Each score is based on registration of only four images, so one cannot rule out that this is due to statistical fluctuations in the induced anatomical noise.

This experiment highlights the concept of "no free lunch", i.e. there rarely is a single best approach able to deal with cases of all difficulties. A user would need to apply his or her expert knowledge in the design of the registration procedure for a given data set and decide which method of initialization should be used depending on the images of interest.


Noise level (η)   Affine+PGM+Groupwise

cmamri
1                 0.76 ± 0.067
2                 0.73 ± 0.072
3                 0.75 ± 0.066
4                 0.69 ± 0.010

kneemri
1                 0.87 ± 0.044
2                 0.86 ± 0.054
3                 0.85 ± 0.010
4                 0.86 ± 0.050

heelpadmri
1                 0.84 ± 0.060
2                 0.88 ± 0.034
3                 0.87 ± 0.035
4                 Failed

wristct
1                 0.34 ± 0.26
2                 0.46 ± 0.25
3                 0.15 ± 0.23
4                 0.39 ± 0.25

wristmri
1                 0.83 ± 0.017
2                 0.83 ± 0.019
3                 0.82 ± 0.019
4                 0.11 ± 0.010

kneemricarot
1                 0.23 ± 0.18
2                 0.16 ± 0.16
3                 0.063 ± 0.066
4                 0.070 ± 0.11

Table 6: Experiment 4 results. Affine+3D PGM+groupwise registration of synthetic data.



Figure 22: Experiment 4 results. Affine+PGM+groupwise registration of synthetic images. First column shows mean images before registration with η = 1, second column shows images after registration with η = 1, third column shows mean images before registration with η = 4 and fourth column shows mean images after registration with η = 4. Images of affine+groupwise registration are given in figure 20.


5.5 discussion

The method did not improve registration in the majority of cases and will need further development before it can be considered generally applicable. As in any field, expert knowledge is essential in the design of a medical image processing system, where different anatomical structures benefit from different algorithmic approaches. In some extreme cases a user may find use for 3D PGMs; in other (most) cases a simple affine transform is sufficient. In any case, results should be carefully examined. On well-posed data sets the 3D PGM approach offers no benefit in its current form.

The proposed method seems to suffer from the following problems.

Computational complexity. The computational complexity of the optimization problem can be approximated by O(N_gen N_ind N_I n K^3), where N_gen and N_ind are the number of generations and individuals used by the genetic algorithm respectively, N_I is the number of images, n is the number of vertices (parts) in the model and K is the number of edges at each vertex. Although the computational efficiency of the system can be improved by reducing n and K, this may also reduce the accuracy of the registration. Also, the large number of keypoints in images dictates reasonably large N_gen and N_ind to cover a sufficient portion of the search space. A faster system would require a keypoint detector yielding fewer keypoints, or a method of ranking the utility of keypoints and then using only the subset of the best ones.

The computational burden of computing spin images in the neighborhoods of thousands of keypoints in all images in the population is huge, even when rotational degrees of freedom have been removed. A significant amount of time (about 5 minutes) is used to compute spin images for each image before the genetic algorithm optimization can commence. Registering a group of images takes roughly two hours depending on the number of images in the population.
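As a back-of-the-envelope illustration of the dominant K^3 term, the complexity estimate can be evaluated for hypothetical settings; none of these counts are measured thesis values.

```python
def pgm_cost(n_gen, n_ind, n_images, n_parts, k_edges):
    """Approximate operation count O(N_gen * N_ind * N_I * n * K^3)."""
    return n_gen * n_ind * n_images * n_parts * k_edges ** 3

# Halving the number of edges K per vertex cuts the dominant term
# eightfold, at a possible cost in registration accuracy.
speedup = pgm_cost(100, 50, 4, 16, 8) / pgm_cost(100, 50, 4, 16, 4)
```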

Separate models in a pyramid. Information gained at one level of a pyramid is only relayed to the following level through the warped images. Much of the information, such as the geometrical constraints, is discarded and the model building process starts from scratch. It may be beneficial to initialize a model with the information gained at the coarser scale.
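The idea of carrying the model across pyramid levels, rather than rebuilding it from scratch, could be sketched as follows. This is a structural sketch only; build_model, refine_model and warp are hypothetical placeholders, not functions from the implementation.

```python
def coarse_to_fine(image_pyramid, build_model, refine_model, warp):
    """Carry a model across pyramid levels instead of rebuilding it.

    image_pyramid lists the images at each level, coarsest first.
    build_model, refine_model and warp are hypothetical placeholders.
    """
    model = None
    for level_images in image_pyramid:
        if model is None:
            # Coarsest level: build the initial model from scratch.
            model = build_model(level_images)
        else:
            # Finer levels: warp with the current model, then refine it
            # so geometric constraints are carried forward.
            warped = [warp(image, model) for image in level_images]
            model = refine_model(model, warped)
    return model

# Toy check with numbers standing in for images and models.
print(coarse_to_fine([[1, 2], [3, 4]], sum,
                     lambda m, ims: m + sum(ims),
                     lambda im, m: im))  # -> 10
```

The difference from the current scheme is the refine_model branch: only the warped images cross levels today, whereas here the model itself seeds the next level.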

Difficult choices of n and α. Good values for n and α can vary between data sets and mean the difference between success and hopeless failure. Exploring good values for n and α required repeating experiments on every data set, which is not desirable given the computational complexity. Zhang et al. developed a method for automatic estimation of n using a voting-based approach instead of a genetic algorithm. In this scheme, many models are built from random subsets of keypoints, and each keypoint is assigned a vote from each model that is proportional to the utility of the model (see algorithm 1). The assumption is that good keypoints will receive votes from good models and vice versa. This methodology outperformed the optimization-based approach employed in this work and, had time permitted, it would have been interesting to study whether the same holds in 3D.
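The voting scheme described above can be sketched in outline. This is a toy illustration only: build_model and model_utility are hypothetical placeholders, and the real method scores models by their registration utility rather than the toy score used here.

```python
import random
from collections import defaultdict

def vote_for_keypoints(keypoints, build_model, model_utility,
                       n_models=50, subset_size=4, seed=0):
    """Rank keypoints by votes accumulated from many random models.

    Each model is built from a random subset of keypoints; every member
    of the subset is credited with that model's utility, so keypoints
    that tend to appear in good models rise to the top.
    """
    rng = random.Random(seed)
    votes = defaultdict(float)
    for _ in range(n_models):
        subset = rng.sample(keypoints, subset_size)
        utility = model_utility(build_model(subset))
        for keypoint in subset:
            votes[keypoint] += utility
    return sorted(keypoints, key=lambda k: votes[k], reverse=True)
```

With a toy utility that rewards only models containing keypoint 0, that keypoint accumulates the most votes and is ranked first; in the real setting the best-voted subset of keypoints would then be used to build the final model.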

Limitation of a single model. The performance of the system depends only on the quality of the final model at each level of the pyramid. However, many factors influence the utility of a model: (i) keypoint quality, i.e. bad keypoints may lead to spurious matches, (ii) the cost function (5), i.e. the Gaussian distribution may not be sufficient to capture the large shape variations of the object, and (iii) the optimization-based approach, i.e. the genetic algorithm may end up in a local minimum. As a result, it is very unlikely for a single model to deal well with the whole set of images. A scheme for multi-model initialization developed by Zhang et al. could be applied to 3D PGMs, but this was beyond the scope of this work.

Semi-groupwise approach. The method is not a true groupwise approach, since images are registered to a reference image as opposed to, for example, the groupwise non-rigid registration algorithm. Had time permitted, a similar cost measure could have been developed.

Few registered images. As previously discussed, the amount of RAM available to the groupwise registration procedure only allowed four images from each data set to be registered. Ideally, many more images would have been used in the registration procedure.

5.6 conclusion

We have described an automatic system for initialization of groupwise non-rigid registration methods using Parts+Geometry models in 3D. The model assumes no domain knowledge of the object of interest and is fully automatic. We did not seek perfect correspondences - just enough good ones to sufficiently align images for subsequent volumetric registration. However, experiments show that 3D PGMs do not outperform standard affine initialization and should not be applied in their current form. The 3D model building process includes extra degrees of freedom and a higher-dimensional solution space which the design of the 3D extension was not able to accommodate. The potential benefit of 3D PGMs cannot be completely dismissed, however. With more development and application-specific tuning of parameters, 3D PGMs may be a viable option for some data sets.

B I B L I O G R A P H Y

[1] S.A. Adeshina and T.F. Cootes. Constructing part-based models for groupwise registration. In Biomedical Imaging: From Nano to Macro, 2010 IEEE International Symposium on, pages 1073–1076, April 2010.

[2] V. Arsigny, O. Commowick, X. Pennec, and N. Ayache. A log-Euclidean polyaffine framework for locally rigid or affine registration. In Biomedical Image Registration, volume 4057 of Lecture Notes in Computer Science, pages 120–127. Springer Berlin Heidelberg, 2006.

[3] V. Arsigny, X. Pennec, and N. Ayache. Polyrigid and polyaffine transformations: a novel geometrical tool to deal with non-rigid deformations - application to the registration of histological slices. Med Image Anal, 9(6):507–23, 2005.

[4] K.S. Arun and K.S. Sarath. An automatic feature based registration algorithm for medical images. In Advances in Recent Technologies in Communication and Computing (ARTCom), 2010 International Conference on, pages 174–177, October 2010.

[5] Brian Avants and James C. Gee. Geodesic estimation for large deformation anatomical shape averaging and interpolation. NeuroImage, 23, Supplement 1(0):S139–S150, 2004.

[6] Kola Babalola and Tim Cootes. Using parts and geometry models to initialise active appearance models for automated segmentation of 3D medical images. In Proceedings of the 2010 IEEE International Conference on Biomedical Imaging: From Nano to Macro, pages 1069–1072, Piscataway, NJ, USA, 2010. IEEE Press.

[7] Kolawole Babalola, Andrew Gait, and Timothy F. Cootes. A parts-and-geometry initialiser for 3D non-rigid registration using features derived from spin images. Neurocomputing, 2013.

[8] Ruzena Bajcsy and Stane Kovacic. Multiresolution elastic matching. Comput. Vision Graph. Image Process., 46(1):1–21, April 1989.

[9] S. Baker, I. Matthews, and J. Schneider. Automatic construction of active appearance models as an image coding problem. IEEE Trans. Pattern. Anal. Mach. Intell., 26(10):1380–1384, October 2004.

[10] S. Balci, P. Golland, and W. Wells. Non-rigid groupwise registration using B-spline deformation model. July 2007.


[11] Miguel Ángel González Ballester. HEAR-EU project description. 2012.

[12] Dirk Bauer. Selective visualisation of unsteady 3D flow using scale-space and feature-based techniques. PhD thesis, Swiss Federal Institute of Technology, 2006.

[13] M. Faisal Beg, Michael I. Miller, Alain Trouvé, and Laurent Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Int. J. Comput. Vision, 61(2):139–157, February 2005.

[14] K.K. Bhatia, J.V. Hajnal, B.K. Puri, A.D. Edwards, and D. Rueckert. Consistent groupwise non-rigid registration for atlas construction. In Biomedical Imaging: Nano to Macro, 2004. IEEE International Symposium on, pages 908–911 Vol. 1, April 2004.

[15] F.L. Bookstein. Principal warps: thin-plate splines and the decomposition of deformations. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 11(6):567–585, June 1989.

[16] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 23(11):1222–1239, 2001.

[17] M.C. Burl and P. Perona. Recognition of planar object classes. In Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., pages 223–230, 1996.

[18] T. F. Cootes, C. J. Twining, V. Petrovic, R. Schestowitz, and C. J. Taylor. Groupwise construction of appearance models using piece-wise affine deformations. In Proceedings of the 16th British Machine Vision Conference, pages 879–888, 2005.

[19] T. F. Cootes, C. J. Twining, V. Petrovic, K. O. Babalola, and C. J. Taylor. Computing accurate correspondences across groups of images. IEEE Trans. Pattern. Anal. Mach. Intell., 32(11):1994–2005, November 2010.

[20] D. J. Crandall and D. P. Huttenlocher. Weakly supervised learning of part-based spatial models for visual object recognition. Proc. Eur. Conf. Computer Vis., 1:16–29, 2006.

[21] E. D'Agostino, F. Maes, D. Vandermeulen, and P. Suetens. A viscous fluid model for multimodal non-rigid image registration using mutual information. In Takeyoshi Dohi and Ron Kikinis, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2002, volume 2489 of Lecture Notes in Computer Science, pages 541–548. Springer Berlin Heidelberg, 2002.


[22] R.H. Davies, C.J. Twining, T.F. Cootes, J.C. Waterton, and C.J. Taylor. A minimum description length approach to statistical shape modeling. Medical Imaging, IEEE Transactions on, 21(5):525–537, 2002.

[23] M.H. Davis, A. Khotanzad, D.P. Flamig, and S.E. Harms. A physics-based coordinate transformation for 3-D image matching. Medical Imaging, IEEE Transactions on, 16(3):317–328, June 1997.

[24] R. Donner, B. Micusik, G. Langs, and H. Bischof. Sparse MRF appearance models for fast anatomical structure localisation. In Proceedings of the British Machine Vision Conference, pages 109.1–109.10. BMVA Press, 2007. doi:10.5244/C.21.109.

[25] René Donner, Erich Birngruber, Helmut Steiner, Horst Bischof, and Georg Langs. Localization of 3D anatomical structures using random forests and discrete optimization. In Bjoern Menze, Georg Langs, Zhuowen Tu, and Antonio Criminisi, editors, Medical Computer Vision. Recognition Techniques and Applications in Medical Imaging, volume 6533 of Lecture Notes in Computer Science, pages 86–95. Springer Berlin Heidelberg, 2011.

[26] René Donner, Georg Langs, Branislav Micušik, and Horst Bischof. Generalized sparse MRF appearance models. Image and Vision Computing, 28(6):1031–1038, 2010.

[27] Aloys du Bois d'Aische, Mathieu De Craene, Xavier Geets, Vincent Gregoire, Benoit Macq, and Simon K. Warfield. Efficient multi-modal dense field non-rigid registration: alignment of histological and section images. Medical Image Analysis, 9(6):538–546, 2005.

[28] S. Duchesne, J. Pruessner, and D.L. Collins. Appearance-based segmentation of medial temporal lobe structures. Neuroimage, 17(2):515–31, 2002.

[29] Alan C. Evans, Weiqian Dai, D. Louis Collins, Peter Neelin, and Sean Marrett. Warping of a computerized 3-D atlas to match brain image volumes for quantitative neuroanatomical and functional analysis. pages 236–246, 1991.

[30] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Pictorial structures for object recognition. Int. J. Comput. Vision, 61(1):55–79, January 2005.

[31] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2003.


[32] R. Fergus, P. Perona, and A. Zisserman. A sparse object category model for efficient learning and exhaustive recognition. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pages 380–387, 2005.

[33] Sanja Fidler, Marko Boben, and Ales Leonardis. Optimization framework for learning a hierarchical shape vocabulary for object class detection. In BMVC'09, 2009.

[34] B. Fischer and J. Modersitzki. Combination of automatic non-rigid and landmark based registration: the best of both worlds. In M. Sonka and J. M. Fitzpatrick, editors, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, volume 5032, pages 1037–1048, May 2003.

[35] Bernd Fischer and Jan Modersitzki. A unified approach to fast image registration and a new curvature based registration technique. Linear Algebra and its Applications, pages 107–124, 2004.

[36] M.A. Fischler and R.A. Elschlager. The representation and matching of pictorial structures. Computers, IEEE Transactions on, C-22(1):67–92, January 1973.

[37] Luc M.J. Florack, Bart M. ter Haar Romeny, Jan J. Koenderink, and Max A. Viergever. Scale and the differential structure of images. Image and Vision Computing, 10(6):376–388, 1992.

[38] A. F. Frangi, D. Rueckert, J. A. Schnabel, and W. J. Niessen. Automatic construction of multiple-object three dimensional statistical shape models. Proc. Int. Conf. Inf. Process. Med. Imag., pages 78–91, 2001.

[39] A. F. Frangi, D. Rueckert, J. A. Schnabel, and W. J. Niessen. Automatic construction of multiple-object three dimensional statistical shape models. IEEE Trans. Med. Imag., 21(9):1151–1166, September 2002.

[40] J. C. Gee. On matching brain volumes. Pattern Recognition, 32:99–111, 1998.

[41] Eldad Haber and Jan Modersitzki. Numerical methods for image registration. 2004.

[42] C. Harris and M. Stephens. A combined corner and edge detector. Proc. Alvey Vis. Conf., pages 147–151, 1988.

[43] Thomas Hartkens, Derek L. G. Hill, Andy D. Castellano-Smith, David J. Hawkes, Calvin R. Maurer, Jr., A. J. Martin, Walter A. Hall, H. Liu, and Charles L. Truwit. Using points and surfaces to improve voxel-based non-rigid registration. In Proceedings of the 5th International Conference on Medical Image Computing and Computer-Assisted Intervention-Part II, MICCAI '02, pages 565–572, London, UK, 2002. Springer-Verlag.

[44] Tobias Heimann and Hans-Peter Meinzer. Statistical shape models for 3D medical image segmentation: A review. Medical Image Analysis, 13:543–563, 2009.

[45] P. Hellier and C. Barillot. Coupling dense and landmark-based approaches for nonrigid registration. Medical Imaging, IEEE Transactions on, 22(2):217–227, February 2003.

[46] Derek L. G. Hill, Philipp G. Batchelor, Mark Holden, and David J. Hawkes. Medical image registration. Physics in Medicine and Biology, 46(3), 2001.

[47] Luis Ibanez, William Schroeder, Lydia Ng, and Josh Cates. The ITK Software Guide. 2003.

[48] Mark Jenkinson and Stephen Smith. A global optimisation method for robust affine registration of brain images. Medical Image Analysis, 5(2):143–156, 2001.

[49] Hans J. Johnson and Gary E. Christensen. Consistent landmark and intensity-based image registration. IEEE Transactions on Medical Imaging, 21:450–461, 2002.

[50] Michael J. Jones and Tomaso Poggio. Multidimensional morphable models: A framework for representing and matching object classes. International Journal of Computer Vision, 29:107–131, 1998.

[51] S. Joshi, Brad Davis, Matthieu Jomier, and Guido Gerig. Unbiased diffeomorphic atlas construction for computational anatomy. Neuroimage, 23:151–160, 2004.

[52] T. Kadir and M. Brady. Saliency, scale and image description. Int. J. Comput. Vis., 45(2):83–105, 2004.

[53] J. Karlsson and K. Astrom. MDL patch correspondences on unlabeled images with occlusions. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference on, pages 1–8, June 2008.

[54] Fahmi Khalifa, Garth M. Beache, Georgy Gimel'farb, Jasjit S. Suri, and Ayman S. El-Baz. State-of-the-art medical image registration methodologies: A survey. In Ayman S. El-Baz, Rajendra Acharya U, Majid Mirmehdi, and Jasjit S. Suri, editors, Multi Modality State-of-the-Art Medical Image Segmentation and Registration Methodologies, pages 235–280. Springer US, 2011.


[55] S. Klein, M. Staring, K. Murphy, M.A. Viergever, and J. Pluim. elastix: A toolbox for intensity-based medical image registration. Medical Imaging, IEEE Transactions on, 29(1):196–205, January 2010.

[56] S. Klein, M. Staring, and J. P.W. Pluim. Evaluation of optimization methods for nonrigid medical image registration using mutual information and B-splines. Trans. Img. Proc., 16(12):2879–2890, December 2007.

[57] J. J. Koenderink. The structure of images. Biol. Cyb., 50:362–370, 1984.

[58] J. Kybic and M. Unser. Fast parametric elastic image registration. Image Processing, IEEE Transactions on, 12(11):1427–1442, November 2003.

[59] Georg Langs, René Donner, Philipp Peloschek, and Horst Bischof. Robust autonomous model learning from 2D and 3D data sets. In Proceedings of the 10th International Conference on Medical Image Computing and Computer-Assisted Intervention - Volume Part I, MICCAI'07, pages 968–976, Berlin, Heidelberg, 2007. Springer-Verlag.

[60] Georg Langs, Philipp Peloschek, René Donner, and Horst Bischof. Annotation propagation by MDL based correspondences. Proceedings of Computer Vision Winter Workshop, pages 11–16, 2006.

[61] Hava Lester and Simon R. Arridge. A survey of hierarchical non-linear medical image registration. Pattern Recognition, 32(1):129–149, 1999.

[62] B. Likar and F. Pernuš. A hierarchical approach to elastic registration based on mutual information. Image and Vision Computing, 19(1–2):33–44, 2001.

[63] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic Publishers, 1994.

[64] Tony Lindeberg. Scale-space theory in computer vision. Springer, 1993.

[65] Tony Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30:79–116, 1998.

[66] Tony Lindeberg. Scale-Space. John Wiley & Sons, Inc., 2007.

[67] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004.


[68] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens. Multimodality image registration by maximization of mutual information. Medical Imaging, IEEE Transactions on, 16(2):187–198, April 1997.

[69] J. B. Antoine Maintz and Max A. Viergever. An overview of medical image registration methods. Technical report, In Symposium of the Belgian Hospital Physicists Association (SBPH-BVZF), 1996.

[70] K. V. Mardia, J. T. Kent, C. R. Goodall, and J. A. Little. Kriging and splines with derivative information. Biometrika, 85(2):505, 1998.

[71] D. Mattes, D.R. Haynor, H. Vesselle, T.K. Lewellen, and W. Eubank. PET-CT image registration in the chest using free-form deformations. Medical Imaging, IEEE Transactions on, 22(1):120–128, January 2003.

[72] C.T. Metz, S. Klein, M. Schaap, T. van Walsum, and W.J. Niessen. Nonrigid registration of dynamic medical imaging data using nD+t B-splines and a groupwise optimization approach. Medical Image Analysis, 15(2):238–249, 2011.

[73] E.G. Miller, N.E. Matsakis, and P.A. Viola. Learning from one example through shared densities on transforms. In Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, volume 1, pages 464–471, 2000.

[74] Michael Miller, Alain Trouve, and Laurent Younes. On the metrics and Euler-Lagrange equations of computational anatomy. Annual Review of Biomedical Engineering, 4(1):375–405, 2002.

[75] Xavier Pennec, Nicholas Ayache, and Jean-Philippe Thirion. Handbook of medical imaging, chapter Landmark-based registration using features identified through differential geometry, pages 499–513. Academic Press, Inc., Orlando, FL, USA, 2000.

[76] G.P. Penney, J. Weese, J.A. Little, P. Desmedt, D.L.G. Hill, and D.J. Hawkes. A comparison of similarity measures for use in 2-D-3-D medical image registration. Medical Imaging, IEEE Transactions on, 17(4):586–595, August 1998.

[77] Alain Pitiot, Gregoire Malandain, Eric Bardinet, and Paul M. Thompson. Piecewise affine registration of biological images. pages 91–101, 2003.

[78] J.P.W. Pluim, J.B.A. Maintz, and M.A. Viergever. Mutual-information-based registration of medical images: a survey. Medical Imaging, IEEE Transactions on, 22(8):986–1004, August 2003.


[79] J. Ponce, M. Hebert, C. Schmid, and A. Zisserman. Toward category-level object recognition. Springer-Verlag, 2006.

[80] Birgit Falk Riecke, R. Christensen, M. Boesen, P. Christensen, L. S. Lohmander, Arne Astrup, and Henning Bliddal. Bad knees are no excuse for failure to lose weight. Obesity Facts, (suppl. 2):220, 2009.

[81] G.K. Rohde, A. Aldroubi, and B.M. Dawant. The adaptive bases algorithm for intensity-based nonrigid image registration. Medical Imaging, IEEE Transactions on, 22(11):1470–1479, November 2003.

[82] T. Rohlfing, C.R. Maurer, Jr., D.A. Bluemke, and M.A. Jacobs. Volume-preserving nonrigid registration of MR breast images using free-form deformation with an incompressibility constraint. Medical Imaging, IEEE Transactions on, 22(6):730–741, June 2003.

[83] Torsten Rohlfing, Robert Brandt, Randolf Menzel, and Calvin R. Maurer, Jr. Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains. NeuroImage, 21(4):1428–1442, 2004.

[84] K. Rohr, H.S. Stiehl, R. Sprengel, T.M. Buzug, J. Weese, and M.H. Kuhn. Landmark-based elastic registration using approximating thin-plate splines. Medical Imaging, IEEE Transactions on, 20(6):526–534, June 2001.

[85] D. Rueckert, A.F. Frangi, and J.A. Schnabel. Automatic construction of 3-D statistical deformation models of the brain using nonrigid registration. Medical Imaging, IEEE Transactions on, 22(8):1014–1025, August 2003.

[86] D. Rueckert, L.I. Sonoda, C. Hayes, D.L.G. Hill, M.O. Leach, and D.J. Hawkes. Nonrigid registration using free-form deformations: application to breast MR images. Medical Imaging, IEEE Transactions on, 18(8):712–721, August 1999.

[87] Stefan Schmidt, Jörg Kappes, Martin Bergtholdt, Vladimir Pekar, Sebastian Dries, Daniel Bystrov, and Christoph Schnörr. Spine detection and labeling using a parts-based graphical model. In Nico Karssemeijer and Boudewijn Lelieveldt, editors, Information Processing in Medical Imaging, volume 4584 of Lecture Notes in Computer Science, pages 122–133. Springer Berlin Heidelberg, 2007.

[88] Julia A. Schnabel, Daniel Rueckert, Marcel Quist, Jane M. Blackall, Andy D. Castellano-Smith, Thomas Hartkens, Graeme P. Penney, Walter A. Hall, Haiying Liu, Charles L. Truwit, Frans A. Gerritsen, Derek L.G. Hill, and David J. Hawkes. A generic framework for non-rigid registration based on non-uniform multi-level free-form deformations. In Wiro J. Niessen and Max A. Viergever, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2001, volume 2208 of Lecture Notes in Computer Science, pages 573–581. Springer Berlin Heidelberg, 2001.

[89] K. Sidorov, D. Marshall, and S. Richmond. An efficient stochastic approach to groupwise non-rigid image registration. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pages 2208–2213, 2009.

[90] Marius Staring, Stefan Klein, and Josien P. W. Pluim. A rigidity penalty term for nonrigid registration. Medical Physics, 34(11):4098–4108, 2007.

[91] C. Studholme, D.L.G. Hill, and D.J. Hawkes. Automated 3-D registration of MR and CT images of the head. Medical Image Analysis, 1(2):163–175, 1996.

[92] C. Studholme, D.L.G. Hill, and D.J. Hawkes. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition, 32(1):71–86, 1999.

[93] P. Thevenaz, U.E. Ruttimann, and M. Unser. A pyramid approach to subpixel registration based on intensity. Image Processing, IEEE Transactions on, 7(1):27–41, January 1998.

[94] P. Thevenaz and M. Unser. Optimization of mutual information for multiresolution image registration. Image Processing, IEEE Transactions on, 9(12):2083–2099, December 2000.

[95] Tom Vercauteren, Xavier Pennec, Aymeric Perchant, and Nicholas Ayache. Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage, 45(1, Supplement 1):S61–S72, 2009.

[96] P. Viola and W.M. Wells III. Alignment by maximization of mutual information. In Computer Vision, 1995. Proceedings., Fifth International Conference on, pages 16–23, June 1995.

[97] A.J. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. Information Theory, IEEE Transactions on, 13(2):260–269, 1967.

[98] M. Weber, M. Welling, and P. Perona. Unsupervised learning of models for recognition. Proc. Eur. Conf. Computer Vis., pages 18–32, 2000.

[99] Jay West, J. Michael Fitzpatrick, Matthew Y. Wang, Benoit M. Dawant, Calvin R. Maurer, Jr., Robert M. Kessler, Robert J. Maciunas, Christian Barillot, Didier Lemoine, Andre Collignon, Frederik Maes, Paul Suetens, Dirk Vandermeulen, Petra A. van den Elsen, Sandy Napel, Thilaka S. Sumanaweera, Beth Harkness, Paul F. Hemler, Derek L. G. Hill, David J. Hawkes, Colin Studholme, J. B. Antoine Maintz, Max A. Viergever, Gregoire Malandain, Xavier Pennec, Marilyn E. Noz, Gerald Q. Maguire, Jr., Michael Pollack, Charles A. Pelizzari, Richard A. Robb, Dennis Hanson, and Roger P. Woods. Comparison and evaluation of retrospective intermodality brain image registration techniques. Journal of Computer Assisted Tomography, 21:554–566, 1997.

[100] Guorong Wu, Hongjun Jia, Qian Wang, and Dinggang Shen. Groupwise registration with sharp mean. In Proceedings of the 13th International Conference on Medical Image Computing and Computer-Assisted Intervention: Part II, MICCAI'10, pages 570–577, Berlin, Heidelberg, 2010. Springer-Verlag.

[101] Guorong Wu, Hongjun Jia, Qian Wang, and Dinggang Shen. SharpMean: Groupwise registration guided by sharp mean image and tree-based registration. NeuroImage, 56(4):1968–1981, 2011.

[102] A. L. Yuille and T. A. Poggio. Scaling theorems for zero-crossings. IEEE Trans. Pattern Analysis and Machine Intell., 8:15–25, 1986.

[103] Pei Zhang, Steve A. Adeshina, and Timothy F. Cootes. Automatic learning sparse correspondences for initialising groupwise registration. In Tianzi Jiang, Nassir Navab, Josien P.W. Pluim, and Max A. Viergever, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2010, volume 6362 of Lecture Notes in Computer Science, pages 635–642. Springer Berlin Heidelberg, 2010.

[104] Pei Zhang and T. F. Cootes. Automatic construction of parts+geometry models for initializing groupwise registration. IEEE Trans. Med. Imag., 31(2), February 2012.

[105] Pei Zhang and Timothy F. Cootes. Automatic part selection for groupwise registration. In Gábor Székely and Horst K. Hahn, editors, Information Processing in Medical Imaging, volume 6801 of Lecture Notes in Computer Science, pages 636–647. Springer Berlin Heidelberg, 2011.

[106] Pei Zhang, Pew-Thian Yap, Dinggang Shen, and Timothy F. Cootes. Initialising groupwise non-rigid registration using multiple parts+geometry models. In Nicholas Ayache, Hervé Delingette, Polina Golland, and Kensaku Mori, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2012, volume 7512 of Lecture Notes in Computer Science, pages 156–163. Springer Berlin Heidelberg, 2012.

[107] Long (Leo) Zhu, Chenxi Lin, Haoda Huang, Yuanhao Chen, and Alan Yuille. Unsupervised structure learning: Hierarchical recursive composition, suspicious coincidence and competitive exclusion. In Proceedings of the 10th European Conference on Computer Vision: Part II, pages 759–773, Berlin, Heidelberg, 2008. Springer-Verlag.

[108] Lilla Zöllei, Erik Learned-Miller, Eric Grimson, and William Wells. Efficient population registration of 3D data. In Yanxi Liu, Tianzi Jiang, and Changshui Zhang, editors, Computer Vision for Biomedical Image Applications, volume 3765 of Lecture Notes in Computer Science, pages 291–301. Springer Berlin Heidelberg, 2005.

colophon

This document was typeset using the typographical look-and-feel classicthesis developed by André Miede. The style was inspired by Robert Bringhurst's seminal book on typography "The Elements of Typographic Style". classicthesis is available for both LaTeX and LyX:

http://code.google.com/p/classicthesis/

Final Version as of July 26, 2013 (Masters thesis 1st edition).