
Variational Inference for Image Segmentation

Claudia Blaiotta¹, M. Jorge Cardoso², and John Ashburner¹

¹ Wellcome Trust Centre for Neuroimaging, University College London, London, UK
² Centre for Medical Image Computing, University College London, London, UK

Abstract. Variational inference techniques are powerful methods for learning probabilistic models and provide significant advantages with respect to maximum likelihood (ML) or maximum a posteriori (MAP) approaches. Nevertheless they have not yet been fully exploited for image processing applications. In this paper we present a variational Bayes (VB) approach for image segmentation. We aim to show that VB provides a framework for generalizing existing segmentation algorithms that rely on an expectation-maximization formulation, while increasing their robustness and computational stability. We also show how optimal model complexity can be automatically determined in a variational setting, as opposed to ML frameworks, which are intrinsically prone to overfitting.

Keywords: Variational Bayes, Gaussian Mixture Model, Image Segmentation

1 Introduction

Many of the most widely used image segmentation algorithms rely on probabilistic modelling techniques to fit the intensity distributions of images. Nevertheless, the majority of currently available tools are based on maximum likelihood (ML) or maximum a posteriori (MAP) estimation of the model parameters, without recourse to a fully Bayesian formulation. In fact, the latter approach has been poorly explored in the field of medical imaging, in spite of the promising potential shown, for example, in [1–3].

A fully Bayesian treatment of probabilistic models involves the introduction of prior distributions over all unobserved variables Z (model parameters and latent variables) and the evaluation of the posterior distribution p(Z|X), as well as of the model evidence p(X). Very often, even for relatively simple models, computing the full posterior distributions over the hidden variables turns out to be infeasible. In such cases, variational techniques represent a computationally effective way of evaluating approximate solutions to the inference problem by postulating a specific form, or factorization, of the posterior distributions. As a result, the estimated posteriors will never be exact. Nevertheless, variational methods have proved to be more convenient than standard ML or MAP techniques since, at a substantially similar computational cost, they avoid the overfitting problems that are intrinsic to those methods. In other words, variational techniques open up the possibility of learning the optimal model structure (the one with the highest generalization capability) without performing ad hoc cross-validation analyses [4–6]. Another interesting aspect of working within a VB framework is that it leads to a more general formulation of the EM algorithm, with the same convergence properties and higher computational stability. In fact, one important limitation of the ML formulation is the presence of singular points of the likelihood function, which have to be avoided during the optimization process to ensure numerical stability.

A convenient way of restricting the space of the approximating posterior distribution q(Z) consists in assuming that it factorizes into a product of terms, each one involving just a subset of Z (mean field theory): q(Z) = ∏_{s=1}^{S} q_s(Z_s).

The optimal solution for the different factors {q_s}, s = 1, . . . , S, can be obtained by maximizing a functional that defines a lower bound on the log marginal probability log p(X). Such a lower bound can be derived from the following decomposition, which holds for any q(Z)

\[
\log p(X) = \int q(Z)\, \log\left\{ \frac{p(X,Z)}{q(Z)} \right\} \mathrm{d}Z + \int q(Z)\, \log\left\{ \frac{q(Z)}{p(Z|X)} \right\} \mathrm{d}Z \,. \tag{1}
\]

The first integral in (1) defines a lower bound L(q) on the logarithm of the model evidence, while the second integral is the Kullback-Leibler divergence D_KL(q‖p) between the variational approximating posterior and the true posterior distribution. It can be proved that the optimal form of each factor q_s(Z_s) is

\[
q_s(Z_s) \propto \exp\left( \mathbb{E}_{s' \neq s}[\log p(X,Z)] \right) . \tag{2}
\]

Equation (2) does not provide an analytical solution, since the different factors have optimal forms that depend on one another. As a result, the natural approach for solving this variational optimization problem consists in iteratively updating each factor given the most recent forms of the other ones. This leads to a scheme that turns out to be very similar to the structure of the EM algorithm [6].
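To make the alternating structure concrete, the following minimal sketch runs this coordinate-ascent scheme on a deliberately simple toy model — a univariate Gaussian with unknown mean µ and precision τ under conjugate Gaussian-Gamma priors, as in [6], ch. 10 — rather than on the segmentation model developed below; all variable names are illustrative.

```python
# Coordinate-ascent variational inference on a toy model: x_j ~ N(mu, 1/tau)
# with conjugate priors p(mu|tau) = N(mu0, 1/(lam0*tau)), p(tau) = Gamma(a0, b0).
# Each factor update follows equation (2), holding the other factor fixed.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)      # observed 1D data
N, xbar = x.size, x.mean()

mu0, lam0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3          # weak prior hyperparameters
E_tau = 1.0                                       # initial guess for E[tau]

for _ in range(100):
    # Update q(mu) = N(mu_N, 1/lam_N), holding q(tau) fixed
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Update q(tau) = Gamma(a_N, b_N), holding q(mu) fixed
    a_N = a0 + 0.5 * (N + 1)
    E_sq = np.sum((x - mu_N) ** 2) + N / lam_N    # E_q(mu)[sum_j (x_j - mu)^2]
    b_N = b0 + 0.5 * (E_sq + lam0 * ((mu_N - mu0) ** 2 + 1.0 / lam_N))
    if abs(a_N / b_N - E_tau) < 1e-10:            # simple convergence check
        break
    E_tau = a_N / b_N

print(f"E[mu] = {mu_N:.3f}, posterior mean sd ~ {(b_N / a_N) ** 0.5:.3f}")
```

Each pass touches every factor once; as in EM, the bound is guaranteed not to decrease between passes.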

For some complex models, a fully Bayesian treatment of all unobserved variables might still be extremely impractical, if not impossible, even when variational techniques are used. In such cases, it is still possible to obtain MAP point estimates for the parameters that are not treated in a fully Bayesian manner. Such values are obtained in a way that generalizes the M-step of the EM algorithm. In particular, the function that needs to be optimized is the expectation of the logarithm of the joint probability of X and Z, E[log p(X,Z)].

In this paper we show that VB techniques can be effectively applied to solve the problem of image segmentation, providing extra robustness, stability and flexibility over existing methods, without significantly affecting the computational cost. Moreover, we demonstrate that a fully Bayesian formulation of image segmentation can be exploited to automatically determine the optimal model complexity. The framework described in the following sections is built on the segmentation method implemented in the SPM12¹ software (Wellcome Trust Centre for Neuroimaging, UCL, London, UK), and our aim is to extend this widely used method by incorporating priors on tissue intensities, so as to increase its robustness. In fact, our future work will focus on learning, from large data sets, informative intensity priors that could be used in combination with the proposed method.

¹ http://www.fil.ion.ucl.ac.uk/spm

2 Data Model

Let X denote the observed data, that is to say the intensities corresponding to D images of the same subject acquired with different modalities. The signal at voxel j can then be represented by a D-dimensional vector xj ∈ R^D, with j ∈ {1, . . . , N}.

We model the distribution of xj as a multivariate Gaussian mixture consisting of K clusters parametrized by mean vectors {µk}, k = 1, . . . , K, and covariance matrices {Σk}, k = 1, . . . , K. The mixing proportions of the different components are given by Θπ = {πjk}, with πjk ∈ [0, 1] and ∑_k πjk = 1. Essentially, πjk indicates the prior probability of the signal at spatial location j being drawn from cluster k. Moreover, we assume that the K Gaussians can be partitioned into T subsets, corresponding to different tissue types. We denote these subsets by {Ct}, t = 1, . . . , T, with ⋃_{t=1}^{T} Ct = {1, . . . , K}. This means that each tissue t ∈ {1, . . . , T} is itself represented by a GMM consisting of Kt components, with ∑_t Kt = K.

We consider the prior probability of voxel j belonging to tissue t to be given a priori (through a probabilistic atlas) and we indicate this probability as τjt. Furthermore, we allow these tissue priors to be rescaled by a set of weights {wt}, t = 1, . . . , T, to accommodate individual differences in tissue composition. Finally, we introduce a set of parameters {gk}, k = 1, . . . , K, denoting the normalized weights of the different Gaussians associated with one tissue type, so that

\[
\forall t \in \{1,\dots,T\} : \sum_{k \in C_t} g_k = 1 \,. \tag{3}
\]

As a result we can express the mixing proportions Θπ of our GMM as

\[
\pi_{jk} = g_k \cdot \frac{\tau_{jt}\, w_t}{\sum_{t'=1}^{T} \tau_{jt'}\, w_{t'}} \,, \tag{4}
\]

where the {τjt} are known parameters, while {wt}, t = 1, . . . , T, and {gk}, k = 1, . . . , K, have to be estimated from the observed data X.
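As an illustration, here is a minimal sketch of the construction in (4); the array shapes and the tissue_of index map are assumptions made for this example.

```python
# Sketch of equation (4): pi[j, k] = g[k] * tau[j, t(k)] * w[t(k)]
#                                    / sum_t' tau[j, t'] * w[t']
import numpy as np

def mixing_proportions(tau, w, g, tissue_of):
    """tau: (N, T) atlas priors; w: (T,) tissue weights; g: (K,) within-tissue
    weights; tissue_of: (K,) index of the tissue t(k) owning each Gaussian k."""
    weighted = tau * w                           # (N, T): tau_jt * w_t
    norm = weighted.sum(axis=1, keepdims=True)   # sum over tissues t'
    return g * weighted[:, tissue_of] / norm     # select tissue t(k) per Gaussian

# Toy example: N = 4 voxels, T = 2 tissues, K = 3 Gaussians (two for tissue 0)
rng = np.random.default_rng(1)
tau = rng.dirichlet(np.ones(2), size=4)          # atlas priors, rows sum to one
pi = mixing_proportions(tau, w=np.array([1.0, 1.2]),
                        g=np.array([0.7, 0.3, 1.0]),
                        tissue_of=np.array([0, 0, 1]))
assert np.allclose(pi.sum(axis=1), 1.0)          # proportions sum to one over k
```

Because the gk sum to one within each tissue, the πjk automatically sum to one over k at every voxel, as the assertion checks.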

To correct for intensity non-uniformity artifacts, we introduce a multiplicative D-dimensional bias field denoted by {bj(Θβ)}, j = 1, . . . , N, where Θβ is a vector of parameters. Each of the D components of the bias is modelled as the exponential of a linear combination of discrete cosine transform basis functions.
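A one-dimensional sketch of this bias model follows (the actual implementation uses three-dimensional DCT bases; the coefficient layout below is an assumption for illustration).

```python
# Sketch: bias as the exponential of a low-order DCT expansion. A 1D inverse
# DCT with only the first few coefficients non-zero stands in for the smooth
# 3D basis expansion of the paper.
import numpy as np
from scipy.fft import idct

def bias_field_1d(theta_beta, n_voxels):
    """theta_beta: coefficients of the lowest-order DCT basis functions."""
    coeff = np.zeros(n_voxels)
    coeff[:theta_beta.size] = theta_beta         # keep only low-frequency terms
    return np.exp(idct(coeff, norm="ortho"))     # exponential ensures positivity

b = bias_field_1d(np.array([0.05, -0.02, 0.01]), n_voxels=128)
corrected = b * np.linspace(100.0, 200.0, 128)   # plays the role of B_j x_j
```

The exponential parametrization keeps the field strictly positive and makes the log-bias linear in Θβ, which simplifies the optimization of Section 3.4.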

Finally, to account for the variability of anatomical shapes among subjects, we allow the probabilistic atlas given by {τt}, t = 1, . . . , T, to be deformed according to a displacement field parametrized by the set of vectors Θα = {αj}, j = 1, . . . , N. The warped tissue priors can therefore be expressed as {τt(ϕ(Θα))}, t = 1, . . . , T, where ϕ(Θα) is a coordinate mapping from the individual image space into the atlas space. The parametrization adopted here consists in adding a small displacement field {αj}, j = 1, . . . , N, to the identity transform, so that ϕ(yj) = yj + αj, where the vector yj encodes the coordinates of the centre of voxel j.

If we introduce a set of binary latent variables Z denoting the class memberships of the observed data X, the probability of Z given the mixing proportions Θπ and the deformation parameters Θα is equal to

\[
p(Z|\Theta_\pi, \Theta_\alpha) = \prod_{j=1}^{N} \prod_{k=1}^{K} \left( \pi_{jk}(\varphi(\Theta_\alpha)) \right)^{z_{jk}} , \tag{5}
\]

where we have assumed that all data are independent. The conditional distribution of the observed intensities given the latent variables, the Gaussian means Θµ and covariances ΘΣ, and the bias field parameters Θβ can be expressed as follows, with Bj = diag(bj) modelling the bias field

\[
p(X|Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta) = \prod_{j=1}^{N} \prod_{k=1}^{K} \left( \det(B_j)\, \mathcal{N}(B_j x_j | \mu_k, \Sigma_k) \right)^{z_{jk}} . \tag{6}
\]

The joint probability of all the random variables conditioned on the mixing proportions is given, for our model, by

\[
p(X, Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta, \Theta_\alpha | \Theta_\pi) = p(X|Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta)\, p(Z|\Theta_\pi, \Theta_\alpha)\, p(\Theta_\mu, \Theta_\Sigma)\, p(\Theta_\alpha)\, p(\Theta_\beta) \,. \tag{7}
\]

The voxel-specific mixing proportions Θπ are treated here as deterministic parameters depending on the available anatomical atlas, on the tissue weights {wt} and on the within-tissue mixing proportions {gk}; therefore they are determined via ML estimation. The priors on the means and covariances of the different classes are modelled as Gaussian-Wishart distributions

\[
p(\Theta_\mu, \Theta_\Sigma) = \prod_{k=1}^{K} \mathcal{N}(\mu_k | m_{0k}, b_{0k}^{-1} \Sigma_k)\, \mathcal{W}(\Sigma_k^{-1} | W_{0k}, \nu_{0k}) \,. \tag{8}
\]

The terms p(Θα) and p(Θβ) represent prior probability distributions over the deformation and bias field parameters. Their function is to regularize the solution obtained through model fitting by penalizing improbable parameter values. In doing so, they ensure greater physical plausibility of the resulting non-uniformity and deformation fields, while also improving numerical stability within the optimization process. Here we adopt the same regularization scheme described in [7]. The question of how to determine optimal forms for the regularization terms is beyond the scope of this work and therefore is not addressed here. Interestingly, such a problem could also be solved in a variational inference framework, as shown in [8].

A lower bound on the marginal likelihood p(X, Θα, Θβ | Θπ) is given by

\[
\mathcal{L} = \sum_{Z} \int\!\!\int q(Z, \Theta_\mu, \Theta_\Sigma) \log\left\{ \frac{p(X, Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta, \Theta_\alpha | \Theta_\pi)}{q(Z, \Theta_\mu, \Theta_\Sigma)} \right\} \mathrm{d}\Theta_\mu\, \mathrm{d}\Theta_\Sigma \,. \tag{9}
\]


To make the problem tractable, we assume that the variational distribution q(Z, Θµ, ΘΣ) factorizes as q(Z, Θµ, ΘΣ) = q(Z) q(Θµ, ΘΣ).

A graphical representation of the described generative model is provided in Figure 1.

[Figure 1: graphical model omitted in this transcription. Nodes: µk, Λk with hyperparameters m0k, b0k, ν0k, W0k (plate over K); τt, wt (plate over T); gk (plate over Kt); πj, zj, xj, αj (plate over N); Θβ.]

Fig. 1: Directed acyclic graph representing the generative Gaussian mixture model adopted in this work. Small circles indicate fixed parameters, large unfilled circles denote unobserved random variables, while shaded large circles correspond to observed random variables. We also make use of plate notation to indicate repetition of variables.

3 Model Learning

The statistical model described in Section 2 can be fit to data by adopting a variational version of the standard EM algorithm for ML estimation. The objective of this optimization procedure is to learn optimal solutions for the variational posterior distribution q(Z)q(Θµ, ΘΣ), to estimate MAP values for the parameters {Θα, Θβ}, and ML values for {gk} and {wt}. Finally, tissue labels can be inferred by determining, voxel by voxel, the tissue type that has the highest posterior probability of having generated the data.

3.1 Variational E-step

In the variational generalization of the EM algorithm (VBEM) we can still distinguish two steps: the VE-step and the VM-step. In the variational E-step the functional L in (9) is maximized with respect to the posterior factor q(Z) over the latent variables [6]. Making use of (2) we find that

\[
q(Z) \propto \exp\left( \log p(Z|\Theta_\pi, \Theta_\alpha) + \mathbb{E}_{\mu,\Sigma}[\log p(X|Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta)] \right) . \tag{10}
\]

If we define

\[
\log \rho_{jk} = \log p(Z|\Theta_\pi, \Theta_\alpha) + \mathbb{E}_{\mu,\Sigma}[\log p(X|Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta)] \,, \tag{11}
\]

it follows that

\[
q(Z) = \prod_{j=1}^{N} \prod_{k=1}^{K} \left( \frac{\rho_{jk}}{\sum_{c=1}^{K} \rho_{jc}} \right)^{z_{jk}} = \prod_{j=1}^{N} \prod_{k=1}^{K} (\gamma_{jk})^{z_{jk}} \,. \tag{12}
\]


The quantity log ρjk can be computed from (11) to give

\[
\log \rho_{jk} = \log \pi_{jk}(\varphi(\Theta_\alpha)) - \frac{D}{2} \log(2\pi) + \frac{1}{2} \mathbb{E}_{\Sigma_k}\!\left[ \log |\Sigma_k^{-1}| \right] - \frac{1}{2} \mathbb{E}_{\mu_k,\Sigma_k}\!\left[ (B_j x_j - \mu_k)^T \Sigma_k^{-1} (B_j x_j - \mu_k) \right] . \tag{13}
\]

The terms {γjk} can then be used to compute the following sufficient statistics of the observed data, which will serve to update the posterior distributions of {µk} and {Σk}, k = 1, . . . , K, during the VM-step

\[
s_{0k} = \sum_{j=1}^{N} \gamma_{jk} \,, \quad s_{1k} = \sum_{j=1}^{N} \gamma_{jk} B_j x_j \,, \quad S_{2k} = \sum_{j=1}^{N} \gamma_{jk} (B_j x_j)(B_j x_j)^T \,. \tag{14}
\]
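A minimal sketch of this VE-step follows (not the paper's implementation): it assumes the bias-corrected data Bjxj precomputed as an (N, D) array, the warped mixing proportions as an (N, K) array, and uses the standard Gaussian-Wishart expectations of [6], ch. 10 for the terms in (13).

```python
# VE-step sketch for equations (12)-(14): responsibilities gamma and the
# sufficient statistics s0, s1, S2, given the current posterior parameters
# m (K, D), b (K,), W (K, D, D), nu (K,) and mixing proportions pi (N, K).
import numpy as np
from scipy.special import digamma

def ve_step(bx, pi, m, b, W, nu):
    N, D = bx.shape
    K = m.shape[0]
    log_rho = np.empty((N, K))
    for k in range(K):
        # E[log|Sigma_k^-1|] = sum_d psi((nu_k - d)/2) + D log 2 + log|W_k|
        E_logdet = (digamma(0.5 * (nu[k] - np.arange(D))).sum()
                    + D * np.log(2.0) + np.linalg.slogdet(W[k])[1])
        # E[(x - mu_k)^T Sigma_k^-1 (x - mu_k)]
        #   = D/b_k + nu_k (x - m_k)^T W_k (x - m_k)
        diff = bx - m[k]
        maha = nu[k] * np.einsum('nd,de,ne->n', diff, W[k], diff) + D / b[k]
        log_rho[:, k] = (np.log(pi[:, k] + 1e-32)
                         - 0.5 * D * np.log(2 * np.pi)
                         + 0.5 * E_logdet - 0.5 * maha)       # equation (13)
    log_rho -= log_rho.max(axis=1, keepdims=True)             # numerical safety
    gamma = np.exp(log_rho)
    gamma /= gamma.sum(axis=1, keepdims=True)                 # equation (12)
    # Sufficient statistics of equation (14)
    s0 = gamma.sum(axis=0)                                    # (K,)
    s1 = gamma.T @ bx                                         # (K, D)
    S2 = np.einsum('nk,nd,ne->kde', gamma, bx, bx)            # (K, D, D)
    return gamma, s0, s1, S2
```

Subtracting the row-wise maximum of log ρjk before exponentiating leaves the normalized responsibilities unchanged while preventing underflow, one of the stability benefits mentioned in the introduction.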

3.2 Variational M-step

In the following VM-step we can derive approximate solutions for the posterior distributions over the cluster means and covariance matrices [6]. Making use of (2) again, we obtain

\[
q(\Theta_\mu, \Theta_\Sigma) \propto \exp\left( \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma_{jk} \log \mathcal{N}(B_j x_j | \mu_k, \Sigma_k) + \sum_{k=1}^{K} \log p(\mu_k, \Sigma_k) \right) . \tag{15}
\]

It can be proved that the posterior distributions on the means and covariances of the different Gaussians take the same form as the corresponding priors [6], that is

\[
q(\Theta_\mu, \Theta_\Sigma) = \prod_{k=1}^{K} \mathcal{N}(\mu_k | m_k, b_k^{-1} \Sigma_k)\, \mathcal{W}(\Sigma_k^{-1} | W_k, \nu_k) \,. \tag{16}
\]

The parameters that govern these posterior distributions can be computed as a function of the prior hyperparameters and the sufficient statistics obtained in the previous VE-step, as follows

\[
\begin{aligned}
b_k &= b_{0k} + s_{0k} \,, \\
m_k &= \frac{b_{0k} m_{0k} + s_{1k}}{b_{0k} + s_{0k}} \,, \\
W_k^{-1} &= W_{0k}^{-1} + S_{2k} + \frac{b_{0k} s_{0k} m_{0k} m_{0k}^T}{b_{0k} + s_{0k}} - \frac{s_{1k} s_{1k}^T}{b_{0k} + s_{0k}} - \frac{b_{0k} s_{1k} m_{0k}^T}{b_{0k} + s_{0k}} - \frac{b_{0k} m_{0k} s_{1k}^T}{b_{0k} + s_{0k}} \,, \\
\nu_k &= \nu_{0k} + s_{0k} + 1 \,.
\end{aligned} \tag{17}
\]
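A matching sketch of the updates in (17), under the same assumed array shapes as in the VE-step sketch above:

```python
# VM-step sketch for equation (17): Gaussian-Wishart posterior parameters as
# functions of the prior hyperparameters (subscript 0) and the VE-step
# sufficient statistics s0 (K,), s1 (K, D), S2 (K, D, D).
import numpy as np

def vm_step(s0, s1, S2, m0, b0, W0, nu0):
    K, D = s1.shape
    b = b0 + s0                                   # (K,)
    m = (b0[:, None] * m0 + s1) / b[:, None]      # (K, D)
    nu = nu0 + s0 + 1.0                           # (K,)
    W = np.empty((K, D, D))
    for k in range(K):
        Winv = (np.linalg.inv(W0[k]) + S2[k]
                + (b0[k] * s0[k] * np.outer(m0[k], m0[k])
                   - np.outer(s1[k], s1[k])
                   - b0[k] * np.outer(s1[k], m0[k])
                   - b0[k] * np.outer(m0[k], s1[k])) / b[k])
        W[k] = np.linalg.inv(Winv)
    return m, b, W, nu
```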

The point estimates of the mixing proportions {gk}, k = 1, . . . , K, and of the tissue weights {wt}, t = 1, . . . , T, can instead be updated by

\[
g_k = \frac{s_{0k}}{\sum_{c \in C_t} s_{0c}} \,, \quad
w_t = \frac{\sum_{k \in C_t} s_{0k}}{\sum_{j=1}^{N} \dfrac{\tau_{jt}(\varphi(\Theta_\alpha))}{\sum_{t'=1}^{T} \tau_{jt'}(\varphi(\Theta_\alpha))\, w_{t'}}} \,. \tag{18}
\]
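A short sketch of the point-estimate updates in (18); tissues[t] is an assumed index array listing the Gaussians in Ct, tau_warp the warped atlas τ(ϕ(Θα)) as an (N, T) array, and the wt update is the fixed-point iteration written above (the current w appears on the right-hand side).

```python
# ML point-estimate updates of equation (18) for g_k and w_t.
import numpy as np

def update_weights(s0, tau_warp, w, tissues):
    g = np.empty_like(s0)
    w_new = np.empty_like(w)
    denom_j = tau_warp @ w                        # (N,): sum_t' tau_jt' * w_t'
    for t, idx in enumerate(tissues):
        g[idx] = s0[idx] / s0[idx].sum()          # within-tissue weights
        w_new[t] = s0[idx].sum() / np.sum(tau_warp[:, t] / denom_j)
    return g, w_new
```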


3.3 Computing the Lower Bound

Once the sufficient statistics and the variational posterior distributions have been computed, the lower bound can be easily evaluated [6] by

\[
\begin{aligned}
\mathcal{L} = {}& \mathbb{E}_{Z,\Theta_\mu,\Theta_\Sigma}[\log p(X|Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta)] + \mathbb{E}_{Z}[\log p(Z|\Theta_\pi, \Theta_\alpha)] \\
& + \mathbb{E}_{\Theta_\mu,\Theta_\Sigma}[\log p(\Theta_\mu, \Theta_\Sigma)] + \log p(\Theta_\alpha) + \log p(\Theta_\beta) \\
& - \mathbb{E}_{Z}[\log q(Z)] - \mathbb{E}_{\Theta_\mu,\Theta_\Sigma}[\log q(\Theta_\mu, \Theta_\Sigma)] \,,
\end{aligned} \tag{19}
\]

with

\[
\begin{aligned}
\mathbb{E}_{Z,\Theta_\mu,\Theta_\Sigma}\!\left[\log p(X|Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta)\right] = {}& \frac{1}{2} \sum_{k=1}^{K} s_{0k} \left( \mathbb{E}\!\left[\log |\Sigma_k^{-1}|\right] - D \log(2\pi) - \frac{D}{b_k} \right) \\
& - \frac{1}{2} \sum_{k=1}^{K} s_{0k} \left( \nu_k m_k^T W_k m_k \right) - \frac{1}{2} \sum_{k=1}^{K} \nu_k \operatorname{Tr}\!\left( W_k S_{2k} - 2 s_{1k} m_k^T W_k \right) \\
& + \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma_{jk} \log |B_j| \,. \tag{20}
\end{aligned}
\]

\[
\mathbb{E}_{Z}[\log p(Z|\Theta_\pi, \Theta_\alpha)] = \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma_{jk} \log \pi_{jk}(\varphi(\Theta_\alpha)) \,. \tag{21}
\]

\[
\begin{aligned}
\mathbb{E}_{\Theta_\mu,\Theta_\Sigma}[\log p(\Theta_\mu, \Theta_\Sigma)] = {}& \frac{1}{2} \sum_{k=1}^{K} \left( D \log \frac{b_{0k}}{2\pi} - \frac{D\, b_{0k}}{b_k} + 2 \log B_{\mathcal{W}}(W_{0k}, \nu_{0k}) \right) \\
& + \frac{1}{2} \sum_{k=1}^{K} \Big( \mathbb{E}\!\left[\log |\Sigma_k^{-1}|\right] (\nu_{0k} - D) \\
& \qquad - \nu_k \operatorname{Tr}\!\left( W_{0k}^{-1} W_k + b_{0k} (m_k - m_{0k})(m_k - m_{0k})^T W_k \right) \Big) \,. \tag{22}
\end{aligned}
\]

\[
\mathbb{E}_{Z}[\log q(Z)] = \sum_{j=1}^{N} \sum_{k=1}^{K} \gamma_{jk} \log \gamma_{jk} \,. \tag{23}
\]

\[
\mathbb{E}_{\Theta_\mu,\Theta_\Sigma}[\log q(\Theta_\mu, \Theta_\Sigma)] = \sum_{k=1}^{K} \left\{ \mathbb{E}\!\left[\log |\Sigma_k^{-1}|\right] \left( \frac{1}{2} \nu_k - D \right) + \frac{1}{2} D \left( \log \frac{b_k}{2\pi} - 1 - \nu_k \right) + \log B_{\mathcal{W}}(W_k, \nu_k) \right\} . \tag{24}
\]

The term B_W(W, ν) in equations (22) and (24) indicates the normalizing constant of a Wishart distribution parametrized by W and ν.
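For implementing (22) and (24), the logarithm of this normalizer can be evaluated with SciPy's multivariate gamma function; a sketch under the standard parametrization (e.g. [6], appendix B):

```python
# log B_W(W, nu) = -(nu/2) log|W| - (nu D / 2) log 2 - log Gamma_D(nu/2),
# where Gamma_D is the multivariate gamma function.
import numpy as np
from scipy.special import multigammaln

def log_wishart_norm(W, nu):
    D = W.shape[0]
    return (-0.5 * nu * np.linalg.slogdet(W)[1]
            - 0.5 * nu * D * np.log(2.0)
            - multigammaln(0.5 * nu, D))
```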

3.4 Estimating Bias Field and Deformations

In order to estimate optimal parameters for the bias field we need, at each iteration of the algorithm, to maximize the lower bound (19) with respect to the parameters Θβ. A closed-form solution does not exist in this case; therefore, recourse to numerical optimization techniques cannot be avoided. The optimization problem can be formulated as follows

\[
\Theta_\beta = \arg\max_{\Theta_\beta} \left\{ \mathbb{E}_{Z,\Theta_\mu,\Theta_\Sigma}[\log p(X|Z, \Theta_\mu, \Theta_\Sigma, \Theta_\beta)] + \log p(\Theta_\beta) \right\} . \tag{25}
\]

Similarly, the deformation field that best matches the population-based atlas {τt}, t = 1, . . . , T, to an individual image can be estimated by solving the following optimization problem

\[
\Theta_\alpha = \arg\max_{\Theta_\alpha} \left\{ \mathbb{E}_{Z}[\log p(Z|\Theta_\pi, \Theta_\alpha)] + \log p(\Theta_\alpha) \right\} . \tag{26}
\]

In both cases we adopt a Gauss-Newton optimization scheme.
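The Gauss-Newton scheme itself is standard; purely as a reminder of its mechanics, here is a generic sketch on a toy nonlinear least-squares problem. This is not the paper's objective (which maximizes the bound terms in (25) and (26)); the residual, Jacobian and fitting problem below are illustrative assumptions.

```python
# Generic Gauss-Newton iteration: for a residual vector r(theta) with
# Jacobian J, each step solves the normal equations J^T J dtheta = -J^T r.
import numpy as np

def gauss_newton(residual, jacobian, theta, n_iter=20):
    for _ in range(n_iter):
        r, J = residual(theta), jacobian(theta)
        theta = theta + np.linalg.solve(J.T @ J, -J.T @ r)
    return theta

# Toy problem: fit y = exp(a * t) for the scalar parameter a
t = np.linspace(0.0, 1.0, 50)
y = np.exp(0.7 * t)
res = lambda a: np.exp(a[0] * t) - y              # residual vector
jac = lambda a: (t * np.exp(a[0] * t))[:, None]   # its Jacobian (50, 1)
print(gauss_newton(res, jac, np.array([0.0])))    # converges to ~0.7
```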

4 Experimental Results

4.1 Experiments on Synthetic Data

The accuracy of the presented variational algorithm was first assessed making use of synthetic data produced by the BrainWeb MRI simulator [9]. In particular, synthetic T1- and T2-weighted scans were generated from a healthy adult brain model, with the parameters indicated in Table 1.

Table 1: Simulation parameters selected to generate the synthetic data used to evaluate the accuracy of the presented VB algorithm. A bias intensity of 20% corresponds to values of the non-uniformity field in the range [0.9, 1.1].

Sequence         TR (ms)   Flip angle (deg)   TE (ms)   Bias field
T1w (SFLASH)     18        30                 10        20%
T2w (DSE LATE)   3300      90                 35, 120   20%

The images were first segmented using the method described in this paper; the resulting gray matter and white matter segmentations were then compared to the ground truth provided by the underlying model of the data. The similarity between ground truth and estimated segmentations was quantified by computing Dice similarity coefficients. To examine the behaviour of the algorithm with respect to noise, the same analyses were repeated for three different levels of noise in the data (3%, 5% and 9% of the brightest intensity). For this experiment we used the probabilistic atlas provided with the SPM12 software, which includes gray matter, white matter, CSF, bone, soft tissue and air. Moreover, we set the following hyperparameter values (weakly informative priors)

\[
b_0 = 1 \,, \quad m_0 = \frac{\sum_{j=1}^{N} x_j}{N} \,, \quad \nu_0 = D \,, \quad W_0^{-1} = \frac{\sum_{j=1}^{N} (x_j - m_0)(x_j - m_0)^T}{N} \,. \tag{27}
\]
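These settings can be computed directly from the data matrix; a sketch (assumed shape (N, D) with D > 1; the covariance is the biased, divide-by-N estimate, matching (27)):

```python
# Weakly informative prior hyperparameters of equation (27).
import numpy as np

def default_hyperparameters(x):
    N, D = x.shape
    b0 = 1.0
    m0 = x.mean(axis=0)                           # empirical mean of the data
    nu0 = float(D)
    W0 = np.linalg.inv(np.cov(x.T, bias=True))    # W0^{-1} = empirical covariance
    return b0, m0, nu0, W0
```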

In addition, equivalent accuracy measures were computed on the segmentations obtained by processing the same data with the maximum likelihood algorithm implemented in SPM12. Results are shown in Table 2 and indicate that our method provides very accurate segmentations while never performing worse than the SPM ML approach.

Table 2: Dice similarity coefficients between the estimated segmentations (of gray and white matter) and the ground truth labels, for our VB algorithm and for the SPM ML approach.

Noise level          3%            5%            9%
Algorithm            VB     ML     VB     ML     VB     ML
Tissue    GM         0.91   0.90   0.91   0.89   0.87   0.87
          WM         0.95   0.94   0.94   0.94   0.89   0.89
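For reference, the Dice similarity coefficient used here and in Table 2 is the standard overlap measure between binary label maps; a minimal sketch:

```python
# DSC = 2 |a AND b| / (|a| + |b|) for binary segmentations a and b.
import numpy as np

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```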

For this simulated data we could also evaluate the accuracy of the presented algorithm with respect to bias field correction. To do so we computed Pearson's correlation coefficients between the estimated non-uniformity field and the ground truth. Results are shown in Table 3, where we also report the correlation coefficients achieved by the SPM segmentation algorithm. Both methods exhibit very good agreement between the estimated bias field and the "true" one, which was used to simulate the data.

Table 3: Pearson’s correlation coefficients between estimated and ground truthbias fields for the presented VB method and for SPM ML method.

Noise level 3% 5% 9%

Algorithm VB ML VB ML VB ML

T1w 0.83 0.83 0.84 0.82 0.68 0.61Modality

T2w 0.88 0.88 0.89 0.89 0.88 0.70

Run time was 3 min 23 s for our algorithm and 3 min 26 s for SPM, on a quad-core PC at 3.19 GHz with 12 GB RAM.

4.2 Experiments on Real Data

To test the performance of the presented method on real data we used T1-, T2- and PD-weighted scans of 15 randomly selected subjects from the freely available IXI brain database². All images were acquired with the same scanner, a Philips Medical Systems Gyroscan Intera 1.5 Tesla. We applied the algorithm described in Section 3 to segment this multispectral data set, with the same settings adopted for the previous experiments.

Unfortunately, no ground truth is available for these data. Nevertheless, we compared our results with those obtained, on the same images, with the widely used segmentation algorithm implemented in SPM12 and described in [7]. A quantitative assessment of the accuracy and reliability of that method can be found in [10].

² http://www.brain-development.org


[Figure 2: five image panels omitted in this transcription — (a) T1w, (b) GM - ML, (c) GM - VB, (d) wGM - TPM, (e) GM - TPM.]

Fig. 2: A T1-weighted image included in the IXI data set is shown in panel (a). The corresponding gray matter (GM) segmentations obtained with the ML method implemented in SPM12 and with our VB approach are illustrated in subfigures (b) and (c) respectively. In panel (d) we illustrate how the gray matter atlas, which is depicted in panel (e), is warped by our algorithm to match the individual image.

[Figure 3: two plot panels omitted in this transcription. (a) Bar plot of Dice similarity coefficients (DSC, y-axis 0.7 to 1.0) for GM, WM and CSF. (b) Contour plot over the joint T1w/T2w intensity histogram, with clusters labelled BONE, CSF, SOFT TISSUE, GM and WM.]

Fig. 3: (a) Dice similarity coefficients between the results produced by our algorithm and by the segmentation algorithm implemented in SPM12. The bars indicate the average scores for gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF). Error bars indicate standard deviations. Scatter plots of the individual coefficients are also shown, superimposed on the bar plot. (b) Contour plot of the intensity distributions of gray matter, white matter, cerebrospinal fluid, bone and soft tissue obtained for one subject included in the IXI dataset, overlaid on the joint histogram of the T1- and T2-weighted images. The optimal number of components is determined automatically by our VB algorithm, therefore irrelevant model components are not shown here.


For illustrative purposes we report in Figure 2 the gray matter segmentations obtained, for one of the selected subjects, with the SPM ML approach (b) and with our VB method (c).

To quantitatively compare the two algorithms, we computed Dice similarity coefficients between the sets of resulting gray matter, white matter and cerebrospinal fluid segmentations. Results are reported in Figure 3(a). We obtained very high values of the overlap measure for all tissue classes (DSC > 0.9). This confirms our hypothesis that the presented VB framework can represent a powerful generalization of existing methods for image segmentation, such as the one provided with SPM12. The results presented in this section were obtained using weakly informative priors, so the high similarity between the performance of the ML and VB approaches is to be expected. Nonetheless, it should be possible to achieve even greater robustness, compared to ML approaches, by learning informative priors on tissue intensities from available data sets, which will be part of our future work.

In addition, it should be noted that variational techniques have an inherent capability of avoiding overfitting, which represents a fundamental advantage over ML model learning [4,6]. In the case of mixture models, this makes it possible, for example, to determine the most appropriate number of components (K) without performing cross-validation, which is usually rather demanding in terms of computation and amount of data required [6].

For the model presented here, if the number of Gaussians is set to a value higher than the optimal one, the redundant components will be automatically pruned out of the model [5], as their responsibilities γjk are quickly driven to zero by the algorithm. This follows from the fact that the variational objective function contains an implicit penalty for complex models [4].

We illustrate this behaviour using the images of one subject in the IXI database. In particular, we set 5 Gaussians for each of the tissue types of interest and, after convergence of the VBEM algorithm, we observed only 2 components surviving for gray matter, 1 for white matter, 3 for CSF, 2 for bone and 4 for soft tissue, as shown in Figure 3(b). In a similar setting, a ML algorithm would simply have found the best fit to the data, making use of all the available components, and the optimal number of Gaussians would have had to be determined a priori, through some form of model comparison.
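One simple way to read off the effective model complexity after convergence is to count the components whose effective counts s0k remain non-negligible; a sketch, with an arbitrary threshold chosen for illustration:

```python
# Count surviving components per tissue: a pruned component's responsibilities
# are driven to zero, so its effective count s0_k vanishes.
import numpy as np

def surviving_components(s0, tissues, thresh=1e-2):
    return {t: int(np.sum(s0[idx] > thresh)) for t, idx in enumerate(tissues)}
```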

5 Discussion

Evaluating posterior probability distributions over model parameters and latent variables is often a very demanding task in the context of probabilistic modelling problems, especially when working with large-scale data sets. Unfortunately, this is usually the case for image processing applications.

VB techniques belong to the family of deterministic approaches for finding approximate solutions to inference problems. Despite not providing exact results, they allow learning of fully Bayesian models without the computational drawbacks of sampling techniques. The resulting algorithms have the interesting property of generalizing ML or MAP approaches, while automatically addressing the overfitting issues associated with ML estimation.

In this paper we have shown that VB represents a viable and effective framework for performing medical image segmentation, in spite of having been little exploited so far in this field. We demonstrated the accuracy of our variational segmentation algorithm on synthetic data, and on real data we compared its performance to that of the widely used SPM image segmentation algorithm, of which we believe our method could be a powerful extension. We have also provided evidence that the VBEM algorithm presented in Section 3 can be used to automatically determine the optimal number of components (model complexity) of the underlying GMM, without incurring the overfitting that affects the standard EM approach. Further increases in the robustness of the framework presented here could be achieved by learning informative intensity priors from a population data set, a problem that we aim to address in our future work.

References

1. Woolrich, M.W., Behrens, T.E.: Variational Bayes inference of spatial mixture models for segmentation. IEEE Transactions on Medical Imaging 25(10), 1380–1391 (2006)

2. Tian, G., Xia, Y., Zhang, Y., Feng, D.: Hybrid genetic and variational expectation-maximization algorithm for Gaussian-mixture-model-based brain MR image segmentation. IEEE Transactions on Information Technology in Biomedicine 15(3), 373–380 (2011)

3. Batmanghelich, N.K., Saeedi, A., Cho, M., Estepar, R.S.J., Golland, P.: Generative method to discover genetically driven image biomarkers. In: Information Processing in Medical Imaging. pp. 30–42. Springer (2015)

4. Attias, H.: Inferring parameters and structure of latent variable models by variational Bayes. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. pp. 21–30. Morgan Kaufmann Publishers Inc. (1999)

5. Corduneanu, A., Bishop, C.M.: Variational Bayesian model selection for mixture distributions. In: Artificial Intelligence and Statistics. vol. 2001, pp. 27–34. Morgan Kaufmann, Waltham, MA (2001)

6. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

7. Ashburner, J., Friston, K.J.: Unified segmentation. NeuroImage 26(3), 839–851 (2005)

8. Simpson, I.J., Schnabel, J.A., Groves, A.R., Andersson, J.L., Woolrich, M.W.: Probabilistic inference of regularisation in non-rigid registration. NeuroImage 59(3), 2438–2451 (2012)

9. Cocosco, C.A., Kollokian, V., Kwan, R.K.S., Pike, G.B., Evans, A.C.: BrainWeb: online interface to a 3D MRI simulated brain database. In: NeuroImage (1997)

10. Fellhauer, I., Zöllner, F.G., Schröder, J., Degen, C., Kong, L., Essig, M., Thomann, P.A., Schad, L.R.: Comparison of automated brain segmentation using a brain phantom and patients with early Alzheimer's dementia or mild cognitive impairment. Psychiatry Research: Neuroimaging (2015)
