
Linear Shape Deformation Models with Local Support Using Graph-based Structured Matrix Factorisation

Florian Bernard1,2 Peter Gemmar3 Frank Hertel1 Jorge Goncalves2 Johan Thunberg2

1Centre Hospitalier de Luxembourg, Luxembourg   {bernard.florian, hertel.frank}@chl.lu

2Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg   {johan.thunberg, jorge.goncalves}@uni.lu

3Trier University of Applied Sciences, Trier, Germany   [email protected]

Figure 1. Global support factors of PCA lead to implausible body shapes, whereas the local support factors of our method give more realistic results. (Best viewed on screen when zoomed in) [The panels contrast "PCA (global support)" with "Our (local support)" and illustrate the deformation model y(α) = x̄ + [Φ1, . . . , ΦM] α.]

Abstract

Representing 3D shape deformations by linear models in high-dimensional space has many applications in computer vision and medical imaging, such as shape-based interpolation or segmentation. Commonly, using Principal Components Analysis a low-dimensional (affine) subspace of the high-dimensional shape space is determined. However, the resulting factors (the most dominant eigenvectors of the covariance matrix) have global support, i.e. changing the coefficient of a single factor deforms the entire shape. In this paper, a method to obtain deformation factors with local support is presented. The benefits of such models include better flexibility and interpretability as well as the possibility of interactively deforming shapes locally. For that, based on a well-grounded theoretical motivation, we formulate a matrix factorisation problem employing sparsity and graph-based regularisation terms. We demonstrate that for brain shapes our method outperforms the state of the art in local support models with respect to generalisation ability and sparse shape reconstruction, whereas for human body shapes our method gives more realistic deformations.

1. Introduction

Due to their simplicity, linear models in high-dimensional space are frequently used for modelling non-linear deformations of shapes in 2D or 3D space. For example, Active Shape Models (ASM) [11], based on a statistical shape model, are popular for image segmentation. Usually, surface meshes, comprising faces and vertices, are employed for representing the surfaces of shapes in 3D. Dimensionality reduction techniques are then used to learn a low-dimensional representation of the vertex coordinates from training data. Frequently, an affine subspace that is sufficiently close to the training shapes is used. To be more specific, mesh deformations are modelled by expressing the vertex coordinates as the sum of a mean shape x̄ and a linear combination of M modes of variation Φ = [Φ1, . . . , ΦM], according to a weight/coefficient vector α, i.e. the vertices deformed by α are given by y(α) = x̄ + Φα, see Fig. 1. Commonly, by using Principal Components Analysis (PCA), the modes of variation are set to the most dominant eigenvectors of the covariance matrix of training shapes. PCA-based models are computationally convenient due to the orthogonality of the eigenvectors of the (real and symmetric) covariance matrix.


Furthermore, due to the diagonalisation of the covariance matrix, an axis-aligned Gaussian distribution of the weight vectors of the training data is obtained. A problem of PCA-based models is that eigenvectors have global support, meaning that adjusting the weight of a single factor affects all vertices of the shape (Fig. 1).

Thus, in this work, instead of eigenvectors as modes of variation, we consider more general factors that have local support, where adjusting the weight of a single factor leads only to a spatially localised deformation of the shape (Fig. 1). The set of all factors can be seen as a dictionary in order to represent shapes by a linear combination of the factors. Benefits of factors with local support include more realistic deformations (cf. Fig. 1), better interpretability of the deformations (e.g. clinical interpretability in a medical context [27]), and the possibility of interactive local mesh deformations (e.g. editing animated mesh sequences in computer graphics [23], or enhanced flexibility for interactive 3D segmentation based on statistical shape models [3]).

1.1. PCA Variants

PCA is a non-convex problem that admits the efficient computation of the global optimum, e.g. by Singular Value Decomposition (SVD). However, the downside is that the incorporation of arbitrary (convex) regularisation terms is not possible due to the SVD-based solution. Therefore, incorporating regularisation terms into PCA is an active field of research and several variants have been presented: Graph-Laplacian PCA [19] obtains factors such that the components vary smoothly according to a given graph. Robust PCA [7] formulates PCA as a convex low-rank matrix factorisation problem, where the nuclear norm is used as convex relaxation of the matrix rank. A combination of both Graph-Laplacian PCA and Robust PCA has been presented in [26]. The Sparse PCA (SPCA) method [16, 38] obtains sparse factors. Structured Sparse PCA (SSPCA) [18] additionally imposes structure on the sparsity of the factors using group sparsity norms, such as the mixed ℓ1/ℓ2 norm.

1.2. Deformation Model Variants

In [20], the flexibility of shape models has been increased by using PCA-based factors in combination with a per-vertex weight vector, in contrast to a single weight vector that is used for all vertices. In [12, 34], it is shown that additional elasticity in the PCA-based model can be obtained by manipulation of the sample covariance matrix. Whilst both approaches increase the flexibility of the shape model, they result in global support factors.

In [27], SPCA is used to model the anatomical shape variation of the 2D outlines of the corpus callosum. In [32], 2D images of the cardiac ventricle were used to train an Active Appearance Model based on Independent Component Analysis (ICA) [17]. Other applications of ICA for statistical shape models are presented in [29, 37] as well as in [8], where ICA is used for motion editing. The Orthomax method, where the PCA basis is determined first and then rotated such that it has a "simple" structure, is used in [28]. The major drawback of SPCA, ICA and Orthomax is that the spatial relation between vertices is completely ignored.

The Key Point Subspace Acceleration method based on Varimax, where a statistical subspace and key points are automatically identified from training data, is introduced in [21]. For mesh animation, in [31] clusters of spatially close vertices are determined first by spectral clustering, and then PCA is applied to each vertex cluster, resulting in one sub-PCA model per cluster. This two-stage procedure has the problem that, due to the independence of both stages, it is unclear whether the clustering is optimal with respect to the deformation model. Also, a blending procedure for the individual sub-PCA models is required. A similar approach of first manually segmenting body regions and then learning a PCA-based model has been presented in [36].

Neumann et al. have presented the Sparse Localised Deformation Components (SPLOCS) method for obtaining localised deformation modes from animated mesh sequences [23]. They use a matrix factorisation formulation with a weighted ℓ1/ℓ2 norm regulariser. Local support factors are obtained by explicitly modelling local support regions, which are in turn used to update the weights of the ℓ1/ℓ2 norm in each iteration. This makes the non-convex optimisation problem even harder to solve and forfeits convergence guarantees. Consequently, a suboptimal initialisation of the support regions, as presented in their work, affects the quality of the found solution.

In the compressed manifold modes method [22], local support basis functions of the (discretised) Laplace-Beltrami operator of a single input mesh are determined.

1.3. Aims and Main Contributions

The work presented in this paper has the objective of learning local support deformation factors from training data. The main application of the resulting shape model is recognition, segmentation and shape interpolation [3]. Whilst our work remedies several of the mentioned shortcomings of existing methods, it can also be seen as complementary to SPLOCS, which is more tailored towards artistic editing and mesh animation. The most significant difference to SPLOCS is that we aim at letting the training shapes automatically determine the location and size of each local support region, which is advantageous for the applications we have in mind. This is achieved by formulating a matrix factorisation problem that incorporates regularisation terms which simultaneously account for sparsity and smoothness of the factors, where the graph-based smoothness regulariser accounts for smoothly varying neighbour vertices. In contrast to SPLOCS or sub-PCA, this results in an implicit clustering that is part of the optimisation and does not require an initialisation of local support regions, which in turn simplifies the optimisation procedure. Moreover, by integrating a smoothness prior into our framework we can handle small training datasets, even if the desired number of factors exceeds the number of training shapes. Our optimisation problem is formulated in terms of the Structured Low-Rank Matrix Factorisation framework [14], which has appealing theoretical properties.

2. Methods

First, we familiarise the reader with our notation and linear shape deformation models. Then, we state the objective and its formulation as an optimisation problem, followed by the theoretical motivation. Finally, the block coordinate descent algorithm and the factor splitting method are presented.

2.1. Notation

Ip denotes the p × p identity matrix, 1p the p-dimensional vector containing ones, 0p×q the p × q zero matrix, and S+_p the set of p × p positive semi-definite matrices. Let A ∈ R^{p×q}. We use the notation A_{A,B} to denote the submatrix of A with the rows indexed by the (ordered) index set A and the columns indexed by the (ordered) index set B. The colon denotes the full (ordered) index set, e.g. A_{A,:} is the matrix containing all rows of A indexed by A. For brevity, we may write A_r to denote the p-dimensional vector formed by the r-th column of A. The operator vec(A) creates a pq-dimensional column vector from A by stacking its columns, and ⊗ denotes the Kronecker product.

2.2. Linear Shape Deformation Models

Let Xk ∈ R^{N×3} be the matrix representation of a shape comprising N points (or vertices) in 3 dimensions, and let {Xk : 1 ≤ k ≤ K} be the set of K training shapes. In order to process multiple shapes in a meaningful way it is essential that the rows in each Xk correspond to homologous points, which we assume as given. By using the column vector notation xk = vec(Xk) ∈ R^{3N}, all xk can be arranged in the matrix X = [x1, . . . , xK] ∈ R^{3N×K}. For brevity, we assume that the mean shape X̄ has been subtracted from all shapes, such that Σ_k Xk = 0_{N×3}, and that undesirable pose differences have been eliminated. X is normalised such that the standard deviation of vec(X) is one [23].

Pairwise relations between vertices are modelled by a weighted undirected graph G = (V, E, ω) that is shared by all shapes. The node set V = {1, . . . , N} enumerates all N vertices, the edge set E ⊆ {1, . . . , N}² represents the connectivity of the vertices, and ω ∈ R^{|E|}_+ is the weight vector. The (scalar) weight ωe of edge e = (i, j) ∈ E denotes the affinity between vertices i and j, where close vertices (in some appropriate sense) have a high value ωe. We assume there are no self-edges, i.e. (i, i) ∉ E. Whilst in most applications the graph encodes pairwise spatial connectivity, affinities that are not of spatial nature are possible (e.g. symmetries, or prior anatomical knowledge in medical applications).

For the standard PCA-based method [11], the modes of variation in the M columns of the matrix Φ ∈ R^{3N×M} are defined as the M most dominant eigenvectors of the sample covariance matrix C = (1/(K−1)) XX^T. However, we consider the generalisation where Φ contains general 3N-dimensional vectors, the factors, in its M columns. In both cases, the (linear) deformation model (modulo the mean shape) is given by

y(α) = Φα ,   (1)

with weight vector α ∈ R^M. Due to vectorisation, the rows with indices {1, . . . , N}, {N+1, . . . , 2N} and {2N+1, . . . , 3N} of y (or Φ) correspond to the x, y and z components of the N vertices of the shape, respectively.
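As an illustration of this PCA-based construction and of the deformation model in eq. (1), the following minimal Python sketch computes the modes of variation from a hypothetical, mean-free data matrix and deforms a shape; all names and sizes are placeholders, not the authors' implementation.

    import numpy as np

    # Hypothetical mean-free training data: K vectorised shapes of dimension 3N.
    N, K, M = 500, 17, 10
    X = np.random.randn(3 * N, K)             # columns are vec(X_k), mean shape subtracted

    # PCA: M most dominant eigenvectors of the sample covariance matrix C.
    C = (X @ X.T) / (K - 1)
    eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    Phi = eigvecs[:, ::-1][:, :M]             # 3N x M modes of variation

    # Linear deformation model, eq. (1): y(alpha) = Phi @ alpha (add x_bar for vertex positions).
    alpha = np.random.randn(M)
    y = Phi @ alpha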

2.3. Objective and Optimisation Problem

Roughly speaking, the objective is to find Φ = [Φ1, . . . , ΦM] and A = [α1, . . . , αK] ∈ R^{M×K} for a given M such that, according to eq. (1), we can write

X ≈ ΦA ,   (2)

where the factors Φm have local support. Local support means that Φm is sparse and that all active vertices, i.e. vertices that correspond to the non-zero elements of Φm, are connected by (sequences of) edges in the graph G.

Now we state our problem formally as an optimisation problem. The theoretical motivation thereof is based on [2, 6, 14] and is recapitulated in section 2.4, where it will also become clear that our chosen regularisation term is related to the Projective Tensor Norm [2, 14].

A general matrix factorisation problem with squared Frobenius norm loss is given by

min_{Φ,A} ‖X − ΦA‖²_F + Ω(Φ, A) ,   (3)

where the regulariser Ω imposes certain properties upon Φ and A. The optimisation is performed over some compact set (which we assume implicitly from here on). An obvious property of local support factors is sparsity. Moreover, it is desirable that in each factor components that are spatially close vary smoothly. Both properties together seem to be promising candidates to obtain local support factors, which we reflect in our regulariser. Our optimisation problem is given by

min_{Φ∈R^{3N×M}, A∈R^{M×K}} ‖X − ΦA‖²_F + λ Σ_{m=1}^{M} ‖Φm‖_Φ ‖(A_{m,:})^T‖_A ,   (4)

where ‖·‖_Φ and ‖·‖_A denote vector (semi-)norms. For z′ ∈ R^K, z ∈ R^{3N}, we define

‖z′‖_A = λ_A ‖z′‖_2 , and   (5)

‖z‖_Φ = λ_1 ‖z‖_1 + λ_2 ‖z‖_2 + λ_∞ ‖z‖^H_{1,∞} + λ_G ‖Ez‖_2 .   (6)

Both ℓ2 norm terms will be discussed in section 2.4. The ℓ1 norm is used to obtain sparsity in the factors. The (mixed) ℓ1/ℓ∞ norm is defined by

‖z‖^H_{1,∞} = Σ_{g∈H} ‖z_g‖_∞ ,   (7)

where z_g denotes a subvector of z indexed by g ∈ H. By using the collection H = {{i, i+N, i+2N} : 1 ≤ i ≤ N}, a grouping of the x, y and z components per vertex is achieved, i.e. within a group g only the component with largest magnitude is penalised and no extra cost is to be paid for the other components being non-zero.
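To make the grouping in eq. (7) concrete, the following hedged sketch evaluates the mixed ℓ1/ℓ∞ norm with the per-vertex groups {i, i+N, i+2N}; the function name and the 0-based indexing are our own choices.

    import numpy as np

    def group_l1_linf_norm(z, N):
        # ||z||^H_{1,inf}: for each vertex take the maximum absolute x/y/z component
        # (groups {i, i+N, i+2N} of eq. (7)), then sum over all N vertices.
        Z = np.abs(z.reshape(3, N))   # row 0: x components, row 1: y, row 2: z
        return Z.max(axis=0).sum()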

The last term in eq. (6), the graph-based ℓ2 (semi-)norm, imposes smoothness upon each factor, such that neighbour elements according to the graph G vary smoothly. Based on the incidence matrix of G, we choose E such that

‖Ez‖_2 = sqrt( Σ_{d∈{0,N,2N}} Σ_{(i,j)=e_p∈E} ω_{e_p} (z_{d+i} − z_{d+j})² ) .   (8)

As such, E is a discrete (weighted) gradient operator and ‖E·‖²_2 corresponds to Graph-Laplacian regularisation [19]. E is specified in the supplementary material.

2.4. Theoretical Motivation

For a matrix X ∈ R^{3N×K} and norms ‖·‖_Φ and ‖·‖_A, let us define the function

ψ_M(X) = min_{(Φ∈R^{3N×M}, A∈R^{M×K}): ΦA=X} Σ_{m=1}^{M} ‖Φm‖_Φ ‖(A_{m,:})^T‖_A .   (9)

The function ψ(·) = lim_{M→∞} ψ_M(·) defines a norm known as Projective Tensor Norm or Decomposition Norm [2, 14].

Lemma 1. For any ε > 0 there exists an M(ε) ∈ N such that ‖ψ(X) − ψ_{M(ε)}(X)‖ < ε.

Proof. For ψ(X) there is a sequence {Φi}_{i=1}^∞ and a sequence {A_i^T}_{i=1}^∞ such that ψ(X) = Σ_{i=1}^{∞} ‖Φi‖_Φ ‖A_i^T‖_A. Let l_m = Σ_{i=1}^{m} ‖Φi‖_Φ ‖A_i^T‖_A. The sequence {l_m} is monotone, bounded from above and convergent. Let l_∞ = ψ(X) denote its limit. Since the sequence is convergent, there is M(ε) such that ‖l_∞ − l_j‖ < ε for j ≥ M(ε).

We now proceed by introducing the optimisation problem

min_Z ‖X − Z‖²_F + λ ψ_M(Z) .   (10)

Next, we establish a connection between Problem (10) and our problem (4). First, we assume that we are given a solution pair (Φ, A) minimising Problem (4). By defining Z = ΦA, Z is a solution to Problem (10). Secondly, assume we are given a solution Z minimising Problem (10). To find a solution pair (Φ, A) minimising Problem (4), one needs to compute the (Φ, A) that achieves the minimum of the right-hand side of (9) for the given Z.

This shows that given a solution to one of the problems, one can infer a solution to the other problem. Now we reformulate Problem (10) in order to numerically compute its solution. Define the matrices

Q = [Φ ; A^T] (Φ stacked on top of A^T) ∈ R^{(3N+K)×M} and Y = QQ^T ,   (11)

and the function F : S+_{3N+K} → R as

F(Y) = F(QQ^T) = ‖X − ΦA‖²_F + λ ψ_M(ΦA) .   (12)

Let Y* denote a solution to the optimisation problem

min_{Y∈S+_{3N+K}} F(Y) .   (13)

For a given Y*, Problem (10) is minimised by the upper-right block matrix of Y*. The difference between (10) and (13) is that the latter is over the set of positive semi-definite matrices, which, at first sight, does not present any gain. However, under certain conditions, the global solution for Q, rather than the product Y = QQ^T, can be obtained directly [6]. In [2] it is shown that if Q is a rank deficient local minimum of F(QQ^T), then it is also a global minimum. Whilst these results only hold for twice differentiable functions F, Haeffele et al. have presented analogous results for the case of F being a sum of a twice-differentiable term and a non-differentiable term [14], such as ours above.

As such, any (rank deficient) local optimum of Problem (4) is also a global optimum. If in ψ(·) both ‖·‖_Φ and ‖·‖_A are the ℓ2 norm, ψ(·) is equivalent to the nuclear norm, commonly used as convex relaxation of the matrix rank [14, 25]. In order to steer the solution towards being rank deficient, we include ℓ2 norm terms in ‖·‖_Φ and ‖·‖_A (see (5) and (6)). With that, part of the regularisation term in (4) is the nuclear norm that accounts for low-rank solutions.

2.5. Block Coordinate Descent

A solution to problem (4) is found by block coordinate descent (BCD) [35] (Algorithm 1). It employs alternating proximal steps, which can be seen as a generalisation of gradient steps for non-differentiable functions. Since computing the proximal mapping is repeated in each iteration, an efficient computation is essential. The proximal mapping of ‖·‖_A in eq. (5) has a closed-form solution by block soft thresholding [24]. The proximal mapping of ‖·‖_Φ in eq. (6) is solved by dual forward-backward splitting [9, 10] (see supplementary material). The benefit of BCD is that it scales well to large problems. However, a downside is that by using the alternating updates one has only guaranteed convergence to a Nash equilibrium point [14, 35].


repeat
  // update Φ
  Φ′ ← Φ − ε_Φ ∇_Φ ‖X − ΦA‖²_F   // gradient step (loss)
  for m = 1, . . . , M do   // proximal step Φ (penalty)
    Φm ← prox_{λ ε_Φ ‖·‖_Φ ‖(A_{m,:})^T‖_A} (Φ′m)
  // update A
  A′ ← A − ε_A ∇_A ‖X − ΦA‖²_F   // gradient step (loss)
  for m = 1, . . . , M do   // proximal step A (penalty)
    A_{m,:} ← prox_{λ ε_A ‖Φm‖_Φ ‖·‖_A} ((A′_{m,:})^T)
until convergence

Algorithm 1: Simplified view of block coordinate descent. For details see [14, 35].
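The Python sketch below mirrors the alternating gradient/proximal structure of Algorithm 1 under simplifying assumptions that we introduce ourselves: the proximal operator of ‖·‖_Φ is replaced by plain ℓ1 soft thresholding (the full operator requires Algorithm 2 of the supplementary material), the step sizes are crude Lipschitz estimates, and all names are placeholders.

    import numpy as np

    def soft_threshold(v, t):
        # proximal operator of t*||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def block_soft_threshold(v, t):
        # proximal operator of t*||.||_2, cf. eq. (20)
        n = np.linalg.norm(v)
        return np.zeros_like(v) if n <= t else (1.0 - t / n) * v

    def bcd(X, M, lam=0.1, n_iter=200, seed=0):
        rng = np.random.default_rng(seed)
        threeN, K = X.shape
        Phi = 0.01 * rng.standard_normal((threeN, M))
        A = 0.01 * rng.standard_normal((M, K))
        for _ in range(n_iter):
            # gradient step on ||X - Phi A||_F^2 w.r.t. Phi, then per-factor prox
            step = 0.5 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
            Phi -= step * 2.0 * (Phi @ A - X) @ A.T
            for m in range(M):
                Phi[:, m] = soft_threshold(Phi[:, m], lam * step * np.linalg.norm(A[m, :]))
            # gradient step w.r.t. A, then per-row prox (block soft thresholding)
            step = 0.5 / (np.linalg.norm(Phi, 2) ** 2 + 1e-12)
            A -= step * 2.0 * Phi.T @ (Phi @ A - X)
            for m in range(M):
                A[m, :] = block_soft_threshold(A[m, :], lam * step * np.linalg.norm(Phi[:, m]))
        return Phi, A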

2.6. Factor Splitting

Whilst solving problem (4) leads to smooth and sparse factors, there is no guarantee that the factors have only a single local support region. In fact, due to the theoretical motivation in section 2.4, the solution of eq. (4) is steered towards being low-rank. However, the price to pay for a low-rank solution is that capturing multiple support regions in a single factor is preferred over having each support region in an individual factor (e.g. for M = 2 and any a ≠ 0, b ≠ 0, the matrix Φ = [Φ1 Φ2] has a lower rank when Φ1 = [a^T b^T]^T and Φ2 = 0, compared to Φ1 = [a^T 0^T]^T and Φ2 = [0^T b^T]^T).

A simple yet effective way to deal with this issue is to split factors with multiple disconnected support regions into multiple (new) factors (see supplementary material). Since this is performed after the optimisation problem has been solved, it is preferable over pre- or intra-processing procedures [31, 23], since the optimisation remains unaffected and the data term in eq. (4) does not change due to the splitting.

3. Experimental Results

We compared the proposed method with PCA [11], kernel PCA (kPCA, cf. 3.2.2), Varimax [15], ICA [17], SPCA [18], SSPCA [18], and SPLOCS [23] on two real datasets, brain structures and human body shapes. Only our method and the SPLOCS method explicitly target to obtain local support factors, whereas the SPCA and SSPCA methods obtain sparse factors (for the latter the ℓ1/ℓ2 norm with groups defined by H, cf. eq. (7), is used). The methods kPCA, SPCA, SSPCA, SPLOCS and ours require to set various parameters, which we address by random sampling over the parameter space (see supplementary material).

For all evaluated methods a factorisation ΦA is obtained. W.l.o.g. we normalise the rows of A to have standard deviation one (since ΦA = ((1/s)Φ)(sA) for s ≠ 0). Then, the factors in Φ are ordered descendingly according to their ℓ2 norms.

Next we describe the scores used to quantify mesh reconstruction, generalisation, sparse reconstruction and specificity.

3.1. Quantitative Measures

For x = vec(X) and x̃ = vec(X̃), the average error

e_avg(x, x̃) = (1/N) Σ_{i=1}^{N} ‖X_{i,:} − X̃_{i,:}‖_2   (14)

and the maximum error

e_max(x, x̃) = max_{i=1,...,N} ‖X_{i,:} − X̃_{i,:}‖_2   (15)

measure the agreement between shape X and shape X̃.
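A direct transcription of eqs. (14) and (15), assuming that vec(·) stacks the x, y and z coordinate blocks as in section 2.2; the function name is a placeholder.

    import numpy as np

    def eavg_emax(x, x_tilde, N):
        # Per-vertex Euclidean distances between two vectorised shapes, eqs. (14), (15).
        D = x.reshape(3, N).T - x_tilde.reshape(3, N).T   # N x 3 coordinate differences
        d = np.linalg.norm(D, axis=1)
        return d.mean(), d.max()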

The reconstruction error for shape xk is measured by solving the overdetermined system Φαk = xk for αk in the least-squares sense, and then computing e_avg(xk, Φαk) and e_max(xk, Φαk), respectively.

To measure the specificity error, n_s shape samples are drawn randomly (n_s = 1000 for the brain shapes and n_s = 100 for the body shapes). For each drawn shape, the average and maximum errors between the closest shape in the training set are computed, which are denoted by s_avg and s_max, respectively. For simplicity, we assumed that the parameter vector α follows a zero-mean Gaussian distribution, where the covariance matrix C_α is estimated from the parameter vectors αk of the K training shapes. With that, a random shape sample x_r is generated by drawing a sample of the vector α_r from its distribution and setting x_r = Φα_r.
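A possible reading of this sampling step is sketched below: fit a zero-mean Gaussian to the training weight vectors and draw random shapes x_r = Φα_r (the comparison with the closest training shape via e_avg/e_max is omitted); variable names are hypothetical.

    import numpy as np

    def draw_random_shapes(Phi, A_train, n_s, seed=0):
        # Estimate C_alpha from the K training weight vectors (columns of A_train),
        # then draw n_s weight vectors from N(0, C_alpha) and map them through Phi.
        rng = np.random.default_rng(seed)
        C_alpha = np.cov(A_train, rowvar=True)                 # M x M covariance
        alphas = rng.multivariate_normal(np.zeros(Phi.shape[1]), C_alpha, size=n_s)
        return alphas @ Phi.T                                  # n_s x 3N random shape samples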

For evaluating the generalisation error, a collection of index sets I ⊂ 2^{1,...,K} is used, where each set J ∈ I denotes the set of indices of the test shapes for one run and |I| is the number of runs. We used five-fold cross-validation, i.e. |I| = 5 and each set J contains round(K/5) random integers from {1, . . . , K}. In each run, the deformation factors Φ^J̄ are computed using the shapes with indices J̄ = {1, . . . , K} \ J as training data. For all test shapes xj, where j ∈ J, the parameter vector αj is determined by solving Φ^J̄ αj = xj in the least-squares sense. Eventually, the average reconstruction error e_avg(xj, Φ^J̄ αj) and the maximum reconstruction error e_max(xj, Φ^J̄ αj) are computed for each test shape, which we denote as g_avg and g_max, respectively. Moreover, the sparse reconstruction errors g^0.05_avg and g^0.05_max are computed in a similar manner, with the difference that αj is now determined by using only 5% of the rows (selected randomly) of Φ^J̄ and xj, denoted by Φ̃^J̄ and x̃j. For that, we minimise ‖Φ̃^J̄ αj − x̃j‖²_2 + ‖Γαj‖²_2 with respect to αj, which is a least-squares problem with Tikhonov regularisation, where Γ is obtained by Cholesky factorisation of C_α = Γ^T Γ. The purpose of this measure is to evaluate how well a deformation model interpolates an unseen shape from a small subset of its points, which is relevant for shape model-based surface interpolation [3].
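The Tikhonov-regularised least-squares step can be written as a stacked least-squares problem; the sketch below assumes the row-subsampled factor matrix and shape vector are already given and uses hypothetical names.

    import numpy as np

    def sparse_reconstruction_weights(Phi_sub, x_sub, C_alpha):
        # Solve min_alpha ||Phi_sub alpha - x_sub||_2^2 + ||Gamma alpha||_2^2,
        # where C_alpha = Gamma^T Gamma (Cholesky factorisation).
        Gamma = np.linalg.cholesky(C_alpha).T                  # upper-triangular factor
        A = np.vstack([Phi_sub, Gamma])
        b = np.concatenate([x_sub, np.zeros(Gamma.shape[0])])
        alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
        return alpha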


3.2. Brain Structures

The first evaluated dataset comprises 8 (bilateral) brain structure meshes of K = 17 subjects. All 8 meshes together have a total number of N = 1792 vertices that are in correspondence among all 17 subjects. Moreover, all meshes have the same topology, i.e. seen as graphs they are isomorphic. A single deformation model is used to model the deformation of all 8 meshes in order to capture the interrelation between the brain structures. We fix the number of desired factors to M = 96 to account for a sufficient amount of local details in the factors. Whilst the meshes are smooth and comparably simple in shape (cf. Fig. 3), a particular challenge is that the training dataset comprises only K = 17 shapes.

3.2.1 Second-order Terms

Based on anatomical knowledge, we use the brain structure interrelation graph G_bs = (V_bs, E_bs) shown in Fig. 2, where an edge between two structures denotes that a deformation of one structure may have a direct effect on the deformation of the other structure.

Figure 2. Brain structure interrelation graph. Nodes: SN+STN_L, SN+STN_R, NR_L, NR_R, Th_L, Th_R, Put+GP_L, Put+GP_R, where SN+STN denotes the substantia nigra and subthalamic nucleus, NR the nucleus ruber, Th the thalamus, and Put+GP the putamen and globus pallidus (left and right hemisphere, respectively).

Using G_bs, we now build a distance matrix that is then used in the SPLOCS method and our method. For o ∈ V_bs, let g_o ⊂ {1, . . . , N} denote the (ordered) indices of the vertices of brain structure o. Let D_euc ∈ R^{N×N} be the Euclidean distance matrix, where (D_euc)_{ij} = ‖X̄_{i,:} − X̄_{j,:}‖_2 is the Euclidean distance between vertices i and j of the mean shape X̄. Moreover, let D_geo ∈ R^{N×N} be the geodesic graph distance matrix of the mean shape X̄ using the graph induced by the (shared) mesh topology. By D^o_euc ∈ R^{|g_o|×|g_o|} and D^o_geo ∈ R^{|g_o|×|g_o|} we denote the Euclidean distance matrix and the geodesic distance matrix of brain structure o, which are submatrices of D_euc and D_geo, respectively. Let d̄_o denote the average vertex distance between neighbour vertices of brain structure o. We define the normalised geodesic graph distance matrix of brain structure o by D̄^o_geo = (1/d̄_o) D^o_geo, and the matrix D̄_geo ∈ R^{N×N} is composed by the individual blocks D̄^o_geo. The normalised distance matrix between structures o and ō is given by D^{o,ō}_bs = (2/(d̄_o + d̄_ō)) (D_euc)_{g_o,g_ō} ∈ R^{|g_o|×|g_ō|}. The (symmetric) distance matrix D_bs ∈ R^{N×N} between all structures is constructed by

(D_bs)_{g_o,g_ō} = { D^{o,ō}_bs  if (o, ō) ∈ E_bs ;  0_{|g_o|×|g_ō|}  else } .   (16)

For the SPLOCS method we used the distance matrix D = α_D D̄_geo + (1 − α_D) D_bs. For our method, we construct the graph G = (V, E, ω) (cf. section 2.2) by having an edge e = (i, j) in E for each ω_e = α_D exp(−(D̄_geo)²_{ij}) + (1 − α_D) exp(−(D_bs)²_{ij}) that is larger than the threshold θ. We used θ = 0.1 to ignore edges with low weights in order to reduce the runtime of computing the proximal operator of ‖·‖_Φ. In both cases we set α_D = 0.5.

3.2.2 Dealing with the Small Training Set

For the PCA, Varimax and ICA methods the number of factors cannot be larger than K. For numerical reasons, when using the PCA and Varimax methods we set M such that 99.999% of the variance in the training data can be explained. Moreover, when we use our method, we eventually remove factors that are close to zero. The final number of factors M for each method is presented in Fig. 3.

For the used matrix factorisation framework it has been reported in [14] that setting M to a value larger than the expected rank of the solution but smaller than full rank has empirically led to good results. However, since our expected rank is M = 96 and the full rank is smaller than or equal to K = 17, this is not possible.

We deal with this issue by compensating the insufficient amount of training data by assuming smoothness of the deformations. Based on concepts introduced in [12, 33, 34], instead of the data matrix X, we factorise a modified covariance matrix K, which we refer to as kernelised covariance matrix. The standard PCA method finds the M most dominant eigenvectors of the covariance matrix C by the (exact) factorisation C = Φ diag(λ_1, . . . , λ_M) Φ^T. Using the kernelised covariance K, one can incorporate additional elasticity into the resulting linear deformation model. For example, as shown in [34], using K = I results in independent vertex movements. A more interesting approach is to combine the (scaled) covariance matrix with a smooth vertex deformation. We define K = (1/‖vec(XX^T)‖_∞) XX^T + K_euc, with K_euc = I_3 ⊗ K′_euc ∈ R^{3N×3N}. The matrix K′_euc is given by the elements

(K′_euc)_{ij} = exp(−((D_euc)_{ij} / (2β ‖vec(D_euc)‖_∞))²) ,   (17)

where β is the bandwidth. Setting Φ to the M most dominant eigenvectors of the symmetric and positive semi-definite [4] matrix K gives the solution of kernel PCA (kPCA) [4]. In terms of our proposed matrix factorisation problem given in eq. (4), we find a factorisation ΦA of K, instead of factorising the data matrix X. Since the regularisation term remains unchanged, the factor matrix Φ is still sparse and smooth (due to the norm ‖·‖_Φ). Moreover, due to the nuclear norm term being contained in the product ‖·‖_Φ ‖·‖_A (cf. section 2.4), the resulting factorisation ΦA is steered towards being low-rank.
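A hedged sketch of the kernelised covariance and of the kPCA baseline described above; X is the mean-free data matrix, D_euc the Euclidean distance matrix of the mean shape, and β the bandwidth of eq. (17).

    import numpy as np

    def kernelised_covariance(X, D_euc, beta):
        # K = XX^T / ||vec(XX^T)||_inf + I_3 kron K'_euc, with K'_euc from eq. (17).
        XXt = X @ X.T
        K_euc_prime = np.exp(-(D_euc / (2.0 * beta * np.abs(D_euc).max())) ** 2)
        return XXt / np.abs(XXt).max() + np.kron(np.eye(3), K_euc_prime)

    def kpca_factors(K, M):
        # kPCA: M most dominant eigenvectors of the symmetric PSD matrix K.
        eigvals, eigvecs = np.linalg.eigh(K)
        return eigvecs[:, ::-1][:, :M]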


Figure 3. The colour-coded magnitude (blue corresponds to zero, yellow to the maximum deformation in each plot) for the three deformation factors with largest ℓ2 norm is shown in the rows. The factors obtained by SPCA and SSPCA are sparse but not spatially localised (see red arrows). Our method is the only one that obtains local support factors. (Best viewed on screen when zoomed in)

Figure 4. Boxplots of the considered quantitative measures for the brain shapes dataset (reconstruction error, specificity error, generalisation error and sparse reconstruction error; top row: e_avg, s_avg, g_avg, g^0.05_avg in mm, bottom row: e_max, s_max, g_max, g^0.05_max in mm). In each plot, the horizontal axis shows the individual methods and the vertical axis the error scores described in section 3.1. Compared to SPLOCS, which is the only other method explicitly striving for local support deformation factors, our method has a smaller generalisation error and sparse reconstruction error. The sparse but not spatially localised factors obtained by SPCA and SSPCA (cf. Fig. 3) result in a large maximum sparse reconstruction error (bottom right).

However, the resulting matrix A now represents the weights for approximating the kernelised covariance K by a linear combination of the columns of Φ, rather than approximating the data matrix X by a linear combination of the columns of Φ. For the known Φ, the weights that best approximate the data matrix X are found by solving the linear system ΦA = X in the least-squares sense for A.

3.2.3 Results

The magnitude of the deformation of the first three factors is shown in Fig. 3, where it can be seen that only SPCA, SSPCA, SPLOCS and our method obtain sparse deformation factors. The SPCA and SSPCA methods do not incorporate the spatial relation between vertices and as such the deformations are not spatially localised (see the red arrows in Fig. 3, where more than a single region is active). The factors obtained by the SPLOCS method are non-smooth and do not exhibit local support, in contrast to our method, where smooth deformation factors with local support are obtained.

The quantitative results presented in Fig. 4 reveal that our method has a larger reconstruction error. This can be explained by the fact that due to the sparsity and smoothness of the deformation factors a very large number of basis vectors is required in order to exactly span the subspace of the training data. Instead, our method finds a simple (sparse and smooth) representation that explains the data approximately, in favour of Occam's razor. Also, the average reconstruction error remains below 1 mm, which is quite low considering that from left to right the brain structures span approximately 6 cm. Considering specificity, all methods perform (roughly) in the same range. Looking at the generalisation score, it can be seen that PCA, Varimax and ICA, which have the lowest reconstruction errors, have the highest values regarding generalisation, which underlines that these methods overfit to the small training dataset. The kPCA method is able to overcome this issue due to the smoothness assumption. SPCA and SSPCA have good generalisation scores, however, these methods have a very high maximum reconstruction error, which can be seen in the bottom right of Fig. 4. Our method and SPLOCS are the only ones that explicitly strive for local support factors. Since our method outperforms SPLOCS with respect to generalisation and the reconstruction error, we claim that our method outperforms the state of the art.


Figure 6. The visualisation of the deformation magnitudes shows that the SPCA, SSPCA, SPLOCS and our method obtain local support factors (the second factor of our method has local support since the connection is at the back of the body shape). The bottom row depicts randomly drawn shape instances, where it can be seen that only the methods that obtain local support deformation factors result in plausible shapes. (Best viewed on screen when zoomed in)


3.3. Human Body Shapes

Our second experiment is based on 1531 female human body shapes [36], where each shape comprises 12500 vertices that are in correspondence among the training set. Due to the large number of training data and the high level of details in the meshes (see Fig. 6), a kernelisation as done in section 3.2.2 is not used and we directly factorise the data matrix X. The graph G for our method is created in the same manner as for the brain shape dataset. The difference is that now D̄_geo = (1/d̄) D_geo is used to compute the edge weights, where d̄ denotes the average vertex distance between neighbour vertices, and α_D = 1.
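For the body-shape setting (α_D = 1), the graph construction reduces to thresholded weights derived from normalised geodesic distances; the following sketch uses placeholder names and 0-based vertex indices and is only an illustration of this step.

    import numpy as np

    def graph_from_geodesic_distances(D_geo, d_bar, theta=0.1):
        # Edge weights omega_e = exp(-(D_geo / d_bar)^2); edges with weight <= theta
        # are discarded (cf. sections 2.2, 3.2.1 and 3.3).
        W = np.exp(-(D_geo / d_bar) ** 2)
        ii, jj = np.where(np.triu(W, k=1) > theta)   # undirected edges, no self-edges
        return list(zip(ii, jj)), W[ii, jj]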

3.3.1 Results

As shown in Fig. 5, quantitatively the evaluated methods have comparable performance, with the exception that ICA has worse overall performance. The most noticeable difference between the methods is the specificity error, where SPCA, SSPCA and our method perform best.

In Fig. 6 it can be seen that for SPCA, SSPCA, SPLOCS and our method the deformation factors have local support.

Figure 7. Shapes x̄ − 1.5 Φm for SPCA (m=1), SSPCA (m=3), SPLOCS (m=1) and our (m=1) method (cf. Fig. 6). Our method delivers the most realistic per-factor deformations. (Best viewed on screen when zoomed in)

So it seems that for reasonably large datasets sparsity alone, as used in SPCA and SSPCA, is sufficient to account for local support factors. However, our method is the only one that explicitly strives for smoothness of the factors, which leads to more realistic human body deformations, as demonstrated in Fig. 7.

4. Conclusion

We presented a novel approach for learning a linear deformation model from training shapes, where the resulting factors exhibit local support. By embedding sparsity and smoothness regularisers into a theoretically well-grounded matrix factorisation framework, we model local support regions implicitly, and thus get rid of the initialisation of the size and location of local support regions, which so far has been necessary in existing methods. On the brain shapes dataset our method improves the state of the art with respect to generalisation and sparse reconstruction. For the body shapes dataset, quantitatively our method is on par with existing methods, whilst it delivers more realistic per-factor deformations. Since articulated motions violate our smoothness assumption, our method cannot handle them. However, when smooth deformations are a reasonable assumption, our method offers a higher flexibility and better interpretability compared to existing methods, whilst at the same time delivering more realistic deformations.

5. Acknowledgements

The authors would like to thank Yipin Yang and colleagues for making the human body shapes dataset publicly available. We also thank Benjamin D. Haeffele and Rene Vidal for providing their code.

We would also like to thank Thomas Buhler and Daniel Cremers for their helpful comments on the manuscript, as well as Luis Salamanca for helping to improve our figures.

The authors gratefully acknowledge the financial support by the Fonds National de la Recherche, Luxembourg (6538106, 8864515).


Figure 5. Boxplots of the considered quantitative measures in each column for the body shapes dataset (reconstruction error, specificity error, generalisation error and sparse reconstruction error; top row: e_avg, s_avg, g_avg, g^0.05_avg in cm, bottom row: e_max, s_max, g_max, g^0.05_max in cm). Quantitatively all methods have comparable performance, apart from ICA which performs worse.

References

[1] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Convex optimization with sparsity-inducing norms. Optimization for Machine Learning, pages 19-53, 2011.
[2] F. Bach, J. Mairal, and J. Ponce. Convex sparse matrix factorizations. arXiv.org, 2008.
[3] F. Bernard, L. Salamanca, J. Thunberg, F. Hertel, J. Goncalves, and P. Gemmar. Shape-aware 3D Interpolation using Statistical Shape Models. In Proceedings of Shape Symposium, Delemont, Sept. 2015.
[4] C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2009.
[6] S. Burer and R. D. Monteiro. Local minima and convergence in low-rank semidefinite programming. Mathematical Programming, 103(3):427-444, 2005.
[7] E. J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
[8] Y. Cao, A. Shapiro, P. Faloutsos, and F. Pighin. Motion editing with independent component analysis. Visual Computer, 2007.
[9] P. L. Combettes, D. Dung, and B. C. Vu. Dualization of signal recovery problems. Set-Valued and Variational Analysis, 18(3-4):373-404, 2010.
[10] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering, pages 185-212. Springer, 2011.
[11] T. F. Cootes and C. J. Taylor. Active Shape Models - Smart Snakes. In British Machine Vision Conference, pages 266-275. Springer-Verlag, 1992.
[12] T. F. Cootes and C. J. Taylor. Combining point distribution models with shape models based on finite element analysis. Image and Vision Computing, 1995.
[13] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272-279. ACM, 2008.
[14] B. Haeffele, E. Young, and R. Vidal. Structured low-rank matrix factorization: Optimality, algorithm, and applications to image processing. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 2007-2015, 2014.
[15] H. H. Harman. Modern factor analysis. University of Chicago Press, 1976.
[16] M. Hein and T. Buhler. An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. In Advances in Neural Information Processing Systems 23, pages 847-855, 2010.
[17] A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, Apr. 2001.
[18] R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. AISTATS, 2009.
[19] B. Jiang, C. Ding, B. Luo, and J. Tang. Graph-Laplacian PCA: Closed-form solution and robustness. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3492-3498. IEEE, 2013.
[20] C. Last, S. Winkelbach, F. M. Wahl, K. W. Eichhorn, and F. Bootz. A locally deformable statistical shape model. In Machine Learning in Medical Imaging, pages 51-58. Springer, 2011.
[21] M. Meyer and J. Anderson. Key point subspace acceleration and soft caching. ACM Transactions on Graphics (TOG), 26(3):74, 2007.
[22] T. Neumann, K. Varanasi, C. Theobalt, M. Magnor, and M. Wacker. Compressed manifold modes for mesh processing. In Computer Graphics Forum, pages 35-44. Wiley Online Library, 2014.
[23] T. Neumann, K. Varanasi, S. Wenger, M. Wacker, M. Magnor, and C. Theobalt. Sparse localized deformation components. ACM Transactions on Graphics (TOG), 32(6):179, 2013.
[24] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 2013.
[25] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471-501, 2010.
[26] N. Shahid, V. Kalofolias, X. Bresson, M. Bronstein, and P. Vandergheynst. Robust Principal Component Analysis on Graphs. arXiv.org, Apr. 2015.
[27] K. Sjostrand, E. Rostrup, C. Ryberg, R. Larsen, C. Studholme, H. Baezner, J. Ferro, F. Fazekas, L. Pantoni, D. Inzitari, and G. Waldemar. Sparse Decomposition and Modeling of Anatomical Shape Variation. Medical Imaging, IEEE Transactions on, 26(12):1625-1635, 2007.
[28] M. B. Stegmann, K. Sjostrand, and R. Larsen. Sparse modeling of landmark and texture variability using the orthomax criterion. In Medical Imaging, pages 61441G-61441G. International Society for Optics and Photonics, 2006.
[29] A. Suinesiaputra, A. F. Frangi, M. Uzumcu, J. H. Reiber, and B. P. Lelieveldt. Extraction of myocardial contractility patterns from short-axes MR images using independent component analysis. In Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, pages 75-86. Springer, 2004.
[30] R. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146-160, 1972.
[31] J. R. Tena, F. De la Torre, and I. Matthews. Interactive region-based linear 3D face models. ACM Transactions on Graphics, 2011.
[32] M. Uzumcu, A. F. Frangi, M. Sonka, J. H. C. Reiber, and B. P. F. Lelieveldt. ICA vs. PCA Active Appearance Models: Application to Cardiac MR Segmentation. MICCAI, pages 451-458, 2003.
[33] Y. Wang and L. H. Staib. Boundary finding with correspondence using statistical shape models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 338-345, 1998.
[34] Y. Wang and L. H. Staib. Boundary finding with prior shape and smoothness models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(7):738-743, 2000.
[35] Y. Xu and W. Yin. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6(3):1758-1789, 2013.
[36] Y. Yang, Y. Yu, Y. Zhou, S. Du, J. Davis, and R. Yang. Semantic Parametric Reshaping of Human Body Models. In Proceedings of the 2014 Second International Conference on 3D Vision, Volume 02, pages 41-48. IEEE Computer Society, 2014.
[37] R. Zewail, A. Elsafi, and N. Durdle. Wavelet-Based Independent Component Analysis For Statistical Shape Modeling. In Canadian Conference on Electrical and Computer Engineering, pages 1325-1328. IEEE, 2007.
[38] H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265-286, 2006.


A. Supplementary Material

A.1. The matrix E in eq. (8)

The matrix E ∈ R^{3|E|×3N} is defined as E = I_3 ⊗ E′, with E′ ∈ R^{|E|×N} being the weighted incidence matrix of the graph G with elements

E′_{pq} = { √ω_{e_p}  if q = i for e_p = (i, j) ;  −√ω_{e_p}  if q = j for e_p = (i, j) ;  0  otherwise } ,   (18)

where e_p ∈ E with p = 1, . . . , |E| denotes the p-th edge.
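A sparse construction of E′ and E = I_3 ⊗ E′ following eq. (18), under the assumption of 0-based vertex indices; this is a sketch, not the authors' code.

    import numpy as np
    from scipy import sparse

    def incidence_matrices(edges, weights, N):
        # Weighted incidence matrix E' of eq. (18) and E = I_3 kron E'.
        rows, cols, vals = [], [], []
        for p, ((i, j), w) in enumerate(zip(edges, weights)):
            rows += [p, p]
            cols += [i, j]
            vals += [np.sqrt(w), -np.sqrt(w)]
        E_prime = sparse.coo_matrix((vals, (rows, cols)), shape=(len(edges), N)).tocsr()
        E = sparse.kron(sparse.eye(3), E_prime, format="csr")
        return E_prime, E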

A.2. Proximal Operators

Definition 1. The proximal operator (or proximal mapping) prox_{sθ}(y) : R^n → R^n of a function θ : R^n → R scaled by s > 0 is defined by

prox_{sθ}(y) = argmin_x (1/(2s)) ‖x − y‖²_2 + θ(x) .   (19)

A.2.1 Proximal Operator of ‖ · ‖A

The proximal operator prox_{‖·‖_A}(y) of ‖·‖_A = λ_A ‖·‖_2, cf. eq. (5), can be computed by so-called block soft thresholding [24], i.e.

prox_{s‖·‖_2}(y) = (1 − s/‖y‖_2)_+ y ,   (20)

where for a vector x ∈ R^p, (x)_+ replaces each negative element in x with 0. As such, a very efficient way for computing the proximal mapping of ‖·‖_A is available.

A.2.2 Proximal Operator of ‖ · ‖Φ

The proximal mapping of

‖·‖_Φ = λ_1 ‖·‖_1 + λ_2 ‖·‖_2 + λ_∞ ‖·‖^H_{1,∞} + λ_G ‖E·‖_2 ,

cf. eq. (6), is given by

prox_{‖·‖_Φ}(z) = argmin_x (1/2) ‖x − z‖²_2 + λ_1 ‖x‖_1 + λ_2 ‖x‖_2 + λ_∞ ‖x‖^H_{1,∞} + λ_G ‖Ex‖_2 ,   (21)

which is a less trivial case, mainly because of the non-separability caused by the linear mapping E inside the ℓ2 norm. However, by introducing

f(x) = λ_1 ‖x‖_1 + λ_2 ‖x‖_2 + λ_∞ ‖x‖^H_{1,∞}   (22)

and

g(x) = λ_G ‖x‖_2 ,   (23)

eq. (21) can be rewritten as

argmin_x (1/2) ‖x − z‖²_2 + f(x) + g(Ex) .   (24)

In this form, (24) can now be solved by a dual forward-backward splitting procedure [9, 10] as shown in Algorithm 2. The efficient computability then hinges on the efficient computation of the (individual) proximal mappings of f and g.

Input: z ∈ R^{3N}
Output: x = prox_{‖·‖_Φ}(z)
Parameters: λ, λ_1, λ_∞, λ_2, λ_G
Initialise: v, ε ← 1e−4, γ ← 1.999, β = (1+ε)/2
// Normalise E (homogeneity of norm)
λ_G ← λ_G ‖E‖_F ;  E ← E/‖E‖_F
repeat
  x ← prox_{λ_1‖·‖_1}(z − E^T v)   // ℓ1 prox, (25)
  x ← prox_{λ_∞‖·‖^H_{1,∞}}(x)   // ℓ1/ℓ∞ prox, [1, 13]
  x ← prox_{λ_2‖·‖_2}(x)   // ℓ2 prox, (20)
  v ← v + βγ (Ex − prox_{λ_G/γ ‖·‖_2}((v + γEx)/γ))   // †
until convergence

Algorithm 2: Dual forward-backward splitting algorithm to compute the proximal mapping of ‖·‖_Φ.

Since g is a (weighted) ℓ2 norm, its proximal mapping is given by the block soft thresholding presented in eq. (20).

In f, the sum of weighted ℓ1 and ℓ1/ℓ∞ norms is an instance of the sparse group lasso and can be computed by applying the soft thresholding [24]

prox_{s‖·‖_1}(y) = (y − s)_+ − (−y − s)_+   (25)

first, followed by group soft thresholding [1]. As shown in [14, Thm. 3], the ℓ2 term can be additionally incorporated by composition, i.e. by subsequently applying block soft thresholding as presented in (20).

Now, for solving the group soft thresholding for the proximal mapping of the ℓ1/ℓ∞ norm, one can use the fact that for any norm ω with dual norm ω*, prox_{sω}(y) = y − proj_{ω*≤s}(y), where proj_{ω*≤s}(y) is the projection of y onto the ω* norm ball with radius s [1]. The dual norm of the ℓ1/ℓ∞ norm, eq. (7), is the ℓ∞/ℓ1 norm

‖z‖^H_{∞,1} = max_{g∈H} ‖z_g‖_1 .   (26)

The orthogonal projection proj_{‖·‖^H_{∞,1}≤s} onto the ℓ∞/ℓ1 ball is obtained by projecting separately each subvector z_g onto the ℓ1 ball in R^{|g|} [1], demanding an efficient projection onto the ℓ1 norm ball. Due to the special structure of our groups in H, i.e. there are exactly N non-overlapping groups, each of them comprising three elements, a vectorised Matlab implementation of the method by Duchi et al. [13] can be employed.
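Assuming the standard sorting-based ℓ1-ball projection of Duchi et al. [13], the per-group prox of the ℓ1/ℓ∞ norm can be sketched as follows; a loop over vertices is used here for clarity instead of a vectorised implementation, and all names are our own.

    import numpy as np

    def project_l1_ball(v, s):
        # Euclidean projection of v onto the l1 ball of radius s [13].
        if np.abs(v).sum() <= s:
            return v
        u = np.sort(np.abs(v))[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u - (css - s) / np.arange(1, len(v) + 1) > 0)[0][-1]
        tau = (css[rho] - s) / (rho + 1.0)
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def prox_group_l1_linf(z, s, N):
        # prox of s*||.||^H_{1,inf}: per vertex group g = {i, i+N, i+2N},
        # prox(z_g) = z_g - proj_{||.||_1 <= s}(z_g).
        Z = z.reshape(3, N)
        out = np.empty_like(Z)
        for i in range(N):
            out[:, i] = Z[:, i] - project_l1_ball(Z[:, i], s)
        return out.reshape(-1)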


Definition 2. (Convex Conjugate) Let θ*(y) = sup_x (y^T x − θ(x)) be the convex conjugate of θ.

The update of the dual variable v in Algorithm 2 (see the line marked with †) is based on the update

v ← v + β (prox_{γ(λ_G‖·‖_2)*}(v + γEx) − v) ,   (27)

presented in [10, 9], where (·)* denotes the convex conjugate.

Let us introduce some tools first.

Lemma 2. (Extended Moreau Decomposition) [24] It holds that

∀y : y = prox_{sθ}(y) + s prox_{θ*/s}(y/s) .   (28)

Lemma 3. (Conjugate of conjugate) [5] For closed convex θ, it holds that θ** = θ.

Corollary 1. For convex and closed θ, it holds that

∀y : y = prox_{sθ*}(y) + s prox_{θ/s}(y/s) .   (29)

Proof. Define θ′ = θ* and apply Lemma 2 and Lemma 3 with θ′ in place of θ.

Since for our choice of θ = λ_G ‖·‖_2, θ is a closed convex function, by Corollary 1 one can write

prox_{γ(λ_G‖·‖_2)*}(y) = y − γ prox_{λ_G/γ ‖·‖_2}(y/γ) .   (30)

With that, the right-hand side of eq. (27) can be written as

v + β (v + γEx − γ prox_{λ_G/γ ‖·‖_2}((v + γEx)/γ) − v)
= v + β (γEx − γ prox_{λ_G/γ ‖·‖_2}((v + γEx)/γ))
= v + βγ (Ex − prox_{λ_G/γ ‖·‖_2}((v + γEx)/γ)) .   (31)

A.3. Factor Splitting

The factor splitting procedure is presented in Algorithm 3.

Input: factor φ ∈ R^{N×3} where vec(φ) = Φm, graph G = (V, E)
Output: J factors Φ′ ∈ R^{3N×J} with local support
Initialise: E′ = ∅, Φ′ = [ ]
// build activation graph G′
foreach (i, j) = e ∈ E do
  if φ_{i,:} ≠ 0 ∨ φ_{j,:} ≠ 0 then   // vertex i or j is active
    E′ = E′ ∪ {e}   // add edge
G′ = (V, E′)
// find connected components [30]
C = connectedComponents(G′)
foreach c ∈ C ⊂ V do   // add new factor
  φ′ = 0_{N×3}
  φ′_{c,:} = φ_{c,:}
  Φ′ = [Φ′, vec(φ′)]

Algorithm 3: Factor splitting procedure.
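A possible realisation of Algorithm 3 using scipy's connected components; the factor is given as an N x 3 array, vertex indices are 0-based, and all names are placeholders rather than the authors' implementation.

    import numpy as np
    from scipy import sparse
    from scipy.sparse.csgraph import connected_components

    def split_factor(phi, edges, N):
        # Split one factor phi (N x 3) into one new factor per connected component
        # of the activation graph G' (cf. Algorithm 3).
        active = np.any(phi != 0, axis=1)
        act_edges = [(i, j) for (i, j) in edges if active[i] or active[j]]
        rows = [i for i, _ in act_edges]
        cols = [j for _, j in act_edges]
        G = sparse.coo_matrix((np.ones(len(act_edges)), (rows, cols)), shape=(N, N))
        n_comp, labels = connected_components(G, directed=False)
        factors = []
        for c in range(n_comp):
            idx = (labels == c) & active
            if not idx.any():
                continue                     # component without active vertices
            phi_c = np.zeros_like(phi)
            phi_c[idx] = phi[idx]
            factors.append(phi_c.reshape(-1, order="F"))   # vec() stacks columns
        return factors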

A.4. Parameter Random Sampling

The methods kPCA, SPCA, SSPCA, SPLOCS and ours require to set various parameters. In order to find a good parametrisation we conducted random sampling over the parameter space, where we determined reasonable ranges for the parameters experimentally.

In Table 1 the assumed distributions of the parameters for each method are summarised, where U(a, b) denotes the uniform distribution with the open interval (a, b) as support, N(μ, σ²) denotes the normal distribution with mean μ and variance σ², 10^{U(·,·)} is the distribution of the random variable y = 10^x with x ∼ U(·, ·), and dmax denotes the largest distance between all pairs of vertices of the mean shape X̄.

Method    Parameter Distribution
kPCA      β ∼ (U(1, 10))^{−1}
SPCA      λ ∼ (1/(3N)) · 10^{U(−4,−3)} (see eq. (2) in [18])
SSPCA     λ ∼ (1/N) · 10^{U(−4,−3)} (see eq. (2) in [18])
SPLOCS    λ ∼ (3K) · 10^{U(−4,−3)}; dmin ∼ c − w; dmax ∼ c + w (see eq. (6) in [23]), where c ∼ U(0.1, dmax − 0.1) and w ∼ U(0, min(|c − 0.1|, |dmax − 0.1 − c|))
our       β ∼ (U(1, 10))^{−1}; λ ∼ (3NK/M) · max(10^{−8}, N(64, 25)); λ_G ∼ (3|E|)^{−0.5} · 10^{U(−2,1)}; λ_1 ∼ (3N)^{−0.5} · 10^{U(−2,1)}; λ_2 ∼ (3N)^{−0.5} · 10^{U(−2,1)}; λ_∞ ∼ (2N)^{−0.5} · 10^{U(−2,1)}; λ_A ∼ K^{−0.5} · 10^{U(−5,2)}

Table 1. Assumed distributions of the parameters interpreted as random variables.

For each of the n_r random draws (we set n_r = 500 for the brain shapes dataset, and n_r = 50 for the human body shapes, due to the large size of the dataset), we compute the 8 scores described in section 3.1 and store them in the score matrix S ∈ R^{n_r×8}. After (linearly) mapping each n_r-dimensional column vector in S onto the interval [0, 1], the best parametrisation is determined by finding the index of the smallest value of the vector S 1_8 ∈ R^{n_r}.
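The score aggregation can be sketched as follows; S is the n_r x 8 score matrix, and the small constant guarding against constant columns is our own addition.

    import numpy as np

    def best_parametrisation(S):
        # Map each score column linearly onto [0, 1], sum the normalised scores per
        # random draw, and return the index of the draw with the smallest sum (S 1_8).
        S_norm = (S - S.min(axis=0)) / (S.max(axis=0) - S.min(axis=0) + 1e-12)
        return int(np.argmin(S_norm.sum(axis=1)))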