Gaussian Graphical Models with latent structure


Penalized Maximum Likelihood Inference for Sparse Gaussian Graphical Models with Latent Structure

Christophe Ambroise, Julien Chiquet and Catherine Matias

Laboratoire Statistique et Génome, La Génopole, Université d'Évry

Statistique et Santé Publique seminar, 13 January 2009


Biological networks
Different kinds of biological interactions

Families of networks:
- protein-protein interactions,
- metabolic pathways,
- regulation networks.

Figure: regulation example, the SOS network of E. coli, with genes lexA, dinI, recF, rpD, rpH, SsB, recA, umD, rpS.

Let us focus on regulatory networks . . . and look for an influence network.


What questions?

Figure: mind map of questions on networks. Network splits into Inference (supervised or unsupervised) and Structure (degree distribution; community analysis, via statistical models or spectral clustering).

- How to find the interactions?
- What knowledge can the structure provide?
- Given a new node, what are its interactions with the known nodes?
- Given two nodes, do they interact?
- What are the communities' characteristics?


Problem
Infer the interactions between genes from microarray data

Microarray gene expression data: p genes, n experiments. Which ones interact/co-express?

Figure: example network on ten genes G0, . . . , G9.

Major issues
- combinatorics: $2^{p^2}$ possible graphs;
- dimension problem: $n \ll p$.

Here, we reduce p by restricting attention to a fixed set of genes of interest.


Our ideas to tackle these issues

Introduce a prior taking the topology of the network into account for better edge inference.

Figure: the ten-gene network, with nodes subsequently grouped into clusters (A1-A3, B1-B5, C1-C2).

Relying on biological constraints:
1. few genes effectively interact (sparsity),
2. networks are organized (latent structure).


Outline

Give the network a model
  Gaussian graphical models
  Providing the network with a latent structure
  The complete likelihood

Inference strategy by alternate optimization
  The E-step: estimation of the latent structure
  The M-step: inferring the connectivity matrix

Numerical experiments
  Synthetic data
  Breast cancer data


GGMs
General settings

The Gaussian model
- Let $X \in \mathbb{R}^p$ be a random vector such that $X \sim \mathcal{N}(0_p, \Sigma)$;
- let $(X^1, \dots, X^n)$ be an i.i.d. size-$n$ sample (e.g., microarray experiments);
- let $\mathbf{X}$ be an $n \times p$ matrix such that $(X^k)^\top$ is the $k$th row of $\mathbf{X}$;
- let $K = (K_{ij})_{(i,j) \in \mathcal{P}^2} := \Sigma^{-1}$ be the concentration matrix.

The graphical interpretation

$$X_i \perp\!\!\!\perp X_j \mid X_{\mathcal{P} \setminus \{i,j\}} \iff K_{ij} = 0 \iff \text{edge } (i,j) \notin \text{network},$$

since $r_{ij \mid \mathcal{P} \setminus \{i,j\}} = -K_{ij}/\sqrt{K_{ii} K_{jj}}$.

$K$ describes the graph of conditional dependencies.

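To make this correspondence concrete, here is a minimal numerical sketch (Python/NumPy, not part of the original slides; the function name is ours): it turns a concentration matrix into partial correlations and checks that a zero entry of K yields conditional independence.

```python
import numpy as np

def partial_correlations(K):
    """Partial correlation matrix r_ij = -K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.diag(K))
    R = -K / np.outer(d, d)
    np.fill_diagonal(R, 1.0)  # convention: unit diagonal
    return R

# Toy example: a 3-node chain 1 -- 2 -- 3 (no direct 1--3 edge).
K = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, 0.4],
              [0.0, 0.4, 1.0]])
R = partial_correlations(K)
print(R)  # R[0, 2] == 0: X_1 and X_3 are conditionally independent given X_2
```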

GGMs and regression
Network inference as p independent regression problems

One may use p different linear regressions

$$X_i = (X_{\setminus i})^\top \alpha + \varepsilon, \quad \text{where } \alpha_j = -K_{ij}/K_{ii}.$$

Meinshausen and Bühlmann's approach (2006)
Solve p independent Lasso problems (the $\ell_1$-norm enforces sparsity):

$$\hat\alpha = \arg\min_{\alpha} \frac{1}{n} \left\| \mathbf{X}_i - \mathbf{X}_{\setminus i}\, \alpha \right\|_2^2 + \rho \, \|\alpha\|_{\ell_1},$$

where $\mathbf{X}_i$ is the $i$th column of $\mathbf{X}$, and $\mathbf{X}_{\setminus i}$ is the full matrix with the $i$th column removed.

Major drawback: a symmetrization step is needed to obtain a final estimate of $K$.

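A hedged sketch of this neighborhood-selection idea (using scikit-learn's Lasso; the OR-rule symmetrization shown here is one common post-processing choice, not necessarily the authors'):

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, rho):
    """Meinshausen-Buhlmann-style edge selection: one Lasso per node."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        # Regress gene i on all remaining genes with an l1 penalty.
        fit = Lasso(alpha=rho, fit_intercept=False).fit(X[:, others], X[:, i])
        adj[i, others] = fit.coef_ != 0
    # The p regressions need not agree; symmetrize (here: OR rule).
    return adj | adj.T

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
print(neighborhood_selection(X, rho=0.1).sum())
```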

GGMs and the Lasso
Solving p penalized regressions ⇔ maximizing the penalized pseudo-likelihood

Consider the approximation $\tilde P(\mathbf{X}) = \prod_{i=1}^p P(\mathbf{X}_i \mid \mathbf{X}_{\setminus i})$.

Proposition
The solution to
$$\hat K = \arg\max_{K,\, K_{ij} \neq K_{ji}} \log \tilde L(\mathbf{X}; K) - \rho \, \|K\|_{\ell_1}, \qquad (1)$$
with
$$\log \tilde L(\mathbf{X}; K) = \sum_{i=1}^p \left( \sum_{k=1}^n \log P\!\left(X_i^k \mid X_{\setminus i}^k; K_i\right) \right),$$
shares the same null entries as the solution of the p independent penalized regressions.

The p terms are not independent, as K is not diagonal! This still requires post-symmetrization.


GGMs and penalized likelihood

The penalized likelihood of the Gaussian observations
Use the penalized criterion
$$\frac{n}{2}\left(\log\det(K) - \mathrm{Tr}(S_n K)\right) - \rho \, \|K\|_{\ell_1},$$
where $S_n$ is the empirical covariance matrix.

Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, JMLR, 2008.

Natural generalization
Use different penalty parameters for different coefficients:
$$\frac{n}{2}\left(\log\det(K) - \mathrm{Tr}(S_n K)\right) - \|\rho_Z(K)\|_{\ell_1},$$
where $\rho_Z(K) = (\rho_{Z_i, Z_j}(K_{ij}))_{i,j}$ is a penalty function depending on an unknown underlying structure Z.
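To fix ideas, here is how the uniform-penalty case is solved with an off-the-shelf tool (a sketch using scikit-learn's GraphicalLasso, which implements the scalar-ρ criterion only; the structured penalty ρ_Z is what the present work adds on top):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# GraphicalLasso maximizes log det(K) - Tr(S K) - alpha * ||K||_1 for a
# scalar alpha, i.e. the uniform rho ||K||_1 penalty above.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 15))   # n = 200 samples, p = 15 genes
model = GraphicalLasso(alpha=0.1).fit(X)
K_hat = model.precision_             # sparse estimate of K
edges = (np.abs(K_hat) > 1e-8) & ~np.eye(15, dtype=bool)
print(edges.sum() // 2, "edges selected")
```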


The concentration matrix structure
Modelling connection heterogeneity

Assumption: there exists a latent structure spreading the vertices into a set $\mathcal{Q} = \{1, \dots, q, \dots, Q\}$ of classes of connectivity.

The classes of connectivity
Denote $Z = \{Z_i = (Z_{i1}, \dots, Z_{iQ})\}_i$, where the $Z_{iq} = \mathbb{1}\{i \in q\}$ are the latent independent variables, with
- $\alpha = \{\alpha_q\}$, the prior proportions of the groups,
- $Z_i \sim \mathcal{M}(1, \alpha)$, a multinomial distribution.

A mixture of Laplace distributions
Assume the $K_{ij} \mid Z$ independent. Then $K_{ij} \mid \{Z_{iq} Z_{j\ell} = 1\} \sim f_{q\ell}(\cdot)$, where
$$f_{q\ell}(x) = \frac{1}{2\lambda_{q\ell}} \exp\left\{ -\frac{|x|}{\lambda_{q\ell}} \right\}, \quad q, \ell \in \mathcal{Q}.$$


Some possible structures

Figure: from affiliation to bipartite structures; example network with clusters A1-A3, B1-B5, C1-C2.

Example
A modular (affiliation) network uses two kinds of Laplace distributions:
1. intra-cluster ($q = \ell$): $f_{\mathrm{in}}(\cdot; \lambda_{\mathrm{in}})$;
2. inter-cluster ($q \neq \ell$): $f_{\mathrm{out}}(\cdot; \lambda_{\mathrm{out}})$.

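As an illustration of this generative view (a sketch, not the authors' simulator; no positive-definiteness correction is applied): draw class memberships from a multinomial, then draw each $K_{ij}$ from a Laplace whose scale depends on whether i and j share a class.

```python
import numpy as np

# Affiliation structure: lambda_in > lambda_out gives heavier
# intra-cluster entries of K than inter-cluster ones.
rng = np.random.default_rng(42)
p, Q = 20, 3
alpha = np.array([0.5, 0.3, 0.2])       # prior class proportions
lambda_in, lambda_out = 1.0, 0.05       # Laplace scales

z = rng.choice(Q, size=p, p=alpha)      # latent classes Z_i
K = np.zeros((p, p))
for i in range(p):
    for j in range(i + 1, p):
        scale = lambda_in if z[i] == z[j] else lambda_out
        K[i, j] = K[j, i] = rng.laplace(loc=0.0, scale=scale)
np.fill_diagonal(K, 1.0)
print(np.abs(K[z[:, None] == z[None, :]]).mean())  # intra-cluster entries dominate
```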


Looking for a criterion. . .

We wish to infer the non-null entries of K knowing the data. Our strategy is
$$\hat K = \arg\max_{K \succ 0} P(K \mid \mathbf{X}) = \arg\max_{K \succ 0} \log P(\mathbf{X}, K).$$

Marginalization over Z
The distribution of K is known conditionally on the structure, hence
$$\hat K = \arg\max_{K \succ 0} \log \sum_{Z \in \mathcal{Z}} \mathcal{L}_c(\mathbf{X}, K, Z),$$
where $\mathcal{L}_c(\mathbf{X}, K, Z) = P(\mathbf{X}, K, Z)$ is the complete-data likelihood.

An EM-like strategy is used hereafter to solve this problem.


The complete likelihood

Proposition
$$\log \mathcal{L}_c(\mathbf{X}, K, Z) = \frac{n}{2}\left(\log\det(K) - \mathrm{Tr}(S_n K)\right) - \|\rho_Z(K)\|_{\ell_1} - \sum_{\substack{i,j \in \mathcal{P},\, i \neq j \\ q,\ell \in \mathcal{Q}}} Z_{iq} Z_{j\ell} \log(2\lambda_{q\ell}) + \sum_{i \in \mathcal{P},\, q \in \mathcal{Q}} Z_{iq} \log \alpha_q + c,$$
where $S_n$ is the empirical covariance matrix and $\rho_Z(K) = \left(\rho_{Z_i Z_j}(K_{ij})\right)_{(i,j) \in \mathcal{P}^2}$ is defined by
$$\rho_{Z_i Z_j}(K_{ij}) = \sum_{q,\ell \in \mathcal{Q}} \frac{Z_{iq} Z_{j\ell}\, K_{ij}}{\lambda_{q\ell}}.$$

Part concerning K: penalized maximum likelihood with a LASSO-type approach.
Part concerning Z: estimation with a variational approach.
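For concreteness, a sketch evaluating this complete-data criterion for a hard assignment Z (omitting the additive constant c; the function name and conventions such as restricting the penalty to off-diagonal entries are ours):

```python
import numpy as np

def complete_loglik(K, S, n, z, alpha, lam):
    """Complete-data log-likelihood, up to the constant c.

    K: (p, p) concentration matrix; S: empirical covariance;
    z: length-p integer class labels; alpha: class proportions;
    lam: (Q, Q) matrix of Laplace scales lambda_{ql}.
    """
    _, logdet = np.linalg.slogdet(K)
    fit = 0.5 * n * (logdet - np.trace(S @ K))
    lam_ij = lam[z[:, None], z[None, :]]            # lambda_{ql} for each pair (i, j)
    off = ~np.eye(len(z), dtype=bool)
    penalty = np.sum(np.abs(K)[off] / lam_ij[off])  # ||rho_Z(K)||_1
    laplace_norm = np.sum(np.log(2 * lam_ij[off]))  # sum Z_iq Z_jl log(2 lambda_ql)
    prior = np.sum(np.log(alpha[z]))                # sum Z_iq log alpha_q
    return fit - penalty - laplace_norm + prior
```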


An EM strategy

The conditional expectation to maximize
$$Q\!\left(K \mid K^{(m)}\right) = \mathbb{E}\left\{ \log \mathcal{L}_c(\mathbf{X}, K, Z) \mid \mathbf{X}; K^{(m)} \right\} = \sum_{Z \in \mathcal{Z}} P\!\left(Z \mid \mathbf{X}, K^{(m)}\right) \log \mathcal{L}_c(\mathbf{X}, K, Z) = \sum_{Z \in \mathcal{Z}} P\!\left(Z \mid K^{(m)}\right) \log \mathcal{L}_c(\mathbf{X}, K, Z).$$

Problem
- No closed form of $Q(K \mid K^{(m)})$, because $P(Z \mid K)$ cannot be factorized.
- We use a variational approach to approximate $P(Z \mid K)$.



Variational estimation of the latent structure
Daudin et al., 2008

Principle
Use an approximation $R(Z)$ of $P(Z \mid K)$ in factorized form, $R_\tau(Z) = \prod_i R_{\tau_i}(Z_i)$, where $R_{\tau_i}$ is a multinomial distribution with parameters $\tau_i$.

- Maximize a lower bound of the log-likelihood:
$$\mathcal{J}(R_\tau(Z)) = \mathcal{L}(\mathbf{X}, K) - D_{KL}\!\left(R_\tau(Z) \,\|\, P(Z \mid K)\right).$$
- Using its tractable form, we have
$$\mathcal{J}(R_\tau(Z)) = \sum_Z R_\tau(Z) \log \mathcal{L}_c(\mathbf{X}, K, Z) + \mathcal{H}(R_\tau(Z)).$$

The first term plays the role of $\mathbb{E}\left(\log \mathcal{L}_c(\mathbf{X}, K, Z) \mid \mathbf{X}, K^{(m)}\right)$.
Maximizing $\mathcal{J}$ leads to a fixed-point relationship for $\tau$.
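By analogy with the variational updates of Daudin et al. for mixture models on graphs, the fixed point can be written $\tau_{iq} \propto \alpha_q \prod_{j \neq i, \ell} f_{q\ell}(K_{ij})^{\tau_{j\ell}}$; here is a log-space sketch of that update (the exact update used by the authors' implementation may differ):

```python
import numpy as np

def tau_fixed_point(K, alpha, lam, n_iter=50):
    """Mean-field fixed point for tau (sketch):
    tau_iq  ~  alpha_q * prod_{j != i, l} f_{ql}(K_ij)^{tau_jl},
    with f_{ql} the Laplace density, computed in log space."""
    p, Q = K.shape[0], len(alpha)
    tau = np.full((p, Q), 1.0 / Q)
    for _ in range(n_iter):
        # log f_{ql}(K_ij) for every pair (i, j) and class pair (q, l)
        logf = -np.log(2 * lam)[None, None] - np.abs(K)[:, :, None, None] / lam
        logf[np.arange(p), np.arange(p)] = 0.0         # exclude j == i terms
        log_tau = np.log(alpha)[None, :] + np.einsum('ijql,jl->iq', logf, tau)
        log_tau -= log_tau.max(axis=1, keepdims=True)  # normalize safely
        tau = np.exp(log_tau)
        tau /= tau.sum(axis=1, keepdims=True)
    return tau
```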


The M-step
Seen as a penalized likelihood problem

We aim at solving
$$\hat K = \arg\max_{K \succ 0} Q_\tau(K),$$
where

Penalized likelihood problem
$$Q_\tau(K) = \left\{ \frac{n}{2}\left(\log\det(K) - \mathrm{Tr}(S_n K)\right) - \|\rho_\tau(K)\|_{\ell_1} + \text{Cst} \right\}.$$

Friedman, Hastie, Tibshirani. Sparse inverse covariance estimation with the Lasso, Biostatistics, 2007.

Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, JMLR, 2008.

We deal with a more complex penalty term here.


Let us work on the covariance matrix

Proposition
The maximization problem over K is equivalent to the following one, dealing with the covariance matrix Σ:
$$\hat\Sigma = \arg\max_{\|(\Sigma - S_n)\, ./\, P\|_\infty \le 1} \log\det(\Sigma),$$
where $./$ is the term-by-term division and
$$P = (p_{ij})_{i,j \in \mathcal{P}}, \qquad p_{ij} = \frac{2}{n} \sum_{q,\ell} \frac{\tau_{iq}\, \tau_{j\ell}}{\lambda_{q\ell}}.$$

The proof uses optimization and primal/dual arguments.

A block-wise resolution

Denote
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \sigma_{12} \\ \sigma_{12}^\top & \Sigma_{22} \end{bmatrix}, \quad S_n = \begin{bmatrix} S_{11} & s_{12} \\ s_{12}^\top & S_{22} \end{bmatrix}, \quad P = \begin{bmatrix} P_{11} & p_{12} \\ p_{12}^\top & P_{22} \end{bmatrix}, \qquad (2)$$
where $\Sigma_{11}$ is a $(p-1) \times (p-1)$ matrix, $\sigma_{12}$ is a column vector of length $p-1$, and $\Sigma_{22}$ is a scalar.

Each column of Σ satisfies (by the determinant of the Schur complement)
$$\hat\sigma_{12} = \arg\min_{\{\|(y - s_{12})\, ./\, p_{12}\|_\infty \le 1\}} \left\{ y^\top \Sigma_{11}^{-1}\, y \right\}.$$

An ℓ1-norm penalized formulation

Proposition
Solving the block-wise problem is equivalent to solving the following dual problem:
$$\min_\beta \left\| \frac{1}{2} \Sigma_{11}^{1/2} \beta - \Sigma_{11}^{-1/2} s_{12} \right\|_2^2 + \|p_{12} \star \beta\|_{\ell_1},$$
where $\star$ is the term-by-term product. The vectors $\sigma_{12}$ and $\beta$ are linked by
$$\sigma_{12} = \Sigma_{11}\, \beta / 2.$$

A LASSO-like formulation, solvable with existing inexpensive algorithms.
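Expanded, the dual objective equals $\frac{1}{4}\beta^\top \Sigma_{11} \beta - \beta^\top s_{12} + \|p_{12} \star \beta\|_{\ell_1}$ up to a constant, which a per-coordinate soft-thresholding loop can minimize. A minimal sketch (our own coordinate descent, not the SIMoNe implementation):

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def weighted_lasso_block(Sigma11, s12, p12, n_iter=200):
    """Coordinate descent for the block sub-problem:
    min_beta (1/4) beta' Sigma11 beta - beta' s12 + sum_j p12_j |beta_j|.
    Returns sigma12 = Sigma11 @ beta / 2."""
    p1 = len(s12)
    beta = np.zeros(p1)
    for _ in range(n_iter):
        for j in range(p1):
            # residual term excluding coordinate j
            r = s12[j] - 0.5 * (Sigma11[j] @ beta - Sigma11[j, j] * beta[j])
            beta[j] = soft(r, p12[j]) / (0.5 * Sigma11[j, j])
        # (a convergence test on beta could be added here)
    return Sigma11 @ beta / 2.0
```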

The full EM algorithm

while Q̂_τ(K̂(m)) has not stabilized do
    // THE E-STEP: LATENT STRUCTURE INFERENCE
    if m = 1 then
        // First pass
        Apply spectral clustering on the empirical covariance S to initialize τ̂
    else
        Compute τ̂ via the fixed-point algorithm, using K̂(m−1)
    end
    // THE M-STEP: NETWORK INFERENCE
    Construct the penalty matrix P according to τ̂
    while Σ̂(m) has not stabilized do
        for each column of Σ̂(m) do
            Compute σ̂12 by solving the LASSO-like problem with path-wise coordinate optimization
        end
    end
    Compute K̂(m) by block-wise inversion of Σ̂(m)
    m ← m + 1
end

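A Python skeleton of this alternating scheme, stitching together the sketches above (tau_fixed_point for the E-step and weighted_lasso_block for the M-step column solves; all names and the crude initialization are illustrative assumptions, not the SIMoNe package's API):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def simone_like(X, Q, lam, alpha, n_outer=10):
    """Illustrative alternating E/M loop; stopping rules and numerical
    safeguards of the real implementation are omitted."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    # First pass: spectral clustering on |S| initializes the assignment tau.
    labels = SpectralClustering(n_clusters=Q, affinity='precomputed') \
        .fit_predict(np.abs(S))
    tau = np.eye(Q)[labels]                      # hard one-hot initialization
    K = np.linalg.inv(S + 1e-2 * np.eye(p))     # crude initial K
    for _ in range(n_outer):
        # M-step: penalty matrix P from tau, then column-wise updates of Sigma.
        P = (2.0 / n) * np.einsum('iq,jl,ql->ij', tau, tau, 1.0 / lam)
        Sigma = np.linalg.inv(K)
        for i in range(p):
            mask = np.arange(p) != i
            sig12 = weighted_lasso_block(Sigma[np.ix_(mask, mask)],
                                         S[mask, i], P[mask, i])
            Sigma[mask, i] = Sigma[i, mask] = sig12
        K = np.linalg.inv(Sigma)
        # E-step: refresh tau from the current K.
        tau = tau_fixed_point(K, alpha, lam)
    return K, tau
```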


Simulation settings

Five inference methods
1. InvCor: edge estimation based on inverting the empirical correlation matrix.
2. GeneNet (Strimmer et al.): edge estimation based on partial correlations with shrinkage.
3. GLasso (Friedman et al.): edge estimation with a uniform penalty matrix.
4. "Perfect" SIMoNe (the best results our method can aspire to): edge estimation with a penalty matrix constructed from the true node classification.
5. SIMoNe (Statistical Inference for MOdular NEtworks): edge estimation with a penalty matrix constructed from the estimated node classification, iteratively.

Test simulation setup

Simulated graphs
- Graphs simulated with an affiliation model (two sets of parameters: intra-group and inter-group connections).
- p = 200 nodes, i.e. p(p−1)/2 = 19,900 possible interactions.
- 50 graphs (repetitions) simulated per situation.
- Gene expression data (i.e., Gaussian samples) then simulated from each sampled graph:
  1. favorable setting (n = 10p),
  2. middle case (n = 2p),
  3. unfavorable setting (n = p/2).

Unstructured graphs
- When there is no structure, SIMoNe is comparable to GeneNet and GLasso.

Concentration matrix and structure

Figure: simulation of the structured sparse concentration matrix. Adjacency matrix without (a) and with (b) columns reorganized according to the affiliation structure, and the corresponding graph (c).

Example of graph recovery
Favorable case

Figure: theoretical graph and SIMoNe estimation.

Precision/Recall curves
Definition

$$\text{Precision} = \frac{TP}{TP + FP} = \text{proportion of true positives among all predicted edges},$$
$$\text{Recall} = \frac{TP}{TP + FN} = \text{proportion of true edges that are recovered}.$$
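A small utility of the kind used to draw such curves (our own sketch; sweeping the penalty level gives one (recall, precision) point per value):

```python
import numpy as np

def precision_recall(true_adj, est_adj):
    """Precision and recall of an estimated adjacency matrix
    (upper triangle only, so each undirected edge counts once)."""
    iu = np.triu_indices_from(true_adj, k=1)
    t, e = true_adj[iu].astype(bool), est_adj[iu].astype(bool)
    tp = np.sum(t & e)
    fp = np.sum(~t & e)
    fn = np.sum(t & ~e)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return precision, recall
```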

Precision/Recall curves
Settings n = 10p, 6p, 3p, 2p, p, and p/2

- With n ≫ p, Perfect SIMoNe and SIMoNe perform equivalently.
- When 3p > n > p, the structure is partially recovered, and SIMoNe improves the edge selection.
- When n ≤ p, all methods perform poorly. . .

Figures: one precision/recall plot per setting (recall on the x-axis, precision on the y-axis), comparing SIMoNe, GLasso, Perfect SIMoNe, GeneNet (and InvCor in the larger-sample settings).


First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Two types of patients
1. Patient response can be classified as either a pathologic complete response (PCR),
2. or residual disease (Not PCR).

Gene expression data
- 133 patients (99 Not PCR, 34 PCR)
- 26 identified genes (differential analysis)

First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Figures: networks inferred by SIMoNe over the 26 selected genes (AMFR, BB_S4, BECNI, BTG3, CA12, CTNND2, E2F3, ERBB4, FGFRIOP, FLJ10916, FLJI2650, GAMT, GFRAI, IGFBP4, JMJD2B, KIA1467, MAPT, MBTP_SI, MELK, METRN, PDGFRA, RAMPI, RRM2, SCUBE2, THRAP2, ZNF552), estimated on the full sample, on the Not PCR patients, and on the PCR patients.

Conclusions

To sum up
- We proposed an inference strategy based on a penalization scheme driven by an underlying unknown structure.
- The estimation strategy is based on a variational EM algorithm, in which a LASSO-like procedure is embedded.
- Preprint on arXiv.
- R package SIMoNe.

Perspectives
- Consider alternative, more biologically relevant priors: hubs, motifs.
- Time segmentation when dealing with temporal data.


Penalty choice (1)

Let $C_i$ denote the connectivity component of node $i$ in the true conditional dependency graph, and $\hat C_i$ the corresponding component resulting from the estimate $\hat K$.

Proposition
Fix some $\varepsilon > 0$ and choose the penalty parameters $\lambda$ such that, for all $q, \ell \in \mathcal{Q}$,
$$2p^2\, \bar F_{n-2}\!\left( \frac{2}{n\lambda_{q\ell}} \left( \max_{i \neq j} S_{ii} S_{jj} - \frac{1}{\lambda_{q\ell}^2} \right)^{-1/2} (n-2)^{1/2} \right) \le \varepsilon,$$
where $1 - \bar F_{n-2}$ is the c.d.f. of a Student's t-distribution with $n-2$ degrees of freedom. Then
$$P\!\left(\exists k,\ \hat C_k \not\subseteq C_k\right) \le \varepsilon. \qquad (3)$$

Penalty choice (2)

It is enough to choose $\lambda_{q\ell}$ such that
$$\lambda_{q\ell}(\varepsilon) \;\ge\; \frac{2}{n} \left( n - 2 + t_{n-2}^2\!\left( \frac{\varepsilon}{2p^2} \right) \right)^{1/2} \times \left( \max_{\substack{i \neq j \\ Z_{iq} Z_{j\ell} = 1}} S_{ii} S_{jj} \right)^{-1/2} t_{n-2}\!\left( \frac{\varepsilon}{2p^2} \right)^{-1}.$$
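A sketch of this rule in code (our reading, with $t_{n-2}(u)$ interpreted as the upper $u$-quantile of the Student t distribution; the function name is ours):

```python
import numpy as np
from scipy.stats import t as student_t

def lambda_min(S, z, q, l, n, eps):
    """Lower bound on lambda_{ql} from the rule above."""
    p = S.shape[0]
    tq = student_t.ppf(1.0 - eps / (2.0 * p ** 2), df=n - 2)
    # max of S_ii * S_jj over pairs i != j with i in class q and j in class l
    pairs = np.outer(np.diag(S)[z == q], np.diag(S)[z == l])
    if q == l:
        np.fill_diagonal(pairs, -np.inf)  # enforce i != j within a class
    m = pairs.max()
    return (2.0 / n) * np.sqrt(n - 2 + tq ** 2) / (np.sqrt(m) * tq)
```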

Penalty choice (3)

Practically,
- relax the $\lambda_{q\ell}$ in the E-step (variational inference), thus making the E-step a variational EM step;
- fix the $\lambda_{q\ell}$ in the M-step, adapting the above rule to the context. E.g., for an affiliation structure, we fix the ratio $\lambda_{\mathrm{in}}/\lambda_{\mathrm{out}} = 1.2$ and either let the value $1/\lambda_{\mathrm{in}}$ vary when drawing precision/recall curves on synthetic data, or fix this parameter using the above rule when dealing with real data.