Gaussian Graphical Models with latent structure


Penalized Maximum Likelihood Inference for Sparse Gaussian Graphical Models with Latent Structure

Christophe Ambroise, Julien Chiquet and Catherine Matias

Laboratoire Statistique et Génome, La Génopole, Université d'Évry

Statistique et Santé Publique seminar, 13 January 2009


Biological networks
Different kinds of biological interactions

Families of networks:
- protein-protein interactions,
- metabolic pathways,
- regulation networks.

Figure: regulation example, the SOS network of E. coli, with genes lexA, dinI, recF, rpD, rpH, SsB, recA, umD, rpS.

Let us focus on regulatory networks . . . and look for an influence network.


What questions?

Figure: mind map of questions on networks. Network splits into Inference (supervised or unsupervised) and Structure (degree distribution; community analysis, via statistical models or spectral clustering).

- How to find the interactions?
- What knowledge can the structure provide?
- Given a new node, what are its interactions with the known nodes?
- Given two nodes, do they interact?
- What are the communities' characteristics?


Problem
Infer the interactions between genes from microarray data

Microarray gene expression data: p genes, n experiments. Which ones interact/co-express?

Figure: example network on ten genes G0, . . . , G9.

Major issues
- combinatorics: $2^{p^2}$ possible graphs;
- dimension problem: $n \ll p$.

Here, we reduce p by restricting attention to a fixed set of genes of interest.


Our ideas to tackle these issues

Introduce a prior taking the topology of the network into account for better edge inference.

Figure: the ten-gene network, with nodes subsequently grouped into clusters (A1-A3, B1-B5, C1-C2).

Relying on biological constraints:
1. few genes effectively interact (sparsity),
2. networks are organized (latent structure).


Outline

Give the network a model
  Gaussian graphical models
  Providing the network with a latent structure
  The complete likelihood

Inference strategy by alternate optimization
  The E-step: estimation of the latent structure
  The M-step: inferring the connectivity matrix

Numerical experiments
  Synthetic data
  Breast cancer data


GGMs
General settings

The Gaussian model
- Let $X \in \mathbb{R}^p$ be a random vector such that $X \sim \mathcal{N}(0_p, \Sigma)$;
- let $(X^1, \dots, X^n)$ be an i.i.d. size-$n$ sample (e.g., microarray experiments);
- let $\mathbf{X}$ be an $n \times p$ matrix such that $(X^k)^\top$ is the $k$th row of $\mathbf{X}$;
- let $K = (K_{ij})_{(i,j) \in \mathcal{P}^2} := \Sigma^{-1}$ be the concentration matrix.

The graphical interpretation

$$X_i \perp\!\!\!\perp X_j \mid X_{\mathcal{P} \setminus \{i,j\}} \iff K_{ij} = 0 \iff \text{edge } (i,j) \notin \text{network},$$

since $r_{ij \mid \mathcal{P} \setminus \{i,j\}} = -K_{ij}/\sqrt{K_{ii} K_{jj}}$.

$K$ describes the graph of conditional dependencies.

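To make this correspondence concrete, here is a minimal numerical sketch (Python/NumPy, not part of the original slides; the function name is ours): it turns a concentration matrix into partial correlations and checks that a zero entry of K yields conditional independence.

```python
import numpy as np

def partial_correlations(K):
    """Partial correlation matrix r_ij = -K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.diag(K))
    R = -K / np.outer(d, d)
    np.fill_diagonal(R, 1.0)  # convention: unit diagonal
    return R

# Toy example: a 3-node chain 1 -- 2 -- 3 (no direct 1--3 edge).
K = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, 0.4],
              [0.0, 0.4, 1.0]])
R = partial_correlations(K)
print(R)  # R[0, 2] == 0: X_1 and X_3 are conditionally independent given X_2
```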

GGMs and regression
Network inference as p independent regression problems

One may use p different linear regressions

$$X_i = (X_{\setminus i})^\top \alpha + \varepsilon, \quad \text{where } \alpha_j = -K_{ij}/K_{ii}.$$

Meinshausen and Bühlmann's approach (2006)
Solve p independent Lasso problems (the $\ell_1$-norm enforces sparsity):

$$\hat\alpha = \arg\min_{\alpha} \frac{1}{n} \left\| \mathbf{X}_i - \mathbf{X}_{\setminus i}\, \alpha \right\|_2^2 + \rho \, \|\alpha\|_{\ell_1},$$

where $\mathbf{X}_i$ is the $i$th column of $\mathbf{X}$, and $\mathbf{X}_{\setminus i}$ is the full matrix with the $i$th column removed.

Major drawback: a symmetrization step is needed to obtain a final estimate of $K$.

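A hedged sketch of this neighborhood-selection idea (using scikit-learn's Lasso; the OR-rule symmetrization shown here is one common post-processing choice, not necessarily the authors'):

```python
import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(X, rho):
    """Meinshausen-Buhlmann-style edge selection: one Lasso per node."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        # Regress gene i on all remaining genes with an l1 penalty.
        fit = Lasso(alpha=rho, fit_intercept=False).fit(X[:, others], X[:, i])
        adj[i, others] = fit.coef_ != 0
    # The p regressions need not agree; symmetrize (here: OR rule).
    return adj | adj.T

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
print(neighborhood_selection(X, rho=0.1).sum())
```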

GGMs and the Lasso
Solving p penalized regressions ⇔ maximizing the penalized pseudo-likelihood

Consider the approximation $\tilde P(\mathbf{X}) = \prod_{i=1}^p P(\mathbf{X}_i \mid \mathbf{X}_{\setminus i})$.

Proposition
The solution to
$$\hat K = \arg\max_{K,\, K_{ij} \neq K_{ji}} \log \tilde L(\mathbf{X}; K) - \rho \, \|K\|_{\ell_1}, \qquad (1)$$
with
$$\log \tilde L(\mathbf{X}; K) = \sum_{i=1}^p \left( \sum_{k=1}^n \log P\!\left(X_i^k \mid X_{\setminus i}^k; K_i\right) \right),$$
shares the same null entries as the solution of the p independent penalized regressions.

The p terms are not independent, as K is not diagonal! This still requires post-symmetrization.


GGMs and penalized likelihood

The penalized likelihood of the Gaussian observations
Use the penalized criterion
$$\frac{n}{2}\left(\log\det(K) - \mathrm{Tr}(S_n K)\right) - \rho \, \|K\|_{\ell_1},$$
where $S_n$ is the empirical covariance matrix.

Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, JMLR, 2008.

Natural generalization
Use different penalty parameters for different coefficients:
$$\frac{n}{2}\left(\log\det(K) - \mathrm{Tr}(S_n K)\right) - \|\rho_Z(K)\|_{\ell_1},$$
where $\rho_Z(K) = (\rho_{Z_i, Z_j}(K_{ij}))_{i,j}$ is a penalty function depending on an unknown underlying structure Z.
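To fix ideas, here is how the uniform-penalty case is solved with an off-the-shelf tool (a sketch using scikit-learn's GraphicalLasso, which implements the scalar-ρ criterion only; the structured penalty ρ_Z is what the present work adds on top):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# GraphicalLasso maximizes log det(K) - Tr(S K) - alpha * ||K||_1 for a
# scalar alpha, i.e. the uniform rho ||K||_1 penalty above.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 15))   # n = 200 samples, p = 15 genes
model = GraphicalLasso(alpha=0.1).fit(X)
K_hat = model.precision_             # sparse estimate of K
edges = (np.abs(K_hat) > 1e-8) & ~np.eye(15, dtype=bool)
print(edges.sum() // 2, "edges selected")
```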


The concentration matrix structure
Modelling connection heterogeneity

Assumption: there exists a latent structure spreading the vertices into a set $\mathcal{Q} = \{1, \dots, q, \dots, Q\}$ of classes of connectivity.

The classes of connectivity
Denote $Z = \{Z_i = (Z_{i1}, \dots, Z_{iQ})\}_i$, where the $Z_{iq} = \mathbb{1}\{i \in q\}$ are the latent independent variables, with
- $\alpha = \{\alpha_q\}$, the prior proportions of the groups,
- $Z_i \sim \mathcal{M}(1, \alpha)$, a multinomial distribution.

A mixture of Laplace distributions
Assume the $K_{ij} \mid Z$ independent. Then $K_{ij} \mid \{Z_{iq} Z_{j\ell} = 1\} \sim f_{q\ell}(\cdot)$, where
$$f_{q\ell}(x) = \frac{1}{2\lambda_{q\ell}} \exp\left\{ -\frac{|x|}{\lambda_{q\ell}} \right\}, \quad q, \ell \in \mathcal{Q}.$$


Some possible structures

Figure: from affiliation to bipartite structures; example network with clusters A1-A3, B1-B5, C1-C2.

Example
A modular (affiliation) network uses two kinds of Laplace distributions:
1. intra-cluster ($q = \ell$): $f_{\mathrm{in}}(\cdot; \lambda_{\mathrm{in}})$;
2. inter-cluster ($q \neq \ell$): $f_{\mathrm{out}}(\cdot; \lambda_{\mathrm{out}})$.

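As an illustration of this generative view (a sketch, not the authors' simulator; no positive-definiteness correction is applied): draw class memberships from a multinomial, then draw each $K_{ij}$ from a Laplace whose scale depends on whether i and j share a class.

```python
import numpy as np

# Affiliation structure: lambda_in > lambda_out gives heavier
# intra-cluster entries of K than inter-cluster ones.
rng = np.random.default_rng(42)
p, Q = 20, 3
alpha = np.array([0.5, 0.3, 0.2])       # prior class proportions
lambda_in, lambda_out = 1.0, 0.05       # Laplace scales

z = rng.choice(Q, size=p, p=alpha)      # latent classes Z_i
K = np.zeros((p, p))
for i in range(p):
    for j in range(i + 1, p):
        scale = lambda_in if z[i] == z[j] else lambda_out
        K[i, j] = K[j, i] = rng.laplace(loc=0.0, scale=scale)
np.fill_diagonal(K, 1.0)
print(np.abs(K[z[:, None] == z[None, :]]).mean())  # intra-cluster entries dominate
```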


Looking for a criterion. . .

We wish to infer the non-null entries of K knowing the data. Our strategy is
$$\hat K = \arg\max_{K \succ 0} P(K \mid \mathbf{X}) = \arg\max_{K \succ 0} \log P(\mathbf{X}, K).$$

Marginalization over Z
The distribution of K is known conditionally on the structure, hence
$$\hat K = \arg\max_{K \succ 0} \log \sum_{Z \in \mathcal{Z}} \mathcal{L}_c(\mathbf{X}, K, Z),$$
where $\mathcal{L}_c(\mathbf{X}, K, Z) = P(\mathbf{X}, K, Z)$ is the complete-data likelihood.

An EM-like strategy is used hereafter to solve this problem.


The complete likelihood

Proposition
$$\log \mathcal{L}_c(\mathbf{X}, K, Z) = \frac{n}{2}\left(\log\det(K) - \mathrm{Tr}(S_n K)\right) - \|\rho_Z(K)\|_{\ell_1} - \sum_{\substack{i,j \in \mathcal{P},\, i \neq j \\ q,\ell \in \mathcal{Q}}} Z_{iq} Z_{j\ell} \log(2\lambda_{q\ell}) + \sum_{i \in \mathcal{P},\, q \in \mathcal{Q}} Z_{iq} \log \alpha_q + c,$$
where $S_n$ is the empirical covariance matrix and $\rho_Z(K) = \left(\rho_{Z_i Z_j}(K_{ij})\right)_{(i,j) \in \mathcal{P}^2}$ is defined by
$$\rho_{Z_i Z_j}(K_{ij}) = \sum_{q,\ell \in \mathcal{Q}} \frac{Z_{iq} Z_{j\ell}\, K_{ij}}{\lambda_{q\ell}}.$$

Part concerning K: penalized maximum likelihood with a LASSO-type approach.
Part concerning Z: estimation with a variational approach.
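For concreteness, a sketch evaluating this complete-data criterion for a hard assignment Z (omitting the additive constant c; the function name and conventions such as restricting the penalty to off-diagonal entries are ours):

```python
import numpy as np

def complete_loglik(K, S, n, z, alpha, lam):
    """Complete-data log-likelihood, up to the constant c.

    K: (p, p) concentration matrix; S: empirical covariance;
    z: length-p integer class labels; alpha: class proportions;
    lam: (Q, Q) matrix of Laplace scales lambda_{ql}.
    """
    _, logdet = np.linalg.slogdet(K)
    fit = 0.5 * n * (logdet - np.trace(S @ K))
    lam_ij = lam[z[:, None], z[None, :]]            # lambda_{ql} for each pair (i, j)
    off = ~np.eye(len(z), dtype=bool)
    penalty = np.sum(np.abs(K)[off] / lam_ij[off])  # ||rho_Z(K)||_1
    laplace_norm = np.sum(np.log(2 * lam_ij[off]))  # sum Z_iq Z_jl log(2 lambda_ql)
    prior = np.sum(np.log(alpha[z]))                # sum Z_iq log alpha_q
    return fit - penalty - laplace_norm + prior
```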


An EM strategy

The conditional expectation to maximize
$$Q\!\left(K \mid K^{(m)}\right) = \mathbb{E}\left\{ \log \mathcal{L}_c(\mathbf{X}, K, Z) \mid \mathbf{X}; K^{(m)} \right\} = \sum_{Z \in \mathcal{Z}} P\!\left(Z \mid \mathbf{X}, K^{(m)}\right) \log \mathcal{L}_c(\mathbf{X}, K, Z) = \sum_{Z \in \mathcal{Z}} P\!\left(Z \mid K^{(m)}\right) \log \mathcal{L}_c(\mathbf{X}, K, Z).$$

Problem
- No closed form of $Q(K \mid K^{(m)})$, because $P(Z \mid K)$ cannot be factorized.
- We use a variational approach to approximate $P(Z \mid K)$.



Variational estimation of the latent structure
Daudin et al., 2008

Principle
Use an approximation $R(Z)$ of $P(Z \mid K)$ in factorized form, $R_\tau(Z) = \prod_i R_{\tau_i}(Z_i)$, where $R_{\tau_i}$ is a multinomial distribution with parameters $\tau_i$.

- Maximize a lower bound of the log-likelihood:
$$\mathcal{J}(R_\tau(Z)) = \mathcal{L}(\mathbf{X}, K) - D_{KL}\!\left(R_\tau(Z) \,\|\, P(Z \mid K)\right).$$
- Using its tractable form, we have
$$\mathcal{J}(R_\tau(Z)) = \sum_Z R_\tau(Z) \log \mathcal{L}_c(\mathbf{X}, K, Z) + \mathcal{H}(R_\tau(Z)).$$

The first term plays the role of $\mathbb{E}\left(\log \mathcal{L}_c(\mathbf{X}, K, Z) \mid \mathbf{X}, K^{(m)}\right)$.
Maximizing $\mathcal{J}$ leads to a fixed-point relationship for $\tau$.
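By analogy with the variational updates of Daudin et al. for mixture models on graphs, the fixed point can be written $\tau_{iq} \propto \alpha_q \prod_{j \neq i, \ell} f_{q\ell}(K_{ij})^{\tau_{j\ell}}$; here is a log-space sketch of that update (the exact update used by the authors' implementation may differ):

```python
import numpy as np

def tau_fixed_point(K, alpha, lam, n_iter=50):
    """Mean-field fixed point for tau (sketch):
    tau_iq  ~  alpha_q * prod_{j != i, l} f_{ql}(K_ij)^{tau_jl},
    with f_{ql} the Laplace density, computed in log space."""
    p, Q = K.shape[0], len(alpha)
    tau = np.full((p, Q), 1.0 / Q)
    for _ in range(n_iter):
        # log f_{ql}(K_ij) for every pair (i, j) and class pair (q, l)
        logf = -np.log(2 * lam)[None, None] - np.abs(K)[:, :, None, None] / lam
        logf[np.arange(p), np.arange(p)] = 0.0         # exclude j == i terms
        log_tau = np.log(alpha)[None, :] + np.einsum('ijql,jl->iq', logf, tau)
        log_tau -= log_tau.max(axis=1, keepdims=True)  # normalize safely
        tau = np.exp(log_tau)
        tau /= tau.sum(axis=1, keepdims=True)
    return tau
```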


The M-step
Seen as a penalized likelihood problem

We aim at solving
$$\hat K = \arg\max_{K \succ 0} Q_\tau(K),$$
where

Penalized likelihood problem
$$Q_\tau(K) = \left\{ \frac{n}{2}\left(\log\det(K) - \mathrm{Tr}(S_n K)\right) - \|\rho_\tau(K)\|_{\ell_1} + \text{Cst} \right\}.$$

Friedman, Hastie, Tibshirani. Sparse inverse covariance estimation with the Lasso, Biostatistics, 2007.

Banerjee et al. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, JMLR, 2008.

We deal with a more complex penalty term here.


Let us work on the covariance matrix

Proposition
The maximization problem over K is equivalent to the following one, dealing with the covariance matrix Σ:
$$\hat\Sigma = \arg\max_{\|(\Sigma - S_n)\, ./\, P\|_\infty \le 1} \log\det(\Sigma),$$
where $./$ is the term-by-term division and
$$P = (p_{ij})_{i,j \in \mathcal{P}}, \qquad p_{ij} = \frac{2}{n} \sum_{q,\ell} \frac{\tau_{iq}\, \tau_{j\ell}}{\lambda_{q\ell}}.$$

The proof uses optimization and primal/dual arguments.

A block-wise resolution

Denote
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \sigma_{12} \\ \sigma_{12}^\top & \Sigma_{22} \end{bmatrix}, \quad S_n = \begin{bmatrix} S_{11} & s_{12} \\ s_{12}^\top & S_{22} \end{bmatrix}, \quad P = \begin{bmatrix} P_{11} & p_{12} \\ p_{12}^\top & P_{22} \end{bmatrix}, \qquad (2)$$
where $\Sigma_{11}$ is a $(p-1) \times (p-1)$ matrix, $\sigma_{12}$ is a column vector of length $p-1$, and $\Sigma_{22}$ is a scalar.

Each column of Σ satisfies (by the determinant of the Schur complement)
$$\hat\sigma_{12} = \arg\min_{\{\|(y - s_{12})\, ./\, p_{12}\|_\infty \le 1\}} \left\{ y^\top \Sigma_{11}^{-1}\, y \right\}.$$

An ℓ1-norm penalized formulation

Proposition
Solving the block-wise problem is equivalent to solving the following dual problem:
$$\min_\beta \left\| \frac{1}{2} \Sigma_{11}^{1/2} \beta - \Sigma_{11}^{-1/2} s_{12} \right\|_2^2 + \|p_{12} \star \beta\|_{\ell_1},$$
where $\star$ is the term-by-term product. The vectors $\sigma_{12}$ and $\beta$ are linked by
$$\sigma_{12} = \Sigma_{11}\, \beta / 2.$$

A LASSO-like formulation, solvable with existing inexpensive algorithms.
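Expanded, the dual objective equals $\frac{1}{4}\beta^\top \Sigma_{11} \beta - \beta^\top s_{12} + \|p_{12} \star \beta\|_{\ell_1}$ up to a constant, which a per-coordinate soft-thresholding loop can minimize. A minimal sketch (our own coordinate descent, not the SIMoNe implementation):

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def weighted_lasso_block(Sigma11, s12, p12, n_iter=200):
    """Coordinate descent for the block sub-problem:
    min_beta (1/4) beta' Sigma11 beta - beta' s12 + sum_j p12_j |beta_j|.
    Returns sigma12 = Sigma11 @ beta / 2."""
    p1 = len(s12)
    beta = np.zeros(p1)
    for _ in range(n_iter):
        for j in range(p1):
            # residual term excluding coordinate j
            r = s12[j] - 0.5 * (Sigma11[j] @ beta - Sigma11[j, j] * beta[j])
            beta[j] = soft(r, p12[j]) / (0.5 * Sigma11[j, j])
        # (a convergence test on beta could be added here)
    return Sigma11 @ beta / 2.0
```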

The full EM algorithm

while Q̂_τ(K̂(m)) has not stabilized do
    // THE E-STEP: LATENT STRUCTURE INFERENCE
    if m = 1 then
        // First pass
        Apply spectral clustering on the empirical covariance S to initialize τ̂
    else
        Compute τ̂ via the fixed-point algorithm, using K̂(m−1)
    end
    // THE M-STEP: NETWORK INFERENCE
    Construct the penalty matrix P according to τ̂
    while Σ̂(m) has not stabilized do
        for each column of Σ̂(m) do
            Compute σ̂12 by solving the LASSO-like problem with path-wise coordinate optimization
        end
    end
    Compute K̂(m) by block-wise inversion of Σ̂(m)
    m ← m + 1
end

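A Python skeleton of this alternating scheme, stitching together the sketches above (tau_fixed_point for the E-step and weighted_lasso_block for the M-step column solves; all names and the crude initialization are illustrative assumptions, not the SIMoNe package's API):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def simone_like(X, Q, lam, alpha, n_outer=10):
    """Illustrative alternating E/M loop; stopping rules and numerical
    safeguards of the real implementation are omitted."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    # First pass: spectral clustering on |S| initializes the assignment tau.
    labels = SpectralClustering(n_clusters=Q, affinity='precomputed') \
        .fit_predict(np.abs(S))
    tau = np.eye(Q)[labels]                      # hard one-hot initialization
    K = np.linalg.inv(S + 1e-2 * np.eye(p))     # crude initial K
    for _ in range(n_outer):
        # M-step: penalty matrix P from tau, then column-wise updates of Sigma.
        P = (2.0 / n) * np.einsum('iq,jl,ql->ij', tau, tau, 1.0 / lam)
        Sigma = np.linalg.inv(K)
        for i in range(p):
            mask = np.arange(p) != i
            sig12 = weighted_lasso_block(Sigma[np.ix_(mask, mask)],
                                         S[mask, i], P[mask, i])
            Sigma[mask, i] = Sigma[i, mask] = sig12
        K = np.linalg.inv(Sigma)
        # E-step: refresh tau from the current K.
        tau = tau_fixed_point(K, alpha, lam)
    return K, tau
```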


Simulation settings

Five inference methods
1. InvCor: edge estimation based on inverting the empirical correlation matrix.
2. GeneNet (Strimmer et al.): edge estimation based on partial correlations with shrinkage.
3. GLasso (Friedman et al.): edge estimation with a uniform penalty matrix.
4. "Perfect" SIMoNe (the best results our method can aspire to): edge estimation with a penalty matrix constructed from the true node classification.
5. SIMoNe (Statistical Inference for MOdular NEtworks): edge estimation with a penalty matrix constructed from the estimated node classification, iteratively.

Test simulation setup

Simulated graphs
- Graphs simulated with an affiliation model (two sets of parameters: intra-group and inter-group connections).
- p = 200 nodes, i.e. p(p−1)/2 = 19,900 possible interactions.
- 50 graphs (repetitions) simulated per situation.
- Gene expression data (i.e., Gaussian samples) then simulated from each sampled graph:
  1. favorable setting (n = 10p),
  2. middle case (n = 2p),
  3. unfavorable setting (n = p/2).

Unstructured graphs
- When there is no structure, SIMoNe is comparable to GeneNet and GLasso.

Concentration matrix and structure

Figure: simulation of the structured sparse concentration matrix. Adjacency matrix without (a) and with (b) columns reorganized according to the affiliation structure, and the corresponding graph (c).

Example of graph recovery
Favorable case

Figure: theoretical graph and SIMoNe estimation.

Precision/Recall curves
Definition

$$\text{Precision} = \frac{TP}{TP + FP} = \text{proportion of true positives among all predicted edges},$$
$$\text{Recall} = \frac{TP}{TP + FN} = \text{proportion of true edges that are recovered}.$$
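A small utility of the kind used to draw such curves (our own sketch; sweeping the penalty level gives one (recall, precision) point per value):

```python
import numpy as np

def precision_recall(true_adj, est_adj):
    """Precision and recall of an estimated adjacency matrix
    (upper triangle only, so each undirected edge counts once)."""
    iu = np.triu_indices_from(true_adj, k=1)
    t, e = true_adj[iu].astype(bool), est_adj[iu].astype(bool)
    tp = np.sum(t & e)
    fp = np.sum(~t & e)
    fn = np.sum(t & ~e)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return precision, recall
```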

Precision/Recall curves
Settings n = 10p, 6p, 3p, 2p, p, and p/2

- With n ≫ p, Perfect SIMoNe and SIMoNe perform equivalently.
- When 3p > n > p, the structure is partially recovered, and SIMoNe improves the edge selection.
- When n ≤ p, all methods perform poorly. . .

Figures: one precision/recall plot per setting (recall on the x-axis, precision on the y-axis), comparing SIMoNe, GLasso, Perfect SIMoNe, GeneNet (and InvCor in the larger-sample settings).


First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Two types of patients
1. Patient response can be classified as either a pathologic complete response (PCR),
2. or residual disease (Not PCR).

Gene expression data
- 133 patients (99 Not PCR, 34 PCR)
- 26 identified genes (differential analysis)

First results on a real dataset
Prediction of the outcome of preoperative chemotherapy

Figures: networks inferred by SIMoNe over the 26 selected genes (AMFR, BB_S4, BECNI, BTG3, CA12, CTNND2, E2F3, ERBB4, FGFRIOP, FLJ10916, FLJI2650, GAMT, GFRAI, IGFBP4, JMJD2B, KIA1467, MAPT, MBTP_SI, MELK, METRN, PDGFRA, RAMPI, RRM2, SCUBE2, THRAP2, ZNF552), estimated on the full sample, on the Not PCR patients, and on the PCR patients.

Conclusions

To sum up
- We proposed an inference strategy based on a penalization scheme driven by an underlying unknown structure.
- The estimation strategy is based on a variational EM algorithm, in which a LASSO-like procedure is embedded.
- Preprint on arXiv.
- R package SIMoNe.

Perspectives
- Consider alternative, more biologically relevant priors: hubs, motifs.
- Time segmentation when dealing with temporal data.


Penalty choice (1)

Let $C_i$ denote the connectivity component of node $i$ in the true conditional dependency graph, and $\hat C_i$ the corresponding component resulting from the estimate $\hat K$.

Proposition
Fix some $\varepsilon > 0$ and choose the penalty parameters $\lambda$ such that, for all $q, \ell \in \mathcal{Q}$,
$$2p^2\, \bar F_{n-2}\!\left( \frac{2}{n\lambda_{q\ell}} \left( \max_{i \neq j} S_{ii} S_{jj} - \frac{1}{\lambda_{q\ell}^2} \right)^{-1/2} (n-2)^{1/2} \right) \le \varepsilon,$$
where $1 - \bar F_{n-2}$ is the c.d.f. of a Student's t-distribution with $n-2$ degrees of freedom. Then
$$P\!\left(\exists k,\ \hat C_k \not\subseteq C_k\right) \le \varepsilon. \qquad (3)$$

Penalty choice (2)

It is enough to choose $\lambda_{q\ell}$ such that
$$\lambda_{q\ell}(\varepsilon) \;\ge\; \frac{2}{n} \left( n - 2 + t_{n-2}^2\!\left( \frac{\varepsilon}{2p^2} \right) \right)^{1/2} \times \left( \max_{\substack{i \neq j \\ Z_{iq} Z_{j\ell} = 1}} S_{ii} S_{jj} \right)^{-1/2} t_{n-2}\!\left( \frac{\varepsilon}{2p^2} \right)^{-1}.$$
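A sketch of this rule in code (our reading, with $t_{n-2}(u)$ interpreted as the upper $u$-quantile of the Student t distribution; the function name is ours):

```python
import numpy as np
from scipy.stats import t as student_t

def lambda_min(S, z, q, l, n, eps):
    """Lower bound on lambda_{ql} from the rule above."""
    p = S.shape[0]
    tq = student_t.ppf(1.0 - eps / (2.0 * p ** 2), df=n - 2)
    # max of S_ii * S_jj over pairs i != j with i in class q and j in class l
    pairs = np.outer(np.diag(S)[z == q], np.diag(S)[z == l])
    if q == l:
        np.fill_diagonal(pairs, -np.inf)  # enforce i != j within a class
    m = pairs.max()
    return (2.0 / n) * np.sqrt(n - 2 + tq ** 2) / (np.sqrt(m) * tq)
```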

Penalty choice (3)

Practically,
- relax the $\lambda_{q\ell}$ in the E-step (variational inference), thus making the E-step a variational EM step;
- fix the $\lambda_{q\ell}$ in the M-step, adapting the above rule to the context. E.g., for an affiliation structure, we fix the ratio $\lambda_{\mathrm{in}}/\lambda_{\mathrm{out}} = 1.2$ and either let the value $1/\lambda_{\mathrm{in}}$ vary when drawing precision/recall curves on synthetic data, or fix this parameter using the above rule when dealing with real data.