Biological Network Inference via Gaussian Graphical Models


Transcript of Biological Network Inference via Gaussian Graphical Models

Page 1: Biological Network Inference via Gaussian Graphical Models

An introduction to Biological network inference via Gaussian Graphical Models

Christophe Ambroise, Julien Chiquet

Statistique et Génome, CNRS & Université d’Évry Val d’Essonne

São Paulo – School on Advanced Science – October 2012

http://stat.genopole.cnrs.fr/~cambroise

Network inference 1

Page 2: Biological Network Inference via Gaussian Graphical Models

Outline

Introduction
  Motivations; Background on omics; Modeling issue

Modeling tools
  Statistical dependence; Graphical models; Covariance selection and Gaussian vector

Gaussian Graphical Models for genomic data
  Steady-state data; Time-course data

Statistical inference
  Penalized likelihood approach; Inducing sparsity and regularization; The Lasso

Application in Post-genomics
  Modeling time-course data; Illustrations; Multitask learning


Page 5: Biological Network Inference via Gaussian Graphical Models

Real networks

- Many scientific fields:
  - World Wide Web
  - Biology, sociology, physics
- Nature of the data under study:
  - Interactions between N objects
  - O(N^2) possible interactions
- Network topology:
  - Describes the way nodes interact; structure/function relationship

Figure: Sample of 250 blogs (nodes) with their links (edges) from the French political blogosphere.

Page 6: Biological Network Inference via Gaussian Graphical Models

What the reconstructed networks are expected to be¹ (1)
Regulatory networks

E. coli regulatory network
- relationships between genes and their products
- inhibition/activation
- impossible to recover at large scale
- always incomplete

Figure 1: Regulatory network of E. coli displayed with colors for each MixNet class (5 groups).

¹ and are presumably wrongly assumed to be

Page 7: Biological Network Inference via Gaussian Graphical Models

What the reconstructed networks are expected to be (2)
Regulatory networks

Figure: Regulatory network identified in mammalian cells: highly structured

Page 8: Biological Network Inference via Gaussian Graphical Models

What the reconstructed networks are expected to be (3)
Protein–protein interaction networks

Figure: Yeast PPI network: do not be misled by the representation; trust the statistics!



Page 12: Biological Network Inference via Gaussian Graphical Models

What are we looking at?

Central dogma of molecular biology

DNA → mRNA (transcription) → Proteins (translation); DNA also copies itself (replication).

Proteins
- are the building blocks of any cellular functionality,
- are encoded by the genes,
- do interact (at the protein and gene level – regulations).

Page 13: Biological Network Inference via Gaussian Graphical Models

What questions in functional genomics? (1)

Various levels/scales of study
- genome: sequence analysis,
- transcriptome: gene expression levels,
- proteome: protein functions and interactions.

Questions
1. Biological understanding: mechanisms of diseases; gene/protein functions and interactions.
2. Medical/clinical care: diagnosis (type of disease), prognosis (survival analysis), treatment (prediction of response).


Page 15: Biological Network Inference via Gaussian Graphical Models

What questions in functional genomics? (2)

Central dogma of molecular biology

DNA → mRNA (transcription) → Proteins (translation); DNA also copies itself (replication).

Basic biostatistical issues
- Selecting some genes of interest (biomarkers),
- Looking for interactions between them (pathway analysis).

Page 16: Biological Network Inference via Gaussian Graphical Models

How is this measured? (1)
Microarray technology: parallel measurement of many biological features

After signal processing and pretreatment, we obtain a matrix of features with n ≪ p:

  X = [ x_1^1  x_1^2  x_1^3  ...  x_1^p ]
      [  ...                       ...  ]
      [ x_n^1  x_n^2  x_n^3  ...  x_n^p ]

Expression levels of p probes are simultaneously monitored for n individuals.

Page 17: Biological Network Inference via Gaussian Graphical Models

How is this measured? (2)
Next Generation Sequencing: parallel measurement of even many more biological features

After assembling and pretreatment, we obtain a matrix of features with n ≪ p:

  X = [ k_1^1  k_1^2  k_1^3  ...  k_1^p ]
      [  ...                       ...  ]
      [ k_n^1  k_n^2  k_n^3  ...  k_n^p ]

Expression counts are extracted from small repeated sequences and monitored for n individuals.

Page 18: Biological Network Inference via Gaussian Graphical Models

What questions are we dealing with? (1)
Supervised canonical example at the gene level: differential analysis

Leukemia (Golub data, thanks to P. Neuvial)
- AML – Acute Myeloblastic Leukemia, n1 = 11,
- ALL – Acute Lymphoblastic Leukemia, n2 = 27,
- an (n1 + n2)-vector of outcomes giving each patient's tumor type.

Supervised classification: find genes with significantly different expression levels between the groups – biomarkers (prediction purpose).
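As a toy illustration of this gene-by-gene differential analysis (not the actual Golub data: the matrix, effect sizes, and the 20 "truly differential" genes below are simulated, only the group sizes n1 = 11 and n2 = 27 come from the slide), a minimal sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated stand-in for Golub-style data: n1 + n2 patients, p genes.
n1, n2, p = 11, 27, 1000
X_aml = rng.normal(0.0, 1.0, size=(n1, p))
X_all = rng.normal(0.0, 1.0, size=(n2, p))
X_all[:, :20] += 2.0  # make the first 20 genes truly differential

# One Welch t-test per gene, then a Bonferroni-corrected selection.
t, pval = stats.ttest_ind(X_aml, X_all, axis=0, equal_var=False)
biomarkers = np.flatnonzero(pval < 0.05 / p)
print(len(biomarkers), "candidate biomarkers")
```

With these settings most of the selected genes fall among the 20 truly shifted ones; the Bonferroni correction keeps the 980 null genes out.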

Page 19: Biological Network Inference via Gaussian Graphical Models

What questions are we dealing with? (2)
Unsupervised canonical example at the gene level: hierarchical clustering

Same kind of data, but no outcome is considered.

(Unsupervised) clustering: find groups of genes which show statistical dependencies/commonalities – hoping for biological interactions (exploratory purpose, functional understanding).

Can we do better than that? And how do genes interact anyway?
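A minimal hierarchical-clustering sketch in the same spirit, on a made-up expression matrix containing two gene modules (all sizes and the correlation-based dissimilarity are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)

# Toy expression matrix: 30 genes x 20 samples, two correlated blocks.
base1, base2 = rng.normal(size=20), rng.normal(size=20)
genes = np.vstack([base1 + 0.3 * rng.normal(size=(15, 20)),
                   base2 + 0.3 * rng.normal(size=(15, 20))])

# Correlation-based dissimilarity, then average-linkage clustering.
d = pdist(genes, metric="correlation")  # 1 - Pearson correlation
Z = linkage(d, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Cutting the tree at two clusters recovers the two blocks of co-expressed genes; note that this only groups statistically similar genes, it says nothing yet about how they interact.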



Page 22: Biological Network Inference via Gaussian Graphical Models

The problem at hand

Inference: from ≈ tens to hundreds of microarray/sequencing experiments over ≈ thousands of probes (“genes”), infer the network.

Modeling questions prior to inference
1. What do the nodes represent? (the easiest one)
2. What is/should be the meaning of an edge? (the toughest one)
   - Biologically?
   - Statistically?


Page 26: Biological Network Inference via Gaussian Graphical Models

More questions/issues

Modelling
- Is the network dynamic or static?
- How have the data been generated? (time-course/steady state)
- Are the edges oriented or not? (causality)
- What do the edges represent for my particular problem?

Statistical challenges
- (Ultra) high dimensionality,
- Noisy data, lack of reproducibility,
- Heterogeneity of the data (many techniques, various signals).



Page 29: Biological Network Inference via Gaussian Graphical Models

Canonical model settings
Biological microarrays in comparable conditions

Notations
1. a set P = {1, …, p} of p variables: these are typically the genes (they could be proteins);
2. a sample N = {1, …, n} of individuals associated with the variables: these are typically the microarrays (they could be sequence counts).

Basic statistical model. This can be viewed as
- a random vector X in R^p, whose j-th entry is the j-th variable,
- an n-sample (X_1, …, X_n), such that X_i is the i-th microarray,
  - the X_i could be independent identically distributed copies (steady-state data),
  - or dependent in a certain way (time-course data),
- assuming a parametric probability distribution for X (Gaussian).


Page 31: Biological Network Inference via Gaussian Graphical Models

Canonical model settings
Biological microarrays in comparable conditions

The data: stacking (X_1, …, X_n), we meet the usual individual/variable table

  X = [ x_1^1  x_1^2  x_1^3  ...  x_1^p ]
      [  ...                       ...  ]
      [ x_n^1  x_n^2  x_n^3  ...  x_n^p ]


Page 33: Biological Network Inference via Gaussian Graphical Models

Modeling relationships between variables (1)
Independence

Definition (Independence of events)
Two events A and B are independent if and only if

  P(A, B) = P(A) P(B),

which is usually denoted by A ⊥⊥ B. Equivalently,
- A ⊥⊥ B ⇔ P(A|B) = P(A),
- A ⊥⊥ B ⇔ P(A|B) = P(A|Bᶜ).

Example (class vs party)

  class \ party   Labour  Tory        class \ party   Labour  Tory
  working          0.42   0.28        working          0.60   0.40
  bourgeoisie      0.06   0.24        bourgeoisie      0.20   0.80

Table: Joint probability (left) vs. conditional probability (right)
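The left table can be checked numerically. The short sketch below recomputes the marginals from the joint table of the slide, tests the product criterion, and recovers the conditional table on the right:

```python
import numpy as np

# Joint probabilities P(class, party) from the left table.
joint = np.array([[0.42, 0.28],    # working:     Labour, Tory
                  [0.06, 0.24]])   # bourgeoisie: Labour, Tory

p_class = joint.sum(axis=1)        # marginals: [0.70, 0.30]
p_party = joint.sum(axis=0)        # marginals: [0.48, 0.52]

# Under independence the joint would equal the product of the marginals.
product = np.outer(p_class, p_party)
print(np.allclose(joint, product))  # → False: class and party are dependent

# Conditional probabilities P(party | class) reproduce the right-hand table.
cond = joint / p_class[:, None]
print(np.round(cond, 2))           # rows [0.6, 0.4] and [0.2, 0.8]
```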


Page 35: Biological Network Inference via Gaussian Graphical Models

Modeling relationships between variables (2)
Conditional independence

Generalizing to more than two events requires strong assumptions (mutual independence). It is better handled with:

Definition (Conditional independence of events)
Two events A and B are conditionally independent given C if and only if

  P(A, B | C) = P(A|C) P(B|C),

which is usually denoted by A ⊥⊥ B | C.

Example (Does IQ depend on weight?)
Consider the events A = “having a low IQ”, B = “having a low weight”.


Page 37: Biological Network Inference via Gaussian Graphical Models

Modeling relationships between variables (2)
Conditional independence

Example (Does IQ depend on weight?), continued. Consider the events A = “having a low IQ”, B = “having a low weight”. Estimating² P(A, B), P(A) and P(B) in a sample would lead to

  P(A, B) ≠ P(A) P(B).

² stupidly

Page 38: Biological Network Inference via Gaussian Graphical Models

Modeling relationships between variables (2)
Conditional independence

Example, continued. But in fact, introducing C = “having a given age”,

  P(A, B | C) = P(A|C) P(B|C).
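A numeric sketch of the same phenomenon with made-up probabilities: A ("low score") and B ("low weight") are built to be conditionally independent given age C, yet they come out marginally dependent because both are more likely for children:

```python
import numpy as np

# Hypothetical numbers, chosen only to illustrate the confounding effect.
p_c = np.array([0.5, 0.5])       # P(child), P(adult)
p_a = np.array([0.8, 0.2])       # P(A | child), P(A | adult)
p_b = np.array([0.9, 0.1])       # P(B | child), P(B | adult)

# Marginal joint of (A, B): sum over C of P(C) P(A|C) P(B|C),
# which is valid precisely because A ⊥⊥ B | C by construction.
p_ab = np.sum(p_c * p_a * p_b)   # = 0.5*0.8*0.9 + 0.5*0.2*0.1 = 0.37
p_a_marg = np.sum(p_c * p_a)     # P(A) = 0.5
p_b_marg = np.sum(p_c * p_b)     # P(B) = 0.5

print(p_ab, p_a_marg * p_b_marg)  # 0.37 vs 0.25: marginally dependent
```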

Page 39: Biological Network Inference via Gaussian Graphical Models

Independence of random vectors (1)
Independence and conditional independence: natural generalization

Definition
Consider 3 random vectors X, Y, Z with distributions f_X, f_Y, f_Z, and joint distributions f_XY, f_XYZ. Then
- X and Y are independent iff f_XY(x, y) = f_X(x) f_Y(y);
- X and Y are conditionally independent given Z iff, for all z such that f_Z(z) > 0, f_{XY|Z}(x, y; z) = f_{X|Z}(x; z) f_{Y|Z}(y; z).

Proposition (Factorization criterion)
X and Y are independent (resp. conditionally independent given Z) iff there exist functions g and h such that, for all x and y,
1. f_XY(x, y) = g(x) h(y),
2. f_XYZ(x, y, z) = g(x, z) h(y, z), for all z with f_Z(z) > 0.


Page 41: Biological Network Inference via Gaussian Graphical Models

Independence of random vectors (2)
Independence vs conditional independence

Figure: Mutual independence, conditional dependence, full dependence — densities f with X ⊥⊥ Z | Y, X ⊥⊥ Y | Z, Y ⊥⊥ Z | X, a general f_XYZ, and the fully independent f_X f_Y f_Z.

Page 42: Biological Network Inference via Gaussian Graphical Models

OutlineIntroduction

MotivationsBackground on omicsModeling issue

Modeling toolsStatistical dependenceGraphical modelsCovariance selection and Gaussian vector

Gaussian Graphical Models for genomic dataSteady-state dataTime-course data

Statistical inferencePenalized likelihood approachInducing sparsity and regularizationThe Lasso

Application in Post-genomicsModeling time-course dataIllustrationsMultitask learning

Network inference 27

Page 43: Biological Network Inference via Gaussian Graphical Models

Definition

Definition. A graphical model gives a graphical (intuitive) representation of the dependence structure of a probability distribution.

Graphical structure ↔ random variables / random vector

It links
1. a random vector (or a set of random variables) X = {X_1, …, X_p} with distribution P,
2. a graph G = (P, E) where
   - P = {1, …, p} is the set of nodes, one per variable,
   - E is a set of edges describing the dependence relationships of X ∼ P.


Page 45: Biological Network Inference via Gaussian Graphical Models

Conditional Independence Graphs
Definition

Definition. The conditional independence graph of a random vector X is the undirected graph G = (P, E), with set of nodes P = {1, …, p}, where

  (i, j) ∉ E ⇔ X_i ⊥⊥ X_j | X_{P \ {i, j}}.

Property. It satisfies the Markov property: any two subsets of variables separated by a third are independent conditionally on the variables in the third set.


Page 47: Biological Network Inference via Gaussian Graphical Models

Conditional Independence Graphs
An example

Let X_1, X_2, X_3, X_4 be four random variables with joint probability density function f_X(x) = exp(u + x_1 + x_1 x_2 + x_2 x_3 x_4), with u a given constant.

Apply the factorization property:

  f_X(x) = exp(u + x_1 + x_1 x_2 + x_2 x_3 x_4)
         = exp(u) · exp(x_1 + x_1 x_2) · exp(x_2 x_3 x_4)

Graphical representation: G = (P, E) such that P = {1, 2, 3, 4} and

  E = {(1, 2), (2, 3), (2, 4), (3, 4)}.
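This structure can be verified numerically. The sketch below restricts the variables to {0, 1} (an assumption made only to allow exact enumeration; the slide's density is over R^4) and checks the absence of edge (1, 3): X_1 ⊥⊥ X_3 given all remaining variables {X_2, X_4}:

```python
import itertools
import numpy as np

# Unnormalized density from the slide (u absorbed in the normalization),
# restricted to binary variables for tractability.
def phi(x1, x2, x3, x4):
    return np.exp(x1 + x1 * x2 + x2 * x3 * x4)

states = list(itertools.product([0, 1], repeat=4))
z = sum(phi(*s) for s in states)              # normalizing constant
p = {s: phi(*s) / z for s in states}

# No edge (1, 3): P(x1, x3 | x2, x4) must factorize for every (x2, x4).
for x2, x4 in itertools.product([0, 1], repeat=2):
    p24 = sum(p[(a, x2, b, x4)] for a in (0, 1) for b in (0, 1))
    for x1, x3 in itertools.product([0, 1], repeat=2):
        cond = p[(x1, x2, x3, x4)] / p24                     # P(x1,x3 | x2,x4)
        m1 = sum(p[(x1, x2, b, x4)] for b in (0, 1)) / p24   # P(x1 | x2,x4)
        m3 = sum(p[(a, x2, x3, x4)] for a in (0, 1)) / p24   # P(x3 | x2,x4)
        assert np.isclose(cond, m1 * m3)
print("X1 ⊥⊥ X3 | {X2, X4}: verified")
```

The assertion holds because f_X splits into a factor in (x_1, x_2) and a factor in (x_2, x_3, x_4), exactly the factorization criterion above.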


Page 51: Biological Network Inference via Gaussian Graphical Models

Directed Acyclic conditional independence Graph (DAG)
Motivation

Limitation of undirected graphs. Sometimes an ordering on the variables is known, which allows one to break the symmetry of the graphical representation and introduce, in some sense, “causality” into the modeling.

Consequences
- Each element of E has to be directed.
- There is no directed cycle in the graph.

We thus deal with a directed acyclic graph (or DAG).

Page 52: Biological Network Inference via Gaussian Graphical Models

Directed Acyclic conditional independence Graph (DAG)
Definition

Definition (Ordering)
An ordering of the variables {1, …, p} is a relation ≺ such that: i) for every pair (i, j), either i ≺ j or j ≺ i; ii) ≺ is transitive; iii) ≺ is not reflexive.

- A natural ordering is obtained when variables are observed across time,
- A natural conditioning set for a pair of variables (i, j) with i ≺ j is the past of j, denoted P(j) = {1, …, j}.

Definition (DAG)
The directed conditional dependence graph of X is the directed graph G = (P, E_≺) where, for (i, j) such that i ≺ j,

  (i, j) ∉ E ⇔ X_j ⊥⊥ X_i | X_{P(j) \ {i, j}}.


Page 54: Biological Network Inference via Gaussian Graphical Models

Directed Acyclic conditional independence Graph (DAG)
Factorization and Markov property

Another view uses parent/descendant relationships to deal with the ordering of the nodes.

The factorization property:

  f_X(x) = ∏_{k=1}^{p} f_{X_k | pa_k}(x_k | pa_k),

where pa_k are the parents of node k.

Page 55: Biological Network Inference via Gaussian Graphical Models

Directed Acyclic conditional independence Graph (DAG)
An example

Figure: a DAG over x_1, …, x_7, with edges 1→4, 2→4, 3→4, 1→5, 3→5, 4→6, 4→7, 5→7, matching the factorization

  f_X(x) = f_{X_1} f_{X_2} f_{X_3} f_{X_4|X_1,X_2,X_3} f_{X_5|X_1,X_3} f_{X_6|X_4} f_{X_7|X_4,X_5}.


Page 63: Biological Network Inference via Gaussian Graphical Models

Directed Acyclic conditional independence Graph (DAG)
Markov property

Local Markov property. For any Y among the non-descendants of k (Y ∉ de_k, where de_k are the descendants of k),

  X_k ⊥⊥ Y | pa_k,

that is, X_k is conditionally independent of its non-descendants given its parents.

Page 64: Biological Network Inference via Gaussian Graphical Models

Local Markov property: example

Figure: a DAG over x_1, …, x_5 (see the factorization used below).

Check that x_4 ⊥⊥ x_5 | {x_2, x_3}, using the factorization property:

  P(x_4 | x_5, x_2, x_3) = P(x_2, x_3, x_4, x_5) / P(x_2, x_3, x_5)
                         = [P(x_2) P(x_3) P(x_4|x_2, x_3) P(x_5|x_3)] / [P(x_2) P(x_3) P(x_5|x_3)]
                         = P(x_4 | x_2, x_3).
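The same computation can be checked numerically on randomly drawn conditional probability tables for the sub-DAG used above (binary variables; all the numbers are random and purely illustrative):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Random CPTs for the structure x4 <- {x2, x3}, x5 <- x3.
p2 = rng.uniform(0.2, 0.8)               # P(x2 = 1)
p3 = rng.uniform(0.2, 0.8)               # P(x3 = 1)
p4 = rng.uniform(0.2, 0.8, size=(2, 2))  # P(x4 = 1 | x2, x3)
p5 = rng.uniform(0.2, 0.8, size=2)       # P(x5 = 1 | x3)

def bern(p, x):                          # P(X = x) for X ~ Bernoulli(p)
    return p if x == 1 else 1 - p

def joint(x2, x3, x4, x5):               # the factorization above
    return (bern(p2, x2) * bern(p3, x3)
            * bern(p4[x2, x3], x4) * bern(p5[x3], x5))

# x4 ⊥⊥ x5 | {x2, x3}: conditioning on x5 never changes P(x4 | x2, x3).
for x2, x3, x5 in itertools.product([0, 1], repeat=3):
    denom = joint(x2, x3, 0, x5) + joint(x2, x3, 1, x5)
    lhs = joint(x2, x3, 1, x5) / denom   # P(x4 = 1 | x2, x3, x5)
    assert np.isclose(lhs, p4[x2, x3])
print("x4 ⊥⊥ x5 | {x2, x3}: verified")
```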



Page 67: Biological Network Inference via Gaussian Graphical Models

Modeling the genomic data
Gaussian assumption

The data:

  X = [ x_1^1  x_1^2  x_1^3  ...  x_1^p ]
      [  ...                       ...  ]
      [ x_n^1  x_n^2  x_n^3  ...  x_n^p ]

Assuming f_X multivariate Gaussian greatly simplifies the inference:
- it naturally links independence and conditional independence to the covariance and partial covariance,
- it gives a straightforward interpretation to the graphical modeling previously considered.
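A glimpse of what the Gaussian assumption buys: for a multivariate Gaussian, conditional independence between two variables given all the others corresponds to a zero in the precision matrix Θ = Σ⁻¹ (a standard Gaussian graphical model fact; the 3-variable chain below is a made-up example):

```python
import numpy as np

# A sparse precision matrix encodes the conditional independence graph:
# theta[i, j] == 0  <=>  X_i ⊥⊥ X_j given all the other variables.
theta = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])   # chain graph 1 - 2 - 3
sigma = np.linalg.inv(theta)

# X1 and X3 are marginally correlated (through X2)...
print(sigma[0, 2])                       # ≈ 0.25, nonzero

# ...but their partial correlation, read off the precision matrix as
# -theta[i, j] / sqrt(theta[i, i] * theta[j, j]), vanishes.
d = np.sqrt(np.diag(theta))
partial_cor = -theta / np.outer(d, d)
print(partial_cor[0, 2])                 # zero: no edge (1, 3)
```

This is why sparse estimation of Θ (the penalized-likelihood/Lasso approaches announced in the outline) recovers the conditional independence graph.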


Page 69: Biological Network Inference via Gaussian Graphical Models

Start gently with the univariate Gaussian distribution

The Gaussian distribution is the natural model for the expression level of a gene (noisy data).

We note X ∼ N(µ, σ²), so that E X = µ, Var X = σ², and

f_X(x) = 1/(σ√(2π)) · exp{ −(x − µ)² / (2σ²) },

and

log f_X(x) = −log(σ√(2π)) − (x − µ)² / (2σ²).

Useless for modeling the distribution of the expression levels of a whole set of genes.

Network inference 39


Page 71: Biological Network Inference via Gaussian Graphical Models

One step forward: the bivariate Gaussian distribution
Needs the concepts of covariance and correlation

Let X, Y be two real random variables.

Definitions

cov(X, Y) = E[(X − E X)(Y − E Y)] = E(XY) − E(X) E(Y).

ρ_XY = cor(X, Y) = cov(X, Y) / √(Var(X) · Var(Y)).

Proposition

- cov(X, X) = Var(X) = E[(X − E X)²],
- cov(X + Y, Z) = cov(X, Z) + cov(Y, Z),
- Var(X + Y) = Var(X) + Var(Y) + 2 cov(X, Y),
- X ⊥⊥ Y ⇒ cov(X, Y) = 0,
- X ⊥⊥ Y ⇔ cov(X, Y) = 0 when X, Y are Gaussian.

Network inference 40


Page 73: Biological Network Inference via Gaussian Graphical Models

The bivariate Gaussian distribution

f_XY(x, y) = 1/(2π √(det Σ)) · exp{ −½ (x − µ1, y − µ2) Σ⁻¹ (x − µ1, y − µ2)ᵀ },

where Σ is the variance/covariance matrix, which is symmetric and positive definite:

Σ = ⎛ Var(X)     cov(X, Y) ⎞
    ⎝ cov(X, Y)  Var(Y)    ⎠.

If standardized,

Σ = ⎛ 1     ρ_XY ⎞
    ⎝ ρ_XY  1    ⎠,

and

f_X,Y(x, y) = 1/(2π √(1 − ρ²_XY)) · exp{ −(x² + y² − 2 ρ_XY x y) / (2(1 − ρ²_XY)) },

where ρ_XY is the correlation between X and Y and describes the interaction between them.

Network inference 41


Page 75: Biological Network Inference via Gaussian Graphical Models

The bivariate Gaussian distribution
The Covariance Matrix

Let X ∼ N(0, Σ), with unit variances and ρ_XY = 0:

Σ = ⎛ 1  0 ⎞
    ⎝ 0  1 ⎠.

The shape of the 2-D distribution evolves accordingly (figure).

Network inference 42

Page 76: Biological Network Inference via Gaussian Graphical Models

The bivariate Gaussian distribution
The Covariance Matrix

Let X ∼ N(0, Σ), with unit variances and ρ_XY = 0.9:

Σ = ⎛ 1    0.9 ⎞
    ⎝ 0.9  1   ⎠.

The shape of the 2-D distribution evolves accordingly (figure).

Network inference 42

Page 77: Biological Network Inference via Gaussian Graphical Models

Full generalization: multivariate Gaussian vector
Now need partial covariance and partial correlation

Let X, Y, Z be real random variables.

Definitions

cov(X, Y | Z) = cov(X, Y) − cov(X, Z) cov(Y, Z) / Var(Z).

ρ_XY|Z = (ρ_XY − ρ_XZ ρ_YZ) / ( √(1 − ρ²_XZ) √(1 − ρ²_YZ) ).

These give the interaction between X and Y once the effect of Z has been removed.

Proposition
When X, Y, Z are jointly Gaussian, then

cov(X, Y | Z) = 0 ⇔ cor(X, Y | Z) = 0 ⇔ X ⊥⊥ Y | Z.

Network inference 43
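As a numerical sanity check (a sketch of mine, not from the slides: the simulated data and the regression helper are assumptions), the formula for ρ_XY|Z above agrees with correlating the residuals of X and Y after regressing each on Z:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# X and Y are both driven by Z plus independent noise,
# so X and Y are correlated but X is independent of Y given Z by construction.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = -0.5 * z + rng.normal(size=n)

def cor(a, b):
    return np.corrcoef(a, b)[0, 1]

# Partial correlation from the slide's formula
r_xy, r_xz, r_yz = cor(x, y), cor(x, z), cor(y, z)
rho_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Same quantity as the correlation of least-squares residuals
def residual(a, b):
    """Residual of the regression of a on b (with intercept)."""
    bc = b - b.mean()
    slope = np.dot(a - a.mean(), bc) / np.dot(bc, bc)
    return a - a.mean() - slope * bc

rho_resid = cor(residual(x, z), residual(y, z))
print(rho_xy_z, rho_resid)  # equal, and both close to 0 here
```

The two computations coincide exactly in-sample, which is the residual-based reading of partial correlation.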


Page 79: Biological Network Inference via Gaussian Graphical Models

The multivariate Gaussian distribution

Allows modeling the expression level of a whole set of genes P:

Gaussian vector
Let X ∼ N(µ, Σ), and consider any block decomposition with {a, b} a partition of P:

Σ = ⎛ Σaa  Σab ⎞
    ⎝ Σba  Σbb ⎠.

Then

1. Xa is Gaussian with distribution N(µa, Σaa);
2. Xa | Xb = x is Gaussian with distribution N(µ_a|b, Σ_a|b), with known parameters.

Network inference 44
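Point 2 states that the conditional distribution is "known"; for reference, the standard Gaussian conditioning formulas (not spelled out on the slide) are:

```latex
\mu_{a|b} = \mu_a + \Sigma_{ab}\,\Sigma_{bb}^{-1}\,(x_b - \mu_b),
\qquad
\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\,\Sigma_{bb}^{-1}\,\Sigma_{ba}.
```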


Page 82: Biological Network Inference via Gaussian Graphical Models

Steady-state data: scheme

≈ 10s of microarrays
≈ 1000s of probes ("genes")

Inference: which interactions?

Network inference 47

Page 83: Biological Network Inference via Gaussian Graphical Models

Modeling the underlying distribution (1)

Model for data generation

- A microarray can be represented as a multivariate vector X = (X1, …, Xp) ∈ ℝᵖ;
- consider n biological replicates in the same condition, which form a usual n-size sample (X¹, …, Xⁿ).

Consequence: a Gaussian Graphical Model

- X ∼ N(µ, Σ), with X¹, …, Xⁿ i.i.d. copies of X;
- Θ = (θij)_{i,j ∈ P} = Σ⁻¹ is called the concentration matrix.

Network inference 48


Page 85: Biological Network Inference via Gaussian Graphical Models

Modeling the underlying distribution (2)
Interpretation as a GGM

Multivariate Gaussian vector and covariance selection

−θij / √(θii θjj) = cor(Xi, Xj | X_{P\{i,j}}) = ρ_{ij | P\{i,j}}.

Graphical interpretation
The matrix Θ = (θij)_{i,j ∈ P} encodes the network G we are looking for:

conditional dependency between Xi and Xj
(or non-null partial correlation between Xi and Xj)
⇔ θij ≠ 0
⇔ edge i — j in G.

Network inference 49
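A small numerical check of the identity above (the covariance values below are arbitrary, not from the lecture): for p = 3, the quantity −θ₁₂/√(θ₁₁θ₂₂) matches the partial correlation computed from pairwise correlations.

```python
import numpy as np

# An arbitrary 3x3 symmetric positive-definite covariance matrix
Sigma = np.array([[2.0, 0.6, 0.4],
                  [0.6, 1.5, 0.5],
                  [0.4, 0.5, 1.0]])
Theta = np.linalg.inv(Sigma)  # concentration matrix

# Partial correlation of X1, X2 given X3, from the concentration matrix
rho_12_3 = -Theta[0, 1] / np.sqrt(Theta[0, 0] * Theta[1, 1])

# Same quantity from pairwise correlations (recursion formula)
D = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(D, D)
rho_alt = (R[0, 1] - R[0, 2] * R[1, 2]) / np.sqrt((1 - R[0, 2]**2) * (1 - R[1, 2]**2))

print(rho_12_3, rho_alt)  # identical up to floating-point error
```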



Page 88: Biological Network Inference via Gaussian Graphical Models

Time-course data: scheme

t0, t1, …, tn

≈ 10s of microarrays over time
≈ 1000s of probes ("genes")

Inference: which interactions?

Network inference 51

Page 89: Biological Network Inference via Gaussian Graphical Models

Modeling time-course data with DAG

Collecting gene expression

1. Follow-up of one single experiment/individual;
2. time-points close enough to ensure
   - dependency between consecutive measurements;
   - homogeneity of the Markov process.

(Figure: a network G on X1, …, X5, which stands for the two-slice graph with edges from the X_i^t to the X_j^{t+1}.)

Network inference 52


Page 91: Biological Network Inference via Gaussian Graphical Models

Modeling time-course data with DAG

Collecting gene expression

1. Follow-up of one single experiment/individual;
2. time-points close enough to ensure
   - dependency between consecutive measurements;
   - homogeneity of the Markov process.

(Figure: the sample unrolled over time, X¹, X², …, Xⁿ, with the same graph G linking each pair of consecutive time-points.)

Network inference 52

Page 92: Biological Network Inference via Gaussian Graphical Models

DAG: remark

(Figure, left: a graph on X1, …, X5 containing a cycle.) Argh, there is a cycle :'(

versus

(Figure, right: the same interactions unrolled over two time slices, with edges X_i^t → X_j^{t+1}.) This is indeed a DAG.

It overcomes the rather restrictive acyclicity requirement.

Network inference 53

Page 93: Biological Network Inference via Gaussian Graphical Models

Modeling the underlying distribution (1)

Model for data generation
A microarray can be represented as a multivariate vector X = (X1, …, Xp) ∈ ℝᵖ, generated through a first-order vector autoregressive process VAR(1):

X^t = Θ X^{t−1} + b + ε^t,   t ∈ [1, n],

where ε^t is a white noise ensuring the Markov property, and X⁰ ∼ N(0, Σ0).

Consequence: a Gaussian Graphical Model

- Each X^t | X^{t−1} ∼ N(Θ X^{t−1}, Σ);
- or, equivalently, X_j^t | X^{t−1} ∼ N(Θ_j X^{t−1}, Σ),

where Σ is known and Θ_j is the jth row of Θ.

Network inference 54


Page 95: Biological Network Inference via Gaussian Graphical Models

Modeling the underlying distribution (2)

Elementwise, the VAR(1) model reads

⎡ X_1^t ⎤   ⎡ θ11 … θ1j … θ1p ⎤ ⎡ X_1^{t−1} ⎤   ⎡ b1 ⎤   ⎡ ε_1^t ⎤
⎢   ⋮   ⎥   ⎢  ⋮         ⋮   ⎥ ⎢     ⋮     ⎥   ⎢ ⋮  ⎥   ⎢   ⋮   ⎥
⎢ X_i^t ⎥ = ⎢ θi1 … θij … θip ⎥ ⎢ X_j^{t−1} ⎥ + ⎢ bi ⎥ + ⎢ ε_i^t ⎥
⎢   ⋮   ⎥   ⎢  ⋮         ⋮   ⎥ ⎢     ⋮     ⎥   ⎢ ⋮  ⎥   ⎢   ⋮   ⎥
⎣ X_p^t ⎦   ⎣ θp1 … θpj … θpp ⎦ ⎣ X_p^{t−1} ⎦   ⎣ bp ⎦   ⎣ ε_p^t ⎦

Example:

Θ = ⎛ θ11  θ12  0 ⎞
    ⎜ θ21  0    0 ⎟
    ⎝ 0    θ32  0 ⎠

Network inference 55
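The sparse example above is easy to simulate; a minimal sketch (the coefficient values, noise level, and series length are my own assumptions, chosen so the process is stable):

```python
import numpy as np

rng = np.random.default_rng(42)

# Sparse VAR(1) coefficient matrix with the zero pattern of the example
Theta = np.array([[0.4, 0.3, 0.0],
                  [0.5, 0.0, 0.0],
                  [0.0, 0.6, 0.0]])
b = np.zeros(3)
n, p = 50, 3

X = np.zeros((n + 1, p))
X[0] = rng.normal(size=p)  # X^0 ~ N(0, Sigma0), here Sigma0 = I
for t in range(1, n + 1):
    # X^t = Theta X^{t-1} + b + white noise
    X[t] = Theta @ X[t - 1] + b + 0.1 * rng.normal(size=p)

print(X.shape)  # (51, 3): the initial state plus n time-points
```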

Page 96: Biological Network Inference via Gaussian Graphical Models

Modeling the underlying distribution (3)
Interpretation as a GGM

The VAR(1) as a covariance selection model

θij = cov( X_i^t, X_j^{t−1} | X_{P\j}^{t−1} ) / var( X_j^{t−1} | X_{P\j}^{t−1} ).

Graphical interpretation
The matrix Θ = (θij)_{i,j ∈ P} encodes the network G we are looking for:

conditional dependency between X_j^{t−1} and X_i^t
(or non-null partial correlation between X_j^{t−1} and X_i^t)
⇔ θij ≠ 0
⇔ edge j → i in G.

Network inference 56



Page 100: Biological Network Inference via Gaussian Graphical Models

The graphical models: reminder
(for goldfish-like memories)

Assumption
A microarray can be represented as a multivariate Gaussian vector X.

Collecting gene expression

1. Steady-state data leads to an i.i.d. sample.
2. Time-course data gives a time series.

Graphical interpretation
conditional dependency between X(i) and X(j)
(or non-null partial correlation between X(i) and X(j))
⇔ edge i — j,

encoded in an unknown matrix of parameters Θ.

Network inference 59


Page 102: Biological Network Inference via Gaussian Graphical Models

The graphical models: reminder
(for goldfish-like memories)

Assumption
A microarray can be represented as a multivariate Gaussian vector X.

Collecting gene expression

1. Steady-state data leads to an i.i.d. sample.
2. Time-course data gives a time series.

Graphical interpretation
conditional dependency between X_t(i) and X_{t−1}(j)
(or non-null partial correlation between X_t(i) and X_{t−1}(j))
⇔ edge j → i,

encoded in an unknown matrix of parameters Θ.

Network inference 59

Page 103: Biological Network Inference via Gaussian Graphical Models

The maximum likelihood estimator
The natural approach for parametric statistics

Let X be a random vector with distribution defined by f_X(x; Θ), where Θ are the model parameters.

Maximum likelihood estimator

Θ̂ = arg max_Θ L(Θ; X),

where L is the log-likelihood, a function of the parameters:

L(Θ; X) = log ∏_{k=1}^n f_X(x^k; Θ),

where x^k is the kth row of X.

Remarks

- This is a convex optimization problem;
- we just need to detect the nonzero coefficients in Θ.

Network inference 60

Page 104: Biological Network Inference via Gaussian Graphical Models

The penalized likelihood approach

Let Θ be the parameters to infer (the edges).

A penalized likelihood approach

Θ̂_λ = arg max_Θ L(Θ; X) − λ pen_ℓ1(Θ),

- L is the model log-likelihood,
- pen_ℓ1 is a penalty function tuned by λ > 0.

It performs
1. regularization (needed when n ≪ p),
2. selection (sparsity induced by the ℓ1-norm).

Network inference 61



Page 107: Biological Network Inference via Gaussian Graphical Models

A Geometric View of Sparsity
Constrained Optimization

(Figure: the surface f(β1, β2; X) over the (β1, β2) plane.)

We basically want to solve a problem of the form

maximize_{β1, β2} f(β1, β2; X),

where f is typically a concave likelihood function. This is strictly equivalent to solving

minimize_{β1, β2} g(β1, β2; X),

where g = −f is convex (for instance the squared loss in the OLS).

Network inference 63

Page 108: Biological Network Inference via Gaussian Graphical Models

A Geometric View of Sparsity
Constrained Optimization

(Figure: the same surface, now restricted to a constraint region.)

maximize_{β1, β2} f(β1, β2; X)   s.t.   Ω(β1, β2) ≤ c,

where Ω defines a domain that constrains β.

Network inference 63


Page 110: Biological Network Inference via Gaussian Graphical Models

A Geometric View of Sparsity
Constrained Optimization

maximize_{β1, β2} f(β1, β2; X)   s.t.   Ω(β1, β2) ≤ c,

where Ω defines a domain that constrains β. This is equivalent to

maximize_{β1, β2} f(β1, β2; X) − λ Ω(β1, β2).

How shall we define Ω to induce sparsity?

Network inference 63

Page 111: Biological Network Inference via Gaussian Graphical Models

A Geometric View of Sparsity
Supporting Hyperplane

A hyperplane supports a set iff

- the set is contained in one half-space,
- the set has at least one point on the hyperplane.

(Figure in the (β1, β2) plane.)

Network inference 64


Page 115: Biological Network Inference via Gaussian Graphical Models

A Geometric View of Sparsity
Supporting Hyperplane

A hyperplane supports a set iff

- the set is contained in one half-space,
- the set has at least one point on the hyperplane.

There are supporting hyperplanes at all points of a convex set: they generalize tangents.

Network inference 64


Page 117: Biological Network Inference via Gaussian Graphical Models

A Geometric View of Sparsity
Dual Cone

Generalizes normals.

(Figures: dual cones at several points of the constraint region in the (β1, β2) plane.)

Network inference 65


Page 120: Biological Network Inference via Gaussian Graphical Models

A Geometric View of Sparsity
Dual Cone

Generalizes normals.

(Figures: dual cones at several points of the constraint region.)

Shape of the dual cones ⇒ sparsity pattern.

Network inference 65


Page 122: Biological Network Inference via Gaussian Graphical Models

The LASSO

R. Tibshirani, 1996. The Lasso: Least Absolute Shrinkage and Selection Operator.

S. Chen, D. Donoho, M. Saunders, 1995. Basis Pursuit.

Weisberg, 1980. Forward Stagewise regression.

(Embedded excerpt, translated from the French:)

3.2. ℓp regularizations

(Figure 3.2: comparison of the solutions of problems regularized by an ℓ1 norm and by an ℓ2 norm, showing β^ls, β^ℓ1 and β^ℓ2 in the (β1, β2) plane.)

On the left of Figure 3.2, β^ℓ1 is the estimator of problem (3.2) regularized by an ℓ1 norm. The second component of β^ℓ1 is set to zero, because the ellipse reaches the admissible region at the corner lying on the axis β2 = 0. On the right of Figure 3.2, β^ℓ2 is the estimator of problem (3.2) regularized by an ℓ2 norm. The circular shape of the admissible region does not push the coefficients toward zero.

To pursue this discussion with arguments both simple and formal, one can write a coefficient of the estimators β^ℓ1 and β^ℓ2 when the matrix X is orthogonal (which corresponds to circular contours of the quadratic loss). For β^ℓ2 we have

β^ℓ2_m = β^ls_m / (1 + λ).

The coefficients undergo a proportional shrinkage through the factor 1/(1 + λ). In particular, β^ℓ2_m can be zero only if the coefficient β^ls_m is itself exactly zero. For β^ℓ1 we have

β^ℓ1_m = sign(β^ls_m) ( |β^ls_m| − λ )_+,

where [u]_+ = max(0, u). We thus obtain a "soft" thresholding: the least-squares coefficients are shrunk by a constant λ when |β^ls_m| > λ, and set to zero otherwise.

Stability

Definition 3.2 (Stability). Following Breiman [1996], a problem is unstable if, for similar but not identical training sets (small perturbations), one obtains very different predictions or estimators (a large perturbation).

Remark 3.5. Bousquet and Elisseeff [2002] gave formal definitions of several notions of stability, based on the behavior of the estimators when the training sample is perturbed by removing or replacing one example.

The Lasso problem:

minimize_{β ∈ ℝ²} ‖y − Xβ‖²₂   s.t.   ‖β‖₁ = |β1| + |β2| ≤ c
⇔
minimize_{β ∈ ℝ²} ‖y − Xβ‖²₂ + λ‖β‖₁.

Network inference 67

Page 123: Biological Network Inference via Gaussian Graphical Models

Orthogonal case and link to the OLS

OLS shrinkage
The Lasso has no analytical solution except in the orthogonal case: when XᵀX = I (never true for real data),

β̂_j^lasso = sign(β̂_j^ols) max(0, |β̂_j^ols| − λ).

(Figure: the soft-thresholding function β̂^lasso plotted against β̂^ols over [−4, 4], compared with the identity line of the OLS.)

Network inference 68
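In code, the orthogonal-case solution above is just soft thresholding; a minimal sketch (the function name is mine):

```python
import numpy as np

def soft_threshold(beta_ols, lam):
    """Lasso solution for each coefficient under an orthogonal design:
    zeros out small coefficients, shrinks the rest by lam."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

print(soft_threshold(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))
```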

Page 124: Biological Network Inference via Gaussian Graphical Models

LARS: Least Angle Regression

B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, 2004. Least Angle Regression.

An efficient algorithm to compute the Lasso solutions
The LARS output is a curve giving the solution β̂ for each value of λ:

- it constructs a piecewise-linear path of solutions, starting from the null vector and moving towards the OLS estimate,
- at (almost) the same cost as the OLS,
- and it is well adapted to cross-validation (which helps us choose λ).

Network inference 69

Page 125: Biological Network Inference via Gaussian Graphical Models

Example: prostate cancer I

Lasso solution path with lars

> library(lars)             # LARS/Lasso path algorithms
> load("prostate.rda")      # prostate cancer data: x (predictors), y (response)
> x <- scale(as.matrix(x))  # standardize the predictors
> out <- lars(x, y)         # compute the full Lasso solution path
> plot(out)

Network inference 70

Page 126: Biological Network Inference via Gaussian Graphical Models

Example: prostate cancer II

(Figure: the LASSO coefficient path produced by plot(out), standardized coefficients against |beta|/max|beta|, one piecewise-linear curve per predictor, with breakpoints where variables enter the model.)

Network inference 71

Page 127: Biological Network Inference via Gaussian Graphical Models

Choice of the tuning parameter λ I

Model selection criteria

BIC(λ) = ‖y − X β̂_λ‖²₂ − df(β̂_λ) (log n)/2,
AIC(λ) = ‖y − X β̂_λ‖²₂ − df(β̂_λ),

where df(β̂_λ) is the number of nonzero entries in β̂_λ.

Cross-validation

1. split the data into K folds,
2. use each fold successively as the testing set,
3. compute the test error on this fold,
4. average to obtain the CV estimate of the test error.

λ is chosen to minimize the CV test error.

Network inference 72
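The four cross-validation steps can be sketched generically; a minimal version (my own sketch: plain least squares stands in here for the Lasso fitted at a given λ, and the simulated data are assumptions):

```python
import numpy as np

def kfold_cv_mse(X, y, fit, predict, K=10, seed=0):
    """Generic K-fold cross-validated test error (steps 1-4 above)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, K)                      # step 1: K folds
    errs = []
    for k in range(K):                                  # step 2: each fold tests once
        test = folds[k]
        train = np.concatenate(folds[:k] + folds[k + 1:])
        model = fit(X[train], y[train])
        errs.append(np.mean((y[test] - predict(model, X[test])) ** 2))  # step 3
    return np.mean(errs)                                # step 4: average

# Example with ordinary least squares as the fitting routine
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
beta = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ beta + 0.1 * rng.normal(size=200)

cv_err = kfold_cv_mse(X, y,
                      fit=lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0],
                      predict=lambda m, A: A @ m)
print(cv_err)  # close to the noise variance 0.01
```

Running this over a grid of λ values (with a Lasso solver in place of `lstsq`) and keeping the minimizer is exactly the selection rule stated on the slide.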

Page 128: Biological Network Inference via Gaussian Graphical Models

Choice of the tuning parameter λ II

CV choice for λ

> cv.lars(x, y, K=10)

Network inference 73

Page 129: Biological Network Inference via Gaussian Graphical Models

Choice of the tuning parameter λ III

(Figure: cross-validated MSE, with error bars, as a function of the fraction of the final L1 norm, as produced by cv.lars.)

Network inference 74

Page 130: Biological Network Inference via Gaussian Graphical Models

Many variations

Group-Lasso
Activates the variables by groups (given by the user).

Adaptive/Weighted-Lasso
Adjusts the penalty level of each variable, according to prior knowledge or with data-driven weights.

BoLasso
A bootstrapped version that removes false positives and stabilizes the estimate.

etc., plus many theoretical results.

Network inference 75


Page 133: Biological Network Inference via Gaussian Graphical Models

Problem

t0, t1, …, tn

≈ 10s of microarrays over time
≈ 1000s of probes ("genes")

Inference: which interactions?

The main statistical issue is the high-dimensional setting.

Network inference 78

Page 134: Biological Network Inference via Gaussian Graphical Models

Handling the scarcity of the data
By introducing some priors

Priors should be biologically grounded:

1. few genes effectively interact (sparsity),
2. networks are organized (latent clustering).

(Figure: a network on genes G1, …, G13.)

Network inference 79


Page 136: Biological Network Inference via Gaussian Graphical Models

Handling the scarcity of the data
By introducing some priors

Priors should be biologically grounded:

1. few genes effectively interact (sparsity),
2. networks are organized (latent clustering).

(Figure: the same network with the nodes grouped into clusters A, B and C.)

Network inference 79

Page 137: Biological Network Inference via Gaussian Graphical Models

Penalized log-likelihood

Banerjee et al., JMLR 2008

Θ̂_λ = arg max_Θ L_iid(Θ; S) − λ‖Θ‖_ℓ1,

efficiently solved by the graphical Lasso of Friedman et al., 2008.

Ambroise, Chiquet, Matias, EJS 2009
Use adaptive penalty parameters for the different coefficients:

L̃_iid(Θ; S) − λ‖P_Z ⋆ Θ‖_ℓ1,

where P_Z is a matrix of weights depending on the underlying clustering Z.

Works with the pseudo log-likelihood (computationally efficient).

Network inference 80


Page 139: Biological Network Inference via Gaussian Graphical Models

Neighborhood selection (1)

Let

- X_i be the ith column of X,
- X_{\i} be X deprived of X_i.

X_i = X_{\i} β + ε,   where   β_j = −θ_ij / θ_ii.

Meinshausen and Bühlmann, 2006
Since sign(cor_{ij|P\{i,j}}) = sign(β_j), select the neighbors of i with

arg min_β  (1/n) ‖X_i − X_{\i} β‖²₂ + λ ‖β‖_ℓ1.

The sign pattern of Θ_λ is inferred after a symmetrization step.

Network inference 81
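A compact sketch of the procedure (my own implementation, not the authors' code: the coordinate-descent Lasso solver, the simulated data, and the "AND" symmetrization rule are assumptions):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                      # drop j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def neighborhood_selection(X, lam):
    """One Lasso regression per node; keep edge i-j when both node i's
    and node j's regressions select it (the 'AND' symmetrization rule)."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = np.array([j for j in range(p) if j != i])
        b = lasso_cd(X[:, others], X[:, i], lam)
        adj[i, others] = b != 0
    return adj & adj.T                               # symmetrization step

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
G = neighborhood_selection(X, lam=0.3)
print(G.shape)  # (5, 5), symmetric, empty diagonal
```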

Page 140: Biological Network Inference via Gaussian Graphical Models

Neighborhood selection (2)

The pseudo log-likelihood of the i.i.d. Gaussian sample is

L̃_iid(Θ; S) = Σ_{i=1}^p Σ_{k=1}^n log P( X_k(i) | X_k(P\i); Θ_i )
            = (n/2) log det(D) − (n/2) Trace( D^{−1/2} Θ S Θ D^{−1/2} ) − (n/2) log(2π),

where D = diag(Θ).

Proposition

Θ̂_λ^pseudo = arg max_Θ  L̃_iid(Θ; S) − λ ‖Θ‖_ℓ1

has the same null entries as inferred by neighborhood selection.

Network inference 82

Page 141: Biological Network Inference via Gaussian Graphical Models

Structured regularization
Introduce prior knowledge

Building the weights

1. Build w from prior biological information:
   - transcription factors vs. regulatees,
   - number of potential binding sites,
   - KEGG pathways, Gene Ontology, ...

2. Build the weights matrix from a clustering algorithm:
   - infer the network G0 with w = 1 for each node,
   - apply a clustering algorithm on G0,
   - re-infer G with w built according to the clustering Z.

Network inference 83
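Step 2 above can be sketched as follows; the cluster labels and the within/between penalty weights are purely hypothetical values chosen for illustration:

```python
import numpy as np

# Hypothetical cluster labels Z for p = 6 genes (two clusters)
Z = np.array([0, 0, 0, 1, 1, 1])

# Weight matrix P_Z: lighter penalty within clusters, heavier between them
# (the values 0.5 and 2.0 are illustrative, not taken from the slides)
w_in, w_out = 0.5, 2.0
PZ = np.where(Z[:, None] == Z[None, :], w_in, w_out)
np.fill_diagonal(PZ, 0.0)   # the diagonal of Theta is typically left unpenalized

print(PZ)
```

The entrywise product P_Z ⋆ Θ in the penalty then encourages edges within clusters and discourages edges between them.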

Page 142: Biological Network Inference via Gaussian Graphical Models

Structured regularization
"Bayesian" interpretation of \ell_1 regularization

The Laplacian prior on \Theta depends on the clustering Z:

P(\Theta | Z) \propto \prod_{j} \exp\{ -\lambda \cdot w_{kj}(Z) |\theta_{kj}| \}.

w_k(Z) summarizes prior information on the position of edges for node k.

[Figure: the prior distribution plotted against a_{kj}, for \lambda w_{kj}(Z) = 0.5 and \lambda w_{kj}(Z) = 1.]

Network inference 84

Page 143: Biological Network Inference via Gaussian Graphical Models

Outline

Introduction
  Motivations
  Background on omics
  Modeling issue

Modeling tools
  Statistical dependence
  Graphical models
  Covariance selection and Gaussian vector

Gaussian Graphical Models for genomic data
  Steady-state data
  Time-course data

Statistical inference
  Penalized likelihood approach
  Inducing sparsity and regularization
  The Lasso

Application in Post-genomics
  Modeling time-course data
  Illustrations
  Multitask learning

Network inference 85

Page 144: Biological Network Inference via Gaussian Graphical Models

Multiple Sclerosis
Studying treatment effect on 31 selected genes

Collaboration started with J.-C. Corvol, La Pitié-Salpêtrière (and C. Charbonnier, C. Ambroise).

The data set

- 31 genes identified as implicated in the MS physiopathology,

- each patient received either a placebo or the treatment at t = 0,

- gene expressions are recorded by microarray along time for each patient (time-course data),

- full data set of 100 microarrays, 25 patients (≈ 75%/25% treatment/placebo), 4 time points.

Comparing networks inferred under treatment or placebo.

Network inference 86

Page 145: Biological Network Inference via Gaussian Graphical Models

Multiple sclerosis
Detecting changes in the regulatory networks

[Figure: four networks over the selected genes (nodes include IL2RA, ITGAM, JAK1, STAT5A, STAT3, ITGA6, IL2RB, CD58, ITGB2, ITGAL, VCAM1, IL2, TOB1, JAK3, IL7, IL2RG, IL21, STAT5B, ICAM1 and ITGA4, some probes appearing several times): the network inferred under the active treatment, the one inferred under placebo, their intersection and their difference.]

- Better insight than differential analysis,

- networks try to tell us how genes vary together,

- hope that sometimes an edge describes regulations.

Network inference 87



Page 148: Biological Network Inference via Gaussian Graphical Models

E. coli S.O.S DNA repair network

E. coli S.O.S data

[Slide reproduces the first page of the source paper: Michal Ronen, Revital Rosenberg, Boris I. Shraiman and Uri Alon, "Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics", PNAS, vol. 99, no. 16, pp. 10555-10560, 2002.]

- 8 major genes involved in the S.O.S. response: uvrD, lexA, umuDC, recA, uvrA, uvrY, ruvA, polB,

- 50 time points.

Network inference 88

Page 149: Biological Network Inference via Gaussian Graphical Models

E. coli S.O.S DNA repair network
Precision and Recall rates

[Figure: precision and recall bars for four experiments (Exp. 1 to Exp. 4), comparing Lasso, Adaptive, G1DBN, Renet, KnownCl and InferCl.]

Network inference 89
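The precision and recall rates reported in this figure compare inferred edges against known ones; here is a small self-contained sketch of the computation (the toy adjacency matrices are made up):

```python
import numpy as np

def precision_recall(A_true, A_hat):
    """Edge-wise precision and recall between two undirected adjacency matrices."""
    iu = np.triu_indices_from(A_true, k=1)   # count each undirected edge once
    t, h = A_true[iu].astype(bool), A_hat[iu].astype(bool)
    tp = np.sum(t & h)                       # correctly recovered edges
    precision = tp / max(np.sum(h), 1)       # among inferred edges
    recall = tp / max(np.sum(t), 1)          # among true edges
    return float(precision), float(recall)

A_true = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A_hat  = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
print(precision_recall(A_true, A_hat))   # -> (0.5, 0.5)
```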

Page 150: Biological Network Inference via Gaussian Graphical Models

E. coli S.O.S DNA repair network
Inferred networks

[Figure: networks over the eight S.O.S. genes (uvrD, lexA, umuDC, recA, uvrA, uvrY, ruvA, polB) as inferred by the Lasso, with the known classification and with the inferred classification.]

Network inference 90

Page 151: Biological Network Inference via Gaussian Graphical Models

Reasonable computing time
Comparing computation time with the multi-tests approach

[Figure: log of CPU time (sec), from 1 sec. up to 4 hr., against log of p/n, for G1DBN, Renet-VAR and InfCl.]

Figure: Computing times on the log-log scale for Renet-VAR, G1DBN and InfCl (including inference of classes). Intel Dual Core 3.40 GHz processor.

Network inference 91

Page 152: Biological Network Inference via Gaussian Graphical Models

Outline

Introduction
  Motivations
  Background on omics
  Modeling issue

Modeling tools
  Statistical dependence
  Graphical models
  Covariance selection and Gaussian vector

Gaussian Graphical Models for genomic data
  Steady-state data
  Time-course data

Statistical inference
  Penalized likelihood approach
  Inducing sparsity and regularization
  The Lasso

Application in Post-genomics
  Modeling time-course data
  Illustrations
  Multitask learning

Network inference 92

Page 153: Biological Network Inference via Gaussian Graphical Models

Handling the scarcity of data (1)
By reducing the number of parameters

Assumption
Connections will only appear between informative genes.

1. Differential analysis: select p key genes P, with p "reasonable" compared to n (typically, n \in [p/5; 5p]).

2. The learning dataset: n size-p expression vectors (X_1, ..., X_n) with X_i \in R^p.

3. Inference.

Network inference 93

Page 154: Biological Network Inference via Gaussian Graphical Models

Handling the scarcity of data (2)
By collecting as many observations as possible

Multitask learning

How should we merge the data?

[Figure: one organism observed under drug 1, drug 2 and drug 3.]

Network inference 94

Page 155: Biological Network Inference via Gaussian Graphical Models

Handling the scarcity of data (2)
By collecting as many observations as possible

Multitask learning

By inferring each network independently: for each drug t = 1, 2, 3, a sample (X^{(t)}_1, ..., X^{(t)}_{n_t}) with X^{(t)}_i \in R^p, and one inference per sample.

Network inference 94

Page 156: Biological Network Inference via Gaussian Graphical Models

Handling the scarcity of data (2)
By collecting as many observations as possible

Multitask learning

By pooling all the available data: a single sample (X_1, ..., X_n), X_i \in R^p, with n = n_1 + n_2 + n_3, and a single inference.

Network inference 94
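Pooling as described here simply stacks the task samples before estimating one covariance matrix; a minimal numpy sketch (sizes and data are made up, and the observations are assumed centered):

```python
import numpy as np

# Three tasks over the same p genes, with different sample sizes n_t
rng = np.random.default_rng(3)
p, sizes = 4, [30, 20, 50]
tasks = [rng.standard_normal((n_t, p)) for n_t in sizes]

# Pooling: stack all samples, then compute one empirical covariance
X = np.vstack(tasks)                 # n = n_1 + n_2 + n_3 rows
S_pooled = X.T @ X / X.shape[0]      # valid for centered data
print(X.shape, S_pooled.shape)
```

This gains sample size but assumes all tasks share the same covariance, which the next slides relax.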

Page 157: Biological Network Inference via Gaussian Graphical Models

Handling the scarcity of data (2)
By collecting as many observations as possible

Multitask learning

By breaking the separability: keep the three samples (X^{(t)}_1, ..., X^{(t)}_{n_t}) with X^{(t)}_i \in R^p, but run a single, coupled inference.

Network inference 94

Page 158: Biological Network Inference via Gaussian Graphical Models

Coupling related problems

Consider

- T samples concerning the expressions of the same p genes,

- X^{(t)}_1, ..., X^{(t)}_{n_t} is the t-th sample, drawn from N(0_p, \Sigma^{(t)}), with empirical covariance matrix S^{(t)}.

Multiple samples setup

Ignoring the relationships between the tasks leads to

\arg\max_{\Theta^{(t)},\, t = 1, ..., T} \sum_{t=1}^{T} L(\Theta^{(t)}; S^{(t)}) - \lambda \, \mathrm{pen}_{\ell_1}(\Theta^{(t)}, Z).

Breaking the separability

- Either by modifying the objective function,

- or the constraints.

Network inference 95
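Since the objective above is separable in the Θ^(t), ignoring the task relationships amounts to running one graphical lasso per task. A minimal sketch using scikit-learn's `GraphicalLasso` (simulated tasks and an arbitrary penalty level, standing in for the lab's own R tools):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Three tasks (e.g. three drug conditions) over the same p genes
rng = np.random.default_rng(2)
p, sizes = 5, [40, 60, 50]
tasks = [rng.standard_normal((n_t, p)) for n_t in sizes]

# Separable objective: maximizing the penalized sum over t
# is exactly one independent graphical lasso per task
precisions = [GraphicalLasso(alpha=0.3).fit(X_t).precision_ for X_t in tasks]
print(len(precisions), precisions[0].shape)
```

A coupled (non-separable) penalty would instead tie the T estimates together, as the next slides illustrate.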



Page 161: Biological Network Inference via Gaussian Graphical Models

Coupling problems by grouping variables through penalty

[Figure: for four variables X1, ..., X4, the potential groups and the group(s) induced by the edges (1, 2), under the Group-Lasso and under the Cooperative-Lasso.]

Network inference 96
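The group-lasso penalty used here to couple tasks acts through a groupwise soft-thresholding operator: a whole group of coefficients (e.g. the same edge seen across several tasks) is shrunk or zeroed jointly. A minimal sketch (the group vector and penalty levels are illustrative):

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Proximal operator of the group-lasso penalty lam * ||v||_2:
    shrink the whole group towards zero, or set it exactly to zero."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)      # the whole group is killed at once
    return (1.0 - lam / norm) * v    # otherwise the group shrinks jointly

# Same edge (say, edge (1,2)) observed across three tasks, treated as one group
v = np.array([0.6, 0.0, 0.8])        # ||v||_2 = 1.0
print(group_soft_threshold(v, 0.5))  # shrunk jointly, group survives
print(group_soft_threshold(v, 1.5))  # whole group zeroed out
```

This joint in-or-out behavior is what makes the penalty select the same edge in all tasks simultaneously.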

Page 162: Biological Network Inference via Gaussian Graphical Models

Conclusion

Versatile Statistical Tools

- Gaussian Graphical Models

- Sparse Regression

- Model-Based Graph Clustering

Available Packages on CRAN

- simone to infer graphs from various kinds of data

- quadrupen to solve efficiently various penalized regression problems

- osbm and mixer to search for structure in graphs

Network inference 97

Page 163: Biological Network Inference via Gaussian Graphical Models

References

Christopher Bishop, 2006. Pattern Recognition and Machine Learning.

Joe Whittaker, 1996. Graphical Models in Applied Multivariate Statistics.

Trevor Hastie, Robert Tibshirani, Jerome Friedman, 2009. The Elements of Statistical Learning (2nd Edition).

Steffen Lauritzen, 1996. Graphical Models.

+ papers of our lab (see webpage) ;-)

Network inference 98