Probabilistic modelling in computational biology Dirk Husmeier Biomathematics & Statistics Scotland.

Page 1:

Probabilistic modelling in computational biology

Dirk Husmeier

Biomathematics & Statistics Scotland

Page 2:

James Watson & Francis Crick, 1953

Page 3:

Frederick Sanger, 1980

Page 7:

Network reconstruction from postgenomic data

Page 8:

Model Parameters θ

Page 9:

Friedman et al. (2000), J. Comp. Biol. 7, 601-620

Marriage between graph theory and probability theory

Page 10:

Bayes net

ODE model

Page 11:

Model Parameters θ

Probability theory Likelihood

Page 12:

Model Parameters θ

Bayesian networks: integral analytically tractable!
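The integral referred to here is presumably the marginal likelihood of a network structure, in which the parameters are integrated out. As a sketch in standard notation (the symbols D for the data and M for the network structure are assumptions, not taken from the slide; the parameters are written θ):

```latex
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta,
\qquad
P(M \mid D) \propto P(D \mid M)\, P(M)
```

With a conjugate prior on the parameters, the integral has a closed form, so a structure can be scored without estimating the parameters explicitly.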

Page 13:

UAI 1994

Page 14:

Identify the best network structure

Ideal scenario: Large data sets, low noise

Page 15:

Uncertainty about the best network structure

Limited number of experimental replications, high noise

Page 16:

Sample of high-scoring networks

Page 17:

Sample of high-scoring networks

Feature extraction, e.g. marginal posterior probabilities of the edges

High-confidence edge

High-confidence non-edge

Uncertainty about edges
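A minimal sketch of this feature extraction step, assuming each sampled network is given as a set of directed edges (i, j). The function name and the three toy networks are hypothetical and serve only as illustration. The marginal posterior probability of an edge is estimated as the fraction of sampled networks that contain it.

```python
def edge_posteriors(sampled_networks, n_nodes):
    """Estimate marginal posterior edge probabilities from a network sample.

    sampled_networks: list of sets of directed edges (i, j), 0-based nodes.
    Returns an n_nodes x n_nodes matrix of relative edge frequencies.
    """
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for edges in sampled_networks:
        for i, j in edges:
            counts[i][j] += 1
    n_samples = len(sampled_networks)
    return [[c / n_samples for c in row] for row in counts]


# Hypothetical sample of three networks over three nodes.
samples = [
    {(0, 1), (1, 2)},
    {(0, 1)},
    {(0, 1), (0, 2), (1, 2)},
]
posteriors = edge_posteriors(samples, n_nodes=3)
print(posteriors[0][1])  # edge present in every sample
print(posteriors[2][0])  # edge present in no sample
```

Entries close to 1 correspond to high-confidence edges, entries close to 0 to high-confidence non-edges, and intermediate values to edges that the data leave uncertain.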

Page 18:

[Figure: number of structures versus number of nodes]

Sampling with MCMC
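To illustrate how quickly the number of structures grows with the number of nodes, the sketch below counts labelled directed acyclic graphs with Robinson's recurrence. That the structures on this slide are DAGs is an assumption; the function name is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 6):
    print(n, count_dags(n))
# 1 1
# 2 3
# 3 25
# 4 543
# 5 29281
```

The super-exponential growth is why the structures are sampled rather than enumerated.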

Page 19:

Madigan & York (1995), Giudici & Castelo (2003)

Page 21:

Overview

• Introduction

• Limitations

• Methodology

• Application to morphogenesis

• Application to synthetic biology

Page 22:

Homogeneity assumption

Interactions don’t change with time

Page 23:

Limitations of the homogeneity assumption

Page 24:

Example: 4 genes, 10 time points

t1 t2 t3 t4 t5 t6 t7 t8 t9 t10

X(1) X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7 X1,8 X1,9 X1,10

X(2) X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7 X2,8 X2,9 X2,10

X(3) X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7 X3,8 X3,9 X3,10

X(4) X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7 X4,8 X4,9 X4,10

Page 25:

Supervised learning. Here: 2 components

t1 t2 t3 t4 t5 t6 t7 t8 t9 t10

X(1) X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7 X1,8 X1,9 X1,10

X(2) X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7 X2,8 X2,9 X2,10

X(3) X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7 X3,8 X3,9 X3,10

X(4) X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7 X4,8 X4,9 X4,10

Page 26:

Changepoint model

Parameters can change with time

Page 27:

Changepoint model

Parameters can change with time

Page 28:

t1 t2 t3 t4 t5 t6 t7 t8 t9 t10

X(1) X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7 X1,8 X1,9 X1,10

X(2) X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7 X2,8 X2,9 X2,10

X(3) X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7 X3,8 X3,9 X3,10

X(4) X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7 X4,8 X4,9 X4,10

Unsupervised learning. Here: 3 components

Page 29:

Extension of the model

θ

Page 30:

Extension of the model

θ

Page 31:

Extension of the model

θ: parameters

k: number of components (here: 3)

h: allocation vector

Page 32:

Analytically integrate out the parameters

θ: parameters

k: number of components (here: 3)

h: allocation vector
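A minimal sketch of how an allocation vector follows from a set of changepoints, for the example of 10 time points and 3 components. The convention that a changepoint marks the first time point of a new component, the changepoint positions 4 and 8, and the function name are all assumptions made for illustration.

```python
from bisect import bisect_right


def allocation_vector(n_timepoints, changepoints):
    """Assign each time point 1..n_timepoints to a component 1..k.

    changepoints: 1-based time indices at which a new component starts
    (an assumed convention, not taken from the slides).
    """
    starts = sorted(changepoints)
    return [bisect_right(starts, t) + 1 for t in range(1, n_timepoints + 1)]


h = allocation_vector(10, [4, 8])
k = len(set(h))
print(h)  # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
print(k)  # 3
```

Two changepoints give three components, and every time point is allocated to exactly one of them.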

Page 34:

P(network structure | changepoints, data)

P(changepoints | network structure, data)

Birth, death, and relocation moves

RJMCMC within Gibbs

Page 35:

Dynamic programming, complexity N²
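The transcript does not preserve which dynamic programming scheme this slide refers to. As a generic illustration only, and not the method of the talk, the sketch below shows a standard optimal-partitioning recursion for changepoints in a one-dimensional signal. It visits every pair (segment start, segment end) once, which gives a cost quadratic in the number of time points N. The squared-error segment cost, the penalty per segment, and all names are assumptions.

```python
def optimal_partition(x, penalty):
    """Return the 0-based indices at which a new segment starts."""
    n = len(x)
    # Prefix sums give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        # Sum of squared deviations from the mean over x[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t]: minimal penalised cost of x[:t]
    best[0] = -penalty      # so that the first segment carries no penalty
    prev = [0] * (n + 1)    # prev[t]: start of the last segment in the optimum
    for t in range(1, n + 1):   # N end points, up to N start points each
        best[t], prev[t] = min(
            (best[s] + cost(s, t) + penalty, s) for s in range(t)
        )

    # Backtrack through the stored segment starts.
    changepoints = []
    t = n
    while t > 0:
        t = prev[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)


signal = [0.0] * 5 + [10.0] * 5
print(optimal_partition(signal, penalty=1.0))
```

For this toy signal, the recursion places a single changepoint where the mean jumps.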

Page 37:

Circadian rhythms in Arabidopsis thaliana

Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)

- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3

- Transcriptional profiles at 4*13 time points in 2h intervals under constant light for 4 experimental conditions

Page 38:

Comparison with the literature

Precision: proportion of identified interactions that are correct

Recall = Sensitivity: proportion of true interactions that we successfully recovered

Specificity: proportion of non-interactions that are successfully avoided

Page 39:

Which interactions from the literature are found?

[Figure: network of CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, and PRR3. Blue: activations. Red: inhibitions. Edges are marked as true positive or false negative.]

True positives (TP) = 8

False negatives (FN) = 5

Recall = 8/13 = 62%

Page 40:

What proportion of predicted interactions is confirmed by the literature?

[Figure: predicted network. Blue: activations. Red: inhibitions. Edges are marked as true positive or false positive.]

True positives (TP) = 8

False positives (FP) = 13

Precision = 8/21 = 38%

Page 41:

[Figure: network of CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, and PRR3]

Precision = 38%

Recall = 62%

Page 42:

True positives (TP) = 8

False positives (FP) = 13

False negatives (FN) = 5

True negatives (TN) = 9² - 8 - 13 - 5 = 55

Sensitivity = TP/[TP+FN] = 62% (recall)

Specificity = TN/[TN+FP] = 81% (proportion of avoided non-interactions)
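The arithmetic on this slide can be checked with a few lines of Python. The counts are those given above for the 9 circadian genes; the function name is hypothetical.

```python
def confusion_metrics(tp, fp, fn, n_genes):
    """Precision, recall (sensitivity) and specificity for a directed network.

    All n_genes**2 ordered gene pairs count as candidate interactions,
    matching the 9² used on the slide.
    """
    tn = n_genes ** 2 - tp - fp - fn
    return {
        "tn": tn,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


metrics = confusion_metrics(tp=8, fp=13, fn=5, n_genes=9)
print(metrics["tn"])                        # 55
print(round(100 * metrics["precision"]))    # 38
print(round(100 * metrics["recall"]))       # 62
print(round(100 * metrics["specificity"]))  # 81
```

Specificity looks high mainly because most of the 81 candidate gene pairs are non-interactions, which is one reason to report precision alongside it.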

Page 43:

Model extension

So far: non-stationarity in the regulatory process

Page 44:

Non-stationarity in the network structure

Page 45:

Flexible network structure

Page 46:

Model Parameters θ

Page 47:

Model Parameters θ

Use prior knowledge!

Page 48:

Flexible network structure

Page 49:

Flexible network structure with regularization

Hyperparameter

Normalization factor

Page 50:

Flexible network structure with regularization

Exponential prior versus binomial prior with conjugate beta hyperprior

Page 51:

NIPS 2010

Page 52:

Overview

• Introduction

• Limitations

• Methodology

• Application to morphogenesis

• Application to synthetic biology

Page 53:

Morphogenesis in Drosophila melanogaster

• Gene expression measurements at 66 time points during the life cycle of Drosophila (Arbeitman et al., Science, 2002).

• Selection of 11 genes involved in muscle development.

Zhao et al. (2006), Bioinformatics 22

Page 54:

Can we learn the morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult?

Page 55:

Average posterior probabilities of transitions

Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult

Page 57:

Can we learn changes in the regulatory network structure?

Page 59:

Overview

• Introduction

• Limitations

• Methodology

• Application to morphogenesis

• Application to synthetic biology

Page 62:

Can we learn the switch Galactose → Glucose?

Can we learn the network structure?

Page 63:

Task 1: Changepoint detection

Switch of the carbon source: Galactose → Glucose

Page 64:

Galactose → Glucose

Page 65:

Task 2: Network reconstruction

Precision: proportion of identified interactions that are correct

Recall: proportion of true interactions that we successfully recovered

Page 66:

BANJO: Conventional homogeneous DBN

TSNI: Method based on differential equations

Inference: optimization, “best” network

Page 68:

Sample of high-scoring networks

Page 69:

Sample of high-scoring networks

Marginal posterior probabilities of the edges

P=1

P=0

P=0.5

Page 70:

P=1

True network

Thresh     0.9
Precision  1
Recall     1/2

[Plot: precision versus recall]

Page 71:

P=1 P=0.5

True network

Thresh     0.9   0.4
Precision  1     2/3
Recall     1/2   1

[Plot: precision versus recall]

Page 72:

P=1

P=0

P=0.5

True network

Thresh     0.9   0.4   -0.01
Precision  1     2/3   1/2
Recall     1/2   1     1

[Plot: precision versus recall]
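The threshold table can be reproduced with a small sketch. The network figure is not preserved in the transcript, so the four candidate edges and their posterior probabilities below are a hypothetical assignment chosen to be consistent with the table: two true edges (P=1 and P=0.5) and two false ones (P=0.5 and P=0). An edge is predicted when its marginal posterior probability exceeds the threshold.

```python
from fractions import Fraction


def precision_recall(edge_probs, true_edges, threshold):
    """Precision and recall when predicting all edges with probability > threshold."""
    predicted = {edge for edge, p in edge_probs.items() if p > threshold}
    tp = len(predicted & true_edges)
    precision = Fraction(tp, len(predicted)) if predicted else None
    recall = Fraction(tp, len(true_edges))
    return precision, recall


# Hypothetical toy example, consistent with the table above.
edge_probs = {
    ("A", "B"): 1.0,
    ("B", "C"): 0.5,
    ("A", "C"): 0.5,
    ("C", "A"): 0.0,
}
true_edges = {("A", "B"), ("B", "C")}

for threshold in (0.9, 0.4, -0.01):
    precision, recall = precision_recall(edge_probs, true_edges, threshold)
    print(threshold, precision, recall)
# 0.9 1 1/2
# 0.4 2/3 1
# -0.01 1/2 1
```

Lowering the threshold trades precision for recall, and sweeping it over all values traces out the precision-recall curve.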

Page 74:

Future work

Page 75:

How do we get from here …

Page 76:

… to there?!

Page 77:

Input:

Learn:

MCMC

Prior knowledge