Inferring ancestral states of the bZIP transcription factor interaction network

Combining the strengths of UMIST and The Victoria University of Manchester Inferring ancestral states of the bZIP transcription factor interaction network John Pinney Faculty of Life Sciences University of Manchester, UK


Page 1: Inferring ancestral states of the bZIP transcription factor interaction network

Combining the strengths of UMIST and The Victoria University of Manchester

Inferring ancestral states of the bZIP transcription factor interaction network

John Pinney

Faculty of Life Sciences

University of Manchester, UK

Page 2: Inferring ancestral states of the bZIP transcription factor interaction network

Networks in computational biology

• The genotype-phenotype relationship is mediated by many inter-related biochemical networks.

(Figure panels: protein interaction, gene regulation, metabolism, signal transduction)

Page 3: Inferring ancestral states of the bZIP transcription factor interaction network

Network evolution

• As our knowledge of large-scale network structures improves, we can start to ask questions about the evolution of cellular systems as a whole, instead of simply looking at phylogenetic trees for individual genes.

(Figure labels: species A, species B, species C, species D)

Page 4: Inferring ancestral states of the bZIP transcription factor interaction network

Network inference

• We would like to be able to predict ancestral interactions based only on observations of networks from extant species.

• The problem is compounded by the poor quality of high-throughput datasets (many false positives and negatives).

(Figure labels: species A, species B, species C, species D)

Page 5: Inferring ancestral states of the bZIP transcription factor interaction network

Network inference by probabilistic methods

• We can use a probabilistic methodology to combine multiple noisy observations of extant networks across several species.

• We would like to infer probabilities for “strong” interactions between every pair of proteins in each of the least common ancestors, as well as in the extant species.

(Figure labels: species A, species B, species C, species D; observed data, inferred networks)

Page 6: Inferring ancestral states of the bZIP transcription factor interaction network

bZIP transcription factors

• A useful model system for investigating methods for ancestral network inference!

• Family of homo- and hetero-dimerizing proteins.

• Involved in development, metabolism, circadian rhythm.

• bZIP domain consists of a basic region (contacting the DNA major groove) and a leucine zipper (LZ) mediating dimerization specificity.

Page 7: Inferring ancestral states of the bZIP transcription factor interaction network

bZIP transcription factors

• The different sub-families of bZIP proteins are known to have broadly conserved interactions with each other.

GD Amoutzias et al. (2007) Mol Biol Evol 24:827-835

Page 8: Inferring ancestral states of the bZIP transcription factor interaction network

bZIP interactions

• The relative strengths of pairwise interactions between bZIP proteins have been measured experimentally for human and yeast.

• In addition, the relatively simple biophysics of the coiled-coil interaction means that strong interactions can be predicted reliably from sequence data alone.

JRS Newman, AE Keating (2003) Science 300:2097-2101

(Darker colours show stronger interactions.) JH Fong, AE Keating, M Singh (2004) Genome Biol 5:R11

Page 9: Inferring ancestral states of the bZIP transcription factor interaction network

Genomic data

• Using sets of bZIP proteins from four chordate genomes, we construct a Maximum Likelihood phylogeny for the gene family with PAML.

• The software by Fong et al. can be used to predict interactions between the LZ regions for the extant genomes. The scores for each pair of proteins will be our “observations” of the networks.

(Figure: species tree with extant species Ciona, Human, Fugu and Danio, and ancestral nodes Teleost, Vertebrate and Chordate)

Page 10: Inferring ancestral states of the bZIP transcription factor interaction network

Reconciling gene and species trees

• To keep the analysis as simple as possible, we need to decide on a fixed set of proteins at each ancestral species.

• This can be done by “reconciling” our gene phylogeny with the known species tree using the NOTUNG software.

D Durand, BV Halldorsson, B Vernot (2006) J Comp Biol 13:320-335

Page 11: Inferring ancestral states of the bZIP transcription factor interaction network

From gene trees to interaction trees

• The model of network evolution is greatly simplified by converting to an alternative view, considering all possible interactions within a tree.
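The change of view described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes the reconciliation step has already produced a fixed set of proteins per species and a hypothetical `parent_protein` mapping from each protein to its ancestor in the parent species. Every potential interaction (an unordered pair of proteins from the same species, homodimers included) then has exactly one parent interaction, formed by the two ancestors.

```python
from itertools import combinations_with_replacement

def interaction_tree(proteins_by_species, parent_protein):
    """Map each potential interaction to its parent interaction.

    proteins_by_species: dict species -> list of protein names (assumed input)
    parent_protein: dict protein -> ancestral protein in the parent species
                    (assumed input; proteins of the root species are absent)
    Interactions are frozensets, so a homodimer X-X is frozenset({"X"}).
    """
    parent_interaction = {}
    for proteins in proteins_by_species.values():
        for a, b in combinations_with_replacement(sorted(proteins), 2):
            if a not in parent_protein or b not in parent_protein:
                continue  # root species: these interactions have no parent
            pair = frozenset((a, b))
            parent_interaction[pair] = frozenset((parent_protein[a], parent_protein[b]))
    return parent_interaction

# Toy example: X duplicates into X1 and X2, Y is inherited as Y1.
proteins = {"ancestor": ["X", "Y"], "descendant": ["X1", "X2", "Y1"]}
parents = {"X1": "X", "X2": "X", "Y1": "Y"}
tree = interaction_tree(proteins, parents)
```

In the toy example the interaction between the two duplicates X1 and X2 descends from the ancestral homodimer X-X, which is why duplications do not need special handling once the problem is phrased in terms of interactions.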

Page 12: Inferring ancestral states of the bZIP transcription factor interaction network

From an interaction tree to a probabilistic model

• Our probabilistic graphical model of network evolution is based directly on the interaction tree.

• Binary nodes represent the presence or absence of each potential interaction.

• Continuous nodes are added to represent observations of interactions in extant species (our interaction scores).
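To make the structure of such a model concrete, here is a small sketch of exact inference for one branch of an interaction tree: binary hidden states, continuous observations at the leaves, and an upward message pass to obtain the posterior at the root. Several things are assumptions for illustration only: the Gaussian form of the observation model, the fixed per-branch `p_gain` and `p_loss` values (the slides describe distance-dependent rates), the root prior, and all numerical values.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def upward_message(node, children, scores, p_gain, p_loss, emission):
    """Return [P(data below node | absent), P(data below node | present)]."""
    if node in scores:  # extant interaction with an observed score
        return [gaussian_pdf(scores[node], *emission[state]) for state in (0, 1)]
    # transition[s][t] = P(child in state t | parent in state s)
    transition = [[1 - p_gain, p_gain], [p_loss, 1 - p_loss]]
    message = [1.0, 1.0]
    for child in children[node]:
        child_msg = upward_message(child, children, scores, p_gain, p_loss, emission)
        for s in (0, 1):
            message[s] *= sum(transition[s][t] * child_msg[t] for t in (0, 1))
    return message

def root_posterior(root, children, scores, prior_present, p_gain, p_loss, emission):
    """Posterior probability that the interaction is present at the root."""
    absent, present = upward_message(root, children, scores, p_gain, p_loss, emission)
    joint_present = prior_present * present
    joint_absent = (1 - prior_present) * absent
    return joint_present / (joint_present + joint_absent)

# Toy example: one ancestral interaction with two extant descendants.
children = {"anc": ["leaf1", "leaf2"]}
emission = {0: (0.0, 10.0), 1: (30.0, 10.0)}  # hypothetical (mean, sd) per state
high = root_posterior("anc", children, {"leaf1": 30.0, "leaf2": 28.0},
                      0.5, 0.05, 0.2, emission)
low = root_posterior("anc", children, {"leaf1": 0.0, "leaf2": -2.0},
                     0.5, 0.05, 0.2, emission)
```

Two high leaf scores push the root posterior towards presence and two low scores push it towards absence. Posteriors for every internal node would need an additional downward pass, which is omitted here.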

Page 13: Inferring ancestral states of the bZIP transcription factor interaction network

Probabilistic model parameters

• There are two different processes to consider in parametrising the model:

1) How are protein interactions re-wired as sequences evolve?

2) How are the observed data related to the real extant networks?

(Figure labels: species A, species B, species C, species D; network re-wiring; false positives and negatives introduced)

Page 14: Inferring ancestral states of the bZIP transcription factor interaction network

Estimating rates of network re-wiring

• It is difficult to construct a general model for gain and loss of interactions as a protein interaction network evolves.

• For the bZIP network, we can estimate probabilities of gain and loss of interactions using the experimental data for human proteins.

(Figure: P(loss of a strong interaction) and P(gain of a strong interaction) plotted as probability, from 0 to 1, against the summed evolutionary distance d1 + d2, from 0 to 10. Accompanying diagrams show loss and gain of a strong interaction along branches of length d1 and d2.)

Both loss and gain of interactions are well described by logistic functions of the sum of evolutionary distances.
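A logistic dependence on the summed distance can be written down directly. The sketch below is illustrative only: the parameter names `ceiling`, `midpoint` and `steepness` and all numerical values are hypothetical, since the fitted values are not given in the slides, and a separate parameter set would be needed for loss and for gain.

```python
import math

def rewiring_probability(d1, d2, ceiling, midpoint, steepness):
    """Logistic function of the summed evolutionary distance d1 + d2.

    ceiling:   probability approached at large distances (assumed parameter)
    midpoint:  distance at which half the ceiling is reached (assumed parameter)
    steepness: slope of the curve around the midpoint (assumed parameter)
    """
    d = d1 + d2
    return ceiling / (1.0 + math.exp(-steepness * (d - midpoint)))

# Hypothetical parameter sets, one per process.
loss_params = dict(ceiling=0.8, midpoint=3.0, steepness=1.0)
gain_params = dict(ceiling=0.1, midpoint=3.0, steepness=1.0)

p_loss = rewiring_probability(1.5, 1.5, **loss_params)
p_gain = rewiring_probability(1.5, 1.5, **gain_params)
```

In a model like the one on the previous slides, these values would supply the transition probabilities between a parent interaction and its child.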

Page 15: Inferring ancestral states of the bZIP transcription factor interaction network

Results: Vertebrate

Page 16: Inferring ancestral states of the bZIP transcription factor interaction network

Adding noise to the input data

(Figure panels: noise parameter = 0, 10, 20. Human input data shown.)

• The parsimony approach might be expected to work well in cases with good quality observed data.

• However, real interaction datasets are often extremely noisy. We can simulate this situation by adding Gaussian noise with different variances to the input scores.
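The noise simulation is simple to reproduce. A minimal sketch, assuming the input scores are held in a dictionary keyed by protein pair; the function name, the `sd` parameter (a standard deviation, whereas the slide speaks of variances) and the example scores are illustrative.

```python
import random

def add_gaussian_noise(scores, sd, seed=None):
    """Return a copy of the scores with zero-mean Gaussian noise added."""
    rng = random.Random(seed)
    return {pair: score + rng.gauss(0.0, sd) for pair, score in scores.items()}

# Hypothetical interaction scores for three protein pairs.
scores = {("FOS", "JUN"): 42.0, ("FOS", "FOS"): -5.0, ("ATF4", "CEBPB"): 31.5}
noisy = add_gaussian_noise(scores, sd=10.0, seed=1)
```

Fixing the seed makes a noisy dataset reproducible, which helps when comparing inference methods on identical inputs.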

Page 17: Inferring ancestral states of the bZIP transcription factor interaction network

ROC curves: Vertebrata (noise added to inputs)

• As expected, the parsimony method quickly fails when the data quality falls.

• The probabilistic inference method is much more robust to poor quality data, as it combines evidence across all species.
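For readers who want to reproduce this kind of comparison, an ROC curve can be computed from a ranked list of predictions and a reference set of true interactions. The sketch below is generic and makes assumptions: predictions and reference labels are dictionaries keyed by protein pair, tied scores are not treated specially, and both classes are non-empty.

```python
def roc_points(predicted, truth):
    """Return ROC points (false positive rate, true positive rate).

    predicted: dict pair -> score or probability (higher = more confident)
    truth:     dict pair -> bool, the reference labels (assumed available)
    """
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    positives = sum(1 for pair in ranked if truth[pair])
    negatives = len(ranked) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for pair in ranked:
        if truth[pair]:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example with a perfect ranking.
predicted = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.1}
truth = {"a": True, "b": True, "c": False, "d": False}
auc = area_under_curve(roc_points(predicted, truth))
```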

Page 18: Inferring ancestral states of the bZIP transcription factor interaction network

Using probabilistic inference to clean noisy interaction data

• The probabilistic inference method offers a principled way to combine cross-species interaction data of various types.

• This could be very useful in improving interaction predictions in extant species.

Page 19: Inferring ancestral states of the bZIP transcription factor interaction network

Conclusions

– First successful reconstruction of ancestral interaction networks.

– Parsimony method is only appropriate if input data are reliable.

– Probabilistic inference works and is more robust to noisy data.

– Also, probabilistic method can be used to clean up protein networks by combining cross-species data in an evolutionary context.

– We hope to be able to extend this approach to model the evolution of more general classes of protein-protein interaction networks.

Page 20: Inferring ancestral states of the bZIP transcription factor interaction network

Acknowledgements

David Robertson

Magnus Rattray

Grigoris Amoutzias

Brian Holden

Amelie Veron (Muenster)

Mona Singh and Jessica Fong (Princeton)


Page 21: Inferring ancestral states of the bZIP transcription factor interaction network

Network inference by maximum parsimony

• One straightforward method to infer ancestral networks would be to use the principle of maximum parsimony.

• We calculate the minimal number of changes to the network during evolution that explain the observed data.

(Figure labels: species A, species B, species C, species D; observed data, inferred networks)

Page 22: Inferring ancestral states of the bZIP transcription factor interaction network

Network inference using maximum parsimony

• The PARS algorithm can be used to infer ancestral states of the interaction tree that are maximally parsimonious.

• Interaction gains are weighted more highly than losses, as in the Bayesian approach.

(Figure: alternative reconstructions labelled “1 gain, 3 losses” and “3 losses”. Legend: interaction lost, interaction gained.)

BG Mirkin, TI Fenner, MY Galperin, EV Koonin (2003) BMC Evol Biol 3:2

Page 23: Inferring ancestral states of the bZIP transcription factor interaction network

Validation of inferred networks

• We can also use Maximum-Likelihood methods to infer probability distributions for sequences at each of the least common ancestors.

• The software by Fong et al. can then be used to predict interactions between the LZ regions for the ancestors.

(Figure: species tree with extant species Ciona, Human, Fugu and Danio, and ancestral nodes Teleostei, Vertebrata and Chordata)

Page 24: Inferring ancestral states of the bZIP transcription factor interaction network

Predicting interactions using sequence inference

• The phylogenetic analysis software CODEML is used to infer probabilities for each amino acid at each sequence position for all nodes in the gene tree.

• Sampling from these distributions allows us to predict the strength of the interaction between each pair of proteins from the same ancestral species.

(Figure: histogram of predicted interaction scores for a pair of ancestral proteins P1 and P2, based on 1000 samples; x-axis: score, y-axis: frequency. Annotation: 90% probability of strong interaction, calibrated using human experimental data.)
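The sampling step can be sketched as a small Monte Carlo loop. Everything specific here is an assumption for illustration: the per-position distributions are toy values rather than CODEML output, `toy_score` is a hypothetical stand-in for the sequence-based predictor of Fong et al., and the threshold is arbitrary.

```python
import random

def sample_sequence(position_probs, rng):
    """Draw one sequence from per-position amino-acid distributions."""
    return "".join(
        rng.choices(list(probs), weights=list(probs.values()))[0]
        for probs in position_probs
    )

def fraction_strong(probs1, probs2, score_fn, threshold, n_samples=1000, seed=0):
    """Fraction of sampled sequence pairs whose score reaches the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        seq1 = sample_sequence(probs1, rng)
        seq2 = sample_sequence(probs2, rng)
        if score_fn(seq1, seq2) >= threshold:
            hits += 1
    return hits / n_samples

def toy_score(seq1, seq2):
    """Hypothetical scorer: count positions pairing E with K (either order)."""
    return sum(1 for a, b in zip(seq1, seq2) if {a, b} == {"E", "K"})

# Toy distributions: protein 1 is certain, protein 2 is uncertain at one site.
p1 = [{"E": 1.0}, {"E": 1.0}, {"K": 1.0}]
p2 = [{"K": 1.0}, {"K": 0.5, "L": 0.5}, {"E": 1.0}]
estimate = fraction_strong(p1, p2, toy_score, threshold=3)
```

The returned fraction is a Monte Carlo estimate of how often the sampled ancestral pair would be scored as a strong interaction, so it carries sampling error that shrinks as `n_samples` grows.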

Page 25: Inferring ancestral states of the bZIP transcription factor interaction network

Summary of methods for ancestral network inference

1. Gold standard: ML sequence reconstruction + sequence-based prediction

2. Current best method: Maximum Parsimony using PARS algorithm

3. New method: Inference over probabilistic model of network evolution

Page 26: Inferring ancestral states of the bZIP transcription factor interaction network

bZIP interactions

• In addition, the relatively simple biophysics of the coiled-coil interaction means that strong interactions can be predicted reliably from sequence data alone. (70% sensitivity at 92% specificity)

JH Fong, AE Keating, M Singh (2004) Genome Biol 5:R11

CNC, lgMAF, smMAF families

Page 27: Inferring ancestral states of the bZIP transcription factor interaction network

Example: genomic data for human

Darker colours show stronger predictions of interaction.

Page 28: Inferring ancestral states of the bZIP transcription factor interaction network

bZIP transcription factors

• Gene duplication has played a major role in the evolution of the bZIP family.

domain structures

Page 29: Inferring ancestral states of the bZIP transcription factor interaction network

Estimating error rates for predicted networks

• Using the experimental human data, we can calculate the probability of a pair of proteins having a strong interaction as a function of their sequence-based interaction score.

(Figure: histograms of sequence-based interaction score for non-interactions and strong interactions, each shown as data and as a fitted curve; x-axis: score, binned from -17.5 to 57.5; y-axis: frequency, 0 to 70.)
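Given fitted score distributions for the two classes, the probability of a strong interaction at a given score follows from Bayes' rule. The sketch below assumes Gaussian fits and a known prior fraction of strong interactions. Neither is stated in the slides, and the means, standard deviations and prior are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def p_strong_given_score(score, fit_strong, fit_non, prior_strong):
    """Posterior probability of a strong interaction given its score.

    fit_strong, fit_non: (mean, sd) of the assumed Gaussian fits
    prior_strong:        assumed prior fraction of strong interactions
    """
    weight_strong = prior_strong * normal_pdf(score, *fit_strong)
    weight_non = (1.0 - prior_strong) * normal_pdf(score, *fit_non)
    return weight_strong / (weight_strong + weight_non)

# Hypothetical fits: strong interactions centred at 35, non-interactions at 5.
fit_strong = (35.0, 8.0)
fit_non = (5.0, 8.0)
posterior_mid = p_strong_given_score(20.0, fit_strong, fit_non, prior_strong=0.5)
posterior_high = p_strong_given_score(40.0, fit_strong, fit_non, prior_strong=0.5)
```

With equal priors and equal widths the posterior is 0.5 exactly halfway between the two means. For scores far from both fits the two densities can underflow to zero, so a production version would work with log densities.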