Protein Interaction Networks

Aalt-Jan van Dijk, Applied Bioinformatics, PRI, Wageningen UR & Mathematical and Statistical Methods, Biometris, Wageningen University. Feb. 21, 2013

Page 1: Protein Interaction Networks

Protein Interaction Networks

Aalt-Jan van Dijk

Applied Bioinformatics, PRI, Wageningen UR & Mathematical and Statistical Methods, Biometris, Wageningen University

Feb. 21, 2013

Page 2: Protein Interaction Networks

My research

• Protein complex structures
  – Protein-protein docking
  – Correlated mutations
• Interaction site prediction/analysis
  – Protein-protein interactions
  – Enzyme active sites
  – Protein-DNA interactions
• Network modelling
  – Gene regulatory networks
  – Flowering related

Page 3: Protein Interaction Networks

Overview

• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment

Page 4: Protein Interaction Networks

Protein Interaction Networks

Obligatory

hemoglobin

Page 5: Protein Interaction Networks

Protein Interaction Networks

• Obligatory: hemoglobin
• Transient: mitochondrial Cu transporters

Page 6: Protein Interaction Networks

Experimental approaches (1)

Yeast two-hybrid (Y2H)

Page 7: Protein Interaction Networks

Experimental approaches (2)

Affinity Purification + mass spectrometry (AP-MS)

Page 8: Protein Interaction Networks

Interaction Databases

• STRING http://string.embl.de/
• HPRD http://www.hprd.org/
• MINT http://mint.bio.uniroma2.it/mint/
• INTACT http://www.ebi.ac.uk/intact/
• BIOGRID http://thebiogrid.org/

Page 18: Protein Interaction Networks

Some numbers

Organism          Number of known interactions
H. sapiens        113,217
S. cerevisiae     75,529
D. melanogaster   35,028
A. thaliana       13,842
M. musculus       11,616

Source: BioGRID (physical interactions)

Page 19: Protein Interaction Networks

Overview

• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment

Page 20: Protein Interaction Networks

Binding site

Page 21: Protein Interaction Networks

Binding site prediction

Applications:
• Understanding network evolution
• Understanding changes in protein function
• Predict protein interactions
• Manipulate protein interactions

Input data:
• Interaction network
• Sequences (possibly structures)

Page 24: Protein Interaction Networks

Sequence-based predictions

Page 25: Protein Interaction Networks

Sequences and networks

• Goal: predict interaction sites and/or motifs
• Data: interaction networks, sequences
• Validation: structure data, “motif databases”

Page 28: Protein Interaction Networks

Motif search in groups of proteins

• Group proteins which have the same interaction partner
• Use motif search, e.g. find PWMs

Neduva PLoS Biol 2005
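The grouping step can be sketched in Perl, the language of the exercise code later in these slides. The interaction pairs, the protein names and the sub name `group_by_partner` are hypothetical, and the motif search itself (e.g. PWM discovery) is assumed to be done by a separate tool on each group.

```perl
use strict;
use warnings;

# Toy interaction list (hypothetical protein names); each entry is one
# undirected interaction.
my @interactions = (
    [ 'P1', 'HUB' ],
    [ 'P2', 'HUB' ],
    [ 'P3', 'HUB' ],
    [ 'P3', 'P4' ],
);

# Return a hash: protein => sorted list of all proteins that interact with it.
sub group_by_partner {
    my (@pairs) = @_;
    my %neighbours;
    for my $pair (@pairs) {
        my ( $x, $y ) = @$pair;
        $neighbours{$x}{$y} = 1;
        $neighbours{$y}{$x} = 1;
    }
    my %groups;
    for my $partner ( keys %neighbours ) {
        $groups{$partner} = [ sort keys %{ $neighbours{$partner} } ];
    }
    return %groups;
}

my %groups = group_by_partner(@interactions);

# Groups with at least two members are candidate input for a motif search.
for my $partner ( sort keys %groups ) {
    my @members = @{ $groups{$partner} };
    next if @members < 2;
    print "$partner\t@members\n";
}
```

Each printed line is one partner followed by the group of proteins that share it; the sequences of such a group would then go to the motif finder.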

Page 29: Protein Interaction Networks

Correlated Motifs

Page 30: Protein Interaction Networks

Correlated Motifs

• Motif model

• Search

• Scoring

Page 31: Protein Interaction Networks

Predefined motifs

Page 36: Protein Interaction Networks

Correlated Motif Mining

Find motifs in one set of proteins which interact with (almost) all proteins with another motif

Motif models:
• PWM – so far not applied
• (l,d) with l = length, d = number of wildcards

Score: overrepresentation, e.g. χ²

Search:
• Interaction driven
• Motif driven
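A minimal sketch of the (l,d) motif model and a χ² score. The motif notation (lower-case `x` as wildcard), the sub names and the 2x2 table layout are assumptions made for this illustration; the slides only state that overrepresentation is scored, e.g. with χ².

```perl
use strict;
use warnings;

# An (l,d) motif written as a string: upper-case letters are fixed residues,
# a lower-case 'x' is a wildcard. "LxxLL" has l = 5 and d = 2.
# (This notation is an assumption of the sketch.)
sub has_motif {
    my ( $sequence, $motif ) = @_;
    ( my $pattern = $motif ) =~ s/x/./g;    # wildcard -> "any residue"
    return $sequence =~ /$pattern/ ? 1 : 0;
}

# Chi-square statistic for a 2x2 contingency table of protein pairs:
#                            interacting   not interacting
#   pair carries motif pair      n11            n12
#   pair does not                n21            n22
sub chi_square_2x2 {
    my ( $n11, $n12, $n21, $n22 ) = @_;
    my $n     = $n11 + $n12 + $n21 + $n22;
    my $denom = ( $n11 + $n12 ) * ( $n21 + $n22 ) * ( $n11 + $n21 ) * ( $n12 + $n22 );
    return 0 if $denom == 0;    # degenerate table: no evidence either way
    return $n * ( $n11 * $n22 - $n12 * $n21 )**2 / $denom;
}

print has_motif( 'AALKELLAA', 'LxxLL' ), "\n";
printf "%.2f\n", chi_square_2x2( 10, 0, 0, 10 );
```

A large χ² value means that pairs carrying the motif pair interact much more (or much less) often than expected by chance, so the sign of n11·n22 − n12·n21 should be checked as well.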

Page 39: Protein Interaction Networks

Interaction driven approaches

Mine for (quasi-)bicliques (most-versus-most interaction), then derive a motif pair from the sequences

Page 40: Protein Interaction Networks

Motif driven approaches

Starting from candidate motif pairs, evaluate their support in the network (and improve them)
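Evaluating the support of one candidate motif pair could look as follows. This is only a sketch: the sequences, the interactions, the two motifs and the sub names are hypothetical, and real methods also refine the motifs, which is not shown.

```perl
use strict;
use warnings;

# Toy sequences and interactions (hypothetical data).
my %sequence = (
    P1 => 'MKLAELLR',
    P2 => 'GGPPTPPS',
    P3 => 'AALQDLLK',
    P4 => 'TTPPSPPA',
    P5 => 'MMMMMMMM',
);
my @interactions = ( [ 'P1', 'P2' ], [ 'P3', 'P4' ], [ 'P1', 'P5' ] );

# Proteins whose sequence matches a motif; 'x' in the motif is a wildcard.
sub proteins_with_motif {
    my ( $motif, $seqs ) = @_;
    ( my $pattern = $motif ) =~ s/x/./g;
    return grep { $seqs->{$_} =~ /$pattern/ } sort keys %$seqs;
}

# Support of a motif pair: the number of interactions it explains, and the
# number of distinct protein pairs that carry (motif1, motif2) at all.
sub motif_pair_support {
    my ( $motif1, $motif2, $seqs, $edges ) = @_;
    my %has1 = map { $_ => 1 } proteins_with_motif( $motif1, $seqs );
    my %has2 = map { $_ => 1 } proteins_with_motif( $motif2, $seqs );

    my $covered = 0;
    for my $edge (@$edges) {
        my ( $p, $q ) = @$edge;
        $covered++
            if ( $has1{$p} && $has2{$q} ) || ( $has1{$q} && $has2{$p} );
    }

    my %candidate;    # unordered pairs, each counted once
    for my $p ( keys %has1 ) {
        for my $q ( keys %has2 ) {
            next if $p eq $q;
            my ( $lo, $hi ) = $p lt $q ? ( $p, $q ) : ( $q, $p );
            $candidate{"$lo|$hi"} = 1;
        }
    }
    return ( $covered, scalar keys %candidate );
}

my ( $covered, $candidates ) =
    motif_pair_support( 'LxxLL', 'PPxPP', \%sequence, \@interactions );
print "covered interactions: $covered of ", scalar(@interactions), "\n";
print "motif-carrying pairs: $candidates\n";
```

The two counts can be fed into an overrepresentation score to rank candidate motif pairs.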

Page 41: Protein Interaction Networks

D-MOTIF

Tan BMC Bioinformatics 2006

Page 42: Protein Interaction Networks
Page 43: Protein Interaction Networks

IMSS: application of D-MOTIF

[Figure: protein X and protein Y; plot of test error versus number of selected motif pairs]

Van Dijk et al., Bioinformatics 2008
Van Dijk et al., PLoS Comp Biol 2010

Page 44: Protein Interaction Networks

Experimental validation

[Figure: protein X and protein Y; plot of test error versus number of selected motif pairs]

Van Dijk et al., Bioinformatics 2008
Van Dijk et al., PLoS Comp Biol 2010

Page 47: Protein Interaction Networks

SLIDER

Boyen et al. Trans Comp Biol Bioinf 2011

Page 48: Protein Interaction Networks

SLIDER

Page 49: Protein Interaction Networks

Validation

Page 50: Protein Interaction Networks

Extensions of SLIDER

Boyen et al. Trans Comp Biol Bioinf 2013

Page 51: Protein Interaction Networks

Extensions of SLIDER

• Extension I: better coverage of network
• Extension II: use of more biological information

Page 52: Protein Interaction Networks

bioSLIDER

DGIFELELYLPDDYPMEAPKVRFLTKI
[Figure: conservation and accessibility tracks along the sequence]

• Thresholds for conservation and accessibility
• Extension of motif model: amino acid similarity (BLOSUM)
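One way to read the threshold idea is as a filter on sequence positions before motif search. The sketch below assumes exactly that; the per-residue scores, the threshold values and the sub name `allowed_positions` are invented for illustration and are not taken from bioSLIDER itself.

```perl
use strict;
use warnings;

# First ten residues of the example sequence on the slide; the per-residue
# scores are invented for illustration (both scaled to 0..1).
my $sequence      = 'DGIFELELYL';
my @conservation  = ( 0.9, 0.2, 0.8, 0.7, 0.1, 0.9, 0.6, 0.3, 0.8, 0.5 );
my @accessibility = ( 0.7, 0.9, 0.1, 0.6, 0.8, 0.5, 0.2, 0.9, 0.6, 0.4 );

# Return the 0-based positions that pass both thresholds.
sub allowed_positions {
    my ( $cons, $acc, $min_cons, $min_acc ) = @_;
    return grep { $cons->[$_] >= $min_cons && $acc->[$_] >= $min_acc }
        0 .. $#{$cons};
}

my @keep = allowed_positions( \@conservation, \@accessibility, 0.6, 0.5 );

# Print residue letter plus 1-based position for every retained residue.
print join( ' ', map { substr( $sequence, $_, 1 ) . ( $_ + 1 ) } @keep ), "\n";
```

Only the retained residues would then be allowed as fixed (non-wildcard) motif positions.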

Page 56: Protein Interaction Networks

bioSLIDER

Using human and yeast data for training and optimizing parameters

[Figure: example sequence DGIFELELYLPDDYPMEAPKVRFLTKI with conservation track; plots with axes labelled interaction-coverage, motif-accuracy and accessibility, comparing “No conservation, no accessibility” with “Conservation and accessibility”]

Leal Valentim et al., PLoS ONE 2012

Page 57: Protein Interaction Networks

Application to Arabidopsis

Arabidopsis Interactome Mapping Consortium, Science 2011

Input data: 6200 interactions, 2700 proteins

Interface predictions for 985 proteins (on average 20 residues)

Page 58: Protein Interaction Networks

Ecotype sequence data (SNPs)

SNPs tend to ‘avoid’ predicted binding sites

In 263 proteins there is a SNP in a binding site; these proteins are much more connected to each other than would be expected at random

Page 59: Protein Interaction Networks

Summary

• Prediction of interaction sites using protein interaction networks and protein sequences
• Correlated motif approaches

Page 60: Protein Interaction Networks

Overview

• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment

Page 61: Protein Interaction Networks

Protein Interaction Prediction

Lots of genomes are being sequenced… (www.genomesonline.org)

            Complete   Incomplete
ARCHAEA          182          264
BACTERIA        3767        14393
EUKARYA          183         2897
TOTAL           4132        17514

But how do we know how the proteins in there work together?!

Page 63: Protein Interaction Networks

Protein Interaction Prediction

• Interactions of orthologs: interologs

• Phylogenetic profiles

• Domain-based predictions

A 1 0 1 1 0 0 1

B 1 0 1 1 0 0 1

Page 64: Protein Interaction Networks

Orthology based prediction

Page 65: Protein Interaction Networks

Orthology based prediction
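Orthology based prediction (interologs) can be sketched as transferring each known interaction to every pair of orthologs in the target species. The protein names, the ortholog map and the sub name `predict_interologs` are hypothetical; skipping self-pairs is a simplification of this sketch.

```perl
use strict;
use warnings;

# Hypothetical toy data: known interactions in a source species and a map
# from each source protein to its predicted orthologs in a target species.
my @known = ( [ 'A1', 'A2' ], [ 'A2', 'A3' ] );
my %orthologs = (
    A1 => ['B1'],
    A2 => [ 'B2', 'B2b' ],
    A3 => [],                 # no ortholog found
);

# Transfer every known interaction to all pairs of orthologs (interologs).
sub predict_interologs {
    my ( $interactions, $ortho ) = @_;
    my %predicted;
    for my $pair (@$interactions) {
        my ( $p, $q ) = @$pair;
        for my $op ( @{ $ortho->{$p} || [] } ) {
            for my $oq ( @{ $ortho->{$q} || [] } ) {
                next if $op eq $oq;    # simplification: ignore self-pairs
                my ( $lo, $hi ) = $op lt $oq ? ( $op, $oq ) : ( $oq, $op );
                $predicted{"$lo-$hi"} = 1;
            }
        }
    }
    return sort keys %predicted;
}

print "$_\n" for predict_interologs( \@known, \%orthologs );
```

The interaction A2-A3 yields no prediction because A3 has no ortholog in the target species.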

Page 66: Protein Interaction Networks

Phylogenetic profiles

A 1 0 1 1 0 0 1

B 1 0 1 1 1 0 1

C 1 0 1 1 1 0 1

D 0 1 0 1 0 0 1
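The profiles above can be compared with a simple distance. The sketch below uses the Hamming distance, which is one common choice and an assumption here (the slide does not name a measure); proteins with identical or near-identical profiles are candidates for a functional link.

```perl
use strict;
use warnings;

# Presence (1) / absence (0) of each protein in seven genomes, as on the slide.
my %profile = (
    A => [ 1, 0, 1, 1, 0, 0, 1 ],
    B => [ 1, 0, 1, 1, 1, 0, 1 ],
    C => [ 1, 0, 1, 1, 1, 0, 1 ],
    D => [ 0, 1, 0, 1, 0, 0, 1 ],
);

# Hamming distance: number of genomes in which the two profiles differ.
sub profile_distance {
    my ( $p, $q ) = @_;
    my $diff = 0;
    for my $i ( 0 .. $#{$p} ) {
        $diff++ if $p->[$i] != $q->[$i];
    }
    return $diff;
}

# Print the distance for every pair of proteins.
my @names = sort keys %profile;
for my $i ( 0 .. $#names ) {
    for my $j ( $i + 1 .. $#names ) {
        my $d = profile_distance( $profile{ $names[$i] }, $profile{ $names[$j] } );
        print "$names[$i]-$names[$j]\t$d\n";
    }
}
```

With these profiles, B and C are identical (distance 0), A differs from both in one genome, and D is the most distant from the other three.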

Page 67: Protein Interaction Networks

Domain Based Predictions

Page 68: Protein Interaction Networks

Domain Based Predictions

Page 69: Protein Interaction Networks

Overview

• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment

Page 70: Protein Interaction Networks

Duplications

Page 71: Protein Interaction Networks

Duplications and interactions

Gene duplication

Page 72: Protein Interaction Networks

Duplications and interactions

Gene duplication

Page 73: Protein Interaction Networks

Duplications and interactions

0.1 Myear⁻¹

Gene duplication Interaction loss

0.001 Myear⁻¹

Page 74: Protein Interaction Networks

Duplications and interaction loss

Duplicate pairs share interaction partners
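How strongly two proteins share partners can be quantified, for example, with the Jaccard index of their partner sets. The measure, the partner lists and the sub name `partner_overlap` are assumptions of this sketch, not taken from the slides.

```perl
use strict;
use warnings;

# Hypothetical partner lists for a duplicate pair D1/D2 and an unrelated
# protein U. Lists are assumed to contain no repeated entries.
my %partners = (
    D1 => [ 'X', 'Y', 'Z' ],
    D2 => [ 'X', 'Y', 'W' ],
    U  => ['V'],
);

# Jaccard index of two partner sets: |shared| / |union|.
sub partner_overlap {
    my ( $list1, $list2 ) = @_;
    my %in1    = map { $_ => 1 } @$list1;
    my %union  = %in1;
    my $shared = 0;
    for my $p (@$list2) {
        $shared++ if $in1{$p};
        $union{$p} = 1;
    }
    my $size = keys %union;
    return $size ? $shared / $size : 0;
}

printf "D1 vs D2: %.2f\n", partner_overlap( $partners{D1}, $partners{D2} );
printf "D1 vs U:  %.2f\n", partner_overlap( $partners{D1}, $partners{U} );
```

Comparing this overlap between duplicate pairs and random protein pairs is one way to check the statement on the slide against a given network.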

Page 75: Protein Interaction Networks

Interaction network evolution

Science 2011

Page 76: Protein Interaction Networks

Overview

• Introduction: protein interaction networks
• Sequences & networks: predicting interaction sites
• Predicting protein interactions
• Sequence and network evolution
• Interaction network alignment

Page 77: Protein Interaction Networks

Network alignment

Local Network Alignment: find multiple, unrelated regions of isomorphism

Global Network Alignment: find the best overall alignment

Page 78: Protein Interaction Networks

PATHBLAST

Kelley, PNAS 2003

Page 79: Protein Interaction Networks

PATHBLAST: scoring

Kelley, PNAS 2003

homology

interaction

Page 80: Protein Interaction Networks

PATHBLAST: results

Kelley, PNAS 2003

Page 81: Protein Interaction Networks

PATHBLAST: results

Kelley, PNAS 2003

For yeast vs H. pylori, with L=4, all resulting paths with p ≤ 0.05 can be merged into just five network regions

Page 82: Protein Interaction Networks

Multiple alignment

Scoring: Probabilistic model for interaction subnetworks

Sub-networks: bottom-up search, starting with exhaustive search for L=4; followed by local search

Sharan PNAS 2005

Page 83: Protein Interaction Networks

Multiple alignment: results

Sharan PNAS 2005

Page 84: Protein Interaction Networks

Multiple alignment: results

Applications include protein function prediction and interaction prediction

Sharan PNAS 2005

Page 85: Protein Interaction Networks

Global alignment

Singh PNAS 2008

Page 86: Protein Interaction Networks

Global alignment

Singh PNAS 2008

Page 87: Protein Interaction Networks

Global alignment

Alignment: greedy selection of matches

Singh PNAS 2008
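Greedy selection of matches can be sketched as repeatedly taking the highest-scoring node pair and removing both nodes from further consideration. The scores, the node names and the sub name `greedy_matches` are hypothetical; how the pairwise scores are obtained is outside this sketch.

```perl
use strict;
use warnings;

# Hypothetical pairwise match scores between nodes of network 1 (a*) and
# network 2 (b*).
my %score = (
    a1 => { b1 => 0.9, b2 => 0.4 },
    a2 => { b1 => 0.8, b2 => 0.3 },
    a3 => { b2 => 0.5 },
);

# Greedy one-to-one matching: take the best remaining pair, then skip
# every other pair that involves either of its two nodes.
sub greedy_matches {
    my ($scores) = @_;
    my @candidates;
    for my $u ( sort keys %$scores ) {
        for my $v ( sort keys %{ $scores->{$u} } ) {
            push @candidates, [ $u, $v, $scores->{$u}{$v} ];
        }
    }
    my ( %used1, %used2, @matches );
    for my $c ( sort { $b->[2] <=> $a->[2] } @candidates ) {
        my ( $u, $v ) = @$c;
        next if $used1{$u} || $used2{$v};
        $used1{$u} = $used2{$v} = 1;
        push @matches, "$u-$v";
    }
    return @matches;
}

print join( ' ', greedy_matches( \%score ) ), "\n";
```

In this toy example a2 stays unmatched although it scores well against b1, which illustrates that a greedy choice need not maximise the total score.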

Page 88: Protein Interaction Networks

Network alignment: the future?

Sharan & Ideker Nature Biotech 2006

Page 89: Protein Interaction Networks

Summary

• Interaction network evolution: mostly “comparative”, not much mechanistic
• Approaches exist to integrate and model network analysis within context of phylogeny (not discussed)
• Outlook: combine interaction site prediction with network evolution analysis

Page 90: Protein Interaction Networks

Exercises

The data files “arabidopsis_proteins.lis” and “interactions_arabidopsis.data” contain Arabidopsis MADS proteins (which regulate various developmental processes including flowering) and their mutual interactions, respectively.

[Figure labels: AGL24, LFY, SOC1]

Page 91: Protein Interaction Networks

Exercise 1

• Start by getting familiar with the basic Cytoscape features described in section 1 of the tutorial http://opentutorials.cgl.ucsf.edu/index.php/Tutorial:Introduction_to_Cytoscape
• Load the data into Cytoscape
• Visualize the network and analyze the number of interactions per protein: which proteins have a lot of interactions?

Page 92: Protein Interaction Networks

Exercise 2

Write a script that reads interaction data and implements a data structure which enables further analysis of the data (see setup on next slides).

Use the data files “arabidopsis_proteins.lis” and “interactions_arabidopsis.data” and let the script print a table in the following format:

PROTEIN Number_of_interactions

Make a plot of those data.

Page 93: Protein Interaction Networks

# two subroutines

# input: filename
# output: list with content of file
sub read_list {
    my $infile = $_[0];
    # YOUR CODE
    return @newlist;
}

# input: references to the protein list and the interaction list
# output: hash mapping each protein to the list of its partners
sub combine_prot_int {
    my ($plist, $intlist) = @_;
    # YOUR CODE
    return %inthash;
}

Page 94: Protein Interaction Networks

# reading input data
my @plist   = read_list($ARGV[0]);
my @intlist = read_list($ARGV[1]);

# obtaining hash with interactions
my %inthash = combine_prot_int(\@plist, \@intlist);

# YOUR CODE
# loop over all proteins and print their name and their number of interactions

Page 95: Protein Interaction Networks
Page 96: Protein Interaction Networks

Exercise 3

In “orthology_relations.data” we have a set of predicted orthologs for the Arabidopsis proteins from exercise 1. “protein_information.data” describes, among other things, which species these proteins are from. Finally, “interactions.data” contains interactions between those proteins.

Use the Arabidopsis interaction data from exercise 1 to “predict” interactions in other species using the orthology information. Compare your predictions with the real interaction data and make a plot that visualizes how good your predictions are.