STRING - Cross-species integration of known and predicted protein-protein interactions
Transcript of STRING - Cross-species integration of known and predicted protein-protein interactions
![Page 1: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/1.jpg)
STRING: Cross-species integration of known and predicted protein-protein interactions
Lars Juhl Jensen, EMBL Heidelberg
![Page 2: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/2.jpg)
STRING provides a protein network based on integration of diverse types of evidence
Genomic neighborhood
Species co-occurrence
Gene fusions
Database imports
Exp. interaction data
Microarray expression data
Literature co-mentioning
![Page 3: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/3.jpg)
Inferring functional modules from gene presence/absence patterns
[Figure: The “Cellulosome” (© Trends Microbiol, 1999). Labels: cell, cell wall, anchoring proteins, cellulosomes, cellulose, resting protuberances, protracted protuberance.]
![Page 4: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/4.jpg)
Genomic context methods
© Nature Biotechnology, 2004
![Page 5: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/5.jpg)
Formalizing the phylogenetic profile method
• Align all proteins against all
• Calculate best-hit profile
• Join similar species by PCA
• Calculate PC profile distances
• Calibrate against KEGG maps
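The core of the profile comparison can be sketched as follows. This is a minimal illustration, not the STRING implementation: it compares raw presence/absence profiles by Hamming distance, whereas the slide describes distances between PCA-reduced profiles followed by calibration against KEGG maps. The gene names, the toy profiles, and the `max_distance` cut-off are all hypothetical.

```python
from itertools import combinations


def profile_distance(a, b):
    """Hamming distance between two presence/absence profiles of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    return sum(x != y for x, y in zip(a, b))


def closest_pairs(profiles, max_distance):
    """Return gene pairs whose profiles differ in at most max_distance species."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        d = profile_distance(profiles[g1], profiles[g2])
        if d <= max_distance:
            pairs.append((g1, g2, d))
    return pairs


# Hypothetical toy data: 1 = a best hit exists in that species, 0 = absent.
profiles = {
    "geneA": (1, 0, 1, 1, 0),
    "geneB": (1, 0, 1, 1, 0),
    "geneC": (0, 1, 0, 0, 1),
    "geneD": (1, 0, 1, 0, 0),
}

for g1, g2, d in closest_pairs(profiles, max_distance=1):
    print(g1, g2, d)
```

Genes with near-identical profiles (here geneA and geneB) are the candidates for a functional association; the raw distance would still have to be converted into a calibrated score.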
![Page 6: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/6.jpg)
Predicting functional and physical interactions from gene fusion/fission events
• Calculate all-against-all pairwise alignments
• Find genes in A that match the same gene in B
• Exclude overlapping alignments
• Calibrate against KEGG maps
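The fusion search on this slide can be illustrated with a small sketch. It assumes the pairwise alignments are already available as hypothetical `(gene_in_A, gene_in_B, (start, end))` tuples, where the interval is the aligned region on the B gene; the function names and the data are made up for illustration, and the calibration step is left out.

```python
def overlaps(a, b):
    """True if two half-open intervals (start, end) share any position."""
    return a[0] < b[1] and b[0] < a[1]


def fusion_candidates(hits):
    """Find pairs of A genes that align to non-overlapping parts of one B gene.

    hits: list of (gene_in_A, gene_in_B, (start, end) on the B gene).
    """
    by_target = {}
    for a_gene, b_gene, span in hits:
        by_target.setdefault(b_gene, []).append((a_gene, span))

    found = set()
    for b_gene, members in by_target.items():
        for i, (g1, s1) in enumerate(members):
            for g2, s2 in members[i + 1:]:
                # Overlapping alignments are excluded, as on the slide.
                if g1 != g2 and not overlaps(s1, s2):
                    found.add((min(g1, g2), max(g1, g2), b_gene))
    return sorted(found)


# Hypothetical alignment hits.
hits = [
    ("a1", "b9", (0, 200)),
    ("a2", "b9", (210, 450)),
    ("a3", "b9", (150, 300)),
    ("a4", "b7", (0, 100)),
]

print(fusion_candidates(hits))
```

Only a1 and a2 are reported here, because a3 overlaps both of them on b9.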
![Page 7: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/7.jpg)
Inferring functional associations from evolutionarily conserved operons
• Identify runs of adjacent genes with the same direction
• Score each gene pair based on intergenic distances
• Calibrate against KEGG maps
• Infer associations in other species
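The first two steps might look like the sketch below. It assumes genes are given as hypothetical `(name, start, end, strand)` tuples sorted by start position, and it reports the raw intergenic distance per neighbouring pair rather than a score, since the slides do not give the scoring function.

```python
def same_strand_runs(genes):
    """Group consecutive genes that share a strand; keep runs of length >= 2.

    genes: list of (name, start, end, strand), sorted by start coordinate.
    """
    runs, current = [], []
    for gene in genes:
        if current and gene[3] != current[-1][3]:
            if len(current) > 1:
                runs.append(current)
            current = []
        current.append(gene)
    if len(current) > 1:
        runs.append(current)
    return runs


def adjacent_pair_distances(run):
    """Intergenic distance in bp for each neighbouring gene pair in a run."""
    return [(a[0], b[0], max(0, b[1] - a[2])) for a, b in zip(run, run[1:])]


# Hypothetical gene coordinates.
genes = [
    ("g1", 100, 900, "+"),
    ("g2", 950, 1800, "+"),
    ("g3", 1810, 2500, "+"),
    ("g4", 2700, 3300, "-"),
    ("g5", 3600, 4200, "+"),
]

for run in same_strand_runs(genes):
    print(adjacent_pair_distances(run))
```

Short intergenic distances within a run would then be turned into a raw score and calibrated like every other evidence type.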
![Page 8: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/8.jpg)
Score calibration against a common reference
• Many diverse types of evidence
– The quality of each is judged by very different raw scores
– Quality differences exist among data sets of the same type
• Solved by calibrating all scores against a common reference
– Scores are directly comparable
– Probabilistic scores allow evidence to be combined
• Requirements for the reference
– Must represent a compromise of all the types of evidence
– Broad species coverage
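A rough sketch of both ideas follows. The calibration bins raw scores and measures, per bin, the fraction of pairs supported by the reference (for example, both proteins on the same KEGG map). The combination rule, 1 minus the product of (1 - p), assumes the evidence types are independent; the slides only state that probabilistic scores can be combined, so treat that rule, the bin edges, and the toy data as assumptions.

```python
def calibrate(raw_scores, in_reference, bin_edges):
    """Per raw-score bin, the fraction of pairs supported by the reference.

    Bins are half-open: bin_edges[i] <= score < bin_edges[i + 1].
    Returns None for a bin that received no pairs.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    positives = [0] * n_bins
    for score, ok in zip(raw_scores, in_reference):
        for i in range(n_bins):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                totals[i] += 1
                positives[i] += int(ok)
                break
    return [p / t if t else None for p, t in zip(positives, totals)]


def combine(probabilities):
    """Combine calibrated probabilities, assuming independent evidence."""
    remaining = 1.0
    for p in probabilities:
        remaining *= 1.0 - p
    return 1.0 - remaining


# Hypothetical raw scores and reference labels for six protein pairs.
raw_scores = [0.1, 0.2, 0.6, 0.7, 0.8, 0.9]
in_reference = [False, False, True, False, True, True]

print(calibrate(raw_scores, in_reference, bin_edges=[0.0, 0.5, 1.0]))
print(combine([0.5, 0.5]))
```

Once every evidence type is expressed as a probability against the same reference, scores from different data sets become directly comparable, which is what makes the combination step meaningful.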
![Page 9: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/9.jpg)
Integrating physical interaction screens
• Complex pull-down experiments: calculate score from number of (co-)occurrences
• Yeast two-hybrid data sets are inherently binary: calculate score from non-shared partners
• Calibrate against KEGG maps
• Infer associations in other species
• Combine evidence from experiments
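For the pull-down branch, a raw co-occurrence score could be computed along these lines. The slides do not give the formula, so the ratio used here (purifications containing both proteins divided by purifications containing either) is purely illustrative, as are the protein names.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(purifications):
    """Raw pairwise scores from complex pull-down data.

    purifications: list of sets, each holding the proteins seen in one pull-down.
    Score = purifications with both proteins / purifications with either one.
    """
    single = Counter()
    paired = Counter()
    for proteins in purifications:
        single.update(proteins)
        paired.update(combinations(sorted(proteins), 2))
    return {
        (a, b): n / (single[a] + single[b] - n)
        for (a, b), n in paired.items()
    }


# Hypothetical purifications.
purifications = [{"p1", "p2", "p3"}, {"p1", "p2"}, {"p3", "p4"}]

for pair, score in sorted(cooccurrence_scores(purifications).items()):
    print(pair, round(score, 3))
```

Whatever raw score is chosen, it only becomes comparable with other evidence after calibration against the common reference.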
![Page 10: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/10.jpg)
Mining microarray expression databases
• Re-normalize arrays by modern method to remove biases
• Build expression matrix
• Combine similar arrays by PCA
• Calculate pairwise linear correlation coefficients
• Calibrate against KEGG maps
• Infer associations in other species
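The correlation step can be sketched in plain Python. This assumes the expression matrix is a hypothetical dict mapping gene names to equally long lists of expression values; normalization and the PCA step are omitted.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a flat profile
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(matrix):
    """Correlation for every gene pair in a {gene: [values]} expression matrix."""
    return {
        (g1, g2): pearson(matrix[g1], matrix[g2])
        for g1, g2 in combinations(sorted(matrix), 2)
    }


# Hypothetical expression values across four arrays.
matrix = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

for pair, r in pairwise_correlations(matrix).items():
    print(pair, r)
```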
![Page 11: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/11.jpg)
Evidence transfer based on “fuzzy orthology”
[Figure: evidence transfer from a source species to a target species]
• Orthology transfer is tricky
– Correct assignment of orthology is difficult for distant species
– Functional equivalence cannot be guaranteed for in-paralogs
• These problems are addressed by our “fuzzy orthology” scheme
– Confidence scores for functional equivalence are calculated from all-against-all alignment
– In the case of many-to-many relationships, evidence is distributed across possible pairs according to confidence scores
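One way to picture the distribution of evidence in a many-to-many case is shown below. The multiplicative weighting (source score times both equivalence confidences) is an assumption made for illustration; the slides do not specify how confidence scores enter the transfer. All names and numbers are hypothetical.

```python
def transfer_evidence(score, orthologs_a, orthologs_b):
    """Spread one source-species score over candidate pairs in a target species.

    orthologs_a / orthologs_b map target-species proteins to a confidence in
    [0, 1] that they are functionally equivalent to source proteins A and B.
    """
    transferred = {}
    for a, conf_a in orthologs_a.items():
        for b, conf_b in orthologs_b.items():
            if a != b:
                # Assumed weighting: each pair gets a share scaled by both confidences.
                transferred[(a, b)] = score * conf_a * conf_b
    return transferred


# Hypothetical: A has two candidate equivalents, B has two as well.
result = transfer_evidence(
    0.5,
    {"a1": 0.5, "a2": 0.25},
    {"b1": 1.0, "b2": 0.5},
)

for pair, value in sorted(result.items()):
    print(pair, value)
```

The pair with the most confident equivalents on both sides receives the largest share, and uncertain pairs still receive some evidence instead of being discarded by a hard orthology call.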
![Page 12: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/12.jpg)
Multiple evidence types from several species
![Page 13: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/13.jpg)
Getting more specific – generally speaking
• Benchmarking against one common reference allows integration of heterogeneous data
• The different types of data do not all tell us about the same kind of functional associations
• It should be possible to assign likely interaction types from supporting evidence types
• The aim: to construct accurate, qualitative models of biological systems or processes
• The models should be accurate even at the level of individual interactions
• This allows specific, testable hypotheses to be made based on high-throughput experimental data
![Page 14: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/14.jpg)
Getting the parts list
[Figure: yeast culture → microarrays → gene expression → expression profile (Cho & Spellman et al.)]
The parts list (new analysis): 600 periodically expressed genes (with associated peak times) that encode “dynamic proteins”
![Page 15: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/15.jpg)
Constructing a reliable protein network
• The stickiness of an interaction was scored based on its local network topology
• We benchmarked these scores for each individual data set against a common reference
• Impossible interactions were eliminated based on subcellular localization data
• By restricting the network to a particular system the error rate is further reduced
![Page 16: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/16.jpg)
Extracting a cell cycle interaction network
• Raw data
– Cell cycle microarray data
– Physical protein-protein interactions with confidence scores
• Node selection
– List of periodically expressed proteins with peak time
– Expand the set of proteins to include non-periodic proteins that are strongly connected to periodic proteins
• Interactions
– Require compatible compartments and high confidence
• Extract cell cycle network
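The extraction can be sketched as a filter over scored interactions. Several simplifications are assumptions here: "compatible compartments" is read as sharing at least one annotated compartment, "strongly connected" is reduced to having one high-confidence edge to a periodic protein, and the protein names, compartments, and `min_conf` threshold are hypothetical.

```python
def extract_network(interactions, periodic, compartments, min_conf):
    """Keep high-confidence, compartment-compatible edges that touch a periodic protein.

    interactions: list of (protein_a, protein_b, confidence).
    periodic: set of periodically expressed ("dynamic") proteins.
    compartments: {protein: set of subcellular compartments}.
    Returns the kept edges and the non-periodic ("static") proteins pulled in.
    """
    edges = []
    for a, b, conf in interactions:
        if conf < min_conf:
            continue
        if not (compartments.get(a, set()) & compartments.get(b, set())):
            continue  # no shared compartment: treated as an impossible interaction
        if a in periodic or b in periodic:
            edges.append((a, b, conf))
    nodes = {p for a, b, _ in edges for p in (a, b)}
    return edges, nodes - periodic


# Hypothetical input data.
periodic = {"P1", "P2"}
compartments = {
    "P1": {"nucleus"},
    "P2": {"nucleus"},
    "S1": {"nucleus", "cytoplasm"},
    "S2": {"mitochondrion"},
    "S3": {"cytoplasm"},
}
interactions = [
    ("P1", "S1", 0.9),   # kept
    ("P2", "S2", 0.95),  # dropped: incompatible compartments
    ("P1", "P2", 0.4),   # dropped: low confidence
    ("S1", "S3", 0.99),  # dropped: no periodic protein involved
]

edges, static = extract_network(interactions, periodic, compartments, min_conf=0.8)
print(edges, static)
```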
![Page 17: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/17.jpg)
The temporal interaction network
Interacting proteins are expressed close in time
Two thirds of the dynamic proteins lack interactions but likely participate in transient interactions
![Page 18: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/18.jpg)
Static proteins comprise a third of the interactions at all times of the cell cycle
Their time of action can be predicted from interactions with dynamic proteins
Static proteins play a major role
![Page 19: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/19.jpg)
Cdc28p and its interaction partners
![Page 20: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/20.jpg)
Just-in-time synthesis vs. just-in-time assembly
Most dynamic proteins are expressed just before they are needed to carry out their function
Most complexes also contain static proteins
Just-in-time assembly of complexes appears to be the general principle
The time of assembly is controlled by synthesizing the last subunits just-in-time
![Page 21: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/21.jpg)
Assembly of the pre-replication complex
![Page 22: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/22.jpg)
The network as a discovery tool
The network enables us to place 30+ uncharacterized proteins in a temporal interaction context
Quite detailed hypotheses can be made concerning their function
The network also contains entire novel modules and complexes
![Page 23: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/23.jpg)
Transcription is linked to phosphorylation
A genome-wide screen identified 332 Cdc28p targets, which include
– 6% of all yeast proteins
– 8% of the static proteins
– 27% of the dynamic ones
A similar correlation was observed with predicted PEST regions
This suggests a hitherto undescribed link between transcriptional and post-translational control
![Page 24: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/24.jpg)
Conclusions
• Genomic context methods are able to infer the function of many prokaryotic proteins from genome sequences alone
• Integration of large-scale experimental data allows similar predictions to be made for eukaryotic proteins
• Benchmarking is a prerequisite for data integration
• It is possible to construct highly reliable models through careful integration of high-throughput experimental data
• Try STRING at http://string.embl.de
![Page 25: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/25.jpg)
Acknowledgments
• The STRING team: Christian von Mering, Berend Snel, Martijn Huynen, Daniel Jaeggi, Steffen Schmidt, Sean Hooper, Julien Lagarde, Mathilde Foglierini, Peer Bork
• New context methods: Jan Korbel, Christian von Mering, Peer Bork
• Cell cycle analysis: Ulrik de Lichtenberg, Thomas Skøt Jensen, Anders Fausbøll, Søren Brunak
![Page 26: STRING - Cross-species integration of known and predicted protein-protein interactions](https://reader035.fdocuments.in/reader035/viewer/2022062513/55501a33b4c90535638b5060/html5/thumbnails/26.jpg)
Thank you!