Introduction to STRING

Introduction to STRING Lars Juhl Jensen EMBL Heidelberg

Transcript of Introduction to STRING

Page 1: Introduction to STRING

Introduction to STRING

Lars Juhl Jensen, EMBL Heidelberg

Page 2: Introduction to STRING

STRING

Page 3: Introduction to STRING

integrate diverse evidence

Page 4: Introduction to STRING

functional interactions

Page 5: Introduction to STRING
Page 6: Introduction to STRING

hundreds of proteomes

Page 7: Introduction to STRING

Ensembl

Page 8: Introduction to STRING

SWISS-PROT

Page 9: Introduction to STRING

prokaryotes

Page 10: Introduction to STRING

genomic context methods

Page 11: Introduction to STRING

gene fusion

Page 12: Introduction to STRING
Page 13: Introduction to STRING

gene neighborhood

Page 14: Introduction to STRING
Page 15: Introduction to STRING

phylogenetic profiles

Page 16: Introduction to STRING
Page 17: Introduction to STRING
Page 18: Introduction to STRING
Page 19: Introduction to STRING
Page 20: Introduction to STRING

Cell

Cellulosomes

Cellulose

Page 21: Introduction to STRING

eukaryotes

Page 22: Introduction to STRING

data integration

Page 23: Introduction to STRING
Page 24: Introduction to STRING

curated knowledge

Page 25: Introduction to STRING

MIPS (Munich Information Center for Protein Sequences)

Page 26: Introduction to STRING

Reactome

Page 27: Introduction to STRING

KEGG (Kyoto Encyclopedia of Genes and Genomes)

Page 28: Introduction to STRING

STKE (Signal Transduction Knowledge Environment)

Page 29: Introduction to STRING

literature mining

Page 30: Introduction to STRING

co-mentioning

Page 31: Introduction to STRING

NLP (Natural Language Processing)

Page 32: Introduction to STRING

MEDLINE

Page 33: Introduction to STRING

SGD (Saccharomyces Genome Database)

Page 34: Introduction to STRING

The Interactive Fly

Page 35: Introduction to STRING

OMIM (Online Mendelian Inheritance in Man)

Page 36: Introduction to STRING

primary experimental data

Page 37: Introduction to STRING

microarray expression data

Page 38: Introduction to STRING

GEO (Gene Expression Omnibus)

Page 39: Introduction to STRING

SMD (Stanford Microarray Database)

Page 40: Introduction to STRING

physical protein interactions

Page 41: Introduction to STRING

BIND (Biomolecular Interaction Network Database)

Page 42: Introduction to STRING

MINT (Molecular Interactions Database)

Page 43: Introduction to STRING

GRID (General Repository for Interaction Datasets)

Page 44: Introduction to STRING

DIP (Database of Interacting Proteins)

Page 45: Introduction to STRING

HPRD (Human Protein Reference Database)

Page 46: Introduction to STRING

problems

Page 47: Introduction to STRING

many sources

Page 48: Introduction to STRING

different gene identifiers

Page 49: Introduction to STRING

many types of evidence

Page 50: Introduction to STRING

questionable quality

Page 51: Introduction to STRING

not directly comparable

Page 52: Introduction to STRING

spread over many species

Page 53: Introduction to STRING

parsers

Page 54: Introduction to STRING

synonym lists

Page 55: Introduction to STRING

quality scores

Page 56: Introduction to STRING

benchmarking

Page 57: Introduction to STRING

orthology

Page 58: Introduction to STRING

how is it actually done?

Page 59: Introduction to STRING

gene fusion

Page 60: Introduction to STRING

Calculate all-against-all pairwise alignments

Find genes in A that match the same gene in B

Exclude overlapping alignments

Calibrate against KEGG maps
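
As a rough illustration of the fusion steps on this slide, the sketch below takes precomputed alignment hits and reports pairs of genes from genome A that align to non-overlapping regions of the same gene in genome B. The function name `fusion_candidates`, the hit tuple layout, and the toy coordinates are assumptions made for this example; they are not taken from the STRING pipeline, and the KEGG calibration step is not shown.

```python
from collections import defaultdict
from itertools import combinations


def fusion_candidates(hits):
    """Return ((gene_a1, gene_a2), gene_b) fusion candidates.

    hits: iterable of (gene_a, gene_b, start, end), where start/end are the
    aligned coordinates on gene_b (hypothetical input layout).
    """
    by_target = defaultdict(list)
    for gene_a, gene_b, start, end in hits:
        by_target[gene_b].append((gene_a, start, end))

    candidates = set()
    for gene_b, aligned in by_target.items():
        for (a1, s1, e1), (a2, s2, e2) in combinations(aligned, 2):
            if a1 == a2:
                continue
            # Exclude overlapping alignments: the two A genes must cover
            # disjoint regions of the putative fused gene in B.
            if e1 < s2 or e2 < s1:
                candidates.add((tuple(sorted((a1, a2))), gene_b))
    return sorted(candidates)


example_hits = [
    ("a1", "b1", 1, 100),
    ("a2", "b1", 120, 250),
    ("a3", "b1", 50, 150),  # overlaps both a1 and a2 on b1
]
fusion_result = fusion_candidates(example_hits)
```

In this toy input only `a1` and `a2` cover disjoint parts of `b1`, so they form the single candidate pair.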

Page 61: Introduction to STRING

gene neighborhood

Page 62: Introduction to STRING

Identify runs of adjacent genes with the same direction

Score each gene pair based on intergenic distances

Calibrate against KEGG maps

Infer associations in other species
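
The first two neighborhood steps can be sketched as follows. Genes are grouped into runs of adjacent, same-strand genes, and each pair within a run is given the summed intergenic distance separating it. The gene tuple layout, the `max_gap` cutoff of 300 bp, and the use of the summed distance as a raw score are illustrative assumptions, not values from the slides.

```python
def same_strand_runs(genes, max_gap=300):
    """Group genes into runs of adjacent genes on the same strand.

    genes: iterable of (name, start, end, strand) on one chromosome.
    max_gap: hypothetical maximum intergenic distance within a run.
    """
    runs, current = [], []
    for gene in sorted(genes, key=lambda g: g[1]):
        if current:
            prev = current[-1]
            gap = gene[1] - prev[2]
            if gene[3] == prev[3] and gap <= max_gap:
                current.append(gene)
                continue
            runs.append(current)
        current = [gene]
    if current:
        runs.append(current)
    # A run needs at least two genes to yield a gene pair.
    return [run for run in runs if len(run) > 1]


def pair_gaps(run):
    """Sum the intergenic distances separating each gene pair in a run."""
    gaps = {}
    for i in range(len(run)):
        total = 0
        for j in range(i + 1, len(run)):
            total += max(0, run[j][1] - run[j - 1][2])
            gaps[(run[i][0], run[j][0])] = total
    return gaps


example_genes = [
    ("g1", 0, 100, "+"),
    ("g2", 150, 300, "+"),
    ("g3", 320, 500, "+"),
    ("g4", 600, 700, "-"),
    ("g5", 2000, 2100, "-"),
]
example_runs = same_strand_runs(example_genes)
example_gaps = pair_gaps(example_runs[0])
```

A smaller summed distance would correspond to stronger neighborhood evidence; turning it into a probability is the job of the calibration step.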

Page 63: Introduction to STRING

phylogenetic profiles

Page 64: Introduction to STRING

Align all proteins against all

Calculate best-hit profile

Join similar species by PCA

Calculate PC profile distances

Calibrate against KEGG maps
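
A minimal sketch of the profile idea: each gene gets a vector of best-hit scores across species, and genes with similar vectors are candidate functional partners. The PCA step that joins similar species is deliberately left out here, so the distance is computed on raw profiles rather than on principal components. The names `best_hit_profile` and `profile_distance` and the toy scores are assumptions for this example.

```python
import math


def best_hit_profile(best_hits, species):
    """Build a profile of best-hit scores, one entry per species.

    best_hits: dict mapping species name -> best-hit score for one gene;
    species without a hit get a score of 0.0.
    """
    return [best_hits.get(sp, 0.0) for sp in species]


def profile_distance(p, q):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


species = ["sp1", "sp2", "sp3"]
profiles = {
    "geneA": best_hit_profile({"sp1": 200.0, "sp3": 150.0}, species),
    "geneB": best_hit_profile({"sp1": 180.0, "sp3": 160.0}, species),
    "geneC": best_hit_profile({"sp2": 300.0}, species),
}
d_ab = profile_distance(profiles["geneA"], profiles["geneB"])
d_ac = profile_distance(profiles["geneA"], profiles["geneC"])
```

Here `geneA` and `geneB` occur in the same species and end up much closer to each other than either is to `geneC`.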

Page 65: Introduction to STRING

literature co-occurrence

Page 66: Introduction to STRING

Associate abstracts with species

Identify gene names in title/abstract

Count (co-)occurrences of genes

Test significance of associations

Calibrate against KEGG maps

Infer associations in other species
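
The counting and significance steps might look like the sketch below, which assumes gene names have already been recognized in each abstract. The slides do not say which statistical test is used; a one-sided hypergeometric test is used here as a stand-in, and the function names and toy abstracts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def cooccurrence_counts(abstract_genes):
    """Count single occurrences and pairwise co-occurrences per abstract.

    abstract_genes: iterable of gene-name lists, one list per abstract.
    """
    single, pair = Counter(), Counter()
    for genes in abstract_genes:
        unique = sorted(set(genes))  # count each gene once per abstract
        single.update(unique)
        pair.update(combinations(unique, 2))
    return single, pair


def hypergeom_pvalue(total, n_a, n_b, k):
    """P(at least k co-mentions) if genes A and B were mentioned independently."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(total - n_a, n_b - i) for i in range(k, upper + 1)
    )
    return favourable / comb(total, n_b)


abstracts = [
    ["CDC28", "CLN2"],
    ["CDC28", "CLN2"],
    ["CDC28"],
    ["ACT1"],
    ["ACT1", "CLN2"],
    ["TUB1"],
]
single, pair = cooccurrence_counts(abstracts)
p_value = hypergeom_pvalue(
    len(abstracts), single["CDC28"], single["CLN2"], pair[("CDC28", "CLN2")]
)
```

With only six toy abstracts the p-value is unremarkable; the point is the shape of the computation, which only becomes informative at MEDLINE scale.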

Page 67: Introduction to STRING

physical interaction data

Page 68: Introduction to STRING

Make binary representation of complexes

Yeast two-hybrid data sets are inherently binary

Calculate score from number of (co-)occurrences

Calculate score from non-shared partners

Calibrate against KEGG maps

Infer associations in other species

Combine evidence from experiments
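
Two of the steps above can be sketched in a few lines. Complexes are expanded into all member pairs (a "matrix" representation, chosen here as an assumption; the slide does not say which binary representation is used), and binary interactions are scored so that pairs with few non-shared partners rank higher. The scoring formula `-log((n_a + 1) * (n_b + 1))` is an illustrative choice, not a formula given in the slides.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log


def complex_pair_counts(purifications):
    """Expand each complex into all member pairs and count co-occurrences."""
    counts = Counter()
    for members in purifications:
        counts.update(combinations(sorted(set(members)), 2))
    return counts


def non_shared_partner_scores(edges):
    """Score binary interactions; fewer non-shared partners gives a higher score."""
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    scores = {}
    for a, b in edges:
        n_a = len(partners[a] - partners[b] - {b})  # partners only A has
        n_b = len(partners[b] - partners[a] - {a})  # partners only B has
        scores[tuple(sorted((a, b)))] = -log((n_a + 1) * (n_b + 1))
    return scores


pair_counts = complex_pair_counts([["A", "B", "C"], ["A", "B"], ["B", "D"]])
y2h_scores = non_shared_partner_scores(
    [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
)
```

In the toy network, the A-B interaction sits inside a triangle with C and scores higher than A-D, where A has two partners that D does not share.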

Page 69: Introduction to STRING

calibrate against KEGG
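
One way to picture the calibration is as a benchmark: sort the predicted pairs by raw score, bin them, and record for each bin the fraction of pairs whose genes share a KEGG map. The sketch below does exactly that on toy data. The function name, the fixed-size binning, and the decision to skip pairs with unannotated genes are assumptions; the slides do not specify these details.

```python
def calibration_curve(scored_pairs, gene_to_maps, bin_size):
    """Return (mean raw score, fraction on a shared map) per score bin.

    scored_pairs: iterable of (gene_a, gene_b, raw_score).
    gene_to_maps: dict mapping gene -> set of KEGG map identifiers.
    Pairs with an unannotated gene cannot be judged and are skipped.
    """
    usable = [
        (score, bool(gene_to_maps[a] & gene_to_maps[b]))
        for a, b, score in scored_pairs
        if a in gene_to_maps and b in gene_to_maps
    ]
    usable.sort(key=lambda item: item[0], reverse=True)

    curve = []
    for i in range(0, len(usable), bin_size):
        chunk = usable[i:i + bin_size]
        mean_score = sum(score for score, _ in chunk) / len(chunk)
        accuracy = sum(hit for _, hit in chunk) / len(chunk)
        curve.append((mean_score, accuracy))
    return curve


toy_maps = {"a": {"m1"}, "b": {"m1"}, "c": {"m2"}, "d": {"m1", "m2"}}
toy_pairs = [
    ("a", "b", 0.9),
    ("a", "d", 0.8),
    ("a", "c", 0.3),
    ("b", "c", 0.2),
    ("a", "x", 0.95),  # "x" has no KEGG annotation and is ignored
]
curve = calibration_curve(toy_pairs, toy_maps, bin_size=2)
```

A smooth function fitted through such a curve would map any raw score to an estimated probability of sharing a pathway, which is what makes scores from different evidence types comparable.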

Page 70: Introduction to STRING
Page 71: Introduction to STRING

transfer by orthology

Page 72: Introduction to STRING
Page 73: Introduction to STRING

orthologous groups
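
A simplified picture of transfer via orthologous groups: if two proteins are associated in a source species, the association is carried over to the proteins of the target species that fall into the same two groups. The sketch assumes a precomputed protein-to-group mapping and ignores score adjustments and the fuzzy case; all names and identifiers are hypothetical.

```python
from collections import defaultdict


def transfer_by_orthology(source_pairs, protein_to_group, target_proteins):
    """Map associated source-protein pairs onto target-species proteins.

    protein_to_group: dict mapping protein -> orthologous group identifier,
    covering both source and target proteins.
    """
    group_members = defaultdict(set)
    for protein in target_proteins:
        group = protein_to_group.get(protein)
        if group is not None:
            group_members[group].add(protein)

    transferred = set()
    for a, b in source_pairs:
        group_a = protein_to_group.get(a)
        group_b = protein_to_group.get(b)
        if group_a is None or group_b is None:
            continue  # no orthologous group, nothing to transfer
        for target_a in group_members.get(group_a, ()):
            for target_b in group_members.get(group_b, ()):
                if target_a != target_b:
                    transferred.add(tuple(sorted((target_a, target_b))))
    return transferred


groups = {
    "yeast1": "OG1", "yeast2": "OG2", "yeast3": "OG3",
    "human1": "OG1", "human2a": "OG2", "human2b": "OG2",
}
transferred = transfer_by_orthology(
    [("yeast1", "yeast2"), ("yeast1", "yeast3")],
    groups,
    ["human1", "human2a", "human2b"],
)
```

The yeast1-yeast2 association maps to two human pairs because OG2 contains two human paralogs, while yeast1-yeast3 is dropped because OG3 has no human member.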

Page 74: Introduction to STRING
Page 75: Introduction to STRING

fuzzy orthology

Page 76: Introduction to STRING

?

Source species

Target species

Page 77: Introduction to STRING

combine all evidence
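
Once every evidence type has been calibrated to a probability, the scores can be combined. A common scheme, assuming the evidence types are independent, is to take one minus the product of the individual failure probabilities. The slide itself does not spell out the formula, so the sketch below should be read as one plausible combination rule rather than the exact one used.

```python
def combine_scores(scores):
    """Combine independent probabilistic scores as 1 - prod(1 - s_i)."""
    remaining = 1.0
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        remaining *= 1.0 - score
    return 1.0 - remaining


combined = combine_scores([0.5, 0.5])  # two weak lines of evidence
```

Two independent scores of 0.5 combine to 0.75, so several weak lines of evidence can add up to a confident association.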

Page 78: Introduction to STRING
Page 79: Introduction to STRING

Acknowledgments

The STRING team (EMBL)
– Christian von Mering
– Berend Snel
– Martijn Huynen
– Sean Hooper
– Samuel Chaffron
– Julien Lagarde
– Mathilde Foglierini
– Peer Bork

Literature mining project (EML Research)
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas

Page 80: Introduction to STRING

Thank you!