Post on 22-Dec-2015

The STRING Database

What it does and how it interfaces to other resources

Christian von Mering, University of Zurich & SIB

bigDATA Workshop

- viewers for all types of evidence

- focus on usability and speed

- integrated scoring scheme

- information transfer between species

Genomic Neighborhood

Genes/Species Co-occurrence

Gene Fusions

Database Imports

Exp. Interaction Data

Co-expression

Literature co-occurrence

STRING http://string-db.org/

Numbers:

• 630 organisms

• 2.6 million proteins

• 88 million interactions

• server footprint: 320 GB

networks

Phylogenetic Profiles

Conserved Neighborhood

Gene-Fusions

quantify …

integrate …

Interaction prediction from genome information

“genomic context”

Other Interaction Sources

Interaction Databases Pathway Databases

Reactome

Automated Textmining Interolog Transfer

final interaction score: protein A – protein B: 0.856

between 0 and 1, pseudoprobability, “likelihood of functional association”

final score = 1 – (1 – nscore) * (1 – fscore) * (1 – pscore) * (1 – cscore) * (1 – escore) * (1 – tscore)

nscore: neighborhood, fscore: fusion, pscore: cooccurrence, cscore: coexpression, escore: experimental, tscore: textmining

evidence transfer between species

nscore = 1 – (1 – nscore[query species]) * (1 – nscore[transferred])

information transfer between species either via orthologs (COG database) or via homology

analogous for cscore, escore, tscore, ...
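The combination rule above can be sketched in a few lines. This is a minimal illustration, not STRING's actual code: the function names `combine` and `channel_score` are hypothetical, and the sketch simply applies the "1 minus the product of (1 minus score)" rule shown on the slide, first within one evidence channel (query species plus transferred evidence) and then across channels.

```python
def combine(scores):
    """Combine scores as 1 - product(1 - s), the rule shown on the slide."""
    remaining = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie between 0 and 1")
        remaining *= 1.0 - s
    return 1.0 - remaining


def channel_score(query_species_score, transferred_score):
    """Merge direct evidence with evidence transferred from other species.

    Hypothetical helper: it applies the same rule to the two sub-scores
    of a single channel (e.g. nscore).
    """
    return combine([query_species_score, transferred_score])


if __name__ == "__main__":
    # Illustrative values only, not taken from the database.
    nscore = channel_score(0.2, 0.1)
    final = combine([nscore, 0.0, 0.0, 0.3, 0.5, 0.4])
    print(round(nscore, 3), round(final, 3))
```

Under this rule, adding a channel with score 0 leaves the result unchanged, and any single channel with score 1 forces the final score to 1.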

benchmarking

[Plot: KEGG performance (fraction on same map) versus raw score. Example: Neighborhood raw score]

each predictor has its own raw-score regime
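Because each predictor has its own raw-score scale, the benchmarking step maps raw scores onto a common scale: the fraction of predicted pairs that lie on the same KEGG map. A rough sketch of that idea follows. The fixed-size binning and the name `calibrate` are assumptions made for illustration; the slides do not say how the calibration curve is actually fitted.

```python
def calibrate(pairs, bin_size):
    """Return (mean raw score, fraction on same map) for each bin.

    pairs    -- list of (raw_score, on_same_kegg_map) tuples
    bin_size -- number of pairs per bin (assumed binning strategy)
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    ordered = sorted(pairs, key=lambda p: p[0])
    curve = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean_raw = sum(raw for raw, _ in chunk) / len(chunk)
        fraction = sum(1 for _, same in chunk if same) / len(chunk)
        curve.append((mean_raw, fraction))
    return curve


if __name__ == "__main__":
    # Made-up benchmark pairs: (raw score, both proteins on the same map?)
    demo = [(10, True), (20, True), (300, False), (400, True)]
    for mean_raw, fraction in calibrate(demo, bin_size=2):
        print(mean_raw, fraction)
```

The resulting curve can then be used to translate a new raw score into an estimated probability of functional association.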

gene A gene B

100 bp 6 bp 20 bp

raw score: sum of intergenic distances
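As a worked example, the three gaps drawn on the slide (100 bp, 6 bp, 20 bp) sum to a raw score of 126. The sketch below derives such gaps from gene coordinates. The coordinates, the function name, and the choice to count overlapping genes as a gap of 0 are all assumptions made for illustration.

```python
def neighborhood_raw_score(genes):
    """Sum the intergenic distances along a run of neighboring genes.

    genes -- list of (start, end) positions in bp, ordered along the
             chromosome; the first and last entries are gene A and gene B.
    Overlapping neighbors contribute 0 (an assumption of this sketch).
    """
    total = 0
    for (_, prev_end), (next_start, _) in zip(genes, genes[1:]):
        total += max(0, next_start - prev_end)
    return total


if __name__ == "__main__":
    # Hypothetical coordinates chosen to reproduce gaps of 100, 6 and 20 bp.
    run = [(0, 900), (1000, 1800), (1806, 2500), (2520, 3300)]
    print(neighborhood_raw_score(run))
```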

The scoring system

The raw score regimes

Neighborhood

gene A gene B

100 bp 6 bp 20 bp

raw score: sum of intergenic distances

Phylogenetic profiles

• “similarity profiles”

• singular value decomposition

raw score: Euclidean distance

filter: downweight scores for homologous pairs

Fusion

raw score: constant (0.99)

Experimental interactions

• two-hybrid, TAP, annotated complexes, …

• topology-based analysis: who with whom, how many other partners?

raw score: various (usually ‘uniqueness’ of interaction).

Co-expression

• download all microarray datasets for a given species

• data normalization (spatial correction)

raw score: pairwise Pearson correlation coefficient
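The co-expression raw score is a standard Pearson correlation between the expression profiles of two genes. A minimal pure-Python sketch follows; the expression values are invented and the normalization step mentioned above is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up expression values across four arrays.
    gene_a = [1.0, 2.5, 3.1, 4.8]
    gene_b = [0.9, 2.7, 2.9, 5.2]
    print(pearson(gene_a, gene_b))
```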

Textmining

• download all PubMed abstracts

• identify proteins in the abstracts

• search for co-mentioned pairs

raw score: log-odds score
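One common way to turn co-mention counts into a log-odds score is to compare the observed co-mention frequency with the frequency expected if the two proteins were mentioned independently. The sketch below uses that formulation as an assumption; the slides do not give the exact formula, and the function name and counts are illustrative.

```python
import math


def co_mention_log_odds(n_both, n_a, n_b, n_total):
    """Log of observed over expected co-mention frequency.

    n_both  -- abstracts mentioning both proteins
    n_a     -- abstracts mentioning protein A
    n_b     -- abstracts mentioning protein B
    n_total -- all abstracts searched
    Assumes independence as the null model (not confirmed by the slides).
    """
    if min(n_both, n_a, n_b, n_total) <= 0:
        raise ValueError("all counts must be positive")
    observed = n_both / n_total
    expected = (n_a / n_total) * (n_b / n_total)
    return math.log(observed / expected)


if __name__ == "__main__":
    # Invented counts: 10 co-mentions where 1 would be expected by chance.
    print(co_mention_log_odds(10, 100, 100, 10000))
```

A positive value means the pair is co-mentioned more often than chance would predict; a value near zero means the co-mentions carry little signal.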

User-Experience: Aiming to be Visual and Intuitive

1’000 visits / day

800 users / day

9’000 pageviews / day

> 10’000 DB-queries / day

Citations

2000 NAR Snel et al.: 80 citations

2003 NAR von Mering et al.: 215 citations

2005 NAR von Mering et al.: 183 citations

2007 NAR von Mering et al.: 189 citations

2009 NAR Jensen et al.: 47 citations

total: 714 citations

Cross-links

SMART: protein domain information

GENECARDS: info and products on human genes

SWISS-MODEL-REPOSITORY: homology models

CYTOSCAPE: access via plug-in architecture

SWISSPROT / UNIPROT: expert protein annotation

Cross-link example

launchSwissModel

Reciprocal View

popup: launchSTRING

Example #1

A missing chaperone for Cytochrome C oxidase

Question: who inserts the copper atom into CcO?

Initial observation:

Example #1

The missing chaperone for Cytochrome C oxidase

• gene expressed

• structure solved

• it binds copper!

• likely function: copper delivery

Example #2

Simplify discovery in genome-wide association screens?

Christian von Mering – UZH MolBio – SIB

In-House Use of STRING

a) download data in relational database scheme

b) download data as compact flat-files

c) in-house installation of webserver

d) cross-link to server (version controlled, to network, protein, link, ...)

e) PSI-MI export

f) [ SOAP / webservices ]

Core organisms:

• include all model organisms (annotated knowledge)

• non-redundant, each genus is covered

• include organisms with functional genomics data

Irrelevant Organisms

[future category]

Version 9.0 – exceeding 1000 genomes

More details & new features

“Payload Display” - Your Own STRING Server

=> “branding” STRING via remote-control: a call-back API

Acknowledgements

The STRING team:

Samuel Chaffron, Manuel Weiss, Michael Kuhn, Lars Juhl Jensen

Sean Hooper, Berend Snel, Martijn Huynen, Peer Bork

The STRING institutions:

SIB – Swiss Institute of Bioinformatics

University of Zurich

TU-Dresden, University of Copenhagen

European Molecular Biology Laboratory

“MySTRING”

users can register / login

using OpenID or similar for authentication

persistency of search results (“history”)

store lists / items of interest (“bag of genes”)

users can customize the interface

generate revenue (?)

Feature #2 (Finding Relevant Texts)

Example #2

The missing enzymes for uric acid degradation

Question: why can’t humans degrade uric acid?

initial observation:

Example #2

The missing enzymes for uric acid degradation

• genes cloned, expressed

• enzymatic activity demonstrated

• candidate short-term therapeutics!