From ELMs to function: interaction networks and feature spaces

Lars Juhl Jensen, EMBL Heidelberg


7th ELM meeting, ISG Hotel, Heidelberg, Germany, February 28, 2004


Page 1

From ELMs to Function: Interaction Networks and Feature Spaces

Lars Juhl Jensen, EMBL Heidelberg

Page 2

Function unknown for 40% of human proteins

Page 3

1AOZ (129 aa) vs. 1PLC (99 aa)

Scoring matrix: BLOSUM50; gap penalties: -12/-2; 15.5% identity; global alignment score: -23

1AOZ  SQIRHYKWEVEYMFWAPNCNENIVMGINGQFPGPTIRANAGDSVVVELTNKLHTEGVVIH   60
1PLC  ---------IDVLLGA---DDGSLAFVPSEFS-----ISPGEKIVFK-NNAGFPHNIVFD   42

1AOZ  WHGILQRGTPWADGTASISQCAINPGETFFYNFTVDNPGTFFYHGHLGMQRSAGLYGSLI  120
1PLC  EDSI-PSGVDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQG----AGMVGKVT   97

1AOZ  VDPPQGKKE  129
1PLC  VN-------   99
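The 15.5% identity figure can be checked directly from the aligned rows. The sketch below assumes identity is counted as identical residue columns divided by all alignment columns, gaps included; with that convention the two rows give 20 identities over 129 columns. The variable and function names are illustrative only.

```python
# Aligned rows copied from the 1AOZ/1PLC alignment ('-' marks a gap).
AOZ = (
    "SQIRHYKWEVEYMFWAPNCNENIVMGINGQFPGPTIRANAGDSVVVELTNKLHTEGVVIH"
    "WHGILQRGTPWADGTASISQCAINPGETFFYNFTVDNPGTFFYHGHLGMQRSAGLYGSLI"
    "VDPPQGKKE"
)
PLC = (
    "---------IDVLLGA---DDGSLAFVPSEFS-----ISPGEKIVFK-NNAGFPHNIVFD"
    "EDSI-PSGVDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQG----AGMVGKVT"
    "VN-------"
)


def percent_identity(a: str, b: str) -> float:
    """Identical residue columns as a percentage of all alignment columns.

    Assumption: gapped columns are counted in the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * identical / len(a)


if __name__ == "__main__":
    print(f"{percent_identity(AOZ, PLC):.1f}% identity")  # 15.5% identity
```

Other conventions divide by the shorter sequence or by the ungapped overlap, which gives a different percentage for the same alignment (20/99 would be about 20.2%).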

Page 4

Structural similarity can be deceiving: Two structures from the Cupredoxin superfamily

[Figure: the two structures, labeled Enzyme and Non-enzyme]

Page 5

ProtFun: Prediction of protein function from post-translational modifications

Page 6

# Functional category                 1AOZ    1PLC
Amino_acid_biosynthesis               0.126   0.070
Biosynthesis_of_cofactors             0.100   0.075
Cell_envelope                         0.429   0.032
Cellular_processes                    0.057   0.059
Central_intermediary_metabolism       0.063   0.041
Energy_metabolism                     0.126   0.268
Fatty_acid_metabolism                 0.027   0.072
Purines_and_pyrimidines               0.439   0.088
Regulatory_functions                  0.102   0.019
Replication_and_transcription         0.052   0.089
Translation                           0.079   0.150
Transport_and_binding                 0.032   0.052

# Enzyme/nonenzyme                    1AOZ    1PLC
Enzyme                                0.773   0.310
Nonenzyme                             0.227   0.690

# Enzyme class                        1AOZ    1PLC
Oxidoreductase (EC 1.-.-.-)           0.077   0.077
Transferase (EC 2.-.-.-)              0.260   0.099
Hydrolase (EC 3.-.-.-)                0.114   0.071
Lyase (EC 4.-.-.-)                    0.025   0.020
Isomerase (EC 5.-.-.-)                0.010   0.068
Ligase (EC 6.-.-.-)                   0.017   0.017

Protein features determine function
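Reading the ProtFun output mostly comes down to finding the highest-probability class within each group. A minimal sketch over the values in the table; the nested-dictionary layout and the helper name `top_prediction` are hypothetical conveniences, not ProtFun's own output format or API.

```python
# Probabilities copied from the ProtFun table: (1AOZ, 1PLC).
# The nested-dict layout is a hypothetical convenience, not ProtFun's format.
PROTEINS = ("1AOZ", "1PLC")
PROTFUN = {
    "Functional category": {
        "Amino_acid_biosynthesis": (0.126, 0.070),
        "Biosynthesis_of_cofactors": (0.100, 0.075),
        "Cell_envelope": (0.429, 0.032),
        "Cellular_processes": (0.057, 0.059),
        "Central_intermediary_metabolism": (0.063, 0.041),
        "Energy_metabolism": (0.126, 0.268),
        "Fatty_acid_metabolism": (0.027, 0.072),
        "Purines_and_pyrimidines": (0.439, 0.088),
        "Regulatory_functions": (0.102, 0.019),
        "Replication_and_transcription": (0.052, 0.089),
        "Translation": (0.079, 0.150),
        "Transport_and_binding": (0.032, 0.052),
    },
    "Enzyme/nonenzyme": {
        "Enzyme": (0.773, 0.310),
        "Nonenzyme": (0.227, 0.690),
    },
    "Enzyme class": {
        "Oxidoreductase (EC 1.-.-.-)": (0.077, 0.077),
        "Transferase (EC 2.-.-.-)": (0.260, 0.099),
        "Hydrolase (EC 3.-.-.-)": (0.114, 0.071),
        "Lyase (EC 4.-.-.-)": (0.025, 0.020),
        "Isomerase (EC 5.-.-.-)": (0.010, 0.068),
        "Ligase (EC 6.-.-.-)": (0.017, 0.017),
    },
}


def top_prediction(group: str, protein: str) -> tuple[str, float]:
    """Return the highest-probability class within one output group."""
    col = PROTEINS.index(protein)
    table = PROTFUN[group]
    best = max(table, key=lambda name: table[name][col])
    return best, table[best][col]


if __name__ == "__main__":
    for group in PROTFUN:
        for protein in PROTEINS:
            name, p = top_prediction(group, protein)
            print(f"{group} / {protein}: {name} ({p:.3f})")
```

For the enzyme/non-enzyme group this picks Enzyme (0.773) for 1AOZ and Nonenzyme (0.690) for 1PLC.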

Page 7

Feature-function correlations

• Transmembrane helices predictive of
  – Receptors
  – Transporters
  – Ion channels

• Subcellular localization
  – Receptors
  – Transcription (regulation)

• S/T-phosphorylation
  – Transcription regulation

Page 8

ELMer hunting Bugs: “Heeeey, there's something awfly scwewy going on awound here”

• The idea: compare the GO annotation of each ELM with the GO terms of the proteins that contain it
  – Color shows the correlation between a GO term and ELM matches
  – Black dots denote annotated GO terms

• A lack of correlation need not be a problem

• But how come ...
  – LIG_Dynein_DLC8_1 is not annotated as intracellular protein transport?
  – LIG_TRP is not annotated as stress response?
  – LIG_WRPW_1 and 2 are not annotated as involved in cell differentiation and development?
  – MOD_ASX_betaOH_EGF is not annotated as cell differentiation (and perhaps development)?
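The slide does not say which correlation measure the color scale uses. One natural choice, assumed here, is the Pearson correlation of two binary indicators over a set of proteins (the phi coefficient): one indicator for "matches the ELM pattern", one for "is annotated with the GO term". The toy vectors are hypothetical and not taken from the ELM or GO data.

```python
import math


def phi_coefficient(has_elm: list[int], has_go: list[int]) -> float:
    """Pearson correlation of two 0/1 vectors (the phi coefficient).

    has_elm[i] = 1 if protein i matches the ELM pattern,
    has_go[i]  = 1 if protein i is annotated with the GO term.
    """
    if len(has_elm) != len(has_go):
        raise ValueError("vectors must cover the same proteins")
    pairs = list(zip(has_elm, has_go))
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # Undefined when either vector is constant; reported as 0.0 here (an assumption).
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


if __name__ == "__main__":
    # Hypothetical toy data for eight proteins, not real ELM or GO results.
    elm = [1, 1, 1, 0, 0, 0, 0, 0]
    go = [1, 1, 0, 1, 0, 0, 0, 0]
    print(round(phi_coefficient(elm, go), 3))  # 0.467
```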

Page 9

And now for something completely different: Protein association networks

Genomic Neighborhood

Species Co-occurrence

Gene Fusions

Database Imports

Exp. Interaction Data

Co-expression

Literature co-occurrence

Page 10

Integrating physical interaction screens

• Not all screens are equal
  – Complex purification vs. Y2H
  – Quality varies greatly

• Not all interactions within a screen are equal
  – Quality measure for each type
  – Benchmarking against KEGG

• Combination of evidence from multiple screens

• Cross-species transfer of interaction evidence
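The slide does not state how evidence from several screens is combined. A simple rule, assumed here, is a noisy-OR: if each screen gives a calibrated probability s_i that a pair truly interacts, and the screens are treated as independent, the combined score is 1 - prod(1 - s_i). The screen names and scores below are hypothetical, and the inputs must already be calibrated (for example against KEGG) for the result to mean anything.

```python
from functools import reduce


def combine_scores(scores: list[float]) -> float:
    """Noisy-OR: P(at least one independent piece of evidence is correct)."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be calibrated probabilities in [0, 1]")
    return 1.0 - reduce(lambda acc, s: acc * (1.0 - s), scores, 1.0)


def combine_screens(
    screens: dict[str, dict[tuple[str, str], float]],
) -> dict[frozenset, float]:
    """Merge per-screen pair scores into one score per unordered protein pair."""
    per_pair: dict[frozenset, list[float]] = {}
    for pair_scores in screens.values():
        for pair, score in pair_scores.items():
            per_pair.setdefault(frozenset(pair), []).append(score)
    return {pair: combine_scores(s) for pair, s in per_pair.items()}


if __name__ == "__main__":
    # Hypothetical calibrated scores from two screens of different quality.
    screens = {
        "complex_purification": {("A", "B"): 0.6, ("B", "C"): 0.3},
        "y2h": {("B", "A"): 0.5},
    }
    for pair, score in combine_screens(screens).items():
        print(sorted(pair), round(score, 2))
```

The pair A-B, seen in both screens, combines to 0.8. The independence assumption overstates confidence when screens share systematic biases.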

Page 11

Mining microarray expression databases

Re-normalize arrays by a modern method to remove biases

Build expression matrix

Combine similar arrays by PCA

Construct predictor by Gaussian kernel density estimation

Calibrate against KEGG maps

Transfer associations across species
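The slide gives no details of the predictor. One plausible reading, assumed here, is a Bayes classifier built from two Gaussian kernel density estimates: the density of co-expression scores for gene pairs in the same KEGG map (benchmark positives) and for pairs that share no map (negatives). Bayes' rule then turns a score into a calibrated probability. The bandwidth, the scores, and the function names are hypothetical.

```python
import math
from typing import Callable


def gaussian_kde(samples: list[float], bandwidth: float) -> Callable[[float], float]:
    """Return a 1-D Gaussian kernel density estimate built from samples."""
    if not samples or bandwidth <= 0:
        raise ValueError("need samples and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def kde_predictor(
    pos: list[float], neg: list[float], bandwidth: float = 0.1
) -> Callable[[float], float]:
    """P(pair shares a KEGG map | score x), via Bayes' rule on two KDEs.

    pos: scores of gene pairs in the same KEGG map (benchmark positives)
    neg: scores of gene pairs sharing no KEGG map (benchmark negatives)
    The prior is taken from the benchmark composition (an assumption).
    """
    f_pos = gaussian_kde(pos, bandwidth)
    f_neg = gaussian_kde(neg, bandwidth)
    prior = len(pos) / (len(pos) + len(neg))

    def predict(x: float) -> float:
        a = prior * f_pos(x)
        b = (1.0 - prior) * f_neg(x)
        # Far from all training data both densities underflow to 0; fall back to the prior.
        return a / (a + b) if a + b > 0 else prior

    return predict


if __name__ == "__main__":
    # Hypothetical benchmark scores (e.g. correlations of expression profiles).
    predict = kde_predictor(pos=[0.8, 0.9, 0.85, 0.7], neg=[0.0, 0.1, -0.1, 0.2])
    for x in (0.0, 0.5, 0.9):
        print(x, round(predict(x), 3))
```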

Page 12

Co-mentioning in the scientific literature

Associate abstracts with species

Identify gene names in title/abstract

Count (co-)occurrences of genes

Test significance of associations

Calibrate against KEGG maps

Transfer associations across species
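The slide does not name the significance test. A common choice for co-occurrence counts, assumed here, is the one-sided hypergeometric tail: given N abstracts for a species, n_a mentioning gene A and n_b mentioning gene B, how likely are at least k co-mentions by chance? The counts in the example are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(n_total: int, n_a: int, n_b: int, k: int) -> float:
    """One-sided hypergeometric P(X >= k) for co-mentions of genes A and B.

    n_total: abstracts associated with the species
    n_a, n_b: abstracts mentioning gene A and gene B, respectively
    k: abstracts mentioning both
    """
    if not (0 <= n_a <= n_total and 0 <= n_b <= n_total and 0 <= k):
        raise ValueError("inconsistent counts")
    upper = min(n_a, n_b)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(k, upper + 1))
    return tail / comb(n_total, n_b)


if __name__ == "__main__":
    # Hypothetical counts: 10,000 abstracts, 40 mention A, 50 mention B, 8 mention both.
    print(f"{cooccurrence_pvalue(10_000, 40, 50, 8):.2e}")
```

With these counts only 0.2 co-mentions are expected by chance, so 8 co-mentions yield a very small p-value. Calibrating against KEGG maps is still needed to turn such p-values into confidence scores.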

Page 13

Extracting transient interactions through data integration

Page 14

Mining for ELM mediated interactions

• ELM pattern matching against the D. melanogaster SP-proteome, using species and domain filters

• Assignment of SMART domains

• Find protein pairs in which one protein has a SMART domain and the other has the corresponding ligand ELM

• Overlay with the CuraGen Y2H protein interaction set
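The pairing and overlay steps can be sketched with plain set operations. Everything in the example is hypothetical: the protein IDs, the domain assignments, and the ELM-to-domain mapping (the ELM class names are illustrative, so check the ELM database for actual identifiers). Self-pairs are skipped as a design choice of this sketch.

```python
def candidate_pairs(
    domains: dict[str, set[str]],
    elms: dict[str, set[str]],
    elm_to_domain: dict[str, str],
) -> set[frozenset]:
    """Pairs where one protein carries a ligand ELM and the other its binding domain."""
    pairs: set[frozenset] = set()
    for motif_protein, motifs in elms.items():
        for motif in motifs:
            domain = elm_to_domain.get(motif)
            if domain is None:
                continue  # ELM class without a known partner domain
            for domain_protein, doms in domains.items():
                # Self-pairs are skipped (a design choice for this sketch).
                if domain in doms and domain_protein != motif_protein:
                    pairs.add(frozenset((domain_protein, motif_protein)))
    return pairs


def overlay(pairs: set[frozenset], y2h_edges: list[tuple[str, str]]) -> set[frozenset]:
    """Keep only candidate pairs that are also observed in the Y2H screen."""
    observed = {frozenset(edge) for edge in y2h_edges}
    return pairs & observed


if __name__ == "__main__":
    # Hypothetical proteins, SMART domains, ELM matches and Y2H edges.
    domains = {"P1": {"SH3"}, "P2": {"PDZ"}, "P3": set()}
    elms = {"P3": {"LIG_SH3_3"}, "P4": {"LIG_SH3_3", "LIG_PDZ_1"}}
    elm_to_domain = {"LIG_SH3_3": "SH3", "LIG_PDZ_1": "PDZ"}
    y2h = [("P1", "P3"), ("P2", "P3")]
    hits = overlay(candidate_pairs(domains, elms, elm_to_domain), y2h)
    print(sorted(sorted(p) for p in hits))  # [['P1', 'P3']]
```

Pairs are stored as `frozenset` so that (A, B) and (B, A) from the Y2H data match the same candidate.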

Page 15

Summary: Have ELMs – want function

• There is huge potential in using ELMs, in addition to domains, for function prediction

• Conservation of protein features (such as ELMs) in orthologs underlines their importance for protein function

• Integration of ELMs with other evidence types can be used to extract likely (transient) ELM mediated interactions

• Work still remains to be done:
  – The false positive rate is still too high for predictive purposes
  – Better ELM models are needed
  – Better filters are needed

• Wild and crazy ideas
  – Overlay SMART/ELM pairs with STRING-predicted associations
  – Functional associations from ELM/SMART vectors

Page 16

Acknowledgments

• You – the ELM team

• DisEMBL team
  – Rune Linding
  – Francesca Diella
  – Peer Bork
  – Toby Gibson
  – Rob Russell

• The STRING team
  – Peer Bork
  – Christian von Mering
  – Berend Snel
  – Martijn Huynen
  – Daniel Jaeggi
  – Steffen Schmidt

• ArrayProspector
  – Julien Lagarde

• NetView
  – Sean Hooper

• The ProtFun team
  – Søren Brunak
  – Ramneek Gupta
  – Can Kesmir
  – Kristoffer Rapacki
  – Hans-Henrik Stærfeldt
  – Henrik Nielsen
  – Nikolaj Blom
  – Claus A.F. Andersen
  – Anders Krogh
  – Steen Knudsen
  – Chris Workman

• The EUCLID team
  – Alfonso Valencia
  – Damien Devos
  – Javier Tamames

Page 17

Thank you!