Presentation alfonso romero

ConSAT: a database and tool for protein function prediction using consensus domain architectures

Alfonso E. Romero [1], Tamás Nepusz [2], Rajkumar Sasidharan [3], Alberto Paccanaro [1]

[email protected]
http://www.cs.rhul.ac.uk/~aeromero
http://paccanarolab.org

[1] Royal Holloway, University of London
[2] Model College, Molde (Norway)
[3] UCLA


Outline

1. Domains and function prediction

2. The ConSAT method for function prediction

3. ConSAT: web server and database

Domains and function prediction

● Domain → conserved part of a protein sequence

– Stable and exist in many proteins

– Evolve independently

– Often autonomous folding units

– Individual function

● Example: zinc finger domains

Domains and function prediction

● Single domain proteins:

Protein ↔ Domain ↔ Function

● Multiple domain proteins (most common case):

Protein ↔ Domain 1, Domain 2, …, Domain N ↔ Functions?

Domains and function prediction

● Domain combination can create new functions

● Domain combination can modify or suppress any of the individual domain functions

● Function is determined by the domain arrangement (architecture) and not just by the aggregation of the individual domain functions.

Bashton M, Chothia C. The generation of new protein functions by the combination of domains. Structure. 2007 Jan;15(1):85-99.
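
The point about arrangement versus aggregation can be illustrated with a small sketch. It assumes an architecture is stored as an ordered tuple of domain identifiers; the identifiers (DOM_A, DOM_B) and function labels are hypothetical placeholders, not real annotations.

```python
from typing import Dict, FrozenSet, Tuple

Architecture = Tuple[str, ...]  # ordered domain identifiers, N- to C-terminus

# Hypothetical annotations, keyed by the ordered domain arrangement.
FUNCTIONS: Dict[Architecture, FrozenSet[str]] = {
    ("DOM_A",): frozenset({"function_1"}),
    ("DOM_B",): frozenset({"function_2"}),
    ("DOM_A", "DOM_B"): frozenset({"function_3"}),  # combination creates a new function
    ("DOM_B", "DOM_A"): frozenset({"function_2"}),  # function_1 is suppressed
}


def naive_union(arch: Architecture) -> FrozenSet[str]:
    """Aggregate the single-domain functions, ignoring the arrangement."""
    result = set()
    for domain in arch:
        result |= FUNCTIONS.get((domain,), frozenset())
    return frozenset(result)


def by_architecture(arch: Architecture) -> FrozenSet[str]:
    """Look up functions for the whole ordered architecture."""
    return FUNCTIONS.get(arch, frozenset())
```

Here `naive_union` gives the same answer for both orders of DOM_A and DOM_B, while `by_architecture` can distinguish them.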

Domains and function prediction

● Protein architecture

– Domain juxtaposition

– Domain insertion

Insertions can be recursive

Aroul-Selvam R, Hubbard T, and Sasidharan R. Domain insertions in protein structures. J Mol Biol. 2004. 338(4):633-41.
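
As a rough illustration of juxtaposition versus (possibly recursive) insertion, the sketch below computes how deeply each domain is nested inside others. It assumes each domain is a single contiguous interval with 1-based inclusive coordinates; the `Domain` type and the names A to D are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def nesting_depths(domains: Iterable[Domain]) -> Dict[str, int]:
    """Depth 0 = juxtaposed at top level; depth k = inside k enclosing domains."""
    depths: Dict[str, int] = {}
    stack: List[Domain] = []  # chain of domains currently enclosing the position
    # Sort by start; for equal starts, put the longer (enclosing) domain first.
    for dom in sorted(domains, key=lambda d: (d.start, -d.end)):
        # Drop domains that end before this one does: they do not enclose it.
        while stack and stack[-1].end < dom.end:
            stack.pop()
        depths[dom.name] = len(stack)
        stack.append(dom)
    return depths
```

For example, with A = 1-300, B = 100-200, C = 120-160 and D = 310-400, B is inserted in A, C is inserted in B (a recursive insertion), and D is simply juxtaposed to A.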

Domains and function prediction

● Finding domains:

– Computational models (signatures)

– Domain databases (Pfam, CATH-Gene3D, PANTHER, …)

– Use InterPro as a starting point:

● Integrates the main databases

● Common name for several signatures of the same domain (IPR domain identifiers)

● InterPro2GO (IPR domains → GO terms)

● Issues:

– Different databases do not agree in their output

– Different domain boundaries for the same IPR domain

Domains and function prediction

● Issues (graphically):

– 2 and 4 overlap (one of them is probably wrong)

– 1 and 3 seem to be the same

● Solution: Consensus domain architecture
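
One way such a consensus could be built is sketched below. This is an illustrative heuristic, not necessarily the procedure ConSAT uses: hits with the same IPR identifier that overlap are merged into one region, and conflicts between different domains are resolved in favour of the region backed by more hits. The `Hit` and `Region` types, the `max_overlap` threshold and the identifiers IPR_A, IPR_B, IPR_C are all assumptions. Note that this simple rule would also discard genuine domain insertions, which a real method has to treat separately.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    ipr: str
    start: int  # 1-based, inclusive
    end: int


class Region(NamedTuple):
    ipr: str
    start: int
    end: int
    support: int  # number of hits merged into this region


def merge_same_domain(hits: Iterable[Hit]) -> List[Region]:
    """Merge overlapping hits that share an IPR identifier."""
    by_ipr = defaultdict(list)
    for hit in hits:
        by_ipr[hit.ipr].append(hit)
    regions = []
    for ipr, group in by_ipr.items():
        group.sort(key=lambda h: h.start)
        start, end, support = group[0].start, group[0].end, 1
        for hit in group[1:]:
            if hit.start <= end:  # overlaps the current region: extend it
                end, support = max(end, hit.end), support + 1
            else:                 # gap: close the region and start a new one
                regions.append(Region(ipr, start, end, support))
                start, end, support = hit.start, hit.end, 1
        regions.append(Region(ipr, start, end, support))
    return regions


def overlap_fraction(a: Region, b: Region) -> float:
    """Shared residues divided by the length of the shorter region."""
    shared = max(0, min(a.end, b.end) - max(a.start, b.start) + 1)
    return shared / min(a.end - a.start + 1, b.end - b.start + 1)


def consensus_architecture(hits: Iterable[Hit], max_overlap: float = 0.5) -> Tuple[str, ...]:
    """Greedily keep the best-supported regions that do not conflict."""
    kept: List[Region] = []
    for region in sorted(merge_same_domain(hits), key=lambda r: (-r.support, r.start)):
        if all(overlap_fraction(region, k) <= max_overlap for k in kept):
            kept.append(region)
    return tuple(r.ipr for r in sorted(kept, key=lambda r: r.start))
```

With two overlapping IPR_A hits, two overlapping IPR_B hits and a single IPR_C hit that mostly overlaps IPR_B, the sketch returns `("IPR_A", "IPR_B")`.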

The ConSAT method for FP

Two steps:

1. Given the InterPro output for a sequence, obtain the consensus domain architecture

2. Assign functions to each architecture

The ConSAT method for FP

● Step 1: consensus domain architecture

The ConSAT method for FP

● Step 2: function prediction methods (GO terms)
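
The slide does not spell out the GO-based methods, so the following is only one plausible baseline, given as an assumption: each architecture receives the GO terms shared by at least a given fraction of the annotated proteins that have that architecture. The function name, the `min_fraction` parameter and the placeholder identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Mapping, Set, Tuple

Architecture = Tuple[str, ...]


def go_terms_per_architecture(
    protein_arch: Mapping[str, Architecture],
    protein_go: Mapping[str, Set[str]],
    min_fraction: float = 0.5,
) -> Dict[Architecture, Dict[str, float]]:
    """Map each architecture to {GO term: fraction of annotated members}."""
    members = defaultdict(list)
    for protein, arch in protein_arch.items():
        if protein in protein_go:  # only annotated proteins contribute
            members[arch].append(protein)

    result: Dict[Architecture, Dict[str, float]] = {}
    for arch, proteins in members.items():
        counts = Counter(term for p in proteins for term in set(protein_go[p]))
        result[arch] = {
            term: count / len(proteins)
            for term, count in counts.items()
            if count / len(proteins) >= min_fraction
        }
    return result
```

A protein with the same architecture but no annotation (P3 in the test data) could then inherit the retained terms as predictions.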

The ConSAT method for FP

● Step 2: function prediction methods (weighted English words)

[Figure, three numbered stages: proteins (Prot 1, Prot 2, ... Prot N) → abstracts (Abs 1, Abs 2, ... Abs M) → cleaning, stopword removal, stemming, TF × IDF → weighted words, e.g. Retina 0.356, Cancer 0.281, Immune 0.148, Mammal 0.121, ...]
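
The word-weighting pipeline can be sketched in plain Python. This is a minimal illustration under assumptions, not the ConSAT implementation: the abstracts of each architecture are pooled into one text, the stopword list is a tiny stand-in, stemming is omitted (a real stemmer would be applied in `tokenize`), and the TF × IDF variant used is relative term frequency times log(N / document frequency).

```python
import math
import re
from collections import Counter
from typing import Dict, List, Mapping

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "in", "and", "a", "is", "to", "with", "for"}


def tokenize(text: str) -> List[str]:
    """Cleaning + stopword removal (stemming is left out of this sketch)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf(documents: Mapping[str, str]) -> Dict[str, Dict[str, float]]:
    """Return {document name: {word: TF x IDF weight}}."""
    tokens = {name: tokenize(text) for name, text in documents.items()}
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokens.values() for w in set(toks))
    n_docs = len(documents)
    weights: Dict[str, Dict[str, float]] = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        weights[name] = {
            w: (count / total) * math.log(n_docs / df[w]) for w, count in tf.items()
        }
    return weights


def top_words(word_weights: Mapping[str, float], k: int = 5) -> List[str]:
    """Words with the highest weights, best first."""
    return sorted(word_weights, key=lambda w: -word_weights[w])[:k]
```

A word that appears in every pooled text gets weight zero, so only words that are specific to an architecture rise to the top.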

ConSAT: web server and database

http://paccanarolab.org/consat

https://github.com/alfonsoeromero/ConSAT

@consat_web

www.facebook.com/consatweb

ConSAT: web server and database

● Web server: run ConSAT given a set of sequences + InterPro

● Database: precomputed architectures and functions for all UniProtKB sequences. Easily accessible. Also raw datasets.

ConSAT: web server and database

● Database:

– Search facilities (by gene(s), by protein(s), GO term, IPR domain, by words…)

– Detail pages for protein, architecture, domain and word

→ Try it now!

→ Feedback is accepted (and very useful!)

→ Suggestions and other opinions as well!


Thanks for your attention!

Questions, comments?

Anyone hiring? :)