EBI is an Outstation of the European Molecular Biology Laboratory.
Protein Function Prediction From Structure In Structural Genomics
Its Contribution To The Study Of Health And Disease
•James Watson
Erice 40th School of Crystallography
Structure To Function In Structural Genomics: Contribution To The Study Of Health And Disease
07/06/08
UniProt Growth
[Chart: number of entries (y-axis, 0 to 2,000,000) against date (x-axis, six-monthly ticks from 01/09/86 to 01/03/05)]
PDB Growth
[Chart: number of structures (y-axis, 0 to 35,000) against year (x-axis, 1972 to 2005); series: New deposits, Total Entries]
From Genome To Proteome?
• Determine every structure?
- Too expensive and time consuming
- Ultimately might be technically impossible?
• Homology Modelling?
- Find closest sequence match in PDB and use as start point to simulate structure
- Dependent on widespread fold coverage
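The "find closest sequence match" step can be sketched as below. This is a minimal illustration, not the method used by any particular modelling package: it assumes the query has already been aligned to each candidate PDB sequence, it uses plain percent identity as the measure of closeness, and the template ids and sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def closest_template(alignments: dict[str, tuple[str, str]]) -> tuple[str, float]:
    """Return (template id, percent identity) for the best-scoring template."""
    scored = {tid: percent_identity(q, t) for tid, (q, t) in alignments.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical toy alignments: template id -> (aligned query, aligned template).
toy_alignments = {
    "tmplA": ("MKTAYIAK", "MKTAYLAK"),
    "tmplB": ("MKTAYIAK", "MRT-YLGK"),
}
best_id, identity = closest_template(toy_alignments)
```

If no template in the PDB scores well, there is no usable start point, which is why the approach depends on widespread fold coverage.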
Estimate of 20,000 to 25,000 genes in the human genome
Structural Genomics
Structural Genomics Aims
?
Pathogens and disease
Human proteins
Coverage of fold space
Automation / high throughput
Structural Genomics Collaborators
MCSG – Midwest Center for Structural Genomics
SPINE – Structural Proteomics in Europe
SGC – Structural Genomics Consortium
MCSG pipeline
[Diagram: pipeline from GENOMIC SEQUENCES to FUNCTIONAL STUDIES. Labelled components: Target Refinement; Domain Parsing; Domain Definition; New Tags and Expression Systems; Cleavage On Column; Reductive Methylation; Automated Crystal Mounting and Structure Refinement; Structural and Functional Analysis; Web Site; Workshops; Publications]
MCSG Collaboration
How Do We Define Function?
Question: What is the function of a cooker?
• Burn natural gas?
• Boil water?
• Grill fish?
• Bake pie?
• Kitchen appliance?
• Central heating?
Gene Ontology
EC nomenclature
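EC nomenclature makes the "level of description" problem concrete: each of the four fields narrows the function further, much as "kitchen appliance" narrows to "bake pie". The sketch below is illustrative only; the helper names are hypothetical, and the two EC numbers are the ones quoted later in this talk (carboxylesterase, EC 3.1.1.1, and adenosine deaminase, EC 3.5.4.4).

```python
# The six top-level EC classes in use at the time of the talk
# (a seventh class, Translocases, was added later, in 2018).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_levels(ec_number: str) -> list[str]:
    """Return the progressively more specific prefixes of a full EC number."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields")
    return [".".join(parts[:i] + ["-"] * (4 - i)) for i in range(1, 5)]


def shared_depth(ec_a: str, ec_b: str) -> int:
    """Number of leading EC fields on which two annotations agree (0 to 4)."""
    depth = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b or a == "-":
            break
        depth += 1
    return depth


levels = ec_levels("3.1.1.1")
depth = shared_depth("3.1.1.1", "3.5.4.4")
```

Under this view, a prediction can be right at the class level (both enzymes are hydrolases) and still wrong about the specific reaction.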
Function Prediction
Also Known As: Guess Who?
• Ask the right questions
• Ask enough questions
A Friendly Warning!
Don't always believe what programs tell you
they're often misleading & sometimes wrong!
Don't always believe what databases tell you
they're often misleading & sometimes wrong!
Don't always believe what lecturers tell you
they're sometimes misleading & occasionally wrong!
In short, you need to determine whether the information is reliable or not
computers don’t do biology
when computers are applied to biology, it is vital to understand the difference between mathematical & biological significance
Methods
Evolutionary relationships
Metabolome
Genome organisation
Biological multimeric state
Electrostatics
Catalytic clusters, mechanisms & motifs
Ligands
Clefts and surfaces
MACiE
[Diagram: reaction scheme]
Sequence scans
• Sequence search vs UniProt and PDB
• Sequence motifs (PROSITE, BLOCKS, SMART, Pfam, etc)
• Residue conservation
• Gene neighbours
• Superfamily HMM library
Structure scans
• Fold search (SSM and DALI)
• Surface clefts
• Nest analysis
• Reverse templates
• Templates
- Enzyme active sites
- Ligand binding sites
- DNA binding sites
Laskowski RA, Watson JD & Thornton JM (2005). ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res., 33, W89-W93.
Templates
• GARTfase
• Cholesterol oxidase
• IIAglc histidine kinase
• Carbamoylsarcosine amidohydrolase
• Dihydrofolate reductase
• Ser-His-Asp catalytic triad
Reverse Templates
3-residue templates
[Diagram: residues numbered 1 to 9, …]
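One way to picture how 3-residue templates might be generated from a structure is to enumerate every triplet of residues that sit close together in space. The sketch below is an assumption-laden illustration, not the ProFunc procedure: the 8 Å cutoff, the use of a single representative point per residue, and the toy coordinates are all made up for the example.

```python
from itertools import combinations
from math import dist

# A residue is (name, number, (x, y, z)), with one representative point per residue.
Residue = tuple[str, int, tuple[float, float, float]]


def three_residue_templates(residues: list[Residue], max_separation: float = 8.0):
    """Enumerate residue triplets whose pairwise distances all fall within the cutoff."""
    templates = []
    for triplet in combinations(residues, 3):
        close = all(
            dist(a[2], b[2]) <= max_separation for a, b in combinations(triplet, 2)
        )
        if close:
            templates.append(tuple((name, number) for name, number, _ in triplet))
    return templates


# Hypothetical coordinates: three residues clustered together, one far away.
toy_structure = [
    ("SER", 1, (0.0, 0.0, 0.0)),
    ("HIS", 2, (3.0, 0.0, 0.0)),
    ("ASP", 3, (0.0, 4.0, 0.0)),
    ("GLY", 4, (30.0, 0.0, 0.0)),
]
templates = three_residue_templates(toy_structure)
```

Each triplet produced this way could then be searched against other structures, which is the "reverse" of searching a query against a fixed template library.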
Template Matching – False Positives
Cambridge Erice Barcelona
Car Park
Church
Library
High False Positive
Marsala
No False Positives
Comparison Of Template Environments
Template structure – 1mbb
Arg
Glu
Ser
Match to template:
Query structure – 1hsk
Comparison Of Template Environments
Template structure – 1mbb
Query structure – 1hsk
Identical residues in neighbourhood:
Comparison Of Template Environments
Template structure – 1mbb
Arg
Glu
Ser
Query structure – 1hsk
Similar residues in neighbourhood:
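The idea of comparing environments can be sketched as a count of identical and similar residues among pairs that overlay once the template match is superposed. This is a simplified illustration only: the slides do not give the scoring scheme, so the similarity groups, the function names and the toy residue pairs below are assumptions.

```python
# Assumed similarity groups (three-letter codes); any real scheme may differ.
SIMILAR_GROUPS = [
    {"ALA", "VAL", "LEU", "ILE", "MET"},
    {"PHE", "TYR", "TRP"},
    {"SER", "THR"},
    {"LYS", "ARG", "HIS"},
    {"ASP", "GLU"},
    {"ASN", "GLN"},
]


def residues_similar(a: str, b: str) -> bool:
    """True if both residue types fall in the same assumed similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def compare_environments(paired_residues: list[tuple[str, str]]) -> tuple[int, int]:
    """Count (identical, similar) residue pairs between template and query neighbourhoods."""
    identical = sum(1 for t, q in paired_residues if t == q)
    similar = sum(1 for t, q in paired_residues if t != q and residues_similar(t, q))
    return identical, similar


# Hypothetical (template residue, query residue) pairs after superposition.
toy_pairs = [
    ("ARG", "ARG"),
    ("GLU", "ASP"),
    ("SER", "SER"),
    ("LEU", "ILE"),
    ("GLY", "TRP"),
]
identical, similar = compare_environments(toy_pairs)
```

A three-residue match surrounded by many identical or similar neighbours is more likely to be meaningful than one whose surroundings have nothing in common.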
Tempura Templates – User Refined Approach
Arg 295, Asp 296, Gly 297, Ala 298, Gly 299, His 300, Tyr 301, Gly 302, …
List of Structures
Single Structure
Templates
www.ebi.ac.uk/thornton-srv/databases/tempura/
Functional Annotation Transfer
It can be hard to judge whether something “makes sense”.
The lack of labeling on many web pages makes it hard to know the source.
Calculations based on databases are even harder to deal with.
Logical deductions may be worse:
“tacR gene regulates the human nervous system”
“tacQ gene is similar to tacR but is found in E. coli”
“so tacQ gene regulates the E. coli nervous system”
It’s not all depressing though…..
“Sigh!”
MCSG as a test dataset (1)
Reference: Watson et al (2007), J. Mol. Biol. 367, 1511-1522
319 MCSG Structures (Sep 2005)
282 non-redundant (30% Seq ID)
[Pie chart: segments Unknown Function, Putative Function, Known Function; segment values 33% (93), 47%, 20%]
MCSG as a test dataset (1)
Known Function Results
Backdating
Top Hit
Manual Checking
Manual Assessment of Methods
[Bar chart: one bar per method, x-axis 0% to 100%]
SSM: SSM protein fold match
ENZ: Enzyme active site templates
LIG: Ligand-binding templates
DNA: DNA-binding templates
SIT: Reverse templates
Example 1: Predicted function confirmed
PDB entry: 1m33
• BioH protein from Escherichia coli
• Contains Pfam domain PF00561: alpha/beta hydrolase fold common to a number of hydrolytic enzymes
• Search against the enzyme templates database provides a significant hit to Ser-His-Asp catalytic triad (rmsd = 0.28 Å)
R. Sanishvili et al. (2003). Integrating Structure, Bioinformatics, and Enzymology to Discover Function. J. Biol. Chem. 278(28): 26039-26045.
Example 1: Predicted function confirmed
Catalytic triad = lipase, protease, or esterase activity?
Serine nucleophile (Ser82) is located within one of the two Gly-Xaa-Ser-Xaa-Gly motifs present = acyltransferase or thioesterase activity?
Experimentally demonstrated carboxylesterase activity (EC 3.1.1.1).
A novel carboxylesterase with broad substrate specificity and a preference for short chain substrates.
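A Gly-Xaa-Ser-Xaa-Gly motif of the kind mentioned above can be located in a sequence with a simple pattern scan. The sketch below is illustrative: the function name is hypothetical and the toy sequence is made up (it is not the BioH sequence).

```python
import re

# Lookahead so that overlapping occurrences of the motif are all reported.
GXSXG = re.compile(r"(?=G.S.G)")


def gxsxg_serine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of the central Ser in each Gly-Xaa-Ser-Xaa-Gly motif."""
    # m.start() is the 0-based index of the first Gly; the Ser is two residues on.
    return [m.start() + 3 for m in GXSXG.finditer(sequence.upper())]


# Hypothetical toy sequence containing two motifs.
toy_sequence = "MAAGHSMGAAGYSLGK"
positions = gxsxg_serine_positions(toy_sequence)
```

A motif hit only suggests a candidate nucleophile; as the slide shows, the activity itself still had to be demonstrated experimentally.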
Example 2: Identifying previously published function
PDB entry: 2plm
• Tm0936 from Thermotoga maritima
• Contains Pfam domain PF01979: Amidohydrolase family, which contains a number of deaminases and is part of a wider Amidohydrolase superfamily clan
J.C. Hermann et al. (2007). Structure-based activity prediction for an enzyme of unknown function. Nature, 448, 775-779.
• Publication suggests function: adenosine deaminase (E.C.3.5.4.4)
Method:
• Performed targeted docking of high-energy metabolic intermediates
• Results dominated by adenine analogues undergoing C6-deamination
• Structure determined with S-adenosylhomocysteine (SAH)
• Clone provided by the JCSG
Identifying previously published function
2plm Template Searches
[Figure: structures 2plm and 1a4l]
Sequence Identity = 17.5%, Local sequence identity = 27.7%, Structural Similarity = 95%
• Strong Enzyme template match (e-value = 2.45 E-04)
• Structure: Adenosine deaminase (E.C.3.5.4.4)
Example 3: Putative Function
APC29563: Crystal structure of a hypothetical protein from Enterococcus faecalis V583.
Function prediction: Metallo-beta-lactamase/nuclease or phosphodiesterase
Evidence:
• Sequence: Superfamily hit to metallo-hydrolase/oxidoreductase; BLAST hits to putative metallo-beta-lactamases
• Fold, Ligand Templates (Zn) and Reverse Templates: Hits to metallo-beta-lactamase proteins and RNA degradation enzymes
Some metallo-beta-lactamases have been shown to have phosphodiesterase activity
First pass screening
Phosphatase
Phosphodiesterase
Dehydrogenase
NADPH Oxidase
Oxidase
Protease
Lipase
Thioesterase
Amino Acids
Acids
Sugar
Alcohol
Aldehyde
Phosphodiesterase Assay
[Bar chart: activity axis from 0 to 4]
2’3’ Cyclic mononucleotides are preferred substrates
Detailed assays
1. Preferred metal
• Cobalt gives strongest activity
[Bar chart: activity (0 to 2.5) for none, Mg, Mn, Ni, Co, Zn, Cu, Ca]
2. Saturation curve
• 2’3’ cAMP saturation curve calculated
• Suggests kinetics:
i. Km near 1.2 mM
ii. Vmax near 2.9 mmol min-1 mg-1
[Plot: saturation curve, y-axis 0 to 3, x-axis 0 to 7]
3. No preference for 2’ or 3’ position
[Chart: samples 2’cAMP, 3’cAMP, Adenosine, 2’3’cAMP]
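The kinetic parameters quoted above can be read through the Michaelis-Menten rate law, v = Vmax [S] / (Km + [S]). The sketch below is illustrative only: it assumes the saturation curve follows simple Michaelis-Menten kinetics (the slide does not name the model used for the fit), assumes the plot's x-axis is substrate concentration in mM, and uses a hypothetical function name. Units of the rate follow the slide.

```python
def michaelis_menten_rate(substrate_mm: float, km_mm: float = 1.2, vmax: float = 2.9) -> float:
    """Reaction rate at a given substrate concentration (mM), in the units of vmax."""
    if substrate_mm < 0:
        raise ValueError("substrate concentration cannot be negative")
    return vmax * substrate_mm / (km_mm + substrate_mm)


# Rates at a few 2'3' cAMP concentrations across the 0 to 7 range of the plot.
rates = {s: michaelis_menten_rate(s) for s in (0.0, 1.2, 7.0)}
```

At a substrate concentration equal to Km the rate is half of Vmax, and even at the top of the plotted range the rate remains below Vmax.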
Putative Function
1. Shown phosphodiesterase activity against 2’3’ cyclic mononucleotides
2. Interestingly, is structurally similar to PDB entry 2dkf
• Identified by fold and reverse template matches
• Published as an RNA degradation protein of the metallo-beta-lactamase superfamily
3. Possible RNA degradation protein?
More to be done…
Problem Structures
Hypothetical protein from Bacillus subtilis (PDB entry 1Q8B)
1. Sequence Methods
2. Fold Comparison
3. Templates
4. Reverse templates
NAMSANDKLTILW
No Motifs
Hypothetical Proteins
Hypothetical Proteins
Lots of Hits Hypothetical Proteins
Plant “Stable” Proteins
No Hits
Medically relevant structures
Studying the molecular basis for ligand selectivity in a family of transcriptional regulators from Pseudomonas aeruginosa
Dimeric transcription factors which respond to small phenolic molecules and are responsible for antibiotic resistance.
13 PA sequences; 5 structures solved, 1 with ligand bound
Implications for Cystic Fibrosis patients
The Future Of Structural Genomics?
NIAID Category A, B, and C Priority Pathogens
Viral hemorrhagic fevers, Toxoplasma, Rabies
Future Methods
Using binding site 3D atomic similarities to predict ligand binding & Protein Function
CleftXplorer: Abdullah Kahraman
IsoCleft Finder: Rafael Najmanovich
CleftXplorer - Algorithm for shape comparison
Kahraman, A., Morris, R. J., Laskowski, R. A. & Thornton, J. M. (2007). Shape variation in protein binding pockets and their ligands. J Mol Biol 368.
IsoCleft Finder – query a database of binding sites
2pd0: Cryptosporidium parvum protein, unknown function
• No sequence- or structure-based hints from ProFunc
• Bound MES (blue) used to define binding site
• Top 2 hits (red and green respectively) are analogs of the product and substrate of the same reaction in humans and E. coli
Putative function: Purine nucleoside phosphorylase
[Diagram: reaction scheme]
Conclusions
- Need to use as many techniques as possible
- In ProFunc the fold and reverse templates are the most successful
- Successes, but still need new methods and assays: HTP ligand binding assays, HTP enzyme assays, IsoCleft, CleftXplorer, etc.
EBI Resources And Services
Outreach and training
•James Watson
Interactive training for all levels of experience
1. Hands-on training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge
2. EBI Roadshows bring expert trainers in our resources to your site with a variety of modules on offer
3. New e-learning platform currently in development
4. Full programme at www.ebi.ac.uk/training/
Wellcome Images
Coming up in our Hands-on Training
2008
Patterns, similarities and differences in biological data
9–11 June
28–31 July
Programmatic access of Proteomics Resources
Interactions and Pathways
26–27 August
1–3 September
8–11 September
ENFIN Advanced Course on Protein Function Prediction
Programmatic access in Perl: webservices and workflows
A two-day dip into the EBI’s resources
6–8 October
24–27 November
Programmatic access in Java: webservices and workflows
Transcriptomics resources and data analysis
2009
19–22 January
23–26 February
16–18 March
27–29 April
Bioinformatics resources for protein structure
Sequence to genes: genome informatics
A walk through EBI Bioinformatics Resources
Programmatic access to biological databases
11–15 May
The Bioinformatics Roadshow 06/07
• Stanford and UCSD, Jun 06 (all core services)
• Leuven, Oct 06 (MSD)
• Cambridge, Nov 06 (MSD)
• Portsmouth, Nov 06 (MSD)
• Oxford, Dec 06 (all core services)
• Melbourne, Jan 07 (proteomics)
• Exeter, Feb 07 (MSD)
• Harvard & MIT, Mar 07 (all core services)
• Liverpool, Apr 07 (modules tbc)
• Valencia, Apr 07 (modules tbc; BioSapiens)
• Trieste (ICGEB), Jun 07 (all core services)
• Basel, Sep 07 (modules tbc; BioSapiens)
Subscribe to the EBI-FELICS Roadshows calendar at http://www.google.com/calendar/
Roadshow modules
• Genomes: Ensembl, EMBL-Bank, Integr8
• Transcriptomes: ArrayExpress, Expression Profiler, R/Bioconductor
• Proteomes: UniProt, InterPro, IntAct, PRIDE, OLS
• Structures: MSD, PDBSum, ProFunc
• Pathways: Reactome, BioModels, BRENDA
• Mini modules: Web services; BioMart; SRS; Chemistry; GO/GOA; Alignments; Literature
eLearning pilot project
Sequence searching
1. Introduction
2. BLAST for beginners
3. Intermediate BLAST
4. Patterns, profiles & HMMs
5. Other tools: SSAHA, FASTA, MPSrch
For each module…
1. Video tutorial
2. Print tutorial
3. Key concepts quiz
4. Reflective tasks
More to come…
1. Basic and advanced courses on core data resources
2. Web services
3. Structural Biology Resources
Looking for beta-testers!
Acknowledgements
• Funding: MCSG, NIH/NIGMS PSI, BioSapiens
• Structures: MCSG + many others
• Enzyme Assays:
Alexei Savchenko, Alexander Yakunin, Mike Proudfoot.
• Thornton Group
• EBI Outreach and Training Team
• Organisers and of course, you!