DIMACS 2006 Quality and effectiveness of protein structure models.
The paradigm
Molecular function
Molecular structure
Sequence
Detecting homology
Protein      No.
FLAV_CLOBE   1    A . . . I V Y W S G T G N T E K M A E
CYSJ_THIRO   2    A . I T I L F G S Q T G N A K A V A E
…
Proteins evolve
[Figure: r.m.s.d. (axis from 0.00 to 3.50) plotted against the fraction sequence identity after structural superposition (axis from 1.0 to 0)]
$\mathrm{r.m.s.d.} = \left[\frac{1}{N}\sum d^{2}\right]^{1/2}$
Chothia and Lesk, EMBO J., 1986
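The r.m.s.d. used on this slide is the square root of the mean squared distance between equivalent atoms. A minimal Python sketch, assuming the two structures are already superposed and given as equal-length lists of (x, y, z) coordinates; the function name and the toy coordinates are illustrative and do not come from the talk:

```python
import math


def rmsd(coords_a, coords_b):
    """Return sqrt((1/N) * sum(d_i^2)) over N paired atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Squared distance between each pair of equivalent atoms.
    squared = [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Made-up coordinates: every atom is displaced by 1.0 along x.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
target = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(model, target))
```

In practice the superposition itself has to be computed first; this sketch covers only the formula.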
Comparative modelling
AVGIFRAAVCTRGVAKAVDFVP

AVGIFRAAVCTRGVAKAVDFVP
| || | | || |||||  ||
AIGIWRSATCTKGVAKA--FVA
+
If the alignment is correct, we can use the Chothia and Lesk relationship to predict the expected quality of the model.
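To illustrate the reasoning on this slide, the sketch below computes the fraction of identical residues in the alignment shown and feeds it into the relation reported by Chothia and Lesk (1986), Δ = 0.40·exp(1.87·H), where H is the fraction of mutated residues and Δ the main-chain r.m.s.d. of the conserved core in Å. Ignoring gapped columns when computing the identity is an assumption of this sketch, and the function names are illustrative:

```python
import math


def fraction_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over aligned (non-gap) columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Assumption: columns containing a gap are left out of the count.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    return sum(a == b for a, b in pairs) / len(pairs)


def expected_core_rmsd(identity):
    """Chothia and Lesk (1986): delta = 0.40 * exp(1.87 * H), H = 1 - identity."""
    return 0.40 * math.exp(1.87 * (1.0 - identity))


# The two aligned sequences shown on the slide.
target = "AVGIFRAAVCTRGVAKAVDFVP"
template = "AIGIWRSATCTKGVAKA--FVA"

identity = fraction_identity(target, template)
print(f"identity = {identity:.2f}")
print(f"expected core r.m.s.d. = {expected_core_rmsd(identity):.2f} Å")
```

The estimate is only meaningful if the alignment is correct, which is exactly the condition stated on the slide.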
Fold recognition
AVGIFRAAVCTRGVAKAVDFVPVESMETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGAPVTYSTYGKFLADGGCSGGAYDIIICDECHSTDSTTILGIGTVLDQAETAGARLVVLATATPPGSVTVPHPNIEEVALSNTGEIP
Score and select model
Orengo, Curr. Op. Str. Biol, 1994
Fragment based
AVGIFRAAVCTRGVAKAVDFVP… AVGIFR AAVCTR GVAKAVDF
Score and select model
Bystroff and Baker, JMB, 1998
The evaluation
CASP: Critical assessment of techniques for protein structure prediction
AVSRAFTRAFTAAFDGHTYIPK
Moult et al., Proteins, 1995
[Figure: Models, Targets and Groups plotted for CASP rounds 1 to 6; axis ranges 0 to 300, 0 to 70 and 0 to 30000]
Tramontano, NSB, 2003
CASP4 CASP5 CASP6: Best models
[Figure: plot with series casp4, casp5 and casp6; axis label "Max P.A L0"; axis ranges 20,00 to 120,00 and 0 to 80]
Cozzetto and Tramontano, Proteins, 2004
State of the art
http://predictioncenter.gov
Moult et al., Proteins, 2005.
http://www.caspur.it/PMDB
Castrignanò et al., NAR, 2006.
Structural genomics
Protein crystallization
Diffraction data measurements
Model building
Phase estimation
Protein preparation
Molecular replacement
Rotation search
Translation search
?
Model
Completely automatic procedure:
CASP models
MolRep (10x10)
AMoRe (20)
RefMac (10)
ArpWarp
Giorgetti et al., Bioinformatics, 2005
[Figure: plot with axis ticks from 40 to 100]
GDT-TS (distance-based measure):
$\mathrm{GDT\text{-}TS} = \left[N_{CA}(1\,\text{Å}) + N_{CA}(2\,\text{Å}) + N_{CA}(4\,\text{Å}) + N_{CA}(8\,\text{Å})\right]/4$
Giorgetti et al., submitted
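A minimal sketch of the GDT-TS formula given above, assuming N_CA(x Å) is expressed as the percentage of Cα atoms of the model lying within x Å of their position in the experimental structure. The full measure searches over many superpositions for each cutoff; this sketch takes a single fixed list of per-residue Cα distances, which is a simplification, and the distances are made up for illustration:

```python
def gdt_ts(ca_distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average, over the cutoffs, of the percentage of CA atoms within each cutoff."""
    if not ca_distances:
        raise ValueError("need at least one CA distance")
    n = len(ca_distances)
    percentages = [
        100.0 * sum(d <= cutoff for d in ca_distances) / n
        for cutoff in cutoffs
    ]
    return sum(percentages) / len(cutoffs)


# Made-up per-residue CA distances (in Å) between model and target.
distances = [0.5, 0.9, 1.5, 3.0, 3.5, 6.0, 7.5, 12.0]
print(gdt_ts(distances))
```

A perfect model scores 100 and a model with no Cα atom within 8 Å scores 0, which is the scale used when models are ranked by this measure.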
What if we don’t know the quality of the model?
What if we don’t know how to build models?
Giorgetti et al., submitted
ACTFGARTEADEASRTFCGAVHIGFRLPMNHTYWPLYHMVCS…
Structure factors
60% success rate
If one of the retrieved models works, the procedure is successful.
Function prediction
[Figure: diagram with the labels molecular, biological, cellular, blood coagulation, catalytic activity, extracellular]
Moult et al., Proteins, 1995
AVSRAFTRAFTAAFDGHTYIPK
The experiment
?
Scheme of the experiment
Collect known info on targets
Ask people to provide ADDITIONAL information
Compare predictions
Is there a consensus?
Once the structure is known, can we say more?
EC Number
Binding
Binding site(s)
Residue role(s)
PT modifications
Free text comments
Soro and Tramontano, Proteins 2005
We had too few predictions per target to derive any sensible conclusion.
However, for the sake of the experiment, we tried to see what we could do, and what the problems in analysing the data (other than the format) would be, pretending that the numbers were significant.
Summary table for target T0230
Molecular function: Unknown / COG annotation: Predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)
Predictions:
GO number    GO name                                                  Frequency
287          magnesium ion binding                                    1
4176         ATP-dependent peptidase activity                         1
4475, 4476   mannose-1-phosphate guanylyltransferase activity         1, 1
4672         protein kinase activity                                  1
5094         Rho GDP-dissociation inhibitor activity                  1
5554         Molecular function unknown                               1
6812         PROCESS                                                  (1)
6825         PROCESS                                                  (1)
8170         N-methyltransferase activity                             1
16822        hydrolase activity, acting on acid carbon-carbon bonds   1
46872        metal ion binding                                        1
Summary table for target T0230
Predictions:
GO number  GO name                                                  Frequency  GO Parents
287        magnesium ion binding                                    1          46872, 43167, 5488
4176       ATP-dependent peptidase activity                         1          8233, 16787, 3824
4475       mannose-1-phosphate guanylyltransferase activity         1          8905, 16779, 16772, 16740, 3824
4672       protein kinase activity                                  1          16773, 16772, 16740 (16301), 3824
5094       Rho GDP-dissociation inhibitor activity                  1          5092, 5083, 30695, 30234
8170       N-methyltransferase activity                             1          8168, 16741, 16740, 3824
16822      hydrolase activity, acting on acid carbon-carbon bonds   1          16787, 3824
46872      metal ion binding                                        1          43167, 5488
16787 hydrolase
3824 catalytic activity
16740 transferase activity
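The parent terms listed for T0230 suggest one way to look for a consensus when no two groups predict the same GO term: count how many predictions are covered by each term, either directly or through a listed parent. The sketch below is only an illustration of that idea, not the procedure used in the experiment. The GO numbers are copied from the summary table; treating the parenthesised 16301 as an ordinary parent and the function name are assumptions:

```python
from collections import Counter

# Predicted GO term -> parent terms, as listed in the T0230 summary table.
# Assumption: the parenthesised 16301 is treated like any other parent.
PARENTS = {
    287: [46872, 43167, 5488],
    4176: [8233, 16787, 3824],
    4475: [8905, 16779, 16772, 16740, 3824],
    4672: [16773, 16772, 16740, 16301, 3824],
    5094: [5092, 5083, 30695, 30234],
    8170: [8168, 16741, 16740, 3824],
    16822: [16787, 3824],
    46872: [43167, 5488],
}


def term_support(parents):
    """Count, for each GO term, the predictions that equal it or list it as a parent."""
    support = Counter()
    for term, ancestors in parents.items():
        # A set avoids counting a term twice for the same prediction.
        for go_id in {term, *ancestors}:
            support[go_id] += 1
    return support


support = term_support(PARENTS)
for go_id in (3824, 16740, 16787):
    print(go_id, support[go_id])
```

With these data the three terms singled out on the slide (3824 catalytic activity, 16740 transferase activity, 16787 hydrolase) are covered by five, three and two predictions respectively.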
Results: GO consensus
Target  Group  Mod  GO     GO name
T0226   P0009  1    4347   glucose-6-phosphate isomerase activity
        P0050  1    16429  tRNA (adenine-N1-)-methyltransferase activity
        P0050  3    16429  tRNA (adenine-N1-)-methyltransferase activity
        P0070  1    4347   glucose-6-phosphate isomerase activity
        P0344  1    4360   glutamine-fructose-6-phosphate transaminase (isomerizing) activity
        Cons               Transferase activity/Isomerase activity
T0243   P0050  2    3980   UDP-glucose:glycoprotein glucosyltransferase activity
        P0050  3    4581   dolichyl-phosphate beta-glucosyltransferase activity
        P0070  1    3700   transcription factor activity
        P0237  1    3677   DNA binding
        P0344  1    3677   DNA binding
        Cons               DNA binding/Transferase activity
T0249   P0003  1    3677   DNA binding
        P0070  1    3700   transcription factor activity
        P0100  1    3700   transcription factor activity
        P0237  1    3677   DNA binding
        P0344  1    3677   DNA binding
        P0589  1    3700   transcription factor activity
        Cons               DNA binding/Transcription factor activity
T0263   P0049  1    3754   chaperone activity
        P0050  1    5098   Ran GTPase activator activity
        P0050  2    5098   Ran GTPase activator activity
        P0070  1    4497   monooxygenase activity
        P0237  1    16491  oxidoreductase activity
        P0344  1    3676   nucleic acid binding
        Cons               Binding/Oxidoreductase activity/Enzyme regulator activity
T0266   P0003  1    3723   RNA binding
        P0049  1    3754   chaperone activity
        P0050  1    4587   ornithine-oxo-acid transaminase activity
        P0050  3    4047   aminomethyltransferase activity
        P0096  1    4827   proline-tRNA ligase activity
        P0237  1    166    nucleotide binding
        P0726  1    4812   tRNA ligase activity
        Cons               Binding/Transport activity/Transferase activity
Soro and Tramontano, Proteins, 2005
18 months later…
Annotations in DB decreased by 5%
24 new targets were annotated
We looked at methods (abstracts, directly contacting predictors, literature)
[Figure: chart residue listing the values 4, 5, 2, 2, 2, 1, 11 and the five-digit binary codes 11011, 10011, 10100, 10001, 10000, 11100, 10101, 11001]
18 months later…
4 newly annotated targets had been correctly predicted by at least one method
85% of the consensus non-redundant predictions were correct
Announcements
CASP is about to start again:
We will start collecting targets next week
There will be a few differences
http://predictioncenter.org
Acknowledgements
BioSapiens - EU VI Framework
Ministero della Salute
Universita' di Roma Istituto Pasteur Roma
Facolta' di Medicina San Paolo
CNR
Claudia Bonaccini, Michele Ceriani, Domenico Cozzetto, Emanuela Giombini, Alejandro Giorgetti, Paolo Marcatili, Veronica Morea, Romina Oliva, Massimiliano Orsini, Marialuisa Pellegrini, Domenico Raimondo, Simonetta Soro, Ivano Talamo
Krzysztof Fidelis, Tim Hubbard, Andriy Kryshtafovych, John Moult, Burkhard Rost, Adam Zemla
Structural biologists
Predictors