A Genetic Programming Challenge: Evolving the Energy Function for Protein Structure Prediction
Evolving energy function for protein structure prediction
Paweł Widera
Natalio Krasnogor, Jonathan Garibaldi
Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel
2009-06-30
Outline
1 Introduction
2 Protein energy models
3 Genetic Programming problem formulation
4 Results
5 Conclusions
Paweł Widera Evolving energy function for PSP 2009-06-30 2 / 26
Protein structure prediction: from 1D sequence to 3D structure
LFSKELRCMMYGFGDDQNPYTESVDILEDLVIEFITEMTHKAMSIFSEEQLNRYEMYRRSAFPKAAIKRLIQSITGTSVSQNVVIAMSGISKVFVGEVVEEALDVCEKWGEMPPLQPKHMREAVRRLKSKGQIP
Protein basics: 20-amino-acid alphabet; sequence encodes structure; structure determines activity
International prediction contest: Critical Assessment of techniques for protein Structure Prediction (CASP)
CASP facts: biennial competition started in 1994; parallel prediction and experimental verification; model assessment by human experts
Prediction difficulty: comparative modelling (sequence similarity); fold recognition (new or existing fold); ab initio modelling (first principles)
Ab initio predictor schema: from sequence to the final model
Target sequence → secondary structure prediction → fold recognition and threading → initial ab initio prediction → optimisation → clustering → final models
Tools: PSIPRED, SAM-T02, JUFO, PSI-BLAST
The algorithm of folding: Anfinsen’s thermodynamic hypothesis [Anfinsen, 1973]
Refolding experiment: the protein folds to the same native state, and the native state is energetically stable.
Energy funnel [Dill and Chan, 1997]: the protein rolls down the free-energy hill and must avoid local-minima traps.
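Model assessment: correlation between energy and similarity to native
Similarity measure: $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \delta_i^2}$, where $\delta_i$ is the distance between the $i$-th pair of corresponding atoms in the model and the native structure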
Decoys generated by: I-TASSER [Wu et al., 2007]; Robetta [Rohl et al., 2004]
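The RMSD measure can be sketched in Python as follows. This is a minimal illustration, assuming the model and native coordinate lists are equally long, already matched atom by atom and already optimally superimposed (the superposition step, e.g. the Kabsch algorithm, is left out). The function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD between two equally long, already superimposed coordinate lists.

    No optimal superposition is performed here (assumption: done beforehand).
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equally long")
    # delta_i^2: squared Euclidean distance between the i-th pair of atoms
    squared = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(model, native)]
    return math.sqrt(sum(squared) / len(squared))

# Toy example: every atom of the model is shifted by 1 along y
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(model, native))  # 1.0
```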
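All-atom force field: folding simulation
$$E = \sum_{\text{bonds}} \frac{k^{l}_{i}}{2}\,(l_i - l_{0,i})^2 + \sum_{\text{angles}} \frac{k^{\theta}_{i}}{2}\,(\theta_i - \theta_{0,i})^2 + \sum_{\text{torsions}} \frac{V^{\omega}_{i}}{2}\,\bigl[1 + \cos(n_i\omega_i - \phi_i)\bigr] + \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left\{4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}\right\}$$
Interatomic forces: bond forces (stretching, bending, rotating); short-range forces (Pauli repulsion, van der Waals interactions); electrostatic forces (Coulomb’s law)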
Rosetta@home in CASP7: 140k computers (37 TFLOPS); 500k CPU hours per domain
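The last double sum of the force field (the nonbonded Lennard-Jones and Coulomb terms) can be sketched as below. This is a hypothetical illustration, not Rosetta code: the per-pair parameter matrices `epsilon` and `sigma`, the `charges` list and the constant `coulomb_k` standing in for $1/(4\pi\varepsilon_0)$ are assumptions, units are left abstract, and the bonded terms are omitted. The loop over all pairs is O(N²) per evaluation, which hints at why all-atom simulations need the computing power quoted above.

```python
import math
from typing import List, Sequence, Tuple

def nonbonded_energy(
    positions: Sequence[Tuple[float, float, float]],
    epsilon: List[List[float]],
    sigma: List[List[float]],
    charges: Sequence[float],
    coulomb_k: float = 1.0,  # stands in for 1/(4*pi*eps0); unit system is an assumption
) -> float:
    """Sum over atom pairs i < j of the Lennard-Jones and Coulomb terms."""
    n = len(positions)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])  # r_ij (Python 3.8+)
            sr6 = (sigma[i][j] / r) ** 6
            lj = 4.0 * epsilon[i][j] * (sr6 * sr6 - sr6)   # 4 eps [(s/r)^12 - (s/r)^6]
            coul = coulomb_k * charges[i] * charges[j] / r  # q_i q_j / (4 pi eps0 r)
            total += lj + coul
    return total

# Two uncharged atoms at the Lennard-Jones minimum r = 2^(1/6) * sigma
r_min = 2 ** (1 / 6)
pos = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
eps = [[0.0, 1.0], [1.0, 0.0]]
sig = [[0.0, 1.0], [1.0, 0.0]]
print(nonbonded_energy(pos, eps, sig, charges=[0.0, 0.0]))  # about -1.0, i.e. -epsilon
```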
Simplified knowledge-based potential: protein structure prediction
[Figure: residues $i-1$, $i$, $i+1$ with unit vectors $\hat{n}_i$, $\hat{b}_i$, $\hat{v}_i$]
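Example:
$$E_{\text{stiff}} = \sum_i \Bigl(-\lambda\,\hat{v}_i\cdot\hat{v}_{i+4} - \lambda\,\bigl|\hat{b}_i\cdot\hat{b}_{i+2}\bigr| - \lambda\,\Theta_1(i) + \Theta_2(i) + \Theta_3(i)\Bigr)$$
$$E_{\text{env}} = \sum_i V(NP_i, NA_i, NO_i, A_i)$$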
Energy function: weighted sum of terms vs. evolved function
Weighted sum [Zhang et al., 2003]: $F(\vec{T}) = w_1 T_1 + \dots + w_n T_n$
Evolved function (example): $F(\vec{T}) = \dfrac{T_1 T_3}{w_1 \log(T_2)} + \sin\left(\dfrac{T_4 - w_2 T_1}{T_5 \exp(\cos(w_1 T_3))}\right)$
GP input: terminals $T_1, \dots, T_8$; functions add, sub, mul, div, sin, cos, exp, log; random ephemerals in range [0, 1]
GP tree example: size = 60, depth = 17
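To make the GP representation concrete, here is a minimal sketch that stores a tree as nested tuples over the function set add, sub, mul, div, sin, cos, exp, log, with terminals T1 to T8 and floats as ephemeral constants, and evaluates it alongside the weighted-sum baseline. The protected versions of div, log and exp are an assumption (a common GP convention); the slides do not say how invalid arguments were handled. The values w1 = 0.3 and w2 = 0.7 in the example tree are arbitrary.

```python
import math
from typing import Dict, Sequence, Union

Tree = Union[str, float, tuple]  # terminal name, ephemeral constant, or (op, *children)

def pdiv(a: float, b: float) -> float:
    # protected division (assumption): return 1.0 when the divisor is ~0
    return a / b if abs(b) > 1e-12 else 1.0

def plog(a: float) -> float:
    # protected log (assumption): log of |a|, 0.0 at a == 0
    return math.log(abs(a)) if a != 0 else 0.0

def pexp(a: float) -> float:
    # clamp the argument so math.exp cannot overflow
    return math.exp(min(a, 700.0))

FUNCTIONS = {
    "add": lambda a, b: a + b, "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b, "div": pdiv,
    "sin": math.sin, "cos": math.cos, "exp": pexp, "log": plog,
}

def evaluate(tree: Tree, terms: Dict[str, float]) -> float:
    if isinstance(tree, float):  # ephemeral random constant
        return tree
    if isinstance(tree, str):    # terminal T1..T8
        return terms[tree]
    op, *args = tree
    return FUNCTIONS[op](*(evaluate(a, terms) for a in args))

def weighted_sum(weights: Sequence[float], terms: Sequence[float]) -> float:
    # linear baseline F(T) = w1*T1 + ... + wn*Tn
    return sum(w * t for w, t in zip(weights, terms))

# The example evolved function with w1 = 0.3 and w2 = 0.7
example = ("add",
           ("div", ("mul", "T1", "T3"), ("mul", 0.3, ("log", "T2"))),
           ("sin", ("div", ("sub", "T4", ("mul", 0.7, "T1")),
                    ("mul", "T5", ("exp", ("cos", ("mul", 0.3, "T3")))))))
terms = {"T1": 1.5, "T2": 2.0, "T3": 0.8, "T4": -0.3,
         "T5": 1.2, "T6": 0.0, "T7": 0.4, "T8": 2.5}
print(evaluate(example, terms))
```

Nested tuples keep the sketch dependency-free; a real GP system would typically use its own tree or linear-prefix representation.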
Fitness evaluation: evolutionary objective
1 construction of the reference ranking $R_R$ (decoys sorted by similarity to native)
2 ranking of decoys using the evolved energy function, $R_E$ (decoys sorted by energy)
3 comparison of the rankings, $R_R$ vs. $R_E$
4 fitness = average distance over all proteins
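The four steps can be sketched as follows, assuming each protein is given as a list of `(rmsd, terms)` pairs and using a Spearman footrule normalised by its maximum ⌊n²/2⌋ as the ranking distance. Both the data layout and the normalisation are assumptions for illustration; the slides compare several distance functions (Levenshtein, Kendall tau, Spearman footrule).

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Decoy = Tuple[float, Sequence[float]]  # (RMSD to native, energy terms T1..Tn)

def ranking(values: Sequence[float]) -> List[int]:
    # decoy indices sorted by value; the sort is stable, so ties keep index order
    return sorted(range(len(values)), key=lambda i: values[i])

def normalised_footrule(r1: Sequence[int], r2: Sequence[int]) -> float:
    # sum over decoys of |position in r1 - position in r2|, divided by the maximum floor(n^2 / 2)
    n = len(r1)
    if n < 2:
        return 0.0
    pos2 = {d: k for k, d in enumerate(r2)}
    total = sum(abs(k - pos2[d]) for k, d in enumerate(r1))
    return total / (n * n // 2)

def fitness(energy: Callable[[Sequence[float]], float],
            proteins: Sequence[Sequence[Decoy]]) -> float:
    distances = []
    for decoys in proteins:
        r_ref = ranking([rmsd for rmsd, _ in decoys])           # step 1: R_R
        r_energy = ranking([energy(t) for _, t in decoys])      # step 2: R_E
        distances.append(normalised_footrule(r_ref, r_energy))  # step 3
    return mean(distances)                                      # step 4

protein = [(1.0, [1.0]), (2.0, [2.0]), (3.0, [3.0])]
print(fitness(lambda t: t[0], [protein]))   # 0.0, energy order equals RMSD order
print(fitness(lambda t: -t[0], [protein]))  # 1.0, energy order is reversed
```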
Reference ranking construction: correlation between energy and similarity to native
decoy (R0)   0    1    2    3    4    5    6    7
RMSD         3.2  2.1  5.2  1.2  3.5  2.1  4.8  3.5
R1           3    1    5    0    4    7    6    2
R2           3.0  1.5  7.0  0.0  4.5  1.5  6.0  4.5
Ranking types: R1, permutation of decoy indices sorted by RMSD; R2, averaged ranks (tied decoys share the mean of their positions)
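Both ranking types can be reproduced from the RMSD row of the table with a short Python sketch (function names are hypothetical; ranks are 0-based as on the slide). Python's sort is stable, so decoys with equal RMSD keep their index order in R1, which matches the slide.

```python
from typing import List, Sequence

def permutation_ranking(rmsd: Sequence[float]) -> List[int]:
    """R1: decoy indices sorted by RMSD (stable, so ties keep index order)."""
    return sorted(range(len(rmsd)), key=lambda i: rmsd[i])

def averaged_ranks(rmsd: Sequence[float]) -> List[float]:
    """R2: 0-based rank of every decoy; tied decoys share the mean of their positions."""
    order = permutation_ranking(rmsd)
    ranks = [0.0] * len(rmsd)
    start = 0
    while start < len(order):
        end = start
        # extend the run of decoys whose RMSD equals the first one in the run
        while end + 1 < len(order) and rmsd[order[end + 1]] == rmsd[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2
        start = end + 1
    return ranks

rmsd = [3.2, 2.1, 5.2, 1.2, 3.5, 2.1, 4.8, 3.5]
print(permutation_ranking(rmsd))  # [3, 1, 5, 0, 4, 7, 6, 2]
print(averaged_ranks(rmsd))       # [3.0, 1.5, 7.0, 0.0, 4.5, 1.5, 6.0, 4.5]
```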
Rankings comparison: measure of distance between rankings
Example: rankings 4 3 2 1 5 and 3 4 1 5 2 differ by 1 1 1 4 3 at each position, giving a distance of 10; with linear weights 1, 4/5, 3/5, 2/5, 1/5 the distance is 4.6
Distance functions: Levenshtein edit distance, O(n); Kendall tau distance, O(n(n−1)/2); Spearman footrule distance, O(n²/2)
Ranks weighting: linear, sigmoid
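A sketch of the three distance functions, applied to the example above, is shown below. The footrule is computed position by position on the two rank vectors, and the linear weights (n − k)/n for position k reproduce the 1, 4/5, ..., 1/5 of the example. The sigmoid weighting is not shown because its parameters are not given, and applying the Levenshtein distance directly to the rank sequences is an assumption.

```python
from typing import List, Optional, Sequence

def linear_weights(n: int) -> List[float]:
    # weight (n - k) / n for position k: 1, 4/5, 3/5, 2/5, 1/5 when n = 5
    return [(n - k) / n for k in range(n)]

def footrule(a: Sequence[float], b: Sequence[float],
             weights: Optional[Sequence[float]] = None) -> float:
    # (optionally weighted) sum of position-wise absolute rank differences
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def kendall_tau_distance(a: Sequence[float], b: Sequence[float]) -> int:
    # number of discordant pairs, i.e. pairs ordered differently by a and b
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (a[i] - a[j]) * (b[i] - b[j]) < 0)

def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    # classic edit-distance dynamic programme, keeping one row at a time
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

a, b = [4, 3, 2, 1, 5], [3, 4, 1, 5, 2]
print(footrule(a, b))                               # 10.0
print(round(footrule(a, b, linear_weights(5)), 6))  # 4.6
print(kendall_tau_distance(a, b))                   # 7
print(levenshtein(a, b))
```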
Decoys sampling: selection vs. noise reduction
Simple selection: top, uniform, random
Bin-based selection: equal size, equal distance
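The selection schemes can be read in several ways; the sketch below is one hypothetical interpretation. Top takes the k decoys closest to native, uniform takes k decoys evenly spaced along the RMSD-sorted list, random takes k decoys at random, and the two bin-based schemes split the decoys into bins of equal count or of equal RMSD width. How a decoy (or an averaged value) is then drawn from each bin is not stated on the slide, so the bin functions stop at building the bins.

```python
import random
from typing import List, Sequence

def by_rmsd(rmsd: Sequence[float]) -> List[int]:
    return sorted(range(len(rmsd)), key=lambda i: rmsd[i])

def top(rmsd: Sequence[float], k: int) -> List[int]:
    # the k decoys closest to the native structure
    return by_rmsd(rmsd)[:k]

def uniform(rmsd: Sequence[float], k: int) -> List[int]:
    # k decoys evenly spaced along the RMSD-sorted list (assumes k <= len(rmsd))
    order = by_rmsd(rmsd)
    step = len(order) / k
    return [order[int(j * step)] for j in range(k)]

def random_selection(rmsd: Sequence[float], k: int, seed: int = 0) -> List[int]:
    return random.Random(seed).sample(range(len(rmsd)), k)

def equal_size_bins(rmsd: Sequence[float], n_bins: int) -> List[List[int]]:
    # consecutive chunks of the RMSD-sorted list with (nearly) equal counts
    order, n = by_rmsd(rmsd), len(rmsd)
    return [order[j * n // n_bins:(j + 1) * n // n_bins] for j in range(n_bins)]

def equal_distance_bins(rmsd: Sequence[float], n_bins: int) -> List[List[int]]:
    # bins of equal RMSD width between the minimum and maximum RMSD
    lo, hi = min(rmsd), max(rmsd)
    width = (hi - lo) / n_bins or 1.0
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, r in enumerate(rmsd):
        bins[min(int((r - lo) / width), n_bins - 1)].append(i)
    return bins

rmsd = [3.2, 2.1, 5.2, 1.2, 3.5, 2.1, 4.8, 3.5]
print(top(rmsd, 3))                  # [3, 1, 5]
print(uniform(rmsd, 4))              # [3, 5, 4, 6]
print(equal_size_bins(rmsd, 4))      # [[3, 1], [5, 0], [4, 7], [6, 2]]
print(equal_distance_bins(rmsd, 2))  # [[1, 3, 5], [0, 2, 4, 6, 7]]
```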
Experiment design
Evolutionary progress: fitness throughout generations
[Figure: fitness vs. generation (0 to 1000); panels: levenshtein-steadystate, spearman-generational, kendall-generational]
Observations: Round I, early saturation; Round II, small but constant improvement
[Figure: fitness vs. generation (0 to 1000); panels: spearman-linear-ts8-generational, spearman-sigmoid-ts8-elitism, kendall-ts4-generational]
Landscape analysis: fitness distribution for the random walk
[Figure: fitness distribution (axis 0.3 to 0.8) per decoy set: d-100, d-58, f-100, f-42, random-100, top-100, uniform-100, all]
Population diversity analysis: genotype → phenotype → fitness mapping
Diversity measures: F, fitness entropy (frequency of duplicates); P, root mean square distance between rankings; G, number of unique trees <#T, #NT, depth>
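The three diversity measures might be computed as sketched below, assuming trees are nested tuples (function name first, then children), `#T` and `#NT` count terminal and non-terminal nodes, fitness entropy is the Shannon entropy of the fitness-value frequencies, and P takes the ranking distance as a parameter. All of these readings are assumptions; the slide only names the measures.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Callable, Sequence, Tuple

def fitness_entropy(fitnesses: Sequence[float], digits: int = 6) -> float:
    # F: Shannon entropy of fitness-value frequencies; many duplicates give low entropy
    counts = Counter(round(f, digits) for f in fitnesses)
    n = len(fitnesses)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def shape(tree) -> Tuple[int, int, int]:
    # (#terminals, #non-terminals, depth) of a nested-tuple tree
    if not isinstance(tree, tuple):
        return (1, 0, 1)
    t, nt, depth = 0, 1, 0
    for child in tree[1:]:
        ct, cnt, cd = shape(child)
        t, nt, depth = t + ct, nt + cnt, max(depth, cd)
    return (t, nt, depth + 1)

def genotype_diversity(trees: Sequence) -> int:
    # G: number of distinct <#T, #NT, depth> signatures in the population
    return len({shape(t) for t in trees})

def phenotype_diversity(rankings: Sequence[Sequence[int]],
                        distance: Callable[[Sequence[int], Sequence[int]], float]) -> float:
    # P: root mean square of pairwise distances between decoy rankings (needs >= 2 rankings)
    pairs = list(combinations(rankings, 2))
    return math.sqrt(sum(distance(a, b) ** 2 for a, b in pairs) / len(pairs))

print(shape(("add", "T1", ("sin", "T2"))))                                # (2, 2, 3)
print(genotype_diversity([("add", "T1", "T2"), ("mul", "T3", "T4"), "T1"]))  # 2
```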
Improvement over random walk: is the evolution any good?
decoy set     improvement   avg best
all           0.78%         0.710
uniform-100   0.96%         0.711
random-100    1.28%         0.713
top-100       1.93%         0.702
s-42          7.76%         0.713
s-100         7.64%         0.772
d-58          8.21%         0.780
d-100         10.88%        0.804
Best evolved energy functions: comparison to naive combination of energy terms
Correlation to RMSD:
d-100                        0.76 (generational + ADF)
all decoys                   0.30 (steady-state + elitism)
best single term             0.24
worst single term           -0.20
naive combination of terms   0.12
original I-TASSER energy     0.44 (0.51/0.65)
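The correlation between energy and RMSD can be computed as below. Pearson's coefficient is an assumption, since the slide does not name the coefficient used; a well-behaved energy function should give a positive value, because lower energy should go with lower RMSD. On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

energies = [-120.0, -110.0, -95.0, -80.0]  # hypothetical energies of four decoys
rmsds = [1.2, 2.1, 3.5, 4.8]               # their RMSD to native
print(pearson(energies, rmsds))
```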
Comparison to weighted sum of terms: Nelder-Mead downhill simplex optimisation
            spearman-sigmoid      correlation
method      d-100     all         d-100     all
simplex     0.734     0.638       0.650     0.166
GP          0.835     *0.714      0.740     *0.200
Distribution of terminals and operators: did the evolution discover any knowledge?
energy term       correlation
T1 (E_13)         0.03 ± 0.11
T2 (E_14)         0.20 ± 0.17
T3 (E_15)         0.15 ± 0.15
T4 (E_stiff)      0.24 ± 0.22
T5 (E_HB)        −0.16 ± 0.20
T6 (E_pair)       0.01 ± 0.14
T7 (E_electro)   −0.20 ± 0.23
T8 (E_env)        0.04 ± 0.16
average           0.06
Use of energy terms: most frequent T4, T5; least frequent T1, T6
Use of operators: most frequent add, div; least frequent sin, cos, log
Summary
Conclusions: the GP-evolved function outperforms the weighted linear combination of terms; the GP choice of energy terms reflects their correlation to RMSD; decoys from a real prediction process are more difficult to assess; bloat control is necessary to evolve more compact functions
Ideas for the future: more complex total fitness; distance measured using ProCKSI consensus; Rosetta-generated decoys; additional energy terms (SA, RCH)
Thank you!
Acknowledgements: This work was supported by Marie Curie Action MEST-CT-2004-7597 under the Sixth Framework Programme of the European Community.
Ben-Gurion University of the Negev’s Distinguished Scientists Visitor Program and Prof. Moshe Sipper.
Publications
1 P. Widera, J.M. Garibaldi, N. Krasnogor. Evolutionary design of the energy function for protein structure prediction. In IEEE Congress on Evolutionary Computation, CEC’09, pp. 1305–1312, Trondheim, Norway, May 2009.
2 P. Widera, J.M. Garibaldi, N. Krasnogor. GP challenge: evolving the energy function for protein structure prediction. Submitted to Genetic Programming and Evolvable Machines, 2008.
References
Anfinsen, C. (1973). Principles that Govern the Folding of Protein Chains. Science, 181(4096):223–230.
Dill, K. A. and Chan, H. S. (1997). From Levinthal to pathways to funnels. Nat Struct Mol Biol, 4(1):10–19.
Rohl, C. A., Strauss, C. E. M., Misura, K. M. S., and Baker, D. (2004). Protein Structure Prediction Using Rosetta. In Brand, L. and Johnson, M. L., editors, Numerical Computer Methods, Part D, volume 383 of Methods in Enzymology, pages 66–93. Academic Press.
Wu, S., Skolnick, J., and Zhang, Y. (2007). Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol, 5(1):17.
Zhang, Y., Kolinski, A., and Skolnick, J. (2003). TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction. Biophys. J., 85(2):1145–1164.