Post on 02-Jan-2016
Protein Design
CS273: Final ProjectCharles Koucharlesk@stanford.edu
Crystal structure of top7 – A novel protein structure created with RosettaDesign.
http://rosettadesign.med.unc.edu/
What is Protein Design
Opposite of structure prediction: determine low energy sequence that yield given structure.
Computationally difficult: Search space of 20^n where n =
sequence length (20 amino acids) Major algorithms: Dead-end
elimination, genetic algorithms, Monte Carlo, Branch & Bound.
http://www.stanford.edu/class/cs273/project/project.html
Major Algorithms
Trade off between thoroughness and computational speed.
Monte Carlo / Genetic Algorithm: Can sample space with infinite number of solutions Sidechain identity, side chain orientation and
backbone structure can be varied continuously. No guarantee of reaching global energy minimum.
Dead-End Elimination Allows only discrete conformations. Rejection criteria is used to prune the search
space.
Desjarlais JR, Clarke ND. Computer search algorithms in protein modification and design. Curr Opin Struct Biol. 1998 Aug;8(4):471-5. PubMed
Review: Energy LandscapeReview: Energy Landscape
q1
qi
q2
qj
qN-1
qN
defined over large dimensionalconformation space
JC Lantombe, Energy2.ppt
Review: Example Energy Review: Example Energy FunctionFunction
• E = bonded terms + non-bonded terms + solvation terms
• E = (ES + EQ + ES-B + ETor) + (EvdW + Edipole) Bonded terms
- Relatively few Non-bonded terms
- Depend on distances between pairs of atoms - O(n2) Expensive to compute
Solvation terms- May require computing molecular surface
JC Lantombe, Energy2.ppt
Random walk through conformation space At each cycle:
– Perturb current conformation at random– Accept step with probability:
(Metropolis acceptance criterion) The conformations generated by an arbitrarily long
MCS are Boltzman distributed, i.e.,
#conformations in V ~
ΔE-
kTP(accept)=min 1,e
Review: Monte Carlo Review: Monte Carlo Simulation (MCS)Simulation (MCS)
E
-kT
Ve dV
JC Lantombe, Energy2.ppt
Monte Carlo Simulation
Tend to waste time in local min. May consist of millions of steps. Energy must be evaluated frequently
(computationally expensive). Use ChainTree to improve
performance.
Koehl, P and Levitt, M. De novo protein design. I. In search of stability and specificity. Journal of Molecular Biology, 293, 1161-1181 (1999).
Lotan, I., Schwarzer, F., Halperin, D., Latombe, J.C.: Efficient maintenance and self-collision testing for kinematic chains. In: Symposium on Computational Geometry (2002) 43–52
Genetic AlgorithmStarts with First generation pool. Iteratively apply genetic operators
(selection, recombination, mutation). Evloves toward better solution (low
energy function).
S. M. Larson, J. England, J. DesJarlais, and V. S. Pande. Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Science 11 2804-281 (2002). Protein Science
Selection
• Selection function takes into account the value of fitness function. This gives priority to the “fit” organism but also gives chance for “less fit” organisms.
http://en.wikipedia.org/wiki/Genetic_algorithm
Selection Method
• Roulette Method: probability of selection is proportional to the value of fitness function
• Tournament: picks k individuals (tournament size), and choose the individual with probability p. Iterate with probability p*(1-p), then p*(p*(1-p)) …
• Higher k = less chance for weaker individual.
http://en.wikipedia.org/wiki/Tournament_selection
http://en.wikipedia.org/wiki/Roulette_wheel_selection
Recombination, Mutation
Recombination: different segment of the structure which is optimized in parallel can be recombined into the same model. Recombination occurs with a set probability. Otherwise, the population is propogated to the next generation.
Mutation: avoids local minima by mutating the child with a set probability.
Similar to MC: there is no guarantee to converge into global minimum.
http://en.wikipedia.org/wiki/Genetic_operator
Genome@home
• Genome@Home uses distributed computing and genetic algorithm.
• It also incorporates backbone flexibility using Monte Carlo (random perturbation with RMSD<1.0a) which improves the result.
http://www.stanford.edu/group/pandegroup/genome/
Dead-end Elimination
Discrete conformational search. Functionally equivalent to exhustive
search. It uses rejection criteria to prune the search
space. The robustness depends on the
discreteness and the rejection criteria used. Guaranteed convergence to global min. Initially used for sidechain placement. More
difficult for protein design because of high degrees of freedom.Looger LL, Hellinga HW.
Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol. 2001 Mar 16;307(1):429-45. PubMed
Energy of conformation
Reformulation of sidechain placement problem: Amino acid identity is used instead of rotamer.
The general DEE allows residue up to 300. Energy of conformation is defined as sum of
interaction among side chains and sum of interaction of sidechain and the backbone.
Rejection criteria is used and iterated until no more rotamers can be eliminated. Convergence occurs, or reduces the problem sufficiently for exhaustive serach.
DEE filter: Rejection Criteria
Simple Criterion: If lowest energy struct that can be found using a given sidechain rotamer (low energy side chain conformation) is higer than the highest energy struct w/ different rotamer, the first rotamer is eliminated.
DEE filter: Rejection Criteria
Goldstein Criteria: if energy struct containing one rotamer is always lowered by changing to a second one, the first one is eliminated.
DEE filter: Rejection Criteria
Generalized Criterion: residues are added in group, eliminated clusters of rotamers in the groups maybe excluded from the minimum operator, in addition to those which form dead-end clusters with c.
Mean Field Theory
• Reduce search space.• Self-consistency is
sought by placing amino acids at pre-selected positions in a given structure.
• Energy function is minimized by mean field.
Voigt CA, Mayo SL, Arnold FH, Wang ZG. Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci U S A. 2001 Mar 27;98(7):3778-83. PNAS
Review: Branch & Bound
• Set of solutions can be partitioned into subsets (branch)
• Upper limit on a subset’s solution can be computed fast (bound)
Branch & Bound1. Select subset with best
possible bound2. Subdivide it, and compute
a bound for each subset
S.Batzoglou, Threading2.ppt
Rosetta Design• Initial backbone designed
without regard to side-chain packing.
• Iterates between sequence design and backbone optimization using Monte Carlo.
• Perturbation in random change in the torsional angles of 1-5 random residue, or substitution of backbone torsonal angles of 1-3 consecutive residues with torsional angles from a structure in the PDB. Sidechain optimization. Accept/reject using Metropolis criterion.
• 1.17-a backbone atom RMSD between model and structure.
Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003 Nov 21;302(5649):1364-8. PubMed
Crystal structure of top7 – A novel protein structure created with RosettaDesign.
http://rosettadesign.med.unc.edu/
Using Rosetta Design
• Red: PDB 1A1M: Mhc Class I Molecule B*5301 Complexed With Peptide Typdinqml From Gag Protein Of Hiv2
• Blue: Rosetta Stone Designed
• Visualized with Deep View / Swiss-PdbViewer.
http://www.rcsb.org/pdb/cgi/explore.cgi?pid=195321117535569&pdbId=1A1M
http://us.expasy.org/spdbv/