Physics and structure of biomacromolecules Konstantin Zeldovich LRB 1004, x62354.
Physics and structure of biomacromolecules
Konstantin Zeldovich, LRB 1004, x62354
Protein structure
• PDB, the Protein Data Bank: ~63,000 structures
• Primary, secondary, tertiary, … structure
• Domains
• Methods: X-ray and NMR
• Computational approaches
• Diverse structures: from globular to knotted and intrinsically disordered, but a limited repertoire of ~1000 folds
Branden & Tooze, Introduction to protein structure
Interactions within a protein
• Van der Waals
• Hydrophobic forces
• Electrostatic
• Hydrogen bonds
• Role of solvent
• Hierarchy of energies (bond strength)
Many interactions of a similar energy scale (except chemical bonds). Overall, a 300-residue protein has ΔG ~ 5 kcal/mol: per residue, this is a very small difference between folded and unfolded states. SUBTLE BALANCE.
Hydrophobic interactions drive folding to the compact structure
Thermodynamics of folding
Privalov, J Chem Thermodyn 29: 447 (1997)
Methods: calorimetry, thermal or chemical denaturation
Small proteins fold in a two-state fashion; folding is reversible
[Figure: heat capacity of lysozyme]
[Figure: free energy G along the reaction coordinate, with the unfolded state (U), the transition state, and the native state (N)]
Kinetics of folding
Plaxco et al, JMB 277:985 (1998); Biochemistry 39:11177 (2000)
For many proteins, folding rate is determined by their topology (contact order)
However: newer research suggests strong outliers; C.R. Matthews lab.
Contact order (CO) = average sequence separation between contacting residue pairs
Relative CO: normalized by chain length
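The two definitions above can be written out in a few lines. This is a minimal sketch, assuming contacts are given as a list of residue-index pairs; the function name and the example contact list are made up for illustration and do not come from the cited papers.

```python
def contact_order(contacts, chain_length):
    """Return (absolute CO, relative CO) for a list of contacting residue pairs (i, j)."""
    if not contacts:
        raise ValueError("need at least one contact")
    # Sequence separation |i - j| of every contacting pair
    separations = [abs(i - j) for i, j in contacts]
    absolute = sum(separations) / len(separations)
    # Relative CO: absolute CO normalized by chain length
    relative = absolute / chain_length
    return absolute, relative


# Hypothetical 10-residue chain with three contacts
example_contacts = [(1, 4), (2, 9), (5, 8)]
co_abs, co_rel = contact_order(example_contacts, chain_length=10)
print(co_abs, co_rel)
```

A chain dominated by local contacts (small |i - j|) gets a low CO; long-range contacts raise it.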
Most proteins are densely packed
Radius of gyration vs. chain length
$$V \sim N a^3, \qquad V \sim R_g^3 \quad\Rightarrow\quad R_g \sim a N^{1/3}$$
All bacterial proteins from the PDB, June 2009
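To make the dense-packing argument concrete, the sketch below compares the radius of gyration of 27 beads packed into a 3x3x3 cube with the same 27 beads laid out in a straight line. The bead coordinates are illustrative only (unit spacing, no real protein data), and the function name is an assumption.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


# 27 beads: compact cube vs. fully extended rod (unit spacing a = 1)
cube = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
rod = [(i, 0, 0) for i in range(27)]

rg_cube = radius_of_gyration(cube)
rg_rod = radius_of_gyration(rod)
print(rg_cube, rg_rod)
```

The radius of gyration depends only on bead positions, not on chain connectivity, so the cube does not need to be traced as a path here. The compact arrangement follows the N^(1/3) scaling, while the extended one grows linearly with N.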
Anfinsen’s thermodynamic hypothesis
• Native state is entirely defined by sequence
• Native state is a minimum of free energy
- Unique
- Stable
- Kinetically accessible
All computational efforts depend on these ideas
Anfinsen, Science 181: 223 (1973)
How does sequence define structure?
• Protein is a heteropolymer
• How can a specific structure arise at all?
• Protein-like sequences and energy gap
• Folding landscape and “funnels”
Review papers:
Dill et al., Annu. Rev. Biophys. 2008 37:289-316
Shakhnovich, Chem. Rev. 2006 106:1559-1588
Onuchic, Luthey-Schulten, Wolynes, Annu. Rev. Phys. Chem. 1997 48:545-600
Toy models address basic questions
27-residue compact chain on a 3x3x3 lattice
Conformational space is discrete: 103,346 structures
Pairwise contact potentials: only nearest neighbors interact
Simulations are very quick
Lau & Dill, Macromolecules 22, 3986 (1989)
Shakhnovich & Gutin, J Chem Phys 93, 5967 (1990)
Discrete conformational space -> we can calculate the energy of the toy protein in each and every possible configuration. The configuration with the lowest energy is the native state.
Proteins have a large energy gap
$$P_i(T) = \frac{e^{-E_i/T}}{\sum_{j=0}^{103345} e^{-E_j/T}}$$
WHPCECQLLRYGNNDFRNLDMLFISFR
WEDNMIQAGWYCPLTRRHIFQFYCHFY
compact lattice 27-mers with 10,000 possible conformations
Gap! Also, a sparse spectrum at low E
Energy gap leads to stability
$$P_N(T) = \frac{p_N}{\sum_i p_i} = \frac{e^{-E_N/T}}{\sum_{i=0}^{M} e^{-E_i/T}} = \frac{1}{1 + \sum_{i \neq N} e^{-(E_i - E_N)/T}}$$
What is the probability of finding a protein in its native state?
Gap!
The larger the gap, the more populated the native state is compared to other states
[Figure: $P_N$ vs. T for a protein and for a random polypeptide]
$P_N$ vs. T is roughly equivalent to CD spectra of thermal denaturation
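The effect of the gap on the native-state population can be checked numerically. The sketch below evaluates the Boltzmann probability of the lowest-energy state for two made-up spectra that share the same band of excited states and differ only in how far the native state lies below it. All energies and temperatures are illustrative assumptions in arbitrary units, with temperature measured in energy units as in the formula for P(T).

```python
import math


def native_probability(energies, temperature):
    """Boltzmann probability of the lowest-energy state in a discrete spectrum."""
    e_native = min(energies)
    # Shift by e_native so the native term is exactly 1 and nothing overflows
    z = sum(math.exp(-(e - e_native) / temperature) for e in energies)
    return 1.0 / z


# Hypothetical spectra: the same 100 excited states, with and without a large gap
excited = [0.01 * k for k in range(100)]  # energies 0.00 ... 0.99
gapped = [-5.0] + excited                 # "protein-like": native state far below
ungapped = [-0.5] + excited               # "random polypeptide": small gap

for t in (0.5, 1.0, 2.0):
    print(t, native_probability(gapped, t), native_probability(ungapped, t))
```

With these numbers the gapped spectrum keeps the native state populated up to a much higher temperature, which is the qualitative difference between the two curves.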
Kinetics of folding and “funnels”
How does the protein find its native state?
Levinthal paradox: a brute-force search of all possible configurations would be outrageously long. In reality, proteins fold in milliseconds.
Answer: the native state must be kinetically accessible
Dill et al., Annu. Rev. Biophys. 2008 37:289
The lower the energy, the more similar the conformations are. Folding thus converges to the single native state
Empirically (from simulations), a large gap is necessary for fast folding
To crystallize or to simulate?
• Protein structure prediction
• Homology modeling vs molecular simulations
• Structural genomics
• CASP competition
To crystallize is hard, to sequence is cheap. Structure from sequence?
In a perfect world: knowing all of the interactions, find the conformation corresponding to the minimum energy. Voila, this is the native state.
Practical challenges:
- Interactions are not known exactly
- Interactions with solvent
- Very large parameter space (# bond angles ~ # of atoms ~ 10^5)
- Rugged energy landscape with deep local minima: search algorithms are inefficient
Threading using energies
Jones, Taylor, Thornton, Nature 1992
Given a set of structures, determine which one is the best match for the given sequence
Rationale: the number of folds is limited
Thread the sequence into each structure (possibly with gaps), then evaluate the energy of amino acid contacts.
Select the threading which yields the lowest energy (cf. the gap)
Works well even at low sequence homology
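The selection step can be sketched for the simplest case, gapless threading. Everything below is a toy assumption rather than the method of the cited paper: a two-letter (H/P) alphabet, a hypothetical contact potential in which only H-H contacts are favorable, and a made-up library of two templates, each described only by its contact map.

```python
# Hypothetical HP-style contact energies (arbitrary units)
POTENTIAL = {
    frozenset("H"): -1.0,   # H-H contact
    frozenset("HP"): 0.0,   # H-P contact
    frozenset("P"): 0.0,    # P-P contact
}

# Hypothetical template library: fold name -> list of contacting positions (0-based)
TEMPLATES = {
    "fold_A": [(0, 3), (1, 4), (2, 5)],
    "fold_B": [(0, 5), (1, 4)],
}


def threading_energy(sequence, contact_map, potential):
    """Sum the contact energies when the sequence is placed onto a template."""
    return sum(
        potential[frozenset((sequence[i], sequence[j]))] for i, j in contact_map
    )


def best_template(sequence, templates, potential):
    """Thread the sequence onto every template and pick the lowest energy."""
    energies = {
        name: threading_energy(sequence, cmap, potential)
        for name, cmap in templates.items()
    }
    return min(energies, key=energies.get), energies


best, energies = best_template("HPPPPH", TEMPLATES, POTENTIAL)
print(best, energies)
```

Here the two H residues at the chain ends touch only in `fold_B`, so that template wins. A realistic implementation would also allow gaps and use a 20-letter potential.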
Threading using profiles
Bowie, Luthy, Eisenberg, Science 1991
For each position, assess:
- secondary structure
- fraction polar
- buried area, …
Profile (rows: position; columns: residue type)
  A    C    D    E   …
 32   84  -92   23
 -6   87   34   -5
  …
Average over homologous sequences with known structures
Create profiles for different folds (using known structures with homologous sequences)
For a given sequence with unknown structure, match it to all profiles (with gaps)
Select the profile with best score.
Homology modeling
Marti-Renom,… Sali, Annu. Rev. Biophys. Biomol. Struct. 2000. 29:291–325
Pairwise sequence alignment with PDB (BLAST)
Match to multiple sequence alignment (PSI-BLAST)
Threading, or 3D template matching to PDB
Fold correctness? (by sequence similarity?)
Stereochemistry
Solvent accessibility
Positions of charged and hydrophobic groups
…
Rigid-body assembly
Segment matching (aligning conserved atoms)
Satisfaction of spatial restraints
ab initio structure prediction
Anfinsen’s hypothesis:
- native structure is entirely determined by the sequence
- native structure is a unique energy minimum
Assuming we know the interactions between the amino acids, can we just look for this minimum?
Polymer modeling is extensively used in materials science. Is it applicable to proteins?
Two main methods:
- molecular dynamics: deterministic, reflects dynamics
- Monte Carlo: stochastic, no dynamics
Karplus, Scheraga, …
Force fields and potentials
How do we know the strength of each interaction between atoms in a protein?
Ab initio approach: quantum chemistry can calculate the electron density profiles, and thus the energy (isn’t a protein just one big Schrödinger equation?)
Statistical approach: learn from the PDB by counting the contacts
Potentials optimized to correctly predict known structures of small molecules: CHARMM, AMBER
Miyazawa & Jernigan 1985, 1996
Boltzmann law: $p_{ij} = e^{-U_{ij}/RT}$
Inverting: $U_{ij} = -RT \log p_{ij} = -RT \log \dfrac{N_{ij}}{N_i N_j}$
where $N_{ij}$ is the number of contacts and $N_i$, $N_j$ are the molar fractions
Training set must be carefully chosen: various folds, no homology, …
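The counting procedure can be sketched on a toy training set. This is a simplified illustration of the Boltzmann-inversion idea, not the published Miyazawa-Jernigan procedure. Two normalization choices are assumptions made here: contact counts are converted to fractions of all observed contacts, and the expected fraction for an unordered pair of different residue types is doubled. The two-letter sequences and contact lists are invented.

```python
import math
from collections import Counter


def contact_potential(structures, rt=1.0):
    """structures: list of (sequence, contacts). Returns {(a, b): U_ab} in units of rt."""
    residue_counts = Counter()
    pair_counts = Counter()
    for sequence, contacts in structures:
        residue_counts.update(sequence)
        for i, j in contacts:
            pair_counts[tuple(sorted((sequence[i], sequence[j])))] += 1
    n_res = sum(residue_counts.values())
    n_con = sum(pair_counts.values())
    potential = {}
    for (a, b), n_ab in pair_counts.items():
        observed = n_ab / n_con  # fraction of all contacts that are a-b
        expected = (residue_counts[a] / n_res) * (residue_counts[b] / n_res)
        if a != b:
            expected *= 2  # an unordered a-b pair can arise as (a, b) or (b, a)
        potential[(a, b)] = -rt * math.log(observed / expected)
    return potential


# Hypothetical training set: (sequence, list of contacting positions)
training = [
    ("HHPP", [(0, 1), (0, 3)]),
    ("HPHP", [(0, 2), (1, 3)]),
]
u = contact_potential(training)
print(u)
```

Pairs seen in contact more often than composition alone predicts get a negative (attractive) energy. Pairs never observed are simply absent from the result, which is one reason the training set has to be large and diverse.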
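Molecular dynamics: $F = ma$
For the i-th atom, for all i, while the simulation runs:
$$a_i = \frac{1}{m_i} \sum_{j \neq i} F_{ij}, \qquad v_i \leftarrow v_i + a_i \Delta t, \qquad x_i \leftarrow x_i + v_i \Delta t$$
[Figure: coordinate x vs. time; trajectories of all atoms]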
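The per-atom update (acceleration from the summed pairwise forces, then velocity, then position) is short enough to write out directly. The sketch below is a one-dimensional toy, assuming two atoms joined by a hypothetical harmonic bond. The force law, masses, time step, and function names are illustrative choices, and production MD codes use more careful integrators than this simple update.

```python
def spring_force(xi, xj, k=1.0, r0=1.0):
    """Force on atom i from atom j for a harmonic bond U = (k/2)(|xi - xj| - r0)^2."""
    d = xi - xj
    r = abs(d)
    return -k * (r - r0) * (d / r)


def md_step(x, v, m, force, dt):
    """One time step for all atoms: a_i from pairwise forces, then v_i, then x_i."""
    n = len(x)
    a = [
        sum(force(x[i], x[j]) for j in range(n) if j != i) / m[i]
        for i in range(n)
    ]
    v = [v[i] + a[i] * dt for i in range(n)]
    x = [x[i] + v[i] * dt for i in range(n)]
    return x, v


# Two atoms, bond initially stretched from r0 = 1.0 to 1.5
x, v, m = [0.0, 1.5], [0.0, 0.0], [1.0, 1.0]
bond_lengths = []
for _ in range(1000):
    x, v = md_step(x, v, m, spring_force, dt=0.01)
    bond_lengths.append(abs(x[1] - x[0]))

print(min(bond_lengths), max(bond_lengths))
```

The bond length oscillates around its rest value. Resolving this oscillation is what forces the time step to be small compared with the vibration period.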
Pros:
- Most detailed, most realistic
- True dynamics
Cons:
- Time-consuming
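$$U_{ij} = U_{ij}^{\mathrm{bond}} + U_{ij}^{\mathrm{VdW}} + U_{ij}^{\mathrm{electr}} + U_{ij}^{\mathrm{HB}} + \dots$$
$$\text{force: } F_{ij} = -\frac{dU_{ij}(x_i - x_j)}{dx_i}$$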
Main issue: needs $\Delta t \sim 10^{-12}$ s (picosecond) to reproduce bond vibrations, but folding occurs on microsecond-to-second timescales, so at least $10^7$ iterations are needed
Tools: AMBER, CHARMM, GROMACS, NAMD, …
Applications of molecular dynamics
• Protein-ligand interactions• Dynamics of protein folding• Membrane proteins and ion channels• Sidechain packing
D. E. Shaw Research has developed a dedicated hardware supercomputer, Anton, to run MD simulations much faster than any commodity cluster
Hardware designed to run MD, using custom-built chips (ASIC and FPGA)
Milliseconds are becoming accessible!
D.E.Shaw et al 2009, Proceedings of the ACM/IEEE Conference on Supercomputing (SC09)
Monte-Carlo simulation
Sacrifices information about dynamics to better explore the full energy landscape
[Figure: a trial move on the energy axis, from $E_{old}$ to $E_{new}$]
Elementary step: make a trial move, and accept or reject the new configuration
- $E_{new} \le E_{old}$: always accept
- $E_{new} > E_{old}$: accept with probability $p = e^{-(E_{new} - E_{old})/k_B T}$
(Metropolis sampling)
Different conformations are visited with the same frequency as in molecular dynamics
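The accept/reject rule is easy to test on a system where the answer is known. The sketch below applies the Metropolis criterion to a hypothetical two-level system, where the only trial move is a flip to the other state, and compares the sampled occupancy with the Boltzmann value. The energies, temperature, step count, and function names are illustrative assumptions.

```python
import math
import random


def metropolis_accept(e_old, e_new, kT, rng=random):
    """Metropolis criterion: always accept downhill, uphill with Boltzmann probability."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def sample_two_level(energies, kT, n_steps, seed=0):
    """Sample a two-state system; the trial move flips to the other state."""
    rng = random.Random(seed)
    state, visits = 0, [0, 0]
    for _ in range(n_steps):
        trial = 1 - state
        if metropolis_accept(energies[state], energies[trial], kT, rng):
            state = trial
        visits[state] += 1  # a rejected move counts the old state again
    return [count / n_steps for count in visits]


occupancy = sample_two_level([0.0, 1.0], kT=1.0, n_steps=100_000)
boltzmann = math.exp(-1.0) / (1.0 + math.exp(-1.0))
print(occupancy[1], boltzmann)
```

Counting the old state again after a rejected move is essential. Dropping rejected steps from the statistics would bias the sampled frequencies away from the Boltzmann distribution.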
Monte-Carlo simulation (cont’d)
Typical moves are rotations around bonds
- local move, rotation of one atom relative to its two neighbors
- global move, pivoting of the entire chain around a bond
Advantage over MD: no small/large timescale problem
However,
- no direct information about dynamics
- calculating rotations is expensive (trigonometry!)
Often used in coarse-grained simulations to explore large conformational space and find basins of attraction (energy valleys).
If needed, these valleys can then be further explored by molecular dynamics
Tools: ProFASi
Hybrid techniques: I-TASSER
Wu, Skolnick, Zhang, BMC Biology 5:17 (2007)
Hybrid techniques: ROBETTA
Kim, Chivian, Baker, NAR 2004, vol. 32 W526–W531
Sequences parsed into putative domains
If homology is found, comparative modeling
If low homology, ab initio folding
Fragments from 3- and 9-residue libraries are assembled
Selected decoys are clustered, cluster centroids used as models
Sidechains repacked by MC simulations using a rotamer library
Structural databases: SCOP, CATH
http://scop.mrc-lmb.cam.ac.uk/scop/
• Hierarchical structural classification
• Class: all-alpha, all-beta, alpha/beta, alpha+beta, multidomain, membrane, small
• Fold
• Superfamily
• Family
http://www.cathdb.info/
• Hierarchical domain classification
• Class: mainly-alpha, mainly-beta and alpha-beta
• Architecture
• Topology (fold family)
• Homologous superfamily
Murzin et al, JMB 247:536 (1995)
Orengo et al, Structure 5:1093 (1997)
Tools & servers
PDB www.rcsb.org
Structure prediction servers and tools (just a few)
I-TASSER http://zhanglab.ccmb.med.umich.edu/I-TASSER/
ROBETTA http://robetta.bakerlab.org/
MODELLER http://salilab.org/modeller/
Molecular dynamics packages (general)
AMBER http://ambermd.org/
CHARMM http://www.charmm.org/
GROMACS http://www.gromacs.org/
NAMD http://www.ks.uiuc.edu/Research/namd/
Monte Carlo protein modeling
ProFASi http://cbbp.thep.lu.se/activities/profasi/
Structural biology software database
http://www.ks.uiuc.edu/Development/biosoftdb/