Simulating Protein Folding
with Full Atomistic Detail
Prof. Vijay S. Pande
The screen versions of these slides have full details of copyright and acknowledgements
1
Vijay S. Pande
Departments of Chemistry and of Structural Biology
Stanford University
Simulating Protein Folding
with Full Atomistic Detail
2
Protein folding and disease
• Misfolding-related diseases
  - Proteins/peptides misfold and aggregate
  - CJD (Mad Cow), Alzheimer’s, Parkinson’s
• Challenging problem
  - Significant challenge for traditional structural methods
  - Aggregates: no well-defined structure
  - Kinetics slow and complex
[Figure: folded protein, misfolded protein, and plaque (CJD)]
3
Folding kinetics can have a biological impact
p53: “Guardian of the genome”
• Dimerizes cotranslationally
  - Nascent chains from adjacent ribosomes will dimerize during translation
  - Mutations which affect folding and formation of the dimer are linked to various cancers
Challenging problem
• Complex dynamics
• Going from crystal structure to dimerization mechanism
Nicholls, C. D. et al., J. Biol. Chem. 2002;277:12937-12945
[Figure: p53 dimers and tetramer]
4
Why physical simulation?
• Our goals
  - To understand the biophysics of these molecules
  - Analogy to bridges: we want to understand the underlying physics and then apply it
• Why not an informatic approach?
  - Dearth of experimental data to train models
  - Need to extrapolate from physical principles
• Future:
  - Informatic approaches to analyze physical simulation
  - Hybrid approaches
[Figures: Alzheimer’s aggregates; p53, a protein central to cancer]
5
Primary challenges
• Long timescales vs. accurate models
  - The eternal problem in computer simulations: accuracy vs. sampling
  - Can we “eat our cake and have it too”: a high degree of sampling with accurate models?
• Analysis of data now a challenge
  - Reproducing reality is nice, but doesn’t go beyond experiment
  - How can we learn more about these complex systems?
• How do proteins fold?
  - What is the mechanism of folding at atomic detail?
  - What can simulations do to add insight?
6
Range of possible models
• Lattice models: simple & generic (~1 CPU minute)
• Off-lattice models: simple models of particular proteins (~1 CPU hour)
• All-atom models: very detailed, typically intractable (~1000 CPU years)
[Figure: spectrum of models running from great sampling (lattice) to accurate model (all-atom)]
7
Atomistic models
8
Building an atomistic model
• What are the important atom-atom forces in biomolecules?
• Can we approximate them with classical models?
  - QM would slow the calculation down by 1000x
  - A classical approximation should work well in many cases (e.g. no bond breaking)
• Can we find the parameters needed in some methodical way?
  - No bias
  - Automated procedure
9
Short range interactions
• Bonds connect atoms
  - Vibrate with a given frequency
  - Known bond length
  - Approximate energy with a 2nd-order term
  - Connect them by springs
• Sterics
• Angles & dihedrals
  - Control how atoms bend & move locally
• Van der Waals
  - Dipole-dipole interaction: $-(\sigma/r)^{6}$
  - Hard core repulsion: modeled as $(\sigma/r)^{12}$
  - Leads to Lennard-Jones: $V_{LJ}(r) = \epsilon\,[(\sigma/r)^{12} - (\sigma/r)^{6}]$
[Figure: Lennard-Jones energy versus distance r]
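The Lennard-Jones form quoted on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the reduced units (ε = σ = 1) are assumptions, not part of the talk. Note that the slide writes the prefactor as ε rather than the more common 4ε, so in this convention the well depth is ε/4.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Slide's form: V(r) = epsilon * ((sigma/r)**12 - (sigma/r)**6)."""
    if r <= 0:
        raise ValueError("distance must be positive")
    s6 = (sigma / r) ** 6          # attractive (dispersion) factor
    return epsilon * (s6 * s6 - s6)  # repulsive term minus attractive term


# In this convention the minimum sits at r = 2**(1/6) * sigma,
# where the energy equals -epsilon / 4.
r_min = 2 ** (1 / 6)

for r in (0.9, 1.0, r_min, 1.5, 2.5):
    print(f"r = {r:.3f}  V = {lennard_jones(r):+.4f}")
```

The energy is strongly repulsive below r = σ, crosses zero at r = σ, and decays quickly beyond the minimum, which is why this term counts as a short range interaction.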
10
Charge-charge interactions
• Charge-charge interactions: Coulomb’s law
• Physically driven by electrostatics of sorts
  - NH will be positive
  - CO will be negative
  - Hence, attraction
• In models
  - Handled by partial charges on N, H, C, O
  - Partial charges now derived from quantum mechanics
• Directionality?
  - Partial charges yield a dipole interaction, hence directionality
  - Previously, specific angular functions have been used
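To make the point about directionality concrete, here is a small Python sketch that sums Coulomb's law over partial charges on an N-H group and a C=O group. The partial charges and geometries are made-up illustrative values, not parameters from any forcefield; the constant 332.06 converts e²/Å to kcal/mol in vacuum.

```python
import math

COULOMB_KCAL = 332.06  # kcal*Angstrom/(mol*e^2), vacuum electrostatics


def coulomb_energy(sites_a, sites_b):
    """Sum q_a*q_b/r over all pairs; each site is (charge, (x, y, z))."""
    total = 0.0
    for qa, pa in sites_a:
        for qb, pb in sites_b:
            total += COULOMB_KCAL * qa * qb / math.dist(pa, pb)
    return total


# Hypothetical partial charges (units of e) and positions (Angstrom).
nh = [(-0.4, (0.0, 0.0, 0.0)), (+0.4, (1.0, 0.0, 0.0))]          # N, then H
co_facing = [(-0.5, (3.0, 0.0, 0.0)), (+0.5, (4.2, 0.0, 0.0))]   # O faces H
co_flipped = [(+0.5, (3.0, 0.0, 0.0)), (-0.5, (4.2, 0.0, 0.0))]  # C faces H

print("O facing H :", round(coulomb_energy(nh, co_facing), 2), "kcal/mol")
print("C facing H :", round(coulomb_energy(nh, co_flipped), 2), "kcal/mol")
```

With the oxygen pointing at the hydrogen the interaction is attractive; flipping the C=O dipole makes it repulsive. No explicit angular term is needed, since the orientation dependence comes from the partial charges alone.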
11
How do we get parameters?
[Figure: energy (kcal/mol, 0 to 7) versus dihedral angle (-180 to 180 degrees) for φ1tt and φ1gt]
The plan: use quantum mechanics to calculate parameters and then fit to classical potentials
12
Existing forcefields
• All the hard work has been done by other groups?
  - Be skeptical!
• Comparison to experiments and theory?
  - Done on small molecules!
• Several to choose from
  - OPLS, CHARMM, AMBER
  - Have become very similar
• Questions:
  - What about water?
  - How good are they?
[Photos: Peter Kollman (AMBER), Martin Karplus (CHARMM)]
13
What about water?
14
Hydrophobic effect
• Origin of hydrophobicity
  - Water forms HB network around hydrophobic solutes
  - This reduces the solvent entropy
• When two hydrophobic solutes are brought together
  - This reduces the exposed surface area
  - Reduces the number of “bound” waters
  - Increases the entropy, decreasing ΔG
• Important for biomolecules
  - Hydrophobic cores of proteins
  - Lipid membrane interior vs. exterior
[Figure: solutes apart, more exposed surface area; solutes together, less exposed surface area]
15
Dielectric properties
• Why dielectric?
  - Proteins have lots of charges
  - Charges induce polarization in dielectric media
  - Water and the protein can act as a dielectric medium
• Importance
  - A great deal of the solvation free energy can come from dielectric properties
  - Especially for charged amino acids
16
Implicit solvation model: PB/SA
• Dielectric (Poisson-Boltzmann)
  - Consider the protein-water system as a 2-dielectric system
  - For dielectric ε(x), electrostatic potential φ(x), and charge density ρ(x), we get
    $\nabla \cdot [\epsilon(x)\, \nabla \phi(x)] = -4\pi \rho(x)$
  - Two types of charges: $\rho = \rho_{fixed} + \rho_{mobile}$
    Fixed (on the protein)
    Mobile (counter ions)
  - We say the counter ions immediately equilibrate:
    $\rho_{mobile} = \exp(-c\phi/kT)$
  - We get the Poisson-Boltzmann equation
    $\nabla \cdot [\epsilon\, \nabla \phi] = -4\pi\,[\rho_{fixed} + \exp(-c\phi/kT)]$
[Figure: protein regions (ε_protein) embedded in water (ε_water)]
17
Implicit solvation model: PB/SA
• Hydrophobicity (surface area)
  - We make the approximation that hydrophobicity is related to buried surface area
  - The more buried area, the better
• Surface area terms as an effective energy
  - Add $H_{SA} = \sum_i \sigma_i A_i$ to energy
  - $A_i$ is the surface area
  - $\sigma_i$ is the coefficient, related to hydrophobicity scale
  - In the end, we need $A_i$ to correlate with solvation free energies more than a geometrical calculation of area
• How to parameterize PB/SA?
  - Compare to solvation free energies of small molecules
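The surface-area term is simple enough to write out directly. The sketch below evaluates H_SA = Σ σ_i A_i for two hypothetical solutes, first apart and then in contact; the coefficient (0.005 kcal/mol/Å²) and the areas are illustrative numbers chosen for the example, not fitted parameters.

```python
def surface_area_energy(areas, coefficients):
    """H_SA = sum_i sigma_i * A_i, with A_i in A^2 and sigma_i in kcal/mol/A^2."""
    if len(areas) != len(coefficients):
        raise ValueError("need one coefficient per area")
    return sum(sigma * area for sigma, area in zip(coefficients, areas))


# Hypothetical values: two hydrophobic solutes, same coefficient for both.
sigmas = [0.005, 0.005]
areas_apart = [150.0, 150.0]      # fully solvent-exposed
areas_together = [110.0, 110.0]   # contact buries 40 A^2 on each solute

h_apart = surface_area_energy(areas_apart, sigmas)
h_together = surface_area_energy(areas_together, sigmas)
print(f"apart:    {h_apart:.2f} kcal/mol")
print(f"together: {h_together:.2f} kcal/mol")
print(f"gain from burying area: {h_apart - h_together:.2f} kcal/mol")
```

With positive coefficients, burying area lowers the effective energy, which is how this term mimics the hydrophobic effect without any explicit water.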
18
Solvation models
• Water is very important
  - Creates the hydrophobic effect
  - Hydrogen bonding to water
• Explicit water
  - Water modeled atomistically
  - TIP3P, SPC, etc.
• Implicit water
  - Water modeled mathematically
  - GBSA
19
How good
are these models?
20
Aren’t molecular models flawed?
• Of course they are, but that’s not the point
  - The question is “is the model good enough for the questions asked?”
  - All models have limitations
• What does good enough mean?
  - Is the model predictive?
  - There is no “right or wrong”, only “predictive or not predictive”
21
Comparison with experiment
• RMS deviations from experiment (kcal/mol): AMBER 0.97, CHARMM 0.84, OPLS-AA 0.64
22
Water model can have a big impact
• Results
  - We see a significant improvement with TIP3P-Mod
  - Need a model with zero average error (even if RMS is non-zero)
  - 4P models not better
• Towards new water models
  - We can use this to design new water models
  - “M24” has zero average error with similar water properties

Water model | Ave. error (kcal/mol) | RMS error (kcal/mol)
TIP3P-MOD | 0.23 | 0.42
TIP3P | 0.50 | 0.64
SPC | 0.53 | 0.64
TIP4P-Ew | 0.65 | 0.77
SPC/E | 0.71 | 0.82
TIP4P | 0.71 | 0.82

One major problem: average error in the solvation free energies means that there will be overall offsets
Trend: amino acids less soluble
23
Are forcefields good enough?
• It’s a model after all: always will have limitations
• What kind of accuracies (typically found)?
  - Solvation free energy of small molecules: 3-6 kcal/mol error
  - Ligand binding (e.g. drug binding affinity): 1.5-6 kcal/mol
• Is that good enough?
  - $K = \exp(\Delta G/kT) \sim \exp(6/0.6) \sim 10^{5}$
  - Can’t discriminate µmol from 50 picomol
What’s the problem?
• Is it the model?
  - To test this, we looked at considerably simpler systems (e.g. solvation free energy of amino acid sidechains)
  - We could calculate fairly accurate free energies (RMS error ~ 0.5 kcal/mol)
• Must be the sampling
  - Typical simulations are run for only picoseconds to nanoseconds
  - What do you get for a pico- to nanosecond simulation?
25
Experimentally relevant timescales
Timescale axis (seconds): 10^-15 (femto), 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), 10^-3 (milli), 10^0 (seconds)
[Figure: timeline placing the MD step, bond vibration, isomerization, water dynamics, helix formation, fastest folders, typical folders, and slow folders along this axis, with markers for “long MD run”, “where we need to be”, and “where we’d love to be”]
• Fundamental problem for simulation
  - Proteins fold in micro- to milliseconds
  - Computers can simulate nanoseconds
• How can we break this impasse?
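The size of the gap is clearer when counted in integration steps. The following sketch assumes an MD step of about one femtosecond, as placed on the timeline; the helper name and unit table are illustrative.

```python
# Femtoseconds per unit, kept as integers to avoid rounding error.
FEMTOSECONDS_PER = {"ps": 10**3, "ns": 10**6, "us": 10**9, "ms": 10**12}


def md_steps_needed(duration, unit, step_fs=1):
    """Number of MD steps needed to cover `duration` (in `unit`)."""
    total_fs = duration * FEMTOSECONDS_PER[unit]
    return total_fs // step_fs


for unit in ("ns", "us", "ms"):
    print(f"1 {unit}: {md_steps_needed(1, unit):.0e} steps")
```

A single nanosecond already costs about a million force evaluations; a millisecond folder would need about a trillion steps in one unbroken trajectory.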
26
Folding@home: worldwide grid computing
• Very powerful
  - ~200 teraflops sustained performance
  - >1,000,000 total CPUs; 200,000 active
  - >200 countries
• Very low cost
  - $100,000 for server hardware & admin
  - Sun Microsystems charges $1/CPU/hour ~ $1B/year
• New paradigm for supercomputing
  - Design algorithms to use many CPUs, slow networking
[Figure: >170,000 active CPUs over the world (CPU locations from IP address)]
27
Traditional approach: use many processors to speed a single calculation
• Tightly coupled parallelism is limited
  - It takes 5 years ~ 2000 days to complete a PhD
  - Can 2000 students complete a PhD’s worth of research in 1 day?
  - Clearly not: some problems cannot be subdivided; communication and organization are challenging
One must design novel ways to utilize large scale resources in efficient ways
• Simulation equivalent: tightly coupled MD simulation
  - Divide the force calculation of a trajectory between CPUs
  - All CPUs work together by spatial decomposition
• We need to rethink how we solve such problems
28
Simulating two-state dynamics
• Methods for two-state kinetics
  - Simulations are each longer than the second most limiting timescale
  - Works for small proteins and simple systems
[Figure: two-state landscape from unfolded (U) to folded (F), with a folding time of 10 µs]
Putting in real numbers:
Number that cross = $M k t$ = 10,000 simulations × $(10{,}000\ \mathrm{ns})^{-1}$ × 100 ns = 100 events!
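The count of crossing events can be reproduced with a short sketch. It uses the slide's numbers (M = 10,000 trajectories, folding time 10 µs = 10,000 ns, 100 ns per trajectory); the function names are illustrative. The linear estimate M·k·t is the small-kt limit of the single-exponential expression M·(1 − exp(−kt)), which is shown alongside for comparison.

```python
import math


def expected_crossings_linear(n_sims, rate_per_ns, length_ns):
    """Slide's estimate: M * k * t, valid when k*t is much less than 1."""
    return n_sims * rate_per_ns * length_ns


def expected_crossings_exact(n_sims, rate_per_ns, length_ns):
    """Two-state (single-exponential) result: M * (1 - exp(-k*t))."""
    return n_sims * (1.0 - math.exp(-rate_per_ns * length_ns))


M = 10_000            # independent trajectories
k = 1.0 / 10_000.0    # folding rate in 1/ns (10 microsecond folding time)
t = 100.0             # length of each trajectory in ns

print("linear estimate:", expected_crossings_linear(M, k, t))
print("exact two-state:", round(expected_crossings_exact(M, k, t), 1))
```

Each trajectory is 100 times shorter than the folding time, yet the ensemble still yields on the order of 100 folding events, enough to estimate a rate.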
29
New method: build Markovian state model (MSM) from a graph of trajectories
• Goals
  - To tackle complex, multi-state dynamics
  - To quantitatively predict virtually all folding properties
  - To efficiently parallelize over ~100,000 CPUs
• Inspirations
  - Original state models of Karplus & Weaver and others (building rate models for coarse grained states)
  - Path sampling methods of Chandler, co-workers, and others
• Can we combine these ideas to quantitatively predict folding kinetics (e.g. rates, free energies, structure)?
[Figure: multi-state landscape with unfolded (U), intermediate (I), and folded (F) states; timescales of 100 ns and 10 µs]
Our previous methods fail for multi-state kinetics, or just where the minimum timescale for barrier crossing is long
30
New method: build Markovian state model (MSM) from a graph of trajectories
Singhal, Snow, and Pande, Journal of Chemical Physics (2004)
• Plan
  - To build a Markovian representation of the state space
  - Clustering of conformations retains Markovian behavior
[Figure: graph connecting U, I, and F]
(1) Sample nodes
• Need to sample space: nodes of a Markovian graph
• Sampling can be obtained by a variety of means
  - Thermodynamics: Replica Exchange (REMD)/Parallel tempering
  - Simplified models can sample space well
  - Iterate sampling to test convergence
Thermodynamics methods like REMD are poorly suited for describing kinetics, but are well-suited to sample space thoroughly
31
New method: build Markovian state model (MSM) from a graph of trajectories
Singhal, Snow, and Pande, Journal of Chemical Physics (2004)
(2) Use MD to sample transitions
• Paths need not be long in time, well suited for large CPU grid
• Do not get stuck in minima due to the large number of nodes
(3) Efficient procedure to calculate kinetic properties
• Can calculate any kinetic property (rates, commitment probabilities)
[Figure: graph of nodes v_i, v_j joined by edges with transition probabilities P_ij, spanning U, I, and F]
$\mathrm{MFPT}_i = \sum_{\mathrm{edges}\ ij} P_{ij}\,(t_{ij} + \mathrm{MFPT}_j), \qquad \mathrm{MFPT}_F = 0$
$\sigma_i = \sum_{\mathrm{edges}\ ij} P_{ij}\,\sigma_j, \qquad \sigma_U = 0,\ \sigma_F = 1$
MFPT = Mean First Passage Time ~ 1/k
σ = commitment probability = “pfold”
With a MSM, one can calculate virtually any kinetic quantity of interest
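As a rough illustration of step (3), the two linear systems for the commitment probability σ and the mean first passage time can be solved by simple fixed-point sweeps. The sketch below is not the procedure from the cited paper: the five-state chain, the constant lag time, and the function names are all assumptions made for the example.

```python
def solve_committor(P, unfolded, folded, tol=1e-12, max_sweeps=100_000):
    """Solve sigma_i = sum_j P[i][j] * sigma_j with sigma_U = 0, sigma_F = 1."""
    n = len(P)
    sigma = [0.0] * n
    sigma[folded] = 1.0
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(n):
            if i == unfolded or i == folded:
                continue  # boundary values stay fixed
            updated = sum(P[i][j] * sigma[j] for j in range(n))
            largest_change = max(largest_change, abs(updated - sigma[i]))
            sigma[i] = updated
        if largest_change < tol:
            break
    return sigma


def solve_mfpt(P, lag, folded, tol=1e-10, max_sweeps=100_000):
    """Solve MFPT_i = sum_j P[i][j] * (lag + MFPT_j) with MFPT_F = 0."""
    n = len(P)
    mfpt = [0.0] * n
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(n):
            if i == folded:
                continue
            updated = sum(P[i][j] * (lag + mfpt[j]) for j in range(n))
            largest_change = max(largest_change, abs(updated - mfpt[i]))
            mfpt[i] = updated
        if largest_change < tol:
            break
    return mfpt


# Hypothetical toy graph: a five-node chain, U = node 0, F = node 4.
P = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
sigma = solve_committor(P, unfolded=0, folded=4)
mfpt = solve_mfpt(P, lag=1.0, folded=4)
print("pfold:", [round(s, 3) for s in sigma])
print("MFPT (in lag times):", [round(m, 3) for m in mfpt])
```

For this symmetric chain the committor rises linearly from U to F, and the mean first passage time from U is 16 lag times. For a real MSM with many thousands of nodes one would use a sparse linear solver instead of plain sweeps.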
32
Kinetics: predicted vs. experiment
(with several different experimental groups’ data)
• Rates not enough to test
  - Different forcefields yield similar absolute rates
  - BUT, yield different mechanisms
• Need additional experimental data
• Computational comparisons
  - Results robust?
  - Which details important?
[Figure: log-log plot of simulation (ns) versus experiment (ns), both axes from 1 to 100,000; power-law fit $y = 0.7853\,x^{1.0106}$, $R^2 \approx 0.968$. Systems shown: alpha helix (Fs peptide), PPA, TZ2 tf, TZ2 tu, Trp cage, villin (GB/SA), villin (TIP3P), BBA5 (GB/SA), BBA5 (TIP3P)]
33
Primary challenges
• Long timescales vs. accurate models
  - The eternal problem in computer simulations: accuracy vs. sampling
  - Can we “eat our cake and have it too”: a high degree of sampling with accurate models?
• Analysis of data now a challenge
  - Reproducing reality is nice, but doesn’t go beyond experiment
  - How can we learn more about these complex systems?
• How do proteins fold?
  - What is the mechanism of folding at atomic detail?
  - What can simulations do to add insight?
34
What is the role of chemical detail in the folding mechanism?
[Figure: unfolded conformers folding to the folded (“native”) tertiary structure]
• Does water play a structural role?
  - We have had great success with a continuum model for water
  - Would explicit solvation lead to qualitatively different results?
• Bigger question: which details are important?
  - Can we use polymeric arguments?
  - Or does water structure play an important role?
  - Must one go beyond just the polymeric / topological properties?
35
Problem: complex dynamics
• How to understand complex dynamics?
  - Proteins are a high dimensional system (lots of degrees of freedom)
  - No obvious degrees of freedom to understand the kinetics
• Have to be careful
  - Picking the wrong degree of freedom to follow can lead to qualitatively different results
[Figure: (A) two-dimensional free energy landscape in x and y; (B) projections onto x, y, and the true reaction coordinate r]
Even simple two dimensional free energy landscapes (A) can be misunderstood if poor projections (x or y) are used instead of the true reaction coordinate (r)
36
Pfold: ordering states along the folding reaction
• How to order states along the folding reaction?
  - Look at the commitment probability (or “pfold”)
  - pfold ≡ probability of folding before unfolding
• How to calculate pfold?
  - Calculate by running many simulations from a given conformation
  - Naturally suited for massively parallel computation
Du, Pande, Grosberg, Tanaka, Shakhnovich, Journal of Chemical Physics (1998)
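The "many simulations from a given conformation" recipe can be illustrated with a toy model. In the sketch below an unbiased one-dimensional random walk stands in for molecular dynamics, with the left boundary playing the role of the unfolded state and the right boundary the folded state; the model, its parameters, and the function name are assumptions for illustration only.

```python
import random


def estimate_pfold(start, unfolded=0, folded=10, n_trials=2000, seed=0):
    """Fraction of independent trajectories from `start` that fold first."""
    rng = random.Random(seed)
    n_folded = 0
    for _ in range(n_trials):
        x = start
        # Propagate until the walker commits to either basin.
        while unfolded < x < folded:
            x += rng.choice((-1, 1))
        if x == folded:
            n_folded += 1
    return n_folded / n_trials


for start in (2, 5, 8):
    print(f"start = {start}: pfold ~ {estimate_pfold(start):.3f}")
```

For this toy walk the exact committor is start/folded, so the estimates should scatter around 0.2, 0.5, and 0.8. Every trial is independent of the others, which is why the calculation maps so naturally onto a large pool of loosely connected CPUs.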
37
Is the water configuration relevant for the transition state ensemble?
[Figure: a protein conformation in its original (“old”) water, and the same protein conformation with a new water configuration (“new water”). How would pfold change? Does the water define pfold?]
• Our test: pfold correlation
  1) Remove water
  2) Add new water & equilibrate (holding protein coordinates fixed)
  3) Compare pfold to original value
• Result:
  - Strong correlation: pfold is invariant to solvent rearrangement
  - Direct test of role of water
• Implication: water configuration does not define TSE
  - Protein configuration alone defines committor
  - Water anneals quickly = fast, orthogonal degree of freedom
[Figure: scatter plot of Pfold (new water) versus Pfold (original water), both axes from 0 to 1]
38
Maybe we don’t need an explicit representation for water?
Test:
• Remove explicit (TIP3P) water and calculate pfold’s in implicit solvation (GB/SA)
• Compare pfold’s as calculated with explicit vs. implicit
Result:
• There is a (nonlinear) monotonic agreement, with some noise
Implication: mechanism similar, but shift in transition state
• Shift in TS: pfold ~½ in TIP3P has pfold ~0 in GB/SA
• Theory can deconvolute the TS shift from noise
• Yields a quantitative measure of similarity in mechanism (via non-linear pfold correlation)
[Figure: scatter plot of Pfold (GB/SA, implicit solvent) versus Pfold (TIP3P, explicit solvent), both axes from 0 to 1]
39
Results
Explicit solvation models
TIP3P: 3-point water model
TIP4P: 4-point water model
M20, M24: TIP3P variants designed to improve free energy properties
Implicit solvation models
GB/SA: Generalized Born
Cα−Go model
All-atom Go model
40
Primary challenges
• Long timescales vs. accurate models
  - The eternal problem in computer simulations: accuracy vs. sampling
  - Can we “eat our cake and have it too”: a high degree of sampling with accurate models?
• Analysis of data now a challenge
  - Reproducing reality is nice, but doesn’t go beyond experiment
  - How can we learn more about these complex systems?
• How do proteins fold?
  - What is the mechanism of folding at atomic detail?
  - What can simulations do to add insight?
41
Protein folding theory
1. Form secondary structure first (Diffusion/Collision)
   • Hierarchical: form helices & hairpins, decrease entropy
2. Nucleation
   • Form nucleus of structure, then grow (analogous to 1st order phase transition)
3. Collapse first
   • Hydrophobically driven: remove water to form HBs
4. Form rough native shape first (topomer search)
   • Find the right “topology” first, then pack side chains
Questions:
• Do any proteins fold via these mechanisms?
• Are any of these “universal”?
• Can simulations help to arbitrate?
[Figure: pathways from the unfolded state to the native state: formation of microdomains followed by diffusion and collision of microdomains; formation of a nucleus; collapse; "topomer"]
42
Zinc finger fold: BBA5
• BBA5: model protein
  - Designed and characterized by Barbara Imperiali’s group
  - Small, but stable
• Successful folding
  - Fold within 2 Å of native state
  - Rate (4.5 µs) agrees with experiment (7.5 µs)
  - TIP3P corrected rate is 7.5 µs
• Methods
  - Amber94-GS, NPT, RF
  - 250 µs simulated time (>10^6 CPU-days on Folding@home)
[Figure: NMR native state and simulation native state (BBA5 designed and characterized by Barbara Imperiali’s group)]
43
General folding properties are independent of the water model
• Both models lead to a diffusion-collision mechanism
  - Secondary structure forms independently
  - Probability of forming helix & hairpin statistically independent: P(helix & hairpin) = P(helix) P(hairpin)
[Figure: results for explicit solvent (TIP3P) and implicit solvent (GB/SA)]
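The independence criterion can be checked directly from per-frame structural labels. The sketch below compares the joint frequency with the product of the marginals; the eight labelled frames are synthetic data invented for the example, not results from the simulations.

```python
def independence_check(frames):
    """frames: list of (helix_formed, hairpin_formed) booleans per snapshot.

    Returns (P(helix & hairpin), P(helix) * P(hairpin)).
    """
    n = len(frames)
    p_helix = sum(1 for helix, _ in frames if helix) / n
    p_hairpin = sum(1 for _, hairpin in frames if hairpin) / n
    p_joint = sum(1 for helix, hairpin in frames if helix and hairpin) / n
    return p_joint, p_helix * p_hairpin


# Synthetic example: every combination appears equally often.
frames = [(True, True), (True, False), (False, True), (False, False)] * 2

p_joint, p_product = independence_check(frames)
print("P(helix & hairpin)   =", p_joint)
print("P(helix) * P(hairpin) =", p_product)
```

When the two numbers agree within sampling error, the elements form independently, consistent with a diffusion-collision picture. A joint probability well above the product would instead point to cooperative formation.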
44
Three helix bundle: villin headpiece
• Canonical model system for folding
  - Small 3 helix bundle
  - Well-formed hydrophobic core
  - Exposed hydrophobic residues (for function)
• Simplest protein with function
  - Actin binding domain
  - TRP and PHE residues involved in binding: cause trouble for folding?
[Figure: NMR structure (McKnight)]
45
MSM for villin: kinetics
MSM for kinetics
• Predict rate: 3.1±0.3 µs; experiment: 4.3±0.6 µs
• Agrees with short times (inset) and predicts long timescale
• Essentially single exponential dynamics
Folding mechanism
• Finds rough topology first
• Then locks in sidechains and secondary structure
• More like how larger proteins fold?
[Figure: Villin kinetics, MSM vs. raw data. Fraction folded (0 to 1) versus t (0 to 10 µs); inset: fraction folded (0 to 0.006) versus t (0 to 20 ns). Blue curve = MSM prediction; red curve = raw data]
Markovian State Model (MSM) for villin: allows for a direct calculation of kinetic properties over long timescales
46
How do proteins fold?
• We find no single mechanism
  - Collapse first (protein G hairpin): hydrophobically driven, must remove water in order to make hydrogen bonds stable
  - Form secondary structure first (BBA5): form helices & hairpins; hierarchical, decrease in entropy
  - Form rough native shape first (villin)
• Are there any universal aspects?
  - So far, no! Perhaps there isn’t anything to find?
  - Evolution uses whatever works
47
Acknowledgements
Topic | Group member(s) | Collaborator(s)
Methodology for grid-based protein simulation | Guha Jayachandran, Siraj Khaliq, Michael Shirts | Folding@Home contributors, Erik Lindahl (Stanford), Adam Beberg (Mithral)
Protein folding | Guha Jayachandran, Nick Kelley, Nina Singhal, Chris Snow, Vishal Vaidyanathan, Bojan Zagrovic | Martin Gruebele (UIUC), Steve Hagen (U Florida), Feng Gai (U Penn), Kevin Plaxco (UCSB)
RNA folding | Eric Sorin, Jim Caldwell, Sung-Joo Lee, Mark Engelhardt | Dan Herschlag, Seb Doniach (Stanford), Lois Pollack (Cornell)
Non-biological folding | Sidney Elmer | Craig Hawker (IBM)
Free energy calculation methods | Young Min Rhee, Michael Shirts | Bill Swope (IBM), Chris Jarzynski (LANL)
Protein design | Lillian Chong, Stefan Larson, Vishal Vaidyanathan | John Desjarlais (Xencor), Chris Garcia (Stanford)