
Simulating Protein Folding

with Full Atomistic Detail

Prof. Vijay S. Pande

The screen versions of these slides have full details of copyright and acknowledgements

1

Vijay S. Pande

Departments of Chemistry and of Structural Biology

Stanford University



2

Protein folding and disease

• Misfolding-related diseases

  • Proteins/peptides misfold and aggregate

  • CJD (Mad Cow), Alzheimer’s, Parkinson’s

• Challenging problem

  • Significant challenge for traditional structural methods

  • Aggregates: no well-defined structure

  • Kinetics slow and complex

[Figure: folded protein, misfolded protein, and plaque (CJD)]

3

Folding kinetics can have a biological impact

p53: “Guardian of the genome”

• Dimerizes cotranslationally

  • Nascent chains from adjacent ribosomes will dimerize during translation

  • Mutations which affect folding and formation of the dimer are linked to various cancers

Challenging problem

• Complex dynamics

• Going from crystal structure to dimerization mechanism

Nicholls, C. D. et al., J. Biol. Chem. 2002;277:12937-12945

[Figure: p53 dimers and tetramer]





4

Why physical simulation?

• Our goals

  • To understand the biophysics of these molecules

  • Analogy to bridges: we want to understand the underlying physics and then apply it

• Why not an informatic approach?

  • Dearth of experimental data to train models

  • Need to extrapolate from physical principles

• Future:

  • Informatic approaches to analyze physical simulation

  • Hybrid approaches

[Figures: Alzheimer’s aggregates; p53, a protein central to cancer]

5

Primary challenges

• Long timescales vs. accurate models

  • The eternal problem in computer simulations: accuracy vs. sampling

  • Can we “eat our cake and have it too”: a high degree of sampling with accurate models?

• Analysis of data now a challenge

  • Reproducing reality is nice, but doesn’t go beyond experiment

  • How can we learn more about these complex systems?

• How do proteins fold?

  • What is the mechanism of folding at atomic detail?

  • What can simulations do to add insight?

6

Range of possible models

[Figure: spectrum of models, from great sampling (left) to accurate model (right)]

• Lattice models: simple & generic (about a CPU minute)

• Off-lattice models: simple models of particular proteins (about a CPU hour)

• All-atom models: very detailed, typically intractable (about 1000 CPU years)





7

Atomistic models

8

Building an atomistic model

• What are the important atom-atom forces in biomolecules?

• Can we approximate them with classical models?

  • QM would slow the calculation down by 1000x

  • A classical approximation should work well in many cases (e.g. no bond breaking)

• Can we find the parameters needed in some methodical way?

  • No bias

  • Automated procedure

9

Short range interactions

• Bonds connect atoms

  • Vibrate with a given frequency

  • Known bond length

  • Approximate energy with a 2nd-order term

  • Connect them by springs

• Sterics

  • Angles & dihedrals

  • Control how atoms bend & move locally

• Van der Waals

  • Dipole-dipole interaction: -(σ/r)^6

  • Hard core repulsion: modeled as (σ/r)^12

  • Leads to Lennard-Jones: V_LJ(r) = ε [(σ/r)^12 - (σ/r)^6]

[Figure: energy vs. distance (r)]
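As a minimal sketch, the Lennard-Jones form written on this slide can be evaluated directly. The function name and the reduced units (ε = σ = 1) are illustrative assumptions. Many forcefields write the same potential with a prefactor of 4ε, which only rescales the well depth.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Slide form: V_LJ(r) = epsilon * [(sigma/r)**12 - (sigma/r)**6]."""
    if r <= 0:
        raise ValueError("distance r must be positive")
    sr6 = (sigma / r) ** 6          # attractive (sigma/r)^6 term
    return epsilon * (sr6 * sr6 - sr6)  # repulsive term is the square of it


# The potential crosses zero at r = sigma and has its minimum at r = 2**(1/6) * sigma.
r_min = 2 ** (1 / 6)
print("V(sigma) =", lennard_jones(1.0))
print("V(r_min) =", lennard_jones(r_min))  # about -epsilon/4 with this prefactor
```

With the slide's prefactor the well depth is ε/4; with the 4ε convention it is ε.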





10

Charge-charge interactions

• Charge-charge interactions: Coulomb’s law

• Physically driven by electrostatics of sorts

  • NH will be positive

  • CO will be negative

  • Hence, attraction

• In models

  • Handled by partial charges on N, H, C, O

  • Partial charges now derived from quantum mechanics

• Directionality?

  • Partial charges yield a dipole interaction, hence directionality

  • Previously, specific angular functions have been used
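A small sketch of how partial charges alone give directionality: the Coulomb energy between an N-H group and a C=O group changes sign when the C=O dipole is flipped. The partial charges and coordinates below are made-up illustrative values, not parameters from any forcefield, and 332.06 is the approximate conversion constant for energies in kcal/mol with charges in units of e and distances in Å.

```python
import math

COULOMB_KCAL = 332.06  # approx. kcal*Angstrom/(mol*e^2)


def group_interaction(group_a, group_b, dielectric=1.0):
    """Sum Coulomb's law over all atom pairs between two groups.

    Each group is a list of (partial_charge, (x, y, z)) tuples.
    """
    energy = 0.0
    for qa, ra in group_a:
        for qb, rb in group_b:
            energy += COULOMB_KCAL * qa * qb / (dielectric * math.dist(ra, rb))
    return energy


# Hypothetical partial charges (units of e) and positions (Angstrom), all on the x axis.
nh = [(-0.4, (0.0, 0.0, 0.0)), (+0.4, (1.0, 0.0, 0.0))]          # N, H
co_aligned = [(-0.5, (3.0, 0.0, 0.0)), (+0.5, (4.2, 0.0, 0.0))]  # O faces H
co_flipped = [(+0.5, (3.0, 0.0, 0.0)), (-0.5, (4.2, 0.0, 0.0))]  # C faces H

print("N-H...O=C:", group_interaction(nh, co_aligned))  # attractive (negative)
print("N-H...C=O:", group_interaction(nh, co_flipped))  # repulsive (positive)
```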

11

How do we get parameters?

[Figure: energy (kcal/mol, 0 to 7) vs. torsion angle (-180 to 180) for φ1tt and φ1gt]

The plan: use quantum mechanics to calculate parameters and then fit to classical potentials
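The fitting step can be sketched as follows, under stated assumptions: the classical torsion term is taken to be a short cosine series, V(φ) = c0 + Σ_n c_n cos(nφ), and the "QM" energies are synthetic numbers generated here purely for illustration. On a uniform grid covering one full period, the least-squares coefficients reduce to simple projections onto each cosine.

```python
import math


def fit_cosine_series(angles_deg, energies, n_terms=3):
    """Least-squares fit of V(phi) = c0 + sum_n c_n * cos(n * phi).

    Assumes the angles form a uniform grid over one full period (no repeated
    endpoint) and that n_terms is below half the number of grid points.
    """
    n = len(angles_deg)
    coeffs = [sum(energies) / n]  # c0 is the mean energy
    for k in range(1, n_terms + 1):
        proj = sum(e * math.cos(k * math.radians(a))
                   for a, e in zip(angles_deg, energies))
        coeffs.append(2.0 * proj / n)
    return coeffs


# Synthetic stand-in for a QM torsion scan (kcal/mol); not real data.
true_coeffs = [2.0, 1.0, 0.5, 1.5]
angles = list(range(-180, 180, 10))
scan = [true_coeffs[0] + sum(c * math.cos(k * math.radians(a))
                             for k, c in enumerate(true_coeffs[1:], start=1))
        for a in angles]

fitted = fit_cosine_series(angles, scan)
print([round(c, 6) for c in fitted])
```

A real parameterization would fit many scans at once and include phase terms; this only shows the shape of the procedure.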

12

Existing forcefields

• All the hard work has been done by other groups?

  • Be skeptical!

  • Comparison to experiments and theory?

  • Done on small molecules!

• Several to choose from

  • OPLS, CHARMM, AMBER

  • Have become very similar

• Questions:

  • What about water?

  • How good are they?

[Photos: Peter Kollman (AMBER), Martin Karplus (CHARMM)]





13

What about water?

14

Hydrophobic effect

• Origin of hydrophobicity

  • Water forms a hydrogen-bond (HB) network around hydrophobic solutes

  • This reduces the solvent entropy

• When two hydrophobic solutes are brought together

  • This reduces the exposed surface area

  • Reduces the number of “bound” waters

  • Increases the entropy, decreasing ΔG

• Important for biomolecules

  • Hydrophobic cores of proteins

  • Lipid membrane interior vs. exterior

[Figure: solutes apart, exposed surface area; solutes together, less exposed surface area]

15

Dielectric properties

• Why dielectric?

  • Proteins have lots of charges

  • Charges induce polarization in dielectric media

  • Water and the protein can act as a dielectric medium

• Importance

  • A great deal of the solvation free energy can come from dielectric properties

  • Especially for charged amino acids





16

Implicit solvation model: PB/SA

• Dielectric (Poisson-Boltzmann)

  • Consider the protein-water system as a 2-dielectric system

  • For dielectric ε(x), electrostatic potential φ(x), and charge density ρ(x), we get

    ∇·[ε(x) ∇φ(x)] = -4π ρ(x)

  • Two types of charges: ρ = ρ_fixed + ρ_mobile

    • Fixed (on the protein)

    • Mobile (counter ions)

  • We say the counter ions immediately equilibrate:

    ρ_mobile = exp(-cφ/kT)

  • We get the Poisson-Boltzmann equation

    ∇·[ε ∇φ] = -4π [ρ_fixed + exp(-cφ/kT)]

[Figure: protein regions (ε_protein) surrounded by water (ε_water)]

17

Implicit solvation model: PB/SA

• Hydrophobicity (surface area)

  • We make the approximation that hydrophobicity is related to buried surface area

  • The more buried area, the better

• Surface area terms as an effective energy

  • Add H_SA = Σ_i σ_i A_i to the energy

  • A_i is the surface area

  • σ_i is the coefficient, related to a hydrophobicity scale

  • In the end, we need A_i to correlate with solvation free energies more than to be an exact geometrical calculation of area

• How to parameterize PB/SA?

  • Compare to solvation free energies of small molecules
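The surface-area term is just a weighted sum, which a few lines make concrete. This is a sketch only: the areas and coefficients below are invented for illustration, and real σ_i values would come from fitting to small-molecule solvation free energies as the slide describes.

```python
def surface_area_energy(areas, sigmas):
    """H_SA = sum_i sigma_i * A_i for per-atom (or per-group) surface areas."""
    if len(areas) != len(sigmas):
        raise ValueError("need one coefficient per surface area")
    return sum(s * a for s, a in zip(sigmas, areas))


# Hypothetical exposed areas (square Angstrom) and coefficients (kcal/mol per square Angstrom).
areas = [10.0, 20.0, 5.0]
sigmas = [0.005, -0.002, 0.010]
print("H_SA =", surface_area_energy(areas, sigmas), "kcal/mol")
```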

18

Solvation models

• Water is very important

  • Creates the hydrophobic effect

  • Hydrogen bonding to water

• Explicit water

  • Water modeled atomistically

  • TIP3P, SPC, etc.

• Implicit water

  • Water modeled mathematically

  • GBSA





19

How good are these models?

20

Aren’t molecular models flawed?

• Of course they are, but that’s not the point

  • The question is “is the model good enough for the questions asked?”

  • All models have limitations

• What does good enough mean?

  • Is the model predictive?

  • There is no “right or wrong”, only “predictive or not predictive”

21

Comparison with experiment

• RMS deviations from experiment (kcal/mol):

  AMBER    0.97
  CHARMM   0.84
  OPLS-AA  0.64





22

Water model can have a big impact

• Results

  • We see a significant improvement with TIP3P-Mod

  • Need a model with zero average error (even if RMS is non-zero)

  • 4-point (4P) models not better

• Towards new water models

  • We can use this to design new water models

  • “M24” has zero average error with similar water properties

Water model   Ave. error (kcal/mol)   RMS error (kcal/mol)
TIP3P-MOD     0.23                    0.42
TIP3P         0.50                    0.64
SPC           0.53                    0.64
TIP4P-Ew      0.65                    0.77
SPC/E         0.71                    0.82
TIP4P         0.71                    0.82

One major problem: average error in the solvation free energies means that there will be overall offsets

Trend: amino acids less soluble

23

Are forcefields good enough?

• It’s a model after all: always will have limitations

• What kind of accuracies (typically found)?

  • Solvation free energy of small molecules: 3-6 kcal/mol error

  • Ligand binding (e.g. drug binding affinity): 1.5-6 kcal/mol

• Is that good enough?

  • K = exp(ΔG/kT) ~ exp(6/0.6) = exp(10) ~ 2×10^4

  • Can’t discriminate µ-mol from 50 picomol
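The arithmetic behind this estimate is easy to check. The sketch below uses the slide's own numbers (a 6 kcal/mol error and kT ≈ 0.6 kcal/mol); the function name is an illustrative choice.

```python
import math

KT_KCAL = 0.6  # kT in kcal/mol near room temperature, the value used on the slide


def affinity_ratio(free_energy_error, kt=KT_KCAL):
    """Factor by which a binding constant changes for a given free energy error."""
    return math.exp(free_energy_error / kt)


worst_case = affinity_ratio(6.0)   # exp(10)
best_case = affinity_ratio(1.5)    # exp(2.5)
print(f"6 kcal/mol error   -> factor of {worst_case:.3g}")
print(f"1.5 kcal/mol error -> factor of {best_case:.3g}")
print(f"1 micromolar / 50 picomolar = {1e-6 / 50e-12:.3g}")
```

The worst-case factor, exp(10), is about 2×10^4, which is the ratio between 1 micromolar and 50 picomolar.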

24

What’s the problem?

• Is it the model?

  • To test this, we looked at considerably simpler systems (e.g. solvation free energy of amino acid sidechains)

  • We could calculate fairly accurate free energies (RMS error ~0.5 kcal/mol)

• Must be the sampling

  • Typical simulations are run for only picoseconds to nanoseconds

  • What do you get for a pico- to nanosecond simulation?





25

Experimentally relevant timescales

[Figure: timescale axis in seconds, with ticks at 10^-15 (femto), 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), 10^-3 (milli), and 10^0 (seconds). Events marked from fastest to slowest: bond vibration, isomerization, water dynamics, helix forms, fastest folders, typical folders, slow folders. Annotations: MD step, long MD run, where we need to be, where we’d love to be.]

• Fundamental problem for simulation

  • Proteins fold in micro- to milliseconds

  • Computers can simulate nanoseconds

• How can we break this impasse?
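To put the gap in numbers, one can count the MD steps needed to reach each timescale. This sketch assumes a 1 fs step, matching where the slide places "MD step" on the axis; the helper name is hypothetical.

```python
MD_STEP_S = 1e-15  # assumed 1 fs integration step


def steps_needed(target_seconds, step_seconds=MD_STEP_S):
    """Number of sequential MD steps required to cover target_seconds."""
    return round(target_seconds / step_seconds)


for label, seconds in [("nanosecond", 1e-9), ("microsecond", 1e-6), ("millisecond", 1e-3)]:
    print(f"1 {label}: {steps_needed(seconds):.0e} steps")
```

Each step depends on the previous one, so a single trajectory cannot simply be split across more processors.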

26

Folding@home: worldwide grid computing

• Very powerful

  • ~200 Teraflops sustained performance

  • >1,000,000 total CPUs; 200,000 active

  • >200 countries

• Very low cost

  • $100,000 for server hardware & admin

  • Sun Microsystems charges $1/CPU/hour ~ $1B/year

• New paradigm for supercomputing

  • Design algorithms to use many CPUs, slow networking

[Figure: >170,000 active CPUs over the world (CPU locations from IP address)]

27

Traditional approach: use many processors to speed a single calculation

• Tightly coupled parallelism is limited

  • It takes 5 years ~ 2000 days to complete a PhD

  • Can 2000 students complete a PhD’s worth of research in 1 day?

  • Clearly not: some problems cannot be subdivided; communication and organization are challenging

• Simulation equivalent: tightly coupled MD simulation

  • Divide the force calculation of a trajectory between CPUs

  • All CPUs work together by spatial decomposition

• We need to rethink how we solve such problems

One must design novel ways to utilize large scale resources in efficient ways





28

Simulating two-state dynamics

• Methods for two-state kinetics

  • Simulations are each longer than the second most limiting timescale

  • Works for small proteins and simple systems

[Figure: two-state landscape with states U and F; barrier-crossing timescale 10 µs]

Putting in real numbers:

Number that cross = M k t = 10,000 simulations × (1/10,000 ns) × 100 ns = 100 events!
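The estimate above, and its inverse (reading a rate off the number of observed crossings), can be written out as a short sketch. The function names are illustrative; the linear formula N = M k t assumes each trajectory is much shorter than 1/k, so that the crossing probability per trajectory is small.

```python
import math


def expected_crossings(n_sims, rate_per_ns, length_ns):
    """N = M * k * t: expected number of trajectories that cross the barrier."""
    return n_sims * rate_per_ns * length_ns


def rate_from_crossings(n_crossed, n_sims, length_ns):
    """Invert N = M * k * t to estimate the rate k (per ns) from observed crossings."""
    return n_crossed / (n_sims * length_ns)


M, k, t = 10_000, 1 / 10_000, 100.0   # slide values: k = 1/(10 us), t = 100 ns
events = expected_crossings(M, k, t)
exact_fraction = 1.0 - math.exp(-k * t)  # two-state crossing probability per trajectory

print("expected events:", events)
print("linear vs exact fraction crossed:", k * t, exact_fraction)
print("folding time from 100 events (ns):", 1.0 / rate_from_crossings(100, M, t))
```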

29

New method: build Markovian state model (MSM) from a graph of trajectories

• Goals

  • To tackle complex, multi-state dynamics

  • To quantitatively predict virtually all folding properties

  • To efficiently parallelize over ~100,000 CPUs

• Inspirations

  • Original state models of Karplus & Weaver and others (building rate models for coarse grained states)

  • Path sampling methods of Chandler, co-workers, and others

  • Can we combine these ideas to quantitatively predict folding kinetics (e.g. rates, free energies, structure)?

[Figure: landscape with states U, I, and F; timescales 100 ns and 10 µs]

Our previous methods fail for multi-state kinetics or just where the minimum timescale for barrier crossing is long

30



Singhal, Snow, and Pande, Journal of Chemical Physics (2004)

• Plan

  • To build a Markovian representation of the state space

  • Clustering of conformations retains Markovian behavior

(1) Sample nodes

• Need to sample space: nodes of a Markovian graph

• Sampling can be obtained by a variety of means

  • Thermodynamics: Replica Exchange (REMD)/Parallel tempering

  • Simplified models can sample space well

• Iterate sampling to test convergence

[Figure: states U, I, and F]

Thermodynamics methods like REMD are poorly suited for describing kinetics, but are well-suited to sample space thoroughly





31



Singhal, Snow, and Pande, Journal of Chemical Physics (2004)

(2) Use MD to sample transitions

• Paths need not be long in time, well suited for large CPU grid

• Do not get stuck in minima due to the large number of nodes

(3) Efficient procedure to calculate kinetic properties

• Can calculate any kinetic property (rates, commitment probabilities)

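[Figure: graph over states U, I, and F; nodes v_i and v_j joined by an edge with transition probability P_ij]

MFPT_i = Σ_{edges j} P_ij (t_ij + MFPT_j),   MFPT_F = 0

σ_i = Σ_{edges j} P_ij σ_j,   σ_U = 0, σ_F = 1

MFPT = Mean First Passage Time ~ 1/k

σ = commitment probability = “pfold”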

With a MSM, one can calculate virtually

any kinetic quantity of interest
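As a hedged illustration of step (3), the sketch below solves for mean first passage times (MFPT_i = Σ_j P_ij (t_ij + MFPT_j), with MFPT_F = 0) and commitment probabilities (σ_i = Σ_j P_ij σ_j, with σ_U = 0 and σ_F = 1) by simple fixed-point iteration. The three-state transition matrix and the uniform 1 ns transition time are made-up toy values, and the iteration scheme is chosen for brevity, not taken from the cited paper.

```python
def mean_first_passage_times(P, t, folded, tol=1e-12, max_iter=100_000):
    """Iterate MFPT_i = sum_j P[i][j] * (t[i][j] + MFPT_j) with MFPT_folded = 0."""
    n = len(P)
    mfpt = [0.0] * n
    for _ in range(max_iter):
        new = [0.0 if i == folded
               else sum(P[i][j] * (t[i][j] + mfpt[j]) for j in range(n))
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, mfpt)) < tol
        mfpt = new
        if converged:
            break
    return mfpt


def commitment_probabilities(P, unfolded, folded, tol=1e-12, max_iter=100_000):
    """Iterate sigma_i = sum_j P[i][j] * sigma_j with sigma_U = 0 and sigma_F = 1."""
    n = len(P)
    sigma = [1.0 if i == folded else 0.0 for i in range(n)]
    for _ in range(max_iter):
        new = [sigma[i] if i in (unfolded, folded)
               else sum(P[i][j] * sigma[j] for j in range(n))
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, sigma)) < tol
        sigma = new
        if converged:
            break
    return sigma


# Toy MSM: unfolded (U), intermediate (I), folded (F); rows sum to 1.
U, I, F = 0, 1, 2
P = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.0, 1.0]]
t = [[1.0] * 3 for _ in range(3)]  # assumed 1 ns per transition

mfpt = mean_first_passage_times(P, t, folded=F)
sigma = commitment_probabilities(P, unfolded=U, folded=F)
print("MFPT to F (ns):", [round(m, 6) for m in mfpt])
print("pfold:", [round(s, 6) for s in sigma])
```

For this toy matrix the equations can also be solved by hand, giving MFPT_U = 30 ns, MFPT_I = 20 ns, and σ_I = 0.5. For a large graph one would solve the same linear equations with a sparse solver instead of iterating.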

32

Kinetics: predicted vs. experiment (with several different experimental groups’ data)

• Rates not enough to test

  • Different forcefields yield similar absolute rates

  • BUT, yield different mechanisms

• Need additional experimental data

• Computational comparisons

  • Results robust?

  • Which details important?

[Figure: log-log plot of simulation (ns) vs. experiment (ns), both axes from 1 to 100,000; fit y = 0.7853 x^1.0106, R^2 = 0.9684. Systems: alpha helix (Fs peptide), PPA, TZ2 tf, TZ2 tu, Trp cage, villin (GB/SA), villin (TIP3P), BBA5 (GB/SA), BBA5 (TIP3P)]

33




Primary challenges: high degree of sampling with accurate models





34

What is the role of chemical detail in the folding mechanism?

[Figure: unfolded conformers and folded (“native”) tertiary structure]

• Does water play a structural role?

  • We have had great success with a continuum model for water

  • Would explicit solvation lead to qualitatively different results?

• Bigger question: which details are important?

  • Can we use polymeric arguments?

  • Or does water structure play an important role?

  • Must one go beyond just the polymeric / topological properties?

35

Problem: complex dynamics

• How to understand complex dynamics?

  • Proteins are a high dimensional system (lots of degrees of freedom)

  • No obvious degrees of freedom to understand the kinetics

• Have to be careful

  • Picking the wrong degree of freedom to follow can lead to qualitatively different results

[Figure: panels A and B; a two-dimensional free energy landscape in x and y, with the reaction coordinate r]

Even simple two dimensional free energy landscapes (A) can be misunderstood if poor projections (x or y) are used instead of the true reaction coordinate (r)

36

Pfold: ordering states along the folding reaction

• How to order states along the folding reaction?

  • Look at the commitment probability (or “pfold”)

  • pfold ≡ probability of folding before unfolding

• How to calculate pfold?

  • Calculate by running many simulations from a given conformation

  • Naturally suited for massively parallel computation

Du, Pande, Grosberg, Tanaka, Shakhnovich, Journal of Chemical Physics (1998)
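The "run many simulations from a given conformation" recipe can be sketched on a toy system where the answer is known. Here the molecular dynamics is replaced by an unbiased one-dimensional random walk between an unfolded boundary (site 0) and a folded boundary (site n_sites); this stand-in model and the function name are assumptions for illustration only. For this walk the exact pfold from site i is i / n_sites.

```python
import random


def estimate_pfold(start, n_sites, n_trials, rng):
    """Fraction of independent trajectories that reach the folded boundary
    (site n_sites) before the unfolded boundary (site 0)."""
    folded = 0
    for _ in range(n_trials):
        x = start
        while 0 < x < n_sites:
            x += 1 if rng.random() < 0.5 else -1  # unbiased step
        if x == n_sites:
            folded += 1
    return folded / n_trials


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pfold = estimate_pfold(start=3, n_sites=10, n_trials=2000, rng=rng)
print("estimated pfold:", pfold, "(exact value for this toy walk: 0.3)")
```

Each trial is independent of the others, which is why the calculation maps so naturally onto a large grid of loosely coupled CPUs.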





37

Is the water configuration relevant for the transition state ensemble?

• Direct test of role of water

  • How would pfold change? Does the water define pfold?

• Our test: pfold correlation

  1) Remove water

  2) Add new water & equilibrate (holding protein coordinates fixed)

  3) Compare pfold to original value

• Result:

  • Strong correlation: pfold is invariant to solvent rearrangement

• Implication: water configuration does not define TSE

  • Protein configuration alone defines committor

  • Water anneals quickly = fast, orthogonal degree of freedom

[Figure: the same protein conformation with “old water” and with a new water configuration (“new water”)]

[Figure: Pfold (new water) vs. Pfold (original water), both axes 0 to 1]

38

Maybe we don’t need an explicit representation for water?

Test:

• Remove explicit (TIP3P) water and calculate pfold values in implicit solvation (GB/SA)

• Compare pfold values as calculated with explicit vs. implicit solvent

Result:

• There is a (nonlinear) monotonic agreement, with some noise

Implication: mechanism similar, but shift in transition state

• Shift in TS: pfold ~½ in TIP3P has pfold ~0 in GB/SA

• Theory can deconvolute the TS shift from noise

• Yields a quantitative measure of similarity in mechanism (via non-linear pfold correlation)

[Figure: Pfold (GB/SA, implicit solvent) vs. Pfold (TIP3P, explicit solvent), both axes 0 to 1]

39

Results

Explicit solvation models

TIP3P: 3-point water model

TIP4P: 4-point water model

M20, M24: TIP3P variants to improve free energy properties

Implicit solvation models

GB/SA: Generalized Born

Cα−Go model

All-atom Go model





40




Primary challenges: high degree of sampling with accurate models

41

Protein folding theory

1. Form secondary structure first (Diffusion/Collision)

  • Hierarchical: form helices & hairpins, decrease entropy

2. Nucleation

  • Form nucleus of structure, then grow (analogous to 1st order phase transition)

3. Collapse first

  • Hydrophobically driven: remove water to form HBs

4. Form rough native shape first (topomer search)

  • Find the right “topology” first, then pack side chains

Questions:

• Do any proteins fold via these mechanisms?

• Are any of these “universal”?

• Can simulations help to arbitrate?

[Figure: four routes from the unfolded state to the native state: formation of microdomains followed by diffusion and collision of microdomains; formation of a nucleus; collapse; "topomer"]

42

Zinc finger fold: BBA5

• BBA5: model protein

  • Designed and characterized by Barbara Imperiali’s group

  • Small, but stable

• Successful folding

  • Fold within 2 Å of native state

  • Rate (4.5 µs) agrees with experiment (7.5 µs)

  • TIP3P corrected rate is 7.5 µs

• Methods

  • Amber94-GS, NPT, RF

  • 250 µs simulated time (>10^6 CPU-days on Folding@home)

[Figure: NMR native state and simulation native state]





43

General folding properties are independent of the water model

• Both models lead to a diffusion-collision mechanism

  • Secondary structure forms independently

  • Probability of forming helix & hairpin statistically independent:

    P(helix & hairpin) = P(helix) P(hairpin)

[Figure: explicit solvent (TIP3P) and implicit solvent (GB/SA)]
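A check of this kind of statistical independence can be sketched as below: estimate P(helix), P(hairpin), and the joint probability from per-frame indicators and compare the joint with the product. The indicator lists are synthetic toy data constructed to be exactly independent, not simulation output, and the function name is hypothetical.

```python
def joint_and_product(helix, hairpin):
    """Return (P(helix & hairpin), P(helix) * P(hairpin)) from 0/1 indicators per frame."""
    if len(helix) != len(hairpin) or not helix:
        raise ValueError("need two equally long, non-empty indicator lists")
    n = len(helix)
    p_helix = sum(helix) / n
    p_hairpin = sum(hairpin) / n
    p_joint = sum(1 for a, b in zip(helix, hairpin) if a and b) / n
    return p_joint, p_helix * p_hairpin


# Synthetic indicators: every (helix, hairpin) combination occurs equally often.
helix_formed = [1, 1, 0, 0] * 25
hairpin_formed = [1, 0, 1, 0] * 25

p_joint, p_product = joint_and_product(helix_formed, hairpin_formed)
print("P(helix & hairpin) =", p_joint, " P(helix) P(hairpin) =", p_product)
```

With real trajectory data the two numbers would agree only within sampling error, so a statistical test or error bars would be needed before claiming independence.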

44

Three helix bundle: villin headpiece

• Canonical model system for folding

  • Small 3 helix bundle

  • Well-formed hydrophobic core

  • Exposed hydrophobic residues (for function)

• Simplest protein with function

  • Actin binding domain

  • TRP and PHE residues involved in binding: cause trouble for folding?

[Figure: NMR structure (McKnight)]

45

MSM for villin: kinetics

MSM for kinetics

• Predict rate: 3.1±0.3 µs; experiment: 4.3±0.6 µs

• Agrees with short times (inset) and predicts long timescale

• Essentially single exponential dynamics

Folding mechanism

• Finds rough topology first

• Then locks in sidechains and secondary structure

• More like how larger proteins fold?

[Figure: Villin kinetics, MSM vs. raw data. Fraction folded (0 to 1) vs. t (0 to 10 µs); inset: fraction folded (0 to 0.006) vs. t (0 to 20 ns). Blue curve = MSM prediction; red curve = raw data]

Markovian State Model (MSM) for villin: allows for a direct calculation of kinetic properties over long timescales





46

How do proteins fold?

• We find no single mechanism

  • Collapse first (protein G hairpin)

    • Hydrophobically driven, must remove water in order to make hydrogen bonds stable

  • Form secondary structure first (BBA5)

    • Form helices & hairpins

    • Hierarchical, decrease in entropy

  • Form rough native shape first (villin)

• Are there any universal aspects?

  • So far, no! Perhaps there isn’t anything to find?

  • Evolution uses whatever works

47

Acknowledgements

Topic: Methodology for grid-based protein simulation
Group member(s): Guha Jayachandran, Siraj Khaliq, Michael Shirts
Collaborator(s): Folding@Home contributors, Erik Lindahl (Stanford), Adam Beberg (Mithral)

Topic: Protein folding
Group member(s): Guha Jayachandran, Nick Kelley, Nina Singhal, Chris Snow, Vishal Vaidyanathan, Bojan Zagrovic
Collaborator(s): Martin Gruebele (UIUC), Steve Hagen (U Florida), Feng Gai (U Penn), Kevin Plaxco (UCSB)

Topic: RNA folding
Group member(s): Eric Sorin, Jim Caldwell, Sung-Joo Lee, Mark Engelhardt
Collaborator(s): Dan Herschlag, Seb Doniach (Stanford), Lois Pollack (Cornell)

Topic: Non-biological folding
Group member(s): Sidney Elmer
Collaborator(s): Craig Hawker (IBM)

Topic: Free energy calculation methods
Group member(s): Young Min Rhee, Michael Shirts
Collaborator(s): Bill Swope (IBM), Chris Jarzynski (LANL)

Topic: Protein design
Group member(s): Lillian Chong, Stefan Larson, Vishal Vaidyanathan
Collaborator(s): John Desjarlais (Xencor), Chris Garcia (Stanford)

48
