Physics of protein folding. Phys. Life Rev. 1: 23 (2004) · Jimmy Qin Protein folding notes Around...


Notes on protein folding

Jimmy Qin

Fall 2019

These are notes on protein folding. They include my personal notes from reading as well as notes from our group meetings and from others’ presentations.

Review: Finkelstein and Galzitskaya, “Physics of protein folding.” Phys. Life Rev. 1: 23 (2004)

This is a review written by two famous Russian dudes working in protein physics, for physicists. It is very good and perhaps as good as the entire book by Frauenfelder. For example, it covers some scaling laws, etc. It focuses only on the kinetics of the first-order folding transition. It does not cover, for example, the dynamics of a particular phase, etc.

Introduction

A protein is a heteropolymer made of amino acids. The protein folding problem is as follows: given only the amino acid sequence, can we predict the folded (i.e. native) state of the protein? (We will see later that the actual folded state does not necessarily correspond to the thermodynamically stable state.) Here are three “fundamental experimental facts”:

• Proteins have well-defined 3D stable structures

• Proteins are capable of self-organization

• The phase transition to the folded state is first-order, i.e. all-or-none. Since it is discontinuous, it ensures robustness of protein structure.

A little bit about the biological and in vitro processes

Typically the protein starts folding as it is being spit out of the ribosome; for large proteins, the already-synthesized domains can fold before the entire protein is synthesized. There are special proteins called chaperones which separate unfolded proteins from each other so they don’t get mixed up; however, the chaperones do not change the final structure of the protein.


Around 1960, it was discovered experimentally that globular proteins can spontaneously fold in vitro. (Plus, under sensitively tuned conditions, the protein can reassemble from the constituent amino acids; however, it may not be the same protein.) Therefore, the spontaneous folding property allows us to detach the study of protein folding physics from the study of protein biosynthesis. No more biology, but rather physics with biological materials!

Protein folding puzzle (Levinthal paradox)

The ability of proteins to fold spontaneously raises a fundamental problem about information which is known as the Levinthal paradox. Here it is:

Because the protein goes to the same native state both in vivo and in vitro, it seems we can safely say that the native folded state is thermodynamically preferred under the standard biological conditions. However, for a simple 1D chain with 100 amino acids (and, say, two states per amino acid), there are $2^{100}$ possibilities to test! And since even a single deviation from the stable state can strongly increase the chain energy in the closely-packed structure, essentially all states must be sampled before finding the energetically stable one. If a sampling takes 1 ps, then around $10^{10}$ years are necessary to search for the stable conformation!

To summarize: the paradox is that experimentally, we know there is a stable structure. But if there is a stable structure, it would take us forever to find it.
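The arithmetic behind the Levinthal estimate is easy to check. Here is a minimal sketch, assuming two conformations per residue and 1 ps per sample as in the text; the function name and the seconds-per-year constant are my own choices for illustration.

```python
# Back-of-the-envelope Levinthal estimate (illustrative sketch, not from the review).
SECONDS_PER_YEAR = 3.156e7  # approximate length of one year in seconds


def levinthal_search_time_years(n_residues, states_per_residue=2, sample_time_s=1e-12):
    """Time to visit every conformation exactly once, in years."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations * sample_time_s / SECONDS_PER_YEAR


years = levinthal_search_time_years(100)
print(f"Exhaustive search for 100 residues: about {years:.1e} years")
```

For 100 residues this lands in the $10^{10}$ year range, which is the point of the paradox: the estimate is dominated by the exponential $2^{N}$, so changing the sampling time by a few orders of magnitude does not help.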

Resolution: The resolution of Levinthal’s paradox is that there should exist a kinetic folding pathway which guides the transformation of the protein from the unfolded to the folded state. The native state is merely the end of the pathway and may not be the most stable chain fold. Rather, the native state is the lowest-energy choice of the manifold of states which are easily accessible.

Folding pathways

To summarize, the resolution of Levinthal’s paradox is that the protein structure is under kinetic, rather than thermodynamic, control. It raises interesting questions on how to predict protein structure and on how to design de novo proteins which fold to a structure we want. There is a tension between thermodynamics and kinetics: should we maximize stability of the desired structure or create a rapid pathway to it?

Ptitsyn proposed a model of stepwise protein folding called the framework model. Essentially, the α-helices, β-hairpins and β-sheets are formed first, then these secondary structures are glued together, and finally the structure is “crystallized.”


Figure 1: Apparently these are called “pre-molten globule,” “molten globule,” and “globule.”

Let us study this framework model. We will study each step in turn.

Folding to the secondary structure

Experimentally, we know that some native-like secondary structure elements form early and rapidly, without talking to other parts of the chain. They may play a role in initiating the entire process of protein folding. The folded secondary structure is called the pre-molten globule. Here are the α-helix and β-hairpin structures:

Kinetics of α-helix formation

Experimentally, we determine the kinetics of both α- and β-structure formation by using an ultrafast laser-induced temperature jump, which unfolds the α- or β-structure. We control the unfolding rate, and once the unfolding rate “crosses over” the folding rate, we can determine the folding rate.

α-helix formation occurs over perhaps 200 ns for an α-helix of 20-30 residues.

The canonical theory of α-helix folding in homopolypeptides (this may not apply as well elsewhere!) is based on the Zimm-Bragg model of helix-coil transitions (a simplified and updated version is in Accounts Chem. Res. 1998 31: 745-53). Also good to know is that the “coil” means “random coil,” which is not geometrically like a coil; see https://en.wikipedia.org/wiki/Random_coil. We will describe the theory now.

1. Initiation: Generally, a helix is more stable than a coil. However, there is a kinetic barrier to producing the first turn of the helix. This is the slow step. Why is the first turn of the helix hard to make?

Consider $\Delta G = \Delta H - T\Delta S$. When we make a helix, we lower the entropy, which is unfavorable for the free energy. When we add another turn to an existing helix, we pair the amino acids with buddies on the preceding turn, which gives a negative $\Delta H$ that counteracts the unfavorable entropy change. The first turn has no preceding turn to pair with, so it pays the entropy cost without the compensating $\Delta H$. Therefore, the free energy of helix initiation is unfavorable; it is roughly

$$G^{\alpha}_{\rm init} \approx +4\ \mathrm{kcal/mol}.$$

2. Elongation: After the α-helix has started to form, neighboring residues join the structure in succession. We generally associate a time τ (around 1-10 ns) with the addition of another amino acid, which is related to the vibrational frequency. If this is related to the vibrational frequency, then it should also give us an estimate of the initiation time,

$$t^{\alpha}_{\rm init} \approx \tau\, e^{G^{\alpha}_{\rm init}/RT}.$$

It seems that $\sigma_\alpha = e^{-G^{\alpha}_{\rm init}/RT} \approx 10^{-3}$.

3. Effect of chain length: We have separated the kinetics of α-helix formation into two steps. Which step is slower depends on the length N of the peptide.

Since the helix can initiate at any site in the chain, the initiation time is roughly

$$t^{\alpha}_{{\rm init},N} = \frac{t^{\alpha}_{\rm init}}{N}.$$

The elongation time is roughly

$$t_{{\rm elong},N} = N\tau.$$

The length $N_0$ which separates the short-length and long-length regimes is therefore

$$t^{\alpha}_{{\rm init},N_0} = t_{{\rm elong},N_0} \implies N_0 \approx \sigma_\alpha^{-1/2} \sim 30.$$

For $N < N_0$, the slow step is initiation. For $N > N_0$, the slow step is elongation. However, the folding time does not grow without bound: for extremely long chains, $N \gg N_0$, there are roughly $N/N_0$ independent helix initiation acts, and the effective length of the protein is more like $N_0$. Conclusion: even in very long polypeptides, the α-folding transition is expected to occur within about $N_0\tau \approx 30\tau$, independent of N.
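The three steps above can be checked numerically. This is a rough sketch under stated assumptions: I take $RT \approx 0.593$ kcal/mol (room temperature) and τ = 1 ns, and the function name is mine, not from the review.

```python
import math

RT_KCAL_PER_MOL = 0.593  # RT near 298 K, in kcal/mol (assumed room temperature)


def helix_kinetics(g_init_kcal=4.0, tau_ns=1.0, rt=RT_KCAL_PER_MOL):
    """Return (sigma, t_init in ns, crossover length N0) for the helix model above."""
    sigma = math.exp(-g_init_kcal / rt)  # initiation factor sigma_alpha
    t_init_ns = tau_ns / sigma           # t_init = tau * exp(G_init / RT)
    n0 = sigma ** -0.5                   # from t_init / N0 = N0 * tau
    return sigma, t_init_ns, n0


sigma, t_init_ns, n0 = helix_kinetics()
print(f"sigma ~ {sigma:.1e}, t_init ~ {t_init_ns:.0f} ns, N0 ~ {n0:.0f}")
# At the crossover length, initiation and elongation take the same time:
print(f"t_init/N0 = {t_init_ns / n0:.1f} ns, N0*tau = {n0 * 1.0:.1f} ns")
```

With +4 kcal/mol this gives σ of order $10^{-3}$ and $N_0$ close to 30, consistent with the numbers quoted above. The result is quite sensitive to the initiation free energy because it sits in an exponent.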

Kinetics of β-hairpin formation

It takes perhaps 30 times longer to fold a β-hairpin, compared to an α-helix. The theory of β-hairpin folding, called zippering theory, is quite similar to the coil-helix transition.

Why should β-structure take longer than α-structure to fold? α-helices can start to form anywhere in the chain, while β-hairpins must start at the chain turn, or at the “hairpin.” Therefore, the initiation of a polypeptide of length N takes around N times longer in the β-structure than in the α-structure.

The rest of the kinetic model is the same as for α-helices. We can model the thermodynamics of a β-sheet, which is a lot of β-hairpins put together, with the regular first-order phase transition free energy

$$G_\beta(N) = \gamma\sqrt{N} + \varepsilon N.$$

Here, $\varepsilon < 0$ describes the stability of the β-sheet (and $\gamma > 0$ is the boundary cost, as required for a barrier to exist). The more negative ε is, the more stable the β-sheet.

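The minimum free energy is $G_\beta \to -\infty$ as $N \to \infty$. We have

$$G_\beta = 0 \quad \text{for } N = 0 \text{ and } N = N_0 = (\gamma/\varepsilon)^2.$$

The maximum free energy, i.e. the energy barrier, is

$$G^{\max}_\beta = \frac{\gamma^2}{-4\varepsilon} \quad \text{at } N = \tfrac{1}{4}N_0.$$

We could guess this is basically the activation energy,

$$G^{\beta}_{\rm init} = \frac{\gamma^2}{-4\varepsilon} = \frac{\gamma}{4}\sqrt{N_0}.$$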
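As a sanity check on the barrier formulas, here is a small sketch that compares the closed-form results against a brute-force scan of $G_\beta(N)$. The values of γ and ε are arbitrary illustrative numbers (in units of RT, say), not values from the review.

```python
import math


def g_beta(n, gamma, eps):
    """Free energy of a beta-sheet of n residues: edge term plus bulk term."""
    return gamma * math.sqrt(n) + eps * n


def beta_barrier(gamma, eps):
    """Closed-form N0, barrier position and barrier height (requires gamma > 0 > eps)."""
    n0 = (gamma / eps) ** 2
    n_max = n0 / 4.0
    g_max = gamma ** 2 / (-4.0 * eps)
    return n0, n_max, g_max


gamma, eps = 2.0, -0.5  # hypothetical parameters for illustration only
n0, n_max, g_max = beta_barrier(gamma, eps)

# Brute-force scan on a fine grid between 0 and N0 to locate the maximum.
grid = [n0 * i / 10000 for i in range(10001)]
n_scan = max(grid, key=lambda n: g_beta(n, gamma, eps))

print(f"N0 = {n0}, barrier at N = {n_max} (scan: {n_scan:.3f}), height = {g_max}")
```

The scan should land on $N_0/4$, and the height there should match both $\gamma^2/(-4\varepsilon)$ and $(\gamma/4)\sqrt{N_0}$.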

The stability ε changes drastically in different chemical environments. ε is not very negative in water, so β-sheets fold slowly in water. α-helices can fold faster.

Molten globule intermediate

After the α- and β-structures are done folding, they are kind of “glued together.” Before the full native state is achieved, we meet a state in which the secondary structures are already folded but the links between them are still kind of loose. This is called the molten globule intermediate state. The crystallization of protein structure within this molten globule is the last step of folding.

Two-state folding

What if there are no metastable intermediate states, or they are not metastable enough, such that their existence is too fleeting for us to observe? Then we observe a sudden transition from the primary structure to the folded structure: the all-or-none transition, which is also called two-state folding.

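The microscopic way to do this would be transition state theory, which is prohibitively hard. Anyway, we postulate a transition state T of high free energy. Letting D be the denatured state and N be the native state, and letting

$$k_0 = \tau^{-1}$$

be the characteristic rate of elementary transitions (τ being the characteristic time), kinetic theory gives

$$k_{\to} = k_0\, e^{-(G_T - G_D)/RT} \quad \text{and} \quad k_{\leftarrow} = k_0\, e^{-(G_T - G_N)/RT},$$

which gives us an equilibrium constant

$$K = \frac{k_{\to}}{k_{\leftarrow}} = e^{-(G_N - G_D)/RT},$$

which does not depend on the transition state.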
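Here is a minimal sketch of the two-state rate expressions. The free energies and τ = 10 ns are made-up illustrative inputs; only the functional form comes from the notes.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def two_state_rates(g_d, g_t, g_n, tau_s=1e-8, temperature_k=298.0):
    """Folding rate, unfolding rate and equilibrium constant for D <-> T <-> N.

    g_d, g_t, g_n are free energies (kcal/mol) of the denatured, transition
    and native states; tau_s is the elementary step time (assumed value).
    """
    rt = R_KCAL * temperature_k
    k0 = 1.0 / tau_s
    k_fold = k0 * math.exp(-(g_t - g_d) / rt)
    k_unfold = k0 * math.exp(-(g_t - g_n) / rt)
    return k_fold, k_unfold, k_fold / k_unfold


# Hypothetical numbers: native state 5 kcal/mol below denatured, barrier 10 kcal/mol above it.
k_fold, k_unfold, K = two_state_rates(g_d=0.0, g_t=10.0, g_n=-5.0)
print(f"k_fold = {k_fold:.2e} 1/s, k_unfold = {k_unfold:.2e} 1/s, K = {K:.2e}")
```

Raising or lowering the barrier `g_t` changes both rates by the same factor, so K stays fixed; this is the usual statement that kinetics sees the transition state while thermodynamics does not.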


Levinthal paradox for two-state folding

The Levinthal paradox is even harder to resolve for two-state folding, because if there are no intermediates, then we don’t have a good idea of a possible “directed” kinetic reaction pathway. Also, if experiments cannot detect the transition state, we can only rely on theory and simulations! This is a big challenge. Wild idea: it has been suggested that the stable structure automatically forms a focus for the “rapid” folding pathways, so there is no tension between kinetics and thermodynamics.

It turns out that the all-or-none transition requires the amino acid sequence to provide a significant energy gap between the native fold and other folds. The ability to create this energy gap is what distinguishes protein chains from random heteropolymers of amino acids. This is one of the main conclusions of theoretical protein physics!

The main theoretical idea was to develop a study of energy spectra, like in the spin glasses. For thermodynamics only (and not kinetics), the “Random Energy Model” proposed by Derrida seems to capture most of the important effects. In this model, each microstate attains its energy independently of the others. If this is true and the gap condition holds, then we can show the following claim, which proves the stable structure of a normal size heteropolymer can be reached on the order of seconds or minutes:

Claim: If the global thermodynamically-stable state is separated from the rest of the configuration landscape by a substantial energy gap, then there is at least one rapid folding pathway which leads to the lowest-energy state. In this case, the native state is the same as the thermodynamically stable state.

Proof: We would like to show there exists at least one rapid folding pathway to the lowest-energy state. First, we must define what “rapid” means.

A “rapid pathway” is one which (1) does not have too many steps and (2) does not have a very high free-energy barrier. Again, we estimate

$$\text{folding time} \approx \tau\, e^{(G_T - G_D)/RT},$$

where T and D index the transition state and denatured (unfolded) state. The question is then: how high is $\Delta G = G_T - G_D$? $\Delta G$ can be low if the decrease of entropy is simultaneously compensated by a decrease in energy, as in the usual first-order phase transitions. In this case, the entropy decrease and the energy gain along the sequential folding pathway are somehow tuned to be essentially equal. There are thus no volume effects on $\Delta G$ and the free energy barrier is due to the surface effects only.

Suppose the chain is of length L. An increase in L means the chain is folding (i.e. adding more monomers from the unfolded part of the heteropolymer).

Let us model the free energy of the transition state, $\Delta G$, as being due to the boundary between the native and unfolded phases. The boundary is approximately a surface, so its energy scales like $L^{2/3}$. In the worst case, the transition state is when half the chain is folded, so

$$\Delta H \sim \tfrac{1}{4}\varepsilon L^{2/3},$$

where ε is the protein denaturation energy per residue. Numerically we find $\varepsilon \approx 2RT$ at room temperature. Now let us think about entropy. The entropy lost depends on how ordered the structure is compared to the original unfolded structure. It is on the order of

$$\Delta S \sim R L^{2/3}.$$

Therefore,

$$\Delta G \approx (1 \pm 0.5)\, RT\, L^{2/3}.$$
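To see what this scaling law means in practice, here is a quick sketch that plugs $\Delta G \approx (1 \pm 0.5) RT L^{2/3}$ into the folding time estimate $\tau e^{\Delta G/RT}$. The value τ = 10 ns is my assumption (the top of the 1-10 ns range quoted earlier), and the function name is made up.

```python
import math


def folding_time_range_s(length, tau_s=1e-8):
    """Return (fast, central, slow) folding time estimates in seconds.

    Uses Delta G / RT = c * L**(2/3) with c = 0.5, 1.0, 1.5.
    """
    x = length ** (2.0 / 3.0)
    return tuple(tau_s * math.exp(c * x) for c in (0.5, 1.0, 1.5))


for length in (50, 100, 200):
    fast, central, slow = folding_time_range_s(length)
    print(f"L = {length:3d}: {fast:.1e} s  ..  {central:.1e} s  ..  {slow:.1e} s")
```

For a chain of about 100 residues the central estimate comes out at tens of seconds, in line with the “seconds or minutes” statement of the claim. The spread between the fast and slow estimates is huge because the uncertainty in the prefactor sits in the exponent.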

Figure 2: Amazing – see it to believe it!

In fact, there is another contribution to $\Delta G$, not from surface effects, but rather from the “ruggedness of the reaction pathway,” which scales like $L^{1/2}$.

Studying the folding nucleus

The above theory of the $\Delta G \sim L^{2/3}$ scaling law is predicated on the idea of a transition state, the boundary between manifolds which flow to unfolded and folded final states. We give this transition state a name: the folding nucleus.

Review article: Michaels, et al. “Chemical kinetics for bridging molecular mechanisms and macroscopic measurements of amyloid fibril formation.” Annu. Rev. Phys. Chem. 69: 273 (2018)

We would like to understand how normally-soluble peptides and proteins aggregate to form amyloid fibrils. We will use the tool of chemical reaction kinetics to do so.


Introduction

What are amyloid fibrils? Biologically, they are a class of filamentous structures which arise from the assembly of normally monomeric proteins or peptides, and feature a characteristic structure rich in β-sheets. On the largest scale, amyloid fibrils can self-assemble to form amyloid plaques and amyloid films, which are observed in Alzheimer’s, Parkinson’s, type 2 diabetes, prion diseases, and sickle-cell anemia. However, amyloids are also involved in many functional processes, such as storing hormones and protecting surfaces via the hydrophobic effect.

The most important molecular interactions in amyloid fibrils are the directional hydrogen bonds which make up the “backbone” which holds together the cross-β structure of the fibrils. The hydrogen bonds tend to be strong, so the amyloid structure is quite stable and therefore many unrelated proteins can form the fibrils. The fact that many different proteins can form the same structure gives us hope that there are general physical principles involved.

Kinetics of amyloid fibril formation

We would like to write down a master equation which describes the chemical kinetics of amyloid fibril formation. Here are the biological processes we consider:

• Fibril elongation: adding monomers to the ends of existing fibrils

• Monomer dissociation: monomers fall off the ends of existing fibrils. (This is much slower than fibril elongation and is sometimes neglected.)

• Nucleation: form new fibrils. This is done in several ways:

– From individual monomers: We can have homogeneous nucleation or heterogeneous nucleation (i.e. on an interface, such as on the surface of lipid vesicles, or an impurity).

– From fibrils: Essentially, this is heterogeneous nucleation on the surface of existing fibrils.

Let us call $f(t, j)$ the concentration of aggregates of size j at time t. The master equation, accounting for all processes described above, is given by

$$\partial_t f(t,j) = 2k_+ m(t)\left[f(t,j-1) - f(t,j)\right] + 2k_- \sum_{i=j+1}^{\infty} f(t,i) - k_-(j-1) f(t,j) + k_n m(t)^{n_c}\,\delta_{j,n_c} + k_2 m(t)^{n_2} \sum_{i=n_c}^{\infty} i f(t,i)\,\delta_{j,n_2}$$

and

$$\partial_t m(t) = -\sum_{j=n_c}^{\infty} j\,\partial_t f(t,j).$$

It is easy to describe what each thing is:

• f(t, j) is the concentration of j-length fibrils at time t.


• m(t) is the concentration of monomers in solution.

• k+ describes the fibril elongation.

• k− describes the fibril dissociation.

• kn describes the spontaneous nucleation of fibrils.

• k2 describes the secondary nucleation (on an existing fibril of length i) of fibrils.

• $n_c$ is like the smallest size of a nucleus formed from monomers; $n_2$ plays the same role for secondary nucleation.

We can try to solve these with a computer. A more tractable analytic method, which still gives us some of the information, is to introduce the principal moments

$$P(t) = \sum_{j=n_c}^{\infty} f(t,j) \quad \text{and} \quad M(t) = \sum_{j=n_c}^{\infty} j f(t,j)$$

and rewrite the master equation as a time-evolution of the moments,

$$\partial_t P(t) = k_n m(t)^{n_c} + k_2 m(t)^{n_2} M(t) \quad \text{and} \quad \partial_t M(t) = -\partial_t m(t) = 2k_+ m(t) P(t).$$

In the last equation, we have neglected the nucleation terms, which are generally smaller than the elongation terms.
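The moment equations are simple enough to integrate by hand. Below is a minimal sketch using a fixed-step fourth-order Runge-Kutta scheme in plain Python. All rate constants are dimensionless placeholder values I picked for illustration, and I assume no aggregates at t = 0, so that $m(t) = m(0) - M(t)$ by mass conservation.

```python
def simulate_moments(m0=1.0, k_plus=1.0, k_n=1e-4, k_2=1.0, n_c=2, n_2=2,
                     t_end=30.0, dt=0.01):
    """Integrate dP/dt and dM/dt with RK4; returns (times, fibril mass M)."""

    def rhs(P, M):
        m = max(m0 - M, 0.0)                       # free monomer concentration
        dP = k_n * m ** n_c + k_2 * m ** n_2 * M   # primary + secondary nucleation
        dM = 2.0 * k_plus * m * P                  # elongation at both fibril ends
        return dP, dM

    P, M, t = 0.0, 0.0, 0.0
    times, masses = [t], [M]
    for _ in range(int(round(t_end / dt))):
        a1, b1 = rhs(P, M)
        a2, b2 = rhs(P + 0.5 * dt * a1, M + 0.5 * dt * b1)
        a3, b3 = rhs(P + 0.5 * dt * a2, M + 0.5 * dt * b2)
        a4, b4 = rhs(P + dt * a3, M + dt * b3)
        P += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6.0
        M += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6.0
        t += dt
        times.append(t)
        masses.append(M)
    return times, masses


def half_time(times, masses, m0=1.0):
    """First time at which half of the monomer mass has been converted."""
    for t, M in zip(times, masses):
        if M >= 0.5 * m0:
            return t
    return None


times, masses = simulate_moments()
print(f"M(t=1) = {masses[100]:.2e}, M(end) = {masses[-1]:.4f}, "
      f"t_half = {half_time(times, masses):.2f}")
```

With these placeholder parameters the fibril mass stays tiny at early times, then grows rapidly, then saturates at the total monomer concentration, which is the sigmoidal shape usually seen in aggregation experiments.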

Analytical solutions

These aren’t hard for a computer to solve, so why bother with the analytics? The answer is that analytics can tell us about scaling laws, which are more apparent analytically than if only numerical methods are used.

The general result is that the integrated rate law, i.e. the time-dependent concentration, shows an initial lag time, followed by rapid growth, followed by a plateau at the steady-state. Generally, it is found that the half-time $t_{1/2}$ of the reaction scales with the initial number of monomers like

$$t_{1/2} \sim m(0)^{\gamma}$$

where γ is some scaling exponent. In fact, different kinds of nucleation processes give different scaling laws and different kinetics. We can find out what the dominant process is by testing different regimes and finding the one that fits, such as below:

Generally, it is found that

$$\gamma = -\tfrac{1}{2}\left(n_{\rm nucl} + n_{\rm elong}\right),$$

where $n_{\rm nucl}$ and $n_{\rm elong}$ are the reaction orders of the dominant nucleation and elongation processes, respectively.
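In practice, γ is read off as the slope of a log-log plot of half-time against initial monomer concentration. Here is a small sketch of that fit using ordinary least squares; the data are synthetic, generated from an exact power law with γ = -1.5 purely to show that the fit recovers the exponent.

```python
import math


def fit_scaling_exponent(m0_values, half_times):
    """Least-squares slope of log(t_half) against log(m0)."""
    xs = [math.log(m) for m in m0_values]
    ys = [math.log(t) for t in half_times]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Synthetic half-times (arbitrary units), not experimental data.
m0_values = [1.0, 2.0, 4.0, 8.0]
half_times = [10.0 * m ** -1.5 for m in m0_values]

gamma = fit_scaling_exponent(m0_values, half_times)
print(f"fitted gamma = {gamma:.3f}")
```

With real data, a slope that changes across the concentration range would hint that the dominant nucleation mechanism changes too.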


Which fibrils are toxic?

It is currently thought that the prefibrillar protein oligomers, rather than the mature fibrils, could be the main toxic agents in many diseases. However, it seems that these oligomers usually do not turn into the mature fibrils; rather, only oligomers of a characteristic and rare size turn into the mature fibrils, and the other oligomers just dissociate after some finite lifetime.

Chemical kinetics in drug discovery

If we know the chemical kinetics of the formation of these things, then we also can guess how to inhibit the production of the fibrils. The problem with amyloid formation is that the most toxic forms are short-lived and are challenging to isolate and characterize.

If we find an inhibitor, we can try to use the chemical kinetic rate laws and experimental data to determine which step it inhibits. This can tell us more about the biological process involved.

Paper: Zwanzig, “Simple model of protein folding kinetics.” Proc. Natl. Acad. Sci. 92: 9801 (1995)

This is a short and influential paper on protein folding kinetics. It includes the idea of a gap in the energy spectrum, where the native state is much more favorable than the manifold of other states.

Let us assume we are dealing with a protein which easily folds to its native state, and ignore issues of metastability, roughness of the energy landscape, etc. We will set up a very generic formalism. It will be weak because the parameters in this formalism have no immediate physical meaning. It will be strong because the formalism is quite general and can be used for many things.

Suppose there are N discrete parameters, each of which takes $\nu + 1$ values. The total number of configurations is $(\nu + 1)^N$ and only one of these is “correct” for all the N parameters. Let the number of incorrect parameters be S. When $S \to 0$, the protein is correctly folded. In this sense, our reaction coordinate is how “correct” the configuration is, compared with the native state.

Thermodynamics

We need to postulate an energy E(S). One simple choice is a smooth funnel with a gap ε,

$$E(S) = SU - \varepsilon\,\delta_{S,0}, \qquad 0 \le S \le N.$$

Here $U, \varepsilon > 0$ and ε describes how favorable it is to be in the native state. If $K = \nu e^{-\beta U}$, then the partition function is

$$Z = e^{\beta\varepsilon} + (1 + K)^N - 1$$

and the equilibrium probability of having S incorrect parameters is

$$P_{\rm eq}(S) = \frac{K^S}{Z}\binom{N}{S} \quad \text{for } S > 0,$$

while the native state has $P_{\rm eq}(0) = e^{\beta\varepsilon}/Z$.

Thermodynamically, there is a “folding transition” at some temperature $T_c$ where the protein tends to be folded for $T < T_c$ and unfolded for $T > T_c$. This can be calculated thermodynamically.
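The folding transition is easy to see numerically. Here is a minimal sketch which evaluates Z and $P_{\rm eq}(S)$ from the formulas above and prints the native-state population at a few temperatures. The parameter values (N = 20, ν = 2, U = 1, ε = 10, in units where k = 1) are arbitrary choices of mine, not values from the paper.

```python
import math


def zwanzig_equilibrium(n_params, nu, u, eps, kt):
    """Return (Z, [P_eq(0), ..., P_eq(N)]) for the gapped funnel E(S) = S*U - eps*delta_{S,0}."""
    beta = 1.0 / kt
    k_factor = nu * math.exp(-beta * u)              # K = nu * exp(-beta U)
    z = math.exp(beta * eps) + (1.0 + k_factor) ** n_params - 1.0
    probs = [math.exp(beta * eps) / z]               # native state, S = 0
    probs += [math.comb(n_params, s) * k_factor ** s / z
              for s in range(1, n_params + 1)]
    return z, probs


for kt in (0.5, 1.0, 1.5, 2.0):
    _, probs = zwanzig_equilibrium(n_params=20, nu=2, u=1.0, eps=10.0, kt=kt)
    print(f"kT = {kt:.1f}: native population = {probs[0]:.3e}")
```

The native population should switch from essentially 1 to essentially 0 over a fairly narrow temperature window, which is the all-or-none behavior the gap is meant to produce.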

Kinetics

We are thinking of everything in terms of S. In an elementary step, S can change to either $S - 1$ or $S + 1$. Let us postulate an elementary rate constant

$$k_1 \sim e^{-\Delta E/kT}$$

and recall that the total number of parameters is N. Therefore the rates are

$$w(S \to S-1) = S k_1, \qquad w(S \to S+1) = (N - S) K k_1 \quad \text{for } S > 0$$

by the detailed balance law,

$$w(S \to S+1)\, P_{\rm eq}(S) = w(S+1 \to S)\, P_{\rm eq}(S+1),$$

and the rate constants hold even if the system is not in equilibrium. The extra factor of $K = \nu e^{-\beta U}$ is because the states have different energies and such, so their equilibrium occupations are different. Finally,

$$w(0 \to 1) = N K k_1 e^{-\beta\varepsilon}.$$

We can then write the usual master equation

$$\partial_t P(S) = w(S-1 \to S) P(S-1) - w(S \to S-1) P(S) + w(S+1 \to S) P(S+1) - w(S \to S+1) P(S).$$

Because the rate $w(0 \to 1)$ is slow, the bottleneck is supposed to be between $S = 0$ and $S = 1$. We can estimate the folding time using these master equations.
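A useful consistency check on these rates is that the equilibrium distribution should be stationary under the master equation, and that detailed balance should hold pair by pair. Here is a sketch of that check; the parameter values are arbitrary, and the function names are mine.

```python
import math


def equilibrium(n_params, k_factor, boltz_native):
    """P_eq(S) for S = 0..N; boltz_native stands for exp(beta * eps)."""
    weights = [boltz_native] + [math.comb(n_params, s) * k_factor ** s
                                for s in range(1, n_params + 1)]
    z = sum(weights)
    return [w / z for w in weights]


def transition_rates(n_params, k_factor, k1, boltz_native):
    """up[S] = w(S -> S+1), down[S] = w(S -> S-1)."""
    up = [(n_params - s) * k_factor * k1 for s in range(n_params + 1)]
    up[0] = n_params * k_factor * k1 / boltz_native   # leaving the native state costs the gap
    down = [s * k1 for s in range(n_params + 1)]
    return up, down


def master_rhs(p, up, down):
    """Right-hand side of the master equation, dP(S)/dt for every S."""
    n = len(p) - 1
    dp = []
    for s in range(n + 1):
        gain = (up[s - 1] * p[s - 1] if s > 0 else 0.0) \
             + (down[s + 1] * p[s + 1] if s < n else 0.0)
        loss = (up[s] + down[s]) * p[s]
        dp.append(gain - loss)
    return dp


# Hypothetical parameters: N = 10, nu = 2, beta*U = 1, beta*eps = 5, k1 = 1.
N, K, k1, gap = 10, 2 * math.exp(-1.0), 1.0, math.exp(5.0)
p_eq = equilibrium(N, K, gap)
up, down = transition_rates(N, K, k1, gap)
print("max |dP/dt| at equilibrium:", max(abs(x) for x in master_rhs(p_eq, up, down)))
```

The printed residual should be at the level of floating-point rounding. Starting instead from a fully unfolded distribution and stepping `master_rhs` forward in time would give a direct numerical estimate of the folding time.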

Meeting 8 September 2019

Ethan Kim’s paper

Inhibition of amyloid fibril formation. Amyloids can come together and form plaques, which cause disease.

• Nucleation: fibril starts to grow. Dimer or trimer of individual monomers; the nucleation step is the slow step.

• Elongation: monomers added to the end of the chain.


Drugs can be tuned to block either the nucleation or the secondary elongation phase. We can model the concentration of the aggregate over time. We would like to find an optimal dosing regime for the drug, where we want to minimize the toxicity but also make the drug work.

• To inhibit nucleation, administer immediately.

• To inhibit elongation, administer only after nucleation has finished.

There are some things which tell us how much to administer, and when. Essentially, we create an artificial cycle in which we “control” the amount of time in each cycle. If we know how harmful the toxic drug is, we can determine how long to treat the patient in each cycle.

Michael Medaugh’s paper

Michael’s paper is related to Ethan’s paper. How can we calculate the concentration vs. time curve? We write down a differential kinetic equation to solve for the concentrations, the binding and dissociation constants $k_b$ and $k_d$, and the reaction order.

How do we know the reaction order for nucleation processes?

Burk shift – once a heme group has a single oxygen, it gets easier to pick up other oxygens.

A phase space argument: when the regions occupied by two different “stable states” in phase space start to overlap, the states could switch and there would no longer be a single folded state. See Ch. 11 in Frauenfelder.

Ethan Cobb’s paper

Three general questions: folding mechanism, predict native structure from amino acid sequence, predict tertiary structure.

What balance of forces encodes the native structure? The native structure may not necessarily be the most stable structure. Folding code “distributed locally and nonlocally in the sequence.” Why can two proteins have the same native structure?

Review article: Dill, “Polymer principles and protein folding.” Protein Science 8: 1166 (1999)

This paper reviews the role of statistical mechanics and polymer theory in protein folding. It resolves two paradoxes:

• Blind watchmaker’s paradox: biological proteins could not have originated from random sequences.

• Levinthal’s paradox: folded state of a protein cannot be found by random search.


In both cases, the paradoxes are resolved by positing that the searches are (partially) guided, not wholly random. In this case, the vastness of the search space is irrelevant to the search time. In this sense, we will introduce energy landscapes and fitness landscapes.

For information on what is called the φψ interaction, which is really an enumeration of the allowed chain angles in the backbone, see https://en.wikipedia.org/wiki/Dihedral_angle#Proteins.

Importance of side-chain forces

In the early days of protein folding, with the predictions of Pauling (of α and β structures), people thought that protein folding was backbone centric, favoring certain φψ values in the Ramachandran plots, which could be calculated using certain steric repulsion arguments, and especially the hydrogen bonds.

However, recently the view of protein folding has shifted to being side-chain centric rather than backbone centric. Perhaps the greater contribution to the free energy is encoded in a delocalized “solvation” code: there are very few conformations of the full chain which can bury nonpolar amino acids to the greatest possible degree. In this sense, we have to make sure the nonpolar amino acids live in the inside of the protein macromolecule and the polar ones are on the outside, because they are in contact with water.

In this view, fast secondary structure formation is not a consequence of strong helix φψ propensities, but rather a rush to nonpolar desolvation, i.e. hiding the nonpolar amino acids on the insides. It seems that if we ignore the backbone interactions, we can reproduce many of the results of protein folding; for example, a side-chain centric view predicts collapse.

What do these models look like? One is the HP model, in which each amino acid is a bead, each bond is a straight line, bond angles are a few discrete options, we work in 2 or 3 dimensions, and the 20 amino acids are condensed into a two-letter alphabet: H (hydrophobic) or P (polar). This is amenable to statistical mechanical considerations.
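To make the HP model concrete, here is a small sketch of how one might score a conformation on a 2D square lattice. The scoring rule (each non-bonded H-H nearest-neighbor contact contributes -1) is the common convention for HP models, but the function and its argument names are my own illustration, not code from the review.

```python
def hp_energy(sequence, coords, contact_energy=-1.0):
    """Energy of a 2D lattice conformation in the HP model.

    sequence: string of 'H' and 'P'; coords: list of (x, y) lattice sites,
    one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one lattice site per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    site_to_index = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts every neighboring pair exactly once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = site_to_index.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts * contact_energy


folded = [(0, 0), (1, 0), (1, 1), (0, 1)]      # chain bent into a square
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]    # straight chain
print(hp_energy("HPPH", folded), hp_energy("HPPH", extended))
```

In the bent conformation the two H residues at the chain ends touch, so it has lower energy than the extended one. Enumerating all self-avoiding walks for short chains and scoring them this way is the brute-force route to the density of states $g$ used in the landscape arguments.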

Blind watchmaker paradox

Can proteins arise from random sequences? For a protein chain of 100 amino acids, there are $20^{100}$ sequences. However, if we ask a different question (what is the chance of creating a chain that folds to a particular structure?), the probability improves to perhaps $10^{-15}$, which is perfectly large compared to the inverse of Avogadro’s number, for example.
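The numbers here are worth a quick check. This sketch just does the arithmetic, taking the $10^{-15}$ folding probability from the text at face value; comparing it with one mole of random chains is my own way of reading the Avogadro remark.

```python
import math

AVOGADRO = 6.022e23

# Size of sequence space for a 100-residue chain over a 20-letter alphabet.
log10_n_sequences = 100 * math.log10(20)

# Probability that a random sequence folds to a given structure (value quoted above).
p_fold = 1e-15

# Expected number of such folders in one mole of random chains.
expected_folders_per_mole = AVOGADRO * p_fold

print(f"20^100 ~ 10^{log10_n_sequences:.0f} sequences")
print(f"expected folders in a mole of random chains ~ {expected_folders_per_mole:.0e}")
```

So even though sequence space is astronomically larger than any sample, a macroscopic sample of random chains would still be expected to contain a large number of sequences that fold to the target structure.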

Levinthal paradox

Both the Levinthal paradox and the blind watchmaker paradox are resolved by going from the “old” to the “new” views of protein folding:


Ensemble perspective of protein folding

In the ensemble view, the vastness of the search is irrelevant, and the most important problem is kinetic trapping. Chains can sort quickly through vast stretches of conformational space, like a ball rolling down a bumpy funnel. In this view, if the kinetics is fast, i.e. a two-state phase transition, then the chain is folding at nearly its maximum possible diffusion-limited speed, without kinetic traps. The stable intermediates are the kinetic traps which slow down the folding process.

One way to think about this is that even a very small bias can be the difference between a lifetime $t \to \infty$ and a lifetime t on the order of milliseconds. Imagine a golf course: the ball never finds the hole by random processes, but if there is a small tilt that funnels towards the hole, it always finds the hole.

Where does the energy funnel come from?

The first estimates of the shapes of folding energy landscapes were based on mean-field theories. Hydrophobic collapse (i.e. the nonpolar residues hide in the middle) leads to compact chain conformations. Therefore, the funnel arises because the drive to collapse is also a drive towards a reduced number of configurations, in which some of the amino acids are “taken out” of the game because they are hiding in the middle. The fraction of conformations which are compact is infinitesimal compared to the total conformational space. This apparently can be predicted using Flory-Huggins theory. So the fewer nonpolar-polar interactions, the lower the energy, and the fewer configurations there are. In this sense, collapse can be fast.

Energy landscape model

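Suppose we have a chain of length $n$, of which $m$ residues are nonpolar. The progress variable $\xi = 0, \dots, m$ reflects the extent of folding. To first approximation, we use $E(\xi) = \xi\varepsilon$, where $\varepsilon$ is the energy per hydrophobic contact, and

\[ F_{\mathrm{macro}}(\xi) = -kT \ln \Omega = -kT \ln\left[ g(\xi)\, e^{-E(\xi)/kT} \right] = E(\xi) - kT \ln g(\xi), \]

where $g(\xi)$ is the number of conformations having $\xi$ hydrophobic contacts. With the entropy $S(\xi) = k \ln g(\xi)$, this can be rewritten as

\[ F = E - TS. \]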


Often there is no single reaction pathway. Different unfolded proteins start in different initial states and generally encounter different metastable states. This is the idea of hierarchical folding, primary to secondary to tertiary structure, in contrast to a unique sequential pathway. However, most simulations have found preferred folding routes in the late stages of folding.

Is there a fluctuation-dissipation relationship for protein equilibrium processes?

Paper: Shakhnovich and collaborators. “How does a protein fold?” Nature 369 248 (1994)

This is a simulation paper which uses lattice Monte Carlo to find the global minimum energy, or native state. The native state should be a pronounced global minimum of the energy surface. Note: the description of the process is true only for short proteins (perhaps 30 amino acids long).

Results: Folding starts by a rapid collapse from a random-coil state to a random semi-compact globule. It then proceeds by a slow, rate-determining search through the semi-compact states to find a transition state from which the chain folds rapidly to the native state. There are many transition states and also a reduced number of conformations which need to be searched in the semi-compact globule.

There is a critical temperature $T_c$. For $T > T_c$, energy and entropy decrease smoothly as the chain approaches the native state. For $T < T_c$, the free-energy reaction profile is rugged and folding is thus slow.

Figure 3: $T > T_c$


Figure 4: $T < T_c$

We should investigate why the plots look so different for different temperatures. Probably it is due to the $F = H - TS$ term: the temperature $T$ changes the slope of the $S$-profile. This would be a nice project to do, generalizing this idea to arbitrary degrees of freedom. We could introduce a distribution function $f$ and investigate the effect of temperature, accounting for the quantum tunneling effects at low temperature, similar to the hydrogen adsorbates quantum diffusion project. We could also calculate, theoretically, the diffusion constant in phase space and write down a kinetic equation. The average folding time could be extracted by averaging the folding time over all reasonable initial configurations. This would eliminate the effect of metastable “potholes” and emphasize the effect of metastable “moats” which surround the entire native-state region.

Paper: Shakhnovich and collaborators. “Phase diagram of a model protein derived by exhaustive enumeration of the conformations.” J. Phys. Chem. 101 1444 (1994)

It is suggested that the phase space of the protein, away from the native state, is complex. There are several states. Some possible regimes are the random coil; the native state; the homopolymer-like globule, where the chain is relatively compact but fluctuates between a large number of collapsed configurations; and the molten globule, where the backbone has attributes of the native structure but the sidechains are still free to rotate.

Paper: Liu, et al. “Toward a quantitative theory of intrinsically disordered proteins and their function.” PNAS 106 47: 19819 (2009)


From the abstract: A large number of proteins are sufficiently unstable that their full 3D structure cannot be resolved. The origins of this intrinsic disorder are not well understood, but its ubiquitous presence undercuts the principle that a protein's structure determines its function. Here we present a quantitative theory that makes predictions regarding the role of intrinsic disorder in protein structure and function. In particular, we discuss the implications of analytical solutions of a series of fundamental thermodynamic models of protein interactions in which disordered proteins are characterized by positive folding free energies.

Many proteins are not stable enough for experimental methods to resolve their full 3D structure. We call these disordered proteins, and it has been suggested that their disorder plays a functional role, allowing for multiple interaction partners and functional diversity. Apparently this is important in cell signaling and cancer.

This paper tries to look for ways in which disorder influences protein function. Proteins in an in vivo environment may have disorder in their long loops, chain termini, hinge regions, and sometimes their entire sequences. We will work especially with the folding free energy $\Delta G_f$ and the dissociation constant $K_d$ of the protein-self interaction. A positive folding free energy,

\[ \Delta G_f > 0, \]

corresponds to a disordered protein. Actually, the folding free energy may be positive only for a particular part of the amino acid chain; the term “disordered” means that part of the chain refuses to fold. For a very nice summary, see https://en.wikipedia.org/wiki/Intrinsically_disordered_proteins.

Paper: Wolynes and collaborators. “Diffusive Dynamics of the Reaction Coordinate for Protein Folding Funnels.” J. Chem. Phys. ?? (1996)

We describe the protein folding kinetics using a collective reaction coordinate, which moves diffusively in the phase space. The analytic calculations agree with simulation results, and support the idea of an energy funnel.

The different ensembles of partially ordered structures on the path from unfolded to folded can be described by one or more collective reaction coordinates or order parameters. Because the reaction coordinate is a “macroscopic” object associated with many microstates, we have to use the free energy instead of the Hamiltonian, to account also for the number of configurations (i.e. the entropy):

\[ F(\xi) = -\beta^{-1} \ln\left[ \Omega(\xi)\, e^{-\beta H(\xi)} \right] \;\Longrightarrow\; F = H - TS. \]

An important characteristic of the funnel is that the energy and entropy change as you move along the reaction coordinate: as you get closer to the folded state, the entropy decreases, and so does the energy. Since the free energy $F$ is the driving force, this means that entropy and energy tend to work in opposite directions: the folded states are more energetically favorable, but there are fewer of them. At high temperatures, folding is an uphill process, so folding is exponentially suppressed. At the folding phase transition temperature $T_f$, the free energy profile is usually bistable with a small thermodynamic barrier. At low temperatures, folding is a downhill process.
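The three temperature regimes can be reproduced with a toy free-energy profile. The functional forms below (a cooperative energy $E = -\varepsilon\xi^2$ and a linearly decreasing entropy $S = s_0(1-\xi)$, with $\xi \in [0,1]$ and $k_B = 1$) are illustrative assumptions, not taken from the paper; they are chosen only because they give a barrier between the unfolded end ($\xi = 0$) and the folded end ($\xi = 1$).

```python
import numpy as np


def free_energy_profile(xi, temperature, epsilon=1.0, s0=1.0):
    """F(xi) = E(xi) - T*S(xi) for a toy funnel, xi in [0, 1] (1 = native).

    Assumed forms (illustrative only):
      E(xi) = -epsilon * xi**2   cooperative energy gain on folding
      S(xi) = s0 * (1 - xi)      entropy lost linearly on folding
    Units are chosen so that k_B = 1.
    """
    energy = -epsilon * xi**2
    entropy = s0 * (1.0 - xi)
    return energy - temperature * entropy


def classify(profile):
    """Label a profile by its monotonicity along the reaction coordinate."""
    steps = np.diff(profile)
    if np.all(steps > 0):
        return "uphill"
    if np.all(steps < 0):
        return "downhill"
    return "barrier"


xi = np.linspace(0.0, 1.0, 101)
t_fold = 1.0  # equals epsilon / s0: F(0) == F(1) at this temperature
for temperature in (3.0 * t_fold, t_fold, 0.005 * t_fold):
    print(temperature, classify(free_energy_profile(xi, temperature)))
```

In this toy model the two ends of the profile are degenerate at $T_f = \varepsilon/s_0$ with a barrier in between, the profile rises monotonically towards the folded state once $T > 2\varepsilon/s_0$, and it becomes essentially downhill as $T \to 0$.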


The gradient of the free energy $F$ determines the average drift up or down the funnel; superimposed on this drift is a stochastic motion whose statistics depend on the jumps between local minima; if the minima are large they may be metastable. To first approximation, this process can be described as diffusion.

The paper shows that the theoretical arguments agree realistically with fast-folding small helical proteins.

“Good” folding sequence

We define a good folding sequence to be one which has a single dominant folding funnel and does not have large metastable traps. For very small proteins, we can use just a single order parameter; for real proteins, we may have to use several.

Some ideas for the order parameter are

• Degree of collapse

• Helicity, or how much of the random coil has been added to the helix

• Fraction of correct contacts (i.e. shielding the nonpolar amino acids from the polar aqueous environment)

• Dihedral angles φ and ψ.

We must introduce the order parameters $\vec{\xi}$ to calculate the free energy surface $F(\vec{\xi})$. In the case of a single dominant funnel, the order parameters can also be associated with reaction coordinates.

In the single funnel case, the folding time is determined by (1) the difficulty of overcoming the free energy barrier and (2) the ruggedness of the landscape, which enters via the diffusion coefficient.

Diffusion equation for the order parameter

We assume that the reaction coordinate can change only by small steps. In this case, the master equation for the order parameter is

\[ \partial_t P(n, t) = \partial_n \left[ D(n) \left( \partial_n P(n, t) + P(n, t)\, \partial_n \bigl( \beta F(n) \bigr) \right) \right]. \]

The average direction of the flow is given by the gradient of the free energy, $F(n)$. The diffusion constant depends on the ruggedness of the energy landscape.
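As a consistency check on this equation (a standard property of drift-diffusion equations of this form, worked out here rather than quoted from the paper), write the right-hand side as minus the divergence of a probability current $J$. The Boltzmann distribution makes $J$ vanish identically, so it is stationary for any $D(n)$:

```latex
\begin{align*}
J(n,t) &= -D(n)\left[\partial_n P(n,t) + P(n,t)\,\partial_n\bigl(\beta F(n)\bigr)\right],
\qquad \partial_t P = -\partial_n J, \\
P_{\mathrm{eq}}(n) \propto e^{-\beta F(n)}
&\;\Longrightarrow\; \partial_n P_{\mathrm{eq}} = -P_{\mathrm{eq}}\,\partial_n\bigl(\beta F\bigr)
\;\Longrightarrow\; J = 0 .
\end{align*}
```

The diffusion coefficient therefore affects only how fast equilibrium is reached, not the equilibrium populations themselves.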

The folding time is expressed as an integral,

\[ \tau_f = \int_{n_{\mathrm{unf}}}^{n_{\mathrm{fold}}} dn \int_0^{n} dn' \, \frac{e^{\beta \left( F(n) - F(n') \right)}}{D(n)}. \]

Near $T_f$, $F(n)$ is approximately a double well. The two minima of $F(n)$ are near $n = n_{\mathrm{unf}}$ and $n = n_{\mathrm{fold}}$, respectively. In this case, we can calculate $\tau_f$ from Kramers' rate formula.
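The double integral for $\tau_f$ can also be evaluated numerically. The sketch below assumes a hypothetical quartic double well for $\beta F(n)$ with minima at $n = 0$ (unfolded) and $n = 1$ (folded), a constant diffusion coefficient $D = 1$, and $n_{\mathrm{unf}} = 0$; none of these choices come from the paper.

```python
import numpy as np


def cumulative_trapezoid(y, x):
    """Running trapezoid-rule integral of y(x), starting from 0 at x[0]."""
    increments = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(increments)))


def folding_time(beta_f, diffusion, n):
    """tau_f = int dn [exp(beta F(n)) / D(n)] * int_0^n dn' exp(-beta F(n')).

    beta_f and diffusion are arrays sampled on the grid n, which runs from the
    unfolded minimum (taken as n = 0 here) to the folded minimum.
    """
    inner = cumulative_trapezoid(np.exp(-beta_f), n)
    outer_integrand = np.exp(beta_f) * inner / diffusion
    return cumulative_trapezoid(outer_integrand, n)[-1]


def double_well(n, barrier):
    """Hypothetical beta*F(n): minima at n = 0 and n = 1, height `barrier` at n = 1/2."""
    return 16.0 * barrier * n**2 * (1.0 - n) ** 2


n = np.linspace(0.0, 1.0, 2001)
d = np.ones_like(n)  # constant D = 1 (assumption)
for barrier in (0.0, 2.0, 5.0):
    print(barrier, folding_time(double_well(n, barrier), d, n))
```

For a flat landscape the integral reduces to the free-diffusion result $L^2/2D$ with $L = 1$, and the folding time grows rapidly as the barrier (in units of $kT$) is raised, which is the behaviour a Kramers-type estimate describes.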


The local diffusion coefficient depends on both the local moves allowed to the protein and the energy landscape. According to the Bryngelson-Wolynes analysis, the diffusion coefficient has a strong $T$-dependence which arises from the local minima on the rough energy landscape.

At high temperatures, $D$ follows a Ferry law, as for spin glasses,

\[ D(T, n) = D_0\, e^{-\beta^2 \Delta E^2(n)}, \]

where $\Delta E^2(n)$ is the local mean square fluctuation in energy.
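To see what this temperature dependence implies, take the logarithm of the formula (a one-line rearrangement using $\beta = 1/kT$):

```latex
\[
\ln \frac{D(T,n)}{D_0} = -\frac{\Delta E^2(n)}{(kT)^2} .
\]
```

So $\ln D$ is linear in $1/T^2$, rather than in $1/T$ as for a simple Arrhenius law $e^{-E_a/kT}$; the slowdown on cooling is therefore steeper than Arrhenius.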

Michael’s comments on my talk

(1) Account for all the different kinds of forces. What this means is we have to be careful when estimating fluctuation magnitudes.

Michael’s talk

A protease is an enzyme that cuts a protein. If you want the protein to have a slow rate of degradation, such that the protease can't find the end of the protein to attach, then you can tuck the end of the protein into the center of the body. Sometimes aggregation happens, when the tail of one protein gets locked into the inter-digitation site of another protein. Then the proteins get stuck together; this happens at high protein density.

Michael's idea is that we can find a probability distribution for the length of linear aggregates of proteins which have folded with each other. This is similar to the probability distribution of micelle size in Israelachvili's Intermolecular and Surface Forces. The two competing effects are the ability to find the interactions and the energetic favorability.

Another idea is studying the amyloid oligomers relevant to Alzheimer's disease. We would like to quantify why oligomers of only a certain size tend to come together.

Also, Michael had something about a muco-adherent drug depot in the stomach, and maybe we can calculate rates of diffusion of the drug into the bloodstream.

Ethan’s talk

Ethan read mathematical models of drug delivery.

The kinds of models are empirical and semiempirical models, which are based on statistics, and mechanistic models, which are based on physics or chemical kinetics. A mechanistic model's parameters, such as the reaction rate, can be fitted to empirical data.

With drugs, usually the delivery system is based on capsules, like pills. For some drugs, there is a membrane surrounding the drug which controls how quickly the drug is released. For some other release mechanisms, the release of the drug depends on the concentration of the drug in the body. Other drug releases are based on a polymer network where the drug is trapped in a lattice. In the right environment, the lattice can swell or degrade, so the drug comes out.

How can we model these effects? (1) Simulation methods, i.e. Monte Carlo (2)

The release profile can be plugged into the differential equations from Maha's paper to describe the cost function (patient suffering) and the differential rate
