Lectures on Physical Biochemistry
Todd Yeates
© 2015
For Melissa
Preface
The following text derives from material presented in a course in physical biochemistry at UCLA
(Chemistry and Biochemistry 156). Much of the material owes its origin to lectures delivered by
other faculty members at UCLA who were my teachers and mentors. These include Emil Reisler,
Wayne Hubbell, and especially Doug Rees, for whom I served as a TA for multiple offerings of the
course while I was a graduate student. With regard to other books upon which the material rests,
the classic text Physical Biochemistry with Applications to the Life Sciences by Eisenberg and
Crothers stands out as the most influential. Other texts from which selected materials have been
extracted include: Physical Biochemistry by van Holde; Physical Chemistry: Principles and
Applications in Biological Sciences by Tinoco, Sauer, Wang, Puglisi, Harbison, & Rovnyak; The
Molecules of Life: Physical and Chemical Principles by Kuriyan, Konforti, and Wemmer; Molecular
Driving Forces by Dill; and Random Walks in Biology by Berg.
I am indebted to the students and TAs who have participated in the course over many years and have
made developing and teaching the material a stimulating and rewarding challenge. I am particularly
indebted to Sunny Chun, who proofread the first draft.
Contents
Chapter 1
Points for Review
o Thermodynamic systems
o Systems and surroundings
o The 1st law
o Work, w
o Heat, q
o Enthalpy, H
o The 2nd law
o Classical and statistical views of entropy
Entropy and the Distribution of Molecules in Space
Entropy and the Distribution of Molecules Among Energy Levels
Chapter 2
Entropy of Mixing and its Dependence on Log of Concentrations
o Stirling’s approximation
o ‘Entropy of Mixing’
Gibbs Free Energy, G
o A state variable that indicates the favorability (or equilibrium) of a process at
constant T & P
o ΔG as a balance of two factors, ΔH and TΔS
o How to think about ΔG in a steady state process
o Free energy of mixing: further insight into what drives processes towards
equilibrium
Chapter 3
Chemical Potentials, µ
o Definition of µ as a partial derivative of G with respect to composition
o Dependence of chemical potentials on concentrations and standard state chemical
potentials µ0
o The total differential, dG as a function of changes in composition
o Equilibrium conditions in terms of µ’s
o Equilibrium conditions in terms of concentrations and standard chemical potentials:
arriving at familiar equations for the equilibrium constant
o Importance of units
o Precautions about ΔG vs ΔG0, reactions with changes in stoichiometry, and overall
concentration effects
o The dependence of ΔG and K on T (van’t Hoff equation)
Chapter 4
Non-ideal behavior in mixtures
o The breakdown of ideal equations for chemical potential
o Activities and activity coefficients
o The ideal behavior of highly dilute solutions
o The origin of non-ideal behavior at higher concentrations
o Reworking the equilibrium equations in terms of activities instead of concentrations
Ion-ion interactions in solution as an example of non-ideal behavior (Debye-Hückel
theory)
o Ionic strength and the Debye length
o Activity coefficients for ionic species
o Using ionic activity coefficients to analyze the effect of charge on molecular
association, and electrostatic screening
Molecular crowding and excluded volume effects as an example of non-ideal
behavior in solutions of macromolecules
o The idea of excluded volume
o The peculiar behavior of rigid elongated structures
Chapter 5
Chemical Potential and Equilibrium in the Presence of Additional Forces
o Osmotic pressure
o Equilibrium sedimentation
Chapter 6
Electrostatic potential energy, ion transport, and membrane potentials
o The chemical potential energy of an ion at a position of electrostatic potential
o The Nernst equation and membrane potential
o The Donnan potential
o Variable ion permeabilities and complex phenomena
Molecular Electrostatics
o The dielectric value
o Simplified electrostatics equations
o A different kind of electrostatic energy: the Born ‘self-charging energy’
o Free energy of ion transfer
Chapter 7
Energetics of Protein Folding
o A balance between large opposing forces
o Terms that contribute to the energetics of protein folding
o The special case of membrane proteins
Measuring the Stability of Proteins
Ideas Related to How Proteins Reach their Folded Configurations
Chapter 8
Describing the Shape Properties of Molecules
o Radius of gyration
o Persistence length for flexible chains
Chapter 9
A Brief Introduction to Statistical Mechanics for Macromolecules
o Probabilities and expected values
o Statistical weights for outcomes with unequal probabilities
o Handling degeneracies
A Statistical Mechanics Treatment of the Helix-Coil Transition for a Polypeptide
Chapter 10
Cooperative Phenomena and Protein-Ligand Binding
o Relationship between cooperative behavior and processes involving formation of
multiple interactions simultaneously
o Protein-ligand binding equilibria
o Binding to an oligomeric protein – independent binding events, no cooperativity
o Non-linear Scatchard plots – non-identical or non-independent binding sites
o Experiments for measuring binding
o Phenomenological treatment of cooperative binding – the Hill equation
o Physical models of cooperative binding - MWC
Allostery
Chapter 11
Symmetry in Macromolecular Assemblies
o Definition of symmetry
o Mathematical groups
o Point group symmetries for biological assemblies
Special Topics in Symmetry
o Helical Symmetry (non-point group)
o Quasi-equivalence and the structure of icosahedral viral capsids
o Using symmetry to design novel protein assemblies
o Algebra for describing symmetry
Chapter 12
Equations Governing Diffusion
o Diffusion in 1-D
o General equations for diffusion
o Special topic: Using numerical (computational) methods to simulate diffusion
behavior
Chapter 13
The Diffusion Coefficient: Measurement and Use
o Measuring the diffusion coefficient, D
o Relating the diffusion coefficient to molecular size
Special Topic in Diffusion: Diffusion to Transporters on a Cell Surface
Chapter 14
Sedimentation velocity, v
o Sedimentation coefficient, s
o Combining s and D to get molecular weight without a spherical assumption
o A summary of molecular weight determination from sedimentation and diffusion
measurements
Chapter 15
Chemical Reaction Kinetics
o Reaction velocity, v
o Rate laws
o Integrating rate laws
o Behavior of more complex reaction schemes
o Numerical computer simulation of more complex reaction schemes
o Enzyme kinetics under a steady-state assumption
o Relaxation kinetics: how systems approach equilibrium
o Kinetics from single molecule studies
Chapter 16
Kinetic Theories and Enzyme Catalysis
o The Arrhenius equation
o Eyring transition state theory
o Catalysis by lowering the transition state energy
o Practical consequences of enzymes binding tightly to the transition state
o Kinetic parameters of natural enzymes
Chapter 17
Introduction to Biochemical Spectroscopy
o Energy transitions
o Fluorescence
o Kinetics of fluorescence and competing routes for return to the ground state
Chapter 18
Special Topics in Biochemical Spectroscopy
o Polarization and selection rules
o Fluorescence experiments with polarized light
o Fluorescence resonance energy transfer (FRET)
o FRET in biology
o Spectroscopy of chiral molecules: Optical rotation and circular dichroism
Chapter 19
Macromolecular Structure Determination and X-ray Crystallography
o The limiting effect of wavelength
o Diffraction geometry
o Obtaining the atomic structure
o Protein Crystallization
CHAPTER 1
Points for Review
Thermodynamic systems
We are all familiar with the everyday behavior of various kinds of mechanical systems. This often
aids us in understanding the behavior of molecules, which are indeed governed by the laws of
physics. But there are also key differences to bear in mind between the physical behavior of systems
at the macroscale and the thermodynamic behavior of molecular systems. Analogies can be drawn
to a bowling ball at the top of a hill. We know that if pushed it will go to the bottom of the hill and
(eventually) stay there. Once it has come to (apparent) rest we don’t worry about it suddenly moving
under its own internal energy to a higher location. Or likewise for a textbook sitting on a desk. We
wouldn’t think about measuring how far on average it finds itself levitating above its lowest energy
position on the desktop. But these sorts of ideas arise constantly in thinking about the behavior of
molecules. Why? The distinction is largely one of scale, having to do with relative sizes, forces, and
energies. The essence is that in molecular systems (at temperatures sufficiently above 0 K), the
magnitude of the thermal energy is comparable to energy differences associated with meaningful
differences in the properties of the molecules, such as their velocities and detailed three-dimensional
conformations.
We will emphasize throughout the course the importance of the idea of an ‘average thermal energy’,
which is kBT, where kB is Boltzmann’s constant (or alternatively RT when working with molar
quantities, where R is the universal gas constant, R = NAkB, and NA is Avogadro’s number). If we
accept the idea that physical objects exhibit energies on the scale of kBT, then how high might we
expect a 1kg textbook to levitate off the desktop under its own thermal energy at 298K? [Hint: equate
gravitational potential energy for the book at a height h with the energy value of kBT]. You will
(hopefully) find that that height is infinitesimally small, which is consistent with experience. But
owing to the much smaller energies that affect molecules, big or small, kBT is an energy sufficient to
drive the rapid movements, collisions, conformational changes, and chemical reactions that
characterize molecular systems.
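The hint can be carried through numerically. A minimal sketch in Python (values in SI units; the 1 kg mass and 298 K temperature come from the exercise above):

```python
# How high could a 1 kg textbook "levitate" on thermal energy alone?
# Equate gravitational potential energy m*g*h with the thermal energy kB*T.
kB = 1.380649e-23  # Boltzmann's constant, J/K
T = 298.0          # temperature, K
m = 1.0            # mass of the textbook, kg
g = 9.81           # gravitational acceleration, m/s^2

h = kB * T / (m * g)  # height at which m*g*h equals kB*T
print(h)              # ~4.2e-22 m, far smaller than an atomic nucleus
```

As expected, the height is infinitesimally small, which is why thermal motion is imperceptible for macroscopic objects but dominant for molecules.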
Systems and surroundings
In thermodynamics it is important to keep in mind what is being considered as the system under
investigation. Everything else is the surroundings. When discussing thermodynamic quantities (P, V,
U, …) we are referring to measurements and properties of the system, but depending on the situation
the surroundings may be important in exchanging energy (in the form of work or heat) or material
with the system. In a closed system, no exchange of material occurs. In an isolated system, there is
no exchange of material or heat or work. In some problems, the entire universe might comprise the
system. In that case there are no surroundings with which exchange might occur, so the universe
would follow the same rules as an isolated system.
The 1st law
The first law of thermodynamics expresses a law of energy conservation, namely that the energy
change in a system equates to whatever energy is delivered to it by (and thereby lost from) the
surroundings. Therefore, the change in energy (ΔU) of a system during some process is given by the
amount of heat that is transferred to it from the surroundings plus the amount of work done on it by
the surroundings.
ΔU = q + w
For an isolated system, q = 0 and w = 0, so ΔU = 0
The first law is relatively easy to appreciate, since we’re familiar with conservation laws in other
contexts, e.g. conservation of mass, or the conservation of total energy in mechanical systems. Besides
conveying an important conservation principle, the first law serves as a reminder about equations
for work and heat.
Work, w
You’ll recall from physics that work is force integrated over distance or displacement:
w = ∫F dx
Against pressure: F = PA and dV = A dx, so w = ∫PA(1/A) dV = ∫P dV [As written this work would have the
sense of work done by a system whose volume was changing.]
Against a harmonic spring with force constant k, w = ∫F dx = ∫kx dx = (1/2)kx²
And likewise for any situation where a function for the force on a molecule can be written (possibly
depending on position). We can integrate over position to give the work energy that would be done
on the molecule as a function of its position.
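As a quick illustration of the work integral, the spring case can be checked numerically; the values of k and the displacement in this sketch are arbitrary:

```python
# Numerically integrate F(x) = k*x over displacement with the midpoint rule
# and compare to the closed form (1/2)*k*x^2.
k = 2.0       # force constant (arbitrary units)
x_max = 1.5   # final displacement

n = 10000
dx = x_max / n
w = sum(k * (i + 0.5) * dx * dx for i in range(n))  # sum of F(x)*dx at midpoints

print(w, 0.5 * k * x_max ** 2)  # both give ~2.25
```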
Heat, q
Other things being held constant, we often associate heat transfer with temperature change, and the
heat capacity, C, relates those two changes. Recall
C = dq/dT (or Δq/ΔT) and
q = ∫C dT
The heat capacity is a measure of how hard it is to change the temperature by adding heat. From
introductory physical chemistry you’ll recall that for a monatomic ideal gas Cv = (3/2)R (on a per mole basis).
For complex molecules, the molar heat capacity is higher. For an ideal gas, the energy of the system
is associated solely with the kinetics (i.e. velocities) of the molecules (which are presumed rigid in an
ideal gas model). More complex molecules like biological macromolecules have very many ‘internal
degrees of freedom’, which are required to specify the positions and movements of atoms relative to
each other in the same molecule. Recall that macromolecules are subject to all kinds of
conformational fluctuations, mostly small but some very large. You might recall that the
equipartition theorem tells us that energy in the amount (1/2)kBT will be partitioned equally into each
of the quadratic degrees of freedom in a system, which means that systems comprised of complex molecules will
require more heat energy to raise the temperature owing to the much greater number of degrees of
freedom into which the energy gets partitioned.
Enthalpy, H
Note from the form of the equation ΔU = q + w that at constant volume (so no ‘PV’ work is done)
heat transfer q relates closely to internal energy U. [If w = 0 then ΔU = q, or dU = dq]
But if the volume is not constant, the heat transfer relates better to another thermodynamic state
variable, H, the enthalpy, given by H = U + PV
Differentiating H = U + PV gives
dH = dU + PdV + VdP = dq + dw + PdV + VdP
At constant P (dP=0) and with only ‘PV’ work, dw = –PdV [The sign appears negative here as the
work w must refer to the work done on the system, whereas our earlier equation for w had the
opposite meaning]
Substituting dw=-PdV into the previous equation, we see that at constant pressure and only PV work,
dH = dqP – PdV + PdV + 0, giving
dH = dqP (with the P subscript denoting what is held constant). So enthalpy and q are closely related
at constant P.
Note that especially for gases, where pressure and volume changes (e.g. as a function of temperature)
are substantial, U and H (which differ from each other by the term PV) may be substantially different.
But in other systems where pressure and volume changes are minimal, our intuition about what
enthalpy and internal energy mean tends to be closer. This is the case for many of the kinds of
systems like solutions of macromolecules that we will be thinking about, and in those cases a fair
view is that the enthalpy embodies all the kinds of molecular forces of attraction and repulsion
between molecules that we’re familiar with. And in terms of ‘favorable’ vs. ‘unfavorable’, a high value
of H implies high energy or unfavorable interactions, while a low value of H implies favorable
interactions.
This means that we can often learn something useful about the forces and interactions that exist in a
system, for example a purified protein in solution, by measuring enthalpy changes. An experimental
method known as differential scanning calorimetry is often used to make those kinds of
measurements. A sample is slowly heated and the heat transfer required to produce each small
incremental increase in temperature is recorded. The difference is taken relative to a blank, which
would contain the solution and buffer but not the protein. If performed at constant pressure, that
recorded quantity, dqP /dT, is the heat capacity at constant pressure, CP. And from above, dqP /dT =
dH/dT = CP, and dH = CP dT. That means ΔH for a process can be obtained by integrating the heat
capacity over the course of a temperature increase: ΔH = ∫CP dT
The example illustrated here is from
a thermal unfolding experiment on a
purified protein, carboxypeptidase A.
At low temperature the protein is
folded natively. At high temperature
the protein is unfolded. The
relatively flat parts of the curve in
those two regions are simply
reflecting the heat or enthalpy
change associated with increasing
the temperature (local vibrations for
example) of the protein. But the
region in the middle shows a
dramatic increase in the heat
capacity. This corresponds to the energy required to convert the folded protein to its unfolded form.
Favorable molecular interactions are broken in the process and the overall enthalpy change is
positive. The area of the shaded region is attributed to the ΔH for the protein unfolding transition.
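The integration step can be sketched in code. The Gaussian-shaped excess heat capacity below is synthetic stand-in data (not actual carboxypeptidase A measurements), chosen so that the area under the peak is a known value:

```python
# Recover the enthalpy of a thermal transition by integrating the excess heat
# capacity over temperature, using the trapezoid rule on synthetic data.
import math

T = [310.0 + 0.1 * i for i in range(400)]   # temperature scan, K
Tm, width, dH_true = 330.0, 2.0, 300e3      # assumed midpoint, peak width, J/mol
Cp = [dH_true / (width * math.sqrt(2 * math.pi))
      * math.exp(-0.5 * ((t - Tm) / width) ** 2) for t in T]  # excess Cp, J/(mol K)

dH = sum(0.5 * (Cp[i] + Cp[i + 1]) * (T[i + 1] - T[i])
         for i in range(len(T) - 1))        # trapezoid-rule area under the peak
print(dH)                                   # ~3.0e5 J/mol, the known area
```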
The Second Law
The second law presents much greater challenges to understanding than the first. Rather than stating
a law of conservation, it defines a directionality in which processes will naturally proceed (in time).
In that sense the second law enforces the ‘arrow of time’. The second law tells us that the total
entropy S in the universe (i.e. for any system plus its surroundings) is always increasing in time. And
this is likewise the case for an isolated system (since it can be viewed as its own universe).
So, for a spontaneous process (i.e. a process that would occur in the forward direction) occurring in
an isolated system, or for the universe as a whole, ΔS > 0. Likewise, ΔS = 0 describes a process at
equilibrium, i.e. with no net conversion forward or backward. In that sense the condition of
equilibrium can be seen as an optimization problem. At equilibrium S is a maximum and dS = 0 with
respect to forward or backward progress of the imagined process.
It is often tempting to forget about the conditions or restrictions under which various
thermodynamic equations hold true, but it is vital to understand that the equation ΔS > 0 (for
spontaneous occurrence) requires the condition of an isolated system. In fact failure to understand
this vital requirement is the source of much confusion among the public and lay-scientists about
whether the development of life on Earth and the associated increase in order and molecular
complexity – an idea we will tie to entropy shortly – violates the second law of thermodynamics (and
thereby requires a creator). The folly in the argument is that the Earth is by no means an isolated
system, and in fact the delivery of light energy from the Sun to the Earth to drive photosynthesis is
essential for the chemical conversions that support life on Earth.
Classical view of entropy
From classical thermodynamics you learned that dS = dqrev/T, where dqrev is the heat transferred during
a reversible infinitesimal step in a process. This view of entropy is extremely useful for
understanding processes of heat transfer and expansion in gases. We learn that entropy increases
when gas volumes expand, and when heat is transferred from a hotter object to a colder object; those
processes are naturally favorable or spontaneous.
Statistical description of entropy
A way of stating the second law from a statistical thermodynamics view is that processes tend toward
maximum disorder or randomness, i.e. to configurations that can be realized in the greatest number
of ways. This view can be reconciled intuitively with the classical view – gas expansion allows for
greater freedom and less order with regard to the positions of atoms, and heat transfer from hot to
cold molecules decreases the order in the sense that the distinction between some molecules having
more thermal energy than others is removed. The intuitive relationship between the classical and
statistical views of entropy can be formalized mathematically but we will not attempt that here.
Instead, without further proof the statistical view of entropy is
S = kB lnW
where W is a measure of disorder or randomness that can be interpreted as the number of distinct
configurations that correspond to a given state. This is sometimes referred to as the number of
microstates. [Note that some texts use Ω instead of W].
In this view, the requirement that entropy increases means that favorable states are those that can
be realized in the greatest number of ways. We will set up some highly simplified problems to see
how the statistical view of thermodynamics helps explain some basic molecular phenomena.
Entropy and the distribution of molecules in space
Let’s look at what the statistical view of entropy tells us about the way molecules tend to be
distributed in space. We’ll first consider a very tiny problem, too tiny really to qualify as a proper
thermodynamic system, but still informative. Suppose we have 4 molecules or particles that are
identical, but we can label or number them to make it possible to distinguish between microstates
within a given state. The system consists of a box with two chambers, a left side and a right side.
Suppose we describe the state of the system according to the number of molecules that are on the
left side vs the right side. We can let nL be the number of particles on left side and nR be the number
on the right. For each possible state (i.e. a defined number of molecules on each side), we can
enumerate the number of distinct ways or microstates (W) by which each state can be achieved by
choosing distinctly labeled molecules.
For some of the states, the value
of W is obvious enough. For
example, for state B, which has
just one molecule on the right
(nL=3), any of the 4 molecules
can be chosen to place on the
right, and so W=4. Likewise for
state D for which nL=1. The
case of nL=2 is harder. How
many ways can we divide or
partition a group of four objects
into a first subset of 2 (to place
on the left) and a second subset
of 2 (to place on the right)? The answer is 6, which comes from 4!/(2!2!) = 24/(2*2) = 6. This is a
combinatorial expression closely related to the permutation equation that says the number of ways
of ordering n objects is n factorial, or n!. Why in the case above do we divide 4! by 2! and 2!? One
way to see this is as follows. How many ways can 4 objects be ordered (e.g. in a line)? The answer is
4! or 24. Now let’s say that each of these 24 ways of writing down the molecules in order (e.g. 3 1 2
4) automatically assigns two to the left side (in this case 3 and 1) and two to the right side (2 and 4).
But you can see that the total set of 24 possible orderings overcounts the number of distinct outcomes
in the sense that there are other orderings that give the same partitioning. For example (1 3 2 4) is
the same partitioning as (3 1 2 4). If the same two particles are on the left, their separate ordering is
irrelevant. Since the state in question has two molecules on the left, and the number of ways of
ordering 2 objects is 2! or 2, we need to divide the total number of 24 orderings by 2. For the same
reason, the ordering of molecules within the right side doesn’t matter either, and so we must divide
again by 2! This gives us the value 6 we expect.
Thinking about problems like this in terms of partitioning between 2 (or more) groups is powerful.
The general equation for the number of distinct partitionings of N objects between a first group of n1
and a second group of n2 (with n1 + n2 = N) is
W = N!/(n1! n2!)
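The counts worked out above are easy to verify with Python's built-in binomial coefficient:

```python
# Check W = N!/(n1! n2!) for the four-molecule, two-chamber problem.
from math import comb

W_values = [comb(4, nL) for nL in range(5)]  # nL = 0, 1, 2, 3, 4
print(W_values)                              # [1, 4, 6, 4, 1], as in the text

# The 5-card-hand exercise: partitioning 52 cards into 5 dealt and 47 not dealt
hands = comb(52, 5)
print(hands)                                 # 2598960 distinct hands
```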
An equation of this form shows up throughout statistics applications. In typical statistics jargon, the
number of possible combinations for “N choose m” is NCm = N!/(m! (N-m)!), which matches the
equation above. The basic partitioning idea applies to many problems. How many different 5-card
hands can be dealt from a 52-card deck in which the cards are all considered distinct from each other?
[Hint: being dealt 5 cards is really just partitioning the 52 cards into the 5 you get and the others you
don’t get; the order in which you get dealt the cards doesn’t matter here.]
As an aside, another common type of probability problem (which also shows up in molecular
problems) involves a series of independent choices, and there the total number of possible outcomes
is n1*n2*n3*…. where the n’s describe the number of distinct options that are available to choose at
each step. Often the two types of probability problems are related to each other. Consider a variation
on the 4 molecule problem above. Suppose we want to know the total number of different ways the
4 molecules can be placed into two chambers, allowing all possibilities for the number of molecules
on each side, and as before not distinguishing between the positions of particles within the same
chamber. This can be answered by seeing that it amounts to making an independent choice for each
molecule about whether it will go on the left or right. So there are 2 choices, made 4 independent
times, which is 2*2*2*2 = 16. You’ll note that the answer to this problem counts up altogether the
number of different partitionings, so it is not a coincidence that the values for W in the original
problem (1, 4, 6, 4, 1) sum to 16.
Returning to the problem of how molecules tend to distribute themselves in space, four molecules is
perhaps too small to give a clear picture of significance, so let’s go slightly bigger to N=6, again
treating the problem of how the molecules can be partitioned into two sides. For nL = {0, 1, 2, 3, 4, 5,
6}, we get respective values for W of {1, 6, 15, 20, 15, 6, 1}. You may begin to recognize the coefficients
as those from Pascal’s triangle. What does this tell us? Assuming there are no energetic differences
at play and each of the 6 molecules is free to occupy either chamber, then the likelihood of the system
being in any given state is proportional to the number of microstates, W. That means that it is 20
times more likely by chance for there to be three molecules in each chamber compared to the case
where all 6 molecules are on the left. Evidently, the most likely scenario is the one where the
molecules are evenly distributed with three on each side. The same trend applies, and becomes more
dominant, as the size of the system N increases. The basic conclusion is that entropy drives things
towards a uniform distribution of molecules in space, i.e. equal concentrations everywhere, assuming
the absence of energetic differences.
The behavior of this problem as N gets large is also instructive. The plots show the probabilities one
gets for the distribution of molecules between the two sides of a system for N=6, then N=100, and then
N=10,000.
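The probabilities behind plots of this kind can be recomputed directly from the binomial counts; a brief sketch:

```python
# P(nL) = comb(N, nL) / 2**N for N molecules free to occupy two chambers.
# The mean is N/2 and the standard deviation matches 0.5*sqrt(N).
from math import comb, sqrt

for N in (6, 100):
    p = [comb(N, n) / 2 ** N for n in range(N + 1)]
    mean = sum(n * pn for n, pn in enumerate(p))
    std = sqrt(sum((n - mean) ** 2 * pn for n, pn in enumerate(p)))
    print(N, mean, std, 0.5 * sqrt(N))  # std agrees with 0.5*sqrt(N)
```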
Returning to the case of N=6, one sees that the state with a uniform distribution of molecules is the
most likely, but the chances of significant deviations from that arrangement are substantial. With
molecules that are free to move around, the system will be found with all the molecules on one side
or the other about one time in 32 (2 of the 64 equally likely microstates). As N gets larger, the likelihood of substantial variations (on
a relative scale) goes down. As N gets larger the discrete combinatorial plot turns into a smooth
Gaussian function. The most likely outcome is still where nL is N/2. The standard deviation from the
most likely value for nL is (from earlier courses in statistics) 0.5*sqrt(N). So for example, if N=100,
the most likely value for nL is 50, but with a standard deviation of 5. What about when N = NA =
6.02*1023? There the standard deviation would be a large number (3.9*1011), but in fractional terms
compared to NA, the variation is minute. That is, the expected fraction of molecules on the left would
be 0.5 +/- 6*10-13. This is a general finding; for large thermodynamic systems the behavior of the
system tends to be dominated by the most likely scenario. On the other hand it is important to note
that the kinetic (time-dependent) behavior of a system often depends on the frequency of
perturbations away from the most probable arrangement.
Entropy and the distribution of molecules among energy levels
The same kind of treatment can be used to analyze how the energy in a system tends to be distributed
among the molecules present. Again, for numerical simplicity we’ll first treat a tiny system just big
enough to gain some insight. Suppose we have a system in which 4 identical molecules are each able
to exist in a series of discrete energy levels (in arbitrary units, E=0, E=1, E=2, …). And further suppose
that the total energy is fixed at ET=3. What are the possible ways that the 4 molecules can be placed
into the available energy levels? Note that nothing prevents multiple molecules from having the same
energy. For this tiny system there are only 3 different states or configurations of molecules among
energy levels subject to the restriction that ET=3. They are shown below, labeled states A, B, and C.
What is the value of W for each state? That is, for each energy configuration, how many different
ways could the molecules satisfy that configuration?
The answer is to think of this as a
partitioning problem. For state A, the 4
molecules are being partitioned into a
subset of 3 that will have energy 0, and
a subset of 1 that will have energy 3. For
that case, W = 4!/(3! 1!) = 4. For state B
we get W=4 also. For state C we must
first generalize our previous equation
for the number of combinations or
partitionings. When a partitioning
occurs into more than 2 subsets, the
equation for W generalizes to
W = N!/(n1! n2! n3! …),
where the small n’s refer to the number of molecules in the different subsets. [For completeness, also
remember that 0!=1 so empty subsets can be ignored]. So, for state C, W = 4!/(2! 1! 1!) = 12. What
we glean from this tiny test case is that the most likely situation (i.e. where W is greatest) is where
the molecules are spread out to some degree among the available energy levels, with the lowest
energy being the most populated.
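The generalized count is easy to check with a small helper function. The exact occupancies assigned to states A and B below are assumptions consistent with the W values given, since the figure showing them is not reproduced here:

```python
# W = N!/(n1! n2! n3! ...) for partitioning N molecules among energy levels.
from math import factorial

def multinomial(*ns):
    """Number of distinct ways to partition sum(ns) labeled objects
    into subsets of the given sizes."""
    W = factorial(sum(ns))
    for n in ns:
        W //= factorial(n)
    return W

print(multinomial(3, 1))     # state A: 3 molecules at E=0, 1 at E=3 -> 4
print(multinomial(1, 3))     # state B (assumed): 1 at E=0, 3 at E=1 -> 4
print(multinomial(2, 1, 1))  # state C: 2 at E=0, 1 at E=1, 1 at E=2 -> 12
```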
Simulating exchange of energy between molecules in a closed system
The behavior of slightly larger systems can be analyzed by random simulations with rather
remarkable results. Suppose now we have a set of 50 molecules, and for the sake of argument
suppose the average energy is 1 so that ET = 50. We can set up an initial system where all 50
molecules sit at energy level 1. Then, molecules exchange energy between themselves, as might
result from collisions for instance. The details of the execution are important. Pick two molecules
at random, one whose energy will go down by one unit and the other whose energy will go up by one
unit. Do this over and over. But note the caveat that if the first molecule randomly chosen is already
at energy level 0, then throw out this energy exchange trial and repeat again; i.e. the energy of a
molecule can’t drop below the lower bound. If one performs this kind of random simulation, one
finds remarkably that the system will tend towards an energy distribution of the type noted above.
No other tricks are required. An example result of random simulation for N=20 is shown below.
For larger N, the simulation begins to produce a smooth distribution. Examples for N=320 and
average energy =1 and 2 are shown below.
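A minimal version of this simulation takes only a few lines; the sketch below uses the 50-molecule, average-energy-1 setup described above (the number of trials and the random seed are arbitrary choices):

```python
# Random energy exchange between molecules with a lower bound at E = 0.
# Repeated pair exchanges drive the system toward a decaying (Boltzmann-like)
# distribution over energy levels.
import random

random.seed(0)
N = 50              # number of molecules; total energy ET = 50, average 1
E = [1] * N         # all molecules start at energy level 1
for _ in range(200000):
    i, j = random.randrange(N), random.randrange(N)
    if i != j and E[i] > 0:   # discard trials that would push E below 0
        E[i] -= 1             # one molecule loses a unit of energy...
        E[j] += 1             # ...and the other gains it

levels = {}                   # tally occupancy of each energy level
for e in E:
    levels[e] = levels.get(e, 0) + 1
for e in sorted(levels):
    print(e, levels[e])       # counts fall off with increasing energy
```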
The exercise demonstrates that the random tendency of molecules to spread out among available
energy levels while also being subject to the constraint of a lower energy bound naturally gives rise
to a smooth distribution where the lowest energy is most-populated, and then the distribution falls
off at higher energy. The diagrams above have the energy level going up vertically, and the frequency
with which molecules are found at that energy indicated by a horizontal bar. This can be flipped
around to give a more typical plot showing the probability (or abundance) of molecules on the
vertical axis and the energy value indicated on the horizontal axis. Doing this produces familiar plots
that show an exponentially decaying curve for the probability that any given molecule will have
energy E. This is the Boltzmann distribution.
The Boltzmann distribution teaches some important principles. There are fewer and fewer molecules with higher and higher energies. But there are some, and how many of these higher-energy molecules there are is essential for understanding rates of processes that depend on overcoming an energy barrier. Another key feature of the Boltzmann distribution concerns how sharply the probability falls off as a function of energy. According to the Boltzmann equation, that fall-off is governed by the denominator of the exponent (kBT, a term we alluded to before). Specifically, we can ask what the ratio is between probabilities for two energy levels separated by kBT. Call those probabilities P(E) and P(E+kBT). With a little algebraic manipulation we find that

P(E+kBT)/P(E) = exp(–(E+kBT)/kBT)/exp(–E/kBT) = exp(–1) = 1/e
This is a powerful simplifying statement. It tells us that kBT is the amount of energy difference that
corresponds to a drop in probability by a factor of e (which is about 2.7). The ‘thermal energy’ value
kBT is therefore the key quantity for comparison when evaluating whether two possible
configurations of a system that are separated by some given energy difference will be populated
similarly or very differently. The value of kBT is such a useful quantity for comparison that an energy
difference will sometimes be stated in terms of how many kBT units it is (which is effectively the same
as stating the value of the unitless exponent E/kBT above). For example, one might hear,
“conformation A is ‘2 kay – tee’ higher in energy than conformation B”.
Finally, always keep in mind that kBT and RT convey equivalent meanings; they simply differ by a
factor of Avogadro's number, NA. RT must be used if the energy values are being described on a per
mole basis. The context and units assigned to the energy should make it clear which is being used.
For convenience, RT (at 298 K) is about 2500 J/mol, i.e. 2.5 kJ/mol; the value is also sometimes given
in non-SI units as about 0.6 kcal/mol.
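As a quick numerical check (my own illustration, taking RT ≈ 2.5 kJ/mol), the relative population of states separated by one or two units of thermal energy can be computed directly:

```python
import math

RT = 2.5  # kJ/mol, approximate thermal energy per mole at 298 K

def population_ratio(delta_E_kJ_per_mol):
    # Boltzmann factor: relative probability of a state higher in energy by delta_E
    return math.exp(-delta_E_kJ_per_mol / RT)

print(population_ratio(2.5))  # one "kay-tee" higher: 1/e, about 0.37
print(population_ratio(5.0))  # two "kay-tee" higher: 1/e^2, about 0.14
```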
CHAPTER 2
Entropy of mixing and its dependence on log of concentrations
Stirling’s approximation
We begin with a preliminary equation, Stirling’s approximation. As we saw before, various
calculations having to do with the statistical interpretation of entropy lead us to factorial expressions,
n!. Such numbers become intractable to evaluation as n gets large; how would you actually figure out
what a billion factorial was, or the factorial of Avogadro's number? Stirling's equation gives us an
approximation for the natural log of a factorial expression; from there one could exponentiate if
necessary to get an approximation for the value of n!, but we’ll see it is typically the log of the factorial
expression that we want anyway. Stirling’s approximation is as follows:
ln(N!) ≈ N * (ln(N) – 1)
Here is how close the approximation is:
N         Actual value of ln(N!) evaluated as ln(1) + ln(2) + … + ln(N)      Stirling's approx., N(ln(N) – 1)
1,000     5912.1                                                             5907.7
10^6      12,815,518                                                         12,815,510
At least in terms of relative proportion, you’ll see that the error becomes very small as N gets large.
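The comparison in the table is easy to reproduce. Here is a short Python check (my own; it uses the standard library's log-gamma function in place of the term-by-term sum, since ln(N!) = lgamma(N+1)):

```python
import math

def exact_ln_factorial(n):
    # ln(N!) computed essentially exactly via the log-gamma function
    return math.lgamma(n + 1)

def stirling_ln_factorial(n):
    # Stirling's approximation: ln(N!) ~ N(ln(N) - 1)
    return n * (math.log(n) - 1)

for n in (1_000, 1_000_000):
    exact, approx = exact_ln_factorial(n), stirling_ln_factorial(n)
    print(n, round(exact, 1), round(approx, 1),
          f"relative error {(exact - approx) / exact:.2e}")
```

The relative error shrinks rapidly with N, consistent with the table.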
‘Entropy of Mixing’
A simple exercise illustrates the dependence of entropy (and subsequently other energetic terms) on
the natural log of concentrations. Suppose you have a system with two chambers and it contains
molecules of two types (black and white for the sake of illustration). Suppose there are n1 black
molecules and n2 white molecules and consider a starting configuration where the n1 black molecules
are all on the left and the n2 white molecules are all on the right. Now we want to consider what
change in entropy would be associated with a process whereby the molecules could mix together so
that black and white molecules might occupy either side, as illustrated below.
From before we know that S = kB ln(W), so analyzing the change in entropy, ΔS, boils down to figuring out what W is for the initial state and the final state. There are different ways of treating this problem, but one is to think of it as a partitioning problem like before. Imagine beginning with n1 + n2 = N molecules together in a bag. Then to set up the system you are going to partition the N molecules into a group of n1 to go on the left side and a group of n2 to go on the right. As we discussed before, there are many different ways of partitioning a large set into two smaller groups, but in order to obtain the initial setup shown, only one of the possible partitionings satisfies the requirement that all the molecules in the first group are black and all those in the second group are white. So for the initial state, Winitial = 1. Now for the final state of the system. There we have agreed that the molecules can be on either side regardless of type. For this particular problem we are going to assume that upon mixing we still keep n1 molecules on the left and n2 molecules on the right, so for the final state we are partitioning the N molecules into n1 on the left and n2 on the right, but any of the possible partitionings is allowed. That means Wfinal = N!/(n1! n2!).
Therefore, the entropy change for mixing is

ΔSmix = kB ln(Wf) – kB ln(Wi) = kB ln(Wf/Wi) = kB ln(N!/(n1! n2!))

Now you'll see why we began with Stirling's approximation, so we can replace the logs of factorial
expressions with algebraic quantities that can be manipulated and evaluated.

From above,

ΔSmix = kB ln(N!/(n1! n2!))
= kB (ln N! – ln n1! – ln n2!)
≈ kB (N(ln N – 1) – n1(ln n1 – 1) – n2(ln n2 – 1))   (then noting that –N + n1 + n2 = 0)
= kB (N ln N – n1 ln n1 – n2 ln n2)   (then rewriting N as n1 + n2)
= kB ((n1 + n2) ln N – n1 ln n1 – n2 ln n2)   (then rearranging and taking out a negative sign)
= –kB (n1 (ln n1 – ln N) + n2 (ln n2 – ln N))
= –kB (n1 ln (n1/N) + n2 ln (n2/N))

Now, if we use mole fraction Xi as a concentration to replace ni/N,

ΔSmix = –kB (n1 ln X1 + n2 ln X2), or more generally for more species

ΔSmix = –kB Σi ni ln Xi, which is always positive
Basic conclusions from this exercise are that entropy increases by mixing, and that entropies depend
on the logs of concentrations (here expressed as mole fractions). As a further insight, noting that ln
x always goes down with lower values of x, we sense that the drive toward maximum entropy favors
every species going to a lower and lower concentration. But of course the total conservation of atoms
constrains things, making equilibrium effectively a fight over which species is driven most strongly
to lower concentration.
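The result can be checked numerically for a modest system. The sketch below (my own) compares the exact counting result kB ln(N!/(n1! n2!)) against the Stirling-based formula derived above, working in units of kB:

```python
import math

def mixing_entropy_exact(n1, n2):
    # dS/kB = ln( N!/(n1! n2!) ), evaluated with the log-gamma function
    N = n1 + n2
    return math.lgamma(N + 1) - math.lgamma(n1 + 1) - math.lgamma(n2 + 1)

def mixing_entropy_formula(n1, n2):
    # dS/kB = -(n1 ln X1 + n2 ln X2), the Stirling-based result derived above
    N = n1 + n2
    return -(n1 * math.log(n1 / N) + n2 * math.log(n2 / N))

print(mixing_entropy_exact(500, 500))    # exact counting, ~689.5
print(mixing_entropy_formula(500, 500))  # 1000 ln 2, ~693.1
```

Even at N = 1000 the two values agree to within about half a percent, and the agreement improves with larger N.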
Gibbs free energy, G
A state variable that indicates the favorability (or equilibrium) of a process at constant T & P
Which way processes proceed naturally (i.e. forward or backwards) is established by the total
entropy of the system plus surroundings, or for an isolated system only the entropy of the system
needs to be considered. But this restriction can be removed and replaced with other more convenient
ones by constructing other state variables from a combination of S and other quantities. For many
applications in biochemistry, temperature and pressure do not change much. A state variable G, the
Gibbs free energy, which is constructed as G = H – TS, has the property of dictating the directionality
of a process in a system at constant temperature and pressure (the surroundings no longer require
consideration). A little algebra can demystify this claim. Beginning by differentiating G = H – TS,
dG = dH – TdS – SdT   (then using the derivative of H = U + PV, dH = dU + PdV + VdP)
= dU + PdV + VdP – TdS – SdT   (then substituting the derivative dU = dq + dw)
= dq + dw + PdV + VdP – TdS – SdT

At constant T and P we can drop two terms (VdP and SdT). And if the only work is PV work, then dw = –PdV, giving
dG = dq – TdS. Then, if the process is occurring reversibly, meaning it is not being driven forward or
backward, then from the classical treatment of entropy we recall that dS = dqrev/T, and dqrev = TdS.
Substituting then gives us

dG = 0   (reversible or equilibrium process at constant T and P)
Furthermore, the directionality of a process that is not at equilibrium is dictated by the sign of dG or
ΔG, in the same way that the sign of ΔSsystem + ΔSsurroundings dictated the directionality of a process in our
earlier discussions, but now with a reversal of sign. Noting the negative sign that applies to S in the
expression G = H – TS, we conclude that

dG < 0 for a process that occurs spontaneously (in the forward direction).
That is, processes at constant temperature and pressure are driven to minimum free energy, G.
G as a balance of two factors, H and TS
It is helpful to bear in mind that, from the form of G = H – TS, the free energy (which dictates the
directionality of processes) is affected by two terms. Converting the equation for G to a form that
describes the difference or differential between the ‘before’ and ‘after’ or left vs right sides of a
process,
dG = dH – TdS, or ΔG = ΔH – TΔS   (note that we have dropped a term SdT that would have been
present from differentiation, since we are considering a process at constant T)
Evidently the drive to minimum free energy is a combined drive (1) toward low enthalpy H (recall
that H embodies the energetics of molecular forces and interactions between molecules, with lower
values of H corresponding to energetically favorable configurations or lower amounts or
concentrations of molecular species that have high energy), and (2) towards high entropy S (meaning
more randomness and disorder, including more uniform or equal concentrations).
How to think about G in a steady state process
In discussions of how state variables, like H or G for example, are changed in a process, what is
sometimes being described is a before-and-after scenario: for example, a calculation of what ΔH is
for converting a mole of pure substance A into a mole of pure substance B. [We could look up the
molar enthalpies for the two compounds in a table.]
While the values of those quantities are important in evaluating the thermodynamic properties of a
process, this is rarely the sense in which things like changes in free energy, ΔG, are considered in
biochemical processes. If we are talking about the free energy change ΔG for conversion of citrate to
isocitrate in the cell, we are thinking about the conversion of the substrate to the product at whatever
their concentrations are, and those concentrations are not changing. Contrast that with the earlier
scenario where the composition and concentrations of the initial and final states are entirely
different. In biochemical systems where the concentrations of substances are being held roughly
constant by pathways and networks of reactions occurring together, it is clearer to think of
infinitesimal conversions of substrate to product. There can be a change in free energy in such a
process in the sense that the product and the reactant may have different free energies associated
with them (which depends on their concentrations as we shall see later), and we are creating more
molecules of the product and fewer molecules of the reactant, but without any substantive change in
composition or concentrations. Of course in order to express the magnitude of the free energy change
for the process of interest, we have to express it as a quantity with a meaningful scale for the
conversion that is occurring. So we express things like the free energy change for a process on a per
mole quantity, though for conceptual clarity we should keep in mind the infinitesimal or differential
nature of the process we are considering.
Free energy of mixing and the dependence of G on log of concentrations
We can return to our earlier treatment of mixing and now calculate the free energy of mixing in the
same way.
From the definition G = H – TS, ΔGmix = ΔHmix – TΔSmix. Now, if molecules of different types
interact with each other in a way that is energetically similar to the way molecules of like type
interact, then it should be safe to say that there shouldn't be any enthalpy change associated with
mixing (based on our intuition that enthalpy is about molecular forces and interactions). So, letting
ΔHmix be zero and using our previous equation for the entropy of mixing, we get

ΔGmix = RT Σi ni ln(Xi)   where the ni are in moles and R reflects 'per mole' quantities

Consistent with earlier discussions, we see that different species contribute to the total free energy
of the system according to the logs of their individual concentrations. Also note that ΔGmix will always
be negative, consistent with our expectation that the free energy of mixing should be favorable.
The finding that the free energy of mixing is negative (favorable) gives us insight into what drives chemical reactions to their equilibrium positions. Suppose we start with a system containing only chemical A, and there is a reaction A can undergo to form B, and that the energetics of molecules A and B are identical; as one instance suppose the two molecules are enantiomers (equivalent in structure except for handedness). We know from intuition and perhaps experience
that a system like this will proceed by reaction until the two species are present in equal amounts or
concentrations. But why? If A and B have the same energy, then what could drive the conversion?
Wouldn’t it be simpler if the molecules just stayed as A since the energy is not improved by converting
to B? The answer of course has to do with entropy, and specifically the contribution entropy makes
to the free energy of mixing. This imaginary scenario helps us draw a connection between (1) the
chemical conversion to reach equilibrium and (2) mixing of different components. Suppose we take
the initial system composed of only A and then imagine a hypothetical divider down the middle. Now
imagine converting all the material on the right side from A to B. Clearly the entropy, enthalpy, and
free energy of that process are all zero based on our supposition about the energetic equivalence
between A and B. Now, in a second step we can imagine that the contents of the two sides are able
to mix. This will result in a mixed system with equal amounts of A and B, and the free energy of the
mixing would be negative (following from the equation above). The two steps put together produce
exactly the same result as if there was chemical conversion of half the A molecules into B molecules
in the whole system. This thought exercise lets us see that the favorability of converting some
amount of a pure substance into other substances to reach equilibrium derives from a favorable
entropy of mixing.
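The thought experiment can be made quantitative. The short sketch below (my own illustration, taking RT ≈ 2.5 kJ/mol) computes the mixing free energy per mole of total material as a function of the fraction f of A converted to B, and confirms that the minimum falls at f = 0.5, i.e. equal amounts of the two enantiomers:

```python
import math

RT = 2.5  # kJ/mol at ~298 K

def g_mix_per_mole(f):
    # dGmix per mole of total material = RT (f ln f + (1-f) ln(1-f))
    # for A <-> B with identical intrinsic energetics
    terms = [x * math.log(x) for x in (f, 1 - f) if x > 0]
    return RT * sum(terms)

# scan conversion fractions and find the free energy minimum
fractions = [i / 100 for i in range(1, 100)]
best = min(fractions, key=g_mix_per_mole)
print(best, round(g_mix_per_mole(best), 3))  # minimum at f = 0.5, i.e. -RT ln 2
```

At f = 0.5 the free energy of mixing is –RT ln 2 ≈ –1.7 kJ/mol of material, which is exactly the entropic driving force described above.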
CHAPTER 3
Chemical potentials, µ
From before we have an understanding that the free energy of a system composed of a mixture of
chemicals depends on the concentrations of the components, and if a chemical process is possible
that would interconvert some components into others, then there is a free energy change associated
with that process.
From previous courses you will likely remember equations of the following form:
ΔG = ΔG0 + RT ln(Q), and letting K = exp(–ΔG0/RT) gives ΔG = RT ln(Q/K)

where ΔG0 expresses the (molar) free energy change for a reaction if it were occurring under
standard state conditions, Q represents the ratio of product concentrations to reactant
concentrations under the conditions where the reaction is being considered, and the equilibrium
constant K is the ratio of product to reactant concentrations at equilibrium. Below we will show how the
equations above can be obtained, and perhaps better understood, by taking a differential or
infinitesimal view of any reaction or process underlying the conversion of molecules from one
species to another or from one location or phase to another.
Definition of µ as a partial derivative of G with respect to composition
‘Chemical potentials’ are differential or derivative quantities that help us get at the free energy of a
mixture (i.e. a system with multiple components). Since a mixture is just a combination of separate
components, it makes sense to consider what free energy is contributed to the mixture by each
component separately. A way of looking at that question is to consider how much the free energy of
a system would be changed by adding a tiny, infinitesimal amount of a particular component. That idea is shown on the right, where the chemical potential of a given component, i, is defined as the partial derivative of G with respect to the change in the amount of that component:

µi = (∂G/∂ni)T,P,nj≠i

Note that despite the chemical potential being a differential related to an infinitesimal change, it is expressed as a per mole quantity.
Dependence of chemical potentials on concentrations and standard state chemical potentials, µ0
The free energy in a mixture depends on the natural log of the concentrations, so naturally we expect
to see a similar dependence of µ on concentration. The total free energy for a mixture should be the
sum of the free energies of the pure components (weighted of course by the amount of each
component), plus the free energy of mixing, since starting with pure components separately and then
mixing them (obviously) gives us a mixture.
Gtotal = Σi ni Ḡi∎ + RT Σi ni ln Xi = Σi ni (Ḡi∎ + RT ln Xi)
where the first term in the sum relates to the free energies of the pure components and the last term
describes the free energy of mixing. The bar over the G indicates a per mole quantity and the solid
symbol as a superscript indicates reference to the pure component. Now we can evaluate the
chemical potential for component i as a partial derivative of G with respect to ni:
µi = ∂G/∂ni = µi∎ + RT ln Xi
where we have replaced the free energy of the pure component on a per mole basis (Ḡi∎) with the
chemical potential of the pure component (µi∎); their meanings are equivalent. As expected, we see
that the chemical potential of each species depends on the log of its concentration, and that the
chemical potential goes down (i.e. becomes more energetically favorable) as the concentration goes
down.
The total differential, dG as a function of changes in composition
Now that we have an expression for how the chemical potentials depend on concentration, we can
turn to look at how the total free energy depends on changes in the quantities of the various
components. We note that T and P are the natural variables for G, and that G also depends on
composition, i.e. the ni’s. Taking G as a function of T, P, and the ni’s, we can write out the total
differential for G as:
dG = (∂G/∂P)T,ni dP + (∂G/∂T)P,ni dT + Σi (∂G/∂ni)T,P,nj≠i dni

Replacing the partial derivatives with the correct thermodynamic quantities gives:

dG = VdP – SdT + µ1dn1 + µ2dn2 + …
Then, at constant T and P, we see that the change in free energy arising from a change in composition
(i.e. a chemical conversion of some molecules to others, or movement of molecules from one place to
another) is given by:
dG = Σi µi dni
There is much sense to this equation. The total differential free energy change takes into account the
(differential) compositional change for each component (dn) multiplied by the chemical potential of
each component (µ). We get a general sense then that dG will be negative (i.e. a favorable process) if
molecules with higher chemical potentials are converted to molecules with lower chemical
potentials. Furthermore, if a process is at equilibrium then the chemical potentials of the molecules
that would be created and those that would be consumed should be equally balanced in order for dG
to be equal to 0.
Equilibrium conditions in terms of µ’s
From above,
Σi µi dni = 0   for a process at equilibrium.
This is a powerful equation for analyzing all kinds of processes, from chemical reactions (where
chemically distinct molecules are able to interconvert) to transport processes (where a molecule of
a given type is able to move from one place to another or from one phase to another).
Phase or transport equilibrium
The diagram at the right illustrates equilibrium involving
partitioning of a component between two different phases.
You are familiar with processes like this from organic
chemistry where you partitioned a compound between an
aqueous phase and an organic phase (e.g. in a separatory
funnel). How does the differential free energy change, dG, in
this case depend on the process under consideration
(specifically transport of molecule A from phase 1 to phase 2)?
From above, dG = µA,1 dnA,1 + µA,2 dnA,2, where the subscripts denote the chemical species (which
doesn't change here) and the phase where it occurs. At equilibrium, dG = 0, so µA,1 dnA,1 + µA,2 dnA,2 =
0. Then, noting that dnA,1 and dnA,2 are equal in magnitude but opposite in sign (dnA,1 = –dnA,2), we get
–µA,1 dnA,2 + µA,2 dnA,2 = dnA,2 (µA,2 – µA,1) = 0. The parenthetic expression must be zero. Therefore,
when A is at equilibrium between the two phases,
µA,1 = µA,2
This makes perfect sense; since this process creates a molecule of A in phase 2 at the expense of a
molecule of A in phase 1, at equilibrium the chemical potential of A in the two phases must be equal.
If the two chemical potentials were not equal, then the process would not be at equilibrium, and G
could be decreased (in a shift closer to equilibrium) by converting some of the higher chemical
potential component into the lower chemical potential component. In the problem described here,
that would be by movement (i.e. a transport process).
If the system was not at equilibrium, then the free energy associated with the process (assuming the
forward process is interpreted to be movement from left to right) would be (µA,2 - µA,1). This would
represent a differential free energy on a per mole basis. The value of that energy term could have
various practical interpretations in a biological setting. If the value was less than zero, then it might
describe the amount of work that could be extracted from the process and used to drive a different
unfavorable process if the two processes were coupled together by some mechanism. Or, if the free
energy difference was positive, then that energy value could describe the amount of work (or
favorable free energy) that would have to be extracted from another coupled process in order to
maintain the first process away from the equilibrium condition to which it would go otherwise.
Chemical equilibrium

Now we consider a chemical reaction and look at the conditions on the µi's for equilibrium. Consider
the reaction below:

A + B ⇌ 2C

In the process above, the amounts of A, B, and C are subject to change, so the differential free energy
change is

dG = Σi µi dni = µA dnA + µB dnB + µC dnC

The reaction arrow represents a single process, so the changes that occur to the amounts of the
different components must be related to each other, and to a single quantity describing the extent of
the reaction. If we let ξ describe the extent of the reaction on a per mole basis, then according to the
reaction stoichiometry,

dnA = – dξ
dnB = – dξ
dnC = + 2dξ   and substituting above gives

dG = (–µA – µB + 2µC) dξ
At equilibrium, dG = 0, so we have (–µA – µB + 2µC) = 0. This makes intuitive sense since you
can see that in order for the expression to evaluate to zero, 2µC would have to be equal to µA + µB,
meaning that adding up the chemical potentials of the components on the two sides of the reaction
has to give matching values. Otherwise the free energy could be lowered by having the reaction
proceed one way or the other.

If the reaction is occurring away from equilibrium, then the free energy difference for the reaction on
a per mole basis (meaning per mole of reaction events) would be (–µA – µB + 2µC). You'll see that
this is nothing more than adding up and subtracting the chemical potentials of the products and
reactants, properly weighted by their respective stoichiometries. As before, if the concentrations are
away from equilibrium then the expression above would describe the molar energy required to make
the process proceed or (if the value is negative) how much work could be extracted from the process.
Equilibrium conditions in terms of concentrations and standard chemical potentials: arriving
at familiar equations for the equilibrium constant
So far we have laid out the conditions on the chemical potentials that must be true at equilibrium.
But of course the chemical potentials of the components depend on their concentrations, and
together this leads to equations for equilibrium constants (K) and reaction quotients (Q), which
should be familiar.
From this point forward we will switch away from mole fraction and use molarity (capital C) as our
concentration unit instead. We replace the solid superscript denoting the pure state with the open
superscript (0) denoting 1M as the choice for standard state concentration. We therefore rewrite
our equation for the chemical potential and its dependence on concentration as

µi = µi0 + RT ln Ci

The standard state chemical potential (µi0) refers to the chemical potential the molecule would have
at its standard state concentration (1M). The standard state chemical potential serves as a reference
value to which the chemical potential can be related, taking into account the dependence on
concentration. This general statement about how the chemical potential of a component depends on
a standard state value (which is a constant) and the concentration of that component will appear
throughout our subsequent discussions.
Phase or transport equilibrium
For the earlier case of phase equilibrium of molecule A between phases 1 and 2, at equilibrium µA,1 =
µA,2, and substituting the expression above for each component gives,

µA,10 + RT ln CA,1,eq = µA,20 + RT ln CA,2,eq
Here we recognize that the standard state chemical potentials for the same molecule could be
different in different phases, from which one can see that the concentrations on the two sides would
be unequal at equilibrium. By collecting separately the terms for concentration and those for
chemical potentials,
RT ln CA,2,eq – RT ln CA,1,eq = –(µA,20 – µA,10). Rearranging gives,

ln (CA,2,eq/CA,1,eq) = –(µA,20 – µA,10)/RT

We can recognize CA,2,eq/CA,1,eq as the equilibrium constant K for this process. Making that
substitution and also replacing the difference between standard chemical potentials with the more
familiar expression ΔG0 for the standard state free energy change, we arrive at

ln K = –ΔG0/RT and K = exp(–ΔG0/RT), with K = CA,2,eq/CA,1,eq

To analyze a system away from equilibrium, we can introduce concentrations and equilibrium
constants into the non-equilibrium situation. Returning to dG = dnA,2 (µA,2 – µA,1), and substituting in
equations of the form µi = µi0 + RT ln Ci as before gives, with some rearrangement,

dG/dnA,2 = RT ln(CA,2/CA,1) + (µA,20 – µA,10)

or more familiar,

ΔG = RT ln(CA,2/CA,1) + ΔG0

where the free energy differences here refer to the transport process on a per mole basis. Noting
from above that ΔG0 = –RT ln(K), and recalling that the reaction quotient Q is used to describe the
ratio of product to reactant concentrations in the general case where a system may be away from
equilibrium, we get the familiar equation

ΔG = RT ln(Q/K)   where in this case Q = CA,2/CA,1 and K = CA,2,eq/CA,1,eq

Again, ΔG on a per mole basis has the same meaning as (µA,2 – µA,1), which is the energy per mole that
can be extracted from (or that would be needed to drive) the reaction under consideration.
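A small numerical sketch (my own; the concentrations and the value of K are hypothetical) shows how this equation behaves for the transport process:

```python
import math

RT = 2.5  # kJ/mol at ~298 K

def delta_g_transport(c1, c2, K):
    # dG (kJ/mol) for moving A from phase 1 to phase 2:
    # dG = RT ln(Q/K), with Q = c2/c1
    return RT * math.log((c2 / c1) / K)

# hypothetical partitioning with K = 10 (phase 2 favored at equilibrium)
print(delta_g_transport(1e-3, 1e-3, 10))  # Q = 1 < K: negative, transport into phase 2 favorable
print(delta_g_transport(1e-3, 1e-2, 10))  # Q = K: zero, the system is at equilibrium
```

The sign of the result tells you which direction the transport proceeds spontaneously, and the magnitude is the work per mole that could be extracted (or that must be supplied) as discussed above.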
Chemical equilibrium

We can work out similar equilibrium expressions as well for our previous chemical reaction.
Substituting general terms of the form µi = µi0 + RT ln Ci into (–µA – µB + 2µC = 0) gives, with
some rearrangement:

2RT ln CC,eq – RT ln CA,eq – RT ln CB,eq = –(2µC0 – µA0 – µB0)

ln (CC,eq²/(CA,eq CB,eq)) = –(2µC0 – µA0 – µB0)/RT

which again matches

ln K = –ΔG0/RT and K = exp(–ΔG0/RT), with K = CC,eq²/(CA,eq CB,eq) and ΔG0 = 2µC0 – µA0 – µB0

As before, if the reaction is away from equilibrium then we can work out equations for the molar free
energy for the reaction, obtaining in this case

ΔG = RT ln(Q/K), with Q = CC²/(CA CB) and K = CC,eq²/(CA,eq CB,eq)
Importance of units
It is important to understand the way concentration units have been implied in the equations we
have developed for chemical potentials, free energies, and equilibrium constants. Returning to the
general equation we developed for how chemical potential depends on concentration, where we
switched over to molar concentrations, µi = µi0 + RT ln Ci, you will see that we seem to be taking a
logarithm of a quantity that has units associated with it (molarity in this case), which is technically
illegal. To correct this problem, in every occurrence of a concentration value in our preceding
equations, we should understand that the concentration needs to be implicitly divided by the value
chosen for the standard state, 1M for example. That division generates unitless quantities for the
concentrations in all of our expressions for chemical potentials, free energies, reaction quotients and
equilibrium constants:
µi = µi0 + RT ln (Ci/1M), for example,

or

K = (CC,eq/1M)²/((CA,eq/1M)(CB,eq/1M)) for the reaction above.
As you can see, as long as the standard state is 1M, then leaving out these implicit denominators is
fine. But there is an important case where 1M is not the typical choice made for the standard state.
Because biological conditions are typically close to pH 7 (and not pH 0), the standard state
concentration for protons (H+) is taken to be 10⁻⁷ M. That means that anytime a reaction (or transport
process) involves the creation, consumption, or movement of protons, the concentration of protons
must be divided by 10⁻⁷ M when using it in the calculation of free energies, reaction quotients, and
equilibrium constants.
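To illustrate (a sketch of my own; the reaction and concentrations are hypothetical), consider computing the reaction quotient for a process A → B + H+ at pH 7, treating each concentration as unitless by dividing by its standard state:

```python
def activity(conc_M, species):
    # Unitless concentration: divide by the standard state, which is 1 M for
    # most species but 1e-7 M for protons under the biochemical convention.
    standard = 1e-7 if species == "H+" else 1.0
    return conc_M / standard

# hypothetical reaction A -> B + H+ with A and B both at 1 mM, at pH 7
Q = activity(1e-3, "B") * activity(1e-7, "H+") / activity(1e-3, "A")
print(Q)  # 1.0: at pH 7 the proton term contributes no extra free energy cost
```

With the biochemical standard state, a proton released at pH 7 contributes a factor of 1 to Q; with the 1M chemical standard state it would instead contribute a factor of 10⁻⁷, shifting the apparent free energy substantially.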
Other species that get handled as special cases, typically by being omitted from the equilibrium
expressions, include: water (in most scenarios the solution is nearly pure water, so its mole
fraction X ≈ 1) and compounds in their pure forms (e.g. crystalline solids), which are taken to be
their own phase, with X = 1.
Precautions about G vs G0, reactions with changes in stoichiometry, and overall
concentration effects
Free energy is sometimes discussed loosely, which can lead to confusion and errors in interpretation.
A particularly common error is to not properly distinguish between whether one is talking about ΔG
or ΔG0. As discussed above, ΔG0 describes how favorable or unfavorable a process would be if the
reactants and products were all at their standard state concentrations. That is practically never
representative of conditions of biochemical interest. [Note that cellular concentrations of small
molecule metabolites are often in the millimolar range; macromolecules like proteins are present in
the cell at individual concentrations that are often in the micromolar range (e.g. for housekeeping
enzymes) or nanomolar or lower for low-abundance proteins like those often involved in cell
signaling.] The value of ΔG0 is simply a reference energy that makes it possible to calculate the free
energy or equilibrium position at some other more relevant set of concentrations.
Another common source of confusion arises in the context of reactions where the total stoichiometry
of the reactants and products are different. In simple processes or reactions where the
stoichiometries of the reactants and products are the same, casual statements such as, “that reaction
or process is ‘naturally favorable’ because the (standard) free energy is negative”, can be interpreted
in a sensible way. For example, for the reaction A ⇌ B, if the standard state free energy
difference is negative, then K > 1, and if A and B were both present at 1M concentration then, since
Q = 1, which is lower than K, ΔG would be negative and the forward reaction (conversion of some A to
B) would be favorable. The same conclusions would be reached if the concentrations of A and B were
both much lower (or higher) but still equal to each other. For example, if A and B were both present
at 1mM concentration then Q would still be 1 and the forward reaction would still be favorable. A
further conclusion is that at equilibrium B would have a higher concentration than A, whether the
overall concentrations are high or low. But this kind of casual logic falls apart entirely when the
number of molecules on the left and right side of a reaction are unequal. A classic case is a process
of binding between a protein and a ligand (e.g. an inhibitor or substrate or cofactor). Here there are
two ‘reactants’ and one ‘product’ (the bound form of the protein). In the former example, the sign of
G0 provided quick insight into the relative concentrations one would expect for the substrate and
product at equilibrium, without worrying about the definition of the standard state. But what about
the case of ligand binding by a protein? Here, the value of G0 provides no such easy insight. The
problem can be appreciated by noting that if the concentration units for Q (or K) do not cancel (which
they do not if the total stoichiometries are different on the left and the right), then the value of Q
changes with changes in overall concentration, even if relative concentrations are held equal. So, for
example, a negative G0 (K > 1) for the binding energy would tell you that if the protein, ligand, and
protein-ligand complex were all at 1 M, then the binding process would proceed forward toward more
complete binding (so that ultimately more of the protein would be in the bound form than the
unbound form). But if those three species were all present at equal concentrations of 1 µM, the value
of Q would be a million (10^-6/(10^-6 × 10^-6) = 10^6), which could be much greater than K (depending on how
negative ΔG° was), which would mean that the process would proceed in the reverse direction
toward unbinding, and ultimately most of the protein would not be bound to ligand. This is just one
illustration of the point that the interpretation of free energies must be made carefully, particularly
when there are differences in stoichiometry between reactants and products. In those cases one
must bear clearly in mind that overall concentrations are profoundly important, and that the sign and
magnitude of ΔG° are hardly informative without further consideration of real concentrations. Note
that the argument above about stoichiometry and the interpretation of the free energy change ΔG applies just as
well to the entropy change ΔS, but is a less critical issue for the enthalpy change ΔH.
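The binding example above can be made concrete with a short numerical sketch. The ΔG° value is invented for illustration, and the function name Q_binding is mine; the point is only that the same ratio of species gives Q = 1 at 1 M but Q = 10^6 at 1 µM, reversing the sign of ΔG.

```python
import math

R = 8.314   # gas constant, J/(mol K)
T = 298.0   # K

dG0 = -20e3  # hypothetical standard free energy of binding, J/mol (so K > 1)
K = math.exp(-dG0 / (R * T))  # carries units of 1/M, which do not cancel

def Q_binding(c_protein, c_ligand, c_complex):
    """Reaction quotient for protein + ligand <-> complex (concentrations in M)."""
    return c_complex / (c_protein * c_ligand)

# All species at 1 M: Q = 1 < K, so dG < 0 and binding proceeds forward.
dG_molar = R * T * math.log(Q_binding(1.0, 1.0, 1.0) / K)

# All species at 1 uM: Q = 1e6, far above K here, so dG > 0 and the net
# reaction runs in reverse, toward unbinding.
dG_micro = R * T * math.log(Q_binding(1e-6, 1e-6, 1e-6) / K)
```
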
Comments on the dependence of ΔG and K on T (the van’t Hoff equation)
In our discussions of free energy we emphasized that the sign of ΔG indicates the favorability of
reactions under conditions of constant temperature and pressure. But how ΔG depends on those
values is also of interest in some situations. [One example is how temperature affects the equilibrium
between the unfolded and folded states of a protein.] Here we say something about how the free energy
ΔG and the equilibrium constant K depend on T.
From ΔG = ΔH – TΔS, one can see quickly that the dependence of ΔG on T is determined by ΔS. In fact,
from the standard derivative expressions for the state variables, the partial derivative of G with
respect to T, holding P constant, is –S. That is, (∂G/∂T)P = -S, and likewise (∂ΔG/∂T)P = -ΔS. So, for example, if a process is
entropically favored (ΔS > 0), then increasing the temperature will make ΔG more negative. Clearly,
the dependence of ΔG on T is dictated by the sign of ΔS.
But now let’s look at the dependence of the equilibrium constant K on T. This is where intuition can
go awry. We know that K is determined from ΔG° (recall K = exp(-ΔG°/RT)), and that a more negative
value of ΔG° corresponds to a higher value of K. So we might expect that if increasing T causes a
decrease in ΔG° (as it would if ΔS° > 0, as discussed above), then K should also depend on ΔS°, with an
increase in T causing an increase in K if ΔS° > 0. But this logic is incorrect (though not uncommonly
heard in discussions). The problem with the logic is that K depends on T in two ways: through the
effect of T on ΔG° and through the presence of T in the denominator of the expression for K in terms
of ΔG°.
To get the correct answer for how K depends on T, we have to break ΔG° into its enthalpy and
entropy components at the outset, since those two terms have different dependencies on T.
K = exp(-ΔG°/RT)
ln K = -ΔG°/RT = -(ΔH° – TΔS°)/RT = -ΔH°/RT + ΔS°/R
Now the dependence of K on T can be seen to be governed by ΔH° and not by ΔS°! Taking derivatives
with respect to T we get
d(ln K)/dT = ΔH/(RT²) (here the standard-state superscript on ΔH might be omitted, since ΔH
depends less strongly on overall concentrations, in contrast to ΔG and ΔS as discussed at length above).
This is one form of the van’t Hoff equation. Separating the variables K and T on different
sides gives d(ln K) = ΔH/(RT²) dT, and as long as ΔH does not change much with temperature we can
integrate between two temperatures T1 and T2 to get
ln(K2) – ln(K1) = ln(K2/K1) = (1/T1 – 1/T2)ΔH/R
which shows how one can extract a value for the enthalpy change for a reaction or process from the
values of K at two different temperatures. Or, plotting ln(K) vs 1/T should give a slope of –ΔH/R.
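As a numerical illustration of the integrated van’t Hoff relation, the sketch below recovers an invented ΔH from K values at two temperatures; the function name and the 50 kJ/mol value are mine, chosen only for the round trip.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def vant_hoff_dH(K1, T1, K2, T2):
    """Estimate dH (J/mol) from K measured at two temperatures, assuming dH
    is roughly constant over [T1, T2]: ln(K2/K1) = (dH/R)*(1/T1 - 1/T2)."""
    return R * math.log(K2 / K1) / (1.0 / T1 - 1.0 / T2)

# Round-trip check with a made-up endothermic process (dH = +50 kJ/mol):
dH_true = 50e3
T1, T2 = 298.0, 310.0
K1 = 1.0
K2 = K1 * math.exp(dH_true / R * (1.0 / T1 - 1.0 / T2))  # K rises with T since dH > 0
dH_est = vant_hoff_dH(K1, T1, K2, T2)  # recovers ~50 kJ/mol
```
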
Graphical views of chemical potentials and total free energy as a function of reaction progress
for a simple equilibrium (A ⇌ B)
[Figure: the chemical potentials μ (‘mu’) of A and B and the total free energy G, plotted as a
function of reaction progress.]
CHAPTER 4
Non-ideal behavior in mixtures
The breakdown of ideal equations for chemical potential
Our previous discussions have emphasized the idea that the energies in a mixture have a simple
behavior (i.e. a log dependence) that is perfectly obeyed across all ranges of concentrations,
regardless of what sorts of molecular forces might come into play as different kinds of molecules
encounter each other. We refer to that kind of behavior as ‘ideal’. We turn now to consider the
behavior of ‘real’ or ‘non-ideal’ solutions.
To understand non-ideal behavior, let’s rethink the steps we took to arrive at our simple equations
for ideal behavior to look for assumptions we made that might be violated in real situations. We used
the idea of ‘free energy of mixing’ as the foundation for establishing our equations for chemical
potential and their dependence on log concentrations in the ideal case. We started with this equation,
ΔGmix = ΔHmix – TΔSmix, which led us to ΔGmix = RT Σi ni ln(Xi). But we made two assumptions in the
process.
First, you’ll recall that we allowed ourselves to drop out the enthalpy term, asserting that ΔHmix would
be zero upon mixing if the different kinds of molecules make energetic interactions with each other
that are similar to those they make with themselves in their pure forms. This might be a fair
assumption if the two (or more) molecular species are very similar to each other (e.g. in polarity,
charge, size, etc.). On the other hand, if intermingling of the different components leads to interaction
forces of different types and magnitudes, then our assumption that mixing would not have any
enthalpic effect will be incorrect, and the energy or chemical potential felt by each component will be
affected not only by its own concentration but by the new forces it experiences when interacting with
the other components.
A second simplification came in the way we treated the second term, the entropy of mixing. We
developed our combinatorial expression for W (to give us entropy) based on an idealized mixing
scheme where we placed molecules of different types on different sides of a container. This seemed
innocent enough. But what if the two types of molecules were of vastly different sizes? This might
have led to a more complex problem relating to how large vs small molecules might be arranged in
space without overlapping each other. This issue would not have been captured by our simple
equation for counting partitionings of molecules.
Later we will discuss in more detail specific situations where violations of the assumptions above
lead to non-ideal behavior. But first we will modify our previous equations for chemical potentials
and equilibrium constants so that they will hold true even when non-ideal effects are at play. To do
this we introduce a correction factor into the chemical potential equations in the form of an
‘activity coefficient’, γ.
38
Activities and activity coefficients
Our ideal equation for the chemical potential of species i was:
μi = μi° + RT ln Ci (ideal)
Now admitting that that equation might not be totally valid, we introduce a correction factor, the
activity coefficient, γi, designed to make the equation remain true.
μi = μi° + RT ln (γiCi) (real or non-ideal)
or
μi = μi° + RT ln (ai) with ai = γiCi (real or non-ideal)
where we introduce the ‘activity’ ai to be equal to γiCi. Then ai effectively replaces Ci in the chemical
potential equation. You can see that the ‘activity’, a, becomes like an effective concentration of a given
component. Another way of looking at it would be to imagine that you don’t have a way of directly
measuring the true concentration of a component in a mixture, but you have a way of measuring the
chemical potential of that component (through some energetic evaluation). From the chemical
potential of that component, since chemical potential depends on concentration, you could say that
you are able to measure what concentration that component seems to have based on its energetic
behavior, and that effective concentration is the activity. You might anticipate from the equations
above, which make it explicit that chemical potential relates to activity and not necessarily to
concentration, that the activities will be the key quantities in equilibrium constants and reaction
quotients for non-ideal systems.
Before we rework our previous equilibrium equations in terms of activities, let’s look a little more at
the range of possibilities for the activity coefficients and how this relates to favorable vs unfavorable
energetic features in non-ideal mixtures.
First, note that our new equations reduce to the ideal ones when the activity coefficients, γi, are equal
to 1. In that case, the activity is the same as the concentration: ai = γiCi = Ci. Logically then, non-ideal
behavior arises when the activity coefficient is either greater than or less than 1. Those two possibilities
can be ascribed different energetic meanings. By comparing the equations above for chemical
potential in the ideal and non-ideal cases, we can see that a value of γi > 1 corresponds to an elevated value
of the chemical potential for component i. Since the chemical potential reports on the energy that is
felt by some component, we surmise that γi > 1 indicates that component i is experiencing
unfavorable energetics compared to the case of ideal behavior. Conversely, γi < 1 reflects unusually
good energetic interactions.
The ideal behavior of highly dilute solutions
Now we have to discuss in a bit more detail what limiting situations are chosen (by convention) to
represent ideal behavior. From our previous discussions it might seem that the sensible thing would
be to take the pure state of each component to represent its ideal behavior. This is fine for the solvent;
in biochemistry our ‘mixtures’ are nearly always solutions where water is the solvent and various
other molecules are the dissolved solutes. But the idea of a pure solute often doesn’t make sense for
biochemistry. For example, a sample containing only a protein in a pure form (without solvent) is
nonsensical since proteins don’t fold properly unless they are in an aqueous solution. Therefore, the
condition chosen to represent ideal behavior for a solute is usually the (hypothetical) infinitely dilute
limit. Let’s see if this is consistent with ideas we laid out earlier about how the equations for chemical
potential as a function of concentration should behave. Putting a slightly finer point on our previous
arguments, the ideal equation for chemical potential fails if a given component experiences different
kinds of interactions as its concentration is changed. Now we can examine the situation of a highly
dilute mixture to see if it meets the ideal requirement that a given component makes the same kinds of
interactions as its concentration is changed slightly. First consider a dilute solution from the
perspective of the solvent. If the solute is present in a 1:1000000 ratio to the solvent (setting aside
for the moment potential differences in molecular size), then any arbitrarily chosen solvent molecule
will be interacting nearly exclusively with other solvent molecules. Now if we increase the
concentration of the solute by a factor of two, that doesn’t change the picture; a solvent molecule will
still interact nearly exclusively with other solvent molecules. Now let’s view it from the perspective
of the solute. At the 1:1000000 ratio, a solute molecule will rarely interact with another solute and
will exclusively ‘see’ the solvent. When we double the concentration of the solute, this is still the case.
Clearly then, if a solution is very dilute, the various components can be expected to behave ideally.
The ideal state for the solvent is taken to be pure solvent (water), whereas the ideal state for the
solute is at infinite dilution, and the components under these highly dilute conditions have activity
coefficients equal to 1.
The origin of non-ideal behavior at higher concentrations
We can use the same logic as above to think about the non-dilute situation where non-ideal behavior
begins to show up. Consider what happens when a solute concentration gets much higher. Now the
solvent will start to encounter solute molecules with frequencies that cannot be ignored (as
illustrated below). So if for the sake of argument the solvent and the solute make poorer or less
favorable interactions with each other than they do with themselves, then as the mixture moves into
the non-ideal range, the solvent will experience a higher chemical potential than expected for ideal
behavior owing to its increased interactions with the other component (the solute). That would
mean the activity coefficient for the solvent would be > 1. Now let’s look at it from the perspective of
the solute, which sees things differently because it is dilute rather than nearly pure like the solvent.
As the solute concentration increases, at some point solute molecules begin to encounter other solute
molecules to an appreciable extent. Now under the same scenario as before where the solvent and
solute make poorer interactions with each other, and better interactions with themselves, you see
that as the concentration of the solute increases it makes more favorable interactions (with itself).
So, the activity coefficient for the solute would be < 1. The reason we obtain different numerical
behavior for the activity coefficient for the solvent vs the solute under the same set of assumptions
about the energetics of the solution is due entirely to the different choices for what the ideal limit is
for the different components: pure in the case of the solvent (water) and highly dilute in the case of
the dissolved solute. Note that if we imagined the opposite scenario where the solvent and solute
made better interactions with each other than with themselves, then the behavior of the activity
coefficients would be reversed, with the activity coefficient for the solute being > 1 and the
solvent < 1.
Reworking the equilibrium equations in terms of activities instead of concentrations
The expression for the total differential dG remains true even if the behavior is non-ideal, as does the
requirement that dG equals 0 at the equilibrium composition.
dG = Σi μi dni = 0
But now we use
μi = μi° + RT ln (ai)
This is the same as before except that activity a has replaced molar concentration C. Clearly the equations
will develop exactly as before, but with activity a replacing C everywhere. For example, for the
reaction A ⇌ B, starting from µA = µB at equilibrium, we would obtain
ln (aB,eq/aA,eq) = -(μB° - μA°)/RT
(aB,eq/aA,eq) = exp(-(μB° - μA°)/RT) = K (where K is the equilibrium constant as before)
Note however that K ≠ CB,eq/CA,eq if the behavior is non-ideal, since ai ≠ Ci.
The relationship between the equilibrium constant and the concentrations can be seen by grouping
the activity coefficients together as a single correction term, as follows:
K = (aB,eq/aA,eq) = (γBCB,eq)/(γACA,eq) = (CB,eq/CA,eq)*(γB/γA)
The equilibrium constant is constant and so its value is not affected by non-ideal behavior (e.g. at
higher concentrations), and the ratio of activities also remains equal to the equilibrium value. But
the ratio of concentrations, which we ordinarily think of as the equilibrium constant, is affected and
can change. You might then think of the ratio of concentrations as the non-ideal or ‘apparent’
equilibrium constant, whose relationship to the true, ideal equilibrium constant would be:
(CB,eq/CA,eq) = Kapp = K/(γB/γA)
And if the system were away from equilibrium then the expression for molar free energy for the
reaction would be the same as for the ideal case, except activities would replace concentrations in
the formulation of the reaction quotient Q. For the simple reaction A ⇌ B, for example,
ΔG = RT ln ((aB/aA)/K) = RT ln (((γBCB)/(γACA))/K)
The equations above are of course specific for the simple equilibrium between A and B, but the idea
generalizes immediately to any reaction or stoichiometry.
For the more complex reaction A + B ⇌ 2C, beginning with 2µC = µA + µB, we would end up with
K = aC,eq²/(aA,eq*aB,eq) = (γCCC,eq)²/(γACA,eq*γBCB,eq) = CC,eq²/(CA,eq*CB,eq) * γC²/(γA*γB) and
CC,eq²/(CA,eq*CB,eq) = Kapp = K/(γC²/(γA*γB))
And for the molar free energy if the system is away from equilibrium,
ΔG = RT ln ((aC²/(aA*aB))/K) = RT ln (((γCCC)²/(γACA*γBCB))/K)
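The bookkeeping for the apparent equilibrium constant of A + B ⇌ 2C can be expressed in a few lines. This is a sketch; the function name and the numerical activity coefficients are arbitrary choices of mine.

```python
def K_app(K, gamma_A, gamma_B, gamma_C):
    """Apparent (concentration-based) equilibrium constant for A + B <-> 2C:
    Kapp = CC^2/(CA*CB) = K / (gamma_C**2 / (gamma_A * gamma_B))."""
    return K / (gamma_C**2 / (gamma_A * gamma_B))

# Ideal limit: all activity coefficients equal 1, so Kapp == K.
K_ideal = K_app(10.0, 1.0, 1.0, 1.0)

# Non-ideal example with made-up coefficients: penalizing the product
# (gamma_C > 1) lowers Kapp, shifting the apparent equilibrium left.
K_nonideal = K_app(10.0, 0.9, 0.9, 1.2)
```
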
Ion-ion interactions in solution as an example of non-ideal behavior
(Debye-Hückel theory)
Here we will examine how ions in an electrolyte (salt) solution behave. As you know, charged species
repel or attract each other depending on whether their charges have the same or opposite signs. This
affects the positions that ions exhibit (on average) as they move around freely in solution. We will
contrast what happens when we have a very dilute (meaning ideal) electrolyte solution compared to
when the concentrations of ions get higher. In the dilute limit, the ions are so far apart that their
electrostatic properties do not influence each other. In contrast, at higher concentrations the positive
ions will prefer to be in the vicinity of negative ions, and vice versa, while like charges will prefer to
be farther from each other. That means that, on average, a positive ion will find itself surrounded by
a slight excess of negatively charged ions, and likewise a negative ion will find itself surrounded by a
slight excess of positively charged ions; remember that we always have a mixture of positive and
negative ions in an electrically neutral solution. The ions are moving around in solution, so the effect
is subtle, but significant. From this argument you can see that each ion should enjoy a favorable
energetic interaction with its ‘counter-ion atmosphere’. Referring to our earlier discussions, this
favorable energetic contribution corresponds to an activity coefficient for the ions that is < 1.
A quantitative treatment of the energetics of electrolyte solutions was developed by Debye and
Hückel, and is worked out in detail in some texts. Here we will simply summarize the essential ideas.
Ionic strength and the Debye length
First we explain the idea of the Debye-length. Each ion is surrounded by a counter-ion atmosphere
whose total charge offsets the charge on the central ion. How is that opposing charge distributed (on
average) as a function of distance from the central ion? At a very long distance from the central ion
of interest the attractive force is small, so the counter-ion atmosphere drops to zero at long distance.
In addition, the amount of opposing charge that can exist very close to the central charge is limited
since the available volume at very small distance becomes small. So, as diagrammed below, the
amount of counter-ion charge goes up and then down with distance, and its maximum value is at a
distance referred to as the Debye length, 1/κ. The increased counter-ion concentration in the vicinity
of a central ion also has the effect of ‘screening’ or diminishing the electrostatic force or field that is
exerted by a given ion, and the Debye length also describes that effect. From Coulomb’s law you’ll
remember that the electrostatic potential φ at a distance r from a central ion is proportional to 1/r
(that is, φ ∝ 1/r), and that equation would apply in the infinitely dilute limit. When screening
becomes significant owing to an increase in the concentration of ions, then φ ∝ (1/r)exp(-κr).
A simple computer simulation is shown for ions moving around in solution under forces of attraction
and repulsion. A snapshot is shown along with a calculation of the average counter-ion charge around
a negatively charged ion. The Debye length 1/κ is indicated.
What is the value of 1/κ? This depends mainly on the total concentration of ions in solution; more
exactly, it depends on the ionic strength, I.
For reasonably dilute solutions the equation for ionic strength is
I = (1/2) Σi Ci zi²
where the Ci are the molar concentrations of the charged species, zi is their charge, and the sum is
over all ions. Note that the squaring of z gives positive contributions for anions as well as cations.
The dependence of κ on I is complex, but for aqueous solutions near 298 K,
1/κ ≈ 3.04 Å/sqrt(I) where I is understood to be in molar concentration units
So, for example, if the ionic strength of a solution is 0.001 M, then 1/κ = 96 Å, whereas if I = 0.1 M,
then 1/κ = 9.6 Å. For reference, recall that the sizes of and distances between bonded atoms are in the
1 Å to 1.5 Å range.
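The ionic strength sum and the approximate Debye length for water near 298 K can be sketched directly; the helper names are mine, and the 3.04 Å prefactor is the common approximate value for those conditions.

```python
import math

def ionic_strength(conc_charge_pairs):
    """I = (1/2) * sum(Ci * zi**2), with Ci in molar and zi the ion charge."""
    return 0.5 * sum(C * z**2 for C, z in conc_charge_pairs)

def debye_length_angstrom(I):
    """Approximate Debye length 1/kappa for water near 298 K: ~3.04 A / sqrt(I)."""
    return 3.04 / math.sqrt(I)

# 100 mM NaCl: I = 0.5*(0.1*1 + 0.1*1) = 0.1 M, giving 1/kappa of roughly 9.6 A.
I_nacl = ionic_strength([(0.1, +1), (0.1, -1)])

# 100 mM MgCl2: the divalent cation dominates: I = 0.5*(0.1*4 + 0.2*1) = 0.3 M.
I_mgcl2 = ionic_strength([(0.1, +2), (0.2, -1)])
```

Note how the z² weighting makes a divalent salt triple the ionic strength of a 1:1 salt at the same concentration.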
Activity coefficients for ionic species
A quantitative treatment of how ions are surrounded by a counter-ion atmosphere makes it possible
to calculate the theoretical magnitude of the favorable energy of interaction between an ion and its
counter-ion atmosphere. This energy of interaction will be the source of non-ideality in the
electrolyte solution, so mathematical expressions can be obtained for the activity coefficient for an
ion. Without derivation, the following is obtained. For a given charged species, i:
ln(γi) = -(zi²e²/(2εkBT)) · κ/(1 + κa)
where a is the radius of the ion. Under relatively dilute conditions, 1/κ >> a, and κa << 1, so the κa
term drops out of the denominator to give
ln(γi) ≈ -(zi²e²/(2εkBT)) · κ
In aqueous solutions near 298 K this equation, and the dependence of κ on sqrt(I), can be combined
and reduced to a simple approximate expression:
ln(γi) ≈ -1.2 zi²√I where I is understood to be in molar concentration units.
Note from the equation above that the activity coefficient is < 1 for each species, regardless of charge
sign, which is consistent with our qualitative discussion above. And note that as the ionic strength
goes to zero (e.g. under highly dilute conditions), the log of γ goes to 0 and therefore γ goes to 1, as
expected for ideal conditions.
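The simplified expression translates directly into code (gamma_ion is my name for the helper), and exercising it confirms the qualitative points: γ < 1 for either charge sign, the effect is much stronger for multivalent ions, and γ → 1 as I → 0.

```python
import math

def gamma_ion(z, I):
    """Simplified Debye-Huckel activity coefficient in water near 298 K:
    ln(gamma) = -1.2 * z**2 * sqrt(I), with I in molar units."""
    return math.exp(-1.2 * z**2 * math.sqrt(I))

g_plus = gamma_ion(+1, 0.01)      # < 1
g_minus = gamma_ion(-1, 0.01)     # identical: z enters only as z**2
g_divalent = gamma_ion(+2, 0.01)  # much smaller, since z**2 = 4
g_dilute = gamma_ion(+1, 0.0)     # ideal limit: exactly 1
```
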
Using ionic activity coefficients to analyze the effect of charge on molecular association, and
electrostatic screening
We can use the activity coefficient equation above to gain insight into how ionic strength affects
molecular association between charged molecules (e.g. proteins or nucleic acids) in solution. We’ll
set up an abstract problem where a molecule A has charge zA and a molecule B has charge zB, and A
and B can come together in some association or binding process to form species C, whose charge is
zA+zB.
From our previous discussions, we can quickly write out how we expect the equilibrium position of
this binding process to be affected by total ionic strength. Note that if we are dealing with large
molecules like proteins, their molarity is usually very low, so the charges on the molecules in question
(here A and B) may not contribute meaningfully to the total ionic strength. The total ionic strength
we’re talking about here would more likely relate to how much salt we added to the experiment. So
we’ll imagine that the ionic strength is something we control separately from whatever is happening
regarding A and B and their association.
What do we expect to happen to the equilibrium above if the charges on A and B are opposite and we
start adding salt? You’ve likely learned about electrostatic ‘screening’ before, which is the idea that
high salt concentration tends to mask or diminish any electrostatic force that two charged molecules
might exert on each other. So, intuitively you might expect that in the case where A and B have
opposite net charges that adding salt would lessen their tendency to associate and would therefore
shift the equilibrium position to the left.
Let’s set up the equilibrium equation for this process and see if we get the result we expect. Now that
we know how to handle non-ideal equilibrium expressions, we can write
Kapp = CC/(CACB) = K/(γC/(γAγB))
to describe how the non-ideal or apparent equilibrium constant would change according to the
values of the activity coefficients γi for the three species. From the simplified Debye-Hückel equation
we know how the activity coefficients of the three species should depend on their charges and on the
ionic strength I. Exponentiating the previous equation for how γ depends on I, we would get
γA = exp(-1.2*zA²*sqrt(I)) and similarly for γB, and γC = exp(-1.2*(zA+zB)²*sqrt(I))
Following some rearrangements,
Kapp = CC/(CACB) = K* exp(2.4*zA*zB*sqrt(I))
or
ln (Kapp) = ln (K) + 2.4*zA*zB*sqrt(I)
These equations confirm that if the charges on A and B have opposite sign, then Kapp would be lowered
(since the product of zA and zB would be negative) and the equilibrium position for the reaction would
therefore be shifted to the left by increasing ionic strength. This is precisely what we expected based
on higher ionic strength screening the attractive electrostatic force between A and B. And note that
the effect would be opposite if A and B were of like charge; the overall driving force for their
association in that case might arise from other non-electrostatic interactions, and an increase in ionic
strength would diminish the electrostatic repulsion between them.
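The screening argument can be sketched numerically; the function name, charges, and intrinsic K below are invented for illustration, and the relation used is the one just derived, Kapp = K·exp(2.4·zA·zB·√I).

```python
import math

def K_apparent(K, zA, zB, I):
    """Salt dependence of an A + B <-> C association, from simplified
    Debye-Huckel activity coefficients: Kapp = K * exp(2.4*zA*zB*sqrt(I))."""
    return K * math.exp(2.4 * zA * zB * math.sqrt(I))

K = 1e6  # made-up intrinsic (dilute-limit) association constant

# Oppositely charged partners: added salt screens the attraction, so Kapp falls.
K_opp = K_apparent(K, +2, -3, 0.15)

# Like-charged partners: screening the repulsion instead raises Kapp.
K_like = K_apparent(K, +2, +3, 0.15)
```
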
Molecular crowding and excluded volume effects as an example of non-ideal behavior in
solutions of macromolecules
The idea of excluded volume
Earlier we alluded to the idea that solutions containing very large solute molecules might give rise to
non-ideal behavior. This phenomenon is sometimes described in the context of ‘molecular crowding’
or ‘excluded volume’ effects. To understand the phenomenon we need to consider a solution that
contains some large solute molecules already, and think about what effect their presence has on our
ability to add another copy of the solute. The molecules cannot occupy the same space. Therefore,
across the entire volume of the system, some of the locations are excluded as possible positions for
placing a new molecule. That is the excluded volume. To a first approximation, the relationship
between molecular crowding and the activity coefficient for a macromolecule can be written as
γ = Vtot/(Vtot – Vexcl) where Vtot is the total volume of the system and Vexcl is the excluded volume.
Note that this implies that molecular crowding effects correspond to γ > 1. Geometrically interesting
aspects of molecular crowding come into play when we look more carefully at what is meant by the
aspects of molecular crowding come into play when we look more carefully at what is meant by the
excluded volume. The excluded volume is not simply the volume of space that is occupied by the
existing solute molecules. The complication is that we have to think about where we can and can’t
choose to position a new molecule, meaning where its center could or could not reside. As you will
see from the diagram below, the region where we cannot place (the center of) a new solute molecule
is much larger than the space actually occupied by the existing solute molecules. First we illustrate
the situation where the solute has the shape of a large sphere (e.g. a compact globular protein).
What the diagram shows is that the excluded volume in the case of spherical molecules is a sphere
with twice the radius of the individual molecule. That volume is therefore 8 times larger than the
volume actually occupied. That is, Vexcl = 8Vocc. By rearranging the approximation above for γ,
dividing numerator and denominator by Vtot, we see that γ = 1/(1-Vexcl/Vtot) = 1/(1-8Vocc/Vtot). As a result, even if a relatively small
fraction of the total space is occupied by macromolecules, the activity coefficient may be considerably
higher than 1. In this rough model, if 5% of the space is occupied, γ = 1/(1 - 0.4) ≈ 1.67.
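The rough spherical-crowding model amounts to a one-line function (the name is mine), and evaluating it at 5% occupancy reproduces the figure quoted above.

```python
def gamma_crowding(phi_occupied):
    """Activity coefficient for a spherical macromolecule in the rough
    excluded-volume model: Vexcl = 8*Vocc, so gamma = 1/(1 - 8*phi)."""
    return 1.0 / (1.0 - 8.0 * phi_occupied)

g = gamma_crowding(0.05)  # ~1.67 when 5% of the volume is occupied
```
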
The peculiar behavior of rigid elongated structures
This is interesting by itself, but the situation becomes much more intriguing when we consider
molecules whose shapes are highly elongated rather than spherical. Choosing a geometrically
tractable model, here we treat the case of a long rod-shaped molecule with a square cross-section,
whose dimensions are L x d x d, with L >> d. Again we can consider what volume of space is excluded
for placing (the center of) an added molecule in the proximity of another. The analysis is more
complicated than for the sphere because now the relative orientation of the two molecules matters.
Furthermore, as we carry out the same exercise as before in sliding the second rod around the first
one to see where we cannot place the second one, we must keep the orientation of the rods
unchanged; we are only asking about the allowable position for the second molecule at some fixed
orientation. First, we will consider the best case
scenario, which is where the two rods are parallel to each other. The result is similar to the case with
the spheres: the excluded volume would be (2d)(2d)(2L)=8Ld2, which is 8 times the volume of a
single molecule.
[Figures: excluded volume constructions for parallel rods and for perpendicular rods.]
But what about the case where the two rods are perpendicular to each other? This is the worst case
scenario. It takes more careful visualization in 3D (see the figure), but the excluded volume in this
case is (L+d)(L+d)(2d), which is 2d(L² + 2Ld + d²). If we compare this to the volume of one molecule
by dividing by Ld², we get a ratio of 2L/d + 4 (dropping the term 2d/L, which would be small). Now,
instead of getting a ratio of 8, we get a much higher number since L >> d. Returning to the earlier
equation for the activity coefficient, we see that γ could be large even when the fraction of the space
occupied by
the rod-shaped molecules is small. The real behavior of course would have to be an average (or really
an integral) of the behavior as a function of the angle between the rods. But the effect remains
substantial.
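The two geometric results above are easy to check numerically. This sketch (function names mine) computes the excluded-volume-to-molecular-volume ratio for parallel and perpendicular L × d × d rods.

```python
def excl_ratio_parallel():
    """Excluded volume / molecular volume for parallel L x d x d rods:
    (2d)(2d)(2L) / (L*d**2) = 8, independent of aspect ratio."""
    return 8.0

def excl_ratio_perpendicular(L, d):
    """Excluded volume / molecular volume for perpendicular rods:
    (L+d)(L+d)(2d) / (L*d**2), which is ~2L/d + 4 for L >> d."""
    return (L + d) * (L + d) * 2.0 * d / (L * d**2)

# For a 100:1 aspect ratio the perpendicular ratio is ~204, versus 8 for parallel.
r_perp = excl_ratio_perpendicular(100.0, 1.0)
```
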
How do these excluded volume ideas relate to real macromolecules? If we take the lessons to be
general ones that should apply even if the situations in question don’t involve molecules that look
exactly like spheres or rigid rods, then the implications are numerous. Protein folding is one relevant
example. Proteins have to be stable in their folded compact conformations compared to the unfolded
form in which their backbones would generally be flexible and much more extended. Certainly the
unfolded form of a protein should have a much greater excluded volume. As a result, under
conditions where crowding effects are significant, like when the overall concentration of
macromolecules is high, the activity coefficient for the unfolded form of a protein could be
significantly greater than 1. We can write an equilibrium process between the unfolded (U) and
natively folded (N) states:
U ⇌ N
If the equilibrium constant under dilute conditions is K, then following the procedures we developed
earlier we can write that under conditions where non-ideal (crowding) effects come into play,
CN/CU = K/(γN/γU)
We would expect molecular crowding effects to give γU > γN. The consequence is that (CN/CU) should
go up and the equilibrium position should be shifted to the right, towards the direction of native
folding, by crowded conditions. This is an important point given how crowded the conditions are in
the cell, and also how dilute typical conditions are when purified proteins are studied in the
laboratory. It may be that proteins are significantly stabilized in their folded states in the cell by
crowding; this is not reflected in typical laboratory studies.
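To make the effect concrete, here is a minimal numerical sketch, with purely illustrative activity coefficients (the values of K, γU, and γN below are assumptions, not measurements):

```python
# Illustrative sketch of how crowding shifts the folding equilibrium U <=> N
# via activity coefficients: CN/CU = K * (gammaU / gammaN).

K = 100.0          # dilute-solution equilibrium constant (favors the folded state)
gamma_U = 10.0     # assumed activity coefficient of the expanded unfolded state
gamma_N = 1.5      # assumed activity coefficient of the compact folded state

ratio_dilute = K                           # ideal (dilute) conditions
ratio_crowded = K * gamma_U / gamma_N      # crowded conditions
print(f"CN/CU: dilute = {ratio_dilute:.0f}, crowded = {ratio_crowded:.0f}")
```

Even modest differences in the activity coefficients shift the population noticeably toward the folded state.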
Another example involves highly elongated filamentous structures like F-actin and microtubules that
form in the cell by assembly of large numbers of protein subunits. The behaviors of these kinds of
protein filaments in the cell are influenced strongly by crowding. The effects are probably myriad,
but one basic effect in such systems is the tendency towards alignment or bundling of filaments.
Without working out a sophisticated model, one can still get a sense of why this is the case. Consider
the alternative scenarios
where you have a system
with filaments that are
either mainly aligned vs
randomly oriented. Now
consider trying to add an
additional filament (which is
a way of sensing the activity
of a component). Which case
allows for easier addition? The situation is illustrated above, where you can see clearly that adding
additional filaments is easier if they are more aligned. In that sense you can see that the activity
coefficient should be lower in the aligned case, and so that case will be favored as crowding comes to
dominate. Of course we know that entropy will tend to drive such a system in the other direction,
towards random molecular orientations, but at some point the crowding effects will prevail and favor
alignment. The alignment and bundling of protein filaments is likely functionally important in the
cell.
In this lecture we detailed just two kinds of physical phenomena – ionic interactions in solution and
crowding effects – that give rise to non-ideal behavior. But biological systems are highly complex,
and indeed non-ideal behavior can arise in many different ways.
CHAPTER 5
Chemical Potential and Equilibrium in the Presence of Additional Forces
There are many instances, in both cellular and experimental laboratory settings, where molecules
experience additional forces that contribute to the energy they feel, thereby affecting the equilibrium
positions of the processes in which they are involved. We will consider some examples here, and
develop a general framework for modifying our previous equations for chemical potential in order
to handle these situations. The essence is to rewrite our equation for the chemical potential for some
molecular species in a way that adds in the relevant extra energy:
μi = μi0 + RT ln Ci + energy term
with the added term relating to the energy experienced by the molecular component in question, on
a per mole basis.
Osmotic pressure
Osmotic pressure is a familiar phenomenon. It has important roles in cellular function, and is also
the basis for laboratory measurements to study molecular behavior, though this was more common
in the past than it is now. As you will recall, osmotic pressure refers to a pressure difference that
must be exerted to prevent water from moving across a semi-permeable membrane from a side
where the overall solute concentration is low to where it is higher.
We can set up a system with two chambers separated by a
semi-permeable membrane (permeable to water but
nothing else). The equilibrium process in question is
therefore the transport of water from one side to the other.
We will add protein to side A, where it will be confined to
stay.
We can see right away that the concentration of water on
the two sides is never going to be equal, and we recall from
earlier discussions that the chemical potential is
determined by the standard chemical potential plus the concentration term. So the only way water
can be at equilibrium between the two sides is if there is an additional force that is different between
the two sides, in this case a pressure on side A preventing flow of water from right to left. We can
write the equilibrium situation for water as follows:
For the chemical potential of water on the B side,
μH2O,B = μ0H2O
For the A side,
μH2O,A = μ0H2O,A + RT ln XH2O,A + (pressure term)
(Note that we are using X instead of C for the concentration of H2O)
To complete this analysis we need to know how the chemical potential energy should change as a
function of pressure.
∂μ/∂P = ∂/∂P (∂G/∂n) = ∂/∂n (∂G/∂P) = ∂V/∂n = V̄
where V̄ is the molar volume (for water in this case). Then, dμ = V̄ dP
and ΔμH2O = V̄H2O ΔP = V̄H2O Π, where Π denotes the osmotic pressure difference.
Now, if we use the term V̄H2OΠ to take the place of the extra term in the chemical potential for water
on the A side in our previous equation, we can equate the chemical potentials for water on the two
sides and rearrange to see that:
RT ln XH2O,A + V̄H2OΠ = 0
This equation provides more insight if we manipulate it to obtain an expression in terms of the
concentration of the solute instead of the concentration of water. We let X2 be the mole fraction of
the solute. X2 = 1 – XH2O. Then substituting above, and noting from Taylor's expansion that ln(1 – X2) = –X2 – X2²/2 – … ≈ –X2, as long as X2 is small, we get
RT X2 ≈ V̄H2OΠ
Now switching from mole fraction to molar concentration by noting that at low concentration X2 ≈
C2·V̄H2O, we get
Π ≈ RTC2
In other words, the osmotic pressure is proportional to the molar concentration of solute molecules
present, making it an example of a ‘colligative’ property.
It is sometimes convenient to further modify the osmotic pressure equation to convert from molar
concentration C, to weight concentration, c. We often have a better way of knowing the weight
concentration of a protein in solution, for example from a spectroscopic absorbance measurement
that reports on the approximate number of peptide groups or particular amino acid groups present
rather than the number of polypeptide chains present. Also, the conversion from molar to weight
concentration introduces a molecular weight term, and as a result information about molecular
weight (of a protein or nucleic acid) can be obtained. The conversion follows from c = MC where
lower case c is the weight concentration (typically in g/L = mg/ml) and M is molecular weight
(typically in g/mol). We get,
Π ≈ RTc2/M
Therefore, measuring Π and knowing the weight concentration allows approximation of the molecular weight: M ≈ RTc2/Π. Osmotic pressure is no longer a common biochemistry laboratory
technique, but later we will discuss more common experimental methods for molecular weight
determination.
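To get a feel for the magnitudes, here is a small numerical sketch of extracting M from a measured osmotic pressure; the concentration and pressure values below are hypothetical:

```python
# Estimate molecular weight from osmotic pressure via M = R*T*c2/pi.
# Working in SI units: c2 in kg/m^3 (numerically equal to g/L) gives M in kg/mol.

R = 8.314        # gas constant, J/(mol*K)
T = 298.0        # temperature, K
c2 = 5.0         # weight concentration: 5 g/L = 5 kg/m^3 (hypothetical)
pi_osm = 248.0   # measured osmotic pressure, Pa (hypothetical)

M = R * T * c2 / pi_osm            # kg/mol
print(f"M = {M*1000:.0f} g/mol")   # roughly a 50 kDa protein
```

Note how small the pressure is (a few hundred Pa, or a few cm of water) even for a fairly concentrated protein solution, which is one reason the method fell out of routine use.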
Osmotic pressure measurements are sometimes used to examine non-ideal effects in solution.
Modifying our previous equation to allow for non-ideal effects (and also realizing that other
approximations were introduced by truncation
of the Taylor’s expansion), we can write an
expression for osmotic pressure as follows:
Π ≈ RT (c2/M + B c2² + …)
In this expression, B captures the first order non-
ideality and is referred to as the second virial
coefficient. One way of extracting B from
measurement of osmotic pressure as a function
of solute concentration is illustrated here.
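The extraction rests on dividing the expression by c2: Π/c2 = RT(1/M + B·c2), so a plot of Π/c2 vs c2 has intercept RT/M and slope RT·B. A minimal sketch with synthetic data (the values of M and B are assumed, in consistent SI units):

```python
import numpy as np

# Fit M and the second virial coefficient B from pi/c2 vs c2,
# using pi/c2 = R*T*(1/M + B*c2). Data synthesized from assumed values.

R, T = 8.314, 298.0
M_true, B_true = 50.0, 1.0e-4              # kg/mol and mol*m^3/kg^2 (assumed)
c2 = np.array([1.0, 2.0, 4.0, 6.0, 8.0])   # kg/m^3
pi_osm = R * T * (c2 / M_true + B_true * c2**2)

# Linear fit of pi/c2 vs c2: slope = R*T*B, intercept = R*T/M
slope, intercept = np.polyfit(c2, pi_osm / c2, 1)
M_fit = R * T / intercept
B_fit = slope / (R * T)
print(f"M = {M_fit:.1f} kg/mol, B = {B_fit:.1e}")
```

With real data the scatter and any higher-order terms would limit the precision of B, but the plot construction is the same.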
Equilibrium sedimentation
We don’t often think about the effect of gravitational force on molecules in solution. But putting a
sample in a centrifuge is essentially the same as increasing the force of gravity (sometimes by a factor
of tens of thousands). If the solute molecule has a large mass (as do proteins and nucleic acids), then
these forces can have significant effects. Centrifugation is a widely used laboratory technique, and is
used in various modes for different purposes. Here we will consider a particular kind of
centrifugation experiment that is powerful for studying the molecular weights of macromolecules in
solution in a way that preserves their native conformations and assembly states.
We begin by thinking about what we expect to happen during centrifugation (or even under simple
gravitational forces) in two limiting cases: (1) where a solute is very small (e.g. ethanol or sucrose)
or (2) when the particle in question is massive (like a cell or a sand particle). If the solute is very
small, then we can spin the sample forever and the concentration of the solute will be uniform,
essentially equal throughout the tube, from the top to the bottom. Schematically, the result can be
diagrammed as shown, where r is the variable describing distance from the axis of rotation. [We draw
the tube and the position variable r horizontal since the axis of rotation is vertical.]
At the other extreme, if the particle in question is massive, then virtually all of it will go to the bottom,
and the concentration will be nearly zero everywhere else.
But what if the solute is intermediate in mass? Then we should expect its concentration profile to
somehow be in-between the two extreme cases illustrated: not uniform throughout the sample, but
also not completely sedimented to the bottom. In other words, we should get a higher concentration
towards the bottom and a lower concentration towards the top. And this situation should be stable,
meaning we can spin it forever and this is the final equilibrium result.
The idea is schematized above. But what is the exact form we expect for this curve? Surely it must
depend on the mass of the molecule, so how might we extract a value for the mass from the
equilibrium sedimentation behavior?
Qualitatively we can see that this is a situation of forces in balance. We end up with a concentration
that is unequal (higher towards the bottom), and we know that there must be an entropic driving
force in the opposite direction, favoring a more equal distribution. This is a balancing force that acts
against the gravitational or centrifugal force that is driving molecules towards the bottom of the tube.
This is a situation at equilibrium, so we can treat the problem with our general approach of setting
up a chemical potential equation that contains an extra energy term relating to work done by an
external force.
Imagine a solute molecule that is free to move between two positions in a tube. At equilibrium, the
chemical potential for the solute at those two positions must be equal (otherwise there would be
further net transport). So, we will solve our problem by requiring that dμ/dr = 0. But first we write
an equation for how we expect the chemical potential to depend jointly on concentration and
position in the tube, since we are ultimately interested in establishing how concentration and
position are related to each other.
µ = µ0 + RT ln C + U
where U can be viewed as a potential energy, on a per mole basis, that relates to movement of a solute in
the applied gravitational or centrifugal field.
Then, dμ/dr, which must be zero at equilibrium, is
dμ/dr = 0 = RT d(ln C)/dr + dU/dr
Generally, force F is the negative of the derivative of potential energy with respect to position, F = -
dU/dr, so rearranging we get
RT d(ln C)/dr = F
Now we can introduce the dependence on mass, since F=ma, where m is mass and a is acceleration.
But before proceeding with the equation above we have to expand on the meaning of the mass m in
the context of the current problem. What matters here is not simply the mass of the solute, but the
‘buoyant’ mass, meaning the difference between its mass and the mass of the water it displaces, which
of course depends on its volume. Also, to be consistent with the energy equation we need to work
out the relevant mass equation in per mole terms. The mass we need in our equation above is:
NA(m – v·ρH2O), where v is the volume of one solute molecule and ρH2O is the density of water.
We can replace the volume of one molecule with its 'specific volume' v̄ (which is volume per mass, or really just the reciprocal of density), times its mass. Including a subscript 2 to make it clear that the specific volume refers to the solute and not the solvent, we get:
NA(m – mv̄2ρH2O) = NAm(1 – v̄2ρH2O) = M(1 – v̄2ρH2O) (where M is the molecular weight of the solute)
The unitless term (1 – v̄2ρH2O) is referred to as the 'density increment' and is sometimes replaced with a single variable φ2 for simplicity. Note that if the solute is composed of material whose density is greater than that of water, which is true for proteins and nucleic acids (but not lipids), then v̄2ρH2O will be less than 1, and φ2 will be greater than 0.
Using this expression (Mφ2) for the buoyant mass on a per mole basis in our F=ma equation, we get F = Mφ2a.
Before returning to our equation that balanced the concentration gradient in the tube with force, we
point out that there are two different kinds of problems where these equations are useful, (1) where
the force is simply gravitational (in which case a = g), and (2) where we are doing centrifugation (in which case a = ω²r, from introductory physics, with ω representing angular velocity; also recall that ω = rpm·2π/60).
We will proceed to work out the equilibrium situation for centrifugation. Substituting F = Mφ2a = Mφ2ω²r into our previous equation for balanced forces, we get
RT d(ln C)/dr = Mφ2ω²r
d(ln C)/dr = Mφ2ω²r/RT    Then separating the derivative variables gives
d(ln C) = (Mφ2ω²/RT) r dr    And integrating gives
ln C |C0→C = (Mφ2ω²/2RT) r² |r0→r
ln C – ln C0 = ln(C/C0) = (Mφ2ω²/2RT)(r² – r0²)
or
C/C0 = e^((Mφ2ω²/2RT)(r² – r0²))
where r0 refers to some reference position in the tube and C0 refers to the concentration at that
position.
By matching the equation for ln(C) above to the standard form for a linear equation (y=mx+b), you
can see that plotting log of concentration with respect to the square of the position in the tube (i.e.
distance from the axis of rotation) should theoretically give a straight line whose slope is Mφ2ω²/2RT,
from which the value of M can be calculated, since the other variables represent known quantities. A
schematic diagram is shown.
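A small numerical sketch of this procedure (all parameter values are illustrative): synthesize an ideal ln C vs r² profile for an assumed M, then recover M from the fitted slope.

```python
import numpy as np

# Recover M from equilibrium sedimentation: ln(C) vs r^2 is linear with
# slope M*phi2*omega^2/(2*R*T). All numbers below are illustrative.

R, T = 8.314, 298.0
phi2 = 1.0 - 0.73                  # density increment for vbar = 0.73 mL/g, rho = 1 g/mL
omega = 10000 * 2 * np.pi / 60     # 10,000 rpm -> rad/s
M_true = 50.0                      # kg/mol

r = np.linspace(0.060, 0.070, 6)   # radial positions in the cell, m
C = np.exp(M_true * phi2 * omega**2 * (r**2 - r[0]**2) / (2 * R * T))  # C/C0

slope = np.polyfit(r**2, np.log(C), 1)[0]      # slope of ln(C) vs r^2
M_fit = 2 * R * T * slope / (phi2 * omega**2)  # invert the slope expression
print(f"M = {M_fit:.1f} kg/mol")
```

In practice the choice of rotor speed matters: too fast and everything piles up at the bottom, too slow and the gradient is too shallow to fit reliably.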
Furthermore, note that because weight
concentration (c) is proportional to molar
concentration (C), ln(c) differs from ln(C) only by an
additive factor. That means that you can plot ln(c)
vs r2, and the slope will be the same as above. This
is useful because the weight concentration of a
protein or nucleic acid sample is typically the easier
quantity to establish from a routine spectroscopic
measurement.
Note that in comparison to some other methods that
you might be familiar with for determining
molecular weights of proteins – e.g. SDS polyacrylamide gel electrophoresis – equilibrium
sedimentation keeps proteins in their native forms, including potential assemblies of multiple
subunits. It is therefore very useful for getting information about the oligomeric states of proteins,
i.e. whether they are dimers or trimers or larger species in solution.
Gravitational sedimentation
If the acceleration on a sample is due simply to gravity instead of centrifugal acceleration, then
instead of a = ω²r, we simply have a = g.
With analogy to the previous equations, we get
RT d(ln C)/dr = Mφ2g
Rearranging and integrating gives
ln C – ln C0 = ln(C/C0) = (Mφ2g/RT)(r – r0)
or
C/C0 = e^((Mφ2g/RT)(r – r0))
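Plugging in numbers is instructive: the concentration changes e-fold over a characteristic length RT/(Mφ2g). A quick sketch with illustrative masses (and an assumed density increment applied to all three for simplicity):

```python
# Characteristic length RT/(M*phi2*g) over which the equilibrium concentration
# changes e-fold under gravity alone. Masses and phi2 are illustrative.

R, T, g = 8.314, 298.0, 9.81
phi2 = 0.27   # assumed density increment

for name, M in [("small solute, M = 0.34 kg/mol", 0.34),
                ("50 kDa protein, M = 50 kg/mol", 50.0),
                ("large particle, M = 5e4 kg/mol", 5.0e4)]:
    h = R * T / (M * phi2 * g)   # e-fold length, meters
    print(f"{name}: e-fold length = {h:.3g} m")
```

This is why gravity alone hardly fractionates small solutes or even proteins in a test tube, while very large particles settle visibly over centimeters.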
Equilibrium sedimentation of a mixture
If a sample contains more than one type of macromolecular solute (e.g. two different proteins), then
its sedimentation behavior will be more complex. Each component will behave exactly as described
above, but if the multiple components have different masses, then their concentrations will increase
to different degrees as a function of position in the tube. As a result, if you were only able to measure
the total concentration of protein as a function of position, which would be the case if you were
relying on a typical spectroscopic reading, then your concentration profile would have an unusual
behavior that could not be fit to the equations we worked out above. The resulting concentration
profile would not match what you would expect for any choice of molecular weight for a single
component. That is, a plot of ln(c) vs r2 will not be straight, but curved.
Let’s look more specifically at how that plot would look. The slope of the curve should obey slope = d(ln(c))/d(r²) = Mφ2ω²/2RT. Now rewrite this in terms of d(c) instead of d(ln(c)) by noting that d(ln(c)) = (1/c)d(c). That gives d(c)/d(r²) = Mcφ2ω²/2RT. Now if the sample is a mixture and our
measurement is of the total weight concentration, then we can write an equation for the behavior of
the total weight concentration as a sum over the components:
dcT/d(r²) = d(Σi ci)/d(r²) = Σi dci/d(r²) = (φ2ω²/2RT) Σi Mici
Dividing by cT on both sides and then absorbing the cT in the denominator on the left into the
derivative of the log gives
d(ln cT)/d(r²) = (φ2ω²/2RT)(Σi Mici / cT)
We can compare the complicated equation above to the simpler equation we had before for a single
pure component. That previous equation, after rearranging the terms a bit to match the form above,
was:
d(ln c)/d(r²) = (φ2ω²/2RT) M
We can see that the equation for the slope of the curve of the log of the concentration of a mixture
matches the equation for the single component case, except that in the case of a mixture the molecular
weight M has been replaced with a term that gives a kind of average of the molecular weights of all
the components present, accounting for their concentrations in weight terms. Evidently, the slope of
the curve for the case of a mixture gives (after dividing by φ2ω²/(2RT)) an effective molecular weight,
Meff given by:
Meff = (Σi Mici) / cT
This is an example of a ‘weight-average’ molecular weight, since each component gets included
according to its weight concentration (as opposed to its molar concentration).
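As a quick illustration, the weight-average for a hypothetical monomer/dimer mixture:

```python
# Weight-average molecular weight, Meff = sum(Mi*ci)/cT.
# Hypothetical mixture: monomer (50 kg/mol) and dimer (100 kg/mol).

M = [50.0, 100.0]   # kg/mol
c = [0.6, 0.4]      # weight concentrations, g/L

M_eff = sum(Mi * ci for Mi, ci in zip(M, c)) / sum(c)
print(f"Meff = {M_eff:.1f} kg/mol")   # 70.0, between monomer and dimer
```

A number-average (weighting by molar concentration) would come out lower, since the lighter monomer contributes more molecules per gram.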
Clearly, extracting the molecular weights of multiple components from the equilibrium
sedimentation behavior of a mixture is a complicated challenge. We will not examine that problem
in any more detail, but some essential points can be made about the overall behavior. How would
the overall shape of a plot of ln(c) vs r2 look for a mixture compared to a pure component? We have
already established that it should not be straight, and that the slope should reflect the effective
molecular weight. Is the effective molecular weight greater or smaller as we move farther down the
tube? As we move to higher values of r (e.g. towards the bottom of the tube), the relative proportion
of the heavier components should be
greater since their concentrations
increase more rapidly with increasing
r. That means that the slope of the
curve, which is proportional to Meff,
should be higher at higher r. This
corresponds to upward curvature. The
plot on the right shows the result from
an equilibrium sedimentation run for a
protein assembly composed of 12
subunits. This result was interpreted
to indicate that the 12-subunit
assembly is in equilibrium with other
smaller subassemblies.
If a sample behaves like a mixture rather than a pure component, it may mean that we were
unsuccessful at purifying the desired component prior to the centrifugation experiment, so that
contaminants remained. But there is another possibility that occurs often. Many proteins associate
naturally into oligomeric forms, like dimers for example. At typical concentrations, a protein could
be at equilibrium between a monomer and a dimer form. What would happen then in an equilibrium
sedimentation experiment? This is clearly a case of a mixture, so we will see behavior like that
detailed above. Towards the bottom of the tube, the relative proportion of the dimer will be higher
than at the top. If the monomer and dimer are at equilibrium, then you might wonder how they could
be at equilibrium both at the top of the tube and at the bottom if the relative proportion of monomer
to dimer is different at the two positions. But if the problem is worked out in detail, one sees that
this is exactly as expected: the overall concentration of protein is higher at the bottom, and this
naturally gives a higher proportion of dimer at equilibrium. In other words, the entire system is able
to reach equilibrium both with respect to monomer-dimer association and with respect to the
dependence of concentration on position in response to the external centrifugal force.
Effects of non-ideal behavior
If a solute exhibits non-ideal behavior, then we might also observe deviation from the equations we
worked out above for the equilibrium concentration as a function of position. Specifically, we should
still expect a plot of log of activity of the solute to be linear when plotted as a function of r2, but the
concentration may be different from the activity.
Summary
In this chapter we examined the behavior of systems where molecules are under the influence of
external forces. We provided a general strategy for modifying the chemical potential equations, and
worked out the details for two situations: osmotic pressure and equilibrium sedimentation. These
are just two examples among many different ways that external forces come into play in biochemical
systems.
CHAPTER 6
Electrostatic potential energy, ion transport, and membrane potentials
In previous lectures we covered scenarios where molecules were subject to specific forces. In this
lecture we will look at ions that are subject to forces arising from voltage or electrostatic potential
differences. We discussed ionic interactions earlier in a different context, where we dealt with the
energy an ion experiences from being in solution with other ions around it. In this chapter we will
deal with electrostatic interactions in a different context. We will consider how ions are distributed
in space as a result of an electrostatic potential (i.e. a voltage) that is different at different locations.
A main focus will be on situations where the electrostatic potential is different on the two sides of a
semi-permeable membrane. This has wide applications to molecular biology and electrophysiology.
The chemical potential energy of an ion at a position of electrostatic potential
We need to know what energy to associate with a charge residing at a particular electrostatic
potential (which is a voltage). You’ll recall from introductory physics that the work required (or
potential energy generated) in moving a charge q to an electrostatic potential is U = q. For our
purposes we need energy on a per mole basis. You’ll recall that the charge on a mole of elementary
particles is NA·e = 6.02×10²³ × 1.6×10⁻¹⁹ C ≈ 96,500 C, which is defined as one Faraday, F. That
means if we are considering a particular kind of ion whose valence charge is z (e.g. zCa2+ = 2), then the
charge q on a mole of those ions will be q=zF. Finally, the potential energy gained by putting that
charge at an electrostatic potential ψ (on a per mole basis) is U = zFψ. From this we can write our
equation for chemical potential energy in the presence of an electrostatic potential:
μi = μi0 + RT ln Ci + ziFψ
The Nernst equation and membrane potential
Suppose we have a system with two chambers separated by a
membrane that is permeable to the ionic species in question. Our
interest here is in situations where the electrostatic potential is
different on side A vs side B, that is ψA ≠ ψB. The equilibrium process
of interest is the transfer of ionic species i from side A to side B. From
before we know that this means μi,A = μi,B.
The separate equations for the chemical potential of species i on the
two sides, taking into account electrostatic energies, would be:
μi,A = μ0i,A + RT ln Ci,A + ziFψA
and
μi,B = μ0i,B + RT ln Ci,B + ziFψB
Setting the chemical potentials equal to each other and rearranging (the standard chemical potentials cancel) gives:
Δψ = ψB – ψA = (RT/ziF) ln(Ci,A/Ci,B)
This is one form of the Nernst equation. It tells us that the electrostatic potential difference between
the two sides is related to the log of the concentrations of the ion on the two sides (assuming that the
ion is free to reach equilibrium across the membrane). Note the effect of the sign of the charge, z. A
negation of z reverses the effect. Consider first a case where z is positive (e.g. Na+ ions). The equation
tells us that if the potential is higher on side B (here meaning Δψ > 0), then the concentration of
positively charged ions will be higher on side A. The reverse is true for a negatively charged ion that
is free to equilibrate; it would be more concentrated on side B if the potential is more positive on that
side. At first this might seem backwards. How can the potential be higher on the right if the positively
charged ions are more abundant on the left? The short answer is that it is important to keep in mind
that the ions here are responding to an electrostatic potential that exists in the system. That is, the
unequal concentration of ions is an effect, not the cause, of the potential difference here.
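As a worked number in the sign convention used here, Δψ = ψB – ψA = (RT/zF) ln(CA/CB); the concentrations below are illustrative, chosen to resemble K+ across a cell membrane:

```python
import math

# Nernst equation in the convention used in this chapter:
# delta_psi = psi_B - psi_A = (R*T/(z*F)) * ln(C_A / C_B).
# Illustrative K+ concentrations: 5 mM on side A, 140 mM on side B.

RT_over_F = 0.0256         # volts, near room temperature
z = 1                      # valence of K+
C_A, C_B = 5.0, 140.0      # mM (only the ratio matters)

delta_psi = (RT_over_F / z) * math.log(C_A / C_B)
print(f"delta_psi = {delta_psi*1000:.1f} mV")   # about -85 mV
```

Note the sign: the side with the higher K+ concentration (B) sits at the more negative potential, consistent with the discussion above.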
It is instructive to note that the voltage difference Δψ in the equation does not carry a subscript for
the ion or its charge. That means that if there are multiple charged species in the system that are at
equilibrium between the two sides, then the ratios of their concentrations must give the same value
for Δψ. Evidently the concentration ratios for different ions must be related to each other. We can
write out two versions of the equation above, one for ion i and the other for ion j, and then equate the
two potentials to give:
ln(Ci,B/Ci,A)/zi = ln(Cj,B/Cj,A)/zj and (Ci,B/Ci,A)^(1/zi) = (Cj,B/Cj,A)^(1/zj)
As an example, if Na+ and Cl- ions are both at equilibrium between the two sides, then
(CNa+,B/CNa+,A) = (CCl–,A/CCl–,B) and (CNa+,B·CCl–,B) = (CNa+,A·CCl–,A). The equation would be
more complex if the charges were not plus or minus one, but in this simple case the product of sodium
and chloride ions on the two sides is equal at equilibrium. This result will be convenient in a
calculation shortly.
The previous equations describe equilibrium conditions. Away from equilibrium, the free energy on
a per mole basis for ion transport between positions A and B, where the ion concentrations and
electrostatic potentials may both be different, would be:
ΔG = RT ln(CB/CA) + zF(ψB – ψA)
The Donnan potential
So far we have discussed how ions that can equilibrate are driven to unequal concentrations
depending on the electrostatic potential difference that exists between two positions. But what might
be the source of the electrostatic potential difference? We already discussed that it is not caused by
the unequal distribution of ions that are able to equilibrate – their distribution is in the opposing
direction. One possibility would be an external applied voltage with electrodes on the two sides.
Problems based on those sorts of electrochemical cells are typically discussed in introductory
chemistry courses. But electrostatic potential differences – between the outside and inside of a cell
for example – are a common subject in cellular and biochemical systems, and in those cases an
external battery voltage is rarely the origin of the electrostatic potential.
Here we illustrate a highly simplified
system that shows how an unequal
distribution of an ion that cannot cross a
membrane can give rise to an electrostatic
potential. This is called a Donnan
potential. We begin with a simple two-
chamber system like before, but now we
put a protein molecule on side A only, and
assume it has a negative charge.
Counterions would also be present so we’ll
begin the setup with an equi-molar
concentration of protein- and Na+ ions on
the A side. We’ll denote this starting
concentration as x. Now in addition let’s
say that a certain amount of salt (NaCl) is
added to both sides to start. Call this
concentration s. Now let’s say that the Na+
and Cl– ions can cross the membrane but the protein cannot. What happens? We can answer that
question by supposing that some quantity of Na+ and Cl– crosses the boundary in order to reach
equilibrium; the amounts of Na+ and Cl– that cross should be equal in order to maintain
electroneutrality. Let’s assume the volumes of the two sides are equal for simplicity, so that the molar
concentration change as a result of Na+ and Cl– movement is the same on both sides, and call that value
d (plus d on the right, minus d on the left). Now we can establish the concentrations at equilibrium
by using the equation we worked out earlier that told us the product of Na+ and Cl– concentrations
must be the same on the A side and the B side at equilibrium, assuming they are both free to
equilibrate.
We get:
(x+s-d)*(s-d) = (s+d)*(s+d)
which, after expanding and cancelling the s² and d² terms, simplifies to
sx - dx = 4sd
The values of x and s are fixed quantities related to the initial concentrations. Solving for the desired
quantity d gives:
d = sx/(x+4s)
Having solved for d, we can write out expressions for the final concentrations of the Na+ and Cl– ions.
From there we can ask whether there is an electrostatic potential difference by applying the Nernst
equation to the concentration of ions on the two sides. As discussed before we should get the same
answer regardless of whether we examine the Na+ or Cl– ions since both are able to equilibrate.
Taking the Na+ ions, we get:
Δψ = ψB - ψA = RT/(Fzi) ln(CNa+,A/CNa+,B) where z for Na+ is +1
Plugging in the expression for d at equilibrium, the argument in the log function is
(x+ s - sx/(x+4s))/(s + sx/(x+4s)) which simplifies to (x+2s)/(2s). So,
Δψ = ψB - ψA = RT/(Fzi) ln((x+2s)/(2s))
This equation is very specific for the way we set up the problem, so it doesn’t represent a general
finding, but it does let us evaluate the electrostatic potential under a given set of initial conditions.
Suppose for example that the concentration of salt added (s) was equal to the molar concentration of
the protein (x). Then the argument to the log function is simply (x+2s)/(2s) = 3/2. The value of RT/F
near room temperature is 0.0256 V (or 25.6 millivolts), which is a useful simplification worth
remembering. Finally, we get
Δψ = 0.0256 V * ln(3/2) ≈ 0.010 V = 10 mV
Note that the way we defined Δψ means that the potential is higher on the B side compared to the A
side. Where did this voltage come from (since we didn’t apply an external voltage)? It comes from
the charge on the species (the protein in this case) that is confined to one side. Note that the protein,
which we took to have a negative charge, is generating a negative potential on the side where it
resides.
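The algebra above can be checked numerically; a short sketch reproducing the s = x case, and confirming that the Na+ and Cl– ions report the same potential:

```python
import math

# Check the Donnan example: protein (charge -1) at concentration x on side A,
# salt s added to both sides; d = s*x/(x+4s) crosses from A to B at equilibrium.

RT_over_F = 0.0256      # volts
x, s = 1.0, 1.0         # the s = x case from the text (arbitrary concentration units)

d = s * x / (x + 4 * s)
Na_A, Na_B = x + s - d, s + d    # Na+ concentrations on the two sides
Cl_A, Cl_B = s - d, s + d        # Cl- concentrations on the two sides

# Both equilibrating ions must give the same psi_B - psi_A
dpsi_Na = (RT_over_F / +1) * math.log(Na_A / Na_B)
dpsi_Cl = (RT_over_F / -1) * math.log(Cl_A / Cl_B)
print(f"Na+: {dpsi_Na*1000:.1f} mV, Cl-: {dpsi_Cl*1000:.1f} mV")   # both ~10 mV
```

Raising s relative to x in this sketch drives both values toward zero, the excess-salt limit discussed below.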
The situation is actually a bit more complicated. For example, how could there be a voltage if we assumed electroneutrality on the two sides, since a voltage difference implies some charge separation? This is a fair
objection, but the energy associated with macroscopic charge separation is very high, so while there
would in fact be a small amount of charge separation creating net charge on the two sides, that minor
charge imbalance would not affect the ion concentrations significantly. Evidently, very slightly more
Na+ than Cl- would cross from left to right and this would give a slight charge separation with net
negative charge resulting on the left, consistent with the negative voltage on the left.
Another point of interest is to look at what would happen in a system like this if we were to add
excess salt, that is s >> x. In that case, the argument of the log function from above ((x+2s)/(2s))
approaches 1, and the log goes to 0, so Δψ ≈ 0. This shows that the Donnan potential goes away if
excess ions are present that can equilibrate freely between the two sides.
Variable ion permeabilities and complex phenomena
In our previous discussions we treated simplified situations where different ions were either
completely free to equilibrate or totally unable to permeate the membrane. This was helpful in
gaining intuition about what drives the creation of electrostatic potentials, but relevant biological
scenarios are much more complicated. The membrane has very different degrees of permeability to
different ions. Furthermore, the distributions of ions across a cell membrane do not reflect
equilibrium ratios but are instead the result of a steady state process (or even a dynamic process
changing over time). Depending on their permeabilities, the ions are flowing down their chemical
(or electrochemical) gradients at the same time that transmembrane protein pumps are continuing
to transport them against those gradients. Ion permeabilities are therefore fundamental to
understanding the potential across the biological membrane. For example, it is changes in the
permeability of the membrane for certain ions that drives changes in the membrane potential during
nerve conduction. How can the cell membrane have different (and controllable) permeabilities to
different ions? Transmembrane protein channels provide the answer. They can be highly specific
for certain ions. And whether they are in open or closed conformations can be controlled by ligand
binding or other phenomena, including things like pressure.
A simplified scheme at the right illustrates the concentration gradients of Na+ and K+ across a
typical cell. These gradients are created at the expense of energy input (e.g. ATP hydrolysis). The
inward pumping of K+ and the outward pumping of Na+ results in K+ being higher inside the cell and
Na+ being higher outside the cell. Now, if the membrane is more permeable to K+ ions than Na+ ions,
by virtue of a potassium channel for example, then we can understand the resulting membrane
potential. [The membrane potential in a typical cell is in the range of -40 to -80 mV, meaning the
inside of the cell has a negative potential.] One thing we learned was that a Donnan potential is
created by ions that can’t cross the membrane (or that cross very slowly). In the cellular scheme
here, the Na+ ions are the ones least able to cross (since the K+ channel doesn’t allow Na+ ions to pass),
and the higher concentration of this species outside the cell is consistent with the outside having the
positive potential. Another way of looking at it is in terms of the net charge separation that would
occur across the membrane. The gradients for Na+ and K+ are in different directions. Na+ ions are
trying to move back into the cell as fast as they can while K+ ions are trying to exit across the
membrane as fast as they can. But owing to the K+ channel, K+ ions exit more easily than Na+ ions
enter, thereby creating a small charge separation with more positive charge on the outside, again
consistent with the correct sign of the voltage across the cell. Note that the concentration gradients
of the ions in this scenario cannot simply be used to evaluate the membrane potential because
the ions are not reaching equilibrium between the two sides; their concentration gradients reflect
the activity of membrane pumps.
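The qualitative picture above, a potential set jointly by the ion gradients and the relative permeabilities, is commonly summarized by the Goldman–Hodgkin–Katz equation (not derived here). A sketch restricted to K+ and Na+, using assumed, typical mammalian concentrations and a K+-dominated permeability ratio:

```python
import math

def ghk_potential_mV(P_K, P_Na, K_out, K_in, Na_out, Na_in, T=310.0):
    """Goldman-Hodgkin-Katz potential (inside minus outside, mV) for K+ and Na+:
    Vm = (RT/F) * ln((P_K*K_out + P_Na*Na_out) / (P_K*K_in + P_Na*Na_in))."""
    RT_over_F_mV = 8.314 * T / 96485.0 * 1000.0  # ~26.7 mV at 310 K
    return RT_over_F_mV * math.log((P_K * K_out + P_Na * Na_out) /
                                   (P_K * K_in + P_Na * Na_in))

# Assumed typical concentrations (mM): K+ 5 out / 140 in; Na+ 145 out / 10 in.
# Membrane taken to be 25x more permeable to K+ than Na+:
Vm = ghk_potential_mV(P_K=1.0, P_Na=0.04, K_out=5, K_in=140, Na_out=145, Na_in=10)
print(round(Vm, 1))  # falls in the -40 to -80 mV range quoted in the text
```

Raising P_Na in this sketch (mimicking the opening of sodium channels) drives Vm toward 0, the depolarization described below.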
Finally, you can see from the preceding arguments how changing the ion permeabilities would affect
the membrane potential. Those permeabilities are controlled by opening and closing, or ‘gating’ of
membrane channels. A simplified description of how nerve conduction depends on ion
permeabilities goes something like this: a neuron with a negative resting potential receives a signal
that causes Na+ channels in the membrane to open; according to our earlier discussions, this reduces
the membrane potential (i.e. raising it closer to 0); this depolarizing voltage change is conducted
down the length of the axon like an electrical current down a wire; at the axon terminal, this
depolarization across the membrane causes Ca2+ channels in the axon terminal to open up; the resting
Ca2+ concentration is higher in the synaptic space between neurons, so Ca2+ ions flow into the axon
terminal; the increasing Ca2+ concentration inside the axon terminal triggers the fusion of synaptic
vesicles with the inner membrane of the axon terminal, releasing the enclosed neurotransmitter into
the synaptic space; the neurotransmitter diffuses across the synaptic cleft and binds to receptors on
the adjacent cell (e.g. a muscle cell or another nerve cell); depending on the cell receiving the signal,
binding of the neurotransmitter to the receptor may open up a sodium channel on the next neuron
to propagate the electric signal, or cause some other event, like muscle contraction or sensory
signaling.
Molecular Electrostatics
We will continue with our discussion of electrostatics, focusing here on the forces they exert and the
effects they have on macromolecules and their conformations. We are familiar with Coulomb’s law,
which tells us that the force between two charges goes as the product of the two charges divided by
r squared.
F = q₁q₂ / (εr²)
This form of the equation applies when using cgs units.
[The SI form of Coulomb’s law includes an extra factor of 4πε₀ in the denominator, which is dropped
from the cgs equation by having it absorbed into the definition of the cgs electrostatic unit of charge.]
A related equation for potential energy (U) can be obtained by taking the force equation and
integrating over r and negating (since F=-dU/dr) to give:
U = q₁q₂ / (εr)
In addition, recalling that the energy for placing a charge q at a potential Φ is U = qΦ, we can see that
the equation above implies that a single charge creates a potential around it given by

Φ = q / (εr)
The dielectric value
The equations above for electrostatic forces and energies are likely familiar, but what is sometimes
overlooked is the importance of the medium in which the interactions take place. This is captured
by the dielectric value 𝜖, which occurs in the denominator. Roughly speaking, the dielectric describes
how polarizable the medium is. For a vacuum, 𝜖=1, which is why the term is sometimes dropped in
the equations above, for example in introductory physics problems. But it is vital for biochemical
situations. The dielectric value for water is around 78! That means electrostatic energy calculations
that take place in aqueous solutions may be off by nearly two orders of magnitude if the dielectric
value is not handled properly. The extremely high dielectric value for water relates to its large dipole
moment. Water molecules in an electric field tend to orient themselves in a way that gives the lowest
energy, i.e. with the oxygen atom pointing in the direction opposite of the electric field vector. The
effect is to diminish or screen the net electrostatic force.
The dielectric value in less polar materials is much lower than in water. For hydrocarbons (which
serve as a model for the interior of a lipid membrane) the dielectric value is between about 2 and 4.
As we will discuss later, charged amino acids are important in protein structure, and so the value of
the dielectric for a protein molecule is an important (and long debated) issue. Values between 4 and
20 occur in the literature for the dielectric in the interior of a protein. For a charge that resides on
the surface of a protein, exposed to water, the relevant value is probably close to that for pure water.
Simplified electrostatics equations
The equations above are clumsy to apply unless you remember what the value is for an elementary
charge in cgs electrostatic units. Instead it is convenient to convert the equations to forms that can
be applied more easily, using integer values, z, for the charges (e.g. z=1 for Na+). Simplified equations
that apply near room temperature are:
U = z₁z₂ / (εr) · 1389 kJ/mol (where r must be in Angstroms)

and

Φ = z / (εr) · 14.4 Volts (where r must be in Angstroms)
Examples:
1) How much energy does it take to bring two Na+ ions from a starting distance of infinity to a final
distance of 4 Å if the dielectric value is 78? Answer: 4.5 kJ/mol. Is this energy significant or not?
Recall RT ≈ 2.5 kJ/mol, so the magnitude of the effect would be exp(-4.5kJ/2.5kJ)=0.17.
2) How much energy might an ion-pair contribute to the stability of a folded protein? Suppose the
situation in question is an aspartate side chain that is 5 Å away from a lysine. Suppose the interaction
takes place near the protein surface where the high dielectric of water makes the effective dielectric
there about 40. Answer: -6.9 kJ/mol, and the magnitude of the effect on K would be
exp(+6.9kJ/2.5kJ)=16.
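Both examples follow directly from the simplified formula above; a quick sketch (the function name is mine):

```python
import math

RT = 2.5  # kJ/mol near room temperature

def coulomb_energy_kJ(z1, z2, eps, r_angstrom):
    """Simplified Coulomb energy: U = z1*z2/(eps*r) * 1389 kJ/mol, r in Angstroms."""
    return z1 * z2 / (eps * r_angstrom) * 1389.0

# Example 1: two Na+ ions brought to 4 A apart in water (eps = 78)
U1 = coulomb_energy_kJ(+1, +1, 78, 4.0)            # ~ +4.5 kJ/mol
print(round(U1, 1), round(math.exp(-U1 / RT), 2))  # effect ~ 0.17

# Example 2: Asp(-)...Lys(+) at 5 A with an effective eps of 40
U2 = coulomb_energy_kJ(-1, +1, 40, 5.0)            # ~ -6.9 kJ/mol
print(round(U2, 1), round(math.exp(-U2 / RT)))     # effect ~ 16
```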
A different kind of electrostatic energy: the Born ‘self-charging energy’
A powerful but underappreciated idea arises by considering a hypothetical process of creating a unit
charge out of infinitesimal charge elements, dq. [This is an example of a kind of ‘thought experiment’
referred to by physicists as a ‘gedanken experiment’; essentially an experiment that can only be
performed in one’s mind.]
From our previous discussions we know that the (differential) energy required to bring a (differential)
charge dq to a position where the potential is Φ is dU = Φ dq. Here, Φ is the potential at the place where
we are depositing the charge, which is at the surface of the ion being created (in our imagination). If the
radius of the ion is a, then from above we know the potential there is q/(εa), where q is whatever charge
has already been
deposited. We can obtain the hypothetical energy for creating this charge,
which is usually referred to as the Born self-charging energy, by
integrating our differential energy over q.
Born self-charging energy: U = ∫dU = ∫₀^q (q′/(εa)) dq′ = (1/2)(q²/(εa))

Again, this can be made more convenient:

Born self-charging energy: U = z² (1/(εa)) (1389/2) kJ/mol (where a must be in Angstroms)
The imaginary idea of creating a charge from nothing may seem silly at first, but it gives us a powerful
result relating to the energy for a (very real) process of transferring an ion between two locations
where the dielectric is different.
Free energy of ion transfer
As we discussed earlier, the cell is complex and the dielectric is different in different places. The low
dielectric of the lipid bilayer is particularly noteworthy, especially given the physiological importance
of ion passage through membranes. Let’s look then at the energy associated with transferring an ion
from aqueous solution into a lipid bilayer. [We’ll assume here that the energy can be considered as
a contribution to the free energy of the process.] We can think of the transfer process as a
composition of separate steps: reversing the imaginary ion creation process in the first medium, then
transferring the infinitesimal charges into the second medium (at no energy cost since they are
infinitesimal), and then recreating the charge in the second medium. Evidently, the transfer free
energy is just the difference between the energies required to create the charge in the two different
media. If the dielectric values for the two media are 𝜖1 and 𝜖2, then the free energy of ion transfer
from medium 1 to medium 2 would be:
ΔG_transfer = (z²/a) (1/ε₂ − 1/ε₁) (1389/2) kJ/mol
Things to note here are that z is squared, so the effect is the same for positive or negative ions. Second
is that the energy is positive (i.e. unfavorable) if the transfer is to a lower dielectric, as expected.
Third, note the dependence on the radius of the ion; the energy term is larger if the ion is small since
the charge is more localized. Lastly is the magnitude of the effect. Consider the free energy of
transferring a sodium ion from water into the middle of a lipid bilayer. Take 1 Å as an approximation
for the ion radius. Let the dielectric be 4 for the bilayer and 80 for the water. Under those
approximations, ΔG_transfer = 1²·(1/1 Å)·(1/4 − 1/80)·(1389/2) kJ/mol ≈ 165 kJ/mol. Is this big
or small? 165 kJ/mol = 66·RT! This is an enormous energy barrier. This exercise demonstrates the
virtual impermeability of a lipid bilayer to naked ions. What is going on at the atomic level that would
explain why an ion prefers so strongly to remain in water rather than in the lipid? In water (or another
material of high dielectric), the surrounding molecules rearrange themselves to interact with the ion
in ways that correspond to an overall favorable energy. This is not possible in the lipid bilayer (or
other medium of low dielectric). The same underlying idea is sometimes described in terms of the
energy required to desolvate an ion when it is moved into a non-polar environment.
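The transfer-energy calculation above is easy to reproduce; a minimal sketch:

```python
def born_transfer_kJ(z, a_angstrom, eps_from, eps_to):
    """Born free energy of ion transfer (kJ/mol):
    dG = (z**2 / a) * (1/eps_to - 1/eps_from) * 1389/2, with a in Angstroms."""
    return (z ** 2 / a_angstrom) * (1.0 / eps_to - 1.0 / eps_from) * 1389.0 / 2.0

# Na+ (radius ~1 A) from water (eps 80) into a bilayer interior (eps 4):
dG = born_transfer_kJ(z=1, a_angstrom=1.0, eps_from=80, eps_to=4)
print(round(dG))  # ~165 kJ/mol, about 66*RT
```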
Processes that generate, maintain and exploit ion gradients across membranes form the basis for
most of the energy conversions that occur in biology. This is only possible because ions cannot easily
cross the bilayer. The complex energy conversion processes in the cell are conducted by
transmembrane proteins (pumps, channels, and transporters) that reside in the bilayer.
CHAPTER 7
Energetics of Protein Folding
Proteins acquire their unique functions by folding up into specific three-dimensional structures. This
gives them the shapes and arrangements of chemical groups required to carry out their activities.
The details of how proteins manage to reach their correct shapes from a starting point of being
extruded in a more or less extended conformation from the ribosome, and what energetic features
stabilize their final configurations, are questions of considerable importance from a fundamental
perspective and for practical reasons as well. Numerous biotechnology and pharmaceutical
problems revolve around stabilizing enzymes or other types of proteins.
A balance between large opposing forces
You’ve learned before about simple molecules that have multiple conformations whose relative
energies dictate which is more preferred; the chair vs boat conformation for cyclohexane derivatives
is an example from organic chemistry. The problem of protein stability is considerably more
complex. One thing that makes the protein stability problem unique is the sheer size of the molecules
involved; thousands or tens of thousands of atoms are interacting with each other. Another
important consequence of the size and mainly linear covalent structure of protein molecules is the
vastness of the possible configurations each one could adopt in principle, by variation of the phi-psi
torsion angles along its backbone, not to mention the side chain conformations. The vastness of this
conformational space (practically all of which represents non-native configurations of the protein)
means that folding a protein into its native configuration comes at an enormous cost in terms of lost
entropy. This cost must be offset by very large numbers of favorable interactions between the
thousands of protein atoms in the natively folded conformation.
The arguments above paint a unique picture for protein energetics. The total net energy that
stabilizes a folded protein over its unfolded state is typically not very large; it arises as a relatively
small difference between very large energetic terms working in opposing directions. You can
imagine then that rather small changes to the amino acid sequence of a protein might offset this
balance, and indeed minor changes to the sequence of a protein often have surprisingly large (and
frequently unexpected) effects on protein stability and function.
As a rough numerical estimate, the conformational entropy lost in going from a flexible protein
backbone to a particular conformation is about 5 kcal/mol per amino acid residue, meaning 1000
kcal/mol for a 200 amino acid protein for example. [Confusingly, protein energetics are often
discussed in kcal instead of kJ; multiply by about 4.2 to convert from kcal to kJ.] In contrast, the net
stability (ΔG0) for the process of protein folding is much smaller, often in the range of -5 to -10
kcal/mol (unfolded → folded). This is fairly large compared to RT, so typical proteins have
stabilities that keep them nearly exclusively in their correctly folded configurations (at least under
the right conditions, though those aren’t always known or easy to replicate in vitro), but as noted
above, this net stability is small compared to the magnitudes of the opposing energetic terms that
must balance out in the end.
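The quoted net stabilities translate into very small unfolded fractions at equilibrium; a quick sketch (RT ≈ 0.6 kcal/mol near room temperature):

```python
import math

RT_kcal = 0.593  # kcal/mol near 298 K

def fraction_unfolded(dG0_kcal):
    """Fraction unfolded for a two-state U <-> N equilibrium with the given
    folding dG0 (kcal/mol): K = [N]/[U] = exp(-dG0/RT); X_U = 1/(1 + K)."""
    K = math.exp(-dG0_kcal / RT_kcal)
    return 1.0 / (1.0 + K)

# For net stabilities of -5 and -10 kcal/mol:
for dG0 in (-5.0, -10.0):
    print(dG0, fraction_unfolded(dG0))
```

Only a tiny fraction of molecules are unfolded at any instant, a point that matters for the measurement problem discussed later in the chapter.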
Terms that contribute to the energetics of protein folding
You are familiar already with the various forces involved in atomic interactions. Here we will
summarize how those forces relate to the particular problem of protein stability.
Electrostatics
We discussed charge-charge or ‘salt-bridge’ interactions earlier. They contribute to the stabilization
of proteins with magnitudes that are somewhat modest since they tend to occur near the surface of
proteins where the high dielectric of water reduces their strength. But they can be important,
particularly with the view that net protein stabilization has to come from the accumulation of many
smaller energetic contributions.
Following from our earlier discussions, the unfavorable energetics of putting a charge in a region of
lower dielectric also has important consequences for protein structure. Several of the natural amino
acids are charged (aspartate, glutamate, lysine, and arginine), so we expect to find those amino acids
almost exclusively on the exterior of a protein when it is in its correctly folded configuration. Charged
amino acids are occasionally found buried in the interior of a protein, but those are usually cases
where that particular amino acid is playing a critical role, for example in the catalytic cycle of an
enzyme; it is sometimes necessary for a protein to pay the cost of an unfavorable energetic feature in
order to achieve a required function.
The cost of burying a charge inside a protein also means that the natural pKa values of amino acids
can be significantly different compared to the textbook values that give the pKa value of amino acids
dissolved in water. Placement of a titratable group (i.e. a group that can gain or lose protons) in a low
dielectric shifts the equilibrium position towards the neutral form. Does that raise or lower the pKa
of a carboxylate group? What about the amino group of lysine? How much do you expect the pKa
values to change? [Hint: You know how pH and pKa values relate to concentration ratios: 1 unit equals
a factor of 10. And you know how energy differences affect equilibrium ratios: divide the energy by
RT and exponentiate. And you know how to estimate the energetic cost of burying a charge according
to our previous equations for ion transfer.]
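To make the hint concrete, here is a sketch of the estimate with assumed, illustrative numbers: a carboxylate of effective radius 2 Å moved from water (ε ≈ 80) into a protein interior with ε = 10. The Born transfer energy penalizes the charged form, and dividing that energy by 2.303·RT converts it into pKa units (raising the pKa of a carboxylate, lowering that of a lysine amino group):

```python
import math

RT = 2.5  # kJ/mol near room temperature

def born_transfer_kJ(z, a_angstrom, eps_from, eps_to):
    """Born free energy of ion transfer (kJ/mol), a in Angstroms."""
    return (z ** 2 / a_angstrom) * (1.0 / eps_to - 1.0 / eps_from) * 1389.0 / 2.0

# Assumed numbers: carboxylate radius ~2 A, water eps 80, interior eps 10
dG = born_transfer_kJ(z=-1, a_angstrom=2.0, eps_from=80, eps_to=10)

# Penalty on the charged form converted to pKa units: dG / (2.303 * RT)
delta_pKa = dG / (math.log(10) * RT)
print(round(dG, 1), round(delta_pKa, 1))  # a shift of several pKa units
```

The exact number depends strongly on the assumed radius and interior dielectric, but the qualitative conclusion, shifts of several pKa units, is robust.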
van der Waals or London dispersion forces: favorable atomic packing
You’ll recall from earlier coursework that van der Waals or London dispersion forces, sometimes
colloquially called ‘packing’ forces, are relatively weak. That may be true on an individual basis, but
with thousands of atoms the effects are extremely important. It is notable that when one examines
the structures of proteins in atomic detail, the atomic packing is seen to be generally very tight. The
atomic packing density in most protein interiors is similar to the packing seen in solid crystals of
organic molecules. That is a manifestation of the favorable energy associated with atom-atom
contacts on a large scale.
How does the good packing achieved in protein interiors relate to the stability of the protein in the
native state? The answer here is not so straightforward. Note that if the protein was in an unfolded
configuration it would likely be able to make good atomic contacts with the water molecules
surrounding it; water molecules are small enough to be arranged in ways that would give good
packing. The more important consideration is that if a protein had the wrong amino acid sequence,
for example if we mutated a small side chain to a large one or vice versa, then the atomic packing in
the natively folded configuration may be seriously disrupted. In that sense, favorable packing in a
protein may not be a major driving force towards folding, but if the native packing is compromised
then surely the folded configuration will be destabilized compared to the unfolded state.
Energies due to packing defects have been estimated to be about 0.5 kcal/mol per methylene-sized
cavity. That gives a rough estimate for the consequence of replacing a larger amino acid side chain
with one that does not fill the space properly. On the other hand it is hard to estimate the effect of
adding a larger amino acid side chain. You’ll recall that modeling the van der Waals potential energy
using the Lennard-Jones equation gives a very sharp rise (going as the 12th power of the interatomic
separation) in energy for steric overlap. So the cost of adding even one methylene to a place where
there might not be space can be catastrophic for the stability of the folded state.
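The steepness of that repulsive wall is easy to see numerically. A sketch of the Lennard-Jones form U(r) = 4ε[(σ/r)¹² − (σ/r)⁶] with illustrative, assumed parameters:

```python
def lennard_jones(r, sigma=3.4, epsilon=0.24):
    """Lennard-Jones pair energy: 4*eps*((sigma/r)**12 - (sigma/r)**6).
    sigma in Angstroms, epsilon in kcal/mol (illustrative values)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

# Energy at the bottom of the well vs. under modest compression:
r_min = 2 ** (1 / 6) * 3.4               # ~3.8 A, the well minimum
print(round(lennard_jones(r_min), 2))    # -0.24 kcal/mol (the well depth)
print(round(lennard_jones(3.0), 1))      # compressing by <1 A already costs a few kcal/mol
```

Because the repulsion rises as r⁻¹², a further small compression sends the energy up catastrophically, which is the point made in the text.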
Hydrogen bonding
Hydrogen bonding is a very specific type of interaction; it has some features of bonding (i.e. orbital
overlap) but it is mainly an electrostatic feature. It arises from (1) a hydrogen atom that carries a
partial positive charge owing to its covalent attachment to an electronegative atom (typically N or
O), which is referred to as the ‘donor’, and (2) a lone pair of electrons on an electronegative atom
(typically N or O), which is referred to as the ‘acceptor’. The strongest hydrogen bonds are where the
lone pair, the hydrogen, and the heavy atom attached to the hydrogen are arranged at least roughly
in a straight line.
The energy contributed by a hydrogen bond can be estimated from model studies of small molecules
in solution to be in the range of about 5 kcal/mol. There are however many nuances regarding
hydrogen bonding: hydrogen bonds involving a negatively charged acceptor like a carboxylate can be
extra strong, the lower dielectric of the protein interior might magnify the effects of hydrogen
bonding, multiple hydrogen bonds working together might benefit from a cooperative effect, and so
on. As a result, the role of hydrogen bonding in proteins is a constantly discussed and debated issue. However,
certain points are clear. Proteins are full of hydrogen bond donors and acceptors. This is true of the
polypeptide backbone in particular; every peptide unit has a carbonyl acceptor and an amide
nitrogen donor. The cumulative energetics of hydrogen bonding is therefore substantial. It is
important to note however that water has excellent hydrogen bonding properties, so a protein in an
unfolded configuration can satisfy all its hydrogen bonding groups through interactions with water.
Accordingly, it may be that hydrogen bonding is not a major driving force for folding. On the other
hand, following the same logic as above regarding the importance of good packing in the natively
folded state, if we were to alter a protein in such a way that we created unsatisfied hydrogen bond
donors or acceptors in the interior, then this would surely destabilize the protein (since those donors
and acceptors could satisfy their hydrogen bonding needs by exposure to water in the unfolded
state). For example, if a serine side chain is buried in the interior of a natively folded protein and its
hydroxyl group is hydrogen bonded to a histidine, and then the histidine is replaced by mutation to
something like valine that lacks the required hydrogen bonding capacity, the serine would have
unsatisfied hydrogen bonding needs, and this could be highly destabilizing.
Hydrophobic effect
As you may have learned before, the hydrophobic effect is generally accepted to be the major driving
force for protein folding, at least for typical globular proteins. But the hydrophobic effect is in fact
not a separate force. It is instead a complex phenomenon arising from many-body interactions. The
net effect is that nonpolar molecules or functional groups are driven to associate with other nonpolar
molecules by being excluded from interactions with water. The name “hydrophobic” conjures the
idea that a nonpolar molecule doesn’t like water because it can’t make good interactions there, but a
closer look shows something a bit different. Three different kinds of interactions are possible here:
nonpolar-nonpolar, nonpolar-water, water-water. The nonpolar-nonpolar interaction benefits from
favorable van der Waals energies. A nonpolar molecule also benefits from good van der Waals
interactions if it is surrounded by water. So from the perspective of the nonpolar molecule, the
energetic difference is small whether it interacts with another nonpolar molecule or with water. But
things are different from the perspective of the water. Water makes highly favorable hydrogen
bonding interactions with itself. Some of those interactions must be lost if a water molecule is in
contact with a nonpolar molecule. So really it is the water molecules that don’t want to interact with
the nonpolar solute. The effect is the same in any case: the two kinds of molecules are driven to have
as little interaction with each other as possible. From the description you can see that
the magnitude of the unfavorable energy relates to the surface area of the interaction.
As a side note, it is surprising to learn that when the unfavorable free energy associated with
transferring a nonpolar solute from an organic phase to water is examined in more detail
experimentally, one finds that the unfavorable free energy is not the result of a positive enthalpy
change, but is instead the result of a negative (unfavorable) change in entropy. This has been
explained by noting that at the interface between a nonpolar solute and water, the water molecules
are driven into highly ordered arrangements (sometimes referred to as clathrates) presumably in
order to recover as many of their lost hydrogen bonds as possible.
The magnitude of the hydrophobic effect has been estimated to be about 22 cal/mol per Å2 of
interaction area. A typical amino acid side chain has an area in the 100 – 200 Å2 range, and of course
many of the natural amino acids are nonpolar. Clearly the cumulative magnitude is very large. And
perhaps most critical is that these hydrophobic interactions are entirely different in the unfolded vs
the natively folded state (in contrast to some of the other energetic terms we discussed earlier). In
the unfolded state, enormous amounts of nonpolar surface would be exposed to solvent. As a result,
for a correctly folded protein molecule, the nonpolar side chains are mainly buried in the interior in
the ‘hydrophobic core’.
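The cumulative magnitude can be estimated from the quoted figure of ~22 cal/mol per Å². A sketch, assuming a 150 Å² nonpolar side chain and a hypothetical count of buried residues:

```python
# Hydrophobic transfer estimate: ~22 cal/mol per A^2 of buried nonpolar area
COST_PER_A2 = 22.0       # cal/mol per A^2 (figure quoted in the text)

side_chain_area = 150.0  # A^2, mid-range for a nonpolar side chain (assumed)
per_residue = COST_PER_A2 * side_chain_area / 1000.0  # kcal/mol

# Hypothetical: a small protein burying ~50 nonpolar side chains on folding
total = per_residue * 50
print(round(per_residue, 1), round(total))  # 3.3 kcal/mol each; ~165 kcal/mol total
```

Even with rough inputs, the cumulative term is of the same order as the conformational entropy cost it must offset.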
The special case of membrane proteins
More than a quarter of the protein molecules coded for by a cell are not soluble in the cytosol, but
instead spend their lives embedded in a lipid bilayer (either the cell membrane or a membrane
surrounding one of the various organelles in a cell). The energetic considerations for these
transmembrane (or TM) proteins are unique in some profound ways.
Enforcement of regular secondary structure in the membrane region
One of the most profound effects of the lipid bilayer environment is on the secondary structure of
proteins that span the membrane. We discussed the importance of proteins being able to satisfy all
or nearly all of their hydrogen bonding groups in order to maintain stability. And we noted that the
polypeptide backbone is full of hydrogen bond donors and acceptors that need to be satisfied. In
aqueous proteins, the need to satisfy backbone hydrogen bonds can be achieved with relative ease.
The backbone can adopt regular secondary structure elements (which by their nature satisfy
backbone hydrogen bonding), in various orientations and with turns or longer unstructured loops
connecting them in almost unlimited fashion. The figure below (left) is just one example of the
tertiary structure of an aqueous protein. Aqueous proteins can have practically limitless structures
because regions of the backbone that are not in regular secondary structure elements can satisfy
their hydrogen bonding needs using water instead. That is not possible for transmembrane proteins.
The lipid bilayer is almost devoid of water. Therefore, where the protein is embedded in the bilayer,
it must practically always adopt strictly regular secondary structure so that the backbone will be
satisfied. There are two basic classes of transmembrane proteins: those that consist of a bundle of
alpha helices (or sometimes just one TM helix), and those that consist of a beta barrel, which is
essentially a beta sheet that is rolled up so that there are no unsatisfied edges. Those two classes are
illustrated below (middle and right). The alpha helix class is more abundant, but the beta barrel class
is common where large pores in a membrane are needed, i.e. in the outer membrane of many bacteria.
There are a few known cases where a protein enters only part way into the membrane or forms some
other structure that seems to involve unsatisfied hydrogen bonds, but they constitute rare
exceptions.
The problem of the missing hydrophobic effect
Another major difference between TM proteins and aqueous proteins concerns the hydrophobic
effect and how TM proteins can be stabilized in their native forms. You might have already surmised
that the outer surface of a TM protein (at least the region that is embedded in the bilayer) needs to
be nonpolar. Otherwise it would not partition into the membrane. But that is a major distinction
compared to aqueous proteins. Aqueous proteins have polar/charged surfaces and nonpolar
interiors, and that is what drives their folding in the presence of water. But if TM proteins have
nonpolar interiors and nonpolar exteriors as well, and are not surrounded by water in any case, then
it seems the hydrophobic effect cannot play a major role. The real situation is somewhat more
complicated, but the puzzle remains largely unresolved.
Measuring the Stability of Proteins
Much of our previous discussion has concerned the stability of protein molecules, meaning how much
lower the energy of the native configuration is compared to the unfolded configuration(s). In essence
we want to know K, and hence ΔG0 (from ΔG0 = -RT ln K), for the process:

U ⇌ N, where U denotes unfolded and N denotes natively folded
The first thing we need is to be able to tell what fraction of the protein in a sample is folded and what
fraction is unfolded (i.e. XN and XU=1-XN). This requires experimental measurement of some property
that is sensitive to whether a molecule is folded or not. We will talk later about various kinds of
experiments that satisfy this requirement – the natural fluorescence of tryptophan tends to depend
on whether it is in a polar or nonpolar environment, so you can see how that might suffice – but for
now we will keep it abstract and just say that there is some property P that we can measure for a
sample, and that the value of P should change depending on what fraction of the protein is folded.
At this point you might feel like you have enough to figure out K. If your measurement tells you what
fraction is folded, then you know the equilibrium constant for folding is K=XN/XU. But, we have a
problem related to sensitivity. Even though the native state of a typical protein is not extremely
stable, it is usually stable enough that K = exp(-ΔG0/RT) is a large number, meaning the fraction of the
protein that is unfolded is very small, in fact too small to measure accurately. Another way of seeing
this is that a practical measurement is not going to be able to tell the difference between whether 1 in
1,000 molecules are unfolded vs 1 in 1,000,000. The difference in the signal between those two cases
would be too small to measure even though the value of K would be different by a factor of 1,000.
What is the solution to this problem? In order to get at the value of ΔG0, the standard approach is to
artificially shift the system towards the unfolded state by adding a chemical denaturant like urea. If
we go to conditions where both the native and unfolded states are reasonably populated, then at that
point we can figure out what fraction of the protein is folded. This can be done by measuring the
value of our property P under those conditions and then comparing it to the values of the property
under conditions where the protein is fully folded and where it is totally unfolded. The algebra for
this is shown in the figure. Intuitively you can see how the procedure makes sense. If you measure
the property under some amount of denaturant and you see that the value of the property is exactly
halfway between the value you get for fully folded and fully unfolded, then the sample must be half
folded and half unfolded, i.e. K=1. The calculation for an arbitrary degree of folding is only slightly
more complex.
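The two-state algebra can also be written out directly: the fraction folded follows from linear interpolation of the measured property between its fully-folded and fully-unfolded baseline values (the variable names here are mine):

```python
import math

RT_kcal = 0.593  # kcal/mol near 298 K

def folding_dG0(P_obs, P_folded, P_unfolded):
    """Two-state analysis of a folding measurement:
    X_N = (P_obs - P_U) / (P_N - P_U);  K = X_N / (1 - X_N);
    dG0 = -RT ln K, returned in kcal/mol."""
    X_N = (P_obs - P_unfolded) / (P_folded - P_unfolded)
    K = X_N / (1.0 - X_N)
    return -RT_kcal * math.log(K)

# A signal exactly halfway between the baselines means half folded: K = 1, dG0 = 0
print(folding_dG0(P_obs=0.5, P_folded=1.0, P_unfolded=0.0))
```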
We see now that under conditions where both forms of the protein are populated we can calculate K
and then ΔG0. But that experiment would just tell us what ΔG0 was under conditions where we had
added denaturant to destabilize the protein. What good is that? The answer is that if we repeat the
experiment at several different denaturant concentrations, we should be able to calculate ΔG0 as a
function of denaturant concentration.
77
Then, if we believe a simplistic theory that argues that G0 depends on denaturant concentration in
a roughly linear fashion, then we should be able to estimate G0 in the absence of denaturant, by
extrapolation. This final step of the analysis is illustrated above.
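The extrapolation step amounts to fitting a straight line and reading off its intercept. In this sketch the urea concentrations and ΔG° values are invented for illustration (ΔG° is taken here as the unfolding free energy, so a positive value means a stable protein), and the assumed linear form is ΔG°([urea]) = ΔG°(water) − m·[urea]:

```python
import numpy as np

# Linear extrapolation of ΔG° (unfolding) to zero denaturant.
# Hypothetical data, constructed to lie on the line dG = 8.0 - 1.5*[urea].
urea = np.array([4.0, 5.0, 6.0, 7.0])     # urea concentration, M
dG   = np.array([2.0, 0.5, -1.0, -2.5])   # ΔG° of unfolding, kcal/mol
slope, dG_water = np.polyfit(urea, dG, 1) # best-fit slope and intercept
# dG_water is the extrapolated stability in the absence of denaturant
```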
Ideas Related to How Proteins Reach their Folded Configurations
Our discussions up to this point have concerned only the initial (unfolded) and final (natively folded) states of the protein. How or why a protein finds its correctly folded configuration is another question, and one
that has occupied protein scientists for the last half-century.
In 1961 Christian Anfinsen performed seminal experiments showing that the enzyme ribonuclease
A (RNaseA) could be unfolded and then refolded after removal of the denaturant. This showed that
the native three-dimensional structure of the protein is encoded in the linear amino acid sequence.
This seems a bit obvious decades after the fact, but the demonstration that the protein could find its
correct structure outside the cell, without other influences, was an important conceptual advance.
Anfinsen asserted that this meant that the amino acid sequence encoded the correct three-
dimensional structure by having the correct three-dimensional structure be the lowest possible
energy. That idea is known as the “Thermodynamic Hypothesis”. Some 60 years after Anfinsen, we
understand that the situation is rather more complex.
In 1969 Cyrus Levinthal formalized an argument that the number of possible configurations a protein
could conceivably adopt is vastly greater than could ever be sampled by a protein molecule wiggling
around in solution in a reasonable time. Yet most proteins fold on the time scale of seconds or faster.
His calculation was something like this. Consider a protein with 200 amino acids. Assume that only
three different phi-psi backbone configurations need to be sampled at each amino acid position –
based on the idea of choosing between helix or beta or random loop – which is clearly an
underestimate. And suppose that a protein can sample a new configuration at a speed that is limited
by molecular vibrations (from quantum mechanics, kBT/h ≈ 10^13/sec).
The time required to sample 3^200 conformations would be 3^200/10^13 sec, vastly greater than the age of the universe. How can the Thermodynamic Hypothesis make sense if there isn't any way a protein molecule could search the space of all possible configurations in order to end up at the lowest energy configuration? This is known as the 'Levinthal paradox'.
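Levinthal's arithmetic is easy to reproduce as a back-of-the-envelope calculation:

```python
import math

# Levinthal's estimate: 3 backbone conformations per residue, 200 residues,
# sampling at roughly k_B*T/h ≈ 1e13 configurations per second.
n_conformations = 3 ** 200
search_time_s = n_conformations / 1e13           # seconds to visit them all
search_time_yr = search_time_s / (3600 * 24 * 365)

# For comparison, the age of the universe is about 1.4e10 years.
ratio = search_time_yr / 1.4e10                  # astronomically large
```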
The Levinthal paradox has motivated decades of research on protein folding. Work in the 1980’s,
especially by Robert (Buzz) Baldwin and Peter Kim, focused on the idea of specific ‘pathways’ that
might guide a protein from its unfolded state to its native state. For example:
U → I1 → I2 → … → N
where I1 and I2, etc., are well-defined intermediates that would be populated on the way to the native state. Work along this line involved a search for cases where well-defined intermediates could be detected.

How can one differentiate between a process that occurs as a two-state transition (without any populated intermediates) vs. a process with populated intermediates? There are multiple distinctions. One has to do with kinetic behavior. We will discuss such topics later, but for the moment we will just say that a single step (or two-state) transition gives a simple exponential approach to the final equilibrium position, whereas a process with multiple transitions can give more complicated kinetics, including a lag phase, as diagrammed here. In a few cases, experiments have identified specific protein folding intermediates, but they have not emerged as a general feature of protein folding. Other ideas have developed to advance our understanding.
‘Energy landscape’ theories were developed (by Peter Wolynes and Ken Dill and others), with the main idea that the multi-dimensional energy landscape surface for proteins must be smooth and funneled rather than rugged. Figures like the one drawn here illustrate the basic idea of a good vs bad energy landscape for rapid folding. If the energy landscape is rugged, then the folding process is likely to get trapped in a local minimum. The idea of a smoothly funneled landscape also lifts the requirement for pathway intermediates. Instead, all downhill routes lead to the native state. Under this idea, evolution would have selected amino acid sequences and structures whose energy landscapes were favorable.
Other ideas related to protein folding
Current ideas like energy landscape theory offer a good framework for understanding protein
folding. But important questions remain. For example, it turns out that many proteins do not fold
spontaneously to their native states either in vitro or in vivo. Many proteins in the cell rely on
sophisticated protein machinery known as molecular chaperones, which consume ATP to help
proteins reach their correct configurations. Apparently those proteins either do not have smoothly
funneled energy landscapes, or perhaps their native states are not the state of lowest energy. Another
wrinkle is that at high concentration and given sufficient time (or partially destabilizing conditions),
many proteins adopt an alternate beta-rich conformation and then aggregate into ‘amyloid’ fibrils.
Amyloid formation is suspected as the basis for a growing number of diseases, from Alzheimer’s to
Parkinson’s to Lou Gehrig’s. Does this mean that the lowest energy configuration for some proteins
is not the natively folded state seen in the cell, but the amyloid fibril state instead? Finally, there are
some rare proteins that have extremely peculiar folded structures in which the protein backbone is
tied in a knot! How do those proteins reach their native states? The energy landscapes in those cases
would seem to be rather complex and require traversal of narrow valleys to reach the native state.
The points above are of fundamental interest in biology, but they also have potentially important
practical implications. Much work has been done in the last few decades (and some notable progress
has been made) on the problem of predicting the three dimensional structures of proteins from their
amino acid sequences alone. What does it mean for those efforts if proteins might have lower energy
configurations than their native states? As you can see, the area of protein folding remains rich with
open questions.
CHAPTER 8
Describing the Shape Properties of Molecules
Some of our previous discussions have introduced the idea that shape is an important consideration
for the behavior and function of macromolecules. Later in the course we will talk about techniques
for determining the three-dimensional structures of macromolecules in atomic level detail. But now
we will discuss more simplified descriptions of shape that are sometimes obtained from biophysical
measurements in the laboratory.
Radius of gyration
Often we have an object or molecule whose shape is reasonably compact but not really spherical.
How might we assign a single size scale that would describe such an object, in the same way that a
radius describes the size of a sphere? The ‘radius of gyration’ (RG) provides this. It is essentially an
average radius, but more accurately it is an ‘rms’ or root-mean-square radius. The general meaning
of root-mean-square is, taking the monikers in reverse order: square the quantities, then average
them, then take the square root. You’re undoubtedly familiar with this in the context of rms deviation
from the mean (of test scores for instance). For the radius of gyration, two general cases arise: (1) a
collection of discrete points or atoms (like you would have once the detailed structure of a protein is
known, for example), and (2) a continuous shape defined by a boundary (like an ellipsoid for
example). We will handle the discrete case first.
Discrete objects
Assuming that the points that make up the object should all be given equal weight, the formula for
radius of gyration is:
RG = [ (1/N) Σi ri² ]^(1/2)
Here and in the later equations for the continuous case, it is vital to note that the radius of each point, ri, is its distance to the center of mass of the object. An entirely different and incorrect result will be obtained if the center is not defined correctly. A simple example for an object composed of 5 points arranged like on the face of a die is shown. This example happens to be two-dimensional, but the situation is equivalent in three dimensions. As reminders, the distance between two points in three-dimensional space is r = sqrt((Δx)² + (Δy)² + (Δz)²), and the center of mass of a collection of points is obtained simply by averaging their x, y, and z coordinates separately.
You can see that this is a simple procedure, so calculating the radius of gyration given the atomic coordinates of any molecule, large or small, is straightforward.
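As a concrete sketch, here is the discrete formula applied to a five-point layout like the face of a die. The coordinates are an assumed layout (four corners of a 2×2 square plus the center) standing in for the figure:

```python
import math

# Radius of gyration for a discrete set of equally weighted points.
def radius_of_gyration(points):
    n = len(points)
    # Center of mass: average each coordinate separately.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Mean squared distance to the center of mass, then square root.
    mean_r2 = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / n
    return math.sqrt(mean_r2)

# Five points arranged like the face of a die (hypothetical coordinates).
die_face = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0)]
rg = radius_of_gyration(die_face)   # sqrt(8/5) ≈ 1.265
```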
Objects with continuous shapes
For objects that have a continuous shape, the summation in the equation for radius of gyration must
be replaced by an integral, and the division by the number of points must be replaced by division by
the volume. Before doing that, we’ll just point out the one shape where the radius of gyration requires
no calculation.
Spherical shell (not to be confused with a solid sphere):
Every point on a spherical shell has the same radius, r. So the radius of
gyration, RG=r.
Solid sphere:
The general equation for radius of gyration for a continuous object is:

RG = ( ∫V r² dV / ∫V dV )^(1/2) = ( (1/V) ∫V r² dV )^(1/2)
For a solid sphere of radius r, the simplest way to integrate over the whole volume is as a series of infinitesimally thin shells of radius x and thickness dx. The differential volume of such a shell is dV = 4πx² dx. The integral above becomes:

RG,sphere = ( ∫0→r x² · 4πx² dx / V )^(1/2) = ( ∫0→r 4πx⁴ dx / ((4/3)πr³) )^(1/2) = ( (3/5) r² )^(1/2) = √(3/5) · r
Note that, as expected, the radius of gyration of a solid sphere is less than the outer radius, since the
points belonging to the sphere are all at a distance less than or equal to r.
82
Ellipsoid:
Instead of resorting to some horrible integrals, we will solve this by geometric reasoning. An ellipsoid is really just a stretched-out sphere. So let's begin with a unit sphere (r=1), decompose its behavior into x, y, and z components, and then see what happens when we stretch it out. For a solid unit sphere (r=1), from above we know that the average value of r² is 3/5. But for a sphere the x, y, and z behaviors must be the same, so the average values of x², y², and z² must all be the same, and because r² = x² + y² + z², we can conclude that the average values of x², y², and z² are all 1/5 for a solid unit sphere. Now if we stretch the unit sphere along the x axis alone, so that its axial radius along x is now a instead of 1, then the average value of x² must be (1/5)a². There would be no change along y or z. Then stretching by b along y and c along z, we conclude that the average value of y² is (1/5)b² and of z² is (1/5)c². Putting it back together, the average value of r² would be (1/5)(a² + b² + c²). [Note that this gives the expected expression for a sphere if a = b = c = r.]
Non-spherical objects have higher radii of gyration
Now let’s compare the radius of gyration for a sphere and an ellipsoid that have the same volume.
Suppose the sphere has radius 10. Its radius of gyration would be sqrt(3/5)·10 = 7.75. Now suppose the ellipsoid has principal axes of 5, 10, and 20 (the volume of an ellipsoid is (4/3)πabc, so this is the same volume as the sphere). Its radius of gyration would be sqrt((1/5)(5² + 10² + 20²)), which is 10.2.
The important thing to note here is the radius of gyration is greater for the ellipsoid compared to the
sphere. This is just one specific case, but it is a completely general conclusion that a sphere has the
lowest possible radius of gyration compared to any other possible shape of the same volume. The
significance is that if an experimental study gives us a radius of gyration of a molecule whose mass
(and therefore volume) we know, and that radius of gyration is larger than we would have expected
for a molecule of the known volume if it was a sphere, then we have established something about the
molecule’s shape: i.e. that it is nonspherical, or elongated.
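The sphere-vs-ellipsoid comparison above amounts to two one-line formulas:

```python
import math

# Compare R_G for a sphere and an equal-volume ellipsoid, using
# R_G(sphere) = sqrt(3/5)*r and R_G(ellipsoid) = sqrt((a^2 + b^2 + c^2)/5).
r = 10.0
a, b, c = 5.0, 10.0, 20.0     # abc = 1000 = r^3, so the volumes match
rg_sphere = math.sqrt(3 / 5) * r                       # ≈ 7.75
rg_ellipsoid = math.sqrt((a**2 + b**2 + c**2) / 5)     # ≈ 10.25
```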
The behavior of flexible polymer chains or filamentous assemblies of protein subunits
Above we discussed a way of looking at relatively compact objects. Now let’s look at the behavior of
objects that are so much longer than they are wide that they are flexible and we can analyze them by
thinking about the path their backbone takes in terms of a random process. Theories for analyzing
molecules in this way were developed by polymer chemists dating back to the 1950’s (see Paul Flory),
but biochemical systems are rich with examples that fit the description as well: long molecules of
DNA or RNA, unfolded protein molecules, and (on a longer scale) noncovalent polymers formed by
the end-to-end assembly of protein subunits, as in F-actin.
Persistence length
If we have to pick a single parameter that would be useful for describing the behavior of a flexible
chain, it would be a measure of its flexibility, or to be more precise, a measure of the length scale over
which it appears to be flexible (any curve seems straight if you examine a small enough length). One
specific measure of stiffness is called the persistence length. Roughly speaking, it is a measure of how
far a curve tends to proceed in the direction it started before random curvature renders its progress
in the original direction negligible. Of course to extract such a value from a curve requires repeating
the evaluation of how far it extends from many different starting points on the curve. The plot below
conveys the essence of the persistence length. Clearly, a stiffer polymer has a greater persistence
length.
Approximate persistence lengths for some biological molecules are given below. These are
approximations, and the persistence lengths of nucleic acids in particular are rather strongly
dependent on conditions like salt concentrations. The stiffness of a biological polymer often has
important implications for how it behaves. Note for instance how the exceptional stiffness of
microtubules means that they are nearly perfectly rigid over the length scale of a cell, which is
obviously important for their function in mechanical division of the cell and transport of molecular
cargo across long distances.
Polymer                   Persistence length
DNA (double stranded)     500 Å
RNA (double stranded)     800 Å
F-actin filament          5 µm    (< eukaryotic cell)
microtubule               5 mm    (>> cell)
(Jointed) Random walk models
The diagrams above were based on smooth worm-like curves. A different kind of model, slightly less
realistic but mathematically more generalizable, is often used to treat problems of this type. In the
random walk model, a chain travels straight in some direction for a distance b (the statistical Kuhn length). Then it takes a turn in a random direction, and so on.

The mathematical treatment is fairly straightforward:

N = number of steps
b = step length (Kuhn statistical length)
C = contour length of the curve (i.e., its length if stretched out)
L = straight end-to-end distance

L of course would change with every random walk, so we are really just interested in the average or expected behavior of L. We can get the average behavior of L by treating it like a vector, which is the sum of N smaller vectors, one for each step. Call the individual step vectors li. Each one has length b and points in a random direction.

L = l1 + l2 + … + lN   (each term a vector)
What is the expected value of |L|²? We can get the squared length of a vector by taking the dot product of the vector with itself. Letting angle brackets denote the average or expected value,

⟨|L|²⟩ = ⟨L·L⟩ = ⟨(l1 + l2 + … + lN)·(l1 + l2 + … + lN)⟩

Now the expression on the right is a product of sums, which can be expanded to a sum of products, N² terms in all:

⟨(l1·l1 + l1·l2 + … + l1·lN) + (l2·l1 + l2·l2 + … + l2·lN) + … ⟩

We can move the brackets to the individual terms to give:

⟨|L|²⟩ = (⟨l1·l1⟩ + ⟨l1·l2⟩ + … + ⟨l1·lN⟩) + (⟨l2·l1⟩ + ⟨l2·l2⟩ + … + ⟨l2·lN⟩) + …
But this simplifies. The key is to realize that if you take the dot product of two vectors where the angle between them is random, the expected value is 0. That means that among the N² terms, all become zero except those representing a dot product between a little vector li and itself. There are just N of those. And each term ⟨li·li⟩ is just the squared length of the little vector, which is b². Therefore,

⟨|L|²⟩ = N b²   and   Lrms = N^(1/2) b
This is a rather general result that applies not only to polymer behavior but to other kinds of physical problems, like diffusion, that can be modeled as a random walk. The average (or rms) distance you expect after taking N steps of length b is proportional to b, but it is proportional not to N but to the square root of N.
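The ⟨L²⟩ = Nb² result can be checked with a small Monte Carlo simulation. This is a generic sketch, not tied to any particular polymer:

```python
import math
import random

# Monte Carlo check of <L^2> = N*b^2 for a 3-D jointed random walk:
# take N steps of length b in uniformly random directions, average |L|^2.
def mean_squared_end_to_end(N, b, walks=2000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        x = y = z = 0.0
        for _ in range(N):
            # Uniform random direction on the unit sphere.
            cz = rng.uniform(-1.0, 1.0)
            phi = rng.uniform(0.0, 2.0 * math.pi)
            s = math.sqrt(1.0 - cz * cz)
            x += b * s * math.cos(phi)
            y += b * s * math.sin(phi)
            z += b * cz
        total += x * x + y * y + z * z
    return total / walks

L2 = mean_squared_end_to_end(N=100, b=1.0)   # should be close to N*b^2 = 100
```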
Our reason for developing the random walk model was to use it to characterize the behavior of a
flexible chain. In this model you can see that the value of the step length (b) is going to be a
description of how stiff the polymer is. If you take a random walk with tiny steps, the path will have
the properties of a curve that is highly flexible, i.e. it will not extend very far from where it started.
How can the value of b be extracted from the behavior of a random walk path?
From before, the contour length C of the path is C = Nb. Substituting into our equation for L², we see that (dropping the vector notation)

⟨L²⟩ = Nb² = Cb

or

b = ⟨L²⟩/C
Depending on the study, we may know the length of the polymer chain. For example if we know the
molecular weight of a large DNA molecule, and we know the molecular weight of one base pair, then
we know how many base pairs there are, and we know the spacing between base pairs in DNA is
about 3.4Å, so we can do the math and estimate C. Then, if we have a way of experimentally
measuring the average straight end-to-end distance L for the molecule, then we can get b directly.
We might get an estimate of end-to-end distance from some kind of spectroscopic experiment where
the ends of the molecule were labeled, or maybe we can visualize the molecule on an electron
microscopy grid and do a series of end-to-end measurements that way.
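The arithmetic for the DNA example might look like this. The molecule size and the measured ⟨L²⟩ below are hypothetical numbers chosen for illustration, and the last line uses the b = 2a relation discussed at the end of this section:

```python
# Kuhn length of a hypothetical DNA molecule from its contour length and a
# (hypothetical) measured mean-squared end-to-end distance.
n_bp = 3000                 # base pairs, assumed known from the molecular weight
rise = 3.4                  # Å per base pair in B-form DNA
C = n_bp * rise             # contour length: 10,200 Å
L2 = 1.02e7                 # hypothetical measured <L^2>, Å^2
b = L2 / C                  # Kuhn statistical length: 1000 Å
a = b / 2                   # persistence length: 500 Å (using b = 2a)
```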
We can connect the random walk polymer model to our earlier topic of radius of gyration. We used
RG earlier as a way to characterize compact shapes, but it can be used to describe flexible structures
as well. The algebra is a bit messy so we will not work it out here, but it turns out that the radius of
gyration for a flexible chain is closely related to its expected end-to-end length. Specifically,
⟨RG²⟩ = ⟨L²⟩/6
That means if we can do an experiment that gives us a value for the radius of gyration for a polymer
chain, then we can estimate b by substituting into the previous equation. This is useful because there
are in fact biophysical experiments that give values for the radius of gyration. Dropping the brackets,
under the assumption that an actual experiment to measure the radius of gyration in solution would
give us a time average,
b = 6RG²/C
Finally, we need to reconcile the two models we developed here, the smooth worm-like chain and the
jointed random walk. It can be shown mathematically that the relationship between the two models
is that the statistical Kuhn length b is twice the persistence length a. That is, b=2a. The scientific
literature generally reports persistence lengths, so if an experiment is interpreted in terms of the
jointed random walk model to give b, then the persistence length a=b/2.
CHAPTER 9
A Brief Introduction to Statistical Mechanics for Macromolecules
Complex systems of biological molecules are often characterized by many different configurations,
which may all be at equilibrium. Handling such systems and predicting their behavior can be
simplified with an appropriate mathematical framework. The term ‘statistical mechanics’ is often
used to describe such treatments.
Probabilities and expected values
We begin with some examples involving familiar phenomena. Consider a fair die (singular for dice).
In a single roll, there are six possible outcomes, all with equal probabilities: P(1) = P(2) = … = P(6)
=1/6
What is the average or ‘expected number’ of dots that will show up in a roll of the die? The average
of 1 through 6 is 3.5, so we can correctly deduce that the expected value is 3.5. A table makes the
case more explicit.
i    P(i)    P(i)·(# of dots)
1    1/6     1/6
2    1/6     2/6
3    1/6     3/6
4    1/6     4/6
5    1/6     5/6
6    1/6     6/6
     ΣP(i) = 1    Σ(P(i)·(# dots)) = 3.5
Formulating the problem this way illustrates a powerful and general point:
⟨property⟩ = Σi P(i) · propertyi   (sum over outcomes i)
where <property> denotes the average value of some property of interest for the system, P(i) denotes
the probability of configuration or outcome i, and propertyi denotes the value of the property for
outcome i. For the problem above, the property of interest is the number of dots showing.
The case above was trivial, but practically any property that one can construct can be evaluated this
way. As a non-obvious example, suppose you need to know what the expected value is for the square
of the number of dots that shows up on the die. You might want to know this if someone offered to
roll a die and pay you $1 for rolling a 1, $4 for rolling a 2, $9 for rolling a 3, and so on, and asked you
how much you would be willing to pay to play this game of chance. The answer is easy to obtain from
the equation above.
<dots²> = (1/6)·1² + (1/6)·2² + (1/6)·3² + … + (1/6)·6² = 91/6 ≈ 15.17
So, paying anything less than $15.17 is a favorable bet for you.
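Both expectations for the fair die can be checked exactly with rational arithmetic:

```python
from fractions import Fraction

# Expected value of the dots, and of the squared dots, for a fair die.
p = Fraction(1, 6)                                   # each outcome equally likely
mean_dots    = sum(p * i for i in range(1, 7))       # 21/6 = 3.5
mean_dots_sq = sum(p * i * i for i in range(1, 7))   # 91/6 ≈ 15.17
```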
Statistical weights for outcomes with unequal probabilities
In order to handle problems of real interest, for example where different molecular arrangements
have different energies and therefore different probabilities, it is convenient to introduce a scheme
for handling unequal probabilities. The main equation above is suitable for unequal probabilities,
but sometimes the probabilities of the different outcomes are not given directly. Instead, we are
often given relative probabilities between different outcomes – think about the meaning of an
equilibrium constant for example. Relative probabilities are sometimes referred to as statistical
weights, and denoted wi.
For an application, let’s consider the case of a strange die whose outcomes are not equally likely, but
instead the probability of rolling any given number of dots is twice as high as the probability of rolling
one fewer dots. In other words, consider the case where P(i+1) = 2*P(i). We can work out the
behavior of this system as before, but starting with relative probabilities or statistical weights, and
then converting them to individual probabilities according to the equation P(i) = wi / Σj wj.
i    wi    P(i)     P(i)·(# of dots)
1    1     1/63     1/63
2    2     2/63     4/63
3    4     4/63     12/63
4    8     8/63     32/63
5    16    16/63    80/63
6    32    32/63    192/63
     Σwi = 63    ΣP(i) = 1    Σ(P(i)·(# dots)) = 321/63 ≈ 5.1
Above we worked out the problem by converting weights explicitly to probabilities, but the formulation can also be written directly in terms of the weights without explicitly writing out probabilities. By substituting P(i) = wi / Σj wj into the equation above for the average value of some property, we get
⟨property⟩ = Σi (wi · propertyi) / Σi wi
Applied to the die problem above, where the property of interest is again the number of dots showing,
this equation gives <# of dots> = (1*1 + 2*2 + 4*3 + 8*4 + 16*5 + 32*6)/(1+2+4+8+16+32) = 5.1, in
agreement with the value in the Table above. Note that in setting up the statistical weights, one is
free to choose the first configuration (or any other) as a reference whose statistical weight is set to 1.
It would be slightly less convenient numerically, but one could set the weight to be 1 for one of the
other configurations and obtain the same answer in the end, as long as the correct relative weights
get assigned.
As a final example with the die, to show how simple it is to evaluate any property as long as it can be
evaluated for each outcome, consider the strange die from before, where P(i+1) = 2*P(i), and evaluate
the expected value for the squared number of dots that would show up in one roll. Without needing
to construct a table,
<dots²> = (1·1² + 2·2² + 4·3² + 8·4² + 16·5² + 32·6²)/(1+2+4+8+16+32) = 1725/63 ≈ 27.4
So you wouldn’t want to pay more than $27.40 to play a game where this strange die is rolled and you get paid the square of the number of dots showing.
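The weighted-average formula translates directly into code. This sketch reproduces both results for the strange die:

```python
# Weighted die where P(i+1) = 2*P(i): relative weights w_i = 2**(i-1).
weights = {i: 2 ** (i - 1) for i in range(1, 7)}
total = sum(weights.values())            # 63

def expect(prop):
    """Weighted average of prop(i), dividing by the sum of the weights."""
    return sum(w * prop(i) for i, w in weights.items()) / total

mean_dots    = expect(lambda i: i)       # 321/63 ≈ 5.1
mean_dots_sq = expect(lambda i: i * i)   # 1725/63 ≈ 27.4
```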
Handling degeneracies
For a complete treatment we need just one more element. There are often systems where a single
kind of state can be obtained in several different ways. We saw this type of situation earlier in the
course when we were evaluating the number of distinct ways that a particular state could be
constructed (e.g. by exchanging the identities of molecules of like type). The same situation arises
here. We assign the variable gi to the degeneracy of arrangement i. With that adjustment, the two
main equations above can be re-written, replacing wi everywhere it appeared with wigi, to give these
two general equations:
P(i) = wi·gi / Σj (wj·gj)

and

⟨property⟩ = Σi (wi·gi · propertyi) / Σi (wi·gi)
As we will see later, for molecular systems the weights relate to equilibrium constants between
different molecular configurations, which are in turn related to the energies of the configurations (i.e.
by exponentiating the negated energies after dividing by kT or RT, as usual). The denominator in the
equation above therefore takes the form of a sum over Boltzmann-like terms for the different
configurations. That summation has a special role in statistical mechanics applications and is
sometimes referred to as the partition function, and sometimes replaced with the notation Q or Z,
depending on the text or context. The student is referred to texts on statistical mechanics for a
treatment of how the dependence of the partition function (on temperature for example) can make
it possible to evaluate thermodynamic state variables for a system.
We will turn instead to see how the equations above can be applied to evaluate the behavior of
various physical properties of complex biological molecules.
A Statistical Mechanics Treatment of the Helix-Coil Transition for a
Polypeptide
A polypeptide that has a tendency to fold up into a single alpha helix serves as a classic example of a
system with a series of possible conformations ranging from a fully unfolded ‘random coil’ to a fully
formed alpha helix. Early treatments of this system came in the late 1950s by Bruno Zimm and J.K.
Bragg and are sometimes referred to as Zimm-Bragg models. Here we consider a simplified version
sometimes referred to as a zipper model. Along a single pathway of conformational transitions from
coil to helix, a first turn of helix forms when around 4 amino acid residues come into the right
conformation to form a backbone hydrogen bond (i to i+4) characteristic of an alpha helix. From
there the helix can propagate by extension, in a step-by-step addition of more amino acid residues to
the helix, eventually reaching the fully helical form. A cartoon diagram is below.
The propagation parameter, s, describes an equilibrium constant for adding another residue to the helical conformation, leading to one more hydrogen bond. But a key element of the model is that the first step is different. In order to form the first hydrogen bond, several amino acid residues must all adopt a specific conformation. That comes at an entropic cost, which contributes an extra (opposing) term to the equilibrium for the first step. That effect is described by the nucleation parameter, σ. The values of s and σ depend on the amino acid type that comprises the polypeptide. But for a typical case of interest, s is slightly greater than 1 and σ is much smaller than 1.
To work out a statistical mechanics treatment it is convenient to re-draw the system symbolically,
using a ‘C’ to denote an amino acid in the random coil conformation and ‘H’ to denote an amino acid
in the helix conformation, as shown below. From this diagram we can see how we might assign
statistical weights and degeneracies to the configurations. For the statistical weights, we can begin
by assigning 1 to the first conformation as a reference. Then the statistical weights for the other
forms can be assigned by taking into account the equilibrium constants for each step in a cumulative
fashion. Recall that for a multi-step equilibrium, the total equilibrium constant between two species
is the product of the equilibrium constants for the separate steps between them.
The degeneracies in this problem relate to how many locations can be chosen for the helical region. [In our simplified model only one helical segment is allowed in the polypeptide.] For the unfolded form, the entire polypeptide is in the ‘C’ conformation, so there are no choices to be made and therefore no associated degeneracy (g=1). When we introduce a segment of 4 amino acids to nucleate the first helical turn, then we have a choice for the location of that segment.
The total number of distinct places where that segment can be chosen is N-3, where N is the total
number of amino acids, so the degeneracy g for that conformation is N-3. [This can be seen by noting
that if N was 4, there would be only one choice (consistent with N-3) for the location of the segment,
if N was 5 there would be 2 choices (again consistent with N-3), and so on.] Another way of
understanding the meaning of the degeneracies is to realize that the specific drawing provided is just
one representation of the multiple (N-3) different configurations that could have been drawn having
a 4-residue segment in the helical conformation. As we move further to the right along the multi-step
equilibrium, the degeneracy drops by one in each step, as there are fewer and fewer different choices
for selecting a longer helical segment, until at the end there is no choice and the degeneracy is back
to 1.
Once we have the weights and degeneracies, we can calculate the behavior of a system. Here we
might want to evaluate the overall degree of helical folding in the polypeptide. Some molecules in
the system will be less helical and some will be more helical, but we can evaluate the average, and we
can also see which forms are more or less populated. The figure below shows how this kind of
calculation can be done easily with a spreadsheet (like in Excel), where the weights and degeneracies
can be filled in, the individual probabilities of the states can be evaluated, and the distribution of
states can be plotted. The figure below illustrates a case where N=40, σ=0.01, and s=1.1.
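The spreadsheet-style calculation can be sketched in Python. This uses one plausible parameterization of the simplified zipper model described above (conventions vary between texts): states are indexed by the number of hydrogen bonds j, with weight w(j) = σ·s^j for j ≥ 1, w(0) = 1 for the all-coil state, and degeneracy g(j) = N − 2 − j (N − 3 placements for the first helical turn, one fewer for each extension step):

```python
# Zipper model for the helix-coil transition, single helical segment allowed.
# Parameterization is an assumption: w(j) = sigma * s**j, g(j) = N - 2 - j.
def zipper(N=40, sigma=0.01, s=1.1):
    # Statistical weight times degeneracy for j = 0 (coil) through j = N - 3.
    wg = [1.0] + [sigma * s**j * (N - 2 - j) for j in range(1, N - 2)]
    Q = sum(wg)                                   # the partition function
    probs = [x / Q for x in wg]                   # P(j) for each state
    mean_bonds = sum(j * p for j, p in enumerate(probs))
    return probs, mean_bonds / (N - 3)            # fractional helicity

probs, helicity = zipper()
```

Plotting probs against j reproduces the qualitative shape described below: the single-turn state is poorly populated relative to the coil, longer helices become increasingly probable, and there is a small dip at the fully helical end where the degeneracy falls back to 1.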
Looking at the probability distribution of the conformations, we see that compared to the fully
unfolded state, the state with just one helical turn is poorly populated, but as we move further to the
right it is increasingly probable to find states with more helical character. In other words, it is hard
to begin the process of folding, but easier to continue it once it has begun. There is also a peculiar
behavior towards the very right, where we see a drop in the likelihood of finding fully folded states.
Mathematically, this comes from the lower degeneracies for the fully folded states. In terms of
structure, the consequence is that the ends of the polypeptide tend statistically to be unfolded or
floppy.
Setting aside the peculiar behavior at the far right of the diagram, the rest of the picture exhibits the
general property of being hard to begin and easier to continue, which is a hallmark of cooperative
processes. Another common property of cooperative systems is a tendency to show a sudden or
steep response to changes in certain parameters of a system, like concentration or temperature. We can look at the behavior of the zipper model above as a function of the propagation parameter s. In reality s might depend on temperature, so the dependence on s can also serve as an illustration of the dependence on T. From our equations above you can see that it is a relatively simple matter to evaluate the average number of hydrogen bonds in the system if we are given N, σ, and s. We can convert this to a fractional degree of helicity by dividing the average number of hydrogen bonds by the maximum number possible (N-3). Repeating the calculation of fractional helicity as a function of s (keeping N=40 and σ=0.01) gives the behavior shown, where the dependence on s exhibits a relatively sharp transition.
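As a sketch (in Python rather than a spreadsheet, with the same N and σ as above), the fractional-helicity curve as a function of s can be computed by repeating the partition-function sum at each value of s:

```python
def frac_helicity(s, N=40, sigma=0.01):
    """Average number of helical H-bonds divided by the maximum (N-3)."""
    weights = [1.0] + [sigma * s**j for j in range(1, N - 2)]
    degens  = [1]   + [N - 2 - j    for j in range(1, N - 2)]
    Q = sum(g * w for g, w in zip(degens, weights))
    avg = sum(j * g * w for j, (g, w) in enumerate(zip(degens, weights))) / Q
    return avg / (N - 3)

# scanning s from 0.8 to 1.4 shows the relatively sharp transition
curve = [(round(0.8 + 0.05 * i, 2), frac_helicity(0.8 + 0.05 * i)) for i in range(13)]
```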
Our calculations on this model illustrate the power of statistical mechanics approaches to
characterize the behavior of complex systems. The specific behavior of this system also illustrates
general themes that underlie macromolecules and biochemical systems, in particular the appearance
of cooperative phenomena, sharp transitions, and a high sensitivity to physical parameters.
[Figure: helix-coil zipper model – fractional helicity as a function of s (N=40, σ=0.01).]
CHAPTER 10
Cooperative Phenomena and Protein-Ligand Binding
Relationship between cooperative behavior and processes involving formation of multiple
interactions simultaneously
Consider the behavior of a reaction at equilibrium wherein n molecules of A come together to form
an assembly B (and where no intermediates with fewer than n units of A are allowed).
nA ⇌ B
What does the concentration of B look like as a function of increasing concentration of A? This is easy to evaluate from K = C_B,eq/(C_A,eq)^n, so C_B,eq = K·(C_A,eq)^n. Evidently, the concentration of B depends on A according to a simple polynomial whose exponent is the stoichiometry of the association, n. This leads to a surprising interpretation when one looks at how a polynomial term behaves with increasing exponent, n. This is shown below, where for simplicity the equilibrium constant K is taken as 1 for all cases.

The result is remarkable. For large n (emphasized in red in the figure), you see an effective maximum value of [A], after which further addition of molecule A would lead suddenly to formation of B (in order to avoid a concentration of A exceeding its effective upper limit). B is effectively absent at lower concentrations of A. Not surprisingly, this resembles the sort of behavior you expect for a curve of precipitation as a function of concentration; at the solubility limit of the substance, further addition of that component to solution leads only to a solid-state form of the solute molecule.
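A tiny numerical illustration (Python, with K = 1 as in the plots) of how the exponent sharpens the response:

```python
# C_B = K * C_A**n; larger n makes B vanish below C_A ~ 1 and explode above it
K = 1.0

def conc_B(C_A, n):
    return K * C_A**n

below = [conc_B(0.5, n) for n in (1, 2, 10)]   # B nearly absent for large n
above = [conc_B(1.5, n) for n in (1, 2, 10)]   # B dominates for large n
```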
Another interesting comparison is to micelle formation by an amphiphilic detergent. There, many copies of the detergent molecule come together in the form of a micelle, where the non-polar lipid tails of the detergent molecules project into an entirely hydrophobic core, with the polar or charged head groups of the lipids exposed on the surface of a roughly spherical supramolecular assembly.
Since only complete micelles fully shield the
hydrophobic tails, partial micelles are hardly
populated. This corresponds to a very high level
of cooperativity for the assembly process. Much
like the case of aggregation or precipitation at the solubility limit, there is a limit to the concentration
of the detergent monomer, after which further added detergent leads only to the micelle form. In
analogy to the polynomial plots above, plotting the concentration values where the detergent
monomer would be at equilibrium with micelles would give a curve with a sharp break, as shown.
The behavior can also be plotted a different way, with the x-axis representing the total detergent
added to the system (which is a convenient independently controllable variable), and separate curves
shown for the concentrations of the monomer and the micelle. That scheme is also shown.
Both plots indicate the key concentration above which the free monomer concentration cannot rise. That is called the critical micelle concentration (or CMC) and is a particular property of a detergent; it depends on tail length, number of tails, size and charge repulsion of the head group, etc. The overarching idea is that high-order associations tend to give rise to sharp transitions, reminiscent in some ways of typical phase transitions.
Protein-ligand binding equilibria
The binding of ligands (substrates, cofactors, inhibitors) to proteins (or nucleic acids) sometimes
shows cooperative behavior and sharper-than-usual transitions. This is usually seen in oligomeric
proteins or enzymes – the coordinated action of multiple subunits bound together makes the
cooperativity possible. The case of hemoglobin, with four heme groups and four protein subunits, is
well-known.
Before we tackle the case of cooperative binding by multiple binding sites in an oligomeric protein,
we will analyze the simple (non-cooperative) binding behavior of a single protein subunit (P) and its
ligand (A).
P + A ⇌ PA
Note that the K here is an equilibrium association constant, not a dissociation constant, as is sometimes written for substrate dissociation in enzyme kinetics treatments.
We introduce a binding parameter, v, to describe the extent of binding, i.e. the average number of
ligands bound to any given protein molecule.
v = (# or concentration of bound ligands)/(# or concentration of protein molecules) (0 ≤ v ≤ 1).
At equilibrium, K = [PA]/([P][A]). Note that the [A] in this equilibrium equation is the concentration
of free A molecules, not the total A concentration, which would include ligands bound to the protein.
And [P] is the concentration of unbound protein. Taking the ratio of [PA] to [P] gives an expression
that is useful in later substitutions:
[PA]/[P] = K[A] (which is a unitless fraction).
From the definition of v, we can see that
v = [PA]/([P]+[PA])
Dividing the numerator and denominator by [PA] gives
v = ([PA]/[PA]) / ([P]/[PA] + [PA]/[PA])
Then substituting from above,
v = 1/(1/(K[A]) + 1) = K[A]/(1 + K[A])
This gives the familiar hyperbolic curve for binding when the extent of binding is plotted versus free
ligand concentration. Binding is half-saturated when [A] = 1/K.
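A quick numerical check of the hyperbolic form (Python; the value K = 4 is an arbitrary choice for illustration):

```python
K = 4.0  # arbitrary association constant

def v_mono(A):
    """Extent of binding for a single site: v = K[A]/(1 + K[A])."""
    return K * A / (1 + K * A)

half = v_mono(1 / K)        # half-saturation at [A] = 1/K
near_sat = v_mono(100 / K)  # approaches 1 at high free ligand
```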
As you know, simple enzyme kinetics share this behavior. This is to be expected since, at least in the simpler mathematical treatments, the catalytic event is preceded by a binding event at equilibrium, and the rate of the reaction is proportional to the concentration of the enzyme-substrate complex, usually denoted [ES]. In that case, the reaction rate over the maximal rate (at very high substrate), v/Vmax, is equal to [ES]/([E] + [ES]). By analogy to the equations above for ligand binding (matching [E] with [P] and [ES] with [PA]), and using a dissociation constant Kd for the enzyme case that would be the reciprocal of the binding association constant K, one gets v/Vmax = ([S]/Kd)/(1 + [S]/Kd) = [S]/(Kd + [S]), or [S]/(Km + [S]) where the Michaelis-Menten constant Km would be equal to Kd. The reaction rate is half its maximal value when [S] = Km. This behavior (which should be relatively familiar) matches precisely what we've done here for ligand binding – only the variable names have changed.
Binding to an oligomeric protein – independent binding events, no cooperativity
What about binding to an oligomeric protein with multiple binding sites (e.g. one per subunit)? Suppose the multiple sites are identical and independent. If for the case of the oligomer we express the binding parameter as the average number of ligands bound per trimer, then v = (# ligands at site 1 + # ligands at site 2 + # ligands at site 3)/(# of protein trimers). The number of ligands at site 1 per trimer would be the same as above for binding to a monomer (K[A]/(1 + K[A])). And the same for binding at site 2 and site 3. So, v = 3K[A]/(1 + K[A]), and v would be between 0 and 3. The behavior has the same character as before – e.g. hyperbolic saturation – all that is different is the multiplicative factor of 3, which arises simply because we're expressing the binding per trimer instead of per monomer. This is the expected behavior for binding to multiple identical and independent sites. It generalizes readily to n sites:
v = nK[A]/(1 + K[A])   (0 ≤ v ≤ n)
Extracting K (and n) from equilibrium binding measurements
You’ll remember from earlier coursework in enzyme kinetics that, for extracting parameters from
graphs, it can be convenient to do algebraic rearrangements so the hyperbolic function becomes
linear in some variables. Analysis of binding data is simplified by using a rearrangement to give a so-
called Scatchard plot. [The Scatchard plot closely resembles a rearrangement used sometimes in
enzyme kinetics, the Eadie-Hofstee plot – the two kinds of plots are related by exchanging x and y
axes]. In our case we begin with
v = nK[A]/(1 + K[A])
Then multiplying through by the denominator on the right gives
v + vK[A] = nK[A], then v = nK[A] – vK[A], and dividing by [A] gives

v/[A] = nK – vK
So, plotting v/[A] vs v should give a straight line with slope –K, and x-intercept equal to n (y-intercept
of nK), as shown on the plot on the left.
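As a sketch of how the plot works in practice, here is a small Python check using noise-free synthetic data (n = 3 and K = 2 chosen arbitrarily); a straight-line fit to v/[A] vs v recovers –K and n:

```python
# synthetic data from v = nK[A]/(1 + K[A]) for a Scatchard analysis
n_true, K_true = 3, 2.0
A = [0.05 * i for i in range(1, 40)]
v = [n_true * K_true * a / (1 + K_true * a) for a in A]

x = v                                   # Scatchard x-axis: v
y = [vi / ai for vi, ai in zip(v, A)]   # Scatchard y-axis: v/[A]

# least-squares line y = slope*x + intercept via the normal equations
m = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
slope = (m * sxy - sx * sy) / (m * sxx - sx * sx)
intercept = (sy - slope * sx) / m

# slope = -K, y-intercept = nK, x-intercept = -intercept/slope = n
```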
Sometimes it is difficult to obtain a value for v as we have expressed it here for binding to an oligomer,
since the oligomeric state of the protein may be unknown at the outset, preventing one from knowing
what the concentration of the protein is in terms of # of oligomers per volume. It is sometimes easier
to establish the fractional binding, f = v/n, from experimental measurements. For example, if binding
of a ligand causes some measurable change in the system – e.g. maybe the protein has a tryptophan
whose fluorescence changes when a ligand binds – then one can add a certain amount of ligand and
compare the change in the measured property to the maximum possible change (e.g. by adding excess
ligand to the point of saturation). The ratio of the change observed to the maximum possible change
would be a measure of the fractional binding, f. The algebra would be the same as above. Dividing
the final equation above by n on both sides would give f/[A] = K – fK. As shown in the plot on the right, the analysis is the same, but no attempt can be made to establish the number of binding sites, n.
Non-linear Scatchard plots – non-identical or non-independent binding sites
What might cause a non-linear Scatchard plot? If the protein sample is impure, it might contain slightly different forms of the protein of interest – a mixture of phosphorylated vs non-phosphorylated forms is just one example – whose binding affinities for a ligand might be different.
Or, perhaps the protein of interest really has multiple distinct binding sites that have evolved to have different affinities. A cartoon of a case with two binding sites of one type and a single binding site of another type is shown.
For a case where there are different types of binding sites, if the binding events are independent (not
cooperative), then the binding behavior is simply additive, with terms matching those from before:
v = Σi ni Ki[A]/(1 + Ki[A])   (sum over site types i)
If the binding affinities (Ki) for the different kinds of sites are not equal, then the Scatchard plot cannot be straight. Reasoning that the left side of the curve in a Scatchard plot corresponds to initial binding at low ligand concentration to the highest affinity sites, and noting that the slope relates to the binding constant K, we can see that the curve should be steeper on the left, and therefore bent as shown. If the affinities of two different kinds of sites are different enough, it may be possible to extract separate binding constants from different parts of the curve. This may not be possible for binding constants that are not so different from each other, and practically impossible if there are more than two types of binding sites. In those cases, if accurate data are recorded over a wide range of ligand concentrations it may be possible to analyze the detailed behavior using sophisticated computer fitting software.
Our discussions above have all assumed that the binding events are independent – i.e. no
cooperativity arising from communication between sites. We will deal in more detail later with
cooperative binding, where binding of a first ligand promotes binding of subsequent ligands to other
sites in the same oligomer, but for now we can simply anticipate what effect that would have on a
Scatchard plot. The behavior would effectively be the reverse of the case above where we had sites
that were independent but naturally different in affinity. In that case we naturally tended to fill the
high affinity sites before the low affinity sites. But with cooperative binding, the first binding event
is harder and the later binding events are easier, which is the reverse. So, our Scatchard curve would
curve downward for cooperative binding, as shown on the left.
Whereas a straight Scatchard plot corresponded to an ordinary hyperbolic binding curve when plotting v vs [A], a downward-curving Scatchard plot of the type shown above, resulting from cooperative binding, would correspond to a sigmoidal shape if the binding data were plotted as v vs [A], as shown on the right. We will discuss cooperative behavior more rigorously later.
Experiments for measuring binding
Classic method – Equilibrium dialysis
The classic method for studying binding equilibria is equilibrium dialysis. The protein is placed in a dialysis bag that allows the ligand to cross but not large molecules like the protein. Ligand is then added, which equilibrates between the inside and the outside. Outside the bag, the ligand exists only in its free form. Inside the bag, the ligand exists in two forms: free and bound to the protein. At equilibrium, the concentration of free ligand inside the bag must equal the concentration of ligand outside the bag. That means if you measure the concentration of total ligand inside the bag, and then subtract the concentration of ligand outside the bag, you have a measurement of the concentration of the ligand in its bound form, that is [PA]. From

[A]total,inside = [A]free,inside + [PA]inside and [A]outside = [A]free,inside
[PA] = [A]total,inside – [A]outside
Then, the binding parameter v can be obtained by dividing [PA] by the total concentration of protein
that was placed inside the bag (or the concentration of protein oligomers if the oligomeric state of
the protein is already known).
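The bookkeeping is simple enough to show in a couple of lines (Python, with hypothetical concentrations in mM):

```python
# equilibrium dialysis: bound ligand is total inside minus free (= outside)
A_total_inside = 1.8   # mM, measured inside the bag
A_outside      = 1.2   # mM, equals the free [A] inside at equilibrium
P_total        = 0.3   # mM, protein placed in the bag

PA = A_total_inside - A_outside   # concentration of bound ligand, [PA]
v  = PA / P_total                 # average ligands bound per protein
```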
A modern method – Isothermal Titration Calorimetry (ITC)
This experiment is based on the expectation that there will be some heat (ΔH) associated with the binding event. The protein is held in a sample chamber that is kept at constant temperature. Ligand is added slowly in a series of small increments. After each incremental addition, the instrument measures the heat transfer required to keep the protein sample at a constant temperature.

The amount of heat transferred during equilibration at each step is plotted; note that ΔH is often < 0 for binding. A typical readout looks something like this:
As more and more ligand is added, the system begins to saturate. The individual peak areas
correspond to the amount of heat released and therefore to the amount of additional ligand that was
bound in that incremental step. Therefore, the total amount of ligand bound can be obtained by accumulating the integrated peak areas. This leads to a more traditional-looking binding plot.
Note that in some experiments, like ITC, it is easy to know the total amount of ligand present in the
system and harder to know the free amount – contrast that with the equilibrium dialysis experiment
where the free A concentration was evident from the concentration of A outside the bag. Not being
able to plot the free A concentration makes it a bit harder to analyze binding curves with the usual
tricks (like identifying the point of half saturation and estimating the Kd or 1/K from the free ligand
concentration at that point). Computer software is usually used to interpret the binding constant
(and whether multiple binding sites might be present) from ITC data.
Various spectrophotometric methods
As noted earlier, if there is some kind of spectroscopic experiment that gives a different reading for
the ligand-bound protein PA compared to the unbound protein P, then it is often possible to
determine what fraction of the total protein exists in the two forms; doing this as a function of ligand
concentration then enables determination of binding constants. The algebra is reminiscent of the
way we looked at measuring the extent of protein folding vs unfolding earlier.
If we let the variable P denote the value of some spectroscopic property – maybe the natural
tryptophan fluorescence of a protein if it is affected by ligand-binding – then assuming that
spectroscopic contributions are additive,
Pmeas = f * PPA + (1-f) * PP
where Pmeas is the value of some spectroscopic property measured after addition of some specific
amount of ligand, PPA is the value you expect to obtain for the protein in its bound form, and PP is the
value you expect for the unbound protein. As before, f is the fractional binding. Now, realizing that
PP and PPA can be obtained by doing the spectroscopic experiment with no ligand added and with
saturating ligand added, we can change the notation above to give:
Pmeas = f * Psaturating A + (1-f) * Pno A
which rearranges to give:
f = (Pmeas - Pno A) / (Psaturating A - Pno A)
This makes sense since it is really just a ratio between a partial change and the maximum possible
change.
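In Python (with hypothetical fluorescence readings), the fractional binding from a single titration point is simply:

```python
# fractional binding from a spectroscopic signal that changes on ligand binding
P_no_A = 100.0   # reading with no ligand added (hypothetical units)
P_sat  = 40.0    # reading at saturating ligand
P_meas = 70.0    # reading at some intermediate ligand concentration

f = (P_meas - P_no_A) / (P_sat - P_no_A)   # partial change over maximum change
```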
Phenomenological treatment of cooperative binding – the Hill equation
We turn now to the case of cooperative binding to multiple sites, such as in an oligomeric protein
composed of several identical subunits.
The limiting case of perfect cooperativity
To establish the limiting case for cooperative behavior, we examine an idealized situation of "all-or-none" binding, which essentially means perfect cooperativity. Either 0 or n ligands can be bound to a particular oligomer.
P + nA ⇌ PAn
Note that this formulation implies that there is no formation of partially bound forms, PA1, and so on.
As before, anticipating the usefulness of having a ratio of the bound form to the unbound form, we can write K = [PAn]/([P][A]^n), and then

[PAn]/[P] = K[A]^n
From the meaning of the binding parameter v as the number of ligands bound per oligomer, we know
that
v = n[PAn] / ([P] + [PAn])
With the same rearrangements as we used before for binding to a monomer – namely, dividing the top and bottom by [PAn] and then substituting the term K[A]^n for [PAn]/[P], we get

v = nK[A]^n / (1 + K[A]^n)
How does this binding curve behave as a function of ligand concentration? Clearly it begins at v=0 for [A]=0. And it saturates as expected, getting ever closer to n as [A] gets very large. In those ways it is similar to the binding equation we developed for a monomer, which was v = K[A]/(1 + K[A]). But the key distinction is in the exponent applied to the ligand concentration, [A]. From our earlier
discussions you should appreciate the consequences of that exponent; it creates a sharper transition
in terms of [A]. As a result, the binding curve will not be simply hyperbolic like before, but there will
be a region where the curve exhibits steeper behavior. In other words, we get a sigmoidal curve of
the type you’ve seen before (probably in the context of cooperative oxygen binding to hemoglobin).
Binding curves calculated from the equation above for perfect cooperativity are shown here (taking
K=1 for convenience).
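A minimal Python comparison (K = 1 as in the plots) of the all-or-none curve against the non-cooperative one, showing the steepening near [A] = 1:

```python
# all-or-none binding: v = n*K*[A]**n / (1 + K*[A]**n), here with K = 1
def v_allornone(A, n, K=1.0):
    return n * K * A**n / (1 + K * A**n)

# change in fractional saturation (v/n) across [A] = 0.9 -> 1.1 grows with n
rise_n1 = v_allornone(1.1, 1) / 1 - v_allornone(0.9, 1) / 1
rise_n4 = v_allornone(1.1, 4) / 4 - v_allornone(0.9, 4) / 4
```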
Realistic case – partial cooperativity

Real molecular binding processes are never perfectly cooperative; their behaviors fall in between the perfect case and the simple case of independent binding. By comparing the equations we obtained for the two cases, you'll notice that the only difference is in the exponent that gets assigned to the ligand concentration. In the case of independent binding events (or binding to a monomer), the exponent was 1, whereas in the case of perfect cooperativity and n binding sites, the exponent was n.

From this comparison, A. V. Hill generalized the binding equation to intermediate cases where the cooperativity would not be perfect. The binding equation becomes
v = nK[A]^x / (1 + K[A]^x)   (1 ≤ x ≤ n is the allowable range for positive cooperativity)
where x is used as the exponent and is called the Hill coefficient. There is frankly no mathematical justification for this equation. But what it does allow for is a way to compare observed binding
behavior to an equation where the exponent that gives the best fit to the data is some indication of
the degree of cooperativity. That is, if the observed binding data for a case where there are 4 binding
sites (n=4) is best matched by the Hill equation when x is chosen to be 2.8, then this gives you a sense
of the degree of cooperativity.
The standard treatment for analyzing observed binding data according to the Hill equation goes as follows. From the equation above, multiplying through by the right-side denominator, and then
subtracting the second term on the left from both sides gives
v/(n-v) = K[A]^x
It can be more convenient at this point to switch to fractional binding, f = v/n, to give
v/(n-v) = (v/n)/(n/n – v/n) = f/(1-f) = K[A]^x
Then taking logs on both sides gives
ln(f/(1-f)) = ln K + x ln[A]
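As a sketch, data generated exactly from the Hill equation give a straight Hill plot whose slope returns the coefficient (Python; K = 1, x = 2.8 are illustrative choices, not from the text):

```python
from math import log

# synthetic fractional-binding data satisfying f/(1-f) = K*[A]**x
K_true, x_true = 1.0, 2.8
A_vals = [0.2, 0.5, 1.0, 2.0, 5.0]
f_vals = [K_true * a**x_true / (1 + K_true * a**x_true) for a in A_vals]

xs = [log(a) for a in A_vals]
ys = [log(f / (1 - f)) for f in f_vals]

# slope of ln(f/(1-f)) vs ln[A] recovers the Hill coefficient x
slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
```

Real data would give a curved plot, as noted in the text, so the slope must then be taken at the steepest point.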
Evidently, a plot of ln(f/(1-f)) vs ln[A] should have a slope of x. When one plots real binding data this way, the result is invariably a curve rather than a line with constant slope. That illustrates a weakness of the Hill equation – it is not founded on any underlying physical model of binding. Nonetheless, the slope of a Hill plot at its steepest point remains a useful description of the degree of cooperativity.
Physical models of cooperative binding - MWC
Monod, Wyman, and Changeux (MWC) (among others) developed explicit models to explain how
cooperative behavior could emerge in biological systems. Their idea was simple and elegant, and the
underlying principles have turned out to be strongly supported by a wealth of detailed structural
investigations in diverse systems over several decades.
The key elements of the MWC model are as follows:

o an oligomer, with each subunit having a binding site

o (at least) two different conformations are possible for the protein subunit and its binding site, and those alternate conformations have very different affinities for the ligand. [In the limiting case, only one of the conformations can bind the ligand]. The high affinity binding site conformation is designated R and the weak (or forbidden) binding site conformation is designated T.

o symmetry must be preserved, so that all the subunits in any one oligomer are in the same conformation.
No other assumptions are required. As you might recall from earlier studies, and from some of our
previous discussions, cooperativity invokes the idea of non-independent events, and is often
discussed in terms of ‘communication’ between different binding sites, i.e. binding at one site
promotes binding at the other sites. But you’ll note that those ideas are not explicit aspects of the
MWC model. Yet, the tenets of the model lead to that apparent behavior.
We can sketch out the behavior of a tetrameric system with 4 subunits. We will designate the R form
of the subunit (which has high affinity for the ligand) as a circle, and the T form as a square. We will
assume all the possible forms of the oligomer – in terms of conformation and ligand binding – are all
at equilibrium. But note that most of the possible forms of the oligomer are disallowed by the element
of the MWC model that requires all the subunits in one oligomer to have the same conformation (R
or T). That is, we don’t need to consider the R3T1 conformation, and so on. Only the symmetric forms
need to be written out. And if we take the limiting case where the T form cannot bind ligand at all,
then we are left with just a few configurations to consider. The R forms of the oligomer can be bound
to a number of ligands from 0 to 4, and the equilibrium between those forms is affected by the ligand
binding constant K (and the ligand concentration). And an equilibrium constant L must be written
to describe the relationship between the T form of the tetramer and the (unbound) R form of the
tetramer.
How can this scheme (which doesn’t explicitly invoke ‘communication’ between different binding
sites) give rise to cooperative behavior? We can get an understanding of this from two perspectives.
First, by mass action, addition of a ligand to one site in an oligomer drives the other subunits into the
high affinity configuration; that follows from the requirement stipulated in the model that symmetry
has to be preserved in an oligomer. It is the requirement of the subunits to adopt the same
conformation that gives the effect of communication between sites.
The other view of how the MWC model creates cooperative behavior is statistical, relating to the probability (or concentration) distribution of the distinct forms of the protein. We can analyze the situation above in terms of the R forms of the protein binding ligands at the four sites totally independently, plus the extra T form. For independent binding to the R forms, the concentration ratio for incrementally bound forms would go up by the same ratio in each step. From there you can see that at some concentration of ligand there will be substantially more R4 than R3, and more R3 than R2, and so on, meaning that, considering the R forms by themselves, R4 will be the dominant form, with little R0, R1, R2, and R3. But what about the T form? If the equilibrium constant L between the T0 form and the R0 form is high enough, then T0 will be well-populated even if R0 is low. Now think about plotting the distribution of the forms of the oligomer that have 0, 1, 2, 3, or 4 ligands bound, taking into account that the concentration of oligomers with 0 ligands bound includes both R0 and T0. As you can see, the distribution of the different forms is concentrated at the extremes, which is a hallmark of a cooperative system.
Exactly what kind of behavior do models like MWC predict? Our statistical mechanics tools let us
answer that in straightforward fashion. We start with the case of n=2. But first a comment about
statistical weights. When we work out the statistical weights for a problem that involves binding, we
need to come up with a (unitless) ratio that relates the forms that arise by sequential addition of a
new ligand. From K= [PA]/([P][A]), we can see that the ratio of [PA]/[P] is the familiar term K[A]. It
is this term, and not the equilibrium constant by itself, that we need to multiply cumulatively in each
sequential step of our reaction. Our statistical mechanics terms are:
Note that the degeneracies have to account for the combinatorial ways for choosing which subunits
will have ligands bound. Now we can calculate the average number of ligands bound, which is v,
according to our familiar rules for evaluating the expected value of some property. We get
v = [(L·1·0) + (1·1·0) + (K[A]·2·1) + ((K[A])²·1·2)] / [L + 1 + 2K[A] + (K[A])²]
As an aside, you can show that if L=0 (meaning that we have removed the element of the model that
is critical for cooperativity, leaving only the R forms, which bind ligands at their sites independently),
the equation above reduces to the equation we developed earlier for binding to identical and
independent sites: v = 2K[A]/(1+K[A]), as it should.
Similar equations to the one above for n=2 can be developed for higher values of n, using the same
statistical mechanics treatment. With these equations, with judicious choices of K and L, one can get
binding behavior that exhibits the features we expect for cooperativity. Binding and Hill plots are
shown below, as calculated from the MWC model (with n=4) using the statistical mechanics approach
above.
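The statistical mechanics sum for general n can be sketched in Python (the R-form weights are C(n,k)(K[A])^k and the T form adds weight L; the K and L values here are illustrative, not from the text):

```python
from math import comb

def v_mwc(A, n=4, K=1.0, L=1000.0):
    """Average ligands bound in the MWC limiting case (T binds no ligand)."""
    terms = [comb(n, k) * (K * A)**k for k in range(n + 1)]  # R0 .. Rn weights
    num = sum(k * t for k, t in enumerate(terms))            # weight * ligand count
    den = L + sum(terms)                                     # T0 contributes weight L
    return num / den
```

Setting L = 0 removes the T state and recovers the independent-sites result v = nK[A]/(1 + K[A]), while a large L suppresses binding at low [A] and produces the sigmoidal curve.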
Advantages of cooperative behavior
Why are these kinds of cooperative binding (and catalytic) phenomena useful or advantageous in
biological settings? For one, steep response curves generally allow for better, tighter control of a
system. In a non-cooperative, hyperbolic binding scenario, a certain fractional increase in the ‘input’
(i.e. ligand concentration) leads always to a smaller fractional change in the ‘output’ (i.e. binding). In
contrast, a sigmoidal curve allows a large change in response to a smaller fractional change (e.g. in
the ligand concentration). This feature is key to the ability of hemoglobin to have very different
affinities for oxygen (thereby allowing efficient uptake and release) at oxygen concentrations in the
lungs and in muscle tissue that are not very different. Similar advantages apply to cooperative
enzymes. Their activities can increase more substantially in response to smaller increases in the
concentration of the substrate. This allows for tighter metabolic control in a system and also enables more 'on-off' type signaling in the cell.
Allostery
Roughly translated, allostery means ‘other spatial arrangement’. In the context of macromolecules,
it describes the general phenomenon wherein binding of one compound to a protein or nucleic acid
at one site affects the conformation elsewhere, with diverse consequences for activity. You may
remember the well-studied case of hemoglobin, where binding of effector molecules (including 2,3-
bisphosphoglycerate, CO2, and protons) affects the protein conformation some distance away where
oxygen binds. Similarly, effector molecules can bind to allosteric sites in enzymes and affect the
catalytic properties of the active site, which may be in a distant region of the protein. Or, binding of
effectors can control signaling pathways by affecting molecular recognition events. In some cases,
allosteric regulation occurs together with cooperative phenomena in an oligomeric protein (like
hemoglobin), but it can also occur in simpler scenarios in a single protein subunit. Allosteric
regulation is a deep subject with diverse manifestations in molecular biology, but a unifying theme can be articulated in a scheme where a protein has two available conformations, and the two conformations have different affinities for the effector, as shown.
If we say that the conformation on the right has a higher affinity for the effector than the conformation on the left (i.e. K3 > K1), then we must also conclude that K4 > K2 (since the two different routes from the top left to the bottom right must give the same total equilibrium constant, meaning that K2·K3 = K1·K4). The interpretation of K4 > K2 is that binding of the effector shifts the conformational equilibrium to the right. If K1 ≠ K3 and K2 ≠ K4, we would say that there is a
thermodynamic linkage between the effector binding and the conformational change. The shapes of the two alternate conformations of the protein are drawn in this diagram to emphasize that the two conformations may be different in multiple ways, e.g. at multiple different locations. How this happens depends on the detailed structure of the protein. But if you recognize that proteins have a certain degree of structural rigidity, you can imagine any number of different ways where the movement of atoms at one location can propagate to another site. As just one example, if the protein molecule is composed of two relatively rigid domains, but the relative position of those domains can change, then the conformation at one location will be coupled to the conformation elsewhere.
We can use our statistical mechanics framework to analyze the simple allosteric scheme above. We
might want to know what fraction of the total protein molecules would be in the conformation on the
right (with the pointed binding cleft for the effector and the larger opening on the top surface), as a
function of the effector concentration. From our thermodynamic reasoning above, we can anticipate
that higher effector concentration will cause more of the protein to be in the conformation on the
right by mass action, but exactly how much? There are no degeneracies to worry about in this case,
and the weights follow from the equilibrium constants and the effector concentration. With these
values, the fraction of the protein in the conformation on the right (with the top binding site more
open) would be:

(K2 + K1K4[E]) / (1 + K2 + K1[E] + K1K4[E])

(equivalent expressions are possible with K1K4 = K2K3). A plot of this behavior is shown for
judicious choices of the equilibrium constants.
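The fraction derived above is easy to evaluate numerically. Below is a minimal sketch; the function name and the particular values chosen for the equilibrium constants are illustrative, not taken from the text:

```python
def fraction_right(E, K1, K2, K4):
    """Fraction of protein in the right-hand conformation, using the
    statistical weights 1, K2, K1*[E], and K1*K4*[E] for the four states."""
    return (K2 + K1 * K4 * E) / (1 + K2 + K1 * E + K1 * K4 * E)

# Illustrative constants: the effector binds the right conformation tightly.
K1, K2, K4 = 1.0, 0.1, 50.0

for E in (0.0, 0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"[E] = {E:7.2f}   fraction right = {fraction_right(E, K1, K2, K4):.3f}")
```

At [E] = 0 the fraction reduces to K2/(1 + K2), and at saturating effector it approaches K4/(1 + K4), reproducing the mass-action shift toward the right-hand conformation described above.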
Catalytic cycles linked to conformational changes linked to motion = molecular motors
Nature has evolved a wide range of molecular motors whose operations are beyond extraordinary.
They are essentially all based on some kind of allostery combined with cycles of catalysis. Bear in
mind that when the active site of an enzyme goes through a series of reactions, it goes through a
sequence of events where it is empty, then bound to substrate, then bound to product, then empty
again, and so on. If those binding states each favor different conformations of the protein, then
ongoing catalysis will drive the protein through a cyclical series of conformations. How such events
can be linked to larger scale movements of the type one would call a motor varies, but the classic case
is the rotary F1-ATPase. In a remarkable example of intuition and foresight, in the 1970’s Paul Boyer
(UCLA) predicted that ATPase would act like a cyclic motor using a ‘binding-change’ mechanism
based on careful biochemical experiments and an understanding that the trimeric assembly had
three active sites, but without the benefit of knowing the three-dimensional structure of ATPase. It
was almost 20 years afterwards that Andrew Leslie working in the laboratory of John Walker
determined the crystal structure of ATPase, revealing that indeed the head of the ATPase (composed
of the alpha and beta subunits) has the structure of a wheel that rotates on an axle formed by the
gamma subunit in an extended alpha helical structure. Together with Jens Skou (who discovered a
different ATP-driven ion transporter), Boyer and Walker shared the Nobel Prize in Chemistry in
1997.
CHAPTER 11
Symmetry in Macromolecular Assemblies
Definition of Symmetry
Symmetry is an important subject in essentially all branches of science, and the arts as well. Loosely
defined, we think of something that is symmetric as being repetitive in some way, being composed of
multiple copies of an underlying subunit. In scientific applications, symmetry has a precise meaning.
An object is symmetric if there is some physical operation we can do to it that leaves it invariant (i.e.
indistinguishable from the way it appeared before). The operation in question is usually an isometry,
that is a physical movement in space that preserves distances. Those operations include rotations in
space and mirror inversions. However, since biological macromolecules are chiral and exist in just
one of two possible hands or enantiomers, for our purposes we can dispense with mirrors and
inversions (i.e. so-called ‘operations of the second kind’) and focus on rotations. We are lucky in that
regard, as this leads to considerable restrictions on an otherwise larger variety of symmetry types that exist in three dimensions.
We will shortly work through all the possible symmetries in three dimensions, but we start with one
example here. The assembly shown is comprised of three copies of the same subunit rotated 120°
and 240° relative to each other. As you can see, if we rotate the entire assembly by 120°, the result
is indistinguishable from the initial configuration. In fact there are exactly three operations we can do to the assembly
that leave it invariant. They are: {Identity (i.e. 0° rotation), 120° rotation, 240° rotation}. The set of
operations that leave an object invariant is a complete description of its symmetry. Sets of this
type obey special properties that make them examples of mathematical groups, which we discuss
next.
Mathematical Groups
In mathematics, a group is a set that, together with a defined binary operator, obeys a specific set of
rules. A binary operator is something that takes two elements as input and returns one element as
output. In regular arithmetic, addition and multiplication are examples of binary operators, but as
we shall see binary operators can take diverse forms.
The rules that must be obeyed for a set to be a group are as follows:
- There must be an identity element (I) in the set, such that for every element A in the set,
I ∘ A = A ∘ I = A. [Here, the symbol ∘ is used to denote the general binary operator.]
- For every element A in the set, there must be an inverse element (denoted A⁻¹), also
within the set, such that A ∘ A⁻¹ = A⁻¹ ∘ A = I.
- The associative rule must apply: A ∘ (B ∘ C) = (A ∘ B) ∘ C for all elements in the set.
- A closure rule must be satisfied so that the product of any two elements from the set
(including the product of an element with itself) must also belong to the set. That is, if A
and B belong to the set, then so must (C = A ∘ B) for all choices of A and B within the set.
The rules must all be satisfied for a set to constitute a group, but for our purposes the last rule is
especially illuminating.
Here are a few examples of groups relating to pure mathematics:
{1, -1} under ordinary multiplication
{integers} under ordinary addition
{1, i, -1, -i} under complex-valued multiplication
{ [1 0; 0 1], [0 -1; 1 -1], [-1 1; -1 0] } under matrix multiplication (rows separated by semicolons)
In each case you should be able to identify the identity element and also work out a multiplication-
type table. For the third example above, the table would be:
×     1     i    -1    -i
1     1     i    -1    -i
i     i    -1    -i     1
-1   -1    -i     1     i
-i   -i     1     i    -1
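These group properties can be checked mechanically. A small sketch for the third example, {1, i, −1, −i} under complex multiplication (in Python, i is written 1j):

```python
# The group {1, i, -1, -i} under complex multiplication.
G = [1, 1j, -1, -1j]

# Closure: the product of any two elements is again in the set.
assert all(a * b in G for a in G for b in G)

# Identity and inverses: 1 is the identity, and each element has an inverse in G.
assert all(any(a * b == 1 for b in G) for a in G)

# Associativity holds automatically for complex multiplication.

# Reproduce the multiplication table shown in the text, one row per element.
for a in G:
    print("  ".join(str(a * b) for b in G))
```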
From our discussions above we can now see why the symmetry of an object obeys the properties of
a group. In particular we can see why a set composed of symmetry operations of an object must obey
the closure rule for a group; if operation A leaves the object indistinguishable, and the same is true
for operation B, then surely performing operation A followed by operation B must also comprise an
operation that leaves the object invariant. The symmetries obeyed by objects are therefore typically
referred to as symmetry groups. Next we will enumerate the possible symmetry groups for
assemblies of macromolecules.
Point Group Symmetries for Biological Assemblies
The prevalence of symmetry in natural proteins is impossible to miss. About 50% of all proteins that
have been purified and studied in the laboratory have been shown to be symmetric oligomers. In
this section we enumerate all the finite symmetry groups that are possible in three-dimensions; we
save for later a discussion of essentially infinite symmetry groups that characterize extended
assemblies like those in filamentous structures. Finite symmetry groups are referred to as point
group symmetries because the symmetry axes pass through a central point in the assembly. The 3-D
point group symmetries can be arranged, in order of increasing complexity, as cyclic, dihedral, and
cubic.
Cyclic Point Group Symmetries
Each of these symmetry groups is based on a single axis of rotational symmetry: two subunits in a
dimer, three subunits in a trimer, generalizing to n subunits in a cycle. The symmetry designations
are C2, C3, …Cn. [C1 would be the symmetry group for an object with no symmetry.] For C2, the axis
of symmetry corresponds to a 180° rotation. Because applying this operation twice (or two-fold)
returns one back to the starting orientation, that symmetry element is often referred to as a
“two-fold” axis of symmetry. Likewise, a symmetry element for 120° and 240° (and of course 0°)
rotations is referred to as a “three-fold” axis, and so on. In drawings, the rotational symmetry axes
are denoted by symbols that match the order of their rotation (e.g. a small square
representing a 4-fold axis). In any symmetry group, the number of elements in the group is the same
as the number of differently oriented but otherwise identical subunits required to construct the
symmetry. For cyclic symmetry groups, Cn, that number is n. Example drawings are shown for the
first few cyclic symmetries. There is no theoretical limit to the value of n, but the highest rotational
symmetry for any known protein assembly is 39, for a truly extraordinary barrel-shaped protein
chamber known as the vault, which is present in eukaryotic cells for an as-yet uncertain function.
As you can see from the diagrams, a fundamental point that arises from the principles of symmetry
is that the individual components (e.g. protein subunits) are all in identical
environments. No physical difference of any kind can be ascribed to the multiple copies of the
subunit.
An example of a pentameric protein obeying C5 symmetry is shown as a semi-transparent surface
over a ribbon diagram. The five copies of the subunit are shown in separate colors.
Dihedral Point Group Symmetries
Dihedral symmetry groups are somewhat more complicated. They are essentially built by combining
two copies of a cyclically symmetric arrangement, one flipped upside down on top of the other. As a
result, they sometimes resemble double ring structures, but sometimes that feature is not so evident,
depending on the shape of the subunit and its position relative to the symmetry axes. In dihedral
symmetry, there are multiple axes of rotational symmetry, all passing through and hence
intersecting at the center of mass of the assembly. Symmetry D4 is shown. As you can see, there is a
unique 4-fold axis of symmetry, along with four 2-fold axes of symmetry, which all intersect the
4-fold axis in a perpendicular fashion. If the unique 4-fold axis is along the z-direction, then the four
2-fold axes lie in the x-y plane, evenly spaced at 45° from each other. Note that a rotation about the
4-fold axis exchanges subunits within the same ring, whereas the 2-fold axes exchange subunits
between the two rings. As shown here, a convenient way to draw
dihedral symmetries is to base them on a prism of the appropriate symmetry; e.g. a square prism for
D4. For dihedral symmetry, the number of subunits (and distinct subunit orientations) is 2*n, where
n is the order of the unique axis of symmetry. The enzyme RuBisCO, argued to be the most abundant
enzyme on Earth, is an example of a protein assembly with D4 symmetry. Its subunit composition is
L8S8 (eight large subunits and eight small subunits). In order to most clearly illustrate the D4
symmetry, the arrangement of the large subunits in RuBisCO is shown here, with each subunit
colored differently, oriented in order to show views down the different symmetry axes.
Dihedral symmetries are possible from D2 to Dn for any n. Note that the case of D2 is somewhat
unique. In that case there is no single unique axis of highest order. Instead there are three 2-fold
axes all perpendicular to each other (e.g. along x, y, and z). And instead of a pair of ring structures
there is a pair of dimers; D2 symmetry is therefore sometimes referred to as a dimer-of-dimers. But
otherwise the situation is the same as for higher n. That is, there is still an axis with n-fold symmetry
(where n=2 for D2) combined with n evenly spaced 2-fold axes perpendicular to that axis.
Hemoglobin has subunit stoichiometry α2β2, where the alpha and beta subunits are highly similar. If
they were identical, the four subunits in hemoglobin would be an example of D2 symmetry. D2 is a
very common symmetry for proteins; C2 is the most common.
Beyond the dihedral symmetries, there are just three cases of higher rotational symmetry groups in
three-dimensions. These are the cubic symmetries, discussed next.
Cubic Symmetries
The cubic symmetries are based on the Platonic solids and thus share their symmetry. There are
exactly five Platonic solids – their study dates back to the ancient Greek mathematicians – defined by
the requirement of having equivalent vertices, equivalent faces, and equivalent edges. They are the
regular tetrahedron, cube, octahedron, icosahedron, and dodecahedron. It turns out that two pairs
of these are intimately related to each other, sharing the same symmetry, so in fact there are really
just three symmetries represented by the five Platonic solids. Tabulating the numbers of faces,
vertices, and edges in the five Platonic solids illuminates the so-called ‘dual’ relationship between
the cube and the octahedron and between the icosahedron and the dodecahedron. Those pairs are
related to each other by exchange of faces for vertices and vice-versa. That is, if you place a point at
the center of each of the six faces of a cube, those points are the vertices of an octahedron. And
likewise, points at the centers of the eight faces of an octahedron produce the vertices of a cube. In the same way, the icosahedron and the dodecahedron are duals of each other. And, interestingly, the
tetrahedron is its own dual.
The Platonic solids are shown with rotational symmetry axes indicated. For simplicity, only one
instance of each axis type is shown on each figure. Note how 2-fold symmetry axes pass through
opposing pairs of edges. Symmetry axes passing through faces must conform to the symmetry of the
faces. And symmetry axes passing through vertices must conform to the number of faces that meet
at a vertex.
An assembly conforming to tetrahedral symmetry (T) can be constructed by placing three subunits
(or symbols) on each face in a symmetric arrangement, for a total of 12 subunits.

Platonic solid   vertices   faces   edges   symmetry
tetrahedron          4         4       6       T
cube                 8         6      12       O
octahedron           6         8      12       O
icosahedron         12        20      30       I
dodecahedron        20        12      30       I

Octahedral symmetry (O) can be constructed by placing 4 subunits on each square face of a cube, or three
subunits on each face of an octahedron, leading to 24 subunits in either case. Whether a real
assembly (e.g. of protein subunits) that obeys symmetry O looks more like a cube or an octahedron
typically depends on the situation and can be subjective. But the symmetry properties do not depend
on whether one thinks of the assembly as cube-like or octahedron-like. The situation is the same for
icosahedral symmetry I. Those cases can be drawn and visualized as either three subunits on 20
triangular faces or five subunits on 12 pentagonal faces, for a total of 60 subunits.
Schematic diagrams are drawn for assemblies in tetrahedral, octahedral and icosahedral symmetry.
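The subunit bookkeeping for the point-group families discussed above can be summarized in a few lines of code. A small sketch (the function name is ours, not a standard convention):

```python
def subunit_count(symmetry: str) -> int:
    """Number of identical subunits (equal to the group order) for a 3-D
    point group, e.g. 'C5' -> 5, 'D4' -> 8, and the cubic groups T/O/I."""
    cubic = {"T": 12, "O": 24, "I": 60}
    if symmetry in cubic:
        return cubic[symmetry]
    kind, n = symmetry[0], int(symmetry[1:])
    if kind == "C":
        return n        # cyclic: n subunits around a single axis
    if kind == "D":
        return 2 * n    # dihedral: two stacked n-membered rings
    raise ValueError(f"unknown point group: {symmetry}")

for s in ["C2", "C3", "C5", "D2", "D4", "T", "O", "I"]:
    print(s, subunit_count(s))
```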
Broken or pseudo symmetry
Many examples appear in nature where an assembly nearly has a higher symmetry, but owing to
subtle differences the symmetry is broken down to a lower symmetry group, which is a subgroup of
the higher symmetry group. Hemoglobin is the best-known example. As noted above, if the α and β
subunits were identical the symmetry would be D2. But because the two subunit types are slightly
different, the true symmetry is only C2; one might say the symmetry is pseudo-D2. The head of the
F1-ATPase motor is another well-known example. The subunit stoichiometry is (αβ)3. Again, the α
and β subunits are very similar, but only the beta subunits have active catalytic sites. The α and β
subunits alternate in a hexameric ring. The true symmetry is C3 (although really even the C3
symmetry is broken by conformational differences in the chemically identical subunits), while it is pseudo-C6. Note that C3 {0°, 120°, 240°} is a subset or subgroup of C6 {0°, 60°, 120°, 180°, 240°, 300°}.
Both of these examples presumably arose from gene duplication of a single ancestral protein subunit,
followed by divergent evolution to give slightly different sequences and structures.
Biological Considerations
Why are nearly all the oligomeric proteins found in nature symmetric? The short answer is that
symmetric arrangements are easier to build compared to non-symmetric arrangements. The key
distinction is that symmetric arrangements require the fewest number of distinct subunit interaction
types. That is illustrated below for a tetramer of four identical subunits. To create a C4 arrangement,
a single interface type (highlighted by the red dot) is sufficient to hold the entire assembly together.
On the other hand, the non-symmetric case has four distinct interaction interfaces that are all
necessary. The question of how easy it is for something to arise by chance is critical in evolution,
since natural selection can only operate on phenotype outcomes that are somehow sampled by
random incremental mutations. Interestingly, it was articulated as early as 1956 by Crick and
Watson in their early work on how virus capsids should be assembled (which followed shortly after
their better-known discovery of the structure of DNA) that symmetric arrangements would be
dominant in natural structures like viral capsids because of the smaller number of contact types
required.
Setting aside the issue of symmetry, why are so many proteins and enzymes oligomeric in the first
place? One explanation is cooperativity. We already discussed the cooperativity that is made
possible in oligomers. However, the number of oligomeric enzymes where cooperativity has been
established is quite a small fraction of all the known oligomers that have been studied. Another explanation holds for some cases where large-scale structural integrity is required; viral capsids,
microtubules and bacterial S-layers are well-known examples. But again, these are special cases, and
they do not speak to the question of why enzymes are so often oligomeric. Other potential advantages
have been proposed, including the idea that oligomers are naturally more stable than monomers. But
there is little evidence to support such ideas. The exceptionally high abundance of oligomeric
enzymes is (in the author’s opinion) a largely unexplained puzzle in molecular biology.
Special Topics in Protein Symmetry
Helical Symmetry (non-point group)
Some symmetries contain operations that have translational or shift components in addition to
rotation. Repeated application of an operation that includes a shift naturally implies a structure that
extends essentially indefinitely; i.e. a filamentous structure. F-actin filaments, microtubules, many
rod-shaped filamentous viruses, and phycobilisomes are some examples of protein assemblies that
follow helical symmetry.
Describing the geometry of helical assemblies is generally more complicated than describing finite
assemblies that obey point group symmetries. In some cases, the organization of a helical assembly
can be fully described by a single spatial operation (a rotation combined with a shift), which when
applied repeatedly generates all the subunits in the structure. The F-actin filament is described by a
rotation of about 167° combined with a translation of about 28 Å. Because the rotation is close to
180°, the F-actin filament takes the appearance of two separately interwound helical
‘protofilaments’. The cylindrical protein coat of tobacco mosaic virus (TMV) can be described by a
single rotational operation of about 22° combined with a translation of about 23 Å. Because that
rotation is somewhere between 1/16 and 1/17 of 360°, the assembly can also be viewed as 16
protofilaments slowly twisting one way, or 17 protofilaments twisting the other direction. In other
cases, like the microtubule, the helical assembly is much harder to describe by a
single operation. Instead, different families of helical curves can be drawn on the ‘surface lattice’.
With one family of curves, there are apparently 10 protofilaments (such a curve is therefore referred
to as a ’10-start’ helix). There are also 13-start helical curves and 3-start helical curves for the
microtubule. In yet other kinds of tubular protein assemblies, there can be a true rotational
symmetry along the axis of the tube; in those cases it is impossible to describe the assembly in terms
of a single spatial operation between subunits.
Quasi-equivalence and the structure of icosahedral viral capsids

Early work on viral capsids – of the ‘spherical’ variety, not the filamentous variety – led to the
conclusion that they would be constructed according to principles of symmetry. The highest cubic
point group symmetry in three-dimensional space is icosahedral (as we discussed above), and this
posed a major problem. It was clear that a 60-subunit (icosahedral) protein shell would not be large
enough to encapsulate all the genetic material of a virus. Don Caspar and Aaron Klug proposed a
solution to that puzzle based on the idea of ‘quasi-equivalence’. Under symmetry, related copies of a
subunit are in equivalent environments, so only 60 subunits can be assembled while retaining strict
environmental equivalence. But Caspar and Klug showed how larger numbers of subunits can be
assembled in quasi-equivalent environments. The key was to begin with a scheme based on
triangular facets of an icosahedron, but then to subdivide the triangular facet of the icosahedron into
several smaller triangles. The simplest way to do this is to divide a triangular facet into four smaller
triangles. Then, instead of placing three subunits in a symmetrical arrangement on a single face of
an icosahedron, as one would do for a simple icosahedral assembly, one can place three subunits on
each smaller subdivided triangle, again in a symmetric arrangement. Clearly you would end up with
four times as many subunits as for a simple icosahedral assembly, namely 4*60=240 total subunits.
We asserted before that no more than 60 subunits can be placed in strictly equivalent environments,
but this triangulation method leads to subunits that are in nearly equivalent or quasi-equivalent
environments, as shown. To see the difference, note how some subunits appear to be part of
hexameric units while others appear to be part of pentameric units. If the capsid is composed of only a single kind of
protein, then the same protein must be able to occupy multiple distinct conformational states. In
some viruses, the distinct geometric sites are occupied by slightly different capsid proteins; that
avoids the problem of the same subunit having to take on different conformations, but it also creates
a need for the viral genome to encode more proteins.
The case explained above is referred to as T=4 based on the factor by which the number of subunits
increases; the total number of subunits is 60*T. Other cases besides T=4 are more common in nature.
These triangulation schemes are a bit harder to draw because the side of the large triangular facet of
the icosahedron does not fall along a lattice line of the smaller triangulation pattern on which the
subunits are arranged. Only some triangulation numbers (T) are possible. The governing equation
is
T = h2 + k2 + hk
where h and k are integers. In the triangulation diagram shown, h and k describe the indices of an
edge of the large triangular facet of the icosahedron in terms of edges of the smaller triangular
lattice. The recipe for assigning the values of h and k for a given diagram is as follows. Take one
corner of the large triangle as the origin (0,0). Then draw two unit vectors a and b to serve as
coordinate axes on the pattern of smaller triangles; these two unit vectors drawn from the origin
must be 60° apart (not 120°). Now figure out what the coordinates would be in this system for one
of the other corners of the larger triangular facet. In other words, determine how many steps you
would have to take along a and b in order to reach the other corner of the large triangular facet.
Those numbers of steps are the values of h and k. Note that as long as you follow these rules, you can
choose any edge of the larger triangular facet and multiple choices for the two coordinate basis unit
vectors; you may get different values for h and k, but the value for T should be unchanged. T=3 is a
common case in natural viruses, while at the upper limit a few giant viruses are known where T is at
least 1000 and the virus exceeds the size of a bacterial cell!
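The allowed triangulation numbers are easy to generate from the governing equation. A short sketch (the function name is ours):

```python
def allowed_T(max_T):
    """Caspar-Klug triangulation numbers T = h^2 + k^2 + h*k up to max_T,
    for non-negative integers h and k (not both zero)."""
    Ts = {h * h + k * k + h * k
          for h in range(max_T + 1) for k in range(max_T + 1)}
    return sorted(t for t in Ts if 0 < t <= max_T)

# Each allowed T corresponds to a capsid built from 60*T subunits.
for T in allowed_T(16):
    print(f"T = {T:2d}  ->  {60 * T} subunits")
```

The first allowed values are T = 1, 3, 4, 7, 9, 12, 13, 16, consistent with T = 3 being a common natural case; values like T = 2 never satisfy the equation.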
Using symmetry to design novel protein assemblies
Nature is full of examples of proteins that have evolved to form elaborate assemblies. A long-standing
goal in bioengineering has been how to design novel proteins in the laboratory so they will self-
assemble to make interesting architectures like those seen in nature. Ideas for how this might be
accomplished were laid out by the author’s laboratory several years ago and have reached fruition in
recent years. Symmetry has played the key role in the design strategy. In one approach, simple
natural protein oligomers like dimers and trimers are connected together in specific geometric ways
to give giant structures like cubic cages. Designed structures of this type may have utility in varied
biomedical and nanomaterials applications. The crystal structure of a designed protein assembly
having 24 identical subunits in symmetry O is shown below, oriented along its different axes of rotational symmetry.
Algebra for describing symmetry
When working with symmetry, it is often necessary to describe the underlying spatial operations in
algebraic terms. Some recollection of matrices and how to multiply matrices and vectors together is
important. We noted earlier that groups can be composed of matrices, and indeed we can take a
symmetry group and represent its elements by matrices. Each matrix represents a rotation operation
that is an element of the symmetry group.
A simple recipe makes it possible to construct a 3 by 3 matrix from a physical description of a rotation
operation. First, imagine that you have a starting point at coordinates (1,0,0), that is a point one unit
along the x-axis. Now ask yourself where that point would go under the operation in question. For
example, if the operation in question is a 180° rotation about the z-axis, a point that starts at (1,0,0)
would rotate to a position where the coordinates are (-1,0,0). Now write those coordinates (-1,0,0)
as the first column vector of a matrix. Now repeat the exercise with (0,1,0) as the starting point.
Under the operation of interest it would go to a position where the coordinates are (0,-1,0). Write
that as the second column vector. Now do the same for the starting point (0,0,1). That point actually
sits on the z-axis about which the rotation is occurring, so it would not go anywhere; its final position
would be (0,0,1). So, the constructed matrix for a 180° rotation about the z-axis would be
R = [ −1   0   0
       0  −1   0
       0   0   1 ]
We can do much with this matrix representation. For one, we can use it to multiply a generic ‘x,y,z’
vector notation to get a symbolic representation of the rotation in question:
[ −1   0   0 ] [ x ]   [ −x ]
[  0  −1   0 ] [ y ] = [ −y ]
[  0   0   1 ] [ z ]   [  z ]
That means we can equally well represent the rotation about z symbolically as ‘(-x, -y, z)’. Also, by
reversing our steps we could begin with a symbolic description of a rotational operation, write out
what the elements of the rotation matrix must be, and then use the columns of that matrix to get a
physical picture of what kind of operation is being performed.
Finally, there is a useful trick. In three-dimensions, the angle of rotation described by a 3x3 rotation matrix can be determined easily from its ‘trace’. The trace of a square matrix is the sum of its diagonal
elements (i.e. R(1,1)+R(2,2)+R(3,3)). The equation for the angle of rotation is:
Trace(R) = 1 + 2cos(θ)

Checking the case we worked out above, the trace is (−1 + −1 + 1) = −1. Solving for θ, we get
2cos(θ) = −2, then cos(θ) = −1, and finally θ = 180°, as expected. [Note that for a 2x2 matrix describing a 2-D
rotation, the equation is different; the additive term “1” on the right must be removed for the 2-D
case.]
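The trace relation can be verified numerically. A sketch using NumPy (the helper function name is ours):

```python
import numpy as np

def rotation_angle_deg(R: np.ndarray) -> float:
    """Recover the rotation angle (degrees) from a 3x3 rotation matrix
    via Trace(R) = 1 + 2*cos(theta)."""
    c = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

# 180-degree rotation about z, as constructed in the text: trace = -1.
Rz180 = np.array([[-1, 0, 0],
                  [ 0, -1, 0],
                  [ 0,  0, 1]], dtype=float)
print(rotation_angle_deg(Rz180))   # recovers ~180 degrees

# A general rotation about z by 120 degrees: trace = 1 + 2*cos(120°) = 0.
th = np.radians(120.0)
Rz120 = np.array([[np.cos(th), -np.sin(th), 0],
                  [np.sin(th),  np.cos(th), 0],
                  [0, 0, 1]])
print(rotation_angle_deg(Rz120))   # recovers ~120 degrees
```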
CHAPTER 12
Equations Governing Diffusion
We turn our attention now to dynamic processes, where behavior becomes a function of time.
Diffusion – the movement of molecules as a result of random thermal motion and collisions – is one
of the most basic dynamic processes for molecules. We will discuss the central equations that govern
diffusion and consider their consequences for diffusive behavior.
Diffusion in 1-D
Consider diffusion in one dimension as a random walk along a number line. Suppose we start at x=0,
and then take n steps randomly either to the left or right, with each step having length δ.
Where do we expect to end up after n steps?
x_n = Σ_{i=1}^{n} l_i δ

where l_i is either +1 or -1 with equal probability. Then, the expected value of the position after n
steps, ⟨x_n⟩, will be

⟨x_n⟩ = ⟨Σ_{i=1}^{n} l_i δ⟩ = δ⟨Σ_{i=1}^{n} l_i⟩ = δ Σ_{i=1}^{n} ⟨l_i⟩ = 0
since the average or expected value of l is 0. The equation tells us that the average value of the
position of the particle after n steps remains at 0. This is as expected since there was no preference
to step one way or the other. This means that if a large group of particles take independent random
walks starting at the origin, the average of their distribution will remain at 0. But clearly the particles
themselves do not individually remain at zero. That leads us to ask how spread out the distribution
of particles would be after they each take n steps.
The standard way of expressing a degree of spreading is to evaluate the average value of the squared
position – the squaring causes displacements in both directions (positive and negative) to contribute
positively to the spread, as they should. The average squared displacement after n steps is
⟨x_n²⟩ = ⟨(Σ_{i=1}^{n} l_i δ)²⟩ = δ²⟨(Σ_{i=1}^{n} l_i)²⟩ = δ²⟨(l_1 + l_2 + ⋯ + l_n)(l_1 + l_2 + ⋯ + l_n)⟩

= δ²(⟨l_1 l_1⟩ + ⟨l_1 l_2⟩ + ⋯ + ⟨l_1 l_n⟩ + ⟨l_2 l_1⟩ + ⟨l_2 l_2⟩ + ⋯ + ⟨l_2 l_n⟩ + ⋯)

= δ²((1 + 0 + ⋯ + 0) + (0 + 1 + 0 + ⋯ + 0) + ⋯) = nδ²
The logic here parallels what we saw earlier in the course when treating the path of a flexible polymer
using a random walk model. Namely, the average squared distance traveled is proportional to the
number of steps. This means that the rms distance goes as the square root of the number of steps, as
before.
x_rms = √n δ
This is an important result. It shows that diffusion can be an efficient mechanism for movement over
short distances but not over long distances. This important limiting feature of diffusion explains the
evolution of a range of biological phenomena where energy is expended (e.g. in the form of ATP or
GTP hydrolysis) to cause directed movement of molecules or even organelles across long distances.
The places where this becomes most important are where the cellular length scales are largest;
neurons are the classic example, and energy driven transport (e.g. by molecular motors tracking
along microtubules) is critical there.
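The result ⟨x_n²⟩ = nδ² can be checked exactly for small n by enumerating every equally likely step sequence. A short sketch:

```python
from itertools import product

def walk_moments(n: int, delta: float = 1.0):
    """Exact <x_n> and <x_n^2> by enumerating all 2^n equally likely
    step sequences of a 1-D random walk with step length delta."""
    walks = list(product([-1, +1], repeat=n))
    xs = [delta * sum(w) for w in walks]
    mean = sum(xs) / len(xs)
    mean_sq = sum(x * x for x in xs) / len(xs)
    return mean, mean_sq

for n in (1, 2, 4, 8):
    mean, mean_sq = walk_moments(n)
    print(f"n = {n}:  <x> = {mean:.1f}   <x^2> = {mean_sq:.1f}   (n*delta^2 = {n})")
```

The mean stays at zero while the mean squared displacement grows linearly with the number of steps, exactly as derived above.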
We can convert the equation ⟨x_n²⟩ = nδ² from a form that depends on details of the random walk
(e.g. n and δ) to a more phenomenological form that depends on time. Let τ be the time interval
between steps. Then the elapsed time during a random walk is t = nτ, and n = t/τ. So,

⟨x_n²⟩ = (t/τ)δ² = (δ²/τ)t

Now we replace the term (δ²/τ) with 2D, where D is called the diffusion coefficient, which is a
property of the molecule in question, among other things such as the viscosity of the solution in which
the diffusion is occurring. With that substitution, we find that the mean squared displacement is

⟨x_n²⟩ = 2Dt
This equation applies to diffusion in one dimension. In three dimensions,
⟨r²⟩ = 6Dt
We will discuss the relationship between molecular properties and the diffusion coefficient D later,
but for now we continue with an analysis of how molecules spread out from a central starting point,
and how this depends on time and on D.
For the 1-D case we can imagine a thin tube in which material is initially concentrated at one point
(x=0). Over time, the molecules naturally spread out. What will the concentration profile look like
at different time points? The answer is that the concentration will take the form of a Gaussian
distribution. Specifically,

C(x) ∝ e^(−x²/(4Dt))

Comparing this to the standard form of a Gaussian, e^(−x²/(2σ²)), where σ is the standard deviation,
we see that the standard deviation of the concentration profile is obtained by setting 4Dt = 2σ²,
which gives a standard deviation for spreading from the center of σ = √(2Dt), which matches our
previous expression for rms distance traveled, as it should.
General Equations for Diffusion
We were able to work out some basic properties for diffusion from a fixed point, but what about
equations we can apply to more general cases? This leads us to Fick’s first and second laws of
diffusion.
Fick’s first law
We begin by setting up a 1-D system as before. Then we consider how much material (i.e. how many
molecules) would cross an imaginary boundary between two points in the system, x and x+δ, over a
time interval τ during which each molecule takes one step.
We can let N(x) denote the number of molecules that are at position x, and N(x+δ) denote the
number of molecules at position x+δ. To relate numbers of molecules to concentrations, we divide
the number at each location (x or x+δ) by the volume allotted to each position, namely A·δ, where
A is the cross-sectional area.
Now we can consider how many molecules we expect to cross the imaginary boundary. Our interest
is in the net movement. The net movement or transport across a boundary (real or imaginary) is the
flux, J, expressed as a number per area per time (cm⁻² sec⁻¹ in cgs units). We can calculate the flux
in our system by noting that half of the molecules at position x will cross the boundary from left to
right (since the probability is ½ in each time step that a molecule takes a step to the right), and
likewise half of the molecules starting at x+δ will cross the boundary from right to left. Clearly then
the net flux (taken here as the net movement from left to right) would be the first quantity minus
the second quantity, dividing by the area and the time interval:

J = ½ (N(x) − N(x+δ))/(Aτ)
Now multiplying the top and bottom by δ² gives

J = ½ (N(x) − N(x+δ)) δ²/(Aτδ²)

Rearranging and substituting the concentration C for N/V = N/(Aδ) gives

J = −½ (δ²/τ) (C(x+δ) − C(x))/δ

Now we recognize that (½ δ²/τ) is just the diffusion coefficient D from before. And the expression on
the right side of the equation appears as a difference between two values of the concentration C at
two closely spaced points (x and x+δ), divided by the spacing; this has the form of a derivative of C
with respect to position. So,

J = −D dC/dx
This is Fick’s first law in one dimension.
This general result tells us that the net movement of molecules due to diffusion (down a
concentration gradient) is proportional to the steepness of the gradient (dC/dx) times the diffusion
coefficient. And of course the negative sign is important as it specifies movement in the direction
opposite from the direction of the
gradient. The idea can be graphed on a 1-
D concentration profile as shown.
Fick’s second law
What can we say about how concentrations will be changing over time as a result of diffusion? We
can answer that with a similar treatment. But now we think about how the number of molecules in
some particular region would change as a result of the flux occurring on one side compared to the
other (i.e. on the left compared to the right). If the total flux into a region, taking into account
movement across the boundaries on either side, is positive, then the concentration should be
increasing over time.
Thinking about this as a change in concentration over a change in time,
ΔC/Δt = (Δ(# of molecules)/V) / Δt
Then realizing that the change in the number of molecules is given by the net number transferred
across the boundary on the left minus the net number transferred across the boundary on the right,
and taking the volume element to be A·δ and the time interval to be τ,

ΔC/Δt = ((J(x)·A − J(x+δ)·A) τ/(Aδ)) / τ

This simplifies to

ΔC/Δt = −(J(x+δ) − J(x))/δ
Similar to before, we can recognize this as a derivative of J with respect to position. So,
dC/dt = -dJ/dx.
But we know from Fick’s first law that the flux J is the first derivative of the concentration C with
respect to position x. So, the change in concentration as a function of time is evidently the second
derivative of C with respect to position, multiplied by the diffusion coefficient D.
(∂C/∂t)_x = D (∂²C/∂x²)_t
This is Fick’s second law in 1-D. Essentially, it tells us that the way the concentration is changing
at a fixed position due to diffusion is determined by the curvature (i.e. the second derivative) of C
with respect to x. With that understanding we can sketch how a concentration profile would be
expected to change over time (at least over a short interval where the derivatives are not changing
much):
Fick’s second law is a second-order differential equation for C in terms of x and t. Some scenarios
have simple enough ‘boundary’ conditions (e.g. a simple form for the concentration at time 0) that
we can solve Fick’s law to obtain a complete expression for C in terms of x and t, meaning we would
know what the concentration profile would look like at any time t. Most real problems have
mathematical forms that are difficult to solve. But in the case of diffusion from a point that we dealt
with earlier, we did write out an equation (without proof) saying that the concentration profile as a
function of time and position was proportional to a Gaussian. Introducing a leading multiplicative
term in order to make the total amount of material in the system constant over time, the correct
equation is:

C(x) ∝ (1/√(Dt)) e^(−x²/(4Dt))
Although we write this equation without proof, we can show that it does indeed obey Fick’s second
law, as it must. Take the first (partial) derivative of C with respect to t. Then take the second (partial)
derivative of C with respect to x. The resulting expressions should be equal to each other after
multiplying by the diffusion coefficient, D.
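The suggested exercise can also be spot-checked numerically. The following sketch (my own construction; the diffusion coefficient, test point, and step size are arbitrary choices) evaluates both sides of Fick’s second law for the normalized Gaussian profile, using small central differences in place of the partial derivatives:

```python
import math

# Numerical spot-check (assumed parameters): verify that
# C(x,t) = (1/sqrt(4*pi*D*t)) * exp(-x**2/(4*D*t)) satisfies dC/dt = D*d2C/dx2,
# approximating the partial derivatives by central differences.
D = 2.0

def C(x, t):
    return math.exp(-x * x / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

x0, t0, h = 0.7, 1.3, 1e-4
dC_dt = (C(x0, t0 + h) - C(x0, t0 - h)) / (2 * h)                  # time derivative
d2C_dx2 = (C(x0 + h, t0) + C(x0 - h, t0) - 2 * C(x0, t0)) / h**2   # curvature in x
print(dC_dt, D * d2C_dx2)   # the two values should agree to several decimal places
```

Any constant multiple of the Gaussian works equally well here, since the equation is linear in C.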
Generalizing Fick’s laws for three dimensions
Fick’s laws generalize readily to higher dimensions. The single-variable derivatives get replaced by
the gradient operator.
In dimensions higher than 1, the flux J is a vector. It points directly down the concentration gradient
in the direction of steepest descent. Fick’s first law takes the following form:
J = −D ((∂C/∂x)_(t,y,z) x̂ + (∂C/∂y)_(t,x,z) ŷ + (∂C/∂z)_(t,x,y) ẑ) = −D∇C
where x̂ denotes the unit vector along x (and likewise ŷ and ẑ for y and z), and ∇ symbolizes the
gradient or ‘del’ operator.
Similarly, Fick’s second law generalizes to:
(∂C/∂t)_(x,y,z) = D ((∂²C/∂x²)_(t,y,z) + (∂²C/∂y²)_(t,x,z) + (∂²C/∂z²)_(t,x,y)) = D∇²C
Special topic: Using numerical (computational)
methods to simulate diffusion behavior
Many problems that involve differential equations can
be treated effectively using computer techniques. The
derivative quantities are effectively replaced with
differences in a variable over small sampling distances.
To apply Fick’s second law to a diffusion problem, we
need to know the second derivative of concentration
with respect to position. Examining the 1-D case first,
we might have a plot of concentration at some time t as
shown:
How would we estimate the value of (d2C/dx2) at point x? Well, the second derivative is just the
derivative of the first derivative, so we need to evaluate how the first derivative changes. We can
take the difference between the first derivatives between points x and x+Δx and between points x-
Δx and x, keeping in mind we need to divide by the separation distance when calculating derivatives.
We get:
d²C/dx² ≅ [(C(x+Δx) − C(x))/Δx − (C(x) − C(x−Δx))/Δx] / Δx = (C(x+Δx) + C(x−Δx) − 2C(x)) / (Δx)²
In other words, the second derivative is approximated by first adding up the values of a variable on
either side of the central point and then subtracting twice the value the variable takes at the central
point, dividing by the square of the sampling distance. This recipe makes it possible to simulate the
evolution of a concentration profile under diffusion by estimating the second derivative of C at each
point and then using those values to update the new concentrations at a new time point. Since
ΔC/Δt = D ∂²C/∂x²

ΔC = (D ∂²C/∂x²) Δt
The procedure can be extended easily into higher dimensions. For 2-D, the curvature term is just
the sum of the separate second partial derivatives with respect to x and y, and we would end up
with:

ΔC = D ((∂²C/∂x²) + (∂²C/∂y²)) Δt
   = D (C(x+Δx, y) + C(x−Δx, y) + C(x, y+Δy) + C(x, y−Δy) − 4C(x, y)) / (Δx)² × Δt
assuming the sampling distance is the same in both directions (i.e. Δx = Δy). Graphically, the effect
is to add up the values of the variable (C) at the four points surrounding the central point (x, y) and
then subtract 4 times the value of the variable at the central point; this is the numerator above.
In 3 dimensions, the required coefficients are of course +1, +1, +1, +1, +1, +1, and −6.
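The update recipe just described can be turned directly into a short program. The following is a minimal sketch (variable names and parameter values are my own choices) that evolves a 1-D concentration profile from a central spike using the finite-difference form of Fick’s second law:

```python
# Finite-difference diffusion sketch (assumed parameters). Each step applies
# C_new[i] = C[i] + D*dt*(C[i+1] + C[i-1] - 2*C[i])/dx**2 at interior points.
D, dx, dt = 1.0, 1.0, 0.2    # the explicit scheme needs D*dt/dx**2 <= 1/2 for stability
n, steps = 101, 200
C = [0.0] * n
C[n // 2] = 100.0            # material initially concentrated at one point

for _ in range(steps):
    new = C[:]
    for i in range(1, n - 1):    # interior points; the ends are held at zero
        new[i] = C[i] + D * dt * (C[i + 1] + C[i - 1] - 2 * C[i]) / dx**2
    C = new

print(sum(C))    # total material is conserved, up to tiny leakage at the ends
```

Printing C at several time points shows the spike relaxing toward the Gaussian profile discussed earlier, with width growing as √(2Dt).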
CHAPTER 13
The Diffusion Coefficient: Measurement and Use
We noted earlier that the diffusion coefficient, D, which describes how rapidly a molecule diffuses,
depends on the properties of the molecule (chiefly its size). Naturally then, if we measure D we can
learn something about molecular size. We begin with a discussion of experiments for measuring D.
Measuring the diffusion coefficient, D
The diffusion coefficient can be measured in various ways, each of which may be suitable for some
systems and not others. One approach is to measure the rate of spread from an initial point. From
our earlier equations we know that the standard deviation in the spread of material initially
concentrated at a point goes as xrms = √(2Dt) in 1-D and √(6Dt) in 3-D. Therefore, if the spatial
spreading (i.e. the standard deviation of the concentration distribution) after time t can be measured,
D can be obtained.
Recent technology developments have made it possible with special instrumentation to track the
location of a single molecule, usually on the basis of fluorescent labeling combined with highly
sensitive detectors. If a single particle is monitored long enough, its average movement over a
particular time period can be evaluated, leading to a value for D in the same way as for the
measurement of spreading in an ensemble, as above.
Fluorescence recovery after photobleaching (FRAP)
Diffusion is a process whereby molecules move about in a way that tends to result in a uniform
distribution; i.e. an equal concentration everywhere. As a result, measuring diffusion in a system
where the molecule of interest is already uniformly distributed is naturally problematic, since further
diffusion has no effect on the concentration distribution. The principle behind the method known as
fluorescence recovery after photobleaching, or FRAP, is to create a non-uniform distribution of the
molecule of interest and then monitor how rapidly its concentration profile returns to uniformity.
The standard approach is to first label the molecule (e.g. a protein) with a fluorescent group. [As we
will discuss later, fluorescence is a convenient and sensitive way to measure the concentration of a
labeled molecule.] Then, a strong laser pulse is used to ‘bleach’ (i.e. destroy by bond breakage) the
fluorescent probe, not everywhere but just in one spot, leaving a region (e.g. a circular spot) where
there is no fluorescence from the molecule of interest. Then, one waits and measures fluorescence
in the bleached region. Assuming the labeled molecules are diffusing about, unbleached molecules
from outside the bleached region will find their way into that region. How many of those molecules
have diffused into that region can be monitored with a (usually microscopic) fluorescence reading.
Of course the bleached molecules will also diffuse out of the bleached region, but they lack a
fluorescent group and so do not contribute to the measured fluorescence. After an extended period,
the fluorescence as a function of time will plateau as the concentration of unbleached molecules
becomes equal inside and outside the bleached region. The time scale over which the fluorescence
returns depends on the diffusion coefficient of the molecule being studied. So, D can be obtained by
measuring the rate of fluorescence recovery.
The behavior of fluorescence recovery and its relation to D is diagrammed here. We will not go
through a detailed mathematical treatment here other than to say that the equations for diffusion
make it possible to formulate what the curve for fluorescence recovery should look like as a function
of D. And therefore a value for D can be extracted by determining what value of D gives the best
match between the mathematically calculated behavior and the observed data.
FRAP can be used in various types of set-ups. It is naturally suited for measuring two-dimensional
diffusion in a thin layer, for example of a protein in a lipid bilayer. It is also commonly used in situ
(i.e. inside cells) using fluorescence microscopy. For fluorescence studies inside cells, the protein
of interest must be fluorescently labeled by genetic fusion to a naturally fluorescent protein, a topic
we will discuss in more detail later.
Dynamic Light Scattering (DLS)
Dynamic light scattering (DLS, also sometimes referred to as photon correlation spectroscopy) is a
powerful and convenient experiment for measuring diffusion coefficients. Part of its power and
convenience comes from not having to artificially create a system where the concentration
distribution is out of equilibrium (i.e. non-uniform), as required in FRAP. DLS relies on natural
fluctuations in the intensity of light that is scattered by the large solute molecules in a solution. We
will not discuss the physics of light scattering in detail, other than to say that the phenomenon under
discussion here is typically referred to as elastic or Rayleigh scattering and occurs where the
wavelength of light is much larger than the sizes of the molecules involved; macromolecules have
sizes between a few nanometers to tens of nanometers, while the visible/UV region of the
electromagnetic spectrum is in the few hundred nanometer range. The intensity of the light
scattering – e.g. the fraction of the incident photons that bounces off in directions other than the
incident direction – depends on the index of refraction or polarizability of the molecules, and is
strongly dependent on molecular size. In fact, it tends to be dominated by the largest species of
molecule in a solution.
Without belaboring the details, the random movements of molecules in solution cause random
fluctuations in the intensity of light that is scattered in any fixed direction. At one moment in time,
the scattered light intensity may be slightly lower than the average (over time), whereas a moment
later the intensity may be higher. But the crux of the phenomenon is that the time scale over which
the fluctuations persist depends (inversely) on the rate of diffusion. If at some instant in time the
positions of the macromolecules in a solution are such that the light scattering is higher than average,
then the light scattering will remain above average until the molecules have moved far enough (by
diffusion) to erase the momentary fluctuation, and likewise if the scattered intensity is lower than
average. If the molecule under study has a high diffusion coefficient, then deviations above or below
the average intensity will vanish or dissipate quickly, whereas if the molecule has a low diffusion
coefficient then whatever fluctuations occur will persist longer. In other words, a phenomenon that
shows random fluctuating behavior has a natural time scale associated with the fluctuations. The
plots here illustrate the general idea of fluctuating behavior having different characteristic time
scales.
How can the time scale of something that is randomly fluctuating be characterized mathematically?
The autocorrelation function provides the answer. The essence of an autocorrelation function is to
ask how similar or correlated the intensity measured at time t is compared to the value measured at
time t+τ, where τ is some specified time increment. Of course the answer depends on the value of τ.
If τ is sufficiently small, then the values measured at t and t+τ will be very similar (in fact identical if
τ = 0). On the other hand, if we consider a large value of τ (i.e. longer than the time scale of the
fluctuations), then the intensity values at t and t+τ will be uncorrelated. And of course at
intermediate values of τ we will see intermediate values of the correlation (i.e. between 1 and 0). In
other words, the value of the autocorrelation function will be 1 when τ is 0, and will decay to 0 when
τ is large. Precisely how quickly the autocorrelation function decays as a function of τ tells us what
the characteristic time scale is for the fluctuating behavior.
The plot above (lower panel) illustrates the mechanics of how the autocorrelation function is
evaluated. First, recall from your prior exposure to statistics that when calculating the correlation
coefficient between two ordered sets of values, for the numerator one simply takes the sum or
average of the products of one set of values with the other set; the denominator is simply a
normalizing factor. So, to calculate the autocorrelation function we just need to calculate the average
value of the product of the intensities at times t and t+τ. To calculate the average value of the product
of I(t) and I(t+τ), you can imagine taking a bar of length τ and sliding it along the length of the plot to
identify intensity values to multiply together and average. Without loss of generality we can simplify
things by pretending that the average value of the intensity is 0, with fluctuations giving plus and
minus values. Then, the autocorrelation function A(τ) is just

A(τ) = <I(t)·I(t+τ)>
A plot of A(τ) vs τ will decay exponentially, according to our discussions above. The characteristic
time for the fluctuating intensity is the value of the time increment τ at which A(τ)/A(0) = 1/e. The
motivation for this kind of autocorrelation analysis is that the characteristic fluctuation time is
related inversely to the value of the diffusion coefficient, D. From more advanced texts one can find
that a plot of ln(A) vs τ gives a slope equal to D times (−8π²n² sin²(θ/2)/λ²), where n is the index of
refraction, θ is the scattering angle, and λ is the wavelength of the light. In that way, D can be
obtained from the autocorrelation analysis.
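The autocorrelation procedure is easy to demonstrate on synthetic data. The sketch below is not from the text: the AR(1) recipe for generating an exponentially correlated signal, and all the parameter values, are my own assumptions. It builds a mean-zero “intensity” trace with a known correlation time and evaluates A(τ) = <I(t)·I(t+τ)> by sliding a lag of τ along the trace, as described above:

```python
import math
import random

# Sketch (assumed recipe and parameters): make a synthetic fluctuating signal
# with correlation time t_c via an AR(1) process, then compute its
# autocorrelation and check the decay at tau = t_c.
random.seed(1)
t_c = 20.0                   # correlation time, in units of the sampling step
rho = math.exp(-1.0 / t_c)   # step-to-step correlation of the AR(1) process
N = 200000
I = [0.0] * N
for k in range(1, N):
    I[k] = rho * I[k - 1] + math.sqrt(1 - rho * rho) * random.gauss(0.0, 1.0)

def A(tau):
    # autocorrelation at integer lag tau (the signal already has mean ~0)
    return sum(I[k] * I[k + tau] for k in range(N - tau)) / (N - tau)

print(A(20) / A(0))   # should be close to 1/e at tau = t_c
```

Evaluating A(τ) over a range of lags and fitting ln(A) vs τ recovers the correlation time, which is the same kind of analysis a DLS instrument performs to extract D.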
With modern instrumentation, the value of D can be obtained by DLS quickly with very little material.
Built-in software performs the necessary analysis. The experiment is ‘native’ in the sense that the
macromolecule is examined in its native state, and nondestructive. Of course the macromolecule of
interest must be purified, and owing to the strong dependence of scattering on size, special care must
be taken to remove any particulates or aggregated material. If the sample in question is
heterogeneous, containing molecules of more than one size, then in ideal cases it is possible to
decompose the behavior into separate components, but this becomes challenging.
Relating the diffusion coefficient to molecular size
Relating D to molecular friction
When a molecule moves down its concentration gradient under diffusion, the overall rate of
movement or transport reflects a sort of terminal velocity situation, where the force of movement
caused by the concentration gradient is balanced by the friction the molecules feel as they travel. The
frictional force on an object is the product of its velocity with its frictional coefficient, f. The direction
of force is opposite from the velocity, so
Ffrict = -f v
What is meant by a force due to a concentration gradient? We have seen this before in the context of
balancing forces in equilibrium sedimentation. If we denote the force due to a concentration gradient
as FC, then recalling that force is the derivative of (potential) energy as a function of position,
FC = −d(μ⁰ + kBT lnC)/dx = (−kBT/C) dC/dx
Now requiring that the two forces sum to zero at terminal velocity, FC + Ffrict = 0 and
f v = (-kBT/C) dC/dx
But the velocity v is a description of how fast the molecules are moving (not their vibrational speed
between collisions but their net transport speed), so v must be related to flux J. By examining units
(J is #/(cm2 sec), and v is cm/sec), we can see that the conversion between them is by units of #/cm3
which is concentration, so
J = v C and v = J/C
Putting this into the previous equation and cancelling concentration on both sides,
f J = (-kBT) dC/dx
But from Fick’s first law we know that J = −D dC/dx. Substituting that into the equation above and
cancelling like terms, we find remarkably that
f D = kBT
Of course kBT is a constant (our familiar average thermal energy), having no dependence on the
molecule in question. Evidently, the frictional coefficient of a molecule f and the diffusion coefficient
of a molecule D are just two manifestations of the same thing, inversely related. If we know one, we
know the other. This equation is known as the Einstein-Smoluchowski equation. The reason it is
important is that there are well-known physics equations to describe how the frictional coefficient
of an object relates to its size, and since knowing D gives us f, we have a way to get from D to molecular
size, as we describe next.
Relating the frictional coefficient f to molecular size (spherical radius)
In 1850 Stokes showed that for a sphere of radius R moving in a medium of viscosity η, the frictional
coefficient is

f₀ = 6πηR

where the subscript in f₀ denotes the assumption of a sphere. This is known as Stokes’ equation. For
water at ambient temperatures, the viscosity is about η = 0.010 g/(cm sec). From the equations
above you can see that if you measure D, you can immediately obtain the frictional coefficient f, and
if you assume the molecule is spherical then you can obtain R. The value of R so-obtained is
sometimes called the Stokes radius or sometimes the hydrodynamic radius. From a value for R, one
can use the known density for a protein or nucleic acid to calculate its mass.
Transport problems are often treated in cgs units. The units for the key variables are somewhat
peculiar. They are listed here for convenience.

variable    units
D           cm²/sec
f           g/sec
J           1/(sec cm²)
v           cm/sec
η           g/(cm sec)
Example:

Suppose the measured value of D for a large protein complex is 5×10⁻⁷ cm²/sec. Assuming
the molecule is spherical, what is the molecular weight? (Let the density of protein be 1.35
g/cm³.)

f = kBT/D = 8.2×10⁻⁸ g/sec
R = f/(6πη) = 4.4×10⁻⁷ cm (= 44 Å)
MW = 1.35 g/cm³ × (4/3)πR³ × NA = 290000 g/mol = 290 kDa
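The worked example can be reproduced in a few lines (a sketch in cgs units; the physical constants are standard values and the inputs come from the example itself):

```python
import math

# Worked example in cgs units (constants are standard values; inputs from the example).
kB = 1.381e-16    # Boltzmann constant, erg/K
T = 298.0         # K
eta = 0.010       # viscosity of water, g/(cm sec)
D = 5e-7          # measured diffusion coefficient, cm^2/sec
density = 1.35    # assumed protein density, g/cm^3
NA = 6.022e23     # Avogadro's number, 1/mol

f = kB * T / D                # Einstein-Smoluchowski relation: f*D = kB*T
R = f / (6 * math.pi * eta)   # Stokes' equation: f = 6*pi*eta*R
MW = density * (4.0 / 3.0) * math.pi * R**3 * NA
print(f, R, MW)   # ~8.2e-8 g/sec, ~4.4e-7 cm (44 A), ~290 kDa after rounding
```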
Non-spherical molecules
For a given volume, a sphere has the lowest possible frictional coefficient, which also means it has
the highest diffusion coefficient. Non-spherical objects have higher values of f and lower values of D.
That means that estimating molecular size from the diffusion coefficient alone (using Stokes’
equation, which assumes a spherical shape) can give somewhat erroneous values. In particular, since
a non-spherical shape leads to a lower diffusion coefficient and a higher frictional coefficient, and the
frictional coefficient varies directly with the assumed spherical radius, a highly non-spherical
molecule will have the same diffusion coefficient as a larger spherical molecule. In other words,
estimating molecular size from D alone can lead to an overestimation of size.
Advanced texts provide complex equations that relate f to the degree of non-sphericality, but these
are not useful in practice very often. As we will see shortly, combining measurements of D (and hence
f) with other kinds of measurements makes it possible to obtain molecular weights without assuming
a spherical shape.
In addition to the issue of shape, the frictional coefficient can also be affected by other factors. In
particular, macromolecules tend to carry a hydration layer of bound water molecules with them, and
this sometimes complicates the analysis of frictional coefficients.
Diffusion or spreading out with respect to orientation rather than position
The idea of diffusion can be generalized to go beyond positional variables. Ordinary diffusion
considers how molecules that begin in the same place spread out to other positions. That idea can
be generalized to molecular orientations. If the macromolecules in a solution can be initially driven
to the same orientation and then allowed to reorient through random rotational changes, then over
time their orientations will spread back out until their orientational distribution becomes uniform.
As with ordinary diffusion, there is a distinct constant associated with rotational diffusion of a
molecule, which is inversely related to the degree of friction the molecule experiences when it
tumbles in the viscous solution. Rotational diffusion will come up again later when we discuss special
spectroscopic techniques.
Special Topic in Diffusion: Diffusion to Transporters on a Cell Surface
It is a surprising observation that cells that have surface transporters for taking up nutrients usually
have a rather low density of transporters on their cell surface. Why not gain an advantage by densely
packing the surface with transporters in order to obtain nutrients more rapidly? The answer to this
puzzle comes from examining the peculiar and surprising properties of diffusive behavior.
Net movement can be present in situations where the concentration is not changing over time
anywhere. Steady-state can be achieved where a molecule of interest is being produced at one place
(called the source) and consumed at another (called the sink). Under steady state conditions,
meaning dC/dt=0, Fick’s second law becomes:
(∂²C/∂x²)_(t,y,z) + (∂²C/∂y²)_(t,x,z) + (∂²C/∂z²)_(t,x,y) = ∇²C = 0
Solving this differential equation gives a description of the concentration everywhere in a system
between the source and the sink. Solving this one equation gives different results for the function
C(x,y,z), depending on the ‘boundary conditions’. The boundary conditions are specified by having
fixed (and unequal) concentrations at the surfaces of the source and sink, so one obtains a different
solution to the differential equation and a
different function for C(x,y,z) depending on
the nature (e.g. size, shape and
arrangement) of the source and sink.
We begin our analysis of the problem of
diffusion to cell surface transporters with a
treatment of a simpler problem where we
have a sphere (representing a cell) whose
entire surface acts as an absorber;
whenever a diffusing molecule of interest
hits the surface it is captured or consumed.
This is the case of a spherical sink, and the
boundary condition is that the
concentration C=0 at the surface of the
sphere. We can treat the source as being infinitely far away – imagine an enclosing sphere of very
large radius giving a boundary condition of C equal to some fixed value C0 at infinity. Without proof,
we can find that the solution to ∇²C = 0 with these boundary conditions is
C(r) = C0 (1 − a/r)

where the spatial variables x, y, z are replaced instead with r, since the problem is spherically
symmetric, and a is the radius of the spherical absorber. Notice that, as required, C = 0 at r = a and
C = C0 at r = ∞. You can further prove to yourself that this equation does indeed obey Fick’s second
law by converting r to √(x²+y²+z²) and taking second partial derivatives to show that ∇²C = 0.
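Rather than expanding the derivatives by hand, the claim can be spot-checked numerically. The sketch below (my own construction; the evaluation point and step size are arbitrary) estimates the Laplacian of C(r) = C0(1 − a/r) by central differences in x, y, and z:

```python
import math

# Numerical spot-check (assumed parameters): estimate the Laplacian of
# C(r) = C0*(1 - a/r) at an arbitrary point; it should come out ~0.
C0, a = 1.0, 1.0

def C(x, y, z):
    return C0 * (1.0 - a / math.sqrt(x * x + y * y + z * z))

x, y, z, h = 1.5, -0.8, 2.1, 1e-3
lap = (
    (C(x + h, y, z) + C(x - h, y, z) - 2 * C(x, y, z)) / h**2
    + (C(x, y + h, z) + C(x, y - h, z) - 2 * C(x, y, z)) / h**2
    + (C(x, y, z + h) + C(x, y, z - h) - 2 * C(x, y, z)) / h**2
)
print(lap)   # ~0, up to finite-difference error
```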
Now that we know what the concentration function looks like, we can determine how fast the
absorbing sphere is capturing diffusing molecules. We can find this by evaluating the flux J at the
surface and then multiplying by the area of the sphere, 4πa².

|J| = D|dC/dr| = DC0 a/r²

and at the surface of the sphere (r = a),

|J| = DC0/a

Taking into account the surface area of the sphere,

Capture rate for an absorbing sphere = 4πa² × DC0/a = 4πDaC0

This is how fast a spherical cell of radius a could capture a diffusing nutrient whose diffusion
coefficient is D and whose bulk concentration is C0.
Now we have to answer the harder question concerning the capture rate for a sphere where the
diffusing molecule is captured only if it collides with the sphere at small absorbing patches (i.e.
transporters); the rest of the sphere is not absorbing. This problem was first treated by Howard Berg
and is discussed in his short classic Random Walks in Biology. Let’s say that there are N patches on
the surface and each is circular with radius s. Using a clever mathematical analogy between diffusive
resistance and resistance in an electrical circuit, Berg reasoned as follows. The rate of capture by a
single circular disk of radius s is (given without proof) 4DsC0. [This comes from solving ∇²C = 0 using
boundary conditions of C = 0 at the surface of a flat circular disk, calculating the flux as the derivative
of C, and then integrating the flux over the circular patch.] Then Berg notes that the problem of
diffusion to a set of patches on a sphere can be broken down into two steps: (1) diffusion from infinity
to a spherical surface just outside the sphere of interest (but with a radius not substantially greater
than a), followed by (2) diffusion to a set of circular patches.
Then Berg introduces an electrical analogy. Recall that electrical resistance is R = V/I (which is a
driving voltage divided by a flow). By analogy, diffusive resistance would be the driving
concentration divided by the capture rate. For the case of the sphere whose entire surface is
absorbing, we get C0/(4πDaC0) = 1/(4πDa) as the diffusive resistance. For capture by a single
circular disk, for the diffusive resistance we get C0/(4DsC0) = 1/(4Ds). Now we put the two steps
together. The two steps occur in series (one after the other), so we should add the resistances of the
two steps together. But first we have to account for there being N separate patches. Flow to the
separate patches can occur in parallel, so we need to divide the diffusive resistance of the second
step by N. The total diffusive resistance becomes

1/(4πDa) + 1/(4NDs) = (1/(4πDa)) (1 + πa/(Ns))

Finally, we convert to a rate of capture by taking the driving concentration and dividing by the
diffusive resistance.

Capture rate for a sphere with N absorbing patches = 4πDaC0 / (1 + πa/(Ns))
Recall that the sphere whose entire surface was absorbing had a capture rate of 4πDaC0. Therefore,
we can express the relative or fractional speed of capture for the sphere with patches (in comparison
to the fully absorbing sphere) as

fractional capture rate = 1/(1 + πa/(Ns))

Evidently, the capture rate has asymptotic behavior in terms of the number of patches N. If we call
the number of patches where the fractional capture rate is 50% N50%, then N50% = πa/s. In other
words, if the sphere has a radius that is a hundred times greater than the radius of the patches, then
a few hundred patches spread across the surface of the sphere will achieve 50% capture efficiency.
So what fraction of the surface of the sphere is actually covered by patches under these conditions of
50% capture efficiency? The area of the sphere is 4πa². The total area occupied by the circular
patches of radius s would be N50% × πs², which after substituting N50% = πa/s would give π²as. So
the fractional coverage of the spherical surface would be π²as/(4πa²) = πs/(4a). Note that this is a
small number if the radius of the sphere is large compared to the size of the patches.
As a practical example, if the sphere is a bacterial cell with radius 1 μm, and for the sake of argument
we take the radius of a transporter on the cell surface to be about 5 Å, then the fraction of the cell
surface that needs to be occupied by transporters to reach the 50% maximum capture rate is
π × 5×10⁻¹⁰ m / (4 × 10⁻⁶ m) = 3.9×10⁻⁴, which is less than 1/10th of 1%!
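These numbers follow directly from the fractional capture formula. Here is a short sketch (variable names are my own; the cell and transporter radii are the values used in the estimate above):

```python
import math

# Capture-rate sketch (assumed variable names; radii from the text's estimate).
a = 1e-4    # cell radius in cm (1 um)
s = 5e-8    # transporter patch radius in cm (5 Angstroms)

def fractional_capture(N):
    # fraction of the fully-absorbing-sphere capture rate achieved by N patches
    return 1.0 / (1.0 + math.pi * a / (N * s))

N50 = math.pi * a / s                                    # patches giving 50% capture
coverage = N50 * math.pi * s**2 / (4 * math.pi * a**2)   # fractional surface coverage
print(N50, fractional_capture(9 * N50), coverage)
```

With these numbers N50% is a few thousand patches, nine times that many gives 90% of the maximal capture rate, and the surface coverage at 50% is only πs/(4a) ≈ 4×10⁻⁴.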
What this exercise shows is that diffusion has peculiar properties, leading here to a phenomenon
where good capture efficiency is achieved with low surface coverage, and not much extra advantage
is gained by increasing the density of transporters significantly – increasing the density by a factor of
9 gets you to 90% efficiency. And there is of course a substantial cost associated with producing large
quantities of cell surface transporters, so the cell reaches a point where there are diminishing returns
for synthesizing more transporters. As an interesting counterpoint, cell surface phenomena that are
not diffusion-limited show strongly contrasting behavior. Photosynthetic antenna proteins and other
proteins that generate energy by light absorption are sometimes very densely packed on cell
surfaces, sometimes to the point of forming two-dimensional protein crystals in the membrane; light
absorption does not obey the same behavior as diffusion.
CHAPTER 14
Sedimentation velocity
Earlier we discussed sedimentation in the context of an equilibrium situation, where the experiment
was run essentially to completion, to a point where no further concentration changes were occurring,
and the external force of centrifugation (or gravity) came into balance with the opposing force of a
concentration gradient. Now we will consider a different scenario. In the limit we might imagine a
situation where the sample is being spun at such a high speed that it would nearly all be driven to the
bottom of the tube if the experiment were continued indefinitely. Now instead of considering the
ultimate equilibrium situation we might ask how fast the macromolecules are moving downward
during the experiment. This brings us to consider a balance between different forces: the downward
external force due to centrifugation and an opposing frictional force limiting the speed of movement.
We saw the frictional force come into play in the previous chapter, in opposition to a concentration
gradient. We can draw a scheme that ties together different kinds of measurements and experiments
we have discussed where different pairs of forces are put into balance as shown below. The arrow
at the lower right is of interest to us now.
Sedimentation coefficient, s
From before, the external force (on a per molecule basis) due to centrifugation is mφω²r, where φ is
the density increment, ω is the angular velocity, m is the mass, and r is the distance from the axis of
rotation. The opposing frictional force is -fv. Setting the sum of forces to zero (i.e. at terminal
velocity) gives
v = mφω²r/f
or, converting to a per mole basis,
v = Mφω²r/(NAf)
How is the velocity v visualized in a velocity sedimentation experiment? If the sample begins with
the macromolecule uniformly distributed (e.g. with the concentration equal everywhere in the tube),
then when the centrifugation begins (at very high speeds as we discussed above), then the top region
of the sample will begin to be depleted of macromolecules. In the limiting scenario, there will be an
effective boundary position; at lower values of r the concentration of the macromolecule would be
nearly zero, as diagrammed here:
The sedimentation velocity, v, is the speed of the boundary, i.e. Δ(boundary position r)/Δt.
Meaning and measurement of the sedimentation coefficient, s
How does the sedimentation velocity v relate to molecular properties? We can see from the equation
above that v is affected both by molecular properties (e.g. M) and by experimental parameters (e.g.
ω). The behavior is clarified by separating the two kinds of variables on different sides of the
equation. Then,
v/(ω²r) = mφ/f
Now we can introduce the sedimentation coefficient s to be equal to those quantities. In other words,
s is obtained experimentally as s = v/(ω²r). And s relates to molecular properties according to
s = Mφ/(NAf) or s = mφ/f
The advantage of separating the variables in this way and assigning a new variable s is clear. If we
increase the angular velocity in the centrifugation experiments, the sedimentation velocity goes up
also, but the sedimentation coefficient is unaffected. This must be the case since the equation above
shows us that s can be written in terms of molecular properties alone without reference to
experimental parameters.
As an aside, you can see that since s = v/(ω²r), we could try to obtain a value for s by measuring the
sedimentation velocity v (i.e. the speed of the boundary) at some instantaneous point in the
experiment and then dividing by ω²r (using the value of r for the boundary at that instant). But this
is a bit sloppy given that v will be dependent on r. Better is to note that since v is defined as dr/dt,
s = (dr/dt)/(ω²r) = (d(ln(r))/dt)/ω². So, measuring the position r of the boundary at a series of time
points during the experiment and plotting them as ln(boundary position) vs t should give a straight
line with slope sω², from which s can be obtained.
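The ln(r) vs t procedure can be sketched in a few lines of code. The rotor speed, starting radius, and "measured" boundary positions below are invented for illustration; the slope of the least-squares line divided by ω² returns s.

```python
import math

# Sketch: recover s from boundary positions r(t) using the fact that
# ln(r) vs t is a straight line with slope s*omega^2.  The rotor speed
# and the "measured" positions are made-up illustrative values.
omega = 2 * math.pi * 50000 / 60          # 50,000 rpm in rad/s
s_true = 4.0e-13                          # 4 S, used to generate fake data

times = [0, 600, 1200, 1800, 2400]        # s
radii = [6.0e-2 * math.exp(s_true * omega**2 * t) for t in times]  # m

# Least-squares slope of ln(r) vs t (closed form, no libraries needed)
n = len(times)
mt = sum(times) / n
ml = sum(math.log(r) for r in radii) / n
slope = sum((t - mt) * (math.log(r) - ml) for t, r in zip(times, radii)) \
        / sum((t - mt) ** 2 for t in times)

s_est = slope / omega**2
print(s_est / 1e-13)   # sedimentation coefficient in Svedbergs, ~4.0
```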
For convenience, a special unit is typically used to express the value of the sedimentation coefficient
s (whose natural units are seconds). The Svedberg, S, is defined as 10⁻¹³ sec. You are likely familiar
with this notation from molecular biology courses. For example, you learned about the 50S large
subunit of the ribosome; its name originates from its sedimentation coefficient – its s value is
50×10⁻¹³ sec.
Relating s to molecular properties
The sedimentation coefficient relates to molecular properties in two ways, through direct
dependence on mass and through the frictional coefficient f, which also depends on size (and
therefore mass). This leads to a somewhat complex dependence of s on mass. Because the frictional
coefficient depends on size via a linear dimension (i.e. radius, R), f goes as the 1/3 power of volume
and mass. So, from the equation above, s = m/f , we should expect s to depend on the 2/3 power of
m. Combining
s = mφ/f
with Stokes' equation for f (assuming a spherical shape), f = 6πηR,
s = mφ/(6πηR)
But R relates to volume and mass according to
V = (4/3)πR³ and R = (3V/(4π))^(1/3)
Relating volume V to mass m by the density ρ of the protein or nucleic acid, m = Vρ, and substituting,
s = mφ/(6πη(3m/(4πρ))^(1/3))
which gives
s = m^(2/3)·(φ/(6πη))·(4πρ/3)^(1/3) or s = (M/NA)^(2/3)·(φ/(6πη))·(4πρ/3)^(1/3)
This can of course be further rearranged to give M in terms of s raised to the 3/2 power.
Do larger molecules sediment faster or slower than smaller molecules of equal density and similar
shape? The centrifugal force on an object is proportional to its mass, but the opposing force is
proportional only to the 1/3 power of the mass, so larger molecules sediment faster, according to the
2/3 power of their mass.
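The m^(2/3) dependence can be made concrete with a short calculation. The sketch below estimates s for a hypothetical compact 50 kDa protein; the values assumed for the density increment φ, the protein density ρ, and the viscosity η of water are rough typical numbers, not values taken from the text.

```python
import math

# Sketch: sedimentation coefficient of a compact spherical particle,
# from s = m*phi/(6*pi*eta*R), with R set by the particle's density.
# phi, rho, and eta are assumed, typical values.
NA = 6.022e23     # Avogadro's number, 1/mol
eta = 1.0e-3      # viscosity of water, Pa*s (assumed)
rho = 1.35e3      # density of a typical protein, kg/m^3 (assumed)
phi = 0.27        # density increment, dimensionless (assumed)

def s_sphere(M):
    """s (in seconds) for a compact sphere of molar mass M in kg/mol."""
    m = M / NA                                    # mass per molecule, kg
    R = (3 * m / (4 * math.pi * rho)) ** (1 / 3)  # radius from m = V*rho
    f = 6 * math.pi * eta * R                     # Stokes friction
    return m * phi / f

print(s_sphere(50.0) / 1e-13)               # a 50 kDa sphere: ~4.9 S
print(s_sphere(8 * 50.0) / s_sphere(50.0))  # 8x the mass -> 8^(2/3) = 4x the s
```

The second line makes the scaling explicit: multiplying the mass by 8 multiplies s by exactly 8^(2/3) = 4 under these assumptions.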
The equation above for relating mass to the sedimentation coefficient relies on the assumption of a
nearly spherical shape, since we were forced to employ Stokes’ equation to relate friction to size and
mass. And as we discussed before, if the molecule of interest is highly non-spherical, we may
misestimate the mass by such an approach. To be specific, if a molecule is highly non-spherical, then,
compared to a spherical molecule of the same mass, it will experience the same centrifugal force but
a greater frictional force, resulting in a smaller value of s. From the equation above you can see that
a lower value of s for the non-spherical molecule would lead to an erroneously low value for the
estimated molecular weight.
Combining s and D to get molecular weight without a spherical assumption
We can free ourselves from the assumptions of a spherical shape if we have measured values for s
and D together. D and s both had relations to the frictional coefficient, but if we have values for s and
D we can cancel f out and avoid the need to obtain an expression for f in terms of a sphere. From our
previous chapter, f = kBT/D, and from above, f = Mφ/(NAs). Setting kBT/D = Mφ/(NAs), we get
M = (RT/φ)·(s/D)
So the molecular weight relates simply to the ratio of s to D, regardless of shape. Furthermore, if we
obtain a valid value for M from s and D together, we have the opportunity to evaluate the shape
properties, for example by checking to see how closely the value for f (which we can calculate directly
from D) matches the value you would expect for f for a molecule with mass M if it was indeed a sphere
(using Stokes’ equation).
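The arithmetic is worth seeing once with numbers. The s, D, and φ values below are textbook-style figures for a hemoglobin-sized protein, chosen only to illustrate the calculation, not taken from the text.

```python
# Shape-independent molecular weight from s and D: M = (RT/phi)*(s/D).
# The numbers are illustrative, hemoglobin-like values (assumed).
R = 8.314        # gas constant, J/(mol*K)
T = 293.0        # temperature, K
s = 4.5e-13      # sedimentation coefficient (4.5 S)
D = 6.9e-11      # diffusion coefficient, m^2/s
phi = 0.25       # density increment (assumed)

M = (R * T / phi) * (s / D)   # kg/mol
print(M)                      # ~64 kg/mol, i.e. about 64 kDa
```

No shape assumption entered anywhere; f canceled out when s and D were combined.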
Earlier we discussed various ways of measuring the diffusion coefficient, but in fact information
about D can typically be obtained from the sedimentation experiment itself. In our initial discussions
we imagined that the boundary in the concentration profile would be perfectly sharp. But in fact
diffusion would be occurring at the same time as sedimentation, thereby causing some spreading out
at the boundary and leading to a concentration profile that is not infinitely sharp and steep at the
boundary. Therefore, a more advanced mathematical treatment can extract information about D
from the shape of the sedimentation profiles.
A summary of molecular weight determination from sedimentation and diffusion measurements
CHAPTER 15
Chemical Reaction Kinetics
In this chapter we discuss the rates of chemical reactions, focusing on the meaning of reaction
velocity, its dependence on the stoichiometric order of a reaction, and on the time-dependent
behavior of reactant and product concentrations.
Reaction velocity, v
If we conceive a reaction generally as
reactants → products
then the reaction velocity v is the frequency (#/time) with which the event described by the reaction
arrow is occurring per unit volume. The units of v are therefore #/(volume·time), which is
concentration per time or M/sec. Consistent with those units, one can see that (as long as other
reactions are not simultaneously producing or consuming the same reactants and/or products) the
reaction velocity is directly reflected in the rate of change of the concentration of reactants and
products. More specifically, note that there is but one velocity associated with the reaction, though
it may be that multiple reactants are being consumed and multiple products are being generated.
The reaction velocity is indicated equally by the rate of change of any of the species involved.
However, the stoichiometric coefficients associated with the reactants and products must be
accounted for carefully. If for instance a chemical species has associated with it a stoichiometry of 2,
then each reaction event corresponds to a consumption or production of two molecules of that
species. Therefore, for the general reaction:
αA + βB + … → γC + δD + …
-d[A]/dt = αv
-d[B]/dt = βv
d[C]/dt = γv
d[D]/dt = δv
and so on for any and all species. Alternatively,
v = -(1/α) d[A]/dt = -(1/β) d[B]/dt = (1/γ) d[C]/dt = (1/δ) d[D]/dt = …
Evidently, if we measure the rate of change of the concentration of some species involved in the
reaction then we have measured the reaction velocity v, assuming we have properly accounted for
the stoichiometry.
Rate laws: how v depends on concentrations
The velocity of a reaction naturally depends on how concentrated the reactants are; if the number of
reactant molecules in a unit volume is vanishingly small then surely we can expect the frequency with
which we observe the molecule in question undergoing reaction events in that volume to also be
effectively zero. If the concentrations are higher, then the reaction velocity will be higher. Besides
the dependence on concentration, different reactions (i.e. involving different chemical species) will
have different reaction velocities according to the likelihood of the underlying chemical events. This
natural likelihood of a reaction to occur is captured by a rate constant k. The combined dependence
of a reaction velocity v on the rate constant k and the concentrations is referred to as a rate law. The
rate law can be complicated for complex reaction schemes, but for reactions that represent simple
individual chemical events, the dependence of v on concentrations can be written by inspection. Such
reactions are sometimes referred to as ‘elementary reactions’, and are to be distinguished from
scenarios where a written reaction actually describes the net stoichiometric result arising from more
than one reaction, e.g. two operating in sequence. For a single elementary step of the form:
A → B
where k is the rate constant, the rate law is
v = k[A]
and [A] is the concentration of species A. Such a reaction is said to be first order in A. For a reaction
of the form
2A → B
v = k[A]²
and the reaction is said to be second order in A. For a reaction of the form
A + B → C
v = k[A][B]
and the reaction is first order in A and first order in B, and so on. The multiplicative or higher order
dependence of the reaction velocity in cases where multiple molecules are reacting at the same time
is a reflection of the joint probability that both reacting molecules are present together, colliding with
each other. Note that in our discussions of kinetics we will use brackets to denote concentrations to
be most consistent with common usage (instead of using Ci as we did in earlier chapters).
Relationship of rate constants to equilibrium constants
For reactions drawn as above, the single forward arrow indicates an irreversible process where the
combined free energies of the reactants are so much higher than for the products, that reaction events
in the reverse direction effectively never occur. Such reactions go to completion, without residual
reactants. In contrast, for reactions where the energetics on the two sides are more nearly balanced,
reaction events can occur in both directions. The velocities of the forward and reverse reactions
depend on the concentrations of the reactants and products, respectively. And when concentrations
are reached where the forward and reverse velocities are equal, then no net conversion is occurring
(though conversions are in fact occurring in both directions). This is what is meant by chemical
equilibrium. This notion gives us an important relationship between rate constants and equilibrium
constants. For the reaction
2A ⇌ B
where k1 is the forward rate constant and k-1 is the reverse rate constant, the forward reaction
velocity would be k1[A]², while the reaction velocity in the reverse direction would be k-1[B]. The
equilibrium condition is where those two velocities are equal, giving
k1[A]² = k-1[B] (at equilibrium)
and
k1/k-1 = [B]/[A]² (at equilibrium)
Evidently, the ratio of rate constants k1/k-1 is equal to the equilibrium constant K. This is a general
result.
Integrating rate laws
For simple reaction schemes it is often possible to integrate the differential equations that come from
the rate law in order to obtain a complete description of how the concentrations of the reactants and
products change over time. We will work out the results for first order and second order reactions:
1st order decay
A → B
In order to get to a differential equation in terms of [A], we combine two points. First is the definition
of the velocity in terms of the rate of change of [A], v = -d[A]/dt. Second is the rate law that describes
the dependence of the velocity on [A], v = k[A]. Together these give,
-d[A]/dt = k[A]
∫ d[A]/[A] = -k ∫ dt
ln[A] - ln[A]0 = -kt
which gives the familiar first order decay equations
ln([A]/[A]0) = -kt
and
[A] = [A]0 e^(-kt)
The behavior of [A] over time is exponential and the behavior of ln [A] is linear with time, with a slope
that gives the rate constant k.
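A quick numerical check of the decay law: integrate d[A]/dt = -k[A] by small Euler steps (the same style of simulation used later in this chapter) and compare with the analytic exponential. The rate constant and time step below are illustrative choices.

```python
import math

# First-order decay, A -> B: compare a small-step Euler integration of
# d[A]/dt = -k[A] with the analytic result [A]0*exp(-kt).
k = 2.0            # rate constant, 1/s (illustrative)
A0 = 1.0           # initial concentration, M
dt = 1e-5          # time step, s

A = A0
for step in range(100000):    # integrate out to t = 1 s
    A += dt * (-k * A)

analytic = A0 * math.exp(-k * 1.0)
print(A, analytic)            # both ~0.1353
```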
Describing decay times for 1st order decay
The time scale of first order decay is often described in terms of a half-life, t1/2. This is the time
required for a reaction to go from some given conditions to 50% completion. A slightly different
parameter, τ, is sometimes used to describe decay times. It gives the time required for a reaction to
go to a degree of completion that is 1/e compared to the initial condition. That is, [A]/[A]0 = 1/e.
The relationship between t1/2 and τ is obtained by comparing
ln(1/2) = -k·t1/2 to ln(1/e) = -1 = -kτ
giving
t1/2 = ln(2)·τ
For the simple first order decay reaction of [A] above,
t1/2 = ln(2)/k
and
τ = 1/k
Note that the physical interpretation of τ is slightly more complex than that of t1/2, but it gives a
simpler algebraic relationship to the rate constant. We will see later that in more complex kinetic
schemes we sometimes get first order equations with more complex expressions in the exponent
term. But τ is always simply related to the exponent (which multiplies time) by a reciprocal
relationship. That is, if
x = x0 e^(-(some expression)·t)
then
τ = 1/(some expression)
and
t1/2 = ln(2)/(some expression)
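The arithmetic connecting k, τ, and t1/2 is worth a one-line check; the rate constant below is an arbitrary illustrative value.

```python
import math

# For a first-order process with k = 0.10 1/s (an assumed value),
# tau and the half-life follow directly from k.
k = 0.10                  # 1/s
tau = 1 / k               # time to reach 1/e of the start: 10 s
t_half = math.log(2) / k  # time to reach 1/2 of the start: ~6.93 s
print(tau, t_half)
print(t_half / tau)       # ratio is ln(2) ~ 0.693, independent of k
```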
Integrated rate law for a 2nd order irreversible reaction
2A → B
Again we combine two expressions involving the velocity v, one that defines v in terms of the rate of
disappearance of the substrate (v = (-1/2)d[A]/dt) and the other a rate law that describes the
dependence of v on the substrate concentration (v = k[A]²). Combined, these give
-d[A]/dt = 2k[A]²
∫ d[A]/[A]² = -2k ∫ dt
-1/[A] + 1/[A]0 = -2kt
1/[A] - 1/[A]0 = 2kt
In this case, a plot of 1/[A] versus time gives a
straight line whose slope relates to the rate constant
k.
Other irreversible reactions of higher order can be integrated easily in a similar fashion.
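The integrated second-order law can be checked against a small Euler simulation of 2A → B; the rate constant and starting concentration below are illustrative.

```python
# Check the integrated 2nd-order law, 1/[A] - 1/[A]0 = 2kt, against a
# small-step Euler integration of d[A]/dt = -2k[A]^2 for 2A -> B.
k = 5.0           # rate constant, 1/(M*s) (illustrative)
A0 = 1.0          # M
dt = 1e-5         # s

A = A0
for step in range(20000):           # integrate out to t = 0.2 s
    A += dt * (-2 * k * A * A)

t = 0.2
predicted = 1 / (1 / A0 + 2 * k * t)  # from the integrated rate law
print(A, predicted)                   # both ~0.333 M
```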
Establishing a rate law from measured reaction velocities
We saw above how different rate laws give a different time evolution for the concentration of
reactants. And whether a reaction follows a first order or second order (or some other) rate law can
therefore be examined by plotting ln[A] or 1/[A] as a function of time and checking to see if the result
is a straight line. A different way of experimentally examining a rate law is by evaluating the
dependence of reaction velocity on concentrations. Measuring initial reaction velocities under
different concentrations makes it possible to determine what exponents are associated with the
reactant concentrations. If a reaction is first order in [A], then v will depend linearly on [A] (i.e.
doubling [A] will double the reaction velocity). Likewise, if a reaction is second order in [A], then
doubling [A] will quadruple the velocity, and so on. Some complicated reaction schemes can show
non-trivial dependence on concentration, even non-integer exponents. As a general approach to
establishing an exponent α, if
v = k[A]^α
then for rate measurements made at two different concentrations,
ln(v2/v1) = α·ln([A]2/[A]1)
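Applying this to a pair of measurements takes one line. The "data" below are invented for a reaction that is second order in A (doubling [A] quadruples v).

```python
import math

# Estimating the reaction order alpha from two measured initial
# velocities: alpha = ln(v2/v1) / ln([A]2/[A]1).  The measurements
# are invented, consistent with second-order behavior.
A1, v1 = 0.010, 4.0e-6   # M, M/s
A2, v2 = 0.020, 1.6e-5   # doubling [A] quadrupled v

alpha = math.log(v2 / v1) / math.log(A2 / A1)
print(alpha)             # 2.0: second order in A
```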
Behavior of more complex reaction schemes
We are very often interested in the behavior of kinetic schemes involving more than one independent
reaction event. Complexity can arise in different forms. The two reactions may effectively operate
in sequence, with the product of the first reaction being the reactant in the second reaction. Or two
reactions may be operating on the same species, giving a scheme that is branched rather than linear.
Regardless of the complexity of the reaction scheme being proposed, setting up the underlying
differential equations is generally straightforward. As an example, the sequential scheme A → B → C
(with rate constant k1 for the first step and k2 for the second) gives the following equations:
d[A]/dt = -k1[A], d[B]/dt = k1[A] – k2[B], d[C]/dt = k2[B]
Note here that the rate of change of [B] involves two terms, the rate at which it is being formed minus
the rate at which it is being consumed. For another example, a branched scheme in which A converts
to B (rate constant k1) and also to C (rate constant k2) gives
d[A]/dt = -k1[A] - k2[A] = -(k1+k2)[A], d[B]/dt = k1[A], d[C]/dt = k2[A]
Steady state assumptions for obtaining simple rate laws for complex reactions
Complex reaction schemes generally have complex behavior, including non-trivial equations for the
time dependence of the reactants and products. And the dependence of the rate of the overall
reaction (i.e. the rate law) can depend on the concentrations of species that do not contribute to the
overall reaction. Sometimes a complete description of the behavior can be obtained by solving the
full system of differential equations, but this may be difficult. However, especially in cases where
sequential reactions are involved, simplified rate laws can often be obtained by assuming steady state
conditions. Steady state refers to conditions where there are intermediate species (which do not
contribute to the overall reaction stoichiometry) whose concentrations have reached a constant
value, at least momentarily. In other words, one can work out a simplified expression for the overall
reaction velocity under conditions where d[Intermediate]/dt = 0. As an example, consider the scheme
A ⇌ B → C (rate constants k1 and k-1 for the reversible first step, k2 for the second)
Here, the intermediate is B; it does not contribute to the overall reaction stoichiometry A → C. First
we write out an expression for the change in [B], based on elementary rate laws for all the steps in
which B is formed or consumed, and then we set that derivative to zero according to the steady state
assumption.
d[B]/dt = k1[A] – k-1[B] –k2[B] = k1[A] – (k-1+k2)[B] = 0
Now we rearrange to get an expression for [B] that we can use for substitution subsequently. At
steady state,
[B] = k1[A]/(k2 + k-1)
Now we can go back to the original reaction scheme and write an expression for the overall velocity.
The overall reaction velocity could be defined as v =d[C]/dt. Then we can write an equation for
d[C]/dt in terms of elementary rate laws. Here, d[C]/dt = k2[B]. Now we substitute for [B] at steady
state from above to get
v = k1k2[A]/(k2 + k-1)
Under this treatment, the 2-step reaction behaves as first order in [A] at steady state. This particular
reaction scheme shows up often, including in treatments of enzyme kinetics, as we will see later.
Numerical computer simulation of more complex reaction schemes
In cases where complete solutions are difficult to obtain by solving differential equations, and where
approximations like steady state are undesirable, one can almost always simulate the behavior of a
complex reaction scheme using simple computer programs. The key is to treat the time derivative of
the concentration of each species as the ratio of a very small change in concentration over a very
small time increment. For example, in the scheme above
Δ[A]/Δt = -k1[A] + k-1[B]
and
Δ[A] = (-k1[A] + k-1[B]) Δt
A related equation can be written for each species. To make the computer simulation go, one simply
assigns starting values to the concentrations of all the species, and then updates the concentrations
of all the species in a series of very small time steps on the basis of equations like the one above.
An example of computer code (in the Python programming language) is shown below for simulating
the behavior of the kinetic scheme above. The initial concentrations are [A] = 1M, [B]=0, [C]=0.
# Set up arrays to hold concentrations for 500 time steps
A = [0.0 for n in range(0, 500)]
B = [0.0 for n in range(0, 500)]
C = [0.0 for n in range(0, 500)]
# Assign initial concentrations
A[0] = 1.
B[0] = 0.
C[0] = 0.
# Assign rate constants for the simulation
k1 = 500.
kminus1 = 400.
k2 = 300.
# Choose a time interval small enough so that concentration
# changes in each step will be small.
timestep = 0.00005
# Set up the loop over time. Here, after 500 steps, the total
# time elapsed would be 500 * 0.00005 = 0.025 seconds.
# Apply kinetic equations to update the concentrations in each step.
# The index for the time step is specified in brackets.
for nt in range(1, 500):
    A[nt] = A[nt-1] + timestep*(B[nt-1]*kminus1 - A[nt-1]*k1)
    B[nt] = B[nt-1] + timestep*(A[nt-1]*k1 - B[nt-1]*kminus1 - \
            B[nt-1]*k2)
    C[nt] = C[nt-1] + timestep*(B[nt-1]*k2)
    print(nt*timestep, A[nt], B[nt], C[nt])
The result of that simulation is shown here.
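A useful habit with simulations of this kind is to verify conservation: for this scheme the total concentration [A]+[B]+[C] should stay constant at every step, and nearly all the material should end up as C by the end of the run. A minimal sketch, re-running the same Euler update with the same rate constants and step size:

```python
# Sanity checks on the Euler simulation: total concentration should
# stay at 1 M throughout, and most material should reach C by the end.
k1, kminus1, k2 = 500., 400., 300.
dt = 0.00005

A, B, C = 1.0, 0.0, 0.0
for nt in range(1, 500):
    A, B, C = (A + dt*(B*kminus1 - A*k1),
               B + dt*(A*k1 - B*kminus1 - B*k2),
               C + dt*(B*k2))
    assert abs(A + B + C - 1.0) < 1e-9   # mass conserved at every step

print(A + B + C)   # 1.0
print(C)           # most of the material has been converted to C
```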
Enzyme kinetics under a steady-state assumption
The following model is often used to treat the kinetics of a simple unimolecular enzyme reaction:
E + S ⇌ ES → E + P (rate constants k1 and k-1 for substrate binding and release, kcat for catalysis)
Here, E is the free or unbound enzyme, S is the free or unbound substrate, ES is the enzyme-substrate
(or Michaelis-Menten) complex between the enzyme and substrate, and P is the product. The ratio
of k1 to k-1 describes how tightly the enzyme binds the substrate, while kcat describes the unimolecular
catalytic rate constant for conversion to P. The steady-state treatment of this enzyme model dates
to the 1920s and is due to Briggs and Haldane. The velocity of the overall reaction is described by
v = d[P]/dt,
and according to the rate law for this elementary step, v = kcat[ES]. But [ES] is an intermediate, and
to obtain an equation for v in terms of species that contribute to the overall stoichiometry of the
reaction, we need to replace [ES]. If we adopt a steady state assumption we can obtain an expression
for [ES]. Taking account of all the steps in which [ES] is formed or consumed, and then setting this to
0,
d[ES]/dt = 0 = k1[E][S] – (k-1 + kcat)[ES]
which gives
[ES] = k1[E][S]/(k-1 + kcat)
Then substitution gives
v = kcat k1[E][S]/(k-1 + kcat)
This equation is valid but not very insightful in the sense that it describes the reaction velocity in
terms of the free enzyme concentration; in an experimental set-up one typically has control over the
total enzyme concentration but not the free enzyme concentration, which is clearly a function of how
much substrate is present. To gain more insight, the standard approach is to recast the kinetic
equations in terms of the total enzyme concentration and in terms of the ratio of the reaction velocity
to its maximum possible value (i.e. when all the enzyme is in the ES form, so that [ES] = [E]total).
The maximum velocity is kcat times the maximum possible value for [ES], giving Vmax = kcat[E]total =
kcat([ES]+[E]). Then,
v/Vmax = kcat[ES]/(kcat([E]+[ES])) = [ES]/([ES]+[E])
This is sensible. It simply states that the velocity in terms relative to the maximum is given by the
fraction of the enzyme that is in the [ES] form. To simplify the expression further we can divide the
top and bottom by [ES] to give
v/Vmax = 1/(1+ [E]/[ES])
Then we can take the previous equation, [ES] = k1[E][S]/(k-1 + kcat), and rearrange it to get an
expression for [E]/[ES] = (k-1 + kcat)/(k1[S]), which we can substitute in the equation above to give
v/Vmax = 1/(1 + (k-1 + kcat)/(k1[S]))
Multiplying the top and bottom by [S] gives
v/Vmax = [S]/([S] + (k-1 + kcat)/(k1))
This has the form of the familiar Michaelis-Menten equation
v/Vmax = [S]/([S] + KM)
where the Michaelis-Menten constant KM can be seen to be (k-1 + kcat)/(k1) for this kinetic model. The
equation above can also be converted from fractional velocity to v to give
v = kcat[E]total [S]/([S] + KM)
From the form of this equation you can see that the behavior is hyperbolic, with v approaching Vmax
asymptotically as [S] gets much higher than KM, and v/Vmax =1/2 at [S]=KM.
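The hyperbolic behavior is easy to see numerically; the KM below is an arbitrary illustrative value, and the calculation depends only on the ratio [S]/KM.

```python
# The Michaelis-Menten dependence v/Vmax = [S]/([S] + KM), evaluated at
# a few substrate concentrations.  KM here is an illustrative value.
KM = 1.0e-5   # M

def fractional_velocity(S):
    """v/Vmax at substrate concentration S (same units as KM)."""
    return S / (S + KM)

print(fractional_velocity(KM))        # 0.5 at [S] = KM
print(fractional_velocity(9 * KM))    # 0.9
print(fractional_velocity(100 * KM))  # ~0.99, approaching Vmax
```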
Relaxation kinetics: how systems approach equilibrium
Fast reactions are difficult to study experimentally. A rapid reaction may proceed to completion so
rapidly once it is initiated that measuring concentration changes is problematic. For bi-molecular
reactions, the reaction may be faster than the time required to mix the reactants prior to
measurement. Problems of that type can sometimes be mitigated with instruments designed to
achieve rapid mixing. So-called stopped-flow or continuous-flow instruments deliver reagents from
two separate syringes into a small mixing chamber where spectroscopic readings can be taken
immediately. The delivery of reagents occurs in a single bolus after which the delivery of reactants
is stopped while the time course of the reaction is monitored. In a variation referred to as
continuous-flow, delivery of reactants continues and the reacting mixture flows down a capillary
tube. In this set-up, distance along the capillary corresponds directly to time elapsed since the
reactants encountered each other, so that concentration measurements at different positions give an
effective time course. The disadvantage of continuous-flow methods is that large amounts of material
may be required. On the other hand, microfabrication techniques for making very small systems for
fluid handling are now fairly commonplace. Microfluidic devices can be custom designed and
fabricated from transparent polymers (typically PDMS), with very many chambers and transport
capillaries with flow controlled by computers and pressure sensitive valves. This has given rise to
the notion of a ‘lab-on-a-chip’, with thousands of reactions being monitored in a piece of polymer the
size of a credit card. These kinds of devices make it possible to study many reactions and reactant
conditions with very little loss of material.
Another category of kinetic analysis aims to study how rapidly systems approach (or ‘relax’ towards)
equilibrium after they have been (nearly instantaneously) perturbed from equilibrium. So-called
relaxation methods circumvent the mixing problem in the sense that the system under study is
already mixed. How can a system in which all the reacting substrates and products are present and
equilibrate rapidly be brought to a point where it is not at equilibrium so that the speed of approach
to equilibrium can be studied? The T-jump (for temperature jump) method was developed in the
1950’s by Manfred Eigen to study fast reactions and their relaxation back to equilibrium. The idea is
that if energy is rapidly delivered to a solution containing a reactant and product at equilibrium, for
example by discharging an electrical capacitor (or in later developments using a laser pulse), then
the temperature of the system can be raised nearly instantaneously. Now, recalling the van’t Hoff
equation, if ΔH for the reaction under study is non-zero, then the equilibrium constant will be
different at the new temperature. So by increasing the temperature suddenly, the system has
effectively been perturbed from its previous equilibrium, not by changing the concentrations of
reactants and products, but by changing the equilibrium position; the system is maintained at the
new temperature while the system approaches the new equilibrium concentrations by conversion of
reactant to product or vice-versa. How fast the system approaches equilibrium clearly depends on
the forward and backward rate constants of the reaction. The mathematics for how systems
approach equilibrium reveals some general principles.
The simplest system to consider is A interconverting with B (A ⇌ B, with forward and reverse rate
constants k1 and k-1). First we note that because there is just one conversion in this system, it must
be possible, given any concentration values for A and B, to describe how far the system is from
equilibrium with a single concentration variable. Call this distance from equilibrium x. If we
introduce a shorthand notation of Ā to represent the equilibrium concentration of A (at the new
temperature) and similarly B̄ for B, then we can relate the concentrations of A and B to their
equilibrium values plus or minus x, by [A] = Ā + x and [B] = B̄ - x.
Now we can examine the approach to equilibrium by writing an equation for the time dependence of
x. First, we note that d[A]/dt = d(Ā+x)/dt = dx/dt. Then we can write an expression for d[A]/dt as
-k1[A] + k-1[B] = -k1(Ā+x) + k-1(B̄-x) = k-1B̄ - k1Ā - x(k1 + k-1). The term k-1B̄ - k1Ā can be seen to be
equal to zero because B̄/Ā = K = k1/k-1. Dropping those terms gives
dx/dt = –(k1+ k-1) x
Evidently, x (the distance from equilibrium) follows first order kinetics. Skipping the familiar details
for handling a first order differential equation,
x = x0 e^(-(k1+k-1)t)
and
ln(x/x0) = -(k1+k-1)t
We can further convert these to the general forms
x = x0 e^(-t/τ)
and
ln(x/x0) = -t/τ
where in this case τ = 1/(k1+k-1). Assuming one can measure the concentration of A or B as a
function of time, then x as a function of time is known, since x = [A] - Ā = B̄ - [B]. This allows
measurement of τ (e.g. from the reciprocal of the slope of ln(x) vs t), so that the value of (k1+k-1)
is obtained. If the equilibrium constant K for the reaction is known also, then k1 and k-1 can both be
obtained from the values of τ and K.
1/τ = k1 + k-1 and k1 = K·k-1, so 1/τ = K·k-1 + k-1 = (K+1)·k-1
k-1 = 1/(τ(K+1)) and k1 = K/(τ(K+1))
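The extraction of both rate constants from τ and K is just algebra, but a round-trip check makes it concrete. Below we fabricate τ and K from known rate constants (illustrative values) and then recover them.

```python
# Recovering k1 and k-1 for A <=> B from a relaxation time tau and the
# equilibrium constant K.  The "true" rate constants are illustrative.
k1_true, kminus1_true = 300.0, 100.0   # 1/s

tau = 1.0 / (k1_true + kminus1_true)   # what the ln(x) vs t slope yields
K = k1_true / kminus1_true             # from equilibrium concentrations

kminus1 = 1.0 / (tau * (K + 1.0))
k1 = K / (tau * (K + 1.0))
print(k1, kminus1)                     # 300.0 100.0 recovered
```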
Higher order reactions approaching equilibrium
It is not surprising to see that the first order reaction above approaches equilibrium in a first order
fashion. However, we can show that more complex reactions also approach equilibrium in a first
order fashion as they come close to equilibrium. Consider this second order reversible reaction:

A + B ⇌ C (forward rate constant k1, reverse rate constant k-1)

Again, there is just one transformation, so the distance from equilibrium can be described by a single
variable x, and the concentrations at any point in time can be expressed in terms of x and the eventual
equilibrium concentrations. Following the same approach as before, d[A]/dt = d(Ā + x)/dt = dx/dt.
And d[A]/dt = −k1(Ā + x)(B̄ + x) + k-1(C̄ − x) = k-1C̄ − k1ĀB̄ − x(k1(Ā + B̄) + k-1) − k1x². The first two terms,
k-1C̄ − k1ĀB̄, cancel to zero, and if we are close enough to equilibrium then x will be small and we can
neglect the x² term. Then,

dx/dt = −(k1(Ā + B̄) + k-1)·x
Therefore, the distance from equilibrium x shows first order behavior close to equilibrium, with

τ = 1/(k1(Ā + B̄) + k-1)
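To make the result concrete, here is a sketch (with hypothetical rate constants and concentrations, not values from the text) that finds the equilibrium concentrations for A + B ⇌ C by solving the mass-action quadratic, then evaluates the relaxation time from the expression above:

```python
import math

# Hypothetical parameters for A + B <=> C
k1 = 1.0e5                 # M^-1 s^-1, forward rate constant
k_rev = 0.1                # s^-1, reverse rate constant
A0, B0 = 1.0e-4, 2.0e-4    # M, total concentrations (no C initially)

# At equilibrium k1*(A0 - c)*(B0 - c) = k_rev*c, a quadratic in c = [C]eq
a = k1
b = -(k1 * (A0 + B0) + k_rev)
c0 = k1 * A0 * B0
c_eq = (-b - math.sqrt(b * b - 4 * a * c0)) / (2 * a)  # smaller, physical root
A_eq, B_eq = A0 - c_eq, B0 - c_eq

# Close to equilibrium, x relaxes exponentially with this time constant
tau = 1.0 / (k1 * (A_eq + B_eq) + k_rev)
print(c_eq, tau)
```

The smaller root of the quadratic is the physical one, since the larger root would exceed the limiting reactant concentration.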
Kinetics from single molecule studies
Recent developments in instrumentation have made it possible to perform a variety of
measurements on single molecules. The kinetics of individual molecules undergoing chemical
reactions or conformational transitions can be analyzed, but when working with individual
molecules we have to look at things in a way that does not involve concentrations. For a unimolecular
event, we can get a sense for the rate constant by looking at how long the molecule persists in its
current state before undergoing a reaction. Think of this as a waiting time before a reaction event or
conformational change occurs. Reaction events are always stochastic (i.e. having a random
character), but if we measure the waiting time for several independent reaction events we should be
able to relate that to an underlying rate constant; the average waiting time should be shorter for a
process with a higher rate constant. Consider the irreversible conversion of A to B with a rate
constant k. On average, how long should we expect to wait before any given molecule of A converts
to B? We can work out the relationship by starting with the equation for treating the reaction in bulk:
A/A0 = exp(−t/τ). In this case τ = 1/k, but we will keep the equation in terms of τ for generality. One
way to look at A/A0 is to see it as the probability that any given molecule of A has not reacted before
time t. From there we can see that the probability that a molecule of A will react precisely at time t
is the derivative of that expression with respect to t. Differentiating, and correcting for the negative
sign, the probability that molecule A reacts precisely at time t is (1/τ)·exp(−t/τ). Then, to get the
average time at which a molecule of A reacts, which has the same meaning as the waiting time, we
need to get the average value of t by weighting all possible values of t by the probability of reaction
at time t. We get this by multiplying t by the probability of reaction at time t and integrating from t=0
to infinity. From ∫₀^∞ t·(1/τ)·exp(−t/τ) dt (which requires integration by parts), we get
⟨waiting time⟩ = τ. This simple result makes sense since the decay or relaxation time τ
is a general description of the time scale of a first order reaction.

The result above means that if we evaluate how long it takes a single molecule to undergo a transition
(preferably making the time measurement several times), then we have effectively measured τ. And
here, k = 1/τ. The figures below illustrate three different kinds of experiments where single molecule
studies have been used to measure the rates of conformational conversions. The first is an example
of a voltage measurement across a cell using the patch-clamp method, where the voltage depends on
whether an ion channel is in the open or closed conformation. The second example illustrates a
spectroscopic measurement where the output signal depends on the conformation of a fluorescently
labeled ribosome, which is interconverting between two states. The third example illustrates a
reversible winding-unwinding transition in a single DNA molecule whose ends are being pulled apart
gently. In all three cases you can see how the data could be interpreted in terms of waiting times
between transitions. In fact, you can see that it should be possible to get the rate constants for both
the forward and reverse transitions by measuring the waiting times in both states. The ratio of those
would be the equilibrium constant, and as expected this matches the ratio between the average time
the molecule spends in the two conformations.
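The waiting-time logic can be checked with a small simulation. The sketch below uses hypothetical rate constants (not values from any of the experiments described above) to draw exponentially distributed dwell times for a two-state molecule, then recovers the rate constants from the mean dwell times and the equilibrium constant from their ratio:

```python
import random

random.seed(1)

# Hypothetical rate constants for leaving each state (s^-1)
k_open_to_closed = 2.0
k_closed_to_open = 0.5

n = 50000
# Dwell (waiting) times in each state are exponentially distributed,
# with mean 1/k for the rate constant of leaving that state
open_dwells = [random.expovariate(k_open_to_closed) for _ in range(n)]
closed_dwells = [random.expovariate(k_closed_to_open) for _ in range(n)]

mean_open = sum(open_dwells) / n
mean_closed = sum(closed_dwells) / n

# <waiting time> = tau, so k = 1/tau for each transition
k_oc_est = 1.0 / mean_open
k_co_est = 1.0 / mean_closed

# Ratio of mean dwell times estimates K = [closed]eq/[open]eq = k_oc/k_co
K_est = mean_closed / mean_open
print(k_oc_est, k_co_est, K_est)   # close to 2.0, 0.5, and 4.0
```

With tens of thousands of simulated events the estimates converge to within about a percent; a real single-molecule trace with far fewer events would scatter more.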
CHAPTER 16
Kinetic Theories and Enzyme Catalysis
We have discussed the reaction velocity and rate laws, but so far we have not said anything about
what determines the rate constant, k. What makes some reactions intrinsically fast and some slow?
Given the variety of reactions that occur in nature – with vast differences in speed, number of
reactants involved, types of bonds formed, etc. – it is not surprising that different models have been
developed to explain the mechanisms of chemical reactions and their rate constants. Two widely
discussed models are due to Arrhenius and Eyring.
The Arrhenius equation
The Arrhenius equation is often used to discuss reaction rates in terms of molecular collisions.
According to the Arrhenius equation, a rate constant k is determined by
k = A·e^(−Ea/RT)
A is a ‘frequency factor’ and Ea is an activation energy. The frequency of collisions in a reaction is
clearly dependent on concentrations, and that dependence is already built into the equation for
reaction velocity; e.g. for X + Y → Z, v=k[X][Y]. The frequency factor in the equation for k therefore
embodies other phenomena, such as the dependence of molecular velocities and consequently
collision rates on temperature, and the dependence of reaction probability on the orientation of the
colliding molecules.
The activation energy, Ea, describes a lower bound for the energy that reactants must have if reaction
is to occur. Why does Ea enter the equation for k as an exponential term? This follows directly from
the Boltzmann distribution. If we express the number of molecules N(E) that have energy E according
to the Boltzmann distribution (N(E) ∝ exp(−E/RT)), we can evaluate the fraction of molecules having
energy at least as high as some fixed energy value Ea by taking the ratio of the area that falls under
the curve and has E greater than or equal to Ea, divided by the entire area under the curve.
∫_Ea^∞ e^(−E/RT) dE / ∫_0^∞ e^(−E/RT) dE = [−RT·e^(−E/RT)]_Ea^∞ / [−RT·e^(−E/RT)]_0^∞ = e^(−Ea/RT) / 1 = e^(−Ea/RT)
This explains the exponential term in
the Arrhenius equation. A key element
of the Arrhenius equation is that the
rate constant depends strongly on the
height of an energy barrier. It also
depends on temperature. In fact the
dependence of k on T can be used to
evaluate the activation energy in the
Arrhenius model. The frequency factor
introduces some dependence of k on
temperature vis-à-vis molecular
velocities, but the main dependence of k
on T is through the exponential term.
d(ln(k))/dT ≅ d(−Ea/RT)/dT = Ea/RT²

or

d(ln(k))/d(1/T) ≅ d(−Ea/RT)/d(1/T) = (Ea/RT²)/(−1/T²) = −Ea/R
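The second relation suggests a simple numeric check: generate rate constants at two temperatures from an assumed activation energy (the Ea and frequency factor below are hypothetical) and recover Ea from the slope of ln(k) vs 1/T:

```python
import math

R = 8.314        # J/(mol K), gas constant
A = 1.0e12       # s^-1, assumed frequency factor (taken as T-independent)
Ea = 50000.0     # J/mol, assumed activation energy

def arrhenius_k(T):
    """Arrhenius rate constant k = A*exp(-Ea/(R*T))."""
    return A * math.exp(-Ea / (R * T))

T1, T2 = 298.0, 310.0
# Slope of ln(k) vs 1/T is -Ea/R
slope = (math.log(arrhenius_k(T2)) - math.log(arrhenius_k(T1))) / (1/T2 - 1/T1)
Ea_recovered = -R * slope
print(Ea_recovered)   # recovers 50000 J/mol
```

The recovery is exact here only because the frequency factor was held constant; for real data, the mild temperature dependence of A introduces a small systematic deviation.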
Eyring transition state theory
Eyring transition state theory provides a slightly different way of looking at things that is more
explicit about the occurrence of high energy species during a single reaction event. In the Eyring
model, a single reaction step, for example

A + B → C

is reimagined in terms of two steps:

A + B ⇌ AB‡ → C

In the first step, an unstable high energy species referred to as
the transition state is formed. The transition state breaks down to product in the second step. The
‘double-dagger’ symbol (‡) indicates the transition state.
The rate constant for breakdown of the maximally unstable transition state is approximated to be the
frequency of molecular vibrations, which from quantum mechanics is on the order of kBT/h, where
h is Planck’s constant. With that substitution, the velocity of the reaction scheme above would be
v=d[C]/dt=(kBT/h)[AB‡]. As we did in our earlier treatments of multistep reactions, we need to make
some assumption if we want to express the velocity in terms of reactants that contribute to the
stoichiometry. Here, if we assume that first step describing formation of the transition state is at
equilibrium, then k+‡/k–‡=K‡=[AB‡]/([A][B]), so [AB‡]= K‡[A][B]. Making that substitution, the
velocity of the reaction would be v=(kBT/h) K‡ [A][B]. Now if we compare this expression for v to the
simple rate law we would write for a single elementary reaction step, namely v=k[A][B], we can see
by matching up terms that the Eyring model gives
k=(kBT/h) K‡
as the expression for the rate constant of the reaction. In this model, the rate constant is determined
largely by the equilibrium constant K‡ for forming the transition state. We can also write K‡ in terms
of the free energy for reaching the transition state, ΔG‡: K‡ = exp(−ΔG‡/RT). That substitution would
give

k = (kBT/h)·exp(−ΔG‡/RT)
This result differs in detail from the Arrhenius equation, but the similarity in terms of an exponential
dependence on an energy barrier is clear. We can look at the temperature dependence of ln(k) for
the Eyring equation in the same way as we did for the Arrhenius equation. The multiplicative factor
at the beginning of the expression introduces a minor dependence on temperature, which we will set
aside in order to look at the main dependence.
d(ln(k))/dT ≅ d(−ΔG‡/RT)/dT = d(−ΔH‡/RT + ΔS‡/R)/dT = ΔH‡/RT²
Comparing to our earlier result with the Arrhenius equation, we can see that the activation energy in
the Arrhenius equation relates closely to the transition state enthalpy in the Eyring model.
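A minimal numeric sketch of the Eyring expression follows; the 60 kJ/mol barrier is a hypothetical value, chosen only to show the scale of the numbers:

```python
import math

kB = 1.380649e-23    # J/K, Boltzmann constant
h = 6.62607015e-34   # J s, Planck constant
R = 8.314            # J/(mol K)
T = 298.0

def eyring_k(dG_ddagger):
    """Eyring rate constant k = (kB*T/h) * exp(-dG_ddagger/(R*T))."""
    return (kB * T / h) * math.exp(-dG_ddagger / (R * T))

k = eyring_k(60000.0)   # hypothetical 60 kJ/mol barrier
print(k)                # rate constant in s^-1

# Lowering the barrier by 10RT speeds the reaction by e^10 (~22,000)
speedup = eyring_k(60000.0 - 10 * R * T) / eyring_k(60000.0)
print(speedup)
```

Note that the prefactor kBT/h at 298 K is about 6×10¹² s⁻¹, so even modest barriers reduce the rate constant by many orders of magnitude.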
Catalysis by lowering the transition state energy
The Eyring transition state model provides a way to look at catalysis in terms of transition state
energies. Consider the reaction of a substrate to form a product in either an uncatalyzed reaction or
a catalyzed reaction. Let the rate constant for the uncatalyzed reaction be kuncat and the rate constant
for the catalyzed reaction be kcat. We can draw the two reactions on the same energy diagram and
consider the effect of lowering the transition state energy in the case of the catalyzed reaction. From
the Eyring equation for the rate constant we can write out the ratio of the two rate constants.
kcat/kuncat = [(kBT/h)·K‡cat] / [(kBT/h)·K‡uncat] = K‡cat/K‡uncat = e^(−ΔG‡cat/RT) / e^(−ΔG‡uncat/RT) = e^(−(ΔG‡cat − ΔG‡uncat)/RT)
In other words, if a catalyst lowers the
transition state energy for a reaction by an
energy that amounts to 10RT, then the
reaction will be sped up by a factor of e10,
which is about 22,000.
But how is it that a catalyst lowers the
transition state energy of a reaction? This
question was considered in the context of
enzyme catalysis as early as 1948 by Linus
Pauling. His description of how enzymes
must operate (which came more than a
decade before the atomic structures were
known for any proteins or enzymes) was
extraordinarily prescient. According to
Pauling:
I believe that … the surface configuration of the enzyme is … complementary to an unstable
molecule with only transient existence – namely the “activated complex” for the reaction that
is catalyzed by the enzyme. The mode of action of an enzyme would then be the following: the
enzyme would show a small power of attraction for the substrate molecule or molecules,
which would become attached to it in its active surface region. This substrate molecule, or
these molecules, would then be strained by the force of attraction to the enzyme, which would
tend to deform it into the configuration of the activated complex, for which the power of
attraction by the enzyme is the greatest. The activated complex would then, under the
influence of ordinary thermal agitation, either reassume the configuration corresponding to
the reactants, or assume the configuration corresponding to the products. The assumption
made above that the enzyme has a configuration complementary to the activated complex, and
accordingly has the strongest power of attraction for the activated complex, means that the
activation energy for the reaction is less in the presence of the enzyme than in its absence, and
accordingly, that the reaction would be speeded up by the enzyme.
Further insight can be added by drawing
a kinetic diagram that relates the binding
events in the presence of an enzyme to
the formation of the transition state. The
reactions across the top describe
reaction in the absence of the enzyme
while the reactions across the bottom
describe reaction in the presence of the
enzyme. Following our previous
equations from the Eyring theory, the
ratio between the rate constants in those
two cases would be
kcat/kuncat = K‡ES / K‡S
By providing a binding surface that is complementary to the transition state form of the substrate, the
equilibrium constant for reaching the transition state is increased by mass action, and according to
the equation above this speeds up the reaction. The situation can also be viewed in terms of binding
affinities of the enzyme for S compared to S‡. Those steps are described by the vertical reactions. By
completing the thermodynamic cycle in the figure, we know that K‡S·Kbinding S‡ = Kbinding S·K‡ES. The
ratio above for how much a reaction is sped up is then
kcat/kuncat = K‡ES/K‡S = Kbinding S‡ / Kbinding S
In this view, an enzyme speeds up its reaction by binding exceptionally tightly to the transition state
form of the substrate; that is what lowers the free energy of the transition state.
Practical consequences of enzymes binding tightly to the transition state
Understanding that an enzyme binds extremely tightly to the transition state form of its substrate
has led to a number of important practical scientific developments.
Transition state analogues as enzyme inhibitors
Designing molecules to inhibit key enzymes is a major effort in pharmaceutical research. Important
enzyme targets are too numerous to list, but they include enzymes from pathogenic bacteria and
viruses as well as human enzymes involved in disease-related pathways, such as those that regulate
blood pressure and inflammation. If an enzyme speeds up a reaction by a factor of a thousand, then
our reasoning above indicates that the enzyme binds the transition state form of the substrate a
thousand times more tightly than it binds the substrate. So, a drug molecule that looks like the
transition state form of the substrate will bind tightly to the enzyme and act as an inhibitor. The main
challenge of course is that the transition state is entirely unstable. The goal then is to come up with
a compound – a transition state analogue – that looks as much as possible like the transition state,
but yet is stable and can be synthesized (cheaply). This can be a difficult proposition.
Creating new enzymes from a natural antibody repertoire
The concept of catalytic antibodies was first proposed by chemist William Jencks in 1969 and
reduced to practice by Richard Lerner and Peter Schultz beginning in the 1980’s.
The goal was to create novel enzymes that would catalyze useful chemical reactions, including types
of reactions that no natural enzymes had evolved to carry out. The idea relies on the spectacularly
large diversity of antibodies that can be generated by the mammalian immune system. If an animal
has the genetic capacity to generate 10¹² different antibody molecules, surely some of them should
have a tight binding affinity for any imaginable chemical entity, including a transition state for a
reaction one might want to catalyze. According to the Eyring theory and the logic articulated by
Pauling and Jencks, if you can find an antibody sequence with a high affinity for the transition state
of a desirable reaction, then you have found an enzyme for that reaction. The work required to
identify an antibody with the desired property is challenging. In order to induce production of
antibodies that might bind tightly to the transition state, the animal must be inoculated with a
transition state analogue for the reaction, and the same difficulty noted above regarding transition
state analogues must be overcome. Several studies have succeeded in finding antibodies that exhibit
catalytic activity for a desired reaction, but the rates of acceleration have generally not been very
high.
Computational enzyme design
There is much current interest in the idea of using sophisticated computer programs to design the
amino acid sequence of a protein that will catalyze a desired reaction. Rather than designing a novel
protein from scratch, the most feasible approach is to take a natural protein that has a surface cleft
suitable for binding a compound of about the right size, and modify the amino acid sequence mainly
within the binding site cleft. The potential power of this approach is very high. In contrast to the
catalytic antibody approach, there is no need to synthesize a transition state analog. Instead, one
requires an accurate model (i.e. detailed atomic coordinates) for what the transition state is likely to
look like. Modern computer programs are capable of producing reasonable models of transition
states. The most challenging element is designing amino acid changes into a protein in such a way
that the transition state would be tightly bound. One difficult issue concerns the calculation of free
energies for large systems like proteins and their complexes. Even the aqueous solvent is important
to consider given the contribution of hydrophobic effects and water structure in general to the
energetics. Beyond the still unsolved problem of accurate energy predictions, changing the amino
acid sequence of a natural protein very often causes unforeseen and unpredictable effects, including
loss of stability and aggregation. In many cases, changing the amino acid sequence of a protein may
make alternate (non-native) configurations of the protein more stable than the intended structure.
The ability to consider and avoid all the possible alternate structures a modified protein might adopt
is well beyond the current capacity of computers and protein modeling software. Nonetheless, there
have been a few exciting successes in designing new enzyme activities computationally. As with the
catalytic antibodies however, catalytic rates have so far been fairly modest. Further advances along
this line are very likely in the future as computer programs continue to improve.
Kinetic parameters of natural enzymes
Natural enzymes have evolved over billions of years to speed up the reactions they catalyze. How
well do they perform? And could they be better? These are thorny questions. Part of the complexity
concerns the saturation behavior of enzyme kinetics (v=[Etotal] kcat [S]/([S]+KM)). An enzyme that has
a very high kcat may not be so great if it doesn’t bind its substrate very well (i.e. if KM is high). Of
course the best thing would be to have a very high kcat and a very low KM. But there may be tradeoffs
in the ability of any given enzyme to optimize both parameters. In view of this, the ratio kcat/KM is
often discussed as a general measure of the efficiency of an enzyme, essentially a reflection of the
joint value of having high kcat and low KM. In terms of a typical hyperbolic graph of enzyme activity
vs substrate concentration, kcat determines the maximum velocity at any substrate concentration,
while kcat/KM is the slope of the velocity curve (normalized for total enzyme concentration) in its
linear region well below saturation. This can be seen by evaluating the standard Michaelis-Menten
velocity equation (above) at [S] << KM. There, v/[Etotal] ≅ (kcat/KM)[S], which confirms the statement
about the slope of the velocity curve. And v ≅ (kcat/KM)[Etotal][S], which shows that kcat/KM takes the
form of a bimolecular rate constant – recall that for A+B→C, v=k[A][B].
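A quick sketch of this low-[S] limit, using the Michaelis-Menten equation with hypothetical parameter values:

```python
# Michaelis-Menten velocity and its low-[S] linear limit.
# Parameter values are hypothetical illustrations.
kcat = 10.0       # s^-1
KM = 1.0e-4       # M
E_total = 1.0e-8  # M, total enzyme concentration

def v(S):
    """Michaelis-Menten velocity v = [Etotal]*kcat*[S]/([S]+KM)."""
    return E_total * kcat * S / (S + KM)

S_low = 1.0e-7    # M, well below KM
v_full = v(S_low)
v_linear = (kcat / KM) * E_total * S_low   # v ~ (kcat/KM)[Etotal][S]
print(v_full, v_linear)   # nearly equal in the linear region
```

At [S] = KM/1000, the full and linear expressions agree to about 0.1%, while at saturating [S] the full expression instead approaches Vmax = kcat·[Etotal].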
What are the values for kcat and KM for natural enzymes? These values show an astonishing range of
variation from enzyme to enzyme. Much of that variation reflects differences in the kinds of
substrates involved and the kinds of chemical rearrangements that take place. Subtler effects relate
to cellular conditions. For example, there is no need for an enzyme to evolve an incredibly tight
binding constant (low KM), which might even come at the expense of a lower kcat, if the substrate in
question exists at high concentration in the cell; operating under highly saturated conditions is
generally not an advantageous strategy. Conversely, if the KM is too high, then the enzyme will be
very poorly occupied; synthesizing idle enzyme molecules is an expensive burden for the cell. A
general finding is that the KM values exhibited by natural enzymes tend to be roughly in the same
range as the natural cellular concentration of the substrate or substrates on which they operate.
According to a recent survey of published enzyme kinetic parameters in the literature, median kcat
values for natural enzymes are on the order of 10 sec⁻¹ overall, and about 10 times faster (100 sec⁻¹)
for enzymes that operate in central metabolism, where high flux is important. The median value for
KM in natural enzymes is around 100 μM, or 10⁻⁴ M. The median value for kcat/KM is on the order of
10⁵ M⁻¹sec⁻¹.
As a group, natural enzymes vary widely from these median values. What about limiting cases? Is
there a maximum? Reactions that are bimolecular face an upper limit governed by diffusion. Even if
an enzyme could bind a substrate infinitely tightly and catalyze its conversion to product infinitely
fast, the rate of the reaction would be limited by how fast the two molecules can encounter each other
in solution owing to the limits of diffusion. Depending on molecular sizes (which govern diffusion
coefficients), the upper bound for kcat/KM in diffusion-limited bimolecular enzyme-substrate
encounters is in the range 10⁸–10⁹ M⁻¹sec⁻¹. This limiting value comes from analyses beyond those
used to develop our standard equations for reaction velocities, which ignore the role of diffusion.
Very few enzymes operate near the diffusion-limiting value of kcat/KM, but a few do. Superoxide
dismutase (SOD) and triose phosphate isomerase are two well-studied examples.
CHAPTER 17
Introduction to Biochemical Spectroscopy
Energy transitions
We understand from quantum mechanics that molecules
can exist only in discrete energy states, and transitions
between one energy state and another can be driven by
absorption or emission of electromagnetic radiation (i.e.
photons) if the energy of the photon matches the energy
difference of the transition. The relationship between the
frequency or wavelength of the radiation and energy is
E = hν = hc/λ.
You may remember from general chemistry that very
simple molecules like single atoms typically show very sharp absorption and emission bands. They
undergo transitions at only very narrow wavelengths – recall the Rydberg series. The discreteness
of their spectral properties reflects the simplicity of their allowable energy states (i.e. electronic
states of hydrogen-like atomic orbitals).
In contrast, complex molecules
have complex spectra. The
presence of multiple atoms in a
molecule introduces a dependence
of energy on nuclear positions.
Nuclear motions give rise to
vibrational energy states. The
energy differences between
vibrational states are generally
much smaller than those between
electronic states. The idea that
vibrational transitions are smaller in energy and essentially separable from electronic transitions
gives a picture where more finely spaced vibrational states can be superimposed on individual
electronic states, as shown. And even more finely spaced rotational transitions exist within those
states. The much greater complexity of the energy profile for complex molecules introduces the
possibility of very many transitions with closely spaced energies. As a result, absorption and
emission spectra for larger molecules are complex and more continuous in nature rather than
discrete.
An examination of typical energy magnitudes for electronic and vibrational transitions is instructive.
Electronic transitions are typically the subject of spectroscopy in the UV and visible range of the
electromagnetic spectrum. Consider then the energy associated with a wavelength of 400nm in the
violet region of the visible spectrum: E = hν = hc/λ ≈ 5.0×10⁻¹⁹ J. Compared to kBT ≈ 4.1×10⁻²¹ J at room temperature, this is about 120 kBT.
According to the Boltzmann equation, the probability of a molecule residing in the excited electronic
state rather than the ground electronic state is essentially zero. We can repeat the calculation for a
typical vibrational transition; these typically occur in the infrared (IR) region of the electromagnetic
spectrum. Consider a carbonyl stretch, for which λ is approximately 1.9 μm. The corresponding
energy is 1.04×10⁻¹⁹ J. This is smaller than the energy for transitions in the UV/visible range, but still
equal to about 25 kBT. The conclusion is that, at ordinary temperatures and unless otherwise excited,
molecules generally populate almost exclusively the lowest vibrational state of the lowest electronic
states. This general idea has important implications for what energy transitions are most likely to
occur; a high probability transition requires the initial energy state to be well-populated.
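The two energy comparisons above can be reproduced directly, using standard physical constants and an assumed temperature of 298 K:

```python
import math

h = 6.62607015e-34   # J s, Planck constant
c = 2.998e8          # m/s, speed of light
kB = 1.380649e-23    # J/K, Boltzmann constant
T = 298.0

def photon_energy(wavelength):
    """E = h*nu = h*c/lambda, with wavelength in meters."""
    return h * c / wavelength

ratio_vis = photon_energy(400e-9) / (kB * T)   # 400 nm violet photon
ratio_ir = photon_energy(1.9e-6) / (kB * T)    # 1.9 um vibrational transition
print(ratio_vis, ratio_ir)                     # roughly 120 and 25

# Boltzmann weight for thermally occupying the upper state
print(math.exp(-ratio_vis), math.exp(-ratio_ir))   # essentially zero
```

Even the smaller vibrational gap gives a Boltzmann weight of order 10⁻¹¹, consistent with the conclusion that only the lowest states are populated at ordinary temperatures.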
Fluorescence
Our previous analysis tells us that an absorption transition is likely to occur from the ground
vibrational state of the ground electronic state. But to what higher electronic energy states is a
molecule likely to be excited by absorption? An interesting phenomenon arises from the general idea
that the lowest energy nuclear positions for a molecule are typically slightly different for different
electronic states. This is typically diagrammed as shown here, where the two black curves indicate
the classical energy of a molecule as a function of its nuclear positions in two different electronic
states. The minimum energies occur at slightly different nuclear positions. Within each electronic
state, a series of vibrational states
are indicated. The width of the
lines (setting aside quantum
mechanical aspects of harmonic
oscillations) illustrates the range of
nuclear positions that are allowed
in each vibrational state. A
consideration of timescales now
leads to an interesting conclusion.
The timescale for photon
absorption is much shorter than
the timescale for nuclear motions.
This means that electronic
transitions occur ‘vertically’ in the
sense of the diagram shown. This
is known as the Franck-Condon
principle. If an electronic
transition must occur without
appreciable movement of nuclei,
then it must occur to a vibrational
state for which the initial nuclear
positions are allowable. The
diagram emphasizes that this typically is an excited vibrational state rather than a ground vibrational
state.
After absorption to an excited electronic state, according to the Boltzmann equation a molecule must
return to the ground state. The return to the ground electronic state can occur with emission of a
photon; this is fluorescence. The timescale for fluorescence is typically in the 10-8 to 10-5 sec range,
which is long enough for thermal vibrations and collisions (whose effects are illustrated in red in the
figure) to allow the molecule to descend to
lower vibrational states within the excited
electronic state before returning to the
ground electronic state. Again, that
electronic transition occurs vertically to an
excited vibrational state of the ground
electronic state, after which further
transitions lead back to the ground
vibrational state of the ground electronic
state. The key consequence of this
phenomenon is that the fluorescence
emission spectrum for a molecule is shifted
to lower energy and longer wavelength
compared to the absorption spectrum. This is referred to as the Stokes shift.
Uses and advantageous properties of fluorescence
Fluorescence offers a high degree of sensitivity for detecting and measuring the concentrations of
specific molecules, which may be fluorescent either naturally or by virtue of being chemically labelled
with a fluorophore (a fluorescent chemical group). Particularly in contrast to absorption studies
for measuring concentration, the high sensitivity of fluorescence derives from two features. First is
the shift in wavelength from the incident wavelength to the emission wavelength. In an absorption
experiment involving a dilute or weakly absorbing molecule, one is forced to analyze a small
difference between two large numbers – the number of photons transmitted by a blank compared to
the number transmitted by the sample. In a fluorescence experiment, the change in wavelength
makes it possible to analyze the number of emitted photons without interference from transmitted
photons, which have the same wavelength as the incident light. Taking advantage of the wavelength
difference requires a second monochromator placed between the sample and the detector; a first
monochromator is required between the light source and the sample. The fluorescence emission
intensity is proportional to the concentration of the fluorophore (as long as the concentration is not
too high), making accurate concentration determination possible from very dilute solutions. An extra
level of sensitivity comes from the ability to monitor fluorescence in a direction different from the
path of the transmitted beam. The figure illustrates the combined effects of wavelength change and
detection at an angle. Photons that are scattered (elastically) from the sample emerge at all angles,
but their wavelength is the same as the incident beam so they are distinguishable from fluorescent
photons.
Proteins typically have some natural fluorescence owing to the presence of tryptophan amino acids.
But the intensity of tryptophan fluorescence is not especially high, and one is often interested in using
fluorescence to monitor one protein (or nucleic acid) in particular. An enormous range of
fluorophores are available commercially with a wide range of spectral characteristics. These are
typically conjugated chemically to the macromolecule of interest by covalent attachment, often
through nucleophilic attack by cysteine thiol or lysine amine groups. Fluorescence experiments
can also be performed in situ to monitor the presence and subcellular location of a specific protein
inside cells in tissue culture using fluorescence microscopy. Chemical labeling is generally not
possible in that scenario. Instead, the protein of interest can be rendered fluorescent inside the cell
by creating a fusion at the DNA level between the protein of interest and a naturally fluorescent
protein. Originally discovered in coral sea organisms, numerous such proteins are known with a
diverse range of emission colors; green fluorescent protein (GFP) is the most widely studied. An
interesting variation on the approach is to label two different proteins with distinct fluorescent
proteins having different emission colors, like red and green. Whether the two proteins localize
together in the cell – e.g. if they interact with each other – is evident by joint emission of red and
green colors (making yellow). The level of spatial detail that can be visualized in a standard visible
or UV microscopy experiment is a few hundred nanometers, which is fine enough to visualize
organellar, nuclear, and cytoskeletal structure in eukaryotic cells, but not fine enough to see
molecular structure.
A particularly useful feature of fluorescence is its sensitivity to chemical environment. The greater
sensitivity to environment for fluorescence compared to absorbance relates in part to the longer time
scale of fluorescence. In general, increased flexibility and environmental polarity lead to lower
fluorescent intensity; the peak emission wavelength can also be affected. As an example, the
fluorescence of tryptophan increases by a factor of roughly 4 in a low polarity solvent such as DMSO
(dielectric of about 35) compared to water (dielectric of about 80). Exposure of a fluorescent group
to particular chemicals known as quenchers also reduces fluorescence, and the magnitude of the
effect can depend on the degree to which the fluorophore is exposed on the surface of the
macromolecule.
The environmental sensitivity of fluorescence can be exploited in various types of experiments. We
discussed earlier how native tryptophan fluorescence can be used to monitor protein folding.
Tryptophan residues almost always become less flexible and more rigidly held in the folded state of
a protein, leading to higher fluorescence. In another type of experiment, if a protein is suspected to
bind a ligand that is fluorescent (or for which a fluorescent analogue is available), then binding of the
ligand to the protein can be detected by an increase in fluorescence.
Kinetics of fluorescence and competing routes for return to the ground state
After a molecule has been driven to an excited state by absorbing a photon, there are several possible competing routes for
returning to the ground state. Some of these we have discussed already, while others we will return to later. The relative
rates of these processes determine which pathways dominate for a given molecule. If the rate constant for fluorescent
emission is higher than the rate constants for other processes, then most of the excited molecules
will return to the ground state by way of fluorescent emission.
As we discussed above, a number of phenomena affect fluorescence, including chemical environment,
so fluorescence can be used to monitor various events that alter the environment of a fluorophore in
solution. The details of the behavior expected can be understood by analyzing the phototransitions
using treatments similar to those we developed earlier for chemical kinetics. We can simplify things
by lumping together the various non-fluorescent pathways for return to the ground state under a
single rate constant, kother. Various underlying events can then be analyzed in terms of the effects
they have on the relative magnitudes of kfluor and kother. With respect to kinetic treatments, fluorescence experiments can
be of two essentially different types: 1) under continuous illumination, where steady state behavior is assumed, or 2)
following a brief pulse of incident light, after which time-dependent measurements are made. Note
that the latter type of experiment requires special instrumentation because the time scale for
fluorescence decay is usually shorter than milliseconds. We can analyze the behavior of both kinds
of experiments.
Constant illumination
Under constant illumination, the concentration of the excited state form of the fluorophore (P*) is not
changing. Setting d[P*]/dt = 0
d[P*]/dt = 0 = kabs[P] – (kfluor + kother)[P*]
Rearranging to obtain an expression for [P*],
[P*]=kabs[P]/(kfluor + kother)
Then, the fluorescent intensity Ifluor is given by
Ifluor = kfluor[P*] = kabs[P]kfluor/(kfluor + kother)
Since the rate of photon absorption is kabs[P], the ratio of the number of photons emitted to the
number absorbed – a fractional quantity known as the quantum yield Q – is given by
Q = kfluor/(kfluor + kother)
We can conclude from this analysis that the quantum yield and the intensity of the fluorescence observed
under constant illumination are decreased by events in solution that increase the rates of non-fluorescent
‘other’ pathways for return to the ground state. That idea is illustrated here. One example of such a scenario
is binding of a fluorescent molecule (perhaps an analogue of a suspected ligand) to a protein; this
would suppress the non-fluorescent pathways by reducing the mobility of the fluorophore, thereby
increasing the quantum yield along with the steady state fluorescence intensity.
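The steady-state relationships above can be sketched numerically. A minimal example in Python; the rate constants are illustrative placeholders, not measured values:

```python
# Steady-state fluorescence: quantum yield and intensity from rate constants.
# The rate constants below are illustrative, not measured values.

def quantum_yield(k_fluor, k_other):
    """Q = k_fluor / (k_fluor + k_other)."""
    return k_fluor / (k_fluor + k_other)

def steady_state_intensity(k_abs, P, k_fluor, k_other):
    """I_fluor = k_abs [P] k_fluor / (k_fluor + k_other)."""
    return k_abs * P * quantum_yield(k_fluor, k_other)

k_fluor = 1e8   # s^-1, fluorescent emission
k_other = 3e8   # s^-1, all non-fluorescent pathways lumped together

Q = quantum_yield(k_fluor, k_other)      # 0.25
# Suppressing non-fluorescent pathways (e.g. a binding event that
# rigidifies the fluorophore) raises the quantum yield:
Q_bound = quantum_yield(k_fluor, 1e8)    # 0.5
```

Note that the steady-state intensity scales linearly with Q, so any event that reduces kother increases both together.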
Time-resolved fluorescence
With appropriate instrumentation, an excitation pulse can be applied and the fluorescence intensity
(which must decay back to zero) can be monitored over time. The same kinetic scheme as above can be
used if we remove the continuous absorption event. This becomes a simple case of exponential decay
with a total rate constant of kfluor + kother and a decay time of

τ = 1/(kfluor + kother)

The comparison of decay behavior in the presence and absence of processes competing
with fluorescence can be diagrammed as shown.
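The decay relationship can be sketched numerically; the rate constants here are illustrative:

```python
import math

# Time-resolved fluorescence: after a pulse, intensity decays as
# I(t) = I0 * exp(-t / tau), with tau = 1/(k_fluor + k_other).
# Values are illustrative, not measured.
k_fluor = 1e8    # s^-1
k_other = 1e8    # s^-1
tau = 1.0 / (k_fluor + k_other)   # 5 ns

def intensity(t, I0=1.0):
    """Fluorescence intensity at time t after the excitation pulse."""
    return I0 * math.exp(-t / tau)

# A competing process (e.g. a quencher) adds to k_other and so
# shortens the observed decay time:
tau_quenched = 1.0 / (k_fluor + k_other + 2e8)   # 2.5 ns
```

This is why the presence of a quencher is visible in a time-resolved experiment as a faster decay, even though each emitted photon is unchanged.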
CHAPTER 18
Special Topics in Biochemical Spectroscopy
Polarization and selection rules
In this section we discuss the important role orientation plays in spectroscopy. Orientational effects
become apparent in spectroscopic experiments conducted using polarized light. You may remember that
electromagnetic radiation is a transverse wave in which a traveling photon carries oscillating electric
and magnetic field vectors perpendicular to the direction of travel. Light emitted
from an ordinary source (e.g. a light bulb) carries photons whose electric field vectors point in all
possible directions perpendicular to the direction of travel. A variety of materials can be used to filter
ordinary incoming light to produce light that is ‘plane polarized’, meaning that the electric field vector
points in a single direction while it oscillates in magnitude, up and down; the direction of travel and
the electric field vector form a plane.
We know that, for a photon of light to be absorbed and cause a transition from an initial state to a final
state, the energy of the photon must be correct. But the direction of its electric field vector is also
critically important. Whether a potential transition can be caused by a photon polarized in a certain
direction is embodied in quantum mechanical ‘selection rules’. In absorption spectroscopy, the
extinction coefficient relates to the strength or probability of a transition by being proportional to
the square of a transition dipole moment, 𝜇⃗. In a general form of the transition dipole moment,

𝜇⃗ = ∫ Ψ𝑖 𝑥⃗ Ψ𝑓 𝑑𝑉

where 𝑥⃗ is the general position vector in space, Ψ𝑖 and Ψ𝑓 are the quantum mechanical
wavefunctions for the initial and final energy states, and the integration runs over all space. For our
purposes of considering the absorption or emission of light polarized in a particular direction, we can
rewrite the equation in separate x, y, z components. The probability of absorbing a photon polarized
along the x direction is related to the x component of the transition dipole moment, evaluated as

𝜇𝑥 = ∫ Ψ𝑖 𝑥 Ψ𝑓 𝑑𝑉
with equivalent equations for polarization along y or z.
Analyzing whether electronic transitions can or cannot occur when the light is polarized in certain
directions can be simplified using a treatment that considers the symmetry vs anti-symmetry of the
initial and final wavefunctions. We will illustrate one example situation where the initial and final
wavefunctions are simple – much simpler than one would encounter with complex molecules, but
still highly instructive in understanding orientational effects. We begin with a reminder about
symmetric and anti-symmetric functions. These refer to functions whose values are either unchanged
when a spatial variable is negated (i.e. f(−x) = f(x)) or negated when a spatial variable is negated
(f(−x) = −f(x)), respectively. One way of looking at symmetric vs antisymmetric functions is in terms of
polynomial functions. We find that polynomial terms with even exponents (x⁰, x², x⁴, etc.) are
symmetric whereas polynomial terms with odd exponents (x¹, x³, etc.) are antisymmetric. We are
particularly interested in considering what happens when we evaluate the integral of a function that
is symmetric or antisymmetric. By either explicitly evaluating the integrals of such functions or by
thinking about the areas under the curves (positive and negative), we can see that odd
(antisymmetric) functions integrate to zero over a symmetric interval while even (symmetric) functions generally do not. The
illustrations here are 1-dimensional (depending only on x). In three dimensions, where a function
would depend on x, y, and z and integration would be over all dimensions, the result integrates to
zero if the function is odd (antisymmetric) with respect to any of the three spatial variables.
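The vanishing of odd integrals over a symmetric interval can be checked numerically; here a simple sketch using a midpoint-rule integration over [−1, 1]:

```python
# Numerically integrate f over the symmetric interval [-1, 1] with the
# midpoint rule; an odd (antisymmetric) integrand gives (numerically)
# zero, while an even (symmetric) one generally does not.

def integrate(f, n=100000, a=-1.0, b=1.0):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

odd_result  = integrate(lambda x: x**3)   # ~0 (antisymmetric)
even_result = integrate(lambda x: x**2)   # ~2/3 (symmetric, nonzero)
```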
Now consider an electronic transition between a π bonding molecular orbital and a π* anti-bonding
molecular orbital. Such transitions are common in conjugated double bond systems. Both molecular
orbitals are effectively combinations of two side-by-side p orbitals. The signs of the two p orbitals
are aligned in the π molecular orbital but oppositely oriented in the π* molecular orbital, creating an
extra nodal plane in the latter case.
We can set up a coordinate system at the center of the molecular orbital and then tabulate the
symmetry vs anti-symmetry (or evenness vs oddness) of the functions that get multiplied together
inside the integral for the transition dipole moment. In order to evaluate the total symmetry of the
product of the functions inside the integral, we need to understand the rules for symmetry vs anti-
symmetry when functions are multiplied together. Since the evenness or oddness is a property of
exponents, products of the functions behave according to addition of even and odd numbers:
even+even=even; even+odd=odd; odd+odd=even. We need to make a separate analysis to consider
the transition dipole component for light polarized in each possible direction. For the case of light
polarized along the x direction, the term ‘x’ in the middle of the integral can be understood as being
x¹y⁰z⁰, which is therefore odd with respect to x, and even with respect to y and z. With these rules in
hand we can construct a table to analyze 𝜇𝑥.
w/r/t axis    Ψ𝑖      x = x¹y⁰z⁰    Ψ𝑓      total
x             even    odd           odd     even
y             even    even          even    even
z             odd     even          odd     even
In evaluating 𝜇𝑥, the total function inside the integral is even with respect to all three coordinate
variables, so the integral does not necessarily vanish. We conclude that the π to π* transition can
occur by absorption of a photon polarized along x (which is the bond direction). This transition is
therefore allowed, though our simple symmetry vs anti-symmetry treatment doesn’t tell us about
magnitudes. Also note that the allowed transition for polarization along x does not mean the
direction of travel of the photon is along x; in fact the direction of travel would have to be
perpendicular to x in order for the polarization to be along x.
Next we can evaluate transition dipoles for light polarized along y or z. Those tables are shown.
In both of these cases the total function is odd with respect to at least one variable, so the integral
vanishes. That means that 𝜇𝑦 and 𝜇𝑧 both vanish. Those transitions are forbidden, meaning the π to
π* transition cannot be promoted by absorption of a photon polarized along y or z. Absorption is
only allowed for polarization along x. Some instinct can be developed to understand this result. In
comparing the symmetry vs anti-symmetry of the initial and final wavefunctions we can see that the
x direction is the only direction in which the two functions differ. An electric field vector along that
direction can therefore drive the conversion of one to the other. The treatment of the transition
dipole moment and selection rules for emission are similar to those for absorption.
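The even/odd bookkeeping in these tables can be automated; a small sketch, with the parities of the initial and final wavefunctions taken from the analysis above:

```python
# Parity bookkeeping for the pi -> pi* transition dipole components.
# Parities w.r.t. x, y, z of the initial (pi) and final (pi*) orbitals are
# taken from the tables above; multiplying functions adds exponent parities.

EVEN, ODD = 0, 1   # parity encoded as exponent parity mod 2

psi_i = {'x': EVEN, 'y': EVEN, 'z': ODD}   # pi orbital
psi_f = {'x': ODD,  'y': EVEN, 'z': ODD}   # pi* orbital

def operator_parity(direction):
    """Parity of the position operator (x, y, or z) w.r.t. each axis."""
    return {ax: (ODD if ax == direction else EVEN) for ax in 'xyz'}

def allowed(direction):
    """Allowed only if the total integrand is even w.r.t. all three axes."""
    op = operator_parity(direction)
    return all((psi_i[ax] + op[ax] + psi_f[ax]) % 2 == EVEN for ax in 'xyz')

# Only x-polarized absorption is allowed for this transition:
# allowed('x') -> True; allowed('y') and allowed('z') -> False
```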
Example of absorption of polarized light by an oriented pigment

Although light may only be absorbed when the electric field is oriented in a particular direction relative
to the chromophore, the effects of this are often not evident in experiments done in solution, since the
absorbing chromophore is present in all possible orientations. The dependence of absorption on direction
of polarization can sometimes be seen in a crystalline sample, where the chromophore exists in the same
orientation throughout the crystal specimen. The example shown here comes from a crystal of a protein
that binds a carotenoid molecule as a cofactor. Carotenoids are long organic molecules with conjugated
double-bond π orbital systems, and from our previous exercise we might expect a carotenoid to absorb
light polarized along the long axis of the molecule. The photographs shown were taken under a light
microscope where the incident light passed through a polarizing filter before passing through the
crystalline sample. The polarizer was rotated at different angles for the two photographs. Evidently,
the light was polarized in a direction that allowed absorption by the carotenoid in the top panel, but it
was oriented in a direction that did not allow absorption in the second panel.
(The tables for evaluating 𝜇𝑦 and 𝜇𝑧, referred to above, are:)

w/r/t axis    Ψ𝑖      y = x⁰y¹z⁰    Ψ𝑓      total
x             even    even          odd     odd
y             even    odd           even    odd
z             odd     even          odd     even

w/r/t axis    Ψ𝑖      z = x⁰y⁰z¹    Ψ𝑓      total
x             even    even          odd     odd
y             even    even          even    even
z             odd     odd           odd     odd

Fluorescence experiments with polarized light
Interesting phenomena occur when a sample absorbs polarized light and then reemits photons by
fluorescence. To simplify the discussion at the outset we will assume that if a molecule absorbs a
photon polarized in a particular direction then by fluorescence it will emit a photon polarized in the
same direction if the molecule has not changed its orientation between the absorption and emission
events. But what about molecular motions, particularly changes in molecular orientation, that are
occurring during the process? How much might one expect a molecule to rotate in the time between
when it absorbs a photon and re-emits a photon by fluorescence? Clearly this depends on the relative
rates of fluorescence and molecular rotation in solution. If random molecular rotation occurs very
slowly compared to fluorescence, then fluorescent photons will be polarized in the same direction as
the incident (polarized) light. Conversely, if random rotations occur much faster than fluorescence,
then emitted photons will have electric field vectors oriented in all directions equally. As a result it
is possible to learn about the relative rates of fluorescence vs molecular rotation by studying the
degree to which emitted photons are polarized in the same way as the incident light. If the time scale
for fluorescence is known then the time scale for molecular rotations can be determined. This is
useful because the time scale for molecular rotation in solution depends on the size of the molecule
(and on viscosity), so ultimately we can get information about molecular size using experiments of
this type. Some of the technical details are described here.
The figure diagrams essential features of a fluorescence polarization or fluorescence anisotropy
experiment. Two polarizing filters are required: one before the sample and one after the sample.
Monochromators (not shown) are also required to select appropriate wavelengths for the incident
and emitted photons being detected. The second polarizer (sometimes referred to as the analyzer) is
rotated during the experiment. This makes it possible to measure the relative intensity of emitted
light that is polarized in different directions; this is a measure of how much molecules have rotated
after absorption and before emission.
The mathematical treatment is as follows. The intensity of light emitted parallel to the incident light
is denoted 𝐼∥. The intensity emitted perpendicular to the incident light is denoted 𝐼⊥. The (unitless)
measure of how much stronger the parallel emission is compared to the perpendicular is described
by the fluorescence anisotropy, r. [The word anisotropy comes from Greek roots meaning “not” the
“same” in all “directions”.] The anisotropy of the emitted fluorescence is defined in terms of
experimental measurements as

r = (I∥ − I⊥) / (I∥ + 2I⊥)        (0 ≤ r ≤ 1)
If the value of r is close to zero (i.e. no anisotropy) then the intensity is the same for parallel and
perpendicular emission, meaning the rate of molecular rotation is much faster than the rate of
fluorescence. If the value of r is close to 1 (perfect anisotropy) then the intensity of emission in the
perpendicular orientation is negligible, meaning the rate of fluorescence is much faster than the rate
of molecular rotation. Useful information comes from intermediate scenarios where the two rates
are in a comparable range and the value of r is intermediate between 0 and 1.
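The anisotropy calculation from the two measured intensities is direct; a sketch with made-up intensities illustrating the limiting cases:

```python
# Fluorescence anisotropy from parallel and perpendicular emission
# intensities. The intensity values below are illustrative only.

def anisotropy(I_par, I_perp):
    """r = (I_par - I_perp) / (I_par + 2 * I_perp)."""
    return (I_par - I_perp) / (I_par + 2 * I_perp)

# Limiting cases:
r_fast_tumbling = anisotropy(1.0, 1.0)   # 0.0: rotation much faster than fluorescence
r_no_tumbling   = anisotropy(1.0, 0.0)   # 1.0: fluorescence much faster than rotation
r_intermediate  = anisotropy(2.0, 1.0)   # 0.25: comparable time scales
```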
Exactly how does the fluorescence anisotropy relate to the relative rates of fluorescence and
molecular rotation? As in our earlier kinetic analyses, unimolecular rates can be described by
(reciprocally related) decay times. The decay time for changes in molecular orientation is referred
to as the rotational correlation time, denoted here as τrot. We denote the fluorescence decay time as
τfluor. One form of Perrin’s equation (given without proof) states that

r/r₀ = τrot / (τrot + τfluor)
Setting aside the term r₀ momentarily, the equation indicates that if the decay time for rotation is
much longer than the decay time for fluorescence (meaning the rate of rotation is much slower than
the rate of fluorescence), then the anisotropy r would be 1. Conversely, r would be zero if the
decay time for rotation were much shorter than for fluorescence. The term r₀ is necessary to deal with
an imperfect alignment of emitted and absorbed photons that occurs even without any molecular
rotation. Finally, the rotational correlation time is directly related to molecular size by τrot = ηV/RT,
where η is viscosity and V is molecular volume. Therefore, in principle, one can obtain a value for
molecular volume from a fluorescence anisotropy measurement, assuming the molecule of interest is
fluorescent or has been fluorescently labeled and the fluorescence decay time can be established in
separate experiments.
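Perrin’s equation can be inverted to estimate τrot, and from it a molecular volume. A sketch; all numbers here are illustrative assumptions, not data:

```python
# Invert Perrin's equation  r/r0 = tau_rot / (tau_rot + tau_fluor)
#   => tau_rot = tau_fluor * (r/r0) / (1 - r/r0)
# then use tau_rot = eta * V / (R * T) to estimate the molar volume V.
# All numerical values below are illustrative assumptions.

def tau_rot_from_anisotropy(r, r0, tau_fluor):
    ratio = r / r0
    return tau_fluor * ratio / (1.0 - ratio)

def molar_volume(tau_rot, eta=1e-3, T=298.0, R=8.314):
    """V = tau_rot * R * T / eta  (eta in Pa*s, V in m^3/mol)."""
    return tau_rot * R * T / eta

tau_fluor = 5e-9   # s, established in a separate time-resolved experiment
tau_rot = tau_rot_from_anisotropy(r=0.2, r0=0.4, tau_fluor=tau_fluor)  # 5 ns
V = molar_volume(tau_rot)   # ~0.012 m^3/mol with these assumed values
```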
In biochemical applications, fluorescence anisotropy experiments are often used not to estimate
actual molecular volumes, but in a somewhat more qualitative way, comparing the degree of
anisotropy before and after some potential binding event for example. The central requirement is
that the event under investigation must cause a change in the rotational correlation time of the
fluorescent molecule. Two kinds of experiments are possible as illustrated. Under constant
illumination (steady state), conditions or events that give rise to slower rotational tumbling give an
increase in fluorescent anisotropy, r, since less tumbling occurs prior to fluorescent emission. In a
time-resolved experiment following an incident pulse, the anisotropy will decay more slowly if the
rotational tumbling is slower.
Fluorescence resonance energy transfer (FRET)

Under special circumstances, an excited chromophore can return to the ground state not by emission
but by transferring exciton energy to a nearby chromophore. The efficiency of this process depends
on two main factors: the degree of overlap between the emission spectrum of the first chromophore
(referred to as the donor) and the absorption spectrum of the second chromophore (referred to as the
acceptor), and the distance separating the two chromophores.
According to the Förster equation, the transfer efficiency depends steeply on the separation R
between the donor and acceptor.

Efficiency = 1 / (1 + (R/R₀)⁶)
The parameter R0 is particular for the donor and acceptor pair and depends chiefly on the quality of
the spectral overlap between the donor emission and the acceptor absorbance. Note that when R=R0
the efficiency of energy transfer is 1/2, so a good donor acceptor pair will have a relatively high value
for R0. FRET experiments are useful mainly in the 10 to 100Å range.
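The Förster relationship and its inversion to estimate a distance can be sketched numerically; the R₀ value here is an assumed illustration, not that of a specific donor-acceptor pair:

```python
# FRET efficiency vs donor-acceptor separation, and the inverse relation
# used to estimate distance from a measured efficiency.
# R0 is an assumed illustrative value, not one for a specific pair.

def fret_efficiency(R, R0):
    """E = 1 / (1 + (R/R0)^6)."""
    return 1.0 / (1.0 + (R / R0) ** 6)

def distance_from_efficiency(E, R0):
    """Invert the Förster equation: R = R0 * (1/E - 1)^(1/6)."""
    return R0 * (1.0 / E - 1.0) ** (1.0 / 6.0)

R0 = 50.0                             # Angstroms (assumed)
E_at_R0 = fret_efficiency(50.0, R0)   # 0.5, by the definition of R0
E_far = fret_efficiency(100.0, R0)    # ~0.015: sixth-power falloff is steep
```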
Looking back at our scheme from the previous chapter showing the possible routes for return of the
excited state to the ground state, we see that energy transfer by FRET is a phenomenon that competes
with the fluorescent emission from the donor. As a result, the presence of the acceptor chromophore
reduces donor fluorescence and speeds decay of the excited donor and its fluorescence. As
diagrammed here, FRET experiments can be done either under constant illumination, where the
fluorescence intensity from the donor is reduced by the presence of the acceptor, or in a time-
resolved experiment where the speed of decay is increased by the acceptor and the characteristic
decay time is decreased.
FRET experiments find use in diverse experiments where detecting the proximity or approximating
the distance between two molecules or functional groups might be informative. Unless one of the
two components of interest is naturally fluorescent, this generally involves labeling both components
– one with the donor fluorophore and one with the acceptor. Experimental measurements give rise
to a value for the transfer efficiency, as diagrammed here for either continuous illumination or time-
resolved studies, after which the efficiency value can be used to approximate the distance between
the donor and acceptor according to the Förster equation. Note that values for R0 have been tabulated
for very many donor-acceptor pairs, so that parameter is typically a known quantity.
FRET in biology
Besides being useful in biochemical experiments, the FRET phenomenon plays a key role in
photosynthesis. The key step that converts light energy into chemical energy in photosynthesis takes
place in a transmembrane protein complex known as the photosynthetic reaction center (RC). The
RC binds a ‘special pair’ of chlorophyll molecules in a parallel and partially overlapping arrangement.
That grouping makes the special pair suitable for participating in the primary photochemical event.
After the special pair is excited, instead of returning to the ground state an electron leaves the special
pair and jumps along a path of neighboring pigment cofactors (chlorophylls and carotenoids) bound
to the protein, leading the electron across the membrane and thereby generating an electrochemical
difference between the two sides; this forms the basis for chemical energy conversion in
photosynthesis. But the reaction center by itself is not suited for absorbing photons that are hitting
the photosynthetic membrane everywhere and with a broad range of wavelengths. Other
transmembrane proteins, known as light harvesting complexes (LH), bind a large number of pigment
molecules and surround the reaction center. Depending on the particular system and organism,
multiple types of LH rings may be present. The LH proteins are designed to hold their pigment
molecules in very specific positions with respect to each other, and to tune their spectral properties
so that the pigment molecules can absorb photons efficiently throughout the photosynthetic
membrane and then transfer that exciton energy, essentially by FRET, to pigment molecules closer
to the RC. This results in a funneling effect where, at some expense of lost energy in each transfer
step, the exciton energy is eventually delivered to the special pair in the RC in order to drive the
primary photochemical event. The photosynthetic reaction centers from bacteria (which are
analogous to Photosystem II from higher plants) were the first transmembrane proteins whose
structures were determined in atomic detail in the mid-1980s. The structure of the bacterial RC is
shown here in a side view with the membrane running horizontally and its pigment molecules in red,
along with a view of the RC and surrounding light harvesting complexes viewed perpendicular to the
photosynthetic membrane.
Spectroscopy of Chiral Molecules: Optical Rotation and Circular Dichroism
Chiral molecules exhibit special spectroscopic phenomena that become evident when they interact
with polarized light. Because practically all biological macromolecules are chiral, as are many smaller
biochemical metabolites, spectroscopic techniques that exploit these phenomena are widely used in
the laboratory.
Circularly polarized light
We have discussed plane polarized light at length. There, the electric field vector oscillates in a plane
(e.g. vertically for vertically plane polarized light). Much insight can be gained about how chiral
molecules interact with plane polarized light by taking a momentary leap of faith and noting that a
vector that oscillates up and down vertically can be generated by the sum of two vectors that rotate
in a circle in opposite directions at equal frequency; when they are both vertical (up or down), they
sum to give a vertical result, whereas when they are horizontal they oppose each other and cancel.
For a traveling wave, the circularly rotating electric field vector means that the wave takes the form
of a helix. Therefore, plane polarized light can be imagined as being composed of two circularly
polarized components: one that is ‘right circularly polarized’ and the other that is ‘left circularly
polarized’. This is not merely a thought exercise, because in fact pure circularly polarized light can
be prepared by passing plane polarized light through a so-called ‘quarter-wave plate’, but for now
we will stick to our view of plane polarized light as a composition of two circular components. The
figure shows both forms of circularly polarized light; the ‘right’ component forms a right-handed helix
(like DNA or a protein alpha helix or a standard hardware screw) while the ‘left’ component forms a
left-handed helix. The sense of the rotation is that a fixed observer looking towards the source will see
the direction of the E field vector rotate clockwise in time for
right circularly polarized light as the traveling wave moves past the point of observation. This is
reversed for left circularly polarized light.
The point of considering plane polarized light as a sum of two circular components is that by viewing
them in terms of helical waves we can immediately appreciate why chiral molecules might interact
differently with left vs right circularly polarized light. Helices are chiral, as are biological macromolecules, and
we can appreciate the distinct interactions chiral objects make with each other by thinking about
putting our foot into a shoe; feet and shoes both being chiral, a particular shoe interacts differently
with your two feet. So what are the distinct kinds of interactions that a chiral molecule can make
with a chiral light wave? Two effects are noteworthy, relating to differences in absorption and
differences in index of refraction, and these lead to two important types of experiments, which we
discuss next.
Circular dichroism (CD)
If the right and left circularly polarized components (imagined to be contained within a beam of plane
polarized light) are absorbed to the same extent when passing through a sample, then the light that
is transmitted should naturally remain plane polarized. But one component may be absorbed more
strongly than the other; this forms the basis for circular dichroism or CD. What is the consequence?
Clearly if the left circularly polarized component is absorbed slightly more, then the transmitted
beam should have at least a slightly larger component of the right circularly polarized type. If we add
up oppositely rotating vectors of unequal magnitude, we get an elliptical shape for the resulting path
of the electric field vector.
The magnitude of the circular dichroism effect is captured by a parameter referred to as the ellipticity,
θ. Diagrammatically, θ relates to the angle formed by a line between the tips of the transmitted electric
field vectors in the perpendicular directions of maximum and minimum magnitude, as shown. The
ellipticity of the transmitted beam can be measured by a CD spectrophotometer; this requires
additional polarizers between the sample and the detector.
The ellipticity effect originates from a difference in extinction coefficients (and therefore absorbance
values) for the left vs right circularly polarized components, so θ naturally should reflect that
relationship. The equation for θ in terms of absorbance for left vs right is:

θ = 2.303(𝐴𝐿 − 𝐴𝑅)/4 = (2.303/4)𝑑𝐶(𝜖𝐿 − 𝜖𝑅)
where A refers to absorbance, 𝜖 refers to the extinction coefficient, 𝑑 is the path length of the light
through the sample, and C is the molar concentration (recall A = 𝜖𝐶𝑑). According to the sign
convention, higher absorption of left-circularly polarized light, resulting in a greater right component
and therefore a clockwise-rotating elliptical field vector, corresponds to positive θ. But this equation
requires further explanation of the multiplicative factors, 2.303 and 1/4. The 2.303 term is
recognizable as ln(10), which we might guess relates to the conventional use of log₁₀ for absorbance
equations. But what about the 4? In many texts this appears without comment. At the expense of
some thorny details we will show the origin of these multiplicative terms. To begin we point out that
the diagram for θ is vastly exaggerated; the actual differences in absorbance are usually very small
(which means that θ is small), which makes it possible to simplify a number of complex non-linear
relationships between variables in this problem with linear approximations (i.e. keeping just the first
terms in a Taylor expansion). Briefly, the transmittance for the left component would be 10^(−𝜖𝐿𝑑𝐶) =
e^(−2.303𝜖𝐿𝑑𝐶), and similarly for the right. But you may recall from earlier physics courses that the
intensity of a light beam (which here relates to the transmittance) goes as the square of the
magnitude of the electric field vector, so the lengths of the electric field vectors in the diagram for θ
go as the square roots of the transmittance values. So, the magnitude of the transmitted electric field
vector for the left component would be e^(−2.303𝜖𝐿𝑑𝐶/2), and likewise for the right. When the exponents
in those terms are small, we can approximate e⁻ˣ as 1 − x from Taylor’s expansion to get
(1 − 2.303𝜖𝐿𝑑𝐶/2) for the E field magnitude for the left component, and similarly for the right. Then,
noting from the diagram that the tangent of θ would be the ratio of the short axis to the long axis, and
the length of the short axis is the length of the right circularly polarized electric field magnitude minus
the left, and the long axis is the sum of the magnitudes, then

tan(θ) = ((1 − 2.303𝜖𝑅𝑑𝐶/2) − (1 − 2.303𝜖𝐿𝑑𝐶/2)) / ((1 − 2.303𝜖𝑅𝑑𝐶/2) + (1 − 2.303𝜖𝐿𝑑𝐶/2))

If the terms of the form 2.303𝜖𝐿𝑑𝐶/2 that appear in the denominator are << 1, then the whole
denominator is very nearly equal to 2. Finally, when θ is small, Taylor’s expansion gives
tan(θ) ≈ θ (in radians), and so the whole expression simplifies to the one earlier, with the 2.303
in the numerator and the 4 in the denominator coming from (1/2)/2.
As a final manipulation, if the value for θ is expressed in degrees instead of radians, which introduces
a multiplicative factor of 180/π = 57.3 degrees/rad, and the ellipticity is normalized to be a molar
value by dividing by molar concentration and also normalized for path length (typically in cm), then
the molar ellipticity in degrees is

θ(in degrees)/(𝐶𝑑) = (2.303/4)(57.3)(𝜖𝐿 − 𝜖𝑅) = 32.98(𝜖𝐿 − 𝜖𝑅)

And finally, for historical reasons relating to volume and length unit conversions, a factor of 100 is
present in the standard equation for the molar ellipticity (denoted by square brackets), [θ] = 100θ/(𝑑𝐶),
to give:

[θ] = 3298(𝜖𝐿 − 𝜖𝑅)

which matches standard textbook expressions.
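The small-angle approximations in the derivation can be checked numerically; a sketch comparing the exact expression for θ (from the transmitted E-field magnitudes) with the linearized formula, using illustrative values of 𝜖, d, and C:

```python
import math

# Compare the exact ellipticity, computed from transmitted E-field
# magnitudes, with the linearized result theta = 2.303*d*C*(eps_L - eps_R)/4
# (in radians). The epsilon, d, and C values are illustrative placeholders.

eps_L, eps_R = 1000.05, 1000.00   # M^-1 cm^-1 (a tiny CD difference)
d, C = 1.0, 1e-4                  # cm, M

E_L = math.exp(-2.303 * eps_L * d * C / 2)   # transmitted E-field, left
E_R = math.exp(-2.303 * eps_R * d * C / 2)   # transmitted E-field, right
theta_exact = math.atan((E_R - E_L) / (E_R + E_L))
theta_approx = 2.303 * d * C * (eps_L - eps_R) / 4

# Molar ellipticity in degrees, with the conventional factor of 100:
molar_ellipticity = 3298 * (eps_L - eps_R)
```

With these small absorbance differences, the exact and linearized values of θ agree to many significant figures, which is why the textbook formula is reliable in practice.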
Optical Rotation
The CD effect arises from differences in absorption. A different effect arises when the left and right
circularly polarized components travel through the sample at different speeds (owing to electronic
interactions with a chiral molecule). What happens when light goes through a sample more slowly?
The frequency of the wave is unchanged, but the wavelength changes. The speed of light is inversely
dependent on the index of refraction, n, and the index of refraction here may be different for the left
component compared to the right: c𝐿 = c₀/n𝐿 and c𝑅 = c₀/n𝑅, where c₀ is the speed of light in a vacuum
and the subscripts refer to left and right. Then, λ𝐿 = c𝐿/ν = c₀/(n𝐿ν). How many oscillatory cycles does
a light beam make when it passes through a sample of thickness d? The
answer is d/. The angular rotation in radians would be 2d/L, which after
substituting the expression for would be 2dnLv/c0 = 2dnL/ for the left,
and similarly for the right. Because of the dependence on the index of
refraction, the different components will execute a different amount of
rotation as they pass through the sample. The final orientation of the
polarization direction is determined by the sum of left and right vectors,
whose angle is the average of the two component vectors, so the resulting
transmitted wave should be rotated (as shown) according to half the
difference between their separate angles of rotation. This gives for the angle
of rotation of the polarized beam,
α = (πd/λ)(n_L − n_R)
The sense of the rotation is worth clarifying. According to the equation, the optical rotation angle is positive if the index of refraction is higher for the left circularly polarized light, meaning its speed through the sample will be slower. As a result, that wave will oscillate further (i.e. execute more of a wave cycle) compared to the right circularly polarized light. But referring to the earlier figure showing circularly polarized light, you will notice that when left polarized light rotates further as a function of position along the direction of travel, it is actually rotating clockwise; this is opposite from the apparent counterclockwise rotation of the E field vector seen by a fixed observer as the left circularly polarized traveling wave passes. As a result, if n_L − n_R > 0, then the rotation of the electric field vector is clockwise as shown, as viewed by an observer looking towards the source. If the
optical rotation is expressed as a molar quantity by dividing by concentration, and also normalized
for path length, an equation for molar optical rotation is obtained.
[α] = 100α/(Cd) = (100π/(λC))(n_L − n_R)
As with the CD effect, for most molecules in solution the (unitless) difference in index of refraction (n_L − n_R) is very small, perhaps 10⁻⁵. But from the equation above you can see that because the path length d is often about 10⁴ or more times longer than the wavelength λ, the optical rotation
is often substantial and can be measured accurately. Again, this requires an additional polarizing
filter between the sample and the detector. Optical rotation can be used to identify chiral molecules
and it is particularly useful in organic chemistry for evaluating the enantiomeric purity of a synthetic
product; a racemic product, being composed of equal amounts of both enantiomers, shows no optical
rotation.
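To see why the rotation is "often substantial," consider a rough numerical sketch. The 10⁻⁵ index difference comes from the discussion above; the 1 cm path length and 589 nm wavelength are illustrative assumptions.

```python
import math

# alpha = (pi * d / lambda) * (n_L - n_R), in radians
d = 1.0e-2        # path length: 1 cm (assumed)
lam = 589e-9      # wavelength: sodium D line (assumed)
delta_n = 1.0e-5  # n_L - n_R, the magnitude quoted in the text

alpha_rad = math.pi * d / lam * delta_n
alpha_deg = math.degrees(alpha_rad)
print(alpha_deg)  # about 30 degrees: a tiny delta-n, yet easily measurable
```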
The use of optical rotation has a storied past. In the early 1800’s it was observed that individual
quartz crystals, which grow in two mirror-related forms, rotated polarized light in different
directions. In 1849 Louis Pasteur took a crystallized sample of tartaric acid (a 4-carbon compound
with two chiral centers – the meaning of which was unknown at that time), separated the crystals
into two piles according to their apparently mirror-related morphology and discovered remarkably
that the dissolved crystals of mirror-related morphology rotated polarized light in opposite
directions. That experiment came several decades before atomic structures were determined for any
compounds, at a time when theories of bonding and the atomic structure of matter were still
undeveloped. With regard to the apparent luck and insight required to make that discovery,
Pasteur’s own words are notable – “Dans les champs de l'observation le hasard ne favorise que les esprits préparés” [In the fields of observation, chance favors only the prepared mind].
Optical rotation and circular dichroism are interrelated
The phenomena of optical rotation and circular dichroism are related to each
other and they occur together in the same molecular sample, as illustrated.
Both arise from complex relationships between electric transition dipole
moments (which we discussed briefly earlier) and magnetic dipole moments
in a molecule. We will only touch on the subject qualitatively here. An
important point is that the effects (like other spectroscopic phenomena) have
to do with allowable energy transitions in a molecule. As a result, the observed
effects are strongly wavelength dependent. Indeed the common term ORD
(optical rotary dispersion) comes from the wavelength dependence of the
optical rotary effect. The CD and ORD effects are strongest at or near
wavelengths where some underlying absorption transition occurs. An exact
integral relationship exists between ORD and CD in
the form of a Kramers-Kronig transform, which we
will not discuss here, but in its simplest form the
relationship leads to a characteristic result
diagrammed here for an idealized electronic
transition whose maximum absorption would be at λ_max. Molecules with complex absorption spectra
give more complex CD and ORD spectra.
One result of the integral relationship that gives the optical rotation from the circular dichroism is that even when the circular dichroism
peak (which relates to absorbance differences) is
sharply peaked at the absorption maximum and is
weak elsewhere in the spectrum, the optical
rotation signal may be appreciable at wavelengths
farther from the transition. In some sense this
amounts to a smoothing out effect. The CD signal from a complex molecule may therefore offer
sharper distinguishing features, which can be important in analyzing detailed behavior and
conformations, while the advantage of optical rotation is that its effects can often be observed in the
visible range of the spectrum even if the molecule being studied has strong electronic transitions only
in the far UV region. Pasteur’s tartaric acid is a case in point. The optical rotation phenomenon and
its wavelength dependence can also be demonstrated easily with simple corn syrup owing to its high
concentration of chiral sugars.
CD studies for analyzing protein secondary structure
CD spectroscopy is widely used to monitor the conformation of proteins. There are strong transitions
from the polypeptide backbone in the 200-220 nm range, so CD measurements on proteins are
typically made in and around that range. A particularly common use is to estimate the percent
composition of the basic secondary structure elements – alpha helix, beta sheet, and ‘random coil’ –
in a protein. This can be informative if the three dimensional structure of the protein is not known
in more detail from other techniques, or if one is concerned about whether a protein is folded
properly. As we have discussed, under various conditions, or after mutations have been made, a
protein may become partially or totally unfolded.
The different types of protein secondary structure have distinctive CD spectra (shown here), which
have been established with model polypeptides or proteins. Clearly, if one measures the CD spectrum
of an unknown protein and it matches precisely to one of the three reference spectra (alpha, beta, or
random), then you could surmise that
the protein in question was entirely
helical, entirely beta sheet, or entirely
unfolded. This is of course rarely the
case. Instead, after recording a CD
spectrum one is generally faced with the
problem of how to decompose it into a
sum of the reference spectra, weighted
according to the estimated fractional
contribution each makes to the total
observed spectrum. Virtually all
spectroscopic techniques give additive
behavior with respect to multiple
components that are present in a
mixture, and CD is no different. As a
result, we can write a series of linear
equations stating how the reference CD
values at each wavelength would be
expected to sum to the value observed
in the unknown sample.
At each wavelength, λ_i, we can write an equation of the form

θ_obs(λ_i) = f_α θ_α(λ_i) + f_β θ_β(λ_i) + f_r θ_r(λ_i)
where θ_obs is the observed ellipticity and f_α, f_β, and f_r are the unknown fractions of alpha, beta, and random coil that make up the protein under study. The other terms in the equation, e.g. θ_α(λ_i), are known quantities based on the reference curves. A series of equations at different wavelengths can
be written in matrix form as shown.
[ θ_obs(λ_1) ]   [ θ_α(λ_1)  θ_β(λ_1)  θ_r(λ_1) ]  [ f_α ]
[ θ_obs(λ_2) ] = [ θ_α(λ_2)  θ_β(λ_2)  θ_r(λ_2) ]  [ f_β ]
[ θ_obs(λ_3) ]   [ θ_α(λ_3)  θ_β(λ_3)  θ_r(λ_3) ]  [ f_r ]
[     ...    ]   [    ...       ...       ...   ]
Ideally we would write a large number of equations based on measurements at many different wavelength values. This would give us a system of n equations in three unknowns (f_α, f_β, f_r), with n
>> 3. As you may know, if the number of equations is larger than the number of unknowns, then
there may be no exact solution for the unknowns for which all the equations are satisfied. The key is
to determine what values for the unknowns give the best agreement overall with the equations
provided. The general solution to this problem is by the method of linear least squares. More
complex treatments are possible in which different weights are given to different measurements
according to their uncertainties, but we will give the simplest treatment here where all the estimated
errors are assumed to be equal. Then the optimal solution can be written out relatively easily. First,
we will shorten our notation for the system of equations above as follows: a vector θ (of dimension n) is equal to a rectangular matrix A (n rows by 3 columns) times a vector f (of dimension 3, representing the quantities to be determined, f_α, f_β, and f_r):

A f = θ

Then multiplying on the left by the transpose of the matrix A, Aᵀ, we get

Aᵀ A f = Aᵀ θ

(Aᵀ A) is a square 3x3 matrix that can be inverted to give its reciprocal, (Aᵀ A)⁻¹. Then multiplying both sides on the left by (Aᵀ A)⁻¹ gives

f = (Aᵀ A)⁻¹ Aᵀ θ
This is a straightforward calculation to perform by computer, making it easy to obtain estimates of
the secondary structure composition from measured values of the ellipticity at several wavelengths.
The linear least squares approach above is extremely powerful and can be applied to wide ranging
problems where a large set of equations can be written in terms of a smaller set of unknown variables.
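The normal-equations solution f = (AᵀA)⁻¹Aᵀθ can be sketched in a few lines of NumPy. The reference ellipticity values in A below are synthetic placeholders (real reference spectra would come from model polypeptides, as described above); the point is only that the matrix algebra recovers the fractions.

```python
import numpy as np

# Columns of A: theta_alpha, theta_beta, theta_r at each measured wavelength.
# These numbers are made up for illustration, not real reference spectra.
A = np.array([
    [-33000.0,  -8000.0,   1500.0],
    [-30000.0, -14000.0,  -2000.0],
    [ 70000.0,  30000.0, -40000.0],
    [  5000.0, -10000.0, -10000.0],
])

f_true = np.array([0.5, 0.3, 0.2])  # fractions used to simulate a "measurement"
theta_obs = A @ f_true              # noise-free synthetic observed ellipticities

# Normal-equations solution: f = (A^T A)^-1 A^T theta
f_est = np.linalg.inv(A.T @ A) @ A.T @ theta_obs
print(f_est)  # recovers [0.5, 0.3, 0.2]
```

With real, noisy data, np.linalg.lstsq(A, theta_obs, rcond=None) is the numerically safer route, but the explicit form above mirrors the equation in the text.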
CHAPTER 19
Macromolecular Structure Determination and X-ray Crystallography
Our current understanding of biology dwarfs what was known only a few decades ago. During that
time, two areas of study have driven genuine scientific revolutions: genome sequencing and
structural biology. This chapter focuses on the latter subject.
The power of structural biology rests on the adage that seeing is believing. And indeed learning and
seeing what macromolecules look like in atomic detail has changed the way we understand the
workings of the cell and all its components. This chapter will focus on the diffraction technique
known as x-ray crystallography. Only a few comments on other important methods in structural
biology will be offered, mainly in counterpoint.
We will see shortly that at its heart, x-ray crystallography is a type of imaging method; there are
important complications, but in the end one obtains a true three-dimensional image of the molecule
under study. On this issue a contrast can be drawn to nuclear magnetic resonance (NMR), which is
the second leading method for studying macromolecular structure. NMR methods probe the complex
interactions between nuclear spins in a molecule and powerful external magnetic pulses. From
sophisticated analyses of those interactions, sometimes relying on special biochemical protocols
involving isotopic labeling of specific atom types or residues in a macromolecule, information is
extracted about the proximity and relative orientation between different amino acids in the protein
(or nucleotides in the case of nucleic acid molecules). This leads ultimately to a large number of
inferred spatial constraints that must be obeyed by a correct atomic model. Computer programs then
attempt to generate a set of atomic coordinates that is most consistent with the body of NMR
constraints, along with other known information (chiefly the amino acid or nucleotide sequence). If
a sufficient number of spatial constraints can be obtained, then an accurate model can be produced.
With NMR, the experimental challenges increase steeply with molecular size and complexity, but new
methods continue to push the limits of size. In addition, NMR methods offer valuable dynamical
information about macromolecules that is difficult to obtain by other methods, including x-ray
crystallography.
We will shortly discuss the importance in imaging methods of using a sufficiently short wavelength
for the radiation source in order to get detailed structural information. X-rays fit that requirement,
but the high energy electrons used in electron microscopy also fit that requirement; they have very
short (DeBroglie) wavelengths. Yet despite the sufficiently short wavelength offered by high energy
electrons, until recently electron microscopy has not been able to produce images of macromolecules
in atomic detail. The reasons are complex, but they concern two interrelated issues of instrument
sensitivity and the strongly destructive interaction of electrons with biological materials (like
proteins and nucleic acids). But it appears that those limitations are finally falling away. Very recent
instrumentation developments have produced systems with detector sensitivities high enough that,
if sufficient effort is applied to collect very large numbers of molecular images, atomic level detail can
indeed be obtained by electron microscopy in favorable cases. Electron microscopy methods, and
NMR methods as well, are sure to continue to grow in power and to contribute increasingly to our
body of knowledge in the area of structural biology. But we turn now to the method that has
contributed so enormously to our understanding of the three dimensional structures of
macromolecules: x-ray crystallography.
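The claim that high-energy electrons have very short de Broglie wavelengths is easy to check numerically. A sketch follows; the 300 kV accelerating voltage is an illustrative choice typical of modern instruments, not a value from the text.

```python
import math

# Relativistic de Broglie wavelength of an electron accelerated through V volts:
# lambda = h / sqrt(2 m e V (1 + eV / (2 m c^2)))
h = 6.62607015e-34    # Planck constant, J s
m = 9.1093837015e-31  # electron rest mass, kg
e = 1.602176634e-19   # elementary charge, C
c = 2.99792458e8      # speed of light, m/s

def electron_wavelength_angstrom(volts):
    """De Broglie wavelength (in angstroms) with the relativistic correction."""
    p = math.sqrt(2 * m * e * volts * (1 + e * volts / (2 * m * c**2)))
    return h / p * 1e10  # metres -> angstroms

print(electron_wavelength_angstrom(300e3))  # ~0.02 A, far below atomic spacings
```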
The limiting effect of wavelength
In order to explain why x-ray crystallography is necessary, we have to understand the fundamental
limiting effect that the wavelength has in an imaging experiment. One way to understand this point
is to ask how different the scattering is from two points in space that are separated by a distance d, if
the wavelength of the radiation is λ. Besides d and λ, the answer also depends on the geometry of
the scattering, as shown. Scattering phenomena depend on how light waves interfere with each
other, and whether light waves interfere with each other (constructively or destructively) depends
on the relative phases of the scattered waves, and this depends in turn on the relative distance the
light waves travel when they scatter from different points in space. If the path a light wave takes
from its origin to a detector is the same whether it scatters from point A or point B, then those two
points interact with the wave in an effectively indistinguishable way. Indeed, the distinction between
the scattering from two points comes from the ‘path length difference’ for rays traversing paths
through those points. In the scheme shown, the path length difference is 2𝑑𝑠𝑖𝑛(𝜃). Advanced texts
in different fields of study address the next
point in different ways. Here we will err on
the side of simplicity and just argue that if the
path length difference is short compared to
the wavelength of the radiation, then
scattering from the two points is not so
different, and an optical experiment based on
the indicated set-up (of d and λ) would not
clearly resolve the two points. If we press the
argument and say then that the level of detail
or spatial ‘resolution’ d is defined by requiring
2d sin(θ) to be comparable to λ, then we see
that the minimum possible value for d (i.e. the
finest detail that could be resolved) is limited
by λ/2 (which occurs at θ=90°). This is why light (or UV) microscopy cannot provide spatial detail
below a few hundred nanometers, no matter how large or perfect the lenses are. This fundamental
limitation that the wavelength places on resolution is sometimes referred to as the diffraction limit
or the Abbe limit. Some very special tricks – some having to do with the power of statistical averaging
and some having to do with special instrumentation – have been developed over the last few years
to circumvent the diffraction limit; these techniques are sometimes grouped under the moniker of
‘super-resolution’ microscopy, recognized by the Nobel Prize in Chemistry in 2014. Setting aside
such special techniques, the limiting effect of the wavelength means that in order to resolve atomic
level details in molecules, we have to use radiation with a wavelength not much longer than the
separation between atoms, which is 1 to 2Å. That corresponds to the x-ray region of the
electromagnetic spectrum, hence x-ray crystallography.
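The d = λ/2 limit can be made concrete with two wavelengths; the specific choices below (green light and the common Cu Kα x-ray line) are illustrative.

```python
# Abbe/diffraction limit: the finest resolvable detail is d_min = lambda / 2
# (reached at theta = 90 degrees).  Wavelength choices are illustrative.

def abbe_limit(wavelength):
    """Minimum resolvable spacing for radiation of the given wavelength (same units)."""
    return wavelength / 2.0

print(abbe_limit(500e-9))    # green light: 250 nm, far above atomic dimensions
print(abbe_limit(1.54e-10))  # Cu K-alpha x-rays: 0.77 A, atomic-scale detail
```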
X-rays and the problem of focusing
X-ray radiation provides the answer we need with respect to wavelength, but it also introduces a
critical problem. In a typical imaging experiment, using a camera or a telescope or your eye (or even
magnetic lenses for the case of electron microscopy), the photons or waves that are scattered from
the object under study are focused back to form the (typically enlarged) image. But x-rays cannot be
focused, at least not to a practical degree, because there are no materials with a high index of
refraction for x-rays. So, using x-rays we can do the first part of an imaging experiment (i.e. the
scattering) but not the second part (focusing). To understand what must be done instead, consider
that, though there is no suitable lens for x-rays, information sufficient to create the desired image
must be contained in the scattered waves that arrive at the lens location; if the x-rays could be focused
then an image would be formed. The solution to the problem then is to record the scattered
information and figure out what mathematical relationship is required to convert the observed
scattering back into an image – that is, to do using a computer what a lens would do naturally. It
turns out that that relationship is well-understood. An object and its scattering pattern are related
by a Fourier transform, an integral transform that is ubiquitous in mathematical physics and
engineering. Before we discuss how such operations relate to applications in x-ray crystallography
we have to consider certain aspects of how repetitive objects like crystals scatter radiation.
Diffraction geometry
From earlier physics experiments you are likely familiar with the basic idea that when light is scattered from a regularly repeating object, like a light or a laser passing through a set of fine slits, one gets destructive interference almost everywhere, but constructive interference (i.e. bright spots) at a series of special positions. Constructive interference occurs at diffraction angles where the light passing through different slits has path length differences that are integral multiples of the wavelength of the light. Destructive interference occurs elsewhere. The
example shown is essentially a one-dimensional system, slits repeating in one direction. Describing
diffraction from objects that repeat in more dimensions becomes a bit more complex, but the two
dimensional case can be illustrated clearly.
We can imagine scattering
from a two-dimensional
crystal where a molecule
repeats regularly in the x
and y dimensions. Let the
repeat distance along x be
|a|, and the repeat distance
along y be |b|. The a and b
vectors define the
boundaries of a ‘unit cell’,
whose contents could be
used to construct the entire (ideally indefinite) crystal by translational shifts; for our simplified
discussions we will be ignoring the possibility of rotation symmetry within the crystal. The a and b
vectors also describe a lattice of points embodying the properties of the translational repetition in
the crystal. We can understand the geometry of diffraction by momentarily forgetting about the
underlying structure of the object in the crystal in order to focus on the repeating lattice. The lattice
captures the relationship in the crystal between equivalent atoms belonging to molecules from
different unit cells. For example, if you considered just the C-alpha atom of the first amino acid in the
protein molecule, all the instances of that atom throughout the crystal would describe the crystal
lattice. Now we can apply what we know about scattering from a repeating object to this system of
lattice points. Scattering will be constructive for some choices of the direction of the incoming and
outgoing beams, but it will be destructive for most choices, giving no intensity for the outgoing beam
in those cases. A useful point is that if the scattering for a particular choice of incoming and outgoing
beam directions would be destructive for scattering from the repeating arrangement of one
particular atom in the protein molecules (i.e. the C-alpha atom alluded to earlier), then the scattering
would also be destructive when considering the arrangement of some other particular atom in the
protein. In other words, if a particular choice of incoming and outgoing beam directions would give
destructive interference from the crystal lattice points, then there would be destructive interference
from the entire crystal, regardless of what the molecule looks like or how its atoms are arranged
internally. The geometry of diffraction is therefore dictated only by the repeat pattern in the crystal
and not by the contents of the unit cell. This important simplification allows us to proceed to discuss
diffraction geometry separately from the question of molecular structure. [We will see later that
while the lattice geometry alone determines where we see diffraction, the molecular structure within
a unit cell determines which diffraction spots are bright and which are weak, and that information is
ultimately the basis for structure analysis].
The key relationship between the incoming x-ray beam direction and the outgoing direction is captured by the scattering vector, S. First we define the incoming and outgoing directions by unit vectors ŝ_in and ŝ_out. Then, a diagram and a little algebra show us that the vector difference between the outgoing and incoming unit vectors, ŝ_out − ŝ_in, is a vector of length 2 sin θ. From before, our condition for constructive interference for scattering or reflecting from planes that are separated by distance d is 2d sin θ = nλ, or 2 sin θ/λ = n(1/d). By substituting 2 sin θ = |ŝ_out − ŝ_in| we get |ŝ_out − ŝ_in|/λ = n(1/d). This motivates us to define a new vector, the scattering vector S, to be S = (ŝ_out − ŝ_in)/λ. The scattering vector bisects the outgoing vector and the (negated) incoming vector. According to the algebra used to construct S, for constructive interference S is perpendicular to the reflecting or Bragg planes drawn on the lattice, and the length of S must satisfy
|S| = (1/d)n
S is defined geometrically by the incoming and outgoing beam directions, and scattering is only
constructive when S follows the equation above. But the planes we drew in the diagram above
illustrate just one possible way that parallel planes can be drawn on a lattice. A practically unlimited
number of choices can be made for a set of planes running through the lattice at different angles. But
if we just choose two directions as our foundation, we can set up a system for describing the 2-dimensional diffraction completely. Here the lattice has been drawn to be orthogonal (i.e. rectangular instead of oblique). This is not necessary, and in fact many crystals have non-orthogonal unit cells,
but we will treat the orthogonal case because the algebra is simpler there. It makes sense to choose
our planes to be horizontal or vertical. Referring to our previous diagram of the two-dimensional
crystal, for the vertical planes along b, the spacing would be |a|, and there would be diffraction (i.e.
constructive scattering) for an S vector perpendicular to b (and therefore along a) and having length
1/|a| times an integer, which we will call h. For reflection from horizontal planes along a, S would be
along b and have length 1/|b| times an integer, k. Note the reciprocal relationship between the
lengths of the unit cell edges |a| and |b| and the length of the S vector where we get diffraction.
At this point there is utility in introducing a new set of basis vectors for describing the S vector. Owing
to the reciprocal nature of the relationship noted above, the coordinate space where we construct
the scattering vector S is referred to as ‘reciprocal space’. As basis vectors in reciprocal space, we
create an a* vector (perpendicular to b and having length 1/|a|) and a b* vector (perpendicular to a
and having length 1/|b|). That scheme is shown in the following figure. Now, for the S vectors
perpendicular to the planes defined by the b axis, we have S = ha*. And for S vectors perpendicular
to the planes defined by the a axis we have S = kb*. But in addition to the horizontal and vertical
planes, we could also draw sets of planes through the lattice at oblique angles. We should expect
diffraction for S vectors perpendicular to those planes as well, and with the length of S reciprocally
related to the spacing between planes. We will skip a full algebraic treatment, but it turns out that
the scattering vector S for any choice of planes is described by a linear combination of integral
multiples of the reciprocal axes a* and b*. That is,
S = ha* + kb* (h and k integers)
That equation clearly describes a two-dimensional lattice of spots (or really scattering vectors S) in
reciprocal space for which we expect diffraction. Every ordered pair (h, k) defines a set of Bragg
planes through the lattice, and those planes give rise to a reflection corresponding to a scattering
vector S perpendicular to those planes, whose ‘Miller indices’ in reciprocal space are h and k. The
figure illustrates the relationship between different sets of Bragg planes that can be drawn on the
crystal lattice, the corresponding scattering vector S, and the location of the resulting reflection in
the diffraction pattern. The green arrows in the bottom panels indicate the diffraction spot or
‘reflection’ that arises from the Bragg planes drawn in green in the upper panels.
We can also work backwards from an observed diffraction spot and calculate what the spacing was
between the (generally oblique) lattice planes that gave rise to that reflection: from the indices h and
k of the diffraction spot, and knowing the lengths of a* and b*, we can use the Pythagorean equation
to calculate the length of the scattering vector S. Then, from above (ignoring the n from before), d =
1/|S|. This has an important meaning. Scattering from closely spaced planes in the lattice (i.e. where
d is small) shows up in the diffraction pattern where the S vector is long, i.e. farthest from the center
(which is where the direct beam would hit [h=0, k=0]). Small values of d correspond to a fine level
of detail (or ‘high resolution’) in the image we ultimately obtain for the crystallized molecule.
Therefore, in order to ultimately obtain a high resolution image of the crystallized molecule,
diffraction data must be present and recorded at high angles of diffraction (i.e. high θ). Note from the
figures we have drawn that the angle the outgoing beam makes with the incoming beam is actually
2.
Our diffraction geometry equations can be put into practice in several ways:
Suppose that a horizontal x-ray beam with wavelength λ=1.54 Å hits a crystal and we are able to record good diffraction data on a detector at an angle up to 50° away from the direction of the direct beam. What is the highest resolution (i.e. lowest value of d) for diffraction spots that would be recorded? 2θ=50°, so θ=25°. |S| = 2 sin θ/λ and d = 1/|S| = λ/(2 sin θ) = 1.82 Å.
Given the indices of a reflection, we can calculate the resolution it provides; we must also know the
reciprocal unit cell. If a=100 Å, b=125 Å, and c=160 Å, the resolution for reflection (24, 8, 17) would
be:
d=1/|S| = 1/sqrt((24*(1/100Å))^2 + (8*(1/125Å))^2 + (17*(1/160Å))^2) = 3.7 Å. This calculation
assumes an orthogonal lattice; otherwise the calculation would have to take angles into account.
For the same unit cell as above, how many total reflections exist in three-dimensions within the limit
of 2.5 Å resolution, i.e. where d > 2.5 Å or |S| < 1/(2.5 Å)? The volume of a 3-D sphere in reciprocal
space with radius 1/(2.5 Å) is V = (4/3)π(1/(2.5 Å))³. Dividing this by the volume occupied by one
reciprocal unit cell volume (a*b*c*), gives about 536,000 reflections. We did not discuss internal rotational symmetry in crystals, which would make some of the reflections equivalent to each other and therefore redundant, but nonetheless you can appreciate the very large number of observed
quantities that are measured in a macromolecular crystallography experiment, which is consistent
with the requirement of producing an image that can define the detailed structure of a molecule
typically containing thousands of atoms.
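The three worked examples above can be reproduced in a few lines, assuming (as the text does) an orthogonal unit cell:

```python
import math

# 1) Resolution limit from the maximum diffraction angle:
#    d = lambda / (2 sin theta), with 2*theta = 50 deg and lambda = 1.54 A.
lam = 1.54
theta = math.radians(50.0 / 2.0)
d_min = lam / (2.0 * math.sin(theta))
print(round(d_min, 2))  # 1.82 A

# 2) Resolution of reflection (24, 8, 17) in a 100 x 125 x 160 A cell:
#    d = 1/|S| with S = h a* + k b* + l c* and orthogonal axes.
a, b, c = 100.0, 125.0, 160.0
h, k, l = 24, 8, 17
S_len = math.sqrt((h / a)**2 + (k / b)**2 + (l / c)**2)
print(round(1.0 / S_len, 1))  # 3.7 A

# 3) Number of reflections to 2.5 A resolution: the reciprocal-space sphere
#    of radius 1/2.5 divided by the reciprocal cell volume 1/(a b c).
n_refl = (4.0 / 3.0) * math.pi * (1.0 / 2.5)**3 * (a * b * c)
print(round(n_refl))  # about 536,000
```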
Diffraction in three dimensions
Our diagrams above were drawn for two-dimensional diffraction, and there the patterns come out
like we might imagine, or like we have seen in a classroom demonstration of a laser passing through
a fine screen. The geometry of diffraction from an object that repeats in three dimensions is a bit
more complex. At any given orientation of the crystal and the incoming beam, we are able to see just
a two-dimensional slice through the diffraction pattern that exists in our hypothetical three-
dimensional reciprocal space. But the slice we observe is not from a flat plane, but rather from a
sphere intersecting the three-dimensional reciprocal lattice. Why? We saw earlier that the scattering
behavior is governed by the scattering vector S. And S is the sum of the outgoing beam unit vector
(ŝ_out) with the negated incoming beam unit vector (−ŝ_in), divided by λ. Now if the beam has a fixed direction relative to the crystal (i.e. the incoming beam and the crystal are both stationary), so that ŝ_in is fixed, then the question is: what values of S can possibly be sampled by all the allowable directions for the outgoing beam unit vector ŝ_out? Interestingly, the answer
is a sphere, as shown here. This is referred to as
the sphere of reflection or the Ewald sphere. So,
for diffraction from a three dimensional crystal,
we see diffraction only where S falls on a sphere
(of radius 1/λ), and where S simultaneously
falls on a reciprocal lattice of points. This has
the effect of planes of spots intersecting a
sphere, and since a plane intersects a sphere in
a circle, we see a diffraction pattern with spots seeming to appear in circular rings. In order to obtain
information on the full three-dimensional diffraction pattern, the crystal must be rotated about an
axis while diffraction images are recorded. An example of diffraction from a crystal undergoing a
narrow rotation is shown. Interpreting such a diffraction pattern, e.g. determining the indices (h,k,l)
of all the spots is a complicated problem. Modern crystallographic programs can usually do this
automatically for good diffraction data, a procedure known as ‘autoindexing’. This was not possible
in the early days of crystallography. Then, a crystallographer had to take pains to characterize the
crystal unit cell and the reciprocal lattice. From the arguments above, you can see that to take a
diffraction image showing values of S in a flat slice of reciprocal space requires changing the
orientation of the crystal relative to the incoming beam during the film exposure, in a very particular
way. The complex motion of the crystal and the film and an intervening annular screen can be
accomplished by a ‘precession camera’. An example of a precession photograph of a protein crystal
is shown. Owing to the time required and the complexity of the procedure, precession photographs
are rarely produced in modern crystallographic work.
Limited diffraction and disorder
We noted earlier that the geometry of a data collection experiment can set a limit on the resolution
obtained. A (typically flat) x-ray detector panel only allows data collection to a certain value of θ, and the resolution is limited by d = λ/(2 sin θ). But the geometry of the experimental setup is rarely the
element limiting the resolution. The resolution of an x-ray diffraction experiment on a
macromolecular crystal is most often limited by the absence of detectable diffraction spots above
some scattering angle θ; spots are clear and strong in the inner region of the diffraction pattern, but
they fall off and become unmeasurable farther from the center. This natural limit in resolution is a
direct reflection of the degree of order vs disorder in a crystal. For a perfect crystal, where the protein
atoms in one unit cell are in identical positions in every unit cell all through the crystal, diffraction
would be strong to unlimited resolution. But if the protein exhibits substantial atomic motion or
conformational variation, then diffraction will vanish at resolutions (i.e. values of d) comparable to
those variations. Stronger x-ray beams make it possible to observe weaker reflections, and indeed
the development of synchrotrons that produce x-ray beams thousands of times stronger than home
laboratory sources is a major reason for the current ability to determine atomic structures of highly
complex macromolecular assemblies, which often yield only small and weakly diffracting crystals.
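The geometric resolution limit above can be made concrete with a short calculation. This is only an illustrative sketch; the wavelength and angle values are hypothetical (1.5418 Å is the common Cu Kα wavelength), and are not taken from the text.

```python
import math

def resolution_limit(wavelength_A, theta_max_deg):
    """Minimum resolvable d-spacing (Angstroms) for a given wavelength
    and maximum Bragg angle theta, from d = lambda / (2 sin(theta))."""
    return wavelength_A / (2.0 * math.sin(math.radians(theta_max_deg)))

# With Cu K-alpha radiation (1.5418 A) and spots measurable out to
# theta = 90 degrees, the best possible resolution is lambda/2:
d_best = resolution_limit(1.5418, 90.0)   # 0.7709 A
```

Note that pushing the detector to capture larger angles only helps if the crystal actually diffracts there; as the text explains, disorder, not geometry, usually sets the limit.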
Obtaining the atomic structure
How the contents of the unit cell affect the diffraction: the structure factor equation
In our previous discussions, we imagined abstracting just one specific atom from each molecule in
the crystal to establish the lattice, and from there we analyzed where the diffraction spots would
appear, and what their indices (h,k,l) would be. That exercise did not depend on the internal
structure of the molecule in the crystal, and so the positions of the spots evidently reveal nothing
about the underlying molecular structure. Instead, the molecular structure is manifest through the
character of the individual waves that comprise the diffraction pattern.
Every diffraction spot or reflection has an intensity (i.e. a brightness or darkness on the detector,
depending on the display device), which is denoted as I(h,k,l) according to the indices of the
particular reflection. But each reflection is a wave and so also carries with it a ‘phase’. The phase
describes how far advanced the wave fronts of a wave are compared to a reference wave (which in
our case is a hypothetical wave scattering from a reference point at the origin of the unit cell of the
crystal). How do the positions of the thousands of atoms in the crystal unit cell relate to the intensity
and phase of the wave corresponding to reflection h,k,l? Each atom scatters in all directions, so each
reflection is just a sum of waves scattered from the atoms in one unit cell, as shown.
How do those waves add up? The separate waves scattered from the many atoms in the unit cell interfere
constructively or destructively in matters of degree rather than absolutely, as was the case for our
earlier analysis of scattering from a lattice. To add them up we have to account for the magnitudes and
the relative phases of the waves scattered from the separate atoms. To a first approximation, the
magnitude of the wave scattered from each atom is determined
by the number of electrons in the atom. The phase is determined by the position of the atom, since
that (along with the value of S for the reflection in question) is what determines the path length for
the wave. The phase angle (compared to hypothetical scattering from a point at the origin of the unit
cell) is governed by the dot product of the S vector for the reflection and the position vector for the
atom, which is usually written as r = xf·a + yf·b + zf·c. Note that the coordinates xf, yf, zf are
'fractional' relative to the unit cell axis vectors a, b, and c. The consequence is that the path length
difference is r·(ŝin − ŝout). To convert the path length difference to an angular phase, we divide it by
λ, multiply by 2π, and then negate, since a longer path gives a negative shift of the wave peaks. The
phase angle then becomes 2π r·(ŝout − ŝin)/λ, and substituting S for (ŝout − ŝin)/λ we obtain 2π(r·S)
for the phase of the wave scattered from an atom at position r into the reflection whose scattering
vector is S. A further simplification in notation comes from expanding (r·S) in components as
(xf·a + yf·b + zf·c)·(h·a* + k·b* + l·c*). Because of the specific way the reciprocal lattice was
constructed (a·a* = 1, a·b* = 0, a·c* = 0, and so on), (r·S) can be written as (hxf + kyf + lzf), which
makes it clearer how to calculate the required value. These relationships are diagrammed in the figure
here. The phase angle term 2π(hxf + kyf + lzf), which relates scattering from position x,y,z to
reflection h,k,l, appears throughout crystallography equations.
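The phase term relating an atom's fractional position to a reflection (h,k,l) takes one line of code. This minimal sketch uses hypothetical positions for illustration.

```python
import math

def phase_angle(h, k, l, xf, yf, zf):
    """Phase (radians) contributed by an atom at fractional coordinates
    (xf, yf, zf) to reflection (h, k, l): 2*pi*(h*xf + k*yf + l*zf)."""
    return 2.0 * math.pi * (h * xf + k * yf + l * zf)

# An atom at the origin contributes zero phase to every reflection;
# an atom halfway along a contributes a phase of pi to reflection (1,0,0),
# i.e. it scatters exactly out of step with an atom at the origin.
origin_phase = phase_angle(3, 2, 1, 0.0, 0.0, 0.0)   # 0.0
half_cell    = phase_angle(1, 0, 0, 0.5, 0.0, 0.0)   # pi
```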
Now we have to add up the waves scattered from all the atoms based on their separate magnitudes
and phases. We will skip over some basic mathematical details here and simply explain that the way
to add up interfering waves (of the same frequency) with different phases is to decompose each wave
into a sine and a cosine term. That decomposition depends on the phase: If the phase is zero we say
that the wave is a cosine wave. If the phase is 90° we say it is a sine wave. If the phase is intermediate
then it decomposes into both cosine and sine components according to the cosine and sine of the
phase angle. Then, the waves can all be added up by adding their cosine components and sine
components separately. The final summed components describe a new total wave whose magnitude
and phase are embodied in the values of the total cosine and sine components. In order to collapse
the two components together as a single number so that the waves can all be added together with a
simple summation, we cast things into the space of complex numbers, assigning the cosine terms to
be real and the sine terms to be imaginary. Making those representations, we end up with the
structure factor equation:
F(h,k,l) = Σ_(atom j) fj cos(2π(hx + ky + lz)) + i Σ_(atom j) fj sin(2π(hx + ky + lz)) = A + iB
The fj term is the form factor for atom j (its number of electrons, to a first approximation). F(h,k,l)
is called the structure factor for reflection h,k,l. The figure shows a graphical representation of how
the total structure factor F arises from the summation of atomic contributions.
The structure factor, F(h,k,l), describes the wave that gives rise to a specific diffraction spot.
Specifically, the brightness or intensity of the spot on the detector is the square of the amplitude of
the structure factor, I(h,k,l) = |F(h,k,l)|², so the magnitude of the structure factor for each reflection
is obtained by taking the square root of the measured intensity. This leaves us with one major
problem. We wrote the structure factor 𝐹 as a vector to emphasize its complex character. It contains
a real and imaginary part (A and iB above). Or, viewed another way, it is described by a length (or
magnitude) and an angle in the complex plane, which is the phase of the total wave. We can
measure the magnitudes but not the phases; that information contained in the waves is lost upon
collision with the detector.
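The summation just described can be sketched directly in code, using complex exponentials to carry the cosine (real) and sine (imaginary) components together. The two-atom list below is a hypothetical example, not taken from the text.

```python
import cmath

def structure_factor(hkl, atoms):
    """F(h,k,l) = sum over atoms j of f_j * exp(2*pi*i*(h*x + k*y + l*z)),
    with each atom given as (f_j, x, y, z) in fractional coordinates.
    The real part is the cosine (A) sum; the imaginary part is the sine (B) sum."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

# Two equal atoms separated by half a cell along a: their waves cancel
# for h = 1 (phases 0 and pi) and reinforce for h = 2 (phases 0 and 2*pi).
atoms = [(6.0, 0.0, 0.1, 0.2), (6.0, 0.5, 0.1, 0.2)]
F1 = structure_factor((1, 0, 0), atoms)   # magnitude ~ 0
F2 = structure_factor((2, 0, 0), atoms)   # magnitude ~ 12
I2 = abs(F2) ** 2                         # measurable intensity |F|^2
```

Only I2 (and hence |F2|) would be observable in an experiment; the phase of F2 is exactly what the detector discards.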
To summarize, the positions of the atoms in the unit cell determine the structure factors for the
diffraction pattern. The structure factor for each reflection specifies (1) its magnitude, which can be
measured (as the square root of the spot intensity), and (2) its phase, which cannot be measured.
The ability to calculate precisely what the structure factors should be once we have an atomic
structure is an invaluable property, as we shall see later. But, the absence of phase information
creates an immediate problem.
Without the phases, there is no easy way to work the problem backwards and calculate where the
atoms must have been in order to give rise to the observed structure factor magnitudes. This places
the crystallography problem in a class mathematicians refer to as inverse problems: the calculation is
easy to carry out in one direction but not in the other. If we do have the phases for the structure
factors, then it is easy to calculate an image of the contents of the unit cell. The essential problem in
crystallography then is to recover (at least approximate values for) the missing phases. This is known
in crystallography as the phase problem.
Before turning to the phase problem, we will simply point out how the contents of the unit cell can
be calculated from the structure factors, once the phases are known. The relationship is an inverse
Fourier transform, well known in physics and engineering problems; it describes essentially what a
lens would have done if we had one for x-rays:
ρ(x,y,z) = (1/V) Σ_(hkl) |F(h,k,l)| cos(α(h,k,l) − 2π(hx + ky + lz))
In this equation, |𝐹 (ℎ, 𝑘, 𝑙)| is the magnitude of the structure factor and 𝛼(ℎ, 𝑘, 𝑙) is its angular phase.
𝜌(𝑥, 𝑦, 𝑧) is the electron density in the unit cell at position x,y,z. To obtain the electron density at any
point x,y,z, a summation is required over all reflections h,k,l, emphasizing that information about the
electron density is distributed across all the reflections. We will say more about the electron density
calculation later, but recognizing the requirement for the phases for all the reflections (𝛼(ℎ, 𝑘, 𝑙)), we
turn now to the phase problem.
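As a sketch of how the synthesis works, the electron density sum can be coded directly. The reflection list and the unit cell volume below are toy values chosen for illustration; a real map would sum over thousands of reflections.

```python
import math

def electron_density(x, y, z, reflections, V=1.0):
    """rho(x,y,z) = (1/V) * sum over reflections of
    |F(h,k,l)| * cos(alpha(h,k,l) - 2*pi*(h*x + k*y + l*z)).
    Each reflection is given as ((h, k, l), Fmag, alpha_radians)."""
    total = 0.0
    for (h, k, l), Fmag, alpha in reflections:
        total += Fmag * math.cos(alpha - 2.0 * math.pi * (h * x + k * y + l * z))
    return total / V

# A single reflection with zero phase peaks at the origin and is most
# negative half a cell away along a:
refs = [((1, 0, 0), 2.0, 0.0)]
rho_origin = electron_density(0.0, 0.0, 0.0, refs)   # 2.0
rho_half   = electron_density(0.5, 0.0, 0.0, refs)   # -2.0
```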
Phasing and the phase problem
There are two essentially different ways the phase problem can be surmounted: (1) by methods
known as ‘molecular replacement’ and (2) by ‘heavy atom methods’ or variations thereon including
anomalous phasing; these make use of strong or unusual scattering from certain atoms, generally
heavier than those naturally present in proteins and nucleic acids.
Molecular replacement - Summarizing briefly, molecular replacement requires that a structure
already be known for a similar molecule, or perhaps part of the unknown molecule. Common
scenarios include when the structure is known for a homologous protein from another organism, or
where the structure is known for the alpha subunit from an alpha/beta heterocomplex, or where a
protein-ligand complex has been crystallized and the structure is already known for the protein by
itself. In the molecular replacement approach, if one can figure out how the approximate model or
‘search model’ should be placed in the unknown unit cell, then structure factors can be calculated
from this oriented search model. The reason for doing this is that the structure factor calculation
produces phases for the reflections. The phase values so obtained may not be very accurate, because
they come from some approximate model, but they are usually good enough. One proceeds by taking
the structure factor magnitudes from the observed diffraction pattern of the unknown crystal and
combining them with the approximate phases calculated from the search model. Those quantities,
placed in the equation above, can be used to produce an electron density map. Because the phases
do not contain information from the actual unknown structure, electron density maps calculated by
molecular replacement are typically biased to look like the search model. Overcoming this bias is a
thorny problem, but the speed and convenience of molecular replacement makes it an attractive
choice whenever it is possible. Of course if the crystallized molecule is unlike any known structure,
molecular replacement is not an option. In addition, a problem in molecular replacement we glossed
over is how the correct orientation and position can be identified for the search model in the unit cell.
The short answer is that if the search model is correctly placed then the structure factor magnitudes
calculated from that model (using the structure factor equation) should approximately match the
measured structure factor magnitudes. When the search model differs substantially from the
unknown structure, molecular replacement methods may fail, leaving only heavy atom and related
methods as viable routes.
Heavy atom methods – With heavy atom methods, one obtains estimates of the phases by doing
additional diffraction experiments after perturbing the atomic structure (by addition of heavy
atoms). Making additional measurements of the structure factor magnitudes from perturbed
versions of the crystal makes it possible to break the phase ambiguity. As an example, suppose you
were trying to determine the value of an unknown (signed) quantity, and you were told that its
absolute magnitude was 100. It could be +100 or -100, but you can’t tell from a single measurement
of its magnitude. But what if you were able to ask what the absolute magnitude would be after
perturbing it in a known way? What if you were told that if you added 5 to the number the result had
a magnitude of 95. Then you would conclude that the unknown value was -100, not +100. That is
the essence of heavy atom methods and its variations. The crystallographic phase problem is more
complicated because what is missing is not merely the sign of the value but its phase angle. But it
turns out to not be so much more complicated.
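The signed-number analogy can be written out in a few lines of code; the numbers are the hypothetical ones used in the text.

```python
def resolve_sign(mag, perturbation, perturbed_mag, tol=1e-9):
    """Given |x| = mag and |x + perturbation| = perturbed_mag, return the
    signed value of x consistent with both measurements (None if neither fits)."""
    for candidate in (+mag, -mag):
        if abs(abs(candidate + perturbation) - perturbed_mag) < tol:
            return candidate
    return None

x = resolve_sign(100.0, 5.0, 95.0)   # -100.0: only -100 + 5 has magnitude 95
```

The heavy atom plays the role of the known perturbation (+5), and the two diffraction experiments supply the two magnitudes.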
The typical view of the problem is to take a single reflection and think of its structure factor as being
a vector that lies on a circle of known radius, namely the structure factor magnitude that comes from
the square root of the measured reflection intensity. Each structure factor has a phase associated
with it – if the atomic structure was already known then that phase could be calculated directly from
the structure factor equation – but not knowing the phase means that we do not know where on the
circle the structure factor vector for that reflection points. But now, imagine we have briefly soaked
the protein crystal in a solution containing a heavy atom compound, say HgCl2 for example, and that
as a result the crystal had been modified in a uniform way, say with a single mercury atom bound to
an exposed cysteine thiol in each copy of the protein molecule. We could do a diffraction experiment
on this ‘derivatized’ crystal, and we would obtain slightly different structure factor magnitudes
compared to the native protein crystal. Now for each reflection we have two magnitudes, the native
protein magnitude (FP) and the derivative magnitude (FPH). Now the parallel to the scenario laid out
before should start to come into view. To proceed further we have to understand precisely what was
added to each structure factor of the native crystal to produce the derivative structure factor. In
other words, we need to know what the heavy atom contribution was. We know from the structure
factor equation that if we are able to determine the location of the heavy atom(s) within the unit cell,
then we can calculate directly what contribution the heavy atom(s) made to each structure factor.
Determining the heavy atom position(s) is a separate problem – known as ‘solving the heavy atom
substructure’ – which we will not discuss in this chapter as it requires specialized calculations and
analyses. Instead we will proceed to explain how knowing the heavy atom position(s) and being able
to calculate the heavy atom contribution makes it possible to determine the unknown phase for each
reflection. First, we have to recognize that it is the vector (or complex valued) quantities for the
structure factors that determine how they add together. For each reflection (h,k,l),
FPH = fH + FP
where fH is the heavy atom contribution, whose value including its phase (or real plus imaginary
components) can be calculated from the positions determined for the heavy atom(s). The behavior
of this equation is typically illustrated with a phase circle or Harker diagram, as shown. The FP and
FPH structure factors are depicted as circles, since their phases are unknown at the outset. But the
centers of those two circles are offset by a vector amount dictated by fH. Under that construction,
with the fH and FP vectors laid head to tail, one sees that the intersections of the two circles give
two possible solutions to the vector triangle addition equation. Either of two choices for the phase of
the native structure factor would agree with the diffraction data. So, with information from only one
heavy atom derivative (referred to as SIR, for single isomorphous replacement) we are still left with
an ambiguity about the correct choice of phase angle. It is possible to proceed with an
electron density calculation with an average of the two possible phase values, and sometimes this is
good enough, but clearly one would like to do better. The solution is MIR (multiple isomorphous
replacement). Additional heavy atom derivatives are sought, perhaps using different heavy atom
types. Further diagrams are not given here, but you can anticipate how collecting diffraction data on
a second derivatized crystal would produce, for each structure factor, a second heavy atom circle with
yet another origin (offset by a different heavy atom contribution). If the data are well measured,
there should be a point on the native protein phase circle where the other two circles nearly
intersect, and this gives a value for the native phase angle. In reality, the presence of
experimental errors makes the actual assessment of the best phase somewhat more involved.
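The Harker construction reduces to a law-of-cosines calculation on the vector triangle FPH = fH + FP. The sketch below, with made-up numbers, returns the two candidate phases for FP given |FP|, |FPH|, and a calculated complex heavy atom contribution fH.

```python
import cmath
import math

def sir_phase_candidates(FP_mag, FPH_mag, fH):
    """Two candidate phases (radians) for the native structure factor FP,
    from the vector triangle FPH = fH + FP, using the law of cosines:
    |FPH|^2 = |FP|^2 + |fH|^2 + 2|FP||fH| cos(alpha_P - alpha_H)."""
    fH_mag, fH_phase = abs(fH), cmath.phase(fH)
    cos_d = (FPH_mag**2 - FP_mag**2 - fH_mag**2) / (2.0 * FP_mag * fH_mag)
    d = math.acos(max(-1.0, min(1.0, cos_d)))   # clamp against rounding error
    return fH_phase + d, fH_phase - d

# Consistency check on a synthetic example: build FPH = FP + fH, then
# recover the phase of FP from magnitudes alone (up to the SIR ambiguity).
FP  = 3.0 * cmath.exp(0.7j)      # true phase 0.7 rad (unknown in practice)
fH  = 1.0 + 0.5j
FPH = FP + fH
candidates = sir_phase_candidates(abs(FP), abs(FPH), fH)
```

The two returned values are the two circle intersections; a second derivative (MIR) or anomalous data is what picks between them.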
In the last two decades, a large fraction of structures have been determined with variations on the
heavy atom method above (which dates to the first protein structures of myoglobin and hemoglobin
by the Kendrew and Perutz laboratories). These variations take advantage of the ‘anomalous’ x-ray
scattering of certain atoms, including many heavy atoms but also including lighter atoms, most
notably selenium. The twist is as follows. Ordinary heavy atom methods gain the additional
information required to break the phase ambiguity by measuring two different structure factor
magnitudes from two different crystals: the native protein crystal and the heavy atom derivatized
crystal. Anomalous scattering methods gain the information required for phasing from two structure
factors from the same crystal. We did not discuss it earlier, but for scattering from an ordinary crystal
(i.e. one not containing anomalously scattering atoms), the structure factor magnitudes for the (h,k,l)
reflection and the (-h,-k,-l) reflection are identical. This equality is broken by anomalously scattering
atoms, and extra information is then obtained by comparing the magnitudes of F(h,k,l) and F(-h,-k,-l)
for each reflection. Phasing approaches involving combinations of heavy atoms and anomalous
scattering are possible, leading to various acronyms (e.g. SIRAS for single isomorphous replacement
with anomalous scattering). The phase circle constructions in these various cases are different in
detail, but the main idea remains the same. An important feature of anomalous scattering methods
is that selenium atoms provide a relatively strong anomalous signal and can often be incorporated
seamlessly into a native protein by expressing the protein in bacteria grown on selenomethionine, a
powerful and general method developed by Wayne Hendrickson and colleagues.
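The statement about the (h,k,l) and (-h,-k,-l) pair can be checked numerically. In this sketch (arbitrary atom positions, chosen for illustration), form factors are real for normal scattering; giving one atom a complex form factor is a crude stand-in for anomalous scattering.

```python
import cmath

def F(hkl, atoms):
    """Structure factor for atoms given as (form_factor, x, y, z);
    the form factor may be complex to model anomalous scattering."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

normal    = [(6.0, 0.1, 0.2, 0.3), (8.0, 0.4, 0.1, 0.7)]
anomalous = [(6.0, 0.1, 0.2, 0.3), (8.0 + 2.0j, 0.4, 0.1, 0.7)]

# Real form factors: |F(h,k,l)| = |F(-h,-k,-l)| exactly.
equal_diff  = abs(F((1, 2, 3), normal))    - abs(F((-1, -2, -3), normal))
# A complex form factor breaks the equality, providing phase information.
broken_diff = abs(F((1, 2, 3), anomalous)) - abs(F((-1, -2, -3), anomalous))
```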
Electron density maps: obtaining an atomic model
We have already seen the electron density equation by which we can calculate an electron density
function ρ(x,y,z), or 'map', once we measure the structure factor amplitudes and recover approximate
values for the missing phases. We will just note here that the two elements most critical to the quality
of the electron density map obtained are the accuracy of the phases and the resolution (minimum
value of d) to which the data extend. Electron density maps are shown here at resolutions of 3Å and
1.7 Å to emphasize what can and cannot be visualized. Depending on the quality of the phases, a
reliable tracing of the path of the backbone may require a resolution of about 3.0 Å or better. Side
chain identities become relatively clear at about 2.7 Å; prior knowledge of the amino acid sequence
is extremely valuable in most cases. At a resolution of about 1.7 Å, holes in ring structures may
become evident. And at about 1.1 Å, separate spherical densities for individual atoms appear.
Hydrogen atoms scatter weakly in x-ray diffraction, so at typical resolutions their positions are not
visualized but are instead inferred from geometric considerations.
Modern software packages attempt to
model an atomic structure, given the known amino acid or nucleotide sequence, into a calculated
electron density map. Such automatically traced models usually require several rounds of human
inspection and rebuilding, combined with automated minimization by computer, in order to obtain a
final model that is reliable. An important feature of x-ray crystallography is the ability to quantitate
the degree to which a final atomic model agrees with the observed data; the structure factor equation
makes this possible. The level of disagreement between the model and the observed structure factor
magnitudes is given as an ‘R-factor’, which is just a measure of the residual error on a fractional scale.
Macromolecules are complex. They are often flexible in ways that are hard to capture in a single
model, and they often have ordered solvent structure around their surfaces that can be difficult to
model. As a result, the R-value for a relatively good structure may be in the range of 20%, and worse
for structures determined at lower resolution. The final refined structure in a crystallography
experiment is usually also informed by known geometric restraints, based on known values for bond
distances and angles. In crystal structures reported in the literature, a standard crystallographic
table will report values for the R-factor along with deviations from ideal geometry. Additional entries
give further information by which an expert can assess the quality and likely accuracy of the reported
structure.
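The R-factor is just a normalized residual between observed and calculated magnitudes. A minimal sketch, with made-up magnitudes:

```python
def r_factor(F_obs, F_calc):
    """Crystallographic R-factor: sum ||Fobs| - |Fcalc|| / sum |Fobs|,
    over matched lists of structure factor magnitudes."""
    return (sum(abs(fo - fc) for fo, fc in zip(F_obs, F_calc))
            / sum(F_obs))

# A perfect model gives R = 0; a model off by ~10% on average gives R ~ 0.1.
R = r_factor([10.0, 20.0, 30.0], [12.0, 16.0, 30.0])   # (2 + 4 + 0) / 60 = 0.1
```

On this fractional scale, the ~20% figure quoted in the text for a good macromolecular structure corresponds to R ≈ 0.2.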
Protein Crystallization
Often the hardest part of a crystallographic project is obtaining good crystals. Surprisingly,
sometimes crystals that exhibit excellent morphology by microscopic examination turn out to diffract
poorly, presumably reflecting insufficient order on the atomic scale. Though there are important
theoretical considerations, protein crystallization is still largely an art. In effect, one drives the
protein or nucleic acid out of solution under many different conditions, perhaps thousands, searching
for conditions that give highly ordered crystals. The most common experimental setup is referred
to as hanging-drop vapor diffusion. In each essentially separate experiment, a tiny drop of protein
(typically from a tenth to a few microliters) is mixed with an equal volume of a ‘reservoir solution’,
which contains a precipitant of some kind (high salt or a crowding agent like polyethylene glycol), a
buffer to control pH, and possibly other compounds such as metal ions or organic compounds. Then
that mixed drop is hung upside down over the reservoir solution in a sealed chamber. Because the
precipitant is more concentrated in the reservoir, water evaporates from the protein drop and is taken
up by the reservoir. The protein solution thereby becomes more concentrated, and if the solubility
limit is exceeded the protein precipitates. Modern hanging drop experiments are
commonly set up by robotic liquid handling devices in 96 well plates. Many plates are typically set
up during the search for good crystals. With good luck, and assuming the protein or nucleic acid has
a sufficiently well-defined three-dimensional structure, crystals can be obtained. Examples of
crystals grown in hanging drops are shown. In most experiments, well-diffracting crystals have
linear dimensions in the 50 μm to 500 μm range. Crystals diffracting to good resolution have been
obtained for whole ribosomes, other giant complexes containing multiple protein and nucleic acid
subunits, membrane protein complexes, and numerous whole viral capsids including some with large
triangulation numbers. Size and complexity do not present a fundamental obstacle as long as a
defined three-dimensional structure is well-populated in solution. A final comment about protein
and nucleic acid crystals is that the molecules in these crystals are fully hydrated. In fact the water
content in typical protein crystals is in the 40% to 50% range. This is an important reason why the
structures obtained in the crystal state can be shown in most cases to be largely unaffected by crystal
formation, aside from local conformational effects where molecules contact each other in the crystal
lattice.