Lectures on Physical Biochemistry
Todd Yeates
© 2015
For Melissa
Preface
The following text derives from material presented in a course in physical biochemistry at UCLA
(Chemistry and Biochemistry 156). Much of the material owes its origin to lectures delivered by
other faculty members at UCLA who were my teachers and mentors. These include Emil Reisler,
Wayne Hubbell, and especially Doug Rees, for whom I served as a TA for multiple offerings of the
course while I was a graduate student. With regard to other books upon which the material rests,
the classic text Physical Biochemistry with Applications to the Life Sciences by Eisenberg and
Crothers stands out as the most influential. Other texts from which selected materials have been
extracted include: Physical Biochemistry by van Holde; Physical Chemistry: Principles and
Applications in Biological Sciences by Tinoco, Sauer, Wang, Puglisi, Harbison, & Rovnyak; The
Molecules of Life: Physical and Chemical Principles by Kuriyan, Konforti, and Wemmer; Molecular
Driving Forces by Dill; and Random Walks in Biology by Berg.
I am indebted to the students and TAs who have participated in the course over many years and have
made developing and teaching the material a stimulating and rewarding challenge. I am particularly
indebted to Sunny Chun, who proofread the first draft.
Contents
Chapter 1
Points for Review
o Thermodynamic systems
o Systems and surroundings
o The 1st law
o Work, w
o Heat, q
o Enthalpy, H
o The 2nd law
o Classical and statistical views of entropy
Entropy and the Distribution of Molecules in Space
Entropy and the Distribution of Molecules Among Energy Levels
Chapter 2
Entropy of Mixing and its Dependence on Log of Concentrations
o Stirling’s approximation
o ‘Entropy of Mixing’
Gibbs Free Energy, G
o A state variable that indicates the favorability (or equilibrium) of a process at
constant T & P
o ΔG as a balance of two factors, ΔH and TΔS
o How to think about ΔG in a steady state process
o Free energy of mixing: further insight into what drives processes towards
equilibrium
Chapter 3
Chemical Potentials, µ
o Definition of µ as a partial derivative of G with respect to composition
o Dependence of chemical potentials on concentrations and standard state chemical
potentials µ0
o The total differential, dG as a function of changes in composition
o Equilibrium conditions in terms of µ’s
o Equilibrium conditions in terms of concentrations and standard chemical potentials:
arriving at familiar equations for the equilibrium constant
o Importance of units
o Precautions about ΔG vs ΔG0, reactions with changes in stoichiometry, and overall
concentration effects
o The dependence of ΔG and K on T (van’t Hoff equation)
Chapter 4
Non-ideal behavior in mixtures
o The breakdown of ideal equations for chemical potential
o Activities and activity coefficients
o The ideal behavior of highly dilute solutions
o The origin of non-ideal behavior at higher concentrations
o Reworking the equilibrium equations in terms of activities instead of concentrations
Ion-ion interactions in solution as an example of non-ideal behavior (Debye-Hückel
theory)
o Ionic strength and the Debye length
o Activity coefficients for ionic species
o Using ionic activity coefficients to analyze the effect of charge on molecular
association, and electrostatic screening
Molecular crowding and excluded volume effects as an example of non-ideal
behavior in solutions of macromolecules
o The idea of excluded volume
o The peculiar behavior of rigid elongated structures
Chapter 5
Chemical Potential and Equilibrium in the Presence of Additional Forces
o Osmotic pressure
o Equilibrium sedimentation
Chapter 6
Electrostatic potential energy, ion transport, and membrane potentials
o The chemical potential energy of an ion at a position of electrostatic potential
o The Nernst equation and membrane potential
o The Donnan potential
o Variable ion permeabilities and complex phenomena
Molecular Electrostatics
o The dielectric value
o Simplified electrostatics equations
o A different kind of electrostatic energy: the Born ‘self-charging energy’
o Free energy of ion transfer
Chapter 7
Energetics of Protein Folding
o A balance between large opposing forces
o Terms that contribute to the energetics of protein folding
o The special case of membrane proteins
Measuring the Stability of Proteins
Ideas Related to How Proteins Reach their Folded Configurations
Chapter 8
Describing the Shape Properties of Molecules
o Radius of gyration
o Persistence length for flexible chains
Chapter 9
A Brief Introduction to Statistical Mechanics for Macromolecules
o Probabilities and expected values
o Statistical weights for outcomes with unequal probabilities
o Handling degeneracies
A Statistical Mechanics Treatment of the Helix-Coil Transition for a Polypeptide
Chapter 10
Cooperative Phenomena and Protein-Ligand Binding
o Relationship between cooperative behavior and processes involving formation of
multiple interactions simultaneously
o Protein-ligand binding equilibria
o Binding to an oligomeric protein – independent binding events, no cooperativity
o Non-linear Scatchard plots – non-identical or non-independent binding sites
o Experiments for measuring binding
o Phenomenological treatment of cooperative binding – the Hill equation
o Physical models of cooperative binding - MWC
Allostery
Chapter 11
Symmetry in Macromolecular Assemblies
o Definition of symmetry
o Mathematical groups
o Point group symmetries for biological assemblies
Special Topics in Symmetry
o Helical Symmetry (non-point group)
o Quasi-equivalence and the structure of icosahedral viral capsids
o Using symmetry to design novel protein assemblies
o Algebra for describing symmetry
Chapter 12
Equations Governing Diffusion
o Diffusion in 1-D
o General equations for diffusion
o Special topic: Using numerical (computational) methods to simulate diffusion
behavior
Chapter 13
The Diffusion Coefficient: Measurement and Use
o Measuring the diffusion coefficient, D
o Relating the diffusion coefficient to molecular size
Special Topic in Diffusion: Diffusion to Transporters on a Cell Surface
Chapter 14
Sedimentation velocity, v
o Sedimentation coefficient, s
o Combining s and D to get molecular weight without a spherical assumption
o A summary of molecular weight determination from sedimentation and diffusion
measurements
Chapter 15
Chemical Reaction Kinetics
o Reaction velocity, v
o Rate laws
o Integrating rate laws
o Behavior of more complex reaction schemes
o Numerical computer simulation of more complex reaction schemes
o Enzyme kinetics under a steady-state assumption
o Relaxation kinetics: how systems approach equilibrium
o Kinetics from single molecule studies
Chapter 16
Kinetic Theories and Enzyme Catalysis
o The Arrhenius equation
o Eyring transition state theory
o Catalysis by lowering the transition state energy
o Practical consequences of enzymes binding tightly to the transition state
o Kinetic parameters of natural enzymes
Chapter 17
Introduction to Biochemical Spectroscopy
o Energy transitions
o Fluorescence
o Kinetics of fluorescence and competing routes for return to the ground state
Chapter 18
Special Topics in Biochemical Spectroscopy
o Polarization and selection rules
o Fluorescence experiments with polarized light
o Fluorescence resonance energy transfer (FRET)
o FRET in biology
o Spectroscopy of chiral molecules: Optical rotation and circular dichroism
Chapter 19
Macromolecular Structure Determination and X-ray Crystallography
o The limiting effect of wavelength
o Diffraction geometry
o Obtaining the atomic structure
o Protein Crystallization
CHAPTER 1
Points for Review
Thermodynamic systems
We are all familiar with the everyday behavior of various kinds of mechanical systems. This often
aids us in understanding the behavior of molecules, which are indeed governed by the laws of
physics. But there are also key differences to bear in mind between the physical behavior of systems
at the macroscale and the thermodynamic behavior of molecular systems. Analogies can be drawn
to a bowling ball at the top of a hill. We know that if pushed it will go to the bottom of the hill and
(eventually) stay there. Once it has come to (apparent) rest we don’t worry about it suddenly moving
under its own internal energy to a higher location. Or likewise for a textbook sitting on a desk. We
wouldn’t think about measuring how far on average it finds itself levitating above its lowest energy
position on the desktop. But these sorts of ideas arise constantly in thinking about the behavior of
molecules. Why? The distinction is largely one of scale, having to do with relative sizes, forces, and
energies. The essence is that in molecular systems (at temperatures sufficiently above 0 K), the
magnitude of the thermal energy is comparable to energy differences associated with meaningful
differences in the properties of the molecules, such as their velocities and detailed three-dimensional
conformations.
We will emphasize throughout the course the importance of the idea of an ‘average thermal energy’,
which is kBT, where kB is Boltzmann’s constant (or alternatively RT when working with molar
quantities, where R is the universal gas constant, R = NAkB, and NA is Avogadro’s number). If we
accept the idea that physical objects exhibit energies on the scale of kBT, then how high might we
expect a 1kg textbook to levitate off the desktop under its own thermal energy at 298K? [Hint: equate
gravitational potential energy for the book at a height h with the energy value of kBT]. You will
(hopefully) find that that height is infinitesimally small, which is consistent with experience. But
owing to the much smaller energies that affect molecules, big or small, kBT is an energy sufficient to
drive the rapid movements, collisions, conformational changes, and chemical reactions that
characterize molecular systems.
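The hint can be carried through numerically. A minimal sketch in Python (values in SI units; the 1 kg mass and 298 K temperature come from the exercise above):

```python
# How high could a 1 kg textbook "levitate" on thermal energy alone?
# Equate gravitational potential energy m*g*h with the thermal energy kB*T.
kB = 1.380649e-23  # Boltzmann's constant, J/K
T = 298.0          # temperature, K
m = 1.0            # mass of the textbook, kg
g = 9.81           # gravitational acceleration, m/s^2

h = kB * T / (m * g)  # height at which m*g*h equals kB*T
print(h)              # ~4.2e-22 m, far smaller than an atomic nucleus
```

As expected, the height is infinitesimally small, which is why thermal motion is imperceptible for macroscopic objects but dominant for molecules.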
Systems and surroundings
In thermodynamics it is important to keep in mind what is being considered as the system under
investigation. Everything else is the surroundings. When discussing thermodynamic quantities (P, V,
U, …) we are referring to measurements and properties of the system, but depending on the situation
the surroundings may be important in exchanging energy (in the form of work or heat) or material
with the system. In a closed system, no exchange of material occurs. In an isolated system, there is
no exchange of material or heat or work. In some problems, the entire universe might comprise the
system. In that case there are no surroundings with which exchange might occur, so the universe
would follow the same rules as an isolated system.
The 1st law
The first law of thermodynamics expresses a law of energy conservation, namely that the energy
change in a system equates to whatever energy is delivered to it by (and thereby lost from) the
surroundings. Therefore, the change in energy (ΔU) of a system during some process is given by the
amount of heat that is transferred to it from the surroundings plus the amount of work done on it by
the surroundings.
ΔU = q + w
For an isolated system, q = 0 and w = 0, so ΔU = 0
The first law is relatively easy to appreciate, since we’re familiar with conservation laws in other
contexts, e.g. conservation of mass, or the conservation of total energy in mechanical systems. Besides
conveying an important conservation principle, the first law serves as a reminder about equations
for work and heat.
Work, w
You’ll recall from physics that work is force integrated over distance or displacement:
w = ∫F dx
Against pressure: F = PA and dV = A dx, so w = ∫PA(1/A) dV = ∫P dV [As written this work would have the
sense of work done by a system whose volume was changing.]
Against a harmonic spring with force constant k, w = ∫F dx = ∫kx dx = (1/2)kx²
And likewise for any situation where a function for the force on a molecule can be written (possibly
depending on position). We can integrate over position to give the work energy that would be done
on the molecule as a function of its position.
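As a quick illustration of the work integral, the spring case can be checked numerically; the values of k and the displacement in this sketch are arbitrary:

```python
# Numerically integrate F(x) = k*x over displacement with the midpoint rule
# and compare to the closed form (1/2)*k*x^2.
k = 2.0       # force constant (arbitrary units)
x_max = 1.5   # final displacement

n = 10000
dx = x_max / n
w = sum(k * (i + 0.5) * dx * dx for i in range(n))  # sum of F(x)*dx at midpoints

print(w, 0.5 * k * x_max ** 2)  # both give ~2.25
```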
Heat, q
Other things being held constant, we often associate heat transfer with temperature change, and the
heat capacity, C, relates those two changes. Recall
C = dq/dT (or Δq/ΔT) and
q = ∫C dT
The heat capacity is a measure of how hard it is to change the temperature by adding heat. From
introductory physical chemistry you’ll recall that for a monatomic ideal gas Cv = (3/2)R (on a per mole basis).
For complex molecules, the molar heat capacity is higher. For an ideal gas, the energy of the system
is associated solely with the kinetics (i.e. velocities) of the molecules (which are presumed rigid in an
ideal gas model). More complex molecules like biological macromolecules have very many ‘internal
degrees of freedom’, which are required to specify the positions and movements of atoms relative to
each other in the same molecule. Recall that macromolecules are subject to all kinds of
conformational fluctuations, mostly small but some very large. You might recall that the
equipartition theorem tells us that energy in the amount (1/2)kBT will be partitioned equally into each
of the quadratic degrees of freedom in a system, which means that systems comprised of complex molecules will
require more heat energy to raise the temperature owing to the much greater number of degrees of
freedom into which the energy gets partitioned.
Enthalpy, H
Note from the form of the equation ΔU = q + w that at constant volume (so no ‘PV’ work is done)
heat transfer q relates closely to internal energy U. [If w = 0 then ΔU = q, or dU = dq]
But if the volume is not constant, the heat transfer relates better to another thermodynamic state
variable, H, the enthalpy, given by H = U + PV
Differentiating H = U + PV gives
dH = dU + PdV + VdP = dq + dw + PdV + VdP
At constant P (dP=0) and with only ‘PV’ work, dw = –PdV [The sign appears negative here as the
work w must refer to the work done on the system, whereas our earlier equation for w had the
opposite meaning]
Substituting dw=-PdV into the previous equation, we see that at constant pressure and only PV work,
dH = dqP – PdV + PdV + 0, giving
dH = dqP (with the P subscript denoting what is held constant). So enthalpy and q are closely related
at constant P.
Note that especially for gases, where pressure and volume changes (e.g. as a function of temperature)
are substantial, U and H (which differ from each other by the term PV) may be substantially different.
But in other systems where pressure and volume changes are minimal, our intuition about what
enthalpy and internal energy mean tends to be closer. This is the case for many of the kinds of
systems like solutions of macromolecules that we will be thinking about, and in those cases a fair
view is that the enthalpy embodies all the kinds of molecular forces of attraction and repulsion
between molecules that we’re familiar with. And in terms of ‘favorable’ vs. ‘unfavorable’, a high value
of H implies high energy or unfavorable interactions, while a low value of H implies favorable
interactions.
This means that we can often learn something useful about the forces and interactions that exist in a
system, for example a purified protein in solution, by measuring enthalpy changes. An experimental
method known as differential scanning calorimetry is often used to make those kinds of
measurements. A sample is slowly heated and the heat transfer required to produce each small
incremental increase in temperature is recorded. The difference is taken relative to a blank, which
would contain the solution and buffer but not the protein. If performed at constant pressure, that
recorded quantity, dqP /dT, is the heat capacity at constant pressure, CP. And from above, dqP /dT =
dH/dT = CP, and dH = CP dT. That means ΔH for a process can be obtained by integrating the heat
capacity over the course of a temperature increase: ΔH = ∫CP dT
The example illustrated here is from
a thermal unfolding experiment on a
purified protein, carboxypeptidase A.
At low temperature the protein is
folded natively. At high temperature
the protein is unfolded. The
relatively flat parts of the curve in
those two regions are simply
reflecting the heat or enthalpy
change associated with increasing
the temperature (local vibrations for
example) of the protein. But the
region in the middle shows a
dramatic increase in the heat
capacity. This corresponds to the energy required to convert the folded protein to its unfolded form.
Favorable molecular interactions are broken in the process and the overall enthalpy change is
positive. The area of the shaded region is attributed to the ΔH for the protein unfolding transition.
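The integration step can be sketched in code. The Gaussian-shaped excess heat capacity below is synthetic stand-in data (not actual carboxypeptidase A measurements), chosen so that the area under the peak is a known value:

```python
# Recover the enthalpy of a thermal transition by integrating the excess heat
# capacity over temperature, using the trapezoid rule on synthetic data.
import math

T = [310.0 + 0.1 * i for i in range(400)]   # temperature scan, K
Tm, width, dH_true = 330.0, 2.0, 300e3      # assumed midpoint, peak width, J/mol
Cp = [dH_true / (width * math.sqrt(2 * math.pi))
      * math.exp(-0.5 * ((t - Tm) / width) ** 2) for t in T]  # excess Cp, J/(mol K)

dH = sum(0.5 * (Cp[i] + Cp[i + 1]) * (T[i + 1] - T[i])
         for i in range(len(T) - 1))        # trapezoid-rule area under the peak
print(dH)                                   # ~3.0e5 J/mol, the known area
```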
The Second Law
The second law presents much greater challenges to understanding than the first. Rather than stating
a law of conservation, it defines a directionality in which processes will naturally proceed (in time).
In that sense the second law enforces the ‘arrow of time’. The second law tells us that the total
entropy S in the universe (i.e. for any system plus its surroundings) is always increasing in time. And
this is likewise the case for an isolated system (since it can be viewed as its own universe).
So, for a spontaneous process (i.e. a process that would occur in the forward direction) occurring in
an isolated system, or for the universe as a whole, ΔS > 0. Likewise, ΔS = 0 describes a process at
equilibrium, i.e. with no net conversion forward or backward. In that sense the condition of
equilibrium can be seen as an optimization problem. At equilibrium S is a maximum and dS = 0 with
respect to forward or backward progress of the imagined process.
It is often tempting to forget about the conditions or restrictions under which various
thermodynamic equations hold true, but it is vital to understand that the equation ΔS > 0 (for
spontaneous occurrence) requires the condition of an isolated system. In fact failure to understand
this vital requirement is the source of much confusion among the public and lay-scientists about
whether the development of life on Earth and the associated increase in order and molecular
complexity – an idea we will tie to entropy shortly – violates the second law of thermodynamics (and
thereby requires a creator). The folly in the argument is that the Earth is by no means an isolated
system, and in fact the delivery of light energy from the Sun to the Earth to drive photosynthesis is
essential for the chemical conversions that support life on Earth.
Classical view of entropy
From classical thermodynamics you learned that dS = dqrev/T, where dqrev is the heat transferred during
a reversible infinitesimal step in a process. This view of entropy is extremely useful for
understanding processes of heat transfer and expansion in gases. We learn that entropy increases
when gas volumes expand, and when heat is transferred from a hotter object to a colder object; those
processes are naturally favorable or spontaneous.
Statistical description of entropy
A way of stating the second law from a statistical thermodynamics view is that processes tend toward
maximum disorder or randomness, i.e. to configurations that can be realized in the greatest number
of ways. This view can be reconciled intuitively with the classical view – gas expansion allows for
greater freedom and less order with regard to the positions of atoms, and heat transfer from hot to
cold molecules decreases the order in the sense that the distinction between some molecules having
more thermal energy than others is removed. The intuitive relationship between the classical and
statistical views of entropy can be formalized mathematically but we will not attempt that here.
Instead, without further proof the statistical view of entropy is
S = kB lnW
where W is a measure of disorder or randomness that can be interpreted as the number of distinct
configurations that correspond to a given state. This is sometimes referred to as the number of
microstates. [Note that some texts use Ω instead of W].
In this view, the requirement that entropy increases means that favorable states are those that can
be realized in the greatest number of ways. We will set up some highly simplified problems to see
how the statistical view of thermodynamics helps explain some basic molecular phenomena.
Entropy and the distribution of molecules in space
Let’s look at what the statistical view of entropy tells us about the way molecules tend to be
distributed in space. We’ll first consider a very tiny problem, too tiny really to qualify as a proper
thermodynamic system, but still informative. Suppose we have 4 molecules or particles that are
identical, but we can label or number them to make it possible to distinguish between microstates
within a given state. The system consists of a box with two chambers, a left side and a right side.
Suppose we describe the state of the system according to the number of molecules that are on the
left side vs the right side. We can let nL be the number of particles on left side and nR be the number
on the right. For each possible state (i.e. a defined number of molecules on each side), we can
enumerate the number of distinct ways or microstates (W) by which each state can be achieved by
choosing distinctly labeled molecules.
For some of the states, the value
of W is obvious enough. For
example, for state B, which has
just one molecule on the right
(nL=3), any of the 4 molecules
can be chosen to place on the
right, and so W=4. Likewise for
state D for which nL=1. The
case of nL=2 is harder. How
many ways can we divide or
partition a group of four objects
into a first subset of 2 (to place
on the left) and a second subset
of 2 (to place on the right)? The answer is 6, which comes from 4!/(2!2!) = 24/(2*2) = 6. This is a
combinatorial expression closely related to the permutation equation that says the number of ways
of ordering n objects is n factorial, or n!. Why in the case above do we divide 4! by 2! and 2!? One
way to see this is as follows. How many ways can 4 objects be ordered (e.g. in a line)? The answer is
4! or 24. Now let’s say that each of these 24 ways of writing down the molecules in order (e.g. 3 1 2
4) automatically assigns two to the left side (in this case 3 and 1) and two to the right side (2 and 4).
But you can see that the total set of 24 possible orderings overcounts the number of distinct outcomes
in the sense that there are other orderings that give the same partitioning. For example (1 3 2 4) is
the same partitioning as (3 1 2 4). If the same two particles are on the left, their separate ordering is
irrelevant. Since the state in question has two molecules on the left, and the number of ways of
ordering 2 objects is 2! or 2, we need to divide the total number of 24 orderings by 2. For the same
reason, the ordering of molecules within the right side doesn’t matter either, and so we must divide
again by 2! This gives us the value 6 we expect.
Thinking about problems like this in terms of partitioning between 2 (or more) groups is powerful.
The general equation for the number of distinct partitionings of N objects between a first group of n1
and a second group of n2 (with n1 + n2 = N) is
W = N!/(n1! n2!)
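The counts worked out above are easy to verify with Python's built-in binomial coefficient:

```python
# Check W = N!/(n1! n2!) for the four-molecule, two-chamber problem.
from math import comb

W_values = [comb(4, nL) for nL in range(5)]  # nL = 0, 1, 2, 3, 4
print(W_values)                              # [1, 4, 6, 4, 1], as in the text

# The 5-card-hand exercise: partitioning 52 cards into 5 dealt and 47 not dealt
hands = comb(52, 5)
print(hands)                                 # 2598960 distinct hands
```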
An equation of this form shows up throughout statistics applications. In typical statistics jargon, the
number of possible combinations for “N choose m” is NCm = N!/(m! (N-m)!), which matches the
equation above. The basic partitioning idea applies to many problems. How many different 5-card
hands can be dealt from a 52-card deck in which the cards are all considered distinct from each other?
[Hint: being dealt 5 cards is really just partitioning the 52 cards into the 5 you get and the others you
don’t get; the order in which you get dealt the cards doesn’t matter here.]
As an aside, another common type of probability problem (which also shows up in molecular
problems) involves a series of independent choices, and there the total number of possible outcomes
is n1*n2*n3*…. where the n’s describe the number of distinct options that are available to choose at
each step. Often the two types of probability problems are related to each other. Consider a variation
on the 4 molecule problem above. Suppose we want to know the total number of different ways the
4 molecules can be placed into two chambers, allowing all possibilities for the number of molecules
on each side, and as before not distinguishing between the positions of particles within the same
chamber. This can be answered by seeing that it amounts to making an independent choice for each
molecule about whether it will go on the left or right. So there are 2 choices, made 4 independent
times, which is 2*2*2*2 = 16. You’ll note that the answer to this problem counts up altogether the
number of different partitionings, so it is not a coincidence that the values for W in the original
problem (1, 4, 6, 4, 1) sum to 16.
Returning to the problem of how molecules tend to distribute themselves in space, four molecules is
perhaps too small to give a clear picture of significance, so let’s go slightly bigger to N=6, again
treating the problem of how the molecules can be partitioned into two sides. For nL = {0, 1, 2, 3, 4, 5,
6}, we get respective values for W of {1, 6, 15, 20, 15, 6, 1}. You may begin to recognize the coefficients
as those from Pascal’s triangle. What does this tell us? Assuming there are no energetic differences
at play and each of the 6 molecules is free to occupy either chamber, then the likelihood of the system
being in any given state is proportional to the number of microstates, W. That means that it is 20
times more likely by chance for there to be three molecules in each chamber compared to the case
where all 6 molecules are on the left. Evidently, the most likely scenario is the one where the
molecules are evenly distributed with three on each side. The same trend applies, and becomes more
dominant, as the size of the system N increases. The basic conclusion is that entropy drives things
towards a uniform distribution of molecules in space, i.e. equal concentrations everywhere, assuming
the absence of energetic differences.
The behavior of this problem as N gets large is also instructive. The plots show the probabilities one
gets for the distribution of molecules between the two sides of a system for N=6, then N=100, and then
N=10,000.
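The probabilities behind plots of this kind can be recomputed directly from the binomial counts; a brief sketch:

```python
# P(nL) = comb(N, nL) / 2**N for N molecules free to occupy two chambers.
# The mean is N/2 and the standard deviation matches 0.5*sqrt(N).
from math import comb, sqrt

for N in (6, 100):
    p = [comb(N, n) / 2 ** N for n in range(N + 1)]
    mean = sum(n * pn for n, pn in enumerate(p))
    std = sqrt(sum((n - mean) ** 2 * pn for n, pn in enumerate(p)))
    print(N, mean, std, 0.5 * sqrt(N))  # std agrees with 0.5*sqrt(N)
```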
Returning to the case of N=6, one sees that the state with a uniform distribution of molecules is the
most likely, but the chances of significant deviations from that arrangement are substantial. With
molecules that are free to move around, the system will be found with all the molecules on one side
or the other about one time in 32 (2 of the 64 equally likely microstates). As N gets larger, the likelihood of substantial variations (on
a relative scale) goes down. As N gets larger the discrete combinatorial plot turns into a smooth
Gaussian function. The most likely outcome is still where nL is N/2. The standard deviation from the
most likely value for nL is (from earlier courses in statistics) 0.5*sqrt(N). So for example, if N=100,
the most likely value for nL is 50, but with a standard deviation of 5. What about when N = NA =
6.02*1023? There the standard deviation would be a large number (3.9*1011), but in fractional terms
compared to NA, the variation is minute. That is, the expected fraction of molecules on the left would
be 0.5 +/- 6*10-13. This is a general finding; for large thermodynamic systems the behavior of the
system tends to be dominated by the most likely scenario. On the other hand it is important to note
that the kinetic (time-dependent) behavior of a system often depends on the frequency of
perturbations away from the most probable arrangement.
Entropy and the distribution of molecules among energy levels
The same kind of treatment can be used to analyze how the energy in a system tends to be distributed
among the molecules present. Again, for numerical simplicity we’ll first treat a tiny system just big
enough to gain some insight. Suppose we have a system in which 4 identical molecules are each able
to exist in a series of discrete energy levels (in arbitrary units, E=0, E=1, E=2, …). And further suppose
that the total energy is fixed at ET=3. What are the possible ways that the 4 molecules can be placed
into the available energy levels? Note that nothing prevents multiple molecules from having the same
energy. For this tiny system there are only 3 different states or configurations of molecules among
energy levels subject to the restriction that ET=3. They are shown below, labeled states A, B, and C.
What is the value of W for each state? That is, for each energy configuration, how many different
ways could the molecules satisfy that configuration?
The answer is to think of this as a
partitioning problem. For state A, the 4
molecules are being partitioned into a
subset of 3 that will have energy 0, and
a subset of 1 that will have energy 3. For
that case, W = 4!/(3! 1!) = 4. For state B
we get W=4 also. For state C we must
first generalize our previous equation
for the number of combinations or
partitionings. When a partitioning
occurs into more than 2 subsets, the
equation for W generalizes to
W = N!/(n1! n2! n3! …),
where the small n’s refer to the number of molecules in the different subsets. [For completeness, also
remember that 0!=1 so empty subsets can be ignored]. So, for state C, W = 4!/(2! 1! 1!) = 12. What
we glean from this tiny test case is that the most likely situation (i.e. where W is greatest) is where
the molecules are spread out to some degree among the available energy levels, with the lowest
energy being the most populated.
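The generalized count is easy to check with a small helper function. The exact occupancies assigned to states A and B below are assumptions consistent with the W values given, since the figure showing them is not reproduced here:

```python
# W = N!/(n1! n2! n3! ...) for partitioning N molecules among energy levels.
from math import factorial

def multinomial(*ns):
    """Number of distinct ways to partition sum(ns) labeled objects
    into subsets of the given sizes."""
    W = factorial(sum(ns))
    for n in ns:
        W //= factorial(n)
    return W

print(multinomial(3, 1))     # state A: 3 molecules at E=0, 1 at E=3 -> 4
print(multinomial(1, 3))     # state B (assumed): 1 at E=0, 3 at E=1 -> 4
print(multinomial(2, 1, 1))  # state C: 2 at E=0, 1 at E=1, 1 at E=2 -> 12
```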
Simulating exchange of energy between molecules in a closed system
The behavior of slightly larger systems can be analyzed by random simulations with rather
remarkable results. Suppose now we have a set of 50 molecules, and for the sake of argument
suppose the average energy is 1 so that ET = 50. We can set up an initial system where all 50
molecules sit at energy level 1. Then, molecules exchange energy between themselves, as might
result from collisions for instance. The details of the execution are important. Pick two molecules
at random, one whose energy will go down by one unit and the other whose energy will go up by one
unit. Do this over and over. But note the caveat that if the first molecule randomly chosen is already
at energy level 0, then throw out this energy exchange trial and repeat again; i.e. the energy of a
molecule can’t drop below the lower bound. If one performs this kind of random simulation, one
finds remarkably that the system will tend towards an energy distribution of the type noted above.
No other tricks are required. An example result of random simulation for N=20 is shown below.
For larger N, the simulation begins to produce a smooth distribution. Examples for N=320 and
average energy =1 and 2 are shown below.
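A minimal version of this simulation takes only a few lines; the sketch below uses the 50-molecule, average-energy-1 setup described above (the number of trials and the random seed are arbitrary choices):

```python
# Random energy exchange between molecules with a lower bound at E = 0.
# Repeated pair exchanges drive the system toward a decaying (Boltzmann-like)
# distribution over energy levels.
import random

random.seed(0)
N = 50              # number of molecules; total energy ET = 50, average 1
E = [1] * N         # all molecules start at energy level 1
for _ in range(200000):
    i, j = random.randrange(N), random.randrange(N)
    if i != j and E[i] > 0:   # discard trials that would push E below 0
        E[i] -= 1             # one molecule loses a unit of energy...
        E[j] += 1             # ...and the other gains it

levels = {}                   # tally occupancy of each energy level
for e in E:
    levels[e] = levels.get(e, 0) + 1
for e in sorted(levels):
    print(e, levels[e])       # counts fall off with increasing energy
```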
The exercise demonstrates that the random tendency of molecules to spread out among available
energy levels while also being subject to the constraint of a lower energy bound naturally gives rise
to a smooth distribution where the lowest energy is most-populated, and then the distribution falls
off at higher energy. The diagrams above have the energy level going up vertically, and the frequency
with which molecules are found at that energy indicated by a horizontal bar. This can be flipped
around to give a more typical plot showing the probability (or abundance) of molecules on the
vertical axis and the energy value indicated on the horizontal axis. Doing this produces familiar plots
that show an exponentially decaying curve for the probability that any given molecule will have
energy E. This is the Boltzmann distribution.
The Boltzmann distribution teaches some important principles. There are fewer and fewer molecules with higher and higher energies. But there are some, and how many of these higher-energy molecules there are is essential for understanding rates of processes that depend on overcoming an energy barrier. Another key feature of the Boltzmann distribution concerns how sharply the probability falls off as a function of energy. According to the Boltzmann equation, that fall-off is governed by the denominator of the exponent (kBT, a term we alluded to before). Specifically, we can ask what the ratio is between probabilities for two energy levels separated by kBT. Call those probabilities P(E) and P(E+kBT). With a little algebraic manipulation we find that

P(E+kBT)/P(E) = exp(–(E+kBT)/kBT)/exp(–E/kBT) = exp(–1) = 1/e
This is a powerful simplifying statement. It tells us that kBT is the amount of energy difference that
corresponds to a drop in probability by a factor of e (which is about 2.7). The ‘thermal energy’ value
kBT is therefore the key quantity for comparison when evaluating whether two possible
configurations of a system that are separated by some given energy difference will be populated
similarly or very differently. The value of kBT is such a useful quantity for comparison that an energy
difference will sometimes be stated in terms of how many kBT units it is (which is effectively the same
as stating the value of the unitless exponent E/kBT above). For example, one might hear,
“conformation A is ‘2 kay – tee’ higher in energy than conformation B”.
Finally, always keep in mind that kBT and RT convey equivalent meanings; they simply differ by a
factor of Avogadro's number, NA. RT must be used if the energy values are being described on a per
mole basis. The context and units assigned to the energy should make it clear which is being used.
For convenience, RT (at 298 K) is about 2500 J/mol, i.e. 2.5 kJ/mol; the value is also sometimes given
in non-SI units as about 0.6 kcal/mol.
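As a quick numerical check (my own illustration, taking RT ≈ 2.5 kJ/mol), the relative population of states separated by one or two units of thermal energy can be computed directly:

```python
import math

RT = 2.5  # kJ/mol, approximate thermal energy per mole at 298 K

def population_ratio(delta_E_kJ_per_mol):
    # Boltzmann factor: relative probability of a state higher in energy by delta_E
    return math.exp(-delta_E_kJ_per_mol / RT)

print(population_ratio(2.5))  # one "kay-tee" higher: 1/e, about 0.37
print(population_ratio(5.0))  # two "kay-tee" higher: 1/e^2, about 0.14
```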
CHAPTER 2
Entropy of mixing and its dependence on log of concentrations
Stirling’s approximation
We begin with a preliminary equation, Stirling’s approximation. As we saw before, various
calculations having to do with the statistical interpretation of entropy lead us to factorial expressions,
n!. Such numbers become intractable to evaluation as n gets large; how would you actually figure out
what a billion factorial was, or the factorial of Avogadro's number? Stirling's equation gives us an
approximation for the natural log of a factorial expression; from there one could exponentiate if
necessary to get an approximation for the value of n!, but we’ll see it is typically the log of the factorial
expression that we want anyway. Stirling’s approximation is as follows:
ln(N!) ≈ N * (ln(N) – 1)
Here is how close the approximation is:
N         Actual value of ln(N!) evaluated as ln(1) + ln(2) + … + ln(N)      Stirling's approx., N(ln(N) – 1)
1,000     5912.1                                                             5907.7
10^6      12,815,518                                                         12,815,510
At least in terms of relative proportion, you’ll see that the error becomes very small as N gets large.
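The comparison in the table is easy to reproduce. Here is a short Python check (my own; it uses the standard library's log-gamma function in place of the term-by-term sum, since ln(N!) = lgamma(N+1)):

```python
import math

def exact_ln_factorial(n):
    # ln(N!) computed essentially exactly via the log-gamma function
    return math.lgamma(n + 1)

def stirling_ln_factorial(n):
    # Stirling's approximation: ln(N!) ~ N(ln(N) - 1)
    return n * (math.log(n) - 1)

for n in (1_000, 1_000_000):
    exact, approx = exact_ln_factorial(n), stirling_ln_factorial(n)
    print(n, round(exact, 1), round(approx, 1),
          f"relative error {(exact - approx) / exact:.2e}")
```

The relative error shrinks rapidly with N, consistent with the table.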
‘Entropy of Mixing’
A simple exercise illustrates the dependence of entropy (and subsequently other energetic terms) on
the natural log of concentrations. Suppose you have a system with two chambers and it contains
molecules of two types (black and white for the sake of illustration). Suppose there are n1 black
molecules and n2 white molecules and consider a starting configuration where the n1 black molecules
are all on the left and the n2 white molecules are all on the right. Now we want to consider what
change in entropy would be associated with a process whereby the molecules could mix together so
that black and white molecules might occupy either side, as illustrated below.
From before we know that S = kB ln(W), so analyzing the change in entropy, ΔS, boils down to figuring out what W is for the initial state and the final state. There are different ways of treating this problem, but one is to think of it as a partitioning problem like before. Imagine beginning with n1 + n2 = N molecules together in a bag. Then to set up the system you are going to partition the N molecules into a group of n1 to go on the left side and a group of n2 to go on the right. As we discussed before, there are many different ways of partitioning a large set into two smaller groups, but in order to obtain the initial setup shown, only one of the possible partitionings satisfies the requirement that all the molecules in the first group are black and all those in the second group are white. So for the initial state, Winitial = 1. Now for the final state of the system. There we have agreed that the molecules can be on either side regardless of type. For this particular problem we are going to assume that upon mixing we still keep n1 molecules on the left and n2 molecules on the right, so for the final state we are partitioning the N molecules into n1 on the left and n2 on the right, but any of the possible partitionings is allowed. That means Wfinal = N!/(n1! n2!).
Therefore, the entropy change for mixing is

ΔSmix = kB ln(Wf) – kB ln(Wi) = kB ln(Wf/Wi) = kB ln(N!/(n1! n2!))

Now you'll see why we began with Stirling's approximation, so we can replace the logs of factorial
expressions with algebraic quantities that can be manipulated and evaluated.

From above,

ΔSmix = kB ln(N!/(n1! n2!))
= kB (ln N! – ln n1! – ln n2!)
≈ kB (N(ln N – 1) – n1(ln n1 – 1) – n2(ln n2 – 1))   (then noting that –N + n1 + n2 = 0)
= kB (N ln N – n1 ln n1 – n2 ln n2)   (then rewriting N as n1 + n2)
= kB ((n1 + n2) ln N – n1 ln n1 – n2 ln n2)   (then rearranging and taking out a negative sign)
= –kB (n1 (ln n1 – ln N) + n2 (ln n2 – ln N))
= –kB (n1 ln (n1/N) + n2 ln (n2/N))

Now, if we use mole fraction Xi as a concentration to replace ni/N,

ΔSmix = –kB (n1 ln X1 + n2 ln X2), or more generally for more species

ΔSmix = –kB Σi ni ln Xi, which is always positive
Basic conclusions from this exercise are that entropy increases by mixing, and that entropies depend
on the logs of concentrations (here expressed as mole fractions). As a further insight, noting that ln
x always goes down with lower values of x, we sense that the drive toward maximum entropy favors
every species going to a lower and lower concentration. But of course the total conservation of atoms
constrains things, making equilibrium effectively a fight over which species is driven most strongly
to lower concentration.
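The result can be checked numerically for a modest system. The sketch below (my own) compares the exact counting result kB ln(N!/(n1! n2!)) against the Stirling-based formula derived above, working in units of kB:

```python
import math

def mixing_entropy_exact(n1, n2):
    # dS/kB = ln( N!/(n1! n2!) ), evaluated with the log-gamma function
    N = n1 + n2
    return math.lgamma(N + 1) - math.lgamma(n1 + 1) - math.lgamma(n2 + 1)

def mixing_entropy_formula(n1, n2):
    # dS/kB = -(n1 ln X1 + n2 ln X2), the Stirling-based result derived above
    N = n1 + n2
    return -(n1 * math.log(n1 / N) + n2 * math.log(n2 / N))

print(mixing_entropy_exact(500, 500))    # exact counting, ~689.5
print(mixing_entropy_formula(500, 500))  # 1000 ln 2, ~693.1
```

Even at N = 1000 the two values agree to within about half a percent, and the agreement improves with larger N.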
Gibbs free energy, G
A state variable that indicates the favorability (or equilibrium) of a process at constant T & P
Which way processes proceed naturally (i.e. forward or backwards) is established by the total
entropy of the system plus surroundings, or for an isolated system only the entropy of the system
needs to be considered. But this restriction can be removed and replaced with other more convenient
ones by constructing other state variables from a combination of S and other quantities. For many
applications in biochemistry, temperature and pressure do not change much. A state variable G, the
Gibbs free energy, which is constructed as G = H – TS, has the property of dictating the directionality
of a process in a system at constant temperature and pressure (the surroundings no longer require
consideration). A little algebra can demystify this claim. Beginning by differentiating G = H – TS,
dG = dH – TdS – SdT   (then using the derivative of H = U + PV, dH = dU + PdV + VdP)
= dU + PdV + VdP – TdS – SdT   (then substituting the derivative dU = dq + dw)
= dq + dw + PdV + VdP – TdS – SdT

At constant T and P we can drop two terms (VdP and SdT). And if the only work is PV work, then dw = –PdV, giving
dG = dq – TdS. Then, if the process is occurring reversibly, meaning it is not being driven forward or
backward, then from the classical treatment of entropy we recall that dS = dqrev/T, and dqrev = TdS.
Substituting then gives us

dG = 0   (reversible or equilibrium process at constant T and P)
Furthermore, the directionality of a process that is not at equilibrium is dictated by the sign of dG or
ΔG, in the same way that the sign of ΔSsystem + ΔSsurroundings dictated the directionality of a process in our
earlier discussions, but now with a reversal of sign. Noting the negative sign that applies to S in the
expression G = H – TS, we conclude that

dG < 0 for a process that occurs spontaneously (in the forward direction).
That is, processes at constant temperature and pressure are driven to minimum free energy, G.
G as a balance of two factors, H and TS
It is helpful to bear in mind that, from the form of G = H – TS, the free energy (which dictates the
directionality of processes) is affected by two terms. Converting the equation for G to a form that
describes the difference or differential between the ‘before’ and ‘after’ or left vs right sides of a
process,
dG = dH – TdS, or ΔG = ΔH – TΔS   (note that we have dropped a term SdT that would have been
present from differentiation, since we are considering a process at constant T)
Evidently the drive to minimum free energy is a combined drive (1) toward low enthalpy H (recall
that H embodies the energetics of molecular forces and interactions between molecules, with lower
values of H corresponding to energetically favorable configurations or lower amounts or
concentrations of molecular species that have high energy), and (2) towards high entropy S (meaning
more randomness and disorder, including more uniform or equal concentrations).
How to think about G in a steady state process
In discussions of how state variables, like H or G for example, are changed in a process, what is
sometimes being described is a before-and-after scenario: for example, a calculation of what ΔH is
for converting a mole of pure substance A into a mole of pure substance B. [We could look up the
molar enthalpies for the two compounds in a table.]
While the values of those quantities are important in evaluating the thermodynamic properties of a
process, this is rarely the sense in which things like changes in free energy, ΔG, are considered in
biochemical processes. If we are talking about the free energy change ΔG for conversion of citrate to
isocitrate in the cell, we are thinking about the conversion of the substrate to the product at whatever
their concentrations are, and those concentrations are not changing. Contrast that with the earlier
scenario where the composition and concentrations of the initial and final states are entirely
different. In biochemical systems where the concentrations of substances are being held roughly
constant by pathways and networks of reactions occurring together, it is clearer to think of
infinitesimal conversions of substrate to product. There can be a change in free energy in such a
process in the sense that the product and the reactant may have different free energies associated
with them (which depends on their concentrations as we shall see later), and we are creating more
molecules of the product and fewer molecules of the reactant, but without any substantive change in
composition or concentrations. Of course in order to express the magnitude of the free energy change
for the process of interest, we have to express it as a quantity with a meaningful scale for the
conversion that is occurring. So we express things like the free energy change for a process on a per
mole quantity, though for conceptual clarity we should keep in mind the infinitesimal or differential
nature of the process we are considering.
Free energy of mixing and the dependence of G on log of concentrations
We can return to our earlier treatment of mixing and now calculate the free energy of mixing in the
same way.
From the definition G = H – TS, ΔGmix = ΔHmix – TΔSmix. Now, if molecules of different types
interact with each other in a way that is energetically similar to the way molecules of like type
interact, then it should be safe to say that there shouldn't be any enthalpy change associated with
mixing (based on our intuition that enthalpy is about molecular forces and interactions). So, letting
ΔHmix be zero and using our previous equation for the entropy of mixing, we get

ΔGmix = RT Σi ni ln(Xi)   where the ni are in moles and R reflects 'per mole' quantities

Consistent with earlier discussions, we see that different species contribute to the total free energy
of the system according to the logs of their individual concentrations. Also note that ΔGmix will always
be negative, consistent with our expectation that the free energy of mixing should be favorable.
The finding that the free energy of mixing is negative (favorable) gives us insight into what drives chemical reactions to their equilibrium positions. Suppose we start with a system containing only chemical A, and there is a reaction A can undergo to form B, and that the energetics of molecules A and B are identical; as one instance suppose the two molecules are enantiomers (equivalent in structure except for handedness). We know from intuition and perhaps experience
that a system like this will proceed by reaction until the two species are present in equal amounts or
concentrations. But why? If A and B have the same energy, then what could drive the conversion?
Wouldn’t it be simpler if the molecules just stayed as A since the energy is not improved by converting
to B? The answer of course has to do with entropy, and specifically the contribution entropy makes
to the free energy of mixing. This imaginary scenario helps us draw a connection between (1) the
chemical conversion to reach equilibrium and (2) mixing of different components. Suppose we take
the initial system composed of only A and then imagine a hypothetical divider down the middle. Now
imagine converting all the material on the right side from A to B. Clearly the entropy, enthalpy, and
free energy of that process are all zero based on our supposition about the energetic equivalence
between A and B. Now, in a second step we can imagine that the contents of the two sides are able
to mix. This will result in a mixed system with equal amounts of A and B, and the free energy of the
mixing would be negative (following from the equation above). The two steps put together produce
exactly the same result as if there was chemical conversion of half the A molecules into B molecules
in the whole system. This thought exercise lets us see that the favorability of converting some
amount of a pure substance into other substances to reach equilibrium derives from a favorable
entropy of mixing.
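The thought experiment can be made quantitative. The short sketch below (my own illustration, taking RT ≈ 2.5 kJ/mol) computes the mixing free energy per mole of total material as a function of the fraction f of A converted to B, and confirms that the minimum falls at f = 0.5, i.e. equal amounts of the two enantiomers:

```python
import math

RT = 2.5  # kJ/mol at ~298 K

def g_mix_per_mole(f):
    # dGmix per mole of total material = RT (f ln f + (1-f) ln(1-f))
    # for A <-> B with identical intrinsic energetics
    terms = [x * math.log(x) for x in (f, 1 - f) if x > 0]
    return RT * sum(terms)

# scan conversion fractions and find the free energy minimum
fractions = [i / 100 for i in range(1, 100)]
best = min(fractions, key=g_mix_per_mole)
print(best, round(g_mix_per_mole(best), 3))  # minimum at f = 0.5, i.e. -RT ln 2
```

At f = 0.5 the free energy of mixing is –RT ln 2 ≈ –1.7 kJ/mol of material, which is exactly the entropic driving force described above.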
CHAPTER 3
Chemical potentials, µ
From before we have an understanding that the free energy of a system composed of a mixture of
chemicals depends on the concentrations of the components, and if a chemical process is possible
that would interconvert some components into others, then there is a free energy change associated
with that process.
From previous courses you will likely remember equations of the following form:
ΔG = ΔG0 + RT ln(Q), and letting K = exp(–ΔG0/RT) gives ΔG = RT ln(Q/K)

where ΔG0 expresses the (molar) free energy change for a reaction if it were occurring under
standard state conditions, Q represents the ratio of product concentrations to reactant
concentrations under the conditions where the reaction is being considered, and the equilibrium
constant K is the ratio of product to reactant concentrations at equilibrium. Below we will show how the
equations above can be obtained, and perhaps better understood, by taking a differential or
infinitesimal view of any reaction or process underlying the conversion of molecules from one
species to another or from one location or phase to another.
Definition of µ as a partial derivative of G with respect to composition
‘Chemical potentials’ are differential or derivative quantities that help us get at the free energy of a
mixture (i.e. a system with multiple components). Since a mixture is just a combination of separate
components, it makes sense to consider what free energy is contributed to the mixture by each
component separately. A way of looking at that question is to consider how much the free energy of
a system would be changed by adding a tiny, infinitesimal amount of a particular component. That idea is shown on the right, where the chemical potential of a given component, i, is defined as the partial derivative of G with respect to the change in the amount of that component:

µi = (∂G/∂ni)T,P,nj≠i

Note that despite the chemical potential being a differential related to an infinitesimal change, it is expressed as a per mole quantity.
Dependence of chemical potentials on concentrations and standard state chemical potentials, µ0
The free energy in a mixture depends on the natural log of the concentrations, so naturally we expect
to see a similar dependence of µ on concentration. The total free energy for a mixture should be the
sum of the free energies of the pure components (weighted of course by the amount of each
component), plus the free energy of mixing, since starting with pure components separately and then
mixing them (obviously) gives us a mixture.
Gtotal = Σi ni Ḡi∎ + RT Σi ni ln Xi = Σi ni (Ḡi∎ + RT ln Xi)
where the first term in the sum relates to the free energies of the pure components and the last term
describes the free energy of mixing. The bar over the G indicates a per mole quantity and the solid
symbol as a superscript indicates reference to the pure component. Now we can evaluate the
chemical potential for component i as a partial derivative of G with respect to ni:
µi = ∂G/∂ni = µi∎ + RT ln Xi
where we have replaced the free energy of the pure component on a per mole basis (Ḡi∎) with the
chemical potential of the pure component (µi∎); their meanings are equivalent. As expected, we see
that the chemical potential of each species depends on the log of its concentration, and that the
chemical potential goes down (i.e. becomes more energetically favorable) as the concentration goes
down.
The total differential, dG as a function of changes in composition
Now that we have an expression for how the chemical potentials depend on concentration, we can
turn to look at how the total free energy depends on changes in the quantities of the various
components. We note that T and P are the natural variables for G, and that G also depends on
composition, i.e. the ni’s. Taking G as a function of T, P, and the ni’s, we can write out the total
differential for G as:
dG = (∂G/∂P)T,ni dP + (∂G/∂T)P,ni dT + Σi (∂G/∂ni)T,P,nj≠i dni

Replacing the partial derivatives with the correct thermodynamic quantities gives:

dG = VdP – SdT + µ1dn1 + µ2dn2 + …
Then, at constant T and P, we see that the change in free energy arising from a change in composition
(i.e. a chemical conversion of some molecules to others, or movement of molecules from one place to
another) is given by:
dG = Σi µi dni
There is much sense to this equation. The total differential free energy change takes into account the
(differential) compositional change for each component (dn) multiplied by the chemical potential of
each component (µ). We get a general sense then that dG will be negative (i.e. a favorable process) if
molecules with higher chemical potentials are converted to molecules with lower chemical
potentials. Furthermore, if a process is at equilibrium then the chemical potentials of the molecules
that would be created and those that would be consumed should be equally balanced in order for dG
to be equal to 0.
Equilibrium conditions in terms of µ’s
From above,
Σi µi dni = 0   for a process at equilibrium.
This is a powerful equation for analyzing all kinds of processes, from chemical reactions (where
chemically distinct molecules are able to interconvert) to transport processes (where a molecule of
a given type is able to move from one place to another or from one phase to another).
Phase or transport equilibrium
The diagram at the right illustrates equilibrium involving
partitioning of a component between two different phases.
You are familiar with processes like this from organic
chemistry where you partitioned a compound between an
aqueous phase and an organic phase (e.g. in a separatory
funnel). How does the differential free energy change, dG, in
this case depend on the process under consideration
(specifically transport of molecule A from phase 1 to phase 2)?
From above, dG = µA,1 dnA,1 + µA,2 dnA,2, where the subscripts denote the chemical species (which
doesn't change here) and the phase where it occurs. At equilibrium, dG = 0, so µA,1 dnA,1 + µA,2 dnA,2 =
0. Then, noting that dnA,1 and dnA,2 are equal in magnitude but opposite in sign (dnA,1 = –dnA,2), we get
–µA,1 dnA,2 + µA,2 dnA,2 = dnA,2 (µA,2 – µA,1) = 0. The parenthetic expression must be zero. Therefore,
when A is at equilibrium between the two phases,
µA,1 = µA,2
This makes perfect sense; since this process creates a molecule of A in phase 2 at the expense of a
molecule of A in phase 1, at equilibrium the chemical potential of A in the two phases must be equal.
If the two chemical potentials were not equal, then the process would not be at equilibrium, and G
could be decreased (in a shift closer to equilibrium) by converting some of the higher chemical
potential component into the lower chemical potential component. In the problem described here,
that would be by movement (i.e. a transport process).
If the system was not at equilibrium, then the free energy associated with the process (assuming the
forward process is interpreted to be movement from left to right) would be (µA,2 - µA,1). This would
represent a differential free energy on a per mole basis. The value of that energy term could have
various practical interpretations in a biological setting. If the value was less than zero, then it might
describe the amount of work that could be extracted from the process and used to drive a different
unfavorable process if the two processes were coupled together by some mechanism. Or, if the free
energy difference was positive, then that energy value could describe the amount of work (or
favorable free energy) that would have to be extracted from another coupled process in order to
maintain the first process away from the equilibrium condition to which it would go otherwise.
Chemical equilibrium

Now we consider a chemical reaction and look at the conditions on the µi's for equilibrium. Consider
the reaction below:

A + B ⇌ 2C

In the process above, the amounts of A, B, and C are subject to change, so the differential free energy
change is

dG = Σi µi dni = µA dnA + µB dnB + µC dnC

The reaction arrow represents a single process, so the changes that occur to the amounts of the
different components must be related to each other, and to a single quantity describing the extent of
the reaction. If we let ξ describe the extent of the reaction on a per mole basis, then according to the
reaction stoichiometry,

dnA = – dξ
dnB = – dξ
dnC = + 2dξ   and substituting above gives

dG = (–µA – µB + 2µC) dξ
At equilibrium, dG = 0, so we have (–µA – µB + 2µC) = 0. This makes intuitive sense since you
can see that in order for the expression to evaluate to zero, 2µC would have to be equal to µA + µB,
meaning that adding up the chemical potentials of the components on the two sides of the reaction
has to give matching values. Otherwise the free energy could be lowered by having the reaction
proceed one way or the other.

If the reaction is occurring away from equilibrium, then the free energy difference for the reaction on
a per mole basis (meaning per mole of reaction events) would be (–µA – µB + 2µC). You'll see that
this is nothing more than adding up and subtracting the chemical potentials of the products and
reactants, properly weighted by their respective stoichiometries. As before, if the concentrations are
away from equilibrium then the expression above would describe the molar energy required to make
the process proceed or (if the value is negative) how much work could be extracted from the process.
Equilibrium conditions in terms of concentrations and standard chemical potentials: arriving
at familiar equations for the equilibrium constant
So far we have laid out the conditions on the chemical potentials that must be true at equilibrium.
But of course the chemical potentials of the components depend on their concentrations, and
together this leads to equations for equilibrium constants (K) and reaction quotients (Q), which
should be familiar.
From this point forward we will switch away from mole fraction and use molarity (capital C) as our
concentration unit instead. We replace the solid superscript denoting the pure state with the open
superscript (0) denoting 1M as the choice for standard state concentration. We therefore rewrite
our equation for the chemical potential and its dependence on concentration as

µi = µi0 + RT ln Ci

The standard state chemical potential (µi0) refers to the chemical potential the molecule would have
at its standard state concentration (1M). The standard state chemical potential serves as a reference
value to which the chemical potential can be related, taking into account the dependence on
concentration. This general statement about how the chemical potential of a component depends on
a standard state value (which is a constant) and the concentration of that component will appear
throughout our subsequent discussions.
Phase or transport equilibrium
For the earlier case of phase equilibrium of molecule A between phases 1 and 2, at equilibrium µA,1 =
µA,2, and substituting the expression above for each component gives,

µA,10 + RT ln CA,1,eq = µA,20 + RT ln CA,2,eq
Here we recognize that the standard state chemical potentials for the same molecule could be
different in different phases, from which one can see that the concentrations on the two sides would
be unequal at equilibrium. By collecting separately the terms for concentration and those for
chemical potentials,
RT ln CA,2,eq – RT ln CA,1,eq = –(µA,20 – µA,10). Rearranging gives,

ln (CA,2,eq/CA,1,eq) = –(µA,20 – µA,10)/RT

We can recognize CA,2,eq/CA,1,eq as the equilibrium constant K for this process. Making that
substitution and also replacing the difference between standard chemical potentials with the more
familiar expression ΔG0 for the standard state free energy change, we arrive at

ln K = –ΔG0/RT and K = exp(–ΔG0/RT), with K = CA,2,eq/CA,1,eq

To analyze a system away from equilibrium, we can introduce concentrations and equilibrium
constants into the non-equilibrium situation. Returning to dG = dnA,2 (µA,2 – µA,1), and substituting in
equations of the form µi = µi0 + RT ln Ci as before gives, with some rearrangement,

dG/dnA,2 = RT ln(CA,2/CA,1) + (µA,20 – µA,10)

or more familiar,

ΔG = RT ln(CA,2/CA,1) + ΔG0

where the free energy differences here refer to the transport process on a per mole basis. Noting
from above that ΔG0 = –RT ln(K), and recalling that the reaction quotient Q is used to describe the
ratio of product to reactant concentrations in the general case where a system may be away from
equilibrium, we get the familiar equation

ΔG = RT ln(Q/K)   where in this case Q = CA,2/CA,1 and K = CA,2,eq/CA,1,eq

Again, ΔG on a per mole basis has the same meaning as (µA,2 – µA,1), which is the energy per mole that
can be extracted from (or that would be needed to drive) the reaction under consideration.
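A small numerical sketch (my own; the concentrations and the value of K are hypothetical) shows how this equation behaves for the transport process:

```python
import math

RT = 2.5  # kJ/mol at ~298 K

def delta_g_transport(c1, c2, K):
    # dG (kJ/mol) for moving A from phase 1 to phase 2:
    # dG = RT ln(Q/K), with Q = c2/c1
    return RT * math.log((c2 / c1) / K)

# hypothetical partitioning with K = 10 (phase 2 favored at equilibrium)
print(delta_g_transport(1e-3, 1e-3, 10))  # Q = 1 < K: negative, transport into phase 2 favorable
print(delta_g_transport(1e-3, 1e-2, 10))  # Q = K: zero, the system is at equilibrium
```

The sign of the result tells you which direction the transport proceeds spontaneously, and the magnitude is the work per mole that could be extracted (or that must be supplied) as discussed above.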
Chemical equilibrium

We can work out similar equilibrium expressions as well for our previous chemical reaction.
Substituting general terms of the form µi = µi0 + RT ln Ci into (–µA – µB + 2µC = 0) gives, with
some rearrangement:

2RT ln CC,eq – RT ln CA,eq – RT ln CB,eq = –(2µC0 – µA0 – µB0)

ln (CC,eq²/(CA,eq CB,eq)) = –(2µC0 – µA0 – µB0)/RT

which again matches

ln K = –ΔG0/RT and K = exp(–ΔG0/RT), with K = CC,eq²/(CA,eq CB,eq) and ΔG0 = 2µC0 – µA0 – µB0

As before, if the reaction is away from equilibrium then we can work out equations for the molar free
energy for the reaction, obtaining in this case

ΔG = RT ln(Q/K), with Q = CC²/(CA CB) and K = CC,eq²/(CA,eq CB,eq)
Importance of units
It is important to understand the way concentration units have been implied in the equations we
have developed for chemical potentials, free energies, and equilibrium constants. Returning to the
general equation we developed for how chemical potential depends on concentration, where we
switched over to molar concentrations, µi = µi0 + RT ln Ci, you will see that we seem to be taking a
logarithm of a quantity that has units associated with it (molarity in this case), which is technically
illegal. To correct this problem, in every occurrence of a concentration value in our preceding
equations, we should understand that the concentration needs to be implicitly divided by the value
chosen for the standard state, 1M for example. That division generates unitless quantities for the
concentrations in all of our expressions for chemical potentials, free energies, reaction quotients and
equilibrium constants:
µi = µi0 + RT ln (Ci/1M), for example,

or

K = (CC,eq/1M)²/((CA,eq/1M)(CB,eq/1M)) for the reaction above.
As you can see, as long as the standard state is 1M, then leaving out these implicit denominators is
fine. But there is an important case where 1M is not the typical choice made for the standard state.
Because biological conditions are typically close to pH 7 (and not pH 0), the standard state
concentration for protons (H+) is taken to be 10⁻⁷ M. That means that anytime a reaction (or transport
process) involves the creation, consumption, or movement of protons, the concentration of protons
must be divided by 10⁻⁷ M when using it in the calculation of free energies, reaction quotients, and
equilibrium constants.
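To illustrate (a sketch of my own; the reaction and concentrations are hypothetical), consider computing the reaction quotient for a process A → B + H+ at pH 7, treating each concentration as unitless by dividing by its standard state:

```python
def activity(conc_M, species):
    # Unitless concentration: divide by the standard state, which is 1 M for
    # most species but 1e-7 M for protons under the biochemical convention.
    standard = 1e-7 if species == "H+" else 1.0
    return conc_M / standard

# hypothetical reaction A -> B + H+ with A and B both at 1 mM, at pH 7
Q = activity(1e-3, "B") * activity(1e-7, "H+") / activity(1e-3, "A")
print(Q)  # 1.0: at pH 7 the proton term contributes no extra free energy cost
```

With the biochemical standard state, a proton released at pH 7 contributes a factor of 1 to Q; with the 1M chemical standard state it would instead contribute a factor of 10⁻⁷, shifting the apparent free energy substantially.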
Other species that get handled as special cases, typically by being omitted from the equilibrium
expressions, include: water (in most scenarios the solution is nearly pure water, so its mole
fraction X ≈ 1) and compounds in their pure forms (e.g. crystalline solids), which are taken to be
their own phase, with X = 1.
Precautions about G vs G0, reactions with changes in stoichiometry, and overall
concentration effects
Free energy is sometimes discussed loosely, which can lead to confusion and errors in interpretation.
A particularly common error is to not properly distinguish between whether one is talking about ΔG
or ΔG0. As discussed above, ΔG0 describes how favorable or unfavorable a process would be if the
reactants and products were all at their standard state concentrations. That is practically never
representative of conditions of biochemical interest. [Note that cellular concentrations of small
molecule metabolites are often in the millimolar range; macromolecules like proteins are present in
the cell at individual concentrations that are often in the micromolar range (e.g. for housekeeping
enzymes) or nanomolar or lower for low-abundance proteins like those often involved in cell
signaling.] The value of ΔG0 is simply a reference energy that makes it possible to calculate the free
energy or equilibrium position at some other more relevant set of concentrations.
Another common source of confusion arises in the context of reactions where the total stoichiometry
of the reactants and products are different. In simple processes or reactions where the
stoichiometries of the reactants and products are the same, casual statements such as, “that reaction
or process is ‘naturally favorable’ because the (standard) free energy is negative”, can be interpreted
in a sensible way. For example, for the reaction A ⇌ B, if the standard state free energy
difference is negative, then K > 1, and if A and B were both present at 1M concentration then, since
Q = 1, which is lower than K, ΔG would be negative and the forward reaction (conversion of some A to
B) would be favorable. The same conclusions would be reached if the concentrations of A and B were
both much lower (or higher) but still equal to each other. For example, if A and B were both present
at 1mM concentration then Q would still be 1 and the forward reaction would still be favorable. A
further conclusion is that at equilibrium B would have a higher concentration than A, whether the
overall concentrations are high or low. But this kind of casual logic falls apart entirely when the
number of molecules on the left and right side of a reaction are unequal. A classic case is a process
of binding between a protein and a ligand (e.g. an inhibitor or substrate or cofactor). Here there are
two ‘reactants’ and one ‘product’ (the bound form of the protein). In the former example, the sign of
G0 provided quick insight into the relative concentrations one would expect for the substrate and
product at equilibrium, without worrying about the definition of the standard state. But what about
the case of ligand binding by a protein? Here, the value of G0 provides no such easy insight. The
problem can be appreciated by noting that if the concentration units for Q (or K) do not cancel (which
they do not if the total stoichiometries are different on the left and the right), then the value of Q
changes with changes in overall concentration, even if relative concentrations are held equal. So, for
example, a negative G0 (K > 1) for the binding energy would tell you that if the protein, ligand, and
protein-ligand complex were all at 1 M, then the binding process would proceed forward toward more
complete binding (so that ultimately more of the protein would be in the bound form than the
unbound form). But if those three species were all present at equal concentrations of 1 µM, the value
of Q would be a million (10^-6/(10^-6 × 10^-6) = 10^6), which could be much greater than K (depending on how
negative ΔG° was), which would mean that the process would proceed in the reverse direction
toward unbinding, and ultimately most of the protein would not be bound to ligand. This is just one
illustration of the point that the interpretation of free energies must be made carefully, particularly
when there are differences in stoichiometry between reactants and products. In those cases one
must bear clearly in mind that overall concentrations are profoundly important, and that the sign and
magnitude of ΔG° are hardly informative without further consideration of real concentrations. Note
that the argument above about stoichiometry and the interpretation of the free energy change ΔG applies just as
well to the entropy change ΔS, but is a less critical issue for the enthalpy change ΔH.
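The binding example above can be made concrete with a short numerical sketch. The ΔG° value is invented for illustration, and the function name Q_binding is mine; the point is only that the same ratio of species gives Q = 1 at 1 M but Q = 10^6 at 1 µM, reversing the sign of ΔG.

```python
import math

R = 8.314   # gas constant, J/(mol K)
T = 298.0   # K

dG0 = -20e3  # hypothetical standard free energy of binding, J/mol (so K > 1)
K = math.exp(-dG0 / (R * T))  # carries units of 1/M, which do not cancel

def Q_binding(c_protein, c_ligand, c_complex):
    """Reaction quotient for protein + ligand <-> complex (concentrations in M)."""
    return c_complex / (c_protein * c_ligand)

# All species at 1 M: Q = 1 < K, so dG < 0 and binding proceeds forward.
dG_molar = R * T * math.log(Q_binding(1.0, 1.0, 1.0) / K)

# All species at 1 uM: Q = 1e6, far above K here, so dG > 0 and the net
# reaction runs in reverse, toward unbinding.
dG_micro = R * T * math.log(Q_binding(1e-6, 1e-6, 1e-6) / K)
```
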
Comments on the dependence of ΔG and K on T (the van’t Hoff equation)
In our discussions of free energy we emphasized that the sign of ΔG indicates the favorability of
reactions under conditions of constant temperature and pressure. But how ΔG depends on those
values is also of interest in some situations. [One example is how temperature affects the equilibrium
between the unfolded and folded states of a protein.] Here we say something about how the free energy
ΔG and the equilibrium constant K depend on T.
From ΔG = ΔH – TΔS, one can see quickly that the dependence of ΔG on T is determined by ΔS. In fact,
from the standard derivative expressions for the state variables, the partial derivative of G with
respect to T, holding P constant, is –S. That is, (∂G/∂T)P = -S, and likewise (∂ΔG/∂T)P = -ΔS. So, for example, if a process is
entropically favored (ΔS > 0), then increasing the temperature will make ΔG more negative. Clearly,
the dependence of ΔG on T is dictated by the sign of ΔS.
But now let’s look at the dependence of the equilibrium constant K on T. This is where intuition can
go awry. We know that K is determined from ΔG° (recall K = exp(-ΔG°/RT)), and that a more negative
value of ΔG° corresponds to a higher value of K. So we might expect that if increasing T causes a
decrease in ΔG° (as it would if ΔS° > 0, as discussed above), then K should also depend on ΔS°, with an
increase in T causing an increase in K if ΔS° > 0. But this logic is incorrect (though not uncommonly
heard in discussions). The problem with the logic is that K depends on T in two ways: through the
effect of T on ΔG° and through the presence of T in the denominator of the expression for K in terms
of ΔG°.
To get the correct answer for how K depends on T, we have to break ΔG° into its enthalpy and
entropy components at the outset, since those two terms have different dependencies on T.
K = exp(-ΔG°/RT)
ln K = -ΔG°/RT = -(ΔH° – TΔS°)/RT = -ΔH°/RT + ΔS°/R
Now the dependence of K on T can be seen to be governed by ΔH° and not by ΔS°! Taking derivatives
with respect to T we get
d(ln K)/dT = ΔH/(RT²) (here the standard-state superscript on ΔH might be omitted, since ΔH
depends less strongly on overall concentrations, in contrast to ΔG and ΔS as discussed at length above).
This is one form of the van’t Hoff equation. Separating the variables K and T on different
sides gives d(ln K) = ΔH/(RT²) dT, and as long as ΔH does not change much with temperature we can
integrate between two temperatures T1 and T2 to get
ln(K2) – ln(K1) = ln(K2/K1) = (1/T1 – 1/T2)ΔH/R
which shows how one can extract a value for the enthalpy change for a reaction or process from the
values of K at two different temperatures. Or, plotting ln(K) vs 1/T should give a slope of –ΔH/R.
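As a numerical illustration of the integrated van’t Hoff relation, the sketch below recovers an invented ΔH from K values at two temperatures; the function name and the 50 kJ/mol value are mine, chosen only for the round trip.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def vant_hoff_dH(K1, T1, K2, T2):
    """Estimate dH (J/mol) from K measured at two temperatures, assuming dH
    is roughly constant over [T1, T2]: ln(K2/K1) = (dH/R)*(1/T1 - 1/T2)."""
    return R * math.log(K2 / K1) / (1.0 / T1 - 1.0 / T2)

# Round-trip check with a made-up endothermic process (dH = +50 kJ/mol):
dH_true = 50e3
T1, T2 = 298.0, 310.0
K1 = 1.0
K2 = K1 * math.exp(dH_true / R * (1.0 / T1 - 1.0 / T2))  # K rises with T since dH > 0
dH_est = vant_hoff_dH(K1, T1, K2, T2)  # recovers ~50 kJ/mol
```
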
Graphical views of chemical potentials and total free energy as a function of reaction progress
for a simple equilibrium (A ⇌ B)
[Figure: the chemical potentials μ (‘mu’) of A and B and the total free energy G, plotted as a
function of reaction progress.]
CHAPTER 4
Non-ideal behavior in mixtures
The breakdown of ideal equations for chemical potential
Our previous discussions have emphasized the idea that the energies in a mixture have a simple
behavior (i.e. a log dependence) that is perfectly obeyed across all ranges of concentrations,
regardless of what sorts of molecular forces might come into play as different kinds of molecules
encounter each other. We refer to that kind of behavior as ‘ideal’. We turn now to consider the
behavior of ‘real’ or ‘non-ideal’ solutions.
To understand non-ideal behavior, let’s rethink the steps we took to arrive at our simple equations
for ideal behavior to look for assumptions we made that might be violated in real situations. We used
the idea of ‘free energy of mixing’ as the foundation for establishing our equations for chemical
potential and their dependence on log concentrations in the ideal case. We started with this equation,
ΔGmix = ΔHmix – TΔSmix, which led us to ΔGmix = RT Σi ni ln(Xi). But we made two assumptions in the
process.
First, you’ll recall that we allowed ourselves to drop out the enthalpy term, asserting that ΔHmix would
be zero upon mixing if the different kinds of molecules make energetic interactions with each other
that are similar to those they make with themselves in their pure forms. This might be a fair
assumption if the two (or more) molecular species are very similar to each other (e.g. in polarity,
charge, size, etc.). On the other hand, if intermingling of the different components leads to interaction
forces of different types and magnitudes, then our assumption that mixing would not have any
enthalpic effect will be incorrect, and the energy or chemical potential felt by each component will be
affected not only by its own concentration but by the new forces it experiences when interacting with
the other components.
A second simplification came in the way we treated the second term, the entropy of mixing. We
developed our combinatorial expression for W (to give us entropy) based on an idealized mixing
scheme where we placed molecules of different types on different sides of a container. This seemed
innocent enough. But what if the two types of molecules were of vastly different sizes? This might
have led to a more complex problem relating to how large vs small molecules might be arranged in
space without overlapping each other. This issue would not have been captured by our simple
equation for counting partitionings of molecules.
Later we will discuss in more detail specific situations where violations of the assumptions above
lead to non-ideal behavior. But first we will modify our previous equations for chemical potentials
and equilibrium constants so that they will hold true even when non-ideal effects are at play. To do
this we introduce a correction factor into the chemical potential equations in the form of an
‘activity coefficient’, γ.
38
Activities and activity coefficients
Our ideal equation for the chemical potential of species i was:
μi = μi° + RT ln Ci (ideal)
Now admitting that that equation might not be totally valid, we introduce a correction factor, the
activity coefficient, γi, designed to make the equation remain true.
μi = μi° + RT ln (γiCi) (real or non-ideal)
or
μi = μi° + RT ln (ai) with ai = γiCi (real or non-ideal)
where we introduce the ‘activity’ ai to be equal to γiCi. Then ai effectively replaces Ci in the chemical
potential equation. You can see that the ‘activity’, a, becomes like an effective concentration of a given
component. Another way of looking at it would be to imagine that you don’t have a way of directly
measuring the true concentration of a component in a mixture, but you have a way of measuring the
chemical potential of that component (through some energetic evaluation). From the chemical
potential of that component, since chemical potential depends on concentration, you could say that
you are able to measure what concentration that component seems to have based on its energetic
behavior, and that effective concentration is the activity. You might anticipate from the equations
above, which make it explicit that chemical potential relates to activity and not necessarily to
concentration, that the activities will be the key quantities in equilibrium constants and reaction
quotients for non-ideal systems.
Before we rework our previous equilibrium equations in terms of activities, let’s look a little more at
the range of possibilities for the activity coefficients and how this relates to favorable vs unfavorable
energetic features in non-ideal mixtures.
First, note that our new equations reduce to the ideal ones when the activity coefficients, γi, are equal
to 1. In that case, the activity is the same as the concentration: ai = γiCi = Ci. Logically then, non-ideal
behavior arises when the activity coefficient is either greater than or less than 1. Those two possibilities
can be ascribed different energetic meanings. By comparing the equations above for chemical
potential in the ideal and non-ideal cases, we can see that a value of γi > 1 corresponds to an elevated value
of the chemical potential for component i. Since the chemical potential reports on the energy that is
felt by some component, we surmise that γi > 1 indicates that component i is experiencing
unfavorable energetics compared to the case of ideal behavior. Conversely, γi < 1 reflects unusually
good energetic interactions.
The ideal behavior of highly dilute solutions
Now we have to discuss in a bit more detail what limiting situations are chosen (by convention) to
represent ideal behavior. From our previous discussions it might seem that the sensible thing would
be to take the pure state of each component to represent its ideal behavior. This is fine for the solvent;
in biochemistry our ‘mixtures’ are nearly always solutions where water is the solvent and various
other molecules are the dissolved solutes. But the idea of a pure solute often doesn’t make sense for
biochemistry. For example, a sample containing only a protein in a pure form (without solvent) is
nonsensical since proteins don’t fold properly unless they are in an aqueous solution. Therefore, the
condition chosen to represent ideal behavior for a solute is usually the (hypothetical) infinitely dilute
limit. Let’s see if this is consistent with ideas we laid out earlier about how the equations for chemical
potential as a function of concentration should behave. Putting a slightly finer point on our previous
arguments, the ideal equation for chemical potential fails if a given component experiences different
kinds of interactions as its concentration is changed. Now we can examine the situation of a highly
dilute mixture to see if it meets the ideal requirement that a given component makes the same kinds of
interactions as its concentration is changed slightly. First consider a dilute solution from the
perspective of the solvent. If the solute is present in a 1:1000000 ratio to the solvent (setting aside
for the moment potential differences in molecular size), then any arbitrarily chosen solvent molecule
will be interacting nearly exclusively with other solvent molecules. Now if we increase the
concentration of the solute by a factor of two, that doesn’t change the picture; a solvent molecule will
still interact nearly exclusively with other solvent molecules. Now let’s view it from the perspective
of the solute. At the 1:1000000 ratio, a solute molecule will rarely interact with another solute and
will exclusively ‘see’ the solvent. When we double the concentration of the solute, this is still the case.
Clearly then, if a solution is very dilute, the various components can be expected to behave ideally.
The ideal state for the solvent is taken to be pure solvent (water), whereas the ideal state for the
solute is at infinite dilution, and the components under these highly dilute conditions have activity
coefficients equal to 1.
The origin of non-ideal behavior at higher concentrations
We can use the same logic as above to think about the non-dilute situation where non-ideal behavior
begins to show up. Consider what happens when a solute concentration gets much higher. Now the
solvent will start to encounter solute molecules with frequencies that cannot be ignored (as
illustrated below). So if for the sake of argument the solvent and the solute make poorer or less
favorable interactions with each other than they do with themselves, then as the mixture moves into
the non-ideal range, the solvent will experience a higher chemical potential than expected for ideal
behavior owing to its increased interactions with the other component (the solute). That would
mean the activity coefficient for the solvent would be > 1. Now let’s look at it from the perspective of
the solute, which sees things differently because it is dilute rather than nearly pure like the solvent.
As the solute concentration increases, at some point solute molecules begin to encounter other solute
molecules to an appreciable extent. Now under the same scenario as before where the solvent and
solute make poorer interactions with each other, and better interactions with themselves, you see
that as the concentration of the solute increases it makes more favorable interactions (with itself).
So, the activity coefficient for the solute would be < 1. The reason we obtain different numerical
behavior for the activity coefficient for the solvent vs the solute under the same set of assumptions
about the energetics of the solution is due entirely to the different choices for what the ideal limit is
for the different components: pure in the case of the solvent (water) and highly dilute in the case of
the dissolved solute. Note that if we imagined the opposite scenario where the solvent and solute
made better interactions with each other than with themselves, then the behavior of the activity
coefficients would be reversed, with the activity coefficient for the solute being > 1 and the
solvent < 1.
Reworking the equilibrium equations in terms of activities instead of concentrations
The expression for the total differential dG remains true even if the behavior is non-ideal, as does the
requirement that dG equals 0 at the equilibrium composition.
dG = Σi μi dni = 0
But now we use
μi = μi° + RT ln (ai)
This is the same as before except that activity a has replaced molar concentration C. Clearly the equations
will develop exactly as before, but with activity a replacing C everywhere. For example, for the
reaction A ⇌ B, starting from µA = µB at equilibrium, we would obtain
ln (aB,eq/aA,eq) = -(μB° - μA°)/RT
(aB,eq/aA,eq) = exp(-(μB° - μA°)/RT) = K (where K is the equilibrium constant as before)
Note however that K ≠ CB,eq/CA,eq if the behavior is non-ideal, since ai ≠ Ci.
The relationship between the equilibrium constant and the concentrations can be seen by grouping
the activity coefficients together as a single correction term, as follows:
K = (aB,eq/aA,eq) = (γBCB,eq)/(γACA,eq) = (CB,eq/CA,eq)*(γB/γA)
The equilibrium constant is constant and so its value is not affected by non-ideal behavior (e.g. at
higher concentrations), and the ratio of activities also remains equal to the equilibrium value. But
the ratio of concentrations, which we ordinarily think of as the equilibrium constant, is affected and
can change. You might then think of the ratio of concentrations as the non-ideal or ‘apparent’
equilibrium constant, whose relationship to the true, ideal equilibrium constant would be:
(CB,eq/CA,eq) = Kapp = K/(γB/γA)
And if the system were away from equilibrium then the expression for molar free energy for the
reaction would be the same as for the ideal case, except activities would replace concentrations in
the formulation of the reaction quotient Q. For the simple reaction A ⇌ B, for example,
ΔG = RT ln ((aB/aA)/K) = RT ln (((γBCB)/(γACA))/K)
The equations above are of course specific for the simple equilibrium between A and B, but the idea
generalizes immediately to any reaction or stoichiometry.
For the more complex reaction A + B ⇌ 2C, beginning with 2µC = µA + µB, we would end up with
K = aC,eq²/(aA,eq*aB,eq) = (γCCC,eq)²/(γACA,eq*γBCB,eq) = CC,eq²/(CA,eq*CB,eq) * γC²/(γA*γB) and
CC,eq²/(CA,eq*CB,eq) = Kapp = K/(γC²/(γA*γB))
And for the molar free energy if the system is away from equilibrium,
ΔG = RT ln ((aC²/(aA*aB))/K) = RT ln (((γCCC)²/(γACA*γBCB))/K)
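The bookkeeping for the apparent equilibrium constant of A + B ⇌ 2C can be expressed in a few lines. This is a sketch; the function name and the numerical activity coefficients are arbitrary choices of mine.

```python
def K_app(K, gamma_A, gamma_B, gamma_C):
    """Apparent (concentration-based) equilibrium constant for A + B <-> 2C:
    Kapp = CC^2/(CA*CB) = K / (gamma_C**2 / (gamma_A * gamma_B))."""
    return K / (gamma_C**2 / (gamma_A * gamma_B))

# Ideal limit: all activity coefficients equal 1, so Kapp == K.
K_ideal = K_app(10.0, 1.0, 1.0, 1.0)

# Non-ideal example with made-up coefficients: penalizing the product
# (gamma_C > 1) lowers Kapp, shifting the apparent equilibrium left.
K_nonideal = K_app(10.0, 0.9, 0.9, 1.2)
```
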
Ion-ion interactions in solution as an example of non-ideal behavior
(Debye-Hückel theory)
Here we will examine how ions in an electrolyte (salt) solution behave. As you know, charged species
repel or attract each other depending on whether their charges have the same or opposite signs. This
affects the positions that ions exhibit (on average) as they move around freely in solution. We will
contrast what happens when we have a very dilute (meaning ideal) electrolyte solution compared to
when the concentrations of ions get higher. In the dilute limit, the ions are so far apart that their
electrostatic properties do not influence each other. In contrast, at higher concentrations the positive
ions will prefer to be in the vicinity of negative ions, and vice versa, while like charges will prefer to
be farther from each other. That means that, on average, a positive ion will find itself surrounded by
a slight excess of negatively charged ions, and likewise a negative ion will find itself surrounded by a
slight excess of positively charged ions; remember that we always have a mixture of positive and
negative ions in an electrically neutral solution. The ions are moving around in solution, so the effect
is subtle, but significant. From this argument you can see that each ion should enjoy a favorable
energetic interaction with its ‘counter-ion atmosphere’. Referring to our earlier discussions, this
favorable energetic contribution corresponds to an activity coefficient for the ions that is < 1.
A quantitative treatment of the energetics of electrolyte solutions was developed by Debye and
Hückel, and is worked out in detail in some texts. Here we will simply summarize the essential ideas.
Ionic strength and the Debye length
First we explain the idea of the Debye-length. Each ion is surrounded by a counter-ion atmosphere
whose total charge offsets the charge on the central ion. How is that opposing charge distributed (on
average) as a function of distance from the central ion? At a very long distance from the central ion
of interest the attractive force is small, so the counter-ion atmosphere drops to zero at long distance.
In addition, the amount of opposing charge that can exist very close to the central charge is limited
since the available volume at very small distance becomes small. So, as diagrammed below, the
amount of counter-ion charge goes up and then down with distance, and its maximum value is at a
distance referred to as the Debye length, 1/κ. The increased counter-ion concentration in the vicinity
of a central ion also has the effect of ‘screening’ or diminishing the electrostatic force or field that is
exerted by a given ion, and the Debye length also describes that effect. From Coulomb’s law you’ll
remember that the electrostatic potential φ at a distance r from a central ion is proportional to 1/r
(that is, φ ∝ 1/r), and that equation would apply in the infinitely dilute limit. When screening
becomes significant owing to an increase in the concentration of ions, then φ ∝ (1/r)exp(-κr).
A simple computer simulation is shown for ions moving around in solution under forces of attraction
and repulsion. A snapshot is shown along with a calculation of the average counter-ion charge around
a negatively charged ion. The Debye length 1/κ is indicated.
What is the value of 1/κ? This depends mainly on the total concentration of ions in solution; more
exactly, it depends on the ionic strength, I.
For reasonably dilute solutions the equation for ionic strength is
I = (1/2) Σi Ci zi²
where the Ci are the molar concentrations of the charged species, zi is their charge, and the sum is
over all ions. Note that the squaring of z gives positive contributions for anions as well as cations.
The dependence of κ on I is complex, but for aqueous solutions near 298 K,
1/κ ≈ 3.04 Å/sqrt(I) where I is understood to be in molar concentration units
So, for example, if the ionic strength of a solution is 0.001 M, then 1/κ = 96 Å, whereas if I = 0.1 M,
then 1/κ = 9.6 Å. For reference, recall that the sizes of and distances between bonded atoms are in the
1 Å to 1.5 Å range.
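The ionic strength sum and the approximate Debye length for water near 298 K can be sketched directly; the helper names are mine, and the 3.04 Å prefactor is the common approximate value for those conditions.

```python
import math

def ionic_strength(conc_charge_pairs):
    """I = (1/2) * sum(Ci * zi**2), with Ci in molar and zi the ion charge."""
    return 0.5 * sum(C * z**2 for C, z in conc_charge_pairs)

def debye_length_angstrom(I):
    """Approximate Debye length 1/kappa for water near 298 K: ~3.04 A / sqrt(I)."""
    return 3.04 / math.sqrt(I)

# 100 mM NaCl: I = 0.5*(0.1*1 + 0.1*1) = 0.1 M, giving 1/kappa of roughly 9.6 A.
I_nacl = ionic_strength([(0.1, +1), (0.1, -1)])

# 100 mM MgCl2: the divalent cation dominates: I = 0.5*(0.1*4 + 0.2*1) = 0.3 M.
I_mgcl2 = ionic_strength([(0.1, +2), (0.2, -1)])
```

Note how the z² weighting makes a divalent salt triple the ionic strength of a 1:1 salt at the same concentration.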
Activity coefficients for ionic species
A quantitative treatment of how ions are surrounded by a counter-ion atmosphere makes it possible
to calculate the theoretical magnitude of the favorable energy of interaction between an ion and its
counter-ion atmosphere. This energy of interaction will be the source of non-ideality in the
electrolyte solution, so mathematical expressions can be obtained for the activity coefficient for an
ion. Without derivation, the following is obtained. For a given charged species, i:
ln(γi) = -(zi²e²/(2εkBT)) · κ/(1 + κa)
where a is the radius of the ion. Under relatively dilute conditions, 1/κ >> a, and κa << 1, so the κa
term drops out of the denominator to give
ln(γi) ≈ -(zi²e²/(2εkBT)) · κ
In aqueous solutions near 298 K this equation, and the dependence of κ on sqrt(I), can be combined
and reduced to a simple approximate expression:
ln(γi) ≈ -1.2 zi²√I where I is understood to be in molar concentration units.
Note from the equation above that the activity coefficient is < 1 for each species, regardless of charge
sign, which is consistent with our qualitative discussion above. And note that as the ionic strength
goes to zero (e.g. under highly dilute conditions), the log of γ goes to 0 and therefore γ goes to 1, as
expected for ideal conditions.
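The simplified expression translates directly into code (gamma_ion is my name for the helper), and exercising it confirms the qualitative points: γ < 1 for either charge sign, the effect is much stronger for multivalent ions, and γ → 1 as I → 0.

```python
import math

def gamma_ion(z, I):
    """Simplified Debye-Huckel activity coefficient in water near 298 K:
    ln(gamma) = -1.2 * z**2 * sqrt(I), with I in molar units."""
    return math.exp(-1.2 * z**2 * math.sqrt(I))

g_plus = gamma_ion(+1, 0.01)      # < 1
g_minus = gamma_ion(-1, 0.01)     # identical: z enters only as z**2
g_divalent = gamma_ion(+2, 0.01)  # much smaller, since z**2 = 4
g_dilute = gamma_ion(+1, 0.0)     # ideal limit: exactly 1
```
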
Using ionic activity coefficients to analyze the effect of charge on molecular association, and
electrostatic screening
We can use the activity coefficient equation above to gain insight into how ionic strength affects
molecular association between charged molecules (e.g. proteins or nucleic acids) in solution. We’ll
set up an abstract problem where a molecule A has charge zA and a molecule B has charge zB, and A
and B can come together in some association or binding process to form species C, whose charge is
zA+zB.
From our previous discussions, we can quickly write out how we expect the equilibrium position of
this binding process to be affected by total ionic strength. Note that if we are dealing with large
molecules like proteins, their molarity is usually very low, so the charges on the molecules in question
(here A and B) may not contribute meaningfully to the total ionic strength. The total ionic strength
we’re talking about here would more likely relate to how much salt we added to the experiment. So
we’ll imagine that the ionic strength is something we control separately from whatever is happening
regarding A and B and their association.
What do we expect to happen to the equilibrium above if the charges on A and B are opposite and we
start adding salt? You’ve likely learned about electrostatic ‘screening’ before, which is the idea that
high salt concentration tends to mask or diminish any electrostatic force that two charged molecules
might exert on each other. So, intuitively you might expect that in the case where A and B have
opposite net charges that adding salt would lessen their tendency to associate and would therefore
shift the equilibrium position to the left.
Let’s set up the equilibrium equation for this process and see if we get the result we expect. Now that
we know how to handle non-ideal equilibrium expressions, we can write
Kapp = CC/(CACB) = K/(γC/(γAγB))
to describe how the non-ideal or apparent equilibrium constant would change according to the
values of the activity coefficients γi for the three species. From the simplified Debye-Hückel equation
we know how the activity coefficients of the three species should depend on their charges and on the
ionic strength I. Exponentiating the previous equation for how γ depends on I, we would get
γA = exp(-1.2*zA²*sqrt(I)) and similarly for γB, and γC = exp(-1.2*(zA+zB)²*sqrt(I))
Following some rearrangements,
Kapp = CC/(CACB) = K* exp(2.4*zA*zB*sqrt(I))
or
ln (Kapp) = ln (K) + 2.4*zA*zB*sqrt(I)
These equations confirm that if the charges on A and B have opposite sign, then Kapp would be lowered
(since the product of zA and zB would be negative) and the equilibrium position for the reaction would
therefore be shifted to the left by increasing ionic strength. This is precisely what we expected based
on higher ionic strength screening the attractive electrostatic force between A and B. And note that
the effect would be opposite if A and B were of like charge; the overall driving force for their
association in that case might arise from other non-electrostatic interactions, and an increase in ionic
strength would diminish the electrostatic repulsion between them.
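The screening argument can be sketched numerically; the function name, charges, and intrinsic K below are invented for illustration, and the relation used is the one just derived, Kapp = K·exp(2.4·zA·zB·√I).

```python
import math

def K_apparent(K, zA, zB, I):
    """Salt dependence of an A + B <-> C association, from simplified
    Debye-Huckel activity coefficients: Kapp = K * exp(2.4*zA*zB*sqrt(I))."""
    return K * math.exp(2.4 * zA * zB * math.sqrt(I))

K = 1e6  # made-up intrinsic (dilute-limit) association constant

# Oppositely charged partners: added salt screens the attraction, so Kapp falls.
K_opp = K_apparent(K, +2, -3, 0.15)

# Like-charged partners: screening the repulsion instead raises Kapp.
K_like = K_apparent(K, +2, +3, 0.15)
```
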
Molecular crowding and excluded volume effects as an example of non-ideal behavior in
solutions of macromolecules
The idea of excluded volume
Earlier we alluded to the idea that solutions containing very large solute molecules might give rise to
non-ideal behavior. This phenomenon is sometimes described in the context of ‘molecular crowding’
or ‘excluded volume’ effects. To understand the phenomenon we need to consider a solution that
contains some large solute molecules already, and think about what effect their presence has on our
ability to add another copy of the solute. The molecules cannot occupy the same space. Therefore,
across the entire volume of the system, some of the locations are excluded as possible positions for
placing a new molecule. That is the excluded volume. To a first approximation, the relationship
between molecular crowding and the activity coefficient for a macromolecule can be written as
γ = Vtot/(Vtot – Vexcl) where Vtot is the total volume of the system and Vexcl is the excluded volume.
Note that this implies that molecular crowding effects correspond to γ > 1. Geometrically interesting
aspects of molecular crowding come into play when we look more carefully at what is meant by the
aspects of molecular crowding come into play when we look more carefully at what is meant by the
excluded volume. The excluded volume is not simply the volume of space that is occupied by the
existing solute molecules. The complication is that we have to think about where we can and can’t
choose to position a new molecule, meaning where its center could or could not reside. As you will
see from the diagram below, the region where we cannot place (the center of) a new solute molecule
is much larger than the space actually occupied by the existing solute molecules. First we illustrate
the situation where the solute has the shape of a large sphere (e.g. a compact globular protein).
What the diagram shows is that the excluded volume in the case of spherical molecules is a sphere
with twice the radius of the individual molecule. That volume is therefore 8 times larger than the
volume actually occupied. That is, Vexcl = 8Vocc. By rearranging the approximation above for γ,
dividing numerator and denominator by Vtot, we see that γ = 1/(1-Vexcl/Vtot) = 1/(1-8Vocc/Vtot). As a result, even if a relatively small
fraction of the total space is occupied by macromolecules, the activity coefficient may be considerably
higher than 1. In this rough model, if 5% of the space is occupied, γ = 1/(1 - 0.4) ≈ 1.67.
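The rough spherical-crowding model amounts to a one-line function (the name is mine), and evaluating it at 5% occupancy reproduces the figure quoted above.

```python
def gamma_crowding(phi_occupied):
    """Activity coefficient for a spherical macromolecule in the rough
    excluded-volume model: Vexcl = 8*Vocc, so gamma = 1/(1 - 8*phi)."""
    return 1.0 / (1.0 - 8.0 * phi_occupied)

g = gamma_crowding(0.05)  # ~1.67 when 5% of the volume is occupied
```
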
The peculiar behavior of rigid elongated structures
This is interesting by itself, but the situation becomes much more intriguing when we consider
molecules whose shapes are highly elongated rather than spherical. Choosing a geometrically
tractable model, here we treat the case of a long rod-shaped molecule with a square cross-section,
whose dimensions are L x d x d, with L >> d. Again we can consider what volume of space is excluded
for placing (the center of) an added molecule in the proximity of another. The analysis is more
complicated than for the sphere because now the relative orientation of the two molecules matters.
Furthermore, as we carry out the same exercise as before in sliding the second rod around the first
one to see where we cannot place the second one, we must keep the orientation of the rods
unchanged; we are only asking about the allowable position for the second molecule at some fixed
orientation. First, we will consider the best case
scenario, which is where the two rods are parallel to each other. The result is similar to the case with
the spheres: the excluded volume would be (2d)(2d)(2L)=8Ld2, which is 8 times the volume of a
single molecule.
[Figures: excluded volume constructions for parallel rods and for perpendicular rods.]
But what about the case where the two rods are perpendicular to each other? This is the worst case
scenario. It takes more careful visualization in 3D (see the figure), but the excluded volume in this
case is (L+d)(L+d)(2d), which is 2d(L² + 2Ld + d²). If we compare this to the volume of one molecule
by dividing by Ld², we get a ratio of 2L/d + 4 (dropping the term 2d/L, which would be small). Now,
instead of getting a ratio of 8, we get a much higher number since L >> d. Returning to the earlier
equation for the activity coefficient, we see that γ could be large even when the fraction of the space
occupied by
the rod-shaped molecules is small. The real behavior of course would have to be an average (or really
an integral) of the behavior as a function of the angle between the rods. But the effect remains
substantial.
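The two geometric results above are easy to check numerically. This sketch (function names mine) computes the excluded-volume-to-molecular-volume ratio for parallel and perpendicular L × d × d rods.

```python
def excl_ratio_parallel():
    """Excluded volume / molecular volume for parallel L x d x d rods:
    (2d)(2d)(2L) / (L*d**2) = 8, independent of aspect ratio."""
    return 8.0

def excl_ratio_perpendicular(L, d):
    """Excluded volume / molecular volume for perpendicular rods:
    (L+d)(L+d)(2d) / (L*d**2), which is ~2L/d + 4 for L >> d."""
    return (L + d) * (L + d) * 2.0 * d / (L * d**2)

# For a 100:1 aspect ratio the perpendicular ratio is ~204, versus 8 for parallel.
r_perp = excl_ratio_perpendicular(100.0, 1.0)
```
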
How do these excluded volume ideas relate to real macromolecules? If we take the lessons to be
general ones that should apply even if the situations in question don’t involve molecules that look
exactly like spheres or rigid rods, then the implications are numerous. Protein folding is one relevant
example. Proteins have to be stable in their folded compact conformations compared to the unfolded
form in which their backbones would generally be flexible and much more extended. Certainly the
unfolded form of a protein should have a much greater excluded volume. As a result, under
conditions where crowding effects are significant, like when the overall concentration of
macromolecules is high, the activity coefficient for the unfolded form of a protein could be
significantly greater than 1. We can write an equilibrium process between the unfolded (U) and
natively folded (N) states:
U ⇌ N
If the equilibrium constant under dilute conditions is K, then following the procedures we developed
earlier we can write that under conditions where non-ideal (crowding) effects come into play,
CN/CU = K/(γN/γU)
We would expect molecular crowding effects to give γU > γN. The consequence is that (CN/CU) should
go up and the equilibrium position should be shifted to the right, towards the direction of native
folding, by crowded conditions. This is an important point given how crowded the conditions are in
the cell, and also how dilute typical conditions are when purified proteins are studied in the
laboratory. It may be that proteins are significantly stabilized in their folded states in the cell by
crowding; this is not reflected in typical laboratory studies.
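To make the effect concrete, here is a minimal numerical sketch, with purely illustrative activity coefficients (the values of K, γU, and γN below are assumptions, not measurements):

```python
# Illustrative sketch of how crowding shifts the folding equilibrium U <=> N
# via activity coefficients: CN/CU = K * (gammaU / gammaN).

K = 100.0          # dilute-solution equilibrium constant (favors the folded state)
gamma_U = 10.0     # assumed activity coefficient of the expanded unfolded state
gamma_N = 1.5      # assumed activity coefficient of the compact folded state

ratio_dilute = K                           # ideal (dilute) conditions
ratio_crowded = K * gamma_U / gamma_N      # crowded conditions
print(f"CN/CU: dilute = {ratio_dilute:.0f}, crowded = {ratio_crowded:.0f}")
```

Even modest differences in the activity coefficients shift the population noticeably toward the folded state.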
Another example involves highly elongated filamentous structures like F-actin and microtubules that
form in the cell by assembly of large numbers of protein subunits. The behaviors of these kinds of
protein filaments in the cell are influenced strongly by crowding. The effects are probably myriad,
but one basic effect in such systems is the tendency towards alignment or bundling of filaments.
Without working out a sophisticated model, one can still get a sense of why this is the case. Consider
the alternative scenarios
where you have a system
with filaments that are
either mainly aligned vs
randomly oriented. Now
consider trying to add an
additional filament (which is
a way of sensing the activity
of a component). Which case
allows for easier addition? The situation is illustrated above, where you can see clearly that adding
additional filaments is easier if they are more aligned. In that sense you can see that the activity
coefficient should be lower in the aligned case, and so that case will be favored as crowding comes to
dominate. Of course we know that entropy will tend to drive such a system in the other direction,
towards random molecular orientations, but at some point the crowding effects will prevail and favor
alignment. The alignment and bundling of protein filaments is likely functionally important in the
cell.
In this lecture we detailed just two kinds of physical phenomena – ionic interactions in solution and
crowding effects – that give rise to non-ideal behavior. But biological systems are highly complex,
and indeed non-ideal behavior can arise in many different ways.
CHAPTER 5
Chemical Potential and Equilibrium in the Presence of Additional Forces
There are many instances, in both cellular and experimental laboratory settings, where molecules
experience additional forces that contribute to the energy they feel, thereby affecting the equilibrium
positions of the processes in which they are involved. We will consider some examples here, and
develop a general framework for modifying our previous equations for chemical potential in order
to handle these situations. The essence is to rewrite our equation for the chemical potential for some
molecular species in a way that adds in the relevant extra energy:
μi = μi0 + RT ln Ci + energy term
with the added term relating to the energy experienced by the molecular component in question, on
a per mole basis.
Osmotic pressure
Osmotic pressure is a familiar phenomenon. It has important roles in cellular function, and is also
the basis for laboratory measurements to study molecular behavior, though this was more common
in the past than it is now. As you will recall, osmotic pressure refers to a pressure difference that
must be exerted to prevent water from moving across a semi-permeable membrane from a side
where the overall solute concentration is low to where it is higher.
We can set up a system with two chambers separated by a
semi-permeable membrane (permeable to water but
nothing else). The equilibrium process in question is
therefore the transport of water from one side to the other.
We will add protein to side A, where it will be confined to
stay.
We can see right away that the concentration of water on
the two sides is never going to be equal, and we recall from
earlier discussions that the chemical potential is
determined by the standard chemical potential plus the concentration term. So the only way water
can be at equilibrium between the two sides is if there is an additional force that is different between
the two sides, in this case a pressure on side A preventing flow of water from right to left. We can
write the equilibrium situation for water as follows:
For the chemical potential of water on the B side,
μH2O,B = μ0H2O
For the A side,
μH2O,A = μ0H2O,A + RT ln XH2O,A + (pressure term)
(Note that we are using X instead of C for the concentration of H2O)
To complete this analysis we need to know how the chemical potential energy should change as a
function of pressure.
∂μ/∂P = ∂/∂P (∂G/∂n) = ∂/∂n (∂G/∂P) = ∂V/∂n = V̄
where V̄ is the molar volume (for water in this case). Then, dμ = V̄ dP
and ΔμH2O = V̄H2O ΔP = V̄H2O Π, where Π denotes the osmotic pressure difference.
Now, if we use the term V̄H2OΠ to take the place of the extra term in the chemical potential for water
on the A side in our previous equation, we can equate the chemical potentials for water on the two
sides and rearrange to see that:
RT ln XH2O,A + V̄H2OΠ = 0
This equation provides more insight if we manipulate it to obtain an expression in terms of the
concentration of the solute instead of the concentration of water. We let X2 be the mole fraction of
the solute. X2 = 1 – XH2O. Then substituting above, and noting from Taylor's expansion that ln(1 – X2) = –X2 – X2²/2 – … ≈ –X2, as long as X2 is small, we get
RT X2 ≈ V̄H2OΠ
Now switching from mole fraction to molar concentration by noting that at low concentration X2 ≈
C2·V̄H2O, we get
Π ≈ RTC2
In other words, the osmotic pressure is proportional to the molar concentration of solute molecules
present, making it an example of a ‘colligative’ property.
It is sometimes convenient to further modify the osmotic pressure equation to convert from molar
concentration C, to weight concentration, c. We often have a better way of knowing the weight
concentration of a protein in solution, for example from a spectroscopic absorbance measurement
that reports on the approximate number of peptide groups or particular amino acid groups present
rather than the number of polypeptide chains present. Also, the conversion from molar to weight
concentration introduces a molecular weight term, and as a result information about molecular
weight (of a protein or nucleic acid) can be obtained. The conversion follows from c = MC where
lower case c is the weight concentration (typically in g/L = mg/ml) and M is molecular weight
(typically in g/mol). We get,
Π ≈ RTc2/M
Therefore, measuring Π and knowing the weight concentration allows approximation of the molecular weight: M ≈ RTc2/Π. Osmotic pressure is no longer a common biochemistry laboratory
technique, but later we will discuss more common experimental methods for molecular weight
determination.
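To get a feel for the magnitudes, here is a small numerical sketch of extracting M from a measured osmotic pressure; the concentration and pressure values below are hypothetical:

```python
# Estimate molecular weight from osmotic pressure via M = R*T*c2/pi.
# Working in SI units: c2 in kg/m^3 (numerically equal to g/L) gives M in kg/mol.

R = 8.314        # gas constant, J/(mol*K)
T = 298.0        # temperature, K
c2 = 5.0         # weight concentration: 5 g/L = 5 kg/m^3 (hypothetical)
pi_osm = 248.0   # measured osmotic pressure, Pa (hypothetical)

M = R * T * c2 / pi_osm            # kg/mol
print(f"M = {M*1000:.0f} g/mol")   # roughly a 50 kDa protein
```

Note how small the pressure is (a few hundred Pa, or a few cm of water) even for a fairly concentrated protein solution, which is one reason the method fell out of routine use.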
Osmotic pressure measurements are sometimes used to examine non-ideal effects in solution.
Modifying our previous equation to allow for non-ideal effects (and also realizing that other
approximations were introduced by truncation
of the Taylor’s expansion), we can write an
expression for osmotic pressure as follows:
Π ≈ RT (c2/M + B c2² + …)
In this expression, B captures the first order non-
ideality and is referred to as the second virial
coefficient. One way of extracting B from
measurement of osmotic pressure as a function
of solute concentration is illustrated here.
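The extraction rests on dividing the expression by c2: Π/c2 = RT(1/M + B·c2), so a plot of Π/c2 vs c2 has intercept RT/M and slope RT·B. A minimal sketch with synthetic data (the values of M and B are assumed, in consistent SI units):

```python
import numpy as np

# Fit M and the second virial coefficient B from pi/c2 vs c2,
# using pi/c2 = R*T*(1/M + B*c2). Data synthesized from assumed values.

R, T = 8.314, 298.0
M_true, B_true = 50.0, 1.0e-4              # kg/mol and mol*m^3/kg^2 (assumed)
c2 = np.array([1.0, 2.0, 4.0, 6.0, 8.0])   # kg/m^3
pi_osm = R * T * (c2 / M_true + B_true * c2**2)

# Linear fit of pi/c2 vs c2: slope = R*T*B, intercept = R*T/M
slope, intercept = np.polyfit(c2, pi_osm / c2, 1)
M_fit = R * T / intercept
B_fit = slope / (R * T)
print(f"M = {M_fit:.1f} kg/mol, B = {B_fit:.1e}")
```

With real data the scatter and any higher-order terms would limit the precision of B, but the plot construction is the same.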
Equilibrium sedimentation
We don’t often think about the effect of gravitational force on molecules in solution. But putting a
sample in a centrifuge is essentially the same as increasing the force of gravity (sometimes by a factor
of tens of thousands). If the solute molecule has a large mass (as do proteins and nucleic acids), then
these forces can have significant effects. Centrifugation is a widely used laboratory technique, and is
used in various modes for different purposes. Here we will consider a particular kind of
centrifugation experiment that is powerful for studying the molecular weights of macromolecules in
solution in a way that preserves their native conformations and assembly states.
We begin by thinking about what we expect to happen during centrifugation (or even under simple
gravitational forces) in two limiting cases: (1) where a solute is very small (e.g. ethanol or sucrose)
or (2) when the particle in question is massive (like a cell or a sand particle). If the solute is very
small, then we can spin the sample forever and the concentration of the solute will be uniform,
essentially equal throughout the tube, from the top to the bottom. Schematically, the result can be
diagrammed as shown, where r is the variable describing distance from the axis of rotation. [We draw
the tube and the position variable r horizontal since the axis of rotation is vertical.]
At the other extreme, if the particle in question is massive, then virtually all of it will go to the bottom,
and the concentration will be nearly zero everywhere else.
But what if the solute is intermediate in mass? Then we should expect its concentration profile to
somehow be in-between the two extreme cases illustrated: not uniform throughout the sample, but
also not completely sedimented to the bottom. In other words, we should get a higher concentration
towards the bottom and a lower concentration towards the top. And this situation should be stable,
meaning we can spin it forever and this is the final equilibrium result.
The idea is schematized above. But what is the exact form we expect for this curve? Surely it must
depend on the mass of the molecule, so how might we extract a value for the mass from the
equilibrium sedimentation behavior?
Qualitatively we can see that this is a situation of forces in balance. We end up with a concentration
that is unequal (higher towards the bottom), and we know that there must be an entropic driving
force in the opposite direction, favoring a more equal distribution. This is a balancing force that acts
against the gravitational or centrifugal force that is driving molecules towards the bottom of the tube.
This is a situation at equilibrium, so we can treat the problem with our general approach of setting
up a chemical potential equation that contains an extra energy term relating to work done by an
external force.
Imagine a solute molecule that is free to move between two positions in a tube. At equilibrium, the
chemical potential for the solute at those two positions must be equal (otherwise there would be
further net transport). So, we will solve our problem by requiring that dμ/dr = 0. But first we write
an equation for how we expect the chemical potential to depend jointly on concentration and
position in the tube, since we are ultimately interested in establishing how concentration and
position are related to each other.
µ = µ0 + RT ln C + U
where U can be viewed as a potential energy, on a per mole basis, that relates to movement of a solute in
the applied gravitational or centrifugal field.
Then, dμ/dr, which must be zero at equilibrium, is
dμ/dr = 0 = RT d(ln C)/dr + dU/dr
Generally, force F is the negative of the derivative of potential energy with respect to position, F = -
dU/dr, so rearranging we get
RT d(ln C)/dr = F
Now we can introduce the dependence on mass, since F=ma, where m is mass and a is acceleration.
But before proceeding with the equation above we have to expand on the meaning of the mass m in
the context of the current problem. What matters here is not simply the mass of the solute, but the
‘buoyant’ mass, meaning the difference between its mass and the mass of the water it displaces, which
of course depends on its volume. Also, to be consistent with the energy equation we need to work
out the relevant mass equation in per mole terms. The mass we need in our equation above is:
NA(m – v·ρH2O), where v is the volume of one solute molecule and ρH2O is the density of water.
We can replace the volume of one molecule with its 'specific volume' v̄ (which is volume per mass, or really just the reciprocal of density), times its mass. Including a subscript 2 to make it clear that the specific volume refers to the solute and not the solvent, we get:
NA(m – mv̄2ρH2O) = NAm(1 – v̄2ρH2O) = M(1 – v̄2ρH2O) (where M is the molecular weight of the solute)
The unitless term (1 – v̄2ρH2O) is referred to as the 'density increment' and is sometimes replaced with a single variable φ2 for simplicity. Note that if the solute is composed of material whose density is greater than that of water, which is true for proteins and nucleic acids (but not lipids), then v̄2ρH2O will be less than 1, and φ2 will be greater than 0.
Using this expression (Mφ2) for the buoyant mass on a per mole basis in our F=ma equation, we get F = Mφ2a.
Before returning to our equation that balanced the concentration gradient in the tube with force, we
point out that there are two different kinds of problems where these equations are useful, (1) where
the force is simply gravitational (in which case a = g), and (2) where we are doing centrifugation (in which case a = ω²r, from introductory physics, with ω representing angular velocity; also recall that ω = rpm·2π/60).
We will proceed to work out the equilibrium situation for centrifugation. Substituting F = Mφ2a = Mφ2ω²r into our previous equation for balanced forces, we get
RT d(ln C)/dr = Mφ2ω²r
d(ln C)/dr = Mφ2ω²r/RT    Then separating the derivative variables gives
d(ln C) = (Mφ2ω²/RT) r dr    And integrating gives
ln C |C0→C = (Mφ2ω²/2RT) r² |r0→r
ln C – ln C0 = ln(C/C0) = (Mφ2ω²/2RT)(r² – r0²)
or
C/C0 = e^((Mφ2ω²/2RT)(r² – r0²))
where r0 refers to some reference position in the tube and C0 refers to the concentration at that
position.
By matching the equation for ln(C) above to the standard form for a linear equation (y=mx+b), you
can see that plotting log of concentration with respect to the square of the position in the tube (i.e.
distance from the axis of rotation) should theoretically give a straight line whose slope is Mφ2ω²/2RT,
from which the value of M can be calculated, since the other variables represent known quantities. A
schematic diagram is shown.
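A small numerical sketch of this procedure (all parameter values are illustrative): synthesize an ideal ln C vs r² profile for an assumed M, then recover M from the fitted slope.

```python
import numpy as np

# Recover M from equilibrium sedimentation: ln(C) vs r^2 is linear with
# slope M*phi2*omega^2/(2*R*T). All numbers below are illustrative.

R, T = 8.314, 298.0
phi2 = 1.0 - 0.73                  # density increment for vbar = 0.73 mL/g, rho = 1 g/mL
omega = 10000 * 2 * np.pi / 60     # 10,000 rpm -> rad/s
M_true = 50.0                      # kg/mol

r = np.linspace(0.060, 0.070, 6)   # radial positions in the cell, m
C = np.exp(M_true * phi2 * omega**2 * (r**2 - r[0]**2) / (2 * R * T))  # C/C0

slope = np.polyfit(r**2, np.log(C), 1)[0]      # slope of ln(C) vs r^2
M_fit = 2 * R * T * slope / (phi2 * omega**2)  # invert the slope expression
print(f"M = {M_fit:.1f} kg/mol")
```

In practice the choice of rotor speed matters: too fast and everything piles up at the bottom, too slow and the gradient is too shallow to fit reliably.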
Furthermore, note that because weight
concentration (c) is proportional to molar
concentration (C), ln(c) differs from ln(C) only by an
additive factor. That means that you can plot ln(c)
vs r2, and the slope will be the same as above. This
is useful because the weight concentration of a
protein or nucleic acid sample is typically the easier
quantity to establish from a routine spectroscopic
measurement.
Note that in comparison to some other methods that
you might be familiar with for determining
molecular weights of proteins – e.g. SDS polyacrylamide gel electrophoresis – equilibrium
sedimentation keeps proteins in their native forms, including potential assemblies of multiple
subunits. It is therefore very useful for getting information about the oligomeric states of proteins,
i.e. whether they are dimers or trimers or larger species in solution.
Gravitational sedimentation
If the acceleration on a sample is due simply to gravity instead of centrifugal acceleration, then
instead of a = ω²r, we simply have a = g.
With analogy to the previous equations, we get
RT d(ln C)/dr = Mφ2g
Rearranging and integrating gives
ln C – ln C0 = ln(C/C0) = (Mφ2g/RT)(r – r0)
or
C/C0 = e^((Mφ2g/RT)(r – r0))
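Plugging in numbers is instructive: the concentration changes e-fold over a characteristic length RT/(Mφ2g). A quick sketch with illustrative masses (and an assumed density increment applied to all three for simplicity):

```python
# Characteristic length RT/(M*phi2*g) over which the equilibrium concentration
# changes e-fold under gravity alone. Masses and phi2 are illustrative.

R, T, g = 8.314, 298.0, 9.81
phi2 = 0.27   # assumed density increment

for name, M in [("small solute, M = 0.34 kg/mol", 0.34),
                ("50 kDa protein, M = 50 kg/mol", 50.0),
                ("large particle, M = 5e4 kg/mol", 5.0e4)]:
    h = R * T / (M * phi2 * g)   # e-fold length, meters
    print(f"{name}: e-fold length = {h:.3g} m")
```

This is why gravity alone hardly fractionates small solutes or even proteins in a test tube, while very large particles settle visibly over centimeters.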
Equilibrium sedimentation of a mixture
If a sample contains more than one type of macromolecular solute (e.g. two different proteins), then
its sedimentation behavior will be more complex. Each component will behave exactly as described
above, but if the multiple components have different masses, then their concentrations will increase
to different degrees as a function of position in the tube. As a result, if you were only able to measure
the total concentration of protein as a function of position, which would be the case if you were
relying on a typical spectroscopic reading, then your concentration profile would have an unusual
behavior that could not be fit to the equations we worked out above. The resulting concentration
profile would not match what you would expect for any choice of molecular weight for a single
component. That is, a plot of ln(c) vs r2 will not be straight, but curved.
Let’s look more specifically at how that plot would look. The slope of the curve should obey slope = d(ln(c))/d(r²) = Mφ2ω²/2RT. Now rewrite this in terms of d(c) instead of d(ln(c)) by noting that d(ln(c)) = (1/c)d(c). That gives d(c)/d(r²) = Mcφ2ω²/2RT. Now if the sample is a mixture and our
measurement is of the total weight concentration, then we can write an equation for the behavior of
the total weight concentration as a sum over the components:
dcT/d(r²) = d(Σi ci)/d(r²) = Σi dci/d(r²) = (φ2ω²/2RT) Σi Mici
Dividing by cT on both sides and then absorbing the cT in the denominator on the left into the
derivative of the log gives
d(ln cT)/d(r²) = (φ2ω²/2RT)(Σi Mici / cT)
We can compare the complicated equation above to the simpler equation we had before for a single
pure component. That previous equation, after rearranging the terms a bit to match the form above,
was:
d(ln c)/d(r²) = (φ2ω²/2RT) M
We can see that the equation for the slope of the curve of the log of the concentration of a mixture
matches the equation for the single component case, except that in the case of a mixture the molecular
weight M has been replaced with a term that gives a kind of average of the molecular weights of all
the components present, accounting for their concentrations in weight terms. Evidently, the slope of
the curve for the case of a mixture gives (after dividing by φ2ω²/(2RT)) an effective molecular weight,
Meff given by:
Meff = (Σi Mici) / cT
This is an example of a ‘weight-average’ molecular weight, since each component gets included
according to its weight concentration (as opposed to its molar concentration).
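As a quick illustration, the weight-average for a hypothetical monomer/dimer mixture:

```python
# Weight-average molecular weight, Meff = sum(Mi*ci)/cT.
# Hypothetical mixture: monomer (50 kg/mol) and dimer (100 kg/mol).

M = [50.0, 100.0]   # kg/mol
c = [0.6, 0.4]      # weight concentrations, g/L

M_eff = sum(Mi * ci for Mi, ci in zip(M, c)) / sum(c)
print(f"Meff = {M_eff:.1f} kg/mol")   # 70.0, between monomer and dimer
```

A number-average (weighting by molar concentration) would come out lower, since the lighter monomer contributes more molecules per gram.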
Clearly, extracting the molecular weights of multiple components from the equilibrium
sedimentation behavior of a mixture is a complicated challenge. We will not examine that problem
in any more detail, but some essential points can be made about the overall behavior. How would
the overall shape of a plot of ln(c) vs r2 look for a mixture compared to a pure component? We have
already established that it should not be straight, and that the slope should reflect the effective
molecular weight. Is the effective molecular weight greater or smaller as we move farther down the
tube? As we move to higher values of r (e.g. towards the bottom of the tube), the relative proportion
of the heavier components should be
greater since their concentrations
increase more rapidly with increasing
r. That means that the slope of the
curve, which is proportional to Meff,
should be higher at higher r. This
corresponds to upward curvature. The
plot on the right shows the result from
an equilibrium sedimentation run for a
protein assembly composed of 12
subunits. This result was interpreted
to indicate that the 12-subunit
assembly is in equilibrium with other
smaller subassemblies.
If a sample behaves like a mixture rather than a pure component, it may mean that we were
unsuccessful at purifying the desired component prior to the centrifugation experiment, so that
contaminants remained. But there is another possibility that occurs often. Many proteins associate
naturally into oligomeric forms, like dimers for example. At typical concentrations, a protein could
be at equilibrium between a monomer and a dimer form. What would happen then in an equilibrium
sedimentation experiment? This is clearly a case of a mixture, so we will see behavior like that
detailed above. Towards the bottom of the tube, the relative proportion of the dimer will be higher
than at the top. If the monomer and dimer are at equilibrium, then you might wonder how they could
be at equilibrium both at the top of the tube and at the bottom if the relative proportion of monomer
to dimer is different at the two positions. But if the problem is worked out in detail, one sees that
this is exactly as expected: the overall concentration of protein is higher at the bottom, and this
naturally gives a higher proportion of dimer at equilibrium. In other words, the entire system is able
to reach equilibrium both with respect to monomer-dimer association and with respect to the
dependence of concentration on position in response to the external centrifugal force.
Effects of non-ideal behavior
If a solute exhibits non-ideal behavior, then we might also observe deviation from the equations we
worked out above for the equilibrium concentration as a function of position. Specifically, we should
still expect a plot of log of activity of the solute to be linear when plotted as a function of r2, but the
concentration may be different from the activity.
Summary
In this chapter we examined the behavior of systems where molecules are under the influence of
external forces. We provided a general strategy for modifying the chemical potential equations, and
worked out the details for two situations: osmotic pressure and equilibrium sedimentation. These
are just two examples among many different ways that external forces come into play in biochemical
systems.
CHAPTER 6
Electrostatic potential energy, ion transport, and membrane potentials
In previous lectures we covered scenarios where molecules were subject to specific forces. In this
lecture we will look at ions that are subject to forces arising from voltage or electrostatic potential
differences. We discussed ionic interactions earlier in a different context, where we dealt with the
energy an ion experiences from being in solution with other ions around it. In this chapter we will
deal with electrostatic interactions in a different context. We will consider how ions are distributed
in space as a result of an electrostatic potential (i.e. a voltage) that is different at different locations.
A main focus will be on situations where the electrostatic potential is different on the two sides of a
semi-permeable membrane. This has wide applications to molecular biology and electrophysiology.
The chemical potential energy of an ion at a position of electrostatic potential
We need to know what energy to associate with a charge residing at a particular electrostatic
potential (which is a voltage). You’ll recall from introductory physics that the work required (or
potential energy generated) in moving a charge q to an electrostatic potential is U = q. For our
purposes we need energy on a per mole basis. You’ll recall that the charge on a mole of elementary
particles is NA·e = 6.02×10²³ × 1.6×10⁻¹⁹ C ≈ 96,500 C, which is defined as one Faraday, F. That
means if we are considering a particular kind of ion whose valence charge is z (e.g. zCa2+ = 2), then the
charge q on a mole of those ions will be q=zF. Finally, the potential energy gained by putting that
charge at an electrostatic potential ψ (on a per mole basis) is U = zFψ. From this we can write our
equation for chemical potential energy in the presence of an electrostatic potential:
μi = μi0 + RT ln Ci + ziFψ
The Nernst equation and membrane potential
Suppose we have a system with two chambers separated by a
membrane that is permeable to the ionic species in question. Our
interest here is in situations where the electrostatic potential is
different on side A vs side B, that is ψA ≠ ψB. The equilibrium process
of interest is the transfer of ionic species i from side A to side B. From
before we know that this means μi,A = μi,B.
The separate equations for the chemical potential of species i on the
two sides, taking into account electrostatic energies, would be:
μi,A = μ0i,A + RT ln Ci,A + ziFψA
and
μi,B = μ0i,B + RT ln Ci,B + ziFψB
Setting the chemical potentials equal to each other and rearranging (the standard chemical potentials cancel) gives:
Δψ = ψB – ψA = (RT/ziF) ln(Ci,A/Ci,B)
This is one form of the Nernst equation. It tells us that the electrostatic potential difference between
the two sides is related to the log of the concentrations of the ion on the two sides (assuming that the
ion is free to reach equilibrium across the membrane). Note the effect of the sign of the charge, z. A
negation of z reverses the effect. Consider first a case where z is positive (e.g. Na+ ions). The equation
tells us that if the potential is higher on side B (here meaning Δψ > 0), then the concentration of
positively charged ions will be higher on side A. The reverse is true for a negatively charged ion that
is free to equilibrate; it would be more concentrated on side B if the potential is more positive on that
side. At first this might seem backwards. How can the potential be higher on the right if the positively
charged ions are more abundant on the left? The short answer is that it is important to keep in mind
that the ions here are responding to an electrostatic potential that exists in the system. That is, the
unequal concentration of ions is an effect, not the cause, of the potential difference here.
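As a worked number in the sign convention used here, Δψ = ψB – ψA = (RT/zF) ln(CA/CB); the concentrations below are illustrative, chosen to resemble K+ across a cell membrane:

```python
import math

# Nernst equation in the convention used in this chapter:
# delta_psi = psi_B - psi_A = (R*T/(z*F)) * ln(C_A / C_B).
# Illustrative K+ concentrations: 5 mM on side A, 140 mM on side B.

RT_over_F = 0.0256         # volts, near room temperature
z = 1                      # valence of K+
C_A, C_B = 5.0, 140.0      # mM (only the ratio matters)

delta_psi = (RT_over_F / z) * math.log(C_A / C_B)
print(f"delta_psi = {delta_psi*1000:.1f} mV")   # about -85 mV
```

Note the sign: the side with the higher K+ concentration (B) sits at the more negative potential, consistent with the discussion above.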
It is instructive to note that the voltage difference Δψ in the equation does not carry a subscript for
the ion or its charge. That means that if there are multiple charged species in the system that are at
equilibrium between the two sides, then the ratios of their concentrations must give the same value
for Δψ. Evidently the concentration ratios for different ions must be related to each other. We can
write out two versions of the equation above, one for ion i and the other for ion j, and then equate the
two potentials to give:
ln(Ci,B/Ci,A)/zi = ln(Cj,B/Cj,A)/zj and (Ci,B/Ci,A)^(1/zi) = (Cj,B/Cj,A)^(1/zj)
As an example, if Na+ and Cl- ions are both at equilibrium between the two sides, then
(CNa+,B/CNa+,A) = (CCl–,A/CCl–,B) and (CNa+,B·CCl–,B) = (CNa+,A·CCl–,A). The equation would be
more complex if the charges were not plus or minus one, but in this simple case the product of sodium
and chloride ions on the two sides is equal at equilibrium. This result will be convenient in a
calculation shortly.
The previous equations describe equilibrium conditions. Away from equilibrium, the free energy on
a per mole basis for ion transport between positions A and B, where the ion concentrations and
electrostatic potentials may both be different, would be:
ΔG = RT ln(CB/CA) + zF(ψB – ψA)
The Donnan potential
So far we have discussed how ions that can equilibrate are driven to unequal concentrations
depending on the electrostatic potential difference that exists between two positions. But what might
be the source of the electrostatic potential difference? We already discussed that it is not caused by
the unequal distribution of ions that are able to equilibrate – their distribution is in the opposing
direction. One possibility would be an external applied voltage with electrodes on the two sides.
Problems based on those sorts of electrochemical cells are typically discussed in introductory
chemistry courses. But electrostatic potential differences – between the outside and inside of a cell
for example – are a common subject in cellular and biochemical systems, and in those cases an
external battery voltage is rarely the origin of the electrostatic potential.
Here we illustrate a highly simplified
system that shows how an unequal
distribution of an ion that cannot cross a
membrane can give rise to an electrostatic
potential. This is called a Donnan
potential. We begin with a simple two-
chamber system like before, but now we
put a protein molecule on side A only, and
assume it has a negative charge.
Counterions would also be present so we’ll
begin the setup with an equi-molar
concentration of protein- and Na+ ions on
the A side. We’ll denote this starting
concentration as x. Now in addition let’s
say that a certain amount of salt (NaCl) is
added to both sides to start. Call this
concentration s. Now let’s say that the Na+
and Cl– ions can cross the membrane but the protein cannot. What happens? We can answer that
question by supposing that some quantity of Na+ and Cl– crosses the boundary in order to reach
equilibrium; the amounts of Na+ and Cl– that cross should be equal in order to maintain
electroneutrality. Let’s assume the volumes of the two sides are equal for simplicity, so that the molar
concentration change as a result of Na+ and Cl– movement is the same on both sides, and call that value
d (plus d on the right, minus d on the left). Now we can establish the concentrations at equilibrium
by using the equation we worked out earlier that told us the product of Na+ and Cl– concentrations
must be the same on the A side and the B side at equilibrium, assuming they are both free to
equilibrate.
We get:
(x+s-d)*(s-d) = (s+d)*(s+d)
which, after expanding and cancelling the s² and d² terms, simplifies to
sx - dx = 4sd
The values of x and s are fixed quantities related to the initial concentrations. Solving for the desired
quantity d gives:
d = sx/(x+4s)
Having solved for d, we can write out expressions for the final concentrations of the Na+ and Cl– ions.
From there we can ask whether there is an electrostatic potential difference by applying the Nernst
equation to the concentration of ions on the two sides. As discussed before we should get the same
answer regardless of whether we examine the Na+ or Cl– ions since both are able to equilibrate.
Taking the Na+ ions, we get:
Δψ = ψB - ψA = RT/(Fzi) ln(CNa+,A/CNa+,B) where z for Na+ is +1
Plugging in the expression for d at equilibrium, the argument in the log function is
(x+ s - sx/(x+4s))/(s + sx/(x+4s)) which simplifies to (x+2s)/(2s). So,
Δψ = ψB - ψA = RT/(Fzi) ln((x+2s)/(2s))
This equation is very specific for the way we set up the problem, so it doesn’t represent a general
finding, but it does let us evaluate the electrostatic potential under a given set of initial conditions.
Suppose for example that the concentration of salt added (s) was equal to the molar concentration of
the protein (x). Then the argument to the log function is simply (x+2s)/(2s) = 3/2. The value of RT/F
near room temperature is 0.0256 V (or 25.6 millivolts), which is a useful simplification worth
remembering. Finally, we get
Δψ = 0.0256 V * ln(3/2) ≈ 0.010 V = 10 mV
Note that the way we defined Δψ means that the potential is higher on the B side compared to the A
side. Where did this voltage come from (since we didn’t apply an external voltage)? It comes from
the charge on the species (the protein in this case) that is confined to one side. Note that the protein,
which we took to have a negative charge, is generating a negative potential on the side where it
resides.
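The algebra above can be checked numerically; a short sketch reproducing the s = x case, and confirming that the Na+ and Cl– ions report the same potential:

```python
import math

# Check the Donnan example: protein (charge -1) at concentration x on side A,
# salt s added to both sides; d = s*x/(x+4s) crosses from A to B at equilibrium.

RT_over_F = 0.0256      # volts
x, s = 1.0, 1.0         # the s = x case from the text (arbitrary concentration units)

d = s * x / (x + 4 * s)
Na_A, Na_B = x + s - d, s + d    # Na+ concentrations on the two sides
Cl_A, Cl_B = s - d, s + d        # Cl- concentrations on the two sides

# Both equilibrating ions must give the same psi_B - psi_A
dpsi_Na = (RT_over_F / +1) * math.log(Na_A / Na_B)
dpsi_Cl = (RT_over_F / -1) * math.log(Cl_A / Cl_B)
print(f"Na+: {dpsi_Na*1000:.1f} mV, Cl-: {dpsi_Cl*1000:.1f} mV")   # both ~10 mV
```

Raising s relative to x in this sketch drives both values toward zero, the excess-salt limit discussed below.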
The situation is actually a bit more complicated. For example, how could there be a voltage if we assumed electroneutrality on the two sides, since a voltage difference implies some charge separation? This is a fair
objection, but the energy associated with macroscopic charge separation is very high, so while there
would in fact be a small amount of charge separation creating net charge on the two sides, that minor
charge imbalance would not affect the ion concentrations significantly. Evidently, very slightly more
Na+ than Cl- would cross from left to right and this would give a slight charge separation with net
negative charge resulting on the left, consistent with the negative voltage on the left.
Another point of interest is to look at what would happen in a system like this if we were to add
excess salt, that is s >> x. In that case, the argument of the log function from above ((x+2s)/(2s))
approaches 1, and the log goes to 0, so Δψ ≈ 0. This shows that the Donnan potential goes away if
excess ions are present that can equilibrate freely between the two sides.
Variable ion permeabilities and complex phenomena
In our previous discussions we treated simplified situations where different ions were either
completely free to equilibrate or totally unable to permeate the membrane. This was helpful in
gaining intuition about what drives the creation of electrostatic potentials, but relevant biological
scenarios are much more complicated. The membrane has very different degrees of permeability to
different ions. Furthermore, the distributions of ions across a cell membrane do not reflect
equilibrium ratios but are instead the result of a steady state process (or even a dynamic process
changing over time). Depending on their permeabilities, the ions are flowing down their chemical
(or electrochemical) gradients at the same time that transmembrane protein pumps are continuing
to transport them against those gradients. Ion permeabilities are therefore fundamental to
understanding the potential across the biological membrane. For example, it is changes in the
permeability of the membrane for certain ions that drives changes in the membrane potential during
nerve conduction. How can the cell membrane have different (and controllable) permeabilities to
different ions? Transmembrane protein channels provide the answer. They can be highly specific
for certain ions. And whether they are in open or closed conformations can be controlled by ligand
binding or other phenomena, including things like pressure.
A simplified scheme at the right illustrates the concentration gradients of Na+ and K+ across a
typical cell. These gradients are created at the expense of energy input (e.g. ATP hydrolysis). The
inward pumping of K+ and the outward pumping of Na+ results in K+ being higher inside the cell and
Na+ being higher outside the cell. Now, if the membrane is more permeable to K+ ions than Na+ ions,
by virtue of a potassium channel for example, then we can understand the resulting membrane
potential. [The membrane potential in a typical cell is in the range of -40 to -80 mV, meaning the
inside of the cell has a negative potential.] One thing we learned was that a Donnan potential is
created by ions that can’t cross the membrane (or that cross very slowly). In the cellular scheme
here, the Na+ ions are the ones least able to cross (since the K+ channel doesn’t allow Na+ ions to pass),
and the higher concentration of this species outside the cell is consistent with the outside having the
positive potential. Another way of looking at it is in terms of the net charge separation that would
occur across the membrane. The gradients for Na+ and K+ are in different directions. Na+ ions are
trying to move back into the cell as fast as they can while K+ ions are trying to exit across the
membrane as fast as they can. But owing to the K+ channel, K+ ions exit more easily than Na+ ions
enter, thereby creating a small charge separation with more positive charge on the outside, again
consistent with the correct sign of the voltage across the cell. Note that the concentration gradients
of the ions in this scenario cannot simply be used to evaluate the membrane potential because
the ions are not reaching equilibrium between the two sides; their concentration gradients reflect
the activity of membrane pumps.
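The qualitative picture above, a potential set jointly by the ion gradients and the relative permeabilities, is commonly summarized by the Goldman–Hodgkin–Katz equation (not derived here). A sketch restricted to K+ and Na+, using assumed, typical mammalian concentrations and a K+-dominated permeability ratio:

```python
import math

def ghk_potential_mV(P_K, P_Na, K_out, K_in, Na_out, Na_in, T=310.0):
    """Goldman-Hodgkin-Katz potential (inside minus outside, mV) for K+ and Na+:
    Vm = (RT/F) * ln((P_K*K_out + P_Na*Na_out) / (P_K*K_in + P_Na*Na_in))."""
    RT_over_F_mV = 8.314 * T / 96485.0 * 1000.0  # ~26.7 mV at 310 K
    return RT_over_F_mV * math.log((P_K * K_out + P_Na * Na_out) /
                                   (P_K * K_in + P_Na * Na_in))

# Assumed typical concentrations (mM): K+ 5 out / 140 in; Na+ 145 out / 10 in.
# Membrane taken to be 25x more permeable to K+ than Na+:
Vm = ghk_potential_mV(P_K=1.0, P_Na=0.04, K_out=5, K_in=140, Na_out=145, Na_in=10)
print(round(Vm, 1))  # falls in the -40 to -80 mV range quoted in the text
```

Raising P_Na in this sketch (mimicking the opening of sodium channels) drives Vm toward 0, the depolarization described below.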
Finally, you can see from the preceding arguments how changing the ion permeabilities would affect
the membrane potential. Those permeabilities are controlled by opening and closing, or ‘gating’ of
membrane channels. A simplified description of how nerve conduction depends on ion
permeabilities goes something like this: a neuron with a negative resting potential receives a signal
that causes Na+ channels in the membrane to open; according to our earlier discussions, this reduces
the membrane potential (i.e. raising it closer to 0); this depolarizing voltage change is conducted
down the length of the axon like an electrical current down a wire; at the axon terminal, this
depolarization across the membrane causes Ca2+ channels in the axon terminal to open up; the resting
Ca2+ concentration is higher in the synaptic space between neurons, so Ca2+ ions flow into the axon
terminal; the increasing Ca2+ concentration inside the axon terminal triggers the fusion of synaptic
vesicles with the inner membrane of the axon terminal, releasing the enclosed neurotransmitter into
the synaptic space; the neurotransmitter diffuses across the synaptic cleft and binds to receptors on
the adjacent cell (e.g. a muscle cell or another nerve cell); depending on the cell receiving the signal,
binding of the neurotransmitter to the receptor may open up a sodium channel on the next neuron
to propagate the electric signal, or cause some other event, like muscle contraction or sensory
signaling.
Molecular Electrostatics
We will continue with our discussion of electrostatics, focusing here on the forces they exert and the
effects they have on macromolecules and their conformations. We are familiar with Coulomb’s law,
which tells us that the force between two charges goes as the product of the two charges divided by
r squared.
F = q₁q₂ / (εr²)
This form of the equation applies when using cgs units.
[The SI form of Coulomb’s law includes an extra factor of 4πε₀ in the denominator, which is dropped
from the cgs equation by having it absorbed into the definition of the cgs electrostatic unit of charge.]
A related equation for potential energy (U) can be obtained by taking the force equation and
integrating over r and negating (since F=-dU/dr) to give:
U = q₁q₂ / (εr)
In addition, recalling that the energy for placing a charge q at a potential Φ is U = qΦ, we can see that
the equation above implies that a single charge creates a potential around it given by

Φ = q / (εr)
The dielectric value
The equations above for electrostatic forces and energies are likely familiar, but what is sometimes
overlooked is the importance of the medium in which the interactions take place. This is captured
by the dielectric value 𝜖, which occurs in the denominator. Roughly speaking, the dielectric describes
how polarizable the medium is. For a vacuum, 𝜖=1, which is why the term is sometimes dropped in
the equations above, for example in introductory physics problems. But it is vital for biochemical
situations. The dielectric value for water is around 78! That means electrostatic energy calculations
that take place in aqueous solutions may be off by nearly two orders of magnitude if the dielectric
value is not handled properly. The extremely high dielectric value for water relates to its large dipole
moment. Water molecules in an electric field tend to orient themselves in a way that gives the lowest
energy, i.e. with the oxygen atom pointing in the direction opposite of the electric field vector. The
effect is to diminish or screen the net electrostatic force.
The dielectric value in less polar materials is much lower than in water. For hydrocarbons (which
serve as a model for the interior of a lipid membrane) the dielectric value is between about 2 and 4.
As we will discuss later, charged amino acids are important in protein structure, and so the value of
the dielectric for a protein molecule is an important (and long debated) issue. Values between 4 and
20 occur in the literature for the dielectric in the interior of a protein. For a charge that resides on
the surface of a protein, exposed to water, the relevant value is probably close to that for pure water.
Simplified electrostatics equations
The equations above are clumsy to apply unless you remember what the value is for an elementary
charge in cgs electrostatic units. Instead it is convenient to convert the equations to forms that can
be applied more easily, using integer values, z, for the charges (e.g. z=1 for Na+). Simplified equations
that apply near room temperature are:
U = z₁z₂ / (εr) · 1389 kJ/mol (where r must be in Angstroms)

and

Φ = z / (εr) · 14.4 Volts (where r must be in Angstroms)
Examples:
1) How much energy does it take to bring two Na+ ions from a starting distance of infinity to a final
distance of 4 Å if the dielectric value is 78? Answer: 4.5 kJ/mol. Is this energy significant or not?
Recall RT ≈ 2.5 kJ/mol, so the magnitude of the effect would be exp(-4.5kJ/2.5kJ)=0.17.
2) How much energy might an ion-pair contribute to the stability of a folded protein? Suppose the
situation in question is an aspartate side chain that is 5 Å away from a lysine. Suppose the interaction
takes place near the protein surface where the high dielectric of water makes the effective dielectric
there about 40. Answer: -6.9 kJ/mol, and the magnitude of the effect on K would be
exp(+6.9kJ/2.5kJ)=16.
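Both examples follow directly from the simplified formula above; a quick sketch (the function name is mine):

```python
import math

RT = 2.5  # kJ/mol near room temperature

def coulomb_energy_kJ(z1, z2, eps, r_angstrom):
    """Simplified Coulomb energy: U = z1*z2/(eps*r) * 1389 kJ/mol, r in Angstroms."""
    return z1 * z2 / (eps * r_angstrom) * 1389.0

# Example 1: two Na+ ions brought to 4 A apart in water (eps = 78)
U1 = coulomb_energy_kJ(+1, +1, 78, 4.0)            # ~ +4.5 kJ/mol
print(round(U1, 1), round(math.exp(-U1 / RT), 2))  # effect ~ 0.17

# Example 2: Asp(-)...Lys(+) at 5 A with an effective eps of 40
U2 = coulomb_energy_kJ(-1, +1, 40, 5.0)            # ~ -6.9 kJ/mol
print(round(U2, 1), round(math.exp(-U2 / RT)))     # effect ~ 16
```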
A different kind of electrostatic energy: the Born ‘self-charging energy’
A powerful but underappreciated idea arises by considering a hypothetical process of creating a unit
charge out of infinitesimal charge elements, dq. [This is an example of a kind of ‘thought experiment’
referred to by physicists as a ‘gedanken experiment’; essentially an experiment that can only be
performed in one’s mind.]
From our previous discussions we know that the (differential) energy required to bring a (differential)
charge dq to a position where the potential is Φ is dU = Φ dq. Here, Φ is the potential at the place where
we are depositing the charge, which is at the surface of the ion being created (in our imagination). If the
radius of the ion is a, then from above we know the potential there is q/(εa), where q is whatever charge
has already been
deposited. We can obtain the hypothetical energy for creating this charge,
which is usually referred to as the Born self-charging energy, by
integrating our differential energy over q.
Born self-charging energy: U = ∫dU = ∫₀^q (q′/(εa)) dq′ = (1/2)(q²/(εa))

Again, this can be made more convenient:

Born self-charging energy: U = z² (1/(εa)) (1389/2) kJ/mol (where a must be in Angstroms)
The imaginary idea of creating a charge from nothing may seem silly at first, but it gives us a powerful
result relating to the energy for a (very real) process of transferring an ion between two locations
where the dielectric is different.
Free energy of ion transfer
As we discussed earlier, the cell is complex and the dielectric is different in different places. The low
dielectric of the lipid bilayer is particularly noteworthy, especially given the physiological importance
of ion passage through membranes. Let’s look then at the energy associated with transferring an ion
from aqueous solution into a lipid bilayer. [We’ll assume here that the energy can be considered as
a contribution to the free energy of the process.] We can think of the transfer process as a
composition of separate steps: reversing the imaginary ion creation process in the first medium, then
transferring the infinitesimal charges into the second medium (at no energy cost since they are
infinitesimal), and then recreating the charge in the second medium. Evidently, the transfer free
energy is just the difference between the energies required to create the charge in the two different
media. If the dielectric values for the two media are 𝜖1 and 𝜖2, then the free energy of ion transfer
from medium 1 to medium 2 would be:
ΔG_transfer = (z²/a) (1/ε₂ − 1/ε₁) (1389/2) kJ/mol
Things to note here are that z is squared, so the effect is the same for positive or negative ions. Second
is that the energy is positive (i.e. unfavorable) if the transfer is to a lower dielectric, as expected.
Third, note the dependence on the radius of the ion; the energy term is larger if the ion is small since
the charge is more localized. Lastly is the magnitude of the effect. Consider the free energy of
transferring a sodium ion from water into the middle of a lipid bilayer. Take 1 Å as an approximation
for the ion radius. Let the dielectric be 4 for the bilayer and 80 for the water. Under those
approximations, ΔG_transfer = 1²·(1/1 Å)·(1/4 − 1/80)·(1389/2) kJ/mol ≈ 165 kJ/mol. Is this big
or small? 165 kJ/mol = 66·RT! This is an enormous energy barrier. This exercise demonstrates the
virtual impermeability of a lipid bilayer to naked ions. What is going on at the atomic level that would
explain why an ion prefers so strongly to remain in water rather than in the lipid? In water (or another
material of high dielectric), the surrounding molecules rearrange themselves to interact with the ion
in ways that correspond to an overall favorable energy. This is not possible in the lipid bilayer (or
other medium of low dielectric). The same underlying idea is sometimes described in terms of the
energy required to desolvate an ion when it is moved into a non-polar environment.
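The transfer-energy calculation above is easy to reproduce; a minimal sketch:

```python
def born_transfer_kJ(z, a_angstrom, eps_from, eps_to):
    """Born free energy of ion transfer (kJ/mol):
    dG = (z**2 / a) * (1/eps_to - 1/eps_from) * 1389/2, with a in Angstroms."""
    return (z ** 2 / a_angstrom) * (1.0 / eps_to - 1.0 / eps_from) * 1389.0 / 2.0

# Na+ (radius ~1 A) from water (eps 80) into a bilayer interior (eps 4):
dG = born_transfer_kJ(z=1, a_angstrom=1.0, eps_from=80, eps_to=4)
print(round(dG))  # ~165 kJ/mol, about 66*RT
```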
Processes that generate, maintain and exploit ion gradients across membranes form the basis for
most of the energy conversions that occur in biology. This is only possible because ions cannot easily
cross the bilayer. The complex energy conversion processes in the cell are conducted by
transmembrane proteins (pumps, channels, and transporters) that reside in the bilayer.
CHAPTER 7
Energetics of Protein Folding
Proteins acquire their unique functions by folding up into specific three-dimensional structures. This
gives them the shapes and arrangements of chemical groups required to carry out their activities.
The details of how proteins manage to reach their correct shapes from a starting point of being
extruded in a more or less extended conformation from the ribosome, and what energetic features
stabilize their final configurations, are questions of considerable importance from a fundamental
perspective and for practical reasons as well. Numerous biotechnology and pharmaceutical
problems revolve around stabilizing enzymes or other types of proteins.
A balance between large opposing forces
You’ve learned before about simple molecules that have multiple conformations whose relative
energies dictate which is more preferred; the chair vs boat conformation for cyclohexane derivatives
is an example from organic chemistry. The problem of protein stability is considerably more
complex. One thing that makes the protein stability problem unique is the sheer size of the molecules
involved; thousands or tens of thousands of atoms are interacting with each other. Another
important consequence of the size and mainly linear covalent structure of protein molecules is the
vastness of the possible configurations each one could adopt in principle, by variation of the phi-psi
torsion angles along its backbone, not to mention the side chain conformations. The vastness of this
conformational space (practically all of which represents non-native configurations of the protein)
means that folding a protein into its native configuration comes at an enormous cost in terms of lost
entropy. This cost must be offset by very large numbers of favorable interactions between the
thousands of protein atoms in the natively folded conformation.
The arguments above paint a unique picture for protein energetics. The total net energy that
stabilizes a folded protein over its unfolded state is typically not very large; it arises as a relatively
small difference between very large energetic terms working in opposing directions. You can
imagine then that rather small changes to the amino acid sequence of a protein might offset this
balance, and indeed minor changes to the sequence of a protein often have surprisingly large (and
frequently unexpected) effects on protein stability and function.
As a rough numerical estimate, the conformational entropy lost in going from a flexible protein
backbone to a particular conformation is about 5 kcal/mol per amino acid residue, meaning 1000
kcal/mol for a 200 amino acid protein for example. [Confusingly, protein energetics are often
discussed in kcal instead of kJ; multiply by about 4.2 to convert from kcal to kJ.] In contrast, the net
stability (ΔG0) for the process of protein folding is much smaller, often in the range of -5 to -10
kcal/mol (unfolded → folded). This is fairly large compared to RT, so typical proteins have
stabilities that keep them nearly exclusively in their correctly folded configurations (at least under
the right conditions, though those aren’t always known or easy to replicate in vitro), but as noted
above, this net stability is small compared to the magnitudes of the opposing energetic terms that
must balance out in the end.
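The quoted net stabilities translate into very small unfolded fractions at equilibrium; a quick sketch (RT ≈ 0.6 kcal/mol near room temperature):

```python
import math

RT_kcal = 0.593  # kcal/mol near 298 K

def fraction_unfolded(dG0_kcal):
    """Fraction unfolded for a two-state U <-> N equilibrium with the given
    folding dG0 (kcal/mol): K = [N]/[U] = exp(-dG0/RT); X_U = 1/(1 + K)."""
    K = math.exp(-dG0_kcal / RT_kcal)
    return 1.0 / (1.0 + K)

# For net stabilities of -5 and -10 kcal/mol:
for dG0 in (-5.0, -10.0):
    print(dG0, fraction_unfolded(dG0))
```

Only a tiny fraction of molecules are unfolded at any instant, a point that matters for the measurement problem discussed later in the chapter.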
Terms that contribute to the energetics of protein folding
You are familiar already with the various forces involved in atomic interactions. Here we will
summarize how those forces relate to the particular problem of protein stability.
Electrostatics
We discussed charge-charge or ‘salt-bridge’ interactions earlier. They contribute to the stabilization
of proteins with magnitudes that are somewhat modest since they tend to occur near the surface of
proteins where the high dielectric of water reduces their strength. But they can be important,
particularly with the view that net protein stabilization has to come from the accumulation of many
smaller energetic contributions.
Following from our earlier discussions, the unfavorable energetics of putting a charge in a region of
lower dielectric also has important consequences for protein structure. Several of the natural amino
acids are charged (aspartate, glutamate, lysine, and arginine), so we expect to find those amino acids
almost exclusively on the exterior of a protein when it is in its correctly folded configuration. Charged
amino acids are occasionally found buried in the interior of a protein, but those are usually cases
where that particular amino acid is playing a critical role, for example in the catalytic cycle of an
enzyme; it is sometimes necessary for a protein to pay the cost of an unfavorable energetic feature in
order to achieve a required function.
The cost of burying a charge inside a protein also means that the natural pKa values of amino acids
can be significantly different compared to the textbook values that give the pKa value of amino acids
dissolved in water. Placement of a titratable group (i.e. a group that can gain or lose protons) in a low
dielectric shifts the equilibrium position towards the neutral form. Does that raise or lower the pKa
of a carboxylate group? What about the amino group of lysine? How much do you expect the pKa
values to change? [Hint: You know how pH and pKa values relate to concentration ratios: 1 unit equals
a factor of 10. And you know how energy differences affect equilibrium ratios: divide the energy by
RT and exponentiate. And you know how to estimate the energetic cost of burying a charge according
to our previous equations for ion transfer.]
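To make the hint concrete, here is a sketch of the estimate with assumed, illustrative numbers: a carboxylate of effective radius 2 Å moved from water (ε ≈ 80) into a protein interior with ε = 10. The Born transfer energy penalizes the charged form, and dividing that energy by 2.303·RT converts it into pKa units (raising the pKa of a carboxylate, lowering that of a lysine amino group):

```python
import math

RT = 2.5  # kJ/mol near room temperature

def born_transfer_kJ(z, a_angstrom, eps_from, eps_to):
    """Born free energy of ion transfer (kJ/mol), a in Angstroms."""
    return (z ** 2 / a_angstrom) * (1.0 / eps_to - 1.0 / eps_from) * 1389.0 / 2.0

# Assumed numbers: carboxylate radius ~2 A, water eps 80, interior eps 10
dG = born_transfer_kJ(z=-1, a_angstrom=2.0, eps_from=80, eps_to=10)

# Penalty on the charged form converted to pKa units: dG / (2.303 * RT)
delta_pKa = dG / (math.log(10) * RT)
print(round(dG, 1), round(delta_pKa, 1))  # a shift of several pKa units
```

The exact number depends strongly on the assumed radius and interior dielectric, but the qualitative conclusion, shifts of several pKa units, is robust.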
van der Waals or London dispersion forces: favorable atomic packing
You’ll recall from earlier coursework that van der Waals or London dispersion forces, sometimes
colloquially called ‘packing’ forces, are relatively weak. That may be true on an individual basis, but
with thousands of atoms the effects are extremely important. It is notable that when one examines
the structures of proteins in atomic detail, the atomic packing is seen to be generally very tight. The
atomic packing density in most protein interiors is similar to the packing seen in solid crystals of
organic molecules. That is a manifestation of the favorable energy associated with atom-atom
contacts on a large scale.
How does the good packing achieved in protein interiors relate to the stability of the protein in the
native state? The answer here is not so straightforward. Note that if the protein was in an unfolded
configuration it would likely be able to make good atomic contacts with the water molecules
surrounding it; water molecules are small enough to be arranged in ways that would give good
packing. The more important consideration is that if a protein had the wrong amino acid sequence,
for example if we mutated a small side chain to a large one or vice versa, then the atomic packing in
the natively folded configuration may be seriously disrupted. In that sense, favorable packing in a
protein may not be a major driving force towards folding, but if the native packing is compromised
then surely the folded configuration will be destabilized compared to the unfolded state.
Energies due to packing defects have been estimated to be about 0.5 kcal/mol per methylene-sized
cavity. That gives a rough estimate for the consequence of replacing a larger amino acid side chain
with one that does not fill the space properly. On the other hand it is hard to estimate the effect of
adding a larger amino acid side chain. You’ll recall that modeling the van der Waals potential energy
using the Lennard-Jones equation gives a very sharp rise (going as the 12th power of the interatomic
separation) in energy for steric overlap. So the cost of adding even one methylene to a place where
there might not be space can be catastrophic for the stability of the folded state.
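The steepness of that repulsive wall is easy to see numerically. A sketch of the Lennard-Jones form U(r) = 4ε[(σ/r)¹² − (σ/r)⁶] with illustrative, assumed parameters:

```python
def lennard_jones(r, sigma=3.4, epsilon=0.24):
    """Lennard-Jones pair energy: 4*eps*((sigma/r)**12 - (sigma/r)**6).
    sigma in Angstroms, epsilon in kcal/mol (illustrative values)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

# Energy at the bottom of the well vs. under modest compression:
r_min = 2 ** (1 / 6) * 3.4               # ~3.8 A, the well minimum
print(round(lennard_jones(r_min), 2))    # -0.24 kcal/mol (the well depth)
print(round(lennard_jones(3.0), 1))      # compressing by <1 A already costs a few kcal/mol
```

Because the repulsion rises as r⁻¹², a further small compression sends the energy up catastrophically, which is the point made in the text.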
Hydrogen bonding
Hydrogen bonding is a very specific type of interaction; it has some features of bonding (i.e. orbital
overlap) but it is mainly an electrostatic feature. It arises from (1) a hydrogen atom that carries a
partial positive charge owing to its covalent attachment to an electronegative atom (typically N or
O), which is referred to as the ‘donor’, and (2) a lone pair of electrons on an electronegative atom
(typically N or O), which is referred to as the ‘acceptor’. The strongest hydrogen bonds are where the
lone pair, the hydrogen, and the heavy atom attached to the hydrogen are arranged at least roughly
in a straight line.
The energy contributed by a hydrogen bond can be estimated from model studies of small molecules
in solution to be in the range of about 5 kcal/mol. There are however many nuances regarding
hydrogen bonding: hydrogen bonds involving a negatively charged acceptor like a carboxylate can be
extra strong, the lower dielectric of the protein interior might magnify the effects of hydrogen
bonding, multiple hydrogen bonds working together might benefit from a cooperative effect, and so
on. As a result, the role of hydrogen bonding in proteins is a constantly discussed and debated issue. However,
certain points are clear. Proteins are full of hydrogen bond donors and acceptors. This is true of the
polypeptide backbone in particular; every peptide unit has a carbonyl acceptor and an amide
nitrogen donor. The cumulative energetics of hydrogen bonding is therefore substantial. It is
important to note however that water has excellent hydrogen bonding properties, so a protein in an
unfolded configuration can satisfy all its hydrogen bonding groups through interactions with water.
Accordingly, it may be that hydrogen bonding is not a major driving force for folding. On the other
hand, following the same logic as above regarding the importance of good packing in the natively
folded state, if we were to alter a protein in such a way that we created unsatisfied hydrogen bond
donors or acceptors in the interior, then this would surely destabilize the protein (since those donors
and acceptors could satisfy their hydrogen bonding needs by exposure to water in the unfolded
state). For example, if a serine side chain is buried in the interior of a natively folded protein and its
hydroxyl group is hydrogen bonded to a histidine, and then the histidine is replaced by mutation to
something like valine that lacks the required hydrogen bonding capacity, the serine would have
unsatisfied hydrogen bonding needs, and this could be highly destabilizing.
Hydrophobic effect
As you may have learned before, the hydrophobic effect is generally accepted to be the major driving
force for protein folding, at least for typical globular proteins. But the hydrophobic effect is in fact
not a separate force. It is instead a complex phenomenon arising from many-body interactions. The
net effect is that nonpolar molecules or functional groups are driven to associate with other nonpolar
molecules by being excluded from interactions with water. The name “hydrophobic” conjures the
idea that a nonpolar molecule doesn’t like water because it can’t make good interactions there, but a
closer look shows something a bit different. Three different kinds of interactions are possible here:
nonpolar-nonpolar, nonpolar-water, water-water. The nonpolar-nonpolar interaction benefits from
favorable van der Waals energies. A nonpolar molecule also benefits from good van der Waals
interactions if it is surrounded by water. So from the perspective of the nonpolar molecule, the
energetic difference is small whether it interacts with another nonpolar molecule or with water. But
things are different from the perspective of the water. Water makes highly favorable hydrogen
bonding interactions with itself. Some of those interactions must be lost if a water molecule is in
contact with a nonpolar molecule. So really it is the water molecules that don’t want to interact with
the nonpolar solute. The effect is the same in any case: the two kinds of molecules are driven to have
as little interaction with each other as possible. From the description you can see that
the magnitude of the unfavorable energy relates to the surface area of the interaction.
As a side note, it is surprising to learn that when the unfavorable free energy associated with
transferring a nonpolar solute from an organic phase to water is examined in more detail
experimentally, one finds that the unfavorable free energy is not the result of a positive enthalpy
change, but is instead the result of a negative (unfavorable) change in entropy. This has been
explained by noting that at the interface between a nonpolar solute and water, the water molecules
are driven into highly ordered arrangements (sometimes referred to as clathrates) presumably in
order to recover as many of their lost hydrogen bonds as possible.
The magnitude of the hydrophobic effect has been estimated to be about 22 cal/mol per Å2 of
interaction area. A typical amino acid side chain has an area in the 100 – 200 Å2 range, and of course
many of the natural amino acids are nonpolar. Clearly the cumulative magnitude is very large. And
perhaps most critical is that these hydrophobic interactions are entirely different in the unfolded vs
the natively folded state (in contrast to some of the other energetic terms we discussed earlier). In
the unfolded state, enormous amounts of nonpolar surface would be exposed to solvent. As a result,
for a correctly folded protein molecule, the nonpolar side chains are mainly buried in the interior in
the ‘hydrophobic core’.
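The cumulative magnitude can be estimated from the quoted figure of ~22 cal/mol per Å². A sketch, assuming a 150 Å² nonpolar side chain and a hypothetical count of buried residues:

```python
# Hydrophobic transfer estimate: ~22 cal/mol per A^2 of buried nonpolar area
COST_PER_A2 = 22.0       # cal/mol per A^2 (figure quoted in the text)

side_chain_area = 150.0  # A^2, mid-range for a nonpolar side chain (assumed)
per_residue = COST_PER_A2 * side_chain_area / 1000.0  # kcal/mol

# Hypothetical: a small protein burying ~50 nonpolar side chains on folding
total = per_residue * 50
print(round(per_residue, 1), round(total))  # 3.3 kcal/mol each; ~165 kcal/mol total
```

Even with rough inputs, the cumulative term is of the same order as the conformational entropy cost it must offset.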
The special case of membrane proteins
More than a quarter of the protein molecules coded for by a cell are not soluble in the cytosol, but
instead spend their lives embedded in a lipid bilayer (either the cell membrane or a membrane
surrounding one of the various organelles in a cell). The energetic considerations for these
transmembrane (or TM) proteins are unique in some profound ways.
Enforcement of regular secondary structure in the membrane region
One of the most profound effects of the lipid bilayer environment is on the secondary structure of
proteins that span the membrane. We discussed the importance of proteins being able to satisfy all
or nearly all of their hydrogen bonding groups in order to maintain stability. And we noted that the
polypeptide backbone is full of hydrogen bond donors and acceptors that need to be satisfied. In
aqueous proteins, the need to satisfy backbone hydrogen bonds can be achieved with relative ease.
The backbone can adopt regular secondary structure elements (which by their nature satisfy
backbone hydrogen bonding), in various orientations and with turns or longer unstructured loops
connecting them in almost unlimited fashion. The figure below (left) is just one example of the
tertiary structure of an aqueous protein. Aqueous proteins can have practically limitless structures
because regions of the backbone that are not in regular secondary structure elements can satisfy
their hydrogen bonding needs using water instead. That is not possible for transmembrane proteins.
The lipid bilayer is almost devoid of water. Therefore, where the protein is embedded in the bilayer,
it must practically always adopt strictly regular secondary structure so that the backbone will be
satisfied. There are two basic classes of transmembrane proteins: those that consist of a bundle of
alpha helices (or sometimes just one TM helix), and those that consist of a beta barrel, which is
essentially a beta sheet that is rolled up so that there are no unsatisfied edges. Those two classes are
illustrated below (middle and right). The alpha helix class is more abundant, but the beta barrel class
is common where large pores in a membrane are needed, i.e. in the outer membrane of many bacteria.
There are a few known cases where a protein enters only part way into the membrane or forms some
other structure that seems to involve unsatisfied hydrogen bonds, but they constitute rare
exceptions.
The problem of the missing hydrophobic effect
Another major difference between TM proteins and aqueous proteins concerns the hydrophobic
effect and how TM proteins can be stabilized in their native forms. You might have already surmised
that the outer surface of a TM protein (at least the region that is embedded in the bilayer) needs to
be nonpolar. Otherwise it would not partition into the membrane. But that is a major distinction
compared to aqueous proteins. Aqueous proteins have polar/charged surfaces and nonpolar
interiors, and that is what drives their folding in the presence of water. But if TM proteins have
nonpolar interiors and nonpolar exteriors as well, and are not surrounded by water in any case, then
it seems the hydrophobic effect cannot play a major role. The real situation is somewhat more
complicated, but the puzzle remains largely unresolved.
Measuring the Stability of Proteins
Much of our previous discussion has concerned the stability of protein molecules, meaning how much
lower the energy of the native configuration is compared to the unfolded configuration(s). In essence
we want to know K, and hence ΔG0 (from ΔG0 = -RT ln K), for the process:

U ⇌ N, where U denotes unfolded and N denotes natively folded
The first thing we need is to be able to tell what fraction of the protein in a sample is folded and what
fraction is unfolded (i.e. XN and XU=1-XN). This requires experimental measurement of some property
that is sensitive to whether a molecule is folded or not. We will talk later about various kinds of
experiments that satisfy this requirement – the natural fluorescence of tryptophan tends to depend
on whether it is in a polar or nonpolar environment, so you can see how that might suffice – but for
now we will keep it abstract and just say that there is some property P that we can measure for a
sample, and that the value of P should change depending on what fraction of the protein is folded.
At this point you might feel like you have enough to figure out K. If your measurement tells you what
fraction is folded, then you know the equilibrium constant for folding is K=XN/XU. But, we have a
problem related to sensitivity. Even though the native state of a typical protein is not extremely
stable, it is usually stable enough that K = exp(-ΔG0/RT) is a large number, meaning the fraction of the
protein that is unfolded is very small, in fact too small to measure accurately. Another way of seeing
this is that a practical measurement is not going to be able to tell the difference between whether 1 in
1,000 molecules are unfolded vs 1 in 1,000,000. The difference in the signal between those two cases
would be too small to measure even though the value of K would be different by a factor of 1,000.
What is the solution to this problem? In order to get at the value of ΔG0, the standard approach is to
artificially shift the system towards the unfolded state by adding a chemical denaturant like urea. If
we go to conditions where both the native and unfolded states are reasonably populated, then at that
point we can figure out what fraction of the protein is folded. This can be done by measuring the
value of our property P under those conditions and then comparing it to the values of the property
under conditions where the protein is fully folded and where it is totally unfolded. The algebra for
this is shown in the figure. Intuitively you can see how the procedure makes sense. If you measure
the property under some amount of denaturant and you see that the value of the property is exactly
halfway between the value you get for fully folded and fully unfolded, then the sample must be half
folded and half unfolded, i.e. K=1. The calculation for an arbitrary degree of folding is only slightly
more complex.
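The two-state algebra can also be written out directly: the fraction folded follows from linear interpolation of the measured property between its fully-folded and fully-unfolded baseline values (the variable names here are mine):

```python
import math

RT_kcal = 0.593  # kcal/mol near 298 K

def folding_dG0(P_obs, P_folded, P_unfolded):
    """Two-state analysis of a folding measurement:
    X_N = (P_obs - P_U) / (P_N - P_U);  K = X_N / (1 - X_N);
    dG0 = -RT ln K, returned in kcal/mol."""
    X_N = (P_obs - P_unfolded) / (P_folded - P_unfolded)
    K = X_N / (1.0 - X_N)
    return -RT_kcal * math.log(K)

# A signal exactly halfway between the baselines means half folded: K = 1, dG0 = 0
print(folding_dG0(P_obs=0.5, P_folded=1.0, P_unfolded=0.0))
```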
We see now that under conditions where both forms of the protein are populated we can calculate K
and then ΔG0. But that experiment would just tell us what ΔG0 was under conditions where we had
added denaturant to destabilize the protein. What good is that? The answer is that if we repeat the
experiment at several different denaturant concentrations, we should be able to calculate ΔG0 as a
function of denaturant concentration.
77
Then, if we believe a simplistic theory that argues that G0 depends on denaturant concentration in
a roughly linear fashion, then we should be able to estimate G0 in the absence of denaturant, by
extrapolation. This final step of the analysis is illustrated above.
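The extrapolation step amounts to fitting a straight line and reading off its intercept. In this sketch the urea concentrations and ΔG° values are invented for illustration (ΔG° is taken here as the unfolding free energy, so a positive value means a stable protein), and the assumed linear form is ΔG°([urea]) = ΔG°(water) − m·[urea]:

```python
import numpy as np

# Linear extrapolation of ΔG° (unfolding) to zero denaturant.
# Hypothetical data, constructed to lie on the line dG = 8.0 - 1.5*[urea].
urea = np.array([4.0, 5.0, 6.0, 7.0])     # urea concentration, M
dG   = np.array([2.0, 0.5, -1.0, -2.5])   # ΔG° of unfolding, kcal/mol
slope, dG_water = np.polyfit(urea, dG, 1) # best-fit slope and intercept
# dG_water is the extrapolated stability in the absence of denaturant
```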
Ideas Related to How Proteins Reach their Folded Configurations
Our discussions up to this point have concerned only the initial (unfolded) and final (natively folded) states of the protein. How or why a protein finds its correctly folded configuration is another question, and one
that has occupied protein scientists for the last half-century.
In 1961 Christian Anfinsen performed seminal experiments showing that the enzyme ribonuclease
A (RNaseA) could be unfolded and then refolded after removal of the denaturant. This showed that
the native three-dimensional structure of the protein is encoded in the linear amino acid sequence.
This seems a bit obvious decades after the fact, but the demonstration that the protein could find its
correct structure outside the cell, without other influences, was an important conceptual advance.
Anfinsen asserted that this meant that the amino acid sequence encoded the correct three-
dimensional structure by having the correct three-dimensional structure be the lowest possible
energy. That idea is known as the “Thermodynamic Hypothesis”. Some 60 years after Anfinsen, we
understand that the situation is rather more complex.
In 1969 Cyrus Levinthal formalized an argument that the number of possible configurations a protein
could conceivably adopt is vastly greater than could ever be sampled by a protein molecule wiggling
around in solution in a reasonable time. Yet most proteins fold on the time scale of seconds or faster.
His calculation was something like this. Consider a protein with 200 amino acids. Assume that only
three different phi-psi backbone configurations need to be sampled at each amino acid position –
based on the idea of choosing between helix or beta or random loop – which is clearly an
underestimate. And suppose that a protein can sample a new configuration at a speed that is limited
by molecular vibrations (from quantum mechanics, kBT/h ≈ 10^13/sec).
The time required to sample 3^200 conformations would be 3^200/10^13 sec, vastly greater than the age of the universe. How can the Thermodynamic Hypothesis make sense if there isn't any way a protein molecule could search the space of all possible configurations in order to end up at the lowest energy configuration? This is known as the 'Levinthal paradox'.
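Levinthal's arithmetic is easy to reproduce as a back-of-the-envelope calculation:

```python
import math

# Levinthal's estimate: 3 backbone conformations per residue, 200 residues,
# sampling at roughly k_B*T/h ≈ 1e13 configurations per second.
n_conformations = 3 ** 200
search_time_s = n_conformations / 1e13           # seconds to visit them all
search_time_yr = search_time_s / (3600 * 24 * 365)

# For comparison, the age of the universe is about 1.4e10 years.
ratio = search_time_yr / 1.4e10                  # astronomically large
```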
The Levinthal paradox has motivated decades of research on protein folding. Work in the 1980’s,
especially by Robert (Buzz) Baldwin and Peter Kim, focused on the idea of specific ‘pathways’ that
might guide a protein from its unfolded state to its native state. For example:
U → I1 → I2 → … → N
where I1 and I2, etc., are well-defined intermediates that would be populated on the way to the native state. Work along this line involved a search for cases where well-defined intermediates could be detected.

How can one differentiate between a process that occurs as a two-state transition (without any populated intermediates) vs. a process with populated intermediates? There are multiple distinctions. One has to do with kinetic behavior. We will discuss such topics later, but for the moment we will just say that a single step (or two-state) transition gives a simple exponential approach to the final equilibrium position, whereas a process with multiple transitions can give more complicated kinetics, including a lag phase, as diagrammed here. In a few cases, experiments have identified specific protein folding intermediates, but they have not emerged as a general feature of protein folding. Other ideas have developed to advance our understanding.
‘Energy landscape’ theories were developed (by Peter Wolynes and Ken Dill and others), with the main idea that the multi-dimensional energy landscape surface for proteins must be smooth and funneled rather than rugged. Figures like the one drawn here illustrate the basic idea of a good vs bad energy landscape for rapid folding. If the energy landscape is rugged, then the folding process is likely to get trapped in a local minimum. The idea of a smoothly funneled landscape also lifts the requirement for pathway intermediates. Instead, all downhill routes lead to the native state. Under this idea, evolution would have selected amino acid sequences and structures whose energy landscapes were favorable.
Other ideas related to protein folding
Current ideas like energy landscape theory offer a good framework for understanding protein
folding. But important questions remain. For example, it turns out that many proteins do not fold
spontaneously to their native states either in vitro or in vivo. Many proteins in the cell rely on
sophisticated protein machinery known as molecular chaperones, which consume ATP to help
proteins reach their correct configurations. Apparently those proteins either do not have smoothly
funneled energy landscapes, or perhaps their native states are not the state of lowest energy. Another
wrinkle is that at high concentration and given sufficient time (or partially destabilizing conditions),
many proteins adopt an alternate beta-rich conformation and then aggregate into ‘amyloid’ fibrils.
Amyloid formation is suspected as the basis for a growing number of diseases, from Alzheimer’s to
Parkinson’s to Lou Gehrig’s. Does this mean that the lowest energy configuration for some proteins
is not the natively folded state seen in the cell, but the amyloid fibril state instead? Finally, there are
some rare proteins that have extremely peculiar folded structures in which the protein backbone is
tied in a knot! How do those proteins reach their native states? The energy landscapes in those cases
would seem to be rather complex and require traversal of narrow valleys to reach the native state.
The points above are of fundamental interest in biology, but they also have potentially important
practical implications. Much work has been done in the last few decades (and some notable progress
has been made) on the problem of predicting the three dimensional structures of proteins from their
amino acid sequences alone. What does it mean for those efforts if proteins might have lower energy
configurations than their native states? As you can see, the area of protein folding remains rich with
open questions.
CHAPTER 8
Describing the Shape Properties of Molecules
Some of our previous discussions have introduced the idea that shape is an important consideration
for the behavior and function of macromolecules. Later in the course we will talk about techniques
for determining the three-dimensional structures of macromolecules in atomic level detail. But now
we will discuss more simplified descriptions of shape that are sometimes obtained from biophysical
measurements in the laboratory.
Radius of gyration
Often we have an object or molecule whose shape is reasonably compact but not really spherical.
How might we assign a single size scale that would describe such an object, in the same way that a
radius describes the size of a sphere? The ‘radius of gyration’ (RG) provides this. It is essentially an
average radius, but more accurately it is an ‘rms’ or root-mean-square radius. The general meaning
of root-mean-square is, taking the monikers in reverse order: square the quantities, then average
them, then take the square root. You’re undoubtedly familiar with this in the context of rms deviation
from the mean (of test scores for instance). For the radius of gyration, two general cases arise: (1) a
collection of discrete points or atoms (like you would have once the detailed structure of a protein is
known, for example), and (2) a continuous shape defined by a boundary (like an ellipsoid for
example). We will handle the discrete case first.
Discrete objects
Assuming that the points that make up the object should all be given equal weight, the formula for
radius of gyration is:
RG = [ (1/N) Σi ri² ]^(1/2)
Here and in the later equations for the continuous case, it is vital to note that the radius of each point, ri, is its distance to the center of mass of the object. An entirely different and incorrect result will be obtained if the center is not defined correctly. A simple example for an object composed of 5 points arranged like on the face of a die is shown. This example happens to be two-dimensional, but the situation is equivalent in three dimensions. As reminders, the distance between two points in three-dimensional space is r = sqrt((Δx)² + (Δy)² + (Δz)²), and the center of mass of a collection of points is obtained simply by averaging their x, y, and z coordinates separately.
You can see that this is a simple procedure, so calculating the radius of gyration given the atomic coordinates of any molecule, large or small, is straightforward.
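As a concrete sketch, here is the discrete formula applied to a five-point layout like the face of a die. The coordinates are an assumed layout (four corners of a 2×2 square plus the center) standing in for the figure:

```python
import math

# Radius of gyration for a discrete set of equally weighted points.
def radius_of_gyration(points):
    n = len(points)
    # Center of mass: average each coordinate separately.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Mean squared distance to the center of mass, then square root.
    mean_r2 = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / n
    return math.sqrt(mean_r2)

# Five points arranged like the face of a die (hypothetical coordinates).
die_face = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0)]
rg = radius_of_gyration(die_face)   # sqrt(8/5) ≈ 1.265
```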
Objects with continuous shapes
For objects that have a continuous shape, the summation in the equation for radius of gyration must
be replaced by an integral, and the division by the number of points must be replaced by division by
the volume. Before doing that, we’ll just point out the one shape where the radius of gyration requires
no calculation.
Spherical shell (not to be confused with a solid sphere):
Every point on a spherical shell has the same radius, r. So the radius of
gyration, RG=r.
Solid sphere:
The general equation for radius of gyration for a continuous object is:

RG = ( ∫V r² dV / ∫V dV )^(1/2) = ( (1/V) ∫V r² dV )^(1/2)
For a solid sphere of radius r, the simplest way to integrate over the whole volume is as a series of infinitesimally thin shells of radius x and thickness dx. The differential volume of such a shell is dV = 4πx² dx. The integral above becomes:

RG,sphere = ( ∫0→r x² · 4πx² dx / V )^(1/2) = ( ∫0→r 4πx⁴ dx / ((4/3)πr³) )^(1/2) = ( (3/5) r² )^(1/2) = √(3/5) · r
Note that, as expected, the radius of gyration of a solid sphere is less than the outer radius, since the
points belonging to the sphere are all at a distance less than or equal to r.
82
Ellipsoid:
Instead of resorting to some horrible integrals, we will solve this by geometric reasoning. An ellipsoid is really just a stretched-out sphere. So let's begin with a unit sphere (r=1), decompose its behavior into x, y, and z components, and then see what happens when we stretch it out. For a solid unit sphere (r=1), from above we know that the average value of r² is 3/5. But for a sphere the x, y, and z behaviors must be the same, so the average values of x², y², and z² must all be the same, and because r² = x² + y² + z², we can conclude that the average values of x², y², and z² are all 1/5 for a solid unit sphere. Now if we stretch the unit sphere along the x axis alone, so that its axial radius along x is now a instead of 1, then the average value of x² must be (1/5)a². There would be no change along y or z. Then stretching by b along y and c along z, we conclude that the average value of y² is (1/5)b² and of z² is (1/5)c². Putting it back together, the average value of r² would be (1/5)(a² + b² + c²). [Note that this gives the expected expression for a sphere if a = b = c = r.]
Non-spherical objects have higher radii of gyration
Now let’s compare the radius of gyration for a sphere and an ellipsoid that have the same volume.
Suppose the sphere has radius 10. Its radius of gyration would be sqrt(3/5)·10 = 7.75. Now suppose the ellipsoid has principal axes of 5, 10, and 20 (the volume of an ellipsoid is (4/3)πabc, so this is the same volume as the sphere). Its radius of gyration would be sqrt((1/5)(5² + 10² + 20²)), which is 10.2.
The important thing to note here is the radius of gyration is greater for the ellipsoid compared to the
sphere. This is just one specific case, but it is a completely general conclusion that a sphere has the
lowest possible radius of gyration compared to any other possible shape of the same volume. The
significance is that if an experimental study gives us a radius of gyration of a molecule whose mass
(and therefore volume) we know, and that radius of gyration is larger than we would have expected
for a molecule of the known volume if it was a sphere, then we have established something about the
molecule’s shape: i.e. that it is nonspherical, or elongated.
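The sphere-vs-ellipsoid comparison above amounts to two one-line formulas:

```python
import math

# Compare R_G for a sphere and an equal-volume ellipsoid, using
# R_G(sphere) = sqrt(3/5)*r and R_G(ellipsoid) = sqrt((a^2 + b^2 + c^2)/5).
r = 10.0
a, b, c = 5.0, 10.0, 20.0     # abc = 1000 = r^3, so the volumes match
rg_sphere = math.sqrt(3 / 5) * r                       # ≈ 7.75
rg_ellipsoid = math.sqrt((a**2 + b**2 + c**2) / 5)     # ≈ 10.25
```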
The behavior of flexible polymer chains or filamentous assemblies of protein subunits
Above we discussed a way of looking at relatively compact objects. Now let’s look at the behavior of
objects that are so much longer than they are wide that they are flexible and we can analyze them by
thinking about the path their backbone takes in terms of a random process. Theories for analyzing
molecules in this way were developed by polymer chemists dating back to the 1950’s (see Paul Flory),
but biochemical systems are rich with examples that fit the description as well: long molecules of
DNA or RNA, unfolded protein molecules, and (on a longer scale) noncovalent polymers formed by
the end-to-end assembly of protein subunits, as in F-actin.
Persistence length
If we have to pick a single parameter that would be useful for describing the behavior of a flexible
chain, it would be a measure of its flexibility, or to be more precise, a measure of the length scale over
which it appears to be flexible (any curve seems straight if you examine a small enough length). One
specific measure of stiffness is called the persistence length. Roughly speaking, it is a measure of how
far a curve tends to proceed in the direction it started before random curvature renders its progress
in the original direction negligible. Of course to extract such a value from a curve requires repeating
the evaluation of how far it extends from many different starting points on the curve. The plot below
conveys the essence of the persistence length. Clearly, a stiffer polymer has a greater persistence
length.
Approximate persistence lengths for some biological molecules are given below. These are
approximations, and the persistence lengths of nucleic acids in particular are rather strongly
dependent on conditions like salt concentrations. The stiffness of a biological polymer often has
important implications for how it behaves. Note for instance how the exceptional stiffness of
microtubules means that they are nearly perfectly rigid over the length scale of a cell, which is
obviously important for their function in mechanical division of the cell and transport of molecular
cargo across long distances.
Polymer                   Persistence length
DNA (double stranded)     500 Å
RNA (double stranded)     800 Å
F-actin filament          5 µm    (< eukaryotic cell)
microtubule               5 mm    (>> cell)
(Jointed) Random walk models
The diagrams above were based on smooth worm-like curves. A different kind of model, slightly less
realistic but mathematically more generalizable, is often used to treat problems of this type. In the
random walk model, a chain travels straight in some direction for a distance b (the statistical Kuhn length). Then it takes a turn in a random direction, and so on.

The mathematical treatment is fairly straightforward:

N = number of steps
b = step length (Kuhn statistical length)
C = contour length of the curve (i.e., its length if stretched out)
L = straight end-to-end distance

L of course would change with every random walk, so we are really just interested in the average or expected behavior of L. We can get the average behavior of L by treating it like a vector, which is the sum of N smaller vectors, one for each step. Call the individual step vectors li. Each one has length b and points in a random direction.

L = l1 + l2 + … + lN   (each term a vector)
What is the expected value of |L|²? We can get the squared length of a vector by taking the dot product of the vector with itself. Letting angle brackets denote the average or expected value,

⟨|L|²⟩ = ⟨L·L⟩ = ⟨(l1 + l2 + … + lN)·(l1 + l2 + … + lN)⟩

Now the expression on the right is a product of sums, which can be expanded to a sum of products, N² terms in all:

⟨(l1·l1 + l1·l2 + … + l1·lN) + (l2·l1 + l2·l2 + … + l2·lN) + … ⟩

We can move the brackets to the individual terms to give:

⟨|L|²⟩ = (⟨l1·l1⟩ + ⟨l1·l2⟩ + … + ⟨l1·lN⟩) + (⟨l2·l1⟩ + ⟨l2·l2⟩ + … + ⟨l2·lN⟩) + …
But this simplifies. The key is to realize that if you take the dot product of two vectors where the angle between them is random, the expected value is 0. That means that among the N² terms, all become zero except those representing a dot product between a little vector li and itself. There are just N of those. And each term ⟨li·li⟩ is just the squared length of the little vector, which is b². Therefore,

⟨|L|²⟩ = N b²   and   Lrms = N^(1/2) b
This is a rather general result that applies not only to polymer behavior but to other kinds of physical problems, like diffusion, that can be modeled as a random walk. The average (or rms) distance you expect after taking N steps of length b is proportional to b, but it is proportional not to N but to the square root of N.
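The ⟨L²⟩ = Nb² result can be checked with a small Monte Carlo simulation. This is a generic sketch, not tied to any particular polymer:

```python
import math
import random

# Monte Carlo check of <L^2> = N*b^2 for a 3-D jointed random walk:
# take N steps of length b in uniformly random directions, average |L|^2.
def mean_squared_end_to_end(N, b, walks=2000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        x = y = z = 0.0
        for _ in range(N):
            # Uniform random direction on the unit sphere.
            cz = rng.uniform(-1.0, 1.0)
            phi = rng.uniform(0.0, 2.0 * math.pi)
            s = math.sqrt(1.0 - cz * cz)
            x += b * s * math.cos(phi)
            y += b * s * math.sin(phi)
            z += b * cz
        total += x * x + y * y + z * z
    return total / walks

L2 = mean_squared_end_to_end(N=100, b=1.0)   # should be close to N*b^2 = 100
```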
Our reason for developing the random walk model was to use it to characterize the behavior of a
flexible chain. In this model you can see that the value of the step length (b) is going to be a
description of how stiff the polymer is. If you take a random walk with tiny steps, the path will have
the properties of a curve that is highly flexible, i.e. it will not extend very far from where it started.
How can the value of b be extracted from the behavior of a random walk path?
From before, the contour length C of the path is C = Nb. Substituting into our equation for L², we see that (dropping the vector notation)

⟨L²⟩ = Nb² = Cb

or

b = ⟨L²⟩/C
Depending on the study, we may know the length of the polymer chain. For example if we know the
molecular weight of a large DNA molecule, and we know the molecular weight of one base pair, then
we know how many base pairs there are, and we know the spacing between base pairs in DNA is
about 3.4Å, so we can do the math and estimate C. Then, if we have a way of experimentally
measuring the average straight end-to-end distance L for the molecule, then we can get b directly.
We might get an estimate of end-to-end distance from some kind of spectroscopic experiment where
the ends of the molecule were labeled, or maybe we can visualize the molecule on an electron
microscopy grid and do a series of end-to-end measurements that way.
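The arithmetic for the DNA example might look like this. The molecule size and the measured ⟨L²⟩ below are hypothetical numbers chosen for illustration, and the last line uses the b = 2a relation discussed at the end of this section:

```python
# Kuhn length of a hypothetical DNA molecule from its contour length and a
# (hypothetical) measured mean-squared end-to-end distance.
n_bp = 3000                 # base pairs, assumed known from the molecular weight
rise = 3.4                  # Å per base pair in B-form DNA
C = n_bp * rise             # contour length: 10,200 Å
L2 = 1.02e7                 # hypothetical measured <L^2>, Å^2
b = L2 / C                  # Kuhn statistical length: 1000 Å
a = b / 2                   # persistence length: 500 Å (using b = 2a)
```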
We can connect the random walk polymer model to our earlier topic of radius of gyration. We used
RG earlier as a way to characterize compact shapes, but it can be used to describe flexible structures
as well. The algebra is a bit messy so we will not work it out here, but it turns out that the radius of
gyration for a flexible chain is closely related to its expected end-to-end length. Specifically,
⟨RG²⟩ = ⟨L²⟩/6
That means if we can do an experiment that gives us a value for the radius of gyration for a polymer
chain, then we can estimate b by substituting into the previous equation. This is useful because there
are in fact biophysical experiments that give values for the radius of gyration. Dropping the brackets,
under the assumption that an actual experiment to measure the radius of gyration in solution would
give us a time average,
b = 6RG²/C
Finally, we need to reconcile the two models we developed here, the smooth worm-like chain and the
jointed random walk. It can be shown mathematically that the relationship between the two models
is that the statistical Kuhn length b is twice the persistence length a. That is, b=2a. The scientific
literature generally reports persistence lengths, so if an experiment is interpreted in terms of the
jointed random walk model to give b, then the persistence length a=b/2.
CHAPTER 9
A Brief Introduction to Statistical Mechanics for Macromolecules
Complex systems of biological molecules are often characterized by many different configurations,
which may all be at equilibrium. Handling such systems and predicting their behavior can be
simplified with an appropriate mathematical framework. The term ‘statistical mechanics’ is often
used to describe such treatments.
Probabilities and expected values
We begin with some examples involving familiar phenomena. Consider a fair die (singular for dice).
In a single roll, there are six possible outcomes, all with equal probabilities: P(1) = P(2) = … = P(6)
=1/6
What is the average or ‘expected number’ of dots that will show up in a roll of the die? The average
of 1 through 6 is 3.5, so we can correctly deduce that the expected value is 3.5. A table makes the
case more explicit.
i    P(i)    P(i)·(# of dots)
1    1/6     1/6
2    1/6     2/6
3    1/6     3/6
4    1/6     4/6
5    1/6     5/6
6    1/6     6/6
     ΣP(i) = 1    Σ(P(i)·(# dots)) = 3.5
Formulating the problem this way illustrates a powerful and general point:
⟨property⟩ = Σi P(i) · propertyi   (sum over outcomes i)
where <property> denotes the average value of some property of interest for the system, P(i) denotes
the probability of configuration or outcome i, and propertyi denotes the value of the property for
outcome i. For the problem above, the property of interest is the number of dots showing.
The case above was trivial, but practically any property that one can construct can be evaluated this
way. As a non-obvious example, suppose you need to know what the expected value is for the square
of the number of dots that shows up on the die. You might want to know this if someone offered to
roll a die and pay you $1 for rolling a 1, $4 for rolling a 2, $9 for rolling a 3, and so on, and asked you
how much you would be willing to pay to play this game of chance. The answer is easy to obtain from
the equation above.
<dots²> = (1/6)·1² + (1/6)·2² + (1/6)·3² + … + (1/6)·6² = 91/6 ≈ 15.17
So, paying anything less than $15.17 is a favorable bet for you.
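Both expectations for the fair die can be checked exactly with rational arithmetic:

```python
from fractions import Fraction

# Expected value of the dots, and of the squared dots, for a fair die.
p = Fraction(1, 6)                                   # each outcome equally likely
mean_dots    = sum(p * i for i in range(1, 7))       # 21/6 = 3.5
mean_dots_sq = sum(p * i * i for i in range(1, 7))   # 91/6 ≈ 15.17
```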
Statistical weights for outcomes with unequal probabilities
In order to handle problems of real interest, for example where different molecular arrangements
have different energies and therefore different probabilities, it is convenient to introduce a scheme
for handling unequal probabilities. The main equation above is suitable for unequal probabilities,
but sometimes the probabilities of the different outcomes are not given directly. Instead, we are
often given relative probabilities between different outcomes – think about the meaning of an
equilibrium constant for example. Relative probabilities are sometimes referred to as statistical
weights, and denoted wi.
For an application, let’s consider the case of a strange die whose outcomes are not equally likely, but
instead the probability of rolling any given number of dots is twice as high as the probability of rolling
one fewer dots. In other words, consider the case where P(i+1) = 2*P(i). We can work out the
behavior of this system as before, but starting with relative probabilities or statistical weights, and
then converting them to individual probabilities according to the equation P(i) = wi / Σj wj.
i    wi    P(i)     P(i)·(# of dots)
1    1     1/63     1/63
2    2     2/63     4/63
3    4     4/63     12/63
4    8     8/63     32/63
5    16    16/63    80/63
6    32    32/63    192/63
     Σwi = 63    ΣP(i) = 1    Σ(P(i)·(# dots)) = 321/63 ≈ 5.1
Above we worked out the problem by converting weights explicitly to probabilities, but the formulation can also be written directly in terms of the weights without explicitly writing out probabilities. By substituting P(i) = wi / Σj wj into the equation above for the average value of some property, we get
⟨property⟩ = Σi (wi · propertyi) / Σi wi
Applied to the die problem above, where the property of interest is again the number of dots showing,
this equation gives <# of dots> = (1*1 + 2*2 + 4*3 + 8*4 + 16*5 + 32*6)/(1+2+4+8+16+32) = 5.1, in
agreement with the value in the Table above. Note that in setting up the statistical weights, one is
free to choose the first configuration (or any other) as a reference whose statistical weight is set to 1.
It would be slightly less convenient numerically, but one could set the weight to be 1 for one of the
other configurations and obtain the same answer in the end, as long as the correct relative weights
get assigned.
As a final example with the die, to show how simple it is to evaluate any property as long as it can be
evaluated for each outcome, consider the strange die from before, where P(i+1) = 2*P(i), and evaluate
the expected value for the squared number of dots that would show up in one roll. Without needing
to construct a table,
<dots²> = (1·1² + 2·2² + 4·3² + 8·4² + 16·5² + 32·6²)/(1+2+4+8+16+32) = 1725/63 ≈ 27.4
So you wouldn’t want to pay more than $27.40 to play a game where this strange die is rolled and you get paid the square of the number of dots showing.
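The weighted-average formula translates directly into code. This sketch reproduces both results for the strange die:

```python
# Weighted die where P(i+1) = 2*P(i): relative weights w_i = 2**(i-1).
weights = {i: 2 ** (i - 1) for i in range(1, 7)}
total = sum(weights.values())            # 63

def expect(prop):
    """Weighted average of prop(i), dividing by the sum of the weights."""
    return sum(w * prop(i) for i, w in weights.items()) / total

mean_dots    = expect(lambda i: i)       # 321/63 ≈ 5.1
mean_dots_sq = expect(lambda i: i * i)   # 1725/63 ≈ 27.4
```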
Handling degeneracies
For a complete treatment we need just one more element. There are often systems where a single
kind of state can be obtained in several different ways. We saw this type of situation earlier in the
course when we were evaluating the number of distinct ways that a particular state could be
constructed (e.g. by exchanging the identities of molecules of like type). The same situation arises
here. We assign the variable gi to the degeneracy of arrangement i. With that adjustment, the two
main equations above can be re-written, replacing wi everywhere it appeared with wigi, to give these
two general equations:
P(i) = wi·gi / Σj (wj·gj)

and

⟨property⟩ = Σi (wi·gi · propertyi) / Σi (wi·gi)
As we will see later, for molecular systems the weights relate to equilibrium constants between
different molecular configurations, which are in turn related to the energies of the configurations (i.e.
by exponentiating the negated energies after dividing by kT or RT, as usual). The denominator in the
equation above therefore takes the form of a sum over Boltzmann-like terms for the different
configurations. That summation has a special role in statistical mechanics applications and is
sometimes referred to as the partition function, and sometimes replaced with the notation Q or Z,
depending on the text or context. The student is referred to texts on statistical mechanics for a
treatment of how the dependence of the partition function (on temperature for example) can make
it possible to evaluate thermodynamic state variables for a system.
We will turn instead to see how the equations above can be applied to evaluate the behavior of
various physical properties of complex biological molecules.
A Statistical Mechanics Treatment of the Helix-Coil Transition for a
Polypeptide
A polypeptide that has a tendency to fold up into a single alpha helix serves as a classic example of a
system with a series of possible conformations ranging from a fully unfolded ‘random coil’ to a fully
formed alpha helix. Early treatments of this system came in the late 1950s by Bruno Zimm and J.K.
Bragg and are sometimes referred to as Zimm-Bragg models. Here we consider a simplified version
sometimes referred to as a zipper model. Along a single pathway of conformational transitions from
coil to helix, a first turn of helix forms when around 4 amino acid residues come into the right
conformation to form a backbone hydrogen bond (i to i+4) characteristic of an alpha helix. From
there the helix can propagate by extension, in a step-by-step addition of more amino acid residues to
the helix, eventually reaching the fully helical form. A cartoon diagram is below.
The propagation parameter, s, describes an equilibrium constant for adding another residue to the helical conformation, leading to one more hydrogen bond. But a key element of the model is that the first step is different. In order to form the first hydrogen bond, several amino acid residues must all adopt a specific conformation. That comes at an entropic cost, which contributes an extra (opposing) term to the equilibrium for the first step. That effect is described by the nucleation parameter, σ. The values of s and σ depend on the amino acid type that comprises the polypeptide. But for a typical case of interest, s is slightly greater than 1 and σ is much smaller than 1.
To work out a statistical mechanics treatment it is convenient to re-draw the system symbolically,
using a ‘C’ to denote an amino acid in the random coil conformation and ‘H’ to denote an amino acid
in the helix conformation, as shown below. From this diagram we can see how we might assign
statistical weights and degeneracies to the configurations. For the statistical weights, we can begin
by assigning 1 to the first conformation as a reference. Then the statistical weights for the other
forms can be assigned by taking into account the equilibrium constants for each step in a cumulative
fashion. Recall that for a multi-step equilibrium, the total equilibrium constant between two species
is the product of the equilibrium constants for the separate steps between them.
The degeneracies in this problem relate to how many locations can be chosen for the helical region. [In our simplified model only one helical segment is allowed in the polypeptide.] For the unfolded form, the entire polypeptide is in the ‘C’ conformation, so there are no choices to be made and therefore no associated degeneracy (g=1). When we introduce a segment of 4 amino acids to nucleate the first helical turn, then we have a choice for the location of that segment.
The total number of distinct places where that segment can be chosen is N-3, where N is the total
number of amino acids, so the degeneracy g for that conformation is N-3. [This can be seen by noting
that if N was 4, there would be only one choice (consistent with N-3) for the location of the segment,
if N was 5 there would be 2 choices (again consistent with N-3), and so on.] Another way of
understanding the meaning of the degeneracies is to realize that the specific drawing provided is just
one representation of the multiple (N-3) different configurations that could have been drawn having
a 4-residue segment in the helical conformation. As we move further to the right along the multi-step
equilibrium, the degeneracy drops by one in each step, as there are fewer and fewer different choices
for selecting a longer helical segment, until at the end there is no choice and the degeneracy is back
to 1.
Once we have the weights and degeneracies, we can calculate the behavior of a system. Here we
might want to evaluate the overall degree of helical folding in the polypeptide. Some molecules in
the system will be less helical and some will be more helical, but we can evaluate the average, and we
can also see which forms are more or less populated. The figure below shows how this kind of
calculation can be done easily with a spreadsheet (like in Excel), where the weights and degeneracies
can be filled in, the individual probabilities of the states can be evaluated, and the distribution of
states can be plotted. The figure below illustrates a case where N=40, σ=0.01, and s=1.1.
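The spreadsheet-style calculation can be sketched in Python. This uses one plausible parameterization of the simplified zipper model described above (conventions vary between texts): states are indexed by the number of hydrogen bonds j, with weight w(j) = σ·s^j for j ≥ 1, w(0) = 1 for the all-coil state, and degeneracy g(j) = N − 2 − j (N − 3 placements for the first helical turn, one fewer for each extension step):

```python
# Zipper model for the helix-coil transition, single helical segment allowed.
# Parameterization is an assumption: w(j) = sigma * s**j, g(j) = N - 2 - j.
def zipper(N=40, sigma=0.01, s=1.1):
    # Statistical weight times degeneracy for j = 0 (coil) through j = N - 3.
    wg = [1.0] + [sigma * s**j * (N - 2 - j) for j in range(1, N - 2)]
    Q = sum(wg)                                   # the partition function
    probs = [x / Q for x in wg]                   # P(j) for each state
    mean_bonds = sum(j * p for j, p in enumerate(probs))
    return probs, mean_bonds / (N - 3)            # fractional helicity

probs, helicity = zipper()
```

Plotting probs against j reproduces the qualitative shape described below: the single-turn state is poorly populated relative to the coil, longer helices become increasingly probable, and there is a small dip at the fully helical end where the degeneracy falls back to 1.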
Looking at the probability distribution of the conformations, we see that compared to the fully
unfolded state, the state with just one helical turn is poorly populated, but as we move further to the
right it is increasingly probable to find states with more helical character. In other words, it is hard
to begin the process of folding, but easier to continue it once it has begun. There is also a peculiar
behavior towards the very right, where we see a drop in the likelihood of finding fully folded states.
Mathematically, this comes from the lower degeneracies for the fully folded states. In terms of
structure, the consequence is that the ends of the polypeptide tend statistically to be unfolded or
floppy.
Setting aside the peculiar behavior at the far right of the diagram, the rest of the picture exhibits the
general property of being hard to begin and easier to continue, which is a hallmark of cooperative
processes. Another common property of cooperative systems is a tendency to show a sudden or
steep response to changes in certain parameters of a system, like concentration or temperature. We can look at the behavior of the zipper model above as a function of the propagation parameter s. In reality s might depend on temperature, so the dependence on s can also serve as an illustration of the dependence on T. From our equations above you can see that it is a relatively simple matter to evaluate the average number of hydrogen bonds in the system if we are given N, σ, and s. We can convert this to a fractional degree of helicity by dividing the average number of hydrogen bonds by the maximum number possible (N-3). Repeating the calculation of fractional helicity as a function of s (keeping N=40 and σ=0.01) gives the behavior shown, where the dependence on s exhibits a relatively sharp transition.
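As a sketch (in Python rather than a spreadsheet, with the same N and σ as above), the fractional-helicity curve as a function of s can be computed by repeating the partition-function sum at each value of s:

```python
def frac_helicity(s, N=40, sigma=0.01):
    """Average number of helical H-bonds divided by the maximum (N-3)."""
    weights = [1.0] + [sigma * s**j for j in range(1, N - 2)]
    degens  = [1]   + [N - 2 - j    for j in range(1, N - 2)]
    Q = sum(g * w for g, w in zip(degens, weights))
    avg = sum(j * g * w for j, (g, w) in enumerate(zip(degens, weights))) / Q
    return avg / (N - 3)

# scanning s from 0.8 to 1.4 shows the relatively sharp transition
curve = [(round(0.8 + 0.05 * i, 2), frac_helicity(0.8 + 0.05 * i)) for i in range(13)]
```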
Our calculations on this model illustrate the power of statistical mechanics approaches to
characterize the behavior of complex systems. The specific behavior of this system also illustrates
general themes that underlie macromolecules and biochemical systems, in particular the appearance
of cooperative phenomena, sharp transitions, and a high sensitivity to physical parameters.
[Figure: helix-coil zipper model – fractional helicity as a function of s (N=40, σ=0.01).]
CHAPTER 10
Cooperative Phenomena and Protein-Ligand Binding
Relationship between cooperative behavior and processes involving formation of multiple
interactions simultaneously
Consider the behavior of a reaction at equilibrium wherein n molecules of A come together to form
an assembly B (and where no intermediates with fewer than n units of A are allowed).
nA ⇌ B
What does the concentration of B look like as a function of increasing concentration of A? This is easy to evaluate from K = C_B,eq/(C_A,eq)^n, so C_B,eq = K·(C_A,eq)^n. Evidently, the concentration of B depends on A according to a simple polynomial whose exponent is the stoichiometry of the association, n. This leads to a surprising interpretation when one looks at how a polynomial term behaves with increasing exponent, n. This is shown below, where for simplicity the equilibrium constant K is taken as 1 for all cases.

The result is remarkable. For large n (emphasized in red in the figure), you see an effective maximum value of [A], after which further addition of molecule A would lead suddenly to formation of B (in order to avoid a concentration of A exceeding its effective upper limit). B is effectively absent at lower concentrations of A. Not surprisingly, this resembles the sort of behavior you expect for a curve of precipitation as a function of concentration; at the solubility limit of the substance, further addition of that component to solution leads only to a solid-state form of the solute molecule.
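A tiny numerical illustration (Python, with K = 1 as in the plots) of how the exponent sharpens the response:

```python
# C_B = K * C_A**n; larger n makes B vanish below C_A ~ 1 and explode above it
K = 1.0

def conc_B(C_A, n):
    return K * C_A**n

below = [conc_B(0.5, n) for n in (1, 2, 10)]   # B nearly absent for large n
above = [conc_B(1.5, n) for n in (1, 2, 10)]   # B dominates for large n
```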
Another interesting comparison is to micelle formation by an amphiphilic detergent. There, many copies of the detergent molecule come together in the form of a micelle, where the non-polar lipid tails of the detergent molecules project into an entirely hydrophobic core, with the polar or charged head groups of the lipids exposed on the surface of a roughly spherical supramolecular assembly.
Since only complete micelles fully shield the
hydrophobic tails, partial micelles are hardly
populated. This corresponds to a very high level
of cooperativity for the assembly process. Much
like the case of aggregation or precipitation at the solubility limit, there is a limit to the concentration
of the detergent monomer, after which further added detergent leads only to the micelle form. In
analogy to the polynomial plots above, plotting the concentration values where the detergent
monomer would be at equilibrium with micelles would give a curve with a sharp break, as shown.
The behavior can also be plotted a different way, with the x-axis representing the total detergent
added to the system (which is a convenient independently controllable variable), and separate curves
shown for the concentrations of the monomer and the micelle. That scheme is also shown.
Both plots indicate the key concentration above which the free monomer concentration cannot rise. That is called the critical micelle concentration (or CMC) and is a particular property of a detergent; it depends on tail length, number of tails, size and charge repulsion of the head group, etc. The overarching idea is that high-order associations tend to give rise to sharp transitions, reminiscent in some ways of typical phase transitions.
Protein-ligand binding equilibria
The binding of ligands (substrates, cofactors, inhibitors) to proteins (or nucleic acids) sometimes
shows cooperative behavior and sharper-than-usual transitions. This is usually seen in oligomeric
proteins or enzymes – the coordinated action of multiple subunits bound together makes the
cooperativity possible. The case of hemoglobin, with four heme groups and four protein subunits, is
well-known.
Before we tackle the case of cooperative binding by multiple binding sites in an oligomeric protein,
we will analyze the simple (non-cooperative) binding behavior of a single protein subunit (P) and its
ligand (A).
P + A ⇌ PA
Note that the K here is an equilibrium association constant, not a dissociation constant, as is sometimes written for substrate dissociation in enzyme kinetics treatments.
We introduce a binding parameter, v, to describe the extent of binding, i.e. the average number of
ligands bound to any given protein molecule.
v = (# or concentration of bound ligands)/(# or concentration of protein molecules) (0 ≤ v ≤ 1).
At equilibrium, K = [PA]/([P][A]). Note that the [A] in this equilibrium equation is the concentration
of free A molecules, not the total A concentration, which would include ligands bound to the protein.
And [P] is the concentration of unbound protein. Taking the ratio of [PA] to [P] gives an expression
that is useful in later substitutions:
[PA]/[P] = K[A] (which is a unitless fraction).
From the definition of v, we can see that
v = [PA]/([P]+[PA])
Dividing the numerator and denominator by [PA] gives
v = ([PA]/[PA]) / ([P]/[PA] + [PA]/[PA])
Then substituting from above,
v = 1/(1/(K[A]) + 1) = K[A]/(1 + K[A])
This gives the familiar hyperbolic curve for binding when the extent of binding is plotted versus free
ligand concentration. Binding is half-saturated when [A] = 1/K.
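A quick numerical check of the hyperbolic form (Python; the value K = 4 is an arbitrary choice for illustration):

```python
K = 4.0  # arbitrary association constant

def v_mono(A):
    """Extent of binding for a single site: v = K[A]/(1 + K[A])."""
    return K * A / (1 + K * A)

half = v_mono(1 / K)        # half-saturation at [A] = 1/K
near_sat = v_mono(100 / K)  # approaches 1 at high free ligand
```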
As you know, simple enzyme kinetics share this behavior. This is to be expected since, at least in the simpler mathematical treatments, the catalytic event is preceded by a binding event at equilibrium, and the rate of the reaction is proportional to the concentration of the enzyme-substrate complex, usually denoted [ES]. In that case, the reaction rate over the maximal rate (at very high substrate), v/Vmax, is equal to [ES]/([E] + [ES]). By analogy to the equations above for ligand binding (matching [E] with [P] and [ES] with [PA]), and using a dissociation constant Kd for the enzyme case that would be the reciprocal of the binding association constant K, one gets v/Vmax = ([S]/Kd)/(1 + [S]/Kd) = [S]/(Kd + [S]), or [S]/(Km + [S]) where the Michaelis-Menten constant Km would be equal to Kd. The reaction rate is half its maximal value when [S] = Km. This behavior (which should be relatively familiar) matches precisely what we've done here for ligand binding – only the variable names have changed.
Binding to an oligomeric protein – independent binding events, no cooperativity
What about binding to an oligomeric protein with multiple binding sites (e.g. one per subunit)? Suppose the multiple sites are identical and independent. If for the case of the oligomer we express the binding parameter as the average number of ligands bound per trimer, then v = (# ligands at site 1 + # ligands at site 2 + # ligands at site 3)/(# of protein trimers). The number of ligands at site 1 per trimer would be the same as above for binding to a monomer (K[A]/(1 + K[A])). And the same for binding at site 2 and site 3. So, v = 3K[A]/(1 + K[A]), and v would be between 0 and 3. The behavior has the same character as before – e.g. hyperbolic saturation – all that is different is the multiplicative factor of 3, which arises simply because we're expressing the binding per trimer instead of per monomer. This is the expected behavior for binding to multiple identical and independent sites. It generalizes readily to n sites:
v = nK[A]/(1 + K[A])   (0 ≤ v ≤ n)
Extracting K (and n) from equilibrium binding measurements
You’ll remember from earlier coursework in enzyme kinetics that, for extracting parameters from
graphs, it can be convenient to do algebraic rearrangements so the hyperbolic function becomes
linear in some variables. Analysis of binding data is simplified by using a rearrangement to give a so-
called Scatchard plot. [The Scatchard plot closely resembles a rearrangement used sometimes in
enzyme kinetics, the Eadie-Hofstee plot – the two kinds of plots are related by exchanging x and y
axes]. In our case we begin with
v = nK[A]/(1 + K[A])
Then multiplying through by the denominator on the right gives
v + vK[A] = nK[A], then v = nK[A] – vK[A], and dividing by [A] gives

v/[A] = nK – vK
So, plotting v/[A] vs v should give a straight line with slope –K, and x-intercept equal to n (y-intercept
of nK), as shown on the plot on the left.
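As a sketch of how the plot works in practice, here is a small Python check using noise-free synthetic data (n = 3 and K = 2 chosen arbitrarily); a straight-line fit to v/[A] vs v recovers –K and n:

```python
# synthetic data from v = nK[A]/(1 + K[A]) for a Scatchard analysis
n_true, K_true = 3, 2.0
A = [0.05 * i for i in range(1, 40)]
v = [n_true * K_true * a / (1 + K_true * a) for a in A]

x = v                                   # Scatchard x-axis: v
y = [vi / ai for vi, ai in zip(v, A)]   # Scatchard y-axis: v/[A]

# least-squares line y = slope*x + intercept via the normal equations
m = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
slope = (m * sxy - sx * sy) / (m * sxx - sx * sx)
intercept = (sy - slope * sx) / m

# slope = -K, y-intercept = nK, x-intercept = -intercept/slope = n
```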
Sometimes it is difficult to obtain a value for v as we have expressed it here for binding to an oligomer,
since the oligomeric state of the protein may be unknown at the outset, preventing one from knowing
what the concentration of the protein is in terms of # of oligomers per volume. It is sometimes easier
to establish the fractional binding, f = v/n, from experimental measurements. For example, if binding
of a ligand causes some measurable change in the system – e.g. maybe the protein has a tryptophan
whose fluorescence changes when a ligand binds – then one can add a certain amount of ligand and
compare the change in the measured property to the maximum possible change (e.g. by adding excess
ligand to the point of saturation). The ratio of the change observed to the maximum possible change
would be a measure of the fractional binding, f. The algebra would be the same as above. Dividing
the final equation above by n on both sides would give f/[A] = K – fK. As shown in the plot on the right, the analysis is the same, but no attempt can be made to establish the number of binding sites, n.
Non-linear Scatchard plots – non-identical or non-independent binding sites
What might cause a non-linear Scatchard plot? If the protein sample is impure, it might contain slightly different forms of the protein of interest – a mixture of phosphorylated vs non-phosphorylated forms is just one example – whose binding affinities for a ligand might be different.
Or, perhaps the protein of interest really has multiple distinct binding sites that have evolved to have different affinities. A cartoon of a case with two binding sites of one type and a single binding site of another type is shown.
For a case where there are different types of binding sites, if the binding events are independent (not
cooperative), then the binding behavior is simply additive, with terms matching those from before:
v = Σi ni Ki[A]/(1 + Ki[A])   (sum over site types i)
If the binding affinities (Ki) for the different kinds of sites are not equal, then the Scatchard plot cannot be straight. Reasoning that the left side of the curve in a Scatchard plot corresponds to initial binding at low ligand concentration to the highest affinity sites, and noting that the slope relates to the binding constant K, we can see that the curve should be steeper on the left, and therefore bent as shown. If the affinities of two different kinds of sites are different enough, it may be possible to extract separate binding constants from different parts of the curve. This may not be possible for binding constants that are not so different from each other, and practically impossible if there are more than two types of binding sites. In those cases, if accurate data are recorded over a wide range of ligand concentrations it may be possible to analyze the detailed behavior using sophisticated computer fitting software.
Our discussions above have all assumed that the binding events are independent – i.e. no
cooperativity arising from communication between sites. We will deal in more detail later with
cooperative binding, where binding of a first ligand promotes binding of subsequent ligands to other
sites in the same oligomer, but for now we can simply anticipate what effect that would have on a
Scatchard plot. The behavior would effectively be the reverse of the case above where we had sites
that were independent but naturally different in affinity. In that case we naturally tended to fill the
high affinity sites before the low affinity sites. But with cooperative binding, the first binding event
is harder and the later binding events are easier, which is the reverse. So, our Scatchard curve would
curve downward for cooperative binding, as shown on the left.
Whereas a straight Scatchard plot corresponded to an ordinary hyperbolic binding curve when plotting v vs [A], a downward-curving Scatchard plot of the type shown above, resulting from cooperative binding, would correspond to a sigmoidal shape if the binding data were plotted as v vs [A], as shown on the right. We will discuss cooperative behavior more rigorously later.
Experiments for measuring binding
Classic method – Equilibrium dialysis
The classic method for studying binding equilibria is equilibrium dialysis. The protein is placed in a dialysis bag that allows the ligand to cross but not large molecules like the protein. Ligand is then added, which equilibrates between the inside and the outside. Outside the bag, the ligand exists only in its free form. Inside the bag, the ligand exists in two forms: free and bound to the protein. At equilibrium, the concentration of free ligand inside the bag must equal the concentration of ligand outside the bag. That means if you measure the concentration of total ligand inside the bag, and then subtract the concentration of ligand outside the bag, you have a measurement of the concentration of the ligand in its bound form, that is [PA]. From

[A]total,inside = [A]free,inside + [PA]inside and [A]outside = [A]free,inside
[PA] = [A]total,inside – [A]outside
Then, the binding parameter v can be obtained by dividing [PA] by the total concentration of protein
that was placed inside the bag (or the concentration of protein oligomers if the oligomeric state of
the protein is already known).
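The bookkeeping is simple enough to show in a couple of lines (Python, with hypothetical concentrations in mM):

```python
# equilibrium dialysis: bound ligand is total inside minus free (= outside)
A_total_inside = 1.8   # mM, measured inside the bag
A_outside      = 1.2   # mM, equals the free [A] inside at equilibrium
P_total        = 0.3   # mM, protein placed in the bag

PA = A_total_inside - A_outside   # concentration of bound ligand, [PA]
v  = PA / P_total                 # average ligands bound per protein
```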
A modern method – Isothermal Titration Calorimetry (ITC)
This experiment is based on the expectation that there will be some heat (ΔH) associated with the binding event. The protein is held in a sample chamber that is kept at constant temperature. Ligand is added slowly in a series of small increments. After each incremental addition, the instrument measures the heat transfer required to keep the protein sample at a constant temperature.

The amount of heat transferred during equilibration at each step is plotted; note that ΔH is often < 0 for binding. A typical readout looks something like this:
As more and more ligand is added, the system begins to saturate. The individual peak areas
correspond to the amount of heat released and therefore to the amount of additional ligand that was
bound in that incremental step. Therefore, the total amount of ligand bound can be obtained by accumulating the integrated peak areas. This leads to a more traditional-looking binding plot.
Note that in some experiments, like ITC, it is easy to know the total amount of ligand present in the
system and harder to know the free amount – contrast that with the equilibrium dialysis experiment
where the free A concentration was evident from the concentration of A outside the bag. Not being
able to plot the free A concentration makes it a bit harder to analyze binding curves with the usual
tricks (like identifying the point of half saturation and estimating the Kd or 1/K from the free ligand
concentration at that point). Computer software is usually used to interpret the binding constant
(and whether multiple binding sites might be present) from ITC data.
Various spectrophotometric methods
As noted earlier, if there is some kind of spectroscopic experiment that gives a different reading for
the ligand-bound protein PA compared to the unbound protein P, then it is often possible to
determine what fraction of the total protein exists in the two forms; doing this as a function of ligand
concentration then enables determination of binding constants. The algebra is reminiscent of the
way we looked at measuring the extent of protein folding vs unfolding earlier.
If we let the variable P denote the value of some spectroscopic property – maybe the natural
tryptophan fluorescence of a protein if it is affected by ligand-binding – then assuming that
spectroscopic contributions are additive,
Pmeas = f * PPA + (1-f) * PP
where Pmeas is the value of some spectroscopic property measured after addition of some specific
amount of ligand, PPA is the value you expect to obtain for the protein in its bound form, and PP is the
value you expect for the unbound protein. As before, f is the fractional binding. Now, realizing that
PP and PPA can be obtained by doing the spectroscopic experiment with no ligand added and with
saturating ligand added, we can change the notation above to give:
Pmeas = f * Psaturating A + (1-f) * Pno A
which rearranges to give:
f = (Pmeas - Pno A) / (Psaturating A - Pno A)
This makes sense since it is really just a ratio between a partial change and the maximum possible
change.
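In Python (with hypothetical fluorescence readings), the fractional binding from a single titration point is simply:

```python
# fractional binding from a spectroscopic signal that changes on ligand binding
P_no_A = 100.0   # reading with no ligand added (hypothetical units)
P_sat  = 40.0    # reading at saturating ligand
P_meas = 70.0    # reading at some intermediate ligand concentration

f = (P_meas - P_no_A) / (P_sat - P_no_A)   # partial change over maximum change
```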
Phenomenological treatment of cooperative binding – the Hill equation
We turn now to the case of cooperative binding to multiple sites, such as in an oligomeric protein
composed of several identical subunits.
The limiting case of perfect cooperativity
To establish the limiting case for cooperative behavior, we examine an idealized situation of "all-or-none" binding, which essentially means perfect cooperativity. Either 0 or n ligands can be bound to a particular oligomer.
P + nA ⇌ PAn
Note that this formulation implies that there is no formation of partially bound forms, PA1, and so on.
As before, anticipating the usefulness of having a ratio of the bound form to the unbound form, we can write K = [PAn]/([P][A]^n), and then

[PAn]/[P] = K[A]^n
From the meaning of the binding parameter v as the number of ligands bound per oligomer, we know
that
v = n[PAn] / ([P] + [PAn])
With the same rearrangements as we used before for binding to a monomer – namely, dividing the top and bottom by [PAn] and then substituting the term K[A]^n for [PAn]/[P], we get

v = nK[A]^n / (1 + K[A]^n)
How does this binding curve behave as a function of ligand concentration? Clearly it begins at v=0 for [A]=0. And it saturates as expected, getting ever closer to n as [A] gets very large. In those ways it is similar to the binding equation we developed for a monomer, which was v = K[A]/(1 + K[A]). But the key distinction is in the exponent applied to the ligand concentration, [A]. From our earlier
discussions you should appreciate the consequences of that exponent; it creates a sharper transition
in terms of [A]. As a result, the binding curve will not be simply hyperbolic like before, but there will
be a region where the curve exhibits steeper behavior. In other words, we get a sigmoidal curve of
the type you’ve seen before (probably in the context of cooperative oxygen binding to hemoglobin).
Binding curves calculated from the equation above for perfect cooperativity are shown here (taking
K=1 for convenience).
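A minimal Python comparison (K = 1 as in the plots) of the all-or-none curve against the non-cooperative one, showing the steepening near [A] = 1:

```python
# all-or-none binding: v = n*K*[A]**n / (1 + K*[A]**n), here with K = 1
def v_allornone(A, n, K=1.0):
    return n * K * A**n / (1 + K * A**n)

# change in fractional saturation (v/n) across [A] = 0.9 -> 1.1 grows with n
rise_n1 = v_allornone(1.1, 1) / 1 - v_allornone(0.9, 1) / 1
rise_n4 = v_allornone(1.1, 4) / 4 - v_allornone(0.9, 4) / 4
```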
Realistic case – partial cooperativity

Real molecular binding processes are never perfectly cooperative; their behaviors fall in between the perfect case and the simple case of independent binding. By comparing the equations we obtained for the two cases, you'll notice that the only difference is in the exponent that gets assigned to the ligand concentration. In the case of independent binding events (or binding to a monomer), the exponent was 1, whereas in the case of perfect cooperativity and n binding sites, the exponent was n.

From this comparison, A. V. Hill generalized the binding equation to intermediate cases where the cooperativity would not be perfect. The binding equation becomes
v = nK[A]^x / (1 + K[A]^x)   (1 ≤ x ≤ n is the allowable range for positive cooperativity)
where x is used as the exponent and is called the Hill coefficient. There is frankly no mathematical justification for this equation. But what it does allow for is a way to compare observed binding
behavior to an equation where the exponent that gives the best fit to the data is some indication of
the degree of cooperativity. That is, if the observed binding data for a case where there are 4 binding
sites (n=4) is best matched by the Hill equation when x is chosen to be 2.8, then this gives you a sense
of the degree of cooperativity.
The standard treatment for analyzing observed binding data according to the Hill equation goes as follows. From the equation above, multiplying through by the right-side denominator, and then
subtracting the second term on the left from both sides gives
v/(n-v) = K[A]^x
It can be more convenient at this point to switch to fractional binding, f = v/n, to give
v/(n-v) = (v/n)/(n/n – v/n) = f/(1-f) = K[A]^x
Then taking logs on both sides gives
ln(f/(1-f)) = ln K + x ln[A]
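As a sketch, data generated exactly from the Hill equation give a straight Hill plot whose slope returns the coefficient (Python; K = 1, x = 2.8 are illustrative choices, not from the text):

```python
from math import log

# synthetic fractional-binding data satisfying f/(1-f) = K*[A]**x
K_true, x_true = 1.0, 2.8
A_vals = [0.2, 0.5, 1.0, 2.0, 5.0]
f_vals = [K_true * a**x_true / (1 + K_true * a**x_true) for a in A_vals]

xs = [log(a) for a in A_vals]
ys = [log(f / (1 - f)) for f in f_vals]

# slope of ln(f/(1-f)) vs ln[A] recovers the Hill coefficient x
slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
```

Real data would give a curved plot, as noted in the text, so the slope must then be taken at the steepest point.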
Evidently, a plot of ln(f/(1-f)) vs ln[A] should have a slope of x. When one plots real binding data this way, the result is invariably a curve rather than a line with constant slope. That illustrates a weakness of the Hill equation – it is not founded on any underlying physical model of binding. Nonetheless, the slope of a Hill plot at its steepest point remains a useful description of the degree of cooperativity.
Physical models of cooperative binding - MWC
Monod, Wyman, and Changeux (MWC) (among others) developed explicit models to explain how
cooperative behavior could emerge in biological systems. Their idea was simple and elegant, and the
underlying principles have turned out to be strongly supported by a wealth of detailed structural
investigations in diverse systems over several decades.
The key elements of the MWC model are as follows:

o an oligomer, with each subunit having a binding site

o (at least) two different conformations are possible for the protein subunit and its binding site, and those alternate conformations have very different affinities for the ligand. [In the limiting case, only one of the conformations can bind the ligand]. The high affinity binding site conformation is designated R and the weak (or forbidden) binding site conformation is designated T.

o symmetry must be preserved, so that all the subunits in any one oligomer are in the same conformation.
No other assumptions are required. As you might recall from earlier studies, and from some of our
previous discussions, cooperativity invokes the idea of non-independent events, and is often
discussed in terms of ‘communication’ between different binding sites, i.e. binding at one site
promotes binding at the other sites. But you’ll note that those ideas are not explicit aspects of the
MWC model. Yet, the tenets of the model lead to that apparent behavior.
We can sketch out the behavior of a tetrameric system with 4 subunits. We will designate the R form
of the subunit (which has high affinity for the ligand) as a circle, and the T form as a square. We will
assume all the possible forms of the oligomer – in terms of conformation and ligand binding – are all
at equilibrium. But note that most of the possible forms of the oligomer are disallowed by the element
of the MWC model that requires all the subunits in one oligomer to have the same conformation (R
or T). That is, we don’t need to consider the R3T1 conformation, and so on. Only the symmetric forms
need to be written out. And if we take the limiting case where the T form cannot bind ligand at all,
then we are left with just a few configurations to consider. The R forms of the oligomer can be bound
to a number of ligands from 0 to 4, and the equilibrium between those forms is affected by the ligand
binding constant K (and the ligand concentration). And an equilibrium constant L must be written
to describe the relationship between the T form of the tetramer and the (unbound) R form of the
tetramer.
How can this scheme (which doesn’t explicitly invoke ‘communication’ between different binding
sites) give rise to cooperative behavior? We can get an understanding of this from two perspectives.
First, by mass action, addition of a ligand to one site in an oligomer drives the other subunits into the
high affinity configuration; that follows from the requirement stipulated in the model that symmetry
has to be preserved in an oligomer. It is the requirement of the subunits to adopt the same
conformation that gives the effect of communication between sites.
The other view of how the MWC model creates cooperative behavior is statistical, relating to the probability (or concentration) distribution of the distinct forms of the protein. We can analyze the situation above in terms of the R forms of the protein binding ligands at the four sites totally independently, plus the extra T form. For independent binding to the R forms, the concentration ratio for incrementally bound forms would go up by the same ratio in each step. From there you can see that at some concentration of ligand there will be substantially more R4 than R3, and more R3 than R2, and so on, meaning that, considering the R forms by themselves, R4 will be the dominant form, with little R0, R1, R2, and R3. But what about the T form? If the equilibrium constant L between the T0 form and the R0 form is high enough, then T0 will be well-populated even if R0 is low. Now think about plotting the distribution of the forms of the oligomer that have 0, 1, 2, 3, or 4 ligands bound, taking into account that the concentration of oligomers with 0 ligands bound includes both R0 and T0. As you can see, the distribution of the different forms is concentrated at the extremes, which is a hallmark of a cooperative system.
Exactly what kind of behavior do models like MWC predict? Our statistical mechanics tools let us
answer that in straightforward fashion. We start with the case of n=2. But first a comment about
statistical weights. When we work out the statistical weights for a problem that involves binding, we
need to come up with a (unitless) ratio that relates the forms that arise by sequential addition of a
new ligand. From K= [PA]/([P][A]), we can see that the ratio of [PA]/[P] is the familiar term K[A]. It
is this term, and not the equilibrium constant by itself, that we need to multiply cumulatively in each
sequential step of our reaction. Our statistical mechanics terms are:
Note that the degeneracies have to account for the combinatorial ways for choosing which subunits
will have ligands bound. Now we can calculate the average number of ligands bound, which is v,
according to our familiar rules for evaluating the expected value of some property. We get
v = [(L·1·0) + (1·1·0) + (K[A]·2·1) + ((K[A])²·1·2)] / [L + 1 + 2K[A] + (K[A])²]
As an aside, you can show that if L=0 (meaning that we have removed the element of the model that
is critical for cooperativity, leaving only the R forms, which bind ligands at their sites independently),
the equation above reduces to the equation we developed earlier for binding to identical and
independent sites: v = 2K[A]/(1+K[A]), as it should.
Similar equations to the one above for n=2 can be developed for higher values of n, using the same
statistical mechanics treatment. With these equations, with judicious choices of K and L, one can get
binding behavior that exhibits the features we expect for cooperativity. Binding and Hill plots are
shown below, as calculated from the MWC model (with n=4) using the statistical mechanics approach
above.
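The statistical mechanics sum for general n can be sketched in Python (the R-form weights are C(n,k)(K[A])^k and the T form adds weight L; the K and L values here are illustrative, not from the text):

```python
from math import comb

def v_mwc(A, n=4, K=1.0, L=1000.0):
    """Average ligands bound in the MWC limiting case (T binds no ligand)."""
    terms = [comb(n, k) * (K * A)**k for k in range(n + 1)]  # R0 .. Rn weights
    num = sum(k * t for k, t in enumerate(terms))            # weight * ligand count
    den = L + sum(terms)                                     # T0 contributes weight L
    return num / den
```

Setting L = 0 removes the T state and recovers the independent-sites result v = nK[A]/(1 + K[A]), while a large L suppresses binding at low [A] and produces the sigmoidal curve.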
Advantages of cooperative behavior
Why are these kinds of cooperative binding (and catalytic) phenomena useful or advantageous in
biological settings? For one, steep response curves generally allow for better, tighter control of a
system. In a non-cooperative, hyperbolic binding scenario, a certain fractional increase in the ‘input’
(i.e. ligand concentration) leads always to a smaller fractional change in the ‘output’ (i.e. binding). In
contrast, a sigmoidal curve allows a large change in response to a smaller fractional change (e.g. in
the ligand concentration). This feature is key to the ability of hemoglobin to have very different
affinities for oxygen (thereby allowing efficient uptake and release) at oxygen concentrations in the
lungs and in muscle tissue that are not very different. Similar advantages apply to cooperative
enzymes. Their activities can increase more substantially in response to smaller increases in the
concentration of the substrate. This allows for tighter metabolic control in a system and also enables more 'on-off' type signaling in the cell.
Allostery
Roughly translated, allostery means ‘other spatial arrangement’. In the context of macromolecules,
it describes the general phenomenon wherein binding of one compound to a protein or nucleic acid
at one site affects the conformation elsewhere, with diverse consequences for activity. You may
remember the well-studied case of hemoglobin, where binding of effector molecules (including 2,3-
bisphosphoglycerate, CO2, and protons) affects the protein conformation some distance away where
oxygen binds. Similarly, effector molecules can bind to allosteric sites in enzymes and affect the
catalytic properties of the active site, which may be in a distant region of the protein. Or, binding of
effectors can control signaling pathways by affecting molecular recognition events. In some cases,
allosteric regulation occurs together with cooperative phenomena in an oligomeric protein (like
hemoglobin), but it can also occur in simpler scenarios in a single protein subunit. Allosteric
regulation is a deep subject with diverse manifestations in molecular biology, but a unifying theme can be articulated in a scheme where a protein has two available conformations, and the two conformations have different affinities for the effector, as shown.
If we say that the conformation on the right has a higher affinity for the effector than the conformation on the left (i.e. K3 > K1), then we must also conclude that K4 > K2 (since the two different routes from the top left to the bottom right must give the same total equilibrium constant, meaning that K2·K3 = K1·K4). The interpretation of K4 > K2 is that binding of the effector shifts the conformational equilibrium to the right. If K1 ≠ K3 and K2 ≠ K4, we would say that there is a
thermodynamic linkage between the effector binding and the conformational change. The shapes of the two alternate conformations of the protein are drawn in this diagram to emphasize that the two conformations may be different in multiple ways, e.g. at multiple different locations. How this happens depends on the detailed structure of the protein. But if you recognize that proteins have a certain degree of structural rigidity, you can imagine any number of different ways where the movement of atoms at one location can propagate to another site. As just one example, if the protein molecule is composed of two relatively rigid domains, but the relative position of those domains can change, then the conformation at one location will be coupled to the conformation elsewhere.
We can use our statistical mechanics framework to analyze the simple allosteric scheme above. We
might want to know what fraction of the total protein molecules would be in the conformation on the
right (with the pointed binding cleft for the effector and the larger opening on the top surface), as a
function of the effector concentration. From our thermodynamic reasoning above, we can anticipate
that higher effector concentration will cause more of the protein to be in the conformation on the
right by mass action, but exactly how much? There are no degeneracies to worry about in this case,
and the weights follow from the equilibrium constants and the effector concentration. With these
values, the fraction of the protein in the conformation on the right (with the top binding site more
open) would be:

(K2 + K1K4[E]) / (1 + K2 + K1[E] + K1K4[E])

(equivalent expressions are possible with K1K4 = K2K3). A plot of this behavior is shown for
judicious choices of the equilibrium constants.
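The fraction derived above is easy to evaluate numerically. Below is a minimal sketch; the function name and the particular values chosen for the equilibrium constants are illustrative, not taken from the text:

```python
def fraction_right(E, K1, K2, K4):
    """Fraction of protein in the right-hand conformation, using the
    statistical weights 1, K2, K1*[E], and K1*K4*[E] for the four states."""
    return (K2 + K1 * K4 * E) / (1 + K2 + K1 * E + K1 * K4 * E)

# Illustrative constants: the effector binds the right conformation tightly.
K1, K2, K4 = 1.0, 0.1, 50.0

for E in (0.0, 0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"[E] = {E:7.2f}   fraction right = {fraction_right(E, K1, K2, K4):.3f}")
```

At [E] = 0 the fraction reduces to K2/(1 + K2), and at saturating effector it approaches K4/(1 + K4), reproducing the mass-action shift toward the right-hand conformation described above.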
Catalytic cycles linked to conformational changes linked to motion = molecular motors
Nature has evolved a wide range of molecular motors whose operations are beyond extraordinary.
They are essentially all based on some kind of allostery combined with cycles of catalysis. Bear in
mind that when the active site of an enzyme goes through a series of reactions, it goes through a
sequence of events where it is empty, then bound to substrate, then bound to product, then empty
again, and so on. If those binding states each favor different conformations of the protein, then
ongoing catalysis will drive the protein through a cyclical series of conformations. How such events
can be linked to larger scale movements of the type one would call a motor varies, but the classic case
is the rotary F1-ATPase. In a remarkable example of intuition and foresight, in the 1970’s Paul Boyer
(UCLA) predicted that ATPase would act like a cyclic motor using a ‘binding-change’ mechanism
based on careful biochemical experiments and an understanding that the trimeric assembly had
three active sites, but without the benefit of knowing the three-dimensional structure of ATPase. It
was almost 20 years afterwards that Andrew Leslie working in the laboratory of John Walker
determined the crystal structure of ATPase, revealing that indeed the head of the ATPase (composed
of the alpha and beta subunits) has the structure of a wheel that rotates on an axle formed by the
gamma subunit in an extended alpha helical structure. Together with Jens Skou (who discovered a
different ATP-driven ion transporter), Boyer and Walker shared the Nobel Prize in Chemistry in
1997.
CHAPTER 11
Symmetry in Macromolecular Assemblies
Definition of Symmetry
Symmetry is an important subject in essentially all branches of science, and the arts as well. Loosely
defined, we think of something that is symmetric as being repetitive in some way, being composed of
multiple copies of an underlying subunit. In scientific applications, symmetry has a precise meaning.
An object is symmetric if there is some physical operation we can do to it that leaves it invariant (i.e.
indistinguishable from the way it appeared before). The operation in question is usually an isometry,
that is a physical movement in space that preserves distances. Those operations include rotations in
space and mirror inversions. However, since biological macromolecules are chiral and exist in just
one of two possible hands or enantiomers, for our purposes we can dispense with mirrors and
inversions (i.e. so-called ‘operations of the second kind’) and focus on rotations. We are lucky in that
regard, as this leads to considerable restrictions on an otherwise larger variety of symmetry types that exist in three dimensions.
We will shortly work through all the possible symmetries in three dimensions, but we start with one
example here. The assembly shown is comprised of three copies of the same subunit rotated 120°
and 240° relative to each other. As you can see, if we rotate the entire assembly by 120°, the result
is indistinguishable from the initial configuration. In fact there are exactly three operations we can do to the assembly
that leave it invariant. They are: {Identity (i.e. 0° rotation), 120° rotation, 240° rotation}. The set of
operations that leave an object invariant is a complete description of its symmetry. Sets of this
type obey special properties that make them examples of mathematical groups, which we discuss
next.
Mathematical Groups
In mathematics, a group is a set that, together with a defined binary operator, obeys a specific set of
rules. A binary operator is something that takes two elements as input and returns one element as
output. In regular arithmetic, addition and multiplication are examples of binary operators, but as
we shall see binary operators can take diverse forms.
The rules that must be obeyed for a set to be a group are as follows:
- There must be an identity element (I) in the set, such that for every element A in the set,
I ∘ A = A ∘ I = A. [Here, the symbol ∘ is used to denote the general binary operator.]
- For every element A in the set, there must be an inverse element (denoted A⁻¹), also
within the set, such that A ∘ A⁻¹ = A⁻¹ ∘ A = I.
- The associative rule must apply: A ∘ (B ∘ C) = (A ∘ B) ∘ C for all elements in the set.
- A closure rule must be satisfied so that the product of any two elements from the set
(including the product of an element with itself) must also belong to the set. That is, if A
and B belong to the set, then so must (C = A ∘ B) for all choices of A and B within the set.
The rules must all be satisfied for a set to constitute a group, but for our purposes the last rule is
especially illuminating.
Here are a few examples of groups relating to pure mathematics:
{1, -1} under ordinary multiplication
{integers} under ordinary addition
{1, i, -1, -i} under complex-valued multiplication
{ [1 0; 0 1], [0 -1; 1 -1], [-1 1; -1 0] } under matrix multiplication (rows separated by semicolons)
In each case you should be able to identify the identity element and also work out a multiplication-
type table. For the third example above, the table would be:
×     1     i    -1    -i
1     1     i    -1    -i
i     i    -1    -i     1
-1   -1    -i     1     i
-i   -i     1     i    -1
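These group properties can be checked mechanically. A small sketch for the third example, {1, i, −1, −i} under complex multiplication (in Python, i is written 1j):

```python
# The group {1, i, -1, -i} under complex multiplication.
G = [1, 1j, -1, -1j]

# Closure: the product of any two elements is again in the set.
assert all(a * b in G for a in G for b in G)

# Identity and inverses: 1 is the identity, and each element has an inverse in G.
assert all(any(a * b == 1 for b in G) for a in G)

# Associativity holds automatically for complex multiplication.

# Reproduce the multiplication table shown in the text, one row per element.
for a in G:
    print("  ".join(str(a * b) for b in G))
```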
From our discussions above we can now see why the symmetry of an object obeys the properties of
a group. In particular we can see why a set composed of symmetry operations of an object must obey
the closure rule for a group; if operation A leaves the object indistinguishable, and the same is true
for operation B, then surely performing operation A followed by operation B must also comprise an
operation that leaves the object invariant. The symmetries obeyed by objects are therefore typically
referred to as symmetry groups. Next we will enumerate the possible symmetry groups for
assemblies of macromolecules.
Point Group Symmetries for Biological Assemblies
The prevalence of symmetry in natural proteins is impossible to miss. About 50% of all proteins that
have been purified and studied in the laboratory have been shown to be symmetric oligomers. In
this section we enumerate all the finite symmetry groups that are possible in three-dimensions; we
save for later a discussion of essentially infinite symmetry groups that characterize extended
assemblies like those in filamentous structures. Finite symmetry groups are referred to as point
group symmetries because the symmetry axes pass through a central point in the assembly. The 3-D
point group symmetries can be arranged, in order of increasing complexity, as cyclic, dihedral, and
cubic.
Cyclic Point Group Symmetries
Each of these symmetry groups is based on a single axis of rotational symmetry: two subunits in a
dimer, three subunits in a trimer, generalizing to n subunits in a cycle. The symmetry designations
are C2, C3, …Cn. [C1 would be the symmetry group for an object with no symmetry.] For C2, the axis
of symmetry corresponds to a 180° rotation. Because applying this operation twice (or two-fold)
returns one back to the starting orientation, that symmetry element is often referred to as a
“two-fold” axis of symmetry. Likewise, a symmetry element for 120° and 240° (and of course 0°)
rotations is referred to as a “three-fold” axis, and so on. In drawings, the rotational symmetry axes
are denoted by symbols that match the order of their rotation (e.g. a small square
representing a 4-fold axis). In any symmetry group, the number of elements in the group is the same
as the number of differently oriented but otherwise identical subunits required to construct the
symmetry. For cyclic symmetry groups, Cn, that number is n. Example drawings are shown for the
first few cyclic symmetries. There is no theoretical limit to the value of n, but the highest rotational
symmetry for any known protein assembly is 39, for a truly extraordinary barrel-shaped protein
chamber known as the vault, which is present in eukaryotic cells for an as-yet uncertain function.
As you can see from the diagrams, a fundamental point that arises from the principles of symmetry
is that the individual components (e.g. protein subunits) are all in identical
environments. No physical difference of any kind can be ascribed to the multiple copies of the
subunit.
An example of a pentameric protein obeying C5 symmetry is shown as a semi-transparent surface
over a ribbon diagram. The five copies of the subunit are shown in separate colors.
Dihedral Point Group Symmetries
Dihedral symmetry groups are somewhat more complicated. They are essentially built by combining
two copies of a cyclically symmetric arrangement, one flipped upside down on top of the other. As a
result, they sometimes resemble double ring structures, but sometimes that feature is not so evident,
depending on the shape of the subunit and its position relative to the symmetry axes. In dihedral
symmetry, there are multiple axes of rotational symmetry, all passing through and hence
intersecting at the center of mass of the assembly. Symmetry D4 is shown. As you can see, there is a
unique 4-fold axis of symmetry, along with four 2-fold axes of symmetry, which all intersect the
4-fold axis in a perpendicular fashion. If the unique 4-fold axis is along the z-direction, then the four
2-fold axes lie in the x-y plane, evenly spaced at 45° from each other. Note that a rotation about the
4-fold axis exchanges subunits within the same ring, whereas the 2-fold axes exchange subunits
between the two rings. As shown here, a convenient way to draw
dihedral symmetries is to base them on a prism of the appropriate symmetry; e.g. a square prism for
D4. For dihedral symmetry, the number of subunits (and distinct subunit orientations) is 2*n, where
n is the order of the unique axis of symmetry. The enzyme RuBisCO, argued to be the most abundant
enzyme on Earth, is an example of a protein assembly with D4 symmetry. Its subunit composition is
L8S8 (eight large subunits and eight small subunits). In order to most clearly illustrate the D4
symmetry, the arrangement of the large subunits in RuBisCO is shown here, with each subunit
colored differently, oriented in order to show views down the different symmetry axes.
Dihedral symmetries are possible from D2 to Dn for any n. Note that the case of D2 is somewhat
unique. In that case there is no single unique axis of highest order. Instead there are three 2-fold
axes all perpendicular to each other (e.g. along x, y, and z). And instead of a pair of ring structures
there is a pair of dimers; D2 symmetry is therefore sometimes referred to as a dimer-of-dimers. But
otherwise the situation is the same as for higher n. That is, there is still an axis with n-fold symmetry
(where n=2 for D2) combined with n evenly spaced 2-fold axes perpendicular to that axis.
Hemoglobin has subunit stoichiometry α2β2, where the alpha and beta subunits are highly similar. If
they were identical, the four subunits in hemoglobin would be an example of D2 symmetry. D2 is a
very common symmetry for proteins; C2 is the most common.
Beyond the dihedral symmetries, there are just three cases of higher rotational symmetry groups in
three-dimensions. These are the cubic symmetries, discussed next.
Cubic Symmetries
The cubic symmetries are based on the Platonic solids and thus share their symmetry. There are
exactly five Platonic solids – their study dates back to the ancient Greek mathematicians – defined by
the requirement of having equivalent vertices, equivalent faces, and equivalent edges. They are the
regular tetrahedron, cube, octahedron, icosahedron, and dodecahedron. It turns out that two pairs
of these are intimately related to each other, sharing the same symmetry, so in fact there are really
just three symmetries represented by the five Platonic solids. Tabulating the numbers of faces,
vertices, and edges in the five Platonic solids illuminates the so-called ‘dual’ relationship between
the cube and the octahedron and between the icosahedron and the dodecahedron. Those pairs are
related to each other by exchange of faces for vertices and vice-versa. That is, if you place a point at
the center of each of the six faces of a cube, those points are the vertices of an octahedron. And
likewise, points at the centers of the eight faces of an octahedron produce the vertices of a cube. In the same way, the icosahedron and the dodecahedron are duals of each other. And, interestingly, the
tetrahedron is its own dual.
The Platonic solids are shown with rotational symmetry axes indicated. For simplicity, only one
instance of each axis type is shown on each figure. Note how 2-fold symmetry axes pass through
opposing pairs of edges. Symmetry axes passing through faces must conform to the symmetry of the
faces. And symmetry axes passing through vertices must conform to the number of faces that meet
at a vertex.
An assembly conforming to tetrahedral symmetry (T) can be constructed by placing three subunits
(or symbols) on each face in a symmetric arrangement, for a total of 12 subunits.

Platonic solid   vertices   faces   edges   symmetry
tetrahedron          4         4       6       T
cube                 8         6      12       O
octahedron           6         8      12       O
icosahedron         12        20      30       I
dodecahedron        20        12      30       I

Octahedral symmetry (O) can be constructed by placing 4 subunits on each square face of a cube, or three
subunits on each face of an octahedron, leading to 24 subunits in either case. Whether a real
assembly (e.g. of protein subunits) that obeys symmetry O looks more like a cube or an octahedron
typically depends on the situation and can be subjective. But the symmetry properties do not depend
on whether one thinks of the assembly as cube-like or octahedron-like. The situation is the same for
icosahedral symmetry I. Those cases can be drawn and visualized as either three subunits on 20
triangular faces or five subunits on 12 pentagonal faces, for a total of 60 subunits.
Schematic diagrams are drawn for assemblies in tetrahedral, octahedral and icosahedral symmetry.
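The subunit bookkeeping for the point-group families discussed above can be summarized in a few lines of code. A small sketch (the function name is ours, not a standard convention):

```python
def subunit_count(symmetry: str) -> int:
    """Number of identical subunits (equal to the group order) for a 3-D
    point group, e.g. 'C5' -> 5, 'D4' -> 8, and the cubic groups T/O/I."""
    cubic = {"T": 12, "O": 24, "I": 60}
    if symmetry in cubic:
        return cubic[symmetry]
    kind, n = symmetry[0], int(symmetry[1:])
    if kind == "C":
        return n        # cyclic: n subunits around a single axis
    if kind == "D":
        return 2 * n    # dihedral: two stacked n-membered rings
    raise ValueError(f"unknown point group: {symmetry}")

for s in ["C2", "C3", "C5", "D2", "D4", "T", "O", "I"]:
    print(s, subunit_count(s))
```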
Broken or pseudo symmetry
Many examples appear in nature where an assembly nearly has a higher symmetry, but owing to
subtle differences the symmetry is broken down to a lower symmetry group, which is a subgroup of
the higher symmetry group. Hemoglobin is the best-known example. As noted above, if the α and β
subunits were identical the symmetry would be D2. But because the two subunit types are slightly
different, the true symmetry is only C2; one might say the symmetry is pseudo-D2. The head of the
F1-ATPase motor is another well-known example. The subunit stoichiometry is (αβ)3. Again, the α
and β subunits are very similar, but only the beta subunits have active catalytic sites. The α and β
subunits alternate in a hexameric ring. The true symmetry is C3 (although really even the C3
symmetry is broken by conformational differences in the chemically identical subunits), while it is pseudo-C6. Note that C3 {0°, 120°, 240°} is a subset or subgroup of C6 {0°, 60°, 120°, 180°, 240°, 300°}.
Both of these examples presumably arose from gene duplication of a single ancestral protein subunit,
followed by divergent evolution to give slightly different sequences and structures.
Biological Considerations
Why are nearly all the oligomeric proteins found in nature symmetric? The short answer is that
symmetric arrangements are easier to build compared to non-symmetric arrangements. The key
distinction is that symmetric arrangements require the fewest number of distinct subunit interaction
types. That is illustrated below for a tetramer of four identical subunits. To create a C4 arrangement,
a single interface type (highlighted by the red dot) is sufficient to hold the entire assembly together.
On the other hand, the non-symmetric case has four distinct interaction interfaces that are all
necessary. The question of how easy it is for something to arise by chance is critical in evolution,
since natural selection can only operate on phenotype outcomes that are somehow sampled by
random incremental mutations. Interestingly, it was articulated as early as 1956 by Crick and
Watson in their early work on how virus capsids should be assembled (which followed shortly after
their better-known discovery of the structure of DNA) that symmetric arrangements would be
dominant in natural structures like viral capsids because of the smaller number of contact types
required.
Setting aside the issue of symmetry, why are so many proteins and enzymes oligomeric in the first
place? One explanation is cooperativity. We already discussed the cooperativity that is made
possible in oligomers. However, the number of oligomeric enzymes where cooperativity has been
established is quite a small fraction of all the known oligomers that have been studied. Another explanation holds for some cases where large-scale structural integrity is required; viral capsids,
microtubules and bacterial S-layers are well-known examples. But again, these are special cases, and
they do not speak to the question of why enzymes are so often oligomeric. Other potential advantages
have been proposed, including the idea that oligomers are naturally more stable than monomers. But
there is little evidence to support such ideas. The exceptionally high abundance of oligomeric
enzymes is (in the author’s opinion) a largely unexplained puzzle in molecular biology.
Special Topics in Protein Symmetry
Helical Symmetry (non-point group)
Some symmetries contain operations that have translational or shift components in addition to
rotation. Repeated application of an operation that includes a shift naturally implies a structure that
extends essentially indefinitely; i.e. a filamentous structure. F-actin filaments, microtubules, many
rod-shaped filamentous viruses, and phycobilisomes are some examples of protein assemblies that
follow helical symmetry.
Describing the geometry of helical assemblies is generally more complicated than describing finite
assemblies that obey point group symmetries. In some cases, the organization of a helical assembly
can be fully described by a single spatial operation (a rotation combined with a shift), which when
applied repeatedly generates all the subunits in the structure. The F-actin filament is described by a
rotation of about 167° combined with a translation of about 28 Å. Because the rotation is close to
180°, the F-actin filament takes the appearance of two separately interwound helical
‘protofilaments’. The cylindrical protein coat of tobacco mosaic virus (TMV) can be described by a
single rotational operation of about 22° combined with a translation of about 23 Å. Because that
rotation is somewhere between 1/16 and 1/17 of 360°, the assembly can also be viewed as 16
protofilaments slowly twisting one way, or 17 protofilaments twisting the other direction. In other
cases, like the microtubule, the helical assembly is much harder to describe by a
single operation. Instead, different families of helical curves can be drawn on the ‘surface lattice’.
With one family of curves, there are apparently 10 protofilaments (such a curve is therefore referred
to as a ’10-start’ helix). There are also 13-start helical curves and 3-start helical curves for the
microtubule. In yet other kinds of tubular protein assemblies, there can be a true rotational
symmetry along the axis of the tube; in those cases it is impossible to describe the assembly in terms
of a single spatial operation between subunits.
Quasi-equivalence and the structure of icosahedral viral capsids

Early work on viral capsids – of the ‘spherical’ variety, not the filamentous variety – led to the
conclusion that they would be constructed according to principles of symmetry. The highest cubic
point group symmetry in three-dimensional space is icosahedral (as we discussed above), and this
posed a major problem. It was clear that a 60-subunit (icosahedral) protein shell would not be large
enough to encapsulate all the genetic material of a virus. Don Caspar and Aaron Klug proposed a
solution to that puzzle based on the idea of ‘quasi-equivalence’. Under symmetry, related copies of a
subunit are in equivalent environments, so only 60 subunits can be assembled while retaining strict
environmental equivalence. But Caspar and Klug showed how larger numbers of subunits can be
assembled in quasi-equivalent environments. The key was to begin with a scheme based on
triangular facets of an icosahedron, but then to subdivide the triangular facet of the icosahedron into
several smaller triangles. The simplest way to do this is to divide a triangular facet into four smaller
triangles. Then, instead of placing three subunits in a symmetrical arrangement on a single face of
an icosahedron, as one would do for a simple icosahedral assembly, one can place three subunits on
each smaller subdivided triangle, again in a symmetric arrangement. Clearly you would end up with
four times as many subunits as for a simple icosahedral assembly, namely 4*60=240 total subunits.
We asserted before that no more than 60 subunits can be placed in strictly equivalent environments,
but this triangulation method leads to subunits that are in nearly equivalent or quasi-equivalent
environments, as shown. To see the difference, note how some subunits appear to be part of
hexameric units while others appear to be part of pentameric units. If the capsid is composed of only a single kind of
protein, then the same protein must be able to occupy multiple distinct conformational states. In
some viruses, the distinct geometric sites are occupied by slightly different capsid proteins; that
avoids the problem of the same subunit having to take on different conformations, but it also creates
a need for the viral genome to encode more proteins.
The case explained above is referred to as T=4 based on the factor by which the number of subunits
increases; the total number of subunits is 60*T. Other cases besides T=4 are more common in nature.
These triangulation schemes are a bit harder to draw because the side of the large triangular facet of
the icosahedron does not fall along a lattice line of the smaller triangulation pattern on which the
subunits are arranged. Only some triangulation numbers (T) are possible. The governing equation
is
T = h2 + k2 + hk
where h and k are integers. In the triangulation diagram shown, h and k describe the indices of an
edge of the large triangular facet of the icosahedron in terms of edges of the smaller triangular
lattice. The recipe for assigning the values of h and k for a given diagram is as follows. Take one
corner of the large triangle as the origin (0,0). Then draw two unit vectors a and b to serve as
coordinate axes on the pattern of smaller triangles; these two unit vectors drawn from the origin
must be 60° apart (not 120°). Now figure out what the coordinates would be in this system for one
of the other corners of the larger triangular facet. In other words, determine how many steps you
would have to take along a and b in order to reach the other corner of the large triangular facet.
Those numbers of steps are the values of h and k. Note that as long as you follow these rules, you can
choose any edge of the larger triangular facet and multiple choices for the two coordinate basis unit
vectors; you may get different values for h and k, but the value for T should be unchanged. T=3 is a
common case in natural viruses, while at the upper limit a few giant viruses are known where T is at
least 1000 and the virus exceeds the size of a bacterial cell!
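The allowed triangulation numbers are easy to generate from the governing equation. A short sketch (the function name is ours):

```python
def allowed_T(max_T):
    """Caspar-Klug triangulation numbers T = h^2 + k^2 + h*k up to max_T,
    for non-negative integers h and k (not both zero)."""
    Ts = {h * h + k * k + h * k
          for h in range(max_T + 1) for k in range(max_T + 1)}
    return sorted(t for t in Ts if 0 < t <= max_T)

# Each allowed T corresponds to a capsid built from 60*T subunits.
for T in allowed_T(16):
    print(f"T = {T:2d}  ->  {60 * T} subunits")
```

The first allowed values are T = 1, 3, 4, 7, 9, 12, 13, 16, consistent with T = 3 being a common natural case; values like T = 2 never satisfy the equation.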
Using symmetry to design novel protein assemblies
Nature is full of examples of proteins that have evolved to form elaborate assemblies. A long-standing
goal in bioengineering has been how to design novel proteins in the laboratory so they will self-
assemble to make interesting architectures like those seen in nature. Ideas for how this might be
accomplished were laid out by the author’s laboratory several years ago and have reached fruition in
recent years. Symmetry has played the key role in the design strategy. In one approach, simple
natural protein oligomers like dimers and trimers are connected together in specific geometric ways
to give giant structures like cubic cages. Designed structures of this type may have utility in varied
biomedical and nanomaterials applications. The crystal structure of a designed protein assembly
having 24 identical subunits in symmetry O is shown below, oriented along its different axes of rotational symmetry.
Algebra for describing symmetry
When working with symmetry, it is often necessary to describe the underlying spatial operations in
algebraic terms. Some recollection of matrices and how to multiply matrices and vectors together is
important. We noted earlier that groups can be composed of matrices, and indeed we can take a
symmetry group and represent its elements by matrices. Each matrix represents a rotation operation
that is an element of the symmetry group.
A simple recipe makes it possible to construct a 3 by 3 matrix from a physical description of a rotation
operation. First, imagine that you have a starting point at coordinates (1,0,0), that is a point one unit
along the x-axis. Now ask yourself where that point would go under the operation in question. For
example, if the operation in question is a 180° rotation about the z-axis, a point that starts at (1,0,0)
would rotate to a position where the coordinates are (-1,0,0). Now write those coordinates (-1,0,0)
as the first column vector of a matrix. Now repeat the exercise with (0,1,0) as the starting point.
Under the operation of interest it would go to a position where the coordinates are (0,-1,0). Write
that as the second column vector. Now do the same for the starting point (0,0,1). That point actually
sits on the z-axis about which the rotation is occurring, so it would not go anywhere; its final position
would be (0,0,1). So, the constructed matrix for a 180° rotation about the z-axis would be
R = [ −1   0   0
       0  −1   0
       0   0   1 ]
We can do much with this matrix representation. For one, we can use it to multiply a generic ‘x,y,z’
vector notation to get a symbolic representation of the rotation in question:
[ −1   0   0 ] [ x ]   [ −x ]
[  0  −1   0 ] [ y ] = [ −y ]
[  0   0   1 ] [ z ]   [  z ]
That means we can equally well represent the rotation about z symbolically as ‘(-x, -y, z)’. Also, by
reversing our steps we could begin with a symbolic description of a rotational operation, write out
what the elements of the rotation matrix must be, and then use the columns of that matrix to get a
physical picture of what kind of operation is being performed.
Finally, there is a useful trick. In three-dimensions, the angle of rotation described by a 3x3 rotation matrix can be determined easily from its ‘trace’. The trace of a square matrix is the sum of its diagonal
elements (i.e. R(1,1)+R(2,2)+R(3,3)). The equation for the angle of rotation is:
Trace(R) = 1 + 2cos(θ)

Checking the case we worked out above, the trace is (−1 + −1 + 1) = −1. Solving for θ, we get
2cos(θ) = −2, then cos(θ) = −1, and finally θ = 180°, as expected. [Note that for a 2x2 matrix describing a 2-D
rotation, the equation is different; the additive term “1” on the right must be removed for the 2-D
case.]
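The trace relation can be verified numerically. A sketch using NumPy (the helper function name is ours):

```python
import numpy as np

def rotation_angle_deg(R: np.ndarray) -> float:
    """Recover the rotation angle (degrees) from a 3x3 rotation matrix
    via Trace(R) = 1 + 2*cos(theta)."""
    c = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

# 180-degree rotation about z, as constructed in the text: trace = -1.
Rz180 = np.array([[-1, 0, 0],
                  [ 0, -1, 0],
                  [ 0,  0, 1]], dtype=float)
print(rotation_angle_deg(Rz180))   # recovers ~180 degrees

# A general rotation about z by 120 degrees: trace = 1 + 2*cos(120°) = 0.
th = np.radians(120.0)
Rz120 = np.array([[np.cos(th), -np.sin(th), 0],
                  [np.sin(th),  np.cos(th), 0],
                  [0, 0, 1]])
print(rotation_angle_deg(Rz120))   # recovers ~120 degrees
```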
CHAPTER 12
Equations Governing Diffusion
We turn our attention now to dynamic processes, where behavior becomes a function of time.
Diffusion – the movement of molecules as a result of random thermal motion and collisions – is one
of the most basic dynamic processes for molecules. We will discuss the central equations that govern
diffusion and consider their consequences for diffusive behavior.
Diffusion in 1-D
Consider diffusion in one dimension as a random walk along a number line. Suppose we start at x=0,
and then take n steps randomly either to the left or right, with each step having length δ.
Where do we expect to end up after n steps?
x_n = Σ_{i=1}^{n} l_i δ

where l_i is either +1 or -1 with equal probability. Then, the expected value of the position after n
steps, ⟨x_n⟩, will be

⟨x_n⟩ = ⟨Σ_{i=1}^{n} l_i δ⟩ = δ⟨Σ_{i=1}^{n} l_i⟩ = δ Σ_{i=1}^{n} ⟨l_i⟩ = 0
since the average or expected value of l is 0. The equation tells us that the average value of the
position of the particle after n steps remains at 0. This is as expected since there was no preference
to step one way or the other. This means that if a large group of particles take independent random
walks starting at the origin, the average of their distribution will remain at 0. But clearly the particles
themselves do not individually remain at zero. That leads us to ask how spread out the distribution
of particles would be after they each take n steps.
The standard way of expressing a degree of spreading is to evaluate the average value of the squared
position – the squaring causes displacements in both directions (positive and negative) to contribute
positively to the spread, as they should. The average squared displacement after n steps is
⟨x_n²⟩ = ⟨(Σ_{i=1}^{n} l_i δ)²⟩ = δ²⟨(Σ_{i=1}^{n} l_i)²⟩ = δ²⟨(l_1 + l_2 + ⋯ + l_n)(l_1 + l_2 + ⋯ + l_n)⟩

= δ²(⟨l_1 l_1⟩ + ⟨l_1 l_2⟩ + ⋯ + ⟨l_1 l_n⟩ + ⟨l_2 l_1⟩ + ⟨l_2 l_2⟩ + ⋯ + ⟨l_2 l_n⟩ + ⋯)

= δ²((1 + 0 + ⋯ + 0) + (0 + 1 + 0 + ⋯ + 0) + ⋯) = nδ²
The logic here parallels what we saw earlier in the course when treating the path of a flexible polymer
using a random walk model. Namely, the average squared distance traveled is proportional to the
number of steps. This means that the rms distance goes as the square root of the number of steps, as
before.
x_rms = √n δ
This is an important result. It shows that diffusion can be an efficient mechanism for movement over
short distances but not over long distances. This important limiting feature of diffusion explains the
evolution of a range of biological phenomena where energy is expended (e.g. in the form of ATP or
GTP hydrolysis) to cause directed movement of molecules or even organelles across long distances.
The places where this becomes most important are where the cellular length scales are largest;
neurons are the classic example, and energy driven transport (e.g. by molecular motors tracking
along microtubules) is critical there.
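The result ⟨x_n²⟩ = nδ² can be checked exactly for small n by enumerating every equally likely step sequence. A short sketch:

```python
from itertools import product

def walk_moments(n: int, delta: float = 1.0):
    """Exact <x_n> and <x_n^2> by enumerating all 2^n equally likely
    step sequences of a 1-D random walk with step length delta."""
    walks = list(product([-1, +1], repeat=n))
    xs = [delta * sum(w) for w in walks]
    mean = sum(xs) / len(xs)
    mean_sq = sum(x * x for x in xs) / len(xs)
    return mean, mean_sq

for n in (1, 2, 4, 8):
    mean, mean_sq = walk_moments(n)
    print(f"n = {n}:  <x> = {mean:.1f}   <x^2> = {mean_sq:.1f}   (n*delta^2 = {n})")
```

The mean stays at zero while the mean squared displacement grows linearly with the number of steps, exactly as derived above.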
We can convert the equation ⟨x_n²⟩ = nδ² from a form that depends on details of the random walk
(e.g. n and δ) to a more phenomenological form that depends on time. Let τ be the time interval
between steps. Then the elapsed time during a random walk is t = nτ, and n = t/τ. So,

⟨x_n²⟩ = (t/τ)δ² = (δ²/τ)t

Now we replace the term (δ²/τ) with 2D, where D is called the diffusion coefficient, which is a
property of the molecule in question, among other things such as the viscosity of the solution in which
the diffusion is occurring. With that substitution, we find that the mean squared displacement is

⟨x_n²⟩ = 2Dt
This equation applies to diffusion in one dimension. In three dimensions,
⟨r²⟩ = 6Dt
We will discuss the relationship between molecular properties and the diffusion coefficient D later,
but for now we continue with an analysis of how molecules spread out from a central starting point,
and how this depends on time and on D.
For the 1-D case we can imagine a thin tube in which material is initially concentrated at one point
(x=0). Over time, the molecules naturally spread out. What will the concentration profile look like
at different time points? The answer is that the concentration will take the form of a Gaussian
distribution. Specifically,

C(x) ∝ e^(−x²/(4Dt))

Comparing this to the standard form of a Gaussian, e^(−x²/(2σ²)), where σ is the standard deviation,
we see that the standard deviation of the concentration profile is obtained by setting 4Dt = 2σ²,
which gives a standard deviation for spreading from the center of σ = √(2Dt), which matches our
previous expression for rms distance traveled, as it should.
General Equations for Diffusion
We were able to work out some basic properties for diffusion from a fixed point, but what about
equations we can apply to more general cases? This leads us to Fick’s first and second laws of
diffusion.
Fick’s first law
We begin by setting up a 1-D system as before. Then we consider how much material (i.e. how many
molecules) would cross an imaginary boundary between two points in the system, x and x+δ, over a
time interval τ during which each molecule takes one step.
We can let N(x) denote the number of molecules that are at position x, and N(x+δ) denote the
number of molecules at position x+δ. To relate numbers of molecules to concentrations, we divide
the number at each location (x or x+δ) by the volume allotted to each position, namely A·δ, where
A is the cross-sectional area.
Now we can consider how many molecules we expect to cross the imaginary boundary. Our interest
is in the net movement. The net movement or transport across a boundary (real or imaginary) is the
flux, J, expressed as a number per area per time (cm⁻² sec⁻¹ in cgs units). We can calculate the flux
in our system by noting that half of the molecules at position x will cross the boundary from left to
right (since the probability is ½ in each time step that a molecule takes a step to the right), and
likewise half of the molecules starting at x+δ will cross the boundary from right to left. Clearly then
the net flux (taken here as the net movement from left to right) would be the first quantity minus
the second quantity, dividing by the area and the time interval:

J = ½ (N(x) − N(x+δ))/(Aτ)
Now multiplying the top and bottom by δ² gives

J = ½ (N(x) − N(x+δ)) δ²/(Aτδ²)

Rearranging and substituting the concentration C for N/V = N/(Aδ) gives

J = −½ (δ²/τ) (C(x+δ) − C(x))/δ

Now we recognize that (½ δ²/τ) is just the diffusion coefficient D from before. And the expression on
the right side of the equation appears as a difference between two values of the concentration C at
two closely spaced points (x and x+δ), divided by the spacing; this has the form of a derivative of C
with respect to position. So,

J = −D dC/dx
This is Fick’s first law in one dimension.
This general result tells us that the net movement of molecules due to diffusion (down a
concentration gradient) is proportional to the steepness of the gradient (dC/dx) times the diffusion
coefficient. And of course the negative sign is important as it specifies movement in the direction
opposite from the direction of the
gradient. The idea can be graphed on a 1-
D concentration profile as shown.
Fick’s second law
What can we say about how concentrations will be changing over time as a result of diffusion? We
can answer that with a similar treatment. But now we think about how the number of molecules in
some particular region would change as a result of the flux occurring on one side compared to the
other (i.e. on the left compared to the right). If the total flux into a region, taking into account
movement across the boundaries on either side, is positive, then the concentration should be
increasing over time.
Thinking about this as a change in concentration over a change in time,
ΔC/Δt = (Δ(# of molecules)/V) / Δt
Then realizing that the change in the number of molecules is given by the net number transferred
across the boundary on the left minus the net number transferred across the boundary on the right,
and taking the volume element to be A·δ and the time interval to be τ,

ΔC/Δt = ((J(x)·A − J(x+δ)·A) τ/(Aδ)) / τ

This simplifies to

ΔC/Δt = −(J(x+δ) − J(x))/δ
Similar to before, we can recognize this as a derivative of J with respect to position. So,
dC/dt = -dJ/dx.
But we know from Fick’s first law that the flux J is the first derivative of the concentration C with
respect to position x. So, the change in concentration as a function of time is evidently the second
derivative of C with respect to position, multiplied by the diffusion coefficient D.
(∂C/∂t)_x = D (∂²C/∂x²)_t
This is Fick’s second law in 1-D. Essentially, it tells us that the way the concentration is changing
at a fixed position due to diffusion is determined by the curvature (i.e. the second derivative) of C
with respect to x. With that understanding we can sketch how a concentration profile would be
expected to change over time (at least over a short interval where the derivatives are not changing
much):
Fick’s second law is a second-order differential equation for C in terms of x and t. Some scenarios
have simple enough ‘boundary’ conditions (e.g. a simple form for the concentration at time 0) that
we can solve Fick’s law to obtain a complete expression for C in terms of x and t, meaning we would
know what the concentration profile would look like at any time t. Most real problems have
mathematical forms that are difficult to solve. But in the case of diffusion from a point that we dealt
with earlier, we did write out an equation (without proof) saying that the concentration profile as a
function of time and position was proportional to a Gaussian. Introducing a leading multiplicative
term in order to make the total amount of material in the system constant over time, the correct
equation is:

C(x) ∝ (1/√(Dt)) e^(−x²/(4Dt))
Although we write this equation without proof, we can show that it does indeed obey Fick’s second
law, as it must. Take the first (partial) derivative of C with respect to t. Then take the second (partial)
derivative of C with respect to x. The resulting expressions should be equal to each other after
multiplying by the diffusion coefficient, D.
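The suggested exercise can also be spot-checked numerically. The following sketch (my own construction; the diffusion coefficient, test point, and step size are arbitrary choices) evaluates both sides of Fick’s second law for the normalized Gaussian profile, using small central differences in place of the partial derivatives:

```python
import math

# Numerical spot-check (assumed parameters): verify that
# C(x,t) = (1/sqrt(4*pi*D*t)) * exp(-x**2/(4*D*t)) satisfies dC/dt = D*d2C/dx2,
# approximating the partial derivatives by central differences.
D = 2.0

def C(x, t):
    return math.exp(-x * x / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

x0, t0, h = 0.7, 1.3, 1e-4
dC_dt = (C(x0, t0 + h) - C(x0, t0 - h)) / (2 * h)                  # time derivative
d2C_dx2 = (C(x0 + h, t0) + C(x0 - h, t0) - 2 * C(x0, t0)) / h**2   # curvature in x
print(dC_dt, D * d2C_dx2)   # the two values should agree to several decimal places
```

Any constant multiple of the Gaussian works equally well here, since the equation is linear in C.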
Generalizing Fick’s laws for three dimensions
Fick’s laws generalize readily to higher dimensions. The single-variable derivatives get replaced by
the gradient operator.
In dimensions higher than 1, the flux J is a vector. It points directly down the concentration gradient
in the direction of steepest descent. Fick’s first law takes the following form:
J = −D ((∂C/∂x)_(t,y,z) x̂ + (∂C/∂y)_(t,x,z) ŷ + (∂C/∂z)_(t,x,y) ẑ) = −D∇C
where x̂ denotes the unit vector along x (and likewise ŷ and ẑ for y and z), and ∇ symbolizes the
gradient or ‘del’ operator.
Similarly, Fick’s second law generalizes to:
(∂C/∂t)_(x,y,z) = D ((∂²C/∂x²)_(t,y,z) + (∂²C/∂y²)_(t,x,z) + (∂²C/∂z²)_(t,x,y)) = D∇²C
Special topic: Using numerical (computational)
methods to simulate diffusion behavior
Many problems that involve differential equations can
be treated effectively using computer techniques. The
derivative quantities are effectively replaced with
differences in a variable over small sampling distances.
To apply Fick’s second law to a diffusion problem, we
need to know the second derivative of concentration
with respect to position. Examining the 1-D case first,
we might have a plot of concentration at some time t as
shown:
How would we estimate the value of (d2C/dx2) at point x? Well, the second derivative is just the
derivative of the first derivative, so we need to evaluate how the first derivative changes. We can
take the difference between the first derivatives between points x and x+Δx and between points x-
Δx and x, keeping in mind we need to divide by the separation distance when calculating derivatives.
We get:
d²C/dx² ≅ [(C(x+Δx) − C(x))/Δx − (C(x) − C(x−Δx))/Δx] / Δx = (C(x+Δx) + C(x−Δx) − 2C(x)) / (Δx)²
In other words, the second derivative is approximated by first adding up the values of a variable on
either side of the central point and then subtracting twice the value the variable takes at the central
point, dividing by the square of the sampling distance. This recipe makes it possible to simulate the
evolution of a concentration profile under diffusion by estimating the second derivative of C at each
point and then using those values to update the new concentrations at a new time point. Since
ΔC/Δt = D ∂²C/∂x²

ΔC = (D ∂²C/∂x²) Δt
The procedure can be extended easily into higher dimensions. For 2-D, the curvature term is just
the sum of the separate second partial derivatives with respect to x and y, and we would end up
with:

ΔC = D ((∂²C/∂x²) + (∂²C/∂y²)) Δt
   = D (C(x+Δx, y) + C(x−Δx, y) + C(x, y+Δy) + C(x, y−Δy) − 4C(x, y)) / (Δx)² × Δt
assuming the sampling distance is the same in both directions (i.e. Δx = Δy). Graphically, the effect
is to add up the values of the variable (C) at the four points surrounding the central point (x, y) and
then subtract 4 times the value of the variable at the central point; this is the numerator above.
In 3 dimensions, the required coefficients are of course +1, +1, +1, +1, +1, +1, and −6.
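The update recipe just described can be turned directly into a short program. The following is a minimal sketch (variable names and parameter values are my own choices) that evolves a 1-D concentration profile from a central spike using the finite-difference form of Fick’s second law:

```python
# Finite-difference diffusion sketch (assumed parameters). Each step applies
# C_new[i] = C[i] + D*dt*(C[i+1] + C[i-1] - 2*C[i])/dx**2 at interior points.
D, dx, dt = 1.0, 1.0, 0.2    # the explicit scheme needs D*dt/dx**2 <= 1/2 for stability
n, steps = 101, 200
C = [0.0] * n
C[n // 2] = 100.0            # material initially concentrated at one point

for _ in range(steps):
    new = C[:]
    for i in range(1, n - 1):    # interior points; the ends are held at zero
        new[i] = C[i] + D * dt * (C[i + 1] + C[i - 1] - 2 * C[i]) / dx**2
    C = new

print(sum(C))    # total material is conserved, up to tiny leakage at the ends
```

Printing C at several time points shows the spike relaxing toward the Gaussian profile discussed earlier, with width growing as √(2Dt).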
CHAPTER 13
The Diffusion Coefficient: Measurement and Use
We noted earlier that the diffusion coefficient, D, which describes how rapidly a molecule diffuses,
depends on the properties of the molecule (chiefly its size). Naturally then, if we measure D we can
learn something about molecular size. We begin with a discussion of experiments for measuring D.
Measuring the diffusion coefficient, D
The diffusion coefficient can be measured in various ways, each of which may be suitable for some
systems and not others. One approach is to measure the rate of spread from an initial point. From
our earlier equations we know that the standard deviation in the spread of material initially
concentrated at a point goes as xrms = √(2Dt) in 1-D and √(6Dt) in 3-D. Therefore, if the spatial
spreading (i.e. the standard deviation of the concentration distribution) after time t can be measured,
D can be obtained.
Recent technology developments have made it possible with special instrumentation to track the
location of a single molecule, usually on the basis of fluorescent labeling combined with highly
sensitive detectors. If a single particle is monitored long enough, its average movement over a
particular time period can be evaluated, leading to a value for D in the same way as for the
measurement of spreading in an ensemble, as above.
Fluorescence recovery after photobleaching (FRAP)
Diffusion is a process whereby molecules move about in a way that tends to result in a uniform
distribution; i.e. an equal concentration everywhere. As a result, measuring diffusion in a system
where the molecule of interest is already uniformly distributed is naturally problematic, since further
diffusion has no effect on the concentration distribution. The principle behind the method known as
fluorescence recovery after photobleaching, or FRAP, is to create a non-uniform distribution of the
molecule of interest and then monitor how rapidly its concentration profile returns to uniformity.
The standard approach is to first label the molecule (e.g. a protein) with a fluorescent group. [As we
will discuss later, fluorescence is a convenient and sensitive way to measure the concentration of a
labeled molecule.] Then, a strong laser pulse is used to ‘bleach’ (i.e. destroy by bond breakage) the
fluorescent probe, not everywhere but just in one spot, leaving a region (e.g. a circular spot) where
there is no fluorescence from the molecule of interest. Then, one waits and measures fluorescence
in the bleached region. Assuming the labeled molecules are diffusing about, unbleached molecules
from outside the bleached region will find their way into that region. How many of those molecules
have diffused into that region can be monitored with a (usually microscopic) fluorescence reading.
Of course the bleached molecules will also diffuse out of the bleached region, but they lack a
fluorescent group and so do not contribute to the measured fluorescence. After an extended period,
the fluorescence as a function of time will plateau as the concentration of unbleached molecules
becomes equal inside and outside the bleached region. The time scale over which the fluorescence
returns depends on the diffusion coefficient of the molecule being studied. So, D can be obtained by
measuring the rate of fluorescence recovery.
The behavior of fluorescence recovery and its relation to D is diagrammed here. We will not go
through a detailed mathematical treatment here other than to say that the equations for diffusion
make it possible to formulate what the curve for fluorescence recovery should look like as a function
of D. And therefore a value for D can be extracted by determining what value of D gives the best
match between the mathematically calculated behavior and the observed data.
FRAP can be used in various types of set-ups. It is naturally suited for measuring two-dimensional
diffusion in a thin layer, for example of a protein in a lipid bilayer. It is also commonly used in situ
(i.e. inside cells) using fluorescence microscopy. For fluorescence studies inside cells, the protein
of interest must be fluorescently labeled by genetic fusion to a naturally fluorescent protein, a topic
we will discuss in more detail later.
Dynamic Light Scattering (DLS)
Dynamic light scattering (DLS, also sometimes referred to as photon correlation spectroscopy) is a
powerful and convenient experiment for measuring diffusion coefficients. Part of its power and
convenience comes from not having to artificially create a system where the concentration
distribution is out of equilibrium (i.e. non-uniform), as required in FRAP. DLS relies on natural
fluctuations in the intensity of light that is scattered by the large solute molecules in a solution. We
will not discuss the physics of light scattering in detail, other than to say that the phenomenon under
discussion here is typically referred to as elastic or Rayleigh scattering and occurs where the
wavelength of light is much larger than the sizes of the molecules involved; macromolecules have
sizes between a few nanometers to tens of nanometers, while the visible/UV region of the
electromagnetic spectrum is in the few hundred nanometer range. The intensity of the light
scattering – e.g. the fraction of the incident photons that bounces off in directions other than the
incident direction – depends on the index of refraction or polarizability of the molecules, and is
strongly dependent on molecular size. In fact, it tends to be dominated by the largest species of
molecule in a solution.
Without belaboring the details, the random movements of molecules in solution cause random
fluctuations in the intensity of light that is scattered in any fixed direction. At one moment in time,
the scattered light intensity may be slightly lower than the average (over time), whereas a moment
later the intensity may be higher. But the crux of the phenomenon is that the time scale over which
the fluctuations persist depends (inversely) on the rate of diffusion. If at some instant in time the
positions of the macromolecules in a solution are such that the light scattering is higher than average,
then the light scattering will remain above average until the molecules have moved far enough (by
diffusion) to erase the momentary fluctuation, and likewise if the scattered intensity is lower than
average. If the molecule under study has a high diffusion coefficient, then deviations above or below
the average intensity will vanish or dissipate quickly, whereas if the molecule has a low diffusion
coefficient then whatever fluctuations occur will persist longer. In other words, a phenomenon that
shows random fluctuating behavior has a natural time scale associated with the fluctuations. The
plots here illustrate the general idea of fluctuating behavior having different characteristic time
scales.
How can the time scale of something that is randomly fluctuating be characterized mathematically?
The autocorrelation function provides the answer. The essence of an autocorrelation function is to
ask how similar or correlated the intensity measured at time t is compared to the value measured at
time t+τ, where τ is some specified time increment. Of course the answer depends on the value of τ.
If τ is sufficiently small, then the values measured at t and t+τ will be very similar (in fact identical if
τ = 0). On the other hand, if we consider a large value of τ (i.e. longer than the time scale of the
fluctuations), then the intensity values at t and t+τ will be uncorrelated. And of course at
intermediate values of τ we will see intermediate values of the correlation (i.e. between 1 and 0). In
other words, the value of the autocorrelation function will be 1 when τ is 0, and will decay to 0 when
τ is large. Precisely how quickly the autocorrelation function decays as a function of τ tells us what
the characteristic time scale is for the fluctuating behavior.
The plot above (lower panel) illustrates the mechanics of how the autocorrelation function is
evaluated. First, recall from your prior exposure to statistics that when calculating the correlation
coefficient between two ordered sets of values, for the numerator one simply takes the sum or
average of the products of one set of values with the other set; the denominator is simply a
normalizing factor. So, to calculate the autocorrelation function we just need to calculate the average
value of the product of the intensities at times t and t+τ. To calculate the average value of the product
of I(t) and I(t+τ), you can imagine taking a bar of length τ and sliding it along the length of the plot to
identify intensity values to multiply together and average. Without loss of generality we can simplify
things by pretending that the average value of the intensity is 0, with fluctuations giving plus and
minus values. Then, the autocorrelation function A(τ) is just

A(τ) = <I(t)·I(t+τ)>
A plot of A(τ) vs τ will decay exponentially, according to our discussions above. The characteristic
time for the fluctuating intensity is the value of the time increment τ at which A(τ)/A(0) = 1/e. The
motivation for this kind of autocorrelation analysis is that the characteristic fluctuation time is
related inversely to the value of the diffusion coefficient, D. From more advanced texts one can find
that a plot of ln(A) vs τ gives a slope equal to D times (−8π²n² sin²(θ/2)/λ²), where n is the index of
refraction, θ is the scattering angle, and λ is the wavelength of the light. In that way, D can be
obtained from the autocorrelation analysis.
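The autocorrelation procedure is easy to demonstrate on synthetic data. The sketch below is not from the text: the AR(1) recipe for generating an exponentially correlated signal, and all the parameter values, are my own assumptions. It builds a mean-zero “intensity” trace with a known correlation time and evaluates A(τ) = <I(t)·I(t+τ)> by sliding a lag of τ along the trace, as described above:

```python
import math
import random

# Sketch (assumed recipe and parameters): make a synthetic fluctuating signal
# with correlation time t_c via an AR(1) process, then compute its
# autocorrelation and check the decay at tau = t_c.
random.seed(1)
t_c = 20.0                   # correlation time, in units of the sampling step
rho = math.exp(-1.0 / t_c)   # step-to-step correlation of the AR(1) process
N = 200000
I = [0.0] * N
for k in range(1, N):
    I[k] = rho * I[k - 1] + math.sqrt(1 - rho * rho) * random.gauss(0.0, 1.0)

def A(tau):
    # autocorrelation at integer lag tau (the signal already has mean ~0)
    return sum(I[k] * I[k + tau] for k in range(N - tau)) / (N - tau)

print(A(20) / A(0))   # should be close to 1/e at tau = t_c
```

Evaluating A(τ) over a range of lags and fitting ln(A) vs τ recovers the correlation time, which is the same kind of analysis a DLS instrument performs to extract D.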
With modern instrumentation, the value of D can be obtained by DLS quickly with very little material.
Built-in software performs the necessary analysis. The experiment is ‘native’ in the sense that the
macromolecule is examined in its native state, and nondestructive. Of course the macromolecule of
interest must be purified, and owing to the strong dependence of scattering on size, special care must
be taken to remove any particulates or aggregated material. If the sample in question is
heterogeneous, containing molecules of more than one size, then in ideal cases it is possible to
decompose the behavior into separate components, but this becomes challenging.
Relating the diffusion coefficient to molecular size
Relating D to molecular friction
When a molecule moves down its concentration gradient under diffusion, the overall rate of
movement or transport reflects a sort of terminal velocity situation, where the force of movement
caused by the concentration gradient is balanced by the friction the molecules feel as they travel. The
frictional force on an object is the product of its velocity with its frictional coefficient, f. The direction
of force is opposite from the velocity, so
Ffrict = -f v
What is meant by a force due to a concentration gradient? We have seen this before in the context of
balancing forces in equilibrium sedimentation. If we denote the force due to a concentration gradient
as FC, then recalling that force is the derivative of (potential) energy as a function of position,
FC = −d(μ⁰ + kBT lnC)/dx = (−kBT/C) dC/dx
Now requiring that the two forces sum to zero at terminal velocity, FC + Ffrict = 0 and
f v = (-kBT/C) dC/dx
But the velocity v is a description of how fast the molecules are moving (not their vibrational speed
between collisions but their net transport speed), so v must be related to flux J. By examining units
(J is #/(cm2 sec), and v is cm/sec), we can see that the conversion between them is by units of #/cm3
which is concentration, so
J = v C and v = J/C
Putting this into the previous equation and cancelling concentration on both sides,
f J = (-kBT) dC/dx
But from Fick’s first law we know that J = −D dC/dx. Substituting that into the equation above and
cancelling like terms, we find remarkably that
f D = kBT
Of course kBT is a constant (our familiar average thermal energy), having no dependence on the
molecule in question. Evidently, the frictional coefficient of a molecule f and the diffusion coefficient
of a molecule D are just two manifestations of the same thing, inversely related. If we know one, we
know the other. This equation is known as the Einstein-Smoluchowski equation. The reason it is
important is that there are well-known physics equations to describe how the frictional coefficient
of an object relates to its size, and since knowing D gives us f, we have a way to get from D to molecular
size, as we describe next.
Relating the frictional coefficient f to molecular size (spherical radius)
In 1850 Stokes showed that for a sphere of radius R moving in a medium of viscosity η, the frictional
coefficient is

f₀ = 6πηR

where the subscript in f₀ denotes the assumption of a sphere. This is known as Stokes’ equation. For
water at ambient temperatures, the viscosity is about η = 0.010 g/(cm sec). From the equations
above you can see that if you measure D, you can immediately obtain the frictional coefficient f, and
if you assume the molecule is spherical then you can obtain R. The value of R so-obtained is
sometimes called the Stokes radius or sometimes the hydrodynamic radius. From a value for R, one
can use the known density for a protein or nucleic acid to calculate its mass.
Transport problems are often treated in cgs units. The units for the key variables are somewhat
peculiar. They are listed here for convenience.

variable    units
D           cm²/sec
f           g/sec
J           1/(sec cm²)
v           cm/sec
η           g/(cm sec)
Example:

Suppose the measured value of D for a large protein complex is 5×10⁻⁷ cm²/sec. Assuming
the molecule is spherical, what is the molecular weight? (Let the density of protein be 1.35
g/cm³.)

f = kBT/D = 8.2×10⁻⁸ g/sec
R = f/(6πη) = 4.4×10⁻⁷ cm (= 44 Å)
MW = 1.35 g/cm³ × (4/3)πR³ × NA = 290000 g/mol = 290 kDa
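The worked example can be reproduced in a few lines (a sketch in cgs units; the physical constants are standard values and the inputs come from the example itself):

```python
import math

# Worked example in cgs units (constants are standard values; inputs from the example).
kB = 1.381e-16    # Boltzmann constant, erg/K
T = 298.0         # K
eta = 0.010       # viscosity of water, g/(cm sec)
D = 5e-7          # measured diffusion coefficient, cm^2/sec
density = 1.35    # assumed protein density, g/cm^3
NA = 6.022e23     # Avogadro's number, 1/mol

f = kB * T / D                # Einstein-Smoluchowski relation: f*D = kB*T
R = f / (6 * math.pi * eta)   # Stokes' equation: f = 6*pi*eta*R
MW = density * (4.0 / 3.0) * math.pi * R**3 * NA
print(f, R, MW)   # ~8.2e-8 g/sec, ~4.4e-7 cm (44 A), ~290 kDa after rounding
```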
Non-spherical molecules
For a given volume, a sphere has the lowest possible frictional coefficient, which also means it has
the highest diffusion coefficient. Non-spherical objects have higher values of f and lower values of D.
That means that estimating molecular size from the diffusion coefficient alone (using Stokes’
equation, which assumes a spherical shape) can give somewhat erroneous values. In particular, since
a non-spherical shape leads to a lower diffusion coefficient and a higher frictional coefficient, and the
frictional coefficient varies directly with the assumed spherical radius, a highly non-spherical
molecule will have the same diffusion coefficient as a larger spherical molecule. In other words,
estimating molecular size from D alone can lead to an overestimation of size.
Advanced texts provide complex equations that relate f to the degree of non-sphericality, but these
are not useful in practice very often. As we will see shortly, combining measurements of D (and hence
f) with other kinds of measurements makes it possible to obtain molecular weights without assuming
a spherical shape.
In addition to the issue of shape, the frictional coefficient can also be affected by other factors. In
particular, macromolecules tend to carry a hydration layer of bound water molecules with them, and
this sometimes complicates the analysis of frictional coefficients.
Diffusion or spreading out with respect to orientation rather than position
The idea of diffusion can be generalized to go beyond positional variables. Ordinary diffusion
considers how molecules that begin in the same place spread out to other positions. That idea can
be generalized to molecular orientations. If the macromolecules in a solution can be initially driven
to the same orientation and then allowed to reorient through random rotational changes, then over
time their orientations will spread back out until their orientational distribution becomes uniform.
As with ordinary diffusion, there is a distinct constant associated with rotational diffusion of a
molecule, which is inversely related to the degree of friction the molecule experiences when it
tumbles in the viscous solution. Rotational diffusion will come up again later when we discuss special
spectroscopic techniques.
Special Topic in Diffusion: Diffusion to Transporters on a Cell Surface
It is a surprising observation that cells that have surface transporters for taking up nutrients usually
have a rather low density of transporters on their cell surface. Why not gain an advantage by densely
packing the surface with transporters in order to obtain nutrients more rapidly? The answer to this
puzzle comes from examining the peculiar and surprising properties of diffusive behavior.
Net movement can be present in situations where the concentration is not changing over time
anywhere. Steady-state can be achieved where a molecule of interest is being produced at one place
(called the source) and consumed at another (called the sink). Under steady state conditions,
meaning dC/dt=0, Fick’s second law becomes:
(∂²C/∂x²)_(t,y,z) + (∂²C/∂y²)_(t,x,z) + (∂²C/∂z²)_(t,x,y) = ∇²C = 0
Solving this differential equation gives a description of the concentration everywhere in a system
between the source and the sink. Solving this one equation gives different results for the function
C(x,y,z), depending on the ‘boundary conditions’. The boundary conditions are specified by having
fixed (and unequal) concentrations at the surfaces of the source and sink, so one obtains a different
solution to the differential equation and a
different function for C(x,y,z) depending on
the nature (e.g. size, shape and
arrangement) of the source and sink.
We begin our analysis of the problem of
diffusion to cell surface transporters with a
treatment of a simpler problem where we
have a sphere (representing a cell) whose
entire surface acts as an absorber;
whenever a diffusing molecule of interest
hits the surface it is captured or consumed.
This is the case of a spherical sink, and the
boundary condition is that the
concentration C=0 at the surface of the
sphere. We can treat the source as being infinitely far away – imagine an enclosing sphere of very
large radius giving a boundary condition of C equal to some fixed value C0 at infinity. Without proof,
we can find that the solution to ∇²C = 0 with these boundary conditions is
C(r) = C0 (1 − a/r)

where the spatial variables x, y, z are replaced instead with r, since the problem is spherically
symmetric, and a is the radius of the spherical absorber. Notice that, as required, C = 0 at r = a and
C = C0 at r = ∞. You can further prove to yourself that this equation does indeed obey Fick’s second
law by converting r to √(x²+y²+z²) and taking second partial derivatives to show that ∇²C = 0.
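Rather than expanding the derivatives by hand, the claim can be spot-checked numerically. The sketch below (my own construction; the evaluation point and step size are arbitrary) estimates the Laplacian of C(r) = C0(1 − a/r) by central differences in x, y, and z:

```python
import math

# Numerical spot-check (assumed parameters): estimate the Laplacian of
# C(r) = C0*(1 - a/r) at an arbitrary point; it should come out ~0.
C0, a = 1.0, 1.0

def C(x, y, z):
    return C0 * (1.0 - a / math.sqrt(x * x + y * y + z * z))

x, y, z, h = 1.5, -0.8, 2.1, 1e-3
lap = (
    (C(x + h, y, z) + C(x - h, y, z) - 2 * C(x, y, z)) / h**2
    + (C(x, y + h, z) + C(x, y - h, z) - 2 * C(x, y, z)) / h**2
    + (C(x, y, z + h) + C(x, y, z - h) - 2 * C(x, y, z)) / h**2
)
print(lap)   # ~0, up to finite-difference error
```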
Now that we know what the concentration function looks like, we can determine how fast the
absorbing sphere is capturing diffusing molecules. We can find this by evaluating the flux J at the
surface and then multiplying by the area of the sphere, 4πa².

|J| = D|dC/dr| = DC0 a/r²

and at the surface of the sphere (r = a),

|J| = DC0/a

Taking into account the surface area of the sphere,

Capture rate for an absorbing sphere = 4πa² × DC0/a = 4πDaC0

This is how fast a spherical cell of radius a could capture a diffusing nutrient whose diffusion
coefficient is D and whose bulk concentration is C0.
Now we have to answer the harder question concerning the capture rate for a sphere where the
diffusing molecule is captured only if it collides with the sphere at small absorbing patches (i.e.
transporters); the rest of the sphere is not absorbing. This problem was first treated by Howard Berg
and is discussed in his short classic Random Walks in Biology. Let’s say that there are N patches on
the surface and each is circular with radius s. Using a clever mathematical analogy between diffusive
resistance and resistance in an electrical circuit, Berg reasoned as follows. The rate of capture by a
single circular disk of radius s is (given without proof) 4DsC0. [This comes from solving ∇²C = 0 using
boundary conditions of C = 0 at the surface of a flat circular disk, calculating the flux as the derivative
of C, and then integrating the flux over the circular patch.] Then Berg notes that the problem of
diffusion to a set of patches on a sphere can be broken down into two steps: (1) diffusion from infinity
to a spherical surface just outside the sphere of interest (but with a radius not substantially greater
than a), followed by (2) diffusion to a set of circular patches.
Then Berg introduces an electrical analogy. Recall that electrical resistance is R = V/I (which is a
driving voltage divided by a flow). By analogy, diffusive resistance would be the driving
concentration divided by the capture rate. For the case of the sphere whose entire surface is
absorbing, we get C0/(4πDaC0) = 1/(4πDa) as the diffusive resistance. For capture by a single
circular disk, for the diffusive resistance we get C0/(4DsC0) = 1/(4Ds). Now we put the two steps
together. The two steps occur in series (one after the other), so we should add the resistances of the
two steps together. But first we have to account for there being N separate patches. Flow to the
separate patches can occur in parallel, so we need to divide the diffusive resistance of the second
step by N. The total diffusive resistance becomes

1/(4πDa) + 1/(4NDs) = (1/(4πDa)) (1 + πa/(Ns))

Finally, we convert to a rate of capture by taking the driving concentration and dividing by the
diffusive resistance.

Capture rate for a sphere with N absorbing patches = 4πDaC0 / (1 + πa/(Ns))
Recall that the sphere whose entire surface was absorbing had a capture rate of 4πDaC0. Therefore,
we can express the relative or fractional speed of capture for the sphere with patches (in comparison
to the fully absorbing sphere) as

fractional capture rate = 1/(1 + πa/(Ns))

Evidently, the capture rate has asymptotic behavior in terms of the number of patches N. If we call
the number of patches where the fractional capture rate is 50% N50%, then N50% = πa/s. In other
words, if the sphere has a radius that is a hundred times greater than the radius of the patches, then
a few hundred patches spread across the surface of the sphere will achieve 50% capture efficiency.
So what fraction of the surface of the sphere is actually covered by patches under these conditions of
50% capture efficiency? The area of the sphere is 4πa². The total area occupied by the circular
patches of radius s would be N50% × πs², which after substituting N50% = πa/s would give π²as. So
the fractional coverage of the spherical surface would be π²as/(4πa²) = πs/(4a). Note that this is a
small number if the radius of the sphere is large compared to the size of the patches.
As a practical example, if the sphere is a bacterial cell with radius 1 μm, and for the sake of argument
we take the radius of a transporter on the cell surface to be about 5 Å, then the fraction of the cell
surface that needs to be occupied by transporters to reach the 50% maximum capture rate is
π × 5×10⁻¹⁰ m / (4 × 10⁻⁶ m) = 3.9×10⁻⁴, which is less than 1/10th of 1%!
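These numbers follow directly from the fractional capture formula. Here is a short sketch (variable names are my own; the cell and transporter radii are the values used in the estimate above):

```python
import math

# Capture-rate sketch (assumed variable names; radii from the text's estimate).
a = 1e-4    # cell radius in cm (1 um)
s = 5e-8    # transporter patch radius in cm (5 Angstroms)

def fractional_capture(N):
    # fraction of the fully-absorbing-sphere capture rate achieved by N patches
    return 1.0 / (1.0 + math.pi * a / (N * s))

N50 = math.pi * a / s                                    # patches giving 50% capture
coverage = N50 * math.pi * s**2 / (4 * math.pi * a**2)   # fractional surface coverage
print(N50, fractional_capture(9 * N50), coverage)
```

With these numbers N50% is a few thousand patches, nine times that many gives 90% of the maximal capture rate, and the surface coverage at 50% is only πs/(4a) ≈ 4×10⁻⁴.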
What this exercise shows is that diffusion has peculiar properties, leading here to a phenomenon
where good capture efficiency is achieved with low surface coverage, and not much extra advantage
is gained by increasing the density of transporters significantly – increasing the density by a factor of
9 gets you to 90% efficiency. And there is of course a substantial cost associated with producing large
quantities of cell surface transporters, so the cell reaches a point where there are diminishing returns
for synthesizing more transporters. As an interesting counterpoint, cell surface phenomena that are
not diffusion-limited show strongly contrasting behavior. Photosynthetic antenna proteins and other
proteins that generate energy by light absorption are sometimes very densely packed on cell
surfaces, sometimes to the point of forming two-dimensional protein crystals in the membrane; light
absorption does not obey the same behavior as diffusion.
CHAPTER 14
Sedimentation velocity
Earlier we discussed sedimentation in the context of an equilibrium situation, where the experiment
was run essentially to completion, to a point where no further concentration changes were occurring,
and the external force of centrifugation (or gravity) came into balance with the opposing force of a
concentration gradient. Now we will consider a different scenario. In the limit we might imagine a
situation where the sample is being spun at such a high speed that it would nearly all be driven to the
bottom of the tube if the experiment were continued indefinitely. Now instead of considering the
ultimate equilibrium situation we might ask how fast the macromolecules are moving downward
during the experiment. This brings us to consider a balance between different forces: the downward
external force due to centrifugation and an opposing frictional force limiting the speed of movement.
We saw the frictional force come into play in the previous chapter, in opposition to a concentration
gradient. We can draw a scheme that ties together different kinds of measurements and experiments
we have discussed where different pairs of forces are put into balance as shown below. The arrow
at the lower right is of interest to us now.
Sedimentation coefficient, s
From before, the external force (on a per molecule basis) due to centrifugation is mφω²r, where φ is
the density increment, ω is the angular velocity, m is the mass, and r is the distance from the axis of
rotation. The opposing frictional force is -fv. Setting the sum of forces to zero (i.e. at terminal
velocity) gives
v = mφω²r/f
or, converting to a per mole basis,
v = Mφω²r/(NAf)
How is the velocity v visualized in a velocity sedimentation experiment? If the sample begins with
the macromolecule uniformly distributed (e.g. with the concentration equal everywhere in the tube),
then when the centrifugation begins (at very high speeds as we discussed above), then the top region
of the sample will begin to be depleted of macromolecules. In the limiting scenario, there will be an
effective boundary position; at lower values of r the concentration of the macromolecule would be
nearly zero, as diagrammed here:
The sedimentation velocity, v, is the speed of the boundary, i.e. Δ(boundary position r)/Δt.
Meaning and measurement of the sedimentation coefficient, s
How does the sedimentation velocity v relate to molecular properties? We can see from the equation
above that v is affected both by molecular properties (e.g. M) and by experimental parameters (e.g.
ω). The behavior is clarified by separating the two kinds of variables on different sides of the
equation. Then,
v/(ω²r) = mφ/f
Now we can introduce the sedimentation coefficient s to be equal to those quantities. In other words,
s is obtained experimentally as s = v/(ω²r). And s relates to molecular properties according to
s = Mφ/(NAf) or s = mφ/f
The advantage of separating the variables in this way and assigning a new variable s is clear. If we
increase the angular velocity in the centrifugation experiments, the sedimentation velocity goes up
also, but the sedimentation coefficient is unaffected. This must be the case since the equation above
shows us that s can be written in terms of molecular properties alone without reference to
experimental parameters.
As an aside, you can see that since s = v/(ω²r), we could try to obtain a value for s by measuring the
sedimentation velocity v (i.e. the speed of the boundary) at some instantaneous point in the
experiment and then dividing by ω²r (using the value of r for the boundary at that instant). But this
is a bit sloppy given that v will be dependent on r. Better is to note that since v is defined as dr/dt,
s = (dr/dt)/(ω²r) = (d(ln(r))/dt)/ω². So, measuring the position r of the boundary at a series of time
points during the experiment and plotting them as ln(boundary position) vs t should give a straight
line with slope sω², from which s can be obtained.
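The ln(r) vs t procedure can be sketched in a few lines of code. The rotor speed, starting radius, and "measured" boundary positions below are invented for illustration; the slope of the least-squares line divided by ω² returns s.

```python
import math

# Sketch: recover s from boundary positions r(t) using the fact that
# ln(r) vs t is a straight line with slope s*omega^2.  The rotor speed
# and the "measured" positions are made-up illustrative values.
omega = 2 * math.pi * 50000 / 60          # 50,000 rpm in rad/s
s_true = 4.0e-13                          # 4 S, used to generate fake data

times = [0, 600, 1200, 1800, 2400]        # s
radii = [6.0e-2 * math.exp(s_true * omega**2 * t) for t in times]  # m

# Least-squares slope of ln(r) vs t (closed form, no libraries needed)
n = len(times)
mt = sum(times) / n
ml = sum(math.log(r) for r in radii) / n
slope = sum((t - mt) * (math.log(r) - ml) for t, r in zip(times, radii)) \
        / sum((t - mt) ** 2 for t in times)

s_est = slope / omega**2
print(s_est / 1e-13)   # sedimentation coefficient in Svedbergs, ~4.0
```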
For convenience, a special unit is typically used to express the value of the sedimentation coefficient
s (whose natural units are seconds). The Svedberg, S, is defined as 10⁻¹³ sec. You are likely familiar
with this notation from molecular biology courses. For example, you learned about the 50S large
subunit of the ribosome; its name originates from its sedimentation coefficient – its s value is
50×10⁻¹³ sec.
Relating s to molecular properties
The sedimentation coefficient relates to molecular properties in two ways, through direct
dependence on mass and through the frictional coefficient f, which also depends on size (and
therefore mass). This leads to a somewhat complex dependence of s on mass. Because the frictional
coefficient depends on size via a linear dimension (i.e. radius, R), f goes as the 1/3 power of volume
and mass. So, from the equation above, s = m/f , we should expect s to depend on the 2/3 power of
m. Combining
s = mφ/f
with Stokes' equation for f (assuming a spherical shape), f = 6πηR,
s = mφ/(6πηR)
But R relates to volume and mass according to
V = (4/3)πR³ and R = (3V/(4π))^(1/3)
Relating volume V to mass m by the density ρ of the protein or nucleic acid, m = Vρ, and substituting,
s = mφ/(6πη(3m/(4πρ))^(1/3))
which gives
s = m^(2/3)·(φ/(6πη))·(4πρ/3)^(1/3) or s = (M/NA)^(2/3)·(φ/(6πη))·(4πρ/3)^(1/3)
This can of course be further rearranged to give M in terms of s raised to the 3/2 power.
Do larger molecules sediment faster or slower than smaller molecules of equal density and similar
shape? The centrifugal force on an object is proportional to its mass, but the opposing force is
proportional only to the 1/3 power of the mass, so larger molecules sediment faster, according to the
2/3 power of their mass.
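The m^(2/3) dependence can be made concrete with a short calculation. The sketch below estimates s for a hypothetical compact 50 kDa protein; the values assumed for the density increment φ, the protein density ρ, and the viscosity η of water are rough typical numbers, not values taken from the text.

```python
import math

# Sketch: sedimentation coefficient of a compact spherical particle,
# from s = m*phi/(6*pi*eta*R), with R set by the particle's density.
# phi, rho, and eta are assumed, typical values.
NA = 6.022e23     # Avogadro's number, 1/mol
eta = 1.0e-3      # viscosity of water, Pa*s (assumed)
rho = 1.35e3      # density of a typical protein, kg/m^3 (assumed)
phi = 0.27        # density increment, dimensionless (assumed)

def s_sphere(M):
    """s (in seconds) for a compact sphere of molar mass M in kg/mol."""
    m = M / NA                                    # mass per molecule, kg
    R = (3 * m / (4 * math.pi * rho)) ** (1 / 3)  # radius from m = V*rho
    f = 6 * math.pi * eta * R                     # Stokes friction
    return m * phi / f

print(s_sphere(50.0) / 1e-13)               # a 50 kDa sphere: ~4.9 S
print(s_sphere(8 * 50.0) / s_sphere(50.0))  # 8x the mass -> 8^(2/3) = 4x the s
```

The second line makes the scaling explicit: multiplying the mass by 8 multiplies s by exactly 8^(2/3) = 4 under these assumptions.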
The equation above for relating mass to the sedimentation coefficient relies on the assumption of a
nearly spherical shape, since we were forced to employ Stokes’ equation to relate friction to size and
mass. And as we discussed before, if the molecule of interest is highly non-spherical, we may
misestimate the mass by such an approach. To be specific, if a molecule is highly non-spherical, then,
compared to a spherical molecule of the same mass, it will experience the same centrifugal force but
a greater frictional force, resulting in a smaller value of s. From the equation above you can see that
a lower value of s for the non-spherical molecule would lead to an erroneously low value for the
estimated molecular weight.
Combining s and D to get molecular weight without a spherical assumption
We can free ourselves from the assumptions of a spherical shape if we have measured values for s
and D together. D and s both had relations to the frictional coefficient, but if we have values for s and
D we can cancel f out and avoid the need to obtain an expression for f in terms of a sphere. From our
previous chapter, f = kBT/D, and from above, f = Mφ/(NAs). Setting kBT/D = Mφ/(NAs), we get
M = (RT/φ)·(s/D)
So the molecular weight relates simply to the ratio of s to D, regardless of shape. Furthermore, if we
obtain a valid value for M from s and D together, we have the opportunity to evaluate the shape
properties, for example by checking to see how closely the value for f (which we can calculate directly
from D) matches the value you would expect for f for a molecule with mass M if it was indeed a sphere
(using Stokes’ equation).
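The arithmetic is worth seeing once with numbers. The s, D, and φ values below are textbook-style figures for a hemoglobin-sized protein, chosen only to illustrate the calculation, not taken from the text.

```python
# Shape-independent molecular weight from s and D: M = (RT/phi)*(s/D).
# The numbers are illustrative, hemoglobin-like values (assumed).
R = 8.314        # gas constant, J/(mol*K)
T = 293.0        # temperature, K
s = 4.5e-13      # sedimentation coefficient (4.5 S)
D = 6.9e-11      # diffusion coefficient, m^2/s
phi = 0.25       # density increment (assumed)

M = (R * T / phi) * (s / D)   # kg/mol
print(M)                      # ~64 kg/mol, i.e. about 64 kDa
```

No shape assumption entered anywhere; f canceled out when s and D were combined.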
Earlier we discussed various ways of measuring the diffusion coefficient, but in fact information
about D can typically be obtained from the sedimentation experiment itself. In our initial discussions
we imagined that the boundary in the concentration profile would be perfectly sharp. But in fact
diffusion would be occurring at the same time as sedimentation, thereby causing some spreading out
at the boundary and leading to a concentration profile that is not infinitely sharp and steep at the
boundary. Therefore, a more advanced mathematical treatment can extract information about D
from the shape of the sedimentation profiles.
A summary of molecular weight determination from sedimentation and diffusion measurements
CHAPTER 15
Chemical Reaction Kinetics
In this chapter we discuss the rates of chemical reactions, focusing on the meaning of reaction
velocity, its dependence on the stoichiometric order of a reaction, and on the time-dependent
behavior of reactant and product concentrations.
Reaction velocity, v
If we conceive a reaction generally as
reactants → products
then the reaction velocity v is the frequency (#/time) with which the event described by the reaction
arrow is occurring per unit volume. The units of v are therefore #/(volume·time), which is
concentration per time or M/sec. Consistent with those units, one can see that (as long as other
reactions are not simultaneously producing or consuming the same reactants and/or products) the
reaction velocity is directly reflected in the rate of change of the concentration of reactants and
products. More specifically, note that there is but one velocity associated with the reaction, though
it may be that multiple reactants are being consumed and multiple products are being generated.
The reaction velocity is indicated equally by the rate of change of any of the species involved.
However, the stoichiometric coefficients associated with the reactants and products must be
accounted for carefully. If for instance a chemical species has associated with it a stoichiometry of 2,
then each reaction event corresponds to a consumption or production of two molecules of that
species. Therefore, for the general reaction:
αA + βB + … → γC + δD + …
-d[A]/dt = αv
-d[B]/dt = βv
d[C]/dt = γv
d[D]/dt = δv
and so on for any and all species. Alternatively,
v = -(1/α) d[A]/dt = -(1/β) d[B]/dt = (1/γ) d[C]/dt = (1/δ) d[D]/dt = …
Evidently, if we measure the rate of change of the concentration of some species involved in the
reaction then we have measured the reaction velocity v, assuming we have properly accounted for
the stoichiometry.
Rate laws: how v depends on concentrations
The velocity of a reaction naturally depends on how concentrated the reactants are; if the number of
reactant molecules in a unit volume is vanishingly small then surely we can expect the frequency with
which we observe the molecule in question undergoing reaction events in that volume to also be
effectively zero. If the concentrations are higher, then the reaction velocity will be higher. Besides
the dependence on concentration, different reactions (i.e. involving different chemical species) will
have different reaction velocities according to the likelihood of the underlying chemical events. This
natural likelihood of a reaction to occur is captured by a rate constant k. The combined dependence
of a reaction velocity v on the rate constant k and the concentrations is referred to as a rate law. The
rate law can be complicated for complex reaction schemes, but for reactions that represent simple
individual chemical events, the dependence of v on concentrations can be written by inspection. Such
reactions are sometimes referred to as ‘elementary reactions’, and are to be distinguished from
scenarios where a written reaction actually describes the net stoichiometric result arising from more
than one reaction, e.g. two operating in sequence. For a single elementary step of the form:
A → B
where k is the rate constant, the rate law is
v = k[A]
and [A] is the concentration of species A. Such a reaction is said to be first order in A. For a reaction
of the form
2A → B
v = k[A]²
and the reaction is said to be second order in A. For a reaction of the form
A + B → C
v = k[A][B]
and the reaction is first order in A and first order in B, and so on. The multiplicative or higher order
dependence of the reaction velocity in cases where multiple molecules are reacting at the same time
is a reflection of the joint probability that both reacting molecules are present together, colliding with
each other. Note that in our discussions of kinetics we will use brackets to denote concentrations to
be most consistent with common usage (instead of using Ci as we did in earlier chapters).
Relationship of rate constants to equilibrium constants
For reactions drawn as above, the single forward arrow indicates an irreversible process where the
combined free energies of the reactants are so much higher than for the products, that reaction events
in the reverse direction effectively never occur. Such reactions go to completion, without residual
reactants. In contrast, for reactions where the energetics on the two sides are more nearly balanced,
reaction events can occur in both directions. The velocities of the forward and reverse reactions
depend on the concentrations of the reactants and products, respectively. And when concentrations
are reached where the forward and reverse velocities are equal, then no net conversion is occurring
(though conversions are in fact occurring in both directions). This is what is meant by chemical
equilibrium. This notion gives us an important relationship between rate constants and equilibrium
constants. For the reaction
2A ⇌ B
where k1 is the forward rate constant and k-1 is the reverse rate constant, the forward reaction
velocity would be k1[A]², while the reaction velocity in the reverse direction would be k-1[B]. The
equilibrium condition is where those two velocities are equal, giving
k1[A]² = k-1[B] (at equilibrium)
and
k1/k-1 = [B]/[A]² (at equilibrium)
Evidently, the ratio of rate constants k1/k-1 is equal to the equilibrium constant K. This is a general
result.
Integrating rate laws
For simple reaction schemes it is often possible to integrate the differential equations that come from
the rate law in order to obtain a complete description of how the concentrations of the reactants and
products change over time. We will work out the results for first order and second order reactions:
1st order decay
A → B
In order to get to a differential equation in terms of [A], we combine two points. First is the definition
of the velocity in terms of the rate of change of [A], v = -d[A]/dt. Second is the rate law that describes
the dependence of the velocity on [A], v = k[A]. Together these give,
-d[A]/dt = k[A]
∫ d[A]/[A] = -k ∫ dt
ln[A] - ln[A]0 = -kt
which gives the familiar first order decay equations
ln([A]/[A]0) = -kt
and
[A] = [A]0 e^(-kt)
The behavior of [A] over time is exponential and the behavior of ln [A] is linear with time, with a slope
that gives the rate constant k.
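A quick numerical check of the decay law: integrate d[A]/dt = -k[A] by small Euler steps (the same style of simulation used later in this chapter) and compare with the analytic exponential. The rate constant and time step below are illustrative choices.

```python
import math

# First-order decay, A -> B: compare a small-step Euler integration of
# d[A]/dt = -k[A] with the analytic result [A]0*exp(-kt).
k = 2.0            # rate constant, 1/s (illustrative)
A0 = 1.0           # initial concentration, M
dt = 1e-5          # time step, s

A = A0
for step in range(100000):    # integrate out to t = 1 s
    A += dt * (-k * A)

analytic = A0 * math.exp(-k * 1.0)
print(A, analytic)            # both ~0.1353
```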
Describing decay times for 1st order decay
The time scale of first order decay is often described in terms of a half-life, t1/2. This is the time
required for a reaction to go from some given conditions to 50% completion. A slightly different
parameter, τ, is sometimes used to describe decay times. It gives the time required for a reaction to
go to a degree of completion that is 1/e compared to the initial condition. That is, [A]/[A]0 = 1/e.
The relationship between t1/2 and τ is obtained by comparing
ln(1/2) = -k·t1/2 to ln(1/e) = -1 = -kτ
giving
t1/2 = ln(2)·τ
For the simple first order decay reaction of [A] above,
t1/2 = ln(2)/k
and
τ = 1/k
Note that the physical interpretation of τ is slightly more complex than that of t1/2, but it gives a
simpler algebraic relationship to the rate constant. We will see later that in more complex kinetic
schemes we sometimes get first order equations with more complex expressions in the exponent
term. But τ is always simply related to the exponent (which multiplies time) by a reciprocal
relationship. That is, if
x = x0 e^(-(some expression)·t)
then
τ = 1/(some expression)
and
t1/2 = ln(2)/(some expression)
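The arithmetic connecting k, τ, and t1/2 is worth a one-line check; the rate constant below is an arbitrary illustrative value.

```python
import math

# For a first-order process with k = 0.10 1/s (an assumed value),
# tau and the half-life follow directly from k.
k = 0.10                  # 1/s
tau = 1 / k               # time to reach 1/e of the start: 10 s
t_half = math.log(2) / k  # time to reach 1/2 of the start: ~6.93 s
print(tau, t_half)
print(t_half / tau)       # ratio is ln(2) ~ 0.693, independent of k
```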
Integrated rate law for a 2nd order irreversible reaction
2A → B
Again we combine two expressions involving the velocity v, one that defines v in terms of the rate of
disappearance of the substrate (v = (-1/2)d[A]/dt) and the other a rate law that describes the
dependence of v on the substrate concentration (v = k[A]²). Combined, these give
-d[A]/dt = 2k[A]²
∫ d[A]/[A]² = -2k ∫ dt
-1/[A] + 1/[A]0 = -2kt
1/[A] - 1/[A]0 = 2kt
In this case, a plot of 1/[A] versus time gives a
straight line whose slope relates to the rate constant
k.
Other irreversible reactions of higher order can be integrated easily in a similar fashion.
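The integrated second-order law can be checked against a small Euler simulation of 2A → B; the rate constant and starting concentration below are illustrative.

```python
# Check the integrated 2nd-order law, 1/[A] - 1/[A]0 = 2kt, against a
# small-step Euler integration of d[A]/dt = -2k[A]^2 for 2A -> B.
k = 5.0           # rate constant, 1/(M*s) (illustrative)
A0 = 1.0          # M
dt = 1e-5         # s

A = A0
for step in range(20000):           # integrate out to t = 0.2 s
    A += dt * (-2 * k * A * A)

t = 0.2
predicted = 1 / (1 / A0 + 2 * k * t)  # from the integrated rate law
print(A, predicted)                   # both ~0.333 M
```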
Establishing a rate law from measured reaction velocities
We saw above how different rate laws give a different time evolution for the concentration of
reactants. And whether a reaction follows a first order or second order (or some other) rate law can
therefore be examined by plotting ln[A] or 1/[A] as a function of time and checking to see if the result
is a straight line. A different way of experimentally examining a rate law is by evaluating the
dependence of reaction velocity on concentrations. Measuring initial reaction velocities under
different concentrations makes it possible to determine what exponents are associated with the
reactant concentrations. If a reaction is first order in [A], then v will depend linearly on [A] (i.e.
doubling [A] will double the reaction velocity). Likewise, if a reaction is second order in [A], then
doubling [A] will quadruple the velocity, and so on. Some complicated reaction schemes can show
non-trivial dependence on concentration, even non-integer exponents. As a general approach to
establishing an exponent α, if
v = k[A]^α
then for rate measurements made at two different concentrations,
ln(v2/v1) = α·ln([A]2/[A]1)
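Applying this to a pair of measurements takes one line. The "data" below are invented for a reaction that is second order in A (doubling [A] quadruples v).

```python
import math

# Estimating the reaction order alpha from two measured initial
# velocities: alpha = ln(v2/v1) / ln([A]2/[A]1).  The measurements
# are invented, consistent with second-order behavior.
A1, v1 = 0.010, 4.0e-6   # M, M/s
A2, v2 = 0.020, 1.6e-5   # doubling [A] quadrupled v

alpha = math.log(v2 / v1) / math.log(A2 / A1)
print(alpha)             # 2.0: second order in A
```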
Behavior of more complex reaction schemes
We are very often interested in the behavior of kinetic schemes involving more than one independent
reaction event. Complexity can arise in different forms. The two reactions may effectively operate
in sequence, with the product of the first reaction being the reactant in the second reaction. Or two
reactions may be operating on the same species, giving a scheme that is branched rather than linear.
Regardless of the complexity of the reaction scheme being proposed, setting up the underlying
differential equations is generally straightforward. As an example, the sequential scheme A → B → C
(with rate constant k1 for the first step and k2 for the second) gives the following equations:
d[A]/dt = -k1[A], d[B]/dt = k1[A] – k2[B], d[C]/dt = k2[B]
Note here that the rate of change of [B] involves two terms, the rate at which it is being formed minus
the rate at which it is being consumed. For another example, a branched scheme in which A converts
to B (rate constant k1) and also to C (rate constant k2) gives
d[A]/dt = -k1[A] - k2[A] = -(k1+k2)[A], d[B]/dt = k1[A], d[C]/dt = k2[A]
Steady state assumptions for obtaining simple rate laws for complex reactions
Complex reaction schemes generally have complex behavior, including non-trivial equations for the
time dependence of the reactants and products. And the dependence of the rate of the overall
reaction (i.e. the rate law) can depend on the concentrations of species that do not contribute to the
overall reaction. Sometimes a complete description of the behavior can be obtained by solving the
full system of differential equations, but this may be difficult. However, especially in cases where
sequential reactions are involved, simplified rate laws can often be obtained by assuming steady state
conditions. Steady state refers to conditions where there are intermediate species (which do not
contribute to the overall reaction stoichiometry) whose concentrations have reached a constant
value, at least momentarily. In other words, one can work out a simplified expression for the overall
reaction velocity under conditions where d[Intermediate]/dt = 0. As an example, consider the scheme
A ⇌ B → C (rate constants k1 and k-1 for the reversible first step, k2 for the second)
Here, the intermediate is B; it does not contribute to the overall reaction stoichiometry A → C. First
we write out an expression for the change in [B], based on elementary rate laws for all the steps in
which B is formed or consumed, and then we set that derivative to zero according to the steady state
assumption.
d[B]/dt = k1[A] – k-1[B] –k2[B] = k1[A] – (k-1+k2)[B] = 0
Now we rearrange to get an expression for [B] that we can use for substitution subsequently. At
steady state,
[B] = k1[A]/(k2 + k-1)
Now we can go back to the original reaction scheme and write an expression for the overall velocity.
The overall reaction velocity could be defined as v =d[C]/dt. Then we can write an equation for
d[C]/dt in terms of elementary rate laws. Here, d[C]/dt = k2[B]. Now we substitute for [B] at steady
state from above to get
v = k1k2[A]/(k2 + k-1)
Under this treatment, the 2-step reaction behaves as first order in [A] at steady state. This particular
reaction scheme shows up often, including in treatments of enzyme kinetics, as we will see later.
Numerical computer simulation of more complex reaction schemes
In cases where complete solutions are difficult to obtain by solving differential equations, and where
approximations like steady state are undesirable, one can almost always simulate the behavior of a
complex reaction scheme using simple computer programs. The key is to treat the time derivative of
the concentration of each species as the ratio of a very small change in concentration over a very
small time increment. For example, in the scheme above
Δ[A]/Δt = -k1[A] + k-1[B]
and
Δ[A] = (-k1[A] + k-1[B]) Δt
A related equation can be written for each species. To make the computer simulation go, one simply
assigns starting values to the concentrations of all the species, and then updates the concentrations
of all the species in a series of very small time steps on the basis of equations like the one above.
An example of computer code (in the Python programming language) is shown below for simulating
the behavior of the kinetic scheme above. The initial concentrations are [A] = 1M, [B]=0, [C]=0.
# Set up arrays to hold concentrations for 500 time steps
A = [0.0 for n in range(0, 500)]
B = [0.0 for n in range(0, 500)]
C = [0.0 for n in range(0, 500)]
# Assign initial concentrations
A[0] = 1.
B[0] = 0.
C[0] = 0.
# Assign rate constants for the simulation
k1 = 500.
kminus1 = 400.
k2 = 300.
# Choose a time interval small enough so that concentration
# changes in each step will be small.
timestep = 0.00005
# Set up the loop over time. Here, after 500 steps, the total
# time elapsed would be 500 * 0.00005 = 0.025 seconds.
# Apply kinetic equations to update the concentrations in each step.
# The index for the time step is specified in brackets.
for nt in range(1, 500):
    A[nt] = A[nt-1] + timestep*(B[nt-1]*kminus1 - A[nt-1]*k1)
    B[nt] = B[nt-1] + timestep*(A[nt-1]*k1 - B[nt-1]*kminus1 - \
            B[nt-1]*k2)
    C[nt] = C[nt-1] + timestep*(B[nt-1]*k2)
    print(nt*timestep, A[nt], B[nt], C[nt])
The result of that simulation is shown here.
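A useful habit with simulations of this kind is to verify conservation: for this scheme the total concentration [A]+[B]+[C] should stay constant at every step, and nearly all the material should end up as C by the end of the run. A minimal sketch, re-running the same Euler update with the same rate constants and step size:

```python
# Sanity checks on the Euler simulation: total concentration should
# stay at 1 M throughout, and most material should reach C by the end.
k1, kminus1, k2 = 500., 400., 300.
dt = 0.00005

A, B, C = 1.0, 0.0, 0.0
for nt in range(1, 500):
    A, B, C = (A + dt*(B*kminus1 - A*k1),
               B + dt*(A*k1 - B*kminus1 - B*k2),
               C + dt*(B*k2))
    assert abs(A + B + C - 1.0) < 1e-9   # mass conserved at every step

print(A + B + C)   # 1.0
print(C)           # most of the material has been converted to C
```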
Enzyme kinetics under a steady-state assumption
The following model is often used to treat the kinetics of a simple unimolecular enzyme reaction:
E + S ⇌ ES → E + P (rate constants k1 and k-1 for substrate binding and release, kcat for catalysis)
Here, E is the free or unbound enzyme, S is the free or unbound substrate, ES is the enzyme-substrate
(or Michaelis-Menten) complex between the enzyme and substrate, and P is the product. The ratio
of k1 to k-1 describes how tightly the enzyme binds the substrate, while kcat describes the unimolecular
catalytic rate constant for conversion to P. The steady-state treatment of this enzyme model dates
to the 1920s and is due to Briggs and Haldane. The velocity of the overall reaction is described by
v = d[P]/dt,
and according to the rate law for this elementary step, v = kcat[ES]. But [ES] is an intermediate, and
to obtain an equation for v in terms of species that contribute to the overall stoichiometry of the
reaction, we need to replace [ES]. If we adopt a steady state assumption we can obtain an expression
for [ES]. Taking account of all the steps in which [ES] is formed or consumed, and then setting this to
0,
d[ES]/dt = 0 = k1[E][S] – (k-1 + kcat)[ES]
which gives
[ES] = k1[E][S]/(k-1 + kcat)
Then substitution gives
v = kcat k1[E][S]/(k-1 + kcat)
This equation is valid but not very insightful in the sense that it describes the reaction velocity in
terms of the free enzyme concentration; in an experimental set-up one typically has control over the
total enzyme concentration but not the free enzyme concentration, which is clearly a function of how
much substrate is present. To gain more insight, the standard approach is to recast the kinetic
equations in terms of the total enzyme concentration and in terms of the ratio of the reaction velocity
to its maximum possible value (i.e. when all the enzyme is in the ES form, so that [ES] = [E]total).
The maximum velocity is kcat times the maximum possible value for [ES], giving Vmax = kcat[E]total =
kcat([ES]+[E]). Then,
v/Vmax = kcat[ES]/(kcat([E]+[ES])) = [ES]/([ES]+[E])
This is sensible. It simply states that the velocity in terms relative to the maximum is given by the
fraction of the enzyme that is in the [ES] form. To simplify the expression further we can divide the
top and bottom by [ES] to give
v/Vmax = 1/(1+ [E]/[ES])
Then we can take the previous equation, [ES] = k1[E][S]/(k-1 + kcat), and rearrange it to get an
expression for [E]/[ES] = (k-1 + kcat)/(k1[S]), which we can substitute in the equation above to give
v/Vmax = 1/(1 + (k-1 + kcat)/(k1[S]))
Multiplying the top and bottom by [S] gives
v/Vmax = [S]/([S] + (k-1 + kcat)/(k1))
This has the form of the familiar Michaelis-Menten equation
v/Vmax = [S]/([S] + KM)
where the Michaelis-Menten constant KM can be seen to be (k-1 + kcat)/(k1) for this kinetic model. The
equation above can also be converted from fractional velocity to v to give
v = kcat[E]total [S]/([S] + KM)
From the form of this equation you can see that the behavior is hyperbolic, with v approaching Vmax
asymptotically as [S] gets much higher than KM, and v/Vmax =1/2 at [S]=KM.
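The hyperbolic behavior is easy to see numerically; the KM below is an arbitrary illustrative value, and the calculation depends only on the ratio [S]/KM.

```python
# The Michaelis-Menten dependence v/Vmax = [S]/([S] + KM), evaluated at
# a few substrate concentrations.  KM here is an illustrative value.
KM = 1.0e-5   # M

def fractional_velocity(S):
    """v/Vmax at substrate concentration S (same units as KM)."""
    return S / (S + KM)

print(fractional_velocity(KM))        # 0.5 at [S] = KM
print(fractional_velocity(9 * KM))    # 0.9
print(fractional_velocity(100 * KM))  # ~0.99, approaching Vmax
```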
Relaxation kinetics: how systems approach equilibrium
Fast reactions are difficult to study experimentally. A rapid reaction may proceed to completion so
rapidly once it is initiated that measuring concentration changes is problematic. For bi-molecular
reactions, the reaction may be faster than the time required to mix the reactants prior to
measurement. Problems of that type can sometimes be mitigated with instruments designed to
achieve rapid mixing. So-called stopped-flow or continuous-flow instruments deliver reagents from
two separate syringes into a small mixing chamber where spectroscopic readings can be taken
immediately. The delivery of reagents occurs in a single bolus after which the delivery of reactants
is stopped while the time course of the reaction is monitored. In a variation referred to as
continuous-flow, delivery of reactants continues and the reacting mixture flows down a capillary
tube. In this set-up, distance along the capillary corresponds directly to time elapsed since the
reactants encountered each other, so that concentration measurements at different positions give an
effective time course. The disadvantage of continuous-flow methods is that large amounts of material
may be required. On the other hand, microfabrication techniques for making very small systems for
fluid handling are now fairly commonplace. Microfluidic devices can be custom designed and
fabricated from transparent polymers (typically PDMS), with very many chambers and transport
capillaries with flow controlled by computers and pressure sensitive valves. This has given rise to
the notion of a ‘lab-on-a-chip’, with thousands of reactions being monitored in a piece of polymer the
size of a credit card. These kinds of devices make it possible to study many reactions and reactant
conditions with very little loss of material.
Another category of kinetic analysis aims to study how rapidly systems approach (or ‘relax’ towards)
equilibrium after they have been (nearly instantaneously) perturbed from equilibrium. So-called
relaxation methods circumvent the mixing problem in the sense that the system under study is
already mixed. How can a system in which all the reacting substrates and products are present and
equilibrate rapidly be brought to a point where it is not at equilibrium so that the speed of approach
to equilibrium can be studied? The T-jump (for temperature jump) method was developed in the
1950’s by Manfred Eigen to study fast reactions and their relaxation back to equilibrium. The idea is
that if energy is rapidly delivered to a solution containing a reactant and product at equilibrium, for
example by discharging an electrical capacitor (or in later developments using a laser pulse), then
the temperature of the system can be raised nearly instantaneously. Now, recalling the van’t Hoff
equation, if ΔH for the reaction under study is non-zero, then the equilibrium constant will be
different at the new temperature. So by increasing the temperature suddenly, the system has
effectively been perturbed from its previous equilibrium, not by changing the concentrations of
reactants and products, but by changing the equilibrium position; the system is maintained at the
new temperature while the system approaches the new equilibrium concentrations by conversion of
reactant to product or vice-versa. How fast the system approaches equilibrium clearly depends on
the forward and backward rate constants of the reaction. The mathematics for how systems
approach equilibrium reveals some general principles.
The simplest system to consider is A interconverting with B (A ⇌ B, with forward and reverse rate
constants k1 and k-1). First we note that because there is just one conversion in this system, it must
be possible, given any concentration values for A and B, to describe how far the system is from
equilibrium with a single concentration variable. Call this distance from equilibrium x. If we
introduce a shorthand notation of Ā to represent the equilibrium concentration of A (at the new
temperature) and similarly B̄ for B, then we can relate the concentrations of A and B to their
equilibrium values plus or minus x, by [A] = Ā + x and [B] = B̄ - x.
Now we can examine the approach to equilibrium by writing an equation for the time dependence of
x. First, we note that d[A]/dt = d(Ā+x)/dt = dx/dt. Then we can write an expression for d[A]/dt as
-k1[A] + k-1[B] = -k1(Ā+x) + k-1(B̄-x) = k-1B̄ - k1Ā - x(k1 + k-1). The term k-1B̄ - k1Ā can be seen to be
equal to zero because B̄/Ā = K = k1/k-1. Dropping those terms gives
dx/dt = –(k1+ k-1) x
Evidently, x (the distance from equilibrium) follows first order kinetics. Skipping the familiar details
for handling a first order differential equation,
x = x0 e^(-(k1+k-1)t)
and
ln(x/x0) = -(k1+k-1)t
We can further convert these to the general forms
x = x0 e^(-t/τ)
and
ln(x/x0) = -t/τ
where in this case τ = 1/(k1+k-1). Assuming one can measure the concentration of A or B as a
function of time, then x as a function of time is known, since x = [A] - Ā = B̄ - [B]. This allows
measurement of τ (e.g. from the reciprocal of the slope of ln(x) vs t), so that the value of (k1+k-1)
is obtained. If the equilibrium constant K for the reaction is known also, then k1 and k-1 can both be
obtained from the values of τ and K.
1/τ = k1 + k-1 and k1 = K·k-1, so 1/τ = K·k-1 + k-1 = (K+1)·k-1
k-1 = 1/(τ(K+1)) and k1 = K/(τ(K+1))
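The extraction of both rate constants from τ and K is just algebra, but a round-trip check makes it concrete. Below we fabricate τ and K from known rate constants (illustrative values) and then recover them.

```python
# Recovering k1 and k-1 for A <=> B from a relaxation time tau and the
# equilibrium constant K.  The "true" rate constants are illustrative.
k1_true, kminus1_true = 300.0, 100.0   # 1/s

tau = 1.0 / (k1_true + kminus1_true)   # what the ln(x) vs t slope yields
K = k1_true / kminus1_true             # from equilibrium concentrations

kminus1 = 1.0 / (tau * (K + 1.0))
k1 = K / (tau * (K + 1.0))
print(k1, kminus1)                     # 300.0 100.0 recovered
```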
Higher order reactions approaching equilibrium
It is not surprising to see that the first order reaction above approaches equilibrium in a first order
fashion. However, we can show that more complex reactions also approach equilibrium in a first
order fashion as they come close to equilibrium. Consider this second order reversible reaction:

A + B ⇌ C (forward rate constant k1, reverse rate constant k-1)

Again, there is just one transformation, so the distance from equilibrium can be described by a single
variable x, and the concentrations at any point in time can be expressed in terms of x and the eventual
equilibrium concentrations. Following the same approach as before, d[A]/dt = d(Ā + x)/dt = dx/dt.
And d[A]/dt = −k1(Ā + x)(B̄ + x) + k-1(C̄ − x) = k-1C̄ − k1ĀB̄ − x(k1(Ā + B̄) + k-1) − k1x². The first two terms,
k-1C̄ − k1ĀB̄, cancel to zero, and if we are close enough to equilibrium then x will be small and we can
neglect the x² term. Then,

dx/dt = −(k1(Ā + B̄) + k-1)·x
Therefore, the distance from equilibrium x shows first order behavior close to equilibrium, with

τ = 1/(k1(Ā + B̄) + k-1)
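To make the result concrete, here is a sketch (with hypothetical rate constants and concentrations, not values from the text) that finds the equilibrium concentrations for A + B ⇌ C by solving the mass-action quadratic, then evaluates the relaxation time from the expression above:

```python
import math

# Hypothetical parameters for A + B <=> C
k1 = 1.0e5                 # M^-1 s^-1, forward rate constant
k_rev = 0.1                # s^-1, reverse rate constant
A0, B0 = 1.0e-4, 2.0e-4    # M, total concentrations (no C initially)

# At equilibrium k1*(A0 - c)*(B0 - c) = k_rev*c, a quadratic in c = [C]eq
a = k1
b = -(k1 * (A0 + B0) + k_rev)
c0 = k1 * A0 * B0
c_eq = (-b - math.sqrt(b * b - 4 * a * c0)) / (2 * a)  # smaller, physical root
A_eq, B_eq = A0 - c_eq, B0 - c_eq

# Close to equilibrium, x relaxes exponentially with this time constant
tau = 1.0 / (k1 * (A_eq + B_eq) + k_rev)
print(c_eq, tau)
```

The smaller root of the quadratic is the physical one, since the larger root would exceed the limiting reactant concentration.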
Kinetics from single molecule studies
Recent developments in instrumentation have made it possible to perform a variety of
measurements on single molecules. The kinetics of individual molecules undergoing chemical
reactions or conformational transitions can be analyzed, but when working with individual
molecules we have to look at things in a way that does not involve concentrations. For a unimolecular
event, we can get a sense for the rate constant by looking at how long the molecule persists in its
current state before undergoing a reaction. Think of this as a waiting time before a reaction event or
conformational change occurs. Reaction events are always stochastic (i.e. having a random
character), but if we measure the waiting time for several independent reaction events we should be
able to relate that to an underlying rate constant; the average waiting time should be shorter for a
process with a higher rate constant. Consider the irreversible conversion of A to B with a rate
constant k. On average, how long should we expect to wait before any given molecule of A converts
to B? We can work out the relationship by starting with the equation for treating the reaction in bulk:
A/A0 = exp(−t/τ). In this case τ = 1/k, but we will keep the equation in terms of τ for generality. One
way to look at A/A0 is to see it as the probability that any given molecule of A has not reacted before
time t. From there we can see that the probability that a molecule of A will react precisely at time t
is the derivative of that expression with respect to t. Differentiating, and correcting for the negative
sign, the probability that molecule A reacts precisely at time t is (1/τ)·exp(−t/τ). Then, to get the
average time at which a molecule of A reacts, which has the same meaning as the waiting time, we
need to get the average value of t by weighting all possible values of t by the probability of reaction
at time t. We get this by multiplying t by the probability of reaction at time t and integrating from t=0
to infinity. From ∫₀^∞ t·(1/τ)·exp(−t/τ) dt (which requires integration by parts), we get
⟨waiting time⟩ = τ. This simple result makes sense since the decay or relaxation time τ
is a general description of the time scale of a first order reaction.

The result above means that if we evaluate how long it takes a single molecule to undergo a transition
(preferably making the time measurement several times), then we have effectively measured τ. And
here, k = 1/τ. The figures below illustrate three different kinds of experiments where single molecule
studies have been used to measure the rates of conformational conversions. The first is an example
of a voltage measurement across a cell using the patch-clamp method, where the voltage depends on
whether an ion channel is in the open or closed conformation. The second example illustrates a
spectroscopic measurement where the output signal depends on the conformation of a fluorescently
labeled ribosome, which is interconverting between two states. The third example illustrates a
reversible winding-unwinding transition in a single DNA molecule whose ends are being pulled apart
gently. In all three cases you can see how the data could be interpreted in terms of waiting times
between transitions. In fact, you can see that it should be possible to get the rate constants for both
the forward and reverse transitions by measuring the waiting times in both states. The ratio of those
would be the equilibrium constant, and as expected this matches the ratio between the average time
the molecule spends in the two conformations.
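The waiting-time logic can be checked with a small simulation. The sketch below uses hypothetical rate constants (not values from any of the experiments described above) to draw exponentially distributed dwell times for a two-state molecule, then recovers the rate constants from the mean dwell times and the equilibrium constant from their ratio:

```python
import random

random.seed(1)

# Hypothetical rate constants for leaving each state (s^-1)
k_open_to_closed = 2.0
k_closed_to_open = 0.5

n = 50000
# Dwell (waiting) times in each state are exponentially distributed,
# with mean 1/k for the rate constant of leaving that state
open_dwells = [random.expovariate(k_open_to_closed) for _ in range(n)]
closed_dwells = [random.expovariate(k_closed_to_open) for _ in range(n)]

mean_open = sum(open_dwells) / n
mean_closed = sum(closed_dwells) / n

# <waiting time> = tau, so k = 1/tau for each transition
k_oc_est = 1.0 / mean_open
k_co_est = 1.0 / mean_closed

# Ratio of mean dwell times estimates K = [closed]eq/[open]eq = k_oc/k_co
K_est = mean_closed / mean_open
print(k_oc_est, k_co_est, K_est)   # close to 2.0, 0.5, and 4.0
```

With tens of thousands of simulated events the estimates converge to within about a percent; a real single-molecule trace with far fewer events would scatter more.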
CHAPTER 16
Kinetic Theories and Enzyme Catalysis
We have discussed the reaction velocity and rate laws, but so far we have not said anything about
what determines the rate constant, k. What makes some reactions intrinsically fast and some slow?
Given the variety of reactions that occur in nature – with vast differences in speed, number of
reactants involved, types of bonds formed, etc. – it is not surprising that different models have been
developed to explain the mechanisms of chemical reactions and their rate constants. Two widely
discussed models are due to Arrhenius and Eyring.
The Arrhenius equation
The Arrhenius equation is often used to discuss reaction rates in terms of molecular collisions.
According to the Arrhenius equation, a rate constant k is determined by
k = A·e^(−Ea/RT)
A is a ‘frequency factor’ and Ea is an activation energy. The frequency of collisions in a reaction is
clearly dependent on concentrations, and that dependence is already built into the equation for
reaction velocity; e.g. for X + Y → Z, v=k[X][Y]. The frequency factor in the equation for k therefore
embodies other phenomena, such as the dependence of molecular velocities and consequently
collision rates on temperature, and the dependence of reaction probability on the orientation of the
colliding molecules.
The activation energy, Ea, describes a lower bound for the energy that reactants must have if reaction
is to occur. Why does Ea enter the equation for k as an exponential term? This follows directly from
the Boltzmann distribution. If we express the number of molecules N(E) that have energy E according
to the Boltzmann distribution (N(E) ∝ exp(−E/RT)), we can evaluate the fraction of molecules having
energy at least as high as some fixed energy value Ea by taking the ratio of the area that falls under
the curve and has E greater than or equal to Ea, divided by the entire area under the curve.
∫_Ea^∞ e^(−E/RT) dE / ∫_0^∞ e^(−E/RT) dE = [−RT·e^(−E/RT)]_Ea^∞ / [−RT·e^(−E/RT)]_0^∞ = e^(−Ea/RT) / 1 = e^(−Ea/RT)
This explains the exponential term in
the Arrhenius equation. A key element
of the Arrhenius equation is that the
rate constant depends strongly on the
height of an energy barrier. It also
depends on temperature. In fact the
dependence of k on T can be used to
evaluate the activation energy in the
Arrhenius model. The frequency factor
introduces some dependence of k on
temperature vis-à-vis molecular
velocities, but the main dependence of k
on T is through the exponential term.
d(ln(k))/dT ≅ d(−Ea/RT)/dT = Ea/RT²

or

d(ln(k))/d(1/T) ≅ d(−Ea/RT)/d(1/T) = (Ea/RT²)/(−1/T²) = −Ea/R
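The second relation suggests a simple numeric check: generate rate constants at two temperatures from an assumed activation energy (the Ea and frequency factor below are hypothetical) and recover Ea from the slope of ln(k) vs 1/T:

```python
import math

R = 8.314        # J/(mol K), gas constant
A = 1.0e12       # s^-1, assumed frequency factor (taken as T-independent)
Ea = 50000.0     # J/mol, assumed activation energy

def arrhenius_k(T):
    """Arrhenius rate constant k = A*exp(-Ea/(R*T))."""
    return A * math.exp(-Ea / (R * T))

T1, T2 = 298.0, 310.0
# Slope of ln(k) vs 1/T is -Ea/R
slope = (math.log(arrhenius_k(T2)) - math.log(arrhenius_k(T1))) / (1/T2 - 1/T1)
Ea_recovered = -R * slope
print(Ea_recovered)   # recovers 50000 J/mol
```

The recovery is exact here only because the frequency factor was held constant; for real data, the mild temperature dependence of A introduces a small systematic deviation.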
Eyring transition state theory
Eyring transition state theory provides a slightly different way of looking at things that is more
explicit about the occurrence of high energy species during a single reaction event. In the Eyring
model, a single reaction step, for example

A + B → C

is reimagined in terms of two steps:

A + B ⇌ AB‡ → C

In the first step, an unstable high energy species referred to as
the transition state is formed. The transition state breaks down to product in the second step. The
‘double-dagger’ symbol (‡) indicates the transition state.
The rate constant for breakdown of the maximally unstable transition state is approximated to be the
frequency of molecular vibrations, which from quantum mechanics is on the order of kBT/h, where
h is Planck’s constant. With that substitution, the velocity of the reaction scheme above would be
v=d[C]/dt=(kBT/h)[AB‡]. As we did in our earlier treatments of multistep reactions, we need to make
some assumption if we want to express the velocity in terms of reactants that contribute to the
stoichiometry. Here, if we assume that first step describing formation of the transition state is at
equilibrium, then k+‡/k–‡=K‡=[AB‡]/([A][B]), so [AB‡]= K‡[A][B]. Making that substitution, the
velocity of the reaction would be v=(kBT/h) K‡ [A][B]. Now if we compare this expression for v to the
simple rate law we would write for a single elementary reaction step, namely v=k[A][B], we can see
by matching up terms that the Eyring model gives
k=(kBT/h) K‡
as the expression for the rate constant of the reaction. In this model, the rate constant is determined
largely by the equilibrium constant K‡ for forming the transition state. We can also write K‡ in terms
of the free energy for reaching the transition state, ΔG‡: K‡ = exp(−ΔG‡/RT). That substitution would
give

k = (kBT/h)·exp(−ΔG‡/RT)
This result differs in detail from the Arrhenius equation, but the similarity in terms of an exponential
dependence on an energy barrier is clear. We can look at the temperature dependence of ln(k) for
the Eyring equation in the same way as we did for the Arrhenius equation. The multiplicative factor
at the beginning of the expression introduces a minor dependence on temperature, which we will set
aside in order to look at the main dependence.
d(ln(k))/dT ≅ d(−ΔG‡/RT)/dT = d(−ΔH‡/RT + ΔS‡/R)/dT = ΔH‡/RT²
Comparing to our earlier result with the Arrhenius equation, we can see that the activation energy in
the Arrhenius equation relates closely to the transition state enthalpy in the Eyring model.
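A minimal numeric sketch of the Eyring expression follows; the 60 kJ/mol barrier is a hypothetical value, chosen only to show the scale of the numbers:

```python
import math

kB = 1.380649e-23    # J/K, Boltzmann constant
h = 6.62607015e-34   # J s, Planck constant
R = 8.314            # J/(mol K)
T = 298.0

def eyring_k(dG_ddagger):
    """Eyring rate constant k = (kB*T/h) * exp(-dG_ddagger/(R*T))."""
    return (kB * T / h) * math.exp(-dG_ddagger / (R * T))

k = eyring_k(60000.0)   # hypothetical 60 kJ/mol barrier
print(k)                # rate constant in s^-1

# Lowering the barrier by 10RT speeds the reaction by e^10 (~22,000)
speedup = eyring_k(60000.0 - 10 * R * T) / eyring_k(60000.0)
print(speedup)
```

Note that the prefactor kBT/h at 298 K is about 6×10¹² s⁻¹, so even modest barriers reduce the rate constant by many orders of magnitude.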
Catalysis by lowering the transition state energy
The Eyring transition state model provides a way to look at catalysis in terms of transition state
energies. Consider the reaction of a substrate to form a product in either an uncatalyzed reaction or
a catalyzed reaction. Let the rate constant for the uncatalyzed reaction be kuncat and the rate constant
for the catalyzed reaction be kcat. We can draw the two reactions on the same energy diagram and
consider the effect of lowering the transition state energy in the case of the catalyzed reaction. From
the Eyring equation for the rate constant we can write out the ratio of the two rate constants.
kcat/kuncat = [(kBT/h)·K‡cat] / [(kBT/h)·K‡uncat] = K‡cat/K‡uncat = e^(−ΔG‡cat/RT) / e^(−ΔG‡uncat/RT) = e^(−(ΔG‡cat − ΔG‡uncat)/RT)
In other words, if a catalyst lowers the
transition state energy for a reaction by an
energy that amounts to 10RT, then the
reaction will be sped up by a factor of e10,
which is about 22,000.
But how is it that a catalyst lowers the
transition state energy of a reaction? This
question was considered in the context of
enzyme catalysis as early as 1948 by Linus
Pauling. His description of how enzymes
must operate (which came more than a
decade before the atomic structures were
known for any proteins or enzymes) was
extraordinarily prescient. According to
Pauling:
I believe that … the surface configuration of the enzyme is … complementary to an unstable
molecule with only transient existence – namely the “activated complex” for the reaction that
is catalyzed by the enzyme. The mode of action of an enzyme would then be the following: the
enzyme would show a small power of attraction for the substrate molecule or molecules,
which would become attached to it in its active surface region. This substrate molecule, or
these molecules, would then be strained by the force of attraction to the enzyme, which would
tend to deform it into the configuration of the activated complex, for which the power of
attraction by the enzyme is the greatest. The activated complex would then, under the
influence of ordinary thermal agitation, either reassume the configuration corresponding to
the reactants, or assume the configuration corresponding to the products. The assumption
made above that the enzyme has a configuration complementary to the activated complex, and
accordingly has the strongest power of attraction for the activated complex, means that the
activation energy for the reaction is less in the presence of the enzyme than in its absence, and
accordingly, that the reaction would be speeded up by the enzyme.
Further insight can be added by drawing
a kinetic diagram that relates the binding
events in the presence of an enzyme to
the formation of the transition state. The
reactions across the top describe
reaction in the absence of the enzyme
while the reactions across the bottom
describe reaction in the presence of the
enzyme. Following our previous
equations from the Eyring theory, the
ratio between the rate constants in those
two cases would be
kcat/kuncat = K‡ES / K‡S
By providing a binding surface that is complementary to the transition state form of the substrate, the
equilibrium constant for reaching the transition state is increased by mass action, and according to
the equation above this speeds up the reaction. The situation can also be viewed in terms of binding
affinities of the enzyme for S compared to S‡. Those steps are described by the vertical reactions. By
completing the thermodynamic cycle in the figure, we know that K‡S·Kbinding S‡ = Kbinding S·K‡ES. The
ratio above for how much a reaction is sped up is then
kcat/kuncat = K‡ES/K‡S = Kbinding S‡ / Kbinding S
In this view, an enzyme speeds up its reaction by binding exceptionally tightly to the transition state
form of the substrate; that is what lowers the free energy of the transition state.
Practical consequences of enzymes binding tightly to the transition state
Understanding that an enzyme binds extremely tightly to the transition state form of its substrate
has led to a number of important practical scientific developments.
Transition state analogues as enzyme inhibitors
Designing molecules to inhibit key enzymes is a major effort in pharmaceutical research. Important
enzyme targets are too numerous to list, but they include enzymes from pathogenic bacteria and
viruses as well as human enzymes involved in disease-related pathways, such as those that regulate
blood pressure and inflammation. If an enzyme speeds up a reaction by a factor of a thousand, then
our reasoning above indicates that the enzyme binds the transition state form of the substrate a
thousand times more tightly than it binds the substrate. So, a drug molecule that looks like the
transition state form of the substrate will bind tightly to the enzyme and act as an inhibitor. The main
challenge of course is that the transition state is entirely unstable. The goal then is to come up with
a compound – a transition state analogue – that looks as much as possible like the transition state,
but yet is stable and can be synthesized (cheaply). This can be a difficult proposition.
Creating new enzymes from a natural antibody repertoire
The concept of catalytic antibodies was first proposed by chemist William Jencks in 1969 and
reduced to practice by Richard Lerner and Peter Schultz beginning in the 1980’s.
The goal was to create novel enzymes that would catalyze useful chemical reactions, including types
of reactions that no natural enzymes had evolved to carry out. The idea relies on the spectacularly
large diversity of antibodies that can be generated by the mammalian immune system. If an animal
has the genetic capacity to generate 10¹² different antibody molecules, surely some of them should
have a tight binding affinity for any imaginable chemical entity, including a transition state for a
reaction one might want to catalyze. According to the Eyring theory and the logic articulated by
Pauling and Jencks, if you can find an antibody sequence with a high affinity for the transition state
of a desirable reaction, then you have found an enzyme for that reaction. The work required to
identify an antibody with the desired property is challenging. In order to induce production of
antibodies that might bind tightly to the transition state, the animal must be inoculated with a
transition state analogue for the reaction, and the same difficulty noted above regarding transition
state analogues must be overcome. Several studies have succeeded in finding antibodies that exhibit
catalytic activity for a desired reaction, but the rates of acceleration have generally not been very
high.
Computational enzyme design
There is much current interest in the idea of using sophisticated computer programs to design the
amino acid sequence of a protein that will catalyze a desired reaction. Rather than designing a novel
protein from scratch, the most feasible approach is to take a natural protein that has a surface cleft
suitable for binding a compound of about the right size, and modify the amino acid sequence mainly
within the binding site cleft. The potential power of this approach is very high. In contrast to the
catalytic antibody approach, there is no need to synthesize a transition state analog. Instead, one
requires an accurate model (i.e. detailed atomic coordinates) for what the transition state is likely to
look like. Modern computer programs are capable of producing reasonable models of transition
states. The most challenging element is designing amino acid changes into a protein in such a way
that the transition state would be tightly bound. One difficult issue concerns the calculation of free
energies for large systems like proteins and their complexes. Even the aqueous solvent is important
to consider given the contribution of hydrophobic effects and water structure in general to the
energetics. Beyond the still unsolved problem of accurate energy predictions, changing the amino
acid sequence of a natural protein very often causes unforeseen and unpredictable effects, including
loss of stability and aggregation. In many cases, changing the amino acid sequence of a protein may
make alternate (non-native) configurations of the protein more stable than the intended structure.
The ability to consider and avoid all the possible alternate structures a modified protein might adopt
is well beyond the current capacity of computers and protein modeling software. Nonetheless, there
have been a few exciting successes in designing new enzyme activities computationally. As with the
catalytic antibodies however, catalytic rates have so far been fairly modest. Further advances along
this line are very likely in the future as computer programs continue to improve.
Kinetic parameters of natural enzymes
Natural enzymes have evolved over billions of years to speed up the reactions they catalyze. How
well do they perform? And could they be better? These are thorny questions. Part of the complexity
concerns the saturation behavior of enzyme kinetics (v=[Etotal] kcat [S]/([S]+KM)). An enzyme that has
a very high kcat may not be so great if it doesn’t bind its substrate very well (i.e. if KM is high). Of
course the best thing would be to have a very high kcat and a very low KM. But there may be tradeoffs
in the ability of any given enzyme to optimize both parameters. In view of this, the ratio kcat/KM is
often discussed as a general measure of the efficiency of an enzyme, essentially a reflection of the
joint value of having high kcat and low KM. In terms of a typical hyperbolic graph of enzyme activity
vs substrate concentration, kcat determines the maximum velocity at any substrate concentration,
while kcat/KM is the slope of the velocity curve (normalized for total enzyme concentration) in its
linear region well below saturation. This can be seen by evaluating the standard Michaelis-Menten
velocity equation (above) at [S] << KM. There, v/[Etotal] ≅ (kcat/KM)[S], which confirms the statement
about the slope of the velocity curve. And v ≅ (kcat/KM)[Etotal][S], which shows that kcat/KM takes the
form of a bimolecular rate constant – recall that for A+B→C, v=k[A][B].
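A quick sketch of this low-[S] limit, using the Michaelis-Menten equation with hypothetical parameter values:

```python
# Michaelis-Menten velocity and its low-[S] linear limit.
# Parameter values are hypothetical illustrations.
kcat = 10.0       # s^-1
KM = 1.0e-4       # M
E_total = 1.0e-8  # M, total enzyme concentration

def v(S):
    """Michaelis-Menten velocity v = [Etotal]*kcat*[S]/([S]+KM)."""
    return E_total * kcat * S / (S + KM)

S_low = 1.0e-7    # M, well below KM
v_full = v(S_low)
v_linear = (kcat / KM) * E_total * S_low   # v ~ (kcat/KM)[Etotal][S]
print(v_full, v_linear)   # nearly equal in the linear region
```

At [S] = KM/1000, the full and linear expressions agree to about 0.1%, while at saturating [S] the full expression instead approaches Vmax = kcat·[Etotal].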
What are the values for kcat and KM for natural enzymes? These values show an astonishing range of
variation from enzyme to enzyme. Much of that variation reflects differences in the kinds of
substrates involved and the kinds of chemical rearrangements that take place. Subtler effects relate
to cellular conditions. For example, there is no need for an enzyme to evolve an incredibly tight
binding constant (low KM), which might even come at the expense of a lower kcat, if the substrate in
question exists at high concentration in the cell; operating under highly saturated conditions is
generally not an advantageous strategy. Conversely, if the KM is too high, then the enzyme will be
very poorly occupied; synthesizing idle enzyme molecules is an expensive burden for the cell. A
general finding is that the KM values exhibited by natural enzymes tend to be roughly in the same
range as the natural cellular concentration of the substrate or substrates on which they operate.
According to a recent survey of published enzyme kinetic parameters in the literature, median kcat
values for natural enzymes are on the order of 10 sec⁻¹ overall, and about 10 times faster (100 sec⁻¹)
for enzymes that operate in central metabolism, where high flux is important. The median value for
KM in natural enzymes is around 100 μM, or 10⁻⁴ M. The median value for kcat/KM is on the order of
10⁵ M⁻¹sec⁻¹.
As a group, natural enzymes vary widely from these median values. What about limiting cases? Is
there a maximum? Reactions that are bimolecular face an upper limit governed by diffusion. Even if
an enzyme could bind a substrate infinitely tightly and catalyze its conversion to product infinitely
fast, the rate of the reaction would be limited by how fast the two molecules can encounter each other
in solution owing to the limits of diffusion. Depending on molecular sizes (which govern diffusion
coefficients), the upper bound for kcat/KM in diffusion-limited bimolecular enzyme-substrate
encounters is in the range 10⁸–10⁹ M⁻¹sec⁻¹. This limiting value comes from analyses beyond those
used to develop our standard equations for reaction velocities, which ignore the role of diffusion.
Very few enzymes operate near the diffusion-limiting value of kcat/KM, but a few do. Superoxide
dismutase (SOD) and triose phosphate isomerase are two well-studied examples.
CHAPTER 17
Introduction to Biochemical Spectroscopy
Energy transitions
We understand from quantum mechanics that molecules
can exist only in discrete energy states, and transitions
between one energy state and another can be driven by
absorption or emission of electromagnetic radiation (i.e.
photons) if the energy of the photon matches the energy
difference of the transition. The relationship between the
frequency or wavelength of the radiation and energy is
E = hν = hc/λ.
You may remember from general chemistry that very
simple molecules like single atoms typically show very sharp absorption and emission bands. They
undergo transitions at only very narrow wavelengths – recall the Rydberg series. The discreteness
of their spectral properties reflects the simplicity of their allowable energy states (i.e. electronic
states of hydrogen-like atomic orbitals).
In contrast, complex molecules
have complex spectra. The
presence of multiple atoms in a
molecule introduces a dependence
of energy on nuclear positions.
Nuclear motions give rise to
vibrational energy states. The
energy differences between
vibrational states are generally
much smaller than those between
electronic states. The idea that
vibrational transitions are smaller in energy and essentially separable from electronic transitions
gives a picture where more finely spaced vibrational states can be superimposed on individual
electronic states, as shown. And even more finely spaced rotational transitions exist within those
states. The much greater complexity of the energy profile for complex molecules introduces the
possibility of very many transitions with closely spaced energies. As a result, absorption and
emission spectra for larger molecules are complex and more continuous in nature rather than
discrete.
An examination of typical energy magnitudes for electronic and vibrational transitions is instructive.
Electronic transitions are typically the subject of spectroscopy in the UV and visible range of the
electromagnetic spectrum. Consider then the energy associated with a wavelength of 400nm in the
violet region of the visible spectrum: E = hν = hc/λ ≈ 5.0×10⁻¹⁹ J. Compared to kBT ≈ 4.1×10⁻²¹ J at room temperature, this is about 120 kBT.
According to the Boltzmann equation, the probability of a molecule residing in the excited electronic
state rather than the ground electronic state is essentially zero. We can repeat the calculation for a
typical vibrational transition; these typically occur in the infrared (IR) region of the electromagnetic
spectrum. Consider a carbonyl stretch, for which λ is approximately 1.9 μm. The corresponding
energy is 1.04×10⁻¹⁹ J. This is smaller than the energy for transitions in the UV/visible range, but still
equal to about 25 kBT. The conclusion is that, at ordinary temperatures and unless otherwise excited,
molecules generally populate almost exclusively the lowest vibrational state of the lowest electronic
states. This general idea has important implications for what energy transitions are most likely to
occur; a high probability transition requires the initial energy state to be well-populated.
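The two energy comparisons above can be reproduced directly, using standard physical constants and an assumed temperature of 298 K:

```python
import math

h = 6.62607015e-34   # J s, Planck constant
c = 2.998e8          # m/s, speed of light
kB = 1.380649e-23    # J/K, Boltzmann constant
T = 298.0

def photon_energy(wavelength):
    """E = h*nu = h*c/lambda, with wavelength in meters."""
    return h * c / wavelength

ratio_vis = photon_energy(400e-9) / (kB * T)   # 400 nm violet photon
ratio_ir = photon_energy(1.9e-6) / (kB * T)    # 1.9 um vibrational transition
print(ratio_vis, ratio_ir)                     # roughly 120 and 25

# Boltzmann weight for thermally occupying the upper state
print(math.exp(-ratio_vis), math.exp(-ratio_ir))   # essentially zero
```

Even the smaller vibrational gap gives a Boltzmann weight of order 10⁻¹¹, consistent with the conclusion that only the lowest states are populated at ordinary temperatures.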
Fluorescence
Our previous analysis tells us that an absorption transition is likely to occur from the ground
vibrational state of the ground electronic state. But to what higher electronic energy states is a
molecule likely to be excited by absorption? An interesting phenomenon arises from the general idea
that the lowest energy nuclear positions for a molecule are typically slightly different for different
electronic states. This is typically diagrammed as shown here, where the two black curves indicate
the classical energy of a molecule as a function of its nuclear positions in two different electronic
states. The minimum energies occur at slightly different nuclear positions. Within each electronic
state, a series of vibrational states
are indicated. The width of the
lines (setting aside quantum
mechanical aspects of harmonic
oscillations) illustrates the range of
nuclear positions that are allowed
in each vibrational state. A
consideration of timescales now
leads to an interesting conclusion.
The timescale for photon
absorption is much shorter than
the timescale for nuclear motions.
This means that electronic
transitions occur ‘vertically’ in the
sense of the diagram shown. This
is known as the Franck-Condon
principle. If an electronic
transition must occur without
appreciable movement of nuclei,
then it must occur to a vibrational
state for which the initial nuclear
positions are allowable. The
diagram emphasizes that this typically is an excited vibrational state rather than a ground vibrational
state.
After absorption to an excited electronic state, according to the Boltzmann equation a molecule must
return to the ground state. The return to the ground electronic state can occur with emission of a
photon; this is fluorescence. The timescale for fluorescence is typically in the 10-8 to 10-5 sec range,
which is long enough for thermal vibrations and collisions (whose effects are illustrated in red in the
figure) to allow the molecule to descend to
lower vibrational states within the excited
electronic state before returning to the
ground electronic state. Again, that
electronic transition occurs vertically to an
excited vibrational state of the ground
electronic state, after which further
transitions lead back to the ground
vibrational state of the ground electronic
state. The key consequence of this
phenomenon is that the fluorescence
emission spectrum for a molecule is shifted
to lower energy and longer wavelength
compared to the absorption spectrum. This is referred to as the Stokes shift.
Uses and advantageous properties of fluorescence
Fluorescence offers a high degree of sensitivity for detecting and measuring the concentrations of
specific molecules, which may be fluorescent either naturally or by virtue of being chemically labelled
with a fluorophore (a fluorescent chemical group). Particularly in contrast to absorption studies
for measuring concentration, the high sensitivity of fluorescence derives from two features. First is
the shift in wavelength from the incident wavelength to the emission wavelength. In an absorption
experiment involving a dilute or weakly absorbing molecule, one is forced to analyze a small
difference between two large numbers – the number of photons transmitted by a blank compared to
the number transmitted by the sample. In a fluorescence experiment, the change in wavelength
makes it possible to analyze the number of emitted photons without interference from transmitted
photons, which have the same wavelength as the incident light. Taking advantage of the wavelength
difference requires a second monochromator placed between the sample and the detector; a first
monochromator is required between the light source and the sample. The fluorescence emission
intensity is proportional to the concentration of the fluorophore (as long as the concentration is not
too high), making accurate concentration determination possible from very dilute solutions. An extra
level of sensitivity comes from the ability to monitor fluorescence in a direction different from the
path of the transmitted beam. The figure illustrates the combined effects of wavelength change and
detection at an angle. Photons that are scattered (elastically) from the sample emerge at all angles,
but their wavelength is the same as the incident beam so they are distinguishable from fluorescent
photons.
Proteins typically have some natural fluorescence owing to the presence of tryptophan amino acids.
But the intensity of tryptophan fluorescence is not especially high, and one is often interested in using
fluorescence to monitor one protein (or nucleic acid) in particular. An enormous range of
fluorophores are available commercially with a wide range of spectral characteristics. These are
typically conjugated chemically to the macromolecule of interest by covalent attachment, often
through nucleophilic attack by cysteine thiol or lysine amine groups. Fluorescence experiments
can also be performed in situ to monitor the presence and subcellular location of a specific protein
inside cells in tissue culture using fluorescence microscopy. Chemical labeling is generally not
possible in that scenario. Instead, the protein of interest can be rendered fluorescent inside the cell
by creating a fusion at the DNA level between the protein of interest and a naturally fluorescent
protein. Originally discovered in coral sea organisms, numerous such proteins are known with a
diverse range of emission colors; green fluorescent protein (GFP) is the most widely studied. An
interesting variation on the approach is to label two different proteins with distinct fluorescent
proteins having different emission colors, like red and green. Whether the two proteins localize
together in the cell – e.g. if they interact with each other – is evident by joint emission of red and
green colors (making yellow). The level of spatial detail that can be visualized in a standard visible
or UV microscopy experiment is a few hundred nanometers, which is fine enough to visualize
organellar, nuclear, and cytoskeletal structure in eukaryotic cells, but not fine enough to see
molecular structure.
A particularly useful feature of fluorescence is its sensitivity to chemical environment. The greater
sensitivity to environment for fluorescence compared to absorbance relates in part to the longer time
scale of fluorescence. In general, increased flexibility and environmental polarity lead to lower
fluorescent intensity; the peak emission wavelength can also be affected. As an example, the
fluorescence of tryptophan increases by a factor of roughly 4 in a low polarity solvent such as DMSO
(dielectric of about 35) compared to water (dielectric of about 80). Exposure of a fluorescent group
to particular chemicals known as quenchers also reduces fluorescence, and the magnitude of the
effect can depend on the degree to which the fluorophore is exposed on the surface of the
macromolecule.
The environmental sensitivity of fluorescence can be exploited in various types of experiments. We
discussed earlier how native tryptophan fluorescence can be used to monitor protein folding.
Tryptophan residues almost always become less flexible and more rigidly held in the folded state of
a protein, leading to higher fluorescence. In another type of experiment, if a protein is suspected to
bind a ligand that is fluorescent (or for which a fluorescent analogue is available), then binding of the
ligand to the protein can be detected by an increase in fluorescence.
Kinetics of fluorescence and competing routes for return to the ground state
After a molecule has been driven to an excited state by absorbing a photon, there are several possible competing routes for
returning to the ground state. Some of these we have discussed already, while others we will return to later. The relative
rates of these processes determine which pathways dominate for a given molecule. If the rate constant for fluorescent
emission is higher than the rate constants for other processes, then most of the excited molecules
will return to the ground state by way of fluorescent emission.
As we discussed above, a number of phenomena affect fluorescence, including chemical environment,
so fluorescence can be used to monitor various events that alter the environment of a fluorophore in
solution. The details of the behavior expected can be understood by analyzing the phototransitions
using treatments similar to those we developed earlier for chemical kinetics. We can simplify things
by lumping together the various non-fluorescent pathways for return to the ground state under a
single rate constant, kother. Various underlying events can then be analyzed in terms of the effects
they have on the relative magnitudes of kfluor and kother. With respect to kinetic treatments, fluorescence experiments can
be of two essentially different types: 1) under continuous illumination, where steady state behavior is assumed, or 2)
following a brief pulse of incident light, after which time-dependent measurements are made. Note
that the latter type of experiment requires special instrumentation because the time scale for
fluorescence decay is usually shorter than milliseconds. We can analyze the behavior of both kinds
of experiments.
Constant illumination
Under constant illumination, the concentration of the excited state form of the fluorophore (P*) is not
changing. Setting d[P*]/dt = 0
d[P*]/dt = 0 = kabs[P] – (kfluor + kother)[P*]
Rearranging to obtain an expression for [P*],
[P*]=kabs[P]/(kfluor + kother)
Then, the fluorescent intensity Ifluor is given by
Ifluor = kfluor[P*] = kabs[P]kfluor/(kfluor + kother)
Since the rate of photon absorption is kabs[P], the ratio of the number of photons emitted to the
number absorbed – a fractional quantity known as the quantum yield Q – is given by
Q = kfluor/(kfluor + kother)
We can conclude from this analysis that the quantum yield and the intensity of the fluorescence observed
under constant illumination are decreased by events in solution that increase the rates of non-fluorescent
‘other’ pathways for return to the ground state. That idea is illustrated here. One example of such a scenario
is binding of a fluorescent molecule (perhaps an analogue of a suspected ligand) to a protein; this
would suppress the non-fluorescent pathways by reducing the mobility of the fluorophore, thereby
increasing the quantum yield along with the steady state fluorescence intensity.
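The steady-state relationships above can be sketched numerically. A minimal example in Python; the rate constants are illustrative placeholders, not measured values:

```python
# Steady-state fluorescence: quantum yield and intensity from rate constants.
# The rate constants below are illustrative, not measured values.

def quantum_yield(k_fluor, k_other):
    """Q = k_fluor / (k_fluor + k_other)."""
    return k_fluor / (k_fluor + k_other)

def steady_state_intensity(k_abs, P, k_fluor, k_other):
    """I_fluor = k_abs [P] k_fluor / (k_fluor + k_other)."""
    return k_abs * P * quantum_yield(k_fluor, k_other)

k_fluor = 1e8   # s^-1, fluorescent emission
k_other = 3e8   # s^-1, all non-fluorescent pathways lumped together

Q = quantum_yield(k_fluor, k_other)      # 0.25
# Suppressing non-fluorescent pathways (e.g. a binding event that
# rigidifies the fluorophore) raises the quantum yield:
Q_bound = quantum_yield(k_fluor, 1e8)    # 0.5
```

Note that the steady-state intensity scales linearly with Q, so any event that reduces kother increases both together.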
Time-resolved fluorescence
With appropriate instrumentation, an excitation pulse can be applied and the fluorescence intensity
(which must decay back to zero) can be monitored over time. The same kinetic scheme as above can be
used if we remove the continuous absorption event. This becomes a simple case of exponential decay
with a total rate constant of kfluor + kother and a decay time of

τ = 1/(kfluor + kother)

The comparison of decay behavior in the presence and absence of processes competing
with fluorescence can be diagrammed as shown.
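The decay relationship can be sketched numerically; the rate constants here are illustrative:

```python
import math

# Time-resolved fluorescence: after a pulse, intensity decays as
# I(t) = I0 * exp(-t / tau), with tau = 1/(k_fluor + k_other).
# Values are illustrative, not measured.
k_fluor = 1e8    # s^-1
k_other = 1e8    # s^-1
tau = 1.0 / (k_fluor + k_other)   # 5 ns

def intensity(t, I0=1.0):
    """Fluorescence intensity at time t after the excitation pulse."""
    return I0 * math.exp(-t / tau)

# A competing process (e.g. a quencher) adds to k_other and so
# shortens the observed decay time:
tau_quenched = 1.0 / (k_fluor + k_other + 2e8)   # 2.5 ns
```

This is why the presence of a quencher is visible in a time-resolved experiment as a faster decay, even though each emitted photon is unchanged.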
CHAPTER 18
Special Topics in Biochemical Spectroscopy
Polarization and selection rules
In this section we discuss the important role orientation plays in spectroscopy. Orientational effects
become apparent in spectroscopic experiments conducted using polarized light. You may remember that
electromagnetic radiation is a transverse wave in which a traveling photon carries oscillating electric
and magnetic field vectors perpendicular to the direction of travel. Light emitted
from an ordinary source (e.g. a light bulb) carries photons whose electric field vectors point in all
possible directions perpendicular to the direction of travel. A variety of materials can be used to filter
ordinary incoming light to produce light that is ‘plane polarized’, meaning that the electric field vector
points in a single direction while it oscillates in magnitude, up and down; the direction of travel and
the electric field vector form a plane.
We know that, for a photon of light to be absorbed and cause a transition from an initial state to a final
state, the energy of the photon must be correct. But the direction of its electric field vector is also
critically important. Whether a potential transition can be caused by a photon polarized in a certain
direction is embodied in quantum mechanical ‘selection rules’. In absorption spectroscopy, the
extinction coefficient relates to the strength or probability of a transition by being proportional to
the square of a transition dipole moment, 𝜇⃗. In a general form of the transition dipole moment,

𝜇⃗ = ∫ Ψ𝑖 𝑥⃗ Ψ𝑓 𝑑𝑉

where 𝑥⃗ is the general position vector in space, Ψ𝑖 and Ψ𝑓 are the quantum mechanical
wavefunctions for the initial and final energy states, and the integration runs over all space. For our
purposes of considering the absorption or emission of light polarized in a particular direction, we can
rewrite the equation in separate x, y, z components. The probability of absorbing a photon polarized
along the x direction is related to the x component of the transition dipole moment, evaluated as

𝜇𝑥 = ∫ Ψ𝑖 𝑥 Ψ𝑓 𝑑𝑉
with equivalent equations for polarization along y or z.
Analyzing whether electronic transitions can or cannot occur when the light is polarized in certain
directions can be simplified using a treatment that considers the symmetry vs anti-symmetry of the
initial and final wavefunctions. We will illustrate one example situation where the initial and final
wavefunctions are simple – much simpler than one would encounter with complex molecules, but
still highly instructive in understanding orientational effects. We begin with a reminder about
symmetric and anti-symmetric functions. These refer to functions whose values are either unchanged
when a spatial variable is negated (i.e. f(−x) = f(x)) or negated when a spatial variable is negated
(f(−x) = −f(x)), respectively. One way of looking at symmetric vs antisymmetric functions is in terms of
polynomial functions. We find that polynomial terms with even exponents (x⁰, x², x⁴, etc.) are
symmetric whereas polynomial terms with odd exponents (x¹, x³, etc.) are antisymmetric. We are
particularly interested in considering what happens when we evaluate the integral of a function that
is symmetric or antisymmetric. By either explicitly evaluating the integrals of such functions or by
thinking about the areas under the curves (positive and negative), we can see that odd
(antisymmetric) functions integrate to zero over a symmetric interval while even (symmetric) functions generally do not. The
illustrations here are 1-dimensional (depending only on x). In three dimensions, where a function
would depend on x, y, and z and integration would be over all dimensions, the result integrates to
zero if the function is odd (antisymmetric) with respect to any of the three spatial variables.
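The vanishing of odd integrals over a symmetric interval can be checked numerically; here a simple sketch using a midpoint-rule integration over [−1, 1]:

```python
# Numerically integrate f over the symmetric interval [-1, 1] with the
# midpoint rule; an odd (antisymmetric) integrand gives (numerically)
# zero, while an even (symmetric) one generally does not.

def integrate(f, n=100000, a=-1.0, b=1.0):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

odd_result  = integrate(lambda x: x**3)   # ~0 (antisymmetric)
even_result = integrate(lambda x: x**2)   # ~2/3 (symmetric, nonzero)
```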
Now consider an electronic transition between a π bonding molecular orbital and a π* anti-bonding
molecular orbital. Such transitions are common in conjugated double bond systems. Both molecular
orbitals are effectively combinations of two side-by-side p orbitals. The signs of the two p orbitals
are aligned in the π molecular orbital but oppositely oriented in the π* molecular orbital, creating an
extra nodal plane in the latter case.
We can set up a coordinate system at the center of the molecular orbital and then tabulate the
symmetry vs anti-symmetry (or evenness vs oddness) of the functions that get multiplied together
inside the integral for the transition dipole moment. In order to evaluate the total symmetry of the
product of the functions inside the integral, we need to understand the rules for symmetry vs anti-
symmetry when functions are multiplied together. Since the evenness or oddness is a property of
exponents, products of the functions behave according to addition of even and odd numbers:
even+even=even; even+odd=odd; odd+odd=even. We need to make a separate analysis to consider
the transition dipole component for light polarized in each possible direction. For the case of light
polarized along the x direction, the term ‘x’ in the middle of the integral can be understood as being
x¹y⁰z⁰, which is therefore odd with respect to x, and even with respect to y and z. With these rules in
hand we can construct a table to analyze 𝜇𝑥.
w/r/t axis    Ψ𝑖      x = x¹y⁰z⁰    Ψ𝑓      total
x             even    odd           odd     even
y             even    even          even    even
z             odd     even          odd     even
In evaluating 𝜇𝑥, the total function inside the integral is even with respect to all three coordinate
variables, so the integral does not necessarily vanish. We conclude that the π to π* transition can
occur by absorption of a photon polarized along x (which is the bond direction). This transition is
therefore allowed, though our simple symmetry vs anti-symmetry treatment doesn’t tell us about
magnitudes. Also note that the allowed transition for polarization along x does not mean the
direction of travel of the photon is along x; in fact the direction of travel would have to be
perpendicular to x in order for the polarization to be along x.
Next we can evaluate transition dipoles for light polarized along y or z. Those tables are shown.
In both of these cases the total function is odd with respect to at least one variable, so the integral
vanishes. That means that 𝜇𝑦 and 𝜇𝑧 both vanish. Those transitions are forbidden, meaning the π to
π* transition cannot be promoted by absorption of a photon polarized along y or z. Absorption is
only allowed for polarization along x. Some instinct can be developed to understand this result. In
comparing the symmetry vs anti-symmetry of the initial and final wavefunctions we can see that the
x direction is the only direction in which the two functions differ. An electric field vector along that
direction can therefore drive the conversion of one to the other. The treatment of the transition
dipole moment and selection rules for emission are similar to those for absorption.
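The even/odd bookkeeping in these tables can be automated; a small sketch, with the parities of the initial and final wavefunctions taken from the analysis above:

```python
# Parity bookkeeping for the pi -> pi* transition dipole components.
# Parities w.r.t. x, y, z of the initial (pi) and final (pi*) orbitals are
# taken from the tables above; multiplying functions adds exponent parities.

EVEN, ODD = 0, 1   # parity encoded as exponent parity mod 2

psi_i = {'x': EVEN, 'y': EVEN, 'z': ODD}   # pi orbital
psi_f = {'x': ODD,  'y': EVEN, 'z': ODD}   # pi* orbital

def operator_parity(direction):
    """Parity of the position operator (x, y, or z) w.r.t. each axis."""
    return {ax: (ODD if ax == direction else EVEN) for ax in 'xyz'}

def allowed(direction):
    """Allowed only if the total integrand is even w.r.t. all three axes."""
    op = operator_parity(direction)
    return all((psi_i[ax] + op[ax] + psi_f[ax]) % 2 == EVEN for ax in 'xyz')

# Only x-polarized absorption is allowed for this transition:
# allowed('x') -> True; allowed('y') and allowed('z') -> False
```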
Example of absorption of polarized light by an oriented pigment

Although light may only be absorbed when the electric field is oriented in a particular direction relative
to the chromophore, the effects of this are often not evident in experiments done in solution, since the
absorbing chromophore is present in all possible orientations. The dependence of absorption on direction
of polarization can sometimes be seen in a crystalline sample, where the chromophore exists in the same
orientation throughout the crystal specimen. The example shown here comes from a crystal of a protein
that binds a carotenoid molecule as a cofactor. Carotenoids are long organic molecules with conjugated
double-bond π orbital systems, and from our previous exercise we might expect a carotenoid to absorb
light polarized along the long axis of the molecule. The photographs shown were taken under a light
microscope where the incident light passed through a polarizing filter before passing through the
crystalline sample. The polarizer was rotated at different angles for the two photographs. Evidently,
the light was polarized in a direction that allowed absorption by the carotenoid in the top panel, but it
was oriented in a direction that did not allow absorption in the second panel.
(The tables for evaluating 𝜇𝑦 and 𝜇𝑧, referred to above, are:)

w/r/t axis    Ψ𝑖      y = x⁰y¹z⁰    Ψ𝑓      total
x             even    even          odd     odd
y             even    odd           even    odd
z             odd     even          odd     even

w/r/t axis    Ψ𝑖      z = x⁰y⁰z¹    Ψ𝑓      total
x             even    even          odd     odd
y             even    even          even    even
z             odd     odd           odd     odd

Fluorescence experiments with polarized light
Interesting phenomena occur when a sample absorbs polarized light and then reemits photons by
fluorescence. To simplify the discussion at the outset we will assume that if a molecule absorbs a
photon polarized in a particular direction then by fluorescence it will emit a photon polarized in the
same direction if the molecule has not changed its orientation between the absorption and emission
events. But what about molecular motions, particularly changes in molecular orientation, that are
occurring during the process? How much might one expect a molecule to rotate in the time between
when it absorbs a photon and re-emits a photon by fluorescence? Clearly this depends on the relative
rates of fluorescence and molecular rotation in solution. If random molecular rotation occurs very
slowly compared to fluorescence, then fluorescent photons will be polarized in the same direction as
the incident (polarized) light. Conversely, if random rotations occur much faster than fluorescence,
then emitted photons will have electric field vectors oriented in all directions equally. As a result it
is possible to learn about the relative rates of fluorescence vs molecular rotation by studying the
degree to which emitted photons are polarized in the same way as the incident light. If the time scale
for fluorescence is known then the time scale for molecular rotations can be determined. This is
useful because the time scale for molecular rotation in solution depends on the size of the molecule
(and on viscosity), so ultimately we can get information about molecular size using experiments of
this type. Some of the technical details are described here.
The figure diagrams essential features of a fluorescence polarization or fluorescence anisotropy
experiment. Two polarizing filters are required: one before the sample and one after the sample.
Monochromators (not shown) are also required to select appropriate wavelengths for the incident
and emitted photons being detected. The second polarizer (sometimes referred to as the analyzer) is
rotated during the experiment. This makes it possible to measure the relative intensity of emitted
light that is polarized in different directions; this is a measure of how much molecules have rotated
after absorption and before emission.
The mathematical treatment is as follows. The intensity of light emitted parallel to the incident light
is denoted 𝐼∥. The intensity emitted perpendicular to the incident light is denoted 𝐼⊥. The (unitless)
measure of how much stronger the parallel emission is compared to the perpendicular is described
by the fluorescence anisotropy, r. [The word anisotropy comes from Greek roots meaning “not” the
“same” in all “directions”.] The anisotropy of the emitted fluorescence is defined in terms of
experimental measurements as

r = (I∥ − I⊥) / (I∥ + 2I⊥)        (0 ≤ r ≤ 1)
If the value of r is close to zero (i.e. no anisotropy) then the intensity is the same for parallel and
perpendicular emission, meaning the rate of molecular rotation is much faster than the rate of
fluorescence. If the value of r is close to 1 (perfect anisotropy) then the intensity of emission in the
perpendicular orientation is negligible, meaning the rate of fluorescence is much faster than the rate
of molecular rotation. Useful information comes from intermediate scenarios where the two rates
are in a comparable range and the value of r is intermediate between 0 and 1.
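The anisotropy calculation from the two measured intensities is direct; a sketch with made-up intensities illustrating the limiting cases:

```python
# Fluorescence anisotropy from parallel and perpendicular emission
# intensities. The intensity values below are illustrative only.

def anisotropy(I_par, I_perp):
    """r = (I_par - I_perp) / (I_par + 2 * I_perp)."""
    return (I_par - I_perp) / (I_par + 2 * I_perp)

# Limiting cases:
r_fast_tumbling = anisotropy(1.0, 1.0)   # 0.0: rotation much faster than fluorescence
r_no_tumbling   = anisotropy(1.0, 0.0)   # 1.0: fluorescence much faster than rotation
r_intermediate  = anisotropy(2.0, 1.0)   # 0.25: comparable time scales
```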
Exactly how does the fluorescence anisotropy relate to the relative rates of fluorescence and
molecular rotation? As in our earlier kinetic analyses, unimolecular rates can be described by
(reciprocally related) decay times. The decay time for changes in molecular orientation is referred
to as the rotational correlation time, denoted here as τrot. We denote the fluorescence decay time as
τfluor. One form of Perrin’s equation (given without proof) states that

r/r₀ = τrot / (τrot + τfluor)
Setting aside the term r₀ momentarily, the equation indicates that if the decay time for rotation is
much longer than the decay time for fluorescence (meaning the rate of rotation is much slower than
the rate of fluorescence), then the anisotropy r would be 1. Conversely, r would be zero if the
decay time for rotation were much shorter than for fluorescence. The term r₀ is necessary to deal with
an imperfect alignment of emitted and absorbed photons that occurs even without any molecular
rotation. Finally, the rotational correlation time is directly related to molecular size by τrot = ηV/RT,
where η is viscosity and V is molecular volume. Therefore, in principle, one can obtain a value for
molecular volume from a fluorescence anisotropy measurement, assuming the molecule of interest is
fluorescent or has been fluorescently labeled and the fluorescence decay time can be established in
separate experiments.
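Perrin’s equation can be inverted to estimate τrot, and from it a molecular volume. A sketch; all numbers here are illustrative assumptions, not data:

```python
# Invert Perrin's equation  r/r0 = tau_rot / (tau_rot + tau_fluor)
#   => tau_rot = tau_fluor * (r/r0) / (1 - r/r0)
# then use tau_rot = eta * V / (R * T) to estimate the molar volume V.
# All numerical values below are illustrative assumptions.

def tau_rot_from_anisotropy(r, r0, tau_fluor):
    ratio = r / r0
    return tau_fluor * ratio / (1.0 - ratio)

def molar_volume(tau_rot, eta=1e-3, T=298.0, R=8.314):
    """V = tau_rot * R * T / eta  (eta in Pa*s, V in m^3/mol)."""
    return tau_rot * R * T / eta

tau_fluor = 5e-9   # s, established in a separate time-resolved experiment
tau_rot = tau_rot_from_anisotropy(r=0.2, r0=0.4, tau_fluor=tau_fluor)  # 5 ns
V = molar_volume(tau_rot)   # ~0.012 m^3/mol with these assumed values
```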
In biochemical applications, fluorescence anisotropy experiments are often used not to estimate
actual molecular volumes, but in a somewhat more qualitative way, comparing the degree of
anisotropy before and after some potential binding event for example. The central requirement is
that the event under investigation must cause a change in the rotational correlation time of the
fluorescent molecule. Two kinds of experiments are possible as illustrated. Under constant
illumination (steady state), conditions or events that give rise to slower rotational tumbling give an
increase in fluorescent anisotropy, r, since less tumbling occurs prior to fluorescent emission. In a
time-resolved experiment following an incident pulse, the anisotropy will decay more slowly if the
rotational tumbling is slower.
Fluorescence resonance energy transfer (FRET)

Under special circumstances, an excited chromophore can return to the ground state not by emission
but by transferring exciton energy to a nearby chromophore. The efficiency of this process depends
on two main factors: the degree of overlap between the emission spectrum of the first chromophore
(referred to as the donor) and the absorption spectrum of the second chromophore (referred to as the
acceptor), and the distance separating the two chromophores.
According to the Förster equation, the transfer efficiency depends steeply on the separation R
between the donor and acceptor.

Efficiency = 1 / (1 + (R/R₀)⁶)
The parameter R0 is particular for the donor and acceptor pair and depends chiefly on the quality of
the spectral overlap between the donor emission and the acceptor absorbance. Note that when R=R0
the efficiency of energy transfer is 1/2, so a good donor acceptor pair will have a relatively high value
for R0. FRET experiments are useful mainly in the 10 to 100Å range.
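The Förster relationship and its inversion to estimate a distance can be sketched numerically; the R₀ value here is an assumed illustration, not that of a specific donor-acceptor pair:

```python
# FRET efficiency vs donor-acceptor separation, and the inverse relation
# used to estimate distance from a measured efficiency.
# R0 is an assumed illustrative value, not one for a specific pair.

def fret_efficiency(R, R0):
    """E = 1 / (1 + (R/R0)^6)."""
    return 1.0 / (1.0 + (R / R0) ** 6)

def distance_from_efficiency(E, R0):
    """Invert the Förster equation: R = R0 * (1/E - 1)^(1/6)."""
    return R0 * (1.0 / E - 1.0) ** (1.0 / 6.0)

R0 = 50.0                             # Angstroms (assumed)
E_at_R0 = fret_efficiency(50.0, R0)   # 0.5, by the definition of R0
E_far = fret_efficiency(100.0, R0)    # ~0.015: sixth-power falloff is steep
```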
Looking back at our scheme from the previous chapter showing the possible routes for return of the
excited state to the ground state, we see that energy transfer by FRET is a phenomenon that competes
with the fluorescent emission from the donor. As a result, the presence of the acceptor chromophore
reduces donor fluorescence and speeds decay of the excited donor and its fluorescence. As
diagrammed here, FRET experiments can be done either under constant illumination, where the
fluorescence intensity from the donor is reduced by the presence of the acceptor, or in a time-
resolved experiment where the speed of decay is increased by the acceptor and the characteristic
decay time is decreased.
FRET experiments find use in diverse experiments where detecting the proximity or approximating
the distance between two molecules or functional groups might be informative. Unless one of the
two components of interest is naturally fluorescent, this generally involves labeling both components
– one with the donor fluorophore and one with the acceptor. Experimental measurements give rise
to a value for the transfer efficiency, as diagrammed here for either continuous illumination or time-
resolved studies, after which the efficiency value can be used to approximate the distance between
the donor and acceptor according to the Förster equation. Note that values for R0 have been tabulated
for very many donor-acceptor pairs, so that parameter is typically a known quantity.
FRET in biology
Besides being useful in biochemical experiments, the FRET phenomenon plays a key role in
photosynthesis. The key step that converts light energy into chemical energy in photosynthesis takes
place in a transmembrane protein complex known as the photosynthetic reaction center (RC). The
RC binds a ‘special pair’ of chlorophyll molecules in a parallel and partially overlapping arrangement.
That grouping makes the special pair suitable for participating in the primary photochemical event.
After the special pair is excited, instead of returning to the ground state an electron leaves the special
pair and jumps along a path of neighboring pigment cofactors (chlorophylls and carotenoids) bound
to the protein, leading the electron across the membrane and thereby generating an electrochemical
difference between the two sides; this forms the basis for chemical energy conversion in
photosynthesis. But the reaction center by itself is not suited for absorbing photons that are hitting
the photosynthetic membrane everywhere and with a broad range of wavelengths. Other
transmembrane proteins, known as light harvesting complexes (LH), bind a large number of pigment
molecules and surround the reaction center. Depending on the particular system and organism,
multiple types of LH rings may be present. The LH proteins are designed to hold their pigment
molecules in very specific positions with respect to each other, and to tune their spectral properties
so that the pigment molecules can absorb photons efficiently throughout the photosynthetic
membrane and then transfer that exciton energy, essentially by FRET, to pigment molecules closer
to the RC. This results in a funneling effect where, at some expense of lost energy in each transfer
step, the exciton energy is eventually delivered to the special pair in the RC in order to drive the
primary photochemical event. The photosynthetic reaction centers from bacteria (which are
analogous to Photosystem II from higher plants) were the first transmembrane proteins whose
structures were determined in atomic detail in the mid-1980s. The structure of the bacterial RC is
shown here in a side view with the membrane running horizontally and its pigment molecules in red,
along with a view of the RC and surrounding light harvesting complexes viewed perpendicular to the
photosynthetic membrane.
Spectroscopy of Chiral Molecules: Optical Rotation and Circular Dichroism
Chiral molecules exhibit special spectroscopic phenomena that become evident when they interact
with polarized light. Because practically all biological macromolecules are chiral, as are many smaller
biochemical metabolites, spectroscopic techniques that exploit these phenomena are widely used in
the laboratory.
Circularly polarized light
We have discussed plane polarized light at length. There, the electric field vector oscillates in a plane
(e.g. vertically for vertically plane polarized light). Much insight can be gained about how chiral
molecules interact with plane polarized light by taking a momentary leap of faith and noting that a
vector that oscillates up and down vertically can be generated by the sum of two vectors that rotate
in a circle in opposite directions at equal frequency; when they are both vertical (up or down), they
sum to give a vertical result, whereas when they are horizontal they oppose each other and cancel.
For a traveling wave, the circularly rotating electric field vector means that the wave takes the form
of a helix. Therefore, plane polarized light can be imagined as being composed of two circularly
polarized components: one that is ‘right circularly polarized’ and the other that is ‘left circularly
polarized’. This is not merely a thought exercise, because in fact pure circularly polarized light can
be prepared by passing plane polarized light through a so-called ‘quarter-wave plate’, but for now
we will stick to our view of plane polarized light as a composition of two circular components. The
figure shows both forms of circularly polarized light; the ‘right’ component forms a right-handed helix
(like DNA or a protein alpha helix or a standard hardware screw) while the ‘left’ component forms a
left-handed helix. The sense of the rotation is that a fixed observer looking towards the source will see
the direction of the E field vector rotate clockwise in time for
right circularly polarized light as the traveling wave moves past the point of observation. This is
reversed for left circularly polarized light.
The point of considering plane polarized light as a sum of two circular components is that by viewing
them in terms of helical waves we can immediately appreciate why chiral molecules might interact
differently with left vs right circularly polarized light. Helices are chiral, as are biological macromolecules, and
we can appreciate the distinct interactions chiral objects make with each other by thinking about
putting our foot into a shoe; feet and shoes both being chiral, a particular shoe interacts differently
with your two feet. So what are the distinct kinds of interactions that a chiral molecule can make
with a chiral light wave? Two effects are noteworthy, relating to differences in absorption and
differences in index of refraction, and these lead to two important types of experiments, which we
discuss next.
Circular dichroism (CD)
If the right and left circularly polarized components (imagined to be contained within a beam of plane
polarized light) are absorbed to the same extent when passing through a sample, then the light that
is transmitted should naturally remain plane polarized. But one component may be absorbed more
strongly than the other; this forms the basis for circular dichroism or CD. What is the consequence?
Clearly if the left circularly polarized component is absorbed slightly more, then the transmitted
beam should have at least a slightly larger component of the right circularly polarized type. If we add
up oppositely rotating vectors of unequal magnitude, we get an elliptical shape for the resulting path
of the electric field vector.
The magnitude of the circular dichroism effect is captured by a parameter referred to as the ellipticity,
θ. Diagrammatically, θ relates to the angle formed by a line between the tips of the transmitted electric
field vectors in the perpendicular directions of maximum and minimum magnitude, as shown. The
ellipticity of the transmitted beam can be measured by a CD spectrophotometer; this requires
additional polarizers between the sample and the detector.
The ellipticity effect originates from a difference in extinction coefficients (and therefore absorbance
values) for the left vs right circularly polarized components, so θ naturally should reflect that
relationship. The equation for θ in terms of absorbance for left vs right is:

θ = 2.303(𝐴𝐿 − 𝐴𝑅)/4 = (2.303/4)𝑑𝐶(𝜖𝐿 − 𝜖𝑅)
where A refers to absorbance, 𝜖 refers to the extinction coefficient, 𝑑 is the path length of the light
through the sample, and C is the molar concentration (recall A = 𝜖𝐶𝑑). According to the sign
convention, higher absorption of left-circularly polarized light, resulting in a greater right component
and therefore a clockwise-rotating elliptical field vector, corresponds to positive θ. But this equation
requires further explanation of the multiplicative factors, 2.303 and 1/4. The 2.303 term is
recognizable as ln(10), which we might guess relates to the conventional use of log₁₀ for absorbance
equations. But what about the 4? In many texts this appears without comment. At the expense of
some thorny details we will show the origin of these multiplicative terms. To begin we point out that
the diagram for θ is vastly exaggerated; the actual differences in absorbance are usually very small
(which means that θ is small), which makes it possible to simplify a number of complex non-linear
relationships between variables in this problem with linear approximations (i.e. keeping just the first
terms in a Taylor expansion). Briefly, the transmittance for the left component would be 10^(−𝜖𝐿𝑑𝐶) =
e^(−2.303𝜖𝐿𝑑𝐶), and similarly for the right. But you may recall from earlier physics courses that the
intensity of a light beam (which here relates to the transmittance) goes as the square of the
magnitude of the electric field vector, so the lengths of the electric field vectors in the diagram for θ
go as the square roots of the transmittance values. So, the magnitude of the transmitted electric field
vector for the left component would be e^(−2.303𝜖𝐿𝑑𝐶/2), and likewise for the right. When the exponents
in those terms are small, we can approximate e⁻ˣ as 1 − x from Taylor’s expansion to get
(1 − 2.303𝜖𝐿𝑑𝐶/2) for the E field magnitude for the left component, and similarly for the right. Then,
noting from the diagram that the tangent of θ would be the ratio of the short axis to the long axis, and
the length of the short axis is the length of the right circularly polarized electric field magnitude minus
the left, and the long axis is the sum of the magnitudes, then

tan(θ) = ((1 − 2.303𝜖𝑅𝑑𝐶/2) − (1 − 2.303𝜖𝐿𝑑𝐶/2)) / ((1 − 2.303𝜖𝑅𝑑𝐶/2) + (1 − 2.303𝜖𝐿𝑑𝐶/2))

If the terms of the form 2.303𝜖𝐿𝑑𝐶/2 that appear in the denominator are << 1, then the whole
denominator is very nearly equal to 2. Finally, when θ is small, Taylor’s expansion gives
tan(θ) ≈ θ (in radians), and so the whole expression simplifies to the one earlier, with the 2.303
in the numerator and the 4 in the denominator coming from (1/2)/2.
As a final manipulation, if the value for θ is expressed in degrees instead of radians, which introduces
a multiplicative factor of 180/π = 57.3 degrees/rad, and the ellipticity is normalized to be a molar
value by dividing by molar concentration and also normalized for path length (typically in cm), then
the molar ellipticity in degrees is

θ(in degrees)/(𝐶𝑑) = (2.303/4)(57.3)(𝜖𝐿 − 𝜖𝑅) = 32.98(𝜖𝐿 − 𝜖𝑅)

And finally, for historical reasons relating to volume and length unit conversions, a factor of 100 is
present in the standard equation for the molar ellipticity (denoted by square brackets), [θ] = 100θ/(𝑑𝐶),
to give:

[θ] = 3298(𝜖𝐿 − 𝜖𝑅)

which matches standard textbook expressions.
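The small-angle approximations in the derivation can be checked numerically; a sketch comparing the exact expression for θ (from the transmitted E-field magnitudes) with the linearized formula, using illustrative values of 𝜖, d, and C:

```python
import math

# Compare the exact ellipticity, computed from transmitted E-field
# magnitudes, with the linearized result theta = 2.303*d*C*(eps_L - eps_R)/4
# (in radians). The epsilon, d, and C values are illustrative placeholders.

eps_L, eps_R = 1000.05, 1000.00   # M^-1 cm^-1 (a tiny CD difference)
d, C = 1.0, 1e-4                  # cm, M

E_L = math.exp(-2.303 * eps_L * d * C / 2)   # transmitted E-field, left
E_R = math.exp(-2.303 * eps_R * d * C / 2)   # transmitted E-field, right
theta_exact = math.atan((E_R - E_L) / (E_R + E_L))
theta_approx = 2.303 * d * C * (eps_L - eps_R) / 4

# Molar ellipticity in degrees, with the conventional factor of 100:
molar_ellipticity = 3298 * (eps_L - eps_R)
```

With these small absorbance differences, the exact and linearized values of θ agree to many significant figures, which is why the textbook formula is reliable in practice.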
Optical Rotation
The CD effect arises from differences in absorption. A different effect arises when the left and right
circularly polarized components travel through the sample at different speeds (owing to electronic
interactions with a chiral molecule). What happens when light goes through a sample more slowly?
The frequency of the wave is unchanged, but the wavelength changes. The speed of light is inversely
dependent on the index of refraction, n, and the index of refraction here may be different for the left
component compared to the right: c𝐿 = c₀/n𝐿 and c𝑅 = c₀/n𝑅, where c₀ is the speed of light in a vacuum
and the subscripts refer to left and right. Then, λ𝐿 = c𝐿/ν = c₀/(n𝐿ν). How many oscillatory cycles does
a light beam make when it passes through a sample of thickness d? The
answer is d/. The angular rotation in radians would be 2d/L, which after
substituting the expression for would be 2dnLv/c0 = 2dnL/ for the left,
and similarly for the right. Because of the dependence on the index of
refraction, the different components will execute a different amount of
rotation as they pass through the sample. The final orientation of the
polarization direction is determined by the sum of left and right vectors,
whose angle is the average of the two component vectors, so the resulting
transmitted wave should be rotated (as shown) according to half the
difference between their separate angles of rotation. This gives for the angle
of rotation of the polarized beam,
α = (πd/λ)(n_L − n_R)
The sense of the rotation is worth clarifying. According to the equation, the optical rotation angle is positive if the index of refraction is higher for the left circularly polarized light, meaning its speed through the sample will be slower. As a result, that wave will oscillate further (i.e. execute more of a wave cycle) compared to the right circularly polarized light. But referring to the earlier figure showing circularly polarized light, you will notice that when left polarized light rotates further as a function of position along the direction of travel, it is actually rotating clockwise; this is opposite from the apparent counterclockwise rotation of the E field vector seen by a fixed observer as the left circularly polarized traveling wave passes. As a result, if n_L − n_R > 0, then the rotation of the electric field vector is clockwise as shown, as viewed by an observer looking towards the source. If the
optical rotation is expressed as a molar quantity by dividing by concentration, and also normalized
for path length, an equation for molar optical rotation is obtained.
[α] = 100α/(Cd) = (100π/(λC))(n_L − n_R)
As with the CD effect, for most molecules in solution the (unitless) difference in index of refraction (n_L − n_R) is very small, perhaps 10⁻⁵. But from the equation above you can see that because the path length d is often about 10⁴ or more times longer than the wavelength λ, the optical rotation
is often substantial and can be measured accurately. Again, this requires an additional polarizing
filter between the sample and the detector. Optical rotation can be used to identify chiral molecules
and it is particularly useful in organic chemistry for evaluating the enantiomeric purity of a synthetic
product; a racemic product, being composed of equal amounts of both enantiomers, shows no optical
rotation.
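To see why the rotation is "often substantial," consider a rough numerical sketch. The 10⁻⁵ index difference comes from the discussion above; the 1 cm path length and 589 nm wavelength are illustrative assumptions.

```python
import math

# alpha = (pi * d / lambda) * (n_L - n_R), in radians
d = 1.0e-2        # path length: 1 cm (assumed)
lam = 589e-9      # wavelength: sodium D line (assumed)
delta_n = 1.0e-5  # n_L - n_R, the magnitude quoted in the text

alpha_rad = math.pi * d / lam * delta_n
alpha_deg = math.degrees(alpha_rad)
print(alpha_deg)  # about 30 degrees: a tiny delta-n, yet easily measurable
```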
The use of optical rotation has a storied past. In the early 1800’s it was observed that individual
quartz crystals, which grow in two mirror-related forms, rotated polarized light in different
directions. In 1849 Louis Pasteur took a crystallized sample of tartaric acid (a 4-carbon compound
with two chiral centers – the meaning of which was unknown at that time), separated the crystals
into two piles according to their apparently mirror-related morphology and discovered remarkably
that the dissolved crystals of mirror-related morphology rotated polarized light in opposite
directions. That experiment came several decades before atomic structures were determined for any
compounds, at a time when theories of bonding and the atomic structure of matter were still
undeveloped. With regard to the apparent luck and insight required to make that discovery,
Pasteur’s own words are notable – “Dans les champs de l'observation le hasard ne favorise que les esprits préparés” [In the fields of observation, chance favors only the prepared mind].
Optical rotation and circular dichroism are interrelated
The phenomena of optical rotation and circular dichroism are related to each
other and they occur together in the same molecular sample, as illustrated.
Both arise from complex relationships between electric transition dipole
moments (which we discussed briefly earlier) and magnetic dipole moments
in a molecule. We will only touch on the subject qualitatively here. An
important point is that the effects (like other spectroscopic phenomena) have
to do with allowable energy transitions in a molecule. As a result, the observed
effects are strongly wavelength dependent. Indeed the common term ORD
(optical rotary dispersion) comes from the wavelength dependence of the
optical rotary effect. The CD and ORD effects are strongest at or near
wavelengths where some underlying absorption transition occurs. An exact
integral relationship exists between ORD and CD in
the form of a Kramers-Kronig transform, which we
will not discuss here, but in its simplest form the
relationship leads to a characteristic result
diagrammed here for an idealized electronic
transition whose maximum absorption would be at λ_max. Molecules with complex absorption spectra
give more complex CD and ORD spectra.
One result of the integral relationship that gives the optical rotation from the circular dichroism is that even when the circular dichroism
peak (which relates to absorbance differences) is
sharply peaked at the absorption maximum and is
weak elsewhere in the spectrum, the optical
rotation signal may be appreciable at wavelengths
farther from the transition. In some sense this
amounts to a smoothing out effect. The CD signal from a complex molecule may therefore offer
sharper distinguishing features, which can be important in analyzing detailed behavior and
conformations, while the advantage of optical rotation is that its effects can often be observed in the
visible range of the spectrum even if the molecule being studied has strong electronic transitions only
in the far UV region. Pasteur’s tartaric acid is a case in point. The optical rotation phenomenon and
its wavelength dependence can also be demonstrated easily with simple corn syrup owing to its high
concentration of chiral sugars.
CD studies for analyzing protein secondary structure
CD spectroscopy is widely used to monitor the conformation of proteins. There are strong transitions
from the polypeptide backbone in the 200-220 nm range, so CD measurements on proteins are
typically made in and around that range. A particularly common use is to estimate the percent
composition of the basic secondary structure elements – alpha helix, beta sheet, and ‘random coil’ –
in a protein. This can be informative if the three dimensional structure of the protein is not known
in more detail from other techniques, or if one is concerned about whether a protein is folded
properly. As we have discussed, under various conditions, or after mutations have been made, a
protein may become partially or totally unfolded.
The different types of protein secondary structure have distinctive CD spectra (shown here), which
have been established with model polypeptides or proteins. Clearly, if one measures the CD spectrum
of an unknown protein and it matches precisely to one of the three reference spectra (alpha, beta, or
random), then you could surmise that
the protein in question was entirely
helical, entirely beta sheet, or entirely
unfolded. This is of course rarely the
case. Instead, after recording a CD
spectrum one is generally faced with the
problem of how to decompose it into a
sum of the reference spectra, weighted
according to the estimated fractional
contribution each makes to the total
observed spectrum. Virtually all
spectroscopic techniques give additive
behavior with respect to multiple
components that are present in a
mixture, and CD is no different. As a
result, we can write a series of linear
equations stating how the reference CD
values at each wavelength would be
expected to sum to the value observed
in the unknown sample.
At each wavelength, λ_i, we can write an equation of the form

θ_obs(λ_i) = f_α θ_α(λ_i) + f_β θ_β(λ_i) + f_r θ_r(λ_i)
where θ_obs is the observed ellipticity and f_α, f_β, and f_r are the unknown fractions of alpha, beta, and random coil that make up the protein under study. The other terms in the equation, e.g. θ_α(λ_i), are known quantities based on the reference curves. A series of equations at different wavelengths can
be written in matrix form as shown.
[ θ_obs(λ_1) ]   [ θ_α(λ_1)  θ_β(λ_1)  θ_r(λ_1) ]  [ f_α ]
[ θ_obs(λ_2) ] = [ θ_α(λ_2)  θ_β(λ_2)  θ_r(λ_2) ]  [ f_β ]
[ θ_obs(λ_3) ]   [ θ_α(λ_3)  θ_β(λ_3)  θ_r(λ_3) ]  [ f_r ]
[     ...    ]   [    ...       ...       ...   ]
Ideally we would write a large number of equations based on measurements at many different wavelength values. This would give us a system of n equations in three unknowns (f_α, f_β, f_r), with n
>> 3. As you may know, if the number of equations is larger than the number of unknowns, then
there may be no exact solution for the unknowns for which all the equations are satisfied. The key is
to determine what values for the unknowns give the best agreement overall with the equations
provided. The general solution to this problem is by the method of linear least squares. More
complex treatments are possible in which different weights are given to different measurements
according to their uncertainties, but we will give the simplest treatment here where all the estimated
errors are assumed to be equal. Then the optimal solution can be written out relatively easily. First,
we will shorten our notation for the system of equations above as follows: a vector θ (of dimension n) is equal to a rectangular matrix A (n rows by 3 columns) times a vector f (of dimension 3, representing the quantities to be determined, f_α, f_β, and f_r):

A f = θ

Then multiplying on the left by the transpose of the matrix A, Aᵀ, we get

Aᵀ A f = Aᵀ θ

(Aᵀ A) is a square 3x3 matrix that can be inverted to give its reciprocal, (Aᵀ A)⁻¹. Then multiplying both sides on the left by (Aᵀ A)⁻¹ gives

f = (Aᵀ A)⁻¹ Aᵀ θ
This is a straightforward calculation to perform by computer, making it easy to obtain estimates of
the secondary structure composition from measured values of the ellipticity at several wavelengths.
The linear least squares approach above is extremely powerful and can be applied to wide ranging
problems where a large set of equations can be written in terms of a smaller set of unknown variables.
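The normal-equations solution f = (AᵀA)⁻¹Aᵀθ can be sketched in a few lines of NumPy. The reference ellipticity values in A below are synthetic placeholders (real reference spectra would come from model polypeptides, as described above); the point is only that the matrix algebra recovers the fractions.

```python
import numpy as np

# Columns of A: theta_alpha, theta_beta, theta_r at each measured wavelength.
# These numbers are made up for illustration, not real reference spectra.
A = np.array([
    [-33000.0,  -8000.0,   1500.0],
    [-30000.0, -14000.0,  -2000.0],
    [ 70000.0,  30000.0, -40000.0],
    [  5000.0, -10000.0, -10000.0],
])

f_true = np.array([0.5, 0.3, 0.2])  # fractions used to simulate a "measurement"
theta_obs = A @ f_true              # noise-free synthetic observed ellipticities

# Normal-equations solution: f = (A^T A)^-1 A^T theta
f_est = np.linalg.inv(A.T @ A) @ A.T @ theta_obs
print(f_est)  # recovers [0.5, 0.3, 0.2]
```

With real, noisy data, np.linalg.lstsq(A, theta_obs, rcond=None) is the numerically safer route, but the explicit form above mirrors the equation in the text.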
CHAPTER 19
Macromolecular Structure Determination and X-ray Crystallography
Our current understanding of biology dwarfs what was known only a few decades ago. During that
time, two areas of study have driven genuine scientific revolutions: genome sequencing and
structural biology. This chapter focuses on the latter subject.
The power of structural biology rests on the adage that seeing is believing. And indeed learning and
seeing what macromolecules look like in atomic detail has changed the way we understand the
workings of the cell and all its components. This chapter will focus on the diffraction technique
known as x-ray crystallography. Only a few comments on other important methods in structural
biology will be offered, mainly in counterpoint.
We will see shortly that at its heart, x-ray crystallography is a type of imaging method; there are
important complications, but in the end one obtains a true three-dimensional image of the molecule
under study. On this issue a contrast can be drawn to nuclear magnetic resonance (NMR), which is
the second leading method for studying macromolecular structure. NMR methods probe the complex
interactions between nuclear spins in a molecule and powerful external magnetic pulses. From
sophisticated analyses of those interactions, sometimes relying on special biochemical protocols
involving isotopic labeling of specific atom types or residues in a macromolecule, information is
extracted about the proximity and relative orientation between different amino acids in the protein
(or nucleotides in the case of nucleic acid molecules). This leads ultimately to a large number of
inferred spatial constraints that must be obeyed by a correct atomic model. Computer programs then
attempt to generate a set of atomic coordinates that is most consistent with the body of NMR
constraints, along with other known information (chiefly the amino acid or nucleotide sequence). If
a sufficient number of spatial constraints can be obtained, then an accurate model can be produced.
With NMR, the experimental challenges increase steeply with molecular size and complexity, but new
methods continue to push the limits of size. In addition, NMR methods offer valuable dynamical
information about macromolecules that is difficult to obtain by other methods, including x-ray
crystallography.
We will shortly discuss the importance in imaging methods of using a sufficiently short wavelength
for the radiation source in order to get detailed structural information. X-rays fit that requirement,
but the high energy electrons used in electron microscopy also fit that requirement; they have very
short (DeBroglie) wavelengths. Yet despite the sufficiently short wavelength offered by high energy
electrons, until recently electron microscopy has not been able to produce images of macromolecules
in atomic detail. The reasons are complex, but they concern two interrelated issues of instrument
sensitivity and the strongly destructive interaction of electrons with biological materials (like
proteins and nucleic acids). But it appears that those limitations are finally falling away. Very recent
instrumentation developments have produced systems with detector sensitivities high enough that,
if sufficient effort is applied to collect very large numbers of molecular images, atomic level detail can
indeed be obtained by electron microscopy in favorable cases. Electron microscopy methods, and
NMR methods as well, are sure to continue to grow in power and to contribute increasingly to our
body of knowledge in the area of structural biology. But we turn now to the method that has
contributed so enormously to our understanding of the three dimensional structures of
macromolecules: x-ray crystallography.
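The claim that high-energy electrons have very short de Broglie wavelengths is easy to check numerically. A sketch follows; the 300 kV accelerating voltage is an illustrative choice typical of modern instruments, not a value from the text.

```python
import math

# Relativistic de Broglie wavelength of an electron accelerated through V volts:
# lambda = h / sqrt(2 m e V (1 + eV / (2 m c^2)))
h = 6.62607015e-34    # Planck constant, J s
m = 9.1093837015e-31  # electron rest mass, kg
e = 1.602176634e-19   # elementary charge, C
c = 2.99792458e8      # speed of light, m/s

def electron_wavelength_angstrom(volts):
    """De Broglie wavelength (in angstroms) with the relativistic correction."""
    p = math.sqrt(2 * m * e * volts * (1 + e * volts / (2 * m * c**2)))
    return h / p * 1e10  # metres -> angstroms

print(electron_wavelength_angstrom(300e3))  # ~0.02 A, far below atomic spacings
```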
The limiting effect of wavelength
In order to explain why x-ray crystallography is necessary, we have to understand the fundamental
limiting effect that the wavelength has in an imaging experiment. One way to understand this point
is to ask how different the scattering is from two points in space that are separated by a distance d, if
the wavelength of the radiation is λ. Besides d and λ, the answer also depends on the geometry of
the scattering, as shown. Scattering phenomena depend on how light waves interfere with each
other, and whether light waves interfere with each other (constructively or destructively) depends
on the relative phases of the scattered waves, and this depends in turn on the relative distance the
light waves travel when they scatter from different points in space. If the path a light wave takes
from its origin to a detector is the same whether it scatters from point A or point B, then those two
points interact with the wave in an effectively indistinguishable way. Indeed, the distinction between
the scattering from two points comes from the ‘path length difference’ for rays traversing paths
through those points. In the scheme shown, the path length difference is 2𝑑𝑠𝑖𝑛(𝜃). Advanced texts
in different fields of study address the next
point in different ways. Here we will err on
the side of simplicity and just argue that if the
path length difference is short compared to
the wavelength of the radiation, then
scattering from the two points is not so
different, and an optical experiment based on
the indicated set-up (of d and λ) would not
clearly resolve the two points. If we press the
argument and say then that the level of detail
or spatial ‘resolution’ d is defined by requiring
2d sin(θ) to be comparable to λ, then we see
that the minimum possible value for d (i.e. the
finest detail that could be resolved) is limited
by λ/2 (which occurs at θ=90°). This is why light (or UV) microscopy cannot provide spatial detail
below a few hundred nanometers, no matter how large or perfect the lenses are. This fundamental
limitation that the wavelength places on resolution is sometimes referred to as the diffraction limit
or the Abbe limit. Some very special tricks – some having to do with the power of statistical averaging
and some having to do with special instrumentation – have been developed over the last few years
to circumvent the diffraction limit; these techniques are sometimes grouped under the moniker of
‘super-resolution’ microscopy, recognized by the Nobel Prize in Chemistry in 2014. Setting aside
such special techniques, the limiting effect of the wavelength means that in order to resolve atomic
level details in molecules, we have to use radiation with a wavelength not much longer than the
separation between atoms, which is 1 to 2Å. That corresponds to the x-ray region of the
electromagnetic spectrum, hence x-ray crystallography.
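The d = λ/2 limit can be made concrete with two wavelengths; the specific choices below (green light and the common Cu Kα x-ray line) are illustrative.

```python
# Abbe/diffraction limit: the finest resolvable detail is d_min = lambda / 2
# (reached at theta = 90 degrees).  Wavelength choices are illustrative.

def abbe_limit(wavelength):
    """Minimum resolvable spacing for radiation of the given wavelength (same units)."""
    return wavelength / 2.0

print(abbe_limit(500e-9))    # green light: 250 nm, far above atomic dimensions
print(abbe_limit(1.54e-10))  # Cu K-alpha x-rays: 0.77 A, atomic-scale detail
```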
X-rays and the problem of focusing
X-ray radiation provides the answer we need with respect to wavelength, but it also introduces a
critical problem. In a typical imaging experiment, using a camera or a telescope or your eye (or even
magnetic lenses for the case of electron microscopy), the photons or waves that are scattered from
the object under study are focused back to form the (typically enlarged) image. But x-rays cannot be
focused, at least not to a practical degree, because there are no materials with a high index of
refraction for x-rays. So, using x-rays we can do the first part of an imaging experiment (i.e. the
scattering) but not the second part (focusing). To understand what must be done instead, consider
that, though there is no suitable lens for x-rays, information sufficient to create the desired image
must be contained in the scattered waves that arrive at the lens location; if the x-rays could be focused
then an image would be formed. The solution to the problem then is to record the scattered
information and figure out what mathematical relationship is required to convert the observed
scattering back into an image – that is, to do using a computer what a lens would do naturally. It
turns out that that relationship is well-understood. An object and its scattering pattern are related
by a Fourier transform, an integral transform that is ubiquitous in mathematical physics and
engineering. Before we discuss how such operations relate to applications in x-ray crystallography
we have to consider certain aspects of how repetitive objects like crystals scatter radiation.
Diffraction geometry
From earlier physics experiments you are likely familiar with the basic idea that when light is scattered from a regularly repeating object, like a light or a laser passing through a set of fine slits, one gets destructive interference almost everywhere, but constructive interference (i.e. bright spots) at a series of special positions. Constructive interference occurs at diffraction angles where the light passing through different slits has path length differences that are integral multiples of the wavelength of the light. Destructive interference occurs elsewhere. The
example shown is essentially a one-dimensional system, slits repeating in one direction. Describing
diffraction from objects that repeat in more dimensions becomes a bit more complex, but the two
dimensional case can be illustrated clearly.
We can imagine scattering
from a two-dimensional
crystal where a molecule
repeats regularly in the x
and y dimensions. Let the
repeat distance along x be
|a|, and the repeat distance
along y be |b|. The a and b
vectors define the
boundaries of a ‘unit cell’,
whose contents could be
used to construct the entire (ideally indefinite) crystal by translational shifts; for our simplified
discussions we will be ignoring the possibility of rotation symmetry within the crystal. The a and b
vectors also describe a lattice of points embodying the properties of the translational repetition in
the crystal. We can understand the geometry of diffraction by momentarily forgetting about the
underlying structure of the object in the crystal in order to focus on the repeating lattice. The lattice
captures the relationship in the crystal between equivalent atoms belonging to molecules from
different unit cells. For example, if you considered just the C-alpha atom of the first amino acid in the
protein molecule, all the instances of that atom throughout the crystal would describe the crystal
lattice. Now we can apply what we know about scattering from a repeating object to this system of
lattice points. Scattering will be constructive for some choices of the direction of the incoming and
outgoing beams, but it will be destructive for most choices, giving no intensity for the outgoing beam
in those cases. A useful point is that if the scattering for a particular choice of incoming and outgoing
beam directions would be destructive for scattering from the repeating arrangement of one
particular atom in the protein molecules (i.e. the C-alpha atom alluded to earlier), then the scattering
would also be destructive when considering the arrangement of some other particular atom in the
protein. In other words, if a particular choice of incoming and outgoing beam directions would give
destructive interference from the crystal lattice points, then there would be destructive interference
from the entire crystal, regardless of what the molecule looks like or how its atoms are arranged
internally. The geometry of diffraction is therefore dictated only by the repeat pattern in the crystal
and not by the contents of the unit cell. This important simplification allows us to proceed to discuss
diffraction geometry separately from the question of molecular structure. [We will see later that
while the lattice geometry alone determines where we see diffraction, the molecular structure within
a unit cell determines which diffraction spots are bright and which are weak, and that information is
ultimately the basis for structure analysis].
The key relationship between the incoming x-ray beam direction and the outgoing direction is captured by the scattering vector, S. First we define the incoming and outgoing directions by unit vectors ŝ_in and ŝ_out. Then, a diagram and a little algebra show us that the vector difference between the outgoing and incoming unit vectors, ŝ_out − ŝ_in, is a vector of length 2 sin θ. From before, our condition for constructive interference for scattering or reflecting from planes that are separated by distance d is 2d sin θ = nλ, or 2 sin θ/λ = n(1/d). By substituting 2 sin θ = |ŝ_out − ŝ_in| we get |ŝ_out − ŝ_in|/λ = n(1/d). This motivates us to define a new vector, the scattering vector S, to be S = (ŝ_out − ŝ_in)/λ. The scattering vector bisects the outgoing vector and the (negated) incoming vector. According to the algebra used to construct S, for constructive interference S is perpendicular to the reflecting or Bragg planes drawn on the lattice, and the length of S must satisfy
|S| = (1/d)n
S is defined geometrically by the incoming and outgoing beam directions, and scattering is only
constructive when S follows the equation above. But the planes we drew in the diagram above
illustrate just one possible way that parallel planes can be drawn on a lattice. A practically unlimited
number of choices can be made for a set of planes running through the lattice at different angles. But
if we just choose two directions as our foundation, we can set up a system for describing the 2-dimensional diffraction completely. Here the lattice has been drawn to be orthogonal (i.e. rectangular instead of oblique). This is not necessary, and in fact many crystals have non-orthogonal unit cells,
but we will treat the orthogonal case because the algebra is simpler there. It makes sense to choose
our planes to be horizontal or vertical. Referring to our previous diagram of the two-dimensional
crystal, for the vertical planes along b, the spacing would be |a|, and there would be diffraction (i.e.
constructive scattering) for an S vector perpendicular to b (and therefore along a) and having length
1/|a| times an integer, which we will call h. For reflection from horizontal planes along a, S would be
along b and have length 1/|b| times an integer, k. Note the reciprocal relationship between the
lengths of the unit cell edges |a| and |b| and the length of the S vector where we get diffraction.
At this point there is utility in introducing a new set of basis vectors for describing the S vector. Owing
to the reciprocal nature of the relationship noted above, the coordinate space where we construct
the scattering vector S is referred to as ‘reciprocal space’. As basis vectors in reciprocal space, we
create an a* vector (perpendicular to b and having length 1/|a|) and a b* vector (perpendicular to a
and having length 1/|b|). That scheme is shown in the following figure. Now, for the S vectors
perpendicular to the planes defined by the b axis, we have S = ha*. And for S vectors perpendicular
to the planes defined by the a axis we have S = kb*. But in addition to the horizontal and vertical
planes, we could also draw sets of planes through the lattice at oblique angles. We should expect
diffraction for S vectors perpendicular to those planes as well, and with the length of S reciprocally
related to the spacing between planes. We will skip a full algebraic treatment, but it turns out that
the scattering vector S for any choice of planes is described by a linear combination of integral
multiples of the reciprocal axes a* and b*. That is,
S = ha* + kb* (h and k integers)
That equation clearly describes a two-dimensional lattice of spots (or really scattering vectors S) in
reciprocal space for which we expect diffraction. Every ordered pair (h, k) defines a set of Bragg
planes through the lattice, and those planes give rise to a reflection corresponding to a scattering
vector S perpendicular to those planes, whose ‘Miller indices’ in reciprocal space are h and k. The
figure illustrates the relationship between different sets of Bragg planes that can be drawn on the
crystal lattice, the corresponding scattering vector S, and the location of the resulting reflection in
the diffraction pattern. The green arrows in the bottom panels indicate the diffraction spot or
‘reflection’ that arises from the Bragg planes drawn in green in the upper panels.
We can also work backwards from an observed diffraction spot and calculate what the spacing was
between the (generally oblique) lattice planes that gave rise to that reflection: from the indices h and
k of the diffraction spot, and knowing the lengths of a* and b*, we can use the Pythagorean equation
to calculate the length of the scattering vector S. Then, from above (ignoring the n from before), d =
1/|S|. This has an important meaning. Scattering from closely spaced planes in the lattice (i.e. where
d is small) shows up in the diffraction pattern where the S vector is long, i.e. farthest from the center
(which is where the direct beam would hit [h=0, k=0]). Small values of d correspond to a fine level
of detail (or ‘high resolution’) in the image we ultimately obtain for the crystallized molecule.
Therefore, in order to ultimately obtain a high resolution image of the crystallized molecule,
diffraction data must be present and recorded at high angles of diffraction (i.e. high θ). Note from the
figures we have drawn that the angle the outgoing beam makes with the incoming beam is actually
2.
Our diffraction geometry equations can be put into practice in several ways:
Suppose that a horizontal x-ray beam with wavelength λ=1.54 Å hits a crystal and we are able to record good diffraction data on a detector at an angle up to 50° away from the direction of the direct beam. What is the highest resolution (i.e. lowest value of d) for diffraction spots that would be recorded? 2θ=50°, so θ=25°. |S| = 2 sin θ/λ and d = 1/|S| = λ/(2 sin θ) = 1.82 Å.
Given the indices of a reflection, we can calculate the resolution it provides; we must also know the
reciprocal unit cell. If a=100 Å, b=125 Å, and c=160 Å, the resolution for reflection (24, 8, 17) would
be:
d=1/|S| = 1/sqrt((24*(1/100Å))^2 + (8*(1/125Å))^2 + (17*(1/160Å))^2) = 3.7 Å. This calculation
assumes an orthogonal lattice; otherwise the calculation would have to take angles into account.
For the same unit cell as above, how many total reflections exist in three-dimensions within the limit
of 2.5 Å resolution, i.e. where d > 2.5 Å or |S| < 1/(2.5 Å)? The volume of a 3-D sphere in reciprocal
space with radius 1/(2.5 Å) is V = (4/3)π(1/(2.5 Å))³. Dividing this by the volume occupied by one
reciprocal unit cell volume (a*b*c*), gives about 536,000 reflections. We did not discuss internal rotational symmetry in crystals, which would make some of the reflections equivalent to each other and therefore redundant, but nonetheless you can appreciate the very large number of observed
quantities that are measured in a macromolecular crystallography experiment, which is consistent
with the requirement of producing an image that can define the detailed structure of a molecule
typically containing thousands of atoms.
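The three worked examples above can be reproduced in a few lines, assuming (as the text does) an orthogonal unit cell:

```python
import math

# 1) Resolution limit from the maximum diffraction angle:
#    d = lambda / (2 sin theta), with 2*theta = 50 deg and lambda = 1.54 A.
lam = 1.54
theta = math.radians(50.0 / 2.0)
d_min = lam / (2.0 * math.sin(theta))
print(round(d_min, 2))  # 1.82 A

# 2) Resolution of reflection (24, 8, 17) in a 100 x 125 x 160 A cell:
#    d = 1/|S| with S = h a* + k b* + l c* and orthogonal axes.
a, b, c = 100.0, 125.0, 160.0
h, k, l = 24, 8, 17
S_len = math.sqrt((h / a)**2 + (k / b)**2 + (l / c)**2)
print(round(1.0 / S_len, 1))  # 3.7 A

# 3) Number of reflections to 2.5 A resolution: the reciprocal-space sphere
#    of radius 1/2.5 divided by the reciprocal cell volume 1/(a b c).
n_refl = (4.0 / 3.0) * math.pi * (1.0 / 2.5)**3 * (a * b * c)
print(round(n_refl))  # about 536,000
```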
Diffraction in three dimensions
Our diagrams above were drawn for two-dimensional diffraction, and there the patterns come out
like we might imagine, or like we have seen in a classroom demonstration of a laser passing through
a fine screen. The geometry of diffraction from an object that repeats in three dimensions is a bit
more complex. At any given orientation of the crystal and the incoming beam, we are able to see just
a two-dimensional slice through the diffraction pattern that exists in our hypothetical three-
dimensional reciprocal space. But the slice we observe is not from a flat plane, but rather from a
sphere intersecting the three-dimensional reciprocal lattice. Why? We saw earlier that the scattering
behavior is governed by the scattering vector S. And S is the sum of the outgoing beam unit vector
(ŝ_out) with the negated incoming beam unit vector (−ŝ_in), divided by λ. Now if the beam has a fixed direction relative to the crystal (i.e. the incoming beam and the crystal are both stationary), so that ŝ_in is fixed, then the question is: what values of S can possibly be sampled by all the allowable directions for the outgoing beam unit vector ŝ_out? Interestingly, the answer
is a sphere, as shown here. This is referred to as
the sphere of reflection or the Ewald sphere. So,
for diffraction from a three dimensional crystal,
we see diffraction only where S falls on a sphere
(of radius 1/λ), and where S simultaneously
falls on a reciprocal lattice of points. This has
the effect of planes of spots intersecting a
sphere, and since a plane intersects a sphere in
a circle, we see a diffraction pattern with spots seeming to appear in circular rings. In order to obtain
information on the full three-dimensional diffraction pattern, the crystal must be rotated about an
axis while diffraction images are recorded. An example of diffraction from a crystal undergoing a
narrow rotation is shown. Interpreting such a diffraction pattern, e.g. determining the indices (h,k,l)
of all the spots is a complicated problem. Modern crystallographic programs can usually do this
automatically for good diffraction data, a procedure known as ‘autoindexing’. This was not possible
in the early days of crystallography. Then, a crystallographer had to take pains to characterize the
crystal unit cell and the reciprocal lattice. From the arguments above, you can see that to take a
diffraction image showing values of S in a flat slice of reciprocal space requires changing the
orientation of the crystal relative to the incoming beam during the film exposure, in a very particular
way. The complex motion of the crystal and the film and an intervening annular screen can be
accomplished by a ‘precession camera’. An example of a precession photograph of a protein crystal
is shown. Owing to the time required and the complexity of the procedure, precession photographs
are rarely produced in modern crystallographic work.
Limited diffraction and disorder
We noted earlier that the geometry of a data collection experiment can set a limit on the resolution
obtained. A (typically flat) x-ray detector panel only allows data collection to a certain value of θ, and the resolution is limited by d = λ/(2 sin θ). But the geometry of the experimental setup is rarely the
element limiting the resolution. The resolution of an x-ray diffraction experiment on a
macromolecular crystal is most often limited by the absence of detectable diffraction spots above
some scattering angle θ; spots are clear and strong in the inner region of the diffraction pattern, but
they fall off and become unmeasurable farther from the center. This natural limit in resolution is a
direct reflection of the degree of order vs disorder in a crystal. For a perfect crystal, where the protein
atoms in one unit cell are in identical positions in every unit cell all through the crystal, diffraction
would be strong to unlimited resolution. But if the protein exhibits substantial atomic motion or
conformational variation, then diffraction will vanish at resolutions (i.e. values of d) comparable to
those variations. Stronger x-ray beams make it possible to observe weaker reflections, and indeed
the development of synchrotrons that produce x-ray beams thousands of times stronger than home
laboratory sources is a major reason for the current ability to determine atomic structures of highly
complex macromolecular assemblies, which often yield only small and weakly diffracting crystals.
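The geometric resolution limit above can be made concrete with a short calculation. This is only an illustrative sketch; the wavelength and angle values are hypothetical (1.5418 Å is the common Cu Kα wavelength), and are not taken from the text.

```python
import math

def resolution_limit(wavelength_A, theta_max_deg):
    """Minimum resolvable d-spacing (Angstroms) for a given wavelength
    and maximum Bragg angle theta, from d = lambda / (2 sin(theta))."""
    return wavelength_A / (2.0 * math.sin(math.radians(theta_max_deg)))

# With Cu K-alpha radiation (1.5418 A) and spots measurable out to
# theta = 90 degrees, the best possible resolution is lambda/2:
d_best = resolution_limit(1.5418, 90.0)   # 0.7709 A
```

Note that pushing the detector to capture larger angles only helps if the crystal actually diffracts there; as the text explains, disorder, not geometry, usually sets the limit.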
Obtaining the atomic structure
How the contents of the unit cell affect the diffraction: the structure factor equation
In our previous discussions, we imagined abstracting just one specific atom from each molecule in
the crystal to establish the lattice, and from there we analyzed where the diffraction spots would
appear, and what their indices (h,k,l) would be. That exercise did not depend on the internal
structure of the molecule in the crystal, and so the positions of the spots evidently reveal nothing
about the underlying molecular structure. Instead, the molecular structure is manifest through the
character of the individual waves that comprise the diffraction pattern.
Every diffraction spot or reflection has an intensity (i.e. a brightness or darkness on the detector,
depending on the display device), which is denoted as I(h,k,l) according to the indices of the
particular reflection. But each reflection is a wave and so also carries with it a ‘phase’. The phase
describes how far advanced the wave fronts of a wave are compared to a reference wave (which in
our case is a hypothetical wave scattering from a reference point at the origin of the unit cell of the
crystal). How do the positions of the thousands of atoms in the crystal unit cell relate to the intensity
and phase of the wave corresponding to reflection h,k,l? Each atom scatters in all directions, so each
reflection is just a sum of waves scattered from the atoms in one unit cell, as shown.
How do those waves add up? The separate waves scattered from the many atoms in the unit cell interfere
constructively or destructively in matters of degree rather than absolutely, as was the case for our
earlier analysis of scattering from a lattice. To add them up we have to account for the magnitudes and
the relative phases of the waves scattered from the separate atoms. To a first approximation, the
magnitude of the wave scattered from each atom is determined
by the number of electrons in the atom. The phase is determined by the position of the atom, since
that (along with the value of S for the reflection in question) is what determines the path length for
the wave. The phase angle (compared to hypothetical scattering from a point at the origin of the unit
cell) is governed by the dot product of the S vector for the reflection and the position vector for the
atom, which is usually written as r = xf·a + yf·b + zf·c. Note that the coordinates xf, yf, zf are
'fractional' relative to the unit cell axis vectors a, b, and c. The consequence is that the path length
difference is r·(ŝin − ŝout). To convert the path length difference to an angular phase, we divide it by
λ, multiply by 2π, and then negate, since a longer path gives a negative shift of the wave peaks. The
phase angle then becomes 2π r·(ŝout − ŝin)/λ, and substituting S for (ŝout − ŝin)/λ we obtain 2π(r·S)
for the phase of the wave scattered from an atom at position r into the reflection whose scattering
vector is S. A further simplification in notation comes from expanding (r·S) in components as
(xf·a + yf·b + zf·c)·(h·a* + k·b* + l·c*). Because of the specific way the reciprocal lattice was
constructed (a·a* = 1, a·b* = 0, a·c* = 0, and so on), (r·S) can be written as (hxf + kyf + lzf), which
makes it clearer how to calculate the required value. These relationships are diagrammed in the figure
here. The phase angle term 2π(hxf + kyf + lzf), which relates scattering from position x,y,z to
reflection h,k,l, appears throughout crystallography equations.
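The phase term relating an atom's fractional position to a reflection (h,k,l) takes one line of code. This minimal sketch uses hypothetical positions for illustration.

```python
import math

def phase_angle(h, k, l, xf, yf, zf):
    """Phase (radians) contributed by an atom at fractional coordinates
    (xf, yf, zf) to reflection (h, k, l): 2*pi*(h*xf + k*yf + l*zf)."""
    return 2.0 * math.pi * (h * xf + k * yf + l * zf)

# An atom at the origin contributes zero phase to every reflection;
# an atom halfway along a contributes a phase of pi to reflection (1,0,0),
# i.e. it scatters exactly out of step with an atom at the origin.
origin_phase = phase_angle(3, 2, 1, 0.0, 0.0, 0.0)   # 0.0
half_cell    = phase_angle(1, 0, 0, 0.5, 0.0, 0.0)   # pi
```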
Now we have to add up the waves scattered from all the atoms based on their separate magnitudes
and phases. We will skip over some basic mathematical details here and simply explain that the way
to add up interfering waves (of the same frequency) with different phases is to decompose each wave
into a sine and a cosine term. That decomposition depends on the phase: If the phase is zero we say
that the wave is a cosine wave. If the phase is 90° we say it is a sine wave. If the phase is intermediate
then it decomposes into both cosine and sine components according to the cosine and sine of the
phase angle. Then, the waves can all be added up by adding their cosine components and sine
components separately. The final summed components describe a new total wave whose magnitude
and phase are embodied in the values of the total cosine and sine components. In order to collapse
the two components together as a single number so that the waves can all be added together with a
simple summation, we cast things into the space of complex numbers, assigning the cosine terms to
be real and the sine terms to be imaginary. Making those representations, we end up with the
structure factor equation:
F(h,k,l) = Σ_(atom j) fj cos(2π(hx + ky + lz)) + i Σ_(atom j) fj sin(2π(hx + ky + lz)) = A + iB
The fj term is the form factor for atom j (its number of electrons, to a first approximation). F(h,k,l)
is called the structure factor for reflection h,k,l. The figure shows a graphical representation of how
the total structure factor F arises from the summation of atomic contributions.
The structure factor, F(h,k,l), describes the wave that gives rise to a specific diffraction spot.
Specifically, the brightness or intensity of the spot on the detector is the square of the amplitude of
the structure factor, I(h,k,l) = |F(h,k,l)|², so the magnitude of the structure factor for each reflection
is obtained by taking the square root of the measured intensity. This leaves us with one major
problem. We wrote the structure factor 𝐹 as a vector to emphasize its complex character. It contains
a real and imaginary part (A and iB above). Or, viewed another way, it is described by a length (or
magnitude) and an angle in the complex plane, which is the phase of the total wave. We can
measure the magnitudes but not the phases; that information contained in the waves is lost upon
collision with the detector.
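The summation just described can be sketched directly in code, using complex exponentials to carry the cosine (real) and sine (imaginary) components together. The two-atom list below is a hypothetical example, not taken from the text.

```python
import cmath

def structure_factor(hkl, atoms):
    """F(h,k,l) = sum over atoms j of f_j * exp(2*pi*i*(h*x + k*y + l*z)),
    with each atom given as (f_j, x, y, z) in fractional coordinates.
    The real part is the cosine (A) sum; the imaginary part is the sine (B) sum."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

# Two equal atoms separated by half a cell along a: their waves cancel
# for h = 1 (phases 0 and pi) and reinforce for h = 2 (phases 0 and 2*pi).
atoms = [(6.0, 0.0, 0.1, 0.2), (6.0, 0.5, 0.1, 0.2)]
F1 = structure_factor((1, 0, 0), atoms)   # magnitude ~ 0
F2 = structure_factor((2, 0, 0), atoms)   # magnitude ~ 12
I2 = abs(F2) ** 2                         # measurable intensity |F|^2
```

Only I2 (and hence |F2|) would be observable in an experiment; the phase of F2 is exactly what the detector discards.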
To summarize, the positions of the atoms in the unit cell determine the structure factors for the
diffraction pattern. The structure factor for each reflection specifies (1) its magnitude, which can be
measured (as the square root of the spot intensity), and (2) its phase, which cannot be measured.
The ability to calculate precisely what the structure factors should be once we have an atomic
structure is an invaluable property, as we shall see later. But, the absence of phase information
creates an immediate problem.
Without the phases, there is no easy way to work the problem backwards and calculate where the
atoms must have been in order to give rise to the observed structure factor magnitudes. This places
the crystallography problem in a class mathematicians refer to as inverse problems: the calculation is
easy to carry out in one direction but not in the other. If we do have the phases for the structure
factors, then it is easy to calculate an image of the contents of the unit cell. The essential problem in
crystallography then is to recover (at least approximate values for) the missing phases. This is known
in crystallography as the phase problem.
Before turning to the phase problem, we will simply point out how the contents of the unit cell can
be calculated from the structure factors, once the phases are known. The relationship is an inverse
Fourier transform, well known in physics and engineering problems; it describes essentially what a
lens would have done if we had one for x-rays:
ρ(x,y,z) = (1/V) Σ_(hkl) |F(h,k,l)| cos(α(h,k,l) − 2π(hx + ky + lz))
In this equation, |𝐹 (ℎ, 𝑘, 𝑙)| is the magnitude of the structure factor and 𝛼(ℎ, 𝑘, 𝑙) is its angular phase.
𝜌(𝑥, 𝑦, 𝑧) is the electron density in the unit cell at position x,y,z. To obtain the electron density at any
point x,y,z, a summation is required over all reflections h,k,l, emphasizing that information about the
electron density is distributed across all the reflections. We will say more about the electron density
calculation later, but recognizing the requirement for the phases for all the reflections (𝛼(ℎ, 𝑘, 𝑙)), we
turn now to the phase problem.
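As a sketch of how the synthesis works, the electron density sum can be coded directly. The reflection list and the unit cell volume below are toy values chosen for illustration; a real map would sum over thousands of reflections.

```python
import math

def electron_density(x, y, z, reflections, V=1.0):
    """rho(x,y,z) = (1/V) * sum over reflections of
    |F(h,k,l)| * cos(alpha(h,k,l) - 2*pi*(h*x + k*y + l*z)).
    Each reflection is given as ((h, k, l), Fmag, alpha_radians)."""
    total = 0.0
    for (h, k, l), Fmag, alpha in reflections:
        total += Fmag * math.cos(alpha - 2.0 * math.pi * (h * x + k * y + l * z))
    return total / V

# A single reflection with zero phase peaks at the origin and is most
# negative half a cell away along a:
refs = [((1, 0, 0), 2.0, 0.0)]
rho_origin = electron_density(0.0, 0.0, 0.0, refs)   # 2.0
rho_half   = electron_density(0.5, 0.0, 0.0, refs)   # -2.0
```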
Phasing and the phase problem
There are two essentially different ways the phase problem can be surmounted: (1) by methods
known as ‘molecular replacement’ and (2) by ‘heavy atom methods’ or variations thereon including
anomalous phasing; these make use of strong or unusual scattering from certain atoms, generally
heavier than those naturally present in proteins and nucleic acids.
Molecular replacement - Summarizing briefly, molecular replacement requires that a structure
already be known for a similar molecule, or perhaps part of the unknown molecule. Common
scenarios include when the structure is known for a homologous protein from another organism, or
where the structure is known for the alpha subunit from an alpha/beta heterocomplex, or where a
protein-ligand complex has been crystallized and the structure is already known for the protein by
itself. In the molecular replacement approach, if one can figure out how the approximate model or
‘search model’ should be placed in the unknown unit cell, then structure factors can be calculated
from this oriented search model. The reason for doing this is that the structure factor calculation
produces phases for the reflections. The phase values so obtained may not be very accurate, because
they come from some approximate model, but they are usually good enough. One proceeds by taking
the structure factor magnitudes from the observed diffraction pattern of the unknown crystal and
combining them with the approximate phases calculated from the search model. Those quantities,
placed in the equation above, can be used to produce an electron density map. Because the phases
do not contain information from the actual unknown structure, electron density maps calculated by
molecular replacement are typically biased to look like the search model. Overcoming this bias is a
thorny problem, but the speed and convenience of molecular replacement makes it an attractive
choice whenever it is possible. Of course if the crystallized molecule is unlike any known structure,
molecular replacement is not an option. In addition, a problem in molecular replacement we glossed
over is how the correct orientation and position can be identified for the search model in the unit cell.
The short answer is that if the search model is correctly placed then the structure factor magnitudes
calculated from that model (using the structure factor equation) should approximately match the
measured structure factor magnitudes. When the search model differs substantially from the
unknown structure, molecular replacement methods may fail, leaving only heavy atom and related
methods as viable routes.
Heavy atom methods – With heavy atom methods, one obtains estimates of the phases by doing
additional diffraction experiments after perturbing the atomic structure (by addition of heavy
atoms). Making additional measurements of the structure factor magnitudes from perturbed
versions of the crystal makes it possible to break the phase ambiguity. As an example, suppose you
were trying to determine the value of an unknown (signed) quantity, and you were told that its
absolute magnitude was 100. It could be +100 or -100, but you can’t tell from a single measurement
of its magnitude. But what if you were able to ask what the absolute magnitude would be after
perturbing it in a known way? What if you were told that if you added 5 to the number the result had
a magnitude of 95. Then you would conclude that the unknown value was -100, not +100. That is
the essence of heavy atom methods and its variations. The crystallographic phase problem is more
complicated because what is missing is not merely the sign of the value but its phase angle. But it
turns out to not be so much more complicated.
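The signed-number analogy can be written out in a few lines of code; the numbers are the hypothetical ones used in the text.

```python
def resolve_sign(mag, perturbation, perturbed_mag, tol=1e-9):
    """Given |x| = mag and |x + perturbation| = perturbed_mag, return the
    signed value of x consistent with both measurements (None if neither fits)."""
    for candidate in (+mag, -mag):
        if abs(abs(candidate + perturbation) - perturbed_mag) < tol:
            return candidate
    return None

x = resolve_sign(100.0, 5.0, 95.0)   # -100.0: only -100 + 5 has magnitude 95
```

The heavy atom plays the role of the known perturbation (+5), and the two diffraction experiments supply the two magnitudes.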
The typical view of the problem is to take a single reflection and think of its structure factor as being
a vector that lies on a circle of known radius, namely the structure factor magnitude that comes from
the square root of the measured reflection intensity. Each structure factor has a phase associated
with it – if the atomic structure was already known then that phase could be calculated directly from
the structure factor equation – but not knowing the phase means that we do not know where on the
circle the structure factor vector for that reflection points. But now, imagine we have briefly soaked
the protein crystal in a solution containing a heavy atom compound, say HgCl2 for example, and that
as a result the crystal had been modified in a uniform way, say with a single mercury atom bound to
an exposed cysteine thiol in each copy of the protein molecule. We could do a diffraction experiment
on this ‘derivatized’ crystal, and we would obtain slightly different structure factor magnitudes
compared to the native protein crystal. Now for each reflection we have two magnitudes, the native
protein magnitude (FP) and the derivative magnitude (FPH). Now the parallel to the scenario laid out
before should start to come into view. To proceed further we have to understand precisely what was
added to each structure factor of the native crystal to produce the derivative structure factor. In
other words, we need to know what the heavy atom contribution was. We know from the structure
factor equation that if we are able to determine the location of the heavy atom(s) within the unit cell,
then we can calculate directly what contribution the heavy atom(s) made to each structure factor.
Determining the heavy atom position(s) is a separate problem – known as ‘solving the heavy atom
substructure’ – which we will not discuss in this chapter as it requires specialized calculations and
analyses. Instead we will proceed to explain how knowing the heavy atom position(s) and being able
to calculate the heavy atom contribution makes it possible to determine the unknown phase for each
reflection. First, we have to recognize that it is the vector (or complex valued) quantities for the
structure factors that determine how they add together. For each reflection (h,k,l),
FPH = fH + FP
where fH is the heavy atom contribution, whose value including its phase (or real plus imaginary
components) can be calculated from the positions determined for the heavy atom(s). The behavior
of this equation is typically illustrated with a phase circle or Harker diagram, as shown. The FP and
FPH structure factors are depicted as circles, since their phases are unknown at the outset. But the
centers of those two circles are offset by a vector amount dictated by fH. Under that construction,
with the fH and FP vectors laid head to tail, one sees that the intersections of the two circles give
two possible solutions to the vector triangle addition equation. Either of two choices for the phase of
the native structure factor would agree with the diffraction data. So, with information from only one
heavy atom derivative (referred to as SIR, for single isomorphous replacement) we are still left with
an ambiguity about the correct choice of phase angle. It is possible to proceed with an
electron density calculation with an average of the two possible phase values, and sometimes this is
good enough, but clearly one would like to do better. The solution is MIR (multiple isomorphous
replacement). Additional heavy atom derivatives are sought, perhaps using different heavy atom
types. Further diagrams are not given here, but you can anticipate how collecting diffraction data on
a second derivatized crystal would produce, for each structure factor, a second heavy atom circle with
yet another origin (offset by a different heavy atom contribution). If the data are well measured,
there should be a point on the native protein phase circle where the other two circles nearly
intersect, and this gives a value for the native phase angle. In reality, the presence of
experimental errors makes the actual assessment of the best phase somewhat more involved.
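The Harker construction reduces to a law-of-cosines calculation on the vector triangle FPH = fH + FP. The sketch below, with made-up numbers, returns the two candidate phases for FP given |FP|, |FPH|, and a calculated complex heavy atom contribution fH.

```python
import cmath
import math

def sir_phase_candidates(FP_mag, FPH_mag, fH):
    """Two candidate phases (radians) for the native structure factor FP,
    from the vector triangle FPH = fH + FP, using the law of cosines:
    |FPH|^2 = |FP|^2 + |fH|^2 + 2|FP||fH| cos(alpha_P - alpha_H)."""
    fH_mag, fH_phase = abs(fH), cmath.phase(fH)
    cos_d = (FPH_mag**2 - FP_mag**2 - fH_mag**2) / (2.0 * FP_mag * fH_mag)
    d = math.acos(max(-1.0, min(1.0, cos_d)))   # clamp against rounding error
    return fH_phase + d, fH_phase - d

# Consistency check on a synthetic example: build FPH = FP + fH, then
# recover the phase of FP from magnitudes alone (up to the SIR ambiguity).
FP  = 3.0 * cmath.exp(0.7j)      # true phase 0.7 rad (unknown in practice)
fH  = 1.0 + 0.5j
FPH = FP + fH
candidates = sir_phase_candidates(abs(FP), abs(FPH), fH)
```

The two returned values are the two circle intersections; a second derivative (MIR) or anomalous data is what picks between them.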
In the last two decades, a large fraction of structures have been determined with variations on the
heavy atom method above (which dates to the first protein structures of myoglobin and hemoglobin
by the Kendrew and Perutz laboratories). These variations take advantage of the ‘anomalous’ x-ray
scattering of certain atoms, including many heavy atoms but also including lighter atoms, most
notably selenium. The twist is as follows. Ordinary heavy atom methods gain the additional
information required to break the phase ambiguity by measuring two different structure factor
magnitudes from two different crystals: the native protein crystal and the heavy atom derivatized
crystal. Anomalous scattering methods gain the information required for phasing from two structure
factors from the same crystal. We did not discuss it earlier, but for scattering from an ordinary crystal
(i.e. one not containing anomalously scattering atoms), the structure factor magnitudes for the (h,k,l)
reflection and the (-h,-k,-l) reflection are identical. This equality is broken by anomalously scattering
atoms, and extra information is then obtained by comparing the magnitudes of F(h,k,l) and F(-h,-k,-l)
for each reflection. Phasing approaches involving combinations of heavy atoms and anomalous
scattering are possible, leading to various acronyms (e.g. SIRAS for single isomorphous replacement
with anomalous scattering). The phase circle constructions in these various cases are different in
detail, but the main idea remains the same. An important feature of anomalous scattering methods
is that selenium atoms provide a relatively strong anomalous signal and can often be incorporated
seamlessly into a native protein by expressing the protein in bacteria grown on selenomethionine, a
powerful and general method developed by Wayne Hendrickson and colleagues.
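The statement about the (h,k,l) and (-h,-k,-l) pair can be checked numerically. In this sketch (arbitrary atom positions, chosen for illustration), form factors are real for normal scattering; giving one atom a complex form factor is a crude stand-in for anomalous scattering.

```python
import cmath

def F(hkl, atoms):
    """Structure factor for atoms given as (form_factor, x, y, z);
    the form factor may be complex to model anomalous scattering."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, x, y, z in atoms)

normal    = [(6.0, 0.1, 0.2, 0.3), (8.0, 0.4, 0.1, 0.7)]
anomalous = [(6.0, 0.1, 0.2, 0.3), (8.0 + 2.0j, 0.4, 0.1, 0.7)]

# Real form factors: |F(h,k,l)| = |F(-h,-k,-l)| exactly.
equal_diff  = abs(F((1, 2, 3), normal))    - abs(F((-1, -2, -3), normal))
# A complex form factor breaks the equality, providing phase information.
broken_diff = abs(F((1, 2, 3), anomalous)) - abs(F((-1, -2, -3), anomalous))
```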
Electron density maps: obtaining an atomic model
We have already seen the electron density equation by which we can calculate an electron density
function ρ(x,y,z), or 'map', once we measure the structure factor amplitudes and recover approximate
values for the missing phases. We will just note here that the two elements most critical to the quality
of the electron density map obtained are the accuracy of the phases and the resolution (minimum
value of d) to which the data extend. Electron density maps are shown here at resolutions of 3Å and
1.7 Å to emphasize what can and cannot be visualized. Depending on the quality of the phases, a
reliable tracing of the path of the backbone may require a resolution of about 3.0 Å or better. Side
chain identities become relatively clear at about 2.7 Å; prior knowledge of the amino acid sequence
is extremely valuable in most cases. At a resolution of about 1.7 Å, holes in ring structures may
become evident. And at about 1.1 Å, separate spherical densities for individual atoms appear.
Hydrogen atoms scatter weakly in x-ray diffraction, so at typical resolutions their positions are not
visualized but are instead inferred from geometric considerations.
Modern software packages attempt to
model an atomic structure, given the known amino acid or nucleotide sequence, into a calculated
electron density map. Such automatically traced models usually require several rounds of human
inspection and rebuilding, combined with automated minimization by computer, in order to obtain a
final model that is reliable. An important feature of x-ray crystallography is the ability to quantitate
the degree to which a final atomic model agrees with the observed data; the structure factor equation
makes this possible. The level of disagreement between the model and the observed structure factor
magnitudes is given as an ‘R-factor’, which is just a measure of the residual error on a fractional scale.
Macromolecules are complex. They are often flexible in ways that are hard to capture in a single
model, and they often have ordered solvent structure around their surfaces that can be difficult to
model. As a result, the R-value for a relatively good structure may be in the range of 20%, and worse
for structures determined at lower resolution. The final refined structure in a crystallography
experiment is usually also informed by known geometric restraints, based on known values for bond
distances and angles. In crystal structures reported in the literature, a standard crystallographic
table will report values for the R-factor along with deviations from ideal geometry. Additional entries
give further information by which an expert can assess the quality and likely accuracy of the reported
structure.
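The R-factor is just a normalized residual between observed and calculated magnitudes. A minimal sketch, with made-up magnitudes:

```python
def r_factor(F_obs, F_calc):
    """Crystallographic R-factor: sum ||Fobs| - |Fcalc|| / sum |Fobs|,
    over matched lists of structure factor magnitudes."""
    return (sum(abs(fo - fc) for fo, fc in zip(F_obs, F_calc))
            / sum(F_obs))

# A perfect model gives R = 0; a model off by ~10% on average gives R ~ 0.1.
R = r_factor([10.0, 20.0, 30.0], [12.0, 16.0, 30.0])   # (2 + 4 + 0) / 60 = 0.1
```

On this fractional scale, the ~20% figure quoted in the text for a good macromolecular structure corresponds to R ≈ 0.2.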
Protein Crystallization
Often the hardest part of a crystallographic project is obtaining good crystals. Surprisingly,
sometimes crystals that exhibit excellent morphology by microscopic examination turn out to diffract
poorly, presumably reflecting insufficient order on the atomic scale. Though there are important
theoretical considerations, protein crystallization is still largely an art. In effect, one drives the
protein or nucleic acid out of solution under many different conditions, perhaps thousands, searching
for conditions that give highly ordered crystals. The most common experimental setup is referred
to as hanging-drop vapor diffusion. In each essentially separate experiment, a tiny drop of protein
(typically from a tenth to a few microliters) is mixed with an equal volume of a ‘reservoir solution’,
which contains a precipitant of some kind (high salt or a crowding agent like polyethylene glycol), a
buffer to control pH, and possibly other compounds such as metal ions or organic compounds. Then
that mixed drop is hung upside down over the reservoir solution in a sealed chamber. Because the
precipitant is more concentrated in the reservoir, water evaporates from the protein drop and is taken
up by the reservoir. The protein solution thereby becomes more concentrated, and if the solubility
limit is exceeded the protein precipitates. Modern hanging drop experiments are
commonly set up by robotic liquid handling devices in 96 well plates. Many plates are typically set
up during the search for good crystals. With good luck, and assuming the protein or nucleic acid has
a sufficiently well-defined three-dimensional structure, crystals can be obtained. Examples of
crystals grown in hanging drops are shown. In most experiments, well-diffracting crystals have
linear dimensions in the 50 μm to 500 μm range. Crystals diffracting to good resolution have been
obtained for whole ribosomes, other giant complexes containing multiple protein and nucleic acid
subunits, membrane protein complexes, and numerous whole viral capsids including some with large
triangulation numbers. Size and complexity do not present a fundamental obstacle as long as a
defined three-dimensional structure is well-populated in solution. A final comment about protein
and nucleic acid crystals is that the molecules in these crystals are fully hydrated. In fact the water
content in typical protein crystals is in the 40% to 50% range. This is an important reason why the
structures obtained in the crystal state can be shown in most cases to be largely unaffected by crystal
formation, aside from local conformational effects where molecules contact each other in the crystal
lattice.