Computational Methods for
Characterising Disordered States of
Proteins
Jane R. Allison
A dissertation submitted for the degree of Doctor of Philosophy
Trinity College
University of Cambridge
13 September 2007
Declaration
The research outlined in this dissertation was carried out by its author at the
Department of Chemistry of the University of Cambridge between October 2004
and September 2007. The work described herein is the original work of the au-
thor and includes nothing which is the outcome of work done in collaboration
except where specifically indicated in the text. It has not previously been sub-
mitted to any institution for any qualification or degree. The length of this
dissertation does not exceed the word limit.
Jane Allison
Cambridge, England
September 2007
Acknowledgements
I would like to extend my thanks to everyone who helped me with the writing of
this thesis; it is impossible to mention you all individually. In particular, how-
ever, I acknowledge Chris Dobson for accepting me into his group and allowing
me the academic and personal freedom that has made my time in Cambridge
so special. Michele Vendruscolo provided more hands-on supervision, including
an inexhaustible supply of ideas and admirable patience regarding my complete
ignorance of statistical mechanics. Peter Varnai deserves mention for always
taking the time to read and discuss my work and ask me a myriad of ques-
tions. Barbara Richter also provided invaluable support and, in combination
with Amol Pawar, a seemingly boundless appreciation of my cooking.
The key experimental data used in this thesis were provided by Matt Ded-
mon, Rob Rivers and Neil Birkett. Additional data used for validation were
obtained from Carlos Bertoncini and Markus Zweckstetter, and the coil library
ensembles were generated by Abhishek Jha. Finally, it was the work of Kresten
Lindorff-Larsen that initiated the development and application of PRE-ERMD
which forms the basis of this thesis. Thanks must also go to the entire Dobson
group for their willingness to share their expertise.
On a more personal note, I thank my family for their encouragement and
for putting me in a position to take this opportunity. Of the many people in
Cambridge and elsewhere who have contributed to my life over the past three
years, the consistent support of Hope Johnston and the proof-reading efforts
and shared endorphin addiction of Erica Thompson were greatly appreciated.
Additionally, the various sports teams and crews that I have been involved with
provided a vital outlet and both challenged and maintained my sanity. Finally, it
remains to thank the Woolf Fisher Trust for providing the funding that allowed
me to study towards my PhD at Cambridge University.
Abbreviations
2D 2-dimensional
3D 3-dimensional
A alanine
Å Ångström
ANS 1-anilinonaphthalene-8-sulfonic acid
αS α-synuclein
βS β-synuclein
β+HC αS/βS construct
C8E5 n-octyl-penta(ethylene glycol)
Cα α carbon atom of an amino acid
Cβ first carbon of an amino acid side chain
CO carbonyl atom of an amino acid
CD circular dichroism
Cf compaction factor
δ chemical shift
∆δ secondary chemical shift
D aspartic acid
D2O deuterated water
DC distance comparison
DLB dementia with Lewy bodies
DNA deoxyribonucleic acid
DS disordered state(s)
DSS 2,2-dimethylsilapentane-5-sulfonic acid
E energy
E glutamic acid
EK kinetic energy
EPR electron paramagnetic resonance
ERMD ensemble-restrained molecular dynamics
ET electron transfer
F phenylalanine
FET fluorescence energy transfer
fs femtosecond
G glycine
GB generalised Born
GB/SA generalised Born/surface area
GndHCl guanidine hydrochloride
H hydrogen
Hα hydrogen atom attached to α carbon
HC hydrophobic core
HCl hydrochloric acid
HSQC heteronuclear single quantum coherence
Hz Hertz
IDP intrinsically disordered protein
INEPT insensitive nuclei enhanced by polarization transfer
Iox intensity of peak when spin-label is in oxidised (paramagnetic) state
Iox/Ired intensity ratio
Ired intensity of peak when spin-label is in reduced (diamagnetic) state
3J-coupling scalar 3-bond coupling
3JHNHα scalar 3-bond coupling between the amide and Cα hydrogens
kcal kilocalorie
K lysine
K degrees Kelvin
L lower bound on PRE distance restraint
L leucine
M methionine
M molar
MC Monte Carlo
MD molecular dynamics
Mes 2-(N-morpholino)ethanesulfonic acid
mM millimolar
mol mole
ms milliseconds
MTSL 1-oxyl-2,2,5,5-tetramethyl-3-pyrroline-3-methyl methanethiosulfonate
N nitrogen
N asparagine
NAC non-amyloid β component
NaCl sodium chloride
NaOH sodium hydroxide
NFP natively folded protein
NMR nuclear magnetic resonance
nOe nuclear Overhauser effect
ns nanoseconds
NS native state(s)
P proline
PB Poisson-Boltzmann
PD Parkinson’s disease
PDB protein data bank
PFG-NMR pulsed field gradient nuclear magnetic resonance
φ dihedral angle about the N-Cα bond of a polypeptide
PI3-SH3 bovine phosphatidylinositol-3’-kinase SH3 domain
PMF potential of mean force
PPII polyproline II
PRE paramagnetic relaxation enhancement
PRE-ERMD ERMD using distance restraints derived from PRE-NMR
PRE-NMR paramagnetic relaxation enhancement NMR
ps picoseconds
ψ dihedral angle about the Cα-CO bond of a polypeptide
Q glutamine
R1 longitudinal relaxation rate
Rsp1 paramagnetic enhancement of the longitudinal relaxation rate
R2 transverse relaxation rate
Rred2 transverse relaxation rate in diamagnetic conditions
Rsp2 paramagnetic enhancement of the transverse relaxation rate
RCP residual contact probability
RDC residual dipolar coupling
Rg radius of gyration
Rh hydrodynamic radius
rms root-mean-square
S serine
SASA solvent accessible surface area
SAXS small-angle X-ray scattering
SD standard deviation
SDS sodium dodecyl sulfate
SDSL site-directed spin-labelling
SE statistical error
SH3 Src homology 3
SPC-SH3 chicken α-Spectrin SH3 domain
T threonine
T simulation temperature
τc correlation time of the electron-proton vector
TS transition state
µM micromolar
µs microseconds
U upper bound on PRE distance restraint
UV ultraviolet
V valine
vdw van der Waals
χ1−5 dihedral angles of the MTSL spin-label
Y tyrosine
Zagg predicted aggregation propensity
Zprofagg predicted aggregation propensity profile
Abstract
To obtain a complete understanding of the behaviour of proteins it is necessary
to characterise all accessible conformations. This includes not only folded struc-
tures, but also the partially and fully unfolded states populated during folding
and mis-folding. The existence of intrinsically disordered proteins (IDPs) adds
a further category.
The heterogeneous range of structures comprising disordered states (DS)
presents a challenge for structure determination, making an ensemble descrip-
tion essential. Recent advances in techniques such as nuclear magnetic resonance
(NMR) spectroscopy allow site-specific structural information to be obtained
for DS. Experimental observables, however, are time- and ensemble-averages,
whereas definition of an ensemble requires knowledge of the underlying distribu-
tions. Simulations can complement experiments by providing such information.
Consequently, this thesis focuses on the development of computational meth-
ods for characterising DS of proteins. Firstly, a range of existing techniques of
varying degrees of accuracy are tested. Producing structures that are suffi-
ciently expanded proves a major difficulty, and even when this is overcome, the
structures remain incorrect. Long-range distances derived from paramagnetic
relaxation enhancement (PRE)-NMR are therefore incorporated into ensemble-
restrained molecular dynamics (ERMD) simulations to modulate the accessible
conformations. The initial tests are conducted using synthetic data so that the
success of the simulations can be evaluated by comparing distributions as well as
averages. The methodology is improved to account for the anomalous effects of
restraining a highly non-linear average across a limited number of replicas and
the inability of a single type of average to report on the underlying distribution.
The conversion of experimental data into distance restraints is also refined. The
resulting general method is applied using experimental data for three IDPs and
the acid-denatured state of a natively folded protein and new analysis methods
are introduced. The use of ERMD allows the aggregation propensities of the
proteins to be rationalised in terms of the nature of their residual structure.
Contents
Declaration
Acknowledgements
Abbreviations
Abstract
Contents
1 Introduction
1.1 Unfolded and partially folded states of proteins
1.1.1 Protein folding
1.1.2 Protein mis-folding and aggregation
1.1.3 PI3-SH3
1.2 Intrinsically disordered proteins
1.2.1 α-synuclein and β-synuclein
1.3 Methods for characterising disordered states
1.3.1 Experimental methods
1.3.2 Theoretical representations
1.3.3 Biomolecular simulations
1.3.4 Ensemble-restrained molecular dynamics
1.4 Overview
2 Methods
2.1 Simulation methods
2.1.1 Unrestrained simulations
2.1.2 ERMD
2.2 Restraints for PRE-ERMD
2.2.1 Calculation of distances from experimental data
2.2.2 Calculation of distances from reference ensembles
2.2.3 Accounting for uncertainty in PRE distance restraints
2.3 Analysis methods
2.3.1 Back-calculation of experimental observables
2.3.2 Statistics
2.3.3 Correlation of distance distributions
2.3.4 Distance comparison maps
2.3.5 Free energy landscapes
2.3.6 Ramachandran plots
2.3.7 Predicted properties
3 Simulation of disordered states of proteins
3.1 Introduction
3.2 Random coil model
3.3 Explicit solvent
3.4 Implicit solvent
3.4.1 Physiological temperature
3.4.2 Methods for generating expanded structures
3.5 Comparison with experimental data
3.6 Conclusions
4 Improving the accuracy of ensemble-restrained molecular dynamics
4.1 Introduction
4.2 Theoretical aspects of ERMD
4.3 Definition of PRE distances
4.4 Preliminary results
4.4.1 Generation of reference ensembles
4.4.2 Absence of correlated motions
4.4.3 Calculation of synthetic distance restraints
4.4.4 Application of PRE-ERMD
4.5 Improvement of the PRE-ERMD method
4.5.1 Cross-validation against multiple observables
4.5.2 Explanation of the compaction problem
4.5.3 Solving the compaction problem
4.6 General protocol for PRE-ERMD
4.6.1 Additional modes of validation
4.7 Conclusions
5 Comparison of the solution state ensembles of α-synuclein, β-synuclein and β+HC
5.1 Introduction
5.2 Factors influencing the calculated distances
5.2.1 Correlation time
5.2.2 Transverse relaxation rate
5.2.3 Intensity ratio
5.3 Choice of optimal T for characterisation by PRE-ERMD
5.4 Global dimensions
5.5 Characterising residual structure
5.5.1 Distance comparison maps
5.6 Residual structure of αS, βS and β+HC
5.6.1 Comparison of the re-calculated and previously published αS ensembles
5.6.2 Long-range structure of βS and β+HC
5.6.3 Structural propensities of the C-terminus
5.6.4 Structural propensities of the N-terminus
5.6.5 Dihedral angle preferences
5.6.6 Comparison with experimental data
5.6.7 Free energy maps
5.7 Implications for aggregation
5.8 Conclusions
6 Characterisation of the acid-denatured state of PI3-SH3
6.1 Introduction
6.2 Experimental PRE-NMR data implies non-native structure
6.3 Choice of optimal T for characterisation by PRE-ERMD
6.4 Global dimensions
6.5 Residual structure
6.5.1 Comparison of the native and acid-denatured states
6.5.2 Structural propensities of the acid-denatured state
6.5.3 Comparison with experimental data
6.5.4 Free energy maps
6.6 Implications for aggregation
6.7 Conclusions
7 Conclusions
References
Chapter 1
Introduction
Proteins are the primary constituent of the interwoven biochemical networks
that make up living organisms1, enabling and controlling virtually every chem-
ical process that takes place in the cell2. A complete description of the mech-
anistic link between proteins and physiology is therefore essential in order to
understand not only the normal functioning of biological entities, but also the
myriad of pathological conditions that result from protein malfunction3.
Because a protein’s function is inherently linked to its three-dimensional
(3D) structure, it has been a long-standing goal in structural biology to under-
stand the relationship between a protein’s amino acid sequence and its structural
properties. This traditionally took place within a structure-function paradigm
based on the assumption that a given amino acid sequence dictates a single,
rigid structure upon which the function is entirely dependent4–6. Such a con-
clusion was inevitable given the use of X-ray crystallography as the principal
method to study protein structure, as by definition this is a purification process
that leads to the isolation of conformationally homogeneous molecules5. Addi-
tionally, enzymes, the traditional focus of biochemical studies, are proteins for
which the concept of a unique 3D structure is most tenable7,8. Thus the pro-
tein data bank (PDB), which is dominated by enzymes9 and other proteins that
have been successfully crystallized, does not constitute a representative sample
of the types of structures adopted by proteins in solution8.
More recently, a variety of solution-based techniques, in particular nuclear
magnetic resonance (NMR), have challenged the “one sequence - one structure
- one function” paradigm by revealing the conformational diversity exhibited by
proteins in solution. This includes bond bending and stretching, fluctuation of
side chains, movement of loops and elements of secondary structure and even
global tertiary structure rearrangements5. It is now widely acknowledged that
proteins are better represented as a probability distribution of conformations
rather than as a single structure. For ‘natively folded proteins’ (NFPs), which
fold into a well-defined and compact globular structure and exhibit only a mod-
est range of motion, the width of these distributions may be relatively narrow.
An increasing number of proteins, however, are being shown to be unstructured
under physiological conditions4,10,11. Additionally, partially folded and fully
unfolded states of NFPs are of interest due to their central role in defining the
conceptual framework for protein folding and mis-folding12–14. An ensemble
representation is essential for the characterisation of such states, which com-
prise dynamic ensembles of interconverting structures and thus are described
by much broader probability distributions7,15 than NFPs.
In the following sections, the importance of characterising disordered states
(DS) of proteins is elaborated on, and the characteristics and roles of intrin-
sically disordered proteins (IDPs) are outlined. The model systems studied in
this thesis are introduced within the context of the type of DS epitomised by
each. Experimental and computational methods for describing DS are discussed
along with currently accepted concepts of the nature of DS and the supporting
evidence. Finally, ensemble-restrained molecular dynamics (ERMD), a method
for combining theory and experiment to gain a fuller understanding of the struc-
tures accessible to DS at a molecular level, is described. Development of this
method and its application forms the basis of the work described in the subse-
quent chapters of this thesis.
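The need for an ensemble description can be illustrated numerically. Experimental observables report averages over many conformations, and for distances inferred from PRE data the average is generally taken to weight each conformation by r^-6. The following minimal Python sketch assumes this r^-6 weighting and uses arbitrary, hypothetical distances and populations rather than any data from this thesis. It shows that two very different distance distributions can give the same averaged distance.

```python
import numpy as np


def r6_average(distances, weights):
    """Return <r^-6>^(-1/6) for a weighted set of distances (in Angstrom)."""
    distances = np.asarray(distances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise the populations
    return np.sum(weights * distances ** -6) ** (-1.0 / 6.0)


# Hypothetical ensemble A: every conformer has the same distance of 20 A.
target = 20.0
uniform = r6_average([target], [1.0])

# Hypothetical ensemble B: a minor compact state (12 A) and a major
# expanded state (40 A). The compact population p is solved for so that
# the r^-6-weighted average matches ensemble A.
close, far = 12.0, 40.0
p = (target ** -6 - far ** -6) / (close ** -6 - far ** -6)
mixed = r6_average([close, far], [p, 1.0 - p])

# The arithmetic mean distance of ensemble B, for comparison.
linear_mean = p * close + (1.0 - p) * far

print(f"compact population p      = {p:.3f}")
print(f"r^-6 average, ensemble A  = {uniform:.2f} A")
print(f"r^-6 average, ensemble B  = {mixed:.2f} A")
print(f"arithmetic mean, ensemble B = {linear_mean:.2f} A")
```

With these assumed values, a compact population of roughly 5% is sufficient to pull the r^-6-weighted average of the two-state ensemble down to 20 Å, although its arithmetic mean distance remains close to 39 Å. An averaged distance alone therefore cannot distinguish between the two ensembles.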
1.1 Unfolded and partially folded states of
proteins
Unfolded states of proteins are the reference state from which both folding into
the native state (NS) and mis-folding into disease-related aggregates such as
amyloid fibrils are initiated. This lends a fundamental motive to the character-
isation of unfolded states; namely, to explain why proteins predominantly fold
to their globular native structures rather than mis-folding into oligomers and
fibrillar aggregates. Additionally, both of these processes may in some cases pro-
ceed via partially folded intermediates3,16–20, which are therefore also of interest
with respect to understanding the mechanisms of folding and mis-folding.
1.1.1 Protein folding
In order to carry out their biological functions, NFPs must fold into a unique
3D structure. When studied in vitro, folding is initiated from a highly unfolded
state, and it is likely that a similar situation occurs in vivo. An explanation
for the ability of a nascent polypeptide chain to fold rapidly and precisely into
its native fold has long eluded structural biologists. The defining issue, termed
‘Levinthal’s Paradox’14, is the failure of a random search for the native fold
among the vast number of possible conformations accessible to even a small
protein to account for the observed time-scale of folding, which is of the order
of milliseconds to seconds. A proposed solution, which reconciles the stochastic
nature of the folding process with its robustness, is that folding is initiated by
a nucleation event21–25. This establishes a critical core, which then drives the
formation of the remainder of the structure21,26–28. By extension, larger pro-
teins fold by coalescence of substructures that are already partially preformed
according to the same principles21–24,27,29,30, making folding a hierarchical pro-
cess. Such a mechanism implies that the conformational free energy landscape
has been sculpted by evolution so as to allow efficient folding.
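The scale of the random search problem can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes a 100-residue chain, three conformations per residue and a sampling rate of 10^13 conformations per second. These values are illustrative assumptions of the kind commonly used for such estimates and are not taken from the work cited above.

```python
import math

# Illustrative assumptions (not values from the thesis):
residues = 100            # chain length
states_per_residue = 3    # conformations available to each residue
sampling_rate = 1e13      # conformations sampled per second

# Work in log10 throughout to keep the very large numbers readable.
log10_conformations = residues * math.log10(states_per_residue)
log10_seconds = log10_conformations - math.log10(sampling_rate)

seconds_per_year = 365.25 * 24 * 3600
log10_years = log10_seconds - math.log10(seconds_per_year)

print(f"conformations to search: ~10^{log10_conformations:.1f}")
print(f"exhaustive search time:  ~10^{log10_years:.1f} years")
```

Under these assumptions an exhaustive search would take of the order of 10^27 years, compared with observed folding times of milliseconds to seconds.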
To establish the factors responsible for the initiation of folding, a descrip-
tion of the unfolded state of a protein in terms of its constituent structures and
their relative populations is required31. Characterisation of the partially folded
states that occur during the folding process is a further prerequisite for the elu-
cidation of protein folding mechanisms. Small proteins often fold in a two-state
manner, passing through a high energy transition state (TS) which has been
shown to be native-like in many cases25. Larger proteins may form transient
intermediates which may contain non-native as well as native-like structure32.
The probability distributions describing such states are generally broader than
those of the native fold, but narrower than those of fully unfolded states.
1.1.2 Protein mis-folding and aggregation
It has recently become apparent that the highly individualistic globular
structure of an NFP is not the only stable ordered state accessible. An increasing
number of proteins have been shown to mis-fold into amyloid fibrils3. The ag-
gregation of these species is implicated in a number of debilitating diseases,
including Alzheimer’s disease, Parkinson’s disease (PD), type II diabetes, vari-
ant Creutzfeldt-Jakob disease and bovine spongiform encephalopathy33–35. Ad-
ditionally, many polypeptides unrelated to disease can form amyloid fibrils in
vitro under specific conditions3. Whilst it was the detection of amyloid fibrils in
pathological studies that led to their association with disease, it has been sug-
gested that the fibrils themselves are not the toxic species, but a sequestration
mechanism. The observation that the early pre-fibrillar aggregates are highly
damaging to cells, but the mature fibrils are relatively benign18,36–38 provides
support for such a hypothesis.
The highly organised core of amyloid fibrils consists of β-sheets whose strands
run perpendicular to the fibril axis39. The fibril structure is stabilised primarily
by interactions involving the polypeptide backbone. This, along with the fact
that seemingly any protein can be induced to form fibrils, has led to the be-
lief that the ability to form fibrils is a generic feature of proteins, although the
propensity to form such structures depends on a subtle interplay between the
protein in question and the conditions in which it is studied3,40. It is of interest,
therefore, to consider how disease-related mutations and changes to the envi-
ronment affect the aggregation propensity of a given protein. In Chapter 6, the
latter factor is investigated with respect to the model system described below.
1.1.3 PI3-SH3
The Src homology 3 domain from bovine phosphatidylinositol-3’-kinase
(PI3-SH3) is an example of an NFP for which characterising the unfolded state is
important with respect to both folding and mis-folding.
PI3-SH3 is an 84 residue globular protein which consists of two perpendicular
antiparallel β-sheets of three and two strands, respectively, and two helix-like
turns, arranged into a β-barrel41 (see Figure 6.2 A). It is a member of the SH3
family, a set of small protein modules of around 60–85 residues which mediate
intra-cellular signal transduction42–44. Despite their low sequence homology,
the family of SH3 domains all exhibit a common fold which has been well-
characterised by both NMR spectroscopy and X-ray crystallography.
At neutral pH, PI3-SH3 folds cooperatively and reversibly in a two-state
manner with no intermediates45. The folding TSs of three other SH3 domains
have been shown to contain predominantly native-like structure46 and it is likely
that a similar situation occurs for PI3-SH3.
Under folding conditions, the unfolded state is not stable, thus methods
such as acid-denaturation are used to generate unfolded states from which to
initiate folding studies. After prolonged incubation at low pH, however, PI3-
SH3 aggregates into amyloid fibrils47–51. Characterisation of the acid-denatured
state of PI3-SH3 is therefore important for understanding the factors controlling
the competition between folding and mis-folding.
The fact that PI3-SH3 does not aggregate at neutral pH combined with the
pronounced lag phase of several days indicates that the acid-induced destabilisation
of the NS is a prerequisite for fibril formation49. Moreover, the susceptibility
of acid-denatured PI3-SH3 to proteolysis is enhanced during the initial stages
of aggregation, suggesting that further unfolding from the acid-denatured state
occurs prior to formation of ordered aggregates, which are almost completely
resistant to proteolytic attack52.
Additional evidence for the requirement of unfolding prior to aggregation
comes from a 3D reconstruction of the fibril structure from cryo-electron mi-
croscopy data50, which provided the first glimpse of amyloid fibril structure.
The fibrils consist of a double helix of two protofilament pairs wound around a
hollow core. The 20 Å-wide protofilaments can only contain two flat β-sheets,
which must be oriented differently to those of the native fold to ensure that all
of the strands are perpendicular to the fibre axis. Although the strands may
occur in similar regions of the polypeptide chain to those of the native fold, the
native structure of PI3-SH3 at neutral pH is too compact to fit into the fibril
density, thus it must unfold to adopt a more extended conformation.
The aggregation and structural properties of the acid-denatured state formed
at pH 2.0 have been widely studied. Under these conditions, PI3-SH3 is sub-
stantially unfolded relative to the native fold at neutral pH according to far-
and near-UV circular dichroism (CD) and 1H-NMR, although the binding of
the hydrophobic dye 1-anilinonaphthalene-8-sulfonic acid (ANS) suggests that
there is a partially formed hydrophobic core47. Whilst the acid-denatured state
is more expanded than the native fold, it is still relatively compact compared
to the fully unfolded protein denatured in guanidine hydrochloride (GndHCl)47.
Interestingly, pH titration of PI3-SH3 showed that although the hydrody-
namic radius (Rh) initially increases as the pH is lowered, it reaches a maxi-
mum at around pH 2.4 and decreases again thereafter48. The CD ellipticity at
200 nm follows a similar pattern as a function of pH. The nature of the aggre-
gation process is also dependent on the pH: at pH values less than 2.0 (1.2 and
1.5), amorphous aggregates are rapidly formed, and the aggregation product
includes only a small number of short fibril-like structures, whereas at higher
pH values (2.0 and 2.7) there is a long lag phase, but the final aggregates consist
of morphologically well-defined fibrils. It is thought that these effects are due
to the screening of positive charges by anions at pH values less than 3.0 rather
than changes in the ionisation state of the protein, as it is unlikely that any of
the amino acids in a denatured protein have a pKa below 2.448. Such screen-
ing reduces intra- and inter-molecular repulsion, favouring protein compaction
and aggregation. At higher pH values, where there are fewer positively charged
side-chains, or at lower ionic strength, where the screening is less effective, the
aggregation occurs more slowly allowing well-organised fibrillar structures to
form.
At first sight, therefore, it appears strange that the GndHCl-denatured state
of PI3-SH3 does not aggregate, as it is even more unfolded than the acid-
denatured state and the ionic strength in these conditions is much higher. Gnd-
HCl, however, interacts preferentially with backbone CONH groups53, increas-
ing their solubility in aqueous solution and thus negating the energetic benefits
of forming not only native contacts, but also the intermolecular interactions
that lead to aggregation.
The role of charge in controlling the aggregation process may be why PI3-
SH3 was, until recently, the only member of the SH3 family known to form
amyloid fibrils. The long n-Src loop unique to PI3-SH3 was shown not to be
responsible for its differing amyloidogenic properties, as insertion of this region
into the chicken α-Spectrin SH3 domain (SPC-SH3), which has the same fold
and 24% sequence identity with PI3-SH3, does not induce fibril formation54.
Instead, insertion of six amino acids from the diverging turn and adjacent RT
loop of PI3-SH3 into SPC-SH3 results in an aggregation phenotype similar to
that of PI3-SH351. Replacement of two residues in this region of PI3-SH3
with those most highly represented in SH3 domains, which increases the net
charge by +2, significantly reduces the aggregation propensity. Other mutations
that increase the charge in this region also prevent aggregation, but addition of
charged residues to the N-terminus does not, indicating that this region plays a
key role in the aggregation of PI3-SH3. Two other SH3 domains lacking the two
conserved basic residues at the diverging turn, the human and Drosophila Abl-
SH3 domains, also aggregate into amyloid fibrils55. The only other SH3 domain
known to form fibrils, c-Yes-SH3, only aggregates in acidic conditions56, further
highlighting the importance of charge in the aggregation of SH3 domains.
As well as elucidating the specific mechanism of fibril formation by PI3-SH3,
characterisation of the acid-denatured state of PI3-SH3 provides an opportunity
to gain insight into the generic determinants of protein aggregation. PI3-SH3
is one of a growing number of model systems used to study the mechanism of
amyloid fibril formation and the toxicity and structural properties of mature
amyloid fibrils47–51,57. Whilst PI3-SH3 is not related to any known patholog-
ical condition and does not form amyloid fibrils in vivo, the fibrils formed in
vitro are morphologically identical to fibrils formed by amyloid disease-related
proteins47. Additionally, early granular aggregates formed by PI3-SH3 exhibit
substantial cytotoxicity18. Together, these observations lend support to the
argument that the ability to form amyloid fibrils is a generic property of the
polypeptide backbone that can be induced in any polypeptide chain given the
right conditions3,58. The investigations of the acid-denatured state of PI3-SH3
described in Chapter 6 are therefore of general interest with respect to the
determinants of protein structure.
1.2 Intrinsically disordered proteins
The unfolded and partially folded states of NFPs are seldom stable under phys-
iological conditions, thus they must be stabilised by artificial means such as
increasing the temperature, introducing mutations, adding chemical denatu-
rants, or, as for PI3-SH3, altering the pH. In contrast, IDPs are fully or par-
tially unfolded under physiological conditions4,6,10,11,59–75, so that their char-
acterisation divulges facets of the relationship between sequence and struc-
ture not encountered in the study of NFPs. Some IDPs undergo a transi-
tion to a more ordered state upon binding to their biologically relevant lig-
ands such as metals, polyamines, small polypeptides, other proteins or mem-
branes4,5,7,10,11,60–63,65,66,76,77. The function of IDPs is dependent on their
highly flexible nature7,72,77, demonstrating that a defined 3D structure is not a
prerequisite for function4.
Statistical analysis has shown that the sequences of IDPs are significantly
different to those of NFPs. In particular, they exhibit a low sequence complexity,
with relatively few of the bulky hydrophobic residues typically found in the core
of folded globular proteins, and a high proportion of polar and charged amino
acids11,73,75,78–85. The resultant charge-charge repulsion and lower driving force
for hydrophobic collapse may explain their disorder10,62,83.
The compositional bias of IDPs provides a distinct signature for disordered
regions in sequence space that has formed the basis of various algorithms that
predict disorder based on sequence7,8,65,70,78,79,82,86–89. These predictors have
shown that disorder is ubiquitous10,68,76,79,86 in all kingdoms of life68, emphasising
the importance of characterising this class of proteins. A significant proportion
of the proteomes of eukaryotes, eubacteria and archaea is predicted to
comprise disordered regions of more than 40 contiguous residues4,8,69,84,90. The
exact proportions vary from study to study, however, and may in some cases be
overestimates83, as ligand-induced structure91 as well as the crowded nature of
the intracellular environment5,11,92–96 have been shown to increase the degree
of structure exhibited by such proteins.
IDPs participate in numerous non-catalytic interaction-based biological func-
tions10,11,59,60,62,63,65,66. These include protein-nucleic acid
interactions7,8,62,84,97,98 during transcription4,62 and translation4,11,61,62,76 and
protein-protein interactions99 that contribute to cellular scaffolding4,84, ion
binding84 and vesicle fusion100 and regulate signal transduction11,61,62,74,76,101,102
and the cell cycle4,11,61,76. In fact, it has been shown that the hubs in the
scale-free protein-protein interaction networks that define cellular function are
typically unstructured or partially structured proteins65. In contrast, enzymes
rarely contain disordered regions, especially those involved in biosynthesis and
metabolism8,74, the prime exception being regulatory kinases8. The prevalence
of disordered regions in the genomes of eukaryotes is therefore likely to be a
consequence of the increased need for cell signalling and regulation in higher
organisms65,90.
The means by which IDPs carry out their functions are intrinsically linked to
their unique physical characteristics103. The relatively large solvent-accessible
surface areas (SASA) of extended disordered structures make large surfaces
available for intermolecular interactions59. Provision of an equivalently sized
interface by a structured protein would require a 2−3-fold increase in molecular
weight, resulting in either increased cellular crowding or an enlargement
of cell size by 15−30%59,62. Moreover, the coupling of folding with binding83
affords low affinity but high specificity binding7,59,98, thus providing fine ther-
modynamic control. Such easily reversible binding is a fundamental requirement
for signalling104.
The conformational heterogeneity of IDPs allows functional diversity at a
single-protein level5,7,11,65,105 in a natural extension of the familiar principle
of allostery5. Alternative splicing may further increase the range of possible
conformations106. The binding of multiple partners by a single protein is a key
feature of the role of IDPs as network hubs; different regions of the protein
can participate in different pathways to avoid cross-talk65. Additionally, con-
formational disorder provides a mechanism for controlling protein activation5.
Dynamic flexibility facilitates post-translational modifications such as phospho-
rylation and ubiquitination, which are common regulatory mechanisms, as the
substrate protein can conform more easily to the active site of the modifying
enzyme7. Furthermore, the rapid proteolytic degradation of IDPs permits fast
and accurate responses to changes in the environmental conditions3,4,62,65. It
has also been shown that allosteric coupling is maximised when one or more of
the coupled domains is intrinsically disordered107.
It has been proposed that the conformational diversity and functional promis-
cuity of IDPs are necessary for the co-evolution of protein fold and function5,
and thus may be evolvable traits. The expansion of internal repeat regions may
power such evolution71. However, there are also some disadvantages to disorder.
IDPs appear to be related to the promotion and proliferation of protein-folding
diseases5,62,74,108 including many neurological conditions. Their role as hubs in
signalling networks means that mis-function has serious consequences, including
the development of cancer8,74. Characterisation of IDPs is therefore important
for understanding both the normal functioning and pathogenesis of biological
entities.
1.2.1 α-synuclein and β-synuclein
α-synuclein (αS) and β-synuclein (βS) are IDPs67,108–110 and so are disordered
in solution4,10,11,59,62,63. Despite being closely related, αS forms amyloid fibrils
in vivo whereas βS does not. The characterisation of these proteins along with
a related construct, β+HC (see below), described in Chapter 5, provides an
opportunity to gain insight into the determinants of intrinsic disorder and the
factors that govern the aggregation of IDPs at a molecular level.
αS and βS are members of the synuclein family of proteins, all of which are of
similar size (∼127−140 residues)10,111,112. They show 62% sequence identity,
mostly due to the conserved imperfect repeats of the KTKEGV lipid-interaction
domain in their N-termini. Despite considerable sequence divergence, the C-
termini of both proteins contain a large number of acidic residues. Perhaps the
most important difference between the two proteins is the absence in βS of 12
mostly hydrophobic residues from within the central non-amyloid β component
(NAC) region (residues 61−95) of αS113.
Upon binding to lipid membranes and mimetics such as sodium dodecyl
sulfate (SDS) micelles the N-terminus of each protein forms two anti-parallel
α-helices, with a break around residue 40114–120. The C-terminus remains dis-
ordered in the lipid-bound state114–122, which is thought to facilitate its inter-
actions with a variety of binding partners64,65. Although the physiological roles
of αS and βS remain unclear123, several pieces of evidence suggest that their
functions are mediated by lipid binding114,121,124–129.
αS forms amyloid fibrils both in vitro and in vivo, where it is the primary
constituent of the amyloid plaques found in PD and the related dementia with
Lewy bodies (DLB)130–133. The causative link has been further strengthened by
the identification of three missense mutations of αS and a gene triplication, all
of which lead to familial PD and DLB134–139. The pre-fibrillar aggregates rather
than the mature fibrils appear to be the cytotoxic species140–142; accordingly,
protofibril formation is accelerated by the PD-linked mutations141,143. In con-
trast to αS, βS does not aggregate in vivo144 and requires specific conditions
to induce in vitro aggregation110,145–147. In fact, it may inhibit αS aggrega-
tion148,149.
The contrasting aggregation profiles of αS and βS may be partly due to the
differences in the C-termini150, although this region is known to reduce the ag-
gregation propensity of both proteins145,151. C-terminal-truncated forms of αS
are key components of Lewy body deposits131,133,152 and C-terminal-truncated
mutants of both αS and βS aggregate more readily in vitro145,151,153,154. The
C-termini also have much lower predicted aggregation propensities than the re-
mainder of either sequence145. The major difference between the two proteins
is the greater number of negatively charged residues in the C-terminus of βS,
which may facilitate electrostatic interactions with the N-terminus, thus reducing
the exposure of the central NAC region and inhibiting aggregation. However,
a construct containing residues 1−97 of αS and residues 87−134 of βS forms
fibrils at a similar rate to wild-type αS in vitro, indicating that differences in the
C-termini are unlikely to be solely responsible for the differences in aggregation
propensity.
The absence of residues 73−83 of αS from βS147,155 appears to be a more
likely cause of the different aggregation properties. These residues lie within the
central NAC region, which forms the core of αS amyloid fibrils155 and is neces-
sary for fibril formation, particularly residues 66−74156. Residues 71−82 of αS
aggregate alone155,157 whereas αS∆71-82, in which residues 71−82 have been re-
moved from wild-type αS, does not aggregate under physiological conditions155.
On the other hand, αS∆73-83 forms fibrils even faster than wild-type αS145
and αS∆83 has an extremely high predicted aggregation propensity (Zagg)145.
Moreover, the Zagg and measured aggregation rate of the α/β construct stud-
ied in this thesis, β+HC, in which the 11 residue hydrophobic core from αS
(residues 73−83) that is missing from βS is inserted into βS following residue
72, are closer to those of βS than αS145. These observations have led to the
idea that the negatively charged E83 may act as a ‘gatekeeper’ residue145,155,156,
preventing or reducing aggregation by breaking up the stretch of hydrophobic
residues in the NAC region and thus disrupting hydrophobic inter-molecular
interactions.
The contrasting aggregation behaviour of αS and βS despite their high se-
quence homology and similar lipid-bound structures implies that the key to
understanding their differences lies in the solution state from which both lipid
binding and aggregation are initiated, hence furnishing the motivation for their
study by ERMD simulations (Chapter 5). A variety of experimental data have
been gathered for both αS and βS in solution109,158,159. The N-termini of both
proteins exhibit helical propensity, although to differing degrees. For βS, there
are two distinct regions of higher helical propensity, comprising residues 20−35
and 55−65158. In αS, residues 6−37 have the greatest helical propensity109, al-
though the region of helical propensity extends further towards the N-terminus
than in βS159. There is some suggestion that the break between the helices
that occurs in the lipid-bound structure of βS also occurs to some extent in the
solution structure158. The C-terminus of βS appears to form transient polypro-
line II (PPII) structure145,158, whereas the C-terminus of αS is more disordered.
This difference, which implies an increased stiffness of βS, is most likely due to
the higher negative charge (-16 compared to -14) and greater number of proline
residues (8 compared to 5) in the C-terminus of βS. The comparison of the
ensembles of structures representative of αS, βS and β+HC in solution with
each other and with the experimental data outlined above carried out in Chap-
ter 5 allows the relationship between the differing structural propensities and
aggregation properties of the three proteins to be clarified.
1.3 Methods for characterising disordered states
1.3.1 Experimental methods
The heterogeneity of DS poses severe methodological challenges to their study
by both experimental and computational methods. Traditional structure de-
termination methods such as X-ray crystallography are inappropriate for char-
acterising dynamic ensembles. Various solution spectroscopy techniques have
been applied160, although the wide variety of structures present at any point in
time and their rapid interconversion can hamper the extraction of meaningful
structural information. The most successful of these techniques so far have been
small-angle X-ray scattering (SAXS) and NMR.
The majority of the experimental data utilised in this thesis were deter-
mined by NMR spectroscopy, which is a particularly powerful technique for
characterising DS as it is capable of providing site-specific structural informa-
tion. Additionally, NMR observables contain information about the underly-
ing conformational distribution, although in practice it is extremely difficult
to extract this information. Accessing the underlying distribution is especially
important for DS, for which an average structure is unlikely to be an appropri-
ate representation. NMR is important because it provides the opportunity to
gain a complete description of DS in terms of both the nature of the accessible
conformations and their relative populations. Of the techniques discussed in
the remainder of this section, some are used quantitatively in the subsequent
chapters as restraints in the ERMD and to assess the quality of the calculated
ensembles, whereas others merely provide a qualitative aid to the interpretation
of the residual structure exhibited by the calculated ensembles.
Global dimensions
Reproducing the global dimensions of DS proved to be fundamental to the
success of the simulation methods investigated in this thesis. The global di-
mensions of a polypeptide chain can be probed by NMR and SAXS, both of
which yield an average over all molecules in solution and the time-scale of the
experiment. Pulsed-field-gradient (PFG)-NMR supplies the translational diffusion
coefficient from which the Rh can be calculated in the form of $\langle R_h^{-1} \rangle^{-1}$,
where the angular brackets denote time- and ensemble-averaging161. The most
common parameter extracted from a SAXS experiment is the radius of gyration
(Rg), which is determined as the root-mean-square (rms) average, $\langle R_g^2 \rangle^{1/2}$. The
distribution of all pairwise interatomic distances, p(r), and information regarding
the overall shape of the macromolecule can also be obtained162,163. The
$\langle R_h^{-1} \rangle^{-1}$ is the preferred measure of the global dimensions for the work
described in this thesis, as PFG-NMR is conducted under similar conditions to
the remainder of the experimental measurements. p(r) is a potentially impor-
tant quantity, however, due to the scarcity of experimental techniques able to
report on distribution functions, hence its fitness as a means of quantitatively
comparing ensembles is tested in Chapter 4.
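The two averages defined above weight the members of an ensemble differently, which the following minimal Python sketch is intended to illustrate. The function names and the sample radii are hypothetical and are not taken from this thesis, and every conformer is assumed to contribute with equal weight. The same list of radii is passed to both functions purely for comparison; in practice the Rh and Rg of a conformer are distinct quantities.

```python
import math

def inverse_mean_rh(rh_values):
    """Return <Rh^-1>^-1: the reciprocal of the mean reciprocal radius."""
    mean_inverse = sum(1.0 / rh for rh in rh_values) / len(rh_values)
    return 1.0 / mean_inverse

def rms_rg(rg_values):
    """Return <Rg^2>^(1/2): the root-mean-square radius."""
    mean_square = sum(rg * rg for rg in rg_values) / len(rg_values)
    return math.sqrt(mean_square)

# Hypothetical per-conformer radii in angstroms (illustrative values only).
radii = [20.0, 30.0, 60.0]
arithmetic_mean = sum(radii) / len(radii)

print(inverse_mean_rh(radii))  # weighted towards compact conformers
print(arithmetic_mean)
print(rms_rg(radii))           # weighted towards expanded conformers
```

For these illustrative values the inverse average is approximately 30 Å, below the arithmetic mean, whereas the rms average lies above it, so the two measures are not interchangeable for a heterogeneous ensemble.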
Chemical shifts
The utility of NMR stems from its ability to provide local as well as global in-
formation. Chemical shifts, δ, report on the chemical environment experienced
by an atom. The chemical shift dispersion for DS is poor due to conformational
averaging164,165, but it is usually possible to assign the majority of the peaks
using triple-resonance experiments. Deviations from the values expected for a
random coil, referred to as secondary chemical shifts, ∆δ, are used to infer the
tendency of individual residues to sample helical, PPII or extended β-sheet-like
structure166,167. The absolute values of ∆δ recorded for DS are generally much
lower than for residues in fully formed secondary structure elements. Attempts
have been made to obtain quantitative estimates of the fractional occupancy of
the various types of secondary structure167,168, but such analysis is complicated
by the fact that ∆δ from different types of nuclei and residues are not equally
sensitive to secondary structure169. Thus in most cases, ∆δ are simply inter-
preted in terms of structural propensities145,158,170–173. In the work described
here, they are used to help interpret the residual structure propensities of the
ensembles of structures calculated in Chapters 5 and 6.
3J-couplings
Additional information regarding local structural propensities can be obtained
from 3J-couplings, which report on the φ and ψ dihedral angles of the polypep-
tide backbone174–176. The conformational fluctuations that occur in DS, how-
ever, preclude the direct interpretation of 3J-couplings in terms of a particular
type of secondary structure. For instance, although the characteristic 3JHNHα-
couplings for α-helices and β-sheets are ∼ 4.8 and ∼ 8.5 Hz, respectively, aver-
aging over the contributing conformers results in a shift in the 3JHNHα-couplings
measured for DS towards intermediate values175. It is still possible, however,
to make inferences regarding conformational preferences by considering the de-
viation from the expected random coil values. In this thesis, comparison of
the experimental 3JHNHα-couplings with those back-calculated from various en-
sembles is used to evaluate the legitimacy of the description afforded by both
unrestrained and restrained simulations.
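The effect of conformational averaging on the couplings can be sketched with a Karplus-type relation of the form 3J(φ) = A cos²(φ − 60°) + B cos(φ − 60°) + C. The coefficients and the φ angles below are assumed, illustrative values and are not those used in this thesis; the point is only that the coupling, rather than the angle, is averaged over the conformers.

```python
import math

# Assumed Karplus coefficients in Hz (illustrative; not taken from the thesis).
A, B, C = 6.51, -1.76, 1.60

def karplus_hnha(phi_deg):
    """3J(HN-Ha) in Hz for a backbone phi angle given in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C

def ensemble_average_j(phi_angles_deg):
    """Average the coupling, not the angle, over all conformers."""
    return sum(karplus_hnha(phi) for phi in phi_angles_deg) / len(phi_angles_deg)

# Hypothetical phi angles: one helix-like and one strand-like conformer,
# assumed to be equally populated.
helix_phi, strand_phi = -65.0, -140.0
j_helix = karplus_hnha(helix_phi)
j_strand = karplus_hnha(strand_phi)
j_mixed = ensemble_average_j([helix_phi, strand_phi])
print(j_helix, j_strand, j_mixed)
```

The averaged coupling falls between the helix-like and strand-like values, which is why a single measured coupling cannot be assigned to one type of secondary structure.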
Transverse relaxation rates
The dynamics that complicate the derivation of structural information from
3J-couplings can be probed by spin-relaxation NMR techniques. Measurement
of the heteronuclear 15N transverse relaxation rates (R2) of backbone amide
groups allows the identification of regions undergoing restricted motion up to the
ms time-scale170,177. If a simple model is used in which the physical properties
of the polypeptide chain are dominated by unrestrained segmental motion of
the polypeptide main chain178, the R2 values for a fully denatured protein
are predicted to follow a bell-shaped curve, with the lowest relaxation rates
occurring for the terminal regions of the protein. Positive deviation of the R2
values may then be attributed to the presence of non-random structure such as
clusters of hydrophobic side chains170 or regions of increased stiffness. Such an
interpretation is used in this thesis to aid the identification of residual structure
from the calculated ensembles of αS, βS (Chapter 5) and PI3-SH3 (Chapter 6),
although without specifically defining the parameters of the model.
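One commonly used form of such a segmental motion model, assumed here for illustration rather than taken from this thesis, writes the R2 of residue i as a sum of contributions from all residues j that decay exponentially with sequence separation, R2(i) = R_int Σ_j exp(−|i − j|/λ). The parameter values in the sketch below are hypothetical.

```python
import math

def segmental_r2(n_residues, r_int=0.2, persistence=7.0):
    """Predicted R2 profile for a chain dominated by segmental motion.

    r_int (s^-1) and persistence (in residues) are hypothetical parameters.
    """
    profile = []
    for i in range(n_residues):
        # Each residue j contributes less the further it is from residue i.
        total = sum(math.exp(-abs(i - j) / persistence) for j in range(n_residues))
        profile.append(r_int * total)
    return profile

r2 = segmental_r2(100)
print(r2[0], r2[50], r2[-1])
```

The resulting profile is symmetric, with the lowest values at the termini and the highest in the centre of the chain, so that residues with measured R2 values lying above such a baseline become candidates for residual structure.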
Residual dipolar couplings
Residual dipolar couplings (RDCs) are emerging as a particularly powerful
NMR technique, as they report on both structure and dynamics, providing
long-range as well as local information. RDCs probe the orientation of bond
vectors relative to the magnetic field179. In isotropic solution, the dipolar cou-
plings average to zero, thus weak alignment of the macromolecule of interest
is required179. This is most commonly induced by carrying out the measure-
ment in dilute liquid crystal media180–186 or in axial matrices such as stressed
polyacrylamide gels187,188.
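The requirement for weak alignment can be checked numerically. The angular dependence of a dipolar coupling is carried by the factor (3cos²θ − 1)/2, where θ is the angle between the bond vector and the magnetic field, and for uniformly distributed orientations cos θ is uniform on [−1, 1]. The sketch below, in which the function names and the form of the orientational bias are hypothetical, averages this factor by simple midpoint integration.

```python
def p2(cos_theta):
    """Second Legendre polynomial, the angular factor of the dipolar coupling."""
    return 0.5 * (3.0 * cos_theta ** 2 - 1.0)

def orientational_average(weight, n_points=20000):
    """Average P2 over cos(theta) in [-1, 1] with a given orientation weight."""
    step = 2.0 / n_points
    numerator = denominator = 0.0
    for k in range(n_points):
        x = -1.0 + (k + 0.5) * step  # midpoint of each cos(theta) bin
        w = weight(x)
        numerator += w * p2(x)
        denominator += w
    return numerator / denominator

# Isotropic solution: every orientation is equally likely.
isotropic = orientational_average(lambda x: 1.0)
# Hypothetical weak bias towards orientations along the field axis.
aligned = orientational_average(lambda x: 1.0 + 0.001 * p2(x))
print(isotropic, aligned)
```

The isotropic average vanishes to within the accuracy of the integration, whereas the weakly biased distribution leaves a residual value of about 2 × 10^-4, which is the origin of the small but measurable couplings.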
The measured coupling is an average over all orientations of a given con-
formation with respect to the magnetic field and all conformations sampled by
the macromolecule, thus RDCs report on both the overall shape of the macro-
molecule and the local dynamics of the chemical bond. Because RDCs are av-
eraged over much longer time-scales (ms) than traditional spin-relaxation NMR
experiments (ps-ns), they provide complementary information by reporting on
slower molecular motions that are otherwise inaccessible179,189. The angular
degeneracy of RDCs, however, means that either multiple different types of
couplings must be measured, or the experiments must be repeated in media in
which the alignment of the macromolecule is significantly different179.
If the structure of the molecule is known, then an expected alignment tensor
can be estimated based on the physical properties of the solute in combination
with an appropriate description of the mechanism of alignment. For purely
steric alignment, only short-range repulsive forces dependent on the size and
shape of the molecule need to be taken into account. In charged alignment me-
dia, the situation is more complicated and the electrostatic properties of both
the solute and the liquid crystal must be considered. Methods for computing the
alignment tensor in both situations have been developed190–192. Based on these,
RDCs have been used to define the relative orientations of domains of known
structure and ligand-receptor geometries, validate structures obtained using homology
modelling and refine structures determined using other experimental
observables179. The use of RDCs in ab initio structure determination is com-
plicated by the fact that the magnitude and orientation of the alignment tensor
are not known a priori179. A further limitation is their orientational degeneracy
and the resulting complexity of the energy landscape.
If the molecule or domain under investigation can be considered to be rigid,
then its preferential orientational averaging, including the effects of imperfect
alignment, can be described in terms of the alignment tensor. The measured
coupling then depends simply on the orientation of the inter-spin vector in the
eigenframe of the alignment tensor179. Such a description is seldom appropri-
ate for proteins in solution, however, as even folded globular proteins undergo
significant thermal motion so that the measured RDC incorporates both time
and ensemble conformational averaging193. Various techniques for overcoming
the aforementioned problems have been developed, mostly pertaining to the
determination of folded NS ensembles.
In the case of DS, the analysis of RDCs is further complicated by the fact
that the internal frame of reference is dynamic on the time-scale of the mea-
surement194. Initially, it was implicitly assumed that this would mean that the
RDCs measured for a random coil would be uniformly zero195, ignoring early
work showing that the ensemble of conformations sampled by a random flight
chain is not spherically symmetric196. Various studies have since confirmed this
result both theoretically195 and experimentally197. It is now well understood
that a random flight chain will give rise to a bell-shaped distribution of RDCs
throughout the sequence, due to the fact that RDCs are local probes and, at
individual loci along the chain, the distribution of orientations of the chain
segment is non-random195. As the most elongated structures align most effectively194,
the measured dipolar couplings incorporate information regarding the
range of shapes present as well as their relative weights190. RDCs therefore con-
tain a wealth of information, but methods for implementing them as restraints
in multiple-replica simulations are still probationary, especially for cases such
as DS where the alignment tensor of each replica is expected to be significantly
different. For this reason, they are used here to examine the accuracy of the
calculated ensembles in a similar manner to the 3JHNHα-couplings.
Nuclear Overhauser enhancement
In addition to the parameters discussed thus far, NMR is also capable of pro-
viding inter-atomic distances. The most common form of information used to
determine the structures of NFPs are inter-atomic distances derived from nu-
clear Overhauser enhancements (nOes), cross-relaxation effects between protons
close together in space198. Their use for DS is limited because the expanded
nature of the structures comprising DS means that non-sequential nOes, which
provide the most useful information for structure refinement, are seldom detected164,199,
as they are only sensitive to distances up to ∼5 Å. A modified
method, in which high levels of deuteration were used to increase the sensitivity
of the experiment, allowed a considerable number of long- and medium-range
nOes to be observed for one unfolded state200, but for another only medium-
range nOes were detected173. This approach also has several disadvantages164
which have precluded its widespread application.
Paramagnetic relaxation enhancement
In comparison to nOe experiments, paramagnetic relaxation enhancement
(PRE) is an NMR technique that is sensitive to distances in the range 12−20 Å,
making it particularly useful for characterising DS199,201–205. In the work de-
scribed here, long-range distances derived from PRE-NMR provide the primary
structural information for determining ensembles of structures representative of
DS of proteins.
PRE-NMR utilises the enhancement of proton relaxation by free electrons
to provide information about the distance between a paramagnetic centre and
a nuclear spin173,199,201–204,206–211. The free electron may be provided by ions
bound to native or engineered207,209,211–215 metal-binding sites, modified amino-
acids216–218 or ligands219, intrinsically paramagnetic co-factors220 or, in site-
directed spin-labelling (SDSL), by a covalently attached
spin-label173,201–204,208,210,221,222, many of which contain nitroxide moieties. The
experimental data used in Chapters 5 and 6 of this thesis were determined us-
ing SDSL with 1-oxyl-2,2,5,5-tetramethyl-3-pyrroline-3-methyl methanethiosul-
fonate (MTSL), an example of a nitroxide spin-label. The advantage of using
SDSL is that residues distributed throughout the sequence can be spin-labelled,
ensuring that distance information pertaining to the entire protein is obtained.
The contribution made by the free electron to the relaxation rate of the
amide protons in the protein of interest is defined as the difference between the
longitudinal or transverse relaxation rates measured for the paramagnetic and
diamagnetic states208. The Solomon-Bloembergen equations are then applied
to derive the $r^{-6}$ distance between each proton for which relaxation rates can
be measured and the free electron223. These equations are based on the as-
sumptions that the proton-electron vector is free to undergo isotropic rotational
diffusion and that its length is fixed199. The consequences of these assumptions
along with other aspects of the distance calculation are discussed in Chapters 4
and 5.
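Because the relaxation enhancement depends on the inverse sixth power of the electron-proton distance, the distance derived for a heterogeneous ensemble is an effective value, ⟨r^−6⟩^(−1/6), rather than a linear mean. The following minimal sketch illustrates the consequence; the function name and the distances are hypothetical, and equal weights are assumed for all conformers.

```python
def r6_averaged_distance(distances):
    """Effective distance <r^-6>^(-1/6) for a set of conformer distances."""
    mean_r6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_r6 ** (-1.0 / 6.0)

# Hypothetical electron-proton distances in angstroms for ten conformers:
# one compact conformer among nine extended ones.
distances = [10.0] + [40.0] * 9
linear_mean = sum(distances) / len(distances)
effective = r6_averaged_distance(distances)
print(linear_mean, effective)
```

Although only one conformer in ten is compact in this example, the effective distance is below 15 Å while the linear mean is 37 Å, showing how strongly transient close contacts dominate the measured enhancement.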
PRE-NMR has been used to refine the global fold of NFPs208–210, de-
termine the structure of integral membrane proteins206,217,221, follow protein-
protein211, protein-DNA207,213,214,220 and protein-ligand215,219 complex forma-
tion and characterise DS of proteins173,199,201–205,222,224. PRE effects have also
been examined using electron paramagnetic resonance (EPR) spectroscopy212,216
and solid-state NMR225. Whilst other experimental methods, notably fluores-
cence energy transfer (FET) and electron transfer (ET)226,227, are able to pro-
vide similarly long-range distance information, a typical PRE-NMR experiment
yields many more distances. This is an important prerequisite for avoiding
under-restraining in multiple-replica simulations, a matter that is discussed fur-
ther in Chapter 4.
1.3.2 Theoretical representations
The NMR techniques introduced in the previous section provide complementary
information on different aspects of protein structure and dynamics. In order to
interpret the measured observables, which are time- and ensemble-averages, a
conceptual model of the nature of DS is required. There has been much debate
about whether DS are best described as random flight chains, or whether there
is a significant amount of residual structure present.
A random flight chain describes an idealised state in which the bonds be-
tween atoms are of set length, but the angles are unconstrained, giving rise
to distributions of dihedral angles that are dependent only on local steric con-
straints228. The distributions of distances between pairs of atoms are Gaussian-
like in the limit of large sequential separations15. The global dimensions of
unfolded polypeptides measured by experiment229–233 and calculated from sim-
ulations15 agree with random coil predictions, although some IDPs, including
αS and βS, are more compact232,234,235.
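The simplest idealisation of this kind, a freely jointed chain in which successive bonds of fixed length point in independent random directions, can be simulated in a few lines. The sketch below is an assumption-laden illustration rather than the random coil model used in this thesis: it ignores the local steric constraints mentioned above, and the chain length, bond length and number of chains are hypothetical. It checks the standard result that the mean square end-to-end distance equals the number of bonds multiplied by the square of the bond length.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    radial = math.sqrt(1.0 - z * z)
    return radial * math.cos(azimuth), radial * math.sin(azimuth), z

def squared_end_to_end(n_bonds, bond_length, rng):
    """Squared end-to-end distance of one freely jointed chain."""
    x = y = z = 0.0
    for _ in range(n_bonds):
        dx, dy, dz = random_unit_vector(rng)
        x += bond_length * dx
        y += bond_length * dy
        z += bond_length * dz
    return x * x + y * y + z * z

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
n_bonds, bond_length, n_chains = 50, 1.0, 2000
mean_square = sum(squared_end_to_end(n_bonds, bond_length, rng)
                  for _ in range(n_chains)) / n_chains
print(mean_square, n_bonds * bond_length ** 2)
```

The sampled mean square distance should agree with the ideal value to within the statistical error of the finite sample.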
In apparent conflict with these results, experimental techniques, in particular
NMR, that give site-specific conformational information, suggest that disordered
states are not completely devoid of
structure31,160,170,171,174–176,178,199,202–205,224,232,236–254, giving rise to the so-
called “reconciliation problem” of how to explain the simultaneous existence
of random coil scaling behaviour and a significant amount of local
structure12,13,233,236,246,255,256.
The solution appears to lie in the fact that the overall dimensions of a
polypeptide chain are relatively insensitive to either local conformational prefer-
ences or more global changes in the distribution of dihedral angles15. Theoreti-
cal studies have shown that random coil-like global dimensions do not preclude
the presence of some degree of residual structure257–259. As an extreme exam-
ple, ensembles of conformations constructed by introducing joints at random
into the structures of NFPs were shown to reproduce adequately the random
coil scaling of the $\langle R_g^2 \rangle^{1/2}$258.
Whilst these observations explain the apparent discrepancy between random
coil dimensions and the existence of local structural preferences, the origin of
the residual structure observed for many DS remains to be accounted for. The-
oretical models have been developed that explain the limited menu of observed
protein folds in terms of symmetry and geometric considerations260–263, but
these have not been explicitly extended to apply to DS. One suggestion perti-
nent to DS is that the residual structure resides predominantly in hydrophobic
clusters170,247,253,254. It has also been proposed that steric repulsion among
side-chains may favour native-like topology in unfolded states236. Computa-
tional studies have failed to provide unequivocal support for such an effect
however15,256, even suggesting that the dihedral angle distributions undergo
a quantifiable shift upon folding264.
Further insight into whether dihedral angle preferences can explain the devi-
ations from random coil behaviour suggested by many experimental observables
has been gained from models of DS in which ensembles of structures are gener-
ated by selecting the dihedral angles from coil library databases265,266. These
databases describe the amino-acid specific probabilities of each φ/ψ combina-
tion for residues in loop regions of high-resolution X-ray structures. Whilst it
is not clear whether the analogy between such regions and DS is appropriate,
such models have been remarkably successful in reproducing the residue-level
patterns of experimental observables such as RDCs257,266 and 3J-couplings172.
Inclusion of nearest neighbour effects on the dihedral angle preferences257,267
improves the agreement with experimental data. However, in some cases it has
been found that additional information is required in order to explain the exper-
imental observations. For instance, although both the bulkiness of the amino
acids and the RDCs predicted from a coil library ensemble correlate well with
the RDCs measured for the urea-denatured state of αS268, it is necessary to
enforce long-range interactions between the N- and C-termini to obtain a good
match with the experimental data for the unperturbed solution-state ensem-
ble266. On the other hand, when the effects of electrostatic interactions with
charged alignment media are taken into account, the inclusion of long-range in-
teractions no longer improves the predicted RDCs269. To further investigate the
appropriateness of such models as a description of DS ensembles, coil library
ensembles obtained for αS and PI3-SH3 are analysed in Chapters 3, 5 and 6
alongside the ensembles produced in those chapters.
1.3.3 Biomolecular simulations
Computer simulations of biological molecules provide a link between theory and
experiment. The information available from experimental measurements is com-
plemented by the provision of distributions of the properties of interest as well
as atomic-level structural detail270. The ability to visualise protein structures
and their motions has greatly enhanced our understanding of the mechanisms of
protein folding and aggregation by providing a conceptual link between the ab-
stract chemistry of the amino acid sequence and the biology of the 3D fold. The
distributions of observables accessible by simulation are particularly important
for DS, where a broad and heterogeneous range of conformations contribute
to the time- and ensemble-averaged experimental observables. In such cases,
the relationship between an observable and the underlying distribution is far
from simple. Simulations such as those described in this thesis are therefore
indispensable tools with which to interpret experimental data.
In order to carry out simulations, the molecule(s) of interest must be repre-
sented in silico. This is done using molecular mechanics force-fields270–274 which
comprise various terms describing protein geometry. The functional form of the
potential energy of a given conformation is a sum of individual energy terms:
$E = E_{\text{covalent}} + E_{\text{non-covalent}}$. The covalent term includes contributions from
the bond lengths, bond angles and torsion angles (dihedral and improper) and
the non-covalent term comprises the van der Waals (vdw) interactions between
non-bonded pairs and electrostatic interactions between partial charges. Hy-
drogen bonding is often implicitly included in the non-bonded interactions271.
Unlike the remainder of the terms, which provide the energy of a protein in vac-
uum, the electrostatic interactions depend on the environment of the atom(s) in
question. Both theoretical considerations and simulation results indicate that
the effective energy hypersurface of a protein, which includes the effects of sol-
vent, is significantly different from the intramolecular energy hypersurface275.
It is therefore desirable to include solvent in biomolecular simulations, although
the random coil model used throughout this work, which comprises only a sim-
plified representation of the polypeptide chain, is found to afford a reasonable
approximation of a fully unfolded state that is computationally efficient.
Solvent models
So-called ‘explicit solvent’, in which the solvent molecules are simulated along
with the biomolecule of interest, provides the most exact representation of the
solvent environment270,276,277. The very large number of solvent molecules re-
quired to model bulk solution, combined with the expanded structures typical
of DS and the long simulation times required to sample the large regions of
conformational space accessible to DS are expected to make explicit solvent un-
suitable for the simulation of DS, a premise that is confirmed by the limited
testing of this technique reported in Chapter 3. For this reason, implicit solvent
models were used for the remainder of the simulations that embody this work.
The computational expense of simulating large biomolecules in explicit solvent
has led to the development of various implicit solvent models, which trade
accuracy for speed277. These are generally classified as either empirical or
continuum electrostatics solvation models276,278,279, depending on the theoreti-
cal approaches used to describe the solvation. Essentially, a solvation correction
is combined with the usual molecular mechanical force-fields describing the in-
tramolecular interactions in vacuum280. The influence of solvent is expressed
within a theoretical framework based on a statistical mechanical formulation of
the so-called ‘potential of mean force’ (PMF)276, whereby the free energy of
the system is expressed as an average over all solvent degrees of freedom280.
As well as increasing the speed of the calculations, this mean field approximation
reduces the need for long simulation times to adequately sample
the instantaneous solute-solvent interactions277. The significant enhancement
of computational efficiency provided by implicit solvent models is particularly
beneficial for the simulation of DS, as it allows a wide range of structures to be
sampled within a reasonable time-frame, although the kinetic behaviour may
be unrealistic277,281.
Implicit solvent models have been widely used for a range of applications
including scoring functions for distinguishing native structures from non-native
decoys282–287 and mis-folded structures288,289, the calculation of binding free en-
ergies for protein-protein and protein-ligand interactions290–294, molecular dy-
namics simulations of folding and unfolding trajectories281,295–306 and the pro-
cess of aggregation307,308 and the determination of folding landscapes281,309–323.
They have also been used in biased MD simulations, often in combination
with experimental data, for the refinement of native and near-native struc-
tures287,324,325 and the generation of transition46,298, intermediate326, molten
globule327 and disordered state ensembles201,202,204,328.
One disadvantage of using implicit solvent models is that they are parame-
terised to reproduce the compact globular structures typical of NFPs, thus when
characterising partially or fully unfolded states, artificial means of overcoming
this bias towards compact structures, such as carrying out the simulations at
unphysically high temperatures, are required. As is found in Chapter 3, this
may reduce the quality of the description of the protein-like features of the
molecule, thus methods have been developed for including experimental data
as restraints (see below). The improvement and application of one of these
techniques, ERMD, forms the basis of Chapters 4, 5 and 6.
1.3.4 Ensemble-restrained molecular dynamics
Restrained MD simulations provide a means of overcoming force-field inaccuracies
and alleviating statistical sampling errors by biasing the trajectories towards
experimentally relevant areas of conformational phase space. When incorporating
experimental data into simulations, it is essential to consider the fact that
experimental observables are averages over the duration of the experiment and
the ensemble of molecules present329–331. This is particularly important for DS,
where an average structure is unlikely to be representative. In fact, it may be
physically impossible to find a single structure that satisfies all experimental
observables simultaneously330.
Various methods have been suggested for taking experimental averaging into
account when implementing restraints in MD and Monte Carlo (MC) simula-
tions. One technique is to apply a restraining force if the average of an observable
over a predetermined time-window prior to the current time does not satisfy the
restraint332–334. An alternative approach is ERMD, in which multiple copies
of a molecule are simulated in parallel and the restraint enforced upon the
ensemble average of the observable at each point in time204,328,330,331,335–344.
Simultaneous time- and ensemble-averaging has also been used345. A range
of different protein states have been characterised by ERMD, including disor-
dered201,202,204,205, intermediate, transition and folded
states204,326,337,339–341,346,347.
The ability of ERMD to generate an ensemble of structures that, on average,
satisfies the experimental data is the key to its usefulness for characterising DS
of proteins, for which an average structure does not provide an adequate representation.
However, there remain many issues pertaining to the relationship
between averages and distributions. These are discussed in Chapter 4 and solu-
tions are proposed and tested using synthetic restraints prior to the application
of ERMD with experimental data that forms the basis of Chapters 5 and 6.
1.4 Overview
The results reported in this thesis commence with a thorough investigation of
the ability of unrestrained MD simulations to produce ensembles of structures
representative of DS of proteins using the IDP αS as a model system (Chap-
ter 3). The calculated ensembles are compared with the experimental data
available for αS, firstly in terms of the global dimensions and then with re-
spect to observables that provide more detailed structural information. The
best method identified from these trials is used to generate two reference en-
sembles from which synthetic distance restraints equivalent to those obtained
from a PRE-NMR experiment are calculated. These restraints are used in a
series of tests in which the previously published ERMD method201,202,205 is im-
proved, making it generally applicable to any DS (Chapter 4). The changes
that are made are justified according to how well the reference ensembles are
reconstructed. The resulting protocol is used to produce ensembles of structures
representative of the IDPs αS, βS, the artificial construct β+HC (Chapter 5)
and the acid-denatured state of PI3-SH3 (Chapter 6). Interpretation of these
ensembles with recourse to the experimental data for each protein provides in-
sight into the factors that govern the balance between folding, mis-folding and
intrinsic disorder.
Chapter 2
Methods
2.1 Simulation methods
All simulations were carried out within the charmm molecular simulation pack-
age (v. c32a2)271. Where more than one copy of the molecule was simulated
in parallel an in-house version of charmm that has been modified to allow re-
straints to be applied across multiple replicas (ensemble-charmm) was used.
Newtonian dynamics were used, and the Nosé-Hoover thermostat348,349 was
employed to ensure that the kinetic energy was compatible with the desired
temperature. Bond lengths were constrained with the shake algorithm350, al-
lowing for an integration timestep of 2 fs. The starting structures for each
protein were generated by building the coordinates for a linear structure from
the amino acid sequence, minimising the energy, running a high temperature
(500 K) simulation with the eef1²⁸⁰ implicit solvent model, and selecting at
random a set of relatively expanded structures. The final ensemble for each sim-
ulation was obtained by pooling together all of the structures obtained during
the production phase; if multiple replicas were used, these were pooled as well.
2.1.1 Unrestrained simulations
Random coil model
A random coil model for each protein (Chapters 3, 5 and 6) was produced
using the charmm19 polar hydrogen representation with the non-bonded inter-
actions truncated so that only the repulsive part of the Lennard-Jones potential
remained (CUTNB 6.0 CTOFNB 3.5 CTONNB 3.0). The simulations were run
in vacuum and electrostatic interactions were ignored. The simulation temperature,
T, was typically 500–600 K to enhance the rate of sampling, but the
nature of the resulting ensemble was similar at lower T. The coordinates were
saved every 20 ps for 200 ns, giving 10 000 structures in total.
Explicit water
Simulations of αS in explicit water (Chapter 3) were carried out using the
charmm22³⁵¹ all-atom potential for the protein and the TIP3P water model³⁵²
for the solvent. Periodic boundary conditions were used with a cutoff of 14 Å on
the non-bonded interactions. A water box of dimensions 58 × 68 × 68 Å, large
enough to avoid self-interaction of a reasonably expanded αS structure, was
built by translation of a previously equilibrated box. The energy of the starting
structure was minimised prior to insertion in the water box. Once solvated,
the protein was first equilibrated at 300 K for 5 ps with a harmonic restraint
on the positions of all atoms. The force constant was then reduced from its
starting value of 10 kcal·mol⁻¹·Å⁻² in a series of steps consisting of 50 ps with
a force constant of 1.0 kcal·mol⁻¹·Å⁻² on all atoms, 25 ps with a force constant
of 0.5 kcal·mol⁻¹·Å⁻² applied to backbone atoms only, and finally 20 ps with
no restraints. The temperature was then increased to 330 K in 5 K increments
(5 ps per increment). 330 K rather than 300 K was used to increase the rate of
conformational sampling. After further equilibration for 40 ps at 330 K without
restraints, structures were collected every 2 ps for 1 ns. The lengths of covalent
bonds to hydrogen atoms were constrained with shake throughout to prevent
the energy change between progressive integration steps exceeding 20%.
CHARMM generalised Born
Simulations of αS with the generalised Born/surface area (GB/SA) solvation
model (Chapter 3) were carried out using the charmm ‘gbsw’ module, which im-
plements a simple switching function to smooth the electrostatic and non-polar
solvation energy and forces at the boundary³⁵³,³⁵⁴. Both the charmm19²⁸⁰ and
charmm22³⁵¹ representations were tested, with similar results; only those pertaining
to charmm22 are reported here. Default settings for the integration
parameters, grid spacing, and Coulomb field settings were used. The Born radii
were updated at every integration step.
In the gbsw implementation, the non-polar solvation contribution is con-
sidered only when a non-zero SGAMMA is issued. A zero SGAMMA was
tested along with the default value of 0.03 kcal·mol⁻¹·Å⁻² after preliminary
simulations using amber showed that eliminating the surface tension term ap-
peared to reduce the bias towards collapsed structures (data not shown). The
change was justified on the grounds that this term was parameterised for natively
folded proteins for which a large proportion of the surface of the polypeptide
chain is buried. Simulations were carried out at 300, 350, 400, 500 and
600 K with SGAMMA = 0.03 kcal·mol⁻¹·Å⁻² and at 300 and 350 K with
SGAMMA = 0.00 kcal·mol⁻¹·Å⁻². Twenty independent replicas were simulated in
parallel to enhance the conformational sampling. The starting structures were
minimised in GB/SA prior to starting the simulation. The molecules were first
heated to the desired temperature in 50 K increments (10 ps per increment),
then equilibrated for 0.2 ns before collecting coordinates every 5 ps for further
analysis.
SASA and EEF1
All simulations using the sasa³⁵⁵ and eef1²⁸⁰ implicit solvent models (Chapters
3, 4, 5 and 6) were carried out using the charmm19²⁸⁰ representation, for
which they were exclusively parameterised. The default cutoffs for non-bonded
and electrostatic interactions were used. Periodic boundary conditions were used
with the sasa model because the polypeptide undergoes marked translation in
this system. Multiple independent replicas, typically 16–24, were simulated
in parallel to facilitate conformational sampling. The system was first heated
to the desired temperature in 50 K increments (10 ps per temperature), then
equilibrated briefly (0.2 ns) before collecting coordinates every 5 ps for further
analysis.
Reference ensembles
Two different (unrestrained) αS reference ensembles (Chapter 4) were generated
using the eef1²⁸⁰ implicit solvent model as described above. The first,
REF23, was generated at 540 K using 20 independent replicas and the second,
REF20, at 505 K using 16 independent replicas. Structures were collected every
5 ps (2500 steps) for 20 ns per replica, giving a total of 400 ns, or 80 000 struc-
tures for REF23 and 320 ns, or 64 000 structures for REF20. REF23 was filtered
to increase the degree of residual structure by selecting only those structures
with more than 15 contacts between the NAC region (residues 61–95) and the
C-terminus (residues 110–140) (see section 4.4). Two residues were considered
to be in contact if their Cα atoms were within 8.5 Å of each other²⁰⁵. The final
filtered REF23 ensemble consisted of 23 675 structures.
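The contact filter applied to REF23 can be sketched as follows. This is a minimal illustration rather than the script used in this work: the function names and the representation of a structure as a mapping from residue number to Cα coordinates are assumptions made for the example, and every NAC/C-terminus residue pair is counted once.

```python
import math

NAC = range(61, 96)        # residues 61-95
C_TERM = range(110, 141)   # residues 110-140
CUTOFF = 8.5               # Å, Cα-Cα contact distance
MIN_CONTACTS = 15          # keep structures with more than this many contacts


def count_contacts(ca_coords, group_a=NAC, group_b=C_TERM, cutoff=CUTOFF):
    """Count residue pairs (one from each group) whose Cα atoms lie within cutoff.

    ca_coords maps residue number -> (x, y, z) in Å (hypothetical input format).
    """
    n = 0
    for i in group_a:
        for j in group_b:
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                n += 1
    return n


def filter_ensemble(structures, min_contacts=MIN_CONTACTS):
    """Return only the structures with more than min_contacts contacts."""
    return [s for s in structures if count_contacts(s) > min_contacts]
```

A fully extended chain therefore contributes no contacts and is discarded, whereas a structure in which the two regions pack together is retained.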
2.1.2 ERMD
In ERMD (Chapters 4, 5 and 6), the restraints are applied to multiple independent
replicas simulated in parallel204,328,330,331,335–344. A reaction coordinate, ρ,
is defined as the squared difference between the current average of each observable
across all replicas, $f_l^{\mathrm{calc}}$, and the restraint, $f_l^{\mathrm{ref}}$, averaged over all
$N_{\mathrm{restr}}$ restraints:
\[
\rho(t) = N_{\mathrm{restr}}^{-1} \sum_{l=1}^{N_{\mathrm{restr}}} \left( f_l^{\mathrm{ref}} - f_l^{\mathrm{calc}}(t) \right)^2. \tag{2.1}
\]
When the restraints are distances derived from PRE-NMR experiments
(PRE-ERMD),
\[
f_l^{\mathrm{calc}}(t) = d_{ij}^{\mathrm{calc}}(t) = \left( N_{\mathrm{rep}}^{-1} \sum_{k=1}^{N_{\mathrm{rep}}} r_{ij,k}^{-6}(t) \right)^{-1/6}, \tag{2.2}
\]
where rij,k(t) is the distance between residues i and j calculated from replica
k of the restrained ensemble at time t and Nrep is the number of replicas. r−6
averaging is used because the distances calculated from the PRE experiment
are r−6 averages.
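The effect of r−6 averaging in equation 2.2 can be illustrated with a short sketch. The function name is an assumption made for the example; the input is simply the list of distances for one residue pair, one value per replica.

```python
def r6_average(distances):
    """Return (mean of r^-6)^(-1/6) over the replicas, as in equation 2.2."""
    n = len(distances)
    return (sum(r ** -6 for r in distances) / n) ** (-1.0 / 6.0)
```

For two replicas with distances of 10 Å and 30 Å, the r−6 average is approximately 11.2 Å, far below the linear mean of 20 Å, which shows how strongly the ensemble-averaged distance is weighted towards the shortest distances present.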
In the work constituting this thesis, $f_l^{\mathrm{ref}} = d_{ij}^{\mathrm{ref}}$ was either the
ensemble-averaged distance calculated from one of the reference ensembles according to
equation 2.8 (Chapter 4) or the distance calculated from the experimental data
(equations 2.6 and 2.7 below) (Chapters 5 and 6). The distance between residues
i and j was defined as being between the Cα atom of the spin-labelled residue
i and the amide hydrogen of residue j. The reasons for this choice are outlined
in Chapter 4.
During PRE-ERMD simulations, $d_{ij}^{\mathrm{calc}}(t)$ is allowed to vary freely within
a harmonic square well defined by the lower (L) and upper (U) boundaries.
Justification for the use of these boundaries and the values of L and U chosen
for use in the general ERMD method developed here is given in Chapters 4
and 5.
To enforce the restraint, an energy penalty of the form
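\[
\frac{\alpha N_{\mathrm{rep}}}{2} \left( \rho(t) - \rho_0(t) \right)^2 \tag{2.3}
\]
is added to the potential energy if ρ(t) > ρ0(t), where
\[
\rho_0(t) = \min_{0 \le \tau \le t} \left[ \rho(\tau) \right] \tag{2.4}
\]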
and α is a force constant associated with the restraints. In this way, as the
simulation proceeds, the ensemble of structures is progressively biased towards
structures that, on average, satisfy the restraints.
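The interplay between equations 2.1, 2.3 and 2.4 can be sketched as below. This is an illustration only, not the ensemble-charmm implementation: the class and function names are hypothetical, and only the energy is evaluated, not the forces on the atoms.

```python
def reaction_coordinate(f_ref, f_calc):
    """Equation 2.1: mean squared deviation over all restraints."""
    return sum((r - c) ** 2 for r, c in zip(f_ref, f_calc)) / len(f_ref)


class ErmdPenalty:
    """Tracks rho_0(t), the lowest rho seen so far, and evaluates equation 2.3."""

    def __init__(self, alpha, n_rep):
        self.alpha = alpha
        self.n_rep = n_rep
        self.rho0 = None  # running minimum of rho (equation 2.4)

    def energy(self, rho):
        # rho_0(t) includes the current time point (0 <= tau <= t)
        if self.rho0 is None or rho < self.rho0:
            self.rho0 = rho
        # zero when rho equals the running minimum, positive otherwise
        return 0.5 * self.alpha * self.n_rep * (rho - self.rho0) ** 2
```

Because ρ0 can only decrease, any step that moves the ensemble further from the restraints than its best agreement so far is penalised, while steps that improve the agreement incur no penalty.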
When the Rg was restrained (Chapter 3), $f_l^{\mathrm{ref}}$ was the desired Rg and
\[
f_l^{\mathrm{calc}} = N_{\mathrm{rep}}^{-1} \sum_{k=1}^{N_{\mathrm{rep}}} R_{g,k}(t) \tag{2.5}
\]
where Rg,k(t) was the Rg of replica k at time t.
The PRE-ERMD described in Chapters 4, 5 and 6 was carried out using the
sasa355 implicit solvation model. An extra phase was included immediately
after the heating stage during which α was increased from its starting value of
500 to its final value (Table 2.1) by a factor of 3 every 10 ps. Nrep, L, U and T
were varied as discussed in Chapter 4.
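The ramping of α can be written out explicitly. The sketch below assumes that time zero is the start of the extra phase and that the first increase occurs after the first 10 ps; the function name is hypothetical.

```python
def alpha_schedule(alpha_final, alpha_start=500, factor=3, interval_ps=10):
    """Return [(time_ps, alpha), ...] for the ramp from alpha_start to alpha_final."""
    schedule = [(0, alpha_start)]
    t, alpha = 0, alpha_start
    while alpha < alpha_final:
        t += interval_ps
        alpha *= factor
        schedule.append((t, alpha))
    return schedule
```

Under these assumptions, a final value of 364 500 is reached after six increases and a final value of 121 500 after five, since 500 × 3⁶ = 364 500 and 500 × 3⁵ = 121 500.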
2.2 Restraints for PRE-ERMD
2.2.1 Calculation of distances from experimental data
The PRE-NMR data for αS205 used in Chapter 5 were obtained from M.M. Ded-
mon, including data for an additional spin-label attached to residue N122 which
gave rise to a further 117 distance restraints. PRE-NMR experiments were con-
ducted for βS and β+HC (Chapter 5) by R.C. Rivers and on PI3-SH3 (Chap-
ter 6) by N.R. Birkett. Individual residues throughout the sequence of each
protein were mutated to cysteine for attachment of the paramagnetic spin-label
MTSL. The locations of the spin-labels were chosen so as to minimise the per-
turbation of any residual structure predicted on the basis of ∆δ and, for βS
and β+HC, to match the previous PRE-NMR analysis of αS. The identity of
the spin-labelled residues and the total number of distance restraints for each
protein are shown in Table 2.1.
The ¹H-¹⁵N heteronuclear single quantum coherence (HSQC) spectra of the
labelled protein were recorded with the spin-label in its oxidised (paramagnetic)
and reduced (diamagnetic) states. The PRE due to the presence of a free
electron was quantified by the intensity ratio, Iox/Ired, which compares the
intensity (height) of the cross-peaks in the oxidised (Iox) and reduced (Ired)
states.
The paramagnetic relaxation enhancement, $R_2^{\mathrm{sp}}$, was determined by fitting199,208
\[
\frac{I_{\mathrm{ox}}}{I_{\mathrm{red}}} = \frac{R_2 \exp(-R_2^{\mathrm{sp}} t)}{R_2 + R_2^{\mathrm{sp}}}, \tag{2.6}
\]
where t is the total INEPT delay time (15.72 ms). R2, the intrinsic transverse
relaxation rate, was assumed to be equal to the R2 of the diamagnetic sample
and was estimated for each residue from the half-height linewidth assuming
Lorentzian line shapes. The electron-proton distance was then calculated according
to199,208
\[
r = \left[ \frac{K}{R_2^{\mathrm{sp}}} \left( 4\tau_c + \frac{3\tau_c}{1 + \omega_H^2 \tau_c^2} \right) \right]^{1/6}, \tag{2.7}
\]
where ωH is the Larmor frequency of the proton and K is a combination of
physical constants. τc is a correlation time that is discussed in more detail in
Chapter 5. The set of distances obtained in this manner were analysed as de-
scribed in Section 5.2.3 to account for experimental uncertainty and imprecision
arising from the nature of equation 2.7. Every 5th distance was excluded from
the working dataset to form a ‘free’ dataset for cross-validation.
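The conversion of an intensity ratio into a distance via equations 2.6 and 2.7 can be sketched as follows. Several aspects are assumptions made for the example and are not taken from this chapter: the numerical value of K (a commonly quoted figure for a nitroxide-proton pair), the use of bisection to invert equation 2.6, the search range for the PRE rate and all function names. The values of τc and ωH must be supplied by the caller.

```python
import math

K_CM6_S2 = 1.23e-32  # cm^6 s^-2; assumed value, not quoted in this chapter


def intensity_ratio(r2_sp, r2, t):
    """Equation 2.6: I_ox / I_red for a given PRE rate r2_sp (s^-1)."""
    return r2 * math.exp(-r2_sp * t) / (r2 + r2_sp)


def solve_r2_sp(ratio, r2, t, hi=1.0e6, tol=1e-10):
    """Invert equation 2.6 by bisection; the ratio falls monotonically with r2_sp."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if intensity_ratio(mid, r2, t) > ratio:
            lo = mid   # predicted ratio too high, so a larger PRE is needed
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def pre_distance(ratio, r2, tau_c, omega_h, t=15.72e-3, k=K_CM6_S2):
    """Equation 2.7: electron-proton distance in Å from an intensity ratio (0 < ratio < 1)."""
    r2_sp = solve_r2_sp(ratio, r2, t)
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)
    r_cm = (k / r2_sp * spectral) ** (1.0 / 6.0)
    return r_cm * 1.0e8  # cm -> Å
```

Because the distance depends on the sixth root of the relaxation enhancement, the calculated distance changes slowly at intermediate ratios but increasingly steeply as the ratio approaches 1.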
2.2.2 Calculation of distances from reference ensembles
Synthetic αS distance restraints were calculated from REF20 and REF23 (Chap-
ter 4) so as to be analogous to the ‘PRE’ distances that would be obtained from
a PRE-NMR experiment. Eight residues distributed throughout the αS sequence
were selected to be ‘spin-labelled’. The r−6-averaged distances between the Cα
atom of each ‘spin-labelled’ residue i and all non-adjacent amide hydrogens on
residues j were calculated from the Nref structures comprising each reference
ensemble according to
\[
d_{ij}^{\mathrm{ref}} = \left( N_{\mathrm{ref}}^{-1} \sum_{k=1}^{N_{\mathrm{ref}}} r_{ij,k}^{-6} \right)^{-1/6}, \tag{2.8}
\]
giving 1000 restraints in total. This number corresponds to the upper limit on
the number of distances that can typically be determined experimentally. A
‘free’ dataset, consisting of a further 1000 distances, was also calculated for use
in cross-validation. r−6 averaging was used because the distances calculated
from the PRE experiment are r−6 averages.
2.2.3 Accounting for uncertainty in PRE distance restraints
The effect of uncertainty in Iox/Ired on the calculated distance was quantified
(Chapter 5) by calculating the distances corresponding to Iox/Ired ranging from
0 to 1, and then repeating the calculations with the Iox/Ired altered by ±1, 5,
10 or 15%. To ensure that the calculated distances were physically reasonable,
the remaining parameters required for the distance calculation were the same
as for the calculation of distances from experimental Iox/Ired except that the
R2 used was the average over all residues in the sequence.
Table 2.1: The residues to which spin-labels were attached, the total number
of PRE distance restraints (NPRE) and the value of α used in the ERMD of
αS, βS, β+HC (Chapter 5) and PI3-SH3 (Chapter 6). αS(REF) refers to the
synthetic data back-calculated from REF20 and REF23 (Chapter 4); in all other
cases the data were obtained experimentally.

| Protein | Spin Label Positions | NPRE | α |
|---|---|---|---|
| αS(REF) | A17 K34 G51 G68 A85 K102 D119 Y136 | 1000 | 364 500 |
| αS | Q24 S42 Q62 S87 N103 N122 | 595 | 364 500 |
| βS | A30 S42 S64 F89 A102 S118 A134 | 635 | 364 500 |
| β+HC | A30 S42 S64 A113 A145 | 578 | 364 500 |
| PI3-SH3 | M1 S2 L11 L24 L40 S43 E52 E61 G78 P84 | 639 | 121 500 |

The assignment of L and U was carried out in a similar manner for the exper-
imental (Chapters 5 and 6) and synthetic (Chapter 4) distance restraints. The
nature of equations 2.6 and 2.7 means that for high Iox/Ired, a small change in
Iox/Ired results in a large change in the calculated distance. For the experimental
data, Iox/Ired > 0.85 were used as “negative” restraints206,208 by assigning
only a lower bound corresponding to $d_{ij}^{0.85} - L$, where $d_{ij}^{0.85}$ is the distance
calculated from Iox/Ired = 0.85. For the synthetic data, the distance corresponding
to Iox/Ired = 0.85 was used as the upper limit.
As a general rule, Iox/Ired < 0.15 are unreliable206,208, as any experimental
uncertainty is large relative to the size of the measured Iox/Ired. Distances
calculated from experimental Iox/Ired < 0.15 and synthetic distances for which
$d_{ij} < d_{ij}^{0.15}$ were therefore assigned only an upper bound corresponding to
$d_{ij}^{0.15} + U$, where $d_{ij}^{0.15}$ is the distance calculated from Iox/Ired = 0.15. The exact values
of L and U were varied as discussed in Chapter 4.
2.3 Analysis methods
2.3.1 Back-calculation of experimental observables
Rg and Rh
The geometric Rg was calculated from the heavy atoms of each structure
using charmm analysis facilities. During the development of the PRE-ERMD
method using synthetic data (Chapter 4), the ensembles were compared in terms
of the linearly averaged Rg. When experimental restraints were used (Chapters
5 and 6), the $\langle R_h^{-1} \rangle^{-1}$ of each ensemble was computed for comparison with
the experimental value. The harmonic mean was used to reflect the averaging
inherent in the experimental measurement. For each protein, the Rh of ∼ 200
structures of varying degrees of compactness was computed using hydropro356.
Default settings were used with six sizes of minibead ranging from 1.8 to 2.8 Å.
The molecular weight and partial specific volume were evaluated from the amino
acid sequence. The relationship between $R_g^{-1}$ and $R_h^{-1}$ was parameterised by
linear regression.
When analysing the large ensembles representative of DS, the geometric Rg
of each structure was converted into an Rh according to the relevant equation
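for that protein (equations 5.3, 5.4, 5.4 and 6.1). The overall $\langle R_h^{-1} \rangle^{-1}$ was then
computed according to
\[
\langle R_h^{-1} \rangle^{-1} = \left( N_{\mathrm{struct}}^{-1} \sum_{k=1}^{N_{\mathrm{struct}}} R_{h,k}^{-1} \right)^{-1}, \tag{2.9}
\]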
where Nstruct was the number of structures in the ensemble.
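The two-stage procedure, a linear fit between the inverse Rg and the inverse Rh followed by harmonic averaging over the ensemble, can be sketched as below. The function names are hypothetical, and the slope and intercept are whatever the regression returns for a given protein; no fitted values from Chapters 5 and 6 are reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def harmonic_mean_rh(rg_values, slope, intercept):
    """Equation 2.9, with each 1/R_h predicted from 1/R_g by the fitted line."""
    inv_rh = [slope / rg + intercept for rg in rg_values]
    return len(inv_rh) / sum(inv_rh)
```

In use, `fit_line` would be applied to the inverse Rg and inverse Rh of the calibration structures, and the resulting coefficients passed to `harmonic_mean_rh` together with the Rg of every structure in the ensemble.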
$^3J_{\mathrm{HNH}\alpha}$-couplings
Of the various types of $^3J$-couplings able to be obtained experimentally, only
$^3J_{\mathrm{HNH}\alpha}$-couplings have been measured for αS and βS (Chapters 3 and 5), but
these cannot be calculated directly from the coordinates of structures obtained
using the charmm19 representation because the Hα atoms are not explicitly
represented. They were therefore calculated indirectly by computing the φ angle
for a given residue, m, from the atomic coordinates of each structure, k, and
then applying the Karplus357 relationship358
\[
{}^3J_{\mathrm{HNH}\alpha,m,k} = 6.4 \cos^2(\phi_{m,k}) - 1.4 \cos(\phi_{m,k}) + 1.9. \tag{2.10}
\]
The couplings obtained in this manner were linearly averaged over all Nstruct
structures for each residue:
\[
{}^3J_{\mathrm{HNH}\alpha,m} = N_{\mathrm{struct}}^{-1} \sum_{k=1}^{N_{\mathrm{struct}}} {}^3J_{\mathrm{HNH}\alpha,m,k}. \tag{2.11}
\]
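Equations 2.10 and 2.11 can be sketched as follows. The function names are hypothetical. The `offset_deg` argument is also an addition for the example: it defaults to zero so that the relationship is evaluated exactly as written in equation 2.10, but it is included because some published Karplus parameterisations are expressed in terms of φ − 60°.

```python
import math


def karplus_3j(phi_deg, a=6.4, b=-1.4, c=1.9, offset_deg=0.0):
    """Equation 2.10 for one residue in one structure (coupling in Hz)."""
    theta = math.radians(phi_deg + offset_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def ensemble_3j(phi_per_structure):
    """Equation 2.11: linear average of the couplings over all structures."""
    couplings = [karplus_3j(phi) for phi in phi_per_structure]
    return sum(couplings) / len(couplings)
```

Note that the couplings, not the angles, are averaged: the coupling is computed for each structure first and the mean is taken afterwards.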
RDCs
Amide NH RDCs were calculated for αS (Chapters 3 and 5), βS (Chapter 5)
and PI3-SH3 (Chapter 6) using the steric version of the program pales190
with default settings and PDB format input files. The RDCs for each residue
(excluding P), m, of each structure, k, RDCm,k, were linearly averaged over the
ensemble of structures according to
\[
\mathrm{RDC}_m = N_{\mathrm{struct}}^{-1} \sum_{k=1}^{N_{\mathrm{struct}}} \mathrm{RDC}_{m,k} \tag{2.12}
\]
The back-calculated RDCs were not scaled as the magnitude was already
similar to that of the experimental data and any discrepancies were not uni-
formly distributed along the sequence, so that scaling by a uniform factor did
not improve the agreement.
2.3.2 Statistics
〈Rg〉 (t)
The cumulative average of the Rg, 〈Rg〉 (t) (Chapter 3) was calculated ac-
cording to
\[
\langle R_g \rangle (t) = N_t^{-1} \sum_{\tau=1}^{N_t} R_g(\tau), \tag{2.13}
\]
where Nt was the number of structures collected at time t. Where multiple
replicas were run in parallel, 〈Rg〉 (t) was calculated separately for each replica.
Q values
The agreement between the synthetic or experimental observables and those
back-calculated from a calculated ensemble was quantified with a “quality fac-
tor”359:
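\[
Q = \frac{\left( \sum_{l=1}^{N_{\mathrm{obs}}} \left( f_l^{\mathrm{ref}} - f_l^{\mathrm{calc}} \right)^2 \right)^{1/2}}{\left( \sum_{l=1}^{N_{\mathrm{obs}}} \left( f_l^{\mathrm{ref}} \right)^2 \right)^{1/2}}, \tag{2.14}
\]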
where Nobs was the number of observables of that type (such as working or free
PRE distances), and the $f_l^{\mathrm{calc}}$ were averaged over the pooled ensemble. A lower
Q value indicates a better agreement.
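A minimal sketch of the quality factor of equation 2.14 is given below; the function name is hypothetical and the two inputs are equal-length lists of reference and back-calculated observables.

```python
import math


def q_factor(f_ref, f_calc):
    """Equation 2.14: rms deviation normalised by the rms of the reference values."""
    num = sum((r - c) ** 2 for r, c in zip(f_ref, f_calc))
    den = sum(r ** 2 for r in f_ref)
    return math.sqrt(num) / math.sqrt(den)
```

Perfect agreement gives Q = 0, and the normalisation by the reference values makes Q values comparable between observables of different magnitude.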
S values
To quantify the agreement of an entire distribution of a given observable the
distance measure344
\[
s_l = \sum_{m=1}^{N_{\mathrm{bins}}} \left| p_{m,l}^{\mathrm{ref}} - p_{m,l}^{\mathrm{calc}} \right| \tag{2.15}
\]
was used, where Nbins was the number of bins into which the histogram was
divided and the pm,l were the normalised probabilities of finding a particular
observable in bin m of histogram l. sl ranges from 0 to 2, with low values
representing similar histograms. Averaging over all Nobs histograms quantifies
the overall agreement of two ensembles in terms of distance distributions:
\[
S = N_{\mathrm{obs}}^{-1} \sum_{l=1}^{N_{\mathrm{obs}}} s_l. \tag{2.16}
\]
The values of sl and S depend on the bin width, the ideal value of which
depends in turn on the width of the distributions being compared. A bin width
of 1 Å was found to be broadly suitable for the wide range of distance and Rg
distributions encountered. Using the same bin width for all distributions allows
universal comparisons of the sl values computed from different pairs of atoms
in the same and different ensembles. It is also a prerequisite for combining the
various sl into an overall S value.
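The calculation of sl and S (equations 2.15 and 2.16) can be sketched as follows. The function names and the histogram range of 0 to 200 Å are assumptions made for the example; values outside the range are ignored, and each data set is assumed to contain at least one value inside it.

```python
import math


def normalised_histogram(values, bin_width, lo, hi):
    """Histogram of values on [lo, hi) with fixed bin width, normalised to sum to 1."""
    n_bins = int(math.ceil((hi - lo) / bin_width))
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def s_l(ref_values, calc_values, bin_width=1.0, lo=0.0, hi=200.0):
    """Equation 2.15: summed absolute difference between two normalised histograms."""
    p_ref = normalised_histogram(ref_values, bin_width, lo, hi)
    p_calc = normalised_histogram(calc_values, bin_width, lo, hi)
    return sum(abs(a - b) for a, b in zip(p_ref, p_calc))


def s_total(ref_sets, calc_sets, **kwargs):
    """Equation 2.16: mean of s_l over all observables."""
    values = [s_l(r, c, **kwargs) for r, c in zip(ref_sets, calc_sets)]
    return sum(values) / len(values)
```

Two histograms with no overlapping bins give sl = 2, identical histograms give sl = 0, and the same bin width is passed to every comparison so that the individual sl values can be combined into S.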
Statistical errors
The statistical error (SE) in the back-calculated ensemble-averaged observ-
ables such as the Rg, Rh, PRE distances and RDCs and, where relevant, their
respective Q and S values was estimated by randomly splitting the data into
two sets and computing the averaged quantity or the Q or S value for each set.
The splitting was repeated 10 times such that 20 different averages or Q or S
values were collected. The standard deviation (SD) of these values was taken
as the SE. For all of the data reported here, the SE was less than 1% unless
explicitly stated otherwise.
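The random-splitting estimate of the statistical error can be sketched as below. The function name, the fixed random seed and the use of the sample (rather than population) standard deviation are assumptions made for the example; the statistic defaults to the mean but could equally be a function returning a Q or S value.

```python
import random
import statistics


def split_half_se(values, statistic=statistics.fmean, n_splits=10, seed=0):
    """SD of the statistic over 2 * n_splits random half-sets of the data."""
    rng = random.Random(seed)
    data = list(values)
    half = len(data) // 2
    estimates = []
    for _ in range(n_splits):
        rng.shuffle(data)
        estimates.append(statistic(data[:half]))
        estimates.append(statistic(data[half:]))
    return statistics.stdev(estimates)
```

With the default of 10 splits, 20 estimates are collected, matching the procedure described above.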
To assess the contribution to the overall SD made by within-replica and
between-replica variation, two further SDs were defined (Chapter 3). SDbetween
was the SD of the set of ensemble-averaged observables, X:
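\[
\mathrm{SD}_{\mathrm{between}} = \mathrm{SD}(X), \quad X = \left[ \langle X_1 \rangle, \langle X_2 \rangle, \ldots, \langle X_{N_{\mathrm{rep}}} \rangle \right], \tag{2.17}
\]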
where the averaging was carried out separately for each replica.
SDwithin was the average of the set of Nrep SDs:
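\[
\mathrm{SD}_{\mathrm{within}} = \langle \mathrm{SD}_{\mathrm{rep}} \rangle, \quad \mathrm{SD}_{\mathrm{rep}} = \left[ \mathrm{SD}_1, \mathrm{SD}_2, \ldots, \mathrm{SD}_{N_{\mathrm{rep}}} \right], \tag{2.18}
\]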
where each SD in SDrep was for a different replica.
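The two measures of equations 2.17 and 2.18 can be sketched as follows, taking as input one list of observable values per replica. The function names are hypothetical and the sample standard deviation is assumed.

```python
import statistics


def sd_between(replicas):
    """Equation 2.17: SD of the per-replica averages."""
    return statistics.stdev([statistics.fmean(r) for r in replicas])


def sd_within(replicas):
    """Equation 2.18: average of the per-replica SDs."""
    return statistics.fmean([statistics.stdev(r) for r in replicas])
```

A large SDbetween relative to SDwithin suggests that the individual replicas are sampling different regions of conformational space rather than the same distribution.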
2.3.3 Correlation of distance distributions
The correlation between two distance distributions (Chapter 4) was investigated
by computing sl values (equation 2.15) to quantify the similarity between the 2D
distance histograms p(rAB, rAC) and p(rAB) ∗ p(rAC), where rAB and rAC were
the distances between the Cα atoms of residues A and B or A and C, respectively.
A, B and C were chosen from a set of 10 residues spaced approximately 14
residues apart along the sequence. This particular set of residues was selected
because they were not included in either the experimental or synthetic PRE
distance restraints, thus the identification of correlations was not complicated
by the direct influence of a restraint on those residues. A high sl value indicates
that rAB and rAC are correlated. It should be noted that an sl value of 2.0
was never obtained, even when B = C, because of the different resolutions of
p(rAB, rAC) and p(rAB) ∗ p(rAC).
The sl values were viewed as 2D plots of sl versus B and C for each value
of A. Discrete points rather than contours were plotted using the matlab (The
MathWorks, Inc) imagesc command. Because there were only 10 possible values
of A, B and C, the matlab interp2 function was used to linearly interpolate
the sl values in 2 dimensions, giving an estimated sl value for all possible BC
combinations for each of the chosen A. No extrapolation was possible, thus the
edges of the plots were left blank to signify a lack of data. The agreement
between the set of sl values computed for two different ensembles was evaluated
by linear regression.
2.3.4 Distance comparison maps
Distance comparison (DC) maps (Chapters 4, 5 and 6) were created by plotting
the rms distance between two residues, i and j, normalised by the rms distance
predicted for a purely random coil:
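\[
\mathrm{DC} = \frac{\left\langle d_{ij}^{\mathrm{calc}\,2} \right\rangle^{1/2}}{\left\langle d_{ij}^{\mathrm{rc}\,2} \right\rangle^{1/2}}. \tag{2.19}
\]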
The rms inter-residue distances for the calculated ensemble were defined as
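\[
\left\langle d_{ij}^{\mathrm{calc}\,2} \right\rangle^{1/2} = \left( N_{\mathrm{struct}}^{-1} \sum_{k=1}^{N_{\mathrm{struct}}} d_{ij,k}^2 \right)^{1/2}, \tag{2.20}
\]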
where Nstruct was the number of structures in the calculated ensemble. The rms
inter-residue distances for a random coil were predicted according to
\[
\left\langle d_{ij}^{\mathrm{rc}\,2} \right\rangle^{1/2} = 5.31 N_{\mathrm{sep}}^{0.6}, \tag{2.21}
\]
where Nsep was the sequence separation between the two residues. This empir-
ical equation was fitted to a model of a random flight chain constructed using
dihedral angles taken from a PDB coil library and including the effects of excluded
volume360. Similar results were obtained if $\langle d_{ij}^{\mathrm{rc}\,2} \rangle^{1/2}$ was calculated
from the random coil model of the protein in question. A more accurate equation
for $\langle d_{ij}^{\mathrm{rc}\,2} \rangle^{1/2}$ that accounts for the location of the two residues within the
polypeptide chain360 was tested but the predicted rms distances were found
to be discontinuous for short sequence separations so it was not used. The
normalisation by $\langle d_{ij}^{\mathrm{rc}\,2} \rangle^{1/2}$ is important because it removes the dependence of
the inter-residue distance on the sequence separation, allowing pairs of residues
with different sequence separations and also proteins of different lengths to be
compared. The DC was not smoothed and was plotted as discrete points using
the matlab imagesc function.
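The calculation of a single element of a DC map (equations 2.19 to 2.21) can be sketched as below. The function names are hypothetical, and distances are assumed to be in Å so that they are on the same scale as the random coil prediction.

```python
import math


def rms_distance(distances):
    """Equation 2.20: root-mean-square of one inter-residue distance over an ensemble."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def random_coil_rms(n_sep):
    """Equation 2.21: predicted rms distance for a sequence separation n_sep."""
    return 5.31 * n_sep ** 0.6


def distance_comparison(distances, i, j):
    """Equation 2.19: DC value for residues i and j."""
    return rms_distance(distances) / random_coil_rms(abs(i - j))
```

A DC value below 1 indicates that the two residues are, on average, closer together than expected for a random coil, and a value above 1 that they are further apart.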
2.3.5 Free energy landscapes
Free energy landscapes (Chapters 4, 5 and 6) were obtained for each ensemble
in the form of the two-dimensional histogram of p(Rg, X) according to
F (Rg, X) = − ln p(Rg, X), (2.22)
where p(Rg, X) was the joint probability distribution of the Rg and either
the SASA or the end-to-end distance, REE. The SASA was computed using
charmm analysis facilities. An individual value for each residue, averaged over
all structures in the ensemble, was also computed. The free energy landscapes
were displayed as filled contour plots using the matlab contourf function.
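The construction of the landscape from equation 2.22 can be sketched as follows. The function name and the bin widths are assumptions made for the example, since the binning is not specified here; the plotting step is omitted.

```python
import math
from collections import Counter


def free_energy_landscape(rg_values, x_values, rg_bin, x_bin):
    """Return {(rg_bin_index, x_bin_index): F} with F = -ln p, as in equation 2.22."""
    counts = Counter(
        (int(rg // rg_bin), int(x // x_bin)) for rg, x in zip(rg_values, x_values)
    )
    total = sum(counts.values())
    # Unvisited bins have p = 0 (F undefined) and are simply absent from the result.
    return {cell: -math.log(n / total) for cell, n in counts.items()}
```

The most populated bin has the lowest free energy, and F is dimensionless because no temperature factor appears in equation 2.22.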
2.3.6 Ramachandran plots
Ramachandran plots (Chapters 3, 5 and 6) were created by computing the φ
and ψ dihedral angles for each internal residue for all structures comprising
an ensemble using charmm analysis facilities. The normalised probabilities of
occurrence of each set of (φm, ψn), where m and n refer to 10° bins, were plotted
as discrete points on a 2D map using the matlab imagesc function.
2.3.7 Predicted properties
Rh
The Rh values expected if the protein is natively folded ($R_h^F$) or fully
unfolded ($R_h^U$) were calculated according to the empirical relationships161:
$R_h^F = 4.75\,N_{res}^{0.29}$ (2.23)
$R_h^U = 2.21\,N_{res}^{0.57}$ (2.24)
where Nres was the number of residues. These are referred to in Chapters 5
and 6.
Compaction factors
Compaction factors, Cf, quantifying the degree of compaction relative to the
random coil and natively folded states were calculated according to161:
$C_f = \dfrac{R_h^U - R_h^{exp}}{R_h^U - R_h^F}$, (2.25)
where $R_h^{exp}$ was the experimental Rh and $R_h^F$ and $R_h^U$ were computed according
to equations 2.23 and 2.24. Cf ∼ 1 indicates that the protein is of a similar size
to that expected if it were folded into a compact, globular structure, whereas
a Cf near zero indicates a highly expanded chain. The Cf are referred to in
Chapters 5 and 6.
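Equations 2.23 to 2.25 are simple enough to check numerically. The following Python sketch uses hypothetical function names and takes all lengths in Å.

```python
def rh_folded(n_res):
    """Predicted Rh (Å) of a natively folded protein, equation 2.23."""
    return 4.75 * n_res ** 0.29

def rh_unfolded(n_res):
    """Predicted Rh (Å) of a fully unfolded polypeptide, equation 2.24."""
    return 2.21 * n_res ** 0.57

def compaction_factor(rh_exp, n_res):
    """Compaction factor Cf of equation 2.25."""
    r_f = rh_folded(n_res)
    r_u = rh_unfolded(n_res)
    return (r_u - rh_exp) / (r_u - r_f)
```

For a 140-residue chain these functions give approximately 19.9 Å and 37.0 Å, the predicted values listed for αS in Table 3.1, and an experimental Rh of 31.9 Å then corresponds to a Cf of approximately 0.30.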
Helical propensity
The program agadir361–364 was used to calculate the helical propensities of
αS, βS, β+HC (Chapter 5) and PI3-SH3 (Chapter 6) based on their sequences
using the online calculator available at
http://www.embl-heidelberg.de/cgi/agadir-wrapper.pl. Larger values indicate
that helical structure is more likely.
Hydrophobicity Profile
The Kyte-Doolittle (KD) hydrophobicity profile365 of PI3-SH3 (Chapter 6)
was calculated using a Perl script available from the Canadian Bioinformatics
Help Desk at
http://gchelpdesk.ualberta.ca/repository/VersionDetails.php?filId=66&submissionId=48.
The hydrophobicity was smoothed over the recommended 11-residue window.
Hydrophobic regions are assigned KD values greater than 1.
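A windowed KD profile of this kind can be sketched in Python as follows. This is not the Perl script referred to above: the smoothing is assumed to be an unweighted moving average, only complete windows are returned, and the function name is hypothetical. The per-residue values are the standard Kyte-Doolittle scale.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale (one-letter amino acid codes)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def kd_profile(sequence, window=11):
    """Moving-average KD hydrophobicity over complete windows only."""
    values = np.array([KD[aa] for aa in sequence.upper()])
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")
```

Each entry of the result corresponds to the central residue of one window, so a sequence of length N yields N - 10 values for the 11-residue window.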
Aggregation propensity
Aggregation propensity profiles ($Z_{agg}^{prof}$) of αS, βS, β+HC (Chapter 5) and
PI3-SH3 (Chapter 6) were computed by G.G. Tartaglia using an updated version
of the Zyggregator algorithm366, which predicts the aggregation propensity
of peptides and proteins in aqueous solution from the physicochemical properties
of their constituent amino acids and compares this to the aggregation propensity
of a set of randomly generated amino acid sequences of the same length367.
$Z_{agg}^{prof}$ indicates the regions that are most aggregation prone. The overall
aggregation propensity scores ($Z_{agg}$) for various regions of αS, βS and some
related constructs were also computed.
Coil-library model
Ensembles of 5000 αS (Chapter 5) and PI3-SH3 (Chapter 6) structures were
obtained from A. Jha via the server at http://unfolded.uchicago.edu/index.html.
The structures were generated using a self-avoiding statistical coil model
based on backbone conformational preferences from a coil library, a subset of
the PDB265.
Chapter 3
Simulation of disordered states of proteins
3.1 Introduction
To understand DS of proteins in molecular detail, it is necessary to characterise
an ensemble of structures. Experimental observables are, in most cases, average
values that do not reveal the range of structures accessible to a disordered
protein. Biomolecular simulation has the potential to provide an ensemble of
structures at atomic level detail. The challenge, therefore, is to reconcile the
two sources of information so that the simulation yields the same ensemble of
structures as gave rise to the experimental observables.
This chapter describes the evaluation of the applicability of a range of differ-
ent MD simulation techniques of varying degrees of accuracy for the generation
of DS ensembles. The implications of the technical details of each method given
in Section 2.1.1 are expanded upon below along with the results. The IDP
αS introduced in Section 1.2.1 serves as a useful model system for determining
which of the simulation methods are most appropriate for DS as it is disordered
in solution, that is, under normal simulation conditions.
The success of the simulations is assessed in two ways. The first criterion is
whether convergence occurs and, if so, how long it takes in terms of both the
internal timescale of the simulation and the real time required to carry out the
calculations. This is important because any ensemble-averaged quantity
calculated from an unconverged simulation will contain errors over and above the
expected statistical noise. An efficient search of conformational space is
therefore desirable. The Rg, a measure of the global size of the protein structure
that is simple and fast to calculate from the atomic coordinates, is followed
through time to provide a crude measure of the extent of sampling. The cumulative
time-averaged Rg with respect to the first structure of the production phase
for each replica, 〈Rg(t)〉 (Section 2.13), is also computed to monitor the rate of
convergence. Following these properties separately for each replica reveals the
differences in the conformational space sampled by each replica.
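The cumulative time average used to monitor convergence can be sketched in Python as below. The function name is hypothetical, and the input is assumed to be the Rg time series of a single replica starting at the first structure of the production phase.

```python
import numpy as np

def cumulative_average(rg_series):
    """Cumulative time average: entry t is the mean of Rg over frames 0..t."""
    rg = np.asarray(rg_series, dtype=float)
    return np.cumsum(rg) / np.arange(1, rg.size + 1)
```

A curve that flattens out suggests that the running average of this global property has stabilised, although a plateau alone does not prove that conformational space has been sampled exhaustively.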
The second consideration is how accurately the effective energy of the protein
is defined. This is assessed by comparing back-calculated observables with
experimental data. Initially, the main factor considered is the $\langle R_h^{-1} \rangle^{-1}$,
which is calculated from the Rg according to equations 5.3 and 2.9 and compared
with the experimental value determined by PFG-NMR158,235. As it transpires,
obtaining sufficiently expanded structures is an appreciable problem, thus
reproduction of the experimental $\langle R_h^{-1} \rangle^{-1}$ is a fundamental criterion for the
success of a simulation. Once conditions have been determined in which the average
size of the ensembles of structures is the same as that measured experimentally,
the agreement with other observables that report on more detailed aspects of the
structures is investigated. At this point, an ensemble of structures generated
using a self-avoiding statistical coil model based on backbone conformational
preferences from a coil library265 (Section 2.3.7) is also analysed.
3.2 Random coil model
There is some controversy in the literature12,13,233,236,246,255,256 over whether
a random coil provides an adequate description of DS or whether additional
protein-specific features need to be taken into account. To investigate this, a
random coil model was set up as outlined in Section 2.1.1. All of the bonded terms
in the charmm force-field were retained but the non-bonded interactions were
reduced to only the repulsive part of the Lennard-Jones potential. Electrostatic
interactions were ignored and the simulations were carried out in vacuum. This
model preserves the bulkiness of the amino acid side chains and the connectivity
of the polypeptide backbone but little else.
The advantages of this random coil model are that it is fast in terms of
both computer time and the extent of conformational change that occurs
at each integration step. The large fluctuations in the Rg over time (Figure 3.1)
are indicative of the huge variety of structures that are sampled. Despite this,
〈Rg(t)〉 reaches a plateau in less than 4 ns. Convergence of this global property
is therefore achieved rapidly and efficiently, fulfilling the first assessment
criterion. The absence of solvent, however, means that the time-scale of
conformational change is unphysically fast due to the lack of friction, thus
kinetic parameters cannot be extracted from such a simulation.
Comparison of the $\langle R_h^{-1} \rangle^{-1}$ with the experimental value for αS158,235 shows
that the structures produced with the random coil model are more expanded
(Table 3.1). The good agreement between the $\langle R_h^{-1} \rangle^{-1}$ of the random coil
ensemble (37.6 Å) and the predicted $\langle R_h^{-1} \rangle^{-1}$ for a fully unfolded polypeptide161
(37.0 Å) indicates that this random coil model provides a good representation
for fully unfolded states. It does not afford an appropriate description for αS,
however, which is not unexpected given that αS is known to be more compact
than is predicted for a fully unfolded polypeptide of the same length in
solution158,235. Additionally, when simulating in vacuum, the protein-like features
encoded in the solvent model are lost. Given the dearth of quantitative
experimental data for DS, it is important to retain as much information as possible
in the model, thus a more detailed representation than this random coil model
was desired.
Table 3.1: The $\langle R_h^{-1} \rangle^{-1}$ of αS determined experimentally^1, predicted^2 and
calculated from the ensembles of structures generated with the various models as
described in the text. SD is the standard deviation of the $\langle R_h^{-1} \rangle^{-1}$ of the entire
ensemble (all replicas and timesteps). Where multiple replicas were simulated in
parallel, SDbetween and SDwithin describe the variation between replicas and the
average variation of each individual replica as described in Section 2.3.2. The
$\langle R_h^{-1} \rangle^{-1}$ and all statistics are in Å.

Model            T (K)   $\langle R_h^{-1} \rangle^{-1}$   SD     SDbetween   SDwithin
Experimental     288     31.9
Predicted (F)    -       19.9
Predicted (U)    -       37.0
Random Coil      300     37.6         3.10   -           -
sasa             300     20.8         0.74   0.64        0.41
eef1             300     22.2         1.30   1.30        0.42
GB/SA            300     20.9         0.81   0.76        0.27
GB/SA(0)^3       300     20.9         0.59   0.59        0.14
Explicit Water   330     24.3         0.15   -           -

1. Measured by PFG-NMR on 100 µM (ref. 235) or 200 µM (ref. 158) protein in 99.9% D2O,
20 mM Mes buffer with 100 mM NaCl, pH 6.5, 288 K.
2. F and U refer to the Rh predicted for a 140-residue natively folded or fully unfolded
polypeptide according to equations 2.23 and 2.24161.
3. GB/SA(0) refers to the simulation in GB/SA with zero surface tension.
[Figure 3.1 plot: panels labelled Random Coil, SASA, EEF1, GB/SA and Explicit Water; axes: Rg (Å) and 〈Rg(t)〉 (Å) against Timestep.]
Figure 3.1: Comparison of Rg and 〈Rg(t)〉 during simulations run using the
various MD methods discussed in Sections 3.2, 3.3 and 3.4.1. Where there is
more than one curve, the different colours correspond to the first five replicas of
a multiple replica simulation. Data for the other replicas are omitted for clarity.
The (black, dashed) vertical lines on the plots of Rg correspond to the start
of the production phase. This quantity is shown for the heating, equilibration
and production phases so that the collapse that occurs in the implicit solvent
models can be seen. 〈Rg(t)〉 is the cumulative average at each point in time
and was calculated for the production phase only to monitor convergence. The
time-scale on the abscissa corresponds to the frequency at which the coordinates
were saved, which was every 2500 integration steps for all simulations except
that using the random coil model, in which case the structures were only saved
every 10 000 steps. The integration timestep was 2 fs in all cases.
3.3 Explicit solvent
In contrast to the highly simplified random coil model, explicit solvent is the
most detailed representation possible given practical considerations. Although
it was expected to be too computationally expensive to be practical for DS, a
brief simulation of αS in explicit water was run to provide a reference against
which to evaluate other simulation methods. A slightly elevated T of 330 K
was used to enhance the rate of conformational sampling. The dimensions of
the water box, 58 × 68 × 68 Å, were sufficiently large to allow simulation of a
reasonably expanded αS structure using periodic boundary conditions and the
standard cutoff for non-bonded interactions of 14 Å without interactions
of the protein with its own periodic images occurring. A fully extended chain
would require a larger box, but as the computational cost of simulating in
explicit water is largely dependent on the number of solvent molecules, this
reduced box size was deemed sufficient for preliminary tests.
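The box-size argument can be expressed as a simple check. The Python sketch below assumes an orthorhombic box aligned with the coordinate axes and uses the common rule of thumb that the extent of the solute along each axis plus the non-bonded cutoff should not exceed the box length; the function name is hypothetical, and the rule ignores rotation of the solute during the simulation.

```python
import numpy as np

def fits_in_box(coords, box, cutoff=14.0):
    """True if, along every axis, solute extent + cutoff <= box length.

    coords : (n_atoms, 3) array of solute coordinates in Å.
    box    : three box edge lengths in Å.
    """
    coords = np.asarray(coords, dtype=float)
    extent = coords.max(axis=0) - coords.min(axis=0)
    return bool(np.all(extent + cutoff <= np.asarray(box, dtype=float)))
```

With a 58 × 68 × 68 Å box and a 14 Å cutoff, this criterion limits the solute to about 44 Å along the shortest box edge.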
Even with this compromise, only a very short run (∼ 0.5 ns) was feasible,
making it highly unlikely that convergence would be achieved. Several µs or even
ms of sampling are likely to be required to cover the range of structures expected
for a DS, as the time-scale of sampling in explicit water is physically relevant.
Accordingly, the Rg changes very little over the duration of the simulation,
resulting in an essentially flat plot of 〈Rg(t)〉 (Figure 3.1) and a low SD of the
$\langle R_h^{-1} \rangle^{-1}$ (Table 3.1). It is obvious from this trial run that simulation in explicit
water for the time-scales required to obtain convergence for DS is not practical.
Additionally, it is not clear whether sufficiently expanded structures would be
sampled even with long simulation times. Other experimental observables were
therefore not calculated from this ensemble, and alternative, simpler models
were investigated.
3.4 Implicit solvent
A compromise between the extremes of the random coil model and explicit
solvent models is to use implicit solvent models. These represent the effects of
the solvent as a PMF by integrating over the solvent degrees of freedom. Of the
wide range of implicit solvent models available, three commonly used models
from the charmm simulation package were tested: GB/SA354, sasa355 and
eef1280. These are introduced below followed by the results of simulations at
physiological temperature and, subsequently, the effect of restraining or altering
various parameters on the global size of the structures.
As with the random coil model, the conformational transition rate in implicit
solvent models is greatly increased relative to explicit water due to the lack of
the viscosity usually imparted by the random collisions of solvent molecules with
the solute281,368,369. This is a desirable feature for the simulation of DS, as it
makes the sampling of conformational space much faster, although any kinetic
parameters extracted from the trajectories will be incorrect.
Generalised Born
Generalised Born solvation models are inspired by the Born equation for cal-
culating the electrostatic solvation energies of ions370 and are made possible
by the simple generalisation of the Born formula to polyatomic molecules by
Still371. This has been shown to provide a good approximation to the more
exact Poisson-Boltzmann (PB) solvation energies353 at a lower computational
cost372,373. The GB/SA model implemented in charmm uses a simple poly-
nomial smoothing function374 to define the dielectric boundary between the
interior and exterior of the protein353. The electrostatic solvation energy is
estimated by solving the Born equation, and the non-polar solvation energy is
approximated from the solvent-exposed surface area using a phenomenological
surface tension coefficient.
EEF1 and SASA
The eef1 (effective energy function)280 and sasa355 implicit solvent models
combine estimates of the free energy of solvation with the charmm19 polar
hydrogen energy function to provide the effective energy function for a protein
in solution. The formulation of the screening effect of the solvent is the same in
both models. The formal charges on ionic side-chains (D, E, R and K) and the
termini are neutralized, and a distance-dependent dielectric constant is used to
approximate the charge-charge interactions in solution. This simple dielectric
function does not take different environments into account, meaning that it does
not distinguish whether or not the interacting partial charges are buried or on
the protein surface.
The main difference between sasa and eef1 is the way that the solvent
exclusion is accounted for. eef1 is based on solvent-excluded volume. The
solvation free energy of the protein molecule is assumed to be the sum of group
contributions, which are evaluated by subtracting the amount of solvation lost
due to solvent exclusion by proximal atoms of the macromolecule from the
solvation free energy of that group in a small model compound. The solvation
free energy density is given by a Gaussian function. In comparison, the sasa
model assumes that the polar and non-polar contributions made by each atom to
the free energy of solvation are proportional to their SASAs, which are calculated
by an analytical approximation to increase the efficiency. The direct solvation
of polar groups is favoured, whereas the hydrophobic effect on apolar groups is
accounted for by a positive atomic solvation parameter.
3.4.1 Physiological temperature
In the first instance, simulations of αS using each of the three implicit solvent
models outlined above were carried out at physiological temperature (300 K).
The structures produced using all three models are more compact than those
present experimentally. The $\langle R_h^{-1} \rangle^{-1}$ of each ensemble is close to the Rh
predicted for αS if it were a folded globular protein (Table 3.1). Even though
relatively expanded starting structures were used, each replica quickly collapses
and remains a similar size thereafter (Figure 3.1). Examination of the trajecto-
ries with the molecular visualisation program vmd375 showed that the collapsed
conformation of each replica depends on the starting structure, and that once
collapsed, the structure of each replica changes little over the remainder of the
simulation. This effect is reflected in the 〈Rg(t)〉 of each replica, which reaches
a plateau within ∼ 0.5 ns and remains constant thereafter (Figure 3.1). The
addition of solvent therefore greatly restricts the conformational sampling at
physiological temperature compared to the random coil model.
The SD of the $\langle R_h^{-1} \rangle^{-1}$ is lower than that of the random coil model for all
of the implicit solvent models considered. To evaluate the relative contributions
that the differences between the major species into which each replica collapses
and the within-replica variation over the duration of the simulation make to
this variation, the SD of the set of Nrep values of $\langle R_h^{-1} \rangle^{-1}$, SDbetween, was
compared to the average over all replicas of the SD of the $\langle R_h^{-1} \rangle^{-1}$ of each
replica, SDwithin (Section 2.3.2). In all cases, the magnitude of SDbetween is
similar to that of the overall SD, whereas SDwithin is lower. Thus the major
contribution to the overall SD is the variation between replicas, providing further
evidence for both the restricted sampling of each individual replica and the
dependence of the structures sampled by each replica on the starting structure.
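The decomposition of the variation into between-replica and within-replica parts can be sketched in Python as follows. The exact definitions are those of Section 2.3.2, which is not reproduced here, so the details below are assumptions: per-structure Rh values are taken as input, the per-replica average is the inverse of the mean inverse Rh, SDwithin is taken from the spread of the per-structure values, and sample standard deviations (ddof=1) are used. All names are hypothetical.

```python
import numpy as np

def inverse_mean_inverse(rh):
    """<Rh^-1>^-1 of a set of per-structure Rh values."""
    rh = np.asarray(rh, dtype=float)
    return 1.0 / np.mean(1.0 / rh)

def sd_decomposition(rh):
    """rh: array of shape (n_rep, n_frames) of per-structure Rh values.

    Returns (SD, SDbetween, SDwithin) under the assumptions stated above.
    """
    rh = np.asarray(rh, dtype=float)
    per_replica = np.array([inverse_mean_inverse(r) for r in rh])
    sd_total = rh.std(ddof=1)                   # all replicas and frames pooled
    sd_between = per_replica.std(ddof=1)        # spread of replica averages
    sd_within = rh.std(axis=1, ddof=1).mean()   # mean spread inside a replica
    return sd_total, sd_between, sd_within
```

When each replica is trapped near its own collapsed structure, SDbetween approaches the overall SD while SDwithin stays small, which is the pattern described in the text.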
Two major obstacles to the realistic simulation of DS can be identified
from these preliminary simulations. Firstly, although each individual replica
converges, the structures sampled by each replica depend on the choice of
starting structure, thus the overall ensembles obtained by pooling all of the
replicas are not converged. Secondly, the correspondence with the experimental
$\langle R_h^{-1} \rangle^{-1}$ is poor, indicating that the effective energy function provided by the
force-field and implicit solvent models does not accurately describe the
conformations visited by αS in solution in terms of their global dimensions. The
following section describes the various means of overcoming these problems
using implicit solvent models that were investigated.
3.4.2 Methods for generating expanded structures
Restraining the Rg
One possible means of ensuring that the average size of the molecules matches
that observed experimentally is to explicitly restrain the 〈Rg〉. The restraint is
applied to the ensemble average calculated across all Nrep replicas at each point
in time to take into account the time- and ensemble-averaging of the
experimental measurement. ERMD simulations in which the 〈Rg〉 was restrained to
match the Rg corresponding to the experimental $\langle R_h^{-1} \rangle^{-1}$ 158,235 were carried
out with each of the three implicit solvent models discussed above. Although
it is possible to satisfy the imposed Rg restraint, this is achieved by one or two
replicas becoming almost completely extended, and the remainder collapsing
just as in the unrestrained simulations (data not shown). Whilst the
experimental $\langle R_h^{-1} \rangle^{-1}$, as an average, does not rule out such a situation being an
accurate reflection of the state on which the PFG-NMR measurement was carried
out, none of the other commonly used experimental techniques detect such
a structural dichotomy. The Rg restraint also suffers from the fact that the
best compromise between minimising the SASA and satisfying the restraint is a
prolate ellipsoid. This results in elongated structures, which again may or may
not be typical of those present experimentally. Restraining the Rg is therefore
an unsatisfactory solution to the compaction problem.
Reducing the surface tension
In the charmm GB/SA implementation, the non-polar solvation energy is
estimated from the SASA. This contribution is only considered if the input
parameter SGAMMA, which describes the non-polar surface tension, is non-
zero. The fact that IDPs sample expanded rather than collapsed, globular
structures in solution suggests that surface tension does not affect them in the
same way as it does NFPs. To mimic this situation, SGAMMA was set to
zero. There is no noticeable effect on the resulting trajectories, however, as
summarised by the identical $\langle R_h^{-1} \rangle^{-1}$ and similar statistics (Table 3.1). Thus
adjusting the surface tension is not a viable method for generating suitably
expanded structures.
Increasing T
In sasa, the contribution made by solvent exclusion effects to the solvation
free energy is approximated from the SASA and in eef1, it depends on the
exclusion of solvent by proximal solute atoms. Both models are parameterised
to fit experimental data for natively folded proteins and small peptides, and
therefore favour compact structures in which the SASA is minimised or the sol-
vent exclusion maximised. Rather than altering the solvent models themselves,
which is beyond the scope of this work, the free energy of the polypeptide can be
easily altered by changing T. Within an implicit solvent model, T does not
correspond directly to a physical quantity; rather, it can be thought of as a source
of kinetic energy, EK. Increasing T, and therefore EK, provides a means of
compensating for the inherent bias of implicit solvent models towards compact
structures.
Increasing T with the GB/SA solvation model has little effect, with the
$\langle R_h^{-1} \rangle^{-1}$ remaining almost constant as T increases from 300 to 600 K
(Figure 3.2). Higher T were not tested because, according to the observed trend,
extremely high T would be required to produce structures of the required size.
The continued preference for compact structures even at high T may be related
to the known over-stabilisation of salt bridges312,316,369,376,377 and hydrogen
bonds313,378 by GB/SA. It is thought that this may occur because of insuffi-
cient electrostatic screening, which would be expected to be particularly perti-
nent for a highly charged protein such as αS. Whilst these effects have mostly
been observed for the opls-aa implementation of GB, the charmm GB/SA
model has been shown to over-stabilise the NS of a folded protein relative to
explicit water310.
The sasa and eef1 models, in contrast, are much more responsive to changes
in T. The $\langle R_h^{-1} \rangle^{-1}$ increases with T up to ∼ 700 K (Figure 3.2). The same
effect occurs for the IDPs βS and β+HC (Chapter 5) and the acid-denatured
state of PI3-SH3 (Chapter 6), thus it is likely to be a general feature of high T
simulations with these implicit solvent models. This is an important result, as
it means that by manipulating T, an ensemble of structures with an $\langle R_h^{-1} \rangle^{-1}$
that matches the experimental value can be obtained for a DS of any protein.
Simulating at high T also alleviates the lack of convergence observed at
physiological temperatures. The range of Rg sampled by each replica is much
greater at higher T (500–600 K), yet this quantity converges almost as quickly
as at low T (Figure 3.3). The overall SD of the $\langle R_h^{-1} \rangle^{-1}$ of the pooled ensembles
does not increase monotonically with T because this quantity is also affected
by the pooling of multiple replicas, meaning that it reflects both inter- and
intra-replica variation, as discussed previously with regard to the simulations
at physiological T (Section 3.4.1). SDwithin, however, shows a clear increase
with T (Table 3.2). The reversal in the relative contributions of SDbetween and
SDwithin to the overall SD as T increases is indicative of the burgeoning range
of structures sampled by each replica at higher T.
Increasing EK by increasing T provides a simple solution to the two main
difficulties encountered when attempting to simulate DS at physiological tem-
peratures. At elevated T , a wide range of different structures are accessible to
each independent replica, thus the sampling of conformational space is more
comprehensive. Additionally, the structures are of a similar degree of expansion
as is expected for disordered protein states and the global size can be tuned by
altering T . Some additional consequences of simulating at high T are discussed
below.
[Figure 3.2 plot: $\langle R_h^{-1} \rangle^{-1}$ (Å) against Simulation Temperature (K).]
Figure 3.2: The $\langle R_h^{-1} \rangle^{-1}$ of the ensembles generated using sasa (black), eef1
(red) and GB/SA (green) at various temperatures.
Consequences of simulating at high T
When simulating at high T , the effective barriers between different conforma-
tions are reduced because the free energy difference is lower relative to the avail-
able thermal energy. Effectively, the free energy landscape appears smoother
from the point of view of the protein, which confers both advantages and dis-
advantages. It facilitates conformational sampling by increasing the speed of
conformational transitions and the range of accessible conformations. It is also
the reason why the $\langle R_h^{-1} \rangle^{-1}$ increases, as compact structures no longer occupy
deep minima on the free energy surface.
A disadvantage of a smoothed free energy landscape is that minima that do
exist may not be detected. Additionally, the increased rate of sampling prevents
the extraction of meaningful kinetic parameters. This is not an issue for the
purposes described here, but should be considered if kinetics are of interest.
A consequence that is more relevant to the production of ensembles of
structures representative of DS is that the effect of the energy penalty for violation
of the protein-like features encoded in the molecular mechanics force-field, such
as the dihedral and improper angles, is reduced due to the higher overall energies
of the molecule. The Ramachandran plots for ensembles produced using the
sasa and eef1 implicit solvent models at T in the range 300–600 K show that
a more diffuse range of dihedral angles is sampled at higher T (Figure 3.4).
However, the overall nature of the Ramachandran plots remains typical of each
solvent model as T increases, with the sasa model favouring α-helical
structure more than the eef1 model, showing that the dihedral angle preferences are
relatively robust to increases in T within the range considered here.

Table 3.2: $\langle R_h^{-1} \rangle^{-1}$ and SD calculated from the ensembles of αS structures
generated with the sasa and eef1 implicit solvent models at T ranging from
300 to 600 K. SD is the standard deviation of the $\langle R_h^{-1} \rangle^{-1}$ of the entire
ensemble (all replicas and timesteps). SDbetween and SDwithin describe the variation
between replicas and the average variation of each individual replica as outlined
in Section 2.3.2. The $\langle R_h^{-1} \rangle^{-1}$ and all statistics are in Å.

Model   T (K)   $\langle R_h^{-1} \rangle^{-1}$   SD     SDbetween   SDwithin
sasa    300     20.8         0.74   0.64        0.41
eef1    300     22.2         1.30   1.30        0.42
sasa    400     21.1         0.60   0.16        0.57
eef1    400     21.7         0.87   0.57        0.64
sasa    500     26.6         2.98   0.22        2.97
eef1    500     25.3         2.05   0.25        2.03
sasa    600     32.9         3.39   0.25        3.32
eef1    600     32.2         3.26   0.29        3.49
3.5 Comparison with experimental data
Whilst the production of ensembles of structures whose $\langle R_h^{-1} \rangle^{-1}$ matches the
experimental value is a significant result, it is also necessary to consider whether
the description of the protein provided by the force-field and implicit solvent
models is an accurate reflection of more detailed aspects of the nature of the
structures present experimentally. The quality of the ensembles of structures
can be assessed by comparing back-calculated observables with experimental
data. Here, three types of NMR data are considered: long-range PRE distances
equivalent to those obtained from a PRE-NMR experiment, $^3J_{HNH\alpha}$-couplings
and RDCs, all of which have been measured experimentally for αS.

[Figure 3.3 plot: panels labelled S300, E300, S400, E400, S500, E500, S600 and E600; axes: Rg (Å) and 〈Rg(t)〉 (Å) against Timestep.]
Figure 3.3: Comparison of Rg and 〈Rg(t)〉 during simulations run using the
sasa (S) and eef1 (E) implicit solvent models at a range of T (300, 400, 500
and 600 K). The different colours correspond to the first 5 of 16 replicas. Data
for the remainder of the replicas are omitted for clarity. The (black, dashed)
vertical lines on the plots of Rg correspond to the start of the production phase.
This quantity is shown for the heating, equilibration and production phases so
that the collapse that occurs at low T can be seen. 〈Rg(t)〉 is the cumulative
average at each point in time and was calculated for the production phase only
to monitor convergence. The time-scale on the abscissa corresponds to the
frequency at which the coordinates were saved, which was every 2500 integration
steps of 2 fs.
The ensembles
Four different ensembles of αS structures are considered. The first, the
random coil model introduced in Section 3.2 (αRC), provides a baseline from which
to infer the effect of the protein-like features encoded in more detailed
representations. The RDCs calculated from this ensemble are of particular interest given
that one of the few protein-like features it retains is the bulkiness of the amino
acid side chains, which has been shown to correlate with the RDCs measured
for the urea-denatured state of αS but not native αS268. The $\langle R_h^{-1} \rangle^{-1}$ of αRC
also compares favourably with that measured for urea-denatured αS158.
Figure 3.4: Ramachandran plots of p(φ, ψ) for ensembles of αS structures gen-
erated using the (A-D) eef1 and (E-H) sasa implicit solvent models at (A,E)
300 K, (B,F) 400 K, (C,G) 500 K and (D,H) 600 K. The probability of each
combination of φ and ψ dihedral angles is the average over all residues and all
structures in the ensemble. The same scale is used for all plots to facilitate
comparisons.
As an intermediate between the random coil model and the description of
the polypeptide chain engendered by the implicit solvent models, an ensemble
of 5000 structures generated using a self-avoiding statistical coil model based
on backbone conformational preferences from a coil library database265 was
obtained from A. Jha (αCOIL). In addition to the covalent connectivity and
excluded volume effects provided by the random coil model, this model includes
local dihedral angle preferences, including nearest-neighbour effects, but lacks
a description of amino-acid specific long-range interactions such as electrostatic
interactions. Bernado et al. found that RDCs calculated from a similar ensemble
of αS structures are similar to the experimental data for the urea-unfolded state,
but the RDCs for native αS in purely steric alignment media are best reproduced
when only those structures exhibiting long-range interactions between the N-
and C-termini are considered266. As with the random coil model, the $\langle R_h^{-1} \rangle^{-1}$
of αCOIL is similar to that of urea-denatured αS158.
The final two ensembles were produced in accordance with the results of
Section 3.4.2 using the sasa and eef1 implicit solvent models at high T. The
T was chosen separately for each implicit solvent model so that the $\langle R_h^{-1} \rangle^{-1}$
matched the experimental value for αS158,235, thus ensuring that in terms of
global dimensions, the structures are equivalent on average to those present
experimentally. The ensemble of structures produced using sasa (αSASA) was
generated at 570 K and that produced using eef1 (αEEF1) at 600 K.
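One way to obtain a first estimate of a suitable T is to interpolate between the tabulated simulations. The Python sketch below interpolates linearly between the 500 K and 600 K entries of Table 3.2; linear interpolation is an assumption made for illustration and not the procedure by which the temperatures quoted above were chosen, so the estimates differ somewhat from 570 K and 600 K. The names are hypothetical.

```python
import numpy as np

# <Rh^-1>^-1 (Å) at 500 K and 600 K, taken from Table 3.2
T_BRACKET = np.array([500.0, 600.0])
RH_BRACKET = {"sasa": np.array([26.6, 32.9]),
              "eef1": np.array([25.3, 32.2])}

def estimate_temperature(model, target_rh):
    """Linearly interpolate the T (K) expected to give target_rh (Å).

    Only valid for targets between the 500 K and 600 K values.
    """
    return float(np.interp(target_rh, RH_BRACKET[model], T_BRACKET))
```

For the experimental value of 31.9 Å this gives roughly 584 K for sasa and 596 K for eef1, so such an estimate would need to be refined by further simulations near the predicted temperature.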
PRE distances
The long-range distances obtained from PRE-NMR provide information about
the tertiary structure of the conformations present. Because distances
corresponding to $I_{ox}/I_{red} < 0.15$ or $I_{ox}/I_{red} > 0.85$ cannot be calculated exactly,
these distances are represented in the experimental dataset as $d_{ij}^{0.15}$ and $d_{ij}^{0.85}$,
respectively. Any correlation between the experimental and back-calculated
distances observed for $d_{ij}^{0.15} < d_{ij} < d_{ij}^{0.85}$ is therefore not expected to extend to
$d_{ij} > d_{ij}^{0.85}$ and $d_{ij} < d_{ij}^{0.15}$. For this reason, only distances in the range
$d_{ij}^{0.15} < d_{ij} < d_{ij}^{0.85}$ contribute to the Q values reported in Figure 3.5. The PRE
distances were defined as the distance between the Cα atom of the spin-labelled
side-chain and the amide hydrogen atom; this definition is explained and
justified in Section 4.3.
The ensemble-averaged PRE distances back-calculated from αSASA and
αEEF1 are almost all shorter than those recorded experimentally (Figure 3.5).
This can be reconciled with the fact that the average size of the molecules is
by definition the same as in the experiment by considering the sensitivity of
the r−6 average to small values of r. If the distance distributions are broader
than those present in vitro, the r−6-averaged PRE distances will be shorter, as
is seen here, even though the $\langle R_h^{-1} \rangle^{-1}$, which is a
near-linear average, is similar
to the experimental value. This concept is discussed further in Chapter 4.
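The sensitivity of the r−6 average to short distances can be illustrated numerically. The following is a minimal sketch in Python and is not part of the original analysis: the two distance lists are hypothetical values chosen so that their linear means coincide, and the function names are illustrative only.

```python
def linear_average(distances):
    """Plain arithmetic mean <r> of a list of distances."""
    return sum(distances) / len(distances)


def r6_average(distances):
    """PRE-style average <r^-6>^(-1/6), dominated by the shortest distances."""
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


# Two hypothetical distance distributions (in Å) with the same linear mean.
narrow = [18.0, 20.0, 22.0]
broad = [8.0, 20.0, 32.0]

for name, dist in (("narrow", narrow), ("broad", broad)):
    print(name, round(linear_average(dist), 2), round(r6_average(dist), 2))
```

Both lists have a linear mean of 20 Å, but the r−6 average of the broad list is pulled strongly towards its shortest member, which is the behaviour invoked above to reconcile short PRE distances with correct global dimensions.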
The distances calculated from αRC and αCOIL are generally of a similar
size to those determined experimentally, although there is more variation in
the calculated distances, as quantified by the larger Q values. Given that the
$\langle R_h^{-1} \rangle^{-1}$ of these ensembles are significantly larger than the experimental value,
this again suggests that the distance distributions characteristic of these models
are broader than those contributing to the experimental observable.
[Figure 3.5 plot: four scatter panels (A-D) of calculated PRE distance $d_{ij}^{calc}$ (Å) against experimental PRE distance $d_{ij}^{exp}$ (Å), both axes 0-40 Å. Panel annotations, in extraction order: Q = 0.41, Q = 0.27, Q = 0.25, Q = 0.25.]
Figure 3.5: Comparison of the PRE distances back-calculated from (A) αRC,
(B) αCOIL, (C) αEEF1 and (D) αSASA with the experimental data. The red
line corresponding to a perfect agreement is shown to guide the eye. The Q
values were calculated according to equation 2.14 using only
$d_{ij}^{0.15} < d_{ij} < d_{ij}^{0.85}$.
3JHNHα-couplings
The 3JHNHα-couplings were back-calculated and compared with the experi-
mental values obtained from C. Bertoncini158. Because there are no Hα atoms
in the charmm19 representation, the 3JHNHα-couplings were computed indi-
rectly from the φ angles (equations 2.10 and 2.11). 3JHNHα-couplings measured
experimentally are around 3–5 Hz for helical structure (including α-helix and
PPII) and 8–11 Hz for β-sheet structure. For a random coil, a weighted average
of these values is expected, typically around 6–8 Hz.
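The indirect calculation of 3JHNHα-couplings from φ can be sketched with a Karplus relation. The example below is a hedged illustration only: equations 2.10 and 2.11 are not reproduced in this section, so the functional form and the coefficients (the commonly used Vuister and Bax values) are assumptions, and the φ angles are hypothetical.

```python
import math

# Karplus coefficients in Hz. These are the widely used Vuister and Bax (1993)
# values, assumed here; equations 2.10 and 2.11 of the thesis may use others.
A, B, C = 6.51, -1.76, 1.60


def j_hnha(phi_deg):
    """3J(HN,Hα) in Hz for a single backbone φ angle given in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def ensemble_j(phi_values_deg):
    """Average the coupling itself (not the angle) over all structures."""
    return sum(j_hnha(phi) for phi in phi_values_deg) / len(phi_values_deg)


# Hypothetical φ angles for one residue in a small ensemble.
print(round(j_hnha(-60.0), 2))    # helical φ
print(round(j_hnha(-120.0), 2))   # extended φ
print(round(ensemble_j([-60.0, -120.0, -75.0]), 2))
```

Averaging the coupling rather than the angle matters because the Karplus curve is non-linear in φ, so the coupling of the mean angle generally differs from the mean coupling.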
The 3JHNHα-couplings computed for αRC, however, are close to 5 Hz for all
residues, as are those for αEEF1 and αSASA (Figure 3.6). The only ensemble to
show significant variation in the 3JHNHα-couplings throughout the sequence is
αCOIL, reflecting the inclusion of amino-acid specific dihedral angle preferences
in the structural parameters of this model265. Even so, the agreement with
experimental data is poor, and the average value of the 3JHNHα-couplings is
again ∼ 5 Hz.
Although the 3JHNHα-couplings of all four ensembles resemble the values
expected for helical structure, this does not necessarily imply that the struc-
tures are predominantly α-helical. 3JHNHα-couplings are only sensitive to the φ
dihedral angle, thus PPII and α-helical structure cannot be distinguished. Ex-
amination of the Ramachandran plots shows that for all four ensembles, PPII
structure is favoured over extended β-sheet-like structure (Figure 3.7), thus ex-
plaining why the 3JHNHα-couplings are lower than is expected for a random
coil. The Ramachandran plot for αCOIL is noticeably different to those of the
ensembles generated using charmm, in keeping with the distinctive 3JHNHα-
couplings computed for this ensemble. The distribution of dihedral angles is
not as smooth and there are small regions of particularly high probability den-
sity corresponding to PPII and α-helical structure.
All of the four ensembles considered here fail to reproduce the experimental
3J-couplings, indicating that the dihedral angles favoured by the random coil
model, the charmm19 force-field in combination with either the sasa or eef1
implicit solvent models and even the coil library database are not the same as
those sampled by αS in solution. Additionally, it shows that reproduction of the
global scaling of a polypeptide chain is not sufficient to gauge whether the local
structure is correct, in keeping with the numerous experimental and theoretical
results discussed in Section 1.3.2.
[Figure 3.6 plot: four panels (A-D) of 3JHNHα (Hz), 3-9 Hz, against residue number, 0-140.]
Figure 3.6: Comparison of the 3JHNHα-couplings back-calculated from (A) αRC,
(B) αCOIL, (C) αEEF1 and (D) αSASA, shown in red, with the experimental
3JHNHα-couplings158, shown in black.
Figure 3.7: Ramachandran plots of the dihedral angle distributions p(φ, ψ) for
(A) αRC, (B) αCOIL, (C) αEEF1 and (D) αSASA. The probability of each
combination of φ and ψ dihedral angles is the average over all residues and all
structures in the ensemble. The same scale is used for all plots to facilitate
comparisons.
RDCs
RDCs combine information about both the local structure and the overall
alignment properties of the molecule, and thus provide the most stringent test
of the quality of the ensembles of αS structures. The NH RDCs measured in n-
octyl-penta(ethylene glycol)/octanol (C8E5/octanol) and Pf1 bacteriophage240
were therefore obtained from C. Bertoncini and M. Zweckstetter, respectively,
and compared to RDCs back-calculated from each of the four ensembles.
The NH RDCs were back-calculated using the steric version of pales190.
It was not possible to obtain predictions using the electrostatic version. Steric
pales estimates A, the tensor describing the average solute orientation with
respect to the magnetic field, by excluding the fraction of the solute orienta-
tions that sterically clash with the alignment media, and averaging the individ-
ual alignment matrices, A’, calculated from the atomic coordinates of a given
structure for each non-obstructed position and orientation. The independent
prediction of the alignment tensor for each individual conformation is impor-
tant as differential alignment is expected to provide the greatest contribution
towards non-zero RDCs measured for DS193–195.
Other methods for calculating the alignment tensor have also been
described179, including an efficient approach in which the steric alignment prop-
erties are derived from the information regarding the shape asymmetry present
in the molecular inertia tensor192. A similar approximation was used by Sosnick
et al. to calculate RDCs from coil library ensembles of a variety of proteins257.
To test the method of Sosnick et al., the coil library ensemble of ubiquitin
structures analysed in their study was obtained. Comparison of the NH RDCs
calculated using their method and using pales revealed that the two methods
give very different results (data not shown). Because pales is a well-established
program that has been widely and successfully applied, it is used for all calcu-
lations reported here.
In C8E5/octanol, the alignment is expected to be purely steric. Pf1 bacterio-
phage, in comparison, bear an overall negative charge on the protein-accessible
side379, thus at low salt concentrations the positively charged N-terminal do-
main of αS is expected to be strongly attracted to the phage, and the negatively
charged C-terminal domain should be repelled. Despite this, NH RDCs calcu-
lated from a coil library ensemble of αS structures using steric pales were
found to give a reasonable agreement with the experimental data measured in
Pf1 alignment media269, thus the use of steric pales here is acceptable.
To complement the previous analyses of the convergence properties of the
multiple replica simulations in terms of the Rg (Section 3.4), the RDCs aver-
aged individually for each replica are compared to the RDCs averaged over the
entire ensemble (Figure 3.8 A). The results are reported for αSASA only but
were similar for the other ensembles (data not shown). The ensemble-averaged
RDCs for each replica are quite different, indicating that each replica explores a
different region of conformational space. This highlights the utility of carrying
out multiple-replica simulations, as significantly longer simulation times would
be required to obtain the same coverage of conformational space with a single
replica.
To investigate further the range of RDCs contributing to the overall av-
erage and how this is affected by the number of contributing structures, the
average and its SD were calculated for various fractions of the αSASA ensemble
(Figure 3.8 B-E). The SD remains consistently high even when all of the 51 355
structures are considered. It is noteworthy that the SD is an order of magnitude
larger than the average RDCs, which lends some uncertainty to the significance
of the fine structure of the RDC pattern along the sequence from which the
presence of residual structure is often inferred.
[Figure 3.8 plot: panels A-E and G-J show RDC (Hz) against residue number, 0-140; panel F shows 〈SE〉 against %(Nstruct) at 1, 10, 20, 50 and 100%.]
Figure 3.8: RDCs calculated from various fractions of αSASA. (A) RDCs for
each residue averaged over all structures sampled by each individual replica,
shown in a different colour for each replica, and over the entire ensemble,
shown as a thick black line. (B-E) Ensemble-averaged RDCs for each residue and
their SD where the ensemble comprises (B) 1, (C) 10, (D) 20 and (E) 100% of
αSASA. (F) The 〈SE〉 in the ensemble-averaged RDCs when the ensemble comprises
varying proportions of αS, where the brackets denote averaging over all
residues. (G-J) Ensemble-averaged RDCs for each residue and their SE where the
ensemble comprises (G) 1, (H) 10, (I) 20 and (J) 100% of αS. The grey lines at
0 Hz in (A) and (G-J) are to guide the eye. Note that different scales are used
for plots A-E and G-J.
The SE in the ensemble-averaged RDCs calculated from varying proportions
of the αS ensemble was also computed (Figure 3.8 F-J). The most significant
decrease in the 〈SE〉, where the brackets indicate averaging over all residues,
occurs when going from 1 to 10% of the ensemble, whereas when 50% of the
structures are considered, the SE is comparable to that of the entire ensemble
(Figure 3.8 F). In addition to the reduction in the SE, the ensemble-averaged
RDCs for each residue also change as the number of contributing structures in-
creases (Figure 3.8 G-J). The range of RDC values and their fluctuations along
the sequence are much greater when fewer structures are considered. This im-
plies the need for caution when comparing RDCs back-calculated from synthetic
ensembles to experimental data, as if there are insufficient structures, the aver-
age will not be converged. Minimising the SE provides a means of determining
the appropriate number of structures to use.
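The convergence check described above can be sketched as follows. This is an illustrative example under stated assumptions rather than the procedure actually used: the per-structure RDCs are synthetic Gaussian numbers standing in for values back-calculated for a single residue, the sub-ensembles are drawn at random, and the function names are hypothetical.

```python
import math
import random
import statistics


def mean_sd_se(values):
    """Return the mean, sample SD and standard error of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / math.sqrt(len(values))


def convergence_table(per_structure_rdcs, fractions, seed=0):
    """Mean, SD and SE of one residue's RDC for random sub-ensembles."""
    rng = random.Random(seed)
    rows = []
    for fraction in fractions:
        n = max(2, int(len(per_structure_rdcs) * fraction))
        subset = rng.sample(per_structure_rdcs, n)
        rows.append((fraction, n) + mean_sd_se(subset))
    return rows


# Synthetic stand-in for per-structure RDCs (Hz) of a single residue:
# a small mean with a spread roughly an order of magnitude larger.
_rng = random.Random(1)
rdcs = [_rng.gauss(1.0, 15.0) for _ in range(50_000)]

for fraction, n, mean, sd, se in convergence_table(rdcs, [0.01, 0.1, 0.2, 0.5, 1.0]):
    print(f"{fraction:>5.0%} n={n:>6} mean={mean:6.2f} SD={sd:5.2f} SE={se:5.3f}")
```

In such a sketch the SD stays roughly constant as structures are added while the SE falls approximately as the inverse square root of the number of structures, which is why the SE, and not the SD, is the useful convergence criterion.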
The αEEF1 and αSASA ensembles contain 31 262 and 51 355 structures,
respectively, which is sufficient that the RDCs computed from the entire ensem-
ble are converged. αRC, however, only contains 10 000 structures and αCOIL
is even smaller, as only 5000 structures were provided by A. Jha. This is almost
certainly too few for the RDCs to be converged. Nevertheless, the RDCs were
back-calculated from all four ensembles and compared to the experimental data
(Figure 3.9).
Whilst the calculated and experimental RDCs are of similar magnitude,
the residue-specific agreement is poor for all four ensembles. For αEEF1 and
αSASA, the calculated RDCs are most like those measured in Pf1 and the
magnitude is low throughout the sequence, whereas for αRC and particularly
αCOIL the calculated RDCs are closer to those measured in C8E5/octanol.
The two prominent peaks in the C-terminus visible in the experimental data
measured in both media are not reproduced by any of the ensembles. αCOIL,
which provides a reasonable match to the first peak of the C8E5/octanol data,
performs the best in this regard, but the RDCs in the N-terminus are larger
than those measured in either medium. Thus comparison of the back-calculated
and experimental RDCs suggests that, as was seen for the 3JHNHα-couplings,
reproduction of the global dimensions can occur independently of an accurate
representation of the local structure.
Additional structural features can be introduced by selecting only certain
structures according to pre-defined criteria. A previous study found that the
RDCs back-calculated from subsets of structures selected according to the for-
mation of particular intra-molecular contacts are in good agreement with the
experimental RDCs for native αS266. To investigate this conjecture, two sub-
ensembles of αS structures were selected from αCOIL by choosing only struc-
tures with at least one contact between residues 1− 20 and 120− 140 (αCOIL-
C:N) or residues 61− 95 and 110− 140 (αCOIL-C:NAC). αCOIL-C:N was de-
signed to mimic the filtered ensemble found to best reproduce the experimental
data by Bernado et al.266, and αCOIL-C:NAC incorporates the intra-molecular
contacts found to be most probable in the PRE-ERMD study of αS by Dedmon
et al.205. Here, a contact is said to occur if the Cα atoms of two residues are
closer than 15 Å, following the definition used by Bernado266.
[Figure 3.9 plot: four panels (A-D) of RDC (Hz), -5 to 10 Hz, against residue number, 0-140.]
Figure 3.9: Comparison of (black) the RDCs back-calculated from (A) αRC, (B)
αCOIL, (C) αEEF1 and (D) αSASA with the experimental RDCs measured in
(red) C8E5/octanol and (green) Pf1 bacteriophage240. The grey line at 0 Hz is
to guide the eye.
The magnitude of the RDCs for each sub-ensemble varies more throughout the
sequence than that of the RDCs calculated from αCOIL, but it is not possible to
determine whether this effect arises from differences between the original and
filtered ensembles or simply results from fewer structures contributing to the
average. Even the 5000 structures comprising αCOIL are unlikely to be
sufficient to produce fully converged RDCs and the filtering procedure further
reduces the sizes of the ensembles to 1302 structures (αCOIL-C:N) and just 173
structures (αCOIL-C:NAC). Neither αCOIL-C:N nor αCOIL-C:NAC provides a good
match to the experimental data (Figure 3.10), but again this may be due to the
relatively small size of these ensembles, thus it is difficult to draw any
meaningful conclusions regarding either how well these ensembles describe the
NS of αS in solution, or the relative roles that local and global structure
play in determining the RDCs.
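The contact-based filtering used to build the sub-ensembles can be sketched as below. This is a minimal illustration, assuming each structure is available as a mapping from 1-based residue number to Cα coordinates in Å; the data layout, the function names and the two toy structures are hypothetical, and only the 15 Å cutoff and the residue ranges come from the text.

```python
import math


def has_contact(ca_coords, region_a, region_b, cutoff=15.0):
    """True if any Cα pair between two inclusive residue ranges is within cutoff (Å).

    ca_coords maps a 1-based residue number to an (x, y, z) tuple.
    """
    for i in range(region_a[0], region_a[1] + 1):
        for j in range(region_b[0], region_b[1] + 1):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                return True
    return False


def filter_ensemble(ensemble, region_a, region_b, cutoff=15.0):
    """Keep only the structures with at least one inter-region contact."""
    return [s for s in ensemble if has_contact(s, region_a, region_b, cutoff)]


# Two toy 140-residue Cα traces: a straight rod and a chain folded back on itself.
rod = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 141)}
hairpin = {i: (3.8 * i, 0.0, 0.0) if i <= 70 else (3.8 * (141 - i), 5.0, 0.0)
           for i in range(1, 141)}

# Selection analogous to αCOIL-C:N: residues 1-20 against residues 120-140.
selected = filter_ensemble([rod, hairpin], (1, 20), (120, 140))
print(len(selected))
```

Because the filter can only discard structures, the size of the selected sub-ensemble should always be reported alongside any averaged observable, for the convergence reasons discussed above.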
3.6 Conclusions
Any simulation should be carried out using the best protocol available. For the
simulation of polypeptides in water, this means using an explicit water model.
The computational cost, however, renders such an approach impractical for sim-
ulating DS. Implicit solvent models are less accurate, but the computational cost
is greatly reduced. Simulations of the IDP αS carried out using three common
implicit solvent models at physiological temperature did not converge, and the
structures were too compact with respect to the experimental
$\langle R_h^{-1} \rangle^{-1}$.
[Figure 3.10 plot: four panels (A-D) of RDC (Hz), -5 to 10 Hz, against residue number, 0-140.]
Figure 3.10: Comparison of the RDCs back-calculated from (blue) (A,C)
αCOIL-C:N and (B,D) αCOIL-C:NAC with (A,B) (black) αCOIL and (C,D)
the experimental RDCs measured in (red) C8E5/octanol and (green) Pf1
bacteriophage240. The grey line at 0 Hz is to guide the eye.
The random coil model did not suffer from the problems of compaction and
convergence, but at the expense of eliminating many of the protein-like features.
Increasing T with either the eef1 or sasa implicit solvent models allowed con-
verged ensembles consisting of sufficiently expanded structures to be generated.
Whilst raising T is not an ideal solution, there is not yet a force-field and solva-
tion model available that is capable of recognising that a protein is intrinsically
disordered from the sequence alone.
Comparison of the PRE distances, 3JHNHα-couplings and RDCs back-calculated
from the random coil ensemble, a coil library ensemble and two ensembles gen-
erated at high T using the sasa and eef1 implicit solvent models showed that
even when the global dimensions of the polypeptide are reproduced, the local
and long-range structure is not equivalent to that detected experimentally, in-
dicating that the types of structures sampled by the models considered here
are not a good representation of those accessible to αS in vitro. It is unlikely
that carrying out the MD at elevated T is the sole cause of the discrepancies,
as the local structure is relatively impervious to increasing T . Despite exhibit-
ing significantly different 3JHNHα-couplings to the remainder of the ensembles,
αCOIL did not provide a better match to the experimental data, thus for αS,
at least, selecting the dihedral angles from a coil library database is not suffi-
cient to describe the local structure. Moreover, unlike in other studies of coil
library ensembles of αS structures, filtering the ensemble to extract only those
structures exhibiting specific types of tertiary structure did not improve the
agreement of the calculated RDCs with the experimental data. An alternative
to filtering that eliminates the need for an ad hoc choice of selection criteria
is to use experimental data as restraints during the simulation process. Such
methods are used throughout the remainder of this work; specifically, the use
of long-range distances from PRE-NMR in ERMD is investigated as a means of
generating more representative ensembles.
Chapter 4
Improving the accuracy of ensemble-restrained molecular dynamics
4.1 Introduction
As discussed in Chapter 3, the simulation of DS is made difficult by the het-
erogeneous and expanded nature of the structures comprising DS ensembles.
Molecular dynamics simulations in explicit water represent the most realistic
and accurate means of simulating macromolecules in solution. However, this
method is impractical for simulating DS due to the high computational cost re-
sulting from the large water box required to accommodate the expanded struc-
tures typical of DS and the long simulation time necessary for convergence.
Whilst implicit solvent models are much faster, high simulation temperatures
must be used to generate sufficiently expanded structures. The smoothing of
the free energy landscape induced by simulating at high temperatures with sim-
plified solvation models means that some of the structures gathered in such
simulations may not be physiologically relevant. Indeed, it was found in
Chapter 3 that even for ensembles whose $\langle R_h^{-1} \rangle^{-1}$ is
consistent with the experimental
value, the back-calculated PRE distances, 3JHNHα-couplings and RDCs bear lit-
tle resemblance to those measured experimentally. Restraining the simulations
with experimental observables provides a means of overcoming these problems
by restricting the sampling of conformational space to encompass only those
structures that are compatible with the experimental data.
Experimental observables measured for DS are averages over broad distribu-
tions, as DS ensembles typically encompass a heterogeneous range of structures,
meaning that it is not appropriate to apply the restraints to a single replica.
ERMD is therefore explored as a means of reconstructing DS in silico, with the
aim of establishing a general ERMD method that can be used to characterise
any DS. Rather than rely on the limited experimental data available for DS, the
method is developed using synthetic data back-calculated from two different
reference ensembles.
There are many advantages of measuring the success of the method accord-
ing to its ability to reproduce known reference ensembles. Problems related to
possible inaccuracies in the experimental data and in the translation of exper-
imental NMR signals into structural restraints are avoided344. Moreover, the
ensembles produced using ERMD can be compared to the reference ensembles
from which the restraints were calculated in terms of both averages and dis-
tributions. This is important because ensembles are best described in terms
of distributions, which are generally not accessible experimentally. Addition-
ally, as it is not clear how to restrain a distribution, it is necessary to develop
methods in which average values are restrained, but the accuracy of the result-
ing ensemble is assessed in terms of distributions. Even this definition may be
insufficient if correlated motions are present, as then two ensembles may differ
even if all their distance distributions are equal. The use of a synthetic reference
ensemble allows the presence of correlations and their effective reproduction to
be investigated.
The intrinsically disordered protein αS112, introduced in Section 1.2.1, is
used as a model system. This 140-residue protein has been studied previously
by both PRE-NMR205,240 and ERMD205. Two different reference ensembles
were generated using unrestrained MD. Neither is expected to be an exact re-
flection of the ensemble of structures sampled by native αS under experimental
conditions; if they were, then restraints would not be required. The exact na-
ture of the reference ensembles is in fact somewhat arbitrary, as the aim is to
find the optimal computational procedure for reproduction of a variety of known
reference ensembles, and thus establish a general method for using ERMD to
characterise DS ensembles.
The range of quantitative experimental observables that can be measured
for DS is quite limited, as discussed in Section 1.3.1. Additionally, not all
measurable quantities are suitable for use as restraints; for instance, methods for
restraining RDCs across multiple replicas have not yet been developed due to the
difficulties imposed by the need to calculate a separate alignment tensor for each
conformation. This thesis focuses on PRE-ERMD, in which long-range distances
obtained from PRE-NMR are used as restraints. These data are expected to
enhance the description of protein-like features already present in the force-field
and implicit solvent model by providing information about the tertiary structure
of the molecule, particularly where the distances are between residues far apart
in sequence. Many of the obstacles encountered here and the solutions proposed
are expected to be relevant for other types of data as well.
In Section 4.2, theoretical considerations related to model-fitting and the
relationship between averages and distributions are outlined. The way in which
the equivalent of the distances obtained from the PRE-NMR experiment are
defined within the context of the simulations is then explained and justified with
reference to experimental data describing the motion of a spin-label attached
to a polypeptide. Following this, the generation of the reference ensemble and
calculation of synthetic distance restraints are described. Preliminary results are
reported in Section 4.4.4 and the causes of various issues that arise from these
and the solutions that are found are discussed. The resulting general method
is applied in Section 4.6 and finally, the calculated and reference ensembles are
compared using novel techniques.
4.2 Theoretical aspects of ERMD
Optimisation of the PRE-ERMD procedure requires consideration of various
issues that are best thought of in the context of ERMD as a model-fitting
process. Fitting a model involves obtaining the best fit to the data without using
an unnecessarily and unjustifiably large number of free parameters. If there are
too many degrees of freedom relative to the amount of information, under-
restraining occurs331. In the case of ERMD, the number of free parameters is
determined by the number of atoms comprising the polypeptide and the number
of replicas. The sources of structural information are the protein-like features
encoded in the force-field and implicit solvent model and the observables used
as restraints. The number of replicas therefore cannot become too large since as
this number grows, the experimental information quickly becomes insufficient
to define the structures of all of the replicas.
The opposite problem, over-restraining, arises because experimental data are
ensemble-averages over hundreds or thousands of molecules which, depending
on the relative time and spatial scales of the molecular motion and the exper-
imental measurement, may sample many different conformations. DS, in par-
ticular, comprise a heterogeneous and broad range of structures. Consequently,
it is not appropriate to enforce restraints upon a single replica, since a single
structure compatible with all of the restraints is unlikely to be representative
of the structures actually present, and may in fact be physically impossible to
obtain331,341,380.
The number of replicas must therefore be carefully chosen so as to avoid over-
or under-restraining the data. The standard means of determining the optimal
number of replicas is cross-validation331,381–384. Typically, around 20% of the
data are excluded from the working dataset (the restraints). Reproduction of
these free data provides a more stringent test than satisfaction of the restraints.
This is because, unlike the satisfaction of the restraints, which generally im-
proves with more replicas, reproduction of the free data becomes worse. This
type of cross-validation is particularly effective in identifying the appearance of
under-restraining, but it may be insufficient when over-restraining is present385.
For example, compact conformations of ∆131∆ pass this cross-validation test,
even if they are over-restrained204.
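The partition into working and free datasets can be sketched as follows. This is an illustrative example only: the restraint tuples are hypothetical, the random split is one of several possible ways of choosing the free data, and a plain RMS deviation is used as a stand-in score because the Q value of equation 2.14 is not reproduced in this section.

```python
import math
import random


def split_restraints(restraints, free_fraction=0.2, seed=0):
    """Randomly partition restraints into (working, free) lists."""
    rng = random.Random(seed)
    shuffled = list(restraints)
    rng.shuffle(shuffled)
    n_free = round(len(shuffled) * free_fraction)
    return shuffled[n_free:], shuffled[:n_free]


def rms_deviation(restraints, calculated):
    """RMS deviation (Å) between restraint targets and back-calculated distances.

    calculated maps (spin_label_residue, amide_residue) to a distance in Å.
    """
    squared = [(target - calculated[(i, j)]) ** 2 for i, j, target in restraints]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical restraints: (spin-label residue, amide residue, target distance in Å).
restraints = [(i, j, 15.0 + 0.1 * abs(i - j))
              for i in (20, 60, 100)
              for j in range(5, 140, 9) if abs(i - j) > 4]

working, free = split_restraints(restraints)
print(len(working), len(free))
```

Only the working list would be passed to the restrained simulation; the free list is scored afterwards, for each number of replicas tested, to locate the point at which reproduction of the unrestrained data starts to deteriorate.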
However, even if both the working and free datasets are reproduced, this
is not necessarily sufficient to guarantee that the underlying distributions are
accurately reconstructed. Many different distributions can give rise to the same
average386 (Figure 4.1 A and B). Equally, two mostly similar distributions can
have a different average, if the region of the distributions to which that type of
average is most sensitive differ. These effects occur because different types of
average report on different aspects of the underlying distribution (Figure 4.1). A
linear or near-linear average, for instance, lies near the centre of the distribution,
whereas a highly non-linear average such as an r−6 average lies towards the edge,
and is most influenced by the outliers in the distribution.
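The two types of average contrasted above can be written out explicitly. The block below is a sketch in standard notation, assuming an ensemble of N structures with distances r_k; the symbols p and r_s, for the population and distance of a short-distance sub-state, are introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
\langle r \rangle &= \frac{1}{N}\sum_{k=1}^{N} r_k , \\
\langle r^{-6} \rangle^{-1/6} &= \left( \frac{1}{N}\sum_{k=1}^{N} r_k^{-6} \right)^{-1/6} , \\
\langle r^{-6} \rangle^{-1/6} &\le \langle r \rangle
  \quad \text{(equality only if all } r_k \text{ are equal)} , \\
\langle r^{-6} \rangle^{-1/6} &\approx p^{-1/6}\, r_s
  \quad \text{if a fraction } p \text{ of the structures lies at } r_s \ll \text{all other } r_k .
\end{align}
```

The last line makes the outlier sensitivity concrete: a sub-state populated at only 10% gives an r−6 average of about 1.47 r_s, almost independent of where the remaining 90% of the distances lie.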
If more than one type of average is known, they can be combined to give
more information about the shape of the underlying distribution (Figure 4.1 C).
Proof of this principle was given by Choy et al.387, who obtained a more precise
description of the Rg distribution of an unfolded protein when they fitted the
distribution function to both the $\langle R_g^{2} \rangle^{1/2}$ derived from
SAXS and the $\langle R_h^{-1} \rangle^{-1}$
measured by PFG-NMR simultaneously. It also forms the basis of one aspect
of the improvements to the previously published ERMD method made in this
chapter.
4.3 Definition of PRE distances
In a SDSL PRE-NMR experiment, the spin-label is covalently attached to the
sulfur atom of an introduced cysteine residue.
[Figure 4.1 plot: three panels (A-C) of p(Distance), 0-0.06, against Distance, 0-100, with the positions of <r> and <r−6>−1/6 marked.]
Figure 4.1: The relationship between different types of average and the
underlying distribution. Two distributions for which (A) the r−6 averages are
equal but the linear averages are different, (B) the linear averages are equal
but the r−6 averages are different and (C) both types of average are the same.
All of the experimental data
used in Chapters 5 and 6 were obtained using the nitroxide spin-label MTSL
(Figure 4.2). The reduction in the 1H peak intensity in the 1H-15N HSQC
spectrum that occurs due to the presence of the paramagnetic spin-label is
dependent on the r−6 average of the distance between the free electron of the
spin-label and the amide hydrogen whose cross-peak is affected. This average
distance can be calculated from Iox/Ired, the ratio of the intensity of the peaks
in the 1H-15N HSQC spectrum when the spin-label is in its oxidised and reduced
states, using equations 2.6 and 2.7199,208,224.
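The conversion of an intensity ratio into a distance can be sketched numerically. Equations 2.6 and 2.7 are not reproduced in this section, so the example below assumes the widely used Battiste and Wagner form, in which Iox/Ired depends on the paramagnetic rate enhancement R2sp and the distance follows from R2sp through a Solomon-Bloembergen expression. The intrinsic rate, evolution time, correlation time and proton Larmor frequency are hypothetical values, not those of the experiments described here.

```python
import math

# Constant for a nitroxide electron and a proton (cm^6 s^-2), as commonly quoted.
K = 1.23e-32


def intensity_ratio(r2sp, r2, t):
    """Iox/Ired for a paramagnetic rate enhancement r2sp (s^-1).

    r2 is the intrinsic transverse rate (s^-1) and t the evolution time (s).
    """
    return r2 * math.exp(-r2sp * t) / (r2 + r2sp)


def solve_r2sp(ratio, r2, t, upper=1.0e4):
    """Bisection for r2sp; intensity_ratio falls monotonically as r2sp rises."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if intensity_ratio(mid, r2, t) > ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pre_distance(ratio, r2, t, tau_c, larmor_hz):
    """Electron-proton distance in Å corresponding to a given Iox/Ired."""
    r2sp = solve_r2sp(ratio, r2, t)
    omega = 2.0 * math.pi * larmor_hz
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega * tau_c) ** 2)
    r_cm = (K * spectral / r2sp) ** (1.0 / 6.0)
    return r_cm * 1.0e8  # convert cm to Å


# Hypothetical experimental parameters, for illustration only.
params = dict(r2=10.0, t=0.010, tau_c=4.0e-9, larmor_hz=600.0e6)
for ratio in (0.15, 0.50, 0.85):
    print(ratio, round(pre_distance(ratio, **params), 1))
```

The ratios 0.15 and 0.85 are included because they mark the range outside which distances cannot be calculated exactly and are instead represented by bounds.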
In order to use distances derived from PRE-NMR as restraints, it is necessary
to define an equivalent set of distances within the context of the simulation. It
is possible to use any arbitrary definition for the purpose of calculating and
implementing the synthetic PRE distance restraints, but to ensure that the
method is applicable with experimental distance restraints, the same definition
should be used for both. The atomic coordinates of the amide hydrogens may
simply be extracted from the coordinates of the molecule. Demarcating the
electron-proton distance is complicated, however, by the range of issues relating
to the representation and behaviour of the spin-label outlined below.
If the overall protein structure can be assumed to be rigid, such as in the
determination of a folded NS206–210, then correct interpretation of the electron-
proton distance requires knowledge of the orientation and motion of the spin-
label. Various attempts have therefore been made to model the spin-label side
chain explicitly. Single-replica simulated annealing with the spin-label included
has been used to determine folded native structures208–210, and the paramag-
netic group has been prepended to known X-ray structures to check the quality
of the experimental PRE distances206,209. In one study, flexibility of the para-
magnetic group was accounted for by using a multiple-conformer representation
and an S2 order parameter for the PRE interaction vector, but the remainder of
the molecule was represented by a single structure207. This approach is difficult
to implement, however, when PRE distance restraints from multiple spin-label
positions are used simultaneously, thus it was not attempted here.
EPR spectroscopy has shown that at solvent-exposed loop sites, MTSL has
high isotropic mobility388. When attached to the exposed surface of a helix,
the g+g+ conformation, with χ1 and χ2 ∼ 300°, is dominant, but this weak
conformational preference is easily overcome by specific favourable interactions
or the need to minimize steric clashes389. Rotation about the disulphide bond
is constrained due to a relatively high activation energy of ∼ 7 kcal·mol−1 390,
resulting in a preferred rotamer of χ3 ∼ 270° 391. Thus of the 5 rotatable bonds
between the nitroxide ring and the polypeptide backbone, there is only
significant motion about the χ4 and χ5 bonds388,389, of which rotation about χ4 is
the primary determinant of the position of the nitroxide ring388,391.
When attached to exposed loop regions, the nitroxide of MTSL has high
flexibility389, corresponding to the large range of allowed χ4 values of the g+g+
state. Rotation about χ4 moves the nitrogen atom of the nitroxide on a circle of
points of diameter ∼ 6 Å391 (Figure 4.2). Back-calculation of experimental data
for small helical peptides showed that a rigid arm of length ∼ 6.7 Å
perpendicular to the helix axis and passing through the Cβ atom provides a reasonable
approximation to the position of MTSL392. The expected error in the electron-
proton distances can therefore be estimated, although the secondary structure
dependence of the MTSL conformation389,391 means that this information can
only be used for structural refinement when the secondary structure is known,
which is seldom the case when modelling DS.
In fact, the lack of a single well-defined molecular structure in DS ensembles
effectively means that the conformation of the protein backbone with respect
to the point at which the paramagnetic label is attached may be considered to
be completely random. When MTSL is attached to solvent-exposed loops, the
motion is highly isotropic [388], so the allowed conformations of MTSL attached
to a disordered protein are likely to be similar. The experimental observable
therefore contains contributions from all possible orientations of the spin-label in
combination with all accessible conformations of the backbone, greatly reducing
the dependence of the calculated distance on the orientation of MTSL.
For the previous PRE-ERMD of αS, the PRE distance was defined as the
distance between the atom of the wild-type side-chain of the spin-labelled residue
furthest from the protein backbone and the amide hydrogen atom [205]. The
length of the side-chain varies, however, depending on the residue type, whereas
experimentally, the spin-label is always attached to a cysteine. The only way
to remove this source of variation is to define the position of the spin-label as
being the position of the Cα atom of the spin-labelled residue, as this is the
only atom that is present in all amino acids. If the motion of the spin-label and
the side-chain of the cysteine residue to which it is attached are indeed truly
random with respect to the polypeptide backbone, then the Cα atom is the
centre of the locus of conformations populated by MTSL. The average distance
between the amide hydrogen and the free electron is not the same as the distance
between the Cα atom and the amide hydrogen, due to the r−6 averaging. The
magnitude and direction of the discrepancy cannot be determined unequivocally,
however, so for the purposes of these simulations, the PRE distance is defined
as the distance between the Cα atom of the spin-labelled residue and the amide
hydrogen. This approximation has no effect on the back-calculation and use of
synthetic PRE distance restraints, which are exact, but becomes more important
when experimental data are used in Chapters 5 and 6.
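
The sensitivity of an r−6 average to the shortest contributing distances, which underlies the discrepancy described above, can be illustrated with a minimal Python sketch. The distances used are invented purely for illustration and are not taken from any ensemble in this thesis; the function names are likewise hypothetical.

```python
def r6_average(distances):
    """Return the r^-6-weighted average distance, <r^-6>^(-1/6)."""
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


def linear_average(distances):
    """Return the ordinary arithmetic mean of the distances."""
    return sum(distances) / len(distances)


# Illustrative distances in angstroms (made-up values, not thesis data).
distances = [10.0, 20.0, 30.0, 40.0, 50.0]

print("linear average:", round(linear_average(distances), 2))
print("r^-6 average:  ", round(r6_average(distances), 2))
```

For this toy set the r−6 average lies much closer to the shortest distance than to the arithmetic mean, which is why the two kinds of average cannot be equated without knowledge of the underlying distribution.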
Figure 4.2: Diagrams of (A) the structure of the MTSL side-chain, indicating
the dihedral angles χ1–χ5 and the 4-H atom on the nitroxide ring and (B)
the locus of points swept out by the nitroxide nitrogen due to rotation about
χ4 [391].
4.4 Preliminary results
4.4.1 Generation of reference ensembles
Having obtained a definition of a PRE distance, the remaining parameters de-
scribing the PRE-ERMD simulations can be specified. It is necessary to carry
out the PRE-ERMD using a different effective energy function to that used to
generate the reference ensembles, as otherwise the satisfaction of the restraints
is trivial. In Chapter 3 it was shown that by using the sasa [355] or eef1 [280]
implicit solvent models at high T it is possible to produce converged ensembles,
the $\langle R_h^{-1} \rangle^{-1}$ of which can be matched to the experimental value by tuning T.
The two reference ensembles of αS structures, REF23 and REF20, were therefore
generated using unrestrained molecular dynamics with the eef1 [280] implicit
solvent model and the PRE-ERMD was carried out using the sasa [355] implicit
solvent model.
For REF23, T was chosen such that the $\langle R_h^{-1} \rangle^{-1}$ is close to the experimental
value for αS in dilute solution [234]. The smoothing of the free energy landscape at
high T was accounted for to an extent by filtering REF23 to increase the amount
of residual structure. Because the previous PRE-ERMD of αS suggested that
it has a tendency to form contacts between the C-terminus and the central
NAC region [205], only structures with more than 15 contacts between these two
regions were included in REF23. This selection process reduces the $\langle R_h^{-1} \rangle^{-1}$ by
∼ 1 Å but does not markedly change other ensemble-averaged quantities (data
not shown). The second, more compact reference ensemble, REF20, serves to
confirm that the PRE-ERMD method can be applied to a range of different
types of DS ensembles. The residual structure content of REF20 is sufficiently
high that filtering was not required.
4.4.2 Absence of correlated motions
It is clear from the factors discussed in Chapter 1 that averages alone are not
sufficient to describe heterogeneous ensembles such as those typical of DS of
proteins. A further question of interest is whether distributions alone constitute
an adequate description of an ensemble, or if correlations between distributions
must be included as well. The answer to this question also has implications
regarding the choice of appropriate methods for assessing the success of PRE-
ERMD.
The presence of correlated motions in the reference ensembles was investi-
gated by comparing the joint distributions, p(rAB, rAC), of distances between
pairs of Cα atoms AB and AC at a range of positions throughout the sequence
to the product of the distributions, p(rAB) ∗ p(rAC). The similarity between
the two types of 2D histograms was quantified using S values, with a high S
value indicating the existence of correlations. Three representative pairs of dis-
tributions with low, medium and high S values are shown in Figure 4.3. In the
presence of correlations, the joint distribution has an elongated shape, whereas
the product of the distributions remains circular (Figure 4.3 A, B, D and E).
Plotting the S values for all combinations of B and C for a given A allows
the pairs of residues for which the distance distributions are most correlated
to be identified. In REF23 (Figure 4.4), the S values are highest when B and
C are close together in sequence. This result is not surprising, as it is likely
that two residues close together in sequence will also occupy similar spatial
coordinates, so that any changes in the distance from B or C to A will occur
in a coordinated manner. The S values for the remainder of the B/C pairs are
predominantly low, other than when B and C are both in the C-terminus. This
latter effect is a result of the filtering procedure used to produce REF23, as it
does not occur for REF20, which is similar to REF23 in all other respects (data
not shown). Thus correlations are essentially absent in REF23 and REF20,
other than those arising from the persistence length of the polypeptide chain,
which can be estimated to be ∼ 7 residues. Reproduction of the probability
distributions underlying the PRE distances is therefore sufficient to certify that
the reference ensemble is accurately reconstructed.
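
The comparison between a joint distribution and the product of its marginals can be sketched in Python as follows. This is an illustration under stated assumptions: the distances are synthetic Gaussian samples rather than data from REF23, and the similarity measure is a simple sum of absolute bin differences used as a stand-in, not the S value defined in equations 2.15 and 2.16. All function names are hypothetical.

```python
import random


def histogram_2d(xs, ys, lo, hi, nbins):
    """Normalised 2D histogram of paired samples on a square grid."""
    width = (hi - lo) / nbins
    grid = [[0.0] * nbins for _ in range(nbins)]
    for x, y in zip(xs, ys):
        # Out-of-range samples are clamped into the edge bins.
        i = min(max(int((x - lo) / width), 0), nbins - 1)
        j = min(max(int((y - lo) / width), 0), nbins - 1)
        grid[i][j] += 1.0 / len(xs)
    return grid


def product_of_marginals(joint):
    """Build p(x) * p(y) from the marginals of a joint histogram."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return [[a * b for b in py] for a in px]


def l1_difference(p, q):
    """Sum of absolute bin differences (a stand-in, NOT the thesis S value)."""
    return sum(abs(a - b) for rp, rq in zip(p, q) for a, b in zip(rp, rq))


rng = random.Random(0)
n = 20000
# Synthetic distances: r_ab and r_ac share a common component when correlated.
common = [rng.gauss(0.0, 5.0) for _ in range(n)]
r_ab = [30.0 + c + rng.gauss(0.0, 2.0) for c in common]
r_ac_corr = [30.0 + c + rng.gauss(0.0, 2.0) for c in common]
r_ac_indep = [30.0 + rng.gauss(0.0, 5.4) for _ in range(n)]

results = {}
for label, r_ac in (("correlated", r_ac_corr), ("independent", r_ac_indep)):
    joint = histogram_2d(r_ab, r_ac, 0.0, 60.0, 20)
    results[label] = l1_difference(joint, product_of_marginals(joint))
    print(label, round(results[label], 3))
```

In this toy case the difference measure is large for the correlated pair and close to the sampling noise for the independent pair, mirroring the elongated versus circular histograms described above.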
Figure 4.3: 2D histograms of (A-C) p(rAB) ∗ p(rAC) and (D-F) p(rAB, rAC) for
REF23. The residues and the S values quantifying the similarity of the two
types of histogram are (A,D) A = 1, B = 105, C = 116 (S = 0.86), (B,E) A =
1, B = 71, C = 130 (S = 0.50) and (C,F) A = 82, B = 71, C = 130 (S = 0.23).
4.4.3 Calculation of synthetic distance restraints
The set of synthetic restraints back-calculated from the reference ensembles
comprises 1000 long-range PRE distances between the Cα atoms of 8 ‘spin-labelled’
residues and all amide hydrogens except those on residues adjacent to
the spin-labelled residues. Whilst many more distance restraints could have been
calculated, this number was chosen because it is the upper limit for the number
of distances that can practically be obtained experimentally for a protein of
this size. A ‘free’ dataset, also consisting of 1000 PRE distances, was created
for cross-validation purposes. Although experimentally the free dataset usually
comprises 20% of the data, with the remaining 80% used as restraints, here it
is the same size as the working dataset so that the statistics for the two sets are
comparable.
Figure 4.4: 2D plots of the S values quantifying the agreement between
p(rAB, rAC) and p(rAB) ∗ p(rAC) for REF23 for (A) A = 29 and (B) A = 71. The
discontinuity of the region of highest S values for B ∼ C is due to the inability
of the smoothing function (see Section 2.3.3) to interpolate fully between the
limited number of A, B and C for which S values were available.
Typically, when using experimentally determined PRE distances as restraints,
the ensemble-averaged distances at each point in time, $d_{ij}^{\mathrm{calc}}(t)$, are required to
lie within a square well defined by $d_{ij}^{\mathrm{ref}} - L$ and $d_{ij}^{\mathrm{ref}} + U$, where L and U are
lower and upper bounds, respectively, and $d_{ij}^{\mathrm{ref}}$ is the experimental or synthetic
distance restraint. A harmonic potential is applied outside the square well to ensure
continuity. The justification for the use and choice of L and U, including which
of the PRE distances are given both upper and lower bounds, and which are
used as ‘negative’ restraints and assigned only one or the other, is discussed
further in Section 5.2.3. Here, L and U were assigned as if the distances had
been calculated from experimental data. Distances greater than $d_{ij}^{0.85}$ or less
than $d_{ij}^{0.15}$, where $d_{ij}^{0.85}$ and $d_{ij}^{0.15}$ represent the maximum and minimum reliable
distances that can be determined experimentally (see Section 2.2.3), were assigned
only a lower or upper bound, respectively, corresponding to $d_{ij}^{0.85} - L$ or
$d_{ij}^{0.15} + U$.
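
The flat-bottomed restraint described above can be sketched in Python as follows. The function name, the 0.5·α prefactor and the default force constant are assumptions made for illustration; the exact functional form used in the simulations is not restated in this section.

```python
def restraint_energy(d_calc, d_ref, lower=None, upper=None, alpha=1.0):
    """Flat-bottomed restraint: zero inside the well, harmonic outside.

    lower and upper are the tolerances L and U. Passing None for either
    omits that wall, giving a one-sided ('negative') restraint.
    The 0.5 * alpha prefactor is an assumed convention.
    """
    if lower is not None and d_calc < d_ref - lower:
        return 0.5 * alpha * (d_calc - (d_ref - lower)) ** 2
    if upper is not None and d_calc > d_ref + upper:
        return 0.5 * alpha * (d_calc - (d_ref + upper)) ** 2
    return 0.0


# Two-sided restraint with L = 1 and U = 8 around a 20 angstrom reference.
for d in (17.0, 19.5, 25.0, 30.0):
    print(d, restraint_energy(d, 20.0, lower=1.0, upper=8.0))

# One-sided restraint: only a lower bound, so long distances are never penalised.
print(restraint_energy(100.0, 40.0, lower=1.0, upper=None))
```

The energy and its first derivative are both zero at the edges of the well, so the penalty switches on continuously as the ensemble-averaged distance leaves the allowed range.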
4.4.4 Application of PRE-ERMD
A number of PRE-ERMD simulations using PRE distance restraints back-
calculated from REF23 were carried out varying Nrep, L and U . The simulation
temperature, T , of 515 K was chosen so that the 〈Rg〉 of an unrestrained ensem-
ble matches that of REF23. Each ensemble produced using PRE-ERMD was
compared to the appropriate reference ensemble in terms of both averages and
distributions. The ensemble-averaged PRE distances and Rg were compared
using Q values [359] (equation 2.14) and the distributions underlying these observables
were compared using S values [344] (equations 2.15 and 2.16). Because
an ensemble is better defined in terms of distributions of observables rather than
averages, the S values provide the best measure of how accurately the reference
ensemble is recovered.
In most cases, application of restraints causes the average size of the struc-
tures to decrease, so that it is difficult to obtain an ensemble of structures for
which the 〈Rg〉 and the Rg distribution match those of REF23. Furthermore,
the Q values and S values for the PRE distances (QPRE and SPRE) are not
optimised simultaneously (Table 4.1 and Figure 4.5 A), indicating that the best
conditions for the reproduction of the r−6-averaged PRE distances are not the
same as for the reproduction of the underlying distributions. This can occur
because of the complex relationship between averages and distributions [386] discussed
in Section 4.2. PRE distances are r−6 averages, so the poor correlation
between SPRE and QPRE is most likely due to the short-distance (left-hand) side of the
PRE distance distributions of the ensembles generated using PRE-ERMD not matching
those of REF23. The inability of cross-validation against the PRE distances
to report on how well the distributions are reproduced is of particular concern
because the ultimate aim is to use experimental data as restraints, in which case
the true underlying distributions are not known.
Based on these preliminary results, there are therefore two main issues that
need to be addressed. Firstly, a validation measure based on information avail-
able experimentally that, like the S values, reports on how well the distributions
are reproduced is required. Secondly, it is necessary to find a means of over-
coming the compaction induced by the application of PRE distance restraints.
The means by which these problems were circumvented are described in the
following section.
Table 4.1: The 〈Rg〉, Q and S values quantify how well REF23 is reproduced
when varying the number of replicas (Nrep) and the lower (L) and upper (U) boundaries.
The simulation temperature of T = 515 K was chosen because the 〈Rg〉 of
an unrestrained ensemble at this T is equal to that of REF23 (23.2 Å). QRg and
SRg refer to the Rg, QwPRE and SwPRE to the working PRE distance restraints
and QfPRE and SfPRE to the free PRE distances.
Nrep  L (Å)  U (Å)  〈Rg〉 (Å)  QRg   SRg   QwPRE  QfPRE  SwPRE  SfPRE
16    5      5      18.3       0.21  1.24  0.11   0.16   0.47   0.43
24    5      5      19.3       0.17  0.96  0.09   0.14   0.39   0.37
32    5      5      20.0       0.14  0.78  0.10   0.17   0.35   0.34
16    1      1      13.8       0.41  2.0   0.11   0.16   0.66   0.54
24    1      1      17.6       0.24  1.42  0.09   0.21   0.57   0.51
32    1      1      17.9       0.23  1.34  0.10   0.15   0.51   0.45
16    1      8      19.9       0.14  0.80  0.17   0.15   0.38   0.38
24    1      8      21.2       0.09  0.45  0.15   0.13   0.30   0.29
32    1      8      21.9       0.06  0.33  0.15   0.13   0.28   0.28
[Figure 4.5, panels A-C: SPRE on the horizontal axes; QPRE (A), SRg (B) and QRg (C) on the vertical axes.]
Figure 4.5: Correlation between the SPRE values and (A) the QPRE values, (B)
the SRg values and (C) the QRg values. Each point corresponds to a different
ensemble created using PRE-ERMD using synthetic distance restraints calcu-
lated from REF23. The working data are shown in black and the free data in
red.
4.5 Improvement of the PRE-ERMD method
4.5.1 Cross-validation against multiple observables
The work of Choy et al. [387] provides a starting point for the development of
an alternative validation measure. They use two different types of average to
define more precisely the parameters of a distribution function. In the case of
PRE-ERMD, the distribution function is implicitly included in the choice of
simulation conditions, thus all that is required is an experimental observable
that is a type of average other than r−6.
Whilst the expanded and heterogeneous range of structures comprising DS
ensembles renders many experimental observables difficult to obtain and in-
terpret, one observable that remains both measurable and informative is the
Rg. The geometric Rg is easily calculated from the atomic coordinates of each
structure. Root-mean-square averaging (⟨R2
g
⟩1/2) can be used for comparison
with experimental values obtained by SAXS. There are also programs available,
such as crysol393, that calculate the solution scattering profile and thus the
expected experimental⟨R2
g
⟩1/2, from the atomic coordinates. Additionally, the
calculated Rg of each structure can be converted into an Rh using a phenomeno-
logical relationship as described in Section 2.3.1, and the ensemble-averaged⟨R−1
h
⟩−1computed for comparison with the experimental value obtained from
PFG-NMR. In all cases, the type of average is different to an r−6 average;
accordingly, it imparts additional information regarding the shape of the under-
lying distribution. For simplicity, the 〈Rg〉 was used with the synthetic PRE
distance restraints, though the same conclusions hold if other types of averaging
are used.
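
The geometric Rg and the different types of ensemble average mentioned above can be sketched in Python as follows. The coordinates are invented, all points are weighted equally (an assumption; no atomic masses are used), and the Rg to Rh conversion of Section 2.3.1 is deliberately not reproduced, so the inverse average is simply applied to the same toy values to show how the three averages differ.

```python
import math


def radius_of_gyration(coords):
    """Geometric Rg of a set of equally weighted points (x, y, z)."""
    n = len(coords)
    centre = [sum(c[axis] for c in coords) / n for axis in range(3)]
    mean_sq = sum(
        sum((c[axis] - centre[axis]) ** 2 for axis in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


def linear_mean(values):
    """Arithmetic mean, <x>."""
    return sum(values) / len(values)


def rms_mean(values):
    """Root-mean-square average, <x^2>^(1/2)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def inverse_mean(values):
    """Inverse average, <x^-1>^-1."""
    return len(values) / sum(1.0 / v for v in values)


# Two toy "structures" (made-up coordinates in angstroms, not thesis data).
compact = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
rg_values = [radius_of_gyration(s) for s in (compact, extended)]

print("inverse:", round(inverse_mean(rg_values), 3))
print("linear: ", round(linear_mean(rg_values), 3))
print("rms:    ", round(rms_mean(rg_values), 3))
```

For any set of unequal positive values the inverse average is the smallest and the root-mean-square average the largest, so each type of average weights a different part of the same distribution.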
In order for the information contained in the r−6-averaged PRE distances
and the linearly (or otherwise) averaged Rg to be combined, the first requirement
is that the distributions of each type of observable are correlated. This is indeed
the case: ensembles for which SPRE is low also exhibit low SRg (Figure 4.5 B).
In fact, QRg is also highly correlated with SPRE (Figure 4.5 C). This is in part
an artifact of the nature of the reference ensembles used here. The widths of
the distributions are correlated with the midpoints of the distributions, thus
if a linear average such as the 〈Rg〉 matches, the distributions tend to be of
similar width. However, in the general case it still holds that cross-validation
against different types of average, which report on different aspects of the un-
derlying distribution, provides a better measure of whether the distributions,
and therefore the ensemble, are correct. Therefore, when using experimental
data, cross-validation against the Rg or Rh should be used as a substitute for
cross-validation against S values in order to determine the optimal choice of
simulation conditions. This criterion was used in the PRE-ERMD described in
the remainder of this thesis.
4.5.2 Explanation of the compaction problem
In order to cross-validate against the Rg, it is first necessary to overcome the
compaction induced by the application of restraints and generate ensembles
that are sufficiently expanded. The reason why compaction occurs is related
to the issues addressed by cross-validation. As outlined in Section 4.2, this
process aims to determine the number of replicas for which there are sufficient
degrees of freedom to account for the time- and ensemble-average nature of the
experimental data, but not so many that over-fitting occurs.
It is impossible, however, to consider as many replicas as the number of copies
of the molecule that contribute to an experimental solution-state observable, not
only due to the avoidance of over-fitting, but also for more practical reasons,
such as computing resources. This is particularly pertinent for DS, where each
experimental value is an average over a broad distribution. With fewer replicas,
only a small fraction of the contributing values can be sampled at each point
in time. If the ergodic principle holds, this is compensated for by simulating
for a sufficiently long time. The application of restraints, however, poses a
significant restriction to the ergodicity of the simulations, which is exacerbated
by the sensitivity of the r−6-averaged PRE distances to the smallest contributing
values (Figure 4.1). When there are fewer replicas, a greater proportion must
contain short distances in order to satisfy the restraint at each point in time.
This results in narrow distributions containing mostly short distances close to
the r−6 average, and, ultimately, ensembles of structures that are too compact
(Figure 4.6 A).
Accordingly, despite carrying out the ERMD at temperatures where the 〈Rg〉
of an unrestrained ensemble matches that of the relevant reference ensemble, the
〈Rg〉 decreases upon application of synthetic PRE distance restraints, even with
32 replicas (Table 4.1). Ways to increase the range of structures accessible at
each point in time other than explicitly increasing the number of degrees of
freedom by increasing Nrep were therefore investigated.
4.5.3 Solving the compaction problem
PRE distance restraints are typically not enforced precisely; rather, as men-
tioned previously, dij(t) is simply required to lie within a harmonic square well
defined by L and U . Manipulating L and U provides a simple mechanism
for indirectly controlling the range of distances sampled and so the width of
the distance distribution. L and U were originally implemented to account
for experimental inaccuracies and the associated errors in the calculated distance [199,208,224].
In Chapter 5, it is shown that by careful treatment of the
experimental data, a smaller degree of tolerance than that used for most previously
published examples of ERMD (L, U = 4–5 Å) can be justified, thus
extending the usefulness of this practice beyond the original spirit in which it
was implemented.
[Figure 4.6, panels A-D: p(Distance) against Distance (Å) for (A) Nrep=16, L=5, U=5; (B) Nrep=24, L=5, U=5; (C) Nrep=24, L=1, U=8; (D) Nrep=24, L=1, U=1.]
Figure 4.6: The effect of changing Nrep, L and U. The distributions of distances
sampled over all time-points and all replicas, $r_{ij,k}(t)$, are in black, the distributions
of ensemble-averages compiled over all time-points, $d_{ij}(t)$, are in red and
the distributions of distances calculated from REF23, $r_{ij,k}^{\mathrm{ref}}$, are in grey. Also
shown are the overall time- and ensemble-average calculated from the PRE-restrained
ensemble, $d_{ij}$, in green and from REF23, $d_{ij}^{\mathrm{ref}}$, in blue and the lower
(L) and upper (U) bounds in cyan. The data are from four ensembles generated
using synthetic PRE distance restraints calculated from REF23.
The smaller L and U are, the closer $d_{ij}^{\mathrm{calc}}(t)$ is to $d_{ij}^{\mathrm{ref}}$ at each point in time
(Figure 4.6 B vs D). Although altering Nrep does not directly control the variety
of distances contributing to $d_{ij}^{\mathrm{calc}}(t)$ (that is, the width of the distribution of
distances $r_{ij,k}$ at each time-point, t), in general, a wider range of distances is
sampled at each point in time if Nrep is large (Figure 4.6 A vs B). On the other
hand, increasing L and U allows more variation in $d_{ij}^{\mathrm{calc}}(t)$ (Figure 4.6 B vs D).
Over many time-points, this variation equates to a wider range of distances
being sampled for a given Nrep. Thus increasing the tolerance to instantaneous
fluctuation in the ensemble-averaged observables can compensate for a reduced number
of replicas, without explicitly increasing the number of degrees of freedom.
For the time- and ensemble-average with fewer replicas and larger L and U
to be equivalent to that obtained with more replicas and smaller L and U, the
$d_{ij}^{\mathrm{calc}}(t)$ over multiple timesteps must be evenly distributed within
[$d_{ij}^{\mathrm{ref}} - L$, $d_{ij}^{\mathrm{ref}} + U$]. This is the case (Figure 4.6). If L and U are equal, however, such that the
range of $d_{ij}^{\mathrm{calc}}(t)$ collected over all time-points, t, is evenly distributed either
side of $d_{ij}^{\mathrm{ref}}$, the r−6 average calculated from the overall distribution of $r_{ij,k}^{\mathrm{calc}}$,
pooled over all Nrep replicas and all time-points t, is in general smaller than the
imposed restraint $d_{ij}^{\mathrm{ref}}$ (Figure 4.6 A, B and D). This is because approximately
half of the $r_{ij,k}$ lie between $d_{ij}^{\mathrm{ref}}$ and $d_{ij}^{\mathrm{ref}} - L$, and these small $r_{ij,k}$ have a
disproportionately large influence on $d_{ij}^{\mathrm{calc}}$. If the tolerance is to be used to
compensate for using fewer replicas, then L and U must be chosen such that
the overall distribution of $r_{ij,k}^{\mathrm{calc}}$ contains a smaller proportion of short distances.
This can be achieved by favouring $d_{ij}^{\mathrm{calc}}(t) > d_{ij}^{\mathrm{ref}}$ at the expense of $d_{ij}^{\mathrm{calc}}(t) < d_{ij}^{\mathrm{ref}}$:
essentially, L < U (Figure 4.6 C).
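
The argument above can be checked numerically with a minimal Python sketch. It rests on an idealisation: the instantaneous ensemble averages d(t) are assumed to be spread perfectly evenly across the well, and every time-point is assumed to contribute the same number of replicas, in which case the pooled r−6 average over all replicas and time-points equals the r−6 average of the d(t) values. The reference distance of 20 Å and the function name are hypothetical.

```python
def pooled_r6_average(d_ref, lower, upper, n_points=10001):
    """r^-6 average of d(t) values spread evenly over [d_ref - L, d_ref + U]."""
    lo = d_ref - lower
    step = (lower + upper) / (n_points - 1)
    total = sum((lo + i * step) ** -6 for i in range(n_points))
    return (total / n_points) ** (-1.0 / 6.0)


d_ref = 20.0  # hypothetical restraint distance in angstroms
for lower, upper in ((5.0, 5.0), (1.0, 1.0), (1.0, 8.0)):
    pooled = pooled_r6_average(d_ref, lower, upper)
    print("L =", lower, "U =", upper, "pooled r^-6 average =", round(pooled, 2))
```

Under these assumptions, symmetric bounds give a pooled average below the reference distance, whereas L < U shifts it upwards, which is the direction of the effect exploited in the text.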
A range of different combinations of L and U were tested with 16, 24 and
32 replicas and the synthetic PRE distance restraints calculated from REF23.
The key results are summarised in Table 4.1. With L = 1 and U = 8 the
desired effect is obtained without the upper bound becoming so large that it
ceases to act as a restraint. However, even with 32 replicas the distributions are
still narrower than those of REF23, containing too many short distances and
not enough large distances (data not shown, but see Figure 4.6), and the 〈Rg〉
remains too low.
Further measures are therefore required to encourage the sampling of longer
distances. The results of Chapter 3 suggest that increasing the simulation tem-
perature, T , tends to generate more expanded structures, in which the inter-
residue distances are more likely to be large. As discussed in Chapter 3, T
should not be thought of as a true physical quantity, but as a source of en-
ergy to overcome the bias of the force-field and implicit solvent models towards
compact structures. Here the additional energy provided by increasing T also
helps to reverse the tendency towards sampling shorter distances caused by re-
straining an r−6-average calculated over fewer replicas than contributed to the
restraint.
The effectiveness of treating T as an adjustable parameter was tested by
attempting to reproduce the more compact ensemble, REF20, as well as REF23.
In both cases, 24 replicas were used with L = 1 and U = 8. By adjusting T ,
both reference ensembles can be accurately reproduced in terms of distributions
and averages (Table 4.2), indicating that this method is applicable to different
types of ensembles. The optimal T in each case depends on the broadness of
the reference ensemble and the compactness of the structures, so that a higher
T is required to reproduce REF23 than REF20.
Table 4.2: The 〈Rg〉, Q and S values quantify how well REF23 (Rg = 23.2 Å)
and REF20 (Rg = 20.0 Å) are reproduced by varying the simulation temperature,
T, with Nrep = 24, L = 1 and U = 8. QRg and SRg refer to the Rg, QwPRE
and SwPRE to the working PRE distance restraints and QfPRE and SfPRE to
the free PRE distances.
Reference  T (K)  〈Rg〉 (Å)  QRg   SRg   QwPRE  QfPRE  SwPRE  SfPRE
REF23      485    19.6       0.16  0.88  0.13   0.17   0.37   0.38
REF23      515    21.2       0.09  0.45  0.15   0.13   0.30   0.29
REF23      550    22.5       0.03  0.22  0.15   0.12   0.27   0.25
REF23      570    23.0       0.01  0.12  0.17   0.13   0.26   0.25
REF20      515    20.5       0.02  0.13  0.17   0.13   0.26   0.25
REF20      550    22.0       0.09  0.43  0.15   0.14   0.23   0.22
4.6 General protocol for PRE-ERMD
To facilitate the application of the techniques developed in Section 4.5 to char-
acterise any type of disordered state for which both Rg and PRE measurements
are available, a general protocol was devised (Figure 4.7). As explained in the
caption of the figure, this method is based on the simultaneous minimisation of
QRg and QPRE.
The general method was tested by comparing its ability to reproduce REF23
to the results obtained by trial and error. The statistics are similar in both cases
(Table 4.3), showing that sufficient structures are collected at each temperature
during the initial phases to obtain meaningful statistics. Moreover, the agree-
ment between the final calculated ensemble determined using the generalised
protocol and REF23 is good, especially in terms of distributions (Table 4.3 and
Figure 4.8). The lack of correlations in REF23 means that this is sufficient to
consider the two ensembles to be equal. Thus the desired result - delineation of
a general method capable of reproducing DS ensembles - has been achieved.
Figure 4.7: Outline of the general method for carrying out PRE-ERMD. In
all cases, Nrep = 24, L = 1 and U = 8. The molecules are first heated to
700 K in 50 K increments, then the force constant, α, is increased to a value
that is sufficiently high that the restraints are satisfied but not so high as to
cause large changes in the energy. The next three steps form a loop in which
after a brief equilibration phase, a preliminary set of structures is collected,
before the temperature is lowered by 25 K and the process repeated. The 1920
structures (80 per replica) collected at each temperature are sufficient to obtain
reliable estimates of QRg and QPRE. At the temperature at which these are
both optimised, a further 5760 structures are collected (240 per replica).
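
The temperature-selection step of this protocol can be sketched in Python as follows. The Q values are copied from the QRg and QwPRE columns of Table 4.3; combining them as a simple sum is an assumption made for illustration, since the text states only that the two are minimised simultaneously. The function name is hypothetical.

```python
# (T in K, Q_Rg, Q_PRE for the working restraints), copied from Table 4.3.
scan = [
    (500, 0.12, 0.15), (525, 0.07, 0.16), (550, 0.03, 0.16),
    (575, 0.01, 0.17), (590, 0.00, 0.17), (600, 0.01, 0.17),
    (625, 0.02, 0.18), (650, 0.04, 0.19), (675, 0.05, 0.19),
    (700, 0.06, 0.20), (725, 0.05, 0.22), (750, 0.06, 0.21),
]


def pick_temperature(rows):
    """Return the row with the smallest Q_Rg + Q_PRE (an assumed combined score)."""
    return min(rows, key=lambda row: row[1] + row[2])


best = pick_temperature(scan)
print("selected T (K):", best[0])
```

With this assumed score the scan selects 590 K, the temperature at which the optimal PRE-restrained ensemble is reported later in this chapter.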
4.6.1 Additional modes of validation
Throughout the work discussed in this thesis so far, the reproduction of distributions
has emerged as a critical factor in ensuring that an ensemble generated
using ERMD is equivalent to that from which the restraints are derived. Whilst
the S value provides a measure of how well the distributions are satisfied overall,
and the sl values can be used to extract localised information, visualising
the distributions supplies additional information, allowing the causes of the observed
sl values to be understood. In Figure 4.8, only three PRE distance
distributions are shown as it is not feasible to examine them all individually. A
pictorial summary of the nature of the distributions is therefore desirable.
Table 4.3: The 〈Rg〉, Q and S values quantify how well REF23 (Rg = 23.2 Å)
is reproduced by varying the simulation temperature, T, with Nrep = 24, L = 1
and U = 8. QRg and SRg refer to the Rg, QwPRE and SwPRE to the working
PRE distance restraints and QfPRE and SfPRE to the free PRE distances. The
results for the most representative ensemble collected at the optimal T (590 K) are
marked with an asterisk.
T (K)  〈Rg〉 (Å)  QRg   SRg   QwPRE  QfPRE  SwPRE  SfPRE
500    20.4       0.12  0.64  0.15   0.15   0.31   0.31
525    21.5       0.07  0.38  0.16   0.15   0.31   0.31
550    22.5       0.03  0.22  0.16   0.16   0.30   0.29
575    23.0       0.01  0.11  0.17   0.18   0.29   0.28
590*   23.2       0.00  0.06  0.17   0.15   0.26   0.25
600    23.4       0.01  0.07  0.17   0.17   0.29   0.28
625    23.6       0.02  0.09  0.18   0.19   0.29   0.27
650    24.1       0.04  0.14  0.19   0.21   0.28   0.28
675    24.4       0.05  0.20  0.19   0.24   0.29   0.29
700    24.5       0.06  0.21  0.20   0.25   0.30   0.29
725    24.4       0.05  0.24  0.22   0.25   0.30   0.29
750    24.7       0.06  0.28  0.21   0.28   0.30   0.30
The overall pairwise distance distribution function, p(r), is a graphical repre-
sentation that includes information regarding all distance distributions describ-
ing the ensemble. p(r) is also one of the few experimentally accessible distribu-
tion functions. It is obtained by taking the sine Fourier transform of the SAXS
scattering profile of a protein in solution [162,163]. The experimental p(r) includes
contributions from all pairs of interatomic distances within the macromolecule.
Here, it is approximated by considering only Cα–Cα distances to reduce the computational
cost. Distributions were calculated for αRC (Chapter 3), REF23,
the optimal PRE-restrained ensemble generated at T = 590 K (αPRE), and
two unrestrained ensembles also generated using sasa. The $\langle R_h^{-1} \rangle^{-1}$ of the
first, αSASA (Chapter 3), is the same as that of REF23, whereas the other
unrestrained ensemble (αSASA32) was generated at 590 K, and thus contains
structures that are much more expanded on average ($\langle R_h^{-1} \rangle^{-1}$ = 31.9 Å).
[Figure 4.8, panels A-F: (A) p(Rg) against Rg (Å); (B-D) p(dij) against dij (Å); (E) calculated and reference dij (Å); (F) p(r) against r (Å).]
Figure 4.8: Comparison of αPRE with REF23 in terms of (A) Rg distributions,
(B-D) three examples of distance distributions and (E) scatter plot of interatomic
distances. For the distributions, REF23 is shown in black and αPRE
in red. In (E), the working dataset is in black and the free dataset in red.
(F) Comparison of p(r) calculated from REF23 (black), αPRE (red), αSASA
(green), αSASA32 (blue) and αRC (yellow).
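
The Cα-only approximation to p(r) described above amounts to a normalised histogram of all pairwise distances pooled over the structures of an ensemble. A minimal Python sketch is given below; the four-bead chain, the bin width and the function name are illustrative assumptions, not the settings used for Figure 4.8.

```python
import math
from itertools import combinations


def pair_distance_distribution(structures, bin_width, n_bins):
    """Normalised histogram of pairwise distances pooled over all structures.

    Each structure is a list of (x, y, z) positions, e.g. one per C-alpha atom.
    Distances beyond bin_width * n_bins are ignored.
    """
    counts = [0] * n_bins
    for coords in structures:
        for a, b in combinations(coords, 2):
            k = int(math.dist(a, b) / bin_width)
            if k < n_bins:
                counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Toy example: a straight four-bead chain with 3.8 angstrom spacing (made-up).
chain = [(3.8 * i, 0.0, 0.0) for i in range(4)]
p_r = pair_distance_distribution([chain], bin_width=4.0, n_bins=4)
print([round(p, 3) for p in p_r])
```

For a real ensemble, the list passed in would hold one coordinate set per structure, so that the histogram reflects every Cα–Cα distance in every member of the ensemble.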
At small r, the p(r) of all of the ensembles overlap, with two well-defined
peaks at ∼ 4 and 7 Å corresponding to nearest-neighbour packing effects (Figure 4.8 F).
Thereafter, the p(r) for αRC is considerably flatter and broader
than that of the other ensembles, as is expected given its much larger $\langle R_h^{-1} \rangle^{-1}$
(∼ 37 Å). The p(r) of REF23 and αSASA are similar. However, the p(r) of
αPRE provides an even closer match to that of REF23, indicating that the application
of PRE distance restraints provides additional information not present
in the effective energy function defined by the force-field and solvent model. The
much broader and flatter p(r) of αSASA32 reveals the extent of the compaction
effects induced by the application of PRE distance restraints.
Each of the validation measures examined so far reflects how well a particular
type of observable is reproduced. A complementary test of the effectiveness of
the PRE-ERMD method is to compare the free energy landscapes of the vari-
ous ensembles. It is a question of central interest whether molecular dynamics
simulations with experimentally-derived restraints can be used to calculate free
energies. Here, 2D free energy landscapes were defined by considering the joint
probability of occurrence of pairs of observables. The Rg, SASA and end-to-end
distance, REE were chosen as the observables of interest. Free energy landscapes
were created for REF23, αSASA and αPRE (Figure 4.9). There is a very good
agreement between the two types of free energy landscape of REF23 and those
of αPRE. In contrast, there is a large discrepancy between the free energy land-
scapes of αSASA and those of REF23. These results demonstrate that the use
of a pseudo-energy function based on experimentally-derived restraints is capa-
ble of modifying the force field so that the resulting equilibrium conformational
distribution becomes correct and confirm the earlier conclusion that the general
method is capable of accurately reconstructing a given ensemble.
Figure 4.9: Free energy landscapes of (A,D) REF23, (B,E) αSASA and (C,F)
αPRE. The free energy is defined as (A-C) F(Rg, SASA) = −ln p(Rg, SASA)
and (D-F) F(Rg, REE) = −ln p(Rg, REE), where REE is the end-to-end distance.
The Rg and REE are in Å and the SASA is in Å².
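The definition F = −ln p used for these landscapes can be illustrated with a minimal sketch. The bin widths, the synthetic data and the function name `free_energy_landscape` are assumptions made for illustration only; they are not the settings or the code used to produce Figure 4.9.

```python
import math
import random
from collections import Counter


def free_energy_landscape(xs, ys, x_width, y_width):
    """Bin (x, y) pairs on a regular grid and return {(ix, iy): -ln p}."""
    counts = Counter(
        (math.floor(x / x_width), math.floor(y / y_width))
        for x, y in zip(xs, ys)
    )
    total = sum(counts.values())
    # Bins that were never visited are absent (their F would be infinite).
    return {cell: -math.log(n / total) for cell, n in counts.items()}


# Synthetic stand-in for per-structure Rg and REE values; not thesis data.
random.seed(0)
rg = [random.gauss(30.0, 5.0) for _ in range(5000)]
ree = [2.5 * r + random.gauss(0.0, 10.0) for r in rg]

landscape = free_energy_landscape(rg, ree, x_width=2.0, y_width=5.0)
lowest = min(landscape, key=landscape.get)  # most populated bin
```

Because F is defined directly from the joint probability, it is dimensionless here (in units of kT), and the most populated bin is the free energy minimum.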
4.7 Conclusions
Extensive testing of the ability of ERMD with distance restraints calculated
from two arbitrary reference ensembles showed that with an appropriate choice
of simulation parameters, it is possible to reproduce accurately a DS ensemble
despite having information about only a small fraction of the distances. The
concept of cross-validation against more than one type of average was intro-
duced in order to evaluate whether the underlying distributions, and therefore
the ensemble, are correct in cases where the true distributions are not known.
This is an important prerequisite for the use of experimental data, in which case
the underlying distributions are not available for validation. Additionally, some
changes to the previously published method201,202,205 were proposed to alleviate
the difficulty of balancing the need to avoid over-fitting against the bias
towards overly compact structures that arises from aspects of the implicit solvent models
and from the restraint of $r^{-6}$-averaged observables on a limited number of replicas.
These changes were justified both empirically, due to the improvement in the re-
production of the reference ensemble, and theoretically. Comparison of a range
of quantities calculated from various reference, unrestrained and restrained en-
sembles confirmed that the general method developed in this chapter provides
an accurate and efficient means of obtaining DS ensembles. In the following two
chapters, this general method for carrying out PRE-ERMD is applied to
characterise DS ensembles using experimental data for the IDPs
αS, βS and β+HC, and the acid-denatured state of the NFP PI3-SH3.
Chapter 5
Comparison of the solution state ensembles of α-synuclein, β-synuclein and β+HC
5.1 Introduction
In Chapter 4 an improved method for ERMD of disordered states using PRE
distance restraints was developed. The changes were justified according to how
well a reference ensemble of αS structures was reproduced in terms of distribu-
tions as well as averages. This resulted in a general protocol for PRE-ERMD
capable of accurately reconstructing an ensemble. In this chapter, the general
PRE-ERMD method is used to characterise the two related proteins, αS and βS,
and an artificial construct, β+HC145. These three polypeptides are of interest
because, despite high sequence identity, they exhibit contrasting aggregation
behaviour. In order to properly understand their different properties, it is de-
sirable to obtain a complete description of the solution state ensembles of each
protein in terms of the constituent structures and their relative populations.
PRE-ERMD provides a means of determining such ensembles for DS.
Before applying the general method with experimental data, the relationship
between the data obtained from a PRE-NMR experiment and the calculated
inter-atomic distance is explored to ascertain the sources and magnitude of un-
certainty in the distance restraints. The calculated distances are then used with
the improved PRE-ERMD method to obtain ensembles of αS, βS and β+HC
structures. In Sections 5.5 and 5.6, methods for analysing the resulting ensem-
bles of structures are developed and applied. The agreement with experimental
data, both quantitative and qualitative, is also assessed. Finally, the αS en-
semble is compared to the ensembles of βS and β+HC structures to examine
the effect of the hydrophobic core on the structural properties and aggregation
propensities of αS and βS at a molecular level.
Although αS has been characterised previously by PRE-ERMD205, a new
ensemble is generated here to encompass additional experimental data that has
become available since that study was published. The $\langle R_h^{-1}\rangle^{-1}$ of the previously
published ensemble is similar to that of αS in D2O, pH 7.0 at 298 K (26.6 Å)234.
The PRE-NMR, however, was carried out in phosphate buffer with 100 mM
NaCl, pH 7.4 at 283 K. Subsequent measurement of the $\langle R_h^{-1}\rangle^{-1}$ of αS in
Mes buffer with 100 mM NaCl, pH 6.5 at 288 K showed that it is much more
expanded (32.0 Å) in these conditions235. A revised ensemble of αS structures
with the correct 〈Rh〉 of ∼ 32.0 Å was therefore determined using the newly
optimised method. An additional change from the previously published work
was the inclusion of a further 118 distances obtained from a spin-label positioned
at residue N122. Another ensemble of αS structures compatible with distances
derived from PRE-NMR experiments has been obtained using single-replica
simulated annealing240. A measure of the global size of these structures was
not reported, but given the issues discussed in Chapter 4, it is likely that they
were both too compact and unrepresentative.
5.2 Factors influencing the calculated distances
The calculation of distances from the experimental Iox/Ired is based on a modified
Solomon-Bloembergen equation for transverse relaxation rates (equation 2.7),
as outlined in Section 2.2. The use of this equation depends on a number of
assumptions, including a constant electron-proton distance, that are not neces-
sarily true in the case of DS. Alternative means of formulating the equations
based on expressions for the spectral density394–399 were investigated as part
of this work but did not prove successful (data not shown). In the following
sections, the effect on the calculated distance of using a single correlation time,
τc, and of experimental uncertainty in the measured R2 and Iox/Ired is examined.
Motion of the spin-label during the time-course of the experiment may also
contribute, as discussed in Chapter 4.
5.2.1 Correlation time
The correlation time of the electron-proton vector, τc, contains contributions
from the relaxation of the electron and from motions of the electron-proton
vector: 1/τc = 1/τS + 1/τR, where τS is the longitudinal relaxation time of the
free electron and τR is the effective rotational correlation time of the vector199.
As τS > $10^{-7}$ s for nitroxide free radicals and τR ∼ $10^{-9}$–$10^{-8}$ s, τc ∼ τR.
With τc in this range the term τcωH in equation 2.7 is ≥ 1, which means that
τc can be estimated from
$$\tau_c = \left( \frac{6\,(R_2^{sp}/R_1^{sp}) - 7}{4\,\omega_H^2} \right)^{1/2}, \tag{5.1}$$
where $R_1^{sp}$ and $R_2^{sp}$ are the paramagnetic relaxation enhancements of the proton's
longitudinal and transverse relaxation rates, respectively.
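Equation 5.1 can be evaluated directly, as in the following minimal sketch. The function name `correlation_time`, the example rates and the 600 MHz proton frequency are hypothetical values chosen for illustration; they are not measurements from this work.

```python
import math


def correlation_time(r1_sp, r2_sp, proton_freq_hz):
    """Estimate tau_c (s) from equation 5.1.

    r1_sp, r2_sp: paramagnetic enhancements of the longitudinal and
    transverse proton relaxation rates (s^-1).
    proton_freq_hz: proton Larmor frequency in Hz (omega_H = 2*pi*f).
    """
    omega_h = 2.0 * math.pi * proton_freq_hz
    numerator = 6.0 * (r2_sp / r1_sp) - 7.0
    if numerator <= 0.0:
        raise ValueError("R2sp/R1sp must exceed 7/6 for a real tau_c")
    return math.sqrt(numerator / (4.0 * omega_h ** 2))


# Hypothetical rates at a hypothetical 600 MHz proton frequency.
tau_c = correlation_time(r1_sp=1.0, r2_sp=40.0, proton_freq_hz=600e6)
```

The guard on the numerator reflects the fact that equation 5.1 only yields a real τc when the ratio of the two enhancements exceeds 7/6, which is one way in which imprecise rates can prevent a reasonable estimate.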
The values of τc calculated from measured $R_1^{sp}$ and $R_2^{sp}$ typically range from
1–15 ns. However, a detailed study by Gillespie and Shortle found that propagation
of the error in the measured $R_1^{sp}$ and $R_2^{sp}$ results in average errors of
± 50%199. Additionally, for many vectors $R_1^{sp}$ and $R_2^{sp}$ could not both be
determined with sufficient precision to permit a reasonable calculation of τc.
Consequently, they used the average value of τc (4.1 ns) for all further calcula-
tions.
Alternative means of determining τc have since been developed207,209,210.
Some of these, however, are only suitable for the determination of folded NS207,209.
Gaponenko et al.210 were able to estimate τc for each residue based on the fre-
quency dependence of paramagnetic effects, but were still forced to resort to
using an average value in some cases.
Whilst it is preferable to know τc exactly, uncertainty in this parameter has
only a limited influence on the calculated distance. A 10% error in τc results
in only 2% error in the calculated distance, thus error in τc of up to 40% can
be tolerated206. The effect of using an average τc is therefore negligible. Other
authors have followed the example of Gillespie and Shortle, approximating τc
with the global rotational correlation time of the protein in question206,208 or
simply using τc = 4 ns159,202,205. Because the global rotational correlation time
was not measured for any of the proteins studied here, τc = 4 ns is used for all
calculations.
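The weak sensitivity of the distance to τc can be checked with a short calculation. This is a sketch that assumes the limit $\omega_H\tau_c \gg 1$, in which the distance obtained from equation 2.7 scales as $r \propto \tau_c^{1/6}$; the full form of equation 2.7 is not reproduced here.

```latex
r \propto \tau_c^{1/6}
\quad\Longrightarrow\quad
\frac{\delta r}{r} \approx \frac{1}{6}\,\frac{\delta\tau_c}{\tau_c},
\qquad
(1.10)^{1/6} - 1 \approx 0.016,
\qquad
(1.40)^{1/6} - 1 \approx 0.058 .
```

Under this assumption, a 10% error in τc changes the calculated distance by just under 2%, and a 40% error changes it by roughly 6%.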
5.2.2 Transverse relaxation rate
The intrinsic transverse relaxation rate, R2, which occurs in equation 2.6, is
commonly assumed to be equal to R2 of the diamagnetic sample, $R_2^{red}$. The
variation in the experimental $R_2^{red}$ is generally very low. For instance, for βS,
the average SD for duplicate measurements is just 0.014%. Residue-specific
values of $R_2^{red}$ were used in the distance calculations where available; otherwise,
the average over all residues was substituted. This is unlikely to introduce a
great deal of error, as the SD of $R_2^{red}$ over all residues is only 2.13%. Additionally,
as for τc, the calculated distance has only a weak dependence, $r \propto (R_2^{sp})^{-1/6}$, on the fitted
$R_2^{sp}$ (equation 2.7), thus any error that may be introduced during the fitting of
equation 2.6 to obtain $R_2^{sp}$ has a negligible effect on the calculated distance.
5.2.3 Intensity ratio
The remaining experimental observable in equation 2.6 is Iox/Ired. Uncertainty
in Iox/Ired resulting from experimental variation can have a significant effect
on the calculated distance, particularly for large Iox/Ired, due to the non-linear
relationship between r and Iox/Ired. Various means of accounting for error in
the measurement of Iox/Ired have been developed. Error-dependent weighting
functions have been used in studies where the back-calculated Γ2 (R−12 ) rather
than the r−6 distances are restrained207,209. This method can only be used if
there is at least duplicate data for every observable, which is not always the
case with the experimental datasets used here. The simplest and most common
way to account for uncertainty in Iox/Ired is to include a degree of tolerance
towards variation in the ensemble-averaged back-calculated distances, $d_{ij}^{calc}$, at
each point in time during the PRE-ERMD. This tolerance takes the form of a
square well defined by lower (L) and upper (U) bounds, so that $d_{ij}^{calc}$ within
−L and +U of the restraint are not penalised. A harmonic potential is applied
outside the square well to ensure continuity.
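A square-well restraint with harmonic walls of this kind can be sketched as follows. The function name `square_well_penalty`, the force constant `k`, the factor of 0.5 and the interpretation of L and U as distances in Å are illustrative assumptions; the exact functional form used in the simulations is not given in this section.

```python
def square_well_penalty(d_calc, d_restraint, lower, upper, k=1.0):
    """Flat-bottomed harmonic penalty on an ensemble-averaged distance.

    No penalty inside [d_restraint - lower, d_restraint + upper]; outside,
    the penalty grows quadratically from the nearest edge of the well, so
    the function is continuous at both edges.
    Pass lower=None or upper=None for a one-sided restraint.
    """
    if lower is not None and d_calc < d_restraint - lower:
        return 0.5 * k * (d_calc - (d_restraint - lower)) ** 2
    if upper is not None and d_calc > d_restraint + upper:
        return 0.5 * k * (d_calc - (d_restraint + upper)) ** 2
    return 0.0


# Hypothetical restraint at 20 with L = 1 and U = 8 (well spans 19 to 28).
inside = square_well_penalty(25.0, 20.0, lower=1.0, upper=8.0)
above = square_well_penalty(30.0, 20.0, lower=1.0, upper=8.0)
below = square_well_penalty(18.0, 20.0, lower=1.0, upper=8.0)
```

Allowing either bound to be omitted means that the same function also covers restraints that have only an upper or only a lower bound.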
It is desirable to be able to implement the distance restraints as precisely
as possible, so as to maximise their information content. To quantify the con-
tribution that error in Iox/Ired makes to the calculated distance, variation of
up to 15% was introduced into a set of model Iox/Ired and the differences in
the calculated distances were examined (Figure 5.1). The effect of errors in
Iox/Ired on r depends on the magnitude of Iox/Ired. The experimental Iox/Ired
were therefore divided into three groups based on their magnitude. L and U
were assigned differently for each group so as to be appropriate for the expected
uncertainty in the calculated distance.
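The way in which an error in Iox/Ired propagates into the distance can be explored numerically with a sketch such as the one below. It assumes that equations 2.6 and 2.7 take the commonly used forms Iox/Ired = R2 exp(−R2sp t)/(R2 + R2sp) and r = [K/R2sp (4τc + 3τc/(1 + ωH²τc²))]^(1/6); these forms, the constant K, the intrinsic R2, the evolution time and the spectrometer frequency are all assumptions, as they are not stated in this section. Only τc = 4 ns is taken from the text.

```python
import math

# Illustrative settings; apart from TAU_C these are assumptions.
K = 1.23e-32                      # cm^6 s^-2, nitroxide-proton constant
TAU_C = 4e-9                      # s, value used in the text
OMEGA_H = 2 * math.pi * 600e6     # rad/s, assumed 600 MHz proton frequency
R2_INTRINSIC = 10.0               # s^-1, assumed diamagnetic R2
T_EVOLVE = 0.010                  # s, assumed total evolution time


def intensity_ratio(r2_sp):
    """Assumed form of equation 2.6: Iox/Ired as a function of R2sp."""
    return R2_INTRINSIC * math.exp(-r2_sp * T_EVOLVE) / (R2_INTRINSIC + r2_sp)


def solve_r2_sp(ratio):
    """Bisect for the R2sp (s^-1) that gives the requested 0 < ratio < 1."""
    lo, hi = 0.0, 1e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if intensity_ratio(mid) > ratio:   # ratio decreases as R2sp grows
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def distance(ratio):
    """Assumed form of equation 2.7; returns the distance in angstrom."""
    spectral = 4 * TAU_C + 3 * TAU_C / (1 + (OMEGA_H * TAU_C) ** 2)
    r_cm = (K * spectral / solve_r2_sp(ratio)) ** (1.0 / 6.0)
    return r_cm * 1e8


# Change in distance caused by -10% and +10% errors in Iox/Ired.
shifts = {
    ratio: (distance(0.9 * ratio) - distance(ratio),
            distance(1.1 * ratio) - distance(ratio))
    for ratio in (0.15, 0.50, 0.85)
}
```

With these assumed settings, the same relative error in Iox/Ired produces a larger and more asymmetric change in distance as Iox/Ired approaches 1, which is the behaviour that motivates treating the three ranges of Iox/Ired differently.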
Values of Iox/Ired < 0.15 correspond to the shortest inter-atomic distances. They
are therefore an important source of structural information if the two residues
involved are far apart in sequence. However, the magnitude of any variation
in the experimental data between replicates is generally large relative to the
[Figure 5.1 plot: panels A-D, calculated distance (Å) against Iox/Ired]
Figure 5.1: The relationship between the calculated distance and the corre-
sponding Iox/Ired for (A) 0 < Iox/Ired < 1.0, (B) 0.45 < Iox/Ired < 0.55, (C)
0.8 < Iox/Ired < 1.0 and (D) 0.8 < Iox/Ired < 0.9. The distances calculated
from the correct Iox/Ired are shown in black, and the distances resulting from
errors of ±1% (solid), ±5% (dashed), ±10% (dot-dashed) and ±15% (dotted)
of Iox/Ired are shown in red.
measured value208, resulting in a large uncertainty in the calculated distance.
To maximise the amount of information gleaned from these Iox/Ired, whilst
avoiding introducing errors, a “negative” restraint was applied by requiring the
inter-residue distance to be less than an upper bound of $d_{ij}^{0.15} + U$, where $d_{ij}^{0.15}$
is the distance calculated from Iox/Ired = 0.15.
At the other extreme, the greatest source of inaccuracy in the distances
calculated from Iox/Ired > 0.85 is the nature of the equations relating r to
Iox/Ired rather than the experimental measurement. Even a very small error in
the measured Iox/Ired can result in a large difference in the calculated distance.
Residue pairs with Iox/Ired > 0.85 were therefore assigned only a lower bound
of $d_{ij}^{0.85} - L$, where $d_{ij}^{0.85}$ is the distance calculated from Iox/Ired = 0.85.
For the remaining 0.15 < Iox/Ired < 0.85, errors of up to 10% in Iox/Ired
result in propagated errors of no more than −1.9 or +3.8 Å in the calculated distance
(Figure 5.1). A distance restraint was therefore only applied if all of the repli-
cate Iox/Ired measured experimentally were within 10% of the average Iox/Ired
for that residue. The fraction of the experimental data that was discarded due
to this restriction is given in Table 5.1 along with the total number of distance
restraints for each protein. The Iox/Ired from which the distances were calcu-
lated for each of the three proteins are also shown graphically in Figures 5.2, 5.3
and 5.4. The PRE-ERMD was carried out using a working dataset comprising
80% of the data, as every 5th distance was relegated to a ‘free’ dataset to be
used for independent cross-validation.
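The processing of the intensity ratios described in this section can be summarised in a short sketch. The function names, the treatment of values exactly equal to 0.15 or 0.85, and the choice of which position in each group of five is sent to the free dataset are assumptions made for illustration; with the offset chosen here the split happens to reproduce the working and free dataset sizes listed in Table 5.1.

```python
def restraint_type(ratio):
    """Map an Iox/Ired value to the kind of bound that is applied."""
    if ratio < 0.15:
        return "upper bound only"    # d < d(0.15) + U
    if ratio > 0.85:
        return "lower bound only"    # d > d(0.85) - L
    return "square well"             # d(ratio) - L <= d <= d(ratio) + U


def replicates_consistent(replicates, tolerance=0.10):
    """True if every replicate lies within 10% of the replicate average."""
    mean = sum(replicates) / len(replicates)
    return all(abs(r - mean) <= tolerance * mean for r in replicates)


def split_working_free(restraints, every=5):
    """Relegate every fifth restraint to the free (cross-validation) set."""
    working = [r for i, r in enumerate(restraints) if i % every != 0]
    free = [r for i, r in enumerate(restraints) if i % every == 0]
    return working, free


# Dummy restraint lists with the total sizes given in Table 5.1.
sizes = {
    name: tuple(len(part) for part in split_working_free(list(range(n))))
    for name, n in (("alphaS", 595), ("betaS", 635), ("beta+HC", 578))
}
```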
Table 5.1: Summary of the experimental restraints. NPRE is the total number
of distances derived from the PRE experiment and NwPRE and NfPRE are the
numbers of distances in the working and free datasets, comprising 80 and 20%
of the total data, respectively. The percentage of the experimental data that
was discarded due to inaccuracies > 10% is also shown.
Protein   NPRE   NwPRE   NfPRE   % discarded
αS        595    476     119     16.90
βS        635    508     127     16.78
β+HC      578    462     116     2.86
[Figure 5.2 plot: Iox/Ired against residue number, one panel per spin-label position (Q24, S42, Q62, S87, N103, N122)]
Figure 5.2: The distribution of Iox/Ired along the αS sequence for each spin-
label position as indicated. The experimental data is shown as black bars and
the Iox/Ired expected for a purely random coil is plotted as a thick red line. The
experimental Iox/Ired are those processed for use in the simulations as discussed
in the text, thus any Iox/Ired < 0.15 or > 0.85 have been set to 0.15 or 0.85,
respectively. If no bar is present, then either Iox/Ired was not measured for this
residue or it was discarded due to error > 10%.
5.3 Choice of optimal T for characterisation by
PRE-ERMD
PRE-ERMD simulations of αS, βS and β+HC were run using the general
method developed in Chapter 4, with one minor modification. When using
[Figure 5.3 plot: Iox/Ired against residue number, one panel per spin-label position (A30, S42, S64, F89, A102, S118, A134)]
Figure 5.3: The distribution of Iox/Ired along the βS sequence for each spin-
label position as indicated. The experimental data is shown as black bars and
the Iox/Ired expected for a purely random coil is plotted as a thick red line. The
experimental Iox/Ired are those processed for use in the simulations as discussed
in the text, thus any Iox/Ired < 0.15 or > 0.85 have been set to 0.15 or 0.85,
respectively. If no bar is present, then either Iox/Ired was not measured for this
residue or it was discarded due to error > 10%.
synthetic data, the 〈Rg〉 of the calculated ensemble was compared to that of the
reference ensemble. For experimental data, the Rh is the preferred measure of the global
size, as it is measured by PFG-NMR under similar conditions to the PRE-NMR.
The experimental observable is a harmonic average, and thus fulfils the criteria
outlined in Chapter 4 for the improved cross-validation procedure: it is not an
$r^{-6}$ average, and as a near-linear average, it reports on the central portion of
the underlying distribution.
The Rh of each of the calculated ensembles was obtained by using phenomenological
relationships derived separately for each protein:
$$\text{αS}: \quad R_h^{-1} = 0.0148 + 0.488\,R_g^{-1} \tag{5.2}$$
$$\text{βS}: \quad R_h^{-1} = 0.0163 + 0.454\,R_g^{-1} \tag{5.3}$$
$$\text{β+HC}: \quad R_h^{-1} = 0.0151 + 0.494\,R_g^{-1} \tag{5.4}$$
The Rh and the geometric Rg of each of a set of structures encompassing a
wide range of sizes were calculated using the program hydropro356, and linear
regression was carried out to parameterise the relationship between $R_g^{-1}$ and
[Figure 5.4 plot: Iox/Ired against residue number, one panel per spin-label position (A30, S42, S64, A113, A145)]
Figure 5.4: The distribution of Iox/Ired along the β+HC sequence for each spin-
label position as indicated. The experimental data is shown as black bars and
the Iox/Ired expected for a purely random coil is plotted as a thick red line. The
experimental Iox/Ired are those processed for use in the simulations as discussed
in the text, thus any Iox/Ired < 0.15 or > 0.85 have been set to 0.15 or 0.85,
respectively. If no bar is present, then either Iox/Ired was not measured for this
residue or it was discarded due to error > 10%.
$R_h^{-1}$. These equations were used to convert the Rg of each structure in the
ensembles generated by PRE-ERMD into an Rh as described in Section 2.3.1.
In general, the $\langle R_h^{-1}\rangle^{-1}$ decreases with decreasing T, thus QRh also decreases
until the calculated $\langle R_h^{-1}\rangle^{-1}$ matches the experimental $\langle R_h^{-1}\rangle^{-1}$ and then increases
again (Tables 5.2, 5.3 and 5.4). It is therefore straightforward to locate
the T that optimises QRh.
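The conversion from Rg to Rh and the harmonic averaging over an ensemble can be sketched as follows. The coefficients are those of equations 5.2 to 5.4; the function names, the dictionary keys and the example Rg values are hypothetical, and lengths are assumed to be in Å.

```python
# (intercept, slope) of R_h^-1 = a + b * R_g^-1, from equations 5.2-5.4.
RH_FIT = {
    "alphaS": (0.0148, 0.488),
    "betaS": (0.0163, 0.454),
    "beta+HC": (0.0151, 0.494),
}


def rg_to_rh(rg, protein):
    """Convert the Rg of a single structure into an Rh."""
    a, b = RH_FIT[protein]
    return 1.0 / (a + b / rg)


def harmonic_mean_rh(rg_values, protein):
    """<R_h^-1>^-1: the inverse of the ensemble average of 1/R_h."""
    inverse_rh = [1.0 / rg_to_rh(rg, protein) for rg in rg_values]
    return len(inverse_rh) / sum(inverse_rh)


# Hypothetical Rg values for a small set of structures.
example_rg = [25.0, 35.0, 50.0]
example_rh = harmonic_mean_rh(example_rg, "alphaS")
```

Because the harmonic average weights compact structures more heavily than an arithmetic average does, it is never larger than the arithmetic mean of the individual Rh values.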
Determining when the QPRE values are minimised is not as simple. In Chap-
ter 4, every PRE distance was known exactly, and so could contribute to the
calculation of statistics. In comparison, because the experimental PRE distances
can only be calculated accurately for 0.15 < Iox/Ired < 0.85, it is only appro-
priate for these medium-range distances to contribute to QwPRE and QfPRE.
Unlike the synthetic QPRE values, neither the QwPRE nor the QfPRE calculated
for 0.15 < Iox/Ired < 0.85 changes markedly over the range of T explored (Tables
5.2, 5.3 and 5.4). However, in Chapter 4 it was shown that QRg, rather
than QwPRE or QfPRE, provides the best measure of when the distance distributions
are most accurately reconstructed. Consequently, QRh was used here
as the primary determinant of the optimal T . Where two T give similar results,
an intermediate value was chosen. The optimal T for each protein is slightly
different (Tables 5.2, 5.3 and 5.4, bold type); the reasons for this are discussed
further in the next section.
Table 5.2: The Q values quantify how well the experimental $\langle R_h^{-1}\rangle^{-1}$ (32.0 Å)
and the PRE distances for αS are reproduced by varying T with Nrep = 24,
L = 1 and U = 8. QwPRE refers to the working dataset (80% of the PRE
distances) and QfPRE to the free dataset (remaining 20%). The results for the
most representative ensemble collected at the optimal T are in bold type.
T (K)   $\langle R_h^{-1}\rangle^{-1}$ (Å)   QRh     QwPRE   QfPRE
475     31.3    0.018   0.18    0.21
490     32.1    0.006   0.19    0.20
500     32.6    0.021   0.19    0.22
525     33.2    0.042   0.19    0.20
550     33.7    0.056   0.19    0.20
575     34.2    0.069   0.20    0.19
600     34.3    0.085   0.20    0.20
625     34.5    0.081   0.20    0.20
650     34.6    0.085   0.21    0.20
675     34.8    0.092   0.21    0.20
700     34.9    0.094   0.21    0.21
5.4 Global dimensions
Longer simulations were carried out at the optimal T for αS, βS and β+HC.
For βS and β+HC, where simulations at the optimal T had already been carried
out during the optimisation phase, the good agreement between the final and
preliminary statistics further confirms that sufficient sampling is carried out
during the initial phase to obtain reliable statistics.
The $\langle R_h^{-1}\rangle^{-1}$ of each of the most representative ensembles is in good agreement
with the experimental value. The range of Rg sampled by each protein is
broad, reflecting the heterogeneous range of structures comprising each ensem-
ble (Figure 5.5). Comparison of the Rg distribution of the new αS ensemble
with that of the original ensemble205 shows that, as is expected given the larger
Table 5.3: The Q values quantify how well the experimental $\langle R_h^{-1}\rangle^{-1}$ (32.4 Å)
and the PRE distances for βS are reproduced by varying T with Nrep = 24,
L = 1 and U = 8. QwPRE refers to the working dataset (80% of the PRE
distances) and QfPRE to the free dataset (remaining 20%). The results for the
most representative ensemble collected at the optimal T are in bold type.
T (K)   $\langle R_h^{-1}\rangle^{-1}$ (Å)   QRh     QwPRE   QfPRE
425     27.9    0.137   0.17    0.16
450     29.7    0.084   0.18    0.17
475     31.2    0.037   0.19    0.18
500     31.7    0.021   0.19    0.20
525     32.3    0.003   0.20    0.20
525     32.2    0.005   0.20    0.19
550     32.7    0.008   0.20    0.20
575     32.8    0.011   0.20    0.20
600     32.9    0.013   0.21    0.20
625     33.0    0.019   0.21    0.21
650     33.3    0.027   0.21    0.20
675     33.2    0.024   0.21    0.21
700     33.3    0.029   0.21    0.21
$\langle R_h^{-1}\rangle^{-1}$, a wider range of Rg are sampled, and the new ensemble contains a
greater number of expanded structures.
Further insight into the meaning of the $\langle R_h^{-1}\rangle^{-1}$ can be gained by comparing
them to the values expected if the protein is in a compact globular state or
is a purely random coil. The predicted $\langle R_h^{-1}\rangle^{-1}$ for these two reference states
were calculated according to relationships determined by Wilkins et al.161. The
experimental $\langle R_h^{-1}\rangle^{-1}$ of each protein is intermediate between the two calculated
values (Table 5.5). A quantitative measure of the degree of compaction
of a given polypeptide chain is given by the compaction factor, Cf161, which
scales the $\langle R_h^{-1}\rangle^{-1}$ so as to account for differing numbers of residues. Cf ∼ 1
indicates that the protein is of a similar size to that expected if it were folded
into a compact, globular structure, whereas a Cf near zero indicates a highly
expanded chain. According to this measure, β+HC is the most compact of the
three proteins, and βS is the most expanded (Table 5.5). The optimal T also
correlates negatively with Cf , with a higher T required when Cf is low.
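The compaction factors in Table 5.5 can be reproduced with the following sketch. It assumes that equations 2.23 to 2.25, which are not reproduced in this chapter, take the forms published by Wilkins et al.: Rh = 4.75 N^0.29 Å for a folded protein, Rh = 2.21 N^0.57 Å for an unfolded chain, and Cf = (Rh,U − Rh)/(Rh,U − Rh,F). The residue counts (140, 134 and 145) are also assumptions; together they give values that agree with the U, F and Cf entries of Table 5.5 to within rounding.

```python
def rh_folded(n_residues):
    """Assumed Wilkins relationship for a compact, folded protein (Å)."""
    return 4.75 * n_residues ** 0.29


def rh_unfolded(n_residues):
    """Assumed Wilkins relationship for a highly unfolded chain (Å)."""
    return 2.21 * n_residues ** 0.57


def compaction_factor(rh_expt, n_residues):
    """1 for a folded-like size, 0 for an unfolded-like size."""
    rh_u = rh_unfolded(n_residues)
    rh_f = rh_folded(n_residues)
    return (rh_u - rh_expt) / (rh_u - rh_f)


# Experimental Rh values (Å) from Table 5.5; residue counts are assumed.
cf = {
    "alphaS": compaction_factor(31.9, 140),
    "betaS": compaction_factor(32.4, 134),
    "beta+HC": compaction_factor(29.7, 145),
}
```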
To provide a reference state from which to interpret the Rg distributions, a
Table 5.4: The Q values quantify how well the experimental $\langle R_h^{-1}\rangle^{-1}$ (29.7 Å)
and the PRE distances for β+HC are reproduced by varying T with Nrep = 24,
L = 1 and U = 8. QwPRE refers to the working dataset (80% of the PRE
distances) and QfPRE to the free dataset (remaining 20%). The results for the
most representative ensemble collected at the optimal T are in bold type.
T (K)   $\langle R_h^{-1}\rangle^{-1}$ (Å)   QRh     QwPRE   QfPRE
450     26.9    0.093   0.19    0.23
475     30.0    0.011   0.20    0.27
475     30.3    0.020   0.20    0.28
500     31.9    0.076   0.21    0.30
525     32.7    0.101   0.20    0.31
550     33.5    0.123   0.21    0.31
575     33.9    0.141   0.21    0.34
600     34.1    0.149   0.21    0.35
625     34.1    0.148   0.22    0.34
650     34.4    0.157   0.22    0.35
675     34.4    0.158   0.22    0.36
700     34.7    0.167   0.22    0.37
random coil model analogous to that described in Chapter 3 for αS was generated
for each protein. The $\langle R_h^{-1}\rangle^{-1}$, $\langle R_g^2\rangle^{1/2}$ and rms end-to-end distances
of this model agree with those predicted according to theoretical equations de-
rived for random flight chains with excluded volume161,360 (Table 5.5). The
random coil Rg distributions are broader than those of the PRE-ERMD ensem-
bles for each protein, and are shifted towards larger values of Rg (Figure 5.5).
The ensembles produced by PRE-ERMD are thus more restricted than a purely
random coil as well as more compact. Additionally, as with the Cf, the Rg
distribution of β+HC differs most from that of the corresponding random coil
model.
[Figure 5.5 plot: panels A-C, p(Rg) against Rg (Å)]
Figure 5.5: Rg probability distributions for (A) αS, (B) βS and (C) β+HC.
The random coil ensembles (see text for definition) are shown in black, the
ensembles calculated using PRE-ERMD are in red and (A only) the ensemble
previously calculated for αS205 is in green. The Rg distributions are plotted
rather than the Rh distributions because the former are faster to calculate, but
the Rh distributions are similar.
Table 5.5: Predicted^1 and experimental^2 Rh (in Å) and compaction factors, Cf^3,
for αS, βS and β+HC in various states.
              U      F      D2O    NaCl   NaCl + SDS
αS     Rh     37.0   19.9   26.6   31.9   24.6
       Cf                   0.608  0.298  0.725
βS     Rh     36.0   19.7   -      32.4   32.2
       Cf                   -      0.221  0.233
β+HC   Rh     37.7   20.1   -      -      29.7
       Cf                   -      -      0.455
1. U and F refer to the Rh predicted for an unfolded or folded polypeptide according to
equations 2.23 and 2.24161, respectively.
2. measured by PFG-NMR on 0−1 mM αS in D2O, pH 7.0, 298 K234, 100 (αS)235 or 200 µM
protein (αS and βS) in 99.9% D2O, 20 mM Mes buffer with 100 mM NaCl, pH 6.5, 288 K158,
70 µM protein in 10 mM phosphate buffer with 100 mM NaCl and 0.5 mM SDS, pH 7.7 at
298 K145.
3. calculated according to equation 2.25161.
5.5 Characterising residual structure
The general method developed in Chapter 4 guarantees that the average global
dimensions of the ensembles of structures match those measured experimentally.
Additionally, the ensemble-averaged tertiary structure, defined in terms of the
often long-range PRE distances, is by definition compatible with the experimental
data. The advantage of biomolecular simulations is that they complement
the averaged information accessible experimentally with atomic-level structural
detail for each conformation in the ensemble. When the ensemble contains only
a limited number of well-defined structures, such as for NFPs in solution and
partially folded states such as the folding TS, it is relatively simple to define and
characterise representative structures. For DS, however, the ensembles contain
a broad and heterogeneous range of structures, and it is not always apparent
how best to analyse these. In theory, cluster analysis allows the ensemble to
be grouped into sets of like structures, thus partitioning the accessible confor-
mational space into a manageable number of sub-ensembles, each of which can
be described by a single representative structure. The success of clustering,
however, is dependent on the choice of a suitable reaction coordinate to define
the distance matrix that discretises the conformational phase space. A number
of different distance measures were tested as part of this work, but none proved
successful (data not shown).
The most useful means of summarising the structural propensities of large
ensembles of structures is to display the probabilities of occurrence and co-
occurrence of various properties as 2D maps. Ramachandran maps and free
energy maps, which were introduced in Chapters 3 and 4, are also used here.
Additionally, a new type of 2D map is developed to investigate the inter-residue
distances.
5.5.1 Distance comparison maps
Definition
A convenient means of examining the structural propensities of an ensemble
is to represent the distances between residues as a 2D plot. In the past, both
the raw ensemble-averaged distances240 and more complicated functions of the
inter-residue distances such as the residual contact probability (RCP)201,202,205
have been used. The former method, like comparing $\langle R_h^{-1}\rangle^{-1}$ values, suffers
from the fact that the magnitude of the inter-residue distances for DS in gen-
eral scales with the sequence separation of the residues involved, making it
difficult to compare the distances between different pairs of residues within the
same molecule or between any pair of residues in two or more different proteins.
The RCP, defined as $-\ln(p_{ij}^{calc}/p_{ij}^{rc})$, accounts for the sequence separation by
comparing the probability of residues i and j being in contact in the calculated
ensemble, $p_{ij}^{calc}$, to the probability of them coming into contact if the molecule
were a purely random coil, $p_{ij}^{rc}$. The contact probability is calculated by determining
how many times residues i and j are separated by less than 8.5 Å out
of all of the structures in the ensemble. The RCP is therefore influenced most
by the shortest distances, and is insensitive to the remainder of the distance
distribution. Additionally, residues far apart in sequence become increasingly
unlikely to come closer than 8.5 Å, even if they are, on average, closer together
than is expected for a random coil.
To overcome the aforementioned difficulties, ‘distance comparison’ (DC)
maps were created. The rms distance between two residues in the calculated
ensemble, $\langle (d_{ij}^{calc})^2 \rangle^{1/2}$, is compared to the rms distance for the same sequence
separation in a random coil, $\langle (d_{ij}^{rc})^2 \rangle^{1/2}$, according to equation 2.19.
An expression that predicts the rms end-to-end distance of a random flight
chain with excluded volume and dihedral angles taken from a PDB coil database
(equation 2.21)360 was used to estimate $\langle (d_{ij}^{rc})^2 \rangle^{1/2}$. The sequence separation of
the pair of residues under consideration was equated with the length of the random
flight chain. Calculation of $\langle (d_{ij}^{rc})^2 \rangle^{1/2}$ from the random coil gives essentially
identical distances (data not shown), meaning that if a random coil model is not
available, the theoretical predictions will suffice. The DC value for each pair of
residues is plotted directly, without the smoothing that is applied to the RCP,
so that local detail is not lost.
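The two measures can be compared for a single pair of residues with a sketch such as the following. The RCP follows the definition given above, with the 8.5 Å contact cut-off. The DC is assumed here to be the ratio of the calculated rms distance to the random coil rms distance, which is consistent with the description of equation 2.19 in the text but is not its confirmed form. The function names and the toy distances are hypothetical.

```python
import math

CONTACT_CUTOFF = 8.5  # Å


def contact_probability(distances):
    """Fraction of structures in which the pair is closer than the cut-off."""
    return sum(d < CONTACT_CUTOFF for d in distances) / len(distances)


def rcp(calc_distances, rc_distances):
    """-ln(p_calc / p_rc); None if either ensemble contains no contacts."""
    p_calc = contact_probability(calc_distances)
    p_rc = contact_probability(rc_distances)
    if p_calc == 0.0 or p_rc == 0.0:
        return None
    return -math.log(p_calc / p_rc)


def rms(distances):
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def dc(calc_distances, rc_distances):
    """Assumed form of the DC: ratio of rms distances (1.0 = random coil)."""
    return rms(calc_distances) / rms(rc_distances)


# Toy distances (Å) for one residue pair in each ensemble; not thesis data.
calc = [6.0, 7.0, 9.0, 12.0]
coil = [7.0, 10.0, 12.0, 14.0]
pair_rcp = rcp(calc, coil)   # negative: more contacts than the random coil
pair_dc = dc(calc, coil)     # below 1: shorter rms distance than the coil
```

The `None` return value makes explicit that the RCP carries no information for pairs that never come within the cut-off, whereas the DC remains defined for every pair.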
Interpretation
It has been shown that it is possible to obtain random coil-like scaling of
global parameters such as the Rh whilst retaining varying degrees of local struc-
ture257–259,400. Additionally, at the local level, random flight and secondary
structural characteristics may be indistinguishable. For instance, the rms inter-
residue distances for an α-helix are almost identical to those expected for a
random flight chain for sequence separations in the range 1− 8 residues258,259,
meaning that DC values near 1.0 do not necessarily correspond to random coil
structure. Of course, the effects of ensemble-averaging discussed above still
apply.
Just as for the interpretation of experimental measurements made on DS,
there may not be any structures present in the ensemble that exhibit all of
the characteristics displayed in the DC map simultaneously. In fact, it has been
shown for unfolded states of natively folded proteins that the ensemble-averaged
CαCα distance matrix is closer to that of the native structure than the distance
matrix of any individual member of the unfolded state ensemble401. Thus it
is not unexpected that, despite there being no native fold to refer to, the DC
maps indicate the presence of some residual structure. The best interpretation
of these averaged structural properties of the molecules comprising an ensemble
is in terms of structural propensities that influence the range of conformations
sampled by a given residue, rather than as continuous segments of simultane-
ously defined structure.
RCP and DC maps provide complementary information
As an example of how the DC maps reflect different aspects of the nature
of the structures in an ensemble to the RCP maps, the DC and RCP maps
of the original ensemble of αS structures generated using PRE-ERMD205 are
compared (Figure 5.6 A and B). The RCP map portrays an increased propensity
towards contact formation between the C-terminus and the central NAC region.
In comparison, the DC map reveals that the shortest rms inter-residue distances
relative to those expected for a random coil occur between residues 1–60 and
110–140, although the distances between the C-terminus and the NAC region
are also shorter than expected.
The discrepancy between the nature of the residual structure suggested by
the two types of map can be reconciled by considering the fact that the RCP
reports only on distances less than 8.5 Å, whereas the DC compares the position
of the centre of the distance distributions. For instance, the RCP between the C-
terminus and the most proximal residues is increased relative to the interactions
of the C-terminus with the remainder of the sequence, but the DC is only slightly
lower. Because the pairs of residues involved in these interactions are close
together in sequence, a slight shift of the distance distribution towards shorter
distances relative to the random coil distance distribution causes a significant
increase in the probability of occurrence of distances less than 8.5 Å. This results
in a significant increase in the RCP, which is logarithmically dependent on the
relative probabilities, whereas the DC, which is a simple ratio, is not affected
so markedly. The opposite situation occurs for distances between the N- and
C-termini, where the DC values suggest shorter than average distances but the
RCP values do not correspond to an increased contact probability. Even if the
entire distribution were shifted significantly towards shorter distances relative
to those expected for a random coil, thus causing a change in the DC value, the
large sequence separation would prevent the number of distances less than 8.5 Å
becoming large enough to affect the RCP. Thus the apparent disparity between
the nature of the structural propensities of αS suggested by each type of map
can be easily explained by considering how each measure reacts to changes in
the relationship between the distance distributions of the ensemble of interest
and those of a random coil.
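The different responses of the two measures can be illustrated numerically. The following sketch assumes, purely for illustration, that each inter-residue distance follows the distribution of an ideal Gaussian chain (an assumption that is not part of the analysis above) and compares the DC with the absolute probability of finding the pair within 8.5 Å. The rms distances chosen for the 'near' and 'far' pairs are hypothetical, and the RCP itself is not computed because its definition is given in Section 5.5.1 rather than here.

```python
import math

CUTOFF = 8.5  # contact cutoff in Å, as quoted in the text


def contact_probability(rms, cutoff=CUTOFF):
    """P(r < cutoff) for a pair whose distance follows the ideal
    Gaussian-chain (Maxwell-type) distribution with rms distance `rms`."""
    x = cutoff * math.sqrt(1.5) / rms
    return math.erf(x) - 2.0 / math.sqrt(math.pi) * x * math.exp(-x * x)


def compare(rms_rc, rms_ens):
    """Return (DC, P_contact random coil, P_contact ensemble) for one pair."""
    dc = rms_ens / rms_rc  # DC taken as a simple ratio of rms distances
    return dc, contact_probability(rms_rc), contact_probability(rms_ens)


# Hypothetical rms distances (Å), chosen only to illustrate the argument
near = compare(rms_rc=15.0, rms_ens=13.5)  # short sequence separation, small shift
far = compare(rms_rc=80.0, rms_ens=56.0)   # long sequence separation, large shift

for label, (dc, p_rc, p_ens) in (("near", near), ("far", far)):
    print(f"{label}: DC = {dc:.2f}, P(contact) {p_rc:.4f} -> {p_ens:.4f}")
```

With these hypothetical values the near pair has a DC of 0.90, yet its contact probability rises from about 0.19 to about 0.24, whereas the far pair has a DC of 0.70 but a contact probability that remains below 0.01.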
5.6 Residual structure of αS, βS and β+HC
DC maps were created for the previously published αS ensemble and the αS,
βS and β+HC ensembles generated here. In all cases, there are
regions in which the inter-residue distances are shorter than expected for a ran-
dom coil, portrayed by DC values significantly less than 1.0 (Figure 5.6 C-E).
This may occur simply because all three proteins are slightly more compact than
a random coil on average (Table 5.5). However, there are also some inter-residue
distances in all three proteins that are, on average, longer than is expected for
a random coil, suggesting the presence of non-random residual structure.
Figure 5.6: (A) RCP map and (B) DC map for the previously published αS
ensemble205; (C-E) DC maps for the (C) re-calculated αS, (D) βS and (E)
β+HC ensembles determined by PRE-ERMD. The RCP and DC are defined
in Section 5.5.1. The same scale is used for all DC maps to aid comparisons.
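A minimal sketch of how a DC map of this kind might be computed is given below. It assumes that the DC for a residue pair is the ensemble rms Cα-Cα distance divided by the corresponding rms distance in a random coil reference ensemble. The function names and the array layout are hypothetical, and this is not the code used to produce Figure 5.6.

```python
import numpy as np


def rms_distance_matrix(coords):
    """coords: (n_structures, n_residues, 3) array of Cα positions.
    Returns the (n_residues, n_residues) matrix of rms distances."""
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    squared = np.sum(diff ** 2, axis=-1)
    return np.sqrt(squared.mean(axis=0))


def dc_map(ensemble, random_coil):
    """Ratio of ensemble rms distances to random coil rms distances.
    Entries with a zero reference distance (the diagonal) are set to NaN."""
    rms_ens = rms_distance_matrix(ensemble)
    rms_rc = rms_distance_matrix(random_coil)
    dc = np.full_like(rms_ens, np.nan)
    np.divide(rms_ens, rms_rc, out=dc, where=rms_rc > 0)
    return dc
```

Values below 1.0 then mark pairs that are closer together, on average, than in the reference, and values above 1.0 mark pairs that are further apart. For ensembles containing tens of thousands of structures the intermediate array would probably need to be accumulated in chunks rather than held in memory at once.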
In Section 5.6.1, the residual structure suggested by the DC map is com-
pared for the αS ensemble produced here and the previously published ensemble.
The analysis is then extended to βS and β+HC. The structural propensities of
the C- and N-termini of all three polypeptides inferred from the DC maps are
compared with qualitative reference to experimental data and predicted quantities.
The dihedral angle preferences, including differences in the conformational
propensities of different sections of the protein, are then investigated by
generating separate Ramachandran plots for each section. A quantitative assessment
of the agreement of back-calculated observables with the experimental data is
made for αS and βS to investigate the correspondence between the local
structural properties of the calculated ensembles and those observed experimentally.
Finally, free energy maps are used to examine aspects of the global structural
properties of the three proteins.
5.6.1 Comparison of the re-calculated and previously published αS ensembles
The ensemble of αS structures produced here is expected to differ slightly from
the previously published ensemble due to the inclusion of additional PRE
distance restraints and the larger ⟨R_h^{-1}⟩^{-1}. In accordance with the more
expanded structures present in the new ensemble, the DC values are greater overall
(Figure 5.6 C). The smallest DC values are for inter-residue distances between
residues around 120 and the first 40 residues of the protein, thus the tertiary
structure exhibited by this ensemble is broadly similar to that of the original
ensemble.
The location of the lowest DC values is in keeping with the results of Bernado
et al.266, who selected structures containing particular sets of contacts from an
ensemble of structures created using a dihedral angle database with excluded
volume and found that the experimental RDCs for αS are best reproduced when
only structures containing contacts between residues 6 − 10 and 136 − 140 are
considered. In contrast, the use of RCP maps to analyse the original ensemble
suggested an increased RCP between the C-terminus and the NAC region of
the original ensemble, which was credited with protecting the NAC region from
aggregation205. The presence of interactions of this type is supported by the
experimental PRE-NMR data of Bertoncini et al.240 and Sung and Eliezer159.
As outlined in Section 5.5.1, this apparent discrepancy can be reconciled by
considering the different sensitivities of the two analysis methods. DC values
report on the location of the centre of the distribution, whereas the RCP and
the experimental PRE distances are most sensitive to the shortest distances.
The definition of contact formation used by Bernado et al. is less stringent than
the definition of RCP: a contact between two regions of the polypeptide is said
to occur if the Cβ atoms of two residues are separated by less than 15 Å266.
Therefore, measures that are most sensitive to the shortest distances highlight
preferential contact formation between the C-terminus and the NAC region of
αS, whereas when larger distances are considered, the biggest difference between
the calculated ensemble and the random coil ensemble pertains to interactions
between the C- and N-termini. Whilst it is interesting to know how often two
residues come close enough together to interact specifically, this information
is already contained within the PRE-NMR data, whereas the DC maps pro-
vide additional information about the remainder of the distribution that is not
available experimentally.
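The effect of the contact definition can be made concrete with a short sketch. It flags the structures in an ensemble that contain at least one contact between two sequence regions, using a 15 Å cutoff by default as in the definition attributed to Bernado et al. above. The function name, the array layout and the use of 1-based inclusive residue ranges are assumptions made for illustration only.

```python
import numpy as np


def has_contact(coords, region_a, region_b, cutoff=15.0):
    """coords: (n_structures, n_residues, 3) array of Cβ positions.
    region_a, region_b: 1-based inclusive residue ranges, e.g. (6, 10).
    Returns a boolean array with one entry per structure."""
    a = coords[:, region_a[0] - 1:region_a[1], :]
    b = coords[:, region_b[0] - 1:region_b[1], :]
    # All pairwise distances between the two regions, per structure
    dist = np.linalg.norm(a[:, :, None, :] - b[:, None, :, :], axis=-1)
    return (dist < cutoff).any(axis=(1, 2))
```

Calling `has_contact(coords, (6, 10), (136, 140))` and then repeating the call with `cutoff=8.5` shows how many structures satisfy the looser definition but not the stricter one.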
5.6.2 Long-range structure of βS and β+HC
In both βS and β+HC the distances between residues separated by more than
40− 50 residues are all significantly shorter than the random coil, especially for
distances between the first 40 residues and residues 80 − 145 in β+HC (Fig-
ure 5.6 D and E). This is in keeping with the experimental data shown in
Figures 5.2, 5.3 and 5.4, but contrasts with the experimental results of Sung
and Eliezer159 and Bertoncini et al.158, who find that βS exhibits fewer long-
range interactions than αS. It is not clear why this discrepancy exists, as the
experiments were all conducted in similar conditions. The regions exhibiting
the shortest inter-residue distances are shifted slightly towards the N-terminus
compared to αS, so that in βS, the shortest distances are between residues 1−40
of the N-terminus and residues 80− 120 of the C-terminus, and in β+HC, they
are between residues 1 − 40 of the N-terminus and residues 80 − 145 of the
C-terminus. This may provide additional protection to the central region, in
keeping with the lower aggregation propensity of both of these polypeptides.
For β+HC, the scaled long-range distances are shorter than in either αS or βS,
reflecting its larger compaction factor (Table 5.5).
5.6.3 Structural propensities of the C-terminus
All three proteins, and in particular βS, exhibit distances between residues
within the C-termini that are larger on average than in a random coil (Fig-
ure 5.6 C-E). This can be interpreted as either extended β or PPII structure,
both of which are characterised by rms inter-residue distances longer than those
of a random flight chain259. For βS, PPII structure is the most likely, as the C-
terminus of βS contains 8 proline residues, which are known to disrupt β-sheet
formation, and PPII structure has been observed experimentally158. The exper-
imental data for αS, in contrast, suggest a much lower PPII propensity109,158,
thus the DC values greater than 1.0 in the C-terminus of this protein are more
likely to correspond to extended β-like structure. There is less experimental
data available for β+HC, but the cross-peaks in the HSQC spectra overlay with
those of βS for the majority of the sequence, and with those of αS for the
inserted hydrophobic core region145, indicating that the secondary structural
preferences of the C-terminus are likely to be similar to those of βS. Interest-
ingly, the C-terminus of β+HC does not contain as many DC values greater
than 1.0 as βS, suggesting that the insertion of the αS hydrophobic core may
have an indirect effect on the structural propensities of the C-terminus.
5.6.4 Structural propensities of the N-terminus
Within the N-termini of all three proteins there are clusters of residues close
together in sequence separated by distances that are, on average, similar to
those in a random coil. Such DC values could result from either random coil or
α-helical structure, as the expected inter-residue distances are the same for short
sequence separations259. The helical propensity predicted using AGADIR361–364
shows a series of regions within the N-termini that are mildly prone to form
helical structure (Figure 5.7 A-C). None of these regions corresponds precisely
to the areas where DC ∼ 1.0 in the DC maps, however. Additionally, the helical
propensity is lowest for β+HC, whereas the DC maps suggest that this protein
has the largest amount of N-terminal residual helical structure.
Further evidence for the presence of local α-helical structure in the N-termini
is provided by the results of NMR experiments. The ∆δ 109,158,159 and 3J-
couplings109,158 for the N-termini of both αS and βS reveal a propensity to-
wards helical structure in the solution state. The helical propensity of β+HC
is likely to be similar, as its cross-peaks overlay those of βS for the N-terminal
72 residues, and those of αS for residues 73− 83145. Given that the N-termini
of both αS and βS become helical upon binding to lipid membranes114–120, it
appears that the lipid-bound structure of αS and βS may be encoded in their
solution state ensembles. It is unlikely that, in solution, the N-termini fluctuate
between fully formed α-helix and completely random coil, as this would be ex-
pected to be detected experimentally. More probably, short sections of α-helix
form transiently in the solution state ensemble, and the preferential binding of
lipids to the helical form results in a shift in the equilibrium upon the addition
of lipids.
The transient nature of the residual helical structure is in keeping with the
non-negative RDCs observed for the N-terminus158,159. It also explains why
some of the details of the lipid-bound structures, such as the break in heli-
cal structure around residue 40114–120, are not observed in the solution-state
ensembles. Distinguishing helical and non-helical structure is complicated by
the similarity between the predicted distances for a random coil and for an α-
helix259. One aspect of the lipid-bound structures that is present, however, is
the longer α-helical region in αS caused by the extra 11 residues relative to βS.
The insertion of these residues into βS to form β+HC extends the unitary DC
values to residue ∼ 90, whereas lower DC values indicative of local compaction
occur around residue 70 in βS. This is in keeping with the termination of helical
structure around residue 65 observed experimentally for βS in solution159.
Other details of the solution state ensembles observed experimentally are
not clearly delineated in the DC maps. According to the ∆δ, residues 6− 37 of
αS have the greatest helical propensity in solution109, whereas in βS, there are
two distinct regions of higher helical propensity in the N-terminus, comprising
residues 20 − 35 and 55 − 65158. The helical propensity of the central portion
of the polypeptide chain in solution is therefore higher for βS158, but these
differences cannot be seen in the DC maps.
[Figure 5.7 plots: panels A-C show helical content against residue number; panels D-F show Z_agg^prof against residue number.]
Figure 5.7: (A-C) Helical propensity predicted using AGADIR for (A) αS, (B) βS
and (C) β+HC. In (A), the black line corresponds to the experimental conditions
of Morar et al.234 and the red line to the conditions of Binolfi et al.235. The
predictions shown in (B) and (C) were made using the experimental conditions
of Bertoncini et al.158 and R.C. Rivers145, respectively. (D-F) Aggregation
propensity, Z_agg^prof, predicted using the Zyggregator algorithm366 for (D) αS,
(E) βS and (F) β+HC.
5.6.5 Dihedral angle preferences
For all three proteins, the Ramachandran plots averaged over all residues (Fig-
ure 5.8 A-C) show that PPII structure is the most common, followed by α-helix
and lastly β structure. β+HC exhibits more β structure than the other two
proteins, and βS exhibits the most PPII structure. For αS, in particular, there
is also a small probability of sampling positive φ angles, which is unusual and
does not correspond to any common secondary structural motif. When only the
N-termini of the proteins are considered, the Ramachandran plots are essentially
identical to those for the entire sequence (not shown), most likely because these
regions (residues ∼ 1 − 100, see definitions in caption of Figure 5.8) contain a
large fraction of the entire sequence. The Ramachandran plots for the C-termini,
however, are different to the overall and N-termini plots (Figure 5.8 D-F). There
is a reduction in the α-helical propensity, especially for αS and βS, which results
in an increase in β structure for αS, and an increase in PPII for the other two
proteins, particularly for βS. This is in agreement with the prediction based on
the experimental data and the DC maps that the C-termini of all three proteins
have a lower α-helical propensity and an increased propensity to form PPII and
β-structure. Thus, despite being averaged over many different types of residue,
the Ramachandran plots show that there are differences in the dihedral angle
distributions between the N- and C-termini, which must be indirectly encoded in
the PRE distance restraints. The overall propensity towards helical structure,
whether α-helical or PPII, may, however, be due to the SASA implicit solvent
model, which is known to favour helical structures.
Figure 5.8: Ramachandran plots showing the dihedral angle distributions p(φ, ψ)
for (A,D) αS, (B,E) βS and (C,F) β+HC. In (A-C) the probability of each
combination of φ and ψ dihedral angles is the average over all residues and
all structures whereas for (D-F) only the C-termini (residues 103− 140 for αS,
98 − 134 for βS and 103 − 145 for β+HC) were considered. The same scale is
used for all plots to facilitate comparisons.
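A sketch of how the distributions p(φ, ψ) and the populations of the main backbone regions might be tallied is given below. The bin width and, in particular, the rectangular region boundaries are hypothetical choices made for illustration; they are not the definitions used for Figure 5.8.

```python
import numpy as np


def ramachandran_density(phi, psi, bin_width=10.0):
    """Normalised 2D histogram p(phi, psi) over [-180, 180] degrees."""
    edges = np.arange(-180.0, 180.0 + bin_width, bin_width)
    counts, _, _ = np.histogram2d(phi, psi, bins=[edges, edges])
    return counts / counts.sum()


def classify_dihedrals(phi, psi):
    """Assign each (phi, psi) pair to 'alpha', 'beta', 'PPII' or 'other'
    using hypothetical rectangular regions (angles in degrees)."""
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    upper = (psi >= 50.0) | (psi < -120.0)  # extended-type psi values
    labels = np.full(phi.shape, "other", dtype=object)
    labels[(phi >= -160.0) & (phi < -20.0) & ~upper] = "alpha"
    labels[(phi < -90.0) & upper] = "beta"
    labels[(phi >= -90.0) & (phi < -20.0) & upper] = "PPII"
    return labels
```

Applying `classify_dihedrals` separately to the angles of the N- and C-terminal residues gives region populations that can be compared in the same way as the plots above; positive φ angles fall into the 'other' category with these boundaries.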
5.6.6 Comparison with experimental data
The most stringent test of how well PRE-ERMD reproduces the ‘true’ ensemble
of structures is a quantitative comparison with experimental data. The agree-
ment with the free PRE distances is almost as good as the satisfaction of the
restraints (Tables 5.2, 5.3 and 5.4), indicating that the ensemble-averaged long-
range structure is in reasonable agreement with that observed experimentally.
Little other quantitative data is available for β+HC, but the 3JHNHα-couplings
and RDCs for αS and βS were obtained from C. Bertoncini158,240 for comparison
with the back-calculated values.
The calculated 3JHNHα-couplings for αS and βS are around 5 Hz through-
out the sequence, the upper boundary of the range of values expected for helical
structure (Figure 5.9 A and B). The difference between the N- and C-termini ob-
served in the Ramachandran plots is not evident. Moreover, the agreement with
the experimental 3JHNHα-couplings is poor. The experimental couplings are in
general larger than the calculated couplings and lie in the range expected for
random coil structure. There is also more fluctuation in the measured 3JHNHα-
couplings along the sequence, suggestive of local residue-specific conformational
preferences that are not reproduced in the calculated ensembles. This is not
surprising given that inter-residue distance restraints only contain local con-
formational information when they involve residues close together in sequence,
but the amide peaks of residues proximal to the spin-label are often broad-
ened beyond detection in PRE-NMR experiments, meaning that no distance
restraint can be obtained. The similarity between the 3JHNHα-couplings of the
PRE-restrained αS ensemble and those of the unrestrained ensemble (αSASA)
analysed in Chapter 3 provides further evidence that PRE distance restraints
are not sufficient to alter the dihedral angle preferences encoded in the force-field
and implicit solvent model. The simplest way to improve the 3JHNHα-couplings
of the calculated ensembles is to restrain them directly. This is not possible
using the current PRE-ERMD methodology, however, due to the absence of Hα
atoms in the CHARMM19 representation, and the inability of the implicit solvent
models parameterised for use with all-atom representations, such as GB/SA, to
produce converged ensembles of sufficiently expanded structures.
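Although the couplings cannot be restrained directly, they can be back-calculated from the backbone φ angles with a Karplus relationship. The sketch below uses the coefficients of Vuister and Bax as a stand-in, since the parameter set used for the calculated values is not stated in this section; the function names and array layout are likewise hypothetical.

```python
import numpy as np

# Karplus coefficients in Hz (Vuister and Bax parameterisation, assumed here)
A, B, C = 6.51, -1.76, 1.60


def j_hnha(phi_deg):
    """3J(HN,Hα) coupling in Hz for backbone phi angle(s) in degrees."""
    theta = np.radians(np.asarray(phi_deg, dtype=float) - 60.0)
    return A * np.cos(theta) ** 2 + B * np.cos(theta) + C


def ensemble_average_j(phi_deg):
    """phi_deg: (n_structures, n_residues) array of phi angles.
    Returns the ensemble-averaged coupling for each residue."""
    return j_hnha(phi_deg).mean(axis=0)
```

With these coefficients a φ angle of −60° gives approximately 4.1 Hz and a φ angle of −120° gives approximately 9.9 Hz, so an ensemble dominated by helical φ angles yields couplings at the low end of the range, consistent with the calculated values lying below the experimental ones.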
Comparison of the calculated and experimental RDCs was expected to prove
particularly interesting, as deviations of the experimental RDCs from a uniform
distribution in the C-termini of αS and βS have been interpreted as suggesting
the presence of specific structural preferences in the C-termini that are different
for αS and βS158 or as merely the product of preferential alignment due to the
extended nature of this region159. The RDCs for αS in a variety of media all
exhibit two regions in the C-terminus with RDCs of greater magnitude than
the remainder of the sequence, separated by near-zero RDCs around residue
122159,240. The location of the break in the RDC pattern is especially intriguing
in light of the localisation of the lowest DC values around residue 120.
Other experimental data for the lipid-bound state (in which the C-terminus re-
mains disordered) are also consistent with some sort of structural perturbation
in this region. The paramagnetic broadening induced by an aqueous spin-label
indicates that any residual structure in the C-terminus might be divided into
two segments, one on either side of position 122117. Furthermore, the Cα ∆δ in
this region114, although largely indicative of random coil, show two regions of
similar shifts on either side of position 122, and the dynamics data (R1, R2 and
nOes) for the lipid-bound state117 and, in one case, the free state402 suggest a
slightly lower mobility at position 122 than on either side. For the free state,
at least, this lower mobility may be due to the residual interactions with the
N-terminus.
The RDCs back-calculated from the αS ensemble produced here, however,
do not exhibit the distinct peaks in the C-terminus (Figure 5.9 C). For the re-
mainder of the sequence, the magnitude of the calculated RDCs, but not the
residue-specific pattern, is similar to that of the experimental RDCs measured
in Pf1 bacteriophage alignment media, other than around residue 60, where they
are more like those measured in C8E5/octanol. A similar situation occurs for
the RDCs calculated from the βS ensemble: the magnitude of the calculated
and experimental RDCs in the N-terminus are similar but the calculated RDCs
do not correspond to the experimental RDCs recorded for the C-terminus (Fig-
ure 5.9 D). Insufficient averaging, as discussed in Chapter 3, is unlikely to be
the explanation for the discrepancies, as the number of structures used, 57 600,
is enough for the calculated RDCs to have converged. Additionally, the effect
of increasing the number of structures is to reduce the amount of variation in
the RDCs along the sequence, whereas the opposite would be required to repro-
duce the experimental data. Despite the inability of the calculated ensembles
to reproduce precisely the local structure that perturbs the experimental data
from the random coil expectations, the generally larger RDCs in the C-termini
of both proteins are in keeping with the extended nature of this region inferred
from analysis of the DC maps, supporting the conclusion of Sung and Eliezer
that this may be the major factor contributing to the RDCs observed for αS
and βS159.
[Figure 5.9 plots: panels A and B show 3JHNHα (Hz) against residue number; panels C and D show RDC (Hz) against residue number.]
Figure 5.9: (A,B) 3JHNHα-couplings and (C,D) RDCs for (A,C) αS and (B,D)
βS. The 3JHNHα-couplings and RDCs back-calculated from the PRE-ERMD
ensembles are in black and the experimental data are in red. In (C) and (D),
the RDCs obtained in C8E5/octanol are in red and those measured in Pf1
bacteriophage are in green (C only). The grey lines at 0 Hz in (C) and (D) are
to guide the eye.
5.6.7 Free energy maps
A more global perspective on the nature of the structures sampled by each
of the three proteins can be gained by examining the free energy landscapes
(Figure 5.10). The greatest differences between the three proteins occur for
F (Rg,SASA), with βS exhibiting the narrowest range of SASA and β+HC the
widest. Interestingly, this pattern reflects the relationship between the Cf of
the three proteins (Table 5.5), so that a low Cf corresponds to a narrow range
of SASA. In all cases, the structures with the lowest Rg encompass a wide
range of SASA; similarly, there is a large range of Rg corresponding to the
largest SASA. Thus having a small Rg poses few restrictions on the fraction
of the surface area that is exposed. This may facilitate the role of αS as a
hub protein65, as a larger surface area allows for a diverse range of binding
partners59. The greater similarity between the F (Rg, SASA) landscapes of αS
and β+HC suggests that the insertion of the central NAC region into β+HC
causes it to behave more like αS in this respect.
The three proteins are not so easily distinguished in terms of F (Rg, REE).
Analysis of the relationship between Rg and REE by linear regression (data not
shown) indicates that for Rg up to ∼ 35−40 Å the corresponding REE is lower
than is expected for a purely random coil15, whereas above this it is higher. The
shorter than expected REE of the more compact structures is in keeping with
the tendency towards contact formation between the N- and C-termini noted
previously in the DC maps and experimental data266.
Figure 5.10: Free energy landscapes of the (A,D) αS, (B,E) βS and (C,F)
β+HC ensembles. The free energy is defined as (A-C) F (Rg, SASA) =
− ln p(Rg,SASA) and (D-F) F (Rg, REE) = − ln p(Rg, REE), where REE is the
end-to-end distance.
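The definition F = − ln p in the caption above can be sketched as a two-dimensional histogram over any pair of per-structure quantities, such as Rg and SASA. The function name and the number of bins are hypothetical, and the resulting free energies are dimensionless, as in the definition.

```python
import numpy as np


def free_energy_map(x, y, bins=50):
    """Return F = -ln p(x, y) on a 2D grid, together with the bin edges.
    Bins that are never visited have p = 0 and are assigned F = +inf."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    p = counts / counts.sum()
    with np.errstate(divide="ignore"):
        free_energy = -np.log(p)
    return free_energy, x_edges, y_edges
```

Passing the per-structure Rg and SASA values gives a landscape of the F(Rg, SASA) type, and passing Rg and REE gives one of the F(Rg, REE) type. Subtracting the minimum finite value is a common convention that places the most populated bin at zero.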
5.7 Implications for aggregation
The construction and study of β+HC was initiated with the aim of understand-
ing why the predicted aggregation propensity and measured aggregation rate of
βS are lower than those of αS145. Fibril formation by βS requires the presence
of metals146 or sub-critical micelle concentrations of SDS (0.5 mM), conditions
which also increase the aggregation rate of αS403,404. The PRE distances were
collected without metals or SDS present, however, thus the ensembles calcu-
lated here can only provide insight into the aggregation properties of the three
proteins in the absence of these components. This is unlikely to be critical in
the case of βS, at least with respect to SDS, as the ⟨R_h^{-1}⟩^{-1} is the same in
the presence and absence of 0.5 mM SDS145,158 (Table 5.5) and the intensity
ratios measured in SDS with a spin-label at position 42 are not significantly
different to those obtained in solution145. The induction of βS aggregation by
SDS is therefore most likely due to increases in the local protein concentration
rather than any induced structural changes. αS, in comparison, becomes con-
siderably more compact in 0.5 mM SDS (Table 5.5), thus structural changes
cannot be ruled out. The ensemble calculated here, however, remains relevant
for understanding the initiation of fibril formation, as αS does not require SDS to
stimulate aggregation. The induction of aggregation by metals for both proteins
is thought to be due to neutralisation of the negatively-charged C-termini159,
which is discussed further below. For β+HC, interpretation of the results ob-
tained here is more complicated: the Rh was only determined in the presence of
SDS, whereas the PRE-NMR measurements were made without SDS present. If,
like αS, β+HC collapses in the presence of SDS, the ensemble created here may
be too compact. However, based on the comparison of the previously published
and re-calculated αS ensembles, the greatest change to the residual structure is
likely to be an overall decrease in the DC values rather than alteration of the
specific structural propensities.
It is generally accepted that the cause of the different aggregation propensi-
ties of αS and βS is the absence of 11 residues (73−83) from the NAC region of
βS147,155. Contrary to the original expectations, β+HC, which contains residues
73−83 of αS within the βS sequence following residue 72, was found to have sim-
ilar aggregation properties to βS145. Further investigations, including analysis
of the aggregation properties of two deletion mutants, α∆73-83 and α∆71-82,
showed that the most likely reason for the similar aggregation behaviour of βS
and β+HC is the inclusion of E83 in the β+HC construct145. This negatively
charged residue is thought to disrupt the inter-molecular interactions of the
hydrophobic core and may therefore act as an aggregation ‘gatekeeper’155,156.
Additionally, the incorporation of charged residues into the hydrophobic core of
full-length αS decreases the rate of fibril formation155,156. This suggests that the
lower experimental and theoretical aggregation propensities of βS and β+HC,
both of which have a greater net charge than αS, may be due to inter-molecular
repulsion between charged residues.
The role of charge in preventing aggregation is not confined to the inter-
molecular interactions. Whilst any contacts made by the C-terminus with the
NAC region are thought to be hydrophobic in nature, interactions with the
N-terminus are most likely electrostatic. The increased negative charge of the
C-termini of βS and β+HC may therefore enhance these intramolecular electro-
static interactions. Indeed, comparison of the DC maps shows that the scaled
distances between the N- and C-termini of βS and β+HC are shorter than those
of αS (Figure 5.6 C-E). Additionally, the predicted aggregation propensity of the
C-termini of βS and β+HC is even lower than that of αS (Figure 5.7 D-F). The
importance of electrostatic interactions between the N- and C-termini in deter-
mining the aggregation properties is supported by most experimental data, other
than one recent study which failed to find any evidence for perturbation of in-
tramolecular interactions by polycation binding to αS159. In agreement with the
conjectures presented here, C-terminal truncation mutants of αS only aggregate
faster than wild-type151 if the truncation removes the majority of the charged
residues from the C-terminus. Additionally, the binding of positively-charged
polyamines such as spermine to the C-terminus increases the aggregation rates
of βS in SDS and αS without SDS145,151,154,405,406. Neutralisation of the excess
negative charge of αS at low pH also increases the aggregation rate407. Thus
features apparent in the PRE-ERMD ensembles correlate with the experimental
data and provide further support for the suggestion that charge plays a key role
in controlling the aggregation propensities of the synucleins.
5.8 Conclusions
The use of PRE-ERMD augments the information available from experimen-
tal data by providing atomic-level structural detail. DC maps were developed
to characterise the structures produced using PRE-ERMD by comparing the
ensemble-averaged rms CαCα distances to the expected inter-residue distances
for a random coil. Analysis of these maps shows that the distances between the
N- and C-termini of all three proteins are shorter than is expected for a purely
random coil, indicative of interactions between the two regions that may be elec-
trostatic in nature266. Both the DC maps and the Ramachandran plots reveal a
tendency towards α-helical propensity in the N-terminus of all three proteins, in
keeping with the experimental data and suggesting that the lipid-bound struc-
ture of αS and βS is encoded in their solution-state conformational preferences.
The C-termini of βS and, to a lesser extent, β+HC have a tendency to form
PPII structure, whereas the C-terminus of αS is more disordered. Whilst such
qualitative features agree well with the available experimental data, quantita-
tive assessment of the agreement with the experimental 3JHNHα-couplings shows
it to be poor, suggesting that although the gross tertiary structure implied by
the PRE distances is reproduced by the calculated ensembles, the addition of
PRE distance restraints is not sufficient to affect the description of the local
structure provided by the force-field and implicit solvent, which, in this case, is
not compatible with that observed experimentally. The back-calculated RDCs
also failed to exhibit all of the features present in the experimental data. In-
terestingly, however, the larger RDCs calculated for the C-termini of αS and
βS are in keeping with an interpretation of the experimental data in which it
is the more extended nature of the C-termini that causes the increased RDCs
measured for this region rather than residual contact formation159.
The main structural effect of inserting the hydrophobic core of αS into βS is
an extension of the N-terminal helical propensity to include the inserted residues.
This appears to weaken the PPII propensity of the C-terminus, making β+HC
more like αS in this respect and suggesting that the structure of the C-terminus
is affected by the remainder of the sequence. Other than this, the resemblance
between the structural propensities of β+HC and βS echoes their similar ag-
gregation propensities. The main difference between these two proteins and
αS likely to be related to aggregation is the greater number of inter-residue
distances between the N- and C-termini that are shorter than expected for a
random coil. As interactions between the N- and C-termini are expected to be
electrostatic in nature, this strengthens the case for charge playing a key role in
determining the aggregation properties of these polypeptides.
Chapter 6
Characterisation of the acid-denatured state of PI3-SH3
6.1 Introduction
In Chapter 5, the generalised PRE-ERMD method developed in Chapter 4 was
used to generate ensembles of structures for the IDPs αS and βS and a related
construct, β+HC, in order to rationalise their relative aggregation propensities.
For these proteins, there is no folded structure to refer to, and aggregation can
proceed directly from the disordered NS. In contrast, it is generally agreed that
NFPs must unfold prior to aggregation3,16–20. Characterising the unfolded and
partially folded states of NFPs may therefore shed light on how both folding to
the NS and mis-folding and aggregation into various oligomeric species are
initiated. Understanding how the balance between these two processes is controlled
is of critical importance given that protein aggregation is involved in an increas-
ingly large number of diseases3. The atomic-level structural detail provided by
simulation may help to elucidate the mechanism of aggregation at the molecular
level.
This chapter describes the application of the general PRE-ERMD method
to the acid-denatured state of the PI3-SH3 domain (SH3-AS), which is known
to be the precursor to amyloid fibril formation47–51. Prior to carrying out the
PRE-ERMD, the possibility of explaining the experimental PRE-NMR data for
SH3-AS in terms of various combinations of native and random coil structure
is explored. The experimental data is then treated according to the principles
outlined in Chapters 4 and 5 and an ensemble of structures representative of
SH3-AS is generated (SH3-PRE). To complement this ensemble, a coil library
ensemble, comprising structures generated using a self-avoiding statistical coil
model based on backbone conformational preferences from coil regions of pro-
teins in the PDB265, was obtained from A. Jha (SH3-CLIB). The global dimen-
sions and residual structure of these ensembles are probed using the methods
developed in the previous chapters of this thesis, with comparison to the native
fold at neutral pH (SH3-NS, PDB code 1pnj, Figure 6.2 A)41 and a random
coil model (SH3-RC) obtained as described in Chapter 5. The TS ensembles
of three related SH3 domains are also included in the analysis to assist the in-
terpretation of the DC maps for SH3-PRE. A quantitative assessment of the
agreement with the experimental RDCs is made and the free energy maps are
examined. Finally, the implications of the structural propensities identified in
this study for the aggregation of PI3-SH3 are discussed.
6.2 Experimental PRE-NMR data implies non-native structure
An issue that is widely debated in the context of the presence of residual struc-
ture in DS is whether any non-random structure implied by the experimental
observables is due to a small fraction of highly structured molecules amongst a
largely unstructured ensemble, or results from an ensemble of partially-structured
polypeptides170,176,199,202,204,205,222,224,232,233,236–244,255,257–259,408,409. In the case
of NFPs, the native fold provides a potential candidate for the structured state
in the first scenario. Indeed, experimental data for DS of several NFPs sug-
gests the presence of native-like residual structure170,222,236,238,244,408,409. To
test whether the observed Iox/Ired for SH3-AS can be explained by the presence
of a small number of natively folded molecules amid an essentially random coil
ensemble, the expected Iox/Ired were computed for SH3-NS and SH3-RC and
combined in varying proportions.
Overall, the Iox/Ired measured for SH3-AS are mostly lower than those pre-
dicted for SH3-RC (Figure 6.1), reflecting the high degree of compaction of
SH3-AS (see Table 6.2 and Section 6.4). The large variation in the Iox/Ired pre-
dicted for SH3-NS is due to the well-defined tertiary structure, which results in
some residues that are distant in sequence being located close to the spin-label.
Although there are some areas where the experimental Iox/Ired for SH3-AS are
similar to those predicted for SH3-NS, such as around residues 60− 70 for L13
and L42, and residues 80− 86 for L13, L26 and L42, there is little resemblance
between the Iox/Ired of SH3-NS and those of SH3-AS overall.
The Iox/Ired expected if 1, 10 or 50% of the ensemble exists in the native
fold and the remainder is purely random coil also bear little similarity with the
experimental data for SH3-AS. It is obvious from the systematic behaviour of
the composite Iox/Ired as the contribution of the SH3-NS Iox/Ired is increased
that combinations of the SH3-NS and SH3-RC Iox/Ired other than the fractions
considered here would also fail to explain the experimental data. It seems
unlikely, therefore, that the Iox/Ired observed experimentally for SH3-AS are
due to the presence of a sub-population of natively folded protein molecules
within a random coil ensemble. Whilst it remains unclear whether the observed
Iox/Ired arise from a small fraction of the proteins exhibiting structure other
than the native fold or an ensemble of partially-structured molecules, it seems
certain that any structure that does occur is non-native in nature, and the low
Iox/Ired suggests that short inter-residue distances are relatively frequent.
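
The combination of the SH3-NS and SH3-RC profiles in varying proportions can be sketched as follows. This is an illustration only: it assumes that the composite Iox/Ired at each residue is the population-weighted linear average of the two profiles (the exact procedure is not stated in this section), and the function name and numerical values are hypothetical.

```python
def mix_intensity_ratios(native, random_coil, native_fraction):
    """Population-weighted average of two per-residue Iox/Ired profiles.

    Assumption: the composite profile is a linear combination of the
    native (SH3-NS) and random coil (SH3-RC) profiles.
    """
    if not 0.0 <= native_fraction <= 1.0:
        raise ValueError("native_fraction must lie between 0 and 1")
    if len(native) != len(random_coil):
        raise ValueError("profiles must cover the same residues")
    return [
        native_fraction * ns + (1.0 - native_fraction) * rc
        for ns, rc in zip(native, random_coil)
    ]


# Hypothetical Iox/Ired values for three residues (not experimental data).
native_profile = [0.10, 0.90, 0.20]
coil_profile = [0.80, 0.85, 0.90]

# The three native fractions considered in the text: 1, 10 and 50%.
composites = {
    fraction: mix_intensity_ratios(native_profile, coil_profile, fraction)
    for fraction in (0.01, 0.10, 0.50)
}
```

Under this assumption each composite value lies between the SH3-NS and SH3-RC values for that residue and moves monotonically towards the SH3-NS value as the native fraction increases, which is the systematic behaviour referred to above.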
[Plot: ten panels, one per spin-label position (M3, S4, L13, L26, L42, S45, E54, E63, G80 and P86), each showing Iox/Ired (0 to 1) against Residue Number (0 to 80).]
Figure 6.1: The distribution of Iox/Ired along the sequence for each spin-label
position as indicated. The experimental data for SH3-AS is shown as black bars
and the thick lines correspond to the Iox/Ired calculated from SH3-RC (green)
and SH3-NS (red). The thin red lines correspond to 1% SH3-NS, 99% SH3-RC
(solid), 10% SH3-NS, 90% SH3-RC (dashed) and 50% SH3-NS, 50% SH3-RC
(dotted). The experimental Iox/Ired shown for SH3-AS are those processed for
use in the simulations (see Section 6.3), thus any Iox/Ired < 0.15 or > 0.85 have
been set to 0.15 and 0.85, respectively. If no bar is present, then either Iox/Ired
was not measured for this residue or it was discarded due to error > 10%.
6.3 Choice of optimal T for characterisation by
PRE-ERMD
PRE-NMR was carried out on SH3-AS with the MTSL spin-label attached in-
dependently in 10 different positions distributed throughout the sequence (Sec-
tion 2.2). The experimental data were treated in the manner introduced in
Chapter 5. Any Iox/Ired with greater than 10% error were discarded. For
the distances calculated from the remaining 639 Iox/Ired, those calculated from
Iox/Ired < 0.15 and Iox/Ired > 0.85 were assigned only an upper or lower bound,
respectively. The working dataset comprised 80% of the PRE distances, with
the remaining 20% used for independent cross-validation.
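
The treatment of the Iox/Ired described above might be sketched as follows. The record layout, the function names and the random assignment of restraints to the working and free datasets are assumptions made for illustration; only the 10% error cut-off, the 0.15 and 0.85 limits and the 80/20 split are taken from the text.

```python
import random


def process_pre_data(records, max_rel_error=0.10, low=0.15, high=0.85):
    """Filter and clamp Iox/Ired records.

    Each record is (spin_label, residue, ratio, relative_error).
    Returns (spin_label, residue, ratio, bound_type) tuples, where
    bound_type states which distance bounds are applied.
    """
    processed = []
    for spin_label, residue, ratio, rel_error in records:
        if rel_error > max_rel_error:
            continue  # discarded: error greater than 10%
        if ratio < low:
            # strong PRE: ratio set to 0.15, distance gets an upper bound only
            processed.append((spin_label, residue, low, "upper"))
        elif ratio > high:
            # weak PRE: ratio set to 0.85, distance gets a lower bound only
            processed.append((spin_label, residue, high, "lower"))
        else:
            processed.append((spin_label, residue, ratio, "both"))
    return processed


def split_working_free(restraints, working_fraction=0.8, seed=0):
    """Split restraints into working and free sets (random split assumed)."""
    shuffled = list(restraints)
    random.Random(seed).shuffle(shuffled)
    n_working = round(working_fraction * len(shuffled))
    return shuffled[:n_working], shuffled[n_working:]


# Hypothetical records, not experimental data.
example = [("M3", 5, 0.05, 0.02), ("M3", 6, 0.95, 0.03),
           ("M3", 7, 0.50, 0.20), ("M3", 8, 0.50, 0.05)]
restraints = process_pre_data(example)
```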
The PRE-ERMD simulations were run using the general method developed
in Chapter 4. QRh was used to assess how well the global size of the molecules
is reproduced and as the primary determinant of the optimal T. A relationship
between $R_g^{-1}$ and $R_h^{-1}$,
$$R_h^{-1} = 0.0227 + 0.405\,R_g^{-1}, \qquad (6.1)$$
derived by K. Lindorff-Larsen in the manner described in Chapter 5, was used
to convert the Rg of each structure in the ensembles generated by PRE-ERMD
into an Rh. The $\langle R_h^{-1} \rangle^{-1}$ was then computed according to equation 2.9.
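
Equation 6.1 and the subsequent averaging can be illustrated with a short sketch. It assumes that Rg is expressed in Å and that the ensemble value is the inverse of the ensemble-averaged inverse Rh, as the notation suggests; equation 2.9 is not reproduced in this chapter, and the function names and Rg values are hypothetical.

```python
def rg_to_rh(rg):
    """Convert one Rg (Å) into an Rh (Å) using equation 6.1."""
    inverse_rh = 0.0227 + 0.405 / rg
    return 1.0 / inverse_rh


def ensemble_rh(rg_values):
    """Inverse of the mean inverse Rh over an ensemble of structures."""
    inverse_rh = [1.0 / rg_to_rh(rg) for rg in rg_values]
    return len(inverse_rh) / sum(inverse_rh)


# Hypothetical Rg values (Å) for a small ensemble.
example_rg = [15.0, 20.0, 30.0]
example_rh = ensemble_rh(example_rg)
```

Because the average is taken over inverse radii, compact structures contribute more than expanded ones, so the result is never larger than the arithmetic mean of the individual Rh.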
In a similar manner to the results recorded in Chapter 5 for αS, βS and
β+HC, the $\langle R_h^{-1} \rangle^{-1}$ increases with T, thus QRh decreases until the
calculated $\langle R_h^{-1} \rangle^{-1}$ matches the experimental $\langle R_h^{-1} \rangle^{-1}$ and then increases again
(Table 6.1). It was therefore straightforward to locate the optimal T of 445 K.
An additional 57 600 structures were collected at this T and subjected
to further analysis.
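
Locating the optimal T amounts to finding the minimum of QRh over the temperatures sampled. A minimal sketch using the values listed in Table 6.1 (the variable and function names are illustrative only):

```python
# (T in K, <Rh^-1>^-1 in Å, QRh) as listed in Table 6.1.
TABLE_6_1 = [
    (400, 19.5, 0.078), (425, 20.3, 0.040), (445, 21.1, 0.004),
    (450, 21.5, 0.013), (475, 22.2, 0.047), (500, 22.9, 0.078),
    (525, 23.2, 0.095), (550, 23.5, 0.108), (575, 23.7, 0.118),
    (600, 23.8, 0.124), (625, 24.0, 0.133), (650, 24.1, 0.135),
    (675, 24.2, 0.143), (700, 25.1, 0.184),
]


def optimal_temperature(rows):
    """Return the (T, Rh, QRh) row with the smallest QRh."""
    return min(rows, key=lambda row: row[2])


best_t, best_rh, best_q = optimal_temperature(TABLE_6_1)
```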
6.4 Global dimensions
The $\langle R_h^{-1} \rangle^{-1}$ of the final ensemble generated using PRE-ERMD (SH3-PRE)
is in good agreement with the experimental value (Table 6.1), as is expected
given that reproduction of the experimental $\langle R_h^{-1} \rangle^{-1}$ is a fundamental criterion
in the choice of the optimal simulation conditions. The previously published
Rh (∼24.3 Å)48 is larger than that used here (21.2 Å) because the construct
used in that study included an additional 4 amino acids at the C-terminus.
Interestingly, SH3-AS is almost as compact as the folded SH3-NS structure
present at neutral pH (Table 6.2). Neither of these states is as compact as is
predicted for a NFP of this size. For SH3-NS, this may result from the long
RT loop extending outwards from the remainder of the protein, which is more
structured. In contrast to SH3-PRE, the $\langle R_h^{-1} \rangle^{-1}$ of SH3-CLIB (26.8 Å) is
more like that expected for a random coil, indicating that the compactness of
SH3-AS cannot be explained by dihedral angle preferences alone.
Table 6.1: The Q values quantify how well the experimental $\langle R_h^{-1} \rangle^{-1}$ (21.2 Å)
and the PRE distances for PI3-SH3 are reproduced by varying T with Nrep = 24,
L = 1 and U = 8. QwPRE refers to the working dataset (80% of the PRE
distances) and QfPRE to the free dataset (remaining 20%). The results for the
most representative ensemble collected at the optimal T are marked with an asterisk.
T (K)   $\langle R_h^{-1} \rangle^{-1}$ (Å)   QRh     Qw     Qf
400     19.5                                  0.078   0.18   0.22
425     20.3                                  0.040   0.18   0.21
445*    21.1                                  0.004   0.19   0.20
450     21.5                                  0.013   0.19   0.21
475     22.2                                  0.047   0.20   0.21
500     22.9                                  0.078   0.21   0.20
525     23.2                                  0.095   0.22   0.21
550     23.5                                  0.108   0.22   0.21
575     23.7                                  0.118   0.23   0.21
600     23.8                                  0.124   0.23   0.22
625     24.0                                  0.133   0.23   0.22
650     24.1                                  0.135   0.24   0.22
675     24.2                                  0.143   0.24   0.22
700     25.1                                  0.184   0.29   0.27
The Rg distributions of SH3-PRE and SH3-CLIB were compared with that
of SH3-RC. The Rg distribution of SH3-CLIB is very similar to that of SH3-RC
(Figure 6.2 B). Although SH3-PRE encompasses a range of Rg, the distribution
is not as wide as that of SH3-RC. The difference between the Rg distribu-
tions of SH3-PRE and SH3-RC is more noticeable than for αS, βS and β+HC
(Figure 5.5), reflecting the greater relative compactness of SH3-AS (Tables 5.5
and 6.2).
Table 6.2: Predicted¹ and experimental² Rh (in Å) and compaction factors, Cf³,
for PI3-SH3 in various states.
      U      F      pH 7.4   pH 2.0   3.5 M GndHCl
Rh    28.0   17.3   19.5     21.2     28.0
Cf    -      -      0.793    0.634    0.000
1. U and F refer to the Rh predicted for an unfolded or folded polypeptide according to
equations 2.23 and 2.24¹⁶¹, respectively.
2. Measured by PFG-NMR on 0.5−1.0 mM PI3-SH3 in D2O adjusted to pH 7.4 with ²HCl at
293 K⁴⁸; 100 µM PI3-SH3, 1.25 µM DSS, 10 mM HCl in 10% D2O at pH 2.0 and 298 K⁴¹⁰;
and 0.5−1.0 mM PI3-SH3 in D2O with 3.5 M GndHCl at pH 7.4 and 293 K⁴⁸.
3. Calculated according to equation 2.25¹⁶¹.
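
The Cf values in Table 6.2 can be reproduced approximately from the tabulated Rh. The sketch below assumes that the compaction factor has the form Cf = (Rh(U) − Rh)/(Rh(U) − Rh(F)); equation 2.25 is not reproduced in this chapter, so this form is an assumption, adopted because it is consistent with the tabulated values. The function and variable names are illustrative.

```python
def compaction_factor(rh, rh_unfolded, rh_folded):
    """Assumed form: 1 for a fully folded and 0 for a fully unfolded chain."""
    return (rh_unfolded - rh) / (rh_unfolded - rh_folded)


# Rh values (Å) from Table 6.2.
RH_UNFOLDED, RH_FOLDED = 28.0, 17.3
measured_rh = {"pH 7.4": 19.5, "pH 2.0": 21.2, "3.5 M GndHCl": 28.0}

cf = {
    state: compaction_factor(rh, RH_UNFOLDED, RH_FOLDED)
    for state, rh in measured_rh.items()
}
```

With the rounded Rh of Table 6.2 this gives approximately 0.794, 0.636 and 0 for pH 7.4, pH 2.0 and 3.5 M GndHCl, close to the tabulated 0.793, 0.634 and 0.000; the small differences presumably reflect rounding of the Rh.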
Figure 6.2: (A) The native fold of PI3-SH3 determined by NMR (PDB code
1pnj)41, showing the RT loop across the top of the β-barrel and the n-Src loop
on the left-hand side in green. The colour ranges from red at the N-terminus
to blue at the C-terminus. The view shown here was prepared with vmd375.
(B) Rg probability distribution for SH3-RC (black), SH3-PRE (red) and SH3-
CLIB (green). The Rg distributions are plotted rather than the Rh distributions
because the former are faster to calculate, but the Rh distributions are similar.
6.5 Residual structure
To characterise the residual structure of SH3-AS, the same analysis methods
were used as for the IDPs αS, βS and β+HC (Chapter 5), namely DC maps, Ra-
machandran plots, free energy maps and both qualitative and quantitative com-
parisons with experimental NMR data. Recently, a large amount of NMR data
describing the structure and dynamics of SH3-AS at both 298⁴¹⁰ and 308 K⁴¹¹
has become available. The R1 and R2 relaxation rates and the PRE-NMR data
are similar at both temperatures, whereas the secondary structure propensities
derived from the ∆δ are quite different. The data obtained at 298 K is most rel-
evant for comparisons with the calculated ensemble, as this is the temperature
at which the PRE-NMR experiments were carried out. In addition to the helical
content and aggregation propensity, the SASA and hydrophobicity throughout
the sequence were also calculated. In comparison to the IDPs, for which the
structural propensities were mostly inferred with respect to the random coil
model, the existence of a native fold for PI3-SH3 provided a further reference
state.
6.5.1 Comparison of the native and acid-denatured states
DC maps were produced for SH3-NS41, SH3-PRE and SH3-CLIB (Figure 6.3).
SH3-NS encompasses a wider range of DC values than SH3-PRE and SH3-
CLIB. This is an artifact of generating a DC map from a single structure, which
means that no averaging takes place. The distinct areas of low DC values
perpendicular to the main diagonal in the SH3-NS DC map correspond to anti-
parallel contacts between the β-strands forming the β-barrel (Figure 6.2 A).
Low DC values in other regions are due to contacts between parts of the RT
and n-Src loops and the β-strands. The DC map for SH3-PRE is more diffuse,
as is expected given that the 1H-15N HSQC spectrum for SH3-AS, to which the
restraints pertain, suggests that it is largely unfolded410,411, despite being very
compact (Table 6.2). The DC map for SH3-CLIB implies an even greater degree
of unfolding however; the DC values are greater than 1.0 throughout most of the
sequence, indicating that the rms inter-residue distances are even larger than in
the random coil model used to compute the DC values. It is not surprising that
the rms distances of SH3-CLIB are longer than those of SH3-PRE, as the latter
ensemble is much more compact. Other than rms distances slightly shorter
than in a random coil in the C-terminus, the DC map of SH3-CLIB has few
distinguishing features, indicating that implementing residue-specific dihedral
angle preferences is not sufficient to induce global order. Considering the
dihedral angle preferences of SH3-CLIB displayed in the Ramachandran plot
(Figure 6.4 C) along with the DC map, it appears that the structures comprising
SH3-CLIB are predominantly composed of PPII structure, consistent with the
highly expanded nature of this ensemble.
In comparing the DC map for SH3-PRE with that of SH3-NS, it is not
possible to distinguish whether the lack of the distinctive patterns representative
of the native tertiary structural motifs arises from conformational averaging
camouflaging the presence of a few native-like conformations or is due to residual
structure that is predominantly non-native in nature. Less extreme examples
of the first situation pertinent to the study of PI3-SH3 are provided by the
folding TS ensembles46 of three other SH3 domains, those from c-src (PDB
code 1fmk)412, Fyn (PDB code 1shf)413 and α-spectrin (PDB code 1bk2)414.
These ensembles were shown to be native-like on average despite substantial
local variability. Whilst these three SH3 domains are considerably smaller than
PI3-SH3, which has a longer n-Src loop, comparison of the TS and NS DC maps
provides a useful illustration of the relationship between an ensemble-averaged
DC map and that of a single structure in the case where the ensemble exhibits
native-like structure.
Figure 6.3: DC maps for (A) SH3-NS (PDB code 1pnj)41, (B) SH3-PRE and
(C) SH3-CLIB. The very wide range of distances present in SH3-NS compared
to SH3-PRE and SH3-CLIB means that a different scale is required for the latter
two ensembles.
Figure 6.4: Ramachandran plots showing the dihedral angle distributions p(φ, ψ)
for (A,B) SH3-PRE and (C) SH3-CLIB. In (A,C) the probability of each combination
of φ and ψ dihedral angles is the average over all residues and all
structures whereas for (B) only residues 3-23 are considered. The same scale is
used for all plots to facilitate comparisons.
It is clear from Figure 6.5 that according to the DC maps, all three TS
ensembles bear much more resemblance to the DC maps of the corresponding
NS than SH3-PRE does to SH3-NS. Not only do the TS ensembles encompass
the same range of DC values as the NS, but the structural elements visible in
the NS DC maps are only slightly less distinct in the TS DC maps. It seems
unlikely, therefore, that the residual structure observed in the DC map of SH3-
PRE is native-like, as the DC map would then be expected to be much more
like that of SH3-NS.
Figure 6.5: DC maps for (A-C) the NS and (D-F) the TS ensembles obtained
from K. Lindorff-Larsen46 of the (A,D) α-spectrin (PDB code 1bk2)414, (B,E)
c-src (1fmk)412 and (C,F) Fyn (1shf)413 SH3 domains. Only the DC maps for
the TS ensembles generated at 500 K are shown; the 640 K ensembles are very
similar.
6.5.2 Structural propensities of the acid-denatured state
Having eliminated the possibility that the residual structure of SH3-PRE and,
by analogy, SH3-AS, is native-like, it remains to characterise the nature of the
structures comprising SH3-PRE, including a qualitative comparison with the
experimental data for SH3-AS.
N-terminus
The inter-residue distances for residues 5 − 23 of SH3-PRE are mostly of
similar size to those expected for a random coil, which is consistent with either
α-helical or random coil structure259. This region is predicted to have a high
helical propensity by agadir361–364, particularly at acid pH (Figure 6.6 A).
Additionally, the ∆δ of SH3-AS at 298 K correspond to a propensity for α-
helical structure410, although those measured at 308 K do not411. The negative
RDCs measured for this region have also been interpreted in terms of helical
structure410. Comparison of the Ramachandran plot for residues 3 − 23 of
SH3-PRE with that generated for the entire sequence shows a slight increase
in the population of both the α-helical and PPII regions (Figure 6.4 A and B).
Together, these data justify an interpretation of the N-terminal region of the
SH3-PRE DC map in terms of helical structure.
Experimental data that report on dynamics, including R1 and R2 relaxation
rates and RDCs (Figure 6.7 B), suggest an increased stiffness of residues 3−23 in
SH3-AS at both temperatures410,411. Additionally, the Iox/Ired obtained from
PRE-NMR are consistently high in the N-terminus, particularly for residues
1−7 and 14−23, regardless of the position of the spin-label (Figure 6.1). These
regions may therefore be extended away from the remainder of the protein in
SH3-AS, although the ensemble-averaged SASA of this region is only slightly
higher than for the remainder of the sequence (Figure 6.6 B).
Both the experimental data for SH3-AS and the DC map of SH3-PRE sug-
gest that the N-terminus has a tendency to form α-helical and perhaps also
PPII structure, and is relatively extended compared to the remainder of the
polypeptide chain. These structural tendencies are clearly non-native, as in
SH3-NS, residues 7− 13 form the first β-strand, and residues 13− 30 comprise
the RT loop. Interestingly, however, the bend in the RT loop around residue
21 appears to be retained in SH3-AS, as can be seen by the low DC values of
SH3-PRE between residues 15−20 and 20−25. This region has been implicated
in aggregation (see Section 6.6), despite its hydrophilicity and low aggregation
propensity (Figure 6.6 C and D).
C-terminus
The experimental data for the C-terminus of SH3-AS are less consistent in
terms of the implied structural propensities. The Cα and Hα secondary shifts
measured at 298 K indicate a slight helical propensity for residues 60 − 64410;
correspondingly, residues 61− 68 have a high predicted helical propensity (Fig-
ure 6.6 A). On the other hand, residues 35 − 41 and 72 − 79, which are also
predicted to be helical, exhibit ∆δ more in keeping with extended or β-sheet
structure, as does most of the sequence from residue 23 onwards410.
Of the observables that report on dynamics, the RDCs for residues 23−86
are all positive and vary little in magnitude410 (Figure 6.7 B). The R2 relaxation
rates, however, are slightly larger than average for residues 55−60 and 75−77
at 298 K410 and residues 51−63 and 72−78 at 308 K411. These two regions are
hereafter referred to as ‘Reg1’ and ‘Reg2’. Because the same pattern also occurs
in the R1 relaxation rates and heteronuclear nOes, it probably results from
reduced mobility of Reg1 and Reg2 rather than conformational exchange411.
This restricted motion is retained at 308 K despite the apparent lack of secondary
structural preferences411. Residues within Reg1 and Reg2 exhibit lower
Iox/Ired in the PRE-NMR profiles for spin-labels situated in the N-terminus.
Correspondingly, the DC map of SH3-PRE shows that the distances between
these two regions and the N-terminus are shorter than expected for a purely
random coil (Figure 6.3 B). Thus Reg1 and Reg2, which may be slightly stiffer
due to their restricted motion, preferentially form intramolecular contacts with
the N-terminus. These interactions may be hydrophobic in nature, as the hydrophobicity
profile shows that Reg1 and Reg2 are slightly more hydrophobic
than the surrounding residues (Figure 6.6 C).
[Plots: (A) Helical Content, (B) SASA (Å²), (C) K-D Hydrophobicity and (D) $Z_{\mathrm{agg}}^{\mathrm{prof}}$, each against Residue Number (0 to 80).]
Figure 6.6: (A) Helical propensity predicted using agadir361–364 for PI3-SH3
at pH 6.0 (black) and pH 2.0 (red). (B) SASA of SH3-NS (PDB code 1pnj41,
black) and SH3-PRE (red). (C) The KD hydrophobicity profile365 of PI3-SH3,
smoothed over an 11-residue window. Positive values correspond to hydrophobic
regions. (D) The aggregation propensity profile, $Z_{\mathrm{agg}}^{\mathrm{prof}}$, calculated using the
Zyggregator algorithm366 for PI3-SH3 at pH 6.0 (black) and pH 2.0 (red). $Z_{\mathrm{agg}}^{\mathrm{prof}}$
values greater than 1 indicate regions that are aggregation prone.
As well as isolating regions with reduced mobility, measurement of the R1
and R2 relaxation rates also allows the most flexible sections of SH3-AS to
be identified. The data recorded at both 298 and 308 K show that residues
27 − 47, which act as a bridge between the extended N-terminus and the pre-
dominantly unstructured C-terminus, are more mobile than the remainder of
the protein. The relative orientation of the N- and C-termini is therefore likely
to undergo frequent rearrangements. This explains the large number of residue
pairs involving the N- and C-terminus that have low DC values in SH3-PRE,
as steric considerations make it unlikely that they could all interact simultane-
ously. Higher mobility of this region also corresponds to the finding that the
first site of proteolysis of PI3-SH3 at pH 2.0 at both 295−297 K and 308 K is the
peptide bond between residues 39 and 40⁵². Interestingly, residues in this region
form a short helical turn in the n-Src loop of SH3-NS, and there is some
evidence in the SH3-PRE DC map that turn-like structure may be retained in
SH3-AS, as indicated by the relatively short distances between neighbouring
residues (Figure 6.3 B). The Iox/Ired for residues 45 − 50 tend to be relatively
large for all spin-label positions (Figure 6.1)411, suggesting that this region is
often located distant to the remainder of the structure and so permitting its
observed mobility.
Summary of the residual structure
Overall, interpretation of the experimental data in combination with analysis
of SH3-NS, SH3-PRE and SH3-CLIB leads to a picture of SH3-AS as possessing
a mostly disordered C-terminus, an extended N-terminus with some residual
helical propensity, and a flexible region comprising residues 27 − 47 that is
susceptible to proteolysis. The occurrence of the lowest DC values for SH3-
PRE for interactions between the C-terminus and residues 5− 25 suggests that
in SH3-AS the extended N-terminus folds back against the C-terminus. All of
these secondary and tertiary structural propensities are non-native in nature,
which has important implications for the initiation of aggregation. These are
discussed in more detail in Section 6.6.
6.5.3 Comparison with experimental data
The ultimate test of the quality of the ensembles discussed here is whether they
are capable of reproducing independent experimental observables. 3J-couplings
have not been measured for PI3-SH3, thus only the RDCs were considered. The
RDCs calculated from SH3-NS are similar to the experimental values for the
C-terminus, but fail to reproduce the experimental RDCs for the N-terminus,
suggesting that a description of the NS in terms of a single structure is not
sufficient (Figure 6.7 A). The RDCs for SH3-AS are of lower magnitude than
those of SH3-NS. They are all of the same sign (Figure 6.7 B) other than for
residues 3 − 23. The negative RDCs in this region have been interpreted as
corresponding to helical structure, as noted in section 6.5.2. When urea is
added to SH3-AS (SH3-ASU), the negative RDCs become positive, indicating
that this residual structure is lost under chemical denaturation.
The RDCs calculated from SH3-CLIB fluctuate greatly from residue to
residue, but fail to reproduce the experimental data for either SH3-AS or SH3-
ASU (Figure 6.7 D). As only 5000 structures were available, lack of convergence
of the calculated RDCs cannot be ruled out as an explanation for this discrep-
ancy. The RDCs calculated from SH3-PRE are all of much lower magnitude
than the experimental RDCs for either SH3-AS or SH3-ASU (Figure 6.7 C).
Whilst there are some negative values, only one is situated within the proposed
helical region. Sufficient structures were used for the calculated RDCs to be
converged, meaning that the poor agreement cannot be attributed to statistical
error. Thus despite the residual structure identified in the DC maps, the local
structure of SH3-PRE remains different to that present experimentally.
[Plots: four panels (A to D), each showing RDC (Hz) against Residue Number (0 to 80).]
Figure 6.7: Comparison of the experimental and calculated RDCs for PI3-SH3 in
a variety of states. (A) RDCs for SH3-NS obtained experimentally in stretched
polyacrylamide gels at pH 7.0410 are shown in black and those calculated using
pales190 from the NMR structure (PDB code 1pnj)41 are in red. (B) Experi-
mental RDCs for SH3-NS (pH 7.0, black), SH3-AS (pH 2.0, red) and SH3-ASU
(pH 2.0, 7.3 M urea, green) obtained as in (A). (C,D) RDCs calculated from (C)
SH3-PRE and (D) SH3-CLIB are shown in black and the experimental RDCs
measured for SH3-AS and SH3-ASU as described in (B) are in red and green,
respectively. The grey lines at 0 Hz are to guide the eye.
6.5.4 Free energy maps
The free energy maps provide a more global picture of the nature of the struc-
tures comprising the various ensembles. As in Chapters 4 and 5, F (Rg,SASA)
and F (Rg, REE) are considered. SH3-PRE and SH3-CLIB are very different in
terms of both definitions of the free energy (Figure 6.8). As expected given
its lower $\langle R_h^{-1} \rangle^{-1}$, lower Rg values are highly populated by SH3-PRE. This
ensemble displays a wider range of SASA than SH3-CLIB, further confirming
the observation made in Chapter 5 that a large SASA can coincide with a low
Rg. In the case of PI3-SH3, this large accessible surface area may contribute to
its high aggregation propensity rather than towards productive inter-molecular
interactions, as was proposed to occur for αS.
The distribution of F (Rg, REE) for SH3-CLIB is similar to that seen for the
synucleins (Figure 5.10). For SH3-PRE, however, a wide range of REE corre-
spond to the smallest Rg, indicating that even for the most compact structures,
the termini are likely to be highly disordered.
The free energy maps reinforce the conclusion that SH3-AS does not include
native-like structures. The Rg, SASA and REE of SH3-NS (12.8 Å, 5781 Å² and
21.8 Å, respectively) lie outside of the regions pictured in Figure 6.8 for either
SH3-PRE or SH3-CLIB, indicating that neither of these ensembles contains
structures that resemble SH3-NS in terms of these global parameters.
Figure 6.8: Free energy landscapes of (A,B) SH3-PRE and (C,D) SH3-CLIB.
The free energy is defined as (A,C) F (Rg, SASA) = − ln p(Rg,SASA) and (B,D)
F (Rg, REE) = − ln p(Rg, REE), where REE is the end-to-end distance.
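
The definition F = −ln p given in the caption above can be illustrated by binning two per-structure quantities and taking the negative logarithm of the bin populations. The sketch below is an illustration only: the bin widths, function name and sample values are hypothetical, and p is taken to be the fraction of structures falling in each bin.

```python
import math
from collections import Counter


def free_energy_map(x_values, y_values, x_bin_width, y_bin_width):
    """Return {(x_bin, y_bin): F} with F = -ln p for each populated bin."""
    if len(x_values) != len(y_values):
        raise ValueError("one x and one y value are needed per structure")
    counts = Counter(
        (math.floor(x / x_bin_width), math.floor(y / y_bin_width))
        for x, y in zip(x_values, y_values)
    )
    total = sum(counts.values())
    return {bin_index: -math.log(n / total) for bin_index, n in counts.items()}


# Hypothetical Rg (Å) and SASA (Å^2) values for four structures.
rg = [10.1, 10.2, 15.0, 10.3]
sasa = [100.0, 110.0, 200.0, 120.0]
landscape = free_energy_map(rg, sasa, x_bin_width=1.0, y_bin_width=50.0)
```

Unpopulated bins are simply absent from the result, corresponding to regions of the map where the free energy is undefined for the ensemble in question.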
6.6 Implications for aggregation
The motivation for generating and characterising SH3-PRE was to better un-
derstand the factors that stimulate the aggregation of PI3-SH3, and, in a more
general sense, the conversion of NFPs into amyloid fibrils. It appears likely
that unfolding prior to amyloid formation is a general requirement for the mis-
folding of NFPs, as at least partial unfolding prior to fibril formation is re-
quired for many proteins35,415, most obviously those that are predominantly
α-helical39,416. PI3-SH3 fits into this scheme, as acid-denaturation is required
to stimulate its conversion into amyloid fibrils and both the experimental
measurements made on SH3-AS and the representative ensemble characterised here
(SH3-PRE) demonstrate that SH3-AS is unfolded relative to SH3-NS. This re-
flects the need for rearrangement of the native structure into the fibril struc-
ture50 and the inability of SH3-NS to form fibrils directly411.
It has been suggested that amyloid fibril formation is initiated from par-
tially folded rather than completely unfolded states3,47,411,417. The presence
of non-native structure in SH3-AS, as suggested by the experimental data and
corroborated by analysis of the ensemble generated here using PRE-ERMD, is
consistent with such a scenario. Two other proteins that aggregate at acidic
pH (4− 5), transthyretin and β2-microglobulin, are also known to adopt partly
structured conformations under these conditions418,419.
The aggregation behaviour of PI3-SH3 can be rationalised by considering
its pH dependence in concert with its residual structure at low pH. Whilst
PI3-SH3 initially gains two positive charges as the pH is lowered below 3.0,
further reduction in the pH is unlikely to result in any additional changes to the
ionisation state of the protein48. Instead, the increased concentration of anions
provided by the agent (such as HCl) used to lower the pH is thought to screen
the positive charges, thus reducing the electrostatic repulsion and favouring
compaction and aggregation. The absence of two conserved basic residues from
the diverging turn of PI3-SH3 and the majority of the other SH3 domains known
to aggregate55 may also contribute to this effect.
Other residues known to play a critical role in the aggregation of PI3-SH3
are the charged residues in the RT loop and diverging turn (17 − 25)51,54 and
Y5549. These are not identified as being aggregation prone by the Zyggregator
algorithm366 (Figure 6.6 C and D) as it is most sensitive to hydrophobic re-
gions. The probability of inter-molecular associations involving residues 17-25
is enhanced by the neutralisation of the negatively charged residues (E19, E21,
E22, D23 and D25) at low pH and the reduced repulsion between the positively
charged residues due to anionic screening coupled with the extended nature of
the N-terminus in SH3-AS, which may increase the exposure of this region. Sim-
ilarly, Y55 is located in the flexible central region, and is therefore also likely
to be accessible. The tendency for the N-terminus to fold back against the
C-terminus may help to maintain it in an aggregation-competent conformation.
6.7 Conclusions
PRE-ERMD was used to characterise the acid-denatured state of the NFP PI3-
SH3 with the aim of understanding the causes of the high aggregation propensity
of this state. Analysis of the expected Iox/Ired for combinations of SH3-NS and
SH3-RC suggests that the experimental data cannot be explained purely in terms
of native-like and random coil structure. Comparison of the TS ensembles of
three related SH3 domains with their respective native structures further con-
firmed that the residual structure observed for SH3-AS is not native-like. An
ensemble of structures generated using a coil library database to describe the di-
hedral angle preferences also failed to explain both the global and local structure
of SH3-AS. Characterisation of the ensemble of structures representative of the
acid-denatured, amyloidogenic state of PI3-SH3 generated using PRE-ERMD
provided insight into the nature of the structural propensities of SH3-AS, which
for the most part coincide with the residual structure suggested by the ex-
perimental data. Although the quantitative agreement with the experimental
RDCs is not good, this is not unexpected, as the PRE distances do not offer
any means of altering the description of the local structural preferences provided
by the force-field, which is clearly not accurate in the case of DS. The residual
structure identified for SH3-PRE allowed the mechanism by which charge af-
fects the aggregation properties of SH3-AS to be elucidated. Like αS, the key
determinant of PI3-SH3 aggregation is not the most obvious difference between
it and other related but non-amyloidogenic proteins, but the result of a more
subtle interplay between charge and environment.
Chapter 7
Conclusions
Interest in characterising DS of proteins has recently grown as a result of the
escalating amounts of high resolution structural data made available by develop-
ments in techniques such as NMR spectroscopy. Understanding DS is important
as this is the reference state from which both folding and mis-folding are ini-
tiated. Additionally, an increasing number of proteins are being shown to be
natively disordered. DS typically comprise a heterogeneous range of conforma-
tions, thus they cannot be described in terms of a single structure, making an
ensemble representation essential. Experimental observables, however, are al-
most always averages over the duration of the experiment and the ensemble of
molecules present. In order to define a DS ensemble it is necessary to know the
distribution of values underlying each experimental observable. Biomolecular
simulation can therefore complement experimental measurements as it allows
the nature of the structures comprising the ensemble and their relative popula-
tions to be determined. The aims of this thesis were to develop the best possible
method for generating ensembles of structures characteristic of DS of proteins
and to apply this method to gain insight into the factors that govern the balance
between folding, mis-folding and intrinsic disorder.
In Chapter 3, the ability of a range of simulation techniques to produce en-
sembles of structures representative of DS of proteins was assessed using the
IDP αS as a model system. It was found that generating sufficiently expanded
structures poses a significant difficulty. A solution was identified whereby al-
tering T provides a means of controlling the range of accessible conformations
and their global dimensions. However, even when the global dimensions match
those measured experimentally, other experimental observables that report on
both local and long-range structure are not well reproduced. This led to the
investigation in Chapter 4 of the use of long-range distances derived from PRE-
NMR experiments as restraints in ERMD simulations. The use of synthetic
data back-calculated from two reference ensembles of αS structures allowed the
effectiveness of the methods that were tested to be assessed in terms of their
ability to reproduce distributions as well as averages. This showed that obtain-
ing a good agreement of average values, particularly highly non-linear averages
such as the r⁻⁶-averaged PRE distances, is not sufficient to determine whether
an ensemble has been accurately reconstructed. Cross-validation against more
than one type of average was therefore introduced. It also emerged that the
compaction problem encountered in Chapter 3 is exacerbated by the r⁻⁶ nature
of the PRE distances. Again, manipulating T provides a means of overcom-
ing this problem. In a further change to the previously published method, the
tolerance to variation in the back-calculated, ensemble-averaged PRE distances
at each point in time was altered so as to account for the typical relationship
between an r⁻⁶ average and the underlying distribution.
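The point that agreement of r⁻⁶-averaged distances does not guarantee agreement of the underlying distributions can be illustrated with a minimal numerical sketch. The two ensembles below are hypothetical and are not taken from the simulations described in this thesis; the distances (assumed here to be in Å), the populations and the function name r6_average are all assumptions chosen purely for illustration.

```python
import numpy as np


def r6_average(distances):
    """Return the r^-6-averaged distance, <r^-6>^(-1/6), of a set of distances."""
    d = np.asarray(distances, dtype=float)
    return np.mean(d ** -6.0) ** (-1.0 / 6.0)


# Hypothetical ensemble A: every structure has the same 20 Å separation.
ensemble_a = np.full(100, 20.0)

# Hypothetical ensemble B: 10 % compact structures and 90 % expanded
# structures at 60 Å. The compact distance is solved for so that the
# r^-6 average of B equals that of A (20 Å).
target = 20.0
expanded = 60.0
compact_fraction = 0.1
compact = (
    (target ** -6 - (1.0 - compact_fraction) * expanded ** -6) / compact_fraction
) ** (-1.0 / 6.0)
ensemble_b = np.concatenate([np.full(10, compact), np.full(90, expanded)])

for name, ens in (("A", ensemble_a), ("B", ensemble_b)):
    print(
        f"ensemble {name}: r^-6 average = {r6_average(ens):.2f} Å, "
        f"linear mean = {ens.mean():.2f} Å"
    )
```

Both ensembles give the same r⁻⁶-averaged distance of 20 Å, yet the linear mean of ensemble B is about 55 Å because the small compact population dominates the r⁻⁶ average. This is the sense in which a second, differently weighted observable is needed to distinguish the two.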
The general method resulting from the work summarised in Chapter 4 was
applied in Chapters 5 and 6 to characterise the IDPs αS, βS and the related
artificial construct β+HC, and the acid-denatured state of the NFP PI3-SH3.
As part of this work, the sources of uncertainty in the distances calculated
from experimental PRE data were thoroughly investigated and combined with
a modified definition of a ‘PRE’ distance developed in Chapter 4. Analysis of
the ensembles produced using PRE-ERMD revealed that although global prop-
erties such as the Rh and PRE distances match their experimental counterparts,
the quantitative agreement with observables that report on local structure such
as ³J-couplings and RDCs is not as good. The long-range PRE distances are
therefore not capable of altering the local structural properties encoded in the
force-field and implicit solvent models, which, at least in the situations explored
here, do not provide a good description of the local structure of DS. Coil li-
brary ensembles, widely touted in the literature as good models for DS, were
also found to be unsuitable. However, despite the poor results of the quantitative
comparisons with experimental data, the DC maps, developed as part
of this work, portray a significant amount of residual structure, much of which
is in keeping with the structural propensities inferred from the experimental
data. Further scrutiny of this residual structure revealed the important role
that charge plays in determining the aggregation properties of the proteins con-
sidered here, allowing the differing aggregation propensities of αS and βS to
be rationalised and explaining why acid-denaturation is required to stimulate
amyloid fibril formation by PI3-SH3.
In summary, existing methods for using PRE distances as restraints in
ERMD were improved upon so that an ensemble could be accurately recon-
structed in terms of distributions as well as averages. Application of the new,
general method to a family of IDPs and the unfolded state of an NFP corroborated
the presence of residual structure implied by the experimental data and
provided new insight into the relationship between the structural and aggrega-
tion propensities of these proteins. In the future, inclusion of local structural
information as well as long-range distance restraints has the potential to further
enhance this technique.
127
References
1. Vendruscolo, M., Zurdo, J., MacPhee, C.E. & Dobson, C.M. Protein fold-
ing and misfolding: a paradigm of self-assembly and regulation in complex
biological systems. Philos. Transact. A Math Phys. Eng. Sci. 361, 1205–22
(2003).
2. Dobson, C.M., Sali, A. & Karplus, M. Protein folding: a perspective from
theory and experiment. Angew. Chem. Int. Ed. 37, 868–93 (1998).
3. Dobson, C.M. Protein folding and misfolding. Nature 426, 884–90 (2003).
4. Wright, P.E. & Dyson, H.J. Intrinsically unstructured proteins: re-
assessing the protein structure-function paradigm. J. Mol. Biol. 293, 321–
31 (1999).
5. James, L.C. & Tawfik, D.S. Conformational diversity and protein evolution
- a 60-year-old hypothesis revisited. Trends Biochem. Sci. 28, 361–8 (2003).
6. Uversky, V.N. Protein folding revisited. A polypeptide chain at the folding
misfolding nonfolding cross-roads: which way to go? Cell. Mol. Life Sci.
60, 1852–71 (2004).
7. Sickmeier, M., Hamilton, J.A., LeGall, T., Vacic, V., Cortese, M.S., Tan-
tos, A., Szabo, B., Tompa, P., Chen, J., Uversky, V.N., Obradovic, Z. &
Dunker, A.K. DisProt: the database of disordered proteins. Nucl. Acids
Res. 35, D786–93 (2007).
8. Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F. & Jones, D.T. Pre-
diction and functional analysis of native disorder in proteins from the three
kingdoms of life. J. Mol. Biol. 337, 635–45 (2004).
9. Hegyi, H. & Gerstein, M. The relationship between protein structure and
function: a comprehensive survey with application to the yeast genome.
J. Mol. Biol. 288, 147–64 (1999).
128
10. Uversky, V.N. Natively unfolded proteins: a point where biology waits for
physics. Protein Sci. 11, 739–56 (2002).
11. Dunker, A.K., Brown, C.J., Lawson, J.D., Iakoucheva, L.M. & Obradovic,
Z. Intrinsic disorder and protein function. Biochemistry 41, 6573–82
(2002).
12. Dill, K.A. & Shortle, D. Denatured states of proteins. Annu. Rev. Biochem.
60, 795–825 (1991).
13. Baldwin, R.L. A new perspective on unfolded proteins. Adv. Prot. Chem.
62, 361–7 (2002).
14. Levinthal, C. Are there pathways for protein folding? J. Chim. Phys. 65,
44 (1968).
15. Goldenberg, D.P. Computational simulation of the statistical properties
of unfolded proteins. J. Mol. Biol. 326, 1615–33 (2003).
16. Bennett, M., Schlunegger, M. & Eisenberg, D. 3D Domain swapping: a
mechanism for oligomer assembly. Protein Sci. 4, 2455–68 (1995).
17. Schlunegger, M., Bennett, M. & Eisenberg, D. Oligomer formation by 3D
domain swapping: a model for protein assembly and misassembly. Adv.
Prot. Chem. 50, 61–122 (1997).
18. Bucciantini, M., Giannoni, E., Chiti, F., Baroni, F., Formigli, L., Zurdo,
J., Taddei, N., Ramponi, G., Dobson, C.M. & Stefani, M. Inherent toxicity
of aggregates implies a common mechanism for protein misfolding diseases.
Nature 416, 507–11 (2002).
19. Lashuel, H.A., Hartley, D., Petre, B.M., Walz, T. & Lansbury, P.T. Neu-
rodegenerative disease: amyloid pores from pathogenic mutations. Nature
418, 291 (2002).
20. Lashuel, H.A., Hartley, D.M., Petre, B.M., Wall, J.S., Simon, M.N., Walz,
T. & Lansbury, P.T. Mixtures of wild-type and a pathogenic (E22G) form
of Aβ40 in vitro accumulate protofibrils, including amyloid pores. J. Mol.
Biol. 332, 795–808 (2003).
21. Fersht, A.R. & Daggett, V. Protein folding and unfolding at atomic reso-
lution. Cell 108, 573–82 (2002).
22. Daggett, V. & Fersht, A.R. Is there a unifying mechanism for protein
folding? Trends Biochem. Sci. 28, 18–25 (2003).
129
23. Daggett, V. & Fersht, A. The present view of the mechanism of protein
folding. Nat. Rev. Mol. Cell Biol. 4, 497–502 (2003).
24. Shakhnovich, E., Abkevich, V. & Ptitsyn, O. Conserved residues and the
mechanism of protein folding. Nature 379, 96–8 (1996).
25. Vendruscolo, M., Paci, E., Dobson, C.M. & Karplus, M. Three key residues
form a critical contact network in a protein folding transition state. Nature
409, 641–5 (2001).
26. Fersht, A.R. Transition-state structure as a unifying basis in protein-
folding mechanisms: contact order, chain topology, stability, and the ex-
tended nucleus mechanism. Proc. Natl. Acad. Sci. U. S. A. 97, 1525–9
(2000).
27. Dinner, A.R., Sali, A., Smith, L.J., Dobson, C.M. & Karplus, M. Under-
standing protein folding via free-energy surfaces from theory and experi-
ment. Trends Biochem. Sci. 25, 331–9 (2000).
28. Ozkan, S.B., Wu, G.A., Chodera, J.D. & Dill, K.A. Protein folding by
zipping and assembly. Proc. Natl. Acad. Sci. U. S. A. 104, 11987–92
(2007).
29. Bryngelson, J.D., Onuchic, J.N., Socci, N.D. & Wolynes, P.G. Funnels,
pathways, and the energy landscape of protein folding: a synthesis. Pro-
teins: Struct. Funct. Genet. 21, 167–95 (1995).
30. Vendruscolo, M., Paci, E., Karplus, M. & Dobson, C.M. Structures and
relative free energies of partially folded states of proteins. Proc. Natl. Acad.
Sci. U. S. A. 100, 14817–21 (2003).
31. Choy, W.Y. & Forman-Kay, J.D. Calculation of ensembles of structures
representing the unfolded state of an SH3 domain. J. Mol. Biol. 308,
1011–32 (2001).
32. Roder, H. & Colon, W. Kinetic role of early intermediates in protein
folding. Curr. Opin. Struct. Biol. 7, 15–28 (1997).
33. Dobson, C.M. The structural basis of protein folding and its links with
human disease. Philos. Trans. R. Soc. Lond. B Biol. Sci. 356, 133–145
(2001).
34. Horwich, A. Protein aggregation in disease: a role for folding intermediates
forming specific multimeric interactions. J. Clin. Invest. 110, 1221–32
(2002).
130
35. Kelly, J.W. The alternative conformations of amyloidogenic proteins and
their multi-step assembly pathways. Curr. Opin. Struct. Biol. 8, 101–6
(1998).
36. Caughey, B. & Lansbury, P.T. Protofibrils, pores, fibrils, and neurodegen-
eration: separating the responsible protein aggregates from the innocent
bystanders. Annu. Rev. Neurosci. 26, 267–98 (2003).
37. Bucciantini, M., Calloni, G., Chiti, F., Formigli, L., Nosi, D., Dobson,
C.M. & Stefani, M. Prefibrillar amyloid protein aggregates share common
features of cytotoxicity. J. Biol. Chem. 279, 31374–82 (2004).
38. Walsh, D.M., Klyubin, I., Fadeeva, J.V., Cullen, W.K., Anwyl, R., Wolfe,
M.S., Rowan, M.J. & Selkoe, D.J. Naturally secreted oligomers of amyloid-
β protein potently inhibit hippocampal long-term potentiation in vivo.
Nature 416, 535–9 (2002).
39. Sunde, M. & Blake, C. The structure of amyloid fibrils by electron mi-
croscopy and X-ray diffraction. Adv. Prot. Chem. 50, 123–59 (1997).
40. Dobson, C.M. Protein-misfolding diseases: getting out of shape. Nature
418, 729–30 (2002).
41. Booker, G.W., Gout, I., Downing, A.K., Driscoll, P.C., Boyd, J., Water-
field, M.D. & Campbell, I.D. Solution structure and ligand-binding site
of the SH3 domain of the p85 α subunit of phosphatidylinositol 3-kinase.
Cell 73, 813–22 (1993).
42. Morton, C.J. & Campbell, I.D. SH3 domains. Molecular ‘velcro’. Curr.
Biol. 4, 615–7 (1994).
43. Musacchio, A., Wilmanns, M. & Saraste, M. Structure and function of the
SH3 domain. Prog. Biophys. Mol. Biol. 61, 283–97 (1994).
44. Pawson, T. & Gish, G.D. SH2 and SH3 domains: from structure to func-
tion. Cell 71, 359–62 (1992).
45. Guijarro, J.I., Morton, C.J., Plaxco, K.W., Campbell, I.D. & Dobson,
C.M. Folding kinetics of the SH3 domain of PI3 kinase by real-time NMR
combined with optical spectroscopy. J. Mol. Biol. 276, 657–67 (1998).
46. Lindorff-Larsen, K., Vendruscolo, M., Paci, E. & Dobson, C.M. Transition
states for protein folding have native topologies despite high structural
variability. Nat. Struct. Mol. Biol. 11, 443–9 (2004).
131
47. Guijarro, J.I., Sunde, M., Jones, J.A., Campbell, I.D. & Dobson, C.M.
Amyloid fibril formation by an SH3 domain. Proc. Natl. Acad. Sci. U. S.
A. 95, 4224–8 (1998).
48. Zurdo, J., Guijarro, J.I., Jimenez, J.L., Saibil, H.R. & Dobson, C.M. De-
pendence on solution conditions of aggregation and amyloid formation by
an SH3 domain. J. Mol. Biol. 311, 325–40 (2001).
49. Bader, R., Bamford, R., Zurdo, J., Luisi, B.F. & Dobson, C.M. Probing
the mechanism of amyloidogenesis through a tandem repeat of the PI3-
SH3 domain suggests a generic model for protein aggregation and fibril
formation. J. Mol. Biol. 356, 189–208 (2006).
50. Jimenez, J.L., Guijarro, J.n., Orlova, E., Zurdo, J., Dobson, C.M., Sunde,
M. & Saibil, H.R. Cryo-electron microscopy structure of an SH3 amyloid
fibril and model of the molecular packing. EMBO J. 18, 81521 (1999).
51. Ventura, S., Zurdo, J., Narayanan, S., Parreno, M., Mangues, R., Reif, B.,
Chiti, F., Giannoni, E., Dobson, C.M., Aviles, F.X. & Serrano, L. Short
amino acid stretches can mediate amyloid formation in globular proteins:
the Src homology 3 (SH3) case. Proc. Natl. Acad. Sci. U. S. A. 101,
7258–63 (2004).
52. Polverino de Laureto, P., Taddei, N., Frare, E., Capanni, C., Costantini, S.,
Zurdo, J., Chiti, F., Dobson, C.M. & Fontana, A. Protein aggregation and
amyloid fibril formation by an SH3 domain probed by limited proteolysis.
J. Mol. Biol. 334, 129–41 (2003).
53. Monera, O.D., Kay, C.M. & Hodges, R.S. Protein denaturation with guani-
dine hydrochloride or urea provides a different estimate of stability de-
pending on the contributions of electrostatic interactions. Protein Sci 3,
1984–91 (1994).
54. Ventura, S., Lacroix, E. & Serrano, L. Insights into the origin of the
tendency of the PI3-SH3 domain to form amyloid fibrils. J. Mol. Biol.
322, 1147–58 (2002).
55. Liepina, I., Ventura, S., Czaplewski, C. & Liwo, A. Molecular dynamics
study of amyloid formation of two Abl-SH3 domain peptides. J. Peptide
Sci. 12, 780–9 (2006).
56. Martin-Garcia, J.M., Luque, I., Mateo, P.L., Ruiz-Sanz, J. & Camara-
Artigas, A. Crystallographic structure of the SH3 domain of the human
132
c-Yes tyrosine kinase: loop flexibility and amyloid aggregation. FEBS Lett.
581, 1701–6 (2007).
57. Carulla, N., Caddy, G.L., Hall, D.R., Zurdo, J., Gairi, M., Feliz, M., Giralt,
E., Robinson, C.V. & Dobson, C.M. Molecular recycling within amyloid
fibrils. Nature 436, 554–8 (2005).
58. Ding, F., Dokholyan, N.V., Buldyrev, S.V., Stanley, H.E. & Shakhnovich,
E.I. Molecular dynamics simulation of the SH3 domain aggregation sug-
gests a generic amyloidogenesis mechanism. J. Mol. Biol. 324, 851–7
(2002).
59. Gunasekaran, K., Tsai, C.J., Kumar, S., Zanuy, D. & Nussinov, R. Ex-
tended disordered proteins: targeting function with less scaffold. Trends
Biochem. Sci. 28, 81–5 (2003).
60. Dunker, A.K., Brown, C.J. & Obradovic, Z. Identification and functions
of usefully disordered proteins. Adv. Prot. Chem. 62, 25–49 (2002).
61. Dyson, H.J. & Wright, P.E. Coupling of folding and binding for unstruc-
tured proteins. Curr. Opin. Struct. Biol. 12, 54–60 (2002).
62. Dyson, H.J. & Wright, P.E. Intrinsically unstructured proteins and their
functions. Nat. Rev. Mol. Cell Biol. 6, 197–208 (2005).
63. Tompa, P. Intrinsically unstructured proteins. Trends Biochem. Sci. 27,
527–33 (2002).
64. Tompa, P. & Csermely, P. The role of structural disorder in the function
of RNA and protein chaperones. FASEB J. 18, 1169–75 (2004).
65. Dunker, A.K., Cortese, M.S., Romero, P., Iakoucheva, L.M. & Uversky,
V.N. Flexible nets. The roles of intrinsic disorder in protein interaction
networks. FEBS J. 272, 5129–48 (2005).
66. Tompa, P., Szasz, C. & Buday, L. Structural disorder throws new light on
moonlighting. Trends Biochem. Sci. 30, 484–9 (2005).
67. Uversky, V. A protein-chameleon: conformational plasticity of α-synuclein,
a disordered protein involved in neurodegenerative disorders. J. Biomol.
Struct. Dyn. 21, 211–34 (2003).
68. Dunker, A.K., Obradovic, Z., Romero, P., Garner, E.C. & Brown, C.J. In-
trinsic protein disorder in complete genomes. Genome Inform. Ser. Work-
shop Genome Inform. 11, 161–71 (2000).
133
69. Fink, A.L. Natively unfolded proteins. Curr. Opin. Struct. Biol. 15, 35–41
(2005).
70. Bracken, C., Iakoucheva, L.M., Romero, P.R. & Dunker, A.K. Combining
prediction, computation and experiment for the characterization of protein
disorder. Curr. Opin. Struct. Biol. 14, 570–6 (2004).
71. Tompa, P. Intrinsically unstructured proteins evolve by repeat expansion.
BioEssays 25, 847–55 (2003).
72. Tompa, P. The interplay between structure and function in intrinsically
unstructured proteins. FEBS Lett. 579, 3346–54 (2005).
73. Dunker, A.K., Lawson, J.D., Brown, C.J., Williams, R.M., Romero, P.,
Oh, J.S., Oldfield, C.J., Campen, A.M., Ratliff, C.M., Hipps, K.W., Ausio,
J., Nissen, M.S., Reeves, R., Kang, C., Kissinger, C.R., Bailey, R.W.,
Griswold, M.D., Chiu, W., Garner, E.C. & Obradovic, Z. Intrinsically
disordered protein. J. Mol. Graph. Model. 19, 26–59 (2001).
74. Iakoucheva, L.M., Brown, C.J., Lawson, J.D., Obradovic, Z. & Dunker,
A.K. Intrinsic disorder in cell-signaling and cancer-associated proteins. J.
Mol. Biol. 323, 573–84 (2002).
75. Uversky, V.N. What does it mean to be natively unfolded? Eur. J.
Biochem. 269, 2–12 (2002).
76. Demchenko, A.P. Recognition between flexible protein molecules: induced
and assisted folding. J. Mol. Recognit. 14, 42–61 (2001).
77. Namba, K. Roles of partly unfolded conformations in macromolecular
self-assembly. Genes to Cells 6, 1–12 (2001).
78. Romero, P., Obradovic, Z. & Dunker, K. Sequence data analysis for long
disordered regions prediction in the calcineurin family. Genome Inform.
Ser. Workshop Genome Inform. 8, 110–24 (1997).
79. Romero, P., Obradovic, Z., Kissinger, C., Villafranca, J., Garner, E., Guil-
liot, S. & Dunker, A. Thousands of proteins likely to have long disordered
regions. Pac. Symp. Biocomput. 3, 437–48 (1998).
80. Romero, P., Obradovic, Z., Li, X., Garner, E.C., Brown, C.J. & Dunker,
A.K. Sequence complexity of disordered protein. Proteins: Struct. Funct.
Genet. 42, 38–48 (2001).
134
81. Dunker, A., Garner, E., Guilliot, S., Romero, P., Albrecht, K., Hart, J.,
Obradovic, Z., Kissinger, C. & Villafranca, J. Protein disorder and the
evolution of molecular recognition: theory, predictions and observations.
Pac. Symp. Biocomput. 3, 473–84 (1998).
82. Romero, P., Obradovic, Z., Kissinger, C., Villafranca, J. & Dunker, A.
Identifying disordered regions in proteins from amino acid sequence. Proc.
Int. Conf. Neur. Net. 1, 90–5 (1997).
83. Uversky, V.N., Gillespie, J.R. & Fink, A.L. Why are “natively unfolded”
proteins unstructured under physiologic conditions? Proteins: Struct.
Funct. Genet. 41, 415–27 (2000).
84. Oldfield, C., Cheng, Y., Cortese, M., Brown, C., Uversky, V. & Dunker,
A. Comparing and combining predictors of mostly disordered proteins.
Biochemistry 44, 1989–2000 (2005).
85. Li, X., Romero, P., Rani, M., Dunker, A. & Obradovic, Z. Predicting
protein disorder for N-, C-, and internal regions. Genome Inform. Ser.
Workshop Genome Inform. 10, 30–40 (1999).
86. Obradovic, Z., Peng, K., Vucetic, S., Radivojac, P., Brown, C.J. & Dunker,
A.K. Predicting intrinsic disorder from amino acid sequence. Proteins:
Struct. Funct. Genet. 53, 566–72 (2003).
87. Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. IUPred: web server
for the prediction of intrinsically unstructured regions of proteins based on
estimated energy content. Bioinformatics 21, 3433–4 (2005).
88. Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. The pairwise en-
ergy content estimated from amino acid composition discriminates between
folded and intrinsically unstructured proteins. J. Mol. Biol. 347, 827–39
(2005).
89. Peng, K., Radivojac, P., Vucetic, S., Dunker, A.K. & Obradovic, Z.
Length-dependent prediction of protein intrinsic disorder. BMC Bioin-
formatics 7, 208 (2006).
90. Tompa, P., Dosztanyi, Z. & Simon, I. Prevalent structural disorder in E.
coli and S. cerevisiae proteomes. J. Proteome Res. 5, 1996–2000 (2006).
91. Uversky, V. & Narizhneva, N. Effect of natural ligands on the structural
properties and conformational stability of proteins. Biochemistry (Mosc)
63, 420–33 (1998).
135
92. Ellis, R.J. Macromolecular crowding: obvious but underappreciated.
Trends Biochem. Sci. 26, 597–604 (2001).
93. Ellis, R.J. Macromolecular crowding: an important but neglected aspect of
the intracellular environment. Curr. Opin. Struct. Biol. 11, 114–9 (2001).
94. Flaugh, S.L. & Lumb, K.J. Effects of macromolecular crowding on the
intrinsically disordered proteins c-Fos and p27(Kip1). Biomacromolecules
2, 538–40 (2001).
95. Zurdo, J., Sanz, J., Gonzalez, C., Rico, M. & Ballesta, J. The exchangeable
yeast ribosomal acidic protein YP2β shows characteristics of a partly folded
state under physiological conditions. Biochemistry 36, 9625–35 (1997).
96. Dedmon, M.M., Patel, C.N., Young, G.B. & Pielak, G.J. FlgM gains
structure in living cells. Proc. Natl. Acad. Sci. U. S. A. 99, 12681–4
(2002).
97. Spolar, R. & Record, MT, J. Coupling of local folding to site-specific
binding of proteins to DNA. Science 263, 777–84 (1994).
98. Weiss, M.A., Ellenberger, T., Wobbe, C.R., Lee, J.P., Harrison, S.C. &
Struhl, K. Folding transition in the DMA-binding domain of GCN4 on
specific binding to DNA. Nature 347, 575–8 (1990).
99. Lacy, E.R., Filippov, I., Lewis, W.S., Otieno, S., Xiao, L., Weiss, S.,
Hengst, L. & Kriwacki, R.W. p27 binds cyclin-CDK complexes through
a sequential mechanism involving binding-induced protein folding. Nat.
Struct. Mol. Biol. 11, 358–64 (2004).
100. Fiebig, K.M., Rice, L.M., Pollock, E. & Brunger, A.T. Folding interme-
diates of SNARE complex assembly. Nat. Struct. Mol. Biol. 6, 117–23
(1999).
101. Magidovich, E., Orr, I., Fass, D., Abdu, U. & Yifrach, O. Intrinsic disor-
der in the C-terminal domain of the Shaker voltage-activated K+ channel
modulates its interaction with scaffold proteins. Proc. Natl. Acad. Sci. U.
S. A. 104, 13022–7 (2007).
102. Ahmed, M., Bamm, V., Harauz, G. & Ladizhansky, V. The BG21 isoform
of golli myelin basic protein is intrinsically disordered with a highly flexible
amino-terminal domain. Biochemistry 46, 9700–12 (2007).
136
103. Meszros, B., Tompa, P., Simon, I. & Dosztnyi, Z. Molecular principles of
the interactions of disordered proteins. J. Mol. Biol. 372, 549–61 (2007).
104. Dunker, A.K. & Obradovic, Z. The protein trinity - linking function and
disorder. Nat. Biotech. 19, 805–6 (2001).
105. Kriwacki, R.W., Hengst, L., Tennant, L., Reed, S.I. & Wright, P.E. Struc-
tural studies of p21Waf1/Cip1/Sdi1 in the free and Cdk2-bound state:
conformational disorder mediatesbindingdiversity. Proc. Natl. Acad. Sci.
U. S. A. 93, 11504–9 (1996).
106. Romero, P.R., Zaidi, S., Fang, Y.Y., Uversky, V.N., Radivojac, P., Old-
field, C.J., Cortese, M.S., Sickmeier, M., LeGall, T., Obradovic, Z. &
Dunker, A.K. Alternative splicing in concert with protein intrinsic disor-
der enables increased functional diversity in multicellular organisms. Proc.
Natl. Acad. Sci. U. S. A. 103, 8390–5 (2006).
107. Hilser, V.J. & Thompson, E.B. Intrinsic disorder as a mechanism to opti-
mize allosteric coupling in proteins. Proc. Natl. Acad. Sci. U. S. A. 104,
8311–5 (2007).
108. Weinreb, P., Zhen, W., Poon, A., Conway, K. & Lansbury, P. NACP, a
protein implicated in Alzheimer’s disease and learning, is natively unfolded.
Biochemistry 35, 13709–15 (1996).
109. Eliezer, D., Kutluay, E., Bussell, Jr, R. & Browne, G. Conformational
properties of α-synuclein in its free and lipid-associated states. J. Mol.
Biol. 307, 1061–73 (2001).
110. Uversky, V.N., Li, J., Souillac, P., Millett, I.S., Doniach, S., Jakes, R.,
Goedert, M. & Fink, A.L. Biophysical properties of the synucleins and
their propensities to fibrillate: inhibition of α-synuclein assembly by β-
and γ-synucleins. J. Biol. Chem. 277, 11970–8 (2002).
111. George, J. The synucleins. Genome Biol. 3, 3002.1–6 (2001).
112. Jakes, R., Spillantini, M.G. & Goedert, M. Identification of two distinct
synucleins from human brain. FEBS Lett. 345, 27–32 (1994).
113. Ueda, K., Fukushima, H., Masliah, E., Xia, Y., Iwai, A., Yoshimoto, M.,
Otero, D., Kondo, J., Ihara, Y. & Saitoh, T. Molecular cloning of cDNA en-
coding an unrecognized component of amyloid in Alzheimer disease. Proc.
Natl. Acad. Sci. U. S. A. 90, 11282–6 (1993).
137
114. Bussell, Jr, R. & Eliezer, D. A structural and functional role for 11-mer
repeats in α-synuclein and other exchangeable lipid binding proteins. J.
Mol. Biol. 329, 763–78 (2003).
115. Chandra, S., Chen, X., Rizo, J., Jahn, R. & Sudhof, T.C. A broken α-helix
in folded α-synuclein. J. Biol. Chem. 278, 15313–8 (2003).
116. Bisaglia, M., Tessari, I., Pinato, L., Bellanda, M., Giraudo, S., Fasano,
M., Bergantino, E., Bubacco, L. & Mammi, S. A topological model of
the interaction between α-synuclein and sodium dodecyl sulfate micelles.
Biochemistry 44, 329–39 (2005).
117. Bussell, R., J., Ramlall, T.F. & Eliezer, D. Helix periodicity, topology,
and dynamics of membrane-associated α-synuclein. Protein Sci. 14, 862–
72 (2005).
118. Ulmer, T.S., Bax, A., Cole, N.B. & Nussbaum, R.L. Structure and dy-
namics of micelle-bound human α-synuclein. J. Biol. Chem. 280, 9595–603
(2005).
119. Jao, C.C., Der-Sarkissian, A., Chen, J. & Langen, R. Structure of
membrane-bound α-synuclein studied by site-directed spin labeling. Proc.
Natl. Acad. Sci. U. S. A. 101, 8331–6 (2004).
120. Sung, Y.H. & Eliezer, D. Secondary structure and dynamics of micelle
bound β- and γ-synuclein. Protein Sci. 15, 1162–74 (2006).
121. Davidson, W.S., Jonas, A., Clayton, D.F. & George, J.M. Stabilization of
α-synuclein secondary structure upon binding to synthetic membranes. J.
Biol. Chem. 273, 9443–9 (1998).
122. Zhu, M., Li, J. & Fink, A.L. The association of α-synuclein with mem-
branes affects bilayer structure, stability, and fibril formation. J. Biol.
Chem. 278, 40186–97 (2003).
123. Cookson, M.R. The biochemistry of Parkinson’s disease. Annu. Rev.
Biochem. 74, 29–52 (2005).
124. Murphy, D.D., Rueter, S.M., Trojanowski, J.Q. & Lee, V.M.Y. Synucleins
are developmentally expressed, and α-synuclein regulates the size of the
presynaptic vesicular pool in primary hippocampal neurons. J. Neurosci.
20, 3214–20 (2000).
138
125. Narayanan, V. & Scarlata, S. Membrane binding and self-association of
α-synucleins. Biochemistry 40, 9927–34 (2001).
126. Clayton, D.F. & George, J.M. The synucleins: a family of proteins involved
in synaptic function, plasticity, neurodegeneration and disease. Trends
Neurosci. 21, 249–54 (1998).
127. Payton, J.E., Perrin, R.J., Woods, W.S. & George, J.M. Structural de-
terminants of PLD2 inhibition by α-synuclein. J. Mol. Biol. 337, 1001–9
(2004).
128. Jenco, J.M., Rawlingson, A., Daniels, B. & Morris, A.J. Regulation of
phospholipase D2: selective inhibition of mammalian phospholipase D
isoenzymes by α- and β-synucleins. Biochemistry 37, 4901–9 (1998).
129. Kahle, P.J., Haass, C., Kretzschmar, H.A. & Neumann, M. Struc-
ture/function of α-synuclein in health and disease: rational development
of animal models for Parkinson’s and related diseases. J. Neurochem. 82,
449–57 (2002).
130. Moore, D.J., West, A.B., Dawson, V.L. & Dawson, T.M. Molecular patho-
physiology of Parkinson’s disease. Annu. Rev. Neurosci. 28, 57–87 (2005).
131. Spillantini, M.G., Schmidt, M.L., Lee, V.M., Trojanowski, J.Q., Jakes, R.
& Goedert, M. α-synuclein in Lewy bodies. Nature 388, 839–40 (1997).
132. Goedert, M. α-synuclein and neurodegenerative diseases. Nat. Rev. Neu-
rosci. 2, 492–501 (2001).
133. Baba, M., Nakajo, S., Tu, P.H., Tomita, T., Nakaya, K., Lee, V.M., Tro-
janowski, J.Q. & Iwatsubo, T. Aggregation of α-synuclein in Lewy bodies
of sporadic Parkinson’s disease and dementia with Lewy bodies. Am. J.
Pathol. 152, 879–84 (1998).
134. Polymeropoulos, M., C., L., Leroy, E., Ide, S., Dehejia, A., Dutra, A., Pike,
B., Root, H., Rubenstein, J., Boyer, R., Stenroos, E., Chandrasekharappa,
S., Athanassiadou, A., Papapetropoulos, T., Johnson, W., Lazzarini, A.,
Duvoisin, R., Di Iorio, G., Golbe, L. & Nussbaum, R. Mutation in the
α-synuclein gene identified in families with Parkinson’s disease. Science
276, 2045–7 (1997).
135. Kruger, R., Kuhn, W., Muller, T., Woitalla, D., Graeber, M., Kosel, S.,
Przuntek, H., Epplen, J., Schols, L. & Riess, O. Ala30Pro mutation in the
139
gene encoding α-synuclein in Parkinson’s disease. Nat. Genet. 18, 106–8
(1998).
136. Zarranz, J.J., Alegre, J., Gomez-Esteban, J.C., Lezcano, E., Ros, R., Am-
puero, I., Vidal, L., Hoenicka, J., Rodriguez, O., Atares, B., Llorens, V.,
Gomez Tortosa, E., del Ser, T., Munoz, D.G. & de Yebenes, J.G. The new
mutation, E46K, of α-synuclein causes Parkinson and Lewy body demen-
tia. Ann. Neurol. 55, 164–73 (2004).
137. Singleton, A., Farrer, M., Johnson, J., Singleton, A., Hague, S., Kacher-
gus, J., Hulihan, M., Peuralinna, T., Dutra, A., Nussbaum, R., Lincoln,
S., Crawley, A., Hanson, M., Maraganore, D., Adler, C., Cookson, M.,
Muenter, M., Baptista, M., Miller, D., Blancato, J., Hardy, J. & Gwinn-
Hardy, K. α-synuclein locus triplication causes Parkinson’s disease. Science
302, 5646 (2003).
138. Chartier-Harlin, M.C., Kachergus, J., Roumier, C., Mouroux, V., Douay,
X., Lincoln, S., Levecque, C., Larvor, L., Andrieux, J., Hulihan, M., Wauc-
quier, N., Defebvre, L., Amouyel, P., Farrer, M. & Destee, A. α-synuclein
locus duplication as a cause of familial Parkinson’s disease. Lancet 364,
1167–9 (2004).
139. Ibanez, P., Bonnet, A.M., Debarges, B., Lohmann, E., Tison, F., Pollak,
P., Agid, Y., Durr, A. & Brice, A. Causal relation between α-synuclein
gene duplication and familial Parkinson’s disease. Lancet 364, 1169–71
(2004).
140. Dedmon, M.M., Christodoulou, J., Wilson, M.R. & Dobson, C.M. Heat
shock protein 70 inhibits α-synuclein fibril formation via preferential bind-
ing to prefibrillar species. J. Biol. Chem. 280, 14733–40 (2005).
141. Volles, M. & Lansbury, P. Vesicle permeabilization by protofibrillar α-
synuclein is sensitive to Parkinson’s disease-linked mutations and occurs
by a pore-like mechanism. Biochemistry 41, 4595–4602 (2002).
142. Volles, M., Lee, S.J., Rochet, J.C., Shtilerman, M., Ding, T., Kessler,
J. & Lansbury, P. Vesicle permeabilization by protofibrillar α-synuclein:
implications for the pathogenesis and treatment of Parkinson’s disease.
Biochemistry 40, 7812–7819 (2001).
143. Lashuel, H.A., Petre, B.M., Wall, J., Simon, M., Nowak, R.J., Walz, T. &
Lansbury, P.T. α-synuclein, especially the Parkinson’s disease-associated
140
mutants, forms pore-like annular and tubular protofibrils. J. Mol. Biol.
322, 1089–1102 (2002).
144. Mori, F., Hayashi, S., Yamagishi, S., Yoshimoto, M., Yagihashi, S.,
Takahashi, H. & Wakabayashi, K. Pick’s disease: α- and β-synuclein-
immunoreactive Pick bodies in the dentate gyrus. Acta Neuropathol. (Berl)
104, 455–61 (2002).
145. Rivers, R.C. Biophysical analysis of the aggregation behaviour and struc-
tural properties of α- and β-synuclein. PhD Thesis (2007).
146. Yamin, G., Munishkina, L.A., Karymov, M.A., Lyubchenko, Y.L., Uver-
sky, V.N. & Fink, A.L. Forcing nonamyloidogenic β-synuclein to fibrillate.
Biochemistry 44, 9096–107 (2005).
147. Biere, A.L., Wood, S.J., Wypych, J., Steavenson, S., Jiang, Y., Anafi,
D., Jacobsen, F.W., Jarosinski, M.A., Wu, G.M., Louis, J.C., Martin,
F., Narhi, L.O. & Citron, M. Parkinson’s disease-associated α-synuclein
is more fibrillogenic than β- and γ-synuclein and cannot cross-seed its
homologs. J. Biol. Chem. 275, 34574–9 (2000).
148. Park, J.Y. & Lansbury, P. T., J. Beta-synuclein inhibits formation of α-
synuclein protofibrils: a possible therapeutic strategy against Parkinson’s
disease. Biochemistry 42, 3696–700 (2003).
149. Tsigelny, I.F., Bar-On, P., Sharikov, Y., Crews, L., Hashimoto, M., Miller,
M.A., Keller, S.H., Platoshyn, O., Yuan, J.X.J. & Masliah, E. Dynamics of
α-synuclein aggregation and inhibition of pore-like oligomer development
by β-synuclein. FEBS J. 274, 1862–77 (2007).
150. Uversky, V.N. & Fink, A.L. Amino acid determinants of α-synuclein ag-
gregation: putting together pieces of the puzzle. FEBS Lett. 522, 9–13
(2002).
151. Murray, I.V., Giasson, B.I., Quinn, S.M., Koppaka, V., Axelsen, P.H.,
Ischiropoulos, H., Trojanowski, J.Q. & Lee, V.M. Role of α-synuclein
carboxy-terminus on fibril formation in vitro. Biochemistry 42, 8530–40
(2003).
152. Spillantini, M.G., Crowther, R.A., Jakes, R., Hasegawa, M. & Goedert,
M. α-synuclein in filamentous inclusions of Lewy bodies from Parkinson’s
disease and dementia with Lewy bodies. Proc. Natl. Acad. Sci. U. S. A.
95, 6469–73 (1998).
153. Hoyer, W., Cherny, D., Subramaniam, V. & Jovin, T.M. Impact of the
acidic C-terminal region comprising amino acids 109-140 on α-synuclein
aggregation in vitro. Biochemistry 43, 16233–42 (2004).
154. Li, W., West, N., Colla, E., Pletnikova, O., Troncoso, J.C., Marsh, L.,
Dawson, T.M., Jakala, P., Hartmann, T., Price, D.L. & Lee, M.K. Aggre-
gation promoting C-terminal truncation of α-synuclein is a normal cellular
process and is enhanced by the familial Parkinson’s disease-linked muta-
tions. Proc. Natl. Acad. Sci. U. S. A. 102, 2162–7 (2005).
155. Giasson, B.I., Murray, I.V.J., Trojanowski, J.Q. & Lee, V.M.Y. A hy-
drophobic stretch of 12 amino acid residues in the middle of α-synuclein is
essential for filament assembly. J. Biol. Chem. 276, 2380–6 (2001).
156. Du, H.N., Tang, L., Luo, X.Y., Li, H.T., Hu, J., Zhou, J.W. & Hu, H.Y.
A peptide motif consisting of glycine, alanine, and valine is required for
the fibrillization and cytotoxicity of human α-synuclein. Biochemistry 42,
8870–8 (2003).
157. Madine, J., Doig, A. & Middleton, D. The aggregation and membrane-
binding properties of an α-synuclein peptide fragment. Biochem. Soc.
Trans. 32, 1127–9 (2004).
158. Bertoncini, C.W., Rasia, R.M., Lamberto, G.R., Binolfi, A., Zweckstetter,
M., Griesinger, C. & Fernandez, C.O. Structural characterization of the
intrinsically unfolded protein β-synuclein, a natural negative regulator
of α-synuclein aggregation. J. Mol. Biol. 372, 708–722 (2007).
159. Sung, Y.H. & Eliezer, D. Residual structure, backbone dynamics, and
interactions within the synuclein family. J. Mol. Biol. 372, 689–707 (2007).
160. Dyson, H.J. & Wright, P.E. Equilibrium NMR studies of unfolded and
partially folded proteins. Nat. Struct. Biol. 5, 499–503 (1998).
161. Wilkins, D.K., Grimshaw, S.B., Receveur, V., Dobson, C.M., Jones, J.A.
& Smith, L.J. Hydrodynamic radii of native and denatured proteins mea-
sured by pulse field gradient NMR techniques. Biochemistry 38, 16424–31
(1999).
162. Svergun, D.I. & Koch, M.H.J. Small-angle scattering studies of biological
macromolecules in solution. Rep. Prog. Phys. 66, 1735–82 (2003).
163. Bilsel, O. & Matthews, C.R. Molecular dimensions and their distributions
in early folding intermediates. Curr. Opin. Struct. Biol. 16, 86–93 (2006).
164. Mittag, T. & Forman-Kay, J.D. Atomic-level characterization of disordered
protein ensembles. Curr. Opin. Struct. Biol. 17, 3–14 (2007).
165. Dyson, H.J. & Wright, P.E. Elucidation of the protein folding landscape
by NMR. Methods Enzymol. 394, 299–321 (2005).
166. Wishart, D.S. & Sykes, B.D. The 13C chemical-shift index: a simple
method for the identification of protein secondary structure using 13C
chemical-shift data. J. Biomol. NMR 4, 171–80 (1994).
167. Wishart, D. & Sykes, B. Chemical shifts as a tool for structure determi-
nation. Methods Enzymol. 239, 363–92 (1994).
168. Marsh, J.A., Singh, V.K., Jia, Z. & Forman-Kay, J.D. Sensitivity of sec-
ondary structure propensities to sequence differences between α- and γ-
synuclein: implications for fibrillation. Protein Sci. 15, 2795–804 (2006).
169. Wang, Y. & Jardetzky, O. Probability-based protein secondary structure
identification using combined NMR chemical-shift data. Protein Sci. 11,
852–61 (2002).
170. Klein-Seetharaman, J., Oikawa, M., Grimshaw, S.B., Wirmer, J.,
Duchardt, E., Ueda, T., Imoto, T., Smith, L.J., Dobson, C.M. & Schwalbe,
H. Long-range interactions within a nonnative protein. Science 295, 1719–
22 (2002).
171. Reed, M.A., Jelinska, C., Syson, K., Cliff, M.J., Splevins, A., Alizadeh,
T., Hounslow, A.M., Staniforth, R.A., Clarke, A.R., Craven, C.J. &
Waltho, J.P. The denatured state under native conditions: a non-native-
like collapsed state of N-PGK. J. Mol. Biol. 357, 365–72 (2006).
172. Bolin, K.A., Pitkeathly, M., Miranker, A., Smith, L.J. & Dobson, C.M.
Insight into a random coil conformation and an isolated helix: structural
and dynamical characterisation of the C-helix peptide from hen lysozyme.
J. Mol. Biol. 261, 443–53 (1996).
173. Yi, Q., Scalley-Kim, M.L., Alm, E.J. & Baker, D. NMR characterization
of residual structure in the denatured state of protein L. J. Mol. Biol. 299,
1341–51 (2000).
174. Serrano, L. Comparison between the φ distribution of the amino acids
in the protein database and NMR data indicates that amino acids have
various φ propensities in the random coil conformation. J. Mol. Biol. 254,
322–33 (1995).
175. Smith, L.J., Bolin, K.A., Schwalbe, H., MacArthur, M.W., Thornton, J.M.
& Dobson, C.M. Analysis of main chain torsion angles in proteins: predic-
tion of NMR coupling constants for native and random coil conformations.
J. Mol. Biol. 255, 494–506 (1996).
176. Fiebig, K., Schwalbe, H., Buck, M., Smith, L. & Dobson, C. Toward a de-
scription of the conformations of denatured states of proteins. Comparison
of a random coil model with NMR measurements. J. Phys. Chem. 100,
2661–6 (1996).
177. Choy, W.Y., Shortle, D. & Kay, L. Side chain dynamics in unfolded protein
states: an NMR based 2H spin relaxation study of ∆131∆. J. Am. Chem.
Soc. 125, 1748–58 (2003).
178. Schwalbe, H., Fiebig, K., Buck, M., Jones, J., Grimshaw, S., Spencer, A.,
Glaser, S., Smith, L. & Dobson, C. Structural and dynamical properties of
a denatured protein. Heteronuclear 3D NMR experiments and theoretical
simulations of lysozyme in 8 M urea. Biochemistry 36, 8977–91 (1997).
179. Blackledge, M. Recent progress in the study of biomolecular structure and
dynamics in solution from residual dipolar couplings. Prog. Nucl. Magn.
Reson. Spectrosc. 46, 23–61 (2005).
180. Tjandra, N. & Bax, A. Direct measurement of distances and angles in
biomolecules by NMR in a dilute liquid crystalline medium. Science 278,
1111–4 (1997).
181. Sanders, C.R., Hare, B.J., Howard, K.P. & Prestegard, J.H. Magnetically-
oriented phospholipid micelles as a tool for the study of membrane-
associated molecules. Prog. Nucl. Magn. Reson. Spectrosc. 26, 421–44
(1994).
182. Hansen, M.R., Mueller, L. & Pardi, A. Tunable alignment of macro-
molecules by filamentous phage yields dipolar coupling interactions. Nat.
Struct. Biol. 5, 1065–74 (1998).
183. Clore, G., Starich, M. & Gronenborn, A. Measurement of residual dipolar
couplings of macromolecules aligned in the nematic phase of a colloidal
suspension of rod-shaped viruses. J. Am. Chem. Soc. 120, 10571–2 (1998).
184. Sass, J., Cordier, F., Hoffmann, A., Rogowski, M., Cousin, A., Omichinski,
J., Lowen, H. & Grzesiek, S. Purple membrane induced alignment of
biological macromolecules in the magnetic field. J. Am. Chem. Soc. 121,
2047–55 (1999).
185. Koenig, B., Hu, J.S., Ottiger, M., Bose, S., Hendler, R. & Bax, A. NMR
measurement of dipolar couplings in proteins aligned by transient binding
to purple membrane fragments. J. Am. Chem. Soc. 121, 1385–6 (1999).
186. Ruckert, M. & Otting, G. Alignment of biological macromolecules in novel
nonionic liquid crystalline media for NMR experiments. J. Am. Chem.
Soc. 122, 7793–7 (2000).
187. Tycko, R., Blanco, F. & Ishii, Y. Alignment of biopolymers in strained gels:
a new way to create detectable dipole-dipole couplings in high-resolution
biomolecular NMR. J. Am. Chem. Soc. 122, 9340–1 (2000).
188. Chou, J.J., Gaemers, S., Howder, B., Louis, J.M. & Bax, A. A simple
apparatus for generating stretched polyacrylamide gels, yielding uniform
alignment of proteins and detergent micelles. J. Biomol. NMR 21, 377–82
(2001).
189. Tolman, J.R. Dipolar couplings as a probe of molecular dynamics and
structure in solution. Curr. Opin. Struct. Biol. 11, 532–9 (2001).
190. Zweckstetter, M. & Bax, A. Prediction of sterically induced alignment in
a dilute liquid crystalline phase: aid to protein structure determination by
NMR. J. Am. Chem. Soc. 122, 3791–2 (2000).
191. Zweckstetter, M., Hummer, G. & Bax, A. Prediction of charge-induced
molecular alignment of biomolecules dissolved in dilute liquid-crystalline
phases. Biophys. J. 86, 3444–60 (2004).
192. Azurmendi, H. & Bush, C. Tracking alignment from the moment of in-
ertia tensor (TRAMITE) of biomolecules in neutral dilute liquid crystal
solutions. J. Am. Chem. Soc. 124, 2426–7 (2002).
193. Louhivuori, M., Otten, R., Lindorff-Larsen, K. & Annila, A. Conforma-
tional fluctuations affect protein alignment in dilute liquid crystal media.
J. Am. Chem. Soc. 128, 4371–6 (2006).
194. Fredriksson, K., Louhivuori, M., Permi, P. & Annila, A. On the interpre-
tation of residual dipolar couplings as reporters of molecular dynamics. J.
Am. Chem. Soc. 126, 12646–50 (2004).
195. Louhivuori, M., Paakkonen, K., Fredriksson, K., Permi, P., Lounila, J.
& Annila, A. On the origin of residual dipolar couplings from denatured
proteins. J. Am. Chem. Soc. 125, 15647–50 (2003).
196. Kuhn, W. Über die Gestalt fadenförmiger Moleküle in Lösungen. Kolloid-Z
68, 2–11 (1934).
197. Haber, C., Ruiz, S.A. & Wirtz, D. Shape anisotropy of a single random-
walk polymer. Proc. Natl. Acad. Sci. U. S. A. 97, 10792–5 (2000).
198. Clore, G. & Gronenborn, A. NMR structures of proteins and protein
complexes beyond 20,000 M(r). Nat. Struct. Biol. 4, 849–53 (1997).
199. Gillespie, J.R. & Shortle, D. Characterization of long-range structure in
the denatured state of staphylococcal nuclease. I. Paramagnetic relaxation
enhancement by nitroxide spin labels. J. Mol. Biol. 268, 158–69 (1997).
200. Crowhurst, K. & Forman-Kay, J. Aromatic and methyl NOEs highlight hy-
drophobic clustering in the unfolded state of an SH3 domain. Biochemistry
42, 8687–95 (2003).
201. Kristjansdottir, S., Lindorff-Larsen, K., Fieber, W., Dobson, C.M., Ven-
druscolo, M. & Poulsen, F.M. Formation of native and non-native in-
teractions in ensembles of denatured ACBP molecules from paramagnetic
relaxation enhancement studies. J. Mol. Biol. 347, 1053–62 (2005).
202. Lindorff-Larsen, K., Kristjansdottir, S., Teilum, K., Fieber, W., Dobson,
C.M., Poulsen, F.M. & Vendruscolo, M. Determination of an ensemble of
structures representing the denatured state of the bovine acyl-coenzyme A
binding protein. J. Am. Chem. Soc. 126, 3291–9 (2004).
203. Teilum, K., Kragelund, B.B. & Poulsen, F.M. Transient structure forma-
tion in unfolded acyl-coenzyme A-binding protein observed by site-directed
spin labelling. J. Mol. Biol. 324, 349–57 (2002).
204. Francis, C., Lindorff-Larsen, K., Best, R.B. & Vendruscolo, M.
Characterization of the residual structure in the unfolded state of the
∆131∆ fragment of staphylococcal nuclease. Proteins: Struct. Funct.
Bioinform. 65, 145–52 (2006).
205. Dedmon, M.M., Lindorff-Larsen, K., Christodoulou, J., Vendruscolo, M. &
Dobson, C.M. Mapping long-range interactions in α-synuclein using spin-
label NMR and ensemble molecular dynamics simulations. J. Am. Chem.
Soc. 127, 476–7 (2005).
206. Liang, B., Bushweller, J.H. & Tamm, L.K. Site-directed parallel spin-
labeling and paramagnetic relaxation enhancement in structure determi-
nation of membrane proteins by solution NMR spectroscopy. J. Am. Chem.
Soc. 128, 4389–97 (2006).
207. Iwahara, J., Schwieters, C.D. & Clore, G.M. Ensemble approach for
NMR structure refinement against 1H paramagnetic relaxation enhance-
ment data arising from a flexible paramagnetic group attached to a macro-
molecule. J. Am. Chem. Soc. 126, 5879–96 (2004).
208. Battiste, J.L. & Wagner, G. Utilization of site-directed spin labeling and
high-resolution heteronuclear nuclear magnetic resonance for global fold
determination of large proteins with limited nuclear Overhauser effect data.
Biochemistry 39, 5355–65 (2000).
209. Donaldson, L.W., Skrynnikov, N.R., Choy, W.Y., Muhandiram, D.R.,
Sarkar, B., Forman-Kay, J.D. & Kay, L.E. Structural characterization
of proteins with an attached ATCUN motif by paramagnetic relaxation
enhancement NMR spectroscopy. J. Am. Chem. Soc. 123, 9843–7 (2001).
210. Gaponenko, V., Howarth, J.W., Columbus, L., Gasmi-Seabrook, G., Yuan,
J., Hubbell, W.L. & Rosevear, P.R. Protein global fold determination using
site-directed spin and isotope labeling. Protein Sci. 9, 302–9 (2000).
211. Tang, C., Iwahara, J. & Clore, G.M. Visualization of transient encounter
complexes in protein-protein association. Nature 444, 383–6 (2006).
212. Voss, J., Salwinski, L., Kaback, H. & Hubbell, W. A method for distance
determination in proteins using a designed metal ion binding site and site-
directed spin labeling: evaluation with T4 lysozyme. Proc. Natl. Acad.
Sci. U. S. A. 92, 12295–9 (1995).
213. Iwahara, J. & Clore, G.M. Detecting transient intermediates in macro-
molecular binding by paramagnetic NMR. Nature 440, 1227–30 (2006).
214. Iwahara, J., Anderson, D., Murphy, E. & Clore, G. EDTA-derivatized de-
oxythymidine as a tool for rapid determination of protein binding polarity
to DNA by intermolecular paramagnetic relaxation enhancement. J. Am.
Chem. Soc. 125, 6634–5 (2003).
215. Mal, T., Ikura, M. & Kay, L. The ATCUN domain as a probe of inter-
molecular interactions: application to calmodulin-peptide complexes. J.
Am. Chem. Soc. 124, 14002–3 (2002).
216. Karim, C.B., Kirby, T.L., Zhang, Z., Nesmelov, Y. & Thomas, D.D. Phos-
pholamban structural dynamics in lipid bilayers probed by a spin label
rigidly coupled to the peptide backbone. Proc. Natl. Acad. Sci. U. S. A.
101, 14437–42 (2004).
217. Shenkarev, Z.O., Paramonov, A.S., Balashova, T.A., Yakimenko, Z.A.,
Baru, M.B., Mustaeva, L.G., Raap, J., Ovchinnikova, T.V. & Arseniev,
A.S. High stability of the hinge region in the membrane-active peptide helix
of zervamicin: paramagnetic relaxation enhancement studies. Biochem.
Biophys. Res. Comm. 325, 1099–105 (2004).
218. Milov, A.D., Tsvetkov, Y.D., Gorbunova, E.Y., Mustaeva, L.G., Ovchin-
nikova, T.V. & Raap, J. Self-aggregation properties of spin-labeled zer-
vamicin IIA as studied by PELDOR spectroscopy. Biopolymers 64, 328–36
(2002).
219. Johnson, P.E., Brun, E., MacKenzie, L.F., Withers, S.G. & McIntosh, L.P.
The cellulose-binding domains from Cellulomonas fimi β-1,4-glucanase
CenC bind nitroxide spin-labeled cellooligosaccharides in multiple orien-
tations. J. Mol. Biol. 287, 609–25 (1999).
220. Ueda, T., Kato, A., Ogawa, Y., Torizawa, T., Kuramitsu, S., Iwai, S.,
Terasawa, H. & Shimada, I. NMR study of repair mechanism of DNA pho-
tolyase by FAD-induced paramagnetic relaxation enhancement. J. Biol.
Chem. 279, 52574–9 (2004).
221. Roosild, T.P., Greenwald, J., Vega, M., Castronovo, S., Riek, R. & Choe,
S. NMR structure of mistic, a membrane-integrating protein for membrane
protein expression. Science 307, 1317–21 (2005).
222. Lietzow, M.A., Jamin, M., Dyson, H.J. & Wright, P.E. Mapping long-
range contacts in a highly unfolded protein. J. Mol. Biol. 322, 655–62
(2002).
223. Solomon, I. & Bloembergen, N. Nuclear magnetic interactions in the HF
molecule. J. Chem. Phys. 25, 261–6 (1956).
224. Gillespie, J.R. & Shortle, D. Characterization of long-range structure in
the denatured state of staphylococcal nuclease. II. Distance restraints from
paramagnetic relaxation and calculation of an ensemble of structures. J.
Mol. Biol. 268, 170–84 (1997).
225. Nadaud, P., Helmus, J., Hofer, N. & Jaroniec, C. Long-range structural
restraints in spin-labeled proteins probed by solid-state nuclear magnetic
resonance spectroscopy. J. Am. Chem. Soc. 129, 7502–3 (2007).
226. Lee, J., Langen, R., Hummel, P., Gray, H. & Winkler, J. α-synuclein
structures from fluorescence energy-transfer kinetics: implications for the
role of the protein in Parkinson’s disease. Proc. Natl. Acad. Sci. U. S. A.
101, 16466–71 (2004).
227. Lee, J.C., Gray, H.B. & Winkler, J.R. Tertiary contact formation in α-
synuclein probed by electron transfer. J. Am. Chem. Soc. 127, 16388–9
(2005).
228. Smith, L.J., Fiebig, K.M., Schwalbe, H. & Dobson, C.M. The concept of a
random coil. Residual structure in peptides and denatured proteins. Fold.
Des. 1, R95–106 (1996).
229. Tanford, C., Kawahara, K. & Lapanje, S. Proteins in 6-M guanidine hy-
drochloride. Demonstration of random coil behavior. J. Biol. Chem. 241,
1921–3 (1966).
230. Tanford, C. Protein denaturation. Adv. Prot. Chem. 23, 121–282 (1968).
231. McCarney, E.R., Kohn, J.E. & Plaxco, K.W. Is there or isn’t there? The
case for (and against) residual structure in chemically denatured proteins.
Crit. Rev. Biochem. Mol. Biol. 40, 181–9 (2005).
232. Kohn, J.E., Millett, I.S., Jacob, J., Zagrovic, B., Dillon, T.M., Cingel,
N., Dothager, R.S., Seifert, S., Thiyagarajan, P., Sosnick, T.R., Hasan,
M.Z., Pande, V.S., Ruczinski, I., Doniach, S. & Plaxco, K.W. Random-
coil behavior and the dimensions of chemically unfolded proteins. Proc.
Natl. Acad. Sci. U. S. A. 101, 12491–6 (2004).
233. Millett, I.S., Doniach, S. & Plaxco, K.W. Toward a taxonomy of the
denatured state: small angle scattering studies of unfolded proteins. Adv.
Prot. Chem. 62, 241–62 (2002).
234. Morar, A.S., Olteanu, A., Young, G.B. & Pielak, G.J. Solvent-induced
collapse of α-synuclein and acid-denatured cytochrome c. Protein Sci. 10,
2195–9 (2001).
235. Binolfi, A., Rasia, R.M., Bertoncini, C.W., Ceolin, M., Zweckstetter, M.,
Griesinger, C., Jovin, T.M. & Fernandez, C.O. Interaction of α-synuclein
with divalent metal ions reveals key differences: a link between structure,
binding specificity and fibrillation enhancement. J. Am. Chem. Soc. 128,
9893–901 (2006).
236. Shortle, D. & Ackerman, M.S. Persistence of native-like topology in a
denatured protein in 8 M urea. Science 293, 487–9 (2001).
237. Ohnishi, S. & Shortle, D. Effects of denaturants and substitutions of
hydrophobic residues on backbone dynamics of denatured staphylococcal
nuclease. Protein Sci. 12, 1530–7 (2003).
238. Ohnishi, S., Lee, A.L., Edgell, M.H. & Shortle, D. Direct demonstration of
structural similarity between native and denatured eglin C. Biochemistry
43, 4064–70 (2004).
239. Fieber, W., Kristjansdottir, S. & Poulsen, F.M. Short-range, long-range
and transition state interactions in the denatured state of ACBP from
residual dipolar couplings. J. Mol. Biol. 339, 1191–9 (2004).
240. Bertoncini, C.W., Jung, Y.S., Fernandez, C.O., Hoyer, W., Griesinger, C.,
Jovin, T.M. & Zweckstetter, M. Release of long-range tertiary interactions
potentiates aggregation of natively unstructured α-synuclein. Proc. Natl.
Acad. Sci. U. S. A. 102, 1430–5 (2005).
241. Dyson, H.J. & Wright, P.E. Defining solution conformations of small linear
peptides. Annu. Rev. Biophys. Biophys. Chem. 20, 519–38 (1991).
242. Mok, Y.K., Kay, C.M., Kay, L.E. & Forman-Kay, J. NOE data demon-
strating a compact unfolded state for an SH3 domain under non-denaturing
conditions. J. Mol. Biol. 289, 619–38 (1999).
243. Ackerman, M.S. & Shortle, D. Molecular alignment of denatured states of
staphylococcal nuclease with strained polyacrylamide gels and surfactant
liquid crystalline phases. Biochemistry 41, 3089–95 (2002).
244. Ackerman, M.S. & Shortle, D. Robustness of the long-range structure
in denatured staphylococcal nuclease to changes in amino acid sequence.
Biochemistry 41, 13791–7 (2002).
245. Mohana-Borges, R., Goto, N.K., Kroon, G.J., Dyson, H.J. & Wright, P.E.
Structural characterization of unfolded states of apomyoglobin using resid-
ual dipolar couplings. J. Mol. Biol. 340, 1131–42 (2004).
246. Shortle, D. The denatured state (the other half of the folding equation)
and its role in protein stability. FASEB J. 10, 27–34 (1996).
247. Neri, D., Billeter, M., Wider, G. & Wuthrich, K. NMR determination of
residual structure in a urea-denatured protein, the 434-repressor. Science
257, 1559–63 (1992).
248. Tsai, C.J., Ma, B., Sham, Y.Y., Kumar, S. & Nussinov, R. Structured
disorder and conformational selection. Proteins: Struct. Funct. Genet. 44,
418–27 (2001).
249. Shortle, D.R. Structural analysis of non-native states of proteins by NMR
methods. Curr. Opin. Struct. Biol. 6, 24–30 (1996).
250. Wrabl, J. & Shortle, D. A model of the changes in denatured state structure
underlying m value effects in staphylococcal nuclease. Nat. Struct. Biol.
6, 876–83 (1999).
251. Blanco, F.J., Serrano, L. & Forman-Kay, J.D. High populations of non-
native structures in the denatured state are compatible with the formation
of the native folded state. J. Mol. Biol. 284, 1153–64 (1998).
252. Wong, K.B., Freund, S.M.V. & Fersht, A.R. Cold denaturation of barstar:
1H, 15N and 13C NMR assignment and characterisation of residual struc-
ture. J. Mol. Biol. 259, 805–18 (1996).
253. Saab-Rincon, G., Gualfetti, P. & Matthews, C. Mutagenic and thermo-
dynamic analyses of residual structure in the α subunit of tryptophan
synthase. Biochemistry 35, 1988–94 (1996).
254. Ropson, I. & Frieden, C. Dynamic NMR spectral analysis and protein
folding: identification of a highly populated folding intermediate of rat
intestinal fatty acid-binding protein by 19F NMR. Proc. Natl. Acad. Sci.
U. S. A. 89, 7222–6 (1992).
255. Tran, H.T., Wang, X. & Pappu, R.V. Reconciling observations of sequence-
specific conformational propensities with the generic polymeric behavior
of denatured proteins. Biochemistry 44, 11369–80 (2005).
256. Pappu, R.V., Srinivasan, R. & Rose, G.D. The Flory isolated-pair hypoth-
esis is not valid for polypeptide chains: implications for protein folding.
Proc. Natl. Acad. Sci. U. S. A. 97, 12565–70 (2000).
257. Jha, A.K., Colubri, A., Freed, K.F. & Sosnick, T.R. Statistical coil model
of the unfolded state: resolving the reconciliation problem. Proc. Natl.
Acad. Sci. U. S. A. 102, 13099–104 (2005).
258. Fitzkee, N.C. & Rose, G.D. Reassessing random-coil statistics in unfolded
proteins. Proc. Natl. Acad. Sci. U. S. A. 101, 12497–502 (2004).
259. Zagrovic, B. & Pande, V.S. Structural correspondence between the α-
helix and the random-flight chain resolves how unfolded proteins can have
native-like properties. Nat. Struct. Biol. 10, 955–61 (2003).
260. Banavar, J.R., Hoang, T.X. & Maritan, A. Proteins and polymers. J.
Chem. Phys. 122, 234910–4 (2005).
261. Banavar, J.R., Cieplak, M., Flammini, A., Hoang, T.X., Kamien, R.D.,
Lezon, T., Marenduzzo, D., Maritan, A., Seno, F., Snir, Y. & Trovato, A.
Geometry of proteins: hydrogen bonding, sterics, and marginally compact
tubes. Phys. Rev. E 73, 031921–5 (2006).
262. Marenduzzo, D., Hoang, T.X., Seno, F., Vendruscolo, M. & Maritan, A.
Form of growing strings. Phys. Rev. Lett. 95, 098103–4 (2005).
263. Hoang, T.X., Marsella, L., Trovato, A., Seno, F., Banavar, J.R. & Maritan,
A. Common attributes of native-state structures of proteins, disordered
proteins, and amyloid. Proc. Natl. Acad. Sci. U. S. A. 103, 6883–8 (2006).
264. Petrescu, A., Calmettes, P., Durand, D., Receveur, V. & Smith, J. Change
in backbone torsion angle distribution on protein folding. Protein Sci. 9,
1129–36 (2000).
265. Jha, A., Colubri, A., Zaman, M., Koide, S., Sosnick, T. & Freed, K. Helix,
sheet, and polyproline II frequencies and strong nearest neighbor effects in
a restricted coil library. Biochemistry 44, 9691–702 (2005).
266. Bernado, P., Bertoncini, C.W., Griesinger, C., Zweckstetter, M. & Black-
ledge, M. Defining long-range order and local disorder in native α-synuclein
using residual dipolar couplings. J. Am. Chem. Soc. 127, 17968–9 (2005).
267. Zaman, M.H., Shen, M.Y., Berry, R.S., Freed, K.F. & Sosnick, T.R. In-
vestigations into sequence and conformational dependence of backbone
entropy, inter-basin dynamics and the Flory isolated-pair hypothesis for
peptides. J. Mol. Biol. 331, 693–711 (2003).
268. Cho, M.K., Kim, H.Y., Bernado, P., Fernandez, C., Blackledge, M. &
Zweckstetter, M. Amino acid bulkiness defines the local conformations
and dynamics of natively unfolded α-synuclein and tau. J. Am. Chem.
Soc. 129, 3032–3 (2007).
269. Skora, L., Cho, M.K., Kim, H., Becker, S., Fernandez, C.O., Blackledge,
M. & Zweckstetter, M. Charge-induced molecular alignment of intrinsically
disordered proteins. Angew. Chem. Int. Ed. 45, 7012–15 (2006).
270. van Gunsteren, W.F., Bakowies, D., Baron, R., Chandrasekhar, I., Chris-
ten, M., Daura, X., Gee, P., Geerke, D.P., Glattli, A., Hunenberger, P.H.,
Kastenholz, M.A., Oostenbrink, C., Schenk, M., Trzesniak, D., van der
Vegt, N.F.A. & Yu, H.B. Biomolecular modeling: goals, problems, per-
spectives. Angew. Chem. Int. Ed. 45, 4064–92 (2006).
271. Brooks, B., Bruccoleri, R., Olafson, B., States, D., Swaminathan, S. &
Karplus, M. CHARMM: a program for macromolecular energy, minimiza-
tion, and dynamics calculations. J. Comput. Chem. 4, 187–217 (1983).
272. MacKerell, A.D., Jr. Empirical force fields for biological macromolecules:
overview and issues. J. Comput. Chem. 25, 1584–604 (2004).
273. Jorgensen, W.L. & Tirado-Rives, J. Potential energy functions for atomic-
level simulations of water and organic and biomolecular systems. Proc.
Natl. Acad. Sci. U. S. A. 102, 6665–70 (2005).
274. Wang, W., Donini, O., Reyes, C.M. & Kollman, P.A. Biomolecular simula-
tions: recent developments in force fields, simulations of enzyme catalysis,
protein-ligand, protein-protein, and protein-nucleic acid noncovalent inter-
actions. Annu. Rev. Biophys. Biomol. Struct. 30, 211–43 (2001).
275. Lazaridis, T., Archontis, G. & Karplus, M. Enthalpic contribution to pro-
tein stability: insights from atom-based calculations and statistical me-
chanics. Adv. Prot. Chem. 46, 213–306 (1995).
276. Roux, B. & Simonson, T. Implicit solvent models. Biophys. Chem. 78,
1–20 (1999).
277. Feig, M. & Brooks, C.L. Recent advances in the development and appli-
cation of implicit solvent models in biomolecule simulations. Curr. Opin.
Struct. Biol. 14, 217–24 (2004).
278. Lazaridis, T. & Karplus, M. Effective energy functions for protein structure
prediction. Curr. Opin. Struct. Biol. 10, 139–45 (2000).
279. Im, W., Chen, J. & Brooks, C.L., III. Peptide and protein folding and con-
formational equilibria: theoretical treatment of electrostatics and hydrogen
bonding with implicit solvent models (2005).
280. Lazaridis, T. & Karplus, M. Effective energy function for proteins in
solution. Proteins: Struct. Funct. Genet. 35, 133–52 (1999).
281. Zagrovic, B. & Pande, V.S. Solvent viscosity dependence of the folding
rate of a small protein: distributed computing study. J. Comput. Chem.
24, 1432–6 (2003).
282. Dominy, B.N. & Brooks, C.L., III. Identifying native-like protein structures
using physics-based potentials. J. Comput. Chem. 23, 147–60 (2002).
283. Felts, A.K., Gallicchio, E., Wallqvist, A. & Levy, R.M. Distinguishing
native conformations of proteins from decoys with an effective free energy
estimator based on the OPLS all-atom force field and the surface gener-
alized Born solvent model. Proteins: Struct. Funct. Genet. 48, 404–22
(2002).
284. Feig, M. & Brooks, C.L., III. Evaluating CASP4 predictions with physical energy
functions. Proteins: Struct. Funct. Genet. 49, 232–45 (2002).
285. Zhu, J., Zhu, Q., Shi, Y. & Liu, H. How well can we predict native contacts
in proteins based on decoy structures and their energies? Proteins: Struct.
Funct. Genet. 52, 598–608 (2003).
286. Forrest, L.R. & Woolf, T.B. Discrimination of native loop conformations in
membrane proteins: decoy library design and evaluation of effective energy
scoring functions. Proteins: Struct. Funct. Genet. 52, 492–509 (2003).
287. Fiser, A., Feig, M., Brooks, C. & Sali, A. Evolution and physics in com-
parative protein structure modeling. Acc. Chem. Res. 35, 413–21 (2002).
288. Lazaridis, T. & Karplus, M. Discrimination of the native from misfolded
protein models with an energy function including implicit solvation. J.
Mol. Biol. 288, 477–87 (1999).
289. Ramos, J. & Lazaridis, T. Energetic determinants of oligomeric state
specificity in coiled coils. J. Am. Chem. Soc. 128, 15499–510 (2006).
290. Donnini, S. & Juffer, A.H. Calculation of affinities of peptides for proteins.
J. Comput. Chem. 25, 393–411 (2004).
291. Lazaridis, T. Binding affinity and specificity from computational studies.
Curr. Org. Chem. 6, 1319–32 (2002).
292. Mardis, K.L., Luo, R. & Gilson, M.K. Interpreting trends in the binding
of cyclic ureas to HIV-1 protease. J. Mol. Biol. 309, 507–17 (2001).
293. Ferrara, P., Gohlke, H., Price, D., Klebe, G. & Brooks, C. Assessing
scoring functions for protein-ligand interactions. J. Med. Chem. 47, 3032–
47 (2004).
294. Gohlke, H. & Case, D.A. Converging free energy estimates: MM-
PB(GB)SA studies on the protein-protein complex Ras-Raf. J. Comput.
Chem. 25, 238–50 (2004).
295. Ferrara, P. & Caflisch, A. Folding simulations of a three-stranded antipar-
allel β-sheet peptide. Proc. Natl. Acad. Sci. U. S. A. 97, 10780–5 (2000).
296. Lazaridis, T. & Karplus, M. “New view” of protein folding reconciled with
the old through multiple unfolding simulations. Science 278, 1928–31
(1997).
297. Paci, E., Vendruscolo, M. & Karplus, M. Native and non-native inter-
actions along protein folding and unfolding pathways. Proteins: Struct.
Funct. Genet. 47, 379–92 (2002).
298. Settanni, G., Gsponer, J. & Caflisch, A. Formation of the folding nucleus
of an SH3 domain investigated by loosely coupled molecular dynamics
simulations. Biophys. J. 86, 1691–701 (2004).
299. Gsponer, J. & Caflisch, A. Molecular dynamics simulations of protein
folding from the transition state. Proc. Natl. Acad. Sci. U. S. A. 99,
6719–24 (2002).
300. Gsponer, J. & Caflisch, A. Role of native topology investigated by multiple
unfolding simulations of four SH3 domains. J. Mol. Biol. 309, 285–98
(2001).
301. Zhu, J., Shi, Y. & Liu, H. Parametrization of a generalized Born/solvent-
accessible surface area model and applications to the simulation of protein
dynamics. J. Phys. Chem. B 106, 4844–53 (2002).
302. Dominy, B. & Brooks, C. Development of a generalized Born model
parametrization for proteins and nucleic acids. J. Phys. Chem. B 103,
3765–73 (1999).
303. Calimet, N., Schaefer, M. & Simonson, T. Protein molecular dynamics
with the generalized Born/ACE solvent model. Proteins: Struct. Funct.
Genet. 45, 144–58 (2001).
304. Shen, M.Y. & Freed, K.F. Long time dynamics of met-enkephalin: com-
parison of explicit and implicit solvent models. Biophys. J. 82, 1791–808
(2002).
305. Krol, M. Comparison of various implicit solvent models in molecular dy-
namics simulations of immunoglobulin G light chain dimer. J. Comput.
Chem. 24, 531–46 (2003).
306. Wang, T. & Wade, R.C. Implicit solvent models for flexible protein-protein
docking by molecular dynamics simulation. Proteins: Struct. Funct. Genet.
50, 158–69 (2003).
307. Paci, E., Gsponer, J., Salvatella, X. & Vendruscolo, M. Molecular dynam-
ics studies of the process of amyloid aggregation of peptide fragments of
transthyretin. J. Mol. Biol. 340, 555–69 (2004).
308. Gsponer, J., Haberthur, U. & Caflisch, A. The role of side-chain interac-
tions in the early steps of aggregation: molecular dynamics simulations of
an amyloid-forming peptide from the yeast prion Sup35. Proc. Natl. Acad.
Sci. U. S. A. 100, 5154–9 (2003).
309. Rao, F. & Caflisch, A. The protein folding network. J. Mol. Biol. 342,
299–306 (2004).
310. Bursulaya, B. & Brooks, C. Comparative study of the folding free energy
landscape of a three-stranded β-sheet protein with explicit and implicit
solvent models. J. Phys. Chem. B 104, 12378–83 (2000).
311. Gnanakaran, S., Nymeyer, H., Portman, J., Sanbonmatsu, K. & Garcia, A.
Peptide folding simulations. Curr. Opin. Struct. Biol. 13, 168–74 (2003).
312. Zhou, R. & Berne, B.J. Can a continuum solvent model reproduce the free
energy landscape of a β-hairpin folding in water? Proc. Natl. Acad. Sci.
U. S. A. 99, 12777–82 (2002).
313. Pitera, J.W. & Swope, W. Understanding folding and design: replica-
exchange simulations of “Trp-cage” miniproteins. Proc. Natl. Acad. Sci.
U. S. A. 100, 7587–92 (2003).
314. He, J., Zhang, Z., Shi, Y. & Liu, H. Efficiently explore the energy landscape
of proteins in molecular dynamics simulations by amplifying collective mo-
tions. J. Chem. Phys. 119, 4005–17 (2003).
315. Zagrovic, B., Sorin, E.J. & Pande, V. Beta-hairpin folding simulations in
atomistic detail using an implicit solvent model. J. Mol. Biol. 313, 151–69
(2001).
316. Zhou, R. Free energy landscape of protein folding in water: explicit vs.
implicit solvent. Proteins: Struct. Funct. Genet. 53, 148–61 (2003).
317. Suenaga, A. Replica-exchange molecular dynamics simulations for a small-
sized protein folding with implicit solvent. J. Mol. Struct. 634, 235–41
(2003).
318. Liu, Y. & Beveridge, D.L. Exploratory studies of ab initio protein structure
prediction: multiple copy simulated annealing, AMBER energy functions,
and a generalized Born/solvent accessibility solvation model. Proteins:
Struct. Funct. Genet. 46, 128–46 (2002).
319. Ohkubo, Y.Z. & Brooks, C.L., III. Exploring Flory’s isolated-pair
hypothesis: statistical mechanics of helix-coil transitions in polyalanine
and the C-peptide from RNase A. Proc. Natl. Acad. Sci. U. S. A. 100,
13916–21 (2003).
320. Karanicolas, J. & Brooks, C.L., III. Integrating folding kinetics and
protein function: biphasic kinetics and dual binding specificity in a WW
domain. Proc. Natl. Acad. Sci. U. S. A. 101, 3432–7 (2004).
321. Lin, C.Y., Hu, C.K. & Hansmann, U.H.E. Parallel tempering simulations
of HP-36. Proteins: Struct. Funct. Genet. 52, 436–45 (2003).
322. Alves, N. & Hansmann, U. Solution effects and the folding of an artificial
peptide. J. Phys. Chem. B 107, 10284–91 (2003).
323. Rao, F. & Caflisch, A. Replica exchange molecular dynamics simulations
of reversible folding. J. Chem. Phys. 119, 4035–42 (2003).
324. Xia, B., Tsui, V., Case, D.A., Dyson, H.J. & Wright, P.E. Comparison
of protein solution structures refined by molecular dynamics simulation
in vacuum, with a generalized Born model, and with explicit water. J.
Biomol. NMR 22, 317–31 (2002).
325. Moulinier, L., Case, D.A. & Simonson, T. Reintroducing electrostatics into
protein X-ray structure refinement: bulk solvent treated as a dielectric
continuum. Acta Cryst. D59, 2094–103 (2003).
326. Gsponer, J., Hopearuoho, H., Whittaker, S.B.M., Spence, G.R., Moore,
G.R., Paci, E., Radford, S.E. & Vendruscolo, M. Determination of an
ensemble of structures representing the intermediate state of the bacterial
immunity protein Im7. Proc. Natl. Acad. Sci. U. S. A. 103, 99–104 (2006).
327. Paci, E., Greene, L.H., Jones, R.M. & Smith, L.J. Characterization of the
molten globule state of retinol-binding protein using a molecular dynamics
simulation approach. FEBS J. 272, 4826–38 (2005).
328. Best, R.B. & Vendruscolo, M. Determination of protein structures consis-
tent with NMR order parameters. J. Am. Chem. Soc. 126, 8090–1 (2004).
329. Daura, X., Antes, I., van Gunsteren, W.F., Thiel, W. & Mark, A.E. The
effect of motional averaging on the calculation of NMR-derived structural
properties. Proteins: Struct. Funct. Genet. 36, 542–55 (1999).
330. Kemmink, J. & Scheek, R. Dynamic modeling of a helical peptide in
solution using NMR data - multiple conformations and multi-spin effects.
J. Biomol. NMR 6, 33–40 (1995).
331. Bonvin, A.M. & Brunger, A.T. Conformational variability of solution
nuclear magnetic resonance structures. J. Mol. Biol. 250, 80–93 (1995).
332. Torda, A., Scheek, R. & van Gunsteren, W.F. Time-dependent distance
restraints in molecular dynamics simulations. Chem. Phys. Lett. 157, 289–
94 (1989).
333. Torda, A.E., Scheek, R.M. & van Gunsteren, W.F. Time-averaged nuclear
Overhauser effect distance restraints applied to tendamistat. J. Mol. Biol.
214, 223–35 (1990).
334. Torda, A.E., Brunne, R.M., Huber, T., Kessler, H. & van Gunsteren, W.F.
Structure refinement using time-averaged J-coupling constant restraints.
J. Biomol. NMR 3, 55–66 (1993).
335. Bonvin, A., Boelens, R. & Kaptein, R. Time-averaged and ensemble aver-
aged direct NOE restraints. J. Biomol. NMR 4, 143–9 (1994).
336. Vendruscolo, M., Paci, E., Dobson, C.M. & Karplus, M. Rare fluctuations
of native proteins sampled by equilibrium hydrogen exchange. J. Am.
Chem. Soc. 125, 15686–7 (2003).
337. Vendruscolo, M. & Dobson, C.M. Towards complete descriptions of the
free-energy landscapes of proteins. Philos. Transact. A Math Phys. Eng.
Sci. 363, 433–52 (2005).
338. Lindorff-Larsen, K., Best, R.B., Depristo, M.A., Dobson, C.M. & Vendr-
uscolo, M. Simultaneous determination of protein structure and dynamics.
Nature 433, 128–32 (2005).
339. Clore, G.M. & Schwieters, C.D. How much backbone motion in ubiquitin
is required to account for dipolar coupling data measured in multiple align-
ment media as assessed by independent cross-validation? J. Am. Chem.
Soc. 126, 2923–38 (2004).
340. Clore, G.M. & Schwieters, C.D. Amplitudes of protein backbone dynamics
and correlated motions in a small α/β protein: correspondence of dipolar
coupling and heteronuclear relaxation measurements. Biochemistry 43,
10678–91 (2004).
341. Clore, G.M. & Schwieters, C.D. Concordance of residual dipolar couplings,
backbone order parameters and crystallographic B-factors for a small α/β
protein: a unified picture of high probability, fast atomic motions in pro-
teins. J. Mol. Biol. 355, 879–86 (2006).
342. Hess, B. & Scheek, R.M. Orientation restraints in molecular dynamics
simulations using time and ensemble averaging. J. Magn. Reson. 164,
19–27 (2003).
343. Gsponer, J., Hopearuoho, H., Cavalli, A., Dobson, C. & Vendruscolo, M.
Geometry, energetics, and dynamics of hydrogen bonds in proteins: struc-
tural information derived from NMR scalar couplings. J. Am. Chem. Soc.
128, 15127–35 (2006).
344. Richter, B., Gsponer, J., Varnai, P., Salvatella, X. & Vendruscolo, M. The
MUMO (minimal under-restraining minimal over-restraining) method for
the determination of native state ensembles of proteins. J. Biomol. NMR
37, 117–35 (2007).
345. Fennen, J., Torda, A.E. & van Gunsteren, W.F. Structure refinement with
molecular dynamics and a Boltzmann-weighted ensemble. J. Biomol. NMR
6, 163–70 (1995).
346. Vendruscolo, M. & Paci, E. Protein folding: bringing theory and experi-
ment closer together. Curr. Opin. Struct. Biol. 13, 82–7 (2003).
347. Kuszewski, J., Gronenborn, A. & Clore, G. Improving the packing and
accuracy of NMR structures with a pseudopotential for the radius of gy-
ration. J. Am. Chem. Soc. 121, 2337–8 (1999).
348. Nose, S. A unified formulation of the constant temperature molecular
dynamics methods. J. Chem. Phys. 81, 511–9 (1984).
349. Hoover, W.G. Canonical dynamics: equilibrium phase-space distributions.
Phys. Rev. A 31, 1695–7 (1985).
350. Ryckaert, J.P., Ciccotti, G. & Berendsen, H.J.C. Numerical integration of
the Cartesian equations of motion of a system with constraints: molecular
dynamics of n-alkanes. J. Comput. Phys. 23, 327–41 (1977).
351. MacKerell, A., Bashford, D., Bellott, M., Dunbrack, R., Evanseck, J.,
Field, M., Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D.,
Kuchnir, L., Kuczera, K., Lau, F., Mattos, C., Michnick, S., Ngo, T.,
Nguyen, D., Prodhom, B., Reiher, W., Roux, B., Schlenkrich, M., Smith,
J., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera, J., Yin, D.
& Karplus, M. All-atom empirical potential for molecular modeling and
dynamics studies of proteins. J. Phys. Chem. B 102, 3586–616 (1998).
352. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W. & Klein,
M.L. Comparison of simple potential functions for simulating liquid water.
J. Chem. Phys. 79, 926–35 (1983).
353. Im, W., Lee, M.S. & Brooks, C.L., III. Generalized Born model with a
simple smoothing function. J. Comput. Chem. 24, 1691–702 (2003).
354. Im, W., Feig, M. & Brooks, C.L., III. An implicit membrane generalized
Born theory for the study of structure, stability, and interactions of
membrane proteins. Biophys. J. 85, 2900–18 (2003).
355. Ferrara, P., Apostolakis, J. & Caflisch, A. Evaluation of a fast implicit
solvent model for molecular dynamics simulations. Proteins: Struct. Funct.
Genet. 46, 24–33 (2002).
356. Garcia de la Torre, J., Huertas, M.L. & Carrasco, B. Calculation of hydro-
dynamic properties of globular proteins from their atomic-level structure.
Biophys. J. 78, 719–30 (2000).
357. Karplus, M. Contact electron-spin coupling of nuclear magnetic moments.
J. Chem. Phys. 30, 11–5 (1959).
358. Pardi, A., Billeter, M. & Wuthrich, K. Calibration of the angular
dependence of the amide proton-Cα proton coupling constants, ³JHNα, in
a globular protein: use of ³JHNα for identification of helical secondary
structure. J. Mol. Biol. 180, 741–51 (1984).
359. Bax, A. Weak alignment offers new NMR opportunities to study protein
structure and dynamics. Protein Sci. 12, 1–16 (2003).
360. Zhou, H.X. Dimensions of denatured protein chains from hydrodynamic
data. J. Phys. Chem. B 106, 5769–75 (2002).
361. Lacroix, E., Viguera, A.R. & Serrano, L. Elucidating the folding problem of
α-helices: local motifs, long-range electrostatics, ionic-strength dependence
and prediction of NMR parameters. J. Mol. Biol. 284, 173–91 (1998).
362. Munoz, V. & Serrano, L. Development of the multiple sequence approxi-
mation within the AGADIR model of α-helix formation: Comparison with
Zimm-Bragg and Lifson-Roig formalisms. Biopolymers 41, 495–509 (1997).
363. Munoz, V. & Serrano, L. Elucidating the folding problem of helical pep-
tides using empirical parameters. II. Helix macrodipole effects and rational
modification of the helical content of natural peptides. J. Mol. Biol. 245,
275–96 (1995).
364. Munoz, V. & Serrano, L. Elucidating the folding problem of helical
peptides using empirical parameters. III. Temperature and pH dependence. J.
Mol. Biol. 245, 297–308 (1995).
365. Kyte, J. & Doolittle, R.F. A simple method for displaying the hydropathic
character of a protein. J. Mol. Biol. 157, 105–32 (1982).
366. Pawar, A.P., Dubay, K.F., Zurdo, J., Chiti, F., Vendruscolo, M.
& Dobson, C.M. Prediction of “aggregation-prone” and
“aggregation-susceptible” regions in proteins associated with
neurodegenerative diseases. J. Mol. Biol. 350, 379–92 (2005).
367. DuBay, K.F., Pawar, A.P., Chiti, F., Zurdo, J., Dobson, C.M. & Vendr-
uscolo, M. Prediction of the absolute aggregation rates of amyloidogenic
polypeptide chains. J. Mol. Biol. 341, 1317–26 (2004).
368. Srinivasan, J., Cheatham, T., Cieplak, P., Kollman, P. & Case, D. Contin-
uum solvent studies of the stability of DNA, RNA, and phosphoramidate-
DNA helices. J. Am. Chem. Soc. 120, 9401–9 (1998).
369. Geney, R., Layten, M., Gomperts, R., Hornak, V. & Simmerling, C. In-
vestigation of salt bridge stability in a generalized Born solvent model. J.
Chem. Theory Comput. 2, 115–27 (2006).
370. Born, M. Volumen und Hydratationswärme der Ionen. Z. Phys. 1, 45–8
(1920).
371. Still, W.C., Tempczyk, A., Hawley, R.C. & Hendrickson, T. Semianalytical
treatment of solvation for molecular mechanics and dynamics. J. Am.
Chem. Soc. 112, 6127–9 (1990).
372. David, L., Luo, R. & Gilson, M.K. Comparison of generalized Born and
Poisson models: energetics and dynamics of HIV protease. J. Comput.
Chem. 21, 295–309 (2000).
373. Luo, R., David, L. & Gilson, M.K. Accelerated Poisson-Boltzmann calcu-
lations for static and dynamic systems. J. Comput. Chem. 23, 1244–53
(2002).
374. Im, W., Beglov, D. & Roux, B. Continuum solvation model: computation
of electrostatic forces from numerical solutions to the Poisson-Boltzmann
equation. Comput. Phys. Comm. 111, 59–75 (1998).
375. Humphrey, W., Dalke, A. & Schulten, K. VMD: Visual molecular dynam-
ics. J. Mol. Graph. 14, 33–8 (1996).
376. Simmerling, C., Strockbine, B. & Roitberg, A. All-atom structure predic-
tion and folding simulations of a stable protein. J. Am. Chem. Soc. 124,
11258–9 (2002).
377. Felts, A.K., Harano, Y., Gallicchio, E. & Levy, R.M. Free energy surfaces
of β-hairpin and α-helical peptides generated by replica exchange molecular
dynamics with the AGBNP implicit solvent model. Proteins: Struct.
Funct. Genet. 56, 310–21 (2004).
378. Formaneck, M.S. & Cui, Q. The use of a generalized Born model for the
analysis of protein conformational transitions: a comparative study with
explicit solvent simulations for chemotaxis Y protein (CheY). J. Comput.
Chem. 27, 1923–43 (2006).
379. Zimmermann, K., Hagedorn, H., Heuck, C., Hinrichsen, M. & Ludwig, H.
The ionic properties of the filamentous bacteriophages Pf1 and fd. J. Biol.
Chem. 261, 1653–5 (1986).
380. Zagrovic, B. & van Gunsteren, W.F. Comparing atomistic simulation data
with the NMR experiment: how much can NOEs actually tell us? Proteins:
Struct. Funct. Bioinform. 63, 210–8 (2006).
381. Brunger, A.T., Clore, G.M., Gronenborn, A.M., Saffrich, R. & Nilges, M.
Assessing the quality of solution nuclear magnetic resonance structures by
complete cross-validation. Science 261, 328–31 (1993).
382. Brunger, A.T. Assessment of phase accuracy by cross validation: the free
R value. Methods and applications. Acta Crystallogr. D Biol. Crystallogr.
49, 24–36 (1993).
383. Burling, F.T., Weis, W.I., Flaherty, K.M. & Brunger, A.T. Direct obser-
vation of protein solvation and discrete disorder with experimental crys-
tallographic phases. Science 271, 72–7 (1996).
384. Brunger, A.T. Free R value: a novel statistical quantity for assessing the
accuracy of crystal structures. Nature 355, 472–5 (1992).
385. Vendruscolo, M. Determination of conformationally heterogeneous states
of proteins. Curr. Opin. Struct. Biol. 17, 15–20 (2007).
386. Burgi, R., Pitera, J. & van Gunsteren, W.F. Assessing the effect of con-
formational averaging on the measured values of observables. J. Biomol.
NMR 19, 305–20 (2001).
387. Choy, W.Y., Mulder, F.A., Crowhurst, K.A., Muhandiram, D.R., Millett,
I.S., Doniach, S., Forman-Kay, J.D. & Kay, L.E. Distribution of molecular
size within an unfolded state ensemble using small-angle X-ray scattering
and pulse field gradient NMR techniques. J. Mol. Biol. 316, 101–12 (2002).
388. McHaourab, H.S., Lietzow, M.A., Hideg, K. & Hubbell, W.L. Motion of
spin-labeled side chains in T4 lysozyme. Correlation with protein structure
and dynamics. Biochemistry 35, 7692–704 (1996).
389. Langen, R., Oh, K.J., Cascio, D. & Hubbell, W.L. Crystal structures of
spin labeled T4 lysozyme mutants: implications for the interpretation of
EPR spectra in terms of structure. Biochemistry 39, 8396–405 (2000).
390. Jiao, D., Barfield, M., Combariza, J.E. & Hruby, V.J. Ab initio molecular
orbital studies of the rotational barriers and the sulfur-33 and carbon-13
chemical shieldings for dimethyl disulfide. J. Am. Chem. Soc. 114, 3639–43
(1992).
391. Altenbach, C., Oh, K.J., Trabanino, R.J., Hideg, K. & Hubbell, W.L. Es-
timation of inter-residue distances in spin labeled proteins at physiological
temperatures: experimental strategies and practical limitations. Biochem-
istry 40, 15471–82 (2001).
392. Rabenstein, M.D. & Shin, Y.K. Determination of the distance between
two spin labels attached to a macromolecule. Proc. Natl. Acad. Sci. U. S.
A. 92, 8239–43 (1995).
393. Svergun, D., Barberato, C. & Koch, M.H.J. CRYSOL - a program to eval-
uate X-ray solution scattering of biological macromolecules from atomic
coordinates. J. Appl. Cryst. 28, 768–73 (1995).
394. Lipari, G. & Szabo, A. Model-free approach to the interpretation of nuclear
magnetic resonance relaxation in macromolecules. 1. Theory and range of
validity. J. Am. Chem. Soc. 104, 4546–59 (1982).
395. Lipari, G. & Szabo, A. Model-free approach to the interpretation of nu-
clear magnetic resonance relaxation in macromolecules. 2. Analysis of ex-
perimental results. J. Am. Chem. Soc. 104, 4559–70 (1982).
396. Woessner, D.E. Nuclear spin relaxation in ellipsoids undergoing rotational
Brownian motion. J. Chem. Phys. 37, 647–54 (1962).
397. Peng, J.W. & Wagner, G. Mapping of the spectral densities of nitrogen-
hydrogen bond motions in Eglin c using heteronuclear relaxation experi-
ments. Biochemistry 31, 8571–86 (1992).
398. Solomon, I. Relaxation processes in a system of two spins. Phys. Rev. 99,
559–65 (1955).
399. Bloembergen, N. Proton relaxation times in paramagnetic solutions. J.
Chem. Phys. 27, 572–3 (1957).
400. Zagrovic, B., Lipfert, J., Sorin, E.J., Millett, I.S., van Gunsteren, W.F.,
Doniach, S. & Pande, V.S. Unusual compactness of a polyproline type II
structure. Proc. Natl. Acad. Sci. U. S. A. 102, 11698–703 (2005).
401. Zagrovic, B. & Pande, V.S. How does averaging affect protein structure
comparison on the ensemble level? Biophys. J. 87, 2240–6 (2004).
402. Bussell, R., Jr. & Eliezer, D. Residual structure and dynamics in
Parkinson’s disease-associated mutants of α-synuclein. J. Biol. Chem. 276,
45996–6003 (2001).
403. Necula, M., Chirita, C.N. & Kuret, J. Rapid anionic micelle-mediated
α-synuclein fibrillization in vitro. J. Biol. Chem. 278, 46674–80 (2003).
404. Ahmad, M.F., Ramakrishna, T., Raman, B. & Rao, C.M. Fibrillogenic
and non-fibrillogenic ensembles of SDS-bound human α-synuclein. J. Mol.
Biol. 364, 1061–72 (2006).
405. Antony, T., Hoyer, W., Cherny, D., Heim, G., Jovin, T.M. & Subrama-
niam, V. Cellular polyamines promote the aggregation of α-synuclein. J.
Biol. Chem. 278, 3235–40 (2003).
406. Fernández, C.O., Hoyer, W., Zweckstetter, M., Jares-Erijman, E.,
Subramaniam, V., Griesinger, C. & Jovin, T.M. NMR of α-synuclein-polyamine
complexes elucidates the mechanism and kinetics of induced aggregation.
EMBO J. 23, 2039–46 (2004).
407. Uversky, V.N., Li, J. & Fink, A.L. Evidence for a partially folded
intermediate in α-synuclein fibril formation. J. Biol. Chem. 276, 10737–44
(2001).
408. Eliezer, D., Chung, J., Dyson, H. & Wright, P. Native and non-native sec-
ondary structure and dynamics in the pH 4 intermediate of apomyoglobin.
Biochemistry 39, 2894–901 (2000).
409. Katou, H., Hoshino, M., Kamikubo, H., Batt, C.A. & Goto, Y. Native-like
β-hairpin retained in the cold-denatured state of bovine β-lactoglobulin.
J. Mol. Biol. 310, 471–84 (2001).
410. Birkett, N. Studies of the formation and characterisation of amyloid fibrils
by the PI3-SH3 domain. PhD Thesis (2007).
411. Ahn, H.C., Le, Y.T., Nagchowdhuri, P.S., Derose, E.F., Putnam-Evans, C.,
London, R.E., Markley, J.L. & Lim, K.H. NMR characterizations of an
amyloidogenic conformational ensemble of the PI3K SH3 domain. Protein
Sci. 15, 2552–7 (2006).
412. Xu, W., Harrison, S.C. & Eck, M.J. Three-dimensional structure of the
tyrosine kinase c-Src. Nature 385, 595–602 (1997).
413. Noble, M.E., Musacchio, A., Saraste, M., Courtneidge, S. & Wierenga,
R. Crystal structure of the SH3 domain in human Fyn; comparison of
the three-dimensional structures of SH3 domains in tyrosine kinases and
spectrin. EMBO J. 12, 2617–24 (1993).
414. Martinez, J.C., Pisabarro, M.T. & Serrano, L. Obligatory steps in protein
folding and the conformational diversity of the transition state. Nat. Struct.
Biol. 5, 721–9 (1998).
415. Booth, D.R., Sunde, M., Bellotti, V., Robinson, C.V., Hutchinson, W.L.,
Fraser, P.E., Hawkins, P.N., Dobson, C.M., Radford, S.E., Blake, C.C.
& Pepys, M.B. Instability, unfolding and aggregation of human lysozyme
variants underlying amyloid fibrillogenesis. Nature 385, 787–93 (1997).
416. Horwich, A.L. & Weissman, J.S. Deadly conformations-protein misfolding
in prion disease. Cell 89, 499–510 (1997).
417. Uversky, V.N. & Fink, A.L. Conformational constraints for amyloid fibril-
lation: the importance of being unfolded. Biochim. Biophys. Acta 1698,
131–53 (2004).
418. Liu, K., Cho, H.S., Lashuel, H.A., Kelly, J.W. & Wemmer, D.E. A glimpse
of a possible amyloidogenic intermediate of transthyretin. Nat. Struct.
Biol. 7, 754–7 (2000).
419. McParland, V.J., Kalverda, A.P., Homans, S.W. & Radford, S.E. Struc-
tural properties of an amyloid precursor of β2-microglobulin. Nat. Struct.
Biol. 9, 326–31 (2002).