Jra Phd Final 051107

Computational Methods for Characterising Disordered States of Proteins Jane R. Allison A dissertation submitted for the degree of Doctor of Philosophy Trinity College University of Cambridge 13 September 2007

Computational Methods for Characterising Disordered States of Proteins

Jane R. Allison

A dissertation submitted for the degree of Doctor of Philosophy

Trinity College

University of Cambridge

13 September 2007

Declaration

The research outlined in this dissertation was carried out by its author at the Department of Chemistry of the University of Cambridge between October 2004 and September 2007. The work described herein is the original work of the author and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. It has not previously been submitted to any institution for any qualification or degree. The length of this dissertation does not exceed the word limit.

Jane Allison

Cambridge, England

September 2007

Acknowledgements

I would like to extend my thanks to everyone who helped me with the writing of this thesis; it is impossible to mention you all individually. In particular, however, I acknowledge Chris Dobson for accepting me into his group and allowing me the academic and personal freedom that has made my time in Cambridge so special. Michele Vendruscolo provided more hands-on supervision, including an inexhaustible supply of ideas and admirable patience regarding my complete ignorance of statistical mechanics. Peter Varnai deserves mention for always taking the time to read and discuss my work and ask me a myriad of questions. Barbara Richter also provided invaluable support and, in combination with Amol Pawar, a seemingly boundless appreciation of my cooking.

The key experimental data used in this thesis were provided by Matt Dedmon, Rob Rivers and Neil Birkett. Additional data used for validation were obtained from Carlos Bertoncini and Markus Zweckstetter, and the coil library ensembles were generated by Abhishek Jha. Finally, it was the work of Kresten Lindorff-Larsen that initiated the development and application of PRE-ERMD which forms the basis of this thesis. Thanks must also go to the entire Dobson group for their willingness to share their expertise.

On a more personal note, I thank my family for their encouragement and for putting me in a position to take this opportunity. Of the many people in Cambridge and elsewhere who have contributed to my life over the past three years, the consistent support of Hope Johnston and the proof-reading efforts and shared endorphin addiction of Erica Thompson were greatly appreciated. Additionally, the various sports teams and crews that I have been involved with provided a vital outlet and both challenged and maintained my sanity. Finally, it remains to thank the Woolf Fisher Trust for providing the funding that allowed me to study towards my PhD at Cambridge University.

Abbreviations

2D 2-dimensional

3D 3-dimensional

A alanine

Å Ångström

ANS 1-anilinonaphthalene-8-sulfonic acid

αS α-synuclein

βS β-synuclein

β+HC αS/βS construct

C8E5 n-octyl-penta(ethylene glycol)

Cα α carbon atom of an amino acid

Cβ first carbon of an amino acid side chain

CO carbonyl atom of an amino acid

CD circular dichroism

Cf compaction factor

δ chemical shift

∆δ secondary chemical shift

D aspartic acid

D2O deuterated water

DC distance comparison

DLB dementia with Lewy bodies

DNA deoxyribonucleic acid

DS disordered state(s)

DSS 2,2-dimethylsilapentane-5-sulfonic acid

E energy

E glutamic acid

EK kinetic energy

EPR electron paramagnetic resonance

ERMD ensemble-restrained molecular dynamics

ET electron transfer

F phenylalanine

FET fluorescence energy transfer

fs femtosecond

G glycine

GB generalised Born

GB/SA generalised Born/surface area

GndHCl guanidine hydrochloride

H hydrogen

Hα hydrogen atom attached to α carbon

HC hydrophobic core

HCl hydrochloric acid

HSQC heteronuclear single quantum coherence

Hz Hertz

IDP intrinsically disordered protein

INEPT insensitive nuclei enhanced by polarization transfer

Iox intensity of peak when spin-label is in oxidised (paramagnetic) state

Iox/Ired intensity ratio

Ired intensity of peak when spin-label is in reduced (diamagnetic) state

³J-coupling scalar 3-bond coupling

³JHNHα scalar 3-bond coupling between the amide and Cα hydrogens

kcal kilocalorie

K lysine

K degrees Kelvin

L lower bound on PRE distance restraint

L leucine

M methionine

M molar

MC Monte Carlo

MD molecular dynamics

Mes 2-(N-morpholino)ethanesulfonic acid

mM millimolar

mol mole

ms milliseconds

MTSL 1-oxyl-2,2,5,5-tetramethyl-3-pyrroline-3-methyl methanethiosulfonate

N nitrogen

N asparagine

NAC non-amyloid β component

NaCl sodium chloride

NaOH sodium hydroxide

NFP natively folded protein

NMR nuclear magnetic resonance

nOe nuclear Overhauser effect

ns nanoseconds

NS native state(s)

P proline

PB Poisson-Boltzmann

PD Parkinson’s disease

PDB protein data bank

PFG-NMR pulsed field gradient nuclear magnetic resonance

φ dihedral angle about the N-Cα bond of a polypeptide

PI3-SH3 bovine phosphatidylinositol-3’-kinase SH3 domain

PMF potential of mean force

PPII polyproline II

PRE paramagnetic relaxation enhancement

PRE-ERMD ERMD using distance restraints derived from PRE-NMR

PRE-NMR paramagnetic relaxation enhancement NMR

ps picoseconds

ψ dihedral angle about the Cα-CO bond of a polypeptide

Q glutamine

R1 longitudinal relaxation rate

R1^sp paramagnetic enhancement of the longitudinal relaxation rate

R2 transverse relaxation rate

R2^red transverse relaxation rate in diamagnetic conditions

R2^sp paramagnetic enhancement of the transverse relaxation rate

RCP residual contact probability

RDC residual dipolar coupling

Rg radius of gyration

Rh hydrodynamic radius

rms root-mean-square

S serine

SASA solvent accessible surface area

SAXS small-angle X-ray scattering

SD standard deviation

SDS sodium dodecyl sulfate

SDSL site-directed spin-labelling

SE statistical error

SH3 Src homology 3

SPC-SH3 chicken α-Spectrin SH3 domain

T threonine

T simulation temperature

τc correlation time of the electron-proton vector

TS transition state

µM micromolar

µs microseconds

U upper bound on PRE distance restraint

UV ultraviolet

V valine

vdw van der Waals

χ1–5 dihedral angles of the MTSL spin-label

Y tyrosine

Zagg predicted aggregation propensity

Zagg^prof predicted aggregation propensity profile

Abstract

To obtain a complete understanding of the behaviour of proteins it is necessary to characterise all accessible conformations. This includes not only folded structures, but also the partially and fully unfolded states populated during folding and mis-folding. The existence of intrinsically disordered proteins (IDPs) adds a further category.

The heterogeneous range of structures comprising disordered states (DS) presents a challenge for structure determination, making an ensemble description essential. Recent advances in techniques such as nuclear magnetic resonance (NMR) spectroscopy allow site-specific structural information to be obtained for DS. Experimental observables, however, are time- and ensemble-averages, whereas definition of an ensemble requires knowledge of the underlying distributions. Simulations can complement experiments by providing such information.

Consequently, this thesis focuses on the development of computational methods for characterising DS of proteins. Firstly, a range of existing techniques of varying degrees of accuracy are tested. Producing structures that are sufficiently expanded proves a major difficulty, and even when this is overcome, the structures remain incorrect. Long-range distances derived from paramagnetic relaxation enhancement (PRE)-NMR are therefore incorporated into ensemble-restrained molecular dynamics (ERMD) simulations to modulate the accessible conformations. The initial tests are conducted using synthetic data so that the success of the simulations can be evaluated by comparing distributions as well as averages. The methodology is improved to account for the anomalous effects of restraining a highly non-linear average across a limited number of replicas and the inability of a single type of average to report on the underlying distribution. The conversion of experimental data into distance restraints is also refined. The resulting general method is applied using experimental data for three IDPs and the acid-denatured state of a natively folded protein and new analysis methods are introduced. The use of ERMD allows the aggregation propensities of the proteins to be rationalised in terms of the nature of their residual structure.

Contents

Declaration

Acknowledgements

Abbreviations

Abstract

Contents

1 Introduction

1.1 Unfolded and partially folded states of proteins

1.1.1 Protein folding

1.1.2 Protein mis-folding and aggregation

1.1.3 PI3-SH3

1.2 Intrinsically disordered proteins

1.2.1 α-synuclein and β-synuclein

1.3 Methods for characterising disordered states

1.3.1 Experimental methods

1.3.2 Theoretical representations

1.3.3 Biomolecular simulations

1.3.4 Ensemble-restrained molecular dynamics

1.4 Overview

2 Methods

2.1 Simulation methods

2.1.1 Unrestrained simulations

2.1.2 ERMD

2.2 Restraints for PRE-ERMD

2.2.1 Calculation of distances from experimental data

2.2.2 Calculation of distances from reference ensembles

2.2.3 Accounting for uncertainty in PRE distance restraints

2.3 Analysis methods

2.3.1 Back-calculation of experimental observables

2.3.2 Statistics

2.3.3 Correlation of distance distributions

2.3.4 Distance comparison maps

2.3.5 Free energy landscapes

2.3.6 Ramachandran plots

2.3.7 Predicted properties

3 Simulation of disordered states of proteins

3.1 Introduction

3.2 Random coil model

3.3 Explicit solvent

3.4 Implicit solvent

3.4.1 Physiological temperature

3.4.2 Methods for generating expanded structures

3.5 Comparison with experimental data

3.6 Conclusions

4 Improving the accuracy of ensemble-restrained molecular dynamics

4.1 Introduction

4.2 Theoretical aspects of ERMD

4.3 Definition of PRE distances

4.4 Preliminary results

4.4.1 Generation of reference ensembles

4.4.2 Absence of correlated motions

4.4.3 Calculation of synthetic distance restraints

4.4.4 Application of PRE-ERMD

4.5 Improvement of the PRE-ERMD method

4.5.1 Cross-validation against multiple observables

4.5.2 Explanation of the compaction problem

4.5.3 Solving the compaction problem

4.6 General protocol for PRE-ERMD

4.6.1 Additional modes of validation

4.7 Conclusions

5 Comparison of the solution state ensembles of α-synuclein, β-synuclein and β+HC

5.1 Introduction

5.2 Factors influencing the calculated distances

5.2.1 Correlation time

5.2.2 Transverse relaxation rate

5.2.3 Intensity ratio

5.3 Choice of optimal T for characterisation by PRE-ERMD

5.4 Global dimensions

5.5 Characterising residual structure

5.5.1 Distance comparison maps

5.6 Residual structure of αS, βS and β+HC

5.6.1 Comparison of the re-calculated and previously published αS ensembles

5.6.2 Long-range structure of βS and β+HC

5.6.3 Structural propensities of the C-terminus

5.6.4 Structural propensities of the N-terminus

5.6.5 Dihedral angle preferences

5.6.6 Comparison with experimental data

5.6.7 Free energy maps

5.7 Implications for aggregation

5.8 Conclusions

6 Characterisation of the acid-denatured state of PI3-SH3

6.1 Introduction

6.2 Experimental PRE-NMR data implies non-native structure

6.3 Choice of optimal T for characterisation by PRE-ERMD

6.4 Global dimensions

6.5 Residual structure

6.5.1 Comparison of the native and acid-denatured states

6.5.2 Structural propensities of the acid-denatured state

6.5.3 Comparison with experimental data

6.5.4 Free energy maps

6.6 Implications for aggregation

6.7 Conclusions

7 Conclusions

References

Chapter 1

Introduction

Proteins are the primary constituent of the interwoven biochemical networks that make up living organisms [1], enabling and controlling virtually every chemical process that takes place in the cell [2]. A complete description of the mechanistic link between proteins and physiology is therefore essential in order to understand not only the normal functioning of biological entities, but also the myriad of pathological conditions that result from protein malfunction [3].

Because a protein’s function is inherently linked to its three-dimensional (3D) structure, it has been a long-standing goal in structural biology to understand the relationship between a protein’s amino acid sequence and its structural properties. This traditionally took place within a structure-function paradigm based on the assumption that a given amino acid sequence dictates a single, rigid structure upon which the function is entirely dependent [4–6]. Such a conclusion was inevitable given the use of X-ray crystallography as the principal method to study protein structure, as by definition this is a purification process that leads to the isolation of conformationally homogeneous molecules [5]. Additionally, enzymes, the traditional focus of biochemical studies, are proteins for which the concept of a unique 3D structure is most tenable [7,8]. Thus the protein data bank (PDB), which is dominated by enzymes [9] and other proteins that have been successfully crystallised, does not constitute a representative sample of the types of structures adopted by proteins in solution [8].

More recently, a variety of solution-based techniques, in particular nuclear magnetic resonance (NMR), have challenged the “one sequence - one structure - one function” paradigm by revealing the conformational diversity exhibited by proteins in solution. This includes bond bending and stretching, fluctuation of side chains, movement of loops and elements of secondary structure and even global tertiary structure rearrangements [5]. It is now widely acknowledged that proteins are better represented as a probability distribution of conformations rather than as a single structure. For ‘natively folded proteins’ (NFPs), which fold into a well-defined and compact globular structure and exhibit only a modest range of motion, the width of these distributions may be relatively narrow. An increasing number of proteins, however, are being shown to be unstructured under physiological conditions [4,10,11]. Additionally, partially folded and fully unfolded states of NFPs are of interest due to their central role in defining the conceptual framework for protein folding and mis-folding [12–14]. An ensemble representation is essential for the characterisation of such states, which comprise dynamic ensembles of interconverting structures and thus are described by much broader probability distributions [7,15] than NFPs.

In the following sections, the importance of characterising disordered states (DS) of proteins is elaborated on, and the characteristics and roles of intrinsically disordered proteins (IDPs) are outlined. The model systems studied in this thesis are introduced within the context of the type of DS epitomised by each. Experimental and computational methods for describing DS are discussed along with currently accepted concepts of the nature of DS and the supporting evidence. Finally, ensemble-restrained molecular dynamics (ERMD), a method for combining theory and experiment to gain a fuller understanding of the structures accessible to DS at a molecular level, is described. Development of this method and its application forms the basis of the work described in the subsequent chapters of this thesis.

1.1 Unfolded and partially folded states of proteins

Unfolded states of proteins are the reference state from which both folding into the native state (NS) and mis-folding into disease-related aggregates such as amyloid fibrils are initiated. This lends a fundamental motive to the characterisation of unfolded states; namely, to explain why proteins predominantly fold to their globular native structures rather than mis-folding into oligomers and fibrillar aggregates. Additionally, both of these processes may in some cases proceed via partially folded intermediates [3,16–20], which are therefore also of interest with respect to understanding the mechanisms of folding and mis-folding.

1.1.1 Protein folding

In order to carry out their biological functions, NFPs must fold into a unique 3D structure. When studied in vitro, folding is initiated from a highly unfolded state, and it is likely that a similar situation occurs in vivo. An explanation for the ability of a nascent polypeptide chain to fold rapidly and precisely into its native fold has long eluded structural biologists. The defining issue, termed ‘Levinthal’s Paradox’ [14], is the failure of a random search for the native fold among the vast number of possible conformations accessible to even a small protein to account for the observed time-scale of folding, which is of the order of milliseconds to seconds. A proposed solution, which reconciles the stochastic nature of the folding process with its robustness, is that folding is initiated by a nucleation event [21–25]. This establishes a critical core, which then drives the formation of the remainder of the structure [21,26–28]. By extension, larger proteins fold by coalescence of substructures that are already partially preformed according to the same principles [21–24,27,29,30], making folding a hierarchical process. Such a mechanism implies that the conformational free energy landscape has been sculpted by evolution so as to allow efficient folding.
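The scale of the problem posed by Levinthal’s Paradox can be illustrated with a back-of-the-envelope estimate. The following Python sketch is illustrative only: the chain length (100 residues), the number of conformations per residue (3) and the sampling rate (10^13 conformations per second) are assumed values chosen for the example and are not taken from the text or its references, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate of an exhaustive random conformational search.
# All default numbers are illustrative assumptions, not values from the thesis.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def search_time_seconds(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Time needed to visit every conformation once, in seconds.

    Assumes each residue independently adopts one of `states_per_residue`
    conformations, giving states_per_residue ** n_residues chain conformations.
    """
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_second


def seconds_to_years(seconds):
    """Convert a duration in seconds to years."""
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    t = search_time_seconds(100)
    print(f"Exhaustive search: {t:.2e} s, i.e. {seconds_to_years(t):.2e} years")
```

Under these assumptions the exhaustive search would take roughly 10^27 years, compared with observed folding times of milliseconds to seconds, which is why a purely random search cannot account for folding.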

To establish the factors responsible for the initiation of folding, a description of the unfolded state of a protein in terms of its constituent structures and their relative populations is required [31]. Characterisation of the partially folded states that occur during the folding process is a further prerequisite for the elucidation of protein folding mechanisms. Small proteins often fold in a two-state manner, passing through a high energy transition state (TS) which has been shown to be native-like in many cases [25]. Larger proteins may form transient intermediates which may contain non-native as well as native-like structure [32]. The probability distributions describing such states are generally broader than those of the native fold, but narrower than those of fully unfolded states.

1.1.2 Protein mis-folding and aggregation

It has recently become apparent that the highly individualistic globular structure of a NFP is not the only stable ordered state accessible. An increasing number of proteins have been shown to mis-fold into amyloid fibrils [3]. The aggregation of these species is implicated in a number of debilitating diseases, including Alzheimer’s disease, Parkinson’s disease (PD), type II diabetes, variant Creutzfeldt-Jakob disease and bovine spongiform encephalopathy [33–35]. Additionally, many polypeptides unrelated to disease can form amyloid fibrils in vitro under specific conditions [3]. Whilst it was the detection of amyloid fibrils in pathological studies that led to their association with disease, it has been suggested that the fibrils themselves are not the toxic species, but a sequestration mechanism. The observation that the early pre-fibrillar aggregates are highly damaging to cells, but the mature fibrils are relatively benign [18,36–38] provides support for such a hypothesis.

The highly organised core of amyloid fibrils consists of β-sheets whose strands run perpendicular to the fibril axis [39]. The fibril structure is stabilised primarily by interactions involving the polypeptide backbone. This, along with the fact that seemingly any protein can be induced to form fibrils, has led to the belief that the ability to form fibrils is a generic feature of proteins, although the propensity to form such structures depends on a subtle interplay between the protein in question and the conditions in which it is studied [3,40]. It is of interest, therefore, to consider how disease-related mutations and changes to the environment affect the aggregation propensity of a given protein. In Chapter 6, the latter factor is investigated with respect to the model system described below.

1.1.3 PI3-SH3

The Src homology 3 domain from bovine phosphatidylinositol-3’-kinase (PI3-SH3) is an example of a NFP for which characterising the unfolded state is important with respect to both folding and mis-folding.

PI3-SH3 is an 84 residue globular protein which consists of two perpendicular antiparallel β-sheets of three and two strands, respectively, and two helix-like turns, arranged into a β-barrel [41] (see Figure 6.2 A). It is a member of the SH3 family, a set of small protein modules of around 60–85 residues which mediate intra-cellular signal transduction [42–44]. Despite their low sequence homology, the family of SH3 domains all exhibit a common fold which has been well-characterised by both NMR spectroscopy and X-ray crystallography.

At neutral pH, PI3-SH3 folds cooperatively and reversibly in a two-state manner with no intermediates [45]. The folding TSs of three other SH3 domains have been shown to contain predominantly native-like structure [46] and it is likely that a similar situation occurs for PI3-SH3.

Under folding conditions, the unfolded state is not stable, thus methods such as acid-denaturation are used to generate unfolded states from which to initiate folding studies. After prolonged incubation at low pH, however, PI3-SH3 aggregates into amyloid fibrils [47–51]. Characterisation of the acid-denatured state of PI3-SH3 is therefore important for understanding the factors controlling the competition between folding and mis-folding.

The fact that PI3-SH3 does not aggregate at neutral pH, combined with the pronounced lag phase of several days, indicates that the acid-induced destabilisation of the NS is a prerequisite for fibril formation [49]. Moreover, the susceptibility of acid-denatured PI3-SH3 to proteolysis is enhanced during the initial stages of aggregation, suggesting that further unfolding from the acid-denatured state occurs prior to formation of ordered aggregates, which are almost completely resistant to proteolytic attack [52].

Additional evidence for the requirement of unfolding prior to aggregation comes from a 3D reconstruction of the fibril structure from cryo-electron microscopy data [50], which provided the first glimpse of amyloid fibril structure. The fibrils consist of a double helix of two protofilament pairs wound around a hollow core. The 20 Å-wide protofilaments can only contain two flat β-sheets, which must be oriented differently to those of the native fold to ensure that all of the strands are perpendicular to the fibril axis. Although the strands may occur in similar regions of the polypeptide chain to those of the native fold, the native structure of PI3-SH3 at neutral pH is too compact to fit into the fibril density, thus it must unfold to adopt a more extended conformation.

The aggregation and structural properties of the acid-denatured state formed at pH 2.0 have been widely studied. Under these conditions, PI3-SH3 is substantially unfolded relative to the native fold at neutral pH according to far- and near-UV circular dichroism (CD) and ¹H-NMR, although the binding of the hydrophobic dye 1-anilinonaphthalene-8-sulfonic acid (ANS) suggests that there is a partially formed hydrophobic core [47]. Whilst the acid-denatured state is more expanded than the native fold, it is still relatively compact compared to the fully unfolded protein denatured in guanidine hydrochloride (GndHCl) [47].

Interestingly, pH titration of PI3-SH3 showed that although the hydrodynamic radius (Rh) initially increases as the pH is lowered, it reaches a maximum at around pH 2.4 and decreases again thereafter [48]. The CD ellipticity at 200 nm follows a similar pattern as a function of pH. The nature of the aggregation process is also dependent on the pH: at pH values less than 2.0 (1.2 and 1.5), amorphous aggregates are rapidly formed, and the aggregation product includes only a small number of short fibril-like structures, whereas at higher pH values (2.0 and 2.7) there is a long lag phase, but the final aggregates consist of morphologically well-defined fibrils. It is thought that these effects are due to the screening of positive charges by anions at pH values less than 3.0 rather than changes in the ionisation state of the protein, as it is unlikely that any of the amino acids in a denatured protein have a pKa below 2.4 [48]. Such screening reduces intra- and inter-molecular repulsion, favouring protein compaction and aggregation. At higher pH values, where there are fewer positively charged side-chains, or at lower ionic strength, where the screening is less effective, the aggregation occurs more slowly allowing well-organised fibrillar structures to form.

At first sight, therefore, it appears strange that the GndHCl-denatured state of PI3-SH3 does not aggregate, as it is even more unfolded than the acid-denatured state and the ionic strength in these conditions is much higher. GndHCl, however, interacts preferentially with backbone CONH groups [53], increasing their solubility in aqueous solution and thus negating the energetic benefits of forming not only native contacts, but also the intermolecular interactions that lead to aggregation.

The role of charge in controlling the aggregation process may be why PI3-SH3 was, until recently, the only member of the SH3 family known to form amyloid fibrils. The long n-Src loop unique to PI3-SH3 was shown not to be responsible for its differing amyloidogenic properties, as insertion of this region into the chicken α-Spectrin SH3 domain (SPC-SH3), which has the same fold and 24% sequence identity with PI3-SH3, does not induce fibril formation [54]. Instead, insertion of six amino acids from the diverging turn and adjacent RT loop of PI3-SH3 into SPC-SH3 results in an aggregation phenotype similar to that of PI3-SH3 [51]. Replacement of two residues in this region of PI3-SH3 with those most highly represented in SH3 domains, which increases the net charge by +2, significantly reduces the aggregation propensity. Other mutations that increase the charge in this region also prevent aggregation, but addition of charged residues to the N-terminus does not, indicating that this region plays a key role in the aggregation of PI3-SH3. Two other SH3 domains lacking the two conserved basic residues at the diverging turn, the human and Drosophila Abl-SH3 domains, also aggregate into amyloid fibrils [55]. The only other SH3 domain known to form fibrils, c-Yes-SH3, only aggregates in acidic conditions [56], further highlighting the importance of charge in the aggregation of SH3 domains.

As well as elucidating the specific mechanism of fibril formation by PI3-SH3, characterisation of the acid-denatured state of PI3-SH3 provides an opportunity to gain insight into the generic determinants of protein aggregation. PI3-SH3 is one of a growing number of model systems used to study the mechanism of amyloid fibril formation and the toxicity and structural properties of mature amyloid fibrils [47–51,57]. Whilst PI3-SH3 is not related to any known pathological condition and does not form amyloid fibrils in vivo, the fibrils formed in vitro are morphologically identical to fibrils formed by amyloid disease-related proteins [47]. Additionally, early granular aggregates formed by PI3-SH3 exhibit substantial cytotoxicity [18]. Together, these observations lend support to the argument that the ability to form amyloid fibrils is a generic property of the polypeptide backbone that can be induced in any polypeptide chain given the right conditions [3,58]. The investigations of the acid-denatured state of PI3-SH3 described in Chapter 6 are therefore of general interest with respect to the determinants of protein structure.

1.2 Intrinsically disordered proteins

The unfolded and partially folded states of NFPs are seldom stable under phys-

iological conditions, thus they must be stabilised by artificial means such as

increasing the temperature, introducing mutations, adding chemical denatu-

rants, or, as for PI3-SH3, altering the pH. In contrast, IDPs are fully or par-

tially unfolded under physiological conditions4,6,10,11,59–75, so that their char-

acterisation divulges facets of the relationship between sequence and struc-

ture not encountered in the study of NFPs. Some IDPs undergo a transi-

tion to a more ordered state upon binding to their biologically relevant lig-

ands such as metals, polyamines, small polypeptides, other proteins or mem-

branes4,5,7,10,11,60–63,65,66,76,77. The function of IDPs is dependent on their

highly flexible nature7,72,77, demonstrating that a defined 3D structure is not a

prerequisite for function4.

Statistical analysis has shown that the sequences of IDPs are significantly

different to those of NFPs. In particular, they exhibit a low sequence complexity,

with relatively few of the bulky hydrophobic residues typically found in the core

of folded globular proteins, and a high proportion of polar and charged amino

acids11,73,75,78–85. The resultant charge-charge repulsion and lower driving force

for hydrophobic collapse may explain their disorder10,62,83.
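The compositional bias described above can be illustrated with a minimal sketch. The residue groupings, the function name `composition_summary` and both toy sequences are assumptions made for illustration only; they are not taken from the cited studies, and histidine is ignored when counting charge.

```python
# Illustrative residue classes; the exact grouping is an assumption.
BULKY_HYDROPHOBIC = set("ILVFWYM")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def composition_summary(sequence):
    """Return simple composition statistics for a one-letter amino acid sequence."""
    seq = sequence.upper()
    n = len(seq)
    n_pos = sum(aa in POSITIVE for aa in seq)
    n_neg = sum(aa in NEGATIVE for aa in seq)
    n_hyd = sum(aa in BULKY_HYDROPHOBIC for aa in seq)
    return {
        "length": n,
        "net_charge": n_pos - n_neg,
        "fraction_charged": (n_pos + n_neg) / n,
        "fraction_bulky_hydrophobic": n_hyd / n,
    }


# Hypothetical toy sequences, not taken from any real protein.
polar_charged_toy = "DEKDGSPEEDKEGSQDSEPEQ"
hydrophobic_toy = "VILVAFLIVGMWLIAVFLIVA"

disordered_like = composition_summary(polar_charged_toy)
core_like = composition_summary(hydrophobic_toy)
```

Comparing `disordered_like` with `core_like` shows the pattern described in the text: the first toy sequence has a large net charge and no bulky hydrophobic residues, whereas the second is uncharged and dominated by them. Sequence-based disorder predictors use far richer features than this.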

The compositional bias of IDPs provides a distinct signature for disordered

regions in sequence space that has formed the basis of various algorithms that

predict disorder based on sequence7,8,65,70,78,79,82,86–89. These predictors have

shown that disorder is ubiquitous10,68,76,79,86 in all kingdoms of life68, emphasising

the importance of characterising this class of proteins. A significant proportion

of the proteomes of eukaryotes, eubacteria and archaea is predicted to

comprise disordered regions of more than 40 contiguous residues4,8,69,84,90. The

exact proportions vary from study to study, however, and may in some cases be

overestimates83, as ligand-induced structure91 as well as the crowded nature of

the intracellular environment5,11,92–96 have been shown to increase the degree

of structure exhibited by such proteins.

IDPs participate in numerous non-catalytic interaction-based biological func-

tions10,11,59,60,62,63,65,66. These include protein-nucleic acid

interactions7,8,62,84,97,98 during transcription4,62 and translation4,11,61,62,76 and

protein-protein interactions99 that contribute to cellular scaffolding4,84, ion

binding84 and vesicle fusion100 and regulate signal transduction11,61,62,74,76,101,102

and the cell cycle4,11,61,76. In fact, it has been shown that the hubs in the

scale-free protein-protein interaction networks that define cellular function are

typically unstructured or partially structured proteins65. In contrast, enzymes

rarely contain disordered regions, especially those involved in biosynthesis and

metabolism8,74, the prime exception being regulatory kinases8. The prevalence

of disordered regions in the genomes of eukaryotes is therefore likely to be a

consequence of the increased need for cell signalling and regulation in higher

organisms65,90.

The means by which IDPs carry out their functions are intrinsically linked to

their unique physical characteristics103. The relatively large solvent-accessible

surface areas (SASA) of extended disordered structures make large surfaces

available for intermolecular interactions59. Provision of an equivalently sized

interface by a structured protein would require a 2–3-fold increase in molecular

weight, resulting in either increased cellular crowding or an enlargement

of cell size by 15–30%59,62. Moreover, the coupling of folding with binding83

affords low affinity but high specificity binding7,59,98, thus providing fine ther-

modynamic control. Such easily reversible binding is a fundamental requirement

for signalling104.

The conformational heterogeneity of IDPs allows functional diversity at a

single-protein level5,7,11,65,105 in a natural extension of the familiar principle

of allostery5. Alternative splicing may further increase the range of possible

conformations106. The binding of multiple partners by a single protein is a key

feature of the role of IDPs as network hubs; different regions of the protein

can participate in different pathways to avoid cross-talk65. Additionally, con-

formational disorder provides a mechanism for controlling protein activation5.

Dynamic flexibility facilitates post-translational modifications such as phospho-

rylation and ubiquitination, which are common regulatory mechanisms, as the

substrate protein can conform more easily to the active site of the modifying

enzyme7. Furthermore, the rapid proteolytic degradation of IDPs permits fast

and accurate responses to changes in the environmental conditions3,4,62,65. It

has also been shown that allosteric coupling is maximised when one or more of

the coupled domains is intrinsically disordered107.

It has been proposed that the conformational diversity and functional promis-

cuity of IDPs are necessary for the co-evolution of protein fold and function5,

and thus may be evolvable traits. The expansion of internal repeat regions may

power such evolution71. However, there are also some disadvantages to disorder.

IDPs appear to be related to the promotion and proliferation of protein-folding

diseases5,62,74,108 including many neurological conditions. Their role as hubs in

signalling networks means that mis-function has serious consequences, including

the development of cancer8,74. Characterisation of IDPs is therefore important

for understanding both the normal functioning and pathogenesis of biological

entities.

1.2.1 α-synuclein and β-synuclein

α-synuclein (αS) and β-synuclein (βS) are IDPs67,108–110 and so are disordered

in solution4,10,11,59,62,63. Despite being closely related, αS forms amyloid fibrils

in vivo whereas βS does not. The characterisation of these proteins along with

a related construct, β+HC (see below), described in Chapter 5, provides an

opportunity to gain insight into the determinants of intrinsic disorder and the

factors that govern the aggregation of IDPs at a molecular level.

αS and βS are members of the synuclein family of proteins, all of which are of

similar size (∼127–140 residues)10,111,112. They show 62% sequence identity,

mostly due to the conserved imperfect repeats of the KTKEGV lipid-interaction

domain in their N-termini. Despite considerable sequence divergence, the C-

termini of both proteins contain a large number of acidic residues. Perhaps the

most important difference between the two proteins is the absence in βS of 12

mostly hydrophobic residues from within the central non-amyloid β component

(NAC) region (residues 61–95) of αS113.

Upon binding to lipid membranes and mimetics such as sodium dodecyl

sulfate (SDS) micelles the N-terminus of each protein forms two anti-parallel

α-helices, with a break around residue 40114–120. The C-terminus remains dis-

ordered in the lipid-bound state114–122, which is thought to facilitate its inter-

actions with a variety of binding partners64,65. Although the physiological roles

of αS and βS remain unclear123, several pieces of evidence suggest that their

functions are mediated by lipid binding114,121,124–129.

αS forms amyloid fibrils both in vitro and in vivo, where it is the primary

constituent of the amyloid plaques found in PD and the related dementia with

Lewy bodies (DLB)130–133. The causative link has been further strengthened by

the identification of three mis-sense mutations of αS and a gene triplication, all

of which lead to familial PD and DLB134–139. The pre-fibrillar aggregates rather

than the mature fibrils appear to be the cytotoxic species140–142; accordingly,

protofibril formation is accelerated by the PD-linked mutations141,143. In con-

trast to αS, βS does not aggregate in vivo144 and requires specific conditions

to induce in vitro aggregation110,145–147. In fact, it may inhibit αS aggrega-

tion148,149.

The contrasting aggregation profiles of αS and βS may be partly due to the

differences in the C-termini150, although this region is known to reduce the ag-

gregation propensity of both proteins145,151. C-terminal-truncated forms of αS

are key components of Lewy body deposits131,133,152 and C-terminal-truncated

mutants of both αS and βS aggregate more readily in vitro145,151,153,154. The

C-termini also have much lower predicted aggregation propensities than the re-

mainder of either sequence145. The major difference between the two proteins

is the greater number of negatively charged residues in the C-terminus of βS,

which may facilitate electrostatic interactions with the N-terminus, thus reducing

the exposure of the central NAC region and inhibiting aggregation. However,

a construct containing residues 1–97 of αS and residues 87–134 of βS forms

fibrils at a similar rate to wild-type αS in vitro, indicating that differences in the

C-termini are unlikely to be solely responsible for the differences in aggregation

propensity.

The absence of residues 73–83 of αS from βS147,155 appears to be a more

likely cause of the different aggregation properties. These residues lie within the

central NAC region, which forms the core of αS amyloid fibrils155 and is neces-

sary for fibril formation, particularly residues 66−74156. Residues 71−82 of αS

aggregate alone155,157 whereas αS∆71-82, in which residues 71−82 have been re-

moved from wild-type αS, does not aggregate under physiological conditions155.

On the other hand, αS∆73-83 forms fibrils even faster than wild-type αS145

and αS∆83 has an extremely high predicted aggregation propensity (Zagg)145.

Moreover, the Zagg and measured aggregation rate of the α/β construct stud-

ied in this thesis, β+HC, in which the 11 residue hydrophobic core from αS

(residues 73–83) that is missing from βS is inserted into βS following residue

72, are closer to those of βS than αS145. These observations have led to the

idea that the negatively charged E83 may act as a ‘gatekeeper’ residue145,155,156,

preventing or reducing aggregation by breaking up the stretch of hydrophobic

residues in the NAC region and thus disrupting hydrophobic inter-molecular

interactions.

The contrasting aggregation behaviour of αS and βS despite their high se-

quence homology and similar lipid-bound structures implies that the key to

understanding their differences lies in the solution state from which both lipid

binding and aggregation are initiated, hence furnishing the motivation for their

study by ERMD simulations (Chapter 5). A variety of experimental data have

been gathered for both αS and βS in solution109,158,159. The N-termini of both

proteins exhibit helical propensity, although to differing degrees. For βS, there

are two distinct regions of higher helical propensity, comprising residues 20−35

and 55−65158. In αS, residues 6−37 have the greatest helical propensity109, al-

though the region of helical propensity extends further towards the N-terminus

than in βS159. There is some suggestion that the break between the helices

that occurs in the lipid-bound structure of βS also occurs to some extent in the

solution structure158. The C-terminus of βS appears to form transient polypro-

line II (PPII) structure145,158, whereas the C-terminus of αS is more disordered.

This difference, which implies an increased stiffness of βS, is most likely due to

the higher negative charge (-16 compared to -14) and greater number of proline

residues (8 compared to 5) in the C-terminus of βS. The comparison of the

ensembles of structures representative of αS, βS and β+HC in solution with

each other and with the experimental data outlined above carried out in Chap-

ter 5 allows the relationship between the differing structural propensities and

aggregation properties of the three proteins to be clarified.

1.3 Methods for characterising disordered states

1.3.1 Experimental methods

The heterogeneity of DS poses severe methodological challenges to their study

by both experimental and computational methods. Traditional structure de-

termination methods such as X-ray crystallography are inappropriate for char-

acterising dynamic ensembles. Various solution spectroscopy techniques have

been applied160, although the wide variety of structures present at any point in

time and their rapid interconversion can hamper the extraction of meaningful

structural information. The most successful of these techniques so far have been

small-angle X-ray scattering (SAXS) and NMR.

The majority of the experimental data utilised in this thesis were deter-

mined by NMR spectroscopy, which is a particularly powerful technique for

characterising DS as it is capable of providing site-specific structural informa-

tion. Additionally, NMR observables contain information about the underly-

ing conformational distribution, although in practice it is extremely difficult

to extract this information. Accessing the underlying distribution is especially

important for DS, for which an average structure is unlikely to be an appropri-

ate representation. NMR is important because it provides the opportunity to

gain a complete description of DS in terms of both the nature of the accessible

conformations and their relative populations. Of the techniques discussed in

the remainder of this section, some are used quantitatively in the subsequent

chapters as restraints in the ERMD and to assess the quality of the calculated

ensembles, whereas others merely provide a qualitative aid to the interpretation

of the residual structure exhibited by the calculated ensembles.

Global dimensions

Reproducing the global dimensions of DS proved to be fundamental to the

success of the simulation methods investigated in this thesis. The global di-

mensions of a polypeptide chain can be probed by NMR and SAXS, both of

which yield an average over all molecules in solution and the time-scale of the

experiment. Pulsed-field-gradient (PFG)-NMR supplies the translational diffusion

coefficient from which the Rh can be calculated in the form of $\langle R_h^{-1} \rangle^{-1}$,

where the angular brackets denote time- and ensemble-averaging161. The most

common parameter extracted from a SAXS experiment is the radius of gyration

(Rg), which is determined as the root-mean-square (rms) average, $\langle R_g^2 \rangle^{1/2}$. The

distribution of all pairwise interatomic distances, p(r), and information regarding

the overall shape of the macromolecule can also be obtained162,163. The

$\langle R_h^{-1} \rangle^{-1}$ is the preferred measure of the global dimensions for the work described

in this thesis, as PFG-NMR is conducted under similar conditions to

the remainder of the experimental measurements. p(r) is a potentially impor-

tant quantity, however, due to the scarcity of experimental techniques able to

report on distribution functions, hence its fitness as a means of quantitatively

comparing ensembles is tested in Chapter 4.
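The two ensemble averages named in this section weight conformers differently, which a short sketch can make concrete. The function names and the three-member set of radii are hypothetical; only the forms of the averages, the rms average for Rg and the inverse average for Rh, come from the text.

```python
import math


def radius_of_gyration(coords):
    """Rg of one conformer from equally weighted (x, y, z) coordinates."""
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    msd = sum(sum((c[k] - centre[k]) ** 2 for k in range(3)) for c in coords) / n
    return math.sqrt(msd)


def rms_average(values):
    """<x^2>^(1/2): the form in which SAXS reports Rg."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def inverse_average(values):
    """<x^-1>^-1: the form in which PFG-NMR reports Rh."""
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical per-conformer radii (in Å) for a three-member ensemble.
radii = [10.0, 20.0, 40.0]
arithmetic_mean = sum(radii) / len(radii)
rms_value = rms_average(radii)
inverse_value = inverse_average(radii)
```

For any heterogeneous set of radii, `inverse_value` lies below `arithmetic_mean` and `rms_value` lies above it, so the inverse average emphasises compact conformers and the rms average emphasises expanded ones. Back-calculated values should therefore be averaged in the same form as the experimental quantity they are compared with.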

Chemical shifts

The utility of NMR stems from its ability to provide local as well as global in-

formation. Chemical shifts, δ, report on the chemical environment experienced

by an atom. The chemical shift dispersion for DS is poor due to conformational

averaging164,165, but it is usually possible to assign the majority of the peaks

using triple-resonance experiments. Deviations from the values expected for a

random coil, referred to as secondary chemical shifts, ∆δ, are used to infer the

tendency of individual residues to sample helical, PPII or extended β-sheet-like

structure166,167. The absolute values of ∆δ recorded for DS are generally much

lower than for residues in fully formed secondary structure elements. Attempts

have been made to obtain quantitative estimates of the fractional occupancy of

the various types of secondary structure167,168, but such analysis is complicated

by the fact that ∆δ from different types of nuclei and residues are not equally

sensitive to secondary structure169. Thus in most cases, ∆δ are simply inter-

preted in terms of structural propensities145,158,170–173. In the work described

here, they are used to help interpret the residual structure propensities of the

ensembles of structures calculated in Chapters 5 and 6.
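As a minimal sketch of how secondary chemical shifts are obtained, the snippet below subtracts a random coil reference from each observed shift and smooths the result over neighbouring residues. Both lists of Cα shifts are hypothetical numbers, not measured data, and the smoothing window is an arbitrary choice.

```python
def secondary_shifts(observed, random_coil):
    """Secondary shift per residue: observed shift minus random coil reference."""
    return [obs - rc for obs, rc in zip(observed, random_coil)]


def smooth(values, window=3):
    """Centred moving average to highlight contiguous stretches of propensity."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


# Hypothetical C-alpha shifts (ppm); neither list is measured data.
observed_ca = [52.9, 56.8, 58.7, 55.1, 54.6]
random_coil_ca = [52.5, 56.3, 58.3, 55.4, 55.0]

delta = secondary_shifts(observed_ca, random_coil_ca)
delta_smoothed = smooth(delta)
```

For Cα nuclei, positive secondary shifts are conventionally read as helical propensity and negative ones as extended propensity. The toy values here would suggest a helical tendency over the first three residues and an extended tendency over the last two.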

3J-couplings

Additional information regarding local structural propensities can be obtained

from 3J-couplings, which report on the φ and ψ dihedral angles of the polypep-

tide backbone174–176. The conformational fluctuations that occur in DS, how-

ever, preclude the direct interpretation of 3J-couplings in terms of a particular

type of secondary structure. For instance, although the characteristic 3JHNHα-

couplings for α-helices and β-sheets are ∼ 4.8 and ∼ 8.5 Hz, respectively, aver-

aging over the contributing conformers results in a shift in the 3JHNHα-couplings

measured for DS towards intermediate values175. It is still possible, however,

to make inferences regarding conformational preferences by considering the de-

viation from the expected random coil values. In this thesis, comparison of

the experimental 3JHNHα-couplings with those back-calculated from various en-

sembles is used to evaluate the legitimacy of the description afforded by both

unrestrained and restrained simulations.
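The averaging effect described above can be sketched with a Karplus relationship between the φ angle and the 3JHNHα-coupling. The coefficients below are commonly quoted literature values and are an assumption here, not necessarily those used in this thesis; the two φ angles are merely representative of helical and extended backbone geometry.

```python
import math

# Assumed Karplus coefficients (Hz) for 3J(HN-Halpha); commonly quoted values,
# not necessarily those used in the thesis.
A, B, C = 6.51, -1.76, 1.60


def karplus_3j_hnha(phi_deg):
    """3J(HN-Halpha) in Hz for a backbone phi angle given in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def ensemble_average_3j(phi_values_deg):
    """Average the coupling, not the angle, over the contributing conformers."""
    return sum(karplus_3j_hnha(p) for p in phi_values_deg) / len(phi_values_deg)


helical = karplus_3j_hnha(-65.0)      # representative helical phi (assumption)
extended = karplus_3j_hnha(-120.0)    # representative extended phi (assumption)
mixed = ensemble_average_3j([-65.0, -120.0])
```

The value of `mixed` falls between `helical` and `extended`, which is why a coupling measured for a disordered state cannot be assigned to a single type of secondary structure. The same function can be used to back-calculate couplings from every member of a calculated ensemble before averaging.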

Transverse relaxation rates

The dynamics that complicate the derivation of structural information from 3J-couplings can be probed by spin-relaxation NMR techniques. Measurement

of the heteronuclear 15N transverse relaxation rates (R2) of backbone amide

groups allows the identification of regions undergoing restricted motion up to the

ms time-scale170,177. If a simple model is used in which the physical properties

of the polypeptide chain are dominated by unrestrained segmental motion of

the polypeptide main chain178, the R2 values for a fully denatured protein

are predicted to follow a bell-shaped curve, with the lowest relaxation rates

occurring for the terminal regions of the protein. Positive deviation of the R2

values may then be attributed to the presence of non-random structure such as

clusters of hydrophobic side chains170 or regions of increased stiffness. Such an

interpretation is used in this thesis to aid the identification of residual structure

from the calculated ensembles of αS, βS (Chapter 5) and PI3-SH3 (Chapter 6),

although without specifically defining the parameters of the model.
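A bell-shaped baseline of the kind described above can be sketched by assuming one common form of the segmental motion model, in which the R2 of each residue is a sum of contributions from all residues that decay exponentially with sequence separation. The functional form, the parameter values `r_intrinsic` and `persistence`, and the function names are all assumptions for illustration; the thesis itself does not define the parameters of the model.

```python
import math


def segmental_r2_profile(n_residues, r_intrinsic=0.2, persistence=7.0):
    """Baseline R2 per residue under an assumed exponential-decay segmental model."""
    profile = []
    for i in range(n_residues):
        total = sum(math.exp(-abs(i - j) / persistence) for j in range(n_residues))
        profile.append(r_intrinsic * total)
    return profile


def positive_deviations(measured, baseline, threshold):
    """Indices of residues whose R2 exceeds the baseline by more than threshold."""
    return [i for i, (m, b) in enumerate(zip(measured, baseline)) if m - b > threshold]


baseline = segmental_r2_profile(101)

# Hypothetical measurement: the baseline plus an elevated R2 at one residue.
measured = list(baseline)
measured[30] += 2.0
flagged = positive_deviations(measured, baseline, threshold=1.0)
```

The baseline is lowest at the chain termini and highest in the middle of the sequence, and `flagged` picks out the residue whose R2 rises above it. In practice the parameters would be fitted to the data, and clusters of adjacent flagged residues would be more informative than isolated ones.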

Residual dipolar couplings

Residual dipolar couplings (RDCs) are emerging as a particularly powerful

NMR technique, as they report on both structure and dynamics, providing

long-range as well as local information. RDCs probe the orientation of bond

vectors relative to the magnetic field179. In isotropic solution, the dipolar cou-

plings average to zero, thus weak alignment of the macromolecule of interest

is required179. This is most commonly induced by carrying out the measure-

ment in dilute liquid crystal media180–186 or in axial matrices such as stressed

polyacrylamide gels187,188.

The measured coupling is an average over all orientations of a given con-

formation with respect to the magnetic field and all conformations sampled by

the macromolecule, thus RDCs report on both the overall shape of the macro-

molecule and the local dynamics of the chemical bond. Because RDCs are av-

eraged over much longer time-scales (ms) than traditional spin-relaxation NMR

experiments (ps-ns), they provide complementary information by reporting on

slower molecular motions that are otherwise inaccessible179,189. The angular

degeneracy of RDCs, however, means that either multiple different types of

couplings must be measured, or the experiments must be repeated in media in

which the alignment of the macromolecule is significantly different179.
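Two properties mentioned above, the vanishing of dipolar couplings in isotropic solution and their angular degeneracy, both follow from the angular dependence of the coupling. The sketch below assumes the standard (3cos²θ − 1)/2 dependence on the angle θ between the inter-spin vector and the magnetic field and omits all physical prefactors; the function names are illustrative.

```python
import math


def dipolar_angular_term(theta_rad):
    """Second-rank Legendre polynomial P2(cos theta) = (3 cos^2 theta - 1) / 2."""
    c = math.cos(theta_rad)
    return 0.5 * (3.0 * c * c - 1.0)


def isotropic_average(n_points=10000):
    """Average P2 over orientations uniform on the sphere (uniform in cos theta)."""
    total = 0.0
    for k in range(n_points):
        cos_theta = -1.0 + (2.0 * k + 1.0) / n_points  # midpoint grid on [-1, 1]
        total += 0.5 * (3.0 * cos_theta ** 2 - 1.0)
    return total / n_points


# Degeneracy: theta and (pi - theta) give the same angular term.
theta = math.radians(30.0)
same_coupling = (dipolar_angular_term(theta), dipolar_angular_term(math.pi - theta))

# With no preferred orientation the average vanishes (up to grid error).
unaligned = isotropic_average()
```

Because `unaligned` is effectively zero, a coupling is only observed when the alignment medium biases the orientational distribution. Because many orientations share one value of the angular term, a single coupling cannot fix the direction of a bond vector, which is why several coupling types or several alignment media are needed.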

If the structure of the molecule is known, then an expected alignment tensor

can be estimated based on the physical properties of the solute in combination

with an appropriate description of the mechanism of alignment. For purely

steric alignment, only short-range repulsive forces dependent on the size and

shape of the molecule need to be taken into account. In charged alignment me-

dia, the situation is more complicated and the electrostatic properties of both

the solute and the liquid crystal must be considered. Methods for computing the

alignment tensor in both situations have been developed190–192. Based on these,

RDCs have been used to define the relative orientations of domains of known

structure and ligand-receptor geometries, validate structures obtained using homology

modelling and refine structures determined using other experimental

observables179. The use of RDCs in ab initio structure determination is com-

plicated by the fact that the magnitude and orientation of the alignment tensor

are not known a priori179. A further limitation is their orientational degeneracy

and the resulting complexity of the energy landscape.

If the molecule or domain under investigation can be considered to be rigid,

then its preferential orientational averaging, including the effects of imperfect

alignment, can be described in terms of the alignment tensor. The measured

coupling then depends simply on the orientation of the inter-spin vector in the

eigenframe of the alignment tensor179. Such a description is seldom appropri-

ate for proteins in solution, however, as even folded globular proteins undergo

significant thermal motion so that the measured RDC incorporates both time

and ensemble conformational averaging193. Various techniques for overcoming

the aforementioned problems have been developed, mostly pertaining to the

determination of folded NS ensembles.

In the case of DS, the analysis of RDCs is further complicated by the fact

that the internal frame of reference is dynamic on the time-scale of the mea-

surement194. Initially, it was implicitly assumed that this would mean that the

RDCs measured for a random coil would be uniformly zero195, ignoring early

work showing that the ensemble of conformations sampled by a random flight

chain is not spherically symmetric196. Various studies have since confirmed this

result both theoretically195 and experimentally197. It is now well understood

that a random flight chain will give rise to a bell-shaped distribution of RDCs

throughout the sequence, due to the fact that RDCs are local probes and, at

individual loci along the chain, the distribution of orientations of the chain

segment is non-random195. As the most elongated structures align most effectively194,

the measured dipolar couplings incorporate information regarding the

range of shapes present as well as their relative weights190. RDCs therefore con-

tain a wealth of information, but methods for implementing them as restraints

in multiple-replica simulations are still probationary, especially for cases such

as DS where the alignment tensor of each replica is expected to be significantly

different. For this reason, they are used here to examine the accuracy of the

calculated ensembles in a similar manner to the 3JHNHα-couplings.

Nuclear Overhauser enhancement

In addition to the parameters discussed thus far, NMR is also capable of pro-

viding inter-atomic distances. The most common form of information used to

determine the structures of NFPs are inter-atomic distances derived from nu-

clear Overhauser enhancements (nOes), cross-relaxation effects between protons

close together in space198. Their use for DS is limited because the expanded

nature of the structures comprising DS means that non-sequential nOes, which

provide the most useful information for structure refinement, are seldom detected164,199,

as they are only sensitive to distances up to ∼5 Å. A modified

method, in which high levels of deuteration were used to increase the sensitivity

of the experiment, allowed a considerable number of long- and medium-range

nOes to be observed for one unfolded state200, but for another only medium-

range nOes were detected173. This approach also has several disadvantages164

which have precluded its widespread application.

Paramagnetic relaxation enhancement

In contrast to nOe experiments, paramagnetic relaxation enhancement

(PRE) is an NMR technique that is sensitive to distances in the range 12–20 Å,

making it particularly useful for characterising DS199,201–205. In the work de-

scribed here, long-range distances derived from PRE-NMR provide the primary

structural information for determining ensembles of structures representative of

DS of proteins.

PRE-NMR utilises the enhancement of proton relaxation by free electrons

to provide information about the distance between a paramagnetic centre and

a nuclear spin173,199,201–204,206–211. The free electron may be provided by ions

bound to native or engineered207,209,211–215 metal-binding sites, modified amino-

acids216–218 or ligands219, intrinsically paramagnetic co-factors220 or, in site-

directed spin-labelling (SDSL), by a covalently attached

spin-label173,201–204,208,210,221,222, many of which contain nitroxide moieties. The

experimental data used in Chapters 5 and 6 of this thesis were determined us-

ing SDSL with 1-oxyl-2,2,5,5-tetramethyl-3-pyrroline-3-methyl methanethiosul-

fonate (MTSL), an example of a nitroxide spin-label. The advantage of using

SDSL is that residues distributed throughout the sequence can be spin-labelled,

ensuring that distance information pertaining to the entire protein is obtained.

The contribution made by the free electron to the relaxation rate of the

amide protons in the protein of interest is defined as the difference between the

longitudinal or transverse relaxation rates measured for the paramagnetic and

diamagnetic states208. The Solomon-Bloembergen equations are then applied

to derive the $r^{-6}$ distance between each proton for which relaxation rates can

be measured and the free electron223. These equations are based on the as-

sumptions that the proton-electron vector is free to undergo isotropic rotational

diffusion and that its length is fixed199. The consequences of these assumptions

along with other aspects of the distance calculation are discussed in Chapters 4

and 5.
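One consequence of the $r^{-6}$ dependence can be shown with a small sketch, under the assumption that the distance derived for a heterogeneous ensemble behaves as an $r^{-6}$-weighted average over its members. The function name and the ten-member set of distances are hypothetical.

```python
def r6_averaged_distance(distances):
    """<r^-6>^(-1/6) over a set of electron-proton distances (units preserved)."""
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


# Hypothetical ensemble: one conformer in ten brings the proton to 10 Å from
# the spin-label; in the other nine the separation is 40 Å.
distances = [10.0] * 1 + [40.0] * 9

arithmetic_mean = sum(distances) / len(distances)
effective = r6_averaged_distance(distances)
```

Here `arithmetic_mean` is 37 Å, whereas `effective` is under 15 Å, because the single close conformer dominates the average. A PRE-derived distance is therefore best treated as a restraint on the ensemble average rather than as a distance that every conformer must satisfy.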

PRE-NMR has been used to refine the global fold of NFPs208–210, de-

termine the structure of integral membrane proteins206,217,221, follow protein-

protein211, protein-DNA207,213,214,220 and protein-ligand215,219 complex forma-

tion and characterise DS of proteins173,199,201–205,222,224. PRE effects have also

been examined using electron paramagnetic resonance (EPR) spectroscopy212,216

and solid-state NMR225. Whilst other experimental methods, notably fluores-

cence energy transfer (FET) and electron transfer (ET)226,227, are able to pro-

vide similarly long-range distance information, a typical PRE-NMR experiment

yields many more distances. This is an important prerequisite for avoiding

under-restraining in multiple-replica simulations, a matter that is discussed fur-

ther in Chapter 4.

1.3.2 Theoretical representations

The NMR techniques introduced in the previous section provide complementary

information on different aspects of protein structure and dynamics. In order to

interpret the measured observables, which are time- and ensemble-averages, a

conceptual model of the nature of DS is required. There has been much debate

about whether DS are best described as random flight chains, or whether there

is a significant amount of residual structure present.

A random flight chain describes an idealised state in which the bonds be-

tween atoms are of set length, but the angles are unconstrained, giving rise

to distributions of dihedral angles that are dependent only on local steric con-

straints228. The distributions of distances between pairs of atoms are Gaussian-

like in the limit of large sequential separations15. The global dimensions of

unfolded polypeptides measured by experiment229–233 and calculated from sim-

ulations15 agree with random coil predictions, although some IDPs, including

αS and βS, are more compact232,234,235.
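The scaling behaviour of an idealised random flight chain can be checked numerically with a minimal sketch. It uses a freely jointed chain with fixed bond length and no steric constraints at all, which is a deliberate simplification of the model described above; the bond length of 3.8 Å, the chain length and the function names are assumptions for illustration.

```python
import math
import random


def random_unit_vector(rng):
    """Direction drawn uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def end_to_end_sq(n_bonds, bond_length, rng):
    """Squared end-to-end distance of one freely jointed chain."""
    x = y = z = 0.0
    for _ in range(n_bonds):
        ux, uy, uz = random_unit_vector(rng)
        x += bond_length * ux
        y += bond_length * uy
        z += bond_length * uz
    return x * x + y * y + z * z


def mean_square_end_to_end(n_chains, n_bonds, bond_length=3.8, seed=0):
    """Ensemble average of the squared end-to-end distance."""
    rng = random.Random(seed)
    total = sum(end_to_end_sq(n_bonds, bond_length, rng) for _ in range(n_chains))
    return total / n_chains


simulated = mean_square_end_to_end(n_chains=2000, n_bonds=100)
ideal = 100 * 3.8 ** 2  # N * b^2 expected for a freely jointed chain
```

The simulated mean square distance should approach the ideal value N·b² as the number of chains grows. Deviations of real disordered proteins from such scaling, for example the greater compactness of some IDPs, are what the experimental measurements cited above detect.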

In apparent conflict with these results, experimental techniques, in particular

NMR, that give site-specific conformational information, suggest that disordered

states are not completely devoid of

structure31,160,170,171,174–176,178,199,202–205,224,232,236–254, giving rise to the so-

called “reconciliation problem” of how to explain the simultaneous existence

of random coil scaling behaviour and a significant amount of local

structure12,13,233,236,246,255,256.

The solution appears to lie in the fact that the overall dimensions of a

polypeptide chain are relatively insensitive to either local conformational prefer-

ences or more global changes in the distribution of dihedral angles15. Theoreti-

cal studies have shown that random coil-like global dimensions do not preclude

the presence of some degree of residual structure257–259. As an extreme exam-

ple, ensembles of conformations constructed by introducing joints at random

into the structures of NFPs were shown to reproduce adequately the random

coil scaling of the $\langle R_g^2 \rangle^{1/2}$ 258.

Whilst these observations explain the apparent discrepancy between random

coil dimensions and the existence of local structural preferences, the origin of

the residual structure observed for many DS remains to be accounted for. The-

oretical models have been developed that explain the limited menu of observed

protein folds in terms of symmetry and geometric considerations260–263, but

these have not been explicitly extended to apply to DS. One suggestion perti-

nent to DS is that the residual structure resides predominantly in hydrophobic

clusters170,247,253,254. It has also been proposed that steric repulsion among

side-chains may favour native-like topology in unfolded states236. Computational

studies have failed to provide unequivocal support for such an effect,

however15,256, even suggesting that the dihedral angle distributions undergo

a quantifiable shift upon folding264.

Further insight into whether dihedral angle preferences can explain the devi-

ations from random coil behaviour suggested by many experimental observables

has been gained from models of DS in which ensembles of structures are gener-

ated by selecting the dihedral angles from coil library databases265,266. These

databases describe the amino-acid specific probabilities of each φ/ψ combina-

tion for residues in loop regions of high-resolution X-ray structures. Whilst it

is not clear whether the analogy between such regions and DS is appropriate,

such models have been remarkably successful in reproducing the residue-level

patterns of experimental observables such as RDCs257,266 and 3J-couplings172.

Inclusion of nearest neighbour effects on the dihedral angle preferences257,267

improves the agreement with experimental data. However, in some cases it has

been found that additional information is required in order to explain the exper-

imental observations. For instance, although both the bulkiness of the amino

acids and the RDCs predicted from a coil library ensemble correlate well with

the RDCs measured for the urea-denatured state of αS268, it is necessary to

enforce long-range interactions between the N- and C-termini to obtain a good

match with the experimental data for the unperturbed solution-state ensem-

ble266. On the other hand, when the effects of electrostatic interactions with

charged alignment media are taken into account, the inclusion of long-range in-

teractions no longer improves the predicted RDCs269. To further investigate the

appropriateness of such models as a description of DS ensembles, coil library

ensembles obtained for αS and PI3-SH3 are analysed in Chapters 3, 5 and 6

alongside the ensembles produced in those chapters.

1.3.3 Biomolecular simulations

Computer simulations of biological molecules provide a link between theory and

experiment. The information available from experimental measurements is com-

plemented by the provision of distributions of the properties of interest as well

as atomic-level structural detail270. The ability to visualise protein structures

and their motions has greatly enhanced our understanding of the mechanisms of

protein folding and aggregation by providing a conceptual link between the ab-

stract chemistry of the amino acid sequence and the biology of the 3D fold. The

distributions of observables accessible by simulation are particularly important

for DS, where a broad and heterogeneous range of conformations contribute

to the time- and ensemble-averaged experimental observables. In such cases,

the relationship between an observable and the underlying distribution is far

from simple. Simulations such as those described in this thesis are therefore

indispensable tools with which to interpret experimental data.

In order to carry out simulations, the molecule(s) of interest must be repre-

sented in silico. This is done using molecular mechanics force-fields270–274 which

comprise various terms describing protein geometry. The functional form of the

potential energy of a given conformation is a sum of individual energy terms:

E = Ecovalent + Enon−covalent. The covalent term includes contributions from

the bond lengths, bond angles and torsion angles (dihedral and improper) and

the non-covalent term comprises the van der Waals (vdw) interactions between

non-bonded pairs and electrostatic interactions between partial charges. Hy-

drogen bonding is often implicitly included in the non-bonded interactions271.

Unlike the remainder of the terms, which provide the energy of a protein in vac-

uum, the electrostatic interactions depend on the environment of the atom(s) in

question. Both theoretical considerations and simulation results indicate that

the effective energy hypersurface of a protein, which includes the effects of sol-

vent, is significantly different from the intramolecular energy hypersurface275.

It is therefore desirable to include solvent in biomolecular simulations, although

the random coil model used throughout this work, which comprises only a sim-

plified representation of the polypeptide chain, is found to afford a reasonable

approximation of a fully unfolded state that is computationally efficient.

Solvent models

So-called ‘explicit solvent’, in which the solvent molecules are simulated along

with the biomolecule of interest, provides the most exact representation of the

solvent environment270,276,277. The very large number of solvent molecules re-

quired to model bulk solution, combined with the expanded structures typical

of DS and the long simulation times required to sample the large regions of

conformational space accessible to DS are expected to make explicit solvent un-

suitable for the simulation of DS, a premise that is confirmed by the limited

testing of this technique reported in Chapter 3. For this reason, implicit solvent

models were used for the remainder of the simulations in this work.

The computational expense of simulating large biomolecules in explicit solvent has led to the development of various implicit solvent models that trade accuracy for speed277. These are generally classified as either empirical or

continuum electrostatics solvation models276,278,279, depending on the theoreti-

cal approaches used to describe the solvation. Essentially, a solvation correction

is combined with the usual molecular mechanical force-fields describing the in-

tramolecular interactions in vacuum280. The influence of solvent is expressed

within a theoretical framework based on a statistical mechanical formulation of

the so-called ‘potential of mean force’ (PMF)276, whereby the free energy of

the system is expressed as an average over all solvent degrees of freedom280.

As well as increasing the speed of the calculations, this mean field approximation reduces the need for long simulation times to adequately sample

the instantaneous solute-solvent interactions277. The significant enhancement

of computational efficiency provided by implicit solvent models is particularly

beneficial for the simulation of DS, as it allows a wide range of structures to be

sampled within a reasonable time-frame, although the kinetic behaviour may

be unrealistic277,281.

Implicit solvent models have been widely used for a range of applications

including scoring functions for distinguishing native structures from non-native

decoys282–287 and mis-folded structures288,289, the calculation of binding free en-

ergies for protein-protein and protein-ligand interactions290–294, molecular dy-

namics simulations of folding and unfolding trajectories281,295–306 and the pro-

cess of aggregation307,308 and the determination of folding landscapes281,309–323.

They have also been used in biased MD simulations, often in combination

with experimental data, for the refinement of native and near-native struc-

tures287,324,325 and the generation of transition46,298, intermediate326, molten

globule327 and disordered state ensembles201,202,204,328.

One disadvantage of using implicit solvent models is that they are parame-

terised to reproduce the compact globular structures typical of NFPs, thus when

characterising partially or fully unfolded states, artificial means of overcoming

this bias towards compact structures, such as carrying out the simulations at

unphysically high temperatures, are required. As is found in Chapter 3, this

may reduce the quality of the description of the protein-like features of the

molecule, thus methods have been developed for including experimental data

as restraints (see below). The improvement and application of one of these

techniques, ERMD, forms the basis of Chapters 4, 5 and 6.

1.3.4 Ensemble-restrained molecular dynamics

Restrained MD simulations provide a means of overcoming force-field inaccura-

cies and alleviating statistical sampling errors by biasing the trajectories towards

experimentally relevant areas of conformational phase space. When incorporating experimental data into simulations, it is essential to consider the fact that

experimental observables are averages over the duration of the experiment and

the ensemble of molecules present329–331. This is particularly important for DS,

where an average structure is unlikely to be representative. In fact, it may be

physically impossible to find a single structure that satisfies all experimental

observables simultaneously330.

Various methods have been suggested for taking experimental averaging into

account when implementing restraints in MD and Monte Carlo (MC) simula-

tions. One technique is to apply a restraining force if the average of an observable

over a predetermined time-window prior to the current time does not satisfy the

restraint332–334. An alternative approach is ERMD, in which multiple copies

of a molecule are simulated in parallel and the restraint enforced upon the

ensemble average of the observable at each point in time204,328,330,331,335–344.

Simultaneous time- and ensemble-averaging has also been used345. A range of different protein states has been characterised by ERMD, including disordered201,202,204,205, intermediate, transition and folded states204,326,337,339–341,346,347.

The ability of ERMD to generate an ensemble of structures that, on average,

satisfies the experimental data is the key to its usefulness for characterising DS

of proteins, for which an average structure does not provide an adequate rep-

resentation. However there remain many issues pertaining to the relationship

between averages and distributions. These are discussed in Chapter 4 and solu-

tions are proposed and tested using synthetic restraints prior to the application

of ERMD with experimental data that forms the basis of Chapters 5 and 6.

1.4 Overview

The results reported in this thesis commence with a thorough investigation of

the ability of unrestrained MD simulations to produce ensembles of structures

representative of DS of proteins using the IDP αS as a model system (Chap-

ter 3). The calculated ensembles are compared with the experimental data

available for αS, firstly in terms of the global dimensions and then with re-

spect to observables that provide more detailed structural information. The

best method identified from these trials is used to generate two reference en-

sembles from which synthetic distance restraints equivalent to those obtained

from a PRE-NMR experiment are calculated. These restraints are used in a

series of tests in which the previously published ERMD method201,202,205 is im-

proved, making it generally applicable to any DS (Chapter 4). The changes

that are made are justified according to how well the reference ensembles are

reconstructed. The resulting protocol is used to produce ensembles of structures

representative of the IDPs αS, βS, the artificial construct β+HC (Chapter 5)

and the acid-denatured state of PI3-SH3 (Chapter 6). Interpretation of these

ensembles with recourse to the experimental data for each protein provides in-

sight into the factors that govern the balance between folding, mis-folding and

intrinsic disorder.

Chapter 2

Methods

2.1 Simulation methods

All simulations were carried out within the charmm molecular simulation pack-

age (v. c32a2)271. Where more than one copy of the molecule was simulated

in parallel an in-house version of charmm that has been modified to allow re-

straints to be applied across multiple replicas (ensemble-charmm) was used.

Newtonian dynamics were used, and the Nosé-Hoover thermostat348,349 was

employed to ensure that the kinetic energy was compatible with the desired

temperature. Bond lengths were constrained with the shake algorithm350, al-

lowing for an integration timestep of 2 fs. The starting structures for each

protein were generated by building the coordinates for a linear structure from

the amino acid sequence, minimising the energy, running a high temperature

(500 K) simulation with the eef1280 implicit solvent model, and selecting at

random a set of relatively expanded structures. The final ensemble for each sim-

ulation was obtained by pooling together all of the structures obtained during

the production phase; if multiple replicas were used, these were pooled as well.

2.1.1 Unrestrained simulations

Random coil model

A random coil model for each protein (Chapters 3, 5 and 6) was produced

using the charmm19 polar hydrogen representation with the non-bonded inter-

actions truncated so that only the repulsive part of the Lennard-Jones potential

remained (CUTNB 6.0 CTOFNB 3.5 CTONNB 3.0). The simulations were run

in vacuum and electrostatic interactions were ignored. The simulation temper-

ature, T , was typically 500 − 600 K to enhance the rate of sampling, but the

nature of the resulting ensemble was similar at lower T . The coordinates were

saved every 20 ps for 200 ns, giving 10 000 structures in total.

Explicit water

Simulations of αS in explicit water (Chapter 3) were carried out using the

charmm22351 all-atom potential for the protein and the TIP3P water model352

for the solvent. Periodic boundary conditions were used with a cutoff of 14 Å on

the non-bonded interactions. A water box of dimensions 58 × 68 × 68 Å, large

enough to avoid self-interaction of a reasonably expanded αS structure, was

built by translation of a previously equilibrated box. The energy of the starting

structure was minimised prior to insertion in the water box. Once solvated,

the protein was first equilibrated at 300 K for 5 ps with a harmonic restraint

on the positions of all atoms. The force constant was then reduced from its

starting value of 10 kcal·mol−1·Å−2 in a series of steps consisting of 50 ps with

a force constant of 1.0 kcal·mol−1·Å−2 on all atoms, 25 ps with a force constant

of 0.5 kcal·mol−1·Å−2 applied to backbone atoms only, and finally 20 ps with

no restraints. The temperature was then increased to 330 K in 5 K increments

(5 ps per increment). 330 K rather than 300 K was used to increase the rate of

conformational sampling. After further equilibration for 40 ps at 330 K without

restraints, structures were collected every 2 ps for 1 ns. The lengths of covalent

bonds to hydrogen atoms were constrained with shake throughout to prevent

the energy change between progressive integration steps exceeding 20%.

CHARMM generalised Born

Simulations of αS with the generalised Born/surface area (GB/SA) solvation

model (Chapter 3) were carried out using the charmm ‘gbsw’ module, which im-

plements a simple switching function to smooth the electrostatic and non-polar

solvation energy and forces at the boundary353,354. Both the charmm19280 and

charmm22351 representations were tested, with similar results; only those per-

taining to the charmm22 are reported here. Default settings for the integration

parameters, grid spacing, and Coulomb field settings were used. The Born radii

were updated at every integration step.

In the gbsw implementation, the non-polar solvation contribution is con-

sidered only when a non-zero SGAMMA is issued. A zero SGAMMA was

tested along with the default value of 0.03 kcal·mol−1·Å−2 after preliminary

simulations using amber showed that eliminating the surface tension term ap-

peared to reduce the bias towards collapsed structures (data not shown). The

change was justified on the grounds that this term was parameterised for na-

tively folded proteins for which a large proportion of the surface of the polypep-

tide chain is buried. Simulations were carried out at 300, 350, 400, 500 and

600 K with SGAMMA = 0.03 kcal·mol−1·Å−2 and at 300 and 350 K with

SGAMMA = 0.00 kcal·mol−1·Å−2. 20 independent replicas were simulated in

parallel to enhance the conformational sampling. The starting structures were

minimised in GB/SA prior to starting the simulation. The molecules were first

heated to the desired temperature in 50 K increments (10 ps per increment),

then equilibrated for 0.2 ns before collecting coordinates every 5 ps for further

analysis.

SASA and EEF1

All simulations using the sasa355 and eef1280 implicit solvent models (Chap-

ters 3, 4, 5 and 6) were carried out using the charmm19280 representation, for

which they were exclusively parameterised. The default cutoffs for non-bonded

and electrostatic interactions were used. Periodic boundary conditions were used

with the sasa model because the polypeptide undergoes marked translation in

this system. Multiple independent replicas, typically 16 − 24, were simulated

in parallel to facilitate conformational sampling. The system was first heated

to the desired temperature in 50 K increments (10 ps per temperature), then

equilibrated briefly (0.2 ns) before collecting coordinates every 5 ps for further

analysis.

Reference ensembles

Two different (unrestrained) αS reference ensembles (Chapter 4) were gen-

erated using the eef1280 implicit solvent model as described above. The first,

REF23, was generated at 540 K using 20 independent replicas and the second,

REF20, at 505 K using 16 independent replicas. Structures were collected every

5 ps (2500 steps) for 20 ns per replica, giving a total of 400 ns, or 80 000 struc-

tures for REF23 and 320 ns, or 64 000 structures for REF20. REF23 was filtered

to increase the degree of residual structure by selecting only those structures

with more than 15 contacts between the NAC region (residues 61− 95) and the

C-terminus (residues 110−140) (see section 4.4). Two residues were considered

to be in contact if their Cα atoms were within 8.5 Å205. The final ensemble

consisted of 23 675 structures.
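The structure counts quoted above follow directly from the sampling interval, and the contact filter applied to REF23 can be sketched as follows. This is a minimal illustration and not the analysis code used in this work: the function names, and the representation of a structure as a dictionary mapping residue number to Cα coordinates, are assumptions made for the example.

```python
import math

def n_structures(n_replicas, ns_per_replica, save_interval_ps):
    """Number of saved structures for a set of replicas."""
    # round() guards against floating-point error in the division
    return n_replicas * round(ns_per_replica * 1000 / save_interval_ps)

def count_contacts(ca_coords, region_a, region_b, cutoff=8.5):
    """Count residue pairs (one from each region) whose C-alpha atoms lie
    within `cutoff` angstroms. `ca_coords` maps residue number to an
    (x, y, z) tuple; this layout is assumed for illustration."""
    return sum(
        1
        for i in region_a
        for j in region_b
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    )

NAC = range(61, 96)       # residues 61-95
C_TERM = range(110, 141)  # residues 110-140

def passes_filter(ca_coords, min_contacts=15):
    """Keep structures with more than `min_contacts` NAC/C-terminal contacts."""
    return count_contacts(ca_coords, NAC, C_TERM) > min_contacts

print(n_structures(20, 20, 5))  # REF23 before filtering: 80000
print(n_structures(16, 20, 5))  # REF20: 64000
```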

2.1.2 ERMD

In ERMD (Chapters 4, 5 and 6), the restraints are applied to multiple indepen-

dent replicas simulated in parallel204,328,330,331,335–344. A reaction coordinate, ρ,

is defined as the difference between the current average of each observable across

all replicas, $f^{\mathrm{calc}}_l$, and the restraint, $f^{\mathrm{ref}}_l$, averaged over all $N_{\mathrm{restr}}$ restraints:

$$\rho(t) = N_{\mathrm{restr}}^{-1} \sum_{l=1}^{N_{\mathrm{restr}}} \left( f^{\mathrm{ref}}_l - f^{\mathrm{calc}}_l(t) \right)^2. \tag{2.1}$$

When the restraints are distances derived from PRE-NMR experiments

(PRE-ERMD),

$$f^{\mathrm{calc}}_l(t) = d^{\mathrm{calc}}_{ij}(t) = \left( N_{\mathrm{rep}}^{-1} \sum_{k=1}^{N_{\mathrm{rep}}} r_{ij,k}^{-6}(t) \right)^{-1/6}, \tag{2.2}$$

where rij,k(t) is the distance between residues i and j calculated from replica

k of the restrained ensemble at time t and Nrep is the number of replicas. r−6

averaging is used because the distances calculated from the PRE experiment

are r−6 averages.
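The effect of r−6 averaging in equations 2.1 and 2.2 can be illustrated with a short sketch. The function names and the list-based data layout are assumptions made for this example; the restraints themselves are implemented within ensemble-charmm, not in Python.

```python
def r6_average(distances):
    """r^-6 average of one inter-residue distance over replicas (eq. 2.2)."""
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)

def reaction_coordinate(d_ref, replica_distances):
    """Mean squared deviation between the reference distances and the
    r^-6-averaged calculated distances (eq. 2.1).
    `replica_distances[l]` lists the distance for restraint l in each replica."""
    d_calc = [r6_average(rs) for rs in replica_distances]
    return sum((ref - calc) ** 2 for ref, calc in zip(d_ref, d_calc)) / len(d_ref)

# Two replicas at 10 and 30 angstroms: the r^-6 average is dominated by the
# shorter distance and lies well below the arithmetic mean of 20.
print(r6_average([10.0, 30.0]))
```

Because short distances dominate the average, a small population of compact structures can satisfy a restraint even when most replicas are expanded.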

In the work constituting this thesis, $f^{\mathrm{ref}}_l = d^{\mathrm{ref}}_{ij}$ was either the ensemble-averaged distance calculated from one of the reference ensembles according to

equation 2.8 (Chapter 4) or the distance calculated from the experimental data

(equations 2.6 and 2.7 below) (Chapters 5 and 6). The distance between residues

i and j was defined as being between the Cα atom of the spin-labelled residue

i and the amide hydrogen of residue j. The reasons for this choice are outlined

in Chapter 4.

During PRE-ERMD simulations, dcalcij (t) is allowed to vary freely within

a harmonic square well defined by the lower (L) and upper (U) boundaries.

Justification for the use of these boundaries and the values of L and U chosen

for use in the general ERMD method developed here is given in Chapters 4

and 5.

To enforce the restraint, an energy penalty of the form

$$\frac{\alpha N_{\mathrm{rep}}}{2} \left( \rho(t) - \rho_0(t) \right)^2 \tag{2.3}$$

is added to the potential energy if ρ(t) > ρ0(t), where

ρ0(t) = min[ρ(τ)] (0 ≤ τ ≤ t) (2.4)

and α is a force constant associated with the restraints. In this way, as the

simulation proceeds, the ensemble of structures is progressively biased towards

structures that, on average, satisfy the restraints.
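The interplay between the penalty of equation 2.3 and the running minimum of equation 2.4 can be sketched as follows. This is an illustrative reading of the two equations applied to a precomputed series of ρ values; the function names are assumptions, and forces (the derivatives of this energy) are not shown.

```python
def restraint_energy(rho, rho_min, alpha, n_rep):
    """Energy penalty of eq. 2.3; zero unless rho exceeds its running minimum."""
    if rho <= rho_min:
        return 0.0
    return 0.5 * alpha * n_rep * (rho - rho_min) ** 2

def run_penalties(rho_trajectory, alpha, n_rep):
    """Apply eqs. 2.3 and 2.4 along a series of rho(t) values."""
    rho_min = float("inf")
    energies = []
    for rho in rho_trajectory:
        rho_min = min(rho_min, rho)  # rho_0(t) = min over 0 <= tau <= t
        energies.append(restraint_energy(rho, rho_min, alpha, n_rep))
    return energies

# Only the third step, where rho rises above its previous minimum, is penalised.
print(run_penalties([4.0, 3.0, 3.5, 2.0], alpha=2.0, n_rep=10))
```

The penalty therefore acts as a ratchet: improvements in ρ are accepted freely, whereas any move away from the best agreement reached so far is resisted.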

When the Rg was restrained (Chapter 3), $f^{\mathrm{ref}}_l$ was the desired Rg and

$$f^{\mathrm{calc}}_l = N_{\mathrm{rep}}^{-1} \sum_{k=1}^{N_{\mathrm{rep}}} R_{g,k}(t), \tag{2.5}$$

where Rg,k(t) was the Rg of replica k at time t.

The PRE-ERMD described in Chapters 4, 5 and 6 was carried out using the

sasa355 implicit solvation model. An extra phase was included immediately

after the heating stage during which α was increased from its starting value of

500 to its final value (Table 2.1) by a factor of 3 every 10 ps. Nrep, L, U and T

were varied as discussed in Chapter 4.
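The ramping of α can be written out explicitly. The sketch below assumes that the first threefold increase occurs 10 ps after the start of the ramp; the function name and the return format are likewise assumptions. Under this reading, the final values of α listed in Table 2.1 correspond to six and five threefold increases from 500, respectively.

```python
def alpha_schedule(alpha_start, alpha_final, factor=3, interval_ps=10):
    """Return (time in ps, alpha) pairs for a geometric ramp of alpha."""
    schedule = [(0, alpha_start)]
    alpha, t = alpha_start, 0
    while alpha < alpha_final:
        alpha = min(alpha * factor, alpha_final)  # never overshoot the target
        t += interval_ps
        schedule.append((t, alpha))
    return schedule

for t, alpha in alpha_schedule(500, 364500):
    print(t, alpha)
```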

2.2 Restraints for PRE-ERMD

2.2.1 Calculation of distances from experimental data

The PRE-NMR data for αS205 used in Chapter 5 were obtained from M.M. Ded-

mon, including data for an additional spin-label attached to residue N122 which

gave rise to a further 117 distance restraints. PRE-NMR experiments were con-

ducted for βS and β+HC (Chapter 5) by R.C. Rivers and on PI3-SH3 (Chap-

ter 6) by N.R. Birkett. Individual residues throughout the sequence of each

protein were mutated to cysteine for attachment of the paramagnetic spin-label

MTSL. The locations of the spin-labels were chosen so as to minimise the per-

turbation of any residual structure predicted on the basis of ∆δ and, for βS

and β+HC, to match the previous PRE-NMR analysis of αS. The identity of

the spin-labelled residues and the total number of distance restraints for each

protein are shown in Table 2.1.

The 1H-15N heteronuclear single quantum coherence (HSQC) spectra of the

labelled protein were recorded with the spin-label in its oxidised (paramagnetic)

and reduced (diamagnetic) states. The PRE due to the presence of a free

electron was quantified by the intensity ratio, Iox/Ired, which compares the

intensity (height) of the cross-peaks in the oxidised (Iox) and reduced (Ired)

states.

The paramagnetic relaxation enhancement, $R_2^{\mathrm{sp}}$, was determined by fitting199,208

$$\frac{I_{\mathrm{ox}}}{I_{\mathrm{red}}} = \frac{R_2 \exp(-R_2^{\mathrm{sp}} t)}{R_2 + R_2^{\mathrm{sp}}}, \tag{2.6}$$

where t is the total INEPT delay time (15.72 ms). R2, the intrinsic transverse

relaxation rate, was assumed to be equal to the R2 of the diamagnetic sam-

ple and was estimated for each residue from the half-height linewidth assuming

Lorentzian line shapes. The electron-proton distance was then calculated according to199,208

$$r = \left[ \frac{K}{R_2^{\mathrm{sp}}} \left( 4\tau_c + \frac{3\tau_c}{1 + \omega_H^2 \tau_c^2} \right) \right]^{1/6}, \tag{2.7}$$

where ωH is the Larmor frequency of the proton and K is a combination of

physical constants. τc is a correlation time that is discussed in more detail in

Chapter 5. The set of distances obtained in this manner was analysed as described in Section 5.2.3 to account for experimental uncertainty and imprecision

arising from the nature of equation 2.7. Every 5th distance was excluded from

the working dataset to form a ‘free’ dataset for cross-validation.
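The conversion of an intensity ratio into a distance via equations 2.6 and 2.7 can be sketched as follows. Equation 2.6 has no closed-form inverse, so the sketch solves it numerically by bisection; this choice of solver is an assumption and not necessarily the fitting procedure used here. The default value of K (1.23 × 10−32 cm6·s−2, a value commonly quoted for a nitroxide-proton pair) and the example values of R2, τc and ωH are also assumptions made for illustration.

```python
import math

def solve_r2sp(ratio, r2, t=15.72e-3, hi=1.0e6):
    """Solve eq. 2.6 for R2sp (s^-1) by bisection.
    The right-hand side falls monotonically from 1 as R2sp increases."""
    def f(r2sp):
        return r2 * math.exp(-r2sp * t) / (r2 + r2sp) - ratio
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid  # predicted ratio still too high: more enhancement needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pre_distance(r2sp, tau_c, omega_h, k=1.23e-32):
    """Eq. 2.7. With K in cm^6 s^-2 (assumed value) the result is in cm."""
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)
    return (k / r2sp * spectral) ** (1.0 / 6.0)

# Hypothetical inputs: R2 = 15 s^-1, tau_c = 4 ns, 600 MHz proton frequency.
r2sp = solve_r2sp(0.5, r2=15.0)
d_cm = pre_distance(r2sp, tau_c=4.0e-9, omega_h=2.0 * math.pi * 600.0e6)
print(r2sp, d_cm * 1.0e8)  # distance converted from cm to angstroms
```

The sixth-root dependence means that even a substantial error in R2sp or τc produces only a modest error in the distance, except where the intensity ratio approaches 0 or 1.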

2.2.2 Calculation of distances from reference ensembles

Synthetic αS distance restraints were calculated from REF20 and REF23 (Chap-

ter 4) so as to be analogous to the ‘PRE’ distances that would be obtained from

a PRE-NMR experiment. 8 residues distributed throughout the αS sequence

were selected to be ‘spin-labelled’. The r−6-averaged distances between the Cα

atom of each ‘spin-labelled’ residue i and all non-adjacent amide hydrogens on

residues j were calculated from the Nref structures comprising each reference

ensemble according to

$$d^{\mathrm{ref}}_{ij} = \left( N_{\mathrm{ref}}^{-1} \sum_{k=1}^{N_{\mathrm{ref}}} r_{ij,k}^{-6} \right)^{-1/6}, \tag{2.8}$$

giving 1000 restraints in total. This number corresponds to the upper limit on

the number of distances that can typically be determined experimentally. A

‘free’ dataset, consisting of a further 1000 distances, was also calculated for use

in cross-validation. r−6 averaging was used because the distances calculated

from the PRE experiment are r−6 averages.

2.2.3 Accounting for uncertainty in PRE distance restraints

The effect of uncertainty in Iox/Ired on the calculated distance was quantified

(Chapter 5) by calculating the distances corresponding to Iox/Ired ranging from

0 − 1, and then repeating the calculations with the Iox/Ired altered by ± 1, 5,

10 or 15%. To ensure that the calculated distances were physically reasonable,

the remaining parameters required for the distance calculation were the same

as for the calculation of distances from experimental Iox/Ired except that the

R2 used was the average over all residues in the sequence.

Table 2.1: The residues to which spin-labels were attached, the total number

of PRE distance restraints (NPRE) and the value of α used in the ERMD of

αS, βS, β+HC (Chapter 5) and PI3-SH3 (Chapter 6). αS(REF) refers to the

synthetic data back-calculated from REF20 and REF23 (Chapter 4); in all other

cases the data were obtained experimentally.

Protein    Spin-label positions                      NPRE   α
αS(REF)    A17 K34 G51 G68 A85 K102 D119 Y136        1000   364 500
αS         Q24 S42 Q62 S87 N103 N122                  595   364 500
βS         A30 S42 S64 F89 A102 S118 A134             635   364 500
β+HC       A30 S42 S64 A113 A145                      578   364 500
PI3-SH3    M1 S2 L11 L24 L40 S43 E52 E61 G78 P84      639   121 500

The assignment of L and U was carried out in a similar manner for the exper-

imental (Chapters 5 and 6) and synthetic (Chapter 4) distance restraints. The

nature of equations 2.6 and 2.7 means that for high Iox/Ired, a small change in

Iox/Ired results in a large change in the calculated distance. For the experimen-

tal data, Iox/Ired > 0.85 were used as “negative” restraints206,208 by assigning

only a lower bound corresponding to $d^{0.85}_{ij} - L$, where $d^{0.85}_{ij}$ is the distance calculated from Iox/Ired = 0.85. For the synthetic data, the distance corresponding

to Iox/Ired = 0.85 was used as the upper limit.

As a general rule, Iox/Ired < 0.15 are unreliable206,208, as any experimental

uncertainty is large relative to the size of the measured Iox/Ired. Distances

calculated from experimental Iox/Ired < 0.15 and synthetic distances for which

$d_{ij} < d^{0.15}_{ij}$ were therefore assigned only an upper bound corresponding to $d^{0.15}_{ij} + U$, where $d^{0.15}_{ij}$ is the distance calculated from Iox/Ired = 0.15. The exact values

of L and U were varied as discussed in Chapter 4.

2.3 Analysis methods

2.3.1 Back-calculation of experimental observables

Rg and Rh

The geometric Rg was calculated from the heavy atoms of each structure

using charmm analysis facilities. During the development of the PRE-ERMD

method using synthetic data (Chapter 4), the ensembles were compared in terms

of the linearly averaged Rg. When experimental restraints were used (Chapters 5 and 6), the $\langle R_h^{-1} \rangle^{-1}$ of each ensemble was computed for comparison with

the experimental value. The harmonic mean was used to reflect the averaging

inherent in the experimental measurement. For each protein, the Rh of ∼ 200

structures of varying degrees of compactness was computed using hydropro356.

Default settings were used with six sizes of minibead ranging from 1.8 − 2.8 Å.

The molecular weight and partial specific volume were evaluated from the amino

acid sequence. The relationship between $R_g^{-1}$ and $R_h^{-1}$ was parameterised by

linear regression.

When analysing the large ensembles representative of DS, the geometric Rg

of each structure was converted into an Rh according to the relevant equation

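for that protein (equations 5.3, 5.4, 5.4 and 6.1). The overall $\langle R_h^{-1} \rangle^{-1}$ was then computed according to

$$\langle R_h^{-1} \rangle^{-1} = \left( N_{\mathrm{struct}}^{-1} \sum_{k=1}^{N_{\mathrm{struct}}} R_{h,k}^{-1} \right)^{-1}, \tag{2.9}$$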

where Nstruct was the number of structures in the ensemble.
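The two-stage procedure described above, a linear fit of 1/Rh against 1/Rg followed by harmonic averaging (equation 2.9), can be sketched as follows. The function names are assumptions, and the slope and intercept used in the example are placeholders rather than the fitted parameters reported in Chapters 5 and 6.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def harmonic_mean_rh(rg_values, slope, intercept):
    """Convert each Rg to 1/Rh with the fitted line, then apply eq. 2.9."""
    inv_rh = [slope / rg + intercept for rg in rg_values]
    return len(inv_rh) / sum(inv_rh)

# Placeholder calibration data standing in for the ~200 hydropro structures.
inv_rg = [1 / 20.0, 1 / 30.0, 1 / 40.0]
inv_rh = [0.8 * x + 0.01 for x in inv_rg]
slope, intercept = fit_line(inv_rg, inv_rh)
print(harmonic_mean_rh([22.0, 31.0, 45.0], slope, intercept))
```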

3JHNHα-couplings

Of the various types of 3J-couplings able to be obtained experimentally, only 3JHNHα-couplings have been measured for αS and βS (Chapters 3 and 5), but

these cannot be calculated directly from the coordinates of structures obtained

using the charmm19 representation because the Hα atoms are not explicitly

represented. They were therefore calculated indirectly by computing the φ angle

for a given residue, m, from the atomic coordinates of each structure, k, and

then applying the Karplus357 relationship358

$${}^3J_{\mathrm{HNH}\alpha,m,k} = 6.4 \cos^2(\phi_{m,k}) - 1.4 \cos(\phi_{m,k}) + 1.9. \tag{2.10}$$

The couplings obtained in this manner were linearly averaged over all Nstruct

structures for each residue:

$${}^3J_{\mathrm{HNH}\alpha,m} = N_{\mathrm{struct}}^{-1} \sum_{k=1}^{N_{\mathrm{struct}}} {}^3J_{\mathrm{HNH}\alpha,m,k}. \tag{2.11}$$
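Equations 2.10 and 2.11 can be sketched as follows, using the coefficients exactly as written above. The `offset_deg` argument is not part of the equation as stated; it is included as an assumption because some Karplus parameterisations evaluate the cosine at a shifted angle (for example φ − 60°), so that either convention can be tested.

```python
import math

def karplus_3j(phi_deg, a=6.4, b=1.4, c=1.9, offset_deg=0.0):
    """3J(HN-Halpha) coupling from a phi angle in degrees (eq. 2.10).
    offset_deg = 0 reproduces the equation as written in the text."""
    theta = math.radians(phi_deg + offset_deg)
    return a * math.cos(theta) ** 2 - b * math.cos(theta) + c

def mean_3j(phi_values_deg, **kwargs):
    """Linear average of the couplings over all structures (eq. 2.11)."""
    return sum(karplus_3j(p, **kwargs) for p in phi_values_deg) / len(phi_values_deg)

# Averaging the couplings differs from applying the relationship to the
# mean angle: compare two structures at 0 and 180 degrees with one at 90.
print(mean_3j([0.0, 180.0]), karplus_3j(90.0))
```

This distinction is the reason the coupling is computed for each structure before averaging, in place of a single evaluation at an average φ.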

RDCs

Amide NH RDCs were calculated for αS (Chapters 3 and 5), βS (Chapter 5)

and PI3-SH3 (Chapter 6) using the steric version of the program pales190

with default settings and PDB format input files. The RDCs for each residue

(excluding P), m, of each structure, k, RDCm,k, were linearly averaged over the

ensemble of structures according to

$$\mathrm{RDC}_m = N_{\mathrm{struct}}^{-1} \sum_{k=1}^{N_{\mathrm{struct}}} \mathrm{RDC}_{m,k}. \tag{2.12}$$

The back-calculated RDCs were not scaled as the magnitude was already

similar to that of the experimental data and any discrepancies were not uni-

formly distributed along the sequence, so that scaling by a uniform factor did

not improve the agreement.

2.3.2 Statistics

〈Rg〉 (t)

The cumulative average of the Rg, 〈Rg〉 (t) (Chapter 3) was calculated ac-

cording to

$$\langle R_g \rangle (t) = N_t^{-1} \sum_{\tau=1}^{N_t} R_g(\tau), \tag{2.13}$$

where Nt was the number of structures collected at time t. Where multiple

replicas were run in parallel, 〈Rg〉 (t) was calculated separately for each replica.
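A cumulative average of this kind (equation 2.13) can be computed in a single pass over the Rg values of one replica. The sketch below is illustrative only and the function name is an assumption.

```python
from itertools import accumulate

def cumulative_average(values):
    """Running mean of a series: element n is the mean of the first n values."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]

print(cumulative_average([10.0, 20.0, 30.0]))  # [10.0, 15.0, 20.0]
```

A cumulative average that has stopped drifting is a necessary, though not sufficient, indication that the Rg has converged.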

Q values

The agreement between the synthetic or experimental observables and those

back-calculated from a calculated ensemble was quantified with a “quality fac-

tor”359:

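$$Q = \frac{\left( \sum_{l=1}^{N_{\mathrm{obs}}} \left( f^{\mathrm{ref}}_l - f^{\mathrm{calc}}_l \right)^2 \right)^{1/2}}{\left( \sum_{l=1}^{N_{\mathrm{obs}}} \left( f^{\mathrm{ref}}_l \right)^2 \right)^{1/2}}, \tag{2.14}$$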

where Nobs was the number of observables of that type (such as working or free

PRE distances), and the f calcl were averaged over the pooled ensemble. A lower

Q value indicates a better agreement.
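The quality factor of equation 2.14 reduces to a few lines of code. The sketch below is a direct transcription of the equation; the function name is an assumption.

```python
import math

def q_factor(f_ref, f_calc):
    """Quality factor (eq. 2.14): rms deviation normalised by rms reference."""
    num = sum((r - c) ** 2 for r, c in zip(f_ref, f_calc))
    den = sum(r ** 2 for r in f_ref)
    return math.sqrt(num / den)

print(q_factor([3.0, 4.0], [3.0, 4.0]))  # perfect agreement: 0.0
print(q_factor([3.0, 4.0], [0.0, 0.0]))  # all calculated values zero: 1.0
```

Because Q is normalised by the magnitude of the reference values, it can be compared between datasets of different size and scale, such as the working and free PRE distances.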

S values

To quantify the agreement of an entire distribution of a given observable the

distance measure344

$$s_l = \sum_{m=1}^{N_{\mathrm{bins}}} \left| p^{\mathrm{ref}}_{m,l} - p^{\mathrm{calc}}_{m,l} \right| \tag{2.15}$$

was used, where Nbins was the number of bins into which the histogram was

divided and the pm,l were the normalised probabilities of finding a particular

observable in bin m of histogram l. sl ranges from 0 − 2, with low values

representing similar histograms. Summation over all Nobs histograms quantifies

the overall agreement of two ensembles in terms of distance distributions:

$$S = N_{\mathrm{obs}}^{-1} \sum_{l=1}^{N_{\mathrm{obs}}} s_l. \tag{2.16}$$

The values of sl and S depend on the bin width, the ideal value of which

depends in turn on the width of the distributions being compared. A bin width

of 1 Å was found to be broadly suitable for the wide range of distance and Rg

distributions encountered. Using the same bin width for all distributions allows

universal comparisons of the sl values computed from different pairs of atoms

in the same and different ensembles. It is also a prerequisite for combining the

various sl into an overall S value.
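The calculation of sl and S (equations 2.15 and 2.16) can be sketched as follows. The histogram range of 0 to 100 Å, the discarding of values outside that range and the function names are all assumptions made for this illustration.

```python
def histogram(values, bin_width=1.0, lo=0.0, hi=100.0):
    """Normalised histogram with a fixed bin width (values outside
    [lo, hi) are discarded; the range is an assumed choice)."""
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) // bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def s_l(p_ref, p_calc):
    """Sum of absolute differences between two normalised histograms (eq. 2.15)."""
    return sum(abs(a - b) for a, b in zip(p_ref, p_calc))

def s_overall(ref_hists, calc_hists):
    """Average of s_l over all histograms (eq. 2.16)."""
    return sum(s_l(r, c) for r, c in zip(ref_hists, calc_hists)) / len(ref_hists)

same = histogram([12.2, 12.7, 30.1])
print(s_l(same, same))                                   # identical: 0.0
print(s_l(histogram([1.5, 1.6]), histogram([50.2])))     # no overlap: 2.0
```

The two limiting cases printed above correspond to the stated range of sl from 0 to 2.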

Statistical errors

The statistical error (SE) in the back-calculated ensemble-averaged observ-

ables such as the Rg, Rh, PRE distances and RDCs and, where relevant, their

respective Q and S values was estimated by randomly splitting the data into

two sets and computing the averaged quantity or the Q or S value for each set.

The splitting was repeated 10 times such that 20 different averages or Q or S

values were collected. The standard deviation (SD) of these values was taken

as the SE. For all of the data reported here, the SE was less than 1% unless

explicitly stated otherwise.
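The splitting procedure can be sketched as follows. The use of the sample (n − 1) standard deviation, the fixed random seed and the function name are assumptions; the text does not specify these details.

```python
import random
import statistics

def split_half_se(values, n_splits=10, statistic=statistics.fmean, seed=0):
    """Estimate the statistical error of `statistic` by repeatedly splitting
    the data at random into two halves and taking the SD of all estimates."""
    rng = random.Random(seed)
    data = list(values)
    half = len(data) // 2
    estimates = []
    for _ in range(n_splits):
        rng.shuffle(data)
        estimates.append(statistic(data[:half]))
        estimates.append(statistic(data[half:]))
    return statistics.stdev(estimates)  # 2 * n_splits estimates in total

rg_values = [25.0 + 0.1 * (i % 17) for i in range(1000)]  # placeholder data
print(split_half_se(rg_values))
```

Passing a different `statistic`, for example a function returning a Q or S value for a subset, gives the corresponding error estimate for that quantity.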

To assess the contribution to the overall SD made by within-replica and

between-replica variation, two further SDs were defined (Chapter 3). SDbetween

was the SD of the set of ensemble-averaged observables, X:

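$$\mathrm{SD}_{\mathrm{between}} = \mathrm{SD}(X), \quad X = \left[ \langle X_1 \rangle, \langle X_2 \rangle, \ldots, \langle X_{N_{\mathrm{rep}}} \rangle \right], \tag{2.17}$$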

where the averaging was carried out separately for each replica.

SDwithin was the average of the set of Nrep SDs:

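$$\mathrm{SD}_{\mathrm{within}} = \langle \mathrm{SD}_{\mathrm{rep}} \rangle, \quad \mathrm{SD}_{\mathrm{rep}} = \left[ \mathrm{SD}_1, \mathrm{SD}_2, \ldots, \mathrm{SD}_{N_{\mathrm{rep}}} \right], \tag{2.18}$$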

where each SD in SDrep was for a different replica.
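The two measures of equations 2.17 and 2.18 can be sketched as follows, again assuming the sample standard deviation; the function names and the nested-list data layout (one list of values per replica) are assumptions.

```python
import statistics

def sd_between(replicas):
    """SD of the per-replica averages (eq. 2.17)."""
    return statistics.stdev([statistics.fmean(r) for r in replicas])

def sd_within(replicas):
    """Average of the per-replica SDs (eq. 2.18)."""
    return statistics.fmean([statistics.stdev(r) for r in replicas])

# Two replicas that each fluctuate little but sample different regions:
replicas = [[1.0, 3.0], [11.0, 13.0]]
print(sd_between(replicas), sd_within(replicas))
```

A between-replica SD much larger than the within-replica SD, as in this example, suggests that individual replicas have not sampled the full distribution.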

2.3.3 Correlation of distance distributions

The correlation between two distance distributions (Chapter 4) was investigated

by computing sl values (equation 2.15) to quantify the similarity between the 2D

distance histograms p(rAB, rAC) and p(rAB) ∗ p(rAC), where rAB and rAC were

the distances between the Cα atoms of residues A and B or A and C, respectively.

A, B and C were chosen from a set of 10 residues spaced approximately 14

residues apart along the sequence. This particular set of residues was selected

because they were not included in either the experimental or synthetic PRE

distance restraints, thus the identification of correlations was not complicated

by the direct influence of a restraint on those residues. A high sl value indicates

that rAB and rAC are correlated. It should be noted that an sl value of 2.0

was never obtained, even when B = C, because of the different resolutions of

p(rAB, rAC) and p(rAB) ∗ p(rAC).
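The comparison between the joint histogram and the product of the marginal histograms can be sketched as follows. In this illustration both are evaluated on the same 1 Å grid, which is an assumption and may differ from the binning used in this work; the function name is also an assumption.

```python
from collections import Counter

def correlation_s(r_ab, r_ac, bin_width=1.0):
    """s_l (eq. 2.15) between the joint histogram p(r_AB, r_AC) and the
    product of marginals p(r_AB) * p(r_AC), for paired distance lists."""
    n = len(r_ab)
    bins_ab = [int(r // bin_width) for r in r_ab]
    bins_ac = [int(r // bin_width) for r in r_ac]
    joint = Counter(zip(bins_ab, bins_ac))
    marg_ab = Counter(bins_ab)
    marg_ac = Counter(bins_ac)
    total = 0.0
    # Any occupied joint bin has non-zero marginals, so this loop covers
    # every bin in which either histogram is non-zero.
    for a, n_a in marg_ab.items():
        for c, n_c in marg_ac.items():
            p_joint = joint.get((a, c), 0) / n
            p_indep = (n_a / n) * (n_c / n)
            total += abs(p_joint - p_indep)
    return total

# Independent distances give 0; identical distances give a larger value.
print(correlation_s([1.5, 1.5, 2.5, 2.5], [1.5, 2.5, 1.5, 2.5]))
print(correlation_s([1.5, 2.5], [1.5, 2.5]))
```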

The sl values were viewed as 2D plots of sl versus B and C for each value

of A. Discrete points rather than contours were plotted using the matlab (The

MathWorks, Inc) imagesc command. Because there were only 10 possible values

of A, B and C, the matlab interp2 function was used to linearly interpolate

the sl values in 2 dimensions, giving an estimated sl value for all possible BC

combinations for each of the chosen A. No extrapolation was possible, thus the

edges of the plots were left blank to signify a lack of data. The agreement

between the set of sl values computed for two different ensembles was evaluated

by linear regression.

2.3.4 Distance comparison maps

Distance comparison (DC) maps (Chapters 4, 5 and 6) were created by plotting

the rms distance between two residues, i and j, normalised by the rms distance

predicted for a purely random coil:

$$DC = \frac{\langle (d_{ij}^{calc})^2 \rangle^{1/2}}{\langle (d_{ij}^{rc})^2 \rangle^{1/2}}. \tag{2.19}$$

The rms inter-residue distances for the calculated ensemble were defined as

$$\langle (d_{ij}^{calc})^2 \rangle^{1/2} = \left( N_{struct}^{-1} \sum_{k=1}^{N_{struct}} d_{ij,k}^2 \right)^{1/2}, \tag{2.20}$$

where Nstruct was the number of structures in the calculated ensemble. The rms

inter-residue distances for a random coil were predicted according to

$$\langle (d_{ij}^{rc})^2 \rangle^{1/2} = 5.31 N_{sep}^{0.6}, \tag{2.21}$$

where $N_{sep}$ was the sequence separation between the two residues. This empirical equation was fitted to a model of a random flight chain constructed using dihedral angles taken from a PDB coil library and including the effects of excluded volume360. Similar results were obtained if $\langle (d_{ij}^{rc})^2 \rangle^{1/2}$ was calculated from the random coil model of the protein in question. A more accurate equation for $\langle (d_{ij}^{rc})^2 \rangle^{1/2}$ that accounts for the location of the two residues within the polypeptide chain360 was tested but the predicted rms distances were found to be discontinuous for short sequence separations so it was not used. The normalisation by $\langle (d_{ij}^{rc})^2 \rangle^{1/2}$ is important because it removes the dependence of the inter-residue distance on the sequence separation, allowing pairs of residues with different sequence separations and also proteins of different lengths to be compared. The DC was not smoothed and was plotted as discrete points using the matlab imagesc function.
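The construction of a DC map from equations 2.19 to 2.21 can be illustrated with the following Python sketch. It assumes that the inter-residue distances are measured between Cα atoms and that the coordinates are held in a single array; both choices, and the function name `dc_map`, are assumptions made for the purposes of the example.

```python
import numpy as np

def dc_map(ca_coords):
    """Distance comparison map from an ensemble of structures.

    ca_coords : array of shape (n_struct, n_res, 3), positions in Angstrom
                (assumed here to be the C-alpha atoms)
    """
    ca = np.asarray(ca_coords, dtype=float)
    n_res = ca.shape[1]
    # Squared distance between every pair of residues in every structure.
    diff = ca[:, :, None, :] - ca[:, None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    rms_calc = np.sqrt(d2.mean(axis=0))              # eq. 2.20
    idx = np.arange(n_res)
    n_sep = np.abs(idx[:, None] - idx[None, :])      # sequence separation
    rms_rc = 5.31 * n_sep.astype(float) ** 0.6       # eq. 2.21
    dc = np.full((n_res, n_res), np.nan)             # diagonal is left undefined
    mask = n_sep > 0
    dc[mask] = rms_calc[mask] / rms_rc[mask]         # eq. 2.19
    return dc
```

Values above 1 correspond to residue pairs that are, on average, further apart than predicted for a random coil, and values below 1 to pairs that are closer together. For very large ensembles the pairwise array would need to be accumulated structure by structure to limit memory use.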

2.3.5 Free energy landscapes

Free energy landscapes (Chapters 4, 5 and 6) were obtained for each ensemble

in the form of the two-dimensional histogram of p(Rg, X) according to

F (Rg, X) = − ln p(Rg, X), (2.22)

where p(Rg, X) was the joint probability distribution of the Rg and either

the SASA or the end-to-end distance, REE. The SASA was computed using

charmm analysis facilities. An individual value for each residue, averaged over

all structures in the ensemble, was also computed. The free energy landscapes

were displayed as filled contour plots using the matlab contourf function.
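Equation 2.22 amounts to taking the negative logarithm of a normalised two-dimensional histogram. A minimal Python sketch is given below; the number of bins and the function name are arbitrary choices made for the example, and bins that are never visited are assigned an infinite free energy.

```python
import numpy as np

def free_energy_surface(rg, x, bins=50):
    """F(Rg, X) = -ln p(Rg, X) from paired samples of Rg and X.

    rg, x : one value per structure (X may be the SASA or the end-to-end distance)
    """
    counts, rg_edges, x_edges = np.histogram2d(rg, x, bins=bins)
    p = counts / counts.sum()            # joint probability p(Rg, X)
    with np.errstate(divide="ignore"):
        f = -np.log(p)                   # eq. 2.22; +inf where p = 0
    return f, rg_edges, x_edges
```

The returned array can then be passed to a filled contour routine, with the unvisited bins masked out.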

2.3.6 Ramachandran plots

Ramachandran plots (Chapters 3, 5 and 6) were created by computing the φ

and ψ dihedral angles for each internal residue for all structures comprising

an ensemble using charmm analysis facilities. The normalised probabilities of

occurrence of each set of (φ_m, ψ_n), where m and n refer to 10° bins, were plotted

as discrete points on a 2D map using the matlab imagesc function.
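The binning step can be sketched as follows, assuming that the dihedral angles have already been extracted (here with charmm) and are expressed in degrees in the range −180° to 180°. The function name and the normalisation to a total probability of 1 are assumptions of this example.

```python
import numpy as np

def ramachandran_probability(phi, psi, bin_width=10.0):
    """Normalised probability of each (phi, psi) bin.

    phi, psi : dihedral angles in degrees, pooled over residues and structures
    """
    edges = np.arange(-180.0, 180.0 + bin_width, bin_width)   # 36 bins of 10 degrees
    counts, _, _ = np.histogram2d(phi, psi, bins=[edges, edges])
    return counts / counts.sum()
```

The first index of the returned array runs over φ bins and the second over ψ bins, so the array should be transposed before plotting if ψ is to appear on the vertical axis.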

2.3.7 Predicted properties

Rh

The $R_h$ values expected if the protein is natively folded ($R_h^F$) or fully unfolded ($R_h^U$) were calculated according to the empirical relationships161:

$$R_h^F = 4.75 N_{res}^{0.29} \tag{2.23}$$

$$R_h^U = 2.21 N_{res}^{0.57} \tag{2.24}$$

where $N_{res}$ was the number of residues. These are referred to in Chapters 5 and 6.

Compaction factors

Compaction factors, $C_f$, quantifying the degree of compaction relative to the random coil and natively folded states were calculated according to161:

$$C_f = \frac{R_h^U - R_h^{exp}}{R_h^U - R_h^F}, \tag{2.25}$$

where $R_h^{exp}$ was the experimental $R_h$ and $R_h^F$ and $R_h^U$ were computed according to equations 2.23 and 2.24. $C_f \sim 1$ indicates that the protein is of a similar size to that expected if it were folded into a compact, globular structure, whereas a $C_f$ near zero indicates a highly expanded chain. The $C_f$ are referred to in Chapters 5 and 6.
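As a worked example of equations 2.23 to 2.25, the following Python sketch evaluates the predicted Rh and the compaction factor for a 140-residue chain such as αS, using the experimental value of 31.9 Å quoted for αS in Table 3.1. The function names are illustrative only.

```python
def rh_folded(n_res):
    """Predicted Rh (Angstrom) of a natively folded protein, eq. 2.23."""
    return 4.75 * n_res ** 0.29

def rh_unfolded(n_res):
    """Predicted Rh (Angstrom) of a fully unfolded polypeptide, eq. 2.24."""
    return 2.21 * n_res ** 0.57

def compaction_factor(rh_exp, n_res):
    """Compaction factor Cf, eq. 2.25."""
    rf, ru = rh_folded(n_res), rh_unfolded(n_res)
    return (ru - rh_exp) / (ru - rf)

n_res = 140  # number of residues in alpha-synuclein
print(round(rh_folded(n_res), 1), round(rh_unfolded(n_res), 1))
print(round(compaction_factor(31.9, n_res), 2))
```

The two predicted values agree with the 19.9 and 37.0 Å listed in Table 3.1, and the resulting Cf of about 0.3 places αS between the fully unfolded and natively folded limits.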

Helical propensity

The program agadir361–364 was used to calculate the helical propensities of

αS, βS, β+HC (Chapter 5) and PI3-SH3 (Chapter 6) based on their sequences

using the online calculator available at http://www.embl-heidelberg.de/cgi/agadir-wrapper.pl.

Larger values indicate that helical structure is more likely.

Hydrophobicity Profile

The Kyte-Doolittle (KD) hydrophobicity profile365 of PI3-SH3 (Chapter 6)

was calculated using a Perl script available from the Canadian Bioinformatics

Help Desk at http://gchelpdesk.ualberta.ca/repository/VersionDetails.php?filId=66&submissionId=48.

The hydrophobicity was smoothed over the

recommended 11-residue window. Hydrophobic regions are assigned KD values

greater than 1.
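The calculation performed by such a script can be illustrated with a short Python sketch. This is not the Perl script referred to above: it simply applies the published Kyte-Doolittle scale with an unweighted sliding-window average, it reports only positions for which a complete window is available (an assumption about the edge handling), and the example sequence is hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def kd_profile(sequence, window=11):
    """Window-averaged KD hydropathy; entry k is centred on residue k + window // 2."""
    values = np.array([KD_SCALE[aa] for aa in sequence.upper()])
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

profile = kd_profile("MKVLAAGIVGLLLAQDEKRRS")  # hypothetical sequence
print(np.round(profile, 2))
```

Positions with a smoothed value greater than 1 would be classed as hydrophobic under the criterion given above.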

Aggregation propensity

Aggregation propensity profiles ($Z_{agg}^{prof}$) of αS, βS, β+HC (Chapter 5) and PI3-SH3 (Chapter 6) were computed by G.G. Tartaglia using an updated version of the Zyggregator algorithm366, which predicts the aggregation propensity of peptides and proteins in aqueous solution from the physicochemical properties of their constituent amino acids and compares this to the aggregation propensity of a set of randomly generated amino acid sequences of the same length367. $Z_{agg}^{prof}$ indicates the regions that are most aggregation prone. The overall aggregation propensity scores ($Z_{agg}$) for various regions of αS, βS and some related constructs were also computed.

Coil-library model

An ensemble of 5000 αS (Chapter 5) and PI3-SH3 (Chapter 6) structures were

obtained from A. Jha via the server at http://unfolded.uchicago.edu/index.html.

The structures were generated using a self-avoiding statistical coil model

based on backbone conformational preferences from a coil library, a subset of

the PDB265.

Chapter 3

Simulation of disordered states of proteins

3.1 Introduction

To understand DS of proteins in molecular detail, it is necessary to characterise

an ensemble of structures. Experimental observables are, in most cases, average

values that do not reveal the range of structures accessible to a disordered

protein. Biomolecular simulation has the potential to provide an ensemble of

structures at atomic level detail. The challenge, therefore, is to reconcile the

two sources of information so that the simulation yields the same ensemble of

structures as gave rise to the experimental observables.

This chapter describes the evaluation of the applicability of a range of differ-

ent MD simulation techniques of varying degrees of accuracy for the generation

of DS ensembles. The implications of the technical details of each method given

in Section 2.1.1 are expanded upon below along with the results. The IDP

αS introduced in Section 1.2.1 serves as a useful model system for determining

which of the simulation methods are most appropriate for DS as it is disordered

in solution, that is, under normal simulation conditions.

The success of the simulations is assessed in two ways. The first criterion is

whether convergence occurs, and if so, how long it takes in terms of both the

internal timescale of the simulation and the real time required to carry out the

calculations. This is important because any ensemble-averaged quantity calcu-

lated from an unconverged simulation will contain errors over and above the

expected statistical noise. An efficient search of conformational space is there-

fore desirable. The Rg, a measure of the global size of the protein structure that

is simple and fast to calculate from the atomic coordinates, is followed through

time to provide a crude measure of the extent of sampling. The cumulative

time-averaged Rg with respect to the first structure of the production phase

for each replica, 〈Rg(t)〉 (Section 2.13), is also computed to monitor the rate of

convergence. Following these properties separately for each replica reveals the

differences in the conformational space sampled by each replica.
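The cumulative time average used to monitor convergence is simply a running mean over the production phase. A minimal Python sketch, assuming that the Rg of each saved structure of one replica is available as a sequence starting at the first production frame (the function name is illustrative):

```python
import numpy as np

def cumulative_average(series):
    """Running mean: entry t is the average of series[0] to series[t] inclusive."""
    series = np.asarray(series, dtype=float)
    return np.cumsum(series) / np.arange(1, series.size + 1)
```

A plateau in the returned curve indicates that adding further structures no longer changes the average appreciably, although, as noted above, this is only a crude indicator for a single global property.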

The second consideration is how accurately the effective energy of the protein is defined. This is assessed by comparing back-calculated observables with experimental data. Initially, the main factor considered is the $\langle R_h^{-1} \rangle^{-1}$, which is calculated from the Rg according to equations 5.3 and 2.9 and compared with the experimental value determined by PFG-NMR158,235. As it transpires, obtaining sufficiently expanded structures is an appreciable problem, thus reproduction of the experimental $\langle R_h^{-1} \rangle^{-1}$ is a fundamental criterion for the success of a simulation. Once conditions have been determined in which the average size of the ensembles of structures is the same as that measured experimentally, the agreement with other observables that report on more detailed aspects of the structures is investigated. At this point, an ensemble of structures generated using a self-avoiding statistical coil model based on backbone conformational preferences from a coil library265 (Section 2.3.7) is also analysed.

3.2 Random coil model

There is some controversy in the literature12,13,233,236,246,255,256 over whether

a random coil provides an adequate description of DS or whether additional

protein-specific features need to be taken into account. To investigate this, a

random coil model was set up as outlined in Section 2.1.1. All of the bonded terms

in the charmm force-field were retained but the non-bonded interactions were

reduced to only the repulsive part of the Lennard-Jones potential. Electrostatic

interactions were ignored and the simulations were carried out in vacuum. This

model preserves the bulkiness of the amino acid side chains and the connectivity

of the polypeptide backbone but little else.

The advantages of this random coil model are that it is fast in terms of

both computer time and also the extent of conformational change that occurs

at each integration step. The large fluctuations in the Rg over time (Figure 3.1)

are indicative of the huge variety of structures that are sampled. Despite this,

〈Rg(t)〉 reaches a plateau in less than 4 ns. Convergence of this global property

is therefore achieved rapidly and efficiently, fulfilling the first assessment criterion.

The absence of solvent, however, means that the time-scale of conformational

change is unphysically fast due to the lack of friction, thus kinetic parameters

cannot be extracted from such a simulation.

Comparison of the $\langle R_h^{-1} \rangle^{-1}$ with the experimental value for αS158,235 shows that the structures produced with the random coil model are more expanded (Table 3.1). The good agreement between the $\langle R_h^{-1} \rangle^{-1}$ of the random coil ensemble (37.6 Å) and the predicted $\langle R_h^{-1} \rangle^{-1}$ for a fully unfolded polypeptide161 (37.0 Å) indicates that this random coil model provides a good representation for fully unfolded states. It does not afford an appropriate description for αS, however, which is not unexpected given that αS is known to be more compact than is predicted for a fully unfolded polypeptide of the same length in solution158,235. Additionally, when simulating in vacuum, the protein-like features encoded in the solvent model are lost. Given the dearth of quantitative experimental data for DS, it is important to retain as much information as possible in the model, thus a more detailed representation than this random coil model was desired.

Table 3.1: The $\langle R_h^{-1} \rangle^{-1}$ of αS determined experimentally¹, predicted² and calculated from the ensembles of structures generated with the various models as described in the text. SD is the standard deviation of the $\langle R_h^{-1} \rangle^{-1}$ of the entire ensemble (all replicas and timesteps). Where multiple replicas were simulated in parallel, SDbetween and SDwithin describe the variation between replicas and the average variation of each individual replica as described in Section 2.3.2. The $\langle R_h^{-1} \rangle^{-1}$ and all statistics are in Å.

Model            T (K)   $\langle R_h^{-1} \rangle^{-1}$   SD     SDbetween   SDwithin
Experimental     288     31.9
Predicted (F)    -       19.9
Predicted (U)    -       37.0
Random Coil      300     37.6                              3.10   -           -
sasa             300     20.8                              0.74   0.64        0.41
eef1             300     22.2                              1.30   1.30        0.42
GB/SA            300     20.9                              0.81   0.76        0.27
GB/SA(0)³        300     20.9                              0.59   0.59        0.14
Explicit Water   330     24.3                              0.15   -           -

1. Measured by PFG-NMR on 100 µM (ref. 235) or 200 µM (ref. 158) protein in 99.9% D2O, 20 mM Mes buffer with 100 mM NaCl, pH 6.5, 288 K.

2. F and U refer to the Rh predicted for a 140-residue natively folded or fully unfolded polypeptide according to equations 2.23 and 2.24 (ref. 161).

3. GB/SA(0) refers to the simulation in GB/SA with zero surface tension.

[Figure 3.1: plots of Rg (Å) and 〈Rg(t)〉 (Å) versus timestep; panels: Random Coil, SASA, EEF1, GB/SA, Explicit Water.]

Figure 3.1: Comparison of Rg and 〈Rg(t)〉 during simulations run using the

various MD methods discussed in Sections 3.2, 3.3 and 3.4.1. Where there is

more than one curve, the different colours correspond to the first five replicas of

a multiple replica simulation. Data for the other replicas are omitted for clarity.

The (black, dashed) vertical lines on the plots of Rg correspond to the start

of the production phase. This quantity is shown for the heating, equilibration

and production phases so that the collapse that occurs in the implicit solvent

models can be seen. 〈Rg(t)〉 is the cumulative average at each point in time

and was calculated for the production phase only to monitor convergence. The

time-scale on the abscissa corresponds to the frequency at which the coordinates

were saved, which was every 2500 integration steps for all simulations except

that using the random coil model, in which case the structures were only saved

every 10 000 steps. The integration timestep was 2 fs in all cases.

3.3 Explicit solvent

In contrast to the highly simplified random coil model, explicit solvent is the

most detailed representation possible given practical considerations. Although

it was expected to be too computationally expensive to be practical for DS, a

brief simulation of αS in explicit water was run to provide a reference against

which to evaluate other simulation methods. A slightly elevated T of 330 K

was used to enhance the rate of conformational sampling. The dimensions of

the water box, 58 × 68 × 68 Å, were sufficiently large to allow simulation of a

reasonably expanded αS structure using periodic boundary conditions and the

standard cutoff for non-bonded interactions of 14 Å without self-self interactions

of the protein with its images occurring. A fully extended chain would require

a larger box, but as the computational cost of simulating in explicit water is

largely dependent on the number of solvent molecules, this reduced box size was

deemed sufficient for preliminary tests.

Even with this compromise, only a very short run (∼ 0.5 ns) was feasible,

making it highly unlikely that convergence would be achieved. Several µs or even

ms of sampling are likely to be required to cover the range of structures expected

for a DS, as the time-scale of sampling in explicit water is physically relevant.

Accordingly, the Rg changes very little over the duration of the simulation,

resulting in an essentially flat plot of 〈Rg(t)〉 (Figure 3.1) and a low SD of the

$\langle R_h^{-1} \rangle^{-1}$ (Table 3.1). It is obvious from this trial run that simulation in explicit

water for the time-scales required to obtain convergence for DS is not practical.

Additionally, it is not clear whether sufficiently expanded structures would be

sampled even with long simulation times. Other experimental observables were

therefore not calculated from this ensemble, and alternative, simpler models

were investigated.

3.4 Implicit solvent

A compromise between the extremes of the random coil model and explicit

solvent models is to use implicit solvent models. These represent the effects of

the solvent as a PMF by integrating over the solvent degrees of freedom. Of the

wide range of implicit solvent models available, three commonly used models

from the charmm simulation package were tested: GB/SA354, sasa355 and

eef1280. These are introduced below followed by the results of simulations at

physiological temperature and, subsequently, the effect of restraining or altering

various parameters on the global size of the structures.

As with the random coil model, the conformational transition rate in implicit

solvent models is greatly increased relative to explicit water due to the lack of

the viscosity usually imparted by the random collisions of solvent molecules with

the solute281,368,369. This is a desirable feature for the simulation of DS, as it

makes the sampling of conformational space much faster, although any kinetic

parameters extracted from the trajectories will be incorrect.

Generalised Born

Generalised Born solvation models are inspired by the Born equation for cal-

culating the electrostatic solvation energies of ions370 and are made possible

by the simple generalisation of the Born formula to polyatomic molecules by

Still371. This has been shown to provide a good approximation to the more

exact Poisson-Boltzmann (PB) solvation energies353 at a lower computational

cost372,373. The GB/SA model implemented in charmm uses a simple poly-

nomial smoothing function374 to define the dielectric boundary between the

interior and exterior of the protein353. The electrostatic solvation energy is

estimated by solving the Born equation, and the non-polar solvation energy is

approximated from the solvent-exposed surface area using a phenomenological

surface tension coefficient.

EEF1 and SASA

The eef1 (effective energy function)280 and sasa355 implicit solvent models

combine estimates of the free energy of solvation with the charmm19 polar

hydrogen energy function to provide the effective energy function for a protein

in solution. The formulation of the screening effect of the solvent is the same in

both models. The formal charges on ionic side-chains (D, E, R and K) and the

termini are neutralized, and a distance-dependent dielectric constant is used to

approximate the charge-charge interactions in solution. This simple dielectric

function does not take different environments into account, meaning that it does

not distinguish whether or not the interacting partial charges are buried or on

the protein surface.

The main difference between sasa and eef1 is the way that the solvent

exclusion is accounted for. eef1 is based on solvent-excluded volume. The

solvation free energy of the protein molecule is assumed to be the sum of group

contributions, which are evaluated by subtracting the amount of solvation lost

due to solvent exclusion by proximal atoms of the macromolecule from the

solvation free energy of that group in a small model compound. The solvation

free energy density is given by a Gaussian function. In comparison, the sasa

model assumes that the polar and non-polar contributions made by each atom to

the free energy of solvation are proportional to their SASAs, which are calculated

by an analytical approximation to increase the efficiency. The direct solvation

of polar groups is favoured, whereas the hydrophobic effect on apolar groups is

accounted for by a positive atomic solvation parameter.

3.4.1 Physiological temperature

In the first instance, simulations of αS using each of the three implicit solvent

models outlined above were carried out at physiological temperature (300 K).

The structures produced using all three models are more compact than those

present experimentally. The $\langle R_h^{-1} \rangle^{-1}$ of each ensemble is close to the Rh predicted

for αS if it were a folded globular protein (Table 3.1). Even though

relatively expanded starting structures were used, each replica quickly collapses

and remains a similar size thereafter (Figure 3.1). Examination of the trajecto-

ries with the molecular visualisation program vmd375 showed that the collapsed

conformation of each replica depends on the starting structure, and that once

collapsed, the structure of each replica changes little over the remainder of the

simulation. This effect is reflected in the 〈Rg(t)〉 of each replica, which reaches

a plateau within ∼ 0.5 ns and remains constant thereafter (Figure 3.1). The

addition of solvent therefore greatly restricts the conformational sampling at

physiological temperature compared to the random coil model.

The SD of the $\langle R_h^{-1} \rangle^{-1}$ is lower than that of the random coil model for all of the implicit solvent models considered. To evaluate the relative contributions that the differences between the major species into which each replica collapses and the within-replica variation over the duration of the simulation make to this variation, the SD of the set of $N_{rep}$ $\langle R_h^{-1} \rangle^{-1}$ values, SDbetween, was compared to the average over all replicas of the SD of the $\langle R_h^{-1} \rangle^{-1}$ of each replica, SDwithin

(Section 2.3.2). In all cases, the magnitude of SDbetween is similar to that of

the overall SD, whereas SDwithin is lower. Thus the major contribution to the

overall SD is the variation between replicas, providing further evidence for both

the restricted sampling of each individual replica and the dependence of the

structures sampled by each replica on the starting structure.

Two major obstacles to the realistic simulation of DS can be identified from these preliminary simulations. Firstly, although each individual replica converges, the structures sampled by each replica depend on the choice of starting structure, thus the overall ensembles obtained by pooling all of the replicas are not converged. Secondly, the correspondence with the experimental $\langle R_h^{-1} \rangle^{-1}$ is poor, indicating that the effective energy function provided by the

force-field and implicit solvent models does not accurately describe the con-

formations visited by αS in solution in terms of their global dimensions. The

following section describes the various means of overcoming these problems us-

ing implicit solvent models that were investigated.

3.4.2 Methods for generating expanded structures

Restraining the Rg

One possible means of ensuring that the average size of the molecules matches

that observed experimentally is to explicitly restrain the 〈Rg〉. The restraint is

applied to the ensemble-average calculated across all Nrep replicas at each point

in time to take into account the time- and ensemble-averaging of the experimental measurement. ERMD simulations in which the 〈Rg〉 was restrained to match the Rg corresponding to the experimental $\langle R_h^{-1} \rangle^{-1}$ (refs. 158, 235) were carried out with each of the three implicit solvent models discussed above. Although it is possible to satisfy the imposed Rg restraint, this is achieved by one or two replicas becoming almost completely extended, and the remainder collapsing just as in the unrestrained simulations (data not shown). Whilst the experimental $\langle R_h^{-1} \rangle^{-1}$, as an average, does not rule out such a situation being an

accurate reflection of the state on which the PFG-NMR measurement was car-

ried out, none of the other commonly used experimental techniques detect such

a structural dichotomy. The Rg restraint also suffers from the fact that the

best compromise between minimising the SASA and satisfying the restraint is a

prolate ellipsoid. This results in elongated structures, which again may or may

not be typical of those present experimentally. Restraining the Rg is therefore

an unsatisfactory solution to the compaction problem.

Reducing the surface tension

In the charmm GB/SA implementation, the non-polar solvation energy is

estimated from the SASA. This contribution is only considered if the input

parameter SGAMMA, which describes the non-polar surface tension, is non-

zero. The fact that IDPs sample expanded rather than collapsed, globular

structures in solution suggests that surface tension does not affect them in the

same way as it does NFPs. To mimic this situation, SGAMMA was set to

zero. There is no noticeable effect on the resulting trajectories, however, as

summarised by the identical $\langle R_h^{-1} \rangle^{-1}$ and similar statistics (Table 3.1). Thus

adjusting the surface tension is not a viable method for generating suitably

expanded structures.

Increasing T

In sasa, the contribution made by solvent exclusion effects to the solvation

free energy is approximated from the SASA and in eef1, it depends on the

exclusion of solvent by proximal solute atoms. Both models are parameterised

to fit experimental data for natively folded proteins and small peptides, and

therefore favour compact structures in which the SASA is minimised or the sol-

vent exclusion maximised. Rather than altering the solvent models themselves,

which is beyond the scope of this work, the free energy of the polypeptide can be

easily altered by changing T . Within an implicit solvent model, T does not cor-

respond directly to a physical quantity, rather, it can be thought of as a source

of kinetic energy, EK. Increasing T , and therefore EK, provides a means of

compensating for the inherent bias of implicit solvent models towards compact

structures.

Increasing T with the GB/SA solvation model has little effect, with the $\langle R_h^{-1} \rangle^{-1}$

remaining almost constant as T increases from 300 to 600 K (Figure 3.2).

Higher T were not tested because according to the observed trend,

extremely high T would be required to produce structures of the required size.

The continued preference for compact structures even at high T may be related

to the known over-stabilisation of salt bridges312,316,369,376,377 and hydrogen

bonds313,378 by GB/SA. It is thought that this may occur because of insuffi-

cient electrostatic screening, which would be expected to be particularly perti-

nent for a highly charged protein such as αS. Whilst these effects have mostly

been observed for the opls-aa implementation of GB, the charmm GB/SA

model has been shown to over-stabilise the NS of a folded protein relative to

explicit water310.

The sasa and eef1 models, in contrast, are much more responsive to changes in T. The $\langle R_h^{-1} \rangle^{-1}$ increases with T up to ∼ 700 K (Figure 3.2). The same effect occurs for the IDPs βS and β+HC (Chapter 5) and the acid-denatured state of PI3-SH3 (Chapter 6), thus it is likely to be a general feature of high T simulations with these implicit solvent models. This is an important result, as it means that by manipulating T, an ensemble of structures with an $\langle R_h^{-1} \rangle^{-1}$ that matches the experimental value can be obtained for a DS of any protein.

Simulating at high T also alleviates the lack of convergence observed at

physiological temperatures. The range of Rg sampled by each replica is much

greater at higher T (500−600 K), yet this quantity converges almost as quickly

as at low T (Figure 3.3). The overall SD of the $\langle R_h^{-1} \rangle^{-1}$ of the pooled ensembles

does not increase monotonically with T because this quantity is also affected

by the pooling of multiple replicas, meaning that it reflects both inter- and

intra-replica variation, as discussed previously with regard to the simulations

at physiological T (Section 3.4.1). SDwithin, however, shows a clear increase

with T (Table 3.2). The reversal in the relative contributions of SDbetween and

SDwithin to the overall SD as T increases is indicative of the burgeoning range

of structures sampled by each replica at higher T .

Increasing EK by increasing T provides a simple solution to the two main

difficulties encountered when attempting to simulate DS at physiological tem-

peratures. At elevated T , a wide range of different structures are accessible to

each independent replica, thus the sampling of conformational space is more

comprehensive. Additionally, the structures are of a similar degree of expansion

as is expected for disordered protein states and the global size can be tuned by

altering T . Some additional consequences of simulating at high T are discussed

below.

[Figure 3.2: plot of $\langle R_h^{-1} \rangle^{-1}$ (Å) versus simulation temperature (K).]

Figure 3.2: The $\langle R_h^{-1} \rangle^{-1}$ of the ensembles generated using sasa (black), eef1 (red) and GB/SA (green) at various temperatures.

Consequences of simulating at high T

When simulating at high T , the effective barriers between different conforma-

tions are reduced because the free energy difference is lower relative to the avail-

able thermal energy. Effectively, the free energy landscape appears smoother

from the point of view of the protein, which confers both advantages and dis-

advantages. It facilitates conformational sampling by increasing the speed of

conformational transitions and the range of accessible conformations. It is also

the reason why the $\langle R_h^{-1} \rangle^{-1}$ increases, as compact structures no longer occupy

deep minima on the free energy surface.
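The effect of T on the relative weight of a free energy difference can be illustrated with a Boltzmann factor, exp(−ΔF / k_B T). The sketch below is purely illustrative: the value of ΔF is hypothetical, and ΔF is treated as independent of T, which is a simplification.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal mol^-1 K^-1

def boltzmann_factor(delta_f, temperature):
    """Relative weight exp(-dF / kT) of a state lying delta_f (kcal/mol) above another."""
    return math.exp(-delta_f / (K_B * temperature))

delta_f = 5.0  # hypothetical free energy difference in kcal/mol
for t in (300, 600):
    print(t, boltzmann_factor(delta_f, t))
```

Doubling T halves the exponent, so under these assumptions the weight of the higher-lying state increases by well over an order of magnitude, which is the sense in which the landscape appears smoother at high T.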

A disadvantage of a smoothed free energy landscape is that minima that do

exist may not be detected. Additionally, the increased rate of sampling prevents

the extraction of meaningful kinetic parameters. This is not an issue for the

purposes described here, but should be considered if kinetics are of interest.

A consequence that is more relevant to the production of ensembles of structures representative of DS is that the effect of the energy penalty for violation of the protein-like features encoded in the molecular mechanics force-field, such as the dihedral and improper angles, is reduced due to the higher overall energies of the molecule. The Ramachandran plots for ensembles produced using the sasa and eef1 implicit solvent models at T in the range 300−600 K show that a more diffuse range of dihedral angles are sampled at higher T (Figure 3.4). However, the overall nature of the Ramachandran plots remains typical of each solvent model as T increases, with the sasa model favouring α-helical structure more than the eef1 model, showing that the dihedral angle preferences are relatively robust to increases in T within the range considered here.

Table 3.2: $\langle R_h^{-1} \rangle^{-1}$ and SD calculated from the ensembles of αS structures generated with the sasa and eef1 implicit solvent models at T ranging from 300−600 K. SD is the standard deviation of the $\langle R_h^{-1} \rangle^{-1}$ of the entire ensemble (all replicas and timesteps). SDbetween and SDwithin describe the variation between replicas and the average variation of each individual replica as outlined in Section 2.3.2. The $\langle R_h^{-1} \rangle^{-1}$ and all statistics are in Å.

Model   T (K)   $\langle R_h^{-1} \rangle^{-1}$   SD     SDbetween   SDwithin
sasa    300     20.8                              0.74   0.64        0.41
eef1    300     22.2                              1.30   1.30        0.42
sasa    400     21.1                              0.60   0.16        0.57
eef1    400     21.7                              0.87   0.57        0.64
sasa    500     26.6                              2.98   0.22        2.97
eef1    500     25.3                              2.05   0.25        2.03
sasa    600     32.9                              3.39   0.25        3.32
eef1    600     32.2                              3.26   0.29        3.49

3.5 Comparison with experimental data

Whilst the production of ensembles of structures whose $\langle R_h^{-1} \rangle^{-1}$ matches the

experimental value is a significant result, it is also necessary to consider whether

the description of the protein provided by the force-field and implicit solvent

models is an accurate reflection of more detailed aspects of the nature of the

structures present experimentally. The quality of the ensembles of structures

can be assessed by comparing back-calculated observables with experimental

data. Here, three types of NMR data are considered: long-range PRE distances equivalent to those obtained from a PRE-NMR experiment, $^3J_{HNH\alpha}$-couplings and RDCs, all of which have been measured experimentally for αS.

[Figure 3.3: plots of Rg (Å) and 〈Rg(t)〉 (Å) versus timestep; panels: S300, E300, S400, E400, S500, E500, S600, E600.]

Figure 3.3: Comparison of Rg and 〈Rg(t)〉 during simulations run using the sasa (S) and eef1 (E) implicit solvent models at a range of T (300, 400, 500 and 600 K). The different colours correspond to the first 5 of 16 replicas. Data for the remainder of the replicas are omitted for clarity. The (black, dashed) vertical lines on the plots of Rg correspond to the start of the production phase. This quantity is shown for the heating, equilibration and production phases so that the collapse that occurs at low T can be seen. 〈Rg(t)〉 is the cumulative average at each point in time and was calculated for the production phase only to monitor convergence. The time-scale on the abscissa corresponds to the frequency at which the coordinates were saved, which was every 2500 integration steps of 2 fs.

The ensembles

Four different ensembles of αS structures are considered. The first, the random coil model introduced in Section 3.2 (αRC), provides a baseline from which to infer the effect of the protein-like features encoded in more detailed representations. The RDCs calculated from this ensemble are of particular interest given that one of the few protein-like features it retains is the bulkiness of the amino acid side chains, which has been shown to correlate with the RDCs measured for the urea-denatured state of αS but not native αS [268]. The $\langle R_h^{-1} \rangle^{-1}$ of αRC also compares favourably with that measured for urea-denatured αS [158].


Figure 3.4: Ramachandran plots of p(φ, ψ) for ensembles of αS structures gen-

erated using the (A-D) eef1 and (E-H) sasa implicit solvent models at (A,E)

300 K, (B,F) 400 K, (C,G) 500 K and (D,H) 600 K. The probability of each

combination of φ and ψ dihedral angles is the average over all residues and all

structures in the ensemble. The same scale is used for all plots to facilitate

comparisons.

As an intermediate between the random coil model and the description of the polypeptide chain engendered by the implicit solvent models, an ensemble of 5000 structures generated using a self-avoiding statistical coil model based on backbone conformational preferences from a coil library database [265] was obtained from A. Jha (αCOIL). In addition to the covalent connectivity and excluded volume effects provided by the random coil model, this model includes local dihedral angle preferences, including nearest-neighbour effects, but lacks a description of amino-acid specific long-range interactions such as electrostatic interactions. Bernado et al. found that RDCs calculated from a similar ensemble of αS structures are similar to the experimental data for the urea-unfolded state, but the RDCs for native αS in purely steric alignment media are best reproduced when only those structures exhibiting long-range interactions between the N- and C-termini are considered [266]. As with the random coil model, the $\langle R_h^{-1} \rangle^{-1}$ of αCOIL is similar to that of urea-denatured αS [158].

The final two ensembles were produced in accordance with the results of Section 3.4.2 using the sasa and eef1 implicit solvent models at high T. The T was chosen separately for each implicit solvent model so that the $\langle R_h^{-1} \rangle^{-1}$ matched the experimental value for αS [158,235], thus ensuring that in terms of global dimensions, the structures are equivalent on average to those present experimentally. The ensemble of structures produced using sasa (αSASA) was generated at 570 K and that produced using eef1 (αEEF1) at 600 K.

PRE distances

The long-range distances obtained from PRE-NMR provide information about the tertiary structure of the conformations present. Because distances corresponding to $I_{\mathrm{ox}}/I_{\mathrm{red}} < 0.15$ or $I_{\mathrm{ox}}/I_{\mathrm{red}} > 0.85$ cannot be calculated exactly, these distances are represented in the experimental dataset as $d_{ij}^{0.15}$ and $d_{ij}^{0.85}$, respectively. Any correlation between the experimental and back-calculated distances observed for $d_{ij}^{0.15} < d_{ij} < d_{ij}^{0.85}$ is therefore not expected to extend to $d_{ij} > d_{ij}^{0.85}$ and $d_{ij} < d_{ij}^{0.15}$. For this reason, only distances with $d_{ij}^{0.15} < d_{ij} < d_{ij}^{0.85}$ contribute to the Q values reported in Figure 3.5. The PRE distances were defined as the distance between the Cα atom of the spin-labelled side-chain and the amide hydrogen atom; this definition is explained and justified in Section 4.3.

The ensemble-averaged PRE distances back-calculated from αSASA and αEEF1 are almost all shorter than those recorded experimentally (Figure 3.5). This can be reconciled with the fact that the average size of the molecules is by definition the same as in the experiment by considering the sensitivity of the $r^{-6}$ average to small values of r. If the distance distributions are broader than those present in vitro, the $r^{-6}$-averaged PRE distances will be shorter, as is seen here, even though the $\langle R_h^{-1} \rangle^{-1}$, which is a near-linear average, is similar to the experimental value. This concept is discussed further in Chapter 4.
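The effect described above can be illustrated numerically. The sketch below compares two uniform distance distributions with the same linear mean but different widths; the distributions, their bounds and the function names are assumptions chosen for illustration and are not taken from the ensembles analysed here.

```python
import numpy as np

def linear_average(r):
    """Plain arithmetic mean <r>."""
    return float(np.mean(r))

def r6_average(r):
    """<r^-6>^(-1/6), the average relevant to PRE-derived distances."""
    r = np.asarray(r, dtype=float)
    return float(np.mean(r ** -6.0) ** (-1.0 / 6.0))

# Evenly spaced grids stand in for uniform distributions (hypothetical, in Å).
narrow = np.linspace(28.0, 32.0, 10_001)   # mean 30 Å, width 4 Å
broad = np.linspace(15.0, 45.0, 10_001)    # mean 30 Å, width 30 Å

for name, r in (("narrow", narrow), ("broad", broad)):
    print(f"{name}: <r> = {linear_average(r):.2f} Å, "
          f"<r^-6>^(-1/6) = {r6_average(r):.2f} Å")
```

Both distributions have the same linear mean, but the broad one gives a markedly shorter r^-6 average because the short distances dominate the sum.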

The distances calculated from αRC and αCOIL are generally of a similar size to those determined experimentally, although there is more variation in the calculated distances, as quantified by the larger Q values. Given that the $\langle R_h^{-1} \rangle^{-1}$ values of these ensembles are significantly larger than the experimental value, this again suggests that the distance distributions characteristic of these models are broader than those contributing to the experimental observable.

[Figure 3.5 plots: panels A-D; ordinate $d_{ij}^{\mathrm{calc}}$ (Å), abscissa $d_{ij}^{\mathrm{exp}}$ (Å), both 0–40; panel annotations Q = 0.41, Q = 0.27, Q = 0.25 and Q = 0.25.]

Figure 3.5: Comparison of the PRE distances back-calculated from (A) αRC, (B) αCOIL, (C) αEEF1 and (D) αSASA with the experimental data. The red line corresponding to a perfect agreement is shown to guide the eye. The Q values were calculated according to equation 2.14 using only $d_{ij}^{0.15} < d_{ij} < d_{ij}^{0.85}$.


$^{3}J_{\mathrm{HNH}\alpha}$-couplings

The $^{3}J_{\mathrm{HNH}\alpha}$-couplings were back-calculated and compared with the experimental values obtained from C. Bertoncini [158]. Because there are no Hα atoms in the charmm19 representation, the $^{3}J_{\mathrm{HNH}\alpha}$-couplings were computed indirectly from the φ angles (equations 2.10 and 2.11). $^{3}J_{\mathrm{HNH}\alpha}$-couplings measured experimentally are around 3–5 Hz for helical structure (including α-helix and PPII) and 8–11 Hz for β-sheet structure. For a random coil, a weighted average of these values is expected, typically around 6–8 Hz.
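A coupling of this kind is commonly computed from φ with a Karplus relation. The sketch below is an illustration only: it assumes the Vuister and Bax (1993) coefficients, which may differ from those in equations 2.10 and 2.11, and the function names are hypothetical. The coupling is evaluated for each structure and then averaged, rather than being evaluated at the average φ.

```python
import numpy as np

# Karplus coefficients in Hz (Vuister & Bax, 1993). Assumed here; the
# parametrisation used in equations 2.10 and 2.11 may be different.
A, B, C = 6.51, -1.76, 1.60

def j_hnha(phi_deg):
    """3J(HN,Ha) in Hz for backbone dihedral angle(s) phi given in degrees."""
    theta = np.radians(np.asarray(phi_deg, dtype=float) - 60.0)
    return A * np.cos(theta) ** 2 + B * np.cos(theta) + C

def ensemble_j(phi_deg):
    """Per-residue ensemble average for phi of shape (n_structures, n_residues)."""
    return j_hnha(phi_deg).mean(axis=0)

# Idealised angles: phi = -60 degrees (helical/PPII-like) and -120 degrees (beta-like).
print(float(j_hnha(-60.0)), float(j_hnha(-120.0)))

# A residue that spends half its time in each state (hypothetical two-state ensemble).
phi = np.array([[-60.0], [-120.0]])
print(ensemble_j(phi))
```

With these coefficients the idealised helical and extended angles give values inside the 3–5 Hz and 8–11 Hz ranges quoted above, and the two-state mixture falls between them.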

The $^{3}J_{\mathrm{HNH}\alpha}$-couplings computed for αRC, however, are close to 5 Hz for all residues, as are those for αEEF1 and αSASA (Figure 3.6). The only ensemble to show significant variation in the $^{3}J_{\mathrm{HNH}\alpha}$-couplings throughout the sequence is αCOIL, reflecting the inclusion of amino-acid specific dihedral angle preferences in the structural parameters of this model [265]. Even so, the agreement with experimental data is poor, and the average value of the $^{3}J_{\mathrm{HNH}\alpha}$-couplings is again ∼5 Hz.

Although the $^{3}J_{\mathrm{HNH}\alpha}$-couplings of all four ensembles resemble the values expected for helical structure, this does not necessarily imply that the structures are predominantly α-helical. $^{3}J_{\mathrm{HNH}\alpha}$-couplings are sensitive only to the φ dihedral angle, so PPII and α-helical structure cannot be distinguished. Examination of the Ramachandran plots shows that for all four ensembles, PPII structure is favoured over extended β-sheet-like structure (Figure 3.7), which explains why the $^{3}J_{\mathrm{HNH}\alpha}$-couplings are lower than is expected for a random coil. The Ramachandran plot for αCOIL is noticeably different to those of the ensembles generated using charmm, in keeping with the distinctive $^{3}J_{\mathrm{HNH}\alpha}$-couplings computed for this ensemble. The distribution of dihedral angles is not as smooth and there are small regions of particularly high probability density corresponding to PPII and α-helical structure.

All four of the ensembles considered here fail to reproduce the experimental $^{3}J$-couplings, indicating that the dihedral angles favoured by the random coil model, the charmm19 force-field in combination with either the sasa or eef1 implicit solvent models and even the coil library database are not the same as those sampled by αS in solution. Additionally, it shows that reproduction of the global scaling of a polypeptide chain is not sufficient to gauge whether the local structure is correct, in keeping with the numerous experimental and theoretical results discussed in Section 1.3.2.

[Figure 3.6 plots: panels A-D; ordinate $^{3}J_{\mathrm{HNH}\alpha}$ (Hz), 3–9; abscissa Residue Number, 0–140.]

Figure 3.6: Comparison of the $^{3}J_{\mathrm{HNH}\alpha}$-couplings back-calculated from (A) αRC, (B) αCOIL, (C) αEEF1 and (D) αSASA, shown in red, with the experimental $^{3}J_{\mathrm{HNH}\alpha}$-couplings [158], shown in black.

Figure 3.7: Ramachandran plots of the dihedral angle distributions p(φ, ψ) for (A) αRC, (B) αCOIL, (C) αEEF1 and (D) αSASA. The probability of each combination of φ and ψ dihedral angles is the average over all residues and all structures in the ensemble. The same scale is used for all plots to facilitate comparisons.

RDCs

RDCs combine information about both the local structure and the overall alignment properties of the molecule, and thus provide the most stringent test of the quality of the ensembles of αS structures. The NH RDCs measured in n-octyl-penta(ethylene glycol)/octanol (C8E5/octanol) and Pf1 bacteriophage [240] were therefore obtained from C. Bertoncini and M. Zweckstetter, respectively, and compared to RDCs back-calculated from each of the four ensembles.

The NH RDCs were back-calculated using the steric version of pales [190]. It was not possible to obtain predictions using the electrostatic version. Steric pales estimates A, the tensor describing the average solute orientation with respect to the magnetic field, by excluding the fraction of the solute orientations that sterically clash with the alignment medium, and averaging the individual alignment matrices, A′, calculated from the atomic coordinates of a given structure for each non-obstructed position and orientation. The independent prediction of the alignment tensor for each individual conformation is important as differential alignment is expected to provide the greatest contribution towards non-zero RDCs measured for DS [193–195].

Other methods for calculating the alignment tensor have also been described [179], including an efficient approach in which the steric alignment properties are derived from the information regarding the shape asymmetry present in the molecular inertia tensor [192]. A similar approximation was used by Sosnick et al. to calculate RDCs from coil library ensembles of a variety of proteins [257]. To test the method of Sosnick et al., the coil library ensemble of ubiquitin structures analysed in their study was obtained. Comparison of the NH RDCs calculated using their method and using pales revealed that the two methods give very different results (data not shown). Because pales is a well-established program that has been widely and successfully applied, it is used for all calculations reported here.

In C8E5/octanol, the alignment is expected to be purely steric. Pf1 bacteriophage, in comparison, bear an overall negative charge on the protein-accessible side [379]; at low salt concentrations the positively charged N-terminal domain of αS is therefore expected to be strongly attracted to the phage, and the negatively charged C-terminal domain should be repelled. Despite this, NH RDCs calculated from a coil library ensemble of αS structures using steric pales were found to give a reasonable agreement with the experimental data measured in Pf1 alignment media [269], so the use of steric pales here is acceptable.

To complement the previous analyses of the convergence properties of the

multiple replica simulations in terms of the Rg (Section 3.4), the RDCs aver-

aged individually for each replica are compared to the RDCs averaged over the

entire ensemble (Figure 3.8 A). The results are reported for αSASA only but

were similar for the other ensembles (data not shown). The ensemble-averaged

RDCs for each replica are quite different, indicating that each replica explores a

different region of conformational space. This highlights the utility of carrying

out multiple-replica simulations, as significantly longer simulation times would

be required to obtain the same coverage of conformational space with a single

replica.

To investigate further the range of RDCs contributing to the overall average and how this is affected by the number of contributing structures, the average and its SD were calculated for various fractions of the αSASA ensemble (Figure 3.8 B-E). The SD remains consistently high even when all of the 51 355 structures are considered. It is noteworthy that the SD is an order of magnitude larger than the average RDCs, which lends some uncertainty to the significance of the fine structure of the RDC pattern along the sequence from which the presence of residual structure is often inferred.

[Figure 3.8 plots: panels A-J; ordinates RDC (Hz) and ⟨SE⟩; abscissae Residue Number (0–140) and %(Nstruct) (1, 10, 20, 50, 100).]

Figure 3.8: RDCs calculated from various fractions of αSASA. (A) RDCs for each residue averaged over all structures sampled by each individual replica, shown in a different colour for each replica, and over the entire ensemble, shown as a thick black line. (B-E) Ensemble-averaged RDCs for each residue and their SD where the ensemble comprises (B) 1, (C) 10, (D) 20 and (E) 100% of αSASA. (F) The ⟨SE⟩ in the ensemble-averaged RDCs when the ensemble comprises varying proportions of αSASA, where the brackets denote averaging over all residues. (G-J) Ensemble-averaged RDCs for each residue and their SE where the ensemble comprises (G) 1, (H) 10, (I) 20 and (J) 100% of αSASA. The grey lines at 0 Hz in (A) and (G-J) are to guide the eye. Note that different scales are used for plots A-E and G-J.

The SE in the ensemble-averaged RDCs calculated from varying proportions of the αSASA ensemble was also computed (Figure 3.8 F-J). The most significant decrease in the ⟨SE⟩, where the brackets indicate averaging over all residues, occurs when going from 1 to 10% of the ensemble, whereas when 50% of the structures are considered, the SE is comparable to that of the entire ensemble (Figure 3.8 F). In addition to the reduction in the SE, the ensemble-averaged RDCs for each residue also change as the number of contributing structures increases (Figure 3.8 G-J). The range of RDC values and their fluctuations along the sequence are much greater when fewer structures are considered. This implies the need for caution when comparing RDCs back-calculated from synthetic ensembles to experimental data, because if there are too few structures, the average will not be converged. Minimising the SE provides a means of determining the appropriate number of structures to use.
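The dependence of the SE on the number of contributing structures can be sketched as follows. This is an illustration under stated assumptions: the SE is taken as SD/√N, which treats the structures as statistically independent (consecutive MD frames are correlated, so the true SE would be larger), and the per-structure RDCs are synthetic random numbers rather than pales output.

```python
import numpy as np

def mean_sd_se(values):
    """Per-residue mean, SD and SE for values of shape (n_structures, n_residues)."""
    values = np.asarray(values, dtype=float)
    n_structures = values.shape[0]
    mean = values.mean(axis=0)
    sd = values.std(axis=0, ddof=1)
    se = sd / np.sqrt(n_structures)   # assumes independent structures
    return mean, sd, se

# Synthetic per-structure RDCs (Hz): small mean, large spread (hypothetical data).
rng = np.random.default_rng(2)
rdc = rng.normal(loc=1.0, scale=20.0, size=(20_000, 20))

average_se = {}
for percent in (1, 10, 20, 50, 100):
    n = rdc.shape[0] * percent // 100
    subset = rdc[rng.choice(rdc.shape[0], size=n, replace=False)]
    _, sd, se = mean_sd_se(subset)
    average_se[percent] = float(se.mean())
    print(f"{percent:3d}% of structures: <SD> = {sd.mean():.2f} Hz, "
          f"<SE> = {se.mean():.3f} Hz")
```

In this toy case the SD stays roughly constant while the SE falls with √N, so the largest gain comes from the first increase in ensemble size, mirroring the behaviour described above.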

The αEEF1 and αSASA ensembles contain 31 262 and 51 355 structures, respectively, which is sufficient for the RDCs computed from the entire ensemble to be converged. αRC, however, contains only 10 000 structures and αCOIL is even smaller, as only 5000 structures were provided by A. Jha. This is almost certainly too few for the RDCs to be converged. Nevertheless, the RDCs were back-calculated from all four ensembles and compared to the experimental data (Figure 3.9).

Whilst the calculated and experimental RDCs are of similar magnitude, the residue-specific agreement is poor for all four ensembles. For αEEF1 and αSASA, the calculated RDCs are most like those measured in Pf1 and the magnitude is low throughout the sequence, whereas for αRC and particularly αCOIL the calculated RDCs are closer to those measured in C8E5/octanol. The two prominent peaks in the C-terminus visible in the experimental data measured in both media are not reproduced by any of the ensembles. αCOIL, which provides a reasonable match to the first peak of the C8E5/octanol data, performs the best in this regard, but the RDCs in the N-terminus are larger than those measured in either medium. Thus comparison of the back-calculated and experimental RDCs suggests that, as was seen for the $^{3}J_{\mathrm{HNH}\alpha}$-couplings, reproduction of the global dimensions can occur independently of an accurate representation of the local structure.

[Figure 3.9 plots: panels A-D; ordinate RDC (Hz), -5 to 10; abscissa Residue Number, 0–140.]

Figure 3.9: Comparison of (black) the RDCs back-calculated from (A) αRC, (B) αCOIL, (C) αEEF1 and (D) αSASA with the experimental RDCs measured in (red) C8E5/octanol and (green) Pf1 bacteriophage [240]. The grey line at 0 Hz is to guide the eye.

Additional structural features can be introduced by selecting only certain structures according to pre-defined criteria. A previous study found that the RDCs back-calculated from subsets of structures selected according to the formation of particular intra-molecular contacts are in good agreement with the experimental RDCs for native αS [266]. To investigate this conjecture, two sub-ensembles of αS structures were selected from αCOIL by choosing only structures with at least one contact between residues 1–20 and 120–140 (αCOIL-C:N) or residues 61–95 and 110–140 (αCOIL-C:NAC). αCOIL-C:N was designed to mimic the filtered ensemble found to best reproduce the experimental data by Bernado et al. [266], and αCOIL-C:NAC incorporates the intra-molecular contacts found to be most probable in the PRE-ERMD study of αS by Dedmon et al. [205]. Here, a contact is said to occur if the Cα atoms of two residues are closer than 15 Å, following the definition used by Bernado [266]. The magnitude of the RDCs for each sub-ensemble varies more throughout the sequence than that of the RDCs calculated from αCOIL, but it is not possible to determine whether this effect arises from differences between the original and filtered ensembles or simply results from fewer structures contributing to the average. Even the 5000 structures comprising αCOIL are unlikely to be sufficient to produce fully converged RDCs, and the filtering procedure further reduces the sizes of the ensembles to 1302 structures (αCOIL-C:N) and just 173 structures (αCOIL-C:NAC). Neither αCOIL-C:N nor αCOIL-C:NAC provides a good match to the experimental data (Figure 3.10), but again this may be due to the relatively small size of these ensembles. It is therefore difficult to draw any meaningful conclusions regarding either how well these ensembles describe the NS of αS in solution, or the relative roles that local and global structure play in determining the RDCs.
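The contact-based filtering described above can be sketched as a boolean mask over the ensemble. The function name, array layout and toy coordinates below are assumptions made for illustration; only the criterion itself (at least one Cα–Cα pair between the two regions closer than 15 Å) comes from the text.

```python
import numpy as np

def select_by_contact(ca_coords, region_a, region_b, cutoff=15.0):
    """Flag structures with at least one Ca-Ca contact between two regions.

    ca_coords: array of shape (n_structures, n_residues, 3), in Å.
    region_a, region_b: (first, last) residue numbers, 1-based and inclusive.
    """
    a = ca_coords[:, region_a[0] - 1:region_a[1], :]
    b = ca_coords[:, region_b[0] - 1:region_b[1], :]
    # Pairwise distances between every residue in region a and every one in b.
    dist = np.linalg.norm(a[:, :, None, :] - b[:, None, :, :], axis=-1)
    return (dist < cutoff).any(axis=(1, 2))

# Two toy 140-residue Ca traces (hypothetical): a straight rod and a hairpin
# whose two strands run side by side 5 Å apart.
n_res = 140
idx = np.arange(n_res)
extended = np.stack([3.8 * idx, np.zeros(n_res), np.zeros(n_res)], axis=1)
x_hairpin = np.where(idx < 70, 3.8 * idx, 3.8 * (n_res - 1 - idx))
y_hairpin = np.where(idx < 70, 0.0, 5.0)
hairpin = np.stack([x_hairpin, y_hairpin, np.zeros(n_res)], axis=1)
ensemble = np.stack([extended, hairpin])

mask_cn = select_by_contact(ensemble, (1, 20), (120, 140))
print(mask_cn, int(mask_cn.sum()), "structure(s) retained")
```

The number of structures retained by the mask is the quantity that limits convergence of any average computed from the filtered sub-ensemble.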

[Figure 3.10 plots: panels A-D; ordinate RDC (Hz), -5 to 10; abscissa Residue Number, 0–140.]

Figure 3.10: Comparison of the RDCs back-calculated from (blue) (A,C) αCOIL-C:N and (B,D) αCOIL-C:NAC with (A,B) (black) αCOIL and (C,D) the experimental RDCs measured in (red) C8E5/octanol and (green) Pf1 bacteriophage [240]. The grey line at 0 Hz is to guide the eye.

3.6 Conclusions

Any simulation should be carried out using the best protocol available. For the simulation of polypeptides in water, this means using an explicit water model. The computational cost, however, renders such an approach impractical for simulating DS. Implicit solvent models are less accurate, but the computational cost is greatly reduced. Simulations of the IDP αS carried out using three common implicit solvent models at physiological temperature did not converge, and the structures were too compact with respect to the experimental $\langle R_h^{-1} \rangle^{-1}$. The random coil model did not suffer from the problems of compaction and convergence, but at the expense of eliminating many of the protein-like features. Increasing T with either the eef1 or sasa implicit solvent models allowed converged ensembles consisting of sufficiently expanded structures to be generated. Whilst raising T is not an ideal solution, there is not yet a force-field and solvation model available that is capable of recognising that a protein is intrinsically disordered from the sequence alone.

Comparison of the PRE distances, $^{3}J_{\mathrm{HNH}\alpha}$-couplings and RDCs back-calculated from the random coil ensemble, a coil library ensemble and two ensembles generated at high T using the sasa and eef1 implicit solvent models showed that even when the global dimensions of the polypeptide are reproduced, the local and long-range structure is not equivalent to that detected experimentally, indicating that the types of structures sampled by the models considered here are not a good representation of those accessible to αS in vitro. It is unlikely that carrying out the MD at elevated T is the sole cause of the discrepancies, as the local structure is relatively impervious to increasing T. Despite exhibiting significantly different $^{3}J_{\mathrm{HNH}\alpha}$-couplings to the remainder of the ensembles, αCOIL did not provide a better match to the experimental data, thus for αS, at least, selecting the dihedral angles from a coil library database is not sufficient to describe the local structure. Moreover, unlike in other studies of coil library ensembles of αS structures, filtering the ensemble to extract only those structures exhibiting specific types of tertiary structure did not improve the agreement of the calculated RDCs with the experimental data. An alternative to filtering that eliminates the need for an ad hoc choice of selection criteria is to use experimental data as restraints during the simulation process. Such methods are used throughout the remainder of this work; specifically, the use of long-range distances from PRE-NMR in ERMD is investigated as a means of generating more representative ensembles.


Chapter 4

Improving the accuracy of ensemble-restrained molecular dynamics

4.1 Introduction

As discussed in Chapter 3, the simulation of DS is made difficult by the heterogeneous and expanded nature of the structures comprising DS ensembles. Molecular dynamics simulations in explicit water represent the most realistic and accurate means of simulating macromolecules in solution. However, this method is impractical for simulating DS due to the high computational cost resulting from the large water box required to accommodate the expanded structures typical of DS and the long simulation time necessary for convergence. Whilst implicit solvent models are much faster, high simulation temperatures must be used to generate sufficiently expanded structures. The smoothing of the free energy landscape induced by simulating at high temperatures with simplified solvation models means that some of the structures gathered in such simulations may not be physiologically relevant. Indeed, it was found in Chapter 3 that even for ensembles whose $\langle R_h^{-1} \rangle^{-1}$ is consistent with the experimental value, the back-calculated PRE distances, $^{3}J_{\mathrm{HNH}\alpha}$-couplings and RDCs bear little resemblance to those measured experimentally. Restraining the simulations with experimental observables provides a means of overcoming these problems by restricting the sampling of conformational space to encompass only those structures that are compatible with the experimental data.


Experimental observables measured for DS are averages over broad distribu-

tions, as DS ensembles typically encompass a heterogeneous range of structures,

meaning that it is not appropriate to apply the restraints to a single replica.

ERMD is therefore explored as a means of reconstructing DS in silico, with the

aim of establishing a general ERMD method that can be used to characterise

any DS. Rather than rely on the limited experimental data available for DS, the

method is developed using synthetic data back-calculated from two different

reference ensembles.

There are many advantages of measuring the success of the method according to its ability to reproduce known reference ensembles. Problems related to possible inaccuracies in the experimental data and in the translation of experimental NMR signals into structural restraints are avoided [344]. Moreover, the ensembles produced using ERMD can be compared to the reference ensembles from which the restraints were calculated in terms of both averages and distributions. This is important because ensembles are best described in terms of distributions, which are generally not accessible experimentally. Additionally, as it is not clear how to restrain a distribution, it is necessary to develop methods in which average values are restrained, but the accuracy of the resulting ensemble is assessed in terms of distributions. Even this definition may be insufficient if correlated motions are present, as then two ensembles may differ even if all their distance distributions are equal. The use of a synthetic reference ensemble allows the presence of correlations and their effective reproduction to be investigated.

The intrinsically disordered protein αS [112], introduced in Section 1.2.1, is used as a model system. This 140-residue protein has been studied previously by both PRE-NMR [205,240] and ERMD [205]. Two different reference ensembles were generated using unrestrained MD. Neither is expected to be an exact reflection of the ensemble of structures sampled by native αS under experimental conditions; if they were, then restraints would not be required. The exact nature of the reference ensembles is in fact somewhat arbitrary, as the aim is to find the optimal computational procedure for reproduction of a variety of known reference ensembles, and thus establish a general method for using ERMD to characterise DS ensembles.

The range of quantitative experimental observables that can be measured for DS is quite limited, as discussed in Section 1.3.1. Additionally, not all measurable quantities are suitable for use as restraints; for instance, methods for restraining RDCs across multiple replicas have not yet been developed due to the difficulties imposed by the need to calculate a separate alignment tensor for each conformation. This thesis focuses on PRE-ERMD, in which long-range distances obtained from PRE-NMR are used as restraints. These data are expected to enhance the description of protein-like features already present in the force-field and implicit solvent model by providing information about the tertiary structure of the molecule, particularly where the distances are between residues far apart in sequence. Many of the obstacles encountered here and the solutions proposed are expected to be relevant for other types of data as well.

In Section 4.2, theoretical considerations related to model-fitting and the

relationship between averages and distributions are outlined. The way in which

the equivalent of the distances obtained from the PRE-NMR experiment are

defined within the context of the simulations is then explained and justified with

reference to experimental data describing the motion of a spin-label attached

to a polypeptide. Following this, the generation of the reference ensemble and

calculation of synthetic distance restraints are described. Preliminary results are

reported in Section 4.4.4 and the causes of various issues that arise from these

and the solutions that are found are discussed. The resulting general method

is applied in Section 4.6 and finally, the calculated and reference ensembles are

compared using novel techniques.

4.2 Theoretical aspects of ERMD

Optimisation of the PRE-ERMD procedure requires consideration of various issues that are best thought of in the context of ERMD as a model-fitting process. Fitting a model involves obtaining the best fit to the data without using an unnecessarily and unjustifiably large number of free parameters. If there are too many degrees of freedom relative to the amount of information, under-restraining occurs [331]. In the case of ERMD, the number of free parameters is determined by the number of atoms comprising the polypeptide and the number of replicas. The sources of structural information are the protein-like features encoded in the force-field and implicit solvent model and the observables used as restraints. The number of replicas therefore cannot become too large, since as this number grows, the experimental information quickly becomes insufficient to define the structures of all of the replicas.

The opposite problem, over-restraining, arises because experimental data are ensemble-averages over hundreds or thousands of molecules which, depending on the relative time and spatial scales of the molecular motion and the experimental measurement, may sample many different conformations. DS, in particular, comprise a heterogeneous and broad range of structures. Consequently, it is not appropriate to enforce restraints upon a single replica, since a single structure compatible with all of the restraints is unlikely to be representative of the structures actually present, and may in fact be physically impossible to obtain [331,341,380].

The number of replicas must therefore be carefully chosen so as to avoid over- or under-restraining the data. The standard means of determining the optimal number of replicas is cross-validation [331,381–384]. Typically, around 20% of the data are excluded from the working dataset (the restraints). Reproduction of these free data provides a more stringent test than satisfaction of the restraints. This is because, unlike the satisfaction of the restraints, which generally improves with more replicas, reproduction of the free data becomes worse when too many replicas are used. This type of cross-validation is particularly effective in identifying the appearance of under-restraining, but it may be insufficient when over-restraining is present [385]. For example, compact conformations of ∆131∆ pass this cross-validation test, even if they are over-restrained [204].
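The partition into working and free datasets can be sketched as a random split of restraint indices. This is a minimal illustration: the function name, the seed and the exact 20% fraction are assumptions, and the subsequent step (running ERMD with the working set for several replica numbers and scoring the free set) is not shown.

```python
import numpy as np

def split_restraints(n_restraints, free_fraction=0.2, seed=0):
    """Randomly partition restraint indices into working and free sets.

    The working set would be applied as restraints; the free set is held
    back and used only to score the resulting ensemble.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_restraints)
    n_free = int(round(free_fraction * n_restraints))
    free = np.sort(order[:n_free])
    working = np.sort(order[n_free:])
    return working, free

working, free = split_restraints(100)
print(len(working), "working restraints;", len(free), "free restraints")
```

Repeating the simulation for a series of replica numbers and choosing the one at which agreement with the free set is best then gives the cross-validated choice described above.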

However, even if both the working and free datasets are reproduced, this is not necessarily sufficient to guarantee that the underlying distributions are accurately reconstructed. Many different distributions can give rise to the same average [386] (Figure 4.1 A and B). Equally, two mostly similar distributions can have different averages if they differ in the region to which that type of average is most sensitive. These effects occur because different types of average report on different aspects of the underlying distribution (Figure 4.1). A linear or near-linear average, for instance, lies near the centre of the distribution, whereas a highly non-linear average such as an r−6 average lies towards the short-distance edge, and is most influenced by the shortest distances in the distribution.
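A small numerical illustration, using two hypothetical sets of distances (none of the values come from the thesis data), shows how differently the two types of average respond to the same distribution. The helper names are invented for this sketch.

```python
def linear_average(distances):
    """Arithmetic mean, <r>."""
    return sum(distances) / len(distances)


def r6_average(distances):
    """r^-6 average, <r^-6>^(-1/6)."""
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


def short_distance_for_target(target, long_distance, n_long):
    """Distance of one extra member that makes the r^-6 average of
    n_long members at long_distance plus that member equal to target."""
    n_total = n_long + 1
    inv6 = n_total * target ** -6 - n_long * long_distance ** -6
    return inv6 ** (-1.0 / 6.0)


# Hypothetical distributions: every member at 20, or nine members at 60
# plus one short distance chosen so that the r^-6 averages coincide.
narrow = [20.0] * 10
broad = [short_distance_for_target(20.0, 60.0, 9)] + [60.0] * 9

for name, dist in (("narrow", narrow), ("broad", broad)):
    print(name, round(linear_average(dist), 1), round(r6_average(dist), 1))
```

Both sets share the same r−6 average although their linear averages differ widely, which is the situation depicted in Figure 4.1 A: a single short distance dominates the r−6 average.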

If more than one type of average is known, they can be combined to give more information about the shape of the underlying distribution (Figure 4.1 C). Proof of this principle was given by Choy et al. [387], who obtained a more precise description of the Rg distribution of an unfolded protein when they fitted the distribution function simultaneously to both the $\langle R_g^2 \rangle^{1/2}$ derived from SAXS and the $\langle R_h^{-1} \rangle^{-1}$ measured by PFG-NMR. It also forms the basis of one aspect of the improvements to the previously published ERMD method made in this chapter.

4.3 Definition of PRE distances

In an SDSL PRE-NMR experiment, the spin-label is covalently attached to the sulfur atom of an introduced cysteine residue. All of the experimental data used in Chapters 5 and 6 were obtained using the nitroxide spin-label MTSL (Figure 4.2). The reduction in the 1H peak intensity in the 1H-15N HSQC spectrum that occurs due to the presence of the paramagnetic spin-label is dependent on the r−6 average of the distance between the free electron of the spin-label and the amide hydrogen whose cross-peak is affected. This average distance can be calculated from Iox/Ired, the ratio of the intensities of the peaks in the 1H-15N HSQC spectrum when the spin-label is in its oxidised and reduced states, using equations 2.6 and 2.7 [199,208,224].

[Figure 4.1: three panels (A, B, C), each plotting p(Distance) (0 to 0.06) against Distance (0 to 100) for two distributions, with the $\langle r \rangle$ and $\langle r^{-6} \rangle^{-1/6}$ averages marked.]

Figure 4.1: The relationship between different types of average and the underlying distribution. Two distributions for which (A) the r−6 averages are equal but the linear averages are different, (B) the linear averages are equal but the r−6 averages are different and (C) both types of average are the same.

In order to use distances derived from PRE-NMR as restraints, it is necessary

to define an equivalent set of distances within the context of the simulation. It

is possible to use any arbitrary definition for the purpose of calculating and

implementing the synthetic PRE distance restraints, but to ensure that the

method is applicable with experimental distance restraints, the same definition

should be used for both. The atomic coordinates of the amide hydrogens may

simply be extracted from the coordinates of the molecule. Demarcating the

electron-proton distance is complicated, however, by the range of issues relating

to the representation and behaviour of the spin-label outlined below.

If the overall protein structure can be assumed to be rigid, such as in the determination of a folded NS [206–210], then correct interpretation of the electron-proton distance requires knowledge of the orientation and motion of the spin-label. Various attempts have therefore been made to model the spin-label side chain explicitly. Single-replica simulated annealing with the spin-label included has been used to determine folded native structures [208–210], and the paramagnetic group has been added to known X-ray structures to check the quality of the experimental PRE distances [206,209]. In one study, flexibility of the paramagnetic group was accounted for by using a multiple-conformer representation and an $S^2$ order parameter for the PRE interaction vector, but the remainder of the molecule was represented by a single structure [207]. This approach is difficult to implement, however, when PRE distance restraints from multiple spin-label positions are used simultaneously, and so it was not attempted here.

EPR spectroscopy has shown that at solvent-exposed loop sites, MTSL has high isotropic mobility [388]. When MTSL is attached to the exposed surface of a helix, the g+g+ conformation, with X1 and X2 ∼ 300°, is dominant, but this weak conformational preference is easily overcome by specific favourable interactions or the need to minimise steric clashes [389]. Rotation about the disulphide bond is constrained due to a relatively high activation energy of ∼ 7 kcal mol$^{-1}$ [390], resulting in a preferred rotamer of X3 ∼ 270° [391]. Thus of the five rotatable bonds between the nitroxide ring and the polypeptide backbone, there is only significant motion about the X4 and X5 bonds [388,389], of which rotation about X4 is the primary determinant of the position of the nitroxide ring [388,391].

When attached to exposed loop regions, the nitroxide of MTSL has high flexibility [389], corresponding to the large range of allowed X4 values of the g+g+ state. Rotation about X4 moves the nitrogen atom of the nitroxide on a circle of points of diameter ∼ 6 Å [391] (Figure 4.2). Back-calculation of experimental data for small helical peptides showed that a rigid arm of length ∼ 6.7 Å perpendicular to the helix axis and passing through the Cβ atom provides a reasonable approximation to the position of MTSL [392]. The expected error in the electron-proton distances can therefore be estimated, although the secondary structure dependence of the MTSL conformation [389,391] means that this information can only be used for structural refinement when the secondary structure is known, which is seldom the case when modelling DS.

In fact, the lack of a single well-defined molecular structure in DS ensembles effectively means that the conformation of the protein backbone with respect to the point at which the paramagnetic label is attached may be considered to be completely random. When MTSL is attached to solvent-exposed loops, the motion is highly isotropic [388], and the allowed conformations of MTSL attached to a disordered protein are likely to be similar. The experimental observable therefore contains contributions from all possible orientations of the spin-label in combination with all accessible conformations of the backbone, greatly reducing the dependence of the calculated distance on the orientation of MTSL.

For the previous PRE-ERMD of αS, the PRE distance was defined as the distance between the amide hydrogen atom and the atom of the wild-type side-chain of the spin-labelled residue furthest from the protein backbone [205]. The length of the side-chain varies, however, depending on the residue type, whereas experimentally, the spin-label is always attached to a cysteine. The only way to remove this source of variation is to define the position of the spin-label as being the position of the Cα atom of the spin-labelled residue, as this atom is present in every amino acid and is the point at which the side-chain is attached. If the motion of the spin-label and the side-chain of the cysteine residue to which it is attached are indeed truly random with respect to the polypeptide backbone, then the Cα atom is the centre of the locus of conformations populated by MTSL. Because of the r−6 averaging, however, the average distance between the amide hydrogen and the free electron is not the same as the distance between the Cα atom and the amide hydrogen. The magnitude and direction of the discrepancy cannot be determined unequivocally, so for the purposes of these simulations, the PRE distance is defined as the distance between the Cα atom of the spin-labelled residue and the amide hydrogen. This approximation has no effect on the back-calculation and use of synthetic PRE distance restraints, which are exact, but becomes more important when experimental data are used in Chapters 5 and 6.
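The definition adopted here can be sketched as follows, assuming that the Cα and amide hydrogen coordinates of each replica are already available as (x, y, z) tuples in Å. The function name and the toy coordinates are hypothetical; the r−6 averaging over replicas is the only element taken from the text.

```python
import math


def pre_distance(ca_positions, hn_positions):
    """r^-6 ensemble average of the Calpha-to-amide-hydrogen distance.

    ca_positions[k] is the Calpha of the spin-labelled residue in
    replica k; hn_positions[k] is the amide hydrogen of the probed
    residue in the same replica.
    """
    inv6 = [
        math.dist(ca, hn) ** -6
        for ca, hn in zip(ca_positions, hn_positions)
    ]
    return (sum(inv6) / len(inv6)) ** (-1.0 / 6.0)


# Toy example with two replicas: separations of 10 and 20 Angstrom.
ca = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
hn = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(pre_distance(ca, hn))
```

The result lies much closer to the shorter of the two separations than the linear mean of 15 Å does, reflecting the weighting of the r−6 average.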

Figure 4.2: Diagrams of (A) the structure of the MTSL side-chain, indicating the dihedral angles X1–X5 and the 4-H atom on the nitroxide ring and (B) the locus of points swept out by the nitroxide nitrogen due to rotation about X4 [391].

4.4 Preliminary results

4.4.1 Generation of reference ensembles

Having obtained a definition of a PRE distance, the remaining parameters describing the PRE-ERMD simulations can be specified. It is necessary to carry out the PRE-ERMD using a different effective energy function to that used to generate the reference ensembles, as otherwise the satisfaction of the restraints is trivial. In Chapter 3 it was shown that by using the SASA [355] or EEF1 [280] implicit solvent models at high T it is possible to produce converged ensembles, the $\langle R_h^{-1} \rangle^{-1}$ of which can be matched to the experimental value by tuning T. The two reference ensembles of αS structures, REF23 and REF20, were therefore generated using unrestrained molecular dynamics with the EEF1 [280] implicit solvent model and the PRE-ERMD was carried out using the SASA [355] implicit solvent model.

For REF23, T was chosen such that the $\langle R_h^{-1} \rangle^{-1}$ is close to the experimental value for αS in dilute solution [234]. The smoothing of the free energy landscape at high T was accounted for to an extent by filtering REF23 to increase the amount of residual structure. Because the previous PRE-ERMD of αS suggested that it has a tendency to form contacts between the C-terminus and the central NAC region [205], only structures with more than 15 contacts between these two regions were included in REF23. This selection process reduces the $\langle R_h^{-1} \rangle^{-1}$ by ∼ 1 Å but does not markedly change other ensemble-averaged quantities (data not shown). The second, more compact reference ensemble, REF20, serves to confirm that the PRE-ERMD method can be applied to a range of different types of DS ensembles. The residual structure content of REF20 is sufficiently high that filtering was not required.

4.4.2 Absence of correlated motions

It is clear from the factors discussed in Chapter 1 that averages alone are not

sufficient to describe heterogeneous ensembles such as those typical of DS of

proteins. A further question of interest is whether distributions alone constitute

an adequate description of an ensemble, or if correlations between distributions

must be included as well. The answer to this question also has implications

regarding the choice of appropriate methods for assessing the success of PRE-

ERMD.

The presence of correlated motions in the reference ensembles was investi-

gated by comparing the joint distributions, p(rAB, rAC), of distances between

pairs of Cα atoms AB and AC at a range of positions throughout the sequence

to the product of the distributions, p(rAB) ∗ p(rAC). The similarity between

the two types of 2D histograms was quantified using S values, with a high S

value indicating the existence of correlations. Three representative pairs of dis-

tributions with low, medium and high S values are shown in Figure 4.3. In the

presence of correlations, the joint distribution has an elongated shape, whereas

the product of the distributions remains circular (Figure 4.3 A, B, D and E).

Plotting the S values for all combinations of B and C for a given A allows


the pairs of residues for which the distance distributions are most correlated

to be identified. In REF23 (Figure 4.4), the S values are highest when B and

C are close together in sequence. This result is not surprising, as it is likely

that two residues close together in sequence will also occupy similar spatial

coordinates, so that any changes in the distance from B or C to A will occur

in a coordinated manner. The S values for the remainder of the B/C pairs are

predominantly low, other than when B and C are both in the C-terminus. This

latter effect is a result of the filtering procedure used to produce REF23, as it

does not occur for REF20, which is similar to REF23 in all other respects (data

not shown). Thus correlations are essentially absent in REF23 and REF20,

other than those arising from the persistence length of the polypeptide chain,

which can be estimated to be ∼ 7 residues. Reproduction of the probability

distributions underlying the PRE distances is therefore sufficient to certify that

the reference ensemble is accurately reconstructed.
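The comparison between a joint distribution and the product of its marginals can be sketched as below. This is only an illustration: the mismatch score used here (half the summed absolute difference between the two 2D histograms) is a stand-in chosen for simplicity and is not the S value of equations 2.15 and 2.16, and the bin width, function name and random test data are all assumptions.

```python
import random
from collections import Counter


def joint_vs_product_mismatch(r_ab, r_ac, bin_width=5.0):
    """Compare p(r_AB, r_AC) with p(r_AB) * p(r_AC) on a common grid.

    Returns a score between 0 (joint equals product, no correlation)
    and 1; larger values indicate stronger correlation.
    """
    n = len(r_ab)
    bins_ab = [int(x // bin_width) for x in r_ab]
    bins_ac = [int(y // bin_width) for y in r_ac]
    joint = Counter(zip(bins_ab, bins_ac))
    p_ab = Counter(bins_ab)
    p_ac = Counter(bins_ac)
    total = 0.0
    for i in p_ab:
        for j in p_ac:
            product = (p_ab[i] / n) * (p_ac[j] / n)
            total += abs(joint[(i, j)] / n - product)
    return 0.5 * total


# Synthetic distances: b is independent of a, whereas a copy of a is
# perfectly correlated with it.
rng = random.Random(1)
a = [rng.uniform(10.0, 50.0) for _ in range(5000)]
b = [rng.uniform(10.0, 50.0) for _ in range(5000)]
print(joint_vs_product_mismatch(a, b))   # small: essentially uncorrelated
print(joint_vs_product_mismatch(a, a))   # large: fully correlated
```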

Figure 4.3: 2D histograms of (A-C) p(rAB) ∗ p(rAC) and (D-F) p(rAB, rAC) for

REF23. The residues and the S values quantifying the similarity of the two

types of histogram are (A,D) A = 1, B = 105, C = 116 (S = 0.86), (B,E) A =

1, B = 71, C = 130 (S = 0.50) and (C,F) A = 82, B = 71, C = 130 (S = 0.23).

4.4.3 Calculation of synthetic distance restraints

The set of synthetic restraints back-calculated from the reference ensembles comprises 1000 long-range PRE distances between the Cα atoms of 8 ‘spin-labelled’ residues and all amide hydrogens except those on residues adjacent to the spin-labelled residues. Whilst many more distance restraints could have been calculated, this number was chosen because it is the upper limit for the number of distances that can practically be obtained experimentally for a protein of this size. A ‘free’ dataset, also consisting of 1000 PRE distances, was created for cross-validation purposes. Although experimentally the free dataset usually comprises 20% of the data, with the remaining 80% used as restraints, here it is the same size as the working dataset so that the statistics for the two sets are comparable.

Figure 4.4: 2D plots of the S values quantifying the agreement between p(rAB, rAC) and p(rAB) ∗ p(rAC) for REF23 for (A) A = 29 and (B) A = 71. The discontinuity of the region of highest S values for B ∼ C is due to the inability of the smoothing function (see Section 2.3.3) to interpolate fully between the limited number of A, B and C for which S values were available.

Typically, when using experimentally determined PRE distances as restraints, the ensemble-averaged distance at each point in time, $d_{ij}^{\mathrm{calc}}(t)$, is required to lie within a square well defined by $d_{ij}^{\mathrm{ref}} - L$ and $d_{ij}^{\mathrm{ref}} + U$, where L and U are lower and upper bounds, respectively, and $d_{ij}^{\mathrm{ref}}$ is the experimental or synthetic distance restraint. A harmonic potential is applied outside the square well to ensure continuity. Justification for the use and choice of L and U, including which of the PRE distances are given both upper and lower bounds, and which are used as ‘negative’ restraints and assigned only one or the other, is discussed further in Section 5.2.3. Here, L and U were assigned as if the distances had been calculated from experimental data. Distances greater than $d_{ij}^{0.85}$ or less than $d_{ij}^{0.15}$, where $d_{ij}^{0.85}$ and $d_{ij}^{0.15}$ represent the maximum and minimum reliable distances that can be determined experimentally (see Section 2.2.3), were assigned only a lower or upper bound, respectively, corresponding to $d_{ij}^{0.85} - L$ or $d_{ij}^{0.15} + U$.
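The restraint described above can be sketched as a flat-bottomed potential. The functional form 0.5·k·x² outside the well, the name `square_well_energy` and the default force constant are assumptions made for illustration; the text states only that a harmonic potential is applied outside the square well. Passing `None` for a bound mimics a restraint that has only a lower or only an upper limit.

```python
def square_well_energy(d_calc, d_ref, lower=None, upper=None,
                       force_constant=1.0):
    """Flat-bottomed restraint energy for one ensemble-averaged distance.

    The energy is zero for d_ref - lower <= d_calc <= d_ref + upper and
    rises harmonically outside that window. A bound given as None is
    not enforced (one-sided restraint).
    """
    if lower is not None:
        low_edge = d_ref - lower
        if d_calc < low_edge:
            return 0.5 * force_constant * (d_calc - low_edge) ** 2
    if upper is not None:
        high_edge = d_ref + upper
        if d_calc > high_edge:
            return 0.5 * force_constant * (d_calc - high_edge) ** 2
    return 0.0


# Two-sided restraint with L = 1 and U = 8 around d_ref = 20 (Angstrom).
for d in (18.0, 19.5, 27.0, 30.0):
    print(d, square_well_energy(d, 20.0, lower=1.0, upper=8.0))

# One-sided restraint: only a lower limit is enforced.
print(square_well_energy(50.0, 20.0, lower=1.0))
```

Because the energy and its first derivative are both zero at the edges of the well, the potential is continuous where the harmonic walls meet the flat bottom.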


4.4.4 Application of PRE-ERMD

A number of PRE-ERMD simulations using PRE distance restraints back-calculated from REF23 were carried out varying Nrep, L and U. The simulation temperature, T, of 515 K was chosen so that the 〈Rg〉 of an unrestrained ensemble matches that of REF23. Each ensemble produced using PRE-ERMD was compared to the appropriate reference ensemble in terms of both averages and distributions. The ensemble-averaged PRE distances and Rg were compared using Q values [359] (equation 2.14) and the distributions underlying these observables were compared using S values [344] (equations 2.15 and 2.16). Because an ensemble is better defined in terms of distributions of observables rather than averages, the S values provide the best measure of how accurately the reference ensemble is recovered.

In most cases, application of restraints causes the average size of the structures to decrease, so that it is difficult to obtain an ensemble of structures for which the 〈Rg〉 and the Rg distribution match those of REF23. Furthermore, the Q values and S values for the PRE distances (QPRE and SPRE) are not optimised simultaneously (Table 4.1 and Figure 4.5 A), indicating that the best conditions for the reproduction of the r−6-averaged PRE distances are not the same as for the reproduction of the underlying distributions. This can occur because of the complex relationship between averages and distributions [386] discussed in Section 4.2. PRE distances are r−6 averages, so the poor correlation between SPRE and QPRE is most likely due to the short-distance (left-hand) side of the PRE distance distributions of the ensembles generated using PRE-ERMD not matching those of REF23. The inability of cross-validation against the PRE distances to report on how well the distributions are reproduced is of particular concern because the ultimate aim is to use experimental data as restraints, in which case the true underlying distributions are not known.

Based on these preliminary results, there are therefore two main issues that

need to be addressed. Firstly, a validation measure based on information avail-

able experimentally that, like the S values, reports on how well the distributions

are reproduced is required. Secondly, it is necessary to find a means of over-

coming the compaction induced by the application of PRE distance restraints.

The means by which these problems were circumvented are described in the

following section.


Table 4.1: The 〈Rg〉, Q and S values quantifying how well REF23 is reproduced on varying the number of replicas (Nrep) and the lower (L) and upper (U) bounds. The simulation temperature of T = 515 K was chosen because the 〈Rg〉 of an unrestrained ensemble at this T is equal to that of REF23 (23.2 Å). QRg and SRg refer to the Rg, QwPRE and SwPRE to the working PRE distance restraints and QfPRE and SfPRE to the free PRE distances.

Nrep  L (Å)  U (Å)  〈Rg〉 (Å)  QRg   SRg   QwPRE  QfPRE  SwPRE  SfPRE
16    5      5      18.3      0.21  1.24  0.11   0.16   0.47   0.43
24    5      5      19.3      0.17  0.96  0.09   0.14   0.39   0.37
32    5      5      20.0      0.14  0.78  0.10   0.17   0.35   0.34
16    1      1      13.8      0.41  2.0   0.11   0.16   0.66   0.54
24    1      1      17.6      0.24  1.42  0.09   0.21   0.57   0.51
32    1      1      17.9      0.23  1.34  0.10   0.15   0.51   0.45
16    1      8      19.9      0.14  0.80  0.17   0.15   0.38   0.38
24    1      8      21.2      0.09  0.45  0.15   0.13   0.30   0.29
32    1      8      21.9      0.06  0.33  0.15   0.13   0.28   0.28

[Figure 4.5: three scatter plots against SPRE (0 to 0.5): (A) QPRE (0 to 0.5), (B) SRg (0 to 2) and (C) QRg (0 to 0.25).]

Figure 4.5: Correlation between the SPRE values and (A) the QPRE values, (B)

the SRg values and (C) the QRg values. Each point corresponds to a different

ensemble created using PRE-ERMD using synthetic distance restraints calcu-

lated from REF23. The working data are shown in black and the free data in

red.

4.5 Improvement of the PRE-ERMD method

4.5.1 Cross-validation against multiple observables

The work of Choy et al. [387] provides a starting point for the development of an alternative validation measure. They use two different types of average to define more precisely the parameters of a distribution function. In the case of PRE-ERMD, the distribution function is implicitly included in the choice of simulation conditions, so all that is required is an experimental observable that is a type of average other than r−6.

Whilst the expanded and heterogeneous range of structures comprising DS ensembles renders many experimental observables difficult to obtain and interpret, one observable that remains both measurable and informative is the Rg. The geometric Rg is easily calculated from the atomic coordinates of each structure. Root-mean-square averaging ($\langle R_g^2 \rangle^{1/2}$) can be used for comparison with experimental values obtained by SAXS. There are also programs available, such as CRYSOL [393], that calculate the solution scattering profile, and thus the expected experimental $\langle R_g^2 \rangle^{1/2}$, from the atomic coordinates. Additionally, the calculated Rg of each structure can be converted into an Rh using a phenomenological relationship as described in Section 2.3.1, and the ensemble-averaged $\langle R_h^{-1} \rangle^{-1}$ computed for comparison with the experimental value obtained from PFG-NMR. In all cases, the type of average is different to an r−6 average; accordingly, it imparts additional information regarding the shape of the underlying distribution. For simplicity, the 〈Rg〉 was used with the synthetic PRE distance restraints, though the same conclusions hold if other types of averaging are used.
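The different averages mentioned here can be illustrated with a short sketch. It assumes an unweighted (geometric) Rg computed from a list of (x, y, z) coordinates, and it takes per-structure values as plain lists; the function names are invented, and the conversion from Rg to Rh described in Section 2.3.1 is deliberately not reproduced.

```python
import math


def radius_of_gyration(coords):
    """Geometric (unweighted) Rg of one structure."""
    n = len(coords)
    centre = tuple(sum(c[i] for c in coords) / n for i in range(3))
    return math.sqrt(sum(math.dist(c, centre) ** 2 for c in coords) / n)


def linear_mean(values):
    """<X>: the linear ensemble average."""
    return sum(values) / len(values)


def rms_mean(values):
    """<X^2>^(1/2): the averaging appropriate for SAXS-derived Rg."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def inverse_mean(values):
    """<X^-1>^-1: the averaging appropriate for PFG-NMR-derived Rh."""
    return 1.0 / (sum(1.0 / v for v in values) / len(values))


# Toy structure: two points 2 Angstrom apart have Rg = 1 Angstrom.
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))

# Hypothetical per-structure values for a two-member ensemble.
values = [10.0, 30.0]
print(inverse_mean(values), linear_mean(values), rms_mean(values))
```

For any ensemble with more than one distinct value, the three averages are ordered inverse < linear < root-mean-square, so each emphasises a different part of the same distribution.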

In order for the information contained in the r−6-averaged PRE distances

and the linearly (or otherwise) averaged Rg to be combined, the first requirement

is that the distributions of each type of observable are correlated. This is indeed

the case: ensembles for which SPRE is low also exhibit low SRg (Figure 4.5 B).

In fact, QRg is also highly correlated with SPRE (Figure 4.5 C). This is in part

an artifact of the nature of the reference ensembles used here. The widths of

the distributions are correlated with the midpoints of the distributions, thus

if a linear average such as the 〈Rg〉 matches, the distributions tend to be of

similar width. However in the general case it still holds that cross-validation

against different types of average, which report on different aspects of the un-

derlying distribution, provides a better measure of whether the distributions,

and therefore the ensemble, are correct. Therefore, when using experimental

data, cross-validation against the Rg or Rh should be used as a substitute for

cross-validation against S values in order to determine the optimal choice of

simulation conditions. This criterion was used in the PRE-ERMD described in

the remainder of this thesis.


4.5.2 Explanation of the compaction problem

In order to cross-validate against the Rg, it is first necessary to overcome the

compaction induced by the application of restraints and generate ensembles

that are sufficiently expanded. The reason why compaction occurs is related

to the issues addressed by cross-validation. As outlined in Section 4.2, this

process aims to determine the number of replicas for which there are sufficient

degrees of freedom to account for the time- and ensemble-average nature of the

experimental data, but not so many that over-fitting occurs.

It is impossible, however, to consider as many replicas as the number of copies

of the molecule that contribute to an experimental solution-state observable, not

only due to the avoidance of over-fitting, but also for more practical reasons,

such as computing resources. This is particularly pertinent for DS, where each

experimental value is an average over a broad distribution. With fewer replicas,

only a small fraction of the contributing values can be sampled at each point

in time. If the ergodic principle holds, this is compensated for by simulating

for a sufficiently long time. The application of restraints, however, poses a

significant restriction to the ergodicity of the simulations, which is exacerbated

by the sensitivity of the r−6-averaged PRE distances to the smallest contributing

values (Figure 4.1). When there are fewer replicas, a greater proportion must

contain short distances in order to satisfy the restraint at each point in time.

This results in narrow distributions containing mostly short distances close to

the r−6 average, and, ultimately, ensembles of structures that are too compact

(Figure 4.6 A).

Accordingly, despite carrying out the ERMD at temperatures where the 〈Rg〉 of an unrestrained ensemble matches that of the relevant reference ensemble, the 〈Rg〉 decreases upon application of synthetic PRE distance restraints, even with 32 replicas (Table 4.1). Ways to increase the range of structures accessible at each point in time other than explicitly increasing the number of degrees of freedom by increasing Nrep were therefore investigated.

4.5.3 Solving the compaction problem

PRE distance restraints are typically not enforced precisely; rather, as mentioned previously, $d_{ij}(t)$ is simply required to lie within a square well with harmonic walls defined by L and U. Manipulating L and U provides a simple mechanism for indirectly controlling the range of distances sampled and so the width of the distance distribution. L and U were originally implemented to account for experimental inaccuracies and the associated errors in the calculated distance [199,208,224]. In Chapter 5, it is shown that by careful treatment of the experimental data, a smaller degree of tolerance than that used for most previously published examples of ERMD (L, U = 4–5 Å) can be justified, thus extending the usefulness of this practice beyond the original spirit in which it was implemented.

[Figure 4.6: four panels plotting p(Distance) (0 to 0.4) against Distance (0 to 60 Å): (A) Nrep = 16, L = 5, U = 5; (B) Nrep = 24, L = 5, U = 5; (C) Nrep = 24, L = 1, U = 8; (D) Nrep = 24, L = 1, U = 1.]

Figure 4.6: The effect of changing Nrep, L and U. The distributions of distances sampled over all time-points and all replicas, $r_{ij,k}(t)$, are in black, the distributions of ensemble-averages compiled over all time-points, $d_{ij}(t)$, are in red and the distributions of distances calculated from REF23, $r_{ij,k}^{\mathrm{ref}}$, are in grey. Also shown are the overall time- and ensemble-average calculated from the PRE-restrained ensemble, $d_{ij}$, in green and from REF23, $d_{ij}^{\mathrm{ref}}$, in blue and the lower (L) and upper (U) bounds in cyan. The data are from four ensembles generated using synthetic PRE distance restraints calculated from REF23.

The smaller L and U are, the closer $d_{ij}^{\mathrm{calc}}(t)$ is to $d_{ij}^{\mathrm{ref}}$ at each point in time (Figure 4.6 B vs D). Although altering Nrep does not directly control the variety of distances contributing to $d_{ij}^{\mathrm{calc}}(t)$ (that is, the width of the distribution of distances $r_{ij,k}$ at each time-point, t), in general, a wider range of distances is sampled at each point in time if Nrep is large (Figure 4.6 A vs B). On the other hand, increasing L and U allows more variation in $d_{ij}^{\mathrm{calc}}(t)$ (Figure 4.6 B vs D). Over many time-points, this variation equates to a wider range of distances being sampled for a given Nrep. Thus increasing the tolerance to instantaneous fluctuation in the ensemble-averaged observables can compensate for a reduced number of replicas, without explicitly increasing the number of degrees of freedom.

For the time- and ensemble-average with fewer replicas and larger L and U to be equivalent to that obtained with more replicas and smaller L and U, the $d_{ij}^{\mathrm{calc}}(t)$ over multiple timesteps must be evenly distributed within $[d_{ij}^{\mathrm{ref}} - L,\ d_{ij}^{\mathrm{ref}} + U]$. This is the case (Figure 4.6). If L and U are equal, however, such that the $d_{ij}^{\mathrm{calc}}(t)$ collected over all time-points, t, are evenly distributed either side of $d_{ij}^{\mathrm{ref}}$, the r−6 average calculated from the overall distribution of $r_{ij,k}^{\mathrm{calc}}$, pooled over all Nrep replicas and all time-points t, is in general smaller than the imposed restraint $d_{ij}^{\mathrm{ref}}$ (Figure 4.6 A, B and D). This is because approximately half of the $r_{ij,k}$ lie between $d_{ij}^{\mathrm{ref}}$ and $d_{ij}^{\mathrm{ref}} - L$, and these small $r_{ij,k}$ have a disproportionately large influence on $d_{ij}^{\mathrm{calc}}$. If the tolerance is to be used to compensate for using fewer replicas, then L and U must be chosen such that the overall distribution of $r_{ij,k}^{\mathrm{calc}}$ contains a smaller proportion of short distances. This can be achieved by favouring $d_{ij}^{\mathrm{calc}}(t) > d_{ij}^{\mathrm{ref}}$ at the expense of $d_{ij}^{\mathrm{calc}}(t) < d_{ij}^{\mathrm{ref}}$: essentially, L < U (Figure 4.6 C).
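The argument above can be checked numerically under a simplifying assumption: that the instantaneous ensemble averages are spread exactly evenly across the allowed window. Because each instantaneous value is itself an r−6 average over the replicas, averaging its inverse sixth power over time gives the pooled r−6 average. The function name and the choice of 20 Å for the reference distance are hypothetical.

```python
def pooled_r6_average(d_ref, lower, upper, n_points=10001):
    """Overall r^-6 average when the instantaneous ensemble average
    d(t) is spread evenly over [d_ref - lower, d_ref + upper]."""
    start = d_ref - lower
    step = (lower + upper) / (n_points - 1)
    mean_inv6 = sum(
        (start + i * step) ** -6 for i in range(n_points)
    ) / n_points
    return mean_inv6 ** (-1.0 / 6.0)


d_ref = 20.0  # hypothetical restraint, in Angstrom
for lower, upper in ((5.0, 5.0), (1.0, 1.0), (1.0, 8.0)):
    print(lower, upper, round(pooled_r6_average(d_ref, lower, upper), 1))
```

With equal bounds the pooled average falls below the restraint (approximately 18.6 Å for L = U = 5 in this toy case), whereas L = 1 and U = 8 shifts it above the restraint (approximately 22.5 Å), which is the direction needed to counteract the excess of short distances.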

A range of different combinations of L and U were tested with 16, 24 and 32 replicas and the synthetic PRE distance restraints calculated from REF23. The key results are summarised in Table 4.1. With L = 1 and U = 8 the desired effect is obtained without the upper bound becoming so large that it ceases to act as a restraint. However, even with 32 replicas the distributions are still narrower than those of REF23, containing too many short distances and not enough long distances (data not shown, but see Figure 4.6), and the 〈Rg〉 remains too low.

Further measures are therefore required to encourage the sampling of longer

distances. The results of Chapter 3 suggest that increasing the simulation tem-

perature, T , tends to generate more expanded structures, in which the inter-

residue distances are more likely to be large. As discussed in Chapter 3, T

should not be thought of as a true physical quantity, but as a source of en-

ergy to overcome the bias of the force-field and implicit solvent models towards

compact structures. Here the additional energy provided by increasing T also

helps to reverse the tendency towards sampling shorter distances caused by re-

straining an r−6-average calculated over fewer replicas than contributed to the

restraint.

The effectiveness of treating T as an adjustable parameter was tested by

attempting to reproduce the more compact ensemble, REF20, as well as REF23.


In both cases, 24 replicas were used with L = 1 and U = 8. By adjusting T ,

both reference ensembles can be accurately reproduced in terms of distributions

and averages (Table 4.2), indicating that this method is applicable to different

types of ensembles. The optimal T in each case depends on the broadness of

the reference ensemble and the compactness of the structures, so that a higher

T is required to reproduce REF23 than REF20.

Table 4.2: The 〈Rg〉, Q and S values quantifying how well REF23 (Rg = 23.2 Å) and REF20 (Rg = 20.0 Å) are reproduced on varying the simulation temperature, T, with Nrep = 24, L = 1 and U = 8. QRg and SRg refer to the Rg, QwPRE and SwPRE to the working PRE distance restraints and QfPRE and SfPRE to the free PRE distances.

Reference  T (K)  〈Rg〉 (Å)  QRg   SRg   QwPRE  QfPRE  SwPRE  SfPRE
REF23      485    19.6      0.16  0.88  0.13   0.17   0.37   0.38
REF23      515    21.2      0.09  0.45  0.15   0.13   0.30   0.29
REF23      550    22.5      0.03  0.22  0.15   0.12   0.27   0.25
REF23      570    23.0      0.01  0.12  0.17   0.13   0.26   0.25
REF20      515    20.5      0.02  0.13  0.17   0.13   0.26   0.25
REF20      550    22.0      0.09  0.43  0.15   0.14   0.23   0.22

4.6 General protocol for PRE-ERMD

To facilitate the application of the techniques developed in Section 4.5 to char-

acterise any type of disordered state for which both Rg and PRE measurements

are available, a general protocol was devised (Figure 4.7). As explained in the

caption of the figure, this method is based on the simultaneous minimisation of

QRg and QPRE.

The general method was tested by comparing its ability to reproduce REF23

to the results obtained by trial and error. The statistics are similar in both cases

(Table 4.3), showing that sufficient structures are collected at each temperature

during the initial phases to obtain meaningful statistics. Moreover, the agree-

ment between the final calculated ensemble determined using the generalised

protocol and REF23 is good, especially in terms of distributions (Table 4.3 and

Figure 4.8). The lack of correlations in REF23 means that this is sufficient to

consider the two ensembles to be equal. Thus the desired result - delineation of


a general method capable of reproducing DS ensembles - has been achieved.

Figure 4.7: Outline of the general method for carrying out PRE-ERMD. In

all cases, Nrep = 24, L = 1 and U = 8. The molecules are first heated to a

700 K in 50 K increments, then the force constant, α, is increased to a value

that is sufficiently high that the restraints are satisfied but not so high as to

cause large changes in the energy. The next three steps form a loop in which

after a brief equilibration phase, a preliminary set of structures is collected,

before the temperature is lowered by 25 K and the process repeated. The 1920

structures (80 per replica) collected at each temperature are sufficient to obtain

reliable estimates of QRg and QPRE. At the temperature at which these are

both optimised, a further 5760 structures are collected (240 per replica).
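The selection step that closes the loop in Figure 4.7 can be sketched in a few lines of Python. This is an illustration only, not the code used in this work: the function name `pick_optimal_temperature` and the ranking rule (lowest QRg first, with the free-set QPRE used to break ties) are assumptions, and the numbers are copied from Table 4.3.

```python
N_REP = 24  # number of replicas, as stated in Figure 4.7

# (T in K, Q_Rg, Q_wPRE, Q_fPRE), copied from Table 4.3
SCAN = [
    (500, 0.12, 0.15, 0.15), (525, 0.07, 0.16, 0.15), (550, 0.03, 0.16, 0.16),
    (575, 0.01, 0.17, 0.18), (590, 0.00, 0.17, 0.15), (600, 0.01, 0.17, 0.17),
    (625, 0.02, 0.18, 0.19), (650, 0.04, 0.19, 0.21), (675, 0.05, 0.19, 0.24),
    (700, 0.06, 0.20, 0.25), (725, 0.05, 0.22, 0.25), (750, 0.06, 0.21, 0.28),
]


def pick_optimal_temperature(scan):
    """Return the T with the smallest Q_Rg; ties are broken by Q_fPRE (assumed rule)."""
    best = min(scan, key=lambda row: (row[1], row[3]))
    return best[0]


# Structures collected per temperature and at the optimal temperature
preliminary = N_REP * 80
production = N_REP * 240

print(pick_optimal_temperature(SCAN), preliminary, production)
```

Ranking on QRg first reflects the observation that the QPRE values vary little across the temperature range, so they serve mainly as a check rather than as the deciding quantity.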

4.6.1 Additional modes of validation

Throughout the work discussed in this thesis so far, the reproduction of distri-

butions has emerged as a critical factor in ensuring that an ensemble generated

using ERMD is equivalent to that from which the restraints are derived. Whilst

the S value provides a measure of how well the distributions are satisfied over-

all, and the sl values can be used to extract localised information, visualising

Table 4.3: The 〈Rg〉, Q and S values quantify how well REF23 (Rg = 23.2 Å) is reproduced by varying the simulation temperature, T, with Nrep = 24, L = 1 and U = 8. QRg and SRg refer to the Rg, QwPRE and SwPRE to the working PRE distance restraints and QfPRE and SfPRE to the free PRE distances. The results for the most representative ensemble collected at the optimal T are in bold type.

T (K)   〈Rg〉 (Å)   QRg    SRg    QwPRE   QfPRE   SwPRE   SfPRE
500     20.4       0.12   0.64   0.15    0.15    0.31    0.31
525     21.5       0.07   0.38   0.16    0.15    0.31    0.31
550     22.5       0.03   0.22   0.16    0.16    0.30    0.29
575     23.0       0.01   0.11   0.17    0.18    0.29    0.28
590     23.2       0.00   0.06   0.17    0.15    0.26    0.25
600     23.4       0.01   0.07   0.17    0.17    0.29    0.28
625     23.6       0.02   0.09   0.18    0.19    0.29    0.27
650     24.1       0.04   0.14   0.19    0.21    0.28    0.28
675     24.4       0.05   0.20   0.19    0.24    0.29    0.29
700     24.5       0.06   0.21   0.20    0.25    0.30    0.29
725     24.4       0.05   0.24   0.22    0.25    0.30    0.29
750     24.7       0.06   0.28   0.21    0.28    0.30    0.30

the distributions supplies additional information, allowing the causes of the ob-

served sl values to be understood. In Figure 4.8, only three PRE distance

distributions are shown as it is not feasible to examine them all individually. A

pictorial summary of the nature of the distributions is therefore desirable.

The overall pairwise distance distribution function, p(r), is a graphical repre-

sentation that includes information regarding all distance distributions describ-

ing the ensemble. p(r) is also one of the few experimentally accessible distribu-

tion functions. It is obtained by taking the sine Fourier transform of the SAXS

scattering profile of a protein in solution162,163. The experimental p(r) includes

contributions from all pairs of interatomic distances within the macromolecule.

Here, it is approximated by considering only CαCα distances to reduce the com-

putational cost. Distributions were calculated for αRC (Chapter 3), REF23,

the optimal PRE-restrained ensemble generated at T = 590 K (αPRE), and

two unrestrained ensembles also generated using sasa. The $\langle R_h^{-1} \rangle^{-1}$ of the

first, αSASA (Chapter 3), is the same as that of REF23, whereas the other

unrestrained ensemble (αSASA32) was generated at 590 K, and thus contains

[Figure 4.8, panels A-F. Axes: (A) Rg (Å) and p(Rg); (B-D) dij (Å) and p(dij); (E) dij ref (Å) and dij calc (Å); (F) r (Å) and p(r).]

Figure 4.8: Comparison of αPRE with REF23 in terms of (A) Rg distributions,

(B-D) three examples of distance distributions and (E) scatter plot of inter-

atomic distances. For the distributions, REF23 is shown in black and αPRE

in red. In (E), the working dataset is in black and the free dataset in red.

(F) Comparison of p(r) calculated from REF23 (black), αPRE (red), αSASA

(green), αSASA32 (blue) and αRC (yellow).

structures that are much more expanded on average ($\langle R_h^{-1} \rangle^{-1}$ = 31.9 Å).
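The Cα-only approximation to p(r) described above amounts to histogramming every Cα-Cα distance in every structure of the ensemble. A minimal NumPy sketch follows; the function name, the array layout (structures × residues × 3) and the 1 Å bin width are assumptions made for illustration, not details taken from this work.

```python
import numpy as np


def pairwise_distance_distribution(ensemble, r_max=150.0, bin_width=1.0):
    """Normalised histogram of all Calpha-Calpha distances in an ensemble.

    ensemble: array of shape (n_structures, n_residues, 3), coordinates in Å.
    Returns (bin_centres, p), with p integrating to 1 over [0, r_max].
    """
    coords = np.asarray(ensemble, dtype=float)
    n_res = coords.shape[1]
    i, j = np.triu_indices(n_res, k=1)  # count each residue pair once
    d = np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=-1).ravel()
    edges = np.arange(0.0, r_max + bin_width, bin_width)
    p, _ = np.histogram(d, bins=edges, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, p


# Synthetic coordinates, used only to show the call signature
rng = np.random.default_rng(0)
demo = rng.normal(scale=10.0, size=(5, 20, 3))
r, p = pairwise_distance_distribution(demo)
```

Because every structure contributes the same number of pairs, pooling the distances before histogramming is equivalent to averaging the per-structure distributions.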

At small r, the p(r) of all of the ensembles overlay, with two well-defined peaks at ∼ 4 and 7 Å corresponding to nearest-neighbour packing effects (Figure 4.8 F). Thereafter, the p(r) for αRC is considerably flatter and broader than that of the other ensembles, as is expected given its much larger $\langle R_h^{-1} \rangle^{-1}$ (∼ 37 Å). The p(r) of REF23 and αSASA are similar. However, the p(r) of

αPRE provides an even closer match to that of REF23, indicating that the ap-

plication of PRE distance restraints provides additional information not present

in the effective energy function defined by the force-field and solvent model. The

much broader and flatter p(r) of αSASA32 reveals the extent of the compaction

effects induced by the application of PRE distance restraints.

Each of the validation measures examined so far reflects how well a particular

type of observable is reproduced. A complementary test of the effectiveness of

the PRE-ERMD method is to compare the free energy landscapes of the vari-

ous ensembles. It is a question of central interest whether molecular dynamics

simulations with experimentally-derived restraints can be used to calculate free

energies. Here, 2D free energy landscapes were defined by considering the joint

probability of occurrence of pairs of observables. The Rg, SASA and end-to-end

distance, REE were chosen as the observables of interest. Free energy landscapes

were created for REF23, αSASA and αPRE (Figure 4.9). There is a very good

agreement between the two types of free energy landscape of REF23 and those

of αPRE. In contrast, there is a large discrepancy between the free energy land-

scapes of αSASA and those of REF23. These results demonstrate that the use

of a pseudo-energy function based on experimentally-derived restraints is capa-

ble of modifying the force field so that the resulting equilibrium conformational

distribution becomes correct and confirm the earlier conclusion that the general

method is capable of accurately reconstructing a given ensemble.

Figure 4.9: Free energy landscapes of (A,D) REF23, (B,E) αSASA and (C,F) αPRE. The free energy is defined as (A-C) F(Rg, SASA) = − ln p(Rg, SASA) and (D-F) F(Rg, REE) = − ln p(Rg, REE), where REE is the end-to-end distance. The Rg and REE are in Å and the SASA is in Å².
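The definition F = − ln p used for these landscapes can be evaluated directly from a 2D histogram of the two observables. The sketch below is illustrative: the function name, the number of bins and the treatment of unvisited bins (assigned infinite free energy) are assumptions.

```python
import numpy as np


def free_energy_landscape(x, y, bins=50):
    """Return F = -ln p(x, y) on a 2D grid, plus the bin edges.

    Bins that are never visited have p = 0 and are assigned F = +inf.
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    p = counts / counts.sum()
    with np.errstate(divide="ignore"):
        f = -np.log(p)
    return f, x_edges, y_edges


# Synthetic Rg-like and SASA-like values, used only to show the call
rng = np.random.default_rng(1)
rg = rng.normal(25.0, 5.0, 2000)
sasa = rng.normal(60.0, 20.0, 2000)
f, rg_edges, sasa_edges = free_energy_landscape(rg, sasa, bins=20)
```

The free energy is dimensionless here, as in the figure; multiplying by kT would convert it to energy units.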

4.7 Conclusions

Extensive testing of the ability of ERMD with distance restraints calculated

from two arbitrary reference ensembles showed that with an appropriate choice

of simulation parameters, it is possible to reproduce accurately a DS ensemble

despite having information about only a small fraction of the distances. The

concept of cross-validation against more than one type of average was intro-

duced in order to evaluate whether the underlying distributions, and therefore

the ensemble, are correct in cases where the true distributions are not known.

This is an important prerequisite for the use of experimental data, in which case

the underlying distributions are not available for validation. Additionally, some

changes to the previously published method201,202,205 were proposed to alleviate the difficulty of balancing the need to avoid over-fitting against the bias towards overly compact structures that arises from aspects of the implicit solvent models and from restraining r−6-averaged observables on a limited number of replicas.

These changes were justified both empirically, due to the improvement in the re-

production of the reference ensemble, and theoretically. Comparison of a range

of quantities calculated from various reference, unrestrained and restrained en-

sembles confirmed that the general method developed in this chapter provides

an accurate and efficient means of obtaining DS ensembles. In the following two chapters, this general method for carrying out PRE-ERMD is applied to characterise DS ensembles using experimental data for the IDPs αS, βS and β+HC, and the acid-denatured state of the NFP PI3-SH3.

Chapter 5

Comparison of the solution state ensembles of α-synuclein, β-synuclein and β+HC

5.1 Introduction

In Chapter 4 an improved method for ERMD of disordered states using PRE

distance restraints was developed. The changes were justified according to how

well a reference ensemble of αS structures was reproduced in terms of distribu-

tions as well as averages. This resulted in a general protocol for PRE-ERMD

capable of accurately reconstructing an ensemble. In this chapter, the general

PRE-ERMD method is used to characterise two related proteins, αS and βS, and an artificial construct, β+HC145. These three polypeptides are of interest

because, despite high sequence identity, they exhibit contrasting aggregation

behaviour. In order to properly understand their different properties, it is de-

sirable to obtain a complete description of the solution state ensembles of each

protein in terms of the constituent structures and their relative populations.

PRE-ERMD provides a means of determining such ensembles for DS.

Before applying the general method with experimental data, the relationship

between the data obtained from a PRE-NMR experiment and the calculated

inter-atomic distance is explored to ascertain the sources and magnitude of un-

certainty in the distance restraints. The calculated distances are then used with

the improved PRE-ERMD method to obtain ensembles of αS, βS and β+HC

structures. In Sections 5.5 and 5.6, methods for analysing the resulting ensem-

bles of structures are developed and applied. The agreement with experimental

data, both quantitative and qualitative, is also assessed. Finally, the αS en-

semble is compared to the ensembles of βS and β+HC structures to examine

the effect of the hydrophobic core on the structural properties and aggregation

propensities of αS and βS at a molecular level.

Although αS has been characterised previously by PRE-ERMD205, a new

ensemble is generated here to encompass additional experimental data that has

become available since that study was published. The $\langle R_h^{-1} \rangle^{-1}$ of the previously published ensemble is similar to that of αS in D2O, pH 7.0 at 298 K (26.6 Å)234. The PRE-NMR, however, was carried out in phosphate buffer with 100 mM NaCl, pH 7.4 at 283 K. Subsequent measurement of the $\langle R_h^{-1} \rangle^{-1}$ of αS in Mes buffer with 100 mM NaCl, pH 6.5 at 288 K showed that it is much more expanded (32.0 Å) in these conditions235. A revised ensemble of αS structures with the correct 〈Rh〉 of ∼ 32.0 Å was therefore determined using the newly

optimised method. An additional change from the previously published work

was the inclusion of a further 118 distances obtained from a spin-label positioned

at residue N122. Another ensemble of αS structures compatible with distances

derived from PRE-NMR experiments has been obtained using single-replica

simulated annealing240. A measure of the global size of these structures was

not reported, but given the issues discussed in Chapter 4, it is likely that they

were both too compact and unrepresentative.

5.2 Factors influencing the calculated distances

The calculation of distances from the experimental Iox/Ired is based on a modified Solomon-Bloembergen equation for transverse relaxation rates (equation 2.7),

as outlined in Section 2.2. The use of this equation depends on a number of

assumptions, including a constant electron-proton distance, that are not neces-

sarily true in the case of DS. Alternative means of formulating the equations

based on expressions for the spectral density394–399 were investigated as part

of this work but did not prove successful (data not shown). In the following

sections, the effects on the calculated distance of using a single correlation time, τc, and of experimental uncertainty in the measured R2 and Iox/Ired are examined.

Motion of the spin-label during the time-course of the experiment may also

contribute, as discussed in Chapter 4.

5.2.1 Correlation time

The correlation time of the electron-proton vector, τc, contains contributions

from the relaxation of the electron and from motions of the electron-proton

vector: 1/τc = 1/τS + 1/τR, where τS is the longitudinal relaxation time of the

free electron and τR is the effective rotational correlation time of the vector199.

As τS > 10−7 s for nitroxide free radicals and τR ∼ 10−9 − 10−8 s, τc ∼ τR.

With τc in this range the term τcωH in equation 2.7 is ≥ 1, which means that

τc can be estimated from

$$\tau_c = \left( \frac{6 \left( R_2^{sp} / R_1^{sp} \right) - 7}{4 \omega_H^2} \right)^{1/2}, \tag{5.1}$$

where $R_1^{sp}$ and $R_2^{sp}$ are the paramagnetic enhancements of the proton's longitudinal and transverse relaxation rates, respectively.
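As a numerical check on equation 5.1, the short Python sketch below inverts a ratio of relaxation enhancements to recover τc. The 500 MHz proton frequency and the function name are assumptions chosen for illustration; the ratio is constructed so that it corresponds to τc = 4 ns.

```python
import math


def tau_c_from_ratio(r2_over_r1, omega_h):
    """Equation 5.1: tau_c (s) from the ratio R2sp/R1sp and omega_H (rad/s).

    The ratio must be at least 7/6 for the square root to be real.
    """
    return math.sqrt((6.0 * r2_over_r1 - 7.0) / (4.0 * omega_h ** 2))


omega_h = 2.0 * math.pi * 500e6  # assumed 500 MHz proton Larmor frequency
tau_true = 4e-9                  # 4 ns, the value adopted later in this chapter

# Ratio implied by rearranging equation 5.1 for the chosen tau_c
ratio = (4.0 * (omega_h * tau_true) ** 2 + 7.0) / 6.0
tau_est = tau_c_from_ratio(ratio, omega_h)
print(ratio, tau_est)
```

The ratio grows quadratically with ωHτc, which is why modest errors in the two measured rates translate into large errors in τc.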

The values of τc calculated from measured $R_1^{sp}$ and $R_2^{sp}$ typically range from 1 − 15 ns. However, a detailed study by Gillespie and Shortle found that propagation of the error in the measured $R_1^{sp}$ and $R_2^{sp}$ results in average errors of ± 50%199. Additionally, for many vectors $R_1^{sp}$ and $R_2^{sp}$ could not both be determined with sufficient precision to permit a reasonable calculation of τc.

Consequently, they used the average value of τc (4.1 ns) for all further calcula-

tions.

Alternative means of determining τc have since been developed207,209,210.

Some of these, however, are only suitable for the determination of folded NS207,209.

Gaponenko et al.210 were able to estimate τc for each residue based on the fre-

quency dependence of paramagnetic effects, but were still forced to resort to

using an average value in some cases.

Whilst it is preferable to know τc exactly, uncertainty in this parameter has

only a limited influence on the calculated distance. A 10% error in τc results

in only 2% error in the calculated distance, thus error in τc of up to 40% can

be tolerated206. The effect of using an average τc is therefore negligible. Other

authors have followed the example of Gillespie and Shortle, approximating τc

with the global rotational correlation time of the protein in question206,208 or

simply using τc = 4 ns159,202,205. Because the global rotational correlation time

was not measured for any of the proteins studied here, τc = 4 ns is used for all

calculations.
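The weak sensitivity of the distance to τc can be illustrated with a one-line estimate. The sketch assumes that the distance scales as the sixth root of τc, which is the limiting behaviour of Solomon-Bloembergen-type expressions when τcωH is large; the exact dependence in equation 2.7 is not reproduced here, and the function name is hypothetical.

```python
def relative_distance_error(relative_tau_error):
    """Fractional change in r for a fractional change in tau_c, assuming r ∝ tau_c**(1/6)."""
    return (1.0 + relative_tau_error) ** (1.0 / 6.0) - 1.0


for err in (0.10, 0.40):
    print(f"{err:.0%} error in tau_c -> {relative_distance_error(err):.1%} error in r")
```

Under this assumption a 10% error in τc changes the distance by a little under 2%, and even a 40% error changes it by less than 6%.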

5.2.2 Transverse relaxation rate

The intrinsic transverse relaxation rate, R2, which occurs in equation 2.6, is commonly assumed to be equal to R2 of the diamagnetic sample, $R_2^{red}$. The

variation in the experimental $R_2^{red}$ is generally very low. For instance, for βS, the average SD for duplicate measurements is just 0.014%. Residue-specific values of $R_2^{red}$ were used in the distance calculations where available; otherwise, the average over all residues was substituted. This is unlikely to introduce a great deal of error, as the SD of $R_2^{red}$ over all residues is only 2.13%. Additionally, as for τc, the calculated distance depends only weakly on the fitted $R_2^{sp}$, varying as its −1/6 power (equation 2.7), thus any error that may be introduced during the fitting of equation 2.6 to obtain $R_2^{sp}$ has a negligible effect on the calculated distance.

5.2.3 Intensity ratio

The remaining experimental observable in equation 2.6 is Iox/Ired. Uncertainty

in Iox/Ired resulting from experimental variation can have a significant effect

on the calculated distance, particularly for large Iox/Ired, due to the non-linear

relationship between r and Iox/Ired. Various means of accounting for error in

the measurement of Iox/Ired have been developed. Error-dependent weighting

functions have been used in studies where the back-calculated Γ2 ($R_2^{-1}$) rather

than the r−6 distances are restrained207,209. This method can only be used if

there is at least duplicate data for every observable, which is not always the

case with the experimental datasets used here. The simplest and most common

way to account for uncertainty in Iox/Ired is to include a degree of tolerance towards variation in the ensemble-averaged back-calculated distances, $d_{ij}^{calc}$, at each point in time during the PRE-ERMD. This tolerance takes the form of a square well defined by lower (L) and upper (U) bounds, so that $d_{ij}^{calc}$ within −L and +U of the restraint are not penalised. A harmonic potential is applied outside the square well to ensure continuity.
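The shape of such a flat-bottomed restraint can be sketched as follows. This is a schematic illustration only: the function name and the bare force constant `alpha` are assumptions, and the actual pseudo-energy term used in the simulations (including how it is summed over restraints and replicas) is defined elsewhere in this thesis. The defaults L = 1 and U = 8 are the values quoted for the simulations, taken here to be in Å.

```python
def square_well_penalty(d_calc, d_restraint, lower=1.0, upper=8.0, alpha=1.0):
    """Flat-bottomed restraint: zero inside the well, harmonic outside.

    d_calc      ensemble-averaged back-calculated distance (Å)
    d_restraint target distance derived from Iox/Ired (Å)
    lower/upper tolerances L and U (Å)
    alpha       force constant (arbitrary units, assumed)
    """
    lo = d_restraint - lower
    hi = d_restraint + upper
    if d_calc < lo:
        return alpha * (d_calc - lo) ** 2
    if d_calc > hi:
        return alpha * (d_calc - hi) ** 2
    return 0.0


print(square_well_penalty(30.0, 20.0))  # 2 Å beyond the upper edge of the well
```

Because the harmonic term is measured from the edge of the well rather than from the restraint itself, the penalty and its first derivative are both zero at the boundary.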

It is desirable to be able to implement the distance restraints as precisely

as possible, so as to maximise their information content. To quantify the con-

tribution that error in Iox/Ired makes to the calculated distance, variation of

up to 15% was introduced into a set of model Iox/Ired and the differences in

the calculated distances were examined (Figure 5.1). The effect of errors in

Iox/Ired on r depends on the magnitude of Iox/Ired. The experimental Iox/Ired

were therefore divided into three groups based on their magnitude. L and U

were assigned differently for each group so as to be appropriate for the expected

uncertainty in the calculated distance.

Iox/Ired < 0.15 correspond to the shortest inter-atomic distances. They

are therefore an important source of structural information if the two residues

involved are far apart in sequence. However the magnitude of any variation

in the experimental data between replicates is generally large relative to the

[Figure 5.1, panels A-D. Axes: Iox/Ired (horizontal) and distance in Å (vertical).]

Figure 5.1: The relationship between the calculated distance and the corre-

sponding Iox/Ired for (A) 0 < Iox/Ired < 1.0, (B) 0.45 < Iox/Ired < 0.55, (C)

0.8 < Iox/Ired < 1.0 and (D) 0.8 < Iox/Ired < 0.9. The distances calculated

from the correct Iox/Ired are shown in black, and the distances resulting from

errors of ±1% (solid), ±5% (dashed), ±10% (dot-dashed) and ±15% (dotted)

of Iox/Ired are shown in red.

measured value208, resulting in a large uncertainty in the calculated distance.

To maximise the amount of information gleaned from these Iox/Ired, whilst

avoiding introducing errors, a “negative” restraint was applied by requiring the

inter-residue distance to be less than an upper bound of $d_{ij}^{0.15} + U$, where $d_{ij}^{0.15}$ is the distance calculated from Iox/Ired = 0.15.

At the other extreme, the greatest source of inaccuracy in the distances

calculated from Iox/Ired > 0.85 is the nature of the equations relating r to

Iox/Ired rather than the experimental measurement. Even a very small error in

the measured Iox/Ired can result in a large difference in the calculated distance.

Residue pairs with Iox/Ired > 0.85 were therefore assigned only a lower bound

of $d_{ij}^{0.85} - L$, where $d_{ij}^{0.85}$ is the distance calculated from Iox/Ired = 0.85.

For the remaining 0.15 < Iox/Ired < 0.85, errors of up to 10% in Iox/Ired result in propagated errors of no more than −1.9 or +3.8 Å in the calculated distance

(Figure 5.1). A distance restraint was therefore only applied if all of the repli-

cate Iox/Ired measured experimentally were within 10% of the average Iox/Ired

for that residue. The fraction of the experimental data that was discarded due

to this restriction is given in Table 5.1 along with the total number of distance

restraints for each protein. The Iox/Ired from which the distances were calcu-

lated for each of the three proteins are also shown graphically in Figures 5.2, 5.3

and 5.4. The PRE-ERMD was carried out using a working dataset comprising

80% of the data, as every 5th distance was relegated to a ‘free’ dataset to be

used for independent cross-validation.
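The data-handling rules described in this section (three classes of Iox/Ired, the 10% replicate filter and the 80/20 split) can be summarised in a short Python sketch. The function names are hypothetical, and the text does not state which member of each group of five is set aside; starting from the first is an assumption, chosen because it gives set sizes consistent with those in Table 5.1.

```python
def restraint_type(ratio):
    """Classify an Iox/Ired value into the three groups described in the text."""
    if ratio < 0.15:
        return "upper bound only"   # distance below d(0.15) + U
    if ratio > 0.85:
        return "lower bound only"   # distance above d(0.85) - L
    return "square well"            # distance within -L/+U of d(ratio)


def replicates_consistent(replicates, tolerance=0.10):
    """True if every replicate lies within `tolerance` of the replicate mean."""
    mean = sum(replicates) / len(replicates)
    return all(abs(x - mean) <= tolerance * mean for x in replicates)


def split_working_free(restraints, stride=5):
    """Relegate every `stride`-th restraint to the free set; keep the rest as working."""
    free = restraints[::stride]
    working = [r for k, r in enumerate(restraints) if k % stride != 0]
    return working, free


working, free = split_working_free(list(range(595)))  # 595 = number of αS distances
print(len(working), len(free))
```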

Table 5.1: Summary of the experimental restraints. NPRE is the total number of distances derived from the PRE experiment and NwPRE and NfPRE are the numbers of distances in the working and free datasets, comprising 80 and 20% of the total data, respectively. The percentage of the experimental data that was discarded due to inaccuracies > 10% is also shown.

Protein   NPRE   NwPRE   NfPRE   % discarded
αS        595    476     119     16.90
βS        635    508     127     16.78
β+HC      578    462     116     2.86

[Figure 5.2: six panels, one per spin-label position (Q24, S42, Q62, S87, N103 and N122). Axes: residue number (horizontal) and Iox/Ired (vertical).]

Figure 5.2: The distribution of Iox/Ired along the αS sequence for each spin-

label position as indicated. The experimental data is shown as black bars and

the Iox/Ired expected for a purely random coil is plotted as a thick red line. The

experimental Iox/Ired are those processed for use in the simulations as discussed

in the text, thus any Iox/Ired < 0.15 or > 0.85 have been set to 0.15 or 0.85,

respectively. If no bar is present, then either Iox/Ired was not measured for this

residue or it was discarded due to error > 10%.

5.3 Choice of optimal T for characterisation by PRE-ERMD

PRE-ERMD simulations of αS, βS and β+HC were run using the general

method developed in Chapter 4, with one minor modification. When using

[Figure 5.3: seven panels, one per spin-label position (A30, S42, S64, F89, A102, S118 and A134). Axes: residue number (horizontal) and Iox/Ired (vertical).]

Figure 5.3: The distribution of Iox/Ired along the βS sequence for each spin-

label position as indicated. The experimental data is shown as black bars and

the Iox/Ired expected for a purely random coil is plotted as a thick red line. The

experimental Iox/Ired are those processed for use in the simulations as discussed

in the text, thus any Iox/Ired < 0.15 or > 0.85 have been set to 0.15 or 0.85,

respectively. If no bar is present, then either Iox/Ired was not measured for this

residue or it was discarded due to error > 10%.

synthetic data, the 〈Rg〉 of the calculated ensemble was compared to that of the

reference ensemble. The Rh is the preferred experimental measure of the global

size, as it is measured by PFG-NMR under similar conditions to the PRE-NMR.

The experimental observable is a harmonic average, and thus fulfills the criteria

outlined in Chapter 4 for the improved cross-validation procedure: it is not an

r−6 average, and as a near-linear average, it reports on the central portion of

the underlying distribution.

The Rh of each of the calculated ensembles was obtained by using phe-

nomenological relationships derived separately for each protein:

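$$\alpha\mathrm{S}: \quad R_h^{-1} = 0.0148 + 0.488\, R_g^{-1} \tag{5.2}$$

$$\beta\mathrm{S}: \quad R_h^{-1} = 0.0163 + 0.454\, R_g^{-1} \tag{5.3}$$

$$\beta{+}\mathrm{HC}: \quad R_h^{-1} = 0.0151 + 0.494\, R_g^{-1} \tag{5.4}$$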

The Rh and the geometric Rg of each of a set of structures encompassing a

wide range of sizes were calculated using the program hydropro356, and linear

regression was carried out to parameterise the relationship between $R_g^{-1}$ and

[Figure 5.4: five panels, one per spin-label position (A30, S42, S64, A113 and A145). Axes: residue number (horizontal) and Iox/Ired (vertical).]

Figure 5.4: The distribution of Iox/Ired along the β+HC sequence for each spin-

label position as indicated. The experimental data is shown as black bars and

the Iox/Ired expected for a purely random coil is plotted as a thick red line. The

experimental Iox/Ired are those processed for use in the simulations as discussed

in the text, thus any Iox/Ired < 0.15 or > 0.85 have been set to 0.15 or 0.85,

respectively. If no bar is present, then either Iox/Ired was not measured for this

residue or it was discarded due to error > 10%.

$R_h^{-1}$. These equations were used to convert the Rg of each structure in the

ensembles generated by PRE-ERMD into an Rh as described in Section 2.3.1.
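The conversion from Rg to Rh and the subsequent harmonic averaging can be written compactly. In the sketch below the coefficients are those of equations 5.2-5.4 (with Rg and Rh in Å); the dictionary keys, function names and the example Rg values are illustrative assumptions.

```python
# (intercept, slope) of 1/Rh = a + b / Rg, from equations 5.2-5.4
COEFFS = {
    "aS": (0.0148, 0.488),
    "bS": (0.0163, 0.454),
    "b+HC": (0.0151, 0.494),
}


def rh_from_rg(rg, protein):
    """Convert a single Rg (Å) to Rh (Å) with the protein-specific linear relation."""
    a, b = COEFFS[protein]
    return 1.0 / (a + b / rg)


def harmonic_mean_rh(rg_values, protein):
    """Ensemble value <Rh^-1>^-1 (Å) from the Rg of each structure."""
    inverse_rh = [1.0 / rh_from_rg(rg, protein) for rg in rg_values]
    return len(inverse_rh) / sum(inverse_rh)


print(rh_from_rg(30.0, "aS"))
print(harmonic_mean_rh([20.0, 30.0, 45.0], "aS"))  # made-up Rg values
```

Since the relation is linear in the inverse radii, the harmonic mean of Rh can equivalently be obtained by applying the same equation to the mean of 1/Rg.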

In general, the $\langle R_h^{-1} \rangle^{-1}$ decreases with decreasing T, thus QRh also decreases until the calculated $\langle R_h^{-1} \rangle^{-1}$ matches the experimental $\langle R_h^{-1} \rangle^{-1}$ and then increases again (Tables 5.2, 5.3 and 5.4). It is therefore straightforward to locate the T that optimises QRh.

Determining when the QPRE values are minimised is not as simple. In Chap-

ter 4, every PRE distance was known exactly, and so could contribute to the

calculation of statistics. In comparison, because the experimental PRE distances

can only be calculated accurately for 0.15 < Iox/Ired < 0.85, it is only appro-

priate for these medium-range distances to contribute to QwPRE and QfPRE.

Unlike the synthetic QPRE values, neither the QwPRE nor the QfPRE calculated for 0.15 < Iox/Ired < 0.85 changes markedly for the range of T explored (Tables 5.2, 5.3 and 5.4). However, in Chapter 4 it was shown that QRg rather

than QwPRE or QfPRE provides the best measure of when the distance distri-

butions are most accurately reconstructed. Consequently, QRh was used here

as the primary determinant of the optimal T . Where two T give similar results,

an intermediate value was chosen. The optimal T for each protein is slightly

different (Tables 5.2, 5.3 and 5.4, bold type); the reasons for this are discussed

further in the next section.

Table 5.2: The Q values quantify how well the experimental $\langle R_h^{-1} \rangle^{-1}$ (32.0 Å) and the PRE distances for αS are reproduced by varying T with Nrep = 24, L = 1 and U = 8. QwPRE refers to the working dataset (80% of the PRE distances) and QfPRE to the free dataset (remaining 20%). The results for the most representative ensemble collected at the optimal T are in bold type.

T (K)   $\langle R_h^{-1} \rangle^{-1}$ (Å)   QRh     QwPRE   QfPRE
475     31.3                                  0.018   0.18    0.21
490     32.1                                  0.006   0.19    0.20
500     32.6                                  0.021   0.19    0.22
525     33.2                                  0.042   0.19    0.20
550     33.7                                  0.056   0.19    0.20
575     34.2                                  0.069   0.20    0.19
600     34.3                                  0.085   0.20    0.20
625     34.5                                  0.081   0.20    0.20
650     34.6                                  0.085   0.21    0.20
675     34.8                                  0.092   0.21    0.20
700     34.9                                  0.094   0.21    0.21

5.4 Global dimensions

Longer simulations were carried out at the optimal T for αS, βS and β+HC.

For βS and β+HC, where simulations at the optimal T had already been carried

out during the optimisation phase, the good agreement between the final and

preliminary statistics further confirms that sufficient sampling is carried out

during the initial phase to obtain reliable statistics.

The $\langle R_h^{-1} \rangle^{-1}$ of each of the most representative ensembles is in good agreement with the experimental value. The range of Rg sampled by each protein is

broad, reflecting the heterogeneous range of structures comprising each ensem-

ble (Figure 5.5). Comparison of the Rg distribution of the new αS ensemble

with that of the original ensemble205 shows that, as is expected given the larger

Table 5.3: The Q values quantify how well the experimental $\langle R_h^{-1} \rangle^{-1}$ (32.4 Å) and the PRE distances for βS are reproduced by varying T with Nrep = 24, L = 1 and U = 8. QwPRE refers to the working dataset (80% of the PRE distances) and QfPRE to the free dataset (remaining 20%). The results for the most representative ensemble collected at the optimal T are in bold type.

T (K)   $\langle R_h^{-1} \rangle^{-1}$ (Å)   QRh     QwPRE   QfPRE
425     27.9                                  0.137   0.17    0.16
450     29.7                                  0.084   0.18    0.17
475     31.2                                  0.037   0.19    0.18
500     31.7                                  0.021   0.19    0.20
525     32.3                                  0.003   0.20    0.20
525     32.2                                  0.005   0.20    0.19
550     32.7                                  0.008   0.20    0.20
575     32.8                                  0.011   0.20    0.20
600     32.9                                  0.013   0.21    0.20
625     33.0                                  0.019   0.21    0.21
650     33.3                                  0.027   0.21    0.20
675     33.2                                  0.024   0.21    0.21
700     33.3                                  0.029   0.21    0.21

$\langle R_h^{-1} \rangle^{-1}$, a wider range of Rg is sampled, and the new ensemble contains a greater number of expanded structures.

Further insight into the meaning of the $\langle R_h^{-1} \rangle^{-1}$ values can be gained by comparing them to the values expected if the protein is in a compact globular state or is a purely random coil. The predicted $\langle R_h^{-1} \rangle^{-1}$ values for these two reference states were calculated according to relationships determined by Wilkins et al.161. The experimental $\langle R_h^{-1} \rangle^{-1}$ of each protein is intermediate between the two calculated values (Table 5.5). A quantitative measure of the degree of compaction of a given polypeptide chain is given by the compaction factor, Cf161, which scales the $\langle R_h^{-1} \rangle^{-1}$ so as to account for differing numbers of residues. Cf ∼ 1

indicates that the protein is of a similar size to that expected if it were folded

into a compact, globular structure, whereas a Cf near zero indicates a highly

expanded chain. According to this measure, β+HC is the most compact of the

three proteins, and βS is the most expanded (Table 5.5). The optimal T also

correlates negatively with Cf , with a higher T required when Cf is low.
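The compaction factor can be illustrated with a short calculation. Equation 2.25 is not reproduced in this chapter; the form used below, Cf = (RU − Rh)/(RU − RF), with RU and RF the values predicted for the unfolded and folded states, is the definition of Wilkins et al. as understood here, and it reproduces the NaCl-containing entries of Table 5.5. The function name is hypothetical.

```python
def compaction_factor(rh, rh_unfolded, rh_folded):
    """Cf = (R_U - R_h) / (R_U - R_F): about 1 for a folded-like size, near 0 when fully expanded."""
    return (rh_unfolded - rh) / (rh_unfolded - rh_folded)


# (experimental Rh, predicted unfolded Rh, predicted folded Rh) in Å, from Table 5.5
PROTEINS = {
    "aS": (31.9, 37.0, 19.9),
    "bS": (32.4, 36.0, 19.7),
    "b+HC": (29.7, 37.7, 20.1),
}

for name, (rh, r_u, r_f) in PROTEINS.items():
    print(name, round(compaction_factor(rh, r_u, r_f), 3))
```

Normalising by the predicted folded and unfolded sizes is what allows chains of different length to be compared on a common scale.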

To provide a reference state from which to interpret the Rg distributions, a

Table 5.4: The Q values quantify how well the experimental $\langle R_h^{-1} \rangle^{-1}$ (29.7 Å) and the PRE distances for β+HC are reproduced by varying T with Nrep = 24, L = 1 and U = 8. QwPRE refers to the working dataset (80% of the PRE distances) and QfPRE to the free dataset (remaining 20%). The results for the most representative ensemble collected at the optimal T are in bold type.

T (K)   $\langle R_h^{-1} \rangle^{-1}$ (Å)   QRh     QwPRE   QfPRE
450     26.9                                  0.093   0.19    0.23
475     30.0                                  0.011   0.20    0.27
475     30.3                                  0.020   0.20    0.28
500     31.9                                  0.076   0.21    0.30
525     32.7                                  0.101   0.20    0.31
550     33.5                                  0.123   0.21    0.31
575     33.9                                  0.141   0.21    0.34
600     34.1                                  0.149   0.21    0.35
625     34.1                                  0.148   0.22    0.34
650     34.4                                  0.157   0.22    0.35
675     34.4                                  0.158   0.22    0.36
700     34.7                                  0.167   0.22    0.37

random coil model analogous to that described in Chapter 3 for αS was generated for each protein. The $\langle R_h^{-1} \rangle^{-1}$, $\langle R_g^2 \rangle^{1/2}$ and rms end-to-end distances

of this model agree with those predicted according to theoretical equations de-

rived for random flight chains with excluded volume161,360 (Table 5.5). The

random coil Rg distributions are broader than those of the PRE-ERMD ensem-

bles for each protein, and are shifted towards larger values of Rg (Figure 5.5).

The ensembles produced by PRE-ERMD are thus more restricted than a purely

random coil as well as more compact. Additionally, as with the Cf , the Rg

distribution of β+HC is the most different to the corresponding random coil

model.

5.5 Characterising residual structure

The general method developed in Chapter 4 guarantees that the average global

dimensions of the ensembles of structures match those measured experimentally.

Additionally, the ensemble-averaged tertiary structure, defined in terms of the

often long-range PRE distances, is by definition compatible with the experimen-

[Figure 5.5, panels A-C. Axes: Rg in Å (horizontal) and p(Rg) (vertical).]

Figure 5.5: Rg probability distributions for (A) αS, (B) βS and (C) β+HC.

The random coil ensembles (see text for definition) are shown in black, the

ensembles calculated using PRE-ERMD are in red and (A only) the ensemble

previously calculated for αS205 is in green. The Rg distributions are plotted

rather than the Rh distributions because the former are faster to calculate, but

the Rh distributions are similar.

Table 5.5: Predicted(1) and experimental(2) Rh (in Å) and compaction factors, Cf(3), for αS, βS and β+HC in various states.

Protein         U      F      D2O     NaCl    NaCl + SDS
αS       Rh     37.0   19.9   26.6    31.9    24.6
         Cf                   0.608   0.298   0.725
βS       Rh     36.0   19.7   -       32.4    32.2
         Cf                   -       0.221   0.233
β+HC     Rh     37.7   20.1   -       -       29.7
         Cf                   -       -       0.455

1. U and F refer to the Rh predicted for an unfolded or folded polypeptide according to equations 2.23 and 2.24161, respectively.

2. measured by PFG-NMR on 0−1 mM αS in D2O, pH 7.0, 298 K234, 100 (αS)235 or 200 µM protein (αS and βS) in 99.9% D2O, 20 mM Mes buffer with 100 mM NaCl, pH 6.5, 288 K158, 70 µM protein in 10 mM phosphate buffer with 100 mM NaCl and 0.5 mM SDS, pH 7.7 at 298 K145.

3. calculated according to equation 2.25161.

tal data. The advantage of biomolecular simulations is that they complement

the averaged information accessible experimentally with atomic-level structural

detail for each conformation in the ensemble. When the ensemble contains only

a limited number of well-defined structures, such as for NFPs in solution and

partially folded states such as the folding TS, it is relatively simple to define and

characterise representative structures. For DS, however, the ensembles contain

a broad and heterogeneous range of structures, and it is not always apparent


how best to analyse these. In theory, cluster analysis allows the ensemble to

be grouped into sets of like structures, thus partitioning the accessible confor-

mational space into a manageable number of sub-ensembles, each of which can

be described by a single representative structure. The success of clustering,

however, is dependent on the choice of a suitable reaction coordinate to define

the distance matrix that discretises the conformational phase space. A number

of different distance measures were tested as part of this work, but none proved

successful (data not shown).

The most useful means of summarising the structural propensities of large

ensembles of structures is to display the probabilities of occurrence and co-

occurrence of various properties as 2D maps. Ramachandran maps and free

energy maps, which were introduced in Chapters 3 and 4, are also used here.

Additionally, a new type of 2D map is developed to investigate the inter-residue

distances.

5.5.1 Distance comparison maps

Definition

A convenient means of examining the structural propensities of an ensemble

is to represent the distances between residues as a 2D plot. In the past, both

the raw ensemble-averaged distances240 and more complicated functions of the

inter-residue distances such as the residual contact probability (RCP)201,202,205

have been used. The former method, like comparing $\langle R_h^{-1} \rangle^{-1}$ values, suffers
from the fact that the magnitude of the inter-residue distances for DS in general
scales with the sequence separation of the residues involved, making it

difficult to compare the distances between different pairs of residues within the

same molecule or between any pair of residues in two or more different proteins.

The RCP, defined as $-\ln(p_{ij}^{\mathrm{calc}} / p_{ij}^{\mathrm{rc}})$, accounts for the sequence separation by
comparing the probability of residues i and j being in contact in the calculated
ensemble, $p_{ij}^{\mathrm{calc}}$, to the probability of them coming into contact if the molecule
were a purely random coil, $p_{ij}^{\mathrm{rc}}$. The contact probability is the fraction of
structures in the ensemble in which residues i and j are separated by less than 8.5 Å.
The RCP is therefore influenced most by the shortest distances, and is insensitive
to the remainder of the distance distribution. Additionally, residues far apart in
sequence become increasingly unlikely to come closer than 8.5 Å, even if they are,
on average, closer together than is expected for a random coil.
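The RCP definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code used to produce the maps in this work: the function names, the NumPy array layout (structures × residues × residues) and the treatment of pairs with zero contact probability are all hypothetical, and the smoothing applied to the RCP maps is omitted.

```python
import numpy as np

def contact_probability(dists, cutoff=8.5):
    """Fraction of structures in which each residue pair is closer than `cutoff` (Å).

    dists: array of shape (n_structures, n_res, n_res) of inter-residue distances.
    """
    return (np.asarray(dists) < cutoff).mean(axis=0)

def rcp(d_calc, d_rc, cutoff=8.5):
    """RCP_ij = -ln(p_calc_ij / p_rc_ij), following the definition in the text.

    Pairs for which either contact probability is zero are returned as NaN
    (an assumption; the ratio is undefined there).
    """
    p_calc = contact_probability(d_calc, cutoff)
    p_rc = contact_probability(d_rc, cutoff)
    with np.errstate(divide="ignore", invalid="ignore"):
        out = -np.log(p_calc / p_rc)
    out[(p_calc == 0) | (p_rc == 0)] = np.nan
    return out
```

Because only distances below the cut-off are counted, two ensembles with very different distance distributions above 8.5 Å give identical RCP values in this sketch.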

To overcome the aforementioned difficulties, ‘distance comparison’ (DC)

maps were created. The rms distance between two residues in the calculated


ensemble, $\langle (d_{ij}^{\mathrm{calc}})^2 \rangle^{1/2}$, is compared to the rms distance for the same sequence
separation in a random coil, $\langle (d_{ij}^{\mathrm{rc}})^2 \rangle^{1/2}$, according to equation 2.19.
An expression that predicts the rms end-to-end distance of a random flight
chain with excluded volume and dihedral angles taken from a PDB coil database
(equation 2.21)360 was used to estimate $\langle (d_{ij}^{\mathrm{rc}})^2 \rangle^{1/2}$. The sequence separation of
the pair of residues under consideration was equated with the length of the random
flight chain. Calculation of $\langle (d_{ij}^{\mathrm{rc}})^2 \rangle^{1/2}$ from the random coil gives essentially

identical distances (data not shown), meaning that if a random coil model is not

available, the theoretical predictions will suffice. The DC value for each pair of

residues is plotted directly, without the smoothing that is applied to the RCP,

so that local detail is not lost.
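As an illustration of the definition, a DC map can be computed as follows. This is a sketch only: it assumes the DC is the plain ratio of the two rms distances, that one representative atom per residue is supplied as a NumPy array of shape (structures × residues × 3), and that the random coil reference has already been tabulated by sequence separation (equation 2.21 is not reproduced in this section). The function and argument names are hypothetical.

```python
import numpy as np

def rms_distance_map(coords):
    """Root-mean-square inter-residue distances over an ensemble.

    coords: (n_structures, n_res, 3) array, one representative atom per residue.
    Structures are accumulated one at a time to keep memory use small.
    """
    coords = np.asarray(coords, dtype=float)
    n_res = coords.shape[1]
    sum_sq = np.zeros((n_res, n_res))
    for xyz in coords:
        diff = xyz[:, None, :] - xyz[None, :, :]
        sum_sq += (diff ** 2).sum(axis=-1)
    return np.sqrt(sum_sq / len(coords))

def dc_map(coords, rms_rc_by_separation):
    """DC_ij = rms distance in the ensemble / rms distance for a random coil.

    rms_rc_by_separation[k] is the reference rms distance for sequence
    separation k (entry 0 is unused); the diagonal is returned as NaN.
    """
    rms_calc = rms_distance_map(coords)
    n_res = rms_calc.shape[0]
    idx = np.arange(n_res)
    sep = np.abs(idx[:, None] - idx[None, :])
    rms_rc = np.asarray(rms_rc_by_separation, dtype=float)[sep]
    dc = np.full((n_res, n_res), np.nan)
    off_diagonal = sep > 0
    dc[off_diagonal] = rms_calc[off_diagonal] / rms_rc[off_diagonal]
    return dc
```

Values below 1.0 then mark pairs that are, on average, closer together than the reference, and values above 1.0 mark pairs that are further apart; no smoothing is applied.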

Interpretation

It has been shown that it is possible to obtain random coil-like scaling of

global parameters such as the Rh whilst retaining varying degrees of local struc-

ture257–259,400. Additionally, at the local level, random flight and secondary

structural characteristics may be indistinguishable. For instance, the rms inter-

residue distances for an α-helix are almost identical to those expected for a

random flight chain for sequence separations in the range 1− 8 residues258,259,

meaning that DC values near 1.0 do not necessarily correspond to random coil

structure. Of course, the effects of ensemble-averaging discussed above still

apply.

Just as for the interpretation of experimental measurements made on DS,

there may not be any structures present in the ensemble that exhibit all of

the characteristics displayed in the DC map simultaneously. In fact, it has been

shown for unfolded states of natively folded proteins that the ensemble-averaged

CαCα distance matrix is closer to that of the native structure than the distance

matrix of any individual member of the unfolded state ensemble401. Thus it

is not unexpected that, despite there being no native fold to refer to, the DC

maps indicate the presence of some residual structure. The best interpretation

of these averaged structural properties of the molecules comprising an ensemble

is in terms of structural propensities that influence the range of conformations

sampled by a given residue, rather than as continuous segments of simultane-

ously defined structure.

RCP and DC maps provide complementary information

As an example of how DC maps and RCP maps reflect different aspects of the nature
of the structures in an ensemble, the DC and RCP maps


of the original ensemble of αS structures generated using PRE-ERMD205 are

compared (Figure 5.6 A and B). The RCP map portrays an increased propensity

towards contact formation between the C-terminus and the central NAC region.

In comparison, the DC map reveals that the shortest rms inter-residue distances

relative to those expected for a random coil occur between residues 1− 60 and

110− 140, although the distances between the C-terminus and the NAC region

are also shorter than expected.

The discrepancy between the nature of the residual structure suggested by

the two types of map can be reconciled by considering the fact that the RCP

reports only on distances less than 8.5 Å, whereas the DC compares the position

of the centre of the distance distributions. For instance, the RCP between the C-

terminus and the most proximal residues is increased relative to the interactions

of the C-terminus with the remainder of the sequence, but the DC is only slightly

lower. Because the pairs of residues involved in these interactions are close

together in sequence, a slight shift of the distance distribution towards shorter

distances relative to the random coil distance distribution causes a significant

increase in the probability of occurrence of distances less than 8.5 Å. This results

in a significant increase in the RCP, which is logarithmically dependent on the

relative probabilities, whereas the DC, which is a simple ratio, is not affected

so markedly. The opposite situation occurs for distances between the N- and

C-termini, where the DC values suggest shorter than average distances but the

RCP values do not correspond to an increased contact probability. Even if the

entire distribution were shifted significantly towards shorter distances relative

to those expected for a random coil, thus causing a change in the DC value, the

large sequence separation would prevent the number of distances less than 8.5 Å

becoming large enough to affect the RCP. Thus the apparent disparity between

the nature of the structural propensities of αS suggested by each type of map

can be easily explained by considering how each measure reacts to changes in

the relationship between the distance distributions of the ensemble of interest

and those of a random coil.

5.6 Residual structure of αS, βS and β+HC

DC maps were created for the previously published αS ensemble and for the
αS, βS and β+HC ensembles generated here. In all cases, there are

regions in which the inter-residue distances are shorter than expected for a ran-

dom coil, portrayed by DC values significantly less than 1.0 (Figure 5.6 C-E).

This may occur simply because all three proteins are slightly more compact than


Figure 5.6: (A) RCP map and (B) DC map for the previously published αS

ensemble205, (C-E) DC maps for the (C) re-calculated αS (D) βS and (E)

β+HC ensembles determined by PRE-ERMD. The RCP and DC are defined

in Section 5.5.1. The same scale is used for all DC maps to aid comparisons.

a random coil on average (Table 5.5). However, there are also some inter-residue

distances in all three proteins that are, on average, longer than is expected for

a random coil, suggesting the presence of non-random residual structure.

In Section 5.6.1, the residual structure suggested by the DC map is com-

pared for the αS ensemble produced here and the previously published ensemble.

The analysis is then extended to βS and β+HC. The structural propensities of

the C- and N-termini of all three polypeptides inferred from the DC maps are

compared with qualitative reference to experimental data and predicted quanti-

ties. The dihedral angle preferences, including differences in the conformational

propensities of different sections of the protein, are then investigated by generating
separate Ramachandran plots for each section. A quantitative assessment
of the agreement of back-calculated observables with the experimental data is
made for αS and βS to investigate the correspondence between the local structural
properties of the calculated ensembles and those observed experimentally.

Finally, free energy maps are used to examine aspects of the global structural

properties of the three proteins.


5.6.1 Comparison of the re-calculated and previously published αS ensembles

The ensemble of αS structures produced here is expected to differ slightly from

the previously published ensemble due to the inclusion of additional PRE distance
restraints and the larger $\langle R_h^{-1} \rangle^{-1}$. In accordance with the more expanded
structures present in the new ensemble, the DC values are greater overall

(Figure 5.6 C). The smallest DC values are for inter-residue distances between

residues around 120 and the first 40 residues of the protein, thus the tertiary

structure exhibited by this ensemble is broadly similar to that of the original

ensemble.

The location of the lowest DC values is in keeping with the results of Bernado

et al.266, who selected structures containing particular sets of contacts from an

ensemble of structures created using a dihedral angle database with excluded

volume and found that the experimental RDCs for αS are best reproduced when

only structures containing contacts between residues 6 − 10 and 136 − 140 are

considered. In contrast, the use of RCP maps to analyse the original ensemble

suggested an increased RCP between the C-terminus and the NAC region of

the original ensemble, which was credited with protecting the NAC region from

aggregation205. The presence of interactions of this type is supported by the

experimental PRE-NMR data of Bertoncini et al.240 and Sung and Eliezer159.

As outlined in Section 5.5.1, this apparent discrepancy can be reconciled by

considering the different sensitivities of the two analysis methods. DC values

report on the location of the centre of the distribution, whereas the RCP and

the experimental PRE distances are most sensitive to the shortest distances.

The definition of contact formation used by Bernado et al. is less stringent than

the definition of RCP: a contact between two regions of the polypeptide is said

to occur if the Cβ atoms of two residues are separated by less than 15 Å [266].

Therefore, measures that are most sensitive to the shortest distances highlight

preferential contact formation between the C-terminus and the NAC region of

αS, whereas when larger distances are considered, the biggest difference between

the calculated ensemble and the random coil ensemble pertains to interactions

between the C- and N-termini. Whilst it is interesting to know how often two

residues come close enough together to interact specifically, this information

is already contained within the PRE-NMR data, whereas the DC maps pro-

vide additional information about the remainder of the distribution that is not

available experimentally.


5.6.2 Long-range structure of βS and β+HC

In both βS and β+HC the distances between residues separated by more than

40−50 residues are all significantly shorter than expected for a random coil, especially for

distances between the first 40 residues and residues 80 − 145 in β+HC (Fig-

ure 5.6 D and E). This is in keeping with the experimental data shown in

Figures 5.2, 5.3 and 5.4, but contrasts with the experimental results of Sung

and Eliezer159 and Bertoncini et al.158, who find that βS exhibits fewer long-

range interactions than αS. It is not clear why this discrepancy exists, as the

experiments were all conducted in similar conditions. The regions exhibiting

the shortest inter-residue distances are shifted slightly towards the N-terminus

compared to αS, so that in βS, the shortest distances are between residues 1−40

of the N-terminus and residues 80− 120 of the C-terminus, and in β+HC, they

are between residues 1 − 40 of the N-terminus and residues 80 − 145 of the

C-terminus. This may provide additional protection to the central region, in

keeping with the lower aggregation propensity of both of these polypeptides.

For β+HC, the scaled long-range distances are shorter than in either αS or βS,

reflecting its larger compaction factor (Table 5.5).
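The compaction factors quoted from Table 5.5 can be checked against the tabulated Rh values. Equation 2.25 is not reproduced in this section; the form used below is an assumption, adopted because it reproduces the tabulated Cf values from the tabulated Rh to three decimal places. The function name is hypothetical.

```python
def compaction_factor(rh, rh_unfolded, rh_folded):
    """Assumed form of equation 2.25.

    Returns 0 when rh equals the unfolded prediction and 1 when it
    equals the folded prediction.
    """
    return (rh_unfolded - rh) / (rh_unfolded - rh_folded)

# Rh values (Å) copied from Table 5.5 as (U, F, measured).
table_5_5 = {
    "αS, NaCl": (37.0, 19.9, 31.9),
    "βS, NaCl": (36.0, 19.7, 32.4),
    "β+HC, NaCl + SDS": (37.7, 20.1, 29.7),
}

cf = {
    label: round(compaction_factor(measured, unfolded, folded), 3)
    for label, (unfolded, folded, measured) in table_5_5.items()
}
```

Note that the only Rh available for β+HC was measured in the presence of SDS, so its Cf is not directly comparable with the NaCl-only values for αS and βS.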

5.6.3 Structural propensities of the C-terminus

All three proteins, and in particular βS, exhibit distances between residues

within the C-termini that are larger on average than in a random coil (Fig-

ure 5.6 C-E). This can be interpreted as either extended β or PPII structure,

both of which are characterised by rms inter-residue distances longer than those

of a random flight chain259. For βS, PPII structure is the most likely, as the C-

terminus of βS contains 8 proline residues, which are known to disrupt β-sheet

formation, and PPII structure has been observed experimentally158. The exper-

imental data for αS, in contrast, suggest a much lower PPII propensity109,158,

thus the DC values greater than 1.0 in the C-terminus of this protein are more

likely to correspond to extended β-like structure. There is less experimental

data available for β+HC, but the cross-peaks in the HSQC spectra overlay with

those of βS for the majority of the sequence, and with those of αS for the

inserted hydrophobic core region145, indicating that the secondary structural

preferences of the C-terminus are likely to be similar to those of βS. Interest-

ingly, the C-terminus of β+HC does not contain as many DC values greater

than 1.0 as βS, suggesting that the insertion of the αS hydrophobic core may

have an indirect effect on the structural propensities of the C-terminus.


5.6.4 Structural propensities of the N-terminus

Within the N-termini of all three proteins there are clusters of residues close

together in sequence separated by distances that are, on average, similar to those

in a random coil. Such DC values could result from either random coil or α-

helical structure, as the expected inter-residue distances are the same for short

sequence separations259. The helical propensity predicted using AGADIR361–364

shows a series of regions within the N-termini that are mildly prone to form

helical structure (Figure 5.7 A-C). None of these regions correspond precisely

to the areas where DC ∼ 1.0 in the DC maps, however. Additionally, the helical

propensity is lowest for β+HC, whereas the DC maps suggest that this protein

has the largest amount of N-terminal residual helical structure.

Further evidence for the presence of local α-helical structure in the N-termini

is provided by the results of NMR experiments. The ∆δ 109,158,159 and 3J-

couplings109,158 for the N-termini of both αS and βS reveal a propensity to-

wards helical structure in the solution state. The helical propensity of β+HC

is likely to be similar, as its cross-peaks overlay those of βS for the N-terminal

72 residues, and those of αS for residues 73−83 [145]. Given that the N-termini

of both αS and βS become helical upon binding to lipid membranes114–120, it

appears that the lipid-bound structure of αS and βS may be encoded in their

solution state ensembles. It is unlikely that, in solution, the N-termini fluctuate

between fully formed α-helix and completely random coil, as this would be ex-

pected to be detected experimentally. More probably, short sections of α-helix

form transiently in the solution state ensemble, and the preferential binding of

lipids to the helical form results in a shift in the equilibrium upon the addition

of lipids.

The transient nature of the residual helical structure is in keeping with the

non-negative RDCs observed for the N-terminus158,159. It also explains why

some of the details of the lipid-bound structures, such as the break in helical
structure around residue 40 [114–120], are not observed in the solution-state

ensembles. Distinguishing helical and non-helical structure is complicated by

the similarity between the predicted distances for a random coil and for an α-

helix259. One aspect of the lipid-bound structures that is present, however, is

the longer α-helical region in αS caused by the extra 11 residues relative to βS.

The insertion of these residues into βS to form β+HC extends the unitary DC

values to residue ∼ 90, whereas lower DC values indicative of local compaction

occur around residue 70 in βS. This is in keeping with the termination of helical

structure around residue 65 observed experimentally for βS in solution159.

Other details of the solution state ensembles observed experimentally are


not clearly delineated in the DC maps. According to the ∆δ, residues 6− 37 of

αS have the greatest helical propensity in solution109, whereas in βS, there are

two distinct regions of higher helical propensity in the N-terminus, comprising

residues 20−35 and 55−65 [158]. The helical propensity of the central portion

of the polypeptide chain in solution is therefore higher for βS158, but these

differences cannot be seen in the DC maps.

[Figure 5.7 plot: panels A to C show helical content (0 to 4) against residue number; panels D to F show $Z_{agg}^{prof}$ (−6 to 4) against residue number.]

Figure 5.7: (A-C) Helical propensity predicted using AGADIR for (A) αS, (B) βS
and (C) β+HC. In (A), the black line corresponds to the experimental conditions
of Morar et al.234 and the red line to the conditions of Binolfi et al.235. The
predictions shown in (B) and (C) were made using the experimental conditions
of Bertoncini et al.158 and R.C. Rivers145, respectively. (D-F) Aggregation
propensity, $Z_{agg}^{prof}$, predicted using the Zyggregator algorithm366 for (D) αS,
(E) βS and (F) β+HC.

5.6.5 Dihedral angle preferences

For all three proteins, the Ramachandran plots averaged over all residues (Fig-

ure 5.8 A-C) show that PPII structure is the most common, followed by α-helix

and lastly β structure. β+HC exhibits more β structure than the other two

proteins, and βS exhibits the most PPII structure. For αS, in particular, there

is also a small probability of sampling positive φ angles, which is unusual and

does not correspond to any common secondary structural motif. When only the

N-termini of the proteins are considered, the Ramachandran plots are essentially

identical to those for the entire sequence (not shown), most likely because these

regions (residues ∼ 1 − 100, see definitions in caption of Figure 5.8) contain a


large fraction of the entire sequence. The Ramachandran plots for the C-termini,

however, are different to the overall and N-termini plots (Figure 5.8 D-F). There

is a reduction in the α-helical propensity, especially for αS and βS, which results

in an increase in β structure for αS, and an increase in PPII for the other two

proteins, particularly for βS. This is in agreement with the prediction based on

the experimental data and the DC maps that the C-termini of all three proteins

have a lower α-helical propensity and an increased propensity to form PPII and

β-structure. Thus, despite being averaged over many different types of residue,

the Ramachandran plots show that there are differences in the dihedral angle

distributions between the N- and C-termini, which must be indirectly encoded in

the PRE distance restraints. The overall propensity towards helical structure,

whether α-helical or PPII, may however be due to the SASA implicit solvent

model, which is known to favour helical structures.

Figure 5.8: Ramachandran plots showing the dihedral angle distributions p(φ, ψ)

for (A,D) αS, (B,E) βS and (C,F) β+HC. In (A-C) the probability of each

combination of φ and ψ dihedral angles is the average over all residues and

all structures whereas for (D-F) only the C-termini (residues 103− 140 for αS,

98 − 134 for βS and 103 − 145 for β+HC) were considered. The same scale is

used for all plots to facilitate comparisons.
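The distributions p(φ, ψ) shown in Figure 5.8 are normalised two-dimensional histograms of the backbone dihedral angles. A minimal sketch is given below, assuming the angles are available in degrees as NumPy arrays of shape (structures × residues); the function name and the 10° bin width are hypothetical choices, not those used for the figure.

```python
import numpy as np

def ramachandran_distribution(phi, psi, bin_width=10.0):
    """Normalised 2D histogram p(phi, psi) of dihedral angles in degrees.

    phi, psi: arrays of any matching shape; all residues and structures
    passed in are pooled. Returns (p, edges), where p[i, j] is the
    probability of phi falling in bin i and psi in bin j.
    """
    edges = np.arange(-180.0, 180.0 + bin_width, bin_width)
    counts, _, _ = np.histogram2d(np.ravel(phi), np.ravel(psi), bins=[edges, edges])
    return counts / counts.sum(), edges
```

Restricting the arrays to a range of residues before pooling, for example the columns corresponding to residues 103−140 of αS, gives a section-specific plot of the kind shown in panels D to F.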

5.6.6 Comparison with experimental data

The most stringent test of how well PRE-ERMD reproduces the ‘true’ ensemble

of structures is a quantitative comparison with experimental data. The agree-


ment with the free PRE distances is almost as good as the satisfaction of the

restraints (Tables 5.2, 5.3 and 5.4), indicating that the ensemble-averaged long-

range structure is in reasonable agreement with that observed experimentally.

Little other quantitative data is available for β+HC, but the $^3J_{HNH\alpha}$-couplings

and RDCs for αS and βS were obtained from C. Bertoncini158,240 for comparison

with the back-calculated values.

The calculated $^3J_{HNH\alpha}$-couplings for αS and βS are around 5 Hz throughout
the sequence, the upper boundary of the range of values expected for helical
structure (Figure 5.9 A and B). The difference between the N- and C-termini observed
in the Ramachandran plots is not evident. Moreover, the agreement with
the experimental $^3J_{HNH\alpha}$-couplings is poor. The experimental couplings are in
general larger than the calculated couplings and lie in the range expected for
random coil structure. There is also more fluctuation in the measured $^3J_{HNH\alpha}$-couplings
along the sequence, suggestive of local residue-specific conformational
preferences that are not reproduced in the calculated ensembles. This is not
surprising: inter-residue distance restraints only contain local conformational
information when they involve residues close together in sequence,
but the amide peaks of residues proximal to the spin-label are often broadened
beyond detection in PRE-NMR experiments, meaning that no distance
restraint can be obtained. The similarity between the $^3J_{HNH\alpha}$-couplings of the
PRE-restrained αS ensemble and those of the unrestrained ensemble (αSASA)
analysed in Chapter 3 provides further evidence that PRE distance restraints
are not sufficient to alter the dihedral angle preferences encoded in the force-field
and implicit solvent model. The simplest way to improve the $^3J_{HNH\alpha}$-couplings
of the calculated ensembles is to restrain them directly. This is not possible
using the current PRE-ERMD methodology, however, due to the absence of Hα
atoms in the CHARMM19 representation, and the inability of the implicit solvent
models parameterised for use with all-atom representations, such as GB/SA, to
produce converged ensembles of sufficiently expanded structures.
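The back-calculation procedure is not spelled out in this section. A common approach, assumed here, is to apply a Karplus relation to the φ angle of each residue in each structure and then average the resulting couplings over the ensemble. The coefficients below are the widely used Vuister and Bax parameterisation; they are an assumption and not necessarily those used in this work, and the function names are hypothetical.

```python
import numpy as np

def karplus_3j_hnha(phi_deg, A=6.51, B=-1.76, C=1.60):
    """Karplus relation J = A cos^2(theta) + B cos(theta) + C, theta = phi - 60 deg.

    phi_deg: backbone phi angle(s) in degrees. Returns coupling(s) in Hz.
    """
    theta = np.radians(np.asarray(phi_deg, dtype=float) - 60.0)
    return A * np.cos(theta) ** 2 + B * np.cos(theta) + C

def ensemble_averaged_3j(phi_deg):
    """Average the coupling, not the angle, over structures.

    phi_deg: (n_structures, n_res) array. Returns an (n_res,) array in Hz.
    """
    return karplus_3j_hnha(phi_deg).mean(axis=0)
```

With these coefficients, φ = −60° gives about 4.1 Hz and φ = −120° gives about 9.9 Hz, so an ensemble dominated by helical φ angles yields low averaged couplings whatever its long-range structure.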

Comparison of the calculated and experimental RDCs was expected to prove

particularly interesting, as deviations of the experimental RDCs from a uniform

distribution in the C-termini of αS and βS have been interpreted as suggesting

the presence of specific structural preferences in the C-termini that are different

for αS and βS158 or as merely the product of preferential alignment due to the

extended nature of this region159. The RDCs for αS in a variety of media all

exhibit two regions in the C-terminus with RDCs of greater magnitude than

the remainder of the sequence, separated by near-zero RDCs around residue


122 [159,240]. The location of the break in the RDC pattern is especially intriguing
in light of the localisation of the lowest DC values around residue 120.

Other experimental data for the lipid-bound state (in which the C-terminus re-

mains disordered) are also consistent with some sort of structural perturbation

in this region. The paramagnetic broadening induced by an aqueous spin-label

indicates that any residual structure in the C-terminus might be divided into

two segments, one on either side of position 122 [117]. Furthermore, the Cα ∆δ in

this region114, although largely indicative of random coil, show two regions of

similar shifts on either side of position 122, and the dynamics data (R1, R2 and

nOes) for the lipid-bound state117 and, in one case, the free state402 suggest a

slightly lower mobility at position 122 than on either side. For the free state,

at least, this lower mobility may be due to the residual interactions with the

N-terminus.

The RDCs back-calculated from the αS ensemble produced here, however,

do not exhibit the distinct peaks in the C-terminus (Figure 5.9 C). For the re-

mainder of the sequence, the magnitude of the calculated RDCs, but not the

residue-specific pattern, is similar to that of the experimental RDCs measured

in Pf1 bacteriophage alignment media, other than around residue 60, where they

are more like those measured in C8E5/octanol. A similar situation occurs for

the RDCs calculated from the βS ensemble: the magnitude of the calculated

and experimental RDCs in the N-terminus are similar but the calculated RDCs

do not correspond to the experimental RDCs recorded for the C-terminus (Fig-

ure 5.9 D). Insufficient averaging, as discussed in Chapter 3, is unlikely to be

the explanation for the discrepancies, as the number of structures used, 57 600,

is enough for the calculated RDCs to have converged. Additionally, the effect

of increasing the number of structures is to reduce the amount of variation in

the RDCs along the sequence, whereas the opposite would be required to repro-

duce the experimental data. Despite the inability of the calculated ensembles

to reproduce precisely the local structure that perturbs the experimental data

from the random coil expectations, the generally larger RDCs in the C-termini

of both proteins are in keeping with the extended nature of this region inferred

from analysis of the DC maps, supporting the conclusion of Sung and Eliezer

that this may be the major factor contributing to the RDCs observed for αS

and βS159.

5.6.7 Free energy maps

A more global perspective on the nature of the structures sampled by each

of the three proteins can be gained by examining the free energy landscapes


[Figure 5.9 plot: panels A and B show $^3J_{HNH\alpha}$ (Hz, 4 to 9) against residue number; panels C and D show RDC (Hz, −5 to 10) against residue number.]

Figure 5.9: (A,B) $^3J_{HNH\alpha}$-couplings and (C,D) RDCs for (A,C) αS and (B,D)
βS. The $^3J_{HNH\alpha}$-couplings and RDCs back-calculated from the PRE-ERMD

ensembles are in black and the experimental data are in red. In (C) and (D),

the RDCs obtained in C8E5/octanol are in red and those measured in Pf1

bacteriophage are in green (C only). The grey lines at 0 Hz in (C) and (D) are

to guide the eye.

(Figure 5.10). The greatest differences between the three proteins occur for

F (Rg,SASA), with βS exhibiting the narrowest range of SASA and β+HC the

widest. Interestingly, this pattern reflects the relationship between the Cf of

the three proteins (Table 5.5), so that a low Cf corresponds to a narrow range

of SASA. In all cases, the structures with the lowest Rg encompass a wide

range of SASA; similarly, there is a large range of Rg corresponding to the

largest SASA. Thus having a small Rg poses few restrictions on the fraction

of the surface area that is exposed. This may facilitate the role of αS as a

hub protein65, as a larger surface area allows for a diverse range of binding

partners59. The greater similarity between the F (Rg, SASA) landscapes of αS

and β+HC suggests that the insertion of the central NAC region into β+HC

causes it to behave more like αS in this respect.

The three proteins are not so easily distinguished in terms of F (Rg, REE).

Analysis of the relationship between Rg and REE by linear regression (data not

shown) indicates that for Rg up to ∼35−40 Å the corresponding REE is lower

than is expected for a purely random coil15, whereas above this it is higher. The

shorter than expected REE of the more compact structures is in keeping with

the tendency towards contact formation between the N- and C-termini noted

previously in the DC maps and experimental data266.


Figure 5.10: Free energy landscapes of the (A,D) αS, (B,E) βS and (C,F)

β+HC ensembles. The free energy is defined as (A-C) F (Rg, SASA) =

− ln p(Rg,SASA) and (D-F) F (Rg, REE) = − ln p(Rg, REE), where REE is the

end-to-end distance.
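The definition in the caption amounts to the negative logarithm of a normalised two-dimensional histogram. A minimal sketch follows, assuming NumPy and an arbitrary bin count (the binning used for Figure 5.10 is not stated here); the function name is hypothetical. Since no temperature factor appears in the definition, F is dimensionless in this form.

```python
import numpy as np

def free_energy_map(x, y, bins=50):
    """F(x, y) = -ln p(x, y) from paired per-structure observables.

    x, y: 1D sequences of equal length, e.g. Rg and SASA, or Rg and REE.
    Bins that are never visited have p = 0 and are returned as +inf.
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    p = counts / counts.sum()
    with np.errstate(divide="ignore"):
        free_energy = -np.log(p)
    return free_energy, x_edges, y_edges
```

The lowest values of F then correspond to the most populated combinations of the two observables, and the width of the finite region along each axis reflects the range sampled by the ensemble.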

5.7 Implications for aggregation

The construction and study of β+HC was initiated with the aim of understand-

ing why the predicted aggregation propensity and measured aggregation rate of

βS are lower than those of αS145. Fibril formation by βS requires the presence

of metals146 or sub-critical micelle concentrations of SDS (0.5 mM), conditions

which also increase the aggregation rate of αS403,404. The PRE distances were

collected without metals or SDS present, however, thus the ensembles calcu-

lated here can only provide insight into the aggregation properties of the three

proteins in the absence of these components. This is unlikely to be critical in

the case of βS, at least with respect to SDS, as the $\langle R_h^{-1} \rangle^{-1}$ is the same in

the presence and absence of 0.5 mM SDS145,158 (Table 5.5) and the intensity

ratios measured in SDS with a spin-label at position 42 are not significantly

different to those obtained in solution145. The induction of βS aggregation by

SDS is therefore most likely due to increases in the local protein concentration

rather than any induced structural changes. αS, in comparison, becomes con-

siderably more compact in 0.5 mM SDS (Table 5.5), thus structural changes

cannot be ruled out. The ensemble calculated here, however, remains relevant

for understanding the initiation of fibril formation, as αS does not require SDS to

stimulate aggregation. The induction of aggregation by metals for both proteins


is thought to be due to neutralisation of the negatively-charged C-termini159,

which is discussed further below. For β+HC, interpretation of the results ob-

tained here is more complicated: the Rh was only determined in the presence of

SDS, whereas the PRE-NMR measurements were made without SDS present. If,

like αS, β+HC collapses in the presence of SDS, the ensemble created here may

be too compact. However, based on the comparison of the previously published

and re-calculated αS ensembles, the greatest change to the residual structure is

likely to be an overall decrease in the DC values rather than alteration of the

specific structural propensities.

It is generally accepted that the cause of the different aggregation propensi-

ties of αS and βS is the absence of 11 residues (73−83) from the NAC region of

βS147,155. Contrary to the original expectations, β+HC, which contains residues

73−83 of αS within the βS sequence following residue 72, was found to have sim-

ilar aggregation properties to βS145. Further investigations, including analysis

of the aggregation properties of two deletion mutants, α∆73-83 and α∆71-82,

showed that the most likely reason for the similar aggregation behaviour of βS

and β+HC is the inclusion of E83 in the β+HC construct145. This negatively

charged residue is thought to disrupt the inter-molecular interactions of the

hydrophobic core and may therefore act as an aggregation ‘gatekeeper’155,156.

Additionally, the incorporation of charged residues into the hydrophobic core of

full-length αS decreases the rate of fibril formation155,156. This suggests that the

lower experimental and theoretical aggregation propensities of βS and β+HC,

both of which have a greater net charge than αS, may be due to inter-molecular

repulsion between charged residues.

The role of charge in preventing aggregation is not confined to the inter-

molecular interactions. Whilst any contacts made by the C-terminus with the

NAC region are thought to be hydrophobic in nature, interactions with the

N-terminus are most likely electrostatic. The increased negative charge of the

C-termini of βS and β+HC may therefore enhance these intramolecular electro-

static interactions. Indeed, comparison of the DC maps shows that the scaled

distances between the N- and C-termini of βS and β+HC are shorter than those

of αS (Figure 5.6 B-D). Additionally, the predicted aggregation propensity of the

C-termini of βS and β+HC is even lower than that of αS (Figure 5.7 D-F). The

importance of electrostatic interactions between the N- and C-termini in deter-

mining the aggregation properties is supported by most experimental data, other

than one recent study which failed to find any evidence for perturbation of in-

tramolecular interactions by polycation binding to αS159. In agreement with the

conjectures presented here, C-terminal truncation mutants of αS only aggregate

faster than wild-type151 if the truncation removes the majority of the charged

residues from the C-terminus. Additionally, the binding of positively-charged

polyamines such as spermine to the C-terminus increases the aggregation rates

of βS in SDS and αS without SDS145,151,154,405,406. Neutralisation of the excess

negative charge of αS at low pH also increases the aggregation rate407. Thus

features apparent in the PRE-ERMD ensembles correlate with the experimental

data and provide further support for the suggestion that charge plays a key role

in controlling the aggregation propensities of the synucleins.

5.8 Conclusions

The use of PRE-ERMD augments the information available from experimen-

tal data by providing atomic-level structural detail. DC maps were developed

to characterise the structures produced using PRE-ERMD by comparing the

ensemble-averaged rms CαCα distances to the expected inter-residue distances

for a random coil. Analysis of these maps shows that the distances between the

N- and C-termini of all three proteins are shorter than is expected for a purely

random coil, indicative of interactions between the two regions that may be elec-

trostatic in nature266. Both the DC maps and the Ramachandran plots reveal a

tendency towards α-helical structure in the N-terminus of all three proteins, in

keeping with the experimental data and suggesting that the lipid-bound struc-

ture of αS and βS is encoded in their solution-state conformational preferences.

The C-termini of βS and, to a lesser extent, β+HC have a tendency to form

PPII structure, whereas the C-terminus of αS is more disordered. Whilst such

qualitative features agree well with the available experimental data, quantita-

tive assessment of the agreement with the experimental 3JHNHα-couplings shows

it to be poor, suggesting that although the gross tertiary structure implied by

the PRE distances is reproduced by the calculated ensembles, the addition of

PRE distance restraints is not sufficient to affect the description of the local

structure provided by the force-field and implicit solvent, which, in this case, is

not compatible with that observed experimentally. The back-calculated RDCs

also failed to exhibit all of the features present in the experimental data. In-

terestingly, however, the larger RDCs calculated for the C-termini of αS and

βS are in keeping with an interpretation of the experimental data in which it

is the more extended nature of the C-termini that causes the increased RDCs

measured for this region rather than residual contact formation159.

The main structural effect of inserting the hydrophobic core of αS into βS is

an extension of the N-terminal helical propensity to include the inserted residues.

This appears to weaken the PPII propensity of the C-terminus, making β+HC

more like αS in this respect and suggesting that the structure of the C-terminus

is affected by the remainder of the sequence. Other than this, the resemblance

between the structural propensities of β+HC and βS echoes their similar ag-

gregation propensities. The main difference between these two proteins and

αS that is likely to be related to aggregation is the greater number of inter-residue

distances between the N- and C-termini that are shorter than expected for a

random coil. As interactions between the N- and C-termini are expected to be

electrostatic in nature, this strengthens the case for charge playing a key role in

determining the aggregation properties of these polypeptides.

Chapter 6

Characterisation of the acid-denatured state of PI3-SH3

6.1 Introduction

In Chapter 5, the generalised PRE-ERMD method developed in Chapter 4 was

used to generate ensembles of structures for the IDPs αS and βS and a related

construct, β+HC, in order to rationalise their relative aggregation propensities.

For these proteins, there is no folded structure to refer to, and aggregation can

proceed directly from the disordered NS. In contrast, it is generally agreed that

NFPs must unfold prior to aggregation3,16–20. Characterising the unfolded and

partially folded states of NFPs may therefore shed light on how both folding to

the NS and mis-folding and aggregation into various oligomeric species are initiated. Understanding how the balance between these two processes is controlled

is of critical importance given that protein aggregation is involved in an increas-

ingly large number of diseases3. The atomic-level structural detail provided by

simulation may help to elucidate the mechanism of aggregation at the molecular

level.

This chapter describes the application of the general PRE-ERMD method

to the acid-denatured state of the PI3-SH3 domain (SH3-AS), which is known

to be the precursor to amyloid fibril formation47–51. Prior to carrying out the

PRE-ERMD, the possibility of explaining the experimental PRE-NMR data for

SH3-AS in terms of various combinations of native and random coil structure

is explored. The experimental data are then treated according to the principles

outlined in Chapters 4 and 5 and an ensemble of structures representative of

SH3-AS is generated (SH3-PRE). To complement this ensemble, a coil library

ensemble, comprising structures generated using a self-avoiding statistical coil

model based on backbone conformational preferences from coil regions of pro-

teins in the PDB265, was obtained from A. Jha (SH3-CLIB). The global dimen-

sions and residual structure of these ensembles are probed using the methods

developed in the previous chapters of this thesis, with comparison to the native

fold at neutral pH (SH3-NS, PDB code 1pnj, Figure 6.2 A)41 and a random

coil model (SH3-RC) obtained as described in Chapter 5. The TS ensembles

of three related SH3 domains are also included in the analysis to assist the in-

terpretation of the DC maps for SH3-PRE. A quantitative assessment of the

agreement with the experimental RDCs is made and the free energy maps are

examined. Finally, the implications of the structural propensities identified in

this study for the aggregation of PI3-SH3 are discussed.

6.2 Experimental PRE-NMR data implies non-native structure

An issue that is widely debated in the context of the presence of residual struc-

ture in DS is whether any non-random structure implied by the experimental

observables is due to a small fraction of highly structured molecules amongst a

largely unstructured ensemble, or results from an ensemble of partially-structured

polypeptides170,176,199,202,204,205,222,224,232,233,236–244,255,257–259,408,409. In the case

of NFPs, the native fold provides a potential candidate for the structured state

in the first scenario. Indeed, experimental data for DS of several NFPs sug-

gests the presence of native-like residual structure170,222,236,238,244,408,409. To

test whether the observed Iox/Ired for SH3-AS can be explained by the presence

of a small number of natively folded molecules amid an essentially random coil

ensemble, the expected Iox/Ired were computed for SH3-NS and SH3-RC and

combined in varying proportions.
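The combination of the two predicted profiles can be sketched as follows. The sketch assumes that the composite profile is a simple population-weighted average of the two Iox/Ired profiles; this section does not state how the combination was carried out, so that choice, the function name `mix_intensity_ratios` and the five-residue example profiles are all hypothetical.

```python
import numpy as np

def mix_intensity_ratios(ratio_native, ratio_coil, native_fraction):
    """Population-weighted average of two predicted Iox/Ired profiles."""
    if not 0.0 <= native_fraction <= 1.0:
        raise ValueError("native_fraction must lie in [0, 1]")
    ratio_native = np.asarray(ratio_native, dtype=float)
    ratio_coil = np.asarray(ratio_coil, dtype=float)
    return native_fraction * ratio_native + (1.0 - native_fraction) * ratio_coil

# Hypothetical predicted profiles for five residues (illustrative only).
native = np.array([0.10, 0.90, 0.20, 0.95, 0.05])
coil = np.array([0.60, 0.70, 0.80, 0.85, 0.90])

# Native fractions of 1, 10 and 50%.
for fraction in (0.01, 0.10, 0.50):
    print(fraction, mix_intensity_ratios(native, coil, fraction))
```

Under this assumption the composite profile moves linearly from the random coil profile towards the native profile as the native fraction grows, so intermediate fractions cannot produce features that are absent from both end points.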

Overall, the Iox/Ired measured for SH3-AS are mostly lower than those pre-

dicted for SH3-RC (Figure 6.1), reflecting the high degree of compaction of

SH3-AS (see Table 6.2 and Section 6.4). The large variation in the Iox/Ired pre-

dicted for SH3-NS is due to the well-defined tertiary structure, which results in

some residues that are distant in sequence being located close to the spin-label.

Although there are some areas where the experimental Iox/Ired for SH3-AS are

similar to those predicted for SH3-NS, such as around residues 60− 70 for L13

and L42, and residues 80− 86 for L13, L26 and L42, there is little resemblance

between the Iox/Ired of SH3-NS and those of SH3-AS overall.

The Iox/Ired expected if 1, 10 or 50% of the ensemble exists in the native

fold and the remainder is purely random coil also bear little similarity with the

experimental data for SH3-AS. It is obvious from the systematic behaviour of

the composite Iox/Ired as the contribution of the SH3-NS Iox/Ired is increased

that combinations of the SH3-NS and SH3-RC Iox/Ired other than the fractions

considered here would also fail to explain the experimental data. It seems

unlikely, therefore, that the Iox/Ired observed experimentally for SH3-AS are

due to the presence of a sub-population of natively folded protein molecules

within a random coil ensemble. Whilst it remains unclear whether the observed

Iox/Ired arise from a small fraction of the proteins exhibiting structure other

than the native fold or an ensemble of partially-structured molecules, it seems

certain that any structure that does occur is non-native in nature, and the low

Iox/Ired suggest that short inter-residue distances are relatively frequent.

[Plot: ten panels, one per spin-label position (M3, S4, L13, L26, L42, S45, E54, E63, G80, P86); x-axis: Residue Number (0 to 80); y-axis: Iox/Ired (0 to 1).]

Figure 6.1: The distribution of Iox/Ired along the sequence for each spin-label

position as indicated. The experimental data for SH3-AS are shown as black bars

and the thick lines correspond to the Iox/Ired calculated from SH3-RC (green)

and SH3-NS (red). The thin red lines correspond to 1% SH3-NS, 99% SH3-RC

(solid), 10% SH3-NS, 90% SH3-RC (dashed) and 50% SH3-NS, 50% SH3-RC

(dotted). The experimental Iox/Ired shown for SH3-AS are those processed for

use in the simulations (see Section 6.3), thus any Iox/Ired < 0.15 or > 0.85 have

been set to 0.15 and 0.85, respectively. If no bar is present, then either Iox/Ired

was not measured for this residue or it was discarded due to error > 10%.

6.3 Choice of optimal T for characterisation by PRE-ERMD

PRE-NMR was carried out on SH3-AS with the MTSL spin-label attached in-

dependently in 10 different positions distributed throughout the sequence (Sec-

tion 2.2). The experimental data were treated in the manner introduced in

Chapter 5. Any Iox/Ired with greater than 10% error were discarded. For

the distances calculated from the remaining 639 Iox/Ired, those calculated from

Iox/Ired < 0.15 and Iox/Ired > 0.85 were assigned only an upper or lower bound,

respectively. The working dataset comprised 80% of the PRE distances, with

the remaining 20% used for independent cross-validation.
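The data treatment described above can be summarised in a short sketch. The thresholds (10% error, 0.15, 0.85) and the 80/20 division are taken from the text; the function name `prepare_pre_restraints`, the use of a random split and the example values are assumptions, and the conversion of Iox/Ired into distances is not shown.

```python
import numpy as np

def prepare_pre_restraints(ratios, rel_errors, max_rel_error=0.10,
                           low=0.15, high=0.85, working_fraction=0.8, seed=0):
    """Filter Iox/Ired values, label restraint types and split the data.

    Returns the retained ratios, a label per retained ratio ("upper",
    "lower" or "both") and index arrays for the working and free sets.
    """
    ratios = np.asarray(ratios, dtype=float)
    rel_errors = np.asarray(rel_errors, dtype=float)

    # Discard measurements whose error exceeds 10%.
    kept = ratios[rel_errors <= max_rel_error]

    # Strongly quenched signals only bound the distance from above;
    # barely affected signals only bound it from below.
    kinds = np.where(kept < low, "upper",
                     np.where(kept > high, "lower", "both"))

    # Random 80/20 split (the selection scheme is an assumption).
    order = np.random.default_rng(seed).permutation(kept.size)
    n_working = int(round(working_fraction * kept.size))
    return kept, kinds, np.sort(order[:n_working]), np.sort(order[n_working:])

# Hypothetical measurements: ratio and relative error for five residues.
kept, kinds, working, free = prepare_pre_restraints(
    [0.10, 0.50, 0.90, 0.30, 0.70], [0.05, 0.02, 0.08, 0.20, 0.09])
```

The free set is never used as a restraint, so agreement with it provides the independent cross-validation referred to above.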

The PRE-ERMD simulations were run using the general method developed

in Chapter 4. QRh was used to assess how well the global size of the molecules

is reproduced and as the primary determinant of the optimal T . A relationship

between $R_g^{-1}$ and $R_h^{-1}$,

$$R_h^{-1} = 0.0227 + 0.405\,R_g^{-1}, \qquad (6.1)$$

derived by K. Lindorff-Larsen in the manner described in Chapter 5, was used

to convert the Rg of each structure in the ensembles generated by PRE-ERMD

into an Rh. The $\langle R_h^{-1} \rangle^{-1}$ was then computed according to equation 2.9.
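A minimal sketch of this conversion is given below. The coefficients are those of equation 6.1; the function names, the assumption that all lengths are in Å and the assumption that equation 2.9 is an unweighted average over the structures of the ensemble are illustrative, since equation 2.9 is not reproduced in this chapter.

```python
import numpy as np

def rh_from_rg(rg):
    """Equation 6.1: 1/Rh = 0.0227 + 0.405/Rg (lengths assumed to be in Å)."""
    rg = np.asarray(rg, dtype=float)
    return 1.0 / (0.0227 + 0.405 / rg)

def ensemble_rh(rg_values):
    """<Rh^-1>^-1: the inverse of the ensemble-averaged inverse Rh."""
    inverse_rh = 0.0227 + 0.405 / np.asarray(rg_values, dtype=float)
    return 1.0 / inverse_rh.mean()

# Hypothetical Rg values (Å) for a three-structure ensemble.
rg_values = [14.0, 16.6, 22.0]
print(ensemble_rh(rg_values))
```

Because the average is taken over the inverse radii, compact structures contribute more strongly than expanded ones, and the result never exceeds the arithmetic mean of the individual Rh values.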

In a similar manner to the results recorded in Chapter 5 for αS, βS and β+HC, the $\langle R_h^{-1} \rangle^{-1}$ decreases as T is lowered, thus QRh also decreases until the calculated $\langle R_h^{-1} \rangle^{-1}$ matches the experimental $\langle R_h^{-1} \rangle^{-1}$ and then increases again

(Table 6.1). It was therefore straightforward to locate the optimal T of 445 K

(Table 6.1). An additional 57 600 structures were collected at this T and sub-

jected to further analysis.
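The selection of the optimal T amounts to locating the minimum of QRh. The sketch below does this for the values listed in Table 6.1; the variable names are hypothetical and the QRh values are copied from the table rather than recomputed.

```python
# (T in K, calculated <Rh^-1>^-1 in Å, QRh), copied from Table 6.1.
table_6_1 = [
    (400, 19.5, 0.078), (425, 20.3, 0.040), (445, 21.1, 0.004),
    (450, 21.5, 0.013), (475, 22.2, 0.047), (500, 22.9, 0.078),
    (525, 23.2, 0.095), (550, 23.5, 0.108), (575, 23.7, 0.118),
    (600, 23.8, 0.124), (625, 24.0, 0.133), (650, 24.1, 0.135),
    (675, 24.2, 0.143), (700, 25.1, 0.184),
]

# The optimal temperature is the one with the smallest QRh.
optimal_T, optimal_rh, optimal_q = min(table_6_1, key=lambda row: row[2])
print(optimal_T, optimal_rh, optimal_q)  # 445 21.1 0.004
```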

6.4 Global dimensions

The $\langle R_h^{-1} \rangle^{-1}$ of the final ensemble generated using PRE-ERMD (SH3-PRE) is in good agreement with the experimental value (Table 6.1), as is expected given that reproduction of the experimental $\langle R_h^{-1} \rangle^{-1}$ is a fundamental criterion

in the choice of the optimal simulation conditions. The previously published

Rh (∼ 24.3 Å)48 is larger than that used here (21.2 Å) because the construct

used in that study included an additional 4 amino acids at the C-terminus.

Interestingly, SH3-AS is almost as compact as the folded SH3-NS structure

present at neutral pH (Table 6.2). Neither of these states is as compact as is predicted for an NFP of this size. For SH3-NS, this may result from the long

Table 6.1: The Q values quantify how well the experimental $\langle R_h^{-1} \rangle^{-1}$ (21.2 Å) and the PRE distances for PI3-SH3 are reproduced by varying T with Nrep = 24, L = 1 and U = 8. QwPRE refers to the working dataset (80% of the PRE distances) and QfPRE to the free dataset (remaining 20%). The results for the most representative ensemble collected at the optimal T are in bold type.

T (K)   $\langle R_h^{-1} \rangle^{-1}$ (Å)   QRh     QwPRE   QfPRE
400     19.5                                  0.078   0.18    0.22
425     20.3                                  0.040   0.18    0.21
445     21.1                                  0.004   0.19    0.20
450     21.5                                  0.013   0.19    0.21
475     22.2                                  0.047   0.20    0.21
500     22.9                                  0.078   0.21    0.20
525     23.2                                  0.095   0.22    0.21
550     23.5                                  0.108   0.22    0.21
575     23.7                                  0.118   0.23    0.21
600     23.8                                  0.124   0.23    0.22
625     24.0                                  0.133   0.23    0.22
650     24.1                                  0.135   0.24    0.22
675     24.2                                  0.143   0.24    0.22
700     25.1                                  0.184   0.29    0.27

RT loop extending outwards from the remainder of the protein, which is more

structured. In contrast to SH3-PRE, the $\langle R_h^{-1} \rangle^{-1}$ of SH3-CLIB (26.8 Å) is

more like that expected for a random coil, indicating that the compactness of

SH3-AS cannot be explained by dihedral angle preferences alone.

The Rg distributions of SH3-PRE and SH3-CLIB were compared with that

of SH3-RC. The Rg distribution of SH3-CLIB is very similar to that of SH3-RC

(Figure 6.2 B). Although SH3-PRE encompasses a range of Rg, the distribution

is not as wide as that of SH3-RC. The difference between the Rg distribu-

tions of SH3-PRE and SH3-RC is more noticeable than for αS, βS and β+HC

(Figure 5.5), reflecting the greater relative compactness of SH3-AS (Tables 5.5

and 6.2).

Table 6.2: Predicted¹ and experimental² Rh (in Å) and compaction factors, Cf³, for PI3-SH3 in various states.

      U      F      pH 7.4   pH 2.0   3.5 M GndHCl
Rh    28.0   17.3   19.5     21.2     28.0
Cf                  0.793    0.634    0.000

1. U and F refer to the Rh predicted for an unfolded or folded polypeptide according to equations 2.23 and 2.24161, respectively.

2. Measured by PFG-NMR on 0.5−1.0 mM PI3-SH3 in D2O adjusted to pH 7.4 with ²HCl at 293 K48, 100 µM PI3-SH3, 1.25 µM DSS, 10 mM HCl in 10% D2O at pH 2.0 and 298 K410, and 0.5−1.0 mM PI3-SH3 in D2O with 3.5 M GndHCl at pH 7.4 and 293 K48.

3. Calculated according to equation 2.25161.
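The compaction factors in Table 6.2 can be checked with a short calculation. Equation 2.25 is not reproduced in this chapter, so the form used below, Cf = (Rh,U − Rh) / (Rh,U − Rh,F), is an assumption; it is the commonly used definition and reproduces the tabulated values to within the rounding of the Rh inputs. The function name is hypothetical.

```python
def compaction_factor(rh, rh_unfolded, rh_folded):
    """Cf = 1 for a fully folded and 0 for a fully unfolded polypeptide."""
    return (rh_unfolded - rh) / (rh_unfolded - rh_folded)

RH_UNFOLDED = 28.0  # predicted Rh for the unfolded state (Å), Table 6.2
RH_FOLDED = 17.3    # predicted Rh for the folded state (Å), Table 6.2

# Experimental Rh values (Å) from Table 6.2.
for state, rh in (("pH 7.4", 19.5), ("pH 2.0", 21.2), ("3.5 M GndHCl", 28.0)):
    print(state, round(compaction_factor(rh, RH_UNFOLDED, RH_FOLDED), 3))
```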

Figure 6.2: (A) The native fold of PI3-SH3 determined by NMR (PDB code

1pnj)41, showing the RT loop across the top of the β-barrel and the n-Src loop

on the left-hand side in green. The colour ranges from red at the N-terminus

to blue at the C-terminus. The view shown here was prepared with vmd375.

(B) Rg probability distribution for SH3-RC (black), SH3-PRE (red) and SH3-

CLIB (green). The Rg distributions are plotted rather than the Rh distributions

because the former are faster to calculate, but the Rh distributions are similar.

6.5 Residual structure

To characterise the residual structure of SH3-AS, the same analysis methods

were used as for the IDPs αS, βS and β+HC (Chapter 5), namely DC maps, Ra-

machandran plots, free energy maps and both qualitative and quantitative com-

parisons with experimental NMR data. Recently, a large amount of NMR data

describing the structure and dynamics of SH3-AS at both 298410 and 308 K411

has become available. The R1 and R2 relaxation rates and the PRE-NMR data

are similar at both temperatures, whereas the secondary structure propensities

derived from the ∆δ are quite different. The data obtained at 298 K are most relevant for comparisons with the calculated ensemble, as this is the temperature

at which the PRE-NMR experiments were carried out. In addition to the helical

content and aggregation propensity, the SASA and hydrophobicity throughout

the sequence were also calculated. In comparison to the IDPs, for which the

structural propensities were mostly inferred with respect to the random coil

model, the existence of a native fold for PI3-SH3 provided a further reference

state.

6.5.1 Comparison of the native and acid-denatured states

DC maps were produced for SH3-NS41, SH3-PRE and SH3-CLIB (Figure 6.3).

SH3-NS encompasses a wider range of DC values than SH3-PRE and SH3-

CLIB. This is an artifact of generating a DC map from a single structure, which

means that no averaging takes place. The distinct areas of low DC values

perpendicular to the main diagonal in the SH3-NS DC map correspond to anti-

parallel contacts between the β-strands forming the β-barrel (Figure 6.2 A).

Low DC values in other regions are due to contacts between parts of the RT

and n-Src loops and the β-strands. The DC map for SH3-PRE is more diffuse,

as is expected given that the 1H-15N HSQC spectrum for SH3-AS, to which the

restraints pertain, suggests that it is largely unfolded410,411, despite being very

compact (Table 6.2). The DC map for SH3-CLIB implies an even greater degree

of unfolding however; the DC values are greater than 1.0 throughout most of the

sequence, indicating that the rms inter-residue distances are even larger than in

the random coil model used to compute the DC values. It is not surprising that

the rms distances of SH3-CLIB are longer than those of SH3-PRE, as the latter

ensemble is much more compact. Other than rms distances slightly shorter

than in a random coil in the C-terminus, the DC map of SH3-CLIB has few

distinguishing features, indicating that implementing residue-specific dihedral

angle preferences is not sufficient to induce global order. When the dihedral angle preferences of SH3-CLIB displayed in the Ramachandran plot (Figure 6.4 C) are considered along with the DC map, it appears that the structures comprising

SH3-CLIB are predominantly composed of PPII structure, consistent with the

highly expanded nature of this ensemble.
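The construction of a DC map can be sketched as follows. The sketch assumes that each DC value is the ratio of the ensemble rms CαCα distance to the corresponding rms distance in the random coil reference, which is consistent with values above 1.0 indicating distances longer than in the random coil; the function names and the array layout are hypothetical.

```python
import numpy as np

def rms_distance_matrix(coords):
    """Ensemble rms distance for every residue pair.

    coords has shape (n_structures, n_residues, 3) and holds Cα positions.
    """
    coords = np.asarray(coords, dtype=float)
    n_residues = coords.shape[1]
    squared_sum = np.zeros((n_residues, n_residues))
    for frame in coords:  # loop over structures to keep memory use low
        diff = frame[:, None, :] - frame[None, :, :]
        squared_sum += (diff ** 2).sum(axis=-1)
    return np.sqrt(squared_sum / coords.shape[0])

def dc_map(ensemble_coords, random_coil_coords):
    """Ratio of ensemble to random coil rms distances for each residue pair."""
    rms_ensemble = rms_distance_matrix(ensemble_coords)
    rms_coil = rms_distance_matrix(random_coil_coords)
    dc = np.ones_like(rms_ensemble)  # the diagonal (i == j) is left at 1
    off_diagonal = ~np.eye(dc.shape[0], dtype=bool)
    dc[off_diagonal] = rms_ensemble[off_diagonal] / rms_coil[off_diagonal]
    return dc
```

For a single structure such as SH3-NS the first array has only one frame, so no averaging takes place and the resulting map spans a much wider range of values.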

In comparing the DC map for SH3-PRE with that of SH3-NS, it is not

possible to distinguish whether the lack of the distinctive patterns representative

of the native tertiary structural motifs arises from conformational averaging

camouflaging the presence of a few native-like conformations or is due to residual

Figure 6.3: DC maps for (A) SH3-NS (PDB code 1pnj)41, (B) SH3-PRE and

(C) SH3-CLIB. The very wide range of distances present in SH3-NS compared

to SH3-PRE and SH3-CLIB means that a different scale is required for the latter

two ensembles.

Figure 6.4: Ramachandran plots showing the dihedral angle distributions p(φ, ψ)

for (A,B) SH3-PRE and (C) SH3-CLIB. In (A,C) the probability of each com-

bination of φ and ψ dihedral angles is the average over all residues and all

structures whereas for (B) only residues 3-23 are considered. The same scale is

used for all plots to facilitate comparisons.

structure that is predominantly non-native in nature. Less extreme examples

of the first situation pertinent to the study of PI3-SH3 are provided by the

folding TS ensembles46 of three other SH3 domains, those from c-src (PDB

code 1fmk)412, Fyn (PDB code 1shf)413 and α-spectrin (PDB code 1bk2)414.

These ensembles were shown to be native-like on average despite substantial

local variability. Whilst these three SH3 domains are considerably smaller than

PI3-SH3, which has a longer n-Src loop, comparison of the TS and NS DC maps

provides a useful illustration of the relationship between an ensemble-averaged

DC map and that of a single structure in the case where the ensemble exhibits

native-like structure.

It is clear from Figure 6.5 that the DC maps of all three TS ensembles bear much more resemblance to those of the corresponding NS than the DC map of SH3-PRE does to that of SH3-NS. Not only do the TS ensembles encompass

the same range of DC values as the NS, but the structural elements visible in

the NS DC maps are only slightly less distinct in the TS DC maps. It seems

unlikely, therefore, that the residual structure observed in the DC map of SH3-

PRE is native-like, as the DC map would then be expected to be much more

like that of SH3-NS.

Figure 6.5: DC maps for (A-C) the NS and (D-F) the TS ensembles obtained

from K. Lindorff-Larsen46 of the (A,D) α-spectrin (PDB code 1bk2)414, (B,E)

c-src (1fmk)412 and (C,F) Fyn (1shf)413 SH3 domains. Only the DC maps for

the TS ensembles generated at 500 K are shown; the 640 K ensembles are very

similar.

6.5.2 Structural propensities of the acid-denatured state

Having eliminated the possibility that the residual structure of SH3-PRE and,

by analogy, SH3-AS, is native-like, it remains to characterise the nature of the

structures comprising SH3-PRE, including a qualitative comparison with the

experimental data for SH3-AS.

N-terminus

The inter-residue distances for residues 5 − 23 of SH3-PRE are mostly of

similar size to those expected for a random coil, which is consistent with either

α-helical or random coil structure259. This region is predicted to have a high

helical propensity by agadir361–364, particularly at acid pH (Figure 6.6 A).

Additionally, the ∆δ of SH3-AS at 298 K correspond to a propensity for α-

helical structure410, although those measured at 308 K do not411. The negative

RDCs measured for this region have also been interpreted in terms of helical

structure410. Comparison of the Ramachandran plot for residues 3 − 23 of

SH3-PRE with that generated for the entire sequence shows a slight increase

in the population of both the α-helical and PPII regions (Figure 6.4 A and B).

Together, these data justify an interpretation of the N-terminal region of the

SH3-PRE DC map in terms of helical structure.
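The comparison between the Ramachandran plot of a sub-region and that of the whole sequence can be sketched as below. The function names, the bin width and in particular the boundaries chosen for the α-helical and PPII regions are illustrative assumptions and are not the definitions used in this thesis.

```python
import numpy as np

def ramachandran_density(phi, psi, bins=36):
    """p(phi, psi) over all residues and structures, normalised to sum to 1."""
    counts, phi_edges, psi_edges = np.histogram2d(
        np.ravel(phi), np.ravel(psi), bins=bins,
        range=[[-180.0, 180.0], [-180.0, 180.0]])
    return counts / counts.sum(), phi_edges, psi_edges

def region_population(phi, psi, phi_range, psi_range):
    """Fraction of (phi, psi) pairs that fall inside a rectangular region."""
    phi = np.ravel(phi)
    psi = np.ravel(psi)
    inside = ((phi >= phi_range[0]) & (phi < phi_range[1]) &
              (psi >= psi_range[0]) & (psi < psi_range[1]))
    return inside.mean()

# Illustrative region boundaries in degrees (assumed, not from the thesis).
ALPHA = ((-100.0, -30.0), (-80.0, -10.0))
PPII = ((-110.0, -50.0), (110.0, 180.0))
```

With dihedral angles stored as arrays of shape (n_structures, n_residues), the populations for a sub-region are obtained by slicing the residue axis before calling `region_population`, and can then be compared with the values for the full arrays.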

Experimental data that report on dynamics, including R1 and R2 relaxation

rates and RDCs (Figure 6.7 B), suggest an increased stiffness of residues 3−23 in

SH3-AS at both temperatures410,411. Additionally, the Iox/Ired obtained from

PRE-NMR are consistently high in the N-terminus, particularly for residues

1−7 and 14−23, regardless of the position of the spin-label (Figure 6.1). These

regions may therefore be extended away from the remainder of the protein in

SH3-AS, although the ensemble-averaged SASA of this region is only slightly

higher than for the remainder of the sequence (Figure 6.6 B).

Both the experimental data for SH3-AS and the DC map of SH3-PRE sug-

gest that the N-terminus has a tendency to form α-helical and perhaps also

PPII structure, and is relatively extended compared to the remainder of the

polypeptide chain. These structural tendencies are clearly non-native, as in

SH3-NS, residues 7− 13 form the first β-strand, and residues 13− 30 comprise

the RT loop. Interestingly, however, the bend in the RT loop around residue

21 appears to be retained in SH3-AS, as can be seen by the low DC values of

SH3-PRE between residues 15−20 and 20−25. This region has been implicated

in aggregation (see Section 6.6), despite its hydrophilicity and low aggregation

propensity (Figure 6.6 C and D).

C-terminus

The experimental data for the C-terminus of SH3-AS are less consistent in

terms of the implied structural propensities. The Cα and Hα secondary shifts

measured at 298 K indicate a slight helical propensity for residues 60 − 64410;

correspondingly, residues 61− 68 have a high predicted helical propensity (Fig-

ure 6.6 A). On the other hand, residues 35 − 41 and 72 − 79, which are also

predicted to be helical, exhibit ∆δ more in keeping with extended or β-sheet

structure, as does most of the sequence from residue 23 onwards410.

Of the observables that report on dynamics, the RDCs for residues 23− 86

are all positive and vary little in magnitude410 (Figure 6.7 B). The R2 relaxation

rates, however, are slightly larger than average for residues 55− 60 and 75− 77

at 298 K410 and residues 51−63 and 72−78 at 308 K411. These two regions are

[Plot: four panels (A to D) plotted against Residue Number (0 to 80); y-axes: Helical Content, SASA (Å2), K-D Hydrophobicity and Zaggprof.]

Figure 6.6: (A) Helical propensity predicted using agadir361–364 for PI3-SH3

at pH 6.0 (black) and pH 2.0 (red). (B) SASA of SH3-NS (PDB code 1pnj41,

black) and SH3-PRE (red). (C) The KD hydrophobicity profile365 of PI3-SH3, smoothed over an 11-residue window. Positive values correspond to hydrophobic regions. (D) The aggregation propensity profile, $Z_{agg}^{prof}$, calculated using the Zyggregator algorithm366 for PI3-SH3 at pH 6.0 (black) and pH 2.0 (red). $Z_{agg}^{prof}$

values greater than 1 indicate regions that are aggregation prone.

hereafter referred to as ‘Reg1’ and ‘Reg2’. Because the same pattern also oc-

curs in the R1 relaxation rates and heteronuclear nOes, it probably results from

reduced mobility of Reg1 and Reg2 rather than conformational exchange411.

This restricted motion is retained at 308 K despite the apparent lack of sec-

ondary structural preferences411. Residues within Reg1 and Reg2 exhibit lower

Iox/Ired in the PRE-NMR profiles for spin-labels situated in the N-terminus.

Correspondingly, the DC map of SH3-PRE shows that the distances between

these two regions and the N-terminus are shorter than expected for a purely

random coil (Figure 6.3 B). Thus Reg1 and Reg2, which may be slightly stiffer

due to their restricted motion, preferentially form intramolecular contacts with

the N-terminus. These interactions may be hydrophobic in nature, as the hy-

drophobicity profile shows that Reg1 and Reg2 are slightly more hydrophobic

than the surrounding residues (Figure 6.6 C).
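A smoothed hydrophobicity profile of the kind referred to above can be sketched as follows. The Kyte-Doolittle scale values are the standard published ones; the function name, the treatment of the chain ends (windows that would extend beyond the sequence are simply omitted) and the example peptide are assumptions.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale (positive values are hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def kd_profile(sequence, window=11):
    """Sliding-window mean of the Kyte-Doolittle values.

    Returns len(sequence) - window + 1 values; element k is the mean over
    residues k to k + window - 1 (0-based).
    """
    values = np.array([KYTE_DOOLITTLE[aa] for aa in sequence])
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Hypothetical peptide used only to illustrate the call.
profile = kd_profile("GSAEKLLVIAGDEERKKLVAG", window=11)
```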

As well as isolating regions with reduced mobility, measurement of the R1

and R2 relaxation rates also allows the most flexible sections of SH3-AS to

be identified. The data recorded at both 298 and 308 K show that residues

27 − 47, which act as a bridge between the extended N-terminus and the pre-

dominantly unstructured C-terminus, are more mobile than the remainder of

the protein. The relative orientation of the N- and C-termini is therefore likely

to undergo frequent rearrangements. This explains the large number of residue

pairs involving the N- and C-terminus that have low DC values in SH3-PRE,

as steric considerations make it unlikely that they could all interact simultane-

ously. Higher mobility of this region also corresponds to the finding that the

first site of proteolysis of PI3-SH3 at pH 2.0 at both 295−297 K and 308 K is the

peptide bond between residues 39 and 4052. Interestingly, residues in this re-

gion form a short helical turn in the n-Src loop of SH3-NS, and there is some

evidence in the SH3-PRE DC map that turn-like structure may be retained in

SH3-AS, as indicated by the relatively short distances between neighbouring

residues (Figure 6.3 B). The Iox/Ired for residues 45 − 50 tend to be relatively

large for all spin-label positions (Figure 6.1)411, suggesting that this region is

often located distant from the remainder of the structure, thus permitting its observed mobility.

Summary of the residual structure

Overall, interpretation of the experimental data in combination with analysis

of SH3-NS, SH3-PRE and SH3-CLIB leads to a picture of SH3-AS as possessing

a mostly disordered C-terminus, an extended N-terminus with some residual

helical propensity, and a flexible region comprising residues 27 − 47 that is

susceptible to proteolysis. The occurrence of the lowest DC values for SH3-

PRE for interactions between the C-terminus and residues 5− 25 suggests that

in SH3-AS the extended N-terminus folds back against the C-terminus. All of

these secondary and tertiary structural propensities are non-native in nature,

which has important implications for the initiation of aggregation. These are

discussed in more detail in Section 6.6.

6.5.3 Comparison with experimental data

The ultimate test of the quality of the ensembles discussed here is whether they

are capable of reproducing independent experimental observables. 3J-couplings

have not been measured for PI3-SH3, thus only the RDCs were considered. The

RDCs calculated from SH3-NS are similar to the experimental values for the

C-terminus, but fail to reproduce the experimental RDCs for the N-terminus,

suggesting that a description of the NS in terms of a single structure is not

sufficient (Figure 6.7 A). The RDCs for SH3-AS are of lower magnitude than

those of SH3-NS. They are all of the same sign (Figure 6.7 B) other than for

residues 3 − 23. The negative RDCs in this region have been interpreted as

corresponding to helical structure, as noted in Section 6.5.2. When urea is

added to SH3-AS (SH3-ASU), the negative RDCs become positive, indicating

that this residual structure is lost under chemical denaturation.

The RDCs calculated from SH3-CLIB fluctuate greatly from residue to

residue, and fail to reproduce the experimental data for either SH3-AS or SH3-

ASU (Figure 6.7 D). As only 5000 structures were available, lack of convergence

of the calculated RDCs cannot be ruled out as an explanation for this discrep-

ancy. The RDCs calculated from SH3-PRE are all of much lower magnitude

than the experimental RDCs for either SH3-AS or SH3-ASU (Figure 6.7 C).

Whilst there are some negative values, only one is situated within the proposed

helical region. Sufficient structures were used for the calculated RDCs to be

converged, meaning that the poor agreement cannot be attributed to statistical

error. Thus despite the residual structure identified in the DC maps, the local

structure of SH3-PRE remains different to that present experimentally.
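
The convergence argument made above can be illustrated with a minimal sketch. It is not the procedure used in this work, which is not detailed in this section. It assumes only that an RDC has been back-calculated for each residue of each structure (for example with PALES) and stored in a hypothetical array of shape (structures, residues). The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def running_ensemble_average(per_structure_rdcs):
    """Ensemble average after including 1, 2, ..., N structures.

    per_structure_rdcs: array-like of shape (n_structures, n_residues), in Hz.
    """
    values = np.asarray(per_structure_rdcs, dtype=float)
    counts = np.arange(1, values.shape[0] + 1)[:, None]
    return np.cumsum(values, axis=0) / counts

def half_ensemble_rmsd(per_structure_rdcs, seed=0):
    """RMSD (Hz) between the averages of two random, disjoint halves.

    A value that is small compared with the experimental RDC range suggests
    that the ensemble average is converged with respect to sampling.
    """
    values = np.asarray(per_structure_rdcs, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(values.shape[0])
    half = values.shape[0] // 2
    first = values[order[:half]].mean(axis=0)
    second = values[order[half:2 * half]].mean(axis=0)
    return float(np.sqrt(np.mean((first - second) ** 2)))

# Synthetic stand-in for real data: 5000 structures, 80 residues.
rng = np.random.default_rng(1)
synthetic = rng.normal(loc=2.0, scale=10.0, size=(5000, 80))

rmsd_small = half_ensemble_rmsd(synthetic[:200])  # few structures
rmsd_large = half_ensemble_rmsd(synthetic)        # all structures
print(f"half-ensemble RMSD, 200 structures:  {rmsd_small:.2f} Hz")
print(f"half-ensemble RMSD, 5000 structures: {rmsd_large:.2f} Hz")
```

If the half-ensemble RMSD remains comparable to the discrepancy with experiment, as may be the case for a small ensemble, statistical error cannot be excluded as its cause.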

[Figure 6.7: four panels (A–D), each plotting RDC (Hz) against Residue Number.]

Figure 6.7: Comparison of the experimental and calculated RDCs for PI3-SH3 in

a variety of states. (A) RDCs for SH3-NS obtained experimentally in stretched

polyacrylamide gels at pH 7.0 [410] are shown in black and those calculated using

PALES [190] from the NMR structure (PDB code 1pnj) [41] are in red. (B)

Experimental RDCs for SH3-NS (pH 7.0, black), SH3-AS (pH 2.0, red) and SH3-ASU

(pH 2.0, 7.3 M urea, green) obtained as in (A). (C,D) RDCs calculated from (C)

SH3-PRE and (D) SH3-CLIB are shown in black and the experimental RDCs

measured for SH3-AS and SH3-ASU as described in (B) are in red and green,

respectively. The grey lines at 0 Hz are to guide the eye.

6.5.4 Free energy maps

The free energy maps provide a more global picture of the nature of the struc-

tures comprising the various ensembles. As in Chapters 4 and 5, F (Rg,SASA)

and F (Rg, REE) are considered. SH3-PRE and SH3-CLIB are very different in

terms of both definitions of the free energy (Figure 6.8). As expected given

its lower $\langle R_h^{-1} \rangle^{-1}$, lower Rg values are highly populated by SH3-PRE. This

ensemble displays a wider range of SASA than SH3-CLIB, further confirming

the observation made in Chapter 5 that a large SASA can coincide with a low

Rg. In the case of PI3-SH3, this large accessible surface area may contribute to

its high aggregation propensity rather than towards productive inter-molecular

interactions, as was proposed to occur for αS.

The distribution of F (Rg, REE) for SH3-CLIB is similar to that seen for the

synucleins (Figure 5.10). For SH3-PRE, however, a wide range of REE corre-

spond to the smallest Rg, indicating that even for the most compact structures,

the termini are likely to be highly disordered.

The free energy maps reinforce the conclusion that SH3-AS does not include

native-like structures. The Rg, SASA and REE of SH3-NS (12.8 A, 5781 A2 and

21.8 A, respectively) lie outside of the regions pictured in Figure 6.8 for either

SH3-PRE or SH3-CLIB, indicating that neither of these ensembles contains

structures that resemble SH3-NS in terms of these global parameters.

Figure 6.8: Free energy landscapes of (A,B) SH3-PRE and (C,D) SH3-CLIB.

The free energy is defined as (A,C) F (Rg, SASA) = − ln p(Rg,SASA) and (B,D)

F (Rg, REE) = − ln p(Rg, REE), where REE is the end-to-end distance.
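
The definition F = − ln p used for these maps can be sketched as follows. This is a minimal illustration rather than the code used to produce Figure 6.8: it assumes that the two coordinates (for example Rg and SASA) have already been computed for every structure, estimates p from a two-dimensional histogram, and assigns unpopulated bins an infinite free energy. The function name, the bin count and the synthetic data are assumptions.

```python
import numpy as np

def free_energy_map(x, y, bins=50):
    """Return F(x, y) = -ln p(x, y), with p estimated from a 2D histogram.

    Bins that contain no structures have p = 0 and therefore F = +inf.
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    p = counts / counts.sum()
    with np.errstate(divide="ignore"):  # log(0) is expected for empty bins
        f = -np.log(p)
    return f, x_edges, y_edges

# Synthetic stand-in for per-structure Rg (Å) and SASA (Å^2) values.
rng = np.random.default_rng(0)
rg = rng.normal(loc=25.0, scale=4.0, size=5000)
sasa = rng.normal(loc=9000.0, scale=800.0, size=5000)

f_map, rg_edges, sasa_edges = free_energy_map(rg, sasa, bins=50)
print(f_map.shape, np.isfinite(f_map).sum(), "populated bins")
```

Because F is dimensionless under this definition, the maps compare relative populations within an ensemble and are not thermodynamic free energies in energy units.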

6.6 Implications for aggregation

The motivation for generating and characterising SH3-PRE was to better un-

derstand the factors that stimulate the aggregation of PI3-SH3, and, in a more

general sense, the conversion of NFPs into amyloid fibrils. It appears likely

that unfolding prior to amyloid formation is a general requirement for the

mis-folding of NFPs, as at least partial unfolding prior to fibril formation is

required for many proteins [35,415], most obviously those that are predominantly

α-helical [39,416]. PI3-SH3 fits into this scheme, as acid-denaturation is required

to stimulate its conversion into amyloid fibrils and both the experimental mea-

surements made on SH3-AS and the representative ensemble characterised here

(SH3-PRE) demonstrate that SH3-AS is unfolded relative to SH3-NS. This

reflects the need for rearrangement of the native structure into the fibril

structure [50] and the inability of SH3-NS to form fibrils directly [411].

It has been suggested that amyloid fibril formation is initiated from

partially folded rather than completely unfolded states [3,47,411,417]. The presence

of non-native structure in SH3-AS, as suggested by the experimental data and

corroborated by analysis of the ensemble generated here using PRE-ERMD, is

consistent with such a scenario. Two other proteins that aggregate at acidic

pH (4–5), transthyretin and β2-microglobulin, are also known to adopt partly

structured conformations under these conditions [418,419].

The aggregation behaviour of PI3-SH3 can be rationalised by considering

its pH dependence in concert with its residual structure at low pH. Whilst

PI3-SH3 initially gains two positive charges as the pH is lowered below 3.0,

further reduction in the pH is unlikely to result in any additional changes to the

ionisation state of the protein [48]. Instead, the increased concentration of anions

provided by the agent (such as HCl) used to lower the pH is thought to screen

the positive charges, thus reducing the electrostatic repulsion and favouring

compaction and aggregation. The absence of two conserved basic residues from

the diverging turn of PI3-SH3 and the majority of the other SH3 domains known

to aggregate [55] may also contribute to this effect.

Other residues known to play a critical role in the aggregation of PI3-SH3

are the charged residues in the RT loop and diverging turn (17–25) [51,54] and

Y55 [49]. These are not identified as being aggregation prone by the Zyggregator

algorithm [366] (Figure 6.6 C and D) as it is most sensitive to hydrophobic

regions. The probability of inter-molecular associations involving residues 17–25

is enhanced by the neutralisation of the negatively charged residues (E19, E21,

E22, D23 and D25) at low pH and the reduced repulsion between the positively

charged residues due to anionic screening coupled with the extended nature of

the N-terminus in SH3-AS, which may increase the exposure of this region. Sim-

ilarly, Y55 is located in the flexible central region, and is therefore also likely

to be accessible. The tendency for the N-terminus to fold back against the

C-terminus may help to maintain it in an aggregation-competent conformation.

6.7 Conclusions

PRE-ERMD was used to characterise the acid-denatured state of the NFP PI3-

SH3 with the aim of understanding the causes of the high aggregation propensity

of this state. Analysis of the expected Iox/Ired for combinations of SH3-NS and

SH3-RC suggests that the experimental data cannot be explained purely in terms

of native-like and random coil structure. Comparison of the TS ensembles of

three related SH3 domains with their respective native structures further con-

firmed that the residual structure observed for SH3-AS is not native-like. An

ensemble of structures generated using a coil library database to describe the di-

hedral angle preferences also failed to explain both the global and local structure

of SH3-AS. Characterisation of the ensemble of structures representative of the

acid-denatured, amyloidogenic state of PI3-SH3 generated using PRE-ERMD

provided insight into the nature of the structural propensities of SH3-AS, which

for the most part coincide with the residual structure suggested by the ex-

perimental data. Although the quantitative agreement with the experimental

RDCs is not good, this is not unexpected, as the PRE distances do not offer

any means of altering the description of the local structural preferences provided

by the force-field, which is clearly not accurate in the case of DS. The residual

structure identified for SH3-PRE allowed the mechanism by which charge af-

fects the aggregation properties of SH3-AS to be elucidated. Like αS, the key

determinant of PI3-SH3 aggregation is not the most obvious difference between

it and other related but non-amyloidogenic proteins, but the result of a more

subtle interplay between charge and environment.

Chapter 7

Conclusions

Interest in characterising DS of proteins has recently grown as a result of the

escalating amounts of high resolution structural data made available by develop-

ments in techniques such as NMR spectroscopy. Understanding DS is important

as this is the reference state from which both folding and mis-folding are ini-

tiated. Additionally, an increasing number of proteins are being shown to be

natively disordered. DS typically comprise a heterogeneous range of conforma-

tions, thus they cannot be described in terms of a single structure, making an

ensemble representation essential. Experimental observables, however, are al-

most always averages over the duration of the experiment and the ensemble of

molecules present. In order to define a DS ensemble it is necessary to know the

distribution of values underlying each experimental observable. Biomolecular

simulation can therefore complement experimental measurements as it allows

the nature of the structures comprising the ensemble and their relative popula-

tions to be determined. The aims of this thesis were to develop the best possible

method for generating ensembles of structures characteristic of DS of proteins

and to apply this method to gain insight into the factors that govern the balance

between folding, mis-folding and intrinsic disorder.

In Chapter 3, the ability of a range of simulation techniques to produce en-

sembles of structures representative of DS of proteins was assessed using the

IDP αS as a model system. It was found that generating sufficiently expanded

structures poses a significant difficulty. A solution was identified whereby al-

tering T provides a means of controlling the range of accessible conformations

and their global dimensions. However, even when the global dimensions match

those measured experimentally, other experimental observables that report on

both local and long-range structure are not well reproduced. This led to the

investigation in Chapter 4 of the use of long-range distances derived from PRE-

NMR experiments as restraints in ERMD simulations. The use of synthetic

data back-calculated from two reference ensembles of αS structures allowed the

effectiveness of the methods that were tested to be assessed in terms of their

ability to reproduce distributions as well as averages. This showed that obtain-

ing a good agreement of average values, particularly highly non-linear averages

such as the r−6-averaged PRE distances, is not sufficient to determine whether

an ensemble has been accurately reconstructed. Cross-validation against more

than one type of average was therefore introduced. It also emerged that the

compaction problem encountered in Chapter 3 is exacerbated by the r−6 nature

of the PRE distances. Again, manipulating T provides a means of overcom-

ing this problem. In a further change to the previously published method, the

tolerance to variation in the back-calculated, ensemble-averaged PRE distances

at each point in time was altered so as to account for the typical relationship

between an r−6 average and the underlying distribution.
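
The point that agreement of an r−6 average does not fix the underlying distribution can be illustrated with a short numerical sketch. The two distance distributions below are invented for illustration and are not taken from the reference ensembles: one is a single fixed distance, the other places 10% of the structures at 10 Å and 90% at 40 Å. The function name is an assumption.

```python
import numpy as np

def r6_average(distances):
    """Return the r^-6-averaged distance, <r^-6>^(-1/6)."""
    d = np.asarray(distances, dtype=float)
    return float(np.mean(d ** -6.0) ** (-1.0 / 6.0))

# Broad, bimodal distribution: a minority of close contacts (distances in Å).
broad = np.concatenate([np.full(100, 10.0), np.full(900, 40.0)])

# Narrow distribution: every structure sits at the r^-6 average of `broad`.
narrow = np.full(1000, r6_average(broad))

print(f"broad:  r^-6 average = {r6_average(broad):.2f} Å, "
      f"arithmetic mean = {broad.mean():.2f} Å")
print(f"narrow: r^-6 average = {r6_average(narrow):.2f} Å, "
      f"arithmetic mean = {narrow.mean():.2f} Å")
```

Both distributions give the same r−6 average although their arithmetic means differ greatly, because the average is dominated by the few short distances. This is why a second, differently weighted observable is needed for cross-validation.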

The general method resulting from the work summarised in Chapter 4 was

applied in Chapters 5 and 6 to characterise the IDPs αS, βS and the related

artificial construct β+HC and the acid-denatured state of the NFP PI3-SH3.

As part of this work, the sources of uncertainty in the distances calculated

from experimental PRE data were thoroughly investigated and combined with

a modified definition of a ‘PRE’ distance developed in Chapter 4. Analysis of

the ensembles produced using PRE-ERMD revealed that although global prop-

erties such as the Rh and PRE distances match their experimental counterparts,

the quantitative agreement with observables that report on local structure such

as 3J-couplings and RDCs is not as good. The long-range PRE distances are

therefore not capable of altering the local structural properties encoded in the

force-field and implicit solvent models, which, at least in the situations explored

here, do not provide a good description of the local structure of DS. Coil li-

brary ensembles, widely touted in the literature as good models for DS, were

also found to be unsuitable. However, despite the poor results of the

quantitative comparisons with experimental data, the DC maps, developed as part

of this work, portray a significant amount of residual structure, much of which

is in keeping with the structural propensities inferred from the experimental

data. Further scrutiny of this residual structure revealed the important role

that charge plays in determining the aggregation properties of the proteins con-

sidered here, allowing the differing aggregation propensities of αS and βS to

be rationalised and explaining why acid-denaturation is required to stimulate

amyloid fibril formation by PI3-SH3.

In summary, existing methods for using PRE distances as restraints in

ERMD were improved upon so that an ensemble could be accurately recon-

structed in terms of distributions as well as averages. Application of the new,

general method to a family of IDPs and the unfolded state of a NFP corrobo-

rated the presence of residual structure implied by the experimental data and

provided new insight into the relationship between the structural and aggrega-

tion propensities of these proteins. In the future, inclusion of local structural

information as well as long-range distance restraints has the potential to further

enhance this technique.

References

1. Vendruscolo, M., Zurdo, J., MacPhee, C.E. & Dobson, C.M. Protein fold-

ing and misfolding: a paradigm of self-assembly and regulation in complex

biological systems. Philos. Transact. A Math Phys. Eng. Sci. 361, 1205–22

(2003).

2. Dobson, C.M., Sali, A. & Karplus, M. Protein folding: a perspective from

theory and experiment. Angew. Chem. Int. Ed. 37, 868–93 (1998).

3. Dobson, C.M. Protein folding and misfolding. Nature 426, 884–90 (2003).

4. Wright, P.E. & Dyson, H.J. Intrinsically unstructured proteins: re-

assessing the protein structure-function paradigm. J. Mol. Biol. 293, 321–

31 (1999).

5. James, L.C. & Tawfik, D.S. Conformational diversity and protein evolution

- a 60-year-old hypothesis revisited. Trends Biochem. Sci. 28, 361–8 (2003).

6. Uversky, V.N. Protein folding revisited. A polypeptide chain at the folding

misfolding nonfolding cross-roads: which way to go? Cell. Mol. Life Sci.

60, 1852–71 (2004).

7. Sickmeier, M., Hamilton, J.A., LeGall, T., Vacic, V., Cortese, M.S., Tan-

tos, A., Szabo, B., Tompa, P., Chen, J., Uversky, V.N., Obradovic, Z. &

Dunker, A.K. DisProt: the database of disordered proteins. Nucl. Acids

Res. 35, D786–93 (2007).

8. Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F. & Jones, D.T. Pre-

diction and functional analysis of native disorder in proteins from the three

kingdoms of life. J. Mol. Biol. 337, 635–45 (2004).

9. Hegyi, H. & Gerstein, M. The relationship between protein structure and

function: a comprehensive survey with application to the yeast genome.

J. Mol. Biol. 288, 147–64 (1999).

10. Uversky, V.N. Natively unfolded proteins: a point where biology waits for

physics. Protein Sci. 11, 739–56 (2002).

11. Dunker, A.K., Brown, C.J., Lawson, J.D., Iakoucheva, L.M. & Obradovic,

Z. Intrinsic disorder and protein function. Biochemistry 41, 6573–82

(2002).

12. Dill, K.A. & Shortle, D. Denatured states of proteins. Annu. Rev. Biochem.

60, 795–825 (1991).

13. Baldwin, R.L. A new perspective on unfolded proteins. Adv. Prot. Chem.

62, 361–7 (2002).

14. Levinthal, C. Are there pathways for protein folding? J. Chim. Phys. 65,

44 (1968).

15. Goldenberg, D.P. Computational simulation of the statistical properties

of unfolded proteins. J. Mol. Biol. 326, 1615–33 (2003).

16. Bennett, M., Schlunegger, M. & Eisenberg, D. 3D Domain swapping: a

mechanism for oligomer assembly. Protein Sci. 4, 2455–68 (1995).

17. Schlunegger, M., Bennett, M. & Eisenberg, D. Oligomer formation by 3D

domain swapping: a model for protein assembly and misassembly. Adv.

Prot. Chem. 50, 61–122 (1997).

18. Bucciantini, M., Giannoni, E., Chiti, F., Baroni, F., Formigli, L., Zurdo,

J., Taddei, N., Ramponi, G., Dobson, C.M. & Stefani, M. Inherent toxicity

of aggregates implies a common mechanism for protein misfolding diseases.

Nature 416, 507–11 (2002).

19. Lashuel, H.A., Hartley, D., Petre, B.M., Walz, T. & Lansbury, P.T. Neu-

rodegenerative disease: amyloid pores from pathogenic mutations. Nature

418, 291 (2002).

20. Lashuel, H.A., Hartley, D.M., Petre, B.M., Wall, J.S., Simon, M.N., Walz,

T. & Lansbury, P.T. Mixtures of wild-type and a pathogenic (E22G) form

of Aβ40 in vitro accumulate protofibrils, including amyloid pores. J. Mol.

Biol. 332, 795–808 (2003).

21. Fersht, A.R. & Daggett, V. Protein folding and unfolding at atomic reso-

lution. Cell 108, 573–82 (2002).

22. Daggett, V. & Fersht, A.R. Is there a unifying mechanism for protein

folding? Trends Biochem. Sci. 28, 18–25 (2003).

23. Daggett, V. & Fersht, A. The present view of the mechanism of protein

folding. Nat. Rev. Mol. Cell Biol. 4, 497–502 (2003).

24. Shakhnovich, E., Abkevich, V. & Ptitsyn, O. Conserved residues and the

mechanism of protein folding. Nature 379, 96–8 (1996).

25. Vendruscolo, M., Paci, E., Dobson, C.M. & Karplus, M. Three key residues

form a critical contact network in a protein folding transition state. Nature

409, 641–5 (2001).

26. Fersht, A.R. Transition-state structure as a unifying basis in protein-

folding mechanisms: contact order, chain topology, stability, and the ex-

tended nucleus mechanism. Proc. Natl. Acad. Sci. U. S. A. 97, 1525–9

(2000).

27. Dinner, A.R., Sali, A., Smith, L.J., Dobson, C.M. & Karplus, M. Under-

standing protein folding via free-energy surfaces from theory and experi-

ment. Trends Biochem. Sci. 25, 331–9 (2000).

28. Ozkan, S.B., Wu, G.A., Chodera, J.D. & Dill, K.A. Protein folding by

zipping and assembly. Proc. Natl. Acad. Sci. U. S. A. 104, 11987–92

(2007).

29. Bryngelson, J.D., Onuchic, J.N., Socci, N.D. & Wolynes, P.G. Funnels,

pathways, and the energy landscape of protein folding: a synthesis. Pro-

teins: Struct. Funct. Genet. 21, 167–95 (1995).

30. Vendruscolo, M., Paci, E., Karplus, M. & Dobson, C.M. Structures and

relative free energies of partially folded states of proteins. Proc. Natl. Acad.

Sci. U. S. A. 100, 14817–21 (2003).

31. Choy, W.Y. & Forman-Kay, J.D. Calculation of ensembles of structures

representing the unfolded state of an SH3 domain. J. Mol. Biol. 308,

1011–32 (2001).

32. Roder, H. & Colon, W. Kinetic role of early intermediates in protein

folding. Curr. Opin. Struct. Biol. 7, 15–28 (1997).

33. Dobson, C.M. The structural basis of protein folding and its links with

human disease. Philos. Trans. R. Soc. Lond. B Biol. Sci. 356, 133–145

(2001).

34. Horwich, A. Protein aggregation in disease: a role for folding intermediates

forming specific multimeric interactions. J. Clin. Invest. 110, 1221–32

(2002).

35. Kelly, J.W. The alternative conformations of amyloidogenic proteins and

their multi-step assembly pathways. Curr. Opin. Struct. Biol. 8, 101–6

(1998).

36. Caughey, B. & Lansbury, P.T. Protofibrils, pores, fibrils, and neurodegen-

eration: separating the responsible protein aggregates from the innocent

bystanders. Annu. Rev. Neurosci. 26, 267–98 (2003).

37. Bucciantini, M., Calloni, G., Chiti, F., Formigli, L., Nosi, D., Dobson,

C.M. & Stefani, M. Prefibrillar amyloid protein aggregates share common

features of cytotoxicity. J. Biol. Chem. 279, 31374–82 (2004).

38. Walsh, D.M., Klyubin, I., Fadeeva, J.V., Cullen, W.K., Anwyl, R., Wolfe,

M.S., Rowan, M.J. & Selkoe, D.J. Naturally secreted oligomers of amyloid-

β protein potently inhibit hippocampal long-term potentiation in vivo.

Nature 416, 535–9 (2002).

39. Sunde, M. & Blake, C. The structure of amyloid fibrils by electron mi-

croscopy and X-ray diffraction. Adv. Prot. Chem. 50, 123–59 (1997).

40. Dobson, C.M. Protein-misfolding diseases: getting out of shape. Nature

418, 729–30 (2002).

41. Booker, G.W., Gout, I., Downing, A.K., Driscoll, P.C., Boyd, J., Water-

field, M.D. & Campbell, I.D. Solution structure and ligand-binding site

of the SH3 domain of the p85 α subunit of phosphatidylinositol 3-kinase.

Cell 73, 813–22 (1993).

42. Morton, C.J. & Campbell, I.D. SH3 domains. Molecular ‘velcro’. Curr.

Biol. 4, 615–7 (1994).

43. Musacchio, A., Wilmanns, M. & Saraste, M. Structure and function of the

SH3 domain. Prog. Biophys. Mol. Biol. 61, 283–97 (1994).

44. Pawson, T. & Gish, G.D. SH2 and SH3 domains: from structure to func-

tion. Cell 71, 359–62 (1992).

45. Guijarro, J.I., Morton, C.J., Plaxco, K.W., Campbell, I.D. & Dobson,

C.M. Folding kinetics of the SH3 domain of PI3 kinase by real-time NMR

combined with optical spectroscopy. J. Mol. Biol. 276, 657–67 (1998).

46. Lindorff-Larsen, K., Vendruscolo, M., Paci, E. & Dobson, C.M. Transition

states for protein folding have native topologies despite high structural

variability. Nat. Struct. Mol. Biol. 11, 443–9 (2004).

47. Guijarro, J.I., Sunde, M., Jones, J.A., Campbell, I.D. & Dobson, C.M.

Amyloid fibril formation by an SH3 domain. Proc. Natl. Acad. Sci. U. S.

A. 95, 4224–8 (1998).

48. Zurdo, J., Guijarro, J.I., Jimenez, J.L., Saibil, H.R. & Dobson, C.M. De-

pendence on solution conditions of aggregation and amyloid formation by

an SH3 domain. J. Mol. Biol. 311, 325–40 (2001).

49. Bader, R., Bamford, R., Zurdo, J., Luisi, B.F. & Dobson, C.M. Probing

the mechanism of amyloidogenesis through a tandem repeat of the PI3-

SH3 domain suggests a generic model for protein aggregation and fibril

formation. J. Mol. Biol. 356, 189–208 (2006).

50. Jimenez, J.L., Guijarro, J.I., Orlova, E., Zurdo, J., Dobson, C.M., Sunde,

M. & Saibil, H.R. Cryo-electron microscopy structure of an SH3 amyloid

fibril and model of the molecular packing. EMBO J. 18, 815–21 (1999).

51. Ventura, S., Zurdo, J., Narayanan, S., Parreno, M., Mangues, R., Reif, B.,

Chiti, F., Giannoni, E., Dobson, C.M., Aviles, F.X. & Serrano, L. Short

amino acid stretches can mediate amyloid formation in globular proteins:

the Src homology 3 (SH3) case. Proc. Natl. Acad. Sci. U. S. A. 101,

7258–63 (2004).

52. Polverino de Laureto, P., Taddei, N., Frare, E., Capanni, C., Costantini, S.,

Zurdo, J., Chiti, F., Dobson, C.M. & Fontana, A. Protein aggregation and

amyloid fibril formation by an SH3 domain probed by limited proteolysis.

J. Mol. Biol. 334, 129–41 (2003).

53. Monera, O.D., Kay, C.M. & Hodges, R.S. Protein denaturation with

guanidine hydrochloride or urea provides a different estimate of stability

depending on the contributions of electrostatic interactions. Protein Sci. 3,

1984–91 (1994).

54. Ventura, S., Lacroix, E. & Serrano, L. Insights into the origin of the

tendency of the PI3-SH3 domain to form amyloid fibrils. J. Mol. Biol.

322, 1147–58 (2002).

55. Liepina, I., Ventura, S., Czaplewski, C. & Liwo, A. Molecular dynamics

study of amyloid formation of two Abl-SH3 domain peptides. J. Peptide

Sci. 12, 780–9 (2006).

56. Martin-Garcia, J.M., Luque, I., Mateo, P.L., Ruiz-Sanz, J. & Camara-

Artigas, A. Crystallographic structure of the SH3 domain of the human

c-Yes tyrosine kinase: loop flexibility and amyloid aggregation. FEBS Lett.

581, 1701–6 (2007).

57. Carulla, N., Caddy, G.L., Hall, D.R., Zurdo, J., Gairi, M., Feliz, M., Giralt,

E., Robinson, C.V. & Dobson, C.M. Molecular recycling within amyloid

fibrils. Nature 436, 554–8 (2005).

58. Ding, F., Dokholyan, N.V., Buldyrev, S.V., Stanley, H.E. & Shakhnovich,

E.I. Molecular dynamics simulation of the SH3 domain aggregation sug-

gests a generic amyloidogenesis mechanism. J. Mol. Biol. 324, 851–7

(2002).

59. Gunasekaran, K., Tsai, C.J., Kumar, S., Zanuy, D. & Nussinov, R. Ex-

tended disordered proteins: targeting function with less scaffold. Trends

Biochem. Sci. 28, 81–5 (2003).

60. Dunker, A.K., Brown, C.J. & Obradovic, Z. Identification and functions

of usefully disordered proteins. Adv. Prot. Chem. 62, 25–49 (2002).

61. Dyson, H.J. & Wright, P.E. Coupling of folding and binding for unstruc-

tured proteins. Curr. Opin. Struct. Biol. 12, 54–60 (2002).

62. Dyson, H.J. & Wright, P.E. Intrinsically unstructured proteins and their

functions. Nat. Rev. Mol. Cell Biol. 6, 197–208 (2005).

63. Tompa, P. Intrinsically unstructured proteins. Trends Biochem. Sci. 27,

527–33 (2002).

64. Tompa, P. & Csermely, P. The role of structural disorder in the function

of RNA and protein chaperones. FASEB J. 18, 1169–75 (2004).

65. Dunker, A.K., Cortese, M.S., Romero, P., Iakoucheva, L.M. & Uversky,

V.N. Flexible nets. The roles of intrinsic disorder in protein interaction

networks. FEBS J. 272, 5129–48 (2005).

66. Tompa, P., Szasz, C. & Buday, L. Structural disorder throws new light on

moonlighting. Trends Biochem. Sci. 30, 484–9 (2005).

67. Uversky, V. A protein-chameleon: conformational plasticity of α-synuclein,

a disordered protein involved in neurodegenerative disorders. J. Biomol.

Struct. Dyn. 21, 211–34 (2003).

68. Dunker, A.K., Obradovic, Z., Romero, P., Garner, E.C. & Brown, C.J. In-

trinsic protein disorder in complete genomes. Genome Inform. Ser. Work-

shop Genome Inform. 11, 161–71 (2000).

69. Fink, A.L. Natively unfolded proteins. Curr. Opin. Struct. Biol. 15, 35–41

(2005).

70. Bracken, C., Iakoucheva, L.M., Romero, P.R. & Dunker, A.K. Combining

prediction, computation and experiment for the characterization of protein

disorder. Curr. Opin. Struct. Biol. 14, 570–6 (2004).

71. Tompa, P. Intrinsically unstructured proteins evolve by repeat expansion.

BioEssays 25, 847–55 (2003).

72. Tompa, P. The interplay between structure and function in intrinsically

unstructured proteins. FEBS Lett. 579, 3346–54 (2005).

73. Dunker, A.K., Lawson, J.D., Brown, C.J., Williams, R.M., Romero, P.,

Oh, J.S., Oldfield, C.J., Campen, A.M., Ratliff, C.M., Hipps, K.W., Ausio,

J., Nissen, M.S., Reeves, R., Kang, C., Kissinger, C.R., Bailey, R.W.,

Griswold, M.D., Chiu, W., Garner, E.C. & Obradovic, Z. Intrinsically

disordered protein. J. Mol. Graph. Model. 19, 26–59 (2001).

74. Iakoucheva, L.M., Brown, C.J., Lawson, J.D., Obradovic, Z. & Dunker,

A.K. Intrinsic disorder in cell-signaling and cancer-associated proteins. J.

Mol. Biol. 323, 573–84 (2002).

75. Uversky, V.N. What does it mean to be natively unfolded? Eur. J.

Biochem. 269, 2–12 (2002).

76. Demchenko, A.P. Recognition between flexible protein molecules: induced

and assisted folding. J. Mol. Recognit. 14, 42–61 (2001).

77. Namba, K. Roles of partly unfolded conformations in macromolecular

self-assembly. Genes to Cells 6, 1–12 (2001).

78. Romero, P., Obradovic, Z. & Dunker, K. Sequence data analysis for long

disordered regions prediction in the calcineurin family. Genome Inform.

Ser. Workshop Genome Inform. 8, 110–24 (1997).

79. Romero, P., Obradovic, Z., Kissinger, C., Villafranca, J., Garner, E., Guil-

liot, S. & Dunker, A. Thousands of proteins likely to have long disordered

regions. Pac. Symp. Biocomput. 3, 437–48 (1998).

80. Romero, P., Obradovic, Z., Li, X., Garner, E.C., Brown, C.J. & Dunker,

A.K. Sequence complexity of disordered protein. Proteins: Struct. Funct.

Genet. 42, 38–48 (2001).

81. Dunker, A., Garner, E., Guilliot, S., Romero, P., Albrecht, K., Hart, J.,

Obradovic, Z., Kissinger, C. & Villafranca, J. Protein disorder and the

evolution of molecular recognition: theory, predictions and observations.

Pac. Symp. Biocomput. 3, 473–84 (1998).

82. Romero, P., Obradovic, Z., Kissinger, C., Villafranca, J. & Dunker, A.

Identifying disordered regions in proteins from amino acid sequence. Proc.

Int. Conf. Neur. Net. 1, 90–5 (1997).

83. Uversky, V.N., Gillespie, J.R. & Fink, A.L. Why are “natively unfolded”

proteins unstructured under physiologic conditions? Proteins: Struct.

Funct. Genet. 41, 415–27 (2000).

84. Oldfield, C., Cheng, Y., Cortese, M., Brown, C., Uversky, V. & Dunker,

A. Comparing and combining predictors of mostly disordered proteins.

Biochemistry 44, 1989–2000 (2005).

85. Li, X., Romero, P., Rani, M., Dunker, A. & Obradovic, Z. Predicting

protein disorder for N-, C-, and internal regions. Genome Inform. Ser.

Workshop Genome Inform. 10, 30–40 (1999).

86. Obradovic, Z., Peng, K., Vucetic, S., Radivojac, P., Brown, C.J. & Dunker,

A.K. Predicting intrinsic disorder from amino acid sequence. Proteins:

Struct. Funct. Genet. 53, 566–72 (2003).

87. Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. IUPred: web server

for the prediction of intrinsically unstructured regions of proteins based on

estimated energy content. Bioinformatics 21, 3433–4 (2005).

88. Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. The pairwise en-

ergy content estimated from amino acid composition discriminates between

folded and intrinsically unstructured proteins. J. Mol. Biol. 347, 827–39

(2005).

89. Peng, K., Radivojac, P., Vucetic, S., Dunker, A.K. & Obradovic, Z.

Length-dependent prediction of protein intrinsic disorder. BMC Bioin-

formatics 7, 208 (2006).

90. Tompa, P., Dosztanyi, Z. & Simon, I. Prevalent structural disorder in E.

coli and S. cerevisiae proteomes. J. Proteome Res. 5, 1996–2000 (2006).

91. Uversky, V. & Narizhneva, N. Effect of natural ligands on the structural

properties and conformational stability of proteins. Biochemistry (Mosc)

63, 420–33 (1998).

92. Ellis, R.J. Macromolecular crowding: obvious but underappreciated.

Trends Biochem. Sci. 26, 597–604 (2001).

93. Ellis, R.J. Macromolecular crowding: an important but neglected aspect of

the intracellular environment. Curr. Opin. Struct. Biol. 11, 114–9 (2001).

94. Flaugh, S.L. & Lumb, K.J. Effects of macromolecular crowding on the

intrinsically disordered proteins c-Fos and p27(Kip1). Biomacromolecules

2, 538–40 (2001).

95. Zurdo, J., Sanz, J., Gonzalez, C., Rico, M. & Ballesta, J. The exchangeable

yeast ribosomal acidic protein YP2β shows characteristics of a partly folded

state under physiological conditions. Biochemistry 36, 9625–35 (1997).

96. Dedmon, M.M., Patel, C.N., Young, G.B. & Pielak, G.J. FlgM gains

structure in living cells. Proc. Natl. Acad. Sci. U. S. A. 99, 12681–4

(2002).

97. Spolar, R. & Record, M.T., Jr. Coupling of local folding to site-specific

binding of proteins to DNA. Science 263, 777–84 (1994).

98. Weiss, M.A., Ellenberger, T., Wobbe, C.R., Lee, J.P., Harrison, S.C. &

Struhl, K. Folding transition in the DNA-binding domain of GCN4 on

specific binding to DNA. Nature 347, 575–8 (1990).

99. Lacy, E.R., Filippov, I., Lewis, W.S., Otieno, S., Xiao, L., Weiss, S.,

Hengst, L. & Kriwacki, R.W. p27 binds cyclin-CDK complexes through

a sequential mechanism involving binding-induced protein folding. Nat.

Struct. Mol. Biol. 11, 358–64 (2004).

100. Fiebig, K.M., Rice, L.M., Pollock, E. & Brunger, A.T. Folding interme-

diates of SNARE complex assembly. Nat. Struct. Biol. 6, 117–23

(1999).

101. Magidovich, E., Orr, I., Fass, D., Abdu, U. & Yifrach, O. Intrinsic disor-

der in the C-terminal domain of the Shaker voltage-activated K+ channel

modulates its interaction with scaffold proteins. Proc. Natl. Acad. Sci. U.

S. A. 104, 13022–7 (2007).

102. Ahmed, M., Bamm, V., Harauz, G. & Ladizhansky, V. The BG21 isoform

of golli myelin basic protein is intrinsically disordered with a highly flexible

amino-terminal domain. Biochemistry 46, 9700–12 (2007).

103. Meszaros, B., Tompa, P., Simon, I. & Dosztanyi, Z. Molecular principles of

the interactions of disordered proteins. J. Mol. Biol. 372, 549–61 (2007).

104. Dunker, A.K. & Obradovic, Z. The protein trinity - linking function and

disorder. Nat. Biotech. 19, 805–6 (2001).

105. Kriwacki, R.W., Hengst, L., Tennant, L., Reed, S.I. & Wright, P.E. Struc-

tural studies of p21Waf1/Cip1/Sdi1 in the free and Cdk2-bound state:

conformational disorder mediates binding diversity. Proc. Natl. Acad. Sci.

U. S. A. 93, 11504–9 (1996).

106. Romero, P.R., Zaidi, S., Fang, Y.Y., Uversky, V.N., Radivojac, P., Old-

field, C.J., Cortese, M.S., Sickmeier, M., LeGall, T., Obradovic, Z. &

Dunker, A.K. Alternative splicing in concert with protein intrinsic disor-

der enables increased functional diversity in multicellular organisms. Proc.

Natl. Acad. Sci. U. S. A. 103, 8390–5 (2006).

107. Hilser, V.J. & Thompson, E.B. Intrinsic disorder as a mechanism to opti-

mize allosteric coupling in proteins. Proc. Natl. Acad. Sci. U. S. A. 104,

8311–5 (2007).

108. Weinreb, P., Zhen, W., Poon, A., Conway, K. & Lansbury, P. NACP, a

protein implicated in Alzheimer’s disease and learning, is natively unfolded.

Biochemistry 35, 13709–15 (1996).

109. Eliezer, D., Kutluay, E., Bussell, Jr, R. & Browne, G. Conformational

properties of α-synuclein in its free and lipid-associated states. J. Mol.

Biol. 307, 1061–73 (2001).

110. Uversky, V.N., Li, J., Souillac, P., Millett, I.S., Doniach, S., Jakes, R.,

Goedert, M. & Fink, A.L. Biophysical properties of the synucleins and

their propensities to fibrillate: inhibition of α-synuclein assembly by β-

and γ-synucleins. J. Biol. Chem. 277, 11970–8 (2002).

111. George, J. The synucleins. Genome Biol. 3, 3002.1–6 (2001).

112. Jakes, R., Spillantini, M.G. & Goedert, M. Identification of two distinct

synucleins from human brain. FEBS Lett. 345, 27–32 (1994).

113. Ueda, K., Fukushima, H., Masliah, E., Xia, Y., Iwai, A., Yoshimoto, M.,

Otero, D., Kondo, J., Ihara, Y. & Saitoh, T. Molecular cloning of cDNA en-

coding an unrecognized component of amyloid in Alzheimer disease. Proc.

Natl. Acad. Sci. U. S. A. 90, 11282–6 (1993).

114. Bussell, Jr, R. & Eliezer, D. A structural and functional role for 11-mer

repeats in α-synuclein and other exchangeable lipid binding proteins. J.

Mol. Biol. 329, 763–78 (2003).

115. Chandra, S., Chen, X., Rizo, J., Jahn, R. & Sudhof, T.C. A broken α-helix

in folded α-synuclein. J. Biol. Chem. 278, 15313–8 (2003).

116. Bisaglia, M., Tessari, I., Pinato, L., Bellanda, M., Giraudo, S., Fasano,

M., Bergantino, E., Bubacco, L. & Mammi, S. A topological model of

the interaction between α-synuclein and sodium dodecyl sulfate micelles.

Biochemistry 44, 329–39 (2005).

117. Bussell, Jr, R., Ramlall, T.F. & Eliezer, D. Helix periodicity, topology,

and dynamics of membrane-associated α-synuclein. Protein Sci. 14, 862–

72 (2005).

118. Ulmer, T.S., Bax, A., Cole, N.B. & Nussbaum, R.L. Structure and dy-

namics of micelle-bound human α-synuclein. J. Biol. Chem. 280, 9595–603

(2005).

119. Jao, C.C., Der-Sarkissian, A., Chen, J. & Langen, R. Structure of

membrane-bound α-synuclein studied by site-directed spin labeling. Proc.

Natl. Acad. Sci. U. S. A. 101, 8331–6 (2004).

120. Sung, Y.H. & Eliezer, D. Secondary structure and dynamics of micelle

bound β- and γ-synuclein. Protein Sci. 15, 1162–74 (2006).

121. Davidson, W.S., Jonas, A., Clayton, D.F. & George, J.M. Stabilization of

α-synuclein secondary structure upon binding to synthetic membranes. J.

Biol. Chem. 273, 9443–9 (1998).

122. Zhu, M., Li, J. & Fink, A.L. The association of α-synuclein with mem-

branes affects bilayer structure, stability, and fibril formation. J. Biol.

Chem. 278, 40186–97 (2003).

123. Cookson, M.R. The biochemistry of Parkinson’s disease. Annu. Rev.

Biochem. 74, 29–52 (2005).

124. Murphy, D.D., Rueter, S.M., Trojanowski, J.Q. & Lee, V.M.Y. Synucleins

are developmentally expressed, and α-synuclein regulates the size of the

presynaptic vesicular pool in primary hippocampal neurons. J. Neurosci.

20, 3214–20 (2000).

125. Narayanan, V. & Scarlata, S. Membrane binding and self-association of

α-synucleins. Biochemistry 40, 9927–34 (2001).

126. Clayton, D.F. & George, J.M. The synucleins: a family of proteins involved

in synaptic function, plasticity, neurodegeneration and disease. Trends

Neurosci. 21, 249–54 (1998).

127. Payton, J.E., Perrin, R.J., Woods, W.S. & George, J.M. Structural de-

terminants of PLD2 inhibition by α-synuclein. J. Mol. Biol. 337, 1001–9

(2004).

128. Jenco, J.M., Rawlingson, A., Daniels, B. & Morris, A.J. Regulation of

phospholipase D2: selective inhibition of mammalian phospholipase D

isoenzymes by α- and β-synucleins. Biochemistry 37, 4901–9 (1998).

129. Kahle, P.J., Haass, C., Kretzschmar, H.A. & Neumann, M. Struc-

ture/function of α-synuclein in health and disease: rational development

of animal models for Parkinson’s and related diseases. J. Neurochem. 82,

449–57 (2002).

130. Moore, D.J., West, A.B., Dawson, V.L. & Dawson, T.M. Molecular patho-

physiology of Parkinson’s disease. Annu. Rev. Neurosci. 28, 57–87 (2005).

131. Spillantini, M.G., Schmidt, M.L., Lee, V.M., Trojanowski, J.Q., Jakes, R.

& Goedert, M. α-synuclein in Lewy bodies. Nature 388, 839–40 (1997).

132. Goedert, M. α-synuclein and neurodegenerative diseases. Nat. Rev. Neu-

rosci. 2, 492–501 (2001).

133. Baba, M., Nakajo, S., Tu, P.H., Tomita, T., Nakaya, K., Lee, V.M., Tro-

janowski, J.Q. & Iwatsubo, T. Aggregation of α-synuclein in Lewy bodies

of sporadic Parkinson’s disease and dementia with Lewy bodies. Am. J.

Pathol. 152, 879–84 (1998).

134. Polymeropoulos, M., Lavedan, C., Leroy, E., Ide, S., Dehejia, A., Dutra, A., Pike,

B., Root, H., Rubenstein, J., Boyer, R., Stenroos, E., Chandrasekharappa,

S., Athanassiadou, A., Papapetropoulos, T., Johnson, W., Lazzarini, A.,

Duvoisin, R., Di Iorio, G., Golbe, L. & Nussbaum, R. Mutation in the

α-synuclein gene identified in families with Parkinson’s disease. Science

276, 2045–7 (1997).

135. Kruger, R., Kuhn, W., Muller, T., Woitalla, D., Graeber, M., Kosel, S.,

Przuntek, H., Epplen, J., Schols, L. & Riess, O. Ala30Pro mutation in the

gene encoding α-synuclein in Parkinson’s disease. Nat. Genet. 18, 106–8

(1998).

136. Zarranz, J.J., Alegre, J., Gomez-Esteban, J.C., Lezcano, E., Ros, R., Am-

puero, I., Vidal, L., Hoenicka, J., Rodriguez, O., Atares, B., Llorens, V.,

Gomez Tortosa, E., del Ser, T., Munoz, D.G. & de Yebenes, J.G. The new

mutation, E46K, of α-synuclein causes Parkinson and Lewy body demen-

tia. Ann. Neurol. 55, 164–73 (2004).

137. Singleton, A., Farrer, M., Johnson, J., Singleton, A., Hague, S., Kacher-

gus, J., Hulihan, M., Peuralinna, T., Dutra, A., Nussbaum, R., Lincoln,

S., Crawley, A., Hanson, M., Maraganore, D., Adler, C., Cookson, M.,

Muenter, M., Baptista, M., Miller, D., Blancato, J., Hardy, J. & Gwinn-

Hardy, K. α-synuclein locus triplication causes Parkinson’s disease. Science

302, 841 (2003).

138. Chartier-Harlin, M.C., Kachergus, J., Roumier, C., Mouroux, V., Douay,

X., Lincoln, S., Levecque, C., Larvor, L., Andrieux, J., Hulihan, M., Wauc-

quier, N., Defebvre, L., Amouyel, P., Farrer, M. & Destee, A. α-synuclein

locus duplication as a cause of familial Parkinson’s disease. Lancet 364,

1167–9 (2004).

139. Ibanez, P., Bonnet, A.M., Debarges, B., Lohmann, E., Tison, F., Pollak,

P., Agid, Y., Durr, A. & Brice, A. Causal relation between α-synuclein

gene duplication and familial Parkinson’s disease. Lancet 364, 1169–71

(2004).

140. Dedmon, M.M., Christodoulou, J., Wilson, M.R. & Dobson, C.M. Heat

shock protein 70 inhibits α-synuclein fibril formation via preferential bind-

ing to prefibrillar species. J. Biol. Chem. 280, 14733–40 (2005).

141. Volles, M. & Lansbury, P. Vesicle permeabilization by protofibrillar α-

synuclein is sensitive to Parkinson’s disease-linked mutations and occurs

by a pore-like mechanism. Biochemistry 41, 4595–602 (2002).

142. Volles, M., Lee, S.J., Rochet, J.C., Shtilerman, M., Ding, T., Kessler,

J. & Lansbury, P. Vesicle permeabilization by protofibrillar α-synuclein:

implications for the pathogenesis and treatment of Parkinson’s disease.

Biochemistry 40, 7812–9 (2001).

143. Lashuel, H.A., Petre, B.M., Wall, J., Simon, M., Nowak, R.J., Walz, T. &

Lansbury, P.T. α-synuclein, especially the Parkinson’s disease-associated

mutants, forms pore-like annular and tubular protofibrils. J. Mol. Biol.

322, 1089–102 (2002).

144. Mori, F., Hayashi, S., Yamagishi, S., Yoshimoto, M., Yagihashi, S.,

Takahashi, H. & Wakabayashi, K. Pick’s disease: α- and β-synuclein-

immunoreactive Pick bodies in the dentate gyrus. Acta Neuropathol. (Berl)

104, 455–61 (2002).

145. Rivers, R.C. Biophysical analysis of the aggregation behaviour and struc-

tural properties of α- and β-synuclein. PhD Thesis (2007).

146. Yamin, G., Munishkina, L.A., Karymov, M.A., Lyubchenko, Y.L., Uver-

sky, V.N. & Fink, A.L. Forcing nonamyloidogenic β-synuclein to fibrillate.

Biochemistry 44, 9096–107 (2005).

147. Biere, A.L., Wood, S.J., Wypych, J., Steavenson, S., Jiang, Y., Anafi,

D., Jacobsen, F.W., Jarosinski, M.A., Wu, G.M., Louis, J.C., Martin,

F., Narhi, L.O. & Citron, M. Parkinson’s disease-associated α-synuclein

is more fibrillogenic than β- and γ-synuclein and cannot cross-seed its

homologs. J. Biol. Chem. 275, 34574–9 (2000).

148. Park, J.Y. & Lansbury, Jr, P.T. Beta-synuclein inhibits formation of α-

synuclein protofibrils: a possible therapeutic strategy against Parkinson’s

disease. Biochemistry 42, 3696–700 (2003).

149. Tsigelny, I.F., Bar-On, P., Sharikov, Y., Crews, L., Hashimoto, M., Miller,

M.A., Keller, S.H., Platoshyn, O., Yuan, J.X.J. & Masliah, E. Dynamics of

α-synuclein aggregation and inhibition of pore-like oligomer development

by β-synuclein. FEBS J. 274, 1862–77 (2007).

150. Uversky, V.N. & Fink, A.L. Amino acid determinants of α-synuclein ag-

gregation: putting together pieces of the puzzle. FEBS Lett. 522, 9–13

(2002).

151. Murray, I.V., Giasson, B.I., Quinn, S.M., Koppaka, V., Axelsen, P.H.,

Ischiropoulos, H., Trojanowski, J.Q. & Lee, V.M. Role of α-synuclein

carboxy-terminus on fibril formation in vitro. Biochemistry 42, 8530–40

(2003).

152. Spillantini, M.G., Crowther, R.A., Jakes, R., Hasegawa, M. & Goedert,

M. α-synuclein in filamentous inclusions of Lewy bodies from Parkinson’s

disease and dementia with Lewy bodies. Proc. Natl. Acad. Sci. U. S. A.

95, 6469–73 (1998).

153. Hoyer, W., Cherny, D., Subramaniam, V. & Jovin, T.M. Impact of the

acidic C-terminal region comprising amino acids 109-140 on α-synuclein

aggregation in vitro. Biochemistry 43, 16233–42 (2004).

154. Li, W., West, N., Colla, E., Pletnikova, O., Troncoso, J.C., Marsh, L.,

Dawson, T.M., Jakala, P., Hartmann, T., Price, D.L. & Lee, M.K. Aggre-

gation promoting C-terminal truncation of α-synuclein is a normal cellular

process and is enhanced by the familial Parkinson’s disease-linked muta-

tions. Proc. Natl. Acad. Sci. U. S. A. 102, 2162–7 (2005).

155. Giasson, B.I., Murray, I.V.J., Trojanowski, J.Q. & Lee, V.M.Y. A hy-

drophobic stretch of 12 amino acid residues in the middle of α-synuclein is

essential for filament assembly. J. Biol. Chem. 276, 2380–6 (2001).

156. Du, H.N., Tang, L., Luo, X.Y., Li, H.T., Hu, J., Zhou, J.W. & Hu, H.Y.

A peptide motif consisting of glycine, alanine, and valine is required for

the fibrillization and cytotoxicity of human α-synuclein. Biochemistry 42,

8870–8 (2003).

157. Madine, J., Doig, A. & Middleton, D. The aggregation and membrane-

binding properties of an α-synuclein peptide fragment. Biochem. Soc.

Trans. 32, 1127–9 (2004).

158. Bertoncini, C.W., Rasia, R.M., Lamberto, G.R., Binolfi, A., Zweckstetter,

M., Griesinger, C. & Fernandez, C.O. Structural characterization of the

intrinsically unfolded protein β-synuclein, a natural negative regulator

of α-synuclein aggregation. J. Mol. Biol. 372, 708–22 (2007).

159. Sung, Y.H. & Eliezer, D. Residual structure, backbone dynamics, and

interactions within the synuclein family. J. Mol. Biol. 372, 689–707 (2007).

160. Dyson, H.J. & Wright, P.E. Equilibrium NMR studies of unfolded and

partially folded proteins. Nat. Struct. Biol. 5, 499–503 (1998).

161. Wilkins, D.K., Grimshaw, S.B., Receveur, V., Dobson, C.M., Jones, J.A.

& Smith, L.J. Hydrodynamic radii of native and denatured proteins mea-

sured by pulse field gradient NMR techniques. Biochemistry 38, 16424–31

(1999).

162. Svergun, D.I. & Koch, M.H.J. Small-angle scattering studies of biological

macromolecules in solution. Rep. Prog. Phys. 66, 1735–82 (2003).

163. Bilsel, O. & Matthews, C.R. Molecular dimensions and their distributions

in early folding intermediates. Curr. Opin. Struct. Biol. 16, 86–93 (2006).

164. Mittag, T. & Forman-Kay, J.D. Atomic-level characterization of disordered

protein ensembles. Curr. Opin. Struct. Biol. 17, 3–14 (2007).

165. Dyson, H.J. & Wright, P.E. Elucidation of the protein folding landscape

by NMR. Methods Enzymol. 394, 299–321 (2005).

166. Wishart, D.S. & Sykes, B.D. The 13C chemical-shift index: a simple

method for the identification of protein secondary structure using 13C

chemical-shift data. J. Biomol. NMR 4, 171–80 (1994).

167. Wishart, D. & Sykes, B. Chemical shifts as a tool for structure determi-

nation. Methods Enzymol. 239, 363–92 (1994).

168. Marsh, J.A., Singh, V.K., Jia, Z. & Forman-Kay, J.D. Sensitivity of sec-

ondary structure propensities to sequence differences between α- and γ-

synuclein: implications for fibrillation. Protein Sci. 15, 2795–804 (2006).

169. Wang, Y. & Jardetzky, O. Probability-based protein secondary structure

identification using combined NMR chemical-shift data. Protein Sci. 11,

852–61 (2002).

170. Klein-Seetharaman, J., Oikawa, M., Grimshaw, S.B., Wirmer, J.,

Duchardt, E., Ueda, T., Imoto, T., Smith, L.J., Dobson, C.M. & Schwalbe,

H. Long-range interactions within a nonnative protein. Science 295, 1719–

22 (2002).

171. Reed, M.A., Jelinska, C., Syson, K., Cliff, M.J., Splevins, A., Alizadeh,

T., Hounslow, A.M., Staniforth, R.A., Clarke, A.R., Craven, C.J. &

Waltho, J.P. The denatured state under native conditions: a non-native-

like collapsed state of N-PGK. J. Mol. Biol. 357, 365–72 (2006).

172. Bolin, K.A., Pitkeathly, M., Miranker, A., Smith, L.J. & Dobson, C.M.

Insight into a random coil conformation and an isolated helix: structural

and dynamical characterisation of the C-helix peptide from hen lysozyme.

J. Mol. Biol. 261, 443–53 (1996).

173. Yi, Q., Scalley-Kim, M.L., Alm, E.J. & Baker, D. NMR characterization

of residual structure in the denatured state of protein L. J. Mol. Biol. 299,

1341–51 (2000).

174. Serrano, L. Comparison between the φ distribution of the amino acids

in the protein database and NMR data indicates that amino acids have

various φ propensities in the random coil conformation. J. Mol. Biol. 254,

322–33 (1995).

175. Smith, L.J., Bolin, K.A., Schwalbe, H., MacArthur, M.W., Thornton, J.M.

& Dobson, C.M. Analysis of main chain torsion angles in proteins: predic-

tion of NMR coupling constants for native and random coil conformations.

J. Mol. Biol. 255, 494–506 (1996).

176. Fiebig, K., Schwalbe, H., Buck, M., Smith, L. & Dobson, C. Toward a de-

scription of the conformations of denatured states of proteins. Comparison

of a random coil model with NMR measurements. J. Phys. Chem. 100,

2661–6 (1996).

177. Choy, W.Y., Shortle, D. & Kay, L. Side chain dynamics in unfolded protein

states: an NMR based 2H spin relaxation study of ∆131∆. J. Am. Chem.

Soc. 125, 1748–58 (2003).

178. Schwalbe, H., Fiebig, K., Buck, M., Jones, J., Grimshaw, S., Spencer, A.,

Glaser, S., Smith, L. & Dobson, C. Structural and dynamical properties of

a denatured protein. Heteronuclear 3D NMR experiments and theoretical

simulations of lysozyme in 8 M urea. Biochemistry 36, 8977–91 (1997).

179. Blackledge, M. Recent progress in the study of biomolecular structure and

dynamics in solution from residual dipolar couplings. Prog. Nucl. Magn.

Reson. Spectrosc. 46, 23–61 (2005).

180. Tjandra, N. & Bax, A. Direct measurement of distances and angles in

biomolecules by NMR in a dilute liquid crystalline medium. Science 278,

1111–4 (1997).

181. Sanders, C.R., Hare, B.J., Howard, K.P. & Prestegard, J.H. Magnetically-

oriented phospholipid micelles as a tool for the study of membrane-

associated molecules. Prog. Nucl. Magn. Reson. Spectrosc. 26, 421–44

(1994).

182. Hansen, M.R., Mueller, L. & Pardi, A. Tunable alignment of macro-

molecules by filamentous phage yields dipolar coupling interactions. Nat.

Struct. Biol. 5, 1065–74 (1998).

183. Clore, G., Starich, M. & Gronenborn, A. Measurement of residual dipolar

couplings of macromolecules aligned in the nematic phase of a colloidal

suspension of rod-shaped viruses. J. Am. Chem. Soc. 120, 10571–2 (1998).

184. Sass, J., Cordier, F., Hoffmann, A., Rogowski, M., Cousin, A., Omichinski,

J., Lowen, H. & Grzesiek, S. Purple membrane induced alignment of

biological macromolecules in the magnetic field. J. Am. Chem. Soc. 121,

2047–55 (1999).

185. Koenig, B., Hu, J.S., Ottiger, M., Bose, S., Hendler, R. & Bax, A. NMR

measurement of dipolar couplings in proteins aligned by transient binding

to purple membrane fragments. J. Am. Chem. Soc. 121, 1385–6 (1999).

186. Ruckert, M. & Otting, G. Alignment of biological macromolecules in novel

nonionic liquid crystalline media for NMR experiments. J. Am. Chem.

Soc. 122, 7793–7 (2000).

187. Tycko, R., Blanco, F. & Ishii, Y. Alignment of biopolymers in strained gels:

a new way to create detectable dipole-dipole couplings in high-resolution

biomolecular NMR. J. Am. Chem. Soc. 122, 9340–1 (2000).

188. Chou, J.J., Gaemers, S., Howder, B., Louis, J.M. & Bax, A. A simple

apparatus for generating stretched polyacrylamide gels, yielding uniform

alignment of proteins and detergent micelles. J. Biomol. NMR 21, 377–82

(2001).

189. Tolman, J.R. Dipolar couplings as a probe of molecular dynamics and

structure in solution. Curr. Opin. Struct. Biol. 11, 532–9 (2001).

190. Zweckstetter, M. & Bax, A. Prediction of sterically induced alignment in

a dilute liquid crystalline phase: aid to protein structure determination by

NMR. J. Am. Chem. Soc. 122, 3791–2 (2000).

191. Zweckstetter, M., Hummer, G. & Bax, A. Prediction of charge-induced

molecular alignment of biomolecules dissolved in dilute liquid-crystalline

phases. Biophys. J. 86, 3444–60 (2004).

192. Azurmendi, H. & Bush, C. Tracking alignment from the moment of in-

ertia tensor (TRAMITE) of biomolecules in neutral dilute liquid crystal

solutions. J. Am. Chem. Soc. 124, 2426–7 (2002).

193. Louhivuori, M., Otten, R., Lindorff-Larsen, K. & Annila, A. Conforma-

tional fluctuations affect protein alignment in dilute liquid crystal media.

J. Am. Chem. Soc. 128, 4371–6 (2006).

194. Fredriksson, K., Louhivuori, M., Permi, P. & Annila, A. On the interpre-

tation of residual dipolar couplings as reporters of molecular dynamics. J.

Am. Chem. Soc. 126, 12646–50 (2004).

195. Louhivuori, M., Paakkonen, K., Fredriksson, K., Permi, P., Lounila, J.

& Annila, A. On the origin of residual dipolar couplings from denatured

proteins. J. Am. Chem. Soc. 125, 15647–50 (2003).

196. Kuhn, W. Über die Gestalt fadenförmiger Moleküle in Lösungen. Kolloid-Z.

68, 2–11 (1934).

197. Haber, C., Ruiz, S.A. & Wirtz, D. Shape anisotropy of a single random-

walk polymer. Proc. Natl. Acad. Sci. U. S. A. 97, 10792–5 (2000).

198. Clore, G. & Gronenborn, A. NMR structures of proteins and protein

complexes beyond 20,000 M(r). Nat. Struct. Biol. 4, 849–53 (1997).

199. Gillespie, J.R. & Shortle, D. Characterization of long-range structure in

the denatured state of staphylococcal nuclease. I. Paramagnetic relaxation

enhancement by nitroxide spin labels. J. Mol. Biol. 268, 158–69 (1997).

200. Crowhurst, K. & Forman-Kay, J. Aromatic and methyl NOEs highlight hy-

drophobic clustering in the unfolded state of an SH3 domain. Biochemistry

42, 8687–95 (2003).

201. Kristjansdottir, S., Lindorff-Larsen, K., Fieber, W., Dobson, C.M., Ven-

druscolo, M. & Poulsen, F.M. Formation of native and non-native in-

teractions in ensembles of denatured ACBP molecules from paramagnetic

relaxation enhancement studies. J. Mol. Biol. 347, 1053–62 (2005).

202. Lindorff-Larsen, K., Kristjansdottir, S., Teilum, K., Fieber, W., Dobson,

C.M., Poulsen, F.M. & Vendruscolo, M. Determination of an ensemble of

structures representing the denatured state of the bovine acyl-coenzyme A

binding protein. J. Am. Chem. Soc. 126, 3291–9 (2004).

203. Teilum, K., Kragelund, B.B. & Poulsen, F.M. Transient structure forma-

tion in unfolded acyl-coenzyme A-binding protein observed by site-directed

spin labelling. J. Mol. Biol. 324, 349–57 (2002).

204. Francis, C., Lindorff-Larsen, K., Best, R.B. & Vendruscolo, M.

Characterization of the residual structure in the unfolded state of the

∆131∆ fragment of staphylococcal nuclease. Proteins: Struct. Funct.

Bioinform. 65, 145–52 (2006).

205. Dedmon, M.M., Lindorff-Larsen, K., Christodoulou, J., Vendruscolo, M. &

Dobson, C.M. Mapping long-range interactions in α-synuclein using spin-

label NMR and ensemble molecular dynamics simulations. J. Am. Chem.

Soc. 127, 476–7 (2005).

206. Liang, B., Bushweller, J.H. & Tamm, L.K. Site-directed parallel spin-

labeling and paramagnetic relaxation enhancement in structure determi-

nation of membrane proteins by solution NMR spectroscopy. J. Am. Chem.

Soc. 128, 4389–97 (2006).

207. Iwahara, J., Schwieters, C.D. & Clore, G.M. Ensemble approach for

NMR structure refinement against 1H paramagnetic relaxation enhance-

ment data arising from a flexible paramagnetic group attached to a macro-

molecule. J. Am. Chem. Soc. 126, 5879–96 (2004).

208. Battiste, J.L. & Wagner, G. Utilization of site-directed spin labeling and

high-resolution heteronuclear nuclear magnetic resonance for global fold

determination of large proteins with limited nuclear overhauser effect data.

Biochemistry 39, 5355–65 (2000).

209. Donaldson, L.W., Skrynnikov, N.R., Choy, W.Y., Muhandiram, D.R.,

Sarkar, B., Forman-Kay, J.D. & Kay, L.E. Structural characterization

of proteins with an attached ATCUN motif by paramagnetic relaxation

enhancement NMR spectroscopy. J. Am. Chem. Soc. 123, 9843–7 (2001).

210. Gaponenko, V., Howarth, J.W., Columbus, L., Gasmi-Seabrook, G., Yuan,

J., Hubbell, W.L. & Rosevear, P.R. Protein global fold determination using

site-directed spin and isotope labeling. Protein Sci. 9, 302–9 (2000).

211. Tang, C., Iwahara, J. & Clore, G.M. Visualization of transient encounter

complexes in protein-protein association. Nature 444, 383–6 (2006).

212. Voss, J., Salwinski, L., Kaback, H. & Hubbell, W. A method for distance

determination in proteins using a designed metal ion binding site and site-

directed spin labeling: evaluation with T4 lysozyme. Proc. Natl. Acad.

Sci. U. S. A. 92, 12295–9 (1995).

213. Iwahara, J. & Clore, G.M. Detecting transient intermediates in macro-

molecular binding by paramagnetic NMR. Nature 440, 1227–30 (2006).

214. Iwahara, J., Anderson, D., Murphy, E. & Clore, G. EDTA-derivatized de-

oxythymidine as a tool for rapid determination of protein binding polarity

to DNA by intermolecular paramagnetic relaxation enhancement. J. Am.

Chem. Soc. 125, 6634–5 (2003).

215. Mal, T., Ikura, M. & Kay, L. The ATCUN domain as a probe of inter-

molecular interactions: application to calmodulin-peptide complexes. J.

Am. Chem. Soc. 124, 14002–3 (2002).

216. Karim, C.B., Kirby, T.L., Zhang, Z., Nesmelov, Y. & Thomas, D.D. Phos-

pholamban structural dynamics in lipid bilayers probed by a spin label

rigidly coupled to the peptide backbone. Proc. Natl. Acad. Sci. U. S. A.

101, 14437–42 (2004).

217. Shenkarev, Z.O., Paramonov, A.S., Balashova, T.A., Yakimenko, Z.A.,

Baru, M.B., Mustaeva, L.G., Raap, J., Ovchinnikova, T.V. & Arseniev,

A.S. High stability of the hinge region in the membrane-active peptide helix

of zervamicin: paramagnetic relaxation enhancement studies. Biochem.

Biophys. Res. Comm. 325, 1099–105 (2004).

218. Milov, A.D., Tsvetkov, Y.D., Gorbunova, E.Y., Mustaeva, L.G., Ovchin-

nikova, T.V. & Raap, J. Self-aggregation properties of spin-labeled zer-

vamicin IIA as studied by PELDOR spectroscopy. Biopolymers 64, 328–36

(2002).

219. Johnson, P.E., Brun, E., MacKenzie, L.F., Withers, S.G. & McIntosh, L.P.

The cellulose-binding domains from Cellulomonas fimi β-1,4-glucanase

CenC bind nitroxide spin-labeled cellooligosaccharides in multiple orien-

tations. J. Mol. Biol. 287, 609–25 (1999).

220. Ueda, T., Kato, A., Ogawa, Y., Torizawa, T., Kuramitsu, S., Iwai, S.,

Terasawa, H. & Shimada, I. NMR study of repair mechanism of DNA pho-

tolyase by FAD-induced paramagnetic relaxation enhancement. J. Biol.

Chem. 279, 52574–9 (2004).

221. Roosild, T.P., Greenwald, J., Vega, M., Castronovo, S., Riek, R. & Choe,

S. NMR structure of mistic, a membrane-integrating protein for membrane

protein expression. Science 307, 1317–21 (2005).

222. Lietzow, M.A., Jamin, M., Dyson, H.J. & Wright, P.E. Mapping long-

range contacts in a highly unfolded protein. J. Mol. Biol. 322, 655–62

(2002).

223. Solomon, I. & Bloembergen, N. Nuclear magnetic interactions in the HF

molecule. J. Chem. Phys. 25, 261–6 (1956).

224. Gillespie, J.R. & Shortle, D. Characterization of long-range structure in

the denatured state of staphylococcal nuclease. II. Distance restraints from

paramagnetic relaxation and calculation of an ensemble of structures. J.

Mol. Biol. 268, 170–84 (1997).

225. Nadaud, P., Helmus, J., Hofer, N. & Jaroniec, C. Long-range structural

restraints in spin-labeled proteins probed by solid-state nuclear magnetic

resonance spectroscopy. J. Am. Chem. Soc. 129, 7502–3 (2007).

226. Lee, J., Langen, R., Hummel, P., Gray, H. & Winkler, J. α-synuclein

structures from fluorescence energy-transfer kinetics: implications for the

role of the protein in Parkinson’s disease. Proc. Natl. Acad. Sci. U. S. A.

101, 16466–71 (2004).

227. Lee, J.C., Gray, H.B. & Winkler, J.R. Tertiary contact formation in α-

synuclein probed by electron transfer. J. Am. Chem. Soc. 127, 16388–9

(2005).

228. Smith, L.J., Fiebig, K.M., Schwalbe, H. & Dobson, C.M. The concept of a

random coil. Residual structure in peptides and denatured proteins. Fold.

Des. 1, R95–106 (1996).

229. Tanford, C., Kawahara, K. & Lapanje, S. Proteins in 6-M guanidine hy-

drochloride. Demonstration of random coil behavior. J. Biol. Chem. 241,

1921–3 (1966).

230. Tanford, C. Protein denaturation. Adv. Prot. Chem. 23, 121–282 (1968).

231. McCarney, E.R., Kohn, J.E. & Plaxco, K.W. Is there or isn’t there? The

case for (and against) residual structure in chemically denatured proteins.

Crit. Rev. Biochem. Mol. Biol. 40, 181–9 (2005).

232. Kohn, J.E., Millett, I.S., Jacob, J., Zagrovic, B., Dillon, T.M., Cingel,

N., Dothager, R.S., Seifert, S., Thiyagarajan, P., Sosnick, T.R., Hasan,

M.Z., Pande, V.S., Ruczinski, I., Doniach, S. & Plaxco, K.W. Random-

coil behavior and the dimensions of chemically unfolded proteins. Proc.

Natl. Acad. Sci. U. S. A. 101, 12491–6 (2004).

233. Millett, I.S., Doniach, S. & Plaxco, K.W. Toward a taxonomy of the

denatured state: small angle scattering studies of unfolded proteins. Adv.

Prot. Chem. 62, 241–62 (2002).

234. Morar, A.S., Olteanu, A., Young, G.B. & Pielak, G.J. Solvent-induced

collapse of α-synuclein and acid-denatured cytochrome c. Protein Sci. 10,

2195–9 (2001).

235. Binolfi, A., Rasia, R.M., Bertoncini, C.W., Ceolin, M., Zweckstetter, M.,

Griesinger, C., Jovin, T.M. & Fernandez, C.O. Interaction of α-synuclein

with divalent metal ions reveals key differences: a link between structure,

binding specificity and fibrillation enhancement. J. Am. Chem. Soc. 128,

9893–901 (2006).

236. Shortle, D. & Ackerman, M.S. Persistence of native-like topology in a

denatured protein in 8 M urea. Science 293, 487–9 (2001).

237. Ohnishi, S. & Shortle, D. Effects of denaturants and substitutions of

hydrophobic residues on backbone dynamics of denatured staphylococcal

nuclease. Protein Sci. 12, 1530–7 (2003).

238. Ohnishi, S., Lee, A.L., Edgell, M.H. & Shortle, D. Direct demonstration of

structural similarity between native and denatured eglin C. Biochemistry

43, 4064–70 (2004).

239. Fieber, W., Kristjansdottir, S. & Poulsen, F.M. Short-range, long-range

and transition state interactions in the denatured state of ACBP from

residual dipolar couplings. J. Mol. Biol. 339, 1191–9 (2004).

240. Bertoncini, C.W., Jung, Y.S., Fernandez, C.O., Hoyer, W., Griesinger, C.,

Jovin, T.M. & Zweckstetter, M. Release of long-range tertiary interactions

potentiates aggregation of natively unstructured α-synuclein. Proc. Natl.

Acad. Sci. U. S. A. 102, 1430–5 (2005).

241. Dyson, H.J. & Wright, P.E. Defining solution conformations of small linear

peptides. Annu. Rev. Biophys. Biophys. Chem. 20, 519–38 (1991).

242. Mok, Y.K., Kay, C.M., Kay, L.E. & Forman-Kay, J. NOE data demon-

strating a compact unfolded state for an SH3 domain under non-denaturing

conditions. J. Mol. Biol. 289, 619–38 (1999).

243. Ackerman, M.S. & Shortle, D. Molecular alignment of denatured states of

staphylococcal nuclease with strained polyacrylamide gels and surfactant

liquid crystalline phases. Biochemistry 41, 3089–95 (2002).

244. Ackerman, M.S. & Shortle, D. Robustness of the long-range structure

in denatured staphylococcal nuclease to changes in amino acid sequence.

Biochemistry 41, 13791–7 (2002).

245. Mohana-Borges, R., Goto, N.K., Kroon, G.J., Dyson, H.J. & Wright, P.E.

Structural characterization of unfolded states of apomyoglobin using resid-

ual dipolar couplings. J. Mol. Biol. 340, 1131–42 (2004).

246. Shortle, D. The denatured state (the other half of the folding equation)

and its role in protein stability. FASEB J. 10, 27–34 (1996).

247. Neri, D., Billeter, M., Wider, G. & Wuthrich, K. NMR determination of

residual structure in a urea-denatured protein, the 434-repressor. Science

257, 1559–63 (1992).

248. Tsai, C.J., Ma, B., Sham, Y.Y., Kumar, S. & Nussinov, R. Structured

disorder and conformational selection. Proteins: Struct. Funct. Genet. 44,

418–27 (2001).

249. Shortle, D.R. Structural analysis of non-native states of proteins by NMR

methods. Curr. Opin. Struct. Biol. 6, 24–30 (1996).

250. Wrabl, J. & Shortle, D. A model of the changes in denatured state structure

underlying m value effects in staphylococcal nuclease. Nat. Struct.

Biol. 6, 876–83 (1999).

251. Blanco, F.J., Serrano, L. & Forman-Kay, J.D. High populations of non-

native structures in the denatured state are compatible with the formation

of the native folded state. J. Mol. Biol. 284, 1153–64 (1998).

252. Wong, K.B., Freund, S.M.V. & Fersht, A.R. Cold denaturation of barstar:

1H, 15N and 13C NMR assignment and characterisation of residual structure.

J. Mol. Biol. 259, 805–18 (1996).

253. Saab-Rincon, G., Gualfetti, P. & Matthews, C. Mutagenic and thermo-

dynamic analyses of residual structure in the α subunit of tryptophan

synthase. Biochemistry 35, 1988–94 (1996).

254. Ropson, I. & Frieden, C. Dynamic NMR spectral analysis and protein

folding: identification of a highly populated folding intermediate of rat

intestinal fatty acid-binding protein by 19F NMR. Proc. Natl. Acad. Sci.

U. S. A. 89, 7222–6 (1992).

255. Tran, H.T., Wang, X. & Pappu, R.V. Reconciling observations of sequence-

specific conformational propensities with the generic polymeric behavior

of denatured proteins. Biochemistry 44, 11369–80 (2005).

256. Pappu, R.V., Srinivasan, R. & Rose, G.D. The Flory isolated-pair hypoth-

esis is not valid for polypeptide chains: implications for protein folding.

Proc. Natl. Acad. Sci. U. S. A. 97, 12565–70 (2000).

257. Jha, A.K., Colubri, A., Freed, K.F. & Sosnick, T.R. Statistical coil model

of the unfolded state: resolving the reconciliation problem. Proc. Natl.

Acad. Sci. U. S. A. 102, 13099–104 (2005).

258. Fitzkee, N.C. & Rose, G.D. Reassessing random-coil statistics in unfolded

proteins. Proc. Natl. Acad. Sci. U. S. A. 101, 12497–502 (2004).

259. Zagrovic, B. & Pande, V.S. Structural correspondence between the α-

helix and the random-flight chain resolves how unfolded proteins can have

native-like properties. Nat. Struct. Biol. 10, 955–61 (2003).

260. Banavar, J.R., Hoang, T.X. & Maritan, A. Proteins and polymers. J.

Chem. Phys. 122, 234910–4 (2005).

261. Banavar, J.R., Cieplak, M., Flammini, A., Hoang, T.X., Kamien, R.D.,

Lezon, T., Marenduzzo, D., Maritan, A., Seno, F., Snir, Y. & Trovato, A.

Geometry of proteins: hydrogen bonding, sterics, and marginally compact

tubes. Phys. Rev. E 73, 031921–5 (2006).

262. Marenduzzo, D., Hoang, T.X., Seno, F., Vendruscolo, M. & Maritan, A.

Form of growing strings. Phys. Rev. Lett. 95, 098103–4 (2005).

263. Hoang, T.X., Marsella, L., Trovato, A., Seno, F., Banavar, J.R. & Maritan,

A. Common attributes of native-state structures of proteins, disordered

proteins, and amyloid. Proc. Natl. Acad. Sci. U. S. A. 103, 6883–8 (2006).

264. Petrescu, A., Calmettes, P., Durand, D., Receveur, V. & Smith, J. Change

in backbone torsion angle distribution on protein folding. Protein Sci. 9,

1129–36 (2000).

265. Jha, A., Colubri, A., Zaman, M., Koide, S., Sosnick, T. & Freed, K. Helix,

sheet, and polyproline II frequencies and strong nearest neighbor effects in

a restricted coil library. Biochemistry 44, 9691–702 (2005).

266. Bernado, P., Bertoncini, C.W., Griesinger, C., Zweckstetter, M. & Black-

ledge, M. Defining long-range order and local disorder in native α-synuclein

using residual dipolar couplings. J. Am. Chem. Soc. 127, 17968–9 (2005).

267. Zaman, M.H., Shen, M.Y., Berry, R.S., Freed, K.F. & Sosnick, T.R. In-

vestigations into sequence and conformational dependence of backbone

entropy, inter-basin dynamics and the Flory isolated-pair hypothesis for

peptides. J. Mol. Biol. 331, 693–711 (2003).

268. Cho, M.K., Kim, H.Y., Bernado, P., Fernandez, C., Blackledge, M. &

Zweckstetter, M. Amino acid bulkiness defines the local conformations

and dynamics of natively unfolded α-synuclein and tau. J. Am. Chem.

Soc. 129, 3032–3 (2007).

269. Skora, L., Cho, M.K., Kim, H., Becker, S., Fernandez, C.O., Blackledge,

M. & Zweckstetter, M. Charge-induced molecular alignment of intrinsically

disordered proteins. Angew. Chem. Int. Ed. 45, 7012–15 (2006).

270. van Gunsteren, W.F., Bakowies, D., Baron, R., Chandrasekhar, I., Chris-

ten, M., Daura, X., Gee, P., Geerke, D.P., Glattli, A., Hunenberger, P.H.,

Kastenholz, M.A., Oostenbrink, C., Schenk, M., Trzesniak, D., van der

Vegt, N.F.A. & Yu, H.B. Biomolecular modeling: goals, problems, per-

spectives. Angew. Chem. Int. Ed. 45, 4064–92 (2006).

271. Brooks, B., Bruccoleri, R., Olafson, B., States, D., Swaminathan, S. &

Karplus, M. CHARMM: a program for macromolecular energy, minimiza-

tion, and dynamics calculations. J. Comput. Chem. 4, 187–217 (1983).

272. MacKerell, A.D., Jr. Empirical force fields for biological macromolecules:

overview and issues. J. Comput. Chem. 25, 1584–604 (2004).

273. Jorgensen, W.L. & Tirado-Rives, J. Potential energy functions for atomic-

level simulations of water and organic and biomolecular systems. Proc.

Natl. Acad. Sci. U. S. A. 102, 6665–70 (2005).

274. Wang, W., Donini, O., Reyes, C.M. & Kollman, P.A. Biomolecular simula-

tions: recent developments in force fields, simulations of enzyme catalysis,

protein-ligand, protein-protein, and protein-nucleic acid noncovalent inter-

actions. Annu. Rev. Biophys. Biomol. Struct. 30, 211–43 (2001).

275. Lazaridis, T., Archontis, G. & Karplus, M. Enthalpic contribution to pro-

tein stability: insights from atom-based calculations and statistical me-

chanics. Adv. Prot. Chem. 46, 213–306 (1995).

276. Roux, B. & Simonson, T. Implicit solvent models. Biophys. Chem. 78,

1–20 (1999).

277. Feig, M. & Brooks, C.L. Recent advances in the development and appli-

cation of implicit solvent models in biomolecule simulations. Curr. Opin.

Struct. Biol. 14, 217–24 (2004).

278. Lazaridis, T. & Karplus, M. Effective energy functions for protein structure

prediction. Curr. Opin. Struct. Biol. 10, 139–45 (2000).

279. Im, W., Chen, J. & Brooks III, C.L. Peptide and protein folding and con-

formational equilibria: theoretical treatment of electrostatics and hydrogen

bonding with implicit solvent models (2005).

280. Lazaridis, T. & Karplus, M. Effective energy function for proteins in

solution. Proteins: Struct. Funct. Genet. 35, 133–52 (1999).

281. Zagrovic, B. & Pande, V.S. Solvent viscosity dependence of the folding

rate of a small protein: distributed computing study. J. Comput. Chem.

24, 1432–6 (2003).

282. Dominy, B.N. & Brooks III, C.L. Identifying native-like protein structures

using physics-based potentials. J. Comput. Chem. 23, 147–60 (2002).

283. Felts, A.K., Gallicchio, E., Wallqvist, A. & Levy, R.M. Distinguishing

native conformations of proteins from decoys with an effective free energy

estimator based on the OPLS all-atom force field and the surface gener-

alized born solvent model. Proteins: Struct. Funct. Genet. 48, 404–22

(2002).

284. Feig, M. & Brooks III, C.L. Evaluating CASP4 predictions with physical energy

functions. Proteins: Struct. Funct. Genet. 49, 232–45 (2002).

285. Zhu, J., Zhu, Q., Shi, Y. & Liu, H. How well can we predict native contacts

in proteins based on decoy structures and their energies? Proteins: Struct.

Funct. Genet. 52, 598–608 (2003).

286. Forrest, L.R. & Woolf, T.B. Discrimination of native loop conformations in

membrane proteins: decoy library design and evaluation of effective energy

scoring functions. Proteins: Struct. Funct. Genet. 52, 492–509 (2003).

287. Fiser, A., Feig, M., Brooks, C. & Sali, A. Evolution and physics in com-

parative protein structure modeling. Acc. Chem. Res. 35, 413–21 (2002).

288. Lazaridis, T. & Karplus, M. Discrimination of the native from misfolded

protein models with an energy function including implicit solvation. J.

Mol. Biol. 288, 477–87 (1999).

289. Ramos, J. & Lazaridis, T. Energetic determinants of oligomeric state

specificity in coiled coils. J. Am. Chem. Soc. 128, 15499–510 (2006).

290. Donnini, S. & Juffer, A.H. Calculation of affinities of peptides for proteins.

J. Comput. Chem. 25, 393–411 (2004).

291. Lazaridis, T. Binding affinity and specificity from computational studies.

Curr. Org. Chem. 6, 1319–32 (2002).

292. Mardis, K.L., Luo, R. & Gilson, M.K. Interpreting trends in the binding

of cyclic ureas to HIV-1 protease. J. Mol. Biol. 309, 507–17 (2001).

293. Ferrara, P., Gohlke, H., Price, D., Klebe, G. & Brooks, C. Assessing

scoring functions for protein-ligand interactions. J. Med. Chem. 47, 3032–

47 (2004).

294. Gohlke, H. & Case, D.A. Converging free energy estimates: MM-

PB(GB)SA studies on the protein-protein complex Ras-Raf. J. Comput.

Chem. 25, 238–50 (2004).

295. Ferrara, P. & Caflisch, A. Folding simulations of a three-stranded antipar-

allel β-sheet peptide. Proc. Natl. Acad. Sci. U. S. A. 97, 10780–5 (2000).

296. Lazaridis, T. & Karplus, M. “New view” of protein folding reconciled with

the old through multiple unfolding simulations. Science 278, 1928–31

(1997).

297. Paci, E., Vendruscolo, M. & Karplus, M. Native and non-native inter-

actions along protein folding and unfolding pathways. Proteins: Struct.

Funct. Genet. 47, 379–92 (2002).

298. Settanni, G., Gsponer, J. & Caflisch, A. Formation of the folding nucleus

of an SH3 domain investigated by loosely coupled molecular dynamics

simulations. Biophys. J. 86, 1691–701 (2004).

299. Gsponer, J. & Caflisch, A. Molecular dynamics simulations of protein

folding from the transition state. Proc. Natl. Acad. Sci. U. S. A. 99,

6719–24 (2002).

300. Gsponer, J. & Caflisch, A. Role of native topology investigated by multiple

unfolding simulations of four SH3 domains. J. Mol. Biol. 309, 285–98

(2001).

301. Zhu, J., Shi, Y. & Liu, H. Parametrization of a generalized Born/solvent-

accessible surface area model and applications to the simulation of protein

dynamics. J. Phys. Chem. B 106, 4844–53 (2002).

302. Dominy, B. & Brooks, C. Development of a generalized Born model

parametrization for proteins and nucleic acids. J. Phys. Chem. B 103,

3765–73 (1999).

303. Calimet, N., Schaefer, M. & Simonson, T. Protein molecular dynamics

with the generalized born/ACE solvent model. Proteins: Struct. Funct.

Genet. 45, 144–58 (2001).

304. Shen, M.Y. & Freed, K.F. Long time dynamics of met-enkephalin: comparison

of explicit and implicit solvent models. Biophys. J. 82, 1791–808

(2002).

305. Krol, M. Comparison of various implicit solvent models in molecular dy-

namics simulations of immunoglobulin G light chain dimer. J. Comput.

Chem. 24, 531–46 (2003).

306. Wang, T. & Wade, R.C. Implicit solvent models for flexible protein-protein

docking by molecular dynamics simulation. Proteins: Struct. Funct. Genet.

50, 158–69 (2003).

307. Paci, E., Gsponer, J., Salvatella, X. & Vendruscolo, M. Molecular dynam-

ics studies of the process of amyloid aggregation of peptide fragments of

transthyretin. J. Mol. Biol. 340, 555–69 (2004).

308. Gsponer, J., Haberthur, U. & Caflisch, A. The role of side-chain interac-

tions in the early steps of aggregation: molecular dynamics simulations of

an amyloid-forming peptide from the yeast prion Sup35. Proc. Natl. Acad.

Sci. U. S. A. 100, 5154–9 (2003).

309. Rao, F. & Caflisch, A. The protein folding network. J. Mol. Biol. 342,

299–306 (2004).

310. Bursulaya, B. & Brooks, C. Comparative study of the folding free energy

landscape of a three-stranded β-sheet protein with explicit and implicit

solvent models. J. Phys. Chem. B 104, 12378–83 (2000).

311. Gnanakaran, S., Nymeyer, H., Portman, J., Sanbonmatsu, K. & Garcia, A.

Peptide folding simulations. Curr. Opin. Struct. Biol. 13, 168–74 (2003).

312. Zhou, R. & Berne, B.J. Can a continuum solvent model reproduce the free

energy landscape of a β-hairpin folding in water? Proc. Natl. Acad. Sci.

U. S. A. 99, 12777–82 (2002).

313. Pitera, J.W. & Swope, W. Understanding folding and design: replica-

exchange simulations of “Trp-cage” miniproteins. Proc. Natl. Acad. Sci.

U. S. A. 100, 7587–92 (2003).

314. He, J., Zhang, Z., Shi, Y. & Liu, H. Efficiently explore the energy landscape

of proteins in molecular dynamics simulations by amplifying collective mo-

tions. J. Chem. Phys. 119, 4005–17 (2003).

315. Zagrovic, B., Sorin, E.J. & Pande, V. Beta-hairpin folding simulations in

atomistic detail using an implicit solvent model. J. Mol. Biol. 313, 151–69

(2001).

316. Zhou, R. Free energy landscape of protein folding in water: explicit vs.

implicit solvent. Proteins: Struct. Funct. Genet. 53, 148–61 (2003).

317. Suenaga, A. Replica-exchange molecular dynamics simulations for a small-

sized protein folding with implicit solvent. J. Mol. Struct. 634, 235–41

(2003).

318. Liu, Y. & Beveridge, D.L. Exploratory studies of ab initio protein structure

prediction: multiple copy simulated annealing, AMBER energy functions,

and a generalized born/solvent accessibility solvation model. Proteins:

Struct. Funct. Genet. 46, 128–46 (2002).

319. Ohkubo, Y.Z. & Brooks III, C.L. Exploring Flory’s isolated-pair

hypothesis: statistical mechanics of helix-coil transitions in polyalanine

and the C-peptide from RNase A. Proc. Natl. Acad. Sci. U. S. A. 100,

13916–21 (2003).

320. Karanicolas, J. & Brooks III, C.L. Integrating folding kinetics and

protein function: biphasic kinetics and dual binding specificity in a WW

domain. Proc. Natl. Acad. Sci. U. S. A. 101, 3432–7 (2004).

321. Lin, C.Y., Hu, C.K. & Hansmann, U.H.E. Parallel tempering simulations

of HP-36. Proteins: Struct. Funct. Genet. 52, 436–45 (2003).

322. Alves, N. & Hansmann, U. Solution effects and the folding of an artificial

peptide. J. Phys. Chem. B 107, 10284–91 (2003).

323. Rao, F. & Caflisch, A. Replica exchange molecular dynamics simulations

of reversible folding. J. Chem. Phys. 119, 4035–42 (2003).

324. Xia, B., Tsui, V., Case, D.A., Dyson, H.J. & Wright, P.E. Comparison

of protein solution structures refined by molecular dynamics simulation

in vacuum, with a generalized Born model, and with explicit water. J.

Biomol. NMR 22, 317–31 (2002).

325. Moulinier, L., Case, D.A. & Simonson, T. Reintroducing electrostatics into

protein X-ray structure refinement: bulk solvent treated as a dielectric

continuum. Acta Cryst. D59, 2094–103 (2003).

326. Gsponer, J., Hopearuoho, H., Whittaker, S.B.M., Spence, G.R., Moore,

G.R., Paci, E., Radford, S.E. & Vendruscolo, M. Determination of an

ensemble of structures representing the intermediate state of the bacterial

immunity protein Im7. Proc. Natl. Acad. Sci. U. S. A. 103, 99–104 (2006).

327. Paci, E., Greene, L.H., Jones, R.M. & Smith, L.J. Characterization of the

molten globule state of retinol-binding protein using a molecular dynamics

simulation approach. FEBS J. 272, 4826–38 (2005).

328. Best, R.B. & Vendruscolo, M. Determination of protein structures consis-

tent with NMR order parameters. J. Am. Chem. Soc. 126, 8090–1 (2004).

329. Daura, X., Antes, I., van Gunsteren, W.F., Thiel, W. & Mark, A.E. The

effect of motional averaging on the calculation of NMR-derived structural

properties. Proteins: Struct. Funct. Genet. 36, 542–55 (1999).

330. Kemmink, J. & Scheek, R. Dynamic modeling of a helical peptide in

solution using NMR data - multiple conformations and multi-spin effects.

J. Biomol. NMR 6, 33–40 (1995).

331. Bonvin, A.M. & Brunger, A.T. Conformational variability of solution

nuclear magnetic resonance structures. J. Mol. Biol. 250, 80–93 (1995).

332. Torda, A., Scheek, R. & van Gunsteren, W.F. Time-dependent distance

restraints in molecular dynamics simulations. Chem. Phys. Lett. 157, 289–

94 (1989).

333. Torda, A.E., Scheek, R.M. & van Gunsteren, W.F. Time-averaged nuclear

overhauser effect distance restraints applied to tendamistat. J. Mol. Biol.

214, 223–35 (1990).

334. Torda, A.E., Brunne, R.M., Huber, T., Kessler, H. & van Gunsteren, W.F.

Structure refinement using time-averaged J-coupling constant restraints.

J. Biomol. NMR 3, 55–66 (1993).

335. Bonvin, A., Boelens, R. & Kaptein, R. Time-averaged and ensemble aver-

aged direct NOE restraints. J. Biomol. NMR 4, 143–9 (1994).

336. Vendruscolo, M., Paci, E., Dobson, C.M. & Karplus, M. Rare fluctuations

of native proteins sampled by equilibrium hydrogen exchange. J. Am.

Chem. Soc. 125, 15686–7 (2003).

337. Vendruscolo, M. & Dobson, C.M. Towards complete descriptions of the

free-energy landscapes of proteins. Philos. Transact. A Math Phys. Eng.

Sci. 363, 433–52 (2005).

338. Lindorff-Larsen, K., Best, R.B., Depristo, M.A., Dobson, C.M. & Vendr-

uscolo, M. Simultaneous determination of protein structure and dynamics.

Nature 433, 128–32 (2005).

339. Clore, G.M. & Schwieters, C.D. How much backbone motion in ubiquitin

is required to account for dipolar coupling data measured in multiple align-

ment media as assessed by independent cross-validation? J. Am. Chem.

Soc. 126, 2923–38 (2004).

340. Clore, G.M. & Schwieters, C.D. Amplitudes of protein backbone dynamics

and correlated motions in a small α/β protein: correspondence of dipolar

coupling and heteronuclear relaxation measurements. Biochemistry 43,

10678–91 (2004).

341. Clore, G.M. & Schwieters, C.D. Concordance of residual dipolar couplings,

backbone order parameters and crystallographic B-factors for a small α/β

protein: a unified picture of high probability, fast atomic motions in pro-

teins. J. Mol. Biol. 355, 879–86 (2006).

342. Hess, B. & Scheek, R.M. Orientation restraints in molecular dynamics

simulations using time and ensemble averaging. J. Magn. Reson. 164,

19–27 (2003).

343. Gsponer, J., Hopearuoho, H., Cavalli, A., Dobson, C. & Vendruscolo, M.

Geometry, energetics, and dynamics of hydrogen bonds in proteins: struc-

tural information derived from NMR scalar couplings. J. Am. Chem. Soc.

128, 15127–35 (2006).

344. Richter, B., Gsponer, J., Varnai, P., Salvatella, X. & Vendruscolo, M. The

MUMO (minimal under-restraining minimal over-restraining) method for

the determination of native state ensembles of proteins. J. Biomol. NMR

37, 117–35 (2007).

345. Fennen, J., Torda, A.E. & van Gunsteren, W.F. Structure refinement with

molecular dynamics and a Boltzmann-weighted ensemble. J. Biomol. NMR

6, 163–70 (1995).

346. Vendruscolo, M. & Paci, E. Protein folding: bringing theory and experi-

ment closer together. Curr. Opin. Struct. Biol. 13, 82–7 (2003).

347. Kuszewski, J., Gronenborn, A. & Clore, G. Improving the packing and

accuracy of NMR structures with a pseudopotential for the radius of gy-

ration. J. Am. Chem. Soc. 121, 2337–8 (1999).

348. Nose, S. A unified formulation of the constant temperature molecular

dynamics methods. J. Chem. Phys. 81, 511–9 (1984).

349. Hoover, W.G. Canonical dynamics: equilibrium phase-space distributions.

Phys. Rev. A 31, 1695–7 (1985).

350. Ryckaert, J.P., Ciccotti, G. & Berendsen, H.J.C. Numerical integration of

the Cartesian equations of motion of a system with constraints: molecular

dynamics of n-alkanes. J. Comput. Phys. 23, 327–41 (1977).

351. MacKerell, A., Bashford, D., Bellott, M., Dunbrack, R., Evanseck, J.,

Field, M., Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D.,

Kuchnir, L., Kuczera, K., Lau, F., Mattos, C., Michnick, S., Ngo, T.,

Nguyen, D., Prodhom, B., Reiher, W., Roux, B., Schlenkrich, M., Smith,

J., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera, J., Yin, D.

& Karplus, M. All-atom empirical potential for molecular modeling and

dynamics studies of proteins. J. Phys. Chem. B 102, 3586–616 (1998).

352. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W. & Klein,

M.L. Comparison of simple potential functions for simulating liquid water.

J. Chem. Phys. 79, 926–35 (1983).

353. Im, W., Lee, M.S. & Brooks III, C.L. Generalized Born model with a

simple smoothing function. J. Comput. Chem. 24, 1691–702 (2003).

354. Im, W., Feig, M. & Brooks III, C.L. An implicit membrane generalized

Born theory for the study of structure, stability, and interactions of

membrane proteins. Biophys. J. 85, 2900–18 (2003).

355. Ferrara, P., Apostolakis, J. & Caflisch, A. Evaluation of a fast implicit

solvent model for molecular dynamics simulations. Proteins: Struct. Funct.

Genet. 46, 24–33 (2002).

356. Garcia de la Torre, J., Huertas, M.L. & Carrasco, B. Calculation of hydro-

dynamic properties of globular proteins from their atomic-level structure.

Biophys. J. 78, 719–30 (2000).

357. Karplus, M. Contact electron-spin coupling of nuclear magnetic moments.

J. Chem. Phys. 30, 11–5 (1959).

358. Pardi, A., Billeter, M. & Wuthrich, K. Calibration of the angular de-

pendence of the amide proton-Cα proton coupling constants, 3JHNα, in

a globular protein: use of 3JHNα for identification of helical secondary

structure. J. Mol. Biol. 180, 741–51 (1984).

359. Bax, A. Weak alignment offers new NMR opportunities to study protein

structure and dynamics. Protein Sci. 12, 1–16 (2003).

360. Zhou, H.X. Dimensions of denatured protein chains from hydrodynamic

data. J. Phys. Chem. B 106, 5769–75 (2002).

361. Lacroix, E., Viguera, A.R. & Serrano, L. Elucidating the folding problem of

α-helices: local motifs, long-range electrostatics, ionic-strength dependence

and prediction of NMR parameters. J. Mol. Biol. 284, 173–91 (1998).

362. Munoz, V. & Serrano, L. Development of the multiple sequence approxi-

mation within the AGADIR model of α-helix formation: Comparison with

Zimm-Bragg and Lifson-Roig formalisms. Biopolymers 41, 495–509 (1997).

363. Munoz, V. & Serrano, L. Elucidating the folding problem of helical pep-

tides using empirical parameters. II. Helix macrodipole effects and rational

modification of the helical content of natural peptides. J. Mol. Biol. 245,

275–96 (1995).

364. Munoz, V. & Serrano, L. Elucidating the folding problem of helical peptides

using empirical parameters. III. Temperature and pH dependence. J.

Mol. Biol. 245, 297–308 (1995).

365. Kyte, J. & Doolittle, R.F. A simple method for displaying the hydropathic

character of a protein. J. Mol. Biol. 157, 105–32 (1982).

366. Pawar, A.P., Dubay, K.F., Zurdo, J., Chiti, F., Vendruscolo, M.

& Dobson, C.M. Prediction of “aggregation-prone” and “aggregation-

susceptible” regions in proteins associated with neurodegenerative diseases.

J. Mol. Biol. 350, 379–92 (2005).

367. DuBay, K.F., Pawar, A.P., Chiti, F., Zurdo, J., Dobson, C.M. & Vendr-

uscolo, M. Prediction of the absolute aggregation rates of amyloidogenic

polypeptide chains. J. Mol. Biol. 341, 1317–26 (2004).

368. Srinivasan, J., Cheatham, T., Cieplak, P., Kollman, P. & Case, D. Contin-

uum solvent studies of the stability of DNA, RNA, and phosphoramidate-

DNA helices. J. Am. Chem. Soc. 120, 9401–9 (1998).

369. Geney, R., Layten, M., Gomperts, R., Hornak, V. & Simmerling, C. In-

vestigation of salt bridge stability in a generalized Born solvent model. J.

Chem. Theory Comput. 2, 115–27 (2006).

370. Born, M. Volumen und Hydratationswarme der Ionen. Z. Phys. 1, 45–8

(1920).

371. Still, W.C., Tempczyk, A., Hawley, R.C. & Hendrickson, T. Semianalytical

treatment of solvation for molecular mechanics and dynamics. J. Am.

Chem. Soc. 112, 6127–9 (1990).

372. David, L., Luo, R. & Gilson, M.K. Comparison of generalized born and

poisson models: energetics and dynamics of HIV protease. J. Comput.

Chem. 21, 295–309 (2000).

373. Luo, R., David, L. & Gilson, M.K. Accelerated Poisson-Boltzmann calcu-

lations for static and dynamic systems. J. Comput. Chem. 23, 1244–53

(2002).

374. Im, W., Beglov, D. & Roux, B. Continuum solvation model: computation

of electrostatic forces from numerical solutions to the Poisson-Boltzmann

equation. Comput. Phys. Comm. 111, 59–75 (1998).

375. Humphrey, W., Dalke, A. & Schulten, K. VMD: Visual molecular dynam-

ics. J. Mol. Graph. 14, 33–8 (1996).

376. Simmerling, C., Strockbine, B. & Roitberg, A. All-atom structure predic-

tion and folding simulations of a stable protein. J. Am. Chem. Soc. 124,

11258–9 (2002).

377. Felts, A.K., Harano, Y., Gallicchio, E. & Levy, R.M. Free energy surfaces

of β-hairpin and α-helical peptides generated by replica exchange molecular

dynamics with the AGBNP implicit solvent model. Proteins: Struct.

Funct. Genet. 56, 310–21 (2004).

378. Formaneck, M.S. & Cui, Q. The use of a generalized born model for the

analysis of protein conformational transitions: a comparative study with

explicit solvent simulations for chemotaxis Y protein (CheY). J. Comput.

Chem. 27, 1923–43 (2006).

379. Zimmermann, K., Hagedorn, H., Heuck, C., Hinrichsen, M. & Ludwig, H.

The ionic properties of the filamentous bacteriophages Pf1 and fd. J. Biol.

Chem. 261, 1653–5 (1986).

380. Zagrovic, B. & van Gunsteren, W.F. Comparing atomistic simulation data

with the NMR experiment: how much can NOEs actually tell us? Proteins:

Struct. Funct. Bioinform. 63, 210–8 (2006).

381. Brunger, A.T., Clore, G.M., Gronenborn, A.M., Saffrich, R. & Nilges, M.

Assessing the quality of solution nuclear magnetic resonance structures by

complete cross-validation. Science 261, 328–31 (1993).

382. Brunger, A.T. Assessment of phase accuracy by cross validation: the free

R value. Methods and applications. Acta Crystallogr. D Biol. Crystallogr.

49, 24–36 (1993).

383. Burling, F.T., Weis, W.I., Flaherty, K.M. & Brunger, A.T. Direct obser-

vation of protein solvation and discrete disorder with experimental crys-

tallographic phases. Science 271, 72–7 (1996).

384. Brunger, A.T. Free R value: a novel statistical quantity for assessing the

accuracy of crystal structures. Nature 355, 472–5 (1992).

385. Vendruscolo, M. Determination of conformationally heterogeneous states

of proteins. Curr. Opin. Struct. Biol. 17, 15–20 (2007).

386. Burgi, R., Pitera, J. & van Gunsteren, W.F. Assessing the effect of con-

formational averaging on the measured values of observables. J. Biomol.

NMR 19, 305–20 (2001).

387. Choy, W.Y., Mulder, F.A., Crowhurst, K.A., Muhandiram, D.R., Millett,

I.S., Doniach, S., Forman-Kay, J.D. & Kay, L.E. Distribution of molecular

size within an unfolded state ensemble using small-angle X-ray scattering

and pulse field gradient NMR techniques. J. Mol. Biol. 316, 101–12 (2002).

388. McHaourab, H.S., Lietzow, M.A., Hideg, K. & Hubbell, W.L. Motion of

spin-labeled side chains in T4 lysozyme. Correlation with protein structure

and dynamics. Biochemistry 35, 7692–704 (1996).

389. Langen, R., Oh, K.J., Cascio, D. & Hubbell, W.L. Crystal structures of

spin labeled T4 lysozyme mutants: implications for the interpretation of

EPR spectra in terms of structure. Biochemistry 39, 8396–405 (2000).

390. Jiao, D., Barfield, M., Combariza, J.E. & Hruby, V.J. Ab initio molecular

orbital studies of the rotational barriers and the sulfur-33 and carbon-13

chemical shieldings for dimethyl disulfide. J. Am. Chem. Soc. 114, 3639–43

(1992).

391. Altenbach, C., Oh, K.J., Trabanino, R.J., Hideg, K. & Hubbell, W.L. Es-

timation of inter-residue distances in spin labeled proteins at physiological

temperatures: experimental strategies and practical limitations. Biochem-

istry 40, 15471–82 (2001).

392. Rabenstein, M.D. & Shin, Y.K. Determination of the distance between

two spin labels attached to a macromolecule. Proc. Natl. Acad. Sci. U. S.

A. 92, 8239–43 (1995).

393. Svergun, D., Barberato, C. & Koch, M.H.J. CRYSOL - a program to eval-

uate X-ray solution scattering of biological macromolecules from atomic

coordinates. J. Appl. Cryst. 28, 768–73 (1995).

394. Lipari, G. & Szabo, A. Model-free approach to the interpretation of nuclear

magnetic resonance relaxation in macromolecules. 1. Theory and range of

validity. J. Am. Chem. Soc. 104, 4546–59 (1982).

395. Lipari, G. & Szabo, A. Model-free approach to the interpretation of nu-

clear magnetic resonance relaxation in macromolecules. 2. Analysis of ex-

perimental results. J. Am. Chem. Soc. 104, 4559–70 (1982).

396. Woessner, D.E. Nuclear spin relaxation in ellipsoids undergoing rotational

Brownian motion. J. Chem. Phys. 37, 647–54 (1962).

397. Peng, J.W. & Wagner, G. Mapping of the spectral densities of nitrogen-

hydrogen bond motions in Eglin c using heteronuclear relaxation experi-

ments. Biochemistry 31, 8571–86 (1992).

398. Solomon, I. Relaxation processes in a system of two spins. Phys. Rev. 99,

559–65 (1955).

399. Bloembergen, N. Proton relaxation times in paramagnetic solutions. J.

Chem. Phys. 27, 572–3 (1957).

400. Zagrovic, B., Lipfert, J., Sorin, E.J., Millett, I.S., van Gunsteren, W.F.,

Doniach, S. & Pande, V.S. Unusual compactness of a polyproline type II

structure. Proc. Natl. Acad. Sci. U. S. A. 102, 11698–703 (2005).

401. Zagrovic, B. & Pande, V.S. How does averaging affect protein structure

comparison on the ensemble level? Biophys. J. 87, 2240–6 (2004).

402. Bussell, R., Jr. & Eliezer, D. Residual structure and dynamics in

Parkinson’s disease-associated mutants of α-synuclein. J. Biol. Chem. 276,

45996–6003 (2001).

403. Necula, M., Chirita, C.N. & Kuret, J. Rapid anionic micelle-mediated

α-synuclein fibrillization in vitro. J. Biol. Chem. 278, 46674–80 (2003).

404. Ahmad, M.F., Ramakrishna, T., Raman, B. & Rao, C.M. Fibrillogenic

and non-fibrillogenic ensembles of SDS-bound human α-synuclein. J. Mol.

Biol. 364, 1061–72 (2006).

405. Antony, T., Hoyer, W., Cherny, D., Heim, G., Jovin, T.M. & Subrama-

niam, V. Cellular polyamines promote the aggregation of α-synuclein. J.

Biol. Chem. 278, 3235–40 (2003).

406. Fernandez, C.O., Hoyer, W., Zweckstetter, M., Jares-Erijman, E., Subramaniam,

V., Griesinger, C. & Jovin, T.M. NMR of α-synuclein-polyamine

complexes elucidates the mechanism and kinetics of induced aggregation.

EMBO J. 23, 2039–46 (2004).

407. Uversky, V.N., Li, J. & Fink, A.L. Evidence for a partially folded intermediate

in α-synuclein fibril formation. J. Biol. Chem. 276, 10737–44 (2001).

408. Eliezer, D., Chung, J., Dyson, H. & Wright, P. Native and non-native sec-

ondary structure and dynamics in the pH 4 intermediate of apomyoglobin.

Biochemistry 39, 2894–901 (2000).

409. Katou, H., Hoshino, M., Kamikubo, H., Batt, C.A. & Goto, Y. Native-like

β-hairpin retained in the cold-denatured state of bovine β-lactoglobulin.

J. Mol. Biol. 310, 471–84 (2001).

410. Birkett, N. Studies of the formation and characterisation of amyloid fibrils

by the PI3-SH3 domain. PhD Thesis (2007).

411. Ahn, H.C., Le, Y.T., Nagchowdhuri, P.S., Derose, E.F., Putnam-Evans, C.,

London, R.E., Markley, J.L. & Lim, K.H. NMR characterizations of an

amyloidogenic conformational ensemble of the PI3K SH3 domain. Protein

Sci. 15, 2552–7 (2006).

412. Xu, W., Harrison, S.C. & Eck, M.J. Three-dimensional structure of the

tyrosine kinase c-Src. Nature 385, 595–602 (1997).

413. Noble, M.E., Musacchio, A., Saraste, M., Courtneidge, S. & Wierenga,

R. Crystal structure of the SH3 domain in human Fyn; comparison of

the three-dimensional structures of SH3 domains in tyrosine kinases and

spectrin. EMBO J. 12, 2617–24 (1993).

414. Martinez, J.C., Pisabarro, M.T. & Serrano, L. Obligatory steps in protein

folding and the conformational diversity of the transition state. Nat. Struct.

Mol. Biol. 5, 721–9 (1998).

415. Booth, D.R., Sunde, M., Bellotti, V., Robinson, C.V., Hutchinson, W.L.,

Fraser, P.E., Hawkins, P.N., Dobson, C.M., Radford, S.E., Blake, C.C.

& Pepys, M.B. Instability, unfolding and aggregation of human lysozyme

variants underlying amyloid fibrillogenesis. Nature 385, 787–93 (1997).

416. Horwich, A.L. & Weissman, J.S. Deadly conformations-protein misfolding

in prion disease. Cell 89, 499–510 (1997).

417. Uversky, V.N. & Fink, A.L. Conformational constraints for amyloid fibril-

lation: the importance of being unfolded. Biochim. Biophys. Acta 1698,

131–53 (2004).

418. Liu, K., Cho, H.S., Lashuel, H.A., Kelly, J.W. & Wemmer, D.E. A glimpse

of a possible amyloidogenic intermediate of transthyretin. Nat. Struct.

Biol. 7, 754–7 (2000).

419. McParland, V.J., Kalverda, A.P., Homans, S.W. & Radford, S.E. Struc-

tural properties of an amyloid precursor of β2-microglobulin. Nat. Struct.

Biol. 9, 326–31 (2002).
