PART I - catalogimages.wiley.com · X-RAY CRYSTALLOGRAPHY OF BIOLOGICAL MACROMOLECULES:...

JWST259-01 JWST259-Ruso Printer: Yet to Come December 27, 2012 13:37 8.5in×11in PART I 1 COPYRIGHTED MATERIAL


1 X-RAY CRYSTALLOGRAPHY OF BIOLOGICAL MACROMOLECULES: FUNDAMENTALS AND APPLICATIONS

Antonio L. Llamas-Saiz and Mark J. van Raaij

1.1 INTRODUCTION

X-ray crystallography is a powerful technique for determining the three-dimensional structure of any kind of molecule at atomic resolution, including biological macromolecules such as proteins and nucleic acids, and any complex between them or with smaller compounds such as ligands, drugs, cofactors, or inhibitors. The experimental result provided by this technique is the three-dimensional electron density map of the crystal subjected to the diffraction experiment. Into this detailed and “amplified” image of the crystal, an atomic model of the molecules present can be built. The theoretical background of X-ray crystallography is very broad, covering disciplines such as mathematics, physics, chemistry, and even biology. The experimental setups can also be very complicated, like the beam lines at synchrotron installations, which include optical and experimental hutches full of dedicated equipment. In this chapter, we cover the main concepts needed to understand the basic theory behind a crystal structure determination by X-ray diffraction and outline, from a practical viewpoint, the principal steps involved, in order to facilitate the interpretation of the structure determination process and of the final results obtained.

1.2 FUNDAMENTALS OF X-RAY DIFFRACTION

1.2.1 X-Ray Radiation and Interaction with Matter

X-rays consist of photons from the region of the electromagnetic spectrum with energies above ultraviolet light and below gamma radiation. The energy ranges from approximately 0.12 to 120 keV, corresponding to wavelengths between 100 and 0.1 Å, respectively (1 Å equals 0.1 nm). The most energetic X-rays, known as hard X-rays, are the ones used in crystallography for single crystal structure determination because of their penetrating power and because their wavelengths, which vary from 2 to 0.5 Å, are similar to the shortest interatomic distances present in solid matter [1].
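The energy and wavelength ranges quoted above are linked by λ = hc/E. The following minimal sketch, which assumes the rounded constant hc ≈ 12.39842 keV·Å and uses illustrative function names that are not part of the chapter, reproduces the conversion:

```python
# Convert between photon energy and wavelength using lambda = h*c / E.
H_C_KEV_ANGSTROM = 12.39842  # h*c expressed in keV * Angstrom (rounded)


def wavelength_angstrom(energy_kev: float) -> float:
    """Wavelength in Angstrom for a photon energy given in keV."""
    return H_C_KEV_ANGSTROM / energy_kev


def energy_kev(wavelength: float) -> float:
    """Photon energy in keV for a wavelength given in Angstrom."""
    return H_C_KEV_ANGSTROM / wavelength


if __name__ == "__main__":
    # Limits of the X-ray range quoted in the text, plus Cu K-alpha (~8.05 keV).
    for e in (0.12, 8.048, 120.0):
        print(f"{e:8.3f} keV -> {wavelength_angstrom(e):8.4f} A")
```

The Cu Kα line at about 8.05 keV, a common laboratory source, falls at roughly 1.54 Å, well inside the 2 to 0.5 Å window used for single crystal work.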

X-rays interact almost exclusively with the electrons of matter. They do this in different ways: photoelectric absorption, Compton scattering, and Thomson scattering. Thomson scattering, also called coherent or elastic scattering, is predominant in the X-ray diffraction pattern obtained from a crystal. It is a pure scattering interaction and deposits no energy in the scattering material. In the classical free electron model developed by J.J. Thomson in 1898, the charged particle interacts with the X-ray electromagnetic field and starts to oscillate. Consequently, it emits secondary radiation of the same wavelength (same energy) in all directions. The intensity distribution as a function of the scattering angle (the angle between incident and scattered radiation) found using this classical model is comparable to that obtained from quantum mechanical calculations. As we can consider the electrons to be the only X-ray scatterers in a crystal, diffraction should reveal the distribution of electrons, or the electron density, of the atoms or molecules of that crystal.

1.2.2 Crystals and Symmetry

Why do we need crystals? Reconstructing the image of a single molecule using X-rays is still not possible, mainly for two reasons. The first is that there is no easy way to focus scattered X-ray beams with lenses. The second is that a single molecule scatters X-rays very weakly. Having said that, X-ray diffraction from single molecules using X-ray lasers is under development [2]. Both limitations can be overcome by the use of crystals. The crystalline state concentrates the intensity scattered by every irradiated molecule into small and well-defined regions of space (i.e., it generates a diffraction pattern), thus increasing the local intensities many-fold and facilitating their measurement. After the “phase problem” has been solved (see Sections 1.2.7 and 1.3.4), the diffracted beams are recombined by the crystallographer using crystallographic software. This is analogous to what the lenses of an optical microscope do in real time.

Proteins in Solution and at Interfaces: Methods and Applications in Biotechnology and Materials Science, First Edition. Edited by Juan M. Ruso and Angel Pineiro. © 2013 John Wiley & Sons, Inc. Published 2013 by John Wiley & Sons, Inc.

The crystalline state is defined by the repetition of a single elemental unit (motif) by means of identical translations. In practice, the final result can be one-dimensional crystals (fibers, as used in fiber diffraction), two-dimensional crystals (a single layer of ordered molecules, as used in electron diffraction), or three-dimensional crystals (used in single crystal structure determination). That means that the crystal is the convolution of a repeating motif with a periodic lattice. A three-dimensional crystal is formed by a motif that repeats in a perfectly regular pattern in three dimensions. Choosing any arbitrary point in the pattern and all the equivalent points related by translation, a three-dimensional lattice can be defined in which all lattice points have exactly the same environment. A fundamental difference between a crystal (the pattern) and its lattice is that the former is a continuous medium (like its electron density), whereas the lattice is discontinuous. The point lattice is determined by all the points that correspond to successive repetitions of identical crystal components. A lattice may have additional symmetry operators besides its own translation operators and the symmetry operators belonging to the point group or space group of the corresponding crystal. For example, all crystallographic lattices are centrosymmetric.

A lattice plane can be defined for every set of three noncollinear lattice points. All the equivalent planes in the lattice, parallel to each other and with the same periodic repetition, constitute the associated family of lattice planes. A family is unambiguously named by its three Miller indices (hkl), which correspond to the number of times that the planes of the family intersect each of the three unit cell vectors a, b, and c, respectively.

The unit cell is the parallelepiped built on the basis vectors, a, b, and c, of a crystal lattice, which can be selected in many different ways. The most convenient way is to choose the volume enclosed by the set of three noncoplanar lattice vectors with the shortest possible lengths, sorted in a “right-handed” way. A primitive unit cell, containing only one lattice point, can always be defined. However, for symmetry reasons, basis vectors defining nonprimitive unit cells (i.e., face- or body-centered) are sometimes used instead, because they provide a more convenient coordinate system and set of basis vectors.

TABLE 1.1 Crystal Systems and Bravais Lattices

| Crystal System | Lattice Centering Symbol | Lattice Symmetry (a) | Conditions Imposed by Symmetry on Unit Cell Geometry |
|---|---|---|---|
| Triclinic | P | −1 (Ci) | None |
| Monoclinic | P, C | 2/m (C2h) | Unique axis b: α = γ = 90° |
| Orthorhombic | P, C, F, I | mmm (D2h) | α = β = γ = 90° |
| Tetragonal | P, I | 4/mmm (D4h) | a = b; α = β = γ = 90° |
| Trigonal (b) | R | −3m (D3d) | a = b = c; α = β = γ |
| Hexagonal | P (b) | 6/mmm (D6h) | a = b; α = β = 90°; γ = 120° |
| Cubic | P, F, I | m−3m (Oh) | a = b = c; α = β = γ = 90° |

(a) Hermann–Mauguin (and Schoenflies) symbols.
(b) The primitive hexagonal lattice is common to the trigonal and hexagonal crystal systems.

In three dimensions, seven kinds of lattices, or crystal systems, are possible: triclinic, monoclinic, orthorhombic, tetragonal, trigonal, hexagonal, and cubic (Table 1.1). The combination of the seven crystal systems and the possibility of choosing nonprimitive unit cells gives rise to 14 Bravais lattices.

The classification of a crystal into a crystal system is always determined by the symmetry of the lattice (the Laue class to which the crystal structure belongs, see next paragraph) and not by the relationships between the unit cell metric values. For example, a tetragonal unit cell will always have a = b and α = β = γ = 90°; the c axis can take any value, in most cases different from a and b. It may be equal to them just by chance, and the crystal then still belongs to the tetragonal system and not to the cubic system.

By definition, symmetry point groups apply to any object where at least one point remains invariant after the application of all its symmetry operations. Crystallographic point groups play their role in three-dimensional lattices (not in three-dimensional space in general), and in this particular case the rotations and rotoinversions allowed are restricted to 1, 2, 3, 4, 6, and −1, −2 (= m), −3, −4, −6, respectively. There are 32 crystallographic point groups (Table 1.2), also known as crystal classes. The Laue classes correspond to the 11 centrosymmetric crystallographic point groups. On the other hand, the crystal classes that include inversion centers or mirror planes are not allowed for crystals of enantiomerically pure substances, like the biological macromolecules. Crystals of chiral molecules display only one of the 11 enantiomorphic point groups (Table 1.2).

TABLE 1.2 The 32 Crystallographic Point Groups

| Laue Classes | Noncentrosymmetric Groups Having the Same Laue Class |
|---|---|
| −1 | 1 |
| 2/m | 2, m |
| mmm | 222, 2mm |
| −3 | 3 |
| −3m | 32, 3m |
| 4/m | 4, −4 |
| 4/mmm | 422, 4mm, −42m |
| 6/m | 6, −6 |
| 6/mmm | 622, 6mm, −62m |
| m−3 | 23 |
| m−3m | 432, −43m |

The 11 enantiomorphic point groups (set in bold in the original table) are 1, 2, 222, 3, 32, 4, 422, 6, 622, 23, and 432.

The combination of the 32 crystallographic point groups with the 14 Bravais lattices gives rise to 73 symmorphic space groups. In a symmorphic space group, all generating symmetry operations, with the exception of the lattice translations, leave at least one common point fixed. To complete the 230 space groups possible in three-dimensional crystal patterns, another kind of symmetry element must be taken into account: screw axes and glide planes, in which rotations and reflections, respectively, are combined with translational displacements.

Crystallographic space groups apply to infinite periodic patterns. Therefore, according to the previous description, the symmetry elements of the space groups are translations, the symmetry elements of the crystallographic point groups, screw axes, and glide planes. In any case, the space group of a crystal structure determines its point group uniquely, but not vice versa. For a complete description of all symmetry elements compatible with three-dimensional periodic patterns (crystals), see the International Tables for Crystallography, Volume A (Reference 3). Space groups with mirror planes and/or inversion centers are not allowed for crystals of biological macromolecules, like proteins or nucleic acids, because of the enantiopure nature of these molecules. This means that only 65 space groups are available for the enantiomorphic crystal structures of biological macromolecules.

The symmetry elements of the crystal space group operate inside the crystal unit cell; therefore, it is possible to define an “asymmetric unit” of the unit cell. The asymmetric unit is the independent fraction of the unit cell that generates the whole crystal structure once all the symmetry operations of the space group are applied. The structural description of this asymmetric unit plus the indication of the corresponding space group is all that is needed to represent the complete crystal structure (and is thus what is normally used by crystallographers and crystallographic programs, and what is deposited in databases such as the Protein Data Bank).
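How an asymmetric unit plus a space group regenerates the unit cell contents can be sketched in a few lines. The example below assumes space group P2₁ with unique axis b, whose two general positions are (x, y, z) and (−x, y + 1/2, −z); the atom labels and fractional coordinates are hypothetical and serve only as an illustration.

```python
# Expand an asymmetric unit with the symmetry operators of space group P2(1)
# (unique axis b). Coordinates are fractional; the atom list is hypothetical.

# Each operator is (rotation matrix, translation vector).
P21_OPERATORS = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),    # x, y, z
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),  # -x, y+1/2, -z
]


def apply_operator(operator, xyz):
    """Apply one symmetry operator and wrap the result into the unit cell."""
    rotation, translation = operator
    moved = (
        sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return tuple(round(v, 6) % 1.0 for v in moved)


def expand_to_unit_cell(asymmetric_unit, operators):
    """Return every symmetry copy of every atom as (label, op_index, xyz)."""
    return [
        (label, n, apply_operator(op, xyz))
        for label, xyz in asymmetric_unit
        for n, op in enumerate(operators)
    ]


ASYMMETRIC_UNIT = [("CA", (0.10, 0.20, 0.30)), ("N", (0.15, 0.25, 0.35))]

if __name__ == "__main__":
    for label, n, xyz in expand_to_unit_cell(ASYMMETRIC_UNIT, P21_OPERATORS):
        print(label, n, xyz)
```

Applying the screw operator twice returns the starting position shifted by one full lattice translation along b, which is why wrapping the coordinates into the cell brings the atom back to where it started.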

1.2.3 Diffraction by Crystals

Crystals are made up of atoms; therefore, let us first consider the X-ray scattering by the atomic electron cloud (considered spherical in shape in a first approximation). The scattering amplitude of an atom is called the atomic scattering factor, or form factor, f. It expresses the scattering power of one atom relative to that of a single free electron, and it is calculated and averaged for spherical electron density distributions.

The values for f are tabulated in the International Tables for Crystallography, Volume C, Table 6.1.1.1, page 555, in Reference 4, for each atom type as a function of sin θ/λ. Usually its value is calculated using Equation 1.1 and the tabulated set of nine Cromer–Mann coefficients ai, bi (i = 1 to 4), and c in a parameterization of the nondispersive part of the atomic scattering factor for each atom (see Table 6.1.1.4 in Reference 4). This expression is very convenient for calculation in crystal structure software suites. These values are real numbers if the X-ray wavelength is not close to an absorption edge of the atom. Near the absorption edges, the atomic scattering factors become complex numbers as expressed in Equation 1.2, where f° is the “normal” atomic scattering factor, f′ is the real part of the correction, and f″ is the imaginary part, which is always π/2 ahead in phase of f° [5]. The anomalous dispersion (or, more rigorously, resonant scattering) effect, far from being an inconvenience, is a very useful tool for solving crystal structures of macromolecules (see the description of Friedel's law below and Section 1.3.4.3).

$$f^{\circ}(\sin\theta/\lambda) = \sum_{i=1}^{4} a_i\, e^{-b_i (\sin\theta/\lambda)^2} + c \tag{1.1}$$

$$f = f^{\circ} + f' + i \cdot f'' \tag{1.2}$$

$$f_B = f \cdot e^{-B(\sin\theta/\lambda)^2}, \qquad B = 8\pi^2 U^2 \tag{1.3}$$

The scattering amplitude of an atom always depends on the angle: it decays with increasing scattering angle for two reasons. The first is interference between the rays scattered from different regions of the atomic electron cloud. In the incident beam direction (θ = 0), all electrons scatter in phase, there is no decay of this kind, and the atomic scattering factor is identical to the number of electrons in the atom. This type of decay is reflected in the tabulated values and represented with solid lines in Figure 1.1. The second source of decay is atomic displacement, which makes the apparent size of an atom during the X-ray exposure time larger than it would be at rest (dashed line in Figure 1.1 and Equation 1.3). The spreading of the atomic electron cloud may be due to temperature-dependent atomic vibrations around the equilibrium position (dynamic disorder) or to equivalent atoms in different unit cells lying at slightly different equilibrium positions. The latter is called static disorder and is temperature-independent. During a typical X-ray diffraction experiment, the measured intensities of the diffracted beams are averaged in space over all the unit cells that diffract simultaneously. They are also averaged in time over the data acquisition period, which is much longer than the atomic vibration periods.

FIGURE 1.1 Schematic representation of the theoretical atomic scattering factors for C, N, and O atoms at rest as a function of the scattering angle (solid lines). Faster decay is observed for vibrating atoms and plotted for the C atom (dashed line) when the atomic displacement parameter, U or B, is different from zero. (Vertical axis: f; horizontal axis: sin θ/λ.)
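Equations 1.1 and 1.3 can be evaluated directly. The sketch below uses the nine Cromer–Mann coefficients for carbon as they are commonly quoted; treat the numbers as an assumption and check them against Table 6.1.1.4 of Reference 4 before any real use. The function names and the B value of 20 Å² are illustrative choices, not taken from the chapter.

```python
import math

# Cromer-Mann coefficients for neutral carbon (assumed values; verify against
# International Tables for Crystallography, Volume C, Table 6.1.1.4).
CARBON = {
    "a": (2.31000, 1.02000, 1.58860, 0.865000),
    "b": (20.8439, 10.2075, 0.568700, 51.6512),
    "c": 0.215600,
}


def f0(s: float, coeffs: dict) -> float:
    """Equation 1.1: scattering factor at rest, s = sin(theta)/lambda in 1/A."""
    gaussians = sum(a * math.exp(-b * s * s) for a, b in zip(coeffs["a"], coeffs["b"]))
    return gaussians + coeffs["c"]


def f_displaced(s: float, coeffs: dict, b_factor: float) -> float:
    """Equation 1.3: scattering factor damped by the displacement parameter B."""
    return f0(s, coeffs) * math.exp(-b_factor * s * s)


if __name__ == "__main__":
    for s in (0.0, 0.1, 0.3, 0.5, 0.7):
        print(f"s={s:.1f}  f0={f0(s, CARBON):.3f}  fB={f_displaced(s, CARBON, 20.0):.3f}")
```

At s = 0 the sum of the coefficients comes out very close to 6, the number of electrons in carbon, which matches the statement that all electrons scatter in phase in the forward direction.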

When the X-ray scattering by the whole crystal is considered, periodicity imposes discontinuity on the resulting diffraction pattern. All scattered intensity is concentrated and magnified in well-defined directions in space where constructive interference of the waves occurs, and it is recorded as sharp spots on the X-ray detector. The conditions for constructive interference of the diffracted beams are defined by Bragg's law (Fig. 1.2) or the equivalent Laue equations. To obtain constructive interference between the waves reflected by successive planes, the equality 2d sin θ = nλ must hold, n being any positive integer (this is the mathematical expression of Bragg's law). Sometimes it is useful to express this relation in terms of the corresponding family of planes (by use of the Miller indices) instead of the interplanar distance d. This gives rise to the set of Laue equations in three-dimensional space:

$$\mathbf{a} \cdot (\mathbf{s} - \mathbf{s}_0) = h\lambda$$

$$\mathbf{b} \cdot (\mathbf{s} - \mathbf{s}_0) = k\lambda$$

$$\mathbf{c} \cdot (\mathbf{s} - \mathbf{s}_0) = l\lambda,$$

where a, b, and c are the unit cell vectors, h, k, and l are the Miller indices of the corresponding family of planes, and s0 and s are the unit vectors along the incident and diffracted directions, respectively.

FIGURE 1.2 Geometrical representation of Bragg's law. The path difference between the X-ray waves reflected by the first and second horizontal crystal planes of atoms, separated by a distance d, is equal to 2d sin θ.
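As a quick numerical companion to Bragg's law, the sketch below solves 2d sin θ = nλ for either the angle or the spacing. The function names are illustrative; returning `None` when nλ/(2d) exceeds 1 is a design choice made here to flag spacings that cannot diffract at the given wavelength.

```python
import math


def bragg_angle_deg(d: float, wavelength: float, n: int = 1):
    """Bragg angle theta in degrees for spacing d (same length unit as wavelength).

    Returns None when no reflection is possible, i.e. n*lambda/(2d) > 1.
    """
    x = n * wavelength / (2.0 * d)
    if x > 1.0:
        return None
    return math.degrees(math.asin(x))


def spacing_from_angle(theta_deg: float, wavelength: float, n: int = 1) -> float:
    """Interplanar spacing d that diffracts at the Bragg angle theta."""
    return n * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


if __name__ == "__main__":
    wavelength = 1.0  # Angstrom
    for d in (10.0, 3.0, 2.0, 0.4):
        print(d, bragg_angle_deg(d, wavelength))
```

The smallest spacing that can ever be observed is λ/2 (θ = 90°), which is one reason why the wavelength has to be comparable to interatomic distances.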

1.2.4 Real and Reciprocal Space

Given any crystal lattice in real space, it is always possible to construct its one-to-one related counterpart in reciprocal space, the reciprocal crystal lattice. The reciprocal lattice is a very convenient tool for constructing and analyzing the X-ray diffraction pattern. It is obtained by positioning its lattice points along the direction perpendicular to each family of real lattice planes and at a distance from the origin, d*, equal to the inverse of the interplanar distance corresponding to this family, d* = 1/d. According to this construction, each reciprocal lattice point is uniquely associated with a family of lattice planes in real space. Therefore, the Miller indices of this family also correspond to the coordinates of one lattice point in the three-dimensional reciprocal lattice.
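The construction d* = 1/d can be carried out numerically for any unit cell. The sketch below is one possible implementation, assuming a conventional Cartesian setting (a along x, b in the xy plane) and the standard relations a* = (b × c)/V, b* = (c × a)/V, c* = (a × b)/V; the function names and example cells are hypothetical.

```python
import math


def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian basis vectors and volume from cell lengths and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    va = (a, 0.0, 0.0)
    vb = (b * cg, b * sg, 0.0)
    vc = (c * cb, c * (ca - cb * cg) / sg, volume / (a * b * sg))
    return va, vb, vc, volume


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def reciprocal_vectors(va, vb, vc, volume):
    """a*, b*, c* such that a*.a = 1, a*.b = 0, and so on."""
    pairs = ((vb, vc), (vc, va), (va, vb))
    return tuple(tuple(x / volume for x in cross(u, v)) for u, v in pairs)


def d_hkl(hkl, cell):
    """Interplanar spacing d = 1/|h a* + k b* + l c*| for the family (hkl)."""
    va, vb, vc, volume = cell_vectors(*cell)
    ra, rb, rc = reciprocal_vectors(va, vb, vc, volume)
    h, k, l = hkl
    g = [h * ra[i] + k * rb[i] + l * rc[i] for i in range(3)]
    return 1.0 / math.sqrt(sum(x * x for x in g))


if __name__ == "__main__":
    orthorhombic = (10.0, 20.0, 30.0, 90.0, 90.0, 90.0)
    hexagonal = (10.0, 10.0, 15.0, 90.0, 90.0, 120.0)
    print(d_hkl((1, 1, 1), orthorhombic), d_hkl((1, 0, 0), hexagonal))
```

For the hexagonal example the (100) spacing is a·√3/2 rather than a, a reminder that reciprocal axes are perpendicular to lattice planes and not, in general, parallel to the real cell axes.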

As Bragg's law states, there is an inverse relation between the diffraction angle θ and the interplanar distance d. Reflections measured at higher diffraction angles correspond to shorter values of d and therefore contain structural information about the electron density distribution at higher resolution. More detail can be seen in electron density maps calculated with data measured up to higher diffraction angles.


1.2.5 Structure Factors

The structure factor represents the total wave scattered by all the electrons in the whole unit cell. The effective number of scattering electrons is called the structure factor, F, because it depends on the structure, that is, on the electron density distribution of the atoms in the unit cell. Because of the regular periodicity of crystals, it also depends on the scattering direction. The structure factor can be regarded as the sum of the scattering by the atoms in the unit cell, taking into consideration their positions and the corresponding phase differences between the scattered waves.

$$F(h,k,l) = \sum_{j=1}^{\text{atoms}} f_j \exp\left[2\pi i\,(h x_j + k y_j + l z_j)\right] \tag{1.4}$$

$$F(h,k,l) = |F(h,k,l)|\, e^{i\alpha(h,k,l)}$$

$$F_{hkl} = \sum_{j=1}^{\text{atoms}} f_j\,(a_j + i b_j) = A + iB.$$

It is a complex (vectorial) quantity and can therefore be represented in different ways, for example, with modulus and direction (phase) or as a complex number (real and imaginary parts), as shown in the Argand diagram (Fig. 1.3). It is important not to confuse the “direction” of the structure factor vector in the complex plane, which indicates the phase of the structure factor, with the “direction” of the diffracted X-ray beam in real space, which is determined by the crystal lattice geometry and the particular setup of the diffraction experiment.

FIGURE 1.3 Argand diagram for the representation of complex magnitudes (like the structure factors) in the complex plane. Real and imaginary components are located along the horizontal (Re) and vertical (Im) axes, respectively. The diagram shows the vector Fhkl with |F| = (A² + B²)^½, A = |F| cos φ, B = |F| sin φ, and φ = tan⁻¹(B/A).
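Equation 1.4 and its two representations (modulus and phase, or A + iB) map naturally onto complex arithmetic. The following sketch assumes point-like atoms with constant scattering factors, ignoring the angular fall-off of f for simplicity; the atom lists are hypothetical.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Equation 1.4: sum of f_j * exp[2*pi*i*(h*x_j + k*y_j + l*z_j)].

    `atoms` is a list of (f, (x, y, z)) with fractional coordinates.
    """
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, (x, y, z) in atoms
    )


def amplitude_and_phase(F: complex):
    """Modulus |F| and phase angle in degrees, as drawn in an Argand diagram."""
    return abs(F), math.degrees(cmath.phase(F))


# Hypothetical two-atom structure (constant f values stand in for C and O).
ATOMS = [(6.0, (0.10, 0.20, 0.30)), (8.0, (0.40, 0.15, 0.70))]
# Two identical atoms related by a body-centering translation.
BODY_CENTERED = [(6.0, (0.0, 0.0, 0.0)), (6.0, (0.5, 0.5, 0.5))]

if __name__ == "__main__":
    F = structure_factor((1, 2, 3), ATOMS)
    print("A =", F.real, "B =", F.imag, "|F|, phase =", amplitude_and_phase(F))
    print("I-centered (100):", abs(structure_factor((1, 0, 0), BODY_CENTERED)))
```

The body-centered example shows how atomic positions alone can cancel a reflection: for h + k + l odd the two contributions are exactly out of phase.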

The diffracted intensity is proportional to the square of the modulus of the structure factor, I ∝ |Fhkl|². When the anomalous dispersion effect is negligible, the atomic scattering factors, f, of all atoms are real, and accordingly |Fhkl|² = |F−h−k−l|², that is, the intensities of the hkl and −h−k−l reflections (a Friedel pair) are equal. This is known as Friedel's law. The law does not hold for noncentrosymmetric crystals containing atoms that show anomalous dispersion, because of the imaginary part of the atomic scattering factors, f″. The difference between these intensities becomes larger when the X-ray wavelength used is close to an absorption edge of a particular atom in the crystal. Synchrotron radiation, which is a tunable X-ray source, may be used for this purpose. When the differences in intensity between the two members of the Friedel pairs are clearly measured, the diffraction pattern reveals the symmetry of the actual point group of the crystal.
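The breakdown of Friedel's law can be checked numerically. In the sketch below, a hypothetical noncentrosymmetric three-atom structure is evaluated twice: once with purely real scattering factors and once with a complex factor for the heavy atom. The values f′ = −8 and f″ = 4 are illustrative assumptions, not tabulated data.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Sum of f_j * exp[2*pi*i*(h*x_j + k*y_j + l*z_j)]; f_j may be complex."""
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, (x, y, z) in atoms
    )


def friedel_intensities(hkl, atoms):
    """|F(hkl)|^2 and |F(-h,-k,-l)|^2 for one Friedel pair."""
    h, k, l = hkl
    plus = abs(structure_factor((h, k, l), atoms)) ** 2
    minus = abs(structure_factor((-h, -k, -l), atoms)) ** 2
    return plus, minus


LIGHT_ATOMS = [(6.0, (0.10, 0.20, 0.30)), (8.0, (0.40, 0.15, 0.70))]
HEAVY_SITE = (0.25, 0.60, 0.05)

# Far from an absorption edge: the heavy atom scatters with a real f.
NORMAL = LIGHT_ATOMS + [(34.0, HEAVY_SITE)]
# Near an edge: f = f0 + f' + i*f'' (illustrative f' = -8, f'' = 4).
ANOMALOUS = LIGHT_ATOMS + [(34.0 - 8.0 + 4.0j, HEAVY_SITE)]

if __name__ == "__main__":
    print("real f:   ", friedel_intensities((1, 2, 3), NORMAL))
    print("complex f:", friedel_intensities((1, 2, 3), ANOMALOUS))
```

With real f the two intensities agree to rounding error, whereas the complex f produces a measurable difference; such differences are what anomalous-dispersion phasing methods rely on.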

1.2.6 Fourier Synthesis and Transform

The electron density distribution is a periodic function; therefore, it can be described as a Fourier series.

$$\rho(x,y,z) = \sum_{h'}\sum_{k'}\sum_{l'} C_{h'k'l'}\, e^{2\pi i(h'x + k'y + l'z)}. \tag{1.5}$$

Analogously to the discrete expression for the structure factor (Eq. 1.4), the structure factor can be expressed as a continuous summation (integration) of the electron density distribution over the whole unit cell volume.

$$F_{hkl} = \int_{V} \rho(x,y,z)\, e^{2\pi i(hx + ky + lz)}\, dv. \tag{1.6}$$

Substituting the electron density expression (Eq. 1.5) into Equation 1.6, after some operations it is not difficult to arrive at

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F_{hkl}\, e^{-2\pi i(hx + ky + lz)}, \tag{1.7}$$

where the structure factors are the coefficients of this summation in the Fourier expansion.
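The intermediate operations can be sketched as follows, assuming fractional coordinates so that dv = V dx dy dz and each coordinate runs from 0 to 1. The Kronecker deltas arise because a complex exponential with a nonzero integer frequency integrates to zero over a full period.

```latex
\begin{align*}
F_{hkl} &= \int_V \sum_{h'}\sum_{k'}\sum_{l'} C_{h'k'l'}\,
           e^{2\pi i[(h+h')x + (k+k')y + (l+l')z]}\, dv \\
        &= V \sum_{h'}\sum_{k'}\sum_{l'} C_{h'k'l'}\,
           \delta_{h',-h}\,\delta_{k',-k}\,\delta_{l',-l}
         = V\, C_{-h,-k,-l}, \\[4pt]
\text{so that}\quad C_{h'k'l'} &= \frac{1}{V}\, F_{-h',-k',-l'}, \\[4pt]
\rho(x,y,z) &= \frac{1}{V}\sum_{h'}\sum_{k'}\sum_{l'} F_{-h',-k',-l'}\,
               e^{2\pi i(h'x + k'y + l'z)}
             = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F_{hkl}\,
               e^{-2\pi i(hx + ky + lz)} .
\end{align*}
```

The last step simply relabels the summation indices (h = −h′, and likewise for k and l), which is allowed because each sum runs over all positive and negative integers.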

Each structure factor contains contributions from all atoms in the unit cell. Its value (modulus and phase) is determined by the electron density distribution along the direction perpendicular to its associated diffracting family of planes.
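A one-dimensional toy crystal makes the round trip between Equations 1.4 and 1.7 concrete. The sketch below assumes two hypothetical atoms in a cell of unit length and uses a Gaussian damping term exp(−0.01 h²) as a crude stand-in for the fall-off of the atomic scattering factor; none of these numbers comes from the chapter.

```python
import cmath
import math

# Hypothetical 1D structure: (number of electrons, fractional coordinate x).
ATOMS = [(6.0, 0.20), (8.0, 0.65)]
H_MAX = 15                              # resolution cutoff of the synthesis
GRID = [i / 200 for i in range(200)]    # sampling points along the cell


def structure_factor_1d(h, atoms=ATOMS, damping=0.01):
    """1D analogue of Equation 1.4 with a Gaussian fall-off in h."""
    return sum(
        z * math.exp(-damping * h * h) * cmath.exp(2j * math.pi * h * x)
        for z, x in atoms
    )


def density_1d(x, h_max=H_MAX, length=1.0):
    """1D analogue of Equation 1.7: rho(x) = (1/L) sum_h F_h exp(-2*pi*i*h*x)."""
    total = sum(
        structure_factor_1d(h) * cmath.exp(-2j * math.pi * h * x)
        for h in range(-h_max, h_max + 1)
    )
    # Friedel pairs F_h and F_-h are complex conjugates, so the sum is real.
    return total.real / length


if __name__ == "__main__":
    rho = [density_1d(x) for x in GRID]
    peak = max(range(len(GRID)), key=rho.__getitem__)
    print("highest peak at x =", GRID[peak])
    print("mean density =", sum(rho) / len(rho))
```

Every F_h in the sum contains contributions from both atoms, yet the synthesis places the peaks back at the two atomic positions, and the mean density equals F_0 divided by the cell length.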

The reciprocal space lattice weighted by the corresponding structure factors is the Fourier transform of the electron density distribution of the crystal structure. Therefore, the reciprocal lattice construction is a very convenient representation of the diffraction pattern. To obtain this information, every measured diffracted intensity has to be processed (see Section 1.3.3) to obtain the structure factor modulus after normalization and correction for Lorentz, polarization, and absorption effects.

1.2.7 The Phase Problem

Only the intensities of the diffracted X-rays can be measured, and from them only the value of the amplitude (modulus) can be estimated. No direct information about the phase is recorded in the X-ray diffraction experiment, so the reciprocal space lattice can only be weighted by structure factor amplitudes. To calculate the Fourier transform and obtain the three-dimensional electron density map of the crystal, the value of the phase of each reflection is needed. The crystallographer must obtain the phase angles from further experimentation, as described in Section 1.3.4. This is what is called the “phase problem” in crystal structure determination [6, 7].
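A small numerical experiment illustrates why amplitudes alone are not enough. The sketch below reuses a hypothetical one-dimensional two-atom structure (with an assumed Gaussian damping of the scattering factors) and compares a Fourier synthesis with the true phases against one in which the phases, being unknown, are all set to zero.

```python
import cmath
import math

ATOMS = [(6.0, 0.20), (8.0, 0.65)]   # hypothetical (electrons, fractional x)
H_MAX = 15
GRID = [i / 200 for i in range(200)]


def structure_factor(h, atoms=ATOMS, damping=0.01):
    """1D structure factor with a Gaussian fall-off standing in for f(h)."""
    return sum(
        z * math.exp(-damping * h * h) * cmath.exp(2j * math.pi * h * x)
        for z, x in atoms
    )


def synthesize(coefficients, x):
    """Fourier synthesis rho(x) = sum_h F_h exp(-2*pi*i*h*x), cell length 1."""
    return sum(
        f * cmath.exp(-2j * math.pi * h * x) for h, f in coefficients.items()
    ).real


TRUE_F = {h: structure_factor(h) for h in range(-H_MAX, H_MAX + 1)}
# What the experiment delivers: amplitudes only (phases set to zero here).
AMPLITUDES_ONLY = {h: complex(abs(f), 0.0) for h, f in TRUE_F.items()}


def highest_peak(coefficients):
    """Grid position of the maximum of the synthesized map."""
    return max(GRID, key=lambda x: synthesize(coefficients, x))


if __name__ == "__main__":
    print("true phases:  peak at", highest_peak(TRUE_F))
    print("zero phases:  peak at", highest_peak(AMPLITUDES_ONLY))
```

With the correct phases the highest peak sits on the heavier atom; with zero phases every term adds constructively at the origin, so the map peaks at x = 0 regardless of where the atoms actually are.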

1.3 THE STRUCTURE DETERMINATION PROCESS

Determining the structure of a macromolecule is a process that consists of various steps, comprising many different techniques. The macromolecule, or complex of macromolecules, may have to be expressed in a suitable system if it cannot be isolated from natural sources. For this, a suitable expression vector will need to be constructed, involving genetic engineering and/or cloning. The molecule or complex of interest will have to be isolated and purified, either from its natural source or from the expression host, in sufficient amounts, usually several to many milligrams. Then, many different crystallization trials are performed. When crystals are obtained, they have to be manipulated to allow data collection, and where necessary, heavy atom derivatives may need to be prepared. Cocrystallization or crystal soaking experiments with natural or artificial ligands may also be performed. This part takes place in the laboratory, that is, in vitro. Data processing, structure determination, model construction, refinement, and validation take place in silico, using specialized computer programs developed for this purpose. All these steps are discussed below.

1.3.1 Sample Production and Conditioning

High-quality samples may be obtained by careful purification from natural sources in which the macromolecule or complex of interest is present in sufficient amounts and at high enough concentration to make purification feasible and worthwhile. Examples are myoglobin from sperm whale meat [8], hemoglobin from blood [9], elongation factor Tu and ribosomes from bacteria [10, 11], F1-ATPase from beef hearts [12], and light harvesting center from spinach leaves [13]. However, in many other cases, the macromolecule or complex of interest needs to be expressed in bacteria, yeast, insect cells, or mammalian cells.

1.3.1.1 Protein Expression in Bacteria

For expression in prokaryotic systems (most often the bacterium Escherichia coli), expression vectors have to be constructed. Usually, expression plasmids are used. Plasmids are small circular DNAs that replicate in the bacterium independently of the chromosome. To select for bacteria containing the plasmid during cultivation, plasmids contain a gene encoding a protein that confers resistance to a certain antibiotic. For example, they may carry a gene encoding beta-lactamase, which hydrolyzes ampicillin and carbenicillin. Other commonly used antibiotics are kanamycin, streptomycin, and chloramphenicol, with their corresponding resistance-conferring genes. Positive selection of plasmid-containing bacteria is necessary because, without selection, bacteria without incorporated plasmid will inevitably have a growth advantage owing to their lower energy expenditure and will thus outgrow the plasmid-containing ones.

To allow for replication in bacteria, plasmids must contain an origin of replication, the type of which also determines whether the plasmid is present at higher or lower copy numbers. In many cases, high plasmid copy numbers are desirable, because they facilitate plasmid purification and allow for the expression of high amounts of protein in less time. However, in cases where the protein folding rate is limiting, it may be preferable to have lower plasmid copy numbers, leading to less rapid protein expression and thus giving the expressed proteins more time to fold correctly. Growing the cultures at lower temperatures may also promote correct folding.

The promoter included in the expression vector, and its location upstream of the gene to be expressed, determine the amount of messenger RNA that will be produced and thus, indirectly, the rate and amount of protein that will be expressed. In principle, constitutive expression may be employed, but unless the expressed protein is useful for the expression host (e.g., a chaperone), the extra expenditure of energy to produce the protein will be disadvantageous, and mutants that do not express the protein will accumulate during repeated growth/dilution cycles. Therefore, several inducible expression systems have been developed. Many use the PLAC, PTAC, or PTRC promoters, inducible with the lactose analogue isopropyl-beta-D-thiogalactoside [14, 15]. Another popular system uses the PT7 promoter, the late promoter of bacteriophage T7 [16]. In this case, T7 RNA polymerase first has to be produced, which is usually achieved using an expression host that contains a lambda lysogen called DE3, which encodes T7 RNA polymerase under the control of the isopropyl-beta-D-thiogalactoside-inducible lacUV5 promoter. The T7 RNA polymerase then produces the messenger RNA of interest.

Most inducible systems allow some protein expression even before induction. This means that if the protein or complex to be expressed is toxic to the host cells, a system with strong repression before induction must be used. An example of such a system uses the PBAD promoter of the E. coli arabinose operon and its regulatory gene araC, allowing strong repression in the absence of L-arabinose (and even stronger repression if glucose is added to the culture medium) and high levels of messenger RNA generation after induction with L-arabinose [17]. If the protein to be expressed contains disulfide (cystine) bonds, expression in the reducing bacterial cytoplasm may lead to incorrectly folded protein. In this case, the protein to be expressed may be directed to the less-reducing bacterial periplasm via an N-terminal signal peptide, or bacterial strains mutated in thioredoxin reductase (trxB) and/or glutathione reductase (gor) may be used (like the E. coli Origami strain). For some proteins, coexpression with a specific chaperone or chaperones may be necessary for correct folding [18]. The chaperones may be encoded on the same plasmid or on another plasmid to be cotransformed into the bacteria, or their coding sequence may be integrated into the host genome. Low expression levels may also result when the heterologous gene contains codons that are very rare in the bacterium used. Use of a strain overexpressing rare tRNA species may resolve this problem (for instance, the E. coli Rosetta strain).

When the object of interest is a protein complex, the proteins may be mixed after purification or after expression, and the resulting complex is then purified directly. Proteins may also be coexpressed using expression vectors encoding two or more proteins or by the use of multiple expression vectors in the same bacterial host. These multiple expression vectors should be compatible and encode different antibiotic resistance genes, so that selection using the relevant antibiotics simultaneously forces the bacteria to maintain all the plasmids. Terpe [19] has written a short but comprehensive review of commonly used bacterial expression systems.

1.3.1.2 Protein Expression in Eukaryotic Systems

Not all eukaryotic proteins fold correctly in prokaryotic expression systems, in which case expression in eukaryotic systems may be tried. Eukaryotic systems may also be necessary if the expressed protein is to contain certain posttranslational modifications. As a single-celled and innocuous organism, the yeast Saccharomyces cerevisiae has been most extensively studied for protein production (reviewed in Reference [20]). Expression plasmids have been developed with sequences for propagation in E. coli for DNA amplification and in yeast for protein expression experiments, including yeast promoters and terminators for the production of messenger RNA. Chromosomal integration of a suitable protein expression cassette is also an option, as plasmids are not always stably maintained in yeast cells. Another yeast species, Pichia pastoris, is noted for its high endogenous protein production capacity and is also used routinely [21]. In P. pastoris, expression vectors that integrate into the genome appear to be the norm. In both yeast systems, the proteins to be expressed may be directed to the medium or allowed to accumulate intracellularly.

Cloning the gene to be expressed in a viral vector and infecting eukaryotic cells with the resulting viruses is also a system that can produce high yields of protein. The most developed system for protein expression is the infection of insect (lepidopteran) cells with recombinant baculovirus [22]. Recombinant baculoviruses are constructed by replacing the polyhedrin gene with a gene encoding the protein of interest. Expression is controlled by the strong late polh promoter, which allows the production of the recombinant protein at high yield. In vivo, polyhedrin is produced in large amounts (up to 50% of the total infected larva protein mass) and is necessary to form occluded virus, which can survive in the environment until uptake by a new feeding caterpillar. In vitro, polyhedrin is not necessary for virus survival because budded virus can readily infect cultured insect cells and replicate in them. Methods to express multiple proteins to form protein complexes in the baculovirus/insect system have been developed [23]. Other viral systems that have been developed for protein expression include vaccinia virus [24], which allows transient expression in human cell lines (such as HeLa cells). Usually, the T7 promoter (PT7) is used, and the T7 RNA polymerase necessary for this is either constitutively expressed in the cell line used or included in the recombinant vaccinia virus vector.

The DNA containing the gene for the protein to be expressed can also be transferred into eukaryotic cells by transfection. For this, a suitable DNA vector is usually constructed as a plasmid in E. coli and transfected into mammalian cells by electroporation or using cationic lipids (lipofection) for transient expression [25]. Popular cell lines are HEK293 [26], derived from human embryonic kidney, and CHO, derived from Chinese hamster ovary. Cells that have incorporated the DNA into their genome and express the recombinant protein in a stable manner may be selected.

1.3.1.3 Cell-Free Protein Expression

If the protein to be expressed is toxic for living cells or very prone to degradation, a cell-free in vitro translation system may be a viable, albeit more expensive, solution. For in vitro protein expression, messenger RNA must first be produced by in vitro transcription. Bacteriophage T7 RNA polymerase may be used for this. In this system, the gene of interest is cloned behind a T7 promoter, allowing large amounts of messenger RNA to be produced when DNA, nucleoside triphosphates, and T7 RNA polymerase are mixed. For the translation step, apart from the messenger RNA, many other components are necessary (initiation factors, ribosomes, transfer RNAs, elongation factors, amino acids, ATP and GTP, termination factors, ions), so that usually cell extracts are used that contain all of them. Examples are rabbit reticulocyte lysate and wheat germ extract. Coupled systems are available in which the transcription and translation steps occur in the same tube, either by the same cell extract (such as an E. coli extract) or by mixing the components necessary for the two steps. An advantage of in vitro systems is that ligands or other protein interaction partners may be added, which in vivo may not be taken up by the cell or may be degraded by living cells before they can interact with the expressed protein. These interaction partners may make the protein more soluble and/or more stable [27]. More details about cell-free expression systems are available in specialist books, such as that by Spirin and Swartz [28].

1.3.1.4 Production of Nucleic Acids

For the study of DNA and RNA structure (alone and in complex with nucleic-acid-binding proteins like transcription factors and restriction enzymes), crystallization-quality nucleic acids will need to be obtained. DNA molecules may be synthesized chemically, and many companies provide oligonucleotide synthesis services. RNA oligos may also be synthesized but are more costly and difficult to produce due to the necessity of protecting the extra 2′-hydroxyl group. RNAs may also be produced in the lab by in vitro transcription [29]. The template may be a pair of complementary DNA oligonucleotides encoding the T7 promoter and the sequence of the RNA to be produced downstream from it. A gene encoding the RNA molecule to be produced may also be cloned into a plasmid under control of the T7 promoter, the plasmid amplified in E. coli and purified in large amounts. After linearization of the plasmid with an efficient restriction enzyme, T7 RNA polymerase is added along with nucleoside triphosphates, leading to the production of large amounts of RNA. For efficient transcription by T7 RNA polymerase, the first few bases of the RNA to be produced should be purines, while the sequence of the 3′ end is determined by the restriction enzyme used. To avoid these restrictions, a 5′ cis-acting autocleaving hammerhead ribozyme may be encoded 5′ and 3′ of the sequence to be produced [30]. These authors also pioneered the use of the restriction enzyme BsmAI, which cleaves 5′ to its recognition site, to digest the template DNA prior to transcription. In this way, no restrictions exist for the sequence at the 3′ end of the desired RNA.

1.3.1.5 Purification and Conditioning

After production, the macromolecules or complexes to be crystallized need to be purified. Oligonucleotides, where irreversible unfolding is less of a problem than for proteins, may be purified by polyacrylamide gel electrophoresis or high-performance liquid chromatography. If the protein is present in the cultivation medium, it may be purified directly from it after removal of the expression host cells by centrifugation. This has the advantage of a relative absence of insoluble contaminants but the disadvantage of a relatively large volume. If the protein is produced intracellularly or is to be purified from a natural tissue source (e.g., meat or spinach leaves, see Section 1.3.1), a crude extract will need to be prepared. Cells will need to be broken by grinding, sonication, treatment with a hypotonic solution, detergent treatment, or treatment with a cell-wall destroying enzyme like lysozyme. In the case of soluble proteins, cell debris may be removed by centrifugation and the protein purified from the soluble fraction, while in the case of membrane proteins, the protein may be extracted from the membrane with detergents. If the protein of interest is expressed as inclusion bodies, these may be purified by differential centrifugation and sucrose gradient centrifugation and the protein refolded from these inclusion bodies [31]. However, protein refolding is often not straightforward and there is no guarantee of success.

To facilitate purification, proteins may be expressed with purification tags or as fusion proteins. The first purification step may then be performed using affinity chromatography; examples are metal affinity chromatography for proteins containing an oligohistidine sequence, a matrix with a modified streptavidin for proteins with a streptavidin-recognizing octapeptide, amylose–resin chromatography for proteins expressed as maltose-binding protein fusions, or glutathione affinity chromatography for proteins containing a glutathione S-transferase tag. Maltose-binding protein and glutathione S-transferase have the additional advantage that they may help the target protein stay soluble during expression, although for crystallization such a large fusion partner is likely to be detrimental and would have to be removed (usually by including a specific protease site between the two fusion partners). If no purification tag is present, usually some bulk fractionation step needs to be performed before proceeding to more traditional column chromatography steps. These may include ammonium sulfate precipitation, streptomycin sulfate precipitation to remove nucleic acids, or sucrose gradient centrifugations to isolate large complexes. Then, purification takes place using anion and/or cation exchange chromatography and size exclusion chromatography (often as a final “polishing” step). It should be stressed that no universal purification protocols are available and specialized schemes have to be developed for each particular protein.

During and after purification, the identity and state of the sample should be verified. In the case of proteins, N-terminal sequence analysis (Edman degradation) and mass spectrometry can be used to verify the identity of the protein and to verify that the N-terminus (and sometimes C-terminus) are as expected. In the case of enzymes and macromolecules that bind specific ligands, activity and binding assays may be performed to verify identity and correct folding.

For successful crystallization, it is usually necessary to concentrate the purified macromolecule to values of more than 10 mg/mL. Although proteins have been successfully crystallized from samples at 2 mg/mL or less, a higher concentration increases the chances of success, and if the protein remains soluble at 20, 50, or even 100 mg/mL, crystallization trials may be set up at these higher concentrations. Concentration of macromolecular samples may be achieved by filtration using membranes through which the protein does not pass. The necessary pressure to force the buffer through the membrane may be provided by centrifugation or pressurized nitrogen or air. Alternative methods include protein precipitation by ammonium sulfate followed by dialysis, or covering a dialysis tube containing the sample with polyethylene glycol powder, which removes solvent from the sample but retains the macromolecule in the tube, optionally followed by dialysis.

Crystals consist of regularly repeating units of the same molecule or complex, each in the same conformation. In order for a sample to successfully crystallize, purity is very important. Therefore the minimum amount of buffer components to keep the protein stable should be included; in fact, many macromolecules are stable in water alone, and the purification buffer can be exchanged for water or the minimum buffer in the last concentration or dialysis step. The chemical purity of the macromolecule or complex may be assessed using denaturing gel electrophoresis. This should also reveal if the protein is intact or whether proteolysis may have occurred during expression and purification.

While chemical purity is necessary, it is not sufficient; conformational homogeneity is just as important. Typical causes of conformational heterogeneity may be partial and unspecific aggregation, unfolding, or flexible domains. The aggregation state of the protein may be investigated by native gel electrophoresis, size exclusion chromatography, dynamic light scattering, or analytical ultracentrifugation. The fact that a protein forms oligomers is not necessarily a problem, as long as it forms a homogeneous population of them, leading to a monodisperse sample. Certain proteins may need to form specific oligomers to perform their natural function and may not even be as stable as monomers. If the macromolecule or complex is large enough, it may be useful to observe single particles by electron microscopy, which may quickly reveal large differences in conformation or oligomerization state using only small amounts of sample. Native gel electrophoresis or isoelectric focusing may also reveal multiple charge states for the macromolecule. If this happens, these may need to be separated by ion exchange chromatography or preparative isoelectric focusing.

To have a reasonable chance of crystallizing, the macromolecule or complex of interest should be folded correctly. While many unfolded proteins aggregate unspecifically and often even precipitate, some proteins may be perfectly soluble and monomeric, even when unfolded. The folding degree of a protein may be judged by NMR spectroscopy: a folded protein should have a more disperse set of amide proton resonances when compared to unfolded, random coil, proteins (see also Chapter 2). If it is suspected that the macromolecule has disordered loops or larger flexible domains, it may be necessary to remove these by limited proteolysis or by redesigning the expression vector. A specific ligand or inhibitor may also be included to try to lock the protein, the nucleic acid, or complex into a unique conformation.

1.3.2 Crystallization

Several different methods exist for obtaining crystals of macromolecules. In most of them, the solution containing the macromolecules (the mother liquor) is mixed with a similar volume of precipitant solution and allowed to equilibrate with a larger volume of the same precipitant solution. Equilibration by vapor diffusion is the most commonly used method. Traditionally, this was (and is) performed by the hanging drop method, placing the drop of mother liquor on a siliconized microscope cover slip and inverting this cover slip over a well with precipitant solution in a Linbro plate. The borders of the well are sealed with mineral oil or vacuum grease. Currently, sitting drop vapor diffusion experiments are becoming more popular because of their relative ease of setup, ease of crystal harvesting, and suitability for automation. Sitting drop vapor diffusion experiments can be sealed with extraclear tape, which permits opening individual wells by carefully removing the tape only from that well and resealing with a piece of the same tape.

Some proteins are sensitive to air, and although vapor diffusion experiments can be set up under a nitrogen atmosphere to prevent oxidation, dialysis may be a better option [12]. Microdialysis buttons are available for small volumes (5–350 μL of mother liquor), although these are still an order of magnitude larger than the volumes used in vapor diffusion or microbatch experiments (see next paragraph). The buttons are covered with a piece of dialysis membrane kept in place with a rubber O-ring and incubated in a vial with a large volume of precipitant solution. A further advantage of this method is that after crystal growth, ligands, cryoprotectant, and other components can be introduced into the mother liquor without disturbing the crystals by adding them to the precipitant solution or exchanging the precipitant solution and waiting for equilibration.

Macromolecules can also be crystallized in batch, by simply mixing a concentrated solution of them with precipitant solution and waiting. In microbatch experiments, protein solution is directly mixed with precipitant solution and incubated under a layer of mineral oil, allowing for slow evaporation of aqueous solvent through the oil layer. A percentage of silicone oil can be mixed in with the mineral oil if faster evaporation is desired. This is often done in Terasaki plates, which contain 60 or 72 small wells.

Free interface diffusion is another commonly used technique [32]. The solution containing the concentrated macromolecules is brought into direct contact with the precipitant solution in a capillary and slow free diffusion is allowed to take place through the small contact surface. The concentration gradient that forms along the capillary allows sampling of a larger fraction of crystallization space in a smaller number of experiments.


Crystallization robots can significantly expedite the crystallization process, eliminating a lot of tedious manipulations and allowing for small-volume drops (typically 50 μL). There are robots specialized in microbatch experiments or sitting-drop vapor diffusion, but multipurpose ones are also available that can also perform hanging-drop vapor diffusion experiments. Robots generally use 96-well plates, with the possibility of multiple crystallization drops per well.

A typical initial screen consists of one or more 96-well plates with very different conditions [33, 34], and if possible, the same experiments are incubated at different temperatures (e.g., at 20 °C and 5 °C). Incubation should be in low-vibration conditions. If crystals are obtained, they are measured to confirm they are protein, not salt or another small-molecule additive, and to assess their diffraction limit and quality. If crystalline precipitates are obtained, further screens are performed around these conditions to see if crystals can be obtained. At the same time, it is worth carefully examining the cloning, expression, and purification strategy to see if improvements in protein purity and conformational homogeneity can be obtained (see Section 1.3.1.5). In addition to these initial more-or-less random screens, it is worth screening common precipitants such as ammonium sulfate and polyethylene glycol at different concentrations, pH, and temperatures. Precipitant solutions should be prepared using high-grade chemicals. Other parameters that may be varied to obtain crystals or improve their size and quality are initial protein concentration, drop size, and the ratio of protein solution to precipitant solution in the drop. Additives of different classes may be tried, such as multivalent cations, common salts, chaotropes, reducing agents, polyamines, and organic molecules.
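The systematic screening of a common precipitant at different concentrations, pH values, and temperatures can be laid out as a full factorial grid. The following is only an illustrative sketch: the function name `grid_screen`, the concentration range, and the pH range are hypothetical choices made so that one temperature fills exactly one 96-well plate; only the two temperatures are taken from the text.

```python
from itertools import product

def grid_screen(precipitant_percent, ph_values, temperatures_c):
    """Return one dict per crystallization condition (full factorial grid)."""
    return [
        {"precipitant_percent": c, "pH": ph, "temperature_C": t}
        for t, c, ph in product(temperatures_c, precipitant_percent, ph_values)
    ]

# Hypothetical ranges: 12 concentrations x 8 pH values = 96 wells per temperature
concentrations = [4 + 2 * i for i in range(12)]   # 4, 6, ..., 26 (% w/v)
ph_values = [4.5 + 0.5 * i for i in range(8)]     # 4.5, 5.0, ..., 8.0
conditions = grid_screen(concentrations, ph_values, [20, 5])

print(len(conditions), "conditions; first:", conditions[0])
```

In practice the ranges would be centered on whatever condition gave the initial hit, with finer steps than in a first coarse grid.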

The results of crystallization experiments include clear drops and precipitates due to unspecific protein aggregation. In these cases, future experiments in which the precipitant concentration is increased or decreased, respectively, may yield more promising results. Phase separation in which the protein concentrates in an organic phase may also be observed, and sometimes protein crystals nucleate on the edges of such organic phases. Crystalline precipitates may form due to excessive nucleation or, inversely, clusters of crystals due to insufficient nucleation sites. Sometimes, crystals or crystal fragments useful for diffraction experiments may be separated from these clusters. Single crystals may also be observed. Often, crystal growth is not equally efficient in all three dimensions and needle- or plate-shaped crystals result, but if the conditions are just right, crystals with sizes of 10–100 μm in all three dimensions may be obtained. Where crystals are too small, seeding drops with preformed microcrystals may lead to growth of larger crystals [35, 36]. Seeding may also improve crystal qualities other than size. For more complete texts on protein crystallization, textbooks are available [37–39].

1.3.3 Data Collection and Processing

The first step of data collection is the recovery of the fragile crystals from the crystallization setup. For room temperature data collection, they may be carefully transferred to a quartz capillary and mounted in conditions in which the crystal will not dry up or be able to attract moisture from the surrounding atmosphere and dissolve. They can also be picked up with a nylon or plastic microloop about the same size as the crystal. The loop is then covered with a plastic hood filled with a drop of mother liquor. To prolong crystal life, a crystal can also be briefly incubated in a suitable cryoprotectant, and in this case, it can be flash-frozen either at 100 K inside a nitrogen gas stream or in liquid nitrogen [40]. If data collection is then performed at 90–120 K, a significant increase in crystal lifetime can be obtained as radiation damage decreases at lower temperature [41].

The most common setup used nowadays to measure X-ray diffraction intensities is the oscillation method. Consecutive images are recorded for small rotation angles (0.25° to 2°) around an axis perpendicular to the incident X-ray beam [42].
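As a small worked example of the oscillation method, the number of images in a dataset follows directly from the total rotation range and the rotation angle per image. The sketch below assumes a total range of 180°, which is an illustrative value and not one given in the text; the function name `images_needed` is likewise hypothetical.

```python
import math

def images_needed(total_rotation_deg, oscillation_deg):
    """Number of consecutive images needed to cover a total rotation range."""
    if oscillation_deg <= 0:
        raise ValueError("oscillation angle must be positive")
    # Round before taking the ceiling to suppress floating-point noise
    return math.ceil(round(total_rotation_deg / oscillation_deg, 9))

# Assumed total rotation of 180 degrees, oscillation widths from the quoted range
for width in (0.25, 0.5, 1.0, 2.0):
    print(f"{width} deg per image: {images_needed(180.0, width)} images")
```

Narrower oscillation angles therefore multiply the number of images, and with it the total exposure the crystal has to survive.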

Depending on the space group of the crystals obtained and the structure solution method that is to be used, somewhat different data collection procedures will need to be employed. In all cases, complete datasets are necessary, and if the anomalous signal in the diffraction data is to be exploited, Friedel pairs will have to be collected for each reflection at high multiplicity. This is because the anomalous intensity differences between Friedel pairs are generally small compared to the diffraction intensities. For high-symmetry space groups, a relatively small fraction of reciprocal space needs to be explored, while for lower-symmetry space groups, a larger fraction of reciprocal space will need to be covered, that is, more images per dataset will have to be collected. For structure solution by molecular replacement or isomorphous replacement methods (see Section 1.3.4), high multiplicity is not a necessity (although it is always an advantage), while for anomalous dispersion methods it is very important. High-multiplicity datasets will require longer data collection times, while at the same time radiation damage will have to be avoided [43]. Therefore, to allow successful structure solution, at times higher resolution data will have to be sacrificed (i.e., less exposure time per image) for data completeness and/or multiplicity. Once the structure is solved and more crystals are available, one can always attempt to collect a complete higher resolution dataset for the final refinement of the structure. Completeness means that as many reflections as possible for this particular crystal structure are well measured. A common mistake is to overexpose crystals in order to achieve the highest possible resolution, leading to overloaded low-resolution reflections. In some cases this problem is best overcome by merging two datasets measured at low and high beam intensity or exposure time.
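Completeness and multiplicity can be made concrete with a short calculation. The sketch below uses the usual working definitions, stated here as assumptions: completeness is the fraction of the possible unique reflections that were measured at least once, and multiplicity is the average number of observations per measured unique reflection. It also assumes that the indices have already been reduced to the asymmetric unit of reciprocal space; the toy reflection list is invented for illustration.

```python
from collections import Counter

def completeness_and_multiplicity(observations, n_unique_possible):
    """observations: list of (h, k, l) tuples, already mapped to the asymmetric unit."""
    counts = Counter(observations)
    n_measured = len(counts)
    completeness = n_measured / n_unique_possible
    multiplicity = len(observations) / n_measured if n_measured else 0.0
    return completeness, multiplicity

# Toy data: 6 observations of 3 unique reflections, out of 4 possible ones
obs = [(1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 1), (0, 0, 1)]
comp, mult = completeness_and_multiplicity(obs, n_unique_possible=4)
print(f"completeness = {comp:.2f}, multiplicity = {mult:.2f}")
```

Data processing programs report these quantities per resolution shell, which is how overloaded or missing low-resolution reflections become visible.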


1.3.4 Structure Determination

Crystal structure determination is basically the resolution of the “phase problem”; different methods have been developed to estimate phase values. The first protein structures, myoglobin and hemoglobin, were solved using multiple isomorphous replacement (MIR) [44, 45]. These days, in many cases homologous protein structures are available and molecular replacement can be successful. The development of variable-wavelength X-ray sources at synchrotrons has led to the increased use of anomalous dispersion-based methods to solve crystal structures.

1.3.4.1 Molecular Replacement

New structures isomorphous to already known structures may be solved by Fourier synthesis using the phases calculated from the previous structure combined with the diffraction intensity data collected from the new crystal. Isomorphous means both crystals have the same space group, very similar cell parameters, and the same orientation of the molecules in the asymmetric unit. Common examples are solving the structure of the same protein with a new ligand or with a point mutation in its sequence.

If a similar structure is known from a crystal that belongs to a different space group and/or with significantly different lattice parameters, structure solution using the molecular replacement technique may be possible [46]. For molecular replacement to work, the search model will have to be a significant fraction of the total structure to be solved and sufficiently similar in structure. In general, if the protein sequence identity is 25–30% or more, a reasonable chance of success can be expected. However, it should be remembered that sequence similarity is not what is important, but structure similarity is, which means that sometimes molecular replacement can be successful with search models with less sequence identity or fail with search models with more sequence identity than the mentioned cutoff.
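To check a candidate search model against the 25–30% guideline, the sequence identity can be computed from a pairwise alignment. The sketch below assumes the two sequences are already aligned (equal length, gaps written as "-") and counts identity over the aligned, non-gap positions; other conventions for the denominator exist, so the number is only a rough indicator. The example sequences are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Invented toy alignment of a target and a possible search model
target = "MKTAYIAK-QRQISF"
model = "MKSAYLAKHQ-QLSF"
print(f"identity = {percent_identity(target, model):.1f}%")
```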

For structure solution by molecular replacement, in most space groups six parameters have to be determined: three rotation angles and three translation shifts to be applied to the search model. Some exceptions are triclinic space groups (only the three rotation parameters, no translation parameters) and monoclinic space groups (three rotation angles and two translation shifts). A full six-parameter search can be performed, but it is computationally very intensive, which is why most molecular replacement protocols first determine the rotation angles, then the translation parameters, and then perform a quick rigid body refinement to optimize all six parameters at once (fitting).
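The six parameters act on the search model as a rigid-body transformation: every atom is rotated and then shifted. The following sketch illustrates this with Cartesian coordinates. The Z-Y-Z Euler angle convention is an assumption made here for illustration (programs differ in their conventions), and `place_model` is a hypothetical helper, not a function of any molecular replacement package.

```python
import math

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def place_model(coords, alpha, beta, gamma, shift):
    """Rotate by Z-Y-Z Euler angles (degrees), then translate each atom."""
    a, b, g = (math.radians(x) for x in (alpha, beta, gamma))
    r = matmul(rot_z(a), matmul(rot_y(b), rot_z(g)))
    return [
        tuple(sum(r[i][j] * xyz[j] for j in range(3)) + shift[i] for i in range(3))
        for xyz in coords
    ]

# Two atoms of a toy search model, rotated 90 degrees about z and shifted along z
model = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
for atom in place_model(model, 90.0, 0.0, 0.0, (0.0, 0.0, 5.0)):
    print(tuple(round(v, 3) for v in atom))
```

A rotation search varies only the three angles; the translation search then varies only `shift` with the rotation held fixed, which is what makes the two-step protocol so much cheaper than a full six-dimensional search.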

Patterson functions, which can be calculated without phases, are calculated for the model and for the experimental data. Self-vectors in the Patterson function (intramolecular vectors from one atom in the molecule to another atom in the same molecule) depend on the orientation of the molecule and are used in the rotation function. The three angles where the self-vector Patterson functions are most similar to each other determine the orientation of the search molecule in the cell. Cross-vectors in the Patterson function (intermolecular vectors from an atom in one molecule to the equivalent atoms in the other molecules) depend on both the orientation of the molecule and on its position in the cell. So, once the orientation is known, cross-vectors can be exploited in the translation function to determine the translational shifts. Computer programs used for molecular replacement include AMORE [47], MOLREP [48], and PHASER [49].
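That a Patterson function needs only intensities, and that its peaks sit at interatomic vectors, can be seen in a one-dimensional toy calculation. The sketch below is an illustration under simplifying assumptions (a 1D cell, two point atoms with invented scattering powers, no scale factor): the Patterson function is computed as the cosine series of |F(h)|², and apart from the origin peak it peaks at u = 0.3 and u = 0.7, that is, at plus and minus the vector between the two atoms.

```python
import cmath
import math

def structure_factor(h, atoms):
    """atoms: list of (fractional coordinate x, scattering power f) in a 1D cell."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

def patterson_1d(intensities, n_grid):
    """Unscaled Patterson P(u) on n_grid points from a dict h -> |F(h)|^2 (no phases)."""
    return [
        sum(i_h * math.cos(2 * math.pi * h * k / n_grid)
            for h, i_h in intensities.items())
        for k in range(n_grid)
    ]

atoms = [(0.1, 6.0), (0.4, 8.0)]          # invented toy structure
hmax = 10
intensities = {h: abs(structure_factor(h, atoms)) ** 2
               for h in range(-hmax, hmax + 1)}
p = patterson_1d(intensities, 20)

# The three highest grid points: origin peak plus the two interatomic vectors
top = sorted(sorted(range(20), key=lambda k: p[k], reverse=True)[:3])
print([k / 20 for k in top])
```

The rotation and translation functions compare such maps (in three dimensions) between model and data, instead of comparing electron densities, which would require phases.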

1.3.4.2 Direct Methods

If very high-resolution data can be obtained and not too many atoms are present in the asymmetric unit, structure solution by direct methods may be possible. Limit estimates are around 1.2 Å or better for the resolution and up to 200–1000 nonhydrogen atoms in the asymmetric unit, which is rare for crystals of macromolecules. Direct methods are based on mathematical relationships among certain combinations of phases. Cosine values of phase combinations known as triplet structure invariants can be reliably estimated if measured intensities are large (i.e., good diffraction) and the number of atoms in the asymmetric unit is small. Multiple sets of trial phases are constructed, and each phase is refined using these mathematical relationships. In favorable cases, initial phase estimates converge toward a complete set of phases with small phase errors. Another approach is to try out random arrangements of atoms in the asymmetric unit, simulate their diffraction patterns, and compare these simulated patterns with those obtained from the crystals. Correct solutions should have high correlations between the simulated and the observed diffraction patterns. Even if only physically possible arrangements of atoms are tried, the number of trial arrangements to test quickly gets too large for big molecules. However, even for large macromolecules or complexes, direct methods are often used to locate the limited number of heavy atoms in derivative datasets (see Section 1.3.4.3). Programs for direct methods include SNB [50] and SHELX [51].
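The comparison between simulated and observed diffraction patterns can be scored with a correlation coefficient. The sketch below uses the ordinary Pearson correlation on two lists of intensities for the same reflections; this is one plausible scoring choice assumed here for illustration, not necessarily the one used by the programs cited, and the intensity values are invented.

```python
import math

def correlation(observed, simulated):
    """Pearson correlation coefficient between two equally long intensity lists."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / math.sqrt(var_o * var_s)

observed = [120.0, 35.0, 410.0, 8.0, 96.0]   # invented intensities
good_trial = [110.0, 40.0, 395.0, 12.0, 90.0]
bad_trial = [20.0, 300.0, 15.0, 250.0, 60.0]
print(round(correlation(observed, good_trial), 3),
      round(correlation(observed, bad_trial), 3))
```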

1.3.4.3 Isomorphous Replacement and Anomalous Dispersion

If molecular replacement is not successful, heavy atom derivatives will have to be produced for structure solution by multiple isomorphous replacement (MIR), single isomorphous replacement using anomalous signal (SIRAS), multiwavelength anomalous diffraction (MAD) [52], or single-wavelength anomalous diffraction (SAD) [53]. Common derivatives are mercury compounds, which bind covalently to cysteine residues and are especially useful for MIR or SIRAS, or selenomethionine derivatives, especially useful for the MAD method [54]. Specific radiation damage may also be used to solve macromolecular structures [55]. Heavy atoms naturally present in some proteins (e.g., metal-binding proteins) may also be used for phasing, and in favorable cases, the anomalous dispersion properties of sulfur (proteins) and phosphorus (nucleic acids) may help determine phases.

Heavy atoms are generally introduced into preformed protein crystals by soaking techniques [56], although cocrystallization is also a possibility. Selenomethionine can be introduced into proteins instead of methionine by growing methionine-auxotroph bacteria in expression cultures in the presence of selenomethionine [57] or by the inhibition of the methionine synthesis pathway and provision of the necessary amino acids and selenomethionine in expression cultures. If no cysteines or methionines are present in the natural sequence, these can be introduced by site-directed mutagenesis.

The isomorphous replacement technique uses the intensity differences between equivalent reflections of datasets measured from crystals of the “native” macromolecule and crystals of the same macromolecule in which one or a few highly ordered heavy atoms are present. The native crystal and the derivative crystal should be isomorphous: the lattice parameters and macromolecular structure should be the same and the only difference should be the presence or absence of the heavy atoms. The structure factors of the derivative (Fph), the native (Fp), and of the heavy atom structure alone (Fh) are related by the relation Fp = Fph − Fh. The amplitudes |Fph| and |Fp| can be measured, and if Fh can be determined by direct methods, a vector diagram shows there are two possible solutions for the phase of Fp, of which only one is correct. This phase ambiguity can be resolved with a second derivative (or several more derivatives), hence the name MIR. The anomalous dispersion signal, see Section 1.3.5, can also solve the phase ambiguity; this technique is called single isomorphous replacement with anomalous signal or multiple isomorphous replacement using anomalous signal if more than one derivative is used.
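The twofold phase ambiguity can be worked out explicitly. Writing Fph = Fp + Fh and taking squared moduli gives |Fph|² = |Fp|² + |Fh|² + 2|Fp||Fh| cos(φp − φh), so the phase of Fp is φh ± arccos[(|Fph|² − |Fp|² − |Fh|²) / (2|Fp||Fh|)]. The sketch below evaluates both candidates for one reflection; the numerical values are invented, and the clamping of the cosine is an added safeguard against measurement error, not part of the derivation.

```python
import cmath
import math

def candidate_native_phases(amp_fph, amp_fp, fh):
    """Two phases (radians) of Fp compatible with |Fph|, |Fp| and the complex Fh."""
    amp_fh, phi_h = abs(fh), cmath.phase(fh)
    if amp_fp == 0 or amp_fh == 0:
        raise ValueError("amplitudes must be non-zero")
    cos_delta = (amp_fph ** 2 - amp_fp ** 2 - amp_fh ** 2) / (2 * amp_fp * amp_fh)
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against experimental error
    delta = math.acos(cos_delta)
    return phi_h + delta, phi_h - delta

# Invented reflection: the true native phase is 40 degrees
fp_true = cmath.rect(100.0, math.radians(40.0))
fh = cmath.rect(20.0, math.radians(110.0))
fph = fp_true + fh

cands = candidate_native_phases(abs(fph), abs(fp_true), fh)
print([round(math.degrees(c) % 360.0, 1) for c in cands])
```

Both candidates reproduce the measured |Fph| exactly, which is why a single derivative cannot distinguish them; a second derivative with a different Fh generally shares only one of its two candidates with the first.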

In the absence of anomalous dispersion, structure factors follow Friedel’s law (Section 1.2.5). If the crystal contains atoms that resonate with the X-ray radiation, anomalous dispersion occurs, and Friedel’s law no longer holds. By comparing reflections that should be symmetrically related by Friedel’s law, an anomalous dispersion effect can be measured. The intensity signal due to the anomalous dispersion effect is small but can be optimized when the X-rays used are of the energy corresponding to the absorption edge of the resonating atom. X-ray fluorescence emission scans indicate the magnitude of the effect and its dependence on wavelength. The anomalous dispersion effect gives phase information and can be used in combination with isomorphous replacement as described above. In some cases, the SAD technique is sufficient to produce reliable phases. In other cases, MAD is necessary. This consists of measuring complete datasets from the same crystal at different wavelengths, usually three to five. For SAD or MAD to work, it is obviously necessary to have ordered heavy atoms (Se, Hg, Pt, Fe, Zn, Cu, etc.) in the crystal. A derivative that turned out not to be isomorphous and thus unsuitable for MIR may be used. It is also common to “label” the protein during expression with Se-Met. An advantage of the MAD and SAD techniques is that they do not have nonisomorphism problems, because the datasets are measured from the same crystal.
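The breakdown of Friedel’s law can be illustrated with a toy calculation. The sketch below sums atomic contributions to the structure factor for a reflection and its Friedel mate, once with purely real scattering factors and once with an imaginary (anomalous) component on the heavy atom. The two-atom “structure”, the scattering factor values, and the function names are assumptions chosen for illustration only; real scattering factors depend on resolution and wavelength.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Sum f_j * exp(2*pi*i * (h*x + k*y + l*z)) over atoms given as ((x, y, z), f)."""
    h, k, l = hkl
    total = 0j
    for (x, y, z), f in atoms:
        total += f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total


def bijvoet_difference(hkl, atoms):
    """Return |F(h,k,l)| - |F(-h,-k,-l)|; zero when Friedel's law holds."""
    h, k, l = hkl
    plus = abs(structure_factor((h, k, l), atoms))
    minus = abs(structure_factor((-h, -k, -l), atoms))
    return plus - minus


# Illustrative values: a light atom and a heavy atom at fractional coordinates
light = ((0.0, 0.0, 0.0), 6.0 + 0j)
heavy_normal = ((0.1, 0.2, 0.3), 26.0 + 0j)       # real scattering factor only
heavy_anomalous = ((0.1, 0.2, 0.3), 26.0 + 4.0j)  # with an imaginary f'' term

print(bijvoet_difference((1, 2, 3), [light, heavy_normal]))
print(bijvoet_difference((1, 2, 3), [light, heavy_anomalous]))
```

The first difference vanishes to within rounding, whereas the second does not; its size relative to the amplitudes themselves shows why anomalous differences must be measured accurately.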

1.3.4.4 Density Modification Once reasonable starting phases have been determined and the resulting maps show some interpretable features, density modification procedures can significantly improve them. These procedures use prior knowledge of the distribution of the electron density in the asymmetric unit. Solvent flattening uses the observation that in crystals of macromolecules a significant connected portion of the asymmetric unit is not occupied by the macromolecule but by solvent. We also usually know the size of our macromolecule, allowing us to make a reasonable guess for the solvent content. If the starting phases are good enough to estimate which parts of the asymmetric unit are occupied by solvent, the electron density in these regions can be set to a constant value, typically 0.33 e Å⁻³ (protein electron density averages to 0.43 e Å⁻³ but is not constant and shows strong local variation in the protein region). In the protein region, negative density should be absent, and this knowledge is also incorporated by resetting the density to zero wherever it is negative.
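The two real-space operations just described can be sketched in a few lines of Python. The map is represented here as a flat list of grid values with a matching Boolean solvent mask; this data layout, the function name `flatten_solvent`, and the example numbers are assumptions made for illustration, and determining the mask itself (the difficult part in practice) is not shown.

```python
SOLVENT_DENSITY = 0.33  # e/A^3, the constant solvent level quoted in the text


def flatten_solvent(density, is_solvent, solvent_level=SOLVENT_DENSITY):
    """Return a modified copy of a density map, given a per-grid-point solvent mask."""
    modified = []
    for rho, solvent in zip(density, is_solvent):
        if solvent:
            modified.append(solvent_level)  # solvent flattening
        else:
            modified.append(max(rho, 0.0))  # no negative density in the protein region
    return modified


# Hypothetical grid values (e/A^3) and solvent mask
density = [0.9, -0.2, 0.1, 0.5, 0.28, 0.4]
mask = [False, False, True, False, True, True]
print(flatten_solvent(density, mask))
```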

From previously solved protein structures, the expected density distribution in the protein region is also known. Using this density distribution histogram as a mold, small alterations are made to the protein density to make the experimental density distribution histogram match the expected one. If multiple copies of the macromolecule or complex are present in the asymmetric unit (i.e., noncrystallographic symmetry, NCS, is present) and envelopes can be identified for the NCS-related protomers, the density at NCS-related points within these protomers can also be averaged and imposed to be equivalent. The density modification process is cyclic: the modified map is back-transformed to give modified phases; these phases are recombined with experimentally determined phases and a new map is calculated. This new map is then again modified. Programs for density modification include SOLOMON [58], DM [59], RESOLVE [60], and SHELXE [61].
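The idea of histogram matching can be illustrated by a simple rank-based (quantile) mapping: each grid value in the protein region is replaced by the value of the same rank in a reference distribution. This is a deliberately simplified sketch, not the algorithm of any of the programs named above; the function name `match_histogram`, the equal-sample-size restriction, and the numbers are assumptions.

```python
def match_histogram(density, reference):
    """Replace each density value by the reference value of the same rank."""
    if len(density) != len(reference):
        raise ValueError("this sketch assumes equally sized samples")
    # Indices of the grid points, ordered from lowest to highest density
    order = sorted(range(len(density)), key=lambda i: density[i])
    target = sorted(reference)
    matched = [0.0] * len(density)
    for rank, index in enumerate(order):
        matched[index] = target[rank]
    return matched


# Hypothetical experimental protein-region densities and expected distribution
experimental = [0.2, 0.9, 0.5, 0.1]
expected = [0.0, 0.3, 0.6, 1.2]
print(match_histogram(experimental, expected))
```

The mapping preserves the ordering of the grid points (the highest experimental density remains the highest) while forcing the overall distribution to equal the expected one.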

1.3.4.5 Combined Methods The methods mentioned above may be combined into procedures that are more powerful than any by themselves. Two examples of combining molecular replacement with direct methods are described here. The program ACORN locates small user-defined peptide fragments by molecular replacement and performs phase refinement by direct methods. It is useful for solving peptides and small proteins when high-resolution data are available (1.2 Å or better) [62]. The ARCIMBOLDO procedure also locates small model fragments, alpha-helices in this case. It uses PHASER for molecular replacement and performs sophisticated density modification with SHELXE, allowing success for relatively large proteins and with data extending to 2 Å resolution or better [63].

Detailed discussions and explanations of macromolecular phasing methods are available in Taylor [64] and in several textbooks [65–69]. Programs and program pipelines commonly used for phasing include SHARP [70], SOLVE [60], SHELX [51], and CRANK [71].

1.3.5 Electron Density Map Interpretation: Model Construction

Once an interpretable electron density map has been obtained, a model for the protein will have to be built using molecular graphics programs or, if the map is of sufficient quality, in combination with automated building procedures (Fig. 1.4). An important quality measure is the resolution, which in general should be better than 2.3 Å in order to allow automated procedures to construct virtually complete models. However, the completeness of the dataset is also important, as is the necessity that all reflections, including low-resolution ones, are measured well. Maps calculated using data with missing or badly measured low-resolution reflections may suffer from reduced electron density connectivity and may be more difficult to use for constructing initial models. At intermediate resolutions (2.5–3 Å), automated structure building programs are unlikely to construct complete models but may still be useful for construction of parts of the model.

At lower resolutions, where a complete model needs to be constructed manually, the (for now) superior pattern recognition capability of the human brain, plus additional knowledge about the macromolecule studied (expected fold, ligands, etc.), is employed. It is often useful to skeletonize the map.

FIGURE 1.4 Construction of protein models in electron density maps. Top left: electron density map obtained by experimental phasing using the program SOLVE. Top right: map with skeleton calculated by COOT. Bottom left: model obtained by autotracing with RESOLVE superimposed on the map. Bottom right: refined protein model including some water atoms (yellow crosses). See insert for a color representation of the figure.


A skeleton is a collection of lines representing connected regions of electron density in the map. If the map is of sufficient quality, this skeleton should be similar to the protein chain trace and often will allow one to estimate the fold. The skeleton is edited to remove spurious connections and to introduce connections the procedure failed to identify. The program O [72] is useful for this and also contains other model construction facilities. Once the edited skeleton resembles the protein chain trace as much as possible, it is replaced bit by bit with amino acids, in the first instance by a polyalanine chain. Using the known protein sequence, regions of the map are then carefully inspected to see if short sequences of side chains can be recognized. Once the protein chain is reasonably complete and at least some of the side chains are identified, intermediate refinement runs (see Section 1.3.6) may be used to improve the density maps and calculate difference density maps, although care has to be taken not to introduce “model bias” (model bias is caused by the calculated phases from the model biasing the resulting electron density toward the model rather than toward the measured diffraction intensities). The improved maps and difference density maps can then be used to identify nonmodeled density and wrongly modeled regions. A modern model construction program is COOT [73], which contains many tools for building, refinement, analysis, and validation of protein and nucleic acid structures, ligands, and solvent molecules.

In high-quality electron density maps, automated model construction programs like ARP/wARP [74], RESOLVE [60], or BUCCANEER [75] can identify large fractions of the protein chain and solvent molecules. Some programs, like ARP/wARP, can also automatically build nucleic acid structures. The WARPNTRACE feature of ARP/wARP interprets electron density maps as free atom models, bonds atoms to each other if they are sufficiently close and resemble amino acids, joins the amino acids in a protein trace, and refines the resulting protein-solvent hybrid model. It uses the hybrid model to calculate a new electron density map, which is then used to find new free atoms and remove atoms that no longer show electron density. It also docks the identified protein chain into the known sequence [76]. The RESOLVE automatic building procedure identifies helices and strands by matching templates to the electron density. Then, fragments of helices or strands from a library are matched to this density and extended in both directions using tripeptide fragment libraries. Subsequently, side chains are identified using libraries and the protein chains are assembled into as compact a unit as possible. BUCCANEER identifies likely C-alpha positions in the electron density map. The best-fitting ones are used as seed positions; the seeds are then grown, using the other C-alpha positions in the map, into extended chain fragments. The chain fragments are then docked into the known protein sequence.

When a complete protein chain has been built, difference maps can be used to identify ligands and solvent atoms. Ligands may be copurified from the expression system or added during purification and crystallization. Care should be taken not to overinterpret initial maps, because model bias may appear to confirm the presence of a ligand which may not really be there. Normally, many ordered water molecules will be observed on the surface of the crystallized macromolecule and can be validated both in terms of electron density and analysis of the hydrogen bonding with the macromolecule and other solvent atoms. Ordered glycerol from the cryoprotecting solution or precipitant molecules, for example, a partially ordered polyethylene glycol molecule, may also be observed. Chemical reactions may also occur in the crystallization drop, and their products may be observed, for example, an oxidized dithiothreitol molecule. Some of the identified water molecules may in fact be ions, either metals or other ions such as ammonium, sulfate, carbonate, or phosphate. Careful inspection of the maps and analysis of the coordinating atoms will be necessary to identify these correctly. If the ligands or solvent molecules contain heavy atoms, they may show sufficient anomalous signal to help correct identification. The program COOT has incorporated functions to add many common ligand molecules and to identify and validate water atoms. Some automatic building programs, like ARP/wARP, also identify solvent atoms.

1.3.6 Model Refinement

Once a complete protein model, including ligands and ordered solvent molecules, has been built, the structure should be refined using appropriate geometric restraints and the best dataset available with respect to completeness and resolution. Refinement consists of making small changes to the positional parameters and temperature factors of all atoms simultaneously, using a certain target function. Traditionally, the target consisted of minimizing the R-factor (Equation 1.8), a factor that expresses residual disagreement between the observed structure factor amplitudes (Fobs) and the calculated ones (Fcalc). The R-factor is still an important statistic quoted in articles reporting macromolecular structures. To avoid model bias, current practice is to remove a small fraction of reflections from the refinement target, which is then used to calculate the R-free value [77]. Upon refinement, the R-free should drop to a similar extent as the R-factor, suggesting absence of model bias. As a general rule of thumb, an R-free value of less than 0.3 is considered reasonable, although this depends on the quality of the data: at higher resolution lower values are expected, while at resolutions worse than 3 Å higher R-free values may in some cases be acceptable.

$$R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \,\bigr|}{\sum_{hkl} |F_{\mathrm{obs}}|} \qquad (1.8)$$
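As a minimal illustration of Equation 1.8 and of the separation into working and free reflections, the following Python sketch computes both statistics from lists of amplitudes. It assumes that the calculated amplitudes are already on the same scale as the observed ones (real refinement programs determine a scale factor); the function names, the flag layout, and the numbers are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the reflections supplied."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by the free flag and return (R-work, R-free)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


# Hypothetical amplitudes; the last reflection is excluded from refinement
f_obs = [100.0, 50.0, 80.0, 20.0]
f_calc = [90.0, 55.0, 85.0, 25.0]
free_flags = [False, False, False, True]
print(r_work_and_free(f_obs, f_calc, free_flags))
```

In practice the free set comprises a few percent of a dataset with thousands of reflections, so that R-free is statistically meaningful; the four-reflection example only shows the bookkeeping.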

FIGURE 1.5 Examples of a bad (left) and good (right) Ramachandran plot. On the left is a Ramachandran plot of a partially built, unrefined structure; on the right, that of the final structure as deposited in the PDB (code 2XGF). Notice how in the plot on the right all residues are in allowed (dark gray) regions and most are in preferred regions (gray). Plots were generated using the program MOLPROBITY.

At the resolutions typical for data collected from macromolecular crystals, the total number of parameters to be refined is of the same order of magnitude as the number of structure factor amplitudes they are to be refined against. This low data-to-parameter ratio makes cross-validation necessary (R-free value, see previous paragraph) and imposes the use of restraints and constraints. Constraints effectively reduce the number of parameters to be refined. At very low resolution, one may for instance refine groups of amino acids or whole protein domains as rigid bodies. At 1.5–3 Å resolution, refinement can only be performed if appropriate geometric restraints are included. These restraints effectively augment the number of data points and include the distances between bonded atoms, their bond angles, and certain torsion angles. The planarity of the atoms involved in peptide bonds, carboxyl and carboxamide groups, and aromatic groups is also restrained, as is the minimal distance between noninteracting atoms. Temperature factors may also be constrained to be the same between groups of atoms or restrained not to vary too much between neighboring atoms. NCS, if present, may also be used to constrain or restrain multiple copies of the same macromolecule to be the same or similar. In this case, one should keep in mind that legitimate differences may of course exist between certain portions, and these should be removed from the restraints. The most common programs used for refinement are REFMAC [78] and PHENIX [79]. REFMAC uses a maximum-likelihood target while PHENIX can also use a least-squares target.
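The data-to-parameter ratio can be estimated with simple arithmetic. The sketch below counts the reciprocal lattice points inside the resolution sphere (sphere volume divided by the reciprocal cell volume 1/V), divides by the number of symmetry-equivalent reflections, and compares the result with four parameters (x, y, z, and an isotropic temperature factor) per atom. The cell volume, atom count, and function names are hypothetical, and the count is only approximate; for space group P1 the only equivalence is Friedel’s law, hence the default divisor of 2.

```python
import math


def unique_reflections(cell_volume, d_min, laue_multiplicity=2):
    """Approximate number of unique reflections to resolution d_min (A); volume in A^3."""
    # Lattice points inside a sphere of radius 1/d_min, each occupying 1/V
    total = (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3
    return total / laue_multiplicity


def data_to_parameter_ratio(cell_volume, d_min, n_atoms,
                            params_per_atom=4, laue_multiplicity=2):
    """Unique reflections divided by refined parameters (atoms in the asymmetric unit)."""
    n_data = unique_reflections(cell_volume, d_min, laue_multiplicity)
    return n_data / (n_atoms * params_per_atom)


# Hypothetical P1 crystal: 100,000 A^3 cell containing 3000 non-hydrogen atoms
for d_min in (1.5, 2.0, 3.0):
    print(d_min, round(data_to_parameter_ratio(100000.0, d_min, 3000), 2))
```

Because the number of reflections scales with the inverse cube of the resolution limit, the ratio for this example falls from roughly two at 2 Å to below one at 3 Å, which is why restraints and constraints become indispensable.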

1.3.7 Validation

Validation of the solved and refined macromolecular structure is a necessary quality-control step, as important errors in model building and refinement may have gone unnoticed. The validation process judges parameters used in refinement such as bond distances, bond angles, certain torsion angles, correctness of chiral centers, planarity of groups of atoms that show resonance (such as atoms involved in peptide bonds, carboxylate and carboxamide groups, and aromatic rings), van der Waals distances, hydrogen bonds, and coordination distances to metals. Usually, during refinement, care has already been taken to keep these parameters at sensible values, and they are therefore not truly independent parameters. The temperature factor distribution should also be sensible in that connected atoms should not have very different temperature factors and that high temperature factors are restricted to atoms that have room to move in the structure, that is, are on the surface of the protein. Refinement programs may include these temperature factor restrictions.

Validation can and should also be used to verify independent parameters that were not used in refinement. A good example of parameters not usually refined are the phi and psi torsion angles, which are usually represented in a Ramachandran plot (Fig. 1.5) [80]. Certain combinations of phi and psi angles are much more common than others, while other combinations are highly unlikely or even physically impossible. The probability depends on the nature of the side chain of the amino acid. A glycine residue, which lacks a side chain, can adopt more different conformations than other amino acids, while proline, with its atypical side chain covalently bonded to both the alpha-carbon and the nitrogen atom, has a more restricted conformation space than other residues. A structure is also expected to contain all or nearly all of the peptides in trans-conformation, although in rare cases cis-peptides may occur, especially when the amino acid C-terminal to the peptide bond is a proline. Side chains also have energetically favored orientations (preferred rotamers), which can be expressed as combinations of their torsion angles (Chi1, Chi2, etc.). Certain amino acids have parameters that can be specifically checked. Prolines should show a distinct puckering, in which the gamma-carbon is rotated either above or below the approximate plane formed by the alpha-carbon, nitrogen, and other side-chain carbons. Asparagine, glutamine, and histidine residues have pseudosymmetric side chains; they should be positioned, and if necessary “flipped”, to optimize hydrogen bonding. The likely protonation state of the residues, especially histidine, should be taken into account for this. Finally, an important validation parameter is whether all amino acids are in suitable environments in relation to their nature. Apolar and aromatic side chains should preferably be buried in the hydrophobic core of the protein. Polar groups should be in polar environments, either in contact with solvent or with other polar residues. Electrostatic charges should be neutralized by other charged atoms of the protein or solvent. Validation for nucleic acid structures is less developed and limited to checking of bond lengths and angles, sugar pucker, hydrogen bonding, and contact analyses.
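The torsion angles discussed above are all computed in the same way from the coordinates of four consecutive atoms: phi from C(i−1), N, C-alpha, C; psi from N, C-alpha, C, N(i+1); and the peptide torsion omega (near 180° for trans, near 0° for cis) from C-alpha, C, N(i+1), C-alpha(i+1). The sketch below is a plain-Python implementation of the standard dihedral calculation; the helper names and the test coordinates are hypothetical and not taken from any validation program.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Idealized four-atom arrangements about a bond along the z axis
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # eclipsed (cis-like)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # anti (trans-like)
```

Applied to every residue of a model, the resulting (phi, psi) pairs are the points displayed in a Ramachandran plot.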

Several computer programs are available for automated validation checks, either as stand-alone programs or as web servers. PROCHECK [81] checks basic validation parameters and outputs Ramachandran plots. WHATCHECK [82] performs more extensive checks. The web-based MOLPROBITY server [83] and the PHENIX [79] validation options are more modern implementations. POLYGON [84] compares model quality indicators to similar structures in the database.

1.4 STRUCTURAL ANALYSIS AND BIOLOGICAL IMPLICATIONS

Once the structure has been solved and preferably refined to completion, it will have to be analyzed. The first step is to judge whether the structure is similar to other known structures or whether perhaps a new fold has been discovered. Further analysis concerns the biological interest of the structure, which in turn can provide hypotheses that can be tested by additional biochemical or structural analyses.

1.4.1 Structural Analysis

If the protein structure has been solved by molecular replacement, the final structure will have significant structural similarity to the input model and most likely will have the same fold. For de novo structure solutions, the program DALI can perform similarity searches against the protein structure database automatically [85]. The program outputs different similarity scores and a structural alignment, and also a superposition matrix. This matrix can be used to superimpose the structures and inspect them for structural similarity and differences using a structure visualization program. In the case of multidomain structures, analyses will have to be performed with all domains separately. In some cases, no structural homologs can be identified, and a truly new fold has been discovered. However, in most cases one or more clear structural homologs can be identified. In any case, the topology of the new structure should be determined. The fold or topology is defined as the composition of secondary structure elements and their interconnection. If one or more structural homologs are identified, the topology of the new structure should be compared with the previously analyzed structures to see if they have identical topologies or whether there are some different interconnections between the secondary structure elements. The composition of secondary structure elements determines whether the protein falls in the family of alpha-helical structures, beta-structures, or mixed alpha/beta-folds. The SCOP (structural classification of proteins) [86] and CATH (class-architecture-topology-homologous superfamily) [87] databases both aim to further classify all the existing protein folds at different hierarchical levels.

While some macromolecular structures exclusively exist as monomers, others form stable multimers, exist in different quaternary states depending on conditions, or are in dynamic equilibrium between different quaternary states. Due to the high concentrations used in crystallization, usually the highest possible multimeric state is observed in crystals. Furthermore, additional interaction interfaces are often observed in the crystal that turn out not to have biological relevance. The program PISA attempts to discriminate between genuine interaction interfaces and fortuitous crystal contacts by calculating interaction surfaces and complexation energies and entropies [88]. The quaternary structure in solution can also be investigated using analytical ultracentrifugation, dynamic light scattering techniques, or size exclusion chromatography.

1.4.2 Biological Implications

Although crystal structures are static, conclusions about protein movement can often be drawn. Local flexibility is often indicated by higher atomic displacement parameters of the corresponding atoms, although this is usually limited to loops on the surface of the protein. Comparison of different crystal forms of the same protein may locate hinges in the protein structure around which surrounding domains may move. Larger-scale movements can sometimes also be obvious. A clear example is F1-ATPase, where the presence of three alpha- and three beta-subunits alternating with each other in a ring around a seventh gamma subunit clearly indicated the possibility of relative rotation of the alpha-beta ring around gamma [89]. This rotation was later demonstrated, among other methods, by microscopy [90].


Inspection of the structure during modeling (Section 1.3.5), refinement (Section 1.3.6), and validation (Section 1.3.7) stages may have turned up ligand molecules associated with the macromolecule. These ligand molecules may have been copurified from the expression host, may have bound to the protein in the purification process, or may be components of the crystallization or cryoprotection mixture. Observed ligands may be substrates, products, cofactors, or inhibitors of the crystallized macromolecule in case it is an enzyme. In these cases, the observed structure and binding mode often allow reasonable proposals to be made about the reaction mechanism. Ligands may also mimic natural substrates, products, cofactors, or inhibitors (for instance, sulfate from the precipitant mimicking phosphate or multiple glycerol molecules mimicking a more complex carbohydrate). Computational ligand docking may also be performed, either using the protein as a static entity or incorporating induced fit principles [91].

Analysis of a protein surface may indicate regions implicated in interaction with other biomolecules. Various structure visualization programs may be used to predict surface potential, for example, GRASP [92], or either PYMOL (The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC) or CHIMERA [93] in combination with APBS [94]. Proteins interacting with nucleic acids often have positively charged patches in the regions that interact with the RNA or DNA phosphate groups (for an example, see Reference 95). Shape complementarity may also be used to predict interaction partners, and computational protein–protein docking approaches may lead to useful binding mode hypotheses [96].

Experiments to verify hypotheses suggested by the crystal structure may be very diverse in nature. Binding assays may be performed to confirm interactions in solution and to measure binding affinity. Implication of specific amino acids or nucleotides in binding sites may be confirmed by site-directed mutagenesis and binding assays or other in vitro or in vivo experiments. New crystal structures may be determined with related ligands to test if their binding mode is similar to that observed in the original crystal. In general, the solution of the crystal structure of a new macromolecule opens up a multitude of new research directions, making the endeavor described in Section 1.3 very worthwhile.

1.5 FUTURE PROSPECTS

From its history of over half a century and the preceding paragraphs, it can be concluded that macromolecular crystallography is a mature technique, embedded in the mature science of structural biology. When well-diffracting crystals can be obtained, determination of small- and intermediate-size soluble protein structures is almost routine and seen as a technique among others to be used to understand a specific biological process. Indeed, in high-throughput structural proteomics projects, structures are determined in a matter of weeks from construction of the expression vector to a fully refined structure, although with a limited success rate (see for instance Reference 97). Automation of the cloning, expression, purification, crystallization, X-ray data collection, and structure solution and refinement steps has significantly contributed to this. The development of better algorithms for phase determination has also been important.

Continuing improvements in X-ray beam intensity, combined with a reduction in beam size and larger and more sensitive detectors, also mean that ever larger macromolecular complexes can be studied (provided that they can be crystallized, of course). A good example is the ribosome [98]. In spite of their importance for cellular processes and pharmacology, structures of integral membrane proteins are still under-represented in databases, but this is mainly due to the difficulty in producing them in large amounts and with the purity and homogeneity necessary for successful crystallization.

Protein structure prediction is another field in which significant improvements have been made, and in several cases predicted, albeit homologous, structures were good enough to serve as a molecular replacement model in de novo structure solution [99]. Nevertheless, in the near infinity of sequence and fold space there will always be “orphan” proteins with interesting functions and new folds, which will need the continued dedication of a specialized crystallography group in order to successfully determine their structures.

ACKNOWLEDGMENTS

We thank Carmela García-Doval and Bruno Dacunha-Marinho for careful reading of the manuscript. We acknowledge funding by the Spanish Ministry of Science and Innovation (grants BFU2008-01588 and BFU2011-24843) and the European Commission (BeNatural coordinated project, contract NMP4-CT-2006-033256).

REFERENCES

1. Schwarzenbach D. Crystallography. 1st ed. West Sussex, UK: Wiley & Sons; 1996. p 89–99.

2. Seibert MM, Ekeberg T, Maia FRNC, Svenda M, Andreasson J, Jonsson O, Odic D, Iwan B, Rocker A, Westphal D, Hantke M, DePonte DP, Barty A, Schulz J, Gumprecht L, Coppola N, Aquila A, Liang M, White TA, Martin A, Caleman C, Stern S, Abergel C, Seltzer V, Claverie JM, Bostedt C, Bozek JD, Boutet S, Miahnahri AA, Messerschmidt M, Krzywinski J, Williams G, Hodgson KO, Bogan MJ, Hampton CY, Sierra RG, Starodub D, Andersson I, Bajt S, Barthelmess M, Spence JCH, Fromme P, Weierstall U, Kirian R, Hunter M, Doak RB, Marchesini S, Hau-Riege SP, Frank M, Shoeman RL, Lomb L, Epp SW, Hartmann R, Rolles D, Rudenko A, Schmidt C, Foucar L, Kimmel N, Holl P, Rudek B, Erk B, Homke A, Reich C, Pietschner D, Weidenspointner G, Struder L, Hauser G, Gorke H, Ullrich J, Schlichting I, Herrmann S, Schaller G, Schopper F, Soltau H, Kuhnel KU, Andritschke R, Schroter CD, Krasniqi F, Bott M, Schorb S, Rupp D, Adolph M, Gorkhover T, Hirsemann H, Potdevin G, Graafsma H, Nilsson B, Chapman HN, Hajdu J. Single mimivirus particles intercepted and imaged with an X-ray laser. Nature 2011;470:78–81.

3. Hahn T. International Tables for Crystallography. Volume A.Space-Group Symmetry. 5th ed. Dordrecht, The Netherlands:Springer; 2005.

4. Prince E. International Tables for Crystallography. VolumeC. Mathematical, Physical and Chemical Tables. 3rd ed.Dordrecht, The Netherlands: Kluwer Academic Publishers;2004.

5. Caticha-Ellis S. Anomalous Dispersion of X-rays in Crystal-lography. Teaching Pamphlet No. 8. Cardiff, UK: InternationalUnion of Crystallography by University College Cardiff Press;1981.

6. Taylor GL. The phase problem. Acta Cryst D 2003;59:1881–1890.

7. Sayre D. X-ray crystallography: the past and present of thephase problem. Struct Chem 2002;13:81–96.

8. Kendrew JC, Parrish RG. The crystal structure of myoglobin.III. Sperm-whale myoglobin. Proc R Soc Lond B Biol Sci A1957;238(1214):305–324.

9. Perutz MF. Preparation of haemoglobin crystals. J CrystalGrowth 1968;2:54–56.

10. Morikawa K, la Cour TFM, Nyborg J, Rasmussen KM, MillerDL, Clark BFC. High resolution X-ray crystallographic anal-ysis of a modified form of the elongation factor Tu:guanosinediphosphate complex. J Mol Biol 1978;1255:325–338.

11. Wittmann HG, Mussig J, Piefke J, Gewitz HS, Rheinberger HJ,Yonath A. Crystallisation of Escherichia coli ribosomes. FEBSLett 1982;146:217–220.

12. Lutter R, Abrahams JP, van Raaij MJ, Todd RJ, LundqvistT, Buchanan SK, Leslie AG, Walker JE. Crystallisationof F1-ATPase from bovine heart mitochondria. J Mol Biol1993;229:787–790.

13. Liu Z, Yan H, Wang K, Kuang T, Zhang J, Gui L, An X, ChangW. Crystal structure of spinach major light-harvesting complexat 2.72 Å resolution. Nature 2004;428:287–292.

14. de Boer HA, Comstock LJ, Vasser M. The tac promoter: afunctional hybrid derived from the trp and lac promoters. ProcNatl Acad Sci USA 1983;80:21–25.

15. Amann E, Ochs B, Abel KJ. Tightly regulated tac promotervectors useful for the expression of unfused and fused proteinsin Escherichia coli. Gene 1988;69:301–305.

16. Studier FW, Moffat BA. Use of bacteriophage T7 RNA poly-merase to direct selective high-level expression of clonedgenes. J Mol Biol 1986;189:113–130.

17. Guzman LM, Belin D, Carson MJ, Beckwith J. Tight regu-lation, modulation, and high-level expression by vectors con-taining the arabinose PBAD promoter. J Bact 1995;177:4121–4130.

18. Bartual SG, Garcia-Doval C, Alonso J, Schoehn G, vanRaaij MJ. Two-chaperone assisted soluble expression andpurification of the bacteriophage T4 long tail fibre protein gp37.Protein Expr Purif 2010;70:116–121.

19. Terpe K. Overview of bacterial expression systems for heterol-ogous protein production: from molecular and biochemicalfundamentals to commercial systems. Appl Microbiol Biotech-nol 2006;72:211–222.

20. Romanos MA, Scorer CA, Clare JJ. Foreign gene expressionin yeast: a review. Yeast 1992;8:432–488.

21. Daley R, Hearn MTW. Expression of heterologous proteins inPichia pastoris: a useful experimental tool in protein engineer-ing and production. J Mol Recognit 2005;18:119–138.

22. Drugmand JC, Schneider YJ, Agathos SN. Insect cells as fac-tories for biomanufacturing. Biotechnol Adv 2012; 30: 1140–1157.

23. Trowitzsch S, Bieniossek C, Nie Y, Garzoni F, Berger I. Newbaculovirus expression tools for recombinant protein complexproduction. J Struct Biol 2010;172; 45–54.

24. Bleckwenn NA, Bentley WE, Joseph Shiloach J. Exploringvaccinia virus as a tool for large-scale recombinant proteinexpression. Biotechnol Prog 2003;19:130–136.

25. Geisse S, Fux C. Recombinant protein production by tran-sient gene transfer into mammalian cells. Methods Enzymol2009;463:223–238.

26. Thomas P, Smart TG. HEK293 cell line: a vehicle for the expression of recombinant proteins. J Pharmacol Toxicol Meth 2005;51:187–200.

27. Hoffman M, Nemetz C, Madin K, Buchberger B. Rapid translation system: a novel cell-free way from gene to protein. Biotech Ann Rev 2004;10:1–30.

28. Spirin AS, Swartz JR. Cell-free Protein Synthesis: Methods and Protocols. Weinheim, Germany: Wiley-VCH; 2008.

29. Francklyn CS, Schimmel P. Synthetic RNA molecules as substrates for enzymes that act on transfer-RNAs and transfer-RNA-like molecules. Chem Rev 1990;90:1327–1342.

30. Price SR, Ito N, Oubridge C, Avis JM, Nagai K. Crystallisation of RNA-protein complexes I. Methods for the large-scale preparation of RNA suitable for crystallographic studies. J Mol Biol 1995;249:398–408.

31. Burgess RR. Refolding solubilized inclusion body proteins. Methods Enzymol 2009;463:259–282.

32. Ng JD, Gavira JA, Garcia-Ruiz JM. Protein crystallization by capillary counterdiffusion for applied crystallographic structure determination. J Struct Biol 2003;142:219–231.

33. Jancarik J, Kim SH. Sparse matrix sampling: a screening method for crystallisation of proteins. J Appl Cryst 1991;24:409–411.

34. Dimasi N, Flot D, Dupeux F, Marquez JA. Expression, crystallization and X-ray data collection from microcrystals of the extracellular domain of the human inhibitory receptor expressed on myeloid cells IREM-1. Acta Cryst F 2007;63:204–208.

35. Bergfors T. Seeds to crystals. J Struct Biol 2003;142:66–76.

36. D’Arcy A, Mac Sweeney A, Habera A. Modified microbatch and seeding in protein crystallization experiments. J Synchr Radiat 2004;11:24–26.

37. McPherson A. Preparation and Analysis of Protein Crystals. Malabar, FL: Krieger Publishing; 1989.

38. Chayen NE. Protein Crystallization Strategies for Structural Genomics. La Jolla, CA: International University Line; 2007.

39. Bergfors T. Protein Crystallization. 2nd ed. La Jolla, CA: International University Line; 2009.

40. Hope H. Crystallography of biological macromolecules at ultra-low temperature. Ann Rev Biophys Biophys Chem 1990;19:107–126.

41. Massover WH. Radiation damage to protein specimens from electron beam imaging and diffraction: a mini-review of anti-damage approaches, with special reference to synchrotron X-ray crystallography. J Synchr Radiat 2007;14:116–127.

42. Garman E, Sweet RM. X-ray data collection from macromolecular crystals. Meth Mol Biol 2007;364:63–93.

43. Bourenkov GP, Popov AN. Optimization of data collection taking radiation damage into account. Acta Cryst D 2010;66:409–419.

44. Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC. A three-dimensional model of the myoglobin molecule obtained by X-ray analysis. Nature 1958;181:662–666.

45. Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North AC. Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-Å resolution, obtained by X-ray analysis. Nature 1960;185:416–422.

46. Rossmann MG. Molecular replacement, a historical background. Acta Cryst D 2001;57:1360–1366.

47. Trapani S, Navaza J. AMoRe: classical and modern. Acta Cryst D 2008;64:11–16.

48. Vagin A, Teplyakov A. Molecular replacement with MOLREP. Acta Cryst D 2010;66:22–25.

49. McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ. Phaser crystallographic software. J Appl Cryst 2007;40:658–674.

50. Weeks CM, Miller R. Optimizing Shake-and-Bake for proteins. Acta Cryst D 1999;55:492–500.

51. Sheldrick GM. A short history of SHELX. Acta Cryst A 2008;64:112–122.

52. Ealick SE. Advances in multiple wavelength anomalous diffraction crystallography. Curr Opin Chem Biol 2000;4:495–499.

53. Dodson E. Is it jolly SAD? Acta Cryst D 2003;59:1958–1965.

54. Hendrickson W. Maturation of MAD phasing for the determination of macromolecular structures. J Synchr Radiat 1999;6:845–851.

55. Ravelli RB, Leiros HK, Pan B, Caffrey M, McSweeney S. Specific radiation damage can be used to solve macromolecular crystal structures. Structure 2003;11:217–224.

56. Garman E, Murray JW. Heavy-atom derivatization. Acta Cryst D 2003;59:1903–1913.

57. Hendrickson WA, Horton JR, LeMaster DM. Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J 1990;9:1665–1672.

58. Abrahams JP, Leslie AG. Methods used in the structure determination of bovine mitochondrial F1 ATPase. Acta Cryst D 1996;52:30–42.

59. Cowtan K. Recent developments in classical density modification. Acta Cryst D 2010;66:470–478.

60. Terwilliger T. Solve and resolve: automated structure solution, density modification, and model building. J Synchr Radiat 2004;11:49–52.

61. Sheldrick GM. Experimental phasing with SHELXC/D/E: combining chain tracing with density modification. Acta Cryst D 2010;66:479–485.

62. Dodson EJ, Woolfson MM. ACORN2: new developments of the ACORN concept. Acta Cryst D 2009;65:881–891.

63. Rodriguez-Martinez DD, Grosse C, Himmel S, Gonzalez C, de Ilarduya IM, Becker S, Sheldrick GM, Uson I. ARCIMBOLDO: crystallographic ab initio protein solution far below atomic resolution. Nat Meth 2009;6:651–653.

64. Taylor GL. Introduction to phasing. Acta Cryst D 2010;66:325–338.

65. McRee DE. Practical Protein Crystallography. 2nd ed. San Diego, CA: Academic Press; 1999.

66. Drenth J. Principles of Protein X-ray Crystallography. 3rd ed. New York: Springer; 2006.

67. Rhodes G. Crystallography Made Crystal Clear. 3rd ed. San Diego, CA: Academic Press; 2006.

68. McPherson A. Introduction to Macromolecular Crystallography. 2nd ed. Hoboken, NJ: Wiley-Blackwell; 2009.

69. Rupp B. Biomolecular Crystallography: Principles, Practice, and Application to Structural Biology. 1st ed. New York: Garland Science; 2009.

70. Vonrhein C, Blanc E, Roversi P, Bricogne G. Automated structure solution with autoSHARP. Meth Mol Biol 2007;364:215–230.

71. Pannu NS, Waterreus WJ, Skubak P, Sikharulidze I, Abrahams JP, de Graaff RA. Recent advances in the CRANK software suite for experimental phasing. Acta Cryst D 2011;67:331–337.

72. Jones TA, Zou JY, Cowan SW, Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst A 1991;47:110–119.

73. Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Cryst D 2010;66:486–501.

74. Langer G, Cohen SX, Lamzin VS, Perrakis A. Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat Protoc 2008;3:1171–1179.

75. Cowtan K. The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Cryst D 2006;62:1002–1011.

76. Perrakis A, Morris R, Lamzin VS. Automated protein model building combined with iterative structure refinement. Nat Struct Biol 1999;6:458–463.

77. Brunger AT. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 1992;355:472–475.

78. Murshudov GN, Vagin AA, Dodson EJ. Refinement of macromolecular structures by the maximum-likelihood method. Acta Cryst D 1997;53:240–255.

79. Adams PD, Afonine PV, Bunkoczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung LW, Kapral GJ, Grosse-Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, Zwart PH. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Cryst D 2010;66:213–221.

80. Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol 1963;7:95–99.

81. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 1993;26:283–291.

82. Vriend G. WHAT IF: a molecular modeling and drug design program. J Mol Graphics 1990;8:52–56.

83. Chen VB, Arendall WB 3rd, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Cryst D 2010;66:12–21.

84. Urzhumtseva L, Afonine PV, Adams PD, Urzhumtsev A. Crystallographic model quality at a glance. Acta Cryst D 2009;65:297–300.

85. Holm L, Kääriäinen S, Rosenström P, Schenkel A. Searching protein structure databases with DaliLite v.3. Bioinformatics 2008;24:2780–2781.

86. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res 2008;36:D419–D425.

87. Cuff AL, Sillitoe I, Lewis T, Clegg AB, Rentzsch R, Furnham N, Pellegrini-Calace M, Jones D, Thornton J, Orengo CA. Extending CATH: increasing coverage of the protein structure universe and linking structure with function. Nucleic Acids Res 2011;39:D420–D426.

88. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol 2007;372:774–797.

89. Abrahams JP, Leslie AG, Lutter R, Walker JE. Structure at 2.8 Å resolution of F1-ATPase from bovine heart mitochondria. Nature 1994;370:621–628.

90. Noji H, Yasuda R, Yoshida M, Kinosita K. Direct observation of the rotation of F1-ATPase. Nature 1997;386:299–302.

91. Lill M. Efficient incorporation of protein flexibility and dynamics into molecular docking simulations. Biochemistry 2011;50:6157–6169.

92. Nicholls A, Sharp KA, Honig B. Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 1991;11:281–296.

93. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera—a visualization system for exploratory research and analysis. J Comput Chem 2004;25:1605–1612.

94. Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc Natl Acad Sci USA 2001;98:10037–10041.

95. Guardado-Calvo P, Vazquez-Iglesias L, Martinez-Costas J, Llamas-Saiz AL, Schoehn G, Fox GC, Hermo-Parrado XL, Benavente J, van Raaij MJ. Crystal structure of the avian reovirus inner capsid protein sigmaA. J Virol 2008;82:11208–11216.

96. Ritchie DW, Kozakov D, Vajda S. Accelerating and focusing protein–protein docking correlations using multidimensional rotational FFT generating functions. Bioinformatics 2008;24:1865–1873.

97. Ehebauer MT, Wilmanns M. The progress made in determining the Mycobacterium tuberculosis structural proteome. Proteomics 2011;11:3128–3133.

98. Schmeing TM, Ramakrishnan V. What recent ribosome structures have revealed about the mechanism of translation. Nature 2009;461:1234–1242.

99. DiMaio F, Terwilliger TC, Read RJ, Wlodawer A, Oberdorfer G, Wagner U, Valkov E, Alon A, Fass D, Axelrod HL, Das D, Vorobiev SM, Iwaï H, Pokkuluri PR, Baker D. Improved molecular replacement by density- and energy-guided protein structure optimization. Nature 2011;473:540–543.