Resolution: Implications in Refinement

Resolution: Implications in Refinement. Swanand Gore & Gerard Kleywegt. May 6th 2010, 12-1 pm. Macromolecular Crystallography Course.


Page 1: Resolution: Implications in Refinement

Resolution: Implications in Refinement

Swanand Gore & Gerard Kleywegt

May 6th 2010, 12-1 pm

Macromolecular Crystallography Course

Page 2: Resolution: Implications in Refinement

Outline

• Intuitive idea of resolution – why higher order diffraction is better.

• Parameters, model, observations, refinement – more data is better.

• Observations, parameters, over-fitting in crystallographic refinement.

• Features that can be modeled at various resolutions.

• Refinement practices at low and high resolution.

Page 3: Resolution: Implications in Refinement

Idealized diffraction in 1D

• Images scanned from David Blow’s book

[Figure: 1D diffraction pattern with orders h1, h2, h3 and -h1, -h2, -h3]

Page 4: Resolution: Implications in Refinement

Idealized diffraction in 1D

• Images scanned from David Blow’s book

[Figure: 1D diffraction pattern with orders h1, h2, h3 and -h1, -h2, -h3; arrow labelled "Increasing resolution"]

Assuming:

• B = 0
• Occupancy = 1
• Uniform scattering power in all directions
• Phase angles = 0

Page 5: Resolution: Implications in Refinement

Idealized diffraction in 1D

• Images scanned from David Blow’s book

• Higher order diffraction
• Higher Fourier coefficients
• Higher frequency wave in real space
• Sharper signal
• Greater resolution

Page 6: Resolution: Implications in Refinement

What separation can be resolved?

• Images from B. Rupp’s book
• Image from Gerard’s ppt.

• Nominal resolution
  – The h-th order diffracted wave samples the lattice at intervals of a/h.
  – In 1D, a/h is the crystallographic resolution that is routinely quoted.
  – In an orthogonal cell with edges a, b, c, reflection hkl comes from planes separated by d, where
    • $1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2$
  – For an orthorhombic cell of 100, 95, 90 Å and highest order reflection 50, 52, 48, the resolution is ~1.09 Å.

• For non-orthogonal axes, corrections apply.

• Resolution intuitively means the smallest separation at which two objects can still be told apart.
  – For 3D crystallography, it is ~0.92*dmin, almost the same as the nominal resolution.
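The plane-spacing arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes orthogonal axes and uses the standard reciprocal relation 1/d² = (h/a)² + (k/b)² + (l/c)²; the function name `d_spacing_orthogonal` is made up for this illustration, and the cell edges and indices are the ones quoted on the slide.

```python
import math

def d_spacing_orthogonal(a, b, c, h, k, l):
    """Interplanar spacing d (same units as a, b, c) for an orthogonal cell."""
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_squared)

# Along a single axis the spacing reduces to the 1D result a/h.
d_axis = d_spacing_orthogonal(100.0, 95.0, 90.0, 50, 0, 0)

# Cell edges and highest-order reflection quoted on the slide.
d_min = d_spacing_orthogonal(100.0, 95.0, 90.0, 50, 52, 48)

print(f"d(50,0,0) = {d_axis:.2f}, d(50,52,48) = {d_min:.2f}")
```

For non-orthogonal cells the cell angles enter the expression, so this helper would not apply as written.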

[Figure: two O atoms merging into a single blob when unresolved]

Page 7: Resolution: Implications in Refinement

Atomic scatterers in 1D

• Images from B. Rupp’s book

[Figure: 1D structure of C and O atoms -> Fourier coefficients & phases -> resolution filter]

Peaks get sharper as higher resolution Fourier coefficients are included.

Page 8: Resolution: Implications in Refinement

Occupancy and B factors

• Images from B. Rupp’s book

Peaks get broader due to larger B factors and shorter due to lower occupancy.

Page 9: Resolution: Implications in Refinement

Data truncation

• Images from B. Rupp’s book

• Truncation happens naturally due to B factors.
• Truncated data leads to an incomplete reverse FT, which causes ripples.
• Ripples around heavy atoms can ‘drown’ nearby lighter atoms.
• Ripples can seem to originate from real atoms.

[Figure annotations: O at 0.5 occupancy? N’s at 0.5 occupancy?]
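Series-termination ripples are easy to reproduce in 1D. The sketch below is a toy under the idealized assumptions used earlier (one atom at the origin, all phases zero, uniform scattering power), with each coefficient damped by exp(-B/(4d²)), which equals exp(-B sin²θ/λ²). The 10 Å cell, the B value of 20 Å² and the function names are illustrative choices, not values from the slides.

```python
import math

def density_1d(x, h_max, b_factor=0.0, cell=10.0):
    """Fourier synthesis at fractional coordinate x for one atom at the origin."""
    rho = 1.0  # h = 0 term
    for h in range(1, h_max + 1):
        d = cell / h  # spacing sampled by order h
        amplitude = math.exp(-b_factor / (4.0 * d * d))
        # The +h and -h terms combine into a cosine because all phases are zero.
        rho += 2.0 * amplitude * math.cos(2.0 * math.pi * h * x)
    return rho

def profile(h_max, b_factor=0.0, n=200):
    """Density sampled on n grid points across one cell."""
    return [density_1d(i / n, h_max, b_factor) for i in range(n)]

low = profile(h_max=3)
high = profile(h_max=12)
damped = profile(h_max=12, b_factor=20.0)

print(max(low), max(high))     # the peak grows and sharpens as orders are added
print(min(high), min(damped))  # deep negative ripples without B; almost none with B
```

With B = 0 the abrupt cutoff at h_max produces strong negative lobes next to the peak; with a large B the coefficients have already decayed before the cutoff, so the ripples nearly vanish at the cost of a broader peak.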

Page 10: Resolution: Implications in Refinement

Diffracting duck in 2D

• Leaving out higher order diffraction data will reduce the detail retrieved through reverse transform.
• Leaving out lower resolution data will blur the boundaries.
• Randomly absent data is not too problematic for maps.
  – So it may not matter whether or not the Rfree set is used in the map calculation?

• Images from Kevin Cowtan’s website.

Page 11: Resolution: Implications in Refinement

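Make everything as simple as possible, but not simpler…

$\rho(\mathbf{x}) = \frac{1}{V} \sum_{\mathbf{h}} F_{\mathbf{h}} \exp(-2\pi i\, \mathbf{h}\cdot\mathbf{x} + i\varphi_{\mathbf{h}})$

$F_{\mathbf{h}} \exp(i\varphi_{\mathbf{h}}) = \sum_{j} f_j\, o_j \exp(2\pi i\, \mathbf{h}\cdot\mathbf{x}_j) \exp(-B_j \sin^2\theta/\lambda^2)$

• Estimate phases.
• Model xyz, B, o.
• Model solvent.
• …

• Noise.
• Errors in data collection.
• Static, dynamic disorder.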

• Images from B. Rupp’s book

Page 12: Resolution: Implications in Refinement

Choosing resolution cutoff

• B factors and scattering factors impose a natural cutoff on what can be observed.

• Reliability of measurement is indicated by S/N ratio and completeness.

• Signal to noise ratio
  – A low SNR does not matter too much if a proper maximum likelihood target is used to weight the error estimates.
  – High <I/σ(I)> matters when collecting data for phasing.

• Completeness
  – Low completeness in the highest resolution shell does not give the map the level of detail implied by the nominal resolution.
  – Effective resolution = dmin · C^(-1/3)
  – Randomly or systematically missing data creates undesirable effects in the reverse FT.
  – Aim for completeness > 0.95

• Number of reflections increases as the inverse cube of dmin.
  – 2/3z π V_UC / dmin³
  – Not unique due to centro-symmetry and spacegroup symmetry
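A short sketch of the two scaling relations on this slide. `effective_resolution` implements dmin · C^(-1/3) as quoted. `lattice_points_in_sphere` is not the slide's formula: it uses the general geometric estimate that a sphere of radius 1/dmin in reciprocal space holds about 4πV/(3·dmin³) lattice points, and only halves that for Friedel pairs, with no further space-group reduction. The cell volume and function names are hypothetical.

```python
import math

def effective_resolution(d_min, completeness):
    """d_eff = d_min * C^(-1/3), with completeness C given as a fraction in (0, 1]."""
    return d_min * completeness ** (-1.0 / 3.0)

def lattice_points_in_sphere(cell_volume, d_min):
    """Approximate count of reciprocal-lattice points with spacing d >= d_min."""
    return 4.0 * math.pi * cell_volume / (3.0 * d_min ** 3)

cell_volume = 100.0 * 95.0 * 90.0  # hypothetical orthogonal cell, in cubic angstroms

for d_min in (3.0, 2.0, 1.5):
    n_all = lattice_points_in_sphere(cell_volume, d_min)
    # Friedel pairs halve the count; space-group symmetry would reduce it further.
    print(f"d_min = {d_min} A: ~{n_all:.0f} points, ~{n_all / 2:.0f} after Friedel pairing")

print(f"2.0 A data at 80% completeness: ~{effective_resolution(2.0, 0.80):.2f} A effective")
```

Halving dmin multiplies the number of reflections by eight, which is why every improvement in resolution adds so many observations.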

• Images from B. Rupp’s book

Page 13: Resolution: Implications in Refinement

Model and refinement

• Model is defined as a set of parameters and a set of functions over parameters, designed to explain observations

• Refinement
  – Is an algorithmic process of fitting a model to explain observations, by assigning optimal values to parameters.
  – Reduces the differences between observations and model-calculated values of observations

• A linear model in 2D consists of 2 parameters
  – Y = mX + c

• Some models are more accurate than others, depending on quality of refinement.

• Refinement is necessary when observations contain errors and there are enough observations to refine the parameters.

[Figure: observations with a well-refined and an ill-refined linear model; intercept c, slope m]

Page 14: Resolution: Implications in Refinement

Model and refinement

• A linear model in 2D
  – consists of 2 parameters : Y = mX + c
  – 1 observation, howsoever accurate, is not sufficient if the model has 2 parameters
    • Under-determined, over-fitted model
    • Many models can be imagined
  – 2 distinct accurate observations are sufficient to determine the linear model
    • Well-determined model
  – 3 accurate observations over-determine the model
  – But observations generally contain random error! A greater number of observations leads to error cancellation and a more accurate model

• Model with too few params can lead to under-fitting
• Model with too many params can lead to over-fitting
  – Fitting to error too!

• Quality of modelling
  – Choice of model (linear, quadratic, higher polynomial?)
  – Quality of refinement (R value)

• Images from B. Rupp’s book

Page 15: Resolution: Implications in Refinement

Model and refinement

• In presence of errors, refinement quality does not indicate model quality

– Well-refined model is of bad quality if it was fitted to erroneous observations.

• Hence, observations not subject to refinement are required to assess the accuracy.

  – R and “free” R
  – M1: 0.2, 0.3
  – M2: 0.2, 0.4
  – M1 better than M2

• Free R and data/param ratio help in comparing models with different numbers of parameters
  – MA: 0.2, 0.3. d/p = 15/2 = 7.5
    • Under-fit
  – MB: 0.1, 0.2. d/p = 15/3 = 5
    • Optimal
  – MC: 0.01, 0.25. d/p = 15/10 = 1.5
    • Overfitting = low d/p, high Rfree
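The R versus free R idea can be tried on a toy curve-fitting problem. In this sketch the data points are invented for illustration: four "working" observations scattered around Y = 2X + 1 and two held-out "free" observations. A 2-parameter line is compared with a cubic that passes exactly through all four working points (4 parameters for 4 observations). The R-like statistic is Σ|obs − calc| / Σ|obs|, chosen to resemble the crystallographic R value; all function names are made up here.

```python
def fit_line(xs, ys):
    """Least-squares straight line Y = mX + c (2 parameters)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    c = mean_y - m * mean_x
    return lambda x: m * x + c

def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point: one parameter per observation."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

def r_value(model, xs, ys):
    """R-like residual: sum |obs - calc| / sum |obs|."""
    return sum(abs(y - model(x)) for x, y in zip(xs, ys)) / sum(abs(y) for y in ys)

# Invented data: working set with errors of +/-0.5 around Y = 2X + 1, and a free set.
work_x, work_y = [0.0, 1.0, 2.0, 3.0], [1.5, 2.5, 5.5, 6.5]
free_x, free_y = [0.5, 2.5], [2.0, 6.0]

line = fit_line(work_x, work_y)
cubic = fit_interpolant(work_x, work_y)

for name, model in (("line", line), ("cubic", cubic)):
    r_work = r_value(model, work_x, work_y)
    r_free = r_value(model, free_x, free_y)
    print(f"{name}: R = {r_work:.3f}, free R = {r_free:.3f}")
```

The cubic reaches R = 0 on the working set because it has fitted the errors as well, but its free R (0.125) is worse than that of the line (0.05). Only the held-out observations reveal that the simpler model is the better one.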

• Images from B. Rupp’s book

[Figure: fits of models M1, M2 and MA, MB, MC; “Occam’s valley” marks the optimal model complexity]

Page 16: Resolution: Implications in Refinement

A crystallographic model

• Biochemical entities
  – Biopolymers
    • polypeptides, polynucleotides, carbohydrates
  – Small-molecule ligands (ions, organic)
    • Crystallographic additives, e.g. GOL, PEG
    • Physiologically relevant, e.g. heme, ions
    • Synthesized molecules, e.g. a drug candidate
  – Solvent

• Coordinates, Displacement
  – Unique x,y,z
  – Partial, multiple, absent (occupancy)
  – Isotropic or anisotropic B factors
  – TLS approximation

• Crystallographic etc.
  – Cell, symmetry, NCS
  – Bulk solvent correction (Ksol, Bsol)

• 3hbq images made with pymol.
• http://www.cgl.ucsf.edu/chimera/feature_highlights/ellipsoids.png
• B factor putty from Antonyuk et al. 10.1073/pnas.0809170106
• www.ruppweb.org/xray/tutorial/Crystal_sym.htm

Page 17: Resolution: Implications in Refinement

Quick note on NCS, TLS

• Non-crystallographic symmetry
  – Molecule/s -> ASU -> locally-related ASUs -> Unit cell -> Crystal
  – Sometimes the ASU can consist of multiple, nearly identical subunits.
  – The transformation operator between subunits is local and distinct from space-group operators.
  – Subunits need not be identical because they are in different environments; differences do not indicate problems!
  – This additional symmetry can be used in refinement (restraints, constraints) and validation.

• Translation-libration-screw
  – Overall anisotropy = lattice disorder + inter-molecular motions + intra-molecular rigid body motions + atomic anisotropy
  – Paradigm shift from atom-level anisotropy modelling to anisotropic movements of rigid bodies
  – 1d: a point (3) through which the rotation axis (2) will pass + ratio (1) of rotation to translation on that axis = 6
  – 2d: 2 points + 2 ratios + 2 orthogonal axes (3) + 2 more ratios = 13
  – 3d: 3 points + 3 ratios + 3 orthogonal axes (3) + 6 more screws = 20(ish)
  – TLS group granularity can range from full domain to sidechain

• Images from Rupp book and Martyn Winn ppt

Page 18: Resolution: Implications in Refinement

Counting parameters

• Average-case parameters
  – Per atom 4 params
    • 3 params for coordinates
    • 1 param for isotropic B factor
  – No hydrogens, 1 water per residue
  – 8 atoms per residue
  – N * 8 * 4 = 32 N (for N residues)

• Increasing the parameters
  – 6 params per atom for anisotropic B factor (>2x)
  – Refining occupancy (1.25x) or multiple occupancy
  – Hydrogens modeled explicitly (8 per residue) (2x)
  – Multiple models (M x)

• Reducing parameters
  – 20 params per TLS group
    • 5 groups of 40 residues each: 20 * 5 = 100 params
    • => 32 * 200 = 6400 reduced to 100 (1/64 x for a 200-residue protein)
  – Strict NCS (1/n x for n-fold)

• 1clm, calmodulin, 1.8Å
  – 1132 protein atoms + 4 Ca + 71 waters = 4828 xyzB params
  – #unique reflections = 10610
  – Data / params = 2.2

Restraint counts taken from: http://ccp4wiki.org/~ccp4wiki/wiki/images/9/9f/Winn_prague09_data_parameters.pdf

• 1exr, calmodulin, 1Å
  – 1467 protein atoms with alt conf + 5 Ca + 178 waters
  – 4950 xyz + 9900 anisotropic B + 316 occupancy = 15166 params
  – #unique reflections = 77150
  – Data / params = 77150/15166 = 5.1

• 1h6v, 3Å
  – 6 TLS groups = 120 params
  – 22514 protein atoms + 552 ligand atoms + 9 waters
  – xyzB = 92300 (residual)
  – #unique reflections = 69328 (5% free)
  – d/p = 69328/92300 = 0.75
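The bookkeeping above is simple enough to script. The following sketch recomputes the parameter counts and data-to-parameter ratios directly from the atom and reflection counts quoted on this slide; `count_parameters` is a hypothetical helper, and the 1h6v entry counts only xyzB (the 120 TLS parameters are left out, as in the slide's "residual" figure).

```python
def count_parameters(n_atoms, anisotropic=False, n_occupancies=0):
    """xyz (3) plus B (1 isotropic or 6 anisotropic) per atom, plus refined occupancies."""
    per_atom = 3 + (6 if anisotropic else 1)
    return n_atoms * per_atom + n_occupancies

# (parameters, unique reflections), using the counts quoted on the slide
examples = {
    "1clm": (count_parameters(1132 + 4 + 71), 10610),
    "1exr": (count_parameters(1467 + 5 + 178, anisotropic=True, n_occupancies=316), 77150),
    "1h6v": (count_parameters(22514 + 552 + 9), 69328),
}

for name, (n_params, n_reflections) in examples.items():
    print(f"{name}: {n_params} params, d/p = {n_reflections / n_params:.2f}")

# TLS groups in place of individual atomic parameters for a 200-residue protein
individual = 32 * 200  # 8 atoms per residue * 4 params per atom
tls = 20 * 5           # 5 TLS groups with 20 params each
print(f"TLS reduction factor: {individual // tls}x")
```

Counting this way treats every parameter as independent and every reflection as equally informative, so the ratios are a rough guide rather than a precise measure of determinacy.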

Page 19: Resolution: Implications in Refinement

Data to parameters ratio

• r = (number of unique reflections) / (number of parameters)
  – Graph for a calmodulin 1up5, ~2500 atoms, xyzB
  – r < 1, i.e. under-determined, at resolutions worse than ~2.5Å (dmin > 2.5Å)
  – Refinement against reflections alone is possible only for r > 10, i.e. resolution approaching 1Å!
  – But most PDB entries have r ~ 2-5

• There must be more observations provided to refinement than only the reflections
  – Reflections = observations specific to a particular MX experiment
  – But there are other more general observations applicable to any MX refinement
  – Covalent geometry, steric clashes, ….

(Graph by Konrad Hinsen, 2008)

• Image from B. Rupp’s book

Page 20: Resolution: Implications in Refinement

Observations to parameters ratio

• Observations = reflections + constraints and restraints based on well-known features of macromolecules

• o/p > d/p
  – Tricky to estimate the difference due to dependences, but generally sufficient to make refinement possible
  – 1exr: 1Å, 22732 restraints
    • Bonds, angles, planarity, chirality…
  – o/p = (22732 + 77150) / 15166 = 6.6 > 5.1 = d/p
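As a rough illustration of how restraints change the balance, the sketch below adds the restraint counts quoted in these slides to the reflection counts and compares the two ratios for 1exr and 1h6v. It treats each restraint as one independent observation, which is an oversimplification (the slide notes that dependences make the true gain hard to estimate); the function name is invented.

```python
def data_and_observation_ratios(n_reflections, n_restraints, n_params):
    """Return (d/p, o/p), counting every restraint as one extra observation."""
    d_p = n_reflections / n_params
    o_p = (n_reflections + n_restraints) / n_params
    return d_p, o_p

# (unique reflections, restraints, parameters) as quoted in the slides
entries = {
    "1exr, 1 A": (77150, 22732, 15166),
    "1h6v, 3 A": (69328, 209378, 92300),
}

for name, counts in entries.items():
    d_p, o_p = data_and_observation_ratios(*counts)
    print(f"{name}: d/p = {d_p:.2f}, o/p = {o_p:.2f}")
```

The relative gain is far larger for the 3Å structure, where restraints (including NCS) outnumber reflections about threefold, than for the 1Å structure, where the diffraction data dominate.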

[Figure: bungee jumper, RElaxation = REstraint; hangman, CONvict = CONstraint; plot of energy against length]

• Images from Gerard’s slides
• Restraint counts taken from: http://ccp4wiki.org/~ccp4wiki/wiki/images/9/9f/Winn_prague09_data_parameters.pdf

Page 21: Resolution: Implications in Refinement

Observations to parameters ratio

• o/p > d/p for 1h6v at 3Å
  – Restraints (including NCS) = 209378
  – o/p = (209378+69328)/92300 = 3
  – d/p = 0.75 < 3 = o/p

• 2 components of refinement residuals
  – Data-based
    • Changes model (xyzB..) to reduce the difference between Fo and Fc
  – Knowledge-based
    • Changes model (xyz) to take values of geometric features towards idealized values
  – Qtot = wx Qx + Qgeom
  – Small wx : greater stress on geometric correctness
    • Low resolution, low d/p
  – Large wx : model is allowed to deviate from ideal geometry
    • High resolution, high d/p
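The effect of the weight wx can be seen in a one-parameter toy problem. Assume, purely for illustration, that a single bond length x is pulled towards a data-preferred value by Qx = (x − x_data)² and towards an ideal value by Qgeom = (x − x_ideal)². Minimizing Qtot = wx·Qx + Qgeom then has the closed-form solution x = (wx·x_data + x_ideal) / (wx + 1). Real refinement targets are far more complex; the names and numbers below are hypothetical.

```python
def refined_value(x_data, x_ideal, w_x):
    """Minimizer of Q_tot = w_x * (x - x_data)**2 + (x - x_ideal)**2."""
    return (w_x * x_data + x_ideal) / (w_x + 1.0)

x_ideal = 1.53  # hypothetical ideal bond length, in angstroms
x_data = 1.60   # hypothetical value preferred by the diffraction data

for w_x in (0.1, 1.0, 10.0):
    x = refined_value(x_data, x_ideal, w_x)
    print(f"w_x = {w_x:>4}: refined length = {x:.3f} A")
```

A small wx keeps the bond close to its ideal value, as is appropriate when the data are weak. A large wx lets the model follow the data, which is justified only when there are enough reliable observations per parameter.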

• Restraint counts taken from: http://ccp4wiki.org/~ccp4wiki/wiki/images/9/9f/Winn_prague09_data_parameters.pdf

Page 22: Resolution: Implications in Refinement

Greater d/p => more detail (given decent phases)

• Image from http://www.crystal.uwa.edu.au/px/alice/projects/SCOA_atomic.html, 1mxt
• Images from Rupp’s book

0.95Å

Page 23: Resolution: Implications in Refinement

Lower d/p => lower detail (decent phases often not available)

• Pics of 2g34, 1z56 with coot using EDS maps

2g34, 5Å

1z56, 3.9Å

Page 24: Resolution: Implications in Refinement

Lower d/p => lower detail

• Pics of 2bf1 with coot using EDS maps

2bf1, 4Å

Page 25: Resolution: Implications in Refinement

All resolutions not equal…

• From Gerard’s slides and Phil Evans

Page 26: Resolution: Implications in Refinement

Levels of detail interpretable at various resolutions

| Protein feature | Resolution (Å) |
|---|---|
| Helix | 9 |
| Sheet | 4 |
| Main chain | 3.7 |
| Aromatic sidechains | 3.5 |
| Small sidechains | 3.2 |
| Sidechain conformations | 2.9 |
| Carbonyl, peptide | 2.7 |
| Ordered waters | 2.7 |
| Central dimple of aromatic ring | 2.4 |
| Correct stereochemistry at Ile CB | 2.2 |
| Proline pucker | 2.0 |
| Individual atoms | 1.5 |

| Nucleic acid feature | Resolution (Å) |
|---|---|
| Double helix | 20 |
| Single strand | 12 |
| Stacked base pairs | 4 |
| Phosphates | 3.5 |
| Purine or pyrimidine? | 3.2 |
| Individual bases | 2.7 |
| Ribose pucker | 2.4 |
| Individual atoms | 1.5 |

• From David Blow’s book

Orbitals and bonds (beyond 1Å)!

Page 27: Resolution: Implications in Refinement

Rules of thumb at all resolutions for model-building and refinement

• Start with few parameters and slowly enrich the model

• Be very conservative till a majority of backbone is identified and produces stable refinement

• Prioritize: Backbone > side-chains > small-mols > waters

• Be aware of prevalent modeling practices at your resolution

• Whole model contributes to quality of region of interest.

• Use similar structures for comparison and copying.

• Use quality criteria often.

Page 28: Resolution: Implications in Refinement

Low resolution refinement

• Low resolution structures offer great biological insights.
  – Mainly for complexes e.g. 70S ribosome at 7Å, SIV gp120 envelope glycoprotein at 4Å
  – Large complexes generally diffract to lower resolution.
    • Components may have physiologically relevant conformations only in complexed states.
    • High impact
  – In absence of better resolution, low resolution data must be used.
  – Low resolution does not have to mean low quality!

• Basic guidelines for model building and refinement.
  – Low d/p => Be cautious of biasing the model
  – Make extensive use of information in addition to reflections
  – Use as few parameters as possible
  – Increase params only when confident

• Images from Karmali et al. Acta Cryst. 2009.

Page 29: Resolution: Implications in Refinement

Low resolution refinement

• Build model with fewer parameters
  – Mainchain-only model
  – Constrain B factor values to be isotropic and constant.
  – Full occupancies only.
  – TLS to model anisotropic motions of rigid domains.
  – Strictly constrained or restrained NCS to reduce params many-fold
  – No waters or small molecules, use only ‘bulk solvent’

Page 30: Resolution: Implications in Refinement

Low resolution refinement

• Model cautiously
  – Initial tracing
    • Build regions that are likely to be seen clearly
    • Good packing, low B factors, bulky groups, electron-rich groups
    • Core, mainchain, helices, big sidechains, bases, phosphates
  – Sequence registry
    • Beware of register and topology errors
    • Guess sequence register from bulky sidechains
    • Extend the register by trial and error
    • Check sequence register with a homologous structure
    • Truncate to Gly wherever unsure of residue identity

• From Gerard’s slides

Page 31: Resolution: Implications in Refinement

Low resolution refinement

– Try copying fragments from other high resolution structures when there is clear homology

– Treat ligands extra-carefully
  • Copy a high-quality observed conformation or a predicted low energy conformation
  • Restrain tightly unless there is density and other clues to deviate

• Axel T. Brunger et al. 2009. Acta Cryst D 65 128–133 X-ray structure determination at low resolution.

Page 32: Resolution: Implications in Refinement

Low resolution refinement: density modification tools

• Expected solvent density
  – define solvent boundary
  – followed by solvent flattening / flipping, histogram matching

• Images from B. Rupp’s book and from Acta Cryst. (2003). D59, 1881-1890. The phase problem. G. Taylor
• Brunger 2006, Low resolution crystallography. Acta Cryst.
• https://wasatch.biochem.utah.edu/chris/tutorial/Density_Modification.pdf a ppt on DM

Page 33: Resolution: Implications in Refinement

Low resolution refinement: density modification

• Averaging maps of NCS-restrained copies

• Image from B. Rupp’s book.
• Brunger 2006, Low resolution crystallography. Acta Cryst.
• https://wasatch.biochem.utah.edu/chris/tutorial/Density_Modification.pdf a ppt on DM

Page 34: Resolution: Implications in Refinement

Low resolution refinement: density modification

• B-factor sharpening
  – High-resolution reflections get attenuated most by B factors
  – Applying a negative B factor artificially up-weights the high-res terms to obtain a more detailed but possibly noisier map
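Sharpening amounts to rescaling each amplitude by a resolution-dependent factor. The sketch below assumes the usual isotropic form exp(-B sin²θ/λ²), rewritten with sinθ/λ = 1/(2d) as exp(-B/(4d²)), so that a negative B boosts high-resolution terms the most. The function name and the B value of -50 Å² are illustrative and do not correspond to any particular program's option.

```python
import math

def sharpen(amplitude, d, b_sharp):
    """Scale an amplitude at spacing d (angstroms) by exp(-B / (4 d^2)).

    A negative b_sharp increases the amplitude, more strongly at small d.
    """
    return amplitude * math.exp(-b_sharp / (4.0 * d * d))

b_sharp = -50.0  # hypothetical sharpening B factor, in square angstroms
for d in (10.0, 6.0, 4.0):
    print(f"d = {d:>4} A: scale factor = {sharpen(1.0, d, b_sharp):.2f}")
```

Noise in the weak high-resolution terms is amplified by the same factor as the signal, which is why a sharpened map can look more detailed and noisier at the same time.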

• Brunger 2006, Low resolution crystallography. Acta Cryst.
• https://wasatch.biochem.utah.edu/chris/tutorial/Density_Modification.pdf a ppt on DM

Page 35: Resolution: Implications in Refinement

Low resolution refinement

• Refinement techniques
  – Rigid body refinement
    • A fragment is constrained to be internally rigid and has only 6 degrees of freedom
    • B factor is isotropic and constant
    • Powerful first step of refinement needing only low resolution data
    • Arbitrary rigid fragments (high quality helices, high-resolution domain structures) can be optimized for location and orientation relative to each other to yield better phases and maps
  – Torsion angle refinement
    • Bonds, angles, chirality, planarity are not variables; only torsion angles are refined
    • Protein is divided into rigid subgroups to sample thoroughly a limited conformational space
    • Higher radius of convergence, reduced overfitting

• Image from Schwieters, C.D. & Clore, G.M. (2001) Internal coordinates for molecular dynamics and minimization in structure determination and refinement. J. Magn. Reson. 152, 288-302
• Nice tutorial at http://speedy.st-and.ac.uk/~naismith/workshop/torsion.pdf
• See Axel Brunger’s papers on torsion angle refinement

Page 36: Resolution: Implications in Refinement

Low resolution refinement

• Solving multiple times
  – Try to automate as much as possible the process of model building and refinement, and then repeat it
  – Consensus substructures are more reliable, average them
  – Regions with differences are unreliable, remove them
  – Gives an idea of precision

• Gradual increase in number of parameters
  – Mainchain -> bulky sidechains -> sequence register -> other sidechains
  – Finally, known small-molecule binders with a known binding site can be modelled if reasonable density appears

• Validation
  – Keep track of Ramachandran and sidechain rotamers
  – Remove unlikely parts of mainchain and sidechain
  – Do not restrain the Rama distribution or sidechains to rotamers during refinement; it may give false validation results

• Read what others are doing for low resolution
  – e.g. Axel Brunger’s literature, CCP4 & phenix tools, CCP4bb

• Images from wikipedia and Furnham et al. Structure 2006.

Page 37: Resolution: Implications in Refinement

High resolution refinement

• High resolution structures provide atomic insights
  – Packing, binding
  – Flexibility
  – Enzyme mechanisms
  – Hydration

• Basic guidelines for model building and refinement
  – High d/p => Be cautious of under-fitting!
  – Make greater use of data than in the low res case
  – Make as detailed a model as possible, esp. of interesting regions
  – Check all empty density critically

Page 38: Resolution: Implications in Refinement

High resolution refinement

• Allow model to deviate from geometry when data is strong
  – Weight on xray term can be slowly increased to reveal any unusual geometry without risking model bias

• Use automation to fit biopolymers
  – Trace secondary structure automatically, in coot or with phenix tools
  – Trace mainchain and build sidechains using programs, e.g. with buccaneer, warpNtrace, Rapper
  – Do this multiple times to identify regions requiring manual attention

• Validation tools: can they indicate the information content of macromolecular crystal structures? EJ Dodson et al. Volume 6, Issue 6, 1998, 685-690.
• Image from Terwilliger et al. papers in Acta Cryst D on automatic chain tracing.

Page 39: Resolution: Implications in Refinement

High resolution refinement

• Explain all unoccupied density
  – Is it due to ligands?
    • Build expected ligands (including MX additives)
    • Search for unexpected small-mols
    • E.g. coot or phenix ligand tools
  – Is it due to multi-conformer sidechains?
  – Is it water?

• Images from B. Rupp’s book and Terwilliger et al. Acta Cryst. 2005.

Page 40: Resolution: Implications in Refinement

High resolution refinement

• Build waters
  – Peak-pick semi-automatically to form a reasonable hydration network with sidechains

• Model hydrogens
  – When difference density is visible

• Image from B. Rupp’s book
• Atomic resolution crystallography reveals how changes in pH shape the protein microenvironment. Lyubimov et al. Nature Chemical Biology 2, 259 - 264 (2006)

Page 41: Resolution: Implications in Refinement

High resolution refinement

• Verify correct sidechain orientations of NQH
  – Manually or automatically flip NQH sidechains to improve h-bonding
  – Model more sidechain conformations if necessary

• Use non-standard atomic scattering models
  – At subatomic resolution, model electron density with a nonspherical multipolar model, or model bonds as scatterers

• Image from B. Rupp’s book
• Afonine et al. Acta Cryst. (2007). D63, 1194–1197
• Jelsch et al. PNAS 2000 97 7 3171.

Page 42: Resolution: Implications in Refinement

High resolution refinement

• Even in high res, maintain order of adding detail to avoid overfitting
  – bb > sc > ligand
  – Anisotropy, multiconformers, waters, hydrogens

• Image from Antonyuk S V et al. PNAS 2005;102:12041-12046
• Image from David Blow’s book.

• Invest more parameters around the regions of interest
  – Multi-conformers
  – Anisotropy
  – Waters near active site
  – Possibility of multiple ligands
  – Releasing constraints / restraints

Page 43: Resolution: Implications in Refinement

Summary

• Resolution is the least distance between Bragg planes with observable reflection. Two atoms closer than resolution cannot be observed distinctly using data at that resolution.

• Resolution dictates the detail revealed by electron density maps.
  – Low resolution => low detail
  – High resolution => high detail

• Parameters in the model must be chosen to suit the resolution.

• Over-fitting can be detected using Rfree and data to parameter ratio.

• Knowledge-based constraints and restraints augment experimental data to make refinement possible.

• Geometric target is weighted more than crystallographic data at low resolution. Model is allowed to diverge from ideal geometry at high resolution.

• Greater detail should be modelled at higher resolution to make best use of data.

Page 44: Resolution: Implications in Refinement

Acknowledgements

• Alejandro & IPMont MX organizers

• Sameer Velankar, Jawahar Swaminathan (EBI)

• Online resources
  – Kevin Cowtan
  – Rupp web
  – Randy Read’s course
  – Various papers and images therefrom
  – Martyn Winn’s ppt on data to params

• Books
  – David Blow
  – Alex McPherson
  – Bernhard Rupp