Evans-Chicago08-integration-datareduction.pdf

  • 7/27/2019 Evans-Chicago08-integration-datareduction.pdf

    1/90

    Data processing

    Integration of diffraction images

    and data reduction

    Phil Evans, MRC Laboratory of Molecular Biology, Cambridge UK

    APS, May 2008

    2/90

    Assumptions

    It is helpful if you understand (at least something about):

    Diffraction from a crystal & the Laue equations; the reciprocal lattice

    The Ewald construction and diffraction geometry

    These are topics I don't have time to discuss today

    3/90

    Data collection & processing

    [Flowchart: Crystal → Data collection → Images → Integration → h k l I σ(I) → Scaling & merging (data reduction) → h k l F σ(F); associated tasks: Strategy, Diffraction geometry, Indexing, Space group determination, Quality assessment]

    4/90

    How to collect good data

    Steps: Select crystal → Collect a few images to judge quality; index & examine carefully → Decide strategy and collect all images → Integration (index, choose lattice, refine unit cell, integrate) → Choose Laue group (point group) → Scale & merge → Convert I to F

    Decisions:
    Is this your best crystal? Mosaicity, resolution, size, ice
    Total rotation, rotation/image (overlaps), exposure time, position of detector
    What is the correct lattice? [Integration parameters: box size, overlap check]
    What Laue group, space group?
    How good is the dataset? Any bad bits?
    Is the crystal twinned?

    5/90

    What is a good crystal?

    Single: only one lattice; check by indexing the pattern and looking for unpredicted spots

    Diffracts to high angle

    Large: the diffracted intensity is proportional to the number of unit cells in the beam, so there is not much gain for a crystal much larger than the beam (typically 50–200 µm). Smaller crystals may freeze better (lower mosaicity)

    Low mosaicity: better signal/noise

    Good freeze: no ice, minimum amount of liquid (low background), low mosaicity. Optimise the cryo procedure

    The best that you have! (the least worst)

    The quality of the crystal determines the quality of the dataset.

    Beware of pathological cases

    6/90

    Some bad ones

    7/90

    [Images at Phi = 0° and Phi = 90°]

    Always check diffraction in two orthogonal images!

    8/90

    Additional spots present, not resolved

    Results in instability in refinement of detector parameters.

    9/90

    Spots not resolved, very poor spot shape and streaking

    10/90

    Integration and scaling in CCP4: tools

    Data → MOSFLM (index, strategy, integrate) → POINTLESS (determine Laue group & space group, sort) → SCALA (scale/merge) → detwin, convert I to F (TRUNCATE)

    11/90

    Integration

    Starting point: a series of diffraction images, each recorded on a 2D area detector while rotating the crystal through a small angle (typically 0.2–1.0° per image) about a fixed axis (the rotation/oscillation method).

    Outcome: a dataset consisting of the indices (h,k,l) of all reflections recorded on the images, with an estimate of their intensities and the standard uncertainties of the intensities: h, k, l, I(hkl), σ(I)

    Two distinct methods:

    2-D: integrate spots on each image, and add together partially recorded observations in the scaling program. MOSFLM, DENZO, HKL2000, etc.

    3-D: integrate a 3-dimensional box around each spot, from a series of images. XDS, D*TREK, SAINT, etc.

    For today: MOSFLM (integration slides from Andrew Leslie)

    Images → h k l I σ(I)

    12/90

    Reciprocal space is 3-dimensional, even though we have sliced it into 2D images

    13/90

    Note that a series of images samples the full 3-dimensional reciprocal space: Bragg diffraction and any other phenomena, i.e. all scattering from the crystal and its environment.

    In practice, defects in the crystals (or detectors) make integration far from trivial, e.g. weak diffraction, crystal splitting, anisotropic diffraction, diffuse scattering, ice rings/spots, high mosaicity, unresolved spots, overloaded spots, zingers/cosmic rays, etc.

    14/90

    We want to calculate the intensity of each spot, then work backwards

    The simplest method is to draw a box around each spot, add up all the numbers inside, and subtract the background (or, better, fit a profile)

    To do this, we must know where the spot is. This needs:
    the unit cell of the crystal
    the orientation of the crystal relative to the camera
    the exact position of the detector

    To find the unit cell and crystal orientation, we must index the diffraction pattern; this can be done by finding spots on one or more images

    Integration

    15/90

    Tools: the new iMosflm interface

    Images window: select & load images

    16/90

    Image Display

    Simple control over: found spots, predicted pattern, direct beam position, resolution limits, masking function, panning and zooming

    Note the manually drawn mask for the beam-stop shadow

    17/90

    Integration procedure in iMosflm

    1. Find spots: what is a spot? It should have a uniform shape, not a streak
    2. Index: find a lattice which fits the spots
    3. Estimate mosaicity: improve the estimate later
    4. Check prediction on images remote in φ (90° away): is the indexing correct?
    5. Refine cell: use two wedges at 90°, or more in low symmetry
    6. Mask backstop shadow: not (yet) done automatically by the program
    7. Integrate one (or a few) images: to check resolution etc.
    8. Integrate all images: run in background for speed

    Strategy option, for use before data collection

    18/90

    Indexing

    If we know the main beam position on the image, we can count spots from the centre

    To do it properly, we need to put the spots into 3 dimensions, knowing the rotation of the crystal for this image

    [Figure: reciprocal lattice axes a*, b*; reflection (3,1,0) marked; layers l = 0, l = 1, l = 2]

    19/90

    Back-project each spot onto the Ewald sphere, then rotate it back into the zero frame

    20/90

    Autoindexing

    Objective: to determine the unit cell, likely symmetry and orientation. (Note that intensities are required to find the true symmetry, see later.)

    The spot positions in a diffraction image are a distorted projection of the reciprocal lattice. Using the Ewald sphere construction, the observed reflections (Xd, Yd, φ) can be mapped back into reciprocal space, giving a set of scattering vectors s_i:

    $$\mathbf{s} = \begin{pmatrix} D/r - 1 \\ X_d/r \\ Y_d/r \end{pmatrix}, \qquad r = \sqrt{X_d^2 + Y_d^2 + D^2}$$

    D is the crystal-to-detector distance; s is in dimensionless units (multiply by 1/λ to obtain reciprocal-length units). Uncertainty in φ leads to errors in s.
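To make the mapping concrete, here is a minimal Python/NumPy sketch of the back-projection and of the rotation back into the zero frame. It assumes a simple, hypothetical laboratory frame (x along the incident beam, a flat detector normal to the beam with its Xd and Yd axes along lab y and z, spindle along z); real programs also model detector tilts and distortions. The function names are illustrative only.

```python
import numpy as np


def scattering_vector(xd, yd, dist, wavelength=1.0):
    """Back-project a spot at detector position (xd, yd) onto the Ewald sphere.

    Assumed frame: x along the incident beam, flat detector normal to the beam
    at distance `dist`, detector axes along lab y and z.
    Returns s in reciprocal-length units (1/Angstrom if wavelength is in Angstrom).
    """
    r = np.sqrt(xd**2 + yd**2 + dist**2)
    return np.array([dist / r - 1.0, xd / r, yd / r]) / wavelength


def rotate_to_zero_frame(s, phi_deg, axis=(0.0, 0.0, 1.0)):
    """Undo the crystal rotation: rotate s by -phi about the spindle axis.

    Uses Rodrigues' rotation formula; the spindle is assumed to lie along lab z.
    """
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    theta = np.radians(-phi_deg)
    return (s * np.cos(theta)
            + np.cross(k, s) * np.sin(theta)
            + k * np.dot(k, s) * (1.0 - np.cos(theta)))


# Example: spot 40 mm and 25 mm from the beam centre, detector at 150 mm,
# recorded at phi = 30 degrees with a wavelength of 1.0 Angstrom.
s_lab = scattering_vector(40.0, 25.0, 150.0, wavelength=1.0)
s_zero = rotate_to_zero_frame(s_lab, 30.0)
d_spacing = 1.0 / np.linalg.norm(s_zero)   # Bragg spacing in Angstrom
print(s_zero, d_spacing)
```

Every back-projected vector lies on the Ewald sphere of radius 1/λ centred at (−1/λ, 0, 0), and the rotation preserves its length, so the resolution of a spot is fixed before indexing; only its direction in the crystal frame depends on φ.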

    21/90

    Consider every possible direction in turn as a possible real-space axis, i.e. perpendicular to a reciprocal lattice plane. Project all observed vectors onto this axis.

    Lattice direction (normal to a lattice plane): the projected vectors cluster at lengths which are multiples of the lattice spacing 1/a. Their Fourier transform shows sharp peaks (at a).

    Non-lattice direction: random lengths. No peaks in the Fourier transform.
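A minimal NumPy sketch of this idea, using an idealised, noise-free set of scattering vectors from a hypothetical orthogonal cell (a = 50, b = 60, c = 70 Å). Rather than histogramming the projections and using an FFT, it evaluates the one-dimensional Fourier sum directly for a range of trial real-space lengths: along a lattice direction the sum reaches 1 at the cell edge, along a non-lattice direction it stays small.

```python
import itertools
import numpy as np


def projection_spectrum(s_vectors, direction, lengths):
    """Normalised |sum_j exp(2*pi*i*L*(s_j . u))| for each trial length L.

    s_vectors: (N, 3) scattering vectors in 1/Angstrom
    direction: trial real-space axis (need not be normalised)
    lengths:   trial real-space repeat lengths in Angstrom
    """
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    p = s_vectors @ u                          # projection of every vector onto the axis
    phases = 2j * np.pi * np.outer(lengths, p)
    return np.abs(np.exp(phases).sum(axis=1)) / len(p)


# Hypothetical orthogonal cell a=50, b=60, c=70 Angstrom; all hkl with |h|,|k|,|l| <= 5
hkl = np.array(list(itertools.product(range(-5, 6), repeat=3)), dtype=float)
s_vectors = hkl / np.array([50.0, 60.0, 70.0])
lengths = np.arange(10.0, 80.5, 0.5)

along_a = projection_spectrum(s_vectors, [1, 0, 0], lengths)       # lattice direction
off_lattice = projection_spectrum(s_vectors, [1, 1, 1], lengths)   # not a short lattice vector
print(lengths[np.argmax(along_a)], along_a.max(), off_lattice.max())
```

In practice the observed vectors are noisy and incomplete, so peaks are broader and lower, and the search runs over a fine grid of directions on a hemisphere.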

    22/90

    1D Fourier transform of projected scattering vectors

    23/90

    In the 2D example shown, the black cell corresponds to the reduced cell, while the red or blue cells may have been found in the autoindexing.

    Pick three non-coplanar directions which have the largest peaks in the Fourier transforms to define a lattice.

    This is not necessarily the simplest lattice (the reduced cell)

    Autoindexing Window

    24/90

    Autoindexing Window

    25/90

    A penalty is associated with each solution, which reflects how well the determined cell obeys the constraints for that lattice type.

    If nothing is known about the crystal, choose an initial solution in the following way:

    Correct solutions usually have penalties < ~20, often < 10, and rarely > 30; the errors (σ(x,y) & σ(φ)) should also be small. Note where there is a sharp drop in the penalty, in this case below solution 8. Pick the solution with the highest symmetry with a penalty lower than the sharp drop, in this case solution 7.
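The selection rule can be written as a short heuristic. This Python sketch is illustrative only: the `Solution` record, the lattice labels, the symmetry ranking and the penalties in the example are hypothetical, not Mosflm's internal data structures, and it simply takes the largest jump in penalty as "the sharp drop".

```python
from dataclasses import dataclass


@dataclass
class Solution:
    number: int      # solution number as listed
    lattice: str     # Bravais lattice symbol, e.g. "aP", "mC", "oC", "hP"
    penalty: float


# Illustrative ranking of lattice types by symmetry (higher = more symmetric).
SYMMETRY_RANK = {"aP": 0, "mP": 1, "mC": 1, "oP": 2, "oC": 2, "oI": 2, "oF": 2,
                 "tP": 3, "tI": 3, "hR": 4, "hP": 5, "cP": 6, "cI": 6, "cF": 6}


def choose_solution(solutions):
    """Highest-symmetry solution among those below the biggest jump in penalty."""
    ordered = sorted(solutions, key=lambda s: s.penalty)
    gaps = [b.penalty - a.penalty for a, b in zip(ordered, ordered[1:])]
    # Everything before the largest increase in penalty counts as "low penalty".
    cut = gaps.index(max(gaps)) + 1 if gaps else len(ordered)
    candidates = ordered[:cut]
    # Prefer higher symmetry; among equal symmetry, prefer the lower penalty.
    return max(candidates, key=lambda s: (SYMMETRY_RANK[s.lattice], -s.penalty))


solutions = [Solution(1, "aP", 0.0), Solution(2, "mC", 1.0), Solution(3, "mP", 2.0),
             Solution(4, "oC", 3.0), Solution(5, "hP", 150.0), Solution(6, "cF", 200.0)]
print(choose_solution(solutions).number)   # 4
```

As the slides stress, this only gives an initial choice: the true symmetry may be lower, and it has to be confirmed from the intensities.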

    26/90

    Autoindexing: requirements for success

    A sufficient number of spots (preferably a few hundred, although 50 may be enough). You may need to change the spot-finding parameters

    Correct parameters: direct beam position, wavelength & detector distance.

    Only a single lattice present (2 lattices OK if one is weaker).

    Reasonable mosaic spread (no overlap of adjacent lunes).

    Resolved spots.

    Given a list of solutions, select the one with the highest symmetry from the solutions with low penalties. Note that the true symmetry may be lower, since the lattice shape may be misleading.

    The absence of a clear separation between solutions with low penalties and solutions with high penalties can indicate errors in the direct beam position, distance etc. (or a triclinic solution).

    Results from a single image can be misleading for low symmetries: index from two or more images

    27/90

    How do we know that the indexing is correct?

    Check by predicting the pattern on several images at different angles

    The pattern should match reasonably well

    The prediction should explain all spots:

    Unpredicted spots may indicate an incorrect mosaicity, multiple lattices (split crystal) or a superlattice (pseudo-symmetry)

    Note that the solutions listed by Mosflm are in fact all the same solution with different lattice symmetries imposed, so if the triclinic solution (number 1) is wrong, then all the others are too

    28/90

    Mosaicity Estimation

    Predict the pattern with increasing values for the mosaic spread (e.g. 0.0, 0.05, 0.1, 0.15 degrees). In each case, measure the total intensity of all predicted reflections. The mosaicity can be estimated from the plot of total intensity vs mosaic spread.
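One simple way to read the mosaicity off such a plot is to take the smallest mosaic spread at which the total predicted intensity has essentially levelled off. The sketch below assumes a plateau criterion of 95% of the maximum; that threshold is an arbitrary choice for illustration, not the value any program uses, and the intensities in the example are invented.

```python
import numpy as np


def estimate_mosaicity(spreads, total_intensity, fraction=0.95):
    """Smallest mosaic spread at which the total intensity reaches `fraction`
    of its maximum, i.e. the start of the plateau."""
    spreads = np.asarray(spreads, dtype=float)
    total = np.asarray(total_intensity, dtype=float)
    reached = total >= fraction * total.max()
    return float(spreads[np.argmax(reached)])   # argmax returns the first True


spreads = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]   # degrees
totals = [100, 400, 700, 900, 960, 980, 990]          # invented values
print(estimate_mosaicity(spreads, totals))            # 0.2
```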

    29/90

    1. Find spots: what is a spot? It should have a uniform shape, not a streak
    2. Index: find a lattice which fits the spots
    3. Estimate mosaicity: improve the estimate later
    4. Check prediction on images remote in φ (90° away): is the indexing correct?
    5. Refine cell: use two wedges at 90°
    6. Mask backstop shadow: not done automatically by the program
    7. Integrate one (or a few) images: to check resolution etc.
    8. Integrate all images: run in background for speed

    30/90

    Cell refinement

    Cell refinement does not work well at low resolution (worse than ~3 Å): just take values from indexing of several images

    31/90

    Parameter Refinement

    Generally, once an orientation matrix and cell parameters have been derived from the autoindexing procedures described, these parameters are refined further using different algorithms.

    Parameters to be refined:
    1) Crystal parameters: cell dimensions, orientation, mosaic spread.
    2) Detector parameters: detector position, orientation and (if appropriate) distortion parameters.
    3) Beam parameters (possibly): orientation, beam divergence.

    32/90

    The positional residual gives no information about small errors in the crystal orientation around the spindle axis, or about the mosaic spread.

    The angular residual gives no information on the detector parameters (because it does not depend on spot positions).

    Two types of refinement in Mosflm

    1) Using spot coordinates and a positional residual:

    $$\varepsilon_1 = \sum_i \left[ w_{ix} \left(X_i^{calc} - X_i^{obs}\right)^2 + w_{iy} \left(Y_i^{calc} - Y_i^{obs}\right)^2 \right]$$

    2) Using spot position in φ and an angular residual:

    $$\varepsilon_2 = \sum_i w_i \left[ \frac{R_i^{calc} - R_i^{obs}}{d_i^*} \right]^2$$

    where R_i^calc and R_i^obs are the calculated and observed distances of reciprocal lattice point i (at reciprocal-space distance d_i*) from the centre of the Ewald sphere (post refinement).
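Written out directly, the two residuals are simple weighted sums. A NumPy sketch, with the weights treated as given inputs (how a program derives them from positional or angular uncertainties is not covered here); the numbers in the example are made up.

```python
import numpy as np


def positional_residual(x_calc, y_calc, x_obs, y_obs, w_x, w_y):
    """Weighted squared differences between predicted and observed spot centres."""
    return float(np.sum(w_x * (x_calc - x_obs) ** 2 + w_y * (y_calc - y_obs) ** 2))


def angular_residual(r_calc, r_obs, d_star, w):
    """Weighted squared differences of rlp distances from the sphere centre, scaled by d*."""
    return float(np.sum(w * ((r_calc - r_obs) / d_star) ** 2))


x_calc, x_obs = np.array([10.0, 20.0]), np.array([10.1, 19.8])
y_calc, y_obs = np.array([5.0, 7.0]), np.array([5.0, 7.2])
ones = np.ones(2)
print(positional_residual(x_calc, y_calc, x_obs, y_obs, ones, ones))   # about 0.09
```

Dividing by d* in the angular residual puts reflections at different resolutions on a comparable footing, so high-resolution reflections do not dominate just because their absolute errors in R are larger.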

    33/90

    A fully recorded spot is entirely recorded on one image

    Partials are recorded on two or more images

    Fine-sliced data has spots sampled in 3 dimensions

    (illustrations from Elspeth Garman)

    Fully recorded and partially recorded reflections

    34/90

    The radius of a reciprocal lattice point (δ) is modelled by:

    Consider a partially recorded reflection spread over two images, with a recorded intensity I1 on the first and I2 on the second. Determining the observed position P from the fraction of the total intensity observed on the first image, F = I1/(I1 + I2), requires a model for the rocking curve, e.g.:

    Knowing F and δ, ΔR (the distance of P from the sphere) can be calculated, giving R_obs. (The plus or minus sign depends on whether the rlp is entering or exiting the sphere.)

    Post Refinement

    Refine cell, orientation and mosaicity to minimise the angular residual:
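The rocking-curve model itself is not reproduced here. Purely to illustrate how ΔR follows from F, the sketch below swaps in a Gaussian rocking curve (an assumption, not the model Mosflm uses): an rlp of radius δ is treated as a Gaussian of standard deviation δ/2, and F is the fraction of it that has crossed the sphere by the image boundary. The function names are hypothetical, and the entering/exiting sign convention is left to the caller.

```python
from statistics import NormalDist


def fraction_on_first_image(i1, i2):
    """F = I1 / (I1 + I2) for a partial split over two images."""
    return i1 / (i1 + i2)


def delta_r_from_fraction(frac, rlp_radius):
    """Signed offset of the rlp centre P from the Ewald sphere at the image boundary.

    Assumed model: Gaussian rocking curve with standard deviation rlp_radius / 2,
    so F = Phi(delta_r / sigma). Zero when the reflection is split equally;
    negative when most of it falls on the second image.
    """
    sigma = rlp_radius / 2.0
    return sigma * NormalDist().inv_cdf(frac)


frac = fraction_on_first_image(300.0, 700.0)          # F = 0.3
print(delta_r_from_fraction(frac, rlp_radius=0.002))  # negative offset
```

Post refinement then compares such observed offsets with those calculated from the current cell, orientation and mosaicity, and adjusts the parameters to reduce the angular residual.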

    35/90

    Step 1: Predict the position in the digitised image of each Bragg reflection.

    Step 2: Estimate its intensity (need to subtract the X-ray background) and an error estimate of the intensity.

    Integration of the Images

    1) Predicting reflection positions

    Accuracy in prediction is crucial. Ideally, cell parameters should be known to better than 0.1%. Errors in prediction will introduce systematic errors in profile fitting.

    Typically the detector parameters, crystal orientation and mosaic spread will be refined for every image during the integration. The cell parameters are not normally refined.

    Integration window

    36/90

    shift/click

    Run pointless & scala with default options

    Integration window

    37/90

    Summation integration:

    Sum the pixel values of all pixels in the peak area of the mask, and then subtract the sum of the background values calculated from the background plane for the same pixels.

    Profile fitting: assume that the shape or profile (in 2 or 3 dimensions) of the spots is known. Then determine the scale factor which, when applied to the known spot profile, gives the best fit to the observed spot profile. This scale factor is then proportional to the profile-fitted intensity for the reflection. Minimise:

    $$R = \sum_i w_i \left(X_i - K P_i\right)^2$$

    X_i is the background-subtracted intensity at pixel i
    P_i is the value of the standard profile at the corresponding pixel
    w_i is a weight, derived from the expected variance of X_i
    K is the scale factor to be determined

    Summation integration and Profile Fitting

    38/90

    Profile fitting assumes that the spot shape is independent of the spot intensity. For non-saturated spots this is a valid assumption, in spite of the different appearance of strong and weak spots in the image.

    All these spots are fully recorded; the weaker spots look smaller because the signal is lost in the background.

    39/90

    Determining the "Standard" Profile

    The profiles are determined empirically (as the average of many spots). The spot shape varies according to position on the detector, and this must be allowed for (different programs do this in different ways).

    Precautions are needed to avoid introducing systematic errors due to broadening of the profiles during averaging. For each reflection integrated, a new profile is calculated as a weighted mean of the standard profiles for the adjacent regions. Profile fitting is used for both fully recorded and partially recorded reflections. Although this is strictly not valid, in practice it works well.

    Profile in centre Profile at edge
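How the standard profiles of neighbouring regions are combined differs between programs. As an illustration only, this sketch uses inverse-distance weights between the reflection and the centres of the adjacent regions (an assumed weighting scheme, not necessarily the one any program uses), then renormalises the result to unit sum. The function name and the example profiles are hypothetical.

```python
import numpy as np


def interpolated_profile(spot_xy, region_centres, region_profiles, eps=1e-6):
    """Weighted mean of neighbouring standard profiles for one reflection.

    spot_xy:         (2,) detector position of the reflection
    region_centres:  (M, 2) centres of the adjacent detector regions
    region_profiles: (M, ny, nx) standard profiles of those regions
    Weights are 1/distance (assumption), so nearby regions dominate.
    """
    centres = np.asarray(region_centres, dtype=float)
    d = np.linalg.norm(centres - np.asarray(spot_xy, dtype=float), axis=1)
    w = 1.0 / (d + eps)
    prof = np.tensordot(w, np.asarray(region_profiles, dtype=float), axes=1) / w.sum()
    return prof / prof.sum()


centres = [[0.0, 0.0], [10.0, 0.0]]
profiles = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 1.0]]]
print(interpolated_profile([5.0, 0.0], centres, profiles))   # equal mix of both
```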

    40/90

    Standard Deviation Estimates

    For summation integration or profile-fitted partially recorded reflections, a standard deviation can be obtained based on Poisson statistics.

    For profile-fitted intensities, the goodness of fit of the scaled standard profile to the true reflection profile can be used for fully recorded reflections.

    These will generally underestimate the true errors, and should be modified accordingly at the merging step (see later) so that they reflect the actual differences between multiple (symmetry-related) measurements. It is important to get realistic estimates of the errors in the intensities.
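For summation integration, the counting-statistics estimate follows from propagating Poisson variances. A minimal sketch, assuming a constant background estimated from the mean of n_bg background pixels (rather than a fitted plane) and a detector gain of 1: with P total counts in n_peak peak pixels and B total counts in the background pixels, I = P − (n_peak/n_bg)B and σ²(I) = P + (n_peak/n_bg)²B.

```python
import math


def summation_with_sigma(peak_counts, bg_counts, n_peak, n_bg):
    """Background-subtracted intensity and its Poisson standard deviation.

    peak_counts: total counts in the n_peak peak pixels
    bg_counts:   total counts in the n_bg background pixels
    Assumes a flat background and unit detector gain.
    """
    scale = n_peak / n_bg
    intensity = peak_counts - scale * bg_counts
    sigma = math.sqrt(peak_counts + scale**2 * bg_counts)
    return intensity, sigma


print(summation_with_sigma(1200, 400, n_peak=20, n_bg=80))   # (1100.0, 35.0)
```

Using many more background pixels than peak pixels keeps the second term small; as noted above, even this estimate is usually too optimistic and is corrected at the merging stage.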

    41/90

    1) Autoindexing, preferably using two orthogonal images, will give the crystal cell parameters, orientation and a suggestion of the lattice symmetry. Using this information an initial estimate of the mosaicity can be obtained.

    2) Post refinement requires the integration of a series of images, and uses the observed distribution of intensity of partially recorded reflections over those images to refine the unit cell and mosaic spread. It is best carried out prior to integration of the data set.

    3) During integration of the entire data set, the cell parameters are normally fixed, but the detector parameters, crystal orientation and mosaic spread are refined to ensure the best prediction of spot positions.

    4) Intensities are estimated by both summation integration and profile fitting, but generally the profile-fitted values are used for structure solution.

    Summary of the steps in data integration

    Strategy window

    42/90

    Label will probably change

    43/90

    Data collection strategy

    Total rotation range

    Ideally 180° (or 360° in P1 to get full anomalous data)

    Use programs (e.g. Mosflm) to give you the smallest required range (e.g. 90° for orthorhombic, or 2 × 30°) and the start point.

    Rotation/image: not necessarily 1°!

    Good values are often in the range 0.25–0.5°, to minimise overlap and background. Time/image depends on the total time available. Detector position: further away to reduce background and improve spot resolution

    Compromise between statistics (enough photons/reflection, and multiplicity) and radiation damage. Radiation damage is the big problem.

    Radiation damage controls the total time available for crystal exposure.

    44/90

    Two Cases:

    Anomalous scattering, MAD

    High redundancy is better than long exposures (eliminates outliers). Split time between all wavelengths; be cautious about radiation damage; reduce time & thus resolution if necessary. Collect Bijvoet pairs close(ish) together in time: align along a dyad or collect inverse-beam images

    Recollect the first part of the data at the end to assess radiation damage

    Data for refinement

    Maximise resolution: longer exposure time (but still beware of radiation damage)

    High multiplicity is less important, but still useful. Use two (or more) passes with different exposure times (ratio ~10) if necessary to extend the range of intensities (high & low resolution)

    Short wavelength (

    45/90

    Decisions

    Steps: Select crystal → Collect a few images to judge quality → Decide strategy and collect all images → Integration (index, choose lattice, refine unit cell, integrate) → Choose Laue group (point group) → Scale & merge → Convert I to F

    Decisions along the way:
    Is this your best crystal? Mosaicity, resolution, size, ice
    Total rotation, rotation/image, exposure time, position of detector. Programs: DNA, BEST
    Correct lattice; integration parameters: box size, overlap check
    How good is the dataset? Any bad bits?
    Is the crystal twinned?

    46/90

    Determination of Space group

    The space group symmetry is only a hypothesis until the structure is solved, since it is hard to distinguish between true crystallographic and approximate (non-crystallographic) symmetry.

    By examining the diffraction pattern we can get a good idea of the likely space group.

    It is also useful to find the likely symmetry as early as possible, since this affects the data collection strategy.

    47/90

    Stages in space group determination

    1. Lattice symmetry: crystal class

    The crystal class imposes restrictions on cell dimensions, and this information is needed for indexing & accurate image prediction:

    Cubic: a = b = c, α = β = γ = 90°
    Hexagonal/trigonal: a = b, α = β = 90°, γ = 120°
    Tetragonal: a = b, α = β = γ = 90°
    Orthorhombic: α = β = γ = 90°
    Monoclinic: α = γ = 90°
    Triclinic: no restrictions

    However, these restrictions may occur accidentally, or from pseudo-symmetry, so we need to score deviations between experimental cell dimensions and ideal values: for this we need estimates of the errors. Various penalty functions have been used.
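A crude Python sketch of such a test: for each crystal system it reports the largest deviation from the ideal constraints, scoring length equalities in percent and angles in degrees, and monoclinic allows any of the three axes to be unique. The scoring and the tolerance are assumptions for illustration, not the penalty function of Mosflm or POINTLESS. It also checks only the setting it is given, so it will not notice that a cell becomes compatible with a higher symmetry after reindexing; real programs consider alternative settings too.

```python
def cell_deviations(cell):
    """Largest deviation of a unit cell from each crystal-system constraint.

    cell = (a, b, c, alpha, beta, gamma) in Angstrom and degrees.
    Length equalities are scored in percent, angle constraints in degrees.
    """
    a, b, c, al, be, ga = cell

    def pct(x, y):
        return 100.0 * abs(x - y) / ((x + y) / 2.0)

    right = max(abs(al - 90), abs(be - 90), abs(ga - 90))
    monoclinic = min(max(abs(be - 90), abs(ga - 90)),   # a unique
                     max(abs(al - 90), abs(ga - 90)),   # b unique
                     max(abs(al - 90), abs(be - 90)))   # c unique
    return {
        "triclinic": 0.0,
        "monoclinic": monoclinic,
        "orthorhombic": right,
        "tetragonal": max(pct(a, b), right),
        "hexagonal": max(pct(a, b), abs(al - 90), abs(be - 90), abs(ga - 120)),
        "cubic": max(pct(a, b), pct(b, c), right),
    }


def compatible_systems(cell, tolerance=1.0):
    return [name for name, dev in cell_deviations(cell).items() if dev <= tolerance]


print(compatible_systems((74.72, 129.22, 184.25, 90, 90, 90)))
# ['triclinic', 'monoclinic', 'orthorhombic']
```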

    48/90

    2. Laue group symmetry (Patterson group)

    The Laue group is the symmetry of the diffraction pattern, so it can be determined from the observed intensities. It corresponds to the space group without any translations, and with an added centre of symmetry from Friedel's law.

    3. Point group symmetry

    For chiral space groups (i.e. all macromolecular crystals), there is only one point group corresponding to each Laue group. It corresponds to the space group without any translations.

    4. Space group symmetry

    Point group + translations (e.g. screw dyad rather than pure dyad). Only visible in the diffraction pattern as systematic absences, usually along axes; these are not very reliable indicators, as there are few axial reflections and there may be accidental absences.

    49/90

    Protocol for space group determination (program POINTLESS)

    1. From the unit cell dimensions, find the highest compatible lattice symmetry (within a tolerance)

    2. Score each symmetry element (rotation) belonging to the lattice symmetry, using all pairs of observations related by that element

    3. Score combinations of symmetry elements for all possible sub-groups (Laue groups) of the lattice symmetry group.

    4. Score possible space groups from axial systematic absences

    Scoring functions for rotational symmetry are based on the correlation coefficient, since this is relatively independent of the unknown scales. Rmeas values are also calculated

    POINTLESS
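To show what "score each symmetry element using all pairs of observations" means, here is a minimal Python sketch that correlates intensities related by one candidate operator. It uses plain Pearson correlation on raw intensities, whereas the tables later in this talk use correlation on normalised E² together with Z-scores and likelihoods. The intensity function is invented so that a 2-fold about c is present and the operator (k, h, −l) is not.

```python
import itertools
import numpy as np


def operator_cc(intensities, op):
    """Correlation between I(hkl) and I(op(hkl)) over all related pairs.

    intensities: dict mapping (h, k, l) -> intensity
    op: function mapping an (h, k, l) tuple to its symmetry mate
    Reflections mapped onto themselves are skipped.
    """
    x, y = [], []
    for hkl, i1 in intensities.items():
        mate = op(hkl)
        if mate != hkl and mate in intensities:
            x.append(i1)
            y.append(intensities[mate])
    return float(np.corrcoef(x, y)[0, 1])


def twofold_c(hkl):
    h, k, l = hkl
    return (-h, -k, l)


def twofold_110(hkl):
    h, k, l = hkl
    return (k, h, -l)


# Invented data: depends on h*h, k*k and h*k, so it is invariant under (-h, -k, l).
data = {(h, k, l): float((7 * h * h + 13 * k * k + 3 * l + 5 * h * k) % 17 + 1)
        for h, k, l in itertools.product(range(-6, 7), repeat=3)}

print(operator_cc(data, twofold_c), operator_cc(data, twofold_110))
```

A correlation coefficient is used because it does not depend on the overall scale of the (not yet scaled) intensities.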

    50/90

    POINTLESS CCP4i interface

    Multiple file input, same dataset

    Options for setting

    General options

    See ccp4 wiki

    51/90

    Example: a confusing case in C222:

    Unit cell 74.72 129.22 184.25 90 90 90

    This has b ≈ √3 a, so it can also be indexed on a hexagonal lattice, lattice point group P622 (P6/mmm), with the reindex operator:

    h/2+k/2, h/2-k/2, -l

    Conversely, a hexagonal lattice may be indexed as C222 in three distinct ways, so there is a 2 in 3 chance of the indexing program choosing the wrong one
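The near-√3 ratio of b to a can be checked numerically. Miller indices transform like the direct-cell basis vectors, so the reindex operator h/2+k/2, h/2−k/2, −l also gives the new axes a' = a/2 + b/2, b' = a/2 − b/2, c' = −c. A short NumPy sketch using the C222 cell from the slide:

```python
import numpy as np

# C-centred orthorhombic cell from the slide (Angstrom, all angles 90 degrees)
a = np.array([74.72, 0.0, 0.0])
b = np.array([0.0, 129.22, 0.0])
c = np.array([0.0, 0.0, 184.25])

# Rows give the new axes in terms of the old (same matrix as the index operator)
M = np.array([[0.5, 0.5, 0.0],
              [0.5, -0.5, 0.0],
              [0.0, 0.0, -1.0]])
new_axes = M @ np.vstack([a, b, c])        # rows: a', b', c'

lengths = np.linalg.norm(new_axes, axis=1)
cos_gamma = new_axes[0] @ new_axes[1] / (lengths[0] * lengths[1])
gamma = np.degrees(np.arccos(cos_gamma))
print(np.round(lengths, 2), round(float(gamma), 2), round(129.22 / 74.72, 4))
```

The new cell has a' = b' ≈ 74.6 Å and γ' ≈ 119.9°, i.e. it is metrically hexagonal within error, and det(M) = 1/2 because the primitive hexagonal cell has half the volume of the C-centred one. Only the intensities can tell which symmetry is real.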

    52/90

    A hexagonal lattice may be indexed as C222 in three distinct ways so

    53/90

    A hexagonal lattice may be indexed as C222 in three distinct ways, so there is a 2 in 3 chance of the indexing program choosing the wrong one

    Hexagonal axes (black); three alternative C-centred orthorhombic lattices (coloured)

    The distinction between the possibilities depends on the symmetry of the intensities, not the lattice symmetry

    54/90

    Score each symmetry operator in P622

    Only the orthorhombic symmetry operators are present

    CC: correlation coefficient on E²; Rmeas: R-factor (multiplicity weighted)

    Nelmt  Lklhd  Z-cc   CC     N      Rmeas      Symmetry & operator (in Lattice Cell)
    1      0.808   5.94   0.89   9313  0.115          identity
    2      0.828   6.05   0.91  14088  0.141  ***     2-fold l ( 0 0 1) {-h,-k,+l}
    3      0.000   0.06   0.01  16864  0.527          2-fold   ( 1-1 0) {-k,-h,-l}
    4      0.871   6.33   0.95  10418  0.100  ***     2-fold   ( 2-1 0) {+h,-h-k,-l}
    5      0.000   0.53   0.08  12639  0.559          2-fold h ( 1 0 0) {+h+k,-k,-l}
    6      0.000   0.06   0.01  16015  0.562          2-fold   ( 1 1 0) {+k,+h,-l}
    7      0.870   6.32   0.95   2187  0.087  ***     2-fold k ( 0 1 0) {-h,+h+k,-l}
    8      0.000   0.55   0.08   7552  0.540          2-fold   (-1 2 0) {-h-k,+k,-l}
    9      0.000  -0.12  -0.02  11978  0.598          3-fold l ( 0 0 1) {-h-k,+h,+l} {+k,-h-k,+l}
    10     0.000  -0.06  -0.01  17036  0.582          6-fold l ( 0 0 1) {-k,+h+k,+l} {+h+k,-h,+l}

    [Plots: Z-score(CC) and likelihood for each operator]

    A clear preference for Laue group Cmmm

    55/90

    A clear preference for Laue group Cmmm

    [Plots: net Z(CC), likelihood, correlation coefficient & R-factor, cell deviation]

    Net Z(CC) scores are Z+(symmetry in group) − Z−(symmetry not in group). The likelihood allows for the possibility of pseudo-symmetry.

         #   Laue Group       Lklhd  NetZc  Zc+   Zc-   CC     CC-   Rmeas  R-    Delta  ReindexOperator
    >    1   C m m m  ***     0.991   6.00  6.12  0.12   0.93  0.02  0.12   0.56  0.1    [1/2h-1/2k,3/2h+1/2k,l]
    >    2   C 1 2/m 1        0.367   5.00  6.13  1.13   0.95  0.17  0.10   0.48  0.1    [3/2h+1/2k,-1/2h+1/2k,l]
    >    3   C 1 2/m 1        0.365   4.55  6.04  1.49   0.95  0.22  0.09   0.46  0.1    [1/2h-1/2k,3/2h+1/2k,l]
    >    4   P 1 2/m 1        0.250   4.88  5.99  1.11   0.91  0.17  0.14   0.49  0.0    [1/2h+1/2k,l,1/2h-1/2k]
         5   P -1             0.031   4.27  5.94  1.67   0.89  0.25  0.12   0.44  0.0    [-1/2h+1/2k,-1/2h-1/2k,l]
         6   C 1 2/m 1        0.000   2.45  4.18  1.73   0.08  0.26  0.54   0.44  0.1    [3/2h-1/2k,1/2h+1/2k,l]
         7   C 1 2/m 1        0.000   1.62  3.40  1.79   0.08  0.27  0.56   0.43  0.1    [-1/2h-1/2k,3/2h-1/2k,l]
         8   C 1 2/m 1        0.000   0.60  2.55  1.95   0.01  0.29  0.56   0.42  0.0    [-k,h,l]
         9   C 1 2/m 1        0.000   0.57  2.52  1.96   0.01  0.29  0.53   0.43  0.0    [h,k,l]
        10   P -3             0.000   0.75  2.68  1.93  -0.02  0.29  0.60   0.42  0.1    [1/2h-1/2k,1/2h+1/2k,l]
        11   C m m m          0.000   2.60  3.80  1.20   0.44  0.18  0.38   0.47  0.1    [-1/2h-1/2k,3/2h-1/2k,l]
    =   12   C m m m          0.000   0.94  2.59  1.65   0.26  0.25  0.42   0.46  0.0    [h,k,l]
        13   P 6/m            0.000   0.83  2.54  1.70   0.24  0.26  0.45   0.44  0.1    [1/2h-1/2k,1/2h+1/2k,l]
        14   P -3 m 1         0.000   0.72  2.46  1.74   0.24  0.26  0.45   0.44  0.1    [1/2h-1/2k,1/2h+1/2k,l]
        15   P -3 1 m         0.000  -0.57  1.79  2.36   0.10  0.35  0.52   0.39  0.1    [1/2h-1/2k,1/2h+1/2k,l]
        16   P 6/m m m        0.000   2.09  2.09  0.00   0.25  0.00  0.44   0.00  0.1    [1/2h-1/2k,1/2h+1/2k,l]

    Reindexing
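The net score for a candidate Laue group combines the per-element Z-scores of the P622 operator table. This Python sketch uses simple unweighted means of the Z-cc values for elements inside and outside the group (an assumption). The table reports Zc+ = 6.12 and Zc− = 0.12 for Cmmm, while the unweighted means below give 6.16 and 0.17, so POINTLESS presumably weights the elements differently.

```python
# Z-cc for each symmetry element of the lattice group, from the operator table
z_cc = {1: 5.94, 2: 6.05, 3: 0.06, 4: 6.33, 5: 0.53,
        6: 0.06, 7: 6.32, 8: 0.55, 9: -0.12, 10: -0.06}


def net_z(z_scores, elements_in_group):
    """Unweighted Z+ (elements in the group), Z- (elements not in it) and Z+ - Z-."""
    inside = [z for e, z in z_scores.items() if e in elements_in_group]
    outside = [z for e, z in z_scores.items() if e not in elements_in_group]
    z_plus = sum(inside) / len(inside)
    z_minus = sum(outside) / len(outside) if outside else 0.0
    return z_plus, z_minus, z_plus - z_minus


# Cmmm contains the identity and the three orthogonal 2-folds (elements 1, 2, 4, 7)
print(net_z(z_cc, {1, 2, 4, 7}))
```

Subtracting Z− penalises groups that leave out symmetry which is clearly present, so a subgroup of the true Laue group scores lower even though all its own elements score well.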

    56/90

    Systematic absences

    For a p_q screw axis along, say, c (index l), axial reflections are only present if l = (p/gcd(p, q))n, where n is an integer, e.g.

    2_1: l = 2n (2, 4, 6, …)
    3_1, 3_2: l = 3n (3, 6, 9, …)
    4_1, 4_3: l = 4n (4, 8, 12, …)
    4_2: l = 2n (2, 4, 6, …)
    6_1, 6_5: l = 6n (6, 12, 18, …)
    6_2, 6_4: l = 3n (3, 6, 9, …)
    6_3: l = 2n (2, 4, 6, …)

    BUT we may only have observed a few of the axial reflections, so be careful
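The rule behind this list can be coded in a few lines: for a p_q screw axis the allowed axial reflections repeat every p/gcd(p, q) in l. This Python sketch reproduces the list and flags axial reflections that should be absent but look strong; the I/σ threshold and the observed values are invented for illustration.

```python
from math import gcd


def axial_period(p, q):
    """Period of allowed axial reflections for a p_q screw axis (q = 0: pure rotation)."""
    return p // gcd(p, q)


def violations(axial_i_over_sigma, p, q, threshold=3.0):
    """Axial l values that should be absent for p_q but have I/sigma >= threshold."""
    period = axial_period(p, q)
    return [l for l, snr in sorted(axial_i_over_sigma.items())
            if l % period != 0 and snr >= threshold]


for p, q in [(2, 1), (3, 1), (4, 1), (4, 2), (6, 1), (6, 2), (6, 3), (6, 4)]:
    print(f"{p}_{q}: l = {axial_period(p, q)}n")

observed = {1: 0.4, 2: 12.0, 3: -0.3, 4: 25.0, 5: 0.8, 6: 9.5}   # invented 00l I/sigma
print(violations(observed, 2, 1), violations(observed, 4, 1))    # [] [2, 6]
```

With only six axial reflections, as here, an absence pattern is suggestive rather than conclusive, which is why POINTLESS reports probabilities instead of a yes/no answer.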

    57/90

    Zone                        Number  PeakHeight  SD     Probability  ReflectionCondition
    1   screw axis 2(1) [c]     109     0.878       0.083  0.747        00l: l=2n

    Spacegroup            TotProb  SysAbsProb  Reindex  Conditions
    C 2 2 21  ( 20)       1.063    0.747                00l: l=2n (zones 1)
    C 2 2 2   ( 21)       0.360    0.253

    The screw axis along 00l shows that the space group is C2221

    PeakHeight from Fourier analysis: 1.0 is a perfect screw. Probability: probability of a screw

    Screws detected by Fourier analysis of I/σ(I)

    58/90

    Alternative indexing

    If the true point group has lower symmetry than the lattice group, alternative valid but non-equivalent indexing schemes are possible, related by symmetry operators present in the lattice group but not in the point group (these are also the cases where merohedral twinning is possible)

    e.g. in space group P3 there are 4 different schemes: (h,k,l) or (-h,-k,l) or (k,h,-l) or (-k,-h,-l)

    For the first crystal, you can choose any scheme

    For subsequent crystals, the autoindexing will randomly choose one setting, and we need to make it consistent: POINTLESS will do this for you by comparing the unmerged test data to a merged reference dataset
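The consistency check amounts to trying each alternative operator and keeping the one that best matches the reference. A minimal Python sketch using the four P3 schemes; plain Pearson correlation on intensities stands in for the statistics POINTLESS reports, and the reference data (which have no particular symmetry) and the test set re-indexed with (−h,−k,l) are invented.

```python
import itertools
import numpy as np

# The four indexing schemes possible in P3 (lattice symmetry higher than point group)
OPERATORS = {
    "h,k,l": lambda h, k, l: (h, k, l),
    "-h,-k,l": lambda h, k, l: (-h, -k, l),
    "k,h,-l": lambda h, k, l: (k, h, -l),
    "-k,-h,-l": lambda h, k, l: (-k, -h, -l),
}


def best_reindex(test_data, reference):
    """Return (operator name, CC) for the reindexing that best matches the reference."""
    scores = {}
    for name, op in OPERATORS.items():
        reindexed = {op(*hkl): i for hkl, i in test_data.items()}
        common = [hkl for hkl in reindexed if hkl in reference]
        x = [reindexed[hkl] for hkl in common]
        y = [reference[hkl] for hkl in common]
        scores[name] = float(np.corrcoef(x, y)[0, 1])
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented reference data, and a test set indexed in the (-h,-k,l) setting.
reference = {(h, k, l): float((7 * h * h + 13 * k * k + 3 * l + 5 * h * k + 2 * h) % 17 + 1)
             for h, k, l in itertools.product(range(-5, 6), repeat=3)}
test_data = {(-h, -k, l): i for (h, k, l), i in reference.items()}
print(best_reindex(test_data, reference))
```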

    59/90

    Reindex operator  CC     Rfactor(E²)  Number  RMSdeviation
    [h,k,l]           0.904  0.199        7999    0.17
    [-h,-k,l]         0.278  0.503        7996    0.17

    Example: space group P 32 2 1, two possible indexing schemes (POINTLESS)

    60/90

    Consistent indexing to reference file (merged or unmerged)

    Example in space group H3 (R3 hexagonal setting)

    61/90

    Scaling and Data Quality

    62/90

    Experiment: |F|² → I, with lots of effects (errors)

    Model of experiment: parameterise the experiment, I → |F|²

    Our job is to invert the experiment

    Scaling and Merging

    63/90

    Choices

    What scaling model? The scaling model should reflect the experiment; considerations of scaling may affect the design of the experiment

    Is the dataset any good? Should it be thrown away immediately? What is the real resolution? Are there bits which should be discarded (bad images)?

    64/90

    Why are reflections on different scales?

    Various physical factors lead to observed intensities being on different scales. Some corrections are known, e.g. Lorentz and polarisation corrections, but others can only be determined from the data

    Scaling models should if possible parameterise the experiment, so different experiments may require different models

    Understanding the effect of these factors allows a sensible design of correction and an understanding of what can go wrong:
    (a) Factors related to the incident beam and the camera
    (b) Factors related to the crystal and the diffracted beam
    (c) Factors related to the detector


    Factors related to the incident X-ray beam

    (a) Incident beam intensity: variable on synchrotrons and not normally measured. It is assumed to be constant during a single image, or at least to vary smoothly and slowly (relative to the exposure time). If this is not true, the data will be poor.

    (b) Illuminated volume: changes with φ if the beam is smaller than the crystal.

    (c) Absorption in the primary beam by the crystal: indistinguishable from (b).

    (d) Variations in rotation speed and shutter synchronisation. These errors are disastrous, difficult to detect, and (almost) impossible to correct for: we assume that the crystal rotation rate is constant and that adjacent images exactly abut in φ. Shutter synchronisation errors lead to partial bias which may be positive, unlike the usual negative bias.


    Factors related to crystal and diffracted beam

    (e) Absorption in the secondary beam: serious at long wavelength (including Cu Kα), worth correcting for MAD data.

    (f) Radiation damage: serious on high-brilliance sources. Not easily correctable unless small, as the structure is changing. Maybe extrapolate back to zero time?

    The relative B-factor is largely a correction for radiation damage.
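The "extrapolate back to zero time" idea can be made concrete under an assumption that is not in the slides: that each reflection's intensity decays roughly linearly with exposure time. Fitting a straight line to repeated measurements and taking the intercept then gives a zero-time estimate. A minimal sketch with a hypothetical function name:

```python
def extrapolate_to_zero_time(times, intensities):
    """Least-squares line I(t) = I0 + slope * t; returns (I0, slope).

    Assumes linear decay with time, which is only a rough model of radiation damage.
    """
    n = len(times)
    mt = sum(times) / n
    mi = sum(intensities) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (i - mi) for t, i in zip(times, intensities))
    slope = sxy / sxx
    return mi - slope * mt, slope
```

This needs several measurements of each reflection spread over the exposure, and the extrapolation becomes unreliable when damage is large, which matches the caveat that damage is only correctable when it is small.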


    Factors related to the detector

    The detector should be properly calibrated for spatial distortion and sensitivity of response, and should be stable. Problems with this are difficult to detect from diffraction data.

    The useful area of the detector should be calibrated or told to the integration program.

    Calibration should flag defective pixels and dead regions, e.g. between tiles.

    The user should tell the integration program about shadows from the beamstop, beamstop support or cryocooler (define bad areas by circles, rectangles, arcs etc).
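Bad areas defined by circles, rectangles and arcs amount to simple point-in-region tests in detector coordinates. The sketch below is hypothetical: the class names, the pixel coordinate convention and the angle convention for arcs are assumptions, and this is not the input syntax of Mosflm or SCALA.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    x: float  # centre, detector pixels
    y: float
    r: float

    def contains(self, px, py):
        return math.hypot(px - self.x, py - self.y) <= self.r


@dataclass
class Rectangle:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, px, py):
        return (min(self.x0, self.x1) <= px <= max(self.x0, self.x1)
                and min(self.y0, self.y1) <= py <= max(self.y0, self.y1))


@dataclass
class Arc:
    """Annular sector about (x, y): radii r0..r1, angles a0..a1 in degrees (anticlockwise)."""
    x: float
    y: float
    r0: float
    r1: float
    a0: float
    a1: float

    def contains(self, px, py):
        r = math.hypot(px - self.x, py - self.y)
        a = math.degrees(math.atan2(py - self.y, px - self.x)) % 360.0
        lo, hi = self.a0 % 360.0, self.a1 % 360.0
        # Handle sectors that wrap through 0 degrees
        in_angle = lo <= a <= hi if lo <= hi else (a >= lo or a <= hi)
        return self.r0 <= r <= self.r1 and in_angle


def is_bad(px, py, regions):
    """True if the pixel position falls inside any user-defined bad region."""
    return any(region.contains(px, py) for region in regions)
```

For example, a beamstop could be a circle at the beam centre and its support a long thin rectangle running to the detector edge.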


    Determination of scales: what information do we have?

    Scales are determined by comparison of symmetry-related reflections, i.e. by adjusting scale factors to get the best internal consistency of intensities. Note that we do not know the true intensities, and an internally-consistent dataset is not necessarily correct. Systematic errors which are the same for symmetry-related reflections will remain.

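    Minimise $\Psi = \sum_h \sum_l w_{hl} \left(I_{hl} - \frac{1}{k_{hl}} \langle I_h \rangle\right)^2$

    where $I_{hl}$ is the $l$th intensity observation of reflection $h$, $k_{hl}$ is the scale factor for $I_{hl}$, $\langle I_h \rangle$ is the current estimate of $I_h$, and $g_{hl} = 1/k_{hl}$ is a function of the parameters of the scaling model:

    $g_{hl} = g(\text{rotation/image number}) \cdot g(\text{time}) \cdot g(\mathbf{s}) \cdots$ (other factors, e.g. tails)

    The three components are the primary-beam scale ($\mathbf{s}_0$), the B-factor and the absorption correction.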
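The minimisation can be sketched for the simplest possible model, one scale factor per batch, solved by alternating least squares: with the scales fixed, the best $\langle I_h \rangle$ is $\sum w g I / \sum w g^2$; with $\langle I_h \rangle$ fixed, each batch scale has a closed-form solution. This is an illustration under that assumption (hypothetical function names and input format), not SCALA's implementation, which uses the richer model above.

```python
from collections import defaultdict


def merged_means(observations, g):
    """Least-squares <I_h> for fixed scales: sum(w g I) / sum(w g^2)."""
    num, den = defaultdict(float), defaultdict(float)
    for hkl, batch, i, sigma in observations:
        w = 1.0 / sigma ** 2
        num[hkl] += w * g[batch] * i
        den[hkl] += w * g[batch] ** 2
    return {hkl: num[hkl] / den[hkl] for hkl in num}


def scale_batches(observations, n_cycles=20):
    """Alternating least squares for one scale g_b per batch.

    observations: list of (hkl, batch, I, sigma); weights w = 1/sigma^2.
    Minimises sum w (I_hl - g_b <I_h>)^2. Returns (scales, means).
    """
    batches = sorted({b for _, b, _, _ in observations})
    g = {b: 1.0 for b in batches}
    means = merged_means(observations, g)
    for _ in range(n_cycles):
        num, den = defaultdict(float), defaultdict(float)
        for hkl, batch, i, sigma in observations:
            w = 1.0 / sigma ** 2
            num[batch] += w * i * means[hkl]
            den[batch] += w * means[hkl] ** 2
        # Only relative scales are determined: fix the first batch at 1
        first = num[batches[0]] / den[batches[0]]
        g = {b: (num[b] / den[b]) / first for b in batches}
        means = merged_means(observations, g)
    return g, means
```

Fixing one batch at 1 reflects the point made above: internal consistency only determines relative scales, never the true intensities.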

    Scaling function (SCALA)


    The scale is a smooth function of spindle rotation (φ), or a discontinuous function of image (batch) number (usually less appropriate).
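One way to make the scale a smooth function of φ is a Gaussian-weighted average of scale parameters placed at intervals along the rotation. The knot spacing and Gaussian width below are hypothetical, and this is not claimed to be SCALA's exact scheme.

```python
import math


def smooth_scale(phi, knot_phis, knot_scales, width):
    """Scale at rotation angle phi (degrees) as a Gaussian-weighted average of
    scale parameters placed at knot_phis; width is the Gaussian sigma in degrees."""
    weights = [math.exp(-0.5 * ((phi - p) / width) ** 2) for p in knot_phis]
    return sum(w * s for w, s in zip(weights, knot_scales)) / sum(weights)
```

Halfway between two knots with scales 1.0 and 2.0 this gives 1.5, and it varies smoothly in between; a per-batch scale would instead jump at every image boundary.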

    $g(\text{time}) = \exp\left[+2B(\text{time}) \sin^2\theta / \lambda^2\right]$

    essentially a time-dependent radiation damage correction.
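Evaluating the B-factor scale needs only $\sin^2\theta/\lambda^2 = 1/(4d^2)$ from Bragg's law. The linear time dependence of B in this sketch is a hypothetical choice for illustration.

```python
import math


def b_factor_scale(b_relative, d_spacing):
    """g = exp(+2 B sin^2(theta) / lambda^2), with sin^2(theta)/lambda^2 = 1 / (4 d^2).

    A negative relative B weakens high-resolution (small d) data.
    """
    return math.exp(2.0 * b_relative / (4.0 * d_spacing ** 2))


def b_of_time(t, db_dt):
    """Hypothetical linear model B(t) = dB/dt * t, with B(0) = 0."""
    return db_dt * t
```

With B = -10 Å², intensities at 2 Å are scaled by exp(-1.25), about 0.29, while at 4 Å the factor is about 0.73: the fall-off is strongest at high resolution.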

    (Plots: variation of intensity with φ; fall-off of high-resolution data with time.)

    Secondary beam correction (absorption)


    Sample dataset: rotating anode (RU200, Osmic mirrors, Mar345), Cu Kα (1.54 Å), 100 images, 1° per image, 5 min/°, resolution 1.8 Å

    (Plots: Rmerge and I/sd with and without absorption correction; phasing power, corrected and uncorrected.) Correction improves the data.

    The scale as a function of the secondary beam direction (θ, φ) is expressed as a sum of spherical harmonics: $g(\theta, \phi) = \sum_l \sum_m C_{lm} Y_{lm}(\theta, \phi)$
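Evaluating such a surface is straightforward once the harmonics are written out. This sketch uses real spherical harmonics up to l = 2, with the secondary-beam direction given as polar angles in a crystal-fixed frame. The frame, the maximum order and the use of real harmonics are assumptions, and fitting the coefficients $C_{lm}$ is not shown.

```python
import math


def real_sph_harm(x, y, z):
    """Real spherical harmonics Y_lm up to l = 2 for a unit vector (x, y, z)."""
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    c20 = 0.25 * math.sqrt(5.0 / math.pi)
    c22 = 0.25 * math.sqrt(15.0 / math.pi)
    return {
        (0, 0): c0,
        (1, -1): c1 * y, (1, 0): c1 * z, (1, 1): c1 * x,
        (2, -2): c2 * x * y, (2, -1): c2 * y * z, (2, 0): c20 * (3.0 * z * z - 1.0),
        (2, 1): c2 * x * z, (2, 2): c22 * (x * x - y * y),
    }


def absorption_scale(theta, phi, coeffs):
    """g(theta, phi) = sum_lm C_lm Y_lm(theta, phi).

    theta, phi: polar and azimuthal angles (radians) of the secondary beam in a
    crystal-fixed frame; coeffs: {(l, m): C_lm} for l <= 2.
    """
    x = math.sin(theta) * math.cos(phi)
    y = math.sin(theta) * math.sin(phi)
    z = math.cos(theta)
    ylm = real_sph_harm(x, y, z)
    return sum(c * ylm[lm] for lm, c in coeffs.items())
```

With only $C_{00}$ non-zero the correction is the same in every direction; the higher-order terms describe the anisotropy of absorption.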

    How well are the scales determined?

  • 7/27/2019 Evans-Chicago08-integration-datareduction.pdf

    71/90

    This depends on the strategy of data collection, and thus affects the choice of strategy.

    Note that determination of scaling parameters depends on symmetry-related observations having different scales. If all observations of a reflection have the same value of a scale component, then there is no information about that component and it remains as a systematic error in the merged data (this may well be the case for absorption, for instance).

    Thus to get intensities with the lowest absolute error, the symmetry-related observations should be measured in as different a way as possible (e.g. rotation about multiple axes). This will increase Rmerge, but improve the estimate of ⟨I⟩.

    Conversely, to measure the most accurate differences for phasing (anomalous or dispersive), observations should be measured in as similar a way as possible.


    Scaling datasets together (SCALA)

    For multiple-wavelength datasets, it is best to scale all wavelengths together simultaneously. This is then a local scaling to minimise the difference between datasets, reducing the systematic error in the anomalous and dispersive differences which are used for phasing.

    Other advantages of simultaneous scaling:
    - rejection of outliers with much higher reliability because of higher multiplicity (but with the danger of eliminating real signal)
    - correlations between ΔFanom and ΔFdisp indicate the reliability of the phasing signal
    - (very) approximate determination of relative f" and relative f' values


    Automatic optimisation of SD correction parameters

    (Plots, before and after optimisation: Sigma(scatter/SD) against Irms for all runs, for fully recorded (SigmaFull) and partially recorded (SigmaPartial) reflections.)

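    Optimisation of $\sigma'^2(I_{hl}) = \mathrm{SDfac}^2 \left[\sigma^2(I_{hl}) + \mathrm{SdB}\,\langle I_h \rangle + (\mathrm{SdAdd}\,\langle I_h \rangle)^2\right]$

    Minimises the deviation of Sigma(scatter/σ) from 1.0, i.e. flattens out the plot.

    Makes the average scatter² equal to the average SD².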
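The corrected σ and the quantity whose spread is being flattened can be computed as below. This sketch does not perform the optimisation of SDfac, SdB and SdAdd itself. The unweighted ⟨I_h⟩, the $\sqrt{(n-1)/n}$ factor (allowing for $I_{hl}$ contributing to its own mean) and the function names are assumptions for illustration.

```python
import math


def corrected_sigma(sigma, mean_i, sdfac, sdb, sdadd):
    """sigma' = SDfac * sqrt(sigma^2 + SdB <I_h> + (SdAdd <I_h>)^2).

    The variance is clamped at zero, since SdB * <I_h> can be negative for weak reflections.
    """
    var = sigma ** 2 + sdb * mean_i + (sdadd * mean_i) ** 2
    return sdfac * math.sqrt(max(var, 0.0))


def rms_normalised_scatter(groups, sdfac, sdb, sdadd):
    """RMS of (I_hl - <I_h>) / (sigma' * sqrt((n - 1) / n)) over all observations.

    groups: [[(I, sigma), ...], ...], one list per unique reflection.
    With well-corrected SDs this should be close to 1.0.
    """
    total, count = 0.0, 0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue
        mean = sum(i for i, _ in obs) / n
        for i, s in obs:
            sd = corrected_sigma(s, mean, sdfac, sdb, sdadd) * math.sqrt((n - 1) / n)
            if sd > 0.0:
                total += ((i - mean) / sd) ** 2
                count += 1
    return math.sqrt(total / count)
```

Computing this statistic in bins of intensity, and adjusting the three parameters until it is about 1.0 in every bin, is what flattening the plot means.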

    Questions about the data


    What is the overall quality of the dataset? How does it compare to other datasets for this project?

    What is the real resolution? Should you cut the high-resolution data?

    Are there bad batches (individual duff batches or ranges of batches)?

    Was the radiation damage such that you should exclude the later parts?

    Is the outlier detection working well?

    Is there any apparent anomalous signal?


    What to look at?


    A. How well do equivalent observations agree with each other?

    1. R-factors: traditional overall measures of quality

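    (a) $R_{\mathrm{merge}} = R_{\mathrm{sym}} = \sum_h \sum_l |I_{hl} - \langle I_h \rangle| \,/\, \sum_h \sum_l \langle I_h \rangle$

    This is the traditional measure of agreement, but it increases with higher multiplicity even though the merged data is better.

    (b) $R_{\mathrm{meas}} = R_{\mathrm{r.i.m.}} = \sum_h \sqrt{n_h/(n_h-1)} \sum_l |I_{hl} - \langle I_h \rangle| \,/\, \sum_h \sum_l \langle I_h \rangle$, where $n_h$ is the number of observations of reflection $h$

    The multiplicity-weighted R-factor allows for the improvement in data with higher multiplicity. This is particularly useful when comparing different possible point groups (it is output by POINTLESS along with the correlation coefficient, as well as in SCALA).

    (c) $R_{\mathrm{p.i.m.}} = \sum_h \sqrt{1/(n_h-1)} \sum_l |I_{hl} - \langle I_h \rangle| \,/\, \sum_h \sum_l \langle I_h \rangle$

    The precision-indicating R-factor gets better (smaller) with increasing multiplicity, i.e. it estimates the precision of the merged ⟨I⟩.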

    Diederichs & Karplus, Nature Structural Biology, 4, 269-275 (1997)

    Weiss & Hilgenfeld, J.Appl.Cryst. 30, 203-205 (1997)
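The three R-factors differ only in the multiplicity factor applied to each reflection's sum of deviations, which a direct implementation makes plain. This is a minimal sketch assuming an unweighted ⟨I_h⟩, skipping reflections measured only once; the input format and function name are hypothetical.

```python
import math


def r_factors(groups):
    """Return (Rmerge, Rmeas, Rpim) from {hkl: [I_hl, ...]}."""
    num_merge = num_meas = num_pim = denom = 0.0
    for obs in groups.values():
        n = len(obs)
        if n < 2:
            continue
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)  # sum_l |I_hl - <I_h>|
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev
        num_pim += math.sqrt(1.0 / (n - 1)) * dev
        denom += n * mean  # sum_l <I_h>
    return num_merge / denom, num_meas / denom, num_pim / denom
```

For a reflection measured as 90, 110, 90, 110, Rmerge is 0.1 but Rp.i.m. is 0.1/√3, about 0.058: the merged value is more precise than the spread of single observations suggests, and with n = 2 the two statistics coincide.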

    2. Intensities and standard deviations: what is the real resolution?


    (a) Corrected $\sigma'^2(I_{hl}) = \mathrm{SDfac}^2 \left[\sigma^2(I_{hl}) + \mathrm{SdB}\,\langle I_h \rangle + (\mathrm{SdAdd}\,\langle I_h \rangle)^2\right]$

    The corrected σ(I) is compared with the intensities: the most useful statistic is ⟨⟨I⟩/σ(⟨I⟩)⟩ (labelled Mn(I)/sd in the table) as a function of resolution.

    This statistic shows the improvement of the estimate of ⟨I⟩ with multiple measurements. It is the best indicator of the true resolution limit: ⟨⟨I⟩/σ(⟨I⟩)⟩ greater than ~2 (or so), maybe lower for anisotropic data (1.5 to 1.0).
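A minimal sketch of the statistic, assuming ⟨I⟩ is the inverse-variance weighted mean with $\sigma(\langle I \rangle) = 1/\sqrt{\sum 1/\sigma^2}$, averaged over reflections in resolution shells. The shell boundaries and input format are hypothetical choices.

```python
import math
from collections import defaultdict


def merge(observations):
    """Inverse-variance weighted mean of [(I, sigma), ...] and its standard error."""
    sum_w = sum(1.0 / s ** 2 for _, s in observations)
    mean = sum(i / s ** 2 for i, s in observations) / sum_w
    return mean, 1.0 / math.sqrt(sum_w)


def mean_i_over_sigma_by_shell(groups, d_spacings, shell_edges):
    """<<I>/sigma(<I>)> per resolution shell.

    groups: {hkl: [(I, sigma), ...]}; d_spacings: {hkl: d in Å};
    shell_edges: decreasing d limits, e.g. [50.0, 3.0, 2.0] gives shells 50-3 and 3-2 Å.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for hkl, obs in groups.items():
        mean, sd = merge(obs)
        d = d_spacings[hkl]
        for k in range(len(shell_edges) - 1):
            if shell_edges[k] >= d > shell_edges[k + 1]:
                sums[k] += mean / sd
                counts[k] += 1
                break
    return {(shell_edges[k], shell_edges[k + 1]): sums[k] / counts[k] for k in counts}
```

Four measurements with equal σ halve σ(⟨I⟩), which is the improvement with multiple measurements the statistic is meant to capture; the outermost shell where it stays above about 2 suggests the resolution cut-off.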

    (b) Correlation between half datasets (random halves): (Plot: correlation coefficient against resolution.) The fall-off in correlation indicates a resolution limit.
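The half-dataset correlation can be sketched by splitting each reflection's observations at random into two halves, averaging each half, and correlating the two sets of averages. Unweighted half-means and a fixed random seed are assumptions here. In practice this would be done separately for each resolution shell.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def half_dataset_cc(groups, seed=0):
    """Correlation between the means of two random halves of each reflection's observations.

    groups: {hkl: [I_hl, ...]}; reflections with fewer than two observations are skipped.
    """
    rng = random.Random(seed)
    half1, half2 = [], []
    for obs in groups.values():
        if len(obs) < 2:
            continue
        shuffled = list(obs)
        rng.shuffle(shuffled)
        m = len(shuffled) // 2
        first, second = shuffled[:m], shuffled[m:]
        half1.append(sum(first) / len(first))
        half2.append(sum(second) / len(second))
    return pearson(half1, half2)
```

Where the signal is real the two halves agree and the CC is near 1. Where only noise remains the halves are independent and the CC falls towards 0, which is why the fall-off marks a resolution limit.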


    B. Are some parts of the data bad?

    Analysis of Rmerge against batch number gives a very clear indication of problems local to some regions of the data. Perhaps something has gone wrong with the integration step, or there are some bad images.

    (Plot: Rmerge against batch.) Here the beginning of the dataset is wrong due to problems in integration (Mosflm).

    A case of severe radiation damage: the B-factor should be small (not more negative than -10, and even that is large).

    Outliers


    Detection of outliers is easiest if the multiplicity is high

    Removal of spots behind the backstop shadow does not work well at present: usually it rejects all the good ones, so tell Mosflm where the backstop shadow is.

    Scala also has facilities for omitting regions of the detector (rectangles and arcs of circles).

    Inspect the ROGUES file to see what is being rejected (at least occasionally).

    The ROGUES file contains all rejected reflections (flag "*", "@" for I+- rejects, "#" for Emax rejects)

    TotFrc = total fraction, fulls (f) or partials (p)

    Flag I+ or I- for Bijvoet classes

    DelI/sd = (Ihl - Mn(I)others)/sqrt[sd(Ihl)**2 + sd(Mn(I))**2]

    h k l (measured)  h k l (unique)  Batch      I  sigI     E  TotFrc  Flag  Scale     LP  DelI/sd   d(A)    Xdet    Ydet    Phi
    -2 -2 0           2 2 0            1220  24941  2756  1.03   0.95p    I-  2.434  0.031     -1.1  30.40  1263.7  1103.2  210.8
    -4 2 0            2 2 0            1146   9400  2101  0.63   0.99p   *I+  3.017  0.032     -6.7  30.40  1266.4  1123.3  151.3
    4 -2 0            2 2 0            1148  27521  2972  1.08   1.09p    I-  2.882  0.032      0.0  30.40  1058.8  1130.0  153.2
    2 -4 0            2 2 0            1075  29967  2865  1.13   0.92p    I+  2.706  0.032      1.1  30.40  1060.9  1106.6   94.4

    Weighted mean 27407
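The DelI/sd column follows the formula given above. This sketch assumes Mn(I)others is the inverse-variance weighted mean of the other observations, consistent with the "Weighted mean" line in the listing. The rejection threshold SCALA applies to this value is not reproduced here.

```python
import math


def del_i_over_sd(i, sd_i, others):
    """(I_hl - Mn(I)others) / sqrt(sd(I_hl)^2 + sd(Mn(I))^2).

    others: [(I, sd), ...] for the other observations of the same unique reflection.
    """
    weights = [1.0 / s ** 2 for _, s in others]
    sum_w = sum(weights)
    mean_others = sum(w * v for w, (v, _) in zip(weights, others)) / sum_w
    sd_mean = 1.0 / math.sqrt(sum_w)
    return (i - mean_others) / math.sqrt(sd_i ** 2 + sd_mean ** 2)
```

Comparing each observation with the mean of the *others* prevents an outlier from pulling the mean towards itself, which would hide it.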

    Reasons for outliers


    - outside reliable area of detector (e.g. behind shadow): specify backstop shadow, calibrate detector
    - ice spots: do not get ice on your crystal!
    - zingers
    - bad prediction (spot not there): improve prediction
    - spot overlap: lower mosaicity, smaller slice, move detector back, deconvolute overlaps
    - multiple lattices: find a single crystal

    Ice rings

    (Plot from ROGUEPLOT in Scala: position of rejects on the detector.) The rejects lie on ice rings (red).
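A quick numerical check of whether rejects cluster on ice rings is to compare their resolution with the main hexagonal-ice ring spacings. The d-spacings below are approximate, commonly quoted values for ice Ih, and the tolerance (in 1/d², which keeps ring widths roughly comparable across resolution) is a hypothetical choice.

```python
# Approximate d-spacings (Å) of the main hexagonal-ice (Ih) powder rings
ICE_RING_D = (3.897, 3.669, 3.441, 2.671, 2.249, 2.072, 1.948, 1.918, 1.883, 1.721)


def on_ice_ring(d, tolerance=0.002):
    """True if 1/d^2 lies within `tolerance` (Å^-2) of an ice ring."""
    s2 = 1.0 / d ** 2
    return any(abs(s2 - 1.0 / r ** 2) < tolerance for r in ICE_RING_D)


def fraction_on_ice_rings(reject_d_spacings, tolerance=0.002):
    """Fraction of rejected reflections whose resolution lies on an ice ring."""
    hits = sum(1 for d in reject_d_spacings if on_ice_ring(d, tolerance))
    return hits / len(reject_d_spacings)
```

A fraction much larger than the fraction of reciprocal space the rings cover points to ice, rather than to the scaling, as the cause of the rejections.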

    Detection of anomalous signal


    (Normal probability plots: D(observed) against D(expected), for the peak, edge and remote wavelengths.)

    Are the differences greater than would be expected from the errors?

    Test using a normal probability plot: a slope > 1.0 means a significant difference.

    Differences are largest at the peak wavelength
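A normal probability plot sorts the normalised differences Δ/σ(Δ) and plots them against the quantiles expected for a standard normal distribution. This sketch uses plotting positions (k + 0.5)/n, one common choice, and a least-squares slope through the origin. Both choices, and the availability of σ for each difference, are assumptions.

```python
from statistics import NormalDist


def normal_probability_plot(deltas, sigmas):
    """Return (expected, observed, slope) for a normal probability plot.

    observed: sorted normalised differences delta/sigma; expected: standard-normal
    quantiles at plotting positions (k + 0.5)/n. slope > 1.0 means differences
    larger than the errors predict.
    """
    observed = sorted(d / s for d, s in zip(deltas, sigmas))
    n = len(observed)
    unit = NormalDist()
    expected = [unit.inv_cdf((k + 0.5) / n) for k in range(n)]
    slope = sum(e * o for e, o in zip(expected, observed)) / sum(e * e for e in expected)
    return expected, observed, slope
```

If the differences were pure noise with correct σ the points would lie on a line of slope 1; anomalous signal widens the distribution and steepens the slope, most of all at the peak wavelength.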

    Are the different measurements of the anomalous difference correlated?


    (Plots: correlation coefficient of anomalous differences against resolution, between wavelengths (MAD) and between half datasets at the peak wavelength, with centric reflections (no anomalous signal) for comparison. The fall-off in correlation, marked at 3.5 Å, indicates the resolution limit.)

    This can be used to set the useful resolution for finding anomalous scatterers.

    Another way of looking at correlations: scatter plot of anom1 v. anom2


    (Scatter plots at 3.5 Å resolution: correlated differences; uncorrelated native.)

    The ratio of the distribution width along the diagonal to the width across the diagonal ≈ signal/noise.
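The width ratio can be computed by rotating the scatter plot by 45°, so that one coordinate runs along the diagonal and the other across it. This sketch assumes the two sets of differences are on comparable scales and uses sample standard deviations as the measure of width.

```python
import math


def diagonal_width_ratio(anom1, anom2):
    """Width of the (anom1, anom2) scatter along the diagonal divided by the width across it.

    u = (x + y)/sqrt(2) runs along the diagonal (signal + noise),
    v = (y - x)/sqrt(2) runs across it (noise only, if the signals agree).
    """
    root2 = math.sqrt(2.0)
    u = [(x + y) / root2 for x, y in zip(anom1, anom2)]
    v = [(y - x) / root2 for x, y in zip(anom1, anom2)]

    def sd(values):
        m = sum(values) / len(values)
        return math.sqrt(sum((a - m) ** 2 for a in values) / (len(values) - 1))

    return sd(u) / sd(v)
```

For uncorrelated native data the cloud is round and the ratio is about 1; a real anomalous signal stretches it along the diagonal.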

    Running SCALA from ccp4i interface


    See ccp4 wiki

    Intensity distributions and their pathologies


    Deviations from the expected Wilson distribution of intensities are diagnostic of various crystal and processing pathologies, notably twinning.

    One way of analysing this: the cumulative intensity plot is the fraction of intensities less than a fraction Z of the mean, e.g. for acentric reflections we expect 18.1% of the reflections to have an intensity < 20% of the mean intensity at that resolution.

    A sigmoidal curve implies fewer weak reflections than expected, the hypercentric curve too many.
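The theoretical curves are the standard Wilson cumulative distributions, $N(z) = 1 - e^{-z}$ for acentric and $N(z) = \mathrm{erf}(\sqrt{z/2})$ for centric reflections, with $z = I/\langle I \rangle$. A minimal sketch for comparing them with an observed set of normalised intensities (the function names are illustrative):

```python
import math


def n_acentric(z):
    """Wilson cumulative distribution for acentric reflections: N(z) = 1 - exp(-z)."""
    return 1.0 - math.exp(-z)


def n_centric(z):
    """Wilson cumulative distribution for centric reflections: N(z) = erf(sqrt(z/2))."""
    return math.erf(math.sqrt(z / 2.0))


def n_observed(z_values, z):
    """Observed fraction of normalised intensities Z = I/<I> below z."""
    return sum(1 for v in z_values if v < z) / len(z_values)
```

At z = 0.2 the acentric curve gives 1 - exp(-0.2), about 0.181, the 18.1% quoted above; an observed value well below this is the sigmoidal signature of too few weak reflections.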

    (Plot: cumulative distribution N(Z) against Z = I/⟨I⟩, showing the theoretical, sigmoidal and hypercentric curves.)

    Most statistics compare intensities with the average in resolution shells. This is equivalent to normalising intensities to make ⟨I⟩ = 1.0.
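Normalising in resolution shells can be sketched as dividing each intensity by the mean intensity of its shell. The equal-width bins in 1/d² are an assumption, and a real program would also treat centric and acentric reflections separately and allow for symmetry (epsilon) factors, which this sketch ignores.

```python
from collections import defaultdict


def normalise_in_shells(intensities, d_spacings, n_shells):
    """Z = I / <I>, with <I> the mean intensity in the reflection's resolution shell.

    Shells are equal-width in 1/d^2.
    """
    s2 = [1.0 / d ** 2 for d in d_spacings]
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_shells or 1.0  # guard against all reflections at one d
    shell = [min(int((x - lo) / width), n_shells - 1) for x in s2]
    total, count = defaultdict(float), defaultdict(int)
    for k, i in zip(shell, intensities):
        total[k] += i
        count[k] += 1
    return [i / (total[k] / count[k]) for k, i in zip(shell, intensities)]
```

After this step every shell has mean Z = 1.0, so the fall-off of intensity with resolution no longer dominates the distribution; it is also why an inappropriate shell average (anisotropy, translational NCS) distorts the statistics.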

    (1) Pathologies arising from the intensities themselves


    Too few weak reflections:
    - Twinning: the twin operator superimposes (on average) a stronger reflection on a weak one
    - Overlapped reflections: a weak reflection is overestimated due to contamination by a neighbouring strong reflection
    - Systematic underestimation of background: this can arise from an underestimate of the detector gain, leading to a Poisson σ that is too small, and then rejecting too many high outlier points, biasing the background

    (2) Pathologies arising from the intensity averages

    Too many (usually) weak reflections because the average is inappropriate:
    - Anisotropic diffraction: the average in resolution shells is wrong
    - Translational NCS: whole classes of reflections are weak and should be compared to their own average, e.g. NCS ≈ (1/2, 0, 0) makes h-odd reflections weak
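The effect of twinning on the weak reflections can be shown by simulation. Untwinned acentric normalised intensities follow the Wilson (exponential, mean 1) distribution. A twinned observation mixes in a partner reflection, $J = (1 - \alpha) I_1 + \alpha I_2$; here the partner is assumed to be independent of the first reflection. The seed and sample size are arbitrary.

```python
import random


def fraction_below(values, z):
    """Fraction of values below z."""
    return sum(1 for v in values if v < z) / len(values)


def simulate_twinning(alpha=0.5, z=0.2, n=50000, seed=1):
    """Fraction of acentric normalised intensities below z, untwinned vs twinned."""
    rng = random.Random(seed)
    untwinned = [rng.expovariate(1.0) for _ in range(n)]
    partners = [rng.expovariate(1.0) for _ in range(n)]
    twinned = [(1 - alpha) * a + alpha * b for a, b in zip(untwinned, partners)]
    return fraction_below(untwinned, z), fraction_below(twinned, z)
```

For a perfect twin (α = 0.5) the expected fraction below z is $1 - (1 + 2z)e^{-2z}$, about 6% at z = 0.2, against 18.1% untwinned: averaging with a partner rarely leaves a reflection weak.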

    Merohedral twinning (exact overlap of lattices) is possible if the true point group is lower symmetry than the lattice point group.

    Example from Jan Löwe: apparent point group 422, a = 64.3 Å, c = 198.8 Å, dimer/ASU, 35 kDa, 2.0 Å. Point group 422 has no possibility of twinning, so the true point group must be lower (4). Intensity statistics show too few weak reflections (and too few strong ones).

    Structure solved on an untwinned crystal by MAD. Molecular replacement (difficult, large conformational change). Refined in CNS 1.1 with twin fraction α = 0.5, P41.

    Another case: pseudo-merohedral twinning


    (Plot: cumulative intensity distribution N(Z) against Z, acentric and centric, observed and theoretical.)

    Unit cell: 79.2, 81.3, 81.2 Å, 90°, 90°, 90°

    True space group: P212121

    Pseudo-merohedral twinning into point group P422 (twin operator k,h,-l): a = 79.2 ≈ b = 81.3 (not very close!)

    Split spots due to non-overlapping lattices, seen on the image and in the average profiles.

    Solved by SeMet MAD at 3.1 Å resolution, ignoring twinning. Model refined with 20% twinning in CNS at 2.6 Å resolution.

    Sigmoidal cumulative intensity plot
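The twin fraction α (20% here, 0.5 in the previous example) enters through the twinning equations for a twin-related pair, $J_1 = (1-\alpha) I_1 + \alpha I_2$ and $J_2 = \alpha I_1 + (1-\alpha) I_2$. These can be inverted algebraically unless α = 0.5. This sketch only illustrates that algebra. It is not necessarily how the refinement in these examples was carried out, and the function names are hypothetical.

```python
def twin(i1, i2, alpha):
    """Observed intensities of a twin-related pair for twin fraction alpha."""
    return (1 - alpha) * i1 + alpha * i2, alpha * i1 + (1 - alpha) * i2


def detwin(j1, j2, alpha):
    """Invert the twinning equations; increasingly ill-conditioned as alpha -> 0.5."""
    d = 1 - 2 * alpha
    if abs(d) < 1e-6:
        raise ValueError("alpha = 0.5: a perfect twin cannot be detwinned this way")
    return ((1 - alpha) * j1 - alpha * j2) / d, ((1 - alpha) * j2 - alpha * j1) / d
```

The 1/(1 - 2α) factor amplifies errors as α approaches 0.5, which is why a perfect twin such as the P41 example has to be handled in refinement rather than by detwinning the data.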


    References:

    ccp4 wiki (www.ccp4wiki.org)

    CCP4 Study Weekends (many useful papers):
    - Data collection and analysis, Acta Cryst. D62, part 1, 1-123 (2006)
    - Data collection and processing, Acta Cryst. D55, part 10, 1631-1772 (1999)

    Acknowledgements


    Andrew Leslie (slides)

    Mosflm team:
    Present: Andrew Leslie, Harry Powell (mosflm); Luke Kontogiannis (imosflm)
    Past: Geoff Battye (imosflm)

    Pointless:
    Ralf Grosse-Kunstleve (cctbx), Kevin Cowtan (clipper, simplex, C++ advice), Martyn Winn & CCP4 gang (ccp4 libraries), Peter Briggs (ccp4i)