The Effect of Finite Sampling on the Determination of Orientational Properties

The Effect of Finite Sampling on the Determination of Orientational Properties Nick Patrick Computer Science 260 Duke University March 6, 2008


The Effect of Finite Sampling on the Determination of Orientational Properties

Nick Patrick

Computer Science 260

Duke University

March 6, 2008

Motivation

• Problem:

– Determine the Saupe alignment tensor S from a set of residual dipolar couplings (RDCs) and the corresponding interatomic bond vectors (e.g. NH bond vectors) using SVD

• Generalized problem:

– Determine a second-rank tensor D from interatomic bond vectors and other experimental data

• How accurate is the tensor?

• Why is this important?

Losonczi et al. (1999). Annila and Permi (2004).

Motivation

• Develop a mathematical framework to quantify the accuracy of the tensor derived from experimental data

• Based on the uniformity of the distribution of bond vectors

Fushman, D., Ghose, R., and Cowburn, D. (2000). The effect of finite sampling on the determination of orientational properties: A theoretical treatment with application to interatomic vectors in proteins. J. Am. Chem. Soc. 122: 10640-10649.

Outline

I. Background

II. Theory

III. Results

IV. Applications

V. Alternate Approaches

Background

• “Distribution of bond vectors”

– Consider the orientation of NH bond vectors in an α-helix

– Suppose the alignment tensor has the illustrated set of principal axes

– One principal axis of the alignment tensor would be well sampled, while the axes orthogonal to it would be essentially undetermined

– CαHα bond vectors?

(Figure: 1UBI, residues 23-34)

Background

1. How well is the orientation space sampled by a distribution of interatomic vectors?

2. How well does the bond vector distribution sample the various components of the second-rank tensor of interest?

3. How well can the bond vector distribution completely characterize the tensor?

Background

• Which sets of bond vectors sample the orientation space the best? (NH, CαHα, etc.)

• Survey known structures from the PDB to determine this

(Figure from lecture notes)

Background

• If an infinite number of uniformly distributed vectors were available, all directions in the orientation space would be sampled equally

• The quality of the determined tensor would be independent of the orientation of its principal axes

• In real NMR experiments:

– The set of interatomic vectors is finite, so the sampling of orientation space is incomplete

– The orientational distribution of the available vectors is not uniform

Outline

I. Background

II. Theory

III. Results

IV. Applications

V. Alternate Approaches

Theory

• We want to (a) characterize the distribution of bond vectors, and (b) characterize the accuracy of the tensor

• We can derive:

– sampling tensor (Ω): represents the sampling of the bond vectors along three axes of an arbitrary reference frame

– generalized sampling parameter (Ξ): quantifies the degree of uniformity of the distribution of bond vectors

– average constant (Dav): quantifies how well the tensor of interest, D, is sampled

– generalized quality factor (Λ): describes how efficiently a set of bond vectors samples all elements of the tensor of interest, D

Review

• The alignment tensor S represents the average substructure alignment in an aligning medium

• S can be diagonalized:

• V is a 3×3 rotation matrix defining the principal order frame (rotation from molecular frame)

• Σ is a 3×3 diagonal, traceless matrix containing the principal values of S (Szz, Syy, Sxx)
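
The equation for this slide is not present in the transcript. One standard way to write the diagonalization described by the bullets is given here as an assumption about the slide's notation; in particular, whether V or its transpose appears on the left depends on the convention used for the rotation.

```latex
S = V \, \Sigma \, V^{T},
\qquad
\Sigma = \begin{pmatrix} S_{xx} & 0 & 0 \\ 0 & S_{yy} & 0 \\ 0 & 0 & S_{zz} \end{pmatrix},
\qquad
S_{xx} + S_{yy} + S_{zz} = 0,
\qquad
V^{T} V = I
```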

Sampling Tensor (Ω)

• ri = projection of a unit vector on the axis i

• i, j = x’, y’, z’ (an arbitrary reference frame)

• Ω can be diagonalized to yield:

– the principal axis frame (a rotation R(φ,θ,ψ) from the arbitrary reference frame) corresponding to the direction of best sampling

– the principal values Ωi of the sampling tensor

• Represents the sampling of bond vector orientations along three axes of an arbitrary reference frame
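
A minimal sketch of how the sampling tensor could be computed from a set of bond vectors. The defining equation is missing from the transcript, so the form Ωij = ½⟨3 ri rj − δij⟩ used below is an assumption; it was chosen because it gives Ω = 0 for a uniform distribution and principal values (−1/2, −1/2, 1) for fully aligned vectors, the limits quoted elsewhere in these slides. The function names are hypothetical.

```python
import numpy as np

def sampling_tensor(vectors):
    """Sampling tensor for an (N, 3) array of bond vectors (assumed definition)."""
    v = np.asarray(vectors, dtype=float)
    r = v / np.linalg.norm(v, axis=1, keepdims=True)  # unit vectors
    second_moment = r.T @ r / len(r)                  # <r_i r_j>
    return 0.5 * (3.0 * second_moment - np.eye(3))

def principal_sampling(vectors):
    """Principal values, sampling fractions and principal axes of the tensor."""
    # eigh returns eigenvalues in ascending order, i.e. Omega_x <= Omega_y <= Omega_z
    values, axes = np.linalg.eigh(sampling_tensor(vectors))
    fractions = (2.0 * values + 1.0) / 3.0            # f_i = <r_i^2> in the principal frame
    return values, fractions, axes

# Worst case: every vector lies along the same axis (sign and length do not matter).
values, fractions, axes = principal_sampling([[0, 0, 1], [0, 0, -2], [0, 0, 3]])
print(values, fractions)
```

The columns of `axes` are the principal axes expressed in the input frame, ordered from worst to best sampled.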

Sampling Tensor (Ω)

• Ωi = the principal values of the sampling tensor

• fi = the fraction of vectors oriented along each of the three principal directions

• ordering (direction of best sampling): Ωz ≥ Ωy ≥ Ωx, thus fz ≥ fy ≥ fx

• Ωi = (3fi − 1)/2, or equivalently fi = (2Ωi + 1)/3

• Optimal: If the distribution of vectors is uniform, fx = fy = fz = 1/3, and Ω is the null tensor

• Worst case: If all vectors are oriented along the principal z-axis, then fx = fy = 0, fz = 1 and Ωx = Ωy = -1/2, Ωz = 1

• Deviations of fi from 1/3 and Ωi from 0 reflect non-uniformity

Example

• Optimal: If the distribution of vectors is uniform, fx = fy = fz = 1/3, and Ω is the null tensor

• Worst case: If all vectors are oriented along the principal z-axis, then fx = fy = 0, fz = 1 and Ωx = Ωy = -1/2, Ωz = 1

• Example:

– Consider the NH vectors of an α-helix aligned along the helical axis a

– a is approximately parallel to the principal z-axis of Ω

(Figure: 1UBI, residues 23-34)

Geometric Representation

• Sampling fractions for a set of bond vectors can be represented as a vector in {fx, fy, fz}-space

• f = (fx, fy, fz)

• Because fx + fy + fz = 1, f lies in a plane, which can be parameterized by {η, ζ}, the rhombic and axial components

(Figure: {fx, fy, fz}-space)

Generalized Sampling Parameter (Ξ)

• Optimal: fx = fy = fz = 1/3 ⇒ Ξ = 0

• Worst case: fx = fy = 0, fz = 1 ⇒ Ξ = 1

• Quantifies the degree of uniformity of the distribution of bond vector orientations, on a scale from 0 to 1
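
The formula for Ξ is missing from the transcript. The sketch below assumes Ξ = (3/2) Σi (fi − 1/3)², which is the same as (2/3) Σi Ωi² when Ωi = (3fi − 1)/2. This form is an assumption, not a quotation from the slide. It gives 0 for uniform sampling and 1 for fully aligned vectors, and it reproduces the Ξ values quoted for the βARK PH domain in the Applications section to about three decimal places. The function name is hypothetical.

```python
def sampling_parameter(fractions):
    """Generalized sampling parameter from (fx, fy, fz), under the assumed formula."""
    if abs(sum(fractions) - 1.0) > 1e-3:
        raise ValueError("sampling fractions should sum to 1")
    return 1.5 * sum((f - 1.0 / 3.0) ** 2 for f in fractions)

# Limits quoted on the slide, then the sampling fractions listed for 1BAK.
print(sampling_parameter((1 / 3, 1 / 3, 1 / 3)))       # uniform distribution
print(sampling_parameter((0.0, 0.0, 1.0)))             # all vectors along one axis
print(sampling_parameter((0.4060, 0.3583, 0.2357)))    # 1BAK, all residues
print(sampling_parameter((0.9148, 0.0473, 0.0379)))    # 1BAK, alpha-helical residues
print(sampling_parameter((0.5569, 0.3171, 0.1261)))    # 1BAK, beta-strand residues
```

The order of the three fractions does not affect the result, since the sum is symmetric in fx, fy and fz.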

Average Constant (Dav)

• Quantifies how well the tensor of interest, D, is sampled

• Ωij represent elements of the sampling tensor Ω, while Dij represent elements of D

• If all parts of the tensor are sampled equally well, Dav = (1/3)Tr[D] = Diso

Average Constant (Dav)

• Rewriting in the principal axis frame of the sampling tensor, we get Dav = Σi Φi Di, which quantifies how well each principal component of the tensor is defined by the distribution of vectors

• principal components = Di (= Dxd, Dyd, Dzd), for example Sxx, Syy, Szz for the alignment tensor

• Φi = {Φx, Φy, Φz} is a three-component vector which measures how well each principal component Di is sampled

Average Constant (Dav)

• Φi = {Φx, Φy, Φz} is a three-component vector which measures how well each principal component Di is sampled

• (li, mi, ni) are direction cosines which relate the ith principal axis of D (i = xd, yd, zd) to the principal axes of the sampling tensor (x, y, z)

• Optimal: If the distribution is uniform, Φx = Φy = Φz = 1/3 and all principal components Di are uniformly sampled

• Worst case: If all vectors are aligned parallel to some axis a

– When the ith principal axis of D is parallel to a: Di is maximally sampled and Φi = 1

– When the ith principal axis of D is orthogonal to a: Di is minimally sampled and Φi = 0
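
A minimal sketch of how Φi and Dav could be evaluated. The equations themselves are missing from the transcript, so the forms Φi = li² fx + mi² fy + ni² fz and Dav = Σi Φi Di are assumptions, chosen because they reproduce the limits in the bullets above (Φi = 1/3 for uniform sampling, Φi = 1 or 0 for fully aligned vectors). The function names and argument layout are hypothetical.

```python
import numpy as np

def sampling_weights(d_axes, fractions):
    """Phi_i for each principal axis of D.

    d_axes: 3x3 array whose column i holds the direction cosines (l_i, m_i, n_i)
            of the ith principal axis of D in the principal frame of Omega.
    fractions: sampling fractions (fx, fy, fz) in that same frame.
    """
    cosines_squared = np.asarray(d_axes, dtype=float) ** 2
    return cosines_squared.T @ np.asarray(fractions, dtype=float)

def average_constant(d_values, d_axes, fractions):
    """D_av as the Phi-weighted sum of the principal values of D."""
    return float(np.dot(sampling_weights(d_axes, fractions), d_values))

# Worst case: all vectors along z, and D has its own axes aligned with x, y, z.
print(sampling_weights(np.eye(3), (0.0, 0.0, 1.0)))
# Uniform sampling: D_av reduces to the mean of the principal values (D_iso).
print(average_constant([1.0, 2.0, 6.0], np.eye(3), (1 / 3, 1 / 3, 1 / 3)))
```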

Example

• Worst case: If vectors are aligned parallel to some axis a

– If the ith principal axis of D is parallel to a, Di is maximally sampled and Φi = 1

– If the ith principal axis of D is orthogonal to a, Di is minimally sampled and Φi = 0

• Consider the NH vectors of an α-helix aligned along the helical axis a, and suppose the tensor of interest, D, has principal axes such that axis zd || a

(Figure: 1UBI, residues 23-34)

If the tensor D has principal axes such that zd || a, then: Φz ≈ 1, Φy ≈ 0, Φx ≈ 0

(Φi measures how well the principal component Di is sampled)

Generalized Quality Factor

• Describes how efficiently a set of bond vectors samples all elements of D, on a scale from 0 to 1

• Recall: Dav quantifies how well D is sampled, and Diso is the optimal value of Dav; Λ represents deviation from optimal value

• Can also calculate Λmin, lower bound on tensor quality, based only on bond vector distribution

• Optimal: Λ = 1, Worst case: Λ = 0

• f = (fx, fy, fz)

• Parametrize plane in terms of {η, ζ}

• Rhombic component:

• Axial component:

• Intuition: degree of “directional asymmetry” of sampling tensor

Geometric Representation

• Uniform sampling: fz = fy = fx = 1/3 corresponds to η = ζ = 0

Geometric Representation

(Figure panels: Ξ (generalized sampling parameter) and Λ (generalized quality factor))

• “Allowed triangle”: the ordering fz ≥ fy ≥ fx bounds η and ζ

Review

• sampling tensor (Ω): Represents the sampling of the bond vectors along three axes of an arbitrary reference frame

• generalized sampling parameter (Ξ): Quantifies the degree of uniformity of the distribution of bond vectors

• average constant (Dav): Quantifies how well the tensor of interest, D, is sampled

• generalized quality factor (Λ): Describes how efficiently a set of bond vectors samples all elements of D; that is, how accurate is the tensor in general?

Questions so far?

Theory

• How are the quality factor (Λ) and the actual accuracy of the tensor D related?

– Take a correct tensor, introduce random errors into the principal values (εd) and orientation (εa) of D

– Correlate the size of the error with the decrease in quality factor (Λ)

• When the principal axes of Ω and D are related by the “magic angle,” the correlation with Λ breaks down (although Ξ is still accurate)

• Useful framework, but “contrived” and subject to errors

• Other approaches?

Outline

I. Motivation

II. Theory

III. Results

IV. Applications

V. Alternate Approaches

Results

• Sampling properties of known structures

– Survey using structures from the PDB

– 1736 structures (879 single proteins, 857 multi-subunit proteins)

– Represents all experimentally determined protein folds

– Structural basis for distributions

• Each structure (i.e. each set of bond vectors) corresponds to a point in the “allowed triangle” parametrized by {η, ζ}

Results

• Ξ: 0 optimal, 1 worst case

• Λ: 1 optimal, 0 worst case

Results

(Figure: ideal α-helix and ideal β-sheet, from Molecular Biology of the Cell: Fifth Edition)

Results

• Structural basis for distributions

• Different generalized sampling parameters in different secondary structures

generalized sampling parameter (Ξ): 0 optimal, 1 worst case

Results

Sampling distribution:

Results

• Observations:

– NH vectors are the least uniformly distributed

– C’O vectors are also non-uniform and correlated with NH: they are almost antiparallel to, and in the same peptide plane as, the NH vectors

– Reflects protein folding and secondary structure, e.g. N-H•••O=C hydrogen bonding in α-helices and β-sheets

Results

• α-helix, 3₁₀ helix: NH vectors are highly ordered; adding CαHα vectors improves sampling

• β-sheet: NH and CαHα vectors are both highly ordered; adding CαHα vectors will not improve sampling

generalized sampling parameter (Ξ): 0 optimal, 1 worst case


Outline

I. Motivation

II. Theory

III. Results

IV. Applications

V. Alternate Approaches

Applications

• What does determining these values (sampling parameter, quality factor) allow us to do?

• Characterize the accuracy of the tensor derived from experimental data

• Therefore, we can optimize experimental design

– Which vectors to use? Avoid limitations from the vector set

– Which aligning medium to use? Optimize sampling

Applications

• Example 1: Determine the rotational diffusion tensor from 15N relaxation data, using NH vectors

• βARK PH domain (PDB: 1BAK)

– all residues: Ξ = 0.0232, Λ = 0.9256, f = (0.4060, 0.3583, 0.2357)

– α-helical residues: Ξ = 0.7610, f = (0.9148, 0.0473, 0.0379)

– β-strand residues: Ξ = 0.1398, f = (0.5569, 0.3171, 0.1261)

• Note: NH vectors in β-strands are orthogonal to helical NH vectors

Applications

• Conclusion:

– With NH vectors, the α-helix alone is insufficient to fully characterize the tensor

• Solution:

– Use more vectors, or additional set(s) of vectors

– CαHα or CαC’ vectors

Applications

• Example 2: Determining Saupe alignment tensor from RDC measurements

• Ubiquitin in liquid-crystalline aligning medium; NH vectors

– Ξ = 0.1084, Λ = 0.7724, Λmin = 0.69

– The quality factor (Λ) changes if we change the alignment tensor frame (sampling tensor frame remains the same)

Prestegard et al. (2004).

The quality factor (Λ) depends on the orientation of the axes of the alignment tensor with respect to the sampling tensor frame

Applications

• Conclusion:

– Need to change the orientation of the alignment tensor

• Solution:

– Changing the orientation of the alignment will result in more optimal sampling, a higher quality factor (Λ), and a more accurate alignment tensor

– Experimentally: dope aligning medium with ions (circles), use different orienting medium (triangles), etc.

Prestegard et al. (2004).

The quality factor (Λ) depends on the orientation of the axes of the alignment tensor with respect to the sampling tensor frame

Applications

• Nuclear Vector Replacement (NVR)

– Assignment depends on the accuracy of the alignment tensor

• Back-calculate RDCs from bond vectors in the structural model: D = Dmax vᵀSv

– NH RDCs in two media (two alignment tensors)

• Minimalistic approach

• Saves spectrometer time (no 13C-labeling, no triple-resonance experiments)

• What about sampling of tensors? What about the distribution of NH bond vectors?

(Figure from lecture notes)

Langmead and Donald. (2004).
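
A minimal sketch of the back-calculation D = Dmax vᵀSv and of the SVD-based fit of S mentioned in the Motivation, assuming unit bond vectors and a symmetric, traceless S with five independent elements. This is not the NVR implementation. The function names are hypothetical, `d_max = 1.0` is a placeholder value, and `numpy.linalg.lstsq` (which solves the least-squares problem through an SVD) stands in for an explicit SVD.

```python
import numpy as np

def unit(vectors):
    v = np.asarray(vectors, dtype=float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def back_calculate(vectors, saupe, d_max=1.0):
    """RDC for each bond vector: D = d_max * v^T S v."""
    v = unit(vectors)
    return d_max * np.einsum("ni,ij,nj->n", v, saupe, v)

def fit_saupe(vectors, rdcs, d_max=1.0):
    """Least-squares fit of (Sxx, Syy, Sxy, Sxz, Syz); Szz follows from tracelessness."""
    x, y, z = unit(vectors).T
    design = np.column_stack([x * x - z * z, y * y - z * z,
                              2 * x * y, 2 * x * z, 2 * y * z])
    params, _, _, singular_values = np.linalg.lstsq(
        design, np.asarray(rdcs, dtype=float) / d_max, rcond=None)
    sxx, syy, sxy, sxz, syz = params
    saupe = np.array([[sxx, sxy, sxz],
                      [sxy, syy, syz],
                      [sxz, syz, -sxx - syy]])
    return saupe, singular_values

# Round trip on synthetic data: random directions and an arbitrary traceless tensor.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(20, 3))
true_s = np.array([[1e-4, 2e-4, 0.0],
                   [2e-4, 3e-4, -1e-4],
                   [0.0, -1e-4, -4e-4]])
fitted, singular_values = fit_saupe(vectors, back_calculate(vectors, true_s))
print(np.allclose(fitted, true_s), singular_values)
```

Small singular values of the design matrix indicate directions of S that the vector set constrains poorly, which is the least-squares view of the sampling problem discussed in these slides.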

Applications

• My project:

– Modify NVR: NH, CαHα RDCs in one medium

• More experiments, more expensive experiments

• But, allows testing on new systems

• Also, increased assignment accuracy?

– Grouping NH, CαHα bond vectors gives a more uniform distribution

– One alignment tensor instead of two; better sampled

– Alignment tensor more accurate, resulting in more accurate assignments?

– Better disambiguation of RDCs?

(Figure annotation: distribution more uniform, higher Λ)

Outline

I. Motivation

II. Theory

III. Results

IV. Applications

V. Alternate Approaches

Alternate Approaches

• How accurate is a tensor?

– This approach: concerned with the distribution of bond vectors used to determine the tensor

– Other approach: compare the estimated tensor to the correct tensor

Yan, A.K., Langmead, C.J., & Donald, B.R. (2005). A probability-based similarity measure for Saupe alignment tensors with applications to residual dipolar couplings in NMR structural biology. The International Journal of Robotics Research. 24(2-3): 165-182.

– Also suggested: assume uniform distribution of bond vectors, compare distribution of RDC values generated (RMSD)

Alternate Approaches

• Compare estimated Saupe matrix to correct Saupe matrix

– Upper bound on the probability that a randomly rotated tensor has error smaller than the estimated tensor

– Compare eigenvalues (compare axial and rhombic components of the tensor)

– Compare angular error between eigenvectors
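
A minimal sketch of the last two comparisons (eigenvalues and angular error between eigenvectors). It does not implement the probability bound of Yan et al. The function names are hypothetical, and ordering the principal axes by absolute eigenvalue is an assumption about the convention. Angles are only meaningful when the eigenvalues are well separated.

```python
import numpy as np

def principal_frame(tensor):
    """Eigenvalues and eigenvectors ordered so that |x| <= |y| <= |z|."""
    values, vectors = np.linalg.eigh(np.asarray(tensor, dtype=float))
    order = np.argsort(np.abs(values))
    return values[order], vectors[:, order]

def compare_tensors(estimated, reference):
    """Eigenvalue differences and per-axis angular errors in degrees."""
    est_values, est_axes = principal_frame(estimated)
    ref_values, ref_axes = principal_frame(reference)
    # Eigenvectors are defined only up to sign, so compare |cos| of the angle.
    cosines = np.clip(np.abs(np.sum(est_axes * ref_axes, axis=0)), 0.0, 1.0)
    return est_values - ref_values, np.degrees(np.arccos(cosines))

# Example: the estimate is the reference tensor rotated by 10 degrees about z.
reference = np.diag([1.0, 2.0, -3.0]) * 1e-4
c, s = np.cos(np.radians(10.0)), np.sin(np.radians(10.0))
rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
value_errors, angle_errors = compare_tensors(rotation @ reference @ rotation.T, reference)
print(value_errors, angle_errors)
```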

Summary

• How accurate is a second-rank tensor determined from bond vectors and other experimental data?

– How well is orientation space sampled?

– How well are components of the tensor sampled?

– How well is the tensor completely characterized?

• Sampling properties of bond vector sets in real proteins have a biological, structural basis

• Can use measurements to optimize experimental design

Thanks! Questions?

References:

Fushman, D., Ghose, R., & Cowburn, D. (2000). The effect of finite sampling on the determination of orientational properties: A theoretical treatment with application to interatomic vectors in proteins. J. Am. Chem. Soc. 122: 10640-10649.

Annila, A., & Permi, P. (2004). Weakly aligned biological macromolecules in dilute aqueous liquid crystals. Concepts in Magnetic Resonance. 23A(1): 22-37.

Langmead, C.J., & Donald, B.R. (2004). An expectation/maximization based nuclear vector replacement algorithm for automated NMR resonance assignments. Journal of Biomolecular NMR. 29: 111-138.

Losonczi, J.A., Andrec, M., Fischer, M.W.F., & Prestegard, J.H. (1999). Order matrix analysis of residual dipolar couplings using singular value decomposition. Journal of Magnetic Resonance. 138: 334-342.

Yan, A.K., Langmead, C.J., & Donald, B.R. (2005). A probability-based similarity measure for Saupe alignment tensors with applications to residual dipolar couplings in NMR structural biology. The International Journal of Robotics Research. 24(2-3): 165-182.

Lecture notes from Computer Science 260 at Duke University, Spring 2008. http://www.cs.duke.edu/brd/Teaching/Bio/asmb/current/