Molecular Modeling: Comparative Molecular Field Analysis (CoMFA), C372, Dr. Kelsey Forsythe
CoMFA
• Cramer and Milne (1979)
  – Comparison of molecules by alignment and field generation
• Wold (1986)
  – Proposes using PLS instead of PCA for the overrepresented problem (thousands of non-orthogonal field "variables")
  – Correlates field values with activities
• Cramer, Patterson and Bunce (1988)
  – Introduced CoMFA
CoMFA Assumptions
• Activity is directly related to structural properties of system
• Structural properties determined by non-bonding forces
Outline of CoMFA
• Hypothesize mechanism for binding
  – Structure of binding site
  – Most important/difficult
• Find equilibrium geometry
• Construct lattice or grid of points
• Compute interaction of probe with molecule at each point
• Apply PLS
• Predict
CoMFA Structural Focus
• Hypothesize mechanism for binding
  – Structure of active site and/or common pharmacophore between all compounds
  – Most important/difficult
  – Structural errors propagate to later stages
• Superpose structures
  – SEAL
    • Similarity index
CoMFA Structural Focus
[Figure: poor alignment vs. better alignment of the superposed structures]
CoMFA Equilibrium Geometry
• Find equilibrium geometry
  – Ab initio, semi-empirical or molecular mechanics
  – Choice of method depends on
    • Size
    • Accuracy
CoMFA Lattice Construction
• Construct lattice or grid of points for field analysis
  – Steroid (1 representative conformer shown): 14 x 11 x 7 = 1078 points
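The point count on this slide can be reproduced with a small lattice builder. This is a minimal sketch: the box limits and the 2 Å spacing are assumptions chosen so that the counts come out as 14 x 11 x 7, and `build_lattice` is a hypothetical helper name, not part of any CoMFA package.

```python
from itertools import product

def build_lattice(lo, hi, spacing):
    """Return (points, shape) for a regular grid covering the box [lo, hi]."""
    axes = []
    for a, b in zip(lo, hi):
        n = int(round((b - a) / spacing)) + 1      # number of points along this axis
        axes.append([a + k * spacing for k in range(n)])
    points = list(product(*axes))                  # every (x, y, z) combination
    shape = tuple(len(ax) for ax in axes)
    return points, shape

# Hypothetical box in angstroms (26 x 20 x 12) with an assumed 2 A spacing.
points, shape = build_lattice(lo=(-13.0, -10.0, -6.0), hi=(13.0, 10.0, 6.0), spacing=2.0)
print(shape, len(points))
```

Halving the spacing multiplies the number of points, and therefore the number of field variables, by roughly eight, which is why grid size shows up later as one of the dependencies of the method.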
CoMFA Field Data Generation
• Compute interaction of probe with molecule at each point
  – Interaction is typically non-covalent (e.g. non-bonding forces)
    • Steric, electrostatic and hydrophobic
  – Probe depends on interaction
    • Kim et al.
      – H+ (electrostatic)
      – CH3 (steric)
      – H2O (hydrophobic)
CoMFA Field Data Generation
• Compute interaction of probe with molecule at each point
$$N_{calc} = N_{grid} \times N_{cmpds} \times N_{probes}$$
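A rough sketch of what one field evaluation might look like, followed by the bookkeeping for the number of evaluations. The 6-12 Lennard-Jones and Coulomb forms, the parameter values (`eps`, `rmin`, `cutoff`), the toy two-atom molecule and the function name `probe_fields` are all illustrative assumptions; the slides do not specify the energy model. The counts 1078 and 30 are borrowed from the steroid grid and the benzamide application purely to show the arithmetic.

```python
import math

COULOMB_K = 332.06  # approx. kcal*angstrom/(mol*e^2); unit choice is an assumption

def probe_fields(point, atoms, probe_charge=1.0, eps=0.1, rmin=3.4, cutoff=30.0):
    """Steric (Lennard-Jones) and electrostatic (Coulomb) energy of a probe at one grid point.

    atoms is a list of (x, y, z, partial_charge) tuples.
    """
    steric = 0.0
    elec = 0.0
    for x, y, z, q in atoms:
        r = math.sqrt((point[0] - x) ** 2 + (point[1] - y) ** 2 + (point[2] - z) ** 2)
        r = max(r, 1e-3)                        # avoid division by zero on top of an atom
        s6 = (rmin / r) ** 6
        steric += eps * (s6 * s6 - 2.0 * s6)    # minimum of -eps at r = rmin
        elec += COULOMB_K * probe_charge * q / r
    # Truncate so that points inside the molecule do not dominate the regression.
    return min(steric, cutoff), max(-cutoff, min(elec, cutoff))

atoms = [(0.0, 0.0, 0.0, -0.4), (1.2, 0.0, 0.0, 0.4)]   # toy two-atom "molecule"
grid = [(x, 0.0, 0.0) for x in (-4.0, -2.0, 3.0, 5.0)]
fields = [probe_fields(p, atoms) for p in grid]

n_grid, n_cmpds, n_probes = 1078, 30, 3
n_calc = n_grid * n_cmpds * n_probes
print(n_calc)
```

Each compound therefore contributes one long row of field values (grid points times probes), which is the source of the "many more variables than compounds" problem that PLS is used for.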
Outline of CoMFA
• Apply PLS
  – Problem overrepresented in field variables/descriptors
  – Sieve most important field components (PCA)
  – Use in regression
QSAR/QSPR-Regression Types
• Partial Least Squares
  – Cross-validation determines number of descriptors/components to use
  – Derive equation
  – Use bootstrapping and t-test to test coefficients in QSAR regression
QSAR/QSPR-Regression Types
• Partial Least Squares (a.k.a. Projection to Latent Structures)
  – Regression of a regression
    • Provides insight into variation in the x's ($b_{ij}$'s, as in PCA) AND the y's ($a_i$'s)
  – The $t_i$'s are orthogonal
  – M = number of field points or number of molecules, whichever is smaller
$$y = \sum_{i}^{N} a_i t_i$$
$$t_i = \sum_{j}^{M} b_{ij} x_j$$
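The two sums above can be made concrete with a small single-response PLS written in plain Python. This is a sketch of the standard NIPALS PLS1 recipe, not necessarily the variant used in the course software; the function name `pls1` and the 4-molecule, 6-variable data set are made up for illustration. Note that each score vector is computed from the deflated descriptor matrix, so the weights map directly onto the $b_{ij}$ of the slide only for the first component.

```python
def pls1(X, y, n_components):
    """NIPALS PLS1 on mean-centered data. Returns (scores T, y-loadings a, mean of y)."""
    n, m = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(m)]
    ybar = sum(y) / n
    E = [[row[j] - xbar[j] for j in range(m)] for row in X]   # X residual
    f = [v - ybar for v in y]                                 # y residual
    T, a = [], []
    for _ in range(n_components):
        # Weight vector: direction in x-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]   # latent score t_i
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # x-loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y-loading a_i
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        T.append(t)
        a.append(q)
    return T, a, ybar

# Toy data: 4 molecules, 6 field variables (more variables than molecules).
X = [[1.0, 0.0, 2.0, 1.0, 0.5, 3.0],
     [0.0, 1.0, 1.0, 2.0, 1.5, 1.0],
     [2.0, 1.0, 0.0, 1.0, 2.5, 0.0],
     [1.0, 2.0, 1.0, 0.0, 0.5, 2.0]]
y = [1.0, 2.0, 3.0, 5.0]

T, a, ybar = pls1(X, y, n_components=3)
y_fit = [ybar + sum(a[k] * T[k][i] for k in range(len(a))) for i in range(len(y))]
print(y_fit)
```

With as many components as the data allow, the fit reproduces the training activities, which is exactly why the number of components has to be chosen by cross-validation rather than by goodness of fit.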
QSAR/QSPR-Regression Types
• PLS is NOT MR (multiple regression) or PCR (principal component regression) in practice
  – PLS is MR with cross-validation
  – PLS is faster
    • Couples the target representation (QSAR generation) and component generation, while PCA and PCR keep them separate
• PLS is well suited to multivariate problems
CoMFA PLS Regression
• $S_{ij}$: field value for the jth probe at the ith grid point
• $c_{ij}$: regression weight for $S_{ij}$
$$\text{activity} = C + \sum_{i=1}^{N} \sum_{j=1}^{P} c_{ij} S_{ij}$$
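Evaluating this expression is a plain double sum over grid points and probes. The sketch below uses invented numbers for `C`, `c` and `S` (3 grid points, 2 probes); `predict_activity` is a hypothetical helper name.

```python
def predict_activity(C, c, S):
    """activity = C + sum over grid points i and probes j of c[i][j] * S[i][j]."""
    return C + sum(c_ij * s_ij
                   for c_row, s_row in zip(c, S)
                   for c_ij, s_ij in zip(c_row, s_row))

# Rows are grid points (N = 3), columns are probes (P = 2); values are made up.
S = [[0.5, -1.0], [2.0, 0.0], [-0.3, 0.4]]
c = [[0.1, 0.2], [0.0, 0.5], [1.0, -1.0]]
activity = predict_activity(6.0, c, S)
print(activity)
```

A grid point whose weight is zero for every probe contributes nothing, so plotting the weights over the lattice shows which regions of space matter for activity.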
3-D QSAR (CoMFA) Post-Qualifications
• Confidence in regression
• TSS: Total Sum of Squares
• ESS: Explained Sum of Squares
• RSS: Residual Sum of Squares
$$TSS = ESS + RSS$$
$$R^2 = \frac{ESS}{TSS} = \begin{cases} 1 & \text{(100\% explanation of data)} \\ 0 & \text{(no explanation of data)} \end{cases}$$
$$TSS = \sum_{i}^{N} (y_i - \bar{y})^2, \qquad ESS = \sum_{i}^{N} (y_{calc,i} - \bar{y})^2, \qquad RSS = \sum_{i}^{N} (y_i - y_{calc,i})^2$$
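A short numerical check of the decomposition, assuming an ordinary least-squares line with an intercept as the model (the identity TSS = ESS + RSS relies on that). The data values and the helper names `fit_line` and `sums_of_squares` are illustrative.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return slope, ybar - slope * xbar

def sums_of_squares(y, y_calc):
    """Return (TSS, ESS, RSS) as defined on the slide."""
    ybar = sum(y) / len(y)
    tss = sum((v - ybar) ** 2 for v in y)
    ess = sum((c - ybar) ** 2 for c in y_calc)
    rss = sum((v - c) ** 2 for v, c in zip(y, y_calc))
    return tss, ess, rss

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]
slope, intercept = fit_line(x, y)
y_calc = [slope * v + intercept for v in x]

tss, ess, rss = sums_of_squares(y, y_calc)
r2 = ess / tss
print(tss, ess + rss, r2)
```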
3-D QSAR (CoMFA) Post-Qualifications
• Cross-validation
• Bootstrapping
• Reassign 'wrong' activity
3-D QSAR (CoMFA) Post-Qualifications
• Standard Deviation in Error Prediction (SDEP)
  – N: number of observations
• No penalty for exclusion/inclusion of latent variables
$$SDEP = \sqrt{\frac{PRESS}{N}}, \qquad PRESS = \sum_{i=1}^{N} (y_{obs,i} - y_{calc,i})^2$$
3-D QSAR (CoMFA) Post-Qualifications
• Standard Deviation in Predictions (sPRESS)
• PRESS: Predictive Error Sum of Squares
• N: number of observations
• c: number of latent variables used in regression
• Want the smallest 'c' such that adding one more component (c + 1) no longer gives at least a 5% decrease in sPRESS
$$s_{PRESS} = \sqrt{\frac{PRESS}{N - c - 1}}, \qquad PRESS = \sum_{i=1}^{N} (y_{obs,i} - y_{calc,i})^2$$
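The two error measures and the 5% rule can be sketched as follows. The square roots follow the usual definitions of SDEP and sPRESS and are an assumption here, as is the reading of the 5% rule as "stop adding components once the gain drops below 5%". The sPRESS values per component count are invented, and `choose_n_components` is a hypothetical helper.

```python
import math

def press(y_obs, y_calc):
    """Predictive error sum of squares."""
    return sum((o - c) ** 2 for o, c in zip(y_obs, y_calc))

def sdep(y_obs, y_calc):
    """No penalty for the number of latent variables."""
    return math.sqrt(press(y_obs, y_calc) / len(y_obs))

def s_press(y_obs, y_calc, c):
    """Penalized by the number of latent variables c."""
    return math.sqrt(press(y_obs, y_calc) / (len(y_obs) - c - 1))

def choose_n_components(s_press_by_c, min_gain=0.05):
    """s_press_by_c[0] is sPRESS for c = 1, s_press_by_c[1] for c = 2, and so on.

    Returns the smallest c for which going to c + 1 lowers sPRESS by less than min_gain.
    """
    for idx in range(len(s_press_by_c) - 1):
        if s_press_by_c[idx + 1] > s_press_by_c[idx] * (1.0 - min_gain):
            return idx + 1
    return len(s_press_by_c)

y_obs = [1.0, 2.0, 3.0, 4.0]
y_calc = [1.5, 2.0, 2.5, 4.0]
print(press(y_obs, y_calc), sdep(y_obs, y_calc), s_press(y_obs, y_calc, c=1))

best_c = choose_n_components([1.00, 0.80, 0.70, 0.69, 0.72])
print(best_c)
```

Because of the N - c - 1 denominator, sPRESS can rise when a component is added even if PRESS itself falls slightly, which is the penalty that SDEP lacks.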
3-D QSAR (CoMFA) Post-Qualification
• Randomly re-assign activities to compounds
• Compare predictability of 'wrong' regressions with true regression
  – Determine random correlation
  – Determine efficacy of 'true' regression
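A minimal sketch of this scrambling test, using a one-descriptor linear fit as a stand-in for the full PLS model (an assumption made to keep the example short). The data, the number of trials and the names `r_squared` and `y_randomization` are illustrative.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_trials=200, seed=0):
    """Compare the true R^2 with R^2 values obtained after shuffling the activities."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    scrambled = []
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)                      # 'wrong' activity assignment
        scrambled.append(r_squared(x, y_perm))
    frac_as_good = sum(1 for v in scrambled if v >= true_r2) / n_trials
    return true_r2, scrambled, frac_as_good

x = [float(i) for i in range(12)]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.1, 0.2, -0.15, 0.05, 0.0, -0.1]
y = [2.0 * xi + e for xi, e in zip(x, noise)]

true_r2, scrambled, frac_as_good = y_randomization(x, y)
print(true_r2, sum(scrambled) / len(scrambled), frac_as_good)
```

If the scrambled regressions score nearly as well as the true one, the apparent correlation is probably chance; a large gap supports the 'true' regression.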
3-D QSAR (CoMFA) Dependencies
• Active compounds in data set
• Grid size
• Energy model
• Probe groups (# and type)
Application
Nilsson, J., De Jong, S., Smilde, A. K. Multiway Calibration in 3D QSAR. J. Chemometrics 1997, 11, 511-524.
• Multilinear PLS applied to a group of benzamides interacting with the dopamine D3 receptor subtype (anti-schizophrenia drugs)
Application
• Aligned set of 30 benzamides and naphthamides
• Regions indicate principal components
Application Field Generation
• 5 Modes
  – Molecular (1)
    • 30 molecules
  – Field (3)
    • X, Y and Z
  – Probes (1)
    • Steric (C)
    • Hydrophobic (H2O)
    • Electrostatic (H+)
Application Pre-Qualifications
• Scaling (not applied here)
  – Unit variance (auto scaling)
  – Ensures equal statistical weights (initially)
$$x_i' = \frac{x_i}{\sigma}, \qquad \sigma' = 1$$
• Mean centering
$$x_i' = x_i - \bar{x}, \qquad \bar{x}' = 0$$
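Both transformations act on one descriptor column at a time. The sketch below checks the two stated properties (unit standard deviation after scaling, zero mean after centering); using the population standard deviation is an assumption, and the column values are invented.

```python
import statistics

def unit_variance(column):
    """x_i' = x_i / sigma, so the scaled column has standard deviation 1."""
    sigma = statistics.pstdev(column)
    return [v / sigma for v in column]

def mean_center(column):
    """x_i' = x_i - mean, so the centered column has mean 0."""
    xbar = statistics.fmean(column)
    return [v - xbar for v in column]

column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # one made-up field variable
scaled = unit_variance(column)
centered = mean_center(column)
print(statistics.pstdev(scaled), statistics.fmean(centered))
```

Scaling gives a near-constant field variable the same initial weight as a strongly varying one, which is presumably why it was not applied to the raw fields in this application.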
Application Principal Components
• First 4 PCs in space of original descriptors
Application Regression
• X: principal components
• b: regression coefficients
$$\vec{y} = \vec{X}(0)\,\vec{b}_{PLS}$$
Application Steric Plot
[Figure: steric plot]
• $Y = x_1 b_1 + \dots + x_i b_i$
• Guide placement of substituents on novel compounds depending on the desired value of Y (log(Ki))
Application Validation
• Cross validation
  – Leave-one-out
• External predictions
  – Test set
  – 21 compounds
$$Q^2 = \left[ 1 - \left( \frac{\sum_{i=1} (y_i - y_{i,pred})^2}{\sum_{i=1} (y_i - \bar{y})^2} \right) \right] \times 100$$
Application Validation
• Cross validation
  – Leave-one-out ($y_{pred}$ from the other 29 compounds)
• External predictions
  – Test set ($y_{pred}$ from the regression)
$$Q^2 = \left[ 1 - \left( \frac{\sum_{i=1} (y_i - y_{i,pred})^2}{\sum_{i=1} (y_i - \bar{y})^2} \right) \right] \times 100$$
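A leave-one-out Q² can be sketched with a one-descriptor line standing in for the PLS model (an assumption for brevity). Each compound is predicted from a fit to all the others, mirroring "y_pred from the other 29"; the six data points and the helper names are invented.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return slope, ybar - slope * xbar

def loo_predictions(x, y):
    """Predict each y[i] from a model fitted without compound i."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        preds.append(slope * x[i] + intercept)
    return preds

def q2_percent(y, y_pred):
    """Q^2 = [1 - sum (y - y_pred)^2 / sum (y - mean)^2] x 100."""
    ybar = sum(y) / len(y)
    press = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    ss = sum((a - ybar) ** 2 for a in y)
    return (1.0 - press / ss) * 100.0

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]
q2 = q2_percent(y, loo_predictions(x, y))
print(q2)
```

For the external test set the same `q2_percent` formula applies, with `y_pred` coming from the single regression built on the training compounds. Unlike R², Q² can be negative when the predictions are worse than simply guessing the mean.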
Application Theory vs. Experiment
3-D QSAR (CoMFA) Potential Pitfalls
• Sensitivity to binding structure
• Hydrophobicity not well quantified
• Sensitivity to N_latent
• Relation between latent variables NOT intuitive
• Test compounds should not differ significantly in properties from training set
• Low S/N (too many useless field variables)
CoMFA Assumptions
• Activity is directly related to structural properties of system
  – Dynamical corrections?
• Structural properties determined by non-bonding forces
  – Covalent
  – Hydrophobic
Advanced CoMFA
• SRD (Smart Region Definition)
  – A LOCAL set of variables/grid values will display similar behavior due to structural changes
  – Reduce M grid points to one focal point or seed
  – Use "distance" cutoff (nearest, next nearest etc.) to define reduced set of field points
• Reduced PLS
  – Use only high-weight PCs in regression
Other QSAR-based Methods
• HQSAR
  – Convert 3D --> 2D string
  – Generate random collections of string elements
• CoMSIA (Comparative Molecular Similarity Indices Analysis)
• $W_{probe,k}$ = +1 (charge), +1 (hydrophobicity), 1 Å (radius), +1 (h-bond acceptor), +1 (h-bond donor)
$$A_{F,K}^{q}(j) = -\sum_{i}^{N_{atoms}} \sum_{k}^{M_{traits}} W_{probe,k}\, W_{i,k}\, e^{-\alpha r_{iq}}$$
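The similarity index at one grid point q can be sketched directly from the expression as written on the slide, with r_iq taken as the distance between atom i and the grid point. The attenuation factor `alpha`, the trait names, the atom data and the function name `comsia_index` are illustrative assumptions; the published CoMSIA formulation should be checked for the exact distance dependence before using this for real work.

```python
import math

def comsia_index(grid_point, atoms, w_probe, alpha=0.3):
    """A = - sum_i sum_k w_probe[k] * w_atom[i][k] * exp(-alpha * r_iq).

    atoms is a list of (coords, traits) where traits maps a trait name to W_{i,k}.
    """
    total = 0.0
    for coords, traits in atoms:
        r_iq = math.sqrt(sum((a - b) ** 2 for a, b in zip(coords, grid_point)))
        decay = math.exp(-alpha * r_iq)          # smooth fall-off, no singularity at r = 0
        for k, w_pk in w_probe.items():
            total += w_pk * traits.get(k, 0.0) * decay
    return -total

# Probe weights from the slide (the 1 A radius is not used in this simplified sketch).
w_probe = {"charge": 1.0, "hydrophobicity": 1.0, "hb_acceptor": 1.0, "hb_donor": 1.0}

atoms = [((0.0, 0.0, 0.0), {"charge": -0.4, "hb_acceptor": 1.0}),
         ((1.5, 0.0, 0.0), {"charge": 0.4, "hydrophobicity": 0.6})]
index = comsia_index((0.0, 2.0, 0.0), atoms, w_probe)
print(index)
```

Because the exponential stays finite at zero distance, the index can be evaluated at grid points inside the molecule without the cutoffs needed for Lennard-Jones and Coulomb fields.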
References
• Cramer III, R. D., Patterson, D. E., Bunce, J. D. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 1988, 110, 5959-5967.
• Hansch, C., Leo, A. Exploring QSAR: Fundamentals and Applications in Chemistry and Biology. American Chemical Society, 1995.
• Leach, A. R. Molecular Modelling: Principles and Applications. Prentice Hall, New York, 2001.
Additional Resources
• The QSAR and Modelling Society (http://www.pharma.ethz.ch/qsar)
• Quantitative Structure Activity Relationships (Journal)
• SYBYL-Molecular Modeling Software, 6.9, Tripos Incorporated, 1699 S. Hanley Rd. St. Louis, Mo. 63144, USA
• GRID, Goodford, P. J. Molecular Discovery Ltd, University of Oxford, England