Molecular Modeling: Comparative Molecular Field Analysis (CoMFA) C372 Dr. Kelsey Forsythe.

CoMFA

• Cramer and Milne (1979)
  – Comparison of molecules by alignment and field generation
• Wold (1986)
  – Proposes using PLS instead of PCA for the overrepresented problem (thousands of non-orthogonal field "variables"), correlating field values with activities
• Cramer, Patterson and Bunce (1988)
  – Introduced CoMFA

CoMFA Assumptions

• Activity is directly related to structural properties of system

• Structural properties determined by non-bonding forces

Outline of CoMFA

• Hypothesize mechanism for binding
  – Structure of binding site
  – Most important/difficult
• Find equilibrium geometry
• Construct lattice or grid of points
• Compute interaction of probe with molecule at each point
• Apply PLS
• Predict

CoMFA Structural Focus

• Hypothesize mechanism for binding
  – Structure of active site and/or common pharmacophore between all compounds
  – Most important/difficult
  – Structural errors propagate to later stages
• Superpose structures
  – SEAL
    • Similarity index

CoMFA Structural Focus

[Figure: superposed structures, poor alignment vs. better alignment]

CoMFA Equilibrium Geometry

• Find equilibrium geometry
  – Ab initio, semi-empirical or molecular mechanics
  – Method depends on
    • Size
    • Accuracy

CoMFA Lattice Construction

• Construct lattice or grid of points for field analysis
  – Steroid (1 representative conformer shown): 14 x 11 x 7 = 1078 points
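The lattice step can be sketched in a few lines of Python. This is only an illustration, not the procedure of any particular CoMFA package: the function name `build_lattice`, the 2.0 Å spacing and the origin are assumptions; only the 14 x 11 x 7 point counts come from the slide.

```python
from itertools import product

def build_lattice(origin, counts, spacing):
    """Return a list of (x, y, z) points on a regular rectangular lattice."""
    ox, oy, oz = origin
    nx, ny, nz = counts
    return [
        (ox + i * spacing, oy + j * spacing, oz + k * spacing)
        for i, j, k in product(range(nx), range(ny), range(nz))
    ]

# Assumed values: origin at (0, 0, 0) and 2.0 Å between neighbouring points.
points = build_lattice(origin=(0.0, 0.0, 0.0), counts=(14, 11, 7), spacing=2.0)
print(len(points))  # 1078
```

In practice the box would be placed so that it encloses all aligned molecules with some margin; the origin used here is arbitrary.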

CoMFA Field Data Generation

• Compute interaction of probe with molecule at each point
  – Interaction is typically non-covalent (e.g. non-bonding forces)
    • Steric, electrostatic and hydrophobic
  – Probe depends on interaction
    • Kim et al.
      – H+ (electrostatic)
      – CH3 (steric)
      – H2O (hydrophobic)

CoMFA Field Data Generation

• Compute interaction of probe with molecule at each point
  – $N_{calc} = N_{grid} \times N_{cmpds} \times N_{probes}$
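A rough sketch of how one steric and one electrostatic value might be computed at a grid point, assuming a 12-6 Lennard-Jones term for the steric field and a Coulomb term for the electrostatic field. The function name `probe_fields`, the per-atom parameters, the 30 kcal/mol cap and the toy coordinates are all illustrative assumptions, not values taken from the slides.

```python
import math

COULOMB_K = 332.06  # converts charge products in e^2/Å to kcal/mol

def probe_fields(grid_point, atoms, probe_charge=1.0, cap=30.0):
    """Return (steric, electrostatic) probe energies at one grid point.

    atoms: iterable of (x, y, z, charge, eps, r_min) tuples, where eps and
    r_min are assumed probe-atom Lennard-Jones parameters.
    """
    steric = 0.0
    electrostatic = 0.0
    for x, y, z, charge, eps, r_min in atoms:
        r = max(math.dist(grid_point, (x, y, z)), 1e-6)  # avoid r = 0
        ratio6 = (r_min / r) ** 6
        steric += eps * (ratio6 * ratio6 - 2.0 * ratio6)  # 12-6 Lennard-Jones
        electrostatic += COULOMB_K * probe_charge * charge / r  # Coulomb
    # Cap the energies so points inside the molecule do not dominate (assumed cap).
    steric = min(steric, cap)
    electrostatic = max(-cap, min(electrostatic, cap))
    return steric, electrostatic

# Hypothetical two-atom "molecule" and three grid points.
toy_molecule = [
    (0.0, 0.0, 0.0, -0.40, 0.10, 3.4),
    (1.2, 0.0, 0.0, 0.40, 0.10, 3.4),
]
toy_grid = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.5, 0.0, 0.0)]
fields = [probe_fields(p, toy_molecule) for p in toy_grid]

# Number of field evaluations: grid points x compounds x probes.
n_grid, n_cmpds, n_probes = len(toy_grid), 1, 2
n_calc = n_grid * n_cmpds * n_probes
print(n_calc)  # 6
```

The third grid point lies inside the toy molecule, so both of its values hit the cap; this is why real field tables are usually truncated before regression.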

Outline of CoMFA

• Apply PLS
  – Problem overrepresented in field variables/descriptors
  – Sieve most important field components (PCA)
  – Use in regression

QSAR/QSPR-Regression Types

• Partial Least Squares
  – Cross-validation determines number of descriptors/components to use
  – Derive equation
  – Use bootstrapping and t-test to test coefficients in QSAR regression


• Partial Least Squares (a.k.a. Projection to Latent Structures)
  – Regression of a regression
    • Provides insight into variation in the x's ($b_{ij}$'s, as in PCA) AND the y's ($a_i$'s)
  – The $t_i$'s are orthogonal
  – M = number of field points OR molecules, whichever is smaller

$$y = \sum_{i}^{N} a_i t_i$$

$$t_i = \sum_{j}^{M} b_{ij} x_j$$
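The two sums can be made concrete with a small NIPALS-style PLS sketch for a single activity column, written in plain Python. It is illustrative only: the function names and the toy 5 x 3 data set are made up, and the weights are computed on the deflated X at each step, so they correspond to the slide's coefficients only up to that deflation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center_columns(X):
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def pls1_nipals(X, y, n_components):
    """Return (scores, a, residual_y) for a one-response PLS model."""
    X = center_columns(X)
    y_mean = sum(y) / len(y)
    y = [v - y_mean for v in y]
    scores, a = [], []
    for _ in range(n_components):
        # Weights: x-direction with maximal covariance with y, unit length.
        w = [dot(col, y) for col in zip(*X)]
        norm = dot(w, w) ** 0.5
        w = [v / norm for v in w]
        t = [dot(row, w) for row in X]             # latent variable scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*X)]  # x-loadings
        a_i = dot(y, t) / tt                       # regression of y on t
        # Deflate X and y so the next component is orthogonal to this one.
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - a_i * ti for yi, ti in zip(y, t)]
        scores.append(t)
        a.append(a_i)
    return scores, a, y

# Hypothetical data: 5 compounds, 3 field values each.
X_toy = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 1], [5, 6, 4]]
y_toy = [1.0, 2.1, 3.9, 4.2, 6.0]
scores, a, residual = pls1_nipals(X_toy, y_toy, n_components=2)
print(dot(scores[0], scores[1]))  # close to zero: the scores are orthogonal
```

Because each component is extracted using the covariance with y, component generation and regression happen in the same loop, unlike a PCA step followed by a separate regression.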


• PLS is NOT MR or PCR in practice
  – PLS is MR with cross-validation
  – PLS is faster
    • Couples the target representation (QSAR generation) and component generation, while PCA and PCR are separate
• PLS is well suited to multivariate problems

CoMFA PLS Regression

• $S_{ij}$: field value for the jth probe at the ith grid point
• $c_{ij}$: regression weight for $S_{ij}$

$$\text{activity} = C + \sum_{i=1}^{N} \sum_{j=1}^{P} c_{ij} S_{ij}$$

3-D QSAR (CoMFA) Post-Qualifications

• Confidence in Regression
  – TSS: Total Sum of Squares
  – ESS: Explained Sum of Squares
  – RSS: Residual Sum of Squares

$$TSS = ESS + RSS$$

$$R^2 = \frac{ESS}{TSS} = \begin{cases} 1 & \text{(100\% explanation of data)} \\ 0 & \text{(no explanation of data)} \end{cases}$$

$$\sum_{i}^{N} (y_i - \bar{y})^2 = TSS, \qquad \sum_{i}^{N} (y_{calc,i} - \bar{y})^2 = ESS, \qquad \sum_{i}^{N} (y_i - y_{calc,i})^2 = RSS$$
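A short numerical check of the three sums of squares, assuming an ordinary least-squares line with an intercept (the case in which TSS = ESS + RSS holds exactly). The data and function names are hypothetical and serve only to illustrate the definitions.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def sums_of_squares(y_obs, y_calc):
    """Return (TSS, ESS, RSS) as defined on the slide."""
    y_bar = sum(y_obs) / len(y_obs)
    tss = sum((y - y_bar) ** 2 for y in y_obs)
    ess = sum((yc - y_bar) ** 2 for yc in y_calc)
    rss = sum((y - yc) ** 2 for y, yc in zip(y_obs, y_calc))
    return tss, ess, rss

# Hypothetical descriptor values and activities.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]
slope, intercept = fit_line(x, y)
y_calc = [slope * xi + intercept for xi in x]

tss, ess, rss = sums_of_squares(y, y_calc)
r2 = ess / tss
print(round(r2, 3))
```

For models that are not least-squares fits with an intercept, the identity need not hold, and R² is then often computed as 1 - RSS/TSS instead.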


• Cross-validation
• Bootstrapping
• Reassign ‘wrong’ activity


• Standard Deviation in Error Prediction (SDEP)
  – N: number of observations
  – No penalty for exclusion/inclusion of latent variables

$$SDEP = \sqrt{\frac{PRESS}{N}}, \qquad PRESS = \sum_{i=1}^{N} (y_{obs,i} - y_{calc,i})^2$$


• Standard Deviation in Predictions ($s_{PRESS}$)
  – PRESS: Predictive Error Sum of Squares
  – N: number of observations
  – c: number of latent variables used in regression
• Choose c such that adding one more latent variable (c + 1) would lower $s_{PRESS}$ by less than 5%

$$s_{PRESS} = \sqrt{\frac{PRESS}{N - c - 1}}, \qquad PRESS = \sum_{i=1}^{N} (y_{obs,i} - y_{calc,i})^2$$
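The two error measures and the 5% rule can be sketched as follows. This is one possible reading of the rule (keep adding latent variables while each addition lowers sPRESS by at least 5%); the function names and the PRESS values are hypothetical.

```python
import math

def press(y_obs, y_pred):
    """Predictive error sum of squares."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))

def sdep(press_value, n):
    """Standard deviation of error of prediction; ignores model size."""
    return math.sqrt(press_value / n)

def s_press(press_value, n, c):
    """Like SDEP, but penalised for the c latent variables in the model."""
    return math.sqrt(press_value / (n - c - 1))

def choose_n_components(press_by_c, n, min_gain=0.05):
    """press_by_c[c - 1] holds the cross-validated PRESS with c latent variables."""
    c = 1
    while c < len(press_by_c):
        current = s_press(press_by_c[c - 1], n, c)
        candidate = s_press(press_by_c[c], n, c + 1)
        if (current - candidate) / current < min_gain:
            break  # the extra latent variable is not worth it
        c += 1
    return c

# Hypothetical PRESS values for models with 1..4 latent variables, 30 compounds.
press_by_c = [12.0, 8.0, 7.6, 7.5]
best_c = choose_n_components(press_by_c, n=30)
print(best_c)  # 2
```

Here the second latent variable lowers sPRESS by about 17%, while the third lowers it by under 1%, so the sketch stops at two.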

3-D QSAR (CoMFA) Post-Qualification

• Randomly re-assign activities to compounds
• Compare predictability of ‘wrong’ regressions with true regression
  – Determine random correlation
  – Determine efficacy of ‘true’ regression
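A minimal sketch of this randomization test. To keep it short, a single-descriptor least-squares fit (scored by the squared correlation) stands in for the full PLS model; the data, the function names, the number of trials and the random seed are all assumptions.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_trials=200, seed=0):
    """Compare the true R^2 with R^2 values from scrambled activities."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    scrambled = []
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)  # re-assign activities to the wrong compounds
        scrambled.append(r_squared(x, y_perm))
    # Fraction of 'wrong' models that do at least as well as the true one.
    fraction = sum(r2 >= true_r2 for r2 in scrambled) / n_trials
    return true_r2, scrambled, fraction

# Hypothetical descriptor values and activities for 8 compounds.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0]
true_r2, scrambled, fraction = y_randomization(x, y)
print(round(true_r2, 3), round(sum(scrambled) / len(scrambled), 3))
```

If the scrambled models scored about as well as the true one, the original correlation would be suspect; a true score far above the scrambled distribution supports the regression.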

3-D QSAR (CoMFA) Dependencies

• Active compounds in data set
• Grid size
• Energy model
• Probe groups (# and type)

Application

Nilsson, J., De Jong, S., Smilde, A. K. Multiway Calibration in 3D QSAR. J. Chemometrics 1997, 11, 511-524.

• Multilinear PLS applied to a group of benzamides interacting with the dopamine D3 receptor subtype (anti-schizophrenia drugs)

Application

• Aligned set of 30 benzamides and naphthamides

• Regions indicate principal components

Application Field Generation

• 5 Modes
  – Molecular (1)
    • 30 molecules
  – Field (3)
    • X, Y and Z
  – Probes (1)
    • Steric (C)
    • Hydrophobic (H2O)
    • Electrostatic (H+)

Application Pre-Qualifications

• Scaling (Not Applied Here)
  – Unit Variance (Auto Scaling)
  – Ensures equal statistical weights (initially)

$$x_i' = \frac{x_i}{\sigma}, \qquad \sigma' = 1$$

• Mean Centering

$$x_i' = x_i - \bar{x}, \qquad \bar{x}' = 0$$
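Both pre-treatments are one-liners per descriptor column. The sketch below uses Python's `statistics` module; the function names, the choice of the population standard deviation and the sample column are assumptions made for illustration.

```python
import statistics

def mean_center(column):
    """Subtract the column mean so the new mean is 0."""
    m = statistics.fmean(column)
    return [x - m for x in column]

def unit_variance(column):
    """Divide by the standard deviation so the new standard deviation is 1."""
    s = statistics.pstdev(column)  # population form; the sample form is also used
    return [x / s for x in column]

def autoscale(column):
    """Mean centering followed by unit-variance scaling."""
    return unit_variance(mean_center(column))

# Hypothetical field values at one grid point across 8 compounds.
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
centered = mean_center(column)
scaled = unit_variance(column)
print(statistics.fmean(centered), statistics.pstdev(scaled))  # 0.0 1.0
```

A column with zero variance (for example a grid point far from every molecule) cannot be scaled this way and would have to be dropped first.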

Application Principal Components

• First 4 PCs in space of original descriptors

Application Regression

• X: Principal Components
• b: Regression coefficients

$$\vec{y} = \vec{X}^{(0)} \vec{b}_{PLS}$$

Application Steric Plot

[Figure: steric plot]

• $Y = x_1 b_1 + \dots + x_i b_i$

• Guide placement of substituents on novel compounds depending on the value of Y (log(Ki)) desired

Application Validation

• Cross Validation
  – Leave-One-Out ($y_{pred}$ from the remaining 29 compounds)
• External Predictions
  – Test set of 21 compounds ($y_{pred}$ from the regression)

$$Q^2 = \left[ 1 - \left( \frac{\sum_{i=1} (y_i - y_{i,pred})^2}{\sum_{i=1} (y_i - \bar{y})^2} \right) \right] \times 100$$
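The Q² expression can be evaluated for leave-one-out predictions as sketched below. A one-descriptor least-squares line stands in for the PLS model, and the data and function names are hypothetical; the point is only to show where each prediction comes from.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def q2_percent(y_obs, y_pred):
    """Q^2 in percent: [1 - PRESS / sum of squares about the mean] x 100."""
    y_bar = sum(y_obs) / len(y_obs)
    press = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))
    ss = sum((y - y_bar) ** 2 for y in y_obs)
    return (1.0 - press / ss) * 100.0

def loo_predictions(x, y):
    """Predict each compound from a model fitted to all the others."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds

# Hypothetical descriptor values and activities for 6 compounds.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]

q2_loo = q2_percent(y, loo_predictions(x, y))
slope, intercept = fit_line(x, y)
q2_fit = q2_percent(y, [slope * xi + intercept for xi in x])
print(round(q2_loo, 1), round(q2_fit, 1))
```

The leave-one-out value is never higher than the value obtained from the fitted data, which is why it is the more honest measure of predictive ability. For an external test set, the same `q2_percent` function would be applied to the test compounds with predictions from the final regression.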

Application Theory vs. Experiment

[Figure: theory vs. experiment]

3-D QSAR (CoMFA) Potent Pitfalls

• Sensitivity to binding structure
• Hydrophobicity not well-quantified
• Sensitivity to $N_{latent}$
• Relation between latent variables NOT intuitive

• Test compounds should not differ significantly in properties from training set

• Low S/N (too many useless field variables)

CoMFA Assumptions

• Activity is directly related to structural properties of system
  – Dynamical corrections?
• Structural properties determined by non-bonding forces
  – Covalent
  – Hydrophobic

Advanced CoMFA

• SRD (Smart Region Definition)
  – A LOCAL set of variables/grid values will display similar behavior due to structural changes
  – Reduce M grid points to one focal point or seed
  – Use a “distance” cutoff (nearest, next nearest, etc.) to define the reduced set of field points
• Reduced PLS
  – Use only high-weight PCs in regression

Other QSAR-based Methods

• HQSAR
  – Convert 3D --> 2D string
  – Generate random collections of string elements
• CoMSIA (Comparative Molecular Similarity Indices Analysis)
  – $W_{probe,k}$ = +1 (charge), +1 (hydrophobicity), 1 Å (radius), +1 (H-bond acceptor), +1 (H-bond donor)

$$A_{F,K}^{q}(j) = -\sum_{i}^{N_{atoms}} \sum_{k}^{M_{traits}} W_{probe,k} W_{i,k} e^{-\alpha r_{iq}^{2}}$$
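A small sketch of a CoMSIA-style similarity index for one property at one grid point, assuming a Gaussian dependence on the squared probe-atom distance as in the CoMSIA literature. The function name, the attenuation factor of 0.3, the property key and the toy atoms are assumptions for illustration only.

```python
import math

def comsia_index(grid_point, atoms, prop, w_probe=1.0, alpha=0.3):
    """Similarity index at one grid point for one property (e.g. 'charge').

    atoms: list of dicts with an 'xyz' tuple and a value for each property.
    """
    total = 0.0
    for atom in atoms:
        r2 = sum((g - a) ** 2 for g, a in zip(grid_point, atom["xyz"]))
        total += w_probe * atom[prop] * math.exp(-alpha * r2)  # Gaussian decay
    return -total

# Hypothetical two-atom fragment with partial charges.
atoms = [
    {"xyz": (0.0, 0.0, 0.0), "charge": 0.5},
    {"xyz": (1.5, 0.0, 0.0), "charge": -0.5},
]
near = comsia_index((0.0, 0.0, 0.0), atoms, "charge")
far = comsia_index((0.0, 6.0, 0.0), atoms, "charge")
print(near, far)
```

Unlike the Lennard-Jones and Coulomb forms, the Gaussian stays finite at zero distance, so no energy cutoff is needed at grid points close to or inside the molecule.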

References

• Cramer III, R. D., Patterson, D. E., Bunce, J. D. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 1988, 110, 5959-5967.

• Hansch, C. and Leo, A. Exploring QSAR: Fundamentals and Applications in Chemistry and Biology American Chemical Society (1995)

• Leach, Andrew R. Molecular Modelling: Principles and Applications Prentice Hall, New York (2001)

Additional Resources

• The QSAR and Modelling Society (http://www.pharma.ethz.ch/qsar)

• Quantitative Structure Activity Relationships (Journal)

• SYBYL-Molecular Modeling Software, 6.9, Tripos Incorporated, 1699 S. Hanley Rd. St. Louis, Mo. 63144, USA

• GRID, Goodford, P. J. Molecular Discovery Ltd, University of Oxford, England
