Richard Cramer 2014 euro QSAR presentation

Might Template CoMFA Integrate Structure-Based and Ligand-Based Design? Some Remarkable Predictions Richard D. Cramer ([email protected]) EuroQSAR 2014, St. Petersburg, Russian Federation September 4, 2014

Transcript of Richard Cramer 2014 euro QSAR presentation

Page 1: Richard Cramer 2014 euro QSAR presentation

Might Template CoMFA Integrate Structure-Based and Ligand-Based Design? Some Remarkable Predictions

Richard D. Cramer ([email protected])
EuroQSAR 2014, St. Petersburg, Russian Federation
September 4, 2014

Page 2: Richard Cramer 2014 euro QSAR presentation

© Copyright 2014 Certara, L.P. All rights reserved.

True Predictions: for a random half of ChEMBL’s facXa data

SDEP = 1.14, n = 1907

.. from a single CoMFA model

Page 3: Richard Cramer 2014 euro QSAR presentation

Diversity of ChEMBL factorXa structures

1339 “reduced” Bemis-Murcko skeletons

267 publications

>100 assay protocols

Twenty random structures

Page 4: Richard Cramer 2014 euro QSAR presentation

Template CoMFA: automated general 3D-QSAR setup

2D Training Set + 3D templates → Template CoMFA → Aligned (3D) Training Set → CoMFA model

2D Test Set → Template CoMFA → CoMFA model → Predictions!

By using aligned 3D structures (from X-ray, pharmacophore, ??) as templates

Cramer, R. D.; Wendt, B. J. Chem. Inf. Model. 2014, 54, 660-671.

Page 5: Richard Cramer 2014 euro QSAR presentation

The experimental data available for factor Xa

From ChEMBL:
Goal: Predict ChEMBL (3900+ usable)

From bindingdb:
12 templates (.pdb references)
270 training SAR (analogs of the templates)

Page 6: Richard Cramer 2014 euro QSAR presentation

SDEP = 1.14, n = 1907

Factor Xa (training set = random half of ChEMBL)

Training Set

bindingDB: 12 .pdb

Test Set

ChEMBL

Model: q2 = .381 / SDEP = 1.15

Predictions’ SDEP == model’s SDEP!
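
The slides do not define these statistics. The conventional 3D-QSAR definitions are assumed here: SDEP is the root-mean-square error of prediction, and q2 is the cross-validated analogue of r2, where y^cv is the value predicted for a compound while it is left out of the model.

```latex
\mathrm{SDEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right)^2},
\qquad
q^2 = 1 - \frac{\sum_{i}\left(y_i^{\mathrm{obs}} - y_i^{\mathrm{cv}}\right)^2}{\sum_{i}\left(y_i^{\mathrm{obs}} - \bar{y}\right)^2}
```

Under these definitions, the comparison on this slide is between the error on the 1907 unseen ChEMBL structures (1.14) and the model's own cross-validated error (1.15).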

Page 7: Richard Cramer 2014 euro QSAR presentation

SDEP = 1.14, n = 1907

What’s going on?

Training Set

bindingDB: 12 .pdb

Test Set

ChEMBL

Model: q2 = .381 / SDEP = 1.15

This is the surprise!

Not surprising ..

Representative of “all small mol space”?

Page 8: Richard Cramer 2014 euro QSAR presentation

More about this remarkable prediction result ..

• Other biological targets?
• The “why” and “how” of such results
• Drill-down on this factorXa result
• Toward its possible applications
• Other attributes of template CoMFA

Page 9: Richard Cramer 2014 euro QSAR presentation

Template CoMFA had succeeded on ~90% of 114 targets

Includes: all 74 targets in bindingdb.org referencing more than one .pdb

Cramer, R. D. J. Chem. Inf. Model. 2014, 54, 2147–2156.

Page 10: Richard Cramer 2014 euro QSAR presentation

Another “all-ChEMBL” target: Checkpoint kinase 1 predictions

SDEP=1.14

But a third target, carbonic anhydrase II, did not work (no model from ChEMBL)

Page 11: Richard Cramer 2014 euro QSAR presentation

Fundamentals of Template CoMFA

• Theory underlying 3D-QSAR: The primary cause of potency differences among (non-covalently acting) ligands is steric and electrostatic field differences.

• Empirical Observations: When the goal is an informative comparison of ligand field differences, increasing ligand shape similarity seems at least as productive as increasing physicochemical precision (by, e.g., docking).

• Concept: Template CoMFA seeks ligand shape similarity by:
– “Copying” coordinates from any atom within any template ligand that “best matches” a candidate’s (training or test set) atom
– Using the topomer protocol to generate coordinates for the still remaining “non-matching” atoms

Page 12: Richard Cramer 2014 euro QSAR presentation

Many templates – how is the one “best match” chosen?

• “Best match” == best “anchor bond” pairing
– The best anchor bond pairing maximizes the size of anchor-bond-rooted branches having “similar” atoms
– To be considered, a possible anchor bond must be “interesting”
– The “best match” search is exhaustive
  • Every interesting bond in the candidate
  • Vs every template
  • Vs every interesting bond in each template
– “Atom similarity” considers types, properties, topological locations
– The actual alignment (“coordinate copying”) then begins by overlaying the chosen “candidate” anchor onto the chosen template anchor bond
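
The exhaustive search described above can be sketched as three nested loops. This is only an illustration under stated assumptions: the bond representation (an id plus a tuple of atom types along one branch), the names `toy_branch_score` and `best_anchor_pairing`, and the scoring rule are all hypothetical stand-ins. The actual atom-similarity measure also considers properties and topological locations, which this toy score does not.

```python
def toy_branch_score(cand_branch, tmpl_branch):
    """Hypothetical score: length of the shared leading run of identical atom types."""
    n = 0
    for a, b in zip(cand_branch, tmpl_branch):
        if a != b:
            break
        n += 1
    return n


def best_anchor_pairing(candidate_bonds, templates, score=toy_branch_score):
    """Compare every candidate bond with every bond of every template."""
    best = None
    for cand_id, cand_branch in candidate_bonds:
        for tmpl_name, tmpl_bonds in templates.items():
            for tmpl_id, tmpl_branch in tmpl_bonds:
                s = score(cand_branch, tmpl_branch)
                # Keep the highest-scoring (candidate bond, template, template bond) triple
                if best is None or s > best[0]:
                    best = (s, cand_id, tmpl_name, tmpl_id)
    return best


# Made-up "interesting" bonds: (bond id, atom types along the rooted branch)
candidate_bonds = [("c1", ("C", "N", "C", "O")), ("c2", ("C", "C"))]
templates = {
    "T1": [("t1a", ("C", "N", "C", "S")), ("t1b", ("O",))],
    "T2": [("t2a", ("C", "N", "C", "O"))],
}
best = best_anchor_pairing(candidate_bonds, templates)
print(best)  # (4, 'c1', 'T2', 't2a')
```

The cost grows with (candidate bonds) × (templates) × (bonds per template), which stays small for a dozen templates.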

Page 13: Richard Cramer 2014 euro QSAR presentation

Factor Xa inhibitors having .pdb references in bindingdb (2D)

Page 14: Richard Cramer 2014 euro QSAR presentation

The only 3D info: the 12 overlaid factor Xa templates

Page 15: Richard Cramer 2014 euro QSAR presentation

.pdb Template #10 and its ChEMBL “homologues” (2D)

Page 16: Richard Cramer 2014 euro QSAR presentation

.pdb Template #10 and its ChEMBL “homologues” (3D)

Page 17: Richard Cramer 2014 euro QSAR presentation

Template #10 and some of its ChEMBL Non-homologues (2D)

Page 18: Richard Cramer 2014 euro QSAR presentation

Template #10 and some of its ChEMBL Non-homologues (3D)

Page 19: Richard Cramer 2014 euro QSAR presentation

Combined Template CoMFA Alignments and Contours

Page 20: Richard Cramer 2014 euro QSAR presentation

Integration of Structure- and Ligand-based Design (chk1)

Receptor pocket surfaces

Steric Contours Electrostatic Contours

Color coded by electrostatic potential

Page 21: Richard Cramer 2014 euro QSAR presentation

What might this capability be good for?

1. Are so many training set structures needed?
2. Is there any way to put confidence limits around an individual prediction?
3. But crystal structures are not available for many important biological targets?

Page 22: Richard Cramer 2014 euro QSAR presentation

1. Much smaller (random) training sets do seem useful

                             CoMFA stats (nTrng)
nTrng  nPred  ratio  SDEP    #Cmp  q2     SDEP  r2     s
1908   1907   1x     1.14    9     0.405  1.13  0.630  0.89
954    2861   3x     1.22    12    0.337  1.20  0.796  0.67
477    3338   7x     1.30    10    0.336  1.22  0.856  0.57
239    3565   15x    1.30    2     0.190  1.31  0.522  1.00
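
A minimal sketch of how such random training subsets and the “ratio” column might be produced. The helper `random_split`, the seed, and the integer ids are assumptions for illustration; the slides do not say how the random draws were made. The ratio is taken here to be nPred / nTrng, rounded.

```python
import random


def random_split(ids, n_train, seed=0):
    """Return (training, prediction) lists from a seeded random draw."""
    rng = random.Random(seed)
    chosen = set(rng.sample(ids, n_train))
    train = [i for i in ids if i in chosen]
    pred = [i for i in ids if i not in chosen]
    return train, pred


# (nTrng, nPred) pairs as listed in the table
rows = [(1908, 1907), (954, 2861), (477, 3338), (239, 3565)]
ratios = [round(n_pred / n_trng) for n_trng, n_pred in rows]
print(ratios)  # [1, 3, 7, 15]

# Example: a one-quarter training set drawn from 3815 structure ids
train, pred = random_split(list(range(3815)), 954)
print(len(train), len(pred))  # 954 2861
```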

Page 23: Richard Cramer 2014 euro QSAR presentation

2. Confidence limits can be tightened with similarity metrics

• Suppose the following project situation (using facXa)
– Actual pI50 > 8.65 (one SD > mean pI50) must be avoided
– Proposal: if predicted pI50 < 7.2 (mean pI50), no test needed
– What is the error rate (false negative)?

• Suppose a predicted pI50 is rejected if similarity to its CoMFA template is too low
– Then what is the error rate?

                   Pred pI50 <7.2   Obsvd pI50 >8.65   False Negative   % False Negative
All structures     1907             629                185              29.4
& topdiff <200     168              98                 8                8.2
& mathv > 0.65     339              348                26               7.5
& asim > .999      694              306                56               18.3
& fgpt Tan >.75    98               93                 4                4.3
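
As a check on the arithmetic, the “% False Negative” column appears to be the false-negative count divided by the number of structures with observed pI50 > 8.65. That reading is inferred from the numbers, not stated on the slide. A short sketch reproducing the column:

```python
# (filter, observed pI50 > 8.65, false negatives) as read from the table
rows = [
    ("All structures", 629, 185),
    ("& topdiff <200", 98, 8),
    ("& mathv > 0.65", 348, 26),
    ("& asim > .999", 306, 56),
    ("& fgpt Tan >.75", 93, 4),
]


def false_negative_pct(n_observed_high, n_false_negative):
    """Percent of truly high-potency structures that were predicted low."""
    return round(100.0 * n_false_negative / n_observed_high, 1)


rates = {label: false_negative_pct(n_obs, n_fn) for label, n_obs, n_fn in rows}
for label, pct in rates.items():
    print(f"{label:18s} {pct:5.1f}")
```

Each computed value matches the slide's last column (29.4, 8.2, 7.5, 18.3, 4.3).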

Page 24: Richard Cramer 2014 euro QSAR presentation

3. No X-ray structures?

• Ligand-based approaches (pharmacophoric &/or shape)
• The pharmacophoric “elephant in the room”
– Enormous configurational search space
– Criteria for a correct pharmacophore are mostly subjective
• A possible objective criterion for a correct pharmacophore?
– obtaining a satisfactory CoMFA model from a training set ..
– aligned by template CoMFA alignment with that pharmacophoric hypothesis as its templates

Page 25: Richard Cramer 2014 euro QSAR presentation

Template CoMFA Attributes (.. implicit in talk content !!)

• TC is a ligand alignment protocol for classical CoMFA that:
– As input, only uses 3D template(s) and 2D SAR table, thus providing:
  • Fast and convenient throughput
  • Objectively determined models
  • Application of crystallographic and/or pharmacophoric constraints
  • No limitations on structural applicability
– As output, enables, practically:
  • Rapid, objective, structurally unlimited potency predictions that so far are reasonably accurate
  • More structurally informative contour maps
  • 3D database searching with potency predictions
  • Potency-prediction-constrained de novo design
– Its 3D-QSAR models can:
  • Successfully combine multiple series into a single model
  • Be generated completely automatically

Page 26: Richard Cramer 2014 euro QSAR presentation

Thanks to everyone who helped!

• Bernd Wendt for “bleeding edge” trials and feedback
• Supervisors who have kept paying me -- mostly to pursue this topomer/self-similarity thing (for over twenty years now)
– John McAlister
– Jim Hopkins
– Dan Weiner
– Jim Mahan

Page 27: Richard Cramer 2014 euro QSAR presentation

Template CoMFA References

• Cramer, R. D. Template CoMFA Applied to 114 Biological Targets. J. Chem. Inf. Model. 2014, 54, 2147–2156.

• Wendt, B.; Cramer, R. D. Challenging the Gold Standard for 3D-QSAR: Template CoMFA versus X-ray Alignment. J. Comput.-Aided Mol. Des. 2014, accepted.

• Cramer, R. D.; Wendt, B. Template CoMFA: The 3D-QSAR Grail? J. Chem. Inf. Model. 2014, 54, 660–671.

• Cramer, R. D. Rethinking 3D-QSAR. J. Comput.-Aided Mol. Des. 2011, 25, 197–201.

• Cramer, R. D.; Jilek, R. J.; Guessregen, S.; Clark, S. J.; Wendt, B.; Clark, R. D. “Lead-Hopping”. Validation of Topomer Similarity as a Superior Predictor of Similar Biological Activities. J. Med. Chem. 2004, 47, 6777–6791.

• Jilek, R. J.; Cramer, R. D. Topomers: A Validated Protocol for their Self-Consistent Generation. J. Chem. Inf. Comput. Sci. 2004, 44, 1221–1227.

• Cramer, R. D. Topomer CoMFA: A Design Methodology for Rapid Lead Optimization. J. Med. Chem. 2003, 46, 374–389.

• Cramer, R. D.; Clark, R. D.; Patterson, D. E.; Ferguson, A. M. Bioisosterism as a Molecular Diversity Descriptor: Steric Fields of Single Topomeric Conformers. J. Med. Chem. 1996, 39, 3060–3069.

• Patterson, D. E.; Cramer, R. D.; Ferguson, A. M.; Clark, R. D.; Weinberger, L. E. Neighborhood Behavior: A Useful Concept for Validation of Molecular Diversity Descriptors. J. Med. Chem. 1996, 39, 3049–3059.

Page 28: Richard Cramer 2014 euro QSAR presentation

Goal: Predict ChEMBL (3900+ usable)

bindingDB: 11 .pdb, 270 2D SAR

SDEP = 1.74

q2=.577 / SDEP=.86

factorXa predictions (if training set = bindingdb)

This doesn’t work!

Cramer, R. D.; Wendt, B. JCIM 2014, 54, 660-671.

Page 29: Richard Cramer 2014 euro QSAR presentation

A second target: Checkpoint kinase 1 predictions

Training set from bindingdb SAR: SDEP = 1.46

Training set = half ChEMBL: SDEP = 1.14

But a third target, carbonic anhydrase II, did not work (no model from ChEMBL)

Page 30: Richard Cramer 2014 euro QSAR presentation

Why is pure shape similarity so productive?

Assay should be constant..

Substituent   pIC50
-H            7.2
-F            7.9

WHILE CONVERSELY: Docking (small combi library) moves the core around, producing field variation that is noise, because .... an invariant core cannot have caused changes in biological activity

The only possible cause of this pIC50 difference is the difference in the fields surrounding F => H – any docking pose change from that field difference is only mechanistic and can be ignored for QSAR purposes

Page 31: Richard Cramer 2014 euro QSAR presentation

How can Topomer CoMFA be so Effective? (2)

Input data: one perfectly correlated descriptor

Y     X
6.0   6.0
3.3   3.3
0.9   0.9
5.3   5.3

Second input: the same Y and X, + many columns of random x values

q2 for Y = f(X):

Input                                    Multiple Regression   PLS
Y, X only                                1.000                 1.000
Y, X + many columns of random x values   1.000                 0.000 !!

Multiple Regression and PLS are different !!

Clark, M.; Cramer, R.D. Quant. Struct.-Act. Relat. 1993, 12, 137-145
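
The dilution effect on the PLS side can be illustrated with a small pure-Python sketch. Everything beyond the four Y/X values is an assumption made for illustration: a one-component PLS model, leave-one-out cross-validation, 200 unscaled Gaussian random columns, and the seed. None of these settings come from the cited paper, so the numbers need not reproduce the slide's q2 values exactly. The multiple-regression side, which presumably relies on variable selection, is not reproduced here.

```python
import random


def pls1_loo_q2(X, y):
    """Leave-one-out q2 for a one-component PLS model (illustrative sketch)."""
    n, p = len(y), len(X[0])
    press = 0.0
    for held in range(n):
        rows = [i for i in range(n) if i != held]
        m = len(rows)
        # Center the training rows on their own means
        x_mean = [sum(X[i][j] for i in rows) / m for j in range(p)]
        y_mean = sum(y[i] for i in rows) / m
        Xc = [[X[i][j] - x_mean[j] for j in range(p)] for i in rows]
        yc = [y[i] - y_mean for i in rows]
        # Weight vector: covariance of each column with y, scaled to unit length
        w = [sum(Xc[r][j] * yc[r] for r in range(m)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        # Latent scores, then regression of y on the single latent variable
        t = [sum(Xc[r][j] * w[j] for j in range(p)) for r in range(m)]
        b = sum(t[r] * yc[r] for r in range(m)) / sum(v * v for v in t)
        t_held = sum((X[held][j] - x_mean[j]) * w[j] for j in range(p))
        press += (y[held] - (y_mean + b * t_held)) ** 2
    y_bar = sum(y) / n
    ss = sum((v - y_bar) ** 2 for v in y)
    return 1.0 - press / ss


y = [6.0, 3.3, 0.9, 5.3]
X_single = [[v] for v in y]  # one perfectly correlated descriptor
rng = random.Random(0)
X_noisy = [[v] + [rng.gauss(0.0, 1.0) for _ in range(200)] for v in y]

q2_single = pls1_loo_q2(X_single, y)
q2_noisy = pls1_loo_q2(X_noisy, y)
print(round(q2_single, 3), round(q2_noisy, 3))
```

With the single descriptor the held-out values are recovered essentially exactly. Once the random columns are added, they dominate the PLS weight vector and the cross-validated predictions degrade sharply.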

Page 32: Richard Cramer 2014 euro QSAR presentation

Examples of Issues During “Atom Matching”

3D template

topomer

template

Page 33: Richard Cramer 2014 euro QSAR presentation

“Topomer” positioning of unmapped atoms

• Topomer: a single “black-box constructed” 3D model of a monovalent fragment
• Topomer protocol:
– Only input is the “2D structure” of a single fragment (A)
– “Embedded” in 3D space by superposing the open valence (B)
– Valence geometries (bonds, angles, rings) from Concord (or Corina) (B)
– Torsions, stereochemistry, ring flips from canonical rules (C)
– Resulting “strain energy” is ignored

Figure panels: A B C D

Page 34: Richard Cramer 2014 euro QSAR presentation

Another random sample of ten training set structures

.. suppose you only needed to align each of these ten structures to one of those twelve templates ..