Practical Experiences of 3D-QSAR Tools in Forge: Application to RET Kinase Inhibitors

Bohdan Waszkowycz
Drug Discovery Unit
Cancer Research UK Manchester Institute

Cresset European User Group Meeting
18th June 2015

Overview

• Introduction
  – Background to RET kinase inhibitor project

• 3D-QSAR implementation in Forge
  – Forge alignments versus protein-ligand docking

• Comparison of 3D-QSAR models
  – Impact of alignments and outliers
  – Separate training sets and test sets
  – QSAR for target selectivity

• Final thoughts

Background to RET Inhibitor Project

• RET receptor tyrosine kinase involved in cell survival/differentiation/proliferation
  – Activating RET mutations observed in medullary thyroid carcinoma
  – RET gene fusions observed in lung adenocarcinoma

• Several non-selective tyrosine kinase inhibitors currently in clinical use
  – Hence dose-limiting toxicity from inhibition of KDR (VEGFR2) and other kinases

• Aim to identify series of inhibitors with improved affinity & selectivity for RET
  – Initially, optimise known inhibitor scaffolds to explore SAR leading to RET selectivity

[Structure] Vandetanib – clinically used – poor RET/KDR selectivity

[Structure] Early lead with promising gain in activity and selectivity

[Structure] Broad exploration of SAR around quinazoline core

Overview of Quinazoline Dataset

• Selection of dataset for 3D-QSAR studies
  – Over 450 examples synthesised – biochemical IC50 from 0.2nM to >30µM
  – Focus on a subset of 128 compounds for QSAR – exploring gatekeeper pocket
  – Early modelling based on PDB RET-vandetanib X-ray – later obtained in-house X-ray

[Structure] RET biochemical IC50 5nM (15x selective over KDR)
  – Broad range of substituents, phenol mimetics, heterocycles
  – Invariant dimethoxy quinazoline scaffold

Electrostatic Fields for Selected Examples

3-OH (IC50 5nM) vs 4,6-diF,3-OH (IC50 0.4nM) – highlights impact of fluorines

3-OH (IC50 5nM) vs indazole (IC50 140nM) – heterocyclic phenol mimic

Difference maps

Why Should We Want to Build a QSAR Model?

• Quantitative Structure-Activity Relationships
  – Aim to correlate biological activity with chemical structure/properties

[Diagram] structural descriptors, chemical properties → QSAR (statistical modelling, machine learning) → biological activity (binding affinity, IC50…)

Insight and Rationale
- Which features are essential for activity?
- How statistically robust is the SAR?
- Can we account for outliers?

Predictions and Prioritisation
- What should we make next?
- Where is the SAR incomplete/unclear?
- Does the SAR transfer to a new scaffold?

Overview of 3D-QSAR in Forge

• 3D features of the dataset described in terms of electrostatic & steric fields
  – Regression analysis identifies which field points are correlated with biological activity

[Workflow]
- Assemble aligned set of compounds with measured biological activity
- For each compound, Forge calculates electrostatic and steric energies at each field point → QSAR descriptors
- Run PLS regression → visualise significant regression coefficients in 3D space
- Assess quality of prediction in terms of Q2 (cross-validated coefficient of determination) to ensure model is not over-fitted
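
As a rough illustration of how a leave-one-out Q2 is obtained, the sketch below refits a model with each compound held out in turn and compares the held-out predictions with experiment. It is not Forge's implementation: a one-descriptor least-squares line stands in for PLS on many field-point descriptors, the function names are hypothetical, and the descriptor and pIC50 values are made up.

```python
# Leave-one-out Q2 for a one-descriptor least-squares model.
# Assumption: the straight-line fit is a stand-in for Forge's PLS regression.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_q2(xs, ys):
    """Q2 = 1 - PRESS / SS_total, where each compound is predicted by a
    model trained on all the other compounds."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Made-up descriptor values and pIC50s, for illustration only.
descriptor = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
pic50 = [5.2, 5.9, 6.1, 6.8, 7.6, 8.0]
q2 = loo_q2(descriptor, pic50)
```

Because every prediction comes from a model that never saw that compound, Q2 is always lower than the fitted R2 and can fall below zero when the model predicts worse than the mean activity.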

Comparison of Alignment Methods

• Preferred alignment protocols (after much experimentation…)

Structure-based – Glide docking
- Schrödinger Glide SP docking to rigid protein
- Use of loose core constraints
- Retain 5 poses per ligand
- Manually select preferred poses
- Tweak and refine inconsistent poses

Ligand-based – Forge
- Select parent 3-OH as initial reference, X-ray pose
- Substructure alignment, protein present, no constraints
- Use most accurate conformation hunt
- Tweak and refine inconsistent groups (e.g. OH)
- Use additional references to re-align selected ligands

Aim to generate plausible & consistent set of alignments before running the regression analysis

Ambiguous Alignments

• How best to align 3-OMe (IC50 1200nM) with 3-OH (IC50 5nM)?
  – Forge aligns OMe with OH → loss of HB donor, steric clash
  – Docking flips the OMe – protein binding site for OH is very restrictive
  – What happens in reality? Which leads to the most meaningful model?

3-OH reference compound

3-OMe aligned in Forge

3-OMe docked in protein

3D-QSAR Based on Docked Poses

• 100 compound set – omit compounds that fail to dock
  – Leave-one-out Q2 = 0.559 – predictions quite noisy
  – Coefficients capture main points of SAR
    • 3-OH favoured with strong steric component
    • 4-substituent sterically favoured, other positions generally disfavoured
    • Some coefficients arise from structurally conserved regions

[Plot] Cross-validated Predicted vs Experimental pIC50 (diagonal indicates x=y)

3D-QSAR Based on Forge Alignments

• Same 100 compound dataset but using Forge alignments
  – Q2 = 0.553 – very similar to the docked-pose model

• Predictions better overall apart from some weakly active outliers
  – Coefficients appear more clear-cut
    • 3-OH electrostatics now more dominant
    • 4-substituent sterically favoured, 2-substituent sterically disfavoured
    • Less noise from invariant quinazoline core

Field Contributions to Predicted Activity

• Visualise how each compound is scored by the model

3-OH, 6-F: IC50 1nM

3-OH, 6-Me: IC50 8nM

Indazolyl: IC50 140nM

Phenyl: IC50 1800nM

Impact of Outliers on Model Performance

• All 100 compounds, Forge alignments: Q2 = 0.553
  – Some weakly active compounds – less reliable IC50, tricky alignments, under-represented chemistry

• Remove weakly active compounds
  – Omit IC50 > 10µM → 81 cpds, Q2 = 0.671
  – Omit IC50 > 5µM → 71 cpds, Q2 = 0.707

• Remove poorly predicted outliers from original model
  – Omit 3 worst predicted outliers → 97 cpds, Q2 = 0.759
    • (3-OH,4-Ac; 3-OH,2-CN; benzimidazol-5-yl)

[Plots] 97 compound set; 71 compound set
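
The two pruning strategies on this slide can be sketched as simple filters. This is only an illustration under stated assumptions: the record layout (`name`, `ic50_nm`, `pic50`, `pred_pic50`) and both function names are hypothetical, the four compounds are made up, and in practice the model would be rebuilt in Forge after each filter.

```python
# Two ways of pruning a QSAR dataset: by activity cut-off, or by
# prediction error from an existing cross-validated model.
# Assumption: each compound is a dict with a unique "name".

def omit_weak(compounds, max_ic50_nm):
    """Keep compounds whose IC50 (in nM) does not exceed the cut-off."""
    return [c for c in compounds if c["ic50_nm"] <= max_ic50_nm]


def omit_worst_predicted(compounds, n_worst):
    """Drop the n compounds with the largest |experimental - predicted| pIC50."""
    ranked = sorted(
        compounds,
        key=lambda c: abs(c["pic50"] - c["pred_pic50"]),
        reverse=True,
    )
    worst = {c["name"] for c in ranked[:n_worst]}
    return [c for c in compounds if c["name"] not in worst]


# Made-up records, for illustration only.
dataset = [
    {"name": "cpd_A", "ic50_nm": 5.0, "pic50": 8.30, "pred_pic50": 8.00},
    {"name": "cpd_B", "ic50_nm": 1200.0, "pic50": 5.92, "pred_pic50": 6.90},
    {"name": "cpd_C", "ic50_nm": 30000.0, "pic50": 4.52, "pred_pic50": 5.10},
    {"name": "cpd_D", "ic50_nm": 140.0, "pic50": 6.85, "pred_pic50": 6.70},
]

potent_only = omit_weak(dataset, max_ic50_nm=10000.0)   # drops cpd_C
without_outlier = omit_worst_predicted(dataset, 1)      # drops cpd_B
```

Removing the worst-predicted compounds raises Q2 by construction, so the activity cut-off is the more objective of the two filters.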

Separate Training and Test Sets

• More reliable estimation of robustness comes from separate training and test sets
  – Forge alignments, 97 compound set – leave-one-out Q2 = 0.759
  – Split 80% training set, 20% test set

Select test set by activity: R2 = 0.770

Select test set randomly – 10 repeats: mean R2 = 0.655 (SD 0.232)
  – Worst R2 = 0.055, best R2 = 0.877
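
The two ways of choosing a test set can be sketched as follows. The slides do not say how Forge selects a test set "by activity", so taking every fifth compound from the activity-ranked list is an assumption, as are the function names and the made-up pIC50 values.

```python
import random

# Two ways of drawing an approximately 80/20 training/test split.
# Assumption: "by activity" means spreading test compounds evenly
# across the ranked activity range.

def split_by_activity(pic50s, test_every=5):
    """Rank compounds by activity and take every `test_every`-th one as
    test, so the test set spans the full activity range."""
    ranked = sorted(range(len(pic50s)), key=lambda i: pic50s[i])
    test = ranked[test_every // 2::test_every]
    test_set = set(test)
    train = [i for i in ranked if i not in test_set]
    return train, test


def split_random(n_compounds, test_fraction=0.2, seed=0):
    """Shuffle compound indices and hold out a random fraction as test."""
    rng = random.Random(seed)
    indices = list(range(n_compounds))
    rng.shuffle(indices)
    n_test = round(n_compounds * test_fraction)
    return indices[n_test:], indices[:n_test]


# Made-up activities for a 97 compound set, for illustration only.
pic50s = [4.6 + 0.04 * i for i in range(97)]
train_idx, test_idx = split_by_activity(pic50s)

# Ten random splits, one per seed, as in the repeated-split experiment.
random_splits = [split_random(len(pic50s), seed=s) for s in range(10)]
```

With only about 19 test compounds per split, a random draw can easily miss part of the activity range, which would be consistent with the wide spread between the worst and best R2 reported above.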

Prediction of RET vs KDR Selectivity

• In place of activity, use RET:KDR selectivity
  – Selectivity = RET_pIC50 - KDR_pIC50
  – Using original Forge alignments, omit RET IC50 > 10µM → 80 compound set, Q2 = 0.721
  – Coefficients highlight impact of 2-substituent in boosting selectivity for RET over KDR
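
The selectivity response is a difference of two pIC50 values, so it is the log10 of the fold-selectivity. A minimal sketch of the arithmetic follows; the function names are hypothetical, and the KDR IC50 is back-calculated from the 130x ratio quoted for the 44nM compound rather than taken from measured data.

```python
import math

# Selectivity expressed as a pIC50 difference.

def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L), with IC50 supplied in nM."""
    return 9.0 - math.log10(ic50_nm)


def selectivity(ret_ic50_nm, kdr_ic50_nm):
    """RET_pIC50 - KDR_pIC50; positive values favour RET."""
    return pic50_from_nm(ret_ic50_nm) - pic50_from_nm(kdr_ic50_nm)


ret_nm = 44.0
kdr_nm = ret_nm * 130           # assumption: derived from the 130x ratio
sel = selectivity(ret_nm, kdr_nm)   # equals log10(130), about 2.11
fold = 10 ** sel                    # back to a fold ratio, about 130
```

Working in log units keeps the response on the same scale as the activity models, so a coefficient that adds one unit corresponds to a tenfold gain in selectivity.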

X-ray of 3-phenol highlighting close contact of 2-substituent to Ser891

RET IC50 44nM (130x selective over KDR)

Final Thoughts

• Forge offers a user-friendly interface for building 3D-QSAR models
  – Easy to set up training/test sets, run cross validation, visualise graphs and coefficients

• Application to RET project
  – Obtained models consistent with observed SAR, offering insight into features required for activity/selectivity and highlighting outliers

• Building a successful and robust QSAR model takes time (and patience)
  – Careful selection of the data set is important – range of activities/diversity
  – Generation of plausible, consistent and objective alignments is critical

“Just remember: all of the signal is in the alignments. Align your molecules well. Check them, fix them, check them again. Check them once more. Then, and only then, should you press the QSAR button. Good luck!”
Mark Mackey

Acknowledgements

• RET project team at CRUK Manchester Institute Drug Discovery Unit

• Neil McDonald, Birkbeck College, London, for X-ray crystallography

• Cancer Research UK (Grant numbers C480/A1141 and C5759/A17098) and the Cancer Research Technology Pioneer Fund for funding

• Quinazoline patent WO2015/079251
