Practical Experiences of 3D-QSAR Tools in Forge: Application to RET Kinase Inhibitors
Bohdan Waszkowycz, Drug Discovery Unit, Cancer Research UK Manchester Institute
Cresset European User Group Meeting, 18th June 2015
Overview
• Introduction
– Background to RET kinase inhibitor project
• 3D-QSAR implementation in Forge
– Forge alignments versus protein-ligand docking
• Comparison of 3D-QSAR models
– Impact of alignments and outliers
– Separate training sets and test sets
– QSAR for target selectivity
• Final thoughts
Background to RET Inhibitor Project
• RET receptor tyrosine kinase involved in cell survival/differentiation/proliferation
– Activating RET mutations observed in medullary thyroid carcinoma
– RET gene fusions observed in lung adenocarcinoma
• Several non-selective tyrosine kinase inhibitors currently in clinical use
– Hence dose-limiting toxicity from inhibition of KDR (VEGFR2) and other kinases
• Aim to identify series of inhibitors with improved affinity & selectivity for RET
– Initially, optimise known inhibitor scaffolds to explore SAR leading to RET selectivity
[Structure] Vandetanib: clinically used, poor RET/KDR selectivity
[Structure] Early lead with promising gain in activity and selectivity
[Structure] Broad exploration of SAR around quinazoline core
Overview of Quinazoline Dataset
• Selection of dataset for 3D-QSAR studies
– Over 450 examples synthesised – biochemical IC50 from 0.2nM to >30µM
– Focus on a subset of 128 compounds for QSAR – exploring gatekeeper pocket
– Early modelling based on PDB RET-vandetanib X-ray – later obtained in-house X-ray
[Structure] Broad range of substituents, phenol mimetics, heterocycles; invariant dimethoxy quinazoline scaffold
RET biochemical IC50 5nM (15x selective over KDR)
Electrostatic Fields for Selected Examples
3-OH (IC50 5nM) vs 4,6-diF,3-OH (IC50 0.4nM) – highlights impact of fluorines
3-OH (IC50 5nM) vs indazole (IC50 140nM) – heterocyclic phenol mimic
Difference maps
Why Should We Want to Build a QSAR Model?
• Quantitative Structure-Activity Relationships
– Aim to correlate biological activity with chemical structure/properties
– QSAR links structural descriptors and chemical properties to biological activity (binding affinity, IC50…) through statistical modelling and machine learning
• Insight and Rationale
– Which features are essential for activity?
– How statistically robust is the SAR?
– Can we account for outliers?
• Predictions and Prioritisation
– What should we make next?
– Where is the SAR incomplete/unclear?
– Does the SAR transfer to a new scaffold?
Overview of 3D-QSAR in Forge
• 3D features of the dataset described in terms of electrostatic & steric fields
– Regression analysis identifies which field points are correlated with biological activity
• Workflow
– Assemble aligned set of compounds with measured biological activity
– For each compound, Forge calculates electrostatic and steric energies at each field point: these are the QSAR descriptors
– Run PLS regression and visualise significant regression coefficients in 3D space
– Assess quality of prediction in terms of Q2 (cross-validated coefficient of determination) to ensure model is not over-fitted
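The regression and cross-validation steps can be illustrated with a minimal sketch. It assumes the field-point energies are already available as one row of numbers per compound, fits a single-component PLS1 model, and computes a leave-one-out Q2 as 1 - PRESS/SS_tot. The function names and the toy numbers are invented for illustration; this is not Forge's implementation, and Forge may use more components or a different Q2 convention.

```python
from math import sqrt


def fit_pls1(X, y):
    """Fit a one-component PLS1 model (illustrative stand-in, not Forge's code)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in descriptor space with maximal covariance with activity.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0.0:
        # No covariance between descriptors and activity: fall back to the mean.
        return x_mean, y_mean, [0.0] * p, 0.0
    w = [v / norm for v in w]
    # Scores t = Xc w, then regress activity on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, w, q


def predict(model, x):
    """Predict activity for one compound from its descriptor row."""
    x_mean, y_mean, w, q = model
    score = sum((x[j] - x_mean[j]) * w[j] for j in range(len(x)))
    return y_mean + q * score


def q2_leave_one_out(X, y):
    """Leave-one-out Q2 = 1 - PRESS / SS_tot."""
    y_mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        # Refit without compound i, then predict it.
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(model, X[i])) ** 2
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss_tot


# Toy data (invented): 6 compounds, 3 field-point energies each, with pIC50 values.
toy_fields = [[0.2, 1.1, -0.3], [0.5, 0.9, -0.1], [0.9, 0.4, 0.2],
              [1.3, 0.3, 0.1], [1.6, -0.2, 0.4], [2.0, -0.5, 0.6]]
toy_pic50 = [5.1, 5.6, 6.4, 6.9, 7.8, 8.3]
print("leave-one-out Q2:", round(q2_leave_one_out(toy_fields, toy_pic50), 3))
```

Because each compound is predicted by a model that never saw it, Q2 drops when the model is over-fitted, whereas the fitted R2 would not.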
Comparison of Alignment Methods
• Preferred alignment protocols (after much experimentation…)
Structure-based: Glide docking
- Schrödinger Glide SP docking to rigid protein
- Use of loose core constraints
- Retain 5 poses per ligand
- Manually select preferred poses
- Tweak and refine inconsistent poses
Ligand-based: Forge
- Select parent 3-OH as initial reference, X-ray pose
- Substructure alignment, protein present, no constraints
- Use most accurate conformation hunt
- Tweak and refine inconsistent groups (e.g. OH)
- Use additional references to re-align selected ligands
Aim to generate plausible & consistent set of alignments before running the regression analysis
Ambiguous Alignments
• How best to align 3-OMe (IC50 1200nM) with 3-OH (IC50 5nM)?
– Forge aligns OMe with OH: loss of HB donor, steric clash
– Docking flips the OMe – protein binding site for OH is very restrictive
– What happens in reality? Which leads to the most meaningful model?
3-OH reference compound
3-OMe aligned in Forge
3-OMe docked in protein
3D-QSAR Based on Docked Poses
• 100 compound set – omit compounds that fail to dock
– Leave-one-out Q2 = 0.559 – predictions quite noisy
– Coefficients capture main points of SAR
• 3-OH favoured with strong steric component
• 4-substituent sterically favoured, other positions generally disfavoured
• Some coefficients arise from structurally conserved regions
[Plot] Cross-validated Predicted vs Experimental pIC50 (diagonal indicates x=y)
3D-QSAR Based on Forge Alignments
• Same 100 compound dataset but using Forge alignments
– Q2 = 0.553 – very similar to the docked-pose model
• Predictions better overall apart from some weakly active outliers
– Coefficients appear more clear-cut
• 3-OH electrostatics now more dominant
• 4-substituent sterically favoured, 2-substituent sterically disfavoured
• Less noise from invariant quinazoline core
Field Contributions to Predicted Activity
• Visualise how each compound is scored by the model
3-OH, 6-F: IC50 1nM
3-OH, 6-Me: IC50 8nM
Indazolyl: IC50 140nM
Phenyl: IC50 1800nM
Impact of Outliers on Model Performance
• All 100 compounds, Forge alignments: Q2 = 0.553
– Some weakly active compounds – less reliable IC50, tricky alignments, under-represented chemistry
• Remove weakly active compounds
– Omit IC50 > 10µM: 81 cpds, Q2 = 0.671
– Omit IC50 > 5µM: 71 cpds, Q2 = 0.707
• Remove poorly predicted outliers from original model
– Omit 3 worst predicted outliers: 97 cpds, Q2 = 0.759 (3-OH,4-Ac; 3-OH,2-CN; benzimidazol-5-yl)
[Plots] 97 compound set; 71 compound set
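The two pruning strategies on this slide can be sketched as simple filters over a compound table. This is only an illustration: the record layout (`ic50_nm`, `pic50_exp`, `pic50_cv`) and the compound names are hypothetical, and the values are invented rather than taken from the RET dataset.

```python
def drop_weakly_active(compounds, max_ic50_nm):
    """Keep compounds whose IC50 (in nM) is at or below the cutoff."""
    return [c for c in compounds if c["ic50_nm"] <= max_ic50_nm]


def worst_predicted(compounds, n):
    """Return the n compounds with the largest cross-validated prediction error."""
    return sorted(
        compounds,
        key=lambda c: abs(c["pic50_exp"] - c["pic50_cv"]),
        reverse=True,
    )[:n]


# Invented records: experimental pIC50 and cross-validated predicted pIC50.
toy_set = [
    {"name": "cpd_A", "ic50_nm": 5.0, "pic50_exp": 8.3, "pic50_cv": 8.0},
    {"name": "cpd_B", "ic50_nm": 140.0, "pic50_exp": 6.85, "pic50_cv": 7.9},
    {"name": "cpd_C", "ic50_nm": 12000.0, "pic50_exp": 4.92, "pic50_cv": 5.1},
    {"name": "cpd_D", "ic50_nm": 1800.0, "pic50_exp": 5.74, "pic50_cv": 5.6},
]

# Strategy 1: omit compounds with IC50 > 10 µM (10000 nM).
active_only = drop_weakly_active(toy_set, 10_000)

# Strategy 2: flag the worst predicted compound for inspection or removal.
outliers = worst_predicted(toy_set, 1)
print([c["name"] for c in active_only], [c["name"] for c in outliers])
```

Note that the second strategy uses the model's own predictions to choose what to remove, so the resulting Q2 is likely to be optimistic; a held-out test set gives a more independent check.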
Separate Training and Test Sets
• More reliable estimation of robustness comes from separate training and test sets
– Forge alignments, 97 compound set – leave-one-out Q2 = 0.759
– Split 80% training set, 20% test set
– Select test set by activity: R2 = 0.770
– Select test set randomly, 10 repeats: mean R2 = 0.655 (SD 0.232), worst R2 = 0.055, best R2 = 0.877
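The two ways of choosing a test set can be sketched as follows. The assumptions are mine: "by activity" is taken to mean ranking compounds by activity and holding out every fifth one, the test-set R2 is computed as 1 - SS_res/SS_tot about the test-set mean, and a one-descriptor least-squares line stands in for the 3D-QSAR model. Forge's own selection and scoring may differ.

```python
import random
from statistics import mean, stdev


def split_by_activity(y, step=5):
    """Rank by activity and hold out every `step`-th compound as the test set."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    test = order[step // 2::step]
    held_out = set(test)
    train = [i for i in range(len(y)) if i not in held_out]
    return train, test


def split_randomly(n, test_fraction, rng):
    """Shuffle compound indices and hold out a random fraction."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def r_squared(y_true, y_pred):
    """Coefficient of determination about the mean of y_true."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Stand-in model: ordinary least squares on a single descriptor."""
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def repeated_random_r2(x, y, repeats=10, test_fraction=0.2, seed=0):
    """Mean and SD of test-set R2 over repeated random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = split_randomly(len(y), test_fraction, rng)
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [predict_line(model, x[i]) for i in test]
        scores.append(r_squared([y[i] for i in test], preds))
    return mean(scores), stdev(scores)
```

With 97 compounds both splitters hold out 19 compounds, close to the 20% used in the talk. Selecting by activity guarantees the test set spans the activity range, which is one plausible reason a random split can occasionally score much worse, as the spread from 0.055 to 0.877 suggests.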
Prediction of RET vs KDR Selectivity
• In place of activity, use RET:KDR selectivity
– Selectivity = RET_pIC50 - KDR_pIC50
– Using original Forge alignments, omit RET IC50 > 10µM: 80 compound set, Q2 = 0.721
– Coefficients highlight impact of 2-substituent in boosting selectivity for RET over KDR
[Figure] X-ray of 3-phenol highlighting close contact of 2-substituent to Ser891
RET IC50 44nM (130x selective over KDR)
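The selectivity response variable is a difference of pIC50 values, which equals the log10 of the fold selectivity. The sketch below works this through for the two compounds quoted in the talk (5nM, 15x and 44nM, 130x). The KDR IC50 values are back-calculated on the assumption that "Nx selective" means KDR IC50 divided by RET IC50; the third record and all function names are invented for illustration.

```python
from math import log10


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nM to pIC50 (-log10 of the molar IC50)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - log10(ic50_nm)


def selectivity(ret_ic50_nm, kdr_ic50_nm):
    """RET_pIC50 - KDR_pIC50, in log units (positive means RET-selective)."""
    return pic50_from_nm(ret_ic50_nm) - pic50_from_nm(kdr_ic50_nm)


def selectivity_dataset(records, max_ret_ic50_nm=10_000):
    """Selectivity per compound, omitting RET IC50 above the cutoff (10 µM)."""
    return {
        r["name"]: selectivity(r["ret_nm"], r["kdr_nm"])
        for r in records
        if r["ret_nm"] <= max_ret_ic50_nm
    }


records = [
    # KDR values assumed from the quoted fold selectivity: 5 * 15 and 44 * 130.
    {"name": "3-OH parent", "ret_nm": 5.0, "kdr_nm": 75.0},
    {"name": "44nM example", "ret_nm": 44.0, "kdr_nm": 5720.0},
    # Invented weakly active compound, dropped by the 10 µM filter.
    {"name": "hypothetical_weak", "ret_nm": 20000.0, "kdr_nm": 20000.0},
]

for name, value in selectivity_dataset(records).items():
    print(f"{name}: selectivity = {value:.2f} log units")
```

Working in log units keeps the response on the same scale as pIC50, so the same regression setup can be reused with selectivity in place of activity.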
Final Thoughts
• Forge offers a user-friendly interface to building 3D-QSAR models
– Easy to set up training/test sets, run cross validation, visualise graphs and coefficients
• Application to RET project
– Obtained models consistent with observed SAR, offering insight into features required for activity/selectivity and highlighting outliers
• Building a successful and robust QSAR model takes time (and patience)
– Careful selection of the data set is important – range of activities/diversity
– Generation of plausible, consistent and objective alignments is critical
“Just remember: all of the signal is in the alignments. Align your molecules well. Check them, fix them, check them again. Check them once more. Then, and only then, should you press the QSAR button. Good luck!” (Mark Mackey)
Acknowledgements
• RET project team at CRUK Manchester Institute Drug Discovery Unit
• Neil McDonald, Birkbeck College, London, for X-ray crystallography
• Cancer Research UK (Grant numbers C480/A1141 and C5759/A17098) and the Cancer Research Technology Pioneer Fund for funding
• Quinazoline patent WO2015/079251