Development and sharing of ADME/Tox and Drug Discovery Machine learning models
Alex M. Clark¹*, Krishna Dole², Anna Coulon-Spector², Andrew McNutt², George Grass³, Joel S. Freundlich⁴,⁵, Robert C. Reynolds⁶ and Sean Ekins²,⁷*
¹ Molecular Materials Informatics, 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
² Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA
³ G2 Research, Inc., PO Box 1242, Tahoe City, CA 96145
⁴ Center for Emerging & Re-emerging Pathogens, Division of Infectious Diseases, Department of Medicine, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
⁵ Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
⁶ University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, AL 35294-1240, USA
⁷ Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA
ADME/Tox models 15 yrs on: Then & Now
Then:
• Datasets very small (< 100 compounds)
• Heavy focus on P450
• Models rarely used
• Very limited number of properties addressed
• Few tools/algorithms used
• Limited access to models
Now:
• Much bigger datasets (1000s of compounds, often > 10,000)
• Broader range of models
• Models more widely used and reported
• More accessible models
• Pharma making data available
• Examples: 70 hERG models (Villoutreix and Taboureau 2015); 19 protein binding models (Lambrinidis et al. 2015); 40 BBB models up to 2009
The Opportunity
• Get pharma companies to use open source molecular descriptors and algorithms
• Benefit from initial work done by Pfizer/CDD
• Avoid repeating comparisons of open source vs commercial tools
• Change the mindset from real data to virtual data: confirm predictions
• ADME/Tox is precompetitive
• Expand the chemical space and predictivity of models
• Share models with collaborators: companies could share data as models
Ekins and Williams, Lab On A Chip, 10: 13-22, 2010.
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
Model resources for ADME/Tox
| CYP | 1A2 | 2C9 | 2C19 |
|---|---|---|---|
| Substrate (µM) | phenacetin (10) | diclofenac (10) | omeprazole (0.5) |
| Inhibitor | naphthoflavone | sulfaphenazole | tranylcypromine |
| JSF-2019 IC50 (µM) | 2.25 | 3.55 | 10.8 |
Retinal dehydrogenase 1
ADME SARfari predicts the importance of CYP1A2, CYP2C9 and CYP2C19.
The Naïve Bayes model was built with 142,345 compounds (training and validation) and features 135 learned classes.
Testing by Dr. Joel Freundlich.
The big idea (2009)
Challenge: there is limited access to the ADME/Tox data and models needed for R&D.
How could a company share data but keep the structures proprietary?
Sharing models means both parties must use costly software.
What about open source tools?
Pfizer had never considered this, so we proposed a study and Rishi Gupta generated the models.
Pfizer Open models and descriptors
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
• What can be developed with very large training and test sets?
• HLM training 50,000 testing 25,000 molecules
• training 194,000 and testing 39,000
• MDCK training 25,000 testing 25,000
• MDR training 25,000 testing 18,400
• Open molecular descriptors / models vs commercial descriptors
• Examples – Metabolic Stability
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
| | HLM model, CDK + SMARTS keys | HLM model, MOE2D + SMARTS keys |
|---|---|---|
| Descriptors | 578 | 818 |
| Training set compounds | 193,650 | 193,930 |
| Cross-validation (20% test set) compounds | 38,730 | 38,786 |
| Training R² | 0.79 | 0.77 |
| 20% test set R² | 0.69 | 0.69 |
| Blind data set (2,310 compounds) R² | 0.53 | 0.53 |
| Blind data set RMSE | 0.367 | 0.367 |
| Categorical κ | 0.40 | 0.42 |
| Categorical sensitivity | 0.16 | 0.24 |
| Categorical specificity | 0.99 | 0.987 |
| Categorical PPV | 0.80 | 0.823 |
| Time (s/compound) | 0.252 | 0.303 |

Categorical statistics were obtained by categorizing the continuous predictions.
Figure: PCA of training (red) and test (blue) compounds, showing overlap in chemistry space.
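The categorical statistics reported for these HLM models (κ, sensitivity, specificity, PPV) all derive from a 2×2 confusion matrix. The Python sketch below shows the standard definitions; the class name is our own and the counts in the example are hypothetical, not taken from the Pfizer study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (e.g. stable vs unstable in HLM)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def sensitivity(self) -> float:
        # Fraction of actual positives that were predicted positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of actual negatives that were predicted negative.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value: fraction of positive calls that are correct.
        return self.tp / (self.tp + self.fp)

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement.
        n = self.n
        observed = (self.tp + self.tn) / n
        expected = ((self.tp + self.fp) * (self.tp + self.fn)
                    + (self.tn + self.fn) * (self.tn + self.fp)) / (n * n)
        return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the study).
    cm = ConfusionMatrix(tp=40, fp=10, tn=140, fn=10)
    print(f"sensitivity={cm.sensitivity():.2f} specificity={cm.specificity():.3f} "
          f"PPV={cm.ppv():.2f} kappa={cm.kappa():.2f}")
```

These definitions show how a model can combine high specificity and PPV with low sensitivity: it makes few positive calls, most of which are correct, but it misses many true positives.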
• Examples – P-gp
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
Open source descriptors CDK and C5.0 algorithm
~60,000 molecules with P-gp efflux data from Pfizer
Training set: MDR < 2.5 (low risk), N = 14,175; MDR > 2.5 (high risk), N = 10,820
Test set: MDR < 2.5, N = 10,441; MDR > 2.5, N = 7,972
Could this facilitate model sharing?

| | CDK + fragment descriptors | MOE 2D + fragment descriptors |
|---|---|---|
| Kappa | 0.65 | 0.67 |
| Sensitivity | 0.86 | 0.86 |
| Specificity | 0.78 | 0.80 |
| PPV | 0.84 | 0.84 |
MODELS RESIDE IN PAPERS, NOT ACCESSIBLE… THIS IS UNDESIRABLE
How do we share them?
How do we use them?
Open Extended Connectivity Fingerprints
ECFP_6, FCFP_6
• Collected, deduplicated, hashed
• Sparse integers
• Invented for Pipeline Pilot: public method, proprietary details
• Often used with Bayesian models: many published papers
• Built a new implementation: open source, Java, CDK
  – stable: fingerprints don't change with each new toolkit release
  – well defined: easy to document precise steps
  – easy to port: already migrated to iOS (Objective-C) for TB Mobile app
• Provides the core basis feature for the CDD open source model service
Clark et al., J Cheminform 6:38, 2014
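The ECFP procedure summarized above (grow circular atom environments, hash them to integers, deduplicate) can be sketched in a few lines of Python. This is a simplified illustration, not the CDK implementation from Clark et al. (2014): the initial atom invariants are reduced to atomic number, heavy-atom degree and hydrogen count, CRC32 stands in for the real hash function, and only identical integers (not structurally duplicate environments) are removed. The function name `ecfp_like` and its input format are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def _hash32(values: Sequence[int]) -> int:
    """Deterministic 32-bit hash of a sequence of integers (CRC32 over its text form)."""
    return zlib.crc32(",".join(map(str, values)).encode("ascii"))


def ecfp_like(atoms: List[Tuple[int, int]], bonds: List[Tuple[int, int, int]],
              diameter: int = 6) -> Set[int]:
    """Simplified ECFP-style fingerprint (hypothetical helper).

    atoms: per heavy atom, (atomic_number, hydrogen_count)
    bonds: (atom_i, atom_j, bond_order)
    diameter: 6 gives 3 rounds of neighbour aggregation, as in ECFP_6.
    """
    neighbours: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for i, j, order in bonds:
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))

    # Round 0: reduced initial atom invariants (element, heavy-atom degree, H count).
    ids = [_hash32((z, len(neighbours[a]), h)) for a, (z, h) in enumerate(atoms)]
    features = set(ids)

    for rnd in range(1, diameter // 2 + 1):
        new_ids = []
        for a in range(len(atoms)):
            # Sort (bond order, neighbour id) pairs so atom numbering does not matter.
            env = sorted((order, ids[b]) for b, order in neighbours[a])
            flat = [rnd, ids[a]] + [x for pair in env for x in pair]
            new_ids.append(_hash32(flat))
        ids = new_ids
        features.update(ids)  # collect; the set drops duplicate integers

    return features
```

For ethanol, `ecfp_like([(6, 3), (6, 2), (8, 1)], [(0, 1, 1), (1, 2, 1)])` returns a set of sparse integers that does not depend on how the atoms are numbered, which is the property that makes such fingerprints usable as model features.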
Select dataset and actives in Vault for model
Build model
Select dataset and actives
BBB Model output
View models
View predictions and applicability
If Applicability = 1, the molecule is in the model training set
Select more models…
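The slide does not say how CDD computes applicability. One simple definition consistent with "a training-set molecule scores 1" is the fraction of a query molecule's fingerprint features that also occur somewhere in the training set. The sketch below assumes that definition; the function names are hypothetical.

```python
from typing import Iterable, Set


def training_features(training_fps: Iterable[Set[int]]) -> Set[int]:
    """Union of all hashed fingerprint features seen in the training set."""
    seen: Set[int] = set()
    for fp in training_fps:
        seen |= fp
    return seen


def applicability(query_fp: Set[int], seen: Set[int]) -> float:
    """Hypothetical applicability score: fraction of the query's features seen in training.

    Returns 1.0 for any molecule whose features all occur in the training set,
    which includes every training-set molecule itself.
    """
    if not query_fp:
        return 0.0
    return len(query_fp & seen) / len(query_fp)
```

Under this definition a score of 1 is necessary but not sufficient for training-set membership: a new molecule assembled entirely from known features also scores 1.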
Exporting models from CDD
Clark et al., JCIM 55: 1231-1245 (2015)
Export model from CDD and open in mobile apps
Clark et al., JCIM 55: 1231-1245 (2015)
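Exporting a fingerprint-based Bayesian model only requires writing out its feature identifiers and weights, which illustrates the idea of sharing data as models: the file carries hashed feature integers and weights, not the training structures. The JSON layout below is a hypothetical format for illustration only; see Clark et al. (2015) for the format CDD and the mobile apps actually use.

```python
import json
from typing import Dict


def export_model(weights: Dict[int, float], name: str, fingerprint: str = "ECFP_6") -> str:
    """Serialise a fingerprint-based Bayesian model to JSON (hypothetical format).

    Only hashed feature identifiers and their weights are written; the
    training-set structures themselves are not part of the file.
    """
    doc = {
        "name": name,
        "fingerprint": fingerprint,
        # JSON object keys must be strings, so the integer hashes are stored as text.
        "weights": {str(f): w for f, w in sorted(weights.items())},
    }
    return json.dumps(doc, indent=2)


def import_model(text: str) -> Dict[int, float]:
    """Read back the feature weights written by export_model."""
    doc = json.loads(text)
    return {int(f): float(w) for f, w in doc["weights"].items()}
```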
Machine Learning – Different tools
• Models were generated using: molecular function class fingerprints of maximum diameter 6 (FCFP_6), AlogP, molecular weight, number of rotatable bonds, number of rings, number of aromatic rings, number of hydrogen bond acceptors, number of hydrogen bond donors, and molecular fractional polar surface area.
• Models were validated using five-fold cross validation (leaving out 20% of the database).
• Bayesian, Support Vector Machine, Recursive Partitioning (RP) Forest and RP single tree models were built.
• RP Forest and RP Single Tree models used the standard protocol in Discovery Studio.
• 5-fold cross validation, or leave-out-50% cross validation repeated 100 times, was used to calculate the ROC for the models generated.
• * fingerprints only
Ai et al., ADDR 86: 46-60, 2015
KCNQ1
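The Bayesian models in these comparisons are naive Bayesian classifiers over fingerprint features, of the Laplacian-corrected kind commonly used with ECFP/FCFP in Pipeline Pilot and Discovery Studio. A minimal Python sketch, assuming each molecule is already a set of hashed feature integers with a binary active/inactive label: each feature f gets the weight ln((A_f + 1) / (T_f·P + 1)), where T_f counts training molecules containing f, A_f counts the actives among them and P is the overall fraction of actives; a molecule's score is the sum of its feature weights. This is an illustration with hypothetical function names, not the Discovery Studio or CDD code.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def train_laplacian_bayes(fps: List[Set[int]], active: List[bool]) -> Dict[int, float]:
    """Laplacian-corrected naive Bayesian weights for binary fingerprint features.

    weight(f) = ln((A_f + 1) / (T_f * P + 1))
    """
    if len(fps) != len(active) or not fps:
        raise ValueError("need one activity label per fingerprint")
    p = sum(active) / len(active)  # baseline fraction of actives
    total: Counter = Counter()     # T_f: molecules containing feature f
    hits: Counter = Counter()      # A_f: actives containing feature f
    for fp, is_active in zip(fps, active):
        total.update(fp)
        if is_active:
            hits.update(fp)
    return {f: math.log((hits[f] + 1) / (total[f] * p + 1)) for f in total}


def score(fp: Iterable[int], weights: Dict[int, float]) -> float:
    """Sum of feature weights; features never seen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in fp)
```

A feature shared in proportion between actives and inactives (A_f = T_f·P) gets a weight of zero; positive weights mark features enriched in actives and negative weights mark features enriched in inactives, which is what "features important for actives/inactives" views display.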
Ames Bayesian model built with 6512 molecules (Hansen et al., 2009)
Features important for Ames actives. Features important for Ames inactives.
Ames Bayesian model built using CDD Models, showing the ROC for 3-fold cross validation. Note: only FCFP_6 descriptors were used.
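A cross-validation ROC curve pools the scores each molecule receives from a model trained on the other folds. The stdlib Python sketch below covers the two bookkeeping pieces under our own assumptions: a stratified fold assignment (the slides do not say whether CDD stratifies) and the area under the ROC curve computed as the probability that a randomly chosen active outscores a randomly chosen inactive, with ties counting one half. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def stratified_folds(labels: Sequence[bool], k: int = 3, seed: int = 0) -> List[int]:
    """Assign each molecule to one of k folds, spreading actives and inactives evenly."""
    rng = random.Random(seed)
    fold = [0] * len(labels)
    for cls in (True, False):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold[i] = pos % k  # round-robin within each class
    return fold


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random active outscores a random inactive."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))
```

With k = 3, the molecules whose fold equals i form the held-out set for round i; a model trained on the remaining molecules scores them, and `roc_auc` is applied to the pooled held-out scores from all three rounds.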
FCFP6 fingerprint models in CDD
Clark et al., JCIM 55: 1231-1245 (2015)
ECFP6 fingerprint only models in MMDS
Clark et al., JCIM 55: 1231-1245 (2015)
Using AZ-ChEMBL data for CDD Models
• Human microsomal intrinsic clearance
• Rat hepatocyte intrinsic clearance
• Human protein binding
• Octanol water (logD7.4)
• Solubility pH7.4
Results for Bayesian model cross validation: 5-fold and leave-one-out (LOO) validation of Bayesian models generated with Discovery Studio and of open models implemented in the mobile app MMDS. * = previously published.
Ekins et al Drug Metab Dispos In Press 2015
Transporter models
Ekins et al Drug Metab Dispos In Press 2015
Transporter models
Summary and Next Steps
• Showed that open source models/descriptors are comparable to previously published models built with commercial software
• Implemented Bayesian machine learning in CDD Vault
• Can be used on private or public data
• Can enable sharing of models in CDD Vault
• Enabled export of models: models can be used in third-party mobile apps or other tools
• Demonstrated various ADME/Tox and transporter models
• Additional work with Dr. Joel Freundlich and Dr. Alex Perryman on microsomal stability models
• Provide more information on models and predictions
• Visualize training set molecules vs test compounds
• Use a model to predict compounds and then test them
Acknowledgements
• Antony Williams
• Steven Wright
• Barry Bunin and all colleagues at CDD
• Award Number 9R44TR000942-02 “Biocomputation across distributed private datasets to enhance drug discovery” from the NIH National Center for Advancing Translational Sciences.
• R41-AI108003-01 “Identification and validation of targets of phenotypic high throughput screening” from NIH National Institute of Allergy and Infectious Diseases
• Bill and Melinda Gates Foundation (Grant#49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”).
Models can be accessed at
• http://molsync.com/bayesian1
• http://molsync.com/transporters