Domain of Applicability
A Cluster-Based Measure of Domain of Applicability of a QSAR Model
Robert Stanforth
6 September 2005
© IDBS 2005
DC = DD + DM + DA - c
Overview
  What is QSAR?
  Motivation
  Modelling the Dataset
  Measure of Distance from Domain
  Validation
What is QSAR?
Quantitative Structure-Activity Relationships
  BiologicalActivity = f(ChemicalStructure) + Error
Descriptor-based QSAR
  Descriptors measure chemical structure
  E.g. topological indices of chemical graph
Use Multivariate Linear Regression
  Regress activity onto high-dimensional descriptor space
  Problem of extrapolation
[Figure: example structures with topological index values ³χc = 0, 0.289, 0.408, 0.667, 1.802]
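The regression step and its extrapolation problem can be sketched as follows. This is a minimal illustration on synthetic data, not the model from the talk: the helper names `fit_linear_qsar` and `predict`, the three descriptors and the weights are all assumptions made for the example.

```python
import numpy as np

def fit_linear_qsar(descriptors, activity):
    """Least-squares fit of activity = intercept + descriptors @ weights."""
    design = np.column_stack([np.ones(len(descriptors)), descriptors])
    coeffs, *_ = np.linalg.lstsq(design, activity, rcond=None)
    return coeffs

def predict(coeffs, descriptors):
    """Apply the fitted linear model to new descriptor vectors."""
    design = np.column_stack([np.ones(len(descriptors)), descriptors])
    return design @ coeffs

# Synthetic training set (assumption): 40 compounds, 3 descriptors in [0, 1].
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(40, 3))
true_weights = np.array([1.5, -2.0, 0.5])
y_train = 0.3 + X_train @ true_weights  # noise-free synthetic activity

coeffs = fit_linear_qsar(X_train, y_train)

# A query far outside the training range still gets a prediction,
# and nothing in the regression itself flags it as an extrapolation.
X_far = np.array([[25.0, -10.0, 40.0]])
y_far = predict(coeffs, X_far)
```

The model returns a number for `X_far` just as readily as for a compound inside the training range, which is why a separate measure of domain membership is needed.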
Motivation
QSAR model only valid in domain of its training set
Measure membership of this ‘domain of applicability’
Provides assurance of:
  External test set
  k-fold cross validation
  Prediction
Existing Methods
  Bounding Box
  Convex Hull
  Distance to Centroid
  Nearest Neighbour and k-NN Methods
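Three of the existing methods listed above are simple enough to sketch directly. The function names and the toy L-shaped training set below are assumptions for illustration; the convex hull is left out because it needs a computational-geometry routine and becomes expensive in high-dimensional descriptor spaces.

```python
import numpy as np

def in_bounding_box(train, query):
    """True if every descriptor of query lies within the training min/max."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    return bool(np.all((query >= lo) & (query <= hi)))

def distance_to_centroid(train, query):
    """Euclidean distance from query to the mean of the training set."""
    return float(np.linalg.norm(query - train.mean(axis=0)))

def knn_distance(train, query, k=3):
    """Mean Euclidean distance from query to its k nearest training points."""
    dists = np.sort(np.linalg.norm(train - query, axis=1))
    return float(dists[:k].mean())

# Toy non-convex (L-shaped) training set in two descriptors (assumption).
train = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
empty_corner = np.array([2.0, 2.0])   # inside the box, far from any data
near_data = np.array([1.0, 0.1])      # close to a training point

box_says_inside = in_bounding_box(train, empty_corner)
nn_far = knn_distance(train, empty_corner, k=1)
nn_near = knn_distance(train, near_data, k=1)
```

On this toy set the bounding box accepts the empty corner even though the nearest training point is two units away, while the nearest-neighbour distance separates the two queries at the cost of a scan over the whole training set.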
K-Means for Clustering
Use ‘clusters’ to model the shape of the dataset
K-Means algorithm iteratively adjusts partitioning into clusters to increase accuracy of the model
Computationally feasible
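The iterative adjustment described above can be sketched with the standard Lloyd's-algorithm form of K-Means. The slides do not say which variant or initialisation was used, so the random initialisation, the function name `k_means` and the toy data are assumptions.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (assumed strategy).
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's centroid in place
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # partition is stable
        centroids = new_centroids
    return centroids, labels

# Two well-separated toy groups (assumption, for illustration only).
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = k_means(points, k=2)
```

Each iteration costs on the order of (number of compounds) × (number of clusters) distance evaluations, which is what keeps the approach feasible for large descriptor sets.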
Measure of Distance
Use the K-Means Model
  Based on distances to cluster centroids
  Fuzzy cluster membership
  Weighted average of distances to cluster centroids, weighted according to cluster membership
Computationally efficient
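One way to read the weighted average above is sketched below. The slides do not give the membership formula, so this borrows the fuzzy c-means membership rule with fuzzifier 2 as a stand-in; `fuzzy_memberships`, `cluster_distance` and the `fuzzifier` parameter are hypothetical names, and any per-cluster scaling the original measure may apply is omitted.

```python
import numpy as np

def fuzzy_memberships(dists, fuzzifier=2.0, eps=1e-12):
    """Fuzzy c-means style memberships (assumed form): closer centroids
    receive larger weights, and the weights sum to one."""
    inv = (dists + eps) ** (-2.0 / (fuzzifier - 1.0))
    return inv / inv.sum()

def cluster_distance(query, centroids, fuzzifier=2.0):
    """Membership-weighted average of distances to the cluster centroids."""
    dists = np.linalg.norm(centroids - query, axis=1)
    weights = fuzzy_memberships(dists, fuzzifier)
    return float(weights @ dists)

# Two hypothetical centroids, e.g. as produced by a K-Means run.
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
d_midway = cluster_distance(np.array([5.0, 0.0]), centroids)
d_near = cluster_distance(np.array([1.0, 0.0]), centroids)
d_on_centroid = cluster_distance(np.array([0.0, 0.0]), centroids)
```

Scoring a query needs only one distance per cluster rather than one per training compound, which is consistent with the efficiency claim on the slide.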
Measure of Distance
Contour Plot
  First contour defines boundary of applicability domain
Internal Validation
Assess stability of distance measure
Use k-fold cross validation
  Leave out one group at a time
  Retrain distance measure
  Mean relative change in distance of compounds left out
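The stability check above can be sketched generically. The exact definition of the relative change is not spelled out on the slide; the version below assumes it is |d_retrained - d_full| / d_full averaged over the left-out compounds, and uses distance to centroid as a stand-in measure. All function names are hypothetical.

```python
import numpy as np

def centroid_distance_factory(train):
    """Stand-in distance measure: Euclidean distance to the training mean."""
    centroid = train.mean(axis=0)
    return lambda x: float(np.linalg.norm(x - centroid))

def averaged_relative_deviation(data, fit_distance, k=5, seed=0):
    """Mean relative change in distance of left-out compounds when the
    distance measure is retrained without them (assumed definition)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), k)
    full = fit_distance(data)  # measure trained on every compound
    changes = []
    for fold in folds:
        keep = np.ones(len(data), dtype=bool)
        keep[fold] = False
        retrained = fit_distance(data[keep])  # measure without this fold
        for i in fold:
            reference = full(data[i])  # assumed non-zero for real data
            changes.append(abs(retrained(data[i]) - reference) / reference)
    return float(np.mean(changes))

# Synthetic descriptors (assumption): 50 compounds, 3 descriptors.
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 3))
deviation = averaged_relative_deviation(data, centroid_distance_factory, k=5)
```

Any of the compared measures can be plugged in through `fit_distance`, as long as it takes a training array and returns a function of a single compound.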
Internal Validation
Method          Averaged Relative Deviation
Bounding Box    53.2%
Leverage        80.5%
k-NN            83.1%
Cluster-based   43.2%
External Validation
Assess relationship between distance and prediction error
Analyse mean-square prediction error over:
  50 ‘new’ compounds
  Those inside domain
  Those outside domain
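The comparison above amounts to splitting the prediction errors by domain membership. A minimal sketch follows, assuming membership is decided by a threshold on the distance measure; `mse_by_domain`, the threshold and the four toy compounds are hypothetical.

```python
import numpy as np

def mse_by_domain(errors, distances, threshold):
    """Mean-square prediction error over all compounds, those inside the
    domain (distance <= threshold) and those outside it, with counts."""
    squared = np.asarray(errors, dtype=float) ** 2
    inside = np.asarray(distances, dtype=float) <= threshold

    def summarise(values):
        mse = float(values.mean()) if values.size else float("nan")
        return mse, int(values.size)

    return {
        "all": summarise(squared),
        "inside": summarise(squared[inside]),
        "outside": summarise(squared[~inside]),
    }

# Toy numbers (assumption): prediction errors and distances for 4 compounds.
summary = mse_by_domain(errors=[1.0, -1.0, 2.0, 3.0],
                        distances=[0.1, 0.2, 0.9, 1.5],
                        threshold=1.0)
```

A distance measure is doing its job if the "outside" error comes out higher than the "inside" error.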
External Validation
Mean Square Prediction Error (number of compounds in parentheses)

Method          All (50)   Inside Domain   Outside Domain
Bounding Box    2.76       3.08 (27)       2.40 (23)
Leverage        2.76       2.81 (48)       1.61 (2)
k-NN            2.76       2.73 (45)       3.11 (5)
Cluster-based   2.76       2.70 (46)       3.58 (4)
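As a quick arithmetic check on the table, the count-weighted average of the inside and outside errors should reproduce the overall figure of 2.76 for every method. The snippet below uses only the numbers reported above; the variable names are arbitrary.

```python
# (inside MSE, inside count), (outside MSE, outside count), from the table.
results = {
    "Bounding Box": ((3.08, 27), (2.40, 23)),
    "Leverage": ((2.81, 48), (1.61, 2)),
    "k-NN": ((2.73, 45), (3.11, 5)),
    "Cluster-based": ((2.70, 46), (3.58, 4)),
}

pooled = {}
for method, ((mse_in, n_in), (mse_out, n_out)) in results.items():
    # Count-weighted average of the two subgroup errors.
    pooled[method] = (mse_in * n_in + mse_out * n_out) / (n_in + n_out)
    print(f"{method}: pooled MSE = {pooled[method]:.3f} over {n_in + n_out} compounds")
```

All four pooled values fall within 0.02 of 2.76, which is consistent with rounding of the reported figures.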
Conclusions
Need quantitative measure of applicability of a descriptor-based QSAR model to a structure
Existing methods are all either too crude or too slow
Our new method is computationally efficient, and copes well with non-convex domains