Chemical Structure Generation Based on Inverse...
![Page 1: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/1.jpg)
Chemical Structure Generation
Based on Inverse Quantitative Structure-Property
Relationship/Quantitative Structure-Activity
Relationship
Kimito Funatsu
Chemical System Engineering, The University of Tokyo
Strasbourg Summer School on Chemoinformatics
June 27, 2018
![Page 2: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/2.jpg)
Outline
▪ 1: General introduction
– Inverse QSPR/QSAR
– Objective and hypothesis
▪ 2: Structure generation
▪ 3: Inverse QSPR/QSAR analysis (from y to x)
▪ 4: Structure generation based on inverse
QSPR/QSAR
▪ 5: Summary
![Page 3: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/3.jpg)
1. General Introduction
Molecular design with inverse quantitative
structure-property/activity relationship (QSPR/QSAR)
![Page 4: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/4.jpg)
Molecular Design with QSPR/QSAR
y=f(x)
QSPR/QSAR
y=f(x)
Data from experiments
(compound, property)
Property,
activity..
Descriptors
Chemical
structures
Property value
MP [degree] -5
Viscosity [Pa·s] 8.0
LogP 3
LogS -1
Descriptors Values
MW [g/mol] 180.04
#HBA 3
#NBD 1
#Aromatic Rings 1
TPSA [Å2] 63.6
x
y
Quantitative structure-property relationship (QSPR)
Quantitative structure-activity relationship (QSAR)
![Page 5: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/5.jpg)
Inverse QSPR/QSAR
y=f(x)
Descriptors
Inverse QSPR/QSAR
Property,
activity..
Structure
generator
x
Data from experiments
(compound, property)
Chemical
structures
Descriptors Values
MW [g/mol] 180.04
#HBA 3
#NBD 1
#Aromatic Rings 1
TPSA [Å2] 63.6
Chemical
structures
![Page 6: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/6.jpg)
y: Objective variable
(property, activity)
Inverse
QSPR/QSAR
analysis
x: Explanatory
variables
(Descriptors)
Chemical structures
(chemical graphs)
Obtaining x information from y
Inverse QSPR/QSAR Workflow
![Page 7: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/7.jpg)
y: Objective variable
(property, activity)
Inverse
QSPR/QSAR
analysis
x: Explanatory
variables
(Descriptors)
Chemical structures
(chemical graphs)
Generating structures based on x information
Inverse QSPR/QSAR Workflow
![Page 8: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/8.jpg)
Challenges in Inverse QSPR/QSAR
y: Objective variable
(property, activity)
Inverse
QSPR/QSAR
analysis
x: Explanatory
variables
(Descriptors)
Chemical structures
(chemical graphs)
Obtaining x information from y
Not considering applicability domain
(AD)
Poor predictability of multiple linear
regression (MLR) models
![Page 9: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/9.jpg)
Applicability Domain (AD)
▪Only inside AD, predicted values produced by regression models
should be reliable.
– Density-based method
– Ensemble-based method
[Figure: sample in descriptor space x]
Baskin, I. I., Kireeva, N. and Varnek, A. Mol. Inform. 29, 581–587, 2010
Kaneko, H. and Funatsu, K. J. Chem. Inf. Model. 54, 2469–2482, 2014
![Page 10: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/10.jpg)
Applicability Domain (AD)
Baskin, I. I., Kireeva, N. and Varnek, A. Mol. Inform. 29, 581–587, 2010
Kaneko, H. and Funatsu, K. J. Chem. Inf. Model. 54, 2469–2482, 2014
[Figure: sample in descriptor space x]
Predicted value is reliable.
▪Only inside AD, predicted values produced by regression models
should be reliable.
– Density-based method
– Ensemble-based method
![Page 11: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/11.jpg)
Applicability Domain (AD)
[Figure: sample in descriptor space x]
Predicted value may be unreliable.
Baskin, I. I., Kireeva, N. and Varnek, A. Mol. Inform. 29, 581–587, 2010
Kaneko, H. and Funatsu, K. J. Chem. Inf. Model. 54, 2469–2482, 2014
▪Only inside AD, predicted values produced by regression models
should be reliable.
– Density-based method
– Ensemble-based method
![Page 12: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/12.jpg)
Applicability Domain (AD)
[Figure: sample and AD region in descriptor space x]
In a density-based method, the probability density function (PDF)
p(x) is the criterion for the AD.
Baskin, I. I., Kireeva, N. and Varnek, A. Mol. Inform. 29, 581–587, 2010
Kaneko, H. and Funatsu, K. J. Chem. Inf. Model. 54, 2469–2482, 2014
▪Only inside AD, predicted values produced by regression models
should be reliable.
– Density-based method
– Ensemble-based method
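The density-based criterion can be illustrated with a small sketch. It is not the method from the cited papers: it assumes an isotropic Gaussian kernel density estimate for p(x), made-up 2-D descriptor vectors, and a threshold set to the lowest density found among the training samples. The function names `kde_density` and `inside_ad` are hypothetical.

```python
import math

def kde_density(x, training, bandwidth):
    """Isotropic Gaussian kernel density estimate of p(x) from training points."""
    d = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    total = 0.0
    for t in training:
        sq_dist = sum((xi - ti) ** 2 for xi, ti in zip(x, t))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(training) * norm)

def inside_ad(x, training, bandwidth, threshold):
    """Treat a sample as inside the AD when its density reaches the threshold."""
    return kde_density(x, training, bandwidth) >= threshold

# Hypothetical, already scaled 2-D descriptor vectors (illustrative values only).
training = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2), (0.0, -0.2)]
bandwidth = 0.5

# Assumed rule: the threshold is the lowest density among the training samples.
threshold = min(kde_density(t, training, bandwidth) for t in training)

near = (0.05, 0.0)  # close to the training samples
far = (5.0, 5.0)    # far from every training sample

print(inside_ad(near, training, bandwidth, threshold))
print(inside_ad(far, training, bandwidth, threshold))
```

In practice the bandwidth and the threshold would have to be chosen from the data, for example by cross-validation; the values here are placeholders.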
![Page 13: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/13.jpg)
Applicability Domain (AD)
▪ In inverse QSPR/QSAR analysis, AD has not been considered.
– Not considering training data information
• Extrapolation is allowed without limitation.
[Figure: sample and AD region in descriptor space x]
In a density-based method, the probability density function (PDF)
p(x) is the criterion for the AD.
![Page 14: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/14.jpg)
Challenges in Inverse QSPR/QSAR
y: Objective variable
(property, activity)
Inverse
QSPR/QSAR
analysis
x: Explanatory
variables
(Descriptors)
Chemical structures
(chemical graphs)
Chemical structure generation
Treating limited variety (number) of
descriptors
Not considering universal AD
![Page 15: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/15.jpg)
Descriptors in Inverse QSPR/QSAR Analysis
▪Only specific types of descriptors have been employed.
– Kier indices, Χ indices, Wiener index.
– Signatures
▪The proper descriptor set varies from project to project.
M. I. Skvortsova, I. I. Baskin, et al., J. Chem. Inf. Comput. Sci., 33, 4, 630–634, 1993.
C. J. Churchwell, et al., J. Mol. Graph. Model., 22, 263–273, 2004.
Kirkpatrick, P. and Ellis, C. Nature, 432, 823–823, 2004.
![Page 16: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/16.jpg)
Universal AD
▪Universal AD is an abstract concept
that is independent of any particular model
– Determined based only on the training data
before constructing any QSPR/QSAR models.
Baskin, I. I., Kireeva, N. and Varnek, A. Mol. Inform. 29, 581–587, 2010
![Page 17: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/17.jpg)
Universal AD
▪Simple example: boiling point model
Seybold, P. G., May, M. and Bagal, U. A. J. Chem. Educ. 64, 575, 1987
$\mathrm{bp}\ [^{\circ}\mathrm{C}] = -126.19 + 33.42\,N_c - 6.286\,T_m$
$N_c$: number of carbon atoms
$T_m$: number of terminal carbon atoms
n = 39
s = 5.86
r2 = 0.987
![Page 18: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/18.jpg)
Universal AD
▪Simple example: boiling point model
$\mathrm{bp}\ [^{\circ}\mathrm{C}] = -126.19 + 33.42\,N_c - 6.286\,T_m$
This equation is valid for C2-C8 alkanes.
Structures to be generated
should be restricted to alkanes.
Seybold, P. G., May, M. and Bagal, U. A. J. Chem. Educ. 64, 575, 1987
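As a worked example, the boiling point equation and its validity range can be written as two small functions. The coefficients and the C2-C8 alkane range come from the slide; the function names and the choice of n-butane (4 carbon atoms, 2 terminal carbon atoms) as the test compound are illustrative assumptions.

```python
def boiling_point_celsius(n_carbon, n_terminal):
    """Boiling point model from the slide: bp = -126.19 + 33.42*Nc - 6.286*Tm."""
    return -126.19 + 33.42 * n_carbon - 6.286 * n_terminal

def in_universal_ad(n_carbon, is_alkane):
    """The model is stated to be valid only for C2-C8 alkanes."""
    return is_alkane and 2 <= n_carbon <= 8

# n-Butane: 4 carbon atoms, 2 terminal (methyl) carbon atoms.
bp_butane = boiling_point_celsius(4, 2)
print(round(bp_butane, 3))         # -5.082
print(in_universal_ad(4, True))    # True: a C4 alkane lies inside the stated range
print(in_universal_ad(12, True))   # False: C12 is outside the C2-C8 range
print(in_universal_ad(4, False))   # False: not an alkane
```

A generator that ignores this check could propose any structure whose two counts happen to give the target value, which is the reason the slide restricts generation to alkanes.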
![Page 19: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/19.jpg)
Review the Challenges
Challenges in inverse QSPR/QSAR analysis
Chemical structure generation
Treating limited variety (number)
of descriptors
Not considering universal AD
Obtaining x information from y
Not considering AD
Poor predictability by MLR
![Page 20: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/20.jpg)
Objective
To develop a practical chemical structure generation
system based on inverse QSPR/QSAR
by overcoming the challenges.
y: Objective variable
(property, activity)
Inverse
QSPR/QSAR
analysis
x: Explanatory
variables
(Descriptors)
Chemical structures
(chemical graphs)
![Page 21: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/21.jpg)
y: predicted property/activity
QSPR/QSAR model
x: Molecular descriptors
Experimental data(compound, property/activity)
Workflow of QSPR/QSAR analysis
a specific y value
QSAR and Inverse QSAR Workflow
![Page 22: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/22.jpg)
y: predicted property/activity
QSPR/QSAR model
x: Molecular descriptors
Experimental data(compound, property/activity)
Workflow of QSPR/QSAR analysis
a specific y value
Inverse QSPR/QSAR analysis
QSPR and Inverse QSPR Workflow
![Page 23: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/23.jpg)
y: predicted property/activity
QSPR/QSAR model
x: Molecular descriptors
Experimental data(compound, property/activity)
Workflow of QSPR/QSAR analysis
a specific y value
Inverse QSPR/QSAR analysis
x information
QSPR and Inverse QSPR Workflow
![Page 24: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/24.jpg)
y: predicted property/activity
QSPR/QSAR model
x: Molecular descriptors
Experimental data(compound, property/activity)
Workflow of QSPR/QSAR analysis
a specific y value
Inverse QSPR/QSAR analysis
x information
Exhaustive chemical
structures based on
x information
QSPR and Inverse QSPR Workflow
![Page 25: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/25.jpg)
Hypothesis
Chemical structures giving a specific y can be
exhaustively generated in inverse QSPR/QSAR by
✓ considering local and universal ADs,
✓ using an efficient structure generator,
✓ introducing a variety of descriptors.
![Page 26: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/26.jpg)
Strategy
To develop a practical chemical structure generation
system based on inverse QSPR/QSAR
Chemical structure generation
Explaining a methodology for
using a variety of descriptors
Describing algorithms for
treating chemical graphs
Obtaining x information from y
Introducing probability density
for treating AD
Explaining a non-linear
regression methodology
![Page 27: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/27.jpg)
2. Structure Generation
Chemical structure generation
Treating limited variety (number)
of descriptors
Not considering universal AD
Challenges
Goal: To develop a structure generator
that overcomes the following challenges:
![Page 28: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/28.jpg)
Proposed Methodologies
Chemical structure generation
Treating limited variety (number)
of descriptors
Not considering universal AD
• Using ring systems and atom fragments as building blocks
in structure generation.
• Introducing monotonously changing descriptors (MCDs)
Challenges
B. D. McKay, J. Algorithms, 26, 306-324, 1998
Miyao, T., Arakawa, M. and Funatsu, K. Mol. Inform. 29, 111–125, 2010
![Page 29: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/29.jpg)
Generation Strategy for Considering Universal AD
Training data
• Ring systems in the training dataset.
• Elements in the training dataset.
Bemis, G. W. and Murcko, M. A. J. Med. Chem. 39, 2887–2893, 1996
Taylor, R. D., MacCoss, M. and Lawson, A. D. G. J. Med. Chem. 57, 5845–5859, 2014
Building blocks (number and kinds)
![Page 30: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/30.jpg)
Structure Generation by Combining Building Blocks
CH2
NH
Ring systems and atom fragments are combined to form
a chemical graph.
![Page 31: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/31.jpg)
Challenges in Structure Generation
▪Generating duplicate structures
– Combinatorial explosion
CH2
NH
![Page 32: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/32.jpg)
Strategies for Pursuing Computational Efficiency
▪Modifying the canonical construction path method
to treat building blocks
– Ensures the uniqueness and exhaustiveness
of the generated structures
▪Using reduced graphs instead of ring systems
– For speeding up graph operation
during structure generation
▪Combining building blocks in a tree-like approach.
B. D. McKay, J. Algorithms, 26, 306-324, 1998
Miyao, T., Kaneko H., Funatsu, K. J. Comput. Aided Mol. Des. 2016, 30, 425-446.
![Page 33: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/33.jpg)
How to Use a Variety of Descriptors
▪Monotonously changing descriptors (MCDs)
– MCDs are descriptors whose values change
monotonically when a building block is added to a
growing structure.
• molecular weight, topological indices
Miyao, T., Arakawa, M. and Funatsu, K. Mol. Inform. 29, 111–125, 2010.
Example
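The practical value of a monotonically changing descriptor is that it allows pruning during growth: once an upper bound is exceeded, no further addition can bring the value back into range. The sketch below illustrates only this idea. It enumerates fragment compositions (multisets) and ignores connectivity, valence, and ring systems, so it is not the generator described in the talk. The fragment weights are approximate and the function name is hypothetical.

```python
# Hypothetical atom fragments with approximate molecular weight contributions.
FRAGMENT_MW = {"CH3": 15.03, "CH2": 14.03, "NH": 15.01, "OH": 17.01}

def enumerate_compositions(min_mw, max_mw):
    """List fragment multisets whose total MW lies within [min_mw, max_mw]."""
    names = sorted(FRAGMENT_MW)
    results = []

    def grow(start, chosen, mw):
        if mw >= min_mw:
            results.append((tuple(chosen), round(mw, 2)))
        # Only fragments at or after `start` are added, so each multiset
        # is produced exactly once.
        for i in range(start, len(names)):
            new_mw = mw + FRAGMENT_MW[names[i]]
            # MW only increases as blocks are added, so a branch that
            # exceeds the upper bound can be discarded as a whole.
            if new_mw > max_mw:
                continue
            chosen.append(names[i])
            grow(i, chosen, new_mw)
            chosen.pop()

    grow(0, [], 0.0)
    return results

hits = enumerate_compositions(40.0, 45.0)
for composition, mw in hits:
    print(composition, mw)
```

A descriptor that can both rise and fall during growth gives no such stopping rule, which is why the descriptor set is restricted to MCDs.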
![Page 34: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/34.jpg)
Regression Model with MCDs

| Descriptors | Opt. compt. | Q² (5-fold) | RMSEcv | R² | RMSEpred | R²pred |
|---|---|---|---|---|---|---|
| MCD | 10 | 0.832 | 0.354 | 0.891 | 0.385 | 0.836 |
| DRAGON | 11 | 0.859 | 0.324 | 0.918 | 0.347 | 0.867 |

▪MCDs: 409 extracted from DRAGON 5 (790 descriptors)
▪Data set: Ligands for alpha 2A adrenergic receptor (GVK)
– y: pKi
– training data: 500, test data: 143
▪Regression method: partial least squares regression
DRAGON for Windows (Software for Molecular Descriptor Calculation) version 5.4.
GVK data base, http://www.gvkbio.com
![Page 35: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/35.jpg)
Partial Conclusions
▪Efficient structure generation algorithms were proposed
that combine ring systems and atom fragments
▪The full set of MCDs in DRAGON has a molecular description ability
comparable to that of the comprehensive descriptor set
– High predictability of a PLS regression model for the alpha 2A adrenoceptor.
![Page 36: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/36.jpg)
3. Inverse QSPR/QSAR Analysis (from y to x)
Challenges
Goal: To develop an inverse QSPR/QSAR methodology
that overcomes the following challenges:
Obtaining x information from y
Not considering applicability
domain
Poor predictability by MLR
![Page 37: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/37.jpg)
Proposed Methodologies
• Probability density function (PDF)
• Gaussian mixture models (GMMs)
• Pseudo nonlinear regression methodology
• GMMs and cluster-wise multiple linear regression (GMMs/cMLR)
Challenges
Obtaining x information from y
Not considering applicability domain
Poor predictability by MLR
Miyao, T., Kaneko, H. and Funatsu, K. J. Chem. Inf. Model. 56, 286–299, 2016.
In order to consider AD
![Page 38: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/38.jpg)
Proposed Methodologies
• GMM: p(x)
• GMMs/cMLR: p(y|x)
Posterior density: p(x|y)
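$$p(\mathbf{x}) = \sum_{i=1}^{M} \pi_i \, N(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$$
$$p(\mathbf{x} \mid y) = \sum_{i=1}^{M} \pi_i'(y) \, N(\mathbf{x} \mid \mathbf{m}_i(y), \mathbf{S}_i)$$
Here $\pi_i'(y)$, $\mathbf{m}_i(y)$, and $\mathbf{S}_i$ denote the posterior weight, mean, and covariance of the i-th Gaussian, obtained from $\boldsymbol{\mu}_i$, $\boldsymbol{\Sigma}_i$, and the cluster-wise regression coefficients by Bayes' theorem.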
Prior distribution (p(x)) Posterior distribution (p(x|y))
Molecular weight Molecular weight
y value
In order to consider AD
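The step from p(x) and p(y|x) to p(x|y) can be worked out in closed form under one assumption that the slide does not state explicitly: within cluster i, y follows a linear model with Gaussian noise, $p(y \mid \mathbf{x}, i) = N(y \mid \mathbf{a}_i^{T}\mathbf{x} + b_i, \sigma_i^{2})$. The symbols $\mathbf{a}_i$, $b_i$, $\sigma_i^{2}$, $\mathbf{m}_i$, $\mathbf{S}_i$, and $w_i$ are introduced here for illustration and may differ from the notation of the cited paper. With the prior $p(\mathbf{x}) = \sum_i \pi_i N(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$, the standard linear-Gaussian result gives a posterior that is again a Gaussian mixture:

```latex
\begin{aligned}
p(\mathbf{x} \mid y) &= \sum_{i=1}^{M} w_i(y)\, N\!\left(\mathbf{x} \mid \mathbf{m}_i(y), \mathbf{S}_i\right), \\
\mathbf{S}_i &= \left(\boldsymbol{\Sigma}_i^{-1} + \sigma_i^{-2}\, \mathbf{a}_i \mathbf{a}_i^{T}\right)^{-1}, \\
\mathbf{m}_i(y) &= \mathbf{S}_i \left(\boldsymbol{\Sigma}_i^{-1} \boldsymbol{\mu}_i + \sigma_i^{-2}\, \mathbf{a}_i \,(y - b_i)\right), \\
w_i(y) &= \frac{\pi_i\, N\!\left(y \mid \mathbf{a}_i^{T}\boldsymbol{\mu}_i + b_i,\; \sigma_i^{2} + \mathbf{a}_i^{T}\boldsymbol{\Sigma}_i \mathbf{a}_i\right)}
{\sum_{j=1}^{M} \pi_j\, N\!\left(y \mid \mathbf{a}_j^{T}\boldsymbol{\mu}_j + b_j,\; \sigma_j^{2} + \mathbf{a}_j^{T}\boldsymbol{\Sigma}_j \mathbf{a}_j\right)}.
\end{aligned}
```

The posterior covariance does not depend on y, while the means and weights shift with the target value.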
![Page 39: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/39.jpg)
Evaluation of the Posterior PDF as a Criterion for AD
▪Aqueous solubility dataset
– training data: 900
– test data: 254
– Objective variable: LogS
– Descriptors (6):
• Molecular weight (MW)
• Hydrogen bond donor (HBD)
• Hydrogen bond acceptor (HBA)
• Number of rings (CIC)
• Topological polar surface area (TPSA)
• Number of rotatable bonds (nBR)
Hou, T J and Xia, K and Zhang, W and Xu, X J, J. chem. inf. comput. sci., 44, 266-275, 2004
![Page 40: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/40.jpg)
QSPR Model Construction
▪7 Gaussians formed a prior PDF: p(x)
▪With the Gaussians, GMMs/cMLR model was constructed.

| Model | R² | RMSE | R²pred | RMSEpred |
|---|---|---|---|---|
| MLR | 0.736 | 1.061 | 0.722 | 1.131 |
| GMMs/cMLR | 0.853 | 0.791 | 0.854 | 0.820 |

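The slides do not say how R² and RMSE were computed. The sketch below assumes the usual definitions, root-mean-square error and the coefficient of determination, and applies them to a few made-up LogS values.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only; not taken from the solubility dataset.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

When the same functions are applied to held-out test data, they give the "pred" variants reported in the table.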
![Page 41: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/41.jpg)
Posterior PDF of x Given a Specific y Value p(x|y)
[Figure: test dataset plotted on log(p(x|y=-6)) and log(p(x)) axes; green: far from the target y value, brown: close to the target y value]
• The posterior can inherit features of the prior distribution
![Page 42: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/42.jpg)
Partial Conclusions
▪The AD was taken into account through the posterior PDF
obtained with GMMs and cluster-wise multiple linear
regression (cMLR)
▪The posterior PDF also carries information
about how close a sample is to the target y value.
▪GMMs/cMLR showed better predictability than
MLR did.
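A one-dimensional numerical sketch shows how a posterior p(x|y) can be computed from a GMM prior and cluster-wise linear models. It assumes that each cluster i has a prior N(x | mu, s2) with weight pi and a local model y = a*x + b with Gaussian noise variance n2. All parameter values are invented, and the code is not the implementation used in the talk.

```python
import math

def normal_pdf(v, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical clusters: prior weight pi, prior mean mu and variance s2 for x,
# local linear model y = a * x + b with noise variance n2.
CLUSTERS = [
    {"pi": 0.6, "mu": 200.0, "s2": 400.0, "a": -0.02, "b": 1.0, "n2": 0.25},
    {"pi": 0.4, "mu": 350.0, "s2": 900.0, "a": -0.01, "b": -1.5, "n2": 0.25},
]

def posterior_components(y):
    """Return weight, mean, and variance of each posterior Gaussian for a given y."""
    comps = []
    for c in CLUSTERS:
        post_var = 1.0 / (1.0 / c["s2"] + c["a"] ** 2 / c["n2"])
        post_mean = post_var * (c["mu"] / c["s2"] + c["a"] * (y - c["b"]) / c["n2"])
        # Marginal likelihood of y under this cluster.
        evidence = normal_pdf(y, c["a"] * c["mu"] + c["b"],
                              c["n2"] + c["a"] ** 2 * c["s2"])
        comps.append({"w": c["pi"] * evidence, "mean": post_mean, "var": post_var})
    total = sum(k["w"] for k in comps)
    for k in comps:
        k["w"] /= total  # normalize so the weights sum to one
    return comps

def posterior_pdf(x, y):
    """Evaluate p(x | y) as a mixture of the posterior Gaussians."""
    return sum(k["w"] * normal_pdf(x, k["mean"], k["var"])
               for k in posterior_components(y))

for k in posterior_components(-4.0):
    print(k)
```

The printed means are the Gaussian centers of the posterior for the chosen target value, and x values with a low posterior density are either outside the AD or unlikely to give the target y.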
![Page 43: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/43.jpg)
4. Proposed Workflow for Inverse QSPR/QSAR
Miyao, T., Arakawa, M., Funatsu, K., Mol. Inform., 29, 111–125, 2010
Miyao, T., Kaneko, H., Funatsu, K., Mol. Inform., 33, 764-778, 2014
QSPR/QSAR model with cMLR: p(y|x)
Descriptors: MCDs
Dataset for QSPR/QSAR model
Prior distribution (GMM): p(x)
Atom fragments with hydrogen atoms
Canonical ring systems
Posterior distribution by Bayes' theorem: p(x|y)
Structure generation by Molgilla*
Desired property/activity: y
Constraints in the generation process based on the center of a Gaussian
… CH3 CH2 CH NH2 NH ...
*Molgilla is a structure generator developed by our lab.
![Page 44: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/44.jpg)
Case Study (Inverse QSAR)
▪ Target: Thrombin
▪ Dataset: from ChEMBL 20,
1705 samples annotated with pKi (inhibition constant)
– confidence score > 7
– bioactivity type Ki
– assay type = B
▪ Peptides were eliminated (more than 10 amide bonds or MW > 1,000)
▪ Descriptors: 27 MCDs
CIC, R05, aR, ZM1V, nBM, nHAcc Lipin, nCH2R2, nCHR3, nCH3R,
nCH3X, nOH, =O, nArNR2, nArCO, TPSA, LL, LD, LP,
AA, AP, AN, DD, RL, RA, RD, RP, RR
Blue: Sum of topological distance-based descriptors - inspired by the Chemically Advanced Template Search (CATS)
Reutlinger, M., Koch, C. P., Reker, D., Todoroff, N., Schneider, P., Rodrigues, T., Schneider, G., Mol. Inform. 32,
133–138. 2013
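The dataset preparation rules listed on this slide can be sketched as a record filter. The field names (`confidence_score`, `type`, `assay_type`, `amide_bonds`, `mw`, `ki_nm`) and the example records are hypothetical and do not reflect the actual ChEMBL schema. The conversion assumes Ki is given in nM, so that pKi = 9 - log10(Ki).

```python
import math

def pki_from_ki_nm(ki_nm):
    """Convert Ki in nM to pKi = -log10(Ki in mol/L)."""
    return 9.0 - math.log10(ki_nm)

def keep_record(rec):
    """Apply the slide's filters: confidence, activity type, assay type, non-peptide."""
    return (rec["confidence_score"] > 7
            and rec["type"] == "Ki"
            and rec["assay_type"] == "B"
            and rec["amide_bonds"] <= 10
            and rec["mw"] <= 1000.0)

# Made-up records for illustration.
records = [
    {"id": "cpd1", "confidence_score": 9, "type": "Ki", "assay_type": "B",
     "amide_bonds": 2, "mw": 420.5, "ki_nm": 1.0},
    {"id": "cpd2", "confidence_score": 7, "type": "Ki", "assay_type": "B",
     "amide_bonds": 1, "mw": 380.2, "ki_nm": 50.0},    # confidence not above 7
    {"id": "cpd3", "confidence_score": 9, "type": "IC50", "assay_type": "B",
     "amide_bonds": 0, "mw": 300.1, "ki_nm": 10.0},    # wrong activity type
    {"id": "cpd4", "confidence_score": 9, "type": "Ki", "assay_type": "B",
     "amide_bonds": 12, "mw": 1450.0, "ki_nm": 0.5},   # peptide-like
]

dataset = [(r["id"], pki_from_ki_nm(r["ki_nm"])) for r in records if keep_record(r)]
print(dataset)
```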
![Page 45: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/45.jpg)
p(x) and p(y|x) with GMMs/cMLR
[Figure: p(x), density versus topological polar surface area (TPSA) [Å2]; 8 Gaussians formed p(x)]
[Figure: p(y|x), predicted pKi versus observed pKi; RMSE: 0.993, R2: 0.656]
![Page 46: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/46.jpg)
Posterior Density p(x|y)
[Figure: posterior density p(x|y = 11) over the x coordinates, shown with y = f(x) and the y value]
Target Gaussian for structure generation.
Numbers: the Gaussian centers
![Page 47: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/47.jpg)
Structure Generation Conditions and Result
▪ Stochastic generation was conducted in order to generate structures having
more building blocks.
▪ Prohibition rules were applied to avoid generating structures having
reactive and unstable substructures
▪ Ring systems: 289
▪ Atom fragments: C, N, O, F, Cl, Br, I
Generation result: 1,739 generated structures
![Page 48: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/48.jpg)
Generated Structures
d
p(x|y =11): 2.22
predicted pKi: 9.03
p(x|y =11): 1.02
predicted pKi: 9.09
c
b
p(x|y =11): 0.29
predicted pKi: 9.51
![Page 49: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/49.jpg)
Generated Structures
p(x|y =11): 0.04
predicted pKi: 9.97
p(x|y =11): 8.35
predicted pKi: 8.29
a
e
![Page 50: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/50.jpg)
Partial Conclusions
▪A structure generation system based
on inverse QSPR/QSAR was constructed
– A Gaussian center of the posterior PDF p(x|y) is
selected to define the structure generation constraints
▪As a practical application, small-molecule
thrombin inhibitor candidates were generated.
![Page 51: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/51.jpg)
Summary
▪Efficient chemical graph construction algorithms were introduced.
– ring systems and atom fragments combination
– constraints by MCDs are considered during generation
▪ Inverse QSPR/QSAR analysis methodology was proposed.
– by introducing PDFs with GMMs/cMLR
• AD consideration
• higher predictability than with MLR
▪A structure generation system combining these methodologies
was proposed in order to generate chemical structures de novo.
![Page 52: Chemical Structure Generation Based on Inverse ...infochim.u-strasbg.fr/CS3_2018/Presentations/... · Workflow of QSPR/QSAR analysis a specific y value Inverse QSPR/QSAR analysis](https://reader030.fdocuments.in/reader030/viewer/2022040913/5e8a10e200887465e812b3d0/html5/thumbnails/52.jpg)
Acknowledgements
– Dr. Miyao, Tomoyuki
– Dr. Kaneko, Hiromasa
– Dr. Escobar, Matheus
– Dr. Tanaka, Kenichi
– Prof. Dr. Schneider, Gisbert
– Dr. Schneider, Petra