QSAR/QSPR: the Universal Approach to the Prediction of Properties of
Chemical Compounds and Materials
V.A.Palyulin, I.I.Baskin, N.S.Zefirov
Department of Chemistry Moscow State University
"Every attempt to employ mathematical methods in the study of chemical questions must be considered profoundly irrational and contrary to the spirit of chemistry. If mathematical analysis should ever hold a prominent place in chemistry - an aberration which is happily almost impossible - it would occasion a rapid and widespread degeneration of that science."
A. Comte, 1798-1857
Fundamental Problem in Chemistry:
Evaluation of relationships
between the structures of chemical compounds and
their properties or biological activity
QSAR/QSPR: General Approach
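[Diagram: a table with the columns A (activity or property), Structure, and Descriptors, and with rows for the training, test, and new compounds; A is unknown ("?") for the new compounds. The training set gives the model F: A = F(S); the test set gives the error ΔA, i.e. the predictivity of the model; the model is then used for prediction on the new compounds. Example structures: nitrogen heterocycles bearing Cl and Br substituents.]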
PROPERTIES
Physico-chemical properties:
Boiling points, melting points, density, viscosity, surface tension, solubility in various solvents, lipophilicity, magnetic susceptibility, retention indices, dipole moments, enthalpy of formation, etc.
Biological activity:
IC50, EC50, LD50, MEC, ILS, etc.
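Structural formula, Molecular graph, Connectivity

Example: ethanol, C2H6O (atoms C1, C2, O, H1-H6; H1-H3 on C1, H4-H5 on C2, H6 on O)

Connectivity matrix of the full molecular graph (diagonal omitted):

      C1  C2  O   H1  H2  H3  H4  H5  H6
C1    -   1   0   1   1   1   0   0   0
C2    1   -   1   0   0   0   1   1   0
O     0   1   -   0   0   0   0   0   1
H1    1   0   0   -   0   0   0   0   0
H2    1   0   0   0   -   0   0   0   0
H3    1   0   0   0   0   -   0   0   0
H4    0   1   0   0   0   0   -   0   0
H5    0   1   0   0   0   0   0   -   0
H6    0   0   1   0   0   0   0   0   -

Connectivity matrix of the hydrogen-suppressed graph:

      C1  C2  O
C1    -   1   0
C2    1   -   1
O     0   1   -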
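The connectivity matrices of ethanol can be rebuilt from a plain bond list. The following is a minimal sketch; the function names are illustrative and not taken from the slides, and the diagonal is stored as 0.

```python
# Atoms and bonds of ethanol (CH3-CH2-OH), labelled as on the slide.
atoms = ["C1", "C2", "O", "H1", "H2", "H3", "H4", "H5", "H6"]
bonds = [("C1", "C2"), ("C2", "O"),
         ("C1", "H1"), ("C1", "H2"), ("C1", "H3"),
         ("C2", "H4"), ("C2", "H5"), ("O", "H6")]


def adjacency_matrix(atoms, bonds):
    """Return the symmetric 0/1 connectivity matrix for the given atom order."""
    index = {label: i for i, label in enumerate(atoms)}
    matrix = [[0] * len(atoms) for _ in atoms]
    for a, b in bonds:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def hydrogen_suppressed(atoms, bonds):
    """Drop hydrogen atoms and every bond that involves one."""
    heavy = [a for a in atoms if not a.startswith("H")]
    heavy_bonds = [(a, b) for a, b in bonds if a in heavy and b in heavy]
    return heavy, adjacency_matrix(heavy, heavy_bonds)


full = adjacency_matrix(atoms, bonds)
heavy_atoms, suppressed = hydrogen_suppressed(atoms, bonds)
```

Here `full` is the 9 x 9 matrix and `suppressed` is the 3 x 3 matrix over C1, C2, O.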
DESCRIPTORS
Topological indices: connectivity indices (Randić $\chi$; Kier-Hall $^m\chi^v$; solvation indices $^m\chi^s$), Wiener W and expanded Wiener, Balaban J, Gutman indices, Hosoya, Merrifield-Simmons indices, indices based on local invariants, informational indices, …
Fragmental descriptors: the number of fragments of various size (chains, cycles, branched fragments) in a molecule with several levels of classification of atoms
Physico-chemical descriptors: indices based on atomic charges and electronegativities, atomic inductive constants, VdW volume and surface, H-bond descriptors, lipophilicity (Log P), …
Quantum-mechanical descriptors
3D descriptors
Usp.Khim. (Russ.Chem.Rev.), 57 (3), 337-366 (1988)
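Randić Index ($\chi$)

$\chi = \sum_{\text{bonds}} \frac{1}{\sqrt{v_i v_j}}$, where $v_i$ and $v_j$ are the degrees of the two vertices joined by the bond.

Example: 2-methylbutane, CH3-CH(CH3)-CH2-CH3, with the carbon atoms numbered C1-C2(-C5)-C3-C4 and the vertex degrees 1, 3, 2, 1, 1 for C1 ... C5.

$\chi = 1/\sqrt{3} + 1/\sqrt{3} + 1/\sqrt{6} + 1/\sqrt{2} = 2.27$

Connectivity matrix of the hydrogen-suppressed graph (diagonal omitted):

      C1  C2  C3  C4  C5
C1    -   1   0   0   0
C2    1   -   1   0   1
C3    0   1   -   1   0
C4    0   0   1   -   0
C5    0   1   0   0   -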
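The worked value for 2-methylbutane can be checked with a few lines of code. This is a minimal sketch that takes the hydrogen-suppressed bond list as input; the function names are illustrative.

```python
from math import sqrt

# Hydrogen-suppressed graph of 2-methylbutane, numbered as on the slide:
# C1-C2, C2-C3, C3-C4, and the methyl branch C2-C5.
bonds = [(1, 2), (2, 3), (3, 4), (2, 5)]


def vertex_degrees(bonds):
    """Count the non-hydrogen neighbours of every atom."""
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return degree


def randic_index(bonds):
    """Sum 1/sqrt(v_i * v_j) over all bonds of the graph."""
    degree = vertex_degrees(bonds)
    return sum(1.0 / sqrt(degree[i] * degree[j]) for i, j in bonds)


chi = randic_index(bonds)
print(round(chi, 2))
```

The four terms are 1/sqrt(3) for C1-C2 and C2-C5, 1/sqrt(6) for C2-C3, and 1/sqrt(2) for C3-C4.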
Prediction of Non-Specific Solvation Enthalpy of Organic Compounds
Dokl. Akad. Nauk, 1993, 331(2), 173-176
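Solvation enthalpy (kJ/mol): n = 141, R = 0.985, s = 2.1
Vaporization enthalpy (kJ/mol): n = 528, R = 0.989, s = 2.0
[Equations: linear regressions of the solvation and vaporization enthalpies on the descriptors defined below; the coefficients are not recoverable from the transcript]

μ – dipole moment
1χS – first-order solvation topological index, $^1\chi^S = \frac{1}{4} \sum_{\text{bonds}} \frac{Z_i Z_j}{\sqrt{\delta_i \delta_j}}$
Zi – period number (measure of atom size)
δi – number of non-hydrogen neighbors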
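The solvation index can be sketched in code under one stated assumption: that the first-order index has the form 1χS = (1/4) Σ ZiZj / sqrt(δiδj) over all bonds, which is how the garbled formula on this slide is read here. The `PERIOD` table and the function name are illustrative. With this normalization every carbon has Z = 2, so the index of a hydrocarbon coincides with its Randić index.

```python
from math import sqrt

# Period numbers of a few common elements (row of the periodic table).
PERIOD = {"C": 2, "N": 2, "O": 2, "F": 2, "S": 3, "Cl": 3, "Br": 4}


def solvation_index(elements, bonds):
    """First-order solvation index, ASSUMING the form
    (1/4) * sum over bonds of Z_i * Z_j / sqrt(delta_i * delta_j)."""
    delta = [0] * len(elements)          # non-hydrogen neighbour counts
    for i, j in bonds:
        delta[i] += 1
        delta[j] += 1
    total = sum(PERIOD[elements[i]] * PERIOD[elements[j]]
                / sqrt(delta[i] * delta[j]) for i, j in bonds)
    return total / 4.0


# 2-methylbutane (hydrogen-suppressed, zero-based atom numbers).
isopentane = solvation_index(["C"] * 5, [(0, 1), (1, 2), (2, 3), (1, 4)])

# 1-chloropropane, CH3-CH2-CH2-Cl.
chloropropane = solvation_index(["C", "C", "C", "Cl"],
                                [(0, 1), (1, 2), (2, 3)])
```

For 1-chloropropane the C-Cl bond contributes 2·3/sqrt(2·1), which is where the larger atom size of chlorine enters.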
The scheme of the design of new topological indices (TIs)
1) Selection of fragments
2) Construction of graph matrices and their storage
3) Selection of functions
4) Construction of topological indices: a) using matrices; b) using already constructed TIs
5) The set of constructed TIs for QSAR/QSPR studies
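As one illustration of constructing an index by applying a function to a graph matrix, the sketch below builds the topological distance matrix and applies a half-sum to it, which gives the Wiener index W. The code is an assumption-laden example rather than the authors' software; the function names are hypothetical and the graph is assumed to be connected.

```python
from collections import deque


def distance_matrix(n_atoms, bonds):
    """Topological distances (bond counts) by breadth-first search.
    Assumes a connected graph with atoms numbered 0 .. n_atoms - 1."""
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    matrix = []
    for start in range(n_atoms):
        row = [None] * n_atoms
        row[start] = 0
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in neighbors[a]:
                if row[b] is None:
                    row[b] = row[a] + 1
                    queue.append(b)
        matrix.append(row)
    return matrix


def half_sum(matrix):
    """One possible 'function' step: half the sum of all matrix elements."""
    return sum(sum(row) for row in matrix) // 2


# 2-methylbutane and n-pentane, hydrogen-suppressed, zero-based numbering.
wiener_isopentane = half_sum(distance_matrix(5, [(0, 1), (1, 2), (2, 3), (1, 4)]))
wiener_pentane = half_sum(distance_matrix(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))
```

Swapping `half_sum` for another function, or the distance matrix for another graph matrix, yields a different index from the same machinery.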
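Prediction of Diffusion of Small Molecules in Polymers
Dokl. Akad. Nauk, 1994, 337(2), 211-214

[Equation: regression of log D on logarithmic terms in the extended and inverted extended Wiener indices, scaled by Nat, and on the squared minimal HOMO π-electron density; the coefficients are not recoverable from the transcript]
n = 14, R = 0.989, s = 0.103, F = 145

D – diffusion coefficient (cm2/s)
Nat – number of non-hydrogen atoms
min ρHOMO – minimal HOMO π-electron density
$\tilde{W}$, $\tilde{\tilde{W}}$ – extended and inverted extended Wiener indices

[Plot: log D predicted vs. log D experimental]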
Sulfenamide Vulcanization Accelerators
Dokl. Akad. Nauk, 1993, 333(2), 189-192

[Structure: general formula of the sulfenamide accelerators with substituents R1 and R2]

Modelled properties:
Resistance to preliminary vulcanization (min)
Vulcanization rate constant (min-1)
Maximum torque increase (N·m)

[Equations: three regression models, one per property, in the descriptors defined below; the coefficients are not recoverable from the transcript]
Statistics of the three models, in the order given on the slide:
n = 12, R = 0.989, s = 0.004, F = 444
n = 12, R = 0.990, s = 0.15, F = 213
n = 12, R = 0.989, s = 0.054, F = 134

N – number of non-hydrogen atoms
max ρLUMO(C) – maximum carbon LUMO π-electron density
Sm – molecular electronegativity
SI(C), SImax – indices based on atomic induction effect parameters
Prediction of Mutagenicity of Substituted Biphenyls
Dokl. Akad. Nauk, 1993, 332(5), 587-589

Model 1: regression of ln(Nhis+) on the fragment counts Fr1, Fr2, Fr3
n = 19, R = 0.94, s = 0.75, F = 39.3

Model 2: regression of ln(Nhis+) on the quantum-chemical descriptors d1, d2, d3, d4
n = 19, R = 0.95, s = 0.69, F = 35

[The coefficients of both equations are not recoverable from the transcript]

Nhis+ – number of revertants
Fr1-3 – number of fragments [structures of the fragments Fr1, Fr2, Fr3]
d1 – minimum squared C-atom LUMO contribution
d2 – minimum squared N-atom LUMO contribution
d3 – maximum C-atom free valence index
d4 – average O-atom free valence index

[Plots: ln(Nhis+) predicted vs. ln(Nhis+) experimental, one per model]
Fragmental Descriptors
Fragmental descriptors are the numbers of fragments of various kinds and sizes (chains, cycles, branched fragments) in a molecule, counted at several levels of classification of atoms. Hundreds of fragmental descriptors can be computed for each molecule.
If a structure-property data set is large enough to allow building statistically significant models, then any topological index can be replaced with a set of substructural (or fragmental) descriptors.
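A minimal sketch of how chain (path) fragments could be counted is given below. It is not the authors' fragment generator: atoms are classified by element symbol only, which is assumed here to be the coarsest classification level, and cycles and branched fragments are left out. The function name is hypothetical.

```python
from collections import Counter


def count_path_fragments(elements, bonds, max_atoms=3):
    """Count linear chains of 1 .. max_atoms atoms, keyed by element sequence."""
    neighbors = {i: set() for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)
    counts = Counter()

    def extend(path):
        # A chain is found once from each end; count only one direction.
        if path[0] <= path[-1]:
            labels = [elements[i] for i in path]
            # C-O and O-C are the same fragment: keep the smaller spelling.
            counts["-".join(min(labels, labels[::-1]))] += 1
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(elements)):
        extend([start])
    return counts


# Ethanol, hydrogen-suppressed: C-C-O.
ethanol = count_path_fragments(["C", "C", "O"], [(0, 1), (1, 2)])
```

Each key of the returned counter is one descriptor and its count is the descriptor value, so a data set of molecules becomes a table of fragment counts.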
NEURAL NETWORK SOFTWARE: NASAWIN

Parameters of models for the predicted properties
[The slide lists five property names for six data columns; the name of the second column is not preserved in the transcript and is shown as "?"]

                           Boiling    ?        log η,    d20,      log VP,   Log P
                           point, °C           Pa·s      g/cm3     Pa
Number of compounds        509        531      367       803       352       7805
Average number of
selected descriptors       46         54       46        69        56        741

Neural network model
Rav                        0.9920     0.9960   0.9885    0.9980    0.9981    0.9827
RMStrain                   8.7        3.7      0.084     0.021     0.090     0.3233
RMSval                     14.2       4.5      0.104     0.046     0.122     0.3936
RMSpred                    16.6       5.4      0.141     0.051     0.0152    0.3968

MLR
Rav                        0.9814     0.9946   0.9794    0.9885    0.9902    0.9702
RMStrain                   12.9       4.3      0.111     0.038     0.198     0.4171
RMSval                     16.7       5.0      0.195     0.055     0.248     0.4541
RMSpred                    18.6       5.5      0.212     0.067     0.258     0.4324

Fragment types, by column:
Column 1: p1, p2, p3
Column 2: p1, p2, p3, p4, p5, p8, c3, c4, c5, c6, c7, c8, c9, s4, b0, b1, b4, b5
Column 3: p1, p2, p3
Column 4: p1, p2, p3, p4, p5, c4, c5, s4, s5
Column 5: p1, p2
Column 6: px, cx, sx, bx, tx
Fragmental descriptors in QSPR
[Plots, one per model:]
Water Solubility
Boiling point (1): diverse set of 885 compounds; fragment types p1, p2, p3, p4, p5, p6, c3, c4, c5, c6, s4, s5, s6
Boiling point (2)
Anticoccidial Activity of Triazinediones [structure: triazinedione scaffold with substituents X and R1-R5]
Glass Transition Temperature of Polymers
Molar Heat Capacity of Polymers in the Liquid State
Architecture of the Neural Device for Direct QSAR
Neural device in application to the propane molecule, CH3-CH2-CH3 (atoms numbered 1, 2, 3):
SENSOR FIELD (each sensor detects the number of the attached hydrogen atoms)
EYE 1 ("looks" at atoms): (1), (2), (3)
EYE 2 ("looks" at bonds): (1,2), (2,1), (2,3), (3,2)
BRAIN
Baskin, I. I.; Palyulin, V. A.; Zefirov, N. S., J. Chem. Inf. Comput. Sci., 37, 715 (1997)
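One way to read the diagram is sketched below. It is an illustration only, not the published architecture: the functions `eye1`, `eye2`, and `brain`, their weights, and the use of a plain sum to combine the signals are all hypothetical. What the sketch does show is why applying the same eye to every atom and to every ordered bonded pair makes the output independent of atom numbering.

```python
from math import tanh


def eye1(h):
    """Hypothetical 'eye' applied to one atom (input: attached H count)."""
    return tanh(0.5 * h - 0.2)


def eye2(h_a, h_b):
    """Hypothetical 'eye' applied to one ordered pair of bonded atoms."""
    return tanh(0.3 * h_a - 0.1 * h_b + 0.05)


def brain(atom_signal, bond_signal):
    """Hypothetical output unit combining the summed eye signals."""
    return 1.0 * atom_signal + 0.5 * bond_signal


def ordered_pairs(bonds):
    """Each bond is seen in both directions, e.g. (1,2) and (2,1)."""
    return [(a, b) for a, b in bonds] + [(b, a) for a, b in bonds]


def neural_device(hydrogens, bonds):
    atom_signal = sum(eye1(h) for h in hydrogens.values())
    bond_signal = sum(eye2(hydrogens[a], hydrogens[b])
                      for a, b in ordered_pairs(bonds))
    return brain(atom_signal, bond_signal)


# Propane with the slide's numbering, and the same molecule renumbered.
propane = ({1: 3, 2: 2, 3: 3}, [(1, 2), (2, 3)])
renumbered = ({1: 2, 2: 3, 3: 3}, [(1, 2), (1, 3)])
```

Calling `neural_device(*propane)` and `neural_device(*renumbered)` gives the same value up to floating-point rounding, since both present the same set of atoms and ordered pairs to the eyes.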
EXAMPLES OF THE DIRECT STRUCTURE-PROPERTY CORRELATIONS
Baskin, I. I.; Palyulin, V. A.; Zefirov, N. S., J. Chem. Inf. Comput. Sci., 37, 715 (1997)

PROPERTY                            Class of compounds                          Correlation coefficient
boiling point                       alkanes                                     0.999
viscosity                           hydrocarbons                                0.996
heat of evaporation                 hydrocarbons                                0.996
density                             hydrocarbons                                0.971
heat of solvation in cyclohexane    mixed set of compounds                      0.985
polarizability                      mixed set of compounds                      0.995
anesthetic pressure of gases        mixed set of organic and inorganic gases    0.990
New approach in QSAR: Neural Quantitative Structure-Conditions-Property Relationships

Investigated property (number of entries; R; St; Sv):
Boiling point of hydrocarbons under different pressures (°C): 14346; 0.9996; 2.8; 2.8
Dynamic viscosity of hydrocarbons under different temperatures (ln units): 3426; 0.9949; 0.14; 0.16
Density of hydrocarbons under different temperatures (g/ml): 3056; 0.9977; 0.0063; 0.0063
Acid hydrolysis rate constants for carboxylic acid esters under different temperatures and different 2-component solvent composition (log units): 2092; 0.9669; 0.27; 0.34

R – correlation coefficient; St and Sv – RMSE for the training and validation sets
Molecular Field Topology Analysis (MFTA)
Palyulin, V. A.; Radchenko, E. V.; Zefirov, N. S., J. Chem. Inf. Comput. Sci., 40, 659 (2000)

Construction of Molecular Supergraph
Construction of Descriptor Matrix [schematic: columns Q1 ... QN, R1 ... RN of local descriptor values]
Model building
Generation of novel promising structures

[Structure: example scaffold with a variable substituent R]

Local descriptors:
- Electrostatic
- Steric
- Lipophilic
- Hydrogen bonding
- Stereochemical
- Topological
Molecular Supergraph Construction
[Figure: the training-set structures 1), 2), 3), 4), 5), ..., n), including analogues with a substituent R, shown one after another as they are combined into the molecular supergraph]
Local Descriptors
Sufficient coverage of major interaction types
Easy extension of the descriptor set

Electrostatic
- Gasteiger's atomic charge Q (electronegativity equalization)
- Absolute atomic charge Qa = abs(Q)
- Sanderson's electronegativity
- Electrotopological state ETS (Hall, Mohney, Kier)
Steric
- Bondi's van der Waals radius R
- Atomic contribution to the molecular van der Waals surface S
- Relative steric accessibility A = S/Sfree
Lipophilic
- Atomic lipophilicity contribution La (environment-dependent - Ghose, Crippen)
- Group lipophilicity Lg (atom and attached hydrogens)
Hydrogen bonding
- Hydrogen bond donor (Hd) and acceptor (Ha) ability of an atom (Abraham)
Stereochemical
- Local stereochemical indicator variables
Topological
- Site occupancy factors for atoms Pa and bonds Pb (1 if a feature is present)
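The sketch below illustrates how local descriptors placed on supergraph positions could give every molecule a descriptor row of the same length. It is a simplified assumption, not the MFTA program: the mapping of atoms to positions is taken as given, only two descriptors are used (Bondi's radius R and the atom occupancy Pa), and the value 0.0 for unoccupied positions is a hypothetical default.

```python
# Bondi van der Waals radii in angstroms for a few elements.
BONDI_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "Cl": 1.75, "Br": 1.85}


def descriptor_row(mapping, n_positions, empty_value=0.0):
    """Build one row of the descriptor matrix.

    mapping: supergraph position -> element symbol of the atom placed there.
    Positions absent from the mapping are unoccupied in this molecule.
    """
    row = []
    for position in range(n_positions):
        element = mapping.get(position)
        occupied = element is not None
        row.append(BONDI_RADIUS[element] if occupied else empty_value)  # R
        row.append(1 if occupied else 0)                                # Pa
    return row


# Two hypothetical molecules mapped onto a four-position supergraph.
molecule_a = descriptor_row({0: "N", 1: "C", 2: "C"}, n_positions=4)
molecule_b = descriptor_row({0: "N", 1: "C", 2: "C", 3: "Cl"}, n_positions=4)
matrix = [molecule_a, molecule_b]
```

Because every row has two columns per supergraph position, molecules with different numbers of atoms can be stacked into one matrix for model building.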
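Affinity of substituted 2,5-diazabicyclo[2.2.1]heptanes to nicotinic acetylcholine receptor
Training set: 31 compounds

[Structure: 2,5-diazabicyclo[2.2.1]heptane scaffold with substituents R1 and R2 on the nitrogen atoms]
R1 = H, Me, CH2CN
R2 = [structures of the R2 groups: nitrogen-containing heteroaromatic rings bearing substituents R]
R = H, Me, F, Cl, Br, OH, NH2, OMe, CN, CH2NH2, CONH2, NO2, PhCOO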
Affinity of substituted 2,5-diazabicyclo[2.2.1]heptanes to nicotinic acetylcholine receptor
Ki – inhibition of competitive binding
MED – minimum effective dose (hot plate test)

Activity       Descriptors          F    R        Q2
lg(1/Ki)       Q, R, Ha, Hd, Lg     7    0.960    0.850
lg(1/MED)      Q, R                 4    0.977    0.918

[Plot: predicted vs. experimental lg(1/Ki), both axes from -5 to 3, with the fit line]
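The slides report Q2 next to R without saying how it was computed. The sketch below assumes the common leave-one-out definition, Q2 = 1 - PRESS/SS, and uses a one-descriptor least-squares line as a stand-in for the actual model; the function names are hypothetical and the numbers are toy inputs, not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def q2_leave_one_out(xs, ys):
    """ASSUMED definition: Q2 = 1 - PRESS / SS, where each compound is
    predicted by a model fitted to all the other compounds."""
    mean_y = sum(ys) / len(ys)
    ss = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    return 1.0 - press / ss


# Toy inputs: an exactly linear set and a scattered one.
q2_exact = q2_leave_one_out([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
q2_noisy = q2_leave_one_out([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
```

Under this definition Q2 cannot exceed 1 and drops when left-out compounds are predicted poorly, which is why it is usually lower than the fitted R.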
Affinity of substituted 2,5-diazabicyclo[2.2.1]heptanes to nicotinic acetylcholine receptor
Ki – inhibition of competitive binding
[Figure: four panels labelled Q, R, Ha, and Lg]
Affinity of substituted 2,5-diazabicyclo[2.2.1]heptanes to nicotinic acetylcholine receptor
Construction of novel potentially active structures

[Structure: generated scaffold with substituents R1 and R2 on the two nitrogen atoms]
R1 = Me, Et, CN, Pr, i-Pr, t-Bu, Ph, [structures: R-substituted phenyl and nitrogen-containing aromatic rings], where R = CH3, Cl, Br, NO2
R2 = Me, Et, Pr, CN, i-Pr, t-Bu

Total generated structures: 171
5 best structures wrt lg(1/Ki): 4.01, 3.69, 3.69, 3.66, 3.44 [structures of the five compounds]
Activity range in training set: -3.41 ... 2.05
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
Training set: 26 compounds

[Structure: 3,7-diazabicyclo[3.3.1]nonane scaffold with substituents R1, R2, R3, R4]
R1, R2 = Me, Pr, i-Pr, Bu, i-Bu, C5H11, C6H13, C10H21, CH2-c-Pr, CH2-c-C6H11, CH=CH2, CH2CH2CH=CH2
R3, R4 = Me, Et, Pr, Bu, -(CH2)3-, -(CH2)4-, -(CH2)5-
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
SR75 – ability to decrease pacemaker pulse frequency (target effect)
F75 – ability to decrease myocardium contraction force (side effect)
SelF – selectivity wrt F
FRP75 – ability to increase refractory period (side effect)
SelFRP – selectivity wrt FRP

Activity        Descriptors      F    R        Q2
lg(1/SR75)      Q, R, Ha, Hd     5    0.976    0.830
lg(1/F75)       Q, R, Ha, Hd     3    0.932    0.800
lg(1/FRP75)     Q, R, Ha, Hd     7    0.952    0.510
SelF            Q, R, Ha, Hd     6    0.972    0.819
SelFRP          Q, R, Ha, Hd     1    0.310    0.022
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
SR75 – ability to decrease pacemaker pulse frequency (target effect)
[Plot: predicted vs. experimental values, both axes from -1.5 to 0.9, with the fit line]
[Figure: two panels labelled Q and R]
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
SelF – selectivity of antiarrhythmic activity wrt myocardium contraction force
[Plot: predicted vs. experimental values, both axes from -10 to 190, with the fit line]
[Figure: three panels labelled Q, R, and Ha]
Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes
Construction of novel potentially active structures

[Structure: generated scaffold with substituents R1, R2, R3]
R1, R3 = Me, Et, Pr, i-Pr, t-Bu,
R2 = Me, Et, Pr, i-Pr, t-Bu

Total generated structures: 105
5 best structures wrt SelF: 70.75, 70.74, 63.83, 63.82, 63.12 [structures of the five compounds]
Activity range in training set: 0.4 ... 177
Conclusions
QSAR/QSPR (quantitative structure-activity/property relationships) approaches can be considered universal techniques for the modeling and prediction of nearly any property of chemical compounds and of many properties of materials.
Some properties of materials can be predicted as functions of the structure of the small molecules used as additives (e.g. antioxidants).
A number of properties of polymers have been modelled as functions of the chemical structure of the monomeric unit (e.g. glass transition temperature, molar heat capacity for the liquid and solid state, dielectric constant, refractive index).
AMPA-receptor modulators ("ampakines")

The group of molecular design
Academician N. S. Zefirov – Head of Organic Chemistry Division
Dr. V.A. Palyulin – Head of Group
Dr. I.I. Baskin
Dr. A.A. Oliferenko
Dr. E.V. Radchenko
Dr. M.I. Skvortsova
Dr. I.G. Tikhonova
Dr. M.S. Belenikin
Dr. A.A. Ivanov
Dr. A.Yu. Zotov
S.A. Pisarev
A.A. Ivanova
A.A. Melnikov