
QSAR/QSPR: the Universal Approach to the Prediction of Properties of Chemical Compounds and Materials

V. A. Palyulin, I. I. Baskin, N. S. Zefirov

Department of Chemistry, Moscow State University

"Every attempt to employ mathematical methods in the study of chemical questions must be considered profoundly irrational and contrary to the spirit of chemistry. If mathematical analysis should ever hold a prominent place in chemistry - an aberration which is happily almost impossible - it would occasion a rapid and widespread degeneration of that science."

A. Comte (1798-1857)

Fundamental Problem in Chemistry:

Evaluation of relationships between the structures of chemical compounds and their properties or biological activity

QSAR/QSPR: General Approach

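[Figure: a table of compounds (property A and structure descriptors) divided into training, test and new sets. The model F: A = F(S) is built on the training set, its predictivity (ΔA) is assessed on the test set, and it is then used to predict the unknown values (marked "?") for new compounds.]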
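The general scheme on this slide (descriptors, a model A = F(S) built on the training set, predictivity ΔA estimated on the test set, prediction for new compounds) can be sketched as follows. This is a minimal illustration under stated assumptions: F is taken to be an ordinary least-squares linear model, ΔA is measured as the test-set RMSE, and the descriptor table is random synthetic data, not any data set from these slides.

```python
import numpy as np

def fit_linear_model(X, y):
    """Fit A = F(S) as A = b0 + X @ b by ordinary least squares."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Synthetic descriptor table: rows = compounds, columns = structure descriptors S.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(30, 3))
true_coef = np.array([1.0, 2.0, -0.5, 0.3])  # hypothetical "true" relationship
A_all = predict(true_coef, X_all) + rng.normal(scale=0.1, size=30)

X_train, A_train = X_all[:20], A_all[:20]  # training set: build the model
X_test, A_test = X_all[20:], A_all[20:]    # test set: estimate predictivity

coef = fit_linear_model(X_train, A_train)
delta_A = rmse(A_test, predict(coef, X_test))  # predictivity as test-set RMSE

X_new = rng.normal(size=(2, 3))  # new compounds whose A is unknown ("?")
A_new = predict(coef, X_new)
print(f"test RMSE = {delta_A:.3f}; predicted A for new compounds: {A_new.round(2)}")
```

Keeping the test set out of the fit is what makes ΔA an estimate of predictivity rather than of goodness of fit.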

PROPERTIES

Physico-chemical properties:

Boiling points, melting points, density, viscosity, surface tension, solubility in various solvents, lipophilicity, magnetic susceptibility, retention indices, dipole moments, enthalpy of formation, etc.

Biological activity:

IC50, EC50, LD50, MEC, ILS, etc.

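Structural formula, molecular graph, connectivity (adjacency) matrix: ethanol, CH3-CH2-OH, with carbons C1 and C2, oxygen O, and hydrogens H1-H3 (on C1), H4-H5 (on C2) and H6 (on O).

Adjacency matrix of the full molecular graph:

      C1 C2 O  H1 H2 H3 H4 H5 H6
  C1  0  1  0  1  1  1  0  0  0
  C2  1  0  1  0  0  0  1  1  0
  O   0  1  0  0  0  0  0  0  1
  H1  1  0  0  0  0  0  0  0  0
  H2  1  0  0  0  0  0  0  0  0
  H3  1  0  0  0  0  0  0  0  0
  H4  0  1  0  0  0  0  0  0  0
  H5  0  1  0  0  0  0  0  0  0
  H6  0  0  1  0  0  0  0  0  0

Hydrogen-suppressed adjacency matrix:

      C1 C2 O
  C1  0  1  0
  C2  1  0  1
  O   0  1  0

Molecular formula: C2H6O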
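A short sketch of how the representations on this slide can be derived from a bond list. Assumptions made for illustration: atoms are identified by the slide's labels (C1, C2, O, H1-H6), hydrogens are recognised by a label starting with "H", and the formula is written in Hill order (C, then H, then other elements alphabetically), which for ethanol gives C2H6O.

```python
from collections import Counter

# Ethanol, labelled as on the slide: H1-H3 on C1, H4-H5 on C2, H6 on O.
atoms = ["C1", "C2", "O", "H1", "H2", "H3", "H4", "H5", "H6"]
bonds = [("C1", "C2"), ("C2", "O"),
         ("C1", "H1"), ("C1", "H2"), ("C1", "H3"),
         ("C2", "H4"), ("C2", "H5"), ("O", "H6")]

def adjacency_matrix(atoms, bonds):
    """Symmetric 0/1 matrix: entry [i][j] is 1 if atoms i and j are bonded."""
    index = {a: k for k, a in enumerate(atoms)}
    m = [[0] * len(atoms) for _ in atoms]
    for a, b in bonds:
        i, j = index[a], index[b]
        m[i][j] = m[j][i] = 1
    return m

def hydrogen_suppressed(atoms, bonds):
    """Keep only heavy atoms (labels not starting with 'H') and the bonds between them."""
    heavy = [a for a in atoms if not a.startswith("H")]
    heavy_bonds = [(a, b) for a, b in bonds if a in heavy and b in heavy]
    return heavy, adjacency_matrix(heavy, heavy_bonds)

def hill_formula(atoms):
    """Formula in Hill order for carbon compounds: C, then H, then others alphabetically."""
    counts = Counter(a.rstrip("0123456789") for a in atoms)
    order = ["C", "H"] + sorted(el for el in counts if el not in ("C", "H"))
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "")
                   for el in order if el in counts)

full = adjacency_matrix(atoms, bonds)
heavy, skeleton = hydrogen_suppressed(atoms, bonds)
for label, row in zip(heavy, skeleton):
    print(label, row)
print(hill_formula(atoms))  # C2H6O
```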

DESCRIPTORS

Topological indices: connectivity indices (Randić χ; Kier-Hall ᵐχᵛ; solvation indices ᵐχˢ), Wiener W and extended Wiener indices, Balaban J, Gutman indices, Hosoya, Merrifield-Simmons indices, indices based on local invariants, informational indices, …

Fragmental descriptors: the number of fragments of various sizes (chains, cycles, branched fragments) in a molecule, with several levels of classification of atoms

Physico-chemical descriptors: indices based on atomic charges and electronegativities, atomic inductive constants, van der Waals volume and surface, H-bond descriptors, lipophilicity (log P), …

Quantum-mechanical descriptors; 3D descriptors

Usp. Khim. (Russ. Chem. Rev.), 57 (3), 337-366 (1988)

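Randić Index (χ)

$$\chi = \sum_{\text{bonds}} \frac{1}{\sqrt{v_i v_j}}$$

where v_i and v_j are the vertex degrees (numbers of non-hydrogen neighbours) of the two atoms joined by the bond.

Example: 2-methylbutane, CH3-CH(CH3)-CH2-CH3, numbered C1-C4 along the main chain, with C5 the methyl group on C2. Hydrogen-suppressed adjacency matrix:

      C1 C2 C3 C4 C5
  C1  0  1  0  0  0
  C2  1  0  1  0  1
  C3  0  1  0  1  0
  C4  0  0  1  0  0
  C5  0  1  0  0  0

$$\chi = \frac{1}{\sqrt{3}} + \frac{1}{\sqrt{3}} + \frac{1}{\sqrt{6}} + \frac{1}{\sqrt{2}} = 2.27$$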
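The Randić index can be computed directly from the bond list of the hydrogen-suppressed graph. A minimal sketch using the slide's numbering for 2-methylbutane; n-pentane is added only as a comparison, to show that branching lowers χ.

```python
import math
from collections import Counter

def randic_index(bonds):
    """Sum over bonds of 1/sqrt(v_i * v_j); v = vertex degree in the
    hydrogen-suppressed graph (number of non-hydrogen neighbours)."""
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)

# 2-methylbutane: C1-C2-C3-C4 chain with the methyl carbon C5 on C2
methylbutane = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C2", "C5")]
# n-pentane (unbranched isomer), for comparison
pentane = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C5")]

chi = randic_index(methylbutane)
print(round(chi, 2))                    # 2.27
print(round(randic_index(pentane), 3))  # 2.414
```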

Prediction of Non-Specific Solvation Enthalpy of Organic Compounds

Dokl. Akad. Nauk, 1993, 331(2), 173-176

Solvation enthalpy ΔsolvH (kJ/mol): n = 141, R = 0.985, s = 2.1
Vaporization enthalpy ΔvapH (kJ/mol): n = 528, R = 0.989, s = 2.0
[The regression equations are built on the descriptors defined below; their numerical coefficients are not legible in this transcript.]

μ – dipole moment
¹χS – first-order solvation topological index:

$$^{1}\chi^{S} = \frac{1}{4}\sum_{\text{bonds}} \frac{Z_i Z_j}{(\delta_i \delta_j)^{1/2}}$$

Zi – period number (measure of atom size)
δi – number of non-hydrogen neighbors
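A sketch of the first-order solvation index, assuming the definition ¹χS = (1/4) Σ over bonds of Z_i Z_j / (δ_i δ_j)^(1/2), with Z the period number and δ the number of non-hydrogen neighbours as listed on this slide. The two molecules are illustrative choices, not taken from the cited data set; replacing the OH oxygen of ethanol by the larger Cl atom (period 3) raises the index.

```python
import math
from collections import Counter

# Period numbers Z for common heavy atoms
PERIOD = {"C": 2, "N": 2, "O": 2, "F": 2, "Si": 3, "P": 3, "S": 3, "Cl": 3, "Br": 4, "I": 5}

def first_order_solvation_index(elements, bonds):
    """elements: heavy-atom label -> element symbol; bonds: pairs of heavy-atom labels.
    Assumed form: 1/4 * sum over bonds of Z_i*Z_j / sqrt(delta_i*delta_j)."""
    delta = Counter()  # number of non-hydrogen neighbours of each atom
    for i, j in bonds:
        delta[i] += 1
        delta[j] += 1
    total = sum(PERIOD[elements[i]] * PERIOD[elements[j]] / math.sqrt(delta[i] * delta[j])
                for i, j in bonds)
    return total / 4.0

ethanol = ({"C1": "C", "C2": "C", "O": "O"}, [("C1", "C2"), ("C2", "O")])
chloroethane = ({"C1": "C", "C2": "C", "Cl": "Cl"}, [("C1", "C2"), ("C2", "Cl")])
print(round(first_order_solvation_index(*ethanol), 4))       # 1.4142
print(round(first_order_solvation_index(*chloroethane), 4))  # 1.7678
```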

The scheme of the design of new topological indices (TIs)

1) Selection of fragments
2) Construction of graph matrices and their storage
3) Selection of functions
4) Construction of topological indices: a) using matrices; b) using already constructed TIs
5) The set of constructed TIs for QSAR/QSPR studies
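The scheme can be illustrated with one graph matrix (the topological distance matrix) and a small set of functions applied to it. This is a hypothetical sketch, not the authors' generator: the Wiener index W appears in the descriptor list earlier in this transcript, the Harary index H (sum of reciprocal distances) is added here only as a second example function, and the ratio W/H merely illustrates option b), building a new index from already constructed ones.

```python
from itertools import combinations

def distance_matrix(n, bonds):
    """Topological distances (bonds on the shortest path), via Floyd-Warshall."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j in bonds:
        d[i][j] = d[j][i] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def pairs(d):
    return combinations(range(len(d)), 2)  # each unordered atom pair once

# "Selection of functions": each function maps a graph matrix to a number.
FUNCTIONS = {
    "W": lambda d: sum(d[i][j] for i, j in pairs(d)),      # Wiener index
    "H": lambda d: sum(1 / d[i][j] for i, j in pairs(d)),  # Harary index
}

# 2-methylbutane, atoms 0..4 = C1..C5 (C5 is the methyl branch on C2)
D = distance_matrix(5, [(0, 1), (1, 2), (2, 3), (1, 4)])
indices = {name: f(D) for name, f in FUNCTIONS.items()}  # a) from matrices
indices["W/H"] = indices["W"] / indices["H"]             # b) from existing TIs
print(indices)
```

Adding a new matrix (for example, a weighted adjacency matrix) or a new function to `FUNCTIONS` multiplies the number of indices produced, which is the combinatorial idea behind the scheme.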

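Prediction of Diffusion of Small Molecules in Polymers

[Plot: predicted vs. experimental log D]

log D is modelled by a linear regression on descriptors built from Nat, min ρHOMO and the extended Wiener indices, including terms of the form $\ln(1 + \tilde{W}/N_{at})$ and $\ln(1 + \tilde{\tilde{W}}/N_{at})$ [regression coefficients not legible in this transcript]: n = 14, R = 0.989, s = 0.103, F = 145

D – diffusion coefficient (cm²/s)
Nat – number of non-hydrogen atoms
min ρHOMO – minimal HOMO π-electron density
$\tilde{W}$, $\tilde{\tilde{W}}$ – extended and inverted extended Wiener indices

Dokl. Akad. Nauk, 1994, 337(2), 211-214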

Sulfenamide Vulcanization Accelerators

Dokl. Akad. Nauk, 1993, 333(2), 189-192

Modelled properties: resistance to preliminary vulcanization (min); vulcanization rate constant (min⁻¹); maximum torque increase (N·m) [regression equations not legible in this transcript; they use the descriptors defined below].

Statistics of the three models, in the order given on the slide:
n = 12, R = 0.989, s = 0.004, F = 444
n = 12, R = 0.990, s = 0.15, F = 213
n = 12, R = 0.989, s = 0.054, F = 134

N – number of non-hydrogen atoms
$C^{LUMO}_{\max}$ – maximum carbon LUMO π-electron density
Sm – molecular electronegativity
$I_{\max}(C,S)$ – indices based on atomic induction effect parameters

[Structure: general formula of the sulfenamides, bearing an S-NR1R2 group]

Prediction of Mutagenicity of Substituted Biphenyls

Dokl. Akad. Nauk, 1993, 332(5), 587-589

ln(Nhis+) is modelled in two ways [regression coefficients not legible in this transcript]:
- from the fragment counts Fr1, Fr2, Fr3: n = 19, R = 0.94, s = 0.75, F = 39.3
- from the quantum-chemical indices d1-d4: n = 19, R = 0.95, s = 0.69, F = 35

[Plots: predicted vs. experimental ln(Nhis+) for both models; drawings of fragments Fr1, Fr2, Fr3]

Nhis+ – number of revertants
Fr1-3 – number of fragments
d1 – minimum squared C-atom LUMO contribution
d2 – minimum squared N-atom LUMO contribution
d3 – maximum C-atom free valence index
d4 – average O-atom free valence index
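The n, R and F values reported for the regression models on these slides are linked: for an ordinary least-squares model with p descriptors fitted to n compounds, F = (R²/p) / ((1 − R²)/(n − p − 1)). A quick consistency check, assuming the fragment model uses the three descriptors Fr1-Fr3: R = 0.94 with n = 19 gives F ≈ 38, and R = 0.942 (which also rounds to 0.94) gives F ≈ 39.4, so the reported 39.3 is compatible once the rounding of R is allowed for.

```python
def fisher_f(r, n, p):
    """F statistic of a least-squares regression with p descriptors on n compounds,
    computed from the (multiple) correlation coefficient r."""
    r2 = r * r
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

print(round(fisher_f(0.94, 19, 3), 1))   # 38.0
print(round(fisher_f(0.942, 19, 3), 1))  # 39.4
```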

Fragmental Descriptors

The numbers of fragments of various kinds and sizes (chains, cycles, branched fragments) in a molecule, with several levels of classification of atoms. For each molecule, hundreds of fragmental descriptors can be computed.

If a structure-property data set is large enough to allow building statistically significant models, then any topological index can be replaced with a set of substructural (fragmental) descriptors.
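A minimal sketch of fragmental descriptors, restricted to linear chains of 1 to 3 atoms labelled by element symbol. The slides also use cycles, branched fragments and finer atom classifications, which are omitted here; the function name and the element-only labelling are assumptions for illustration.

```python
from collections import Counter

def chain_fragments(elements, bonds, max_atoms=3):
    """Count linear chain fragments of 1..max_atoms atoms, labelled by element.
    elements: element symbol of each heavy atom; bonds: (i, j) index pairs."""
    neighbors = {i: set() for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)
    raw = Counter()

    def grow(path):
        labels = tuple(elements[k] for k in path)
        raw[min(labels, labels[::-1])] += 1  # same chain read in either direction
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:          # simple paths only
                    grow(path + [nxt])

    for start in range(len(elements)):
        grow([start])
    # Chains of >= 2 atoms were reached once from each end, so halve their counts.
    return Counter({k: (v if len(k) == 1 else v // 2) for k, v in raw.items()})

ethanol = chain_fragments(["C", "C", "O"], [(0, 1), (1, 2)])
methylbutane = chain_fragments(["C"] * 5, [(0, 1), (1, 2), (2, 3), (1, 4)])
print(dict(ethanol))
print(methylbutane[("C", "C", "C")])  # 4
```

Each distinct fragment label becomes one descriptor column, and its count in a molecule is the descriptor value.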

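NEURAL NETWORK SOFTWARE: NASAWIN

Parameters of neural network and multiple linear regression (MLR) models for each predicted property (Rav – average correlation coefficient; RMStrain, RMSval, RMSpred – root-mean-square errors for the training, validation and prediction sets):

Boiling point, °C: 509 compounds, 46 selected descriptors on average; neural network Rav = 0.9920, RMStrain = 8.7, RMSval = 14.2, RMSpred = 16.6; MLR Rav = 0.9814, RMStrain = 12.9, RMSval = 16.7, RMSpred = 18.6

[Property name not legible]: 531 compounds, 54 descriptors; neural network Rav = 0.9960, RMStrain = 3.7, RMSval = 4.5, RMSpred = 5.4; MLR Rav = 0.9946, RMStrain = 4.3, RMSval = 5.0, RMSpred = 5.5

log η (η in Pa·s): 367 compounds, 46 descriptors; neural network Rav = 0.9885, RMStrain = 0.084, RMSval = 0.104, RMSpred = 0.141; MLR Rav = 0.9794, RMStrain = 0.111, RMSval = 0.195, RMSpred = 0.212

d20, g/cm³: 803 compounds, 69 descriptors; neural network Rav = 0.9980, RMStrain = 0.021, RMSval = 0.046, RMSpred = 0.051; MLR Rav = 0.9885, RMStrain = 0.038, RMSval = 0.055, RMSpred = 0.067

log VP (VP in Pa): 352 compounds, 56 descriptors; neural network Rav = 0.9981, RMStrain = 0.090, RMSval = 0.122, RMSpred = 0.0152; MLR Rav = 0.9902, RMStrain = 0.198, RMSval = 0.248, RMSpred = 0.258

log P: 7805 compounds, 741 descriptors; neural network Rav = 0.9827, RMStrain = 0.3233, RMSval = 0.3936, RMSpred = 0.3968; MLR Rav = 0.9702, RMStrain = 0.4171, RMSval = 0.4541, RMSpred = 0.4324

Fragment types (as listed on the slide): p1, p2, p3 | p1, p2, p3, p4, p5, p8, c3, c4, c5, c6, c7, c8, c9, s4, b0, b1, b4, b5 | p1, p2, p3 | p1, p2, p3, p4, p5, c4, c5, s4, s5 | p1, p2, px, cx, sx, bx, tx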

Fragmental descriptors in QSPR

Water solubility

Boiling point (1): diverse set of 885 compounds; fragment types p1, p2, p3, p4, p5, p6, c3, c4, c5, c6, s4, s5, s6

Boiling point (2)

Anticoccidial Activity of Triazinediones

[Structure: general formula of the triazinediones, with substituents X and R1-R5 and Cl atoms]

Glass Transition Temperature of Polymers

Molar Heat Capacity of Polymers in the Liquid State

Architecture of the Neural Device for Direct QSAR

Neural device applied to the propane molecule, CH3-CH2-CH3 (atoms numbered 1, 2, 3):
- Sensor field: each sensor detects the number of hydrogen atoms attached to an atom
- Eye 1 "looks" at atoms: (1), (2), (3)
- Eye 2 "looks" at bonds, in both directions: (1,2), (2,1), (2,3), (3,2)
- Brain: receives the signals from both eyes

Baskin, I. I.; Palyulin, V. A.; Zefirov, N. S., J. Chem. Inf. Comput. Sci., 37, 715 (1997)
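A minimal sketch of the neural device idea, under several assumptions: each eye is one small tanh layer shared across all atoms (eye 1) or all directed bonds (eye 2), the outputs of each eye are summed, and the brain is a single linear output. The weights are random placeholders rather than trained values and the layer sizes are arbitrary; this is not the published network. Because every atom and bond passes through the same weights and the results are summed, the output does not depend on the atom numbering, which the renumbered propane call checks.

```python
import numpy as np

rng = np.random.default_rng(1)
N_HIDDEN = 4
# Placeholder weights; a real device would learn them from training data.
W_atom, b_atom = rng.normal(size=(N_HIDDEN, 1)), rng.normal(size=N_HIDDEN)
W_bond, b_bond = rng.normal(size=(N_HIDDEN, 2)), rng.normal(size=N_HIDDEN)
w_brain = rng.normal(size=2 * N_HIDDEN)

def neural_device(h_count, bonds):
    """h_count[i]: sensor value of atom i (number of attached H atoms);
    bonds: undirected (i, j) pairs of the hydrogen-suppressed graph."""
    h = np.asarray(h_count, dtype=float)
    # Eye 1: the same neurons look at every atom; their outputs are summed.
    eye1 = sum(np.tanh(W_atom @ h[[i]] + b_atom) for i in range(len(h)))
    # Eye 2: the same neurons look at every bond in both directions, (i, j) and (j, i).
    directed = list(bonds) + [(j, i) for i, j in bonds]
    eye2 = sum(np.tanh(W_bond @ h[[i, j]] + b_bond) for i, j in directed)
    # Brain: a single linear output combining both eyes.
    return float(w_brain @ np.concatenate([eye1, eye2]))

# Propane CH3-CH2-CH3: sensors read 3, 2, 3 attached hydrogens.
propane = neural_device([3, 2, 3], [(0, 1), (1, 2)])
# Same molecule with a different atom numbering (central CH2 first).
propane_renumbered = neural_device([2, 3, 3], [(1, 0), (0, 2)])
print(propane, propane_renumbered)
```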

EXAMPLES OF THE DIRECT STRUCTURE-PROPERTY CORRELATIONS

Baskin, I. I.; Palyulin, V. A.; Zefirov, N. S., J. Chem. Inf. Comput. Sci., 37, 715 (1997)

PROPERTY                           Class of compounds                          Correlation coefficient
boiling point                      alkanes                                     0.999
viscosity                          hydrocarbons                                0.996
heat of evaporation                hydrocarbons                                0.996
density                            hydrocarbons                                0.971
heat of solvation in cyclohexane   mixed set of compounds                      0.985
polarizability                     mixed set of compounds                      0.995
anesthetic pressure of gases       mixed set of organic and inorganic gases    0.990

New approach in QSAR: Neural Quantitative Structure-Conditions-Property Relationships

Investigated property, number of entries, R, St, Sv:

Boiling point of hydrocarbons at different pressures (°C): 14346 entries, R = 0.9996, St = 2.8, Sv = 2.8
Dynamic viscosity of hydrocarbons at different temperatures (ln units): 3426 entries, R = 0.9949, St = 0.14, Sv = 0.16
Density of hydrocarbons at different temperatures (g/ml): 3056 entries, R = 0.9977, St = 0.0063, Sv = 0.0063
Acid hydrolysis rate constants for carboxylic acid esters at different temperatures and different two-component solvent compositions (log units): 2092 entries, R = 0.9669, St = 0.27, Sv = 0.34

R – correlation coefficient; St and Sv – RMSE for the training and validation sets
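In this approach each entry is a compound measured under particular conditions, so one model covers all conditions at once and a data set can contain many more entries than compounds. A minimal sketch of how such inputs could be assembled, assuming the network input is simply the compound's structure descriptors followed by the condition values (here a temperature); the compound names, descriptor values and measurements are made-up placeholders.

```python
import numpy as np

def build_entries(descriptors, measurements):
    """descriptors: compound -> structure descriptor vector;
    measurements: list of (compound, condition values, property value).
    Returns X (one row per entry: descriptors + conditions) and y."""
    X = [list(descriptors[c]) + list(cond) for c, cond, _ in measurements]
    y = [value for _, _, value in measurements]
    return np.array(X, dtype=float), np.array(y, dtype=float)

# Placeholder data: two compounds, property measured at several temperatures (K).
descriptors = {"compound_A": [1.0, 0.5], "compound_B": [2.0, 0.1]}
measurements = [
    ("compound_A", [293.0], 0.10), ("compound_A", [313.0], 0.05),
    ("compound_B", [293.0], 0.30), ("compound_B", [313.0], 0.20),
    ("compound_B", [333.0], 0.12),
]
X, y = build_entries(descriptors, measurements)
print(X.shape, y.shape)  # (5, 3) (5,)
```

Any regression model (the slides use neural networks) can then be trained on `X` and `y`.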

Molecular Field Topology Analysis (MFTA)

Construction of Molecular Supergraph → Construction of Descriptor Matrix → Model building → Generation of novel promising structures

[Figure: descriptor matrix with columns Q1 ... QN, R1 ... RN (values of local descriptors at supergraph positions 1 ... N), shown for an example structure with substituent R]

Local descriptors:
- Electrostatic
- Steric
- Lipophilic
- Hydrogen bonding
- Stereochemical
- Topological

Palyulin, V. A.; Radchenko, E. V.; Zefirov, N. S., J. Chem. Inf. Comput. Sci., 40, 659 (2000)

Molecular Supergraph Construction

[Figure: structures 1), 2), 3), 4), 5), ..., n) superimposed onto a common molecular supergraph]

Local Descriptors

Sufficient coverage of major interaction types; easy extension of the descriptor set.

Electrostatic:
- Gasteiger's atomic charge Q (electronegativity equalization)
- Absolute atomic charge Qa = abs(Q)
- Sanderson's electronegativity
- Electrotopological state ETS (Hall, Mohney, Kier)

Steric:
- Bondi's van der Waals radius R
- Atomic contribution to the molecular van der Waals surface S
- Relative steric accessibility A = S/Sfree

Lipophilic:
- Atomic lipophilicity contribution La (environment-dependent; Ghose, Crippen)
- Group lipophilicity Lg (atom and attached hydrogens)

Hydrogen bonding:
- Hydrogen bond donor (Hd) and acceptor (Ha) ability of an atom (Abraham)

Stereochemical:
- Local stereochemical indicator variables

Topological:
- Site occupancy factors for atoms Pa and bonds Pb (1 if a feature is present)
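A sketch of how local descriptors could be arranged into the MFTA descriptor matrix once each molecule has been mapped onto the supergraph: one column per (descriptor, supergraph position), plus atom site occupancy columns Pa. Assumptions: the mapping to supergraph positions is already given, unoccupied positions are filled with 0, and the descriptor values are placeholders; this is not the MFTA software itself.

```python
import numpy as np

def mfta_matrix(molecules, n_positions, names):
    """molecules: list of {position: {descriptor name: value}} for occupied positions.
    Returns (matrix, column labels); unoccupied positions get 0 and Pa = 0."""
    positions = range(1, n_positions + 1)
    columns = [f"{name}{p}" for name in names for p in positions]
    columns += [f"Pa{p}" for p in positions]
    rows = []
    for mol in molecules:
        row = [mol.get(p, {}).get(name, 0.0) for name in names for p in positions]
        row += [1.0 if p in mol else 0.0 for p in positions]  # site occupancy Pa
        rows.append(row)
    return np.array(rows), columns

# Placeholder values on a 3-position supergraph; molecule 2 also occupies position 3.
mol1 = {1: {"Q": -0.30, "R": 1.55}, 2: {"Q": 0.10, "R": 1.70}}
mol2 = {1: {"Q": -0.28, "R": 1.55}, 2: {"Q": 0.12, "R": 1.70}, 3: {"Q": -0.05, "R": 1.75}}
M, cols = mfta_matrix([mol1, mol2], 3, ["Q", "R"])
print(cols)
print(M)
```

Each row of `M` can then be used as the descriptor vector of one compound in model building.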

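Affinity of substituted 2,5-diazabicyclo[2.2.1]heptanes to nicotinic acetylcholine receptor

Training set: 31 compounds

[General structure: 2,5-diazabicyclo[2.2.1]heptane bearing substituents R1 and R2 on the nitrogen atoms]

R1 = H, Me, CH2CN
R2 = nitrogen-containing heteroaryl groups (some attached through CH2) bearing a substituent R [ring structures shown on the slide]
R = H, Me, F, Cl, Br, OH, NH2, OMe, CN, CH2NH2, CONH2, NO2, PhCOO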

Affinity of substituted 2,5-diazabicyclo[2.2.1]heptanes to nicotinic acetylcholine receptor

Ki – inhibition of competitive binding; MED – minimum effective dose (hot plate test)

lg(1/Ki): descriptors Q, R, Ha, Hd, Lg; F = 7; R = 0.960; Q² = 0.850
lg(1/MED): descriptors Q, R; F = 4; R = 0.977; Q² = 0.918

[Plot: predicted vs. experimental lg(1/Ki), with fit line]
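Q² is the cross-validated counterpart of R². A sketch of leave-one-out Q² = 1 − PRESS/SS, where PRESS sums the squared errors of predicting each compound from a model fitted without it and SS is the total sum of squares of y about its mean. Assumption: ordinary least squares stands in for the regression method behind these models, and the cross-validation scheme actually used by the authors may differ; the data are synthetic.

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / sum((y - mean(y))^2) for a linear least-squares model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # drop compound i
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        y_hat = coef[0] + X[i] @ coef[1:]  # predict the left-out compound
        press += (y[i] - y_hat) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y_signal = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.2, size=20)
y_noise = rng.normal(size=20)  # no relation to X
print(round(loo_q2(X, y_signal), 3), round(loo_q2(X, y_noise), 3))
```

A large gap between R and Q², as for SelFRP on the bradycardic-activity slides, is the usual sign that a model fits but does not predict.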

Affinity of substituted 2,5-diazabicyclo[2.2.1]heptanes to nicotinic acetylcholine receptor

Ki – inhibition of competitive binding

[Figure panels: local descriptors Q, R, Ha, Lg]

Affinity of substituted 2,5-diazabicyclo[2.2.1]heptanes to nicotinic acetylcholine receptor

Construction of novel potentially active structures

[Generation template bearing substituents R1 and R2]
R1 = Me, Et, CN, Pr, i-Pr, t-Bu, Ph, or heteroaryl groups bearing R [ring structures shown on the slide], where R = CH3, Cl, Br, NO2
R2 = Me, Et, Pr, CN, i-Pr, t-Bu

Total generated structures: 1715
Best structures wrt lg(1/Ki) (predicted values): 4.01, 3.69, 3.69, 3.66, 3.44 [structures shown on the slide]
Activity range in training set: -3.41 ... 2.05
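Generating candidates amounts to enumerating substituent combinations on the template and ranking them by the model's prediction. A minimal sketch using only the alkyl, CN and Ph options for R1 and R2 listed on this slide (the heteroaryl options are left out, so this does not reproduce the 1715 structures). `toy_score` is a hypothetical stand-in for the fitted MFTA model, an arbitrary function used only so the code runs.

```python
from itertools import product

def enumerate_and_rank(r1_options, r2_options, score, top=5):
    """Build every (R1, R2) combination and return the `top` ones by predicted activity."""
    candidates = [(r1, r2, score(r1, r2)) for r1, r2 in product(r1_options, r2_options)]
    candidates.sort(key=lambda c: c[2], reverse=True)
    return len(candidates), candidates[:top]

R1 = ["Me", "Et", "CN", "Pr", "i-Pr", "t-Bu", "Ph"]
R2 = ["Me", "Et", "Pr", "CN", "i-Pr", "t-Bu"]

def toy_score(r1, r2):
    # Hypothetical placeholder, NOT a QSAR model: ranks by substituent name length.
    return len(r1) + 0.5 * len(r2)

total, best = enumerate_and_rank(R1, R2, toy_score)
print(total)  # 42
for r1, r2, s in best:
    print(r1, r2, s)
```

In practice `score` would compute the local descriptors of each generated structure on the supergraph and apply the trained model.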

Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes

[General structure: 3,7-diazabicyclo[3.3.1]nonane with substituents R1 and R2 on the nitrogen atoms and R3, R4 at C9]

Training set: 26 compounds

R1, R2 = Me, Pr, i-Pr, Bu, i-Bu, C5H11, C6H13, C10H21, CH2-c-Pr, CH2-c-C6H11, CH=CH2, CH2CH2CH=CH2
R3, R4 = Me, Et, Pr, Bu, -(CH2)3-, -(CH2)4-, -(CH2)5-

Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes

SR75 – ability to decrease pacemaker pulse frequency (target effect)
F75 – ability to decrease myocardium contraction force (side effect)
SelF – selectivity wrt F
FRP75 – ability to increase refractory period (side effect)
SelFRP – selectivity wrt FRP

lg(1/SR75): descriptors Q, R, Ha, Hd; F = 5; R = 0.976; Q² = 0.830
lg(1/F75): descriptors Q, R, Ha, Hd; F = 3; R = 0.932; Q² = 0.800
lg(1/FRP75): descriptors Q, R, Ha, Hd; F = 7; R = 0.952; Q² = 0.510
SelF: descriptors Q, R, Ha, Hd; F = 6; R = 0.972; Q² = 0.819
SelFRP: descriptors Q, R, Ha, Hd; F = 1; R = 0.310; Q² = 0.022

Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes

SR75 – ability to decrease pacemaker pulse frequency (target effect)

[Plot: predicted vs. experimental activity, with fit line; figure panels: local descriptors Q, R]

Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes

SelF – selectivity of antiarrhythmic activity wrt myocardium contraction force

[Plot: predicted vs. experimental SelF, with fit line; figure panels: local descriptors Q, R, Ha]

Bradycardic activity of 3,7,9,9-tetraalkyl-3,7-diazabicyclo[3.3.1]nonanes

Construction of novel potentially active structures

[Generation template bearing substituents R1, R2, R3]
R1, R3 = Me, Et, Pr, i-Pr, t-Bu
R2 = Me, Et, Pr, i-Pr, t-Bu

Total generated structures: 1055
Best structures wrt SelF (predicted values): 70.75, 70.74, 63.83, 63.82, 63.12 [structures shown on the slide]
Activity range in training set: 0.4 ... 177

Conclusions

QSAR/QSPR (quantitative structure-activity/property relationships) approaches can be considered universal techniques for modeling and predicting nearly any property of chemical compounds and many properties of materials.

Some properties of materials can be predicted from the structure of small molecules used as additives (e.g. antioxidants).

A number of properties of polymers have been modelled as functions of the chemical structure of the monomeric unit (e.g. glass transition temperature, molar heat capacity in the liquid and solid states, dielectric constant, refractive index).

AMPA receptor modulators (“ampakines”)

The group of molecular design

Academician N. S. Zefirov – Head of Organic Chemistry Division
Dr. V. A. Palyulin – Head of Group
Dr. I. I. Baskin, Dr. A. A. Oliferenko, Dr. E. V. Radchenko, Dr. M. I. Skvortsova, Dr. I. G. Tikhonova, Dr. M. S. Belenikin, Dr. A. A. Ivanov, Dr. A. Yu. Zotov, S. A. Pisarev, A. A. Ivanova, A. A. Melnikov