QSAR & QSPR - Unistrainfochim.u-strasbg.fr/FC/docs/Descriptors/FC_QSAR_2009_descriptors.pdf ·...

Post on 19-Jan-2020


Quantitative Structure-Activity Relationships Quantitative Structure-Property-Relationships

QSAR & QSPR

Alexandre Varnek, Faculté de Chimie, ULP, Strasbourg, France

History of QSAR

Dmitry Mendeleev (1834 – 1907)

Russian chemist who arranged the 63 known elements into a periodic table based on atomic mass, which he published in Principles of Chemistry in 1869. Mendeleev left space for new elements and predicted three yet-to-be-discovered elements: Ga (1875), Sc (1879) and Ge (1886).

Discoverer of the Periodic Table, an early "Chemoinformatician"

Periodic Table

Chemical properties of elements vary gradually along the two axes

History of QSAR

1869, D. Mendeleev: The Periodic Table of Elements

1868, A. Crum-Brown and T.R. Fraser suggested that the physiological activity of molecules depends on their constitution: Activity = F(structure). They studied a series of quaternized strychnine derivatives, some of which possess activity similar to curare in paralyzing muscle.

1869, B.J. Richardson: the narcotic effect of primary alcohols varies in proportion to their molecular weights.


1893, C. Richet showed that the toxicities of some simple organic compounds (ethers, alcohols, ketones) were inversely related to their solubility in water.

1899, H. Meyer and 1901, E. Overton found that the potencies of narcotic compounds vary with logP.

1904, J. Traube found a linear relation between narcosis and surface tension.


1937, L.P. Hammett studied the chemical reactivity of substituted benzenes: Hammett equation, Linear Free Energy Relationship (LFER).

1939, J. Ferguson formulated a concept linking narcotic activity, logP and thermodynamics.

1952-1956, R.W. Taft devised a procedure for separating polar, steric and resonance effects.


1964, C. Hansch and T. Fujita: the biologist's Hammett equation.

1964, Free and Wilson: QSAR on fragments.

1970s – 1980s: development of 2D QSAR (descriptors, mathematical formalism).

1980s – 1990s: development of 3D QSAR (pharmacophores, CoMFA, docking).

1990s – present: virtual screening.

R       H      CH3    OCH3   F      Cl     NO2
ortho   6.27   12.3   8.06   54.1   11.4   671
meta    6.27   5.35   8.17   13.6   14.8   32.1
para    6.27   4.24   3.38   7.22   10.5   37.0

1934 - Hammett

Substituent   σ (meta)   σ (para)     Substituent   σ (meta)   σ (para)
O             -0.708     -1.00        F             +0.337     +0.062
OH            +0.121     -0.37        Cl            +0.373     +0.227
OCH3          +0.115     -0.268       CO2H          +0.355     +0.406
NH2           -0.161     -0.660       COCH3         +0.376     +0.502
CH3           -0.069     -0.170       CF3           +0.43      +0.54
(CH3)3Si      -0.121     -0.072       SO2Ph         +0.61      +0.70
C6H5          +0.06      -0.01        NO2           +0.710     +0.778
H             0.000      0.000        +N(CH3)3      +0.88      +0.82
SH            +0.25      +0.15        N2+           +1.76      +1.91
SCH3          +0.15      0.00         +S(CH3)2      +1.00      +0.90
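The link between the two Hammett tables can be sketched numerically. The snippet below assumes that the ortho/meta/para table lists acid dissociation constants (in units of 10^-5) of substituted benzoic acids, and that σ = log10(K_X / K_H) with ρ = 1 for this reference reaction. Neither assumption is stated explicitly in the slides, but the computed values agree closely with the tabulated σ constants.

```python
import math

# K values copied from the meta and para rows of the table above.
# Assumption: dissociation constants (x 10^-5) of substituted benzoic acids.
K = {
    "meta": {"H": 6.27, "CH3": 5.35, "OCH3": 8.17, "F": 13.6, "Cl": 14.8, "NO2": 32.1},
    "para": {"H": 6.27, "CH3": 4.24, "OCH3": 3.38, "F": 7.22, "Cl": 10.5, "NO2": 37.0},
}


def hammett_sigma(position, substituent):
    """sigma = log10(K_X / K_H), taking rho = 1 for the reference reaction."""
    series = K[position]
    return math.log10(series[substituent] / series["H"])


for position in ("meta", "para"):
    for substituent in K[position]:
        sigma = hammett_sigma(position, substituent)
        print(f"{position:5s} {substituent:5s} sigma = {sigma:+.3f}")
```

The ortho row is left out on purpose: ortho substituents mix steric and electronic effects, so they are not usually described by a simple σ constant.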

Taft quantified the steric (spatial) effects using the hydrolysis of esters. Here, the size of R affects the rate of reaction by blocking nucleophilic attack by water.

In this case, the steric effects were quantified by the Taft parameter Es, defined in terms of k, the rate constant for ester hydrolysis. This expression is analogous to the Hammett equation.

Steric effects

Compare some extreme values:

t-Bu   -2.78   large resistance to hydrolysis
Me     -1.24   little steric resistance to hydrolysis
H       0.00   the reference substituent in the Taft equation

Es Values for Various Substituents

H     Me      Pr      t-Bu    F       Cl      Br      OH      SH      NO2     C6H5    CN      NH2
0.0   -1.24   -1.60   -2.78   -0.46   -0.97   -1.16   -0.55   -1.07   -2.52   -3.82   -0.51   -0.61

Note: H is usually used as the reference substituent (Es = 0), but sometimes another group, such as methyl (Me), is used as the reference, as in the chemical equation above; the value for H then becomes 1.24.
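The change of reference described in the note can be sketched as a simple shift of the Es scale. The snippet assumes the usual Taft-style definition Es = log10(k_X / k_ref), which is not reproduced in this transcript. The function names are illustrative only.

```python
# Es values copied from the table above (H as the reference substituent).
ES_H_REF = {
    "H": 0.0, "Me": -1.24, "Pr": -1.60, "t-Bu": -2.78, "F": -0.46,
    "Cl": -0.97, "Br": -1.16, "OH": -0.55, "SH": -1.07, "NO2": -2.52,
    "C6H5": -3.82, "CN": -0.51, "NH2": -0.61,
}


def relative_rate(es):
    """Assumed definition Es = log10(k_X / k_ref), hence k_X / k_ref = 10**Es."""
    return 10.0 ** es


def rereference(es_values, new_reference):
    """Shift the scale so that new_reference has Es = 0."""
    shift = es_values[new_reference]
    return {name: es - shift for name, es in es_values.items()}


es_me_ref = rereference(ES_H_REF, "Me")
print("Es(H) with Me as reference:", round(es_me_ref["H"], 2))
print("k(t-Bu) / k(H), assumed definition:", relative_rate(ES_H_REF["t-Bu"]))
```

With methyl as the reference, the value for H comes out as +1.24, matching the note.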

Organophosphates must be hydrolysed to be active, and it is observed that their biological activity is directly related to the Taft steric parameter Es for the substituent R by the equation:

Es may be used in other chemical reactions and to explain biological activities, for example the hydrolysis of inhibitors of acetylcholinesterase.

Steric effects

Octanol/water partition coefficient

Usually, logP is used instead of P.

logP > 0: the compound prefers hydrophobic (nonpolar) media
logP < 0: the compound prefers polar media

Biological activity as a function of logP

Hansch Analysis

Biological Activity = log 1/C, where C is the drug concentration that causes EC50, GI50, etc.

EL (electronic descriptor): Hammett constant σ (σm, σp, σp0, σp+, σp-, R, F)

HPh (hydrophobicity descriptor): hydrophobic substituent constant π, log P (octanol/water partition coefficient)

ST (steric descriptor): Taft steric constant Es

Biological Activity = f(EL, ST, HPh) + constant

Hansch, C.; Fujita, T. J. Am. Chem. Soc., 1964, 86, 1616.

$\log 1/C = a (\log P)^2 + b \log P + \rho\sigma + \delta E_s + C$
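A minimal sketch of how the Hansch equation is evaluated, and of the optimum logP implied by its parabolic term. All coefficient values below are hypothetical and chosen only for illustration. The additive constant is named `const` to keep it apart from the concentration C.

```python
def hansch_log_inv_c(log_p, sigma, es, a, b, rho, delta, const):
    """log(1/C) = a*(logP)^2 + b*logP + rho*sigma + delta*Es + const."""
    return a * log_p ** 2 + b * log_p + rho * sigma + delta * es + const


def optimal_log_p(a, b):
    """logP that maximises log(1/C); a maximum exists only when a < 0."""
    if a >= 0:
        raise ValueError("the parabola has a maximum only for a < 0")
    return -b / (2.0 * a)


# Hypothetical regression coefficients, not taken from any published model.
coeffs = dict(a=-0.5, b=2.0, rho=1.0, delta=0.5, const=3.0)

best = optimal_log_p(coeffs["a"], coeffs["b"])
print("optimal logP:", best)
print("activity at optimum (sigma = 0, Es = 0):",
      hansch_log_inv_c(best, 0.0, 0.0, **coeffs))
print("activity for a CH3-like substituent (sigma = -0.170, Es = -1.24):",
      hansch_log_inv_c(best, -0.170, -1.24, **coeffs))
```

A negative coefficient a expresses the common observation that activity first rises with lipophilicity and then falls again.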

Physicochemical properties can be broadly classified into three general types:

• Electronic
• Steric
• Hydrophobic

Hansch Analysis: Biological Activity = f(Physicochemical properties) + constant

Descriptors

[Figure: workflow linking Molecular Structure, Representation, Descriptors, Feature Selection & Mapping, and Activities]

Quantitative Structure Activity Relationship (QSAR)

Quantitative structure-activity relationships correlate, within congeneric series of compounds, their chemical or biological activities, either with certain structural features or with atomic, group or molecular descriptors.

Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 24, 279-287

Definition of molecular descriptor

The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.

Roberto Todeschini and Viviana Consonni

A complete description of all the molecular descriptors is given in:

Handbook of Molecular Descriptors, Roberto Todeschini and Viviana Consonni, WILEY-VCH, Weinheim, Germany, 2000

Methods and Principles in Medicinal Chemistry, Volume 11

Edited by: H. Kubinyi, R. Mannhold, H. Timmerman

Descriptors from Codessa Pro

• Topological
• Fragments
• Receptor surface
• Structural
• Information-content
• Spatial
• Electronic
• Thermodynamic
• Conformational
• Quantum mechanical

Descriptor Families

Products

Plus Molecular and Quantum Methods

Descriptors: calculable molecular attributes that govern particular macroscopic properties

Molecular Descriptors

Classification based on the dimensionality of the structure representation:

1D (atom counts, MW, number of functional groups, …)

2D (topological indices, BCUT, TPSA, Shannon entropy, …)

3D (geometrical parameters, molecular surfaces, parameters calculated in quantum chemistry programs, …)

Molecular Descriptors

1D

Constitutional descriptors

• number of atoms
• absolute and relative numbers of C, H, O, S, N, F, Cl, Br, I, P atoms
• number of bonds (single, double, triple and aromatic bonds)
• number of benzene rings, number of benzene rings divided by the number of atoms
• molecular weight and average atomic weight
• number of rotatable bonds (all terminal H atoms are ignored)
• Hbond acceptor: number of hydrogen bond acceptors
• Hbond donor: number of hydrogen bond donors

These simple descriptors reflect only the molecular composition of the compound without using the geometry or electronic structure of the molecule.
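Several of the constitutional descriptors listed above need nothing more than the molecular formula. The sketch below computes atom counts, relative atom counts, molecular weight and average atomic weight from a dictionary of element counts. The function name and the rounded atomic weights are assumptions made for this example.

```python
# Rounded standard atomic weights (assumed values, adequate for illustration).
ATOMIC_WEIGHT = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}


def constitutional_descriptors(atom_counts):
    """Composition-only descriptors from a mapping {element: count}."""
    n_atoms = sum(atom_counts.values())
    mw = sum(ATOMIC_WEIGHT[element] * n for element, n in atom_counts.items())
    return {
        "n_atoms": n_atoms,
        "relative_counts": {el: n / n_atoms for el, n in atom_counts.items()},
        "molecular_weight": mw,
        "average_atomic_weight": mw / n_atoms,
    }


ethanol = {"C": 2, "H": 6, "O": 1}  # C2H6O
for name, value in constitutional_descriptors(ethanol).items():
    print(name, value)
```

Bond counts, ring counts and hydrogen-bond donors or acceptors need the connectivity as well, so they are outside the scope of this formula-only sketch.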

Molecular Descriptors

2D

Topological Descriptors

Descriptors based on the molecular graph representation are widely used in QSPR and QSAR studies because they help to differentiate the molecules according mostly to their size, degree of branching, flexibility and overall shape.

TI based on the adjacency matrix

Total adjacency index: $A = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij}$

For G1 and G2, A = 5.

This TI can only distinguish between structures having different numbers of cycles (for cyclohexane A = 6).
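A short sketch of the total adjacency index on hydrogen-suppressed graphs. The drawings of G1 and G2 are not in this transcript. They are assumed here to be the carbon skeletons of 2,3-dimethylbutane and n-hexane, a choice consistent with every index value quoted in the slides.

```python
def adjacency_matrix(n_atoms, edges):
    """Symmetric 0/1 adjacency matrix of a hydrogen-suppressed graph."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


def total_adjacency_index(a):
    """A = 1/2 * sum over all i, j of a_ij, i.e. the number of edges."""
    return sum(sum(row) for row in a) // 2


# Assumed graphs (see the lead-in): six carbon atoms each.
G1_EDGES = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]   # 2,3-dimethylbutane
G2_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]   # n-hexane
CYCLOHEXANE_EDGES = G2_EDGES + [(5, 0)]               # ring closure

for name, edges in [("G1", G1_EDGES), ("G2", G2_EDGES),
                    ("cyclohexane", CYCLOHEXANE_EDGES)]:
    print(name, total_adjacency_index(adjacency_matrix(6, edges)))
```

Both acyclic isomers give the same value, which illustrates why this index cannot separate branched from linear structures.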

TI based on the adjacency matrix: Zagreb group indices

Zagreb group indices were introduced to characterize branching:

$M_1 = \sum_{i=1}^{n} \delta_i^2$

$M_2 = \sum_{(i,j)} \delta_i \delta_j$ (sum over all bonds i-j)

where the vertex degree $\delta_i$ is the number of σ bonds involving atom i, excluding bonds to H atoms.

M1(G1) = 4*1^2 + 2*3^2 = 22

M2(G1) = 4*(1*3) + 1*(3*3) = 21

M1(G2) = 2*1^2 + 4*2^2 = 18

M2(G2) = 2*(1*2) + 3*(2*2) = 16
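The Zagreb indices can be checked with a few lines of code. As an assumption, G1 and G2 are taken to be the hydrogen-suppressed graphs of 2,3-dimethylbutane and n-hexane. The original drawings are not available here, but these graphs reproduce the quoted values.

```python
def vertex_degrees(n_atoms, edges):
    """Number of non-hydrogen neighbours of each atom."""
    degree = [0] * n_atoms
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


def zagreb_m1(n_atoms, edges):
    """M1: sum of squared vertex degrees."""
    return sum(d * d for d in vertex_degrees(n_atoms, edges))


def zagreb_m2(n_atoms, edges):
    """M2: sum over bonds of the product of the two end-vertex degrees."""
    degree = vertex_degrees(n_atoms, edges)
    return sum(degree[i] * degree[j] for i, j in edges)


# Assumed graphs (see the lead-in).
G1_EDGES = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]   # 2,3-dimethylbutane
G2_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]   # n-hexane

for name, edges in [("G1", G1_EDGES), ("G2", G2_EDGES)]:
    print(name, "M1 =", zagreb_m1(6, edges), "M2 =", zagreb_m2(6, edges))
```

Unlike the total adjacency index, both M1 and M2 are larger for the branched isomer.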

Randić's molecular connectivity index

Randić introduced a connectivity index similar to M2:

$\chi_R = \sum_{(i,j)} (\delta_i \delta_j)^{-1/2}$ (sum over all bonds i-j)

M. Randić, J. Am. Chem. Soc., 97, 6609 (1975).
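A sketch of the Randić index on the same kind of hydrogen-suppressed graph. The two example graphs (2,3-dimethylbutane and n-hexane skeletons) are an assumption made for illustration; the slides do not give Randić values for them.

```python
import math


def randic_index(n_atoms, edges):
    """chi_R: sum over bonds of (delta_i * delta_j) ** -0.5."""
    degree = [0] * n_atoms
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in edges)


# Assumed example graphs with six carbon atoms each.
BRANCHED = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]   # 2,3-dimethylbutane
LINEAR = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]     # n-hexane

print("branched:", round(randic_index(6, BRANCHED), 4))
print("linear:  ", round(randic_index(6, LINEAR), 4))
```

Because each bond contributes the inverse square root of the degree product, the index decreases with branching, the opposite trend to M2.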

TI based on the Distance Matrix: the Wiener Index

The entry d_ij of the distance matrix indicates the number of edges in the shortest path between vertices i and j.

The Wiener index (the first TI!) accounts for the branching: W(G1) = 29, W(G2) = 35

Reference: H. Wiener, J. Am. Chem. Soc., 69, 17 (1947)
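The Wiener index is commonly defined as the sum of the distance-matrix entries over all vertex pairs (half the sum of the full symmetric matrix). The sketch below builds the distances by breadth-first search. G1 and G2 are assumed to be the skeletons of 2,3-dimethylbutane and n-hexane, which reproduce the values 29 and 35.

```python
from collections import deque


def distance_matrix(n_atoms, edges):
    """Topological distances (number of edges on the shortest path) by BFS."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    d = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        d[start][start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if d[start][nxt] is None:
                    d[start][nxt] = d[start][current] + 1
                    queue.append(nxt)
    return d


def wiener_index(n_atoms, edges):
    """W = sum of d_ij over all pairs i < j (connected graphs only)."""
    d = distance_matrix(n_atoms, edges)
    return sum(d[i][j] for i in range(n_atoms) for j in range(i + 1, n_atoms))


G1_EDGES = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]   # assumed: 2,3-dimethylbutane
G2_EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]   # assumed: n-hexane

print("W(G1) =", wiener_index(6, G1_EDGES))
print("W(G2) =", wiener_index(6, G2_EDGES))
```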

TPSA - Topological Polar Surface Area

Peter Ertl, Bernhard Rohde, and Paul Selzer, J. Med. Chem. 2000, 43, 3714-3717

$\text{3D-PSA} = \sum_{i=1}^{N_{fragm}} n_i \cdot c(\mathrm{fragment}_i)$

[Figure: 3D PSA vs TPSA for 34 810 molecules from the World Drug Index]
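The fragment sum behind TPSA can be sketched as a weighted count of polar fragments. The contribution values and fragment labels below are assumptions for illustration and should be checked against the table published in the cited paper. Recognising the fragments in a structure is not shown; the counts n_i are supplied by hand.

```python
# Assumed surface contributions in square angstroms (illustrative only).
FRAGMENT_CONTRIBUTION = {
    "carbonyl O (=O)": 17.07,
    "hydroxyl (-OH)": 20.23,
    "ether O (-O-)": 9.23,
    "primary amine (-NH2)": 26.02,
}


def tpsa(fragment_counts):
    """TPSA = sum over polar fragment types of n_i * c(fragment_i)."""
    return sum(n * FRAGMENT_CONTRIBUTION[name]
               for name, n in fragment_counts.items())


# Acetic acid, CH3-C(=O)-OH: one carbonyl oxygen and one hydroxyl group.
acetic_acid = {"carbonyl O (=O)": 1, "hydroxyl (-OH)": 1}
print("TPSA(acetic acid) =", round(tpsa(acetic_acid), 2))
```

Since only a table lookup and a sum are involved, no 3D geometry is needed, which is what makes the descriptor fast enough for large databases.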

Geometrical descriptors

• Moments of inertia (rigid rotator approximation): the moments of inertia characterize the mass distribution in the molecule.

$I = \sum_i m_i d_i^2$

• Radius of gyration

$R_{og} = \sqrt{\frac{\sum_i (x_i^2 + y_i^2 + z_i^2)}{N}}$

N: number of atoms; x, y, z: the atomic coordinates relative to the center of mass

• Area: molecular surface area descriptor; describes the van der Waals area of the molecule; related to binding, transport, and solubility

• Shadow indices [1]: surface area projections

1. Rohrbaugh, R.H., Jurs, P.C., Anal. Chim. Acta, 1987, 199, 99-109.
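A minimal sketch of the two geometrical formulas, assuming Cartesian coordinates and masses are already available. The radius of gyration is written here with a square root over the mean squared distance from the center of mass, and d_i is taken as the distance of atom i from a rotation axis through the center of mass. Both readings are assumptions about the garbled slide formulas.

```python
import math


def centered(masses, coords):
    """Coordinates shifted so that the center of mass is at the origin."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    return [tuple(c[k] - com[k] for k in range(3)) for c in coords]


def radius_of_gyration(masses, coords):
    """Rog = sqrt(sum_i (x_i^2 + y_i^2 + z_i^2) / N)."""
    shifted = centered(masses, coords)
    return math.sqrt(sum(x * x + y * y + z * z for x, y, z in shifted)
                     / len(shifted))


def moment_of_inertia(masses, coords, axis):
    """I = sum_i m_i * d_i^2 about an axis through the center of mass."""
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    total = 0.0
    for m, r in zip(masses, centered(masses, coords)):
        r_squared = sum(c * c for c in r)
        along_axis = sum(c * u for c, u in zip(r, unit))
        total += m * (r_squared - along_axis ** 2)
    return total


# Toy diatomic: two unit masses one length unit apart on the x axis.
masses = [1.0, 1.0]
coords = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)]
print("Rog =", radius_of_gyration(masses, coords))
print("I about z =", moment_of_inertia(masses, coords, (0, 0, 1)))
print("I about x =", moment_of_inertia(masses, coords, (1, 0, 0)))
```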

Molecular Descriptors

3D

Steric parameters

• Length-to-breadth ratio: L/B [1]
• Molecular thickness
• Ovality [2] (ratio of the actual surface area and the minimum surface)
• Molecular volume
• Sterimol parameters [3]
• Taft steric parameter Es

1. Janini, G.M.; Johnston, K.; Zielinski, W. L. Anal. Chem. 1975, 47, 670.
2. Verloop, A.; Tipker, J. In Biological Activity and Chemical Structure, Buisman, J. A. K. (editors), Elsevier, Amsterdam, Netherlands, 1977, p63.
3. Kourounakis, A.; Bodor, N. Pharm. Res. 1995, 12(8), 1199.

[Figure: length L, breadth B and molecular thickness; Sterimol parameters B1, B2, B3, B4 and the L axis]

$\mathrm{ovality} = \frac{\text{Surface area}}{4\pi \left( \frac{3 \times \mathrm{volume}}{4\pi} \right)^{2/3}}$
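The ovality formula compares the actual surface area with the surface of a sphere of equal volume. A small sketch, with the sphere and unit cube used only as sanity checks:

```python
import math


def ovality(surface_area, volume):
    """Surface area divided by the area of the sphere with the same volume."""
    equivalent_radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    minimum_surface = 4.0 * math.pi * equivalent_radius ** 2
    return surface_area / minimum_surface


# A perfect sphere has ovality 1; any other shape gives a larger value.
r = 2.0
print("sphere:", ovality(4.0 * math.pi * r ** 2, 4.0 / 3.0 * math.pi * r ** 3))
print("unit cube:", ovality(6.0, 1.0))
```

For molecules, the surface area and volume would come from a van der Waals or solvent-accessible surface calculation, which is not covered here.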

Quantum Chemical Descriptors

Quantitative values calculated in quantum mechanics (semi-empirical, HF ab initio or DFT) calculations:

• Atomic charges
• LUMO: lowest unoccupied molecular orbital energy
• HOMO: highest occupied molecular orbital energy
• DIPOLE: dipole moment
• Components of dipole moment along inertia axes (Dx, Dy, Dz)
• Hf: heat of formation
• Mean polarizability: $\alpha = \frac{1}{3}(\alpha_{xx} + \alpha_{yy} + \alpha_{zz})$
• EA: electron affinity
• IP: ionization potential
• ΔE: energy of protonation
• Electrostatic potential: $V(\mathbf{r}) = \sum_A \frac{Z_A}{|\mathbf{R}_A - \mathbf{r}|} - \int \frac{\rho(\mathbf{r}')\, d\mathbf{r}'}{|\mathbf{r}' - \mathbf{r}|}$

Lipophilic Descriptors (2D and 3D)

Octanol-water partition coefficient

• Hansch-Leo method (ClogP)

• Rekker's method: $\log P = \sum_{n=1}^{N} a_n f_n + \sum_{m=1}^{M} b_m F_m$

• Ghose-Crippen method (calculated logP based on summing contributions of atom types)

logP(octanol-water), logP(alkane-water), logP(chloroform-water), logP(dichloroethane/water)

• Molecular lipophilicity potential (MLP): $MLP(j) = \sum_{i=1}^{n} \frac{f_i}{1 + d_{ij}}$

The MLP describes how lipophilicity is distributed over the different parts of a molecule (lipophilicity maps and determination of hydrophilic and lipophilic regions of a molecule).
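A sketch of the molecular lipophilicity potential at a single point j, using the distance function 1 / (1 + d_ij). The fragment constants and coordinates are hypothetical and serve only to show the arithmetic.

```python
import math


def mlp_at_point(point, fragments):
    """MLP(j) = sum_i f_i / (1 + d_ij).

    fragments: list of (f_i, (x, y, z)) pairs, where f_i is a lipophilic
    fragment constant and the tuple is the fragment position.
    """
    total = 0.0
    for f_i, position in fragments:
        d_ij = math.dist(point, position)  # Euclidean distance (Python 3.8+)
        total += f_i / (1.0 + d_ij)
    return total


# Hypothetical molecule: one lipophilic and one hydrophilic fragment.
fragments = [(0.5, (0.0, 0.0, 0.0)), (-1.0, (3.0, 0.0, 0.0))]

for x in (0.0, 1.5, 3.0):
    print(f"MLP at x = {x}: {mlp_at_point((x, 0.0, 0.0), fragments):+.3f}")
```

Evaluating the function on a grid of surface points gives the kind of lipophilicity map mentioned above.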

Some LogPo/w Extremes in Therapy

[Figure: chemical structures of the compounds listed below]

Compound          LogPo/w
permethrin        6.5
clopimozide       7.1
hexachlorophene   7.54
arginine          -4.2
inulin            -3.7
sucrose           -3.7

What do these Drugs have in Common?

[Figure: chemical structures of the five compounds listed below]

Compound           LogPo/w
Irsogladine        1.97
Chloroform         1.97
Secobarbital       1.97
Trandolapril       1.97
Acetyldigitoxine   1.97

3D Hydrophobicity

All molecules have the same logP ~1.5, but different 3D MLP pattern.

hydrophobic hydrophilic

Example of oral administration

A drug is exposed to a large variety of pH values:

• Saliva: pH 6.4
• Stomach: pH 1.0 – 3.5
• Duodenum: pH 5 – 7.5
• Jejunum: pH 6.5 – 8
• Colon: pH 5.5 – 6.8
• Blood: pH 7.4

[Figure: "Liver first-pass effect" (image source: www.3dscience.com)]

Lipophilic Descriptors

• Log D
• Log P_N: logP of the neutral form
• Log P_I: logP of the ionized form

$D^{pH}_{system} = f_N \cdot P_N + f_I \cdot P_I$

where f_N and f_I are the fractions of the neutral and ionized forms at the given pH.

logD: The Calculation

LogD may simply be calculated from the predicted logP and the pKa of the singly ionized species at a certain pH:

For acids: $\log D_{(pH)} = \log P - \log[1 + 10^{(pH - pKa)}]$

For bases: $\log D_{(pH)} = \log P - \log[1 + 10^{(pKa - pH)}]$
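The two logD formulas translate directly into code. The compound below (logP = 3.0, pKa = 4.5) is hypothetical, and the pH values are taken from the oral-administration list earlier in the slides.

```python
import math


def log_d_acid(log_p, pka, ph):
    """logD for a monoprotic acid: logP - log10(1 + 10**(pH - pKa))."""
    return log_p - math.log10(1.0 + 10.0 ** (ph - pka))


def log_d_base(log_p, pka, ph):
    """logD for a monoprotic base: logP - log10(1 + 10**(pKa - pH))."""
    return log_p - math.log10(1.0 + 10.0 ** (pka - ph))


# Hypothetical acidic drug.
LOG_P, PKA = 3.0, 4.5
for site, ph in [("stomach", 1.0), ("saliva", 6.4), ("blood", 7.4)]:
    print(f"{site:8s} pH {ph}: logD = {log_d_acid(LOG_P, PKA, ph):.2f}")
```

For the acid, logD stays close to logP in the stomach and drops by roughly one unit per pH unit once the pH exceeds the pKa. These formulas neglect partitioning of the ionized form.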

Fragment Descriptors

Descriptors: Cl, amide, COOH, Br, Phenyl

Cl = 1, amide = 1, COOH = 1, Br = 0, Phenyl = 0

[Figure: example chemical structure]

ISIDA Fragment descriptors

Type of Fragments

I. Sequences
II. Augmented Atoms

I(AB, 2-4): sequences of atoms and bonds (Atoms+Bonds), 2 to 4 atoms

Example: C-N=C-H, C-N=C, N=C-N, C-N, N=C, C-H
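The idea of atom-and-bond sequences of 2 to 4 atoms can be sketched as a path enumeration on the molecular graph. This is not the ISIDA implementation. The canonical labelling (the lexicographically smaller of the two reading directions) and the function names are assumptions made for this example.

```python
from collections import Counter


def sequence_label(symbols, bond_symbols):
    """Join atoms and bonds into a string such as 'C-N=C'."""
    label = symbols[0]
    for bond, atom in zip(bond_symbols, symbols[1:]):
        label += bond + atom
    return label


def enumerate_sequences(atoms, bonds, min_atoms=2, max_atoms=4):
    """Count all linear paths with min_atoms..max_atoms atoms.

    atoms: list of element symbols; bonds: {(i, j): bond symbol}.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for (i, j), symbol in bonds.items():
        neighbours[i].append((j, symbol))
        neighbours[j].append((i, symbol))
    counts = Counter()

    def extend(path, bond_symbols):
        # Each undirected path is found from both ends; keep one copy.
        if len(path) >= min_atoms and path[0] < path[-1]:
            symbols = [atoms[k] for k in path]
            forward = sequence_label(symbols, bond_symbols)
            backward = sequence_label(symbols[::-1], bond_symbols[::-1])
            counts[min(forward, backward)] += 1
        if len(path) == max_atoms:
            return
        for nxt, symbol in neighbours[path[-1]]:
            if nxt not in path:
                extend(path + [nxt], bond_symbols + [symbol])

    for start in range(len(atoms)):
        extend([start], [])
    return counts


# The C-N=C-H chain used as the example sequence above.
atoms = ["C", "N", "C", "H"]
bonds = {(0, 1): "-", (1, 2): "=", (2, 3): "-"}
for fragment, count in sorted(enumerate_sequences(atoms, bonds).items()):
    print(fragment, count)
```

With this labelling, N=C is reported as C=N and N=C-H as H-C=N; only the reading direction differs.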

II. Augmented Atoms

II(Hy): hybridization of neighbours is taken into account

II(A): no hybridization

DataSet -> ISIDA FRAGMENTOR -> Calculation of Descriptors -> the Pattern matrix

Pattern matrix (fragment counts, one row per molecule):

C-C-C-C-C-C   C-C-C-N-C-C   C=O   C-C-C-N   C-N-C-C*C
0             10            1     5         0
0             8             1     4         0
0             4             1     2         4

PATTERN MATRIX + PROPERTY VALUES (-0.222, 0.973, -0.066)

LEARNING STAGE: building of models -> QSAR models

VALIDATION STAGE: QSAR models filtering -> selection of the most predictive ones

Example: linear QSPR model

$\mathrm{Property} = a_0 + \sum_{i=1}^{k} a_i \cdot D_i$

PROPERTYcalc = -0.36 * N(C-C-C-N-C-C) + 0.27 * N(C=O) + 0.12 * N(C-N-C*C) + …
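Applying such a linear model is a dot product between the fitted coefficients and the fragment counts of a molecule. The sketch below uses the three coefficients shown above. The intercept a0 = 0 and the fragment counts are hypothetical, and the terms hidden behind the ellipsis are ignored.

```python
# Coefficients quoted in the example model above.
COEFFICIENTS = {
    "C-C-C-N-C-C": -0.36,
    "C=O": 0.27,
    "C-N-C*C": 0.12,
}
A0 = 0.0  # hypothetical intercept; the slide does not give a value


def predict_property(fragment_counts, coefficients=COEFFICIENTS, a0=A0):
    """Property = a0 + sum_i a_i * D_i, with D_i the count of fragment i."""
    return a0 + sum(a_i * fragment_counts.get(fragment, 0)
                    for fragment, a_i in coefficients.items())


# Hypothetical molecule described by its fragment counts.
molecule = {"C-C-C-N-C-C": 1, "C=O": 1, "C-N-C*C": 4}
print("predicted property:", round(predict_property(molecule), 3))
```

In the learning stage the coefficients themselves would be obtained by regressing the property values on the pattern matrix, which is not shown here.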

Software

DRAGON

The software DRAGON calculates 1664 molecular descriptors divided in 20 blocks

CODESSA Pro

calculates a large variety of molecular descriptors on the basis of the 3D geometrical structure and/or quantum-chemical parameters;

develops (multi)linear and non-linear QSPR models

ISIDA program

calculates fragment descriptors; develops (multi)linear and non-linear QSPR models