Introduction on QSAR and modelling of physico-chemical and biological properties

Alessandra Roncaglioni – IRFMN [email protected]
Problems and approaches in computational chemistry


Page 1: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Introduction on QSAR and modelling of physico-chemical and biological properties
Alessandra Roncaglioni – [email protected]

Problems and approaches in computational chemistry

Page 2: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Outline
• History
• QSAR/QSPR steps
  ◦ (Descriptors)
  ◦ Activity data
  ◦ Modelling approaches
  ◦ Validation (OECD principles)
• QSPR (Phys-chem properties)
• QSAR (Biological activities)
• Example (DEMETRA)

Page 3: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

QSAR postulates
• The molecular structure is responsible for all the activities
• Similar compounds have similar biological and chemico-physical properties (Meyer 1899)
• Hansch analysis ('60s)
• Free-Wilson approach ('60s)

H. Kubinyi. From Narcosis to Hyperspace: The History of QSAR. Quant. Struct.-Act. Relat., 21 (2002) 348-356.

Page 4: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Hansch analysis
• Applied to congeneric series
  $\log(1/C) = a\,\pi + b\,\sigma + c\,E_s + \text{const.}$
  where
  ◦ C = effect concentration
  ◦ π = hydrophobicity term (octanol-water partition coefficient)
  ◦ σ = Hammett substituent constant (electronic)
  ◦ E_s = Taft's substituent constant
• Linear free energy-related approach
• McFarland principle

Page 5: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Free-Wilson analysis
  $\log(1/C) = \sum_i a_i + \mu$
  where
  ◦ C = effect concentration
  ◦ a_i = contribution per group
  ◦ μ = activity of reference compound

Page 6: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

The old QSAR paradigm
• Compounds in the series must be closely related
• Same mode of action
• Basic biological activity
• Small number of "intuitive" properties
• Linear relation

Page 7: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

The old QSAR paradigm
Factors limiting the old paradigm:
• Software availability
• Calculation of molecular properties
• Limited computing power
• Costs of hardware and software

Page 8: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

The new QSAR paradigm
• Heterogeneous compound sets
• Mixed modes of action
• Complex biological endpoints
• Large number of properties
• Non-linear modelling

Page 9: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

The new QSAR paradigm
Factors enabling the new paradigm:
• Increased computing power
• QM calculations
• Thousands of descriptors
• Cost drop for hardware and software (freeware)

Page 10: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Outline
• History
• QSAR/QSPR steps
  ◦ (Descriptors)
  ◦ Activity data
  ◦ Modelling approaches
  ◦ Validation (OECD principles)
• QSPR (Phys-chem properties)
• QSAR (Biological activities)
• Example (DEMETRA)

Page 11: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

QSAR flowchart

Page 12: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

[Figure: 2D and 3D structures are converted into a descriptor matrix D(n,m), with n compounds and descriptors 1, …, m, and paired with an activity vector A; the model is A = f(D(n,m))]

Page 13: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

QSAR/QSPR defined by Y data
• Quantitative Structure-Property Relationship: physico-chemical or biochemical properties
  ◦ Boiling point
  ◦ Partition coefficients (LogP)
  ◦ Receptor binding
• Quantitative Structure-Activity Relationship: interaction with the biota
  ◦ Toxicity
  ◦ Metabolism

Page 14: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Activity data
• Garbage in, garbage out
• Quality and quantity of data
  ◦ Suitable for purposes?
  ◦ Intrinsic variability of Y data (particularly for QSAR): examples later on
  ◦ Chemical domain covered with experimental data
  ◦ As much data as you can, especially if using complex models

Page 15: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Data need
• Data are one of the pillars of the models
• The goal is to extract knowledge from these data
• If they are too noisy it is not possible to extract this knowledge
• A sufficient number of training data
• Keep data variability low
[Figure: quality/accuracy plotted against number of compounds; label: large number of compounds]

Page 16: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Modelling steps
• Data pre-processing
  ◦ Scaling X block and transformation of Y block
• Variable selection
• Application of algorithms to search for the relationship

Page 17: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Data pre-processing (I)
• Scaling variables
  ◦ making sure that each descriptor has an equal chance of contributing to the overall analysis
  ◦ E.g.: autoscaling, range scaling
• Y transformation

| No. | TestSub | Name_Jp | CAS nr | MolWeight | ZM1V | HNar | MSD | GMTIV | SPI | TI1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1-001 | Trichlorfon | 52-68-6 | 257,437 | 162,124 | 1,358 | 0,266 | 1336,074 | 9,072 | -13,982 |
| 2 | 1-002 | Dimethoate | 60-51-5 | 229,257 | 148,198 | 1,485 | 0,324 | 1587,741 | 8,253 | -16,476 |
| 3 | 1-003 | Dichlorvos | 62-73-7 | 220,976 | 172,519 | 1,451 | 0,326 | 1380,444 | 7,646 | -13,908 |
| 4 | 1-004 | Malathon | 121-75-5 | 330,358 | 274,198 | 1,551 | 0,26 | 6524,185 | 13,757 | -34,658 |
| 5 | 1-005 | Methoprene | 40596-69-8 | 310,471 | 224 | 1,562 | 0,334 | 9922 | 17,17 | -57,135 |
| 6 | 1-006 | Propylthiourea | 927-67-3 | 118,201 | 50,444 | 1,448 | 0,431 | 235,667 | 4,166 | -6,962 |
| 7 | 1-007 | 2-Butanone oxime | 96-29-7 | 87,1204 | 72 | 1,385 | 0,414 | 224 | 3,484 | -4,909 |
| 8 | 1-008 | Dibromoacetic acid | 631-64-1 | 217,844 | 86,134 | 1,286 | 0,38 | 204,949 | 3,786 | -4,593 |
| 9 | 1-011 | Bis(2-ethylhexyl)adipate | 103-23-1 | 370,566 | 254 | 1,696 | 0,334 | 14655 | 17,037 | -80,091 |
| 10 | 1-013 | Thiram | 137-26-8 | 240,433 | 87,778 | 1,44 | 0,338 | 829,445 | 9,053 | -17,08 |
| 11 | 1-015 | Stannane, tributylfluoro- | 1983-10-4 | 309,051 | 88,008 | 1,6 | 0,314 | 1243,889 | 8,561 | -22,079 |
| 12 | 1-016 | Methomyl | 16752-77-5 | 162,21 | 148,444 | 1,5 | 0,375 | 1128,333 | 6,542 | -12,904 |
| 13 | 1-017 | Aldicarb | 116-06-3 | 190,263 | 158,444 | 1,485 | 0,351 | 1697 | 8,357 | -17,609 |
| 14 | 1-018 | Demeton-s-methyl | 919-86-8 | 230,285 | 124,198 | 1,548 | 0,353 | 1163,185 | 7,554 | -17,685 |
| 15 | 1-019 | Citral | 5392-40-5 | 152,233 | 106 | 1,535 | 0,387 | 1399 | 7,197 | -16,06 |
| 16 | 1-020 | Disulfiram | 97-77-8 | 296,539 | 103,778 | 1,548 | 0,307 | 1801,444 | 11,43 | -28,141 |
| 17 | 1-021 | 2-Ethyl-1,3-hexanediol | 94-96-2 | 146,227 | 86 | 1,5 | 0,338 | 805 | 6,412 | -11,897 |
| 18 | 1-022 | Tributyl phosphate | 126-73-8 | 266,314 | 183,309 | 1,659 | 0,311 | 3441,667 | 10,069 | -32,437 |
| 20 | 1-024 | Tris(2-chloroethyl)phosphate | 115-96-8 | 285,49 | 170,124 | 1,6 | 0,314 | 1985,037 | 8,561 | -22,079 |
| 22 | 1-026 | Ethylene glycol | 107-21-1 | 62,0678 | 58 | 1,333 | 0,527 | 139 | 1,893 | -2,499 |
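
The two scaling schemes named on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `autoscale` and `range_scale` are made up here, and the input values are MolWeight and MSD for the first four compounds of the table, with decimal commas converted to points.

```python
from statistics import mean, stdev


def autoscale(column):
    """Centre a descriptor on zero and scale it to unit standard deviation."""
    m, s = mean(column), stdev(column)  # sample standard deviation (n - 1)
    # A constant descriptor has s == 0 and must be pruned before scaling.
    return [(x - m) / s for x in column]


def range_scale(column):
    """Map a descriptor linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


# MolWeight and MSD of the first four compounds in the table
mol_weight = [257.437, 229.257, 220.976, 330.358]
msd = [0.266, 0.324, 0.326, 0.26]

scaled_mw = autoscale(mol_weight)
scaled_msd = autoscale(msd)
ranged_mw = range_scale(mol_weight)
```

Before scaling, MolWeight is roughly a thousand times larger than MSD and would dominate any distance or variance calculation; after autoscaling both columns have mean 0 and standard deviation 1.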

Page 18: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Data pre-processing (II)
• Variable pruning
  ◦ Detecting constant variables
  ◦ Detecting quasi-constant variables: this distinguishes informative from non-informative variables
  ◦ Detecting correlated variables: variables can be grouped into correlation groups, and the variable most correlated with the response is retained
  ◦ Variables with missing values
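
A minimal sketch of these pruning steps, assuming NumPy is available. The function name `prune_descriptors` and the thresholds `var_tol` and `corr_max` are illustrative choices, not values from the slides, and the greedy pass is a simplification of grouping variables into correlation groups.

```python
import numpy as np


def prune_descriptors(X, y, var_tol=1e-8, corr_max=0.95):
    """Return the column indices of the descriptors that survive pruning."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # 1. Drop descriptors with missing values.
    keep = [j for j in range(X.shape[1]) if not np.isnan(X[:, j]).any()]
    # 2. Drop constant and quasi-constant descriptors.
    keep = [j for j in keep if X[:, j].var() > var_tol]
    # 3. Among correlated descriptors retain the one most correlated with y:
    #    visit descriptors by decreasing |r| with the response and skip any
    #    that is too correlated with a descriptor already retained.
    r_y = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in keep}
    selected = []
    for j in sorted(keep, key=lambda j: -r_y[j]):
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_max
               for k in selected):
            selected.append(j)
    return sorted(selected)


# Toy data: column 1 is almost 2 * column 0, column 2 is constant,
# column 4 has a missing value.
X_demo = [
    [1.0, 2.0, 7.0, 2.0, 0.5],
    [2.0, 4.1, 7.0, 9.0, 0.7],
    [3.0, 5.9, 7.0, 1.0, float("nan")],
    [4.0, 8.2, 7.0, 7.0, 0.2],
    [5.0, 9.8, 7.0, 3.0, 0.9],
    [6.0, 12.3, 7.0, 8.0, 0.4],
]
y_demo = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
kept = prune_descriptors(X_demo, y_demo)
```

On the toy data the constant column and the column with a missing value are removed, and only one of the two nearly collinear columns survives.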

Page 19: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Variable selection
• Reducing dimensions, facilitating data visualization and interpretation
• Likely improving prediction performance
• Hypothesis driven or statistically driven
  ◦ Wrappers: use the chosen prediction method to score subsets of features according to their predictive power
  ◦ Filters: a preprocessing step, independent of the choice of the predictor

Page 20: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Variable selection techniques
• Principal component analysis (PCA)
• Clustering
• Self organizing maps (SOM)
• Stepwise procedures
  ◦ Forward selection: features are progressively incorporated into larger and larger subsets
  ◦ Backward elimination: start with the set of all features and progressively eliminate the least promising ones
• Genetic algorithms
• Variable importance/sensitivity

Page 21: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Principal component analysis
• Keep only those components that possess the largest variation
• PCs are orthogonal to each other
[Figure: loadings plot]
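
As a rough illustration of both points, PCA can be computed from the singular value decomposition of the centred descriptor matrix. This sketch assumes NumPy; the function name `pca` and the toy matrix are invented for the example, and in practice descriptors are usually autoscaled first.

```python
import numpy as np


def pca(X, n_components):
    """Return scores, loadings and explained-variance ratios of the first PCs."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each descriptor
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # fraction of variance per PC
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T               # one column per component
    return scores, loadings, explained[:n_components]


# Toy matrix: 5 compounds x 3 descriptors (made-up numbers)
X_demo = [
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 0.9],
    [2.2, 2.9, 1.2],
    [1.9, 2.2, 0.8],
    [3.1, 3.0, 1.1],
]
scores, loadings, explained = pca(X_demo, 2)
```

The singular values come back in decreasing order, so keeping the first columns keeps the components with the largest variation, and the loading vectors are orthonormal by construction.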

Page 22: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Cluster analysis
• Process of putting objects into classes, based on similarity
• Descriptors in the same cluster assume similar values for the molecules of the dataset
• Many different methods and algorithms
  ◦ different clustering methods will result in different clusters, with different relationships between them
  ◦ different algorithms can be used to implement the same method (some may be more efficient than others)

Page 23: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Hierarchical and non-hierarchical
• A basic distinction is between clustering methods that organise clusters hierarchically, and those that do not
[Figure: two panels, each showing objects 1 to 8 grouped into clusters]

Page 24: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Hierarchical agglomerative
• The hierarchy is built from the bottom upwards
• Several different methods and algorithms
• Basic Lance-Williams algorithm (common to all methods) starts with a table of similarities between all pairs of items
  ◦ at each step the most similar pair of molecules (or previously-formed clusters) is merged
  ◦ until everything is in one big cluster
  ◦ methods differ in how they determine the similarity between clusters

Page 25: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Hierarchical divisive
• The hierarchy is built from the top downwards
• At each step a cluster is chosen to divide, until each cluster has only one member
• Various ways of choosing the next cluster to divide
  ◦ the one with most members
  ◦ the one with the least similar pair of members
  ◦ etc.
• Various ways of dividing it

Page 26: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Non-hierarchical methods
• Usually faster than hierarchical
• E.g.: nearest neighbour methods
  ◦ the best-known example is the Jarvis-Patrick method
    - identify the top k (e.g. 20) nearest neighbours for each object
    - two objects join the same cluster if they have at least kmin of their top k nearest neighbours in common
  ◦ tends to produce a few large heterogeneous clusters and a lot of singletons (single-member clusters)
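
The Jarvis-Patrick rule above can be sketched in plain Python. The function name, the `mutual` flag and the toy points are assumptions made for this example. The slide states only the shared-neighbour condition; the method as usually formulated also requires the two objects to appear in each other's neighbour lists, which is what `mutual=True` adds.

```python
from itertools import combinations
from math import dist


def jarvis_patrick(points, k, k_min, mutual=True):
    """Cluster points by shared nearest neighbours; returns lists of indices."""
    n = len(points)
    # Top-k nearest neighbours of each object (the object itself excluded).
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        neighbours.append(set(others[:k]))

    parent = list(range(n))  # union-find forest used to merge clusters

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        shared = len(neighbours[i] & neighbours[j])
        linked = (i in neighbours[j] and j in neighbours[i]) if mutual else True
        if linked and shared >= k_min:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Two tight groups of four points and one distant outlier (index 8)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
strict = jarvis_patrick(points, k=3, k_min=2, mutual=True)
loose = jarvis_patrick(points, k=3, k_min=2, mutual=False)
```

With the mutual-neighbour condition the outlier stays a singleton; without it the outlier is absorbed into the nearer group, which illustrates how the method can give large heterogeneous clusters.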

Page 27: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Self organizing maps
• A SOM is an unsupervised neural network (NN) condensing the input space into a low-dimensional representation

Page 28: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Genetic algorithms
• Based on the Darwinian evolutionary theory
  ◦ individuals in a population of models are crossed over, mutated, then iteratively evaluated against a fitness function which gives a statistical evaluation of the model's performance
[Figure: genetic algorithm flowchart with the steps initial population, evaluation of individuals, fitness check (Y: end; N: continue), individual selection, cross-over and mutations; individuals are encoded as bit strings such as 10001010110010 and 11010111011111]

Page 29: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Modelling approaches
• SAR: categorical Y → classification
• Quantitative SAR: continuous Y → regression

Page 30: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Modelling techniques
• Multiple Linear Regression
• PLS
• …
• Neural Networks
• Classification trees
• Discriminant analysis
• Fuzzy classification
• …

Page 31: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Multiple Regression
• Linear relationship between Y and several Xi descriptors
  Y = aX1 + bX2 + … + cXn + const.
• Minimize error by least squares
• May include polynomial terms

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (1)$$
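
A least-squares fit and the R² of equation (1) can be sketched as follows, assuming NumPy. The helper names `fit_mlr`, `predict_mlr` and `r_squared` are hypothetical, and the toy data were generated without noise from Y = 2·X1 − X2 + 0.5.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares coefficients: one per descriptor, then the constant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([X, np.ones(len(X))])  # extra column for the constant
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    X = np.asarray(X, dtype=float)
    return X @ coef[:-1] + coef[-1]


def r_squared(y, y_hat):
    """Equation (1): 1 - residual sum of squares / total sum of squares."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


# Toy data generated from Y = 2*X1 - 1*X2 + 0.5
X_demo = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y_demo = [0.5, 3.5, 1.5, 5.5, 2.5]

coef = fit_mlr(X_demo, y_demo)
r2 = r_squared(y_demo, predict_mlr(coef, X_demo))
```

Because the toy responses are noise-free, the fit recovers the generating coefficients and R² is 1 up to rounding; real activity data will not behave this way.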

Page 32: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Partial Least Square
• Like PCA, PLS uses orthogonal components built from linearly correlated variables, but selects those more closely related to the Y response
[Figure: projection of the scores t1 and t2]

Page 33: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Neural networks
• Inspired by biological NNs, they are a set of connected nonlinear elements that transform the input I into the output O: O = f(I)

Page 34: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

The problem of overfitting
[Figure: data fitted with two models]
• Linear fit: y = 0.979x + 0.344, R² = 0.956
• Fourth-degree polynomial fit: $y = -0.062x^4 + 1.293x^3 - 9.472x^2 + 29.24x - 27.37$, R² = 0.999

Page 35: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Solution: validation
[Figure: performance plotted against model complexity, with one curve for training prediction and one for validation prediction]

Page 36: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Validation criteria
• Internal validation - robustness
  ◦ Cross-validation (LOO, LSO)
  ◦ Bootstrap
  ◦ Y scrambling
• External validation - prediction ability
  ◦ Test set representative of training set
  ◦ Tropsha criteria
• Applicability domain

Page 37: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Cross validation
• Leave One Out
  ◦ All the data but one compound are used for fitting
  ◦ Predict the excluded sample
  ◦ Repeat for all samples
  ◦ Calculate Q2 or R2cv, similarly to R2, on the basis of these predictions
  ◦ Problem: too optimistic if there are many samples
• Leave Many Out
  ◦ Use larger groups to obtain a more realistic outcome
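
The leave-one-out loop can be sketched as below, assuming NumPy and an ordinary least-squares model. The function name `loo_q2` and the toy data are invented for the example. Q² is computed here with the overall mean of y in the denominator, which is one common convention; other definitions exist.

```python
import numpy as np


def loo_q2(X, y):
    """Leave-one-out Q2 for a least-squares linear model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([X, np.ones(n)])       # descriptors plus constant
    press = 0.0                                # predictive residual sum of squares
    for i in range(n):
        mask = np.arange(n) != i               # fit on all compounds but one
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2     # predict the excluded compound
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Toy data: one descriptor, response roughly equal to it plus noise
x_demo = np.arange(8, dtype=float).reshape(-1, 1)
y_demo = np.array([0.1, 1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8])
q2 = loo_q2(x_demo, y_demo)
```

For a least-squares model the leave-one-out residuals are never smaller than the fitted residuals, so Q² computed this way cannot exceed R² on the same data.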

Page 38: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Bootstrapping
• Bootstrapping simulates what happens by randomly resampling the data set of n objects
• K groups of n objects each are generated by random selection with repetition, so some objects appear more than once
• The model obtained on each group is used to predict the values for the excluded samples
• From each bootstrap sample the statistical parameter of interest is calculated
• The estimate of accuracy is obtained as the average of all calculated statistics
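
One possible reading of this procedure is sketched below, assuming NumPy and a least-squares model. The function name `bootstrap_q2`, the number of rounds, and the choice of an R²-like statistic on the excluded objects are assumptions made for the example, not details from the slides.

```python
import numpy as np


def bootstrap_q2(X, y, n_rounds=200, seed=0):
    """Average predictive statistic over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([X, np.ones(n)])
    stats = []
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)        # n objects drawn with repetition
        out = np.setdiff1d(np.arange(n), idx)   # objects excluded from this draw
        if len(out) < 2:
            continue                            # too few excluded objects to score
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        resid = y[out] - A[out] @ coef          # predict the excluded objects
        ss_tot = np.sum((y[out] - y[idx].mean()) ** 2)
        stats.append(1.0 - np.sum(resid ** 2) / ss_tot)
    return float(np.mean(stats))                # average of all calculated statistics


x_demo = np.arange(8, dtype=float).reshape(-1, 1)
y_demo = np.array([0.1, 1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8])
q2_boot = bootstrap_q2(x_demo, y_demo)
```

Fixing the seed makes the estimate reproducible; with so few objects the individual rounds vary widely, which is why the statistics are averaged.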

Page 39: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Y-scrambling
• Randomly permute the Y responses several times while keeping the X variables in the same order
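
A minimal sketch of Y-scrambling, assuming NumPy and a least-squares model; the names `r2_fit` and `y_scrambling` and the toy data are hypothetical. The usual reading is that models refitted on permuted responses should perform much worse than the real model; if they do not, the original fit may be a chance correlation.

```python
import numpy as np


def r2_fit(X, y):
    """R2 of a least-squares linear fit of y on X (with a constant term)."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 - np.sum((y - A @ coef) ** 2) / np.sum((y - y.mean()) ** 2)


def y_scrambling(X, y, n_rounds=100, seed=0):
    """Return the true R2 and the R2 values obtained with permuted responses."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    true_r2 = r2_fit(X, y)
    # X keeps its order; only the responses are shuffled in each round.
    scrambled = np.array([r2_fit(X, rng.permutation(y))
                          for _ in range(n_rounds)])
    return true_r2, scrambled


x_demo = np.arange(8, dtype=float).reshape(-1, 1)
y_demo = np.array([0.1, 1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8])
true_r2, scrambled_r2 = y_scrambling(x_demo, y_demo)
```

Comparing `true_r2` with the distribution in `scrambled_r2` (for example its mean or maximum) gives a quick check on whether the model has learned more than noise.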

Page 40: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Tropsha criteria*
a) $Q^2 > 0.5$
b) $R^2 > 0.6$
c) $(R^2 - R_0^2)/R^2 < 0.1$ and $0.85 < k < 1.15$, or $(R^2 - R_0'^2)/R^2 < 0.1$ and $0.85 < k' < 1.15$
d) if (c) is not fulfilled, then $|R_0^2 - R_0'^2| < 0.3$
(k = slope of the regression line; $R_0^2$ = $R^2$ related to y = kx)

* A. Golbraikh, M. Shen, Z. Xiao, Y.D. Xiao, K.-H. Lee, A. Tropsha, Rational selection of training and test sets for the development of validated QSAR models, JCAMD, 17 (2003) 241-253.
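
These thresholds can be checked with a short function. This is a sketch, assuming NumPy. The slide gives only the thresholds, so the formulas used here for k, k', R0² and R'0² (regressions through the origin of observed on predicted values and of predicted on observed values) are one common reading and should be treated as an assumption, as should the function name `tropsha_check`. Q² comes from cross-validation on the training set and is passed in.

```python
import numpy as np


def tropsha_check(y_obs, y_pred, q2):
    """Evaluate criteria a) to d) for an external test set."""
    y = np.asarray(y_obs, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y, p)[0, 1] ** 2
    k = np.sum(y * p) / np.sum(p ** 2)         # slope of y = k * p
    k_prime = np.sum(y * p) / np.sum(y ** 2)   # slope of p = k' * y
    # R2 of the two regressions forced through the origin
    r0 = 1.0 - np.sum((y - k * p) ** 2) / np.sum((y - y.mean()) ** 2)
    r0_prime = 1.0 - np.sum((p - k_prime * y) ** 2) / np.sum((p - p.mean()) ** 2)
    a = q2 > 0.5
    b = r2 > 0.6
    c = (((r2 - r0) / r2 < 0.1 and 0.85 < k < 1.15)
         or ((r2 - r0_prime) / r2 < 0.1 and 0.85 < k_prime < 1.15))
    d = abs(r0 - r0_prime) < 0.3
    return {"a": bool(a), "b": bool(b), "c": bool(c), "d": bool(d),
            "accepted": bool(a and b and (c or d))}


# Made-up observed and predicted values for five test compounds
y_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.1]
good = tropsha_check(y_obs, y_pred, q2=0.7)
weak_q2 = tropsha_check(y_obs, y_pred, q2=0.3)
```

Following the slide's wording, criterion d) is used as the fallback when c) fails, so a model is accepted when a) and b) hold and either c) or d) holds.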

Page 41: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Applicability domain
The applicability domain of a (Q)SAR model is the response and chemical structure space in which the model makes predictions with a given reliability.*

* Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. ATLA, 33:1-19, 2005.

Page 42: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Applicability domain
[Figure: training data]

Page 43: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Applicability domain
[Figure: training data and new compounds]

Page 44: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

AD assessment
Similarity measures:
• Response range (span of activity data)
• Chemometric treatment of the descriptor space
• Fragment-based approaches

Page 45: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Chemometric Methods
• Descriptor range-based
[Figure: scatter plot of Descr. 2 (0 to 12) against Descr. 1 (0 to 20)]

Page 46: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Chemometric Methods
• Descriptor range-based
• Geometric methods
[Figure: scatter plot of Descr. 2 (0 to 12) against Descr. 1 (0 to 20)]

Page 47: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Chemometric Methods
• Descriptor range-based
• Geometric methods
• Distance-based

Page 48: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Chemometric Methods
• Descriptor range-based
• Geometric methods
• Distance-based
• Probability density distribution
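
Two of these approaches, descriptor range-based and distance-based, can be sketched in plain Python. The function names, the toy training set and the distance threshold (the largest training-set distance from the centroid) are illustrative assumptions; real tools offer several distance measures and cut-offs.

```python
from math import dist


def descriptor_ranges(training):
    """Minimum and maximum of each descriptor over the training set."""
    return [(min(col), max(col)) for col in zip(*training)]


def in_range_domain(compound, ranges):
    """Range-based AD: every descriptor must fall inside the training range."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(compound, ranges))


def in_distance_domain(compound, training):
    """Distance-based AD: no farther from the centroid than any training compound."""
    centroid = [sum(col) / len(col) for col in zip(*training)]
    threshold = max(dist(row, centroid) for row in training)
    return dist(compound, centroid) <= threshold


# Toy training set described by two descriptors
training = [(2, 3), (5, 8), (12, 6), (18, 10)]
ranges = descriptor_ranges(training)

inside = in_range_domain((10, 5), ranges)
outside = in_range_domain((19, 5), ranges)
near = in_distance_domain((10, 5), training)
far = in_distance_domain((40, 40), training)
```

The range-based check draws a box around the training data and so also accepts empty corners of that box, whereas the distance-based check depends on the chosen distance and threshold.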

Page 49: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

AMBIT software
http://ambit.acad.bg/main.php

Page 50: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

AD assessment
Similarity measures:
• Response range (span of activity data)
• Chemometric treatment of the descriptor space
• Fragment-based approaches

Page 51: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Example of AD assessment
[Figure: bar charts for test set 1 and test set 2 showing the % of compounds (0 to 100) predicted within 1 log unit and within 2 log units]
% of all compounds in the test set predicted within one or two log units without assessing the AD

Page 52: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Further aspects in AD
• Including the model's characteristics

Misclassification ratio by terminal node:

| Terminal node | Node assignment | Training set | Validation set | Test set |
|---|---|---|---|---|
| 4 | 0 | 0.04 | 0.02 | 0 |
| 6 | 1 | 0.13 | 0.16 | 0.14 |
| 8 | 0 | 0.1 | 0.26 | 0.17 |
| 11 | 1 | 0.31* | 0 | 0.25 |
| 12 | 0 | 0.05 | 0.14 | 0.25 |
| 14 | 1 | 0.47* | 0.67* | 0.25 |
| 15 | 0 | 0.14 | 0.19 | 0.13 |
| 18 | 1 | 0.32* | 0.33* | 0.55* |
| 19 | 0 | 0.2 | 0 | 0 |
| 20 | 0 | 0.06 | 0.2 | 0.17 |
| 21 | 1 | 0.44* | 0.2 | 0.17 |

Page 53: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Outline
• History
• QSAR/QSPR steps
  ◦ (Descriptors)
  ◦ Activity data
  ◦ Modelling approaches
  ◦ Validation (OECD principles)
• QSPR (Phys-chem properties)
• QSAR (Biological activities)
• Example (DEMETRA)

Page 54: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Why?
• Large number of existing and new chemicals without a complete (eco)toxicological characterization
[Screenshot: EWG Skin Deep ingredient search results for tocopheryl acetate (http://www.ewg.org/reports/skindeep/): found in 3144 products, ingredient score 0.6 (5 = highest concern), ingredient not assessed for safety, "No safety information in 37 regulatory/toxicity data sources"]
• Constraints: time consuming, expensive, ethical issues

Page 55: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

REACH: Registration, Evaluation and Authorisation of CHemicals
• Enterprises that manufacture or import more than one tonne of a chemical substance per year would be required to register it in a central database
• It is estimated that the testing of the approximately 30,000 existing substances would result in total costs of about 2.1 billion € over the next 11 years
• Promotion of non-animal testing

Page 56: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

REACH: Registration, Evaluation and Authorisation of CHemicals

| Use of (Q)SARs, read-across | Additional cost |
|---|---|
| Minimal use | 2.3 billion Euro |
| Average use (likely scenario) | 1.5 billion Euro |
| Maximal use | 1.1 billion Euro |

Cost-saving potential: € 800-1130 million
Pedersen et al. (2003). Assessment of additional testing needs under REACH.

Page 57: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

REACH: Registration, Evaluation and Authorisation of CHemicals

| Use of (Q)SARs, read-across | Additional animals |
|---|---|
| Minimal use | 3.9 million |
| Average use (likely scenario) | 2.6 million |
| Maximal use | 2.1 million |

Animal-saving potential: 1.3-1.9 million animals
Van der Jagt et al. (2004). Alternative approaches can reduce the use of test animals under REACH.

Page 58: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

OECD principles for QSAR validation
Efforts to improve transparency and acceptability of in silico methods:
• A defined endpoint
• An unambiguous algorithm
• A defined domain of applicability
• Appropriate measures of goodness-of-fit, robustness and predictivity
• A mechanistic interpretation, if possible

Page 59: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

QSPR
• Physico-chemical properties
  ◦ Boiling point
  ◦ Solubility
  ◦ Partition coefficients
  ◦ Viscosity
  ◦ Hydrophobicity
• Biochemical assays

Page 60: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Specific aspects of QSPR
• In general you can expect to obtain more precise models, and experience reduced experimental variability
• Many properties important for drug design
  ◦ Biochemical assay – target property
  ◦ Bioavailability – LogP
  ◦ Side effects
• Many others important for REACH

Page 61: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

QSAR
• Biological activities
  ◦ Ecotoxicity
  ◦ Mammalian toxicity (as surrogate of human health)
  ◦ Carcinogenicity & Mutagenicity
  ◦ …
  ◦ & many more

Page 62: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Specific aspects of QSAR
• Biological variability
• Moles vs. weight data
• Role of LogP
• Mechanistic interpretation

Page 63: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Biological variability
• Intrinsic variability of toxicological data (LC50)

Page 64: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Mole vs. weight units
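
The slide's figure is not recoverable, but the unit issue can be illustrated with a small conversion. This is a sketch: the function name `plc50_molar` and the LC50 value of 10 mg/l are hypothetical, while the two molecular weights are those of ethylene glycol and bis(2-ethylhexyl)adipate from the descriptor table shown earlier.

```python
from math import log10


def plc50_molar(lc50_mg_per_l, mol_weight):
    """Convert an LC50 in mg/l into -log10(LC50 in mol/l)."""
    mol_per_l = lc50_mg_per_l / 1000.0 / mol_weight  # mg/l -> g/l -> mol/l
    return -log10(mol_per_l)


# Same hypothetical LC50 of 10 mg/l for a light and a heavy compound
light = plc50_molar(10.0, 62.0678)    # ethylene glycol
heavy = plc50_molar(10.0, 370.566)    # bis(2-ethylhexyl)adipate
```

Two compounds with the same LC50 in mg/l are not equally potent per molecule: the heavier one is present as fewer moles, so its molar pLC50 is higher. This is why activities are normally expressed on a molar basis before modelling.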

Page 65: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Role of LogP
• Used to model the penetration into the phospholipid membrane
• Extremely common because it is easy to interpret

Page 66: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Role of LogP
• Which is your favourite option?
  Tox = 1.32 • LogP + 0.23
  Tox = 0.55 • des1 + 0.36 • des2 + 0.29 • des3 + 0.64 • des4 - 0.47 • des5 - 1.56 • des6 - 0.53 • des7 + 0.27 • des8 + 0.55 • des9 + 0.50 • des10 + 0.23

Page 67: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Role of LogP

KowWin (LogKow) Log P Calculation:
SMILES : CC(C)(C)C(=O)C(Oc1ccc(Cl)cc1)n2cncn2
CHEM : Triadimefon
MOL FOR: C14 H16 CL1 N3 O2
MOL WT : 293.76
-------+-----+--------------------------------------------+----------+---------
 TYPE  | NUM | LOGKOW v1.66 FRAGMENT DESCRIPTION          | COEFF    | VALUE
-------+-----+--------------------------------------------+----------+---------
 Frag  |  3  | -CH3 [aliphatic carbon]                    |  0.5473  |  1.6419
 Frag  |  1  | -CH [aliphatic carbon]                     |  0.3614  |  0.3614
 Frag  |  8  | Aromatic Carbon                            |  0.2940  |  2.3520
 Frag  |  1  | -CL [chlorine, aromatic attach]            |  0.6445  |  0.6445
 Frag  |  1  | -O- [oxygen, one aromatic attach]          | -0.4664  | -0.4664
 Frag  |  1  | -C(=O)- [carbonyl, aliphatic attach]       | -1.5586  | -1.5586
 Frag  |  3  | Aromatic Nitrogen [5-member ring]          | -0.5262  | -1.5786
 Frag  |  1  | -tert Carbon [3 or more carbon attach]     |  0.2676  |  0.2676
 Factor|  1  | -N-C-O- structure correction               |  0.5494  |  0.5494
 Factor|  1  | -C-CO-C-O- structure correction            |  0.5000  |  0.5000
 Const |     | Equation Constant                          |          |  0.2290
-------+-----+--------------------------------------------+----------+---------
 Log Kow = 2.9422

LogKow Estimated Log P: 2.94
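
The KowWin output above is itself an additive fragment model, and its arithmetic can be reproduced directly. The sketch below only re-adds the numbers printed in the output; the variable names are made up for the example.

```python
# (number of occurrences, coefficient) for each fragment in the KowWin output
fragments = [
    (3, 0.5473),   # -CH3 [aliphatic carbon]
    (1, 0.3614),   # -CH [aliphatic carbon]
    (8, 0.2940),   # aromatic carbon
    (1, 0.6445),   # -CL [chlorine, aromatic attach]
    (1, -0.4664),  # -O- [oxygen, one aromatic attach]
    (1, -1.5586),  # -C(=O)- [carbonyl, aliphatic attach]
    (3, -0.5262),  # aromatic nitrogen [5-member ring]
    (1, 0.2676),   # -tert carbon [3 or more carbon attach]
]
corrections = [0.5494, 0.5000]  # the two structure correction factors
constant = 0.2290               # equation constant

log_kow = (sum(count * coeff for count, coeff in fragments)
           + sum(corrections) + constant)
```

The ten coefficients and the constant are the same numbers, rounded, that appear in the ten-descriptor toxicity equation of the previous slide, so a "simple" LogP model already hides a multi-descriptor calculation.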

Page 68: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

From LogP to direct descriptors
[Figure: two schemes. Via LogP: structure → descriptors → logP → activity. Direct: structure → descriptors → activity]

Page 69: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Mechanistic interpretation
• A priori (experimentally determined, even more complex than the studied endpoint itself), postulated, or a posteriori
• Different classification schemes for MOA exist (narcosis, specific reactive modes)

Page 70: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Global models …
• Training set: n = 422, d = 5, R² = 69.9 %, R²cv = 68.0 %, RMS = 0.77
• Test set: n = 141, R² = 71.7 %, RMS = 0.70
[Figure: predicted vs observed -Log(LC50) for training and test set, both axes from -4 to 6; labelled compounds: N-Vinylcarbazole, Acrolein, 2-propyn-1-ol, 2-propen-1-ol, Malononitrile]

Page 71: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Global models …

| | Five descriptors | Two descriptors |
|---|---|---|
| n | 563 | 563 |
| d | Log P, E_LUMO, MW, Kier&Hall (order 0), Molecular surface area | Log P, E_LUMO |
| R² | 71.1 | 69.5 |
| Q² | 70.7 | 69.3 |
| RMS | 0.74 | 0.76 |

Page 72: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

vs mechanistic models …
[Figure: predicted vs observed -Log(LC50) for reactive compounds, both axes from 0 to 6]

| n | d | R² % | R²cv % | RMS |
|---|---|---|---|---|
| 141 | 2 | 58.6 | 58.2 | 0.83 |

| MOA Class | n | d | R² % | R²cv % |
|---|---|---|---|---|
| Narcosis I | 238 | 2 | 90.1 | 89.9 |
| Narcosis II | 38 | 3 | 82.9 | 81.1 |
| Narcosis III | 26 | 4 | 91.7 | 90.6 |

Page 73: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Outline
• History
• QSAR/QSPR steps
  ◦ (Descriptors)
  ◦ Activity data
  ◦ Modelling approaches
  ◦ Validation (OECD principles)
• QSPR (Phys-chem properties)
• QSAR (Biological activities)
• Example (DEMETRA)

Page 74: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Activity data collection
• Identification of the endpoints that can benefit most from QSAR
  ◦ Costs, test severity, feasibility, etc.
• Identification of data sources
  ◦ Quality, guidelines, protocols
• Refinement of the data
  ◦ Multiple sources comparison, precautionary selection
[Figure: data sets with number of compounds: whole dataset (398), trout (282), water flea (263), oral quail (116), dietary quail (123), bee (105)]

Page 75: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Modelling process
• Endpoints: trout, Daphnia, oral quail, dietary quail, bee
• Individual models
  ◦ Linear models, ANN models
• Hybrid system
  ◦ Combining model results

Page 76: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Validation
• Validation of the hybrid model for Daphnia with new data subsequently identified in literature
  ◦ Real "blind" test set
• Comparison with expert systems
  ◦ ECOSAR
  ◦ Topkat

Page 77: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

DEMETRA Hs - training set
[Figure: Daphnia magna training set, predicted value vs experimental value, both in -log(mg/l) on axes from -4 to 6; NC = 193, R² = 0.80]

Page 78: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

DEMETRA Hs - results on tests
[Figure: Daphnia magna test sets, predicted value vs experimental value, both in -log(mg/l) on axes from -4 to 6; EPA test set: NC = 36, R² = 0.80; D-BBA test set: NC = 101, R² = 0.70]

Page 79: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

US EPA ECOSAR predictions
[Figure: Daphnia magna, ECOSAR predictions on all data, predicted value vs experimental value, both in -log(mg/l) on axes from -4 to 6; NC = 432, R² = 0.20]

Page 80: Introduction  on QSAR and  modelling of physico-chemical  and  biological properties

Topkat predictions
[Figure: Topkat predictions, predicted value vs experimental value, both in -log(mg/l) on axes from -4 to 6; NC = 176, NC (training test) = 31, R² = 0.20]