Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro...


Transcript of Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro...

Page 1: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Roberto Todeschini
Viviana Consonni
Manuela Pavan
Andrea Mauri
Davide Ballabio
Alberto Manganaro

chemometrics
molecular descriptors
QSAR
multicriteria decision making
environmetrics
experimental design
artificial neural networks
statistical process control

Milano Chemometrics and QSAR Research Group
Department of Environmental Sciences
University of Milano - Bicocca
P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.unimib.it/chm/

Page 2: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Roberto Todeschini
Milano Chemometrics and QSAR Research Group

An introduction to molecular descriptors and QSAR

Iran - February 2009

Page 3: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

The chemical data

synthesis: chemistry produces the objects of its own study
chemical composition: a unifying concept for all the experimental sciences
molecular structure: one of the most fruitful scientific concepts of this century

Page 4: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular structure

The concept of molecular structure is one of the richest of the last 140 years.

Page 5: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular structure

The basic assumptions are that different molecular structures have different chemical properties and similar molecular structures have similar molecular properties.

congenericity principle

Page 6: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular structure

Each molecular representation is a different way of looking at the molecular structure, and its chemical meaning is deeply rooted in the framework of the chemical theories.

Page 7: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Some historical notes

Page 8: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Some historical notes

1874
Wilhelm KÖRNER

Studi sull'isomeria delle così dette sostanze aromatiche a sei atomi di carbonio.
(Studies on the isomerism of the so-called aromatic substances with six carbon atoms.)
Gazzetta Chimica Italiana, vol. IV, p. 305

Page 9: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Some historical notes

1874
Wilhelm KÖRNER

To tell apart the different di-substituted benzenes that had been observed, he proposed classifying them as ortho-, meta-, and para-.

These can be considered the first 3 molecular descriptors.
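As a small illustration of how the ortho/meta/para labels act as descriptors, the sketch below derives the label from the ring positions of the two substituents. The function name and the 1-6 position numbering are assumptions made for this example, not something taken from the slides.

```python
def substitution_pattern(pos_a: int, pos_b: int) -> str:
    """Classify a di-substituted benzene from two ring positions (1 to 6)."""
    if not (1 <= pos_a <= 6 and 1 <= pos_b <= 6) or pos_a == pos_b:
        raise ValueError("positions must be two distinct integers in 1..6")
    # Shortest separation between the two positions around the six-membered ring.
    gap = abs(pos_a - pos_b)
    gap = min(gap, 6 - gap)
    return {1: "ortho", 2: "meta", 3: "para"}[gap]


print(substitution_pattern(1, 2))  # adjacent positions
print(substitution_pattern(1, 3))  # one position apart
print(substitution_pattern(1, 4))  # opposite positions
```

The label depends only on the separation of the two positions, so 1,2- and 1,6-substitution give the same answer.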

Page 10: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Some historical notes

1964
Corwin HANSCH

Based on these descriptors, 90 years later, Corwin Hansch proposed the first QSAR approach.

Lipophilic, electronic and steric descriptors for ortho-, meta-, and para-substituents

Page 11: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

“The molecular descriptor is the final result of a logic

and mathematical procedure which transforms

chemical information encoded within a symbolic

representation of a molecule into a useful number or

the result of some standardized experiment.”

R. Todeschini and V. Consonni

Definition of molecular descriptor

Molecular descriptors

Page 12: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

3300 molecular descriptors

Molecular descriptors

Page 13: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular descriptors

lion forefeet
eagle hind legs
scorpion tail
dragon head
bull body
unicorn
snake neck

Page 14: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular descriptors

size
symmetry
branching
steric
shape
cyclicity
hydrophobicity
H-bonding
electronic aspects
reactivity

Page 15: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular descriptors

size
symmetry
branching
steric
shape
cyclicity
hydrophobicity
H-bonding
electronic aspects
reactivity

several meanings in just one number

Page 16: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.
Page 17: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular descriptors

derived from ...
graph theory, discrete mathematics, physical chemistry, information theory, quantum chemistry, organic chemistry, differential topology, algebraic topology

applied in ...
QSAR/QSPR, medicinal chemistry, pharmacology, genomics, drug design, toxicology, proteomics, analytical chemistry, environmetrics, virtual screening, library searching

processed by ...
statistics, chemometrics, chemoinformatics

Page 18: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Molecular descriptors

[Diagram linking: molecule, molecular descriptors, physico-chemical properties, biological activities]

Page 19: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Historical note: fragment approach

The biological activity of a molecule is the sum of its fragment properties

common reference skeleton
molecule properties gradually modified by substituents

Congenericity principle
QSAR strategies can be applied ONLY to classes of similar compounds

Page 20: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Historical note: Hansch approach

$$\text{Biological response} = f_1(L) + f_2(E) + f_3(S) + f_4(M)$$

Corwin Hansch, 1964

1. Lipophilic properties (L)
2. Electronic properties (E)
3. Steric properties (S)
4. Other molecular properties (M)
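The additive form of the Hansch relationship can be sketched in a few lines. This is only an illustration: the term functions, their coefficients and the descriptor values below are hypothetical placeholders, not values from Hansch's work or from these slides.

```python
from typing import Callable, Dict

Term = Callable[[float], float]


def hansch_response(descriptors: Dict[str, float], terms: Dict[str, Term]) -> float:
    """Add up the contributions f_i(x_i) of the L, E, S and M descriptors."""
    return sum(f(descriptors[name]) for name, f in terms.items())


# Hypothetical term functions f_1..f_4; the coefficients are illustrative only.
toy_terms: Dict[str, Term] = {
    "L": lambda x: 0.5 * x,    # lipophilic contribution
    "E": lambda x: -1.0 * x,   # electronic contribution
    "S": lambda x: 0.25 * x,   # steric contribution
    "M": lambda x: 0.0 * x,    # other molecular properties (switched off here)
}

# Hypothetical descriptor values for one molecule.
toy_molecule = {"L": 2.0, "E": 0.5, "S": 4.0, "M": 1.0}

response = hansch_response(toy_molecule, toy_terms)
print(response)
```

Keeping each f_i as a separate callable mirrors the equation: one term can be replaced (for example by a non-linear function) without touching the others.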

Page 21: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Historical note: Hansch approach

1. Congenericity approach
2. Linear additive scheme
3. Limited representation of global molecular properties
4. No 3D and conformational information

Page 22: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

The role of the molecular descriptors

Physico-chemical properties
boiling point
melting point
dipole moment
molar refractivity
parachor
octanol/water partition coefficient
vapor pressure
density
solubility
...

Page 23: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

The role of the molecular descriptors

Biological activities
binding affinity
lethal dose
inhibition concentration
mutagenicity
carcinogenicity
...

Page 24: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

The role of the molecular descriptors

Environmental properties
biodegradation
bioconcentration
BOD
COD
half-life
mobility
atmospheric persistence
...

Page 25: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

The role of the molecular descriptors

... and more
conductivity
retention time
rheological behaviour
...

Page 26: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Representations of a molecular structure

molecule (a real object)
molecular structure representation
molecular descriptors (numbers)

Page 27: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Representations of a molecular structure

Page 28: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Representations of a molecular structure

[Figure: one chlorinated molecule (C, H and Cl atoms) drawn at each level of representation]
0D - counts
1D - fragment counts
2D - topostructural
2D - topochemical
3D - geometrical

Page 29: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Representations of a molecular structure

4D
probes
• steric
• electronic
• hydrophobic
interaction energy value at each point for each probe

Page 30: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

0D: atom list (counting, summing)

1D: substructure list (counting); structural keys

2D: molecular graph, graph invariants
• topostructural descriptors
• topochemical descriptors
• topographic descriptors
• topological information indices

3D: molecular geometry (x, y, z coordinates)
• geometrical descriptors
• quantum-chemical descriptors
• bulk descriptors
• molecular surface descriptors

4D: grid-based QSAR techniques, interaction energy values
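The 0D level needs nothing more than the atom list. The sketch below counts atoms and sums atomic weights; chloromethane is chosen here purely as a convenient example and is not a molecule from the slides, and the function names are assumptions.

```python
from collections import Counter

# Standard atomic weights (rounded), only for the elements used in this example.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "Cl": 35.45}


def atom_counts(atoms):
    """0D descriptor obtained by counting: number of atoms of each element."""
    return Counter(atoms)


def molecular_weight(atoms):
    """0D descriptor obtained by summing: sum of the atomic weights."""
    return sum(ATOMIC_WEIGHT[symbol] for symbol in atoms)


# Atom list of chloromethane, CH3Cl (example molecule chosen for this sketch).
chloromethane = ["C", "H", "H", "H", "Cl"]

counts = atom_counts(chloromethane)
weight = molecular_weight(chloromethane)
print(dict(counts), round(weight, 3))
```

Neither descriptor uses any connectivity or geometry, which is exactly why isomers cannot be told apart at this level.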

Page 31: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

molecular graph, graph invariants

topostructural descriptors
Wiener index, Hosoya Z index, Zagreb indices, Mohar indices, Randic connectivity index, Balaban distance connectivity index, Schultz molecular topological index, Kier shape descriptors, eigenvalues of the adjacency matrix, eigenvalues of the distance matrix, Kirchhoff number, detour index, topological charge indices, ...

topological information indices
total information content on ..., mean information content on ...

topochemical descriptors
Kier-Hall valence connectivity indices, Burden eigenvalues, BCUT descriptors, Kier alpha-modified shape descriptors, 2D autocorrelation descriptors, ...

molecular geometry (x, y, z coordinates)

topographic descriptors
3D-Wiener index, 3D-Balaban index, D/D index, ...
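Two of the topostructural descriptors listed above are easy to compute from a hydrogen-depleted molecular graph. The sketch below assumes a connected graph stored as an adjacency dictionary (an encoding chosen for this example) and computes the Wiener index, the sum of shortest-path distances over all atom pairs, and the first Zagreb index, the sum of squared vertex degrees.

```python
from collections import deque


def wiener_index(adj):
    """Sum of shortest-path lengths over all unordered pairs of vertices."""
    total = 0
    for start in adj:
        # Breadth-first search gives topological distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


def first_zagreb_index(adj):
    """Sum of the squared vertex degrees."""
    return sum(len(neighbours) ** 2 for neighbours in adj.values())


# Carbon skeletons of two C4H10 isomers (hydrogens omitted).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane), wiener_index(isobutane))              # 10 9
print(first_zagreb_index(n_butane), first_zagreb_index(isobutane))  # 10 12
```

Both indices separate the two isomers, which the 0D atom counts cannot do.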

Page 32: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

molecular geometry (x, y, z coordinates)

geometrical descriptors
gravitational indices, 3D-Morse descriptors, EVA descriptors, EEVA descriptors, WHIM descriptors, GETAWAY descriptors, ...

grid-based QSAR techniques (interaction energy values)
CoMFA, GRID, G-WHIM descriptors, ...

volume descriptors
van der Waals volume, geometric volume, ...

quantum-chemical descriptors
charges, electronegativities, superdelocalizability, hardness, softness, E_LUMO, E_HOMO, ...

molecular surface descriptors
solvent-accessible surface area, CPSA descriptors, molecular shape analysis, Mezey 3D shape analysis, ...

Page 33: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Properties of a molecular descriptor

Several scientists are searching for new molecular descriptors able to capture new aspects of the molecular structure. This kind of research requires creativity and imagination together with a solid theoretical basis, so that the resulting numbers carry some structural chemical meaning.

"There are no restrictions on the design of structural invariants, the limiting factor is one's own imagination." [1]

[1] M. Randic (1996), Molecular bonding profiles, J. Math. Chem., 19, 375-392

Page 34: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Properties of a molecular descriptor

a descriptor MUST have ...

• invariance with respect to labeling and numbering of atoms
• invariance with respect to roto-translation
• an unambiguous, algorithmically computable definition
• values in a suitable numerical range for the set of molecules to which it is applicable
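The two invariance requirements can be checked numerically. The sketch below uses a deliberately simple geometric descriptor, the sum of all interatomic distances, and verifies that it is unchanged after a rotation about the z axis, a translation, and a renumbering of the atoms. The coordinates are arbitrary values made up for the example and the function names are assumptions.

```python
import math
from itertools import combinations


def distance_sum(coords):
    """Sum of Euclidean distances over all pairs of atoms."""
    return sum(math.dist(p, q) for p, q in combinations(coords, 2))


def roto_translate(coords, angle, shift):
    """Rotate by `angle` (radians) about the z axis, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy, dz = shift
    return [(c * x - s * y + dx, s * x + c * y + dy, z + dz) for x, y, z in coords]


# Arbitrary illustrative coordinates for four atoms.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0), (3.7, 1.3, 0.5)]

moved = roto_translate(coords, angle=0.7, shift=(4.0, -2.0, 1.0))
renumbered = list(reversed(moved))  # same atoms, different numbering

print(math.isclose(distance_sum(coords), distance_sum(renumbered), rel_tol=1e-9))
```

A quantity such as the raw x coordinate of atom 1 would fail both checks, which is why it cannot serve as a descriptor.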

Page 35: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

Properties of a molecular descriptor

a descriptor should have ...

• a structural interpretation
• a good correlation with at least one property
• no trivial correlation with other molecular descriptors
• gradual change in its values with gradual changes in the molecular structure
• no experimental properties included in its definition
• applicability not restricted to too small a class of molecular structures
• preferably, some discrimination power among isomers
• preferably, no other molecular descriptors trivially included in its definition
• preferably, reversible decoding (back from the descriptor value to the structure)

Page 36: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

QSAR strategy

models ...

• regression models (quantitative response)
• classification models (qualitative response)
• ranking models (ordered response)

Page 37: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

QSAR strategy - Regression

Page 38: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

QSAR strategy - Classification

Page 39: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

QSAR strategy - Ranking

[Figure: ranking diagram of 21 numbered objects with respect to toxicity]

Page 40: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

QSAR strategy

[Flow diagram]

set of molecules
training set: molecular descriptors, experimental responses
fitting
MODEL: SRC (QSAR, QSPR, ...)

test set: molecular descriptors, experimental responses
prediction power

new molecules: molecular descriptors
predicted new responses

reversible decoding
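The training set / test set workflow can be sketched with a one-descriptor regression model. Everything numeric below is synthetic toy data invented for the illustration, and the helper names are assumptions; a real study would use many descriptors and a dedicated statistics package.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model, x):
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


def r_squared(y_true, y_pred):
    """1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Synthetic toy data: one descriptor value and one response per molecule.
x_train, y_train = [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1]
x_test, y_test = [6.0, 7.0], [12.2, 13.8]

model = fit_line(x_train, y_train)                              # fitting on the training set
r2_fit = r_squared(y_train, predict(model, x_train))            # goodness of fit
r2_test = r_squared(y_test, predict(model, x_test))             # prediction power on unseen molecules
print(model, r2_fit, r2_test)
```

The test-set molecules play no part in estimating the slope and intercept, so `r2_test` measures prediction rather than fitting.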

Page 41: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

QSAR strategy

The true interest is in the predictive power of the model

Model validation

Chemometrics
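One common way to estimate predictive power when no separate test set is available is leave-one-out cross-validation. The sketch below is a minimal version for a one-descriptor linear model; the data are synthetic toy values and the function names are assumptions made for this example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def q2_leave_one_out(x, y):
    """Cross-validated explained variance: 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without molecule i, then predict molecule i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - press / ss_tot


# Synthetic toy data: one descriptor value and one response per molecule.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

q2 = q2_leave_one_out(x, y)
print(q2)
```

Each molecule is predicted by a model that never saw it, so the resulting value is a more cautious figure than the fitted R squared.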

Page 42: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

… towards conclusions …

Page 43: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

FAQ - Frequently Asked Questions

1. What is the meaning of that descriptor?
2. Why are there some models with the same prediction power but different molecular descriptors?
3. Why use a huge number of molecular descriptors?

Page 44: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

FGA - our Frequently Given Answers

1. What is the meaning of that descriptor?

A molecular descriptor is a number extracted by a well-defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that our difficulties in attributing a meaning to this number often flow, ultimately, from the lack of deeper chemical theories and higher-level languages, and not from esoteric approaches to the descriptor definition.

R. Todeschini and V. Consonni

Page 45: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

FGA - our Frequently Given Answers

2. Why are there some models with the same prediction power but different molecular descriptors?

Molecular descriptors are often intercorrelated; therefore different molecular descriptors can, in turn, take part in a model.

Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view.

Hans Primas
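Intercorrelation between descriptors is easy to demonstrate. The sketch below compares two descriptors of the n-alkanes from ethane to hexane: the number of carbon atoms and the Wiener index, which for an unbranched chain of n carbons equals n(n^2 - 1)/6. The choice of molecules and the function name are assumptions made for this example.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# n-alkanes from ethane to hexane, described by their carbon skeleton.
carbon_count = [2, 3, 4, 5, 6]
# Wiener index of an unbranched chain (path graph) with n vertices.
wiener = [n * (n * n - 1) // 6 for n in carbon_count]

r = pearson_r(carbon_count, wiener)
print(wiener, round(r, 3))
```

With a correlation this high, a regression model built on either descriptor fits this series about equally well, which is one way two models can share the same prediction power while using different descriptors.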

Page 46: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

FGA - our Frequently Given Answers

3. Why use a huge number of molecular descriptors?

Complexity is not an intrinsic property of systems, but rather arises from the number of ways in which we are able (or desire) to interact with a system.

A molecule is undoubtedly a complex system.

Page 47: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

www.moleculardescriptors.eu

Page 48: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

THANK YOU

Roberto Todeschini
Viviana Consonni
Manuela Pavan
Andrea Mauri
Davide Ballabio
Alberto Manganaro

chemometrics
molecular descriptors
QSAR
multicriteria decision making
environmetrics
experimental design
artificial neural networks
statistical process control

Milano Chemometrics and QSAR Research Group
Department of Environmental Sciences
University of Milano - Bicocca
P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.disat.unimib.it/chm/

Page 49: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.
Page 50: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

coffee break

Page 51: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

www.moleculardescriptors.eu

... since December 2006

news, software, books, tutorials and a forum

Page 52: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.
Page 53: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

FGA - our Frequently Given Answers

4. Is a model explaining the known facts of a system better than a model predicting the future events of that system?

fitting versus prediction

Don't forget your goal!

An understanding of the behavior of a system does not always coincide with the prediction of the system's future behavior!

Page 54: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

QSAR strategy - Regression

Page 55: Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro chemometrics molecular descriptors QSAR multicriteria.

"GENTLEMEN, one might ask what is the most profitable way of drawing from a hypothesis the greatest benefit for the development of a given doctrine. To many it may perhaps seem that in this regard one should proceed with great caution, so as not to introduce into science hypothetical conceptions that are too bold and later turn out not to agree with the facts. I believe instead that the progress of science has been delayed more by excessive caution than by excessive boldness. In science, as in matters of love, one must know when to dare: to dare at once and to go all the way; complaints and regrets after the fact are of no use."

Giacomo Ciamician

From the inaugural address on the scientific work of Wilhelm KÖRNER, Milan, 15 May 1910.

Fragment approach

The biological activity of a molecule is the sum of its fragment properties

Congeneric molecules, i.e. a common reference skeleton

Substituent properties
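The additivity assumption can be sketched as follows, in the spirit of a group-contribution model for congeneric molecules. This is only an illustration: the baseline value, the substituent positions (`R1`, `R2`) and every contribution value are hypothetical, and the function name `predict_activity` is not taken from the slides.

```python
# Hypothetical activity of the common reference skeleton (all positions = H).
BASELINE = 1.0

# Hypothetical contribution of each substituent at each position.
CONTRIBUTIONS = {
    ("R1", "H"): 0.0,
    ("R1", "Cl"): 0.5,
    ("R2", "H"): 0.0,
    ("R2", "CH3"): 0.25,
}


def predict_activity(substituents, baseline=BASELINE, contributions=CONTRIBUTIONS):
    """Activity = skeleton baseline + sum of substituent contributions.

    `substituents` maps each position of the common skeleton to the
    group found there, e.g. {"R1": "Cl", "R2": "CH3"}.
    """
    return baseline + sum(
        contributions[(position, group)]
        for position, group in substituents.items()
    )


print(predict_activity({"R1": "Cl", "R2": "CH3"}))  # 1.0 + 0.5 + 0.25
```

In practice the contributions are not known in advance; they are estimated by regression from the measured activities of a congeneric series.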

Fragment approach

Parametric approach (Hammett – Hansch, 1964)

Group approach (Free-Wilson and Fujita-Ban, 1976)

DARC-PELCO approach (Dubois, 1966)

Sterimol approach (Verloop, 1976)

Hansch molecular descriptors

lipophilic properties
partition coefficients - logP, logKow
chromatographic parameters - Rf, RT
solubility
….

electronic properties
Hammett constants
molar refraction
dipole moment
HOMO, LUMO
ionization potential
….

steric properties
molecular weight
VDW volume
molar volume
surface area
….

Hansch approach
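For orientation, a commonly quoted textbook form of a Hansch-type equation combines one term from each descriptor family listed above. The slide itself does not give an equation, so the choice of terms here (in particular the Taft steric constant $E_s$) is an assumption made for illustration:

```latex
\log\frac{1}{C} \;=\; a\,\log P \;-\; b\,(\log P)^{2} \;+\; \rho\,\sigma \;+\; \delta\,E_{s} \;+\; c
```

Here $C$ is the molar concentration producing a standard biological response, $\log P$ is the lipophilic term, $\sigma$ is the Hammett electronic constant, $E_s$ is a steric constant, and $a$, $b$, $\rho$, $\delta$, $c$ are coefficients estimated by regression.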

The role of the molecular descriptors

Introduction

Conclusions

A molecular descriptor is a number extracted by a well defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that often our difficulties to attribute a meaning to this number ultimately flow from the lacking of deeper chemical theories and higher level languages and not from exoteric approaches to the descriptor definition.

R. Todeschini and V. Consonni
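As a minimal sketch of this definition, the simplest possible case is shown here: the molecular representation is a chemical formula, the algorithm is a weighted sum, and the resulting number is the molecular weight. The atomic masses are rounded standard values; the example molecule (ethanol) and the function names are chosen for illustration and are not taken from the slides.

```python
# Rounded standard atomic masses (g/mol) for the few elements used here.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.45}


def molecular_weight(formula):
    """Descriptor computed from a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def atom_count(formula):
    """An even simpler descriptor from the same representation."""
    return sum(formula.values())


ethanol = {"C": 2, "H": 6, "O": 1}  # illustrative molecule, C2H6O
print(round(molecular_weight(ethanol), 3))
print(atom_count(ethanol))
```

Both functions are "well defined algorithms" in the sense of the definition: the same representation always yields the same number, whether or not that number has an obvious chemical interpretation.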

Properties of a molecular descriptor

Conclusions

Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view.

Hans Primas

molecule

physico-chemical properties

biological activities

molecular descriptors

[Figure: the same molecule (atoms C, H, Cl) drawn in its 0D, 1D, 2D and 3D representations]

Representations of a molecular structure
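Why the choice of representation matters can be sketched with two isomers. This is an illustrative example, not taken from the slides: n-butane and isobutane are chosen as the molecules, the Wiener index is chosen as an example of a descriptor that needs the 2D (graph) representation, and the function name is hypothetical. Both molecules have the same formula, so a formula-based descriptor cannot tell them apart, while a graph-based descriptor can.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path distances over all atom pairs of a
    hydrogen-depleted molecular graph given as {atom: [neighbours]}."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in distance:
                    distance[neighbour] = distance[atom] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
    return total // 2  # every pair was counted twice


# Carbon skeletons of two C4H10 isomers (hydrogens omitted).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}       # C-C-C-C chain
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}      # central C with 3 CH3

# Formula-level view: same number of carbon atoms in both.
print(len(n_butane), len(isobutane))                    # 4 4
# Graph-level view: the descriptor distinguishes the isomers.
print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
```

Descriptors that depend on geometry (3D) would in turn require atomic coordinates, which neither the formula nor the graph provides.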

molecular structure?

Just a question …

"... : although relationships between their chemical constitution (composition and structure) and their physical properties can certainly already be glimpsed, the number of facts is still by far too limited to draw from them conclusions that, beyond the character of a mere hypothesis, can also claim that of probability.

In any case, such relationships are not of so simple a nature as one might perhaps have expected a priori.

Certainly the physical properties of bodies are in the first place a function of their composition and structure, about whose form nothing is yet known; a function that is probably very complex and whose study will require an unforeseeable number of facts, in order to narrow sufficiently the range of possible representations."

Some historical notes