Roberto Todeschini Viviana Consonni Manuela Pavan Andrea Mauri Davide Ballabio Alberto Manganaro...
Roberto Todeschini
Viviana Consonni
Manuela Pavan
Andrea Mauri
Davide Ballabio
Alberto Manganaro
chemometrics
molecular descriptors
QSAR
multicriteria decision making
environmetrics
experimental design
artificial neural networks
statistical process control
Milano Chemometrics and QSAR Research Group
Department of Environmental Sciences
University of Milano - Bicocca
P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.unimib.it/chm/
Roberto Todeschini, Milano Chemometrics and QSAR Research Group
An introduction to molecular descriptors and QSAR
Iran - February 2009
synthesis: chemistry produces the objects of its own study
chemical composition: a unifying concept for all the experimental sciences
molecular structure: one of the most fruitful scientific concepts of this century
The chemical data
The concept of molecular structure is one of the richest of the last 140 years.
Molecular structure
The basic assumptions are that different molecular structures have different chemical properties and similar molecular structures have similar molecular properties.
Molecular structure
congenericity principle
Each molecular representation represents a different way to look at the molecular structure, and its chemical meaning is strongly immersed in the framework of the chemical theories.
Molecular structure
Some historical notes
"Studi sull'isomeria delle così dette sostanze aromatiche a sei atomi di carbonio" (Studies on the isomerism of the so-called aromatic substances with six carbon atoms).
Gazzetta Chimica Italiana, vol. IV, p. 305
Some historical notes
1874
Wilhelm KÖRNER
To tell apart the different di-substituted benzenes that had been observed, he proposed classifying them as ortho-, meta-, and para-.
Some historical notes
These can be considered the first 3 molecular descriptors
1874
Wilhelm KÖRNER
Based on these descriptors, 90 years later, Corwin Hansch proposed the first QSAR approach.
Some historical notes
Lipophilic, electronic and steric descriptors for ortho-, meta-, and para-substituents
1964
Corwin HANSCH
“The molecular descriptor is the final result of a logic
and mathematical procedure which transforms
chemical information encoded within a symbolic
representation of a molecule into a useful number or
the result of some standardized experiment.”
R. Todeschini and V. Consonni
Definition of molecular descriptor
Molecular descriptors
3300 molecular descriptors
Molecular descriptors
[Figure: a composite creature labeled: lion forefeet, eagle hind legs, scorpion tail, dragon head, bull body, unicorn, snake neck]
Molecular descriptors
size
symmetry
branching
steric
shape
cyclicity
hydrophobicity
H-bonding
electronic aspects
reactivity
several meanings in just one number
Molecular descriptors
Molecular descriptors
derived from: graph theory, discrete mathematics, physical chemistry, information theory, quantum chemistry, organic chemistry, differential topology, algebraic topology
applied in: QSAR/QSPR, medicinal chemistry, pharmacology, genomics, drug design, toxicology, proteomics, analytical chemistry, environmetrics, virtual screening, library searching
processed by: statistics, chemometrics, chemoinformatics
Molecular descriptors
molecule
physico-chemical properties
biological activities
molecular descriptors
Molecular descriptors
Historical note: fragment approach
The biological activity of a molecule is the sum of its fragment properties
common reference skeleton
molecule properties gradually modified by substituents
Congenericity principle
QSAR strategies can be applied ONLY to classes of similar compounds
$\text{Biological response} = f_1(L) + f_2(E) + f_3(S) + f_4(M)$
Corwin Hansch, 1964
Historical note: Hansch approach
L: Lipophilic properties
E: Electronic properties
S: Steric properties
M: Other molecular properties
1. Congenericity approach
2. Linear additive scheme
3. Limited representation of global molecular properties
4. No 3D and conformational information
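The linear additive scheme can be illustrated with a minimal sketch. It assumes that each term is a plain linear function of one descriptor (the slides leave the form of the f terms open), and it estimates the coefficients by ordinary least squares. The descriptor values, the coefficients, and the function names are all hypothetical, and the responses are synthetic.

```python
# Sketch of a Hansch-type linear additive model (assumed form):
#   response = b0 + b1*L + b2*E + b3*S
# All numbers are synthetic and serve only as an illustration.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_least_squares(rows, y):
    """Ordinary least squares through the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical congeneric series: one (L, E, S) triple per compound
descriptors = [
    (0.5, 0.0, 1.0),
    (1.0, 0.2, 1.5),
    (1.5, -0.1, 2.0),
    (2.0, 0.4, 1.2),
    (2.5, 0.1, 2.5),
    (3.0, -0.3, 1.8),
]
# Responses generated from known coefficients so that the fit can be checked
true_b = [1.0, 0.8, -1.5, 0.3]
responses = [true_b[0] + sum(b * d for b, d in zip(true_b[1:], row))
             for row in descriptors]

coefficients = fit_least_squares(descriptors, responses)
print([round(c, 6) for c in coefficients])
```

Because the synthetic responses contain no noise, the fit should recover the generating coefficients; with real data the residuals would not vanish and the model would need validation.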
Historical note: Hansch approach
boiling point
melting point
dipole moment
molar refractivity
parachor
octanol/water partition coefficient
vapor pressure
density
solubility
.............................
Physico-chemical properties
The role of the molecular descriptors
binding affinity
lethal dose
inhibition concentration
mutagenicity
carcinogenicity
................
Biological activities
The role of the molecular descriptors
biodegradation
bioconcentration
BOD
COD
half-life
mobility
atmospheric persistence
.........................
Environmental properties
The role of the molecular descriptors
.... and more
conductivity
retention time
rheological behaviours
.........................
The role of the molecular descriptors
molecule: a real object
molecular structure representation
molecular descriptors: numbers
Representations of a molecular structure
0D - counts
1D - fragment counts
2D - topostructural
2D - topochemical
3D - geometrical
4D - probes: interaction energy value at each point for each probe
• steric
• electronic
• hydrophobic
[Figure: the same molecule (atoms C, H, Cl) drawn in each representation]
Atom list (0D): counting, summing
Substructure list (1D): counting; structural keys
molecular graph (2D): graph invariants
  topostructural descriptors
  topochemical descriptors
  topographic descriptors
  topological information indices
molecular geometry, x, y, z coordinates (3D):
  geometrical descriptors
  quantum-chemical descriptors
  bulk descriptors
  molecular surface descriptors
interaction energy values (4D): grid-based QSAR techniques
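A 0D description needs nothing more than the atom list. The sketch below counts atoms and sums atomic masses; the example molecule (a dichlorobenzene, C6H4Cl2) and the function names are chosen here for illustration and are not taken from the slides.

```python
from collections import Counter

# Standard atomic weights (rounded), only for the elements used below
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.45}

def atom_counts(atom_list):
    """0D descriptor by counting: number of atoms of each element."""
    return Counter(atom_list)

def molecular_weight(atom_list):
    """0D descriptor by summing: sum of the atomic masses."""
    return sum(ATOMIC_MASS[symbol] for symbol in atom_list)

# Hypothetical example: atom list of a dichlorobenzene, C6H4Cl2
dichlorobenzene = ["C"] * 6 + ["H"] * 4 + ["Cl"] * 2

counts = atom_counts(dichlorobenzene)
weight = molecular_weight(dichlorobenzene)
print(dict(counts))
print(round(weight, 3))
```

These descriptors ignore connectivity entirely, so all isomers with the same formula receive identical values.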
molecular graph: graph invariants
topostructural descriptors: Wiener index, Hosoya Z index, Zagreb indices, Mohar indices, Randic connectivity index, Balaban distance connectivity index, Schultz molecular topological index, Kier shape descriptors, eigenvalues of the adjacency matrix, eigenvalues of the distance matrix, Kirchhoff number, detour index, topological charge indices, ...
topological information indices: total information content on ..., mean information content on ...
topochemical descriptors: Kier-Hall valence connectivity indices, Burden eigenvalues, BCUT descriptors, Kier alpha-modified shape descriptors, 2D autocorrelation descriptors, ...
topographic descriptors (molecular geometry, x, y, z coordinates): 3D-Wiener index, 3D-Balaban index, D/D index, ...
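As an illustration of a graph invariant, the Wiener index can be sketched as the sum of shortest-path distances over all pairs of atoms in the hydrogen-depleted molecular graph. The adjacency-list encoding and the function name below are choices made for this sketch.

```python
from collections import deque

def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered pairs of atoms."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    return total // 2  # every pair was counted twice

# Hydrogen-depleted graphs of the two C4H10 isomers
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two isomers share the same 0D description but get different Wiener indices, which is the kind of discrimination a topostructural descriptor adds.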
molecular geometry, x, y, z coordinates
geometrical descriptors: gravitational indices, 3D-MoRSE descriptors, EVA descriptors, EEVA descriptors, WHIM descriptors, GETAWAY descriptors, ...
interaction energy values, grid-based QSAR techniques: CoMFA, GRID, G-WHIM descriptors, ...
volume descriptors: van der Waals volume, geometric volume, ...
quantum-chemical descriptors: charges, electronegativities, superdelocalizability, hardness, softness, E_LUMO, E_HOMO, ...
molecular surface descriptors: solvent-accessible surface area, CPSA descriptors, molecular shape analysis, Mezey 3D shape analysis, ...
Properties of a molecular descriptor
Many scientists are searching for new molecular descriptors able to capture new aspects of the molecular structure. This kind of research calls for creativity and imagination together with a solid theoretical basis, so that the resulting numbers carry some structural chemical meaning.
"There are no restrictions on the design of structural invariants, the limiting factor is one's own imagination." [1]
[1] M. Randic (1996), Molecular bonding profiles, J. Math. Chem., 19, 375-392
Properties of a molecular descriptor
a descriptor MUST have ...
invariance with respect to labeling and numbering of atoms
invariance with respect to roto-translation
an unambiguous, algorithmically computable definition
values in a suitable numerical range for the set of molecules to which it applies
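The two invariance requirements can be checked numerically. The sketch below uses the sum of all interatomic distances as a simple geometric descriptor and compares it before and after a rotation, a translation and a renumbering of the atoms. The coordinates are hypothetical, and the sum of x coordinates is included only as a counterexample that fails the test.

```python
import math
from itertools import combinations

def distance_sum(coords):
    """Sum of Euclidean distances over all atom pairs (invariant)."""
    return sum(math.dist(a, b) for a, b in combinations(coords, 2))

def x_coordinate_sum(coords):
    """Sum of x coordinates (NOT invariant; shown as a counterexample)."""
    return sum(x for x, _, _ in coords)

def rotate_z_then_shift(coords, angle, shift):
    """Rotate about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

# Hypothetical coordinates of four atoms (arbitrary units)
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.3, 0.2), (3.6, 1.4, -0.4)]

moved = rotate_z_then_shift(atoms, angle=0.7, shift=(5.0, -2.0, 1.0))
relabeled = list(reversed(moved))  # same atoms, different numbering

invariant_ok = math.isclose(distance_sum(atoms), distance_sum(relabeled))
counterexample_ok = math.isclose(x_coordinate_sum(atoms), x_coordinate_sum(relabeled))
print(invariant_ok)       # True
print(counterexample_ok)  # False
```

A quantity that depends on the choice of reference frame, like the second function, describes the coordinate file rather than the molecule.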
Properties of a molecular descriptor
a descriptor should have ...
a structural interpretation
a good correlation with at least one property
no trivial correlation with other molecular descriptors
gradual change in its values with gradual changes in the molecular structure
no experimental properties included in its definition
no restriction to too small a class of molecular structures
preferably, some discrimination power among isomers
preferably, no other molecular descriptors trivially included in its definition
preferably, reversible decoding (back from the descriptor value to the structure)
QSAR strategy
models ...
regression models (quantitative response)
classification models (qualitative response)
ranking models (ordered response)
QSAR strategy - Regression
QSAR strategy - Classification
[Figure: objects numbered 1 to 21 arranged along a Toxicity axis]
QSAR strategy - Ranking
QSAR strategy
training set: set of molecules, molecular descriptors, experimental responses
fitting
MODEL: SRC (QSAR, QSPR, ...)
new molecules, molecular descriptors, predicted new responses
reversible decoding
test set: molecular descriptors, experimental responses
prediction power
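The workflow above can be sketched end to end with a single descriptor and a straight-line model. All values are synthetic and the function names are hypothetical; a real study would use many descriptors and a proper variable selection step.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def explained_fraction(y_true, y_pred, reference_mean):
    """1 - (residual sum of squares) / (sum of squares about reference_mean)."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - reference_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Training set: descriptor values and experimental responses (synthetic)
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# Test set: molecules never used in the fitting (synthetic)
x_test = [1.5, 3.5, 5.5]
y_test = [3.2, 6.8, 11.2]

# Fitting: build the model on the training set only
slope, intercept = fit_line(x_train, y_train)
train_mean = sum(y_train) / len(y_train)

# Fitting ability, measured on the training set
fitted = [slope * v + intercept for v in x_train]
r2_fit = explained_fraction(y_train, fitted, train_mean)

# Prediction power, measured on the test set
predicted = [slope * v + intercept for v in x_test]
q2_ext = explained_fraction(y_test, predicted, train_mean)

print(round(r2_fit, 3), round(q2_ext, 3))
```

The test-set statistic uses the training mean as its reference, one of several conventions for external validation; the choice is an assumption of this sketch.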
QSAR strategy
The true interest is in the predictive power of the model
Model validation
Chemometrics
… towards conclusions …
FAQ - Frequently Asked Questions
1. What is the meaning of that descriptor?
2. Why are there some models with the same prediction power but different molecular descriptors?
3. Why use a huge number of molecular descriptors?
FGA - our Frequently Given Answers
1. What is the meaning of that descriptor?
A molecular descriptor is a number extracted by a well-defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that our difficulty in attributing a meaning to this number often flows ultimately from the lack of deeper chemical theories and higher-level languages, and not from esoteric approaches to the descriptor definition.
R. Todeschini and V. Consonni
2. Why are there some models with the same prediction power but different molecular descriptors?
Molecular descriptors are often intercorrelated, so different descriptors can stand in for one another in a model.
FGA - our Frequently Given Answers
Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view.
Hans Primas
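Intercorrelation is easy to see on a small example. For the n-alkanes, whose hydrogen-depleted graph is a path of n carbon atoms, the Wiener index equals n(n² − 1)/6, so it grows together with the carbon count. The sketch below computes the Pearson correlation between the two descriptors; the choice of series (2 to 8 carbons) is arbitrary.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two descriptor columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# n-alkanes from ethane (n = 2) to octane (n = 8)
carbon_count = list(range(2, 9))
# Wiener index of a path graph with n vertices: n(n^2 - 1)/6
wiener = [n * (n * n - 1) // 6 for n in carbon_count]

r = pearson(carbon_count, wiener)
print(wiener)
print(round(r, 3))
```

Within such a series either descriptor carries much of the same size information, which is why models built on different descriptors can perform similarly.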
3. Why use a huge number of molecular descriptors?
Complexity is not an intrinsic property of systems, but rather arises from the number of ways in which we are able (or desire) to interact with a system.
A molecule is undoubtedly a complex system
FGA - our Frequently Given Answers
www.moleculardescriptors.eu
Department of Environmental Sciences
University of Milano - Bicocca
P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.disat.unimib.it/chm/
THANK YOU
coffee break
... since December 2006
www.moleculardescriptors.eu
news, software, books, tutorials and a forum
4. Is a model explaining the known facts of a system better than a model predicting the future events of that system?
fitting versus prediction
Don't forget your goal!
An understanding of the behavior of a system does not always coincide with the prediction of the system's future behavior!
FGA - our Frequently Given Answers
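The gap between fitting and prediction can be made concrete with leave-one-out cross-validation, one common validation scheme (the slides do not name a specific one). Each molecule is predicted by a model fitted without it, and the resulting Q² is compared with the fitting R². Data and names are synthetic and hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic descriptor values and responses
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.7]

mean_y = sum(y) / len(y)
ss_tot = sum((v - mean_y) ** 2 for v in y)

# Fitting: every molecule is used to build the model that describes it
slope, intercept = fit_line(x, y)
rss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
r2 = 1.0 - rss / ss_tot

# Prediction: each molecule is left out in turn and then predicted
press = 0.0
for i in range(len(x)):
    s, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    press += (y[i] - (s * x[i] + b)) ** 2
q2 = 1.0 - press / ss_tot

print(round(r2, 3), round(q2, 3))
```

For a least-squares model the leave-one-out residuals are never smaller than the fitting residuals, so Q² computed this way cannot exceed R²; a large gap between the two is a warning sign.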
QSAR strategy - Regression
"GENTLEMEN, one might ask what the most fruitful way is to draw from a hypothesis the greatest benefit for the development of a given doctrine. To many it may perhaps seem that in this regard one should proceed with great caution, so as not to introduce into science hypothetical conceptions that are too bold and that later turn out not to agree with the facts. I believe instead that the progress of science has been delayed by excessive caution rather than by excessive boldness. In science one must know when to dare, as in matters of love: know how to dare at once and go all the way; complaints and regrets afterwards serve no purpose."
Giacomo Ciamician
From the inaugural address on the scientific work of Wilhelm KÖRNER, Milano, 15 May 1910.
Fragment approach
The biological activity of a molecule is the sum of its fragment properties
Congeneric molecules, i.e. a common reference skeleton
Substituent properties
Fragment approach
Parametric approach (Hammett-Hansch, 1964)
Group approach (Free-Wilson and Fujita-Ban, 1976)
DARC-PELCO approach (Dubois, 1966)
Sterimol approach (Verloop, 1976)
Hansch molecular descriptors
lipophilic properties: partition coefficients (logP, logKow); chromatographic parameters (Rf, RT); solubility; ...
electronic properties: Hammett constants; molar refraction; dipole moment; HOMO, LUMO; ionization potential; ...
steric properties: molecular weight; VDW volume; molar volume; surface area; ...
Hansch approach
The role of the molecular descriptors
Introduction
Conclusions
A molecular descriptor is a number extracted by a well-defined algorithm from a molecular representation of a complex system, i.e. the molecule. There are good reasons to believe that our difficulty in attributing a meaning to this number often flows ultimately from the lack of deeper chemical theories and higher-level languages, and not from esoteric approaches to the descriptor definition.
R. Todeschini and V. Consonni
Properties of a molecular descriptor
Conclusions
Any alternative viewpoint with a different emphasis leads to an inequivalent description. There is only one reality but there are many points of view.
Hans Primas
X
molecule
physico-chemical properties
biological activities
molecular descriptors
0D
1D
2D
3D
[Figure: the same molecule (atoms C, H, Cl) drawn in each representation]
Representations of a molecular structure
molecular structure?
Just a question …
"... : although relationships between the chemical constitution (composition and structure) and the physical properties of these substances can certainly already be glimpsed, the number of facts is still far too small to draw conclusions that, beyond the character of a simple hypothesis, can also claim that of probability. In any case these relationships are not of so simple a nature as one might perhaps have expected a priori. Certainly the physical properties of bodies are first of all a function of their composition and structure, about whose form nothing is yet known; a function that is probably very complex, and whose study will require an unforeseeable number of facts in order to narrow sufficiently the circle of possible representations."
Some historical notes