Basic Concepts and Areas of Application - kpfu.rukpfu.ru/docs/F1472709644/Varnek.pdfChemoinformatics...
Transcript of Basic Concepts and Areas of Application - kpfu.rukpfu.ru/docs/F1472709644/Varnek.pdfChemoinformatics...
1
Chemoinformatics: Basic Concepts and Areas of Application
Alexandre Varnek
Laboratory of Chemoinformatics, University of Strasbourg
Double diploma UniStra/KFU
Chem(o)informatics
Cheminformatics
Chemical Informatics
Infochimie
Chémoinformatique
Хемоинформатика
Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information
G. Paris, 1998
Chemoinformatics - definition
Chemoinformatics is the application of informatics methods to solve chemical problems
J. Gasteiger, 2004
Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization”
F.K. Brown, 1998
Chemoinformatics is a field based on the representation of molecules as objects
(graphs or vectors) in a chemical space A. Varnek & I. Baskin, 2011
Selected books in chemoinformatics
Paul Emile Lecoq de
Boisbaudran
Gallium discovery:
the first QSAR successful story
Predicted in 1869
Dmitry Mendeleév
Discovered in 1875
Densitypred ≈ 6.0 g/cm3 Densityexp = 4.7 (initial)
Densityexp = 5.935
(corrected)
Chemoinformatics:
new disciline combining several „old“ fields
• Chemical databases
• Structure-Activity modeling (QSAR)
• Structure-based drug design
• Computer-aided synthesis design
Peter Willett Michael Lynch
Corwin Hansch Johann Gasteiger
Irwin D. Kuntz
Elias Corey Ivar Ugi
Hans-Joachim Böhm
• Needs for chemoinformatics
• Fundamentals of chemoinformatics
• Chemical Space paradigm
• Virtual screening approaches
• Perspectives
OUTLOOK
Needs in Chemoinformatics
10
Chemical universe
> 100 M compounds are currently recorded
• How to select useful compounds from this huge dataset ?
• How to design new compounds ?
• How to synthesize these compounds ?
Target Protein
Large libraries
of molecules
High Throughout Screening
Hit
experiment
computations
Virtual
Screening
Small Library of selected hits
Chemical universe:
• > 108 compounds are currently available
• 1033 druglike molecules could potentially be synthesised
(see P. Polischuk, T. Madzidov et al., JCAMD, 2013)
Virtual screening is inevitable to analyse a
huge amount of protein-ligand combinations
Virtual screening must be very fast and efficient !
Human proteome:
• 5000 druggable proteins
Ionic Liquids
Ionic Liquids are composed of
large organic cations:
PF6-, Cl-, BF4
-, CF3SO3-, [CF3SO2)2N]-
and anions:
N RR12
+N R
R
1
2
+ N
N+
R
R
R
1
2
3
N
R
R
R
R1
2
3
4
+N
N+
R
R
1
3
There exist combinations of
ions that could lead to useful ionic
liquids
Ionic Liquids
Large organic cations:
PF6-, Cl-, BF4
-, CF3SO3-, [CF3SO2)2N]-
anions:
N RR12
+N R
R
1
2
+ N
N+
R
R
R
1
2
3
N
R
R
R
R1
2
3
4
+N
N+
R
R
1
3
1018
Virtual screening : finding the needle in the haystack
CHEMICAL DATABASE
~106 – 109
molecules
Chemoinformatics: pattern recognition in chemistry
CHEMICAL DATABASE
~106 – 109
molecules
model
- Specific structural motifs,
- Selected molecular properties (shape, fields, …),
- Interaction patterns,
- Mathematical equations
Activity = F (structure)
VIRTUAL
SCREENING
INACTIVES
HITS ~106 – 109
molecules
CHEMICAL DATABASE
Chemoinformatics: Virtual screening “funnel”
Similarity search
Filters
(Q)SAR
Docking
Pharmacophore
~101 – 103
molecules
Chemoinformatics: Virtual screening “funnel”
Similarity search
Filters
(Q)SAR
Docking
Pharmacophore
VIRTUAL
SCREENING
INACTIVES
HITS
~106 – 109
molecules
CHEMICAL
DATABASE
~101 – 103
molecules
Ligand-based
Structure-based
Chemoinformatics as a
theoretical chemistry discipline
20
Chemoinformatics is defined as individual discipline
characterized by its own molecular model, basic concepts,
major applications and learning approach
21
Theoretical chemistry
Quantum Chemistry
Force Field Molecular Modelling
Chemoinformatics
- Molecular model - Basic concepts - Major applications - Learning approaches
22
Molecular Model
Quantum Chemistry
Force Field Molecular Modelling
Chemoinformatics • molecular graph
• descriptor vector
electrons and nuclei
atoms and bonds
Chemoinformatics is a field based on the representation of molecules as
objects (graphs or vectors) in a chemical space
Chemoinformatics: From Data to Knowledge
know-
ledge
information
data
generalization
context
measurement
or calculation
deductive
learning
inductive
learning
Chemoinformatics learns from experimental data !
Basic concepts
Quantum Chemistry
Force Field
Molecular Modelling
Chemoinformatics chemical space
wave/particle dualism
classical mechanics
Chemical space paradigm
26
Chemical Space representations
graphs-based descriptors -based
SPACE = objects + metric
Graph-based chemical space
A. Schuffenhauer, P. Ertl, et al. J. Chem. Inf. Model., 2007, 47 (1), 47-58
Scaffold Tree
Natural Product Scaffold Tree
Courtesy of P. Ertl
Natural Product Scaffold Tree
Courtesy of P. Ertl
Descriptors-based chemical space
vectorial space defined by molecular descriptors
32
Case study: Hansch Analysis
3 types of physicochemical parameters are used:
• Electronic (s)
• Steric (dEs)
• Hydrophobic (logP)
Biological Activity = f (Physicochemical parameters ) + constant
Activity = a ( log P )2 + b log P + s + dEs + cont
33
Case study: Hansch Analysis
Molecule 1
Molecule 2
34
Molecular Descriptors :
ensemble of topological, electronic, geometry parameters calculated directly
from molecular structure
Descriptors
D1
D2
…
Di
…
Molecular graph
-Topological indices,
- Atomic charges,
- Inductive descriptors,
- Substructural fragments,
- Molecular volume and surface, …
Descriptor vector
> 5000 types of descriptors are reported
35
Chemography: Design and visualization of chemical space
Greenland
2.2 M km2
Australia
7.7 km2
Arabian Peninsula
3.5 M km2
Dimensionality Reduction problem
37
Swiss Roll
• GTM relates the latent space with a 2D “rubber sheet” (manifold) injected into
the high-dimensional data space.
• The visualization plot is obtained by projecting the data points onto the manifold
and then letting the “rubber sheet” relax to its original form.
Generative Topography Mapping (GTM)
N. Kireeva, I. Baskin, H. Gaspar, D. Horvath, G. Marcou, A. Varnek Mol. Inf. 2012, 31, 301–312
GTM of a dataset containing 10 activities from DUD
Similarity principle:
similar molecules possess similar properties
39
Chemical Similarity
0.82
0.39
0.84
0.72
0.67
0.64
0.53
0.56
0.52
reference
compound
Similar compounds possess similar properties
Chemical space representation: Activity Landscapes
i
ik
i
iki
kR
RA
= A Expectation of activity in k - node for the training set
logK of Lu3+L complexes
Ak
logKLu
42
Strong binders Weak binders
Activity landscape of lanthanides’ binders
Generative Topographic Mappping
of the set of Ln binders
Contours correspond to different
logK values
H. Gaspar, I. Baskin, G. Marcou, A. Varnek unpublished results
Biopharmaceutics Drug Disposition Classification System
DATASET: 893 drugs
DESCRIPTORS: VolSurf
Case study: classification models for BDDCS classes
Visualization of models’ Applicability Domain
44
CPF ≤ 1, coverage =100 % CPF ≤ 5, coverage = 47 %
BDDCS classes probability distribution
Colored zones on the maps correspond to model’s applicability domain
H. Gaspar, G. Marcou, A. Varnek JCIM, 2013
Class Preference Factor 𝑪𝑷𝑭 = max𝑐 𝑃(𝑘|𝐶)
𝑃(𝑘|𝐶𝑖), ∀𝐶𝑖 ≠ 𝐶
Chemoinformatics:
Properties predictions
46
Quantitative Structure-Activity Relationships
(QSAR)
Activity = F (structure)
= F (descriptors)
machine-learning methods
• neural networks, support vector machine,
random forest, naïve Bayes, PLS, …
A. Varnek & I. Baskin Machine Learning Methods in Chemoinformatics: Quo Vadis?
J. Chem. Inf. Model. 2012, 52, 1413−1437
predictions of > 20 physico-chemical
properties and NMR spectra for
each individual compound
Chemoinformatics tools in SciFinder:
ISIDA virtual screening platform
infochim.u-strasbg.fr/webserv/VSEngine.html
Machine Learning Methods in Chemoinformatics: Quo Vadis ?
A. Varnek and I. Baskin , J Chem. Inf. Mod., 2012, 52, 1413-1437
Chemoinformatics: virtual screening in 3D
Virtual screening : finding the needle in the haystack
CHEMICAL DATABASE
~106 – 109
molecules
What is in common between these two molecules ?
-
+
+ -
- Arg-Gly-Asp-Phe
Tirofiban
Pharmacophore model of ligand complementary to
integrine αIIbβ3
Positive charge,
H-donor
Negative charge,
H-acceptor
15.5 Å
5 Å
- +
Hydrophobic
interactions
-
+ + -
pKi = 7.51
TanimotoCombo = 0.74
pKi = 7.82
TanimotoCombo = 0.67
pKi = 7.82
Molecular Shape similarity analysis
Molecular fields
56
Lock Key
Ligand-Protein complex
+
Hermann Emil Fischer
Ligand-to-protein docking :
Lock-and-key paradigm
Selected in silico designed compounds that were synthesized
and successfully tested for bioactivity
G. Schneider J Comput Aided Mol Des (2012) 26:115–120
Chemoinformatics: areas of application
- Drug design (pharmacodynamics and pharmacokinetics),
- Prediction of physico-chemical properties,
- Materials design,
- Synthesis design,
- Molecular spectra simulations
Chemoinformatics: perspectives
60
Assessment of biological activity
61
Assessment of side effects
62 See review by D. Rognan, British Journal of Pharmacology (2007), 1–15
Chemoinformatics : Complexity challenge
P. Csermely1 et al. Pharmacology & Therapeutics, 2012
64
Day 1: Databases
Veli-Pekka Hyttinen
Timur Madzidov
Gilles Marcou Dragos Horvath
Chemical Databases: Encoding, Storage and Search
of Chemical Structures
SciFinder - The choice for chemistry research
Tutorial with ChemAxon
Day 2: QSAR
Igor Tetko
Igor Baskin
Obtaining, Validation and Application of SAR/QSAR Models
SAR/QSAR Modelling: state of the art
Tutorial with OChem
Alex Tropsha
ADMET Predictions
Day 3: virtual screening in 3D
Conformational Sampling
Pharmacophore and Its Applications
Tutorial with LigandScoute
Molecular Docking Methods
Gilles Marcou
Dragos Horvath
Thierry Langer
Sharon Bryant
Gilles Marcou Dragos Horvath
Tutorial with LeadIt
Day 4: Drug Design applications
Konstantin Balakin
Vladimir Poroikov
Computational Mapping Tools for Drug Discovery
Drug Design & Discovery in Academia
staff
Invited Professors
at UniStra Invited Lecturers
Visiting scientists
Visiting friends