Basic Principles & Methodologies of 2D & 3D QSARs
Dr. A. K. Yadav
Assistant Professor-Chemistry
Maharana Pratap Govt. P.G. College, Hardoi
Drug Discovery Research: Paradigm Shifts
Pre-genomic era
Molecular biology
Target identification: limited number of genes; molecular biology techniques
Target validation: cell and tissue culture; mouse knockouts
Discovery
Syntheses & screening: parallel synthesis for library design; assay development for HTS; high-throughput screening (HTS)
Chemical optimization: bench synthesis; parallel synthesis
Development
Pre-clinical: animal testing
Clinical: patient trials
Post-genomic era
Biology
Target identification: large numbers of genes; industrialized techniques (e.g. gene chip expression); bioinformatics (e.g. database searches for homologies)
Target validation: cell and tissue culture; mouse knockouts
Discovery
Syntheses & screening: structural biology (target structure); SAR profiling of library (library design); assay development for LTS; virtual screening and LTS
Chemical optimization: in-silico supported bench synthesis; in-silico early ADME/tox
Development
Pre-clinical (ADME/tox): animal testing; in silico ADME/tox; in vitro toxicology; surrogate markers
Clinical: patient trials; surrogate markers
Perspectives from Boston Consulting Group
Figure: Drug Discovery: Cost to Drug & Time to Drug. Realizable results based on discovery approach (pre-genomics, post-genomics, genetics-based approach, chemical-genomics-based approach; weighted average of approaches across the value chain). Panels: Cost to Drug ($M), with values from ~300 to 880, and Time to Drug (years), with values from 6-8 to 15.8. Data from Boston Consulting Group.
The evolution of Drug Research
Time | Materials | Test systems
Ancient times | Plants, venoms, minerals (natural products) | Humans
1806 | Morphine |
1850 | Chemicals |
1890 | Synthetics, dyes | Animals
1920 | Synthetics, dyes | Animals, isolated organs
1970 | Synthetics, dyes | Enzymes, membranes
1990 | Combinatorial libraries | Human proteins, HTS
2000 onwards | Focused libraries | uHTS, virtual screening
Overview of lead identification (Current Paradigm)
Virtual chemistry space: 10^100
Easy to handle: 10^10
General screening library: 10^6
Focused library: 10^4
Lead candidates: 5
Filters applied along the funnel: drug-likeness filter; combinatorial chemistry; HTS, QSAR, docking & scoring; optimization, ADME/Tox (absorption, distribution, metabolism, excretion; toxicity)
Issues in lead identification & optimization
Issue | HTS / combichem | Virtual screening
Robust assay / screening protocol | Establishment takes time and money | Relatively easy
Diversity | Relatively low diversity | High diversity achieved by design
Drug-like compounds | Time-consuming | Filtered out using ADME parameters
Amount of compound | Only very small amounts are synthesized, and synthesis and replenishment involve high costs | Virtual molecules, bottomless resources
In silico Drug Discovery - Virtual screening
Virtual screening is the use of high-performance computing to analyze large databases of chemical compounds in order to identify possible drug candidates. It is a technology that complements current advances in high-throughput chemical synthesis and biological assays.
Virtual screening methods
Target structure-based approaches
• Protein-ligand docking
• Active site-directed pharmacophores
Molecular fingerprints
• Keyed 2D fingerprints (each bit position is associated with a specific chemical feature)
• Hashed 2D fingerprints (properties are mapped to overlapping bit segments)
• Multiple-point 3D pharmacophore fingerprints
Compound classification techniques
• Cluster analysis
• Cell-based partitioning (of compounds into sub-sections of n-dimensional descriptor space)
3D/4D-QSAR models
Statistical methods
• Binary QSAR/QSPR
• Recursive partitioning
Molecule-based queries
• 2D substructures
• 3D pharmacophores
• Complex molecular descriptors (e.g., electrotopological)
• Volume- and surface-matching algorithms
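To make the keyed/hashed fingerprint distinction concrete, here is a minimal Python sketch. The feature names, the SHA-256-based hashing, the 64-bit length and the use of the Tanimoto coefficient for comparison are illustrative assumptions, not the scheme of any particular screening package.

```python
import hashlib

# Hypothetical feature dictionary: in a keyed fingerprint each bit position is
# reserved for one predefined chemical feature.
FEATURE_KEYS = ["aromatic_ring", "carboxylic_acid", "primary_amine",
                "hydroxyl", "halogen", "carbonyl"]


def keyed_fingerprint(features):
    """Return the set of on-bits: bit i is set if FEATURE_KEYS[i] is present."""
    return {i for i, key in enumerate(FEATURE_KEYS) if key in features}


def hashed_fingerprint(fragments, n_bits=64, bits_per_fragment=2):
    """Map every fragment string onto bits_per_fragment positions of an
    n_bits-long fingerprint with a deterministic hash; unrelated fragments may
    collide on the same bits, the price of not needing a feature dictionary."""
    bits = set()
    for frag in fragments:
        for seed in range(bits_per_fragment):
            digest = hashlib.sha256(f"{seed}:{frag}".encode()).hexdigest()
            bits.add(int(digest, 16) % n_bits)
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient |A and B| / |A or B| between two sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


mol_a = {"aromatic_ring", "carboxylic_acid", "hydroxyl"}
mol_b = {"aromatic_ring", "carboxylic_acid", "halogen"}
print(tanimoto(keyed_fingerprint(mol_a), keyed_fingerprint(mol_b)))  # 0.5
print(tanimoto(hashed_fingerprint(["c1ccccc1", "C(=O)O"]),
               hashed_fingerprint(["c1ccccc1", "CO"])))
```

The keyed version can only describe features in its dictionary, while the hashed version accepts any fragment string at the cost of possible bit collisions.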
Virtual screening methods continued…
Protein-ligand docking
o The most promising route available for determining which
molecules are capable of fitting within the very strict
structural constraints of the receptor binding site and to
find structurally novel leads.
o The most valuable source of data for understanding the
nature of ligand binding in a given receptor
Active site-directed pharmacophores
o A pharmacophore-based method that also exploits the geometry of the active site (for enzyme inhibitors), represented by 'excluded volume' features
o Produces an optimised pharmacophore with improved predictivity compared with the corresponding pharmacophore derived without receptor information
Figure: pharmacophore with excluded volumes (Greenidge et al., J. Med. Chem. 1998, 41, 2503)
Virtual screening: Target structure based approaches
Problems: high-throughput screening & docking
HTS: Compounds are tested rapidly, and genuine inhibition/activation can be missed because of faults in the assay; some compounds appear to inhibit only because they interfere with properties of the assay.
Docking: The scoring functions are inaccurate, the sampling of conformational states is crude, and solvent-related terms are typically ignored. However, docking can reliably screen out compounds that do not fit in a binding site or that have the wrong electrostatic potential, and it can be used for virtual compounds that have not yet been synthesized.
In general, both HTS and docking suffer from false positives and false negatives.
Bottlenecks in virtual screening: Target structure based approaches
Figure: distribution of the 500 current drug targets: receptors (GPCRs) 45%, enzymes 28%, hormones & factors 11%, unknown 7%, ion channels 5%, nuclear receptors 2%, DNA 2%.
Among the current drug targets, structures are known only for a small group (~10%).
The Human Genome Project is expected to yield over 100,000 proteins, out of which some 5,000 new targets are predicted to be found, with unknown structures!
Figure: human genome (~100,000 proteins) → druggable genome (~10,100) and disease-modifying genes (~10,100) → drug targets (~5,000).
The General QSPR/QSAR Problem
Basic assumption: the molecular structure determines its properties.
Molecular structures → representation (0D-3D) → structural descriptors → variable selection and modelling → properties X
De novo approach: usually not available because of the lack of target structures.
Alternatives in virtual screening: Ligand-based approaches
Galileo Galilei: In order to introduce order into the universe, man must pay attention to the quantitative aspects of his surroundings and discover the existing mathematical relationships between them.
Crum-Brown and Fraser (1865-70): postulated that the biological response (BR) of a molecule is a function of its chemical constitution (C)
Biological Response = Function(Chemical Constitution)
$BR = f(C)$
Richet (1893) demonstrated that the activity of narcotics was inversely related to their water solubility.
Overton and Meyer (1890s) related tadpole narcosis activity to the oil/water partitioning of the chemicals.
Ferguson postulated that thermodynamic principles could be related to drug activities.
Equilibrium between bound and free state is related to the energies associated
with the binding interaction and desolvation of the drug and the receptor.
Motivation and Historical Aspects of QSAR
Historical Aspects of QSAR
1964- Hansch and Fujita
- Fragment and additive group contribution theory - The hypothesis was that substituents on a parent molecule have a quantitative relationship with biological activity - Treated biological activity with techniques of physical organic chemistry - Used substituent constants: Hammett's σ (electronic) and π (hydrophobic)
1964 - Free and Wilson
Related biological activity to the presence/absence of a specific functional group at a specific location on the parent molecule
Free-Wilson: $\text{Activity} = A + \sum_i \sum_j G_{ij} X_{ij}$
Hansch: $\log(1/[C]) = a\,\log P - b\,(\log P)^2 + c\,E_s + d\,\rho\sigma + e + \dots$
A was defined as the average biological activity for the series, $G_{ij}$ the contribution to activity of functional group i in the jth position, and $X_{ij}$ the presence (1.0) or absence (0.0) of functional group i in the jth position.
Spencer M. Free and James W. Wilson, J. Med. Chem., 7, 395 (1964).
C. Hansch and T. Fujita, J. Am. Chem. Soc., 86, 1616 (1964).
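The Free-Wilson idea, activity as a sum of substituent contributions encoded by 0/1 indicator variables, can be illustrated with a small least-squares fit. This is a sketch under stated assumptions: it uses the Fujita-Ban variant (hydrogen as the reference substituent, so the intercept is the parent-compound activity rather than the series average A defined above), and the substituents and "true" contributions are hypothetical numbers chosen so the recovered values can be checked. The helpers `solve` and `least_squares` are illustrative names.

```python
from itertools import product


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(X, y):
    """Ordinary least squares through the normal equations (X^T X) b = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical series: R1 is H, Cl or CH3 and R2 is H or OH. Hydrogen is the
# reference substituent, so it gets no indicator column.
GROUPS = [("Cl", 1), ("CH3", 1), ("OH", 2)]
ASSUMED = {"parent": 4.0, ("Cl", 1): 0.6, ("CH3", 1): 0.3, ("OH", 2): -0.4}

rows, activity = [], []
for r1, r2 in product(["H", "Cl", "CH3"], ["H", "OH"]):
    present = {(r1, 1), (r2, 2)}
    x = [1.0] + [1.0 if g in present else 0.0 for g in GROUPS]  # intercept + indicators
    rows.append(x)
    activity.append(ASSUMED["parent"] + sum(ASSUMED[g] for g in GROUPS if g in present))

coef = least_squares(rows, activity)
for name, value in zip(["parent"] + [f"{s}@R{p}" for s, p in GROUPS], coef):
    print(f"{name:8s} {value:+.2f}")
```

With real (noisy) activities the fitted contributions would only approximate the underlying values, and every substituent needs to appear in enough compounds for the design matrix to be full rank.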
1972 - John Topliss
– Topliss trees, a non-statistical method to automate the Hansch approach
1979 - Marshall
- Active Analog Approach - extended the 2-D approach to QSAR by explicitly considering the conformational flexibility of a series as reflected by their 3-D shape; implemented in the software Sybyl
1980 - Hopfinger and co-workers
- Molecular shape analysis - related molecular shape descriptors (e.g. the common overlap steric volume) to biological activity
1988 - Richard Cramer
- CoMFA (Comparative Molecular Field Analysis) - related the shape-dependent steric and electrostatic fields of molecules to their biological activity - introduced PLS (Partial Least Squares) with cross-validation as the data-analysis method for developing models for activity prediction
John G. Topliss, J. Med. Chem., 15, 1006 (1972).
G. R. Marshall, C. D. Barry, H. E. Bosshard, R. A. Dammkoehler and D. A. Dunn, ACS Symposia, 112, American Chemical Society, Washington D.C., 1979
Anton J. Hopfinger, J. Amer. Chem. Soc., 102, 7196 (1980).
Richard D. Cramer III, David E. Patterson and Jeffrey D. Bunce, J. Amer. Chem. Soc., 110, 5959 (1988).
1988 – Arthur M. Doweyko
- HASL: The Hypothetical Active Site Lattice
- maps a superposed set of molecules onto a set of regularly spaced points (a lattice), each defined by Cartesian coordinates (x, y, z) and atom type
$\text{biological activity} = F(\text{lattice points})$
1990 – V. E. Golender and A. B. Rozenblit
-APEX-3D expert system -Relates biological activity to biophoric (pharmacophoric) and secondary sites using statistical techniques and 3-D pattern matching algorithms
1994 - Greene, J.; Kahn, S.; Savoj, H.; Sprague, P.; Teig, S
- derives hypotheses consisting of a set of generalized chemical functions (regions of hydrophobic surface, hydrogen bond vectors, charge centers, or other user-defined features) at specified relative positions
A.M. Doweyko, J. Med. Chem. 1988, 31, 1396-1406.
V. E. Golender and A. B. Rozenblit, Drug Design, Vol. IX, Academic Press (1980); V. E. Golender and A. B. Rozenblit, Research Studies Press, UK (1983); V. E. Golender and E. R. Vorpagel, ESCOM Science Publishers, Netherlands (1993).
Greene, J.; Kahn, S.; Savoj, H.; Sprague, P.; Teig, S. J. Chem. Inf. Comput. Sci., 1994, 34, 1297-1308.
1994 - D. Rogers and A. J. Hopfinger
- Genetic Function Approximation (GFA)
- Genetic algorithm for variable selection and the MARS approach for data analysis
1994 - Klebe, G., Abraham, U., Mietzner, T.
- Comparative molecular similarity indices analysis (CoMSIA) - An alternative approach to the computation of molecular potential fields - The indices replace the distance functions of the Lennard-Jones and Coulomb-type potentials with Gaussian-type functions
2000 - Tripos
- Hologram QSAR (HQSAR)
- uses an extended form of fingerprint, known as a Molecular Hologram, which encodes more information than a conventional fingerprint, including implicitly encoded 3D structural information
US patent 6208942 & US patent 5751605
D. Rogers and A. J. Hopfinger, J. Chem. Inf. Comp. Sci., 34, 854-866 (1994).
Klebe, G., Abraham, U., Mietzner, T., J. Med. Chem. 1994, 37, 4130-4146.
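Classical 2D QSAR
Started in 1964, independently, by two groups.
Hansch analysis (LFER approach)
Basis (D = drug, S = substrate, E = enzyme, R = receptor): $D/S + E/R \rightleftharpoons [D{:}R \text{ or } E{:}S] \rightarrow \text{biological response (BR)}$
$BR \propto [D{:}R] \propto \Delta G_{DR}, \qquad \Delta G^{\circ} \propto \log(1/[D]) \propto \log(1/c)$
$\Delta BR = \Delta G_h + \Delta G_e + \Delta G_s + \Delta G_c + \text{constant}$
$\log(1/c) = \Delta G_h + \Delta G_e + \Delta G_s + \Delta G_c + \text{constant}$
$\log(1/C) = a(\log P)^2 + b\log P + c\,\sigma + d\,E_s + \text{constant}$
Free-Wilson analysis
Basis: the additive mathematical contribution of chemical substituents to structure-activity studies
$\log(1/C) = \sum_i a_i X_i + \mu$, where $X_i$ is 1 if substituent i is present and 0 otherwise, and μ is a constant term
Combined approach
Basis: part A uses the physicochemical parameters of Hansch analysis, and part B uses indicator variables based on the assumption of Free-Wilson
$\log(1/C) = \underbrace{a_1(\log P)^2 + a_2\log P + a_3\sigma + a_4E_s}_{A} + \underbrace{a_5 I_1 + a_6 I_2 + \cdots}_{B} + \text{constant}$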
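For the parabolic log P dependence in Hansch analysis, the optimum lipophilicity follows from setting the derivative to zero: for a(log P)² + b log P, log P_opt = -b/(2a). The sketch below is a minimal illustration under stated assumptions: it fits only the hydrophobic terms (σ and E_s are left out) to hypothetical, noise-free data generated from a = -0.2, b = 1.2, and reuses the illustrative `solve`/`least_squares` helpers so that it runs on its own.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(X, y):
    """Ordinary least squares through the normal equations (X^T X) b = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from log(1/C) = -0.2 (log P)^2 + 1.2 log P + 2.0
log_p = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_inv_c = [-0.2 * x * x + 1.2 * x + 2.0 for x in log_p]

X = [[x * x, x, 1.0] for x in log_p]          # columns: (log P)^2, log P, constant
a, b, const = least_squares(X, log_inv_c)
log_p_opt = -b / (2 * a)                       # vertex of the parabola
print(f"a={a:.3f}, b={b:.3f}, optimum log P = {log_p_opt:.2f}")
```

A negative a is what makes the curve bend down: activity first rises with lipophilicity and then falls once compounds become too lipophilic.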
Steps in QSAR
Structure entry & molecular modeling → descriptor generation → feature selection → construct model → model validation
Steps in QSAR: Structure Entry & Molecular Modeling
Structures are sketched using standard drawing software (commercial or freeware)
Molecular modelling for the generation of low-energy conformations
Method | Applicability | Accuracy and cost | Software / methods
Ab initio | Very small molecules | Highly accurate; high computational costs | Gaussian, ZINDO, DGauss, DMol
Semi-empirical | Medium-sized molecules | Accurate but computationally intensive | MOPAC and AMPAC; MNDO, AM1, PM3 and PM5 Hamiltonians
Molecular mechanics | No restriction on size | Accurate with proper conformational analysis | InsightII, Sybyl, HyperChem; CVFF, Tripos, CHARMm and other force fields
Accuracy and computational time both increase from molecular mechanics to semi-empirical to ab initio methods.
Steps in QSAR: Descriptor Generation
Physicochemical properties that describe some aspect of the chemical structure
Empirical descriptors: determined experimentally, e.g. melting point, NMR chemical shift, Hansch's physicochemical parameters, viz. log P, MR, pKa, σ, ...
Theoretical descriptors: calculated, e.g. topological, BCUT, ...
Some of the experimentally determined parameters:
$\log P = \log(C_{\text{org}}/C_{\text{aq}})$
$\pi_X = \log P_{R\text{-}X} - \log P_{R\text{-}H}$
$\sigma_X = \log K_X - \log K_H$
$E_s = \log K_R - \log K_H$
$\Delta G = -RT \ln K$
$\log K = -\Delta G/(RT \ln 10)$
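The defining relations for π, σ and the free-energy/equilibrium-constant conversion are simple arithmetic, sketched below. The log P and pKa inputs are illustrative numbers only, σ is computed from acid pKa values using log K = -pKa, and the temperature of 298.15 K is an assumed default.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def pi_constant(log_p_rx, log_p_rh):
    """Hansch hydrophobic substituent constant: pi_X = log P(R-X) - log P(R-H)."""
    return log_p_rx - log_p_rh


def hammett_sigma(pka_h, pka_x):
    """Hammett sigma: log K_X - log K_H, which equals pKa_H - pKa_X."""
    return pka_h - pka_x


def delta_g(k, temp=298.15):
    """Standard free-energy change in J/mol from an equilibrium constant: -RT ln K."""
    return -R * temp * math.log(k)


def log_k_from_delta_g(dg, temp=298.15):
    """Inverse relation: log10 K = -dG / (RT ln 10)."""
    return -dg / (R * temp * math.log(10))


print(round(pi_constant(2.84, 2.13), 2))     # illustrative log P values
print(round(hammett_sigma(4.20, 3.44), 2))   # illustrative pKa values
dg = delta_g(1e9)                            # e.g. a binding constant of 10^9
print(round(dg / 1000, 1), "kJ/mol;", round(log_k_from_delta_g(dg), 6))
```

The last line shows why log(1/C) is a natural activity scale: it is proportional to a free energy.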
Theoretical Descriptors
0D descriptors: constitutional
1D descriptors: most indicator variables (functional groups, atom-centered fragments), empirical descriptors and properties
2D descriptors: topological, molecular walk counts, BCUT, Galvez topological charge indices, 2D autocorrelations
3D descriptors: aromaticity indices, molecular profiles, geometrical, radial distribution functions, 3D-MoRSE, WHIM, GETAWAY and quantum-chemical descriptors
Information content and computation time both increase from 0D to 3D descriptors.
0D Descriptors
Simplest; include molecular properties and atom counts
1D Descriptors
Include substructures, hydrophobicity and MR
Hydrophobic parameters
The most commonly used parameters are π, log P and R_M
Molar refractivity, MR: proposed by Pauling and Pressman
A parameter for the correlation of dispersion forces in the binding of haptens to antibodies
Determined from the refractive index n, the molecular weight MW and the density d; the equation for the molar refractivity is
$MR = \frac{n^2 - 1}{n^2 + 2}\cdot\frac{MW}{d}$
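The molar refractivity relation is a one-line calculation. In this sketch the inputs are approximate values for benzene, used only for illustration; with MW in g/mol and d in g/cm³ the result is in cm³/mol.

```python
def molar_refractivity(n, mw, density):
    """Lorentz-Lorenz molar refractivity: MR = (n^2 - 1)/(n^2 + 2) * MW/d."""
    return (n ** 2 - 1) / (n ** 2 + 2) * mw / density


# Approximate values for benzene (illustrative): n = 1.501, MW = 78.11 g/mol, d = 0.8765 g/cm^3
mr_benzene = molar_refractivity(1.501, 78.11, 0.8765)
print(round(mr_benzene, 2), "cm^3/mol")
```

Because MR scales with molar volume as well as polarizability, it is often read as a combined bulk and dispersion-force descriptor.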
2D Descriptors Topological descriptors
The structures of compounds can be represented as graphs (graph theory is a field of mathematics).
Figure: isopentane and its graph representation (atoms as nodes, bonds as edges).
Isopentane = five nodes, four edges, and the adjacency relationships implicit in the structure
Theorems of graph theory → graph invariants (topological descriptors)
Examples
• Atom counts
• Molecular connectivity indices
• Substructure counts
• Molecular weight
• Weighted paths
• Molecular distance edges
• Kappa indices
• Electrotopological state indices
Example: calculation of the path-1 molecular connectivity index, ${}^1\chi = \sum_{\text{edges}} 1/\sqrt{mn}$, where m and n are the degrees of the adjacent vertices
Thus ${}^1\chi(\text{2-methylbutane}) = 1/\sqrt{1\cdot3} + 1/\sqrt{1\cdot3} + 1/\sqrt{3\cdot2} + 1/\sqrt{2\cdot1} = 2.27$
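The path-1 connectivity calculation for 2-methylbutane can be reproduced from the hydrogen-suppressed graph alone, as in the sketch below. The edge-list representation and atom numbering are illustrative choices.

```python
from math import sqrt


def randic_index(edges):
    """First-order molecular connectivity index: sum over bonds of 1/sqrt(m*n),
    where m and n are the vertex degrees of the two bonded (non-hydrogen) atoms."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1 / sqrt(degree[a] * degree[b]) for a, b in edges)


# Hydrogen-suppressed graph of 2-methylbutane (isopentane):
# C1-C2-C3-C4 is the main chain and C5 is the methyl branch on C2.
isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]
n_pentane = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(round(randic_index(isopentane), 2))  # 2.27
print(round(randic_index(n_pentane), 2))   # the unbranched isomer scores higher
```

Comparing the two isomers shows the index's main use: branching lowers ¹χ even though both graphs have five nodes and four edges.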
3D Descriptors
Geometric descriptors: encode the 3D aspects of the structures, e.g. moments of inertia, solvent-accessible surface area, length-to-breadth ratios, shadow areas, gravitational index.
Quantum-chemical descriptors: encode aspects of the structures that are related to the electrons, e.g. partial atomic charges, HOMO or LUMO energies, dipole moment.
Molecular field analysis (MFA) descriptors
- evaluate the interaction energy between a probe and a molecular model at a series of points defined by a rectangular or spherical grid - the energies may be added to the study table and used as independent X variables
Receptor surface analysis descriptors - calculate the interaction energy between each point on the receptor surface and each model, and add these energies to the study table - filtering methods are available to reduce the input to the study table
Pharmacophoric descriptors
Used to calculate the 3D fingerprint or hypothesis of the molecules. All possible combinations of 2-10 features in 3D space are calculated for all conformers. The features considered are negative and positive charges, negative and positive ionizable groups, hydrogen-bond donors and acceptors, hydrophobic groups and aromatic rings.
Feature Selection
Objective: Identify the best (information rich and as small as possible) subset
of descriptors
Feature selection proceeds in two steps: (1) objective, (2) subjective
Objective (uses the independent variables only): correlations, tests for identical descriptors, vector-space descriptor analysis, principal component analysis
Subjective (uses the dependent variable): interactive regression analysis, simulated annealing, genetic algorithm, partial least squares
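The objective step can be sketched in a few lines: it looks only at the descriptor matrix, never at the activity. In this minimal version, assumed for illustration, constant descriptors are dropped first and then one member of every pair with |r| above 0.95 is removed; the threshold and the descriptor table are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def objective_filter(descriptors, r_max=0.95):
    """Objective pre-selection that never looks at the activity values:
    1) drop descriptors with the same (constant) value for every compound;
    2) of any pair with |r| > r_max, keep the one listed first."""
    kept = {name: col for name, col in descriptors.items() if pstdev(col) > 0}
    names = list(kept)
    dropped = set()
    for i, a in enumerate(names):
        if a in dropped:
            continue
        for b in names[i + 1:]:
            if b not in dropped and abs(pearson(kept[a], kept[b])) > r_max:
                dropped.add(b)
    return [n for n in names if n not in dropped]


log_p = [1.2, 2.0, 2.9, 3.5, 4.1]
descriptors = {                               # hypothetical descriptor table, 5 compounds
    "logP": log_p,
    "logP_copy": [2 * v + 1 for v in log_p],  # redundant: perfectly correlated with logP
    "MR": [10, 30, 15, 25, 20],
    "n_N": [1, 1, 1, 1, 1],                   # constant: carries no information
    "sigma": [0.2, -0.1, 0.5, 0.0, 0.3],
}
print(objective_filter(descriptors))  # ['logP', 'MR', 'sigma']
```

Only the surviving descriptors are then passed to the subjective methods, which do use the activity.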
Steps in QSAR: Feature Selection and Modelling
Subjective Feature Selection
Hill-climbing and other gradient-type methods: perform comparatively poorly on these types of problems.
Simulated annealing: has proven thorough but not efficient; on larger problems it is penalized by this lack of efficiency, often failing to converge to the global optimum within feasible computing time.
Tabu search: has identifiable methodological flaws that make it unsuitable for these types of problems.
Genetic algorithms: general methods that appear both thorough and efficient, often yielding excellent results when applied to complex optimization problems where other methods are either not applicable or turn out to be unsatisfactory.
Subjective Feature Selection: Difficult Problem Spaces
Figure: examples of difficult problem spaces: hills, plateaus, rippled, 'Bryce Canyon', Levy's 'egg carton', hypercubes.
Multi-dimensional modelling involves exploring mathematical landscapes that are non-smooth, with cliffs and discontinuities in the response surface, that have practical constraints on the available options, and that feature multiple local optima which can trap these methods.
Subjective Feature Selection: Search Space
The space of all feasible solutions (the set of solutions among which the
desired solution resides) is called search space (also state space).
Each point in the search space represents one possible solution and can be
"marked" by its value (or fitness) for the problem.
Looking for a solution is then equivalent to looking for an extreme value (minimum or maximum) in the search space.
The problem is that the search can be very complicated as one may not know
where to look for a solution or where to start.
Some of these methods that search and find suitable solutions are hill climbing,
tabu search, simulated annealing and the genetic algorithm.
Figure: a search space showing its global minimum and global maximum.
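Why local optima trap greedy searches can be seen on a toy, hypothetical one-dimensional "rippled" landscape: a sine ripple on an upward trend, sampled at 61 grid points. A simple hill climber that only moves to a better neighbour ends on whichever peak is closest to its start.

```python
import math


def fitness(i):
    """Hypothetical rippled landscape: sin(3x) ripples on a 0.3x upward trend."""
    x = i / 10.0
    return math.sin(3 * x) + 0.3 * x


def hill_climb(start, lo=0, hi=60):
    """Greedy search: move to the better neighbour until neither neighbour improves."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if lo <= j <= hi]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(i):
            return i
        i = best


global_best = max(range(61), key=fitness)   # exhaustive check is possible only because the space is tiny
for start in (0, 30, 55):
    end = hill_climb(start)
    print(f"start {start:2d} -> stops at {end:2d}, fitness {fitness(end):.3f}, global optimum: {end == global_best}")
```

Only the run that happens to start on the slope of the highest peak finds the global optimum; GAs and simulated annealing add randomness precisely to escape such traps.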
Traditional techniques in feature selection
MLR : Multiple linear regression analysis
• Simple, linear models
• In case of large number of independent variables, the search space
becomes large and requires the use of variables selection
techniques (search techniques)
• The variables may be inter-correlated with each other, resulting in an over-fitted model.
PCA: Principal component analysis
• Introduced to address the problem of intercorrelation and
reduction in the number of variables
• Inter-correlated variables are extracted as components and the
extracted components become the new variables for regression
analysis
PLS: Partial least squares
• A modification of the PCA technique in which the dependent variables are also extracted into components, and the components are extracted so as to maximize the correlation (covariance) between the X and Y components
• Has the additional advantage of modelling multiple dependent parameters
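A minimal single-response PLS (PLS1) using the NIPALS algorithm is sketched below in plain Python so that each step is visible: weights along the direction of maximal covariance with y, scores, loadings, then deflation. This is an illustrative implementation under stated assumptions (mean-centred data, no scaling, hypothetical data), not the code of any QSAR package.

```python
from statistics import mean


def pls1_fit(X, y, n_components):
    """PLS1 by NIPALS on mean-centred data.
    Returns the column means, the y mean and a list of (w, p, q) per component."""
    x_means = [mean(col) for col in zip(*X)]
    y_mean = mean(y)
    E = [[v - m for v, m in zip(row, x_means)] for row in X]  # X residual matrix
    f = [v - y_mean for v in y]                                # y residual vector
    n_vars = len(x_means)
    comps = []
    for _ in range(n_components):
        # weights: direction in X space with maximal covariance with the current y residual
        w = [sum(row[j] * fi for row, fi in zip(E, f)) for j in range(n_vars)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:          # y already fully explained: stop adding components
            break
        w = [v / norm for v in w]
        t = [sum(e * wj for e, wj in zip(row, w)) for row in E]                        # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(E, t)) / tt for j in range(n_vars)]    # X loadings
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt                                  # y loading
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]          # deflate X
        f = [fi - q * ti for fi, ti in zip(f, t)]                                      # deflate y
        comps.append((w, p, q))
    return x_means, y_mean, comps


def pls1_predict(model, x):
    """Predict one compound by projecting it through the stored components."""
    x_means, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_means)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(ei * wi for ei, wi in zip(e, w))
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
    return y_hat


# Hypothetical descriptor table (6 compounds x 3 descriptors) and activities
X = [[1, 2, 0.5], [2, 1, 1.5], [3, 4, 0.2], [4, 3, 2.0], [5, 6, 1.0], [6, 5, 0.7]]
y = [1.0 + 0.5 * a - 2.0 * b + 0.3 * c for a, b, c in X]

rss = {}
for n_comp in (1, 2, 3):
    model = pls1_fit(X, y, n_comp)
    rss[n_comp] = sum((pls1_predict(model, row) - yi) ** 2 for row, yi in zip(X, y))
    print(n_comp, round(rss[n_comp], 6))
```

With as many components as independent descriptors PLS reproduces the ordinary least-squares fit; in practice the number of components is chosen by cross-validation, as in CoMFA.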
MLR based QSAR model development
MLR is the simplest paradigm for QSAR model development and makes the interpretation of the model easier. However, QSAR modelling with a large number of variables is like searching a multi-dimensional search space such as a hypercube. Hence the successful development of an MLR model depends on an effective search protocol.
Kubinyi, Quant. Struct.-Act. Relat. (1994)
Exhaustive search: all possible subsets of m variables drawn from the p available, $\binom{p}{m} = \frac{p!}{m!\,(p-m)!}$ of them, are examined exhaustively to identify the best m variables.
It is practically impossible because of the high computational expense, particularly for large values of p and m.
Forward selection: starting from the single variable (out of a total of p) that gives the lowest cost function, the remaining (p-1) variables are added one at a time, each time choosing the one that further minimizes the cost function.
The major drawback is that it often ends up in local minima because it fails to discover information about the combined effect of sets of features.
Backward selection: operates in the opposite direction, starting from all p variables and eliminating the least significant variable at each step.
Its major limitation is that it is applicable only to data sets with p < n-1.
Stepwise multiple regression: a combination of forward selection of significant variables and backward elimination of variables below a certain significance level.
However, this most widely used method also often ends up in local minima.
Classical strategies for MLR models
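The cost gap between exhaustive search and forward selection is simple arithmetic. The sketch below counts the models each strategy must fit; the (p, m) pairs are arbitrary examples.

```python
from math import comb


def exhaustive_fits(p, m):
    """Models examined by exhaustive search for the best m of p variables: p!/(m!(p-m)!)."""
    return comb(p, m)


def forward_selection_fits(p, m):
    """Models fitted by forward selection to reach m variables:
    p candidates at step 1, p-1 at step 2, ..., p-m+1 at step m."""
    return sum(p - i for i in range(m))


for p, m in [(20, 5), (100, 5), (300, 6)]:
    print(f"p={p}, m={m}: exhaustive {exhaustive_fits(p, m):,} vs forward {forward_selection_fits(p, m):,}")
```

Forward selection stays cheap as p grows, but it pays for this with the local-minimum problem described above, which motivates stochastic searches such as the GA.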
Flowchart of genetic algorithms
Start → randomly create an initial population → evaluate the fitness function → are the optimization criteria met?
Yes → best individuals (result)
No → apply the genetic operators (selection, crossover, mutation) to form a new population, then evaluate the fitness function again
Recent techniques in feature selection: Hybrid GA-MLR
Recent strategy for MLR model: Hybrid GA-MLR
The genetic algorithm (GA) was introduced by John Holland.
It is a search paradigm inspired by natural evolution, in which the variables are represented as genes on a chromosome (model).
Similar to simplex optimization, it evolves a group of random initial models (population) with fitness scores, and searches for chromosomes with better fitness functions (response function scores) through natural selection and the genetic operators, mutation and recombination.
Natural selection guarantees the propagation of chromosomes with better fitness into future populations.
The GA combines genes from two parent chromosomes using the genetic recombination operator to form two new chromosomes (children) that have a high probability of having better fitness than their parents, while mutation explores new regions of the response surface (new local optima).
GAs combine hill-climbing ability (natural selection) with a stochastic method (recombination and mutation), and are very flexible because they optimize on a representation of the variables, not on the variables themselves.
In addition, GAs provide efficient optimization, as they use implicit parallelism to process information quickly and require fewer response function evaluations than other automated numerical optimization algorithms.
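A hybrid GA-MLR search can be sketched as follows: each chromosome is a 0/1 mask over the descriptors, and its fitness is the adjusted R² of the MLR model built on the selected columns. The operator choices (truncation selection, one-point crossover, bit-flip mutation, elitism) and the adjusted-R² fitness are assumptions made for this sketch; Rogers and Hopfinger's GFA, for example, scores models with Friedman's lack-of-fit function instead. The data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def adjusted_r2(X, y, subset):
    """Adjusted R^2 of an OLS model with intercept on the descriptor columns in subset."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    y_mean = sum(y) / len(y)
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    tss = sum((yi - y_mean) ** 2 for yi in y)
    n, m = len(y), len(subset)
    return 1 - (rss / tss) * (n - 1) / (n - m - 1)


def fitness(X, y, chrom):
    """Chromosome = list of 0/1 genes, one per descriptor; empty models are unfit."""
    subset = [j for j, gene in enumerate(chrom) if gene]
    return adjusted_r2(X, y, subset) if subset else float("-inf")


def ga_mlr(X, y, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    p_mut = 1.0 / p                                   # about one bit flip per child
    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(X, y, c), reverse=True)
        parents = ranked[: pop_size // 2]             # truncation selection
        new_pop = [ranked[0][:]]                      # elitism: best model survives unchanged
        while len(new_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, p)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
    best = max(pop, key=lambda c: fitness(X, y, c))
    return [j for j, gene in enumerate(best) if gene], fitness(X, y, best)


# Hypothetical data: 20 compounds, 6 descriptors; only descriptors 0 and 3 matter.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(20)]
y = [2.0 * r[0] - 1.5 * r[3] + data_rng.gauss(0, 0.05) for r in X]
subset, score = ga_mlr(X, y)
print("selected descriptors:", subset, "adjusted R^2:", round(score, 4))
```

Adjusted R² only mildly penalizes extra descriptors, so a noise descriptor can occasionally survive; stricter fitness functions or external validation are used to guard against this.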
Steps in QSAR: Model Validation
Selection and Validation of QSAR models
The selection and validation of the QSAR model for virtual screening is of utmost importance and should conform to the following recommendations:
Careful selection of independent variables
Significance of the variables (statistical parameters)
Principle of parsimony (Occam's razor)
Minimum number of compounds per variable
Consistency of the model with known biophysical data
Classical criteria for predictive ability of QSAR model (n = number of compounds, k = number of descriptors in the model)
Correlation coefficient r (relative quality of fit):
$r = \sqrt{1 - \sum(y_{calc} - y_{obs})^2 / \sum(y_{obs} - \bar{y})^2}$
Standard deviation s (absolute quality of fit):
$s = \sqrt{\sum(y_{calc} - y_{obs})^2 / (n - k - 1)}$
F test (Fisher value; level of statistical significance):
$F = r^2 (n - k - 1) / [k (1 - r^2)]$
Q² (squared cross-validated correlation coefficient):
$Q^2 = 1 - \sum(y_{pred} - y_{obs})^2 / \sum(y_{obs} - \bar{y})^2$
s_PRESS (standard deviation of the cross-validation predictions):
$s_{PRESS} = \sqrt{\sum(y_{pred} - y_{obs})^2 / (n - k - 1)}$
r, s and F describe how well the data are fitted; Q² and s_PRESS are measures of internal predictivity; external predictivity is measured on an independent test set.
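The classical statistics can be computed for a one-descriptor model (k = 1) in a few lines, with Q² and s_PRESS obtained by leave-one-out cross-validation. The log P and activity values are hypothetical, and a single descriptor is used only to keep the sketch short.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def validation_stats(x, y):
    """r, s, F, Q^2 (leave-one-out) and s_PRESS for a one-descriptor model (k = 1)."""
    n, k = len(y), 1
    y_mean = sum(y) / n
    tss = sum((b - y_mean) ** 2 for b in y)
    slope, icpt = fit_line(x, y)
    rss = sum((slope * a + icpt - b) ** 2 for a, b in zip(x, y))
    press = 0.0
    for i in range(n):  # leave compound i out, refit, then predict compound i
        s_i, c_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (s_i * x[i] + c_i - y[i]) ** 2
    r2 = 1 - rss / tss
    return {
        "r": r2 ** 0.5,
        "s": (rss / (n - k - 1)) ** 0.5,
        "F": r2 * (n - k - 1) / (k * (1 - r2)),
        "Q2": 1 - press / tss,
        "s_PRESS": (press / (n - k - 1)) ** 0.5,
    }


# Hypothetical log P values and activities for 8 compounds
log_p = [1.0, 1.5, 2.1, 2.8, 3.2, 3.9, 4.4, 5.0]
activity = [3.1, 3.6, 3.9, 4.6, 4.8, 5.5, 5.7, 6.4]
stats = validation_stats(log_p, activity)
for name, value in stats.items():
    print(f"{name:8s} {value:.3f}")
```

Because each left-out compound is predicted by a model that never saw it, PRESS is never smaller than the residual sum of squares, so Q² ≤ r² and s_PRESS ≥ s.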
Current criteria for predictive ability of QSAR model
Cross-validation is an internal validation method in which all the calibration objects are used. Like prediction testing, it seeks to validate the model on independent test data, and hence it was widely accepted as the measure of the predictive ability of the model.
However, cross-validation fails for designed data sets and for small data sets. Thus a high LOO Q² appears to be a necessary but not a sufficient condition for the model to have high predictive power.
Hence external test-set validation gives the most reliable estimate of the predictive ability of the model.
The test set must also fulfil the following conditions:
The test set should be 25-50% of the total data set.
The test set must be representative of future samples.
The range of the test set should match that of the training set.
Golbraikh and Tropsha (2002) Journal of Molecular Graphics and Modelling, 20, 269-276.
Wold et al. (1983). Proc. Conf. Matrix Pencils (A. Ruhe and B. Kågström, eds.), March 1982. Lecture Notes in Mathematics 973, Springer Verlag, Heidelberg, pp. 286-293.
Wold, S. Technometrics, 20 (1978) p. 397.
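One simple way to meet the test-set conditions above, assumed here for illustration and not prescribed by the cited authors, is activity-sorted systematic selection: sort the compounds by activity and send every fourth one to the test set, keeping the least and most active compounds in training so the test range lies inside the training range. The activity values are hypothetical.

```python
def split_by_activity(y, test_every=4, offset=1):
    """Sort compounds by activity and put every test_every-th one (from offset)
    into the test set; the least and most active compounds stay in training,
    so the test-set range lies inside the training-set range."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    test = [order[j] for j in range(offset, len(order) - 1, test_every)]
    test_set = set(test)
    train = [i for i in range(len(y)) if i not in test_set]
    return train, test


# Hypothetical activities (pIC50) for 20 compounds
y = [round(4.0 + 0.17 * ((7 * i) % 20), 2) for i in range(20)]
train, test = split_by_activity(y)
print(len(train), len(test))                    # 15 5  (25% test set)
print(min(y[i] for i in test) > min(y[i] for i in train),
      max(y[i] for i in test) < max(y[i] for i in train))
```

Sorting by activity alone does not guarantee that the test set is representative in descriptor space; structure-based selection (e.g. clustering) is often combined with it.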
3D QSAR
The use of classical QSAR was expanded during the sixties as a means of
correlating observed activity to physicochemical properties.
However, there are many areas where these techniques could not be used, or where they failed to provide useful correlations, because of their limitations: they do not consider the three-dimensional geometry of molecules (different stereoisomers/enantiomers), and they cannot handle molecules from non-congeneric series or molecules acting through different mechanisms.
Some of these problems were addressed by extensions to the Hansch method in the combined approach and by the development of alternative approaches to QSAR.
Basis of Hypothetical Active Site Lattice (HASL) approach
HASL: Hypothetical Active Site Lattice approach for 3D-QSAR
HASL atom type (H) definitions based on MM2 atom types
MM2 type | H | Atom | Type
1 | 0 | C | sp3 alkane
2 | 0 | C | sp2 alkene
3 | -1 | C | sp2 carbonyl
4 | 0 | C | sp acetylene
5 | 0 | H | hydrogen
6 | +1 | O | COH, COC
7 | +1 | O | C=O carbonyl
8 | +1 | N | sp3
9 | +1 | N | sp2
10 | +1 | N | sp
11 | +1 | F | fluoride
12 | +1 | Cl | chloride
13 | +1 | Br | bromide
14 | +1 | I | iodide
15 | +1 | S | sulfide
16 | -1 | S+ | sulfonium
17 | -1 | S | sulfoxide
18 | -1 | S | sulfone
19 | 0 | Si | silane
21 | -1 | H | OH, alcohol
22 | 0 | C | cyclopropyl
23 | -1 | H | NH, amine
24 | -1 | H | COOH, carboxyl
25 | +1 | P | phosphine
26 | -1 | B | trigonal boron
27 | 0 | B | tetrahedral boron
28 | -1 | H | vinyl hydrogen
37 | +1 | N | imine nitrogen
An expert system developed to represent, elucidate, and utilize knowledge on
structure-activity relationships.
Can be used to build 3D-SAR and 3D-QSAR models which can be used for activity
classification and prediction.
Emulates the intelligence of a researcher engaged in establishing relationships
between a compound's structural parameters and its activity
The corner-stone of the Apex-3D methodology is the automated identification of biophores (pharmacophores).
These biophores can be used for building qualitative activity prediction rules and for creating search queries to identify new leads in a 3D database.
Identified biophores can be used as starting points for constructing 3D-QSAR models when good quantitative data are available.
Combination of a 3D pharmacophore with a quantitative regression equation is
unique to the Apex-3D approach
Apex – 3D: Pharmacophore and 3D QSAR analysis program
A descriptor center represents a part of the hypothetical biophoric moiety capable of interacting with
a receptor.
Descriptor centers can be either atoms or pseudoatoms which can participate in ligand-receptor
interactions based on the following types of physical properties:
Electrostatic interactions
Hydrogen bonds
Charge-transfer complexes
Hydrophobic interaction
van der Waals (or London) dispersion forces
Biophore Identification Algorithm
Automated identification of biophores in Apex-3D incorporates the following elements:
1. Structural elements:
pharmacophoric centers which interact with receptors,
electronic and structural indexes quantifying ligand-receptor interaction effects,
distance relationships between pharmacophoric centers forming unique recognizable patterns.
2. Statistical criteria: assessing the probability of correct activity prediction for compounds possessing a certain biophore.
Chemical Structure Representation
Flow Chart of the methodology
Structure input (2D/3D molecule editor) → force-field selection and assignment of the charges → optimization (Discover module) → calculation of the parameters
Electronic parameters: MOPAC; non-electronic parameters: simplified modules
Parameters: ACC_01, DON_01, Charge, HOMO, LUMO, Pi-population, Formal_charge, Hybrid_type, LP, Hydrophobicity, Hydro_region, Refractivity
→ automated molecular superimposition and identification of the biophore → building and selecting 3D-QSAR models using the biophore as a template → validation of the selected models (test-set predictions, statistical criteria)
Easy-to-use interface
Up to 255 conformers per molecule, within an energy range of 20 kcal/mol
Models covering as much conformational space as possible
3D hypotheses explain the variability of K_m and K_i, a measure of the properties of the active site
Hypothesis features are related to structural features, e.g. HBA, hydrophobe etc. (Catalyst chemical functions)
Catalyst: Predictive hypothesis & common feature hypothesis
Flow Chart of the methodology
Structure input (2D/3D molecule editor) → optimization (CHARMm force field) → generation of conformational models (poling algorithm; best or fast method) → generation of hypotheses
HypoGen: generates pharmacophores that can explain variations in activity. The structure-activity analysis is based on the fact that inactive molecules cannot map to all the features of the hypothesis, whereas active molecules can and are hence estimated to be active.
HipHop: produces common-feature hypotheses, in which relative activity is not taken into account. Chemically diverse and flexible molecules can be aligned on a wide range of structural features (HBA, HBD), hydrophobic regions and user-defined regions. These alignments can be used as starting points for 3D-QSAR studies.
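The "maps to all features" test used by HypoGen-style analysis can be illustrated with a toy check: a hypothesis is a list of typed feature points with tolerance spheres, and a molecule maps a feature if it has a feature of the same type inside that sphere. This is a simplified sketch under loud assumptions: the hypothesis, coordinates and radii are hypothetical, and the conformer is taken as already aligned (Catalyst's own conformer search and fitting are not reproduced).

```python
from math import dist

# Hypothetical 3-feature hypothesis: (feature type, centre in Å, tolerance radius in Å)
HYPOTHESIS = [
    ("HBA", (0.0, 0.0, 0.0), 1.5),
    ("HBD", (4.0, 0.0, 0.0), 1.5),
    ("HYD", (2.0, 3.5, 0.0), 1.6),
]


def mapped_features(hypothesis, molecule_features):
    """Number of hypothesis features matched by a molecule feature of the same
    type lying inside the tolerance sphere. The conformer is assumed to be
    already aligned to the hypothesis frame (no fitting is done here)."""
    return sum(
        any(t == f_type and dist(xyz, centre) <= radius for t, xyz in molecule_features)
        for f_type, centre, radius in hypothesis
    )


active = [("HBA", (0.3, 0.2, 0.0)), ("HBD", (3.8, -0.4, 0.1)), ("HYD", (2.2, 3.0, 0.5))]
inactive = [("HBA", (0.1, 0.0, 0.0)), ("HBD", (4.2, 0.3, 0.0)), ("HYD", (6.0, 5.0, 0.0))]
for name, feats in (("active", active), ("inactive", inactive)):
    n = mapped_features(HYPOTHESIS, feats)
    print(name, n, "of", len(HYPOTHESIS), "features mapped")
```

Here the "inactive" molecule has a hydrophobic group, but in the wrong place, so it maps only two of the three features.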
(CoMFA): Comparative Molecular Field Analysis for 3D-QSAR
Basis of CoMFA
Interactions responsible for binding are usually noncovalent in nature
Treatment of noncovalent (nonbonded) interactions using only steric and electrostatic forces can account for a variety of molecular properties
Richard Cramer proposed that biological activity could be analyzed by relating the
shape-dependent steric and electrostatic fields for molecules to their biological activity
CoMFA Approach
Define alignment rules for the series which overlap the putative pharmacophore for each
molecule; the active conformation and alignment rule must be specified
Each molecule is placed in a three-dimensional grid by the program, and the electrostatic and steric components of the molecular mechanics force field, arising from interaction with a probe atom (e.g., an sp3 C atom), are calculated at the intersecting lattice points of the 3D grid
The equations which result from this exercise have the form
$\text{Act}_i = c + \sum_{p} a_p\, S_{i,p} + \sum_{p} b_p\, E_{i,p}, \qquad i = 1, \dots, n$
where $S_{i,p}$ and $E_{i,p}$ are the steric and electrostatic field values of molecule i at lattice point p; the coefficients $a_p$ and $b_p$ are common to all n molecules and are obtained by PLS.
In CoMFA, molecules are represented and compared by their steric and electrostatic fields sampled at the intersections of one or more lattices (or grids, or boxes) spanning a three-dimensional region.
Thus each CoMFA descriptor column of a QSAR MSS contains the magnitudes of either the steric or the electrostatic field exerted by the atoms in the tabulated molecules on a probe atom located at a point in Cartesian space.
The contours of the steric map are shown in yellow and green,
and those of the electrostatic map are shown in red and blue.
Greater values of 'Bio-Activity Measurement' are correlated with:
more bulk near green;
less bulk near yellow;
more positive charge near blue,
more negative charge near red
Steric and Electrostatic Maps
CoMFA, CoMSIA and Adv. CoMFA
Molecular interaction fields calculated:
CoMFA fields: Tripos standard (steric & electrostatic), hydrogen-bonding fields, and the Adv. CoMFA indicator and parabolic fields
CoMSIA fields: steric; electrostatic; hydrophobic; hydrogen-bond acceptor; hydrogen-bond donor; combined steric & electrostatic; combined hydrogen-bond acceptor & donor
Four different CoMFA fields and seven different CoMSIA fields were generated, and the PLS algorithm was used to relate these fields to the histamine H1 antagonistic activity.
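Shapes of various functions
CoMFA calculates steric fields using a Lennard-Jones potential and electrostatic fields using a Coulombic potential.
Both potential functions are very steep near the van der Waals surface of the molecule, causing rapid changes in surface descriptions.
A further scaling factor is applied to the steric field.
Steric fields, Lennard-Jones potential:
$E_{\text{steric}} = \sum_{j=1}^{n_{\text{atoms}}} \varepsilon_{jk}\left[\left(\frac{r_j + r_{\text{probe},k}}{r_{jk}}\right)^{12} - 2\left(\frac{r_j + r_{\text{probe},k}}{r_{jk}}\right)^{6}\right]$
Electrostatic fields, Coulomb potential:
$E_{\text{elec}} = \sum_{j=1}^{n_{\text{atoms}}} \frac{q_j\, q_{\text{probe},k}}{r_{jk}}$
where $r_{jk}$ is the distance between atom j and probe k, $r_j$ and $r_{\text{probe},k}$ are their van der Waals radii, $\varepsilon_{jk}$ is the well depth, and $q_j$, $q_{\text{probe},k}$ are the charges.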
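The two field expressions can be evaluated on a lattice with a short script, which also shows where the CoMFA descriptor columns come from. This is a sketch, not Sybyl's implementation. The probe settings (radius 1.53 Å, charge +1) follow the probe-atom definitions given for the standard fields. The single well depth, the plain Coulomb term without a dielectric model, the ±30 kcal/mol cap, the grid geometry and the two-atom fragment are all assumptions made for illustration.

```python
import math
from itertools import product

COULOMB = 332.0636  # converts e^2/Å to kcal/mol
CUTOFF = 30.0       # kcal/mol cap applied to both fields (assumed value)


def clip(e):
    """Truncate an energy to the interval [-CUTOFF, +CUTOFF]."""
    return max(-CUTOFF, min(CUTOFF, e))


def probe_energies(atoms, point, r_probe=1.53, q_probe=1.0, eps=0.1):
    """Steric (Lennard-Jones) and electrostatic (Coulomb) energy of the probe at
    one lattice point. atoms: list of (xyz, van der Waals radius, partial charge).
    A single well depth eps (kcal/mol) for every atom is a simplifying assumption."""
    steric = elec = 0.0
    for xyz, r_j, q_j in atoms:
        r = max(math.dist(xyz, point), 1e-3)   # avoid division by zero on an atom centre
        ratio = (r_j + r_probe) / r
        steric += eps * (ratio ** 12 - 2 * ratio ** 6)
        elec += COULOMB * q_j * q_probe / r
    return clip(steric), clip(elec)


def comfa_row(atoms, origin=(-4.0, -4.0, -4.0), n=5, spacing=2.0):
    """One molecule's descriptor row: steric values at every lattice point,
    followed by electrostatic values at the same points."""
    points = [tuple(o + spacing * i for o, i in zip(origin, idx))
              for idx in product(range(n), repeat=3)]
    energies = [probe_energies(atoms, pt) for pt in points]
    return [s for s, _ in energies] + [e for _, e in energies]


# Hypothetical two-atom fragment with opposite partial charges
fragment = [((0.0, 0.0, 0.0), 1.52, -0.40), ((1.2, 0.0, 0.0), 1.70, 0.40)]
row = comfa_row(fragment)
print(len(row), min(row), max(row))
```

The cap matters: without it, lattice points inside the van der Waals surface would dominate the PLS analysis, which is exactly the steepness problem noted above.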
CoMSIA fields: similarity indices
Probe atoms
CoMFA & CoMSIA standard (steric and electrostatic fields): probe atom = sp3-hybridized C+, effective van der Waals radius = 1.53 Å, charge = +1
CoMSIA hydrophobic fields: the atomic values are directly based on the research of Viswanadhan et al.
CoMFA indicator fields (Adv. CoMFA): created by converting continuous data to discrete values
CoMFA parabolic fields (Adv. CoMFA): created by squaring the original field at each lattice point, but retaining the sign of the original field
CoMFA & CoMSIA (donor & acceptor) hydrogen-bonding fields (Adv. CoMFA): probe atom = H2O, effective van der Waals radius = 1.7-1.8 Å, charge = 0
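The field transformations described above, plus the Gaussian form that CoMSIA uses in place of the steep Lennard-Jones and Coulomb functions, can be sketched as simple functions over lattice values. Assumptions to note: the indicator thresholds are hypothetical user choices, and the attenuation factor α = 0.3 and the sign convention of the similarity index are taken as illustrative defaults rather than as any program's exact settings.

```python
import math


def parabolic(field):
    """Parabolic field: square each lattice value but keep its original sign."""
    return [v * abs(v) for v in field]


def indicator(field, low, high):
    """Indicator field: -1 below low, +1 above high, 0 in between.
    low and high are hypothetical user-chosen thresholds."""
    return [-1 if v < low else (1 if v > high else 0) for v in field]


def comsia_index(atoms, point, w_probe=1.0, alpha=0.3):
    """CoMSIA-style similarity index at one lattice point:
    A = -sum_i w_probe * w_i * exp(-alpha * r_i^2), where w_i is an atomic
    property (e.g. a partial charge or hydrophobicity value) and r_i the
    distance from atom i to the lattice point."""
    return -sum(w_probe * w * math.exp(-alpha * math.dist(xyz, point) ** 2)
                for xyz, w in atoms)


field = [-12.0, -0.5, 0.0, 3.0, 25.0]
print(parabolic(field))             # [-144.0, -0.25, 0.0, 9.0, 625.0]
print(indicator(field, -1.0, 1.0))  # [-1, 0, 0, 1, 1]
atom = [((0.0, 0.0, 0.0), 0.5)]
print([round(comsia_index(atom, (d, 0.0, 0.0)), 3) for d in (0.0, 1.0, 2.0)])
```

Unlike the Lennard-Jones and Coulomb terms, the Gaussian stays finite at the atom centre, so no energy cutoff is needed for the similarity indices.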
HQSAR - HQSAR works by identifying patterns of substructural fragments relevant to biological
activity in sets of bioactive molecules.
Differs from other similar concepts, such as maximal common sub-graph algorithms and the Stigmata algorithm, in that HQSAR yields a predictive relationship between structural features (descriptors) in the data set and biological activity using PLS
Descriptors: a molecular hologram is an array containing counts of molecular fragments
Figure: the process of hologram generation
HQSAR Overview
Advantages of HQSAR: a 2D approach that does not require 3D modelling and alignment, and yet carries 3D information.
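The idea of a molecular hologram, fragment counts folded into a fixed-length array, can be sketched with linear atom paths on a hydrogen-suppressed graph. This is a hypothetical simplification: HQSAR's actual fragment types, fragment distinctions (atoms, bonds, connections, hydrogens, chirality, donor/acceptor) and hashing scheme are not reproduced, and the path lengths (1-4 atoms), the CRC32 hash and the hologram length of 53 are assumptions.

```python
import zlib


def linear_fragments(elements, bonds, min_len=1, max_len=4):
    """All simple linear paths of min_len..max_len atoms, one entry per distinct
    path (so repeated substructures are counted each time they occur). Each path
    is written as an element string, read in whichever direction sorts first."""
    nbrs = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        nbrs[a].add(b)
        nbrs[b].add(a)
    paths = set()

    def extend(path):
        if len(path) >= min_len:
            paths.add(min(tuple(path), tuple(reversed(path))))  # A..B and B..A are one path
        if len(path) < max_len:
            for nxt in nbrs[path[-1]] - set(path):
                extend(path + [nxt])

    for start in range(len(elements)):
        extend([start])
    fragments = []
    for path in sorted(paths):
        fwd = "-".join(elements[i] for i in path)
        rev = "-".join(elements[i] for i in reversed(path))
        fragments.append(min(fwd, rev))
    return fragments


def hologram(fragments, length=53):
    """Fold fragment counts into a fixed-length integer array via a CRC32 hash."""
    bins = [0] * length
    for frag in fragments:
        bins[zlib.crc32(frag.encode()) % length] += 1
    return bins


# Ethanol, hydrogen-suppressed: C0-C1-O2
frags = linear_fragments(["C", "C", "O"], [(0, 1), (1, 2)])
print(frags)                 # ['C', 'C-C', 'C-C-O', 'C', 'C-O', 'O']
print(sum(hologram(frags)))  # 6
```

Each hologram bin then becomes one PLS descriptor column, which is why no 3D alignment is needed.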
Hologram QSAR
Conclusion
The development of QSAR over the last 40 years has advanced both in descriptor generation and in data analysis, augmented by the improved performance of computers for simulation and 3D visualization. It has reached a stage where it can be used as an alternative for both lead identification and optimization. It provides a powerful tool for virtual screening and complements the current techniques of combinatorial chemistry and high-throughput screening in drug discovery research.
Thank you