Basic Principles & Methodologies of 2D & 3D QSARs

Dr. A. K. Yadav

Assistant Professor-Chemistry

Maharana Pratap Govt. P.G. College, Hardoi

Drug Discovery Research: Paradigm shifts Molecular Biology

Target ID

Limited number of genes

Molecular biology techniques

Target validation

Cell and Tissue culture

Mouse knockouts

Target Identification

Cell and Tissue culture

Mouse knockouts

Discovery

Syntheses & Screening

Parallel synthesis for library design

Assay development for HTS

High Throughput Screening (HTS)

Chemical Optimization

Bench synthesis

Parallel synthesis

Development

Pre-clinical

Animal testing

Clinical

Patient trials

Post-genomic era

Pre-genomic era

Perspectives from Boston Consulting Group

Target validation

Large numbers of genes

Industrialized techniques

(e.g. gene chip expression)

Bioinformatics

(e.g. database searches for homologies)

Biology

Syntheses & Screening

Structure biology (target structure)

SAR profiling of library (library design)

Assay development for LTS

Virtual screening and LTS

Chemical Optimization

In-silico supported bench synthesis

In-silico early ADME/tox

Discovery

Pre-clinical (ADME/tox)

Animal testing

In silico ADME/tox

In vitro toxicology

Surrogate markers

Patient trials

Surrogate markers

Clinical

Development

Drug Discovery: Cost to Drug & Time to Drug

[Figure: realizable results based on discovery approach. Bar charts of Cost to Drug ($M, axis 0 to 1000) and Time to Drug (years, axis 0 to 20) for the pre-genomics and post-genomics eras, comparing the genetics-based approach, the chemical-genomics-based approach, and the weighted average of approaches across the value chain. Bar labels: 880, 780, 435, ~300, 14.7, 6-8, 11.3, 15.8. Data from Boston Consulting Group.]

The evolution of Drug Research

Time | Materials | Test systems
Ancient times | Plants, venoms, minerals (natural products) | Humans
- 1806 | Morphine |
- 1850 | Chemicals |
- 1890 | Synthetics, dyes | Animals
- 1920 | Synthetics, dyes | Animals, isolated organs
- 1970 | Synthetics, dyes | Enzymes, membranes
- 1990 | Combinatorial libraries | Human proteins, HTS
- 2000 onwards | Focused libraries | uHTS, virtual screening

Overview of lead identification (Current Paradigm)

Virtual chemistry space: 10^100

Easy to handle: 10^10

General Screening Library: 10^6

Focused Library: 10^4

Lead candidates: 5

Drug-Likeness Filter

Combi- Chemistry

HTS, QSAR, Docking & Scoring

Optimization ADME/Tox

(Absorption, Distribution, Metabolism, Excretion)

Issues in lead identification & optimization

Issue | HTS/Combichem | Virtual screening
Robust assay/screening protocol | Establishment takes time and money | Relatively easy
Diversity | Relatively low diversity | High diversity achieved by design
Drug-like compounds | Time-consuming | Filtered out using ADME parameters
Amount of compound | Only very small amounts are synthesized, and synthesis and replenishment are costly | Virtual molecules, bottomless resources

In silico Drug Discovery - Virtual screening

Virtual screening is the use of high-performance computing to analyze large databases of chemical compounds in order to identify possible drug candidates. It is a technology that complements current advances in high-throughput chemical synthesis and biological assay.

Target structure-based approaches

Protein-ligand docking

Active site-directed pharmacophores

Molecular fingerprints

Keyed 2D fingerprints (each bit position is associated

with a specific chemical feature)

Hashed 2D fingerprints (properties are mapped to

overlapping bit segments)

Multiple-point 3D pharmacophore fingerprints
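To make the keyed-fingerprint idea concrete, here is a minimal Python sketch. The feature keys and the two molecules are hypothetical, and the Tanimoto coefficient is used as an assumed (though common) way of comparing two bit strings; the slides themselves do not name a similarity measure.

```python
def keyed_fingerprint(features, key_list):
    """Keyed fingerprint: bit i is 1 if the feature key_list[i] is present."""
    present = set(features)
    return [1 if key in present else 0 for key in key_list]


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: bits set in both / bits set in either."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0


# Hypothetical key list: each bit position stands for one chemical feature.
KEYS = ["aromatic_ring", "carboxylic_acid", "amine", "hydroxyl", "halogen"]

mol_a = keyed_fingerprint(["aromatic_ring", "carboxylic_acid", "hydroxyl"], KEYS)
mol_b = keyed_fingerprint(["aromatic_ring", "hydroxyl", "halogen"], KEYS)
similarity = tanimoto(mol_a, mol_b)
```

In a screening setting, a database would be ranked by such a similarity to a known active; a hashed fingerprint differs only in that features are mapped to bit positions by a hash function rather than by a fixed key list.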

Virtual screening methods

Compound classification techniques

Cluster analysis

Cell-based partitioning (of compounds into sub-sections

of n-dimensional descriptor space)

3D/4D-QSAR models

Statistical methods

Binary QSAR/QSPR

Recursive partitioning

Molecule-based queries

2D substructures

3D pharmacophores

Complex molecular descriptors (e.g., electrotopological)

Volume- and surface-matching algorithms

Virtual screening methods continued…

Protein-ligand docking

o The most promising route available for determining which

molecules are capable of fitting within the very strict

structural constraints of the receptor binding site and to

find structurally novel leads.

o The most valuable source of data for understanding the

nature of ligand binding in a given receptor

Active site-directed pharmacophores

o A pharmacophore-based method that also uses the geometry of the active site (for enzyme inhibitors), represented by 'excluded volume' features.

o Produces an optimised pharmacophore with improved predictivity compared with the corresponding pharmacophore derived without receptor information.

[Figure: pharmacophore with excluded volumes.] Greenidge et al., J. Med. Chem. 1998, 41, 2503

Virtual screening: Target structure based approaches

Problems: High-throughput screening & docking

HTS: Compounds are tested rapidly, and inhibition or activation can be missed because of faults in the assay. Some compounds appear to inhibit only because they interfere with properties of the assay.

Docking: The scoring functions are inaccurate, the sampling of conformational states is crude, and solvent-related terms are typically ignored. However, docking can reliably screen out compounds that do not fit in a binding site or that have the wrong electrostatic potential, and it can be applied to virtual compounds that have not yet been synthesized.

In general, both HTS and docking suffer from false positives and false negatives.

Bottlenecks in virtual screening: Target structure based approaches

500 current drug targets: receptors (GPCRs) 45%, enzymes 28%, hormones & factors 11%, unknown 7%, ion channels 5%, nuclear receptors 2%, DNA 2%

Among the current drug targets, structures are known only for a small group ( ~10 %)

The Human Genome project is expected to yield over 100,000 proteins, out of which some 5,000 new targets are predicted to be found - with unknown structures !

[Figure: human genome (~100,000 proteins); druggable genome (~10,100); disease-modifying genes (~10,100); drug targets (~5,000).]

The General QSPR/QSAR Problem

Basic assumption: molecular structure determines its property.

[Diagram: MOLECULAR STRUCTURES → representation (0D – 3D) → STRUCTURAL DESCRIPTORS → variable selection and modelling → PROPERTIES. The direct route from structures to properties is marked X: usually not available, for lack of target structures. Also labelled: de novo approach.]

Alternatives in virtual screening: Ligand- based approaches

Galileo Galilei: In order to introduce order into the universe man must pay attention to the quantitative aspects of his surroundings and discover the existing mathematical relationships between them.

Crum-Brown and Fraser (1865 – 70) postulated that the biological response (BR) of a molecule is a function of its chemical constitution (C):

Biological Response = Function(Chemical Constitution), i.e. BR = f(C)

Richet (1893) demonstrated that the activity of narcotics was inversely related to their water solubility.

Overton and Meyer (1890s) related tadpole narcosis activity to the olive oil/water partitioning of the chemicals.

Ferguson postulated that thermodynamic principles could be related to drug activities: the equilibrium between bound and free states is related to the energies associated with the binding interaction and with desolvation of the drug and the receptor.

Motivation and Historical Aspects of QSAR

Historical Aspects of QSAR

1964 - Hansch and Fujita

- Fragment and additive group contribution theory
- The hypothesis was that substituents on a parent molecule have a quantitative relationship with biological activity
- Treated biological activity with the techniques of physical organic chemistry
- Used substituent constants: Hammett's σ and the hydrophobic constant π

$\log(1/[C]) = A(\log P) - B(\log P)^2 + C(E_s) + D(\rho\sigma) + E + \dots$

1964 - Free and Wilson

Related biological activity to the presence or absence of a specific functional group at a specific location on the parent molecule

$\text{Activity} = A + \sum_i \sum_j G_{ij} X_{ij}$

A was defined as the average biological activity for the series, $G_{ij}$ as the contribution to activity of functional group i in the jth position, and $X_{ij}$ as the presence (1.0) or absence (0.0) of functional group i in the jth position.

Spencer M. Free and James W. Wilson, J. Med. Chem., 7, 395 (1964).

Hansch, C.; and Fujita T.; J.Am.Chem. Soc. 1964, 86, 1616
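As an illustration of the Free-Wilson additivity assumption, the following Python sketch evaluates Activity = A + Σ G_ij X_ij for one analogue. The series average and the group contributions are hypothetical numbers chosen for the example, not values from any published data set; in practice the G_ij would be obtained by regression over the whole series.

```python
def free_wilson_activity(mean_activity, contributions, substituents):
    """Activity = A + sum of G_ij for the groups actually present (X_ij = 1).

    contributions maps (group, position) -> G_ij;
    substituents maps position -> group present in this analogue.
    """
    return mean_activity + sum(
        contributions[(group, position)]
        for position, group in substituents.items()
    )


# Hypothetical series: average activity A and group contributions G_ij.
A = 5.0
G = {
    ("Cl", "R1"): 0.5, ("H", "R1"): -0.5,
    ("CH3", "R2"): 0.25, ("H", "R2"): -0.25,
}

analogue = {"R1": "Cl", "R2": "CH3"}
predicted = free_wilson_activity(A, G, analogue)
```

Groups that are absent contribute nothing because their X_ij is 0, which is why only the substituents present in the analogue are summed.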

1972 - John Topliss

– Topliss tree, a non-statistical method to automate the Hansch approach

1979 - Marshall

- Active Analog Approach - extended the 2-D approach to QSAR by explicitly considering the conformational flexibility of a series as reflected by their 3-D shape; implemented in the software Sybyl

1980 - Hopfinger and co-workers

-molecular shape analysis -Related the shape-dependent steric and electrostatic fields for molecules to their biological activity

1988 - Richard Cramer

-CoMFA (Comparative Molecular Field Analysis) -related the shape-dependent steric and electrostatic fields for molecules to their biological activity -Introduced a new method of data analysis: PLS (Partial Least Squares) and cross-validation, to develop models for activity predictions


John G. Topliss, J. Med. Chem., 15, 1006 (1972).

G.R. Marshall, C.D. Barry, H.E. Bosshard, R.A.. Dammkoehler and D.A. Dunn, ACS Symposia, 112, American Chemical Society, Washington D.C., 1979

Anton J. Hopfinger, J. Amer. Chem. Soc., 102, 7196 (1980).

Richard D. Cramer III, David E. Patterson and Jeffrey D. Bunce, J. Amer. Chem. Soc., 110, 5959 (1988).

1988 – Arthur M. Doweyko

- HASL: The Hypothetical Active Site Lattice

- a superposed set of molecules into a set of regularly-spaced points (lattice) defined by Cartesian coordinates (x,y,z) and atom type

biological activity = F (Lattice points)

1990 – V. E. Golender and A. B. Rozenblit

-APEX-3D expert system -Relates biological activity to biophoric (pharmacophoric) and secondary sites using statistical techniques and 3-D pattern matching algorithms

1994 - Greene, J.; Kahn, S.; Savoj, H.; Sprague, P.; Teig, S

- derives a hypotheses which consist of a set of generalized chemical functions (regions of hydrophobic surface, hydrogen bond vectors, charge centers, or other user-defined features) at specified relative positions


A.M. Doweyko, J. Med. Chem. 1988, 31, 1396-1406.

V. E. Golender and A. B. Rozenblit, Drug Design, Vol IX, Academic Press (1980). , V. E. Golender and A. B. Rozenblit, Research Studies Press, UK (1983). , V. E. Golender and E. R. Vorpagel, ESCOM Science Publishers, Netherlands (1993).

Greene, J.; Kahn, S.; Savoj, H.; Sprague, P.; Teig, S. J. Chem. Inf. Comput. Sci., 1994, 34, 1297-1308.

1994 - D. Rogers and A. J. Hopfinger

- Genetic Function Approximation (GFA)

- Genetic algorithm for variable selection and MARS approach for data analysis

1994 - Klebe, G., Abraham, U., Mietzner, T.

- Comparative molecular similarity indices analysis (CoMSIA)
- An alternative approach to the computation of molecular potential fields
- The indices replace the distance functions of the Lennard-Jones and Coulomb-type potentials with Gaussian-type functions

2000 - Tripos

- Hologram QSAR

- Uses an extended form of fingerprint, known as a molecular hologram, which encodes more information than a conventional fingerprint and implicitly encodes 3D structural information


US patent 6208942 & US patent 5751605

D. Rogers and A. J. Hopfinger, J. Chem. Inf. Comp. Sci., 34, 854-866 (1994).

Klebe, G., Abraham, U., Mietzner, T., J. Med. Chem. 1994, 37, 4130-4146.

Classical 2D QSAR

Started in 1964 independently by two groups.

Hansch analysis (LFER approach)

Basis: D/S + E/R ⇌ [D:R or E:S] → → → Biological Response (BR)

$BR \propto [D{:}R] \propto \Delta G_{DR}$, and $\Delta G^{\circ} \propto \log(1/[D]) = \log(1/C)$

$\Delta BR = \Delta G_h + \Delta G_e + \Delta G_s + \Delta G_c + \text{constant}$

$\log(1/C) = \Delta G_h + \Delta G_e + \Delta G_s + \Delta G_c + \text{constant}$

$\log(1/C) = a(\log P)^2 + b \log P + c\sigma + dE_s + \text{constant}$

Free-Wilson analysis

Basis: the mathematical (additive) contribution of each chemical substituent to structure-activity studies

$\log(1/C) = \sum_i a_i + \mu$

Combined approach

Basis: part 'A' uses the physicochemical parameters of Hansch analysis, and part 'B' uses indicator variables based on the Free-Wilson assumption

$\log(1/C) = a_1(\log P)^2 + a_2 \log P + a_3\sigma + a_4 E_s + a_5 I_1 + a_6 I_2 + a_7 + \text{constant}$

Here the physicochemical terms ($a_1$ to $a_4$) form part A and the indicator-variable terms form part B.

Steps in QSAR

Structure Entry & Molecular Modeling → Descriptor Generation → Feature Selection → Construct Model → Model Validation

Steps in QSAR: Structure Entry & Molecular Modeling

Structures are sketched using standard drawing software (commercial or freeware)

Molecular modelling is used to generate low-energy conformations

Method | Molecule size | Accuracy and cost | Software
Ab initio | Very small molecules | Highly accurate; high computational cost | Gaussian, Zindo, DGauss, DMol
Semi-empirical | Medium-sized molecules | Accurate but computationally intensive | MOPAC and AMPAC (MNDO, AM1, PM3 and PM5 Hamiltonians)
Molecular mechanics | No restriction on size | Accurate with proper conformational analysis | InsightII, Sybyl, HyperChem (CVFF, Tripos, CHARMm and other force fields)

Accuracy and computing time both increase from molecular mechanics to ab initio.

Steps in QSAR: Descriptor Generation

Physicochemical properties that describe some aspect of the chemical structure

Empirical descriptors: determined experimentally, e.g. melting point, NMR chemical shift, and Hansch's physicochemical parameters, viz. log P, MR, pKa, σ, ...

Theoretical descriptors: calculated, e.g. topological, BCUT, ...

Some of the experimentally determined parameters:

$P = C_{org}/C_{aq}$, reported as $\log P$

$\pi_X = \log P_{R-X} - \log P_{R-H}$

$\sigma_X = \log K_X - \log K_H$

$E_s = \log K_R - \log K_H$

$\Delta G = -RT \ln K$

$\log K = -\Delta G/(RT \ln 10)$
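The relationships ΔG = −RT ln K and π_X = log P(R-X) − log P(R-H) can be checked numerically with a short Python sketch. The temperature of 298.15 K and the two log P values are assumptions made for the example only.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)


def delta_g_from_k(k, temperature=298.15):
    """Free energy change in kJ/mol from an equilibrium constant: dG = -RT ln K."""
    return -R_GAS * temperature * math.log(k)


def log10_k_from_delta_g(delta_g, temperature=298.15):
    """Inverse relation: log K = -dG / (RT ln 10)."""
    return -delta_g / (R_GAS * temperature * math.log(10))


def pi_constant(log_p_substituted, log_p_parent):
    """Hydrophobic substituent constant: pi_X = log P(R-X) - log P(R-H)."""
    return log_p_substituted - log_p_parent


per_log_unit = delta_g_from_k(10.0)            # free energy per tenfold change in K
round_trip = log10_k_from_delta_g(delta_g_from_k(1.0e6))
pi_example = pi_constant(2.75, 2.0)            # hypothetical log P values
```

At room temperature each tenfold change in K corresponds to roughly 5.7 kJ/mol, which is why log K (and hence log 1/C) is treated as a linear free-energy quantity.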


Theoretical Descriptors

0D descriptors: constitutional

1D descriptors: mostly indicator variables (functional groups, atom-centred fragments), empirical descriptors and properties

2D descriptors: topological, molecular walk counts, BCUT, Galvez topological indices, 2D autocorrelations

3D descriptors: aromaticity, molecular profiles, geometric, radial density functions, 3D-MoRSE, WHIM, GETAWAY and quantum chemical descriptors

Information content and computing time increase from 0D to 3D.


0D Descriptors

Simplest; include properties and numbers of atoms

1D Descriptors

Include substructures, hydrophobicity and MR

Hydrophobic parameters: the most commonly used are π, log P, and R_M

Molar refractivity, MR: proposed by Pauling and Pressman as a parameter for the correlation of dispersion forces in the binding of haptens to antibodies. It is determined from the refractive index, n, the molecular weight, MW, and the density of a crystal, d:

$MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}$


2D Descriptors: Topological descriptors

The structures of compounds can be represented as graphs, the objects studied in graph theory, a field of mathematics.

[Figure: isopentane and its graph representation, with atoms as nodes and bonds as edges.]

Isopentane = five nodes, four edges, and the adjacency relationships implicit in the structure

Theorems of graph theory → graph invariants (topological descriptors)

Examples

• Atom counts

• Molecular connectivity indices

• Substructure counts

• Molecular weight

• Weighted paths

• Molecular distance edges

• Kappa indices

• Electrotopological state indices

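Example: the path-1 molecular connectivity index is $^1\chi = \sum_{\text{edges}} 1/\sqrt{mn}$, where m and n are the degrees of the two adjacent vertices joined by each edge.

Thus $^1\chi(\text{2-methylbutane}) = 1/\sqrt{1 \cdot 3} + 1/\sqrt{1 \cdot 3} + 1/\sqrt{3 \cdot 2} + 1/\sqrt{2 \cdot 1} = 2.27$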
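The path-1 connectivity calculation above can be reproduced with a few lines of Python. This is a minimal sketch on a hydrogen-suppressed graph given as an edge list; the atom labels are arbitrary names chosen for the example.

```python
import math


def chi1(edges):
    """Path-1 molecular connectivity: sum over bonds of 1/sqrt(deg(a) * deg(b))."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in edges)


# Hydrogen-suppressed graphs: nodes are carbon atoms, edges are C-C bonds.
isopentane = [("C1", "C2"), ("C2", "C5"), ("C2", "C3"), ("C3", "C4")]
n_pentane = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C5")]

chi1_isopentane = chi1(isopentane)
chi1_n_pentane = chi1(n_pentane)
```

The branched isomer gives a lower value than the straight chain, which illustrates how this graph invariant encodes branching even though both molecules have the same atom count.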


3D Descriptors

Encode the 3-D aspects of the structures

Moments of inertia, solvent accessible surface area, length-to- breadth ratios, shadow areas, gravitational index.

Quantum Chemical Descriptors

Encode aspects of the structures that are related to the electrons. Electronic descriptors include the following: partial atomic charges, HOMO or LUMO energies, dipole moment.

Geometric Descriptors

Molecular field analysis (MFA) descriptors

-evaluate the energy between a probe and a molecular model at a series of points defined by a rectangular or spherical grid -Energies may be added to the study table to be used as independent X variables


Receptor surface analysis descriptors -Calculates energy of interaction between each point on the receptor surface and each model to the study table -Filtering methods available to reduce the input to the study table

Pharmacophoric descriptors

Used for the calculation of the 3D fingerprint or hypothesis for the molecules. Calculates all possible combinations of 2-10 features in 3D space for all conformers. The features considered are negative and positive charges, negative and positive ionizable groups, hydrogen bond donors and acceptors, hydrophobic groups and aromatic rings.

Feature Selection

Objective: identify the best subset of descriptors (information-rich and as small as possible)

Proceeds in two steps: (1) objective and (2) subjective

Objective (uses independent variables only): correlations, identical tests, vector-space descriptor analysis

Subjective (uses the dependent variable): interactive regression analysis, simulated annealing, genetic algorithm, partial least squares, principal component analysis

Steps in QSAR: Feature Selection and Modelling

Subjective Feature Selection

Hill-climbing and other gradient-type methods perform comparatively poorly on these types of problems.

Simulated annealing has proven to be thorough but not efficient; on larger problems it is penalized by this lack of efficiency, often failing to converge to the global optimum within feasible computing time.

Tabu search has identifiable methodological flaws which make it unsuitable for these types of problems.

Genetic algorithms are general methods which appear both thorough and efficient, often yielding excellent results when applied to complex optimization problems where other methods are either not applicable or turn out to be unsatisfactory.


Subjective Feature Selection: Difficult Problem Spaces

Multi-dimensional modelling involves exploring mathematical landscapes that are non-smooth, with cliffs and discontinuities in the response surface, that have practical constraints on the available options, and that feature multiple local optima which can trap these methods.

[Figure: example landscapes: hills, plateaus, rippled, Bryce Canyon, Levy's 'egg carton', hypercubes.]


Subjective Feature Selection: Search Space

The space of all feasible solutions (the set of solutions among which the

desired solution resides) is called search space (also state space).

Each point in the search space represents one possible solution and can be

"marked" by its value (or fitness) for the problem.

Looking for a solution is then equal to looking for some extreme value (minimum

or maximum) in the search space.

The problem is that the search can be very complicated as one may not know

where to look for a solution or where to start.

Some of these methods that search and find suitable solutions are hill climbing,

tabu search, simulated annealing and the genetic algorithm.

Global minima

Global maxima


Traditional techniques in feature selection

MLR : Multiple linear regression analysis

• Simple, linear models

• In case of large number of independent variables, the search space

becomes large and requires the use of variables selection

techniques (search techniques)

• The variables may be inter-correlated, resulting in an over-fitted model.

PCA : Principal component analysis

• Introduced to address the problem of intercorrelation and to reduce the number of variables

• Inter-correlated variables are extracted as components, and the extracted components become the new variables for regression analysis

PLS : Partial least squares

• Modification of the PCA technique in which the dependent variables are also extracted into new components, so as to maximize their correlation with the components extracted from the descriptors

• Has the additional advantage of modelling multiple dependent parameters


MLR based QSAR model development

MLR is the simplest paradigm for QSAR model development, which makes the model easier to interpret. However, QSAR modelling with a large number of variables amounts to searching a multi-dimensional space such as a hypercube. Hence the successful development of an MLR model depends on an effective search protocol.

Kubinyi, Quant. Struct.-Act. Relat. (1994)

Traditional techniques in feature selection


Exhaustive search: all possible subsets of m variables out of p, p!/(m!(p-m)!) in number, are examined to identify the best m variables. It is practically impossible for large values of p and m because of the high computational expense.

Forward selection: starts from the one-variable model giving the lowest cost function out of a total of p variables, then adds the remaining (p-1) variables incrementally so as to further minimize the cost function. The major drawback is that it often ends up in local minima because it fails to discover information about the combined effect of sets of features.

Backward selection: operates in the opposite direction, starting from the model with all p variables and eliminating the least significant variable at each step. The major limitation is that it is applicable only to data sets with p < n-1, where n is the number of compounds.

Stepwise multiple regression: a combination of forward selection of significant variables and backward elimination of variables below a certain significance level. However, this most widely used method also often ends up in local minima.

Classical strategies for MLR models
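A short Python sketch shows why exhaustive search becomes impractical: the number of m-variable subsets of p descriptors, p!/(m!(p-m)!), grows very quickly. The descriptor pool sizes below are arbitrary illustrative values.

```python
import math


def n_subsets(p, m):
    """Number of distinct m-variable models that can be built from p descriptors."""
    return math.comb(p, m)


def n_models_up_to(p, m_max):
    """Number of models with 1..m_max variables (what an exhaustive search must fit)."""
    return sum(math.comb(p, m) for m in range(1, m_max + 1))


five_of_20 = n_subsets(20, 5)
five_of_50 = n_subsets(50, 5)
five_of_100 = n_subsets(100, 5)
up_to_three_of_10 = n_models_up_to(10, 3)
```

Going from 20 to 100 candidate descriptors multiplies the number of five-variable models by several thousand, which is the motivation for the guided search strategies listed above.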


Flowchart of genetic algorithms

Start → Randomly create an initial population → Evaluate the fitness function → Are the optimization criteria met?

Yes → Best individuals → Result

No → Genetic operators (Selection → Crossover → Mutation) → new population, which is evaluated again

Recent techniques in feature selection: Hybrid GA-MLR


Recent strategy for MLR model: Hybrid GA-MLR

The genetic algorithm (GA) was introduced by John Holland.

It is a search paradigm inspired by natural evolution, in which the variables are represented as genes on a chromosome (model).

Like simplex optimization, it evolves a group of random initial models (the population) with fitness scores, and it searches for chromosomes with better fitness (response function scores) through natural selection and the genetic operators, mutation and recombination.

Natural selection guarantees the propagation of chromosomes with better fitness in future populations.

The GA combines genes from two parent chromosomes using the genetic recombination operator to form two new chromosomes (children) that have a high probability of having better fitness than their parents, and it also explores new regions of the response surface (local optima) through mutation.

GAs offer a combination of hill-climbing ability (natural selection) and a stochastic method (recombination and mutation). They are very flexible because they optimize on a representation of the variables, not on the variables themselves.

In addition, GAs provide efficient optimization, as they use implicit parallelism to process information quickly and require fewer response function evaluations than other automated numerical optimization algorithms.
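The GA loop described above (population, fitness, selection, crossover, mutation) can be sketched in plain Python. This is an illustrative toy, not the GA-MLR implementation used by any particular package: the fitness function is a hypothetical stand-in for a regression score, tournament selection and elitism are assumed design choices, and all parameter values are arbitrary.

```python
import random


def run_ga(fitness, n_genes, pop_size=30, generations=40,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve bit-string chromosomes; gene i = 1 means descriptor i is in the model."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]

    def pick():
        # Tournament selection: the fitter of two random chromosomes survives.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best model over unchanged
        while len(children) < pop_size:
            mother, father = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)  # one-point recombination
                pair = [mother[:cut] + father[cut:], father[:cut] + mother[cut:]]
            else:
                pair = [mother[:], father[:]]
            for child in pair:
                # Mutation: flip each gene with a small probability.
                children.append([1 - g if rng.random() < mutation_rate else g
                                 for g in child])
        population = children[:pop_size]
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical fitness: reward three "relevant" descriptors, penalize extras.
RELEVANT = {1, 4, 7}


def toy_fitness(chromosome):
    hits = sum(1 for i, g in enumerate(chromosome) if g and i in RELEVANT)
    extras = sum(1 for i, g in enumerate(chromosome) if g and i not in RELEVANT)
    return hits - 0.5 * extras


best_model, fitness_history = run_ga(toy_fitness, n_genes=10)
```

In a real GA-MLR run the fitness would be a regression statistic (for example a cross-validated score with a penalty on the number of terms), evaluated by fitting an MLR model on the descriptors selected by each chromosome.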


Steps in QSAR: Model Validation

Selection and Validation of QSAR models

The selection and validation of the QSAR model for virtual screening is of utmost importance and should conform to the following recommendations:

Careful selection of independent variables

Significance of the variables (Statistical parameters)

Principle of parsimony (Occam’s razor)

Minimum number of compounds per variable

Importance of the model that corroborates with known biophysical data.

Classical criteria for predictive ability of QSAR model

Measures that describe how well the data are fitted (n is the number of compounds, k the number of variables):

Correlation coefficient r (relative quality of fit)

$r = \sqrt{1 - \sum (y_{calc} - y_{obs})^2 / \sum (y_{obs} - y_{mean})^2}$

Standard deviation s (absolute quality of fit)

$s = \sqrt{\sum (y_{calc} - y_{obs})^2 / (n - k - 1)}$

F test (Fisher value; level of statistical significance)

$F = r^2 (n - k - 1) / (k (1 - r^2))$

Measures of internal predictivity:

Q², the squared cross-validated correlation coefficient

$Q^2 = 1 - \sum (y_{pred} - y_{obs})^2 / \sum (y_{obs} - y_{mean})^2$

s_PRESS, the standard deviation of the cross-validation predictions

$s_{PRESS} = \sqrt{\sum (y_{pred} - y_{obs})^2 / (n - k - 1)}$

Measure of external predictivity: test set predictions
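The fit and cross-validation statistics listed above can be computed with a small Python sketch. It assumes the simplest case of a one-descriptor linear model (k = 1) with leave-one-out cross-validation; the log P and activity values are hypothetical numbers invented for the example.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def qsar_statistics(x, y):
    """r2, s and F for the fitted model; Q2 and s_PRESS from leave-one-out."""
    n, k = len(x), 1
    slope, intercept = fit_line(x, y)
    y_mean = sum(y) / n
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_resid = sum((slope * xi + intercept - yi) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_resid / ss_total
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict it.
        b, a = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (b * x[i] + a - y[i]) ** 2
    return {
        "r2": r2,
        "s": math.sqrt(ss_resid / (n - k - 1)),
        "F": r2 * (n - k - 1) / (k * (1.0 - r2)),
        "q2": 1.0 - press / ss_total,
        "s_press": math.sqrt(press / (n - k - 1)),
    }


# Hypothetical data: log P values and observed log(1/C).
log_p = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
activity = [4.1, 4.6, 5.2, 5.4, 6.1, 6.5]
stats = qsar_statistics(log_p, activity)
```

For a least-squares model the leave-one-out Q² never exceeds the fitted r², so a large gap between the two is a warning sign of over-fitting.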


Current criteria for predictive ability of QSAR model

Cross-validation is an internal validation method in which all the calibration objects are used. Like prediction testing, it seeks to validate the model on independent test data, and hence it was widely accepted as the measure of the predictive ability of the model.

However, cross-validation fails for designed data sets and for small data sets. Thus a high LOO Q² appears to be a necessary but not a sufficient condition for the model to have high predictive power.

Hence the method which gives the most reliable estimate of the predictive ability of the model is external test set validation.

The test set must also fulfil the following:

The test set should be 25-50% of the total data set.

The test set must be representative of future samples.

The range of the test set should match that of the training set.

Golbraikh and Tropsha (2002) Journal of Molecular Graphics and Modelling, 20, 269-276.

Wold, et. Al. (1983). Proc. Conf. Matrix Pencils, (A. Ruhe and B. Kågström, eds.), March 1982. Lecture Notes in Mathematics

973, Springer Verlag, Heidelberg, pp 286-293

Wold, S. Technometrics, 20 (1978) p 397.


3D QSAR

The use of classical QSAR was expanded during the sixties as a means of

correlating observed activity to physicochemical properties.

However, there are many areas where these techniques could not be used or failed to provide useful correlations, because they do not consider the three-dimensional geometry of molecules (different stereoisomers/enantiomers), molecules from non-congeneric series, or molecules acting through different mechanisms.

Some of these problems were addressed by extensions to the Hansch method

in combined approach and the development of alternative approaches to

QSAR.

Basis of Hypothetical Active Site Lattice (HASL) approach

HASL: Hypothetical Active Site Lattice approach for 3D-QSAR

HASL type (H) definitions for MM2 atom types

MM2 | H | Atom | Type | MM2 | H | Atom | Type
1 | 0 | C | sp3 alkane | 15 | +1 | S | Sulfide
2 | 0 | C | sp2 alkene | 16 | -1 | S+ | Sulfonium
3 | -1 | C | sp2 carbonyl | 17 | -1 | S | Sulfoxide
4 | 0 | C | sp acetylene | 18 | -1 | S | Sulfone
5 | 0 | H | Hydrogen | 19 | 0 | Si | Silane
6 | +1 | O | COH, COC | 21 | -1 | H | OH, alcohol
7 | +1 | O | C=O carbonyl | 22 | 0 | C | Cyclopropyl
8 | +1 | N | sp3 | 23 | -1 | H | NH, amine
9 | +1 | N | sp2 | 24 | -1 | H | COOH, carboxyl
10 | +1 | N | sp | 25 | +1 | P | Phosphine
11 | +1 | F | Fluoride | 26 | -1 | B | Trigonal boron
12 | +1 | Cl | Chloride | 27 | 0 | B | Tetrahedral boron
13 | +1 | Br | Bromide | 28 | -1 | H | Vinyl hydrogen
14 | +1 | I | Iodide | 37 | +1 | N | Imine nitrogen

An expert system developed to represent, elucidate, and utilize knowledge on

structure-activity relationships.

Can be used to build 3D-SAR and 3D-QSAR models which can be used for activity

classification and prediction.

Emulates the intelligence of a researcher engaged in establishing relationships

between a compound's structural parameters and its activity

The corner-stone of the Apex-3D methodology is automated identification of

biophores (pharmacophores) .

These biophores can be used for building qualitative activity prediction rules and for

creating search queries to identify new leads in a 3D-database.

Identified biophores can be used as starting points for constructing 3D-QSAR

models when good quantitative data is available.

Combination of a 3D pharmacophore with a quantitative regression equation is

unique to the Apex-3D approach

Apex – 3D: Pharmacophore and 3D QSAR analysis program

A descriptor center represents a part of the hypothetical biophoric moiety capable of interacting with

a receptor.

Descriptor centers can be either atoms or pseudoatoms which can participate in ligand-receptor

interactions based on the following types of physical properties:

Electrostatic interactions

Hydrogen bonds

Charge-transfer complexes

Hydrophobic interaction

van der Waals (or London) dispersion forces

Biophore Identification Algorithm

Automated identification of biophores in Apex-3D incorporates the following elements:

1.Structural Elements:

pharmacophoric centers which interact with receptors,

electronic and structural indexes quantifying ligand-receptor interaction effects,

distance relationships between pharmacophoric centers forming unique recognizable

patterns.

2. Statistical Criteria: assessing the probability of correct activity prediction for compounds

possessing a certain biophore.

Chemical Structure Representation


Flow chart of the methodology

1. Structure input: 2D/3D molecule editor
2. Force field selection and assignment of the charges
3. Optimization: Discover module
4. Calculation of the parameters: electronic parameters (MOPAC) and non-electronic parameters (simplified modules). Parameters listed: ACC_01, DON_01, Charge, HOMO, LUMO, Pi-population, Formal_charge, Hybrid_type, LP, Hydrophobicity, Hydro_region, Refractivity
5. Automated molecular superimposition and identification of the biophore
6. Building and selecting the 3D QSAR model using the biophore as a template
7. Validation of the selected models: test set predictions and statistical criteria


Easy-to-use interface

Up to 255 conformers per molecule, within an energy range of 20 kcal/mol

Models covering as much conformational space as possible

3D hypotheses explain the variability of K_m, K_i (measures of the properties of the active site)

Related to structural features, e.g. HBA, hydrophobe, etc. (Catalyst's chemical functions)

Catalyst: Predictive hypothesis & common feature hypothesis

Flow Chart of the methodology

Structure input-2D/3D molecule editor

Optimization - CHARMm forcefield

Generation of conformational models

Poling Algorithm

Best method Fast method

Generation of Hypothesis

HypoGen: generates pharmacophores which can explain variations in activity. Structure-activity analysis is based on the fact that inactive molecules cannot map to all the features of the hypothesis, while active molecules can and hence are estimated to be active.

HipHop: produces common feature hypotheses, where relative activity is not taken into account. Chemically diverse and flexible molecules can be aligned on a wide range of structural features (HBA, HBD), hydrophobic regions and user-defined regions. These alignments can be used as starting points for 3D QSAR studies.

Catalyst: Predictive hypothesis & common feature hypothesis

(CoMFA): Comparative Molecular Field Analysis for 3D-QSAR

Basis of CoMFA

Interactions responsible for binding are usually noncovalent in nature

Treatment of noncovalent (nonbonded) interactions using only steric and electrostatic forces can account for a variety of molecular properties

Richard Cramer proposed that biological activity could be analyzed by relating the

shape-dependent steric and electrostatic fields for molecules to their biological activity

CoMFA Approach

Define alignment rules for the series which overlap the putative pharmacophore for each

molecule; the active conformation and alignment rule must be specified

Each molecule is fixed into a three-dimensional grid by the program, and the electrostatic and steric components of the molecular mechanics force field, arising from interaction with a probe atom (e.g., an sp3 C atom), are calculated at intersecting lattice points within the 3-D grid

The equations which result from this exercise have the form

Act1 = Const1 + a1(stericxyz) + b1(stericxyz) + ... + a'1(estaticxyz) + b'1(estaticxyz) + ...

Act2 = Const2 + a2(stericxyz) + b2(stericxyz) + ... + a'2(estaticxyz) + b'2(estaticxyz) + ...

Actn = Constn + an(stericxyz) + bn(stericxyz) + ... + a'n(estaticxyz) + b'n(estaticxyz) + ...

In CoMFA, molecules are

represented and compared by

their steric and electrostatic

fields sampled at the

intersections of one or more

lattices (or grids, or boxes)

spanning a three-dimensional

region.

Thus each CoMFA descriptor

column of a QSAR MSS

contains the magnitudes of

either the steric or electrostatic

field exerted by the atoms in the

tabulated molecules on a probe

atom located at a point in

Cartesian space


The contours of the steric map are shown in yellow and green,

and those of the electrostatic map are shown in red and blue.

Greater values of 'Bio-Activity Measurement' are correlated with:

more bulk near green;

less bulk near yellow;

more positive charge near blue,

more negative charge near red

Steric and Electrostatic Maps


CoMFA, CoMSIA and Adv. CoMFA

Tripos Standard( Steric & electrostatic) Steric fields Electrostatic fields Hydrophobic fields Hydrogen bond acceptor fields Hydrogen bond donor fields Steric & Electrostatic fields Hydrogen bond acceptor & donor fields

CoMSIA fields CoMFA field

Molecular Interaction Fields calculation

Four different CoMFA fields and seven different CoMSIA fields were generated The PLS algorithm was used to relate these fields to the Histamine H1 antagonistic activity

Hydrogen bonding Fields

Indicator fields

Parabolic fields

Adv. CoMFA fields


Shapes of various

functions

CoMFA calculates steric fields using a Lennard-

Jones potential, and electrostatic fields using a

Coulombic potential

Both potential functions are very steep near the

van der Waals surface of the molecule, causing

rapid changes in surface descriptions

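A further scaling factor is applied to the steric field.

Steric fields, Lennard-Jones potential

$E_j = \sum_{k=1}^{n_{atoms}} \left[ \left( \frac{r_{probe} + r_k}{r_{jk}} \right)^{12} - 2 \left( \frac{r_{probe} + r_k}{r_{jk}} \right)^{6} \right]$

Electrostatic fields, Coulomb potential

$E_j = \sum_{k=1}^{n_{atoms}} \frac{q_{probe}\, q_k}{r_{jk}}$

Here j is a lattice point, k runs over the atoms of the molecule, and $r_{jk}$ is the distance between lattice point j and atom k.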
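The field calculation can be illustrated with a Python sketch that places a probe at each lattice point and sums Lennard-Jones-type and Coulomb-type terms over the atoms of one molecule. This is a simplified illustration, not the Tripos implementation: well depths, physical constants and units are omitted, the 30-unit truncation and the minimum distance are assumed safeguards against the steep rise near atoms, and the two-atom "molecule" is hypothetical.

```python
import itertools
import math


def make_grid(lower, upper, spacing):
    """Regular lattice of points spanning the box from lower to upper."""
    axes = []
    for lo, hi in zip(lower, upper):
        n_points = int(round((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(n_points)])
    return list(itertools.product(*axes))


def field_at(point, atoms, probe_radius=1.53, probe_charge=1.0, min_distance=0.1):
    """Steric (12-6) and electrostatic (q*q/r) sums for a probe at one lattice point.

    atoms is a list of (position, van der Waals radius, partial charge).
    """
    steric = 0.0
    electrostatic = 0.0
    for position, radius, charge in atoms:
        r_jk = max(math.dist(point, position), min_distance)  # avoid division by zero
        ratio = (probe_radius + radius) / r_jk
        steric += ratio ** 12 - 2.0 * ratio ** 6
        electrostatic += probe_charge * charge / r_jk
    return steric, electrostatic


def comfa_row(atoms, grid, cutoff=30.0):
    """One descriptor row: truncated steric and electrostatic value per lattice point."""
    row = []
    for point in grid:
        steric, electrostatic = field_at(point, atoms)
        row.append(min(steric, cutoff))
        row.append(max(-cutoff, min(electrostatic, cutoff)))
    return row


# Hypothetical two-atom "molecule": (position, radius, partial charge).
toy_molecule = [((0.0, 0.0, 0.0), 1.47, -0.5), ((1.5, 0.0, 0.0), 1.47, 0.5)]
grid = make_grid((-4.0, -4.0, -4.0), (4.0, 4.0, 4.0), 2.0)
descriptor_row = comfa_row(toy_molecule, grid)
```

Each aligned molecule yields one such row, and the resulting table of rows (one column per lattice point and field type) is what PLS relates to the biological activity.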

CoMSIA fields: similarity indices

CoMFA & CoMSIA standard (steric and electrostatic fields)

Probe atom = sp3-hybridized C+

Effective van der Waals radius = 1.53 Å

Charge = +1

CoMSIA Hydrophobic fields

the atomic values are directly

based on the research of

Viswanadhan et.al

Probe

atoms

CoMFA indicator fields

Created by converting continuous data to discrete

Adv

CoMFA

CoMFA parabolic fields

Created by squaring the original

field at each lattice, but retaining

the sign of the original field

Adv

CoMFA

CoMFA & CoMSIA (donor & acceptor)

hydrogen bonding field

Probe atom = H2O

Effective van der Waals radius = 1.7 – 1.8 Å

Charge = 0

Adv

CoMFA

HQSAR - HQSAR works by identifying patterns of substructural fragments relevant to biological

activity in sets of bioactive molecules.

Differs from other similar concepts like maximal common sub-graph algorithms and the

Stigmata algorithm, in that HQSAR yields a predictive relationship between structural features

(descriptors) in the dataset and biological activity using PLS

Descriptors

A molecular hologram is an array containing

counts of molecular fragments The process of hologram generation
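The idea of a hologram as an array of fragment counts can be sketched in Python. This is a simplified illustration under stated assumptions: only linear fragments of 2 to 4 atoms are enumerated (no branched or cyclic fragments, no bond orders or hydrogens), the CRC32 hash and the hologram length of 53 are arbitrary choices for the example, and the input is a hand-written heavy-atom graph rather than a parsed structure file.

```python
import zlib


def linear_fragments(atoms, bonds, min_len=2, max_len=4):
    """Enumerate linear atom paths (as label strings) in a molecular graph."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    fragments = []

    def extend(path):
        if len(path) >= min_len and path[0] < path[-1]:
            # Each undirected path is found from both ends; keep it once,
            # using the alphabetically smaller reading as its canonical label.
            forward = "-".join(atoms[i] for i in path)
            backward = "-".join(atoms[i] for i in reversed(path))
            fragments.append(min(forward, backward))
        if len(path) == max_len:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return fragments


def molecular_hologram(fragments, length=53):
    """Hash each fragment to a bin and count occurrences per bin."""
    bins = [0] * length
    for fragment in fragments:
        bins[zlib.crc32(fragment.encode("ascii")) % length] += 1
    return bins


# Ethanol as a heavy-atom graph: C-C-O.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]
ethanol_fragments = linear_fragments(ethanol_atoms, ethanol_bonds)
ethanol_hologram = molecular_hologram(ethanol_fragments)
```

Because several fragments can hash to the same bin, the hologram is a fixed-length count vector that can be fed directly to PLS, which is how the bin counts become the descriptors of the model.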

HQSAR Overview

Advantages of HQSAR:

2D approach, does not require 3D modeling and alignment and yet has

3D information in it.

Hologram QSAR

Conclusion

Over the last 40 years, QSAR has evolved in terms of both descriptor generation and data analysis, aided by the improved performance of computers for simulation and 3D visualization. It has reached a stage where it can be used as an alternative route to both lead identification and optimization. It provides a powerful tool for virtual screening and complements the current techniques of combinatorial chemistry and high-throughput screening in drug discovery research.

Thank you
