Chapter 1 Introduction - INFLIBNET Shodhganga (shodhganga.inflibnet.ac.in/bitstream/10603/13343/4/04_chapter...)

Chapter 1 Introduction
1.1 Drug discovery process and drug research
Drug discovery is a long, arduous process (Kapetanovic, 2008) and involves
identification of new drug targets and development of marketable drugs. Drugs mainly
work by interacting with target molecules (receptors) in our bodies and altering their
activities in a way that is beneficial to our health.
The drug discovery process is broadly grouped into different areas, namely, disease target
identification, target validation, high-throughput identification of "hits" and "leads", lead
optimization, and pre-clinical and clinical evaluation. New drugs begin in the laboratory
with identification of new drug targets that play a role in specific diseases. Chemical
and biological substances that act on these biological markers and are likely to have drug-
like effects are then sought. Out of every 5,000 new compounds identified during the
discovery process, only five are considered safe for testing in human volunteers after
preclinical evaluations. After three to six years of further clinical testing in patients, only
one of these compounds is ultimately approved as a marketed drug for treatment.
According to one estimate, discovering and bringing one new drug to the public typically
costs a pharmaceutical or biotechnology company nearly $900 million and takes an
average of 10 to 12 years. An outline of the different stages involved in the drug
discovery process with respect to time is represented in Figure 1.1. However, in
special circumstances, such as the search for effective drugs to treat AIDS or in a sudden
outbreak like Swine flu, the U.S. Food and Drug Administration (FDA,
http://www.fda.gov/) has encouraged an abbreviated process for drug testing and
approval called Fast-Tracking. The drug discovery and drug development process is
designed to ensure that only those pharmaceutical products that are both safe and
effective are brought to market. Fortunately, advances in computer-assisted drug
discovery (CADD) are improving the way scientists do their work, saving considerable time and
money. Since the main concern of this thesis is the application of computational
methodologies to the discovery of new inhibitors, it is important first to introduce
the concept of rational drug design.
[THE DRUG DISCOVERY PROCESS] The entire "drug discovery" process takes an average of 12-15 years to complete.
• Basic Research Stage - Years 0-3: thousands of substances are developed, examined and screened.
• Development Stage - Years 4-10: 10-20 substances are tested, both in vitro (within an artificial environment) and in vivo (within a living body). Preclinical testing is conducted using animals in years 4-6; clinical testing is conducted using humans in years 7-10 (Phase I - clinical testing on 5-10 substances; Phase II - clinical testing on 2-5 substances; Phase III - clinical testing on 2 substances).
• Registration of the drug with the U.S. Food and Drug Administration; introduction of the drug to the public - Years 11+. Product surveillance is done on people in years 11, 12 and beyond (Phase IV - observation/monitoring of the product).
Fig 1.1 Drug Discovery Process: an overview.
1.2 Rational drug design
The birth of chemotherapy: Paul Ehrlich first argued that differences in chemoreceptors
between species may be exploited therapeutically (Kaufmann, 2008).
Finding effective drugs is difficult. Many drugs are discovered by chance observations,
the scientific analysis of folk medicines, or by noting side effects of other drugs. It is the
rare case today when an unmodified natural product like taxol becomes a drug. Most drug
discovery programs begin with rational drug design, which is a more focused approach
and uses information about the structure of a drug receptor or one of its natural ligands to
identify or create candidate drugs. This philosophy of rational drug design, or more
specifically, the 'one gene, one drug, one disease' paradigm (Drews, 2000), arose from a
congruence between genetic reductionism and new molecular biology technologies that
enabled the isolation and characterization of individual 'disease-causing' genes. The
underlying assumption of the current approach is that safer, more effective drugs will
result from designing very selective ligands where undesirable and potentially toxic side
effects have been removed. However, after nearly two decades of focusing on developing
highly selective ligands, current clinical attrition figures challenge this hypothesis, and
scientists worldwide are proposing newer strategies to make rational drug design a
success.
Discussion of the use of structural biology in drug discovery began over 35 years ago,
with the advent of the 3D structures of globins, enzymes and polypeptide hormones.
Early ideas in circulation were the use of 3D structures to guide the synthesis of ligands
of hemoglobin to decrease sickling or to improve the storage of blood (Goodford et al.,
1980), the chemical modification of insulins to increase half-lives in circulation
(Blundell et al., 1972) and the design of inhibitors of serine proteases to control blood
clotting (Davie et al., 1991). One of the first drugs produced by rational design was
Relenza (von Itzstein et al., 1993), which is used to treat influenza. Relenza was
developed by choosing molecules that were most likely to interact with neuraminidase, a
virus-produced enzyme that is required to release newly formed viruses from infected
cells.
Many of the recent drugs developed to treat HIV infections (e.g. Ritonavir, Indinavir)
(Hightower & Kallas, 2003) were designed to interact with the viral protease, the enzyme
that splits up the viral proteins and allows them to assemble properly. Since the activity
of a molecule is the result of a multitude of factors such as bioavailability, selectivity,
toxicity and metabolism, rational drug design strategies have been helping medicinal
chemists for quite some time. Scientific advancements during the past two decades have
altered the way pharmaceutical research produces new bioactive molecules. Very
recently, impressive technological advances in areas such as structural characterization of
proteins, combinatorial chemistry, computational chemistry and molecular biology have
made a significant contribution to make rational drug design more feasible.
Computational methodologies have become an integral component of modern drug
discovery programs (Jorgensen, 2004). It is far more than just data-management and
includes target determination, target validation, hit identification, lead optimization and
beyond. During the past two decades, the computer-based design of hit and lead structure
candidates has emerged as a complementary approach to high-throughput screening
(Bajorath, 2002). The addition of CADD technologies has expedited the discovery
process by delivering new drug candidates at a faster rate and could lead to a reduction
of up to 50% in the total cost of drug design (Stahl et al., 2006). Advances in
computational techniques and hardware solutions have enabled in silico methods to speed
up lead optimization and identification. As all the aspects of rational drug design are
beyond the scope of this chapter, only current topics in computer-aided drug design
underscoring some of the most recent approaches and interdisciplinary processes are
discussed here.
1.3 Computer-Aided Drug Design
The use of computational techniques in the drug discovery and development process is rapidly
gaining popularity, implementation and appreciation. Different terms are being applied to
this area, including CADD, computational drug design, computer-aided molecular design
(CAMD), computer-aided molecular modeling (CAMM), rational drug design, in silico
drug design, and computer-aided rational drug design (Kapetanovic, 2008). Computer-
assisted approaches to identify novel inhibitors via ligand-based drug design, structure-
based drug design (drug-target docking), and quantitative structure-activity and quantitative
structure-property relationships have been used to identify new inhibitors, using large
available databases and algorithms implemented in highly selective and efficient
programs such as InsightII and Catalyst (Accelrys, San Diego, CA), Sybyl (Tripos, St. Louis,
MO), Gaussian (Pittsburgh, PA), and many others.
Here, we introduce basic concepts and specific features of CADD including small
molecule-protein docking methods, hit identification and lead optimization and several
selected applications, with particular emphasis on virtual screening, but do not
specifically discuss protein-protein docking, which is less relevant for small-molecule
drug discovery.
1.3.1 Small molecule resources and drug targets
"We're in a very target-rich but lead-poor post-genomics era for drug discovery." Raymond Stevens.
In general, a detailed 3D structure of the drug target and small molecule library is the
first consideration before starting a CADD project. The introduction of genomics,
proteomics and metabolomics, and advances in the structure determination techniques of NMR
and X-ray crystallography, have paved the way for a biology-driven process, leading to a
plethora of drug targets. To date, more than 55,000 protein structures have been deposited
in the RCSB PDB (http://www.rcsb.org/pdb). Identification and validation of viable targets is an
important first step in drug discovery, and new methods and integrated approaches are
continuously being explored to improve the discovery rate and the exploration of new drug
targets. The sequencing of the human and other genomes has made possible the identification
of many unknown proteins that might serve as new drug targets. The druggability of
genomes, i.e. the presence of protein folds that favor interaction with drug-like chemical
compounds, has been predicted (Russ & Lampel, 2005). However, the
detailed 3D structures for the majority of membrane proteins and others are still
unknown. Comparative homology modeling of protein structures, based on
experimentally determined structures of homologous proteins, can be a useful
methodological alternative. For the future, collaborative structural genomics initiatives
may aim at determining the 3D structure of all known proteins, based on a combination
of experimental structure determination and molecular modeling.
The public availability of data on small molecules such as drugs and drug-like molecules
in chemical repositories such as DrugBank
(http://redpoll.pharmacy.ualberta.ca/drugbank/), ZINC (http://www.docking.org),
PubChem (http://pubchem.ncbi.nlm.nih.gov/), and others provides a wealth of targets
and small-molecule data that can be mined and used for CADD approaches.
1.3.2 Virtual screening
The term "virtual screening" (referred to as VS) appeared in 1997; it has become an integral part
of the drug discovery process in recent years and is an increasingly important strategy in
the computer-aided search for novel lead compounds (Fig 1.2). Virtual screening (VS) or
in silico screening has been established as a powerful alternative and complement to
high-throughput screening (HTS) (Bajorath, 2002). In essence, VS methods are designed
for searching large compound databases in silico and selecting a limited number of
candidate molecules for testing to identify novel chemical entities that have the desired
biological activity. A current estimate is that biological screening and preclinical
pharmacological testing alone account for ~14% of the total research and development
(R&D) expenditure of the pharmaceutical industry. Target- and ligand-based virtual
screening have emerged as resource-saving techniques that have been successfully applied
to identify novel chemotypes as biologically active molecules, either inhibitors or
antagonists of diverse biological targets.
Applications of VS techniques include compound filtering methods, lead identification and
lead optimization strategies. Compound filtering methods are implemented to create
and filter large databases for compounds with desired or undesired chemical groups,
drug-like character, preferred solubility and absorption characteristics, or oral
bioavailability. A diverse array of VS methods has been developed, including
structural queries, pharmacophores, molecular fingerprints, QSAR models, diverse
cluster analysis tools, statistical techniques and docking calculations (Bajorath, 2002).
Lead identification and lead optimization VS strategies, which are often divided into
structure-based virtual screening (SBVS) and ligand-based virtual screening (LBVS),
possible synergies between the two, and some compound filtering techniques that are an
indispensable part of both, are discussed in detail here. Both SBVS and LBVS methods
rely on appropriate software to query large databases of virtual or existing chemicals.
1.3.2.1 Ligand-based virtual screening (LBVS)
Given a bioactive conformer (chemotype) derived from structural methods (X-ray and
NMR) or from molecular modeling, LBVS can rank novel ligands by 3D similarity
searching or by pharmacophore pattern matching (Oprea & Matter, 2004).
[Figure 1.2: bar chart of the number of VS publications per year, 1987-2006]
Fig 1.2 Analysis of VS publications obtained from a PubMed search performed using "virtual screening" as keyword (Adapted from: Drug Discovery Today: Technologies Vol. 3, 405-411, 2006).
Similarity searches: In this paradigm, molecules with similar features are considered to
have similar biological activity. For single templates, similarity searching is a preferred
approach (Fig 1.3), and popular tools include various two-dimensional (2D) and three-
dimensional (3D) structural queries, pharmacophore models, 2D and 3D fingerprints,
volume-matching techniques and complex molecular descriptors (Willett, 2006). These
approaches include 2D or 3D quantitative structure-activity relationship (QSAR) models
and diverse clustering and partitioning methods.
Structure or descriptor-based queries: When a single inhibitor molecule is available as
a template, structure-based queries, either 2D substructures (molecular
fragments) or 3D molecular queries (particularly pharmacophore models), can be used to
perform similarity searches against compound databases. These methods generally
require the efficient generation of reasonable, low-energy (single or multiple)
conformations of database compounds. As with any 3D search method, the crucial
assumption is made that database conformations are at least similar to biologically active
structures of candidate molecules, which is merely a hypothesis in most cases.
[Figure 1.3]
Similarity Searching
→ Structure- or descriptor-based queries: 2D substructures or molecular fingerprints; 3D molecular queries (pharmacophore models); molecular descriptors (shape, topology, electronic features); 3D or 4D QSAR
→ Molecular fingerprints: 2D fingerprints; 3D pharmacophore fingerprints
Fig 1.3 Similarity searching strategies in ligand-based virtual screening.
Pharmacophore identification and matching: In the pharmacophore concept, all
ligands, regardless of chemotype, present similar steric and electrostatic features that are
recognized at the target-binding site and are responsible for the biological activity. A
pharmacophore model is a hypothesis on the 3D arrangement of structural properties
of compounds that bind to a biological target; these properties include aromatic rings and
hydrophobic groups as well as hydrogen bond donor and acceptor groups. Geometric and
steric constraints can be defined when one has the 3D structure of the receptor target or
by comparison with inactive analogs. 3D searches in large databases can be performed
once a pharmacophore model is established.
Descriptor-based queries can also be used for database searching. Such complex
descriptors are mostly designed to detect similarities in molecular shape and shape-
related properties, or topological and electronic features, between query and test
compounds. In addition to single 2D or 3D queries, information from different sub-
structural descriptors can also be combined and reduced to generate descriptor spaces of
low complexity for similarity analysis and searching. Furthermore, when series of
analogues that have differential activity are available as training sets, 3D or 4D
quantitative structure-activity relationship (QSAR) models can be developed and used
for analyzing molecular similarity and for VS.
Molecular fingerprints are binary bit string representations that capture diverse aspects
of molecular structure and properties, and are among the most popular tools for VS.
Similar to structure-based queries, fingerprints are the method of choice for cases in
which only single hits or leads are available as templates for searching. This analysis
proceeds in 'fingerprint space', which means that a fingerprint is calculated for the
template molecule and searched against the corresponding fingerprints of database
compounds. Fingerprint overlap is compared and quantified using a similarity measure,
such as the Tanimoto coefficient (Tc), and any fingerprint search requires the definition
of similarity threshold values. A detailed comparison of a large number of similarity
coefficients demonstrates that the well-known Tanimoto coefficient remains the method
of choice for the computation of fingerprint-based similarity, despite possessing some
inherent biases related to the sizes of the molecules that are being sought.
Although all fingerprints are binary bit strings, their design concepts and lengths vary
substantially. Simple 2D fingerprints might consist of only ~100 bit positions, each of
which detects the presence or absence of a specific structural fragment or codes for a
value of a property descriptor. More complex 2D fingerprints map properties, such as
possible connectivity pathways, through molecules to overlapping bit segments and
consist of several thousand bits. By contrast, 3D pharmacophore fingerprints monitor all
possible arrangements of predefined, three- or four-point pharmacophores in a molecule,
as explored by systematic conformational searching, and consist of millions of bits.
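As a concrete illustration, the Tanimoto coefficient between two binary fingerprints is Tc = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. A minimal Python sketch (the fingerprints here are invented toy examples, far shorter than real molecular fingerprints):

```python
def tanimoto(fp1, fp2):
    """Tanimoto coefficient Tc = c / (a + b - c) for two equal-length bit lists."""
    a = sum(fp1)                              # bits set in fingerprint 1
    b = sum(fp2)                              # bits set in fingerprint 2
    c = sum(x & y for x, y in zip(fp1, fp2))  # bits set in both
    return c / (a + b - c) if (a + b - c) else 0.0

# Toy 8-bit fingerprints (a real 2D fingerprint would have ~100+ positions).
template = [1, 0, 1, 1, 0, 0, 1, 0]
candidate = [1, 0, 1, 0, 0, 0, 1, 1]
print(tanimoto(template, candidate))  # 3 shared bits / (4 + 4 - 3) = 0.6
```

A database search would compute this value between the template fingerprint and every database compound, keeping those above a chosen similarity threshold.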
Partitioning and Clustering
Compound clustering (Bajorath, 2002) finds application in the classification of compound
databases for high-throughput screening (HTS) and VS. For VS calculations, clustering
and partitioning (Fig 1.4) are particularly suitable when series or different sets of active
compounds are available as starting points. In such cases, database clusters or partitions
into which active molecules fall can be identified, and representative compounds can be
selected for biological testing.
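The difference between the two approaches can be sketched in a few lines: partitioning assigns each compound to a predefined cell of descriptor space, while clustering groups compounds by mutual similarity. A toy Python sketch of partitioning, using invented 2D descriptor values rather than real molecular descriptors:

```python
def partition(compounds, bins=2, lo=0.0, hi=1.0):
    """Assign each (name, x, y) compound to a fixed grid cell of descriptor space."""
    width = (hi - lo) / bins
    cells = {}
    for name, x, y in compounds:
        cell = (min(int((x - lo) / width), bins - 1),
                min(int((y - lo) / width), bins - 1))
        cells.setdefault(cell, []).append(name)
    return cells

# Toy compounds with two normalized descriptors each (hypothetical values).
library = [("A", 0.1, 0.2), ("B", 0.15, 0.25), ("C", 0.9, 0.8)]
print(partition(library))  # A and B share a cell; C falls in another
```

If an active template falls into the same cell as A and B, those two compounds become candidates for biological testing.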
Filter methods
Although compound filtering is not directly involved in identification of active
molecules, it is still regarded as a VS method. Filtering is applied to enrich libraries with
molecules that have preferred properties or, equally important, to eliminate compounds
that have characteristics that are clearly incompatible with the discovery requirements.
According to one analysis, half of the costly late-stage failures in drug development were
attributed to poor pharmacokinetics (39%) and animal toxicity (11%) (van de
Waterbeemd & Gifford, 2003). In silico approaches have increased the ability to
predict and model the most relevant pharmacokinetic, metabolic and toxicity endpoints
from absorption, distribution, metabolism, excretion (ADME) and toxicity data (together
called ADMET data), thereby accelerating the drug discovery process. The ADMET
properties that are addressed to improve the quality of screening libraries include:
elimination of reactive or toxic groups; aqueous solubility and passive absorption; drug-
like character given by Lipinski's Rule of Five (Lipinski et al., 2001); blood-brain barrier
penetration and CNS activity; and other ADME properties such as oral availability and
metabolic stability.
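As an illustration, a Rule-of-Five filter simply checks four property thresholds (molecular weight ≤ 500, logP ≤ 5, hydrogen bond donors ≤ 5, hydrogen bond acceptors ≤ 10) and flags compounds violating more than one of them. A minimal Python sketch, with the property values supplied as precomputed inputs (the numbers below are hypothetical, not computed from real structures):

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count Rule-of-Five violations; drug-like compounds have at most one."""
    rules = [mw > 500,    # molecular weight over 500 Da
             logp > 5,    # calculated logP over 5
             hbd > 5,     # more than 5 hydrogen bond donors
             hba > 10]    # more than 10 hydrogen bond acceptors
    return sum(rules)

def passes_rule_of_five(mw, logp, hbd, hba):
    return lipinski_violations(mw, logp, hbd, hba) <= 1

# Hypothetical property values for two screening candidates.
print(passes_rule_of_five(mw=350, logp=2.5, hbd=2, hba=5))   # True
print(passes_rule_of_five(mw=720, logp=6.8, hbd=4, hba=12))  # False (3 violations)
```

Applying such a filter before docking or similarity searching removes compounds unlikely to be orally bioavailable, which is exactly the library-enrichment role described above.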
[Figure 1.4: schematic dot plots contrasting partitioning (assignment of compounds to predefined grid cells) with clustering (grouping by mutual similarity). Adapted from Nat Rev Drug Discov 2002, 882-894.]
Fig 1.4 Clustering versus partitioning. The figure illustrates methodological differences
between clustering and partitioning. Coloured in red are active compounds that are added
to a source database before the analysis. Molecules found to be 'similar' to these
templates (candidates for biological testing) are shown in light grey.
1.3.2.2 Structure-based virtual screening (SBVS)
Given a protein structure and/or its binding site, SBVS seeks new molecules that
modulate the protein's activity. The success of SBVS is well documented (Maryanoff,
2004); it has contributed to the introduction of ~50 compounds into clinical trials and to
numerous drug approvals. Structure-based virtual screening has evolved over the past
decade, and many different variations of the basic methodology have been proposed.
Numerous reviews have been published on virtual screening. One of the more recent ones
(Waszkowycz, 2008), covers the current state-of-the-art and many of the practical aspects
of SBVS that are briefly discussed here.
Docking
With the rise in the number of structural targets with therapeutic potential, high-
throughput docking has become a primary hit identification tool in the drug discovery
research process (Kitchen et al., 2004). Docking is also used later on during lead optimization,
when modifications to known active structures can quickly be tested in computer models
before compound synthesis. Furthermore, docking can also contribute to the analysis of
drug metabolism using structures such as cytochrome P450 isoforms.
Concepts: In the context of molecular modeling, docking means predicting the bioactive
conformation of a molecule in the binding site of a target structure. In essence, this is
equivalent to finding the global free energy minimum of the system consisting of the
ligand and the target. Docking is used as a tool in structure-based drug design as well as
in SBVS. The first algorithm developed to dock small molecules into the binding pocket
of a macromolecule, the DOCK algorithm, was published in 1982 by Kuntz et al.
(Moustakas et al., 2006). In a review from 2007, more than sixty published docking
programs and thirty scoring functions were listed (Moitessier et al., 2008). However, the
earliest and most widely used docking programs over the past years are probably DOCK
(Moustakas et al., 2006), AutoDock (Morris et al., 1998), GOLD (Verdonk et al., 2003)
and FlexX (Rarey et al., 1996), and in recent years also e.g. Glide (Halgren et al., 2004),
ICM (Abagyan & Totrov, 1994), FRED (McGann et al., 2003), and Surflex-Dock (Jain,
2003). All docking programs contain at least two components: a fitness or scoring
function, whose global minimum is intended to coincide with the global free energy
minimum of the target-ligand system, and a search method, used to sample the search
space in which the scoring function is optimized (Table 1A). This search space can be
very large, combining all ligand positions and rotations with all possible conformations
of the ligand and possibly also the target protein. The docking process involves the
prediction of ligand conformation and orientation (or posing) within a target binding site.
In general, there are two aims of docking studies: accurate structural modeling and
correct prediction of activity. However, the identification of molecular features that are
responsible for specific biological recognition, or the prediction of compound
modifications that improve potency, are complex issues that are often difficult to
understand and even more so, to simulate on a computer. In view of these challenges,
docking is generally devised as a multi-step process in which each step introduces one or
more additional degrees of complexity. The process begins with the application of
docking algorithms that 'pose' small molecules in the active site. Algorithms are
complemented by scoring functions that are designed to predict the biological activity
through the evaluation of interactions between compounds and potential targets. It should
also be noted that ligand-binding events are driven by a combination of enthalpic and
entropic effects, and that either entropy or enthalpy can dominate specific interactions.
This often presents a conceptual problem for contemporary scoring functions, because
most of them are much more focused on capturing energetic than entropic effects. Other
challenges are posed by various factors including limited resolution of crystallographic
targets, inherent flexibility, induced fit or other conformational changes that occur on
binding, and the participation of water molecules in protein-ligand interactions.
Search methods and molecular flexibility: Methods used for the treatment of ligand
flexibility can be divided into three basic categories viz. systematic methods (incremental
construction, conformational search, databases); random or stochastic methods (Monte
Carlo, genetic algorithms, tabu search); and simulation methods (molecular dynamics,
energy minimization) (Kitchen et al., 2004). A summary of the search approaches
implemented in widely used docking programs is described here briefly as well as listed
in Table 1A.
Search techniques
Monte Carlo algorithm in its basic form: Generate an initial configuration of a ligand
in an active site, consisting of a random conformation, translation and rotation. Score the
initial configuration, then generate a new configuration and score it. Use the
Metropolis criterion (see below) to determine whether the new configuration is retained.
Repeat the previous steps until the desired number of configurations is obtained.
Metropolis criterion
If a new solution scores better than the previous one, it is immediately accepted. If the
configuration is not a new minimum, a Boltzmann-based probability function is applied.
If the solution passes the probability function test, it is accepted; if not, the configuration
is rejected.
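The acceptance step described above can be sketched in a few lines of Python; the score values and temperature parameter are arbitrary stand-ins for a real docking score (lower score meaning a better pose in this sketch):

```python
import math
import random

def metropolis_accept(old_score, new_score, temperature=1.0, rng=random.random):
    """Accept a new configuration if it scores better; otherwise accept with
    Boltzmann probability exp(-(new - old) / T)."""
    if new_score <= old_score:            # better (lower) score: always accept
        return True
    prob = math.exp(-(new_score - old_score) / temperature)
    return rng() < prob                   # worse score: accept stochastically

# A strictly better score is always kept; a much worse one is usually rejected.
print(metropolis_accept(old_score=5.0, new_score=3.0))  # True
```

Allowing occasionally worse configurations through the Boltzmann test is what lets the Monte Carlo search escape local minima rather than greedily descending.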
Molecular dynamics
Molecular dynamics is a simulation technique that solves Newton's equation of motion
for an atomic system: Fi = mi ai, in which Fi is force, mi is mass and ai is acceleration. The
force on each atom is calculated from the change in potential energy (usually based on
molecular mechanics terms) between current and new positions: Fi = -dE/dri, in which r
is distance. Atomic forces and masses are then used to determine atomic positions over a
series of very small time steps: Fi = mi (d2ri/dt2), in which t is time. This provides a
trajectory of changes in atomic positions over time. Practically, it is easier to determine
time-dependent atomic positions by first calculating accelerations ai from forces and
masses, then velocities vi from ai = dvi/dt and, ultimately, positions from velocities vi =
dri/dt.
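The integration loop just described (forces, then accelerations, then velocities, then positions) can be sketched for a single particle in one dimension; a harmonic potential E = 0.5 k x^2 stands in for a real molecular mechanics force field, and all parameter values are illustrative only:

```python
def simulate(x=1.0, v=0.0, m=1.0, k=1.0, dt=0.01, steps=5):
    """Integrate F = m*a over small time steps (simple Euler scheme).
    Force from a harmonic potential E = 0.5*k*x**2, so F = -dE/dx = -k*x."""
    trajectory = [x]
    for _ in range(steps):
        f = -k * x      # force from the potential-energy gradient
        a = f / m       # acceleration from force and mass
        v += a * dt     # update velocity: a = dv/dt
        x += v * dt     # update position: v = dr/dt
        trajectory.append(x)
    return trajectory

traj = simulate()
print(traj[0], traj[-1])  # the position drifts from 1.0 toward the minimum at 0
```

Production molecular dynamics codes use more stable integrators (e.g. velocity Verlet) and femtosecond-scale time steps, but the bookkeeping is the same as in this sketch.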
Genetic algorithms
Genetic algorithms are a class of computational problem-solving approaches that adapt
the principles of biological competition and population dynamics. Model parameters are
encoded in a 'chromosome' and stochastically varied. Chromosomes yield possible
solutions to a given problem and are evaluated by a fitness function. The chromosomes
that correspond to the best intermediate solutions are subjected to crossover and mutation
operations analogous to gene recombination and mutation to produce the next generation.
For docking applications, the genetic algorithm solution is an ensemble of possible ligand
conformations.
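A minimal genetic algorithm loop of the kind described above can be sketched as follows; the 'chromosome' here is a toy bit list and the fitness function simply counts set bits, standing in for a real docking fitness score (all parameters are illustrative):

```python
import random

def evolve(pop_size=8, length=10, generations=30, seed=42):
    """Toy genetic algorithm: fitness = number of 1-bits in the chromosome."""
    rng = random.Random(seed)
    fitness = sum  # for a bit list, sum() counts the 1-bits
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fittest half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(length)           # point mutation: flip one bit
            child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(sum(best))  # fitness of the best chromosome after evolution
```

In a docking program such as GOLD the chromosome instead encodes ligand torsions, position and hydrogen-bonding mappings, and the fitness function is the docking score, but the selection-crossover-mutation cycle is the same.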
Tabu search
This search algorithm makes n small random changes to the current conformation and ranks
each change according to the value of the chosen fitness function. It then determines which
changes are 'tabu' (that is, previously rejected conformations). If the best modification
has a lower value than any accepted so far, it is accepted even if it is 'tabu';
otherwise, the best 'non-tabu' change is accepted. The accepted change is added to the 'tabu'
list, its score is recorded, and the procedure returns to the first step.
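These steps can be sketched for a toy one-dimensional minimization; the fitness function, step size and tabu resolution are invented stand-ins for a docking score over ligand conformations:

```python
import random

def tabu_search(score, start=5.0, n=8, iters=40, step=0.5, seed=1):
    """Minimize score(x): try n random moves per iteration, skipping 'tabu'
    (previously visited) points unless a move beats the best score so far."""
    rng = random.Random(seed)
    current, best = start, start
    tabu = {round(start, 1)}                  # visited points, coarsely binned
    for _ in range(iters):
        moves = [current + rng.uniform(-step, step) for _ in range(n)]
        moves.sort(key=score)                 # rank the n changes by fitness
        for x in moves:
            key = round(x, 1)
            # aspiration: a new overall best overrides the tabu restriction
            if score(x) < score(best) or key not in tabu:
                current = x
                tabu.add(key)
                if score(x) < score(best):
                    best = x
                break
    return best

best = tabu_search(lambda x: x * x)  # minimum of x**2 is at x = 0
print(round(best, 2))
```

The tabu list is what prevents the search from revisiting configurations it has already explored, forcing it to cover new regions of conformational space.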
Protein flexibility
Various approaches have been applied to flexibly model at least part of the target,
including molecular dynamics and Monte Carlo calculations, rotamer libraries and protein
ensemble grids. The idea behind using amino-acid side-chain rotamer libraries is to model
protein conformational space on the basis of a limited number of experimentally observed
and preferred side-chain conformations. To reduce the number of discrete protein
conformations arising from combinations of rotamers, a dead-end elimination algorithm
is often used. This algorithm recursively removes side-chain conformations that do not
contribute to a minimum energy structure. Another method of treating protein flexibility
is to use ensembles of protein conformations (rather than a single one) as the target for
docking and to map these ensembles on a grid representation. One approach generates an
average potential energy grid of the ensemble, as first implemented in DOCK; another
maps various receptor potentials to each grid point and subsequently scores ligand
conformations against each set of receptor potentials.
Scoring
Scoring functions implemented in docking programs make various assumptions and
simplifications in the evaluation of modeled complexes and do not fully account for a
number of physical phenomena that determine molecular recognition - for example,
entropic effects. Essentially, three classes of scoring functions are currently
applied: force-field-based, empirical and knowledge-based scoring
functions. All of these approaches attempt to approximate the binding free energy.
Empirical scoring functions are a weighted sum of individual ligand-receptor
interaction types commonly supplemented by penalty terms, such as the number of
rotatable ligand bonds. The weights correspond to the average free-energy contribution of
a single interaction of that type and are obtained by a regression analysis of a set of
receptor-ligand complexes. Interaction types include, for example, hydrogen bonds,
electrostatic interactions and hydrophobic interactions. The regression analysis requires
both known structures and binding constants, and so the available datasets are limited in
size and often feature similar ligands and receptors. This can result in a bias of empirical
scoring functions towards specific structural motifs. However, they are fast and have
proved their suitability, and are therefore implemented in several de novo design
programs. LUDI (Bohm, 1992) and ChemScore (Eldridge et al., 1997) are popular
implementations of empirical scoring functions.
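The weighted-sum form of an empirical scoring function can be sketched directly; the interaction counts and regression weights below are invented placeholders, not values from LUDI or ChemScore:

```python
def empirical_score(n_hbonds, n_ionic, hydrophobic_area, n_rot_bonds):
    """Weighted sum of interaction terms plus a flexibility penalty.
    The weights are hypothetical regression coefficients, not fitted values."""
    W_HBOND, W_IONIC, W_HYDROPHOBIC, W_ROT = -4.0, -8.0, -0.15, 1.4
    return (W_HBOND * n_hbonds               # hydrogen bonds (favorable)
            + W_IONIC * n_ionic              # ionic contacts (favorable)
            + W_HYDROPHOBIC * hydrophobic_area  # buried hydrophobic area
            + W_ROT * n_rot_bonds)           # penalty for rotatable ligand bonds

# A pose with 3 hydrogen bonds, 1 ionic contact, 80 (squared Angstrom) of
# buried hydrophobic area and 5 rotatable bonds (all numbers illustrative).
print(empirical_score(3, 1, 80.0, 5))  # -25.0 (more negative = better)
```

In a real empirical function the weights are obtained, as described above, by regression against receptor-ligand complexes with known binding constants.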
Knowledge-based scoring is grounded on a statistical analysis of ligand-receptor
complex structures. The frequencies of each possible pair of atoms in contact to each
other are determined. Interactions found to occur more frequently than would be
randomly expected are considered attractive; interactions that occur less frequently are
considered repulsive. Only structural information is necessary to derive these frequencies
so that a greater number of structures can be included in the analysis. As the available
structures are also more diverse than those with known binding affinities, less bias is
expected compared with empirical scoring functions. Popular implementations of such
functions include Potential of mean force (PMF) (Muegge, 2000) and DrugScore
(Gohlke, et al., 2000), which also includes solvent-accessibility corrections to pair-wise
potentials.
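The conversion of contact frequencies into pseudo-energies can be sketched with the inverse Boltzmann relation that PMF-type functions rely on; the frequencies below are invented for illustration:

```python
import math

# Sketch of a knowledge-based pair potential: contacts observed more
# often than a reference expectation score favorably, rarer contacts
# unfavorably. Frequencies here are made-up illustrative numbers.

def pair_potential(f_observed, f_expected, kT=0.593):
    """Inverse Boltzmann: convert an observed/expected contact
    frequency ratio to an energy (kT ~0.593 kcal/mol at ~298 K)."""
    return -kT * math.log(f_observed / f_expected)

# A donor-acceptor contact seen twice as often as random → attractive:
print(pair_potential(0.10, 0.05))  # negative (favorable)
# A clashing pair seen half as often as random → repulsive:
print(pair_potential(0.02, 0.04))  # positive (unfavorable)
```

Summing such pair potentials over all ligand-protein atom pairs gives the total knowledge-based score.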
Force-field-based scoring is based on the molecular mechanics terms which express the
energy of the system. Molecular mechanics force fields usually quantify the sum of two
energies, the receptor-ligand interaction energy and internal ligand energy (such as steric
strain induced by binding). Interactions between ligand and receptor are most often
described by using van der Waals and electrostatic energy terms. The van der Waals
energy term is given by a Lennard-Jones potential function. Electrostatic terms are
accounted for by a Coulombic formulation with a distance-dependent dielectric function
that lessens the contribution from charge-charge interactions. The functional form of the
internal ligand energy is typically very similar to the protein-ligand interaction energy,
and also includes van der Waals contributions and/or electrostatic terms. Various force
field scoring functions are based on different force field parameter sets. For example,
G-Score is based on the Tripos force field (Kramer, et al., 1999) and AutoDock on the
Amber force field (Weiner, et al., 1986). Recent extensions of force-field-based scoring
functions include a torsional entropy term for ligands in G-Score and the inclusion of
explicit protein-ligand hydrogen-bonding terms in GOLD (Verdonk, et al., 2003) and
AutoDock (Morris, et al., 1998).
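The two pairwise terms described above can be written out in a minimal sketch; the well depth, equilibrium distance and charges are illustrative values, not parameters of any real force field:

```python
# Sketch of the pairwise terms of a force-field-based score: a 12-6
# Lennard-Jones term and a Coulomb term with a distance-dependent
# dielectric epsilon(r) = 4r, which damps long-range charge-charge
# contributions. All parameters are invented for illustration.

COULOMB_CONST = 332.06  # converts e^2/Angstrom to kcal/mol

def lennard_jones(r, epsilon=0.15, r_min=3.8):
    """12-6 van der Waals energy for one atom pair (kcal/mol)."""
    ratio = r_min / r
    return epsilon * (ratio**12 - 2 * ratio**6)

def coulomb(r, q1, q2):
    """Electrostatic energy with distance-dependent dielectric 4r."""
    return COULOMB_CONST * q1 * q2 / (4 * r * r)

def pair_energy(r, q1, q2):
    return lennard_jones(r) + coulomb(r, q1, q2)

# At the van der Waals minimum, the LJ term equals -epsilon:
print(lennard_jones(3.8))           # → -0.15
print(pair_energy(3.8, -0.4, 0.4))  # attractive for opposite charges
```

The total interaction energy would sum `pair_energy` over all ligand-protein atom pairs, with the internal ligand energy computed from analogous terms.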
Consensus scoring is a recent trend in drug discovery, motivated by the
imperfections of the current scoring functions. Consensus scoring combines information
from different scores to balance errors in single scores and improve the probability of
identifying 'true' ligands. However, the potential value of consensus scoring might be
limited, if terms in different scoring functions are significantly correlated, which could
amplify calculation errors, rather than balancing them.
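One common consensus scheme is rank-by-rank averaging. The sketch below, with invented scores, illustrates how averaging ranks across several scoring functions can balance errors in any single score:

```python
# Rank-by-rank consensus scoring: each compound is ranked separately
# by each scoring function, and the average rank is used for the final
# ordering. All scores below are invented (lower score = better).

def consensus_ranks(score_tables):
    """score_tables: list of {compound: score} dicts.
    Returns compounds sorted by their average rank across all tables."""
    avg_rank = {}
    for table in score_tables:
        ordered = sorted(table, key=table.get)  # best (lowest) first
        for rank, compound in enumerate(ordered, start=1):
            avg_rank[compound] = (avg_rank.get(compound, 0)
                                  + rank / len(score_tables))
    return sorted(avg_rank, key=avg_rank.get)

empirical = {"lig_A": -8.2, "lig_B": -6.1, "lig_C": -7.5}
forcefield = {"lig_A": -35.0, "lig_B": -41.0, "lig_C": -20.0}
knowledge = {"lig_A": -120.0, "lig_B": -90.0, "lig_C": -110.0}

# 'lig_A' ranks well under all three functions and so ends up first,
# even though 'lig_B' wins under the force-field score alone.
print(consensus_ranks([empirical, forcefield, knowledge]))
```

If the errors of the individual functions were strongly correlated, as cautioned above, this averaging would amplify rather than cancel them.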
Flexible ligand-search methods
  Random/stochastic: AutoDock (MC), MOE-Dock (MC, TS), GOLD (GA), PRO_LEADS (TS)
  Systematic: DOCK (incremental), FlexX (incremental), Glide (incremental), Hammerhead (incremental), FLOG (database)
  Simulation: DOCK, Glide, MOE-Dock, AutoDock, Hammerhead

Types of scoring functions
  Force-field-based: D-Score, G-Score, GOLD, AutoDock, DOCK
  Empirical: LUDI, F-Score, ChemScore, SCORE, Fresno, X-SCORE
  Knowledge-based: PMF, DrugScore, SMoG

Table 1A. List of various flexible-ligand search methods and scoring functions implemented in different docking programs.
De novo design
Concepts: Basically, three questions have to be addressed by a de novo design program,
firstly how to assemble the candidate compounds; secondly how to evaluate their
potential quality; and lastly how to sample the search space effectively. De novo design is
faced with the problem of combinatorial explosion, i.e., the number of different element
types and the way they can be linked together is huge. Furthermore, there are not only a
large number of theoretically possible topologies but also a variety of conformations for a
single topology. This renders a simple enumeration of all solutions, and hence an
exhaustive search, impossible. All algorithmic decisions of a de novo design program are
assessed by the quality of their outcome, which, in turn, crucially depends on a
meaningful reduction of the search space (Schneider & Fechner, 2005).
All information that is related to the ligand-receptor interaction forms the primary target
constraints for candidate compounds. Such constraints can be gathered both from the
three-dimensional receptor structure and from known ligands of the particular target. If
the former is consulted, the design strategy is receptor-based; in the latter case, it is
ligand-based. Receptor-based design starts with the determination of the interaction sites.
Interaction sites are typically subdivided into hydrogen bonds, electrostatic and
hydrophobic interactions. Receptor groups capable of hydrogen bonding are of special
interest owing to the strongly directional nature of the two interacting partners, the
hydrogen-bond acceptor and donor, which often form key interaction sites. These sites allow
the assignment of ligand atom positions with a complementary hydrogen-bond type
within a small region of space and a defined orientation. Receptor-based de novo design
uses a variety of methods to deduce interaction sites from the three-dimensional structure
of the binding pocket. Besides rule- and grid-based methods, Multiple Copy
Simultaneous Search (MCSS) is also used, in which functional groups are minimized
within the binding pocket using a force field. Groups are discarded if the interaction energy
between them and the protein is above a certain threshold. An MCSS run yields a set of
pre-docked fragments that can be further investigated to choose the most promising ones.
The same outcome can be achieved by the use of docking software, which is indeed the
initial step in some de novo design programs. Rule- and grid-based methods only derive
the primary target constraints: the outcome of these two methods is a map of the receptor
that pinpoints favorable interaction sites of a ligand. MCSS and the pre-docking of
fragments effectively amalgamate the derivation procedure and the first steps of structure
assembly, i.e., favorable positions of specific functional groups in the binding site are not
only indicated but are already placed at these positions. This placement of chemical
groups provides a starting point for the next step in de novo design which is the assembly
of a complete ligand.
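The MCSS filtering step described above, in which minimized fragment copies are kept or discarded by an interaction-energy threshold, can be sketched as follows; the fragment names, energies and cutoff are invented for illustration:

```python
# Sketch of MCSS-style filtering: fragment copies are minimized in the
# pocket and discarded if their interaction energy with the protein is
# above a threshold. All values below are invented.

ENERGY_CUTOFF = -2.0  # kcal/mol; hypothetical acceptance threshold

minimized_fragments = [
    {"name": "methanol_1", "interaction_energy": -4.1},
    {"name": "methanol_2", "interaction_energy": -0.5},
    {"name": "benzene_1", "interaction_energy": -3.2},
    {"name": "acetate_1", "interaction_energy": 1.8},  # clashes
]

# Keep only fragments bound tightly enough to seed structure assembly.
predocked = [f for f in minimized_fragments
             if f["interaction_energy"] <= ENERGY_CUTOFF]

print([f["name"] for f in predocked])  # → ['methanol_1', 'benzene_1']
```

The surviving pre-docked fragments then serve as the starting points for assembling a complete ligand.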
Receptor-based scoring
The application of a de novo design program yields more than one candidate compound.
These structures can either emerge from a single run of the program or from several runs
with one candidate compound per run. Scoring functions rank the generated structures
and thereby suggest which structures are the most promising ones. Moreover, during a
run they also guide the design process through the search space by assigning fitness
values to the sampled space. Basically, these quality-assessment approaches can be
subdivided into three different types of receptor-based scoring functions: explicit
force-field methods, empirical scoring functions and knowledge-based scoring functions.
The scoring is based on principles similar to those discussed earlier for docking.
Structure sampling
The basic building blocks for the assembly of candidate structures can be either single
atoms or fragments. Atom-based approaches are superior to fragment-based methods in
terms of the structural variety that can be generated. However, this increase in potential
solutions makes it harder to find suitable candidate compounds among those that are
generated. Fragment-based design strategies, on the other hand, significantly reduce the
size of the search space. This reduction can be called a 'meaningful reduction' if
fragments are used that commonly occur in drug molecules. There are several general
concepts of structure sampling: linking, growing, lattice-based sampling, random
structure mutation, transitions driven by molecular dynamics simulations, and graph
based sampling.
Combinatorial search strategies
De novo design has to tackle the issue of combinatorial explosion, which leads to
problems that are NP-hard. An exhaustive search would guarantee both completeness and
optimality; combinatorial search algorithms offer a practical solution by giving up one
or both of these aims. Such an algorithm can solve instances of combinatorial problems
by reducing the size of the search space and by exploring it efficiently. Heuristic
algorithms represent one way to achieve this.
Breadth-first and depth-first search: Several de novo design programs implement either a
breadth-first strategy or a depth-first strategy. The depth-first strategy retains only one of
a variety of possible partial solutions at each level of the search space graph until an end
state is reached. Even if the highest-scoring partial solution is selected each time, it is not
guaranteed to find the overall best solution, but the search space is reduced significantly.
A breadth-first strategy retains all partial solutions at one level of the search space graph
and explores, sequentially, other levels until each of these paths reaches an end state.
During a breadth-first search all nodes are systematically examined so that identifying the
optimal solution is guaranteed.
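The trade-off between the two strategies can be illustrated on a toy search tree of partial ligands; the tree, the extension names and the scores below are all hypothetical:

```python
# Toy search tree: each node is a partial ligand, children are possible
# extensions. Topology and scores are invented (lower score = better).
TREE = {
    "core": ["core+OH", "core+NH2"],
    "core+OH": ["core+OH+Ph"],
    "core+NH2": ["core+NH2+Me"],
}
SCORE = {"core+OH": -3.0, "core+NH2": -2.0,
         "core+OH+Ph": -4.0, "core+NH2+Me": -6.0}

def depth_first_greedy(node):
    """Keep only the best-scoring partial solution at each level.
    Fast, but not guaranteed to find the overall best end state."""
    while node in TREE:
        node = min(TREE[node], key=SCORE.get)
    return node

def breadth_first(node):
    """Keep all partial solutions at each level; guaranteed optimal."""
    level, end_states = [node], []
    while level:
        next_level = []
        for n in level:
            if n in TREE:
                next_level.extend(TREE[n])
            else:
                end_states.append(n)
        level = next_level
    return min(end_states, key=SCORE.get)

print(depth_first_greedy("core"))  # → core+OH+Ph (greedy, suboptimal)
print(breadth_first("core"))       # → core+NH2+Me (true optimum)
```

Here the greedy depth-first path commits to the locally better `core+OH` branch and misses the globally best compound that breadth-first search finds.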
Monte Carlo and the Metropolis criterion: Random sampling, also called Monte Carlo
Search, can also be combined with a Metropolis criterion. In this case, after each
structure-modification step, the change is evaluated to decide whether it is accepted or
rejected. If the modification results in a fitter candidate compound, it is immediately
accepted. If, on the other hand, the modification yields a less fit candidate compound, it
can still be accepted with a probability that is based on the scoring function difference
between the modified and unmodified structure and a random number.
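A minimal sketch of this acceptance rule; the temperature-like parameter is a hypothetical setting of the search, not a physical temperature:

```python
import math
import random

# Sketch of the Metropolis criterion: a modification that worsens the
# score can still be accepted with probability exp(-delta/T), which
# lets the search escape local optima.

def metropolis_accept(score_old, score_new, temperature=1.0,
                      rng=random.random):
    """Return True if the structure modification is accepted
    (lower score = fitter candidate compound)."""
    if score_new <= score_old:      # fitter: accept immediately
        return True
    delta = score_new - score_old   # scoring-function difference
    return rng() < math.exp(-delta / temperature)

random.seed(0)
print(metropolis_accept(-5.0, -6.0))  # improvement → True
print(metropolis_accept(-5.0, -4.9))  # small worsening → often accepted
print(metropolis_accept(-5.0, 20.0))  # large worsening → rarely accepted
```

The comparison of `exp(-delta/T)` against a random number implements exactly the probabilistic acceptance described above.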
Evolutionary algorithms: Evolutionary algorithms are based on the ideas described by
Charles Darwin in 1859. They are population-based optimization algorithms that mimic
biological evolution with the genetic operators 'reproduction', 'mutation' and
'recombination' (crossover). Mutation introduces new information into a population,
whereas recombination exploits the information inherent in the population. Candidate
compounds are represented by individuals in a population so that a set of solutions is
obtained in a single run of the algorithm. Genetic algorithms are a type of evolutionary
algorithm and were already discussed briefly as part of search techniques. The discussion
about strategies in VS would be incomplete without discussing fragment-based screening
for lead generation by high throughput X-ray crystallography/NMR.
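Before turning to fragment-based screening, the evolutionary loop of reproduction, mutation and recombination described above can be sketched on toy bit-string genomes; the genome encoding, fitness function and parameters are hypothetical stand-ins for a real compound representation and scoring function:

```python
import random

# Minimal evolutionary-algorithm sketch. In de novo design a genome
# would encode a candidate compound; here fitness is simply the number
# of 1-bits, a stand-in for a receptor-based score.

def fitness(genome):
    return sum(genome)

def mutate(genome, rate=0.05):
    """Mutation: introduce new information by flipping random bits."""
    return [1 - g if random.random() < rate else g for g in genome]

def recombine(parent_a, parent_b):
    """Recombination (one-point crossover): exploit existing information."""
    cut = random.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def evolve(pop_size=20, genome_len=16, generations=40):
    population = [[random.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]        # reproduction
        children = [mutate(recombine(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)

random.seed(1)
best = evolve()
print(fitness(best))  # close to the maximum of 16
```

Because the whole population is carried through each generation, a single run yields a set of candidate solutions, as noted above.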
1.3.3 Fragment-based screening by high throughput X-ray crystallography/NMR
Fragment-based approaches to lead discovery (Hartshorn, et al., 2005) have recently
become important as a complementary approach to traditional high-throughput
screening for discovering new leads in drug discovery programs. The approach involves
screening libraries of compounds, referred to as fragments, that are significantly smaller
and functionally simpler than drug molecules. Despite their low affinities (>100 μM),
fragments possess high ratios of free energy of binding to molecular size and may
therefore represent suitable starting points for evolution to good-quality lead compounds.
Moreover, it is possible to optimize fragments to high quality leads of relatively low
molecular weight that possess better drug-like properties (Bemis & Murcko, 1996; Bemis
& Murcko, 1999). The methodology includes design of a virtual fragment library, virtual
screening, synthesis of the hits identified by VS, cocktailing, structure solution through
X-ray/NMR techniques and, finally, optimization, where de novo design strategies are used
for stitching potential binding fragments together with the help of suitable linkers within
the binding pocket of the enzyme followed by an iterative cycle of synthesis and structure
solution through X-ray/NMR techniques. This new approach is believed to have many
advantages over conventional screening such as more efficient sampling of chemical
space using fewer compounds and a more rapid hit-to-lead optimization phase.
1.3.4 Synergies between structure-based and ligand-based virtual screening
SBVS and LBVS have been considered almost mutually exclusive, with LBVS
used primarily in the absence of protein target structure(s) and SBVS used when
target structure(s) are available. Especially when a target protein structure at high atomic
resolution is available, SBVS is often considered the first-choice strategy, ignoring
possible LBVS alternatives. However, there are numerous drawbacks to the strict
separation between ligand- and structure-based CADD methods. Most ligand-based
methods try to conserve the 3D arrangement of functional groups on a scaffold believed
to be important for the activity of existing ligands, precluding the discovery of novel
ligands that make different interactions with the target protein. Evaluating the induced
fit of both ligand and protein in docking methods is computationally expensive, and
large-scale changes in the protein upon ligand binding are often ignored. Structure-based
methods are also limited by the availability of detailed structures of the target, ideally in
different conformations and both with and without ligands complexed to them.
Therefore, SBVS and LBVS should not be applied independently but rather in
concert to increase the chances of finding novel hits. Some software approaches such as
SDOCKER have begun integrating both strategies by including ligand similarity as a part
of the scoring functions used in the docking algorithm (Muegge & Oloff, 2006).
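The integration idea, adding ligand similarity to a docking score, can be sketched as follows; the weighting scheme and toy fingerprints are hypothetical illustrations, not SDOCKER's actual formulation:

```python
# Sketch of blending a docking score with 2D similarity to a known
# active, in the spirit of SDOCKER-like integrated approaches.
# The weight and the toy bit-set fingerprints are invented.

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprint bit sets."""
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def combined_score(docking_score, similarity, weight=0.5):
    """Lower is better; high similarity (0..1) rewards the compound."""
    return docking_score - weight * similarity

known_active = {1, 4, 7, 9, 12}  # toy fingerprint bit positions
candidate = {1, 4, 7, 13}

sim = tanimoto(candidate, known_active)
print(round(sim, 3))              # → 0.5
print(combined_score(-8.0, sim))  # → -8.25
```

A candidate resembling known actives is thus nudged up the ranking, while the docking term still rewards structurally novel binders.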
Integrated CADD methodologies that combine SBDD with pharmacophores use
crystal-structure complexes to produce structure-based pharmacophores representing the
ligand features involved in interactions with the target protein, as well as the space
around the ligand occupied by the protein (Griffith, et al., 2005), yielding information
about the binding-cavity size and all relevant interactions. The protein-ligand
complexes can thus yield a "superligand", which can also be viewed as a pharmacophore.
Various types of novel pharmacophores can thereafter be compared and investigated.
1.4 Network pharmacology
The dominant paradigm in drug discovery is the concept of designing maximally
selective ligands to act on individual drug targets, the 'one gene, one drug, one disease'
paradigm (Drews, 2000) mentioned earlier. However, many effective drugs act via
modulation of multiple proteins rather than single targets. Advances in systems biology
are revealing a phenotypic robustness and a network structure in biological systems:
network biology analysis predicts that, because deletion of individual nodes in most cases
has little effect on disease networks, modulating multiple proteins may be required to
perturb robust phenotypes. This finding challenges the dominant assumption of single-target
drug discovery and has led to the emergence of a new drug discovery approach called
polypharmacology. This new appreciation of the role of polypharmacology is thought to
have significant implications for tackling the two major sources of attrition in drug
development: efficacy and toxicity. Integrating network biology and polypharmacology
holds the promise of expanding the current opportunity space for druggable targets.
However, the rational design of polypharmacology faces considerable challenges in the
need for new methods to validate target combinations and optimize multiple structure-
activity relationships while maintaining drug-like properties. Advances in these areas are
creating the foundation of the next paradigm in drug discovery: network pharmacology
(Hopkins, 2008).
1.5 Future challenges
Over the past decades, the increase in the clinical attrition rate has challenged the current
drug discovery paradigm. In particular, there has been a worrying rise in late-stage
attrition in phase 2 and phase 3 (Kola & Landis, 2004). Currently, the two single most
important reasons for attrition in clinical development are (i) lack of efficacy and (ii)
clinical safety or toxicology, which each account for 30% of failures. An examination of
the root causes of why compounds undergo attrition in the clinic is very instructive and
helps in the identification of strategies and tactics to reduce these rates and thereby
improve the efficiency of drug development. The discussion here will be restricted to in
silico methods in drug design that can be improved further.
When one analyzes the future challenges of protein-ligand docking, one must keep in
mind the governing principles by which protein receptors recognize, interact and associate
with molecular substrates and inhibitors, an understanding of which is of paramount
importance in drug discovery efforts. Protein-ligand docking aims to predict and rank the structure(s) arising
from the association between a given ligand and a target protein of known 3D structure.
Despite the breathtaking advances in the field over the last decades and the widespread
application of docking methods, several shortcomings still exist. In particular, protein
flexibility, a critical aspect for a thorough understanding of the principles that guide
ligand binding in proteins (induced fit), and imperfections in scoring functions are major
hurdles in current protein-ligand docking efforts that need to be more efficiently
accounted for. Despite the size of the field, the principal types of search algorithms and
scoring functions, as well as the traditional limitations associated with molecular docking,
must be addressed.
Fast docking protocols can be combined with accurate but more costly molecular
dynamics techniques to predict more reliable ligand-macromolecule complexes. The idea
of this combination lies in their complementary strengths. Docking simulations are used
to explore the vast conformational space in a short period of time, allowing the scrutiny
of large collections of drug-like compounds at a reasonable cost. Molecular dynamics
can treat both ligand and protein flexibility; the latter, in particular, is usually treated
only in a limited way in docking protocols. In molecular dynamics simulations, the effect of
explicit water molecules can be investigated directly, and accurate binding free energies
can be obtained.
As the capacities for biological screening and chemical synthesis have dramatically
increased, so have the demands for large quantities of early information on absorption,
distribution, metabolism, excretion (ADME) and toxicity data (van de Waterbeemd &
Gifford, 2003). Various medium and high-throughput in vitro ADMET screens are
therefore now in use. In addition, there is an increasing need for good tools for predicting
these properties to serve two key aims: first, at the design stage of new compounds and
compound libraries, so as to reduce the risk of late-stage attrition; and second, to optimize
the screening and testing by looking at only the most promising compounds. Moreover,
principles of scaffold-hopping can be applied to generate diverse scaffolds that reduce the
risk of late-stage failure (Zhang & Muegge, 2006).
It is generally recognized that drug discovery and development are very time- and
resource-consuming processes. There is an ever-growing effort to apply computational
power to the combined chemical and biological space in order to streamline drug
discovery, design, development and optimization. In the biomedical arena, computer-aided
or in silico design is being utilized to expedite and facilitate hit identification and
hit-to-lead selection, to optimize the absorption, distribution, metabolism, excretion and
toxicity profile, and to avoid safety issues. Regulatory agencies as well as the
pharmaceutical industry are actively involved in the development of computational tools
that will improve the effectiveness and efficiency of the drug discovery and development
process, decrease the use of animals, and increase predictability. It is expected that the power of CADD will grow as the
technology continues to evolve.
The first part of this introduction discusses the principles that have been used for virtual
ligand- as well as target-based screening and profiling to predict biological activity. The
aim of the second part of this chapter is to illustrate some of the current applications of
in silico methods for drug design and the future challenges that need to be addressed.
Towards the end, our conclusion is that the in silico drug discovery paradigm is ongoing
and presents a rich array of opportunities that will assist in expediting the discovery of
new targets, and ultimately lead to compounds with predicted biological activity for the
drug targets.
1.6 References
Abagyan, R. and Totrov, M. (1994) Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J Mol Biol, 235, 983-1002.
Bajorath, J. (2002) Integration of virtual and high-throughput screening. Nat Rev Drug Discov, 1, 882-894.
Bemis, G.W. and Murcko, M.A. (1996) The properties of known drugs. 1. Molecular frameworks. J Med Chem, 39, 2887-2893.
Bemis, G.W. and Murcko, M.A. (1999) Properties of known drugs. 2. Side chains. J Med Chem, 42, 5095-5099.
Blundell, T.L., et al. (1972) Three-dimensional atomic structure of insulin and its relationship to activity. Diabetes, 21, 492-505.
Bohm, H.J. (1992) LUDI: rule-based automatic design of new substituents for enzyme inhibitor leads. J Comput Aided Mol Des, 6, 593-606.
Davie, E.W., et al. (1991) The coagulation cascade: initiation, maintenance, and regulation. Biochemistry, 30, 10363-10370.
Drews, J. (2000) Drug discovery: a historical perspective. Science, 287, 1960-1964.
Eldridge, M.D., et al. (1997) Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aided Mol Des, 11, 425-445.
Gohlke, H., et al. (2000) Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol, 295, 337-356.
Goodford, P.J., et al. (1980) The interaction of human haemoglobin with allosteric effectors as a model for drug-receptor interactions. Br J Pharmacol, 68, 741-748.
Griffith, R., et al. (2005) Combining structure-based drug design and pharmacophores. J Mol Graph Model, 23, 439-446.
Halgren, T.A., et al. (2004) Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J Med Chem, 47, 1750-1759.
Hartshorn, M.J., et al. (2005) Fragment-based lead discovery using X-ray crystallography. J Med Chem, 48, 403-413.
Hightower, M. and Kallas, E.G. (2003) Diagnosis, antiretroviral therapy, and emergence of resistance to antiretroviral agents in HIV-2 infection: a review. Braz J Infect Dis, 7, 7-15.
Hopkins, A.L. (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol, 4, 682-690.
Jain, A.N. (2003) Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine. J Med Chem, 46, 499-511.
Jorgensen, W.L. (2004) The many roles of computation in drug discovery. Science, 303, 1813-1818.
Kapetanovic, I.M. (2008) Computer-aided drug discovery and development (CADDD): in silico-chemico-biological approach. Chem Biol Interact, 171, 165-176.
Kaufmann, S.H. (2008) Paul Ehrlich: founder of chemotherapy. Nat Rev Drug Discov, 7, 373.
Kitchen, D.B., et al. (2004) Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov, 3, 935-949.
Kola, I. and Landis, J. (2004) Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov, 3, 711-715.
Kramer, B., et al. (1999) Evaluation of the FlexX incremental construction algorithm for protein-ligand docking. Proteins, 37, 228-241.
Lipinski, C.A., et al. (2001) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev, 46, 3-26.
Maryanoff, B.E. (2004) Inhibitors of serine proteases as potential therapeutic agents: the road from thrombin to tryptase to cathepsin G. J Med Chem, 47, 769-787.
McGann, M.R., et al. (2003) Gaussian docking functions. Biopolymers, 68, 76-90.
Moitessier, N., et al. (2008) Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go. Br J Pharmacol, 153 Suppl 1, S7-26.
Morris, G.M., et al. (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem, 19, 1639-1662.
Moustakas, D.T., et al. (2006) Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des, 20, 601-619.
Muegge, I. (2000) A knowledge-based scoring function for protein-ligand interactions: probing the reference state. Perspect Drug Discov Des, 20, 99-114.
Muegge, I. and Oloff, S. (2006) Advances in virtual screening. Drug Discovery Today: Technologies, 3, 405-411.
Oprea, T.I. and Matter, H. (2004) Integrating virtual screening in lead discovery. Curr Opin Chem Biol, 8, 349-358.
Rarey, M., et al. (1996) A fast flexible docking method using an incremental construction algorithm. J Mol Biol, 261, 470-489.
Russ, A.P. and Lampel, S. (2005) The druggable genome: an update. Drug Discov Today, 10, 1607-1610.
Schneider, G. and Fechner, U. (2005) Computer-based de novo design of drug-like molecules. Nat Rev Drug Discov, 4, 649-663.
Stahl, M., et al. (2006) Integrating molecular design resources within modern drug discovery research: the Roche experience. Drug Discov Today, 11, 326-333.
van de Waterbeemd, H. and Gifford, E. (2003) ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov, 2, 192-204.
Verdonk, M.L., et al. (2003) Improved protein-ligand docking using GOLD. Proteins, 52, 609-623.
von Itzstein, M., et al. (1993) Rational design of potent sialidase-based inhibitors of influenza virus replication. Nature, 363, 418-423.
Waszkowycz, B. (2008) Towards improving compound selection in structure-based virtual screening. Drug Discov Today, 13, 219-226.
Weiner, S.J., et al. (1986) An all-atom force field for simulations of proteins and nucleic acids. J Comput Chem, 7, 230-252.
Willett, P. (2006) Similarity-based virtual screening using 2D fingerprints. Drug Discov Today, 11, 1046-1053.
Zhang, Q. and Muegge, I. (2006) Scaffold hopping through virtual screening using 2D and 3D similarity descriptors: ranking, voting, and consensus scoring. J Med Chem, 49, 1536-1548.