
Chapter 1 Introduction

1.1 Drug discovery process and drug research

Drug discovery is a long, arduous process (Kapetanovic, 2008) and involves

identification of new drug targets and development of marketable drugs. Drugs mainly

work by interacting with target molecules (receptors) in our bodies and altering their

activities in a way that is beneficial to our health.

The drug discovery process is broadly grouped into different areas, namely, disease target

identification, target validation, high-throughput identification of "hits" and "leads", lead

optimization, and pre-clinical and clinical evaluation. New drugs begin in the laboratory

with identification of new drug targets that play a role in specific diseases. The chemical

and biological substances that target these biological markers and are likely to have drug-like effects are then sought. Out of every 5,000 new compounds identified during the

discovery process, only five are considered safe for testing in human volunteers after

preclinical evaluations. After three to six years of further clinical testing in patients, only

one of these compounds is ultimately approved as a marketed drug for treatment.

According to an estimate, discovering and bringing one new drug to the public typically

costs a pharmaceutical or biotechnology company nearly $900

million and takes an average of 10 to 12 years. An outline of different stages involved in

drug discovery process with respect to time is represented in Figure 1.1. However, in

special circumstances, such as the search for effective drugs to treat AIDS or in a sudden

outbreak like Swine flu, the U.S. Food and Drug Administration (FDA,

http://www.fda.gov/) has encouraged an abbreviated process for drug testing and

approval called Fast-Tracking. The drug discovery and drug development process is

designed to ensure that only those pharmaceutical products that are both safe and

effective are brought to market. Fortunately, advances in computer-assisted drug

discovery (CADD) are improving the way scientists do their work saving lots of time and

money. Since the main area of concern in this thesis is the application of computational

methodologies in new inhibitor discovery, it is important first to introduce the concept of rational drug design.


[THE DRUG DISCOVERY PROCESS] The entire "drug discovery" process takes an average of 12-15 years to complete.

• Basic Research Stage (Years 0-3): during this time period, thousands of substances are developed, examined and screened.

• Development Stage (Years 4-10): during this time, 10-20 substances are tested, both in vitro (within an artificial environment) and in vivo (within a living body). Preclinical testing is conducted using animals (years 4-6); clinical testing is conducted using humans (years 7-10): Phase I, clinical testing on 5-10 substances; Phase II, clinical testing on 2-5 substances; Phase III, clinical testing on 2 substances.

• Registration of the drug with the U.S. Food and Drug Administration; introduction of the drug to the public (Years 11+); product surveillance done on people (years 11, 12 and beyond). Phase IV: observation/monitoring of the product.

Fig 1.1 Drug Discovery Process. An overview

1.2 Rational drug design

The birth of chemotherapy: Paul Ehrlich first argued that differences in chemoreceptors

between species may be exploited therapeutically (Kaufmann, 2008).

Finding effective drugs is difficult. Many drugs are discovered by chance observations,

the scientific analysis of folk medicines or by noting side effects of other drugs. It is the

rare case today when an unmodified natural product like taxol becomes a drug. Most drug

discovery programs begin with rational drug design, which is a more focused approach

and uses information about the structure of a drug receptor or one of its natural ligands to

identify or create candidate drugs. This philosophy of rational drug design, or more

specifically, the 'one gene, one drug, one disease' paradigm (Drews, 2000), arose from a

congruence between genetic reductionism and new molecular biology technologies that


enabled the isolation and characterization of individual 'disease-causing' genes. The

underlying assumption of the current approach is that safer, more effective drugs will

result from designing very selective ligands where undesirable and potentially toxic side

effects have been removed. However, after nearly two decades of focusing on developing

highly selective ligands, the current clinical attrition figures challenge this hypothesis and

scientists the world over are proposing newer strategies to make rational drug design a

success.

Discussion of the use of structural biology in drug discovery began over 35 years ago,

with the advent of the 3D structures of globins, enzymes and polypeptide hormones.

Early ideas in circulation were the use of 3D structures to guide the synthesis of ligands

of hemoglobin to decrease sickling or to improve storage of blood (Goodford, et al., 1980), the chemical modification of insulins to increase half-lives in circulation (Blundell, et al., 1972) and the design of inhibitors of serine proteases to control blood clotting (Davie, et al., 1991). One of the first drugs produced by rational design was Relenza (von Itzstein, et al., 1993), which is used to treat influenza. Relenza was

developed by choosing molecules that were most likely to interact with neuraminidase, a

virus-produced enzyme that is required to release newly formed viruses from infected

cells.

Many of the recent drugs developed to treat HIV infections (e.g. Ritonavir, Indinavir) (Hightower & Kallas, 2003) were designed to interact with the viral protease, the enzyme

that splits up the viral proteins and allows them to assemble properly. Since the activity

of the molecule is the result of a multitude of factors such as bioavailability, selectivity, toxicity and metabolism, rational drug design strategies have been helping medicinal

chemists for quite some time. Scientific advancements during the past two decades have

altered the way pharmaceutical research produces new bioactive molecules. Very

recently, impressive technological advances in areas such as structural characterization of

proteins, combinatorial chemistry, computational chemistry and molecular biology have

made a significant contribution to make rational drug design more feasible.

Computational methodologies have become an integral component of modern drug

discovery programs (Jorgensen, 2004). It is far more than just data-management and


includes target determination, target validation, hit identification, lead optimization and

beyond. During the past two decades, the computer-based design of hit and lead structure

candidates has emerged as a complementary approach to high-throughput screening

(Bajorath, 2002). The addition of CADD technologies has expedited the discovery processes by delivering new drug candidates at a faster rate and could lead to a reduction of up to 50% in the total cost of drug design (Stahl, et al., 2006). Advances in

computational techniques and hardware solutions have enabled in silico methods to speed

up lead optimization and identification. As all the aspects of rational drug design are

beyond the scope of this chapter, only current topics in computer-aided drug design

underscoring some of the most recent approaches and interdisciplinary processes are

discussed here.

1.3 Computer-Aided Drug Design

Use of computational techniques in drug discovery and development process is rapidly

gaining popularity, implementation and appreciation. Different terms are being applied to

this area, including CADD, computational drug design, computer-aided molecular design

(CAMD), computer-aided molecular modeling (CAMM), rational drug design, in silico

drug design, computer-aided rational drug design (Kapetanovic, 2008). Computer-assisted approaches to identify novel inhibitors via ligand-based drug design, structure-based drug design (drug-target docking), quantitative structure-activity and quantitative structure-property relationships have been used to identify new inhibitors using large available databases and algorithms implemented in highly selective and efficient programs such as InsightII and Catalyst (Accelrys, San Diego, CA), Sybyl (Tripos, St. Louis, MO), Gaussian (Pittsburgh, PA), and many others.

Here, we introduce basic concepts and specific features of CADD including small-molecule-protein docking methods, hit identification and lead optimization and several

selected applications, with particular emphasis on virtual screening, but do not

specifically discuss protein-protein docking, which is less relevant for small-molecule

drug discovery.


1.3.1 Small molecule resources and drug targets

"We're in a very target-rich but lead-poor post-genomics era for drug discovery." Raymond Stevens.

In general, a detailed 3D structure of the drug target and small molecule library is the

first consideration before starting a CADD project. The introduction of genomics,

proteomics, metabolomics and advances in structure determination techniques of NMR

and X-ray crystallography have paved the way for a biology-driven process, leading to a

plethora of drug targets. To date, more than 55,000 protein structures have been deposited

in RCSB (http://www.rcsb.org/pdb). Identification and validation of viable targets is an important first step in drug discovery, and new methods and integrated approaches are

continuously being explored to improve the discovery rate and exploration of new drug

targets. The sequencing of human and other genomes has made possible the identification

of many unknown proteins that might serve as new drug targets. The druggability of these genomes, that is, the presence of protein folds that favor interaction with drug-like chemical compounds, has been predicted (Russ & Lampel, 2005). However, the

detailed 3D structures for the majority of membrane proteins and others are still

unknown. Comparative homology modeling of protein structures, based on

experimentally determined structures of homologous proteins, can be a useful

methodological alternative. For the future, collaborative structural genomics initiatives

may aim at determining the 3D structure of all known proteins, based on a combination

of experimental structure determination and molecular modeling.

The public availability of data on small molecules such as drugs and drug-like molecules in chemical repositories, for example, DrugBank (http://redpoll.pharmacy.ualberta.ca/drugbank/), ZINC (http://www.docking.org), PubChem (http://pubchem.ncbi.nlm.nih.gov/), and others, provides a wealth of targets and small molecule data that can be mined and used for CADD approaches.

1.3.2 Virtual screening

The term "Virtual screening" (referred to as VS) appeared in 1997, became an integral part

of the drug discovery process in recent years and is an increasingly important strategy of


the computer-aided search for novel lead compounds (Fig 1.2). Virtual screening (VS) or

in silico screening has been established as a powerful alternative and complement to

high-throughput screening (HTS) (Bajorath, 2002). In essence, VS methods are designed

for searching large compound databases in silico and selecting a limited number of

candidate molecules for testing to identify novel chemical entities that have the desired

biological activity. A current estimate is that biological screening and preclinical

pharmacological testing alone account for ~14% of the total research and development (R&D) expenditures of the pharmaceutical industry. Target- and ligand-based virtual screening have emerged as resource-saving techniques that have been successfully applied

to identify novel chemotypes as biologically active molecules, either inhibitors or

antagonists of diverse biological targets.

Applications of VS techniques include compound filtering methods, lead identification and

lead optimization strategies. The compound filtering methods are implemented to create

and filter large databases for compounds with desired or undesired chemical groups,

drug-like characters, preferred solubility and absorption characteristics, or oral

bioavailability etc. A diverse array of VS methods has been developed, including

structural queries, pharmacophores, molecular fingerprints, QSAR models, diverse

cluster analysis tools, statistical techniques and docking calculations (Bajorath, 2002).

Lead identification and lead optimization VS strategies that are often divided into

structure-based virtual screening (SBVS) and ligand-based virtual screening (LBVS),

possible synergies between the two, and some compound filtering techniques which are an

indispensable part of both are discussed in detail here. Both SBVS and LBVS methods

rely on appropriate software to query large databases of virtual or existing chemicals.

1.3.2.1 Ligand-based virtual screening (LBVS)

Given a bioactive conformer (chemotype) derived from structural methods (X-ray and

NMR), or from molecular modeling, LBVS can rank novel ligands by 3D similarity

searching or by pharmacophore pattern matching (Oprea & Matter, 2004).


Fig 1.2 Analysis of VS publications obtained from a PubMed search performed using virtual screening as keyword (Adapted from: Drug Discovery Today: Technologies Vol. 3, 405-411, 2006).

Similarity searches: In this paradigm, molecules with similar features are considered to

have similar biological activity. For single templates, similarity searching is a preferred

approach (Fig 1.3), and popular tools include various two-dimensional (2D) and three-dimensional (3D) structural queries, pharmacophore models, 2D and 3D fingerprints, volume-matching techniques and complex molecular descriptors (Willett, 2006). These

approaches include 2D or 3D quantitative structure-activity relationship (QSAR) models

and diverse clustering and partitioning methods.

Structure or descriptor-based queries: When a single inhibitor molecule is available as a template, structure-based queries, either 2D substructures (molecular fragments) or 3D molecular queries, particularly pharmacophore models, can be used to perform similarity searches against compound databases. These methods generally

require the efficient generation of reasonable, low-energy (single or multiple)

conformations of database compounds. As with any 3D search method, the crucial

assumption is made that database conformations are at least similar to biologically active

structures of candidate molecules, which is merely a hypothesis in most cases.


[Figure 1.3 diagram: similarity searching branches into (i) structure- or descriptor-based queries (2D substructures or molecular fingerprints; 3D molecular queries such as pharmacophore models; molecular descriptors of shape, topology and electronic features; 3D or 4D QSAR) and (ii) molecular fingerprints (2D fingerprints and 3D pharmacophore fingerprints).]

Fig 1.3 Similarity searching strategies in Ligand-based virtual screening.

Pharmacophore identification and matching: In the pharmacophore concept, all

ligands, regardless of chemotype, present similar steric and electrostatic features that are

recognized at the target-binding site and are responsible for the biological activity.

A pharmacophore model is a hypothesis about the 3D arrangement of such structural features.

These models include aromatic rings, hydrophobic groups of compounds that bind to a

biological target as well as hydrogen bond donor and acceptor groups. Geometric and

steric constraints can be defined when one has the 3D structure of the receptor target or

by comparison with inactive analogs. 3D searches in large databases can be performed

once a pharmacophore model is established.
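As a toy illustration of the matching step just described, a conformer can be tested against a pharmacophore model expressed as pairwise distance constraints between feature types. The feature types, coordinates and distances below are hypothetical, chosen only to show the mechanics:

```python
import math

def matches_pharmacophore(features, model, tol=1.0):
    """Toy pharmacophore matcher: 'features' maps a feature type (e.g. 'donor',
    'acceptor', 'aromatic') to the 3D coordinate of that feature in one
    conformer; 'model' lists (type_a, type_b, required_distance) constraints.
    The conformer matches when every constraint holds within tol (Angstrom)."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    for type_a, type_b, required in model:
        if type_a not in features or type_b not in features:
            return False                       # required feature missing
        if abs(dist(features[type_a], features[type_b]) - required) > tol:
            return False                       # geometric constraint violated
    return True

# Hypothetical model: donor-acceptor 5 A apart, donor-aromatic 4 A apart
model = [("donor", "acceptor", 5.0), ("donor", "aromatic", 4.0)]
conformer = {"donor": (0.0, 0.0, 0.0),
             "acceptor": (5.2, 0.0, 0.0),
             "aromatic": (0.0, 3.8, 0.0)}
hit = matches_pharmacophore(conformer, model)  # both constraints within 1 A
```

Real pharmacophore software additionally enumerates conformers and feature assignments; this sketch assumes both are already given.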

Descriptor-based queries can also be used for database searching. Such complex

descriptors are mostly designed to detect similarities in molecular shape and shape-related properties or topological and electronic features between query and test compounds. In addition to single 2D or 3D queries, information from different substructural descriptors can also be combined and reduced to generate descriptor spaces of

low complexity for similarity analysis and searching. Furthermore, when series of

analogues that have differential activity are available as training sets, 3D or 4D

Quantitative structure-activity relationship (QSAR) models can be developed and used

for analyzing molecular similarity and VS.


Molecular fingerprints are binary bit string representations that capture diverse aspects

of molecular structure and properties, and are among the most popular tools for VS.

Similar to structure-based queries, fingerprints are the method of choice for cases in which only single hits or leads are available as templates for searching. This analysis proceeds in 'fingerprint space', which means that a fingerprint is calculated for the template molecule and searched against the corresponding fingerprints of database compounds. Fingerprint overlap is compared and quantified using a similarity measure, such as the Tanimoto coefficient (Tc), and any fingerprint search requires the definition of similarity threshold values. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases related to the sizes of the molecules that are being sought.
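The Tanimoto calculation itself is simple enough to sketch directly. The following minimal illustration operates on binary bit-string fingerprints; the 8-bit fingerprints are hypothetical, not derived from real molecules:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient Tc = c / (a + b - c) for binary fingerprints,
    where a and b are the numbers of bits set in each fingerprint and
    c is the number of bits set in both."""
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    return c / (a + b - c) if (a + b - c) else 0.0

# Template fingerprint vs. a database compound (hypothetical 8-bit fingerprints)
template = [1, 0, 1, 1, 0, 0, 1, 0]
candidate = [1, 0, 1, 0, 0, 0, 1, 1]
tc = tanimoto(template, candidate)  # a=4, b=4, c=3 -> 3/5 = 0.6
```

A search would compute tc for every database fingerprint against the template and keep compounds above a chosen similarity threshold.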

Although all fingerprints are binary bit strings, their design concepts and lengths vary substantially. Simple 2D fingerprints might consist of only ~100 bit positions, each of which detects the presence or absence of a specific structural fragment or codes for a value of a property descriptor. More complex 2D fingerprints map properties, such as possible connectivity pathways, through molecules to overlapping bit segments and consist of several thousand bits. By contrast, 3D pharmacophore fingerprints monitor all possible arrangements of predefined, three- or four-point pharmacophores in a molecule, as explored by systematic conformational searching, and consist of millions of bits.

Partitioning and Clustering

Compound clustering (Bajorath, 2002) finds application in the classification of compound

databases for HTS (High-throughput screening) and VS. For VS calculations, clustering

and partitioning (Fig 1.4) are particularly suitable when series or different sets of active

compounds are available as starting points. In such cases, database clusters or partitions

into which active molecules fall can be identified, and representative compounds can be

selected for biological testing.
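A minimal sketch of similarity-based compound clustering follows; the leader-style scheme and the Tanimoto threshold are illustrative choices for this sketch, not a specific published protocol:

```python
def leader_clustering(fingerprints, similarity, threshold=0.6):
    """Leader-style clustering sketch: each compound joins the first cluster
    whose representative ('leader') it resembles above the similarity
    threshold; otherwise it founds a new cluster."""
    leaders = []    # representative fingerprint of each cluster
    clusters = []   # compound indices per cluster
    for idx, fp in enumerate(fingerprints):
        for c, leader in enumerate(leaders):
            if similarity(fp, leader) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(fp)
            clusters.append([idx])
    return clusters

def tanimoto(a, b):
    common = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b) - common
    return common / total if total else 0.0

# Three hypothetical 4-bit fingerprints; the first two are similar
fps = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
groups = leader_clustering(fps, tanimoto)  # -> [[0, 1], [2]]
```

Active compounds seeded into the database would then highlight which clusters to sample for biological testing.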

Filter methods

Although compound filtering is not directly involved in identification of active

molecules, it is still regarded as a VS method. Filtering is applied to enrich libraries with


molecules that have preferred properties or, equally important, to eliminate compounds

that have characteristics that are clearly incompatible with the discovery requirements.

According to one analysis, half of the costly late-stage failures in drug development were attributed to poor pharmacokinetics (39%) and animal toxicity (11%) (van de Waterbeemd & Gifford, 2003). In silico approaches have increased the ability to predict and model the most relevant pharmacokinetic, metabolic and toxicity endpoints on absorption, distribution, metabolism, excretion (ADME) and toxicity data (together called ADMET data), thereby accelerating the drug discovery process. The ADMET properties that are addressed to improve the quality of screening libraries include: elimination of reactive or toxic groups; aqueous solubility and passive absorption; drug-like character given by Lipinski's Rule of Five (Lipinski, et al., 2001); blood-brain barrier penetration and CNS activity; and other ADME properties such as oral availability and metabolic stability.
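The rule-of-five filter mentioned above can be sketched as a simple predicate. The property values below are hypothetical, and the common convention of allowing at most one violation is assumed here:

```python
def passes_rule_of_five(mw, logp, h_donors, h_acceptors):
    """Lipinski's Rule of Five: flag compounds likely to show poor oral
    absorption. Here a compound 'passes' if it violates at most one
    criterion (a common convention, assumed for this sketch)."""
    violations = sum([
        mw > 500,          # molecular weight over 500 Da
        logp > 5,          # calculated logP over 5
        h_donors > 5,      # more than 5 hydrogen-bond donors
        h_acceptors > 10,  # more than 10 hydrogen-bond acceptors
    ])
    return violations <= 1

# Hypothetical library entries: (MW, logP, HBD, HBA)
library = [(430.5, 3.2, 2, 6), (610.7, 6.1, 4, 12), (180.2, 1.3, 1, 3)]
filtered = [c for c in library if passes_rule_of_five(*c)]  # drops the second entry
```

In practice the same pattern extends to the other filters listed (reactive groups, solubility, and so on), each expressed as an additional predicate over computed properties.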

[Figure 1.4 diagram: schematic compound distributions illustrating partitioning (compounds assigned to grid-like cells) versus clustering (compounds gathered into groups of similar molecules). Adapted from Nat. Rev. Drug Discov. 2002, 882-894.]

Fig 1.4 Clustering versus partitioning. The figure illustrates methodological differences

between clustering and partitioning. Coloured in red are active compounds that are added

to a source database before the analysis. Molecules found to be 'similar' to these

templates (candidates for biological testing) are shown in light grey.


1.3.2.2 Structure-based virtual screening (SBVS)

Given a protein structure, and/or its binding site, SBVS finds a new molecule that

changes the protein's activity. The success of SBVS is well documented (Maryanoff,

2004); it has contributed to the introduction of ~50 compounds into clinical trials and to

numerous drug approvals. Structure-based virtual screening has evolved over the past

decade, and many different variations of the basic methodology have been proposed.

Numerous reviews have been published on virtual screening. One of the more recent ones

(Waszkowycz, 2008), covers the current state-of-the-art and many of the practical aspects

of SBVS that are briefly discussed here.

Docking

With the rise in the number of structural targets with therapeutic potential, high-throughput docking is the primary hit identification tool used in drug discovery research

process (Kitchen, et al., 2004). Docking is also used later on during lead optimization,

when modifications to known active structures can quickly be tested in computer models

before compound synthesis. Furthermore, docking can also contribute to the analysis of

drug metabolism using structures such as cytochrome P450 isoforms.

Concepts: In the context of molecular modeling, docking means predicting the bioactive

conformation of a molecule in the binding site of a target structure. In essence, this is

equivalent to finding the global free energy minimum of the system consisting of the

ligand and the target. Docking is used as a tool in structure-based drug design as well as

in SBVS. The first algorithm developed to dock small molecules into the binding pocket

of a macromolecule, the DOCK algorithm, was published in 1982 by Kuntz et al.

(Moustakas, et al., 2006). In a review from 2007, more than sixty published docking

programs and thirty scoring functions were listed (Moitessier, et al., 2008). However, the

earliest and most widely used docking programs over the past years are probably DOCK

(Moustakas, et al., 2006), AutoDock (Morris, et al., 1998), GOLD (Verdonk, et al., 2003) and FlexX (Rarey, et al., 1996), and in recent years also e.g. Glide (Halgren, et al., 2004), ICM (Abagyan & Totrov, 1994), FRED (McGann, et al., 2003), and Surflex-Dock (Jain,

2003). All docking programs contain at least two components, a fitness or scoring

function, whose global minimum is intended to coincide with the global free energy


minimum of the target-ligand system, and a search method, used to sample the search

space in which the scoring function is optimized (Table 1A). This search space can be

very large, combining all ligand positions and rotations with all possible conformations

of the ligand and possibly also the target protein. The docking process involves the

prediction of ligand conformation and orientation (or posing) within a target binding site.

In general, there are two aims of docking studies: accurate structural modeling and

correct prediction of activity. However, the identification of molecular features that are

responsible for specific biological recognition, or the prediction of compound

modifications that improve potency, are complex issues that are often difficult to

understand and even more so, to simulate on a computer. In view of these challenges,

docking is generally devised as a multi-step process in which each step introduces one or

more additional degrees of complexity. The process begins with the application of

docking algorithms that 'pose' small molecules in the active site. Algorithms are

complemented by scoring functions that are designed to predict the biological activity

through the evaluation of interactions between compounds and potential targets. It should

also be noted that ligand-binding events are driven by a combination of enthalpic and

entropic effects, and that either entropy or enthalpy can dominate specific interactions.

This often presents a conceptual problem for contemporary scoring functions, because

most of them are much more focused on capturing energetic than entropic effects. Other

challenges are posed by various factors including limited resolution of crystallographic

targets, inherent flexibility, induced fit or other conformational changes that occur on

binding, and the participation of water molecules in protein-ligand interactions.

Search methods and molecular flexibility: Methods used for the treatment of ligand

flexibility can be divided into three basic categories viz. systematic methods (incremental

construction, conformational search, databases); random or stochastic methods (Monte

Carlo, genetic algorithms, tabu search); and simulation methods (molecular dynamics,

energy minimization) (Kitchen, et al., 2004). A summary of the search approaches implemented in widely used docking programs is described here briefly as well as listed in Table 1A.


Search techniques

Monte Carlo algorithm in its basic form: Generate an initial configuration of a ligand in an active site consisting of a random conformation, translation and rotation. Score the initial configuration, then generate a new configuration and score it. Use a Metropolis criterion (see below) to determine whether the new configuration is retained. Repeat the previous steps until the desired number of configurations is obtained.

Metropolis criterion

If a new solution scores better than the previous one, it is immediately accepted. If the

configuration is not a new minimum, a Boltzmann-based probability function is applied.

If the solution passes the probability function test, it is accepted; if not, the configuration

is rejected.
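The acceptance test above translates directly into code. The sketch below assumes an arbitrary temperature parameter and a stand-in scoring function (lower scores better); both are illustrative choices, not part of any particular docking program:

```python
import math
import random

def metropolis_accept(new_score, old_score, temperature, rng=random.random):
    """Metropolis criterion: always accept an improved (lower) score;
    otherwise accept with Boltzmann probability exp(-(new - old) / T)."""
    if new_score <= old_score:
        return True
    return rng() < math.exp(-(new_score - old_score) / temperature)

def mc_search(score_fn, random_pose, n_steps=1000, temperature=1.0):
    """Basic Monte Carlo loop: propose random poses, keep them via the
    Metropolis criterion, and track the best-scoring pose seen."""
    pose = random_pose()
    current_score = score_fn(pose)
    best_pose, best_score = pose, current_score
    for _ in range(n_steps):
        candidate = random_pose()       # new random conformation/translation/rotation
        cand_score = score_fn(candidate)
        if metropolis_accept(cand_score, current_score, temperature):
            pose, current_score = candidate, cand_score
            if cand_score < best_score:
                best_pose, best_score = candidate, cand_score
    return best_pose, best_score
```

Occasionally accepting worse configurations is what lets the search escape local minima of the scoring function.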

Molecular dynamics

Molecular dynamics is a simulation technique that solves Newton's equation of motion for an atomic system: Fi = mi ai, in which Fi is force, mi is mass and ai is acceleration. The force on each atom is calculated from the change in potential energy (usually based on molecular mechanics terms) between current and new positions: Fi = -dE/dri, in which ri is position. Atomic forces and masses are then used to determine atomic positions over a series of very small time steps: Fi = mi (d2ri/dt2), in which t is time. This provides a trajectory of changes in atomic positions over time. Practically, it is easier to determine time-dependent atomic positions by first calculating accelerations ai from forces and masses, then velocities vi from ai = dvi/dt and, ultimately, positions from velocities vi = dri/dt.
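The force-to-acceleration-to-velocity-to-position scheme just described can be illustrated with a minimal one-dimensional propagator. The harmonic potential and time step below are arbitrary choices for illustration, not a realistic force field:

```python
def simulate(r0, v0, mass, force_fn, dt, n_steps):
    """Minimal MD propagation following the text: F -> a = F/m -> v -> r,
    over small fixed time steps dt (a simple semi-implicit Euler scheme,
    used here only for illustration)."""
    r, v = r0, v0
    trajectory = [r]
    for _ in range(n_steps):
        a = force_fn(r) / mass  # acceleration from force and mass
        v = v + a * dt          # velocity update from a = dv/dt
        r = r + v * dt          # position update from v = dr/dt
        trajectory.append(r)
    return trajectory

# Example: harmonic potential E = 0.5*k*r^2, so F = -dE/dr = -k*r
k = 1.0
traj = simulate(r0=1.0, v0=0.0, mass=1.0,
                force_fn=lambda r: -k * r, dt=0.01, n_steps=1000)
```

Production MD codes use higher-order integrators (e.g. velocity Verlet) and three-dimensional vectors per atom, but the loop structure is the same.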

Genetic algorithms

Genetic algorithms are a class of computational problem-solving approaches that adapt

the principles of biological competition and population dynamics. Model parameters are

encoded in a 'chromosome' and stochastically varied. Chromosomes yield possible

solutions to a given problem and are evaluated by a fitness function. The chromosomes

that correspond to the best intermediate solutions are subjected to crossover and mutation

operations analogous to gene recombination and mutation to produce the next generation.


For docking applications, the genetic algorithm solution is an ensemble of possible ligand

conformations.
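As a toy illustration of these ideas (chromosome encoding, selection, crossover and mutation), the sketch below solves a trivial bit-counting problem; in a docking context the chromosome would instead encode a ligand pose and conformation, and the fitness function would be a docking score:

```python
import random

def genetic_search(fitness, n_genes=8, pop_size=20, generations=50,
                   mutation_rate=0.05, seed=0):
    """Toy genetic algorithm: binary 'chromosomes' undergo rank selection,
    single-point crossover and bitwise mutation (illustrative only)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # rank by fitness
        parents = pop[: pop_size // 2]        # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)   # single-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_genes):          # bitwise mutation
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy problem: maximise the number of set bits ("one-max")
best = genetic_search(fitness=sum)
```

Keeping the fitter half unchanged (elitism) guarantees the best solution found never degrades between generations.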

Tabu search

This search algorithm makes n small random changes to the current conformation. Rank

each change according to the value of the chosen fitness function. Determine which

changes are 'tabu' (that is, previously rejected conformations). If the best modification

has a lower value than any other accepted so far, accept it, even if it is in the 'tabu';

otherwise, accept the best 'non-tabu' change. Add the accepted change to the 'tabu' list

and record its score. Go to the first step.
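These steps can be sketched as follows; the integer toy problem and step function are hypothetical stand-ins for conformational changes scored by a docking function (lower scores better):

```python
import random

def tabu_search(score_fn, start, neighbour_fn, n_moves=8, n_iter=100, seed=0):
    """Tabu search sketch: generate n small random changes, rank them,
    reject 'tabu' (previously visited) states unless the move beats the
    best score seen so far (aspiration), and record accepted states."""
    rng = random.Random(seed)
    current, best = start, start
    best_score = score_fn(start)
    tabu = {start}
    for _ in range(n_iter):
        candidates = sorted((neighbour_fn(current, rng) for _ in range(n_moves)),
                            key=score_fn)            # rank the n random moves
        for cand in candidates:
            s = score_fn(cand)
            if s < best_score or cand not in tabu:   # aspiration criterion
                current = cand
                tabu.add(cand)                       # record on the tabu list
                if s < best_score:
                    best, best_score = cand, s
                break
    return best, best_score

# Toy example: minimise f(x) = (x - 7)^2 over integers, stepping by +/-1
f = lambda x: (x - 7) ** 2
step = lambda x, rng: x + rng.choice([-1, 1])
best_x, best_val = tabu_search(f, start=0, neighbour_fn=step)
```

The tabu list forces the walk to keep exploring new states instead of cycling back through conformations it has already scored.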

Protein flexibility

Various approaches have been applied to flexibly model at least part of the target,

including molecular dynamics and Monte Carlo calculations, rotamer libraries and protein ensemble grids. The idea behind using amino acid side-chain rotamer libraries is to model

protein conformational space on the basis of a limited number of experimentally observed

and preferred side-chain conformations. To reduce the number of discrete protein

conformations arising from combinations of rotamers, a dead-end elimination algorithm

is often used. This algorithm recursively removes side-chain conformations that do not

contribute to a minimum energy structure. Another method of treating protein flexibility

is to use ensembles of protein conformations (rather than a single one) as the target for

docking and to map these ensembles on a grid representation. One approach generates an

average potential energy grid of the ensemble, as first implemented in DOCK; another

maps various receptor potentials to each grid point and subsequently scores ligand

conformations against each set of receptor potentials.

Scoring

Scoring functions implemented in docking programs make various assumptions and

simplifications in the evaluation of modeled complexes and do not fully account for a

number of physical phenomena that determine molecular recognition - for example,

entropic effects. Essentially, three types or classes of scoring functions are currently


applied and include force-field-based, empirical and knowledge-based scoring

functions. All of these approaches attempt to approximate the binding free energy.

Empirical scoring functions are a weighted sum of individual ligand-receptor

interaction types commonly supplemented by penalty terms, such as the number of

rotatable ligand bonds. The weights correspond to the average free-energy contribution of

a single interaction of that type and are obtained by a regression analysis of a set of

receptor-ligand complexes. Interaction types include, for example, hydrogen bonds,

electrostatic interactions and hydrophobic interactions. The regression analysis requires

both known structures and binding constants, and so the available datasets are limited in

size and often feature similar ligands and receptors. This can result in a bias of empirical

scoring functions towards specific structural motifs. However, they are fast and have

proved their suitability, and are therefore implemented in several de novo design

programs. LUDI (Bohm, 1992) and ChemScore (Eldridge, et al., 1997) are popular implementations of empirical scoring functions.
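In code, such a function is simply a weighted sum plus penalty terms. The weights and interaction counts below are illustrative numbers only, not values taken from ChemScore, LUDI or any other published scoring function:

```python
def empirical_score(n_hbonds, lipo_contact, n_rot_bonds,
                    w_hb=-4.7, w_lipo=-0.17, w_rot=1.4, w0=5.4):
    """Empirical (regression-based) scoring sketch: a weighted sum of
    interaction counts plus a rotatable-bond penalty. The weights are
    ILLUSTRATIVE; in a real function they come from regression against
    receptor-ligand complexes with known binding constants."""
    return (w0
            + w_hb * n_hbonds        # hydrogen-bond contribution
            + w_lipo * lipo_contact  # lipophilic-contact contribution
            + w_rot * n_rot_bonds)   # flexibility (entropy) penalty

# A pose forming 3 hydrogen bonds, 40 units of lipophilic contact area
# and carrying 5 rotatable bonds (all hypothetical values):
score = empirical_score(n_hbonds=3, lipo_contact=40.0, n_rot_bonds=5)
```

Fitting the weights by regression is what ties the score to experimental binding free energies, and is also the source of the training-set bias discussed above.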

Knowledge-based scoring is grounded on a statistical analysis of ligand-receptor

complex structures. The frequencies of each possible pair of atoms in contact with each

other are determined. Interactions found to occur more frequently than would be

randomly expected are considered attractive; interactions that occur less frequently are

considered repulsive. Only structural information is necessary to derive these frequencies

so that a greater number of structures can be included in the analysis. As the available

structures are also more diverse than those with known binding affinities, less bias is

expected compared with empirical scoring functions. Popular implementations of such functions include the potential of mean force (PMF) (Muegge, 2000) and DrugScore (Gohlke et al., 2000), which also includes solvent-accessibility corrections to pair-wise

potentials.
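The observed-versus-expected frequency comparison described above is the inverse-Boltzmann idea behind PMF-style potentials, and can be sketched directly; the contact counts used here are invented for illustration.

```python
import math

# Minimal knowledge-based pair potential: a contact observed more often
# than random chance scores negatively (attractive), one observed less
# often scores positively (repulsive). Counts are assumed values.

def pair_potential(observed, expected):
    """Score for one atom-type pair; negative means attractive."""
    return -math.log(observed / expected)

attractive = pair_potential(200, 100)  # seen twice as often as expected
repulsive = pair_potential(50, 100)    # seen half as often as expected
```

Summing such pair terms over all ligand-receptor atom pairs in a modeled complex gives the total knowledge-based score.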

Force-field-based scoring is based on the molecular mechanics terms which express the

energy of the system. Molecular mechanics force fields usually quantify the sum of two

energies, the receptor-ligand interaction energy and internal ligand energy (such as steric

strain induced by binding). Interactions between ligand and receptor are most often

described by using van der Waals and electrostatic energy terms. The van der Waals


energy term is given by a Lennard-Jones potential function. Electrostatic terms are

accounted for by a Coulombic formulation with a distance-dependent dielectric function

that lessens the contribution from charge-charge interactions. The functional form of the

internal ligand energy is typically very similar to the protein-ligand interaction energy,

and also includes van der Waals contributions and/or electrostatic terms. Various force-field scoring functions are based on different force-field parameter sets. For example, G-Score is based on the Tripos force field (Kramer et al., 1999) and AutoDock on the Amber force field (Weiner et al., 1986). Recent extensions of force-field-based scoring

functions include a torsional entropy term for ligands in G-Score and the inclusion of

explicit protein-ligand hydrogen-bonding terms in GOLD (Verdonk et al., 2003) and AutoDock (Morris et al., 1998).
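The Lennard-Jones plus distance-dependent-dielectric Coulomb combination described above can be sketched for a single atom pair. All parameter values here are generic illustrations, not taken from any specific force field.

```python
# Force-field-style interaction energy for one atom pair: a 12-6
# Lennard-Jones term plus a Coulomb term with a distance-dependent
# dielectric eps(r) = 4r, which lessens charge-charge contributions
# at long range. epsilon, sigma and the charges are assumed values.

COULOMB_K = 332.06  # converts e^2/Angstrom to kcal/mol

def pair_energy(r, epsilon=0.1, sigma=3.4, q1=0.3, q2=-0.3):
    """Interaction energy (kcal/mol) of two atoms separated by r Angstroms."""
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)  # van der Waals
    elec = COULOMB_K * q1 * q2 / ((4.0 * r) * r)                 # eps(r) = 4r
    return lj + elec
```

Summing this over all ligand-receptor atom pairs, plus analogous intramolecular ligand terms, gives the force-field score; note how the repulsive r^-12 wall dominates at short separations.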

Consensus scoring is a recent trend in drug discovery, developed against the backdrop of the imperfections of current scoring functions. Consensus scoring combines information

from different scores to balance errors in single scores and improve the probability of

identifying 'true' ligands. However, the potential value of consensus scoring might be limited if terms in different scoring functions are significantly correlated, which could amplify calculation errors rather than balance them.
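One simple consensus scheme, rank-by-rank averaging, can be sketched as follows; the compound names and scores below are invented for illustration.

```python
# Rank-by-rank consensus scoring sketch: each scoring function ranks the
# compounds, and the consensus orders compounds by their average rank so
# that an error in any single function is averaged out.

def consensus_rank(score_tables):
    """score_tables: list of {compound: score} dicts, lower score = better.
    Returns compounds sorted by mean rank across all scoring functions."""
    ranks = {}
    for table in score_tables:
        for pos, cpd in enumerate(sorted(table, key=table.get), start=1):
            ranks.setdefault(cpd, []).append(pos)
    return sorted(ranks, key=lambda c: sum(ranks[c]) / len(ranks[c]))

force_field = {"A": -9.1, "B": -7.5, "C": -8.0}
empirical   = {"A": -6.2, "B": -6.8, "C": -5.0}
knowledge   = {"A": -3.3, "B": -2.1, "C": -3.0}
best = consensus_rank([force_field, empirical, knowledge])[0]
```

Compound A is ranked first by two of the three functions and second by the third, so it tops the consensus list even though the empirical function alone would have preferred B.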

Flexible ligand-search methods
  Random/stochastic: AutoDock (MC), MOE-Dock (MC, TS), GOLD (GA), PRO_LEADS (TS)
  Systematic: DOCK (incremental), FlexX (incremental), Glide (incremental), Hammerhead (incremental), FLOG (database)
  Simulation: DOCK, Glide, MOE-Dock, AutoDock, Hammerhead

Types of scoring functions
  Force-field-based: D-Score, G-Score, GOLD, AutoDock, DOCK
  Empirical: LUDI, F-Score, ChemScore, SCORE, Fresno, X-SCORE
  Knowledge-based: PMF, DrugScore, SMoG

Table 1A. List of various flexible-ligand search methods and scoring functions implemented in different docking programs.

De novo design

Concepts: Basically, three questions have to be addressed by a de novo design program: first, how to assemble the candidate compounds; second, how to evaluate their potential quality; and last, how to sample the search space effectively. De novo design is

faced with the problem of combinatorial explosion, i.e., the number of different element

types and the way they can be linked together is huge. Furthermore, there are not only a

large number of theoretically possible topologies but also a variety of conformations for a

single topology. This renders a simple enumeration of all solutions, and thus an exhaustive search, impossible. All algorithmic decisions of a de novo design program are ultimately

assessed by the quality of their outcome, which, in turn, crucially depends on a

meaningful reduction of the search space (Schneider & Fechner, 2005).

All information that is related to the ligand-receptor interaction forms the primary target

constraints for candidate compounds. Such constraints can be gathered both from the


three-dimensional receptor structure and from known ligands of the particular target. If

the former is consulted, the design strategy is receptor-based; in the latter case, it is

ligand-based. Receptor-based design starts with the determination of the interaction sites.

Interaction sites are typically subdivided into hydrogen bonds, electrostatic and

hydrophobic interactions. Receptor groups capable of hydrogen bonding are of special interest owing to the strongly directional nature of the two interacting partners, hydrogen-bond acceptor and donor, which often form key interaction sites. These allow the assignment of ligand atom positions with a complementary hydrogen-bond type within a small region of space and a defined orientation. Receptor-based de novo design

uses a variety of methods to deduce interaction sites from the three-dimensional structure

of the binding pocket. Besides rule- and grid-based methods, Multiple Copy Simultaneous Search (MCSS) is also used, in which functional groups are energy-minimized with a force field within the binding pocket. Groups are discarded if the interaction energy

between them and the protein is above a certain threshold. An MCSS run yields a set of

pre-docked fragments that can be further investigated to choose the most promising ones.

The same outcome can be achieved by the use of docking software, which is indeed the

initial step in some de novo design programs. Rule- and grid-based methods only derive

the primary target constraints: the outcome of these two methods is a map of the receptor

that pinpoints favorable interaction sites of a ligand. MCSS and the pre-docking of

fragments effectively amalgamate the derivation procedure and the first steps of structure

assembly, i.e., favorable positions of specific functional groups in the binding site are not

only indicated but are already placed at these positions. This placement of chemical

groups provides a starting point for the next step in de novo design which is the assembly

of a complete ligand.

Receptor-based scoring

The application of a de novo design program yields more than one candidate compound.

These structures can either emerge from a single run of the program or from several runs

with one candidate compound per run. Scoring functions rank the generated structures

and thereby suggest which structures are the most promising ones. Moreover, during a

run they also guide the design process through the search space by assigning fitness

values to the sampled space. Basically, these quality-assessment approaches can be


subdivided into three different types of receptor-based scoring functions: explicit force-field methods, empirical scoring functions and knowledge-based scoring functions. The scoring is based on principles similar to those discussed earlier for docking.

Structure sampling

The basic building blocks for the assembly of candidate structures can be either single

atoms or fragments. Atom-based approaches are superior to fragment-based methods in

terms of the structural variety that can be generated. But this increase in potential solutions makes it harder to find suitable candidate compounds among the many that are possible. Fragment-based design strategies, on the other hand, significantly reduce the

size of the search space. This reduction can be called a 'meaningful reduction' if

fragments are used that commonly occur in drug molecules. There are several general

concepts of structure sampling: linking, growing, lattice-based sampling, random

structure mutation, transitions driven by molecular dynamics simulations, and graph-based sampling.

Combinatorial search strategies

De novo design has to tackle the issue of combinatorial explosion, which leads to problems that are NP-hard. Combinatorial search algorithms offer a practical solution by giving up the guarantees of completeness and/or optimality. A combinatorial search algorithm is able to solve instances of combinatorial problems by reducing the size of the search space and by exploring it efficiently. Heuristic algorithms represent one way to achieve this.

Breadth-first and depth-first search: Several de novo design programs implement either a

breadth-first strategy or a depth-first strategy. The depth-first strategy retains only one of

a variety of possible partial solutions at each level of the search space graph until an end

state is reached. Even if the highest-scoring partial solution is selected each time, it is not

guaranteed to find the overall best solution, but the search space is reduced significantly.

A breadth-first strategy retains all partial solutions at one level of the search space graph

and explores, sequentially, other levels until each of these paths reaches an end state.

During a breadth-first search all nodes are systematically examined so that identifying the

optimal solution is guaranteed.
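The trade-off between the two strategies can be illustrated on a toy tree of partial solutions. In de novo design each node would be a partially assembled ligand; the node scores here are invented.

```python
# Greedy depth-first keeps only the best child at each level; breadth-first
# retains all partial solutions and is guaranteed to find the optimum.

tree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "a1": [], "a2": [], "b1": [], "b2": [],
}
score = {"root": 0, "a": 5, "b": 4, "a1": 6, "a2": 9, "b1": 10, "b2": 1}

def depth_first_greedy(node="root"):
    """Follow the single highest-scoring child until a leaf is reached."""
    while tree[node]:
        node = max(tree[node], key=score.get)
    return node

def breadth_first_best():
    """Expand every node level by level, then pick the best leaf."""
    frontier, leaves = ["root"], []
    while frontier:
        node = frontier.pop(0)
        if tree[node]:
            frontier.extend(tree[node])
        else:
            leaves.append(node)
    return max(leaves, key=score.get)
```

Here greedy depth-first follows root to a to a2 (score 9) and misses the global optimum b1 (score 10), which the exhaustive breadth-first traversal finds, at the cost of examining every node.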


Monte Carlo and the Metropolis criterion: Random sampling, also called Monte Carlo

Search, can also be combined with a Metropolis criterion. In this case, after each

structure-modification step, the change is evaluated to decide whether it is accepted or

rejected. If the modification results in a fitter candidate compound, it is immediately

accepted. If, on the other hand, the modification yields a less fit candidate compound, it

can still be accepted with a probability that is based on the scoring function difference

between the modified and unmodified structure and a random number.
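The acceptance rule described above can be sketched directly. Scores are treated as quantities to be minimized, and T is an assumed "temperature" parameter controlling how readily worsening modifications are accepted.

```python
import math
import random

# Metropolis acceptance rule (minimal sketch): improving moves are always
# accepted; worsening moves are accepted with probability exp(-delta/T).

def metropolis_accept(old_score, new_score, T=1.0, rng=random.random):
    delta = new_score - old_score
    if delta <= 0:                        # fitter candidate: accept immediately
        return True
    return rng() < math.exp(-delta / T)   # worse candidate: accept by chance
```

Passing a custom rng callable makes the rule deterministic for testing; in a real Monte Carlo run the default random.random is used and T may be lowered over time, as in simulated annealing.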

Evolutionary algorithms: Evolutionary algorithms are based on the ideas described by

Charles Darwin in 1859. They are population-based optimization algorithms that mimic

biological evolution with the genetic operators 'reproduction', 'mutation' and

'recombination' (crossover). Mutation introduces new information into a population,

whereas recombination exploits the information inherent in the population. Candidate

compounds are represented by individuals in a population so that a set of solutions is

obtained in a single run of the algorithm. Genetic algorithms are a type of evolutionary algorithm and were discussed briefly earlier as part of search techniques. The discussion

about strategies in VS would be incomplete without discussing fragment-based screening

for lead generation by high throughput X-ray crystallography/NMR.
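The evolutionary loop described above, with reproduction, recombination and mutation, can be sketched on a toy problem. Individuals are bit strings standing in for encoded candidate compounds, and the fitness, simply the number of 1-bits, is an invented stand-in for a scoring function.

```python
import random

# Minimal evolutionary-algorithm sketch: selection keeps the fittest half,
# one-point crossover recombines parents, and a single bit flip mutates
# each child. All sizes and the fitness function are illustrative.

def evolve(pop_size=20, length=12, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sum, reverse=True)
        parents = pop[: pop_size // 2]         # reproduction: keep fittest half
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, length)     # recombination: one-point crossover
            child = p1[:cut] + p2[cut:]
            child[rng.randrange(length)] ^= 1  # mutation: flip one bit
            children.append(child)
        pop = parents + children
    return max(pop, key=sum)                   # best candidate found
```

Because the fittest parents survive unchanged each generation, the best fitness in the population never decreases, while mutation keeps injecting new information.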

1.3.3 Fragment-based screening by high throughput X-ray crystallography/NMR

Fragment-based approaches to lead discovery (Hartshorn et al., 2005) have recently become important as a complement to traditional high-throughput screening for discovering new leads in drug discovery programs. The approach involves

screening libraries of compounds, referred to as fragments, that are significantly smaller and functionally simpler than drug molecules. Despite their low affinities (>100 μM), fragments possess high ratios of free energy of binding to molecular size and may therefore represent suitable starting points for evolution into good-quality lead compounds.

Moreover, it is possible to optimize fragments to high quality leads of relatively low

molecular weight that possess better drug-like properties (Bemis & Murcko, 1996; Bemis

& Murcko, 1999). The methodology includes design of a virtual fragment library, virtual screening, synthesis of the hits identified by VS, cocktailing, structure solution through X-ray/NMR techniques and finally optimization, where de novo design strategies are used


for stitching potential binding fragments together with the help of suitable linkers within

the binding pocket of the enzyme followed by an iterative cycle of synthesis and structure

solution through X-ray/NMR techniques. This new approach is believed to have many

advantages over conventional screening such as more efficient sampling of chemical

space using fewer compounds and a more rapid hit-to-lead optimization phase.
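The "ratio of free energy of binding to molecular size" mentioned above is commonly expressed as ligand efficiency, and a quick calculation shows why weakly binding fragments can still be attractive starting points. The affinities and atom counts below are illustrative only.

```python
import math

# Ligand-efficiency sketch: LE = -dG / N_heavy, with the binding free
# energy estimated from a dissociation constant as dG = RT ln(Kd).

RT = 0.593  # kcal/mol at roughly 298 K

def ligand_efficiency(kd_molar, n_heavy_atoms):
    """Binding free energy per heavy atom (kcal/mol per atom)."""
    dG = RT * math.log(kd_molar)   # negative for Kd < 1 M
    return -dG / n_heavy_atoms

frag = ligand_efficiency(2e-4, 12)  # a 200 uM fragment with 12 heavy atoms
lead = ligand_efficiency(1e-8, 40)  # a 10 nM lead with 40 heavy atoms
```

Despite binding four orders of magnitude more weakly, the small fragment achieves the higher efficiency per atom, which is why it can be grown or linked into a low-molecular-weight lead with good drug-like properties.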

1.3.4 Synergies between structure-based and ligand-based virtual screening

SBVS and LBVS have often been considered almost mutually exclusive, with LBVS used primarily in the absence of protein target structure(s) and SBVS used if target structure(s) are available. Especially when a target protein structure at high atomic

resolution is available, SBVS is often considered the first-choice strategy, ignoring possible LBVS alternatives. However, there are numerous drawbacks to the strict

separation between ligand- and structure-based CADD methods. Most ligand-based

methods try to conserve the 3D arrangement of functional groups on a scaffold believed to be important for the activity of existing ligands, precluding the discovery of novel ligands that make different interactions with the target protein. If induced fit of

both ligand and protein is evaluated in docking methods, the calculation becomes more expensive

computationally. Large-scale changes in the protein upon ligand binding are often ignored. Structure-based methods are further limited by the availability of detailed structures of the target, with and without ligands complexed to it, ideally in different conformations. Therefore, SBVS and LBVS should not be applied independently but rather in

concert to increase the chances of finding novel hits. Some software approaches such as

SDOCKER have begun integrating both strategies by including ligand similarity as a part

of the scoring functions used in the docking algorithm (Muegge & Oloff, 2006).

Integrated CADD methodologies where SBDD is combined with pharmacophores use

crystal structure complexes to produce structure-based pharmacophores representing the

ligand features that are involved in interactions with the target protein, as well as the

space around the ligand occupied by the protein (Griffith et al., 2005), yielding

information about the binding cavity size and all relevant interactions. The protein-ligand

complexes can thus yield a "superligand" which can also be viewed as a pharmacophore.

Various types of novel pharmacophores can thereafter be compared and investigated.


1.4 Network pharmacology

The dominant paradigm in drug discovery is the concept of designing maximally

selective ligands to act on individual drug targets, the 'one gene, one drug, one disease'

paradigm (Drews, 2000) mentioned earlier. However, many effective drugs act via

modulation of multiple proteins rather than single targets. Advances in systems biology

are revealing a phenotypic robustness and a network structure; network biology analysis predicts that, since deletion of individual nodes in most cases has little effect on disease networks, modulating multiple proteins may be required to perturb robust phenotypes. This finding challenges the dominant assumption of single-target drug

discovery and has led to the emergence of a new drug discovery approach called

polypharmacology. This new appreciation of the role of polypharmacology is thought to

have significant implications for tackling the two major sources of attrition in drug

development: efficacy and toxicity. Integrating network biology and polypharmacology

holds the promise of expanding the current opportunity space for druggable targets.

However, the rational design of polypharmacology faces considerable challenges in the

need for new methods to validate target combinations and optimize multiple structure-activity relationships while maintaining drug-like properties. Advances in these areas are

creating the foundation of the next paradigm in drug discovery: network pharmacology

(Hopkins, 2008).

1.5 Future challenges

Over the past decades, the persistently high clinical attrition rate has challenged the current drug discovery paradigm. In particular, there has been a worrying rise in late-stage

attrition in phase 2 and phase 3 (Kola & Landis, 2004). Currently, the two most important reasons for attrition in clinical development are (i) lack of efficacy and (ii) clinical safety or toxicology, each accounting for about 30% of failures. An examination of

the root causes of why compounds undergo attrition in the clinic is very instructive and

helps in the identification of strategies and tactics to reduce these rates and thereby


improve the efficiency of drug development. The discussion here will be restricted to in

silico methods in drug design that can be improved further.

When one analyzes the future challenges of protein-ligand docking, one must keep in mind the governing principles by which protein receptors recognize, interact and associate

with molecular substrates and inhibitors, which is of paramount importance in drug

discovery efforts. Protein-ligand docking aims to predict and rank the structure(s) arising

from the association between a given ligand and a target protein of known 3D structure.

Despite the breathtaking advances in the field over the last decades and the widespread

application of docking methods, several shortcomings still exist. In particular, protein flexibility, a critical aspect for a thorough understanding of the principles that guide ligand binding in proteins (induced fit), and imperfections in scoring functions remain major hurdles in current protein-ligand docking efforts that need to be more efficiently

accounted for. Despite the size of the field, the principal types of search algorithms and

scoring functions as well as traditional limitations associated with molecular docking

must be addressed.

Fast docking protocols can be combined with accurate but more costly molecular

dynamics techniques to predict more reliable ligand-macromolecule complexes. The idea

of this combination lies in their complementary strengths. Docking simulations are used

to explore the vast conformational space in a short period of time, allowing the scrutiny

of large collections of drug-like compounds at a reasonable cost. Molecular dynamics can treat both ligand and protein flexibility; the latter, in particular, is usually treated in a limited way in docking protocols. In molecular dynamics simulations, the effect of

explicit water molecules can be investigated directly, and accurate binding free energies

can be obtained.

As the capacities for biological screening and chemical synthesis have dramatically

increased, so have the demands for large quantities of early information on absorption,

distribution, metabolism, excretion (ADME) and toxicity data (van de Waterbeemd &

Gifford, 2003). Various medium and high-throughput in vitro ADMET screens are

therefore now in use. In addition, there is an increasing need for good tools for predicting

these properties to serve two key aims - first, at the design stage of new compounds and


compound libraries so as to reduce the risk of late-stage attrition; and second, to optimize

the screening and testing by looking at only the most promising compounds. Moreover,

principles of scaffold-hopping can be applied to generate diverse scaffolds that reduce the

risk of late-stage failure (Zhang & Muegge, 2006).

It is generally recognized that drug discovery and development are very time- and resource-consuming processes. There is an ever-growing effort to apply computational

power to the combined chemical and biological space in order to streamline drug

discovery, design, development and optimization. In the biomedical arena, computer-aided or in silico design is being utilized to expedite and facilitate hit identification and hit-to-lead selection, to optimize the absorption, distribution, metabolism, excretion and toxicity profile, and to avoid safety issues. Regulatory agencies as well as the pharmaceutical industry are

actively involved in the development of computational tools that will improve the effectiveness and efficiency of the drug discovery and development process, decrease the use of animals, and

increase predictability. It is expected that the power of CADD will grow as the

technology continues to evolve.

The first part of this introduction discusses the principles that have been used for virtual

ligand as well as target-based screening and profiling to predict biological activity. The

aim of the second part of this chapter is to illustrate some of the current applications of

in silico methods for drug-design and the future challenges that need to be addressed.

Towards the end, we conclude that the in silico drug discovery paradigm is still evolving

and presents a rich array of opportunities that will assist in expediting the discovery of

new targets, and ultimately lead to compounds with predicted biological activity for the

drug targets.

1.6 References

Abagyan, R. and Totrov, M. (1994) Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J Mol Biol, 235, 983-1002.

Bajorath, J. (2002) Integration of virtual and high-throughput screening. Nat Rev Drug Discov, 1, 882-894.


Bemis, G.W. and Murcko, M.A. (1996) The properties of known drugs. 1. Molecular frameworks. J Med Chem, 39, 2887-2893.

Bemis, G.W. and Murcko, M.A. (1999) Properties of known drugs. 2. Side chains. J Med Chem, 42, 5095-5099.

Blundell, T.L., et al. (1972) Three-dimensional atomic structure of insulin and its relationship to activity. Diabetes, 21, 492-505.

Bohm, H.J. (1992) LUDI: rule-based automatic design of new substituents for enzyme inhibitor leads. J Comput Aided Mol Des, 6, 593-606.

Davie, E.W., et al. (1991) The coagulation cascade: initiation, maintenance, and regulation. Biochemistry, 30, 10363-10370.

Drews, J. (2000) Drug discovery: a historical perspective. Science, 287, 1960-1964.

Eldridge, M.D., et al. (1997) Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aided Mol Des, 11, 425-445.

Gohlke, H., et al. (2000) Knowledge-based scoring function to predict protein-ligand interactions. J Mol Biol, 295, 337-356.

Goodford, P.J., et al. (1980) The interaction of human haemoglobin with allosteric effectors as a model for drug-receptor interactions. Br J Pharmacol, 68, 741-748.

Griffith, R., et al. (2005) Combining structure-based drug design and pharmacophores. J Mol Graph Model, 23, 439-446.

Halgren, T.A., et al. (2004) Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J Med Chem, 47, 1750-1759.

Hartshorn, M.J., et al. (2005) Fragment-based lead discovery using X-ray crystallography. J Med Chem, 48, 403-413.

Hightower, M. and Kallas, E.G. (2003) Diagnosis, antiretroviral therapy, and emergence of resistance to antiretroviral agents in HIV-2 infection: a review. Braz J Infect Dis, 7, 7-15.

Hopkins, A.L. (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol, 4, 682-690.

Jain, A.N. (2003) Surflex: fully automatic flexible molecular docking using a molecular similarity-based search engine. J Med Chem, 46, 499-511.

Jorgensen, W.L. (2004) The many roles of computation in drug discovery. Science, 303, 1813-1818.

Kapetanovic, I.M. (2008) Computer-aided drug discovery and development (CADDD): in silico-chemico-biological approach. Chem Biol Interact, 171, 165-176.

Kaufmann, S.H. (2008) Paul Ehrlich: founder of chemotherapy. Nat Rev Drug Discov, 7, 373.

Kitchen, D.B., et al. (2004) Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov, 3, 935-949.


Kola, I. and Landis, J. (2004) Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov, 3, 711-715.

Kramer, B., et al. (1999) Evaluation of the FlexX incremental construction algorithm for protein-ligand docking. Proteins, 37, 228-241.

Lipinski, C.A., et al. (2001) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev, 46, 3-26.

Maryanoff, B.E. (2004) Inhibitors of serine proteases as potential therapeutic agents: the road from thrombin to tryptase to cathepsin G. J Med Chem, 47, 769-787.

McGann, M.R., et al. (2003) Gaussian docking functions. Biopolymers, 68, 76-90.

Moitessier, N., et al. (2008) Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go. Br J Pharmacol, 153 Suppl 1, S7-26.

Morris, G.M., et al. (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem, 19, 1639-1662.

Moustakas, D.T., et al. (2006) Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des, 20, 601-619.

Muegge, I. (2000) A knowledge-based scoring function for protein-ligand interactions: probing the reference state. Perspect Drug Discov Des, 20, 99-114.

Muegge, I. and Oloff, S. (2006) Advances in virtual screening. Drug Discovery Today: Technologies, 3, 405-411.

Oprea, T.I. and Matter, H. (2004) Integrating virtual screening in lead discovery. Curr Opin Chem Biol, 8, 349-358.

Rarey, M., et al. (1996) A fast flexible docking method using an incremental construction algorithm. J Mol Biol, 261, 470-489.

Russ, A.P. and Lampel, S. (2005) The druggable genome: an update. Drug Discov Today, 10, 1607-1610.

Schneider, G. and Fechner, U. (2005) Computer-based de novo design of drug-like molecules. Nat Rev Drug Discov, 4, 649-663.

Stahl, M., et al. (2006) Integrating molecular design resources within modern drug discovery research: the Roche experience. Drug Discov Today, 11, 326-333.

van de Waterbeemd, H. and Gifford, E. (2003) ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov, 2, 192-204.

Verdonk, M.L., et al. (2003) Improved protein-ligand docking using GOLD. Proteins, 52, 609-623.

von Itzstein, M., et al. (1993) Rational design of potent sialidase-based inhibitors of influenza virus replication. Nature, 363, 418-423.

Waszkowycz, B. (2008) Towards improving compound selection in structure-based virtual screening. Drug Discov Today, 13, 219-226.


Weiner, S.J., et al. (1986) An all atom force field for simulations of proteins and nucleic acids. J Comput Chem, 7, 230-252.

Willett, P. (2006) Similarity-based virtual screening using 2D fingerprints. Drug Discov Today, 11, 1046-1053.

Zhang, Q. and Muegge, I. (2006) Scaffold hopping through virtual screening using 2D and 3D similarity descriptors: ranking, voting, and consensus scoring. J Med Chem, 49, 1536-1548.
