3D QSAR in Drug Design. Vol2

406

Transcript of 3D QSAR in Drug Design. Vol2

Page 1: 3D QSAR in Drug Design. Vol2
Page 2: 3D QSAR in Drug Design. Vol2

3D QSARin Drug Design

Ligand-Protein Interactions and Molecular Similarity

Page 3: 3D QSAR in Drug Design. Vol2

QSAR =Three-Dimensional Quantitative Structure Activity Relationships

VOLUME 2

The titles published in this series are listed at the end of this volume.

Page 4: 3D QSAR in Drug Design. Vol2

3D QSARin Drug DesignVolume 2Ligand-Protein Interactions andMolecular Similarity

Edited by

Hugo KubinyiZHF/G, A30, BASF AG, D-67056 Ludwigshafen, Germany

Gerd Folkers ETH-Zürich, Department Pharmazie, Winterthurer Strasse 190, CH-8057 Zürich, Switzerland

Yvonne C. Martin Abbott Laboratories, Pharmaceutical Products Division, 100 Abbott Park Rd.,Abbott Park, IL 60064-3500, USA

KLUWER ACADEMIC PUBLISHERSNew York / Boston / Dordrecht / London / Moscow

Page 5: 3D QSAR in Drug Design. Vol2

eBook ISBN: 0-306-46857-3Print ISBN: 0-792-34790-0

©2002 Kluwer Academic PublishersNew York, Boston, Dordrecht, London, Moscow

Print ©1998 Kluwer Academic PublishersLondon

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.comand Kluwer's eBookstore at: http://ebooks.kluweronline.com

Page 6: 3D QSAR in Drug Design. Vol2

Preface

relationships in cases where the biological targets, or at least their 3D structures, are stillunknown.

This project would not have been realized without the ongoing enthusiasm of Mrs. Elizabeth Schram, founder and former owner of ESCOM Science Publishers, who initi-ated and strongly supported the idea of publishing further volumes on 3D QSAR in Drug Design. Special thanks belong also to Professor Robert Pearlman, University of Texas, Austin, Texas, who was involved in the first planning and gave additional support and input. Although during the preparation of the chapters Kluwer Academic Publishers acquired ESCOM, the project continued without any break or delay in thework. Thus, the Editors would also like to thank the new publisher, especially Ms.Maaike Oosting and Dr. John Martin, for their interest and open-mindedness, whichhelped to finish this project in time.

Lastly, the Editors are grateful to all the authors. They made it possible for these volumes to be published only 16 months after the very first author was contacted. It is the authors’ diligence that has made these volumes as complete and timely as wasVolume 1 on its publication in 1993.

Hugo Kubinyi, BASF AG, Ludwigshafen, GermanyGerd Folkers, ETH Zürich, SwitzerlandYvonne C. Martin, Abbott Laboratories, Abbott Park, IL, USA

October 1997

Page 7: 3D QSAR in Drug Design. Vol2

Contents

Preface vii

Part I. Ligand–Protein Interactions

Progress in Force-Field Calculations of Molecular Interaction Fields and 3Intermolecular Interactions

Tommy Liljefors

Comparative Binding Energy Analysis 19Rebecca C. Wade, Angel R. Ortiz and Federico Gigo

Receptor-Based Prediction of Binding Affinities

A Priori Prediction of Ligand Affinity by Energy Minimization

Rapid Estimation of Relative Binding Affinities of Enzyme Inhibitors

35Tudor I. Oprea and Girland R. Marshall

63M. Katharine Holloway

85M. Rami Reddy, Velarkad N. Viswanadhan and M. D. Erion

Binding Affinities and Non-Bonded Interaction Energies 99Ronald M.A. Knegtel and Peter D.J. Grootenhuis

Molecular Mechanics Calculations on Protein-Ligand Complexes 115Irene T. Weber and Robert W. Harrison

Part II. Quantum Mechanical Models and Molecular DynamicsSimulations

Some Biological Applications of Semiempirical MO Theory 131Bernd Beck and Timothy Clark

Density-Functional Theory and Molecular Dynamics: A New Perspective forSimulations of Biological Systems

161

Wanda Andreoni

Density-functional Theory Investigations of Enzyme-substrate Interactions 169Paolo Carloni and Frank Alber

V

Page 8: 3D QSAR in Drug Design. Vol2

Preface

Significant progress has been made in the study of three-dimensional quantitativestructure-activity relationships (3D QSAR) since the first publication by RichardCramer in 1988 and the first volume in the series. 3D QSAR in Drug Design. Theory,Methods and Applications, published in 1993. The aim of that early book was tocontribute to the understanding and the further application of CoMFA and relatedapproaches and to facilitate the appropriate use of these methods.

Since then, hundreds of papers have appeared using the quickly developing techniquesof both 3D QSAR and computational sciences to study a broad variety of biologicalproblems. Again the editor(s) felt that the time had come to solicit reviews on publishedand new viewpoints to document the state of the art of 3D QSAR in its broadestdefinition and to provide visions of where new techniques will emerge or new applica-tions may be found. The intention is not only to highlight new ideas but also to show theshortcomings, inaccuracies, and abuses of the methods. We hope this book will enableothers to separate trivial from visionary approaches and me-too methodology from inno-vative techniques. These concerns guided our choice of contributors. To our delight, ourcall for papers elicited a great many manuscripts. These articles are collected in twobound volumes, which are each published simultaneously in two related series: they formVolumes 2 and 3 of the 3D QSAR in Drug Design series which correspond to volumes9-11 and 12-14, respectively, in Perspectives in Drug Discovery and Design. Indeed, the field is growing so rapidly that we solicited additional chapters even as the early chapterswere being finished. Ultimately it will be the scientific community who will decide if thecollective biases of the editors have furthered development in the field.

The challenge of the quantitative prediction of the biological potency of a new mole-cule has not yet been met. However, in the four years since the publication of the firstvolume, there have been major advances in our understanding of ligand-receptor inter-action s, molecular similarity , pharmacophore s, and macromolecular structures. Although currently we are well prepared computationally to describe ligand-receptorinteractions, the thorny problem lies in the complex physical chemistry of inter- molecular interactions. Structural biologists, whether experimental or theoretical in approach, continue to struggle with the field’s limited quantitative understanding of the enthalpic and entropic contributions to the overall free energy of binding of a ligand to a protein. With very few exceptions, we do not have experimental data on the thermo- dynamics of intermolecular interactions. The recent explosion of 3D protein structures helps us to refine our understanding of the geometry of ligand-protein complexes. However, as traditionally practiced, both crystallographic and NMR methods yield static pictures and relatively coarse results considering that an attraction between two non-bonded atoms may change to repulsion within a tenth of an Ångstrom. This is well below the typical accuracy of either method. Additionally, neither provides information about the energetics of the transfer of the ligand from solvent to the binding site.

Page 9: 3D QSAR in Drug Design. Vol2

Preface

With these challenges in mind, one aim of these volumes is to provide an overview ofthe current state of the quantitative description of ligand-receptor interactions. To aidthis understanding, quantum chemical methods, molecular dynamics simulations andthe important aspects of molecular similarity of protein ligands are treated in detail inVolume 2. In the first part ‘Ligand-Protein Interactions,’ seven chapters examine theproblem from very different points of view. Rule- and group-contribution-based ap-proaches as well as force-field methods are included. The second part ‘Quantum Chemical Models and Molecular Dynamics Simulations’ highlights the recent ex-tensions of ab initio and semi-empirical quantum chemical methods to ligand-proteincomplexes. An additional chapter illustrates the advantages of molecular dynamics simulations for the understanding of such complexes. The third part ‘PharmacophoreModelling and Molecular Similarity’ discusses bioisosterism. pharmacophores andmolecular similarity, as related to both medicinal and computational chemistry. These chapters present new techniques, software tools and parameters for the quantitative description of molecular similarity.

Volume 3 describes recent advances in Comparative Molecular Field Analysis and related methods. In the first part ‘3D QSAR Methodology. CoMFA and Related Approaches’, two overviews on the current state, scope and limitations, and recent progress in CoMFA and related techniques are given. The next four chapters describe improvements of the classical CoMFA approach as well as the CoMSIA method, an alternative to CoMFA. The last chapter of this part presents recent progress in PartialLeast Squares (PLS) analysis. The part ‘Receptor Models and Other 3D QSARApproaches’ describes 3D QSAR methods that are not directly related to CoMFA, i.e., Receptor Surface Models, Pseudo-receptor Modelling and Genetically Evolved Receptor Models. The last two chapters describe alignment-free 3D QSAR methods. The part ‘3D QSAR Applications’ completes Volume 3. It gives a comprehensive overview of recent applications but also of some problems in CoMFA studies. The first chapter should give a warning to all computational chemists. Its conclusion is that all investigations on the classic corticosteroid-binding globulin dataset suffer from seriouserrors in the chemical structures of several steroids, in the affinity data and/or in their results. Different authors made different mistakes and sometimes the structures used in the investigations are different from the published structures. Accordingly it is not poss-ible to make any exact comparison of the reported results! The next three chapters should be of great value to both 3D QSAR practitioners and to medicinal chemists, as they provide overviews on CoMFA applications in different fields, together with a detailed evaluation of many important CoMFA publications. Two chapters by Ki Kim and his comprehensive list of 1993-1997 CoMFA papers are a highly valuable source of information.

These volumes are written not only for QSAR and modelling scientists. Because of their broad coverage of ligand binding, molecular similarity, and pharmacophore andreceptor modelling, they will help synthetic chemists to design and optimize new leads,especially to a protein whose 3D structure is known. Medicinal chemists as well as agri-cultural chemists, toxicologists and environmental scientists will benefit from the de-scription of so many different approaches that are suited to correlating structure–activity

Page 10: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 11: 3D QSAR in Drug Design. Vol2

Part I

Ligand–ProteinInteractions

Page 12: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 13: 3D QSAR in Drug Design. Vol2

Progress in Force-Field Calculations of Molecular Interaction Fields and Intermolecular Interactions

Tommy LiljeforsDepartment of Medicinal Chemistry, Royal Danish School of Pharmacy, Copenhagen, Denmark

1. Introduction

In the force-field (molecular mechanics) method [1], a molecule is considered as a col-lection of atoms held together by classical forces. These forces are described by para-meterized potential energy functions of structural features like bond lengths, bondangles, torsional angles, etc. The energy of the molecule is calculated as a sum of terms(Eq. 1).

(1)The first four terms in Eq. 1 describe the energies due to deviations of structural featuresand non-bonded distances from their ‘ideal’ or reference values, Eelectrostatic is the energy contribution due to attraction or repulsion between charges (or dipoles) and some force-fields use a special hydrogen-bonding term, Ehydrogen Bond. Additional terms are required— e.g. for calculations of vibrational frequences and thermodynamic quantities.

Force-field calculations [l–3] are used in many research areas aiming at an under-standing, modelling and subsequent exploiting of structure-activity/property relation-ships. Such areas include conformational analysis, pharmacophore identification, liganddocking to macromolecules, de novo ligand design, comparative molecular field analy-sis (CoMFA) and identification of favorable binding sites from molecular interaction fields. Although ab initio quantum chemical computational methodology [4] today iscompetitive with experiments in determining a large number of molecular properties, force-fields are commonly used due to the prohibitive amounts of computer time required for high-level ab initio calculations on series of drug-sized molecules and onlarge molecular systems as those involved in ligand-protein interactions. For cal-culations of, for example, molecular structures and conformational energies force-fieldcalculations may give results in excellent agreement with experiments, provided that the force-field parameters involved in the calculations have been accurately determined [1,5].

The force-fields used for calculations of molecular interaction fields and inter-molecular interactions vary from very simple force-fields as those commonly used in CoMFA 3D QSAR studies [6] to the more sophisticated force-field used for the cal-culation of molecular interaction fields by the GRID method [3,7-1 l], and the complex force-fields required for calculations of complexation energies and geometries of general intermolecular interactions between complete molecules [2].

The aim of this chapter is to review and discuss some recent developments and evalu-ations of force-fields used for the calculations of molecular interaction fields and inter-molecular energies and geometries. The review is not meant to be exhaustive, but some

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design, Volume 2. 3-17.© 1998 Kluwer Academic Publishers. Printed in Great Britain.

E = Estretching + Ebending + Etorsion + Evan der Waals + Eelectronic+ Ehydrogen band + other terms

Page 14: 3D QSAR in Drug Design. Vol2

Tommy Liljefors

recent studies have been selected to illustrate important current directions in the de-velopment of force-fields and in their use in connection with 3D QSAR studies and cal-culations of molecular interaction fields and intermolecular interactions. Thus, relationships between the quality of the force-field and the results of a CoMFA 3DQSAR study will be discussed in the light of some recent investigations. New develop-ments in the computations of molecular interaction fields by the GRID method will bedescribed, and recent developments in the calculations of intermolecular energies andgeometries for the biologically important cation-π and π–π systems will be reviewed.

2.

The force-field commonly used in a CoMFA QSAR study is very simple and includesonly two terms, a Lennard-Jones 6-12 potential for the van der Waals (vdW) inter-actions and a Coulomb term for the electrostatic interactions with the probe. Considering the large number of successful CoMFA QSAR studies which have beenreported [12], these two terms seem to be sufficient in most cases. It may be expectedthat a hydrophobic field and/or an explicit hydrogen-bond potential may, in some cases,be advantageous. However, more experience with the inclusion of such fields isnecessary before a firm conclusion may be drawn.

No systematic comparison of different force-fields in connection with CoMFAstudies has been undertaken. However, some interesting case studies have recently been reported. Folkers et al. have reviewed the results of CoMFA QSAR studies employingthe GRID force-field and the standard (Tripos) CoMFA force-field [13].The two force-fields are significantly different. For instance, the atomic charges employed in thecalculation of the electrostatic interactions are significantly different. The GRID force-field also includes a sophisticated hydrogen-bonding potential [8–10]. Folkers et al.concluded that the GRID force-field generally gives better results in terms of Q2 andstandard error of prediction than the standard CoMFA force-field for an unchargedmethyl probe in cases where only the steric field contributes to the correlation. Whensteric and electrostatic fields contribute more equally to the correlation, the force-fields tested give very much the same results.

Two recent studies, discussed below, further illuminate the question of the force-fielddependence in CoMFA QSAR studies.

2.1.

Instead of using the standard Lennard-Jones 6-12 potential, Kroemer and Hecht [14],mapped all atoms in the target molecules directly onto a predefined grid and for eachatom checked if the center of the atom is inside or outside cubes defined by the grid.Depending on the outcome of this check, different values (0.0 or 30.0) were assigned tothe grid point at the corresponding lattice intersection. These atom-based indicator vari-ables correspond to what is obtained by using a simple hard-sphere approximation (with energy cutoff) for the calculations of vdW interactions. The indicator variables were used as the ‘steric field’ in a CoMFA QSAR study on five sets of dihydrofolate reduc-

CoMFA QSAR: Is there a Force-Field Dependence?

The steric field: Is an explicit vdW potential sequired?

4

Page 15: 3D QSAR in Drug Design. Vol2

Progress in Force-Field Calculations of Molecular Interaction Fields and Intermolecular Interactions

tase inhibitors and the results were compared to those obtained by using the standardCoMFA method. A similar approach has previously been reported by Floersheim et al. [15]. However, in this study, the vdW surfaces of the molecules were used for the com-putations of a ‘shape potential’ and values of 0 or 1 were assigned to grid points depending on whether the points were located within or outside the surface.

In terms of Q² and predictive r², the study by Kroemer and Hecht shows that the useof atom-based indicator variables gives results at least as good as those obtained by using the 6-12 Lennard-Jones potential in the standard CoMFA method. Thus, in the context of a CoMFA QSAR analysis, a step function as the one employed by Kroemerand Hecht, gives a description of the variation in the shapes of the molecules in the dataset of similar quality (or usefulness) as the Lennard-Jones 6-12 potential. An interesting point is that, in contrast to standard CoMFA QSAR, the use of a finer grid(< 2 Å) in conjunction with the indicator variables improved the results significantly.This study indicates that in general, there seems to be little to gain from fine-tuning theLennard-Jones 6-12 potential for CoMFA QSAR studies or introducing more accurate functions as, for instance, exponential functions [1] for the calculation of the vdW field.However, considering the results obtained by Folkers et al. for the GRID versus Tripos CoMFA force-fields described above, an exception to this may be cases for which the van der Waals field is the strongly dominating contributor to the correlation.

2.2.

The accurate calculation of the electrostatic contribution is clearly the most problematicpart in the calculation of intermolecular energies and geometries by force-field methods [16]. The results of such calculations, in general, very much depend on the quality of the charge distribution employed and also on a proper balance between the electrostatic term and the rest of the force-field. Is there a similar strong dependence of the charge distribution used on the results of a CoMFA QSAR study?

In a CoMFA analysis of 49 substituted benzoic acids, Kim and Martin found thatAMI partial charges performed better than STO-3G ESPFIT charges in reproducingHammett σ constants [17]. The results were also found to be superior to those obtainedby using regression analysis of the charges. Folkers et al. [13] studied the influence ofthe charge of the probe and the charge distribution of the target molecules for a set of24 N²-phenylguanines. The probes used were a sp³ carbon probe with charge +1, andoxygen probes with charges –0.85, –0.5 and –0.2, respectively. Three sets of atom- centered partial charges were used for the phenylguanines: (i) Gasteiger-Marsili charges [18] (default Tripos CoMFA charges); (ii) Mulliken charges calculated by semi-empirical quantum chemical methodology (the Hamiltonian was unfortunately not reported); and (iii) charges obtained by linear least-squares fitting of atom-centeredpoint charges to the electrostatic potential calculated from the semiempirical wave func- tion (ESP-charges [19]). The vdW field was calculated by the standard Lennard-Jones6-12 function. The use of the different probes gave significantly different Q² values and contributions of the fields, but the different atomic charge schemes resulted in similar statistical parameters.

The eIectrostatic field: what quality of the charge distribution is required?

5

Page 16: 3D QSAR in Drug Design. Vol2

Tommy Liljefors

Recently, Kroemer et al. [20] have extended this type of study to include a largevariety of different methods to calculate the electrostatic field of 37 ligands at the ben-zodiazepine inverse agonist site. A total number of 17 different charge schemes wasevaluated, including empirical charges (Gasteiger-Marsili), charges obtained from semi- empical quantum chemical methods (MNDO, AM1, PM3) and from ab initio quantumchemical methods at the Hartree-Fock level, employing three different basis sets(HF/STO-3G, HF/3-2IG*, HF/6-31G*). The semiempirical and ab initio atom-centeredpartial charges were calculated by a Mulliken population analysis. as well as by linearleast-squares fitting of atom-centered point charges to reproduce the calculated electro-static potential (ESP-charges). In addition, the molecular electrostatic potentials cal-culated by using the HF/STO-3G, HF/3-21G* and HF/6-31G* basis sets were directlymapped onto the CoMFA grid. In all cases, a sp³ carbon probe with a charge of +1 wasused. The vdW field was calculated in the standard way by a Lennard-Jones 6-12potential and the results were compared to standard CoMFA QSAR.

All 17 charge schemes resulted in good models in terms of Q² (0.61–0.77)and stan-dard error ofprediction (0.76–0.94). Although the different charge distribution schemeswere obtained at very different levels of theory, in the context of CoMFA QSAR studiesand the resulting statistical parameters there is hardly any significant difference. Theelectrostatic fields calculated by the various methods were, in many cases, shown to bestrongly correlated. However, even with a low correlation between fields e.g.between the semiempirically calculated Mulliken charges and the directly mapped elec-trostatic potentials employing the STO-3G basis set (r² = 0.62–0.66) — the results in terms of Q² arc very similar (0.61-0.72).

In contrast to the similarity of the statistical parameters in the study discussed above,the contour maps obtained by using different charge distribution schemes may differsignificantly. This has consequences for the use of contour maps in terms of a physico- chemical interpretation of intermolecular interactions and for the use of such maps inthe design of new compounds. Different charge distribution schemes may give contourmaps which lead the design process in different directions. The charge scheme which isthe ‘best’ one in this respect cannot unambiguously be selected on the basis of thestatistical parameters obtained.

3. Recent Developments of the GRID Method for the Calculation of MolecularInteraction Fields

The GRID method developed by Goodford [7–11] is designed for calculation of inter-actions between a probe (a small molecule or molecular fragment) and a macromolecu-lar system of known structure in order to find energetically favorable sites for the probe.A large number of probes including multi-atom probes are provided. The GRID methodis very carefully parameterized by fitting experimental data for proteins and small mole-cule crystals. In addition to its primary use to find favorable probe sites in macro-molecules, the interaction fields calculated by the GRID method have also been usedextensively in 3D QSAR studies as a replacement of the electrostatic and steric fields ofstandard Tripos CoMFA. Wade has reviewed the GRID method and its use for ligand

6

Page 17: 3D QSAR in Drug Design. Vol2

Progress in Force-Field Calculatons of Molecular Interaction Fields and Intermolecular Interactions

design, ligand docking and 3D QSAR [21]. Recently, Goodford has reviewed theGRID force-field and its use in multivariate characterization of molecules for QSAR analysis [11].

In calculations of molecular interactions fields, a static target is generally used. Withsome exceptions (see below), this has also been the case for calculations using theGRID method. In the most recent version of GRID (version 15) [22], new featurestaking target flexibility into account have been introduced. These important new fea-tures are discussed below. In addition, a new hydrophobic probe developed byGoodford and included in the GRID method is described. It should be noted that theseadditions to the GRID method are all very new and no publications in which thesefeatures have been used have yet appeared.

3.1. Identification of energetically favorable probe sites in hydrogen-bond interactions

Hydrogen bonding is extremely important in ligand-protein interactions and thereforeGoodford and co-workers have spent much effort in developing a sophisticated andcarefully parameterized methodoly for the calculation of hydrogen-bonding inter- actions [8–10]. GRID has always taken torsions about the C−O bond in aliphatic al- cohols into account. Thus, in the interactions between an aliphatic alcohol and a probewhich can accept or donate a hydrogen-bond, the hydrogen atom and the lone pairs ofthe hydroxyl group are allowed to move without any energy penalty in order to find themost favorable binding energy between the probe and the target. For instance, for theinteraction between methanol and a water probe, virtually identical interaction energiesare calculated for a staggered and eclipsed probe position with respect to the methylhydrogens in methanol (Fig. 1).

However, the energy difference between eclipsed and staggered methanol is cal-culated to be 1.4 kcal/mol by ab initio HF/6-31G* calculations [23]. If an eclipsedprobe position results in an eclipsed conformation of methanol, there should be a significant energy penalty for the eclipsed arrangement shown in Fig. 1, relative to thestaggered one. To investigate this, ab initio calculations (HF-6-31G*) were undertakenfor the two hydrogen-bonded complexes by locking the O–O–C–H dihedral angle (indi- cated by asterisks in Fig. 1) to 180 (staggered) and 0 degrees (eclipsed), respectively, but optimizing all other degrees of freedom, including the position of the hydrogen

Fig. I. Staggered and eclipsed positions of H 2O in its hydrogen-bond interaction with methanol as the hydrogen-bond donor.

7

Page 18: 3D QSAR in Drug Design. Vol2

Tommy Liljefors

Fig. 2. Staggered and eclipsed positions of H2 O in its hydrogen-bond interaction with methanol as the hy-drogen-bond acceptor Note that methanol prefers to be in a staggered conformation in both complexes. Theasterisks mark the atoms of the dihedral angle locked in the calculations.

atom involved in the hydrogen-bond [23]. The energy difference between the eclipsedand staggered arrangements in Fig. 1 was calculated to be 1.1 kcal/mol, only slightlylower than that for the corresponding conformations in methanol itself. Due to the strong directionality of the hydrogen-bond in this case, an eclipsed position of the water oxygen also leads to an eclipsed conformation of the methanol part of the complex. Thus, it can be concluded that there is an energy penalty of approximately 1 kcal/mol for the hydrogen-bonded eclipsed complex in Fig. 1 relative to the staggered one. If thehydrogen atom marked by an asterisk in the methanol part is replaced by a methy1group, this energy penalty increases to about 2 kcal/mol. The results are very similar forthe interaction between a carbonyl group and an aliphatic alcohol as hydrogen-bonddonor.

When methanol is acting as a hydrogen-bond acceptor. the situation is different. Thetwo hydrogen-bonding arrangements shown in Fig. 2 are calculated to have very similarenergies [23]. The reason for this is that due to the delocalized character of the lonepairs on the methanol oxygen atom, methanol can be staggered in both hydrogen-

Fig. 3. GRID maps for ethonol interacting with a carbonyl oxygen probe. The left map (a) shows favorable interaction sites calculated by GRID version 14 or earlier, whereas (b) shows the corresponding mapcalculated by GRID version 15. The hydroxy hydrogen atom is shown in one of its rotemeric slates.

8

Page 19: 3D QSAR in Drug Design. Vol2

Progress in Force-Field Calculations s of MoIecular Interaction Fields and Inermolecular Interactions

bonded complexes. Thus, the energy increasing eclipsing in the methanol part of the complex is avoided.

GRID version 15 includes a new energy function which takes the findings discussedabove into account. Probes which can both accept and donate a hydrogen-bond (e.g.H2O) may in the new version turn around compared to earlier versions of GRID andgive rise to a new interaction geometry. For interactions with probes which cannot turn around (e.g. the carbonyl probe), the hydrogen-bond energy may by significantly dimin-ished in unfavorable probe positions and thus give significant changes to the GRIDcontour map compared to earlier GRlD versions. This is illustrated in Fig. 3 which dis- plays GRID maps for the interaction between ethanol and a carbonyl oxygen probe cal-culated by GRID with version number ≤ 14 (Fig. 3a) and GRID version 15 (Fig. 3b).

Recently, Mills and Dean have reported the results of an extensive investigation of hydrogen-bond interactions in structures in the Cambridge Structural Database [24].This study provides 3D distributions of complementary atom about hydrogen-bondinggroups. Scatterplots and cumulative distribution functions clearly demonstrate thathydrogen-bond accepting groups interacting with a hydroxyl group prefer a staggeredposition, as in the left structure in Fig. 1, while hydrogen-bond donating groups aremuch less localized. This is in nice agreement with the findings discussed above.

3.2. Target flexibility

An important limitation in calculations of molecular interaction fields is that flexibilityof the target is not taken into account. In 3D QSAR studies, ‘side-chain’ conformationsin the target molecules are often more or less arbitrarily assigned when the bioactiveconformations are unknown and this may give misleading results. In ligand-proteininteractions, amino acid side chains may adopt different conformations i n order to betteraccommodate or better interact with a ligand. Of course, the analysis (GRID orCoMFA) may be repeated for several different conformations, but this is prohibitivelytime-consuming, especially in the case of side-chain conformations in proteins.

As discussed above, the GRID method has always taken some flexibility of the targetinto account. However, side chains in the target have remained fixed in their inputconformations. A very interesting new development of the GRID method is the imple-mentation of algorithms which take conformational flexibility of amino acid side chainsinto account, allowing them to be attracted or repelled by the probe as the probe is moving around [22]. The new algorithm is primarly intended for proteins but it may. insome cases, also be used for ‘side-chains’ in non-protein molecules. The amino acidscurrently supported are arginine, aspartate. asparagine, glutamate, glutamine, isoleucine,leucine, lysine, methionine, serine, threonine and valine.

The algorithm works by dividing the target molecule into an inflexible core and aflexible side chain on an atom basis. This is automatically done by the program allowingfor differences in the local environment of the side chain. However, the user may over-ride the default by forcing atoms into the core or out of the core into the flexible side-chain part.

9

Page 20: 3D QSAR in Drug Design. Vol2

Tommy Liljefors

So far, there is only limited experience with the ‘flexible side-chain’ option, but it isclear that this new feature in the GRID method is a very important step forward in thecalculations of molecular interaction fields. This feature should significantly improvethe description of energetically favorable probe locations and should be very useful in,for instance, ligand design and ligand docking.

3.3. The hydrophobic probe

To the library of GRID probes a hydrophobic probe has recently been added (probename DRY [22]). The hydrophic probe is designed to find locations near the targetsurface where the target molecule may favorably interact with another molecule in anaqueous environment. The energy expression for the hydrophic probe is shown in Eq. 2.

E = EENTROPY + E LJ – EHB (2)

The basic assumptions behind the construction of the probe is that the water ordering responsible for the entropic contribution to the hydrophobic effect is due to hydrogen-bonds between water molecules at nonpolar (undisturbed) target surfaces. On binding ofa hydrophobic molecule, the ordered water molecules are displaced and transfered intoless ordered (higher entropy) bulk water. This is an energetically favorable process(EENTROPY). Dispersion interactions between the two hydrophobic molecules adds to thefavorable energy (ELJ). The ordering of water at the nonpolar surface may be disturbedby polar target atoms which form hydrogen-bonds to water molecules. This decreasesthe order (increases the entropy) of the water molecules at the surface and, con-sequently, diminishes the hydrophobic effect (EHB). In addition, there are breaking ofhydrogen-bonds which is enthalpically unfavorable. EENTROPY is calculated from theassumption that the ordered water molecules at a nonpolar surface will form on theaverage three out of the theoretically four possible hydrogen-bonds per oxygen atom.This gives four permutations for three out of four possible hydrogen-bonds and EENTROPY may thus be cal culated by Eq. 3.

EENTROPY = –RT 1n4 (3)

R is the gas constant (1.987 × 10–3 kcal/mol/K) and T = 308 K. This entropic con-tribution is assumed to be constant at an undisturbed surface.

Dispersion interactions (ELJ) are calculated by using the Lennard-Jones function anda water probe. EHB which measures the hydrogen-bond interactions between watermolecules and polar functional groups of the target is calculated by using the hydrogen-bond function of the GRID force-field.

The hydrophobic probe, in general, gives wide and shallow minima. This implies thatthe variance of the hydrophobic energies is small. Therefore, PLS methods whichcluster grid points into chemically meaningful regions [25] should be employed if thehydrophobic fields are to be used as input to CoMFA/PLS. Scaling of the fields should not be done. As the energies obtained by using different GRID probes are alreadyscaled, further scaling of GRID fields is inappropriate [ 26].

10

Page 21: 3D QSAR in Drug Design. Vol2

Progress in Force-Field Calculations of Molecular Fields and Intermolecular Interactions

4.

Aromatic ring systems are of immense importance in drugs. Bemis and Murcko ana- lyzed 5210 known drugs i n the Comprehensive Medicinal Chemistry database [27].Among the 41 most common frameworks, 29 contain aromatic rings and benzene(phenyl) was found to be the most common one.

The benzene ring system, the prototypical aromatic system. has very unique proper-ties. It does not have a permanent dipole moment and is, in that sense, a nonpolar mole- cule. However, it has a strong quadrupole moment. The electrostatic potential ofbenzene leads to strong attraction of a cation to the π-face of the ring system (cation–πinteractions: Fig. 4). Thus, benzene and related aromatic systems may be considered as‘hydrophobic anions’ [28].

Amino acids containing aromatic side-chains as phenylalanine, tyrosine and trypto-phan play an important role in protein structure and function. Thus, the understandingof intermolecular interactions involving aromatic systems and the ability to model such interactions are of crucial importance for computational studies of ligand-receptor com-plexes. These types of interactions have received considerable attention during recentyears. Significant progress in the understanding of the nature of cation-π interactionsand π–π interactions and the computational problems involved in the force- field cal- culations of such interactions have recently been made.

4.1. Cation–π interactions

The binding of cations to the π-face of benzene involves large binding energies. For instance, the binding enthalpy of Li+-benzene in the gas phase is 38 kcal/mol and thecorresponding binding enthalpy of NH4

+-benzene is 19 kcal/mol [28]. This strong inter- action with cations in fact makes the benzene ring competitive with a water molecule inbinding to cations. The K+-H2 O and the K+-benzene complexes have binding enthalpies of 18 and 19 kcal/mol, respectively [28]. It has been demonstrated that cyclophanchosts made up of aromatic rings are able to strongly bind cations including quaternaryammonium ions in aqueous solution [29].

For a long time, it was generally believed that the quaternary positively chargedammonium group of acetylcholine was interacting with an anionic site in acetylcholineesterase. However, i t is now clear that acetylcholine in its binding to its esterase inter-acts via cation–π interactions with aromatic ring systems, in particular to tryptophan

Force-Field Calculations of Cation–π and π–π Complexes

Fig. 4. Cation-π interactions

11

Interaction

Page 22: 3D QSAR in Drug Design. Vol2

Tommy Liljefors

(W84) [30]. The importance of cation–π interactions in chemistry and biology has recently been reviewed by Dougheity [28].

Standard high-quality force-field methods seriously underestimate cation–π inter- actions. For example, the interaction enthalpy of NH4

+-benzene is underestimated by asmuch as 5.8 kcal/mol even if high-quality atom-centered electrostatic potential derived charges are used, charges which accurately reproduce the quadrupole moment ofbenzene [3 1].

Recently, high-level ab initio calculations have provided insight into the nature ofcation–π interactions. On the basis of MP2/6-311+G** calculations, which give very good agreement with experiments (binding enthalpies as well as free energies) for theammonium ion-benzene complex. Kim et al. [32] conclude that, in addition tocharge-quadrupole interactions, correlation effects (dispersion energies) and polar-ization of the benzene electron distribution by the cation give very important con- tributions to the complexation energy. The failure of standard molecular mechanicsforce-fields to handle cation–π interactions can, to a large part. be attributed to the factthat polarization effects are not taken into account.

In general, molecular mechanics force-fields include only two-body additive potentialfunctions. Non-additive effects as polarization have only recently been included.Caldwell and Kollman have developed a molecular mechanics force field which explic-itly includes polarization and also includes non-additive exchange-repulsion [33]. Thisforce-field excellently reproduces the complexation enthalpy of alkali cations-benzeneand ammonium ion-benzene. Thus, as has previously been shown in other cases. forinstance in the calculations of hydration free energies of ions [34], the inclusion ofnon-additive effects such as polarization is necessary for force-field calculations ofintermolecular interactions involving ions.

In a recent study, Mecozzi et al. [35] show that the variation in cation (Na+) bindingabilities to a series of aromatic systems surprisingly well correlates to the electrostaticpotential at the position of the cation i n the complex. Thus, virtually all variation inbinding energy is reflected in the electrostatic term.Cation–π interactions are not only limited to full cations, as ammonium ions andalkali cations but also polar molecules as H2O, NH 3 and other molecules with partial positive charges display this type of interaction with the π-face of benzene, albeit with weaker interaction energies [36,37]. This makes the cation–π type of interactions of great importance for the understanding of ligand–protein interactions.

The strong attractive interactions between cations and the π-face of benzene andrelated aromatic ring systems have been used as an argument for a proposed stabil-ization of the putative ion-pair interaction between the ammonium group of aminergicneurotransmitters and an aspartate side-chain in their receptors. In models of thebinding site of these receptors, the aspartate residue is surrounded by highly conserved aromatic residues [38]. However, it has recently been shown on the basis of ab initiocalculations that the stabilization of an ion pair by benzene is very much smaller than the stabilization of an isolated cation [39]. In fact, the stabilization provided is notsufficient to prevent hydrogen transfer from the ammonium ion to the carboxylate ion giving the intrinsically more stable amine-carboxylic acid complex. However, a pro-

12

Page 23: 3D QSAR in Drug Design. Vol2

Progress in Force-Field Calculations of Molecular Interaction Fields and intermolecular Interactions

perly located water molecule in conjunction with a dielectric continuum may providethe required stabilization [40].

The progress made in the understanding of cation–π interactions during recent yearshas provided developers of molecular interaction fields and force-fields for calculationsof inermolecular interactions with much valuable insight.

4.2. π–π interactions

The benzene-benzene interaction. the prototypical π–π interaction, is of great import- ance due to its role in the stability of proteins and in ligand-protein binding. Although itwas pointed out more than 20 years ago that benzene crystal data could not be fittedwithout introducing electrostatics [41],in the context of force-field calculations benzenecontinued for a long time to be considered as a nonpolar molecule interacting with other molecules or molecular Cragments only via non-bonded vdW interactions. However,computational problems with such a model became increasingly evident [42].

A T-shaped type of arrangement of aromatic rings (Fig. 5) is strongly preferred inprotein structures [43–45]. High-level ab initio calculations show the T-shaped complexto be significantly more stable than the stacked one [46]. However, a ‘parallel-displaced’ or tilted ‘parallel-displaced’ structure may be slightly more stable than theT-shaped one [42,47]. Although attractive vdW non-bonded interactions (dispersion) favor the stacked structure. the electro tic interactions (quadrupole-quadrupole) whichare attractive for the T-shaped arrangement but repulsive for the stacked one determine the preference for the T-shape.

Recently, Chipot et al. [46] employed potential of mean force calculations on the benzene dimer and toluene dimer in the gas phase and in water. They find the T-shaped benzene dimer in gas phase to be lower in free energy than the stacked structure. Interestingly, in the toluene case, their simulations indicate that the stacked arrangement is slightly preferred. The results of the force-field calculations are supported by high- level ab initio calculations. The same difference in orientational preferences are found

Fig. 5. Geometries of the benzene dimer.

13

Page 24: 3D QSAR in Drug Design. Vol2

Tommy Liljefors

in simulations for aqueous solution. This leads to the provocative question if thebenzene dimer really is a good model for π–π interactions in proteins. The authors con-clude that the rarity of stacked arrangements of phenylalanine side-chains in protein structures should be explained by other factors than quadrupole-quadrupole inter-actions. They propose that steric and other interactions with neighboring functional groups should additionally be considered.

From a force-field point of view, a very interesting point in this study is the demonstra- tion that atom-centered point charges obtained by least squares fitting to ab initio calcu-lated (6-31G**) electrostatic potentials accurately reproduce quadrupole moments and even higher-order multipole moments of benzene and toluene. Thus, such charges aresufficiently accurate to treat quantitatively the important qadrupole-qudrupole interactionsin the dimers. If this also holds for other types of aromatic systems remains to be studied.

5. Summary and concluding remarks

Recent case studies on the force-field dependence of the results obtained by the CoMFA3D QSAR methodology indicate that, in general, the use of a higher-quality force-field does not seem to lead to a significantly better 3D QSAR model in terms of statistical para- meters (Q² and standard error of prediction). In particular, the statistical parameters seemto be quite insensitive to the quality of the charge distribution used for the calculations of the electrostatic field. However, the contour plots derived from the analysis may show significant differences. Whether a force-field based on a higher level of theory also pro-duces contour plots better suited for the design of new analogs remains to be studied.

Significant developments of the force-fields in Goodford’s GRID method for the cal-culation of molecular interactions fields have recently been made. The most important new feature is that the flexibility of target side-chains may optionally be taken into account. In addition, the calculation of directional preferences in hydrogen-bonding to aliphatic alcohols has been improved and a hydrophobic probe has been included in the GRID library of probes.

Progress in the understanding of intermolecular interactions of the biologically important cation–π and π–π type has demonstrated that such interactions can be quan- titatively modelled by force-field calculations. However, explicit inclusion of polar-ization is required in the cation–π case. It has also been demonstrated that using atom-centered point charges obtained by least-squares fitting to ab initio calculatedelectrostatic potentials, quadrupole moments and also higher-order multipole momentsof benzene and toluene can be accurately reproduced. Atom-centered point charges, canin this case, be used to quantitatively calculate dimer properties for which quadrupole-quadrupole interactions are important.

Acknowledgements

I thank Dr Peter Goodford for valuable discussions on the GRID method. This work was supported by grants from the Danish Medical Research Council and the LundbeckFoundation, Copenhagen.

14

Page 25: 3D QSAR in Drug Design. Vol2

Progress iii Force-Field Calculations of Molecular Interaction Fields and Intermolecular Interactions

References

1.

2.

Burkert, U. and Allinger, N.L., Molocular mechanics, ACS Monograph 177. American ChemicalSociety. Washington D.C., 1982.Siebel, G.L. and Kollman, PA., Molecular mechanics and the Modeling of drug structures, InComprehensive medicinal chemistry. Vol. 4, Hansch C.. Sammes P.G., Taylor. J.B. and Ramsden. C.A.(Eds.), Pergamon Press, Oxford, 1990. pp. 125-138.Goodford, P., The properties of force fields. In Sanz. F., Giraldo, J. and Manual, F. ( Eds.) QSAR and molecular modeling: Concepts, computational tools and biological applications, Prous SciencePublishers. Barcelona. 1995, pp. 199–205.Hehre, W.J., Radom, L., v.R. Schleyer. P. and Pople, J.A., Ab initio molecular orbital theory, JohnWiley & Sons. New York, 1986. (a) Gundertofte. K.. Liljefors, T., Norrby, P.-O. and Pettersson, I.. A comparison of conformational energies calculated by several molecular mechanics methods. J. Comput. Chem., 17 (1996) 429–449.(b) Pettersson, I and Lil jefors,T. , Molecular Mechanics calculated confromational energies of organic molecules, In Lipkowitz, K.B. and Boyd. D.B. (Eds.) Reviews in Computational Chemistry, Vol. 9.VCH Publishera, Inc., New York, 1996. pp. 167-189.Cramer, III, R.D., Patterson, D.E. and Bunce, J.D., Comparative molecular field analysis ( CoMFA ):l. Effect of shape on binding of seriods to carrier proteins, J. Am. Chem. Soc., 110 (1998) 5959–5967. Goodford. P.J., A computational procedure for determining energetically favorable binding sites on

Boobbyer, D.N.A., Goodford, P.J., McWhinnie, P.M. and Wade, R.C., New hydrogen-bond potentials for use in determining Energetically favourable binding sites on molecules of known sturcture, J. Med.

Wade, R.C., Clark K. and Goodford, P.J., Further development of hydrogen-bond functions for use in determing energetically favorable binding sites on molecules of known structure: 1. Ligand probe groups with the ability to form two hydrogen bonds, J. Med. Chem., 36 (1993) 140–147. Wade. R.C., Clark. K. and Goodford, P.J., Further development of hydrogen-bond functions for use in determing energetically favourable binding sites on molecule of known structure: 2. Ligand probe group with the ability to form more than two hydrogen bonds. J. Med. Chem., 36 (1993) 148–156.Goodford, P., Multivariate characterization of molecules for QSAR analysis, J . Chemometrics. 10 (1996) 107–111.Cramer, III, R.D., DePriest. SA., Patterson, D.E. and Hecht, P., The developing practice of comparative molecular field analysis, In Kubinyi, H. (Ed.) 3D QSAR in drug design: Theory, methods and applica-tions, ESCOM Science Publuishers, Leiden. 1993. pp. 443–485. Folkers. G.. Merz, A. and Rognan. D.. CoMFA: Scope and limitations In Kubinyi, H. (Ed.) 3D QSARin drug design: Theory, methods and applications. ESCOM Science Publishers. Leiden, 1993.pp. 583-618.Kroemer. R.T. and Hecht, P. Replacement of steric 6–12 potential-derived interaction enrgies by atom- based indicator variables in CoMFA leads to models of higher consistency, J. Comput.-Aided Mol.Design. 9 (1995) 205-212.Floersheim. P.. Nozulak, J. and Weber, H.P., Experience with comparative molecular fields analysis, InWermuth. C.G. (Ed.) Trends in QSAR and molecular modelling 92 (Proceedings of the 9th EuropeanSymposium on Strocture-Activity Relationships: QSAR and Molecular Modeling). ESCOM SciencePublishers, Leiden. 1993. pp. 227-232.Berendsen, H.J.C., Electrostatic interactions, In van Gunsteren. W.F., Weiner, P.K. and Wilkinson. A.J.(Eds.) Computer simulation of biomolecular systems: Theoretical and experimental applications. Vol. 2. ESCOM Science Publishers, Leiden. 1993. pp. 161–181.Kim. K. H. and Martin. Y.C. Direct predictions of linear free energy substituent effects from 3D struc- tures using comparative molecular field analysis: 1. Electronic effects of substituted benzoic acids, J. Org. Chem., 34 (1991) 2723-2729.Gasteiger, J. and Marsili. M.. Interactive partial equalization of orbital elect atomic charges, Tetrahedron, 36 (1980) 3219–3228.

3.

4.

5.

6.

7.

8.biologically important macromolecules, J. Med. Chem., 28 (1985) 849–857.

Chem., 32 (1989) 1083-10949.

10.

11.

12.

13.

14.

15.

16.

17.

18.. ronegativity A rapid access to

15

Page 26: 3D QSAR in Drug Design. Vol2

Tommy Liljefors

19. (a) Chirlian, L.E. and Franel, M.M., Atomic charges dertived from electrostatic potentials: A detailedstudy. J. Comput. Chem., (1987) 804–905.(b) Besler, B.H., Merz, Jr., K.M. and Kollman, P.A., Atomic charges derived from semiempirical

Kroemer. K.T., Hecht. P. and Liedl, K.R., Different electrostatic descriptors in comparative molecularfield analysis: A comparison of molecular electrostatic and coulomb potentials, J. Comput. Chem., 11 (1996) 1296-1308.Wade. R.C.. Molecular interaction fields, In Kubinyi. H. (Ed.) 3D QSAR in drug design, Theory,methods and applications, ESCOM Science Publishers, Leiden, 1993, pp. 486–505.Goodford, P.,. GRID user guide, Edition 15, Molecular Discovery Ltd., Oxford. UK, 1997,Liljefors, T. and Norrby. P.-O., unpublished results.Mills, J.E.J. and Dean. P.M., Three-dimensional hydrogen-bond geometry and probablity informationfrom a crystal survey, J . Comput-Aided Mol Design, 10 (1996) 607 –622.Clementi, S.. Cruciani G., Rigan elli, D. and Valigi, R., GOLPE: Merits and drawbacks in 3D-QSAR, InSanz, F., Giraldo, J. and Munaut, F. (Eds.) QSAR and molecular modelling: Concepts, computationaltools and biological applications, Prous Science Publishers, Barcelona, 1996, pp. 408–414.

Bemis, G.W. and Murcko, M.A., The properties of known drugs: I. Molecular frameworks, J. Med.Chem., 39 (1996) 2887-2983.Dougherty. D.A., Cation-π interactions inchemistry and biology: A new view of benzene, Phe, Tyr, andTrp, Science 271 (1996) 163-168.Dougherty, D.A., and Stauffer. D.A., Acetycholine binding by a synthetic receptor: Implications for biological recognition, Science, 250 ( 1990) 1558-1560.(a) Sussman. J.L., Halel, M., Frolow. F., Oefner, C., Goldman, A., Toker, L. and Silman, I., Atomicstructure of acetylcholinesterase from Torpedo californica: A prototypic acetycholine-binding protein, Science. 253 (1991) 872-879.(b) Harel, M., Quinn, D.M., Nair, H. K., Silman, I. and Sussman, J.L., The X-ray structure of a transi- tion state analog complex reveals the molecular origin of the catalytic power and substrate specificity ofacetylcholinesterase, J. Am. Chem. Soc., 118 (1996) 2340–2346.Caldwell, J.W. and Kollman, P.A., Cation-π interactions: Nonadditive effects are critical in their accurate representation, J . Am. Chem. Soc.,117 (1995) 3177–4178.Kim. K.S., Lee, J.Y., Lee, S.J., Ha, T.-K. and Kim, D.H., On binding forces between aromatic ring andquaternary ammonium compound. J. Am. Chem. Soc., 116 (1994) 7399–7400.Caldwell, J., Dung. L.X. and Kollman, P A., Implementation of nonadditive intermolecular potentials byuse of molecular dynamics: Development of a water–water potential and water–ion cluster interactions, J. Am. Chem. Soc., 112 (1990) 9144-9147.Meng. E.C., Cieplak, P., Caldwell, J.W. and Kollman. PA., Accurate solvation free energies of acetate and methyIammonium ions calculated with a polarizable water model J . Am. Chem. Soc., 119 (1994)12061–12062.Mecozzi, S., West, Jr., A.P. and Dougherty, D.A., Cation -π interactions in simple aromatics: Electrostatics provide a predictive tool, J. Am. Chem. Suc., 118 (1996) 2307–2308. Cheney. B.V., Schultz, M.W., Cheney. J. and Richards. W.G., Hydrogen-bonded complexes involvingbenzene as an H- acceptor, J . Am. Chem. Soc. 110 (1988) 4295–4198.Rodtham, D.A., Suzuki. S., Suenram, R.D.. Lovas, F.J., Dasgupta, S.. Goddard, III. W.A. and Blake, G.A., Hydrogen bonding in the benzene –ammonia dimer, Nature. 363 (1993) 735-737.Trumpp-Kallmeyer, S., Hoflack, J.. Briunvels, A. and Hibert, M., Modeling of G-protein-coupled recep- tors: Applicaiton to dopamine, adrenaline, serotonin acetycholine, and mammalian opsin receptors,J. Med. Chem., 35 (1992) 3348–62. Liljefors. T. and Norrby. P.-O., Ab initio quantum chemical model calculations on the interactionsbetween monoamine neurotransmitters and their recep tors, In Schwartz, T.W., Hjort, S.A. andSandholm Kastrup, J. (Eds.) Structure and function of 7TM receptors, Alfred Benzon Symposium 39,Munksgaard, Copenhagen, 1996, pp. 194-207.

methods, J. Comput. Chem., 11 (1990) 431–439 20.

21.

22.23.24.

25.

26. Goodford, P., personal communication. 27.

28.

29.

30.

31.

32.

33.

34.

35.

36.

37.

38.

39.

16

Page 27: 3D QSAR in Drug Design. Vol2

Progress in Force-Field Calculations of Molecular Interaction Fields and Intermolecular Interactions

Liljefors, T. and Norrby, P.-O., An ab initio study of the trimethylamine-formic acid and the trimethy-lammonium–formate anion complexes, their monohydrates and continuum solvation, J. Am. Chem. SOC.,

Williams, D.E., Coulombic interaction in crystalline hydrocarbons, Acta Cryst., A30 (1974) 71-17.Pettersson, I. and Liljefors, T., Benzene–benzene (phenyl–phenyl) interactions in MM2/MMP2 molecular mechanics calculations, J. Comput. Chem., 8 (1987) 139-145.Burley, S.K. and Petsko, G.A., Aromatic-aromatic interaction: A mechanism of protein structurestabilization, Science, 229 (1985) 23-28.Burley, S.K. and Petsko, G.A., Dimerization energetics of benzene and aromatic amino acid side chains,J. Am. Chem. SOC., 108 (1986) 7995-8001.Singh, J. and Thornton, J.M., The interaction between phenylalanine rings in proteins, FEBS Lett., 191

Chipot, C., Jaffe, R., Maigret, B., Pearlman, D.A. and Kollman, P.A., Benzene dimer: A good model forπ–π interactions in proteins? A comparison between the benzene and the toluene dimers in the gasphase and in aqueous solution, J. Am. Chem. SOC., 118 (1996) 11217-11224.Schauer, M. and Bernstein, E.R., Calculations of the geometry and binding energy of aromatic dimers:benzene, toluene, and toluene-benzene, J. Chem. Phys., 82 (1985) 3722-3727.

40.

119 (1997) 1052-1058.41.42.

43.

44.

45.

46.(1985) 1-6.

47.

17

Page 28: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 29: 3D QSAR in Drug Design. Vol2

Comparative Binding Energy Analysis

Rebecca C. Wadeª*, Angel R. Qrtizb and Federico Gagoca European Molecular Biology Laboratory, Meyerhoƒstraße I., Postfach 10.2209, 69012

Heidelberg, G e r m a n yb Department of Molecular Biology, TPC-5, The Scripps Research Institute 10550 North Torrey

Pines Road, La Jolla, CA 92037, U.S.A. c Departamento de Farmacologia Universidad de Alcalá, 28871 AIcalá de Henares, Madrid,

Spain.

1. Introduction

Classical regression techniques have long been used to correlate the properties ofa series of molecules with their biological activities in order to derive quantitativestructure–activity relationships (QSAR) to assist the design of more active compounds [1].This approach has been successfully extended to three dimensions by using molecular co-ordinates of the ligands to derive 3D QSARs [2]. However, the availability of the three-di-mensional structures of many macromolecular drug targets has opened an alternativeapproach to drug design, namely structure-based drug design (SBDD), in which the physico-chemical interactions between the receptor and a series of ligands are used to ra- tionalize the binding affinities [3,4]. SBDD makes use of techniques ranging from those employing simple scoring functions through molecular mechanics calculations to detailed free energy perturbation calculations employing molecular dynamics simulation [5]. Now, particularly as a result of recent developments in the design of targeted combinatorial li- braries of compounds [6], it is becoming increasingly common to have data on the activi- ties of a family of compounds and knowledge of the three-dimensional structure of thetarget macromolecule to which they bind. While the activities of these compounds could

* To whom correspondence should be addressed.

AbbreviationsCoMFA Comparative Molecular Field AnalysisHSF-PLA2 Human synovial fluid phospholipase A2PLS Partial least squaresQSAR Quantitative structure activity relationshipSBDD Structure-based drug designSDEP Standard Deviation of Error of Predictions given by:

SDEP =

where Y is experimental activity: Y' is predicted activity; and N is the number of compounds

Q² Performance metric given by:

Y- <Y>)²

where <Y> is the average experimental activity

H. Kubinyi et al (eds.), 3D QSAR in Drug Design, volume 2. 19–34.© 1998 Kluwer Academic Publishers, Printed in Great Britian.

Page 30: 3D QSAR in Drug Design. Vol2

Rebecca C. Wade, Angel R. Ortiz and Federico Gago

be improved using the techniques of classical QSAR, 3D QSAR or SBDD, none of thesealone makes full, simultaneous and systematic use of all the available information. This isthe purpose of Comparative Binding Energy (COMBINE) Analysis [7,8].

The ‘COMBINE’ acronym refers to combinations in terms of both data andtechniques:

1.

2.

In outline, COMBINE analysis involves generating molecular mechanics models of aseries of ligands in complex with their receptor and of the ligands and the receptor, inunbound forms, and then subjecting the computed ligand-receptor interaction energiesto regression analysis in order to derive a QSAR relating ligand-binding constants oractivities to weighted selected components of the ligand-receptor interaction energy. While the chemometric analysis performed is similar to that in a Comparative Molecular Field Analysis (CoMFA) [9], the data analyzed in COMBINE analysis differby explicitly including information about the receptor-ligand interaction energies rather than only about the interaction properties of the ligands.

In contrast to free energy perturbation methods [10,11], a full sampling of phase space is not performed in COMBINE analysis: it is instead assumed that one or a fewrepresentative structures of the molecules are sufficient when experimental informationabout binding free energies is used for model derivation. Although any error in themodelling would introduce ‘noise’ into the dataset, this can be filtered out by means ofthe subsequent chemometric analysis.

Although occasionally there is a linear relationship between binding free energy and computed binding energy derived from molecular mechanics calculations for single conformations of the bound and unbound states of a series of ligand-receptorpairs, thisis not the case in general. This is because the entropic contribution to binding can vary over a series of ligands and because sufficiently accurate modelling of a full series of compounds can be difficult to achieve. A number of authors have correlated binding free energies with a few terms, defined according to physical interaction type, of thecomputed binding energies by linear regression [12-16]. A physical basis for such an analysis is provided by linear response theory which relates the electrostatic bindingenergy to the electrostatic binding free energy [16]. The COMBINE method differsfrom these approaches, in that more extensive partitioning of the binding energy is con-sidered and multivariate regression analysis is used to derive a model. This is important for two reasons: firstly, from a modelling perspective, because it is not assumed that the computed components of the binding free energy can be calculated with high accuracy. Rather, one of the foundations of COMBINE analysis is the realization that such cal- culations are usually noisy, and that is why only those contributions of the binding energy that present the best predictive ability are selected and weighted in the resultant model. Secondly, it is realized that binding free energy is rarely a linear function ofbinding energy. The extensive decomposition allows those components that are pre- dictive of binding free energy to be detected and these may implicitly represent other physically important interactions or even entropic terms.

20

data on ligand-receptor structures and the measured activities of a series of ligandsare combined; molecular mechanics and chemometrics are combined for the analysis.

Page 31: 3D QSAR in Drug Design. Vol2

A QSAR model is derived for each target receptor studied with the COMBINEmethod, as the method was specifically designed for ligand optimization. Thus, a derived regression model is not applicable to all ligand-receptor interactions in the waythat a general-purpose empirical ‘scoring function’ derived from statistical analysis of adiverse set of protein-ligand complexes is designed to be [17,18]. The philosophy isto account for peculiarities in the modelling and parameterization of a given set of compounds, so that both optimal and inexpensive predictive models can be derived.

In the next section, we describe the COMBINE analysis method. This is followed by a description of its application to two sets of enzyme inhibitors. COMBINE analysis is then discussed in terms of the quality of its predictions, its pros and cons, and its future prospects.

2. The COMBINE Method

2.1. Theory

The goal of the COMBINE analysis procedure is to derive an expression for thereceptor binding free energy of a ligand, ∆G, of the following form.

(1)

From this expression, biological activities may be derived by assuming that these quanti-ties are functions of ∆G. The expression is derived by analyzing the interaction of a set ofligands with experimentally known activities or binding affinities for a target receptor.Conformations of the ligand-receptor complexes and the unbound ligands and receptorare modelled with a molecular mechanics force field. It is assumed that these are represen-tative of the full ensemble of structures that would be sampled by these molecules. Fromthese. ligand-receptor binding energies, ∆U, are computed for each ligand.

(2)

where Elr and Einterlr are the total and intermolecular energies, respectively, of the

ligand–receptor complex; Er the energy of the unbound receptor r; and ∆Er is thechange in the potential energy of the receptor upon formation of the complex; andand ∆El are the corresponding energies for the ligand 1. ∆U itself will not, in general,correlate with ∆G, but it is likely that some of its components will. Therefore, ∆U ispartitioned into components according to physical type and which of the nl definedfragments of the ligand and nr defined regions of the receptor are involved.

(3)

21

Comparative Binding Energy Analysis

Page 32: 3D QSAR in Drug Design. Vol2

Rebecca C. Wade, Angel R. Ortz and Federico Gago

The first two terms on the right-hand side describe the intermolecular interaction energies between each fragment i of the ligand and each region j of the receptor. The next four terms describe changes in the bonded (bond. angle and torsion) and the non- bonded (Lennard-Jones and electrostatic) energies of the ligand fragments upon binding to the receptor, and the last four terms account for changes in the bonded and non- bonded energies of the receptor regions upon binding of the ligand.

The n terms, ∆ uisel in Eq. 1 that correlate with ∆G are selected from the ligand-

receptor binding energy. ∆ U, and the coefficients wi and constant C determined byregression analysis.

2.2. Implementation

The procedure for COMBINE analysis is outlined schematically in Fig. 1. There are essentially three steps to be followed for the derivation of a COMBINE model, namely modelling of the molecules and their complexes, measurement of the interactions between ligands and the receptor and chemometric analysis to derive the regression equation. Each of these steps will be considered in turn.

2.2.1. Molecular modelling The three-dimensional models of the ligand-receptor complexes and the unbound receptor and ligands can be derived with a standard molecular mechanics program. The dependence of the results on the modelling protocol followed has not yet been invest- igated in detail. The use of different starting conformations for the receptor. the inclu- sion of positional restraints on parts of the receptor, different convergence criteria during energy minimization or different ways of treating the solute-solvent interface and the dielectric environment can all produce different regression equations. The sens-itivity of COMBINE models to these factors compared to corresponding QSAR models that use the overall intermolecular interaction energies as regressors remains to be fully studied. One of the appealing characteristics of the COMBINE approach, however, is that, as a result of the decomposition of the intermolecular interaction energies on the basis of chemical fragments, artefacts in the modelled ligand-receptor complexes that could otherwise pass unnoticed can be easily detected.

In general, the limited available experience indicates that energy minimization should be mild, so that major steric clashes are eliminated while avoiding artefact-ual structural distortions due to inaccuracies in the modelled forces. It is particularly important to employ a suitable model of the solvent environment. When modelling explicit water molecules, inclusion of only crystallographic water molecules may notbe sufficient [30] but we have round that solvation of the ligand and receptor mole-cules with an approximately 5 Å thick shell of water molecules produces reasonable results [7,8].

While several conformations of each molecule or complex, derived for example from conformational analysis or molecular dynamics simulations, could be used for COMBINE analysis, we have so far used only single conformations derived fromenergy minimization.

22

Page 33: 3D QSAR in Drug Design. Vol2

Comparative Binding Energy Analysis

Fig. 1. Flowchart showing the stages of a COMBINE analysis

23

Page 34: 3D QSAR in Drug Design. Vol2

Rebecca C. Wade, Angel R. Ortz and Federico Gago

2.2.2. Measurement of the interaction energiesAfter modelling, ligand and receptor energies must be computed and decomposed in the form required for regression analysis. That is, a matrix is built with columns represent-ing the energy components given in Eq. 3 and rows representing each compound in theset. A final column containing inhibitory activities is then added to the matrix.

The energy decomposition scheme must, at some point, meet the two opposing ten-dencies of the Scylla of detailing enough energy terms that the elements responsible forthe activity differences can be isolated and the Charybdis of including so many termsthat the signal-to-noise ratio is so low that the subsequent analysis fails to obtain a meaningful model. Recent investigations in our groups indicate that a reasonable com-promise is to consider each residue in the receptor as contributing two interaction terms: one for van der Waals and one for electrostatic interactions. Inclusion of intramolecularenergies, which are a potential source of noise and cumbersome to compute, appears to result in little improvement in the regression models [19]. For these reasons, it is prob- ably advisable to omit them from the statistical analysis, although their importance can be expected to depend on the extent of conformational changes on binding.

In our studies, we have found that differences in the way electrostatic interactions arecomputed can have a considerable effect on the regression models [19] and Pérez et al. (submitted). The electrostatic energies can be given by a Coulombic expression or derived from solution of the Poisson-Boltzmann equation according to classical con-tinuum electrostatic theory [20]. We are currently comparing these methods inCOMBINE analysis and our results so far underscore the general importance of con- sidering the desolvation free energies upon binding as additional variables. although the information they provide may not always be 'new’ as it can be implicitly contained inother intermolecular electrostatic energy terms that are highly correlated with them. This collinearity may explain why good results can be obtained when this physically relevant contribution is not included in the analysis (see section 3.2 for additional details).

Additional terms to describe entropic contributions — e.g. freezing out of side-chainrotamers on binding — could also be included in COMBINE analysis. This has not yetbeen tested and their influence on the models derived remains to be investigated.

2.2.3. Chemometric analysisAs a result of the large number of terms and the correlated nature of the variables. partial least squares (PLS) [21] is the technique of choice for deriving the regression equation. In PLS analysis, a model is derived by projecting the original matrix of energyterms onto a small number of orthogonal ‘latent variables’. After this projection, theoriginal energy terms are given weights according to their importance in the model.Those that do not contribute to explaining the differences in binding have negligible effects and just add ‘noise’. When the ratio between the really informative variables and these ‘noisy’ variables is too low, the PLS method may fail to obtain a model. A sens-ible strategy to avoid this situation is to pretreat the data by setting very small values tozero and removing those variables that take nearly constant values in the matrix. If thispretreatment is insufficient, variable selection can be carried out, with the aim of climi-

24

Page 35: 3D QSAR in Drug Design. Vol2

comparative Binding Energy Analysis

nating from the matrix those variables that do not contribute to improving the predictive ability of the model. To this end, we have employed the GOLPE method [22], in whichthe effect of the variables on the predictive ability of the models is evaluated through fractional factorial designs and advanced cross-validation techniques Variable selection must, however, be carried out with care as it is prone to overfitting the data. particularlywhen selection is pursued beyond a certain limit [19].

COMBINE models can be validated by following the same principles as used in other 3D QSAR methodologies [2]. Apart from the minimum requirement of internal consist- ency (as evaluated by cross-validation), random exchange of the biological activities among the different molecules (permutation or scrambling) and the use of external test sets are strongly recommended, in order to highlight possible overfitting problems.

3. Applications

3.1. Phospholipase A2

The first application of COMBINE analysis [7,8] was done on a set of 26 inhibitors of human synovial fluid phospholipase A 2 (HSF-PLA2), an enzyme that catalyzes thehydrolysis of the sn-2 acyl chain of phosphoglycerides releasing arachidonic acid, the pre-cursor of several inflammatory mediators. The enzyme is mainly alpha-helical and hasabout 120 amino acid residues and seven disulfide bridges. The inhibitors are transition state analogs that bind in the substrate binding site, a slot whose opening is on the enzyme surface and runs all the way through the enzyme. The key catalytic residues are His-48and Asp-99, and a calcium ion bound to the active site is required for substrate binding.

An initial scatterplot showed a very poor correlation between biological activities and calculated binding energies (r = 0.21, Fig. 2a). However, the COMBINE model

Fig. 2. (a) Total calculated bindinging energy of the HSF-PLA2 inhibitors to the enzyme versus activityversus experimental activity for the HSF-PLA2

inhibitory activity on external ‘blind’ cross-validation The predictive model was derived using two latentvariables and yielded a fitted R² = 0.92, an internally cross-validated Q² = 0.82. and an externally crossvalidated Q² = 0.52. The broken line corresponds to a perfect fit, and the solid line shows the regression ƒit (r = 0.71).

expressed as percentage inhibition (r = 0.2 I ). (b) Predicted

25

Page 36: 3D QSAR in Drug Design. Vol2

Rebecca C. Wade, Angel R. Ortiz and Federico Gago

obtained for this data set showed good fitting properties and significant predictive ability (Fig. 2b), as assessed by a value of Q² – <Q2>s of 0.59, that is, the differencebetween the estimated Q² and the average Q² obtained in 20 scrambled models [7].

Fig. 3. Schematic diagram of HSF-PLA2 complexed with a representative inhibitor (LM1228 ). Spheres rep- resent atoms of protein residues lining the binding site that are frequently selec ted to contribute to regression models in COMBINE analysis (see reference [7] and table 3 therein). The calcium ion in the active site(shaded sphere) makes an important contribution to COMBINE models. This diagram was generated with themolscript program [32].

26

Page 37: 3D QSAR in Drug Design. Vol2

Comparative Binding Energy analysis

From the initial energy matrix, around 50 energy terms were finally selected to obtain the regression model. These energy contributions reflect complex relationships since they may have been selected because of their correlation with other variables. For this reason, they are better regarded as ‘effective’ energies and care must be exercised toavoid misinterpretations. However, it is of interest to examine these energy contribu-tions more closely. Most of the selected intermolecular effective energies correspond to interactions with residues in the enzyme active site. Overall, the model suggests that in this particular dataset the binding affinity is dominated by electrostatic interactions with the calcium ion located at the binding site. Several van der Waals interactions then modulate the affinity of the inhibitors. Some of the residues in the B helix (top left, Fig. 4) and the calcium-binding loop form a rigid wall sensitive to the conform-ation of the sn-2 chain. On the other side of the binding site, two aromatic residuesform a pocket in which an inhibitor must fit in order to have optimal activity. Finally, the C-terminal region of the enzyme forms an additional pocket with favorable inter- actions for inhibitors with benzyl moieties in the sn-3 chain. It is noteworthy that other researchers have arrived at similar SARs for a set of indole-based compounds [23], which suggests that the energies selected by COMBINE analysis may have some physical meaning in favorable cases. On the other hand. some other interactions have no clear physical meaning and they seemed simply to be correlated with some other, physically more relevant, variables. This is the only way to rationalize many of the selected interactions between the phosphate group of the inhibitors and charged residues exposed on the enzyme surface. some of them very far away from the phosphate group.

3.2. HIV-1 proteinase

Perhaps the most controversial issue that arose when the COMBINE approach was first reported [7,8] was the variable selection procedure. The large number of variables used in the original paper was a consequence of splitting the inhibitors into several fragments and considering intramolecular energy terms for both the protein and the inhibitors. The phospholipase A2 example. on the other hand, can be regarded as a particularly difficultcase. in the sense that the initial correlation between experimental activities and cal-culated binding energies was rather poor. The high correlation between calculated inter-molecular interaction energies (using the MM2X force field) and enzyme inhibition reported for a set of 33 HlV-1 proteinase inhibitors [14] prompted us to apply the COMBINE methodology to this same data set using the AMBER force field [24]. The coordinates of L-689, 502-inhibited HIV- 1 proteinase were used for the receptor, which included the water molecule that stabilizes the closed conformation of the dimeric enzyme by bridging a ß-hairpin from each monomer to the inhibitors. Adopting thesame philosophy as followed by the researchers at Merck [14], the enzyme was held fixed and only the inhibitors and the water molecule were allowed to relax on energyminimization. The intermolecular interaction energies were then calculated and related to the biological activities by means of a simple linear regression equation. For the COMBlNE decomposition scheme. these interaction energies were partitioned on a per

27

Page 38: 3D QSAR in Drug Design. Vol2

Fig

. 4. E

xpeh

enrd

ver

sus

pred

icte

d ac

tiviti

es (

pIC

50)

for

the

set

of H

IV-1

pro

tein

ase

inhi

bito

rs u

sed

to d

eriv

e th

e m

odel

(sq

uare

s) a

nd f

or t

he e

xter

nal

set

of 1

6 in

hibi

tors

(tr

iang

lcs)

. (a)

Sim

ple

mod

el th

at c

onsi

ders

ove

rall

inte

rmol

ecul

ar in

tera

ctio

n en

ergi

es(R

2=

0.7

9; Q

2=

0.7

7: e

xter

nal S

DE

P =

1.0

8): a

nd

(b)

CO

MB

INE

mod

el(2

prin

icip

alco

mpo

nent

s)in

whi

chth

eel

ectr

osin

ticin

tera

ctio

nen

ergy

term

sha

vebe

enca

lcul

ated

byco

ntin

uum

met

hods

and

deso

lvat

ion

effe

cts

have

bee

n in

clud

ed (

R2=

0.9

0; Q

2=

0.7

2: e

xter

nal

SDE

P =

0.6

2). O

ne o

f th

e co

mpo

nds

in r

efer

ence

[14

] ha

s a

pIC

50va

lue

less

tha

n 5.

0 an

d w

as

omitt

edfr

ombo

than

alys

is.

28

Page 39: 3D QSAR in Drug Design. Vol2

Comparaitve Binding Energy Analysis

residue basis. Each inhibitor was considered as a single fragment and no intramolecular energy terms were considered. The number of variables per inhibitor was thus equal to 2 (van der Waals and electrostatic) times the number of protein residues ([2 × 99 aminoacids] + 1 water molecule) = 398. No variable selection was employed. The resultingmatrix was pretreated simply by zeroing those interaction energies with absolute valueslower than 0.1 kcal/mol and removing any variables with a standard deviation below 0.1kcal/mol. This pretreatment reduced the number of variables that entered the PLS analy-sis to around 50. It is noteworthy that the number of variables was effectively reducedin this example, without the need for variable selection, underscoring the fact that it ispossible for a simple pretreatment of the original matrix to accomplish virtually the same effect.

Plots of predicted versus observed pIC50 values obtained for the inhibitors studied and for an additional set of 16 inhibitors not included in the derivation of the models[14] are shown in Fig. 4. While the internal cross-validation results are comparable in both cases, it is apparent that the PLS model from the COMBINE analysis (Fig. 4b) out-performs the simpler regression equation (Fig. 4a) in external predictions (Pérez, et al., submitted).

Attempts to incorporate desolvation effects into a predictive model were reportedly unsuccessful for the HIV-1 proteinase complexes [14]. The electrostatic interaction energy terms incorporated into the COMBINE model described in Fig. 4b were cal-culated by means of a continuum method, as implemented in the DelPhi program [25], using dielectric values of 4 and 80 to represent the molecular interiors and the sur- rounding solvent, respectively. The electrostatic desolvation energies of both the protein and the inhibitors were also included as two additional variables. Their incorporation into the model resulted in a slight improvement in predictive ability (Q² = 0.72 versusQ² = 0.70 for 2 principal components) [Pérez et al., submitted]. Interestingly, thevariables whose weights were moat affected by the desolvation energy correction were precisely those involving the charged residues that participate in strong electrostatic interactions between the inhibitors and the enzyme (Fig. 5).

4. Discussion

4. I. Quality of results

COMBINE models for the set of HSF-PLA 2 -inhibitor complexes compare favorably with CoMFA models for the same inhibitors aligned as in the modelled bound com- plexes [ 19]. Using the same dataset and the same cross-validation method. the best CoMFA model (the so-called N-T-C model [19], selected for its optimal predictive ability for external test sets) had a Q² = 0.62 and a standard deviation of error of pre-dictions (SDEP) = 13.5. The corresponding values obtained with COMBINE analysis were Q² = 0.82 and SDEP = 9.3 [7]. These figures of merit suggest a better predictiveperformance for COMBINE, but it should be noted that no scrambling of the biological data was done in the CoMFA study, so that the methods have not been compared usingthe more rigorous ‘excess’Q² value described in section 3.1.

29

Page 40: 3D QSAR in Drug Design. Vol2

Rebecca C. Wade, Angel R. Ortiz and Federico Gago

Fig. 5. (a) PLS coefficients for the electrostatic contributions of each residue from a COMBINE analysis onthe set of HIV- I proteinase-inhibitor complexes. Only the coefficients exhibiting significant variance aregiven non-zero values and labelled. (b) PLS coefficients after incorporation of desolvation effecoefficient of the electrostatic contribution to the desolvation of the inhibitors (∆Gsolv) is the largest of all and clearly modulates some of the other interactions. The electrostatic contribution to the desolvation of theprotein, on the other hand, appears to be highly correlated with other variables. so that its presence in themodel is not required

For the HSF-PLA 2 and HIV-1 protease examples. both the conventional and cross-validated squared correlation coefficients provided by COMBINE analysis compared very favorably with the ones obtained by the classical approach of using just the overallintermolecular interaction energies as the independent variables (Figs 2 and 4).

A further advantage of COMBINE analysis is that it highlights those regions in the enzyme binding site that contribute most to the differences in activity among the ligands. In the two examples reported above. this analysis allowed the identification of mechanistically important residues, and this information may guide the design of further chemical modifications on the inhibitors. For the HSF-PLA, case (see Fig. 3), the regions detected as important for activity were largely consistent with those identified by CoMFA and in studies of different sets of ligands [26].

30

Page 41: 3D QSAR in Drug Design. Vol2

Comparative Binding Energy Analysis

4.2.

It is pertinent to consider the reason for the success of the COMBINE methodology inpredicting binding affinities, given that the method is based on an incomplete model ofmolecular interactions and conformations. It has been suggested that successful bindingfree energy predictions rely on multiple cancellations among the contributions that areneglected in the model [27]. These cancellations may be fortuitous but can mostly beexpected to originate from enthalpy-entropy compensation on binding in aqueoussolution. An alternative explanation is that, within a congeneric series, it is possible thatsome of' the neglected terms, particularly entropic ones, do not contribute to the bindingfree energy differences because they are similar for all the members of the series.Entropic contributions to binding free energies can be roughly divided into three sep-arate terms: changes in translational and rotational entropy, changes in configurationalentropy and changes in solvent entropy [28]. Within a series, it is likely that values arcapproximately constant for translational and rotational entropy changes. and perhapsalso for configurational and vibrational entropy contributions. as well as for theenthalpic changes arising from changes in the conformation of the receptor. One com-ponent for which this cannot be expected to be the case in general, however. is the dif-ferential solvation free energy upon binding. It has even been suggested that, within acertain closely related series of ligands binding to a common receptor, the solvation freeenergies may vary more significantly than the intrinsic ligand-receptor interactions[29]. However, this does not mean that it is absolutely essential to incorporate this termexplicitly in the model in order to obtain good binding affinity predictions. For example,in a study involving a series of flavonoid trypsin inhibitors [30], it was found that theinhibitory potencies could be correlated well with calculated binding energies derivedusing a simple distance-dependent dielectric for modelling electrostatic interactions andconsidering only the bound state. When a continuum model was used to compute elec-trostatic interactions, the accuracy of the computed differences in binding free energieswas increased, but for only one inhibitor was it important to take into account the dif-ferences in solvation free energies between bound and unbound conformations. In thisexample, decomposition of the binding free energy components showed that the dif-ferences in the total electrostatic free energy of binding (including the desolvation con-tribution) were much smaller than the corresponding differences in van der Waalsbinding energies, so that it was this latter term that dominated the binding free energydifferences. A similar scenario can thus be envisaged in some of the other casesin which reasonable correlations between in vitro activities and calculated bindingenergies arc obtained without considering solvation effects.

As shown above, it is our experience that incorporation of a more rigorous treatmentof electrostatic interactions by means of continuum models and consideration of desol-vation effects upon binding lead to an improvement of the correlations and to morerobust COMBINE models [Pérez et al., submitted]. However, the increase in predictiveability upon incorporation of desolvation contributions has not been, in the HIV pro-tease test case, as large as we anticipated (see section 3.2). The reason isthat some intermolecular interactions appear to be collinear with the desolvation free

Why is COMBINE successful?

31

Page 42: 3D QSAR in Drug Design. Vol2

Rebecca C. Wade, Angel R. Ortiz and Federico Gago

energies of the inhibitors, so that some information about the solvent effects is implic- itly included in the intermolecular energies (see sections 3.2 and 2.3 for additionaldetails).

4.3.

The COMBINE method can be applied to any dataset Tor which the following areavailable:

1.

When to apply COMBINE antilysis

experimental binding or activity measurements for a series of ligands that bind to atarget macromolecular receptor; the number of measurements necessary for obtain-ing a good model will depend on the quality of the data but 15 is a reasonable lower limit;an experimentally determined three-dimensional structure of the target macromole- cular receptor complexed to a representative ligand.

2.

As in other QSAR techniques, the method relies on the assumption that all ligandsanalyzed bind to the receptor at the same binding site and that the binding mode can be deduced by comparative modelling techniques. For situations in which the binding mode can alter dramatically on minor modification of the ligand [31], it should in pin- ciple be possible to include more than one model of each complex in the analysis and deduce the correct binding mode from the regression analysis [32]. However, this has yet to be investigated in a practical application.

While the method has so far been applied only to the binding of a series of smallmolecule ligands to a protein, it should also be applicable to interactions between macromolecules. for instance. to predict the effects of protein mutations.

5. Conclusion

COMBINE analysis provides a means to exploit information about the three- dimensional structure of a target macromolecule and about measured activities of a series of compounds in order to derive a model to predict their binding free energies.The applications described here to HSF-PLA2 and HIV proteinase inhibitors demon- strate that predictive models that give insight into the mechanism of inhibition can be derived. The predictive performance of these models compares very favorably with that of other regression methods that make more conventional use of molecular mechanics interaction energies or other QSAR techniques such as CoMFA. Analysis of the import- ance of different energetic terms and the effects of different chemometric protocols has shown how robust models can be obtained. Further work should include the exploration of the effects of including additional descriptors such as those that explicitly describe entropic changes on binding. In addition, it may be possible to obtain improvements in the method by optimizing modelling protocols and tailoring the chemometric tools more specifically towards the data extracted from the molecular systems studied.

Further applications to different types of ligand–receptor datasets are necessary for acomplete assessment of the capabilities of the method. However, the quality of the

32

Page 43: 3D QSAR in Drug Design. Vol2

Comparative Binding Energy Analysis

COMBlNE models obtained so far is probably good enough to encourage its wide-spread application to QSAR problems where it can assist in the design of new com-pounds which will provide real experimental ‘blind tests’ of predictive ability.

Acknowledgements

We thank our collaborators Ana Checa and Carlos Pérez for their enthusiastic par-ticipation in parts of the work presented here; Dr. Gabriele Cruciani and Dr. Manuel Pastor for helpful comments on chemometrics and provision of the GOLPE program;Dr. Albert Palomer for bringing the problem of structure–activity relationships of PLA2

inhibitors to our attention: Dr. Mayte Pisabarro for her contribution to the modelling ofthe PLA2 inhibitors; Dr. Kate Holloway for provision of cartesian coordinates for the training set of inhibitors and L-689, 502-bound HIV-1 proteinase; and Dr. P.J. Kraulis for the MOLSCRIPT program.

A.R.O. was the recipient of a predoctoral fellowship from the Comunidad Autonomade Madrid. This work was supported in part by Laboratorios Menarini, S.A. (Badalona,Spain) and the Spanish CICYT (SAF projects 94/630 and 96/231 to F.G.).

References

1.2.3.4.5.

6.7.

8.

Kubinyi. H., QSAR: Hansch analysis and related approaches, VCH, Weinheim. 1993. Kubinyi, H. (Ed.), 3D-QSAR in drug design: Thory methods and applications, ESCOM, Leiden, 1993.Kuntz, I.D.. Structure-basedstrategies for drug designed and discovery, ScienceBlundell, T.L., Structure-based drug design. Nature,381 Suppl. (1996) 23–26 Greer, J.. Erickson, J.W., Baldwin, J.J. and Varney, M.D., Application of the three-dimensional struc-tures of protein target molecules in structure-based drug design, J. Med. Chem., 37 ( 1994) 1035–1054.Hogan, J.C., Directed combinatorial chemistry. Nature,384 Suppl. (1996) 17- 19.Ortiz, A.R., Pisabarro, M.T., Gago. F. and Wade, R.C., Prediction of drug binding affinites by com-parative binding energy analysis, J. Med. Chem., 38 (1995) 2681-2691.Ortiz, A.R., Pisabrro, M.T., Gago. F. and Wade, R.C.. Prediction of drug binding affinites by compara- tive binding energy analysis: Application to human synovial fluid phospholipase A2 Inhibitors, In QSARand molccular modelling: Concepts. computational tools and biological applications, Sanz. F., Giraldo.J. and Manaut, F. (Eds.) J.R. Prous, Barcelona. 1995, pp. 439–443.Cramer III, R.D.. Patterson, D.E. and Bunce. J.D., Comparative molecular field analysis (CoMFA):1. Effect of shape on binding of steroids to carrier proteins, J. Am. Chem. Suc., 110 (1988) 5959–5067.Straatsma. T.P. and McCammon, J.A., computational alchemy, Ann. Rev. Phys. Chem.. 43 (1992)407–435.

11. Kollman, P., Free energy calculation: Application to chemical and biochemical phenomena Chem.

257 (1992) 1078–1082.

9.

10.

12. Blancy, J.M.. Weiner, P.K.. Dearing, A.. Kollman, P.A.. Jorgensen, E.C.. Oatley, S.J.. Burridge, J.M.and Blake, C.C.F., molecular mechanics simulation of protein-ligand interactions: Binding of thyroid hormone analogues to prealbumin, J. AM. Chem. soc., 104 (1982) 6424–6434.Menziani, M.C., De Benedeti, P.G., Gago. F. and Richards. W.G.. The binding of benzene sulfonamidesto carbonic anhydrase enzyme: A molecular mechanics study and quantitative structure-activity rela- tionship, J. Med. Chem.. 32 (1989) 951– 956.Holloway. M.K., Wai, J.M., Halgren, T.A., Fitzgerald, P.M.D., Vacca, J.P., Dorsey, B.D., Levin, R.B., Thompson, W.J., Chen. L.J., deSolms, S.J.. Gaffin, N., Ghosh. A.K., Giuliani, E.A., Graham, S.L., Guare, J.P., Hungate, R.W., Lyle. T.A., Sanders. W.M., Tucker, T.J., Wiggins, M., Wiscount, C.M.,

13.

14.

33

Rev., 93 (1993) 2395–2417

Page 44: 3D QSAR in Drug Design. Vol2

Rebecca C. Wade, Angel R. Ortiz and Federico Gago

Woltersdorf, O.W., Young,

Grootenhuis. P.D.J. and van Galen, P.J.M., Correlation of binding affinties with non-bonded interactionenergies of thrombin–inhibitor complexes, Acta Cry st., D51 (1995) 560-566.Åqvist, J., Medina. C. and Samnelsson, J.E., A new method for predicting binding affinity in computer- aided drug design, Prot. Eng., 7 (I 994) 385-391Böhm. H.-J., The development of a simple empirical scoring function to estimate the binding constant for a protein–ligand complex of known three-dimensional structure J. Comput-Aided Mol. Design. 8

Head, R.D. Smythe. M.L., Oprea. T.I., Waller, C.L., Green, S.M. and Marshall, G.R., VALIDATE: Anew method for the receptor- based prediction of binding affinites of novel ligands. J. Am. Chem. soc.,118 (1996) 3959-3969.Ortiz, A.R., Pastor, M.. Palomer. A., Cruciani, G., Gago. F. and Wade, R.C.. Reliability of comparativemolecular field analysis models: Effects of data scaling and variable selection using a set of humansynovial phospholipase A2 inhibitors, J. Med. Chem. (1997) 40. 1136-1148.Honig, B. and Nicholls, A., Classical electrostatics in biology and chemistry Science, 268 (1995)1144–1149.Wold, S., PLS–partial least squares projections to latent structures. In 3D-QSAR in drug dcsign:Theory. methods and applications. Kubinyi, H. (Ed.). ESCOM. Leiden. 1993. pp. 523-550.Baroni, M., Constantino. G., Cruciani, G., Riganelli. D., Valigi. R. and Clementi, S.. Generating optimallinear PLS estimations (GOLPE): An advanced chemometric tool for handling 3DQSAR problems,Quant. Struct.-Act. Relat., 12 (1993) 9–20.Schevitz, R.W., Bach, N.J., Carlson, D.G., Chrigadze. N.Y., Clawson, D.K., Dillard, R.D.. Draheim.S.E., Hartley, L.W., Jones, N.D., Mihelich, E.D., Olkowski, J.L., Snyer, D.W , Sommers, C. and Wery, J.-P., structure-based design of the first potent and selective inhibitor of human non-pancreatic secre- tary PhospholipaseA2, Nat. Struct. Biol., 2 (1995) 458–465.cornell, W.D., Cieplak. P., Bayly. C.I., Gould, I.R., Merz, K.M.. Ferguson, D.M.. Spellmeyer, D.C.,Fox, T., Caldwell. J.W. and Kollman. P A., A second generation force field for the simulation ofproteins, nucleic acids, and organic molecules. J. Am. Chem. Soc.. 117 (1995) 5 179-5197.Gilson. M.K., Sharp. K.A. and Honig. B.H., Calculating the electrostatic potential of molecules inrolution: Method and error assessment, J. comput. Chem., 9 (1987) 327-335.Wheeler, T.N., Elanchard, S.G., Andrew, R.C., Fang, F., Gray- Nunez, Y., Harris, C.O., Lambert,M.H., Mehrotra, M.M., Parks, D.J., Ray, J.A. and Smalley, T.L., substrate specificity in short- chainphospholipid analogs at the active site of human synovial phospholipase A2 , J. Med. Chem., 37 (1995)4118–4129.Ajay and Murcko, M.A., Computational methods to predict binding free energy in ligand–receptorcomplexes, J. Med. Chem., 38 (1995) 4952–4967.Gilson, M.K., Given, J.A., Bush, B.L. and McCammon, J.A., The statistical thermodynamic basis forcomputation of binding affinites: a critical review, Biophys. J., 72 (1997) 1047–1069.Lybrand, T.P., Ligand-protein docking and rational drug design, Curr. Op. Str. Biol. 5 (1995) 224–228.Checa, A., Ortiz, A.R., de Pascual Teresa, B. and Gogo, F., Assessment of solvation effects on calculatedbinding affinity differences: Trypsin inhibition by flavonoids as a model system for congeneric series.Med. Chem., in press.Mattos, C. and Ringe, D., Multiple binding modes. In 3D QSAR in Drug design: Theory, methods andapplications, Kubinyi, H., (Ed.) ESCOM. Leiden 1993, pp. 226-254.Kraulis, P.J., MOLSCRIPT: A program to produce both detailed and schematic plots of proteinstructures, J. Appl. Crystallog., 24 ( 1991 ) 946–950.

S.D., Darke, P.L. and Zugay, J.A. A priori prediction of activity for HIV-1 protease inhibitors employing energy minimization in the active site, J. Med. Chem., 38 (1995) 305–317.

15.

16.

17.

(1994) 243–256.18.

19.

21.

22.

23.

24.

25.

26.

27.

28.

29.30.

31.

32.

34

Page 45: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

Tudor I. Opreaª* and Garland R. MarshallbªDepartment of Medicinal Chemistry, ASTRA Hässle AB, S-43183 Mölndal, Sweden

Fax # +46 31 776 3710; E-mail [email protected] b Center for Molecular Design, Washington University, St. Louis, MO 63130, U.S.A.

1. Introduction

The pharmaceutical industry aims at the therapeutic manipulation of macromoleculartargets, collectively termed receptors, using specific ligands (drugs). These receptors are macromolecules specialized in recognizing a specific molecular pattern from the large number of surrounding molecular species with which it could interact. Several categories of macromolecules are included:

1 . pharmacological receptors: macromolecular complexes that can be activated byspecific signal molecules (e.g. agonists), process followed by a specific biological response from the cell (organ) associated with these complexes;enzymes: proteins that catalyze specific biochemical reactions upon substrate(ligand) binding with (± high) specificity:antibodies: macromolecules (receptors) that ± specifically bind antigens (hap- tens), then activate cellular (immune) responses in the presence of these antigens;DNA: nucleic acid chains that (±) specifically bind drugs (e.g. antiviral or anti-mitotic) designed to block DNA replication.

2.

3.

4.

A major task for today's medicinal chemistry units is to reduce research costs, since on the average one in 10 000 screened compounds may reach the market. In the past decade, efforts to reduce these costs have relied on computational and, recently, com- binatorial chemistry. Computational chemists use (largely) theoretical methods and tools implemented as software to design novel cornpounds with optimized biological properties for a particular therapeutic target. Because vast numbers of candidate com-pounds can be virtually generated in the computer (Fig. 1), an important tool in thecomputational chemist's arsenal has become the prediction of the binding affinity of these virtual compounds to the given receptor.

Once a lead compound has been obtained via screening, medicinal chemists generate congeneric compounds, which aim at preserving the same scaffold while replacing key pharmacophoric [1] features with isosteric groups [2–7]. Van Drie et al. [5] have described a program ALADDIN for the design or recognition of compounds that meet geometric, steric or substructural criteria. whereas DOCK, a cavity-matching algorithm [8–11] has been quite successful in finding non-congeneric molecules of the correctshape to interact with a receptor [12,13]. Caflisch et al. [14] have used a fragment- based approach to map high-affinity sites on HIV-1 protease [15] with the idea of

* T o whom correspondence should be addressed.

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design, Volume 2 35-61.© 1998 Kluwer Academic Publishers, Printed in Great Britan.

Page 46: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. Marshall

Fig. 1. The importance of affinity prediction in computational chemistry.

synthetically joining these fragments to create high-affinity ligands. A conceptually similar, experimentally based method, was used by Fesik et al. [16], who determined the relative binding modes of two low-affinity ligands by NMR, then connected them to create a high-affinity lead in their ‘SAR by NMR’ approach.

The importance of altering chemical properties of a given molecular scaffold has been recognized and used in a number of methods that are capable of generating novel structures in the computer [17-33]. All of these approaches attempt to help the med-icinal chemist discover novel compounds which will be recognized at a given receptor and have proven successful in applications to HIV protease [34–39]. Micromolar leadswere found based on haloperidol [10], coumarin analogs [40,41], cyclic ureas [39], and a variety of other structures. Analogs of the coumarin and cyclic urea leads which have been optimized for affinity and bioavailability have been advanced toward clinical trials.

We have recently reviewed [42] these methods in the context of affinity prediction using 3D QSAR methods. De novo design methods, docking and other molecular mod-elling techniques — e.g. computational combinatorial chemistry [43] — must handle large numbers of structures, typically in the thousands. and therefore need a sorting pro-cedure in order to rank these virtual molecules. This is typically performed using a scoring funiction which evaluates the binding affinity by estimating the ligand-receptorfree energy of binding, ∆GL

bind– R. Scoring functions [44-48] are empirical paradigms that

estimate, in the molecular mechanics approximation (or some other framework that esti- mates non-covalent interactions), the intermolecular energies of the electrostatic and van der Waals (vdW) forces, including a hydrogen-bonding term, some estimate of the entropic and desolvation costs, as well as hydrophobic interactions. The use of free- energy perturbation (FEP) methods [49] to calculate the ∆∆G of binding is not yet prac-tical as it is limited to minor modifications of compounds with known activity and requires significant computational resources, limiting their applicability.

An intermediate method between FEP methods and scoring functions has been pro- posed by Åqvist and co-workers [50-51], in their linear interaction energy approach, which averages the interactions between the inhibitor and its surroundings using molec- ular dynamics for the bound and free states only. Another intermediate method con- putes the standard free energy of binding for the predominant states [52] by first

36

Page 47: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

identifying the major minima of the potential energy and solvation energy functions, then evaluating their contribution to the configuration integrals of the ligand, the recep- tor and their complex, using a conformational energy search method [53]. Thisapproach has been successfully applied to cyclic urea inhibitors [39] binding to HIV-1protease [52].

The molecular mechanics interaction energy [54] has been used for a series of HIV-1 protease inhibitors with limited diversity, whereas 3D QSAR methods have been more suitable for a series with higher diversity [55–57]. Such models could also be used toevaluate software-generated compounds of diverse scaffolds for synthetic prioritization. As most of these approaches are reviewed elsewhere in this volume, we focus on themethods that use the receptor's 3D structure to derive the scoring function.

2. Scoring Functions: General Comments

The calculation of the free energy of binding is based on the linear free energy relation-ship (LFER) Formalism that relates ∆G0

bind , the standard free energy of binding, to thelogarithm of the dissociation constant, –logKD , at thermodynamic equilibrium concen-trations of the ligand, [L], receptor, [R], and the corresponding [L – R ] complex, for thereaction L + R → L – R:

(1)

where R = 8.314 J/mol/K, and T is the temperature. For T = 310 K, 2.303RT =5.936 kJ/mol. At 37°C, ∆G0

bind is approx. 6.0*pKD (kJ/mol), or 1.42*pKD (kca/mol). Infact, we model ∆GL

bindR– , but we assume the same reference state, so we often substitute

∆ G0bind with ∆GL

bind– R The binding of a ligand to a receptor, a multi-step process, evaluated

by the total ∆ G0bind (Eq. 1),includes sequential steps going from independent ligands and

receptors in the surrounding physiological environment to the ligand-receptor complex. Any accessible conformation can, in principle, change into the active one during this process. Entropy is lost during reversible or irreversible binding and gained in the desol- vation process because of the waters freed from the binding site and the ligand's hy- dration shell. Hydrophobic forces typically play an important role. One has to ascertain that no 1-ate-limiting steps occur during intermediate stages, and that non-specific binding does not obscure the experimental binding affinity. All the parameters that cannot be directly measured remain hidden (e.g. receptor-induced conformational changes of the ligand, geometric variations of the binding site, etc.), and the scoring functions use a time-sliced (frozen) model (i.e. the system is at equilibrium and time- independent). Kinetic bottlenecks in the intermediate steps may occur. and ∆GL

bind– R

remains thermodynamic in nature (not kinetic). Several assumptions are made when deriving a scoring function, since binding free

energy is, for most cases, approximately calculated [58]:

1. The modelled compound, not its metabolite(s) or any of its derivatives, produces the observed effect.

37

Page 48: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. Marshall

2. The proposed (modelled) geometry is, with few exceptions, considered rigid for thereceptor, and modelled as a single (bioactive) conformation for the ligand which exerts, in this single conformation, the binding effects; the dynamic nature of this process, as shown for lactate dehydrogenase, which is likely to assume different conformational states at the binding site [59], is typically ignored. The loss of translational and rotational entropy upon binding is assumed to follow a similar pattern for all compounds; an additional entropic cost is considered for freezing the single-bond rotors. The binding site is the same for the modelled compounds. The binding free energy, is largely explained within a molecular mechanicsframework, and is prone to the inherent errors of the force field. The on-offrate is similar for modelled compounds (i.e. the system is considered to be at equilibrium), and kinetic aspects are usually not considered. Solvent effects, temperature, diffusion, transport, pH, salt concentrations and other factors that contribute to the overall are not considered.

3

4.5.

6.

7.

Quite frequently the binding free energy is expressed as the sum of the free energy components, conceptually shown in the master equation (Eq. 2) [58]:

(2)

which accounts for contributions due to the solvent (∆Gsol), to conformational changesin both ligand and protein (Gconf), to the ligand-protein intermolecular interactions(∆Gint) and to the motion in the ligand and protein once they are at close range(∆Gmotion). The master equation (Eq. 2) can also be written as:

(3)

where is separated at equilibrium into solvation effects (∆Gsol), and two com-ponents for the process in vacuum: the internal energy (∆Uvac) and entropy (T∆Svac).∆Gsol can be calculated with a variety of methods [58], while T∆Svac is often related tothe number of non-methyl single bonds [47]. Both ∆Gsol and T∆Svac are assumed tohave similar values for congeneric series, hence Eq. 3 is widely used in QSAR studiesby expanding only the internal energy term:

(4)

which includes the steric (vdW ) and electrostatic (coul) aspects of the ligand-receptorinteraction ( and ), the distortions (distort) induced by this interaction inboth ligand and receptor ( and ) and the ligand-induced conformational changes of the receptor ( reprsents agonist-induced conformational re- arrangements of the receptor that may be an important component of signal transduction and are not considered to occur upon antagonist binding to the same receptor [60].

The receptor-based scoring functions compute, using different approximations, the terms of the master equation (Eq. 2), using Eq. 1 to convert the binding-affinity data into

38

Page 49: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

its free energy equivalent. We focus here on some of the recent scoring functions that aim at predicting small ligands, and not proteins. For a thorough review of methods that predict the binding free energy, the reader is referred to the work of Ajay and Murcko [58].

3. The LUDI Scoring Function

This regression-based scoring function [44] aims to predict fast and accurate binding affinities for de novo designed ligands generated by the LUDI [23] program. This approach estimates by approximating the contributions for hydrogen bonding,for entropy due to frozen rotatable bonds due to binding and for desolvation based on hydrophobic complementarity, using the following master equation:

(5)

where ∆G0 is related to the reduction in rotational and translational entropy, ∆Ghb is thefree energy associated with hydrogen-bond formation, ∆Gionic is the binding energyfrom ionic interactions, ∆Glipo, is the lipophilic interaction contribution and ∆Grot is theenergy loss by freezing the internal degrees of freedom in the ligand. A penalty func-tion, ƒ(∆R, ∆α), is introduced to track large deviations from the ideal hydrogen-bond:

where,ƒ(∆R, ∆α) = ƒ1(∆R)ƒ2 (∆α)

and ∆R is the deviation of the hydrogen-bond length H...O/N from the ideal 1.9 Å value, and ∆α is the deviation of the hydrogen-bond angle N/O–H...O/N from its ideal180° value. As defined, this function tolerates small deviations of up to 0.2 Å and 30°from the ideal geometry.

Böhm [44] analyzed 45 protein–ligand complexes (affinity range = –9 to –76 kJ/mol)taken from the Protein Data Bank (PDB) [61], and found the following equation by multiple-regression analysis (r2 = 0.76, S = 7.9 kJ/mol) and cross-validation(q2 = 0.696, Spress = 9.3 kJ/mol = 2.2 kcal/mol):

(6)

39

Page 50: 3D QSAR in Drug Design. Vol2

Tudor 1. Oprea and Garland R. marshall

Fig. 2. predicted versus experimental pKi value for fourteen inhibitors (see reference [44]).

The training set covered 12 orders of magnitude in binding affinity, and ligand weights ranging from 66 to 1047 daltons. The training set included 9 crystalline ligand–receptor complexes and 5 ligands docked to DHFR (see Fig. 2). This function was primarily aimed at small ligands and developed with the speed factor in mind (hence. just five ad- justable parameters). Both VALIDATE [47] and Jain’s scoring function [48] have been inspired by this scoring function, as reflected throughout this chapter.

4. The Wallyvist Scoring Function

Wallqvist et al. proposed a knowledge-based potential based on inter-atomic contact preferences between ligand and receptor atoms [45]. This model was parameterized by an analysis of 38 high-resolution protein crystal complexes taken from PDB [61]. Forthese ligand–protein complexes. molecular surfaces have been generated using theConnolly algorithm [62] and Bondi vdW radii [63]. The interface surface area for eachatom was then catalogued. using a packing score. Sab, calculated for all ab pairs withdab ≤ 2.8 Å:

(7)

where dab is the distance between surface elements a and b, and θ(ra, rb) is the angleformed by the surface normal vectors ra and rb at these points. For each surface element. the area was totaled for each atom pair and used to determine the atom–atom preference score, Pij, as the ratio of the total interfacial area contributed by each atom pair. Fij, nor-malized by the product of the fractional contribution of each atom in the pair. Fi and Fj:

(8)

40

Page 51: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

where the highest- and lowest-scoring preference, Pij, were used to identify the mostand least-observed adjacent atomic surfaces in the dataset.

(9)

where αij is the sum performed for all atoms i of molecule A and j of molecule B. Due tothe paucity of data used, αij is assumed to have two physical components: one that is di-rectly proportional to the surface area, and independent of the surface atom type, andanother that is specific to the buried atom. The coefficients γ and δ are obtained by min-imizing – f or the 38 enzyme-inhibitor complexes whose was cal-culated from their dissociation constant via Eq. l. The energy range was between –18and –12 kcal/mol, and the linear fit RMS deviation was 1.5 kcal/mol (R = 0.74). Theregression analysis yields the following scoring function

(10)

where amin = –59.0 cal/mol/Å. The fit, while not excellent, clearly establishes a connection between atom-atomin-

terfacial contacts and the The |∆Gpred – Gexp| error can be attributed, besides theinherent small dataset problem, to the inter- and intra-experimental errors that occur upon measuring ∆Gexp. The authors further constructed an average interfacial bindingparameter α− i for each atom type in their set, largely hydrophobic in nature. Its analysisrevealed the range and details of specific interactions that are deemed important atprotein interfaces. This scoring function empirically estimates the buried surface of a ligand-receptor complex and is capable of specifically identifying atom-atom andresidue-residue preferences in such complexes. Some prior modelling efforts are needed — e.g. docking of the ligand in the binding site and some conformational analysis. Chemical constructions based on a given optimized surface are still an open issue.

5. The Verkhivker Scoring Function

This approach [46] is a knowledge-based interaction potential based on Sippl’s [64] knowledge-based approach, and parameterized for HIV- 1 protease [15] ligand–receptor complexes using the following master equation (Eq. 11)

(11)

where ∆G L– R interaction is further subdivided in ∆Gn-n (the interaction between nonpolargroups only) and ∆Gn-p.p–p,p–n (the interaction between polar and nonpolar groups). The

41

Page 52: 3D QSAR in Drug Design. Vol2

Tudor- I. Oprea and Garland R. marshall

reader is referred to the original work [46] for a complete description of each term in the above equation: here we just give the ligand-receptor pairwise interaction potentials asan example.

Using a set of 30 pairs of ligand-protein complexes of HIV-I, HIV-2 and SIV pro-teases, a set of distance-dependent interaction potentials was derived for 12 atom pairsrelevant for protein-ligand interactions, using Sippl’s [64] original approach and theCHARMM_19 vdW radii [65]. for a given atom pair a (ligand) and b (receptor)separated by a distance d

(12)

where mab is the number of pairs with ligand atom of type a and receptor atom of typeb; σ is the weight given to each observation;ƒab(d) is the frequency with which this pairof atoms is observed at interatomic distance d; andƒ(d) is the total number of atom pairsof all types that are separated by distance d. In this work [46]. the probabilities of ob-serving a particular distance were computed. normalized by the frequencies observed for all types of L–R atom pairs in the training set. then translated into mean force poten- tials. To minimize the effect of low-occurrence data points in the parametcrization, a cross-validation procedure was applied by eliminating the evaluated complex from the training set for the empirical calculations of the seven studied HIV-1 protease in-hibitor complexes [46]. The authors further break down the contributions of various terms from their master equation (Eq. 11) for the studied inhibitors, providing a ra- tionale for the variation in binding affinity of these ligands. The enthalpy–entropy compensation effect [66] appears to be supported by their model, which provides a reasonable cstimute for the absolute binding free energy (Fig. 3).

Fig. 3. Calculated versus experimemal binding free energies for seven HIV- I protease inhibitors (see reference [46]).

42

Page 53: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

6. VALIDATE

VALIDATE [47] is a hybrid approach explicitly based on (and dependent on) a QSAR training set paradigm that calculates physico-chemical properties of both ligand and L–R complex to estimate based on the following master equation (Eq. 13):

(13)

where βI – β are the fitted regression coefficients for the master equation terms(Eqs. 14-20), clarified below.

6.1.

Structural complementarity is essential for the specific binding of a ligand to the re-ceptor at the binding site. The non-bonded steric interaction energy is computed from the explicit sum of the Lennard-Jones potentials:

The steric and energetic intermolecular interactions and the steric fit

(14)

where

and rij is the distance between atom center i and atom center j, Ri,εi is the vdW radius,epsilon value of atom i, and Ri,εj is the vdW radius, epsilon value of atom j.

The electrostatic interaction energy is the explicit sum of the Coulombic potentials

(15)

using partial atomic charges on the ligand (qi) and the receptor (qj) from the imple-mentation of the Amber force field within the MacroModel program.

Additionally, we have included a steric complementarity fit (SF) to describe thepacking of a ligand in the receptor binding site. The steric fit is computed by summing the number of ‘good contacts’ for each atom of the ligand which is contained in theactive site. For example, antibody-steroid complexes and chymotrypsin-binding ligands actually contain a considerable percentage of their atoms outside of the active site,

43

Page 54: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. Marshall

while HIV protease inhibitors are largely surrounded by the receptor. VALIDATE defines a good contact as an instance where the vdW surface of a ligand atom is within a modifiable parameter, e, of the vdW surface of a receptor atom:

where

(16)

and N is the number of ligand atoms contained in the active site: ri,rj is the van derWaals radii of atoms i and j, and dij is the distance from atom center i to atom center j.For SF, we investigated e* values ranging from 0.1 Å to 0.3 Å, but we reported resultsfor ε* = 0.3 Å only. The steric fit is also normalized by the number of ligand atomswithin the active site.

6.2.

The lipophilicity of a hydrophobe is estimated by the energy needed to create a cavity in the ayucous solvent in which that molecule can tit. When hydrophobic ligands bind to the receptor, the energy of cavity creation is released. entropically favoring the process. Log P, the partition coefficient for octanol-water [67], gauges the ligand's preferencefor the active site of the receptor versus the aqueous solvent. We use the fragment-basedH log P method in Hint 1.1 [ 68] to compute the ligand's partition coefficient. With thepartition coefficient. a negative value indicates a preferencc for a polar (hydrophilic) environment and a positive value indicates a preference for a nonpolar (lipophilic) environment. Based on direct observations of the nature of the binding site, for example. for HTV- 1 protease (predominantly lipophilic) and L-arabinose sugar-binding protein (predominantly hydrophilic). we cornputc the amount of hydrophilic and lipophilic surface area as ratios to the total surface area of the receptor active site. The final value of the partition coefficient is then modified based on this information:

Ligand transfer from solution into the binding site

PC = RC* HlogP_PC (17)

where HlogP_PC is the partition coefficient as computed by Hint 1.1, and RC = 1 if thereceptor active site is predominantly lipophilic and –1 if the receptor active site is predominantly hydrophilic.

The sign of the coefficient RC is determined using the sums of the lipophilic and hydrophilic surface areas of the active site. For a hydrophilic receptor active site with RC = –1, one of the following criteria must bc true:

1. If less than tive ligands are available for this active site, all calculations must yield that at least 55% of the total surface of the active site is hydrophilic.

44

Page 55: 3D QSAR in Drug Design. Vol2

Rcceptor-Based Prediction of Binding Affinities

2. If five or more ligands are available, at least half of the calculations must yield that 55% or more of their total surface area is hydrophilic. while the remaining cal-culations must yield that at least a majority of the total surface area is hydrophilic. This decision is taken by determining the hydrophobic/hydrophilic preference of the active site based on surface area calculations [69] (see section 6.4), defining surface types in the same way as Böhm [44]: any carbon that is covalently bound to no more than one non-carbon is considered lipophilic, and any hydrogen connected to such a carbon is also lipophilic. All other atoms are considered hydrophilic. For protein-protein systems, a different treatnieni for this calculation was performed,as a large portion of the protein inhibitor remains freely accessible to the solvent and is not bound at the active site. Since only the active site, a small portion of theprotein ligand, is desolvated by binding to the rcccptor, only the HlogP for thisregion is relevant. Therefore, this region was extracted from the protein and the calculation was done only on this part of the molecule.

6.3.

Changes in conformational entropy occur when the freely rotating side chains of the dissociated components are forced to adopt more rigid conformations on complex for- mation. VALIDATE estimates the change in conformational entropy by counting thenumber of rotatable bonds. Nrrot. bonds. All non-terminal single bonds (except methyl groups) are included. For non-aromatic ring systems. the number of degrees of freedomis of the order n – 4, where n is the number of bonds in the ring. The rotatable bondcount is:

Con onformational entropy and enthalpy

Nrrot. bonds = Nrntsb + ∑i

(ni – 4) (18)

where Nrntsb is the number of non-terminal single bonds. and ni is the number of singlebonds in ring i. As for the HlogP calculations. an exception was made for protein- protein systems, because a large portion of the protein inhibitor is not bound and remains freely accessible to the solvent. VALIDATE counts only the rotatable bonds at the active site interface.

The change in conformational cnthalpy. is approximated here by the amount of energy required for the ligand to adopt the receptor-bound conformation, definedby:

(19)

where ELbind. site is the energy of the ligand’s receptor-bound conformation, and EL

sol is theenergy of the ligand in solvent at its nearest local minimum. calculated by comparingthe energy of the receptor-bound conformation of the ligand to the nearest local minimum of the unbound ligand using the GB/SA [ 70] solvation model with the Amber all-atom force-field implementation in MacroModel.

45

Page 56: 3D QSAR in Drug Design. Vol2

Tudor. I. Oprea and Garland R. Marshall

6.4.

VALIDATE computes four components to surface complementarity. These arc lipo-philic complementarity (nonpolar/nonpolar). hydrophilic complementarity (polar/polar,opposite charge), lipophilic/hydrophilic (polar/nonpolar) non-complementarity. andhydrophilic (polar/polar, like charge) non-complementarity. VALIDATE uses 256evenly distributed points. obtained from the SASA program [69], placed on the vdWsurface of euch receptor atom whose vdW surface is within 5 Å of the atom center ofany ligand atom. If a point on this surface is within a mean solvent radius (1.4 Å forwater) of the vdW surface or a ligand atom. it is considered a contact point. Its type is based upon the determination of the polar/nonpolar nature of both atoms and the criteria discussed above.

VALIDATE has two different types of CSA calculation. In the first method, the ab-solute surface area between ligand and receptor (similar to Bohm [44]) is computedusing the type of contact observed for each point on the receptor surface. Lipophilic complementarity is counted only once, even if that point is within the distance limit described above of more than 1 ligand atom’s vdW surface. The total surface on eachatom for each type of contact is computed by dividing the number of contact points of that type by 256 (the total number of points possible) and then multiplying by the total surface area of the atom:

Contact su rface a rears: the ligand–receptor interface

(20)

where CPi is the number of contact points on atom i and ri is the vdW radius of atom i.The second method is a pairwise summation. similar to Hint 1.1 [68]. For a single

point of surface contact of a receptor atom within the distance or n ligand, wc record asum of n, instead of 1, as described above.

VALIDATE included both descriptors. as their combination improved both the fitted model and its predictivity. Due to the structural diversity in the training set, we ob-served that different receptor–ligand pairs had different CSA values. We, therefore, scaled the computed surface areas by dividing the total surface area of HIV-1 protease(chosen as the most representative receptor in the dataset) to the largest total surface area computed for a given receptor’s active site for any of its known ligands. CSAvalues were then multiplied with the scaling factor.

6.6. VALIDATE results

VALIDATE was trained on 51 receptor–ligand co-crystallized stmctures [47] availablefrom PDB [61] (see Table 1 ), This training set included 15 HIV-1 protease–inhibitor.9 thermolysin–inhibitor, 12 endothiapepsin–inhibitor. 8 L-arabinose binding protein-sugar, 4 antibody–steroid. 4 subtilisin-Novo-protein and 2 ß-trypsin–protein com-plexes, respectively. Ligand size ranged from 24 atoms (Leu-NHOH) to 1512 atoms(SSI M70G M73K), and the pKi was between 2.47 and 14.0. The crystal-structure based test set (Table 2) included 14 inhibitors which were obtained from PDB. Neither ligands

46

Page 57: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

Table 1

Protein–inhibitor PDB code pKi

HIV – AG1001 N/A 6.44HIV – AG1002 N/A 6.26HIV – AG1004 N/A 6.64HIV – Roche N/A 6.9 1

HlV – SC52964 N/A 10

Receptor–ligand list for the VALIDATE training set

HIV – MVT101 4hvp 6.12

HIV – JG365 7hvp 9.62HIV – Acetylpepstatin 5hvp 7.85HIV – GR I 16624X N/A 8.27HIV – U75875 1hiv 9HIV – L-689,502 N/A 8.95HIV – A74704 9hvp 8.35HIV – A77003 1 hvi 10.02HIV – Hydroxpethylene 1aaq 7.74HIV – L-700,4 17 4phv 9.15Thermolysin – Phosphoramidon 1 tlp 7.55Thermolysin – N-( 1 carboxy-3phenyl)-L-LeuTrp 1tmn 7.47Theromlysin – N-phosphoryl-L-letrcineamide 2tmn 4.1Thermolysin – ValTryp 3tmn 5.9Thermolysin – Lcu-NHOH 4tln 3.72Thermolysin – ZFPLA 4tmn 10.19Thermolysin – ZGp(NH)LL 5 tm n 8 .04 Thermolysin – ZGp(O)LL 6tmn 5.05Thermolysin – CH2CO-Leu-OCH3 7tln 2.47Endothiapepsin – PD 125754 leed 4.9Endothiapepsin – L-364.099 2er0 6.4Endothiapcpsin – H 256 2er6 7.2Endothiapepsin – H 261 2er7 9Endothiapepsin – L-363,564 2er9 7.4

Endothiapepsin – PD 125967 4er1 6.6Endothinpepsin – H 142 4er4 6.8Endothiapepsin – CP 69,799 5er2 6.6

Endothiapepsin – CP 7 1,362 3er3 7.1

L-arabinose Bind. Prot. – L-arabinose 1ahe 56.L-arabinose Rind. Prot. – D-fucose 1 abf 5.2L-arabinose Bind. Prot. P254G – D-fucose 1 abp 5.8L-arabinose Bind. Prot. P254G – L-arabinose 1hap 6.9L-arabinose Rind. Prot. P254G – D-galactose 9abp 8L-arabinose Bind. Prot. MI 08L – L-arabinose 6abp 7L-arabinose Bind. Prot. MI OXL – D-fucose 7abp 5.4L-arabinose Bind. Prot. M 108L – D-galactose 8abp 6.6beta-Trypsin – UPTI 1tpa 14Beta-Trypin – PTI 2pte 13.3DB3 – progesterone 11a-ol hemisuccinate 1dbm 9.44DB3 – 5a-prepnane -3b-ol hemissuccinate 2dbl 8.7

DB3 – Progesterone 1dbb 9Subtilisin-Novo – Eglin c L45R 1sbn 10.3

DB3 – Etiocholanolone 1dhj 7.62

Subtilisin-Novo – CI-2 2sni 11Subtilisin-Novo – SSI M73K 3sic 10.2Subtilisin-Novo – SSI M70G M73K 5sic 10.2

47

Page 58: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and garland R. Marshall

Fig. 4. Cross-validated PLS analysis of 51 complexes in the training set (see Table 1). q2 = 0. 776 and stand-ard error (Spress) = 1. 139 for the six-components model (similar results obtained with SONNIC, a neural network program

Table 2 Crystal structures used as test set for the VALIDATE base model

pKi

PDB code Actual Predicted

DHFR – Folate 1 dhf 7.1 7.29DHFR – Methotrexate 1dds 8.3 6.4Penicillipepsin – IvaVVLySta-OEt 1apt 9.4 7.71Penieillipepsin – Iva VVSta-OEt 1 apv 7.7 8.04Carboxypeptidase – L-benzylsuccinate 1 cbx 6.3 5.92Carboxypeptidase – GlyTyr 3cpa 4 5.14Carboxypeptidase – LAGp(0)F 8cpa 9.1 9.39Alpha-Thrombin – MDI. 28050 1 ths 7.1 7AlpIia-Thrombin – NAPAP ldwd 8.2 8.51Trypsinogen – IleVal 2tpi 3.3 3.35Trypsinogen – ValVal 4tpi 2.0 3.39DNA – Daunomyein 1 da0 6.5 6.1DNA – Netropsin 121d 8.8 9.59DNA – 4-6-diamidine-2-phenyl indole 1d30 6.3 5.04

nor the specific receptors in this test set were included in the training set. Included were 2 DHFR, 2 penicillipepsin, 3 carboxypeptidase, 2 α-thrombin, 2 trypsinogen and3 DNA complexes. Actual versus predicted affinities are shown in Fig. 4 (cross-validation results on the training set) and Fig. 5 (test set), respectively.

An extensive discussion of the VALIDATE results was presented in the initial publi- cation of this work. In the base model, the electrostatic interaction energy contributed only 2.7% to the final PLS model which may, in part, be explained by the inadequate approximations of the partial charge representation in our model derivation. The steric

48

Page 59: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

VALIDATE Base Model Test Set

Fig. 5. Prediction of affinities of 14 crystalline complexes using the VALIDATE hose model: predictive r² = 0.806 and absolute average error = 0.697.

fit parameter also failed to give a significant contribution to the overall model, but itproved to be a major contributor in a series of 22 steroids binding to DB3, the mono-clonal antibody against progesterone [71]. In this particular case, the surface of thesteroid ligands is approx. 45% exposed to the aqueous solvent, therefore a significantnumber of ligand atoms are not in the binding site. Several limitations of VALIDATEare addressed in VALIDATE II, discussed below.

6.7.

Compared to the LUDI scoring function [44] (see Eq. 5),VALIDATE did not explicitlyconsider hydrogen-bonds (Eq. 13), as initial attempts to incorporate this parameter into

VALIDATE II: a scoring function for predicting hiv-1 protease inhibitors

VALIDATE II Model Test Set

Fig. 6. Prediction of affinities of 363 HIV-1 protease inhibitors using the VALIDATE II model: predictiver² = 0.48 and absolute average error = 1.05.

49

Page 60: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. marshall

Table 3

Protein–inhibitor PDB code pKi

HIV – A79285 1 dif 10.66HIV – SB203238 1 hbv 6.37 HIV – SKF108738 1hef 8.8HIV – SKF107457 1heg 7.93HI\/ – CGP53820 1hih 5.53HIV – SB204 144 1hos 8.55 HlV – SB206343 1hps 9.22 HlV – VX477 1 hpv 9.22 HIV – KNI272 1hpx 8.22 HIV – L735524 1hsg 9.42HIV – GR123976 1 hte 7 HIV – GR 126045 1 htf 8.19HIV – GR137615 1htg 9.78HIV [GGSSG linked] – A76928 1hvc 10.96HlV – A78791 1hvj 11.4HIV – A76928 1 hvk 10.96 HIV – A76889 1 hvl 9.95 HIV – XK263 1 hvr 9.51 HIV [V82A] – A77003 1hvs 10.3 HlV – cyc(Phe-Ilc-Val) 1 mtr 8.4

HIV – U8554SE 8hvp 8HIV – DMP323 [NMR average] 1 bvg 9.57HIV – U100313 2upj 10.39HlV – A9888 1 pro 11.3

Receptor–ligand list for the VALIDATE II training set

HIV – SB203386 1sbg 7.74

Note: The other HIV-1 protease inhibitor complexes used in VALIDATE II are listed in Table 1.

the base model reduced its accuracy. Since hydrogen-bonds are important in the receptor–ligand interaction, however. this was reflected in VALIDATE II, a scoring func-tion developed by Ragno and Marshall [manuscript in preparation]. VALIDATE II incor- porales 15 additional descriptors, including 3 related to hydrogen bonds: the number of ideal hydrogen bonds for the ligand in water. the number of hydrogen bonds between the ligand and receptor and the difference between the above two. Other descriptors were the AMSOL-derived polarization free energy of solvation. the cavity formation free energy, the ligand’s dipole moment in solvent and the HOMO energy. These descriptors were included in an attempt to more accurately describe the electrostatic contribution to ligand binding, which was less than 3% in the VALIDATE base model.

VALIDATE II model was trained using 39 HIV-1 protease inhibitor crystallographiccomplexes (see Table 3) and 29 explicative variables (14 from the VALIDATE basemodel). The two-PC model explains 91 % of the variance. and yields a r2 = 0.74 (stand-ard deviation of calculation. SDEC = 0.73). and a q2 = 0.52 (standard deviation of pre-diction. SDEP = 0.98). This wan better compared to using only the original VALIDATE descriptors: r2 = 0.55 (standard deviation of calculation. SDEC = 0.96) and a q2 = 0.37(standard deviation of prediction. SDEP = 1.12), which is not surprising considering that the number of descriptors is double in the first method. The external test set

50

Page 61: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

included 363 HIV-1 protease inhibitors, which were docked into the binding site usingthe nearest crystalline analog as template structure, using MacroModel. The affinityrange was between 6.11 and 11.4 for the test set, and 3.92 and 11.96 for the training set. Actual versus predicted affinities are shown in Fig. 6 for the 363 compounds (test set).

VALIDATE II shows a proof of concept — i.e. a scoring function that can be con-structed from crystal complexes that have no ambiguity with respect to the bindingmode. This function can be readily used to predict the affinities of compounds for whichthe binding mode is not known and has to be modelled, although in its current versionthe error is greater than one would like (± 1.3 kcal/mol). It was hoped that such genericmodels could embody the physical chemistry of binding. The fact that the errors ofprediction for the low-affinity ligands (not included in the training set) are higher thanthe errors of prediction for the high-affinity ligands suggests that this was notaccomplished. It should be pointed out that test-set ligands with low affinity have, onaverage, a higher experimental error than those in the high-affinity range.

VALIDATE tested the ability of a scoring function to generalize in the case wherethe binding modes were experimentally defined, with associated errors from differentassay procedures and no internal standards for reference — in other words, a basic testof how well we can do if we get the correct binding mode of the ligand. Here one has todeal with different errors in calculations on different receptors, which do not necessarilycancel. By contrast, VALIDATE II was focused on an individual protein. thus leadingto some cancellation of errors in modelling the target. This scoring function tested theability to evaluate ligands in the situation where the binding mode is modelled in acrude manner (with errors): ligands were docked into the binding site in an approximatemanner, by similarity to the binding mode of the corresponding transition state isostere.Overall, the results show that VALIDATE II is tolerant to inaccuracies in the bindingmode (albeit less accurate). It was not surprising that its predictive ability is greater thanVALIDATE’S, since it was specifically trained for HIV-1 prolease.

The choice of parameters remains arbitrary due to significant cross-correlationsbetween parameters. For example, in the VALIDATE II model, the HOMO energy, one of the most significant parameters, can be omitted and another model of similar pre-dictability can be derived from the remaining parameters. It is possible that finding thecorrect binding mode may constitute a major drawback for the external prediction ofthis model. The influence of this problem was significantly reduced for VALIDATE,because we used known complexes for both the training and the test set. It may be more difficult to predict the effects of minor changes in structure than in getting the relativeaffinities of diverse complexes approximately right. The VALIDATE II results motivatethe development of an improved scoring function to allow accurate prediction ofbinding modes as we believe much of the error in our predictions comes from impreciseorientations of the inhibitors in the complexes.

7. The Jain Scoring Function

To overcome the problems given by incorrect orientations in the binding site, Jain pro-poses [48]a regression-based scoring function that is both fast and tolerant to inaccurateligand orientations. This approach has a master equation (Eq. 21) that includes terms

51

Page 62: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. Marshall

tuned with neural network-based functions that include a sigmoidal (s), a gaussian (g)(Eq. 22), and a distance (d) (23):

(21)

(22)

(23)

where ƒ0 is the hydrophobic term, is the polar contribution, f2 estimates the repul-sive term, both (l5.phbe) and (l6.lhbe) represent solvation and (l7.n_rot) and(l8.log(mol.weight )) are the entropic term, respectively. Tunable linear parameters aredenoted li, whereas nonlinear ones are denoted ni. In what follows (Eqs. 24 and 26–28),— we have attempted to eliminate what we believe were typos from the original paperby Jain [48]. The reader is advised to compare the two versions.

The hydrophobic complementarity :

where

computes the effect of hydrogen-bonds and salt bridges, and:

(24)

is composed of a gaussian g — that captures the positive portion of atomic contacts, and a sigmoidal s — that captures the portion due to steric overlap.

The polar complementarity (Eq. 25) is summed over all pairs of polar atoms using thecharge–charge interactions (ci, cj):

(25)

(26)

(27)

is a correction term for hydrogen-bond directionality, defined using three vectors: vi

(‘out’ direction for atom i), vj (‘in’ direction for atom j) and a normalized vector bij

(from atom i to atom j), respectively. Switching atoms i and j yields the same value forthis correction term, as f1b is symmetric. The unfavorable charged contacts (Eq. 28) areaccounted for by summing contacts for all pairs of polar atoms of the same sign:

(28)

Solvation effects arc accounted for by computing the difference between the total andactual number of ‘hydrogen-bond equivalents’ for the protein (phbe) and ligand (lhbe),

52

Page 63: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

using the tunable parameters l5 and l6, respectively (see Eq. 21). Entropic costs are esti-mated by counting the number of freely rotatable bonds in the ligand (n_rot) and by the log 10 of the molecular weight of the ligand (log(mol.weight )) , using the tunableparameters l7 and l8 , respectively.

The function F was trained on 34 crystal structures from PDB [61], which includedseveral proteases (e.g., 6 thermolysin–inhibitor, 6 trypsin–inhibitor and 5 thrombin–inhibitor complexes). two cytochrome P450–inhibitor complexes, binding proteins to fatty acids, galactose and retinol, among others. The affinity ranged between 2.82 and 14.0 on the pKd scale. When trained using ligand minimization with pose optimization (defined as the orientation of the ligand into the binding site prior to docking), F yields a good pair rank-correlation coefficient (0.95 out of 1.0) and a good root-mean-squared error of 0.72. The l and n values were obtained after convergence (five iterations) (Table 4).

Table 4 1 and n values after convergence.

parameter Value Parameter Value

l0 0.0898 n0 0.6213

l2 1.2338 n2 0.1880l3 –0.1796 n3 0.3234l4 –0.0500 n4 0.6313l5 –0.1539 n5 0.6139l6 0.0000 n6 0.5000l7 –0.2137 n7 0.5010l8 –1.0406

l1 1–0.084 n1 0 .1339

Additional constraints were imposed by introducing a steric overlap term

Ki,j = –10.0 (dij + (29)

where δ was set to 0.7 for complementary polar contacts and 0.1 for others (thus, asteric overlap of 0.5 Å yields a penalty of 1.6 pKd units). This penalty was introducedfor ligand orientation optimization during docking, because F was trained on nativecrystal structures, where unfavorable ligand–protein contacts are scarce, if any. How- ever, Kij was not included in the fina l score.

The expressions used in this scoring function are, in principle. a different functionalform from the ‘classical’ equations from molecular mechanics (e.g. Eqs. 15 and 25 are different representations of the charge–charge interaction). However, this approach yields a result that is quite similar to VALIDATE [47], in that hydrophobic contacts (expressed as the lipophilic CSA in VALIDATE and as f0 in F) are deemed mostimportant (18.5% in VALIDATE and 44% in F, respectively). Electrostatic interactions rank second in importance (26%), unlike in VALIDATE (less than 3%). F compareswell [48] to Böhm’s function [44], as the breakdown terms from the master equations yield similar values for the trypsin–benzamidine complex [48]. This function appears to be rapid, accurate and tolerant to ligand pose. and to produce good estimates for the binding affinity .

53

Page 64: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. marshall

8. The HTS Approach

For a computational high-throughput screening (HTS) objective, Rose [72] defined a scoring function based on the following master equation (Eq. 30):

(30)

where is the hydrogen bond energy. ∆ Gmetal ctr. is the metal center bond energy,∆GvdW is the Lennard-Jones interaction energy, ∆Gdesolv.,polar is the desolvation energyfor unsatisfied donor/acceptor terms. ∆Gdesolv.,non-polar is the desolvation energy for non-polar atoms, ∆Grot.int is the entropy loss due to restrictions in internal degrees offreedom and ∆Grot/trans is the loss of rotational and translational entropy.The HTS effective hydrogen bond potential, ∆Gh-bond, is defined as:

where the distance of an atom from the protein surface becomes essential: if dsurface > 4 Å,then ∆Gh-bond = –0.5 to –1 .0 kcal/mol. which also assigns a maximum for the ∆Gdesolv.,polar

term. If the same atom is on the surface, then Gh-bond ∆Gh-bond = 0.0 to –0.3 kcal/mol, and ∆Gdesolv.,polar = 0.0. In Eq. 3 1, H is the hydrogen atom, A is the acceptor atom and D is the donor, γ is the angle for the donor out of the acceptor plane. while qA and QD are thepartial charges for the acceptor and donor atoms, respectively. In the HTS function, aro- matic carbons are treated as hydrogen-bond acceptors, whereas aromatic C-H pairs are treated as donors. In a similar fashion the effective metal center bond potential is defined:

where λ is the angle of the metal out of the acceptor plane. M is the metal atom, A is thecoordinating acceptor atom and qA are formal charges.

The HTS effective desolvation terms are:

which is applied for all nonpolar atoms. and ∆Ai is the change in the solvent accessiblesurface area upon ligand binding, and:

(34)

which is applied to all unsatisfied donor and acceptor atoms, with σi as an atom-typespecific solvation parameter fitted to the ∆Gsol of small molecules.

The entropy terms are defined as:

54

Page 65: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. Marshall

scoring functions work at all. It is well known, for example, in the field of HIV-1 pro-tease inhibition, that different pharmaceutical companies measure the Ki or IC50 valuesat pH values ranging from 4.5 to 7, at Na+ concentrations ranging from 0.001 M to0.1 M. Both factors are known to influence KM values for the protease up to 100-fold,hence significantly different Ki values are expected for the same ligand! In view of theabove facts, interested parties should combine their efforts by pooling together binding affinities for all known crystal structures of a given protein, and have them tested withthe same procedure, ideally at different temperatures as well (to observe the influence ofenthalpy-entropy compensation and thus better calibrate the model).

It is evident that attempts to calibrate scoring functions rely upon comparison of cal-culated (predicted) and experimental values. As such, the experimental values remain the only reality check. With the increasing diversity of bioassays. different laboratories yield different affinity constants for the same reference compound, sometimes by asmuch as 3 pKi units. We, therefore, believe i t is timely to propose the definition of inter-nal standards for biological targets of major interest, both at the substrate/agonist andinhibitor/antagonist levels. Peer-reviewed scientific journals should instruct their refer-ees to condition acceptance for publication only upon disclosure of experimental resultsfor such compounds deemed as internal standards. Such compounds could then be usedfor relative potency comparisons. We expect improvements in the performance ofscoring functions relying on such internal standards, thus leading to more accurate pre-dictions. While this request may seem an overstatement, we note that two recentreviews do not even mention the need for more accurate biological assays, despite anelegant treatment of physico-chemical and computational aspects of the ligand-receptorbinding [52,73].

The second requirement, speed, is obviously dependent on the number of terms in themaster equation, the quality of the software implementation and the difficulty of esti-mating each parameter, besides hardware performance. While benchmarks are not avail-able for these scoring functions, one is inclined to believe that Jain’s or Böhm’s scoringfunctions would be faster than VALIDATE on the same set of compounds. None theless, VALIDATE has the highest diversity of the training set, which can be both an ad-vantage (intuitively) and a drawback (as compared to VALIDATE II on the HIV-1 pro-tease crystals). There is obviously a trade between speed and accuracy. One should beconcerned more with rapidly screening out inactives’ without too much effort and, assuch, a steric overlap penalty (Eq. 29) should be an explicit part of every scoringfunction.

The presence of solvent molecules in the binding site is also an important aspectwhich has not been explicitly accounted for in the above scoring functions. While theimportance of water in mediating ligand-protein interactions has been amply describedi n the literature, none of the scoring functions considers docking water molecules in thebinding site in the same run as the ligand. Such a procedure could be improved if oneconsiders that methods that rapidly estimate the mean residence time of a water mole-cule [74] at the interface with biomolecules are available [75] .In combination with suchmethods, the direct influence of the solvent in the binding process would be betteraccounted for.

56

Page 66: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

The third requirement, tolerance to inaccuracies of the ligand,s orientation in the

binding site, appears to be a promising feature, provided that docking calculations weextremely fast. Such inaccuracies are also present in the structural data, and thermalfactors should be checked prior to deriving the scoring function. The binding affinityrepresents the statistical ensemble averaged over 1010 of ligand-receptor pairs, hencethe ability to predict it based on the calculations for a single conformation of bothligand and receptor remains questionable [76].

Last, but not least, one should question the need to perform the scoring of ligands atall. Most such procedures are aimed at de novo design software, which is used with theexplicit intention to explore the virtual chemical space in search of different, diverse, novel ligands. These virtual ligands are then ranked according to the scoring function,then later inspected by the computational chemist alone or in combination with syn-thetic chemists to evaluate synthetic feasibility of such ligands. Since the aim of thisprocedure is to explore molecular diversity, one should perform a cluster analysis (perhaps with the use of an experimental design procedure) on all the novel ligands andcluster them according to some similarity (or diversity) measure. Getting representativeligands from each cluster may prove more effective than predicting their bindingaffinity! In a combined procedure, one could envision a scoring function applied toD-optimal design-selected cluster representatives prior to individual compound inspection/selection.

Overall, the above results show that the receptor-based scoring function methodology holds promise and has room for improvement. With the proper selection of the training set (which one could bias for the therapeutic target of choice), and i f thermodynamicaspects are accounted for (perhaps with the use of microcalorimetry), one can expectquite good results from applying the scoring function. None the less, one has to bear inmind that novel chemistry (in novel ligands) may behave in unpredicted manner, henceone should always expect (and welcome) the unexpected.

Acknowledgements

One of the authors (T.I.O.) wishes to acknowledge useful discussions with Dr. RegineBohacek (Ariad, Cambridge, MA), Dr. Peter Rose (Agouron Pharmeceuticals, SanDiego, CA), Dr. Jeff Blaney (Chiron Corporation. San Francisco, CA) and Dr. Ludovic Kurunczi (Faculty of Pharmacy, Timisoara, Romania). We thank the co-authors ofVALIDATE for a stimulating scientific environment during the development of thisscoring function.

References

1.

2.

3.

Gund, P., Three-dimensional pharmacophoric pattern searching, Prog. Mol. Subcell. Biol., 11 (1997). 117–143.Jakes, S.E., Watts. N.. Willett, P., Bawden, D., and Fisher, J.D.. Pharmacophoric pattern matching in files of 3D chemical structures: Evaluation of search performance, J. Mol. Graph., 5. ( 1987), 41–48, Jakes, S.E., and Willett, P., Pharmacophoric pattern matching in files of 3D chemical structures: Selection of interatomic distance screens, J. Mol. Graphics. 4, (1986) 12–20.

57

Page 67: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. marshall

4. Sheridan, R.P., Rusinko, A.. III.Nilakantan. R.. Venkataraghavan R.,, Searching for pharmacophores inlarge coordinate datu bases and its use in drug design, Proc. Natl. Acad. Sci. USA. 86, (1989), 8165-8169.Van Drie, J.H., Weininger, D.. and Martin. Y.C., ALADDIN: An integrated tool for computer-assistedmolecular design and pharmacophore recognition from geometric steric and substructure searching ofthree-dimensional molecular structures, J. Comput.-Aided Mol Design 3 (1989) 225–251Martin, Y.C., 3D Database searching in Drug Design, J. Med. Chem. , 35 (1992) 2145–2154.Martin. Y.C.. Bures, R.I.G., Danaher. EA., and DeLazzer, J., New stategies that improve the efficiencyof the 3D design of bioctive molecules, In Trends in QSAR and molecular modelling 92, Wermuth, C.G., (Ed.) ESCOM, Leiden, 1993. P. 20–27.Kuntz, 1.D, Blaney, S.M.. Oatley. S.J.. Langridge. R. and Ferrin, T.E., A geometric approach to macro-molecule-ligand interactions, J. Mol. Biol.. 161 (1982) 269–288. DesJarlais, R.L., Seibel, G.L.. Dixon. J.S., and Kuntz, I.D., Using shape complimentary us un initial screen in desiging ligands for a receptor binding site of known three-dimensional structure, J. Med.Chem., 31 (1990) 722-729.DesJarlais, R.L., Seibel, G.L.. Kuntz, I.D., Fruth, P.S., Alvarez, J.C.. de Montellano. P.R 0.. DeCamp, D.L., Babe. L.M. Craik, C.S., Structure-based design of nonpeptide inhibitors specific for the humanimmunodeficiency virus I protease, Proc. Natl. Acad. Sci. USA, 87, (1990) 6644–6648. Oshiro, C M.. Kuntz, I.D., Dixon. J.S., Flexible ligand docking using a generic algorithm, J. Comput.-Aided Mol. Design. 9 (1995) 113–130.Ring C. Sun E., McKerrow S., Lee G.. Rosenthal P.. Kuntz 1.. and Cohen F., Structure-based inhibitor design by using pretein models for the development of antiparasitic agents, Proc. Natl. Acad. Sci. USA.90. (1993), 3583–3587. Shoichet, B.K., Stroud, R.M., Santi, D.V. Kuntz, I.D., and Perry. K.M.. Structure-based discovery of inhibitors of thymidylate synthase, Science. 259. (1993 ), 1445–1450. Caflisch. A., Miranker, A. and Karplus. M.. Multiple copy simultaneous search and construction ofligands in binding sites appliction to inhibitors of HIV-1 aspartic proteinase, J. Med. Chem., 36, (1993), 2142–2167.Oprea, T.I., Waller, C.L.., and Marshall. G.R.. Viral proteases: Structure and function. In Cellular proteo- lytic systems. Ciechanover. A. and Schwartz. A. (Eds.). Wiley-Liss. Inc . New York. 1994. pp. 183–221. Shuker, S.B., Hajduk, P.J.. Meadows R.P. and Fesik. S.W., Discovering high affinity ligands for pro- teins: SAR by NMR, Science, 274. (l996), 1531–1534. Lewis, R. and Dean, P., Automated site-directed drug design: /he concept of spacer skeletons forprimary structure generation, Proc. R. Soc. lond. [Biol.]. 236 (1989) 125–140. Lewis, R. and Dean. P., Automated site-diredirected drug design: The formation of molecular templates inprimary structure generation, Proc. R. Soc. Lond. [Biol.], 236 (1989) 141–162.Ho, C.M.W. and Marshall, G.R.. Cavity search; An algorithm for the isolation and Display of cavity-like binding regions, J. Comput.-Aided Mol. Design. 4 (1990) 337–354. Miranker, A. and Karplus. M., Functionality maps of binding sites: A multiple copy simultaneous search method, Proteins: Struct. Funct. Genet. 11 (1991) 29–34.Moon. J.B., and Howe, W.J., Computer design of bioactive molecules: A method for receptor-based denovo ligand design, Proteins: Struct. Funct. Genet. 11 (1991) 314–328.Nishibata. Y. and Itai, A., Automatic creation of drug candidate structures based on receptor structure Starting point for artificial lead generation, Tetrahedron. 47 (1991 ) 8985–8990.Böhm, H.-J., The computer program LUDI: A new method for the de novo design of enzyme inhihitors.J. Comput.-Aided Mol. Design. 6 (1992) 61–78.Borman. S., New 3-D search and de novo design techniques aid drug development, C&EN. (1992) 18–26. Chau. P.L., and Dean. P.M., Automated site-directed drug design: An accessment of the transferability ofatomic residual charges (CNDO) for molecular fragments, J. Comput.-Aided Mol. Design. 6 (1992)407–426.Chau. P.L.. and Dean, P.M.. Automated site-directed drug design: Searches of the Cambridge Structural Database for bond lenghts s in molecular fragments to be used for automated structure assmebly, J. Comput-Aided. Mol. Design, 6 (1992) 397–406.

5 .

6.7.

8.

9.

10.

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.25.

26.

58

Page 68: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

Chau. P.L.. and Dean. P.M., Automated site-directed drug design: The generation of a basic set offragments to be used for automated structure assemble, J. Comput-Aided. Mol. Design. 6 (1992)385–396.Ho. C.M.W. and Marshall. G.R., FOUNDATION: A program to retrieve subsets of’ query elements,including active sire region accessibility, from three-dimensional ional databases, J. Coinput.-Aided Mol.Design, 7 (1993) 3–22. Ho. C.M.W., and Marshall, G. R. SPLICE: A program to assemble partial query solutions from three- dimensional database searches into novel Iigands, J. Cornput.-Aided Mol. Design, 7 (1 993) 623-647.Rotstein. S.H., and Murcko. M.A., Group Build: A fragment-based method for de navo drug design,J. Med. Chem., 36. (1993) 1700–1710.Ho. C.M.W. and Marshall. G.R., DBMAKER: A program to generate 3D databases based upon userspecified criteria, J. Comput.-Aided Mol. Design. 8 (1994) 65–86.Ho. C.M.W. and Marshall, G.. De novo design of ligands, In Proceedings of the twenty-seventh annual Hawaii International Conference on system sciences, Hunter, L. (Ed.) IEEE Computer Society Press,Washington. DC. Vol. 5 1994, pp. 213–220.Bohacek, R.S. and McMartin C., Multiple highly diverse structures complimentary to enzyme bindingsites: Results of extensive application of a de novo design method incorporating combinatorial growth,J. Am. Chem. Soc., 116 (1994) 5560–5571. Roberts, N.A., Martin, J.A.. Kinchington, D., Broadhurst, A.V., Craig. J.C., Duncan I. B., Galpin. S.A.,Handa, B.K., Kay, J., Krohn. A.. Lambert, R.W.. Merrett, J.H., Mills J. S., Parkes K.E.B., Redshaw, S.,Ritchie, A.J., Taylor, D.L.. Thomas. G.J., and Machin, P. J, Rational design of peptide-based HIVproteinase inhibitors, Science. 248 ( 1990) 358–361.

35. Tomasselli, A.G., Hui, J.O., Sawyer. T.K., Thaisrivongs, S.. Hester, J.B. and Heinrikson R.L., Theevaluation of non-viral substances of the HIV protease us leads in /he design of inhibitors for AIDStherapy, Adv. Exp. Med. Biol., 306 (1991) 469–482. Thaisrivongs, S.. Tomasselli, A.. Moon, J., Hui, J., McQuade, T.. Turner, S.. Stronbach, J.. Howe. J.Tarpley, W., and Heinrikson, R., Inhibitors of the protease from human immunodeficiency virus: Design and modeling of a compounds containing a dihydroxethylene isostere insert with a high binding affinity and effective antiviral activity, J. Med. Chem., 34 (1991) 2344–2356.Thanki, N.. Rao, J., Foundling, S., Howe. W., Moon, J.. Hui, J.. Tomasselli, A., Heinrikson, R..Thaisrivongs, S., and Wlodawer, A., Crystal structure of a complex of HIV- I protease with adihydroxyethylene-containing inhibitor: comparison with molecular modeling, Protein Science, 1(1992) 1061–1072.Thaisrivongs, S., Turner. S.. Strohbach. J.. TenBrink. R., Tarpley, W.. McQuade T., Heinrickson., R..Tomasselli, A., Hui, J., and Howe, W., Inhibitors of the protease for the HIV: Synthesis, enzyme inhibi-tion and antiviral activity of a series of compuonds containing the dihydroxyenthylene transition-state isostere, J. Med. Chem., 36 (1993) 941–952.Lam, P.Y., Jadhave. P.K.. Eyermann, C.J., Hodge, C.N., Ru, Y., Bacheler, L.T., Meek, J.L.. Otto, M.J.,Rayner, N.M., Wong, Y.N., et al., Rational design of potent, bioavailable, nonpeptide cyclic ureas as HIV protease inhibitors, Science, 263 ( 1994) 380–384.Lunney, E.A., Hagen, S.E., Domagala, J.M.. Humblet, C., Kosinski, J., Tait, R.D., Warmus, J.S., Wilson, M., Ferguson. D.. et al., A novel nonpeptide HIV I protease inhibitor: elucidation of the binding mode and its application in the design of related analogs, J. Med. Chem., 37 (1994) 2664–2667. Thaisrivongs, S.. Tomich, P.K., Watenpaugh, K.D., Chong, K.T., Howe, W.J., Yang, C.P., Strohbach. J.W.. Turner. S.R., McGrath, J.P., Bohanon, M.J., et al. Structure-based design of HIV protease in-hibitors: 4-hydroxycoumarins and 4-hydroxy-2-pyrones as non-peptidic inhibitors, J. Med. Chem., 37(1994) 3200–3204. Oprea. T.I., Ho, C.M.W.. and Marshall, G.R., De novo design: Ligand construction and prediction of affinity, In Application of computer-aided molecular design: Agrochemicals, materials and phar- maceuticals, Reynolds, C.H., Holloway M.K.. and Cox. H.. (Eds.) ACS. Washington DC. 1995.

Caflisch, A., Computational combinatorial ligand design: Application to human alpha-throbin, J. Comput.-Aided Mol. Design. 10 (1996) 372–396.

27.

28.

29

30.

31.

32.

33.

34.

36.

37.

38.

39.

40.

41.

42.

pp. 64–81.43.

59

Page 69: 3D QSAR in Drug Design. Vol2

Tudor I. Oprea and Garland R. Marshall

44. Böhm, H.-J.. The development of a simple empirical w i r i n g function to estimate the binding constantfor a protein-ligand complex of known three-dimensional structure, J. Comput.- Aided Mol. Design, 8(1994) 243–256. Wallqvist. A., Jernigan, R.L., and Covell. D.G., A preference-based free-energy parameterization ofenzyme-inhibitor binding: Applications to HIV-1 protease inhibitor design. Protein Sci.. 4. (1995)1881–1903.Verkhivker. G., Appelt, K.. Freer. S.T., and Villafranca, J.E.. Empirical free energy calculations ofligand-protein crystallographic complexes: 1. Knowledge-based ligand-protein Interaction PotentialsApplied to the Prediction of HIV- 1 Protease Binding Affinity, Protein Eng., 8 (1995) 677-691. Head. R.D., Smythe, M.L.. Oprea, T.I.. Waller. C.L.. Greene S.M.. and Marshall. G.R., VALIDATE: A new method for the receptor-based prediction of binding affinities of novel ligands, J. Ani. Chem. Soc.,

Jain, A.. Scoring non-covalent pretein-ligand interactions: A continuous differentiable function tuned tocompute binding affinities, J. Comput.-Aided Mol. Design. 10 (1996) 427–440. Kollman P.. Free energy calculations: Application to chemical and biochemical phenomena, Chem.,Rev., 93 (1993) 2395–2417. Åqvist, J.. Medina C. and Samuelsson, J.-E., A new method for predicting binding affinity in c computer.-aided drug design, Protein Eng., 7 (1994) 385–391.Åqvist. J., and Mowbray, S.L., Sugar recognition by a glucose/galactose receptor: Evaluatioin of binding energetics from molecular dynamics simulations, J. Biol. Chem., 270. (1995) 9978–9981. Gilson, M.K.. Given J.A.. and Head, M.S., A new class of models for computing receptor-ligand bindingaffinities, Chem, Biol.. 4 (1997) 87–92. Head, M.S., Given J.A.. and Gilson M.K.. Mining minima Direct computation of conformational freeenergy, J. Phys. Chem, A, 101 (1997) 1609–1618. Holloway. M.K.. Structure-based design of human immunodeficiency virus-1 protease inhibitors: corre- lating calculated energy with activity i n Application of computer-Aided molecular design:Agrochemicals, material\ and pharmaceuticals. Reynolds, C.H., Holloway M.K., and Cox, H. (Eds.)ACS, Washington DC. 1995. p. 82.Waller. C.L.. Oprea. T.I.. Giolitti, A., and Marshall. G.R., 3D QSAR of human immunodeficiency virus-1protease inhibitory: I. A CoMFA study employing experimentally-determined alignment rules,J. Med. Chem.. 36 (1993) 4152–4160. Oprea, T.I.. Waller. C.L.. and Marshall. G.R., 3D-QSAR of human immunodeficiency virus-1 protease in-hibitors: II. Predictive power using limited exploration of alternate binding modes, J. Med. Chem.. 37 (1994) 2206–2215.Oprea. T.I., Waller. C.L.. and Marshall. G.R., 3D-QSAR of human immunodeficiency virus-1 protease in-hibitor: III. Interpretation of CoMFA results, Drug Des. Discov., 12 (1994) 29–51 Ajay. and Murcko. M.. Computational methods to predict binding free energy in ligand-receptorcomplexes, J. Med. Chem.. 38 (1995) 4953–4967.Xue. Q., and Yeung, E.S., Differences in the chemical reactivity of individual molecules of an enzyme,Nature, 373 (1995) 681–683.Searle, M.S., and Williams. D.H., The cost of conformational order: Entropy changes in molecularassociations, J. Am. Chem. Soc.. 114 (1992) 10690–10697.Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Brice. M.D.. Rodgers, J.R., Kennard, O.,Shimanonchi. T. and Tasumi M., The Protein Data Bank: A computer-based archival file for macro-molecular structures , J. Mol., Biol., 112 (1977) 535–542. Connolly, M., Solvent-accessible surfaces of protein and nucleic acids, Science. 221 (1983) 709–713.Bondi. A., Van der Waals volumes and radii, J. Phys. Chem., 68 (1964) 441–449. Sippl, M.J., Boltzmann’s principle, knowledge-based mean fields and prorein folding: An approach to the computational determination of protein structures, J. Comput.-Aided Mol. Design, 7 (1993)473-501.Brooks. B.R., Bruccloleri, R.E., Olafson, D.. States, D., Swaminathan, S. and Karplus, M.. CHARMM: A program for macromolecular energy, minimization, and dynamics calculation, J. Comput. Chem., 4(1983) 181–186.

45.

46.

47.

118 (1996) 3959–3969.48.

49.

50 .

51.

52.

53.

54.

55.

56.

57.

58.

59.

60.

61.

62.63.64.

65.

60

Page 70: 3D QSAR in Drug Design. Vol2

Receptor-Based Prediction of Binding Affinities

Gilli, P.; Feretti, V., Gilli G. and Borea, P.A., Enthalpy-entropy compensation in drug receptor binding,J. Phys. Chem., 98 (1994) 1515-1518.Leo, A., Estimating log Poct from structures, Chem. Rev., 5 (1993) 1281-1306.Kellogg, G.E., Semus, S.F., and Abraham, D.J., HINT: A new method of empirical hydrophobic fieldcalculation for CoMFA, J. Cornput.-Aided Mol., Design, 5 (1991) 545-552.Le Grand, S.M. and Merz, K.M., Rapid approximation to molecular surface area via the use of Booleanlogic and look-up tables, J. Comput. Chem., 14 (1993) 349-352.Still, W.C., Tempczyk, A., Hawley R.C., and Hendrickson, T., Semianalytical treatment of solvation f o rmolecular mechanics and dynamics, J. Am. Chem. SOC., 112 (1990) 6127.Oprea, T.I., Head, R.D., and Marshall, G.R., The basis of cross-reactivity for a series of steroids binding to a monoclonal antibody against progesterone (DB3): A molecular modeling and QSAR study, InQSAR and molecular modelling: Concepts, computational tools and biological applications, Sanz, F.,Giraldo, J., and Manaut, F. (Eds.) Prous Science Publishers, Barcelona. 1995,pp. 451–455.Rose, P.W., Scoring methods in ligand design, 2nd USCF Course in Computer-Aided Molecular Design,San Francisco, CA, 1997.Gilson, M.K., Given, J.A., Bush, B.L., and McCammon, J.A., The staristical-thermodynamic basis forcomputation of binding affinities: A critical review, Biophys. J., 72 (1997) 1047–1069. Garcia, A.E., and Stiller, L. Computation of the mean residence time of water in the hydration shells ofbiomolecules, J. Comput. Chem. 14 (1993) 1396-1406.Hummer, G., Garcia, A.E., and Soumpasis, D.M., Hydration of nucleic acid fragments: Comparison oftheory and experiment fo r high-resolution crystal structures of RNA, DNA, and DNA-drug complexes,Biophys. J., 68 (1995) 1639-1652.Oprea, T.I., and Waller, C.L., Theoretical and practical aspects of three dimensional quantitative struc-ture-activity relationships. In Reviews in Computational Chemistry, vol. 11, Lipkowitz, K.B., and Boyd,D.B. (Eds), Wiley, New York, 1997, pp. 127-182.

66.

67.68.

69.

70.

71.

72.

73.

74.

75.

76.

61

Page 71: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 72: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by EnergyMinimization

M. Katharine HollowayMolecular Design and Diversity, Merck Research Luboratories, West Point, Pennsylvania 19486,

U.S.A.

Quantitative a priori prediction of the affinity of a ligand for its receptor is, in a sense,the holy grail of structure-based molecular design, ideally allowing one to optimize thenumber and kind of compounds designed for a program prior to any chemical synthesis.Historically, several approaches have been employed to calculate and/or predict bindingaffinity.

The free energy of binding can be calculated directly via Free Energy Perturbation(FEP) calculations [1]. The relative binding energies for pairs of inhibitors are deter-mined using a thermodynamic cycle in which the structure of one inhibitor is perturbedinto the structure of another, both in the receptor site and in solvent. This approach hasbeen reported to yield relative free energies that are accurate to ± 1 kcal/mol withrespect to experiment. However, due to the amount of computer time required, thesecalculations are impractical for routine assessment of the binding affinity of proposedcompounds.

Other more efficient approaches have included: Comparative Molecular FieldAnalysis (CoMFA) [23], the Hypothetical Active Site Lattice (HASL) [4], HINT hy-drophobicities [5], solvent accessibility [6] or solvent-induced interactions [7], atom-atom contact preferences [8,9], a general mean field model [10,11] and scoring algo-rithms for database search or de novo design methods [12–16], as well as energyminimization methods.

This chapter will focus on computational studies which employ energy minimizationin an X-ray or modelled active site as the means to predict the affinity of a ligand for itsreceptor. These studies fall into two categories: (a) energy-component approaches –i.e. those that incorporate receptor-ligand energy values as one term in a sum of bindingenergy contributions or as part of a 3D QSAR; and (b) energy-only approaches – i.e.those that employ energy minimization methods alone to predict ligand affinity.

1. Energy-component Approaches

Recent approaches that incorporate energy minimization as part of a larger bindingenergy equation or 3D QSAR have included Marshall’s VALIDATE [17], Ortiz andWade’s COMBINE [18] and the empirical free energy evaluation method of DeLisi[19,20].

The VALIDATE method incorporates 12 physico-chemical and energetic parameters,including the electrostatic and steric interaction energy between the receptor and ligandcomputed using the AMBER force field in MacroModel. A predictive 3D QSAR wasderived for a diverse training set of 51 crystalline complexes. including HIV-1 protease,thermolysin, endothiapepsin, ß-trypsin and subtilisin-Novo inhibitors; antibody(DB3)-

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design Volume 2. 63–84.© I998 Kluwer Academic Publishers. Printed in Great Britain.

Page 73: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

bound steroids; and L-arabinose binding protein-bound sugars. The ligands ranged in size from 24–1512 atoms and the pKi values ranged from 2.47 to 14.0. The best fit equa- tion, using PLS analysis, yielded an r2 = 0.849 with a standard error of 1.006 log unitsand a cross-validated r2 = 0.776. This QSAR was found to be predictive for at least twoof three test sets of enzyme-inhibitor complexes: 14 structurally diverse crystalline complexes (predictive r2 = 0.81, absolute average error = 0.70 log units), 13 HIV pro-tease inhibitors (predictive r2 = 0.57, absolute average error = 0.73 log units) and 11 thermolysin inhibitors (predictive r2 = 0.72, absolute average error = 1.48 log units).For more details of this approach see the chapter by T.I. Oprea and G.R. Marshall in this volume.

The COMBINE approach utilizes only the intermolecular interaction energy between the receptor and its ligand, but employs a unique method that partitions the energy among receptor and ligand fragments and subjects these energy components to statisti-cal analysis. This is proposed to enhance contributions from mechanistically important interaction terms and to tune out noise due to inaccuracies in the potential energy func- tions and molecular models. For a set of 26 synovial fluid phospholipase A2 inhibitors,the direct correlation between interaction energies. computed using the cff91 DIS- COVER force field, and percent enzyme inhibition was very low, r = 0.212. However.with the COMBINE approach. employing PLS fitting and the GOLPE variable selection procedure, good correlations with percent inhibition were observed (r2 = 0.92,cross-validated r2 = 0.82). For more details of this approach see the chapter by Wadeet al. in this volume.

DeLisi and co-workers employ an empirical free energy evaluation that computes the binding free energy as a sum of various free energy contributions, including the inter-action energy between the receptor and ligand computed with the CHARMm force fieldThe binding of nine serine endopeptidase inhibitors [19]; biotin, iminobiotin and thio-biotin to streptavidin [ 19]; five peptide antigens to MHC Class I receptors [19]; and six peptide-based HTV-1 protease inhibitors [20] have been examined using this methodol- ogy. The calculated empirical free energies are in good agreement with experiment and/or FEP calculations for the first three cases. For the HIV-1 protease inhibitors, themethod is used to guide docking of the ligands via minimization of the free energy func-tion, leading to better agreement with the observed bound structure (RMSD = 1.21 Å) than minimization of the CHARMm energy function (RMSD = 1.69 Å), although bothcorrelated well with In Ki values (empirical energy function, r = 0.970; CHARMm energy function. r = 0.967).

2. Energy-only Approaches

Recent approaches that primarily employ energy minimization methods to predict ligand affinity have included studies of the binding of (a) influenza sialidase inhibitors [21], (b) aldehyde substrates for aldose reductase [22], (c) thrombin inhibitors [23],(d) serine proteinase inhihitors [24], and HIV-1 protease inhibitors [25–29].

Taylor and von Itzstein [21] performed calculations on the mechanism of sialoside cleavage by influenza virus sialidase and its inhibition by transition-state analogs such

64

Page 74: 3D QSAR in Drug Design. Vol2

A Priori Prediction of ligand affinity by energy Minimization

as 2-deoxy-2,3-didehydro-N-acetylneuraminic acid (Neu5Ac2en). Minimized enzyme– inhibitor complexes for Neu5Ac2en and two analogs were generated employing a con- bination of molecular dynamics and molecular mechanics calculations with the CVFF force field. The computed intermolecular binding enthalpies correlated qualitatively with the observed Ki values.

Subsequently. De Winter and von Itzstein [22] performed calculations on the bindingof substrates (D-xylose, L-xylose and D-lyxose) to wild-type human aldose reductase and two site-directed mutants, H110A and H110Q. Three diffe rent protonation states of His110 were examined: (1) Nε2-H or HisI, (2) Nδ1-H or HisII, and (3) Nε2-H and Nδ1-Hor His+. Minimum energy conformations for each enzyme–substrate complex were gen-erated via a combination of molecular dynamics sampling and molecular mechanicsminimization with the AMBER program. The solvation energy of these complexes wasevaluated using DELPHI. An excellent correlation was observed between the calculated average interaction enthalpy, both with (r = 0.94) and without solvation (r = 0.87), andthe measured log(Km) values for the HisI wild-type and H110A and H110Q mutantmodels, while there was no correlation for the other two wild-type models, HisII and His+. Thus it is proposed that the key residue His110 is neutral and protonated at Nε2

when an aldehyde substrate is bound to human aldose reductase. Further, the log( Km) ofa fourth substrate. D-glucose, was correctly predicted (predicted = –0.93; observed = –0.94) from the correlation equation which included solvation, log(Km) = 2.72 + 0.241(average interact ion enthalpy).

Grootenhuis and van Galen [23] examined the binding of 35 non-covalently bound inhibitors of human thrombin in the argatroban, TAPAP and NAPAP series with pKi

values ranging from 3 to 8. Energy minimization of the inhibitors in the X-ray structure of thrombin was performed using the CHARMm force field. Various electrostatic models were examined (ε = 1. r, 4r), as well as flexibility of the enzyme active site andthe inclusion of solvation corrections. The correlation between the CHARMm inter- action energy and pKi was best (n = 32, r = 0.81, cross-validated r2 = 0.60, standard de-viation = 0.97 log units) when the enzyme active site was held rigid, with a distancedependent dielectric constant (ε = r) employed during energy evaluation but not duringenergy minimization. The equation of the best fit line was Einter = –4.23 (pKi) – 54.74.Preliminary investigation of solvation effects using Eisenberg’s solvation functiondid not improve the correlation. For more details of this work see the chapter byR.M.A. Knegtel and P.D.J. Grootenhuis in this volume (p. 99 ff).

Kurinov and Harrison [24] performed calculations, using the AMMP molecular mechanics program, on a series of 18 compounds containing hydrophobic groups and basic amines in order to predict their ability to inhibit bovine trypsin. Fifteen of the compounds were predicted to be inhibitors and three compounds were predicted not to be inhibitors, based on the presence or absence of a low-energy binding geometry within 2 Å of the known benaamidine binding site. The binding energies calculated for the compounds predicted to inhibit trypsin were found to correlate (n = 15, r = 0.755:n = 14, r = 0.857) with the observed logKi values, which ranged from –1.15 to –4.00. Inaddition, the compounds that were predicted not to inhibit trypsin did not inhibit trypsin. The slope of the correlation line was observed to be less than RT; this was

65

Page 75: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

attributed to neglect of solvation and entropic terms. It was found that an all-atom po-tential with no cutoff radius was essential to obtain a good correlation and that the flex- ibility of the ligand must be included (here via molecular dynamics) in order to properly represent the distribution of conformations. Subsequent X-ray structure determination of six of the inhibitors co-crystallized with trypsin indicated good overall agreement between the predicted and observed binding modes. Interestingly, the enzyme structure varied little from one X-ray structure to another, leading to the conclusion that trypsin is relatively rigid in its inhibited form, thus only flexibility of the inhibitor, not the enzyme, need be considered in predicting binding modes and energies.

HIV-1 protease inhibitor binding has been cxamincd by four different groups. Weber and co-workers initially examined [25] three HIV-I protease inhibitors, MVT-101,JG365, and U85548e, for which X-ray structures had been determined. These structures were minimized using the CVFF force field, using a distance dependent dielectric model to mimic solvation effects. A ‘high’ dielectric model, ε = 4r, was found toproduce minimized complexes and binding energies that were in better agreement with the X-ray structure and Ki measurements than did a ‘low’ dielectric model, ε = r. Usingthis protocol, the calculated interaction energy of the three inhibitors with HIV-1 pro-tease ranged from –53 to–56 kcal/mol, but were not correlated with observed Ki.

However, subsequent examination [26] of a set of 21 modelled peptide substrates ofHIV-1 protease with single amino acid substitutions at P4 to P'3 led to calculated interac-tion energies for the corresponding tetrahedral intermediates that correlated well with the observed kcat/Km (n = 21, r = 0.64; P1 – P'1 only, n = 8. r = 0.93; P2 – P'2 only.n = 14, r = 0.86). These calculations were performed using the AMMP molecularmechanics program and a constant dielectric model (ε = 1 ). Side-chain conformationsfor the peptide substratec were determined via systematic search. The catalytic mechan- ism and factors influencing the catalytic efficiency of the different substrates were dis- cussed in relation to the models: in particular. it was proposed that the ordered watermolecule observed in many X-ray structures between the inhibitor and enzyme ‘flaps’ acts as the nucleophile rather than the more traditional hypothesis of a water activated by the catalytic aspartie acids, AspA25 and AspB225. For more details of this work see thechapter by I.T. Weber and R.W. Harrison in this volume (p. 115 ff).

Miertus and co-workers [27] examined a set of eight peptide-based HIV-1 protease inhibitors, modelled on the hexapeptide MVT-101 structure. Enzyme–inhibitor calcula- tions were performed using the CVFF force field: short molecular dynamics runs were performed to relieve strain in manually constructed inhibitor models: solvation effects mere evaluated with the Polarizable Continuum Method, with = 80 representing a polar environment. Summation of total complexation and solvation energies led to values that were used to prioritize inhibitors for synthesis. e.g. incorporation of nega-tively charged residues at P

2or P'

2improved the energy and was consistent with im-

proved inhibition [28]. Varying the protonation state of the basic amine in these reduced peptide inhibitors led to uniform changes in the complexation energy, but some vari- ability in the solvation energy by modulating the overall charge on the inhibitor. Variation of the central transition state mimic was also examined; inclusion of a second hydroxyl or amine to the U-85545e reference structure was predicted to be favorable.

66

Page 76: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by Energy Minimization

Viswanadhan and Reddy [29] examined a series of 11 peptidomimetic HIV-1 pro-tease inhibitors with structural variation at the P'2 position. Energy minimizations wereperformed with the AMBER force field and solvation energies were computed using an explicit water shell. The binding enthalpy resulting from these calculations compared favorably to FEP calculations for seven of the inhibitor pairs. Hydrophobic interaction energies were also computed but found to correlate less strongly (n = 7, r = 0.72, stan-dard deviation = 1.05 kcal/mol) than the calculated binding enthalpies (n = 7, r = 0.92,standard deviation = 0.57 kcal/mol) with the observed binding energy.

We have also examined the binding of HIV-1 protease inhibitors using energy mini-mization methods [30,31], with the goal of predicting activities and prioritizing syn-thesis for proposed inhibitors a priori. For this purpose, a training set of 33 inhibitorswith structural variation at the P'1 and P'2 positions was employed to derive a correlationbetween the computed enzyme-inhibitor interaction energy and the observed IC50

values. The inhibitors employed in the training set are shown in Tables 1 and 2. They were selected based on variety of structure and activity: 16 inhibitors (2–17) contained

Table 1 set of HIV-1 protease inhibitors.

Experimental IC50 values and calculated enzyme-inhibitor interaction energies for the P1, training

67

Page 77: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

Table 2set of HIV- 1 protease inhibitors.

Experimental IC50 values and calculated enzyme–inhibitor interaction energies for the P2' training

68

Page 78: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by Energy Minimization

Table 2 (continued)

modifications in P'1 and 16 inhibitors (18–33) contained modifications in P'2; pIC50

values ranged from 4.523 to 10.000. An initial model of 1 was constructed based on the X-ray structures of renin in-

hibitors bound in the active sites of fungal aspartyl proteases such as Endothiapepsin and Rhizopus pepsin. Models of 2–33 employed the model of 1 as a template. The in- hibitor models were minimized in the active site of the L-689,502 inhibited HIV-1 pro- tease X-ray structure [32] using the MM2X force field [30]. In all calculations the inhibitor was completely flexible and the enzyme was completely rigid. Dielectric con-stants of 1.5 for intramolecular interactions and 1.0 for intermolecular interactions wereemployed. All titratable HIV-1 protease residues were charged with the exception ofTyr59 and one of the pair of catalytic aspartates, AspA25, which was protonated on Oδ1.The latter protonation slate was chosen based on pH rate profiles which suggest thatthe catalytic aspartates of the fungal aspartyl proteases Penicillopepsin and Rhizopuspepsin [33] and the HIV-1 protease [34] share one negative charge.

The computed total energy, Etot, is a sum of the intramolecular energies of theenzyme. Eenz, and the inhibitor, Einh, and the intermolecular (or interaction) energy,

stant and is not computed. Einter corresponds to the sum of the van der Waals (Evdw ) andelectrostatic (Eelec) interactions between the inhibitor and the enzyme.

Einter, between the enzyme and the inhibitor. Since the enzyme is held fixed, Eenz is con-

69

Page 79: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

Etot = Eenz + Einter + Einh

Einter = Evdw + Eelec

Einter should correspond to one component of the overall free energy or binding, ∆Gbind,with other significant contributions being: the solvation/desolvation of the enzyme, in- hibitor and enzyme-inhibitor complex: the flexibility of the enzyme; the energy associ- ated with achieving the bioactive conformation of the enzyme and inhibitor; and theentropy changes associated with binding. However. one might expect Einter to correlate with ∆Gbind if other effects are not dominant — i.e. if the enzyme structure does notvary much from one inhibited complex to the other. if the inhibitors occupy the same binding sites and have similar conformation and character. and if entropy changes are relatively constant. Indeed, the computed Einter values listed in Tables 1 and 2 correlatewell with the experimental observation, pIC50, as shown in Fig. 1, with the following relationship:

pIC50 = –0.169(Einter) – 15.707 (1)

n = 33, r2 = 0.783, cross - validated r2 = 0.770, s = 0.675

Separation of Einter into the van der Waals. Evdw, and electrostatic, Eelec, components, asdepicted in Fig. 2, indicated that neither correlated as well individually with pIC50 as thesum of the two, Einter . This is consistent with the observation that both hydrophobicbinding — i.e. steric complementarity — and hydrogen-bonding interactions — i.e.electrostatic complementarity — are critical to the activity of HIV-1 protease inhibitors.

Fig. 1. Calculated enzyme–inhibitor interaction energy (Einter) versus experimental enzyme inhibition (pIC50)for the training set of inhibitors.

70

Page 80: 3D QSAR in Drug Design. Vol2

A Priori Pr ediction of Ligand Affinity by Energy Minimization

Variation of the protonation state and protonation site of the catalytic aspartic acids, AspA25 and ASPB225, indicated that the observed correlation between Einter and pIC50 wasessentially independent of protonation state. As shown in Table 3, the correlation

Fig 2. Electrostatic (diamonds) versusEinter (circles), for the training set of

us van der Waals (squares) contributions to the interactions energy,hibitors 1–33. Equations of the best-fit line and correlation coefficients

are: Eelec: pIC50 = –0.23.3(Eelec) – 5.1988, r2 = 0.436; Evdw: pIC50 = –0.221(Evdw) – 10.270, r2 = 0.602;Einter: pIC50 inter) – 15.707. r2 = 0.783.

Table 3 effect of aspartie acid (A25 and B225) protonation state on the correlation between observed plC50

71

= –0169(E

and calculated Einter for 1–33.

Page 81: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

coefficient is similar for all protonation states with the possible exception of the com-pletely ionized species in which both aspartates are negatively charged. This is consist-ent with calculations by Tossi [28] (see above), which indicate that varying theprotonation state of the secondary amine in reduced peptide inhibitors leads to uniform changes in the complexation energies within a series.

However, the observed correlation was not independent of the force field employedas shown in Table 4. Similar calculations with the CHARMm force field [35] led to anr2 of 0.520 between Einter and pIC50. We postulated that the better correlation obtainedwith the MM2X force field might be due to the consistent charging scheme employed.Indeed, incorporation of the MM2X charges in the CHARMm calculations led to asignificant improvement in the correlation (r2 = 0.683). Thus, the MM2X force fieldand, in particular, the MM2X charges, appear superior to the CHARMm force field and charges for this specific application. This is interesting in light of some of the other studies of this kind which employed the CHARMm force field (see above).

We also examined the contribution of other key factors in binding — e.g. the flexibil-ity of the enzyme active site, the difference in energy between the solution and bound conformations of the inhibitor and the solvation/desolvation of the inhibitor and the enzyme. Including the flexibility of the enzyme active site did improve the correlation in the context of CHARMm minimization [31]; unfortunately, these calculations were not possible with the MM2X force field.

Including the difference in energy between the solution and bound conformations of the inhibitor was attempted by subtracting from Einter the difference in intramolecular energy between the free and bound inhibitor conformations. Two different dielectric constants ε = 1 and ε = 50) were employed to model the free inhibitor conformationthat was the closest local minimum to the bound conformation. Neither dielectric model led to energy differences that had a significant effect on the correlation [31]. This sug-gests that either there is no real energy penalty paid for the inhibitor to achieve the bioactive conformation or, more likely, that the penalty is similar for the inhibitors in the training set.

Table 4 correlation with PIC50 for 1–33.

The effect of computational parameters such as force field, charge model and solvation, on the

Computed value r2

Einter (MM2X) 0.784Einter (CHARMm) 0.520Einter (CHARMm, MM2X charges) 0.683ESolv,1 (BMIN 0.079Esolv,1 (BMIN), excluding 9 0.156

Einter (MM2X) & Esolv,1 (BMIN) 0.786

Einter (MM2X) &Esolv,Tot (BMIN) 0.789Inhibitor surface area 0.303Inhibitor volume 0.319

Esolv,Tot (BMIN) 0.118

72

Page 82: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by Energy Minimization

The effects of solvation were included using the BATCHMIN GB/SA [36] con-tinuum solvation method with the MM2 force field [37]. Solvation energies were computed for the rigid enzyme active site, for each free inhibitor, and for each enzyme-inhibitor complex. However, neither the solvation energy of the inhibitor (Esolv,I) nor the total solvation energy (Esolv,Tot = Esolv,E-I – Esolv,E – Esolv,I) improved the correlation when employed in a multiple linear regression with the Einter values as shown in Table 4. This is consistent with a low correlation between the computed solvation energy and the observed activity (r2 = 0.079 and 0.118 for Esolv,I and Esolv,Tot, respectively). Even whenthe outlier 9 was excluded from the regression between Einter and Esolv,I, the correlation was only slightly improved. It is interesting to note that the surface area and volumecomputed for the bound inhibitor conformation correlated better with pIC50 than the BATCHMIN GB/SA solvation energies. Clearly, solvation/desolvation effects are important to the overall binding process. However, these results indicate that either solvation effects vary little for this set of inhibitors or that we have not properly approached the computation of solvation effects. The inability to improve the cor-relation with the inclusion of a solvation term is in agreement with the results of Grootenhuis and van Galen [23] who employed an Eisenberg solvation function for a series of thrombin inhibitors.

Thus, we have discovered nothing that improves the predictivity of our original cor-relation between Einter and pIC50, despite the fact that the other energetic factors are clearly key contributors to binding affinity.

In order to demonstrate that the observed correlation is not specific to this training setof inhibitors, we examined two other sets of HIV-1 protease inhibitors — i.e. cyclic urea inhibitors 34–37 from DuPont-Merck [38] and hydroxycoumarin and hydroxypy-rone inhibitors 38–68 from Upjohn [39]. Both sets of inhibitors differ significantly from the Merck set, in that they are water-displacing templates in which an oxygen of the in-hibitor occupies the binding site of a critical ordered water molecule located between the flaps of the enzyme and the inhibitor in the X-ray structures of HIV-1 protease/ inhibitor complexes. Additionally, the activity measure for these datasets are Ki, notIC50. These cannot be compared directly since Ki is an intrinsic property, while IC,,varies as a function of the assay conditions — e.g. as a function of substrate, pH and salt concentration. Thus, we can derive correlations between the calculated Einter and the ob-served pKi values, given in Tables 5–8, for each of the additional training sets (DuPont-Merck, r2 = 0.967; Upjohn, r2 = 0.606) or for the combination of the two (r2 = 0.759).The individual correlations are illustrated in Fig. 3 for the DuPont-Merck inhibitors and Fig. 4 for the Upjohn inhibitors. However, it would be ill-advised i.e. comparing apples to oranges) to attempt to correlate Einter with the observed pIC50 or pKi, values for the entire set of Merck, DuPont-Merck and Upjohn inhibitors since there is no standard of reference for their activity.

Note that the correlation for the Upjohn series of inhibitors, while very good, is not as strong as that for the Merck or DuPont-Merck series. This may result from the fact that these inhibitors were assayed as diastereomeric mixtures. while the modelling was performed on a single structure that was consistent with the more active diastereomer, 67, of the separate mixture 65–68.

73

Page 83: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

Fig. 3. Calculated enzyme-inhibitor interaction energy (Einter) versus experimental enzyme inhibition (pKi)f o r the DuPont-Merck inhibitors.

Fig. 4. Calculated enzyme-inhibitor interaction energy (Einter) versus experimental enzyme inhibition (pKi)for the Dupont-Merck inhibitors.

Employing the initial correlation for the Merck series of inhibitors described in Eq. 1,we were able to make predictions of activity for inhibitors prior to synthesis — i.e. truepredictions and not post hoc explanations of activity. A set of compounds, 69-79, for

74

Page 84: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by Energy Minimization

which activities were predicted, is shown in Table 9 with the computed Einter andpredicted and observed pIC50 values. The accuracy of these predictions is illustratedgraphically in Fig. 5. Here, the line is one of unit slope, not a correlation line. Theaverage unsigned error Cor the predicted set of compounds is 0.59 log units across arange of 5.10 log units — i.e. only about a factor of 4. In many cases. this is cornparableto the error inherent in the measurement of a biological activity.

It must be emphasized that we made many more predictions than are shown inTable 9; these are only examples. In addition, when our prediction for a compound was unfavorable, it was frequently not synthesized; or when a prediction was favorable, the exact compound that was modelled was not synthesized, but rather an analog.

As can be seen in Table 9, this approach was able to distinguish small differences in structure/activity between two diastereomers, 71 and 72, as well as gross differences in structure/activily between the inverted pair or inhibitors 70 (Ro-31-8959) and 73, whichoccupy the same binding sites but in the opposite direction. The 6-membered lactamring in 69 was correctly predicted to fit poorly in the active site, although the analogous5-merubered lactam was a potent inhibitor (IC50 = 37 nM) [40].

However, the most significant prediction, for 76, involves an interesting feature of HIV- 1 protease–inhibitor complexes. Due to the symmetrical nature of the enzyme,some inhibitors, such as Ro-31-8959, 70 [41], are observed to bind in the active site intwo directions in the X-ray structure with respect to the flaps, that in their closed H-bonded form introduce the asymmetry that is the direction marker. This bidirectional binding is illustrated in Fig. 6. The hypothesis that 1 and 70 might bind in the active sitein opposite directions, as illustrated in Fig. 7, could explain the preference for inverse stereochemistry at the transition state hydroxyl. Based on this hypothesis. the hybrid inhibitor 76, which contained the C-terminal halves of both 1 and 70, was designed. Itwas predicted (IC50 = 2.8 nM), and subsequently observed (IC50 = 7.6 nM), to be apotent HIV- 1 protease inhibitor.

Fig. 5. Plot of predicted pIC 50 versus observed PIC50 values for the inhibitor whose activity was predicted apriori (prior to assay). The line is one ofunit slope, i.e. predicted plC50 = observed pIC50.

75

Page 85: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

Fig. 6. An illustration of the symmetrical nature of the HIV-I protease. The X-ray structures [41] of both o-rientations of Ro-31-8959, 70, in cyan and yellow, are shown in the enzyme active site, represented by a greenα-carbon ribbon. The β-hairpin structures which loop across the front of the image are the flaps. The ordered water molecule which sits between the flaps and the inhibitor is displayed in the foreground us a ball-and-stick figure; the catalytic aspartic acids are displayed in the background as stick figures.

It was the first in a novel series of inhibitors that led to CRIXIVAN®, 80, an HIV-1protease inhibitor which was approved for the treatment of AIDS in March 1996. A

Fig. 7. A comparison of the model of 1 in light gray oriented in an N → C fashion and the X-ray structure(reference [41]) of 70 in dark gray oriented in a C → N fashion in the HIV-I protease active site.

76

Page 86: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by Energy Minimization

Observed Ki, pK i and calculated enzyme-inhibitor interaction energies for Dupont-Merck cyclicTable 5 urea HIV-1 protease inhibitors (reference [38]).

Table 6 coumarin HIV-1 protease inhibitors (reference [39]).

Observed Ki, pKi and calculated enzyme-inhibitor interaction energies for Upjohn 4-hydroxy-

77

Page 87: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

Table 7pyrone HIV-1 prortease inhibitors (reference [39]).

Observed Ki, pKi and calculated enzyme-inhibitor interaction energies for Upjohn 4-hydroxy-2-

78

Page 88: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by Energy Minimization

Observed Ki, pKi and calculated enzyme-inhibitor interaction energies for Upjohn 4-hydroxy-2-Table 8pyrone HIV-1 protease inhibitors (reference [39]).

comparison of the modelled structure of 76 and the X-ray structure [42] of 80 isdepicted in Fig. 8.

Fig. 8. A comparison of the model of 76 in dark gray and the X-ray structure (reference [42]) of 80(CRIXIVAN®) in light gray as bound in the HIV- I protease active site

79

Page 89: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

Table 9predicted set of HIV-1 protease inhibitors.

Calculated enzyme-inhibitor interaction energies and predicted and observed pIC50 values for the

80

Page 90: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by Energy minimization

Table 9 (Continued)

3. Conclusion

Direct computation of a ligand’s affinity for its receptor — e.g. via free-energy per- turbation calculations — can be costly and time-consuming. However, simple rapid com-putational alternatives can be eniployed to predict the affinity of a ligand for its receptor. These approaches include energy minimization of the ligand in a modelled or crystallo- graphically determined receptor site. either alone or as one component of an energy equation or 3D QSAR. They can allow the prediction of activity a priori (i.e. prior to synthesis and assay). thus permitting pre-screening of ideas and prioritization of ligands for chemical synthesis. Due to their speed, they are amenable to use with large numbers

81

Page 91: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

and types of compounds for a given receptor target and a given assay system. Theobvious shortcomings of most energy minimization methods are neglect of solvation/de-solvation, receptor flexibility, ligand flexibility and entropic effects. In general, theyappear to be most successful when employed for a set of ligands that occupy similar sites on a receptor whose structure does not vary greatly from one ligand/receptor complex to another. While this sounds restrictive, it is often the case in the optimization of a ligand structure, either via single compound synthesis or the design of a modular or combinator-ial library. We have demonstrated the utility of such an approach in the rational design ofHIV-1 protease inhibitors, which facilitated the design of CRIXIVAN®, an HIV-1 pro-tease inhibitor indicated for the treatment of AIDS.

References

1.

2.

Kollman, P., Free energy calculation: Applications to chemical and biochemical phenomena, Chem.Rev., 93 (1993) 2395–2417.Oprea, T.I., Waller, C.L. and Marshall, G.R., 3-dimensional quantitavie structure-activity relationship ofhuman- immunodeficiency-virus-(l) protease inhibitors: 2. Predictive power using limited exploration ofalternate binding modes,. J. Med. Chem., 37 (1994) 2206–2215.Waller, C.L., Oprea, T.I., Giolitti, A. and Marshall, G.R., 3- dimensional QSAR of human-immunodeficiency-virus-(I) protease inhibitors: 1. A COMFA study employing experimentally- determined alignment rules, J. Med. Chem., 36 (1993) 4152–4160.Doweyko, A.M., Three-dimensional pharmacophores from binding data, J. Med. Chem., 37 (1994)

3.

4.1769-1778.

82

Page 92: 3D QSAR in Drug Design. Vol2

A Priori Prediction of Ligand Affinity by Energy Minimization

5. Meng, E.C., Kuntz, I.D., Abraham. D.J. and Kellogg, G.E., Evaluating docked complexes with the HINTexponential function and empirical atomic hydrophobicities, J. Comput.-Aided Mol. Design. 8 (1994)299–306.Nauchitel, V., Villaverde, M.C. and Sussman, F., Solvent accessibility as a predictive tool for the free- energy inhibitor binding to the HIV-1 protease, Protein Science. 4 (1995) 1356–1364. Wang, H. and Ben-Naim. A., A possible involvement of solvent-induced interactions in drug design, J. Med. Chem., 39 (1996) 1531–1539.Wallqvist, A., Jernigan, R.L. and Covell, D.G., A preference-based free-energy parameterization of enzyme-inhibitor binding: Applications to HIV 1 protease inhibitor design, Protein Science. 4 (1995)1881–1903.Wallqvist, A. and Covell, D.G., Docking enzyme-inhibitor complexes using a preference-based free-energy surface, Proteins: Struct., Funct. Gene.. 25 (1996) 403–419.Verkhivker, G., Appelt, K., Freer. S.T., and Villafranca. J.E., Empirical free energy calculations ofligand-pretein crystallographic complexes: I. Knowledge-based ligand-protein interaction potentials

(1995)677–691.Verkhivker, G.M. and Rejto, P.A., A mean field model of ligand- protein interaction Implication for the structural assessment of human immunodeficiency virus type I protease complexes and receptor-specific binding, Proc. Natl. Acad. Sci. USA. 93 (1996) 60–64.Meng, E.C. Shoichet, B.K. and Kuntz, I.D., Automated docking with grid-based energy evaluation,J. comput. Chem., 13 (1992) 505.Verlinde, C.L.M.J., Rudenko, G., and Wim, G.J.H., In search of new lead compounds for trypano-somiasis drug design: a protein structure-based linked-fragment approach, J., comput.-Aided Mol.Design, 6 (1992) 131–147. Rotstein, S.H. and Murcko, M.A., Groupbuild: A fragment-based method for de novo drug designJ. Med. Chem., 36 (1993) 1700–1710.

development of a simple empirical scoring function to estimate the binding constantfor a protein-ligand complex of known three-dimensional structure, J. Compul.-Aided Mol. Design, 8(1994) 243–256.Bohacek, R.S.: McMartin, C., De novo designed of highly diverse structures complementary to enzyme binding sites: Application to thermolysin, In Rey nolds, C.H., Holloway, M.K. and Cox. H.K., (Eds.) Computer-aided molecular design: Applications in agrochemicals, materials and pharmaceuticals, ACSSymposium series 589. American Chemical Society, Washington, DC, 1995. pp. 82–97.Head, R.D., Smythe, M.L., Oprea, T.I., Waller, C.L., Green. S.M. and Marshall, C.R., VALIDATE: Anew method for the receptor- based prediction of binding affinities of novel ligands, J. Am. Chem. Soc.,118 (1996) 3959–3969.Ortiz, A.R., Pisabarro, M.T., Gago. F. and Wade, R.C., Prediction of drug binding affinities by com-parative binding energy analysis, J. Med. Chem., 38 (1995) 2681–2691.Vajda, S., Weng, Z., Rosenfeld, R. and DeLisi, C., Effect of conformational flexibility and solvation onreceptor-ligand binding free energies, Biochemistry, 33 (1994) 13977–13988. King. B.L., Vajda, S. and Delisi. C., Empirical-free-energy as a target function in docking and design:Application to HIV-1 protease inhibitors, FEBS Lett., 383 (1996) 87–91. Taylor, N.R. and von Itzstein, M., Molecular modeling studies on ligand-binding to sialidase frominfluenza virus and the mechanism of catalysis, J. Med. Chem., 37 (1994) 616–624.De Winter. H.L., and von Itzstein, M., Aldose ruductase cis a target for drug design-Molecular modelingcalculations on the binding of acyclic sugar substrates to the enzyme, Biochemistry, 34 (1995) 8299–8308.Grootenhuis. P.D.J. and van Galen, P.J.M., Correlation of binding affinities with nonbonding inetraction energies of thrombin-inhibitor complexes, Acta Cryst., D51 (1995) 560–566.Kurinov, I.V. and Harrison. R.W., Prediction of New Serine Proteinase Inhibitors, Structural Biology, 1(1994) 735–743.Sansom, C.E., Wu, J. and Weber, I.T., Molecular mechanics analysis of inhibitor binding to HIV-1protease, Protein Eng., 5 (1992) 659–667.

6.

7.

8.

9.

10.

applied to the prediction of human immunodeficiency virus 1 protease binding affinity, Protein Eng., 8

11.

12.

13.

14.

15. Böhm, H.-J., The

16.

17.

18.

19.

20.

2 1.

22.

23.

24.

25.

83

Page 93: 3D QSAR in Drug Design. Vol2

M. Katharine Holloway

26.

27.

28.

29.

Weber, I.T., Harrison. R.W., Molecular mechanism calculations on HIV-1 protease with peptide- substrates correlate with experimental data, Protein Eng., 9 ( 1996) 679–690. Miertus, S., Furlan, M., Tossi, A. and Romeo. D., Design of new inhibitors of HIV-1 aspartic pretease,Chem. Phys., 204 (1996) 173–180.Tossi, A., Furlan, M., Antcheva, N., Romeo. D. and Miertus, S., Efficient inhibition of HIV-1 asparticprotease by sunthetic, computer designed peptide mimetics, Minerva Biotec., 8 (1996) 165–171.Viswanadhan, V.N., Reddy, M.K., Wlodawer, A., Varney, M.D. and Weinstein. J.N., An approach torapid estimation of relative binding affinities of enzyme inhibitors: Application to peptidomimeticinhibitiors of the human immunodeficiency virus type I protease, J. Med. Chem., 39 (1996) 705–712. Holloway, M.K., Wai, J.M., Halgren, T.A., Fitzgerald, P.M.D., Vacca. J.P., Dorsey, B.D., Levin. R. B..Thompson, W.J., Chen, L.J., Desolms, S.J , Gaffin, N., Ghosh, A.K., Giuliani, E.A , Graham. S.L., Guare, J.P., Hungate, R.W., Lyle, T.A., Sanders, W.M., Tucker. T.J., Wiggins, M., Wiscount. C.M.,Woltersdorf, O.W., Young. S.D., Darke, P L., and Zugay. J.A., A priori prediction of activity for HIV-1protease inhibitors employing energy minimization in the active site, J. Med. Chem., 38 (1995) 305–317.Holloway, M.K. and Wai, J.M., Structure-based design of human immunodeficiency virus-1 proteaseinhibitor: Correlating calculated energy with activity, In Reynolds. C.H., Holloway, M.K., and Cox,H.K. (Eds. ) Computer-aided molecular design: Applications in agrochemicals, materials, and pharma-ceuticals, ACS Symposium series 589. American Chemical Society, Washington. DC. 1995, pp. 36–50.Thompson, W.J., Fitzgerald. P.M.D., Holloway, M.K., Emini, EA., Darke, P.L., McKeever. B.M., Schleif, W.A., Quintero, J.C., Zugay. J.A., Tucker. T.J., Schwering, J.E., Homnick C.F., Nunberg, J.,Springer, J.P. and Huff. J.R., Synthesis and antiviral activity of a series of HIV-1 protease inhibitorswith functionality tethered to the PI or PI, phensl sustituents: X-ray crystal structure assisted design,J . Med. Chem., 35 (1992) 1685–1701.Hofmann, T., Hodges, R.S. and James, M.N.G., Effect of pH on the activities of Penicillopepsin and Rhizopus pepsin and a proposal for the productive substrate binding mode in Penicillopepsin,Biochemistry, 23 (1984) 635–643.Hyland, L.J., Tomaszek., T A., Jr. and Mcek. T.D., Human immunodeficiency virus-1 protease: 2. Use of pH rate studies and solvent Kinetic isotope effects to elucidate details of chemical mechanismBiochemistry. 30 (1991) 8454–8463.CHARMm version 21.1.7b.; available from Molecular Simulations, Inc., Burlington, MA. U.S.A. Available from W. Clark Still. Department of Chemistry. Columbia University, New York, U.S.A..Allinger, N.L., Conformational analysis 130. MM2: A hydrocarbon force field utilizing V1 and V2torsional terms, J. Am. Chem. Soc., 99 (1977) 8127. Lam. P.Y.S., Rational design of potent, bioavailable, nonpeptide cyclic ureas as HIV proteaseinhibitors, Science. 263 (1994) 380–384. Thaisrivongs, S., Random and Rational: Lead Generation via Rational Drug Design and CombinatorialChemistry, New York, 19–20 October 1994.Vacca. J.P., Fitzgerald. P.M.D., Holloway, M.K., Hungate, R.W., Starbuck. K.E., Chen, L.J., Darke,P.L., Anderson. P.S., and Huff, J.R., Conformationally constrained HIV-1 protease inhibitors, Bioorg.Med. Chem. Lett.. 4 (1994) 499–504.Ghosh, A.K., Thompson, W.J., Fitzgerald, P.M.D., Culberson, J.C., Axel. M.G., McKee, S.P., Huff, J.R. and Anderson. P.S., Structure based design of HI\/-1 protease inhibitorw: Replacement of two amides and a 10π-aromatic system by a fused bis-tetrahydrofuran, J. Med. Chem., 37 (1994) 2506–2508.Chen, Z., Li, Y., Chen, E.. Hall. D.L., Darke, P.L, Culberson, J.C., Shafer, J. and Kuo, L.C., Crystalstructure at 1.9-A resolution of human immunodeficiency virus (HIV) II protease complexed withL-735, 524; An orally bioavailable inhibitor of the HIV protease, J. Biol. Chem., 269 (1994) 26344–26348.

30.

31.

32.

33.

34.

35.36.37.

38.

39.

40.

41.

42.

84

Page 94: 3D QSAR in Drug Design. Vol2

Rapid Estimation of Relative Binding Affinities of EnzymeInhibitors

M. Rami Reddyª, Velarkad N. Viswanadhanb and M.D. ErionªªMetabasis Therapeutics, Inc., 9390 Towne Centre Drive, San Diego, CA 92 12 I, U.S.A.

bAmgen, Inc., 1840 Deltavilland Drive, Mail Stop 14-2-8, Thousand Oaks, CA 91360, U.S.A.

1. Introduction

Computer-assisted molecular model ling (CAMM) comprises a variety of computationalmethodologies intended to quantitatively or qualitatively describe molecular properties. In some cases, the field has advanced to a state where accurate predictions are possible(e.g. geometric and electronic properties of small molecules). In other cases, however, the properties are complex and require either advances in theory or substantial increases in computational power (e.g. protein folding). One application of CAMM that hasreceived considerable attention over the past two decades entails its use as an aid indrug-design. Ideally, CAMM would provide rapid and accurate prediction of drug-target binding affinities such that a large and structurally diverse population of potentialtargets could be evaluated and thereby prioritized prior to chemical synthesis. In reality,methodologies have been advanced that either provide qualitative rank ordering of alarge number of molecules in a relatively short period of time or. at the other extreme, generate quantitatively accurate predictions of relative binding affinities for structurallyrelated molecules using substantial computing power. Consequently, techniques thatincrease speed without greatly compromising accuracy (or vice versa) are of value todrug-discovery programs.

Advances in protein crystallography and molecular simulations have greatly aided computer-assisted drug-design paradigms and the accuracy of their binding affinity pre-

ligand in the binding site to calculation of relative binding affinities using molecular dy-namics simulations in conjunction with the TCP approach [3,4]. Figure 1 shows aflowchart employed by drug-discovery groups for structure-based drug-design.Typically, the process begins by generating a working computational model from crys-tallographic data which includes the development of molecular mechanics parameters for non-standard residues, building any missing segments, assigning the protonation states of histidines, and orientation of carbonyl and amide groups of Asn and Gln aminoacid residues based upon neighboring donor/acceptor groups. Inhibitor design is thenaided by a variety of visualization tools. For example. hydrophobic and hydrophilicregions of the active site are readily identified by calculating the electrostatic potentialat different surface grid points. Frequently, the information gained on the characteriza-tion of the active site is supplemented by analyses of ligand conformational energies.

These studies are often followed by an estimation of binding affinity which is per-formed at three different levels or complexity (Fig. 1) depending on computationalpower, time and resources. namely: (i) qualitative predictions based on docking/ visual-ization and molecular mechanics calculations; (ii) quantitative predictions based on

H. Kubinyi et al, (eds. ), 3D QSAR in Drug Design, Volume 2. 85–98.© 1998 Kluwer Academic Publishers. Printed in Great Britian.

dictions [12]. Methods of inhibitor design range from graphical visualization of the

Page 95: 3D QSAR in Drug Design. Vol2

M. Rami Reddy, Velarkad N. Viswanadhan and M.D. Erion

Fig. 1. Flowchart for the structure-based drug-design Paradigm.

86

Page 96: 3D QSAR in Drug Design. Vol2

Rapid Estimation of Relative Binding Affinities of Enzyme Inhibitors

TCP calculations: and (iii) semi-quantitative predictions based on regression methodsthat incorporate interaction variables (intra- and intermolecular interaction energies) andligand properties (desolvation, log P, etc.). Based on the predicted binding affinities,top-scoring compounds are synthesized and tested for activity. In this review, applica-tions of the TCP methods for predicting the binding affinity quantitatively are first dis-cussed, followed by a review of methods used to predict ligand binding affinityqualitatively. These methods are then compared with more recent structure-based 3DQSAR approaches.

2. Free Energy Perturbation Methods

Thermodynamic cycle perturbation (TCP) has been reported to predict accurately therelative binding affinities between two related inhibitors. Accurate prediction of ligandbinding affinity will be dependent on the accuracy of each factor that effects binding.One important factor often neglected in most approximate methods used to determine relative binding affinities is desolvation. The importance of desolvation to bindingaffinity is clearly evident in calculations using the TCP method. For example, TCP cal-culations on transition state inhibitors bound to thermolysin carried out by Merz and Kollman [5] predicted that the replacement of an -NH group with a methylene groupwould not be detrimental to binding affinity, despite the loss of a good hydrogen bondbetween the NH and a backbone amide carbonyl. The principal reason shown clearly bythe calculation was due to differences in inhibitor desolvation. This prediction, whichwas made ahead of biochemical measurements, was later confirmed experimentally [5].

More recently, several research groups used the TCP method and calculated relativebinding differences of HIV-1 protease inhibitors. Reddy et al. [6] evaluated the relativebinding affinities of JG365 and its analog (JC365A) lacking the penultimate valineresidue. The calculated difference in the relative binding free energy (3.3±1 . 1 kcal/mol) is in good agreement with the experimental value of 3.8 ± 1.3 kcal/mol.This calculation showed that the loss of main-chain hydrogen bonds of the valine and itsside-chain hydrophobic interactions with protein residues leads to considerable loss ofpotency for the JG365A. The observed binding preference for JG365 was explained bythe stronger ligand-protein interactions, which dominate over an opposing contributionarising from the larger desolvation of JG365 relative to the truncated analog (JG365A).Ferguson et al. [7] computed the relative binding affinities of the S and R enantiomersof JG365 and reported good agreement with experiment. Tropshaw and Hermans [8]employed the TCP slow-growth method to estimate the relative binding affinities of theS and R enantiomers of JG365 and showed good agreement with experiment.Calculations by Rao et al. [9] for other HIV-1 protease inhibitors showed good agree-ment with experiment. Reddy et al. [10]evaluated a large set of related analogs usingcomputer-assisted drug-design methods that combine molecular mechanics, dynamics,TCP calculations, inhibitor design. synthesis, biochemical testing and crystallographicstructure determination of the protein-inhibitor complexes. The calculated relativebinding free energies were successfully incorporated into the design of novel HIV-1protease inhibitors. This study involved a large set of molecules whose relative binding

87

Page 97: 3D QSAR in Drug Design. Vol2

M. Rami Reddy, Velarkad N. Viswanadhan and M.D. Erion

affinities were predicted using TCP method prior to synthesis, and were later confirmedby experimental measurements.

Though the TCP method is theoretically more accurate, it suffers from some practicallimitations. Firstly, it requires substantial computational resources for the developmentof high-quality force-field parameters, as well as for the molecular dynamics/MonteCarlo calculations (a typical mutation takes about 1 week of CPU time on IBMRS6000/590 workstation). Secondly, the technique is limited to analogs that differ byrelatively small structural changes, because it is often difficult to obtain well-convergedresults for larger mutations. Thirdly, exploring the available conformation space withina reasonable amount of time is often difficult for a molecule with many rotatable bonds(e.g. peptidomimetic inhibitors of HIV- 1 protease). Consequently, the TCP method hasnot been widely used in drug-design. It should be noted, however, that with increasedcomputational resources and improved force-field parameters, this method offersvirtually unlimited potential.

3. Estimation of Binding Affinities

While the TCP method is useful in the prediction of relative binding affinities of struc-turally similar inhibitors, real-life drug-design problems involve the calculation of rela-tive binding affinities for inhibitors with a greater degree of structural dissimilarity.Hence, faster methods that can accommodate structural diversity are being developed.In some cases. predictions of inhibitor binding has been based soley on a visual analysisof structures without any calculations of binding energies. These methods [11] relied ongraphical analysis of features such as steric, hydrophobic and electronic complementar-ity of the docked inhibitor to the target protein, the extent of buried hydrophobic surface and the number of rotatable bonds in the inhibitor. Quantitative descriptors, based onmolecular shape [12] and grid-based energetics [13], have also proved to be useful.More advanced methods have used empirical scoring functions [14], derived fromcrystal structure data rind experimental binding affinities. Though molecular mechanicsmethods would he expected to enhance the accuracy of these studies, results from avariety of studies have shown only modest success [15], which has been attributed tothe large approximations involved i n the analysis (e.g. solvent model used, lack ofentropic terms, etc.).

Recently, some improvements have been realized through inclusion of an analysis ofthe binding conformations of new analogs by Monte Carlo (MC)/energy minimization(EM) methodology, and qualitative prediction of relative binding affinities using molec-ular mechanics calculations to evaluate interaction energies and ligand strain. Forexample, purine nucleoside phosphorylase (PNP) inhibitors [16–18] were designed and evaluated prior to synthesis using MC/EM techniques to derive the binding con-formations and interaction energies that were used in predictions of relative bindingaffinities. The energy differences between inhibitors were thought to reflect relativebinding affinities, since the structures were not highly dissimilar and contained fewrotatable bonds, both of which suggested that differences in solvation and entropywould contribute minimally to binding affinity. Although the binding conformations

88

Page 98: 3D QSAR in Drug Design. Vol2

89

Fig.

2.

Stru

ctur

esof

the

HIV

-1 p

rote

ase

inhi

bito

rs c

onsi

dere

d in

this

wor

k.

Page 99: 3D QSAR in Drug Design. Vol2

M. Rami Reddy, Velarkad N. Viswanadhun and M.D. El-ion

were accurately predicted in this study, interaction energies for some inhibitors proved to be less informative (data unpublished), presumably because of unaccounted factors such as desolvation and entropy. Another example using these techniques [19] was reported for the design of HIV-1 protease inhibitors. In this case, the binding modes of the putative/proposed inhibitors were obtained by carefully aligning them with the known crystal structures of inhibitors in the active site of the enzyme. These inhibitors, which are shown in Fig. 2, were then evaluated by performing minimization cal- culations both in solvent and in complex using the AMBER [20] force field. ∆E scoresare shown in Table 1 and were computed from two sets of minimizations. ∆Eintra thedifference in intramolecular interaction energies for the energy-minimized inhibitor in the bound and solution conformations. ∆Einter is the corresponding difference in inter-molecular interaction energies. Since these energies are associated with significant uncertainties, the results were not expected to match the quality of results from TCP simulations, hence only a qualitative agreement in the overall trends with experimental results was expected using this method. Nevertheless. analysis of the energy differences offered a rationale for the preferential binding of JG365 and AG1007 to the HIV-1 pro-tease relative to their respective analogs. The interaction energies of JG365 andAG1007 in the protein complex are greater than the corresponding interaction energiesin the solvent state. and is presumably the reason these inhibitors have high affinity despite their larger desolvation energies (relative to their analogs, JG365A and AG1006). However, as shown in Table 1, these energy differences do not agree quantitatively with experimental binding free energies.

Table1 Differences in the calc ulated interaction energies of HIV- 1 protease inhibitors (in kcal/mol) in thecomplex and solvated states. ∆Eintra is the difference in intramolecular interaction energy between the complexed and solvated states of an inhibitor; and ∆Einter is the corresponding difference in intermoecular interaction energy The sum of the two numbers (∆Eτοτ) represents the total interaction energy of an inhibitor in the complex state relative to the solvated state. Score for the hydrophobic interaction (P) is also shown for each ligand complex (calculated using equation 10)

System ∆Eintra inter ∆Etot P

JG36S 7.6 -74.5 -70.9 -2.63JG365A 8.2 -59.9 -51.7 -2.41

AG1007 seriesI (AG 1006) 2.7 -63.0 -60.3 –1 .40II (AG1007) 2.4 –65.0 -62.6 -1.53III 4.9 –62.6 -57.7 -1.50IV 3.9 –67.3 –63.4 -1.78V 4.4 –65.7 -61.3 -1.86VI 7.4 -67.0 –59.6 -1.43VII 4.8 -66.6 -61.8 -1.38VIII 5.9 -62.9 -57.0 -1.78IX 7.2 -66.9 -59.7 -1.95

90

∆E

Page 100: 3D QSAR in Drug Design. Vol2

4. Regression Methods

4.1. Introduction

The methods discussed above either provide an accurate prediction of relative bindingaffinities (TCP), but with some significant constraints, or provide only qualitative trendsfor relative binding affinities across a more structurally diverse set of compounds.Ideally, methods that combine both of those features would greatly enhance the utilityof computational methods to drug-design. Increased structural diversity, however, re-quires accurate calculation of additional factors that significantly impact the coin-pounds’ binding affinity. For example, the larger the difference in structure, the greaterthe chance that solvation, hydrophobic effects, conformational flexibility. etc. willinfluence relative binding affinities. Understanding the magnitude of each contributionis key to an accurate prediction. Since an equation that accurately incorporates eachfactor has not been derived, we cannot expect to calculate absolute binding free ener-gies. One strategy for increasing the accuracy of traditional 3D QSAR approaches[2-26] would be to include crystal structure information from protein complexes [27,28]. For example, i n the study by Marshall and co-workers [26], the use of crystalstructure data in the determination of alignment rules and field-fit minimization isshown to enhance CoMFA [21] predictions. However, even these methods do not takefull advantage of the crystallographic information available, since they do not includescoring functions that incorporate interaction variables.

Recently, the results of molecular mechanics calculations on protein complexes havebeen used in regression-based approaches for prediction of relative binding affinities inconjunction with other molecular properties. Unlike the 3D QSAR extensions mentionedearlier, where crystallographic information is used mainly for obtaining alignments for amolecular similarity analysis, these approaches use the information to aid in energetic cal-culations. Holloway and co-workers [29] predicted the binding affinities of a number ofHIV- 1 protease inhibitors using the Merck force field [30] for calculations of intermolecu-lar interaction energies. This study, however, did not include other relevant molecularproperties (such as desolvation and entropy contributions) that are likely to be importantto binding affinity. Head et al. [31] developed a regression-based method (VALIDATE)for the receptor-based prediction of binding affinities of novel ligands. In this method,energy variables calculated by molecular mechanics using the generalized Born/ surfacearea (GB/SA) implicit water model are used as enthalpy of binding and properties such ascomplementary hydrophobic surface area are used to estimate the entropy of bindingthrough heuristics. In this study, 51 diverse crystalline complex structures were used to as-semble the training set and the predictive models employed both conventional regressionand artificial neural networks. This study shows that a regression-based approach canqualitatively assess a diverse set of compounds and targets in a single model.

4.2.

In an effort to develop a regression-based method for semi-quantitative prediction ofrelative binding affinities, we have examined a set of HIV-1 protease inhibitors (Fig. 2)

91

Computational Models and Energy Minimization Methods

Rapid Estimation of Relative Binding Affinities of Enzyme inhibitors

Page 101: 3D QSAR in Drug Design. Vol2

M. Rami Reddy, Velarkad N. Viswanadhan and M.D. Erion

[19]. In our approach [19], the energy variables (intra and inter) calculated by perform-ing molecular mechanics calculations both in complex and solvated states using anexplicit water model, the strength of hydrophobic interaction and number of buriedrotatable bonds of inhibitor are used to derive a regrescion equation for predictingrelative binding free energy differences.

In the method, the protein, inhibitors (Fig. 2) and solvent waters were modelled usingthe AMBER [20] all-atom force field. The computational model of the JG365A(Fig. 2a) complex with HIV-1 protease was derived from the X-ray structure of theJG365 complex. Similarly, crystal structures of HIV-1protease complexes of AG 1006and AG I007 (Fig. 2b) were used to obtain the computational models of HIV- 1 proteasecomplexes with other inhibitors in the series. The hydrogens of the crystallographic water, the inhibitor and the protein dimer were added using the EDIT module ofAMBER. Hydroxyl hydrogens were oriented to obtain the best possible geometry forhydrogen bonding. Electrostatic charges and parameters for the standard residues of theinhibitor model were taken from the AMBER database. For non-standard residues of allthe inhibitors. electrostatic charges were fitted with CHELP [32] from ab initio [33](single-point) HF/6-3 1G* wave functions, using structures optimized at HF/3-2 1G*level. One of the aspartic acids in the catalytic dyad (Asp 124) was protonated in all cal-culations. All equilibrium bond lengths. bond angles and dihedral angles for non-standard residues were calculated from ab initio (HF/3-2 IG *) optimized structures.Missing force-field parameters were estimated from similar chemical species in theAMBER database. For the protein complex calculations, the solute was immersed in alarge spherical water bath (of radius 25 Å) constructed from repeated cubes of extendedsimple-point charge (SPC/E) [34] water molecules which represented a snapshot froman MD simulation of liquid water [35]. The SPC/E rigid geometry model potential wasused to model explicitly the solvent water. For the solvent calculations, the solute wassolvated with SPC/E water i n a rectangular box whose dimensions allowed a 10.0 Ålayer of water to surround the solute atoms. Water molecules located less than 2.5 Åfrom the solute atom were removed.

Molecular mechanics calculations (energy minimizations) on all the structures were performed using the BORN module of the AMBER program. A four-stage protocol wasset up for energy minimizations of the protein-inhibitor complexes, as well as solvatedinhibitors. All the technical details are described in our earlier paper [19].

The minimized structures, both in the complexed and in the solvated states, were usedfor calculating the internal strain on the ligand upon binding (∆Ebind(intra)) and relativeinteraction energy of the ligand in the complex versus solvent (∆Ebind(inter)), as given by

(1)

where Ecom(intra) refers to the intramolecular interaction energy of the ligand in thebound state, and Esol(intra) refers to the intramolecular interaction energy of the ligandin aqueous medium.

Similarly, ∆Ebind(inter) is calculated. using

( 2 )

92

∆Ebind(intra) = Ecom(intra) – Esol(intra)

∆Ebind(intra) = Ecom(intra) – Esol(inter)

Page 102: 3D QSAR in Drug Design. Vol2

Rapid Estimation of Relative Binding Affinities of Enzyme Inhibitors

where Ecom(inter) refers to the intermolecular interaction energy of the ligand in thebound state. and Esol(inter) refers to the intermolecular interaction energy of the ligandin the aqueous medium. Electrostatic and van der Waals interactions may be weighteddifferently by separating ∆Ebind (inter) into electrostatic and van der Waals components.Scores for intra- and intermolecular components of relative differences in interactionenergies for a pair of ligands L1 and L2 are given by

(3)

(4)

∆∆Ebind (tot: L1 L2) from Eq. 5 is used to score the total relative difference in thebinding energies of Ll and L2 to the protein

This energy difference is used to calculate the relative binding free energy difference (∆∆Gbind) between inhibitors L1 and L2. Earlier TCP calculations [6,10] on these coin-pounds offered explanations for the relative binding affinities of several pairs of inhibitors, as shown in Table 2. Scores for relative intra- and intermolecular interaction energy differ- ences (Eqs. 3–5) are listed in Table 2, along with corresponding TCP-calculated results. In order to understand the effect of different N-terminal groups on the final minimization results, energy calculations were performed using both asparaginequinoline and Ala-Ala as N-terminal groups for the compounds 2, 4, 5. 7 and 9. No significant energy differ- ences were found between these two sets of calculations. Therefore, results presented inTables 1 and 2 are consistent with our earlier published work [ 19]. These scores were used in developing the regression equations and calculating correlations.

Table 2(expt.)) and by TCP simulations (∆∆Gbind (TCP)). are compared with the scores of relative enthalpic differ- ences ∆∆Ebind (calc. ), and changes in hydrophobic interactions strength (∆Pbind (calc.)) for pairs analogous inhibitors of the HIV-1 protease

Relative differences in the free energy of binding (kcal/mol) as observed experimentally (∆∆Gbind

‘Mutation’ ∆∆Gbind (expt.) ∆∆Gbind (TCP) ∆∆Ebind (calc.) ∆Pbind (calc.)

JG365 JG36SA 3.80 3.3 19.2 0.22

AG1007 series II I 1.30 1.9 2.3 0.13II III 1.95 1 .3 4.9 0.03II* IV* –0.16 0.2 –0.8 –0.25II* V* –0.06 0.4 1.3 –0.33II VI – 1.1 3.0 0. 10II VII – 0.8 0.8 0.15II* VIII* 2.03 – 5.6 –0.25II* IX* 0.86 – 2.9 -0.42

*Experimental values for these molecules are based on a different N-terminal group. an asparagine–quinoline moiety replacing Ala–Ala in the compounds II, IV, V. VIII and IX.

93

Page 103: 3D QSAR in Drug Design. Vol2

M. Rami Reddy, Velarkad N. Viswanadhan and M.D. Erion

4.3. Results and discussion

In order to identify the variables that are important for the semi-quantitative predictionof HIV-1 protease inhibitors’ binding affinity, a variety of regression equations wereevaluated. Initially, intermolecular interaction energy of the inhibitor to the protein was used as the only regression variable. For a ‘mutation’ L1 to L2, this variable may be defined as

(6)

A meaningful regression model, however, could not be obtained. presumably becausethe model lacked solvent contributions. By taking the role of solvent (desolvation) intoaccount (Eq. 4), a good regression model was obtained

(7)

Inclusion of ligand strain (Eq. 5; (tot))further improved the regression model

(8)

By treating the scores for intermolecular (Eq. 4) and intramolecular (Eq. 3) interactionenergies as independent variables in a multiple linear regression (MLR) model, slightlybetter statistics were obtained

(9)

It is clear that the present dataset is too small to develop MLR models with more thantwo independent variables, since MLR models need to be cross-validated. For largerdatasets, further enhancement of the MLR model may be possible by separating the in-teraction energy score into electrostatic and van der Waals terms. and by inclusion ofother related variables to represent the hydrophobic interactions. For example,hydrophobic interactions for a given ligand ‘L’ binding to a protein, as applied to thepresent set of HIV-1 protease inhibitors [19],can be represented by

(10)

where. P(int) is the score for hydrophobicity interaction. constants and representatomic hydrophobicity constants assigned to atoms i and j in the ligand and protein,respectively, and is the distance between them. An exponential function is used toaccount for distance dependence of hydrophobic interactions. Earlier studies havedemonstrated the usefulness of the exponential functional form in scoring ligand-protein hydrophobic interactions [18]. The difference in hydrophobic interaction energy

94

Page 104: 3D QSAR in Drug Design. Vol2

Rapid Estimation of Relative Binding Affinities of Enzyme Inhibitors

for a given modification of a ligand, L1 L2, was calculated in these studies using thefollowing expression

(11)

where P(int: L2) and P(int: L1) are the scores for hydrophobic interaction (Eq. 10) forligands L2 and L1, respectively. Using this variable (Eq. 11) alone, the tollowingregression equation was obtained

(12)

The best two-variable model ( in terms of correlation coefficient) is obtained when(tot) (Eq. 5 ) and (Eq. 11) are used as independent variables

(13)

For this model, a leave-one-out cross-validation gave r = 0.84 and RMS= 0.81, indicat-ing a reasonable predictive power for this model. When three independent variables

(inter) (Eq. 4), (intra) (Eq. 3) and (Eq. 11)) are used, an almostperfect correlation ( r = 0.99; RMS = 0.38) is obtained, though with 7 data points thedataset is too small for a three-variable model. However, it is worth noting that a leave-one-out cross-validation of the three-variable model gave better predictive statistics( r=0.93; RMS= 0.56).

The above results show that interaction energy scores based on present molecularmechanics calculations are quite useful in developing regression equations lor predict-ing the relative binding affinity, semi-quantitatively. of inhibitors to the HTV- 1 proteaseenzyme. The predicted results correlated well with experimental measurements whensolvation effects were included. Adding hydrophobicity variable, produced slightimprovement in r ( r = 0.92 for the one-variable model versus 0.94 for the two-variablemodel) for this dataset. These results suggest that regression models offer a rapid wayof semi-quantitatively predicting relative binding affinities of inhibitors within a con-generic series and, therefore, a viable alternative to the TCP method. Recently, a similarprocedure was used to develop a multi-variable regression equation ( r = 0.92 and leave-one-out cross-validation gave r = 0.8 1 ) for 25 inhibitors of Fructose- 1,6-bisphosphatase, and applied successfully for designing and optimizing new inhibitors prior to synthesis[36].

5. Conclusion

The TCP method remains the most accurate method for calculating relative binding af-finity of inhibitors to an enzyme. However, due to its relative complexity and computation-intensive nature, practical applications are restricted to analysis of structurally relatedinhibitors. Accordingly, there is a need for methods that enable rapid assessment of a

95

Page 105: 3D QSAR in Drug Design. Vol2

M.Rami Reddy, Velarkad N. Viswanadhan and M.D, Erion

large number of structurally unrelated molecules in a reasonably accurate manner.Regression models, using energy variables based on molecular mechanics calculationsin both explicit solvent and complex states, the strength of hydrophobic interactionsand the number of rotatable bonds of the inhibitor, were shown to be valuable in therapid estimation of the relative binding free energy differences semi-quantitativelybetween two inhibitors. As shown with HIV-1 protease [19] and later with Fructose-1,6–bisphosphatase inhibitors [36], multivariate models that account for these propertiesare useful as a rapid computational alternative to TCP calculations. These regression-based models will continue to evolve and become more accurate as: (1) force-field para-meters become more refined. (2) other variables important for binding are included.(3) methods for estimating relative binding entropy changes improve. (4) docking and scoring procedures improve and (5) average molecular dynamics simulations are used to obtain energy variables.

References

1. Appelt, K., Bacquet, R.J., Bartlett, C.A. Booth. C.L., Freer. S.T., Fuhry, M.A.M., Gehring, M.R.,Hermann. S.M. Houland, E.F., Janson, C.A., Janes, T.R., Kan. C., Kathardekar, V.,Lewis, K.K., Marzoni, G.P., Matthew. D.A., Molhr, C W.M.E., Morse, C.A., Oatley, S.J., Opden, R.O.,Reddy, M.R., Reich, S.H., Schoettlin, W.S., Smith. W.W., Varney, M.D., Villafranca, J.E., Ward. R.W.,Webber, S., Webber, S.E., Welsh, K.M. and White. J . . Design of enzyme inhibitors using iterative

Montgomery, J.A., Niwas, S., Rose. J.D., Secrist, J.A., Sudhakar Babu. Y., Bugg. C.E., Erion, M.E.,Guida. W.C. and Ealick, S.E. , Structure based design of inhibitors of purine nucleoside phosphorylase.1. 9-(arylmethyl) derivatives of 9-deazaguanine, J . Med. Chem., 36 (1993) 55-69.Beveridge, D.L. and DiCapua. F.M., Free energy via molecular simulation, Ann. Rev. Biophys.Biophys. Chem., 18 (1989) 431–492.McCammon, J.A., Free energy from simulation. Current Opinion in Structural Biology, 1 (1991)196-200.Mecz, K.M.. and Kollman, P.A., Free energy pertubation simulation of the inhibition of thermilysin: prediction ofthe free energy of binding ofa new inhibitor, J. Am Chem. Soc., 111 (1989) 5649–5658.

Ferguson. D.M., Radmer, R.J. and Kollman, P.A., Determination of the relative binding free energies ofpeptide inhibitors to the HIV-1 protease, J Riled. Chem., 34 (1991) 2654-2659.Tropshaw. A.J. and Hermans, J.. Application of free energy simulations to the binding of a transition

Rami Reddy, M., Varney. M.D., Kalish, V., Viswanadhan V.N. and Appelt, K., Calculation of relative

tion approach, J. Med. Chem., 114 ( 1994) 10117–10122.Bohacek, R.S. and McMartin, C., Definition and display of steric, hydrophobic, and hydrogen-bonding

Kurtz, J.D., Meng, E.C. and Shoichet, B.K., Structure-based molecular design. Ace. Chem. Res.,27 (1994) 117-123.

logically important macromolecules, J . Med. Chem., 28 ( 1985) 849-857.

‘ protein crystallographic analysis, J. Med. Chem., 34 (1991) 1925-1934.2.

3 .

4.

5 .

6.

7.

8.

9.

10.

11.

12.

13.

96

state analogue inhibitior to HIV-1 protease, Prot. Eng., 5 (1992) 29–33 Rao, B.G., Tilton, R.F. and Singh, U.C., Free energy perturbation studies on inhibitor binding to HIV-1

perturbation approach. Proc, Natl. Acad. Sci. USA. 88 (1991) 10287–10291.

Rami Reddy, M., Viswanadhan. V.N. and Weinstein, J.N., Relative\ free energy differences in the binding free energies of human immunoeficiency virus 1 protease inhibitors: A thermodynamic cycle

difference in the binding freeenergies of HIV- I protease inhibitors: A thermodynamic cycleper turba-

protease. J. Am. Chem. Soc., 114(1992) 4447–4452

properties of ligand binding sites in proteins using Lee and Richards accessible surface: Validation of a high-resolution graphical tool for drug design, J. Med. Chem., 35 (1992) 1671–1684.

for determining energetically favorble binding sites on bio-Goodford, P.J., A computational procedure

Page 106: 3D QSAR in Drug Design. Vol2

8 (1994) 243-256.Sansom, C.E., Wu, J. and Weber, I.T., Molec ular mechanics analysis of inhibitor binding to HIV-I pro-tease. Protein Eng., 5 (1992) 659-667.Montgomery, J.A., Erion, M.D., Niwas, S., Rose, J.D., Secrist, J.A., Babu, S.Y., Bygg. C.E., Guida.

Erion, M.D., Montgomery, J.A., Niwas, S., Rose, J.D., Ananthan, S., Allen, M., Secrist, J.A., Babu,

side phosphorylase. 3. 9-artmenthyl derivatives of 9-deazaguanine substituted on the methylene-group, J. Med. Chem., 36 (1993) 3771-3783.Secrist, J.A.I., Niwas, S., Rose, J.D., Babu, S.Y., Bugg, C.E., Erion, M.D., Guida, W.C., Ealick. S.E.and Montgomery. J.A., Structure-based design of inhibitors of purine nucleoside phosphorylase.2. 9-alicyclic and 9-heteroalicyclic derivatives of 9-deazaguanine, J. Med. Chem., 36 (1993)1847-1854.

Singh. U.C., Weiner, P.K., Caldwell, J.K. and Kollman, PA, AMBER 3.3. University of California, SanFrancisco. CA,U.S.A., 1986.Cramer. R.D., Pattercon, D.E. and Bunce, J.D., Comparative molecular field analysis (CoMFA).

porter protein: A 3 D-QSAR study, Biochim. Biophys. Acta, 1039 ( I 991 ) 356-366.Debnath. A.K.,Hansch, C., Kim, K.H. and Martin, Y.C., Mechanistic interpretation of the genotoxicityof nitrofurans (antibacterial agents)

Depriest, S.A., Mayer. D., Naylor, C.B. and Marshall. G.R., 3D QSAR of angiotensin-converting enzymeand thermolysin inhibitors: A comparison of CoMFA models based on deduced and experimentallydetermined active site geometries, J. Am. Chem. Soc., 115 (1993) 5372-5384.Kubinyi, H., 3D QSAR in drug design : Theory, methods and applicaitons, ESCOM Science publishersB.V., Leiden, 1993.

benzodiazepine agonists , J . Comput.-Aided Mol. Design, 7 (1993) 83–102.Waller, C., Oprea, T.I., Giolitti, A. and Marshall, G.R., Three-dimensional QSAR of human immunodeficiency virus (1) protease inhibitors: 1. A study emloying experimentally-determined align-

Oprea, T.I., Waller, C.L. and Marshall, G.R., 3D-QSAR of human immunodeficiency virus (I) protease

Holloway, K., Wai, J.M., Halgren, T.A., Fitzgerald, P.M., Vacca, J.P., Dorsey, B.D., Lev in, R.B.,Thompson. W.J., Chen, J.L., deSolms, J.S., Gaffin, N., Ghosh, A.K., Giuliani, E.A., Graham, S.L., Guare, J.P., Hungate, R.W., Lyle. T.A., Sanders. W.M., Tucker T.J., Wiggins, M., Wiscount, C.M.,Wolterdorf, O.W., Young, S.D., Darke, P.L. and Zugay, J.A., A priori prediction of activity for HIV-1

site, J. Med. Chem., 38 (1995) 305–317.

Head. R.D., Smythe, M.L., Oprea, T.I., Waller, C.L., Green. S.M. and Marshall, G.R., VALIDATE: AJ. Am. Chem. Soc.,

1I8 (1996) 3059-3969.

J. Comp. Chem., 8 (1987) 894–905.

D.J., Fox, D.J., Whiteside, R.J., Seeger, R., Melius, C.F., Baker J., Martin, R., Kahn, L.R., Stewart, J.J.,Gaussian: Pittsburgh, PA, U.S.A., 1988.

15.

16.

17.

18.

19.

20.

21.

22.

23.

molecular field analysis, J. Med. Chem., 36 (1993) 1007–1016.24.

25.

26.

27.

28.

29.

30.31.

32.

33.

97

Rapid Estimation of Relative binding Affinities of Enzyme Inhibitors

Boehm, H.-J., The development of a simple empirical scoring function to estimate the binding constant14.for a protein-ligand complex of known three-dimensional structure, J. Cornput.-Aided Mol. Design,

W.C. and Ealick, S.E., Structure-based design of inhibitors of purine nucleoside phosphorylase. 2.9-(arlmenthyl)derivatives of 9-deazguanini, J . Med. Chem., 36 (1 993) 55–69.

S.Y., Bugg, C.E., Guida. W.C. and Ealick, S.E., Structure-based design of i nhibitors of purine nucleo-

rapid estimation of relative binding affinities of enzymes inhibitors: application to peptidomimetic in- hibitors of the human immunodeficiency virus type1protease, J. Med. Chem., 39 (1996) 705-712.

Viswanadhan, V.N., Rami Redy, M. Wlodawer, A,,Varney, M.D. and Weinstein, J.N., An approach to

using quantitative structure-activity relationships and comparative

Martin. Y.C., A fast new approach to pharmacophore mapping and its application to dopaminergic and

inhibitors 111: Interpretation of CoMFA results. Drug Design and Discovery. 12 (1994) 29-51.

Viswanadhan, V.N., Ghose , A.K. and Weinstein.J.N., Mapping the binding site of the nucleoside trans-1. Effect of shape onbinding of steriods to carrier proteins, J. Am. Chem. Soc., 110 (1988) 5959–5967.

protease inhibitors employing energy minimization in the active

new method for the receptor-based prediction of binding affinitied of novel ligands,

Chirlian, L.E. and Franel, M.M., Atomic charges derived from electrostatic potentials: A detailed study,

Fluder, E.M., Topiel, S . and Pople, J.A., GAUSSlAN 88:

Halgren, T.A., MM forcefield, J. Comput. Chem., 17 (1996) 490.

ment rules, J. Med. Chem., 36 (1993) 41 52–41 60.

Frisch, M.J., Head-Gordon, M., Schelegel, H.B., Raghavachari, K., Binkley, J.S., Gonzales. C., Defrees

Page 107: 3D QSAR in Drug Design. Vol2

M. Rami Reddy, Velarkad N . Viswanadhan and M.D. Erion

34.

35.

36.

Berendsen, H.J.C., Grigera, J.R. and Straatsma, T.P., The Missing Term in Effective Pair Potentials,J. Phys. Chem., 91 (1987) 6269-6271.Rami Reddy, M. and Berkowitz, M., Dielectric constant of SPC/E water, Chem. Phys. Lett., 155 (1989)173-176.Rami Reddy, M., Viswanadhan. V.N.and Erion, M.D., Rapid estimation of relative binding affinities of

J. Med. Chem. (to be submitted).

98

enzyme inhibitors: Application to inhibitors of Fructose-1,6-bisphosphatase,

Page 108: 3D QSAR in Drug Design. Vol2

Binding Affinities and Non-Bonded Interaction Energies

Ronald M.A. Knegtel and Peter D.J. Grootenhuis*Department of Computational Medicinal Chemistry,

N. V. Organon, P.O. Box 20, 5340 BH Oss, The Netherlands

1. Introduction

The association of two molecules to form a stable complex involves a delicate interplayof electrostatic interactions, such as hydrogen bonds and ionic contacts, the matching ofhydrophobic surfaces and entropic effects [1]. A schematic representation of the factorsthat play a role in molecular association is given in Fig. 1. These factors determine thechange in free energy (∆G) upon complexation which is the resultant of a change inenthalpy (∆ H) and entropy (∆S). The logarithm of the dissociation or inhibition con-stant (pKi) of the formed complex is linearly related to the change in binding freeenergy:

(1)

in which T denotes the absolute temperature and R the gas constant. The linear relation-ship between ∆G and pKi poses the challenge to correlate energetic terms, which can be calculated from physical principles, with experimentally determined binding constants[2,3] .Especially in cases where the actual synthesis and evaluation of large numbers ofcompounds is too time-consuming or expensive, theoretical approaches capable ofaccurately predicting binding constants would be extremely useful [4]. Even if this isnot the case, such methodology may he used to prioritize the order of screening and synthesis. However, the derivation of such a ‘scoring function’ has proven to be a ratherchallenging task. In fact, the lack of reliable methods for predicting binding affinitiesstill presents one of the major bottlenecks in structure-based drug design [2,3,5].

In this chapter, recent research will be reviewed aimed at predicting binding con-stants and energies of molecular complexes for which the three-dimensional (3D) struc-ture is known. Applications using comparative molecular field analysis (CoMFA) [6,7]usually do not utilize experimental structural information regarding the receptor and thus fall outside of the scope of this review. It has been demonstrated, however, that insome cases the use of experimentally determined conformations of bound ligands forsuperpositioning can yield CoMFA models with improved predictive properties (seee.g. reference [8]). Although the methodology discussed here can be applied generally,we focus in this review on complexes between biopolymers, such as proteins and DNA.on the one hand, and small organic molecules or polypeptides, on the other hand. Suchsystems are particularly relevant for the discovery of enzyme inhibitors and receptor (ant)agonists with therapeutical applications. In the search for novel drug molecules, the

* To whom correspondence should be addressed.

2. 99-114© 1998 KluwerAcademic Publishers. Printed in GreatBritainH. Kubinyi et al. eds.), 3D QSAR in Drug Design, Volume

Page 109: 3D QSAR in Drug Design. Vol2

Ronald M.A. Knegtel and Peter D.J. Grootenhuis

Fig. 1. Schematic representation of the events and thermodynamic factors involved in ligand–receptor com-plexation. Upon binding to its receptor, the tran slational and rotational entrapy of the entire ligand isreduced and part of its solvent accessible surface becomes buried Interactions with the solvent are lost andreplaced by hydrogen-bond and van der Waals interactions at the newly formed protein–ligand interface Thisresults in the (partial) freezing of the internal rotational degrees of freedom of the ligand Similarily, part of the receptor surface is desolvated and interactions with the ligand are established possibly resulting in(local) conformational changes in the rcceptor. Finally, entropy is gained by the release of ordered solvent molecules from the interacting Iigand and receptor surfaces

accurate prediction of the binding properties of lead compounds by computational techniques can potentially reduce and guide experimental work. General applicability, computational speed and. of course. the ability to deliver accurate predictions of the binding free energy are the desired characteristics of such a scoring function.

The most commonly used technique for the simulation of intermolecular coin- plexation at the atomic level utilizes molecular mechanics force-fields, since quantum

100

Page 110: 3D QSAR in Drug Design. Vol2

Binding Affinities and Non-bonded Interaction Energies

mechanical calculations (although they have been reported [9]) are mostly unpracticalfor larger systems. Force-field based approaches have a solid foundation in physicalchemistry and the potential energy equations and parameters used to estimate inter-action enthalpies have been optimized to model quantum chemical and experimentalresults on the energetics and dynamics of various systems. Similarly, numerical solu-tions to the Poisson-Boltzmann equation [10] can provide estimates of binding freeenergies of systems involving charged molecules. Recently, a number of more empiricalapproaches have been developed. In some cases, such approaches cannot easily be inter-preted on a physical level; they are solely aimed at providing the best possible fitbetween experimental and calculated binding energies. In the following sections we willdiscuss the methodology, results, advantages and shortcomings of such strategies.

2. Force-Field Based Methods

Free energy perturbation (FEP) calculations are potentially the most accurate theoreticalapproach to the calculation of binding constants and free energies from structures ofreceptor-ligand complexes [11]. Unfortunately, the calculation of a single binding constantrequires several molecular dynamics simulations with long computation times. Moreover,since FEP calculations can be fairly sensitive to the starting geometry and parameters of thecomplex at hand as well as the precise protocol parameters, their practical use requirescareful preparation of input data and analysis of the results. Although good results havebeen obtained for the design of HIV protease inhibitors [12], FEP simulations are, ingeneral, still too time-consuming to be practically applicable in the prediction of bindingconstants of more than only a handful of chemically related ligands. Research in this fieldcontinues. however. and approximate methods have been reported allowing the estimation of relative binding free energies from only one or two molecular dynamics simulations [13,14]. Although these protocols provide a significant reduction of the time required tocalculate binding free energies, they remain too slow for the evaluation of thousands ofcoinpounds and. in addition, only structurally minor changes can be evaluated.

Since the direct calculation of binding free energies is still cumbersome, approxi-mations are needed to allow fast scoring of ligand-binding modes using molecularmechanics force-fields. Of the terms constituting the free energy, the entropy is still the

surface properties of the molecules under study [1,15]. The omitting of entropic factorseliminates the need to generate trajectories of conforinations with molecular dynamicsand yields a much more tractable scoring function. Even though this reduces the problem to energy minimization and evaluation of a single ligand-receptor complex,various parameters can still be varied in order- to improve the correlation with measuredbinding data. Recently, a number of studies have been reported that systematicallyinvestigated the correlation between the interaction energy of small molecule inhibitors with proteins and their experimentally determined binding constants, These will bediscussed i n more detail in the following paragraphs.

An early application of force-field energies in the ranking of receptor-bound smallmolecules was the incorporation of a grid-based representation of the AMBER force-

101

most difficult term to be predicted due to its intricate dependence on the dynamics and

Page 111: 3D QSAR in Drug Design. Vol2

field [16] in the molecular docking program DOCK [17]. Originally, only a simplefunction was used to account for van der Waals bumps between ligand and receptor[17], which was improved later to reflect shape Complementarity [18] and eventuallychemical complementarity by means of force-field scoring [19]. In order to optimizeand evaluate docked molecules, a rigid body minimization of the ligand is performedand ranking is based on the minimum interaction energy. Rather surprisingly, even ap-proaches with relatively crude approximations. such as evaluating rigid ligands and arigid receptor using a grid representation of the force-field. can still reproduce correctbinding orientations. A typical example is the application of DOCK to a homologymodel structure of cercarial elastase [20]. Here force-field scoring ranked the besttwo inhibitors identified in the fine chemicals directory of 55.313 compounds as 85th(Ki = 3 µΜ) and 627th (Ki = 6 µM), while three other inhibitors with Ki’s < 100 µMwere ranked 122nd, 918th and 561st. This indicates that considerable manual filteringis still required to select compounds for testing from DOCK output. Although the cor-relation between force-field score and binding constant appears to be low, the successfulidentification of diverse micromolar inhibitors given the inaccuracies of a homologymodel for the enzyme and rigid structures for the ligands remains an impressive result.

In order to rationalize the selection of parameters for force-field based scoring ofenzyme inhibitors. Grootenhuis and van Galen [21] applied several different energyminimization and evaluation protocols to a set of 35 non-covalently bound thrombin in-hibitors. Using starting coordinates from crystal structure determinations or hand-builtmodels, each complex was energy minimized using the CHARMm force-field [22] andits non-bonded interaction energy was evaluated. The influence of protein flexibility andthe balance between van der Waals and electrostatic contributions on the correlationbetween calculated and experimentally determined binding constants was tested forboth the minimization and interaction energy evaluation steps. Protein flexibility wasvaried between a completely flexible enzyme and a completely rigid protein via proto-cols where harmonic constraints of increasing strength were applied to the active site, while the rest of the protein remained fixed. In all cases, the ligand remained flexible.The balance of the electrostatic and van der Waals contributions to the total force-fieldscore was modified by using different effective dielectric functions D (= 4εr) in theCHARMm non-bonded potential energy function:

(2)

where A and B denote the Lennard-Jones parameters, q1 and q2 partial atomic charges and r the interatomic distance. The electrostatic interactions were either switched off orstepwise increased to vacuum conditions (ε = 1 ) via two distance-dependent dielectricfunctions for the energy minimization and interaction energy evaluation: ε = 4r and

= r , respectively. It was found that a protocol in which the entire enzyme was heldfixed during minimization without electrostatics, followed by evaluation using adielectric function ε = r, gave the best fitted regression coefficient (r²) of 0.66 with astandard deviation ( S D ) of 0.97 pKi units between energies and pKi’s Interestingly,receptor flexibility and electrostatic interactions appear to be of minor importance

102

Ronald MA. Knegtel and Peter D.J. Grootenhuis

Page 112: 3D QSAR in Drug Design. Vol2

Binding Affinities and Non-Bonded Interaction Energies

during the minimization phase in which an optimal fit between protein and ligand needsto be found. Scaled-down electrostatics were found to be essential, however, in the eval-uation phase.

A similar analysis was reported by Kurinov and Harrison [23], who used molecular dynamics to search the conformational space of phenylethylamine analogs. Scoring wasdone in vacuum using the UFF force-field [24] and Fourier-Green functions to acceler-ate the computations. For the 15 compounds examined, a r² of 0.58 with the experi-mentally determined was found, which improved to 0.74 when one outlier wasremoved. It was found that especially the use of an all-atom representation and theevaluation of all electrostatic interactions without the use of a distance cutoff werecritical to obtaining a good correlation.

Several groups have attempted to expand the molecular mechanics force-fields that are ordinarily used to include terms for desolvation and hydrophobic interactions. Lutyand co-workers [25] added a desolvation term to a grid representation of the AMBER[16]force-field of the form:

(3)

where the summation is performed over all ligand and receptor atom pairs. In Eq. 3,

which are multiplied by a Gaussian envelope function of width σ describing the volumeof displaced water. Although results were shown for the trypsin-benzamidine complexonly, the modified force-field was able correctly to dock benzamidine into the partially flexible active site. Results comparing the calculated and measured binding propertiesof protease inhibitors have not yet been reported. Yet another approach was describedby Viswanadhan et al. [26], who added an additional function describing hydrophobicinteractions to the AMBER [16]force-field:

(4)

where P is a scoring function which ranks hydrophobic contacts between receptor andligand atoms described by the constants ai and bj respectively, at a mutual distance rij.Its contribution to the overall score is scaled by a factor k. By comparing the energies ofsolvated and complexed states of 11 HIV-protease inhibitors relative binding energieswere obtained which were compared with experimental results. Interestingly, energiesobtained using only the standard force-field energies already yielded a r² of 0.85(SD = OS7 kcal/mol), which only slightly increased to 0.88 (SD= 0.58 kcal/mol) whenthe hydrophobicity term was included. However, the hydrophobicity score can be usefulin predicting the outcome of modifications of lead compounds which reduce or increasethe hydrophobic character of the molecule.

An interesting extension of the use of molecular mechanics energies in the prediction of binding properties was reported by Ortiz et al. [27] and termed comparative bindingenergy (COMBINE) analysis (see also the chapter by R.C. Wade in this volume,p. 19 ff)). In this approach, inhibitors of human synovial fluid phospholipase A2 (HSF-

were divided in small sections of which the bonded and non-bonded interaction

103

and f i are the respective solvation parameter and fragmental volume of a mobile atom i

PLA2)

Page 113: 3D QSAR in Drug Design. Vol2

Ronald M . A . Knegtel and Peter D.J. Grootenhuis

with individual residues in the binding site was evaluated. Also the intramolecular interac-tions among inhibitor fragments and active site residues were calculated and all energieswere stored i n a matrix. A column containing the experimentally determined activities (inpercentages) was added to the matrix which was subsequently reduced using partial leastsquares and D-optimal design techniques [28] .This procedure allows for the selection ofonly those energy terms that contribute significantly to differences i n binding free energy.The final model contained 50 energy terms (only 2% of the original matrix) which yieldeda r² of 0.92 (SD= 6.23% activity Direct correlation of binding energies with activitiesyielded only a correlation coefficient of 0.21 which is partially due to errors in modellingthe inhibitor conformations. an inaccurate description of the highly charged inhibitors and active site by the force-field and the fact that the percentage activity might not be linearlyrelated to the binding free energy. By selecting only the relevant energy terms, this methodprovides useful insights into important protein-ligand interaction. An additional advantageis that intramolecular contributions to binding energies can also be accounted for. Currentproblems involve the relatively arbitrary breakdown of the inhibitors into fragments and the selection of relevant variables from sets which are often highly collinear.

The use of force-field derived energies in the prediction of binding characteristics has proven to be useful in several cases. For instance. the development of Merck’s HIV-1protease inhibitor Crixivan was guided by manually docking novel inhibitors to the

al structure of the enzyme and evaluating the resulting complexes using energyminimization [29]. The observed correlations ( r² = 0.580.78) between binding energyand IC50s for known ligands allowed for the successful prediction of novel ligandsprior to actual synthesis and testing. Similarly, the interaction energies obtained afterenergy minimization of inhibitors docked to purine nucleoside phosphorylase (PNP)had enough predictive power to allow for the successful selection of inhibitors withimproved binding properties [30]. Force-field interaction energies also qualitativelydescribed the observed trend in binding energies of peptide-derived transition stateanalog inhibitors of thrombin [31].

Generally speaking, it seems that the force-field based methods do surprisingly well,even though entropic effects and (de)solvation are usually neglected. Apparently. whenthe change in entropy upon receptor binding is similar for a set of molecules, thebinding energy itself can have predictive power. Most of the studies mentioned above,however, report one or more outliers which notably decrease the correlation between calculated and measured binding energies and examples are known where the overall correlation appears to be quite low [27]. This casts some doubt on the general applica-bility, especially in cases where the ligands do not belong to a congeneric series, orwhen the selectivity of a compound with respect to two different proteins is an issue.Finally. most force-field based methods require in the order of minutes to minimize andevaluate ligand-receptor complexes, which can be detrimental when very large (> 104)numbers of compounds need to be screened.

3.

Several schemes have been proposed that utilize the linearized Poisson-Boltzmannequation for estimating binding free energies [10]. Usually the electrostatic con-

104

Methods for Solving the Poisson-Boltzmann equation

Page 114: 3D QSAR in Drug Design. Vol2

Binding Affinities and Non-bonded Interaction Energies

tributions are separated into a Coulombic term equal to that used in molecular mechan-ics force-fields (Eq. 2) and a polarization term due to interactions between polar atomsof the solvent and of the ligand and receptor. The energy due to polarization is a sumover all charges in the system where qi is the partial charge of an atom andthe electrostatic potential, which is obtained by solving:

(5)

where p ( r ) and are the electrostatic potential, the charge density and thedielectric function, respectively. This equation can be linearized and solved numericallyusing finite difference methods and a grid representation of the respective fields. Thefinite difference Poisson-Boltzmann (FDPB) method [10] is aimed at estimating thepolar (electrostatic) contribution to the hydration free energy. The apolar contribution to the free energy is usually estimated based on the solvent accessible surface that isburied upon complexation, often using a single solvation parameter (~ 20-25 cal/mol/Ų) for all atom types, although values of 5–40 cal/mol/Ų have been reported too.

The FDPB method has been applied successfully to a number of systems. Forinstance, a r² of 0.85 (with errors < 1 kcal/mol) was found for binding free energy dif-ferences calculated with the FDPB method for two receptor systems. arabinose (ABP)and sulphate-binding protein (SBP) [32]. In the case of ABP, five different ligands andfor SBP three point mutations were studied and calculated free energies deviated withless than 1 kcal/mol from the experimental values. Similar results (x ² goodness of fit0.32, average deviation from experiment within 0.5 kcal/mol) were reported by Zhangand Koshland [33],who examined 63 pairs of nine mutant proteins with seven R-malatesubstrate derivatives. The accuracy of their predictions appeared not to be highly sens- itive to small changes in the charges used for the protein. A similar result was obtainedfor the ligands charges. which were derived from the CHARMm [22] force-field para-meters for amino acids with similar functional groups as the substrate analogs.Nevertheless, again one outlier remained unexplained. The nonpolar contribution to thefree energy was clearly required to achieve the best correlation with experiment.Qualitative agreement with experimental data can, in some cases, be achieved when thenonpolar term is entirely omitted, as was demonstrated by the results of Jedrzejas et al. [34] on inhibitors of neuramidase. In addition, work has been reported on the intro-duction of molecular flexibility in FDPB calculations by means of Monte Carlo con-formational searches [35]. Recent work in our department [Engels et al., preparation]complemented the FDPB method with two fitted terms for the nonpolar contributionand the loss of translational and rotational entropy for the ligand, respectively. Thisadaptation yielded a r² of 0.74 and an error of 1 unit for calculated and measuredbinding free energies of 36 serine protease-inhibitor complexes.

The FDPB method appears to be quite suitable for systems which are highly chargedbut has a number of potential difficulties which make routine application non-trivial; unfortunately, no definite protocol exists for the choice of dielectric constants for the protein and Iigand, the treatment of dielectric boundaries in the system and the deriva-tion of partial charges. A more detailed discussion of the sources and magnitude oferrors in FDPB calculations is given by Shen and Wendoloski [32] .In addition, the

105

Page 115: 3D QSAR in Drug Design. Vol2

nonpolar contribution is rather crudely estimated based on the accessible surface, whichmakes the method less reliable for systems where a large hydrophobic component ispart of the binding free energy.

4. Empirical Scoring Functions

4.1. Models derived by fitting procedures

Since scoring schemes based upon physical principles such as molecular mechanicsforce-fields and methods based on solving the Poisson-Boltzmann equation are limitedin their predictive power, alternative scoring schemes have been devised that no longerattempt to derive the interaction energies from first principles. Such methods attempt tofind appriopriate mathematical functions that can be fitted to experimental binding data.Although in most cases the scoring function is still interpretable from a physical pointof view, often some of the atomic detail is lost and transferred to surface properties andadditional terms aimed at modelling entropic effects such as ligand flexibility, desolva-tion and hydrophobicity.

Probably the most widely used empirical scoring scheme is the one proposed byBöhm [36] which is aimed at rapid calculation of binding energies for use in moleculardocking and de novo design. The binding free energy function consists of five differentterms which are listed below:

(6)

where denotes a penalty function which accounts for deviations from idealhydrogen-bond geometries characterized by the deviations of the ideal donor acceptordistance and angle denotes a constant that can be interpreted as the lossof translational and rotational entropy of the ligand upon binding. andrepresent the contributions of unperturbed hydrogen bonds and ionic interactions,respectively. is the contribution of lipophilic interactions assumed to be pro-portional to the lipophilic contact surface which is estimated using a coarse gridmethod. Finally, accounts for the loss of entropy due to bond rotations within theligand which become fixed upon complex formation and is multiplied by NROT, thenumber of rotatable bonds.

This relatively simple function was fitted to 45 protein-ligand complexes taken from

known. The best fit was obtained when all five parameters were adjustable during

tainty of 1.4 orders of magnitude in Interestingly, with a minimum correlationcoefficient of 0.821 among five different versions, the scoring function does not appearto be very sensitive to changes in its composition such as making equal tosetting to a fixed value or removing the hydrogen-bond perturbation penalty func-tion. On the basis of complexes for which the binding constant could not be accurately

106

Ronald M.A. Knegtel and Peter D.J. Grootenhuis .

the protein database (PDB) for which the (ranging from 25 mM to 40 fM) was

fitting and yielded a r² of 0.762 with a standard deviation of 8.7 kJ/mol — i.e. an uncer-

Page 116: 3D QSAR in Drug Design. Vol2

Binding Affinities and Non-Bonded Interaction Energies

predicted, several possible sources of error were suggested. For instance, the scoringfunction is not sensitive to differences in the strengths of hydrogen bonds and does notaccount for more exotic interactions such as those between quaternary ammoniumgroups and aromatic rings [37]. Secondly, any conformational strain induced in eitherthe protein or the ligand is not taken into account. Also interactions involving solventmolecules are not implemented in the scoring function.

The VALIDATE approach [38] presents a hybrid approach by combining force-fieldderived energies with terms for the octanol-water partition coefficients, steric fit, rotat-able bonds, ligand strain energy and four terms describing polar/nonpolar (non)comple-mentarity. In particular, the terms penalizing for surface noncomplementarity are ofinterest since this aspect of scoring is rarely accounted for. Using a training set of 51complexes fitted by neural network and partial least-squares methods, a r² of 0.85 wasfound with an error of 1.01 log units. The model was subsequently tested on three setsof protein-ligand complexes and yielded good results for crystal structures but lessgood results for modelled HIV-protease inhibitors and thermolysin inhibitors, in par-ticular. Interestingly, the electrostatic energy contributed only 3% to the model, whichcould account for the disappointing results for the thermolysin data (where a zinc ionand phosphates are involved in binding). Also other terms like steric fit and nonpolar(non)complementarity contributed only marginally to the scoring function, independenton the method of fitting. The sensitivity of the method to starting geometries remains aserious problem.

In order to overcome possible dependence on the starting geometries of the ligand,scoring functions have been designed aimed at simplifying the energy landscape of aligand that is to be placed in the binding pocket of its receptor. To increase the speedand reproducibility of placing and optimizing the conformation of a bound ligand, theenergy function has to be smooth and contain unique minima corresponding to thenative binding mode. A simple piecewise linear potential energy function was used in agenetic algorithm docking approach by workers at Agouron Pharmaceuticals [39] forflexible docking of HIV-1 protease and FKBP-12 complexes. It was shown that using ascoring function that in its simplicity resembles a square-well potential comparable tothat used in macromolcular docking [40], flexible ligands could be docked to conforma-tions close to those observed in crystal structures. The selection of meaningful parame-ters for the repulsive part of the energy function turned out to be critical, since smallvalues enhance the possibilities of the search method to overcome energy barriers,while higher values reproduce steric effects much more realistically. Although thissimplified scoring function clearly aids in finding correct binding orientations forcomplex flexible ligands, it does not allow for an accurate comparison of bindingenergies of different ligands or the prediction of binding energies.

An attempt at solving both these problems — i.e. improve search performance whilecorrelating the resulting scores with binding affinities — was published recently [41].Here a continuous diffferentiable scoring function was obtained by combining nonlinearGaussian-like and sigmoidal functions to describe hydrophobic and polar interactions.Directionality and charge effects are taken into account for hydrogen-bonding inter-actions. Additional terms include solvation effects (estimated using the difference in the

107

Page 117: 3D QSAR in Drug Design. Vol2

Ronald M.A. Knegtel and Peter D.J. Grootenhuis

total number of potential hydrogen bonds and the actual number of hydrogen bonds inboth ligand and receptor) and entropic factors (estimated using the number of rotatablebonds and logarithm of the molecular weight of the ligand). A calibration data set of 34complexes was used to determine the 17 adjustable parameters of the model. Bestresults were obtained when the ligands were energy-minimized in vacuo to remove con-formational strain and their pose in the binding site was optimized during fitting of theparameters. The final scoring function achieved a r² of 0.90 and a mean error of 0.72 logunits. Decomposition of the scores showed that 44% and 26% can be attributed tohydrophobic and polar interactions, respectively, 25% to entropic effects and only 5%to the solvation term. The decomposition of this scoring function is very similar to thatof Böhm’s function [36], although the contribution of the loss of translational and rota-tional entropy is estimated to be higher. Estimates for this term based on experimentalresults vary wildly, however, and the interpretation of differences in this parameter isnot straightforward. The strong dependence on basic hydrophobic and electrostaticterms agrees with the observation that force-field based scoring methods can also stillaccount for a large body of binding data and suggests that they could be improved byaccounting for the entropic effects.

4.2. Solvent accessible surface-based methods

A different approach to predicting binding affinities from structural data acknowledgesthat all the important thermodynamical quantities involved are highly correlated withthe size and composition of the surface that is buried upon complexation [42]. Thus, acorrelation was established by Bohacek and McMartin [43], who noted a quantitativerelationship between the complementarity of polar and nonpolar solvent accessible sur-faces and inhibitor potency in thermolysin-inhibitor complexes. Extending the pioneer-ing work of Eisenberg and McLachlan [44], Horton and Lewis [45] demonstrated thatthe free energy of binding can be correlated to the solvent accessible surfaces of polarand nonpolar atoms by fitting the equation:

(7)

where and are the free energies of solvation for apolar and polar atoms,respectively, as derived from the difference in solvent accessible surface area betweenthe bound and unbound species summed over each atom i and multiplied by anatomic solvation parameter [44]:

(8)

The term is a parameter which represents the free energy change due to loss ofrotational and translational freedom of the ligand, which, like and ß, is adjustedduring the fitting procedure. Fitting of Eq. 7 to the experimental binding free energies of15 rigid enzyme-inhibitor complexes yielded a r² of 0.92. Even though the test set usedis not very diverse and it is not clear how sensitive the method is to small changes ininhibitor and receptor, the results encouraged further study of the role of buried surface

108

Page 118: 3D QSAR in Drug Design. Vol2

Binding Affinities and Non-Bonded Interaction Energies

area in molecular association for a large number of systems [46,47]. A more detailedanalysis of protein-ligand interfaces was provided by Wallqvist et al. [48], who exam-ined 38 protein-protein and protein-ligand complexes taken from the PDB and ex-tracted atom-atom preference scores using the buried interfacial area. For everyatom-atom combination (using 21 extended atom types), its preference of occurrence atan interface is calculated as the ratio of the fraction of the total interfacial area Atotcontributed by each atom pair normalized by the product of the contributions ofeach atom separately:

(9)

Atom-atom pairs with a high value occur with high frequency at adjacent surfaces inthe complexes studied and such combinations are therefore assumed to be thermo-dynamically favorable. Using these atomic preference parameters, an estimate for thefree energy of binding can be obtained by fitting:

(10)

with

(11)

to the experimental determined free energy changes by varying the parameters ß, δ andA fit with an rms deviation of 1.5 kcal/mol and a r² of 0.55 was obtained and used to

interpret the binding properties of several HIV- 1 protease inhibitors. The derivedscoring function has also been successfully applied in molecular docking [49]. An

109

Page 119: 3D QSAR in Drug Design. Vol2

Ronald M.A. Knegtel and Peter D.J. Grootenhuis

attractive characteristic of this approach is that the contribution to the free energy ofbinding by interfacial atoms can be displayed on the solvent accessible surface. Even though the fit between measured and predicted free energies is not perfect, i t allowsregions i n the ligand to be identified which do not yet have an optimal interaction withthe protein and are thus suitable for modification, as well as residues in the proteinwhich are crucial to ligand binding. A comparable method allowing the visualization ofbinding properties by analyzing the spatial distribution of different atom types around3-atom fragments for 83 high resolution crystal structures was recently published [50],although no direct correlation with binding affinities was made.

Also the work of Verkhivker [51] is based on the analysis of protein-ligand interatom distances and allows for the detailed decomposition of the free energy of binding. Inthis case, the scoring function consists of knowledge-based ligand-protein interactionpotentials. similar to those used i n protein-folding simulations. This function is com-plemented by terms for the desolvation of different types of surfaces based on Eq. 8 andterms accounting for the loss of rotational and translational entropy. Distance-basedpotentials were derived from seven HIV- 1 protease/inhibitor crystal structures by count-ing the frequencies with which certain contacts occurred and for this set of inhibitorsbinding free energies were calculated. A good correlation with experimental data wasfound and a detailed analysis of the different contributions to total binding affinity wasgiven. The analysis is, however, limited to HIV- I protease inhibitors, which were alsoused for the derivation of the ligand-protein pairwise potentials.

A similar approach was described by DeWitte and Shakhnovich [52], who derivedknowledge-based potentials for a de novo design approach (SMoG). A total of 125protein-ligand complexes were used for counting atom-atom contacts within 5 Å ofeach other, while using the average probability of all possible contacts as the referencestate. Correlation coefficients obtained by comparing calculated binding constants withexperimental binding data were in the range 0.77–0.81 for 3 protein-ligand systems,which is an encouraging result given the fact that no fitting was applied. Although theuse of techniques from protein-folding simulations is very interesting, application to abroader range of ligand-protein complexes is required for a proper evaluation of thismethod. An overview of the methodology discussed in sections 2 and 4 is given i nTable 1.

5. Conclusion

Progress has been made in the development of methodologies capable of predictingbinding affinities of protein-ligand complexes over the last few years. Simple force-field minimization and energy evaluation perform reasonably well for series of similarcompounds. Finite difference Poisson-Boltzmann methods are preferred for highlycharged systems. The development of empirical scoring functions, capable of account-ing for different entropic contributions and often with improved search profiles, is afield of intense activity and holds considerable promise for the future. Although a uni-versal and highly accurate scoring function may still be out of reach, current approachesare often capable of yielding reasonable correlations for sets of similar ligands. Areas of

110

Page 120: 3D QSAR in Drug Design. Vol2

Binding Affinities and Non-Bonded Interaction Energies

possible improvement include the correct treatment of electrostatics and bound solventmolecules. ligand and receptor flexibility and (de)solvation. A clear bottleneck forscoring function development is the rather limited number of high-resolution crystalstructures for which accurate thermodynamic binding data are available.

We feel that in the practice of structure-based drug design the error in predicting thebinding of a ligand to a receptor is often significantly larger than the errors typically re-ported in the literature. It seems that a good statistical performance of a scoring functionprovides no guarantee for a high predictive or extrapolative power. In order to assess re-alistically the predictive power of a scoring function and to monitor the progress beingmade with the development of scoring functions, we would advocate to extend theCASP (Critical Assessment of Techniques for Protein Structure Prediction) -trials[53,54] with test cases relevant to receptor-ligand scoring. Such a test case could com-prise a set of high-resolution structures of diverse protein-ligand complexes for whichpreferably accurate microcalorimetric data should be available. The (in)capacity topredict correctly the binding affinity of truly novel inhibitors would provide a clearcriterion to judge the qualities of different scoring methods. In addition, the availabilityof microcalorimetric data would allow for a more detailed comparison between theexperimentally determined changes in enthalpy and entropy and those suggested by decomposition of predicted binding free energies and could be used to improve theexisting scoring schemes.

Acknowledgement

The authors wish to thank Drs. M. Engels and V.J. van Geerestein for their usefulsuggestions and critical reading of the manuscript.

References

1.2.

3.

4.

5.

6.

Janin, J., Elusive affinities, Proteins: Struct. Funct. Genet., 21 (1995) 30–39.Ajay and Murcko. M.A., computational methods to predict binding free energy in Ligand-receptor com-plexes J. Med. Chem., 38 (1996)4953–4967.

Verlinde C.L.M.J. and Hol, W.G.J., Structure-based drug design: Progress, results and challenges,Structure 2 (1994) 577–587.Böhm, H.-J., Current computational tools for de novo ligand design Curr, Opin. Biotech., 7 (1996)433–436.Clark. M., Cramer, R.D., III, Jones. D.M., Patterson, D.E. and Simeroth, P.E., comparative molecular

110 (1988) 5959–5967.Clark, M., Cramer, R.D., III, Jones, D.M., Patterson. D.E. and Simeroth, P.E., comparative molecularfield analysis (CoMFA): 2. Towards its use with 3D-structural databases, Tetrahedron Comput.Methodol., 3 (1990)47-59.

the binding of thrombin inhibitors, In Wipff, G. (Ed.) Computational approaches i n supramolecularchemistry, Kluwer Academic Publishers, Dordrecht (NI), 1994, 137–149.

7 .

8.

111

Böhm, H.-J. and Klebe, G., What can we learn from molecular recognition inprotein-ligand complexes for the design of new drugs?, Angew. Chem. lnt. Ed. Engl. 35 (1996) 2588–2614.

field analysis (CoMFA): I. Effect of shape on binding of steriods to carrier proteins , J. Am. Chem. Soc.,

Grootenhuis. P.D.J. and van Helden, S.P., Rational approaches towards protease inhibition: Predicting

Page 121: 3D QSAR in Drug Design. Vol2

Ronald M.A. Knegtel and Peter D.J. Grootenhuis

9. Perakyla, M. and Pakkanen, T.A., Model assembly study of the ligand binding by p-hydroxybenzoate hy-droxylase: Correlation between the calculated binding energies and the experimental dissociation con-stants, Proteins: struct. Funct. Genet., 21 (1995) 22-29.Cramer, C.J. and Truhlar, D.G., Continuum solvation models: Classical and quantum mechanical imple-mentations. In Lipkowitz, K.B. and Boyd, D.B. (Eds.) Reviews in computational chemistry, 6, VCHPublishers Inc., New York, 1995,pp. 1-72.Kollman.P., Free energy calculations: Applications to chemical and biochemical phenomena, Chem.Rev., 93 (1993) 2395-2417.Rao. B.G., Tilton, R.F. and Singh, U.C., Free energy perturbation studies on inhibitor binding to HIV-1proteinase, J. Am. Chem. Soc., 114 (1992) 4447–4452.Aqvist, J. and Mowbray, S.L., Sugar recognition by a glucose/galactose receptor: Evaluation of bindingenergetics from molecular dynamics simulations, J. Biol. Chem., 270 (1995) 9978-9981.Liu, H.Y., Mark, A.E. and van Gunsteren, W.F., Estimating the relative free energy of different molecu-lar stawtes with respect to a single reference state, J. Phys. Chem., 100 (1996) 9485–9494. Finkelstein, A.V. and Janin, J.. The price of lost freedom: Entropy of biomolecular complex formation, Protein Eng., 3 (1989) 1-3.Weiner. S.J., Kollman, P.A., Case. D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S. and Weiner,P.K. A new force field for molecular mechanics simulation, J. Am. Chem. Soc., 106 (1984) 765–784. Kuntz, I.D., Blancy, J.M., Oatley, S.J., Langridge, R. and Ferrin. T.E., A geometric approach to macro- molecule-ligand interactions, J. Mol. Biol., 161 (1982) 269-288.DesJarlais, R.L., Sheridan. R.P., Seibel, G.L., Dixon, J.S., Kuntz, I.D. and Venkataraghavan, R., Usingshape complementarily as an initial screen in designing ligands for a receptor binding site of known three-dimensional structure, J. Med. Chem. 31 (1988) 722-729.Meng, E.C., Shoichet, B.K. and Kuntz, I.D., Automated docking using grid-based energy evaluation,J. Comp. Chem., 13 (1992) 505-524.Ring, C.S., Sun, E.. McKerrow, J.H., Lee. G.K., Rosenthal, P.J., Kuntz, I.D. and Cohen, F.E., Structure-based inhibitor design by using protein models f o r the development of antiparasitic agents, Proc. Natl.Acad. Sci. USA, 90 (1993) 3583-3587.Grootenhuis, P.D.J. and van Galen, P.J.M., Correlation of binding affinities with non-bonded interactionenergies of thrombin-inhibitor complexes. Acta Cryst., D51 (1995) 560-566.Brooks, B., Bruccoleri, R., Olafson, B., States, D., Swaninathan, S. and Karplus, M., Charmm: Aprogram for macromolecular energy minimization and molecular dynamics calculations, J. Comp.Chem., 4 (1983) 187-217.Kurinov, I.V. and Harrison, R.W., Prediction of new serine proteinase inhibitors, Nature Struct. Biol.,1 (1994) 735-743.Rappé, A.K., Casewit, C.J., Colwell, K.S., Goddard. W.A.I. and Skiff. W.M., UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations, J. Am. Chem. Soc., 114

Luty, B.A., Wasserman, Z.R., Stouten, P.F.W., Hodge, C.N., Zacharias, M. and McCammon, J.A., Amolecular mechanics/grid method for evaluation of ligand-receptor interactions, J. Comp. Chem., 16 (1995) 454–464Viswanadhan, V.N., Reddy, M.R., Wlodawer, A., Varney, M.D. and Weinstein, J.N., A n approach torapid estimation of relative binding affinities of enzyme inhibirors: Application to peptidomimetic inhibitors of the human immunodeficiency virus type 1 protease, J. Med. Chem., 39 (1996) 705–712. Ortiz, A.R., Pisabarro, M.T., Gago, F. and Wade, R.C., Prediction of drug binding affinities by com-parative binding energy analysis, J. Med. Chem., 38 (1995) 2681-2691.Mitchell, T.J., An algorithm for the construction of ‘D-optimal’ experimental designs, Technometrics,16 (1974) 203-210.Holloway, K.M., Wai, J.M., Halgren, T., Fitzegerald, P.M.D., Vacca, J.P., Dorsey, B.D., Levin, R.B.,Thompson, W.J., Chen, L.J., deSolms, S.J., Gaffin, N., Ghosh, A.K., Giuliani, E.A., Graham, S.L.,Guare, J.P., Hungate, R.W., Lyle, T.A., Sanders. W.M., Tucker, T.J., Wiggins, M., Wiscount, C.M.,Woltersdorf, O.W., Young, S.D., Darke, P.L. and Zugay, J.A., A priori prediction of activity for HIV-1protease inhibitors employing energy minimization in the active site, J. Med. Chem., 38 (1995)

10.

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.

(1992) 10024-10046.25.

26.

27.

28.

29.

305-317.

112

Page 122: 3D QSAR in Drug Design. Vol2

Binding Affinities and Non-Bonded Interaction Energies

Babu, Y.S., Ealick, S.E., Bugg, C.E.,Erion, M.D., Guida, W.C., Montgomery, J.A. and Secrist, J.A., III,Structure-based design of inhibitors of purine nucleoside phosphorylase, Acta Cryst., D51 (1995)

31. Jetten, M., Peters, C.A.M., Visser, A., Grootenhuis, P.D.J., van Nispen, J.W. and Ottenheijm, H.C.J.,Peptide-derived transition state analogue inhibitors of thrombin; Synthesis, activity and selectivity,Bioorg. Med. Chem., 3 (1995) 1099-1114.Shen, J. and Wendoloski, J., Electrostatic binding energy calculation using the finite difference solutionto the linearized Poisson-Boltzmann equation: Assessment of its accuracy, J. Comp. Chem., 17 (1996)

33. Zhang, T. and Koshland, D.E., Jr., Computational method for relative binding energies of enzyme-substrate complexes, Prot. Sci. 5 (1996) 348-356.

34. Jedrzejas, M.J., Singh, S . , Brouillette, W.J., Air, G.M. and Luo, M., A strategy for theoretical bindingconstant, Ki, calculations for neuramidase aromatic inhibitors designed on the basis of the active sitestructure of influenza virus neuramidase, Proteins: Struct. Funct. Genet. 23 (1995) 264–277.

35. Zacharias, M., Luty, B.A., Davis, M.E. and McCammon, J.A., Combined conformational search andfinite-difference Poisson-Boltzmann approach for flexible docking: Application to an operator mutation in the lambda repressor-operator complex, J . Mol. Biol., 238 (1994) 455–465.Böhm, H.-J., The development of a simple empiric scoring function to estimate the binding constant for a protein-ligand complex of known three-dimensional structure, J . Comput.-Aided Mol. Design,8 (1994) 243-256.

37. Dougnerty, D.A. and Stauffer, D.A., Acetylcholine binding by a synthetic receptor: Implicationsfor bio-logical recognition, Science, 250 (1990) 1558-1560.

38. Head, R.D., Smythe, M.L., Oprea, T.I., Waller, C.L., Green, S.M. and Marshall, G.R., VALIDATE: Anew method for the receptor-based prediction of binding affinities of novel ligands, J. Am. Chem. Soc.,118 (1996) 3959-3969.Verkhivker, G.M., Rejto, P.A., Gehlhaar, D.K. and Freer, S.T., Exploring the energy landscapes of mol- ecular recognition by a genetic algorithm: Analysis of the requirements f o r robust docking of HIV-1protease and FKBP-12 complexes, Proteins: Struct. Funct. Genet., 250 (1996) 342-353.

40. Knegtel, R.M.A., Rullman, J.A.C., Boelens, R. and Kaptein, R., MONTY: A Monte Carlo approach toprotein–DNA recognition, J. Mol. Biol., 235 (1994) 318–324.

41. Jain, A.N., Scoring noncovalent protein–ligand interactions: A continuous differentiable function tunedto compute binding affinites, J. Cornput.-Aided Mol. Design, 10 (1996) 427–440.

42. Novotny, J., Bruccoleri, R.E. and Saul, F.A., On the attribution of binding energy in antigen-antibodycomplex MCPC 603.D1.3 and Hyhel-5,Biochemistry, 28 (1989) 4735–4749.

43, Bohacek, R.S. and McMartin, C. Definition and display of steric, hydrophobic and hydrogen-bondingproperties of ligand binding sites in proteins using Lee and Richards’ accessible surface: Validation ofa high-resolution graphical tool for drug design, J. Med. Chem., 35 (1992) 1671–1684. Eisenberg, D. and McLachlan, A.D., Solvation energy in protein folding and binding, Nature, 319(1986) 199– 203.Horton, N. and Lewis, M., Calculation of the free energy of association forprotein complexes. Prot.Sci., 1(1992) 169-181.Krystek, S., Stouch, T. and Novotny, J., Affinity and specificity of serine endopeptidase–protein inhibitor interactions: Empirical free energy calculations based on crystallographic studies, J . Mol. Biol., 234

receptor-ligand binding free energies, Biochemistry, 33 (1994) 13977-13988.Wallqvist, A., Jernigan, R.L. and Covell, D.G., A preference-based free energy parameterization ofenzyme- inhibitor binding: Applications to HIV-1 protease inhibitor design, Prot. Sci., 4 (1995)1881-1903.

49. Wallqvist, A. and Covell, D.G., Docking enzyme-inhibitor complexes using a preference-based freeenergy surface, Proteins: Struct. Funct. Genet. 25 (1996) 403–419.

50. Laskowski, R.A., Thornton, J.M., Humblet, C. and Singh, J., X-SITE: Use of empirically derived atom packing preferences to identify favorable interaction regions in the binding sites of proteins, J . Mol.Biol. 259 (1996) 175-201.

30.

529-535.

32.

350-357.

36.

39.

44.

45.

46.

(1993) 661-679.47.

48.

113

Vajda, S., Weng, Z., Rosenfeld, R. and DeLisi, C., Effect of conformational flexibility and solvation on

Page 123: 3D QSAR in Drug Design. Vol2

Ronald M.A. Knegtel and Peter D.J. Grootenhuis

ligand-protein crystallographic complexes: 1. Knowledge-based ligand-protein interaction potentialsbinding affinity, Protein Eng.

11733-11744.Moult, J.. The current state of the art in proteiin structure prediction, Curr. Opin. Biotech., 7 (1996)322-127.

8 (1995) 677–691

53.

54.

51. Verkhivker, G., Appelt, K., Freer, S.T. and Villafranca, J.E., Empirical free energy calculations of

applied to the prediction of human immunodeficiency virus protease I

52. DeWitte, R.S. and Shakhnovich. E.I., SMoG: De novo design method based on simple, fast, and accu-rate free enerrgy estimates: I. Methodology and supporting evidence, J. Am. Chem. Soc., 118 (1996)

Eisenberg, D. Into the black of night, Nature Struct. Biol., 4 (1997) 95-97.

114

Page 124: 3D QSAR in Drug Design. Vol2

Molecular Mechanics Calculations on Protein-LigandComplexes

Irene T. Weber and Robert W. HarrisonDepartment of Microbiology and Immunology, Kimmel Cancer Center; Thomas Jefferson

University, 233 South 10th Street, Philadelphia PA 19107, U.S.A.

1. Introduction

In order to understand the structures and energetics of protein- ligand complexes, wehave chosen to run relatively short, but accurate, molecular mechanics calculations onseveral related complexes. In contrast, the estimation of free energy differences by thefree energy perturbation or thermodynamic integration methods requires extensive andtime-consuming molecular dynamics simulations for each protein- ligand state (forreview see [1]). Recently, we and other groups have found that simple molecularmechanics energy minimization can give intermolecular interaction energies that cor-relate well with trends in measured binding energies [2- 5]. The aim of the molecularmechanics calculations is to reproduce the physics and chemistry of interactions i nprotein- ligand complexes without empirical corrections or restraints. Therefore, wehave carefully evaluated the factors influencing the accuracy of the calculations. Thefundamental improvement in accuracy of the new molecular mechanics and dynamicsprogram, AMMP [6] is due to the inclusion of all long-range non-bonded energies andall hydrogen atoms. Tests of these improvements are described in references [7–9].These improvements and other factors influencing the accuracy of the calculations are discussed. Molecular mechanics calculations with AMMP have been shown to agreewith a variety of independent experimental data. The agreement verifies the accuracy ofthe potentials and optimization procedures. Spectral data were used to improve the

sitions that are within the experimental error in the crystal structures of proteins [9].Finally, the protein- ligand interaction energies were shown to correlate with free energydifferences derived from kinetic data [2,3,7].

Molecular mechanics is a classical approximation to the inherently quantum problemof molecular chemistry. Therefore, the making and breaking of chemical bonds cannotbe treated correctly in the standard formulation. However, molecular mechanics is validfor the treatment of equilibrium systems or ‘states’.The choice of states will depend onthe specific protein- ligand system. The choice is particularly critical for calculations onenzyme-substrate complexes where the enzyme catalyzed reaction involves a change ina chemical bond. It follows that the key problem in the application of molecularmechanics to understanding enzyme catalyzed reactions is the choice of physically validstates with intact bonds.

Two types of protein- ligand complexes have been studied. Firstly, a protein- ligandcomplex that is formed without large conformational changes in protein or ligand andwithout a change in chemical bonds. In this case, only two states need to be considered— the protein- ligand complex is compared to the free protein and ligand with the

Volume 2. 115-127.© 1998 Kluwer Academic Publishers. Printed in Great Britain.

potential set [2] . Minimization of protein structures was shown to result in atomic po-

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design,

Page 125: 3D QSAR in Drug Design. Vol2

Irene T. Weberand Robert W. Harrison

displacement of the bound waters. Calculations were used to predict novel inhibitors oftrypsin [7]. Secondly, enyzme-substrate complexes have been analyzed, where theenzyme catalyzed reaction involves a change in a chemical bond. The calculations havebeen successfully applied to calculation of the HIV proteaseesubstrate interaction ener-gies [ 2 ] , and modelling the structure and energetics of glucokinase with different sugarsubstrates [3]. The calculated protein- ligand interaction energies were shown to cor-relate with the free energy differences calculated from kinetic measurements, despitethe absence of bulk water in the calculation.

2.

In order to be useful, the molecular mechanics energy must be accurately calculated.Molecular mechanics potentials typically consist of two sets of terms: one set repro-duces the bond and angle geometry of the molecules, and the other set reproduces thelong-range terms such as electrostatics and van der Waals forces. Ideally, the bond andangle terms are defined in a self-consistent manner, so that they have a energy minimumat an unstrained structure. Strain is introduced by the non-bonded terms; the predicted energy and structure are the result of the interplay between the two types of potentials.

Since the molecular shape is determined by the geometric arrangement of the atoms,the bond and angle terms are important for accuracy. The ability of the structures ofboth the protein and ligand to deform upon complex formation depends on these terms.The parameterization is especially important when protein- ligand complex formationrequires changes in structure or strong interactions such as hydrogen bonds. There aremany different potential sets including AMBER [10], MM3 [11], OPTIMOL [4], andUFF [12].We started with the UFF potential set and partial charges from AMBER. TheUFF potential set was chosen because its mathematical form reflects the Feynmann-Hellman theorem resulting in highly efficient parameterization, and because it is readilyadaptable to a wide range of chemistry. Other choices, such as AMBER, would requirethat we relate a novel compound to a known amino acid or nucleotide, which is notalways possible. However, we found that it was advantageous to perform minor re-parameterization. Improving the fit to molecular geometry and spectral data improved the agreement when calculating binding energy [2]. I t was especially useful to useisotope effects in infra-red spectral data to check that the force terms were well para-meterized. Since there are often fewer absorption and Raman bands than adjustableparameters. the ability to reproduce shifts in spectra upon deuteration confirms thepotential more than does the ability to reproduce a single data set. These improvementsresulted in better agreement between minimized and experimental protein structures.The improvements in AMMP also make it possible to perform stable molecular dyna-mics simulations on nucleic acids in the absence of special hydrogen-bonding restraints.

The ability of a ligand to fit into a binding site on a protein depends on the molecularshape and the non-bonded forces. The molecular shape is inadequately parameterized by a united atom potential where the apolar hydrogens are joined into larger unitedatoms. An ‘all-atom’ potential should be used, as shown in references [7,8]. However,the binding energy depends on the electrostatic energy, as well as the similarity in

Accuracy in Molecular Mechanics Calculations

116

Page 126: 3D QSAR in Drug Design. Vol2

Molecular Mechanics Calculations on Protein-Ligand Complexes

shape. Since electrostatics are observable on the macroscopic scale, it is critically in-portant that they be accurately calculated on the microscopic scale by including alllong-range terms [6,13]. Since the number of terms in the electrostatic energy scaleswith the square of the number of atoms in the structure. simple implementations of thiscalculation are prohibitively expensive for calculations on proteins with several thou-sand atoms. Therefore, most calculations have introduced a distance cutoff to make thecalculations faster. This cutoff is often physically inappropriate. Modern approachesincrease the speed of the calculation by changing the algorithm rather than using acutoff radius. AMMP uses a dual multipole algorithm for the efficient calculation oflong-range forces [8]. This algorithm is comparable in speed to the standard treatmentwith a 8-10 Å cutoff radius. A local multipole expansion is used for the long-rangeinteractions around each atom. interactions from atoms more distant than 6 Å areapproximated with a quadratic expansion. The energy or force from distal atoms is cal-culated from the local expansion, and the energy from the local atoms is calculated ex-plicitly. The local expansion is updated when the atoms have moved far enough. Thisalgorithm is more efficient because the local expansion is not calculated every time theenergy or force is calculated. This expansion is a solution to LaPlace’s equation and hasa convergence radius of about 1 Å. Higher-order expansions were not found to be helpful, because the limit in convergence is clue to changes in the positions of atoms at adistance rather than failure of the expansion [8].For large structures, it is more efficientto use the Fast Multipole Method to calculate the potential and the local expansion[14,9].

The model should be optimized with the potential before calculating the interactionenergy. There are many optimization strategies. Often, a high-quality structural modelwith experimentally determined water structure is available as a starting model for the

quasi-Newton methods are usually sufficient. AMMP uses conjugate gradients with thePolak-Ribiere beta and an inexact line search [8].Convergence is monitored by theChebeshev or l∞ norm on the force, rather than the quadratic or l2 norm. The false ap-pearance of convergence can occur with the quadratic norm when only a small numberof atoms are in bad positions. A randomization step is used to insure that the currentposition is not at a stationary point, but is indeed an energy minimum [9]. The ran-domization is done with a short run of molecular dynamics followed by further op-timization. When a good starting model is not available, other strategies should be used.These strategies include exhaustive searching with the Fourier Green’s function [7,15]or 4-d embedding and closely related homotopy methods [2,9,16]. Failure to produce amodel that closely approximates the correct complex will result in no agreementbetween the calculated and observed binding energies.

3.

Minimization of protein crystal structures without restrictions on atomic positions hasbeen optimized, so that the minimized structures are essentially identical to the starting crystal structures. The minimization of the protein introduces differences, probably due

Agreement with Atomic Positions in Protein Crystal Structures

117

protein-ligand complex. In this case, local optimizations like coli-jugate gradients or

Page 127: 3D QSAR in Drug Design. Vol2

Irene T. Weberand Robert W. Harrison

to limitations in the potentials and the search procedures. AMMP minimization of thecrystal structures of proteins resulted in root mean square [RMS] differences forminimized coinpared to crystal structure of 0.40 0.51Å for main chain atoms and 0.52-0.74 Å for side chain atoms [9]. These values are well within the range of 0.1 6-0.79 Å for RMS differences observed between main chain atoms in differentcrystal forms of the same protein [1 7,18] and less than the range 1.32-1.68 Å for thepositions of side chain atoms in three different crystal forms of bovine pancreatictrypsin inhibitor [17].These differences due to minimization with AMMP are within therange of experimental differences in protein crystal structures, and verify the accuracyof the potentials and minimization procedure.

4. Prediction of Differences in Free Energy of Binding for Protein-LigandComplexes

The differences in free energy for formation of various protein-ligand complexesare estimated from molecular mechanics calculations and correlated with the differencesin free energy derived from kinetic measurements.

The thermodynamic free energy = -

The calculated interaction energy is estimated from the total electrostatic and non-bonded energy between atoms in the ligand and the protein. As long as the structures ofthe protein and the ligand are not highly strained in the complex, differences in theinternal energies of the protein and the ligand can be ignored. The calculatedprotein-ligand interaction energy is proportionate to and gives good correlation with when entropy changes (AS) for formation of the different protein-ligandcomplexes are small (or similar for the compared ligands or proteins).

5. Enzyme-Inhibitor Complexes

The treatment of enzyme-inhibitor complexes is the same as for any protein-ligandcomplex that does not involve chemical changes in the protein or ligand. The choice ofstates is relatively simple. The ‘reaction’ is the formation of the enzyme-inhibitorcomplex from free inhibitor and enzyme with the displacement of the bound waters from the enzyme and inhibitor. When the inhibitors are similar in size and solubility, differences in the energy of the enzyme-inhibitor complex dominate differences in the

which is an apparent equilibrium constant. Thus, the expected free energy of binding is

An example is the application of molecular mechanics to estimate the energetics ofinhibitor binding to trypsin [7]. The energy of binding of 15 different small molecules to trypsin was estimated by molecular mechanics calculations using the Fourier Green’sfunction method [15]. The predicted binding energies agreed with measured inhibitionconstants (Fig. 1 ). The correlation coefficient between the predicted binding energy andthe free energy derived from values was 0.75 for 15 inhibitors. (The correlation

-RT

118

free energy of binding. The measured inhibition constant, is the ratio [E][I]/[EI],

Page 128: 3D QSAR in Drug Design. Vol2

Molecular Mechanics Calculations on Protein-Ligand complexes

coefficient R = is used throughout thischapter; literature values for R² have been converted to R values.)

6. Enzyme-Substrate Complexes

The choice of states for calculations on enzyme substrates is more complicated Anenzymatic reaction proceeds through several steps. The enzyme binds substrate(s), thesubstrate(s) chemically react proceeding through reaction intermediates and, finally. the product(s) dissociate. Any one of these steps could be the slow step in the reaction and dominate the kinetics. The intermediates are not necessarily transition states of thereaction (energy maxima along the reaction coordinate). It is, therefore, necessary tounderstand the reaction mechanism, and to model the key intermediate states. However,only quasi-stable states can be modelled by the molecular mechanics approximation. Comparison of the calculated binding energy with the observed kinetics in a regression analysis will extract which reaction intermediates are related t o rate-limiting steps.Predictive calculations can be made once the key intermediate states have beendetermined.

Differences in free energy can be related to enzyme kinetics by the equation =

-RT from transition state theory [19]. The velocity of the reaction can bewritten as and the ratio of the velocities for different substrates at constant[E][S] reflects the difference in the activation energies. The use of -RT for a

Fig. 1. Calculations for tyspsin with 15 different inhibitors [7].

119

Page 129: 3D QSAR in Drug Design. Vol2

Irene T. Weberand Robert W. Harrison

multi-step reaction can be justified by an extension of the Michaelis-Menten scheme, as de-scribed in Fersht [19,p. 102]. If the reaction has three steps with rate constants and

where E is enzyme, S is substrate, I is an intermediate and P is product.The Michaelis-Menten equation is:

By definition. = 1 + and = 1 + Therefore, is and= -RT + – The expression is linear i n the three relevant rate

constants, which means that these terms can be estimated independently. When the finalstep is slow and rate limiting, changes in will reflect changes in

We have studied substrate complexes with two different enzymes. human gluco- kinase [3] and HIV protease [2]. Human glucokinase catalyzes the phosphorylation ofsugar substrates. The glucokinase structure was modelled from the crystal structure ofyeast hexokinase on the basis of 30% identity in amino acid sequence. Only the firstreaction intermediate was modelled: the open conformation of glucokinase with boundsugar. The interaction energy calculated for the glucokinase-sugar complexes gave animpressive correlation coefficient of 0.99 with the kinetic data for 4 sugar substrates [3],as shown in Fig. 2. It was found to be important to include the experimentally deter-

Fig. 2. Calculations for a homology model of human glucokinase showed agreement with kinetic measure-ments for four sugar substrates [3] .

120

Page 130: 3D QSAR in Drug Design. Vol2

Molecular Mechanics Calculations on Protein-Ligand Complexes

mined water structure from yeast hexokinase in this calculation because there werewater-mediated hydrogen bonds to the substrate. This result illustrated the utility of thecalculations even for structures that are predicted by homology with known protein structures.

The calculations on HIV protease-substrate complexes will be described in moredetail to illustrate the complications involved in choosing physically valid states. HIVprotease is a dimeric aspartic protease that catalyzes the hydrolysis of peptide bonds inproteins or peptides of at least 7 residues in length [20]. The peptide hydrolysis pro-ceeds through a minimum of three steps: the binding of substrate, formation and decom-position of a tetrahedral intermediate and release of products [21,22]. The reactionintermediates were modelled from crystal structures of HIV protease with peptide-like inhibitors [2]. The protease-peptide complex and the protease complex with the tetra-hedral reaction intermediate were modelled for an octapeptide (Fig. 3). These modelsrepresent two steps in the reaction. The two catalytic aspartate residues were modelledas sharing a proton, which is an average over several discrete configurations where

HIV Protease Reaction IntermediatesFig. 3. Two reaction intermediates were modeled for HIV- 1 protease: the protease-peptide complex, and the protease complex with tetrahedral intermediate. The two catalytic aspartic residues were modeled us sharinga proton, which is an average over several deicrete configurations where different atoms of the two as-partates are protonated. This model with a central proton was demonstrated to be stable during moleculardynamics simulations [8].

121

Page 131: 3D QSAR in Drug Design. Vol2

different atoms are protonated. Since it is important to model a quasi-stable equilib-rium state, the stability of this proton was demonstrated by molecular dynamics simula-tions with peptide substrate [8]. However, similar molecular mechanics calculations byothers on HIV protease inhibitors used a model in which only one of the four possible

atoms was protonated [4]. Crystallographic water structure was included and therewere no restraints on protein atoms, in contrast to related calculations by other groups[4,5]. The tetrahedral intermediate was modelled as a gem-diol amine where the watermolecule which is seen between the flaps and the inhibitor in most crystal structures ofHIV protease was incorporated into the diol. Partial charges were assigned to the gem-diol amine using the electronegativity equalization scheme in AMMP, and the totalcharge for the gem-diol amine was set to +1.0. The total number of atoms and totalcharge were conserved in this model. The calculated interaction energies were com-pared with kinetic measurements for peptide substrates with single amino acid substitu-tions at positions P4 to P3´, where the scissile peptide bond is between P1 and P1´ (Fig.4). The interaction energy for the HIV protease-peptide complexes had n o significantcorrelation with the free energy derived from kinetic data. However, the interactionenergy for the complexes with the tetrahedral intermediate gave significant correlationwith kinetic data. The correlation coefficient was 0.93 for 8 substrates with changes inresidues P1 and P1' next to the scissile bond, 0.86 for 14 substrates with changes inresidues P2-P2´ and 0.64 for all 21 substrates with changes in P4-P3´ positions. Thesecorrelations are significant at the 0.995-0.9995 confidence level by Student’s T test,despite the absence of corrections for conformational entropy or for the effects ofsolvent. The higher agreement for substitutions of substrate positions P2-P2´ is proba-bly because they lie within the protease and their conformations are more restricted. Thelower correlation for more distal residues probably arises from their greater conforma-tional variation and exposure to solvent on the protein surface.

The results for models of HIV protease with reaction intermediates can be comparedto the results of other calculations on HIV protease-inhibitor complexes. Correlationcoefficients of 0.66 to 0.71 were obtained in a recent study of three-dimensional quan-titative structure-activity relationship (3D QSAR) of HIV protease inhibitors [23]. Inthese and other QSAR calculations, a 3D pharmacophore map is defined using molecu-lar properties and activities for a series of ligands. The weights for each individual mol-ecular property are empirically adjusted to maximize the correlation of predicted andobserved binding constants. Head et al. [5] have developed a hybrid method using mol-ecular mechanics minimization with the AMBER all-atom force field [10]to calculatethe enthalpy of binding and heuristics to estimate the entropy of binding. Only theinhibitor atoms and the HIV protease atoms within 8 Å of the inhibitor were minimizedin these calculations, which resulted in a predictive correlation coefficient of 0.755 for13 inhibitors. Holloway et al. [4] have observed high correlation coefficients of0.76-0.885 between the interaction energy calculated for different inhibitors of HIVprotease using the OPTIMOL potential and the measured enzyme inhibition. In theircalculations, the protease atoms were static and the inhibitor was minimized. while onlyone aspartate atom was protonated. Our molecular mechanics calculations gavesimilar agreement with the observed energy from kinetic parameters without applying

122

Irene T. Weber and Robert W. Harrison

Page 132: 3D QSAR in Drug Design. Vol2

Molec ular Mechanics Calculations on Protein–Ligand Complexes

HIV Protease-Peptide Substrate

Interaction Energy (kcal/mol) Fig. 4. Calculationsfor HIV protease with 21 peptide substrates [2]. (a)protease-peptide models: there wasno significant correlation The 14 peptide model with cha sile peptide bond are indicated by solid circles; the seven

nges in substrate positions P2-P1 closeto tmodels with changes in substrate positions

plotted for the 14 subtrates with changes in positions P2-P2´ which give a correlation coefficient of 0.86; then models with changes in positions P4, P3 and P3´are indicated by open circles.

123

and P3´ are indicated by open circles. (b) Protease-tetrahedral intermediate models: the regression line is

Page 133: 3D QSAR in Drug Design. Vol2

any empirical corrections to the potential set or restrictions on atomic positions (seeFigs. 1,2 and 4]. Therefore, significant correlation has been obtained between cal-culated and measured binding energies using a variety of potentials and different mini-mization procedures. These results show the success of applying molecular mechanics calculations on a single configuration to estimate relative binding energies for a series of protein-ligand complexes.

7.

The interaction energies calculated from molecular mechanics are usually much largerin magnitude than the observed free energy differences. This is the result of using asingle point estimate for a thermodynamic average. Molecular mechanics and dynamicscalculations are a simplistic classical approximation to an otherwise unsolvablequantum problem, and even a long 1000 ps) molecular dynamics simulation is an(>

over seconds to hours of time. Molecular mechanics calculations provide a point esti-mate of the internal energy for a specific molecular conformation. The free energy isdetermined by the partition function which is an ensemble average of these point estimates.

The molecular mechanics approximation of using a conformation near an energyminimum will be most accurate when one conformation dominates the partition func-tion. Then the effect of using one point in the ensemble can be treated analytically [2] .The consequence is that molecular mechanics calculations have an apparent ideal gasconstant which is different from R . It follows that the slope for the line of best cor-relation between and the calculated interaction energy is not equal to RT,asseen in Figs. 2 and 4. The measured differences in free energy will be lower than thedifferences in calculated interaction energy. Moreover, the slope gives an estimate ofthe average entropy of binding and is expected to vary for different protein-ligandsystems.

In practical terms, the differences in the thermodynamic internal energy are cal-culated from the expected differences in the Hamiltonian or molecular internal energy(AH) at the energy minimum, or the average of differences over a molecular dynamicsrun. The calculated differences are correlated with observed differences. The correlationshould be highest when the estimate for the internal energy is most accurate. Since the Helmoltz free energy d rence = – the correlation between and

will be best when there are only small variations in the entropy difference AS. Inother words, when the correlation is high, the thermodynamic internal energy termsdominate the free energy and the predictive power will be high. In contrast, when thecorrelation is low, the entropic terms dominate the free energy and the predictive abilitywill be low. Calculations on HIV protease substrates resulted in correlation coefficientsthat varied from 0.93 for substitutions adjacent to the scissile bond to 0.64 for distalsubstitutions of P4, P3 or P3´ [2]. This variation is consistent with distal substrateresidues occurring in several conformations in the protease complex since they arepartly exposed to solvent. Therefore, the effects of conformational entropy will be

124

Irene T. Weber and Robert W . Harrison

Statistical Mechanics Interpretation of the Molecular Mechanics Energy

insufficient statistical sample with respect to the behavior of 1012 or more molecules

Page 134: 3D QSAR in Drug Design. Vol2

Molecular Mechanics Calculations onProtein-Ligand Complexes

relatively large, resulting in lower correlation. The entropic contribution is reflected inthe degree of correlation of the calculated interaction energies and the observeddifferences in free energy for protein–ligand binding or enzyme–substrate catalysis.

8. Future Directions

Fundamental improvements in the computer algorithms and the integration of thermo-dynamic potentials with molecular potentials are required for uniformly reliable pre-dictions. There are two basic problems with the current methodology. Firstly, whilemolecular mechanics potentials are good at predicting local molecular geometry, theirapplication in a single molecular configuration results in a systematic mis-estimation ofthermodynamic energies. This thermodynamic ‘sampling error’ can have insidiouseffects when thermodynamic observations like free energies of binding are used to para-meterize molecular potentials. When the single configuration is representative of a uni-modal distribution, the estimates can be rigorously proven to be overestimates of freeenergy differences [2]. These overestimates can be corrected by a simple scaling whenthey are not too large. Molecular dynamics methods like free energy integration [24]can be used to estimate the distribution of configurations and thus improve the es-timation of the free energy. However, these methods are coinputationally expensive andultimately suffer from the same flaw of sampling errors because the accessible phase space volume is limited.

Secondly, the solvent is often neglected or poorly treated in the calculations. Errors inthe treatment of solvent lead to poor predictions for hydrophilic groups because thecompetition between the 55 molar water and the millimolar (or less) ligand is neglected.The conceptually simplest approach to solvent corrections is to use a discrete watermodel. It is important to include crystallographically determined waters in the cal-culations, especially since these water molecules may be critical for binding ligand orfor stabilizing the protein structure. However, the crystal structure will not include bulkwater. The problem for bulk water is that there are many configurations of water whichenter into the estimate. Another common approach is to use a continuum model like aconstant dielectric. The continuum model is physically incorrect near a protein surface,because near the surface the molecular nature of water dominates. An alternative is touse a screening potential. Screening potentials are a hybrid between a discrete model, inthat they use a molecular representation, and continuum models, in that they reproducethe asymptotic behavior of the continuum solution.

The classic analytic screening potential i s due to Debye (for a description of its rolein thermodynamic models see [25]). The presence of mobile counterions and dipoles istreated by introducing a screening factor to the electrostatic terms. A factor of the form

radius. The correlation radius is a function of the ionic strength. but it can also betreated as an adjustable parameter. Debye screening has been implemented in AMMPwith a constant correlation radius. In preliminary tests, this approximation improved thecorrelations in energy and structure over the use of no screening [26]. However, a singlecorrelation radius is physically unsatisfactory. Different parts of the protein will have

125

e xp (r/rc) is used where r is the radius between charged group and rc is the correlation

Page 135: 3D QSAR in Drug Design. Vol2

Irene T. Weber and Robert W. Harrison

different exposures to solvent and therefore require different correlation radii. There-fore, multiple correlation radii are probably needed for a physically correct treatment ofsolvent.

References

1 .

2.

3 .

Briggs, J.M., Marrone, T.J. and McCammon, J.A., Computational science new horizons and relevance to pharmaceutical design, Trends Cardiovasc. Med., 6 (1 996) 198-204.Weber, I.T., and Harrison. R.W., molecular mechanics calculations on HIV-1 protease with peptidesubstrates correlate with experimental data, Protein Eng., 9 (1996) 679–690.Xu, L.Z., Weber, I.T., Harrison, R.W., Gidh-Jain. M. and Pilkis, S.J., Sugar specificity of human ß-cell

6083-6092.Holloway, M.K., Wai, J.M., IIalgren, T.A., Fitzgerald. P.M., Vacca, J.P., Dorsey, B.D., Levin, R.B.,Thompson, W.J., Chen, L.J ., deSolms. S.J., Gaffin, N., Ghosh, A.K., Giuliani, E.A, Graham. S.L.,Guare, J.P., Hungate. R.W., Lyle, T.A., Sanders, W.M., Tucker. T.J., Wiggins, M., Wiscount, C.M., Woltersdorf, O.W., Young, S.D., Darke, P.L. and Zugay, J.A. A priori prediction of activity for HIV-1

Head, R.D., Smythe. M.L., Oprea, TI., Waller, C.L., Green, S.M. and Marshall. G.R., VALIDATE:A new method for the receptor-based prediction of binding affinities of novel Iigands, J. Am. Chem.Soc. , 118 (1996) 3959–3969. Harrison, R.W., Stiffness and energy conservation in the molecular dynamics: An improved integrator,J. Comp. Chem., 14 (1993) 1112–1122.Kourinov, I. and Harrison, R.W., Prediction of novel serine proteinase inhibutors, Nature Struc. Biol., 1 (1994)735-743.Harrison, R.W., and Weber, I.T., Molecular dynamics simulation of HIV-1 protease with peptide substrate, Protein Eng., 7 (1994) 1353-1363.Harrison R.W., Chatterjee, D. and Weber. I.T. Analysis of six protein structures predicted by com-parative modeling techniques,Proteins: Struct. Funct. Genet., 23 (1995) 463–471.

proteins and nucleic acids, J. Comput. Chem., 7 (1986) 230–252.

J.Am.Chem.Soc., 111 (1989) 8551–8565.Rappe, A.K., Casewit, C.J., Colwell, K.S., Goddard, W.A., III, and Skiff. W.M., UFF: A full periodic table force field for moleculars mechanics and molecular dynamics simulations J. Am, Chem. Soc.,

Schreiber, H. and Steinhauser, O., Cutoff size does strongly influence molecular dynamics results on

Greengard, L. and Rokhlin, V., A fast algorithm for particle simulations, J . Comp. Phys., 73 (1987)325–348.Harrison. R.W., Kourinov, T.V. and Andrews. L.C. The Fourier Green’s function and the rapid evalu-

Zeng, C.B., Aleshin, A.E., Hardie, J.B., Harrison, R.W. and Fromm, H.J., ATP-binding s i t e of human

35 (1996) 13157-13164.Wlodawer, A., Nachman, J., Gilliland, G.L., Gallagher. W. and Woodward, C., Sturcture of form III

, J . Mol. Biol., 198 (1987) 469–480.

complexed with 3´CMP and d(CpA): Active site conformation and conserved water moleculars. ProteinScience. 3 (1994) 2322-2339.Fersht, A., Enzyme structure and mecahnism W.H. Freeman and Company. New York, 1985.

4.

5 .

6 .

7.

8.

9.

10.

11.

12.

114 (1992) 10024–10035.13.

14.

15.

16.

17.

18.

19.

126

glucokinase: Correlation of molecular models with kinetic measurements. Biochemistry. 34 (1995)

protease inhibitors employing energy minimization in the active site, J. Med. Chem., 38 (1 995) 305–317.

Weiner, S.J., Kollman, P.A., Nguyen. D.T. and Case. D.A., An all atom force fieId for simulation of

Allinger, N.L., Yuh, Y.H. and Lii, J.-H., Molecular mechanics: The MM3 force field for hydrocarbons,

slovated polypeptides, Biochemistry. 31 (1992) 5856–5860.

ation of molecular potentials, Protein Eng., 7 (1994) 359–369.

brain hexokinase as studied by molecular modeling and site-directed mutagenesis, Biochemist,

crystals of bovine pancreatic trypsin inhibitorZegera, I . , Maes, D., Dao-Thi,M.-H., Poortmans, F., Palmer, R . and Wyns, L. The structures of R Nase A

Page 136: 3D QSAR in Drug Design. Vol2

Molecular Mechanics Calculations on Protein-Ligand Complexes

20. Darke, P.L., Nutt, R.F., Brady, S.F., Garsky, V.M., Ciccarone, T.M., Leu, C.T., Lumma, P.K.,Freidinger, R.M., Vebal, D.F. and Sigal, I.S., HIV-1 protease specificity of peptide cleavage is sufficient

21. Polgar, L., Szeltner, Z. and Boros, I., Substrate-dependent mechanisms in the catalysis of humanimmunodeficiency virus protease, Biochemistry, 33 (1994) 9351–9357.

rate studies and solvent kinetic isotope effects to elucidate details of chemical mechanisms, Biochemistry, 30 (1991) 8454-8463.

23. Oprea, T.I., Waller, C.L. and Marshall, G.R., Three-dimensional quantitative structure-activity relation- ship of human immunodeficiency virus ( I ) protease inhibitors: 2. Predictive power using limitedexploration of alternate binding modes, J. Med. Chem., 37 (1994) 2206-2215.

24. Beveridge, D.L. and DiCapua, F.M., Free energy via molecular simulation: applications to chemicaland biomolecular systems, Ann. Rev. Biophys. Biophys. Chem., 18 (1989) 431–492.

25. Goldenfeld, N., Lectures on phase transitions and the renormalization group, Vol. 85, Frontiers inPhysics, Addison-Wesley, Reading, MA, 1992.

26. Reed, C., Fu, Z.-Q., Wu, J., Xue, Y.-N.,Harrison, R.W., Chen, M.-J. and Weber, I.T. Crystal structureof TNF-α mutant R31D with greater affinity for receptor R1 compared to R2, Protein Eng. 10 (1997)101-107.

127

for processing of gag and polpolyproteins, Biochem. Biophys. Res. Commun., 156 (1988) 297–303.

22. Hyland, L.J., Tomaszek, T.A. and Meek, T.D.,Human immunodeficiency virus-1 protease: 2. Use of pH

Page 137: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 138: 3D QSAR in Drug Design. Vol2

Part II

Quantum Chemical Models and Molecular Dynamics

Simulations

Page 139: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 140: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

Bernd Beck and Timothy ClarkComputer Chemie Centrum des Institus für Organische Chemie I der Friedrich-Alexander-

Universität Erlangen/Nürnberg Niigelsbachstrasse 25, D-91056 Erlangen, Germany

1. Introduction

Semiempirical molecular orbital (MO) methods were originally developed as tools forbench organic chemists to investigate typical physical organic problems on molecules of about 50 atoms or less. There has, however, been such a rapid development in theefficiency and accuracy of semiempirical methods in recent years that semiempirical MOcalculations are now beginning to intrude into what used to be the realm of pure fore-field techniques. The present review is intended to cover the new techniques that areemerging for very large molecules and to describe some significant recent applications.

What are the advantages of approximate MO techniques over force fields, whichusually reproduce experimental geometrics and energies considerably more accurately?The ability of quantum mechanical techniques to describe bond making and breakingprocesses may be critical for mechanistic studies, but generally the detailed and accurateanistropic electron density distribution around atoms provide the major advantage ofusing MO techniques. The performance of the semiempirical methods is naturally criti-cal, MNDO, for instance, was unable to describe hydrogen bonds. making it completelyunsuitable for biological applications, AM1 and PM3 are better in this respect. but stillhave critical problems describing the geometry and the rotation barrier about peptideC-N bonds.

Another major advantage of semiempirical MO techniques is their speed and versatil-ity. They have been used as platforms for solvent simulations, quantum mechanical/molecular mechanical (QM/MM) mixed methods and a variety of other applications.There is still a tendency to neglect the power of such methods for biological applica-tions, and so we present an overview of some of the new literature and novel applicationsof these techniques.

This chapter is organized as follows: in section 2, we deal with the pure semiempiri-cal methods, In section 3, we describe the QM/MM methods, the differences betweenthe most commonly used methods and their abilities. In section 4, we give a short intro-duction into quantum-based electrostatics, and in sections 5 describe some possibleapplications of the QM/MM approach.

2. Pure Semiempirical Methods

As mentioned above, the major task of all methods presented here is to reduce thenumber of necessary operations during the SCF calculation in order to allow calcula-tions on large molecular systems. In the following section, we describe severalapproaches developed during the last few years that allow the calculation of large mole-cular systems within the semiempirical approximation.

2. 131–159.©1998 Kluwer Academic Publishers. Printed in Great Britain.H Kubinyi et al. (eds.), 3D QSAR in Drug Design. Volume

Page 141: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

2.1 . Strictly localized MOs

One of the first methods was published by Rivail and co-workers in 1992 [1]. As a firstapproximation, the electronic wave function is built up from strictly localized molecularorbitals (SLMO) [2]. These are contributions of atomic hybrid orbitals (HYO). Themost common SLMO involves two atomic centers and describes the bond betweenthese atoms. One-center SLMOs are used to describe lone pairs and multi-centreSLMOs for extended -systems. In order to improve the quality of the description for the region of interest, subsystems are defined. The subsystems are then optimized in thefield of the environment. which is represented by the SLMOs.

This simple approach has some major disadvantages. The computational effort forretting up the SLMOs of the environment i s still very high. The SLMOs are unable todeal with charge transfer and, therefore, the quality of the results depends crucially onthe definition and the size of the subsystems.

2.2. MOZYME

An improved method, implemented in the MOZYME program, was developed byStewart [ 3 ] in 1994 and is also based on localized molecular orbitals (LMOs). TheLMO theory is an alternative way to generate MOs that correspond to the electronicstructures in Lewis molecular structures. Therefore. calculations involving LMOs canbe limited to that region of space in which the LMO exists. To obtain a self-consistentfield (SCF), however, the LMOs are allowed to expand, Even so, for large systems suchas proteins, the size of an LMO will be small compared with the size of the entiresystem.

For small systems, every LMO involves all the atoms of the system. In large mole-cules, however, calculations involving LMOs are much more efficient. All occupied-virtual interactions involving LMOs separated by large distances will automatically bezero. Obviously the larger the system, the more important this becomes.

The calculation of the density matrix can be limited to those matrix elements that arerepresented by an LMO. Therefore, the computational effort for this step only dependson the number of LMOs — i.e. it increases linearly with the size of the system. Forthe calculation of the energies of LMOs and occupied–virtual interactions, the com-putational effort is independent of the size of the entire system, since the size of anLMO depends only on the local electronic structure.

On the other hand, the long-range electrostatic effect is a noli-local phenomenon for which the computational effort rises with N².However, the calculation of this effect i s

by far the simplest and. therefore. the computational costs for the SCF calculation usingLMOs rises nearly linearly with the size of the system (N~1) in contrast to N³ for theconventional MO-SCF procedure. The starting set of LMOs must satisfy several re-quirements: they should form an orthonormal set. There must be one LMO for everyoccupied and every unoccupied MO in the system. Ideally, they should involve one orat the most two atoms.

Conventional Lewis diagrams of molecules provide a good starting point for con-structing the initial set of LMOs. For each atom with a basis set of one s and three p

132

Page 142: 3D QSAR in Drug Design. Vol2

atomic orbitals, a set of four hybrid orbitals is constructed (one for each atom to whichthe atom is bonded or for its lone pairs and p-orbitals for π-bonds).

Stewart also uses a distance cutoff for interactions, in order to reduce memory re-quirements and because many of the integrals are not necessary within the LMO ap-proach. Therefore, only those one-electron integrals were calculated which representinteractions at less than a given distance (normally 6– 8 Å).At distances greater thanabout 7 Å the calculation of the two-center two-electron integrals is modified. Out ofthe 100 integrals needed for small distances, only seven remain, mainly because thequadrupolar and higher multipolar terms are ignored. At distances larger than 30 Å,simple Coulomb repulsion is used.

The method was tested for various polypeptides, up to 264 residues, and it could beshown that for larger systems, single SCF calculations using the LMO approach are upto 160 times faster than conventional SCF for the largest system. For systems with lessthan 50 atoms, the conventional methods are still more efficient.

The range of the molecules that can be studied using the LMO method is limited tothose systems that can be represented as non-radical Lewis structures. These structuresare essential in order to be able to build up the initial LMOs. The test calculations haveshown that the time dependency is still nearer N1.5 than N¹ . Optimization of largesystems is still impractical because of the heavy memory demand. Even so, MOZYMEis a great step forward in the quantum chemical treatment of large molecular systems.However, more tests and some additional modifications seem to be necessary for theLMO method to become a useful tool.

2.3. Divide and conquer

An alternative and also very promising approach is the so-called density matrix divideand conquer (D&C) method developed by W. Yang and co-workers [4,5] for densityfunctional theory and recently implemented into MOPAC [6]. Dixon and Merz [7] haverecently published another linear scaling semiempirical method based on the sameprinciples.

The basic concept of the D&C approach is to divide a large molecular system into aset of relatively small subsystems. An approximate total electron density matrix is thenbuilt up from the contributions of the subsystems. The linear scaling of the method is aresult of the fact that matrix diagonalization is not required on the global Fock matrix,but rather for a set of smaller subsystems. Semiempirical methods benefit from theD&C approach unless the system becomes very large, then the quadratic expense of cal-culation of the two-center integrals will begin to dominate. True linear scaling for large molecular systems can be achieved by the use o f finite cutoffs for diatomic interactionsor perhaps by employing the continuous fast multipole method [8–11].

As shown in Fig. 1, so-called buffer regions were defined at each end of the subsys-tems in order to reduce truncation effects. There is also an overlap between adjacentsubsystems to facilitate the propagation of electronic effects throughout the molecule.The Roothaan-Hall equations for the subsystems were solved including the atoms in thebuffer region, and the density matrix then constructed for a given subsystem excludingthe buffer atoms.

133

Some Biological Applications of Simiempirical MO Theory

Page 143: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

Fig. 1. Divide and conquer subsetting scheme

The D&C–SCF method can be summarized as follows. Starting from an initial guess of the global density matrix, the subsystem Fock matrices are assembled. The Roothaan-Hall equations for each subsystem are solved and their density matrices con- structed. Finally, all subsy-stem density matrices are combined in an appropriate fashion to arrive at a new global density matrix. The results obtained from test calculations on molecules up to about 10 000 atoms [6] clearly show the linear scaling of the D&C method for the cncrgy calculation. For the gradient calculation, a quadratic dependency (N²) was observed. Starting at about 250 atoms, the D&C approach is faster than the normal SCF scheme. Yang and co-workers also include solvation into their method by using the COSMO model (conductor-like dielectric continuum) [12–16]. On the other hand, it is still not possible to optimize a macromolecule fully in a reasonable time. There are problems using the cutoff [7] and the results are strongly dependent on the size of the subsystems used.

The LMO and D&C approaches are both steps forward that allow the calculations of large molecular systems like enzymes or proteins up to about 10 000 atoms within the semiempirical approximation. Despite the fact that it is still impracticable to perform full geometry optimization on these systems. it is possible to calculate Mulliken charges, molecular dipole moments and some other molecular properties. More research in this area is necessary to solve some of the problems mentioned above and to allow these methods to become more applicable in common use.

In contrast to the pure semiempirical methods. developed in the last few years. QMiMM approaches have been developed over the last two decades by combining dif- ferent levels of QM methods with force fields. A survey of semiempirical QM/MM willbe given in the next section.

3. QM/MM Methods

As mentioned before, QM/MM methods have been developed continuously over the last two decades. In 1976, Warshel and Lcvitt [17] prcscnted the first QM/MM approach.Since then. several hybrid QMiMM models have been developed, combining semi- empirical [18–24] density Functional [25], valence bond [26,27] or ab initio Hartree-Fock [28] methology with frequently used force fields like MM3 [29] AMBER [30] or CHARMm [31]. In the meantime. these methods have become well established; for example, for the investigation of solvation phenomena [32–46] or for biochemical problems [47–51].

134

Page 144: 3D QSAR in Drug Design. Vol2

Some Biological Appplications of Semiempirical MO Theory

Fig. 2. Partitioning of a molecular system into QM and MM part.

The basic idea of all QM/MM methods is to treat that part of the molecule whichundergoes the most important electronic changes quantum-mechanically, whereas therest of the molecule is treated by molecular mechanics.

The methods described in the following differ in the way they include the interactionbetween the QM and MM parts, by the usage of link atoms or the use of an additional‘boundary region’ around the molecular system to count for environmental effects,Therefore, the following section is divided into two parts. The first deals with thoseQM/MM methods that use so-called ‘link atoms’, and the second describes those whichdo not use them.

3.1. Methods with link atoms

Many research groups have published hybrid quantum mechanical/molecular mechan-ical approaches in which they link atoms to connect the QM with the MM parts of themolecule. See. for example, Singh and Kollman [28] or Merz and co-workers [25].Anapproach very often used is that of Field. Bash and Karplus, published in 1990 [18],which will be described in more detail. The authors’ aim was to implement a generallyapplicable method, with which a large molecular system can be studied with geometryoptimization, molecular dynamics and Monte Carlo methods. Therefore, they combinedthe MM program CHARMm [31]with the AM1 and MNDO semiempirical molecularorbital procedures. The molecular tem to be studied is divided into two parts. asshown in Fig. 2. As mentioned above, an additional boundary region was included toaccount for the surroundings that are neglected. The Hamiltonian for the entire systemis given by

and the energy by

The QM and MM terms are treated as usual. A more detailed description is given forthe QM/MM and the boundary terms.

135

Page 145: 3D QSAR in Drug Design. Vol2

Bernd Beck andTimothy Clark

The QM/MM Hamiltonian ^HQM/MM consists of the interactions of the QM and MMparts represented by atomic charges and van der Waals parameters. The interaction isgiven by the electrostatic and van der Waals interactions and the polarization of the QMpart by the atomic charges of the environment.

For the boundary term. two methods are possible: the periodic boundary [52] and thestochastic boundary approaches [ 53,54]. The only complication within the periodicboundary approach is that the images of the central box contain a copy of the QM atomsand that their charge distribution changes during the calculation. This problem can beavoided by choosing the periodic cells such that the QM images are far enough apart tobe unimportant for the interaction energy. Another important aspect are the so-called‘link atoms’, mentioned earlier. This type of atom is necessary if, for example. in anenzyme reaction sonic residues have to be treated quantum-mechanically, while the restof the protein does not. In this case, there are bonds between the QM and MM parts.The ‘link atoms’ are used to terminate the QM electron density along these bonds. Different schemes have been proposed [55,56]. The authors treated them exactly as QM hydrogens and they are invisible to the MM atoms, because no interactions are calcu-lated. The procedure employed in the version described is:

1.2. Define the ‘link atoms’.3.

4.

5 .6 .7 .

The test calculations performed clearly show that the partitioning can have significant

break π-bonds or bonds in which conjugation effects will be important. It is also impor-tant to note that the inclusion of the ‘link atoms energy’ means that it is only possible tocompare the energies between systems with the same number and types of ‘link atoms’.The authors recommend that it is necessary that a number of different partitioning ap-proaches should be tested for each system in order to determine the effects on theresults. This method also neglects the polarization of the MM part by the QM part.

Recently, Morokuma and Mascras published a new integrated QM/MM optimizationscheme for equilibrium structures and transition states [57]. In order to describe the so-called IMOMM method. we will use an example from the original paper. The ‘real’system is metal complex M(P(CH3)3)2, and the ‘model’ system in the QM calculation isM(PH3)3. Within IMOMM the atoms of the molecule are divided into four different

and the ’real’ MM part are present. The second subset includes those atoms which areonly present in the QM calculation, but substituted in the MM calculations by real (dif-ferent) atoms. In other methods, this set is referred to as ‘junction dummy atoms’ [28]

Partition a molecule into QM and MM parts.

Delete connectivities among the QM atoms. All MM internal coordinate energyterms that involve QM atoms are deleted.Create the non-bond list. The MM non-bond list is generated as usual; there aretwo QM/MM lists: one for the vdW, and one for the electrostatic interactions.Calculate the MM energies and forces.Calculate the QM/MM vdW interactions using the vdW list.Calculate the QM and QM/MM electrostatic interactions and forces.

136

effects on the results obtained. For the definition of the ‘link atoms’, one should not

sets, as shown in Fig.3. In set 1,the atoms which appear in both the ‘model’ QM part

Page 146: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

or the ‘link atoms’ [18]. Set 3 atoms are only present in the MM calculation, but each ofthem corresponds to an atom in set 2 (generally hydrogens) in the QM calculation.Finally, set 4 contains those atoms which are only present in the MM calculation.

For each pair of atoms in sets 2 and 3, the same definition of internal coordinates waschosen. During the optimization, the coordinates of the atoms in set 3 are a function ofthose in sets 1 and 2. This scheme differs sharply from the approach described above,where the coordinates in set 2 (‘link atoms’) optimize independently from those in theMM part.

The equations for energies and gradients for the combined QM/MM description are afunction of those of the independent QM and MM descriptions. In order to avoid doublecounting of contributions already treated in the QM part, these terms were selectivelydeleted in the MM calculation.

The computational algorithm starts with the input of the 4 sets of atoms. The QM cal-culations are then done, followed by the MM calculations, or vice versa. In the nextstep, the energy and gradients are calculated, followed by a check for convergence. Ifconvergence has not been achieved, new coordinate values for sets 1 and 2 atoms are generated and the calculation starts again.

This method has been designed to perform a QM geometry optimization within anMM environment. Therefore, it is possible to use ab initio, DFT or semiempirical MOmethods with several different force fields. Until now, results have only been publishedfor test calculations on small systems. The major limitation of this method is the factthat the QM/MM interactions are not treated explicitly.

The last method using link atoms that we will describe is that of Thiel andBakowies [23], published i n 1996. The authors present a hierarchy of QM/MMapproaches. The so-called model A only represents a mechanical embedding of thequantum mechanical region. Model B includes several interactions between the differ-ent parts and also the polarization of the QM part by the MM environment. The most

137

Fig. 3. Partitioning of t he ‘model’and the ‘real’ system within IMOMM.

Page 147: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

refined model C also includes the polarization of the MM part by the QM part. In con-trast to earlier methods, models B and C include an explicit MM correction for interac-tions involving link atoms. The formal disadvantage of this is the fact that this methodcannot be used in the MD simulations.

Model B includes almost the same treatment for the QM/MM interaction, as de-scribed by Field, Bash and Karplus [18]. As a refinement of earlier QM/MM methods,the authors have adopted parameterized models to calculate QM electrostatic potentialsand MM partial charges. These have been calibrated against RHF/6-3lG* referencedata [58]. In the majority of cases, this model successfully reproduces experimentaldata. Problems may arise if strongly charged MM atoms arc close to the QM/MMboundary. In this case, the electrostatic interaction tends to be overestimated.

Model C includes the polarization of the MM fragment in order to remove the asym-metry for the description of non-bonded QM/MM interactions (‘normal’ QM/MM onlyincludes polarization of the QM part). It adopts the treatment of Thole [50] to describeinduced dipole interactions. This model only needs one parameter per element, theisotropic atomic polarizability. This method currently provides one of the most ad-vanced treatments of polarization within semiempirical QM/MM approaches. The con-sideration of MM polarization seems to be crucial in applications involving chargedQM parts, which generate large electrostatic fields.

The MNDO/MM computer program uses either the MNDO or the AM1 wave func-tions and the MM3 force field. The method was tested for various smaller organic mole- cules up to about 40 atoms in order to be able to compare the results with pure QMmethods. For most test calculations, models B and C provide sufficiently reliableresults.

3.2. The SLMO technique

One method which allow bonds to be shared by the QM and MM part and does not use ‘link atoms’ is the local self-consistent field method of Rivail and co-workers [20]. Thismethod is a further development of the strictly localized molecular orbital (SLMO) ap-proach [1],already described in section 2 on pure semiempirical methods. However, theenvironment is now treated by mechanics instead of SLMOs.

For an atom pair at the boundary between the two parts, the atom belonging to theQM part is called the frontier atom (X). The atomic orbitals of X are transformed suchthat hybrid orbitals are obtained which are colinear with the bonds of this atom. The hybrid orbitals belonging to bonds between the QM and MM part are excluded from theorbital basis (Fig. 4). The associated electron densities are treated as external pointcharges on the cationic QM fragment. Again, this approach includes electrostatic andvdW interactions, as well as the perturbation of the Fock matrix (polarization of the QMpart). MM polarization is not implemented.

This method suffers from the fact that the electron density of the excluded orbital isnot known unless a QM calculation has been performed for the entire system.Additional problems may occur in the calculation of the electrostatic QM/MM interac-tions, because the total charge of a neutral QM subsystem is not forced to be zero.

138

Page 148: 3D QSAR in Drug Design. Vol2

field based on the method described above for modelling very large molecules [60].

3.3. Intermolecular approaches

The following two methods assume that there arc no bonds between the QM and theMM atoms. Therefore, they do not need ‘link atoms' or related approaches. The firstmethod discussed was the so-called PCM (Point Charge Model) method developed inour research group, which was originally designed to calculate heats of adsorption and reaction pathways in zeolites [24]. In PCM, the environment is fixed and the atoms aretreated as point charges with assigned van der Waals radii. The interactions between theenvironment and the QM part were considered to consist mainly of three contributions.The first of these is the Coulomb interaction between the point charges and the QMatoms including the electronic and core contributions. Secondly, the van der Waals in-teraction was calculated using a Lennard-Jones [6,12] potential, whereas the potentialparameters were taken from the Universal Force Field (UFF) developed by Rappé andco-workers [61].Finally, the perturbation of the Fock matrix (polarization of the mole-cule within the environment) was also considered. In order to show how important it isto include the polarization into the QM/MM interactions, Fig. 5 (see p. 145) shows anexample of an inhibitor optimized within the binding site of an enzyme. The result of agas phase calculation was used as reference.

The induced dipole moment is 5.48 Debye. Test calculations have shown that theVAMP-PCM approach can be used successfully for biochemical problems. (Exampleswill be discussed later.) There are two problems to be solved: the flexibility of the envi-ron ment, and the fact that the PCM approach neglects the MM polarization, in contrastto the QM/MMpol method published by Thompson and co-workers in 1995 [22] .

The QM/MMpol approach couples the QM region (for both ground and excited states) with a polarizable MM description, according to the approach described in theearlier work of Luzhkov and Warshel [62,03]. The first step was to define a system con-sisting of the QM and MM components. The QM part includes the electrons (ψ) and the

139

Some Biological applications of Semiempirical M O Theory

Fig. 4. Description of system partitioning in this hybrid method.

Recently, Rivail and co-workers have published a hybrid classical quantum force

Page 149: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

nuclei (Za) or effective cores, whereas the MM compounds consist of charged atomiccentres (qm) with atom centered polarizable point dipoles (µm) [64–66]. Figure 6 showsthe interactions between each of the compounds.

Interaction in the QM part are included into the operator. The operators and consists of interactions 4 and interaction 5 and 6, respectively.

(interactions 7 and 8) and (interactions 9 and 10) are the import- ant terms. In addition to these operators, there are also those describing the van derWaals interaction containing bond terms for the MM part. The Hamiltonian for the entire system is obtained by

During the QM system SCF, the MM region is included as a static external potential consisting of fixed point charges arid fixed atom-centered polarizable dipoles. The MM part SCF is done by a fixed QM state. A single iteration of the entire system consists of a full iteration of both the QM-SCF and MM–SCF.

The method was implemented into the semiempirical INDO/S approximation but it is also applicable to other semiempirical and ab initio Hamiltonians. As a test case, QM/MMpol was applied to the analysis of the ground and excited states of the bac- teriochlorophyll S dimer (P) of the photosynthetic reaction center (RC) of Rhodo-pseudomonas viridis. During the calculations. the system consisted of 325 QM atomsembedded in 20 158 polarizable MM atoms. The results obtained arc in reason- able agreement with experiment. The explicit values could be found in the original publication [22]. Recently. the method has been extended in order to use it for MD simulations [67].

Basic components needed for the evaluation of electrostatic interaction energies in combined quantum-mechanical and molecular-mechanical approaches are the electro- static potential and the partial charges. In section 4, we will give a short summary of published approaches which deal with the calculation of these properties.

Fig. 6. Schematic view on the into-actions in the QM/MMpol method.

140

and

Page 150: 3D QSAR in Drug Design. Vol2

4. Quantum-based Electrostatics

There are several possibilities to describe the charge distribution within a molecule.Atom-centered charges are commonly used (the one-center approach) for this purpose.Only a few methods using multi-center approaches have been published.

4.1. One-center approaches

the fact that partial charges are not observable quantities. Thus, any possible definitionis arbitrary. The simplest method to derive atomic charges is to use the electronegativityof the atoms [68,69].This technique is often used for MM methods. However, conjuga-tion and polarizability effects suggest that quantum-mechanical charges are more reli-able in more complex systems. The simplest approximations are the so-called Coulson[70] and Mulliken charges [71]. They are derived directly from the density matrix. Amore useful approach to describing the electronic distribution in molecules may be tochoose an atom-centered charge distribution that reproduces molecular properties at andoutside of a surface (usually the van der Waals surface) appropriate to the range of dis-tances at which molecules interact [72]. One such useful property is the electrostatic po-tential, which describes the forces exerted on a positive probe of unitary charge at thedesired surface. In the following, we will describe some methods for the calculation ofpartial charges within the semiempirical approximation.

In 1995, Truhlar and co-workers published two new charge models, called AM1-CM1A and PM3-CM1P, based on experimental dipole moments [73] .The partialcharge methods are parameterized such that the dipole moments calculated from themare as accurate as possible in comparison to the experimental ones. These chargemodels yield rms errors of 0.30 D for AM1 and 0.26 D for PM3 in the dipole momentsof a set of 195 neutral molecules. The test set covers a variation of organic functionalgroups such as halides, C-S-0 and C-N-0 linkages. Another remarkable point is thatthe atomic charges computed with the CM1 model are in good agreement with high-level ab initio calculations for both neutral compound and ions, whereas CM1 is far lessexpensive with respect to the computational effort.

Bakowies and Thiel developed another efficient semiempirical approach for the com-putation of partial atomic charges [58]. The charges are derived from a semiempiricalcharge equilibration model, which is based on the principle of electronegativity equal-ization [68,69]. The approach is a further development of the QEq model of Rappé andGoddard [74]. During the parameterization, Bakowies and Thiel used ab initio potentialderived atomic charges (RHF/6-31G*) as references. Final parameters have been pub-lished for hydrogen, carbon, nitrogen and oxygen atoms. The results were compared toab initio Mulliken and PD atomic charges and reproduced with an accuracy of approx.0.005e and 0.1e, respectively. The authors claim that the larger deviation for PDcharges can be traced back to statistically ill-defined PD charges of buried atoms thatconstitute the molecular skeleton.

141

Sonic Biological Applications of Semiempirical MO Theory

The proper definition of atomic charges is very difficult. The main problem arises from

Page 151: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

The so-called potential derived (PD) or electrostatic potential (ESP) derived atomiccharges used by Bakowies and Thiel are very important in QSAR or QM/MM methods.There are various methods of deriving potential-based charges using ab initio and semi- empirical techniques. Most differ in the way they generate the grid points for fitting pro-cedure. Chirlian and Francl (CHELP) [75] used spherical shells, and Brenenam andWiberg (CHELPG) [76] defined a cube of points spaced 0.3-0.8 Å apart containing themolecule and including 2.8 Å of headspace on all sides. Besler, Merz and Kollman(MK) [77] used the Connolly molecular surface algorithm to define points on several

Spackman uses a geodesic point selection scheme [78]. Most of the semiempiricalmethods use modified ab initio algorithms [79,80]. Representatively for all these

using semiempirical MO methods [81,82].The charge-fitting process begins with the calculation of the electrostatic potential in

a grid around the molecule. The points were generated using a modified Marsili algo-rithm [83,84] with variable step size (edge length), which enables the user to control thenumber of the grid points. All points within 1.4 times the van der Waals radius of themolecule were eliminated, the maximum distance of the grid points is one-third of thevdW radius. Figure 7 (see p. 145) shows such a grid produced by VESPA using a stepsize of 0.3 Å.

The charge-fitting process begins with the calculation of the electrostatic potential foreach point. The electrostatic potential at these points is then calculated using the NAO-PC model [85,86], described later. This enables us to construct a very fast algorithm forthe calculation of ESP-derived atomic charges, in which the most time-consuming stepis the final linear least-squares fit procedure suggested by Chirlian and Francl [75]. Theonly constraint during this fit is that the sum of the ESP charges have to reproduce themolecular charge, but i t is also possible to use, for example, the molecular dipolemoment for this purpose.

In many cases, this procedure gives charges of 6-31G* quality, especially for AM1and PM3. For example, for a test set of 27 organic compounds, AM1-VESPA chargescorrelate with the HF/6-31G*(MK) ESP charges with R = 0.934 and a standard devi-ation of 0.108. If the phosphorus compounds are omitted, the correlation coefficient in-creases to 0.954 (standard deviation of 0.104). Without the P- and S-compounds, thecorrelation coefficient reaches 0.960 (0.098). Using the CHELPG method [76] insteadof MK to obtain the 6-31G* values has no influence on the correlation obtained.

Kollman and co-workers [87] have described the RESP method, designed to improvethe quality of the ESP charges. They use a restraint in the form of a hyperbolic penaltyfunction in the charge-fitting procedure. This requires an iterative solution to self-consistency in (the point charge at atom j ) . A similar approach for the VESPA algo-rithm was tested, but no real improvement was achieved. In some cases, we experiencedmajor convergence problems.

It was also shown that the VESPA approach to determining electrostatic potential-derived atomic charges using semiempirical techniques essentially is orientationally in-variant. Conformational variation gives smooth and reasonable changes in both the

142

concentric shells (1.4, 1.6, 1.8 and 2.0 times the vdW radius) surrounding the molecule.

methods, we will discuss our VESPA method for deriving high-quality ESP charges

Page 152: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

molecular electrostatic potential and the charges derived from it. On the other hand,Francl et al. [88] have shown that the least-squares matrix for this fitting problem maybe rank deficient, and that statistically valid charges cannot always be assigned to allatoms in a molecule. This problem increases with increasing size of the molecule. Wewere able to show that an increasing number of grid points can improve the quality ofthe point charges obtained, especially for buried atoms, but does not always lead towell-defined atomic charges [82].

Since this method is so effective at generating monopoles that reproduce theelectrostatic potential of a molecule. it is possible to use VESPA for large bio-organicsystems [82]. In this case, however, there is a critical relationship between the structureof the molecule and the quality of the results. Atoms far away from the nearest grid points have very poorly defined charges (e.g. those inside the helix of α-helical Ala 28).This is a general disadvantage of all methods that use grid points at defined distancesaround a molecule.

4.2. Multi-center approaches

Two methods are mentioned here. The first one is the distributed multiple analysis de-veloped by Stone in 1981 [89]. Stone’s aim was to achieve a reliable description of thecharge distribution in a molecule that is suited to the calculation of the electrostatic po-tential outside the charge distribution itself. The method was defined as an extension ofthe Mulliken population analysis. It uses a multipole expansion instead of point chargesto describe the charge distribution. A detailed description is given in the original publi-cation. The second is the NAO-PC model, mentioned above. The natural atomicorbital-point charge model (NAO-PC) has been developed to calculate accurate molecu-lar electrostatic potentials within the semiempirical approximation [85,86].

This model uses nine point charges (including the core charge) to represent heavyatoms, because it is impossible to fit the quantum chemical potential properly by asimple atom-centered point charge model [90]. Figure 8 illustrates the arrangement ofthe NAO-PCs for forinaldehyde. The positions and magnitudes of the eight charges thatrepresent the atomic electron cloud are calculated from the natural atomic orbitals(NAOs) and their occupations. Each hybrid NAO is represented by two point chargessituated at the centroid of each lobe. The positions of the centroids and the magnitudes of the charges are obtained by numerical integration of the Slater-type hybrids and theresults used to set up polynomials and look-up tables that replace the integration step inthe actual MEP calculation.

Not only MEPs, which will be described later, but also atomic and molecular multi-pole moments up to the octupole moment [89,91,92], are well represented by the NAO-

143

Fig. 8. Schematic representation of the NAO-PCs for the C and O atom in formaldehyde.

Page 153: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

PC method. Molecular dipoles are calculated directly from the NAO-PCs and the resultsare compared to those obtained using the standard calculational method [93]. The stan-dard deviation between NAO-PC and directly calculated dipole moments for a test set

2and SCl2 (0.51 D), suggesting that NAO-PC performs less well for sulfur than for theother elements tested.

The calculation of the quadrupole moments uses the definition of Buckingham [94].Calculations were performed for 20 test molecules. the standard deviations between cal-culated and experimental values are l .2 and 0.7 Debye. Ångstrøm for Qxx and Qyy , re-spectively. The correlation is even better if the experimental margins of error areincluded.

The surprisingly good results for molecular multipole moments indicate that NAO-PC can describe an NDDO wave function nearly completely without significant loss ofaccuracy. It, therefore, provides a possibility to reduce a molecular wave function to adiscrete number of point charges.

The calculation of atomic multipoles is also possible within this model. Atomicdipoles especially give detailed information about electronic and structural properties.As atomic multipoles are not measurable quantities within molecules, however, the dis-cussion was limited to molecular moments.

4.3. Electrostatic potential

Electrostatic interactions are known to play a key role in determining the structureand activity of biomolecules [95,96,97]. Therefore, the electrostatic potential is alsoimportant within hybrid QM/MM models. A lot of work has been directed towardcalculating reliable molecular electrostatic potentials from semiempirical methods[58,85,86,98-103]. Some of these methods will be described in the following.

The MEP, in general, is given by

where V (r) is the electrostatic potential at any point Z α is the charge of atom α r;located at Ra; and p(r´) is the electronic density function of the molecule.

Within the monopole approximation, which is often used, this equation simplifies to

where n is the number of atoms; qj is the atomic point charge; and rij is the distance

Using the NAO-PC model, it becomes

144

of 90 organic molecules is 0.171 Debye. The two largest deviations are forH S (0.78 D)

between atom j and grid point i.

Page 154: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

Fig. 5. Polarization of the inhibitor L-benzylsuccinate (bzs) within the active site of carboxypeptidase A. The arrow shows the induced change in the dipole moment.

Fig. 7. The tetramethylammonium ion surrounded by a grid of points generated with VESPA. Several point layers have been removed in order to show the cavity,

145

Page 155: 3D QSAR in Drug Design. Vol2

Bernd Heck and Timothy Clark

where qia is the charge of the NAO-PCs located at riα; qα is the charge from the hydro-gen atoms located at rα; and norb is the number of NAOs.

The double sum gives the contribution of the heavy atoms to the electrostatic poten-tial, and the second term treats the hydrogen atoms. For a test set consisting of 12 mole-cules. the MEPs calculated using the NAO-PC approximation were compared to abinitio RHF/6-31 G* results obtained with Gaussian 92 [104]. As shown in the originalpublication [86]. the errors are between 10% and 14%. The correlation coefficients varybetween 0.91 and 0.99, with the exception of sulfanilamide (0.889) and tosyl chloride(0.855). We conclude that the NAO-PC model is a reliable and very fast method for the calculation of MEPs.

Bakowies and Thiel [58] developed a parameterized method to compute the electro-static potential using the AM1 and MNDO wave functions. The procedure is based on apreviously suggested one [105]. A large set of ab initio RHF/6-31G* reference datahave been used to calibrate the method.

Applying the final parameters (H. C. N. O), the ab initio electrostatic potentials are reproduced with an average accuracy of 20% (AM1 ) and 25% (MNDO). respectively. Kikuchi et al. published a method for the fast evaluation of molecular electrostatic maps for amino acids, peptides and proteins by empirical functions [106]. The MEPs due to valence electrons are calculated by a set of simple empirical functions at various origins. Those due to the core electrons and nuclei are considered by a point charge ap-proximation. The results are compared to ab initio STO-5G values. For amino acids, for example, correlation coefficients between 0.93 and 0.97 were obtained. In the final section, we describe some possible applications of semiempirical binding site models in 3D QSAR.

5. Applications

As mentioned above, the pure semiempirical methods for the calculation of large mole- cular systems are still under development. Therefore. no applications of these tech- niques have yet been published in the area of 3D QSAR. In contrast, QM/MM methods have been used within this research area. There are two main possibilities. The first is to use QM/MM methods to study reaction pathways of enzyme reactions [107,108] or for docking studies [109] or even for MD simulations [51], in order to find new lead com-pounds or active conformations that can be used in further QSAR studies. On the other hand, these methods can be used lor the prediction of biochemically relevant properties such as absolute binding free energies [110] or the inhibition strength [111]. In the fol-lowing, some of the applications mentioned above will be described in more detail.

5. 1. Reaction mechanisms

Richards et al. [107], for example. have examined chorismate mutase catalysis. This enzyme catalyses the skeletal rearrangement of chorismate to prephenate by selecting a destabilized conformer of chorismate. It was found that the minimum energy enzyme–substrate complex has chorismate in a distorted geometry compared to the gas-phase

146

Page 156: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

ground-state structure. It could also be shown that the two residues Arg90 and Glu78 are the most important residues during the reaction. This investigation used theCHARMm QM/MM method [18]. Mulholland and Karplus [108] used the same methodfor investigations on triose phosphate isomerase (TIM). citrate synthase (CS) and theFK506-binding protein (FKBP).

Properly applied QM/MM methods can give accurate reaction paths, they are able toquantify the interactions involved and determine the contributions of individualresidues. These data are very useful in the design of new lead compounds. Another ap- proach are docking studies, which give action conformations of ligands and binding points within the active site directly.

5.2. QM/MM docking

Molecular docking is one way to formulate the problem of molecular recognition com- putationally. Docking is siinply the collision of the substrate with the binding site in the correct conformation and orientation. In order to apply docking methods successfully, the active site geometry should be known, either from X-ray crystallography or by homology modelling for proteins with known sequence but unknown structure.

A combination of a genetic algorithm [112] with our VAMP-PCM method [24,113].mentioned above, was used to approach the docking problem. While the genetic algo- rithm covers the conformational and orientational flexibility of the ligand, the QM/MMmethod relines the interactions between the substrate and the environment. We have used this approach for the prediction of the docking positions or several cyclic nu- cleotides within experimentally known (3GAP, 1GKY) and a modelled binding domain(1APK). The 3D structure of the RIA regulatory subunit was inodelled by Weber et al. in 1987 [114] and is available from the Brookhaven Database.

In order to compare the theoretical predictions with the X-ray structures and to decide which of them should be applied to the modelled binding domain, the different ap-proaches for the docking problem were first tested on known substrate-enzymecomplexes.

5.2.1. 3GAP–cAMPThe cAMP binding domain of the bacterial catabolite gene activator protein CAP servedas the reference protein during the homology modelling of the cAMP binding domain ofRIA [114]. CAP senses the level of cAMP and regulates transcription from severaloperons in E. coli. cAMP (Fig. 9) serves as a hunger signal, both in bacteria andmammals. The crystal structure of the CAP dimer with two bound molecules of cAMP was published by McKay and Steitz in 1981 [115], and refined by Weber and Steitz in1987 to a resolution of 2.5 Å (entry 3GAP in the Brookhaven Database) [116].

It is remarkable that VAMP–PCM. starting with the best 20 solutions of the GA, opti-mizes all of them to be inside the pocket (6 GA solutions were originally outside). Thisshows that the point charge model is able to describe the interactions between the sub- stratc molecule and the protein adequately. The best solution obtained from the different methods within the binding site of the enzyme are shown in Fig. 10.

147

Page 157: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

Fig. 9. Schematic structures of cyclic adenosine-monophosphate (CAMP), 5'-guanosine-monophosphate(5'-GMP) and cyclic guanosine-monophosphate (cGMP).

Fig. 10. Binding site of CAP complexed with cAMP. Solutions of the different methods are as follows: X-ray (blue), GA (green); VAMP-PCM (red); and GA/VAMP-PCM (yellow).

148

Page 158: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

The RMS deviations compared to the X-ray structure are 0.80 Å, 0.78 Å and 0.76 Åfor the GA, VAMP-PCM and GA/VAMP-PCM solutions, respectively. The results are very similar and all deviations are within the experimental resolution. Especially for the cyclo-phosphate and ribose parts of CAMP, which are located in a tight binding pocket (main residues are Arg82, Ser83, Gly71 and Glu72), the predictions are nearly identical with the X-ray structure.

For the aromatic part of cAMP, the deviations are larger because, in the case of the CAP monomer, no residue binds directly to the adenosine part of the ligand, as is also true for the CAP dimer.

5.2.2. 1GKY–5'-GMP As a second test case, we chose guanylate kinase (entry 1GKY in the Brookhaven Database) complexed with guanosine-5'-monophosphate (5'-GMP). The enzyme was isolated from baker’s yeast and crystallized as a complex with its substrate GMP by

Fig. 11. 1CKY complexed with 5'-GMP. Solutions of the different methods are as follows: X-ray (blue); GA(green); VAMP-PCM started from displaced X-ray geometry (red); and VAMP-PCM started from best GA solution (yellow).

149

Page 159: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

Stehle and Schulz [117]. Its X-ray structure has been refined at a resolution of 2.0 Å [118]. The enzyme catalyzes the reaction ATP + GMP ADP + GDP. The ligandguanosine-5'-monophosphate, 5'-GMP is shown in Fig. 9. Again, the X-ray structure of 1GKY/5'-GMP was used as starting point for this investigation. The results of the dif-ferent docking algorithms within the active site of 1GKY are shown in Fig. 11. The ex-perimentally determined position of the ligand seems to be only one of several possible positions in the binding site of the enzyme. Although the binding site is very large, and therefore allows more conformational flexibility, the best predicted positions of 5'-GMPusing the GA and the combined method (VAMP-PCM started with the GA solutions) are in the good agreement with the X-ray structure.

5.2.3. Prediction of the docking position of cAMP in the RIA binding domain As mentioned above, the binding domain of RIA regulatory subunit was modelled by homology using the cAMP binding domain of CAP as reference [114,116]. For a better

Fig. 12. Predicted docking position of cAMP in the RIA binding site (yellow). The original position of cAMP (red) is shown as reference.

150

Page 160: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

orientation, the best solution of our docking approach and the original docking position (APK) within the modelled binding domain are shown in Fig. 12.

The RMS deviation between the original and our predicted positions of cAMP is 1.1 2 Å. Considering the theoretical refinement of 2.5 Å for the modelled structure(1apk), this is a small value. Remarkable in this modelled binding site is that, in addition

Fig. 13. Schematic view of L-benzylsuccinate (bzs), L-phenyllactate (lof), (L-(-)-2-carboxy-3-phenylpropyl)methylsulfodümine (cpm). glycyI-L-tyrosine, (gy). O-[[(1R)-[[N-(phenylmethoxycarbonyl)-L-alanyl] amino] ethyl] hydroxyphosphinyl]-L-3-phenyllactate (ZAAp(O)F, zaf), O-[[(1R)-[[N-(phenylmethoxycarbonyl)-L- phenylalanyl] amino] isobutyl] hydroxyphosphinyl]-L-3phenylacetate (ZFVp(O)F, fvf) O-[[(1R)-[[N- (phenylmethoxycarbonyl)-L-alanyl] amino] methyl] hydroxyphosphinyl]-L-3-phenyllactate (ZAGp(O)F, agf).and their experimental Ki values.

151

Page 161: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

to Arg209, Arg226 also contributes to the binding of cAMP. This is the main reason forthe deviation obtained. Because of the strong electrostatic interaction between the cyclo-phosphate part of cAMP and the two arginines, the whole ligand moves toward these residues during the optimization in VAMP-PCM.

Using different dielectric constants ε = 1, 2, 4) in the QM/MM calculations did notgive significant changes in the results discussed above. The results obtained suggest that the combination of a genetic algorithm with a quantum mechanical/molecular mechan- ical approach provides a powerful tool for treating the docking problem. The average CPU time for this optimization on a R8000 (90 MHz) Power Challenge is around 700 s.

5.3. Binding energies

QM/MM methods can also be used for the prediction of absolute binding free energies [110] or for the prediction of binding affinities [111]. In the following, we describe a

Fig. 14. X-ray (red) and optimized structure of ZFVp(O)F within the binding site of CPA. The Delta value,which will be correlated to the inhibition strength, is then obtained by Delta = ∆Hf(PCM) – ∆Hf(Ligand/Bulk)+ ∆Hf(Bulk).

152

Page 162: 3D QSAR in Drug Design. Vol2

case study for the performance of the method for docking flexible molecules into aprotein-binding site. We have used the VAMP-PCM method to rank a set of seven car-boxypeptidase A (CPA) inhibitors according to their inhibition strength. The bindinggeometries of these ligands are known crystallographically, and also the experimentalKi values for six of them [119-121].To account for the effects of solvation, we decidedto use a supermolecule approach with 28 water molecules surrounding the ligand. Theschematic structures of the inhibitors are shown in Fig. 13. A representative example of

their X-ray structures in Fig. 14.In Fig. 15, delta is plotted against IogKi for live out of the seven inhibitors to visual-

ize the correlation between these values. The zwitterionic gy ligand is not shown in theplot, because of a highly positive delta (191.91).This indicates that our approach in thepresent form may have some problems with zwitterionic structures. Nevertheless. theligand is ranked correctly. For the lof inhibitor, no experimental inhibition strength isknown; we rank this compound as the second worst inhibitor in our test set. The resultsabove suggest that lof is a much worse inhibitor than the structurally similar bzs ligand.

Thus, using a combined QM/MM approach for the simulation of the protein environ-ment and a supermolecule approach for solvation effects allows us to reproduce theinhibition strength ranking of the chosen set of CPA ligands correctly. In order to verifythis result and to strengthen this approach. further investigations arc in progress.To improve the calculation of the solvent effects, one can also use QM/MM models[32-46].

6. Conclusion and Outlook

The above examples demonstrate the emerging power of semiempirical MO theory inbiological applications. There is still much development (not least in the quantum

Fig. 15. Plot delta versus logKi.

153

Some Biological Applications of Semiempirical MO Theory

Pan optimized phosphonate inhibitor (ZFV (O)F) in CPA is shown in comparison with

Page 163: 3D QSAR in Drug Design. Vol2

mechanics of the methods themselves) to be done. but modern hardware and softwarehave opened new possibilities for applying MO theory to systems that would have beentreated with force fields only a few years ago. Much of the battle will be to overcomethe negative image that early semiempirical methods gained within the organic com-munity. This will only be possible by clear and detailed demostrations of the advantagesof using quantum mechanics i n real biological applications.

References

1.

2.

4.

5 .

6.

7.

8.

9.

10.

Ferenczy, G.G., Rivail, J-L., Surján P.R. and Náray-Szab!o, G., NDDO fragment self-consistent field J . Comput. Chem., I3 (1992) 830–837.

Náray-Szab!o, G., Towards a molecular orbital method for the conformational analysis of very large biomolecules . Acta Phys. Acad. Sci. Hung., 40 (1976) 261–273.

Zhao, Q. and Yang, W., Analytical energy gradients and geometry optimisation in the divide-and-

Yang, W. and Lee, T.-S., A density-matrix divide-and-conquer approach for electronic sturcture cal- culations of large molecules, J . Chem. Phps., 103 (1995) 5674–5678.

Linear scaling semiempirical quantum calculations of macromol-ecules, J. Chem. Phys., 105 (1996)274–42750.Dixon, S.L. and Merz, Jr., K.M., Semiempirical molecular orbital calculations with linear system sizescaling, J. Chem. Phys., 104 (1996) 6643–6649.White, C.A., Johnson. B G., Gill, P.M.W. and Head-Gordon. M., The continuous fast multiple method,230 (1994) 8–18.Strain, M .C., Scuseria, G.E. and Frisch. M.J., Achieving linear scaling for the electronic Coulombproblem, Science 271 (1996) 51–53Teeter, M.M., Roe. S.M. and Heo, N.H., Atomic resolution (0.83 Angstr/om) crystal structure of thehydrophobic protein crambin at 130 K, J. Mol. Biol., 230 (1993) 292–311. Deisenhofer., J., Crystallographic refinement of the structure of bovine pancreatic typsin inhibitor at1.5 Å resolution, Acta Crystallograph. B, 31 (1975) 238–250.Stewart. J.J.P., MOPAC7 Version2 manual, QCPE. Bloomington, IN. 1993.

11.

12.to dielectric screening in solvents with explicit

14. Troung, T.N. and Stephanovich, E.V., Analytical first and second energy derivatives of the generalizedconductorlike screening model for free Andzelm, J., Kölnmel, C. and Klamt, A., Incorporation of solvent effects into density functional cal-culations of molecular energies and geometries, J. Chem. Phys.,103 (1995) 9312–9320.York, D., Lee, T.S. and Yang, W., Chem. Phys. Lett. (submitted).Washel, A. and Lev itt, M., Thoeretical studies of enzymic reactions: 1. Dielectric electrostatic andsteric stabilization of the carbonium ion in the reaction of lysozyme, J . Mol. Biol., 103 (1976) 227–235. Field. M.J., Bash, P.A. and Karplus, M., A combined quantum mecahnical and molecular mechanical potential for molecular dynamics simulation, J. Comput. Chem ., 11(1990) 700–733. Vasiyev, V.V., Bliznyuk, A.A and Voityuk, A.A., A combined quantum chemical molecular mechani- cal study of hydrogen bonded systems, Int. J. Quant. Chem., 44 (1992) 897–930. Théory, V., Rinaldi, D., Rivail, J.-L., Maigret, B. and Ferenczy. G.J., Quantum mechanical computationson very large systems: The local self-consistent self method, J. Comput. Chem., 15 (1994) 269–282 Thompson, M.A., Glendening E.D. and Feller. D., The nature of K+/crown ether interactions A hybridquantum mecahnical-molecular mechanical study, J. Phys. Chem.,98 (1994) 10465–10476. Thompson, M.A. and Schenter, G.K. Excited states of the bacteriochlorophyll 6 dimer ofrhodopseudomonas viridis: A QM/MM study of the photosynthetic reaction center that includes MMpolarisation, J. Phs. Chem., 99 (1995) 6374–6386.

energy of solvation, J. Chem. Phys., 103 (1995) 3709–3717.15.

16.17.

18.

19.

20.

21 .

22.

154

Bernd Beck and Timothy Clark

consistent field equations.Int. J . Quant. Chem., 58 (1996) 133–146.

approximation for large electronic systems,

3 . Stewart. J.J.P., Application of localized molecular orbitals to the solution of semiempirical self-

conquer methods for large molecules , J . Chem. Phy., 102 (1995) 9598–9603.

Lee. T.-S., York, D.M. and Yang, W.,

13. Klamt, A. and Schürmann, G., COSMO: A new approach expressions for the screening energy and its gardients, Perkin Trans., 2 (1993) 799–805.

Page 164: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

Bakowies, D. and Thiel, W., Hybrid models for combined quantum mechanical and molecular mechan- ical approcaches J. Phys. Chem., 100 (1996) 10580–10594. Alex, A., Beck, B., Lanig, H . , Rauhut, G. and Clark, T. (paper in preparation).Stanton. R.V., Hartsough, D.S. and Merz, Jr., K.M., An examination of a density functional/molecular mechanical coupled potential. J . Comput. Chem., 16 (1996) 113–128.Bernardi, F., Olivucci, M. and Robb, M.A., Simulation of MC-SCF results on covalent organic multi-bonds reactions: Molecular mechanics with valence bond (MM-VB) , J. Am. Chem. Soc., 114 (1992) 1606–1616.Aqvist, J. and Warshel, A., Simulation of enzyme reactions using valence bond force fields and other hybrid quantum classical approaches, Chem. Rev., 93 (1993) 2523–2530.Singh, U.C. and Kollman, P.A., A combined ab initio quantum mechanical and molecular mechanical method for carrying out simulations on complex molecular systems: Applications to the CH3C1 + C1- ex-change reaction and gas phase protonation of polyethers, J. Comput. Chem., 7 (1986) 718–730. Allinger, N.L., Yuh, Y.H. and Lii, J.-H., MoIecular mechanics: The MM3 force field for hydrocarbonsI. J.Am. Chem. Soc., 111 (1989) 8551–8582.Pearlman, D.A, Case, D.A., Ross, J.W., Cheatham. III. T.E., Ferguson, D.M., Seibel, G.L., Singh, U.C.,Weiner, P.K. and Kollman, P.A., AMBER 4.1 , University of California, San Francisco, CA, 1995.Brooks, B.R., Burccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S. and Karplus, M.,CHARMm: A program for macromolecular energy, minimization and dynamics calculations. J. Comput. Chem., 4 (1983) 187–217.Bash. P.A., Field, M.J. and Karplus, M., Free energy perturbation mehtod for chemical reactions in the condensed phase: A dynamical approach based on a Combined quantum and molecular mechanicspotential J. Am. Chem. Soc., 109 (1987) 8092–8094.

33. Gao, J., Absolute free energy of solution from Monte Cui-lo simulations using combined quantum andmolecular mechanicalpotentials, J . Phys. Chem., 96 (1992) 537–540.

34. Gao, J. and Xia, X., A priori evaluation of aqueous polarization effects through Monte Carlo QM/MM simulations, Science, 258 (1992) 631–635.

35. Gao, J. and Pavelites, J.J., Aqueous basicity of the carboxvlate lone pairs and the C-O burrier in acetic acids: A combined quantum and statistical mechancal s tudy , J. Am. Chem. Soc . , 114 (1992)1912–1914.Gao, J ., Hybrid quantum and molecular mechanical simulations: An alternative avenue to solvent effects in organic chemistry, Acc. Chem. Rea., 27 (1993) 298–305.Gao, J., Luque, F.J. and Orozco, M., induced dipole moments and atomic charges based on average

Gao, J., Potential of mean force for the isomerization of DMF in aqueous solution: A Monte CarloQM/MM simulation study J. Am. Chem. Soc., 115 (1993) 2930–2935.

Gao, J. and Xia, X., A tow-dimensional energy surface for a type II SN2 reaction in aqueous solution,

J.Am. Chem. Soc., 115 (1993) 9667–9675.Stanton, R.V., Hartsough. D.S. and Merz, Jr., K.M., Calculation of solvation free energies using adensity funational molecular dynamics coupled potential, J. Phys. Chem., 97 (1993) 11868–11870.Gao, J., Combined QM/MM simulation study of the Claisen rearrangement of allyl vinyl ether inaqueous solution, J. Am. Chem. Soc., 116 (1994) 1563–1564.Liu, H. and Shi, Y ., Combined molecular mechanical arid quantum mechanical potential study of anucleophilic addition reaction in solution, J. Comput. Chem., 15 (1994) 1311–1318.Liu, H., Müller-Plathe, F. and Van Gunsteren. W.F., A molecular dynamics simulation study with a com-bined quantum mechanical and molecualr mechanical potential energy function: Solvation effects on theconformational equilibrium o f demethoxy ethane, J . Chem. Phys., 102 (1995) 1722–1730.Hartsough, D.S. and Merz, Jr., K.M ., Potential of mean force calculations on the SN

1 fragmentation oftert-butyl chloride, J.Phys. Chem., 99 (1995) 384–390.Stanton, R.V., Little. L.R. and Merz, Jr., K.M., Quantum free energy perturbation study within aPM3/MM coupledpotential, J . Phys. Chem., 99 (1995) 483–486.Thompson. M.A., Hybrid quantum mechanical/molecular mechanical force field development forlarge flexible molecules: A molecular dynamics study of 18-crown-6, J. Phys. Chem., 99 (1995) 4794–4804.

23.

24.25.

26.

27.

28.

29.

30.

31.

32.

36.

37.

38.

39.

40.

41.

42.

43.

electrostatic potentials in aqueous solution, J. Chem. Phys., 98 (1993) 2975–2982.

44.

45.

46.

155

Page 165: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

47. Bash. P.A., Field. M.J., Davenport. R.C., Petsko. G.A., Ringe. D. and Kaiplus. M., Structure of the

reaction pathway, Biochemistry 30 (1991) 5821–5826.Waszkowyezmolecular mechanical study of catalysi is by the enzyme phospholipase A2: An investigation of thepotential-energy surface for amide hydrolysis. J. Chem. Soc., Perkin Trans. 2 (1991) 225–2032. Vasilyev, VV., Tetrahedral intermediate formation in the acylation step of acetylcholinesterases Acombined quantum chemical and molecular mechanical model, J. Mol. Struct. (THEOCHEM),304 (1994) 129– 141 .Elcock, A.H., Lyne, P.D., Mulholland, A J., Nandra. A. and Richards, W.G., Combined quantum andmolecular mechanical study of DNA cross-linking by nitrous acid, J. Am. Chem. Soc., 117 (1995)4706–4707.Hartsough, D.S. and Merz, Jr., K.M., Dynamic force field models: Molecular dynamics simulations of human carbonic anhydrase II using a quantum mechanical/molecular mechanical coupled potential, J . Phy. Chem., 99 (1995) 11266– 11275.

52. Born, M. and Von Karman, T., Über Schwingungen in Raumgittern, Physik. Z., 13 (1912) 297-309.53. Brooks. III, C.L. and Kaiplus. M., Deformable stochastic boundaries in molecular dynamics J. Chem.

Phys. .79 (1983) 6312–6325.Brünger. A.T., Huber, R. and Kaiplus. M , Trysinogen-trypsin transition: A molecular dynamics study

Davis, T.D., Maggiora, G.M. and Christoffersen. R.E., Ab initio calculations on large molecules usingmolecular fragments: Unrestricted Hartree fock calculations on low lying states of formaldehyde and its radical ions, J . Am. Chem. Soc., 96 (1974) 7878–7887.Clementi, E.. Computational aspects for large chemical systems: Lecture notes in chemistry, Springer,New York, 1980.Mascras, F. arid Morokunia, K., IMOMM : A new integrated ab initio + molecular mechanics geometryoptimisation scheme of equilibrium structures and transition states, J. Comput. Chem., 16 (1995)1170 –1179.

58. Bakowies, D. and Thiel, W ., Semiempirical treatment of electrostatic potentials and partial charges in combined quantum mechanical, molecular mechanical approaches J. Comput. Chem., 17 (1996)87–108.Thole, B .T., Molecular polarizabilities calculated with a modified dipole interaction. Chem. Phys., 59 (1981) 341–350.

force field for molecular mechanics and molecular dynamics simulations, J. Am. Chem. Soc., 114 (1992) 10024–10035.Luzhkov, V. and Warshel, A., Microscopic calculations of solvent effects on absorption spectra of conjugated molecules, J . Am. Chem. Soc., 113 (1991) 4491–4499.Luzhkov, V. and Warshel, A., Microscopic models for quantum mechanical calculations of chemical processes in solution: LD/AMPAC and SCAAS/AMPAC calculations of solvation energies, J. Comput.Chem., 13 (1992) 199–213Vesely, F.J ., N-particle dynamics of polarizable Stockmayer-type molecules, J. Comput. Phys.,24 (1977) 361– 371.Ahlstrom, P., WallQvist, A., Engstrom. S. and Jonsson, B., A molecular dynamics study of polarizable water, Mol. Phys., 68 (1989) 563–581.Dang, L.X., Rice. J.E., Caldwell, J. and Kollman, P.A., Ion solvation in polarizable water: Moleculardynamics simulation, J . Am. Chem. Soc., 113 (1991) 2481-2486.Thompson. M.A., QM/MMpol: A consistent model for solute/solvent polarization: Application to the aqueous solvation and spectroscopy of formaldehyd. acetaldehyd and acetone, J. Phys. Chem., 100 (1996) 14492–14507. Gasteiger. J. and Marsili, M., Iterative partial equalization of orbital tronegativity: A rapid access toatomic charges,Tetrahedron. 36 (1990) 3219–3288.

48.

49.

50.

51.

54.

55.

56.

57.

59.

62.

63.

64.

65.

66.

67.

68.

156

B., Hiller, I.H., Gensmantel. N. and Payling, D.W., Combined quantum mechanical–

triosephosphate isomerase phosphoglycolohydroxamate complex: An analog of the intermediate on the

of induced conformational change in the activation domain, Biochemistry. 26 (1987) 5153–5164.

60. Monard, G., Loos M., Théry. V., Baka, K. and Rivail, J.-L., Hybrid classical quantum force field for modeling very large molecules, Int. J . Quant. Chem. 58 (1996) 153–159.

61. Rappé, A.K., Caswit, C.J., Colwell, K.S., Goddard. III, W.A. and Skiff. W.M , UFF, a full periodic table

Page 166: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

(a) Abraham, R.J. and Hudson, B., Charge calculations in molecular mechanics III: Amino acids and peptides, J. Coinput. Chem., 6 (1985) 173–181 . (b) Abraham, R.J. and Smith, P.E., Charge calculationsin molecular mechanics IV: A general method for conjugated systems, J. Comput. Chem., 9 (1987) 288–297.Coulson, C.A. and Longuel-Higgins, H.C., The electrostatic structure of conjugated systems: 1. General theory, Proc. Roy. Soc., A191 (1947) 39–60.

Phys., 23 (1955) 1833–1840.

In Lipkowitz, K.B. and Boyd, D.B. (Eds.) Reviews in computational chemistry. Vol. 2. VCH, Weinheim, 1991,pp. 219–271.Storer, J.W., Giesen, D.J., Cramer, C J. and Truhlar, D.G., Class IV carge models: A new semiempirical approach in quantum chemistry, J. Comput.-Aided Mol. Design, 9 (1995) 87–110.Rappé, A.K. and Goddard, III. W.A., Charge egualibration for molecular dynamics simulation, J. Phys.Chem.,95 (1991) 3358–3363.Chirlian, L.E. and Francl, M.M., Aomic charges derived from electrostatic potentials: A detailed study, J. Comput. Chem., 8 (1987) 894–905.

76. Breneman, C.M, and Wiberg, K.B., Determining atom-centred monopoles from molecular electrostaticpotentials: The need for high sampling density in formamide conformational analysis, J. Comput.Chem., 11 (1990) 361–373.

77. Besler, B.H., Merz, Jr., K.M. and Kollman. P.A., Atomic charges derived from semiempirical methods, J.Comp. Chem., 11(1990) 431–439

17 (1996) 1–18.79. Ferenczy, G.G., Reynolds, C.A. and Richards. W.G., Semiempirical AMI electrostatic potentials and

69.

70.

7 1. Mulliken, R.S ., Electronic population analysis on LCAO-MO molecular wave funations, I. J. Chem.

73.

74.

75.

AM1 electrostatic potential derived charges: A comparison with ab initio values, J. Coinput. Chem., 11(1990)159–169.

80. Orozco, M. and Luque, F.J., On the Use of AMI and MNDO wave functions to compute accurateelectrostatic charges, J. Comput. Chem., 11 (1990)909–923.

81 .Beck, B., Glen, R.C. and Clark. T., VESPA: A new, fast approach to electrostatic potential-derived atomic charges from semiempirical methods, J. Comput. Chem., 18 (1907) 744–756.

82. Beck. B., Glen, R.C. and Clark, T., A detailed study of VESPA electrostatic potential-derived atomiccharges,J. Mol. Model., 1 (1995) 176–187.

83. Heiden, W., Goetze, T. and Brickmann, J.. Fast generation of molecular surfaces from 3D data fields with echanced ‘marching cube’ algorithm, J. Comput. Chem ., 14 (1993) 246–250.

84. Marsili, M., Computation of volumes and surface areas of organic compounds, In Jochum, C., Hicks. M.G. and Sunkel, J. (Eds.) Physical property prediction in organic chemistr, Springer Verlag, Berlin, 1988, pp. 249–251.

85. Rauhut, G. and Clark, T., Multicenter point charge model for high-quolity molecular electrostatic

86. Beck. B., Rauhut, G. and Clark. T., The natural atomic orbital point charge model for PM3: multipolemoments and molecular electrostatic potentials, J. Comput. Chem., 15 (1 994) 1064–1073.

87. Bayly, C.I., Cieplak, P., Cornell. W.D. and Kollman. P.A., A wel-behaved electrostatic potential based method using restraints for deriving atomic charges: The RESP method, J. Phs. Chem., 97 (1993) 10269–10280.

potentials from AM1 calculations, J. Comput. Chem., 14 (1993) 503–509.

88. Francl, M.M., Carey. C., Chirlian, L.E. and Gange, D.M., Charge fit to elestrostatic potentials: II. Can

89. Stone, A.J., Distributed multipole analysis or how to describe a molecular charge distribution, Chem.Phys. Lett., 83 (1981)233–239.

90. Chipot, C., Ángyán, J., Ferenczy, G.G. and Scheraga, H.A., Transferable net atomic charges from a distributed multipole analysis for the description of electrostatic properties: A case study of staturated hydrocarbons, J. Phys. Chem., 97 (1993) 6628–6636.

91. Sokalski, W.A. and Sawaryn, A., Correlated molecular and cumulative atomic multipole omoments, J. Chem. Phys., 87 (1987) 526–534.

atomic charges unambiguously fit to electrostatic potentials, J. Comput. Chem., 17 (1996) 367–383.

157

72. Williams, D.E., Net atomic charges and multipole models for the ab initio molecular electric potential,

78. Spackman, M.A., potential derived charges using a grodesic point selection scheme, J. Comput. Chem.,

Page 167: 3D QSAR in Drug Design. Vol2

Bernd Beck and Timothy Clark

92. Stogryn, D.E. and Strogryn. A.P., Molecular multipole moments, Mol. Phys., 11 (1966) 371–393. 93. Stewart, J.J.P., MOPAC: A semiempirical molecular orbital program, J. Comput.- Aided Mol. Design.

94. Buckingham, A.D., Molecular quadrupole moments.Quart. Rev., I3 (1959) 183–214.95. Perutz, M.F., Electrostatic effects in proteins, Science, 201 (1978) 1187–1191. 96. Warwicker. J. and Watson, H.C., Calculation ofthe electric potential in the active site cleft due to a-

helix dipoles, J. Mol. Biol., 157 (1982) 671–679.97. Warshel, A. and Russel, S.T., Calculations of electrostatic interactions in biochemical systems in solu-

tion,Q. Rev. Biophys., 17 (1984) 283–291.98. Giessner-Prettre, C. and Pullman. A., Molecular elec trostatic potentials: Comparison of ab initio and

CNDO results,Theor. Chim. Acta, 25 (1972) 83–8999. Alhambra, C., Luque, F.J. and Orozco, M., Comparison of NDDO and quasi-ab initio approoches to

compute semiempirical molecular electrostatic potentials. J. Coinput. Chem., 15 (1994) 12–22.100. Luque, F.J., Illas, F. and Orozco, M., Comparative study of the molecular elertrosatic potenrial ob-

tained from different wavefunctions: Reliability of the semiempirical MNDO wavefunction, J. Comput.Chem., 11 (1990) 416–430.

101 . Luque, F.J. and Orozco, M., Reliability ofthe AMI wavefunction to compute molecular electrostatic po-tentials, Chem. Phys. Lett., 168 (1990) 269–275.

102. Alemán, C., Luque, F.J. and Orozco, M , Suitability of the PM3-derived molecular electrostatic poten-tials, J. Comput. Chem., 14 (1993) 799–808.

103. Bharadwaj, R.. Windemuth, A., Sridharan, S., Honig. B. and Nicholls, A., The fast multipole boundaryelement method for molecular electrostatics: An optimal approach for large systems, J. Comput. Chem., 16 (1995) 898-913.

104. Gaussian 92. Frisch, M.J., Trucks. G.W., Head-Gordon. M., Gill, P.M.W., Wong, M.W., Foresman,J.B., Johnson, B.G., Schlegel, H.B., Robb, M.A., Replogle. E.S., Gomperts, R.. Andres, J.L., Raghavachari, K., Binkley. J.S., Gonzales, C., Martin. R.L., Fox, D.J., Defrees, D.J.C., Baker. J.,Stewart, J.J.P. and Pople, J.A., Gausian lnc., Pittsburgh. PA. 1992.

tic potentials based on the AMI wave function: Comparison wiih ab initio HF 6-31G* results,J. Comput. Chem., 14(1993) 1101–1111.

106. Nakajima, H., Takahashi, O. and Kikuchi. O., Rapid evaluation of molecular electrostatic potentialmaps for amino acids, peptides and proteins by empirical functions, J. Comput. Chem., 17 (1996)790–805.

107. Lyne, P.D., Mulholland, A.J. and Richards. W.G., Insights into chorismate mutase catalysisfrom a com-bined QM/MM simulation of the enzyme reaction, J. Am. Chem. Soc., 117 (1995) 11345–11350.

108. Mulholland, A.J. and Karplus, M., Simulations of enzymic reactions, Biochem. Soc. Trans. 24 (1996)247–254.

109. Beck. B., Lanig, H., Glen. R.C. and Clark. T., J. Med. Chem. (submitted). 110. Alex, A. and Finn, P., Fourth World Congress of Theoretically Oriented Chemists – WATOC '96,

I11. Lanig, H., Beck, B. and Clark. T., Poster, MGMS Meeting, York, U.K, 1996.112. Jones, G., Willet, P. and Glen. R.C., Molecular recognition of receptor sites using a genetic algorithm

with a description of desolvation, J. Mol. Biol., 245 (1995) 43–53. 113. Rauhut, G., Alex. A., Chandrasekhar, J., Steinke, T., Sauer, W., Beck. R., Hutter, M., Gedeck, P. and

Clark, T., VAMP6.0, Oxford Molecular Ltd., Medawar Centre. Oxford Science Park, Sandford-on- Thames, Oxford OX4 4GA, U.K.

114. Weber. I.T., Steitz. T.A., Bubis. J. and Taylor. S.S., Predicted structures (cAMP binding domains of type 1 and type II regulatory subunits of CAMP-dependent protein kinase, Biochemistry. 26 (1987)343- 351.

112. McKay, D.B. and Steitz. T.A., Structure ofcatabolite gene activator protein at 2.9 Åresolution suggestsbinding to left handed B-DNA, Nature, 290 (1981) 744-749.

116. Weber. I.T., and Steitz. T.A., Structure of a complex between catabolite gene activator protein andcyclic AMP refined at 2.5 Å resolution,J. Mol. Biol., 198 (1987) 311–326.

4 (1990) 11–12.

105. Ford, G.P. and Wang, B., New approach to the rapid semiempirical calculation of molecular electrosta-

Jerusalem. Israel, 1996.

158

Page 168: 3D QSAR in Drug Design. Vol2

Some Biological Applications of Semiempirical MO Theory

117. Stehle, T. and Schulz, G.E., Three -dimensional structure of the complex bemeen guanylate kinase from yeast with its substrate GMP, J. Mol. Biol., 211 (1990) 249–254.

118. Stehle, T. and Schulz, G.E., Refined structure of the complex between guanylare kinase and its substrate GMP, J.Mol. Biol., 224 (1992) 1127–1141.

119. Mangani, S., Carloni, P. and Orioli, P., Crystal structure of the complex between carboxypeptidase Aand the biproduct analog inhibitor L-benzylsuccinate at 2.0 Å resolution, J. Mol. Biol., 223 (1992) 573–578.

120. Cappalonga, A.M., Alexander, R.S. and Christianson, D.W., Structural comparison of sulfodiimine in-hibitors in their complexes with zinc enzymes, J. Mol. Biol., 267 (1992) 19192–19197.

121. Kim, H. and Lipscomb. W.N., Comparison of the structures of three carboxypeptidase A-phosphonate complexes determined by X- ray crystallography, Biochemistry, 30 (1991) 8171–8180.

159

Page 169: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 170: 3D QSAR in Drug Design. Vol2

Density-Functional Theory and Molecular Dynamics: A NewPerspective for Simulations of Biological Systems

Wanda AndreoniIBM Research Division, Zurich Research Laboratory, CH-8803 Ruschlikon, Switzerland..

1. Introductionto the Methods

Density-functional theory (DFT) is an exact theory of the ground state of a many-parti-cle system [1,2]. The mathematical formulation was given by Hohenberg and Kohn(HK) in the mid-1960s [3,4], namely the demonstration of (i) the uniqueness of theground-state density associated with a given external potential, and of (ii) the varia-tional character of the density energy functional. As a consequence, (iii) the existence ofa universal density functional (HK functional) — i.e. independent of the external poten-tial and of the specific system — was established, the exact form of which, however, isunknown. The variational character of the functional makes the ground-state densityand energy accessible, in principle, via minimization procedures. A precise prescriptionto convert this complex problem into a practical scheme for a many-electron systemwas given by Kohn and Sham [5], in which one-electron orbitals were introduced to de-scribe the electron density. In this way, the solution of the energy minimum problembecame formally similar to that of the Hartree and Hartree-Fock approaches, namely itwas reduced to a set of equations to be solved iteratively. The underlying physics wasmore complete, however, being electron–electron correlation included in the HK func-tional. Also, the meaning of the single-particle orbitals and energies thus obtained wasdifferent [1].

There are several reasons why the power of such a method had to wait decades beforebecoming clear to the community of physicists, and especially to that of chemists. Thefirst step toward a practical application was to introduce a valid approximation for theexchange-correlation energy functional. For a long time. the only practical implementa-tion was the local density approximation (LDA), also suggested by Kohn and Sham [5],which is correct in the limit of the homogeneous electron gas and was expected to be avalid approximation for systems with slowly varying density. LDA was appreciated asphysically sound and appropriate for a number of real systems by solid-state physicistswho first used it for simple metals and semiconductors, and rapidly realized its validitybeyond the realm for which it had originally been designed [6 ,1]. On the contrary. theformal analogy with jellium being more appropriate for extended systems and thepicture so far away from the traditional thinking of chemists, LDA had difficulty inbeing accepted in the study of molecular systems. Moreover, the relatively poor perfor-mance of LDA for isolated atoms and the incorrect description of the tail of the ex- change-correlation potential often yielded significant errors for cohesive energies andelectron affinities. The scenario changed when gradient corrections to the LDA were in-troduced in the exchange and correlation functionals (GGA), which improved the agree-ment between these calculated values and experiment in a significant way. From afundamental point of view, it must be said that LDA defines a specific reference model

H. Kubinyi et al. (eds.), 3D QSAR in Drug Desiqn, Volume2. 161–167.© 1998 Kluwer Academic Publishers. Printed in Great Britian.

Page 171: 3D QSAR in Drug Design. Vol2

Wanda Andreoni

system whose physical meaning is clear because the LDA functional is well defined.GGA suffers instead from the multiplicity of proposed prescriptions and the generallack of a fundamental justification lor them. This is even more true for ‘hybrid’ schemesthat have recently become popular in applications to chemistry.

Nowadays, DFT is the method of choice for many problems in physics and chem-istry, ranging from molecular to condensed-matter systems. The advance in con-putational algorithms (and the progress in computer hardware) have also made testspossible on a variety of systems. It is clear by now that DFT can be applied (and suc-cessfully) to many real problems for which the size system makes traditional quantum-chemistry calculations impractical.

Further support for the use of DFT has come during the past decade from the so-called Car-Parrinello (CP) method [7,8], which combines it with molecular dynamics(MD). As such. it allows one (i) to use dynamical procedures (such as simulated anneal-ing) to minimize the energy functional with the possibility for simultaneous optimiza-tion of both electronic and ionic variables, and (i i ) to determine the time evolution ofthe system, kept on the Born-Oppenheimer surface. at finite temperature. On the practi-cal side, it can be applied to new classes of problems, especially those for which MDwith classical potentials fails to give an appropriate description. For instance. this is the case for covalently bonded systems whenever bonds rearrange as in structural changesand for any quantum-mechanical event such as bond breaking and (re)forming. Also,within the same scheme, one can treat systems in both molecular and condensed phasessuch that, for instance, six evolution patterns can be properly studies, without having to sacrifice the accuracy of the calculations on passing. for example, from the dimer to thesolid state. Starting from the CP proposal of other procedures for local minimizations, anumber of new procedures have been suggested for efficient optimization, eithersimultaneous or sequential, of geometries and electronic structure.

DFT and DFT-MD methods are currently flourishing i n chemistry and materialsscience in general. On the other hand, their application to biochemistry and biology isonly at its beginning. In the following section, I shall briefly mention some attemptsmade so far in this field, and mainly discuss where and why it appears worthwhile topursue this effort. I shall also try to clarify some technical and more fundamentalreasons that limit the range of applications at the moment.

2.

The reason for introducing DFT in biochemistry are: (i) the need for an ab initio de-scription of the interatomic interactions, in many cases. in which force fields become ‘fragile’, and (ii) its computational efficiency compared to that of post-Hartree-Fockmethods. The experience gained so far with the available prescriptions for the ex- change-correlation energy functionals in DFT shows a general ability to describe wellsystems and situations i n which metallic, covalent or ionic bonds are dominant. This isthe case for a number of ground-state properties and lor the energetics. Biologicalmatter, however, is much more complex than that of traditional inorganic and/or organicmaterials. A reliable description requires the ability to account correctly for other types

DFT and DFT-MD in Biochemistry

162

Page 172: 3D QSAR in Drug Design. Vol2

Density-Functional Theory and Molecular Dynamics

of interactions as well. These are hydrogen bonds, as well as van der Waals and hydro- phobic interactions, whose role in biology is known to be crucial for both stability andfunctionality.

Static calculations on key model systems (such as DNA base pairs) ‘in vacuo’ havebeen performed for some time. The main purpose is generally that of examining the per-formance of DFT-GGA methods versus post-Hartree-Fock methods [9], in accountingfor structure, dipole moments and energetics. A more systematic and critical workwould certainly be useful. Regrettably, calculations of this kind often suffer from thelimited accuracy of the computational scheme (e.g. small basis sets for the electronicwave functions) and are thus unable to provide reliable reference data for useful com-parisons. DFT-based results on simple systems have sometimes been used for the con-struction of classical potentials to be applied in molecular mechanics calculations ofrealistic models of biological systems. Although interesting in principle, parametr-izations derived in this way have no guarantee of being transferable.

More recently, it has become possible to extend DFT-GGA static calculations tocomplexes of a few hundred atoms — i.e. the size of biological models investigated inlaboratory experiments [10-12].The starting point is the X-ray refinement of the system(which typically exists in the crystal phase), and the outcome is the locally optimizedgeometry within DFT-GGA. Therefore, the results for the structure are primarily a testof the computational scheme. although they can also include some features. such as the hydrogen pattern, that are complementary to experimental observations. However, thenew information they provide concerns the electronic structure. For instance, in theexample reported in reference [11], where a platinum-modified nucleobase pair wasstudied in the crystal phase, one could capture the effects of metalation on the chemicalbonding and show its effects on the electronic structure of the nucleobase pair in thecomplex.

Static calculations are far from being exhaustive. Still, merely from the point of viewof testing of the computational scheme, local energy minimization starting from an edu-cated guess is not sufficient and does not guarantee the absence of spurious minimaand/or unrealistic conformations. More important. the study of the dynamics is vital forthe understanding of the physical and chemical behavior of biomolecules [13] and iscertainly essential for the correct description of the influence of water on their con-formations and interactions. This is why the combination of DFT and MD, as achievedwith the CP method, has strong potential in this field as well. Moreover, it has recentlybeen shown with CP simulations that liquid water can be treated at a reasonably accu-rate level within at least one of the DFT-GGA schemes [14]. It is probably worth em-phasizing that ab initio liquid water constitutes an important step forward in view of thewell-known severe difficulties that classical force fields encounter to represent properlyits intrinsic properties and specific interactions (see e.g. [15]). CP simulations of waterand of some chemical reactions in aqueous solutions [16] indicate that hydrogen bondscan be correctly described with current exchange-correlation functionals in DFT. Still,more experience must be gained to reinforce and generalize such statements.

The ability to represent liquid water with a reasonable degree of accuracy opens upalso the possibility of determining ‘structures in solution’ fully ab initio. This has

163

Page 173: 3D QSAR in Drug Design. Vol2

Wanda Andreoni

recently been achieved for the monostrand major cisplatin-DNA adduct [17] startingfrom the structure of the model system in the crystal phase.

Finally, the correct treatment of hydrogen-bonded systems may require an explicitaccount of the quantum nature of the nuclei. An extension of DFT-MD in this direction that combines it with the path-integral method has recently been introduced [18,19].

Dispersion forces are a bottleneck for current implementations of DFT. The inabilityof the latter to describe dispersion forces is an obviouy consequence of the nature of theapproximations made for the exchange-correlation density functionals. Because theseapproximated functionals are local or at most contain the density gradient (semilocal),they cannot represent the exchange-correlation energy of distant atomic systems withnon-overlapping charge distributions [20]. Thus, they generally fail to predict both po-sition and depth of the van der Waals minima, as well as the long-range behavior of the interaction potential. Although there are current attempts to make improvements and/orad hoc corrections within the available schemes (see e.g. [21]), new strategies haverecently been proposed within DFT as inore reliable and fully ab initio approaches[22,23].

Increasing the size and time scales of the system under investigation is an urgenttechnical problem. In fact, a typical computer model in CP simulations does not containinore than 300 atoms. and does not run for more than 10 ps. Solutions to the ‘sizeproblem‘ are indeed available for ab initio electronic structure calculations (see e.g.[24–26]) where the sire scaling of the algorithm is linear. These will eventually mergeinto the combined DFT-MD method [24].

The ‘time problem’ can, in some cases, be reduced by making use of algorithms thatare standard in classical MD [27,28]. In the study of activated processes, or inore gener- ally of events that are ‘rare’ i n the time scale of the computer simulation, one can makeuse of ‘constrained MD’ [29], namely of MD where specific reaction coordinates areimposed and constrained during the runs. In this way, one can also calculate relativefree energies and, in particular, free energy barriers. The implementation in DFT-MDhas opened an important new avenue for the study of chemical reactions [30]. In par-ticular, this method has recently been applied to simulate the reaction that is considered tobe the first step in the binding of eisplatin to DNA. namely the substitution of one chlorinewith a water molecule in the aqueous solution [17]. Clear insights into the reaction path ofthe reaction, the accompanying electronic mechanisms and the effect of the solvent havebeen thus obtained. and also a quantitative estimate of the free energy barrier.

sical approaches have long been proposed for simulations inbiochemistry, mainly to either include quantum corrections into molecular mechanicscalculations (see e.g. [31,321)or introduce solvation effects into semiempirical quantummodels (see e.g. [33]). An extension of the DFT-CP method in this direction would beof great help to overcome the technical limitations mentioned above (see e.g. [34]).

3. Conclusion

The application of DFT-based MD to cases of relevance in biochemistry and biology isin its ‘embryonic stage’. Indeed, i t has still to acquire a number of features beforeestablishing itself as an accurate and convenient investigation tool in this field.

164

Hybrid quantum-clas

Page 174: 3D QSAR in Drug Design. Vol2

Density-Functional Theory and Molecular Dynamics

So far, applications have mainly concerned the geometry and electronic structure offragments or crystal models of biological systems and, to a lesser extent, the chemistryand dynamics of models of biochemically relevant process (see section 2 above, as wellas references [35] and [36]). In particular, owing to its strength and localized nature, the study of metal ion binding to enzymes and nucleic acids suffers from the current limita-tions of the method to only a minor extent. Also, the role of water is often crucial indetermining the mechanism and energy scale of the binding itself. Therefore, this is thearea where progress can already be made [12,37,17], both the ab initio description andthe dynamical approach being of crucial importance.

As discussed above, some technical difficulties must be overcome, and the accuracyof the exchange-correlation energy functionals for the description of all relevant interac-tions must also be fully established. It should be evident why such an effort is worth-while. In fact. trying and validating a certain approximation for the universal HKfunctional will result in a global reliable description of the fundamental interactions,independent of the specific interacting atoms or molecules or aggregation state. inde-pendent of the size of the system under investigation and of its physical conditions (e.g.temperature, pressure). This is drastically different from trying and validating an empir-ical or a semiempirical approach, such as a classical potential or a quantum-mechanicalmodel. In such cases, one has to (est a specific parametrization that can be suitableonly for a subset of problems and systems, and often has little chance of beingpredictive.

In conclusion, DFT-based MD, assisted by intelligent modelling, has all the pre-requisites to render computer simulations useful not only for the investigation ofbiochemical systems, but also for the design of new biologically relevant materials.

Acknowledgements

I wish to thank Paolo Carloni and Alessandro Curioni for useful discussions.

References

1.2 .

3.4.

Dreizler, R.M. and Gross, E.K.U., Density-functional theory, Springer-Verlag, Berlin. 1990Parr, R.G. and Yang, W., Density-functional theory of atoms and molecules, Oxford Science Publications., New York, 1989.Hohenberg, P. and Kohn, W., Inhomogeneous electron gas, Phys. Rev., 136 (1964)B864–1887.See also Lieb, E.H., Density functionals for Coulomb systems, In Dreizler, R.M. and Providencia. J.(Eds.) Density functional methods in physics. Plenum. New York, 1985, pp. 31-80: Levy, M. and Perdew, J.P., The constrianed search formulation of density functional theory, ibid., pp. 1 1-30.Kohn, W. and Sham, L.J., Self-consistent equations including exchange and correlation effects, Phys.Rev., 140 (1965) A1133-A1138.Gunnarsson, O., Jonson, M. and Lundqvist, B.I., Description of exchange and correlation effects in

Car. R. and Parrinello, M., Unified density-functional theory and molecular dynamics, Phys. Rev. Lett.,55 (1985) 2471–2474.Car, R.. Molecular dynamics from first principles, In Binder. K. and Ciccotti G. (Eds.) Monte Carlo and molecular dynamics of condensed matter systems. Italian Physical Society Publications. Bologna, Italy,1995.pp. 601–634.

5.

6.

7.

8.

inhomogeneous electron systems, Phys. Rev. B, 20 (1979) 3136–3164.

165

Page 175: 3D QSAR in Drug Design. Vol2

WandaAndreoni

9. See e.g. Sponer, J., Leszezynski, J. and Hobza, P., Structures and energies of hydrogen-bonded DNA base pairs: A nonempirical study with inclusion of electron correlation, J. Phys. Chem., 100 (1996)1965–1974.Hutter, J., Carloni, P. and Parrinello, M., Non-empirical calculations of a hydrated RNA duplex, J. Am.Chem. Soc., 118 (1996) 8710–8712.Carloni, P. and Andreoni, W., Platinum-modified nucleobase pairs in the solid state: theoretical study,J. Phys. Chem., 100 (1996) 17797–17800.Carloni, P. and Alber, F., Density-functional theory investigations of enzyme-substrate interactions, thisvolume and references therein.Karplus, M. and Petsko, G.A., Molecular dynamics simulations in biology, Nature. 347 (1990) 631–639. Sprik, M., Hutter, J. and Parrinello, M., Ab initio molecular dynamics simulation of liquid water: comparison of three gradient-corrected density functional, J Chem. Php., 105 (1996) 1142–1152. Beveridge. D.L., Swaminathan, S., Ravishanker, G., Withka, J.M., Srinivasan, J.. Prevost, C., Louise- May. S., Langley, D.R., DiCapua, F.M. and Bolton, P.H., Molecular dynamics simulations on the hydra-tion, structure and motions of DNA oligomers, In Westhof, E. (Ed.) Water and biological macro-molecules. Macmillan, London, U.K., 1993, pp. 165–225.See e.g. Mejer, E.J and Sprik, M., A density-functional study of the addition of water to SO3 in the gasphase and in aqueous solution, J. Phys. Chem. (in press).Carloni, P., Sprik, M. and Andreoni, W., Cisplatin-DNA binding mechanism: Key steps from ab initio molecular dynamics (in preparation).Marx, D. and Parrinello, M., Ab-initio path integral molecular dynamics Basic ideas, J. Chem. Phys.,104 (1996) 4077-4082.Tuckerman, M.E., Marx, D., Klein. M.L. and Parrinello, M., On the quantum nature of the sharedproton in hydrogen bonds. Science. 275 (1997) 817–819, and references therein.Kristyán, S. and Pulay, P., Can (semi) local density functional theory account for the London dispersionforces? Chem. Phys. Lett.. 229 (1994) 175–180.Osinga, V.P., van Gisbergen, S.J.A. and Baerends, E.J., Density functional results for isotropic andanisotropic multipole polarizabilities and C6, C7 and C8 van der Waals dispersion coefficients formolecules, J. Chem. Phys. 106 (1997) 5091.Kohn, W. and Meir, Y., Van der Waals energies in density functional theory, Phys. Rev. Lett. (submitted)Cross, E.K.U., Dobson. J F. and Petersilka, M., Density functional theory time-dependent phenomena, InNalewajski, R.F. (Ed.) Density functional theory. Topics in Current Chemistry. Vol. 181. Springer,Heidelberg. 1996, pp. 81–172.

24. Mauri, F. and Galli, G., Electronic structure calculations and molecular dynamics simulations with linear system-size scaling, Phys. Rev. B, 50 (1994) 4316–4326.Carlsson A.E., Order-N density-matrix electronic-structure method for general potentials, Phys. Rev. B,51 (1995) 13935–13941.Kohn, W., Density functional and density matrix methods scaling linearly with the number of atoms,Phys. Rev. Lett., 76 (1996) 3168–3171.van Gunsteren, W.F., Molecular dynamics and stochastic dynamics simulations: A primer, In vanGunateren, W.F., Weiner, P.K., Wilkinson, A.J. (Eds.) Computer simulation of biomolecular systems,Vol. 2, ESCOM, Leiden, The Netherlands. 1993, pp. 3–36. Tuckerman. M.E. and Parrinello, M., Integrating the Cur-Parrinello equations II: Multiple time scaletechniques, J. Chem. Phy., 101 (1994) 1316–1329.Carter. E.A., Ciccotti. G. and Hynes, J.T., Constrianed reaction coordinate dynamics for the simulationof rare events, Chem. Phys. Lett.. 156 (1989) 472-477; Ciccotti. G , Ferrario, M. and Hynes, J.T.,Constrianed molecular dynamics and the mean potential for an ion pair in a polar solvent, Chem. Phys.129 (1989) 241–251.Curioni, A., Sprik, M., Andreoni, W., Schiffer, H., Hutier, J. and Parrinello, M., Density-functional-theory based molecular dynamics simulation of acid catalyzed reactions in liquid trioxane, J. Am.Chem. Soc. 119 (1997) 7218.

10.

11.

12.

13.14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

25.

26.

27.

28.

29.

30.

166

Page 176: 3D QSAR in Drug Design. Vol2

Density-Functional Theory and Molecular Dynamics

Perákylá, M. and Kollman, PA., A simulation of the catalytic mechanism of aspartyl-glucosaminidaseusing ab-initio quantum mechanics and molecular dynamics, J. Am. Chem. Soc., 119 (1997) 1189-1196.Stanton, R.V., Hartsough. D.S. and Merz, K.M., Jr., An examination of a density functional/molecularmechanical coupled potential, J. Comput. Chem., 16 (1995) 113–128.See e.g. Cramer, C.J. and Truhlar, D.G., Molecular obital theory calculations of aqueous solvationeffects in chemical equilibria, J. Am. Chem. Soc., 113 (1991) 8552–8554; Giesen, D.J., Gu. M Z., and

31.

32.

33.

Truhlar, D.G., A universal organic solvation model, J. Org. Chem., 61 (1996) 8720–8721.34. Wei, D. and Salahub, D.R., A combined density functional and molecular dynamics simulation of a

quantum water molecular in aqueous solution, Chem. Phys. Lett., 224 (1991) 291–296.Buda, F., dcGroot, H. and Bifone. A., Charge localization and dynamics in rhodopsin., Phys. Rev. Lett.,71 (1996) 4474–4477.Rovira, C., Ballone, P. and Parrinello, M., A density functional study of iron–porphyrin complexes, Chem. Phys. Lett., 271 (1997) 247–250.Sagnella, D.E., Laasonen, K. and Klein, M.L., Ab initio molecular dynamics study of proton transfer ina polyglycine analog of the ion channel gramicidin A, Biophys. J., 71 (1996) 1172–1178.

35.

36.

37.

167

Page 177: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 178: 3D QSAR in Drug Design. Vol2

Density-Functional Theory Investigations of Enzyme-substrateInteractions

Paolo Carloniª and Frank Alberb

a IBM Research Division, Zurich Research Laboratory, Säumerstrasse 4, CH-8803 Rüschlikon,Switzerland and Department of Chemistry, University of Florence, Via G. Capponi 7, 1-50121

Firenze Italyb Department of Pharmarcy Swiss Federal Institute of Technology (ETH), Winterthurerstrasse

190, CH-8057 Zurich, Switzerland.

1. Introduction

Density-functional theory (DFT) [1,2], originally developed in solid-state physics, is avaluable, versatile and efficient quantum-mechanical method for electronic structure calculations of chemical systems [3]. Recently, application of the DFT methods has also been extended to biological molecules [3,4].

Here, we present some results from recent investigations on two important enzymes.superoxide dismutase and thymidine kinase. Our goal is to understand the factors thatplay a crucial role in the enzyme-substrate interactions. We also briefly discuss theimportance of solvation effects on the quantum-mechanical calculations [5].

2. Superoxide dismutase

Copper–zinc superoxide dismutase (SOD) is a dimeric enzyme containing a copper ionand a zinc ion in the active site of each subunit. The active site of the enzyme consistsof the copper ion (which is essential to the SOD activity) coordinated by four histidinesin a distorted square planar geometry. One of these residues, His61, acts as a bridge between the Cu and Zn ions. An Asp and two other His residues complete that co-

Fig. 1. Schematic view of Cu, Zn SOD active site and active site channel. (From Getzoff, E.D, Halwell, R.A.and Trainer J.A., In Inouye. M. (Ed.) Protein Engineering: Applications in Science Industry and Medicine, Academic Press, New York, 1986, pp . 41–69,)

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design Volume2. 169–179.© 1998 Kluwer Academic Publishers. Printed in Great Britian.

Page 179: 3D QSAR in Drug Design. Vol2

Paolo Corloni and Frank Alber

ordination of Zn. The X-ray structure of the oxidized [6] and reduced [7] bovine SODreveals that the copper-histidine complex is located at the bottom of a shallow cavitypresent in the protein (Fig. 1) .

The physiological role of SOD is the removal of the harmful superoxide anionradical, O2,– produced during the oxygen metabolic cycle [8–13]. SOD dismutates O2

– to

references [9,14]):

(1)Two mechanisms have been proposed. In the most widely accepted mechanism [8],

the copper ion is repeatedly reduced and oxidized in a two-step process:

(2)

(3)

Within this scheme, the Cu-His61 bond breaks down i n the first step [15]; sub-sequently, His61 accepts a proton from the solvent [11,16]. In the second step, this proton is transferred to a second superoxide molecule forming HO-

2, and the Cu-His61bond is reformed.

A second mechanism. which involves the formation of a stable copper-superoxideintermediate has been suggested on the basis of ab initio quantum-mechanicalcalculations [17–19]:

(4)

(5)The (CuO2) intermediate would then oxidize a second superoxide molecule:

(6)Depending on whether an electron transfer (ET) occurs between copper(II) and super-oxide (reactions 2 and 4), the two different mechanisms can be operative [20].

With the aim of understanding some of the important factors governing the super-oxide-copper ET, we have undertaken DFT calculations on several models of SODactive site and its adduct with superoxide [21].

2.1 Computational procedure

A copper tetraimidazole complex, whose geometry has been taken from the X-ray struc-ture of oxidized SOD [6], is our model for the active site of SOD (Fig. 2). Arg141, theonly invariant residue of SOD [22], has been included in the model by modelling it as an

single-point DFT calculations on various complexes, within the local spin density ap-proximation of DFT [ I ,2]. We use the LDA parameterization by Perdew and Zunger[23], which is based on Monte Carlo simulations of the free electron gas investigated byCeperley and Alder [24]. Some calculations were done using spin-polarized techniques.

170

molecular oxygen and hydrogen peroxide at a very high rate (2–3 x 109 s–1 M–1;

ammoniumion. The mode of binding of Q-2 to SOD is also shown. We have performed

2O-2 + 2H+ → H2O2 + O2

Cu2+ + 2H+ → H2O2 + O2

Cu+ + O-2 + 2H+ → Cu2+ + H2O2

Page 180: 3D QSAR in Drug Design. Vol2

Density-Functional Theory Investigations of Enzyme-substrate Interactions

Fig. 2. Model of the SOD active site-superoxide adduct. (Reprinted with permission from Reference [2I],© I995 The American Chemical Society.)

Our basis set consists of plane waves (for a detailed description the reader is referred toreference [21]). The electronic structure of the complexes are presented in terms ofdensity of states (DOS).

2.2. Results and discussion

Figure 3a shows the DOS of the oxidized SOD active site — i.e. of the copper(II)-imidazole complex. The main contributions come from the imidazole ligands. The con-tributions of the other molecules present in the complexes are also shown. The four imi-dazole nitrogen atoms bound to copper produce a square planar distorted ligand field:the copper dxy , dxz and dyz levels form a large peak. Two small shoulders on the mainpeak are due to the copper dz2 orbital. At higher and lower energies lie the antibonding

171

Page 181: 3D QSAR in Drug Design. Vol2

Paolo Carloni and Frank Alber

Energy (eV)Fig. 3. DOS of the SOD active site (a) and its adduct with superoxide (b). At the top left, the Cu d-orbitalsplittings are also shown. (Reprinted in part, with permission from reference [21], © 1995 The American Chemical Society.)172

Page 182: 3D QSAR in Drug Design. Vol2

Density-Functional Theory Investigations of Enzyme-substrate Interactions

(σ*) and bonding (σ) molecular orbitals formed by the copper dx2 – y2 with the ligands.The bonding-antibonding splitting of the dx2 – y2 derived states is 5.3 eV [25].

When the superoxide is brought into close proximity of the copper ion (Fig. 3b), a new peak of the DOS at the Fermi level is present: this peak is due to a molecularorbital formed by the dx2 – y2 and the two π∗ orbitals of superoxide. A stable Cu(II)–O2

complex is formed, and an electron is partially transferred from the superoxide to thecopper(II) ion. Consistently with the stability of the Cu(II)–O2

– complex, the binding energy against dissociation into a neutral oxygen molecule and the reduced SOD active site is found to be 99 kJ/mol [25]. The reason why the dx2 – y2 orbital is substantiallyhigher in energy here than in the oxidized SOD active site is the large electron Coulombrepulsion of the copper ion.

We now discuss the effect of an important residue in the active site of the enzyme, Arg141, which has been included in the calculations. Because of its mobility [26], Arg141 could H-bond strongly to the terminal oxygen of the substrate and hence stabi-lize the Cu–O2

– intermediate, favoring the mechanisms 4 and 5. By positioning theguanidinum group of Arg141 closer to the substrate — i.e. by reducing the guanid-inum-superoxide H-bond from the 2.5 A in the previous complex to 1.5 A — we find that the electronic structure essentially does not change. The binding energy is also size-ably increased by 15 kJ/mol.

In conclusion, our calculations show that partial electron transfer process between SOD and its substrate occurs through the copper dx2 – y2 and the π∗ orbitals of the super-oxide. The electronic structure of the enzyme substrate complex is mainly influenced by Coulomb repulsion and, to a lesser extent, by spin-polarization effects of the paramag-netic Cu(II) ion and of superoxide π∗ orbitals [25]. The covalent hybridization betweenthe Cu-d and the superoxide π∗ orbitals is, in comparison, negligible.

The electronic structure at the Fermi level changes only slightly upon reducing the Arg141–superoxide distance. Other levels are shifted in energy by the long-rangeCoulomb interaction with the Arg guanidinum, whose position has been modified.

Our calculations essentially agree with earlier quantum-mechanical calculations [17], in that the ET between superoxide and copper does not take place in vacuo. However, unlike the conclusion drawn in [17], we believe that our and previous quantum-mechanical cal-culations do not necessarily imply a mechanism described by reactions 4–6; the presence of the electrostatic field of the protein/water environment might alter this picture, and the Cu–O2

– complex, which in vacuo is stable, could easily dissociate if the effect of theprotein is taken into account, consistently with the two-step mechanisms 2–3.

3. Herpes Simplex Type 1 Thymidine Kinase

Viral herpes simplex type 1 thymidine kinase (HSV1 TK) is a key enzyme in the metab- olism of the herpes simplex virus. Its physiological role is to salvage thymidine into theDNA metabolism by converting it to thymidine monophosphate: the phosphorylation is achieved by transfer of the γ -phosphate group from ATP to the 5'-OH group of thymidine:

ATP + d(T) ADP + d(Tp)

173

Page 183: 3D QSAR in Drug Design. Vol2

Paolo Carloni and Frank Alber

Understanding the biochemistry of this enzyme is important for application in the treat-ment of virus infections and for cancer chemotherapy [27–34].

We present here our recent DFT study that has focused on the HSV 1 TK nucleoside interactions [35]. Our goal is to gain a better understanding of the nature of HSV1 TKbinding interactions and of its mechanism of action.

3.1. Computationalprocedure

HSV1 TK is a dimeric enzyme with 376 residues per subunit. The two subunits arerelated by C2 symmetry (Fig. 4). The active site is formed by an ATP and a nucleosidebinding region [36]. Our modelling of HSV1 TK active site (Fig. 5) includes the residues

delled again by an ammonium ion, is also included because of its important electrostaticrole. Several HSV1 TK-thymine complexes have been considered by protonating theresidues and the substrate differently. Calculations were carried out in the DFT frame- work, using the Becke-Lee-Yang-Parr approximation for the exchange-correlation func-tionals [37,38]. The interaction between valence electrons and ionic cores is describedby pseudopotentials of the Martins-Troullier type [39]. The Kohn-Sham orbitals are ex-panded in plane waves up to an energy cutoff of 70 Ry. The complex geometries were

on the ammonium nitrogen.

3.2. Results and discussion

The difference density = ρcomplex ρ fragments – ρsubstrate describes how the electron density ρ changes during the formation of the complex. Inspection of for all com-plexes reveals that no charge transfer from or to the substrate is present. The O and Natoms of thymine as well as the Arg 163–Tyr172 H-bond are sizeably polarized.Interestingly, there is no evidence of polarization of the Met128 sulfur atom [40]. Thisindicates that sulfur plays only a minor role i n binding. That the role of Met128 sulfur inthe binding process is purely hydrophobic and steric has been confirmed by very recent site-directed mutagenesis experiments, which have shown that the activity is preservedwhen the Met residue is replaced by another hydrophobic residue such as Ile [41].

The complexes in which Tyr172 is deprotonated and thymine protonated are morestable by at least 130 kJ/mol, in terms of the binding energy, than those bearing aneutral substrate and/or neutral tyrosine. The stabilization is due to the favorable andstrong Coulomb interaction between the ionic pair. By including the electrostatic envir-onment using a standard force field [42], the energetics obtained using the quantum-mechanical calculations turn out not to be significantly affected.

We conclude that the enzyme–substrate binding is dominated by Coulombic inter-actions, with no sizeable charge transfer to or from the substrate. The calculationssuggest the existence of a ‘proton transfer complex’ between Tyr 172 and the substrate.Such a complex could be important for the efficiency of the enzyme by enhancing the rigidity of the active site.

174

fixing the thymine ring (Met128 and Tyr172); the guanidinum group of Arg163, mo-

_

fully optimized. Position constraint was imposed for the Cß of Tyr 172 and Met 128 and

Page 184: 3D QSAR in Drug Design. Vol2

Density-Functional Theory Investigations of Enzyme-substrate Interactions

Fig. 4. Structure of one subunit of HSVI TK.

4. Conclusion

The two examples shown here illustrate that the DFT approach can be used to investi- gate enzyme-substrate interactions. The DFT method appears to be well suited to treat a variety of systems and chemical bonding, from a metal-based enzyme to a protein con- taining a ‘p-complex’ between two aromatic rings. These examples also indicate that, in general, the effects of the protein/water environment should be considered to obtain a more realistic description of the interactions at the active site [5]. The development of coupled ab initio classical molecular dynamics simulations methods [43] will further improve the theoretical investigations of the mechanism of action of enzymes, by allow- ing the study of the reactive processes in the presence of the dynamical effects of the environment.

175

Page 185: 3D QSAR in Drug Design. Vol2

Paolo Carloni and Frank Alber

Fig. 5. Model of the HSVl TK active site-tymine adduct

Acknowledgements

We thank Wanda Andreoni for her valuable suggestions.

176

Page 186: 3D QSAR in Drug Design. Vol2

Density-Functional Theory Investigations of Enzyme-substrate Interactions

References

1. Hohenberg, P.C. and Kohn, W., Inhomogeneous electron gas, Phys. Rev., 136 (1964) B864–B887.2. Kohn, W. and Sham, L.J., Self-consistent equations including exchange and correlation effects, Phys.

Rev., 140 (1965) A1133-A1138.3. For a detailed discussion of the DFT methods and applications in biochemistry, the reader is referred to

Andreoni, W., Density-functional theory and molecular dynamics: A new perspective for simulations ofbiological systems, this volume.See, e.g. (a) Li, Yan and Evans, J.N.S., The hard-soft acid-base principle in enzymatic catalysis: Dualreactivity of phosphoenolpyruvate, Proc. Natl. Acad. Sci., 93 (1996) 4612–4616 (b) Hütter, J., Carloni,P. and Parrinello, M. Non-empirical calculations on a hydrated RNA duplex, J . Am. Chem. SOC.,18 (1996) 8710-8712; (c) Carloni, P. and Andreoni, W., Platinum-modified nucleobasepairs in the solidstate: A theoretical study, J . Phys. Chem., 100 (1996) 17797-17800; (d) Bernardi, F., Bottoni, A.,Casadio, R., Fariselli, P. and Rigo, A., A b initio study of the dioxygen binding site of hemocyanin: Acomparison between CASSCF, CASPT2, and DFT approaches., Int. J . Quant. Chem., 58 (1996)109-119; (e) Sagnella, D.E., Laasonen, K. and Klein, M.L., A b initio molecular dynamics study ofproton transfer in a polyglycine analog of the ion channel gramicidin A., Biophys. J., 71 (1996)1172–1178; (f) Oldziej, S.and Ciarkowski, J., Mechanism of action of aspartic proteinases: Applicationof transition-state analogue theory., J . Cornput.-Aided Mol. Design, 10 (1996) 583-588; and (g)Bajorath, J., Kraut, J., Li, Z.Q., Kitsoon, D.H. and Hagler, A.T., Theoretical studies on the dihydrofo-latereductase mechanism: Electronic polarization of bound substrates, Proc. Natl. Acad. Sci. USA,

5 . York, D.M.,Lee, T.S. and Yang, W., Quantum mechanical study of aqueous polarization effects on bio-logical macromolecules, J. Am. Chem. Soc., 118 (1996) 10940-10941,and references therein.

6. Tainer, J.A., Getzoff, E.D., Beem, K.M., Richardson, J.S. and Richardson, D.C., Determination andanalysis of the 2 Å structure of copper, zinc superoxide dismutase, J. Mol. Biol., 160 (1982) 181-217.

7. Banci, L., Bertini, I., Bruni, B., Carloni, P.,Luchinat, C.,Mangani, S.,Orioli, P.L., Piccioli, M.,Ripnieski, W. and Wilson, K.S., X-ray, NMR and molecular dynamics studies of reduced bovine super-oxide dismutase: Implications for the mechanism, Biochem. Biophys. Res. Commun., 202 (1994)

McCord, J.M. and Fridovich, I., Superoxide dismutase: An enzymatic function for an erythrocuprein (hemocuprein), J. Biol. Chem., 244 (1969) 6049-6055. Fee, J.A. and Bull, C., Steady state kinetic studies of superoxide dismutases, J. Biol. Chem., 261 (1986)13000-13005.Fee, J.A. and Di Corleto, P.E., Observation on the oxidation-reduction properties of bovine erythrocytesuperoxide dismutase, Biochem., 12 (1973) 4893–4899.Tainer, J.A., Getzoff, E.D., Richardson, J.S. and Richardson, D.C., Structure and mechanism of copper,zinc superoxide dismutase, Nature (London), 306 (1983) 284–287.Valentine, J.S., Pantoliano, M.W., McDonnell, P.J., Burger, A.R. and Lippard, S.J., pH dependent mi- gration of copper (II) to the vacant zinc-binding site of zinc-free bovine erythrocyte superoxide dismu- tase, Proc. Natl. Acad. Sci. USA, 76 (1979) 4245–4249.

13. Banci, L., Bertini, I., Luchinat, C. and Piccioli, M., Spectroscopic studies on Cu2Zn2SOD: A continuous advancement of investigation tools, Coord. Chem. Rev., 100 (1990) 67-103.

14. Fielden, E.M., Roberts, P.B., Bray, R.C., Lowe, D.J., Mautner, G.N., Rotilio, G. and Calabrese, L., Themechanism of action of superoxide dismutase from pulse radiolysis and electron paramagnetic reso-nance, Biochem. J., 139 (1974) 49-60.McAdam, M.E., A pulse-radiolysis study of the manganese-containing superoxide dismutase from bacil-lus stearothermophilus, Biochem. J., 165 (1977) 71-79, and references therein.Morpurgo, L. , Giovagnoli, C. and Rotilio, G . , Studies of the metal sites of copper proteins. Amodel compound f o r the copper site of superoxide dismutase, Biochim. Biophys. Acta, 322 (1973)204-210.Osman, R. and Bash, H., On the mechanism of action of superoxide dismutase: A theoretical study,J. Am. Chem. Soc., 106 (1984) 5710-5714.

4.

88 (1991) 6423-6426.

1088-1095.8.

9.

10.

11.

12.

15.

16.

17.

177

Page 187: 3D QSAR in Drug Design. Vol2

Paolo Carloni and Frank Alber

18. Rosi, M., Sgamellotti, A,, Tarantelli, F., Bertini, I and Luchinat, C., A theoretical investigation of thecopper-superoxide system: A model for the mechanism of copper-zinc superoxide dismutase, Inorg.Chim. Acta, 107 (1985) L21-L22.Rosi, M., Sgamellotti, A, , Tarantelli, F., Bertini, I. and Luchinat, C., A b initio calculations of theCu2+–O-2 interactions as a model fo r the mechanism of copper-zinc superoxide dismutase, Inorg. Chem.,

Banci et al. (reference [7]) have recently proposed that at low physiological substrate concentration thetwo-step mechanism would take place, whereas at higher concentrations the second mechanism wouldtake place.Carloni, P., Blöchl, P.E. and Paninello, M., Electronic structure of the Cu, Zn superoxide dismutaseactive site and its interactions with the substrate, J . Phys. Chem., 99 (1995) 1338-1348.Getzoff, E.D., Tainer, J.A., Stempien, M.M., Bell, G.I. and Hallewell, R.A.. Evolution of CuZn super-oxide dismutases and the Greek-key ß-barrel structural motif, Proteins: Struct. Funct. Gen., 5 (1989)322-366.

23. Perdew, J.P. and Zunger, A,, Self-interaction correction to density-functional approximations for many-electron systems, Phys. Rev. B, 23 (1981) 5048-5079.

24. Ceperley, M. and Alder, B.L., Ground state of the electron gas by a stochastic method, Phys. Rev. Lett.,45 (1980) 566-569.

25. To estimate spin-polarization effects, spin-polarized calculations were also carried out. For a detaileddescription of the spin-polarized effects, see reference [21].

26. Banci, L., Carloni, P., La Penna, G. and Orioli, P., Molecular dynamics studies on superoxide dismutaseand its mutants: The structure and functional role of Arg 143, J. Am. Chem. Soc., 114 (1992) 6994-7001.

27. Elion, G.B., Furman, P.A, Fyfe, J.A., De Miranda, P., Beauchamp, C. and Schaeffer, H.J., Selectivity ofaction of a n antiherpetic agent, 9-(2-hydroxymethyl)guanine5716-5720.Schaeffer, H.J., Beauchamp, C., De Miranda, P., Elion, G.B., Bauer, D.I. and Collins, P., 9-(2-hydroxy-methyl)guanine activity against viruses of the herpes group, Nature, 212 (1978) 583-585. Culver, K.W., Ram, Z., Wallbridge, S.,Ishii, H., Oldfield, E.H. and Blaese, R.M., In vivo gene transferwith retroviral vector-producer cells f o r the treatment of experimental brain tumors, Science, 256

30. Chen, S.-H., Shine, H.D., Goodman, J.C., Grossman, R.G. and Woo, S.L.C., Gene therapy for braintumors: regression of experimental gliomas by adenovirus-mediated gene transfer in vivo, Proc. Natl.Acad. Sci., 91 (1994) 3054-3057.O’Malley, B.W., Jr., Chen, S.-H., Schwartz, M.R. and Woo, S.L.C.,Adenovirus mediated gene therapyfor human head and neck squamous cell cancer in a nude mouse model, Cancer Res., 55 (1995)

Chambers, R., Gillespie, G.Y., Soroceanu, L., Andreansky, S., Chatterjee, S., Chou, J., Roizman, B, and Whitely, R.J., Comparison of genetically engineered herpes simplex viruses for treatment of braintumors in a scid mouse model of human malignant gliome, Proc. Natl. Acad. Sci., 92 (1995) 1411–1415. Vile, R.G. and Hart, I.R., Use of tissue-specific expression of the herpes simplex thymidine kinase gene to inhibit growth of established murine melanomas following direct intratumoral injection of DNA,Cancer Res., 53 (1993) 3860-3864.

34. Caruso, M., Panis, Y ., Gagandeep, S., Houssin, D., Salzmann, J.-L. and Klatzmann, D., Regression ofestablished macroscopic liver metastases after in situ transduction of a suicide gene, Proc. Natl. Acad. Sci., 90 (1993) 7024-7028.Alber, F., Kuonen, O., Scapozza, L., Folkers, G. and Carloni, P., Density functional studies on herpessimplex type I thymidine kinase–substrate interactions: the role of Tyr 172 and Met 128 in thymine fixation, Proteins: Stuct. Funct. Gen., in press.

36. Wild, K., Bohner, T., Aubry, A., Folkers, G. and Schultz, G.E., The three-dimensional structure ofthymidine kinase of herpes simplex type 1, FEBS Lett., 369 (1995) 289-292.

37. Becke, A., Density-functional exchange-energy approximation with correct asymptotic behavior, Phys.Rev. A, 38 (1988) 3098-3100.

19.

25 (1986) 1005-1008.20.

21.

22.

28.

29.

(1992) 1550-1552.

31.

1080-1085.32.

33.

35.

178

Proc. Natl. Acad. Sci., 74 (1977)

Page 188: 3D QSAR in Drug Design. Vol2

Density-Functional Theory Investigations of Enzyme-substrate Interactions

38. Lee, C., Yang, W. and Parr, R.G., Development of the Colle-Salvetti correlation-energy formula into afunctional of the electron density, Phys. Rev. B. 37 (1988) 785-789.

39. Troullier, N. and Martins, J.L., Efficient pseudopotentials for plane-wave calculations, Phys. Rev. B, 43

40. The sulfur atom of Met128 is 4.8 A away from the thymine ring (see Fig. 1). Therefore, it should inprinciple be possible to find sizable polarization effects on the sulfur.

41. Folkers, G., Pilger, B., Alber, F. and Scapozza, L., (in preparation). 42. We have used the parameterization of the GROMOS96 force field: van Gusteren, W.F., Billeter, S.R.,

Eising, A.A., Hünenberger, P.H., Krüger, P., Mark, A.E., Scott, W.R.P. and Tironi, I.G., Biomolecularsimulation: The GROMOS96 manual and user guide , BIOMOS B .V., Biomolecular Software,Laboratory of Physical Chemistry, ETH Zentrum, CH-8092 Zurich, Switzerland, 1996. Car, R. and Parrinello, M., Unified approach for molecular dynamics and density-functional theory,Phys. Rev. Lett.. 55 (1985) 2471–2474.

(1991) 1993-2006.

43.

179

Page 189: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 190: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

Didier RognanDepartment of Pharmacy Swiss Federal Institute of Technology CH-8057Zürich, Switzerland

1. Introduction

The recent growth of recombinant DNA technology (cDNA cloning, southern blotting,PCR) [1] has boosted, i n the last decade, the identification of biological macromoleculesand their expression in purity and quantity adequate for structure determination. About5000 coordinate entries arc today available in the Brookhaven Protein Data Bank [2,3]and more than 10 000 are expected at the turn of the century. Experimentally deter-mined protein structures represent by far the most promising starting point for rationaldrug design. There is, therefore, a need for filling the gap between 3D structures ofbiological targets and potential substrate/inhibitors of those macromolecules. Com-putational chemistry tries to rationalize this gap.

In parallel with the development of modern molecular biology, computational chem-istry methods have also dramatically evolved [4-61. The early static pictures of pharma-cophores [7] or, in the best cases, of protein-ligand complexes [8] have now beensupplemented by dynamic representations of drug-receptor interactions [9]. A majorbreakthrough came in the late 1970s from the application of molecular dynamics (MD)[10], initially developed for fluids [11], to biological macromolecules [12]. Basically,MD can be described by the numerical solution of Newton’s second law of motion(Eq. 1):

(1)

For a molecular system of N particles having a mass mi, atomic positions ri at a time t ,are derived from the gradient of the potential energy V,which is classically obtained bymolecular mechanics. As ri is calculated within short time steps ∆t of 1–2 fs (Eq. 2), thetime history or trajectory of all atoms may be easily monitored and thus give accessto dynamic properties that may be of interest for studying protein structure, folding,catalytic mechanisms and binding processes:

(2)

Early simulations were limited to simplified molecular representations (in vacuo) andvery short time scales (a few ps) [13–14]. However. the combined development of su-percomputer technology, algorithm parallelization [15–16], force fields [17–20] andtime-saving techniques [21] permits nowadays the simulation of complex biologicalsystems for longer time scales (up to a few ns) [22–23], and with higher accuracy [18,24].

It is not our aim in this chapter to review methodological advances [25] or potentialapplications of MD to biology [26]. As both the methodology and input 3D structures

H. Kubinyi et al.eds.), 3D QSAR in Drug Design, Volume 2. 181–209.©1998 KluwerAcademic Publishers. Printed in Great Britian.

Page 191: 3D QSAR in Drug Design. Vol2

Didier Rognan

are becoming accessible to the great majority of the scientific community, we will rather emphasize new aspects in the use of MD simulations: (i) as a tool for the rationalizationof structure-activity relationships, and (i i ) as a promising method for protein structure- based drug design.

2.

The most popular application of MD simulations is conformational sampling, especiallyif experimental constraints (distance, dihedral angles) are explicitly taken into account.Therefore, MD is nowadays an integral part of protein structure refinement usingeither NMR or X-ray diffraction constraints [27,28]. Several techniques aimed at enhancing the conformational hyperspace that can be scanned are described in theliterature [29–33].

In a drug-design protocol, these methods may be very powerful to propose reliableconformations of small molecules in their free state and even to avoid conformationalartefacts given by X-ray diffraction. One simple and useful application is the con-formational analysis of tetra-O-methyl-(+)-catechin (Fig. 1). In the crystalline state, thetwo observed conformations of the benzopyran ring places the dimethoxyphenyl moietyat C2 in an equatorial position [34]. This is in disagreement with proton NMR couplingconstants which suggest an interconversion of axial and equatorial conformations [35].A 4.5 ns MD simulation in vacuo starting from the two crystal structures not only showed the interconversion, but was also able to reproduce the NMR-derived ratiobetween the two populations of axial and equatorial conformations [34].

An efficient conformational sampling of free ligands may also reveal significant dif-ferences, related to their specificity profile for closely related receptors. The con-formational hyperspace accessible to deltorphin C and dynorphin A analogs could, forexample, explain their selectivity for δ, µ [36] and κ opioid receptors [ 37].

3.

3D QSAR models are highly dependent on the alignment of bioactive Conformations [38]. Although the complexity of conformational searching techniques does not necess-

Dynamics of Free Ligands: Efficient Sampling of Conformational Space

Synergistic Use of MD and 3D QSAR

Fig. 1. Chamical structure of tetra-O-methyl-(+)-catechin.

182

Page 192: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

arily correspond to their usefulness in determining bioactive conformations, another po-tential application of MD resides in the conformational analysis of semi-flexible mole-cules prior to pharmacophore mapping. Relevant conformational populations ormolecular properties derived from MD may thus be readily identified and imported intoa QSAR table [39–40].

Alternatively, simulations of protein-ligand complexes may explain some dis-crepancies in 3D QSAR analyses, due to a non-uniform binding mode of tabulatedmolecules and thus identify outliers. One example in our lab concerned the CoMFAstudy of class I MHC-binding nonapeptides [41]. Whatever the parameters used, nomodel was able to explain the poor activity of two compounds (Asn and Tyr analogs;see Fig. 2). Since the free ligands only were taken into account, a single binding modehad to be postulated for the 10 nonapeptides. In fact, this assumption did not depict theprobable situation. When all compounds were directly modelled and simulated in theMHC binding groove, the above-described discrepancy could be easily explained by theprogressive expulsion of the C-terminal Asn/Tyr side chains from a rather hydrophobiclocal subsite, whereas other side chains remained in the binding groove [42]. Very shortMD trajectories (30 ps) were sufficient indeed for clearly distinguishing the two out-liers. After their removal from the analysis, the model was found to be much more pre-dictive (r 2

cv raised from 0.52 to 0.75) [38]. A clear identification of outliers is one of thequality controls that are recommended for evaluating 3D QSAR models [43]. MD simu-lations could be particularly well adapted for this task, notably when several bindingmodes or conformational heterogeneity of the tabulated ligands is suspected.

4.

Protein crystal structures represent one of the most attractive starting points for a ra-tional drug-design procedure. However, X-ray structures may not be accurate enoughfor drug design because: (i) only few informations on the dynamics of the macro-molecule can be derived [44], and (ii) electron density cannot be interpreted due to a toolow resolution [45]. Here are two examples where both reasons were verified, and forwhich the complementary use of MD provided a reliable help.

The first case concerns the enzyme acetylcholinesterase (AChE) whose function i s tohydrolyze acetylcholine in cholinergic nerves. The X-ray structure of AChE fromTorpedo california has been obtained at a resolution of 2.8 Å [46]. Remarkably, theactive site is located far from the enzyme surface (about 20 Å) at the bottom of a deep,narrow gorge. This gorge may function as a cation pump by the combined action of adipole gradient (aligned within the gorge axis) and aromatic side chains delimiting thewalls of the gorge [47]. However, its mechanism of action and notably the high catalyticrate of the enzyme could not be fully explained from the crystal structure alone.Notably, three residual electron density peaks present in the gorge have to be attributedto either water molecules or small cationic species that may drive the substrate entryand fix its bound conformation [48]. MD simulation of AChE in presence of either threewater molecules or three ammonium cations filling the extra electron density provided a plausible explanation. Simulations performed in presence of water molecules yielded to

MD as a Complementary Tool to X-Ray Structure Determination

183

Page 193: 3D QSAR in Drug Design. Vol2

Fig

.2.

Pre

dict

edve

rsus

obse

rved

bioi

ogic

alac

tiviti

es(d

etec

tion

limit

conc

entr

atio

n,in

-log

M,

lend

ing

toth

ely

sis

ofcl

ass

IMH

CH

-2Ld -e

xpre

ssin

gta

rget

cells

byth

e T

cell

clon

e IE

1)

for a

set

of1

0 no

nupe

ptid

es (

Tyr-

Pro

-His

-Phe

-Met

-Pro

-Thr

-Asn

-Xaa

, Xaa

= T

yr, P

he, N

le, L

eu, I

le, M

et, V

al, A

la, A

sn,,

H)

(ref

eren

ce [

41])

184

Page 194: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

altered conformations of active site residues (rms deviations of 1.5 Å), whereas MDwith explicit definition of three small cations led to structures in remarkable agreementwith X-ray diffraction data (rmsd < 0.5 Å) [48]. The combined use of X-ray crystallo-graphy and molecular dynamics simulations clarifies here the dynamical behavior ofAChE. It reveals the transient formation of a short channel through the active site, largeenough for a water molecule [9]. A so-called ‘back-door’ hypothesis was formulated toexplain substrate/product entry/elimination. Although it was supported by the electro- static potential of the enzyme [9], it still has not been fully evidenced by recent site-directed mutagenesis studies [49].

A second possible use of MD to complement experimentally derived structuresapplies to protein crystal structures for which the electron density map of the ligand istoo sparse to ensure an unambiguous definition of its bound conformation. This hap-pened to the X-ray structure determination of class I major histocompatibility encoded(MHC) proteins. purified from natural sources (infected cells, for example). MHC-I pro-teins regulate the immune surveillance of intracellular pathogens by presenting anti-genic peptides to cytotoxic T cells at the surface of infected cells [50]. As free proteinsare unstable and need the presence of a bound peptide to properly fold [51], purificationof MHC molecules is always accompanied by a co-purification of a peptide pool (con-taining up to several hundred peptides) whose electron density cannot be solved [45,52].To propose a bound conformation of natural peptidic ligands as well as MHC-peptideinteractions, we filled the residual electron density map of the HLA-A2.1 protein by a viral antigen, known to be naturally presented by this allele. After 100 ps MD simula-tion in water, a 3D picture of a MHC-peptide complex could be proposed [53]. It wasnicely validated one year later by X-ray crystallography of the HLA-A2.1 protein incomplex with the same peptide [54]. The bound conformation of the peptide was inremarkable agreement with our MD model (Fig. 4), with rms deviations of 1.2 Å onbackbone atoms. Notably, the correct anchoring side chains (at position 2. 3 and 9) werefound in their bioactive conformations (Fig. 3). More discrepancies were observed inthe middle part of the peptide sequence (from positions 4 to 7) which is loosely boundto the protein and bulges out of the binding cleft. However, the MD model was, to thebest of our knowledge, the very first realistic three-dimensional picture of aMHC-peptide complex for which the full atomic coordinates of the bound peptide weredescribed.

5.

The rate of protein cloning/sequencing being by far higher than that of 3D structuredetermination, the majority of macromolecular targets of potential interest for thepharmaceutical community have unknown 3D structures. However, if their primary sequence is close enough to that of a protein for which a NMR or X-ray structure exists,the target protein may be modelled by homology to the existing 3D structure [55]. As a precise function is generally associated with a well-defined fold, a reliable picture of theactive site may be constructed by this technique. However, if dynamical insights intothe protein function are needed, one is getting into trouble. The main problem resides in

Simulating Truncated Active Sites in a Virtual Fluid

185

Page 195: 3D QSAR in Drug Design. Vol2

Fig

.3.

X-r

ayve

rsus

MD

mod

el o

fthe

HLA

-A2.

1-bo

und

conf

orm

atio

nof

aT

cell

epito

pe(G

ILG

FVF

TL)f

rom

the

influ

enza

viru

sm

atri

xpr

otei

n.B

ackb

one

atom

sof

the

sim

ulat

ed H

LA-A

2 ha

ve b

een

fitte

d to

the

crys

tal s

truc

ture

and

are

dis

play

ed fo

r sa

ke o

f cla

rity

. The

MD

and

cry

stal

str

uctu

re o

f the

bou

nd n

onap

eptid

e ar

e sh

own

as g

ray

and

whi

te b

alla

ndst

icks

resp

ectiv

ely.

pepi

depo

sitio

ns a

re la

bele

d at

the

ato

ms,

from

the

N-te

rmin

us (P

1) to

the

C-te

rmin

us (P

9).

186

Page 196: 3D QSAR in Drug Design. Vol2

Molecular Dynanmics Simulations: A Tool for Drug Design

Fig. 4. Schematic display of a truncated active site, surrounded by a pseudo-particle fluid. The particles lacated in the outer shell (rl < r <r2) are harmonically constrained, whereas the particles of the inner shell(r < rI) and the truncated active site are able to freely move.

building the outer part of the active site, for which less homology to the crystal tern-plate(s) is usually observed. A work-around is to construct a pseudo-receptor model

may reproduce as closely as possible the real interaction capacities of the full protein.However, the flexibility of ligand-receptor interactions is not really taken into accountby pseudoreceptor models.

To enhance this representation, we have developed an MD method within theGROMOS program [57], able to simulate free amino acids of an active site surroundedby a pseudo-particle fluid reproducing the behavior of the outer amino acids and thesolvent (Fig. 4) [58]. The properties of the particle (radius, charge, dipole moment) may be chosen in order to reproduce either a hydrophobic or hydrophilic proteinenvironment [51] (Eqs. 3–4):

(3)

(4)

187

[56], a collection of individual amino acids connected by artificial bonds or spacers, that

Page 197: 3D QSAR in Drug Design. Vol2

Didier Rognan

The potential energy V between two particles depend on their nature. Simple Lennard-Jones particles interact only through van der Waals dispersion forces (VLJ ; Eq. 3) as afunction of the interacting distance r, their diameter (about 6 Å) and the potential wellvalue ε at the optimal interaction distance. They reproduce pure hydrophobic environ- ments. More sophisticated dipole particles were created to simulate polar outer parts.They consist of two pseudoatoms i, j of opposite charge q connected by a very short bond. Their dipole moment (1.44 D) was slightly larger than that of an N-H bond.

Several protein structures (adenylate kinase, retinol binding protein, HIV-1 protease,HLA-B27 human leukocyte antigen) for which a crystal structure exists could be prettywell reproduced by simulating the truncated active site-ligand complexes in a virtualfluid. Positional rms deviations, atomic fluctuations, radius of gyration and hydrogen-bonding patterns were close to that of the parent crystal structure, and nearly as accurateas the corresponding values obtained after standard MD simulation of the fully solvatedcomplex [58]. The method was able not only to reproduce experimentally determinedprotein-ligand complexes, but could furthermore well explain the effect of proteinmutation at the active site, or rationalize the different binding affinities of relatedligands to the same macromolecule [59]. The main advantage of the method are: (i) itcombines the rapidity of in vacuo simulations with the accuracy of computations in afull water environment, and (ii) it takes into account the only part of the protein (activesite) that can be reliably derived by homology modelling.

6.

One of the most exiting applications of MD to biology is the computation of free energydifferences that may be related to the experiment [60]. Any biological process can be

As the free energy G isa state function (∆G of a closed cycle is zero), and the free energy change ∆G describ-ing a system in equilibrium is path-independent. experimentally determined free energychanges (horizontal processes corresponding to biological equilibria; Fig. 5) can be di-rectly compared to computed free energy changes (vertical processes corresponding totheoretical equilibria: Fig. 5). In practice, the starting state ‘0’ is progressively con-verted into the final one ‘1’,by modifying a state-dependent parameter λ from 0 to 1 atdiscrete intervals in λ. Free energy differences ∆G are generally calculated from the po-tential energy function V λ(i) by the free energy perturbation technique (Eqs. 5 and 6) orthe thermodynamic integration method (Eq. 7):

Quantitative Analysis of Ligand Binding

(5)

(6)

(7)

The major drawbacks of free energy calculations arc: (i) an experimentally deter-mined protein structure of high accuracy is recommended as a starting point: (ii) it is a

188

formulated by the corresponding thermodynamic cycle (Fig.5)

Page 198: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

Fig. 5. Thermodynamic cycle for the formation of a protein (P)-ligand (L) complex. Horizontal equilibrium values are measured experimentally (free energy or binding of ligands L1, L2 to the protein P; ∆G1bind,∆G2bind), whereas vertical equilibria can only he computed (free energy change upon ligand mutation inwater or in the protein-bound state; ∆Gsol, ∆Gprot), Experimentally determined free energy differences(∆G1bind – ∆G2bind) are equal to computed ones (∆Gsol– ∆Gprot ). lf L2 is a dummy molecular the free energydifference relates to the absolute free energy of association of ligand L1 to protein P.

time-consuming method as free energy changes need to be calculated for the free and bound states, in forward and backward directions: ( i i i ) i t necessitates an exhaustive con-formational sampling within each perturbation window for all intermediate statesbetween the starting and final one; and (iv) it better applies to amino acid mutations forwhich force-field parameters are generally best derived. This means that mutating asmall molecular weight inhibitor into an analog needs, first of all, an accurate para-meterization of the two organic molecules.

In the literature, numerous examples are provided for computing free energy changesupon protein and/or ligand mutation (for a recent review see [61]). The advantage of themethod is that i t allows a clear distinction of entropy and enthalpy contribution uponligand binding [62,63] that may be used in a lead optimization program, for example.Many applications are focused on solvation free energies of small molecules [6,17], orhost-guest interaction complexes [64,65]. Quantifying protein-ligand interactions ismuch more complex. Generally, free energy changes upon protein mutation is con-sidered to study the protein function and mechanism of action [66,67]. However, some reports denotes that it may be applied with reasonable success to the quantitative

189

Page 199: 3D QSAR in Drug Design. Vol2

Didier Rognan

analysis of ligand binding [68–71], notably for determining its stereospecificity [72,73]or optimizing its solubility and bioavailability [74,75].

One challenging problem is the computation of absolute free energies, like thefree energy of association of a ligand to a protein [61,76], and for that task, ‘doubleannihilation techniques’ [77] have to be used. That means that the ligand is progres-sively mutated to ‘nothing’ (dummy atoms), both in solution and in the protein-bound state. The major difficulty that is encountered is to relate the drastic chemicalmodification of the ligand to a rather large absolute free energy (about 10-15 kcal/mol). For two different protein-ligand complexes (boitin/streptavidin, N-acetyltrypto-phanamide/ α-chymoptrypsin), absolute ∆∆G binding values were overestimated by3-4 kcal/mol but remained in qualitative good agreement with experimental values[76], even if the free energy of association is very high (18 kcal/mol for the biotin-streptavidin complex). Whether ∆∆G changes may be decomposed into van der Waalsand electrostatic contributions (a very important parameter for lead optimization) is stilla matter of debate [78–79] as free energy decomposition is path-dependent [78] andindividual components converge very slowly [62].

Another interesting aspect of free energy calculations is the prediction of cavityhydrations at the protein-ligand interfaces. At least two water-mediated H-bonds wereshown to be energetically necessary for hydrating a protein-ligand interfacial cavity[80]. A recent survey of 19 high-resolution protein X-ray structures provided even moredrastic requirements as 80% of water molecules bridging protein-ligand interactionswere involved in three or more H-bonds [81 ]. Taking into account accurate positions ofprotein-bound water molecules clearly accelerates and fastens ligand design, as recentlyexemplified by the protein-based design of thymidilate synthase [82] and HIV- 1 pro-tease inhibitors [83].

7.

MD models are not aimed at replacing experimentally determined three-dimensionalstructures. In the very best cases, using time-consuming protocols. rms deviations ofabout 1 Å from the starting crystal structure have to be expected [18,24].However, suchaccurate representations are not very realistic in a drug-design protocol. A drug-designcycle such as that proposed by Blundell [84] can only be successful if it relies on aninteractive multidisciplinary approach that provides quick answers and feeback .Obviously. ns simulations of a series of protein-ligand complexes are not compatiblewith this rule, and would delay the prioritization of ligand synthesis and testing.

However, this does not mean that MD simulations of a macromolecular target with aset of related ligands cannot be very useful if one is interested in qualitative aspects ofligand design. MD models present the advantage to propose instantaneous or time-averaged molecular properties that can clearly discriminate high-affinity from low-affinity ligands [85] and thus may either explain [85,86] or predict binding properties [87,88]. The major difficulty of this approach is to find the best compromise between an

the field wewere interested in (antigen recognition by class I major histocompatibility proteins), and

Qualitative Analysis of Ligand Binding

190

acceptable accuracy level and a reliable force-field representation [89] . In

Page 200: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

which will be developed in that section, 200 ps simulation of fully solvated protein-ligand complexes were sufficient to depict predictive models. The most informativemolecular properties of the hundred MHC–ligands complexes we have simulated todate will be briefly reviewed.

7.1.

Starting from a basic assumption — the more bound, the less flexible — we tried torelate the binding affinity of a set of six bacterial peptides to a single class I MHCprotein (Table 1).

Atomic fluctuations of the bound peptides, averaged per residue, were thus computedfrom energy-minimized structures that had previously been averaged over 500 con-formations, for each complex (Fig. 6a) . It is important to recall that all complexes weresimulated using identical conditions, starting from the same protein X-ray structure andthe same peptide conformation (with preservation of the χ1 , χ2 angles for side chains)[85]. Interestingly, the C-terminal amino acid (position 9) of the two low-affinity pep- tides 4, 5 (Table 1 ) was highly flexible i n the protein-bound state. As this position wasknown to be a dominant anchor to the MHC protein [90],we could reasonably relate theweak affinity of the corresponding two ligands to a probable dissociation of theC-terminal residue from the binding groove, illustrated here by its higher atomic flex-ibility. In about 90% of all MHC-peptide complexes simulated in our group, this mole-cular property was always in qualitative agreement with the observed binding affinity ofthe corresponding ligand. The atomic mobility of each peptide amino acid could be wellrelated to its binding role. Strong anchoring positions (P1, P2, P3 and P9; and Pnmeaning position n of the peptide) were generally much less flexible that weak anchor-ing positions (P4 to P8, see Fig. 6a). Furthermore, amino acid flexibility was used as aguide to enhance the free enthalpy of binding of designed non-natural peptides to the

Atomic positional fluctuations of the bound ligand

Table 1

Peptide no. Sequence Binding affinitya

1 RRIKAITLKb n.d.c2 QRLKEAAEK +

3 RRKAMFEDI ++

4 ERLAKLSGG –5 LRDAYTDML –6 RRKAMFED +

ª The affinity for HLA-B*2705 w a s indirectly measured by an epilope stabilization assay which titrates

Binding of bacterial peptides to HLA-B*2705 (reference [85])

the ability of the ligand to induce the native fold of the HLA protein, that can be recognized by aconformation-specific monoclonal antibody (reference [91]) (++;binding affinity lower than 1 µM ;+; binding affinity between 1 and 20 µM; a n d –:n o delectable binding). The six peptides correspond tosequences of the 57 kD hcat shock protein (GroEL) of Chlamydia trachomatis.

determination of HLA-B27 (reference [52]).b Peptide model fitted in the extra electron density map produced by self-nonapeptiecs, in the crystal structure

c Not determined.

191

Page 201: 3D QSAR in Drug Design. Vol2

Didier Rognan

Fig. 6. Root mean square atomic fluctuations of a protein-bound ligand as an indicator of the protein–ligandcomplex stability. A, atomic flexibility of six HLA-B*2705-bond peptides 1–6 (Table 1), showing variablebinding affinities for the HLA protein. Fluctuations were calculated from energy-minimized conformations(1000 steps of steepest descent followed by 1000 steps of conjugate gradient energy minimization), time-aver- aged over the last 500 conformations [85]. B, atomic flexibility of a series of HLA-B*2705 bound nonapep-tides 7–11 for which a secondary anchor position (P3) has been varied (Table 2).

192

Page 202: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

Table 2 design of nonnatural HLA-B27 ligands from a bacterial epitope (reference [87])

Peptide no. Peptide sequenceP1- P2- P3- P4-P5- P6- P7- P8- P9Lys-Arg-Xaa-Ilc-Asp-Lys-Ala-Ala-Lys

Binding affinity

7 Glyª +8 Leu +9 Hpab ++

10 Anac ++11 Bnad ++

ª Peptide 7 is a natural squence (117– 125) of the Chlamydia trachomotis GroEL protein (reference [91]).

b Hpa, homophenylalanine. c Ana. α-naphthylalanine.d Bna, β-naphthtylalanine.

Binding affinities were measured as described in Table 1.

same MHC protein, HLA-B*2705 [87,88]. We substituted bulky hydrophobic residues and β-naphthylalanine, homophenylalanine; see Table 2) for the amino acid of a

natural peptidic ligand at a secondary anchor position (P3). As additional interactions to the protein were desired, a better binding of the designed analogs should result in a de-creased mobility of the mutated peptide position. After simulating the natural as well as the nonnatural peptides in complex with the host HLA protein, the lowered flexibilityof the non-natural amino acid at P3 (Fig. 6b) led us to predict an increased binding of ligands 9–11 when compared to the two natural peptides 7, 8. Binding assays [87], aswell as thermodynamic analysis of the complex denaturation by CD spectroscopy (Fig. 7), were in perfect agreement with the above predictions, as non-natural ligands9–11 were about 50-fold more potent binders than natural peptides 7–8 [87].

A very similar strategy was used for designing an optimal linker for trivalent throm- bin inhibitors [ 93]. The atomic fluctuations of the linker part. derived from 150 ps MD simulations of several solvated thrombin–inhibitor complexes were also in good agree- ment with its contribution to -thrombin binding. In both cases, looking at atomicpositional fluctuations of bound ligands provides a crude but predictive estimate of the stability of protein-ligand complexes.

7.2.

These molecular properties are generally correlated to the atomic flexibility of the bound ligand (the more flexible the less buried). In the two examples cited above,buried surface areas could also be well related to binding affinities [85,87]. Notably. for the set of non-natural analogs 9–11, designed to increase the free enthalpy of binding to the protein (Table 2), the amount or buried surface area for the mutated position was inperfect agreement with the size of the corresponding side chain (indicating that almost the entire part of the side chain was buried upon binding to the complementary hydrophobic pocket) and, consequently, with the binding affinity (Fig. 8a).

Accessible versus buried surface areas

193

Page 203: 3D QSAR in Drug Design. Vol2

Fig

.7.

Ther

mal

den

atur

atio

n of

the

com

plex

bet

wee

n H

LA-B

*270

5 an

d tw

o re

late

d lig

ands

7, 1

0.M

eltin

g te

mpe

ratu

res (

Tm) o

fthe

com

plex

es w

ith p

eptid

es 7

and

10

hav

e be

en e

stim

ated

to 4

8ºC

and

57º

C,r

espe

ctiv

ely.

The

con

trib

utio

n of

the

non-

natu

ralc

hain

to th

e fr

ee e

nerg

y of

bin

ding

may

be

estim

ated

from

Tm

valu

es to

2 k

cal/m

ol (r

efer

ence

[92]

).

194

Page 204: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

Fig. 8. Accessible/buried surface areas as indicators of the stability of protein-ligand complexes [87]. A,Buried surface areas of five HLA-B27 bound peptides (ligands 7–11, Table 2), calculated on relaxed time-averaged conformations (for the last 500 conformers, between 100 and 150 ps of simulation). Surface areaswere calculated for each peptide position, using the MSprogram [94] with a 1.4 Å radius probe.B, Substitution of organic spacers for a pentapeptide core sequence (ligands 12–16, table 3) of a bacterialpeptide. Buried and accessible surface areas were measured as in (A) for the whole ligand.

195

Page 205: 3D QSAR in Drug Design. Vol2

Didier Rognan

Table 3 Introduction of organic spacers in the sequence of a class I MHC ligand ( Gln-Arg-Leu-Spacer-Lys ) (reference [87])

Peptide no. Spacer Binding affinity

12 Ly-Glu-Ala-Ala-Gluª +13 Aba-Aba-Abab +14 Aha-Ahac +15 Auad +16 Adae +

ª Escherichia coli dnaK (220-228) epitope [91]. b Aba, 4-amino-butyric acid.c Aha, 6-amino-hexanoic acid.d Aua, 11-amino-undecanoic acid.e Ada, 12-amino-dodecanoic acid.

Binding affinities were measured as described in Table 1.

The predictability of that molecular property was verified in another design attemptaimed at simplifying the nonapeptide structure of natural HLA ligands [87]. Startingfrom a bacterial peptide known to bind to HLA-B27, we have substituted variousorganic spacers for a pentapeptide core sequence (from positions 4 to 8; see Table 3),suspected to interact with a T cell receptor (TcR) and not the HLA protein [52].Basically, we wished to simplify as much as possible the peptidie structure of the ligandwithout impairing its affinity for the host MHC protein; and at the same time, try to alterTcR recognition. In terms of desired molecular properties, this means that the totalburied surface area of the designed ligands should remain equal to that of the parentpeptide (implying a conserved binding to the HLA protein), but the total accessiblesurface area should be considerably decreased (implying a reduced binding to the TcR).After simulating the natural as well as the designed ligands in complex with HLA-B27,we computed accessible and buried surface areas for ligands 12–16 in the bound state(Table 3). It clearly appeared that our design goal was perfectly reached, as allMHC-peptide pairs were predicted to be equally stable from the analysis of atomic tra- jectories. The surface area accessible to water (or a putative TcR) had been significantly reduced by 50% for all ligands bearing the designed organic spaccrs (Fig. 8b). However, this should not preclude for a tight binding to the HLA-B27 molecule because the total surface area buried upon MHC binding. for each non-natural analog. was similar to that of the natural peptidie ligand (about 600 Ų). All ligands were synthe-sized and tested in an in vitro binding assay. As predicted, the introduction of organic spacers did not alter the binding affinity for the HLA-B27 protein, as very similarbinding affinities could be observed [ 87,95]. The ‘spacer effect’ was independent on the parent peptide sequence as similar modifications on three different T cell epitopes led tothe same computational and experimental results [D. Rognan and J.A. López de Castro, unpublished results]. These molecules are the very first class I MHC ligands for which half of the canonical peptide structure (P4–P8) have been successfully replaced byorganic spacers, and represent the first rational step towards nonpeptide TcR partialagonists or antagonists.

196

Page 206: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

7.3. Protein-ligand non-bonded distances

Monitoring the time course of critical topological features (non-bonded distances,angles) is often used to analyze trajectories of protein-ligand complexes [96–100]. Theregio- and(or) stereoselectivity of hydroxylation of nicotine and several camphor deriv-atives by cytochrome P450cam could have been predicted with a good accuracy bysimply looking at the non-bonded distances between substrate carbon atoms and theferryl oxygen intermediate of the heme moiety [73,96,97]. Examination of some keydistances was also used to study the deacylation enantioselectivity of acylenzymes [97].Such analyses are best suited for studying enzyme–substrate interactions for which awell-defined topology of a few atoms in the active site is often associated with a precisebiochemical event.

For enzyme–inhibitor complexes or when the macromolecule–ligand interactionsurface is very broad (e.g. MHC-peptide complexes), one cannot restrict the trajectoryanalysis to a few atom-centered non-bonded distances. For examining the finespecificity of antigen binding to two class I HLA alleles differing by only one aminoacid (Table 4), we have used a slightly different approach [86], in which key distanceswere not measured between atoms, but center of masses (cmass). The distance betweenprotein and peptide cmass remained constant for the most stable pairs 17a,b–18a,b,whereas it was still increasing after 200 ps MD for the less stable complexes 19a,b (seeFig. 9 and Table 4). Computing inter-cmass distances between individual MHC-bindingamino acids and their complementary pocket allowed the identification of the peptidepart (position 9) that was progressively expelled from the binding groove. The analysiswas greatly facilitated by the fact that the peptide sequence could be well separated intoa protein-binding substructure (P1–P2–P3 and P9) and a non-interacting part (P4-P5-P6-P7-P8). Furthermore, each MHC-binding amino acid develops allele-specific interactions with complementary pockets of the MHC proteins. Therefore, thisapproach is particularly well suited to examine protein-peptide interactions and monitorthe binding contribution of each peptide amino acid.

7.4. Protein-ligand hydrogen-bonds

The distribution of protein-ligand H-bonds is also a good indicator of the complexstability. Very often, a pure quantitative H-bond analysis based on distance-anglecriteria [102,103] is performed on time-averaged structures [85,93,99,100]. For rational-izing subtle structure-activity relationships for the set of peptides 17–19 (Table 4), wecombined a quantitative and qualitative analysis of intermolecular H-bonding by com-puting the frequency of all protein-ligand H-bonds [86] (Fig. 10).The distribution ofstrong and medium H-bonds correlates well with the binding potency of the peptide. Asimilar number of strong H-bonds were found for complexes 17a, 17b, 18a and 18b,consistent with the similar binding efficiencies of peptides 17 and 18 to both MHCalleles. At the opposite, a reduced number of medium and(or) strong H-bonds (peptide19 in complex with the two alleles) correlates with the decreased binding of thispeptide. The weakest binding potency (peptide 19 to B*2703) could be qualitatively and

197

Page 207: 3D QSAR in Drug Design. Vol2

Fig

. 9.

Tim

e co

urse

of t

he d

ista

nce

betw

een

the

cent

er o

f mas

s of p

eptid

es 1

7–19

and

that

of t

he h

ost H

LA-B

27 s

itht~

pe (

a, B

*270

3; b

, B*2

705)

. Bin

ding

affi

nitii

es o

fth

e th

ree

ligan

ds fo

r th

e tw

o al

lele

s ar

e lis

ted

in T

able

4.

198

Page 208: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

Table 4 Distance in Å, between proteins (HLA-B*2703, HLA-B*2705) and peptide center of masses (refer-ence [86])

ª Human histone H3 peptide: a self peptide, naturally hound to HLA-B*2705 [90] and B*2703

b Binding data arc expressed as the micromolar excess of peptide analog relative to the wild-type peptide 17,(reference [101]).

at which HLA-B27 fluorescence (measured by FMC analysis with an anti-B27 monoclonal antibody) onRMA-S cells was half the maximum obtained with the wild-type peptide.

c d1, protein/peptide; d2, protein/MHC-anchors (P1-P3, P9); d3, protein/TcR-anchors (P4–P8); d4, pocketA/P1; d5, pocket B/P2; d6, pocket D/P3; d7, pocket F/P9.

quantitatively explained by the low number of strong hydrogen-bonds. Interestingly, not only the number but also the quality of the MHC–ligand interactions (strong versus medium H-bond) correlates well with the binding potency.

7.5. Protein–ligand non-bonded contacts

Reporting the number of non-bonded contacts between the host protein and a series, a related ligand affords a more general survey of intermolecular contacts. In our design attempt to simplify and enhance the binding of natural T cell epitopes to the HLA-B*2705 protein, we substituted two organic polymers for the pentapeptide sequence (from position 4 to 8) of the bacterial nonapeptide 12 (Table 5) . After parameterizationof the organic polymer for the AMBER force field [104] and MD simulation of the cor-responding three complexes, we predicted an increased binding of the tetramer analog 20 and a decreased binding affinity of the trimeric spacer 21 with respect to the naturalnonapeptide [89], by simply comparing the number of protein-ligand non-bonded con- tacts closer than 4 Å (Fig. 11). Interestingly, the 200 ps trajectories were long enough topredict an enhanced binding role of the tetrameric polymer to the central part of the HLA binding groove (increased number of non-bonded contacts with the spacer), whereas the reduced binding of the trimeric compound was attributed to its short length and the perturbation of the interaction of the N-terminal part (less contacts to the very important P1–P3 peptide sequence, in spite of a high number of interactions in the

199

Page 209: 3D QSAR in Drug Design. Vol2

Fig.

10.

MH

C–p

eptid

e hy

drog

en-b

ondi

ng fr

eque

ncy

for p

eptid

es 1

7–19

in c

ompl

ex w

ith H

LA-B

*270

3 (a

) or H

LA-B

*270

5 (b

). Bi

ndin

g af

finiti

es o

f the

thre

e lig

ands

fo

rth

etw

oal

lele

sar

e lis

ted

inTa

ble

4.H

-bon

dsha

vebe

enhe

rege

omet

rical

lyde

fined

byan

acc

epto

r(A

)to

dono

r(D

)di

stanc

ele

ssth

an3.

25Å

and

aD

–H...

.Aan

gle

high

er th

an 1

20 d

eg. I

nter

actio

ns w

ere

satis

tical

ly m

onito

red

thro

ugho

ut th

e si

mul

atio

ns fo

r a to

tal o

f 400

con

form

atio

ns p

er M

HC

–pep

tide

com

plex

. Tw

o ca

tego

ries

of H

-bon

ds w

ere

defin

ed: s

tron

g on

es w

ith fr

eque

ncie

s hig

her t

han

50%

, and

med

ium

one

s with

occ

urre

nces

bet

wee

n 25

% a

nd 5

0%.

200

Page 210: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

Table 5 Incorporation of two organic polymers in the sequence of a class I MHC ligand (Gln-Arg-Leu-Spacer-Lys) (reference [89])

Peptide no. Spacer

12 Lys Glu-Ala-Ala-Gluª

20 (R)4b

21 (R)3

ª Escherichia coli dnaK (220–228) epitope (reference [91] ).b The structure of the organic moiety R will he published elsewhere.

spacing part). The number of non-bonded contacts was in good agreement with the aposteriori observed binding results for the three compounds. When compared to thenatural peptide ligand 12, the tetrameric and trimeric compounds exhibit a 5-fold increase and a 4-fold decrease in HLA-B*2705 binding, respectively [J.A. López de Castro. personal communication.

8. Ligand Design and Docking

In the past five years, tremendous research efforts have been devoted to computer algorithms able to optimize the de novo design of ligands from the knowledge of protein three-dimensional structures [4,105]. One of the major drawbacks of fragment build-upprocedures (one of the main techniques used in de novo drug design, with 3D databasesearching) is the ‘irreversible’ nature of the ligand-design procedure. Once a fragment(substructure) complementary to one part of the active site has been designed. its struc- ture cannot generally be perturbed (e.g. ring closure or opening) during the ongoing ligand design, in order to optimize non-bonded interactions of the final hit with the receptor active site. This problem was addressed by coupling ligand building to MD in two closely related algorithms [106,107] in which the protein active site is filled with either particles or fragments that can be covalently linked or separated in a stochastic and dynamically reversible manner. While the particle-based approach is only adequate for building apolar ligands, the fragment-based procedure takes into account electro-static contributions to the protein–ligand binding energy. Both procedures, tested on the same set of two high-resolution crystal structures (HIV1-protease, FKBP-12), suggestedinhibitors closely related to existing ligands. Unfortunately, this procedure is notoptimally suited for pharmaceutical purposes, because of the complexity of the struc-tures that are generated (none of the hits proposed in the original papers had been synthesized to test the validity of the method), and the computing effort that is askedfor.

Flexible docking of small molecules to known three-dimensional protein structures will be the last application of MD to drug-design techniques that will be reviewed here. Although recently described algorithms can relatively well handle the ligand flexibility problem [108–110], the potential flexibility of the active site still remains an open ques- tion. Generally, this problem can be addressed only by saving different conformationsor rotameric states of the protein active site and use these coordinates as starting points

201

Page 211: 3D QSAR in Drug Design. Vol2

Fig

.11

.Su

bstit

utio

nof

orga

nic

po1y

mer

s(19

,20)

for t

hena

tura

l pen

tape

ptid

e co

re(L

ys-G

lu-A

la-A

la-G

lu)

ofan

HLA

-B27

-bin

ding

bact

eria

lepi

tope

(pep

tide

12.

Tabl

e 5)

. PN

, P2,

P3

and

PC

rep

rese

nts

the

N-t

erm

inal

, sec

ond,

thir

d an

d C

-ter

min

al r

esid

ues,

res

tivel

y. I

nter

mol

ecul

ar n

on-b

onde

d co

ntac

ts w

ere

sum

med

up

for

the

spac

ing

grou

p (S

pace

r). N

on-b

onde

d co

ntac

ts b

etw

een

each

liga

nd r

esid

ue a

nd a

ny p

rote

in a

tom

clo

ser

than

4 Å

wer

e ca

lcul

ated

from

tim

e-av

erag

ed c

on-

form

atio

n ba

sed

on 2

00 p

s M

D s

imul

atio

ns in

a w

ater

she

ll (r

efer

ence

[89

]).

202

Page 212: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

for parallel docking attempts. The only described methodology that can intrinsicallyprovide a reliable solution is a molecular dynamics docking technique [111], in whichthe motion of the cmass of the ligand is separated from its internal/rotational motion andfrom that of the receptor by three different coupling to thermal baths. When applied tothe flexible docking of phosphocholine to the frozen McPC603 antibody structure, thecmass of the ligand was shown to explore a very wide conformational space (about0.8 mn²) and to land after 20 ps simulation in a crystal-like situation (rmsd of 0.5 Å forsubstrate skeleton atoms ) [111]. No attempt to dock a flexible ligand into an un-constrained protein was reported, but the method appears to be very promising to depictlocal geometry modifications of the active site upon docking of a flexible ligand.

9. Concluding Remarks

Up to very recently, molecular dynamics simulations of protein(DNA)-ligand complexeshad not been very popular in lead finding and optimization for two major reasons. Thefirst one was the high computing effort that was required, which precluded for the com-parison of a set of related compounds in their bound states. The second one was the low number of 3D structures for macromolecular targets of interest. The situation has dramati-cally evolved in the last two or three years, as MD simulations are nowadays feasible foreven larger and larger molecular systems, at relatively low cpu time costs, on affordablecomputer platforms [112]. The variety of all MD applications reviewed in this chapter is particularly well adapted to everyday drug-design problems and makes from that com-puting method a very powerful tool for the comparative and qualitative analysis ofprotein-ligand structures that may go beyond available experimental data.

However, it should be recalled that: (i) MD models will neither substitute for experi-mentally determined structures, (ii) MD is not the only potent conformational samplingtechnique [113] and (iii) the successful application of molecular dynamics simulationsin a drug-discovery program absolutely needs a strong and permanent feedback to theexperiment.

Acknowledgements

I wish to thank Professor Gerd Folkers (ETH Zürich) for critical reading of the manu-script, and Arthur for being so quiet at home during the writing period. The financialsupport of the Schweizerischer Nationalfonds zur Förderung der wissenschaftlichen Förschung (Project No. 31-45504.95), as well as the allocation of cpu time on the CRAY-J90 and the INTEL Paragon by the calculation center of the ETHZ, is sincerelyacknowledged.

References

1. Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular cloning: A laboratory manual, 2nd Ed., ColdSpring Harbor Laboratory Press, 1989.

2. Bernstein, F.C., Koetzle, T.F., Williams. G.J.B., Meyer, E.F., Jr., Brice, M.D.,Rodgers, J.M.,Kennard,O., Shimanouchi, T. and Tasumi, M., The protein data bank: a computer-based archival life for mocro- molecular structures, J. Mol. Biol., 112 (1977) 535–542.

203

Page 213: 3D QSAR in Drug Design. Vol2

Didier Roqnan

3. The current status (18 December 1996) of the PDB is: 5504 coordinate entries (5096 proteins, 396 nucleic acids, 12 carbohydrates).

4. Lybrand, T P., Ligand-protein docking and rational drug design, Curr. Opin. Struct. Biol., 5 (1995)224–228.

5. Bamborough, P. and Cohen, F.E., Modeling protein-ligand complexes, Curr. Opin. Struct. Biol., 6 (1996) 236–241

6. Brooks. C.L., III. and Case, D.A., Simulations of peptide conformational dynamics and thermodynamics,Chem. Rev., 93 (1993) 2487-2502.

7. Marshall, G.R., Barry, C.D., Bosshard, R.A., Dammkoehler, R.A. and Dunn, D.A., The conformationalparameters in drug design: The active analog approach, In Olson, E.C. and Christoffersen, R.E. (Eds.)

8. Goodford. P.J., A computational procedure for determining energetically favorable binding sites onbiologically important macromoleculars, J. Med. Chem., 28 (1985) 849–857.

9. Gilson, M.K., Straatsma, T.P., McCammon, J.A., Ripoll, D.R., Faerman, C.H., Axelsen, P.H., Silman,I. and Sussmann, J.L., Open ‘back door’ in a molecular dynamics simulation of acetylcholinesterase,Science, 263 (1994) 1276-1278.

10. Alder. B.J. and Wainwright. T.E., Studies in molecular dynamics: I. General methods, J. Chem. Phys., 31 (1959).459–466.

11. Rahman, A., Correlations in the motion of atoms in liquid argon, Phys. Rev. 136 (1964) 405–411. 12. McCammon, J.A., Gelin, B.R. and Karplus, M., Dynamics of folded proteins, Nature. 267 (1977)

13. McCammon, J.A., Wolynes, P.G. and Karplus, M., Picosecond dynamics of tyrosine side chains in

14. van Gunsteren, W.F. and Karplus, M., Effect of contraints, solvent and crystal environment on protein

15. Plimpton, S. and Hendrickson, B., A new parallel method for molecular dynamics simulation of macro-

16. Lim, K.T., Buinett, M., Iotov, M., McClurg, K.B., Vaideli, N., Dasgupta, S., Taylor, N. and Goddaard,

585-590.

proteins, Biochemistry, 18 (1979) 927–42.

dynamics, Nature, 293 (1981) 677-678.

molecular systems, J. Comput. Chem., 17 (1996) 326–337.

program, J . Comput. Chem., 18 (1997)501–52117. Cornell, W.D., Cieplak, P., Bayly, C.I., Gould. I.R., Merz, Jr., K.M., Ferguson, D.M., Spellmeyer, D.M.,

Fox, T., Caldwell, J.W. and Kollman, P.A., A second generation force field for the simulation ofproteins, nucleic acids, and organic molecules. J. Am. Chem. Soc., 117 (1995) 5179–5197.

18. Kitson, D H., Avbelj. F., Moult, J., Nguyen, D.T., Mertz, J.E., Hadzi, D. and Hagler, A.T., On achieving better than 1 Å accuracy in a simulation of a large protein: Steptomyces griseus protease A, Proc. Natl.Acad. Sci. USA., 90 (1993) 8920-8924.

19. Halgren, T., Merck molecular force-field: I. Basics. the scope parameterization and performance of MMFF94, J . Comput. Chem., 17 (1996)490–511.

20. Jorgensen, W.L., Maxwell, D.S and Tirade-Rives, J., Development and testing of the OPLS all-atomforce-field on conformational energetics and properties of organic liquids, J. Am. Chem. Soc.,118(1996) 11225-11236.

21. van Gunsteren, w., Computer simulation of biomolecular systems: Overview of time-saving techniques,AIP Conf. Proc., 239 (1991) 131–146.

22. Brunne. R.M., Berndt. K.D., Guntert, P., Wuthrich, K. and van Gunsteren, W.F., Structure and internal dynamics of the bovine pancreatic trypsin inhibitor in aqueous solution form long-time molecular dynamics simulations, Proteins: Struct. Funct. Genet., 23 (1995) 49–62.

23. Weerasinghe, S., Smith. P.E. and Pettitt, B.M., Structure and stability of a model pyrimidine–purine–purine DNA triple helix with a GC.T mismatch by simulation, Biochemistry. 34 (1995)16269–78.

24. York, D.M., Vlodawer, A., Pedersen, L.G. and Darden. T.A., Atomic-level accuracy in simulations oflarge protein crystals, Proc. Natl. Acad. Sci. USA., 91 (1994) 8715–8718.

25. Elber, R.E., Novel methods for molecular dynamics simulations, Current Opin. Struct. Biol., 6 (1996) 232–235.

204

III. W.R., Molecular dynamics for very large systems on massively parallel computer: The MPSim

Computer-assisted drug design, ACS Symp. series, Vol. 112, American Chemical Society, WashingtonDC., 1979, pp. 205–226.

Page 214: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Toolfor Drug Design

26. van Gunsteren, W.F. and Berendsen, H.J., Computer simulation of molecular dynamics: methology,

27. Brunger, A.T. and Karplus, M., Molecular dynamics simulations with experimental restraints, Acc.

28. Brunger, A.T., X-PLOR: A system for cystallography and NMR, Manual version 3.1. Yale University

29. Kirkpatrick. S., Gelatti, C.D., Jr. and Vecchi, M.P., Optimisation by simulated annealing, Science,

30. Amadei, A., Linssen, A.B. and Berendsen, H.J., Essential dynamics of proteins, Proteins: Struct. Funct.

31. Huber, T., Torda, A.E. and van Gunsteren, W.F., Local elevation: A method for improving the searching

32. Simmerling, C. and Elber, RE., Hydrophobic collapse in a cyclic hexapeptide computer simulation of

33. Jagannadh, B., Kunwar, A.C., Thangavelu, R.P. and Osawa, E., N e w techniquefor conformational sam-

applications, and perspectives in chemistry, Angew. Chem., Int. Ed. Engl., 29 ( 1 990) 992–1023.

Chem. Res., 24 (1991) 54–61.

Press, New Haven, CT.

220 (1983) 671–680.

Genet., 17 (1993) 412–425.

properties of molecular dynamics simulations, J. Comput.-Aided. Mol. Design., 8 (1994) 694–708.

CHDLFC and CAAAAC in water, J. Am. Chem. Soc., 116 (1994) 2534–2547.

pling of Cyclic molecules using the AMBER force field: Application to 18-crown-6 , J. Phys. Chem.,100 (1996) 14339–14342.

34. Fronczek, F.R., Hemingway, R.W., McGraw, C.W., Steyuberg, J.P., Helfer, C.A. and Mattice, W.L., Crystal structure, conformational analysis, and molecular dynamics of tetra-O-methyl-(+)-catechin.Biopolymers. 33 (1993)275–282.

35. Porter, L.J., Wong, R.Y., Benson, M., Chang. B.G., Wiswanadhan, V.N., Gandour, R.D. and Mattice.W.L., conformational analysis of flavans: Proton NMR and molecular mechanical (MM2) studies of the benzopyran ring of 3´,4´,5´,7´-tetrahydroxyflavan-3-ols: The crystal and molecular structure of the pro-cyanidin (2R, 3S, 4R)-3´,4´,5,7-tetramethoxy-4(-2,4,6-trimethoxyphenyl)-flavan-3-ol, J. Chem. Res., 3 (1986) 86–87.

36. Bryant, S.D., Attila, M., Salvadori, S., Guerrini, R . and Lazarus, L.H., Molecular dynamics conforma-

37. Collins, S. and Hruby, V.J., Prediction of the conformational requirements for binding to the Kappa-apioid receptor and its subtypes: I. Novel alpha-helical cyclic peptides and their role in receptor selec- tivity, Biopolymers, 34 (1994) 1231–1241.

38. Folkers, G., Merz, 4. and Rognan, D., CoMFA: Scope and limitations, In Kubinyi, H., (Ed.) 3D-QSAR: Theory. methods and applications. ESCOM Science Publishers B V., Leiden. The Netherlands, 1993,pp.583-618.

39. Hopfinger, A .J.and Kawakami, Y., QSAR analysis of a set of benzothiopyranoindazole anti-cancer analogs based upon their DNA interaction properties as determined by molecular dynamics simula- tion, Anti-Cancer Drug Design, 7 (1992) 203–17.

40. Langer, T. and Wermuth, C.G., Inhibitors of prolyl endopeptidase Characterization of the pharma-cophoric pattern using conformational analysis and 3D-QSAR, J. Coinput.-Aided Mol. Design. 7 (1993)253–262.

41. Rognan, D., Reddehase, M.J., Koszinowski, U.H. and Folkers, G., Molecular modeling of an antigeniccomplex between a viral peptide and a class I major histocompatibility glycoprotein, Proteins: Struct. Funct. Genet., 13 (1992) 70–85.

42. Folkers, G., Merz., A. and Rognan, D., CoMFA as a tool for active site modelling, In Wennuth. C.G.(Ed.) Trends in QSAR and molecular modelling, ESCOM Science Publishers. Leiden. The Netherlands, 1993, pp. 233–244.

43. Thibaut, U., Folkers, G., Klehe, G., Kubinyi, H., Merz, A. and Rognan, D., Recommendations forCoMFA studies and 3D QSAR publications, Quant. Struct.-Act. Relat., 13 (1994) 1.

44. Ringe, D. and Petsko, G.A., Mapping protein dynamics by X-ray diffraction, Prog. Biophys. Mol. Biol.,

45. Bjorkman, P.J.J., Saper, M.A., Samraoui, B., Bennet, W.S., Strominger, J.L. and Wiley, D.C., Structureof the human class I histocompability antigen. HLA-A2, Nature, 329 (1987) 506–512.

46. Sussman, J.L., Harel, M., Frolow, W., Oefner, C., Goldman, A., Toker, L. and Silman, I., Atomic struc-ture of acetylcholinesterase from Torpedo calfornica: A prototypic acetylcholine-binding protein, Science, 253 (1991)872–879.

tions of deltorphin analogues advocate delta opioid binding site models, Pept. Res. 7 (1994) 175–184.

45 (1985) 197–235.

205

Page 215: 3D QSAR in Drug Design. Vol2

Didier Rognan

47. Ripoll, D.R., Faerman, C.H., Axelsen, P.H., Simman, I . and Sussman, J.L , An electrostatic mechanismfor substrate guidance down the aromatic gorge of acetylcholinesterase, Proc. Natl. Acad. Sci. USA.,

48. Axelsen, P.H., HareI. M., Silman, I. and Sussman, J.. Structure and dynamics of the active site gorge of90 (1993) 5128–5132.

acetylcholinesterase SynergisticSci., 3 (1994) 188–197.

49. Faerman, C., Ripoll, D., Bon. S., Le Feuvre., Y , Morel, N., Massoulie, J., Sussman, J.L. and Silman, I.,Site-directed mutanis s des signed to test the back-door hypothesis of acetylcholinesterase function, FEBSLett., 3 (1996) 65–71.

50. Heemels, M.T. and Ploegh, H.L , Generation, translocation and presentation of MHC class I-restrictedpeptides, Annu. Rev. Biochem., 64 (1995)643–691.

51. Townsend, A., Ohlen. C., Bastin, J.. Ljunggren, H.G., Foster, L. and Karre, K., Association of class Imajor histocompatibility heavy and light chains induced by viral peptides, Nature, 340 ( 1989) 443–448.

52. Madden, D.R., Gorga, J.C., Strominger. J.L. and Wiley, D.C., The structure of HLA-B27 revealsnonamer self-peptides bound in an extended conformation,Nature, 353 (1991) 321–325.

53. Rognan, D., Zimmermann, N., Jung, G. and Folkers, G., Molecular dynamics study of a complexbetween the human histoconpability antigen HLA-A2 and the IMP58-66 nonapeptide from infleunzaprotein,virus matrix Eur. J . Bochem., 208 (1992) 101–113.

54. Madden, D.R. Garboezi, D.N. and Wiley, D.C., The antigenic identity of peptide/MHC complexes: A comparison of the conformations of five viral peptides presented by HLA-A2, Cell. 75 (1993) 693–708

55. Blundell, T., Sibanda, B.L., Sternberg, M.J.E. and Thornton. J.M., Knowledge-based prediction ofprotein structures and the design of novel molecules, Nature. 326 (1987) 347–352.

56. Vedani, A., Zbinden. P. and Snyder. J.P., Pseudo-receptor modeling: A new concept for the three-dimensional construction of receptor binding sites. J. Recept. Res., 13 (1993) 163–177.

57. van Gunsteren, W.F. and Berendsen. H.J .C., Groningen molecular simulation (GROMOS) librarymanual, Biomos, Groningen.

58. Kern, P., Brunne, R.M., Rognan, D. and Folkers, G., A pseudo-particle approach for studying protein-ligand models truncated to their active sites, Biopolymers, 38 (1996) 619–637.

59. Kern, P . , Rognan, D. and Folkers, G., MD simulations in pseudo-particle fluids: Applications to active-site protein complexes, Quant. Struct.-Act. Relat., 14 (1995) 229–241.

60. Bash, P.A., Singh, U.C., Langridge. R. and Kollman, PA., Free energy calculation by computersimulation, Science, 236 (1987) 564–568.

61. Kollman, PA., Advances and continuing challenges in achieving realistic and predictive simulation of the properties of organic and biological molecules Acc. Chem. Res., 29 ( 1996) 461–469.

62. Pealman, D.A. and Connelly, P.R., Determination of the differential effects of hydrogen bonding and water release on the binding of FK506 to native and Tyr82 Phe82 FKBP-12 proteins using free energy simulations, J . Mol. Biol., 248 (1995) 696–717.

63. Miyamoto, S. and Kollman, P.A., Absolute and relative binding free energy calculations of the inter-action of biotin and its analogs with streptavidin using molecular dynamics/free energy perturbationapproaches, Proteins: Struct., Funct., Genet., I6 (1993) 226–245.

use of molecular dynamics simulation and X-ray crytallography, Prot.

64. Bayly, C.I. and Kollmann, P.A., Molecular dynamics and free energy calculations on the peculiar bimodal alkali ion selectivity of an 8-subunit cavitand, J. Am. Chem. Soc. 116 (1994) 697–703.

65. Branda, N., Wyler, R. and Rebek, J., Jr., Encapsulation of methane and other small molecular in a self-

,66. Zacharias, M., Straatsma, T.P., McCammon J.A. and Quiocho. F.A., Inversion of receptor bindingpreferences by mutagenesis: Free energy thermodynamic integration studies on sugar binding toL-arabinose binding,Biochemistry. 32 (1093) 7428–7434.

67. Oroczo, M., Tirado-Rives. J. and Jorgensen, W.L., Mechanism for the rotamase activity of FK506binding protein from molecular dynamic simulations, Biochemistry, 32 (1993) 12864–12874.

68. Lau, F.T. and Karplus, M. Molecular recognition in proteins: Simulation analysis of substraye bindingby a tyrosyl-tRNA synthetase mutant, J. Mol. Biol., 236 ( 1994) 1049–1066.

69. Reddy, M.R., Varney, M.D., Kalish, V., Viswanadhan, V.N. and Appelt, K. Calculation of relative dif- ferences in the binding free energies of HIV1 protease inhibitors: A thermodynamic cycle perturbationcapproach,Med. Chem., 37 (1994) 1145–1152.

assembling superstructure, Science, 263 (1994) 1267–1 268.

206

Page 216: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

70. Singh, S.B., Wemmer, D.E. and Kollman, P.A., Relative binding affinities of distamycin and its analogto d(CGCAAGTTGGC).d(GCCAACTTGCG): Comparison of simulation results with experiment,Proc. Natl. Acad. Sci. USA., 91 (1991) 7673–7677.

71. Wlodek, S.T., Antosiewiez, J., McCammon, J.A., Straatsina. T.P., Gilson, M.K., Briggs, J.M., Humblet,C. and Sussman, J.L., Binding of tacrine and 6-chlorotacrine by acetycholinesterase, Biopolymers, 38 (1996) 109–117.

72. Reddy, M.R., Viswanadhan, V.N. and Weinstein, J.N., Relative differences in the binding free energiesof human immunodeficiency virus I protease inhibitors: A thermodynamic cycle-perturbation approach,Proc. Natl. Acad. Sci., 88 (1991) 10287–10291.

73. Jones, J.P., Trager, W.F. and Carlson. T.J., The binding and regioselectivity of reaction of (R)- and(S)-nicotine with cytochrome P-450cam: Parallel experimental and theoretical studies, J. Am. Chem, Soc., 115 (1993): 381–387.

74. Gillner, M., Bergman. J.. Cambillau, C., Alexandersson, M. and Fernstrom, B., Interactions ofindolo[3,2 -b]carbazoles and related polycyclic aromatic hydrocarbons with specific binding sites for2,3,7,8-tetrachlorodibenzo-p-dioxin in rat liver,Mol. Pharmacol., 44 (1993) 336–45.

75. Rao, B.G., Kim, E.E. and Murcko, M.A., Calculation of solvation and binding free energy differencesbetween VX-478 and i ts analogs by free energy perturbation and AMSOL methods, J. Comput.-AidedMol. Design, 10(1996) 23–30.

76. Miyamoto, S. and Kollnian, P.A., What determines the strength of noncovalent association of ligands to proteins in aqueous solution?, Proc. Natl, Acad. Sci. USA, 90 (1993) 8402–8406.

77. Jorgensen, W.L., Buckner, J.K., Boudon, S. and Tirado-Rives, J ., Efficient computation of absolute free energies of binding by computer simulation: Application to the methane dimer in water, J. Chem, Phys., 89 (1988) 3741–3746.

78. Mark, A. and van Gunsteren, W.F., Decomposition of the free energy of a system in terms of specificinteractions: Implication for theoretical and experimental studies, J. Mol. Biol., 240 (1994)167–176.

79. Boresch, S., Achonitis, S. and Karplus M., Free energy simulations: The meaning of the individualcontributions from a component analysis. Proteins: Struct., Funct. Genet.. 20 (1994) 25-33.

80. Helms, V. and Wade. R.C., Thermodynamics of water mediating protein-ligand interactions incytochrome P450cam: A molecular dynamics study. Biophys. J., 69 (1995) 810–824.

81. Poornima, C.S. and Dean. P.M. Hydration in drug design: 1. Multiple hydrogen-bonding features of water molecules in mediating protein-ligand interactions, J. Comput.-Aided Mol. Design. 9 (1995) 500–512.

82. Appelt, K., Bacquet, R.J., Bartlett. C.A., Booth. C.L., Freer, S.T., Fuhry, MA., Gehring, M.R., Herrman, S.M., Howland, E.F., Janson, C.A., Jones, T.R., Kan, C.-C., Kathardekar, V.,Lewis. K.K., Marzoni, G.P., Matthews, D.A., Mohr, C., Moomaw, E.W., Morse, C.A., Oatley, S.J.,Ogden, R.C., Reddy, M.R., Reich, S.H., Schoettlin, W.S., Smith. W.W., Varney. M.D., Villafranca, J.E.,Ward, R.W., Webber, S., Webher. S.E., Welsh. K.M. and White, J., Design of enzyme inhibitors using

83. Lam. P.Y., Jadhav, P.K., Eyermann, C.J., Hodge. C.N., Ru, Y., Bacheler, L.T., Meek. J.L., Otto. M.J.,Rayner, M.M., Wong, Y.N., Chang, C.H., Webber, P.C., Jackson. D.A., sharpe, T.R. and Erickson-Viltanen, S., Rational design of potent, bioavailable, nonpeptide cyclic ureas as HIV protease inhibitors.Science, 263 (1994) 380–384.

84. Blundell, T., Hubbard, R. and Weiss, MA., Structural biology and diabetes mellitus: Molecular patho- genesis and rational drug design,Diabetologia, 35 (1992)69–76.

85. Rognan, D., Scapozza, L., Folkers, G. and Daser, A., Molecular dynamics simulation ofMHC-peptide complexes as a tool for predicting potential T cell epitopes, Biochemistry, 33 (1994) 11476–11485.

86. Rognan, D., Krebs, S., Kuonen, 0.. Lamas, J.R., López de Castro, J.A. and Folkers, G., Fine sepcificityof antigen for two class I major histocompability protein Alleles (B 2705 and B 2703) differing in one amino acid, J . Comput.-Aided Mol. Design, 11 (1997) 463–478.

87. Kognan, D., Scapozza, L., Folkers, G . and Daser, A., Rational design of nonnatural peptides as high-affinity ligands for the HLA-B*2705 human leukoeyte antigen, Proc. Natl. Acad. Sci. USA, 92 (1995) 753–757.

iterative protein crystallographic analysis . J. Med. Chem., 7 (1991) 1925–1934.

207

Page 217: 3D QSAR in Drug Design. Vol2

Didier Rognan

88. Scopozza, L., Rognan, D., Folkers, G. and Daser, A. Molecular dynamics and structure-based drugdesign for predicting non-natural nonapeptide binding to a class I MHC protein, Acta Cryst., D51 (1995) 541–549.

89. Rognan, D., Molecular modeling of protein-peptide complexes: Application to major histocompatibilityproteins, Habilitationschrift. Eidgenossiche Technische Hochschule (ETH), Zurich, 1997.

90. Jardetzky, T.S., Lane, W.S., Robinson, R.A., Madden. D.R. and Wiley, D.C., Identification of self- peptides bound to purified HLA-B27, Nature, 353 (1991) 326–329.

91. Daser, A., Henning, U. and Henklein, P., HLA-B27 binding peptides derived from the 57kD heut shackprotein of Chlamydia trachomatis: Novel insights into the peptide binding rules, Mol. Immunol.,31 (1994)331-336.

92. Bouvier, M. and Wiley, DC., Importance of peptide amino and carboxy termini to the stabiliiy of MHCclass I molecules. Science, 265 (1994) 398–402.

93. Szewezuk, Z., Gibbs, B.F., Yue, S.H., Purisima, E., Zdanov, A., Cygler, M. and Konishi, Y., Design of a linker for trivalent thrombin inhibitors: Interaction of the main chain of the linker with thrombin,Biochemistry, 32, (1993) 3396–3404.

94. Connolly, M.J., Analytical molecular surface calculation, J. Appl. Crystallogr., 16 (1983) 548-558.95. Rognan, D. AS and immunotherapy: Future considerations, In, Lopez-Larrea. C. (Ed.) HLA-B27 in the

development of spondyloarthropathies, R.G. Landes Co., Georgetown, MA, 1997, pp, 235–251.96. M.D, Paulsen and Ornstein R.L. Predicting the product specificity and coupling of cytochrome

P450cam, J . Cornput.-Aided Mol. Design, 6 (1992) 449-460.97. Bass. M.B. and Orustein. R.L., Substrate specificity of cytochrome P450cam for L- and D- norcamphor

as studied by molecular dynamics simulations, J. Comput. Chem., 14 (1993) 541–548.98. Bemis, G.W., Carlson-Golab, G. and Katzenellenbogen, J.A., A molecular dynamics study of the stabil-

ity of chymotrypsin acyl enzymes, J . Ani. Chem. Soc., 114 (1992) 570–578.99. Taylor. N.R. and von Itzstein, M., Molecular modeling studies on Iigand binding to sialidase from

infleunza virus and thw mechanism of catalysis, J. Med. Chem., 34 (1994) 616–624.100. Helms, V., Deprez, E., Gill, E.. Barret, C., Hui Bon Hoa, G. and Wade. R.C., Improved binding of

Cytochrome P450cam substrates analogues designed to fill extra space in the substrate binding pocket,Biochemistry, 35 (1996) 1485–1499.

101. Boisgérault, F., Tieng, V., Stolzenberg. M.C, Dulphy. N., Khalil, I., Tamouza, R., Charron, D. and Toubert, A., Differences in endogenous peptides presented by HLA-B*2705 and B*2703 allelic variants: Implications for susceptibility to spondyloarthropathies, J. Clin. Invert., 98 (1996) 2764–2770.

102. Klebe, G., The use of composite crystal-field environments in molecular recognition and the de novodesign of protein ligands, J. Mol. Biol., 237 (1994) 212–235.

103. Mills, J.E.J. and Dean. P.M., Three-dimensional hydrogen-bond geometry and propability information from a crystal survey, J. Comput.-Aided Mol. Design, 10(1996) 607–622.

104. Pearlman, D.A., Case, D.A., Caldwell, J.C., Ross, W.S., Cheatham, T.E., III, Ferguson, D.E., Seibel,G.L., Singh, C., Weiner, P.K. and Kollmann, PA., AMBER 4. 1. University of California, San Francisco, CA, U.S.A., 1995.

105. Lewis. R.A. and Leach, A.R., Current methods for site-directedstructure generation, J. Comput.-AidedMol. Design, 8 (1994) 467–475.

106. Pearlman, D.A. and Murcko. M.A., CONCEPTS: New dynamic algorithm for de novo drug suggestion,J. Comput. Chem., 14 (1993) 1184–1193.

107. Pearlman, D.A. and Murcko. M.A., CONCERTS: Dynamic connection of fragments as an approach tode novo ligand design, J . Med. Chem., 39 (1996) 1651–1663.

108. Sandak, B., Nussinov, R. and Wolfson, H.J., An automated computer vision and robotics-based tech- nique for 3-D flexible biomolecular docking and matching. Comput. Appl. Biosci., 11 (1995) 87–99.

109. Rarey M., Kramer B., Lengauer T. and Klebe G., A fast flexible docking method using an incrementalconstruction algorithm, J. Mol. Biol., 261 (1996) 470–489.

110. Morris, G.M., Goodsell, R., Huey, R. and Olson. A.J., Distributed automatic docking of flexible ligandsto proteins: parallel applications of AutoDock 2.4, J. Comput.-Aided Mol, Design, 4 (1996) 293–304.

111. Di Nola. A., Roccatano, D. and Berendsen,H.J.C., Molecular dynamics simulation of the docking of sub- strates to proteins. Proteins: Struct., Funct. Genet., 19 (1994) 174–182.

208

Page 218: 3D QSAR in Drug Design. Vol2

Molecular Dynamics Simulations: A Tool for Drug Design

112. Tirado-Rives, J. and Jorgensen, W.L., Viability of molecular modeling with pentium-based Pcs,

113. Jorgensen, W.L. and Tirado-Rives, J., Monte Carlo vs molecular dynamics for conformational sampling, J. Comput. Chem., 17 (1996) 1385–1386.

J. Phys. Chem., 100 (1996) 14508–14513.

209

Page 219: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 220: 3D QSAR in Drug Design. Vol2

Part III

Pharmacophore Modellingand Molecular Similarity

Page 221: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 222: 3D QSAR in Drug Design. Vol2

Bioisosterism and Molecular Diversity

Robert D. Clark*, Allan M. Ferguson, and Richard D. CramerTripos, Inc., 1699 S. Hanley Road, St. Louis, MO 63144, U.S.A.

1. Introduction

Assessing quantitatively the similarity (or dissimilarity) between one molecule andanother is central to both QSAR and molecular diversity analysis. When either endeavoris involved in pharmaceutical or agrochemical lead discovery and development, theonly similarity that ‘really’ matters is similarity in biochemical (and/or physiological)properties. Yet, by delinition, biochemical similarity cannot itself be measured beforecompounds are actually in hand, so some other way must be found to identify function-alities which generally ‘look’ similar to receptor and enzymatic ligand binding sites,even though they do not share substructural motifs — i.e. which are bioisosteric [1].

A cartographic metaphor [2] for the discovery and development of biologically activesmall molecules is useful in thinking about the relationship between QSAR and diver-sity analysis. As so ably described in chapters elsewhere in this volume, QSAR involvesusing compounds of known biochemical activity to find a combination of descriptorswhich can be used reliably to predict the activity of related compounds which have notyet been synthesized and to help better understand the underlying biochemistry. Hence,QSAR is fundamentally interpolative, a matter of surveying islands once their locationis known to identify the highest peak on the island.

Molecular diversity analysis, in contrast, entails surveying an ocean to find out wherethe islands of activity are — and where they probably are not. Descriptor values forproperties which might be useful as charting reference points generally range far morewidely than would be appropriate for a QSAR study, making diversity analysis an es-sentially extrapolative exercise which is always looking to the horizon. The key isfinding a descriptor ‘lens’ through which the different kinds of islands are well-separated, clearly defined and easy to distinguish.

Computational chemists have quite naturally looked to QSAR as a source of ‘good’descriptors for quantitatively evaluating bioisoteric similarity. The classical whole-molecule (logP, MR, etc.) and fragment (π, σp, F, etc.) QSAR parameters developed byHansch and Fujita [ 3 ] ,among others, have been employed for this purpose [4,5], buthave a limited range of values and tend to be highly correlated with one another. whichmakes it difficult to capture satisfactorily the diversity of large structural datasets.

Steric, electrostatic and hydrophobic molecular fields have also been very success- fully employed in QSAR via Comparative Molecular Field Analysis [6–8].Extensionof CoMFA to diversity analysis is particularly appealing because the information inmolecular fields is localized and of high dimensionality .Binding sites in enzymes and receptors tend to present quite variegated interaction surfaces to their ligand. Therefore,

* To whom correspondence should be addressed.

H. Kibinyi et al (eds.), 3D QSAR in Drug Design, Volume2. 213–224.© 1998 Kluwer Academic Publishers. Printed in Great Britain.

Page 223: 3D QSAR in Drug Design. Vol2

Robert D. Clark,Allan M. Ferguson, mid Richard D. Cramer

it is reasonable to expect that descriptors like CoMFA fields, which can differentiatebetween molecules in many different ways simultaneously, stand a better chance of‘capturing’ biochemically meaningful distinctions. This is one explanation for the cor-relation between the dimensionality of descriptors and their suitability as diversitydescriptors [9].

An added appeal of CoMFA is its ability to generalize across substitution patternsand atom types. This occurs because differences between fields are quantitative but sens-itive to the structural context, whereas differences between atoms are qualitative.Hence, the field for a methylamino group is ‘between, those of an ethyl group and amethoxy group, and the field near a ketone resembles that around a difluoromethylene.Since an enzyme or receptor binding site interacts with each ligand’s molecular orbitals,not with its atomic nuclei, i t is reasonable to expect that the molecular fields involvedshould correlate well with biological activity. The success of CoMFA as a tool forderiving quantitative structure-activity relationships [7]] is a testament to the validity ofthis hypothesis.

2. Theoretical Considerations

2.1. Information density

In general, the databases with which diversity analysis deals nearly always originatefrom molecular connectivity specifications alone (e.g. SMILES or SLN strings [10])which are essentially two dimensional. Several excellent computer programs existwhich will convert such connectivity data into 3D molecular graphs which can, in turn,be submitted to minimization routines to identify low-energy conformations. None ofthese operations actually adds any information to that contained in the original con-nectivity, and recent work indicates that this absence of added information is very realin many situations [11]. How, indeed, can any ‘derived’ 3D descriptors such as mole-cular fields add anything to diversity analysis over that which can be obtained from 2Dfingerprints?

One way is by efficiently utilizing information about chirality which is embedded inthe line notation, either explicitly or implicitly. Explicitly specified chirality encodeshigher-order (‘2.5D’) information which is not captured in 2D fingerprints, but which iscaptured in 3D structures and, thence, in molecular fields. Certain implicit ‘chiral’ infor-mation can also be captured directly. such as the sensible disposition of equatorial andaxial substituents on symmetrical, achiral molecules.

Pharmacophore multiplet fingerprints [12] attempt to do this by including the dis-tance between analogous groups. but even quite restricted pharmacophoric classesrequire hundreds of thousands of bits to encode a small part of the information con-tained in a molecular field built from a few thousand lattice points. Moreover, in ourhands, the most important information in such fingerprints is best appreciated by visualinspection: compounds with similar biochemical properties tend to form similar patterns in the pharmacophore multiplet space, a relationship which often fails to show up inquantitative bitwise comparisons.

214

Page 224: 3D QSAR in Drug Design. Vol2

Bioisosterism and Molecualr Diversity

Incorporating pharmacophore definitions into 2D fingerprints can capture some ofthis information effectively, but cannot capture subtle biochemical effects (distinctionsbetween amide and carbamate nitrogens or similarities between pyrimidine analogs, forexample) except as special cases. Molecular fields, on the other hand, can generalizeefficiently across a range of structurally diverse but biologically equivalent functionalgroups — i .e. bioisosteres.

2.2. Encoding conformational information

CoMFA is an attractive way to compare molecules, but each conformer of any given mol-ecule has a different molecular field. Most small molecules of interest as potential drugsarc flexible and, hence, have a fairly large number of distinct conformations lying within areasonable energy range of the ground state. Indeed. solvent effects and necessaryapproximations in making energetics calculations can make identification of ‘the’ groundstate conformation problematic at best, and a purely theoretical exercise at worst.

One way to approach this dilemma is to average the molecular field across a large number of energetically reasonable conformations. For compounds which have only afew related but distinctive low-energy conformations, this approach will certainlyenhance the information content of the molecular field as a descriptor. Unfortunately,such molecules are the exception, not the rule, and averaging across conformersis likely to blur meaningful distinctions between molecules at least as often as itilluminates significant similarities.

Alternatively, one can finesse the issue by identifying that conformation for one mole-cule which produces a molecular field most similar to that of another to which it is beingcompared [13]. Such field fitting can become computationally intractable, however.unless one molecule is fixed in a reference conformation. Given a reference molecule andconformation, this approach becomes a variation on the theme of fitting probe molecules to some enzyme or receptor binding site model. Though docking methods are indeedpowerful tools for cases where a known, relatively rigid binding site structure is known,it is not very useful in designing libraries for general lead discovery —i.e. when targeted against a ‘universal receptor,. Related work using a panel of binding-sites, whethervirtual [14]or realized [15], may prove more generally useful, however.

2.3. Characteristic confomations

A third alternative, which is the one we have found most fruitful, is to ‘project’ the four-dimensional conformational space into three dimensions by using robust, rule-based al-gorithms to generate characteristic conformations. Doing so brings the fields for similarand homologous substructures into register, which enhances the ability to detect similar-ity and, at the same time, enhances field differences between functionally distinct substructures.

Moreover, suitably formulated rules make it possible to encode the very potentialfor flexibility into molecular fields. Consider two compounds, one cyclic and oneopen chain, but otherwise of similar structure. Even if the nominal ground-state

215

Page 225: 3D QSAR in Drug Design. Vol2

Robert D. Clark, Allan M . Ferguson, and Richard D. Cramer

conformations of the two are identical, their potentials as ligands are likely to be verydifferent when compared across a range of receptors. A 3D building rule which setsbond torsions to all trans wherever reasonable makes the corresponding molecular fieldsbecome readily distinguishable.

Characteristic conformations also serve to emphasize similarities between molecules which may be attenuated or lost altogether when averaged across conformers. Biaryl ethers are a case in point. The aryl planes in these compounds are characteristically tilted somewhat from the perpendicular with respect to one another, so that each low- energy conformer is asymmetrical about the central oxygen. Yet energy differencesbetween the various torsions about the ether bridge and rotations about each ether bondare negligible. If ‘true’ energy minima were used for CoMFA, differences between mol-ecules would be scattered across the entire field. Choosing an arbitrary but consistentstarting orientation focuses differences onto specific substructures, so that a consistentQSAR can be obtained [e.g. 16]. Characteristic conformations, such as those obtainedfrom CONCORD [17], bring an analogous benefit as a starting point for diversity

3. Topomeric CoMFA

Even where knowledge of the ‘true’ conformational ground state is available, there arean infinite number of energetically equivalent rotations and translations that could beapplied to one molecular field with respect to another when making a comparisonbetween the two. In cases where a series of compounds include a common core, this‘alignment problem’ can be dealt with by superimposing that common core.

The core atom bearing the substituent (core atom) defines the origin of the field. andthe bond between the core and the fragment establishes the positive pole of the x-axis.The positive-positive quadrant of the xy plane is defined by the core atom, the fragmentatom it is bonded to (the valence atom) and the first atom of the biggest substituent onthe valence atom [18]. A characteristic conformation is then generated by working outfrom the common core, applying a series of rules to establish characteristic torsionaland chirality states as one goes.

Such rules become more ambiguous as one moves away from the core, so field con-tributions from each atom in a substituent are attenuated by a factor of 0.85n, where n is the number of rotatable bonds separating that atom from the core.

Topomeric alignments for several trichoroethyl derivatives are shown in the top rowof structures in Fig. 1 . Note that dimethylamino and isopropyl groups are isosteric inthis context, whereas 2-propenyl is more similar to the thiomethyl congener. In theseries of trichloroacyl derivatives shown at the bottom of Fig. 1, on the other hand, di-methylamino is isosteric with 2-propenyl, which reflects conjugation with the acyldouble bond. This dependence on context is a particularly useful result or using therule-based CONCORD for initial 3D model building. In some cases, this requires atrade-off between useful distinctions between molecules and arguably artificial ones —compare the alignments of the isopentane (top) and the isopropyl ketone (bottom) inFig. 1, for example. As discussed below with respect to validation. however, this is nota great problem for diversity analysis.

216

analysis.

Page 226: 3D QSAR in Drug Design. Vol2

Fig

.1.

Top

omer

ic a

lignm

ent o

f a se

ries

of t

rich

loro

ethy

l der

ivat

ives

(top

) and

thei

r tri

chlo

roac

etyl

cou

nter

part

s (bo

ttom

). Al

l ana

logs

are

supe

rim

pose

d at

the

lef t

ofea

ch r

ow, H

eter

oato

ms

are

labe

led

in th

e su

bstiu

ents

but

om

itted

from

the

com

mon

cor

e st

ruct

ures

CO

NC

ORD

[17

] str

uctu

res

wer

e al

igne

d us

ing

Sele

ctor

in

SYBY

L 6

.3 [

20].

217

Page 227: 3D QSAR in Drug Design. Vol2

Robert D. Clark, Allan M.Ferguson, and Richard D. Cramer

A somewhat related method for optimizing the diversity of a set of side chains hasbeen described by Chapman [19], Here. similarity is summed across pairwise distancesbetween each atom in the ‘query’ and all atoms in the target molecule. A ‘soft thresh-old’ weighting function is used, and each of a set of ‘diverse’ conformers is considered.

3.1. Applications

Topomeric CoMFA is used to cluster each class of reactant (e.g. nucleophile. elec-trophilic reagent, dienophile, etc.) whenever a new class of reaction is considered for in-clusion in the Optiverse screening library [2,20] .Selection of a reagent from eachcluster gives a representative diverse set of reagents which carry on through subsequentstages of the design. Candidate products are then screened for diversity on the basis oftheir 2D fingerprints. This use of both metries together exploits their complementarityand enhances the isobiosteric diversity of the database [2].

Similarity searches can now be run in the ChemSpace virtual library [20] based ontopomeric fields. This capability is proving itself very useful in identifying novel iso- steric substituents for ongoing development programs. In addition, topomeric CoMFAhas been used to evaluate disparate structures as potential diverse cores or scaffolds forcombinatorial libraries, with quite satisfying results (R.D. Cramer, unpublished).

4. Inertial Field Orientation(IFO-CoMFA)

In many series of interest, compounds lack an identifiable common core. In some com-binatorial libraries, a nominal common core exists. but serves only as a linker betweenfunctional moieties. In either case, an alternative t o topomeric alignments is required ifmolecular fields are to be of use as diversity metrics. To this end, inertial field orienta-tion (IFO-CoMFA) was introduced into Selector [20] for SYBYL 6.3 [21].

Alignments based on molecular moments of inertia have been successfully used topredict HPLC retention times of polynuclear aromatic hydrocarbons (PAHs) [22]. Inthis approach, the field coordinate origin is set to the molecular center of mass, and thatmolecular axis about which the moment of inertia is smallest defines the x axis for thefield. They axis is defined by that molecular axis perpendicular to the first which hasthe smallest moment of inertia.

That implementation of the technique is not directly applicable to diversity analysis,however. For one thing, there is no good way to differentiate one end of the x axis fromthe other, or one end of the y axis from the other. As a result, each molecule has fourequivalent inertial alignments. This did not interfere with the analyses by Welsh et al.because they were working with flat, symmetrical molecules and highly isotropic typesof interactions distributed across the entire molecular surface [22,23].

A second problem is that inertial field orientations which incorporate atomic massesare overly sensitive to replacement of hydrogen by fluorine. chlorine or bromine.Halogenation, or exchange of one halogen for another, has a tremendous effect onmolecular moments of inertia. Hence. such homologous substitutions will intro-duce unreasonably large CoMFA dissimilarities if weighted moments arc used fororientation.

218

Page 228: 3D QSAR in Drug Design. Vol2

Fig

.2.

Sche

mat

ic r

epre

sent

atio

n of

the

isop

oten

tial s

hells

con

stitu

ting

the

mol

ecul

ar s

teri

c fi

eld

2,4-

dich

loro

benz

ene.

Arr

ows

indi

cate

the

com

mon

pri

ncip

al a

xes

for

the

mol

ecul

ar g

raph

and

the

isop

oten

tial s

hells

; dip

ole

mom

ents

alo

ng e

ach

axis

are

bas

ed o

n ch

arge

s cal

cula

ted

usin

g th

e G

AST

-HU

CK

opt

ion

in S

YB

YL

6.3

[2

0].

219

Page 229: 3D QSAR in Drug Design. Vol2

Robert D. Clark, Allan M. Feguson, and Richard D.Cramer

To avoid these problems, we have turned to ‘inertial’ orientation of the steric field, asopposed to inertial orientation of the molecule itself. This can be most simply accom-plished by using the principal axes of the unweighted molecular graph [21] as definingcoordinate axes for the molecular fields. That the principal axes of the (unweighted)molecular graph are valid indicators of the principal axes of the steric field is not imme-diately obvious, but may be appreciated from the schematic in Fig. 2. If the steric fieldis thought of as a series of shells built up from summed spherically symmetrical func-tions about each atom, each shell will have the same principal axes. Hence, the principalaxes of the 0-diameter shell —i.e. the unweighted graph —coincide with the principalaxes of the steric field as a whole.

The principal axes define four (generally distinct) possible orientations for the field.We opt to use the one for which the components of the dipole moment along the x and yaxes ate most positive. Logically, this can be thought of as aligning each molecule un-ambiguously with itself, and is akin to the rationale behind characteristic conformationsoutlined above. Incorporating the dipole moments along each principal axis allows theoverall electrostatics of the molecule to contribute to the orientation of the field, so thatnonlinear molecules with distinct distributions of polar functionalities can be morereadily distinguished.

Other, related approaches are also in the literature. One can, for example, align ran-domly sampled isosteric surfaces by their principal axes, then pick from among the fourdegenerate solutions for each of 20-30 conformers the one which maximizes the inter-section volume [24]. In this case, similarity is measured in terms of a Tanimotocoefficient calculated from the intersection and union volumes between the (fixed)query and the target molecules.

4.1. Example

Evolutionary forces can be expected to favor substrates and ligands which are geometri- cally and electrostatically asymmetrical. To the extent that this is so. two stericallysimilar molecules which ‘look’ electrostatically similar with respect to the principalaxes of each are expected to be qualitatively similar in biological activity. This is illus-trated in Fig. 3: steric and electrostatic IFO-CoMFA fields are shown for the anti-ulcer,histamine H2 receptor antagonists ranitidine (Zantac®) and cimetidine (Tagamet®). Thecharacteristic conformations used were generated using CONCORD [17]and orientedusing the inertial field orientation option in Selector [21].

Steric similarity between molecules can be expressed as a volumetric Tanimotocoefficient [24], in terms of the correlation coefficient between steric fields [25], or assimple energy differences [8]. The steric fields in Fig. 3 differ in energy by186 kcal/mol when summed across a 2 Å lattice, which is somewhat above thecharacteristic neighborhood radius of 165 kcal/mol for IFO-CoMFA (R.D. Clark,unpublished). The difference in the electrostatic fields, however, is considerably larger,on the order of 230 kcal/mol. In our hands, electrostatic fields are sometimes too dependenton conformation to discriminate reliably between molecules. I t is still the case that mole-cules with similar enough fields will, indeed, have similar biological activities, but the

220

Page 230: 3D QSAR in Drug Design. Vol2

Fiq

. 3. C

ompa

riso

n of

iner

tial

ly o

rien

ted

elec

tros

taic

(to

p) a

nd s

teri

c (b

otto

m)

mol

ecul

ar fi

elds

for

rani

tidi

ne (

Zan

tac®

) an

d ci

met

idin

e (T

agam

et®).

In

the

top

plot

s,

the

ligh

ter

area

s ar

e m

ore

elec

tron

egat

ive

regi

ons.

The

ele

ctro

stat

ic w

ere

cont

oure

d at

20

kca

l/m

ol (

top)

and

ste

ric

field

s w

ere

cont

oure

d at

25

kcal

/mol

(b

otto

m).

221

Page 231: 3D QSAR in Drug Design. Vol2

Rubert D. Clark, Allan M. Ferguson, mid Richard D. Cramer

‘neighborhoods’ defined by that radius may be too sparsely populated to be generallyuseful. H-bond fields have proven more useful in CoMFA diversity applications [9].

5. Validation

As noted above, QSAR is an essentially interpolative endeavor. But assessing the dif-ferences among thousands to millions of disparate compounds, as is routinely con-templated today, is an heroically extrapolative endeavor. The essential complementarityof the two disciplines means that some good QSAR descriptors will, indeed, be gooddiversity metrics and vice versa, but utility in the one arena is no guarantee of useful-ness in the other.

The usefulness of a descriptor in a particular QSAR application is assessed by howmuch it enhances the ability to interpolatively predict the biochemical activity of com-pounds whose activity is known but which was not included in the model. The vastmajority of compounds in virtual combinatorial libraries will never be made. let aloneassayed for biological activity. How, then. is the validity of a new diversity metric to beevaluated? The ultimate judgment will be based upon the track record of each metricand how it has been applied, but taking a ‘wait and see’ approach can be costly i n termsof wasted resources or missed opportunities, or both.

A more pragmatic approach is to use the descriptor in question to cluster a range ofcompounds into subgroups: ‘good’ descriptors will tend to put sets of similar com-pounds into the same group. The assessment of biochemical similarity in such ‘cate-gorical validation’ studies may rely on activity or inactivity in some single assay [26],or on the pharmacological class of each compound [25]. Given a set of (usually well-separated) activity islands, categorical validation tests the ability of a metric to assign

IFO-CoMFA has been validated i n the latter context [27], whereas the categoricalvalidation of toporneric CoMFA was by examination of clusters in light of medicinalchemical experience [18]. Such direct assessment of bioisosterism is ‘abstract’ to somedegree, but is altogether appropriate — the development process has relied upon it his-

‘real’ cornpounds to be extended to artificial datasets which lack the clumpinessimparted to commercial databases by directed-walk development strategies and patent considerations, and allows meaningful generalizations across receptors and active sites [18].

A complementary approach is to show that proximity in the descriptor space is asufficient condition for proximity in bioactivity space — i.e. that the descriptor of inter-est exhibits good ’neighborhood behavior [9]’. This entails plotting differences in bioac-tivity against dissimilarity in the descriptor space. If the upper-left triangle in such aplot is sparsely populated. similar compounds rarely show large differences in bio-activity. Hence, one can rely on each cornpound to reliably ‘report’ on the bioactivity ofits neighbors as defined by that descriptor [9]. Such neighborhood plots also provideestimates of the neighborhood ’radius’ for a descriptor — i.e. of degree of separation inthe descriptor space which corresponds to a difference of 100-fold or less i n biologicalactivity [9].

222

compounds to the correct island [27].

torically — and can be very powerful. Bioisosteric assessment allows experience with

Page 232: 3D QSAR in Drug Design. Vol2

Bioisosterism and Molecular Diversity

In terms of our island-charting metaphor: islands will be exclusive and locally dense ina good descriptor space, so that if one compound in the neighborhood is active, most of those similar to it will also be active. Each activity may well be spread across severalislands, however. Hence, exaggeration of what might well be incidental differences, suchas that between the isopentane and the isopropyl ketone in Fig. 1, do not directly com-promise the reliability of a metric for diversity analysis. Such fragmentation of ‘islands’is, however. undesirable, in so far as it reduces the efficiency of sampling by increasingthe number of compounds required to ‘cover’ a particular range of structural variation.

Neighborhood behavior has been demonstrated for topomeric CoMFA [9] and for whole-molecule CoMFA using a rule-based alignment of ACE inhibitors [25].IFO-CoMFA also exhibits neighborhood behavior with a similarity radius of150–200 kcal/mol when summed across a 2 Å field lattice (R.D. Clark, unpublished).

6. Conclusion

Molecular fields are important tools for assessing molecular diversity, particularly whenone wishes to go beyond 2D substructural similarity and include bioisosteric similarity.Hence, when common cores or scaffolds exist, topomeric CoMFA is a powerful com-plement to analyses based on similarity of 2D fingerprints. In the absence of commoncores, topomeric CoMFA can be used to evaluate the 'diversity potential’ of differentcore scaffolds. Alternatively, IFO-CoMFA is available to provide useful insight intomolecular similarities.

References

1. Fujita, T., QSAR and database-aided bioisosteric structural transformation procedures as methodolo- gies of agrochemical design. In Hansch, C. and Fujita, T. (Eds.) Classical and three-dimensional QSARin agrochemistry, ACS Symposium Series 606,ACS, Washington D.C., 1995,pp. 13–34.Ferguson, A.M., Patterson, DE., Garr, C.D. and Underiner, T.L., Designing chemical libraries for leaddiscovery J. Biomol. Screening. 1 (1996) 65–73. Hansch, C. and Fujita, T., A method for the correlation of biological activity and chemical structure,.JAm.Chem. Soc., 86 (1964), 1616–1626.Shemetulskis, N.E., Dunbar, J.B., Jr., Dunbar, B.W., Moreland, D.W. and Humblet, C., Enhancing thediversity of a corporate database using chemical database clustering and analysis, J. Comput.-Aided Molec. Design, 9 (1995) 407–416.

2.

3.

4.

diversity: Experimental des ign of combinational libraries for drug discovery,Moos, W.H., Measuring J. Med. Chem., 38 (1995),

1431-1436.Cramer, R.D. III, Bunce, J.D., Pallerson, DE. and Frank I.E., Crossvalidation bootstrapping, and partial least squares Compared with multiple regression in conventional QSAR studies, Quant. Str.-Act. Relat., 7 (1988) 18–25.Martin, Y.C., Kim, K.-H. and Lin, C.T., Comparative molecular field analysis: CoMFA. In Charton,M. (Ed.) Advances in quantitative structure–property relationships, Vol. 1 , JAI Press, inc., Greenwich,CT, 1966.pp. 1–52.Norinder, U., Single and domain mode variable selection in 3D QSAR applications, J. Chemometrics,10 (1996)95–105.Patterson, D.E., Cramer, R.D., Ferguson, A.M., Clark, R.D. and Weinberger, L.E., Neighborhoodbehavior: A useful concept for validation of molecular diversity descriptors, J. Med. Chem., 39 (1996)3049-3059.

6 .

7.

8.

9.

223

5. Martin, E.J., Blaney, J.M., Siani, M.A., Spellmeyer, D.C., Wong, A.K and

Page 233: 3D QSAR in Drug Design. Vol2

Robert D. Clark, Allan M. Ferguson, and Richard D. Cramer

10. Ash, S., Cline. M.A., Homer, R.W., Hurst, T. and Smith. G.B., SYBYL line notation (SLN):A versatilelanguage for chemical structure representation. J. Chem. Inf. Comput. Sci., 37 (1997) 7 1-79; SLN ispart of the SYBYL and UNITY products available from Tripos, Inc., 1699 S. Hanley Road. St. Louis.MO 63411, U.S.A.Brown, R.D. and Martin. Y .C., The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding,J . Chem. Inf. Comput. Sci., 37 (1997) 1–9.Good. A.C. and Kuntz, I.D., Investigating the extension of pairwise distance pharmacophore measuresto triplet–based descriptors, J. Camput.-Aided Molec. Design, 9 (1995) 373–379.Willett, P., Calculation of molecualr similarity of the alignment of molecular fields. In this volume. Briem, H. and Kuntz, I.D., Molecular similarity based on DOCK-generated fingerprints, J. Med. Chem.,39 (1996) 3401-3408.Kauvar, L.M., Higgins, D.L., Villar, H.O., Sportsman. J.R., Engqvist-Goldstein, A., Bukar, R.. Bauer, K.E., Dilley, H. and Rocke. D.M., Predicting ligand binding to proteins by affintiy fingerprinting, Chem.Biol., 2 (1995) 107–118.

16. Clark, R.D., Synthesis and QSAR of herbicidal 3-pyrazolyl α,α,α-trifluoro-tolyl ethers, J. Agric. Food Chem., 44 (1996) 3643–3652.

17. Balducci, R., McGarity. C., Rusinko, A. III, Skell, J. and Pcarlman, R.S. and Pcarlman, R.S.CONCORD was developed at the University of Texas at Austin and is av ailable exclusively from Tripos, Inc., 1699 S. Hanley Road, St. Louis, MO 63144, U.S.A. Cramer, R.D., Clark. R.D., Patterson. D.E. and Ferguson, A.M. Bioisosterism as a molecular diversitydescriptor: Steric fields of single topomeric conformers, J. Med. Chem., 39 (1996) 3060–3069.Chapman, D., The measurement of molecular diversity: A three-dimensional approach, J. Comput.-Aided Molec. Design. 1 0 (1996) 501 512.Tripos, lnc., 1699 S. Hanley Road, St. Louis, MO 63144. U.S.A.(a) Clark, R.D. and Cramer, R.D., The many ways to skin a combinatorial centipede, 211th ACSNational Meeting (1966). Abstract ClNF 067. 1966: (b) molecular Diversity Manager Manual. SYBYLVersion 6.3, Tripos. Inc., 1699 S. Hanky Road. St. Louis. MO 63144, U.S.A., 1996; pp. 194-195.Collantes, C.R., Tong, W. and Welsh. W.J., Use of moment of inertia in comparative molecular field analysis tu model chromatographic retention of nonpolar solutes, Anal. Chem., 68 (1966) 2038–2043.Welsch, W.J., Tong. W., Collantes, E.R., Chickos, J.S. and Cagarin. S.G., Emthalpies of sublimation and formation of polycyclic aromatic hydrocarbons (PAHs) derived from comparative molecular field analy- sis (CoMFA): Application of moment of inertia for molecualr alignment, Thermochemica Acta, 290 (1966) 55-64.Hahn. M., Three-dimensional shape-based searching of conformationally flexible compounds, J. Chem.Inf. Cornput. Sci., 37 (1997) 80-86.Matter, H. and Lassen, D., Compound libraries for lead discovery, Chimica Oggi, 6 (1996) 9–15. Brown, R.D. and Martin, Y.C., Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection, J. Chem. Inf. Comput. Sci., 36 (1996) 572–584.Clark, R.D. and Cramer, R.D., Taming the combinatiorial centipede, CHEMTECH, 27 (1997) 26–30.

I1.

12.

13.14.

15.

18.

19.

20.21.

22.

23.

24.

25.26.

27.

224

Page 234: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity: A Medicinal Chemist's View

Hugo KubinyiDrug Design, BASF AG, 0-67056 Ludwigshalfen, Germany

Several 3D approaches discussed in this volume describe methods for the analysis andquantitative description of chemical similarity. The underlying concept is that chemicalsimilarity is reflected by similar biological activities — i.e. chemically closely relatedanalogs should be related in their mode of action, as well as in their relative potencies.This fundamental assumption has, indeed, been used in medicinal chemistry research,and has led to many valuable drugs.

However, chemical similarity may have different facets if a computer chemist or amedicinal chemist look at the compounds. There is no argument that for maximalaffinity a ligand of a biological macromolecule has to fit the binding pocket geometric-ally and that hydrophobic surfaces of the ligand and the binding site have to be com-plementary. The functional groups of the ligand need a separate consideration. Forlipophilicity, there is no significant difference between, e.g. -O- and -NH- in anorganic molecule; for ionization. there is a big difference whether the nitrogen atom ispart of a basic group (an amine) or a neutral group (e.g. in an amide); and for binding,potency differences of several orders of magnitude may result from the exchange of thehydrogen bond acceptor -O-against a donor function -NH-.

1. Similarity as a Design Principle in Lead Optimization

Nearly all drugs result from the optimization of a lead structure. Sources of such leadsare natural products from plants or microorganisms, synthetic chemicals or their inter-mediates. hits from (high-throughput) screening of in-house and combinatorial libraries,rational concepts from a biochemical pathway or the unexpected observation of a thera- peutically useful side-effect of a drug. Most often, the biological activity of a lead struc-ture is neither optimal, with respect to its efficacy, nor with respect to specificity,bioavailability, pharmacokinetics, toxic and other side-effects. Chemists perform moreor less systematic variations of lead structures, using the experience of about 100 yearsof medicinal chemistry and the results of (quantitative) structure-activity relationships.

The principle of bioisoteric replacement of functional groups serves as a successfuloptimization strategy [1–3]. Its systematic application has resulted in a broad variety oftherapeutically used drugs, many of them finally having the desired combination offavorable properties. A few examples of typical but different consequences of isostericreplacement of atoms or groups are illustrated by compounds 1–3 (Fig. 1), some otherswith unexpected effects on biological activities are discussed in later sections of thischapter.

In their attempts to optimize lead structures, medicinal chemists intuitively follow the principles of evolution. In genetic and evolutionary algorithms, randomly generatedstarting models (the lead structures) are reproduced involving random mutations andcrossover (the chemical variation of the structures). Better models (compounds) are keptfor further modification; worse ones are discarded. The biological activity, in later

H. Kubinyi et al. (eds. ), 3D QSAR in Drug Design, Volume2. 225–252.© 1998 Kluwer Academic Publishers. Printed in Great Britain.

Page 235: 3D QSAR in Drug Design. Vol2

Hugo Kubinyi

Fig. 1. Different effects of isosteric replacement. (a) The substitution of all three iodine atoms of thyroxine 1aleads to an analog 1b which still acts as a thyromimetic agent. (b) Replacement of an ester function, –O–CO–, by an amide function –NH–CO–, most often produces analogs with higher metabolic stability: if this replace-ments is done in acetylsalicylic acid 2, an inactive anolog results because the amide is nu longer able totransfer an acetyl group to a certain serine residue of cyclooxygenase, (c) p-Aminibenzoic acid 3a is an es- sential metabolite of microorganism: if the acid group, –COOH, is replaced by a sulfonamide group–SO2NH2, the antimetabolite sulfanilamide 3b res ults (active metabolite of the antibacterial sulfonamidesulfamidochrysoidine

stages a selectivity index or some other biological property, serves as the ‘fitness func-tion’ for the ‘survival’ of certain structural entities. That genetic and evolutionary algor-ithms, indeed, reduce the effort in searching the most active analogs has recently beenconfirmed by dedicated investigations (e.g. [4,5]).

2 .

Other common structural modifications in the optimization of a lead structure are thedissection of rings or the rigidifcation of flexible molecules. Molecules with severalrotatable bonds may adopt many different geometries — some of them being favorablebecause of low internal energies, others being less favorable because of van der Waalsor electrostatic repulsions between non-bonded atoms or groups. If different conforma-tions of such molecules are ‘frozen’ by closing rings between certain atoms, either oneof two very different consequences results. If the frozen conformation differs from the

The Biological Activity of a Ligand Depends on its Flexibility

226

Page 236: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity: A Medicinal Chemist’s View

bioactive conformation of the flexible lead or if the added atoms interfer with thebinding, biological activity will be more or less destroyed. If the ring closure stabilizesthe bioactive conformation, usually a significant increase in biological activity results.This comes from the fact that the binding of a flexible analog is entropically unfavor-able, due to a loss of rotational degrees of freedom, whereas the rigid analog has alreadylost its flexibility prior to binding. Two examples of highly similar, flexible and rigidanalogs are the pairs 4 and 5 [6] and 6 and 7 [7], respectively, where the rigid analogsare significantly more active than their flexible counterparts (Fig. 2).

The computer program CAVEAT was developed for the design of rigid analogswhich bear a pharmacophore in a certain geometry [8]. CAVEAT starts from a struc-

Fig. 2. Rigidification of a bioactive conformation significantly increases biological activities ( a ) If an addi-tional ring is introduced into the thermolysin inhibitor 4 to produce antilog 5, a 20-fold increase in affinity isobserved (b) Introduction of a methyl group into the H+/K+-ATPASE inhibitor 6a reduces biological activity by a factor of 8, most probably due to a destablization of the bioactive conformation; if the substituent R in6b is extended to a new ring (compound 7). which fixes the bioa conformation, the biological activity of6b i s enhanced by a factor of150.

227

Page 237: 3D QSAR in Drug Design. Vol2

Hugo Kubinyi

tural hypothesis or from the known 3D structure of a ligand, e.g. a polypeptide. and ex-tracts vectors of residues that participate in binding. In a peptide, these vectors are e.g.the Cα –Cβ bonds of the amino acid side chains. Then the program identifies ring systems that are suited to accommodate these residues in exactly the same relativegeometries.

A lead structure, where a significant increase in binding affinity is achieved by stericconstraints, not by ring closure, has been described by Kaplan and Bartlett [9]. Inthe flexible phosphonate tripeptide Cbz–Phe–Ala[PO2

-]–O–CH(CH3)COOH(= Z–Phe–Alap –(O)–Ala; p indicates a PO2

- residue instead of a C=O group). inhibitory activities against the zinc protease carboxypeptidase A increase significantly with thestepwise introduction of bulky residues. In the series Z–Phe–Alap–(O)–Ala,Z–Phe–Alap–(O)–Phe and Z–Phe–valp–(O)Phe, affinities increase from 56 nM to

0.001 nM and 0.000010-0.000027 nMol (i.e. 10-27 fMol). The more active analogshave a higher lipophilicity, due to the additional phenyl group in phenylalanine, as com-pared to alanine, and due to the isopropyl group instead of the methyl group of theC-terminal lactic acid residue. But in this case. the affinity difference of about six ordersof rnagnitude cannot be attributed only to the increase in lipophilicity which changesby less than three orders of magnitude. It must also result from the conformationalconstraint imposed on both phenylalanines by the central valine and which, by fortune,stabilizes the bioactive conformation. In terms of ‘chemical similarity’, all threeanalogs are closely related: with respect to their internal flexibility and their preferredconformations, they are not.

3.

For compounds with comparable flexibility, most often similar analogs have alsosimilar biological potencies. That this need not always be the case can be seen by acomparison of three thermolysin inhibitors 8a–c (Fig. 3) [10,11]. All analogs 8a and c,with X = -NH-and –CH2–, are about 1000 times more potent than the X = –O–analogs8b (R = OH or different amino acids). The explanation for this effect can be easilyderived from the 3D structure of the complex of thermolysin with the inhibitor 8a(R =-Leu-OH). If X is an -NH- group (series 8a), a hydrogen bond is formed betweenthis group and the oxygen atom of an alanine carbonyl group. In the –O– analog 8b, thishydrogen bond cannot be formed; in addition. an electrostatic repulsion between thetwo oxygen atoms results. The biological activity of the –CH2– analog 8c has beenpredicted [12] to be comparable to the -NH- analogs and much higher than for the-O-analogs. This was later confirmed by the synthesis of the inhibitors 8c [11] .

A similar but less pronounced effect is observed for the thrombin inhibitors 9a–c(Fig. 4) [13]; in this case, the non-bonded contact between the –X– group of the ligand and the carbonyl group of Gly 216 in the binding site of thrombin is responsible for thestructure-activity relationship. I f , on the other hand, the carbonyl group of 10a, whichforms a hydrogen bond with the -NH- group of Gly 216, is replaced by –CH2– (10b),the affinity is reduced by more than three orders of magnitude (Fig. 4) [13].

Neutral endopeptidase 24.11 (NEP 24.11 ; enkephalinase) is a zinc protease withsome homology to thermolysin. The NEP inhibitors 11a–d (Fig. 5) show a different

228

Biological Potencies of Similar Molecules

Page 238: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity: A Medicinal Chemist’s View

Fig. 3. The X = –NU– group of the thermolysin inhibitors 8a forms a hydrogen-bond with the carbonyl oxygen atom of Ala 113: this affinity-enhancing effect is only moderated by desolvation of the ligand in going from the free to the bound state. All oxygen anal 8b are much less active, because they cannot form such a hydrogen-bond; in addition, there is the unfavorable desolvation effect and an electrostatic repulsion betweenthe oxygen atoms of the ligand and the binding site.Like the –O– analogs 8b, the –C –H2 analogs 8c cannot form the hydrogen-bond to Ala 113, but this disadvantage is countbalanced by the effect that no desolvation of the CH2 group of the ligand and no electrostatic repulsion take place. The differences between X = –NH–, –0– and –CH2– are identically observed whether R = –OH, –GIy–OH or –Leu–OH.

structure-activity relationship, as compared to the preceeding examples. Here the lipophilic –CH2– and -S- analogs are much less active than the -NH- and -0- analogs,which both have comparable biological activities [14]. As long as the 3D structure ofNEP 24.11 remains unknown, no explanation can be given for this effect.

An unexpected structure-activity relationship has been observed for the macrocyclicrenin inhibitors 12a and b (Fig. 5) [15]. Modelling of the complex of 12a with theaspartyl protease renin indicated that X = –NH- should form a hydrogen bond with oneof the Asp 226 oxygen atoms. Thus, the replacement of -NH- by -0- should reducethe affinity The opposite was observed; the affinity of analog 12b is increased by morethan two orders of magnitude. The authors discuss several possible explanations for thisobservation but finally conclude: ‘we have attempted to find a plausible basis for thissurprising reversal in potency, but without much success . .. the comparison [of bothanalogs] is therefore intrinsically more difficult, and will constitute a more demandingtest for thermodynamic simulation techniques [as compared to the thermolysininhibitors. Fig. 3]’.

229

Page 239: 3D QSAR in Drug Design. Vol2

Fig

. 4. T

he in

hibi

tor

9a,fo

rms

two

hyd

roge

n-bo

nds w

ith g

lyci

ne 2

16 o

f the

ser

inep

rote

ase

thro

mbi

n. A

s co

mpa

red

to th

e th

erm

olys

inin

hibi

tors

8a–

c (F

ig. 3

). a

si

mil

arbu

t muc

h le

ss p

rono

unce

d ef

fect

is

obse

rved

by

stru

ctur

al v

aria

tion

of t

he g

roup

X in

the

thro

mbi

n in

hibi

tors

9a–

c. I

f, on

the

othe

r ha

id, t

he c

orbo

nyl g

roup

of

10a

is r

epla

ced

by a

met

hyle

ne g

roup

, the

resu

lting

pse

udop

eptid

c10

b sh

ows

a 20

00–f

ield

wea

ker

inhi

bito

r act

ivity

.

230

Page 240: 3D QSAR in Drug Design. Vol2

Fig

. 5. T

he N

EP

24.

11

inhi

bito

rs 1

la a

nd b

are

muc

h m

ore

pote

nt th

an th

e m

ore

lipop

hili

can

alog

s11c

and

d. T

he m

acro

cycl

icre

nin

inhi

bito

r 12

a is

less

pot

ent

than

its

oxyg

en c

inal

q 12

b,al

thou

gh it

s N

H g

roup

sho

uld

form

a f

avor

able

hydr

ogen

-bon

dw

ith

the

carb

oxyl

ate

ofA

sp 2

26 o

f the

enz

yme.

231

Page 241: 3D QSAR in Drug Design. Vol2

23 2 Fig.

6. 3

,4-D

ihyd

roze

bula

rine

13

is a

wea

k in

hibi

tor o

f cyt

idin

e de

amin

ase.

Zeb

ular

ine

14 in

tera

cts w

ith th

e en

zym

e as

the

3,4-

hydr

ate

15, a

s con

firm

ed b

y X-

ray

stru

ctur

e an

alys

is. T

he fa

vora

ble

inte

ract

ions

of t

he h

ydro

xy g

roup

enh

ance

the

biol

ogic

al a

ctiv

ity b

y m

ore

than

seve

n or

ders

of m

agni

tude

. Neb

ular

ine

16, a

stro

ngad

enos

ine

deam

inas

e in

hibi

tor,

is a

lso

prop

osed

to in

tera

ct a

s the

1,6

-hyd

rate

.

Page 242: 3D QSAR in Drug Design. Vol2

Similairity and Dissimilarity A Medicinal Chemist’s View

There are many examples in literature where the introduction of an -OH group into aligand either causes an increase or a decrease of biological affinities. From a theoreticalpoint of view, this is not surprising. If the new -OH group forms hydrogen bonds withpolar groups at the binding site (either as a hydrogen bond donor or as an acceptor), thenet free energy depends on the balance between the desolvation energies of the watershells at the surfaces of the ligand and the binding site, as compared to the energy of theformed hydrogen bond/s and the entropy gain by the release of some water molecules.Certain tightly bound water molecules i n the binding cavity of a protein (usually seen inthe X-ray structures), e.g. those which form more than two hydrogen bonds to theprotein, are not easily removed. The attempt to introduce an -OH group into the ligand,to replace such a water molecule, must necessarily fail.

An example where this is not the case, and where significantly enhanced bindingaffinities result after the introduction of such a hydroxy group, are the cytidine andadenosine deaminase inhibitors 13-16 (Fig. 6) [16,17].In this special case, the resultingaffinity differences are 7 to 8 orders of magnitude!

4.

Functionally corresponding proteins from different species have identical or verysimilar 3D structures, but they normally differ in their amino acid composition.Although they always show, dependent on the evolutionary relationship between thetwo apecies, a certain degree of homology, structure-activity relationships maysignificantly differ, even after the replacement of just one single amino acid by anotherone. This has been shown, for example, by a comparison of rat and human 5-HT recep-tors [18]. Although both have about 90% identity and more than 95% homology in theiramino acid sequences, the binding affinities of a series of 5-HT receptor ligands andß-adrenergic blocking agents are not related (n = 10; r = 0.27). If one (!) amino acid of the human receptor, i.e. threonine 355, is replaced by the corresponding amino acid ofthe rat receptor, i.e. asparagine, the binding affinities of the ligands change significantly.Now the human mutant 5-HT receptor behaves like a rat receptor; all affinities to bothreceptors are more or less identical (r = 0.98) [18].

In the past, new compounds have always been tested in animals before they could beapplied to humans. Human proteins were not available, except in rare cases. Only forthose proteins that could be extracted from human material, e.g. hemoglobin or throm-bin from blood, could one predict the human biological activity of a new drug fromin vitro studies, prior to animal studies. With the progress in gene technology, it is nowpossible to produce human proteins in sufficient quantities to establish test models.Thus, their biological activity in humans can be forecasted from investigations at the molecular level. How important this is, can be illustrated, e.g. by the activities of therenin inhibitor remikiren 17 against the renins of different species (Fig. 7) [19].

Different QSAR models for dihydrofolate reductase (DHFR) inhibitors, chemicallyrelated to the antibacterial drug trimethoprim 18 (Fig. 8), describe the inhibitory activi-ties versus E. coli and L. casei DHFR [20].Whereas in the case of E. coli DHFR, the3-,4- and 5-substituents at the benzyl group contribute to biological activities, only the

Biological Potencies versus Similar Biological Targets

233

Page 243: 3D QSAR in Drug Design. Vol2

Fig

. 7. R

emik

iren

17

is a

muc

h st

rong

er in

hibi

tor

vers

us h

uman

and

mon

key

reni

s, a

s co

mpa

red

to d

og a

nd r

at r

enin

s. I

fthi

s co

mpo

und

had

only

bee

n te

sted

in

smal

l lab

orat

ory

anim

als

as w

as th

e ca

se b

efor

e ge

ne te

chno

logy

cou

ld b

e ap

plie

d in

dru

g re

sear

ch),

it w

ould

not

hav

e be

en c

onsi

dere

d fo

r fu

rthe

r de

velo

pmen

t, du

eto

its

wea

k B

iolo

gica

l ac

tivi

ty.

234

Page 244: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity AMedicinal Chemist’s View

Fig. 8. Trimethoprim 18 is proposed to interact with dihydrofolate reductase in its protonated form, whereN1 changes its character from a hydrogen-bond acceptor to a hydrogen-bond donor. Due io the positive charge of the ligand, the exchanged of negatively charged amino acids of the protein to neutral amino acidsand of neutral to positively charged amino acids reduce the affinity of the ligand.

3- and 4-substituents are relevant for the activities against L. casei DHFR. Several yearslater, the X-ray analyses of both enzymes explained these differences: in L. caseiDHFR, there is a much narrower pocket, due to a bulky leucine in the binding site, ascompared to a flexible methionine side chain in E. coli DHFR [20].

Whereas many differences in the binding affinities to related biological targets can beattributed to different non-covalent interactions with the amino acids of the bindingsites, such effects may also result from more distant changes in a protein. Trimethoprim18 is thought to bind to DHFR in its positively charged form. An E. coli mutant, wherethe negatively charged glutamate residue 1 18 is replaced by a neutral glutamine. has a4.5-fold lower affinity for trimethoprim, despite the fact that the modified group is about15 Å apart from the charged N1 of trimethoprim. An even more pronounced effect isobserved in a double mutant (Glu 118 Gln, Leu 28 Arg): the affinity of trimethoprim isreduced by a factor of 190, although the positive charge of the arginine residue is about8 A apart from the positive charge of the ligand [21]. This reduction of binding affinitieshas been used to explain the selectivity of 5 to 6 orders of magnitude of trimethoprimagainst bacterial DHFRs, as compared to avian and mammalian DHFRs. In chickenDHFR, seven amino acids in the environment of the binding site (not in the binding siteitself) have changed charge from negative to neutral or from neutral to positive, as com-pared to E. coli DHFR. If one compares the effects observed in the E. coli doublemutant with these figures, the changes seem to be sufficient to explain the observedeffect [21].

Thiorphan 19 and its retro-inverso peptide, retro-thiorphan 20 (Fig. 9), are inhibitorsof the structurally related zinc proteases thermolysin and NEP 24.11. Although theaffinities of the ligand pair differ, from enzyme to enzyme, by three orders of mag-nitude, there are no significant differences between them [22]. Thus, they may be con-sidered to be ‘similar’, which was also confirmed by the X-ray structure analyses oftheir complexes with thermolysin; both compounds bind in an analogous manner andhave corresponding interactions with the protein [23]. On the other hand, their activities

235

Page 245: 3D QSAR in Drug Design. Vol2

Fig

. 9.

Thio

rpha

ri19

and

retr

o-th

iorp

han

20ar

e ‘s

imila

r’ w

ith r

espe

ct to

ther

mol

ysin

and

NE

P 2

4.11

, but

they

are

hig

hly

‘diss

imila

r’ w

ith re

spec

t to

angi

oten

sinco

nver

ting

enzy

me

(AC

E).

chan

ging

the

conf

igur

atio

n of

21a

and

bat

the

ring

junc

tion,

from

re

duce

s th

e N

EP

-inh

ibito

ry a

ctiv

ity to

a la

rge

exte

nt; t

hus,

the

sele

ctiv

ity a

gain

st A

CE

is in

crea

sed

by a

fact

or o

f 300

.th

eA

CE

-inh

ibito

ryac

tivity

, but

Hto

ß– H

in21

c,do

esno

tin

fluen

ce

236

Page 246: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity: A Medicinal Chemist’s View

against yet another related zinc protease, angiotensin converting enzyme (ACE), are significantly different [22]; with respect to this enzyme, the analogs are ‘dissimilar’. Whether the differences result from different binding modes or from an unfavorable binding geometry of retro-thiorphan in ACE will only be known if the ACE 3D struc-ture becomes available. Corresponding differences in the structure-activity relation-ships are observed in some other dual zinc protease inhibitors 21a–c (Fig. 9) [24]. Again, the degree of similarity and dissimilarity of the ligands depends on the biological target.

Corresponding problems are also observed in toxicity predictions for humans, as il-lustrated by the toxicities of lysergic acid diethylamide (LSD) and 2,3,7,8-tetrachloro-dioxin (‘dioxin’) in different species. LSD is only weakly toxic to mice (LD50 =50–60 mg/kg) and rats (LD50 = 16.5 mg/kg) but significantly more toxic to rabbits (LD50 = 0.3 mg/kg). An elephant, which was (cautiously?) treated with an LSD dose of 0.3 g (corresponding to about 0.06 mg/kg), died within several minutes [25]! Humans seem to tolerate LSD quite well. Cases of death have been observed as the result of drug-induced suicides, but not by any toxicity of the drug itself.

2,3,7,8-Tetrachlorodioxin is highly toxic to several species — e.g. guinea-pigs(LD50 = 0.6–2.5 µ g/kg) and mink (LD50 = 4 µg/kg). It is much less toxic for mice(LD50 = 114–280 µg/kg), rats (LD50 = 22–320 µg/kg), hamsters (LD50 = 1150–5000µ g/kg), rabbits (LD50 = 115–275 µ g/kg), dogs (LD50 > 100 and < 3000 µ g/kg) andmonkeys (LD50 < 70 µ g/kg) [26]. If one extrapolates from the monkey, dioxin shouldbe relatively ‘harmless’ for humans, at least if only acute toxicity is considered. However, the significantly different toxicity versus the closely related species guinea pig and hamster puts a caveat on too simple and straightforward extrapolations. Ofcourse, for humans the LD50 is not the relevant endpoint, but rather the no-effect LD0

level!

5.

Comparable biological activities of chemically similar structures do not necessarily result from analogous binding modes of all analogs. Since at least in 3D QSAR, if not in all QSARs, the correct superposition of all analogs within a series is a precondition for reliable results, a better knowledge of the binding modes is most important. Unexpected differences in the binding modes of some analogs certainly produce problems in 3D QSAR studies that are based on such a mutal alignment of the structures. But it is to be expected that also 3D QSAR modifications, which do not require a structural alignment, should have problems in such cases.

Differences in binding modes have been extensively reviewed [1,27–29]. Thus, only some examples will be summarized here, without detailed discussion. Purine nucleoside phosphorylase (PNP) inhibitors 22 and 23 show surprising structure-activity relation-ships, which can be explained only by inspecting the X-ray structures of the inhibitor complexes (Fig. 10). The exchange of an acceptor nitrogen to a donor nitrogen atom changes not only the interactions with the directly involved asparagine side chain, but also of an 8-amino group with the threonine hydroxyl and methyl groups [30,31].

Similar Structures and Activities, but Different Binding Modes

237

Page 247: 3D QSAR in Drug Design. Vol2

Fig

. 10.

The

X-r

ay s

truc

ture

s of

the

puri

ne n

ucle

osid

e ph

osph

oryl

ase

(PN

P)

com

plex

es w

ith

the

inhi

bito

rs 2

2an

d23

show

that

22a

form

s le

ss fa

vora

ble

hydr

ogen

- bo

nds

wit

h A

sp 2

43 th

an th

e de

saza

puri

ne 2

3a.O

n th

e ot

her

hand

, the

intr

oduc

tion

of a

n 8-

amin

o gr

oup

has

diff

eren

t con

sequ

ence

s in

23a

and

23a.

Whe

reas

the

new

am

ino

grou

p of

22b

form

s an

add

ition

al h

ydro

gen-

bond

to T

hr 2

42, i

t cau

ses

unfa

vora

ble

pola

r/no

npol

ar in

tera

ctio

ns in

23b

.

238

Page 248: 3D QSAR in Drug Design. Vol2

Fig

.11

.C

hang

ing

anox

ygen

atom

ofth

eth

erm

olys

inin

hibi

tor

24a

toC

H2

(24b

)le

ads

toa

reve

rsed

bind

ing

mod

e.A

lso

the

vira

lco

atpr

otei

nlig

ands

25an

d26

bind

in a

rev

erse

man

ner,

as

indi

cate

d. T

here

are

onl

y hy

drop

hobi

c gr

oups

and

one

hyd

roge

n-bo

nd d

onor

in th

e bi

ndin

g po

cket

of t

he v

iral

pro

tein

; th

is d

onor

can

fo

rm a

hyd

roge

n-bo

nd w

ith

eith

er th

e ox

azol

ine

or th

e is

oxaz

ole

ring

.

239

Page 249: 3D QSAR in Drug Design. Vol2

Fig

. 12.

Dih

ydro

fola

te27

and

the

posi

tive

ly c

harg

ed fo

rm o

f met

hotr

exat

e 28

(R =

p–C

6H4–

CO

NH

CH

(CO

OH

)CH

2CH

2CO

OH

) ha

ve d

iffe

rent

pat

tern

s of

hyd

ro-

gen-

bond

dono

rs a

nd a

ccep

tors

, if t

hey

are

supe

rim

pose

d ac

cord

ing

to th

eir

ring

syst

ems

(lef

t sid

e). I

f the

rin

g sy

stem

of 2

7is

rot

ated

aro

und

the

bond

whi

ch c

on-

nect

s th

e ri

ng s

yste

m w

ith

the

rest

of t

he m

olec

ule,

a m

uch

bett

er s

imil

arit

y of

the

hydr

ogen

-bon

d pa

tter

ns r

esul

ts (

righ

t sid

e). F

ille

d ar

row

s in

dica

te id

enti

cal

dire

ctio

ns o

f the

hyd

roge

n-bo

nd in

the

anal

ogs

27an

d28

.

240

Page 250: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity A Medicinal Chemist’s View

Carbobenzoxy-phenylalanine (Cbz-Phe) 24a and β-phenylpropionyl-phenylalanine(β-PPP) 24b (Fig. 11) differ only by their X group. Nevertheless, they bind to ther-molysin in an opposite direction, as uncovered by the determination of the 3D structuresof the complexes [32]. Reverse binding modes are also observed for some structurallyrelated viral coat protein ligands – e.g. compounds 25 and 26 (Fig. 11) [33,34].

Dihydrofolate 27, the substrate of dihydrofolate reductase (DHFR), and the DHFRinhibitor methotrexate 28 seemingly differ only in minor chemical details (Fig. 12).However, this small difference has significant consequences on the hydrogen-bond donorand acceptor patterns of the compounds. Accordingly, dihydrofolate and methotrexateaccommodate the binding site of DHFR in different modes [35], an effect which has beenpredicted from the X-ray structure of the dihydrofolate/DHFR complex [36].

The isomeric 1- , 2- and 4-phenylimidazoles are inhibitors of the oxidation ofcamphor by cytochrome P450cam. Nevertheless, only 1- and 4-phenylimidazole bind asexpected, with one nitrogen atom of the imidazole ring coordinated to the iron atom ofthe porphyrin system; 2-phenylimidazole cannot be accommodated in such a positionand binds in a completely different manner [37]. Structurally related dipeptide inhibitorsof the serine protease elastase bind with their corresponding residues in differentpockets of the enzyme [1,27,38]. The trypsin inhibitor p-guanidiniumbenzoate seems tobe one of the rare cases where X-ray crystallography provides clear evidence that oneand the same ligand binds in distinctly different orientations [39]. Many other examplesof different binding modes of closely related analogs have been described in the litera-ture [l ,27–29]. From the observation of such differences, Dagmar Ringe proposed touse this information to design ‘hydra-headed’ inhibitors, which fill all possible pocketsof a binding site.

6.

Several well-known examples of different modes of action of closely related analogscan be found in medicinal chemistry and pharmacology textbooks. Norepinephrine 29a,epinephrine 29b and isoproterenol 29c (Fig. 13) are adrenergic agonists. However, ingoing from R = H to R = CH 3 and R = isopropyl, the mechanism of action graduallychanges from a more or less specific a-adrenergic agonism to a pure ß-adrenergicagonism. If the two hydroxyl groups of isoproterenol are changed into chloro sub-stituents, the ß-adrenergic antagonist dichloroisoproterenol (DCI) 30 results; in fact, 30was the first ß-blocker.

An unexpected effect resulted after the introduction of an isobutyl group into the AT1-specific angiotensin II receptor antagonist 31a (Fig. 13).The original AT1 selectiv-ity is completely destroyed; in addition, the new analog 31b is no longer an antagonist,it is a strong agonist at the AT1 and AT2 subtypes of this receptor [40]. A related effecthas been observed for some structurally diverse cholecystokinin (CCK) receptorligands; the introduction of an isopropyl group at a certain nitrogen atom leads fromperipheral CCK antagonists to agonists [41,42].

The integrin family of receptors are structurally and functionally related membrane- imbedded glycoproteins that ‘integrate’ the extracellular matrix with the cytoskeleton.

Different Mechanisms of Action of Similar Molecules

241

Page 251: 3D QSAR in Drug Design. Vol2

Fig

. 13.

The

cha

nge

from

nor

epin

ephr

ine

29a

to e

pine

phri

ne 2

9ban

d is

opro

tero

nol 2

9cgr

adua

lly

chan

ges

the

mod

e of

act

ion

from

ag

onis

mto

ß-a

goni

sm.

Dic

hlor

oiso

prot

eren

ol (

DC

I) 3

0 w

as th

e fi

rst ß

-adr

ener

gic

anta

goni

st. I

ntro

duct

ion

of a

n is

obut

yl g

roup

into

the

AT

1-se

lect

ive

anta

goni

st 3

1apr

oduc

es th

epot

ent,

unse

lect

tive

AT

1/A

T2

agon

ist3

1b.

242

Page 252: 3D QSAR in Drug Design. Vol2

Fig

. 14.

The

che

mic

ally

clo

sely

rel

ated

inte

grin

ant

agon

ist32

aan

d32

bdi

ffer i

n th

eir s

elec

tivity

tow

ards

the

fibrin

ogen

(G

PlI

b/lI

Ia)

and

the

vitr

onec

tia (

v3)

re-

cept

ors b

y ne

arly

eig

ht o

rder

s of m

agni

tude

.

243

Page 253: 3D QSAR in Drug Design. Vol2

Hugo Kubinyi

Important members are the GPIIb/IIIa receptor (GP for glycoprotein, IIb/IIIa for theαIIbβ3 integrin) and the vitronectin (αvβ3) receptor [43,44]. Whereas the GPIIb/IIIa receptor binds, inter alia, fibrinogen and mediates blood platelet aggregation, the αvβ3receptor binds vitronectin and is responsible for angiogenesis, vascular smooth musclemigration and adhesion of osteoclasts to the bone matrix. Thus, both receptors are im- portant for drug design, the GP IIb/IIIa receptor for the development of antithrombotic agents and the vitronectin receptor for the development of drugs for the treatment ofcancer, as well as against restenosis and osteoporosis. However, both fibrinogen andvitronectin interact with their receptors using an identical structural domain, the so-called RGD motif (RGD = Arg-Gly-Asp), but obviously in different conformations.This was first confirmed by the investigation of different cyclic pentapeptides. con-taining D-amino acids in different positions [44,45].How far structure-based design (inthis case, based only on the structures of ligands) can be advanced is illustrated bycompounds 32a and 32b, which are highly selective GPIIb/IIIa and vitronectin receptorantagonists. respectively (Fig. 14) [46]. The selectivity of these analogs against the tworeceptors differs by nearly eight orders of magnitude, despite their close chemicalsimilarity! The only major difference is the distance between the carboxylic group,which mimics the Asp, and the basic nitrogen atom. which mimics the Arg of the RGDmotif.

A well-known example of different modes of action of closely related analogs are theanti-allergic agent promethazine 33, the neuroleptic chlorpromazine 34, and the anti-depressants imipramine 35a and desipramine 35b (Fig. 15). Despite their very similarstructures, 33 acts mainly as a histamine H1 antagonist, 34 is a dopamine antagonist,35a is an unspecific norepinephrine- and serotonin-uptake inhibitor and 35b is anorepinephrine-specific uptake inhibitor. Steroid hormones – e.g. 36–39 (Fig. 16) –give another example of strikingly different biological effects of chemically closelyrelated analogs.

Several therapeutically used drugs bind to more than one receptor and are corre-spondingly termed ‘promiscuous’ ligands. However, whether a certain (balanced)unspecificity of their mode of action is advantageous for therapy or not still remainsuncertain.

7. Chirality and Biological Activities

Due to the chiral nature of amino acids (except glycine), drug binding sites of proteinsare asymmetric. In the past, the different actions of enantiomers of chiral molecules on

economic reasons, racemates ofsynthetic drugs were used in therapy. Today, researchers and drug companies are more

activities, as well as in their pharmacokinetics. Enantiomers can even be differentiatedby their odor — e.g. the monoterpenes (R )- and (S)-limonene 40 and (R)- and (S)-carvone 41 (Fig. 17) [48]. Butaclamol 42 is mentioned as just one example ofsignificantly different eudismic ratios (i.e. ratios of affinities of the different enan-tiomers) versus different receptors (Fig. 17) [47].

244

enzymes and receptors were often neglected [47]. For

aware of the different effects of enantiomers and diastereomers [1,2]in their biological

Page 254: 3D QSAR in Drug Design. Vol2

Fig

. 15.

Des

pite

thei

r ch

emic

al a

nalo

gy, t

he tr

icyc

lic

com

poun

ds 3

3, 3

4 an

d35

have

dif

fere

nt m

echa

nism

s of

act

ion

and

diff

eren

t the

rape

utic

eff

ects

; 33

is a

n an

tialle

rgic

age

nt, 3

4is

use

d fo

r th

e tr

eatm

ent o

f sch

izop

hren

ia a

nd 3

5aan

db

for

the

trea

tmen

t ofd

epre

ssio

n.

245

Page 255: 3D QSAR in Drug Design. Vol2

Fig

. 16.

In

the

ster

oid

seri

es, m

inor

che

mic

al d

iffe

renc

es p

rodu

ce v

ery

diff

eren

t eff

ects

on

the

biol

ogic

al p

rope

rtie

s of

the

copo

unds

. Com

poun

ds 3

6–38

cor-

resp

ond

in th

eir

biol

ogic

al a

ctio

ns to

the

diffe

rent

fem

ale

and

mal

e se

x ho

rmon

es, c

ompo

und

39 is

a m

ale

sex

horm

one-

rela

ted

anab

olic

.

246

Page 256: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity A Medicinal Chemist’s View

Fig. 17. Enantiomers are different: the olfactory receptors of our nose can even distinguish such minor struc- tural differences as between 40a and b, as well as between 41a and b (the typical odors of the compounds are indicated below the structures). Activity differences between enemtiomers are difined by their eudismic

indices, the ratios of biologic al activities of the individual enantiomers. They significantly depend on the bio-logical target. Whereas both enantiomers of butaclamol 42 look very ‘similar’ to the muscarinic acetylcholinereceptor, they differ by more than three orders of magnitude in their affinities to the D2 receptor.

Enantiomers may even have opposite biological effects. Certain chiral barbituratesshow sedative activities in the oneenantiomer and convulsive activities in the other one.The nifedipine analog Bay K 8644, 43 (Fig. 18) was originally synthesized as a race-mate. In vitro tests with isolated smooth muscle strips showed it to be more or less inac-tive. However, several years later its high affinity to calcium channels was discovered:this demanded a separation of the racemate into the two enantiomers. One is a calciumchannel agonist, the other a weak antagonist. Höltje explained these differences by theasymmetry of the molecular electrostatic potentials of the enantiomers, if the molecules are superimposed by their ring systems [49].

247

Page 257: 3D QSAR in Drug Design. Vol2

Hugo Kubinyi

Fig. 18. The enantiomers of Bay K 8644 have opposite biological effects. Whereas 43a stabilizes the calciumchannel in its open form, 43b stabilizes its closed form

8.

Quantitative descriptions of the specific binding of ligands to a protein surface are based on the additivity principle of activity group contributions to the overall affinities. This has been extensively proven by dedicated investigations performed by Andrews, Goodford and Bohacek [50], but also by Böhm’s scoring function of the de novo designprogram LUDl [51]. Some other approaches are reviewed in this book. However, despite the differences between enantiomers, discussed above, classical QSAR has no good tools to distinguish different conformers or enantiomers. 3D QSAR methods aremuch more efficient in this respect.

The very first approach to describe structure-activityrelationships by N X N similar-ity matrices (N = number of compounds) was presented by Rum and Herndon in I991 [52]. More systematic investigations, using 3D similarity indices, were performed by Richards et al. in 1993 [53,54]. N X N Distance matrices are even appropriate for thequantitative description of nonlinear structure-activity relationships [50,55,56 ].

A very promising method, based on similarity fields, is the Comparative Molecular Similarity Indices Analysis (CoMSIA) approach [57,58]. As in CoMFA, all molecules are aligned according to a common pharmacophore: similarity indices of certain probes to the different molecules are then calculated in the positions of a regular grid. Thesefields, not just pairwise similarity indices of the molecules, are correlated with the bio-logical activities. Due to the ‘soft’ character of the Gaussian potentials that are used to derive the similarity fields, smoother and more contiguous contour maps result from CoMSIA analyses [57].

As compared to the CoMFA and CoMSIA methods, the use of N X N 3D similaritymatrices in QSAR has several advantages. Instead of a mutual alignment of all active and inactive molecules to a common reference frame, only pairwise alignments need to be performed [56,58]. This leads to a matrix with individual ‘similarity vectors’ for each analog. Inactive analogs do no longer distort the alignment of the active analogs. If SEAL similarity indices [59,60] are used in the generation of the N X N matrix, noteven a grid is needed. Thus, the frequently observed disturbance of the CoMFA results, after translation or rotation of the box around the molecules (e.g. [61]), are not ob-

The Similarity Principle in QSAR and 3D QSAR

248

Page 258: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity: A Medicinal Chemist’s ’s View

served. A disadvantage of the calculation of pairwise similarity coefficients is the factthat no contour maps can be generated from the resulting QSAR model.

9. Conclusion

There are many lessons to be learned for 3D QSAR applications. Firstly, a correct align-ment is the most important basis of each structure-activity analysis. Errors that areproduced there can never lead to correct models.

An overreliance in target-independent similarity indices has to be questioned, be-cause of the dependence of ‘similarity’ on the biological macromolecule to which theanalogs bind. Sophisticated investigations on the ‘dissimilarity‘ of chemical databases are most often futile. Similar compounds may have very different actions and different molecules can be very similar in their biological activities. Considering the examplespresented in this chapter and the many more cases in the literature, one has to admit thatwe are far from a deeper understanding of the details which underline the observed structure-activity relationships. Applying the results from one series of analogs toanother one may lead to completely wrong conclusions.

In the past, the use of molecular connectivity parameters in QSAR studies has led tohighly controversial discussions. The same may happen to the BCUT, WHIM and EVAparameters, all described in other chapters in this volume. But one fact can be taken forsure: all these methods describe the similarity between molecules to a different extent.Thus, one should not be surprised that these approaches work, in some cases, even aswell as other, more ‘rational’ methods.

Besides a correct alignment, the selection of the training and test sets is criticallyimportant to the results of a 3D QSAR study [58].Only if both sets cover approximatelythe same range of structural space, can one be sure that the analysis and the results aremeaningful. Cross-validation methods do not help in this respect. If the data set is highlyredundant, even cross-validation in groups will produce ‘good’ results, whereas in clearlynon-redundant datasets even a simple leave-one-out cross-validation must fail. One has tocome back to the original goal of QSAR studies: to derive a working hypothesis and todesign new analogs, based on one or several alternative working hypotheses. In this sense,QSAR is a theoretical tool which cannot solve all our problems but which definitely should accompany the experiments of the medicinal chemists and biologists.

References

1 .

2.3 .4.

Böhm, H.-J., Klebe, G. and Kubinyi, H., Wirkstoffdesign. Der Weg zum Arzneimittel, SpektrumAkademischer Verlag, Heidelberg, 1996.Wermuth, C.G. (Ed.), The Practice of Medicinal Chemistry, Academic Press, London, 1996. Wolff, M.E. (Ed.), Burger’s Medicinal Chemistry, 5th Ed., Vol. 1, John Wiley, New York, 1995.Weber, L., Wallbaum, S., Broger, C. and Gubernator, K., Optimization of the biological activity ofcombinatorial compound libraries by a genetic algorithm, Angew. Chem., 107 (1995) 2452–2154;Angew. Chem. Intern. Ed., 34 (1995) 2280–2282.Singh, J., Ator, MA., Jaeger, E.P., Allen, M.P., Whipple. D.A., Soloweij, J.E., Chowdhary, S. andTreasurywala, A.M., Application of genetic algorithms to combinatiorial synthesis: a computationalapproach to lead identification and lead optimization, J. Am. Chem. Soc., 118 (1996), 1669–1676.

5 .

249

Page 259: 3D QSAR in Drug Design. Vol2

Hugo Kubinyi

6. Morgan. B.P., Holland D.R., Matthews, B.W, and Barlelt, P.A., Structure-based design of an inhibitorof the zinc peptidase thermolysin, J. Am. Chem. Soc., 116 (1994) 3251–3260.Kaminski, J.J., Wallmark, B., Briving, C. and Anderson, B.-M., Antiulcer agents: 5. inhibition of gastricH+/K+-ATPase by substituted Imidazo[1,2-a]pyridines and related analogs and its implications in modeling the high affinity potassium ion binding site of the gastric proton pump enzyme, J. Med. Chem., 34(1991)513–541.Lauri, G. and Barlett, P.A., CAVEAT: A program to facilitate the design of organic moleculesJ. Comput.-Aided Mol. Design, 8 (1994) 51–66.Kaplan, A.P, and Barlelt, P.A., Synthesis and evaluation of tin inhibitor of carboxypeptidase A with a Kivalue in the femtomolar range, Biochemistry. 30 (1991) 8165–8170.Barlett, PA. and Marlowe, C.K., Evaluation of intrinsic binding energy from a hydrogen bonding groupin an enzyme inhibitor, Science. 235 (1987)569–571.Morgan, B.P., Scholtz, J.M., Ballinger, M.D., Zipkin, I.D. and Barlett, PA., Differential binding energy:A detailed evaluation of the infleunce of hydrogen-bonding and hydrophobic groups on the inhibition ofthermolysin by phosphorous inhibitors, J. Am. Chem. Soc., 113 (1991) 297–307.Merz, K.M. and Kollman, PA., Free energy perturbation of the inhibition of thermolysin: Prediction ofthe free energy of binding of a new inhibitor, J. Am. Chem. Soc., 111 (1989) 5649–5655.

13. Shuman, R.T., Rothenberger, R.B., Campbell, C.S., Smith. G.F., Gifford-Moore, D.S. and Gesellchen,P.D., A series of highly selective thrombin inhibitors, In Smith, J.A, and Rivier, J.E. (Eds) Peptides —chemistry and biology proceedings of the 12th American Peptide Symposium. Cambridge. MA, U.S.A., 1991, ESCOM Science Publishers R.V., Leiden, 1992. pp. 801–802.Stanton, J.L., Ksander, G.M., de Jesus. R. and Sperbeck, D.M., The effect of heteroatom substitution ona series of phosphonate inhibitors of neutral endopeptidase 24.11, Bioorg. Med. Chem. Lett., 4 (1994) 539-542.Weber, A.E., Steiner, M.G., Krieter, PA., Colletti, A.E., Tata, J.R., Halgren, T.A., Ball. R.G., Doyle,J.J., Schorn, T.W., Stearns, R.A., Miller. R.R., Siegl, P.K.S., Greenlee, W.J. and Patchett, A.A., Highlypotent. orally active diester macrocyclic human renin inhibitors, J. Med. Chem. 35 (1 992) 3755-3773. Wolfenden, R. and Kati, W.M., Testing the limits of protein-ligand binding discrimination withtransition-state analogue inhibitors, Acc. Chem. Res., 24 (1991) 209–215. Xiang, S., Short, S.A., Wolfenden. R. and Carter, C.W., Transition-state selectivity a single hydroxyl group during catalysis by cytidine deaminase, Biochemistry, 34 (1995 ) 4516–4523.Parker. E.M., Grisel, D.A., Iben, L.G. and Shapiro, R.S., A single amino acid difference accounts for the pharmacological distinctions between the ral and human 5-Hydroxytryptamine1B receptors,J. Neurochem., 60 (1993) 380–383.Clozel, J.-P. and Fischli, W., Discovery of reremikiren as the first orally active renin inhihitor. Arzneim.- Forsch. (Drug Research). 43 (1993) 260–262.Li, R.-L., Hansch, C., Matthews, D., Blaney, J.M., Langridge, R.. Delcamp. T.J., Susten, S.S. andFreisheim, J.H., A comparason by QSAR. crystallography, and computer graphics of the inhibition of various dihydrofolate reductases by 5-(X-Benzyl)-2,4-diaminopyrimidines, Quant. Struct .-Act. Relal.,1(1982) 1-7.Li, Z., Nguyen, D.T., Kitson, D.H., Bajorath. J., Kraut. J and Hagler, A.T., Origin of trimethoprim’s pharmacologic activity and differential binding to E. coli and chicken liver dihydrofolate reductases:Long-range electrostatic non-’lock and key’ specificity, Abstract of Presentations, Scientific Seminar Tour 1993, BIOSYM. San Diego, CA, U.S.A., 1993, pp. 14–19 Roques. B.P., Nobel, F., Daugé. V., Fournié-Zaluski, M. and Beaumont. A., Neutral endopeptidase 24.1I: Structure, inhibition and experimental and clinical pharmacology, Pharmacol. Rev., 45 (1993) 87-146.Roderick, S.L., Fournié-Zaluski. MC., Roques, B.P and Mathews, B.W., Thiorphan and retro-thiorphan display equivalent interactions when bound to crystalline thermolsin. Biochemistry, 28(1989) 1493–1497.Slusarchyk, W.A., Robl, J.A., Taunk, P.C., Asaad, M.M., Bird. J.E., DiMarco, J. and Pan. Y., Dual met-alloprotease inhibitors: V. Utilization of bicyclic azepinothiazolidines and azepinonetetrahydrothiazenesin constrained peptidomimetics of mercaptoacyl dipeptides, Bioorg. Med. Chem. Lett., 7 (1995)753–758.

7.

8 .

9.

10.

11.

12.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.

250

Page 260: 3D QSAR in Drug Design. Vol2

Similarity and Dissimilarity: A Medicinal Chemist’s View

25.26.

28.

29.

12 August 1991. 7–14.Mattos, C. and Ringe, D., Multiple binding modes, In Kubinyi, H. (Ed.) 3D QSAR in drug design:Theory methods and applications, ESCOM Science Publishers B.V., Leiden, 1993 pp. 226–254.Meyer, E.F., Botos, I . , Scapozza, L. and Zhang, D., Backward binding and other structural surprises,Persp. Drug Discov. Design, 3 (1993) 168-195.Böhm, H.-J. and Klebe, G., What can we learn from molecular recognition in protein-ligand complexesfor the design of new drug?, Angew. Chem., 108 (1996) 2750- 2778: Angew. Chem. Intern. Edit..35 (1996), 2588–2614. Montgomery, J.A. and Niwas, S., Stucture-based drug design, Chemtech. 23 (1993) 30–37. Montgomery. J.A., and Secrist Ill, J.A., PNP Inhibitors, Persp. Drug Discov. Design, 2 (1994)205–220,Kester, W.R., and Matthew.;, B.W., Cystallographic study of the binding of dipeptide inhibitors tothermolysin: Implications for the mechanism of catalysis, Biochemistry, 16 (1977) 2506–2516 . Badger, J., Minor. I., Kremer, M.J., Oliveira, MA., Smith. T.J., Griffith, J.P., Guerin, D.M.A.,Krishnaswamy, S., Luo, M., Rossmann, M.G., McKinlay. M.A., Diana. G.D., Dutko, F.J., Fancher, M.,Ruechert. R.R. and Heinz, B.A., Structural analysis of a series of antiviral agents complexed withhuman rhinovirus 14, Proc. Natl. Acad. Sci. USA, 85 (1988) 3304–3308.Diana. G.D., Treasurywala. A.M., Bailey. T.R., Oglesby, R.C., Pevear, D.C. and Dutko, F.J., A modelfor compuonds active against human rhinovirus-14 based on X-ray cystallography data, J.Med.Chem., 33 (1990) 1306–1311.Bystroff, C., Oatley, S.J. and Kraut. J., Crystal structures of Eschericha coli dihydrofolate reductaseThe NAPD+ holoenzyme and the folate NADP+ ternary complex: Substrate binding and a model for thetransition stateBiochemistry. 29 (1990) 3263–3277Bolin, J.T., Filman, D.J., Matthews, D.A., Hamlin R.C. and Kraut. J., Crystal structure of Escherichacoli and Lactobacillus casei dihydrofolate reductase refined at 1.7 Å resolution, J. Biol. Chem., 257 (1982) 13650-13662.Poulos, T.L, and Howard, A.J., Crystal structures of metyrapone- and phenylimidazole-inhibited

not always bind analogously, Nature. Struct. Biol., 1 (1994) 55–58, Massumoto, O., Taga, T., Matsushima, M., Higashi, T. and Machida, K., Multiple binding of inhibitors in the complex formed by bovine tryosin and fragments of a synthetic inhibitor, Chem. Pharm. Bull.,38 (1990) 2253– 2255.Underwood, D.J., Strader, C.D., Rivero, R., Patchett. A.A., Greenlee. W. and Predergast, K., Structuralmodel of antagonist and agonist binding lo the angiotensin II, AT1 subtype G protein coupled receptor,Chem. Biol., 1 (1994) 211– 221. Aquino, C.J., Armour, D.R., Bermann, J.M., Birkemo, L.S., Carr, R.A.E., Croom, D.K., Dezube, M.,Dougherty, Jr., R.W. Ervin. G.N., Grizzle. M.K., Head, J.E., Hirst, G.C., James, M.K., Johnson. M.F., Miller, L.J., Queen, K.L., Rimele, T.J.,Smith, D.N. and Sugg, E.E., Discovery of 1,5-Benzodiazepineswith peripheral cholecystokinin (CCK-A) receptor agonist activity: I. optimization of the Agonist ‘Tigger’, J. Riled. Chem., 39 (1996) 562–569.Hirst, G.C., Queen. K.L., Sugg, E.E. and Willson, T.M., Conversion acyclic nonpeptide CCK antago- nist into CCK agonists. Bioorg, Med. Chem. Lett., 7 (1997) 511–514.Samanen, J., GPVIIb/IIIa antagonist, Ann. Rep. Med. Chem., 31 (1996) 91–100. Engleman, V.W., Kellogg, M.S. and Rogers. T.E., Cell adhesion integrins us pharmaceutical targets, Ann. Rep. Med. Chem., 31 (1996) 191– 200.Aumailley, M., Gurrath, M., Müller, G., Calvete, J., Timpl. R. and Kessler, II., Arg-Glv-Asp constrained within cyclic pentapeptides: Strong and selective inhibitors of cell adhesion to vitronectin and laminunfragment P1.FEBS Lett.291 (1991) 50–54.Keenan, R.. Miller, W., Ali, F., Barton, L., Bondinell, J., Burgess. J., Callahan, J.. Calvo, R.. Cousins,R., Gowen, M., Huffman, W., Hwang, S., Jakas, D., Ku, T., Kwon, C., Lago, A., Mombouyran. V.,Nguyen, T., Ross, S., Samanen, J.,Takata, D., Uzinskas, I., Venslavsky, J., Wong, A., Yellin. T. and

251

30.31.

32.

33.

34.

35.

36.

37.

38.

39.

40.

41.

42.

43.44.

45.

46.

Hofmann, A., LSD — mein Sorgenkind. dtv/Klett-Cotta, Munich, 1993.

27.

Hanson, D.J., Dioxin toxicity: New studies prompt debate, regulatory action, Chem. Eng. News,

complexes of cytochromeP-450cam, Biochemistry, 26 (1987) 8165–8174.Mattos, C., Rasmussen, B., Ding, X., Petsko, G.A. and Ringe, D., Analogous inhibitors of elastase do

Page 261: 3D QSAR in Drug Design. Vol2

Hugo Kybinyi

Yuan, C., Nonpeptide vitronectin receptor antagonists, Abstract MEDl 236, 211th ACS National Meeting, 1996.Ariëns, E.J., Wuis, E.W. and Vetinga, E.J., Stereosselectivity of bioactive xenobiotics: A pre-Pasteur atti-tude in meficinal chemistry, pharmacokinetics and clinical pharmacology, Biochem, Pharmacol., 37 (1998) 9–18.Friedman. L. and Miller, J.G., Odor incongruity and chirality, Science. 172 (1971) 1044–1046.Höltje, H.-D. and Marrer, S., A molecular graphics study on structure-action relationships of calcium-antagonistic and agonistic 1,4-dihydropyridines, J. Comput.-Aided Mol. Design, 1(1987) 23–30.Kubinyi, H., QSAR: Hanseh analysis and related approaches. VCH, Weinheim, 1993.Böhm, H.-J., The development of a simple empirical scoring function to estimate the binding constantfor (a protein–ligand c complex of known three-dimensional structure, J. Cornput.-Aided Mol. Design,8 (1994) 243–256.Rum. G. and Herndon, W.C., Molecular similarity concepts: 5. Analysis of steroid-protein binding

Good. A.C., Peterson. S.J. and Richards. W.G., QSAR’s, from similarity matrices: Technique validation and application in the comparison of different similarity evaluation methods, J. Med. Chem., 36 (1993). 2929–2937.Good, A.C., 3D molecular similarity indices and their application in QSAR studies, In: Dean. P. (Ed.)Molecular similarity in drug design, Chapman and Hall. New York, 1995, pp. 23–56.Martin. Y.C., Lin, C.T., Hetti, C. and DeLazzer, J., PLS analysis to detect nonlinear relationships between biological potencyKubinyi, H., A General View on Similarity and QSAR Studies, In Computer-assisted lead finding and op- timization, Proceedings of the 11th European Symposium on Quantitative Structure–Activity Relationships. Lausanne, Switzerland, 1996: van der Waterbeemd, H., Testa, B. and Folkers, G. (Eds.):Verlag Helvetica Chimica Acta and VCH: Basel, Weinheim, 1997. pp. 7–28.Klebe, G., Abraham. U and Mietzner, T , Molecular similarity indies in a comparative analysis(CoMSIA) of drug molecules to correlate and predict their biological potency, J. Med. Chem., 37 (1994)4130–4146Kubinyi, H., Hamprecht, F.A. and Mietzner, T., Three-dimensional quantitative similarity-activityrelationships (3D QSiAR), from SEAL similarity matrices, manuscript submitted for publication.Kearsley, S.K. and Smith. G.M., An alternative method for the alignment of molecular structures:Maximizing electrostatic and steric overlap, Tetrahedron Comp. Methodol., 3 (1990) 615–633. Klebe, G., Mietzner, T. and Weber, F. Different approaches toward an automatic alignment of drugmolecules: application to sterol mimics, thrombin and thermolysin inhibitors, J. Comput.-Aided Mol. Design, 8 (1994) 751–778.Cho, S.J. and Tropsha, A., Cross-validated R2 guided region selection for comparative molecular fieldanalysis (CoMFA): A simple method to achieve consistent results, J . Med. Chem., 38 (1995)1060–1066.

47.

48.49.

50.51.

52.

53.constants, J .Am. Chem. Soc., 113 (1991) 9055–9060.

54.

55.

56.and molecular properties. J. Med. Chem., 38 (1995) 3009–3015.

57.

58.

59.

60.

61.

252

Page 262: 3D QSAR in Drug Design. Vol2

Pharmacophore Modelling: Methods, Experimental Verificationand Applications

Arup K. Ghose and John J. WendoloskiAmgen Inc., 1840 DeHavilland Dr., Thousand Oaks, CA 91320, U.S.A.

1.

The essential functionalities of a molecule necessary for its pharmacological activity arecalled pharmacophores. Although the idea of pharmacophores existed for a long time,Ehrlich [1] first introduced this terminology following the term chromophore which was used to describe the groups responsible for the color of a compound. The interest in theidea of pharmacophores has grown tremendously in the last few decades due to theavailability of computer graphics [2], a number of computational methods to determinethe pharmacophoric geometry [3–7] and various software for 3D database mining using the concept of a pharmacophore pattern match [8]. However, the validity of a pharma-cophore hypothesis came more from direct medicinal chemistry structure-activity rela-tionships (SAR) studies rather than from any theoretical calculations. Such calculationsmay be possible in the near future with the availability of an ever increasing number ofligand-protein complex structures, better molecular mechanics force fields and betterunderstanding of solvation factors. In such an approach, we have to show that the phar-macophoric groups provide the major contribution in binding affinity to the protein

computational methods for pharmacophore identification and its geometry determina-tion, a number of ways to validate the pharmacophore models and their applications inmedicinal chemistry to identify novel active compounds. There are several excellentreviews and papers on pharmacophore modelling [3–7].

2.

The Concept of the Pharmacophore and its Validity

compared to the rest of the molecule. The main objective here is to discuss the various

The Medicinal Chemistry Approach for Pharmacophore Determination andits use in Computational Chemistry

Pharmacophoric groups have been traditionally identified from the SAR data. When acompound with a novel biological activity is identified, medicinal chemists try slowly tomodify this structure to optimize its potency, as well as to identify the important struc-tural moieties responsible for the biological activity. The usual pharmacophoric groupsinclude: (i) hydrogen (bond) donors, (ii) hydrogen acceptors, (iii) hydrophobic groups,(iv) electron acceptors, (v) electron donors and (vi) polar bonds, etc, This process ofpharmacophore identification is illustrated with an example based on the hydroxamateinhibitors of the matrix metalloproteinases (MMPs) [5] collagenase and stromelysin.Collagenase cleaves collagen, the major constituent of cartilage, so blocking the activa-tion of these MMPs is considered to be a potential strategy to prevent a tissue damage.The general structure of the peptidomimetic hydroxamate inhibitors of the MMPs areshown in Fig. 1. The pharmacophoric group determination and pharmacophoric atomselection are interrelated but may not be exactly the same. The atom selection varies

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design, Volume 2. 253–271 . ©1998Kluwer Academic Publishers. Printed in Great Britain.

Page 263: 3D QSAR in Drug Design. Vol2

Arup K. Ghose and John J.Wendoloski

Fig. 1. The general structure and pharmacophoric groups of the peptidomimetic hydroxamate inhibitors.Pharmacophoric groups are indicated wiht an *.

with our objective: when we are interested in the determination of the geometry of thepharmacophore atoms. we should select a minimum number of atoms from the pharma-cophoric group. However, when we are interested in database searching for a pharma-cophore, we should use a sufficient number of atoms which will match compounds fromthe correct chemical class. If the N-H bond of an amide is found to be a pharma-cophoric group, one may select the N and H atoms as the pharmacophoric atoms forgeometry determination. However, one should use the whole -CONH- group for data-base searching to avoid hits from amines. Keeping these differences in mind, we willanalyze the SAR work done around this molecule and pharmacophoric atom selectionof the hydroxamates described below.

2.1.

Replacement of the scissile peptide bond by groups able to act as a Zn ligand is thebasic approach that has been used in the design of collagenase inhibitors. The hydroxa-mate functionality is a very effective scissile bond surrogate in collagenase inhibitors[9,10] . Other Zn ligands like thiol. carboxylate and phosphonates also serve this func-tion reasonably effectively. Removal of the Zn liganding group leads to a dramatic de-crease in binding affinity. suggesting that it is a pharmacophore. When we are interestedin the geometrical aspects of the pharmacophoric groups. it may be difficult to definethe pharmacophore if we have different Zn liganding groups because the various ligandswill have different coordination geometries around Zn. In the case where three fairlyrigid Zn coordinating groups come from the protein, however, the complete coordina-tion sphere is usually either a distorted square pyramid or a trigonal bipyramid. In thistype of modelling, a dummy atom for Zn satisfying the geometry of the Zn-ligand coor-dination sphere [II] may be the best way to represent this pharmacophoric group. Inthis representation. the criterion of a valid superposition of different types of liganding

254

A Zn ligand ut the scissile peptide bond

Page 264: 3D QSAR in Drug Design. Vol2

pharmacophore Modeling: Methods, Experimental Verification and Applications

groups is that the Zn-ligand bond(s) will approach approximately from the samedirection.

2.2.

It was known that the complete removal of the iso-butyl group led to a strong decreasein binding affintiy Decreasing the size to n-propyl led to a very small reduction inaffinity [10].This SAR indicated the importance of one of the terminal methyl groupsfor the activity. Since the two methyl groups are stereochemically different and therewas no experimental evidence for selecting the correct methyl group, the second carbonwas selected as one of the pharmacophoric atoms.

2.3.

Replacement of the P1´–P2´ amide linkage (-CONH-) by an ester (-COO-) or aretroamide (-NHCO-) led to inactive compounds, suggesting that -CO- may not bedirectly involved in the binding process with the MMPs. Methylation of’the peptide Ndecreased the binding potency. These observations led to the selection of the N-H bondas a pharmacophoric group.

2.4.

Replacement of the P2´–P3´ amide (-CONH-), as in ( 2 . 3 ) , led to a considerabledecrease in binding affinity. Two pharmacophoric atoms were used to represent each ofthese hydrogen donors to maintain the directionality of hydrogen-bond donation.

2.5.

This increased the binding affinity considerably. Ala at P2´, for example, showed con-siderably less potency [10] than Phe or Trp. This structure activity data led us to selectthe seven atoms as critical phamacophoric atoms (Fig. 1) .

3.

The computational methods for pharmacophore modelling may have two steps: (i) deter-mination of equivalent pharmacophoric atoms in a diverse set of active compounds; and(ii) pharmacophore geometry determination from a defined set of equivalent atoms in adiverse set of compounds.

3.I

A hydrophobic group on the PI´ aminoacid

A hydrogen-bond donor at the P1 –P2´

A hydrogen bond donor at the P2´–P3´

A hydrophobic alky or aralkyl group at the P2´ side chain

Computational Methods for Pharmacophore Modelling

Determination of the equivalent pharmacophoric groups in a diverse set of activecompounds

When the pharmacophore hypothesis originates from the SAR study of a single leadcompound, definition or identification of the equivalent pharmacophoric groups may notbe difficult. However, if we have several active compounds for the same biochemical or

255

Page 265: 3D QSAR in Drug Design. Vol2

Arup K. Ghose and John J. Wendoloski

biological system with a diverse set of structures. identification of the equivalent phar- macophoric groups may not be as straightforward. Any automation along this lineshould have the strategies: (i) to identify similar atoms or groups in terms of their chem-ical or physico-chemical properties: (ii) to estimate the relative position of the possiblepharmacophoric groups in the allowed low-energy conformations of the molecules for3D structural comparison; and (iii) to weigh similar pharmacophoric groups when therearc multiple choices. Although. from time to time, some success along this line has beenclaimed, [12–16], none of these methods have been well tested for diverse types of problem. In one of the earliest attempts, Crippen [12] abstracted the ligands by a set ofrepresentative points of specific types. The conformational flexibility of the moleculeswas taken into account by a distance range matrix of upper and lower limits of the dis-tances between these points in the energetically allowed conformations of the mole-cules. He then combinatorially determined the maximum number of matching points ofthe same type. The geometrical requirement for the matching of the points is that anytwo matching points should have either overlapped distance ranges or the distanceranges should be within a specified distance limit. Once an acceptable match has beenfound, the common distance range of the matched atoms can be used in the distance geometry embedding technique [17] to find the active conformation of the individualmolecules. Ghose et al. [13,14] used a set of important atoms from each molecule for acombinatorial exploration of a plausible superposition. The conformational flexibilitywas accounted for by taking a set of low-energy conformations separately.Superpositions of all other atoms were determined from the initial superposition of theimportant atoms. Each superposition was rated by a fit function utilizing the three most important physico-chemical properties of the atoms, namely charge. refractivity and hy-drophobicity. Atomic charge density is related to the electrostatic interaction. Atomic refractivity is related to the van der Waals interaction. and atomic logP is related to hy-drophobic interactions. These are the three most important forces in the binding of theligands with the biological molecules. Martin et al. [15], in their DISCO program, auto- mated and improved most of Crippen’s approach using a selected number of low-energyconformations. Jones et al. [16] used a very similar approach, except that their fit func-tion was optimized using the genetic algorithm (GA) where the gene was represented byatom match, as well as the conformational information. The general utility of thesemethods may be determined in the future when independent workers will apply thesemethods to a diverse set of problems

3.2

Most of the generally explored methods for pharmacophore modelling belong to this cat-egory, since quite often the phamacophore hypothesis can be formulated from tradi-tional SAR studies. The objective of these methods is straightforward: to find allpossible low-energy conformations of the molecules where the pharmacophoric groupscan be superimposed within a predefined distance limit. If a method tries to find any oneacceptable solution, its success may be limited, since it may not be the real solution. Onthe other hand, the method which tries to get all possible solutions may sometimes be

Pharmacophore geometry determination witha defined superposition hypothesis

256

Page 266: 3D QSAR in Drug Design. Vol2

Pharmacophore Modeling: Methods, Experimental Verification and Applications

coinputationally intractable. Some of the methods reported to solve this problem areshown below.

3.2.1. Ensemble distance geometrySheridan et al. [18] formulated an elegant way of using Crippen’s distance geometry program for superimposing molecules. In this approach, they created an ensemble dis-tance matrix of all atoms of all the molecules that are to be superimposed. The intramol-ecular atomic distance ranges are given the usual distance ranges for that molecule.Many of these distances are simply the bond distances, bond angle distances and fixedtorsion distances. The intermolecular atomic distance ranges are zero to a small toler-ance distance for the atoms to be superimposed. The rest of the distance ranges areevaluated by the embedding program using the triangular inequality rule. The methodhas some advantages over Crippen’s original approach where the common distancerange of the superimposed a t o m were used to evaluate their geometry. The constraintsof the non-superimposed atoms further decreased the acceptable solution space.

However, it is not completely free from all the disadvantages of regular distance geometry, namely when there are multiple solutions of conformations, it does not giveany information about the other acceptable solutions, nor does it guarantee low-energy conformations. This method is computationally very inefficient too. One has to dia-gonalize an (n*m) X (n*m) matrix, where n is the number of molecules and m is thenumber of atoms in each molecule. One can get a comparable result by diagonalizingthe m X m atomic distance matrix of each molecular separately with the common dis-tance range for the overlapping atoms. However, the simplicity of the data preparationjustifies the extra computation for this method.

3.2.2.Earlier, Crippen introduced [19] the method of using a distance range matrix of theatoms for a fast evaluation of the superimposability of the molecules. Distance rangewas determined from the minimum and maximum distances between these atoms in theacceptable low-energy conformations. In this method, the criterion for superimpositionis that the superposed atom should have an overlapping distance range. This criterion isa negative test, rather than a positive test; in other words, if this test fails, the moleculescannot be superimposed. However, qualifying this test does not guarantee a super-position. The conclusions drawn from the distance range matrix are valid only in an n – 1dimensional space, where n is the number of atoms superimposed [20]. Our physicalspace has only three-dimensions. One can get a structure in a three dimensional spacewhich can best represent the distance range using distance geometry ernbedding.However. distance geometry embedding is not cornputationally fast enough to beapplied in a large number of test cases.

Another problem of embedding is that the structures obtained from simple embed-ding may not be energetically low. To overcome these problems, Ghose and Crippen[21] proposed the construction of a distance map in torsional space. In this approach, an ensemble of distance matrices. each one representing an energetically allowed grid oftorsional space, is created (Fig. 2). The grid is determined by the rotational increment of

Distance mapping in torsional space

257

Page 267: 3D QSAR in Drug Design. Vol2

A m p K . Ghose and John J. Wendoloski

Torsion Angle

ble pharmacophoric distances, finding overlapping energetically allowed distance gridsin all other molecules will suggest superimposable low-energy conformations and,therefore, the corresponding arrangement of the pharmacophoric atoms may be consid-ered as a possible geometry. If one does not have any prior knowledge or hypothesis ofthe pharmacophoric distances, then all distance matrices should be compared. There aremany advantages in this approach: (i) the selected conformations will always fall withinthe defined energy limit; (ii) the accuracy of the superposition will go down when thestep size for rotating the torsional bond is increased, but it will not miss the allowed su-perposition: and (iii) i t will detect all possible conformations that are superimposable.There are at least a few disadvantages: (i) the generation of the distance range matricescorresponding to the energetically allowed grids is somewhat inefficient; (ii) if there aretorsion angles which do not affect the location of the pharmacophoric atoms, multipleidentical distance matrices along those torsional bonds will be generated; and (iii) verysmall torsional increments may miss superimposable conformations unless the distancematching tolerence is increased.

258

the torsion angles. If one has a hypothesis to select a few of these matrices as the possi-

Fig. 2.. A two dimensional representation of the distance map in the torsional space (ω1 and ω2).

Page 268: 3D QSAR in Drug Design. Vol2

Pharmacophore Modeling: Methods, Experimental Verification and Applications

3.2.3.Marshall et al. used an alternative approach where they mapped the torsion angles indistance space [22] .The generation of a torsion angle map is computationally moreefficient since once a grid size has been defined. one can very easily assign a grid toeach of the energetically allowed conformations during the conformational search ac-cording to its distance property. A schematic representation of torsional (orientation)mapping in the distance space is represented in Fig. 3 . The simplest approach to test thefeasibility of superimposition of a set of equivalent atoms in two molecules is tocompare their orientation map. If a grid is occupied by both the molecules, those conformations are superimposable if their chirality is the same [2]

The weakness of this approach is that the outcome is dependent on the distance gridsize. If one arbitrarily uses a very large grid size, most conformations will be placed i n afew grids and will suggest many poorly superimposable conformations as super-imposable [21] Too small a grid size will create many unoccupied grids and maysuggest many fairly superimposable conformations as non-superimposable simplybecause they occupied a close neighboring grid. The ideal grid size that can avoid

Orientation map in distance space

Distance d2

Fig. 3. A schematic tw o dimensional representation of an orientation map in the distance space. The circlesand the squares represent the conformations of tow different molecules satisfying the distance range of theoccupied grids, and d1 and d2 are the two pharmacophoric atom distances.

259

Page 269: 3D QSAR in Drug Design. Vol2

Arup K. Ghose and John J . Wendoloski

Fig. 4. Description of the various internal coordinates affecting the AD distance, as used in equations I and 2.

during a torsional angle increment in the conformational search process. The distancebetween atoms A and D during the rotation around bond B-C (Fig. 4) is given by:

(1)

where d’s, θ ’s and ω are the various bond distances. bond angle (Fig. 4).

The differentiation of r² with respect to ω gives:

= (2)

Equation 2 shows that the rate of change in distance between A and D depends on d1

and d3. The higher the values, the greater is the rate of change. In addition, the change ismaximal when the bond angles as well as the torsion angle arc 90°.Unfortunately. thepharmacophoric atoms A and D are often not directly bonded to B and C and, therefore,all of these variables may have a wide range. In other words, the appropriate gridspacing is a function of several variables and will differ from one atom pair to anotherand even from one conformation to another.

Creating a distance map from torsional space [21] was in a better position from thesedifficulties, because the distance range tends to change according to the increment stepsize of the torsion angles and the other variables, as shown in Eq. 1. The precision of thesuperimposed structure was determined by the coarseness of the torsional incrementduring the conformational analysis.

3.2.4. Disco programIn Disco, Martin et al. [15] automated the identification of possible pharmacophoricgroups in a molecule and applied Kuhl et al.’s [23] clique finding algorithm to deter-mine their superimposability. They, however, used a set of distance matrices represent-ing a few low-energy conformations of the molecules as used by Marshall et al. [3] andGhose et al. [13,14] instead of a distance range matrix. Since it used a distance tolerancefor accepting a distance match in a discrete distance representation of a flexible mole-cule, it often had to redo the matching with increased tolerance when no matching was

260

Page 270: 3D QSAR in Drug Design. Vol2

Pharmacophore Modeling: Methods, Experimental Verification and Applications

found with at least three pharmacophoric groups. The method was successful in identi-fying the traditionally accepted pharmacophores of the benzodiazepines, as well as infitting some newer 2-aminothiazoles in the traditional pharmacophores. Disco is an attractive tool for generating alternate possible pharmacophore hypotheses. It is nowcommercially available [24] and it is expected to be extensively explored by theresearchers of this field in the near future.

3.2.5.Payne et al. probably was the first to apply [25] GA for pharmacophoric geometry de-termination. Genetic algorithm mimics the process of evoluation where new generationsare born by crossover and mutations of genes and the fittest members survive over the time. Any optimization process can be used by the GA programs by encoding the inde-pendent variables in a string of bits called chromosomes. The fitness of each chromo-some can be measured by a function. An initial population is randomly created and thefitness of its members are evaluated. A set of reproduction operators (crossover, muta-tion, etc.) is used to create the children. The fitness of the children is evaluated. Thechildren replace the least-lit members of the population. The process continues as longas it finds better-fitted children. Here the conformational variables were encoded in a bitstring and a fitness function was represented by the distance of the superimposed atoms.It is definitely more efficient than a Monte Carlo-type search. However, when there aremultiple superposable conformations, it may fail to detect all of them. There is no guar-antee that the method will find the global minimum. Sometimes convergence may be aproblem also.

3.2.6. DHYDM method The DHYDM (Distance Hyperspace Distance Measurement) [S] method was developedby Ghose et al. to avoid the grid size problem of the orientation-mapping approachdeveloped by Marshall et al. [22].This method involves the following steps.

(i) Do a conformational search and generate an orientation map (Fig. 3) for each mole-cule independently. The conformational analysis and orientation map generationcan be accomplished by Sybyl software [24].

(ii) Determine the distance of the centroid of the occupied grids of molecule 1 fromthose of the i’th (initially 2nd) molecule in the distance space and store theminimum distance for each grid using the following expression:

Genetic algorithm (GA) based approach

where dij represents the diatomic between the i’th occupied grid of the molecule 1 andthe j’ th occupied grid of molecule n in distance hyperspace, and dci,k represents thedistance of the center of the i’th grid along the k’th dimension. It is simply the mean ofthe minimum and maximum distances of the grid along the k’th dimension.

(iii) Repeat step (ii) for the rest of the molecules. If the minimum distance is greaterthan the stored value, replace it by the current minimum.

261

Page 271: 3D QSAR in Drug Design. Vol2

Arup K. Chose and John J . Wendoloski

(iv) Rank-order the minimum distance values and accept those grids which found occu-pied grids within an acceptable limit.

(v) Accept the conformation in the grid if' a rigid fit gives a good rms deviation of thepharmacophoric atoms from a conformation of the reference molecule in thatgrid. Since distances do not contain the chirality information (enantiomeric con-formations have the same distance properties), this step may be very important.

Unlike most other methods, the advantages of this method are:

1. The outcome of this experiment can be monitored to set an upper limit on the mol-ecular mechanics energy of an acceptable superimposable conformation. This maybe important if the best solution for a low-energy cutoff is not geometrically acceptable.When the molecules are not sufficiently diverse in their conformational behaviorand, therefore, are superimposable in many different low-energy conformations,the method will detect such a situation and will inform the investigator about multi-ple possibilities. In the absence of other criteria for the acceptance of these possibleconformations, one can prioritize on the basis of conformational energy.This method also accounts for the complete low-energy conformational space andnot just the grid points explored in the search process. The computational burden ofthis method, however. precludes its use for a fast but approximate 3D databasesearch [8].

2 .

3 .

The main disadvantage of this method is the computational speed. It needs a thoroughconformational search for all molecules. When there arc many torsion angles, themethod may still be used by increasing the angle increment and setting the unimportanttorsion angles to a local minimum.

4. Experimental Verification of a Pharmacophore Geometry

One obvious major- problem with computer modelling is that it is not an experimentalresult, even though the input data was supplied from experimental findings! To improveconfidence in the result, appropriate steps should be taken to validate the pharma-cophore model. Several approaches can be used to validate the pharmacophore model.

4.I .

Analysis of the binding affinities of the compounds which can/cannot attain the pharma-cophore geometry in the various allowed low-energy conformations may give an indirectsupport to the pharmacophore model. In general, if there is a low-energy conformation ofthe molecule satisfying the pharmacophore geometry, it should be active, otherwise not.However, this experiment may not be very conclusive since a molecule having a perfectmatch for the pharmacophore may suffer from steric repulsive with the binding proteinfrom the other parts of the molecule. Some pharmacophores may interact less stronglywith the protein and give detectable binding without proper alignment or even in itsabsence. Some molecules may bind in a totally different binding mode.

Analyzing binding affinities of comparable coinpounds

262

Page 272: 3D QSAR in Drug Design. Vol2

Pharmacophore Modeling: Methods, Experimental Verification and Applications

4.2. Analyzing constrained compounds

Making constrained compounds to validate a pharmacophore theory is one approachoften used by medicinal chemists. The constrained compounds with appropriate phar-macophoric groups, in theory, should have a higher binding affinity due to smallerentropy loss upon binding with the protein. In reality, it may not happen the expectedway because the accepted or computed pharmacophore geometry may not be the idealgeometry or for the reasons already discussed. In an alternate view, if the constrainedcompound is not totally free from torsional rotation, positive binding may not alwaysconfirm the pharmacophore geometry.

4.3.

Is it possible to determine the binding conformation of the ligand without the tediousprocess of crystallizing and solving the ligand-protein complex structure? High-resolution NMR spectroscopy [5] may often be a good complementary tool for thispurpose. This method needs appropriate isotopically labeled inhibitors for generatingNOEs. The NOE constraints are determined in presence and absence of the bindingprotein. Wang et al. [5], for example. used an 15N-edited and decoupled NOESY spec-trum of compound Phe analog of I (Fig. 5 ) in the presence and absence of humanfibroblast collagenase (HFC). Analysis of NOE peaks derived from the labeled NH ofthe bound inhibitor. they inferred that there was a methyl group within a short distancefrom the labeled NH, possibly less than 3 Å. Since there is only one label in this com-pound, it was not possible to distinguish intermolecular and intramolecular NOEs. TheNOESY showed that the two prochiral ß protons have strong and comparable NOEintensities. On the other hand. the cross-peak with the phenylalanine ortho protons isvery weak. These results suggest that the two ß protons are at similar distances from theNH and the phenyl ring is probably in the trans position to it. This observation wasconsistent with the suggested active conformation of I.

4.4.

The most realistic test of the pharmacophore hypothesis involves the solution of theprotein-ligand crystal structure either by X-ray crystallography or NMR [5], althoughwe do not very often get the chance to perform such an elaborate experiment. Thisapproach needs a sufficient quantity of purified protein. The NMR study needs iso-topically labeled proteins. The X-ray method needs good-quality crystals of theligand-protein complex. In the current industrial drug-research setting, we often startwith a target protein without a known 3D structure, and a long Iead time before thestructure is solved. In absence of the protein structure, pharmacophore identification is acommon approach in drug research. As a consequence, in the future we will get moreexamples where ligand-receptor complex structures will support or refute pharma-cophore hypothesis. For the collagenase (HFC) inhibitors, Ghose et al. used theirDHYDM method and a set of partially constrained and unconstrained inhibitors (Fig. 5 )

Biophysical determination of the binding conformation

Biophysical determination of the ligand-receptor complex structure

263

Page 273: 3D QSAR in Drug Design. Vol2

Arup K. Ghose and John J. Wendoloski

Fig. 5. The structures of the various constrained and unconstrained collagenase inhibitors used for the pharmacophore geometry determination.

to propose a pharmacophore geometry and, finally, solved a collagenase-ligand com-plex structure using X-ray crystallography. Ten of the eleven torsion angles of thebound conformation of the ligand were within an acceptable error limit (Table 1). Astereoview of the computed pharmacophore model and the X-ray structure of a relatedbound ligand are shown in Fig. 7.

264

Page 274: 3D QSAR in Drug Design. Vol2

Pharmacophore Modeling: Methods, Experimental Verification and Applications

Fig. 6. Description of the torsion angles as used in Table 1.

ªSee Fig. 6 for the description of the torsion angles.bThe angle range is given from the first experiment; the values within parenthesis are the average value indifferent inhibitors.CThe value suggested from the alternate conformation of trans nine membered lactam in the CambridgeStructural Database.dThedifference from the mid-point of pharmaacophore model range.eThevalue from the unconstrained acyclic compound I only.

Fig. 7. A stereoview of the computed pharmacophore model superimposed on the X-ray structure of a relatedbound ligand.

265

Page 275: 3D QSAR in Drug Design. Vol2

Arup K. Ghose and John J. Wendoloski

4.5.

In the recent literature, there were a considerable number of papers which studied theconformations of a set of bioactive compounds either by X-ray [26–30] or by NMR[31–35] to propose a pharmacophore model. Whether such methods are justifiedis arguable, since it is the protein or biological molecules which dictate the bindingconformation, sometimes with minor adaptation on its own. A ligand will bind to thebiomolecule provided it can take a complementary conformation with a low expense ofenergy. In other words, the binding conformation may not be the most populated con-formation in the free state. Nicklaus et al. [36] recently studied the various ligandsavailable in the protein databank. Many of these compounds were also available in the Cambridge structural database in their unbound state. None of these structures resem-bled the bound conformations. Only a careful analysis of the structures of multiple com-pounds in their unbound conformation may shed light on their binding conformation.

5. Applications of Pharmacophore Modelling

Pharmacophore modelling is an excellent research tool in drug research. It is useful atthe very early stage when no lead has been generated, but we know a natural substrate as well in the later stage when a ligand-receptor complex structure has been solved.Being a qualitative approach, one does not need to have very good-quality binding data.As the organic synthetic approaches become more automated, pharmacophore modellingis likely to be used more extensively. We will discuss here the most common applica-tions of pharmacophore modelling.

5.1.

The most common and useful application of pharmacophore modelling is in databasemining. In recent years, a large number of very useful databases have been supplied bythe chemical software companies, such as ACD (Available Chemical Directory),MDDR (Molecular Drug Directory Report) and CMC (Comprehensive MedicinalChemistry) [37]. The development of the reliable methods of conversion of 2D to 3Dstructures enabled the creation of these and proprietary 3D databases. These 3D data-bases can not only be searched for traditional substructures, but also for a three-dimensional pharmacophoric geometry. One can do such searches under various conditions. When several lead compounds are available, one can try to develop auniquely defined 3D pharmacophore hypothesis using one of the approaches discussed above, and subsequently, use that information for searching novel leads. When only onelead structure is available, the possible pharmacophoric groups may be searched for afew of its low-energy conformations. When such a lead is unavailable, natural sub-strates may be used. However, a 3D database search may be complicated by the fact thatmost organic molecules are flexible. The choice of strategy used to cover the flexibilitymay change with the problem. For high-throughput screening, one may simply use a distance range matrix for this purpose. A comparison of the distance range matrix

Biophysical determination of the conformation of the free ligand

Pharmacophore model and database searching

266

Page 276: 3D QSAR in Drug Design. Vol2

Pharmacophore Modeling: Methods, Experimental Verification and Applications

(being a negative test, see section 3.2.2) will pick a certain number of compounds thatwill not be able to satisfy the pharmacophore geometry. But absorbing those false hitsmay not be difficult in a high-throughput screening.

More difficult screening may need more careful selection of the compounds. Onesuch approach was used in the Galaxy 3D database [38]. The 3D database here wascreated after a rapid conformational search capable of identifying a very low-energyconformation, if not the global minimum energy conformation. The advantages ofkeeping a very low-energy conformation are two-fold. If a rigid matching is done, thehit will be valid since the conformation used is an accessible conformation. When aflexible search is desired, the conformation can be computed directly from the distancegeometry embedding program by setting the required pharmacophoric distances alongwith the bond distance and bond angle distances. Comparison of the energy of the com-puted conformation (if available) with the energy of the conformation in the databasewill immediately prompt its energetic acceptance.

5.2. Pharmacophore model and 3D QSAR

Most 3D QSAR techniques [13,39–41] need to superimpose the 3D structures of theinhibitors as the first step. The pharmacophore modelling may be used as the basis forsuch superposition. In general, there are two types of 3D QSAR approaches. Thephysico-chemical property-based approaches like REMOTEDISC [13,40] where thelocal properties are clustered spatially from the superimposed 3D structures of theligands and the clustered properties are correlated with the biological activity. Suchmethods give a physical interpretation of the nature of the hypothetical binding pocket ofthe protein. The field-based approaches like COMFA [41] calculate the interaction of theligand molecules with atoms representative of the protein atoms at an arbitrarily definedset of grid points. The interaction energies are correlated with the binding affinity todevelop a comparable interpretation of the nature of the protein-binding pocket. Whatcomes out of these approaches depends on the conformation used, as well as the mode ofsuperimposition of the ligands. Although COMFA does not supply any definite algor-ithm for the superimposition of the ligands, it is definitely a good idea to use the pharma-cophore modelling to initiate the COMFA. Hopfinger’s molecular shape analysis [42]and APEX-3D [43] often used pharmacophore hypothesis to a 3D QSAR analysis.

5.3.

Pharmacophore models can be used for a de novo design of novel compounds. Thestructural moieties to connect the pharmacophore may be selected by using the attach-ment bond vectors of the pharmacophore and searching a 3D database of molecularstructures [44] or using some standard spacers (molecular fragments) from a library[45]. These approaches definitely help medicinal chemists to get ideas about novel structural class of compounds. However, since most often such compounds have to besynthesized, the utility of these programs will be less compared to database searchingprograms where one can get ‘ready-made’ compounds.

Pharmacophore model and de novo design

267

Page 277: 3D QSAR in Drug Design. Vol2

Arup K. Ghose and John J. Wendoloski

5.4.

The most useful combinatorial library design programs can analyze the reactants interms of its diversity. It may be debated whether the diversity will be in terms of keyphysico-chemical properties like hydrophobicity, polarizability, volume, etc. or in termsof the nature of the pharmacophoric groups or both. Diversification, in terms of thenature of the pharmacophoric groups alone, is also an excellent way to probe thebinding site [46]. In general, while designing a ‘universal library’, one should diversifythe nature of the possible pharmacophoric groups of the whole molecule. The focusedlibraries should be diversified for each and every combinatorial substituents.

Pharmacophore modelling and combinatorial library design

6. Critical Aspects and Comments

One should be critical while using a pharmacophore hypothesis, Constraining the con-formation often leads to a loss of binding affinity. although entropy gains arising fromthe reduced conformational distribution should facilitate binding. Problems in such mol-ecules may be due to: (i) constraining the conformation at an angle somewhat distantfrom the ideal angle; and (ii) bad interactions of the constraining structural moiety withthe enzyme binding site. It is not certain how much drop in activity of a moleculeshould be acceptable while developing or validating the model.

The idea of pharmacophoric modelling does not hold if the inhibitors show a considerable activity, even when one or more pharmacophoric groups do not reach thesame region of the active site. Forcing the molecules to attain a conformation whereequivalent groups occupy the same location in such a situation may give a distortedpharmacophoric model. However, success in the drug-design process is so rare that the researchers in this area are often eager to take these risks. The only suggestion to beoffered here may be to analyze the suggested (computed) conformation with theexisting knowledge of conformation of similar molecules.

The pharmacophore modelling is based on the idea that similar inhibitors bind in thesame way at the active site. The X-ray crystallographic data of most ligand-proteincomplexes usually confirm this hypothesis. However, there are several exceptions tothis basic idea [47–49]. Multiple-binding mode may often result in a binding pocketwhere non-directional forces, like van der Waals or hydrophobic, are dominant. Theapplication of pharmacophore modelling in such a system may be risky.

Staying close to a lead compound maximizes the chance of finding an active com-pound. Unless it is necessary to get a very different compound (maybe for patent-ability), one should try to make the smallest change from the lead compound whilesearching a database.

In general. pharmacophore modelling is more useful than QSAR approaches at theinitial phase of drug discovery. Also it is more often applied since i t does not need avery good-quality biological activity or binding data. Unfortunately, such a relaxedcondition may easily lead to a misuse of various methods in this area.

268

Page 278: 3D QSAR in Drug Design. Vol2

Pharmacophore Modeling: Methods, Experimental Verification and Applications

Acknowledgements

The authors want to thank Dr. Tim Hervey for a careful reading of the manuscript and for the various suggestions.

References

1.2.

Ehrlich, P., Über den jetzigen Stand der Chemotherapie, Chem. Ber., 42 (1909) 17.Marshall. G.R. and Naylor, C.B.. Use of molecular graphicsfor structural analysis of small molecules,In Hansch, C.. Sammes. P.G., Taylor. J.B. and Ramsden, C.A. (Eds.) Comprehensive medicinalchemistry: Vol. 4. Pergamon Press, Oxford. 1990, pp. 431–458.Marshall, G.R., Barry. C.D., Bosshard, H.E.. Dammkoehler. R.A. and Dunn, D.A., The conformationalparameter in drug design, Computer Assisted Drug Design. ACS Symp. Ser. 112. 1979, pp. 205–226.Gund, P., Pharmacophoric pattern searching and receptor mapping, Ann. Rep. Med. Chem., 14 (1979)299–308.Chose, A.K.. Logan, M.E., Treasurywala. A.M., Wang, H., Wahl, R.C., Tomezuk. B.E., Gowravaram,M.R., Jaeger. E.P. and Wendoloski, J.J., Determination of pharmacophoric geometry for collagenaseinhibitors using a novel computational method and its verification using molecular dynamics, NMR andX-ray crystallogrpahy, J. Am. Chcm. Soc., 17 (1995) 4671–82.Humblet, C. and Marshall. G.R. Pharmacophore identification and receptor mapping, Ann. Rep. Med.Chem.. 15 (1980) 267–276. Kclebe, G., structural alignment of molecules In Kubinyi, H. (Ed.) 3D QSAR in drug design: theory,methods and applications, ESCOM, Leiden, 1993. pp. 173–199.Clark. D.E., Willet, P. and Kenny. P.W.. Pharmacophoric pattern matching in file of three-dimensionalchemical structures: Use of smoothed bounded distances for incompletely specified query patterns, J. Mol. Graph., 9 (1991) 157–160.Johnson, W.H., Roberts. N.A. and Borkakoti. N.. Collagenase inhibitors: Their design and potentialtherapeutic use, J. Enzyme Inhib., 2 (1987) 1–22.Schwartz, M.A. and Van Wart. H.E., Synthetic inhibitors of bacterial and mammalian interstitial collagenase, Prog. Med. Chem. 29 (1992) 271–334.Matthews, B .W., Structural basis of the, action of thermolysin and related Zn peptidases, Acc. Chem.Res.. 21(1988) 333–340. Crippen, G.M., Quantitative structure–activity relationships by distance geometry: systematic analysisof dihydrofolate reducatase inhibitors J. Med. Chem. 23 (1980) 599–606. Ghose, A.K., Crippen, G.M.. Revankar, G.R.. McKernan. PA., Srnce, D.F. and Robins, R.K., Analysisof the in vitro antiviral activity of certain ribonucleosides against parainfluenza virus using a novelcomputer aided receptor modeling procedure, J. Med. Chem., 32 (1989) 746–756.Viswanadhan, V.N., Ghose. A.K.. Revankar. G.R. and Robins, R.K.. Atomic physicochemical para-meters of three-dimensional structure directed quantitative satructure–activity relationships: 4.Additional parameters for hydrophobic and dispersive interactions and their application for anautomated superposition of certain naturally occurring nucleoside antibiotics J, Chem, Inf. Comput.Sci., 29 (1989) 163–172. Martin, Y.C.. Burea, M.G., Danaher. EA., DeLazzer, J., Lico, 1. and Pavlik. P.A. A fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists,J. Comput.-Aided Mol. Design, 7 (1993) 83–102. Jones, C., Willet, P. and Glen. R.C., A genetic algorithm for flexible molecular overlay and pharma- cophore elucidation, J. Comput.-Aided Mol. Design, 9 (1995) 532–549.Crippen, G.M. and Havel, T.F.. Stable calculation of coordinates from distance information, ActaCrystalloger., Sect. A. 34 (1978) 282. Sheridan, R.P., Nilakantan, R., Dixon. J.S. and Venkataraghavan, R. The ensemble distance geometry:Application to the nicotinic pharmacophore, J. Med. Chem., 29 ( 1986) 899–906.

3.

4.

5.

6.

7.

8.

9.

10.

11,

12.

13.

14.

15.

16.

17.

18.

269

Page 279: 3D QSAR in Drug Design. Vol2

A rup K. Ghose and John J . Wendoloski

19.

20.

Crippen, G.M., Distance geometry approach to rationalizing binding data, J. Med. Chem., 22 (1979) 988-997.Chose, A.K. and Crippen, G.M., Distance geometry approach to modeling receptor sites, In Hansch, C.,Sammes, P.G., Taylor. J.B. and Ramsden, C.A. (Eds) Comprehensive medicinal chemistry, Vol. 4,Pergamon Press, 1990, pp. 715–734.Ghose, A.K. and Crippen, G.M., Geometrically feasible binding models of the flexible ligand moleculeat the receptor site, J . Comp. Chem., 6 (1985) 350-359.Motoc, I., Dammkoehler, R.A. and Marshall. G.R. Three-dimensional structure-activity relationshipsand biological receptor mapping, In Trinajstic, N. (Ed.) Mathematical and computational concepts in chemistry, Ellis Horwood Ltd., Chichester, 1986. pp. 222–251.Kuhl, F.S., Crippen, G.M. and Friesen, D.K., A combinatorial algorithm calculating ligand bindingJ. Comput. Chem., 5 (1984) 24–34.Tripos Associates. Inc., 1699 S. Hanley Road. St. Louis, MO 63144, U.S.A.Payne, A.W. and Glen. R.C., Molecular recognition using a binary genetic search algorithm, J.Mol.Graph., 11(1993) 74- 91.Bohacek, R.. Delombaert, S., McMartin, C., Priestle, J., and Grutter, M., 3-Dimensional models of ACE and NEP inhihitors and their use in the design of potent daul ACE/NEP inhibitors, J. Am. Chem. Soc.,118 (1996) 8231-8249.Dalpaiz, A., Bertolasi, V., Borea, P.A., Nacci, V., Fiorini, I.. Campiani, G., Mennini. T., Manzoni, C.,Novellino. E. and Greco, G., A concerted study using binding measurements, X-ray structural data andmolecular modeling on the stereochemical features responsible for the affinity of 6-arylpyrrolo [2.1-D] [1, 5] benzothiazepines toward mitochondrial benzodiazepine receptors, J. Med. Chem.,38 (I995) 4730–4738. Froimowitz, M., Patrick. K.S. and Cody, V., Conformational analysis of methylphenidate and its struc-tural relationship to other dopamine reuptake blockers such as CFT, Pharmaceutic. res., 12 (1995)1430-1434.Bandoli, G., Dolmella, X., Gatto, S. and Nicolmi, M., X-ray studies, empirical, semiempirical andstatistical calculations on a series of thyrotropin-releasing-harmone derivatives, J. Mol. Struct., 345

Morita, H., Yun, Y.S., Takeya, K., ltokawa, H. and Shiro, M., Conformational analysis of a cyclic haxa-peptide, segetalin-A from Vaccariasegetalis, Tetrahedron, 51 (1995) 5987–6002.Brandt, W., Drosihin, S., Haurand, M., Holzgrabe, U. and Nachtsheim, C., Search for the phar-macophore in k-agonistic diazabicyclo[3.3.1]nonan-9-one-1,5-diesters and arylacetamides, Arch. der Pharmazie, 329 (1996) 311–323.Hennig, P., Raimbaud, E.. Thurieau, C., Volland, J.P., Michel, A. and Fauchere, J.L., Solution confor-mation by NMR and molecular modeling of 3 sulfide-free somatostatin octapeptide analogs compared toangiopeptin, J. Coinput. -Aided Mol. Design, 10 (1996) 83–86.Boger, D.L. and Zhou, J.C., N-Desmethyl derivatives of Deoxybouvardin and RA-VII: synthesis andevaluation, J. Am, Chem. soc., 117 (1995) 7364–7378.Morita, H., Yun, Y.s., Takeya, K., Itokawa, H. and Shiro, M., Conformational analysis of a cyclichexapeptide, segetalin-A from Vaccariasegetalis, Tetrahedron, 51 (1995) 5987–6002.Sefler, A.M., He, J.X., Sawyer, T.K., Holub, K.E., Omecinsky, D.O., Reily, M.D., Thanball, V.,Akunne, H.C. and Cody. W .L., Design and structure-activity relationships of C-terminal cyclic neu-rotensin fragment analogs, J. Med. Chem., 38 (1995) 249–257.Nicklaus. M.C., Wang. S., Driscoll, J.S. and Milne, G.W., Conformational changes of small moleculesbinding to proteins, Bioorg. Med. Chem., 3 (1995) 411–428.MDL Information Systems Inc., 14600 Cataline Street, San Leandro, CA 94577, U.S.A.AM Technologies Inc., 14785 Omicron Dr., Texas Research Park, San Antonio, TX 78245, U.S.A.Ghose, A.K. and Crippen, G.M., A general distance geometry three-dimensional receptor model fordihydrofolate reductase inhibitors, J. Med. Chem., 27 (1984) 901–914.Ghose, A.K. and Crippen, G .M., Use of physicochemical parameters in distance geometry and relatedthree-dintensional quantitative structure-activity relationships: A demonstration using Escheria colidihydrofolate reductase inhibitors, J. Med. Chem., 28 (1985) 333–346.

2 1 .

22.

23.

24.25.

26.

27.

28.

29.

(1995) 213–225.30.

31.

32.

33.

34.

35.

36.

37.38.39.

40.

270

Page 280: 3D QSAR in Drug Design. Vol2

Pharmacophore Modeling: Methods, Experimental Verification and Applications

Cramer, R.D., Patterson, D.E. and Bunce, J.D., Comparative molecular field analysis (CoMFA): I. Effectof shape on binding of steroids to carrierproteins, J. Am. Chem. S oc. , 110 (1988) 5959-5967.Holzgrabe, U. and Hopfinger, A.J., Conformational-analysis, molecular shape comparison, and phar-macophore identification of different allosteric modulators of muscarinic receptors, J. Chem. Inf. Comp. Sci., 36 (1996) 1018-1024.Hariprasad, V. and Kulkarni, V.M., A proposed common spatial pharmacophore and the correspondingactive conformations of some peptide leukotriene receptor antagonists, J . Cornput.-Aided Mol. Design,

44. Barlett, P.A., Shea, J.T. Telfer, S.J. and Waterman, S. CAVEAT: S program tofacilitate the structure-derived design of biologically active molecules (Special publ. Molecular recognition in chemical andbiological problems), 78 (1989) 182-196.

45. Tschinke, V. and Cohen, N.C., The NEWLEAD program: A net method f o r the design of candidatestructures from pharmacophore hypothesis. J. Med. Chem., 36 (1993) 3863–3870.

46. Pickett, S.D., Mason, J.S. and Melay, I.M., Diversity profiling and design using 3D pharmacophores —pharmacophore-derived queries (pdq),J. Chem.Inf. Comput. Sci. 36 (1996) 1214–1223.

47. Mattos, C. and Ringe, D., Multiple Binding Modes, In H. Kubinyi (Ed.) 3D QSAR in drug design:Theory, methods and applications. ESCOM, Leiden, 1993, pp. 226-254.

48. Diana, G.D.,Jaeger, E.P., Peterson, M.L. and Treasurywala, A.M., The use of an algorithmic method ofsmall molecule superpositions in the design of antiviral agents, J. Cornput.-Aided Mol. Design, 7 (1993), 325-335.Martin, Y.C., Pharmacophore mapping, In Martin, Y.C. and Willet, P . (Eds) Design of bioactivemolecules using 3D structure information, ACS Washington D.C., 1997.

41.

42.

43.

10 (1996) 284-292.

49.

271

Page 281: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 282: 3D QSAR in Drug Design. Vol2

The Use of Self-organizing Neural Networks in Drug Design

Soheila Anzaliª, Johann Gasteigerb, Ulrike Holzgrabec, Jaroslaw Polanskid,Jens Sadowskie, Andreas Teckentrupband Markus Wagenerf

aE. Merck KGaA, Department of Medicinal Chemistry/Bio- and Chemoinformatics D-64293 Darmstrdt, Germany

bComputer-Chemie-Centrum, Institut für Organische Chemie, Universität Erlangen-Nürnberg,D- 91052 El-larigen, Germany

cPharmazeutisches Institut der Universität Bonn, D-53 115 Bonn, GermanydInstitute of Chemistry, University of Silesia, PL-40006 Kutowice, PolandeBASF AG, Drug Design, ZHV/W-A30, D-67056 Ludwigshafen, GermanyƒSmith Kline Beecham, UW2940, King of Prussia, PA 19406-0939, U.S.A.

1. Introduction

The development of a new drug is an extremely laborious and time-consuming process.Thus, quite early on, computer methods have been used to further an understanding ofthe interactions of a drug with its receptor. Molecular modelling and rational drug designhave become indispensable tools for the development of a new drug [1]. Recently, com-binatorial chemistry and high-throughput screening have been introduced in order tospeed up the drug development process. These methods produce massive amounts ofdata that have to be analyzed in an efficient manner in order to make best use of thesenovel methods. We will show here that self-organizing neural networks, such as the oneintroduced by Kohonen [2].can be used both in rational drug design and in combinator-ial chemistry.

The application of neural networks in chemistry has increased dramatically in recentyears [3–5] .In a Kohonen neural network (KNN), the artificial neurons self-organize inan unsupervised learning process and, thus, can be used to generate topological featuremaps. It will be shown here that this potential can be utilized to analyze the shape andsurface properties of those three-dimensional objects responsible for biological activity,molecules.

In these applications, there is a one-to-one mapping of a single molecule into a singleKohonen network. However, a Kohonen network can also be used for the analysis ofdatasets of molecules, where several molecules are simultaneously mapped into one Kohonen network. In order to make full use of the potential of self-organizing net-works, novel representations of molecular structures have been developed. Thesemethods can be put into a clear hierarchy, starting from molecular topology going allthe way to molecular surfaces. They do not only encode structural information, hut alsoinformation on the properties of atoms or of molecular surfaces.

2. Self-organizing Neural Networks

A Kohonen network can be used to study data of high-dimensional spaces by projectioninto a two-dimensional plane. The projection will be such that points that are adjacent in

H. Kubinyi et al. (eds. ). 3D QSAR in Drug Design, Volume 2. 273–299. © 1998 Kluwer Academic Publishers. Printed in Great Britian.

Page 283: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger. Ulrike Holzgrabe, et al.

Fig. 1. Architecture of a Kohonen neural network The input object X = (x1, x2, ..., xm) is mapped into an

n X n arrangement of neurons, j, each having a weight vector Wj = (wj1, Wj2, ..., wjm).

the high-dimensional space will also be adjacent in the Kohonen map: this explains the full name of the method. self-organizing topological feature maps.

Figure 1 shows the architecture of a Kohonen network: each column in this two- dimensional arrangement represents a neuron, each box in such a column represents a weight of a neuron [5]. Each neuron has as many (m) weights, wji, as there are inputdata, xi, for the object that is being mapped into the network.

An object, a sample, s, will be mapped into that neuron, sc, that has weights mostsimilar to the input data (Eq. 1 ):

(1)

The weights of this winning neuron, sc, will be adjusted such as to make them evenmore similar to the input data. In fact, the weights of each neuron will be adjusted but to a degree that decreases with increasing distance to the winning neuron. There are various ways for utilizing the two-dimensional maps obtained by a Kohonen network:

1.

2.

3.

Representation: the two-dimensional map can be taken as a representation, an encoding, of the higher-dimensional information. Similarity perception: objects that are mapped into the same or closely adjacent neurons can be considered as similar. Cluster analysis: points that form a group in such a map, clearly distinguished from other points, can be taken as a class or category of objects having certain features in common.

274

Page 284: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

3. Computational Details

The 3D structures of all molecules studied in the investigations reported here were gen-erated by the 3D structure generator CORINA [6–9] and used without further optimiza- tion. Partial atomic charges are calculated by the PEOE (Partial Equalization of OrbitalElectronegativity) method [10] and its extension to conjugated systems [11]. Residualelectronegativity values were also obtained by the above two procedures [ IO,11] byconsidering the charge dependence of electronegativities [12].The method for the cal-culation of atom polarizabilities has been published in reference [13]. These methodsare collected in the program package PETRA (Parameter Estimation for the Treatmentof Reactivity Applications) [14].

The molecular electrostatic potential was obtained in a classical manner by moving apoint charge on the molecular surface and calculating the potential according toCoulomb,s law from the partial atomic charges [15]. Any molecular surface can betaken into account; in most cases, we used the van der Waals surface.

The Kohonen neural networks were generated and analyzed by the Kohonen mapsimulator KMAP [16]. The study of the benzodiazepine and dopamine dataset was per-formed with an implementation of a Kohonen network on a massively parallel com- puter, MasPar [17,18]. In many studies reported here; the same dataset was used: 31steroids binding to the corticosteroid binding globulin (CBG) receptor [19,20]and forwhich affinity data were available in the literature [21-23]. This dataset was chosenbecause it had been studied with other methods [20,24,25]. Quite intentionally the samedataset of CBG steroids is used again and again. in order to render the various methodscomparable and show which features they emphasize.

4.

4.I .

4.1.1. Methods and resultsA Kohonen network can be used to map a molecular surface into a two-dimensionalplane. For this mapping of a molecular surface, points on this surface are chosen atrandom and their three Cartesian coordinates are taken as input into a KNN, with eachneuron having three weights [26,27]. As the molecular surface is without beginning andwithout end, is was decided also to choose for projection a two-dimensional plane without beginning and without end, h e surface of a torus. For visualization, the torus iscut along two perpendicular lines and the surface spread into a plane.

With a toroidal network, the maps can be shifted, mirrored and rotated against eachother to achieve a similar position of their patterns. The surface of a molecule and thesurface of a torus have a different topology and, therefore, this mapping process mustlead to topological distortions that result in empty neurons. This feature of the mappingprocess in a Kohonen network has been analyzed and explained in detail [28].

Once the network has been trained, the entire dataset is sent again through thenetwork and each neuron is colored with a property on the molecular surface that exists

One Molecule into One Network

Maps of molecular surfaces

275

Page 285: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger, UIrike Holzgrabe, et al.

at that point(s) that is (are) mapped into the neuron considered [27]. In this coloringprocess, any molecular surface property can be chosen such as molecular electrostatic potential, hydrogen-bonding potential, or just a color identifying the surfaces of differ-ent types of atoms.

In order to give an idea of the correspondence between a 3D space and its 2D map,we show in Fig. 2 (see p. 279) as an example the projected Kohonen maps of the vander Waals surface of corticosterone. The values of the electrostatic potential on themolecular surface (MEP) determine the colors of the map.

Corticosterone has two sites with a large negative value of the MEP, the carbonylgroup at positions 3 (4) and the side chain COCH2OH at position 17 (1) (Fig. 2).Consistent with this, the Kohonen map (Fig. 2) shows two spaces with a red-yellowcolor of these sites. The spatial distance of these groups is reflected by two differentshapes of the projection of the MEP into the Kohonen network. The third site with anegative value of the MEP stems from the hydroxyl group at position 11 (3). A spacewith a yellow color is reserved in this map for this group. Furthermore, the large posi- tive MEP area of corticosterone is below the D-ring and the side chain at position 17 (2).The projection of the MEP into the Kohonen map indicates the location of this space(violet color) close to the space of the negative MEP area of COCH2OH at position 17.

4.1.2.The comparison of Kohonen maps of molecular surface properties offers a technique forthe perception of similarities in ligands binding to the same receptor. Kohonen maps ofthe molecular electrostatic potential have been generated for four ligands that bind tothe muscarinic receptors and four ligands that bind to the nicotinic receptors and areshown in Fig. 3.

Visual inspection of these maps clearly shows characteristics that are common to thefour molecules binding to the muscarinic receptors and are not contained in the maps ofthe four ligands binding to the nicotinic receptors [20] .The nicotinic compounds, fortheir side, show common features different from those of the muscarinic ligands. Thus,inspection of these eight maps allowed a clear separation of molecules that either bindto the muscarinic or the nicotinic receptors.

4.1.3. Averaged mapsIn the previous example, a visual comparison of the Kohonen maps of the molecularelectrostatic potential was made, thus allowing one to differentiate ligands that bind totwo different types of receptors. The question is now whether such a comparison can beput onto a more objective basis. This will be explored with 31 steroids for which theCBG affinities are known [21–23]. The distribution of the compounds into high-, inter-mediate- and low-affinity classes are defined in reference [24].

For each of the 31 steroids, a Kohonen network was trained, using the three Cartesian coordinates of points on the molecular surface as input into the network. The values of theMEP determine the colors of the map. For a more objective analysis, the averaged mapsfor the sets of high-, medium- and low-active compounds were generated (Fig. 4). For thispurpose, each neuron n of the Kohonen maps of the single compounds was assigned a

Visualcomparison of Kohonen maps

276

Page 286: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

color index in a range of ten values representing the MEP of the potential the neuronobtained during the training process. Then the colors of the neurons in the averaged mapswere obtained by averaging the color indices of the neurons in the single maps.

The MEP pattern of the most polar area in the averaged map of the highly active com-pounds is the most pronounced one. In the three averaged maps, the pronunciation of thepolar spaces decreases according to decreasing activity of the compounds. Therefore, acomparison of the maps of steroids with the averaged map allows one to establishwhether a molecule belongs to the active or inactive CBG compounds. The averagedmap of the highly active compounds can be used to build a pharmacophore model.

4.1.4.The investigation of the previous section can be taken one step further. If, indeed, themaps of the molecular electrostatic potential allow one to distinguish high-active com-

molecules? In other words, we first train a, say, 20 X 20, Kohonen network with thethree Cartesian coordinates of points of a molecular surface. Then, the entire dataset issent again through the network and an extra layer of the network used for labeling thenetwork is colored with the electrostatic potential of the points that are mapped intoeach neuron. (This is the procedure as outlined in section 4.I.I.) The values of the MEP(or the color) of the 20 rows of this label layer, each consisting of 20 values, are thenconcatenated to give a 400-dimensional vector. This vector is a two-dimensional repre-sentation of a molecule as it has been obtained by projecting a molecular surface intotwo dimensions and is used to train a second Kohonen network of size 5 X 5 . The resultis shown in Fig. 5 .

It can be seen that the steroids quite nicely separate into groups of compounds ofhigh. medium and low activity. Only one compound of medium activity shows a colli-sion with highly active compounds by being mapped into the same neuron.

Maps as a two-dimensional representation of molecules

4.I.5. Bioisosteric designThe bioisostere database by lstvan Ujvary (BIOISOSTER version 1.3), a database ofanalog design, including 1515 bioisosteric groups was analyzed. The question was ifsome coherency between the physico-chemical properties and the bioisosteric effect canbe deduced by looking at the calculated Kohonen maps of these groups. Figure 6 showsan example of such a structure-pair in this database. The squares show those parts of thestructures which are defined as bioisosteric groups.

Several hundred pairs from this database were selected. The selection of these com-pounds was based on diverse structural fragment pairs as far as possible. From the se-lected database the bioisosteric groups are then cut out. The 3D structures of theselected fragments are calculated using the program CORINA [6-9]. The MEP werecalculated on the van der Waals surface. Then, the Kohonen maps were calculated foreach bioisosteric groups with a unique color plate. Figure 7 shows some examples ofthe calculated fragments. As shown here, the fragment pairs are structurally quite differ-ent, but their maps show a high similarity in the electrostatic potential patterns.

277

pounds from low-active compounds, why not take those maps as representations of

Page 287: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger, Ulrike Holzgrabe, et al.

This is an interesting result because it offers the possibility for selecting fragment pairs of this database, which can have a general validity, true bioisosteric groups.Therefore, we constructed a 3D database of several hundred fragments and functionalgroups including their corresponding Kohonen maps. The comparison of the electro-static potential patterns of the maps of such a library can be used to cluster bioisostericgroups and, consequently, improve the efficiency of the 3D design of bioactivemolecules.

4.2. Comparative maps

4.2.1. The template approachA Kohonen network stores the information on an object that is used for training. Thisfact inspired the use of a Kohonen network trained with the molecular data of a givenmolecule as a reference molecule, a template, for the comparison with other molecules [30,31]. The general idea of such a comparative mapping is shown in Fig. 8. TheCartesian coordinates of the points taken from the molecular surface of a butane mole-cule (a) are used to train a Kohonen network (c). In the example shown in Fig. 8,neurons of the map (d) are colored by giving the surface belonging to carbon atoms 1 to4 of butane, and the hydrogen atoms bonded to these carbon atoms different shades ofgray. The network (c) can now be used for the comparison of the surface of moleculesother than butane. In our example, it was used for the simulation of a map of thepropane molecule (b). Such a map (e) can be seen as a superposition of the comparedmolecule onto the template molecule, propane and butane in our case. A point from thecoinpared molecule will find a neuron in the template network having weights quite similar to coordinates of the point from the surface of the compared molecule (cf. Eq. 1). However, neurons corresponding to those parts of the surface of the templatemolecule that have no counterpart on the surface of the compared molecule will notbecome exited and, thus, stay empty. In the comparative map that is obtained byfiltering the compared molecule through the reference network of the template mole-cule, the empty neurons show up as white areas indicating where the surface of thereference molecule differs from the surface of the compared molecule.

Different settings of the parameters for training and testing of the Kohonen networkprogram allow one to emphasize certain aspects of the molecular surface to a different extent. In particular, the value for the threshold that determines whether the input dataof an object match with the weights of a neuron (cf. Eq. 1) can render the number ofnon-matching (empty) neurons and. thus. the white area in the comparative map largeror smaller. This is indicated in Fig. 8 where one setting (top map, Fig. 8e) somehowindicates the entire methyl group of butane to be lost in propane. whereas the secondsetting (bottom map, Fig. 8c) shows the major difference to reside in the three hydrogenatoms of the methyl group of butane being lost in propane.

The basis of the template approach is an analysis of the shapes of molecules and thequantification of a shape similarity or dissimilarity within a series of compounds using areference molecule. This is of particular merit for the comparison of a series of bio-logically active compounds. The larger the difference in shape between the reference

278

Page 288: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

Fig. 5. Mapping of a dataset of 31 steriods binding to the corticosteriod binding globulin (CBG) receptorinto a toroidal 5 X 5 Kohonen network. Each steriod is labeled by its activity range: H = high, M = mediumand L = low activity.

Table 1

Name of molecule No. of Name of molecule No. of

Number of empty neurons for the maps of CBG compounds (total number of neurons = 2500)

empty neurons empty neurons

Corticosterone 0 16α, 17-dihydoxyprogesterone 69Cortisol 15 19-nortestosterone 35711-deoxycortisol 52 Dihydrotestosterone 32417α-hydroxyprogesterone 61 2α -methy1-9α-fluorocortisol 282α -methylcortisol 6 4-androstenedione 350I I -deoxycorticosterone 50 Androsterone 417Cortisolacetat 13 Eticholanolone 710Prednisolone 140 Pregnenolone 108Progesterone 58 17α-hydroxypregnenolone 130Epicorticosterone 46 Estriol 63617α -methylprogesterone 113 Estrone 700

19-norprogesterone 152 Dehydroepiandrosterone 3784-pregnene-3,11,20-trioneTestosterone 296 5-andostenediol 332Aldosterone 225

Cortisone 75 Estradiol 644

281

79 Androstanediol 341

Page 289: 3D QSAR in Drug Design. Vol2

Fig

. 6.

Ast

ruct

ure-

pair

of t

he B

iois

oste

re D

atab

ase.

The

squ

ares

sho

w th

e bi

oiso

ster

ic g

roup

s.

282

Page 290: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

Fig. 7. Some examples of the bioisosteric groups and their calculated Kohonen maps.

4.2.3.The identification of the pharmacophore in an assembly of structurally diverse ligands is quite a difficult task, especially when the structure of the target molecule, the receptor, is not known. It will be demonstrated how Kohonen networks can be utilized for this

Backprojection of maps onto molecular shapes

283

Page 291: 3D QSAR in Drug Design. Vol2

Soheila Anzali Johann Gasteiger, Ulrike Holzgrabe, et al.

f) d) e) f) Fig. 8. The idea of comparative Kohonen mapping of two molecules A template butane molecule (a) trains (black arrow) a Kokonen network (c) allowing for 2D visualization of the surface of two methyl (CH3) andtwo methylene (CH2) groups (d). Difference settings of the parameters (two examples) during training result inonly slightly different template patterns (d). The same network (c), if used for processing the molecular datacoming from propane (b), gives a comparative pattern of the propane molecule (e). Different settings of the training parameters allow for showing such aspects of similarity that comply with the simple analys is comingfrom a chemist (f) — i.e. propane lacks the entire terminal methyl group of butane that is taken in the square within upper formulas; if, however, the carbon atom of this methyl group is superposed on one of hydrogen atoms of the propane molecule the difference will consist in three hydrogen atoms as indicated with squaresin the bottom formulas.

problem by elucidating the pharmacophore of allosteric modulators of muscarinic receptors, a group of drugs under development. which can enhance the activity of an antagonist in a very specific manner [32]. In a previous publication [33], it was shown how the pharmacophore can be mapped out with a small number of allosteric modulators.

284

Page 292: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

Here, we limit the discussion even further by only exploring alcuronium as the mostpotent compound, characterized by an almost rigid structure, and the flexible W84 as arepresentative of a wider range of hexamethonium and bispyridinium compounds [32–35].

Two different conformations of W84 were explored, the extended form 4e, obtainedfrom CORINA, and a slightly distorted sandwich form, 4d, found by molecular dynamic calculations and subsequent optimization by the semiempirical AM1 method. For alignment, the rigid alcuroniirm was chosen as a template. Both conformations of W84 were superimposed onto alcuronium first using the positively charged nitrogens, because they were assumed to be the most important feature for the first step of ligand receptor recognition. In addition, both aromatic rings were matched onto each other. The superposition of the extended, linear conformation of W84 (4e) and alcuronium (1)with the fix points of two positively charges nitrogens resulted in protruding phthalimido rings at both ends of alcuronium (Fig. (9a). The alignment of alcuronium, 1, and the dis-torted sandwich conformation of W84 (4d), revealed a much better fit (Fig. 9b).

In order to find out the similarities in the inolccular surfaces and, thus, properties of these molecules, the surfaces were sent into Kohonen neural networks and colored bylabeling the neurons according to the kind of atom the corresponding points belonged to(atomic surface assignment, ASA). The ASA maps showed that, firstly, the maps ofalcuronium and that of the distorted sandwich conforination of W84 are similar,whereas the one with the extended form of W84 is quite different.

For a more quantitative comparison of the 3D shape of the molecules, a template approach was made sending both conformations of W84 through the Kohonen network of alcuronium as reference compound. The maps obtained for the surface of the mole- cules show a rather large number of empty neurons reflecting that the shape of themolecules is fairly different from the shape of the large reference molecule alcuronium.

There is an even more illustrative method for showing the correspondence of two molecular surfaces. The map of the second molecule obtained by sending it through the Kohonen network of the first, the reference molecule, can be projected back onto the three-dimensional surface of the reference structure alcuronium. Fig. 10 shows the 3D models of the surface of alcuronium 1 with a backprojection of the Kohonen map of the extended conformation and that of the distorted sandwich-like conformation of W84, 4eand 4d, respectively. Those places that have empty neurons in the template maps are indicated by a black open mesh on the surface of the alcuronium.

This way of representation impressively exhibits the similarities between alcuronium and the distorted sandwich conformation of W84 (4d). The extended conformation of W84 tills only the center of the alcuronium surface. In contrast. the distorted sandwich- like conformation of W84 covers much more area of the surface of alcuronium. Moreover, the essential features, both aromatic skeletons and both positive charges, color the surface exactly at the same places for this sandwich conforination and alcuronium.

Taken together, the following conclusions can be drawn [30]: firstly, the pharma-cophore consists of two positively charged nitrogens in a distinct distance from each other and two heterocyclic, aromatic rings, both closely located in the hydrophobic central chain; secondly, the distorted saiidwich geometry appears to be the conformation W84 takes up upon binding to the allosteric binding site; and thirdly, electrostatic inter-

285

Page 293: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger, Ulrike Holzgrabe, et al.

Fig. 9. Superposition of the skeleton of the extended form of W84 (4e) and the distorted sandwich form of W84 (4d) onto alcuronium (1).

1 4e 4d

Fig. 10. Backprojection of the Kohonen maps shown in Fig. 28a–c onto the molecular surface of alcuronium 1.

286

Page 294: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

actions have been found to be primarily responsible for the molecular recognition between receptor and ligand.

4.2.4. Descriptors from comparative mapsThe more the surface of a template (reference) and a compared molecule differ fromeach other, the higher is the number of empty neurons. Therefore, this value can betaken as a measure describing the difference in the geometry of the molecules input. This approach can be taken one step further also to describe differences in molecular surface properties In principle, any molecular surface property can be taken but welimit the discussion to the molecular electrostatic potential (MEP). The entire spectrumof the electrostatic potential is divided into ranges (e.g. 10 ranges) each indicated by a specific color. Therefore, the map is coded by a matrix with elements that take discretevalues from 1 to 10, while 0 codes empty neurons. Clearly, also the real values of theelectrostatic potential could be used, but we wish to keep the discussion here simple.

Figure 11b shows an example of the comparative MEP maps of two similar mole-cules shown in Fig. 1la. Figure 11c compares the histogram of the occurrence of theneurons colored with the respective colours 0-10 . We can define descriptors for the dif-ferences in color profiles of the maps. By comparing the occurrence of neurons havingthe same color (range of MEP):

(2)

with ki = 1 for neurons colored with the respective color coded by i and ki = 0 for allother colors, while the matrix coding a feature map is of size n X n.

The difference between the map of the template and the compared molecule is given by:

(3)

with the index, T, denoting the template, and M the molecule being compared. We canalso include the range of the electrostatic potential coded by a certain color and obtain amodified Eq. 4:

(4)

where ci is a value (1-10) defining the range of the electrostatic potential coded by therespective color i and (ci)jk denotes the components of the matrix. of size n X n, codingthis respective color within the feature map.

These descriptors can be calculated for a single color or for a group of differentranges of colors — e.g. EP1–3 will give a sum of the EP1 + EP2 + EP3.

A related global EP descriptor was used by Barlow [36]. In contrast, the EP para-meter calculated for a narrow range of EP will bear only the information on the polarcharacter neglecting the shape of the molecules.

287

Si = (Ki )T – (Ki)M

Page 295: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger, Ulrike Holzgrabe, et al.

Fig. 11. comparative patterns (b) of two molecules (a) and a histogram comparing the frequency of theoccurrence of different colors within the maps (c).

4.3. Quantitative structure –activity studies

One of the basic approaches toward modelling SAR and QSAR relationships involvesthe comparison of a series of bioactive molecules. Table 2 gives an overview of a seriesof analogs analyzed in previous publications by quantitative structure–activity studies using descriptors developed above (section 4.2.4).

288

Page 296: 3D QSAR in Drug Design. Vol2

Tab

le 2

Stud

yC

ompo

unds

/act

ivity

Mod

el/s

tatis

tical

Des

crip

tor

Ref

eren

ce

The

SAR

and

QSA

Rm

odel

sob

tain

ed u

sing

the

tem

plat

eK

ohon

enm

aps o

f the

mol

ecul

arel

ectr

osta

ticpo

tent

ial

char

acte

riza

tion

1.St

eroi

ds/C

BG

Qua

litat

ive

NE

[30]

2.H

ista

min

ean

alog

s/H

2ac

tiviti

esQ

ualit

ativ

eE

Pa all

[36]

3.R

yano

dine

deri

vativ

es/b

indi

ngto

mem

bran

epr

otei

nsQ

ualit

ativ

eN

Eb

[37]

4.N

itroa

nilin

es/s

wee

tnes

sac

tivity

r=0.

99,n

=9

EP 7

–10

[31]

5.N

itro-

and

cyan

oani

lines

/sw

eetn

ess

activ

ityr=

0.96

,n=

18E

P 1–1

0[3

1]6.

Eth

ylca

rbox

ylat

es/T

aft’s

Es

cons

tant

r=

0.94

,n=

22N

E[3

1]

7.St

eroi

ds/C

BG

r=

0.89

, n=

31[3

8]–c

r=

0.92

,n=

31

8.St

eroi

ds/T

BG

r=

0.92

,n=

20d

[39]

9.A

ryls

ulfo

nyla

lkan

oic

acid

s/sw

eetn

ess

activ

ityr

=0.

94,n

=9e

[40]

–c –c

aT

heco

mpa

riso

nis

perf

orm

edby

Subt

ract

ion

ofth

eM

EP

mat

rice

s,a

num

ber

ofra

nges

ofE

Pis

notg

iven

.b

NE

isth

enu

mbe

rof

empt

yne

uron

s.e

The

com

pari

son

ispe

rfor

med

bya

sing

lein

star

neur

on.

dT

hein

duce

dfit

betw

een

the

TB

Gre

cept

oran

dm

olec

ules

stim

ulat

ing

are

sim

ulat

edus

ing

the

hype

rmol

ecul

ete

chni

que.

eA

met

hod

allo

ws

for

the

dete

rmin

atio

nof

activ

eph

arm

acop

hori

cco

nfor

mat

ion,

the

stat

istic

alch

arac

teri

zatio

nco

nsid

ers

this

conf

orm

atio

n,

289

Page 297: 3D QSAR in Drug Design. Vol2

For more details, the reader is referred to the original publications. The major con- clusion to be drawn is that the template approach provides a basis for the quantification of the shape and electrostatic effects of molecular surfaces, allowing the calculationof descriptors that are useful for the development of quantitative structure-activityrelationships.

5.

Previously, each Kohonen network was trained with information from only one mole-cule, in most cases, one molecular surface. The maps thus generated were replicationsof one molecular surface, the objects mapped into the individual neuron were pointsfrom the molecular surface. Now, work will be reported where entire datasets of mole-cules are sent into a single Kohonen network. Each object mapped into a neuron willconsist of an entire molecule. Various representations of molecules have been em-ployed, as detailed in the following sections.

5.1. Representation of molecules

The analysis of a dataset of objects by learning methods, be it by statistical or patternrecognition methods or neural networks, asks for the objects to be represented by thesame number of variables. If the objects are molecules, one has to come up with thesame number of descriptors irrespective of the size of the molecule and the number ofatoms in the molecule. In the following, the transformation of a molecular structure byautocorrelation is used to obtain a representation by a fixed number of variables. It willbe shown that autocorrelation allows molecules to be considered with different degreesof sophistication, starting with the constitution (the topology) of a molecule, through the3D structure all the way to representations of molecular surfaces. In addition, a varietyof physico-chemical properties of the atoms or of the molecular surfaces can be consid-ered. Such a hierarchy of representations is mainly dictated by the size of the datasets tobe studied: large datasets of molecules ask for rapid encoding schemes, while smallerones allow a more detailed consideration of molecules.

Several Molecules into One Network

5.2.

The idea of using autocorrelation for the transformation of the constitution of a moleculeinto a fixed length representation was introduced by Moreau and Broto [41]. A certainproperty, p

k, of an atom i is correlated with the same property of atom j and these pro-ducts are summed over all atom pairs having a certain topological distance d . This givesone element of a topological autocorrelation function A (pk, d) of this property pk:

Topological autocorrelation and the location of biologically active compounds

290

(5)

Page 298: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

The following properties were calculated by previously publ ished empirical methodsfor all atoms of a molecule: sigma charge, qσ [I0], total charge, qtot, sigma electro-negativity, χσ [12], pi-electronegativity, χπ [11], lone-pair electronegativity, χLP , andatom polarizability, α, [13]. In addition to those six electronic variables, the identityfunction — i.e. each atom being represented by the number 1 — was used in Eq. 5 tojust account for the connectivity of the atoms in the molecule.

The autocorrelation of these seven variables was calculated for seven topological dis-tances (number of intervening bonds) from two to eight. The basic assumption, thus,was that interactions of atoms beyond eight bonds can be neglected. With seven vari-ables and seven distances, an autocorrelation vector of dimension 49 was obtained foreach molecule, irrespective of its size or number of atoms. The hydrogen atoms werenot considered in the calculation of the autocorrelation vector.

In order to investigate the potential of topological autocorrelation functions for thedistinction of biological activity, a dataset of 112 dopamine agonists (DPA) and 60 ben-zodiazepine agonists (BDA) was studied [18]. A Kohonen network of size 10 X 7 wasused to project these 172 compounds from the 49 dimensional space spanned by theseautocorrelation vectors into two dimensions.

The two types of compounds, DPA, and BDA, were nearly completely separated inthe Kohonen map, underscoring the potential of this molecular representation to modelbiological activity. To put this capability to a more severe test, this dataset of 112 DPAand 60 BDA compounds was mixed with the entire catalog of a chemical supplier(Janssen Chimica catalog, version 1989)consisting of 8323 commercially available com-pounds comprising a wide range of structures from alkanes to triphenylmethane dyestuffs.

The map of Fig. 12 shows that both DPA and BDA occupy only Limited areas in theoverall map. Furthermore, the areas of DPA and BDA are quite well separated fromeach other, only one neuron with BDA intrudes into the domain of DPA and only twoneurons with conflicts, obtaining both DPA and BDA, occur. With the results obtainedhere, the search for new active compounds or new lead structures can be restricted to asmaller area of the entire chemical space. This opens the way for searching for com-pounds with a desired biological activity and for discovering new lead structures inlarge databases of compounds. Closer analysis of the mapping shows interestinginsights that are further discussed in the original publication [18].

5.3.

Ligands and proteins interact through molecular surfaces and, therefore, clearly, repre-sentations of molecular surfaces have to be sought in the endeavor to understand bio-logical activity. Again, we are under the restriction of having to represent molecularsurfaces of different size; and again, nutocomelation was employed to achieve this goal[20] . Firstly, a set of randomly distributed points on the molecular surface has to begenerated. Then, all distances between the surface points are calculated and sorted intopreset intervals

Autocorrelation of molecular surfaces properties

(6)

291

Page 299: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger, Ulrike Holzgrabe, et al.

Fig. 12. Kohonen map of 40 X 30 neurons obtained by training with I12 dopamine (DPA), 60 benzo-diazepine agonis sts (BDA) and 8323 commercially available compounds Only the types of compounds mappedinto the individual neurons are indicated Black identifies DPA, light gray BDA and dark gray the compoundsof unknown activity Empty neurons are .stored in white; the two neurons marked by a black frame indicateconflicts where both DPA and BDA are mapped into the same neuron.

where p ( i ) and p(j) are property values at points i and j , respectively; dij is the distanceand L is the total number of distances in the interval [d1, du]

represented by d. For a series of distance intervals with different upper and lowerbounds, d1 and du, a vector of autocorrelation coefficients is obtained. It is a condensedrepresentation of the distribution of property p on the molecular surface.

5.4.

The affinity of 31 steroids binding to the corticosteroid binding globulin (CBG) receptorwas modelled based on spatial autocorrelation coefficients of the molecular electrostaticpotential as descriptor [20] .A vector of twelve autocorrelation coefficients correspondingto twelve distance intervals of 1 Å width between 1 and 13 Å was determined for each steroid using Eq. 5. Then, this set of descriptors was investigated using two differentmethods: firstly, the 12-dimensional descriptor space was projected into a plane using aKohonen neural network in order to visualize the high-dimensional descriptor space.Then, these descriptors were used to quantitatively model CBG activity.

Modelling CBG affinity by a combination of two different neural networks

292

between the points i and j ;

Page 300: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger, Ulrike Holzgrabe, et al.

5.5. Modelling of chemical libraries

The methods introduced in previous sections have the advantage that they allow for arapid visualization of high-dimensional descriptor spaces. The importance of thisfeature has increased with the advent of the large compound collections that can be gen-erated by combinatorial chemistry and related techniques: small datasets comprisingtens or hundreds of compounds can be analyzed using almost any method withoutreaching the limits of currently available computer hardware, whereas special tech-niques are needed for the handling of datasets of hundreds of thousands of compounds.To demonstrate the merits of Kohonen networks and spatial autocorrelation descriptorsin handling large datasets, we analyzed three combinatorial libraries that togethercomprise more then 87 000 compounds [42].

Rebek et al. published the synthesis of two combinatorial libraries of semi-rigid com-pounds that were prepared by condensing a rigid central molecule functionalized byfour acid chloride groups with a set of 19 different L-amino acids [43]. This process issummarized in Fig. 14. In addition to the two published libraries we included a third,hypothetical library with adamantane as central molecule into our study.

5.5.1. Comparison of the xanthene, the cubane and the adamantane libraries A Kohonen network with 50 X 50 neurons was trained with the combined descriptors ofthe xanthene and the cubane libraries, each molecule represented by 12 autocorrelationvalues calculated from the electrostatic potential on the molecular surface. The resulting

central molecule that is mapped into them. All 2500 neurons of the map are occupied.

Fig. 14. Preparation of the xanthene and the cubane libraries.

294

map is shown in Fig. 15a. The neurons are colored according to the most frequent

Page 301: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

The compounds of the cubane library form a cluster in the center of the map that isseparated from the compounds of the xanthene library. The neural network can clearlyseparate the two libraries quite well — they both cover different parts of Kohonenmaps and, thus, it can be concluded that they are from different parts of the chemicalspace. Consequently, they are remarkably different and, thus, both worthwhile to beconsidered in a screening program.

In a second experiment, we trained the same network with the combined data set ofall three libraries. This resulted in the Kohonen map shown in Fig. 15b. Again, a dis-tinct cluster that is clearly separated from the xanthene derivatives can be seen in thecenter of the map. The cubane and adamantane derivatives, on the other hand, cannot bedistinguished by the neural network. They are tightly mixed in the central cluster, evenmore than can be concluded from Fig. 15b, as 92% of the cubane and adamantane com-pounds are mapped into common neurons. The cubane and adamantane libraries, thus,cover the same part of the chemical space — they are so similar to each other that con-sidering both of them in a screening program is both a waste of resources and time. Thexanthene library is evidently different from the other two libraries. Therefore, thexanthene and either one of the cubane or adamantane libraries should be used forscreening.

Fig. 15. Kohonen map of (a) the combined xanthene and cubane libraries and (b) the combined xanthene, cubane and adamantane libraries.

295

Page 302: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger, Ulrike Holzgrabe, et al.

Rebek et al. used their libraries to screen for novel trypsin inhibitors. Only the xanthenelibrary showed significant trypsin inhibition, so that they concentrated further efforts onthis library. In the next round of screening. they divided the xanthene library into sixsublibraries by using subsets of only 15 amino acids for the generation of the libraries.These subsets were generated by omitting three amino acids in turn from a set of 18amino acids. This process resulted in sublibraries of 25 425 coinpounds that were testedfor their trypsin inhibition. To study the diversity of the six sublibraries, we first traineda network with the complete xanthene dataset resulting in a map with all neurons occu-pied. In this map, we then sent the compounds of the different sublibraries, obtaining al-together six different maps, one each for each sublibrary (Fig. 16).

The six maps show remarkable differences: some of them are nearly completely filled.some of them exhibit large white areas representing neurons that no compound was

the chemical space of the original xanthene library. The omission of the basic or acidicamino acids, for example, has led to a decreased diversity as shown by the large numberof empty neurons. On the other hand, the omission of the larger alkyl amino acids or the-OH and -S- substituted amino acids from the xanthene library does not lead to a remark-able decrease in diversity as there are only small white areas in the corresponding maps.

Fig. 16. Kohonen maps of a network trained with the entire xanthene library. Neurons occupied by thedifferent sublibraries are shown in black, unoccupied neurons in white.

296

5.5.2. Deconvolution of xanthene sublibraries

mapped into. The larger these white areas are , the less the corresponding sublibrary covers

Page 303: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

6. Conclusion and outlook

The human brain generates maps of the environment from sensory information, Thiscapability of the human brain is modelled by self-organizing neural networks such as theone developed by Kohonen. Kohonen networks can be used for the mapping of mole-cular surface properties. It is shown that maps of the molecular electrostatic potentialprovide valuable information for understanding biological activity and searching fornew lead structures. Kohonen networks can also be used for the mapping of datasets ofmolecules. Autocorrelation vectors derived from the topology of molecules or frommolecular surface properties provide an encoding of molecular structures that can beused as input to Kohonen networks and, thus, allow a clustering of molecules thatreflects biological activity. Such a mapping can be used for the assessment of thesimilarity and diversity of chemical libraries.

The algorithms presented here, both those for the calculation of physico-chemicaleffects such as the molecular electrostatic potential and that for the Kohonen network,work quite rapidly. In addition, by their very nature, neural networks are of a parallelmanner allowing their implementation on parallel machines. This all taken togethermakes it possible to study large molecules and very large datasets.

References

1.

2.

Kubinyi, H. (Ed.), 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden.1993.Kohonen. T., Self-organized formation of topologically correct feature maps, Biol, Cybern., 43 (1982)59–69; Self-Organization and Associative Memory, 3rd Ed., Springer. Berlin. 1989; Proc. IEEE,78 (1990) 1460–1480: Self-Organizing Maps, Eds. Huang, T.S., Kohonen. T. and Schröder, M.R., Springer, Berlin, 1995.Zupan, J. and Gasteiger, J.. Neural networks: a new method for solving chemical problems or just a

Gasteiger, J. and Zupan, J.. Neural networks in chemistry, Angew. Chem. Int. Ed. Engl., 32 (1993) 502–527; Angew. Chem., 105 (1993) 510–536.Zupan, J. and Gasteiger, J., Neural Networks for Chemists: An Introduction, VCH Publishers.Weinheim, 1993.Sodowski, J.. Rudolph. C. and Gasteiger, J., The generation of 3D-models of host-guest complexes. Anal. Chim. Acta, 265 (1992) 233–241.Gasteiger, J., Rudolph, C. and Sadowski. J., Automatic generation of 3D-atomic coordinates for organic molecules, Tetrahedron Comput. Method.. 3 (1990) 537–547.Sadowski, J. and Gasteiger, J., From atoms and bonds to three-dimensional atomic coordinates:Automatic model builders, Chem. Reviews. 93 (1993) 2567–2581.Sadowski, J., Gasteiger, J. and Klebe, G., Comparison of automatic three-dimensional model buildersusing 639 X-ray structures, J. Chem. Inf. Comput. Sci., 34 (1994) 1000–1008. Gasteiger, J. and Marsili, M., Iterative partial equalization of orbital electronegativity — a rapid accessto atomic charges, Tetrahedron, 36 (1980) 3219–3228.Gasteiger, J. and Saller, H., Calculation of the charge distribution in conjugated systems by aquantification of the resonance concept, Angrew, Chem. Int. Ed. Engl., 24 (1985) 687–680: Angew. Chem., 97 (1985( 699–701.Hutchings, M.G. and Gasteiger, J., Residual Electronegativity — an empirical quantification of polarinfluences and its application to the proton affinity of amines, Tetrahedron Lett.. 24 (1983) 2541–2544.

3.

4.

5 .

6.

7.

8.

9.

passing phase?, Anal. Chim. Acta. 248 (1991) 130.

10.

11.

12.

297

Page 304: 3D QSAR in Drug Design. Vol2

Soheila Anzali, Johann Gasteiger, Ulrike Holzgrabe, et al.

13.

14.15.16.17.18.

Gasteiger, J. and Hutchings, M.G., Quantification of effective polarizability: Applications to studies of

PETRA program package. Gasteiger, J., Marsili, M., Saller, H., Hutchings, M.G. and Fröhlieh, 4.,1995.SURFACE program, Version 1.0, Sadowski, J. and Gasteiger, J., 1994. KMAP, Version 2.1. Li, X., Wagener. M. and Gasteiger, J., 1996.Bauknecht, H. and Zell, A., Universität Stuttgart, 1995.Bauknecht, H., Zell, A., Bayer, H . Levi, P., Wagener. M., Sadowski, J. and Gasteiger, J., Locating bio-logically active compounds in medium-sized heterogeneous datasets by topologic al autocorrelation vectors: Dopamine and benzodiazepine agonists, J. Chem. Inf. Comput. Sci., 36 (1996) 1205–1213.Cramer, R.D., III, Patterson. D.E. and Bunce, J.D., Comparative molecuar field analysis (CoMFA): I. Effect of shape on binding of steriods to carrier proteins, J. and Chem. Soc., I 10 (1988) 5959-5967 Wagener, M., Sadowski. J. and Gasteiger, J ., Autocorrelation of molecular surface properties for model-ing corticosteriod binding globulin and cytosolic ah receptor ac tivity by neural networks, J. Am. Chem. Soc., 117 (1995) 7769–7775. Dunn, J.F., Nisula, B.C. and Rodbard, D., Transport of steroid hormones: Binding of 21 endogenous steroids to both testosterone-binding globulin and corticosteriod-binding globulin in human plasma. J. Clin. Endocrinol. Metab., 53 (1981) 58–68.Mickelson, K.E., Forsthoefel, J. and Westphal, U., Steriod protein interactions, human corticosteriodbinding gIobulin: Some physicochemical properties and binding specificity, Biochemistry, 20 (1981) 6211–6218.Westphal, U. (Ed.). Steroid–Protein Interaction II, Springer, Berlin. Germany, 1986.Good. A.C., So. S.S. and Richards, W.G., Structure-activity relationships from molecular similarity matrices, J. Med Chem., 36 (1993) 433-438. Jain, A.N., Koile, K. and Chapman, D., Compass: Predicting biological activities from molecular surface properties: Performance comparison on a steriod benchmark, J. Med. Chem., 37 (1994)2315–2327.Gasteiger, J., Li, X., Rudolph. C., J. Sadowski. J. and Zupan, J., Representation of molecular electro-static potentials by topological feature maps, J. Am. Chem. Soc., 1 16 (1994) 4608-4620.Gasteiger, J., Li, X. and Uschold, A., The beuaty of molecular surfaces as revealed by self-organizingneural networks, J. Mol. Graphics, 12 (1994) 90-97. Li, X., Gasteiger, J. and Zupan, J., On the topology distortion in self-organizing feature maps, Biol.Cybern., 70 (1993) 189-198.Gasteiger, J. and Li, X., Mapping the electrostatic potential of muscarinic and nicotinic agonists withartificial neural networks, Angew. Chem. Int. Ed. Engl., 33 (1994) 643–646: Angew. Chem., 106 (1994)671-674.Anzali, S., Barnickel. G., Krug, M., Sadowski, J.. Wagenrr, M., Gasteiger, J. and Polanski, J., The com- prison of geometric and electronic properties of molecular surfaces by neural networks: Application tothe analysis of corticosteriod binding globulin activity of steroids, J. Comput.-Aided Mol. Design, 10(1996)521–534.Polanski, J., Gasteiger, J.. Wagener. M. and Sadowski, J.. The comparison of molecular surfaces by neural networks and its application to quantitative structure activity studies, Quant. Struct. Act. Relat.(in print).Holzgrabe, U. and Mohr, K., Allosteric modulators of lagand binding to muscarinic acetylcholine receptors, Drug Discovery (in print).Gasteiger, J., Holzgrabe, U., Kostenis, E.. Mohr, K., Sürig, U. and Wagener, M., Variation of the oximefunction of bispyridinium-type allosteric modulators of M2-cholinoceptors, Pharmazie, 50 (1995) 99–105.Bejeuhr, G., Holzgrabe, U., Mohr, K., Sürig. U. and von Petersenn, A., Molecular modelling and syn-thesis of potent stabilizers of antagonist binding to M2-cholinoceptors, Pharm. Pharmacol. Lett., 2

35. Holzrabe, U., Wagener, M. and Gasteiger, J., Comparison of structurally different allosteric modula-tors of muscarinic receptors by self-organizing neural networks, J. Mol. Graph. 14 (1996) 185–195.

x-ray photoelectron spectroscopy and alkylamine protonation, J. Chem. Soc. Perkin, 2 (1084) 559–564.

19.

20.

21.

22.

23.24.

25.

26.

27.

28.

29.

30.

31.

32.

33.

34.

(1992) 100–103.

36. Barlow, T.W.J., Self-organizing maps and molecular similarity, J. Mol. Graph., 13 (1995) 24–27.

298

Page 305: 3D QSAR in Drug Design. Vol2

The Use of Self-Organizing Neural Networks in Drug Design

Anzali, S.,Barnickel, G., Krug, M., Sadowski, J., Wagener, M. and Gasteiger, J., Evaluation of mole-cular surface properties using a Kohonen neural network: Studies on structure-activity relationships, InDevillers, J. (Ed.) Neural networks in QSAR and drug design, Academic Press, London, 1996,

Polanski, J., Neural nets for the simulation of the molecular recognition of MS-WINDOWS environment, J. Chem. Inf. Comput. Sci., 36 (1996) 694-705.Polanski, J., The receptor-like neural network for modelling corticosteroid and testosterone binding globulins, J. Chem. Inf. Comput. Sci., 37 (1997) 553–561. Polanski, J., Ratajczak, A., Gasteiger, J., Galdecki, Z. and Galdecka, E., Molecular modelling and X-rayanalysis for a structure-taste study of a-Arylsulfonylalkonoic acids, J. Mol. Struct.Moreau, G. and Broto, P., Autocorrelation of molecular structures: Application to SAR studies, Nouv.J. Chim., 4 (1980) 757-764.Sadowski: J., Wagener, M. and Gasteiger, J . Assessing similarity and diversity of combinatorial librariesby spatial autocorrelation functions and neural networks, Angew. Chem. Int. Ed. Engl., 34 (1995)2674-2677; Angew. Chem., 107 (1995) 2892-2895.Carell, T., Wintner, E.A., Bashir-Hashemi, A. and Rebek, J., Jr., A Novel Procedure for synthesis oflibraries containing small organic molecules, Angew. Chem. Int. Ed. Engl., 33 (1994) 2059-2061;Angew. Chem., 106 (1994) 2159-2162; Carell, T., Wintner, E.A., Bashir-Hashemi, A. and Rebek, J., Jr.,A solution-phase screening procedure for the isolation of active compounds from a library of molecules,Angew. Chem. Int. Ed. Engl., 33 (1994) 2061-2064;Angew. Chem., 106 (1994) 2162-2165.

37.

pp. 209-222.38.

39.

40.

41.

42.

43.

299

Page 306: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 307: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment ofMolecular Electrostatic Potentials

David A. Thorner, David J. Wild, Peter Willett* and P. Matthew Wright Krebs Institute for Biomolecular Research and Department of Information Studies, University of

Shefield, Shefield S10 2 TN, U.K.

1. Introduction

Database searching plays an increasingly important role in drug-discovery programs

database that contain a user-defined query substructure, have been available in chemicalinformation systems for many years [2]. The last few years have seen the introductionof complementary facilities for similarity searching [3], This involves matching a targetmolecule of interest, such as a weak lead from a high-throughput screening program,against all of the molecules in a database to find the nearest neighbors — i.e. thosemolecules that are most similar to the target using some quantitative measure of inter-molecular similarity.

Early database searching systems were designed for the storage and retrieval of two-dimensional (2D) chemical structures but the development of structure generation pro-grams [4] has focused interest on techniques for the processing of three-dimensional(3D) structural information [5], and there have already been several reports of systemsfor 3D similarity searching that are sufficiently fast in operation to allow them to beused with databases of non-trivial size [6–12]. However, few of these approaches takeexplicit account of the electrostatic, steric and hydrophobic fields that form the basis ofmodern approaches to 3D QSAR (as illustrated by the many other papers in thisvolume), and an ongoing project at the University of Sheffield is hence developingmethods for field-based similarity searching. Like many previous workers [13-21], ourexperiments have focused on the Molecular Electrostatic Potential (MEP), but the tech-niques that we have developed are applicable, in principle at least, to any field-likeattribute that can be represented by real values in a 3D grid surrounding a molecule.

The electrostatic potential, Pr , at a point r for a molecule of n atoms is calculatedfrom the point charges qi on each atom i in the molecule, so that

where Ri denotes the position of the i-th atom. A molecule is positioned at the center ofa 3D grid and the potential is calculated at each point in the grid. The similarity betweena pair of molecules is estimated by aligning them such that similar features are super-imposed, taking the product of the two molecular potentials at each point and then

* To whom correspondence should be addressed.

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design, Volume 2. 301-320.© 1998 Kluwer Academic Publishers. Printed in Great Britain.

[ 1]. Facilities for substructure searching , the identification of all of those molecules in a

Page 308: 3D QSAR in Drug Design. Vol2

David A. Thorner, David J. Wild, Peter Willett and P. Matthew Wright

summing over the entire grid, with a suitable normalizing factor being used to bring thesimilarities into the range –1.0 to +1 .0. This numerical approach necessarily involvesthe matching of very large numbers of grid points and is, thus, extremely demanding ofcomputational resources; however, Good et al. [16] have reported an alternative ap-proach in which the potential distribution is approximated by a series of Gaussian func-tions that can be processed analytically, with a substantial increase in the speed of thesimilarity calculation and with only a minimal effect on its accuracy. This elegant idearemoves one of the main limitations of field-based approaches to 3D similarity search-ing, but still requires the alignment of the two molecules that are being compared priorto the calculation of the similarity, which is why field-based similarity methods have,thus far, only been applied in the context of small datasets.

The principal objective of the work reported here has been to develop methods for thegeneration of alignments that are sufficiently rapid in execution to permit the use ofMEP-based similarity measures for database searching. This chapter presents the twomethods we have developed for this purpose: one method is based on the application ofa maximal common subgraph (MCS) isomorphism algorithm to a structure representa-tion that we shall refer to subsequently as a field-graph; and the other method is basedon a genetic algorithm, hereafter a GA. Full details of the work reported here arepresented by Thorner et al. and by Wild and Willett [22-24].

2. Calculation of Similarities Using Field-Graphs

2.1. Generation of field-graphs

Chemical database systems use methods of representation and search that are based ongraph theory. The methods were first developed for 2D substructure searching, wherethe atoms and bonds of a chemical compound are denoted by the nodes and edges of alabelled graph and where searching is effected by the application of a subgraph iso-

for 3D pharmacophore searching, where the nodes of a graph again denote the atoms ofa chemical compound but where the edges denote the interatomic distances (or distanceranges) in a rigid (or flexible) 3D molecule [25]. Graph-based similarity searchingmethods have also been described, in which an MCS algorithm is used to find mole-cules that are similar to a target molecule [3,26,27].

It is simple to develop analogous graph methods for the processing of grid-based rep-resentations of molecular fields, since a grid can be represented by a graph in whichthe nodes correspond to each of the grid points and in which the edges correspond to theinter-point distances. The overlap between two sets of molecular fields, and hencethe similarity of the two corresponding molecules, will then be given by the overlapbetween the two graphs, this being estimated most obviously by their MCS. Un-fortunately, existing MCS algorithms are capable of handling graphs containing only afew tens of nodes at most, whereas even a coarse molecular grid will involve many hun-dreds or thousands of grid-points, and hence graph nodes. Accordingly, the use of anMCS algorithm for the generation of molecular alignments requires a substantial

302

morphism algorithm to such graph representations [2]. Analogous techniques are used

Page 309: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials

reduction in the sizes of the grid-based graphs that are to be matched, while ensuringthat the resulting field-graphs still encompass the main features of the underlying fields,so that the alignments are chemically (and hence biologically) meaningful. Once theMCS procedure has been used to generate an alignment, the similarity between thealigned molecules can be calculated using the fast Gaussian similarity procedure de-scribed previously. The similarity measure used is the so-called Carbo index [28],which is actually a form of the long-established cosine coefficient [3] and which isdefined to be

where PA and PB are the properties (such as the MEPs) of the two molecules that arebeing compared.

A field-graph is generated from a set of grid points by identifying a subset of themthat have potential values meeting some criterion (to be discussed below) and thengrouping points that meet the chosen criterion and that are close to each other (in somesense). There are many ways in which a field-graph can be generated from a set of po-tential values, depending on the criterion that is used and the definition of ‘close’ that isused to determine whether two grid-points are to be considered as belonging to the samegroup. There are no obvious guidelines, a priori, as to how field-graphs should becreated, and we have accordingly evaluated a range of different procedures for creatingnodes. In fact, we have simplified the problem by considering only a single measure ofcloseness, which is that two grid-points, P and Q, are considered as being containedwithin the same graph node if P and Q are vertically, horizontally or diagonally ad-jacent to each other. This may be illustrated by considering the set of grid-points inFig. 1 (which is in just two dimensions for simplicity). If the criterion was that a grid-point value was to be at least 10 kcal/mol, then the four boxed elements would beselected as forming a node in the field-graph.

The basic approach involves thresholding the grid-point values using a user-definedthreshold potential: the application of this threshold identifies all of the positive (nega-tive) grid-points with values greater (less) than or equal to the threshold value. The

Fig. 1. Application of a threshold of +10 to a set of grid-point values to generate a field-graph. In this simple, planar grid, the four values surrounded by bold lines are taken as forming a single field-graph node since they all meet the threshold criterion and each is adjacent to at least one other grid-point that meets the threshold criterion. The resulting four-point node is shown shaded.

303

Page 310: 3D QSAR in Drug Design. Vol2

David A. Thorner, David J. Wild. Peter Willett and P. Matthew Wright

selected subset of the original points is then considered for inclusion in one of the nodesin the field-graph, using the adjacency-based grouping criterion described above.Thorner et al. describe several different methods for the selection of the most appro-priate threshold potential: the most cost-effective of these methods was found to be onethat seeks to maximize the number or nodes i n the field-graph that contain at least someminimal number of grid-points [24].

Each node in a field-graph has an associated label defining the threshold level (either

of grid-points that it represents. The location of a node is defined by the geometric co-ordinates of the center of the corresponding cluster of points, so that the distance between two nodes (which represents the graph-edge that connects those two nodes) isthe distance between the two centers. Thus. a field-graph containing N nodes (i.e. onethat contains N clusters of grid-points after the application of the procedures describedabove) will contain N(N – 1)/2 distinct distances (since the inter-node distance matrix iscompletely symmetric), and a database of 3D structures can be represented for searchby the corresponding set of field-graphs.

Pairs of these graphs, one representing a target structure and one representing a data-base structure. are matched using a n MCS procedure based on the Bron-Kerboschclique-detection algorithm [29,30]. Two nodes are regarded as being equivalent if theirlabels have the same potential sign (either positive or negative); it is possible to adopt amore rigorous matching criterion that takes account of the magnitude of the local poten-tial, as well as its sign or of the size of the node (i.e. the number of constituent grid-points). or of some combination of the two. but all our experiments have employed justthe simple. sign-based matching criterion. The Bron-Kerbosch algorithm identifies theMCS based on such node-to-node matches, and the constituent pairs of matching nodesare input to a least-squares fitting routine. This routine aligns the database structure withthe target structure so as to minimize the squared distances between the centroids of thefield-graph nodes from the database structure with the centroids or the field-graph nodesfrom the target structure to which they have been mapped. The resulting alignment isthen used for the final grid-based similarity calculation.

The Bron-Kerbosch algorithm is designed. and normally used, to identify the largestsubgraph common to a pair of graphs. In the present context, however, we have used it to generate all of the subgraphs common to a pair of field-graphs, and not just thelargest such subgraph(s). This has been done to increase the number of possible align-ments that are considered in the precise similarity calculation, and thus to ensure thatthe minimum possible number or close neighbors to the target structure are overlookedbecause of the approximations involved in the generation of the field-graphs. The simi-larity between an individual database structure and the target structure is taken to be thelargest of the calculated similarities over all of the possible alignments resulting fromthe first-level, clique-detection search.

2.2 An operational implementation

The field-graph generation procedure described above was applied to the structures inthe Zeneca Agrochemicals corporate database. The 3D structures of these molecules

304

positive or negative) at which it was formed and the size of the node — i.e. the number

Page 311: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials

were generated using CONCORD [31] and then the atomic point charges calculatedusing the MNDO routines in MOPAC Version 5.00 [32]. A grid was constructed aroundeach molecule, extending for 5.0 A beyond the maximum and minimum extents of themolecule in each plane, as determined by the centers of the atoms on the molecularsurface. The grid had a user-defined step-size (0.5 Å i n all of the experiments reportedhere), with grid-points being ignored if they fell within the van der Waals radius of anyof the atoms i n the molecule. The grids were then converted to field-graphs: the meannumber of nodes in the field-graphs for the 173 I97 molecules in the database was 7.13(standard deviation of 3.22), with the single largest graph containing 41 nodes. The gen-eration of the field-graphs took about 20 CPU days on an R4000 Silicon Graphics workstation (excluding the very extensive computation associated with the calculation of the atomic partial charges).

Preliminary testing showed that the search times for a scan of the entire file werelikely to be excessive unless a fair amount of program optimization was to be carriedout.

The relative computational requirements of the MCS-detection and Gaussian-calculation stages for the matching of the target structure with a database structuredepend upon the number of possible alignments resulting from the application of theMCS algorithm. Specifically, if only a few possible alignments are identified. then theMCS stage takes more time, while the Gaussian calculation takes more time if there aremany alignments (since each one of them needs the similarity calculation to be carried out). Most of the time requirement for the Gaussian calculation is occasioned by theneed to calculate square roots and exponentials, and the routine was hence modi fed tominimize the number of square roots and to calculate most of the exponentials by meansof a precalculated look-up table. This optimization was also usce in all of the GAexperiments described later in this chapter.

The time requirement for the MCS algorithm rises rapidly with an increase in thenumber of nodes in the target structure, given a fixed set o f database structures, and thisrequirement will hence be minimized if the target structure contains just a few nodes. Afield-graph with only a few nodes will also generate only a small number of alignments,and hence reduce the time requirement of the final similarity calculations. A series ofsearches was hence carried out in which a threshold was applied to the sizes of thenodes in the field-graph representing the target structure, and a cutoff applied, so thatonly those nodes containing more than n grid-points, where n is defined by the user,were considered in the matching of a target structure with a database structure. This pro-cedure will certainly increase the speed of searching, but may also mean that moleculesthat are, in fact, strongly similar to the target structure i n the final similarity calculationwill be overlooked since the appropriate alignments are not forthcoming from the MCSstage of the search. Following a detailed series of experiments [24], the final imple-mentation was such that the alignment stage considered only those target field-graphnodes for which n ≥ 3. The chosen value for n could, of course, be changed by a user if this was felt to be desirable in the context of a particular search.

With the incorporation of these and other modifications, a search of a typical targetstructure against the entire database can be accomplished in about 16 hours of elapsedtime on an R4000 Silicon Graphics workstation (i.e. in a single overnight run), although

305

Page 312: 3D QSAR in Drug Design. Vol2

David A. Thorner, David .I. Wild Peter Willett and P. Matthew Wright

large target structures can mean that a search will take a day or even more. It should benoted that some of the molecules in the database can lead to very large numbers ofalignments, each of which must then be checked in the subsequent Gaussian cal-culation: searches with several target structures demonstated that the search time couldbe reduced by approx. 40% simply by ignoring the top-ranked 5% of the database whenthe structures were ranked in decreasing order of the number of alignments that neededto be processed.

Searches with the system at Zeneca Agrochemicals demonstrate clearly that i t oftenleads to the retrieval of structures with a high degree of novelty that would not beretrieved by conventional similarity searching techniques that are based on patterns ofatoms (in either 2D or 3D) [3]. The system hence provides an effective way of suggest-ing novel bioisosteres for known active compounds. That said, the focus on just a singletype of field means that while the top-ranked structures generally provide a reasonablelevel of MEP similarity, many of them are of little interest since their steric and/orlipophilic characteristics render them inappropriate for the biological system underinvestigation. The principal value of the system is hence as an ideas-generator that cansuggest previously unexplored chemical classes to the chemist requesting the search.albeit at the cost of (in some cases) a low level of search precision.

There are two other obvious limitations to the current system. The first is that thegraph-creation routines have an inherent failure rate of around 6% since the thresholdcriteria that are used to create a field-graph can result in the identification of less nodesthan are necessary for the generation of a unique alignment [24]. Secondly, inspectionshows that suboptimal alignments are generated in some cases, with the result that mol-ecules that are similar to the target structure can be ranked less highly than they shouldbe. This is almost inevitable, given the simplicity of the representation that is used.

3.

3.1. The algorithm

Genetic algorithms are computational problem-solving methods that mimic some of theprincipal characteristics of biological evolution and genetic reproduction [33,34]. A GAcreates a randomly chosen set, known as a population, of individuals, each of whichcontains a representation of a possible problem solution. This solution is encoded in a linear string. called a chromosome. The effectiveness of the solution encoded by each of

nipulates the chromosomes so as to maximize the value of the fitness function. This itdoes by the creation of subsequent populations that include features from the fitterstrings in the previous population, in an iterative procedure that can be thought of as analgorithmic representation of biological reproduction. Parents are selected from the popu-lation, and information is taken from their chromosomes to produce one or more childindividuals that are inserted into the population. Chromosomes are manipulated by mu-tation (where the chromosomal material may be altered slightly in a random fashion)and crossover (where new child chromosomes are created by taking some chromosomal

Calculation of Similarities Using a Genetic Algorithm

306

the chromosomes in a population is measured by the fitness function, and the GA ma-

Page 313: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials

material from one parent, and some from the other) operators. A GA may be consideredto have succeeded when convergence occurs — i.e. when the members of the popu-lation all lie in the same region of search space.

There have been several reports of the application of GAs to problems in chemicalstructure handling, demonstrating that they provide both an effective and an efficientmechanism for the investigation of a range of complex matching problems [35,36].Following earlier work by Payne and Glen [37], the GA we have developed here seeksto identify a combination of translations and rotations that will align one MEP withanother, fixed, MEP so as to give the highest possible similarity. Each chromosomecontains six components, three to encode rotations and three to encode translations, witheach component being allocated one byte. The chromosomes are initially set to randomvalues. and then decoded by applying the indicated rotations and translations to the 3Dcoordinates of the atoms in one of the two molecules that are being aligned. The result-ing coordinates are passed to the fitness function for the evaluation of the alignmentdefined by that particular set of rotations and translations: this function is the Gaussiansimilarity.

While GAS are simple in concept, there are generally many different ways in whichthey can be parameterized. Wild and Willett [22] describe the very extensive com-parative experiments that were carried out to ensure an appropriate combination ofeffectiveness — i.e. the ability to identify good MEP alignments — and efficiency — i.e.the CPU time required (since the GA must be sufficiently fast in operation to searchdatabases of non-trivial size). The final GA that was used for the experiments reportedin the next section had a population size of ten, steady-state-without-duplicates repro-duction, binary encoding, a static uniform crossover rate of 20%. linear normalizationand a diversely initialized population.

3.2. Comparison of alignment methods

Having described two different ways of generating alignments, the question arises as towhich is the more effective — this, in turn, requiring some quantitative means of evalu-ating the effectiveness of the similarity measures that are being tested. Previous work inour laboratory on the comparison of different similarity or clustering methods [38] has

purpose. Here, simulated property-prediction experiments are carried out using datasetsfor which both structural and property data are available, so as to ascertain whichmethods (e.g. which similarity coefficients) result in measures of structural similaritythat are most closely correlated with measures of property similarity. We have adopteda rather different approach in the work reported here. The extensive studies of Richardsand co-workers [13,16.19,20] have shown clearly that there is a strong correlation between biological activity and the similarities that result from grid-based MEP cal-culations (and further evidence of such a correlation is presented later in this chapter).Accordingly, we can compare the effectiveness of the field-graph and GA methodsfor matching MEPs by the magnitudes of the Gaussian similarities since a high similar-ity will be achieved if, and only if, an appropriate alignment has been achieved. A

307

made use of the similar-property principle of Johnson and Maggiora [39] for this

Page 314: 3D QSAR in Drug Design. Vol2

David A. Thorner, David J. Wild. Peter Willett and P. Matthew Wright

comparable approach had been taken previously in identifying the best way of generat-ing field-graphs [24].

Our experiments used a test database of 1000 molecules taken from the FineChemicals Database (FCD), from which 100 molecules were chosen at random to act asthe target molecules for which the nearest-neighbor molecules were required. Thedataset was processed using CONCORD and the MNDO routines in MOPAC, as de-scribed previously, and then searches were carried out using the field-graph and GAmethods for generating alignments, with the methods parameterized such that they tookabout the same amount of time (1-2 CPU seconds on a medium-level Unix workstation)to match a pair of MEPs.

Let Si be the Gaussian similarity for the i-th most similar molecule to the m-th targetmolecule; then we define the performance measure Emn by

where typical values for n are 5, 10 or 20 — i.e. Emn is the mean MEP similarity for then nearest-neighbor structures of the m-th target structure. The overall effectiveness ofthe set of searches for the n nearest neighbors of each target structure, Emn, is then ob-tained by taking the mean of the 100 individual Emn values. When this was done, therewas found to be no obvious difference between the En values obtained using the field-graph and GA methods for generating the alignments; for example, both gave E20 valuesof 0.70 when averaged over the 100 searches of the FCD dataset, and a Wilcoxonsigned-rank test [40] showed that the two sets of searches were not significantly differ-ent in performance at the 0.05 level of statistical significance. Wild and Willett reportadditional experiments in which the alignments resulting from the GA were also shownto be at least as good as those resulting from a bit climber and from a simplex optimizer[22].

4. Inclusion of Conformation Flexibility

The experiments that have been described thus far have assumed that the molecules thatare being processed are completely rigid in nature. However, most organic moleculescontain one or more rotatable bonds, and it is to be expected that improved inter-molecular similarity relationships will be identified in a database search if the MEP-alignment procedure is able to take account of flexibility in a target structure and/or in adatabase structure.

The MEP at any point in 3D space is a function of the partial atomic charges and thedistances of each atom from that point, and a full treatment of the effect of conforma-tional flexibility on MEP-based similarity searching should thus take account of thechanges in both the atom-to-point distances and the partial atomic charges that canoccur as a molecule flexes [41]. However, the calculation of the partial atomic charges required for the generation of an MEP is very time-consuming. Our work on flexiblefield-based searching hence involves making the assumptions that the partial charges do

308

Page 315: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials

not vary with the conformation adopted by a flexible molecule, and that the MEP isaffected only by the changes in the atom-to-point distances resulting from torsionalrotation.

In seeking to find an appropriate alignment procedure, we have been guided by theextensive studies that have been carried out into techniques for flexible 3D substructuresearching, where two main approaches have been described [1,5,25]. In the first ap-proach, a flexible molecule is characterized by a small number of conformations that arechecked to ascertain whether any of them contain a query pharmacophore [42–44].Thealternative approach involves a torsional optimization approach that permits an explo-ration of the full conformational space of a flexible molecule at search time, seeking todetermine whether it can adopt a conformation that contains the pharmacophore[45–47]. These two approaches are considered further below.

4. 1. Searching flexible molecules using field-graphs

Our initial experiments involved the application of field-graphs to the multi-conformationmethod of flexible searching [23]. Specifically, we sought to determine the numbers offield-graphs that are required to delineate fully the variations in MEP that result fromvariations in conformation, since the field-graph approach will only be applicable to searching databases of non-trivial size if these numbers are not large.

The experiments involved a dataset of eleven compounds, taken from the CambridgeStructural Database (CSD) [48], that had been used previously by Ghose et al. to evalu-ate conformational searching methods [49]. The SYBYL SEARCH module was used togenerate a number of conformations (between 48 and 1589) for each of the elevenmolecules by systematic increments of their rotatable bonds, and a field-graph was thengenerated from each of the resulting conformations. The set of field-graphs for the set ofconformations for each molecule was next converted into a searchable database, withthe target structure in each case being the field-graph that was generated from thatmolecule's CSD crystal structure.

Fig. 2. Similarity values for the molecule AGLUAM10 after sorting into descending order. Each value is the

for details).

309

MEP similarity for the matching of the CSD structure of this molecule against one of its conformers (see text

Page 316: 3D QSAR in Drug Design. Vol2

David A Thorner, David J. Wild, Peter Willett and P. Matthew Wright

The results of the searches were summarized as shown in Fig. 2. which is a sorted listof the similarities calculated for the 301 conformations that were generated for one ofthe eleven molecules, AGLUAM 10.

AGLUAM10

It will be seen that there are a few conformations with similarities in excess of 0.80but that the great majority of the conformations haw much smaller similarities: for thisdataset, the mean similarity is 0.60 1 with a standard deviation of 0.096. Similar resultswere obtained with all of the other molecules that were considered, with the mean (stan-dard deviation) similarity varying from 0.308 (0.071) to 0.887 (0.058). If all of the con-formations for a particular molecule gave comparable MEPs, and hence comparablefield-graphs, then most of the similarities with the target structure would be near to 1.0;in fact, most of them are very much smaller than this (with one of the conformations forone of the molecules yielding a similarity as low as 0.202).

It is, thus, clear that torsional rotations can bring about substantial changes in the MEPsimilarity with, in extreme cases, just a small change in a single torsion angle resulting inchanges of up to 40% in the similarity between the target structure and a conformation[23]. There are two reasons for such drastic changes in the similarity. The first, and mostobvious, is a change in the MEP itself arising from the rotation of a bond near to thecenter of a molecule. This can result in large-scale changes in the overall geometry ofthe molecule, and hence in the atom-to-point distances that are used in the calculation ofthe potential at each point in the 3D grid surrounding a molecule. A second problem isthe lack of robustness in the routine that is used to generate a field-graph from the set ofgrid-point potentials. once they have been calculated. We have found that the form of thefield-graph produced by this routine can be overly sensitive to the precise values of thesepotentials, with only small changes in the values sometimes leading to changes in thenumber or nodes in the resulting Geld-graph: these changes can affect the alignmentsgenerated by the MCS procedure and, hence, the final intermolecular similarities.

Substructure searches make use of extensive screening strategies to minimize the number of molecules for which a detailed search needs to be carried out. One strategythat has been found to be particularly effective in the context of flexible 3D substructuresearching is to associate a distance range with each pair of atoms in a flexible molecule[45]. Here, the lower and upper bounds of the range correspond to the minimum andmaximum separations of the two atoms as the molecule flexes, so that the set of distanceranges for a molecule contains all of the geometrically feasible conformations which that molecule can adopt. This representation allows the use of graph-based screeningprocedures, which ensure that only those molecules that match the query at the graphlevel proceed to the final, detailed conformational search [46,47].

Some preliminary experiments were thus carried out to investigate the extent towhich it might be possible to associate an analogous potential range with each point inthe 3D MEP grid surrounding a molecule. Specifically, each point in a grid was charac-

310

Page 317: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials

terized by not one but two values, MAX and MIN, which are the maximum and theminimum values of the potential that are observed at that point as the molecule flexes.However, it was found that the potential ranges, MAX-MIN, were generally large and, thus, unable to provide any useful screening capability [23].

We have already noted some inherent limitations of the field-graph approach whendiscussing its application to the searching of rigid 3D molecules. The results obtainedhere suggest that it will he difficult for a field-graph representation derived from a singleconformation to provide an adequate description of the variations in MEP that can occuras a result of torsional rotations: instead, many conformations, and thus field-graphs,will be required for such a description. Since the matching of the field-graphs represent-ing a single conformation of a target structure and a single conformation of a databasestructure requires 1-2 CPU seconds, it is clear that the use of multiple field-graphs forflexible field-based searching will be extremely time-consuming. In addition, the para-meter-driven nature of a GA makes it easier to bias a search towards efficiency (i.e. asearch that runs quickly but that may miss some good hits) or towards effectiveness (i.e.a search using a larger population size and/or a greater number of generations).Accordingly, the remainder of this chapter focuses upon the use of the GA method forflexible similarity searching.

4.2.

The GA that we have developed can be used i n two ways. I n the first, which is de-scribed below and which formed the basis for most of the initial experiments, a rigidtarget structure is assumed but each of the database structures is allowed to be flexible;alternatively, the target structure can also be allowed to be flexible. We shall refer tothese two types o f search as ‘Flex’ and ‘FlexFlex’, respectively; searches when bothmolecules are rigid will be referred to as ‘Fixed’.

The Flex GA is a straightforward extension of that described previously. The chro-mosome again encodes the translations and rigid-body rotations, but augments these by an extra component for each rotatable bond in the database structure (and in the targetstructure in the more general, FlexFlex case) such that not only the position, but also the conformation, of a molecule is encoded, and a further component to encode cornerflipping in each flexible ring. The fitness function is augmented by a simple van derWaals radius bumpcheck procedure, which ensures that the torsion angles encoded in achromosome do not represent a high-energy conformation. If the bumpcheck is success-ful, then the alignment defined by the chromosome acts as the input to the Gaussiansimilarity calculation, with the resulting similarity being the raw fitness value for thatchromosome. The chromosomes are ranked in decreasing order of raw fitness from the size of the population clown to one (which is the least-lit chromosome in the currentpopulation), with their position in the ranked list being taken as the fitness. A startingvalue was also added to each chromosome to provide a fitness window. This last ap-proach was found to give a value of 1 . 1 (for Flex and 1.5 for FlexFlex) for the selectionpressure, that is, the ratio of the fitness of the fittest chromosome to the mean fitness ofthe population.

Searching flexible molecules using a genetic algorithm

311

Page 318: 3D QSAR in Drug Design. Vol2

David A. Thorner, David J. Wild, Peter Willett and P. Matthew Wright

Fig. 3. Distribution of observed MEP similarities for matching a rigid target structure against 491 rigid (‘Fixed’) or flexible ( ‘Flex’) database structures. Each column denotes the mean number of similarities in theindicated range. when averaged over the 39 target molecules and the 10 runs for each target molecule thatwere used in these experiments

As with the previous GA, a large number of parameterization experiments werecarried out to maximize the performance of the algorithm. These experiments, whichare detailed by Thorner et al. [23] ,led to the use of two-point crossover and a bit-flipmutation operator, with other parameters having the following values (with the range ofvalues that were tested included in brackets): a population of 150 chromosomes (variedin the range 5-500): a selection pressure of 1.5 (varied in the range 1.05-1.90); 1250generations (varied in the range 50-5000); a crossover rate of 35% (varied in the range2-90%); and a mutation rate of 7% (varied in the range 0.1-90%). The effectiveness ofthe alignments produced by the GA, and hence the values of the final similaritycoefficient, increases in line with the number of generations; the 1250 value used herewas felt to provide the best trade-off between effectiveness and efficiency, with runtimes being about 3.5 CPU seconds for calculating the similarity between a rigid targetstructure and a flexible database structure, using an implementation of the algorithm inthe C programming language on a Silicon Graphics R4000 workstation.

The GA was tested using a set of 49 1 structures from the Fine Chemicals Database, 39of which were used in turn as the rigid target structure for two types of search. In the firstset of Flex searches, the database structures were allowed to be completely flexible, withthe search being effected using the GA parameter values given previously. In the secondset of Fixed searches, the database structures were kept entirely rigid so as to determinethe increase in performance, if any, resulting from the inclusion of torsional flexibility inthe matching process. The GAS that were used in these two sets of runs were parameter-ized so as to take about the same amount of CPU time, and both sets of searches were re-peated ten times to encompass the variations in performance that result from thenon-deterministic nature of GAS. The mean similarity of each target structure to eachdatabase structure. when averaged over all 39 target structures, all 491 database struc-tures in each case and all ten sets of runs was 0.465 for the Fixed searches and 0.556 forthe Flex searches. The distribution of the calculated similarities is shown in Fig. 3, which

312

Page 319: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials

illustrates the marked shift to higher similarities that results from the inclusion of data-base-structure flexibility in the matching algorithm.

Thus far, we have considered only the flexibility or the database structures, whilekeeping the target structure rigid. It is, however, simple to extend the algorithm toinclude any inherent flexibility in the target structure. All that is required is to extendthe chromosome by a further byte for each rotatable bond in the target structure, and totreat these bytes in just the same way as is currently done for the corresponding bytes ina description of a database structure. It might be expected that allowing both the targetstructure and the database structures to flex would further increase the similarities thatwere obtained, when compared with the Flex searches. This was, however, not found tobe the case, since while the FlexFlex searches gave mean similarities that were againgreater than those of the Fixed searches, the largest mean similarities resulted from theFlex searches. Specifically, the Fixed, Flex and FlexFlex searches gave mean similar-ities of 0.398, 0.522 and 0.515, respectively, when averaged over these sets of searchesand when parameterized to take similar amounts of CPU time. We believe that thisseemingly counter-intuitive result arises from the Frequent non-convergence of theFlexFlex searches during the limited time available for the execution of the GA,especially when the target structure had a large number of rotatable bonds.

We, thus, concluded that while the inclusion of conformational flexibility in a field- based similarity search enables the identification of better MEP overlaps than if onlyrigid molecules are considered, the cost-effectiveness of allowing both the molecules ina comparison to flex required further study, as detailed below.

5. Prediction of Biological Activity

Searching a chemical database to find molecules that are similar to a bioactive targetstructure, and that might thus also be expected to exhibit the activity of interest, is oneof the principal applications of similarity searching [ 1 ,3]. The work reported thus far inthis chapter has not considered the effectiveness of our matching algorithms for thispurpose and we, hence, now report some initial experiments to ascertain the extent towhich field-based similarity searching might be of use in lead discovery programs. Inthese sets of experiments, the 3D CONCORD structures were minimized using SYBYLMAXMIN, and the atomic charges calculated using the PM3 routines in MOPAC.

The first tests used six small datasets that we have employed previously in a study ofmethods for distance-based 3D similarity searching [50]. Each of these datasets containsmolecules for which qualitative (active/inactive) data are available, as follows:

active [52];

4. 141 aromatic amines of which 98 were carcinogenic [54];

6. 145 nitrosamines of which 112 were carcinogenic [56].

313

5. 113 steroids of which 69 showed potent anti-inflammatory activity [55];

1. 209 9-anilinoacridines of which 150 showed anti-tumor activity [51];

3. 112 nitrobenzenes of which 53 were musk odorants [53]:

2. 147 barbiturates of which 37 had sufficient durations of activity to be classed as

Page 320: 3D QSAR in Drug Design. Vol2

David A. Thorner, David J. Wild. Peter Willett and P. Matthew Wright

For each dataset, an active compound was selected and used as the target structure lorFixed, Flex and FlexFlex searches of that dataset. This was repeated for a total of tendifferent target structures, and the mean similarities that were obtained are listed i nTable 1. It will be seen that, on average, the Flex searches again give the largest similar-ities and hence, one might assume, the best alignments. Different results are obtained,however, when the number of active nearest neighbors that were retrieved is considered,as illustrated in Table 2. The columns in the main body of the table list the meannumber of actives that were retrieved when the 20 nearest neighbors were consideredfor each of the ten active target structures mentioned previously. The columns headedFixed, Flex and FlexFlex have their normal meanings, while those headed AM andMCS represent the results that were obtained using the Atom Mapping (AM) andMaximal Common Substructure (MCS) similarity measures discussed by Pepperrelland Willett in their review of distance-based measures for 3D similarity searching [50].The results suggest that FlexFlex is the best of the three MEP-based similarity measuresand that it is at least as effective as the two distance-based similarity measures.

The QSAR datasets are limited both in size and in structural heterogeneity, and alsohave a large proportion of actives; the second set of experiments hence used a file

314

Page 321: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials

of 3500 structures drawn from the World Drugs Index (WDI), which contains 2Dstructures and broad-class activity indicants for some tens of thousands of drugs.

The target structures we adopted were those used recently by Kearsley et al. in a studyof properly-based similarity searching [57]. These molecules are listed in Table 3,together with their associated WDI activity classes and the number of molecules

tagonist, rather than its true nature as an inhibitor of angiotensin-converting enzyme, andwe have thus used this description in the searches reported here.) For the purposes of theseexperiments, drugs that lie within the same activity class or classes as the target structure are considered to be actives. For each of the target structures, all of the actives were addedto the WDI subset mentioned previously, and then Fixed, Flex and FlexFlex similaritysearches carried out to retrieve the nearest neighbors for the target structure in each case.

The results that were obtained are summarized in Table 4, which contains meanvalues averaged over the ten target structures that were used. Here, the first row in themain body of the table gives the mean similarity between the target structure and thedatabase structures, while the following four rows list the mean numbers of actives re-trieved when the top 10, top 20, top 50 and top 100 nearest neighbors were retrieved. Itwill be seen that the relative performance of the three types of search depends on thenumber of nearest neighbors that are retrieved. with the Fixed searches giving the bestresults for a precision-oriented search in which just a few, highly similar, structures arerequired, and with the FlexFlex searches giving a relatively better level of performanceas more recall-oriented searches are required. As with the earlier experiments. the Flexsearches yield the highest mean similarities, and we would thus hypothesize that theFlexFlex would perform even better at retrieving active molecules if it were allowed torun for long enough to give similarities that were comparable with those obtained in theFlex searches.

It must be emphasized that the results presented here are only preliminary, but they do suggest that the inclusion of flexibility information can increase the effectiveness offield-based searching at little computational cost, when compared with the correspond-ing non-flexible searches.

6. Conclusion

Several previous workers have described methods for calculating the similaritiesbetween pairs of molecules characterized by their MEPs. In this chapter, we have pre-sented algorithms and data structures that enable such similarities to be calculatedsufficiently rapidly to enable MEPs to be used for field-based similarity searching inchemical databases.

Our first approach involves the use of an MCS algorithm to align the field-graphsrepresenting the target structure and each database structure. The alignment(s) resultingfrom this algorithm are then used for the calculation of the final MEP-based similaritymeasure. The second approach involves a GA that encodes the translations and rotationsneeded to maximize the overlap of the MEPs of a target structure and a databasestructure. This is found to be comparable in effectiveness with the field-graph approach

315

belonging to that particular class. (Captopril is described in the WDI as an angiotensin an-

Page 322: 3D QSAR in Drug Design. Vol2

David A. Thorner, David J. Wild. Peter Willett and P. Matthew Wright

316

Page 323: 3D QSAR in Drug Design. Vol2

Calculation of structural Similarity by the Alignment Molecular Electrostatic Potentials

Table 4 Results of searching the World Drug Index; the Similarity row gives the mean similarity whenaveraged over the 10 target structures while each of the remaining rows lists the total number of actives retrieved for a given number of nearest neighbors (NNs)

Effectiveness Measure Fixed Flex FlexFlex

Similarity 0.48 1 0.592 0.556

Actives (NN = 10) 5.4 4.8 4.0

Actives (NN = 20) 8.8 7.8 7.9

Actives (NN = 50) 12.5 10.9 13.9

Actives (NN = 100) 18.8 17.8 19.1

when rigid structures are considered, but is noticeably more efficient if flexiblesimilarity searching is to be carried out.

The studies reported here are now being extended in two ways. Firstly, we are carrying out further searches on the World Drugs Index dataset using not just the MEP similaritymeasure, but also measures based on 2D fingerprints and on 3D interatomic distances (specifically the atom-mapping measure [6]). The aim of this work is to investigate the extent to which the various similarity measures retrieve different sets of bioactive mole- cules as the output from a similarity search. Our initial experiinents suggest that thesearch outputs do. indeed, differ. with those resulting from the MEP measure beingnoticeably more heterogeneous than those resulting from the 2D and 3D measures, thussupporting the hypothesis that field-based searching provides a way of identifying novelbioisosteres that would not be retrieved by more conventional, atom-based searching pro-cedures. Secondly, we have already noted, when discussing the system at Zeneca Agrochemicals, that the focus on just a single type of field means that while the top-ranked structures in a search generally provide a reasonable level of MEP similarity,many of them are of little interest since their steric and/or lipophilic characteristics renderthem inappropriate for the biological system under investigation. We are, accordingly,extending tlie GA to enable all three types of field information to be taken into accountwhen deciding which molecules should be retrieved in a database search, with the ex-pectation that this will further improve tlie retrieval of bioactive molecules from database searches. Full details of both of these sets of experiments will be reported shortly.

Acknowledgements

We thank the following: the Biotechnology and Biological Sciences Research Council, the James Black Foundation, the Science and Engineering Research Council and Zeneca Agrochemicals for funding; Harold Cox, Jonathan Davies, Bobby Glen, Gareth Jones, Caroline Low, Anne Mullaley, Robin Taylor and Andy Vinter for helpful discussions on genetic algorithms and molecular electrostatic potentials; Derwent Information forproviding the World Drugs Index: and Tripos Inc. for hardware and software support.The Krebs Institute for Bioniolecular Research is a designated Bioinolecular SciencesCentre of the Biotechnology and Biological Sciences Research Council.

317

Page 324: 3D QSAR in Drug Design. Vol2

David A. Thorner, David J. Wild, Peter Willett and P. Matthew Wright

References

1.

2.

3.

4.

5.

6.

Good, A.C. and Mason. J.S.. Three-dimensional structure database searches, Rev. Comput. Chem., 7(1995)67-117.Barnard, J.M., Substructure searching methods: Old and new, J. Chem. Inf. Comput. Sci., 33 (1993)

Downs, G.M. and Willett, P., Similarity .searching in databases of chemical structures, Rev. Cornput. Chem., 7 (1995) 1-66.Sadowski, J. and Gasteiger, J., From atoms and bonds to three-dimensional atomic coordinates: Automatic model builders, Chem. Rev.. 93 (1 993) 2567-2581.Martin. Y.C. and Willett, P. (Eds.). Designing bioactive molecules: Three-dimensional techniques and applications (in press). Pepperrell. C.A.. Willett. P. and Taylor. R.. Implementation and use of an atom-mapping procedure for similarity searching in databases of 3D chemical structures, Tetrahedron Comput. Methodol., 3 (1990) 575-593.Perry, N.C. and van Geerestein. V.J., Database searching on the basis of three-dimensional molecular similarit). using the SPERM program. J. Chem. Inf. Comput. Sci., 32 (1992) 607-616.Fisanick, W., Cross. K.P. and Rusinko. A,. Similarity searching of CAS Registry substances: I. Global molecular property and generic atom triangle geometric searching, J. Chem. Inf. Comput. Sci., 32 (1992) 664-674.Bemis. G.W. and Kuntz. I.D.. A fast and efficient method for 2D and 3D molecular shape description, J. Cornput.-Aided Mol. Design, 6 ( 1992) 607-628.Nilakantan. R.. Bauman, N. and Venkataraghavan. R., A new method for rapid characterisation of mole-cular shape: Applications in drug design. J. Chem. Inf. Comput. Sci., 33 (1993) 79-85.Bath, P.A., Poirrette. A.R.. Willett. P. and Allen. F.H., Similarity searching in files ofthree-dimensionalchemical structures: Comparison of fragment-based measures of shape similarity, J. Chem. Inf. Comput. Sci., 34 (1994) 141-147.Good, A.C., Ewing. T.J.A.. Gschwend. D.A. and Kuntz, I.D.. New molecular shape descriptors: Application in database screening, J. Cornput.-Aided Mol. Design. 9 (1995) 1-12.Burt. C.. Richards, W.G. and Huxley. P.. The application of molecular similarity calculations, J. Comput. Chem., 11 (1990) 1139-1146.Manaut. F.. Sanz. F., Jose. J. and Milesi. M., Automatic search for maximum similarit). between molecu-lar electrostaticpotential distributions, J. Cornput-Aided Mol. Design, 5 (1991) 371-380.Richard. A.M.. Quantitative comparison of molecular electrostatic potentials for structure-activitystudies. J. Comput. Chem., 12 (1991) 959-969.Good, A.C.. Hodgkin. E.E. and Richards. W.G.. The utilisation of Gaussian functions for the rapid evaluation of molecular similarity); J. Chem. Inf. Cornput. Sci., 32 (1992) 188-191.Sanz. F.. Manaut, F., Rodriguez. J.. Lozoya. E. and Lopez-de-Brinas. E.. MEPSIM: A computational package for analysis and comparison of molecular electrostatic potentials, J. Cornput.-Aided Mol. Design, 7 (1993) 337-347.Petke, J.D., Cumulative and discrete similarity analysis of electrostatic potentials and fields, J. Cornput. Chem., 14 (1993) 928-933.Good, A.C.. Peterson. S.J. and Richards. W.G.. QSARS, from similarity matrices: Technique validation and application in the comparison of different similarity evaluation methods, J. Med. Chem., 36 (1993) 2929-2937.Burt. C.. Molecular similarity calculations for the rational design of bioactive molecules, In Vinter, J.G. and Gardner. M. (Eds.) Molecular modelling and drug design. Macmillan, London, 1994. pp. 305-332.Johnson. M.A., Maggiora, G.M., Lajiness, M.S.. Moon, J.B.. Petke. J.D. and Rohrer, D.C., Rational use of chemical and sequence databases. In van de Waterbeemd. H. (Ed.) Advanced computer-assisted techniques in drug discovery, VCH, Weinheirn, 1995, pp. 89-110.Wild, D.J. and Willett. P.. Similarity searching in files of three-dimensional chemical structures: Alignment of molecular electrostatic potentials with a genetic algorithm, J. Chem. Inf. Cornput. Sci., 36 (1996) 159-167.

532-538.

7.

8.

9.

10.

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

22.

318

Page 325: 3D QSAR in Drug Design. Vol2

Calculation of Structural Similarity by the Alignment of Molecular Electrostatic Potentials

Thorner, D.A., Wild, D.J., Willett, P. and Wright. P.M., Similarity searching in files of three-dimensional chemical structures: Flexible field-based searching of molecular electrostatic potentials,J. Chem. Inf. Comput. Sci., 36 (1996) 900–908. Thorner, D.A., Willett, P., Wright. P.M. and Taylor. R., Similarity searching in files of three-dimensional chemical structures: Representation and searching of molecular electrostatic potentialsusing field-graphs, J. Coinput.-Aid. Mol. Design, 11 (1997) 163–174.Bures, M.G., Martin, Y.C. arid Willett, P., Searching techniques for databases of three-dimensionalchemical structures, Topics Stereochem., 21 (1994) 467–511.Ho, C.M.W. and Marshall, G.R., FOUNDATION: A program to retrieve all possible structures con-taining a user-defined number of natching query elements from three-dimensional databases. J. Comput.-Aid. Mol. Design, 7 (1993) 3–22.Moon, J.B. and Howe, W.J., 3D database searching and de novo construction models in moleculardesign, Tetrahedron comput. Methodol., 3 (1990) 697–711.Carbo, R.. Leyda, L. and Arnau, M., How similar is a molecular to another? An electron density measure of similarity between two molecular structures, Int. J. Quant. Chem., 17 (1980) 1185–1189 Bron, C. and Kerbosch, J., Algorithm 457: Finding all cliques of an undirected graph, Comm. ACM, 16(1973) 575–577.Brint, A.T. and Willett, P., Algorithms for the identification of three-dimensional maximal commonsubstructures, J. Chem. Inf. Comput. Sci., 27 (1987) 152–158.CONCORD is distributed by the University of Texas at Austin and Tripos Inc., St Louis, MO, U.S.A. Stewart. J.J.M., MOPAC; a semiempirical molecular orbital program, J. Comput.-Aided Mol. Design,4 (1990) 1–105.Goldberg, D.E., Genetic Algorithms in Search. Optimisation and Machine Learning. Addison-Wesley.New York, 1989.Michaelewicz, Z., Genetic Algorithms + Data Structures = Evolution Programs, 2nd Ed., SpringerVerlag, New York, 1994.Willett, P., Genetic algorithms in molecular recognition and design, Trends Biotech.,13 (1995)516–521.Clark, D.E. and Westhead. D.R., Evolutionary algorithms in computer-aided molecular design,J. Comput.-Aided Mol. Design, 10 (1996) 337–358. Payne, A.W.R. and Glen, R.C., Molecular recognition using a binary genetic search algorithms, J. Mol.Graph., 11 ( 1993) 74-91.Willett, P., Similarity searching and clustering algorithms for processing database of two-dimensionaland three-dimensional chemical structures, In Dean. P.M. (Ed.). Molecular similarity in drug design, Chapman and Hall. Glasgow, 1994. pp. 110–137.Johnson, M.A. and Maggiora, G.M. (Eds.), Concepts and Applications of Molecular Similarity. John Wiley, New York, 1990.Siegal, S., Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill Kogakusha. Tokyo. 1956. Reynolds, C.A., Essex, J.W. and Richards. W.G., Atomic charges for variable molecular conformations. J. Am. Chem. Soc., 114 (1992) 9075–9079. Clark. D.E., Willett, P. and Kenny, P.W., Pharmacophoric pattern matching in files of three-dimensional chemical structures: Use of smoothed-bounded distance matrices for the representationand searching of conformationally-flexible molecules, J. Mol. Graph.. 10 (1992) 194–204. Moock, T.E., henry, D.R.. Ozkabak, A.G. and Alamgir, M., Conformational searching in ISIS/3D databases, J. Chem. Inf. Comput. sci., 34 (1991) 184–189.Hurst, T., Flexible 3D searching The directed tweak technique, J. Chem. Inf. Cumput. Sci., 34 (1994)190-196.Milne, G.W.A., Nicklaus, M.C., Driscoll, J.S. and Wang, S., National cancer Institute Drugs Information System 3D Database, J. Chem. Inf. Comput. Sci., 34 (1994) 1219–1224.Smellie, A., Kahn. S.D. and Teig, S.L., Analysis of conformational coverages: 1. Validation andestimation of coverage, J. Chem. Inf. Comput. Sci., 35 (1995) 285-294.Sinellie, A., Kahn, S.D. and Teig, S.L., Analysis of conformational coverage: 2. Applications of con-

formational models, J. Chem. Inf. Comput. Sci., 35 (1995) 295–304.

23.

24.

25.

26.

27.

28.

29.

30.

31.32.

33.

34.

35.

36.

37.

38.

39.

40.41.

42.

43.

44.

45.

46.

47.

319

Page 326: 3D QSAR in Drug Design. Vol2

David A. Thorner, David J. Wild, Peter Willett and P. Matthew Wright

48. Allen, F.H., Davies, J.E., Galloy, J.J., Johnson, O., Kennard, O., Macrae, C.F., Mitchell, E.M., Mitchell,G.F., Smith, J.M. and Watson, D.G., The development of Versions 3 and 4 of the Cambridge Structural Database System, J. Chem. Inf. Comput. Sci.,31 (1991) 187-204.Ghose, A.K., Jaeger, E.P., Kowalczyk, P.J., Peterson, M.L. and Treasurywala, A.M., Conformationalsearching methods for small molecules: 1. Study of the SYBYL search method, J. Comput. Chem.,

Pepperrell, C.A. and Willett, P., Techniques for the calculation of three-dimensional structural similarity using inter-atomic distances, J. Cornput.-Aid. Mol. Design, 5 (1991) 455–474. Henry, D.R., Jurs, P.C. and Denny, W.A., Structure-antitumour activity relationships of 9-anilino- acridines, J. Med. Chem., 25 (1982) 899-908.Stuper, A.J. and Jurs, P.C., Structure-activity studies of barbiturates using pattern-recognition techniques, J. Pharm. Sci., 67 (1978) 745-751.Chastrette, M., Zakarya, D. and Elmouaffek, A., Structure-odor relations of the nitrobenzene muskfamily, Eur. J. Med. Chem., 21 (1996) 505–510. Yuta, K. and Jurs, P.C., Computer-assisted structure-activity studies of chemical carcinogens: Aromatic amines, J. Med. Chem., 24 (1981) 241-251.Stouch, T.R. and Jus, P.C., Computer-aided studies of the structure–activity relationships betweensome steroids and their anti-Rose, S.L. and Jurs, P.C., Computer-assisted studies of structure-activity relationships of N-nitrosocompounds using pattern recognition, J. Med. Chem., 25 (1982) 769-776.Kearsley, S.K., Sallamack, S., Fluder, E.M., Andose, J.D., Mosley, R.T. and Sheridan, R.P., Chemicalsimilarity using physicochemical property descriptors, J. Chem. Inf. Comput. Sci., 36 (1996) 118-127.

49.

14 (1993) 1050-1065.50.

51.

52.

53.

54.

55.

56.

57.

inflammatory activity, J. Med. Chem., 29 (1986) 2125–2136.

320

Page 327: 3D QSAR in Drug Design. Vol2

Explicit Calculation of 3D Molecular Similarity

Andrew C. Gooda and W. Graham Richardsb

a Glaxo Wellcome Medicines Research Center, Gunnels Wood Rood Stevenage, Herts, SGl 2NY,U.K.

b Physical Chemistry Laboratory, South Parks Road, Oxford OX1 3QZ, U.K.

1. Introduction

The measurement of (dis)similarity between molecules in 3D is central to many aspects of Computer-Aided Molecular Design (CAMD). From evaluating possible molecularsuperpositions to applying descriptors to QSAR model construction, it is the degree ofmolecular similarity between molecules which is being evaluated. In general, 3Dmodelling techniques such as CoMFA [1]and DISCO [2] apply an implicit measure-ment of the similarity between structures during molecular comparison. There is abranch of CAMD, however, which attempts the explicit calculation of molecular simi-larity in order to elucidate SAR data. In this chapter, we summarize a number of suchtechniques, detailing their application, merits and pitfalls.

2.

A multitude of indices have been suggested for the calculation of explicit molecular similarity, using a variety of 3D molecular descriptors. These indices can be separatedinto two basic classes. The first is best described as the set of ‘cumulative’ similarityindices. For these formulae, molecular similarity is evaluated via the accumulation ofoverlap or difference values over all descriptor space. The second can be characterizedas ‘discrete’ in nature. Such indices evaluate similarity at discrete points in descriptorspace, with overall molecular similarity determined from the average of these pointvalues.

2. 1. Cumulative formulae

Hopfinger [3,4] proposed a number of indices including equations for measuring thecommon volume of steric overlap between molecular pairs. One example is shown here:

A Brief History of 3D Molecular Similarity Coefficients

(1)

Each atom is described by a sphere of van der Waals (vdW) radius. The atomic overlapvolume for all intermolecular atom-pair combinations is calculated. Summation of thesemolecular overlap data provides a measure of shape similarity. Hermann and Herron [5]and Masek [6] et al. have both proposed variants of this approach.

Perhaps the most widely applied ‘cumulative’ molecular similarity index used in 3DSAR studies was pioneered by Carbo et al. [7]:

H. Kubinyi et al. eds.), 3D QSAR in Drug Design, Volume 2 321–338.©1998 Kluwer Academic Publishers. Printed in Great Britain.

Page 328: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

(2)

Molecular similarity CAB is determined from the 3D descriptor properties PA and PB ofmolecules A and B being compared over all space. As i n Eq. 1, the numerator term meas-ures property overlap, while the denominator normalizes the similarity result. Its struc-ture is identical in construction to the cosine coefficient which is also utilized in 2Ddatabase similarity searching [8,9]. The formula known as the Carbo index measures thedeviation of two molecular properties from proportionality and is, thus, sensitive to theshape of property distribution rather than to its magnitude. This is highlighted by thefact that when the measured properties of two molecules correlate (PA = nPB), the simi-larity index tends towards unity. Variants of this index have been proposed by Hodgkinand Richards, and also by Petke, in an attempt to increase the sensitivity of the formulato property magnitude [10–12].

The indices of Carbo and Hodgkin have been used extensively in molecular similarityinvestigations. As originally applied, the Carbo index was utilized to measure the simi-larity of quantum-mechanically derived electron density between molecules [7,13–15].Other quantum-mechanically derived measures of electron distribution have also beenused in conjunction with variants of the index [16–18]. Electron density has the qualityof being an analytical property firmly grounded in quantum chemistry. Its calculation is,in general, CPU intensive, however, and it is not a particularly discriminating property.For 3D QSAR calculations, the Molecular Electrostatic Potential (MEP) and molecularshape derived measures tend to be preferred, due to their ease of calculation and im-proved discrimination. To this end, similarity evaluations using the variants of theCarbo index have been extended to include the comparison of such MEP [10,19–23]and shape [24–29] descriptors.

A close relative of the Carbo index, the Tanimoto coefficient, which is widely used in2D similarity calculations [8], has also been applied in modified form (Eq. 3) to themeasurement of shape similarity [29]:

(3)

PAPB equals the volume overlap between structure A and shape query B (B is the shapederived from a single molecule, group or molecules or active site), PA –PB is the struc-ture volume not in the query, PB–PA the query volume not i n the structure and w1 and w2

are user-defined weights. A more distant relative of the Carbo index, the Spearmanrank-correlation coefficient, has also been used to calculate 3D molecular similarity:

(4)

This index has been used for molecular similarity calculation in a number of studiesinvolving MEP and accessible surface [30–35]. Rather than applying the actual numeri-

322

Page 329: 3D QSAR in Drug Design. Vol2

Explicit Calculation of 3D Molecular Similarity

cal values of the properties being evaluated, the formula calculates the Pearson cor-relation coefficient of the relative ranks of the data. The numerator term di is the dif-ference in the property ranking at point i of two structures, and n is the total number ofpoints over which the property is measured.

Another widely used family of indices analyas molecular dissimilarity through themeasurement of the Euclidean distance between molecular properties. An example ofthis is shown in Eq. 5, which calculates the root mean squared difference between properties:

(5)

Perry and van Geerestein [36] used Eq. 5 for optimizing molecular surface shape simi-larity. Other variants of this group of indices include the measurement of sum squarederrors and mean squared difference [ 32,37–39].

2.2. Discrete formulae

More recently, indices which measure similarity at discrete points in space have beenemployed to allow greater control over molecular similarity calculations [12,22,40–42].The application of such formulae permits both graphical and statistical analysis to beexecuted on the resultant similarity grid. For such discrete indices, overall molecularcomplementarity is obtained by summing the similarities over all points in space andthen dividing the result by the total number of points present. The resulting value is theaverage similarity over measured property space. Reynolds et al. [40] proposed the firstdiscrete similarity formula for 3D calculations, in the form of the ‘linear’ similarityindex:

(6)

max (|PA|,|PB|) equals the larger molecular property magnitude between PA and PB atthe grid-point where the similarity is being calculated. An exponential variant of thisindex was also proposed by Good [22], while Petke [12] created two further discreteindices, including one based on the Hodgkin index. These discrete indices have beenapplied to the measurement of MEP and Molecular Electrostatic Field (MEF) similarity

Recently, Klebe et al. [41] proposed a discrete formula for the evaluation of mole-cular similarity in 3D QSAR calculations. This methodology is interesting, in that ratherthan calculating a pair-wise molecular similarity between molecules, similarity is deter-mined between a given molecule and a probe atom:

(7)

323

[12,22],

Page 330: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

where the similarity A for physico-chemical property k at grid-point q is calculatedthrough Summation across all n atoms of molecule j, wik is the physico-chemical pro-perty value of atom i, wprobe,k the probe atom property (+1 atomic charge, 1 Å radius and+1 hydrophobicity), σ the attenuation factor and riq the mutual distance between probeatom at grid-point q and atom i of molecule j. The resulting molecular similarity grids have been applied successfully to 3D QSAR calculations [43].

2.3. Comparing coefficients

Choosing the equation which best suits one’s need for 3D molecular similarity cal-culations is no simple task. While many studies have been undertaken to compare therelative merits of similarity indices in 2D database searches (detailed elsewhere, seereference [8]), equivalent comparisons involving 3D descriptors are less common. Petke[12]and Good [22] explored issues of sensitivity in some detail, comparing similarityindex values for MEPs and MEFs across a number or datasets. Both these studiessuggest that discrete indices tend to be more sensitive than their cumulative counter-parts. The test systems used (e.g. the comparison of index behavior at a single discretepoint i n space) were, however. rather artificial in nature. Calculations involving actualligand series yielded more complicated relationships [22] . Indeed, extensive QSARstudies by Good et al. [43], involving multiple datasets and indices, suggest that despiteits perceived lack of sensitivity, the Carbo index produces results comparable to manyof its more complex contemporaries.

These results highlight the problems in attempting to define which is the best index.Each coefficient has its own strengths and weaknesses. The Spearman rank-correlationcoefficient (Eq. 3) has been shown to perform poorly in QSAR studies [43]. Its structureis such that it is scale invariant, however, making it potentially useful with systems con-taining a net charge, as it prevents highly charged regions dominating all other systemfeatures. Discrete indices, such as Eq. 6, allow improved control and graphical/

while formulae such as the Carboindex allow rapid analytical evaluation using Gaussian functions [23,27] . Many of theindices have been applied with success to a variety of problems. The choice of indexand its method of evaluation should, therefore, be made with careful consideration tothe molecular design problem at hand. The information provided in section 3 shouldmake the nature of such considerations clearer. More extensive comparisons of similar-ity coefficients in general. and those applied to 3D problems in particular, are detailedelsewhere [44–46].

3. 3D Descriptors and their Application in Explicit Molecular Similarity Calculations

In many cases, i t is not so much the similarity index, but the methodology applied toproperty calculation, that determines the success of a molecular similarity evaluation. Ahost of 3D descriptors have been used in conjunction with the molecular similaritycoefficients described above. Many of the measures have been selected or tailored to

324

statistical analysis of similarity evaluations [22],

Page 331: 3D QSAR in Drug Design. Vol2

Explicit Calculation of 3D Molecular Similarity

meet particular applications, from ab initio quantum mechanical calculations for thorough molecular comparison to specially devised atomic triplet shape descriptors forrapid database searches. A number of these approaches are detailed here, together withtheir associated applications.

3.1.

A variety of techniques have been exploited in comparing the molecular fields betweenmolecules, including numerical evaluation on rectilinear grids, analytical comparisonsemploying Gaussian function approximations, gnomonic projection and molecular volume measurement. These and other approaches are detailed below.

3.1.1.In CAMD, continuous molecular properties are often approximated through the applica-tion of rectilinear grids, and this is no less the case in molecular similarity calculations.When originally applied to the problem of measuring molecular similarity, manyindices were (and still are) used to measure molecular similarity based on quantum-me-chanically derived properties [7, 13–18,47–48]. CPU hungry calculations and lack ofresults discrimination led Hodgkin and Richards [10,11] to apply similarity indices innumerical evaluations of MEP and MEF (Fig. 1). The increased speed of MEP andMEF similarity calculations allows systems of greater biological significance to bechosen for study. Burt et al. [19] further developed the methodology by undertakingMEP and MEF similarity calculations for a series of nitromethylene insecticides. Multiple grid increments, extents and charge schemes were employed in order to deter-mine the optimum compromise between speed and accuracy of calculation. Burt andRichards [20] then added the ability to include flexible fitting during similarity opti-mization. They found that by allowing torsional flexibility, molecular superpositionswith significantly higher similarity could be realized. In order to lower the chance of

Calculations based on explicit molecular shape and MEP evaluation

Descriptor calculations in grid space

Fig. 1. Numerical approximation for the Carbo index using molecular properties defined on a rectilineargrid. adapted from reference [43].

325

Page 332: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

obtaining energetically unfavorable conformations during such calculations, the similar-ity index calculation was weighted using a Boltzmann factor to penalize significantincreases in internal energy:

(8)

where c is the weighting factor, ∆E the energy of rotated conformation minus energy ofinitial conformation. R the gas constant and T the temperature. All the software functionsdescribed above have been integrated into a single software package known as ASP [49].Similar procedures have also been applied to measurement of hydrophobic similarity,essentially replacing point charges with atom-based hydrophobic potentials [50].

Comparable functionality to that described above has also been developed by Manautet al. [34-35] in the form of the MEPCOMP and MEPSIM software. These programemploy the Spearman rank-correlation coefficient (Eq. 4) to determine and optimizeMEP similarity from ab initio derived MEP grids, as well as those calculated from pointcharges. They also use a novel method for determining the grid-points to be included inthe optimization, setting inner and outer exclusion volumes proportional to the vdWradii of the constituent atoms. The internal shell is usually set to a small value, so thatthe high MEP values can attempt to simulate steric interactions.

One of the problems of a rectilinear grid with uniform point density is that the systemgives equal importance to points close to and far from the molecules being compared.Richard [21] proposed an alternative in the form of a molecular atom-centered radial (MACRA) grid. As its name suggests, a MACRA grid produces grid-points emanatingfrom the atom centers of a molecule. A template sphere employing a fixed or vdW radiuswith approximately uniformly distributed surface points is centered on each atom. Thisforms the first layer of the grid. The second layer is created by scaling the sphere radiusthrough the addition of the average distance between grid-points on the lower layer. Pointsclashing with lower shells of other atoms are removed. The primary advantage of such anapproach is that, as the layers arc constructed, the resulting radial grid requires around 2%of the points of a 1.0 Å increment rectilinear grid to encompass a ligand of drug-like pro-portions. The technique contains certain drawbacks however, not the least of which is theinability of such a grid to deal adequately with concave molecular surfaces.

The calculation of molecular shape similarity has also been undertaken employingrectilinear grids. Meyer and Richards [24] utilized a point counting function version ofthe Carbo index through modifications of the ASP program. Every grid-point is tested tosee whether it falls inside the vdW surface of each molecule. The results are thenapplied to the following index:

(9)

B is the number of grid-points falling inside both molecules, while TA and TB are thetotal number of grid-points falling inside each individual molecule. The use of

326

Page 333: 3D QSAR in Drug Design. Vol2

Explicit Calculation of 3D Molecular Similarity

extremely fine grids (0.2 Å separation) in conjunction with this technique make for pro-longed calculation times, restricting its utility to SAR calculations rather than molecularalignment. More recently, Hahn [29] has developed a more rapid grid-based method forshape evaluation. The technique uses rapid volume and principal axes indices to pre-screen for molecules similar in shape to a given query. Molecules passing this screenare aligned by their principal axes (and symmetrically equivalent superpositions) andtheir shape similarity evaluated through volumetric comparison. Additional optimiza-tion, flexible fitting and electrostatic similarity comparison functions arc also included. The use of such rapid pre-screens and coarser grids allows this approach to be used forfull-scale 3D database searching.

3.1.2. Gaussian function evaluationsWhile grid-based similarity evaluation techniques are common, their numerical founda-tions impart inherent drawbacks. The largest of these problems is that. to gain com-putation speed, the grids employed are normally coarse, with the consequence thatresulting evaluations of spatial properties are somewhat rough. In particular, the similar-ity optimization through the modification of relative molecular position is coarse andcrude. It is, for example, very difficult for a grid-based similarity optimization to super-impose a molecule on top of itself, since the program tends to converge prematurely atsome discrete point.

The mathematical structures of certain similarity coefficients are such that analyticalGaussian functions may be exploited in their evaluation — for example, MEP cal-culations employing the standard point charge approach, where the charges (qi) assigned to each atom (i) create an electrostatic potential at a point r for a molecule of n atomsaccording to the following equation:

(10)

where Ri is the nuclear coordinate position of atom i.

function approximation by Good et al. [23] (Fig. 2):The inverse distance dependence term of Eq. 10 was substituted with a Gaussian

(11)

When this potential function is substituted into the Carbo index (Eq. 2). the resulting in-tegral terms expand into a series of two-center Gaussian overlap integrals. A two-centerGaussian overlap integral has a simple solution based on the exponent values and dis-tances between atom centers shown in Eq. 12 [51]:

(12)

327

Page 334: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

Fig. 2. Two and three Gaussian function approximations to the l/r distance dependence curve. Adapted from reference [22]: I/r; ..... 2 Gaussians; —. —. 3 Gaussians.

The similarity calculation can thus be broken down into a succession of readily calcula- ble exponent terms. As a result of this, it is possible to evaluate MEP similarity rapidly and analytically, and as no singularity exists when the potential approaches an atomic nucleus (Fig. 2). the calculation need not be restricted to regions outside the atomic vdW radii.

These Gaussian functions were incorporated into the ASP program by Good et al. [23], and MEP similarity optimization calculations were undertaken on identical molecules in different spatial orientations. The results confirmed that the analytical Gaussian functions produced similarity values comparable with grid-based calculations. with a two orders of magnitude increase in evaluation speed. They were also found to be capable of superimposing two identical molecules back on top of each other during similarity optimizations, calculations for which equivalent numerical evaluations always converged prematurely. Further speed increases have been obtained by exploit-ing the analytical nature of the Gaussian functions to replace simplex optimization rou-tines with gradient-based methods [52]. In this way. the time required for similarity optimization can be reduced by up to an order of magnitude. Willett and co-workers [53–54] have exploited the speed of Gau n-based evaluations. coupled with efficient methods for undertaking molecular alignment (genetic algorithms, MEP field graphs and bit climbers). to allow the rapid comparison of MEPs between molecular pairs (< 1 CPU second on a good workstation). Using such an approach, it has been possihle to compare whole databases of molecules against given lead structures, searching for similar MEP distributions. This is of particular interest as the molecules found to be similar often show little structural resemblance to each other, unlike those located using traditional substructure search techniques.

328

Page 335: 3D QSAR in Drug Design. Vol2

The use of Gaussian functions need not be limited to MEP calculations. In principle,any property which can be approximated to a set of Gaussians could be compared in

ecular fragments. The technique was found to work well for the small fragmentsinvestigated, but was not pursued because of the difficulty of breaking a larger drugmolecule down into a suitable set of fragments. More recently, Good and Richards [27]proposed a more elementary approach, with electron density simply approximated to thesquare of the STO-3G atomic orbital wave functions [55].The results produced usingthis technique were found more than two orders of magnitude faster than equivalent cal-culations using the numerical shape analysis of Meyer and Richards [24].with little lossin calculation accuracy. Grant and Pickup [56–57] applied Gaussian approximations tothe hard sphere model of molecular shape, further improving on the approach by ex-ploiting the fact that the product of multiple Gaussians is a single Gaussian centered at acoalescence point. This allows molecular shapes based on Gaussians to be collapsedinto increasingly simple elliptical representations. further increasing calculation speed.Gaussian functions have also been applied by Chapman [58] in the field of com-binatorial library profiling, using the approach to quantify steric dissimilarity betweenpotential reagents.

It is clear from the above studies that calculations employing Gaussian functionsoffer a versatile technique for molecular similarity evaluation. Their analytical natureand ease of calculation make for robust and rapid similarity optimizations, whiletheir empirical nature appears to have little consequence on their effectiveness. (Good et al . [43] showed that when used as parameters in QSAR calculations, analytic-ally derived MEP similarity matrices were more predictive than their numericalcounterparts.)

3.1.3. Gnomonic projectionThe technique of gnomonic projection extends the molecular properties of a structureonto the surface of a sphere. The sphere is approximated by a tessellated icosahedron, oran icosahedron and dodecahedron oriented such that the vertices of the dodecahedronlie on the vectors from the center of the sphere through the mid-points of the icoso-hedral faces [26,36,59]. Gnomonic projection displays many useful properties, includ-ing the dramatic simplification of how to map two irregular surfaces to each other.Calculation times arc rapid, especially when symmetry elements inherent to the systemare exploited to reorient the projections. As a result of these features, gnomonic pro-jection techniques have been widely applied to quantitative similarity measurement.Applications include exhaustive similarity comparisons of MEP and surface shape between molecular pairs [31], full-scale shape similarity database searching [26,36] andinteractive graphical and numerical similarity analysis [38-39,60]. The primary draw-back of gnomonic projections is that the technique is essentially restricted to com-parisons in rotational space, rendering the final results of any similarity calculationdependent on the chosen centers of projection for each molecule.

329

tron density maxima, so to reproduce the ab initio electron density distributions of mol-

Explicit Calculation of 3D Molecular Similarity

this way. Hodgkin and Richards [14] parameterized Gaussian in functions centered at elec-

Page 336: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

3.1.4. Volume measurementsThe measurement of molecular shape through the use of volume overlap calculationswas one of the first ways in which 3D quantitative similarity measures were made.Hopfinger [3] first employed such measures as QSAR parameters through the overlapcalculation between all pairs of atoms in the molecules being compared (Eq. 1) ,usingthe hard sphere approximation for each atom. Hermann and Herron [5] developed theOVID program, which optimizes molecular alignment by maximizing the overlapvolume of a prespecified subset of ligand atoms. OVID is fast but requires the user inputto define the atoms considered important in the binding mode, essentially measuring fitquality to a postulated pharmacophore. Masek et al. [6] employ the more accurate ana-lytical volume calculations presented by Connolly [6l ] to determine optimum molecularshape comparisons. It is also possible to weight for overlap between atoms with match-ing chemistry. These searches arc more exhaustive and, consequently, have a higherCPU requirement.

3.1.5. Dot Surface evaluationAs well as the OVID program, Hermann and Herron [5] developed an alternative, SUPER, for measuring surface shape complementarity. This software is used to under-take more exhaustive molecular alignment calculations. The program basically worksby comparing molecular dot surfaces of overlaid molecules with surface points con-sidered as matching when they lie within a predefined distance of one another. MEPdata can also be used to weight for matches with matching electrostatics. Badel et al.[37]use what they term bi-dimensional surface profiles to measure surface com-plementarity . A 2D slice is made through the molecule Connolly surface [62], with theangular profile of the resulting surface contour determined by moving along the cross-section, calculating the angles formed by three successive surface points. Sections ofthis resulting profile are then compared in order to measure shape complementarity. Thetechnique attempts to provide a condensed 2D measure of shape, but the results are ultimately dependent on the orientation of the molecule relative to the cross-sectionstaken. Masek et al. [63] have developed a method for molecular surface similarityevaluation based on the overlap of regions between the vdW and solvent accessiblesurface. The technique was pioneered because it is not possible to use volume overlapcomparisons to measure the similarity of ligands with vastly different size (e.g. com-parison of small molecule with exposed loop of protein ligand). Overlap is calculatedusing the analytical volume overlap calculations of Connolly [62]. Such evaluations areextremely CPU intensive, however, so to speed up the calculations, Perkins et al. [63]developed an alternative where molecular surface representations are precomputed andstored on grids for comparison. Such evaluations are thus rapid, but this does lead to theproblem that molecular flexibility cannot be dealt with explicitly, as a new grid wouldneed to be calculated in response to any change in conformation.

3.1.6. QSAR methodology exploiting field-based similarity data A number of different approaches have been taken in the application of molecularsimilarity calculations to the creation of 3D QSAR models. The easiest way to correlatemolecular similarity results with biological data is via simple regression analysis. Many

330

Page 337: 3D QSAR in Drug Design. Vol2

Explicit Calculation of 3D Molecular Similarity

examples of such studies can be found in the literature. Seri-Levy and Richards [65–66]exploited molecular similarity data to construct QSARs for ligand enantiomer eudismicratios. Each enantiomeric pair was superimposed through a least-squares fit to maximizethe overlap of their stereogenic centers, and the ASP program was used to measureshape and MEP dissimilarity (defined as 1-similarity). The eudismic ratios were thencorrelated against these dissimilarities. Similar calculations have also been undertakento quantify the complementarity of the peptide bond to a series of isosteres [67]. Burtet al. [19] correlated activity for a set of nitromethylene insecticides with their MEPsimilarity to the most active molecule in the series. A similar approach has also beenemployed by Montanari et al. [68]. Hopfinger et al. [3,69-73] have successfully applied their molecular shape calculations in the derivation of 3D QSAR models across a widevariety of systems. Correlations were determined through the calculation of molecularsimilarity, using each dataset molecule in turn, or even the shape of an ensemble ofaligned active molecules. as the template for regression analyses.

It is clear from the above studies that molecular similarity-based regression equationscan produce good QSARs. Nevertheless, it is unlikely that a single molecule willcontain all or most of the structural information inherent to a given ligand dataset. In aneffort to capture this information, full N X N (similarity of each ligand in dataset cal-culated against all other ligands) molecular similarity results matrices originallyemployed by Rum and Hernden in QSAR calculations [74] were extended to incor-porate 3D property data [43,75]. This would make all the SAR data embedded withinthe molecular similarity values available for model generation. The information con-tained within such matrices is similar to that present within a CoMFA data matrix. theprimary difference being that. while CoMFA will describe a region around a group ofmolecules using a large number of grid-points, the similarity matrix will attempt to de- scribe the same region using just a few numbers. Such matrices thus provide an efficient3D QSAR descriptor set, and the predictivity of resulting 3D QSARs has been found tocompare well with CoMFA calculations undertaken on identical datasets [43,76]. Theprimary problem of this methodology is that, while CoMFA is able to display thecoefficients of its QSAR equations as maps of favorable and unfavorable structuralinteractions. no such method exists for extracting chemical meaning from similaritymatrix equations. To address this, Klebe et al. [41] have proposed an approach to 3DQSAR which applies a discrete rather than cumulative 3D molecular similarity index(Eq. 7). Similarities are evaluated between molecules in the QSAR dataset and a probeatom. The resulting molecular similarity grids are then analyzed in a CoMFA-likemanner using PLS. While, as in CoMFA, this leads to much larger data matrices thanthe N X N cumulative similarity matrix approach, the resulting QSAR models can beanalyzed graphically. These graphical QSAR model depictions were shown t o be ofsuperior quality when compared to CoMFA graphical representations of the samedatasets [41],while the QSAR models showed similar predictivity.

3.2. Molecular similarity evaluations based on atom distribution

While a number of the techniques described above have utilized clever techniquesto speed up MEP and shape similarity comparisons, it is generally the case that such

331

Page 338: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

calculations are somewhat time-consuming. As a consequence. they do not lend them-selves particularly well to database searching and library profiling protocols. To lowerthe CPU requirement of 3D molecular similarity evaluation, a variety of alternativemethods has been devised which attempts to determine similarity using simplified struc- tural descriptors based on 3D atom distributions. Such techniques, while less accurate,can be calculated rapidly and are, therefore. well suited to the large dataset comparisonsrequired in database searching and library profiling. Many of these techniques are described below.

3.2.1. Shape measuresA host of approaches have been developed to allow shape similarity measurementsbased on matching atom distributions. Nilakantan et al. [28]have developed a 3D data-base search system. based on the distribution of all atom triplet distance combinationsfound within a molecule. Simplified binary signatures of template and database mole-cule triplets are coinpared during the first stage of a database search. with the triplets ofthose molecules deemed similar enough regenerated on the fly for final comparison. Thenumber of triplet matches found is used as the molecular shape property to quantifysimilarity. Good et al. [77] extended the use of simple ligand triplet descriptors to en-compass molecular surface comparisons. and altered the descriptors to allow storage onhard disk, thus facilitating faster searches. Bemis and Kuntz [78] applied a simplifiedversion of triplet matching to permit rapid 3D database clustering. The perimeter ofeach triplet in a molecule is measured, and the resultant distance is used to augment the appropriate bin of a molecular shape histogram. These histograms are then compared toquantify shape similarity. A variety of different triangle geometric properties has alsobeen exploited by Fisanick et al. [79] to aid similarity searching of molecules found inthe Chemical Abstracts Service Database. Another interesting method introduced byNorel et al. [80] employs, again, triangle descriptors to calculate molecular shape. Foreach molecule in a database, every pair of atoms within a defined distance range of eachother are extracted, and triangles are constructed by adding a third vertex in the form ofa molecular surface point. Triangle data for all the atom pair-surface triangle de-scriptions of a molecular database are stored in a hash table. This table is then usedrapidly to screen for potentially complementary ligand-receptor interactions, through complementarity calculations with equivalent receptor triangle data.

3.2.2. Measures incorporating pharmacophoric informationIt is possible to extend the descriptors described in section 3.2.1. through the incorpora-tion or chemical information with the atom distribution data. Moon and Howe [25] havedeveloped a 3D database search system where queries are built up from single or multi-ple overlapped active ligands. Database molecules are fitted to the query in multipleorientations using the clique detection algorithms of Kuntz et al. [81]. Atoms matchwhen their centers are found within a predetermined distance of each other. The scoreapplied then depends on the degree of chemical complementarity between matchedatoms. Query atoms are defined as required or optional: database molecules must havean atom match with required atoms in order to be retrieved as a hit, while optional

332

Page 339: 3D QSAR in Drug Design. Vol2

Explicit Calculation of 3D Molecular Similarity

Fig. 3. Pharmacophore centre abstraction and triplet description creation paradigm for pharmacophorebased similarity measures (references [79,81-83 1). Key to pharmacophore center types: A= hydrogen-bond acceptor; D = hydrogen-bond donor; Ar = aromatic ring centroid; Lip = lipophilic centroid (center of groupof atoms with –0 charge); + = charged positive. Adapted from reference [82].

atoms matches are used only to augment the similarity score after matching hasoccurred.

All of the techniques described so far i n section 3.2 are based on descriptors derivedfrom explicit molecular conformations. Thus, while fast, they do not account for theinherent flexibility which might be present in a given system. The primary problem withattempting to incorporate molecular flexibility in conjunction with triplet similaritymeasures is the n3 dependence (n = number o f centers per molecule) of the descriptorkeying times. To overcome this problem, Good and Kuntz [82] developed a triplet-based measure grounded on a simplified description of each molecule. Rather thanusing all the heavy atoms in a structure, triplet calculation is restricted to phar- macophoric centers (e.g. donor, acceptor, lipophilic; see Fig. 3). In this way, theCPU requirement for descriptor calculation is dramatically reduced, allowing the

333

Page 340: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

incorporation of data from multiple conformations. Pickett et al. [83] extended this ideausing their PDQ (pharmacophore derived query) approach, where triplet descriptors arederived from multiple 3D database searches against a battery of predetermined theoreticalpharmacophores. Such a technique has the advantage of building up a full profile of thenumber and type of pharmacophores present in a given dataset. This can be useful inlibrary profiling where dataset diversity measures are extremely important. Quality of fit toany given pharmacophore can also be determined easily. The major drawback with thePDQ approach is its keying time which is orders of magnitude slower than internal phar-macophore measurement on the fly [82,84] .This type of methodology has been further ad-vanced by the program Chem-Diverse [84-85], which exploits pharmacophore tripletinfomiation in combinatorial library profiling. The Chem-Diverse protocol for molecularsimilarity calculation is based on trying to obtain the maximum coverage of pharma-cophore space by potential combinatorial chemistry products. The overlap of pharma-cophore triplets between entire datasets is compared, allowing the assessment of theirsimilarity and hence diversity. Such methodology is currently an area of much interest [86].

4. Conclusion

This chapter has summarized many of the techniques available for quantifying explicit3D molecular similarity. A multitude of molecular properties, indices and protocolshave been presented, from the detailed comparison of MEP and shape for QSAR con-struction. through to the rapid analysis of pharmacophore triplet descriptors for diversityanalysis. The sheer variety of such approaches permits their application in many keyareas of molecular design. Such flexibility makes quantitative molecular similarity eval-uation an important tool in the hands of the computational chemist. With the ever increasing demand for the quantification of diversity in combinatorial library profiling,this importance is likely to grow.

References

1. Cramer, R.D. III, DePriest, S.A., Patterson DE. and Hecht, P., The developing practice of comparativeMolecular Field Analysis, In 3D QSAR in drug design. Kubinyi, H. (Ed.) ESCOM, Leiden, 1993,

Martin. Y.C., Bures, M.G., Danaher, EA., DcLazzer, J , Lico, I. and Pavlik, P.A., A few new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists,J. Comput.-Aided Mol. Design, 7 (1993) 83–102.Hopfinger, A.J., A QSAR investigation of DHFR inhibiton by Bakers Triazines based upon molecularshape analysis, J. Am. Chem. Soc., 102 (1980) 7196–7206.Hopfinger, A.J., Theory and analysis of molecular potential energy fields in molecular shape analysis: A QSAR study of 2.4-diamino-5-benzylpyrimidines as DHFR inhibitors, J. Med. Chem., 26 (1983)990-996.Hermann, R.R. and Herron, D.K., OVlD and SUPER: Two overlapprograms for drug design,J. Comput.-Aided Mol. Design, 5 (1991) 511–524. Masek, B .B., Merchant, A. and Matthew. J .B., Molecular shape comparisons of angiotensin II receptorantagonist, J. Med. Chem., 36 (1993) 1230–1238. Carbo. R., Leyda, L. and Arnau, M., An electeon density measure of the similarity between twocompounds, Int. J. Quantum Chem., 17 (1980) 1185–1189.

pp. 443–485.2 .

3.

4.

5 .

6.

7.

334

Page 341: 3D QSAR in Drug Design. Vol2

Explicit Calculation of 3D Molecular Similarity

Downs. G.M. and Willett, P., Similarity searching in databases of chemical structures. In Lipkowitz, K.B. and Boyd. D.B. ( Eds.) Reviews in computational chemistry, Vol. 7, VCH. New York, 1995, pp. 1–66.Holland. J.D., Ranade, S.S. and Willett, P., A fast algorithm for selecting sets of dissimilar molecules from large chemical databases, Quant. Struct.-Act. Relat., 14 (1995) 501–506.

field, Inf. J. Quantum Chem. Quantum Biol. Symp., 14 (1987) 105–110.Hodgkin, E.E. and Richards, W.G., Molecular samilarity, Chem. Br., (1988) 1141–1144.Petke, J.D., Cumulative and discrete similarity analysis of electrostatic potentials and fields, J. Comp.Chem., 14 (1993)928–933.Bowen-Jenkins, P.E. and Richards. W.G., Molecular similarity in terms of valence electron density,J. Chem. Soc. Chem. Commun. (1986) 133–135.Hodgkin, E.E. and Richards. W.G., A semi-empirical method for calculating molecules similarity, J. Chem. Soc. Chem. Commun., (1986) 1342–1344.Carbo, R. and Domingo, L., LCAO-MO similarity measures and taxonomy, Int. J. Quantum Chem.,

Amovilli, C. and McWeeny, R., Shape and similarity: Two aspects of molecular recognition, J. Mol. Struct., 227 (1991) 1–9. Cioslowski, J. and Fleischmann, ED., Assessing molecular similarity from results of ab initio electronic structure calculations, J. Am. Chem. Soc., 113 (1991) 64–67.

8.

9.

10. Hodgkin, E.E. and Richards. W.G., Molecular similarity based on electrostatic potentials and electric-

11.12.

13.

14.

15.

16.

17.

32 (1987) 517–545.

18. Cooper, D.L. and Allen, N.L., A novel approach to molecular similarity, J. Coinput.-Aided Mol. Design. 3 (1991) 253–259.

19. Burt. C., Huxley, P. and Richards, W.G., The application of molecular similarity calculations, J. Comp.Chem., 11 (1990) 1139–1146.

20. Burt, C. and Richards, W.G., Molecular similarity: The introduction of flexible fitting, J. Cornput.-Aided Mol. Design 4 (1990) 231–238.

21. Richard. A.M., Quantitative comparason of MEPs for structure activity studies, J. Comp. Chem., 12 (1991) 959–969.

22. Good, A.C., The calculation of molecular similartity: Alternative formulas, data manipulation and graphical display, Mol. Graph., 10 (1992) 144–151

23. Good, A.C., Hodgkin, E.E. and Richards. W.G., The utilisation of Gaussian functions for the rapid evaluation of molecular similarity, J. Chem. Inf. Comput. Sci., 32 (1992) 188–191.Meyer, A.M. and Richards. W.G., Similarity of molecular shape, J. Comput.-Aided Mol. Design. 5 (1991) 426–439. Moon, J.B. and Howe, W.J., 3D database searching and de novo design construction methods in molecular design, Tetrahedron Comput. Methodol., 3 (1992) 697–711. van Geerestein, V.J., Perry, NJ., Grootenhuis, P.D.J. and Haasnoot, C.A.G., 3D database searching on the basis of shape using the SPERM prototype method, Tetrahedron Comput. Methodol., 3 (1992)595-613.Good, A.C. and Richards, W.G., Rapid evaluation of shape similarity using Gaussian functions, J. Chem. Inf. Comput. Sci., 33 (1993) 112–116. Nilakantan, R., Bauman, N. and Venkataraghavan, R., New method for rapid characterization of molecular shapes: Applications in drug design, J. Chem. Inf. Comput. Sci., 33 (1993) 79–85.Hahn, M., Three-dimensional shape-based searching of conformationaly flexible compounds, J. Chem.Inf. Comput. Sci., 37 (1997) 80–86.Namasivaynm, S. and Dean. P.M., Statistical method for surface pattern matching between dissimilarmolecules: Electrostatic potentials and accessible surfaces, J. Mol. Graph., 4 (1986) 46–50.

projection, J. Mol. Graph., 5 (1987) 97–100. Dean, P.M. and Chau, P.-L., Molecular recognition Optimised searching through molecular 3-spacefor pattern matches on molecular surfaces, J. Mol. Graph., 5 (1987) 152–158.Dean, P.M., Callow, P. and Callow, P.-L., Molecular recognition: Blind searching for regions of strong

24.

25.

26.

27.

28.

29.

30.

31. Chau, P.-L. and Dean, P.M., Molecular recognition: 3D surface structure comparison by gnomonic

32.

33.structural match on the surface of two dissimilar molecules, J. Mol. Graph.. 6 (1988) 28–34.

335

Page 342: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

34.

35.

Manaut, M., Sanz, F., Jose. J. and Milesi, M., Automatic search f o r maximum similarity between MEPdistributions, J. Cornput.-Aided Mol. Design, 5 (1991) 371–380. Sanz, F., Manaut, F., Rodriguez, J., Lozoya, E. and Lopez-de-Brinao, E., MEPSIM: A computationalpackage for analysis and cornparison of Molecular Electrostatic Potentials, J. Cornput.-Aided Mol. Design, 7 (1993) 337-347.

36. Perry, N.C. and van Geerestein, V.J., Database searching on the basis of 3D molecular similarity using the SPERM program, J. Chem. Inf. Comput. Sci., 32 (1992) 607-616.

37. Badel, A., Mornon, J.P. and Hazout, S., Searching for geometric molecular shape complementarityusing bi-dimensional surface profiles, J. Mol. Graph.. 10 (1992) 205–211.

38. Blaney, F.E., Finn, P., Phippen, R.W. and Wyatt. M., Molecular surface comparison: Application tomolecular design, J. Mol. Graph., 11 (1993) 98-105.

39. Blaney, F.E., Naylor, D. and Woods, J., MAMBAS: A real time graphics environment for QSAR, J. Mol. Graph., 11 (1993) 157-165.

40. Reynolds, C.A., Burt, C. and Richards. W.G., A linear molecular similarity index, Quant. Struct.-Act.Relat., 11 (1992) 34-35.

41. Klebe, G., Abraham. U. and Mietzner, T., Molecular similarity indices in a comparative analysis(ComSIA) of drug molecules to correlate andpredict their biological activity, J. Med. Chem., 37 (1994) 4130–4146.Kearsley, S.K. and Smith. G.M., An alternative method for the alignment of molecular structure: Maximizing electrostatic and steric overlap, Tetrahedron Comput. Methodol., 3 (1990) 615-633.Good, A.C., Peterson, S.J. and Richards. W.G., QSARs from similarity matrices: Technique validation and application in the comparison of different similarity evaluation methods, J. Med. Chem., 36 (1993) 2929-2937.Sneath, P.H.A. and Sokal, R.R., Numerical Taxonomy, W.H. Freeman, San Francisco, CA, 1973.Good, A.C., 30 molecular similarity indices and their application in QSAR studies, In Dean, P.M. (Ed.)Molecular similarity in drug design, Blackie Academic and Professional, Glasgow, 1995,pp. 138-162.Brown, R.D. and Martin. Y.C., The infomation content of 2D and 3D structural descriptor relevant toligand-receptor binding, J . Chem. Inf. Comput. Sci., 37 (1997) 1-9.Burgess, E.M., Ruell, J.A., Zalkow, L.H. and Haugwitz, R.D., Molecular similarity from atomic electro- static multipole comparisons: Application to anti-HIV drugs, J . Med. Chem., 38 (1995) 1635-1640.Measures, P.T., Mort. K.A., Allan, N.L. and Cooper, D.L., Applications of momentum space similarity, J. Cornput.-Aided Mol. Design, 9 (1995) 331-340.Automated Similarity Package, developed and distributed by Oxford Molecular, the Medewar Centre,Oxford Science Park, Oxford OX4 4GA. U.K.Bone, R.G. and Villar, H.O., Discriminating D1 and D2 agonists with a hydrophobic similarity index,J.Mol. Graph., 13 (1995) 165-74.Szabo, A. and Ostland, N.S., Modem quantum chemistry, Macmillan, New York, 1982, pp. 410-412.McMahon, A.J. and King, P.M., Optimization of the Carbo molecular similarity index using gradientmethods, J. Comp. Chem., 18 (1997) 151-158.Wild, D.J. and Willett, P., Similarity searching in files of three-dimensional chemical structures: Alignment of molecular electrostatic potential fields with a genetic algorithm. J. Chem. Inf. Comput.Sco., 36 (1996) 159-167.Thorner, D.A., Wild, D.J., Willett, P. and Wright, P.M., Similarity seearching in files of three-dimensional chemical structures: Flexible field-based searching of molecular electrostatic potentials,J. Chem. Inf. Comput. Sci., 36 (1996) 900–908. Frisch, M.J., Head, G.M., Schlegel, H.B., Raghavachari, K., Binkley, J.S., Gonzalez, C., Defrees, D.J.,Fox, D.J., Whiteside, R.A., Seeger, R., Melius, C.F., Baker, J., Martin. R., Kahn, L.R., Stewart, J.J.P.,Fluder, E.M., Topiol, S. and Pople, J.A., Gaussian 88, Gaussian. Inc., Pittsburgh, PA, U.S.A.

56. Grant, J.A. and Pickup, B.T., A Gaussian description of molecular shape, J. Phys. Chem., 99 (1995) 3503-3510.

57. Grant, J.A., Gallardo, M.A. and Pickup, B.T., A fast method of molecular shape comparison: A simpleapplication of a Gaussian description of molecular shape, J. Compt. Chem., 17 (1996) 1653-1666.

58. Chapman, D., The measurement of molecular diversity: A three-dimensional approach, J . Comput.-Aided Mol. Design, 10 (1996) 501-512.

42.

43.

44.45.

46.

47.

48.

49.

50.

51.52.

53.

54.

55.

336

Page 343: 3D QSAR in Drug Design. Vol2

Explicit Calculation of 3D Molecular Similarity

59. Bladen, P., A rapid method for comparing and matching the spherical parameter surfaces of moleculesand other irregular objects, J. Mol. Graph., 7 (1989) 130-137.

60. Blaney, F.E., Edge, C., Phippen, R.W., Molecular surface comparison: 2. Similarity of electrostaticsurface vectors in drug design, J. Mol. Graph., 13 (1995) 165-74.

61. Connolly, M.L., Computation of molecular volume, J. Am. Chem. Soc., 107 (1985) 1118-1124.62. Connolly, M.L., Analytical molecular surface calculation, J. Appl. Cryst., 16 (1983) 548-558.63. Masek, B.B., Merchant, A. and Matthew, J.B., Molecular skins: A new concept for quantitative shape

matching of a protein with its small molecular mimics, Protein, 17 (1993) 193–202. 64. Perkins, T.D.J., Mills, J.E.J. and Dean, P.M., Molecular surface-volume and property matching to

superpose fexible dissimilar molecules, J. Cornput.-Aided Mol. Design, 9 (1995) 479–490. 65. Seri-Levy, A. and Richards, W.G., Chiral drug potency: Pfeiffer’s rule and computed chirality

coefficients, Tetrahedron Asymmetry. 4 (1993) 1917-1921.66. Seri-Levy, A., West, S . and Richards, W.G., Molecular similarity, quantitative chirality and QSAR for

chiral drugs, J . Med. Chem., 37 (1994) 1727-1732.67. Dughan, L., Burt, C. and Richards, W.G., The study of peptide bond isosteres using molecular

similarity, J. Mol. Struct., 235 (1991) 481-488.68. Montanari, C.A., Tute, M.S., Beezer, A.E. and Mitchell, J.C., Determination of receptor-bound drug

confornations by QSAR using fexible fitting to derive a molecular similarity index, J. Cornput.-AidedMol. Design, 10 (1996) 67-73.

69. Cardozo, M.G., Kawai, T., Iimura, Y., Sugimoto, H., Yamanishi, Y. and Hopfinger, A.J.,Conformational analyses and molecular shape comparisons of a series of inandone-benzylpiperidine inhibitors of acetylcholinesterase, J. Med. Chem., 35 (1992) 590-601.

70. Tokarski, J.S. and Hopfinger, A.J., Three-dimensional molecular shape and analysis-quantitative struc-ture-activity relationship of a series of cholecystokinin-A receptor antagonists, J. Med. Chem., 37 (1994) 3639-3654.Burke, B.J., Dunn, W.J. III and Hopfinger, A.J., Construction of a molecular shape analysis-three-dimensional quantitative structure-analysis relationship for an analog series of pyridobenzodiazepinoneinhibitor of muscarinic 2 and 3 receptor, J. Med. Chem., 37 (1994) 3775–3788. Rhyu, K.B., Patel, H.C. and Hopfinger, A.J., A 3D-QSAR study of anticoccoidial triazines using molecular shape analysis, J. Chem. Inf. Comput. Sci., 35 (1995) 771-778.Holzgrabe, U.and Hopfinger, A.J., Conformational analysis, molecular shape comparison, andpharma-cophore identification of different allosteric modulators of muscarinic receptors, J. Chem. Inf. Comput. Sci., 36 (1996) 1018-1024.

74. Rum, G. and Herndon, W.C., Molecular similarity concepts: 5. Analysis of steroid protein binding constants, J. Am. Chem. Soc., 113 (1991) 9055-9060.

75. Good, A.C., So, S . and Richards, W.G., Structure Activity Relationships f r o m Similarity Matrices,J. Med. Chem., 36 (1993) 433-438.

76. Horwell, D.C., Howson, W., Higginbottom, M., Naylor, D., Ratcliffe, G.S. and Williams, S.,Quantitative structure-activity relationships (QSARs) of N-terminus fragments of NK1 tachykinin antag-onists: A comparison of classical QSARs and three-dimensional QSARs from similarity matrices. J. Med. Chem., 38 (1995) 4454–4462.Good, A.C., Ewing, T.J.A., Gschwend, D.A. and Kuntz, I.D., New molecular shape descriptors:Application in database screening, J. Cornput.-Aided Mol. Design, 9 (1995) 1-12.Bemis, G.W. and Kuntz, I.D., A fast efficient method for 20 and 3D molecular shape description, J. Cornput.-Aided Mol. Design, 6 (1992) 607-628.Fisanick, W., Cross, K.P. and Rusinko, A. III, Similarity searching on CAS registry substances:1. Global molecular property and generic atom triangle geometric searching, J. Chem. Inf. Comput.Sci., 32 (1992) 664-674.Norel, R., Fischer, D., Wolfson, H.J. and Nussinov, R., Molecular Surface Recognition by a ComputerVision Based Technique, Protein Eng., 7 (1994) 39–46. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. and Ferrin, T.E., A geometric approach to macromolecule-ligand interactions,J. Mol. Biol., 161 (1982) 269-288.Good, A.C. and Kuntz, I.D., Investigating the extension of pair-wise distance pharmacophore measures to triplet based descriptors, J. Cornput.-Aided Mol. Design, 9 (1995) 373–379.

71.

72.

73.

77.

78.

79.

80.

81.

82.

337

Page 344: 3D QSAR in Drug Design. Vol2

Andrew C. Good and W. Graham Richards

83.

84.

85.

86.

Pickett, S.D., Mason, J.S. and Melay, I.M., Diversity profiling and design using 3D pharmacophores —pharmacophore derived queries (PDQ), J. Chem. Inf. Comput. Sci., 36 (1996) 1214-1233.Davies, E.K. and Briant, C., Combinatorial chemistry library design using pharmacophore diversity, URL http://www.awod .com/netsci/Science/Combichem/feature05.htlm.Chem-Diverse, developed and distributed by Chemical Deisng Ltd., Roundway House, Cromwell Park,Chipping Norton, Oxon OX7 5SR, U.K.Lewis, R.A., Good, A.C. and Pickett, S.D., Quantification of molecular similarity and its application to combinatorial chemistry, In Computer-assisted lead finding and optimization, Proceedings of the 11th European Symposium on QSAR, Lausanne. Switzerland, 1996, van de Waterbeemd, H., Testa, E. and Folkers, G. (Eds.) Wiley-VCH, Easel, 1997, 135–156.

338

Page 345: 3D QSAR in Drug Design. Vol2

Novel Software Tools for Chemical Diversity

Robert S. Pearlman*and K.M. SmithLaboratory for Molecular Graphics and Theoretical Modelling College of Pharmacy, University

of Texas, Austin TX 78712. U.S.A.

1. Introduction: Diversity-related Tasks

Although the concept of chemical diversity has been intuitively considered by chemistsfor many years, the advent of combinatorial chemistry and high-throughput screening have focused attention on the need for efficient software tools to address a variety ofdiversity-related tasks. Perhaps the most fundamental task related to chemical diversityis that of selecting a diverse subset of compounds from a much larger population ofcompounds. The obvious objective of that task is to identify a subset which best repre-sents the full range of chemical diversity present in the larger population, either to avoidthe time and expense of synthesizing ‘redundant’ compounds or to avoid the time and expense of screening ‘redundant’ compounds. However, in addition to simple subsetselection, recent practical experience has revealed other equally (possibly more) import-ant diversity-related tasks which must al so be addressed in the pharmaceutical andagrochemical industry.

High-throughput screening (HTS) can be an effective approach to lead discovery butis obviously limited by the structural diversity of compounds being screened. What ifthat population does not include representatives of one or more chemical classes or pharmacophores? Identifying diversity-voids or missing diversity is an important taskand, obviously, filling in diversity voids with compounds from other sources is equally important. It is also important to be able to recognize and choose among the many com-pounds which might fill a particular diversity-void. These tasks become increasinglyimportant as the number of combinatorially synthesizable compounds increases withadvances in combinatorial chemical methods. and as the number of commercially avail-able compounds increases. Similarly, comparing diversities of alternative compoundlibraries is another important diversity-related task.

In addition to simple diverse subset selection, it is often desirable to select a subsetchosen not only to provide structural diversity, but also to satisfy one or more non- structural criteria or ‘biases’. For example, compound availability and/or physical prop-erties may be important when selecting a subset for HTS purposes. Reagent cost and/orreagent usage frequency may be important when deciding which compounds actually tosynthesize ou t of a very large range of compounds synthetically accessible throughcombinatorial chemical methods. Clearly, non-structurally biased subset selection willyield subsets with somewhat less structural diversity than a subset chosen simply tomaximize structural diversity, but practical considerations often make biased subsetselection a very important diversity-related task.

*To whom correspondence should be addressed.

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design Volume 2. 339–153.©1998Kluwer Academic Publishers. Printed in Great Britian.

Page 346: 3D QSAR in Drug Design. Vol2

Robert S. Pearlman and K.M. Smith

The following section will discuss how chemical structures can be described forchemical diversity purposes. We shall refer to such descriptors as ‘metrics’ of a ‘chemistry-space’. An extremely important but often overlooked diversity-related task is that ofchoosing the chemistry-space metrices which best represent the structural diversity of agiven population of compounds . For example, combinatorially generated populations, orpopulations chosen to be similar to a particular active (‘lead’) coinpound, are inherentlyless diverse than other more randomly assembled populations. Thus, it is quite reason-able to expect that metrics specifically tailored to focus on the limited diversity of such’focused populations’ will provide some advantages over metrics which were developedto best represent the broad range of diversity found i n ‘non-focused populations’. Lastbut not least, the notion of considering alternative chemistry-space metrics remindsus of the need for a rational approach for validating chemistry-space metrics — animportant and often misunderstood diversity-related task.

2.

Before discussing software tools for addressing the aforementioned diversity-relatedtasks, i t is useful to review a few fundamental concepts. The notions of chemical simi-larity, dissimilarity and, consequently, ‘diversity’ are all related to the distance betweenchemical compounds positioned in some multi-dimensional ‘chemistry-space’, the axesof which are the structure-related ‘chemistry-space metrics’ mentioned above. In orderto be a well-defined vector-space, our chemistry-space axes must be orthonormal (mu-tually orthogonal, uncorrelated and normalized). We must also define a method forcomputing a true distance (one which satisfies the triangle inequality) within that space.The ‘diversity’ of compounds positioned in chemistry-space is intuitively related to theinter-compound distance as measured in that space.

Whereas the dimensionality of our physical world is predefined as 3, the dimension-ality of a chemistry-space (as well as the definition of axes) can be chosen to best repre-sent the diversity of a given population of compounds. Most software for addressingchemical diversity uses some form of ‘molecular fingerprint’ to describe each com-pound in a population. Fingerprints are bit-strings (sequences of 1 s and 0s) representingthe answers to yes/no questions about the presence or absence of various substructural features within the molecular structure of a given compound. Although not often dis-cussed in such terms, each bit represents an axis in a multi-dimensional chemistry-space. Each axis could have either of two values: 0 or 1. Fingerprints represent veryhigh-dimensional chemistry-spaces: typically between 150 and 200 bits for MDL appli-cations [1], a few thousand for Tripos [2] and Daylight [3] applications or millions forthe pharmacophore fingerprints introduced by Chemical Design Ltd. [4]. With the ex-ception of the latter, fingerprints were developed to enable similarity and substructuresearching.

Comparing fingerprints using the well-known Tanimoto similarity index, T, hasproven useful for finding similar coinpounds within very large databases of chemicalstructures. Thus, it is typically assumed that the ‘Tanimoto dissimilarity’, (1-T), repre-sents a useful measure of distance within such high-dimensional spaces. Distance-based

Chemistry-space Concepts and Diversity-related Algorithms

340

Page 347: 3D QSAR in Drug Design. Vol2

Novel Software Tools for Chemical Diversity

diversity algorithms consider only the distances between compounds in chemistry-spaceand are used to select diverse subsets by choosing compounds guaranteed to be distantfrom other coinpounds in the selected subset. Practical experience and various valida-tion studies (e.g. Brown and Martin [5]) indicate that such high-dimensional, distance-based diversity-related algorithms are, indeed, useful for simple diverse subsetselection. However, the following criticisms should be considered:

1. The ‘questions’ corresponding to the bits of fingerprints were specifically devel-oped to identify compounds similar to one another. They were not developed tofocus on differences which would constitute a structurally diverse subset.Although some software permits users to redefine the ‘questions’ corresponding tothe bits of a fingerprint, this is rarely (if ever) done in actual practice. Thus, we usefingerprints developed to find similar compounds in diverse populations to selectdissimilar (diverse) compounds not only from diverse populations, but also fromfocused populations. The Tanimoto similarity index was developed to gauge similarity, not dissimilarity. That is, if T(A,B) and T(A,C) (the Tanimoto similarities between compounds A, Band C) are 0.9 and 0.8, compounds A and B are probably more structurally similarthan compounds A and C. However, it is not at all certain that if (1 – T(X,Y)) and(1 – T(X,Z )) are 0.9 and 0.8, that compounds X and Y are more dissimilar (diverse) than compounds X and Z.Although the Tanimoto similarity index appears useful for the similarity purposesfor which it was designed, (I – T) is not a valid measure of distance since it doesnot obey the triangle inequality. Thus, it appears that distance-based diversity (and,possibly, similarity) algorithms might be improved by using either the Euclideandistance or the Hamming distance as suggested by Pearlman [6,7].

2.

3.

4.

Despite these criticisms, distance-based diversity algorithms appear to work satis-factorily for simple diverse subset selection, the most fundamental of the diverse-relatedtasks. However, a s the term implies, distance-based algorithms consider only inter-compound distances, not the absolute positions of compounds in chemistry-space. As aresult, distance-based algorithms are inherently limited and are ill-suited for many ofthe other diversity-related tasks. For example, locating diversity voids in chemistry-space is essentially impossible since distance-based algorithms do not reference1ocation.

In contrast, by dividing each axis of a multi-dimensional space into ‘bins’, cell-baseddiversity algorithms partition chemistry-space into a lattice of multi-dimensional hyper-cubes and, thereby, consider not only inter-compound distance, but also absolute po-sition of compounds in chemistry-space. As will be illustrated below, this additionalinformation makes cell-based diversity algorithms much more powerful and easilyapplicable to all of the diversity-related tasks mentioned in the preceding sec-tion. However, whereas distance-based algorithms can be applied in either a high-dimensional or low-dimensional representation of chemistry-space. cell-based algor- ithms can only be applied in low-dimensional chemistry-spaces. (For example, a1000-bit fingerprint corresponds to a 1000-dimensional chemistry-space which would

341

Page 348: 3D QSAR in Drug Design. Vol2

Robert S . Pearlman and K.M. Smith

be partitioned into 21000 cells — an astronomically large number of cells almost all ofwhich would contain no compounds.)

In order to take advantage of the power and utility of cell-based diversity algorithms,we must identify low-dimensional matrices from which to construct a less than 10-dimensional chemistry-space satisfactory for diversity purposes. There have beennumerous attempts to use ’traditional’ molecular descriptors (e.g. molecular weight, shape-factors, estimated logP, surface area, dipole moment, HOMO-LUMO gap. etc.)as the axes of a low-dimensional chemistry-space. There are three basic reasons forwhich these efforts have not proven particularly useful:

1 .

2.

Many of the ‘traditional’ descriptors are highly correlated; the axes of a vector-space should be orthogonal (uncorrelated).Some traditional descriptors (e.g. logP and pKa) are strongly related to drug trans-port or pharmacokinetics but are only weakly related to receptor affinity or activityas measured in most screening-based drug discovery efforts.The traditional descriptors are whole-molecule descriptors which convey very littleinformation about the details of molecular substructural differences which are thebasis of structural diversity.

3.

The first problem above could be addressed to a limited extent by using principal com-ponents of the ‘traditional’ descriptors as the axes, but the second and third more funda-mental problems would remain. The advantages of cell-based methods cannot berealized unless some non-traditional chemistry-metrics can be found which enable the definition of a meaningful low-dimensional chemistry-space.

3.

In 1989, Burden [8] suggested that a ‘molecular ID number’ could be defined in termsof the two lowest eigenvalues of a matrix representing the hydrogen-suppressed con-nection table of the molecule. More specifically, Burden suggested putting the atomicnumbers on the diagonal of the matrix. Off-diagonal matrix elements were assignedvalues of 0.1 times the nominal bond-type if the two atoms are bonded and 0.001 if thetwo atoms are not bonded. He also added 0.01 to the off-diagonal elements representing‘leaf edges’ in the molecular graph (i.e. terminal bonds to the last atom in a chain).In suggesting that structurally similar compounds would be near each other in anID-ordered list, Burden was actually proposing a one-dimensional chemistry-space.Since fingerprint-based similarity searching method? were just becoming availablefor modestly sized databases (under 0.5 million compounds), Burden’s seeminglyfar-fetched suggestion was generally ignored.

In 1993, eager to find some sort of ‘similarity searching method’ applicable to theChemical Abstracts Service (CAS) Registry File of approximately 12 million structures,Rusinko and Lipkus [9] applied Burden’s suggestion to a test database of 60 000 c o n -pounds. The results were relatively poor compared to fingerprint-based similaritysearching methods, but much better than expected. They also experimented with thenotion of assigning a constant value to all diagonal matrix elements or a constant value

BCUT Values: Novel Low-dimensional Chemistry-space Metrics

342

Page 349: 3D QSAR in Drug Design. Vol2

Novel Software Tools for Chemical Diversity

for all bonded off-diagonal elements but, in each case, were using the lowest eigenvalueof a single matrix to define a one-dimensional chemistry-space.

Based on Burden’s (B) original suggestion and CAS’s (C) ‘validation’ of thatsuggestion, Pearlman at the University of Texas (UT) added the following significantextensions which resulted in what we now refer to as the BCUT approach [ 6,7,10]:

1. Given that a one-dimensional chemistry-space showed some signs of promise, asimilarly defined multi-dimensional chemistry-space should be even more pro-mising. This is easily accomplished by using more than one matrix to representeach compound.Mathematical analysis reveals that all eigenvalues of such matrices contain infor-mation related to molecular structure. The lowest and highest eigenvalues reflectthe most different information (are least correlated). Considering both the lowestand highest eigenvalues provides another mechanism to extend Burden’s originalsuggestion to a multi-dimensional space.Pharmaceutical and agrochemical researchers are interested i n structural diversitywith respect to the way in which compounds might interact with a bioreceptor.Since atomic number has almost no bearing on the strength of intermolecular inter-actions, much more relevant metrics can be defined by putting more relevantatomic properties on the diagonals of four ‘classes‘ of BCUT matrices: atomiccharges, polarizabilities. H-bond donor- and acceptor-abilities corresponding to theelectrostatic. dispersion and H-bonding modes of bimolecular interaction.Burden’s suggestion of using nominal bond-type information for the off-diagonalelements of the matrices was very good and should be retained. However. usingCONCORD [11] to generate 3D structures opens the possibility of putting variousfunctions of interatomic distance on the off-diagonals and, thereby, definingmetrics which encode information about the 3D structure.Another approach to incorporating aspects of 3D structure is to use atomic surfaceareas to weight the atomic properties placed on the diagonals.Noting that the matrices contain atomic properties on the diagonals and connect-ivity information on the off-diagonals, there is clearly a need for a scaling-factor toprovide the proper balance of the two types of information.

2 .

3.

4.

5 .

6.

Given the large number of possible combinations of diagonal, off-diagonal andscaling-factor choices, it is clear that some method must be developed for rationallydeciding which combination of BCUT values (eigenvalues) would form the chemistry- space which best represents the structural diversity of a given population of compounds.Many combinations can be quickly eliminated by requiring that the axes of the chermistry-space be mutually orthogonal. For example, different charge-related values (e.g.Gasteiger-Marsili charges, AM1 charges, AM1 densities, etc.) all convey the same fun-damental information and, therefore, will be intercoil-elated. On the other hand, wiht theexception of the H-bond-ability matrices 1, both the highest and lowest eigenvalues should be relevant and turn out to be relatively uncorrelated. (The lowest eigenvalues ofthese matrices contain ‘information’ about atoms in the molecule whch are neitherH-bond donors or acceptors. Since all non-H-bonding atoms have the same zero value

343

Page 350: 3D QSAR in Drug Design. Vol2

Robert S. Pearlman and K.M. Smith

Fig. 1. (a) Cartoon representation of non-optimal two-dimensional chemistry space showing poorly dis-tributed compounds. (b) Representation of a better two-dimensional chemistry-space showing more evenlydis tributed compounds.

of H-bond-ability, the lowest eigenvalues of these matrices are information-poor.)Sometimes, a 6-dimensional chemistry-space (two charge-BCUTs, two polarizabilityBCUTs and two H-bond-BCUTS) yields the best chemistry-space for a given popu-lation. Often, the H-bond-acceptor- and charge-BCUTs are correlated, yielding a

344

Page 351: 3D QSAR in Drug Design. Vol2

Novel software Tools for Chemical Diversity

5-dimensional chemistry-space. Pearlman and Smith [6,7,10] developed a powerful‘auto-choose’ algorithm which automatically determines both the best dimensionality ofthe chemistry-space and best choice of exactly which metrics best represent thestructural diversity of a given population of compounds.

The rationale for the auto-choose algorithm can most easily be explained by referenceto Figs la and 1b which depicts two ‘cartoon’ representations of the same population ofcompounds distributed in two different two-dimensional chemistry spaces. Recall thatthe most fundamental of diversity-related tasks is that of rational, structure-baseddiverse subset selection. Suppose that one were asked to select a subset of 25 diversecompounds. The chemistry-space defined by axes i and j in Fig. 1 a does a poor job ofdistinguishing one compound from another and positions most of the compounds in thecentral ‘cell’, thereby forcing us to choose several compounds from that cell at random.Clearly, this would be a very poor choice of chemistry-space axes for this population ofcompounds since it would provide little or no advantage over a random choice madewithout reference to any chemistry-space considerations. In contrast, the chemistry-space defined by axes k and l in Fig. 1b does a much better job of distinguishing com-pounds based on structural diversity and enables the selection of a subset of 24 diversecompounds (satisfactorily close to the desired size of 25) by simply choosing one com-pound from each of the 24 occupied cells. Note that by choosing compounds as close tothe center of each occupied cell as possible, we have described a very natural cell-based subset selection algorithm which yields a set of compounds and ’covers’ the range of di-versity represented by the population and includes compounds which are mutually distant from one another. Clearly, the chemistry-space yielding the most uniform dis-tribution of compounds would best suit our purposes. However, real-world populationswill not be distributed in a perfectly uniform fashion. Valence and steric considerationslimit the continuity of structures which can be achieved and, more obviously, the history of discovery efforts at a given company with result in corporate databasesreflecting those focused efforts. However, the χ-squared statistic provides a measure ofhow well one distribution matches another. Thus, minimizing the χ-squared statistic canreveal which combination of metrics yields a distribution of compounds closest to thehypothetical uniform distribution. Simultaneously, by considering a range of dimen-sionalities, this χ-squared approach also reveals the dimensionality of the chemistry-space which best represents the diversity of the given population of compounds.Clearly, in order to include as much structure-distinguishing information as possible,the algorithm will choose the highest-possible dimensionality which does not result incorrelated (non-orthogonal) axes. (Correlated axes would result in some highly popu-lated cells along a diagonal of the chemistry-space and corresponding empty cellssurrounding that diagonal. This would yield a high χ-squared and the chemistry-spacewould be rejected).

Note that the χ-squared approach yields the combination of metrics which best repre-sents the diversity of a given population of compounds. For ‘truly diverse’ populationsof compounds, we are not surprized to find the same (or similar) ‘universal chemistry-space’ definition being reported by different users. In contrast, populations result-ing from generating different combinatorial libraries should be expected to occupy dif-ferent and individually less diverse regions of ‘universal chemistry-space’. Clearly, the

345

Page 352: 3D QSAR in Drug Design. Vol2

Robert S. Pearlman and K. M. Smith

χ-squared approach enables us to tailor a chemistry-space to best represent a focusedpopulation — an important diversity-related task listed in the Introduction section ofthis chapter.

It should also be noted that this auto-choose, χ-squared approach can be applied notonly to combinations of BCUT values, but to combinations involving any other low-dimensional chemistry-space metrics. Although experience to date strongly supports the use of BCUT values as metrics, software developed by Pearlman and Smith [12]encourages the user to consider his own metrics in addition to BCUT values. However,this must be done with extreme caution for the following somewhat ironic reason.Imagine assigning random numbers to each of compounds of a large populations andthen considering those numbers as a potential axis of a chemistry-space. Since therandom numbers would be uniformly distributed over the population of compounds, theχ-squared approach would perceive this ‘metric’ (and other similarly random ‘metrics’)as good choices as axes of a chemistry-space. This brings us. rather dramatically, to theneed to validate the choice of metrics used to define a chemistry-space.

4. Validation of Chemistry-space Metrics

Obviously, chemistry-space ‘metrics’ which are merely random numbers with norelation to structure would be of no use for diversity-related tasks or any other purpose.How can we demonstrate that a given set of metrics is actually reflecting differences inmolecular structure and, thereby, validate those metrics for use in addressing chemicaldiversity-related tasks?

Perhaps the most intuitive approach to metric validation is to use the metrics asQSAR descriptors; i.e. establish a linear regression equation relating the metrics to theexperimentally measured ‘activities‘ of a set of compounds. Let us imagine that we canput aside the differences between receptor-affinity and actual activity (due to transportissues or secondary processes in the cascade of events between initial receptor bindingand eventual pharmacological effect). Since it would be impossible to establish a statis-tically significant regression based on meaningless, random numbers, demonstrating such a regression would be proof that the metrics are not random numbers, but true indi- cators of chemical structure. After auto-choosing a six-dimensional BCUT chemistry-space to best represent the diversity of their entire corporate database, Weintraub andDemeter [13] used those six BCUT values to regress the logIC50 values measured for800 ligands at the benzodiazepine site of the GABAA receptor and obtained a PLSmodel essentially as good as one they obtained previously based upon 70 classicalQSAR descriptors.

While the results of Weintraub and Demeter certainly confirmed the validity ofBCUT values as chemistry-space metrics, those results are. unfortunately but not un-expectedly, quite rare! As will be illustrated below. chemistry-space metrics are notQSAR descriptors and rarely yield regressions as good as that reported by Weintrauband Demeter, Chemistry-space metrics are intended to position compounds in a struc-ture-based chemistry-space. QSAR descriptors are intended to provide quantitative esti- mates of bioactivity. Chemistry-space metrics are intended to reflect (in a necessarily

346

Page 353: 3D QSAR in Drug Design. Vol2

Novel Software Tools for Chemical Diversity

crude manner) all features of molecular structure. In contrast, QSAR descriptors arespecifically chosen to reflect (as accurately as possible) only those features of molecularstructure which have been found relevant for a particular bioactivity. It is well knownthat QSARs will give unreliable estimates of activity when applied to compounds con-taining structural features not present in the training set used to identify the ‘relevant‘features characterized by the QSAR descriptors. We certainly should not expect QSARsbased on chemistry-space metrics to do any better since (i) the metrics were notintended for this purpose and (ii), as will be explained below, position in chemistry-space is not quantitatively related to activity.

How, then, should chemistry-space metrics be validated? Pearlman and Smith [10]have presented a simple yet novel approach to metric validation which they refer to asactivity-seeded, structure-based clustering. Unlike typical clustering algorithms (based on structure alone) which can be used for a variety of tasks, this algorithm requiresactivity data (preferably, quantitative data) for a set of compounds and is intended onlyfor the diversity-related task of validating chemistry-space metrics. Given a set of activecompounds which all bind to a given receptor in the same way, it is certainly reasonableto expect that those active compounds should be positioned near each other in a smallregion of chemistry-space if the chemistry-space metrics are valid. The activity-seeded, structure-based clustering algorithm provides a method for directly testing that expecta-tion in the typical case in which the chemistry-space dimensionality is greater than 3and, thus, simple visual inspection of the distribution of active compounds is difficult orimpossible. The algorithm consists of the following procedure:

1 .

2.3.4.

5 .

6.

Choose a unit-cluster radius: a small distance in the chemistry-space to be validated.Center a sphere of that radius on the most active coinpound in the validation set.Assign other active compounds located within that sphere to that ‘unit-cluster’.Center another sphere on the next most active compound not already assigned tosome unit-cluster.Repeat steps 3 and 4 until all active compounds have been assigned to someunit-cluster.‘Coalesce’ adjoining (overlapping) unit-clusters and record the number of unit-cluster spheres per coalesced-cluster.

The algorithm can be implemented as an O(N) process and, thus, is extremely fast.More significantly, the algorithm can be used to validate all types of chemistry-spacedefinitions including other (non-BCUT) low-dimensional chemistry-spaces and thosebased on high-dimensional fingerprints as well. When used in a cell-based context, theunit-cluster radius is typically chosen to yield a tiny hypersphere of volume equal to thatof a single hypercubic cell reflecting the ‘resolution’ corresponding to a user-specifiednumber of bins/axis (see below). In any case, the total number of unit-cluster spherescontained in all coalesced-clusters provides an upper bound on the volume of chemistry-space required to contain all the active compounds.

Using the activity-seeded, structure-based clustering algorithm. Pearlman and Deanda[14] have performed a number of validation studies. For example, after auto-choosing

347

Page 354: 3D QSAR in Drug Design. Vol2

Robert S. Pearlman and K.M. Smith

the BCUT chemistiy-space which best represents the diversity of compounds in MDL’sMDDR database [15], they computed the positions of 197 relatively diverse ACEinhibitors in that chemistry-space. The 197 inhibitors were culled from the primaryliterature [16–23]. Measured activities (–logIC50) were reported for all compounds and spanned the range 5.24 to 9.64. The 78 most active compounds (top 40%) had activities inthe range 7.85 to 9.64 and were identilied as ‘highly active’ compounds. If the BCUTvalues used as chemistry-space metrics were random numbers or quantities unrelated tostructure and intermolecular interaction, the active compounds would be randomly distrib-uted throughout chemistry-space. However, using the activity-seeded, structure-based clus-tering algorithm, they found that the 78 ‘highly active’ compounds are all contained by just3 coalesced clusters occupying less than 0.02% of the entire chemistry-space and less than0.19% of occupied chemistryspace. Significantly, the 3 clusters were close to each other;the largest inter-cluster distance being just 3.2R where R is the unit-cluster radius.

It is instructive to consider the analogous results obtained using all 197 compounds(including the 119 ‘poorly active’ compounds). Once again, the active compounds wereall clustered relatively near each other but they occupied a much larger volume ofchemistry-space than that occupied by just the 78 ‘highly active’ compounds. Thisresult is entirely consistent with expectations. There can be many different structures which exhibit poor to modest activities. In contrast, there are relatively fewer structureswhich exhibit high activities. This fact may be easier to appreciate by considering thenotion of making structural modifications of a very highly active compound. There maybe a few modifications which preserve high activity but there are far more modifications which reduce or even completely destroy activity.

The fact that poorly to modestly active compounds are spread over larger regions ofchemistry-space than highly active compounds illustrates one reason for which chemistry-space metrics cannot (and should not) be used as QSAR descriptors.Compounds with significantly different structures would be positioned at widely distant points in chemistry-space but could exhibit low or moderate bioactivities (or affinities) of exactly equal magnitudes. The converse illustrates a second reason for which chem-istry-space metrics cannot (and should not) be used as QSAR descriptors. For example,adding just a single methylene unit to the middle of the –CH2CH2OH side chain of somehighly active compound could completely destroy the activity if the propyl-hydroxy de- rivative no longer fits into the receptor. Thus. two highly similar compounds (which anyvalid metrics would place very near each other in chemistry-space) could have entirelydissimilar activities. Clearly, neither QSAR nor any other approach based o n the as-

sumption of a quantitative relationship between activity and precise position in chem-istry-space will be a valid approach to metric validation. On the other hand, theactivity-seeded, structure-based clustering approach clearly indicates whether a givenset of metrics places compounds active against the same receptor in the same smallregion of chemistry-space and, thus, provides a rational basis for metric validation.

5.

Once one has determined which metrics define the chemistry-space which best repre-sents the diversity within a given population of compounds, one is then able to use

348

Using Low-dimensional Metrics for Diversity-related Tasks

Page 355: 3D QSAR in Drug Design. Vol2

Novel Software TooIs for Chemical Diversity

various cell-based algorithms to address all of the other diversity-related tasks men-tioned in the introductory section of this chapter. Recall that such algorithms can exploitnot only the knowledge of inter-compound distances, but also the knowledge ofabsolute positions of compounds in chemistry-space. Structurally similar compounds are positioned near each other in chemistry-space and, thus, are found ‘clustered’ in thesame or neighboring cells.

A chemistry-space, like any other vector-space. must be comprised of normalized axes(so that a distance of, say, 4 units in one direction is equivalent to a distance of 4 units inany other direction). Thus, the ‘cells’ are hypercubes resulting from dividing each of thenormalized axes of a chemistry-space into equal numbers of evenly spaced ’bins’. Thenumber of bins/axis is directly related to the ‘resolution’ with which one examines the dis-tribution of compounds across chemistry-space and is inversely related to the apparent‘occupancy’ of that chemistryspace. For example, if 250 000 compounds are distributedin some 6-dimensional chemistry-space and each axis is ‘divided’ into just one single bin,all 250 000 compounds would be contained in just one single cell. The occupancy(number of occupied cells divided by total number of cells) would be 100% but theresolution would, obviously, be uselessly low. If each axis were divided into 20 bins, therewould be 206 = 64 000 000 tiny cells. In this case, the occupancy would be extremely lowand the resolution would be uselessly high: most cells would be empty and even verysimilar compounds could be in different, noli-neighboring cells. Cell-based algorithms forsome tasks automatically choose the number of bins/axis most appropriate for that task and population. Other tasks require that the user decide on the resolution (see below). Recalling that typical populations of compounds are not uniformly distributed, experience has shown that choosing the number of bins/axis which yields roughly 12% to 16%occupancy provides an appropriate level of resolution for most purposes.

6.

Our explanation of the χ-squared approach to auto-choosing the metrics of a chemistry-space also illustrated the essence of the natural, cell-based approach to diverse subsetselection. As implied in that illustration, we recommend selecting one compound fromeach occupied cell, although our software also allows one to sample each cell in pro-portion to its occupancy or by selecting up to some fixed number of compounds per cell.Once the user has specified the sampling protocol (number per cell) and the size ofthe desired subset, the software automatically finds the number of bins/axis which yields the number of occupied cells required to provide a subset closest in size tothat requested. Cell-based algorithms are extremely fast and especially well-suited tohandling very large populations of compounds. Even if the software must make threeof four guesses before finding the best number of bins/axis, selecting a subset of50 000 structurally diverse compounds from a population of 0.5 million would takeapproximately 20 cpu seconds on a modest workstation.

By selecting cornpounds nearest the center of each cell, we can avoid choosing com-pounds near each other but just barely on opposite sides of a plane separating two cells.In other words, by selecting compounds nearest the center of each cell, we are selectinga subset of maximal structural diversity, namely simple diverse subset selection.

Simple and Biased Diverse Subset Selection

349

Page 356: 3D QSAR in Drug Design. Vol2

Rohert S. Pearlman and R.M. Smith

Biased subset selection can easily be accomplished by allowing the user to constructa modified selection rule. In other words. rather than choosing the compound nearest thecenter of a given cell, one can arrange to choose the compound which provides the best(user-specified) compromise between distance from center and some non-structuralproperty. For example, given a choice between two compounds from the same smallregion (cell) of chemistry-space, availability (price, quantity on hand, etc.) might cer-tainly be important considerations for assembling a subset for general screening pur-poses. Recalling that logP in a poor chemistry-space metric but, nevertheless, quiteimportant for bioactivity, choosing compounds from each cel l closest to some particular‘ideal’ logP could be advantageous.

Biased subset selection can also be used to improve the efficiency and economy ofcombinatorial library synthesis. Imagine that 1000 A-type and 1000 B-type reactantscould be used to make 100desired for screening purposes. Selecting 100 diverse A’s and B’s offers obvious prac-tical advantages but. clearly, does not yield as diverse a set of 10 000 as could have been selected from the complete set of 1 000 000 products. Simple diverse subset selec-tion from all the products would undoubtedly result i n the need to use many more than100 of each type of reactant. By keeping track of the frequency with which eachreactant is used i n the products being selected, and by specifying the format of theplates used for the syntheses (e.g. typical 8 X 12 = 96-well plate), DiverseSolutions [12 ] used in conjunction with CombinDBMaker (for combinatorial database generation[24]) enables the user to specify a selection rule which chooses compounds providingthe best (user-specified) compromise between distance from center of cell and economy.

7.

In order to address the possibility of finding leads to bioactive compounds in regions ofchemistry-space not covered by their current collection of compounds, pharmaceuticaland agrochemical companies allocate a certain fraction o f their resources to compound acquisition programs: purchasing. trading for or synthesizing additional compounds forscreening. Practical considerations (cost, screening capacity, etc.) limit the number ofcompounds companies choose to acquire.

Identifying and filling in diversity voids is trivially simple using cell-based alpor-ithms. Obviously, ‘empty’ cells represent regions of missing diversity. ‘Empty’ can bedefined to mean either that the cell contains no compounds or that it contains less than some user-specified number. When identifying the diversity voids in a given populationof compounds, the number of empty cells will depend not only on how those com-pounds are distributed, but also upon the ‘resolution’ at which the ‘search’ for emptycells is performed. Whereas the number of bins/axis can be chosen by the software during subset selection, the user must choose the number of bins/axis which will yield anumber of diversity voids consistent with those practical considerations which limit thenumber of compounds his company chooses to acquire (see below).

Since cell-based algorithms (unlike distance-based algorithms) reference absolutecompound coordinates in chemistry-space. tilling in diversity-voids is also trivially

Identifying and Filling in Diversity Voids

350

0 000 AB-type products but that just 10 000 products are

Page 357: 3D QSAR in Drug Design. Vol2

Novel Software Tools for Chemical Diversity

simple using cell-based methods: compounds (from some secondary population) areacquired if they would occupy a previously ‘empty’ cell in the chemistry-space con-taining the primary population. Of course, this entails precomputing the coordinates(metrics) of the secondary population i n the same chemistry-space as that used tocontain the primary population. Since we know exactly which cell would be filled byeach candidate compound, we can easily bias our choice of fill-in compounds using thesame sort of lion-structural criteria as discussed for biased subset selection. The user canalso specify how many compounds lie wants to add to ‘empty’ cells. DiverseSolutionsthen presents the list of compounds to acquire in various formats to facilitate purchasedecisions (e.g. compounds which would ensure at least 1 compound i n each ‘empty’cell, compounds which would ensure at least 2 compounds in each ‘empty‘ cell, etc.).

Finding the diversity voids in a population of 0.5 million compounds typically takesless than 5 cpu seconds on a modest workstation. Filling i n those voids (to the extentpossible) from a library of 50 000 compounds typically takes less than 4 cpu seconds.Thus, the user can easily experiment with several values of bind/axis and severalfilling-in protocols.

It is worth noting that the filling-in process does not require any information aboutthe compounds contained in the primary population. All that is required is the definitionof the chemistry-space (i.e. name and range of the metric corresponding to each axis),the number of bins/axis, and the cell-numbers of the ‘empty’ cells. Thus, withoutrevealing the compounds in its proprietary database, company X can. in essence, enable company Y to identify compounds which would fill in company X’s missing diversity.

8.

Occasionally, it may be useful to compare the diversities of two (or more) populationsof compounds — perhaps alternative third-party libraries one could purchase or alter-native combinatorial libraries one could synthesize to augment the diversity of a cor-porate database. Distance-based approaches merely allow the comparison of statisticsrelated to nearest-neighbor distances within the two populations. Such statistics provideno information regarding the redundancy of compounds contained in both populations.or even the extent to which the regions covered by the populations overlap in chemistry-space.

In contrast, a cell-based approach provides an extremely rapid answer to the funda-mental, pragmatic questions at the heart of the population comparison issue: if popu-lation A and population B are alternative libraries and population X is a corporate database,

1.2.3.4.

Questions 1 and 2 are easily answered by identifying the voids in population X andhypothetically using both populations A and B to fill those voids. Similarly, questions 3

Comparing the Diversities of’ Two or More Populations

how many population-A compounds fill voids in population X?how many population-B compounds fill voids in population X?how many population-A cornpounds fill voids in population B? how many population-B compounds fill voids in population A?

351

Page 358: 3D QSAR in Drug Design. Vol2

Robert S. Pearlman and K.M Smith

and 4 can be answered simultaneously by a ‘compare diversities’ algorithm which, es-sentially, performs two find-voids and fill-in tasks at the same time. In addressing ques-tions 1 and 2, it is natural to use the chemistry-space which best represents the diversityof population X. In addressing questions 3 and 4, one might use a chemistry-space pre-viously defined for some related population or use a chemistry-space defined to bestrepresent the union of the A and B populations.

9. Summary

If properly constructed, high-dimensional (fingerprint) and low-dimensional metrics canprovide equally valid representations of chemistry-space for chemical diversity pur-poses. High-dimensional metrics offer the advantage of providing substantial detailregarding the topological aspects of molecular substructure but suffer the disadvantagethat they can be used only for distance-based algorithms for addressing the variousdiversity-related tasks encountered in pharmaceutical and agrochemical industry. Low-dimensional metrics offer the advantage of enabling the use of either distance-based orcell-based algorithms but traditional molecular descriptors are often cross-correlated,provide little or no substructural information and, thus, are poor choices for chemistry-space metrics, BCUT values constitute a novel class of molecular descriptors which notonly encode substructural topological (or topographical) information, but also encodeatom-based inforination relevant to the strength of ligand-receptor interaction.

We have presented an algorithm for choosing those low-dimensional metrics whichbest represent the diversity of a given population of compounds, an algorithm forvalidating the chosen metrics and cell-based algorithms using those metrics to addressall of the diversity-related tasks. Work is currently in progress to develop additionalmetrics and a more efficient implementation of the algorithm for choosing the best chemistry-space.

References

1. MACSS and Isis are distributed by Molecular Design Ltd. Information Systems. San Leandro, CA.2. Unity is distributed by Trips. Inc., St. Louis. MO.3. DayMenus is distributed by Daylight Chemical Information Systems. Inc., Irvine, CA.4. ChemDBS-3D is distributed by Chemical Design Ltd., Oxford. U K .5. Brown, R.D. and Martin. Y.C., Use of structure-activity data to compare structure-based clustering

methods and descriptors for use in compound selection, J. Chem. Inf. Comp. Sci., 36 (1996), 572–584. 6. Pearlman, R.S., Novel software tools for addressing chemical diversity, NetSci,

http:/www.awod.com/netsci, 2 (1996), 1.7. Pearlman. R.S., Diverse8. Burden. F.R., Molecular

9.

10.11.

12.

Solutions User’s Manual, University of Texas, Austin. TX, 1996. identification number for substructure searches, J. Chem. Inf. Comp. Sci.,

29 (1989), 225–7.Rusinko III, A. and Lipkus. A.H., unpublished results obtained at Chemical AbstractsService,Columbus OH.Pearlman, R.S. and Smith, K.M., manuscript in preparation. CONCORD was developed by R.S. Pearlman, A. Rusinko, J.M. Skell, adn R. Balducci at the University of Texas, Austin TX and is distributed by Tripos, Inc., St. Louis. MO.DiverseSolutions was developed by R.S. Pearlman and K.M. Smith at the University of Texas, Austin TX and is distributed by Tripos, Inc., St. Louis, MO.

352

Page 359: 3D QSAR in Drug Design. Vol2

Novel Software Tools for Chemical Diversity

13.14.15.

16.

17.

Weintraub, H.J.R. and Demeter, D.A., personal communication.Pearlman, R.S. and Deanda, F., manuscript in preparation.Modern Drug Data Report distributed by Molecular Design Ltd. Information Systems, San Leandro,CA.Sweet, C.S., Ulm, E.H., Gross, D.M., Vassil, T.C. and Stone, C.A., A new class of angiotensin-converting enzyme inhibitors, Nature, 288 (1980) 280-283.Suh, J.T., Skiles, J.W., Williams, B.E., Youssefyeh, R.D., Jones, H., Loev, B., Neiss, E.S., Schwab, A.,Mann, W.S., Khandwala, A., Wolf, P.S., and Weinryb, I., Angiotensin-Converting Enzyme Inhibitors.New Orally active antihypertensive (Mercaptoalkanoyl)- and [(Acylthio)alkanoyl]glycine derivatives,J. Med. Chem., 28 (1985) 57-66.Menard, P.R., Suh, J.T., Jones, H., Loev, B., Neiss, E.S., Wilde, J., Schwab, A., and Mann, W.S.,Angiotensin-Converting Enzyme Inhibitors. (Mercaptoaroyl)amino acids, J. Med. Chem., 28 (1985)

Wyvratt, M.J. and Patchett, A.A., Recent developments in the design of angiotensin-converting enzyme inhibitors, Med. Res. Rev., 5 (1985) 483-531.Karanewsky, D.S., Badia, M.C., Cushman, D.W., DeForrest, J.M., Dejneka, T.,Loots, M.J., Perri, M.G.,Petrillo, E.W., and Powell, J.R., (Phosphinyloxy)acyl amino acid inhibitors of angiotensin-convertingenzyme (ACE). 1. Discovery of (S)-1-[6-amino-2-[[hydroxy(4-phenylbutyl)phosphinyl]oxy]-l-oxohexyl- L-proline novel orally active inhibitor of ACE, J. Med. Chem., 31 (1988) 204-212.Yanagisawa, H., Ishihara, S., Ando, A., Kanazaki, T., Miyamoto, S., Koike, H., Iijima, Y., Oizumi, K., Matsushita, Y. and Hata, T., Angiotensin-Converting Enzyme Inhibitors. 2. Perhydroazepin-2-one deriv- atives. J. Med. Chem., 31 (1988) 422–428.

22. Krapcho, J., Turk, C., Cushman, D.W., Powell, J.R., DeForrest, J.M., Spitzmiller, E.R., Karanewsky,D.S., Duggan, M., Rovnyak, G., Schwwartz, J., Natarajan, S., Godfrey, J.D., Ryono, D.E., Neubeck, R.,Atwal, K.S., and Petrillo, E.W., Angiotensin-Converting Enzyme Inhibitors. Mercaptan, carboxyalkyldipeptide, and phosphinic acid inhibitors incorporating 4-substitutedprolines, J. Med. Chem., 31 (1988)1148-1160.Karanewsky, D.S., Badia, M.C., Cushman, D.W., DeForrest, J.M., Dejneka, T., Lee, V.G., Loots, M.J.,and Petrillo, E.W., (Phosphinyloxy)acyl amino acid inhibitors of angiotensin-converting enzyme. 2. Terminal amino acid analogues of (S)-1-[6-amino-2-[[hydroxy(4-phenylbutyl) phosphinyl]oxy]-1-oxohexyl]-L-proline, J. Med. Chem., 33 (1990) 1459-1469.CombinDBMaker was developed by R.S. Pearlman and E.L. Stewart at the University of Texas, AustinTX and is available from the authors.

18.

328-332.19.

20.

21.

23.

24.

353

Page 360: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank

Page 361: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSARApplications

Roberto Todeschini* and Paola Gramatica Department of Environmental Sciences, via Emanueli, 15, 1-20126 Milan, Italy

1. Introduction

Molecular descriptors represent the way chemical information, contained in the molecu-lar structure, is transformed and coded to deal with chemical, pharmacological and toxi-cological problems in quantitative structure-activity (QSAR) and structure-property(QSPR) studies. Molecular descriptors take into account different aspects of the chemi-cal information. The approach to obtaining this information can (a) be through experi-ments, theoretical calculations or simple counting operations, (b) consider the wholemolecule, fragments of it or functional groups, (c) require the knowledge of the 3Dstructure of the molecule or its molecular graph, or simply its formula, or (d) call forinformation defined by scalar values, vectors or scalar fields. In recent years. severalapproaches have been explored and many kinds of molecular descriptors have beenproposed [1-10].

Among the theoretical descriptors. the best known are molecular weight and struc-tural descriptors (1D descriptors. i.e. counting of bonds, atoms of different kinds, pres-ence or counting of functional groups and fragments, etc.), obtained from a simpleknowledge of the formula, and topological descriptors (2D descriptors). obtained fromthe knowledge of the molecular topology [5-8].

The complexity of the chemical information contained in 3D molecular structurecalls for descriptors able to also take into account properties related to a more sound andthree-dimensional representation of the molecules, WHIM (Weighted Holistic InvariantMolecular) descriptors are 3D molecular indices that represent different sources ofchemical information [9–13]. WHIM descriptors contain information about the whole3D molecular structure in terms of size, shape, symmetry and atom distribution. Theseindices are calculated from x,y,z-coordinates of a 3D structure of the molecule, usuallyfrom a spatial conformation of minimum energy, within different weighting schemes ina straightforward manner and represent a very general approach to describe moleculesin a unitary conceptual framework. These indices have already been successfully usedto search for QSAR and QSPR relationships for several classes of compounds anddifferent responses [9–16].

The WHIM descriptor approach has also been extended to treat interaction scalarfields [17]: G-WHIM (Grid-W eighted Holistic Invariant Molecular) descriptors aredefined and calculated from the coordinates of the grid-points where an interaction energy field between the molecule and a probe has been evaluated. In both WHIM ap-proaches, the chemical information contained i n the molecular structure or in the whole

*To whom correspondence should he addressed.

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design, Volume 2. 355-380© 1998 Kluwer Academic Publishers. Printed in Great Britian.

Page 362: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

grid of interaction energy values is synthesized in a few parameters that are completely invariant to rotation and translation and that represent interpretable properties of themolecules. Moreover, in the G-WHIM approach. the difficulties commonly related tochemical information spread out over a great number of grid-points and to the problem of the dependence of the results upon molecule alignment are avoided.

The interpretability of the results is quite evident and defined by the same math-ematical properties of the algorithm used for their calculation. In this review, the theoriesof both WHIM approaches are presented and some applications in the QSAR field arediscussed. A simple didactical example is also used to explain properties, characteristics and chemical meaning of the WHIM descriptors. Due to the continuous development ofthe theory itself, there is some discrepancy in the WHIM symbols used in previouspublications and those used here, the final choice is in this review.

2. Theory of WHIM Descriptors

WHIM descriptors are built in such a way that they capture relevant molecular 3D in-formation regarding molecular size, shape, symmetry and atom distribution with respectto invariant reference frames. The algorithm consists in performing a Principal Com-ponents Analysis (PCA) on the centered molecular coordinates by using a weighted co-variance matrix S obtained from different weighting schemes for the atoms. Theelements of the covariance matrix are:

(1)

where n is the number of atoms. wi the weight of the i-th atom. qij represents the j-th co-

2.1. The weighting schemes

Six different weighting schemes have been proposed: (1) the unweighted case U (wi = 1i = 1, n, where n is the number of atoms for each compound), (2) atomic massesM (wi = mi), (3) the van der Waals volumes V (wi = vdwi), (4) the Mulliken atomic

electrotopological indices of Kier and Hall S (wi = Si) [5].All the weights (1)-(5) are also scaled with respect to the carbon atom and their

values (original and scaled values) are shown in Table 1 . As all the weights must bepositive, the electrotopological indices me scaled as follows:

(2)

In this caw, only the non-hydrogen atoms are considered and the atomic charge of eachatom is dependent on its atom neighbor, following the Kier and Hall approach.

356

ordinate (j = 1,2,3) of the i-th atom and qj is the average of the j-th coordinates.

electronegativities E (wi = elni), (5) the atomic polarizabilities P (wi = poli) and (6) the

i =S' Si + 7 Si' > 0

Page 363: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSA R

Table 1

ID Atomic mass vdW volume Electronegativity Polarizability

Atomic weights and relative atomic weights used for calculation of the WHIM descriptors

M M/M(C) v W/V(C) E W/E(C) P W/P(C)

H 1.01 0.084 6.709 0.299 2.592 0.944 0.667 0.379B 10.81 0.900 17.875 0.796 2.275 0.828 3.030 1.722C 12.01 1.000 22.449 1.000 2.746 1.000 1.760 1.000N 14.01 1.166 15.599 0.695 3.194 1.163 1.100 0.625

F 19.00 1.582 9.203 0.410 4.000 1.457 0.557 0.316A1 26.98 2.246 36.511 1.626 1.714 0.624 6.800 3.864Si 28.09 2.339 31.976 1.424 2.138 0.779 5.380 3.057

S 32.07 2.670 14.429 1.088 2.957 1.077 2.000 1.648CI 35.45 2.952 23.228 1.035 3.175 1.265 2.180 1.239Fe 55.85 4.650 41.052 1.829 2.000 0.728 8.400 4.773 co 58.93 4.907 35.011 1.561 2.000 0.728 7.500 4.261Ni 58.69 4.887 17.157 0.764 2.000 0.728 6.800 3.864 C u 63.55 5.291 11.494 0.512 2.033 0.740 6.100 3.466Zn 65.39 5.445 38.351 1.708 2.223 0.8 10 7.100 4.034Br 79.90 6.653 3 1.059 1.384 3.319 1.172 3.050 1.733 s n 118.71 9.884 45.830 2.042 2.298 0.837 7.700 4.375 I 126.90 10.566 38.792 1.728 2.778 1.012 5.350 3.040

O 16.00 1.332 11.404 0.512 3.654 1.331 0.802 0.456

P 30.97 2.579 26.522 1.181 2.515 0.916 3.630 2.063

Depending on the kind of weighting scheme. different covariance matrices and differ- ent principal axes are obtained. For example, using atomic mass es as the weighting scheme. the directions of the three principal axes from PCA are the directions of the moments of inertia axes. Thus, the WHIM approach can be viewed as a generalization searching for the principal axes with respect to a defned atomic property (the weighting scheme). For each weighting scheme, a set of statistical indices is calculated on the atoms projected onto each principal component tm (m = 1,2,3), as described below. The whole procedure is shown in Fig. 1.

The invariance to translation of the calculated parameters is guaranteed from the cen- tering of the atomic coordinates and the invariance to rotation from the uniqueness of the PCA solution. It must be noted that, in the first version of the WHIM descriptors [9–12], the molecules have been centered on their baricenter (i.e. using the atom weights for the calculation of the center) and not in the center of the coordinates (i.e. firstly, centering the atomic coordinates and then calculating the weighted covariance matrix).

2.2. Directional WHIM descriptors

Directional WHIM descriptors are univariate statistical indices calculated from thescores of each individual principal component (1, 2. 3). The first group of descriptors are the eigenvalues λ 1, λ 2 and λ 3 that are related to molecular size, the second group is constituted by the eigenvalue proportions ϑ1, ϑ2 and ϑ3 that are related to molecularshape:

357

Page 364: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

Fig. 1. Flow chart of the procedure for the calculation of the WHIM descripiors.

358

Page 365: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSA R

(3)

Because of the closure condition (ϑ1 + ϑ2 + ϑ3 = 1), only two are independent. The third group of descriptors is constituted by the symmetries γ1, γ2 and γ 3 calcu-

lated from an inforination content index on the symmetry along each principal compo-nent with respect to the center of the scores:

(4)

where ns, na, and n arc, respectively, the number of central symmetric atoms (along them-th component), the number of non-symmetric atoms and the total number of atoms ofthe molecule.

Finally, the fourth group of descriptors is constituted by the inverse of the kurtosis κ1,κ2 and κ3, calculated from the fourth-order moments of the scores tm, that arc related tothe atom distribution and density around the origin and along the principal axes:

(5)

To avoid the problems related to the infinite (or very high) kurtosis values, obtainedwhen along a principal axis all the atoms are projected in the center (or near the center,i.e. leptokurtic distribution), the inverse of the kurtosis is used.

Low values of the kurtosis are obtained when the data points (i.e. the atom projec-tions) assume opposite values (–t and t) with respect to center of the scores. When an increasing number of data values are within the extreme values ± t along a principalaxis, the kurtosis value increases (i.e., κ = 1.8 for a uniform distribution of points,κ = 3.0 for a normal distribution). When the kurtosis value tends to infinity, the cor-responding value tends to zero.

Thus, the group of descriptors ηm can be related to the density of the atoms distribu-tion — i.e. to the quantity of unfilled space per projected atom — and has also beencalled emptiness: the greater the ηm values, the greater the projected unfilled space. Theηm descriptors are used in place of the kurtosis descriptors κm (previously proposed, seereference [11]).

2.3. Non-directional WHIM descriptors

The non-directional WHIM descriptors are directly derived from the directional WHIMdescriptors. Thus, for non-directional WHIM descriptors, any information related to theprincipal axes disappears and the description is related only to a global — holistic —view of the molecule.

In many cases, size descriptors can, in modelling. play a significant role independ-ently of the measured directions, allowing more simple models. Thus, in view of the

359

Page 366: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paolca Gramatica

importance of this quantity, a group of descriptors of the total molecular size is con-sidered in three different ways:

(6)

T and A are, respectively, related to linear and quadratic contributions to the total mole-

Thus the molecular size is taken into account by all three size descriptors in differentways.

For each function, six total dimensions are obtained. one for each weighting scheme.The molecular shape is represented by the following expression:

(7)

The term 4/3 is the maximum value of the numerator term and is used to scale Kbetween 0 and 1 . This expression also has a more general meaning and has beenproposed to evaluate the global con-elation in multivariate data [18].

For example. for an ideal straight molecule, both λ2 and λ3 are equal to zero andK = 1; for an ideal spherical molecule, all three eigenvalues are equal t o 1/3 and K = 0.For all planar molecules. the third eigenvalue λ3 is 0. there being no variance from themolecular plane. and K ranges between 0.5 and 1, depending on the linearity.

The K shape term definitely substitutes for the acentric factor. the former being moregeneral than the previously proposed acentric factor ω [9–12]and defined as =ϑ1 – ϑ3.

The total molecular symmetry is defined as the following:

(8)

where G is the harmonic mean of the directional symmetries. It equals 1 when the mole-cule shows a central symmetry along each axis and tends to 0 when there is a loss ofsymmetry along at least one axis. Different symmetry values are obtained only whenunitary, mass and electrotopological weights are used: for this reason, only three kindsof symmetry parameters are retained; Gu, Gm and Gs.

The total molecular density is represented by the following expression:

(9)

The molecular descriptors defined above, due to their invariance to rotation and transla-tion, are generalized molecular properties within each weighting scheme. Thus, for eachweighting scheme, a total of 11 molecular directional descriptors (ϑ 3 was eliminated),

360

cular dimension; V , that contains also the third-order term, is the complete expression.

D = η1 + η2 + η3

Page 367: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSAR

Weighted Holistic Invariant Molecular (WHIM) descriptors, can be obtained from eachmoleculdr geometry

The total number of directional WHIM descriptors is 66.

scheme:For planar compounds. the modelling requires only 8 descriptors for each weighting

As the molecular weight MW is the sum of the atomic weights of the molecule, similar additive quantities can be analogously defined from the other weights used for the cal-culation of the WHIM descriptors. Thus. the sum of the van der Waals volumes (Sv ),

of the electrotopological charges (Ss),together with the molecular weight. can be addedto the 33 non-directional WHIM. The sum of the unweighted atoms (Su) corresponds tothe total number of a t o m (NAT), but is not here considered among the WHIM descrip-tors. All these descriptors are independent of conformational (and configurational)geometries, but it is interesting to observe that Sv and Sp represent spatial additiveproperties, while MW represents an inertial additive property. As a whole, the non-directional WHIM are 5 for each weighting scheme (6): T, A, V, K, D, plus Gu, Gm, Gs.Sv, Se, Sp and Ss, giving a total number of 37 descriptors.

2.4.

To understand and explain the chemical meaning of WHIM descriptors, a dataset of 40simple but heterogeneous molecules is used. The list of the 40 molecules is shown inTable 2 and the WHIM values can be obtained on request. Once calculated, the WHIMdescriptors can be plotted, forming simple scatterplots that give a deeper insight into thedescriptor chemical meaning.

In Fig. 2, the scatterplot of the 40 compounds is obtained from the variables Vm(size) and Km (shape) — i.e. descriptors calculated using the atomic masses as atomweights. The arrow A represents the increase i n size and linearity (Km 0.7-0.9) of thefive linear alkanes (1-5). Ethane (1) is more linear (higher km value than propane dueto the absence of any central Csp3 carbon atom contribution along the second compo-nent. The same increase in size and linearity can be observed for cycloalkanes (11–15,arrow B), but. as expected, at a lower level of linearity (Km 0.3-0.4). The arrows Cand D represent the increase i n size and linearity of the halo-substituted benzencs(27-30, arrow C) and of the condensed benzenes (21, 39, 40, arrow D), respectively.The remaining mono-substituted benzenes are at intermediate positions. Arrow E high-lights the large change in shape from neopentane (7 , spherical shape) to2-butyne (10, near-linear shape); note also the difference in shape between the two2-butene isomers (8, cis and 9, trans) in this same direction. The alcohols (16–20) are

The meaningof the WHIM descriptors

361

the sum of the electronegativities (Se), the sum of the polarizabilities (Sp) and the sum

Page 368: 3D QSAR in Drug Design. Vol2

Roherto Todeschini and paola Graniatica

Table 2 List of the 40 c ompounds used in the example

ID Compound ID Compound

1 Ethane 21 Benzene 2 Propane 22 Toluene 3 n-butane 23 Phenol4 n-pentane 24 Benzoic acid5 n-hexane 25 Aniline 6 Isobutane 26 Nitrobenzene 7 Neopentane 27 F-benzene 8 cis-2-butene 28 CI-benzene9 trans-2-butene 29 Rr-benzene

10 2-butyne 30 I-benzene11 Cyelopropane 31 2-propanone12 Cyclobutane 32 2-propanol

13 Cyclopentane 33 2-propylamine

15 Cyclohexanone 35 2-iodopropane 16 Methanol 36 2-propanethiol 17 Ethanol 37 Methylamine 18 Trifluoroethmol 38 Dimethylamine 19 2-aminoethanol 39 Naphthalene 20 Propano 40 Anthracene

14 Cyclohexane 34 2-fluoropropane

approximately aligned like the n-alkanes, the exception being the trifluoroethanol (18) because the masses of the three fluorine atoms substituting the hydrogen atoms give a tridimensional isotropic contribution. resulting in an increase in size and a significant decrease in linearity. Due to the iodine mass. increased linearity is observed for the 2-iodopropane (35), compared to the other 2-substituted propane isomers (31–36),

As shown in Figs. 3 and 4 of reference [13]. other interesting information about the molecular structure can be obtained by plotting, for example, shape (K) versus sym-metry (G) descriptors, or densities descriptors weighted from masses and electro-topological charges (Dm versus Ds). As shown in Figs. 6–9 of reference [13]. a principalcomponent analysis of the 33 non-directional WHIM descriptors shows as the different sources of WHIM information are separated. The first six components explain the major part or the total variance (97%). where the first four represent individually molecularsize. shape, symmetry anti density. while the last two components arc mainly related to single density descriptors. Dm and Ds. respectively.

3. Theory of G-WHIM descriptors

The algorithm proposed for the WHIM approach can be applied to any discretized image: in particular. this algorithm can be extended to deal with 3D grid-points. In such a case. the grid-points substitute for the atomic coordinates of a molecule and defined electronic or steric properties are used as weights. The descriptors derived from this ap- proach are called Grid- eighted Holistic Invariant Molecular descriptors (G-WHIM).

362

Page 369: 3D QSAR in Drug Design. Vol2

Fig

. 2.

Scat

terp

lot o

f Vm

(si

ze.)

and

Km

(sh

ape)

des

crip

tors

for

the

40 m

olec

ules

of T

able

2.

363

Page 370: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

The theory of the G-WHIM descriptors has been presented in reference [17]. In thisreview, a summary of its main elements is newly presented, updating symbols and de-scriptors on the base of the state-of-art of the WHIM theory presented i n this chapter.

In principle, if a molecule is placed into an infinite, isotropic and evenly very densegrid, the scalar field F3 calculated at the grid-points must contain the same information,independent of the molecule's orientation and only depending on the potential energy ofthe selected probe and the mathematical functions representing the interaction. Thus F3

contains the whole information about the interaction properties of the molecule. In prac-tice. this ideal situation cannot be obtained. but i t can be simulated by plunging the mol-ecule into a finite grid N3: the aim is to represent the theoretical F3 scalar field by a finitesampling of this field.

Molecule position, grid dimensions and spacing between grid-points are crucialaspects to be defined in order to ensure that the obtained field is representative of theideal one. In the simplest approach. the molecule must be placed in an isotropic grid —i.e. the obtained scalar field N 3 is constituted of a finite number of points evenly distrib-uted along the three dimensions. When a class of similar compounds is considered. theisotropic grid requirement can be relaxed and a grid of n1 X n2 X n3 points can be usedif all the molecules are also oriented using a selected criterion. However, when the con-pared molecules are different in shape and size. this last approach to the grid definitioncannot be considered because the scalar fields would then he evaluated with differentsampling sizes in the three directions: thus, the sesulting description is no longer uniqueand the invariance to rotation is not preserved. In any case. in order to avoid field dis-tortions due to the truncation of the non-zero interaction values at the neighborhoods ofthe finite grid. the molecule must be centered i n the grid.

Grid dimensions depend on the field selected as each field shows a different analy- tical dependence from the molecule-probe distance (r). For example. when the non-bonding and electrostatic energy expressions are analyzed. the interaction energiesdecay with minus six and minus one power of r, respectively. In the first case. a gridvery close to the van der Waals surface of the molecule is sufficient to contain the mostinformative points. In the second case, where the electrostatic energy decays veryslowly with r,very large grids are required. To overcome this problem, an energy cutoffcriterion can be proposed. for which only electrostatic energy values relevant for theconsidered interaction (e.g. long-range or chemical interaction) are taken into account.In this way, points far from the molecule and not contributing to the interaction are notincluded in the calculations.

In spite of the centering of the molecule. the selection of an isotropic grid and theenergy cutoff definition, the invariance to rotation is, again, not preserved if thegrid-points are not dense enough. This is a very important problem since a too sparse distribution of grid-points represents an inadequate sampling of the ideal scalar field andis not able to guarantee that the calculated scalar field is representative of the ideal scalar field in such a way as to preserve rotational invariance. Preliminary calculations of the grid step can be used to select the optimal step.

Once the optimal choices for the grid arc selected. the G-WHIM descriptors are usedto condense the whole information contained in the scalar field into a few global para- meters, whose values are independent of the molecular orientation within the grid.

364

Page 371: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSAR

For each molecule, the G-WHIM descriptors are calculated by the following steps:

1.2.3.

The molecule is freely and separately imbedded i n the center of the grid.The scalar field is calculated by using the selected probe.The scalar field values are used as weights for the grid-point cooredinated: this is the main difference between G-WHIM and WHIM descriptors. In fact, in the lattercase. the coordinates are the spatial atomic coordinates, each weighted by one ofthe six different kinds of weights defined above: unitary weights U, atomic massesM, van der Waals volumes V, atomic electronegativities E, polarizabilities P, andelectrotopological charges S .Finally, the G-WHIM descriptors are calculated in the same way as lor the WHIMdescriptors — that is, by the calculation of a weighted covariance matrix, principalcomponent analysis and the calculation of statistical parameters on the projectedpoints along each principal component (i.e. on the score values).

4.

Point 3, above, deserves some further consideration. Firstly, it should be noted that onlypoints with non-zero interaction energy are effective i n the computation of the descrip- tors. Secondly, when the calculated interactions give both positive and negative values.the scalar field values cannot be used directly i n this form as statistical weights, whichmust be always semi-positively defined. In this case. the scalar field values are dividedin two blocks: a grid-negative (positive) block containing the grid coordinates associ-ated with negative (positive) interaction values. getting their absolute values and settingthe positive (negative) values to zero.

By this assumption, two sets of G-WHIM descriptors are obtained: the first describ- ing the positive part of the molecular field, and the latter describing the negative one.Thus, for each weighting block (positive (+) and negative (-)), the G-WHIM descriptorsconsist of 8 directional plus 5 non-directional molecular descriptors (26 for a completedescription of each interaction field), calculated from each molecule (point d):

The directional y and the noli-directional G parameters, defined for WHIM descriptorsand containing information about the molecular symmetry, are not considered in theframe of the G-WHIM approach because their meaning becomes doubtful, dependingheavily upon the point sampling. However, the information regarding the molecularsymmetry can be obtained by directly using the WHIM symmetry parameter.

The meaning of the G-WHIM descriptors is that previously defined lor the WHIMdescriptors, but now the descriptors are referred to the interaction field instead of themolecule. For example, the eigenvalues λ1, λ 2 and λ3 will be related to the interactionfield size; the eigenvalue proportions ϑ 1, ϑ2 and ϑ 3 will be related to the interactionfield shape; the group of descriptors constituted by the inverse function of the kurtosis(k), i.e., ηm = 1/κm will be related to the interaction field density along each axis. Moreover, global information about the interaction field is obtained from the non-direc-tional WHIM (T, A. V, K, D), with the meaning previously defined.

In the first chapter [17], the acentric factor has been used as shape descriptor —defined as w = ϑ 1 – ϑ3, ranging between zero (spherical interaction field) and one

365

Page 372: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

(linear interaction field): moreover, among the non-directional WHIM, only the eigen- value sum T has been previously used.

3.1. Grid and points setting

The optimal grid size and point density must be evaluated by preliminary calculationson the largest molecule (or better, on the molecule with the broadest molecular field ofthe selected molecule set. The behavior of the G-WHIM descriptors with variation inthe grid size and the point sharpness has been tested on a simple molecule such as thechlorobenzene. using the electrostatic potential to represent a very broad interaction.

To do this, a number of assumptions were made. The first assumption is to evaluatethe chemical information due to the interaction outside the van der Waals surface: the field values in the inner part of the molecule have not been considered. The second as-sumption is to define different energy cutoffs: three values corresponding to -2, -5 and-10 kcal/mol have been selected for this case. The higher the cutoff value, the smallerthe considered region around the molecule (i.e. the total number of non-zero weightedgrid-points).

After preliminary calculations, the grid size was fixed as a 20 A edged cube in order to ensure the inclusion of the smallest cutoff surface. The steps defining the field points

(3 cutoffs X 5 steps), the G-WHIM descriptors were calculated: the values obtained arereported in table 1 of reference [17].

Figure 3 shows the trend of T (the eigenvalue sum) as a function of the step for eachcutoff value; all the other descriptors show similar behavior. As can be seen, the T para-meter for each cutoff value tends to become constant as the step decreases, convergingtowards the most reliable value, corresponding to the smallest step (0. 1 i n this case).A single calculation at step 0.05 and cutoff -5 kcal/mol confirms the obtained con-vergence.

The step for which the parameter values show only small differences represents theoptimal step for which N 3 F3; this also means that using smaller steps docs notchange the G-WHIM parameter values. For cutoff values very close to the minimumenergy value ( in this case. lower than –10 kcal/mol), the sampling of the interactionfield can be unreliable because too few points are considered to represent the greatenergy variability of the field. The parameters obtained for each cutoff value are differ-ent. representing three different chemical situations, as can be easily observed from theT values: a relatively long-range interaction (-2 kcal/mol) shows a higher T value(greater total field dimension). while a relatively short-range interaction (-10 kcal/mol)shows a lower T value (smaller total field dimension).

3.2. Invariance to rotation

The invariance to rotation is a point of utmost importance in order to avoid problemstypical of other QSAR strategies. such as molecular alignment. To check the invarianceto rotation, the electrostatic potential of chlorobenzene was used, as above. The follow-

366

were fixed at 0.10. 0.25. 0.50. 0.75 and 1 Å, respectively. For each of the 15 cases

Page 373: 3D QSAR in Drug Design. Vol2

Fig

. 3. V

alue

s of

the

eige

nval

ue s

um (

T, s

ize)

for

diff

eren

t ste

p si

zes

calc

ulat

ed fo

r th

ree

cuto

ff v

alue

s.

367

Page 374: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

ing calculations were performed: the G-WHIM parameters were calculated for the samecubic isotropic grid for 5 different steps (0.1, 0.25, 0.5, 0.75 and 1.0 Å, respectively)and for 21 different orientations of a centered molecule.

Figure 4 shows the values of the eigenvalue sum (T) for the 21 molecule rotations, fordifferent step sizes. For this case, note the stabilization due to reduction of step size, thevalue of the considered descriptor clearly shows good invariance with respect to rota-tions at step = 0.5: for all the descriptors, the standard deviations are less than 0.075.

The values of the standard deviations of each descriptor, calculated from the descrip-tor values obtained for the 21 molecule rotations, decrease as the step size decreases,confirming the expected trend observed above. The 3 eigenvalues (λ) and their sum (T ) ,represented by numbers not limited to between 0 and 1 (as for other descriptors), showstandard deviations higher, on average, than those of the other descriptors.

Thus, as expected from the theory, for a representative sampling of the interactionfield — i.e. when the step size is small enough — G-WHIM parameters independent ofthe molecular orientation within the grid space can be obtained. Similar behavior (oreven better) is shown by all the other G-WHIM descriptors. Moreover, the results ob-tained in the case of chlorobenzene have also been confirmed in the case where phenol was used as the test molecule and 4 different cutoff values (0.5, 1.0, 1.5 and 2.0 Å) wereused.

3.3.

From the theoretical point of view, the G-WHIM approach to QSAR problems appearsvery promising, integrating the information contained in the WHIM descriptors andovercoming any problems due to the alignment of the different molecules and the ex- plosion of variables arising from traditional grid approaches. In particular, the G-WHIMapproach can take into account both all the points within the cutoff values, excludingonly the positive interactions within the inner part of the molecule and the surfacepoints at a cutoff value — i.e. the points on the iso-interaction-energy surfaces.

The ability to define different molecular interaction regions. taking into account theindividual parameters provided from different cutoff values, is undoubtedly a fas-cinating possibility which may lead to a deeper chemical insight into molecularinteractions and properties.

4. Descriptors

The WHIM descriptors defined above can be used separately in modelling: the set of 37non-directional WHIM and the set of 66 directional WHIM. For the sake of comparison, other sets of theoretical descriptors are sometimes used, either as a single set or jointlywith WHIM descriptors. In particular, the software produced by our research group(WHIM-3D/QSAR, see below) calculates two other sets of descriptors: the first is con-stituted by the so-called structural descriptors (30) — i.e. the number of different kindsof atoms (e.g. nH, nC, nF and nX are the number of hydrogens, carbons, fluorines, halo-gens, respectively), the number of bonds (nBO), the number of some functional groups

Perspectives of the G- WHIM approach

368

Page 375: 3D QSAR in Drug Design. Vol2

Fig

. 4.

Val

ues

of th

e ei

genv

alue

sum

(T

, siz

e) fo

r 21

dif

fere

nt r

otat

ions

of t

he te

st m

olec

ule

(chl

orob

enze

ne),

for

diff

eren

t ste

p si

zes

369

Page 376: 3D QSAR in Drug Design. Vol2

Roberto Todeschiniand Paola Gramatica

(e.g. nOH, nCO and nNH2 are the number of OH, CO and NH2 groups, respectively)and the number of rings of different size (nR03, .. ., nR I O are the number of' 3-memberedto 10-membered rings). Moreover, the number of atom acceptors and donors of H-bonds(nHA and nHD) are also considered. The second set is constituted by the more fre-quently used 34 topological descriptors (information and connectivity indices), as defined in reference [19].

To all the sets of descriptors, the molecular weight (MW) has also been added. The G-WHIM descriptors, as defined above, have been used in some cases. The descriptors pro-vided from theories, like topological or WHIM, are numerous and are usually internallyCorrelated. Chemometric strategies that use variable subsets selection procedures (e.g.genetic algorithms or simulated annealing) and model validation techniques play a key rolein obtaining predictive and stable models in QSAR studies (see section 5 on Methods).

5. Methods

The minimum energy conformations of all the compounds were obtained by the molec-ular mechanics method of Allinger (MM2). using the package HyperChem [20]; WHIMdescriptors were calculated from the obtained coordinates using our package WHIM-3D/QSAR for WINDOWS/PC (now available on request [21]). This package has beenextended to calculate also structural and topological indices. A special version has alsobeen extended to the calculation of the G-WHIM descriptors, reading the grid co-ordinates and the grid-point property values for each molecule. Principal ComponentAnalysis (PCA) has been performed by STATISTICA [22]. The HyperChem/Chempluspackage [20] has been also used to calculate some simple physico-chemical properties.

The selection of the best subset variables (variable subset selection, VSS method) formodelling the selected properties was through the genetic algorithm (GA-VSS)approach [23] where the response is obtained by ordinary least-squares regression(OLS), using our package Moby Digs for variable selection for WINDOWS/PC [24].

All calculations were performed by Moby Digs using the leave-one-out procedure ofcross-validation. maximizing the cross-validated R-squared (Q2) In many cases, moredemanding validation procedures were also used and always when models with highlycorrelated variables had been selecled by GA-VSS. In these cases, when possible, anexhaustive leave-more-out procedure - i.e. leaving out g objects in all the possible ways — was used; otherwise. a random selection of g objects to be left out was repeatedsome thousands of ways. For the obtained models. all variables are highly significantwithin the 95% confidence level.

6. QSAR applications

In the gaining of confidence with the WHIM approach to QSAR problems, a number ofdatasets have been used and discussed in previous papers [9–12,14], testing the predic- tion capabilities for several responses. either physico-chemical properties or biological/toxicological activity responses.

Owing to the importance of the surface of a molecule for the determination of variousphysical, thermodynamic, toxicological, biological and transport properties, the WHIM

370

Page 377: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSAR

descriptors were successfully applied [9] to calculate the total surface area (TSA) of 101heterogeneous compounds. On a small and homogeneous class of contaminants, 12chlorobenzenes that are widely distributed throughout the environment, properties ofvery different kinds have been modelled, such as melting and boiling points, solubility,hydrophobicity, bioconcentration factor, toxicity on algae and Microtox. In all cases,high prediction powers resulted: as is already known, size parameters play the main modelling role for all these properties but WHIM descriptors of molecular shape, atomicdistribution and entropic contribution from symmetry add useful information [14].Similar results were obtained in the modelling of chlorophenol toxicity on sevendifferent biosensors [12].

aromatic hydrocarbons

as melting point, boiling point and hydrophobicity have been successfully modelled[10], with good prediction powers. A study was made of highly heterogeneous com- pounds [11], belonging to the EEC priority list of dangerous chemicals (amines, chlorobenzenes, organotin and organophosphorous compounds; 49 compounds) with different toxic actions: the toxicity on Daphnia magna and hydrophobicity wasmodelled, leading also to general models and predictions for external compounds.WHIM descriptors used with topological indices were the subject of a recent work [16]on an extended class of 118 environmental priority chemicals dangerous to the aquatic environment, selected by the European Union according to the directive 76/464/EECand included in the so-called ‘List 1 ’. In particular, physico-chemical properties(melting point, boiling point, density, refraction index, solubility and hydrophobicity)and toxicological properties (algae, bacteria, Daphnia, fish and mammals) weremodelled with mixed approaches (WHIM and other sets of descriptors). In the reportedpaper [16], a new hydrophilicity index (Hy), showing good modelling power, was alsodefined.

A comparison between WHIM and topological descriptors was recently performed[15] on a wide class of haloaromatic compounds (73 halobenzenes and 89 halo-toluenes). In all cases, the WHIM descriptors gave better predictive results in modellingmelting, boiling and flash points and density.

An extensive comparison of the latest developments of the WHIM descriptors withother different QSAR approaches has been performed on the datasets reported above.With regard to the G-WHIM descriptors, an application of a raw modification [25] ofthe original G-WHIM version [17] has been applied to the classical dataset of 31steroids studied by Cramer et al. [26]. Despite some G-WHIM descriptors having beencalculated roughly, the results are encouraging compared with the original Cramercalculations and later CoMFA refinements.

The G-WHIM approach has been improved i n analogy with the WHIM approach andhas been further tested for the toxicity of 14 dioxins.

6.1. Chlorophenols

On a large dataset of environmentally relevant polycycle

In a previous paper [12], an estensive study was made of chlorophenol toxicities,asessed by different biosensors (Microtox, bacteria, Daphnia, algae, guppy, flounder

371

(PAH, 82 compounds). some important, wide-range, physico-chemical properties such

Page 378: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

and SMP), with good results: Q² between 95% and 97% for almost all the responses. Inthe same paper, logKow was also modelled with size descriptors such as MW, Tu and Te

= 94.5 and R² = 96.1): however, this model significantly decreases its predictive(Q²power (Q² = 80.8) when cross-validated with a perturbation of 30%. Instead, a satisfac-tory and stable two-variable model (Vu and Ve size descriptors) was recalculated fromthe complete set of non-directional WHIM descriptors (Q² = 94.1 and R² = 95.9). Thismodel is perfectly stable when 30% of the objects is left out (Q² = 93.3), in spite of thegreat correlation between Vu and Ve.

A local property like pKa, difficulty modelled by global descriptors, is instead bettermodelled by directional WHIM descriptors with Q² ranging from 76% and 78% for thethree-, four- and five-variable best models. In these models, the most selected variablesare those related to directional size, atomic density and symmetry, which in some wayrepresent the relative position or the substituents responsible of the phenol groupacidity. For example, the three-variable model, constituted by λ2v (size). η1m (density) and γ 2v (symmetry), is:

(10)

The G-WHIM approach has also been applied to these compounds to verify, for the firsttime, the applicability of these new descriptors [17]. In this case, the WHIM andG-WHIM descriptors were successfully used jointly to improve on the results obtainedusing only WHIM descriptors, on logKow, pKa, toxicity on flounder and melting point.

6.2. N,N-dimethyl-2-Br-phenethylamines

To predict the antagonism of 22 N,N-dimethyl-2-Br-phenethylamines to epinephrine inthe rat (log 1/ED50), a comparison between Hansch and Free-Wilson approaches wasperformed by Kubinyi [ 27]. A more extended comparison, using also topological andWHIM descriptors, has been performed and several models have been evaluated byusing the last WHIM descriptor version [14] and comparing the results with those of theCoMFA approach [28]. A first set of models, based on substituent descriptors, has beenproposed by Unger and Hansch [29]. In particular, a two-variable model (ID = 10) isconstituted by the variables σ+ and π (Hammett electronic and hydrophobic constants,respectively). A three-variable extension of this model (ID = 7) corresponds to theregression model:

(11)

where rp is the van der Waals radius from the para position of the substituent. TheGA-VSS approach applied to the whole set of Hansch descriptors confirms these twomodels.

The Free-Wilson approach (ID = 14), based on 10 binary descriptors (2 sites X 5 sub-stituents), has also been used [30]. In spite of its good fitting performance, a check ofthe predictive power of this model gives a prediction power of zero! This result is not

372

n = 20 p = 3 Q2 = 75.7 R2 = 82.7

log(1/ ED50) = 7.058 – 1.015σ + + 0.621rp + 0.828π

Page 379: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSAR

unusual for the Free-Wilson approach when predictive performances are searched forand it is not used for sorting group contributions.

Other groups of descriptors have been applied to this dataset: structural, topologicalindices, directional and non-directional WHIM. used either separately or jointly. Usingthe topological descriptors, as defined in reference [19], the GA-VSS approach Ied to athree-variable regression model (ID = 9):

(12)

where MW, AAC and ICEN are the molecular weight, the average atomic compositionindex and the centric index, respectively.Using the non-directional WHIM descriptors, a three-variable model, based on size andshape descriptors, is obtained (ID = 4):

(13)

Using the directional WHIM descriptors, the results concerning the best three- and four-variable models are (ID = 5 and ID = 3):

(14)

(15)

Finally, structural descriptors (only 11 are not constant for this dataset) have also beentried: the best model (ID = 13) is a three-variable model with MW, NAT (total numberof atoms) and nC1 (numberof chlorine atoms), giving a Q² = 62.2.

For both topological and WHIM models, the dependence of the response on size(MW, AAC, Tv, Te or λ1e) and shape (ICEN, Ks or ϑ1s) is confirmed. For the WHIMmodels, the presence of descriptors depending on electronegativity (e) and electrotopo-logical (s) weights highlights the importance of electronic parameters, in accordancewith σ+ of the Hansch model, in which the importance of molecular size is also con-firmed (π, rp) However, molecular weight alone is unable to give a predictive model(Q² = 8.0, R² = 25. 1) , as can be expected for structural isomers with different values ofbiological activity.

Mixed models have also been tried by using jointly structural, topological and bothsets of WHIM descriptors. In this case. the sequence of the best models was calculated from one- to four-variable models (ID = 12. 6. 2.1). Table 3 shows the more interestingmodels obtained from the different approaches, sorted on the Q² values.

The descriptors 1K and 2K are the topological shape indices of first- and second-order as defined by Kier [31,32]; the remaining descriptors are directional and non-directional WHIM.

As can be observed, a good model (ID = 12) is already obtained with the global di-mensional variable Tv; a significant improvement is obtained by adding λ3p (ID = 6),

373

Page 380: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

Table 3 22 variously substituted N,N-dimethyl-2-Br-phenethylamines

Comparison of the best models obtained by different QSAR approaches on the biological activity of

ª Number of significant principal components.

which takes into account out-of-plane substituent contributions. Alternatively, the out-of-plane information can be represented by shape descriptors (Ks or ϑ1s).

For these compounds. the CoMFA approach was also applied using steric and elec-tronic fields. The common substructure features of these compounds greatly reduce alignment problems: molecular superpositions were performed by minimizing the rmsdistance between all the common heavy atoms of the considered compounds. CoMFA was carried out using the QSAR option of SYBYL 6.2 [33]. The steric and electrostaticprobe-ligand interaction energies were calculated with Lennard-Jones and Coulomb po-tential functions within the Tripos force field, using a Csp³ probe atom with a charge of+1. The dimension of the CoMFA lattice was determined through the provided auto- matic procedure in order to extend the lattice walls beyond the dimensions of eachstructure by 4.0 Å in all directions. The lattice spacing established was 2.0 Å The steric energies were truncated at 30 kcal/mol and the electrostatic ones dropped to within the steric cutoff for each molecule. The regression model (ID = 11) was obtained using PLS, as implemented in SYBYL 6.2 and validated with the leave-one-out procedure. The optimal number of cross-validated PLS components is two and the corresponding prediction power is Q² = 80.5, a value comparable with the simplest WHIM modelbased only on the size descriptor Tv (Q² = 79.4).

6.3. Dioxins and analogs

On a dataset of 73 polyhalogenated aryl derivatives [34], WHIM models were searched for 71 pRB and 69 pAHH responses. The biological response pRB is pRB = –log

374

Page 381: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSAR

Comparison of the best models obtained by different QSAR approaches on the pRB and pAHH Table 4 biological responses of dioxin analogues

Response (approach) Size Q² R² Model descriptions

pRB (WHIM) 3 81.9 83.5 Tm Ts Vu

pRB (MTD) 1 n.a. 70.9 MTD

pAHH (WHIM) 4 66.8 71.8 Se Te De Vu

pAHH (WHIM) 3 64.3 68.0 Tm Vs Du

pAHH (WHIM) 2 62.7 66.6 Se Te

pAHH (MTD) 1 n.a. 66.4 MTD

n.a.; not available.

EC50(RB), where EC50(RB) values are the in vitro rat hepatic cytosolic Ah receptorbinding affinities. The pAHH response is pAHH = –logEC50(AHH), where EC50(AHH)values correspond to the in vitro induction of aryl hydrocarbon hydroxylase (AHH).

A close dependence of the responses on molecular size seems strongly suggestedboth by the WHIM models [14] and the models obtained with Minimal Topology Difference (MTD) approach [34], which is also related to the size of a molecule (Table 4). For both responses. models using topological descriptors, as defined in reference [19].were also tried, but the results obtained were unacceptable.

These 72 compounds were also used in a preliminary study of the relationships between WHIM descriptors and properties calculated using some well-known models largely used in commercial packages. For this purpose, the HyperChem/Chempluspackage [20] was used to calculate some simple physico-chemical responses: total surface area (TSA. [35]), molar volume (Vm. [35]). molar refraction (MR, [36]). polar-izability (Pol, [37]) and octanol–water partition coefficient (logKow or ClogP, [36]). Thecompounds’ properties were calculated in the Chemplus approaches on the same opti-mized molecular structures used for the calculation of the WHIM descriptors. Table 5 collects Q² values (leave-one-out procedure) for some selected models (common to thefive studied properties).

The results arc quite surprising, showing very high prediction powers for all the con-sidered properties by one- and two-variable WHIM models. Moreover, all the properties

Table 5 calculated physico-chemical properties of 73 dioxin analogues

Q² values (leave-one-out procedure) of the selected non-directional WHIM models for five

Size Variables TSA Vm MR Pol ClogP

1 Sp 97.7 99.2 99.2 97.1 84.3

1 Sv 97.0 98.9 98.8 97.5 80.3

2 Sp Ts 98.6 99.5 99.2 97.6 84.9

2 Sv Tu 97.1 99.1 99.4 99.3 95.4

2 Sv Dv 98.6 99.4 98.8 97.7 88.9

375

Page 382: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

(with the exception of ClogP, in part) are ultimately modelled by a small number ofsimilar-size WHIM descriptors. highlighting a genera1 (and expected) correlation betweenthese properties and size parameters.

If the quality of the obtained regression models will be confirmed for more heteroge- neous compounds. at least the first four properties could be predicted by simple global models using the WHIM approach. avoiding problems due to limited parameterizations or time-consuming algorithms.

From this dioxin analogue dataset, tlie subset of 14 PolyChloroDibenzoDioxins(PCDD), for which the toxicity measured as Ah receptor binding affinity (pRB) isknown, has also been studied separately. Regression models have been calculated usingdifferent QSAR appi-oaches and different sets of descriptors: the Free-Wilson approach [30], the topological indices [19], the values of Molecular electrostatic potential (MEP)at selected points [38] and the WHIM and G-WHIM descriptors recalculated in theversion proposed here. The results can be seen in Table 6.

The three topological indices (BAL, IDM and WIA) are the Balaban index. the infor- mation index on the magnitude of distances and the average Wiener index, respectively. The model based on MEP was obtained on the basis of a variable selection performed on a few MEP values chosen in the 3D distribution of this property and used as molecular descriptors.

For the two WHIM models of Table 6, a more demanding validation procedure(20% of perturbation, 3 objects left out. exhaustive calculations) gives Q² = 89.8 andQ² = 84.2, respectively. For these compounds, the G-WHIM approach has also beenused [39]. The niolecular electrostatic potential was calculated in all the grid-pointsbetween tlie van der Waals surface and a predefined iso-potential surface (i.e. a thresh- old surface). The selected threshold values are –4 and +10 kcal/mol for the negative and positive parts of the potential distribution, respectively.

26 G-WHIM descriptors were used. 13 for each part of the potential distribution and the two best models found by the genetic algorithms are reported in Table 6. The prediction powers of these models are exceptionally high; moreover, an exhaustive validation (20% of perturbation, i.e. 3 objects left out at each step) was also performed. In both cases. the obtained leave-more-out Q² values remain high: Q² = 98.2 and

Table 6 polychlorinated dibenzodioxins (PCDD)

Comparison of models obtained by different QSAR opproaches on the toxicity of 14

376

Page 383: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSAR

Q² = 95.7, respectively. From the analysis of the MEP distributions of the modelvariables and of their standardized regression coefficients. high binding affinity is deter-mined by a great global extension of the negative distribution (T(-)), with a smallportion of ‘empty space’ between the separated negative distributions (D(-) and η1(–))

extension of the positive distribution out of the molecular plane (λ3(+)).The G-WHIM descriptors efficiently condense/extract the information contained in

MEP distributions leading to good quantitative models of the relationships betweenelectrostatic propertics and binding affinity pRB of the PCDD. These 14 PCCDs,included i n a dataset of 25 polyhalogenated dioxins, have also been studied by Wallerwith the CoMFA approach [40]. In this case, the result was:

(16)

7. Conclusion

The complexity of chemical phenomena today calls for new and complementary termsto explain such phenomena. In fact, one must be aware that, for example, the biologicalactivity of a molecule, as well as several physico-chemical properties, are, in manycases, the result of complex interactions between the molecule and its chemical, phys-ical and biological medium. Because of this involved complexity, high prediction cap-abilities cannot usually be expected from very simple relationships between theresponse and only a few independent descriptors.

WHIM descriptors, obtained from different weighting schemes, can be viewed as anadaptive descriptor space, containing both global and directional information (spreadalong orthogonal axes) that, in many cases, seems able to capture this complexity. Inany case, WHIM descriptors give a holistic representation of the molecule, whereas re- ductionistic approaches (chemical interpretation in terms of local properties, functionalgroups, additive schemes based on molecular fragments or on atomic types, etc.) arerather limited. These descriptors are the first 3D molecular descriptors that are invariant to rotation and translation, thus avoiding any molecule alignment problem. The algo-rithms for their calculations are very simple and are not time-demanding. Unliketopological descriptors. WHIM descriptors are able to distinguish different con-formations of the same molecule and, obviously, different geometric isomers.

Their interpretability is discussed in this chapter and their meaning in relation to 3Dstructural characteristics of molecules is highlighted, although a deeper insight is stillneeded to better understand their meaning and correlation with several physico- chemical properties. For example, the peculiar role of the symmetry descriptors (Gw) inmodelling the melting point is always confirmed for all the studied cases.

As highlighted in the studied cases, the high modelling power of WHIM descriptors toface QSAR problems has been confirmed, although the modelling power of directionalWHIM descriptors needs further investigation for cases where they may be of ad-vantage for the correlation of responses with local properties.

The G-WHIM approach appears to be a powerful tool for the future of QSAR studies— in particular, in the pharmacological field. In fact, it overcomes problems due to the

377

and the spherical shape of the positive distribution (K(+)), together with a little

n = 25 Q2 = 71.5 R2 = 91.9

Page 384: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

alignment of different molecules and to the explosion of variables arising fromtraditional grid approaches, based on interaction energy fields.

Finally, encouraging signs of the wide applicability and the modelling power of theWHIM approaches are constantly corning from the QSAR studies of our researchgroup. In fact, studies on the gas-chromatographic relative retention time, physico-chemical properties such as Henry’s constant, total surface area, melting point, solubil-ity, aqueous activity coefficients and hydrophobicity of 209 PolyChloroBiphenyls(PCB) are in progress with good preliminary results [41]. Research on phytotoxicity of

tives (WQO) [42], atmospheric reactivity (with OH*, NO3 and O3 ) of several organic chemicals released into the environment [43], environmental behavior ofChloroFluoroCarbons (CFC) and HydrogenChloroFluoroCarbons (HCFC) [44] andbiodegradability of organic pollutants [45] have given encouraging preliminary pre-dictive models. Moreover, the WHIM approach has also been applied to the classi-fication of 152 organic solvents. As a result of this, a new general classification basedon multivariate analysis has been proposed [46].

Acknowledgements

We wish to thank, for their contributions to this review: Drs. Laura Bonati, Ugo Cosentino, Marina Lasagni, Giorgio Moro and Professor Demetrio Pitea (Department ofPhysical Chemistry. University of Milan, Italy), Drs. Alessandro Maiocchi (BRACCOspa, Milan, Italy) and Natalia Navas (University of Granada, Spain).

References

1.2.

Horvath, A.L., Molecular Design. Elsevier, Amsterdam, 1992.Karcher, W. and Devillers, J. (Eds.), Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology, . Kluwer, Academic Publications. Dordrecht. 1990.Hansch, C. and Leo, A., Exploring QSAR, American Chemical Society, Washington D.C., 1995.Mercy, P.G. (Ed.), Mathematical Modeling in Chemistry. VCH, Weinheim, 1991.Kier, L.B. and Hall, L.H., Molecular Connectivity in Structure–Activity Analysis. Research Studies Press, Letchworth, U.K., 1986. Bonchev. D., Information Theoretic indices for Characterization of Chemical Structures, ResearchStudies Press. Letchworth, U.K., 1983.

7. Sabljic, A., Topological indices and environmental chemistry, In Karcher, W. and Devillers, J. (Eds.)Practical applications of quantitative structure–activity relationships (QSAR) in environmental chem-istry and toxicology, Kluwer Academic Publications, Dordrecht. 1990. pp. 61–82.

3.4.5.

6.

8. Basak, S.C., A nonempirical approach to predicting molecular properties using graph-theoreticinvariants, In Karcher, W. and Devillers, J. (Eds.) Practical applications of quantitative structure- activity relationships (QSAR) in environmental chemistry and toxicology. Kluwer Academic Publishers, Dordrecht, 1990. pp. 83–103.

9. Todeschini, R., Lasagni, M. and Marengo, E., New molecular descriptors for 2D and 3D structures theory: Part I, J. Chemometrics, 8 (1994) 263–272.Todeschini, R.. Gramatica, P., Provenzani, R. and Marengo. E., Weighted holistic invariant moleculardescriptors: Part 2. Theory development and application on medeling physico-chemical propel-ties ofpolyaromatic hydrocarbons, Chemom. Intell. Lab. System.;. 27 (1995) 221–129.

10.

378

triazine derivatives, classification of aquatic pollutants according water quality objec-

Page 385: 3D QSAR in Drug Design. Vol2

New 3D Molecular Descriptors: The WHIM theory and QSAR

Todeschini, R., Vighi, M., Provenrani, R., Finizio, A. and Gramatica, P., Modeling and prediction by using WHIM descriptors in QSAR studies: Toxicity of heterogeneous chemicals on Daphnia magna:Part 3, Chemosphere, 32 (1996) 1527–1545. Todeschini, R., Bettiol, C., Giurin, G., Gramatica, P., Miana, P. and Argese, E.. Modeling and prediction by using WHIM descriptors in QSAR studies: Submitochondrial particles (SMP) as toxicity biosensors

11.

12.

13. Todeschini, R. and Gramatica, P., 3D-modelling and prediction by WHIM descriptors: Part 5. Theory development and chemical meaning of WHIM descriptors, Quant. Struct.-Acta. Relat., 16 (1997) 113-119.Todeschini, R. and Gramatica, P., 3D-modelling and prediction by WHIM descriptors: Part 6.Application of WHIM descriptors in QSAR studies, Quant, Struct.-Act. Relat., 16 (1997) 120–125. Chiorboli, C., Gramatica, P., Piazza, R.. Pino, A. and Todeschini R., 3D-modelling and prediction by WHIM descriptors: Part 7. Physico-chemical preperties of haloaromatics: comparison between WHIM and topological descriptors, SAR QSAR Envir. Res., (1997) (in press).Todeschini, R., Vighi, M., Finizio, A. and Gramatica, P. 3D-modelling and prediction by WHIM

20-TI and 3D-WHIM descriptors, SAR QSAR Envir. Res.. (1997) (in press).Todeschini, R., Moro, G., Boggia, R.. Bonati, L., Cosentino, U., Lasagni, M., and Pitea, D., Modellingand prediction of molecular properties: Theory of grid-weighted holistic invariant molecular (G-WHIM)descriptor, Chemom. Intell, Lab. Systems, 36 (1997) 65–73.Todeschini, R., Data correlation, number of significant principal components and shape of molecules:The K correlation index, Anal. Chim. Acta, 348 (1997) 419–430.Todeschini, R., Cazar, R. and Collina, E.. The chemical meaning of topological indicex, Chemom. intell, Lab. Systems, 15 (1992) 51–59.HyperChem, rel. 4, and Chemplus, rel. 1.0 for WINDOWS, Autodeck, Inc., Sausalito, CA, U.S.A., 1995.Todeschini, R., WHIM-3D/QSAR — Software for the calculation of WHIM descriptors, rel. 2.1 forWINDOWS. Talete srl, Milan, 1996.STATISTICA, rel. 5.0 for WINDOWS. StatSoft, Inc., Tulsa, OK. U.S.A., 1995.Leardi, R., Boggia, R. and Terrile, M., GeneticJ . Chemometrics, 6 (1992)267–281.Todeschini, R., Moby Digs — Software for Variable Subset selection by Genetic Algorithm, rel. 1.0 for WINDOWS. Talete srl. Milan. 1997.

14.

15.

16.descriptors: Part 8. Toxicity and physico-chemical properties of environmental priority chemicals by

17.

18.

19.

20.21.

22.23.

24.

algorithms as a strategy for feature selection,

25. Bravi, G., Gancia, E.. Mascagni, P., Pegna, R.. Todeschini, R. and Zahiani, A. MS-WHIM, new 3D theor-etical descriptors derived from molecular surface properties: A comparative 3D-QSAR study in a series

of steriods, J. Comptut.-Aided Mol. Design, 11 (1997) 79–92. Cramer III, R.D., Patterson, D.E., Bunce. J.D., Comparative molecular field analysis (CoMFA): I. Effectof shape on binding of steriods to carrier proteins, J. Am. Chem. Soc., 110 (1988) 5959–5967.Kubinyi, H. (Ed.), QSAR: Hansch Analysis and Related Approaches, VCH. Weinheim, 1993,

developing practice of comparative design, ESCOM, Leiden 1993,

Unger, S.H. and Hansch, C.J., On model buliding in structure-activity relationship: A reexamination ofadrenergic blocking activity of ß-halo-ß-arylalkylamines, J. Med. Chem., 16 (1973) 745–749.Free, S.M. and Wilson. J.W., A mathematical contribution to structure–activity studies, J. Med. Chem.,7 (1964) 395–402.Kier, L.B., Shape index from molecular graphs, Quant. Struct.-Act. Relat., 4 (1985) 109–116.Kier, L.B., An index on molecular flexibility from Kappa shape attributes, Quant. Struct.-Act. Relal., 8 (1989) 218–221.SYBYL, Release 6.0, Tripos Assoc. Inc., St. Louis. MO, U.S.A., 1993.Sulea, T., Kurunczi, L. and Simon. Z., Dioxin-type activity for polyhalogenated arylic derivatives: A QSAR model based on MTD-method, SAR QSAR in Environ. Rea.. 3 (1995) 37–61.Bodor, N., Gabanyi, Z. and Wong, C., A new method for the estiation of partition coefficient, J. Am.Chem. Soc., 111 (1989) 3783–3786.

26.

27.pp. 57–68.

28. Cramer III. R.D., DePriest, S.A., Patterson, D.E. and Hecht, P., Themolecular force field analysis, In Kubinyi, 13. (Ed.). 3D QSAR in drug

29.

30.

31.32.

33.33.

35.

379

of chlorophenols: Part 4, Chemosphere, 33 (1996) 71–79

pp. 443–505.

Page 386: 3D QSAR in Drug Design. Vol2

Roberto Todeschini and Paola Gramatica

36. Ghose, A.K. and Crippen, G.M., Atomic physicochemical parameters from three-dimensional structure directed quantitative structure-activity relationships: Part 4 , J. Chem. Inf. Comp. Sci., 29 (1989)163-172.Miller, K.J., Additivity methods in molecular polarizability, J. Am. Chem. Soc., 112 (1990) 8533–8542. Bonati, L., Fraschini, E., Lasagni, M., Palma Modoni, E. and Pitea, D., A hypothesis on the mechanismof PCDD biological activity based on molecular electrostatic potential modelling: Part 2 , Theochem.,340 (1995) 83-95.Moro, G., Bonati, L., Lasagni, M., Cosentino, U., Pitea, D., and Todeschini, R., Molecular descriptorsderived f r o m 3D-distributions of molecular scaler fields (G- WHIM descriptors): quanti tat ivestructure-activity relationships of polychlorodibenzo-p-dioxins (forthcoming).Waller, C.L. and McKinney, J.D., Comparative molecular field analysis of polyhalogenated dibenzo-p-dioxins, dibenzofurans, and biphenyls, J. Med. Chem., 35 (1992) 3660–3666. Todeschini, R., Gramatica. P. and Navas, N., 3D-modelling and prediction by WHIM descriptors: Part9. Chromatographic relative retention time and physico-chemical properties of PolyChloroBiphenyls(PCB),Cheman. Intell. Lab. Syst. (in press).Colombo, S., Thesis in Environmental Sciences, University of Milan, 1996.Consonni, V., Thesis in Environmental Sciences, University of Milan, 1997.Triacchini, G., Thesis in Environmental Sciences, University of Milan, 1997.Merli, V., Thesis in Environmental Sciences, University of Milan, 1996.Tagliabue, M., Thesis in Chemistry, University of Milan, 1996.

37.38.

39.

40.

41.

42.43.44.45.46.

380

Page 387: 3D QSAR in Drug Design. Vol2

EVA: A Novel Theoretical Descriptor for QSAR Studies

Trevor W. Heritageª*, Allan M. Fergusonª, David B. Turnerb) and PeterWillettb

ª Tripos Inc., 1699 South Hanley Road, St. Louis, MO 63144, U.S.A.b Krebs Institute for Biomolecular Research and Department ofInformation Studies, University of

Sheffield, Wertern Bank, Sheffield S10 2TN, U. K.

1. Introduction

Since the advent of classical QSAR techniques, exemplified by Hansch [1], there hasbeen considerable progress in the development of molecular descriptors and chemo-metric techniques for use in such studies. The development of 3D QSAR techniques [2]that attempt to correlate biological activity with the values of various types of molecularfield, for example, steric, electrostatic or hydrophobic. has been of particular interest[3,4]. The original and most well-known of the 3D QSAR techniques is ComparativeMolecular Field Analysis [3], (CoMFA) which uses steric and eleclrostatic field valuescalculated at the intersections of a three-dimensional grid that surrounds the structuresin the dataset. A major limitation of CoMFA and most other 3D QSAR techniques, isthe dependency upon the relative orientation o f the molecules in the dataset [5,6].Despite efforts to improve the efficiency of the alignment process [7-9], the selection ofthe molecular alignment is regarded as the major variable in the analysis. These prob-lems are further exacerbated when the conformational flexibility of the molecules in thedalaset is considered.

There is, therefore, considerable interest in the development of new descriptions ofmolecular structure that do not require the alignment of molecules, but retain the 3Dand molecular property information encoded within molecular fields. Alternative de-scriptions of molecular fields than those used in CoMFA or molecular surface proper-ties, for example, methods based on autocorrelation vectors [10], molecular moments[11] or 3D WHIM descriptors [12], may provide effective orientation-independentdescriptions of molecular structure. In this chapter, we review a novel alignment-freedescriptor of molecular structure, known as EVA (EigenVAlue descriptor), that isdceived from calculated infrared (IR) range vibrational frequencies. As discussed laterin this chapter, EVA has been found to yield statistically robust QSAR models that arecomparable, in statistical terms, to those derived using CoMFA, with the advantage thatEVA does not require structural alignment.

2. The EVA Descriptor

During the late 1980s. workers at Shell Research Ltd. [13]reasoned that a significantamount of information pertaining to molecular properties, in particular. biological

To whom correspondence should be addressed.

© 1998 Kluwer Academic Publishers. Printed in Great Britian.H. Kubinyi et al. (eds.) 3D QSAR in Drug Design, Volume 2. 381-398.

Page 388: 3D QSAR in Drug Design. Vol2

Trevor W. Heritage, Allani M. Ferguson, David B Turner and Peter Willet

activity, might be contained within the molccular vibrational wave function, of whichthe vibrational spectrum is a fingerprint. The EVA descriptor is derived from normal co-ordinate Eigenvalues (i.e. the vibrational frequencies) that are either calculated theoret-ically or extracted from experimental I R spectra. Typically, a classical normalcoordinate analysis (NCA) [14] is performed on an energy-minimized structure and theresulting eigenvalues represent the normal mode frequencies from which the EVA de-scriptor is derived. The associated normal coordinate eigenvectors (i.e. the vibrational motions) are not used within the EVA descriptor. The force constants upon which anormal coordinate analysis is dependent may be determined using a molecular mechan-ics, semiempirical or ab initio quantum mechanical method. The accuracy of the cal-culated vibrational eigenvalues is, therefore, determined entirely by the quality of theforce constants applied or derived.

Using the standard Cartesian coordinate system as a basis for describing the dis-placement of an atom from its equilibrium position in a vibrating molecule requires 3Ncoordinates for a molecule containing N atoms. Three of these coordinates describerigid-body translational motion and a further three describe rigid-body rotations. Thus,in the general case for a molecule of N atoms, there are 3N-6 vibrational degrees offreedom, or 3N-5 for a linear molecule such as acetylene (only two coordinates arerequired to fix the orientation). The number of vibrational degrees of freedom is equi-valent to the number of fundamental vibrational frequencies (normal modes of vibra-tion) of the molecule. While each of these fundamental vibrations can be calculated,they may or may not appear in an experimental IR absorption spectrum due to sym-metry considerations — i.e. they may have zero (or close to zero) intensity [14].

Thus, in terms of the derivation of the EVAdescriptor, each structure is initially char-acterized by 3N-6 (or 3N-5) vibrational modes. In all but the special case where themolecules in the dataset contain the same number of atoms i t is not possible to comparethe vibrational frequencies directly. This so-called dimensionality problem docs notarise during a CoMFA analysis because the molecular fields arising from each moleculeare calculated across a fixed set of lattice points, but this would be an issue if, forexample, one wanted to compare directly the atomic point charges from which the elec-trostatic fields are derived. Furthermore, even in cases in which it is desired to comparemolecules that do contain the same number of atoms and hence the same number ofvibrational modes, it is difficult to establish which vibrations are directly comparablebetween molecules: this problem arises from inherent and effectively indeterminatecontributions made by individual atoms to a given vibrational mode [15].

In EVA, the dimensionality of the descriptor is unified across the entire dataset by athree-step procedure that involves transformation of the sets of vibrational frequenciesonto a scale where they are directly comparable (i.e.a scale of fixed dimensionality). Inthe initial step of this standardization process, the frequency values are projected onto abounded frequency scale (BFS) with individual vibrations represented by points on thisaxis (Fig. 1) . The bounds chosen for the BFS of 0 and 4000 cm–1 encompass the fre-quencies of all fundamental molecular vibrations and Facilitate comparison against ex-perimentally derived IR spectra. The second step in the standardization process involves

382

the generation of a value at each point on the BFS whereby each calculated frequency is

Page 389: 3D QSAR in Drug Design. Vol2

EVA: A Novel Theoretical Descriptor for QSAR Studies

Fig. 1. Bounded frequencyranitidine (upper spectrum).

scale with superimposed EVA descriptors for cimetidine (lower spectrum and

characterized in terms of ‘peak’height, width and shape. Each of the calculated vibra-tions is weighted equally during this process. The resulting value associated with eachof the calculated vibrations permits the proportion of overlap of vibrations to be deter-mined and may be considered analogous to, but in no way representative of, peakintensity. In principle, it should be possible to use theoretical vibrational intensities derived from derivatives of calculated or measured dipole moments, but the resultingintensities are notoriously inaccurate and this has not been attempted at the time ofwriting.

In practice, in the second step a Gaussian function of fixed standard deviation (σ) isplaced over each vibrational frequency value for a given structure. resulting in a seriesof 3N-6 (or 3N-5) identical and overlapping Gaussians (Fig. 2 ) . The value of the EVAdescriptor, EVAX, at any chosen sampling point, x, on the bounded frequency scale is

Fig. 2. Profile of the summed overlap Gaussians ( ‘EVA’ curve) for three arbitrary vibrational frequenciesand using a term of 10 The curve ‘EVA ’ is determined by summing the estimated ‘intensites' of the vi- brations centered at 28 37 and 51 respectively. The EVA descriptor is exacted by samplingthe frequency scale at fixed intervals of L

383

Page 390: 3D QSAR in Drug Design. Vol2

Trevor W. Heritage Allan M.Ferguson David B. Turner and Peter Willett

then determined by summing the contributions from each and every one o f the 3N-6 (or3N-5) overlaid Gaussians at that point

where ƒi is the i’th frequency for the structure.It is important at this stage to remind the reader that the purpose of the above EVA

smoothing procedure is not an attempt to simulate the infrared spectrum of the moleculeof interest, since the transition dipole data is ignored, but rather to provide a basis uponwhich vibrations occurring at slightly different frequencies may be compared to oneanother. The Gaussian function applied to define peak shapes adds a probabalisticelement, in that the peak maxima are centered at each of the calculated frequency values(fi) and, thus, these points are taken to be the most probable values of the respective fre- quencies. An EVA descriptor sampled at a point x ≠ fi is thus considered to be a lessprobable value of the i’th frequency and the corresponding contribution of fi to the finalvalue of EVAx will be less than the maximum possible contribution. To a certain extent. this behavior of the EVA descriptor reduces the dependency of the final QSAR modelon the accuracy of the original calculated vibrational frequencies, which arc sensitive tothe molecule geometry optimization criteria and to the theoretical approximations orempirically based parameters of the NCA procedure. Furthermore and as discussed indetail below, this behavior has implications regarding the sensitivity of the descriptor tomolecular conformation, in that small changes in vibrational frequencies arising from conformational changes may have insignificant effects on the resulting EVA descriptorvalues.

In the third and final, step of the data standardization process, the EVA function is sampled at fixed increments of L cm–1 along the BFS, which results in the 4000/L valuesthat comprise the EVA descriptor. Typically, a descriptor set is derived using a Gaussianstandard deviation (σ ) term of 10 cm–1 and a sampling increment (L) of 5 cm–1, resulting in 800 descriptor variables. As is the case with the CoMFA technique, the dimensional-ity of the EVA descriptor is much larger than the number of compounds in a typicalQSAR dataset and, thus, data reduction methods such as partial least squares to latentstructures (PLS) [16]or principal components regression (PCR) are applied to yieldrobust correlations with biological data.

3.

One of the first demonstrations in the public domain of the regressive modeljing capabil- ity of the EVA descriptor was obtained in a QSPR study [19],using the BCDEF datasetof Cramer [20]. The dataset consists of measured logP values for a highly heterogeneous set of 135 small organic chemicals, ranging from polycyclic aromatics such as thehighly lipophilic phenanthrene (logP = 4.46) to small hydrophilic moieties including methanol (logP= 0.64). The EVA descriptors were derived using a Gaussian spread (σ )term of 10 cm–1 and a sampling increment (L) of 5 cm–1 based on normal coordinate fre-

Applications of the EVA Descriptor in QSAR

384

Page 391: 3D QSAR in Drug Design. Vol2

EVA: A Novel Theoretical Descriptor for QSAR Studies

quencies calculated using the AM1 [21] Hamiltonian in the MOPAC [22] semiempirical molecular orbital program. These parameters yielded an EVA descriptor consisting of

regression equation based on only five PLS factors, that explained 96% of the variance in the logP values, was obtained in this way. Full leave-one-out cross-validation of thisdataset yielded a cross-validated-r² (i.e. q²) of 0.68. This model was then used to predict the logP value for a test set of 76 ‘unseen’ chemicals, resulting in a predictive r² of0.65. This study demonstrates the value of EVA as both an explanatory and a predictivetool and, in addition, highlights one of the key advantages over 3D QSAR techniquessuch as CoMFA, In cases such as this, where no intuitive alignment of the dataset struc-tures exists, it is very difficult or even impossible to apply CoMFA in a meaningful way, but with EVA no such complexity exists. Furthermore, bulk properties such aslogP have no orientation dependence and, thus, any attempt to introduce such a depend-ency for QSAR purposes is entirely arbitrary. The diversity of structures exemplifiedin this dataset also suggests that EVA may be applied to the analysis of diverse setsof compounds rather than just to congeneric series, which is a limitation for mostalternative descriptors.

In subsequent studies [23,24], the general applicability of the EVA descriptor inQSAR studies has been investigated i n detail using datasets exhibiting a range of bio-logical end-points (Table 1 ). Using EVA descriptors derived from AM1 modes, goodPLS models (in terms of q²) can be obtained for nine of the eleven datasets. The ex-ceptions to this are the oxadiazole [25] and biphenyl [26] datasets for which, at best, only poor models can be obtained. It is important to remind the reader that although theEVA QSAR models presented in Table 1 are satisfactory, they are based solely uponthe default EVA descriptor parameters (σ = 10 cm–1 and L = 5 cm–1). Additional studies[23,24] have been performed in which the effect of changes to these parameters on thequality of the final QSAR models has been investigated and for nearly all of the datasetslisted there exist combinations of σ and L that give rise to superior PLS models. Arange of these parameters should. therefore, be investigated prior to settling on a finalmodel. A protocol recommended by Turner et al. [23] suggests that a value of 10 cm–1

is a reasonable starting point for a QSAR study and thereafter if satifactory results arenot achieved, to supplement this with analyses based on σ -terms of 5.25 and 50 cm–1.

A useful benchmark in determining the effectiveness of the EVA descriptor forQSAR studies is to compare the statistical performance and model characteristics basedon the EVA descriptor with the analogous CoMFA model for the same datasets. A keylimitation in all such comparative studies [23,24] is that the datasets have been selectedbecause a good, published CoMFA model exists. This, therefore, leads to significantbias in favor of the CoMFA technique, hut none the less. the results do provideinteresting insights into the nature and scope of the EVA descriptor.

Examination of Table 1 shows that. at least in terms of the q² scores, the EVA de- scriptors provide roughly equivalent correlations for the cocaine [27], dibenzofuran[26]. dibenzo-p-dioxin [26], piperidine [25], sulphonamide [25] and steroid datasets [3].Although not as high as CoMFA, good predictive correlations are also obtained usingEVA for the ß-carboline [28] and nitroenamine [25] datasets. The two cases where

385

800 variables per structure, which were regressed against the logP values using PLS. A

Page 392: 3D QSAR in Drug Design. Vol2

Tab

le1

Sum

mar

y of

QSA

R a

naly

ses

usin

g E

VA

(de

rive

d fr

om A

MB

ER

and

MO

PA

C A

M1

norm

al m

odes

) an

d C

oMF

A d

escr

ipto

rsª

AM

BE

RM

OPA

CA

M1

CoM

FA(b

oth

fiel

ds)

SE

F

Dat

aset

q2r²

SEF

q²r²

SEF

ß-C

arbo

lincs

b0.

29(6

)0.

970.

5018

0.6

0.50

(6)

0.97

0.57

195.

50.

68(4

)0.

891.

1269

.9

0.49

(3)

0.87

0.

36

21.9

B

iphe

ny Is

0.

16(1

) 0.

72

0.48

30

.3

0.28

(2)

0.90

0.

3049

.0

Coc

aine

s0.

57(2

) 0.

91

0.26

51

.9

0.49

(2)

0.95

0.

20

94.0

0.

59 (4

) 0.

88

0.28

63

.0

Dib

enzo

-p-d

ioxi

ns0.

48(2

)0.

850.

5961

.5

Mus

cari

nics

0.42

(3)

0.88

0.

27

81 .7

0.

53(4

) 0.

95

0.17

17

1.3

0.59

(4)

0.84

0.

31

46.0

Oxa

diaz

oles

-0.

36(1

)–

––

-0.1

9(1)

0.38

0.33

12.9

0.51

(2)

0.85

0.07

56.2

0.68

(2)

0.88

0.53

77.2

0.66

(1)

0.80

0.66

92.3

0.78

(4)

0.97

0.25

273.

90.

72(6

)0.

850.

5729

.9D

iben

zofu

rans

0.61

(1)

0.74

0.69

103.

4

0.47

(2)

0.86

0.

59

41.3

0.

49 (3

) 0.

93

0.41

61

.5

0.84

(3)

0.96

0.

34

92.4

N

itroe

nam

ines

Pipe

ridi

nes

0.71

(5)

0.84

0.42

137.

90.

76(4

)0.

840.

4317

4.7

0.73

(3)

0.80

0.48

175.

2

Ster

oids

(TB

Gc

activ

ity)

0.42

(5)

0.99

0.17

206.

40.

70(4

)0.

980.

2017

5.0

0.62

(3)

0.92

0.37

67.7

Ster

oids

(CB

Gd

activ

ity)

0.79

(2)

0.90

0.38

85.0

0.70

(2)

0.87

0.45

59.1

0.75

(2)

0.91

0.37

93.0

Sulp

hona

mid

ese

––

–0.

54(6

)0.

8016

.460

.20.

65(5

)0.

8215

.187

.7–

aT

he le

ave-

one-

out q

2va

lues

are

rep

orte

d to

geth

er w

ith th

e op

timal

num

ber

of L

Vs

in b

rack

ets.

All

q2va

lues

of≤

0ar

ein

dica

ted

asan

dLV

opto

mitt

edas

mea

ning

less

. Mod

els

are

base

d on

the

sele

ctio

n of

LV

optb

y m

inin

um S

Ecv

scor

e.Fu

ll (f

itted

) mod

els

wer

e de

rive

d on

ly w

here

q2>

0.

bA

MB

ER

had

the

requ

ired

for

ce-f

ield

par

amet

ers f

or o

nly

39 s

truc

ture

s.

eT

esto

ster

one-

bind

ing

glob

ulin

aff

init

y as

the

targ

et a

ctiv

ity.

d

Cor

ticos

tero

ne-b

indi

nggl

obul

inaf

finity

asth

eta

rget

activ

ity.

eA

MB

ER

for

ce-f

ield

par

amet

ers

not a

vaila

ble.

386

Page 393: 3D QSAR in Drug Design. Vol2

EVA: A Novel Theoretical Descriptor for QSAR Studies

EVA performs poorly, the oxadiazole [25] and biphenyl [26] datasets, also yield the poorest CoMFA results, although statistically signifcant con-elations (q² 0.5) are stillobtained using CoMFA.

The robustness of PLS models derived using EVA has been extensively tested byTurner [24,31] , in terms of both randomization permutation testing [16] and the abilityof those models to make reliable predictions for test chemicals. Using the standardsteroid dataset from the original CoMFA study [ 3 ] , albeit with structures corrected ac-cording to Wagener et al. [10], a predictive-r² value of 0.69 is obtained for the ten test chemicals; the biological end-point used was the affinity for corticosteroid-bindingglobulin (CBG) expressed as 1/[logK].This compares to a much lower value forCoMFA combined steric and electrostatic fields of 0.35. The apparently poor CoMFAtest set predictive performance is almost entirely due to an extremely poor prediction for the only structure in the test set containing a fluorine atom, omission of which raisesthe CoMFA predictive-r² to 0.84. In contrast, the EVA predictive performance is raisedby 0.05 when this compound is excluded, a small but none the less significant improve-ment. Clearly, in terms of the EVA descriptor space this compound cannot be con-sidered an extreme outlier, but in terms of CoMFA fields it is too different from thestructures in the training set for a reliable prediction to be made.

The main advantage of EVA over CoMFA for QSAR purposes is the fact that orien-tation and alignment of the structures in the dataset is not required. In CoMFA, thealignment is the major variable, providing in some instances different modelling statis- tics for even quite small changes in the relative positions of the atoms in a pair of struc-tures. However, given the nature of the field-based descriptors used in CoMFA,alignment does facilitate a powerful means of visualizing the important features of aQSAR model in the form of plots of the structural regions that are most highly cor-related (either positively or negatively) with the biological property of interest. Despite the undoubted utility of these CoMFA plots, they do not indicate precisely which atomsare responsible for the modelled correlations since the electrostatic and steric fields are composed of contributions from each and every atom in the molecule, although thePredicted Activity Contributions (PAC) [17]method has been reported to overcome thisproblem. A further point to note is that it is not possible to predict the effects that struc- tural changes may have on the resultant CoMFA fields. In contrast to CoMFA, thereexists no obvious means of backtracking from those components of the EVA descriptorwhich are highly correlated with changes in biological activity to the corresponding molecular structural features; a discussion of the ways of achieving this is presented at the end of this chapter.

4. EVA Descriptor Generation Parameters

The judicious selection of parameters is a prerequisite to the success of any QSARmethod and EVA is no exception. The most fundamental parameters involved in thederivation of the EVA descriptor are the Gaussian standard deviation (σ) and thesampling increment ( L ) .

387

Page 394: 3D QSAR in Drug Design. Vol2

Trevor W. Heritage Allan M. Ferguson, David B. Turnerand Peter Willett

4.I .

The effect of varying the σ term of the EVA descriptor is illustrated in Fig. 3 in which.as is increased, the features of the descriptor profile are progressively smoothed. Theeffect of the application of a Gaussian function during the EVA descriptor standard-ization process is to ‘smear out’ a particular vibrational frequency such that vibrationsoccurring at similar frequencies in other structures overlap to a lesser or greater extent. It is this overlap that provides the variable variance upon which PLS modelling isdependent. By definition, each and every Gaussian must overlap, but for the most partthis occurs at small (negligible) values and, consequently, the contribution to variance isvery small. Only where the frequency values are sufficiently close to one anotherrelative io the value of σ is it likely that interstructural overlap of Gaussians will occurat values of significant magnitude. The selection of the Gaussian standard deviation,therefore, determines the number of and extent to which, vibrations of a particular fre-quency in one structure can be statistically related to those in the other structures in thedataset.

In addition to interstructural overlap of Gaussians, the σ term also governs the extentto which vibrations within the same structure may overlap at non-negligible values.Intrastructural Gaussian overlap of this type, which is also dependent on the ’density’(i.e. proximity) of vibrations at various regions of the spectrum, causes EVA variablesto consist of significant contributions from more than one vibrational frequency. Themixing of information contributed by individual normal coordinate frequencies is gen-erally considered undesirable, but in order to provide sufficient interstructural Gaussianoverlap, it is inevitable that a certain degree of intrastructural overlap occur.

Thus, small values of σ give rise to minimal intrastructural Gaussian overlap, while at larger values σ of significant overlap arises. In the former case, there will be a reduc-tion in interstructural overlap, perhaps to such an extent that there exists no overlap ofthe Gaussians at significant values. In this instance, the descriptor takes on the charac-teristics of a binary indicator, showing only the presence or absence of specific features,thereby rendering the descriptor useless for regression analysis, but perhaps still ofutility in classification analysis. In cases where larger σ values are used, increasedmixing of the information encoded by one frequency with that encoded by otherfrequencies arises.

Gaussian standard deviation (σ)

4.2. Sampling increment (L )

Detailed investigation into the effect of various combinations of the σ and L parameterson the resulting q² value has been carried out by Turner [24]. Turner’s results indicatethat, for the most part. the final q² value is insensitive to small changes in either of theseparameters —i.e. the information content of the EVA descriptor remains consistent. The most significant variations i n q² are seen as σ is reduced (giving a more spiky

lowering the spectral resolution. This result is intuitively reasonable since one wouldanticipate that, as L becomes very large relative to σ some of the Gaussian peaks (or

vibrational ‘intensity’) and the sampling increment (L) is increased; this is analogous to

388

Page 395: 3D QSAR in Drug Design. Vol2

Fig

. 3.

Eff

ect o

f the

te

rm o

n E

VA

des

crip

tor

prof

ile.

389

Page 396: 3D QSAR in Drug Design. Vol2

Trevor W. Heritage, AIlan M. Ferguson. David B. Turner and Peter Willett

information encoded within them) will be omitted from the descriptor. In some cases. the information omitted will be predominantly noise resulting in a superior QSAR model; but in other cases, signal may be accidentally omitted, resulting in degradationof the QSAR model. This phenomenon is known as blind variable selection since vari-ables are selected or excluded from the descriptor on a completely arbitrary basis whichis, of course, undesirable. The value of L at which blind variable selection begins tooccur is related to the σ term; the larger the σ term the higher the permissible value ofL. Thus, to avoid blind variable selection. one might wish to minimize the value of L,but this must be balanced against the additional computational requirements associated with such a practice. Conversely. therefore. the value of L should be maximized in orderto reduce the computational overhead and this leads to the concept of critical L values(Lcrit) which are σ specific and which, if exceeded. result in a sampling error. Table 2lists generally applicable Lcrit values for various values that were derived by a system-atic study of L and σ parameter settings for several EVA datasets [23]. Table 2 confirmsthat the intuitively reasonable and default. selection of an L of 5 cm–1 with a term of10 cm–1 should result in no blind variable selection and that in point of fact, the value of L may be increased to 20 cm–1 with no apparent information loss (change in q2).

The existence of these Lcrit values is important not least because one of the problems with CoMFA at present is that the coarse grid-point spacing (typically 2 Å) that is gen- erally used is such that there is incomplete sampling of the molecular fields. resulting in information loss. The consequence of this is that reorientation of an aligned set of mole-cules as a rigid body within the defining CoMFA 3D region often results in substantial changes to the resulting QSAR model [30], as evidenced in the q2 values. EVA, on theother hand, does not suffer from such sampling errors. provided that the Lcrit valuesgiven in Table 2 are not exceeded.

5.

Although the EVA descriptor is not intended to simulate the infrared spectrum of a mol- ecule, it is useful to visualize the EVA descriptor in the form of a ‘spectrum’. This permits the interpretation of the EVA descriptor by examination of the distribution of vibrations in a molecule or in a set of molecules. Figure 4 shows plots of the EVA de- scriptor for deoxycortisol (one of the most active CBG-binding compounds in the origi- nal steroid dataset used by Cramer [3]) and estradiol (one of' the inactive structures)over the spectral range 1 to 4000 cm–1. Also shown in Fig. 4 is the univariate standard deviation of the descriptor over the entire dataset of 21 structures [3]. The density of

Characteristics of the EVA Descriptor

Table 2

Gaussian standard deviation. σ_1 1 2 3 4 6 8 10 14 21Threshold increment, Lcrit (cm_1)a 2 4 5 8 10 16 20 25 32

a In order to avoid sampling errors the value of L should be chosen to be lets then Lcrit for a given s; thesevalues have been chosen such that effects resulting from the choice or sampling frame (determined by S) areaccounted for.

critical values of L (Lcrit) for selected Gaussian terms

390

Page 397: 3D QSAR in Drug Design. Vol2

Fig

. 4. E

VA

pse

udo-

spec

tra

for

estr

adio

l (in

acti

ve fo

r C

BG

-bin

ding

) an

d de

oxyc

orti

sol (

hig

hest

CB

C-b

indi

ng a

ctiv

ity)

; a

Gau

ssia

n of

4 c

m-1

has

been

use

d. T

he

univ

aria

te s

tand

ard

devi

atio

n fo

r the

21

trai

ning

set

str

uctu

res

is g

iven

by

the

heav

y lin

e.

391

Page 398: 3D QSAR in Drug Design. Vol2

Trevor W. Heritage, Allan M. Ferguson. David R. Turner and Peter Willett

peaks in the fingerprint region (1 to 1500 cm–1) indicates that there is considerably more vibrational information in this region than in the functional group region (1500 to4000 cm–1) of the spectrum, as is typical in most infrared spectra. The EVA descriptorvalues and the standard deviation over the entire dataset are largest at frequencies cen-tered around 1400 and 3100 cm–1, corresponding to C-H bending and stretching vibra-tions. Figure 4 also highlights the errors associated with the calculation of normalcoordinate frequencies (in this case, using MOPAC), since a carbonyl stretching fre-quency is expected (from experiment) to appear at around 1700 cm-1, but is representedon this plot by peaks in around 2060 cm–1. This feature of the EVA descriptor, onceagain, indicates that there is no attempt to simulate an experimental IR spectrum, butdoes not detract from the usefulness of the descriptor for QSAR purposes, since con-sistency rather than accuracy across the dataset is critical. Furthermore, for QSAR pur-poses, relative rather than absolute differences in vibrational frequency across the dataset are important. One might expect that this would become more of an issue shouldheterogeneous datasets be used since the consistency with which errors associated with the reproduction of equivalent vibrational frequencies may be more erratic. In practice,however, reasonable QSAR results have been obtained using a variety of heterogeneousdatasets [19,23].

6.

The sensitivity of CoMFA to the molecular orientation and alignment and, therefore, tothe molecular conformation is well established [32,33] ,but while EVA is completelyindependent of molecular orientation and alignment, the impact of the molecular con-formation on EVA QSAR performance has. thus far, not been discussed. Intuitively, itis obvious that a change in conformation will result in changes in the force constantsbetween atoms and, therefore, in the normal coordinate frequencies and displacements.The questions are: ‘to what extent arc these changes evident within the EVA descrip-tor?’ and ‘how much of this is accounted for by the Gaussian spread term?’. Some limited studies of these conformational effects have been performed [31,33]. In one

at the same biological target, encomp ing pyrazoles, thiazoles, piperidines, quinolines

nearest-neighbor algorithm, based on the EVA descriptor. The conformations of eachmolecule were repeatedly randomized, new EVA descriptors generated and the cluster-ing process repeated. The conclusions from this study were that, while the nearest-neighbor relationships between compounds change. the overall cluster membership isapproximately constant. This result suggests that, in the vast majority of cases, a con-formational change does not lead to a sufficiently large change in the resulting EVA de-scriptor to cause a change in the underlying statistical model.

In a more recent study [31], EVA descriptors for test chemicals were generated for aconformation which matched that used in the training set and also for a non-matchingconformation. At low σ values, the predictions made based on the non-matching con-formation are considerably poorer than those made for the matched conformation. This

Conformational Sensitivity of the EVA Descriptor

392

and thiochromans, and totalling more than 250 structures, were clustered using a

such study [33] performed by Shell Research Ltd., five classes of chemical known to act

Page 399: 3D QSAR in Drug Design. Vol2

EVA: A Novel Thoeretical Descriptor for QSA R Studies

difference gradually decreases until convergence is achieved at σ = 12 cm–1, thereafterthe predictions from the two conformations are roughly equivalent. In general, theconformational sensitivity of the EVA descriptor decreases as is increased. As would beexpected, the predictions made using CoMFA for noli-matching conformations aremuch poorer than any of those obtained using EVA, thereby highlighting the relativeconformational sensitivity of the two methods.

7. QSAR Model Interpretation

In CoMFA, 3D isocontour plots are used to visualize those regions of space indicatedby the PLS model to be most highly positively or negatively correlated with biological activity. While no such 3D visualization is possible with EVA, a variety of 2D plotshave been suggested [24,31] that indicate the relative importance of regions of the spec-trum in correlating biological activity. Figure 5 shows two such plots based on a two-component PLS model for the steroid dataset [3] that, i n some ways, facilitateinterpretation of an EVA QSAR model in analogous fashion to the interpretation of anexperimental IR spectrum. The two measures shown in the figure are the magnitudes ofthe regression coefficients (B) and the variable influence on projection (VIP) [34].It ispertinent to remind the reader that the peak heights depicted i n Fig. 5 represent therelative importance of the EVA variables in the PLS analysis and are in no way relatedto vibrational intensity.

To backtrack to the important structural features indicated by the PLS model, it isnecessary first to identify the variables most highly correlated with activity, decomposethose variables into the contributing vibrational frequencies and then to interpret andvisualize the underlying normal mode vibrations.

Two simple approaches have been proposed for identifying the most important variables in the PLS analysis [31]. The first approach suggests that important variableswill have regression coefficients in excess of half of the largest coefficient. The secondmethod. based upon the VIP score, states that important variables will have a VIP scoregreater than 1.0, while unimportant variables will have a VIP score less than 0.8 [34].Analysis of the EVA descriptor (σ = 4 cm-1) for the steroid dataset by Turner et al. [31]results in the selection of too many EVA variables at a threshold of VIP 1.0 (183variables), but a threshold of VIP 3.0 yields a more manageable number (17 vari-ables). It is reasonable to use such a high VIP threshold since these are the variables most heavily weighted by PLS and, thus, may be used to get some feel for the mainstructural features used to discriminate between the training set structures.

The decomposition of the selected (important) EVA variables into their contributorynormal mode frequencies is most straightforward and certainly less ambiguous, if eachEVA variable is composed of one and only one normal coordinate frequency. For thisreason, it is important that the smallest value is used during the analysis as possible,since, as discussed earlier, σ directly affects the degree of intrastructural Gaussianoverlap. Examination of the underlying frequencies for EVA variables with VIP 3.0is not straightforward. However, for the steroid dataset, PLS appears to discriminatebetween high-, medium- and low-active structures based on the presence or absence of

393

Page 400: 3D QSAR in Drug Design. Vol2

Fig

. 5. R

egre

ssio

n co

effi

cien

t (B

) an

d V

IP s

core

plo

ts fr

om E

VA

4 c

m-1

two

LV

fitt

ed m

odel

.

394

Page 401: 3D QSAR in Drug Design. Vol2

EVA: A Novel Theoretical Descriptor for QSAR Studies

specific frequencies that are characteristic of the functionalities considered important forbinding affinity. For example, the variable with the second-highest VIP score at2056 cm–1 relates to the position-3 carbonyl group stretching mode. This group is one ofthe features deduced by Mickelson et al. [35] to be critical for CBG-binding and ispresent in all of the high- and medium-activity compounds, as well as the most active ofthe low-activity compounds.

The attempts at interpreting QSAR models based upon the EVA descriptor, discussedherein, are encouraging, in that the classifications between structures can, to someextent, be rationalized in terms of the features postulated to be necessary for activity.None the less, EVA QSAR models cannot, to date, be interpreted to the same extent asCoMFA models in which the correlations may be related to probe interaction energies.

8. Summary

One of the main problems encountered with QSAR techniques that use fields to charac-terize molecules, such as CoMFA, is the need to align the structures concerned. Theselection of such alignments, in terms of the molecular orientation and conformation, isessentially arbitrary, but has profound effects on the quality of the derived QSARmodel. For this reason, a number of groups have attempted to develop new 3D QSARtechniques that extend beyond this limitation, with varying degrees of success. This chapter has reviewed the progress made with one such methodology, that based uponmolecular vibrational eigenvalues and known as EVA.

EVA provides an entirely theoretically based descriptor derived from calculated, fun-damental molecular vibrations. Molecular structure and conformational characteristicsare implicit in the descriptor since the vibrations depend on the masses of the atomsinvolved and the forces between them. The signi ficant advantage that EVA offersrelative to CoMFA and related 3D QSAR techniques is that molecular vibrational pro-perties are orientation independent, thereby eliminating ambiguity associated with thewell-known molecular alignment problem.

The discussion of the QSAR modelling performance of EVA herein illustrates that thegeneral applicability of the descriptor and the robustness of the resultant QSAR (PLS)models in terms of cross-validation statistics. In addition. extensive randomization testing of the PLS models discussed herein [31] shows that the probability of obtainingsimilar correlations by chance to those actually obtained using the EVA descriptor isessentially zero. Randomization and related statistical tests [16] have played a crucialrole in conclusively demonstrating that EVA can be used to correlate biological activityor other properties and generate statistically valid QSAR models. In most, but not all,cases examined EVA compares favorably with CoMFA, in terms of the ability to buildstatistically robust QSAR models from training set structures and in terms of the abilityto use those models to predict reliably the activity of ‘unseen’ test chemicals.Furthermore, EVA has yielded predictively useful QSAR models for quite hetero-geneous datasets, where the application of CoMFA is difficult or impossible.

The promising results presented herein may lead one to believe that developmentof the EVA methodology has been completed, but this is not the case. There is

395

Page 402: 3D QSAR in Drug Design. Vol2

Trevor W. Heritage, Allan M. Ferguson, David B. Turner and Peter Willett

considerable interest in exploring several aspects of the descriptor, including the cor-relation with specific types of effects (e.g. hydrophobic, steric or electrostatic) and therational selection of localized σ values as a basis for establishing suitable probabilitydensity functions for particular types of vibration or regions of the infrared spectrum. Inaddition, despite the example provided herein of taking significance-of-variable plotscoupled with techniques for selecting these variables as a means to interpreting an EVAQSAR model, there is need for more sophisticated techniques for the decomposition ofEVA variables into the underlying normal mode vibration(s) and thereby to the groupsof atoms that are characteristic of those vibrations. A further area that requires invest-igation is the sensitivity of EVA to the molecular conformation used and to what extentthis governs the choice of σ parameter.

As the EVA methodology matures other applications, besides 3D QSAR, will beginto emerge that take advantage of the strengths of the technique. One such example [36],centers on the use of EVA for similarity searching in chemical databases, in which theoverall conclusions are that EVA is equally effective for this purpose as the more tradi- tional 2D fingerprint method. although depending on the similarity measure applied, thehits returned by EVA and 2D similarity measures may be structurally quite different. A consequence of this finding is that EVA-based similarity searching may provide analternative source of inspiration to a chemist browsing a database. The applicabilityof EVA in the context o f database similarity searching is i n stark contrast to thecomplexities associated with field-based similarity searching [9] in chemical databases.

Finally, the technique described herein that yields the standardized EVA descriptorfrom the calculated vibrational frequencies is not limited to that purpose and may, inprinciple, be applied in any circumstance where the property or descriptor is non-standard. For example, the standardization procedure may be applied to interatomic dis-tance information, either for a single conformation or as a means of summarizingconformational flexibility. Furthermore, the same procedure may be applied to other de-scriptions of molecular structure that are dependent on the number of atoms, such aselectron populations, partial charges or vibrational properties other than normal co-ordinate eigenvalues (EVA), including transition dipole moments (intensity) or eigen-vector data (directionality of the vibrations). The EVA standardization methodology,therefore, provides a novel means of transforming data. Furthermore, it is conceivablethat descriptor strings derived from different sources, such as these, may be concatenatedin a manner similar to that of the Molecular Shape Analysis method of Dunn et al. [37].

References

1.

2.3.

4.

Hansch, C. and Fujilta, T., ρ- σ- π analysis: A method for the correlation of biological activity andchemical structure, J. Am. Chem. Soc.,86 (1964) 1616–1626. Wiese, M., In Kubinyi, H. (Ed.) 3D QSAR in drug design, ESCOM, Leiden, 1993.Cramer, R.D., Patterson, D.E. and Bunce, J.D. comparative molecular field analysis (CoMFA): 1. Effectof shape on binding of steriods to carrier proteins, J. Am. Chem. Soc., 110 (1988) 5959–5967.Kim, K.H. and Martin. Y.C., Direct prediction of linear free-eneergy substituent effects from 3D struc- tures using comparative molecular-field analysis: 1. Electronic effects of substituted benzoic-acids, J. Org. Chem., 56 (1991) 2723–2729.

396

Page 403: 3D QSAR in Drug Design. Vol2

EVA: A Novel Theoretical Descriptor for QSAR Studies

Klebe, G., Abraham, U. and Mietzner, T., Molecular similarity indexes in a comparative-analysis(CoMSIA) of drug molecules to correlate and predict their biological activity, J. Med. Chem., 37 (1994)4130-4146.Kellogg, G.E., Semus, S.F. and Abraham, D.J., HINT — A new method of empirical hydrophobic field calculation for CoMFA, J. Comput.-Aided Mol. Design, 5 (1991) 545–552.Good. A.C., The calculation of molecular similarity: Alternative formulas, data manipulation andgraphical display, J. Mol. Graph., 10 (1992) 144–151.Good. A.C., Hodgkin, E.E. and Richards. W.G., The utilisation of Gaussian functions for the rapidevaluation of molecular similarity, J. Chem. Inf. Comput. Sci., 32 (1992) 188–191.Thorner, D.A., Wild, D.J ., Willett, P. and Wright, P.M., Similarity searching in files of three-dimensional structures: Flexible field-based searching of MEP, J. Chem. Inf. Comput. Sci., 36 (1996) 900–908. Wagener, M., Sadowski, J. and Gasteiger, J., Autocorrelation of molecular surface properties for model-

Soc., 117 (1995) 7769–7775. Silverman, B .D. and Platt. DE., Comparative molecular moment analysis (CoMMA): 3D QSAR withoutmolecular superposition, J. Med. Chem., 39 (1996) 2129–2140. Clementi, S., Cruciani, G., Riganelli, D. and Valigi, R., In Dean. P.M., Jolles, G. and Newton, C.G.(Eds.) New perspectives in drug design. Academic Press, London, 1995. pp. 285–310. Ferguson, A.M. and Heritage.T.W., Shell Research Ltd. Internal Report. 1990 (not publicly available).Herzberg, G., Molecular Spectra and Molecular Structure: II, Infrared and Raman Spectra PolyatomicMolecules, 8th Ed., D. Van Nostrrand Company Inc., New York, 1945.Ferguson, A M., and Jonathan. P., Shell Research Ltd. Internal Report, 1990 (not publicly available). Lindberg, W ., Persson, J.-A. and Wolds, S., Partial least-squares method for spectroflourimetricanalysis of mixtures of humic acid and ligninsulfonate, Anal. Chem., 55 (1983) 643–648. Waszkowyez, B., Clark. D.E., Frenkel, D., Li, J., Murray, C W., Robson, B. and Westhead. D.R., PROG-LIGAND — an approach to de Novo molecular design: 2. Design of novel molecules from molecular field analysis (MFA) models and pharmacophores, J. Med. Chem., 37 (1994)3994–4002.Weiner, S.J., Kollman, P.A , Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, Jr., S. and Weiner, P., A novel force field for molecular mechanical simulation of nucleic acids and proteins, J . Am. Chem.Soc., 106 (1984) 765–784.Ferguson, A.M., Heritage, T.W., Jonathan. P., Pack, S.E., Phillips. L., Rogan, J. and Snaith, P.J., EVA:A new theoretically based molecular descriptor for use in QSAR/QSPR analysis, J. Comput .-Aided Mol. Design. 11 (1997) 143–152.Cramer III, R.D., BC (DEF) Parameters: 1. The intrinsic dimensionality of intermolecular interactionsin the liquid state, J. Am. Chem. Soc., 102 (1980) 1837–1849.Stewart. J.J.P., Optimisation of Parameters for Semiempirical Methods: 2. Applications, J. Comp.Chem., 10 (1989) 221–264. Stewart. J.J .P., MOPAC: A semiempirical molecular orbital program, J. Comput.-Aided Mol. Design. 4 (1990) 1–105.Turner, D.B., Willett, P., Ferguson, A.M. and Heritage. T.W., Evaluation of a novel infra-red range vibration-based descriptor (EVA) for QSAR studies: 1. General application, J. Comput .-A ided Mol.Design, 11 (1997) 409–422.Turner, D.B., An Evaluation of a novel molecular descriptor (EVA) for QSAR studies and the similaritysearching of chemical structure databases, PhD, thesis, University of Sheffield, 1996.Jonathan. P., McCarthy, W.V. and Roberts, A.M.I., Discriminant analysis with singular covariancematrices: A method incorporating crossvalidation and efficient randomized permutation tests, J. Chemometrica, 10 (1996) 189–214.Waller, C.L. and McKinney, J.D., Comparative molecular field analysis of polyhalogenated dibenzo- p-dioxins, dibenzofurans and biphenyls, J. Med. Chem., 35 (1992) 2660–3666.

27. Carroll, F.I., Gao, Y.G., Rahman, M.A., Abraham, P., Parham, K., Lewin, A.H., Boja, J.W. and Kuhar, M.J., Synthesis, ligand-binding, QSAR and CoMFA study of 3-ß-(para-substituted phenyl)tropane-

5 .

6.

7.

8.

9.

10.ing corticosteriod binding globulin and cytosolic Ah receptor activity by neural networks, J. Am. Chem.

11.

12.

13.14.

15.16.

17.

18.

19.

20.

21.

22.

23 .

24.

25.

26.

2-ß-carboxylic acid methyl-esters, J. Med. Chem., 34 (1991) 2719–2725.

397

Page 404: 3D QSAR in Drug Design. Vol2

Trevor W. Heritage, Allan M. Ferguson, David B. Turner and Peter Willett

28. Allen, M.S., Laloggia, A.J., Dorn. L.J., Matin, M.J., Costantin, G., Hagen, T.J., Koehkr, K.F.,Skolnick, P. and Cook. J M., Predictive binding of ß-carboline inverse agonists and antagonists via theCoMFA/GOLPE approach, J. Med. Chem., 35 (1992) 4001–4010. Greco, G., Novellino, E.. Silipo, C. and Vittoria, A., Comparative molecular-field analysis on aset ofmuscarinic agonists, Quant. Struct.-Act. Relat., 10 (1991) 289–299.Cho, S. and Trophsha, A., Crossvalidated R²-guided region selection for comparative molecular field- analysis: A simple method to achieve consistent results, J. Med. Chem., 38 (1995)1060–1066.Turner. D.B., Willett, P., Ferguson, A.M. and Heritage, T.W., Evaluation of a novel infra-red range vibration-based descriptor (EVA) for QSAR studies: 2. Model validation, J. Med. Chem. (submitted).Kroemer, R.T. and Hecht, P., Replacement of steric 6-12 potential-derived interaction energies by atom-based indicator variables in CoMFA leads to models of higher consistency, J. Comput.-Aided Mol.Design, 9 (1995) 205–212.Heritage, T.W., Shell Research Ltd. Internal Report, 1992 (not publicly available). Wold. S., Johansson, E. and Cocchi, M., PLS — partial laest squares to latent structures, In Kubinyi, H. (Ed.) 3D QSAR in drug design, ESCOM, Leiden, 1993, pp. 523–550. Michelson, K.E., Forsthoefel, J. and Westphal, U., Steroid–protein interactions: Human corticosteroid binding globulin: Some physicochemical properties and binding specificity, Biochemistry, 20 (1981)6211–6218.Ginn, C.M.R., Turner. D.B., Willett, P., Ferguson, A.M. and Heritage. T.W., Similarity searching in files of three-dimensional chemical structures: Evaluation of the EVA descriptor and combination of rank-ings using data fusion, J. Chem. Inf. Comput. Sci. 37 (1997) 23–37. Dunn, W.J., Hopfinger, A.J., Catana, C. and Duraiswami, C., Solution of the conformation andalignment tensors for the binding of trimethoprim and its analysis to dihydrofolate-reductase —3D-quantitative structure–activity study using molecular-shape analysis 3-way partial least-squares regression, and 3-way factor analysis, J. Med. Chem., 39 (1996) 4825–4832.

29.

30.

31.

32.

33.34.

35.

36.

37.

398

Page 405: 3D QSAR in Drug Design. Vol2

Author Index

Alber, F. 169 Liljefors, T. 3Andreoni, W. 161Anzali, S . 273 Marshall, G.R. 35

131 Oprea, T.I. 35Ortiz, A.R. 19

Carloni, P. 169Clark, R.D. 213 Pearlman, R.S. 339Clark, T. 131 Polanski, J. 273Cramer, R.D. 213

Reddy, M.R. 85Erion, M.D. 85 Richards, W.G. 321

Rognan, D. 181Ferguson, A.M. 213, 381

Sadowski, J. 273Gago, F. 19 Smith, K.M. 339Gasteiger, J. 273Ghose, A.K. 253 Teckentrup, A. 273Good, A.C. 321 Thorner, D.A. 301Gramatica, P. 355 Todeschini, R. 355Grootenhuis, P.D.J. 99 Turner, D.B. 381

Harrison, R.W. 115 Viswanadhan, V.N. 85Heritage, T.W. 381Holloway, M.K. 63 Wade, R.C. 19Holzgrabe, U. 273 Wagener, M. 273

Knegtel, R.M.A. 99 Wendoloski, J.J. 253Kubinyi, H. 225 Wild, D.J. 301

Weber, I.T. 115

Willett, P. 301, 381Wright, P.M. 301

H. Kubinyi et al. (eds.), 3D QSAR in Drug Design, Volume 2. 399.© 1998 Kluwer Academic Publishers. Printed in Great Britain.

Beck, B.

Page 406: 3D QSAR in Drug Design. Vol2

This Page Intentionally Left Blank