Molecular Superposition - John Wiley & Sons · CMA29M Molecular Superposition Michael D. Miller...

5
CMA29M Molecular Superposition Michael D. Miller Merck Research Laboratories, West Point, PA, USA 1 Introduction 2 Flexible Superposition 3 Atom Based Methods 4 Field Based Methods 5 Other Methods 6 Conclusions 7 Related Articles 8 References Abbreviations AII D angiotensin II; APA D anilinophenylacetamide; 3D D three-dimensional; MMFF D Merck macromolecular force field; SAR D structure activity relationship; SPERM D super- position by permutation. 1 INTRODUCTION Three-dimensional superposition is an abstract task involv- ing the placement of one object in the space of another. Com- putational techniques that superimpose molecules serve as a focal point for the interpretation and understanding of molec- ular data. Superposition plays an important role in correlating a wide assortment of information (e.g., energetic, physiochem- ical, biological, or pharmacokinetic), as all of these depend on 3D overlap of some physical characteristic. Energetics is dic- tated by a balance of nuclear and electronic forces; solvation and receptor recognition are dependent on the conformation and the discrete interactions a molecule makes with its sur- roundings. Each of these is directly dependent on a molecule’s spatial arrangement if atoms of similar characteristics are placed in the same location, they are expected to effect a sim- ilar result. The means by which molecular superposition is used to interpret this type of information varies almost as much as the data to be interpreted. For methods that describe data in terms of molecular characteristics, atoms, functional groups, or physical attributes, molecular superposition is the component common to all techniques. Techniques such as comparative molecular field analysis and 3D similarity are all founded on molecular superposition. Similarities may be used in com- pound design, compound optimization, and the identification of new compound classes. The resulting superposition sets are often generalized and the consistent spatial information referred to as a pharmacophore. The principal difficulty faced in molecular superposition is one of combinatorics. The num- ber of ways even a pair of molecules may be superimposed is large, making the problem inherently complex. The problem becomes even more difficult when the conformational space of the molecules is considered. As a result, most successful techniques reduce the problem, either by directing the search through heuristics or by using sampling techniques. The scope of this discussion will be restricted to small drug-like molecules. An equally large literature exists for macromolecular systems, although the issues and applications are somewhat different. Superposition techniques may be broadly divided into two classes, those that are atom based, relating atoms or fragments of one molecule to those of another, and those that make use of molecular fields, volumes, or surfaces. Flexibility and conformation must be addressed with each of these techniques. 2 FLEXIBLE SUPERPOSITION 2.1 Manual Approaches When the chemical series being investigated are relatively homologous, reasonable superpositions can be found by twist- ing bonds within a molecule while manually rotating and translating structures on a computer graphics terminal. In Figure 1(a), two potent angiotensin II antagonists are shown. The atoms are colored according to the AII pharmacophore described by Prendergast et al. 1 The two molecules were converted to 3D structures and minimized with the MMFF forcefield available in version 5.5 of the Batchmin program. The molecules were then superposed by twisting the rotat- able (nonrigid/cyclic) bonds, minimizing the distance between pharmacophoric atoms (Figure 1b). The structures were then re-minimized with the MMFF force field and superimposed via a least squares procedure. The resulting superposition is shown in Figure 1(c). Such manual approaches are lim- ited, however. It is difficult for most chemists to find all the alternative superpositions or to make an objective assess- ment of the quality of the superposition. To assist such inter- active methods, feedback and classification techniques have been developed. 2 This form of assistance and automation provides a measure of objective assessment. Other forms of automation address explicit atom superposition by incorporat- ing it into a force field. The distance between corresponding groups in the two molecules is ideally assumed to be zero and deviations from that are expressed as a penalty func- tion, to be minimized along with the valence and nonbonded terms. 3 2.2 Genetic Algorithms Still other approaches utilizing evolutionary techniques or genetic algorithms exist. 5 GENETIC ALGORITHMS require a fitness function which evaluates the quality of a given population and is used to select or construct the membership for the next population. Such evolutionary techniques are probabilistic and therefore only good solutions are obtained. With these techniques there is no way to determine the optimal solution, therefore several runs with varying starting points must be employed. In practice, for the molecular superposition problem, this is not a significant difficulty. Consideration of conformational flexibility limits the number of molecules that may be practically handled. To work with larger collections, fixed conformations of molecules are required.

Transcript of Molecular Superposition - John Wiley & Sons · CMA29M Molecular Superposition Michael D. Miller...

CMA29M

Molecular Superposition

Michael D. MillerMerck Research Laboratories, West Point, PA, USA

1 Introduction2 Flexible Superposition3 Atom Based Methods4 Field Based Methods5 Other Methods6 Conclusions7 Related Articles8 References

Abbreviations

AII D angiotensin II; APAD anilinophenylacetamide; 3DDthree-dimensional; MMFFDMerck macromolecular forcefield; SARD structure activity relationship; SPERMD super-position by permutation.

1 INTRODUCTION

Three-dimensional superposition is an abstract task involv-ing the placement of one object in the space of another. Com-putational techniques that superimpose molecules serve as afocal point for the interpretation and understanding of molec-ular data. Superposition plays an important role in correlatinga wide assortment of information (e.g., energetic, physiochem-ical, biological, or pharmacokinetic), as all of these depend on3D overlap of some physical characteristic. Energetics is dic-tated by a balance of nuclear and electronic forces; solvationand receptor recognition are dependent on the conformationand the discrete interactions a molecule makes with its sur-roundings. Each of these is directly dependent on a molecule’sspatial arrangementif atoms of similar characteristics areplaced in the same location, they are expected to effect a sim-ilar result.

The means by which molecular superposition is used tointerpret this type of information varies almost as much asthe data to be interpreted. For methods that describe data interms of molecular characteristics, atoms, functional groups, orphysical attributes, molecular superposition is the componentcommon to all techniques. Techniques such as comparativemolecular field analysis and 3D similarity are all foundedon molecular superposition. Similarities may be used in com-pound design, compound optimization, and the identificationof new compound classes. The resulting superposition setsare often generalized and the consistent spatial informationreferred to as a pharmacophore. The principal difficulty facedin molecular superposition is one of combinatorics. The num-ber of ways even a pair of molecules may be superimposed islarge, making the problem inherently complex. The problembecomes even more difficult when the conformational space

of the molecules is considered. As a result, most successfultechniques reduce the problem, either by directing the searchthrough heuristics or by using sampling techniques.

The scope of this discussion will be restricted to smalldrug-like molecules. An equally large literature exists formacromolecular systems, although the issues and applicationsare somewhat different.

Superposition techniques may be broadly divided into twoclasses, those that are atom based, relating atoms or fragmentsof one molecule to those of another, and those that makeuse of molecular fields, volumes, or surfaces. Flexibility andconformation must be addressed with each of these techniques.

2 FLEXIBLE SUPERPOSITION

2.1 Manual Approaches

When the chemical series being investigated are relativelyhomologous, reasonable superpositions can be found by twist-ing bonds within a molecule while manually rotating andtranslating structures on a computer graphics terminal. InFigure 1(a), two potent angiotensin II antagonists are shown.The atoms are colored according to the AII pharmacophoredescribed by Prendergast et al.1 The two molecules wereconverted to 3D structures and minimized with the MMFFforcefield available in version 5.5 of the Batchmin program.The molecules were then superposed by twisting the rotat-able (nonrigid/cyclic) bonds, minimizing the distance betweenpharmacophoric atoms (Figure 1b). The structures were thenre-minimized with the MMFF force field and superimposedvia a least squares procedure. The resulting superpositionis shown in Figure 1(c). Such manual approaches are lim-ited, however. It is difficult for most chemists to find allthe alternative superpositions or to make an objective assess-ment of the quality of the superposition. To assist such inter-active methods, feedback and classification techniques havebeen developed.2 This form of assistance and automationprovides a measure of objective assessment. Other forms ofautomation address explicit atom superposition by incorporat-ing it into a force field. The distance between correspondinggroups in the two molecules is ideally assumed to be zeroand deviations from that are expressed as a penalty func-tion, to be minimized along with the valence and nonbondedterms.3

2.2 Genetic Algorithms

Still other approaches utilizing evolutionary techniques orgenetic algorithms exist.5 GENETIC ALGORITHMSrequirea fitness function which evaluates the quality of a givenpopulation and is used to select or construct the membershipfor the next population. Such evolutionary techniques areprobabilistic and therefore only good solutions are obtained.With these techniques there is no way to determine the optimalsolution, therefore several runs with varying starting pointsmust be employed. In practice, for the molecular superpositionproblem, this is not a significant difficulty. Consideration ofconformational flexibility limits the number of molecules thatmay be practically handled. To work with larger collections,fixed conformations of molecules are required.

CMA29M

2 MOLECULAR SUPERPOSITION

Figure 1 Potent AII antagonists.1 Atom pairings are indicated by the colored atom labels. Superposition is achieved by rotating the flexiblebonds to achieve collinearity with identically colored atoms

NH

N NN

O

CH3

O CH3

NH

Cl

Cl

OH2N

CH3

Nevirapine α-APA

Figure 2 Reverse transcriptase inhibitors nevirapine and˛-APA5

3 ATOM BASED METHODS

Efficient techniques have been developed which examine allatom pairs within a molecule. Two relatively small molecules,nevirapine and˛-APA reverse transcriptase inhibitors (seeFigure 2) will be used to examine these techniques. If thedistance between any one atom center or vertex is consideredwith all other centers then this is referred to as a maximallyconnected graph (see Figure 3). Each vertex makesn� 1 con-nections to the other vertices, wheren is the number of atomcorrespondences between the two molecules. This approachhas been applied successfully in programs such as DISCO andSQ.4,6,7 To assist in the superposition problem, physical prop-erties may be mapped to the vertices and edges of the enumer-ated graphs. Therefore, the superposition problem becomes one

Figure 3 Clique for nevirapine and̨ -APA, the tolerance for edgecomparison was 0.5̊A. Atom correspondences are indicated by thecolored labels. The clique formed from these correspondences isshown

of identifying correspondences between the graphs; the largestcorrespondence is referred to as a clique. The correspondencemay then be turned into superpositions by performing a leastsquares fit of the graphs. Atom based correspondences areoften insufficient to describe the data being examined. Toaddress this, atom descriptions may be augmented to reflect thebehavior of an entire group, e.g., ring centroids may representa phobic or aromatic interaction, or carbonyl extensions may

CMA29M

MOLECULAR SUPERPOSITION 3

represent the directional interaction exhibited by lone pairs.

4 FIELD BASED METHODS

Even with the enhancements described above for atombased methods, when the molecules being studied are struc-turally unrelated, the utility of atom correspondences inevitablydiminishes. In addition, it is difficult to seek out structurallydiverse classes of compounds using atomic correspondences.Alternative methods that utilize molecular fields, volumes, orsurfaces have been developed. The most widely used metricof molecular similarity is that of Carbo.8

ZAB.rA, rB,/ D∫ ∫

�A.r1/.r1, r2/�B.r2/dr1dr2 .1/

Originally defined forelectron densities, it has been gener-alized to evaluate the similarities of molecules based on anymolecular field. To facilitate the manner in which the simi-larities are computed, atom centered expansions may be usedto approximate the electron density, electrostatic fields, stericsurface, or volume. Grid based representations have also beenused9 (Figure 4). These techniques work well but are signifi-cantly more demanding computationally than the atom corre-spondence methods. While these techniques are insensitive tochanges in atomic structure and position, they do have diffi-culty differentiating between regions within a molecule. This

Figure 4 Electron densities mapped to the van der Waals surface ofnevirapine and̨ -APA. Red and yellow regions are electron deficient,blue and cyan regions are electron rich

is an important consideration when only portions of a moleculeare responsible for its activity. Superpositions obtained in thismanner may result in misleading solutions. To avoid suchpitfalls, this approach has been extended to consider localsimilarities, thereby producing alternative solutions. One suchapproach involves transforming the electric field from realspace into reciprocal space where the field is representedas atom centered scattering factors.10 These in turn may besummed providing a metric for molecular similarity. A con-sequence of performing this calculation in reciprocal space isthat the rotational degrees of freedom are separated from thetranslational.

5 OTHER METHODS

5.1 Symmetry Based Techniques

Other more approximate techniques exist. Though their rep-resentations of molecular properties do not map directly tomolecular surfaces or volumes, they have been shown to per-form well and are computationally more efficient. The SPERMmethod represents the molecular properties on a fixed set ofcoordinates based on a tessellated icosahedron (Figure 5, redvertices) and the center of its faces (grey vertices).11 Adjust-ing for the molecules’ inherent volume and taking advantageof the symmetry of an icosahedron allows molecules to berapidly superimposed, minimizing the differences between theproperties assigned to the vertices. The principal advantageof this technique is its speed. Molecular similarity computedby this method is fast enough for the evaluation of large col-lections of molecules; furthermore the similarities computedby this approach discriminate well enough to be suitable forsearching structural databases. This technique is not withoutlimitations, most of which lie in the approximations. Prolatemolecules are not as well represented as spherical moleculesand scale is also a difficulty. Small molecules are representedbetter than larger ones as a smaller area is mapped to eachvertex.

5.2 Molecular Skins

Many of the interactions important for recognition orresponse are short range, manifesting themselves on themolecular surface. To represent this, the notion of a molecularskin has been described.12 Taken as the volume obtained bysubtracting a small radius surface from a larger one, molecularalignment is achieved by optimizing skin volume overlap

Figure 5 Stereo view of a tessellated icosahedron. Red spheres arethe vertices of the icosahedron, the gray spheres are the centers ofthe faces (dodecahedron)

CMA29M

4 MOLECULAR SUPERPOSITION

(Figure 6). This method has a distinct advantage over the vol-ume or field based methods, as it works well for moleculesthat are significantly different in size.

5.3 Regional Overlap Techniques

Other methods that work well for molecules of varying sizerepresent a molecule’s properties with overlapping decayingfunctions. Programs like SEAL and SQ use Gaussian func-tions to describe this overlap5,13 (see Figure 7). Unlike thepreviously described use of Gaussian functions to describea volume, these techniques use the Gaussian to compute aregional overlap, weighting the contributions of one molec-ular region with those of another. Both steric and electronicattributes of a molecule are treated this way. The electroniccharacter of a molecule is represented by seven distinct atomtypes in this approach: cationic, anionic, H-bond donor, H-bond acceptor, polar (both donor and acceptor), phobic, andpolarized. Figure 7 shows the highest scoring result for theSQ superposition of nevirapine with̨-APA. The construc-tive regional overlaps for the four contributing atom types areshown in Figure 7(b). The resulting superposition 7(c) is closeto that observed experimentally. As a result, many solutionsmay be obtained by this approach; some will favor steric con-servation while others will have smaller intersecting regionswith a high degree of electronic similarity. Of all the meth-ods described, this approach often generates the largest set ofviable solutions.

One final note, the SQ program is actually a hybrid that uti-lizes both graph theoretic and regional approaches to performmolecular superpositions in a computationally efficient man-ner. Its objective nature and inherent speed make it suitablefor comparing molecules that vary greatly in shape, size, andcomposition and it has been used to search chemical structuredatabases.

6 CONCLUSIONS

The success of techniques utilizing molecular superpositionhas caused a rapid growth in their development and refinement.The techniques for performing rigid molecular superpositions

Figure 6 Cross-section through the molecular skin for nevirapine,the skin thickness is 0.25̊A

Figure 7 SQ regional overlaps for nevirapine and̨-APA. Whiteregions are the phobic areas, red are the donor regions, blue theacceptor regions, and orange the polarized regions

based on geometric characteristics have been well refinedand can be applied to large collections (in the millions) ofmolecules. This trend will surely continue, with innovationsenabling conformationally flexible superposition to be per-formed where only rigid superpositions can currently be done.As better understanding develops, the effect of conformationon other properties will be introduced into the superpositionproblem, opening up new avenues for interpretation and under-standing of these properties.

7 RELATED ARTICLES

Coarse-Grained Searches Over Protein Conformations;Comparative Molecular Field Analysis (CoMFA); Conforma-tional Flexibility in 3D Structure Searching; ConformationalSearch for Medium-Sized Molecules; Structure Databases;Genetic and Evolutionary Algorithms; Measures of Stru-ctural Similarity for Database Searching; Molecular Designand Structure-based Drug Design; Molecular Shape Analy-sis; Structure and Substructure Searching.

8 REFERENCES

1. K. Prendergast, K. Adams, W. L. Greenlee, R. B. Nachbar,A. A. Patchett, and D. J. Underwood,J. Comput. Aided Molec.Design, 1994,8, 491 512.

2. J. Lejeune, A. Michel, and D. P. Vercauteren,J. Comput. Chem.,1986,7, 739 744.

CMA29M

MOLECULAR SUPERPOSITION 5

3. C. McMartin and R. S. Bohacek,J. Comput. Aided Molec.Design, 1995,9, 237 250.

4. D. E. Clark and D. R. Westhead,J. Comput. Aided Molec.Design, 1996,10, 337 358.

5. J. Ren, R. Esnouf, E. Garman, S. Somers, C. Ross, I. Kirby,J. Keeling, G. Darby, Y. Jones, D. Stuart, and D. Stammers,Nature Struct. Biol., 1995,2, 293 304.

6. Y. C. Martin, M. G. Bures, E. A. Danaher, J. DeLazzer, I. Li-co, and P. A. Pavlik,J. Comput. Aided Molec. Design, 1993,7,83 102.

7. M. D. Miller ‘SQ A program for producing rapid molecularsuperimpositions’, ACS 207th National Meeting and Exposition;Computers in Chemistry Section. San Diego California, 1993;M. D. Miller ‘SQ A program for the rapid production ofmultimolecular superposition’, ACS 210th National Meetingand Exposition, Medicinal Chemistry Section, Chicago, Illinois,1995.

8. R. Carbo, L. Leyda, and M. Arnau,Int. J. Quantum Chem.,1980,17, 1185.

9. (a) P. Constans, L. Amat, and R. Carbo-Dorca,J. Comput.Aided Molec. Design, 1996, 10, 827 846; (b) J. Mestres,D. C. Rohrer, and J. M. Maggiora,J. Comput. Chem., 1997,18,934 954; (c) J. A. Grant, M. A. Gallardo, and B. T. Pickup,J. Comput. Chem., 1996,17, 1653 1666; (d) T. D .J. Perkins,J. E. J. Mills, and P. M. Dean,J. Comput. Aided Molec. Design,1995, 9, 479 490; (e) R. B. Hermann and D. K. Herron,J.Comput. Aided Molec. Design, 1991,5, 511 524.

10. J. W. M. Nissink, M. L. Verdonk, J. Kroon, T. Mietzner, andG. Klebe,J. Comput. Chem., 1997,18, 638 645.

11. P. Bladon,J. Mol. Graphics, 1989,7, 130 137.12. B. B. Masek, A. Merchant, and J. B. Matthew,Proteins, 1993,

17, 193 202.13. S. K. Kearsley and G. M. Smith,Tetrahedron Computer Metho-

dology, 1992,3, 615 633.