Docking enzyme‐inhibitor complexes using a preference‐based free‐energy surface



PROTEINS: Structure, Function, and Genetics 25:403-419 (1996)

RESEARCH ARTICLES

Docking Enzyme-Inhibitor Complexes Using a Preference-Based Free-Energy Surface

A. Wallqvist and D. G. Covell

Frederick Cancer Research and Development Center, National Cancer Institute, Science Applications International Corporation, Frederick, Maryland 21702

ABSTRACT We present a docking scheme that uses both a surface complementarity screen and an energetic criterion based on surface area burial. Twenty rigid enzyme/inhibitor complexes with known coordinate sets are arbitrarily separated and reassembled to an average all-atom rms (root mean square) deviation of 1.0 Å from the native complexes. Docking is accomplished by a hierarchical search of geometrically compatible triplets of surface normals on each molecule. A pruned tree of possible bound configurations is built up using successive consideration of larger and larger triplets. The best scoring configurations are then passed through a free-energy screen where the lowest energy member is selected as the predicted native state. The free-energy approximation is derived from observations of surface burial by atom pairs across the interface of known enzyme/inhibitor complexes. The occurrence of specific atom-atom surface burial, for a set of complexes with well-defined secondary structure both in the bound and unbound states, is parameterized to mimic the free energy of binding. The docking procedure guides the inhibitor into its native state using orientation- and distance-dependent functions that reproduce the ideal model of free energies with an average rms deviation of 0.9 kcal/mol. For all systems studied, this docking procedure identifies a single, unique minimum energy configuration that is highly compatible with the native state. © 1996 Wiley-Liss, Inc.

Key words: surface complementarity, macromolecular interactions, HIV-1 protease inhibitor binding

INTRODUCTION

The problem of predicting how biological macromolecules interact with their targets is of fundamental importance. Theoretical studies of structure/activity relationships, protein folding, and enzyme/inhibitor binding all rely on some implicit or explicit assumption about how the free energy of these systems can be characterized.

In understanding how and why inhibitors bind, or dock, to an enzyme, the concept of shape complementarity between mating surfaces is thought to be a powerful tool.1 This rationale is based on observations of close atomic packing within an interface. For rigid molecules, tight interfacial packing is a necessary condition for binding, but it may still not be sufficient to screen out non-native configurations. Thus a second requirement is often imposed, namely, that the configurations also have a locally optimal energetic component. This is usually expressed in a scoring function, either heuristically chosen or based on a suitable potential energy force field.2 These functions seek to mimic the global free energy at the native minima, or at least reproduce its salient features; depending on their ability to accomplish this feat, such functions can be used to select the native state from alternative arrangements.

Although potential energy functions between pairs of atoms and the forces they give rise to are essentially the same in all systems, the free energy considered from a pairwise perspective will always be highly system dependent. This is noted experimentally in the observed differences in hydrogen bond contributions to the stability of different proteins.3-5 To some extent these system-dependent features are useful in that we can parameterize a simplified free-energy functional that will be applicable under a specific set of circumstances. Thus, although the atoms and molecules that are involved in protein folding are the same as those responsible for binding, the same free energy descriptions need not apply, e.g., contact potentials for burial of amino

Received September 25, 1995; revision accepted February 13, 1996.

Address reprint requests to D.G. Covell, Frederick Cancer Research and Development Center, National Cancer Institute, Science Applications International Corporation, Frederick, MD 21702.


acids6 in the core of proteins may not be the same as for amino acids within an enzyme/inhibitor interface. Consequently our previously developed parameterization of a model based on enzyme-atom burial by inhibitor atoms7 may be well suited for determining properly bound configurations. The aim of this paper is to verify that our preference-based, surface-burial parameterization of binding free energies can adequately account for the global free-energy minima of a bound enzyme/inhibitor complex. An additional test of this model is its ability to locate this minimum, in a computationally feasible time, given the unbound molecules. This entails developing a scheme that probes the relevant bound configurations efficiently while discarding local minima.

At the root of most docking studies is the quest to construct de novo molecules that can affect the behavior of target biomolecules and ultimately regulate their function(s).8-16 In such an endeavor a reliable docking scheme is a necessary tool, and accordingly various strategies have been devised to address this problem. The earliest idea about surface complementarity is expressed in the "lock-and-key" concept,17 i.e., the individual free-energy minima corresponding to the shape (or conformation) of the unbound enzyme and inhibitor configurations are also a minimum for the bound complex. Pioneering computer studies focused on interfacial molecular contacts and the predictability of surface complementing shapes.18,19 Simplified contact potentials were also used to probe the energetic aspects of rigid-body docking, as in the bovine pancreatic trypsin inhibitor-trypsin complex.20 Surface recognition was further enhanced by introducing other types of more detailed energy screens, such as hydrogen bond energies21 and specific ion pairings.

Surface matching algorithms have been extended to a variety of systems and applications. Apart from geometrical considerations one can also apply correlation techniques to match surface features of target and ligand.38 Using models of minimal atomic detail, methods have been applied to docking less well-determined structures, i.e., structures at a resolution of ~7 Å.39 A general problem with these methods arises from the lack of sufficient knowledge of the free-energy minimum, and the not uncommon result that geometric screening of possible matches is not sufficient to pick out the native state unambiguously.40 This result is, in part, a sampling problem, in that not enough possible matches are generated to cover the native state, and, in part, a problem of scoring functions that fail to mimic the free-energy minimum near the native state.

Efficient ways of generating many near-native configurations from geometrical considerations alone should improve the likelihood of identifying the native state using a simple potential energy screen.41,42 Similarly, docking fragments with many initial starting configurations may improve the chance of finding the global free-energy minimum.43 If the location of the active site is known, plausible initial docking positions may be generated and followed by energy minimization techniques.44 Using only shape complementarity and electrostatic measures as deciding factors does not circumvent the additional "biochemical knowledge"-input for proper identification of the native state. Further development using a continuum approach to calculate differences in solubility based on electrostatics and hydrophobicities, as well as estimating entropic changes upon binding, must ultimately contribute to selection of native configurations.

Docking structures with a potential energy force field often employs simulated annealing procedures using Monte Carlo or molecular dynamics methods. A novel way of guiding larger biomolecules in docking is to use the tools of molecular dynamics, while controlling motion of different degrees of freedom by separately adjusting the temperature for these motions.

An inherent problem in docking is that neither enzyme nor ligand is actually completely rigid. For smaller ligands that do not have a well-characterized secondary structure in solution, induced structural changes upon binding must be accounted for from an ab initio binding prediction. For the side-chain movements several successful approaches have been used that include flexibility by considering only low energy conformations. Flexible ligands have also been constructed against a target by considering the shape of the target at the active site.49 Predictive docking has also been successful in finding native conformations that were later confirmed by experimental studies, e.g., binding conformations of aconitase substrates were correctly assigned by Goodsell et al.50,51 using unrefined coordinates as starting points, as were the conformations of small flexible peptides bound to the human and murine histocompatibility complex receptors.52

The major advantage of a rigid-molecule approach is the fewer degrees of freedom available to the system. A complete treatment of all degrees of freedom of a polypeptide would entail a solution to the protein folding problem. This is not yet feasible. Disadvantages with rigid approximations are that they cannot predict conformational changes, or evaluate the relative weight of small bond rotations and bends in the complex. Unbound enzymes and inhibitors whose crystal structures are known separately could also be used as starting points for a rigid molecule docking, but if there are conformational changes upon binding, a rigid-based scheme cannot take these into account and poor results may occur. In these instances the inability to find the native bound state can indicate either a failure of the method or that side-chain motion is sufficiently energetically important so as to prevent association. Since our method is designed to be sensitive on an atomic level, not much


can be gained by such an experiment without introducing flexibility. It is also doubtful whether flexible docking can be made sufficiently accurate and fast to screen massive sets of possible drug candidates. Efforts to search for both steric and chemically acceptable targets in known enzymes, using a small molecule database, have, however, been successful in finding correct conformations of targets to mutant influenza virus hemagglutinin.53

The docking problem may also be approached by separately considering constituent parts of ligands. Such probes, or fragments, can be of a reduced set of small functional groups, or molecules, needed to describe chemical functionality. Docking the target molecule with such small rigid fragments may be computationally more efficient than considering all possible conformations of even a small peptide. Flooding an active site with many small fragments can identify the chemical components necessary for binding.54,55 Of course, a direct probe utilizing van der Waals contacts with electrostatic as well as hydrogen bond functions can accomplish the same purpose. Simplified characterizations of surface hydrophobicities have also been used to identify the location of active sites as well as binding domains.61 Fragments can later be reconnected to form complete molecules, either using a theoretical approach,62-65 or using a database search,66 effectively avoiding the problem of flexibility in de novo drug design.

Although it is the ultimate aim of docking studies to be able to design ligands against target molecules, the current state of the art is at the level of generating plausible lead compounds that can further be developed by knowledgeable medicinal chemists. Initial theoretical studies of shape and chemical complementarity coupled with database searches have suggested novel human immunodeficiency virus 1 (HIV-1) protease inhibitors. Even without knowledge of the enzyme target, model studies have been successful in finding micromolar strength inhibitors acting as antiparasitic agents of malaria.72

This paper presents an investigation of the more basic property of reproducing global free-energy minima for a set of enzyme/inhibitor complexes. Parameterization of binding free energies to surface area burial has been shown to reproduce experimental data for a large and varied number of systems, yet such an approach is valid only under specific circumstances and has no easily generalizable analog. At best, this method is correct only for molecular complexes of medium- to small-sized peptides with well-defined secondary and tertiary structures. With this caveat in mind we have used empirically observed frequencies of surface burials by atom pairs across an enzyme/inhibitor interface as our primary scoring screen. A fast method of generating near-native configurations based on geometric surface complementarity considerations allows the energy screening to be done on a restricted set of configurations, and thus the native state can be recovered with a modest computational effort, typically a few hours on a workstation. Even though such a model will not always be able to duplicate the free energy of binding, primarily through absence of knowledge of the unbound states, it is shown below that our description is able to mimic satisfactorily the global free-energy minimum for a diverse set of 20 enzyme/inhibitor systems.

DOCKING PROCEDURES

For two rigid molecules A and B that form a complex in water there is a free energy of association,

$$\text{receptor A} + \text{ligand B} \xrightarrow{\;\Delta G_{bind}(aq)\;} \text{complex AB}. \tag{1}$$

The most stable form of the complex has the lowest free energy, and hence if we knew the complete free-energy surface as a function of the relative orientations of molecules A and B, we could find the optimal configuration by minimizing the free energy on this surface. The fundamental assumption of such a scheme is that the scoring functions used to assess the conformations are proportional to or closely resemble the true free-energy surface at the global free-energy minimum. The currently existing scoring functions are not sufficiently powerful to achieve docking of candidate molecules beginning from randomly assigned positions. The partial solution to this problem, proposed herein, separates docking into two tasks. The first task utilizes only the features of surface geometry to locate complementary patches, which are then used to dock complete molecules. The goal in this first step is to generate a family of trial dockings based solely on surface geometry. The best scoring trial dockings can then be subjected to further improvements, or ruled out as likely possibilities, using an appropriate free-energy-based scoring function. The following sections will describe, in detail, the steps used in these two phases of docking. Geometric procedures will be described first, followed by the steps used for final docking using a free-energy surface.
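The two-task separation described above can be sketched in a few lines. This is only an illustration under stated assumptions: `two_stage_dock`, `geometric_score`, `free_energy`, and the shortlist size `keep` are hypothetical names introduced here, and both scoring callables are placeholders rather than the functions used in this paper.

```python
def two_stage_dock(poses, geometric_score, free_energy, keep=10):
    """Hypothetical two-stage selection.

    Stage 1: rank trial dockings by a geometric score (lower means more
    complementary) and keep only the `keep` best.
    Stage 2: return the shortlisted pose with the lowest free-energy score.
    """
    shortlist = sorted(poses, key=geometric_score)[:keep]
    return min(shortlist, key=free_energy)


# Toy usage: poses are plain integers and both scores are made-up functions.
toy_poses = range(10)
best = two_stage_dock(
    toy_poses,
    geometric_score=lambda p: abs(p - 5),   # geometry prefers pose 5
    free_energy=lambda p: (p - 6) ** 2,     # energy prefers pose 6
    keep=3,
)
```

The point of the design is that the energy function only ever sees the shortlist, so it needs to be reliable near the native state, not over the whole six-dimensional search space.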

Geometric Screening

Energetically stable molecular complexes are characterized by strong surface complementarity; surfaces are usually tightly matched and reflect the effects of short-range interactions found with atoms in close van der Waals contact. A necessary but not always sufficient step toward aligning molecules into a bimolecular complex is to identify portions of each molecule's surface that exhibit geometric complementarity. The large number of degrees of freedom available to each molecule of a complex, which give rise to motions of individual atoms as well as the larger scale displacements of groups of atoms or molecules with respect to each other, limits the practicality of conventional search strategies. To reduce the number of these possibilities, alternative docking methods have been proposed based on rigidly constraining the atoms of each molecule. This condition confines the search space to only three translational and three rotational degrees of freedom. While such a constraint certainly reduces the number of possible arrangements, the computational costs of searching along these six degrees of freedom are still prohibitive, even for most modern high-speed computers.
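To make the cost of an exhaustive six-degree-of-freedom search concrete, the following sketch counts the poses on a regular grid. The function name, the cubic search box, the step sizes, and the Euler-angle style of counting rotations are all assumptions chosen for illustration; none of these numbers come from the paper.

```python
import math


def rigid_body_grid_size(box_length, trans_step, angle_step_deg):
    """Rough count of rigid-body poses on a regular grid (illustrative only).

    Translations: a cubic box sampled every `trans_step` along x, y, z.
    Rotations: two angles sampled over 360 degrees and one over 180 degrees,
    as in a simple Euler-angle scan.
    """
    n_trans = math.ceil(box_length / trans_step) ** 3
    n_rot = math.ceil(360 / angle_step_deg) ** 2 * math.ceil(180 / angle_step_deg)
    return n_trans * n_rot


# Hypothetical settings: 40 A box, 1 A translational step, 10 degree angular step.
n_poses = rigid_body_grid_size(40, 1, 10)   # 64000 translations x 23328 rotations
```

Even these coarse, made-up settings give roughly 1.5 billion poses, each of which would need a full score evaluation, which is why a reduced geometric description is attractive.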

Recently, advantages in docking rigid molecules have been gained by combining fast searching techniques with a reduced set of geometric features of each molecule’s surface. These schemes generally construct manageable-sized lists that characterize selected geometric features of each molecule’s surface and then rapidly scan these lists to identify entries that strongly complement each other. A crude analogy of this method is made by comparing two approaches to the game of fitting pegs into holes. One approach simply tries each peg in each hole until the proper fit is found; such a trial-and-error approach is usually costly and inefficient. An alternative approach is to scan each peg visually for geometric cues that must be matched for a proper fit. This alternative approach limits testing of pegs to holes only with matching geometric features, e.g., triangles into triangles, squares into squares, etc., and constitutes a powerful advantage when searching for the proper fit.

Fig. 1. A two-dimensional example of the proposed geometric docking scheme. A: The set of surface normals determined using the Connolly algorithm. (In the analysis presented here the molecular surface for each molecule is generated using the Connolly algorithm, with a probe radius of 1.4 Å, a surface density of 2 dots/Å², and atomic radii taken from Bondi.) The surface normals relevant to this example are arbitrarily labelled $\vec{v}_{p+1}$ through $\vec{v}_{p+6}$ for molecule A and $\vec{v}_{q+1}$ through $\vec{v}_{q+6}$ for molecule B. $\Omega^A$ and $\Omega^B$ define the sets of surface normals for each molecule. B: For each set of surface normals, $\Omega^A$ and $\Omega^B$, a list of pairwise surface normals, $N^A$ and $N^B$, is generated. Each entry in $N^A$ and $N^B$ consists of a pair of vectors that lie within the specified distance range. The number of vector pairs, k, in each list varies according to the number of surface normals, their density, and the specified distance range. C: Each entry in the set of paired normals for a surface is scored according to how well they match the entries in the set of paired normals on the mating surface [i.e., scored according to how well pairs of normals face in nearly opposite directions, according to Eq. 2, $\sigma = \sum_{i} \cos(\theta_i)$, where $\theta$ is the angle formed for pairs of normals]. In this idealized example the best scoring vector pairs are designated $\Lambda_{mn}$. The complete set of paired normals forms the set $\Lambda$. In this example the best scoring pairs are obtained from the geometrically complementary portions of each surface, located directly across the interface. D: The pairwise sets of surface normals are averaged to yield a single normal vector on each surface. In this example the surface normals used to generate each average vector appear as thin arrows on each side of the average vector, which is designated by the thicker arrow. Each averaged vector, $\langle \vec{v}_{ijk} \rangle$, is paired with its optimal vector on the opposite surface, to form the pairwise entry, $\langle \Lambda_{mn} \rangle$, in the complete set of averaged, paired surface normals, $\Lambda$. E: The process of selecting pairwise sets of surface normals is repeated as described in A above. In this step pairs of surface normals are selected using a higher allowed range of distance separating each normal than was used in the initial step. At this stage of the procedure, the current set of surface normals has been selected so as to be complementary to the opposing surface (but on a smaller distance scale). This second iteration simply asks for pairs of surface normals that are also complementary on a larger distance scale. F: Pairs of surface normals are selected that have the lowest scores according to their orientation in opposite directions (i.e., most complementary). The best scoring pairs are used to dock the complete molecules by simply rms matching each pair of optimally scoring surface normals. In this example the surface normals $\Lambda_3$ and $\Lambda_6$ form this pair (see E). The topmost set of normals are then selected and used to generate a family of trial dockings. Each of these dockings is then submitted for further evaluation using a surface-preference-based free-energy scoring scheme.
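Both the "rms matching" mentioned in the Fig. 1 caption and the all-atom rms deviation quoted in the abstract rest on the same quantity. A minimal sketch of that quantity follows, assuming the two coordinate sets are already expressed in a common frame and listed in the same order; the function name is hypothetical, and no optimal superposition is performed here.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered point sets.

    Assumes both sets are already in the same reference frame; this sketch
    does not search for the best rigid-body superposition.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Toy example: every point is displaced by exactly 1 unit along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rms_deviation(native, docked)
```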

Geometric approaches to matching solid elements can be found in the relatively young field of computational geometry. Applications have been made in the areas of pattern recognition, image processing, geometric modeling, and computational graphics. Research in this field generally involves the study of “shapes” as viewed from the geometric concept of “topology.” Families of shapes are constructed from a finite spatial point set, $\Omega$, in a d-dimensional Euclidean space, $R^d$. The set is $\Omega = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$, where n is the number of points in the set and the points are specified by vectors, $\vec{v}_i$, composed of d coordinates in the right-handed Cartesian coordinate system. While the geometric notion of “shape” has no formal meaning,77 such as is associated with other geometric notions like diameter, volume, etc., it can be thought of as a means to characterize a set of geometric features that identify an object, features such as local surface curvature.

Concepts developed in computational geometry can be used to identify portions of two surfaces that exhibit strong geometric complementarity. The analysis begins by constructing a list of rotationally and translationally invariant features, $\Omega$, associated with each member of a bimolecular complex. This list is constructed from the set of normals to discrete surface patches obtained when constructing each molecule’s surface. An example of the steps used here for geometric docking is illustrated in Figure 1 for a simplified two-dimensional case. Each surface normal, $\vec{v}_i$, is a vector in space and forms the set of spatial points, $\Omega$, for each molecule of a binding pair (Fig. 1A). The actual position and direction of surface normals on each list need not be specified; only their relative values are required for each molecule.
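The remark that only relative values are needed can be checked numerically: the distance between the base points of two normals and the angle between the normals do not change when the whole molecule is rotated and translated. The sketch below is illustrative only; the helper names and the choice of a rotation about the z axis are assumptions made for this example.

```python
import math


def rotate_z(v, angle):
    """Rotate a 3-vector about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def translate(p, shift):
    """Shift a point by a displacement vector."""
    return tuple(a + b for a, b in zip(p, shift))


def pair_features(p1, n1, p2, n2):
    """Relative features of two surface normals: base-point distance and
    cosine of the angle between the (unit) normals."""
    distance = math.dist(p1, p2)
    cos_angle = sum(a * b for a, b in zip(n1, n2))
    return distance, cos_angle


# Two made-up surface dots with unit normals.
p1, n1 = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
p2, n2 = (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
before = pair_features(p1, n1, p2, n2)

# Apply the same rigid motion to everything: points are rotated and shifted,
# normals are only rotated.
angle, shift = 0.7, (3.0, -2.0, 5.0)
q1, q2 = (translate(rotate_z(p, angle), shift) for p in (p1, p2))
m1, m2 = rotate_z(n1, angle), rotate_z(n2, angle)
after = pair_features(q1, m1, q2, m2)
```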

The object of this step in the process is to generate a set of trial dockings, with strong geometric com- plementarity, that can be passed on for evaluation and further docking using the free-energy-based model of binding. In principal, the trial sets of “docked” arrangements can be obtained by directly matching an appropriate subset of elements in the constructed list of translationally and rotationally invariant features, R. The coordinates of the com- plete molecule are positioned, rigidly, according to the match of only this subset of elements in R. To achieve a match in three dimensions (d = 3) re-

+ +

Page 6: Docking enzyme‐inhibitor complexes using a preference‐based free‐energy surface

408 A. WALLQVIST AND D.G. COVELL

quires a minimum of three points. Consequently, triples are assembled from the list of surface nor- mals, R, of each molecule. The large number of tri- ples possible from a list of n normals ( = n!/3!(n - 3)!) requires further criteria to reduce the length of each list. Such reduction is achieved by selecting triplets of normals, v Gk = Gi, u j , u k } , with a relative spacing that reflects the density of surface normals (Fig. 1B). Here the subscripts i , j , and K specify three different surface normals from the set of normals, s1. The surface density of dots results in an average nearest neighbor distance between normal vectors of 0.6 * 0.5 A. Selecting triples to sample this density uniformly is thus accomplished by choosing only triplets of points with a separation value slightly below this lower limit and an upper limit for sepa- ration of 2 standard deviations above this. In other words, the complete set of triplets is constructed from the list of n surface normals by simply select- ing triads of normals having edge lengths within the accepted range of 0.6-1.6 A. The vector of m triples determined for each molecule will be referred to as

The lists of triplets, N, provide the geometric de- tails necessary to find optimal matches. In other words, each triple characterizes the “shape” of a re- gion on the surface, as defined by the relative loca- tion and orientation of triples of surface normals. Identifying those shapes that are most complemen- tary on each molecule’s list is based on how well these geometric features match. The idea here is to locate the pairs of triples from each list that have their surface normals mostly aligned in opposite di- rections (Fig. 1C). The relative positions of each tri- ple cannot be known a priori. It is assumed that their geometric features should be evaluated as though they were ideally “docked” to each other. Matches are therefore judged by aligning, in oppo- site directions, the averaged unit normal vectors for each triangle* and then calculating the three an- gles, {&, O,, O,}, formed between normal vectors of each triplet. All painvise combinations of the three members of each triplet set are evaluated, and only normals from triangles with similar perimeter lengths are measured. Pairs of triplets with the low- est scores for o,, defined as

‘3 + +

* t , c, N = { V ijk,, IJ i jkp . . . 9 ijk,}.

are cataloged into a set Y. Pairings are identified according to their location within each list (e.g., tri- ple m of one list is optimally matched with triple n of the other list). These pairings are exclusive, such that the mth and nth entries are removed from each list, before searching for the next optimal pair. This

*The average unit normal vector is simply the vector aver- age of the three normals of each triplet, scaled to a unit vector.

step is equivalent to sorting through two sets of comparably “sized” triangles (triples of surface normals) to find which pairs can be most closely aligned such that their normals are pointing directly away from each other. This step yields a list of pairs of matching shapes for molecular surfaces A and B, referred to as

$$\Lambda = \{N^{A}_{ijk}, N^{B}_{ijk}\}.$$

The alignments used to build the pairwise list, $\Lambda$, can also be used to dock complete molecules. The “best” dockings would naturally be based on the triplet pairs with the lowest scores of $\sigma$. Dockings based on any single pair of aligned triangles might be successful, but in our hands are not powerful enough to yield acceptable results. This can be expected, since a perfect match of these two small surface patches would be necessary to expect proper alignment of complete molecules. An alternative is to use the current set of optimal pairings to “build” another set of surface features. This new set, a, comprises averaged geometric features of each underlying triplet in the set, as well as the identity of the matching pair for each molecule,

$$a = \{\langle\nu_1\rangle, \langle\nu_2\rangle, \ldots, \langle\nu_s\rangle\}. \qquad (3)$$

This newly derived list is then used to generate a next set of surface triples, $\langle\nu_{ijk}\rangle$, using longer distances than were used in the previous step to select triads (Fig. 1D). The process of identifying optimal matches in this set is repeated as before, with the restriction that comparisons be made only among the pairings, {mn}, identified in the first pass (Fig. 1E). At the completion of this second iteration, the best scoring matches are then used to align complete molecules (Fig. 1F). The advantage of each additional iteration is a threefold greater number of optimal spatial contacts for docking. In practice, only one iteration was needed to yield a list of trial dockings that always included an arrangement sufficiently close to the correct one to begin the phase of docking based on the free-energy binding model. The CPU requirements for a search depend on the number of surface normals on each structure. For the analyses presented here, searches with list lengths of $\sim 10^5$ normals typically require 3 CPU hours on a DEC-alpha model 3000/800 workstation.
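The averaged unit normal used when judging matches, and the test of whether two triplets point in opposite directions, can be illustrated with a short Python sketch. This is only a sketch under stated assumptions: the function names and the example normals are hypothetical, and the rigid-body superposition of the two triangles is not included.

```python
import math

def unit(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def average_unit_normal(normals):
    """Vector average of the normals of one triplet, scaled to a unit vector."""
    summed = [sum(n[k] for n in normals) for k in range(3)]
    return unit(summed)

def angle_between(u, v):
    """Angle in radians between two vectors (clamped against round-off)."""
    dot = sum(a * b for a, b in zip(unit(u), unit(v)))
    return math.acos(max(-1.0, min(1.0, dot)))

# Hypothetical triplets of surface normals on molecules A and B.
triplet_a = [unit((0.0, 0.0, 1.0)), unit((0.1, 0.0, 1.0)), unit((0.0, 0.1, 1.0))]
triplet_b = [tuple(-c for c in n) for n in triplet_a]  # exactly opposed normals

mean_a = average_unit_normal(triplet_a)
mean_b = average_unit_normal(triplet_b)

# An angle of pi between the averaged normals means they point directly
# away from each other, the ideal case for mating surface patches.
opposition = angle_between(mean_a, mean_b)
```

In an actual search this measure would be evaluated only for triangles of similar perimeter, as described in the text.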

Energy Considerations

The previously developed method of parameterizing jointly buried surface areas to a free-energy model of binding is not directly applicable for evaluating different conformations of the complex AB, since all distance dependencies were incorporated through a single cutoff value. In the following, a simple van der Waals scoring function is chosen that takes into account a more varied distribution of possible distances. Although this function does not include an $r^{-1}$ long-range component, it is sufficient to guide the ligand into its native configuration based on the initial geometry-sampled group of trial conformations. As before, explicit water molecules are not taken into account. Water is present implicitly, as we are parameterizing the preferences to mimic $\Delta G_{bind}(aq)$, in essence taking account of the desolvation of the jointly buried surface area between molecules A and B.

The criterion for which surface element $a_i$ on molecule A overlaps with which surface element $a_j$ on molecule B is slightly different from that used in our previous work.⁷ For the free-energy calculations described in Wallqvist et al.⁷ all surface elements of molecules A and B that are within 2.8 Å of each other were gathered, and then sorted according to the distance between each pair of surface patches. Once the closest pair has been established, this pair of surface elements is removed from consideration and the search is repeated for another minimum distance until all selected surface elements on one molecule are exhausted. This uniquely dissects the binary interface into pairwise overlapping surface elements. These surface patches are now evaluated as to their alignment, according to their distance, as well as to whether they are overlapping or not.
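The closest-pair dissection of the interface can be sketched as a greedy matching. The Python below is a minimal illustration, assuming surface elements are given as Cartesian points; the function name and the example coordinates are hypothetical.

```python
import math

def dissect_interface(elements_a, elements_b, cutoff=2.8):
    """Pair surface elements of A and B, closest pairs first.

    Every pair within `cutoff` (in angstroms) is gathered and sorted by
    distance; each element is then used at most once, so the interface is
    split into unique pairwise overlapping surface elements.
    """
    close = []
    for i, point_a in enumerate(elements_a):
        for j, point_b in enumerate(elements_b):
            distance = math.dist(point_a, point_b)
            if distance <= cutoff:
                close.append((distance, i, j))
    close.sort()

    used_a, used_b, pairs = set(), set(), []
    for distance, i, j in close:
        if i in used_a or j in used_b:
            continue  # one partner was already matched to a closer element
        pairs.append((i, j, distance))
        used_a.add(i)
        used_b.add(j)
    return pairs

# Hypothetical surface-element positions (angstroms).
surface_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
surface_b = [(0.0, 0.0, 1.0), (3.0, 0.0, 2.0), (0.0, 0.0, 2.5), (10.0, 0.0, 0.0)]
pairs = dissect_interface(surface_a, surface_b)
```

Sorting all candidate pairs once and skipping already-used elements gives the same result as repeatedly searching for the minimum distance and removing the matched pair.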

If $\theta_{ij}$ is the angle between surface normal i belonging to atom I of molecule A and the surface normal j on atom J of molecule B, then the alignment score is

$$S^{\theta}_{ij} = e^{-(\theta_{ij}/d\theta)^{2}} \qquad (4)$$

where $d\theta$ = 1.170 rad. Once all surfaces of non-overlapping atoms between molecules A and B have been ordered, a set of surface patches $a_i$ on atom I in molecule A is matched with a similar set of surface elements $a_j$ on atom J in molecule B. The distance between two such surface elements is denoted $r_{ij}$, and the van der Waals size of the corresponding atom pair is given as $\sigma_{IJ} = \sigma_I + \sigma_J$. The distance score is then evaluated from a shifted Lennard-Jones interaction as

$$S^{r}_{ij} = 4\left[\left(\frac{\sigma_{IJ}}{r_{ij}+\sigma_{IJ}}\right)^{6} - \left(\frac{\sigma_{IJ}}{r_{ij}+\sigma_{IJ}}\right)^{12}\right] \qquad (5)$$

This function goes to zero as the distance between pairs of surface elements, $r_{ij}$, goes to zero. The optimal distance score is achieved when $r_{ij} = \sigma_{IJ}(2^{1/6} - 1)$. The score of the binary surface coverage is then calculated as

†In the analysis presented here the molecular surface for each molecule is generated using the Connolly algorithm,⁷⁸ with a probe radius of 1.4 Å, a surface density of 2 dots/Å², and atomic radii taken from Bondi.⁷⁹ Atoms belonging to molecule A are denoted I, a surface element i on this atom has a surface area of $a_i$, and a surface normal belonging to this surface is also carried along; similarly atoms in molecule B are denoted J and a surface element j on this atom has a surface area of $a_j$.

$$S_{ij} = S^{\theta}_{ij}\, S^{r}_{ij} \qquad (6)$$

where $0 \le S_{ij} \le 1$. The total weight of the free-energy contribution of these surface patches is

$$\delta G_{ij} = a_{ij}\, S_{ij}\, c\, P_{IJ} \qquad (7)$$

where the averaged area contributions, $a_{ij}$, from each segment are biased with the preference of observing such a surface patch burial, $P_{IJ}$, to determine the score. The constant, c, and the calculated preferences, $P_{IJ}$, are obtained self-consistently for the above scoring scheme using the procedures described previously,⁷ and are detailed in the Appendix.
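The way the alignment and distance scores combine into a patch contribution can be sketched in Python. This is not the authors' code. In particular, the distance score below assumes a shifted 12-6 Lennard-Jones form chosen only because it satisfies the properties stated in the text (zero at zero separation, optimum at $r = \sigma(2^{1/6} - 1)$, values between 0 and 1). All function names and the numerical inputs in the example are hypothetical.

```python
import math

D_THETA = 1.170  # rad, width of the alignment score as quoted in the text

def alignment_score(theta):
    """Gaussian-like score of the angle theta (rad) between two surface normals."""
    return math.exp(-((theta / D_THETA) ** 2))

def distance_score(r, sigma):
    """Assumed shifted Lennard-Jones score for elements separated by r.

    sigma is the summed van der Waals size of the atom pair. The score is
    0 at r = 0 and peaks at 1 when r = sigma * (2**(1/6) - 1).
    """
    x = sigma / (r + sigma)
    return 4.0 * (x ** 6 - x ** 12)

def patch_free_energy(area, theta, r, sigma, c, preference):
    """Contribution of one pair of surface patches: area * S * c * P."""
    return area * alignment_score(theta) * distance_score(r, sigma) * c * preference

sigma = 3.4                                # hypothetical summed radii (angstroms)
r_opt = sigma * (2.0 ** (1.0 / 6.0) - 1.0)  # separation giving the best distance score
best = distance_score(r_opt, sigma)
example = patch_free_energy(area=0.5, theta=0.2, r=r_opt, sigma=sigma,
                            c=1.0, preference=-0.8)
```

Because both factors lie between 0 and 1, their product can only reduce the magnitude of the area-weighted preference, never amplify it.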

For penetrating atoms a slightly different approach is used to score the surface patches involved. A surface element, $a_i$, is overlapping with an atom J if it is closer than $\sigma_J$ Å to that atom. In this case a penalty is assessed that depends on the atom-atom distance, $r_{IJ}$, of the involved surface patches,

$$S^{r}_{ij} = \left[\left(\frac{\sigma_{IJ}}{r_{IJ}+\sigma_{IJ}}\right)^{6} - \left(\frac{1}{2}\right)^{6}\right] \qquad (8)$$

where the contribution to the free-energy score is again evaluated as in Equation 7. This function is zero for atomic spheres that are just touching and has a finite value for completely overlapping spheres. This penalty is chosen to be weak so that the total energy surface is as smooth as possible. As the orientations of the surface elements are irrelevant for penetrating atoms, $S^{\theta}_{ij}$ is set to unity.

The total free-energy estimate used for the complex is evaluated as

$$\Delta G_{AB} = \sum_{i \in A,\; j \in B} \delta G_{ij} + S \qquad (9)$$

where the summation runs over all selected atom pairs and a crude approximation of the cratic entropy is given by the constant S, set to 2.39 kcal/mol. No further account of side-chain or backbone entropies is taken. The limitations and applicability of this representation to mimic the true free energy of binding are discussed in our earlier work.⁷ It is this final free-energy estimate that is used to distinguish between structures, the lowest $\Delta G_{AB}$ being our best approximation to the native state. A flow chart outlining these steps is shown in Figure 2.

A typical calculation of $\Delta G_{AB}$ for one configuration takes about 5 CPU minutes on a DEC-alpha model 3000/800 workstation.
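The final bookkeeping step, summing the patch contributions, adding the constant cratic term, and selecting the lowest value, can be sketched as follows. The patch contributions and configuration labels are invented for illustration, and the function names are hypothetical.

```python
CRATIC_ENTROPY = 2.39  # kcal/mol, the constant S quoted in the text

def total_free_energy(patch_contributions, cratic=CRATIC_ENTROPY):
    """Sum of all selected patch contributions plus the constant entropy term."""
    return sum(patch_contributions) + cratic

def select_predicted_native(scores):
    """Return the label of the configuration with the lowest free-energy estimate."""
    return min(scores, key=lambda label: scores[label])

# Hypothetical patch contributions (kcal/mol) for two trial configurations;
# positive entries stand for weak penalties from penetrating atoms.
trial_patches = {
    "config_1": [-3.0, -4.5, -2.0, 0.5],
    "config_2": [-1.0, -2.5, 0.25],
}
scores = {label: total_free_energy(p) for label, p in trial_patches.items()}
predicted = select_predicted_native(scores)
```

Since the entropy term is the same constant for every configuration, it shifts the absolute values but does not change which configuration is ranked lowest.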

Generation and Evaluation of Trial Configurations

Trial configurations are generated using the geometric complementarity screen described earlier. This phase of the analysis produces a list of “dockings” sorted from best to worst according to the quality of their locally derived geometric fit. Initially each molecule is randomly positioned, and its molecular surface is calculated. In all cases presented here, the complete set of surface normals of the “ligand” (the smaller molecule of the binding pair) is submitted for geometric analysis. The size of many of the target molecules considered here results in numbers of normal vectors too large for complete analysis within reasonable computational time (e.g., less than 5 CPU hours on a DEC-alpha model 3000/800 workstation). Accordingly, a smaller portion of these target surfaces was analyzed, usually between 40 and 50% of the total (i.e., between 3,000 and 5,000 Å²). The actual ligand attachment site was always included within this portion of the surface. Previous studies indicate that simple measures of hydrophobicity can be used to identify candidate ligand attachment sites. These test sites generally cover ~10-20% of the complete surface and, for very large target molecules, could identify subsets of the total surface for subsequent geometric analysis. Such an approach was not implemented here. Instead, a large surface around the target site was considered.

[Figure 2 flowchart, in order: top $m_1$ configurations from geometry screen; calculate $\Delta G_{AB}$ for each of these configurations; retain lowest energy configurations; translate to remove vdW overlaps (Eqn. 10); rotations and translations, select lowest energy configurations; Monte Carlo simulations, select lowest energy configurations; select docked configuration as lowest free-energy complex.]

Fig. 2. Flowchart of the steps to determine the final docked configuration. The trial dockings obtained from the geometric screen serve as input to this final procedure.

A typical geometric scan generates a list of ~1,000-2,000 trial dockings. Each arrangement is scored according to the geometric criteria for matching triads and sorted according to these values. The top 50 scored trial dockings serve as input for evaluation using the preference-based free-energy model of binding. The average distance of all-atom matches between these trial structures and their correctly positioned structures ranged from a low of 2 Å to values in excess of 30 Å. As expected, these results indicate that the geometric scan simply identifies strongly complementary portions of each surface, which, when each molecule is docked only on this basis, can yield acceptable as well as quite poor results. As will be discussed in the next section, the poorer structures can usually be ruled out on the basis of a simple energetic calculation. If more than 50 trial dockings are considered, the quality of the best fit usually improved, but was never lower than ~1.0 Å.

The initial geometry scan gives rise to a spectrum of scores that are grouped together by clustering structures that do not differ from each other by a root mean square (rms) deviation greater than 4 Å. In Figure 3 we show this reduction of configurations by filtering out similar structures while keeping only the lowest energy member of each cluster. From this reduced set of initial scanning points it may still be possible to find false positives, i.e., structures that initially score high but are not related to the native state. Typically the reduced set will consist of 1-3 top scorers that are included in the subsequent minimization procedure, so as to be sure that no spurious minima are selected. So far no set of non-native configurations has been scored higher than native-like states. This is a necessary condition if we are to use the free-energy scores as our sole criterion for selecting the native state.

The final search for a minimum is conducted in three stages: 1) removal of grossly interpenetrating atom-spheres; 2) translations and rotations on a small grid; and 3) a final search using a low-temperature Monte Carlo simulation. The details of these steps are described in the next three sections. The configuration with the lowest $\Delta G_{AB}$ is selected as the predicted native state. Typically this procedure requires about 300 function evaluations.
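The reduction of scanned configurations to one representative per cluster can be sketched in Python. The text does not state which clustering algorithm was used, so the sketch assumes a simple "leader" scheme: configurations are visited in order of increasing energy, and one is kept only if it lies more than 4 Å rms from every representative already kept. The function names and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """All-atom rms deviation between two equally ordered coordinate lists."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def cluster_lowest_energy(configs, cutoff=4.0):
    """Keep the lowest-energy member of each cluster of similar structures.

    configs is a list of (energy, coords) tuples. Structures within
    `cutoff` angstroms rms of an already kept representative are dropped.
    """
    representatives = []
    for energy, coords in sorted(configs, key=lambda item: item[0]):
        if all(rmsd(coords, kept) > cutoff for _, kept in representatives):
            representatives.append((energy, coords))
    return representatives

# Toy two-atom "ligand" in three placements (coordinates in angstroms).
base = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]    # 1 A away from base
far = [(20.0, 0.0, 0.0), (21.5, 0.0, 0.0)]   # 20 A away from base

configs = [(-11.0, near), (-12.0, base), (-6.0, far)]
kept = cluster_lowest_energy(configs)
kept_energies = [energy for energy, _ in kept]
```

Visiting the structures in energy order guarantees that the member retained for each cluster is its lowest-energy one.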

Overlap Removal

If initially a substantial number (i.e., more than 50%) of all surface-surface overlaps are interpenetrating, it is prudent to do a set of translations removing the bulk of these repulsive interactions. By evaluation of the repulsive score using Equations 8 and 7, a weight is assigned to each penetration. By collecting all N vectors $\vec{r}_{ij}$ between such penetrating surface pairs, an average optimal direction, $T_{dir}$, can be constructed as a weighted average of these vectors (Equation 10), where the weights $w_{ij}$ are measured as $S^{r}_{ij} \times a_{ij} r_{ij}$. A translation in the direction of $T_{dir}$ is then performed and the process is repeated until the number of interpenetrating pairs is about 10%; typically three iterations are satisfactory.

[Figure 3: two panels, “Scanned” (top) and “Clustered” (bottom); horizontal axis $\Delta G$ (kcal/mol).]

Fig. 3. Energy spectrum of a geometry-based scan using HIV-1 protease as enzyme and the PD134922 inhibitor. In the top panel each individual configuration is indicated by one line on the energy scale. In the bottom panel only the best scoring member of a clustered set is indicated. In this case we can reduce the 50 initially generated configurations to only 7 truly unique configurations. Given the energy gap between the two lowest free-energy structures and the others, only these two would be passed on to the free-energy screen. The native state has an ideal binding energy of -11.4 kcal/mol in this case.

Grid Search

A set of grid translations and rotations is performed to get a rough picture of the surrounding free-energy surface. These movements are along or around the principal axes of the ligand molecules. Initially a rotational grid of -n to +n rotations of 0.18 rad around each of the three principal axes is performed. The structure with the lowest free energy is then translated by -m to +m steps of 0.75 Å on a three-dimensional cubic grid with each axis corresponding to a principal axis. Again the member with the lowest free energy is selected for a final rotational search, but with a reduced angular step of 0.09 rad. Both n and m are set to 2, as larger grid searches invariably lead to structures that are too far away from any local minima.

Monte Carlo Simulation

The best structure in terms of lowest free energy from the above searches is then subjected to a low-temperature Monte Carlo simulation to explore and seek out the nearest local minima. The purpose of the Monte Carlo simulation is to perform a local minimization of the free-energy surface. Configurations are sampled using a Metropolis scheme.⁸² Rotational movements are generated by randomly selecting points on the ligand surface and rotating around an axis extending from the center of mass to the chosen surface point.⁸³ Temperatures were set to 150 K; 100 configurations are generated using step sizes adjusted to give an acceptance ratio of ~0.4. For large ligands that have only a small portion of their surface in contact with the enzyme, it is prudent to bias the rotational sampling to only those atoms that are close to the enzyme, i.e., residues across the interface whose atoms are within 5.6 Å.
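The Metropolis acceptance rule behind the low-temperature search can be illustrated with a small Python sketch. It is not the procedure used in this work: the moves here are generic perturbations of a coordinate vector rather than rigid-body rotations about surface points, the energy function is a toy quadratic bowl, and the names `metropolis` and `toy_energy` are hypothetical. Only the temperature (150 K) and the number of generated configurations (100) are taken from the text.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def metropolis(energy_fn, start, temperature=150.0, n_steps=100,
               step_size=0.1, seed=0):
    """Low-temperature Metropolis walk; returns (best_state, best_energy, acceptance)."""
    rng = random.Random(seed)
    beta = 1.0 / (K_B * temperature)
    current = list(start)
    e_current = energy_fn(current)
    best, e_best = list(current), e_current
    accepted = 0
    for _ in range(n_steps):
        trial = [x + rng.uniform(-step_size, step_size) for x in current]
        e_trial = energy_fn(trial)
        delta = e_trial - e_current
        # Downhill moves are always accepted; uphill moves are accepted
        # with Boltzmann probability exp(-delta / kT).
        if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
            current, e_current = trial, e_trial
            accepted += 1
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best, accepted / n_steps

def toy_energy(state):
    """Stand-in for the free-energy score: a bowl with its minimum at the origin."""
    return sum(x * x for x in state) - 10.0

start = [1.0, -0.5, 0.3]
best_state, best_energy, acceptance = metropolis(toy_energy, start)
```

In practice `step_size` would be tuned until the returned acceptance ratio is close to the target value of about 0.4 mentioned above.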

An example of this guided search process is given in Figure 4, where the calculated rms deviation from the native state is plotted as a function of the calculated $\Delta G_{AB}$ for different stages during the search. In the proposed scheme a broad range of compatible surfaces between the enzyme and its inhibitor is scanned, in this case HIV-1 protease complexed with Ro-31-8959. The points with the largest deviations from the native state correspond to the ligand not being in the active site, but located in contact with the outer surface of the enzyme. Empirically we find that if the surface scan does not select the proposed structures with their proper surface atoms within an rms deviation of 5-6 Å from the native state, no benefit is derived from the initial surface alignment scheme. The grid and Monte Carlo searches in Figure 4 are shown to bring the inhibitor smoothly into its native configuration.

[Figure 4: scatter plot of rms deviation (Å) against $\Delta G$ (kcal/mol); legend: Surface Scan, Grid/MC Search, End Points of Intermediate Search.]

Fig. 4. The rms deviation of the generated inhibitor configurations as a function of $\Delta G$. The end points of the surface scan, grid, and Monte Carlo simulations are indicated as E(1), E(2), and E(3), respectively. In this case the predicted native configuration has an rms deviation of 0.5 Å with a concomitant free energy that is -0.4 kcal/mol deeper than the ideal native-state free energy.

RESULTS AND DISCUSSION

The results of the docking procedure outlined above are summarized in Table I. Since the method will always give one best answer, no additional criteria need to be imposed to select the predicted native state. As is evident, a state that is lower in energy than the native state is sometimes found. This occurrence of more strongly bound configurations reflects the limitation of exactly representing the global free-energy minimum via the pseudo-potentials in Equations 4-9. In energetic terms this discrepancy amounts to a modest average deviation of 0.9 kcal/mol between the native and predicted values of $\Delta G_{AB}$. Combined with the mean structural deviation of 1.0 Å, these results are quite acceptable. One must also bear in mind that our native-state $\Delta G_{AB}$ is still only an approximation to the experimental $\Delta G_{bind}$ values.⁷

On average the deviation of predicted $\Delta G$ values from native is -0.3 kcal/mol. The absolute deviation ranges from 0.0 to 2.5 kcal/mol, the largest being for myoglobin complexed with its heme group (4MBN). It is possible that a further refinement of the energy scoring function can improve the average deviation, yet this deviation is smaller than the predicted reliability of the free energies themselves. The range of rms deviations is between 0.3 and 2.9 Å. Two structures with the greatest deviations, chymotrypsin-trypsin inhibitor (1CHO) and the carboxypeptidase complex (4CPA), have comparatively large inhibitors, where the bulk of the deviation is a result of mismatched atoms far from the enzyme/inhibitor interface. In Figure 5 the correlation between the number of atoms in the ligand and the final predicted deviations from the native structure is given for the complexes studied in Table I. All the small inhibitors (less than 200 atoms) are docked within 1 Å, whereas there is a slight correlation with larger inhibitors being docked further away from their native states. The energy difference between the native and predicted $\Delta G_{AB}$ does not correlate with the size of the inhibitor.

In the case of the protease inhibitor A-74704,⁸⁴ we elicited, using the geometric search process described above, a possible non-native strong binding mode. Still, the strongest binding complex was that of a native-like configuration with a deviation of 0.4 Å from the native configuration and a binding energy of -13.3 kcal/mol. The second top scoring candidate, with a binding energy of -12.8 kcal/mol but an rms deviation of 12.6 Å, was the C₂-symmetric inverted analog of the native state. A-74704 consists basically of four benzene rings distributed symmetrically in the P1/P1′ and P3/P3′ pockets of the enzyme. An inversion of the native structure coordinates finds the correct pockets, but the orientations of the benzene rings in the P3/P3′ pockets are skewed, preventing the full recovery of the native-state binding energy. In Figure 6 the agreement of the docked structure with the native ligand is evident. The general features of the inverted molecules agree well with the native structure in the central portion of the molecules. It is possible that inclusion of flexibility in the ligand could recover the correct orientations of the benzene rings. At the very least the proposed method differentiates between these conformations, using only rigid ligands.

[Figure 5: scatter plot; vertical axis rms deviation (Å), horizontal axis number of heavy atoms.]

Fig. 5. The rms deviation of docked structures with respect to their native configurations as a function of heavy atoms in the ligand.

Fig. 6. A stereo picture of the A-74704 ligand with the docked structure. The high-scoring inverted ligand is also shown. The middle part of these molecules is virtually indistinguishable, whereas a more marked difference is noted in the orientation of the benzene rings at the ends. The rms deviation of the inverted molecule is 12.6 Å; however, inverting the atom labels for a true comparison in structural differences yields an rms deviation of only 2.4 Å. The docked native-like ligand is the highest scoring structure of all generated A-74704 configurations.

The probe configurations and their associated free-energy scores can be projected onto the rms deviations of each inhibitor configuration, referenced to the native state, so as to provide a sketch of the free-energy surface. Figure 7 plots these values for a number of initial trial dockings used to probe the hypersurface of the trypsinogen/trypsin inhibitor system. The point with the lowest $\Delta G$ value for a given rms deviation represents the minimum or optimal energy of the inhibitor for that set of coordinates. Although the surface appears to be well behaved, with only small barriers between local minima, the barriers in coordinate space, representing the actual path between two minimum rms values, may be much larger.

TABLE I. Docking Results*

PDB code†   Enzyme/inhibitor                    ΔG_AB native   ΔG_AB predicted   rms deviation
                                                (kcal/mol)     (kcal/mol)        (Å)
1FDL        Lysozyme antibody complex           -8.1           -8.4              1.3
1CHO        Chymotrypsin w/ inhibitor           -13.0          -11.8             2.9
2CLR        Histocompatibility antigen          -16.3          -16.3             0.6
2CPK        Protein kinase w/ inhibitor         -16.6          -18.2             0.9
2IGF        IGG1 Fab fragment w/ peptide        -10.0          -9.0              1.0
2SEC        Subtilisin Carlsberg w/ Eglin C     -15.6          -15.1             1.2
3DFR        Dihydrofolate reductase complex     -7.1           -7.9              0.6
4CPA        Carboxypeptidase w/ inhibitor       -10.8          -11.4             2.1
4MBN        Myoglobin w/ heme-group             -10.8          -13.3             0.7
4SGB        Serine protease w/ PCI-1            -11.2          -10.9             2.1
4TPI        Trypsinogen w/ inhibitor            -18.2          -16.8             1.5
HIV-1 protease with inhibitor:
4HVP        MVT-101                             -12.2          -13.0             1.0
5HVP        Acetyl-pepstatin                    -9.1           -9.6              0.9
8HVP        U-85548e                            -10.9          -10.6             0.8
9HVP        A-74704                             -13.2          -13.3             0.4
n/a         PD134922                            -11.4          -12.1             0.6
n/a         BMS182,193                          -9.6           -9.6              0.7
n/a         Ro85548e                            -11.2          -11.6             0.5
n/a         Ro8959-R                            -12.8          -13.3             0.5
1HIV        U75875                              -14.4          -14.6             0.3

*The average all-atom rms deviation of the docked inhibitor structure as compared with the native configuration for the complexes studied is 1.0 Å. The rms deviation from the native state free-energy score is 0.9 kcal/mol.
†The PDB code refers to the name of the coordinate sets deposited in the Protein Data Bank. Coordinates of Ro85548e, Ro8959-R, BMS182,193, and PD134922 were kindly supplied by Dr. Alex Wlodawer (NCI-FCRDC).
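The two summary statistics quoted in the footnote of Table I can be recomputed directly from the tabulated values. The Python below is a small arithmetic check; the numbers are transcribed from Table I and the variable names are hypothetical.

```python
import math

# (native free energy, predicted free energy, rms deviation in angstroms),
# one tuple per complex, transcribed from Table I (energies in kcal/mol).
table_i = [
    (-8.1, -8.4, 1.3), (-13.0, -11.8, 2.9), (-16.3, -16.3, 0.6),
    (-16.6, -18.2, 0.9), (-10.0, -9.0, 1.0), (-15.6, -15.1, 1.2),
    (-7.1, -7.9, 0.6), (-10.8, -11.4, 2.1), (-10.8, -13.3, 0.7),
    (-11.2, -10.9, 2.1), (-18.2, -16.8, 1.5), (-12.2, -13.0, 1.0),
    (-9.1, -9.6, 0.9), (-10.9, -10.6, 0.8), (-13.2, -13.3, 0.4),
    (-11.4, -12.1, 0.6), (-9.6, -9.6, 0.7), (-11.2, -11.6, 0.5),
    (-12.8, -13.3, 0.5), (-14.4, -14.6, 0.3),
]

n = len(table_i)

# Mean structural deviation of the docked inhibitors.
mean_rms = sum(rms for _, _, rms in table_i) / n

# Root mean square difference between predicted and native free energies.
rms_dg = math.sqrt(sum((pred - nat) ** 2 for nat, pred, _ in table_i) / n)

# Largest absolute free-energy difference (the myoglobin/heme entry).
max_abs_dg = max(abs(pred - nat) for nat, pred, _ in table_i)
```

Rounded to one decimal place, `mean_rms` and `rms_dg` reproduce the 1.0 Å and 0.9 kcal/mol stated in the table footnote.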

For molecules that have a rather small interface it is possible that the shape complementarity scores will not be able to guide the search close enough to the native state for the energy screen to pick out the global minimum. If the geometry scores are too flat, i.e., there is no clearly distinguishable top set of configurations, the procedure given above may fail, not because of a poor description of the free-energy minimum, but rather because of a failure to sample the relevant, native-like coordinate space. Unfortunately, a global search of the six degrees of freedom using only the energy screen is not yet computationally feasible.

[Figure 7: scatter plot; vertical axis $\Delta G_{AB}$ (kcal/mol), horizontal axis rms deviation (Å).]

Fig. 7. The binding energies $\Delta G_{AB}$ for various trial configurations of the trypsinogen/trypsin inhibitor system (4TPI) as a function of the rms deviation of the ligand with respect to its native configuration.

CONCLUSIONS

A two-step docking scheme has been developed using a surface complementarity screen followed by an energy-based criterion for positional alignment. With this procedure a randomly placed test set of rigid inhibitor/enzyme complexes can be assembled to an average all-atom rms deviation of 1.0 Å from the native complexes. For all systems studied this docking procedure displays a unique minimum energy configuration that is compatible with the native state, thus demonstrating its effectiveness in avoiding spurious minima associated with non-native configurations. The success of this scheme supports the notion that surface geometry and a simplified free-energy model of atomic interactions can be used to locate appropriate binding configurations of molecules with rigidly defined atomic positions. This method may offer a suitable strategy for designing ligands based on fragment assembly.

ACKNOWLEDGMENTS

Special thanks are extended to Dr. A. Wlodawer for generously providing us with the X-ray coordinates of the HIV-1 protease inhibitor complexes not in the PDB and to Dr. R. Batik for valuable comments regarding this manuscript. The staff of the Biomedical Supercomputing Center, FCRDC, Frederick, MD is thanked for its assistance and for access to its computer facilities. We thank the NIH Intramural Targeted Anti-Viral AIDS Program for support. This research is also sponsored by the National Cancer Institute, DHHS, under contract with Science Applications International Corporation. The contents of this publication do not necessarily reflect the views or policies of the DHHS, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

REFERENCES

1. Jorgensen, W.L. Rusting of the lock and key model for protein-ligand binding. Science 254:954-955, 1991.

2. Halgren, T.A. Potential energy functions. Curr. Opin. Struct. Biol. 5:205-210, 1995.

3. Horovitz, A., Fersht, A.R. Strategy for analysing the co-operativity of intramolecular interactions in peptides and proteins. J. Mol. Biol. 214:613-617, 1990.

4. Horovitz, A., Serrano, L., Avron, B., Bycroft, M., Fersht, A.R. Strength and co-operativity of contributions of surface salt bridges to protein stability. J. Mol. Biol. 216:1031-1044, 1990.

5. LiCata, V.J., Ackers, G.K. Long-range, small magnitude nonadditivity of mutational effects in proteins. Biochemistry 34:3133-3139, 1995.

6. Sippl, M.J. Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. 5:229-235, 1995.

7. Wallqvist, A., Covell, D.G., Jernigan, R.L. A preference-based free-energy parameterization of enzyme-inhibitor binding. Applications to HIV-1-protease inhibitor design. Protein Sci. 4:1881-1903, 1995.

8. Wodak, S.J., de Crombrugghe, M., Janin, J. Computer studies of interactions between macromolecules. Prog. Biophys. Mol. Biol. 49:29-63, 1987.

9. Kuntz, I.D. Structure-based strategies for drug design and discovery. Science 257:1078-1082, 1992.

10. Blaney, J.M., Dixon, J.S. A good ligand is hard to find: Automated docking methods. Perspect. Drug. Disc. Design 1:301-319, 1993.

11. Cherfils, J., Janin, J. Protein docking algorithm: Simulating molecular recognition. Curr. Opin. Struct. Biol. 3:265-269, 1993.

12. Kuntz, I.D., Meng, E.C., Shoichet, B.K. Structure-based molecular design. Acc. Chem. Res. 27:117-123, 1994.

13. Miller, M.D., Sheridan, R.P., Kearsley, S.K., Underwood, D.J. Advances in automated docking applied to human immunodeficiency virus type 1 protease. Methods Enzymol. 241:354-370, 1994.

14. Whittle, P.J., Blundell, T.L. Protein structure-based drug design. Annu. Rev. Biophys. Biomol. Struct. 23:349-375, 1994.

15. Lybrand, T.P. Ligand-protein docking and rational drug design. Curr. Opin. Struct. Biol. 5:224-228, 1995.

16. Rosenfeld, R., Vajda, S., DeLisi, C. Flexible docking and design. Annu. Rev. Biophys. Biomol. Struct. 24:677-700, 1995.

17. Fischer, E. Einfluss der Configuration auf die Wirkung der Enzyme. Ber. Dtsch. Chem. Ges. 27:2985-2993, 1894.

18. Levinthal, C., Wodak, S.J., Kahn, P., Dadivanian, A.K. Hemoglobin interaction in sickle cell fibers. I: Theoretical approaches to the molecular contacts. Proc. Natl. Acad. Sci. USA 72:1330-1334, 1975.

19. Salemme, F.R. An hypothetical structure for an intermolecular electron transfer complex of cytochrome c and b5. J. Mol. Biol. 102:563-568, 1976.

20. Wodak, S.J., Janin, J. Computer analysis of protein-protein interactions. J. Mol. Biol. 124:323-342, 1978.

21. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R., Ferrin, T.E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 161:269-288, 1982.

22. Santavy, M., Kypr, J. A fast computer algorithm for finding an optimum geometrical interaction of two macromolecules. J. Mol. Graph. 2:47-49, 1984.

23. Zielenkiewicz, P., Rabczenko, A. Protein-protein recognition: Method for finding complementary surfaces of interacting proteins. J. Theor. Biol. 111:17-30, 1984.

24. Connolly, M.L. Shape complementarity at the hemoglobin α1β1 subunit interface. Biopolymers 25:1229-1247, 1986.

25. Billeter, M., Havel, T.F., Kuntz, I.D. A new approach to the problem of docking two molecules: The ellipsoid algorithm. Biopolymers 26:777-793, 1987.

26. Rebek Jr., J. Model studies in molecular recognition. Science 235:1478-1484, 1987.

27. Janin, J., Chothia, C. The structure of protein-protein recognition sites. J. Biol. Chem. 265:16027-16030, 1990.

28. Jiang, F., Kim, S.-H. “Soft docking”: Matching of molecular surface cubes. J. Mol. Biol. 219:79-102, 1991.

29. Shoichet, B.K., Kuntz, I.D. Protein docking and complementarity. J. Mol. Biol. 221:327-346, 1991.

30. Bacon, D.J., Moult, J. Docking by least-squares fitting of molecular surface patterns. J. Mol. Biol. 225:849-858, 1992.

31. Badel, A., Mornon, J.P., Hazout, S. Searching for geometric molecular shape complementarity using bidimensional surface profiles. J. Mol. Graph. 10:205-211, 1992.

32. Kasinos, N., Lilley, G.A., Subbarao, N., Haneef, I. A robust and efficient automated docking algorithm for molecular recognition. Protein Eng. 5:69-75, 1992.

33. Lawrence, M.C., Colman, P.M. Shape complementarity at protein/protein interfaces. J. Mol. Biol. 234:946-950, 1993.

34. Helmer-Citterich, M., Tramontano, A. PUZZLE: A new method for automated protein docking based on surface shape complementarity. J. Mol. Biol. 235:1021-1031, 1994.

35. Mizutani, M.Y., Tomioka, N., Itai, A. Rational automatic search method for stable docking models of proteins and ligand. J. Mol. Biol. 243:310-326, 1994.

36. Norel, R., Lin, S.L., Wolfson, H.J., Nussinov, R. Shape complementarity at protein-protein interfaces. Biopolymers 34:933-940, 1994.

37. Cummings, M.D., Hart, T.N., Read, R.J. Monte Carlo dock- ing with ubiquitin. Protein Sci. 4885-899, 1995.

38. Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A.A., Aflalo, C., Vakser, I.A. Molecular surface recogni- tion: Determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl. Acad. Sci. USA 892195-2199, 1992.

39. Vakser, LA. Protein docking for low-resolution structures. Protein Eng. 8:371-377, 1995.

40. Cherfils, J., Duquerroy, S., Janin, J . Protein-protein rec- ognition analyzed by docking simulation. Proteins 11:271- 280, 1991.

41. Meng, E.C., Gschwend, D.A., Blaney, J.M., Kuntz, I.D. Orientational sampling and rigid-body minimization in molecular docking. Proteins 17:266-278, 1993.

42. Fischer, D., Lin, S.L., Wolfson, H.L., Nussinov, R. A geom- etry-based suite of molecular docking processes. J. Mol. Biol. 248:459-477, 1995.

43. Hart, T.N., Read, R.J. A multiple-start Monte Carlo dock- ing method. Proteins 13:206-222, 1992.

44. Caflisch, A,, Niederer, P., Anliker, M. Monte Carlo dock- ing of oligopeptides to proteins. Proteins 13:223-230, 1992.

45. Walls, P.H., Sternberg, M.J.E. New algorithm to model protein-protein recognition based on surface complemen- tarity. Application to antibody-antigen docking. J . Mol. Biol. 228:277-297, 1992.

46. Jackson, R.M., Sternberg, M.J.E. A continuum model for protein-protein interactions: Application to the docking problem. J. Mol. Biol. 250:258-275, 1995.

47. Di Nola, A., Roccatano, D., Berendsen, H.J.C. Molecular dynamics simulation of the docking of substrates to proteins. Proteins 19:174-182, 1994.

48. Leach, A.R. Ligand docking to proteins with discrete side-chain flexibility. J. Mol. Biol. 235:345-356, 1994.

49. DesJarlais, R.L., Sheridan, R.P., Dixon, J.S., Kuntz, I.D., Venkataraghavan, R. Docking flexible ligands to macromolecular receptors by molecular shape. J. Med. Chem. 29:2149-2153, 1986.

50. Goodsell, D.S., Olson, A.J. Automated docking of substrates to proteins by simulated annealing. Proteins 8:195-202, 1990.

51. Goodsell, D.S., Lauble, H., Stout, C.D., Olson, A.J. Automated docking in crystallography: Analysis of the substrates of aconitase. Proteins 17:1-10, 1993.

52. Rosenfeld, R., Zheng, Q., Vajda, S., DeLisi, C. Computing the structure of bound peptides. Application to antigen recognition by class I major histocompatibility complex receptors. J. Mol. Biol. 234:515-521, 1993.

53. Lawrence, M.C., Davis, P.C. CLIX: A new search algorithm for finding novel ligands capable of binding proteins of known three-dimensional structure. Proteins 12:31-41, 1992.

54. Miranker, A., Karplus, M. Functionality maps of binding sites: A multiple copy simultaneous search method. Proteins 11:29-34, 1991.

55. Klebe, G. The use of composite crystal-field environments in molecular recognition and the de novo design of protein ligands. J. Mol. Biol. 237:212-235, 1994.

56. Goodford, P.J. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 28:849-857, 1985.

57. Wade, R.C., Clark, K.J., Goodford, P.J. Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 1. Ligand probe groups with the ability to form two hydrogen bonds. J. Med. Chem. 36:140-147, 1993.

58. Wade, R.C., Goodford, P.J. Further development of hydrogen bond functions for use in determining energetically favorable binding sites on molecules of known structure. 2. Ligand probe groups with the ability to form more than two hydrogen bonds. J. Med. Chem. 36:148-156, 1993.

59. Hahn, M. Receptor surface models. 1. Definition and construction. J. Med. Chem. 38:2080-2090, 1995.

60. Hahn, M., Rogers, D. Receptor surface models. 2. Application to quantitative structure-activity relationships studies. J. Med. Chem. 38:2091-2102, 1995.


61. Young, L., Jernigan, R.L., Covell, D.G. A role for surface hydrophobicity in protein-protein recognition. Protein Sci. 3:717-729, 1994.

62. Moon, J.B., Howe, W.J. Computer design of bioactive molecules: A method for receptor-based de novo ligand design. Proteins 11:314-328, 1991.

63. Caflisch, A., Miranker, A., Karplus, M. Multiple copy simultaneous search and construction of ligands in binding sites: Application to inhibitor of HIV-1 aspartic proteinase. J. Med. Chem. 36:2142-2167, 1993.

64. Eisen, M.B., Wiley, D.C., Karplus, M., Hubbard, R.E. HOOK: A program for finding novel molecular architectures that satisfy the chemical and steric requirements of a macromolecular binding site. Proteins 19:199-221, 1994.

65. Lauri, G., Bartlett, P.A. CAVEAT: A program to facilitate the design of organic molecules. J. Comput. Aided Mol. Des. 8:51-66, 1994.

66. Martin, Y.C. 3D database searching in drug design. J. Med. Chem. 35:2145-2154, 1992.

67. Cohen, N.C., Blaney, J.M., Humblet, C., Gund, P., Barry, D.C. Molecular modeling software and methods for medicinal chemistry. J. Med. Chem. 33:883-894, 1990.

68. Kubinyi, H., ed. “3D QSAR in Drug Design.” Leiden: ESCOM, 1993.

69. Rutenber, E., Fauman, E.B., Keenan, R.J., Fong, S., Furth, P.S., de Montellano, P.R.O., Meng, E., Kuntz, I.D., Decamp, D.L., Salto, R., Rose, J.R., Craik, C.S., Stroud, R.M. Structure of non-peptide inhibitor complexed with HIV-1 protease. J. Biol. Chem. 268:15343-15346, 1993.

70. Ghosh, A.K., Thompson, W.J., Fitzgerald, P.M.D., Culberson, J.C., Axel, M.G., McKee, S.P., Huff, J.R., Anderson, P.S. Structure-based design of HIV-1 protease inhibitors: Replacement of two amides and a 10π-aromatic system by a fused bis-tetrahydrofuran. J. Med. Chem. 37:2506-2508, 1994.

71. Lam, P.Y.S., Jadhav, P.K., Eyermann, C.J., Hodge, C.N., Ru, Y., Bacheler, L.T., Meek, J.L., Otto, M.J., Rayner, M.M., Wong, Y.N., Chang, C.-H., Weber, P.C., Jackson, D.A., Sharpe, T.R., Erickson-Viitanen, S. Rational design of potent, bioavailable, nonpeptide cyclic ureas as HIV protease inhibitors. Science 263:380-384, 1994.

72. Ring, C.S., Sun, E., McKerrow, J.H., Lee, G.K., Rosenthal, P.J., Kuntz, I.D., Cohen, F.E. Structure-based inhibitor design by using protein models for the development of antiparasitic agents. Proc. Natl. Acad. Sci. USA 90:3583-3587, 1993.

73. Horton, N., Lewis, M. Calculation of the free energy of association for protein complexes. Protein Sci. 1:169-181, 1992.

74. Nievergelt, J., Hinrichs, K. “Algorithms and Data Structures with Applications to Graphics and Geometry.” New York: Prentice-Hall, 1993.

75. Forrest, A.R. In: “Fundamental Algorithms for Computer Graphics.” Earnshaw, R.A. (ed.). Berlin: Springer-Verlag, 1985:707-724.

76. Preparata, F.P., Shamos, M.I. “Computational Geometry: An Introduction.” New York: Springer-Verlag, 1985.

77. Edelsbrunner, H., Kirkpatrick, D.G., Seidel, R. On the shape of a set of points in the plane. IEEE Trans. Inform. Theory IT-29:551-559, 1983.

78. Connolly, M.L. Solvent-accessible surfaces of proteins and nucleic acids. Science 221:709-713, 1983.

79. Bondi, A. Van der Waals volumes and radii. J. Phys. Chem. 68:441-449, 1964.

80. Covell, D.G., Smythers, G.W., Gronenborn, A.M., Clore, G.M. Analysis of hydrophobicity in the alpha and beta chemokine families and its relevance to dimerization. Protein Sci. 3:2064-2072, 1994.

81. Goldstein, H. “Classical Mechanics.” Reading: Addison-Wesley, 1980.

82. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1092, 1953.

83. Rao, M., Pangali, C., Berne, B.J. On the force bias Monte Carlo simulations of water: Methodology, optimization and comparison with molecular dynamics. Mol. Phys. 37:1771-1798, 1979.

84. Erickson, J., Neidhart, D.J., VanDrie, J., Kempf, D.J., Wang, X.C., Norbeck, D.W., Plattner, J.U., Rittenhouse, J.W., Turon, M., Wideburg, N., Kohlbrenner, W.E., Simmer, B., Helfrich, R., Paul, D.A., Knigge, M. Design, activity and 2.8 Angstrom crystal structure of a C-2 symmetric inhibitor complexed to HIV-1 protease. Science 249:527-533, 1990.

85. Kauzmann, W. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14:1-63, 1959.

86. Eisenberg, D., McLachlan, A.D. Solvation energy in protein folding and binding. Nature 319:199-203, 1986.

87. Makhatadze, G.I., Privalov, P.L. Heat capacities of proteins. I. Partial molar heat capacities of individual amino acid residues in aqueous solution: Hydration effect. J. Mol. Biol. 213:375-384, 1990.

88. Privalov, P.L., Makhatadze, G.I. Heat capacity of proteins. II. Partial molar heat capacity of the unfolded polypeptide chain of proteins: Protein unfolding effects. J. Mol. Biol. 213:385-391, 1990.

89. Privalov, P.L., Makhatadze, G.I. Contribution of hydration and noncovalent interactions to the heat capacity effect on protein unfolding. J. Mol. Biol. 224:715-723, 1992.

90. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer Jr., E.F., Brice, M.D., Rogers, J.R., Kennard, O., Shimanouchi, T., Tasumi, M. The Protein Data Bank: A computer-based archival file for macromolecular structures. J. Mol. Biol. 112:535-542, 1977.

APPENDIX Preference Calculations

The average area of the constituent surface element is accumulated for each selected atom-pair surface across the enzyme/inhibitor interface, as in Energy Considerations above, and used to determine a preference score $P_{IJ}$ based on the chemical identity of atoms $I$ and $J$. Each preference is calculated as the ratio of the fraction of the total interfacial area contributed by each atom pair, $F_{IJ} = A_{IJ}/A_{\mathrm{tot}}$, normalized by the product of the fractional contributions of each atom in the pair, $F_I = A_I/A_{\mathrm{tot}}$ and $F_J = A_J/A_{\mathrm{tot}}$:

$$P_{IJ} = \frac{F_{IJ}}{F_I F_J} \qquad (11)$$

The highest and lowest scoring preferences, $P_{IJ}$, identify the most and least frequently observed adjacent atomic surfaces, respectively, in a data set. Following the procedure in our earlier work, we used a subset of 38 complexed molecules in the Protein Data Bank as our database. The modified preferences obtained with the new area selection are given in Table IIA, in which the atom identities, listed in Table IIB, follow the same division as in our previous study. These atoms correspond only to those found among the standard amino acids. A minimum preference of 0.10 was set for rarely observed surface burial pairs.
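The preference ratio of Equation 11, together with the 0.10 floor, can be illustrated with a minimal Python sketch. The function name `preference_table`, the dictionary layout, and all area values are hypothetical and chosen only to exercise the formula; in particular, the per-atom areas $A_I$ are assumed to be supplied directly rather than derived from the pair areas.

```python
from typing import Dict, Tuple

MIN_PREFERENCE = 0.10  # floor quoted in the text for rarely observed pairs


def preference_table(
    pair_area: Dict[Tuple[str, str], float],
    atom_area: Dict[str, float],
    total_area: float,
) -> Dict[Tuple[str, str], float]:
    """Return P_IJ = F_IJ / (F_I * F_J) for each pair, floored at 0.10."""
    table = {}
    for (i, j), a_ij in pair_area.items():
        f_ij = a_ij / total_area          # fraction of interface from pair IJ
        f_i = atom_area[i] / total_area   # fraction from atom type I
        f_j = atom_area[j] / total_area   # fraction from atom type J
        table[(i, j)] = max(MIN_PREFERENCE, f_ij / (f_i * f_j))
    return table


# Hypothetical buried areas in square angstroms (not taken from the paper).
pair_area = {("Carom", "Carom"): 10.0, ("NK", "Caliph"): 0.2}
atom_area = {"Carom": 50.0, "NK": 40.0, "Caliph": 100.0}
prefs = preference_table(pair_area, atom_area, total_area=200.0)
```

With these illustrative numbers the aromatic pair receives a preference below 1 (observed less often than its marginal areas would suggest), while the rarely observed pair is raised to the 0.10 floor.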

Free-Energy Approximation

The idea that the free energy of binding is proportional to the amount of surface area buried is an old concept⁸⁵ that has recently been extended to account for a more detailed analysis of the constituent interfaces. It has also been noted experimentally that thermodynamic properties of solvation for biological macromolecules correlate with solvent-exposed surfaces.


TABLE IIA. Surface Area Burial Preferences, $P_{IJ}$

I/J        Cpep     Cα     Cβ Caliph  Carom  Camid  Ccarb   CCNC    CN+     CY
Cpep       0.20   1.03   1.04   1.48   1.28   0.51   0.10   0.96   2.49   0.67
Cα                0.46   1.48   1.04   0.79   0.74   1.12   0.91   0.81   0.92
Cβ                       0.54   0.94   1.20   1.11   0.66   3.02   0.76   1.19
Caliph                          1.22   1.20   1.16   1.33   1.35   0.74   2.00
Carom                                  1.10   1.68   1.38   0.28   0.43   0.69
Camid                                         2.68   1.68   0.10   1.64   0.10
Ccarb                                                0.10   0.10   1.17   0.10
CCNC                                                        0.10   0.10   0.10
CN+                                                                2.25   0.10
CY                                                                        0.10

I/J        Npep     NK  Narom  Namid     NR   Opep   Oalc  Oamid  Ocarb     OY      S
Cpep       0.82   0.80   1.25   1.38   1.23   0.60   0.40   0.84   0.64   0.84   0.62
Cα         0.61   1.14   0.87   0.45   0.68   2.09   1.06   0.91   0.56   1.08   1.22
Cβ         0.75   0.34   1.56   0.54   0.60   1.20   1.16   0.68   0.67   0.86   1.69
Caliph     0.67   0.49   0.59   0.93   0.50   0.82   0.96   0.68   0.97   1.04   1.19
Carom      0.60   0.28   0.54   0.55   1.34   0.72   0.80   0.57   0.90   0.93   0.74
Camid      0.84   0.28   2.83   0.96   0.77   0.48   1.59   0.56   0.10   1.17   0.10
Ccarb      1.48   2.91   2.96   0.10   1.22   0.53   1.23   0.10   0.63   0.10   0.40
CCNC        0.5   0.10   0.10   0.26   0.10   0.31   0.10   0.10   0.10   0.10   2.07
CN+        1.74   1.94   0.10   0.80   0.27   1.63   0.88   0.85   1.78   0.47   1.63
CY         0.93   0.10   0.10   0.68   1.39   0.10   0.10   0.10   0.57   1.92   0.83
Npep       0.39   0.27   2.30   0.67   0.64   2.30   2.31   1.78   0.72   1.41   0.68
NK                0.10   0.36   0.10   1.26   1.71   5.89   0.63   8.98   0.99   0.10
Narom                    0.10   0.29   0.10   0.66   0.66   0.43   4.22   0.10   2.43
Namid                           0.19   0.94   2.20   1.61   4.65   1.08   2.80   0.10
NR                                     0.20   2.14   0.54   2.72   2.57   2.50   0.44
Opep                                          0.32   0.66   0.98   0.44   0.57   0.88
Oalc                                                 0.99   1.33   0.84   0.45   0.34
Oamid                                                       1.54   2.06   1.23   0.10
Ocarb                                                              2.46   1.87   0.43
OY                                                                        1.66   0.98

In our model the free energy of association between molecules A and B is given as

$$\Delta G_{AB} = \sum_{i \in A,\; j \in B} C_{ij}\, a_{ij} + S \qquad (12)$$

where the summation runs over the selected closest pairs of surfaces $i$ and $j$ belonging to molecules A and B, respectively; $C_{ij}$ is an effective binding parameter dependent only on atom types $i$ and $j$; $a_{ij}$ is the average surface area jointly buried by the atom types $i$ and $j$ in the complex; and $S$ is identified as a generic entropic contribution and is set equal to 2.39 kcal/mol. It is not possible to fit explicitly all binding parameters directly because of the paucity of crystal data and accurate binding constants for each complex. Instead, $C_{ij}$ is presumed to be proportional to the frequency of observing such a surface burial pair, i.e.,

(13)

where the distance and orientational dependence are incorporated in the function $S_{ij}$ of Equations 4-6 and 8, the preference is calculated as in Equation 11, and $c$ is a coefficient to be determined. In effect, this is a one-parameter model that requires only that $c$ be specified before we can convert our preference score into the language of free energies. Consequently, we fit a set of known free-energy values for enzyme/inhibitor complexes that have a well-defined structure both in the complex and in solution. The accuracy of the fit (±1.5 kcal/mol) is similar to that obtained previously employing a slightly different functional form and is compatible with $c$ = 2.2676 cal/mol/Å². Given its simplicity, the model is not likely to be predictive of actual measured binding free energies for complexes deviating from the rigidity assumption of the unbound state. In addition, one must also assume that the similarity of the amount of bound surface, approximately 1,200 Å² in the interface, should not be violated. However, the strength of the preference formulation is such that the likelihood of a certain atom-atom association should still be proportional to the true free-energy surface even for systems that deviate from the conditions applicable to the free-energy fit itself.

TABLE IIB. Atom Type Classification

Element    Reduced atom type    Corresponding amino acid atoms
Nitrogen
Oxygen
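As a rough illustration of how Equation 12 and the one-parameter fit might be evaluated numerically, the following Python sketch sums $C_{ij} a_{ij}$ over interface pairs and estimates $c$ by least squares. It rests on stated assumptions: the explicit form of Equation 13 is not used, so each complex is represented by a precomputed descriptor `x_k` such that the pair sum equals `c * x_k`; the function names, sign conventions, units (kcal/mol), and all numerical inputs are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

S_ENTROPIC = 2.39  # kcal/mol, generic entropic term quoted in the text


def binding_free_energy(contacts: Iterable[Tuple[float, float]],
                        s: float = S_ENTROPIC) -> float:
    """Equation 12: sum of C_ij * a_ij over interface pairs, plus S.

    Each contact is (C_ij in kcal/mol/A^2, a_ij in A^2); values are assumed.
    """
    return sum(c_ij * a_ij for c_ij, a_ij in contacts) + s


def fit_scale(x: Sequence[float], dg: Sequence[float],
              s: float = S_ENTROPIC) -> float:
    """Least-squares c for dg_k ~ c * x_k + s, with the offset s held fixed."""
    num = sum(xk * (dgk - s) for xk, dgk in zip(x, dg))
    den = sum(xk * xk for xk in x)
    return num / den


# Hypothetical inputs chosen only to exercise the two functions.
dg_example = binding_free_energy([(-0.01, 100.0), (-0.02, 50.0)])
c_fit = fit_scale(x=[-1000.0, -2000.0], dg=[-2.61, -7.61])
```

Holding $S$ fixed and fitting only the scale mirrors the one-parameter character described above; a real fit would use the measured binding free energies and the descriptor defined by Equation 13.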