Protein Flexibility in Ligand Docking and Virtual Screening to Protein Kinases

Claudio N. Cavasotto* and Ruben A. Abagyan

Molsoft LLC, 3366 N Torrey Pines Ct. Suite 300, La Jolla CA 92037, USA

The main complicating factor in structure-based drug design is receptor rearrangement upon ligand binding (induced fit). It is the induced fit that complicates cross-docking of ligands from different ligand–receptor complexes. Previous studies have shown the necessity to include protein flexibility in ligand docking and virtual screening. Very few docking methods have been developed to predict the induced fit reliably and, at the same time, to improve on discriminating between binders and non-binders in the virtual screening process.

We present an algorithm called the ICM-flexible receptor docking algorithm (IFREDA) to account for protein flexibility in virtual screening. By docking flexible ligands to a flexible receptor, IFREDA generates a discrete set of receptor conformations, which are then used to perform flexible ligand–rigid receptor docking and scoring. This is followed by a merging and shrinking step, where the results of the multiple virtual screenings are condensed to improve the enrichment factor. In the IFREDA approach, both side-chain rearrangements and essential backbone movements are taken into consideration, thus sampling adequately the conformational space of the receptor, even in cases of large loop movements.

As a preliminary step, to show the importance of incorporating protein flexibility in ligand docking and virtual screening, and to validate the merging and shrinking procedure, we compiled an extensive small-scale virtual screening benchmark of 33 crystal structures of four different protein kinases sub-families (cAPK, CDK-2, P38 and LCK), where we obtained an enrichment factor fold-increase of 1.85 ± 0.65 using two or three multiple experimental conformations. IFREDA was used in eight protein kinase complexes and was able to find the correct ligand conformation and discriminate the correct conformations from the “misdocked” conformations solely on the basis of energy calculation. Five of the generated structures were used in the small-scale virtual screening stage and, by merging and shrinking the results with those of the original structure, we show an enrichment factor fold increase of 1.89 ± 0.60, comparable to that obtained using multiple experimental conformations.

Our cross-docking tests on the protein kinase benchmark underscore the necessity of incorporating protein flexibility in both ligand docking and virtual screening. The methodology presented here will be extremely useful in cases where few or no experimental structures of complexes are available, while some binders are known.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: protein flexibility and induced-fit; ligand docking; structure-based drug design; virtual screening; protein kinases

*Corresponding author

0022-2836/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

Supplementary data associated with this article can be found at doi: 10.1016/j.jmb.2004.01.003

E-mail address of the corresponding author: [email protected]

Abbreviations used: BPMC, biased probability Monte Carlo; DA, docking accuracy; EF, enrichment factor; GB/SA, generalized Born/surface area; ICM, internal coordinate mechanics; IFREDA, ICM-flexible receptor docking algorithm; LBP, ligand-binding pocket; MD, molecular dynamics; PK, protein kinase; RMSD, root-mean-square deviation; VS, virtual screening.

doi:10.1016/j.jmb.2004.01.003 J. Mol. Biol. (2004) 337, 209–225



Introduction

The rapid progress of genomics will result in a dramatic increase in novel yet biologically validated targets for drug discovery. Structure-based drug design is now established as a key first step in the lengthy process of developing new drugs.1 Thus, the role of computer-aided drug design through virtual screening (VS) of available or virtual chemical libraries will continue to grow.2–5 However, advances in this technology are badly needed to improve the accuracy of the predicted geometries and scores.

Induced molecular flexibility is fundamental to understanding the principles of molecular recognition between ligand and receptor. Upon ligand binding, many systems undergo rearrangements, which range from local motions of side-chains to large domain movements. In any case, receptor flexibility might have a dramatic impact on the ligand docking problem and VS. It has been shown that even small changes in the receptor conformation can be important in computing binding affinities.6 The importance of receptor flexibility and its implication in drug discovery has been highlighted in two excellent reviews in this field,7,8 and everything points in the direction that protein mobility will have an increasing role in computer-aided drug design in the future. Dealing with protein flexibility is essential to predict the orientation and interactions of a ligand within a binding pocket in the absence of experimental structural information. Prediction of mutation resistance to drugs can benefit from reliable docking algorithms that include conformational sampling of the receptor.

There have been a few reports concerning the impact of protein flexibility in ligand docking to different protein families. Considering two inhibitors of HIVp, Bouzida et al.9 demonstrated convincingly the limitation introduced by considering a single and rigid receptor structure. An analysis of the sensitivity of the docking results to protein flexibility in thrombin, thermolysin and neuraminidase6 showed that only 49% of the ligands are cross-docked correctly to a receptor structure bound to a different ligand, while small movements in the receptor structure can lead to errors up to 14 kJ/mol in the binding energy prediction. The authors pointed out that side-chain flexibility is not sufficient for accounting for the mis-docking of inhibitors. While evaluating different docking methods on three different receptors,10 the authors showed that lead docking to a single receptor conformation significantly reduces the chances of finding the correct pose.

There have been some attempts in the past to include protein flexibility in the ligand docking procedure. These include the early attempts using soft docking,11 partial side-chain flexibility,12,13 continuous side-chain sampling14 and rotameric libraries.15,16 The hinge-bending concept was also used to model receptor flexibility.17,18 Although incorporating side-chain flexibility was a big step forward, current methods should go beyond this point to include backbone rearrangements. As has been pointed out recently,19 the use of several receptor structures seems to be the best choice to date to incorporate flexibility in the docking problem, but many questions arise. What should be the source of these structures? How many are needed? How should they be used or results be combined? And, more importantly, how should this flexibility be incorporated in the VS procedure? To date, an explicit and direct consideration of the receptor plasticity in the VS procedure is still computationally unattainable.

Different approaches on how to use an ensemble of receptor structures in ligand docking have been presented. Knegtel et al. used NMR and crystal structures to generate combined interaction grids by averaging with respect to energy and geometry.20 These composite grids were used with a rigid ligand approach to re-dock native ligands and to identify known ligands from a small compound database. FlexE incorporates flexibility through discrete alternative conformations of varying parts of the protein taken from structures that have a very similar backbone trace, which are merged in a combinatorial way and considered directly during the rigid ligand docking.21 The method is evaluated for root-mean-square deviations (RMSD) on ten different proteins containing 105 crystal structures and 60 different ligands. When the top ten solutions for each ligand are considered, FlexE finds the ligands within 2.0 Å of their native pose in 67% of the cases.

An analysis of four different choices for combining many structures into a single representative energy grid has been performed recently.22 The authors used 21 crystal structures of the HIV-1 protease with diverse inhibitors. These inhibitors were docked into the generated combined grids using AutoDock,23–25 and the RMSD from the native pose and the binding energy were evaluated. It was found that the weight-averaged grids perform best.

Frimurer et al. used a rotamer library of four key residues to improve predictions of binding geometry and affinities in protein tyrosine phosphatase 1B (PTP1B).26 The library was built on the basis of observations on three crystal structures of PTP1B bound to different inhibitors and resulted in 96 models. Docking of the three inhibitors to these models improved their geometry and binding energy predictions.

Side-chain flexibility has also been included in SLIDE, which uses a set of template points constructed for hydrogen bond donors and acceptors, and hydrophobic regions to represent the binding pocket.27

Molecular dynamics (MD) simulations have been used to generate an ensemble of different receptor conformations as input for the generation of a composite interaction weight-averaged grid.28 This method was applied to VS against dihydrofolate reductase, and improvements were found in the top-ranked 10% of a database of drug-like molecules. Multiple structures generated through MD simulations were used to build a receptor-based pharmacophore model for the HIV-1 integrase.29 In the search for the correct ligand–receptor conformation, the relaxed complex method30,31 uses an ensemble of structures generated through MD simulations of the unliganded receptor to dock a mini-library of binders using a fast rigid receptor docking method. This rapid docking is used as a filter and selected complex conformations are rescored with a more accurate energy function using the molecular mechanics/Poisson–Boltzmann surface area approach. The observed experimental complexes are found within the lowest free energy complexes.

Recently, we have used a continuous stochastic global optimization method using the internal coordinate mechanics (ICM)14 methodology to incorporate side-chain and backbone flexibility to analyze the structural binding determinants of an RXR antagonist to the receptor (C.N.C. et al., unpublished results). With a similar methodology, the binding mode of the ligand and pocket side-chain conformations could be predicted from a random starting conformation, with and without the ligand present, in the seven-transmembrane proteins rhodopsin and bacteriorhodopsin.32 The ligand was predicted within 0.2 Å RMSD and the RMSD for the pocket side-chains was 0.3 Å. In a recent article, a procedure was presented to include side-chain receptor flexibility and continuum solvation in flexible ligand docking.33 Using a Monte Carlo simulation and the generalized Born/surface area (GB/SA) continuum solvent model, the authors used a test set of 14 complexes to evaluate ligand docking with and without side-chain flexibility. Although the RMSD values of the ligand were comparable in both approaches, energy discrimination of the binding mode actually deteriorated in the flexible receptor docking procedure. The authors concluded that including protein flexibility may result in a rugged energy landscape with less distinguishable multiple minima.

Despite previous studies, to our knowledge no systematic attempt has been undertaken to examine and incorporate the influence of protein flexibility on docking geometries and on VS and enrichment factors (EFs). In a real-life VS experiment, RMSD values cannot be computed and we are left with a collection of binding scores of the docked compound library.

We present a novel algorithm called the ICM-flexible receptor docking algorithm (IFREDA), which incorporates protein flexibility in ligand docking and VS, especially in the cases where multiple experimental structures representative of the conformational space of the target protein are not available. IFREDA generates an ensemble of receptor conformations by performing flexible ligand docking of selected known binders to a flexible receptor. The conformational ensemble thus generated is then used to perform flexible ligand–rigid receptor docking and scoring, and results from the multiple VS are then condensed using a merging and shrinking procedure. IFREDA accounts for both side-chains and key backbone movements, thus sampling the conformational space of the target receptor. IFREDA was used in eight protein kinase (PK) complexes and was able to identify the correct ligand pose as the best-energy ranking conformation. By merging and shrinking the results from VS against the generated structures, the average EF fold increase was comparable to that obtained using multiple experimental structures (~1.9).

To underscore the necessity of incorporating protein flexibility in ligand docking and VS, and to validate our merging and shrinking procedure to improve EFs using multiple experimental structures, we report extensive flexible ligand–grid receptor docking and small-scale VS tests against 33 crystal structures of four different PK sub-families. PKs have been implicated in proliferation, invasion and metastasis of many types of cancer (the importance of PKs as relevant drug targets and the progress in developing new PK inhibitors have been reviewed recently34). Induced-fit effects in PKs may be one of the factors that explain why the common ATP-binding sites are good drug targets.7 The accuracy of native ligand docking and cross-docking, and the impact of receptor flexibility in ligand docking and scoring are reported, together with the validation of the merging and shrinking procedure to condense the VS results of multiple receptor conformations to improve the EFs.

In Results, we report the analysis on the PK benchmark of 33 crystal structures and the use of IFREDA on eight PK complexes. This is followed by Discussion, where we address the best strategy for the choice of multiple receptor conformations, and the limitations and further improvements of our algorithm. A summary of our results and achievements followed by a detailed description of the Methods used are then presented.

Results

The 100% accurate docking of native PK ligands

The PK family is a difficult test set for any docking-scoring method. Upon ligand binding, the side-chains in the binding pocket may adopt different conformational states and, in some cases, loop rearrangements are observed (induced-fit). A part of the ligand is usually solvent-exposed and held in position by many hydrophobic contacts. It has been pointed out that induced-fit effects in PKs together with hydrophobicity of the binding pocket may explain why the common ATP-binding sites are good drug targets.7 Also, some ligand–protein interactions are mediated by water molecules.

The four protein kinase sub-families used in this study were: three serine/threonine PKs, the cAMP-dependent protein kinase (cAPK), the cyclin-dependent kinase 2 (CDK-2), the mitogen-activated protein P38, and one tyrosine PK, the lymphocyte-specific kinase (LCK). They were chosen because there are more than four crystal structures of complexes with diverse compounds (see Table 1 for the crystal structures used in this study). The ligand of 1QPC (AMP-PNP) was not used for ligand docking, since its disordered γ-phosphate group complicates the determination of comparable RMSD values. However, the receptor structure of 1QPC was used for cross-docking and VS.

The first step in addressing the problem of structural flexibility of PKs was to perform a small-scale ligand docking and VS against the PK receptors. A 1000 compound library of random molecules seeded with the corresponding PK co-crystallized ligands (see Materials and Methods for details) was screened against each of the receptors (see Table 1) using the ICM flexible ligand–grid receptor docking algorithm (see Materials and Methods).35–38 During the energy optimization of the ligand in the field of the receptor, a conformational set of low-energy states is generated and the best-energy conformation is scored. In this way, the reported RMSD values for ligands in flexible ligand–rigid receptor docking always refer to the best energy-ranked solution. The screening is repeated four times and the best score of each ligand is kept.
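The bookkeeping for the repeated screening (run several times, keep the best score of each ligand) can be sketched as follows. This is a minimal illustration only: the function name, the dictionary layout and the convention that a lower score is better are assumptions made here, not details of the ICM software.

```python
def best_score_per_ligand(runs):
    """Keep the best score of each ligand over repeated screening runs.

    `runs` is a list of dicts mapping a ligand ID to its score in one run.
    Assumption: lower (more negative) scores are better.
    """
    best = {}
    for run in runs:
        for ligand, score in run.items():
            if ligand not in best or score < best[ligand]:
                best[ligand] = score
    return best


# Hypothetical scores from four repeated runs of the same small library.
runs = [
    {"lig_a": -30.1, "lig_b": -12.4},
    {"lig_a": -28.7, "lig_b": -15.0},
    {"lig_a": -31.5, "lig_b": -14.2},
    {"lig_a": -29.9, "lig_b": -11.8},
]
best = best_score_per_ligand(runs)
print(best)
```

Repeating a stochastic docking run and keeping the best result per ligand reduces the chance that a single poorly converged run penalizes a true binder.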

The native ligands were docked to the PK structures with remarkable accuracy. As shown in Table 2, the average RMSD is 0.74 Å and 100% of the native ligands in the 29 holo complexes are docked within 1.5 Å RMSD (individual RMSD values for each receptor are detailed in Figures 1 and 2; see also the Supplementary Material). RMSD values are always calculated between heavy atoms of the docked ligand and those of the ligand in the crystal structure of the complex, after superposition of the backbone atoms within the ligand-binding pocket. When symmetric moieties in the molecules are present, all possible atom numberings within the symmetric portion are generated and that with the lowest RMSD is kept.
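The symmetry handling described above can be illustrated with a short sketch. It assumes that the docked and crystal ligand coordinates are already in a common frame (for example, after superposition on the pocket backbone atoms) and that the caller supplies the list of equivalent atom numberings; the function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally ordered lists of (x, y, z) tuples.

    No superposition is performed: both sets are assumed to be in the
    same frame already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def symmetry_corrected_rmsd(docked, crystal, numberings):
    """Lowest RMSD over the supplied equivalent atom numberings.

    `numberings` is a list of index permutations of the docked ligand;
    the identity permutation should be included by the caller.
    """
    return min(rmsd([docked[i] for i in perm], crystal) for perm in numberings)


# Hypothetical three-atom ligand whose atoms 1 and 2 are symmetry-equivalent.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
numberings = [(0, 1, 2), (0, 2, 1)]

naive = rmsd(docked, crystal)
corrected = symmetry_corrected_rmsd(docked, crystal, numberings)
print(round(naive, 3), corrected)
```

In this toy case the pose is identical to the crystal pose up to a swap of two equivalent atoms, so the naive RMSD is misleadingly large while the symmetry-corrected value is zero.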

Evaluating the strong impact of receptor flexibility in ligand docking and scoring

The influence of the induced conformational changes on docking results becomes apparent when a ligand is docked to a receptor complexed with another compound. Since, sometimes, small variations in the structure of the binding pocket might have a large impact on docking geometries, cross-docking experiments are useful to assess the magnitude of this influence. As shown in Table 2, on average, only 70% of the ligands are cross-docked correctly (apo structures were excluded from this calculation), in agreement with a recent study on docking strategies, where ICM was ranked first for pose prediction among the other three docking methods.10 The RMSD values of ligand cross-docking for each PK sub-family are shown in Table 2 (detailed cross-docking matrices are given in Figures 1 and 2; see also the Supplementary Material).

Table 1. Protein kinase complexes used in ligand docking and virtual screening

| No. | PDB entry | Ligand code | Ligand name | Kinase family |
|---|---|---|---|---|
| 1 | 1BKX | adn | Adenosine | cAPK |
| 2 | 1BX6 | ba1 | Balanol | cAPK |
| 3 | 1FMO | adn | Adenosine | cAPK |
| 4 | 1STC | sto | Staurosporine | cAPK |
| 5 | 1YDR | iqp | H7 | cAPK |
| 6 | 1YDS | iqs | H8 | cAPK |
| 7 | 1YDT | iqb | H89 | cAPK |
| 8 | 1JLU | | | cAPK |
| 9 | 1AQ1 | stu | Staurosporine | CDK2 |
| 10 | 1DI8 | dtq | Anilinoquinazoline 2 | CDK2 |
| 11 | 1DM2 | hmd | Hymenialdisine | CDK2 |
| 12 | 1E1X | nw1 | NU6027 | CDK2 |
| 13 | 1E9H | inr | E226 | CDK2 |
| 14 | 1FVT | 106 | Oxindole 16 | CDK2 |
| 15 | 1FVV | 107 | Oxindole 91 | CDK2 |
| 16 | 1G5S | i17 | H717 | CDK2 |
| 17 | 1H1P | cmg | NU2058 | CDK2 |
| 18 | 1H1Q | 2a6 | NU6094 | CDK2 |
| 19 | 1H1S | 4sp | NU6102 | CDK2 |
| 20 | 1JSV | u55 | PNU112455A | CDK2 |
| 21 | 1HCL | | | CDK2 |
| 22 | 1A9U | sb2 | SB203580 | P38 |
| 23 | 1BL6 | sb6 | SB216695 | P38 |
| 24 | 1BL7 | sb4 | SB220025 | P38 |
| 25 | 1DI9 | msq | Anilinoquinazoline 3 | P38 |
| 26 | 1BMK | sb5 | SB218655 | P38 |
| 27 | 1M7Q | dqo | 14e | P38 |
| 28 | 1P38 | | | P38 |
| 29 | 1QPC | anp | AMP-PNP | LCK |
| 30 | 1QPD | stu | Staurosporine | LCK |
| 31 | 1QPE | pp2 | PP2 | LCK |
| 32 | 1QPJ | stu | Staurosporine | LCK |
| 33 | 3LCK | | | LCK |

Table 2. RMS deviation for the docking of native ligands and docking accuracy in cross-docking experiments

| | Complete set | cAPK | CDK2 | P38 | LCK |
|---|---|---|---|---|---|
| Docking of native ligands | | | | | |
| f_RMSD<1.0 Å (a) | 76.0 | | | | |
| f_RMSD<1.5 Å (a) | 100.0 | | | | |
| Average RMSD (Å) | 0.74 | 0.71 | 0.84 | 0.77 | 0.63 |
| Cross-docking | | | | | |
| Average DA (b) | 70.0 | 65.0 | 65.0 | 72.0 | 88.0 |

RMSD values refer to the best energy docking solution.
(a) Fraction of ligands with RMSD values below the indicated threshold (Å).
(b) Docking accuracy (DA) calculated as:

$$\mathrm{DA} = f_{\mathrm{RMSD}<2.0\,\mathrm{\AA}} + 0.5\,\bigl(f_{\mathrm{RMSD}<3.0\,\mathrm{\AA}} - f_{\mathrm{RMSD}<2.0\,\mathrm{\AA}}\bigr)$$
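The docking accuracy defined in the footnote to Table 2 can be sketched in a few lines of Python. The function name and the example RMSD values are hypothetical, and reporting DA as a percentage is an assumption based on the scale of the values in Table 2.

```python
def docking_accuracy(rmsds):
    """Docking accuracy DA = f(<2.0 A) + 0.5 * (f(<3.0 A) - f(<2.0 A)).

    `rmsds` holds one best-energy RMSD (in Angstrom) per docked ligand.
    The result is returned as a percentage (an assumption of this sketch).
    """
    if not rmsds:
        raise ValueError("at least one RMSD value is required")
    n = len(rmsds)
    f_below_2 = sum(r < 2.0 for r in rmsds) / n
    f_below_3 = sum(r < 3.0 for r in rmsds) / n
    return 100.0 * (f_below_2 + 0.5 * (f_below_3 - f_below_2))


# Hypothetical cross-docking RMSDs: two good poses, one borderline, one wrong.
example_rmsds = [0.5, 1.8, 2.4, 5.0]
da = docking_accuracy(example_rmsds)
print(da)
```

Poses between 2.0 Å and 3.0 Å count for half, so the measure rewards near-misses without treating them as fully correct.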

In a real-life case, the goal of VS is to select a small number (~2%) of potential binders to the receptor of interest from a large source library. For a particular focused library built with the top s compounds of the ranked database, the enrichment factor can be calculated as:

$$\mathrm{EF}(s) = \frac{\mathrm{Hits}_{s} / \mathrm{NC}_{s}}{\mathrm{Hits}_{\mathrm{total}} / \mathrm{NC}_{\mathrm{total}}}$$

where NC is the number of compounds.

It is evident, however, that pose prediction, while being a necessary component of the VS procedure, is not sufficient for accurate scoring and thus high EFs.
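As a worked illustration of the enrichment factor formula given above, the following sketch computes EF for the top fraction of a ranked library. The function name, the compound IDs and the counts (1000 compounds, 12 binders, 7 of them ranked in the top 10) are illustrative assumptions, not data taken from the benchmark.

```python
def enrichment_factor(ranked_ids, binders, top_fraction):
    """EF(s) = (Hits_s / NC_s) / (Hits_total / NC_total).

    `ranked_ids` is the library ordered from best to worst score,
    `binders` is the set of known binder IDs, and `top_fraction`
    selects the focused library (e.g. 0.01 for the top 1%).
    """
    n_total = len(ranked_ids)
    n_top = max(1, round(n_total * top_fraction))
    hits_top = sum(1 for cid in ranked_ids[:n_top] if cid in binders)
    hits_total = sum(1 for cid in ranked_ids if cid in binders)
    if hits_total == 0:
        raise ValueError("the library contains no known binders")
    return (hits_top / n_top) / (hits_total / n_total)


# Illustrative library: 12 binders among 1000 compounds, 7 ranked in the top 10.
binders = {f"b{i}" for i in range(12)}
decoys = [f"d{i}" for i in range(988)]
ranked = (
    [f"b{i}" for i in range(7)]
    + decoys[:3]
    + [f"b{i}" for i in range(7, 12)]
    + decoys[3:]
)

ef_top1 = enrichment_factor(ranked, binders, 0.01)
ef_top2 = enrichment_factor(ranked, binders, 0.02)
print(round(ef_top1, 1), round(ef_top2, 1))
```

An EF of 1 corresponds to random selection; the maximum attainable EF depends on both the size of the focused library and the number of binders.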

In the small-scale VS of the 1000 compound library, about 80% of the native binders are scored in the top 1.5% of the screened database when docked to their co-crystallized structure, showing the accuracy of our scoring function (throughout this work we did not try to optimize or improve the scoring function specifically for PKs). In order to examine the relation between RMSD values and ranking, and to help us assess the impact of structural diversity on ligand docking and scoring, the correlation between compound scoring and RMSD is plotted in Figure 3. Only 4% of the points are in the upper-left corner (bad-pose, good-score), and ~50% of these are close to the borderline of 2.5 Å. The common threshold for a compound to be considered docked correctly is ~2 Å. However, due to variations in the position of the backbone when overlaying the structures for RMSD calculation, we preferred to use a threshold value of 2.5 Å. In the lower-right corner we have the good-pose, bad-score compounds (17%).

Figure 1. Cross-docking on CDK-2. RMSD (Å) for heavy atoms is calculated after superposition of the backbone atoms within the ligand-binding pocket. RMSD values refer to the best energy solution. Thick border cells indicate native complex. DA, docking accuracy. *The disordered and solvent-exposed benzyl moiety of ligand 107 was excluded from the RMSD calculations.

Figure 2. Cross-docking on cAPK. RMSD (Å) for heavy atoms is calculated after superposition of the backbone atoms within the ligand-binding pocket. RMSD values refer to the best energy solution. Thick border cells indicate native complex. DA, docking accuracy.

The fraction of ligands that are docked correctly and ranked within the top 10% is ~49% (lower-left corner). If we compare this number with the ~70% of compounds that are cross-docked correctly, we see that the PK scoring and ranking is more sensitive to the induced-fit effects than ligand docking. Still, both 49% of good ranking compounds and ~70% of correct geometries are indeed good results. In the next section we improve the performance in both the geometry prediction and the EFs by using many experimental receptor conformations.
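The pose/score quadrants discussed for Figure 3 can be tallied with a small helper such as the one below. It is a sketch under stated assumptions: the 2.5 Å and top-10% cutoffs come from the text, while the function names, the strictness of the comparisons and the example data are hypothetical.

```python
from collections import Counter


def classify(rmsd, rank_fraction, rmsd_cutoff=2.5, rank_cutoff=0.10):
    """Label one docked ligand by pose quality and ranking quality.

    `rank_fraction` is the ligand's rank divided by the library size
    (0.01 means it falls in the top 1% of the screened database).
    """
    pose = "good-pose" if rmsd < rmsd_cutoff else "bad-pose"
    score = "good-score" if rank_fraction <= rank_cutoff else "bad-score"
    return f"{pose}, {score}"


def quadrant_fractions(results):
    """Fraction of (rmsd, rank_fraction) pairs falling in each quadrant."""
    counts = Counter(classify(r, f) for r, f in results)
    total = len(results)
    return {label: n / total for label, n in counts.items()}


# Hypothetical (RMSD in Angstrom, rank fraction) pairs, one per quadrant.
results = [(0.8, 0.01), (1.9, 0.35), (4.2, 0.05), (6.0, 0.60)]
fractions = quadrant_fractions(results)
print(fractions)
```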

How to merge the screening results of known multiple receptor conformations to improve the enrichment factor?

Inspection of the cross-docking RMSD values and EFs can help to determine which crystal structure will be used or if it is meaningful to do the full VS against more than one receptor structure to better represent conformational flexibility. For example, there are two diverse cAPK structures with adenosine bound (PDB codes 1BKX and 1FMO). No distinction as to which to choose can be made from the cross-docking RMSD, but inspection of the EFs shows that 1FMO might be a better target for VS (16.7 versus 0.0 for the top 2% screened database), probably linked in part to its better resolution (2.2 Å versus 2.6 Å). CDK-2 structure 1H1S links good EF (58.3 for the top 1% selection) with high docking accuracy (DA).

When two or more crystal structures are available, small-scale VS including non-native known binders could be very useful. In this case, results of the screening against each conformation are merged and the best rank for each compound is kept, thus shrinking the scoring list to the size corresponding to a VS against a single receptor (referred to here as the merging-shrinking procedure). We show in Table 3 how the combination of screening results against two and three different PK structures can yield better RMSD and EF values than each one alone. Note that the better performance is not self-evident because the merging-shrinking procedure can dilute the correct results. Within each of the PK sub-families, a few representatives with chemically different ligands were selected (Table 3). The LCK sub-family members were not included, since they had only two different ligands. In addition, the 1DI9 receptor was excluded from the P38 sub-family, since its initial EF was zero for the top 1% and 2%, so that calculating the EF fold-increase would lead to a singularity.
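A minimal sketch of the merging-shrinking procedure described above is given here: rankings from screens against several receptor conformations are merged and each compound keeps its best rank, so the final list has the size of a single screen. The function name, the data layout and the alphabetical tie-break are assumptions of this sketch; the text does not specify how ties are resolved.

```python
def merge_and_shrink(rankings):
    """Merge per-receptor rankings, keeping the best rank of each compound.

    `rankings` is a list of dicts (one per receptor conformation) mapping
    a compound ID to its rank in that screen (1 = best).
    Returns the compound IDs re-ordered by best rank; ties are broken
    alphabetically by ID (an arbitrary choice made for this sketch).
    """
    best_rank = {}
    for ranking in rankings:
        for cid, rank in ranking.items():
            best_rank[cid] = min(rank, best_rank.get(cid, rank))
    return sorted(best_rank, key=lambda cid: (best_rank[cid], cid))


# Hypothetical screens of the same four compounds against two conformations.
screen_a = {"lig1": 1, "lig2": 4, "dec1": 2, "dec2": 3}
screen_b = {"lig1": 3, "lig2": 1, "dec1": 4, "dec2": 2}

merged = merge_and_shrink([screen_a, screen_b])
print(merged)
```

In this toy case `lig2` is ranked last against the first conformation but first against the second, so it moves to the top of the merged list. Decoys also keep their best rank, which is why merging can dilute the results as well as improve them.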

To characterize the improvement in EF through merging and shrinking the screening results, we introduce the EF fold-increase, calculated as:

$$\text{EF fold-increase} = \frac{\mathrm{EF}_{\mathrm{merged}}}{\mathrm{EF}_{\mathrm{individual}}}$$

The average EF fold-increase was 1.85 ± 0.65, calculated for the 21 groups of Table 3 using 141 EF fold-increase values. The EF fold-increase considering the top 1%, 2% and 10% of the screened database was 1.89 ± 0.60, 1.83 ± 0.60 and 1.83 ± 0.74, respectively, which shows that the EF fold-increase was roughly independent of the size of the focused selection. The EF fold-increase was less than one in only four out of the 141 cases (~3%), which highlights the advantage of the merging-shrinking technique. In those cases, the EF drop occurred because of the high EF values for structures 1H1S and 1YDS, which were diluted with inferior EF values of the other receptor conformations. The distribution of the EF fold-increase values is shown in Figure 4.
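The EF fold-increase can be illustrated with the first group of Table 3 at the top 1% level (1DM2: 25.0, 1G5S: 41.7, merged: 58.3). The helper below is a sketch; its name is hypothetical, and the use of the sample standard deviation for the spread is an assumption, since the text does not state how the ± values were computed.

```python
import statistics


def ef_fold_increase(ef_merged, ef_individual):
    """Ratio of the merged EF to the EF of one individual receptor."""
    if ef_individual == 0:
        # Mirrors the singularity noted in the text for a zero initial EF.
        raise ValueError("EF fold-increase is undefined when the individual EF is zero")
    return ef_merged / ef_individual


# Top-1% EF values for one group of Table 3.
ef_individual = {"1DM2": 25.0, "1G5S": 41.7}
ef_merged = 58.3

folds = [ef_fold_increase(ef_merged, ef) for ef in ef_individual.values()]
mean_fold = statistics.mean(folds)
spread = statistics.stdev(folds)  # sample standard deviation (assumed)
print([round(f, 2) for f in folds], round(mean_fold, 2))
```

Each merged group therefore contributes one fold-increase value per individual structure and per selection size, which is how a modest number of groups yields many values.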

It is evident that combining even two structures is enough to have most of the ligands docked correctly (in 19 out of the 21 groups more than 75% of the ligands are docked correctly when the merged results are considered). The merging-shrinking procedure has a more significant impact on the RMSD values than on EFs, showing again that scoring is more sensitive to protein flexibility than ligand docking. However, an average EF fold-increase of ~2 should always be considered significant.

Since the computing time for the flexible ligand–rigid receptor docking and scoring is ~1–2 minutes using a 700 MHz processor (1 Mb RAM for the dual-processor node), screening against a few structures is affordable and has the advantage of using actual receptor conformations, which may differ in side-chain conformation, and in the rearrangement of loops.

The ICM-flexible receptor docking algorithm (IFREDA)

When several different crystal structures of the same receptor are available, a small-scale VS followed by analysis of cross-docking RMSDs and EFs can help to determine how to include potential structural diversity of the receptor-binding pocket in the VS using large compound libraries, as discussed above. Merging and shrinking the results of two or three screenings could be a solution to improve the EF. IFREDA is especially useful in cases where only one holo (or apo) crystal structure and known binders are available. It has three main steps: (i) de novo receptor structure generation by performing flexible ligand docking of known binders to a flexible receptor, where side-chain and essential backbone movements are taken into consideration; (ii) VS against the generated conformations using flexible ligand–grid receptor docking; (iii) merging and shrinking of the VS results in the same way as described above.

Figure 3. RMS deviations versus ranking of protein kinase ligands. Colored marks represent outliers: yellow circle, cAPK; blue triangles, CDK-2; red squares, P38. No outlier was found for LCK. The black lines delimit the areas of 10% ranking and 2.5 Å RMSD. RMSD values refer to the best energy docking solution.

The computational procedure for de novo generation of alternative multiple receptor conformations

This procedure (see Materials and Methods for details) consists of:

Seeding. The ligand is placed within the ligand-binding pocket and four conformations are generated by flipping the ligand 180° with respect to its principal axes of inertia. Each of these starting poses is then followed by ten random displacements and rigid rotations of the ligand, thus generating 40 starting complex conformations.

Soft-van der Waals structure relaxation. Each complex conformation was allowed to relax through in vacuo minimization using a variable-weight soft van der Waals algorithm. Selected backbone loops, pocket side-chains and the ligand were regarded as fully flexible in this stage.

Stochastic global energy optimization. The 40 complexes generated through seeding and in vacuo minimization were subjected to a stochastic global energy optimization of the side-chain and ligand

Table 3. Improvements in enrichment factors and RMSD values for focused libraries by using the merging and shrinking procedure

PDB entry              EF (top 1%)  EF (top 2%)  EF (top 10%)  % Binders with RMSD < 2.5 Å

1DM2                   25.0         16.7         3.3           41.7
1G5S                   41.7         25.0         5.8           91.7
1DM2 + 1G5S            58.3         29.2         6.6           91.7

1DM2                   25.0         16.7         3.3           41.7
1JSV                   33.3         20.8         5.0           41.7
1DM2 + 1JSV            50.0         29.2         7.5           75.0

1DM2                   25.0         16.7         3.3           41.7
1H1P                   16.7         8.3          4.2           58.3
1DM2 + 1H1P            41.7         20.8         5.8           83.3

1DM2                   25.0         16.7         3.3           41.7
1G5S                   41.7         25.0         5.8           91.7
1JSV                   33.3         20.8         5.0           41.7
1DM2 + 1G5S + 1JSV     66.7         37.5         9.2           100.0

1DM2                   25.0         16.7         3.3           41.7
1H1P                   16.7         8.3          4.2           58.3
1JSV                   33.3         20.8         5.0           41.7
1DM2 + 1H1P + 1JSV     50.0         25.0         7.5           100.0

1AQ1                   41.7         29.2         6.7           75.0
1E1X                   33.3         20.8         4.2           25.0
1AQ1 + 1E1X            50.0         33.3         7.5           91.7

1E1X                   33.3         20.8         4.2           25.0
1G5S                   41.7         25.0         5.8           91.7
1E1X + 1G5S            58.3         33.3         6.7           91.7

1E1X                   33.3         20.8         4.2           25.0
1G5S                   41.7         25.0         5.8           91.7
1DM2                   25.0         16.7         3.3           41.7
1E1X + 1G5S + 1DM2     66.7         33.3         8.3           91.7

1FVT                   25.0         16.7         5.0           58.3
1E1X                   33.3         20.8         4.2           25.0
1FVT + 1E1X            50.0         29.2         6.7           66.7

1H1S                   58.3         29.2         9.2           100.0
1E1X                   33.3         20.8         4.2           25.0
1H1S + 1E1X            66.7         37.5         7.5           100.0

1DM2                   25.0         16.7         3.3           41.7
1H1P                   16.7         8.3          4.2           58.3
1DM2 + 1H1P            41.7         20.8         5.8           83.3

1FMO                   16.7         16.7         5.0           50.0
1YDS                   50.0         25.0         5.0           83.3
1FMO + 1YDS            50.0         33.3         8.3           83.3

1FMO                   16.7         16.7         5.0           50.0
1BX6                   16.7         8.33         1.67          50.0
1FMO + 1BX6            33.3         16.7         6.67          66.7

1BX6                   16.7         8.33         1.67          50.0
1YDS                   50.0         25.0         5.0           83.3
1BX6 + 1YDS            33.3         16.7         5.0           83.3

1FMO                   16.7         16.7         5.0           50.0
1STC                   16.7         8.33         5.0           83.3
1FMO + 1STC            33.3         16.7         8.33          83.3

1BX6                   16.7         8.33         1.67          50.0
1STC                   16.7         8.33         5.0           83.3
1BX6 + 1STC            33.3         16.7         3.33          83.3

1FMO                   16.7         16.7         5.0           50.0
1BX6                   16.7         8.33         1.67          50.0
1STC                   16.7         8.33         5.0           83.3
1FMO + 1BX6 + 1STC     50.0         25.0         6.67          83.3

1A9U                   16.7         8.33         5.0           83.3
1BMK                   16.7         8.33         3.33          66.7
1A9U + 1BMK            16.7         16.7         8.33          83.3

1A9U                   16.7         8.33         5.0           83.3
1M7Q                   16.7         8.33         1.67          83.3
1A9U + 1M7Q            33.3         16.7         5.0           83.3

1BMK                   16.7         8.33         3.33          66.7
1M7Q                   16.7         8.33         1.67          83.3
1BMK + 1M7Q            16.7         16.7         5.0           83.3

1A9U                   16.7         8.33         5.0           83.3
1M7Q                   16.7         8.33         1.67          83.3
1BMK                   16.7         8.33         3.33          66.7
1A9U + 1M7Q + 1BMK     33.3         25.0         5.0           83.3

Virtual screening is performed against two and three multiple experimental crystal structures of the same protein kinase sub-family.


torsion angles using the double-energy minimization scheme.14 During simulations, a set of geometrically diverse low-energy states is stored; these states are then clustered by comparing the RMSD of the heavy-atom coordinates of the ligand to eliminate redundant conformations. The top-ranking conformations were subjected to full minimization, keeping flexible the selected backbone loops, pocket side-chains and the ligand. The energy was then re-evaluated with a more accurate solvation energy term by solving the Poisson equation using the boundary element algorithm.39
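The seeding step and the removal of redundant conformations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ICM implementation: the "four conformations" are taken to be the unflipped ligand plus one 180° flip about each principal axis; the principal axes are assumed to be precomputed and passed in; the perturbation amplitudes (`max_shift`, `max_angle`) and the RMSD cut-off are hypothetical values; and all function names are invented for this example.

```python
import math
import random


def rotate(point, axis, angle):
    """Rotate a point about a unit axis through the origin (Rodrigues formula)."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(p * a for p, a in zip(point, axis))
    cross = (axis[1] * point[2] - axis[2] * point[1],
             axis[2] * point[0] - axis[0] * point[2],
             axis[0] * point[1] - axis[1] * point[0])
    return tuple(p * c + x * s + a * dot * (1.0 - c)
                 for p, x, a in zip(point, cross, axis))


def random_unit_vector(rng):
    """Draw a random direction by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            return tuple(x / norm for x in v)


def seed_poses(coords, principal_axes, n_perturb=10, max_shift=1.0,
               max_angle=math.radians(30.0), rng=None):
    """Return 4 * n_perturb rigid-body starting poses of the ligand.

    coords: ligand atoms, centered on the center of mass.
    principal_axes: three orthonormal unit vectors (assumed precomputed).
    """
    rng = rng or random.Random(0)
    # Unflipped pose plus a 180-degree flip about each principal axis
    # (our reading of "four conformations"; an assumption).
    orientations = [list(coords)]
    for axis in principal_axes:
        orientations.append([rotate(p, axis, math.pi) for p in coords])
    poses = []
    for oriented in orientations:
        for _ in range(n_perturb):
            axis = random_unit_vector(rng)
            angle = rng.uniform(-max_angle, max_angle)
            shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
            poses.append([tuple(c + d for c, d in zip(rotate(p, axis, angle), shift))
                          for p in oriented])
    return poses


def rmsd(a, b):
    """RMSD between two equally ordered coordinate lists, without superposition."""
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))


def remove_redundant(conformations, cutoff=1.0):
    """Greedy filter over (energy, ligand_coords) pairs: visit states from
    lowest to highest energy and keep one only if it lies more than
    `cutoff` away from every state already kept."""
    kept = []
    for energy, coords in sorted(conformations, key=lambda c: c[0]):
        if all(rmsd(coords, other) > cutoff for _, other in kept):
            kept.append((energy, coords))
    return kept
```

No superposition is applied before the RMSD is computed, because docked ligand poses share the receptor frame. With the default of ten perturbations per orientation, `seed_poses` returns the 40 starting poses mentioned in the text.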

Optimization of the simulation temperature for the de novo receptor generation procedure

Four complexes with ligands belonging to different chemical spaces (1YDT, 1H1Q, 1FVT and 1DM2) were chosen to run a Monte Carlo energy optimization procedure at different temperatures. Side-chains within 6.5 Å of the ligand and torsion angles of the ligand were randomized with an amplitude of 45°, and the position of the ligand center of mass was displaced randomly by 2 Å. For each complex, on average ten simulations were performed at 300 K, 600 K, 1000 K and 2000 K. For a given complex and temperature, we recorded the number of low-energy conformations found and whether the global minimum was found. The low-energy conformations were defined as those within 10 kcal/mol of the global minimum. We found that 600 K was the best choice for Monte Carlo simulations, in agreement with what has been found for conformational searches in proteins.40
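The bookkeeping used to compare temperatures can be illustrated with a short sketch. It assumes the usual Metropolis acceptance rule for the Monte Carlo moves (the text does not spell out the rule), and the function names, the energy tolerance and the sample energies are hypothetical. The 10 kcal/mol window is the one defined in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def metropolis_acceptance(delta_e, temperature):
    """Probability of accepting a move that changes the energy by delta_e
    (kcal/mol) at the given temperature (K), under the Metropolis rule."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / (R_KCAL * temperature))


def summarize_run(energies, global_min, window=10.0, tol=1e-3):
    """Return (number of low-energy conformations, global minimum found?).

    Low-energy means within `window` kcal/mol of the global minimum;
    `tol` is a hypothetical tolerance for declaring the minimum found."""
    low = [e for e in energies if e - global_min <= window]
    found = any(abs(e - global_min) <= tol for e in energies)
    return len(low), found


# Acceptance of a hypothetical 3 kcal/mol uphill move at the four temperatures.
acceptance = {t: metropolis_acceptance(3.0, t) for t in (300, 600, 1000, 2000)}

# Hypothetical final energies from one simulation.
n_low, found_min = summarize_run([-100.0, -95.0, -80.0], global_min=-100.0)
```

Higher temperatures accept uphill moves more readily, which helps barrier crossing but spends more time in high-energy states; counting low-energy conformations per run is one way to quantify that trade-off.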

De novo receptor generation of accurate PK structures using IFREDA

We selected eight PK complexes for modeling through docking to a flexible receptor, either because of ligand–receptor clashes (induced fit upon binding) or because of non-optimal ligand–receptor contacts, which lead to poor binding scores. The complexes chosen were: A, 1FMO + balanol; B, 1JLU + staurosporine; C, 1JLU + H7; D, 1DM2 + NU6027; E, 1E1X + oxindole 16; F, 1JSV + oxindole 16; G, 1BMK + 14e; H, 3LCK + PP2.

It has been observed that binding pockets have regions of very low and very high mobility.41 Indeed, this could be observed within each PK

Figure 4. Distribution of the enrichment factor fold-increase obtained by the merging-shrinking procedure on 21 groups of two or three protein kinases (Table 3).

Table 4. Lowest free energy conformations of the eight protein kinase structures generated by using a de novo generation of multiple receptor conformations

Complex                    Ranking  Ligand RMSD (Å)  ΔΔG (kcal/mol)

A  1FMO + balanol          1        1.4              0.0
                           4        2.7              7.7

B  1JLU + staurosporine    1        0.8              0.0
                           2        3.0              8.2

C  1JLU + H7               1        0.7              0.0
                           3        4.0              11.9

D  1DM2 + NU6027           2        1.8              +0.5
                           1        8.1              0.0

E  1E1X + oxindole 16      1        2.4              0.0
                           2        4.4              2.2

F  1JSV + oxindole 16      1        2.1              0.0
                           2        4.3              1.3

G  1BMK + 14e              1        1.9              0.0
                           2        7.7              5.6

H  3LCK + PP2              1        0.9              0.0
                           3        6.1              2.2

RMSD and free energy values are in Å and kcal/mol, respectively.
ΔΔG is the difference in free energy with respect to the best-energy conformation.


sub-family. In the eight PK complexes, the parts regarded as flexible during the procedure (loops) include the mobile glycine-rich flap plus other residues that depend on the particular PK sub-family. They were chosen on the basis of the observed flexible parts, comparing crystal structures of the same PK sub-family. These selections are somewhat arbitrary and could be expanded. The following residues were selected: (i) cAPK: 49–58, 70–74, 120–127, 170–173, 181–187 and 322–330; (ii) CDK-2: 8–18, 80–86, 131–134, 143–147; (iii) P38: 30–38, 104–112, 50–54, 167–169; (iv) LCK: 250–260, 316–323. The eight complexes were optimized using the de novo structure generation tool of the IFREDA procedure as described in Materials and Methods. In Table 4 we show the RMSD of the best-ranking solution together with the rank, RMSD and relative free energy of the first significantly structurally different conformation. We selected the best conformations solely on the basis of energy calculation, and it is seen that most of the ligands are within 2 Å of their native structure.
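For readers who want to reuse these selections, the residue ranges listed above can be encoded as a small lookup table. The dictionary layout and the helper function are illustrative choices for this sketch, not part of the original procedure; the ranges themselves are copied from the text.

```python
# Flexible backbone segments per PK sub-family (inclusive residue ranges),
# copied from the selection listed in the text.
FLEXIBLE_LOOPS = {
    "cAPK": [(49, 58), (70, 74), (120, 127), (170, 173), (181, 187), (322, 330)],
    "CDK-2": [(8, 18), (80, 86), (131, 134), (143, 147)],
    "P38": [(30, 38), (104, 112), (50, 54), (167, 169)],
    "LCK": [(250, 260), (316, 323)],
}


def is_flexible_backbone(subfamily, residue_number):
    """True if the residue falls in one of the flexible segments."""
    return any(lo <= residue_number <= hi
               for lo, hi in FLEXIBLE_LOOPS[subfamily])


def n_flexible_residues(subfamily):
    """Total number of residues with flexible backbone in a sub-family."""
    return sum(hi - lo + 1 for lo, hi in FLEXIBLE_LOOPS[subfamily])
```

For example, Gly50 of the cAPK glycine flap and Leu83 of the CDK-2 hinge region both fall inside the selected segments.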

1FMO + balanol complex. In the cAPK sub-family, balanol cannot dock correctly in the adenosine-bound pocket (PDB entries 1BKX and 1FMO), since it clashes with the glycine flap (Gly50-Thr-Gly-Ser-Phe-Gly-Arg-Val). This loop moves about 2 Å upwards upon balanol binding, as seen in structure 1BX6. In Figure 5 we show how model A reproduces this effect. Balanol superimposes very well with its native structure (1BX6), making a large number of contacts with the generated receptor structure.

1JLU + staurosporine complex. Staurosporine, a flat and rigid compound that could be docked correctly only to its native structure, induces conformational changes in its neighboring residues, especially in the glycine flap and the C-terminal loop containing Phe327, which is pushed away by staurosporine by more than 2 Å. We show in Figure 6 how this induced fit was indeed reproduced in model B, while staurosporine superimposes excellently with its native structure.

1JLU + H7 complex. Ligand H7 could dock correctly to apo structure 1JLU but scored poorly. The flexible receptor docking of H7 to 1JLU has a very good RMSD (see Table 4). It is shown below that during the small-scale VS, while the scoring of H7 in model C improved (H7 is now placed within the top-ranking 2%), H7 was not placed at the top of the hit list, probably due to the omission of water molecules in the VS.

1DM2 + NU6027 complex. Superposition of structures 1DM2 and 1E1X shows that ligand NU6027 cannot dock correctly into 1DM2 because the amino group in position 2 clashes with Leu83. This clash pushes the ligand away and the optimal contacts for binding cannot be achieved. In the optimized model D, the backbone of the hinge region including Leu83 rearranges, restoring the triplet hydrogen bond pattern of NU6027 with Glu81 and Leu83. The pyrimidine ring superimposes very well with that of 1E1X, and the RMSD of 1.8 Å comes mainly from the deviation of the high B-factor cyclohexyl moiety. In fact, the crystal structure of 1E1X does not show an optimal conformation for the hydrogen bond between the 2-amino group of NU6027 and the O in Leu83, since the angle O–H–N is ~114°, while in the optimized model this angle is ~150°, closer to the optimal 180°. However, in model D, the distance between NH of Leu83 and N-1 of NU6027 is somewhat large (~2.4 Å), which may account for the fact that NU6027 does not rank at the top in the small-scale VS (see below). In model D, the energy of the best pose was slightly worse than that of a misdocked one, by 0.5 kcal/mol (see Table 4), which we consider within the accuracy of our energy function (1 cal = 4.184 J). Of course, the misdocked conformation could be discarded based on structural considerations of the binding mode to PKs.

Figure 5. Flexible docking of balanol into the 1FMO binding pocket (model A). The modeled structure is in grey, while 1FMO is displayed in yellow. Balanol carbon atoms are displayed in yellow (corresponding to native structure 1BX6) and white (modeled complex). Notice the displacement in the modeled structure of the glycine-rich flap ~2 Å upwards to allow balanol to bind. (The picture was constructed using ICM version 3.0.)

1E1X + oxindole 16 complex. Compound oxindole 16 is unable to dock to structure 1E1X, mainly due to the positions of Lys33 and Lys89. Superposition with the native structure of oxindole 16 (1FVT) shows that Lys33 clashes with the Br-benzyl part of oxindole 16, while Lys89 clashes with the SO2 part. Modeling of 1E1X with oxindole 16 by flexible receptor docking (model E) shows a rearrangement of both lysine residues. However, Lys33 was anchored by a salt-bridge with Asp45 and could not adopt the completely buried and folded position that is exhibited in 1FVT, whereby it has no close contact. In this way, oxindole 16 could not fit completely into the binding pocket, with a displacement towards the exposed solvent region of ~1.5 Å. The hydrogen bond between the indolinone O and HN of Leu83 is conserved but that with the carbonyl group of Glu81 is lost.

1JSV + oxindole 16 complex. The clash between Lys33 and oxindole 16 is the reason why this compound could not be docked to 1JSV. Flexible receptor docking of oxindole 16 into 1JSV (model F) shows the same effects as in model E. Here, the ligand is shifted ~1.2 Å, and the average RMSD is marginally better (see Table 4). The hydrogen bond with HN of Leu83 is conserved and that with Glu81 is lost, but a new one is formed between the O of Leu83 and the H at N17 in oxindole 16.

1BMK + 14e complex. Compound 14e is another example of correct docking to 1BMK and poor scoring. The triplet hydrogen-bonding pattern between the ligand and the backbone (O of Leu107, NH of Met109 and NH of Gly110) could not be reproduced through flexible ligand–rigid receptor docking; only the hydrogen bond with Met109 could be achieved. Using IFREDA to generate the complex structure 1BMK + compound 14e (model G) restores the hydrogen bond with Leu107 and keeps that with Met109. The explanation of why the hydrogen bond with Gly110 was not restored lies in the fact that in the native crystal structure of compound 14e (1M7Q) a peptide flip between Met109 and Gly110 is observed, which could not be reproduced in our model. With that flip, the HN of Gly110 faces the pocket, thus enabling the formation of the hydrogen bond (this peptide flip effect has been studied recently by Fitzgerald et al.42). Despite this fact, the ligand RMSD of model G is good, especially taking into account that the shift comes mainly from the high B-factor piperazine moiety.

3LCK + PP2 complex. The main reason why the grid docking of PP2 to structure 3LCK fails is not clashes but a weak hydrogen bond interaction pattern between PP2 and the receptor, as seen when superimposing the native structure of PP2 (1QPE) with 3LCK. In model H, this pattern is improved slightly, and in the rigid-receptor docking of PP2 to model H the ligand is placed within 1.3 Å.

Figure 6. Flexible docking of staurosporine into the 1JLU binding pocket (model B). The modeled structure is in grey, while 1JLU is displayed in yellow. The glycine-rich flap has been cut for the sake of clarity. Staurosporine carbon atoms are displayed in yellow (corresponding to native structure 1STC) and white (modeled complex). Notice the displacement in the modeled structure of the C-terminal loop, which contains Phe327. (The picture was constructed using ICM version 3.0.)

Improvements in RMSD values and enrichment factors using IFREDA

The receptor structures thus obtained were used to perform small-scale VS in the same way as was done for the crystal structures (see above). The RMSD and EF values for the corresponding native receptors and those generated through IFREDA are shown in Table 5, together with the values obtained by merging and shrinking of the screening results. The EF fold-increase for the five analyzed groups with respect to their single original structure was 1.89 ± 0.60 (averaged over 15 EF values), which is even slightly higher than the value obtained using multiple experimental structures (1.85 ± 0.65), while no decrease in the enrichment factor was observed. Although tested on a limited set, this result shows that IFREDA is an efficient tool to incorporate receptor conformational diversity in the VS procedure.
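The averaged fold-increase can be recomputed from the Table 5 entries with a few lines of code. The sketch assumes that the fold-increase is the combined EF divided by the EF of the native structure at the same cut-off, and that the quoted spread is a sample standard deviation; neither assumption is stated explicitly in the text. Variable and function names are illustrative.

```python
from statistics import mean, stdev

# (native EF, combined EF) at the top 1%, 2% and 10% cut-offs, from Table 5.
TABLE5 = {
    "1FMO + balanol":       [(16.7, 50.0), (16.7, 33.3), (5.0, 8.3)],
    "1JLU + staurosporine": [(16.7, 33.3), (8.3, 16.7), (3.3, 5.0)],
    "1JLU + H7":            [(16.7, 16.7), (8.3, 25.0), (3.3, 8.3)],
    "1DM2 + NU6027":        [(25.0, 41.7), (16.7, 25.0), (3.3, 6.7)],
    "1BMK + 14e":           [(16.7, 16.7), (8.3, 16.7), (3.3, 5.0)],
}


def fold_increases(table):
    """Combined EF divided by native EF for every group and cut-off."""
    return [combined / native
            for pairs in table.values()
            for native, combined in pairs]


ratios = fold_increases(TABLE5)   # 5 groups x 3 cut-offs = 15 values
average = mean(ratios)
spread = stdev(ratios)            # sample standard deviation (assumption)
```

Under these assumptions the 15 ratios average to about 1.89 with a sample standard deviation of about 0.60, consistent with the value quoted above, and none of the ratios falls below 1.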

For model A, balanol got the best score. However, two other ligands among six (H8 and H89) were placed in the top 1% ranking. It should be remarked that H89 was not even able to dock correctly to 1FMO. All of the ligands except for staurosporine are docked within 0.9 Å. This shows that modeling by flexible receptor docking does not make a tight custom-pocket that accommodates only the ligand used in the optimization, as will be seen in other cases. Combination of the EFs of both VS does not make any difference in this case: the generated receptor structure alone is enough to improve the EF significantly.

The structure of 1JLU complexed with staurosporine (model B) is expected to behave in a fashion similar to that of native structure 1STC, and this is exactly the case. Staurosporine gets the top ranking score in the database, but no other ligand is present in the top 10% ranking. However, besides staurosporine, adenosine and H7 were docked within 0.8 Å but scored poorly. The combined EF is the arithmetic sum of the EFs of the VS against the native structure and against the model, and the increase is significant.

As stated previously, compound H7 docked correctly in 1JLU, but has a rather poor score (it ranked within the top 30%). The VS on model C shows H7 now in the top 2% ranking. Compound H89, which could not dock to 1JLU, is now able to dock in the generated binding pocket and both H8 and H89 also improve their score significantly compared to 1JLU. Adenosine, H7, H8 and H89 are docked within 0.9 Å, and balanol within 2.7 Å, somewhat worse than to 1JLU, but it still ranks in the top 1%. One possible explanation for the fact that H7 does not score better in model C could be found in the fact that water molecules were not considered in the VS.

Compound NU6027, which could not dock to 1DM2, was placed within 1.8 Å in model D. Although its score was not optimal and NU6027 ranked in the top 7%, other compounds that were either "misdocked" in 1DM2 (NU2058, NU6094 and NU6102) or docked correctly with poor score (H717) were ranked in model D within the top 1.5%, while compound NU2058 was within the top 5% ranking. As a consequence, the EF of the combined set is higher than the individual EF values for each of the three cut-offs, and the RMSD values are improved.

Ligand docking and small-scale VS against models E and F (1E1X + oxindole 16 and 1JSV + oxindole 16, respectively) using a rigid receptor model failed to place oxindole 16 in its native pose but, surprisingly, the similar compound oxindole 91 was docked in place with a top ranking score. Although the RMSD of oxindole 16 in models E and F is within an acceptable range (see Table 4), we believe that further refinement might be necessary to get better structures to be used in VS.

In the model of 1BMK with compound 14e, this compound ranks first with an RMSD of 2 Å, slightly worse than in 1BMK but with a much better score. In this case, the improvement in the EF values obtained by merging and shrinking is modest. As expected, there is no improvement in RMSD values, since compound 14e already docked correctly into 1BMK and the side-chain rearrangement was supposed to be small, just to improve ligand–receptor contacts.

Table 5. Improvements in enrichment factors and RMSD values for focused libraries by using the merging and shrinking procedure when virtual screening is performed against the native structure and another one generated using IFREDA

PDB entry               EF (top 1%)  EF (top 2%)  EF (top 10%)  % Binders with RMSD < 2.5 Å

1FMO                    16.7         16.7         5.0           50.0
1FMO + balanol          50.0         33.3         8.3           83.3
Combined EF             50.0         33.3         8.3           83.3

1JLU                    16.7         8.3          3.3           66.7
1JLU + staurosporine    16.7         8.3          1.7           50.0
Combined EF             33.3         16.7         5.0           83.3

1JLU                    16.7         8.3          3.3           66.7
1JLU + H7               16.7         25.0         6.7           66.7
Combined EF             16.7         25.0         8.3           83.3

1DM2                    25.0         16.7         3.3           41.7
1DM2 + NU6027           16.7         12.5         5.0           50.0
Combined EF             41.7         25.0         6.7           75.0

1BMK                    16.7         8.3          3.3           66.7
1BMK + 14e              16.7         8.3          5.0           66.7
Combined EF             16.7         16.7         5.0           66.7

RMSD values refer to the best-energy solution.


No EF improvement is reported for LCK structures, since only two different ligands were tested. Compound PP2, which could not dock in apo structure 3LCK, could dock in model H within 1.3 Å and ranked in the top 5%. Staurosporine could dock correctly (RMSD 0.9 Å) and ranked in the top 5%.

Discussion

Receptor flexibility: which is the best choice to deal with multiple receptor conformations?

Potential binders can be lost (ranked poorly) in the VS for the following reasons. (i) They are misdocked because of clashes with the receptor or because the ligand–protein contacts are not strong enough to hold the ligand in a correct position (for example, if the quality of the crystal structure is bad or residues within the pocket have high B-factors). (ii) They are docked correctly, but they do not score properly because the ligand–receptor contacts are not optimal. (iii) They are misdocked or they do not score properly because water molecules or ions are not included in the receptor model. (iv) The ionization state of the ligand or the receptor is uncertain, due to receptor-induced (ligand-induced) pKa changes in the ligand (receptor). (v) They are misdocked because of insufficient sampling, or they are docked correctly but do not score properly because of failures in the scoring function. The first two reasons are related to rearrangements of the binding pocket upon ligand binding.

Direct incorporation of protein flexibility in screening of large compound libraries is computationally too expensive. The use of many receptor structures has been characterized as the best choice to incorporate receptor flexibility in ligand docking.19 However, there are many unanswered questions regarding how these structures should be used, generated and, in particular, how the flexibility consideration impacts the success of the VS procedure. The merging-and-shrinking of screening results from multiple but fixed receptor conformations (either experimental or computer-generated) can deal with both side-chain and backbone flexibility, as is the case of balanol binding to 1FMO (Figure 5) or staurosporine binding to 1JLU (Figure 6).

Using the merge-and-shrink procedure with two and three receptor structures (either experimental or generated) we obtained an EF fold-increase of ~1.90, showing that a small ensemble of PK structures can satisfactorily span the conformational space of the binding pocket. The computational effort for flexible ligand–rigid receptor docking is about one to two minutes (700 MHz processor, 1 Mb RAM for the dual-processor node), so screening against many different structures (say five) is computationally still affordable. In cases where a large number of experimental structures are available, a small-scale ligand docking and screening may be used as a tool to help to eliminate redundant structures from the ensemble, while keeping the most structurally diverse.

Comparison with other methods is difficult, since most of them analyze the impact of receptor flexibility on docking accuracy only, and all of them, including that presented here, have been tested on a limited number of protein families, sometimes even in just one protein with many available crystal structures, thus leaving an open question regarding transferability. However, some analogies and differences with other methods can be pointed out.

By using many receptor conformations, we directly avoid the use of averaged composite grid maps, in which the structural diversity of the pocket is represented by only one structure.20,22,28 While in those studied cases some of the weighted-average grid maps performed satisfactorily in evaluating RMSD values and binding energies of known binders, there might be other cases in which a single grid map will not be able to represent receptor flexibility, especially if loop rearrangements are involved. Other alternatives, like generating an ensemble of structures through combination of rotameric states of key residues,26 will lead to a combinatorial multiplication of structures, which makes them unfeasible for VS of hundreds of thousands of compounds.
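The scaling argument can be made concrete with a small sketch. It assumes that the quoted one to two minutes applies per ligand and per receptor structure, which the text implies but does not state; the rotamer counts are hypothetical, and the function names are invented for this example.

```python
def ensemble_docking_minutes(n_compounds, n_structures, minutes_per_dock):
    """Total docking time when every compound is docked to every structure
    (cost grows linearly with the number of structures)."""
    return n_compounds * n_structures * minutes_per_dock


def rotamer_combinations(rotamers_per_residue):
    """Number of receptor structures obtained by combining the rotameric
    states of several residues (cost grows multiplicatively)."""
    total = 1
    for n in rotamers_per_residue:
        total *= n
    return total


# 1000 compounds docked to five structures at two minutes per docking.
small_ensemble = ensemble_docking_minutes(1000, 5, 2)

# Hypothetical pocket with six key residues and three rotamers each.
n_combined = rotamer_combinations([3] * 6)
combinatorial = ensemble_docking_minutes(1000, n_combined, 2)
```

Even this modest hypothetical pocket yields 729 combined structures, so the combinatorial ensemble costs well over a hundred times more than the five-structure ensemble.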

The incorporation of flexibility through discrete alternative conformations of varying parts of the protein that are merged in a combinatorial way and considered directly during the ligand docking (FlexE21) is definitely an interesting approach. The method has been evaluated using a rigid ligand docking algorithm and is currently limited to the use of source structures that have a similar backbone trace. The time for protein preparation and ligand docking (per ligand) is roughly equivalent to that needed to dock and score one ligand to five structures using IFREDA.

Limitations of IFREDA: a guide for further improvements

It is important to remark here that the de novo generation of receptor structures through IFREDA does not make a tight custom-pocket that accommodates only the ligand used in the optimization, as shown in Results.

Two of the eight generated structures were not suitable for VS (models E and F), since it was not possible to re-dock the ligand used in the optimization using a rigid receptor approach. Those structures were far from useless, since other known binders could be docked and scored correctly. However, failing to re-dock the ligand used in the complex structure generation should be taken as evidence that the model should be improved. This could be linked to the comparatively high RMSD values of the ligand (~2 Å) in both models (see Table 4). Two main improvements could be incorporated in our de novo generation procedure. (i) Performing the stochastic energy optimization on a wider selection of side-chain torsional angles (for example, randomly perturbing and minimizing with respect to L2; see Stochastic global energy optimization in Materials and Methods). (ii) Using a better solvation energy term during the stochastic optimization (for example, GB/SA). This could help to improve the energy discrimination of mis-docked conformations (for example, in model D, where two conformations were energetically indistinguishable).

Although backbone flexibility is incorporated in IFREDA, we were not able to reproduce the peptide flip between Met109 and Gly110 of 1M7Q compared to 1BMK. To overcome the large potential barrier of this flip, a Monte Carlo sampling of the backbone atoms would be necessary and might be incorporated in future developments.

Conclusions

We present an algorithm, called IFREDA, to incorporate protein flexibility in ligand docking and scoring, especially useful in a drug discovery scenario where few or no holo experimental structures are available, while some binders are known. Initially, IFREDA generates a set of receptor conformations by docking flexible ligands to a flexible receptor through a global energy optimization of the complexes. Both side-chains and essential backbone distortions are included in this optimization, thus sampling the conformational space of the receptor, even in cases of loop rearrangements. The receptor structural set is then used to perform flexible ligand–grid receptor docking and scoring, followed by merging the VS scores and keeping the best rank for each compound. The scoring list is thus shrunk to the size corresponding to a VS against a single receptor.
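The merging and shrinking step summarized above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: scores are assumed to be directly comparable across receptor conformations, with lower values being better; the receptor labels, compound names and scores in the example are hypothetical.

```python
def merge_and_shrink(screens):
    """Merge several screening results and keep the best entry per compound.

    screens: dict mapping receptor label -> dict of compound -> score
             (lower score = better; an assumption of this sketch).
    Returns a list of (compound, score, receptor) sorted from best to worst,
    with one entry per compound, i.e. the size of a single screening.
    """
    # Merge: pool every (score, compound, receptor) triple into one list.
    merged = [(score, compound, receptor)
              for receptor, scores in screens.items()
              for compound, score in scores.items()]
    merged.sort()
    # Shrink: keep only the best-ranked occurrence of each compound.
    seen = set()
    shrunk = []
    for score, compound, receptor in merged:
        if compound not in seen:
            seen.add(compound)
            shrunk.append((compound, score, receptor))
    return shrunk


# Hypothetical scores for three compounds against two conformations.
example = {
    "native": {"cpd_a": -30.0, "cpd_b": -10.0, "cpd_c": -5.0},
    "model":  {"cpd_a": -12.0, "cpd_b": -35.0, "cpd_c": -6.0},
}
ranking = merge_and_shrink(example)
```

In this example `cpd_b`, which scores poorly against the native conformation, moves to the top of the merged list because of its score against the model, which is the effect the procedure is designed to capture.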

To underscore the impact of protein flexibility on RMSD and EF values, and the necessity of somehow including induced-fit effects in VS, in the first stage of our study we report ligand docking and VS tests against 33 crystal structures of four different sub-families of protein kinases: cAPK, CDK-2, P38 and LCK. We used a 1000-compound library built using a random set of molecules seeded with co-crystallized PK ligands. Native ligands were docked with an average RMSD of 0.74 Å to their co-crystallized structure (maximum RMSD was 1.5 Å, see Table 2), while ~70% of the ligands were cross-docked within 2.5 Å, and 80% of the native ligands scored in the top-ranking 1.5%, which altogether can be taken as a validation of our docking and scoring algorithm, even though we did not make any attempt to improve or tune the scoring function for PKs. The fraction of docked ligands within 2.5 Å and ranking within the top 10% is ~49% (see Figure 3). Comparing this with the ~70% of compounds that are cross-docked correctly, it is observed that induced-fit effects have a stronger effect on scoring and ranking than on ligand docking accuracies.
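The comparison between docking accuracy and ranking can be expressed as a simple classification of each cross-docking result by the two thresholds used in Figure 3. The sketch below uses the 2.5 Å and top 10% cut-offs from the text; the category labels, function names and sample data are hypothetical.

```python
def quadrant(rmsd, rank_percent, rmsd_cutoff=2.5, rank_cutoff=10.0):
    """Classify one cross-docking result by docking accuracy and ranking."""
    docked = rmsd < rmsd_cutoff
    ranked = rank_percent <= rank_cutoff
    if docked and ranked:
        return "docked, well ranked"
    if docked:
        return "docked, poorly ranked"
    if ranked:
        return "misdocked, well ranked"
    return "misdocked, poorly ranked"


def fraction(results, label):
    """Fraction of (rmsd, rank_percent) results that fall in a category."""
    return sum(1 for r in results if quadrant(*r) == label) / len(results)


# Hypothetical (RMSD in angstroms, rank in percent) pairs.
sample = [(0.8, 1.0), (1.9, 35.0), (4.2, 60.0), (2.1, 8.0)]
well_docked_and_ranked = fraction(sample, "docked, well ranked")
```

The gap between the fraction of ligands that are docked correctly and the fraction that are both docked correctly and well ranked corresponds to the "docked, poorly ranked" category, which is where induced-fit effects on scoring show up.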

We then showed that merging the VS results against two and three diverse experimental receptor conformations, by retaining the best ranking position for each ligand, led to an average EF fold-increase of 1.85 ± 0.65, with less than 3% of cases in which the EF actually deteriorates due to an increase of false positives (see Figure 4). In the merged set, most of the ligands are predicted within 2.5 Å RMSD from their native pose (more than 75% in 19 out of 21 of the groups). Since the computing time for the flexible ligand–rigid receptor docking is about one to two minutes using a 700 MHz processor (1 Mb RAM for the dual-processor node), screening against a small number of structures is affordable and has the advantage of using actual receptor conformations that may differ in side-chain conformation and in the rearrangement of loops.

The IFREDA procedure has been validated in eight PK complexes and was able to generate the correct receptor and ligand bound conformation and energetically discriminate it from mis-docked conformations within the accuracy of the energy function, even in cases where loop rearrangement (~2 Å) was necessary for the ligand binding (see Table 4). Since it has been shown that binding sites have regions of very high and very low stability,41 some portions of the backbone were considered as rigid, while others and the side-chains within the binding pocket were flexible. The choice of flexible backbone parts has been undertaken by inspecting the mobile parts in the available crystal structures. In some cases this is coincident with high B-factor regions. Results from the small-scale VS against the generated set of structures were condensed using the merging and shrinking procedure. We observed an EF fold-increase of 1.89 ± 0.60 (see Table 5), slightly better than using multiple experimental structures, which clearly shows that multiple receptor conformations generated through IFREDA represent the structural space of the binding pocket and that these conformations can be used when multiple experimental structures are not available, or when the conformational space of the receptor is not represented by the experimental structures available. Interestingly, our de novo structure generation does not make a custom-pocket that is suitable only for the ligand used in pocket generation.

Our methodology was successful in incorporating protein flexibility even in cases of loop rearrangements and improving the results of ligand docking and VS. A better solvation energy term for the Monte Carlo sampling and a methodology to allow backbone flipping will be incorporated in future developments.

Materials and Methods

Receptor preparation

The coordinates of protein kinase complexes were taken from the RCSB Protein Data Bank (PDB).43 Kinase complexes with a resolution poorer than 2.8 Å or with incomplete ligand structures were discarded. Structures containing charged ligands with counter ions were not included. The 33 selected complexes are listed in Table 1. In each sub-family, one apo structure was included. Two adenosine-bound structures of cAPK (1BKX and 1FMO) and two staurosporine-bound structures of LCK (1QPD and 1QPJ) were included, since they exhibit different conformations for the glycine-rich flap. P38 apo structure 1P38 was preferred over 1WFC due to a higher resolution.

Hydrogen and missing heavy atoms were added to the receptor structure, followed by local minimization using the conjugate gradient algorithm and analytical derivatives in the internal coordinates space. The energetically most favorable tautomeric state of His was chosen. Positions of Asn and Gln were optimized to maximize hydrogen bonding. In cases where the occupancy was not equal to 1 we chose the conformation with the lowest energy. The positions of polar hydrogen atoms (including those of water molecules) in the ligand-binding pocket (LBP) within 5 Å of the ligand were optimized, but water molecules and peptide PKI-(5–24) were then removed whenever present for subsequent calculations. No structures with alternative hydroxyl rotamers or histidine tautomers were generated for VS, since receptor flexibility was addressed directly in this work.

Small-molecule library preparation

The 3D coordinates of the ligands were taken from the crystal structure and their correct stereochemistry and formal charges were assigned. The ligands are listed in Table 1 together with their PDB three-letter code. The protonation state was determined according to an environment at pH 7.4. The following groups were regarded as charged: the staurosporine amino group,44 the nitrogen atoms of terminal amino substituents in the isoquinolinesulfonamide derivative inhibitors H7, H8 and H89,45 the guanidine ring of hymenialdisine,46 the amino group of the adenine-derived compound H717 at its diaminocyclohexane substituent at C2,47 the nitrogen in the cyclohexyl amine group of compound SB220025, and the piperidine nitrogen in the dihydroquinazoline derivative (PDB code dqo).48 Each ligand was then assigned the MMFF atom types and charges,49 and subjected to a global energy optimization using the ICM stochastic optimization algorithm.50

A random database of compounds was generated using every 272nd molecule of the Diverse Set database of ChemBridge (ChemBridge, Inc., San Diego, CA), which consists of 272,938 compounds. Carboxylic acids and primary, secondary and tertiary amine groups were regarded as being charged. These compounds were prepared and energy optimized in a fashion similar to that used for the ligand compounds. The overlap of basic chemical descriptors between the ligand and the random libraries is essential for meaningful results using the VS protocol. Four properties were calculated: molecular mass (M), number of rotatable bonds (Nrot), and number of hydrogen bond acceptors (NHBacc) and donors (NHBdon). The average and standard deviation values for the ligand and random libraries were, respectively, as follows: M, 351.5 ± 83.0 and 364.0 ± 73.8; Nrot, 2.3 ± 1.9 and 2.4 ± 2.0; NHBacc, 3.9 ± 1.9 and 3.6 ± 1.5; NHBdon, 3.2 ± 1.6 and 1.1 ± 0.9.

For each PK sub-family, a 1000-compound sample library was built by merging the corresponding ligands with the compounds from the random database. Since the number of ligands depends on the PK sub-family, some compounds from the random database were deleted in a random manner to ensure a total of 1000 compounds.
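The library construction described above can be illustrated with a minimal Python sketch. It assumes the database is an ordered list of compound identifiers; the function name `build_sample_library`, the identifier strings and the count of 12 ligands are hypothetical and serve only to make the example runnable.

```python
import random

def build_sample_library(ligands, database, step=272, total=1000, seed=0):
    """Merge known ligands with systematically sampled decoys, trimmed to `total`."""
    # Systematic sample: the 272nd, 544th, ... entries of the database.
    decoys = database[step - 1::step]
    n_decoys = total - len(ligands)
    if n_decoys > len(decoys):
        raise ValueError("not enough sampled decoys to reach the requested size")
    # Randomly deleting the surplus decoys is equivalent to keeping a random subset.
    rng = random.Random(seed)
    kept = rng.sample(decoys, n_decoys)
    return list(ligands) + kept

# Hypothetical identifiers standing in for the real compound records.
database = [f"cmpd_{i:06d}" for i in range(1, 272_938 + 1)]
ligands = [f"ligand_{i}" for i in range(12)]  # 12 is an arbitrary example count
sampled = database[271::272]
library = build_sample_library(ligands, database)
print(len(sampled), len(library))
```

Taking every 272nd entry of 272,938 compounds yields 1003 candidates, so only a handful of random compounds need to be deleted for any realistic number of ligands.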

Energy calculation and optimization

The molecular system is described in terms of internal coordinate variables, using a modified ECEPP/3 force-field51 with a distance-dependent dielectric constant for the energy calculations, as implemented in ICM.14,37 The biased probability Monte Carlo (BPMC) minimization procedure was used for global energy optimization.50,52

The BPMC global energy optimization method consists of the following steps: (1) a random conformational change of the free variables according to a predefined continuous probability distribution;50,52 (2) local energy minimization of the analytically differentiable terms; (3) calculation of the complete energy, including non-differentiable terms such as entropy and solvation energy; (4) acceptance or rejection of the new conformation on the basis of its total energy according to the Metropolis criterion,53 and return to step 1.
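The four steps can be sketched on a toy problem. The following Python example is only an illustration of a Monte Carlo minimization loop: the one-dimensional energy surface, the unbiased Gaussian move (the actual BPMC procedure draws moves from biased probability distributions) and all function names are assumptions made here, not part of ICM.

```python
import math
import random

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)

def toy_energy(x):
    """Hypothetical rugged 1D energy surface (not a force field)."""
    return 0.05 * x * x + math.sin(3.0 * x)

def local_minimize(f, x, step=0.01, iters=500, h=1e-5):
    """Plain gradient descent with a numerical derivative (stand-in for step 2)."""
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= step * grad
    return x

def metropolis_accept(e_old, e_new, temperature, rng):
    """Metropolis criterion: always accept downhill, uphill with Boltzmann probability."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / (R_KCAL * temperature))

def monte_carlo_minimization(f, x0, n_steps=200, temperature=600.0, seed=1):
    rng = random.Random(seed)
    x = local_minimize(f, x0)
    e = f(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = x + rng.gauss(0.0, 2.0)       # step 1: random move (unbiased here)
        trial = local_minimize(f, trial)      # step 2: local minimization
        e_trial = f(trial)                    # step 3: full energy (toy: no extra terms)
        if metropolis_accept(e, e_trial, temperature, rng):  # step 4
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = monte_carlo_minimization(toy_energy, x0=8.0)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

Because every trial conformation is locally minimized before the acceptance test, the walk effectively moves between local minima rather than between arbitrary points of the energy surface.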

Flexible ligand/grid receptor docking

The VS module as implemented in ICM35–38 was used. The energy function used during the flexible ligand–grid receptor docking simulations consisted of:36

$$E = E_{\mathrm{FFint}} + E_{\mathrm{vw}} + E_{\mathrm{el}} + E_{\mathrm{hb}} + E_{\mathrm{hp}}$$

where $E_{\mathrm{FFint}}$ is the internal force-field energy of the ligand, and $E_{\mathrm{vw}}$, $E_{\mathrm{el}}$, $E_{\mathrm{hb}}$ and $E_{\mathrm{hp}}$ are the van der Waals, electrostatic, hydrogen bond and hydrophobic potential terms, respectively. The last four terms are pre-calculated on a grid with a spacing of 0.5 Å to accelerate energy evaluation.

The van der Waals potential in its standard 12-6 form is too sensitive and may introduce noise into the energy function. For intermolecular interactions, a softer truncated van der Waals potential is used according to:

$$E_{\mathrm{vw}} = \begin{cases} E^{0}_{\mathrm{vw}} & \text{if } E^{0}_{\mathrm{vw}} \le 0 \\ \dfrac{E^{0}_{\mathrm{vw}}\,E_{\max}}{E^{0}_{\mathrm{vw}} + E_{\max}} & \text{if } E^{0}_{\mathrm{vw}} > 0 \end{cases}$$

where $E^{0}_{\mathrm{vw}}$ denotes the unmodified 12-6 value and the $E_{\max}$ value was chosen as 1.5 kcal/mol. The total van der Waals map is composed of three maps representing hydrogen atoms, first-row atoms (C, N, O and F) and heavy atoms (P, S, Cl, Br and I).
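The effect of the truncation can be checked numerically. In the Python sketch below, `soft_vdw` implements the piecewise expression given above; the generic 12-6 function and its parameters `r_min` and `eps` are illustrative assumptions and are not ICM force-field values.

```python
def soft_vdw(e0, e_max=1.5):
    """Soft truncated van der Waals energy: attractive part unchanged,
    repulsive part smoothly capped so that it never exceeds e_max (kcal/mol)."""
    if e0 <= 0.0:
        return e0
    return e0 * e_max / (e0 + e_max)

def lennard_jones_12_6(r, r_min=4.0, eps=0.1):
    """Generic 12-6 potential with minimum -eps at r_min (illustrative parameters)."""
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)

for r in (2.0, 3.0, 4.0, 5.0):
    e0 = lennard_jones_12_6(r)
    print(f"r = {r:.1f} A   12-6 = {e0:9.3f}   soft = {soft_vdw(e0):6.3f}")
```

At short distances the standard term grows to hundreds of kcal/mol, whereas the soft form approaches 1.5 kcal/mol, so small clashes are penalized without dominating the grid energy.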

The electrostatic map is calculated using a distance-dependent dielectric constant $\varepsilon = 4r$.
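For orientation, substituting this dielectric into Coulomb's law shows how the choice changes the distance dependence. The expression below is a sketch that assumes point charges $q_i$ and $q_j$ (in units of the elementary charge) and the usual conversion factor of about 332 kcal Å/(mol e²); these symbols are introduced here for illustration only.

$$E_{\mathrm{el}} = 332\,\frac{q_i q_j}{\varepsilon\, r_{ij}} = 332\,\frac{q_i q_j}{4\, r_{ij}^{2}} \quad (\text{kcal/mol, } r_{ij} \text{ in Å})$$

The pairwise interaction therefore decays as $1/r^{2}$ rather than $1/r$, a simple way of mimicking solvent screening at longer distances.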

The hydrogen bonding potential is represented by spherical Gaussians centered at the donor or acceptor sites, according to:

$$E_{\mathrm{hb}}(\mathbf{r}) = E^{0}_{\mathrm{hb}}\, e^{-\lVert \mathbf{r} - \mathbf{r}_{\mathrm{ep}} \rVert^{2} / d_{\mathrm{hb}}^{2}}$$

where the points $\mathbf{r}_{\mathrm{ep}}$ denote the locations of the interaction centers, which are at 1.7 Å from the atom center. In the case of hydrogen atoms, the center is placed along the axis of the covalent bond attaching the hydrogen atom to the rest of the molecule. In the case of heavy sp2 atoms, one (for nitrogen) or two (for oxygen) centers are placed at an angle of 120° to the existing covalent bond. For sp3 oxygen and sulfur atoms, two centers are placed in tetrahedral geometry at 109° to the existing covalent bonds and to each other. The radius of the interaction sphere is set to $d_{\mathrm{hb}} = 1.4$ Å. The $E^{0}_{\mathrm{hb}}$ for donor and acceptor atoms is assumed to be 2.5 kcal/mol.

The hydrophobic potential at point $\mathbf{r}$ on the grid is calculated as:

$$E_{\mathrm{hp}}(\mathbf{r}) = E^{0}_{\mathrm{hp}}\, e^{-d_{\mathrm{surf}}^{2}(\mathbf{r}) / d_{\mathrm{w}}^{2}}$$

where $d_{\mathrm{surf}}(\mathbf{r})$ is the distance from $\mathbf{r}$ to the closest point on the hydrophobic surface, and $d_{\mathrm{w}}$ is the effective radius of the hydrophobic interaction, which is taken as the diameter of the water molecule, 2.8 Å. The value $E^{0}_{\mathrm{hp}} = 3$ kcal/mol was chosen to approximate the surface tension of 30 cal/(mol Å²) for extended hydrophobic surfaces in test cases.
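Both grid potentials are Gaussians with a fixed width, which a few lines of Python make concrete. This is a minimal sketch: the function names are hypothetical, the values returned are the magnitudes quoted in the text, and the sign with which they enter the docking energy is not specified in this section.

```python
import math

def gaussian_hbond(r, r_ep, e0=2.5, d_hb=1.4):
    """Hydrogen bond term at grid point r for one interaction center r_ep (kcal/mol)."""
    dist2 = sum((a - b) ** 2 for a, b in zip(r, r_ep))
    return e0 * math.exp(-dist2 / d_hb ** 2)

def gaussian_hydrophobic(d_surf, e0=3.0, d_w=2.8):
    """Hydrophobic term at a grid point lying d_surf (in A) from the hydrophobic surface."""
    return e0 * math.exp(-d_surf ** 2 / d_w ** 2)

center = (0.0, 0.0, 0.0)
for d in (0.0, 0.7, 1.4, 2.8):
    hb = gaussian_hbond((d, 0.0, 0.0), center)
    hp = gaussian_hydrophobic(d)
    print(f"d = {d:.1f} A   hbond = {hb:.3f}   hydrophobic = {hp:.3f}")
```

Each term falls to 1/e of its maximum at a distance equal to its width parameter (1.4 Å for hydrogen bonds, 2.8 Å for the hydrophobic term).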

The energy of the free ligand is minimized in the absence of the receptor prior to docking. The ligand is then docked into the grid representation of the receptor using the global energy optimization method described above. The best-energy conformation was scored using an empirical scoring function based on its fit into the binding pocket. In this way, the reported RMSD values of ligands always refer to the best-energy conformation.

The ICM scoring function consists of the following terms:35–38

$$E_{\mathrm{score}} = \Delta E_{\mathrm{IntFF}} + T\Delta S_{\mathrm{Tor}} + \alpha_{1}\Delta E_{\mathrm{HBond}} + \alpha_{2}\Delta E_{\mathrm{HBDesol}} + \alpha_{3}\Delta E_{\mathrm{SolEl}} + \alpha_{4}\Delta E_{\mathrm{HPhob}} + \alpha_{5} Q_{\mathrm{Size}}$$

where $\Delta E_{\mathrm{IntFF}}$ includes the van der Waals interaction of the ligand with the receptor as well as the internal force-field energy of the ligand; $T\Delta S_{\mathrm{Tor}}$ is the ligand conformational entropy loss contribution upon binding, which is assumed to be proportional to the number of free torsions; $\Delta E_{\mathrm{HBond}}$ is the hydrogen bonding term; $\Delta E_{\mathrm{HBDesol}}$ is the term that accounts for the disruption of hydrogen bonds with solvent upon ligand binding (desolvation of hydrogen bond donors and acceptors); $\Delta E_{\mathrm{SolEl}}$ is the solvation electrostatic energy change upon binding, calculated by solving the Poisson equation using the boundary element method as implemented in the REBEL module of ICM;54 $\Delta E_{\mathrm{HPhob}}$ is the hydrophobic free energy gain, assumed to be proportional to the accessible surface area of the hydrophobic atoms of the receptor and ligand buried upon binding. The surface tension parameter was set at 0.012 kcal/Å². $Q_{\mathrm{Size}}$ is a size correction term to avoid bias towards larger ligands. While this term has no direct physical meaning, it may account for otherwise omitted interactions of the ligand with the solvent, primarily the van der Waals dispersion interaction. The weights $\alpha_{1}$–$\alpha_{5}$ give each interaction the appropriate strength and were optimized on a diverse benchmark of complexes.35

Since many complexes present unsatisfied hydrogen bonds for the oxygen atoms of the sulfonamide moiety, no penalty for hydrogen bond desolvation was considered for this type of oxygen atom. The docking and scoring process was repeated four times, and the best score for each ligand was retained.
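The scoring expression is a weighted sum, and the repeat-and-keep-best protocol is a simple reduction over runs. The Python sketch below illustrates both under stated assumptions: the fitted weights are not given in the text, so unit placeholders are used; the names `ScoreTerms`, `icm_like_score` and `best_of_runs` are hypothetical; and a lower (more negative) score is assumed to be better.

```python
from dataclasses import dataclass

@dataclass
class ScoreTerms:
    int_ff: float     # Delta E_IntFF
    t_ds_tor: float   # T Delta S_Tor
    hbond: float      # Delta E_HBond
    hb_desol: float   # Delta E_HBDesol
    sol_el: float     # Delta E_SolEl
    hphob: float      # Delta E_HPhob
    q_size: float     # Q_Size

# Placeholder weights alpha_1..alpha_5; the optimized values are not stated here.
PLACEHOLDER_WEIGHTS = (1.0, 1.0, 1.0, 1.0, 1.0)

def icm_like_score(t, w=PLACEHOLDER_WEIGHTS):
    """Weighted sum with the same structure as the scoring expression above."""
    return (t.int_ff + t.t_ds_tor + w[0] * t.hbond + w[1] * t.hb_desol
            + w[2] * t.sol_el + w[3] * t.hphob + w[4] * t.q_size)

def best_of_runs(run_scores):
    """Keep the best score per ligand over repeated runs (assumes lower is better)."""
    return {ligand: min(scores) for ligand, scores in run_scores.items()}

example = ScoreTerms(-10.0, 2.0, -3.0, 1.0, 0.5, -4.0, 1.5)  # made-up term values
runs = {"ligand_a": [-20.1, -25.3, -18.0, -22.4]}            # four made-up runs
print(icm_like_score(example), best_of_runs(runs))
```

Repeating the stochastic docking and retaining the best score reduces the chance that a single poorly converged run decides the rank of a ligand.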

De novo receptor structure generation using IFREDA

Seeding

The ligand was placed in the LBP by overlaying its co-crystallized receptor structure with the one to be modeled. The ligand was then flipped 180° with respect to its principal axes of inertia, thus generating four starting conformations. This was undertaken to avoid any bias towards the preferred ligand conformation, though in real-life calculations the knowledge of preferred ligand–receptor interactions in PKs might help. Conformations of the ligand transversal to the PK pockets were avoided based on physical evidence of the binding mode to PKs. Starting with each of these four poses, the position of the center of mass of the ligand was displaced randomly within a sphere of 3 Å, and the orientation of the molecule in space was randomized around a random axis with an amplitude of 3 Å divided by the radius of gyration of the molecule. This procedure was repeated ten times for each starting pose, so that 40 conformations were obtained. The amplitude of 3 Å for random displacement was chosen for two reasons: to avoid generating useless conformations with the ligand outside the pocket, and because the main interest is to generate alternative interaction modes of the ligand with the neighboring residues. It should be made clear that the objective of this seeding and the subsequent minimization is merely to obtain an ensemble of different starting receptor + ligand conformations. The full global optimization that predicts the pocket conformation and the ligand pose is described below.
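The seeding geometry can be sketched in plain Python. Several simplifications are assumed here and are not taken from the text: atoms have equal masses (so the centroid stands in for the center of mass), the principal axes are supplied by the caller and default to the Cartesian axes, 3 Å is read as the radius of the displacement sphere, and the rotation amplitude is read as a maximum angle in radians. All function names are hypothetical.

```python
import math
import random

def centroid(coords):
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))

def radius_of_gyration(coords):
    c = centroid(coords)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in coords) / len(coords))

def rotate(coords, axis, angle, origin):
    """Rotate points by `angle` (radians) about unit vector `axis` through `origin`."""
    ux, uy, uz = axis
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for p in coords:
        x, y, z = p[0] - origin[0], p[1] - origin[1], p[2] - origin[2]
        k = (ux * x + uy * y + uz * z) * (1.0 - cos_a)
        cx, cy, cz = uy * z - uz * y, uz * x - ux * z, ux * y - uy * x  # axis x v
        rotated.append((origin[0] + x * cos_a + cx * sin_a + ux * k,
                        origin[1] + y * cos_a + cy * sin_a + uy * k,
                        origin[2] + z * cos_a + cz * sin_a + uz * k))
    return rotated

def random_unit_vector(rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def random_point_in_sphere(rng, radius):
    u = random_unit_vector(rng)
    r = radius * rng.random() ** (1.0 / 3.0)  # uniform in volume
    return tuple(r * c for c in u)

def seed_poses(coords, axes=((1, 0, 0), (0, 1, 0), (0, 0, 1)),
               n_per_pose=10, amplitude=3.0, seed=0):
    rng = random.Random(seed)
    c = centroid(coords)
    rg = radius_of_gyration(coords)
    # Original pose plus a 180-degree flip about each of the three axes: four starts.
    starts = [list(coords)] + [rotate(coords, ax, math.pi, c) for ax in axes]
    seeds = []
    for pose in starts:
        for _ in range(n_per_pose):
            angle = rng.uniform(-1.0, 1.0) * amplitude / rg  # radians
            moved = rotate(pose, random_unit_vector(rng), angle, c)
            shift = random_point_in_sphere(rng, amplitude)
            seeds.append([tuple(p[i] + shift[i] for i in range(3)) for p in moved])
    return seeds

ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0), (4.0, 0.0, 1.0), (2.0, 2.0, 0.0)]
seeds = seed_poses(ligand)
print(len(seeds))
```

Dividing the 3 Å amplitude by the radius of gyration means that atoms lying about one radius of gyration from the center move by at most about 3 Å under the rotation, matching the translational amplitude.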

Soft-van der Waals structure relaxation

The definition of the binding pocket is flexible and, of course, can include the whole protein. Based on observations that binding sites have regions of very high and very low stability,41 we took the following approach. Some loops (whose definition depends on the specific PK) were regarded as flexible, which means that both backbone and side-chain torsion angles were free (referred to here as Loops). The rest of the backbone atoms of the receptor were kept fixed. We then defined three layers of atoms (L1, L2 and L3) of increasing size as follows: L1, side-chains that have at least one atom in contact with any atom of the ligand within 4.5 Å; L2, L1 + side-chain atoms of Loops + residue side-chains that have at least one heavy atom in contact (within 4.5 Å) with L1 or with Loops' side-chains; L3, L2 + backbone atoms of Loops + residue side-chains that have at least one heavy atom in contact (within 4.5 Å) with Loops' backbone.
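The nested layer definitions translate directly into set operations. The Python sketch below is an illustration on made-up coordinates: the data layout (a dictionary of residues with side-chain `sc` and backbone `bb` atom lists), the residue names and the function names are assumptions, and every listed atom is treated as a heavy atom.

```python
import math

CUTOFF = 4.5  # contact distance in A

def in_contact(atoms_a, atoms_b, cutoff=CUTOFF):
    return any(math.dist(a, b) <= cutoff for a in atoms_a for b in atoms_b)

def build_layers(residues, ligand, loops):
    """Return the layers L1, L2, L3 as sets of (residue, part) pairs."""
    sc = {name: r["sc"] for name, r in residues.items()}
    # L1: side-chains in contact with the ligand.
    l1 = {(n, "sc") for n, atoms in sc.items() if in_contact(atoms, ligand)}
    # L2: L1 + loop side-chains + side-chains touching either of them.
    loop_sc = {(n, "sc") for n in loops}
    ref_atoms = [a for (n, _) in l1 | loop_sc for a in sc[n]]
    l2 = l1 | loop_sc | {(n, "sc") for n, atoms in sc.items()
                         if in_contact(atoms, ref_atoms)}
    # L3: L2 + loop backbone + side-chains touching the loop backbone.
    loop_bb_atoms = [a for n in loops for a in residues[n]["bb"]]
    l3 = l2 | {(n, "bb") for n in loops} | {(n, "sc") for n, atoms in sc.items()
                                            if in_contact(atoms, loop_bb_atoms)}
    return l1, l2, l3

# Toy system: one ligand atom at the origin and five residues A-E; C is a loop residue.
ligand = [(0.0, 0.0, 0.0)]
residues = {
    "A": {"sc": [(3.0, 0.0, 0.0)], "bb": [(3.0, 5.0, 0.0)]},
    "B": {"sc": [(7.0, 0.0, 0.0)], "bb": [(7.0, 5.0, 0.0)]},
    "C": {"sc": [(20.0, 0.0, 0.0)], "bb": [(20.0, 5.0, 0.0)]},
    "D": {"sc": [(20.0, 9.0, 0.0)], "bb": [(20.0, 14.0, 0.0)]},
    "E": {"sc": [(40.0, 0.0, 0.0)], "bb": [(40.0, 5.0, 0.0)]},
}
l1, l2, l3 = build_layers(residues, ligand, loops={"C"})
print(sorted(l1), sorted(l2), sorted(l3))
```

In this toy system A touches the ligand (L1), B touches A and C is a loop residue (both added in L2), and D touches only the loop backbone (added in L3), while E remains outside all layers.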

Regarding L3 as flexible by unfreezing its torsion angles, each of the 40 complexes was subjected to five cycles of in vacuo minimization in the internal coordinates space using a soft van der Waals potential,36 with increasing weight in each cycle. In this way, we avoid high energy gradients at the beginning of the minimization, and thus the binding site does not fall apart. Bi-quadratic distance restraints14 between the ligand heavy atoms and neighboring Cα atoms are imposed, so that a penalty is added to the energy when the ligand is displaced more than 4.5 Å from its initial position. Again, the objective of these restraints is to keep the ligand within the physical limits of the pocket, avoiding the generation of unrealistic complex conformations for the next stages.

Stochastic global energy optimization

The 40 complexes generated through seeding and in vacuo minimization were subjected to global energy optimization using the BPMC procedure, during which torsional variables associated with L1 atoms were perturbed randomly and local minimization of differentiable energy terms was performed with respect to L2. The number of energy evaluations during the BPMC and local minimizations was limited to 2,000,000 and 2000, respectively. Entropy50 and solvation energy terms were then calculated and added to the in vacuo energy to be used in the acceptance/rejection stage. The solvation energy term was based on atomic solvation parameters.55 Torsional angles were sampled between −180° and 180°, while the amplitude for ligand random displacement and rigid rotation was 3 Å and 3 Å divided by the ligand radius of gyration, respectively. The optimal temperature for the simulation was found to be 600 K, as described in Results. During each of the 40 simulations, 40 geometrically diverse low-energy states are stored in a conformational set.40 These 40 conformational sets (each with 40 low-energy conformations) are then merged and their electrostatic and solvation energy contributions re-evaluated. The electrostatic part was calculated using MMFF charges,49 and the interaction with the solvent by solving the Poisson equation using the boundary element algorithm.39 The non-polar contribution to the solvation energy was assumed to be proportional to the solvent-accessible surface. Energy values for the isolated ligand and receptor were not calculated, since we were interested only in relative values of the free energy.
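As a rough illustration of what a simulation temperature of 600 K means for the acceptance step, the Metropolis probability of accepting an uphill move can be evaluated numerically. The energy increase of 2 kcal/mol used below is an arbitrary example value, not a figure from this work.

$$p_{\mathrm{acc}} = \min\left(1,\; e^{-\Delta E / RT}\right), \qquad RT = 1.987\times10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}} \times 600\ \mathrm{K} \approx 1.19\ \mathrm{kcal/mol}$$

$$\Delta E = 2\ \mathrm{kcal/mol}: \qquad p_{\mathrm{acc}} = e^{-2/1.19} \approx 0.19 \quad \left(\text{compared with } e^{-2/0.596} \approx 0.035 \text{ at } 300\ \mathrm{K}\right)$$

The elevated temperature thus makes moderately unfavorable moves about five times more likely to be accepted than at 300 K, which helps the search to leave local minima.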

The conformational set is then clustered to eliminate redundant conformations by comparing the RMSD of the atomic coordinates of the ligand heavy atoms. The cutoff was 0.4 Å. The top-ranking conformations within 30 kcal/mol (with a minimum of 15 and a maximum of 40) were subjected to full minimization with regard to L3, and the energy was then re-evaluated as described above. The best-energy conformation was used for subsequent VS calculations.
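The redundancy filter and the selection rule can be sketched as follows. The text does not specify the clustering algorithm, so a greedy scheme that keeps the lowest-energy representative of each cluster is assumed here; the function names and the toy data are hypothetical, and coordinates are compared without superposition because all poses share the receptor frame.

```python
import math

def rmsd(a, b):
    """RMSD between two equally ordered coordinate lists (no superposition)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def remove_redundant(confs, cutoff=0.4):
    """confs: list of (energy, ligand_coords). Scan in order of increasing energy and
    keep a conformation only if it lies farther than `cutoff` from all kept ones."""
    kept = []
    for energy, coords in sorted(confs, key=lambda c: c[0]):
        if all(rmsd(coords, k[1]) > cutoff for k in kept):
            kept.append((energy, coords))
    return kept

def select_top(confs, window=30.0, n_min=15, n_max=40):
    """Conformations within `window` kcal/mol of the best one, with the count
    clamped to the range [n_min, n_max] (and to the number available)."""
    ranked = sorted(confs, key=lambda c: c[0])
    if not ranked:
        return []
    e_best = ranked[0][0]
    n_window = sum(1 for e, _ in ranked if e - e_best <= window)
    n = min(max(n_window, n_min), n_max, len(ranked))
    return ranked[:n]

toy = [(-100.0, [(0.0, 0.0, 0.0)]),
       (-99.0, [(0.1, 0.0, 0.0)]),   # within 0.4 A of the first: redundant
       (-50.0, [(5.0, 0.0, 0.0)])]
unique = remove_redundant(toy)
print([e for e, _ in unique])
```

The minimum and maximum counts keep the number of expensive full minimizations bounded regardless of how the energies happen to be distributed.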

Acknowledgements

We thank Andrew Orry & Maxim Totrov for many useful discussions.

References

1. Amzel, L. M. (1998). Structure-based drug design. Curr. Opin. Biotechnol. 9, 366–369.

2. Rosenfeld, R., Vajda, S. & DeLisi, C. (1995). Flexible docking and design. Annu. Rev. Biophys. Biomol. Struct. 24, 677–700.

3. Taylor, R. D., Jewsbury, P. J. & Essex, J. W. (2002). A review of protein-small molecule docking methods. J. Comput. Aided Mol. Des. 16, 151–166.

4. Abagyan, R. & Totrov, M. (2001). High-throughput docking for lead generation. Curr. Opin. Chem. Biol. 5, 375–382.

5. Shoichet, B. K., McGovern, S. L., Wei, B. & Irwin, J. J. (2002). Lead discovery using molecular docking. Curr. Opin. Chem. Biol. 6, 439–446.

6. Murray, C. W., Baxter, C. A. & Frenkel, A. D. (1999). The sensitivity of the results of molecular docking to induced fit effects: application to thrombin, thermolysin and neuraminidase. J. Comput. Aided Mol. Des. 13, 547–562.

7. Davis, A. M. & Teague, S. J. (1999). Hydrogen bonding, hydrophobic interactions, and failure of the rigid receptor hypothesis. Angew. Chem. Int. Ed. Engl. 38, 736–749.

8. Teague, S. J. (2003). Implications of protein flexibility for drug discovery. Nature Rev. Drug Discov. 2, 527–541.

9. Bouzida, D., Rejto, P. A., Arthurs, S., Colson, A. B., Freer, S. T., Gehlhaar, D. K. et al. (1999). Computer simulations of ligand-protein binding with ensembles of protein conformations: a Monte Carlo study of HIV-1 protease binding energy landscapes. Int. J. Quantum Chem. 72, 73–84.

10. Cheney, D. & Mueller, L. (2003). Evaluation of strategies for molecular docking. Abstr. Pap. Am. Chem. Soc. 226, 144.

11. Jiang, F. & Kim, S. H. (1991). Soft docking: matching of molecular surface cubes. J. Mol. Biol. 219, 79–102.

12. Leach, A. R. (1994). Ligand docking to proteins with discrete side-chain flexibility. J. Mol. Biol. 235, 345–356.

13. Jones, G., Willett, P. & Glen, R. C. (1995). Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. J. Mol. Biol. 245, 43–53.

14. Abagyan, R., Totrov, M. & Kuznetsov, D. (1994). ICM—a new method for protein modeling and design—applications to docking and structure prediction from the distorted native conformation. J. Comput. Chem. 15, 488–506.

15. Desmet, J., Wilson, I. A., Joniau, M., De Maeyer, M. & Lasters, I. (1997). Computation of the binding of fully flexible peptides to proteins with flexible side-chains. FASEB J. 11, 164–172.

16. Schaffer, L. & Verkhivker, G. M. (1998). Predicting structural effects in HIV-1 protease mutant complexes with flexible ligand docking and protein side-chain optimization. Proteins: Struct. Funct. Genet. 33, 295–310.

17. Sandak, B., Nussinov, R. & Wolfson, H. J. (1998). A method for biomolecular structural recognition and docking allowing conformational flexibility. J. Comput. Biol. 5, 631–654.

18. Sandak, B., Wolfson, H. J. & Nussinov, R. (1998). Flexible docking allowing induced fit in proteins: insights from an open to closed conformational isomers. Proteins: Struct. Funct. Genet. 32, 159–174.

19. Carlson, H. A. (2002). Protein flexibility is an important component of structure-based drug discovery. Curr. Pharm. Des. 8, 1571–1578.

20. Knegtel, R. M., Kuntz, I. D. & Oshiro, C. M. (1997). Molecular docking to ensembles of protein structures. J. Mol. Biol. 266, 424–440.

21. Claussen, H., Buning, C., Rarey, M. & Lengauer, T. (2001). FlexE: efficient molecular docking considering protein structure variations. J. Mol. Biol. 308, 377–395.

22. Osterberg, F., Morris, G. M., Sanner, M. F., Olson, A. J. & Goodsell, D. S. (2002). Automated docking to multiple target structures: incorporation of protein mobility and structural water heterogeneity in AutoDock. Proteins: Struct. Funct. Genet. 46, 34–40.

23. Morris, G. M., Goodsell, D. S., Huey, R. & Olson, A. J. (1996). Distributed automated docking of flexible ligands to proteins: parallel applications of AutoDock 2.4. J. Comput. Aided Mol. Des. 10, 293–304.

24. Goodsell, D. S., Morris, G. M. & Olson, A. J. (1996). Automated docking of flexible ligands: applications of AutoDock. J. Mol. Recogn. 9, 1–5.

25. Morris, G. M., Goodsell, D. S., Halliday, R. S., Huey, R., Hart, W. E., Belew, R. K. & Olson, A. J. (1998). Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem. 19, 1639–1662.

26. Frimurer, T. M., Peters, G. H., Iversen, L. F., Andersen, H. S., Moller, N. P. & Olsen, O. H. (2003). Ligand-induced conformational changes: improved predictions of ligand binding conformations and affinities. Biophys. J. 84, 2273–2281.

27. Schnecke, V. & Kuhn, L. A. (2000). Virtual screening with solvation and ligand-induced complementarity. Persp. Drug Discov. Des. 20, 171–190.

28. Broughton, H. B. (2000). A method for including protein flexibility in protein-ligand docking: improving tools for database mining and virtual screening. J. Mol. Graph. Model. 18, 247–257.

29. Carlson, H. A., Masukawa, K. M., Rubins, K., Bushman, F. D., Jorgensen, W. L., Lins, R. D. et al. (2000). Developing a dynamic pharmacophore model for HIV-1 integrase. J. Med. Chem. 43, 2100–2114.

30. Lin, J. H., Perryman, A. L., Schames, J. R. & McCammon, J. A. (2002). Computational drug design accommodating receptor flexibility: the relaxed complex scheme. J. Am. Chem. Soc. 124, 5632–5633.

31. Lin, J. H., Perryman, A. L., Schames, J. R. & McCammon, J. A. (2003). The relaxed complex method: accommodating receptor flexibility for drug design with an improved scoring scheme. Biopolymers, 68, 47–62.

32. Cavasotto, C. N., Orry, A. J. W. & Abagyan, R. (2003). Structure-based identification of binding sites, native ligands and potential inhibitors for G-protein coupled receptors. Proteins: Struct. Funct. Genet. 51, 423–433.

33. Taylor, R. D., Jewsbury, P. J. & Essex, J. W. (2003). FDS: flexible ligand and receptor docking with a continuum solvent model and soft-core energy function. J. Comput. Chem. 24, 1637–1656.

34. Dancey, J. & Sausville, E. A. (2003). Issues and progress with protein kinase inhibitors for cancer treatment. Nature Rev. Drug Discov. 2, 296–313.

35. Totrov, M. & Abagyan, R. (1999). Derivation of sensitive discrimination potential for virtual screening. RECOMB ‘99. Proceedings of the Third Annual International Conference on Computational Molecular Biology, Lyon, France, pp. 37–38, ACM Press, New York.

36. Totrov, M. & Abagyan, R. (2001). Protein–ligand docking as an energy optimization problem. In Drug-Receptor Thermodynamics: Introduction and Experimental Applications (Raffa, R. B., ed.), pp. 603–624, Wiley, New York.

37. Molsoft LLC (2003). ICM Manual 3.0, Molsoft LLC, La Jolla, CA.

38. Schapira, M., Abagyan, R. & Totrov, M. (2003). Nuclear hormone receptor targeted virtual screening. J. Med. Chem. 46, 3045–3059.

39. Totrov, M. & Abagyan, R. (1996). The contour-buildup algorithm to calculate the analytical molecular surface. J. Struct. Biol. 116, 138–143.

40. Abagyan, R. & Argos, P. (1992). Optimal protocol and trajectory visualization for conformational searches of peptides and proteins. J. Mol. Biol. 225, 519–532.

41. Luque, I. & Freire, E. (2000). Structural stability of binding sites: consequences for binding affinity and allosteric effects. Proteins: Struct. Funct. Genet., Suppl. 4, 63–71.

42. Fitzgerald, C. E., Patel, S. B., Becker, J. W., Cameron, P. M., Zaller, D., Pikounis, V. B. et al. (2003). Structural basis for p38a MAP kinase quinazolinone and pyridol-pyrimidine inhibitor specificity. Nature Struct. Biol. 10, 764–769.

43. Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F., Jr, Brice, M. D., Rodgers, J. R. et al. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.

44. Toledo, L. M. & Lydon, N. B. (1997). Structures of staurosporine bound to CDK2 and cAPK—new tools for structure-based design of protein kinase inhibitors. Structure, 5, 1551–1556.

45. Engh, R. A., Girod, A., Kinzel, V., Huber, R. & Bossemeyer, D. (1996). Crystal structures of catalytic subunit of cAMP-dependent protein kinase in complex with isoquinolinesulfonyl protein kinase inhibitors H7, H8, and H89. Structural implications for selectivity. J. Biol. Chem. 271, 26157–26164.

46. Meijer, L., Thunnissen, A. M., White, A. W., Garnier, M., Nikolic, M., Tsai, L. H. et al. (2000). Inhibition of cyclin-dependent kinases, GSK-3beta and CK1 by hymenialdisine, a marine sponge constituent. Chem. Biol. 7, 51–63.

47. Dreyer, M. K., Borcherding, D. R., Dumont, J. A., Peet, N. P., Tsay, J. T., Wright, P. S. et al. (2001). Crystal structure of human cyclin-dependent kinase 2 in complex with the adenine-derived inhibitor H717. J. Med. Chem. 44, 524–530.

48. Stelmach, J. E., Liu, L., Patel, S. B., Pivnichny, J. V., Scapin, G., Singh, S. et al. (2003). Design and synthesis of potent, orally bioavailable dihydroquinazolinone inhibitors of p38 MAP kinase. Bioorg. Med. Chem. Letters, 13, 277–280.

49. Halgren, T. (1995). Merck molecular force field I–V. J. Comput. Chem. 17, 490–641.

50. Abagyan, R. & Totrov, M. (1994). Biased probability Monte-Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 235, 983–1002.

51. Nemethy, G., Gibson, K. D., Palmer, K. A., Yoon, C. N., Paterlini, M. G., Zagari, A. et al. (1992). Energy parameters in polypeptides. 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP/3 algorithm, with application to proline-containing peptides. J. Phys. Chem. 96, 6472–6484.

52. Abagyan, R. A. & Totrov, M. (1999). Ab initio folding of peptides by the optimal-bias Monte Carlo minimization procedure. J. Comput. Phys. 151, 402–421.

53. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092.

54. Totrov, M. & Abagyan, R. (2001). Rapid boundary element solvation electrostatics calculations in folding simulations: successful folding of a 23-residue peptide. Biopolymers, 60, 124–133.

55. Wesson, L. & Eisenberg, D. (1992). Atomic solvation parameters applied to molecular dynamics of proteins in solution. Protein Sci. 1, 227–235.

Edited by J. Thornton

(Received 9 October 2003; received in revised form 26 December 2003; accepted 6 January 2004)

Supplementary Material comprising two Figures is available on Science Direct
