
JOURNAL OF COMPUTATIONAL BIOLOGY

Volume 15, Number 3, 2008

© Mary Ann Liebert, Inc.

Pp. 1–10

DOI: 10.1089/cmb.2007.0178

New Method for the Assessment of All Drug-Like Pockets Across a Structural Genome

GEORGE NICOLA, COLIN A. SMITH, and RUBEN ABAGYAN

ABSTRACT

With the increasing wealth of structural information available for human pathogens, it is now becoming possible to leverage that information to aid in rational selection of targets for inhibitor discovery. We present a methodology for assessing the drugability of all small-molecule binding pockets in a pathogen. Our approach incorporates accurate pocket identification, sequence conservation with a similar organism, sequence conservation with the host, and structure resolution. This novel method is applied to 21 structures from the malarial parasite Plasmodium falciparum. Based on our survey of the structural genome, we selected enoyl-acyl carrier protein reductase (ENR) as a promising candidate for virtual screening-based inhibitor discovery.

Key words: computational molecular biology, drug design, protein structure.

1. INTRODUCTION

IDENTIFICATION OF NOVEL drug-binding targets remains an insufficiently met need in drug discovery. Recently, structural genomics initiatives have begun to play a significant role in the rapid production of structural data. However, many of the structures determined by such initiatives remain uncharacterized (Todd et al., 2005), leading to a discrepancy between the number of proteins elucidated and the number of druggable targets available. It is clear that, as the number of available protein structures in a particular genome increases, the need for understanding and annotation of these structures also increases. In turn, this important functional information can be exploited for structure-based drug design. One application of such information is the identification of small-molecule binding sites. Traditionally, these sites have been elucidated through co-crystallization with a known ligand or through site-directed mutagenesis of putative active site residues. However, if such data are not available, then other means must be explored. An alternative method of determining binding sites is through computational structural analysis. This type of approach has recently been shown to successfully detect more than 95% of the ligand binding sites in a benchmark of over 5000 protein structures (An et al., 2005).

Once potential ligand binding sites are identified, the next question is whether small molecules bound to those sites will interfere with the function of the protein. One method for determining the functional significance of a protein region is through sequence conservation to related homologues. This method has been shown to be particularly useful in determining interaction sites on proteins (Chelliah et al., 2004).

Department of Molecular Biology, The Scripps Research Institute, La Jolla, California.


Due to the unfulfilled requirement of target discovery, we decided to approach this problem by surveying all proteins on a genome-wide scale. Here we present a methodology for the comprehensive computational evaluation of all small-molecule binding sites across a genome, and the potential to target these sites for structure-based drug design.

We apply our method to malaria, a human parasitic disease that causes 300–500 million acute infections and over 1 million deaths per year (WHO, 2005). The species Plasmodium falciparum is responsible for the majority of all malarial deaths. The drugs chloroquine and sulfadoxine currently used to treat malaria are rapidly becoming ineffective due to resistance. As such, it is important to identify novel targets in the malarial genome for rational drug design. The use of X-ray crystallography has led to structures of many P. falciparum proteins and has helped elucidate the structural basis for drug activity and resistance in this species. As a logical next step, our approach surveys the structural genome of P. falciparum for the identification of drug targets. Based on the results of this analysis, we selected enoyl-acyl carrier protein reductase, which we believe to be a prime candidate for virtual screening-based inhibitor discovery.

2. METHODS

2.1. Protein model generation

Molecular models were based on crystallographic PDB entries with released coordinates. In cases where multiple entries existed for a given protein, those with highest resolution, sequence identity closest to wild-type, and bound ligands were chosen with higher priority.

For PDB entries representing more than one instance of the biologically functional quaternary structure, the molecule with the best overall similarity to the other molecules was used. (Overall similarity was measured by summing the pair-wise RMSD with the other structures.) For PDB entries representing a subset of the functional protein, BIOMT transformations were applied to create the missing subunits. For structures without BIOMT annotation, the missing subunits were generated using crystallographic symmetry transformations and quaternary structure information from the literature. Finally, hydrogen atoms were added to the functionally relevant structure and optimized for the lowest energy orientation. Incorrectly formed histidine residues were also corrected.
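The selection of a representative molecule by summed pair-wise RMSD can be illustrated with a short Python sketch. This is not the authors' implementation: the function names are hypothetical, and the sketch assumes that all copies have already been superimposed and list the same atoms in the same order.

```python
import math


def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the two copies are already superimposed (an assumption of this sketch).
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))


def most_representative(copies):
    """Return the index of the copy with the smallest summed RMSD to all others."""
    totals = [
        sum(rmsd(copy, other) for j, other in enumerate(copies) if j != i)
        for i, copy in enumerate(copies)
    ]
    return min(range(len(copies)), key=totals.__getitem__)
```

The copy with the lowest total is the one closest, on average, to every other instance in the entry, which is one way to read "best overall similarity".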

2.2. Pocket identification

Pockets were identified using a recently published algorithm for identifying ligand-binding sites in proteins (An et al., 2005). Briefly, the algorithm is based on a van der Waals grid potential map using a carbon probe. Bracketing and successive smoothing of that map with two parameter sets produces two distinct maps. One indicates volume considered inside the molecule. The other indicates cavities of empty space. Applying a threshold to the product of those two maps creates solid geometrical objects, which represent the identified pockets. Pockets below a minimum volume are discarded. The algorithm and all pocket analysis routines made use of the ICM software package (MolSoft, 2006).
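The final steps of this procedure (thresholding the product of two maps, forming solid objects, and discarding small ones) can be sketched in Python as follows. The sketch takes the two smoothed maps as given inputs and does not reproduce the potential calculation, bracketing, or smoothing of An et al. (2005). The function name, the dictionary representation of the grid, the direction of the threshold (values above it are kept), and the use of face-connected voxels are all assumptions made for illustration.

```python
from collections import deque

# Six face-sharing neighbours of a grid point (an assumed connectivity rule).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def find_pockets(inside_map, cavity_map, threshold, min_voxels):
    """Group grid points whose map product exceeds the threshold into pockets.

    inside_map, cavity_map: dicts mapping (i, j, k) grid indices to map values.
    Returns a list of voxel sets, largest first, omitting objects that are
    smaller than min_voxels.
    """
    selected = {
        point
        for point, value in inside_map.items()
        if value * cavity_map.get(point, 0.0) > threshold
    }
    pockets, seen = [], set()
    for start in sorted(selected):
        if start in seen:
            continue
        # Breadth-first flood fill collects one connected object.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            x, y, z = queue.popleft()
            component.add((x, y, z))
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in selected and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_voxels:
            pockets.append(component)
    return sorted(pockets, key=len, reverse=True)
```

Sorting the result by size mirrors the numbering of pockets by volume described in the Results, with voxel count standing in for volume.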

Initially, only proteins were included in the calculation of the van der Waals potential map. For molecules with pockets containing prosthetic groups or cofactors, pocket identification was repeated with the inclusion of these bound molecules, and thus each complex was treated as a separate protein. Duplicate pockets within multimeric proteins were identified by clustering the pockets by residue interaction. Within a group of similar pockets, only the best scoring pocket was kept.
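One way to remove duplicate pockets is sketched below in Python. The text does not specify the clustering procedure, so the similarity measure (Jaccard overlap of chain-independent residue labels), the 0.8 cutoff, and the function names are assumptions chosen for illustration only.

```python
def jaccard(a, b):
    """Overlap of two residue sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def deduplicate(pockets, cutoff=0.8):
    """Keep only the best-scoring pocket from each group of similar pockets.

    pockets: list of dicts with keys 'name', 'residues' (a set of
    chain-independent residue labels) and 'score' (lower is better).
    The cutoff value is hypothetical.
    """
    kept = []
    # Visiting pockets from best to worst score means the first member of
    # each group to be kept is also its best-scoring member.
    for pocket in sorted(pockets, key=lambda p: p["score"]):
        if all(jaccard(pocket["residues"], k["residues"]) < cutoff for k in kept):
            kept.append(pocket)
    return kept
```

Dropping the chain identifier from each residue label is what lets equivalent pockets on different subunits of a multimer compare as identical.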

2.3. Sequence alignment and conservation

Homologous sequences were identified using the online NCBI protein-protein BLAST tool (Altschul et al., 1990). PDB sequence data was queried, using default parameters, against the non-redundant compilation of GenBank CDS translations, RefSeq proteins, PDB, SwissProt, PIR, and PRF protein sequences. Homologous sequences were selected from each Plasmodium species. If multiple sequences were returned per species, the sequence representing the greatest overall consensus was selected. The single highest scoring sequence from Homo sapiens was selected.

Sequence alignment used the Needleman and Wunsch (1970) algorithm with zero end gap penalties. For each pocket, sequence conservation statistics were gathered for nearby residues whose side chains were within 2.5 Å of the pocket. Residue conservation was defined as (number of identical residues)/(total number of nearby residues). Relative conservation was defined as (residue conservation)/(solvent accessible residue conservation). Solvent accessible residues were those with at least 25% of their surface accessible to a water probe molecule.
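The two conservation measures defined above can be written out as a short Python sketch. It assumes the pair-wise alignment has already been computed and is supplied as two equal-length strings with "-" for gaps; the function names and the representation of pocket and surface residues as alignment column indices are hypothetical.

```python
def conservation(target, homologue, positions):
    """Fraction of the given alignment columns holding identical residues."""
    positions = list(positions)
    if not positions:
        raise ValueError("no positions given")
    identical = sum(
        1 for i in positions if target[i] == homologue[i] and target[i] != "-"
    )
    return identical / len(positions)


def relative_conservation(target, homologue, pocket_positions, surface_positions):
    """Pocket conservation divided by solvent-accessible residue conservation.

    Raises ZeroDivisionError if no surface residue is conserved.
    """
    return conservation(target, homologue, pocket_positions) / conservation(
        target, homologue, surface_positions
    )
```

A relative conservation above 1 means the pocket residues are better conserved than the solvent-accessible surface as a whole, which is why the tabulated relative identities can exceed 1.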

2.4. Pocket scoring

Pockets were ranked by a scoring function with six terms, with lower scores indicating a better pocket. The general equation of the scoring function is

\[ S = -C_A - C_R + \frac{1}{1.0001 - C_H} + (V - V_0)^2 + (A - A_0)^2 + R \]

where $C_A$ is the absolute residue conservation and $C_R$ is the relative residue conservation. The absolute and relative conservation measures were calculated from pair-wise alignments with Plasmodium yoelii. $C_H$ is the absolute conservation with Homo sapiens. $V$ and $A$ are the pocket volume and surface area, respectively. Both $V_0$ and $A_0$ were set to 450, an ideal value for these variables (unpublished data). $R$ is the resolution of the crystallographic structure. The relative contributions of the terms were balanced by scaling each, so that the interquartile range (IQR) was 1 and the mean was 0. Resolution was scaled to half the weight of the other terms with an IQR of 0.5.
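A minimal Python sketch of the scoring function follows. The six terms and the target IQR values come from the text. The exact scaling procedure is not given, so the sketch makes an assumption: each term is centred on its mean over all pockets and divided by its IQR, with quartiles taken from `statistics.quantiles` at its default settings. The dictionary keys and function names are hypothetical.

```python
from statistics import mean, quantiles

V0 = A0 = 450.0  # ideal pocket volume and surface area quoted in the text


def raw_terms(pocket):
    """Six unscaled terms, signed so that lower values are better."""
    return [
        -pocket["ca"],                      # absolute conservation (P. yoelii)
        -pocket["cr"],                      # relative conservation
        1.0 / (1.0001 - pocket["ch"]),      # penalty for human conservation
        (pocket["volume"] - V0) ** 2,       # deviation from ideal volume
        (pocket["area"] - A0) ** 2,         # deviation from ideal area
        pocket["resolution"],               # crystallographic resolution
    ]


def scale(values, target_iqr=1.0):
    """Centre on the mean and rescale so the IQR equals target_iqr."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    centre = mean(values)
    if iqr == 0:
        return [0.0 for _ in values]
    return [(v - centre) * target_iqr / iqr for v in values]


def score_pockets(pockets):
    """Return one score per pocket; needs at least two pockets."""
    columns = list(zip(*(raw_terms(p) for p in pockets)))
    target_iqrs = [1.0] * 5 + [0.5]  # resolution carries half the weight
    scaled = [scale(list(col), t) for col, t in zip(columns, target_iqrs)]
    return [sum(term[i] for term in scaled) for i in range(len(pockets))]
```

Because every term is scaled against the whole population, a pocket's score depends on which other pockets are scored with it; running the sketch on a subset of pockets will not reproduce the published scores.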

3. RESULTS

3.1. Pocket identification

Of the 59 crystal structures of Plasmodium falciparum obtained from the PDB, 21 were distinct proteins (Table 1). These proteins encompass a wide variety of functions, including glycolysis, gluconeogenesis, purine salvage, vesicular traffic, and hemoglobin degradation activity.

The PocketFinder module in ICM was executed on all 21 proteins. This method identifies all cavities on the surface of a protein and numbers them based on volume, as depicted in Figure 1. However, because several proteins contained repeated subunits in the quaternary structure, many pockets were replicated 2–6 times. As such, the number of pockets per protein varied greatly, from 1 to 24. When duplicates were found, only the pocket with the best score (described below) was considered.

Structures for three proteins yielded large pockets (volume > 600 Å³) that also contained prosthetic groups or cofactors. These included enoyl-acyl carrier protein reductase (ENR), dihydrofolate reductase-thymidylate synthase (DHFR-TS), and glutathione reductase (GR). Small molecule inhibitors do not necessarily need to compete out all cofactors and in some cases may have difficulty displacing these tightly bound molecules. As such, we created a second set of proteins excluding their cofactors and performed PocketFinder analysis on both the apo- and cofactor-bound protein sets. Scoring and elimination of redundant pockets produced a list of 133 pockets.

3.2. Sequence conservation

In order to determine which of the pockets identified with the PocketFinder module are functionally relevant, we compared the same regions with another important Plasmodium species, the rodent malarial parasite P. yoelii. The recent whole genome sequencing of P. yoelii (Carlton et al., 2002) made this task possible for all 21 P. falciparum proteins. Sequence conservation between P. falciparum and P. yoelii (Table 2) was calculated only for the pocket residues, as identified with PocketFinder. The mean pair-wise sequence identity to P. yoelii for all pockets was 0.92.

We next sought a measure to identify pockets in the Plasmodium genome that, when targeted, would avoid cross-reactivity with the human host. We thus calculated sequence conservation between each of the Plasmodium pockets and their equivalent pockets in the most homologous human proteins. To avoid using homologues similar in sequence but not function, we verified that the published annotation indicated


TABLE 1. LIST OF 21 P. falciparum PROTEINS WHOSE STRUCTURES WERE ANALYZED FOR THE PRESENCE OF DRUGGABLE POCKETS

Protein | Abbreviated name | PDB ID | Resolution (Å)
Acyl-CoA binding protein | ACBP | 1hbk | 2.00
Fructose-1,6-bisphosphate aldolase | ALDO | 1a5c | 3.00
Adenylosuccinate synthetase | AdSS | 1p9b | 2.00
Cyclophilin | CyP | 1qng | 2.10
Dihydrofolate reductase-thymidylate synthase | DHFR-TS | 1j3i | 2.33
Enoyl-acyl carrier protein reductase | ENR | 1vrw | 2.35
Ferredoxin | Fd | 1iue | 1.70
Glutathione reductase | GR | 1onf | 2.60
Glutathione S-transferase | GST | 1q4j | 2.20
Hypoxanthine-guanine-xanthine phosphoribosyltransferase | HGXPRT | 1cjb | 2.00
Kinesin | KinI | 1ry6 | 1.60
L-Lactate dehydrogenase | LDH | 1t24 | 1.70
Peptide deformylase | PDF | 1rl4 | 2.18
Gamete antigen 27 | Pfg27 | 1n81 | 2.10
Phosphoglycerate kinase | PGK | 1ltk | 3.00
Protein kinase 5 | PK5 | 1v0o | 1.90
Plasmepsin II | PM2 | 1lf2 | 1.80
Purine nucleoside phosphorylase | PNP | 1nw4 | 2.20
Rab6 | Rab6 | 1d5c | 2.30
Triosephosphate isomerase | TIM | 1o5x | 1.10
Thioredoxin | Trx | 1syr | 2.95

The resolution of the structures ranged from 1.1 to 3.0 Å.

TABLE 2. TOP 20 POCKETS ACROSS THE P. falciparum PROTEOME RANKED BY DRUGABILITY SCORE

Protein | Pocket no. | Ligands | Seq. ident. | Rel. ident. | Hs ident. | Volume (Å³) | Area (Å²) | Res. (Å) | Score
PNP | 1 | Inhibitor (IMH) | 1.00 | 1.25 | 0.56 | 590 | 513 | 2.20 | -2.95
PNP | 2 | - | 1.00 | 1.25 | 0.50 | 274 | 290 | 2.20 | -2.56
LDH | 1 | - | 1.00 | 1.07 | 0.56 | 466 | 406 | 1.70 | -2.49
ENR* | 1 | Inhibitor (TCC) | 1.00 | 1.17 | 0.57 | 410 | 362 | 2.35 | -2.47
TIM | 1 | - | 0.94 | 1.17 | 0.81 | 448 | 407 | 1.10 | -2.45
DHFR-TS* | 3 | Inhibitor (WRA) | 1.00 | 1.25 | 0.79 | 432 | 375 | 2.33 | -2.35
HGXPRT | 3 | - | 1.00 | 1.25 | 0.75 | 269 | 323 | 2.00 | -2.29
PNP | 3 | - | 1.00 | 1.25 | 0.50 | 174 | 323 | 2.20 | -2.18
PM2 | 1 | Inhibitor (R37) | 0.88 | 1.28 | 0.80 | 518 | 485 | 1.80 | -2.15
AdSS | 3 | Product (GDP) | 0.95 | 1.23 | 0.82 | 396 | 442 | 2.00 | -2.06
LDH | 2 | Inhibitor (OXQ), cofactor | 1.00 | 1.07 | 0.76 | 420 | 390 | 1.70 | -1.88
KinI | 2 | - | 1.00 | 1.10 | 0.65 | 263 | 297 | 1.60 | -1.83
AdSS | 2 | - | 0.94 | 1.22 | 0.83 | 469 | 348 | 2.00 | -1.65
GR | 4 | - | 1.00 | 1.16 | 0.71 | 269 | 416 | 2.60 | -1.57
ENR* | 2 | - | 0.94 | 1.10 | 0.63 | 381 | 412 | 2.35 | -1.57
ENR* | 3 | - | 1.00 | 1.17 | 0.64 | 232 | 298 | 2.35 | -1.56
KinI | 1 | - | 1.00 | 1.10 | 0.85 | 442 | 464 | 1.60 | -1.54
ACBP | 1 | Payload (COA), payload (MYR) | 0.85 | 1.22 | 0.77 | 318 | 367 | 2.00 | -1.41
DHFR-TS* | 7 | - | 1.00 | 1.25 | 0.69 | 210 | 210 | 2.33 | -1.36
ENR | 2 | - | 0.94 | 1.10 | 0.63 | 295 | 355 | 2.35 | -1.23

Starred (*) protein names indicate pockets in which a cofactor was included in the template structure. The columns indicate abbreviated protein name, pocket number of the original protein, presence of cofactor or ligand, sequence identity (seq. ident.) between P. falciparum and P. yoelii, relative sequence identity (rel. ident.) between pocket residues and other surface residues, sequence identity to H. sapiens (Hs ident.), volume and area of each pocket, and resolution (res.) of original protein.


FIG. 1. Structure of ENR, one of 21 proteins targeted in this study. All drug-like binding sites are identified with the ICM PocketFinder module. Each of these pockets is numbered by volume and combined with all the pockets from the other proteins for final scoring.

a similar function and classification in both P. falciparum protein and H. sapiens homologue. The mean human protein sequence identity for all pockets was 0.73. It is important to note that this measurement does not express all types of cross-reactivity with the human host. In particular, it does not address non-specific reactivity or the toxicity of potential drugs designed for these pockets.

3.3. Pocket scoring and selection

We have developed a new “drugability” formula for the scoring of pockets on a genome-wide scale. The formula uses the following six terms: (1) absolute residue sequence identity, (2) sequence identity relative to the protein surface, (3) absolute sequence identity with the closest human homologue, (4) pocket volume, (5) pocket surface area, and (6) crystallographic resolution of the protein. All 133 pockets in the P. falciparum genome were ranked according to these criteria and are listed in Table 3.

We then assessed the top 20 pockets of the malarial structural genome for their suitability for drug targeting. Five of these pockets were binding sites of existing inhibitors. These included enoyl-acyl carrier protein reductase (ENR), dihydrofolate reductase-thymidylate synthase, plasmepsin II, as well as several pockets of L-lactate dehydrogenase and purine nucleoside phosphorylase. It is important to note that the scoring function specifically excludes any information about known binding sites or co-crystallized molecules. Another top-ranked pocket was that of adenylosuccinate synthetase bound to GDP. The remaining ligand-bound pockets did not appear among the top 20 ranked pockets, primarily because of human homologue similarity and unfavorable volume/surface area. All such pockets showed either human conservation greater than


TABLE 3. LIST OF ALL 133 POCKETS FROM THE 21 PROTEIN STRUCTURES RANKED BY DRUGABILITY SCORE

Protein | No. | Ligands | Seq. ident. | Rel. ident. | Hs ident. | Volume (Å³) | Area (Å²) | Res. (Å) | Score
PNP | 1 | Inhibitor (IMH) | 1.00 | 1.25 | 0.56 | 590 | 513 | 2.20 | -2.95
PNP | 2 | - | 1.00 | 1.25 | 0.50 | 274 | 290 | 2.20 | -2.56
LDH | 1 | - | 1.00 | 1.07 | 0.56 | 466 | 406 | 1.70 | -2.49
ENR* | 1 | Inhibitor (TCC) | 1.00 | 1.17 | 0.57 | 410 | 362 | 2.35 | -2.47
TIM | 1 | - | 0.94 | 1.17 | 0.81 | 448 | 407 | 1.10 | -2.45
DHFR-TS* | 3 | Inhibitor (WRA) | 1.00 | 1.25 | 0.79 | 432 | 375 | 2.33 | -2.35
HGXPRT | 3 | - | 1.00 | 1.25 | 0.75 | 269 | 323 | 2.00 | -2.29
PNP | 3 | - | 1.00 | 1.25 | 0.50 | 174 | 323 | 2.20 | -2.18
PM2 | 1 | Inhibitor (R37) | 0.88 | 1.28 | 0.80 | 518 | 485 | 1.80 | -2.15
AdSS | 3 | Product (GDP) | 0.95 | 1.23 | 0.82 | 396 | 442 | 2.00 | -2.06
LDH | 2 | Inhibitor (OXQ), cofactor | 1.00 | 1.07 | 0.76 | 420 | 390 | 1.70 | -1.88
KinI | 2 | - | 1.00 | 1.10 | 0.65 | 263 | 297 | 1.60 | -1.83
AdSS | 2 | - | 0.94 | 1.22 | 0.83 | 469 | 348 | 2.00 | -1.65
GR | 4 | - | 1.00 | 1.16 | 0.71 | 269 | 416 | 2.60 | -1.57
ENR* | 2 | - | 0.94 | 1.10 | 0.63 | 381 | 412 | 2.35 | -1.57
ENR* | 3 | - | 1.00 | 1.17 | 0.64 | 232 | 298 | 2.35 | -1.56
KinI | 1 | - | 1.00 | 1.10 | 0.85 | 442 | 464 | 1.60 | -1.54
ACBP | 1 | Payload (COA), payload (MYR) | 0.85 | 1.22 | 0.77 | 318 | 367 | 2.00 | -1.41
DHFR-TS* | 7 | - | 1.00 | 1.25 | 0.69 | 210 | 210 | 2.33 | -1.36
ENR | 2 | - | 0.94 | 1.10 | 0.63 | 295 | 355 | 2.35 | -1.23
DHFR-TS | 6 | - | 1.00 | 1.25 | 0.69 | 197 | 201 | 2.33 | -1.22
DHFR-TS* | 6 | - | 0.95 | 1.20 | 0.73 | 216 | 315 | 2.33 | -1.20
GR* | 3 | - | 0.96 | 1.12 | 0.79 | 452 | 442 | 2.60 | -1.15
AdSS | 6 | - | 1.00 | 1.29 | 0.67 | 142 | 159 | 2.00 | -1.10
GR | 3 | - | 0.96 | 1.11 | 0.75 | 374 | 356 | 2.60 | -1.06
GST | 3 | - | 1.00 | 1.17 | 0.50 | 183 | 201 | 2.20 | -1.04
ENR* | 4 | - | 1.00 | 1.17 | 0.56 | 170 | 230 | 2.35 | -0.97
AdSS | 1 | - | 0.83 | 1.08 | 0.67 | 488 | 541 | 2.00 | -0.92
PM2 | 3 | - | 0.90 | 1.32 | 0.60 | 123 | 149 | 1.80 | -0.73
DHFR-TS* | 4 | - | 0.88 | 1.10 | 0.56 | 283 | 293 | 2.33 | -0.68
LDH | 4 | - | 1.00 | 1.07 | 0.64 | 171 | 222 | 1.70 | -0.67
DHFR-TS | 4 | - | 0.88 | 1.10 | 0.56 | 263 | 279 | 2.33 | -0.53
ENR | 3 | - | 1.00 | 1.17 | 0.75 | 174 | 227 | 2.35 | -0.49
ENR | 4 | - | 1.00 | 1.17 | 0.50 | 120 | 188 | 2.35 | -0.40
PK5 | 3 | - | 1.00 | 1.10 | 0.50 | 156 | 167 | 1.90 | -0.39
CyP | 2 | - | 1.00 | 1.10 | 0.75 | 185 | 223 | 2.10 | -0.27
DHFR-TS* | 8 | - | 0.89 | 1.11 | 0.61 | 194 | 264 | 2.33 | -0.21
ENR* | 6 | - | 1.00 | 1.17 | 0.50 | 124 | 160 | 2.35 | -0.19
GR* | 2 | - | 0.96 | 1.11 | 0.75 | 518 | 712 | 2.60 | -0.19
DHFR-TS* | 10 | - | 0.95 | 1.19 | 0.65 | 149 | 183 | 2.33 | -0.15
DHFR-TS | 2 | - | 0.81 | 1.02 | 0.56 | 556 | 572 | 2.33 | -0.13
GR | 7 | - | 1.00 | 1.16 | 0.50 | 117 | 178 | 2.60 | -0.01
Trx | 1 | - | 0.91 | 1.00 | 0.64 | 316 | 374 | 2.95 | 0.00
GST | 2 | - | 0.92 | 1.07 | 0.67 | 227 | 220 | 2.20 | 0.01
DHFR-TS | 7 | - | 0.89 | 1.11 | 0.61 | 174 | 246 | 2.33 | 0.02
DHFR-TS | 9 | - | 0.95 | 1.19 | 0.65 | 134 | 168 | 2.33 | 0.08
DHFR-TS* | 1 | - | 0.81 | 1.02 | 0.56 | 599 | 599 | 2.33 | 0.11
PK5 | 2 | - | 0.94 | 1.04 | 0.83 | 281 | 266 | 1.90 | 0.13
DHFR-TS* | 5 | - | 0.93 | 1.16 | 0.86 | 244 | 281 | 2.33 | 0.14
TIM | 3 | - | 0.81 | 1.01 | 0.69 | 166 | 286 | 1.10 | 0.18
TIM | 2 | Inhibitor (2PG) | 0.92 | 1.14 | 0.92 | 345 | 375 | 1.10 | 0.31

(continued)


TABLE 3. (Continued)

Protein | No. | Ligands | Seq. ident. | Rel. ident. | Hs ident. | Volume (Å³) | Area (Å²) | Res. (Å) | Score
PGK | 3 | - | 1.00 | 1.14 | 0.85 | 217 | 313 | 3.00 | 0.32
PDF | 1 | Inhibitors (BRR, BL5) | 0.94 | 1.11 | 0.88 | 293 | 285 | 2.18 | 0.35
ENR | 1 | Inhibitor (TCC), cofactor | 0.98 | 1.15 | 0.58 | 862 | 663 | 2.35 | 0.37
Rab6 | 2 | - | 1.00 | 1.01 | 0.67 | 153 | 225 | 2.30 | 0.38
KinI | 5 | - | 1.00 | 1.10 | 0.83 | 140 | 192 | 1.60 | 0.43
PGK | 1 | - | 0.92 | 1.04 | 0.83 | 457 | 527 | 3.00 | 0.43
TIM | 4 | - | 0.90 | 1.11 | 0.80 | 115 | 184 | 1.10 | 0.49
DHFR-TS | 13 | - | 0.92 | 1.15 | 0.58 | 115 | 174 | 2.33 | 0.52
Fd | 2 | - | 1.00 | 1.08 | 0.70 | 103 | 149 | 1.70 | 0.53
ALDO | 11 | - | 0.90 | 1.01 | 0.70 | 231 | 372 | 3.00 | 0.58
GR | 6 | - | 1.00 | 1.16 | 0.67 | 123 | 139 | 2.60 | 0.58
DHFR-TS* | 2 | Substrate (UMP) | 0.98 | 1.22 | 0.93 | 471 | 528 | 2.33 | 0.66
Pfg27 | 2 | - | 0.63 | 1.01 | 0.59 | 367 | 389 | 2.10 | 0.72
DHFR-TS | 5 | - | 1.00 | 1.25 | 0.92 | 224 | 266 | 2.33 | 0.75
GR* | 5 | - | 1.00 | 1.16 | 0.80 | 140 | 153 | 2.60 | 0.91
ALDO | 13 | - | 1.00 | 1.12 | 0.83 | 207 | 202 | 3.00 | 0.96
PGK | 5 | - | 0.94 | 1.07 | 0.75 | 170 | 236 | 3.00 | 1.09
KinI | 4 | - | 0.86 | 0.94 | 0.64 | 162 | 213 | 1.60 | 1.10
ALDO | 12 | - | 1.00 | 1.12 | 0.88 | 223 | 247 | 3.00 | 1.15
CyP | 1 | Coinhibitor (CsA) | 1.00 | 1.10 | 0.92 | 271 | 288 | 2.10 | 1.22
ALDO | 8 | - | 0.80 | 0.89 | 0.60 | 359 | 364 | 3.00 | 1.23
ALDO | 14 | - | 1.00 | 1.12 | 0.83 | 160 | 195 | 3.00 | 1.31
GR | 5 | - | 1.00 | 1.16 | 0.91 | 217 | 262 | 2.60 | 1.36
LDH | 3 | - | 0.75 | 0.80 | 0.75 | 406 | 344 | 1.70 | 1.45
ALDO | 9 | - | 0.90 | 1.01 | 0.85 | 295 | 312 | 3.00 | 1.45
DHFR-TS* | 12 | - | 0.83 | 1.04 | 0.50 | 132 | 174 | 2.33 | 1.46
PGK | 8 | - | 0.92 | 1.04 | 0.67 | 148 | 193 | 3.00 | 1.54
Pfg27 | 1 | - | 0.57 | 0.92 | 0.64 | 427 | 465 | 2.10 | 1.58
PM2 | 5 | - | 0.75 | 1.10 | 0.63 | 104 | 171 | 1.80 | 1.62
KinI | 6 | - | 0.90 | 0.99 | 0.60 | 110 | 118 | 1.60 | 1.62
GST | 1 | Inhibitor (GTX) | 1.00 | 1.17 | 0.64 | 789 | 889 | 2.20 | 1.65
DHFR-TS* | 14 | - | 0.83 | 1.04 | 0.50 | 124 | 159 | 2.33 | 1.65
TIM | 5 | - | 0.83 | 1.03 | 0.75 | 112 | 128 | 1.10 | 1.66
DHFR-TS | 3 | Substrate (UMP) | 1.00 | 1.25 | 0.95 | 380 | 427 | 2.33 | 1.73
PM2 | 2 | - | 0.60 | 0.88 | 0.50 | 300 | 320 | 1.80 | 1.73
ALDO | 15 | - | 1.00 | 1.12 | 0.83 | 127 | 168 | 3.00 | 1.76
AdSS | 5 | - | 0.75 | 0.97 | 0.63 | 174 | 210 | 2.00 | 1.81
Trx | 2 | - | 1.00 | 1.10 | 0.80 | 118 | 141 | 2.95 | 1.87
Fd | 1 | - | 0.88 | 0.94 | 0.50 | 107 | 133 | 1.70 | 1.90
DHFR-TS | 1 | Inhibitor (WRA), cofactor (NDP) | 0.95 | 1.19 | 0.76 | 905 | 757 | 2.33 | 1.92
ALDO | 16 | - | 0.92 | 1.02 | 0.58 | 119 | 165 | 3.00 | 1.93
DHFR-TS* | 16 | - | 0.83 | 1.04 | 0.50 | 111 | 129 | 2.33 | 2.02
DHFR-TS | 8 | - | 0.81 | 1.02 | 0.75 | 158 | 186 | 2.33 | 2.05
DHFR-TS | 14 | - | 0.81 | 1.02 | 0.50 | 110 | 149 | 2.33 | 2.14
DHFR-TS | 12 | - | 0.80 | 1.00 | 0.50 | 118 | 162 | 2.33 | 2.15
DHFR-TS | 16 | - | 0.83 | 1.04 | 0.50 | 103 | 122 | 2.33 | 2.16
GR* | 4 | - | 0.90 | 1.04 | 0.80 | 151 | 151 | 2.60 | 2.21
DHFR-TS* | 9 | - | 0.78 | 0.97 | 0.72 | 173 | 196 | 2.33 | 2.25
PDF | 2 | - | 0.92 | 1.09 | 0.83 | 104 | 129 | 2.18 | 2.31
DHFR-TS* | 11 | - | 0.75 | 0.94 | 0.50 | 145 | 205 | 2.33 | 2.32
DHFR-TS | 10 | - | 0.75 | 0.94 | 0.50 | 129 | 191 | 2.33 | 2.54
Pfg27 | 4 | - | 0.64 | 1.04 | 0.50 | 124 | 173 | 2.10 | 2.55

(continued)


TABLE 3. (Continued)

Protein | No. | Ligands | Seq. ident. | Rel. ident. | Hs ident. | Volume (Å³) | Area (Å²) | Res. (Å) | Score
PGK | 4 | - | 0.88 | 1.00 | 0.81 | 173 | 178 | 3.00 | 2.72
GR | 2 | Cofactor (FAD) | 0.95 | 1.11 | 0.86 | 841 | 743 | 2.60 | 2.83
PK5 | 4 | - | 0.95 | 1.04 | 0.90 | 122 | 156 | 1.90 | 2.92
Pfg27 | 3 | - | 0.63 | 1.01 | 0.50 | 126 | 148 | 2.10 | 3.04
ENR | 5 | - | 0.80 | 0.94 | 0.60 | 106 | 129 | 2.35 | 3.13
ENR* | 5 | - | 0.75 | 0.88 | 0.58 | 129 | 147 | 2.35 | 3.43
ALDO | 7 | - | 1.00 | 1.12 | 0.95 | 522 | 470 | 3.00 | 3.46
ENR* | 7 | - | 0.75 | 0.88 | 0.67 | 118 | 164 | 2.35 | 3.54
Pfg27 | 5 | - | 0.64 | 1.04 | 0.79 | 115 | 148 | 2.10 | 3.60
Rab6 | 1 | Product (GDP) | 1.00 | 1.01 | 0.95 | 349 | 357 | 2.30 | 3.75
AdSS | 4 | Reaction intermediate (IMO), substrate analogue | 1.00 | 1.29 | 0.96 | 333 | 278 | 2.00 | 3.76
ALDO | 10 | - | 0.67 | 0.74 | 0.67 | 259 | 231 | 3.00 | 4.09
PGK | 2 | Substrate analogue (AMP) | 1.00 | 1.14 | 0.95 | 344 | 328 | 3.00 | 4.19
GR | 8 | - | 0.75 | 0.87 | 0.75 | 111 | 160 | 2.60 | 4.20
DHFR-TS* | 13 | - | 0.67 | 0.84 | 0.50 | 127 | 129 | 2.33 | 4.27
PGK | 6 | - | 0.81 | 0.92 | 0.88 | 155 | 201 | 3.00 | 4.28
DHFR-TS* | 15 | - | 0.67 | 0.84 | 0.67 | 121 | 157 | 2.33 | 4.35
DHFR-TS | 11 | - | 0.67 | 0.84 | 0.50 | 119 | 123 | 2.33 | 4.39
DHFR-TS | 15 | - | 0.60 | 0.75 | 0.70 | 108 | 147 | 2.33 | 5.60
Pfg27 | 7 | - | 0.50 | 0.81 | 0.50 | 102 | 117 | 2.10 | 5.61
Pfg27 | 6 | - | 0.50 | 0.81 | 0.67 | 107 | 132 | 2.10 | 5.71
PM2 | 4 | - | 0.50 | 0.73 | 0.50 | 105 | 128 | 1.80 | 5.72
GR | 1 | - | 1.00 | 1.16 | 0.69 | 1284 | 1259 | 2.60 | 15.88
HGXPRT | 1 | - | 0.77 | 0.96 | 0.63 | 1976 | 2117 | 2.00 | 68.77
GR* | 1 | - | 0.99 | 1.15 | 0.76 | 2358 | 2320 | 2.60 | 93.20
HGXPRT | 2 | Inhibitor (IRP), substrate analogue | 1.00 | 1.25 | 1.00 | 385 | 320 | 2.00 | 2889.00
PK5 | 1 | Inhibitor (INR) | 1.00 | 1.10 | 1.00 | 530 | 484 | 1.90 | 2889.00
KinI | 3 | - | 1.00 | 1.10 | 1.00 | 206 | 230 | 1.60 | 2890.00
PGK | 7 | - | 1.00 | 1.14 | 1.00 | 148 | 168 | 3.00 | 2892.00
PK5 | 5 | - | 1.00 | 1.10 | 1.00 | 100 | 118 | 1.90 | 2892.00

Starred (*) protein names indicate pockets in which a cofactor was included in the template structure. For definitions of abbreviations, see Table 2.

0.88 or volume greater than 800 Å³. The “Ligands” column of Table 2 indicates ligands that, had they been left in the template, would have intersected with the pocket.

Our selection narrowed the results to two pocket candidates: purine nucleoside phosphorylase (PNP) pocket 1, whose inhibition has been shown to prevent P. falciparum growth (Kicska et al., 2002), and ENR pocket 1, which is involved in the critical process of fatty acid biosynthesis. Due to its important role in the life cycle of malaria, and a lack of suitable therapeutics targeting it, we decided that ENR is the most amenable to virtual screening-based drug discovery.

4. DISCUSSION

There exists an unmet need to successfully identify novel targets in important pathogens. Other groups have attempted to decipher drugability of active sites by relating to sequence conservation and pocket surface area/size, with mixed results. In the present study, we describe a novel protocol that incorporates these variables in a unique formula for identifying drug-like pockets. Furthermore, we apply this method to an entire structural proteome, a database of targets larger than any group has attempted previously.

We used ICM PocketFinder (An et al., 2005), a recently developed method for detection of sites on a protein surface that are most amenable to small molecule binding. PocketFinder calculates van der Waals potentials in the vicinity of the protein surface, and identifies and sorts protein surface cavities based on their ability to bind a virtual ligand. The method is independent from any physical ligand-related information and thus allows for pocket identification in apo- as well as bound structures. The efficiency of the method makes it applicable for identification of potential drug targets by screening every structurally-determined protein in a genome.

To validate our protocol on a large scale, we applied the method to the P. falciparum structural proteome. In order to select the most biologically significant targets and sites, we introduced a novel binding pocket scoring function, which included terms for the pocket volume, size, and shape, as well as evolutionary conservation of residues lining each pocket. A high conservation between proteins in the P. falciparum genome and homologous proteins from a related species, P. yoelii, was considered an advantage in the process of pocket scoring. We reasoned that this would ensure their functional relevance as well as increase the chances that drugs developed to target these pockets will cross-react with different members of the Plasmodium genus. However, it is important to mention that this is a very limited attempt to select for the most functionally relevant proteins in a genome. For optimal evaluation, this approach needs to be coupled with other biological and functional aspects of each protein.

Another sequence analysis used in the final pocket scoring function was the comparison between P. falciparum and H. sapiens. Conservation of the residues between each putative target and homologous proteins in human could potentially lead to toxicity due to cross-reactivity, and was thus considered unfavorable. While this measure does not account for all types of potential cross-reactivity in human, it enriches the set of top-scoring targets with those that are more biologically significant.

The application of our novel method led to the identification of ENR pocket #1 as the best candidate for structure-based drug design. ENR has not been previously targeted by structure-based drug design; moreover, this is a very important enzyme in the Plasmodium life cycle. We have validated our results in a separate study by targeting ENR with virtual screening and successfully discovering novel small molecule inhibitors (Nicola et al., 2007). Based on the success of these experiments, we have confidence that the same approach can be used to identify druggable targets in other structural genomes.

ACKNOWLEDGMENTS

We thank Irina Kufareva for critical analysis of this manuscript. This work was funded in part by the NIH (grant 5R01GM071872-03 to R.A.).

DISCLOSURE STATEMENT

No competing financial interests exist.

REFERENCES

Altschul, S.F., Gish, W., Miller, W., et al. 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

An, J., Totrov, M., and Abagyan, R. 2005. Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol. Cell Proteomics 4, 752–761.

Carlton, J.M., Angiuoli, S.V., Suh, B.B., et al. 2002. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii. Nature 419, 512–519.

Chelliah, V., Chen, L., Blundell, T.L., et al. 2004. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J. Mol. Biol. 342, 1487–1504.

Kicska, G.A., Tyler, P.C., Evans, G.B., et al. 2002. Transition state analogue inhibitors of purine nucleoside phosphorylase from Plasmodium falciparum. J. Biol. Chem. 277, 3219–3225.

MolSoft. 2006. ICM. Program Manual. San Diego.

Needleman, S.B., and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.

Nicola, G., Smith, C.A., Lucumi, E., et al. 2007. Discovery of novel inhibitors targeting enoyl-acyl carrier protein reductase in Plasmodium falciparum by structure-based virtual screening. Biochem. Biophys. Res. Commun. 358, 686–691.

Todd, A.E., Marsden, R.L., Thornton, J.M., et al. 2005. Progress of structural genomics initiatives: an analysis of solved target structures. J. Mol. Biol. 348, 1235–1260.

WHO (World Health Organization). 2005. World Malaria Report 2005. In: WHO, ed. Roll Back Malaria. World Health Organization, Geneva, Switzerland.

Address reprint requests to:
Dr. Ruben Abagyan
10550 North Torrey Pines Rd., TPC-28
La Jolla, CA 92037

E-mail: [email protected]
