Equilibrium fluctuations of a single folded protein reveal ... · The degree to which standard,...



Equilibrium fluctuations of a single folded protein reveal a multitude of potential cryptic allosteric sites

Gregory R. Bowman a,b,1 and Phillip L. Geissler b,c

a Departments of Molecular and Cell Biology, and b Chemistry, University of California, Berkeley, CA 94720; and c Physical Biosciences Division, Lawrence Berkeley National Lab, Berkeley, CA 94720

Edited by Michael L. Klein, Temple University, Philadelphia, PA, and approved June 6, 2012 (received for review June 1, 2012)

Cryptic allosteric sites (transient pockets in a folded protein that are invisible to conventional experiments but can alter enzymatic activity via allosteric communication with the active site) are a promising opportunity for facilitating drug design by greatly expanding the repertoire of available drug targets. Unfortunately, identifying these sites is difficult, typically requiring resource-intensive screening of large libraries of small molecules. Here, we demonstrate that Markov state models built from extensive computer simulations (totaling hundreds of microseconds of dynamics) can identify prospective cryptic sites from the equilibrium fluctuations of three medically relevant proteins (β-lactamase, interleukin-2, and RNase H), even in the absence of any ligand. As in previous studies, our methods reveal a surprising variety of conformations, including bound-like configurations, that implies a role for conformational selection in ligand binding. Moreover, our analyses lead to a number of unique insights. First, direct comparison of simulations with and without the ligand reveals that there is still an important role for an induced fit during ligand binding to cryptic sites and suggests new conformations for docking. Second, correlations between amino acid sidechains can convey allosteric signals even in the absence of substantial backbone motions. Most importantly, our extensive sampling reveals a multitude of potential cryptic sites, consisting of transient pockets coupled to the active site, even in a single protein. Based on these observations, we propose that cryptic allosteric sites may be even more ubiquitous than previously thought and that our methods should be a valuable means of guiding the search for such sites.

molecular dynamics ∣ native state dynamics

The degree to which standard, rational drug design protocols ignore protein-conformational changes is a potentially major flaw. Most assume that proteins are frozen in their crystallographic structures because we do not understand the intrinsic dynamics of proteins well enough to include them. Once this assumption is made, the only way to manipulate a protein’s activity is with inhibitors that bind more tightly to the protein’s active site than the molecule’s natural ligand. Unfortunately, only approximately 15% of proteins have a sufficiently deep pocket coinciding with their active site to make this strategy feasible (1).

Accounting for conformational heterogeneity could greatly expand the repertoire of available drug targets by revealing cryptic sites: transient pockets that are not readily visible in experimental structures, yet can inhibit enzymatic activity via allostery or block protein–protein interactions (2–5). For example, they may provide previously undescribed druggable pockets on proteins that are already considered viable targets, or even make it possible to target proteins that are currently considered undruggable. Targeting these sites may be easier than targeting the active site if there is no natural ligand to outcompete. Finally, cryptic sites open up the possibility of enhancing desirable activity (6).

As their name implies, identifying cryptic sites with conventional experimental techniques is a challenging task. Cryptic sites are not visible in ligand-free structures of proteins and often have only been discovered serendipitously when crystal structures of ligand-bound proteins reveal that the small molecule binds in a cryptic pocket. Based on this pattern, many have drawn the conclusion that cryptic ligands operate via an induced-fit mechanism, in which the ligand triggers conformational changes that open the pocket the molecule ultimately binds to. Regardless of the binding mechanism, a site-directed screening method called tethering (7) can, in principle, be used to discover cryptic sites and ligands. However, even tethering can benefit greatly from foreknowledge of the structures and locations of cryptic sites to avoid the need to exhaustively enumerate all possible pocket locations and ligands. Therefore, computational methods could prove valuable for finding new cryptic sites and guiding experiments to verify their existence.

We hypothesized that regions where transient pockets coincide with structural elements coupled to the active site are likely to be viable cryptic sites and developed computational methods for detecting these signature fluctuations. This approach was inspired by the fluctuation–dissipation theorem and an ensemble view of allostery (8, 9), which suggest that allosteric sites must be detectable in a protein’s natural fluctuations at equilibrium. Our methods draw on kinetic network models, called Markov state models (MSMs), built from extensive molecular-dynamics simulations to describe a protein’s intrinsic dynamics (10). Like a map of a molecule’s free energy landscape, an MSM provides a reduced view of the ensemble of spontaneous fluctuations the molecule undergoes at equilibrium. These models capture both thermodynamic and kinetic properties of the system being considered. Therefore, we can identify transient pockets, calculate how often they are open, and look for structural couplings between different regions of a protein with MSMs. Our approach builds on a number of existing methods for detecting cryptic pockets and allosteric sites (11–22). Important advances made here include (i) capturing events on much longer timescales (i.e., microseconds and beyond) and (ii) accounting for the requirement that residues surrounding a binding pocket are significantly (though perhaps indirectly) coupled to the active site. These methods lead to a number of unique insights into the ubiquity and mechanism of cryptic allosteric sites that are the focus of this work.

We apply these methods to three systems: TEM-1 β-lactamase, interleukin-2 (IL-2), and RNase H. We primarily focus on β-lactamase, an enzyme capable of conferring antibiotic resistance by hydrolyzing β-lactam antibiotics. Besides its relevance to antibiotic resistance, this target was chosen because it has one known cryptic site to validate our approach on. This site is invisible in the apo crystal structure (Fig. 1A) (23) and was only discovered when a small molecule thought to bind the active site turned out to bind between helices 11 and 12 (Fig. 1B) (2). We also analyze IL-2 and RNase H to assess the generality of our results from β-lactamase. These proteins have completely different folds from β-lactamase and are also medically relevant targets.

Author contributions: G.R.B. and P.L.G. designed research; G.R.B. performed research; G.R.B. contributed new reagents/analytic tools; G.R.B. analyzed data; and G.R.B. and P.L.G. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

1 To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1209309109/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1209309109 PNAS ∣ July 17, 2012 ∣ vol. 109 ∣ no. 29 ∣ 11681–11686

Using our methods, we address questions like: Are known cryptic sites detectable as transient pockets in the absence of ligands? How do small molecules bind cryptic sites, via induced fit or conformational selection? How is this information transmitted to the active site? How common are cryptic sites?

Results and Discussion

To better understand cryptic allosteric sites, we constructed MSMs of the conformational space each protein explores. MSMs serve as maps of the free energy landscapes that ultimately control a protein’s structure and dynamics (10, 24). They have been successfully employed to understand processes like protein folding (25, 26), ligand binding (27–29), and functional dynamics (30–32). One powerful feature of these models is that they can combine many simulations to capture processes occurring on much longer timescales than a single simulation could ever address. For example, they have previously been used to capture protein folding on 10 millisecond timescales based on thousands of microsecond timescale simulations run on commodity hardware (33). Moreover, one could capture even longer timescales with MSMs by simply running a larger number of simulations in parallel. This is extremely attractive given that capturing such timescales in computer simulations is an enormously difficult task.

Our models for β-lactamase reveal atomistic details of functional dynamics within the native state on timescales as slow as a millisecond (SI Text). These models are built from hundreds of atomistic molecular dynamics simulations with explicit solvent, each up to 500 ns in length, for an aggregate of over 100 μs of simulation. As discussed shortly, this extensive sampling reveals important dynamic processes that could not have been identified with the much smaller data sets often used to infer structure and function from simulation. Kinetic clustering of this data with MSMBuilder (34, 35) is used to create a partitioning of conformational space with states akin to minima in the free energy landscape and the probabilities of transitioning between them in a 2 ns time interval (Fig. S1 and Fig. S2). Using these models, we survey the variety of structures β-lactamase visits, their equilibrium probabilities, and the transition times between pairs of states. We also assess how ligands modulate the protein’s free energy landscape by comparing models in the absence or presence of the small molecule. Finally, we calculate experimental observables to validate our models and make predictions. Similar analyses are performed for IL-2 and RNase H. More details are given in SI Text.
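As an illustration of how equilibrium probabilities follow from the transition probabilities of an MSM, the minimal sketch below computes the stationary distribution of a small row-stochastic transition matrix. The three-state matrix and the function name `stationary_distribution` are hypothetical and chosen only for illustration; this is not the MSMBuilder implementation used in the paper.

```python
import numpy as np

def stationary_distribution(T):
    """Return the equilibrium populations of a row-stochastic matrix T."""
    T = np.asarray(T, dtype=float)
    # Left eigenvectors of T are right eigenvectors of its transpose.
    eigvals, eigvecs = np.linalg.eig(T.T)
    # The equilibrium distribution belongs to the eigenvalue closest to 1.
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    # Normalizing by the sum fixes both the scale and the overall sign.
    return pi / pi.sum()

# Hypothetical 3-state model: T[i, j] is the probability of moving from
# state i to state j within one lag time (2 ns in the text above).
T = np.array([
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
])
pi = stationary_distribution(T)
```

The resulting vector `pi` satisfies `pi @ T == pi` up to numerical precision, which is the defining property of the equilibrium populations.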

β-Lactamase’s known Cryptic Site Opens Transiently during the Protein’s Natural Fluctuations, but there is Still a Role for Induced Fit in Ligand Binding. It has previously been proposed that ligands bind cryptic sites via an induced-fit mechanism because these sites are invisible in ligand-free crystal structures. If this were the case, then observing cryptic sites in the absence of ligands would be impossible and designing drugs to target them would require extensive screening. However, the fluctuation–dissipation theorem, which relates fluctuations at equilibrium to the system’s response to a perturbation, mandates that cryptic sites must open in the course of equilibrium dynamics, if only transiently, even in the absence of a ligand. Identifying such transient pockets could provide a starting point for rational drug design.

By applying a pocket-detection algorithm to each of the approximately 5,000 states in our MSM for ligand-free β-lactamase, we assess how frequently the known cryptic site opens in solution. Specifically, we have employed LIGSITE (36) to identify pockets within representative structures from each state of our MSM. Discarding pockets visible in the apo crystal structure allows us to identify transient pockets that could potentially serve as cryptic allosteric sites. Similar methods have previously been used to search for small, transient pockets (13). Our MSM enhances the power of such analyses by allowing us to quickly quantify properties like the probability of a pocket being open and the timescale for opening (SI Text). For example, the probability that a pocket is open is simply the sum of the equilibrium populations of all the states in which it is open. Moreover, we scale to much larger data sets and capture motions on significantly longer timescales (microseconds and longer compared to nanoseconds). This MSM approach does not remedy systematic errors in the force field used to generate trajectories, but its applicability extends to any model potential.
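The statement that a pocket's open probability is the sum of the equilibrium populations of the states in which it is open can be sketched in a few lines. The populations and the open/closed labels below are made-up values for a hypothetical four-state model, and `pocket_open_probability` is an illustrative name rather than a function from the paper's software.

```python
import numpy as np

def pocket_open_probability(populations, is_open):
    """Sum the equilibrium populations of the states where the pocket is open."""
    populations = np.asarray(populations, dtype=float)
    is_open = np.asarray(is_open, dtype=bool)
    return float(populations[is_open].sum())

# Hypothetical equilibrium populations of four MSM states (they sum to 1).
populations = np.array([0.40, 0.25, 0.20, 0.15])
# Hypothetical output of a pocket detector run on one representative
# structure per state: True where the pocket of interest was found.
is_open = np.array([False, True, False, True])

p_open = pocket_open_probability(populations, is_open)
```

In this toy example the pocket is open in the second and fourth states, so `p_open` is the sum of their populations.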

This analysis reveals that β-lactamase’s known cryptic site does indeed open transiently in solution (Fig. 2). This pocket is at least partially open 53% of the time, making it the most accessible transient pocket in β-lactamase. The site is not visible in apo crystal structures because the closed conformation is also quite common and is likely further stabilized by crystal packing.

The observation of a transient pocket coinciding with the known cryptic site suggests a conformational selection (or population shift) binding mechanism, but there could still be a role for induced fit. As imagined in the conformational selection model, β-lactamase samples both unbound and bound conformations even in the absence of ligand. Ligands are thus able to bind to pre-existing bound conformations, causing their population to increase relative to unbound conformations. This observation, however, does not preclude a role for induced fit. A number of studies have now demonstrated an interplay between conformational selection and induced fit (27, 37–41), supporting an extended model for binding that combines the two traditional mechanisms (42).

Fig. 1. Crystal structures of TEM-1 β-lactamase (A) in the absence of ligand and (B) in the presence of an allosteric inhibitor that reveals a cryptic binding site between helices 11 and 12. The backbone is colored from blue to red starting at the N terminus, a key active site residue (Ser70) is highlighted in green, and the allosteric inhibitor is shown in cyan (with helix 12 to its left and helix 11 to its right).

Fig. 2. The three most frequently open pockets (yellow spheres), two of which coincide with the known allosteric ligand (cyan). Each sphere represents a pocket with a radius of up to 5 Å (SI Text). The spheres are overlaid on the apo crystal structure with the backbone colored from blue to red starting at the N terminus and a key active site residue (Ser70) highlighted in green.

To assess the roles of conformational selection and induced fit, we built an additional MSM for the ligand-binding process. First, we ran simulations with one of the known cryptic ligands present [N,N-bis(4-chlorobenzyl)-1H-1,2,3,4-tetraazol-5-amine, also called CBT]. We then evaluated rate constants for transitions among the same set of states determined for the ligand-free protein. Reusing the state space allows a quantitative comparison of the thermodynamics and kinetics of the two systems.

Visual inspection of binding pathways alone is insufficient to distinguish between conformational selection and induced fit. For example, Fig. 3 shows the highest flux binding pathway from a particular unbound state calculated following refs. (26) and (43). In this pathway, the cryptic ligand binds against the cryptic site, then the cryptic site opens more widely than in the holo structure while the ligand inserts into the cryptic site, and finally the protein closes around the ligand. Since the ligand binds against the protein before the protein opens, it is unclear from this depiction alone whether the ligand caused the protein to open or had to wait for the cryptic site to appear spontaneously.
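Progress along a binding pathway such as the one in Fig. 3 is quantified by the probability that a trajectory started in a given state reaches the bound state before the unbound state (the committor). The sketch below shows one standard way to obtain this quantity from a transition matrix by solving a linear system. The three-state matrix and the name `committor` are hypothetical, and the code is not taken from the paper or the cited references.

```python
import numpy as np

def committor(T, source_states, sink_states):
    """Probability of reaching the sink before the source from each state."""
    T = np.asarray(T, dtype=float)
    n_states = T.shape[0]
    source = list(source_states)
    sink = list(sink_states)
    # Intermediate states are those in neither the source nor the sink.
    inter = [i for i in range(n_states) if i not in source and i not in sink]

    q = np.zeros(n_states)
    q[sink] = 1.0  # already in the sink; the source keeps q = 0

    # For intermediates: q_i = sum_j T_ij q_j, rearranged into A q = b.
    A = np.eye(len(inter)) - T[np.ix_(inter, inter)]
    b = T[np.ix_(inter, sink)].sum(axis=1)
    q[inter] = np.linalg.solve(A, b)
    return q

# Hypothetical chain: unbound (0) <-> intermediate (1) <-> bound (2).
T_path = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
])
q = committor(T_path, source_states=[0], sink_states=[2])
```

In this symmetric toy chain the intermediate state is equally likely to proceed to either end, so its committor is one half.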

Quantitative comparison of β-lactamase’s dynamics in the absence and presence of the cryptic ligand suggests that induced fit plays an important role in ligand binding. If binding were entirely due to conformational selection, then one would expect the binding rate to be no faster than the opening rate of the cryptic site in the absence of ligands. However, we find that the cryptic site opens more rapidly in the presence of the allosteric ligand. For example, on average, it takes 16 μs to transition from the apo to the holo conformation with the ligand compared to 26 μs without the ligand. Therefore, we conclude the ligand modulates the free energy landscape in a manner that promotes its own binding by stabilizing protein conformations along the binding pathways. This finding may not surprise many experts; however, our method’s ability to reveal details of the binding mechanism (like this interplay) is an important result that may prove useful for drug design in the future. For example, the highest flux binding pathway shows that the cryptic site opens more broadly than in the bound conformation and then closes around the ligand. This broad opening could be important for providing sufficient space for the ligand to bind, in which case docking against these highly open conformations may be more fruitful for discovering new ligands than docking against bound-like conformations. Furthermore, despite an important role for induced fit, considering conformational selection alone still allows potential cryptic sites to be identified.
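Average transition times like the 16 μs and 26 μs quoted above correspond to mean first-passage times between states. As a rough sketch, and assuming a row-stochastic transition matrix with a fixed lag time, they can be obtained by solving a linear system. The matrix, the lag time, and the name `mean_first_passage_times` are hypothetical; the paper's own calculation is described in its SI Text and may differ.

```python
import numpy as np

def mean_first_passage_times(T, target_states, lag_time):
    """Mean time to first reach any target state, for every starting state."""
    T = np.asarray(T, dtype=float)
    n_states = T.shape[0]
    target = set(target_states)
    others = [i for i in range(n_states) if i not in target]

    # For non-target states: m_i = lag_time + sum_j T_ij m_j, with m = 0
    # on the target. Rearranged: (I - T_oo) m_o = lag_time * 1.
    A = np.eye(len(others)) - T[np.ix_(others, others)]
    b = np.full(len(others), lag_time)

    mfpt = np.zeros(n_states)
    mfpt[others] = np.linalg.solve(A, b)
    return mfpt

# Hypothetical chain: closed (0) <-> partially open (1) <-> open (2),
# with a lag time of 2 (in ns, matching the interval quoted earlier).
T_open = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
])
mfpt = mean_first_passage_times(T_open, target_states=[2], lag_time=2.0)
```

Running the same calculation on transition matrices estimated with and without the ligand, over the same state space, would give the kind of side-by-side comparison described in the text.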

Communities of Coupled Sidechains Link β-Lactamase’s Active and Cryptic Sites. Understanding how cryptic sites work, and ultimately predicting new ones, also requires determining how they are coupled to the active site. The ability to detect transient pockets that coincide with the known cryptic site suggests we can automatically identify such sites even when they are invisible in experimental structures. However, binding of a small molecule to a transient pocket could have no effect on activity if there is no coupling between the cryptic and active sites. Therefore, it is also important to identify regions of the protein that are somehow coupled to the active site in addition to containing transient pockets.

The allosterically inhibited structure of β-lactamase suggests that the active and cryptic sites are not coupled through the protein’s backbone, though there is evidence other systems may behave differently (44). First, the Cα rmsd between the apo and holo structures is only 0.96 Å and the Cα rmsd between active site residues [defined as Ser70 to Lys73, Ser130 to Asn132, Glu166 to Asn170, Lys234 to Ala237, and Arg244] (45) is 0.68 Å. Our simulation results also show little variability in the backbone structure. The Cα rmsd of active site residues compared to the apo structure is 0.7 ± 0.3 Å in ligand-free simulations and 0.9 ± 0.2 Å when the cryptic ligand is present, so the ligand’s effect on the backbone structure is statistically insignificant. This conclusion is also consistent with recent studies of cytochrome P-450, which show that there is little coupling between the dynamics of the active site backbone and the rest of the protein (46).

Despite the fact that the backbone is relatively static, recent studies have suggested that there can be significant heterogeneity in sidechain rotameric states even in the context of a fixed backbone structure (47, 48) and that couplings between these rotameric states can allow long-range communication (12, 15, 49). Indeed, our simulations reveal a great deal of heterogeneity in sidechain rotameric states (Fig. S3). Communication via coupling between sidechain rotameric states would also be consistent with the fact that one of the most significant differences between the apo and holo structures of β-lactamase is that the sidechain of a key active site residue, Arg244, becomes disordered in the holo structure.

Fig. 3. The highest flux binding pathway, depicted as a series of configurations exemplifying intermediate Markov states. The crystallographically determined structure of helix 11 in the holo state is superimposed in yellow, emphasizing movement in this part of the protein during binding. One interesting feature of the pathway is that the two helices surrounding the cryptic site open 2–3 Å more widely than in the holo structure (particularly in states D and E) and then close around the ligand. The backbone is colored from blue to red starting at the N terminus, and the allosteric inhibitor is shown in cyan. The number associated with each structure quantifies progress along the binding pathway. Specifically, it indicates the probability that a trajectory initiated in the corresponding state reaches the bound state (F) before first reaching the unbound state (A).

To explore the possibility that coupled sidechains are responsible for allosteric communication in β-lactamase, we used spectral clustering based on the mutual information between the rotameric states (χ1 dihedral angles) of pairs of amino acids to identify communities of coupled residues. The mutual information, defined in Eq. 1 of Materials and Methods, is a statistical measure of the interdependence between two random variables that has been used successfully in previous studies of allostery (12, 15). Examining the mutual information alone reveals extensive coupling between both nearby and distant residues (Fig. S4). Nearly equivalent results are obtained with the excess mutual information, which accounts for the spurious correlation expected from finite sampling (SI Text). Spectral clustering was chosen over other clustering methods as it provides a natural means of determining an appropriate number of clusters to construct based on the eigenvalue spectrum of the similarity matrix being clustered (SI Text and Fig. S5). Based on the gap between the fifth and sixth eigenvalues in this spectrum, we chose to construct five clusters.
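The mutual information between the rotameric states of two residues can be sketched as follows. This uses the textbook definition for discrete variables, which may differ in detail from Eq. 1 of the paper's Materials and Methods. The three 120° bins for χ1, the function names, and the sample data are assumptions made purely for illustration.

```python
import numpy as np

def rotamer_state(chi1_degrees):
    """Assign chi1 angles (degrees) to three assumed 120-degree bins."""
    wrapped = np.mod(np.asarray(chi1_degrees, dtype=float), 360.0)
    # Bins: [0, 120) -> 0, [120, 240) -> 1, [240, 360) -> 2.
    # The clip guards against rounding that lands exactly on 360.
    return np.minimum(wrapped // 120.0, 2).astype(int)

def mutual_information(x, y, n_states=3):
    """Mutual information (in nats) between two discrete state sequences."""
    joint = np.zeros((n_states, n_states))
    np.add.at(joint, (np.asarray(x), np.asarray(y)), 1.0)  # count pairs
    joint /= joint.sum()                                   # joint p(x, y)
    px = joint.sum(axis=1, keepdims=True)                  # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)                  # marginal p(y)
    nonzero = joint > 0                                    # skip 0 * log 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / (px @ py)[nonzero])))

# Hypothetical chi1 time series (degrees) for two perfectly coupled residues.
chi1_a = np.array([60.0, 180.0, -60.0, 65.0, 175.0, -55.0])
chi1_b = np.array([55.0, 170.0, -65.0, 70.0, 185.0, -70.0])
mi_coupled = mutual_information(rotamer_state(chi1_a), rotamer_state(chi1_b))
```

Evaluating this quantity for every pair of residues gives a symmetric similarity matrix, which is the kind of input a spectral clustering step operates on.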

Applying this procedure to our model for ligand-free β-lactamase reveals a potential mechanism for allosteric communication: A community of coupled residues encompassing the allosteric site and a substantial portion of the active site that may be altered upon ligand binding (Fig. 4). Specifically, one of our clusters contains all residues within 3 Å of the cryptic ligand in the holo structure and 7 out of the 15 residues in the active site with distinguishable rotameric states. Arg244, the active site residue that displays the greatest change between the apo and holo structures, is one of the active site residues in this community. Therefore, we suggest that the known cryptic site in β-lactamase is opening and closing in solution but that this has little to no effect on the protein’s activity. Once a ligand binds this site, however, it alters the rotameric states of neighboring residues that are part of a cooperative community. This change is quickly propagated to other members of this cooperative community, including Arg244 and other active site residues. Thus, binding at the allosteric site alters the active site structure and, ultimately, inhibits enzymatic activity.

Coupling between rotameric states can communicate information over large distances, as seen in studies of coupling between local folding and unfolding events (9, 50) and a simple model for sidechain variability in proteins (15). For example, Fig. 4 shows that the residues in the community encompassing the known cryptic site span a large portion of β-lactamase. While many of the residues form contiguous groups, as seen in other systems (17), there are large structural separations among others. For these discontinuous groups of residues, communication is likely achieved through an ensemble of pathways, none of which have strong enough pairwise correlations to appear in this analysis. A simple model that exhibits similar behavior despite only including local interactions supports this conclusion (15). Long-range interactions like electrostatics that are included in our model and energetic correlations that do not require a pathway may also contribute.

Our model for β-lactamase with its cryptic ligand further supports the mechanism inferred from fluctuations of the apo protein. First, the small molecule does alter the rotameric states of nearby residues (Fig. S6). The pattern of coupling between residues is also mostly conserved, demonstrating that our simulations have converged to local equilibrium within the native state. The Pearson correlation coefficient between mutual information values obtained from our models with and without the ligand is 0.55, showing that the degree of coupling differs slightly but that the catalog of strongly coupled residues is mostly conserved. Finally, this coupling alters the rotameric states of a subset of active site residues (Fig. S6).
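The comparison between the two models can be illustrated with a minimal sketch. It assumes that each model has been reduced to a symmetric residue-by-residue matrix of mutual information values and that the correlation is taken over the unique residue pairs; the function names and the choice of the upper triangle are illustrative assumptions, not a description of the analysis actually performed here.

```python
import math


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix (unique residue pairs)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mi_matrix_correlation(mi_apo, mi_holo):
    """Correlate pairwise MI values from two models (hypothetical helper)."""
    return pearson(upper_triangle(mi_apo), upper_triangle(mi_holo))


# Toy 3-residue example: the second matrix is a linear rescaling of the first.
toy_apo = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
toy_holo = [[0, 3, 5], [3, 0, 7], [5, 7, 0]]
toy_r = mi_matrix_correlation(toy_apo, toy_holo)
```

A value near 1 would indicate that the same residue pairs are strongly coupled in both models; an intermediate value indicates that the ranking of pairs is only partly shared.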

A Single Protein Contains a Multitude of Prospective Cryptic Allosteric Sites. An important next step is to investigate how common cryptic allosteric sites are. Typically, allosteric sites, and especially the cryptic variety, are assumed to be rare. However, the serendipitous discovery of the known cryptic site investigated here (2) and a variety of others (51–54) suggest that they may actually be quite common. Previous computational work focusing on transient pockets has even hinted that a single protein could conceivably contain multiple cryptic sites (13, 20), but this argument would be significantly strengthened by demonstrating that there is also coupling to the active site, since both elements are needed for a cryptic allosteric site. Finally, the mechanisms of many drugs remain unknown, so elucidating their modes of action may reveal yet more cryptic allosteric sites.

We first investigated whether β-lactamase could contain more than one cryptic site. We can rationalize the accidental discovery of the known cryptic site in terms of it being the most common transient pocket and the surrounding residues having some of the greatest coupling to the active site. Other pockets may be open less frequently and have weaker coupling to the active site, yet still serve as viable cryptic sites. Our success with identifying and understanding the known cryptic site in β-lactamase suggests that the methods we present here can predict whether there are additional sites.

Indeed, we have identified a number of other transient pockets in β-lactamase that may be viable drug targets. Some of these pockets may be detectable with techniques like NMR, which has revealed partially closed conformations with equilibrium populations as low as approximately 5% (55). Based on this precedent, we can estimate that pockets in β-lactamase that are open more than 5% of the time at equilibrium may be visible with techniques like NMR. With this assumption, there are numerous pockets that are open sufficiently often to be detectable (Fig. S7). For example, the 50 pockets that are most likely to be open are all accessible more than 10% of the time. They are also distributed across the protein (Fig. 5A), and therefore, there are numerous distinct sites. While harder to detect, many of the pockets that are open less often may still be viable drug targets. For example, a pocket that is open 1% of the time is only about 1 kcal/mol less stable than one that is open 10% of the time. This difference could easily be overcome by binding a sufficiently high-affinity ligand.
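The arithmetic behind these estimates can be sketched as follows. The sketch assumes that the open fraction of a pocket is the summed equilibrium population of the MSM states in which the pocket is detected, and that the stability difference between two pockets is approximated by RT ln(p1/p2) from the ratio of their open populations. Both function names and the toy populations are hypothetical; the exact procedure used for Fig. S7 is described in SI Text.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # gas constant R in kcal/(mol K)


def open_fraction(state_populations, pocket_open_in_state):
    """Fraction of the equilibrium ensemble in which a pocket is open.

    state_populations: equilibrium population of each MSM state.
    pocket_open_in_state: True where the pocket is detected in that state
    (an assumed per-state flag, e.g. from a pocket-detection run).
    """
    total = sum(state_populations)
    open_weight = sum(
        p for p, is_open in zip(state_populations, pocket_open_in_state) if is_open
    )
    return open_weight / total


def free_energy_gap(p_more_open, p_less_open, temperature=300.0):
    """Approximate stability difference RT*ln(p1/p2) in kcal/mol."""
    return GAS_CONSTANT_KCAL * temperature * math.log(p_more_open / p_less_open)


# Toy three-state model in which the pocket is open in states 0 and 2.
toy_fraction = open_fraction([0.5, 0.3, 0.2], [True, False, True])

# Pocket open 10% of the time versus one open 1% of the time, at 300 K.
gap = free_energy_gap(0.10, 0.01)
```

Under these assumptions the gap evaluates to roughly 1.4 kcal/mol at 300 K, the same order as the estimate quoted above and small compared with the binding free energy of a high-affinity ligand.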

Almost all of these transient pockets could exert allosteric control over activity. Four out of five of the communities of coupled residues identified here contain portions of the active site. Therefore, a small molecule that targeted any of these communities could alter the conformational ensemble in a manner that affects function. Together, these communities encompass 212 out of the 263 residues in β-lactamase, so it is not surprising that almost every transient pocket is adjacent to a residue in at least one of these communities. Interestingly, 28 of the 50 most frequently open pockets could exert allosteric control through the same community that encompasses the known cryptic site. These pockets may be particularly attractive targets because we already know this community can exert allosteric control over the protein's activity. Many of the pockets may also exert allosteric control to varying extents through multiple communities due to interactions with multiple clusters, as well as interactions among the communities. In fact, 48 out of the 50 most accessible pockets are adjacent to residues from multiple communities and, therefore, are particularly likely to exert allosteric control through multiple routes.
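The bookkeeping behind counts such as "adjacent to residues from multiple communities" can be sketched as below. The data structures are assumptions made for illustration: a mapping from each pocket to the residues lining it and a mapping from each residue to its community label. The residue numbers and labels in the toy example are invented and do not correspond to β-lactamase.

```python
def communities_per_pocket(pocket_neighbors, residue_community):
    """Map each pocket to the set of communities its adjacent residues belong to.

    pocket_neighbors: dict of pocket id -> iterable of adjacent residue ids.
    residue_community: dict of residue id -> community label; residues that
    belong to no community are simply absent from this mapping.
    """
    return {
        pocket: {residue_community[r] for r in residues if r in residue_community}
        for pocket, residues in pocket_neighbors.items()
    }


def count_multi_community_pockets(pocket_neighbors, residue_community):
    """Number of pockets adjacent to residues from more than one community."""
    touched = communities_per_pocket(pocket_neighbors, residue_community)
    return sum(1 for labels in touched.values() if len(labels) > 1)


# Invented toy data: three pockets, two communities, one unassigned residue.
toy_pockets = {"p1": [10, 11], "p2": [12], "p3": [99]}
toy_communities = {10: "A", 11: "B", 12: "A"}

toy_touched = communities_per_pocket(toy_pockets, toy_communities)
toy_count = count_multi_community_pockets(toy_pockets, toy_communities)
```

The same mapping could be filtered by a specific community label to count the pockets that border the community containing a known site.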

Fig. 4. A structure highlighting the community of coupled residues encompassing the known cryptic allosteric site. Side chains in this community are shown as sticks and are colored green if they are in the active site, cyan if they are in the allosteric site (i.e., within 3 Å of the cryptic ligand in the holo structure), and blue otherwise. The backbone is colored from blue to red starting at the N terminus.

11684 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1209309109 Bowman and Geissler


Other transient pockets actually coincide with the active site, so they could be valuable for the design of competitive inhibitors that extend into these pockets to increase the drug's affinity.

Based on these observations, a reasonable starting point for experimentally screening for cryptic sites is to target the transient pockets that are open most often and are adjacent to communities of residues with coupling to the active site. These sites should be the most accessible for ligand binding and, therefore, the most promising for rational drug design. Furthermore, the frequency with which these pockets are open in our simulations lends confidence to our model-based conclusions. Besides the known cryptic site, some of the best candidates in β-lactamase are found between Gly238 and Met272, between Gly245 and Ile279, between Ala42 and Phe66, and between Leu81 and Ala202. However, we note that our results also demonstrate an important role for induced fit, so starting with the pockets that are open most often may not be the optimal search strategy, as it assumes conformational selection is dominant. Feedback from experiments could allow us to develop a more efficient search strategy in this case.

To test the generality of our β-lactamase results, we applied our methods to two other proteins with different folds: IL-2 and RNase H. IL-2 was chosen because it plays an important role in regulating immune responses. Its relevance to human health has led to many experimental and theoretical investigations, including the discovery and characterization of a cryptic allosteric site coupled to a competitive site at IL-2's binding interface for its α-receptor (IL-2Rα). RNase H is a similarly well-studied system because an RNase H domain in HIV reverse transcriptase plays a crucial role in HIV infection. There are no known drugs targeting this domain, so the discovery of cryptic sites could render it druggable for the first time. Both proteins are also known to have conformational heterogeneity in their native states (56, 57) that could result in cryptic allosteric sites.

Our results for IL-2 are consistent with those from β-lactamase. First, there are transient pockets coinciding with the known allosteric site (Fig. 5B), amongst others. Moreover, these pockets are connected by a community of coupled residues (magenta sidechains in Fig. 5B). Coupling between these sites was also observed in a previous simulation study (12). One important addition made in this work is that our more extensive sampling allows us to find many more residues with statistically significant coupling to the competitive site, which we will treat as the active site for this system. Residues from the blue and cyan communities are also adjacent to the competitive site. Together, these communities encompass the vast majority of the protein (98 of 128 residues). Therefore, as in β-lactamase, there are many transient pockets that could serve as cryptic sites exerting allosteric control over the competitive site. Some of the best candidate cryptic allosteric sites lie roughly between Leu53 and Lys97, Met23 and Leu85, and Lys48 and Ile89.

RNase H also contains a multitude of potential cryptic sites. First, there are numerous transient pockets spread across the surface of the protein (Fig. 5C). Moreover, there are two large communities of coupled residues (blue and green residues in Fig. 5C) that contain portions of the active site. Together, these communities encompass 90 out of 155 residues, a remarkably large portion of the protein given that RNase H contains 32 Gly and Ala residues that cannot participate in sidechain coupling because they have indistinguishable rotameric states. Again, many of the transient pockets are adjacent to at least one of these communities and, therefore, could serve as cryptic allosteric sites. Some of the most promising candidates lie roughly between His62 and Gln113, Tyr28 and Leu59, and Lys86 and Leu111.

Conclusions

The primary result of this work is that single proteins contain a multitude of potential cryptic allosteric sites that could greatly expand the repertoire of available drug targets. This insight was made possible by (i) recognizing that cryptic allosteric sites require both the formation of a transient binding pocket and coupling to the active site and (ii) developing methods to detect these elements in the equilibrium fluctuations of a folded protein even in the absence of any ligand. Our methods achieve this by combining MSMs built from extensive molecular dynamics simulations with quantitative statistical measures like the mutual information. We first validated this approach on a known cryptic site in β-lactamase and then demonstrated that this single protein contains numerous other potential cryptic allosteric sites. Application of these methods to two other systems, IL-2 and RNase H, supports the generality of these conclusions. In addition to discovering that cryptic allosteric sites may be even more ubiquitous than previously thought, we also found that ligand binding to these sites occurs via a combination of conformational selection and induced fit. In particular, ligands can bind to highly open proteins and then stabilize more closed, bound conformations. Therefore, it may be fruitful to dock against these more open conformations. In addition, allosteric signals can be transmitted over large distances via coupling between amino acid sidechains even in the absence of substantial backbone motions. Based on these results, our approach should be a powerful means of providing further understanding of cryptic allosteric sites and guiding experimental efforts to identify new sites.

Materials and Methods

Molecular dynamics simulations were run at 300 K with the GROMACS software package (58) deployed on the Folding@home distributed computing environment (59). Atomic interactions were described with the Amber03 force field (60) and the TIP3P explicit solvent model. MSMBuilder was used to construct MSMs from these data (34, 35). LIGSITE (36) was used to identify pockets in representative structures for each state, and the mutual information was used to quantify correlations between the rotameric states of pairs of residues. The mutual information (MI) between two residues is

\[
\mathrm{MI}(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log\left( \frac{p(x,y)}{p(x)\,p(y)} \right) \tag{1}
\]

Fig. 5. A multitude of potential cryptic allosteric sites in (A) β-lactamase, (B) IL-2, and (C) RNase H. The 50 most frequently open pockets are shown as yellow spheres, each representing a pocket with a radius of up to 5 Å (SI Text). Sidechains within each coupled community are rendered in the same color. They are depicted as spheres if they are in the active site and as sticks otherwise. There are five communities for β-lactamase, six for IL-2, and four for RNase H. Almost every community contains at least one active site residue in each protein, so the vast majority of transient pockets could serve as cryptic sites. The light-blue, dashed circles encompass the known allosteric sites in β-lactamase and IL-2, and the backbone of each protein is colored from blue to red starting at the N terminus.

Bowman and Geissler PNAS ∣ July 17, 2012 ∣ vol. 109 ∣ no. 29 ∣ 11685


where p(x, y) is the joint probability distribution function of residues X and Y, and p(x) is the marginal probability distribution function of residue X (and likewise p(y) for residue Y). For our particular application, x denotes one potential rotameric state of residue X. Protein structures were visualized with PyMOL (61). More details are available in SI Text.
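As an illustration of Eq. 1, the following minimal sketch estimates the mutual information between two residues from paired sequences of discrete rotameric-state labels. The optional per-sample weights (for example, equilibrium populations of the MSM states from which the samples are drawn) are an assumption added for illustration, as are the function name and the toy labels; the natural logarithm is used, so the result is in nats.

```python
import math
from collections import defaultdict


def mutual_information(states_x, states_y, weights=None):
    """Estimate MI(X;Y) in nats from paired discrete rotamer labels.

    states_x, states_y: equal-length sequences of rotameric-state labels,
    one pair per sampled conformation.
    weights: optional non-negative weight per sample (assumed here to be
    something like an equilibrium population); defaults to uniform.
    """
    if len(states_x) != len(states_y):
        raise ValueError("state sequences must have equal length")
    if weights is None:
        weights = [1.0] * len(states_x)
    total = float(sum(weights))

    joint = defaultdict(float)  # p(x, y)
    p_x = defaultdict(float)    # marginal p(x)
    p_y = defaultdict(float)    # marginal p(y)
    for x, y, w in zip(states_x, states_y, weights):
        joint[(x, y)] += w / total
        p_x[x] += w / total
        p_y[y] += w / total

    # Sum p(x,y) * log[p(x,y) / (p(x) p(y))] over observed state pairs.
    return sum(
        p_xy * math.log(p_xy / (p_x[x] * p_y[y]))
        for (x, y), p_xy in joint.items()
        if p_xy > 0.0
    )


# Toy labels: perfectly coupled residues versus statistically independent ones.
mi_coupled = mutual_information(["g+", "g+", "t", "t"], ["g+", "g+", "t", "t"])
mi_independent = mutual_information(["g+", "g+", "t", "t"], ["g+", "t", "g+", "t"])
```

For two equally populated, perfectly coupled two-state residues the estimate equals ln 2, and it vanishes when the states are independent. Estimates from finite samples are biased upward, so in practice a significance test or bias correction would be needed before interpreting small values.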

ACKNOWLEDGMENTS. Thanks to Veena Thomas for helpful insights into β-lactamase. G.R.B. was funded by the Miller Institute. Computing resources were provided by users of the Folding@home distributed computing environment and National Institutes of Health Grant R01-GM062868, courtesy of Vijay Pande.

1. Hopkins AL, Groom CR (2002) The druggable genome. Nat Rev Drug Discov 1:727–730.

2. Horn JR, Shoichet BK (2004) Allosteric inhibition through core disruption. J Mol Biol 336:1283–1291.

3. Hardy JA, Wells JA (2004) Searching for new allosteric sites in enzymes. Curr Opin Struct Biol 14:706–715.

4. Ceccarelli DF, et al. (2011) An allosteric inhibitor of the human Cdc34 ubiquitin-conjugating enzyme. Cell 145:1075–1087.

5. Arkin MA, et al. (2003) Binding of small molecules to an adaptive protein-protein interface. Proc Natl Acad Sci USA 100:1603–1608.

6. Sadowsky JD, et al. (2011) Turning a protein kinase on or off from a single allosteric site via disulfide trapping. Proc Natl Acad Sci USA 108:6056–6061.

7. Erlanson DA, et al. (2000) Site-directed ligand discovery. Proc Natl Acad Sci USA 97:9367–9372.

8. Hilser VJ (2010) An ensemble view of allostery. Science 327:653–654.

9. Wrabl JO, et al. (2011) The role of protein conformational fluctuations in allostery, function, and evolution. Biophys Chem 159:129–141.

10. Bowman GR, Huang X, Pande VS (2010) Network models for molecular kinetics and their initial applications to human health. Cell Res 20:622–630.

11. Toth G, Mukhyala K, Wells JA (2007) Computational approach to site-directed ligand discovery. Proteins 68:551–560.

12. McClendon CL, Friedland G, Mobley DL, Amirkhani H, Jacobson MP (2009) Quantifying correlations between allosteric sites in thermodynamic ensembles. J Chem Theory Comput 5:2486–2502.

13. Eyrisch S, Helms V (2007) Transient pockets on protein surfaces involved in protein-protein interaction. J Med Chem 50:3457–3464.

14. Hilser VJ, Dowdy D, Oas TG, Freire E (1998) The structural distribution of cooperative interactions in proteins: Analysis of the native state ensemble. Proc Natl Acad Sci USA 95:9903–9908.

15. Dubay KH, Bothma JP, Geissler PL (2011) Long-range intra-protein communication can be transmitted by correlated side-chain fluctuations alone. PLoS Comput Biol 7:e1002168.

16. Liang MP, Banatao DR, Klein TE, Brutlag DL, Altman RB (2003) WebFEATURE: An interactive web tool for identifying and visualizing functional sites on macromolecular structures. Nucleic Acids Res 31:3324–3327.

17. Lockless SW, Ranganathan R (1999) Evolutionarily conserved pathways of energetic connectivity in protein families. Science 286:295–299.

18. Ho BK, Agard DA (2009) Probing the flexibility of large conformational changes in protein structures through local perturbations. PLoS Comput Biol 5(4):e1000343.

19. Ota N, Agard DA (2005) Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion. J Mol Biol 351:345–354.

20. Grant BJ, et al. (2011) Novel allosteric sites on Ras for lead generation. PLoS One 6:e25711.

21. Schmidtke P, Bidon-Chanal A, Luque FJ, Barril X (2011) MDpocket: Open-source cavity detection and characterization on molecular dynamics trajectories. Bioinformatics 27:3276–3285.

22. Frembgen-Kesner T, Elcock AH (2006) Computational sampling of a cryptic drug binding site in a protein receptor: Explicit solvent molecular dynamics and inhibitor docking to p38 MAP kinase. J Mol Biol 359:202–214.

23. Wang X, Minasov G, Shoichet BK (2002) Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. J Mol Biol 320:85–95.

24. Noe F, Fischer S (2008) Transition networks for modeling the kinetics of conformational change in macromolecules. Curr Opin Struct Biol 18:154–162.

25. Bowman GR, Pande VS (2010) Protein folded states are kinetic hubs. Proc Natl Acad Sci USA 107:10890–10895.

26. Noe F, Schutte C, Vanden-Eijnden E, Reich L, Weikl TR (2009) Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations. Proc Natl Acad Sci USA 106:19011–19016.

27. Silva DA, Bowman GR, Sosa-Peinado A, Huang X (2011) A role for both conformational selection and induced fit in ligand binding by the LAO protein. PLoS Comput Biol 7:e1002054.

28. Buch I, Giorgino T, De Fabritiis G (2011) Complete reconstruction of an enzyme-inhibitor binding process by molecular dynamics simulations. Proc Natl Acad Sci USA 108:10184–10189.

29. Held M, Metzner P, Prinz JH, Noe F (2011) Mechanisms of protein-ligand association and its modulation by protein mutations. Biophys J 100:701–710.

30. Morcos F, et al. (2010) Modeling conformational ensembles of slow functional motions in Pin1-WW. PLoS Comput Biol 6:e1001015.

31. Yang S, Banavali NK, Roux B (2009) Mapping the conformational transition in Src activation by cumulating the information from multiple molecular dynamics trajectories. Proc Natl Acad Sci USA 106:3776–3781.

32. Lane TJ, Bowman GR, Beauchamp K, Voelz VA, Pande VS (2011) Markov state model reveals folding and functional dynamics in ultra-long MD trajectories. J Am Chem Soc 133:18413–18419.

33. Bowman GR, Voelz VA, Pande VS (2011) Atomistic folding simulations of the five helix bundle protein λ6–85. J Am Chem Soc 133:664–667.

34. Bowman GR, Huang X, Pande VS (2009) Using generalized ensemble simulations and Markov state models to identify conformational states. Methods 49:197–201.

35. Beauchamp KA, et al. (2011) MSMBuilder2: Modeling conformational dynamics on the picosecond to millisecond scale. J Chem Theory Comput 7:3412–3419.

36. Hendlich M, Rippmann F, Barnickel G (1997) LIGSITE: Automatic and efficient detection of potential small molecule-binding sites in proteins. J Mol Graph Modell 15:359–363, 389.

37. Wlodarski T, Zagrovic B (2009) Conformational selection and induced fit mechanism underlie specificity in noncovalent interactions with ubiquitin. Proc Natl Acad Sci USA 106:19346–19351.

38. Bucher D, Grant BJ, McCammon JA (2011) Induced fit or conformational selection? The role of the semi-closed state in the maltose binding protein. Biochemistry 50:10530–10539.

39. Hammes GG, Chang YC, Oas TG (2009) Conformational selection or induced fit: A flux description of reaction mechanism. Proc Natl Acad Sci USA 106:13737–13741.

40. Lau AY, Roux B (2011) The hidden energetics of ligand binding and activation in a glutamate receptor. Nat Struct Mol Biol 18:283–287.

41. D’Abramo M, Rabal O, Oyarzabal J, Gervasio FL (2012) Conformational selection versus induced fit in kinases: The case of PI3K-gamma. Angew Chem Int Ed Engl 51:642–646.

42. Csermely P, Palotai R, Nussinov R (2010) Induced fit, conformational selection and independent dynamic segments: An extended view of binding events. Trends Biochem Sci 35:539–546.

43. Berezhkovskii A, Hummer G, Szabo A (2009) Reactive flux and folding pathways in network models of coarse-grained protein dynamics. J Chem Phys 130:205102.

44. Young MA, Gonfloni S, Superti-Furga G, Roux B, Kuriyan J (2001) Dynamic coupling between the SH2 and SH3 domains of c-Src and Hck underlies their inactivation by C-terminal tyrosine phosphorylation. Cell 105:115–126.

45. Strynadka NC, et al. (1992) Molecular structure of the acyl-enzyme intermediate in beta-lactam hydrolysis at 1.7 Å resolution. Nature 359:700–705.

46. Brandman R, Lampe JN, Brandman Y, de Montellano PR (2011) Active-site residues move independently from the rest of the protein in a 200 ns molecular dynamics simulation of cytochrome P450 CYP119. Arch Biochem Biophys 509:127–132.

47. DuBay KH, Geissler PL (2009) Calculation of proteins’ total side-chain torsional entropy and its influence on protein-ligand interactions. J Mol Biol 391:484–497.

48. Lang PT, et al. (2010) Automated electron-density sampling reveals widespread conformational polymorphism in proteins. Protein Sci 19:1420–1431.

49. Lenaerts T, et al. (2008) Quantifying information transfer by protein domains: Analysis of the Fyn SH2 domain structure. BMC Struct Biol 8:43.

50. Pan H, Lee JC, Hilser VJ (2000) Binding sites in Escherichia coli dihydrofolate reductase communicate by modulating the conformational ensemble. Proc Natl Acad Sci USA 97:12020–12025.

51. Schames JR, et al. (2004) Discovery of a novel binding trench in HIV integrase. J Med Chem 47:1879–1881.

52. Grimsley JK, Calamini B, Wild JR, Mesecar AD (2005) Structural and mutational studies of organophosphorus hydrolase reveal a cryptic and functional allosteric-binding site. Arch Biochem Biophys 442:169–179.

53. Lundqvist T, et al. (2007) Exploitation of structural and regulatory diversity in glutamate racemases. Nature 447:817–822.

54. Gee CL, et al. (2007) Enzyme adaptation to inhibitor binding: A cryptic binding site in phenylethanolamine N-methyltransferase. J Med Chem 50:4845–4853.

55. Tang C, Schwieters CD, Clore GM (2007) Open-to-closed transition in apo maltose-binding protein observed by paramagnetic NMR. Nature 449:1078–1082.

56. Bernstein R, Schmidt KL, Harbury PB, Marqusee S (2011) Structural and kinetic mapping of side-chain exposure onto the protein energy landscape. Proc Natl Acad Sci USA 108:10532–10537.

57. Levin AM, et al. (2012) Exploiting a natural conformational switch to engineer an interleukin-2 ‘superkine’. Nature 484:529–533.

58. Hess B, Kutzner C, van der Spoel D, Lindahl E (2008) GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput 4:435–447.

59. Shirts M, Pande VS (2000) COMPUTING: Screen savers of the world unite! Science 290:1903–1904.

60. Wang JM, Cieplak P, Kollman PA (2000) How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem 21:1049–1074.

61. DeLano WL (2002) The PyMOL Molecular Graphics System, Version 1.5.0.3 Schrödinger, LLC.
