Program Phase in ligand-based Pharmacophore generation and 3D database searching



Simone Brogi and Andrea Tafi

Dipartimento Farmaco Chimico Tecnologico, Università degli Studi di Siena, Via Aldo Moro, I-53100 Siena, Italy

We have applied a novel approach to generate a ligand-based pharmacophore model. The pharmacophore was built from a set of 42 compounds showing activity against the MCF-7 cell line, derived from human mammary adenocarcinoma (1), using the program PHASE (2), implemented in the Schrödinger software suite. PHASE is a highly flexible system for common pharmacophore identification and assessment and for 3D-database creation and searching. The best pharmacophore hypothesis showed five features: two hydrogen-bond acceptors, one hydrogen-bond donor, and two aromatic rings. The structure-activity relationship (SAR) so acquired was applied within PHASE for molecular alignment in a comparative molecular field analysis (CoMFA) 3D-QSAR study (3). The 3D-QSAR model yielded an internal test set r² of 0.97 and proved highly predictive with respect to an external test set of 18 compounds (r² = 0.93). In summary, in this study we improved a previously developed Catalyst MCF-7 inhibitory pharmacophore (4) and established a predictive 3D-QSAR model. We have further used this model to detect novel MCF-7 cell line inhibitors through 3D database searching.

Pharmacophore generation

PHASE 2.5, implemented in the Maestro 8.0 modeling package (Schrödinger, LLC, New York, NY), was used to generate pharmacophore models for MCF-7 cell line inhibitors. Some highly active SERMs were selected for generating the pharmacophore hypotheses (Fig. 2). Pharmacophore feature sites for the best PHASE model were: two hydrogen-bond acceptors (A3, A5), one hydrogen-bond donor (D6) and two aromatic sites (R9, R10) (Fig. 1). Common pharmacophore hypotheses were identified, scored and ranked. The regression is performed by a partial least squares (PLS) method. All the molecules used for QSAR studies were aligned to the pharmacophore hypothesis obtained in PHASE.

Fig. 1 Superposition of the best PHASE model and the most active compound in the set (38). Pharmacophore features are color-coded: cyan for hydrogen-bond donor (D), pink for hydrogen-bond acceptor (A), brown rings for the aromatic features (R).

Development of a PHASE 3D-QSAR model

The PHASE-generated 3D pharmacophore was used as the alignment template for the 3D-QSAR model (Figs. 4-5) (3).

Phase determines how molecular structure affects drug activity by dividing space into a fine cubic grid, encoding atom type occupation as numerical information, and performing a partial least-squares (PLS) regression.

Acknowledgment: We are grateful to Prof. Vassilios Roussis and co-workers for the chemical entities and the biological assays.

References: (1) Soule, H. D. et al. J. Natl. Cancer Inst. 1973, 51 (5), 1409; (2) PHASE 2.5 (Schrödinger, LLC, New York, NY); (3) Dixon, S. L. et al. J. Comput. Aided Mol. Des. 2006, 20 (10-11), 647; (4) Kladi, M. et al. J. Nat. Prod. 2009, ASAP, DOI: 10.1021/np800481w; (5) PHASE user manual; (6) Walters, W. P. et al. Adv. Drug Deliv. Rev. 2002, 54, 255.

Fig. 2 SERM derivatives used in this study.

Fig. 3 Predicted versus observed inhibitory activity, pIC50 (M).

The independent variables in the QSAR model (atom type occupations encoded as numerical information) were derived from a regular grid of cubic volume elements that span the space occupied by the training set ligands, and biological activities (pIC50 values) were used as dependent variables in the partial least-squares (PLS) regression. In addition to q², the conventional correlation coefficient r² and its standard error were also computed (Table 1).
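The grid encoding and PLS regression described here can be illustrated with a minimal Python sketch. It is not Phase's implementation: the function names (`occupancy_bits`, `encode`, `pls1_fit`, `pls1_predict`), the 1 Å cell size, and the toy coordinates and activities are all assumptions made for illustration, and the regression is a textbook single-response PLS with successive deflation rather than the routine Phase actually runs.

```python
import math

def occupancy_bits(ligand, cell=1.0):
    """Return the set of (i, j, k, atom_type) grid cells occupied by a ligand.

    `ligand` is a list of (atom_type, x, y, z) tuples in the aligned frame.
    """
    return {(math.floor(x / cell), math.floor(y / cell), math.floor(z / cell), t)
            for t, x, y, z in ligand}

def encode(ligands, cell=1.0):
    """Build a 0/1 matrix: one row per ligand, one column per occupied bit."""
    bits = [occupancy_bits(lig, cell) for lig in ligands]
    columns = sorted(set().union(*bits))
    rows = [[1.0 if c in b else 0.0 for c in columns] for b in bits]
    return rows, columns

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def pls1_fit(X, y, n_components):
    """Textbook PLS1: extract latent factors by successive deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    factors = []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # no covariance left between X and y
            break
        w = [v / norm for v in w]
        t = [dot(E[i], w) for i in range(n)]                   # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors

def pls1_predict(model, x):
    """Predict activity for one encoded ligand by replaying the deflation."""
    x_mean, y_mean, factors = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat

# Toy, made-up data: three aligned "ligands" and hypothetical pIC50 values.
ligands = [
    [("A", 0.2, 0.1, 0.0), ("R", 1.5, 0.3, 0.2)],
    [("A", 0.4, 0.3, 0.1), ("D", 2.2, 0.1, 0.4)],
    [("R", 1.2, 0.6, 0.3), ("D", 2.6, 0.2, 0.1)],
]
activities = [7.0, 6.0, 5.0]

X, columns = encode(ligands)
model = pls1_fit(X, activities, n_components=2)
predicted = [pls1_predict(model, row) for row in X]
```

Each column of `X` corresponds to one (grid cell, atom type) pair, so a regression coefficient attached to that column can be read as the contribution of that atom type occupying that region of space.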

Fig. 4 3D-QSAR model around the most active compound in the set (38).

Conclusion

In our study, we built a pharmacophore model by applying the ligand-based pharmacophore generation approach, using PHASE. Different pharmacophore-based QSAR models were developed by using PLS analysis. The best resulting hypothesis consisted of five features: two hydrogen-bond acceptors, one hydrogen-bond donor and two aromatic sites. The alignment rule of the best-fit model was used to develop a ligand-based 3D-QSAR model. The established computational tool, endowed with high predictive ability and robustness, might be useful for the design and optimization of new MCF-7 cell line inhibitors.

Validation of PHASE 3D-QSAR model

18 new potential SERMs were tested against MCF-7 cells and then used as an external test set to validate the predictive ability of the PHASE 3D-QSAR model. The prediction results for this external test set are shown in Figure 3. The large value of the variance ratio (F) indicates a statistically significant regression model, which is supported by the small value of the significance level of the variance ratio (P), an indication of a high degree of confidence. The q² value suggests that the model is robust (Table 1) (5).

Therefore, the correlation between the actual and predicted activity values suggested that the PHASE 3D-QSAR model was reliable. The steric, electrostatic, and hydrogen-bond acceptor and donor field effects correlated well with the variation in activity.

3D-Database searching

3D-Database searching is a powerful tool to discover new structures and design new ligands of a biological target. In our study, the computational 3D-QSAR model developed by PHASE was used to search Asinex chemical databases (about 250,000 structurally diversified small molecules) for new chemical structures active against the MCF-7 cell line. Compounds meeting a predicted activity cutoff value of 0.5 (pIC50, µM) were selected. A further filter was applied to the entries: the compounds must satisfy Lipinski's rule of five. The query identified 19 top-ranking compounds with high predicted activity against the MCF-7 cell line. These molecules were considered likely to be well absorbed because they satisfied Lipinski's rule of five (6).

These 19 top-ranking compounds will be submitted to biological evaluation.
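The two-stage selection described in this section (a predicted-activity cutoff followed by Lipinski's rule of five) can be sketched as below. This is only an illustration under stated assumptions: the `Hit` record, the compound names and all descriptor values are hypothetical, the descriptors are taken as precomputed, the cutoff is assumed to mean "predicted value at or above 0.5", and "satisfy the rule of five" is assumed to mean zero violations. The actual search was run inside PHASE against the Asinex databases.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    predicted_pic50: float  # activity predicted by the QSAR model
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors

def lipinski_violations(hit):
    """Count how many of the four rule-of-five criteria a compound violates."""
    return sum([
        hit.mol_weight > 500,
        hit.logp > 5,
        hit.h_donors > 5,
        hit.h_acceptors > 10,
    ])

def select_hits(hits, activity_cutoff=0.5, max_violations=0):
    """Keep compounds passing both filters, best predicted activity first."""
    kept = [h for h in hits
            if h.predicted_pic50 >= activity_cutoff
            and lipinski_violations(h) <= max_violations]
    return sorted(kept, key=lambda h: h.predicted_pic50, reverse=True)

# Hypothetical database entries, for illustration only.
candidates = [
    Hit("cmpd-1", 1.2, 412.5, 3.1, 2, 6),
    Hit("cmpd-2", 0.9, 548.7, 4.2, 1, 7),  # molecular weight above 500
    Hit("cmpd-3", 0.3, 350.4, 2.0, 1, 4),  # predicted activity below cutoff
    Hit("cmpd-4", 1.6, 455.0, 4.8, 3, 9),
]
selected = select_hits(candidates)
```

Keeping `max_violations` as a parameter makes it easy to relax the filter to the common "at most one violation" reading of the rule, should that interpretation be preferred.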

Statistical parameter    Value
SD                       0.327
r²                       0.97
F                        261.4
P                        6.731e-16
RMSE                     1.384
Q²                       0.74
R                        0.81

Table 1 Statistical parameters of the PHASE 3D-QSAR model. Descriptors of the QSAR results: SD, standard deviation of the regression; r², value of r² for the regression; F, variance ratio; P, significance level of the variance ratio; RMSE, root-mean-square error; Q², value of Q² for the predicted activities; R, Pearson r value.
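For readers who want to reproduce statistics of this kind from their own observed and predicted activities, the following Python sketch computes a coefficient of determination, the RMSE and the Pearson correlation using textbook definitions. The function names and the sample numbers are assumptions for illustration; the exact formulas Phase uses (for example the reference mean in Q² or the degrees of freedom in SD and F) may differ in detail and should be checked against the PHASE user manual.

```python
import math

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of `observed`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    mo = sum(observed) / len(observed)
    mp = sum(predicted) / len(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Made-up pIC50 values, not taken from the poster.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

stats = {
    "r2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "R": pearson_r(observed, predicted),
}
```

Applied to a training set, the first function corresponds to r²; applied to external test-set predictions, the same expression gives a Q²-style value.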

Fig. 5 3D-QSAR model for an active ligand (left) and an inactive ligand (right), colored according to the sign of their coefficient values: blue for positive coefficients and red for negative coefficients. Positive coefficients indicate an increase in activity, negative coefficients a decrease.