LigandScout: Automated Pharmacophore Model Creation


G. Wolber and T. Langer

Institute of Pharmacy, Dpt. of Pharmaceutical Chemistry, University of Innsbruck, Innrain 52, A 6020 Innsbruck, Austria

Deviations from the ideal geometric conformation can be shown in a geometric uncertainty histogram (see Fig. 2), which allows the user to estimate the possible error rate of the automated interpretation of bond characteristics.
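The idea of measuring deviations from ideal geometry can be sketched as follows. This is only an illustration under stated assumptions, not the poster's algorithm: the hybridization state is guessed from the mean angle over all neighbour pairs of an atom, compared with textbook ideal angles (109.5° for sp3, 120° for sp2, 180° for sp). The function names and the binning are hypothetical.

```python
import math
from itertools import combinations

# Textbook ideal bond angles in degrees (an assumption of this sketch).
IDEAL_ANGLES = {"sp3": 109.5, "sp2": 120.0, "sp": 180.0}

def angle_deg(center, a, b):
    """Angle a-center-b in degrees for 3D points given as (x, y, z) tuples."""
    va = [p - c for p, c in zip(a, center)]
    vb = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def guess_hybridization(center, neighbours):
    """Return (label, deviation in degrees) using all neighbour pairs of one atom."""
    angles = [angle_deg(center, a, b) for a, b in combinations(neighbours, 2)]
    if not angles:
        raise ValueError("need at least two neighbours")
    mean = sum(angles) / len(angles)
    label = min(IDEAL_ANGLES, key=lambda k: abs(IDEAL_ANGLES[k] - mean))
    return label, abs(IDEAL_ANGLES[label] - mean)

def deviation_histogram(deviations, bin_width=1.0, n_bins=11):
    """Percentage of atoms per deviation bin; the last bin collects overflow."""
    counts = [0] * n_bins
    for d in deviations:
        counts[min(int(d // bin_width), n_bins - 1)] += 1
    total = len(deviations)
    return [100.0 * c / total for c in counts] if total else counts
```

Collecting the deviation returned by `guess_hybridization` for every ligand atom and passing the list to `deviation_histogram` gives a distribution of the kind plotted in Fig. 2, although the deviation measure used on the poster may differ.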

2. Pharmacophore Creation

After ligand interpretation, all pairs of ligand and receptor atoms whose mutual distance is smaller than 5 Å are investigated. If the ligand atom is part of a chemical pattern forming one of the pharmacophoric chemical features and additionally faces an atom of a complementary chemical feature on the receptor side, it is selected to be part of the new pharmacophore (see Fig. 1 for an example).
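The selection step described above can be sketched as a distance-filtered search for complementary partners. This is a minimal sketch under assumptions: the feature labels, the `COMPLEMENT` table, and the `FeatureAtom` type are hypothetical. The directional "facing" test is reduced to a plain distance check, and chemical pattern recognition is assumed to have already assigned a feature label to each atom.

```python
import math
from dataclasses import dataclass

# Complementary feature types; the labels are illustrative assumptions.
COMPLEMENT = {
    "donor": "acceptor",
    "acceptor": "donor",
    "positive": "negative",
    "negative": "positive",
    "lipophilic": "lipophilic",
}

@dataclass(frozen=True)
class FeatureAtom:
    name: str       # atom label, e.g. "O1"
    feature: str    # one of the keys of COMPLEMENT
    xyz: tuple      # (x, y, z) coordinates in Å

def select_features(ligand, receptor, cutoff=5.0):
    """Return (ligand atom, receptor atom) pairs with complementary
    features that are closer than `cutoff` Å."""
    selected = []
    for lig in ligand:
        for rec in receptor:
            if (COMPLEMENT.get(lig.feature) == rec.feature
                    and math.dist(lig.xyz, rec.xyz) < cutoff):
                selected.append((lig, rec))
                break  # one complementary partner is enough in this sketch
    return selected
```

A ligand atom without any complementary receptor atom inside the cutoff is simply not returned, which mirrors the idea that only features supported by the binding site enter the pharmacophore.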

Results

From 20,622 PDB entries, 42,471 ligand conformations of 3,387 different ligands were extracted in about eight hours on five standard PCs running Linux. For 3,111 ligands, relevant surrounding amino acids could be isolated in order to form pharmacophores. A graphical user interface was created to visualize and interactively investigate the created models.

Conclusion and Perspectives

An algorithm for automatically creating pharmacophore models suitable for virtual screening was presented. Owing to progress in bioinformatics and the human genome project, the number of new complex submissions to the PDB grows every year. Automated methods for interpreting PDB data already play an important role. Filtering techniques, such as the one presented, will be essential to manage the huge amount of data available.

Combinatorial chemistry and virtual libraries have become a significant factor in drug discovery, which has led to an increasing number of compounds available for testing. The presented technology allows the creation of very selective, high-quality scoring functions to pre-categorize and rank these compounds. It could even be used to screen a single compound against several targets in order to retrieve information on adverse effects.

References

[1] Wermuth, C.-G.; Langer, T. Pharmacophore Identification, H. Kubinyi, Ed., Escom: Leiden, 1998, pp. 117-136.

[2] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne. The Protein Data Bank. Nucleic Acids Research, 28:235-242, 2000.

[3] R. Sayle. Perception of molecular connectivity from 3D coordinates, 2001. Bioinformatics Group, Metaphorics LLC, Santa Fe, New Mexico.

[4] M. Hendlich, F. Rippmann, and G. Barnickel. BALI: Automatic assignment of bond and atom types for protein ligands in the Brookhaven Protein Databank. J. Chem. Inf. Comput. Sci., 37:774-778, 1997.

[5] G. Wolber and T. Langer. CombiGen: A novel software package for the rapid generation of virtual combinatorial libraries. In H.-D. Höltje and W. Sippl, editors, Rational Approaches to Drug Design, pp. 390-399. Prous Science, 2000.

Introduction

Due to evolving technologies in bioinformatics and combinatorial chemistry, the number of known targets as well as the size of available compound libraries is growing rapidly. Virtual screening using pharmacophores has proved to be a major technique for pre-screening compounds from libraries against targets in silico [1].

Aim

This work presents an algorithm for automatically extracting pharmacophores, without manual interaction, from a PDB complex file containing a protein complexed with a small organic ligand. The algorithm is to be implemented using the ilib framework, which was previously used to implement CombiGen [5], a tool for rapidly generating diverse, drug-like libraries. Subsequently, it is to be applied to all relevant entries of the Brookhaven Protein Databank in order to generate a collection of pharmacophore models suitable for virtual screening.

Methods and Data Sources

Pharmacophore Models

A pharmacophore is the three-dimensional arrangement of chemical features that causes activation or inhibition of the receptor. For the presented algorithm, chemical features include hydrogen-bond donors and acceptors as directed vectors, and positive and negative ionizable areas as well as lipophilic areas represented by spheres. In order to increase selectivity, excluded volume spheres are added to reflect potential steric restrictions.
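The feature types listed above map naturally onto a small data structure. The following is a hedged sketch, not the ilib or LigandScout representation: all class and field names are hypothetical. Treating the ionizable areas as spheres is one reading of the description, and the radius is a free parameter.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class VectorFeature:
    kind: str        # "donor" or "acceptor"
    origin: Vec3     # position of the ligand atom
    direction: Vec3  # vector pointing towards the receptor partner

@dataclass
class SphereFeature:
    kind: str        # "positive", "negative", "lipophilic" or "excluded"
    center: Vec3
    radius: float    # tolerance radius in Å

@dataclass
class Pharmacophore:
    vectors: List[VectorFeature] = field(default_factory=list)
    spheres: List[SphereFeature] = field(default_factory=list)

    def excluded_volumes(self) -> List[SphereFeature]:
        """Spheres that a screened compound must not occupy."""
        return [s for s in self.spheres if s.kind == "excluded"]

    def clashes(self, point: Vec3) -> bool:
        """True if `point` lies inside any excluded volume sphere."""
        return any(math.dist(point, s.center) < s.radius
                   for s in self.excluded_volumes())
```

Keeping excluded volumes in the same list as the other spheres keeps the model flat. A screening routine could call `clashes` for each atom of a candidate compound to reject poses that violate the steric restrictions.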

The Brookhaven Protein Databank

The Brookhaven Protein Databank (PDB) [2], currently operated by the American Research Collaboratory for Structural Bioinformatics, contains approx. 15,000 proteins, peptides or viruses whose three-dimensional structures were determined via X-ray crystallography. About 3,000 co-crystallized ligands in different conformations are contained in these complexes. They form the starting point for an extensive investigation of the chemical features relevant for drug interaction and, as a next step, pharmacophore creation.

Application Flow

1. Ligand Extraction and Interpretation

In a distributed computing environment (5-node Linux cluster), more than 20,000 PDB entries were analyzed in order to find all ligands in X-ray protein complexes. All relevant ligands were extracted from the proteins, interpreted, and stored together with the relevant surrounding amino acids. For historical reasons, experimental limitations, and file format constraints, neither bond type information nor hydrogens are included in PDB ligand information. Hybridization state information therefore had to be elucidated from the geometrical positions of the ligand atoms. In contrast to existing approaches [3,4], not only bond angles between three atoms were considered, but all neighbour atoms of each atom were taken into account, resulting in a quantitative geometry model.
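As a rough illustration of the extraction step, ligand atoms can be read from the `HETATM` records of a PDB file, which use a fixed-column layout. This sketch makes several assumptions and is not the poster's implementation: the function name and the `SKIP_RESIDUES` filter are hypothetical, and only water is filtered out. Selecting the surrounding amino acids and deciding which hetero groups count as relevant ligands are left out.

```python
from collections import defaultdict

# Hetero groups to ignore. Only water is listed here; a realistic filter
# would also exclude ions, buffers and similar non-ligand groups.
SKIP_RESIDUES = {"HOH"}

def extract_ligands(pdb_text):
    """Group HETATM atoms by (residue name, chain ID, residue number).

    Column slices follow the fixed-width PDB coordinate record layout:
    atom name 13-16, residue name 18-20, chain 22, residue number 23-26,
    x 31-38, y 39-46, z 47-54 (1-based columns).
    """
    ligands = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name in SKIP_RESIDUES:
            continue
        key = (res_name, line[21], int(line[22:26]))
        atom = (line[12:16].strip(),
                float(line[30:38]), float(line[38:46]), float(line[46:54]))
        ligands[key].append(atom)
    return dict(ligands)
```

Because these records carry only element names and coordinates, the result contains no bond orders or hydrogens, which is why a geometric interpretation step is needed afterwards.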

[Plot: Distribution of uncertainty in hybridization state interpretation. x-axis: Deviation (0 to 10); y-axis: Percentage of atoms (0 to 100).]

Fig. 3: Interactive graphical user interface showing Cerivastatin in PDB entry 1HWJ and the generated pharmacophore (hydrogen bond acceptors green, hydrogen bond donors pink, lipophilic yellow, excluded volume gray, neg. ionizable blue)

Fig. 2: Uncertainty distribution corresponding to ligand interpretation from entry NPE (1AJ7)

Fig. 1: Example of a pharmacophore model (left) including relevant chemical features derived from ligand 4-(N-Acetylamino)-3-[N-(2-Ethylbutanoylamino)]benzoic acid in PDB entry 1B9S (right)