Connecting antimalarial data


Discussion

Christopher Southan IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb), Centre for Integrative Physiology, The University of Edinburgh, EH8 9XD, UK. http://www.slideshare.net/cdsouthan/connecting-antimalarial-data

As outlined in the introduction to the CINF Symposium, among Jean-Claude Bradley’s achievements his work on Open Notebook Science (ONS) (https://en.wikipedia.org/wiki/Open_notebook_science) has had perhaps the largest impact, and its ripple effect continues to broaden. This is particularly the case in Open Source Drug Discovery (OSDD, used here as a generic term not specific to any group), where ONS is a core enabler of the movement (PMID:23985301). This is a radical departure from what we can call Traditional Closed Drug Discovery (TCDD). While the boundaries between these camps are blurred, the use of ONS is a clear differentiator in its philosophy of real-time data surfacing (typically via an Electronic Laboratory Notebook, ELN). This means that teams can intersect with, share and optimise any chemical space, since they are no longer competitively compelled to IP-protect lead structures. The domain of small-molecule malaria treatments has become a poster child for OSDD and has also spawned the “Box” concept of physically distributable sets of active compounds.

Opening up and connecting antimalarial data: Progress with caveats

An ACS SciMix contribution from the CINF session: The Growing Impact of Openness in Chemistry: A Symposium in Honour of JC Bradley

Jean-Claude Bradley’s pioneering of ONS has the potential to shorten lead discovery and optimisation by years. Consequently it will bring more new medicines to more patients faster. This is not restricted to NTDs but is likely to be adopted by rare disease consortia. Notwithstanding, as a proportion of the current antimalarial chemical estate, the ONS contribution is small. Notably, the majority of lead SAR is still instantiated in patents and papers from the TCDD modus operandi. This is why curating leads for the PB remained the typically arduous exercise that we are used to at GtoPdb. It is also important to note that impediments to the findability and connectivity of molecular relationships in the “system” (including target and pathway mapping) remain serious concerns for malaria and other OSDD domains. In the context of drug discovery, ONS, like any other approach, has its caveats. The main one is that real-time data (hot off the instruments or just out of the fume hood) tends to be unstructured, with confirmations still pending. In this situation of “positive collaborative anarchy” across different global teams, ONS data can be difficult to find, trace to its provenance, verify, curate, standardise and mine. Of course, a similar constellation of informatics challenges also arises from TCDD but, on a good day, open SAR (e.g. deposited directly in PubChem or curated from the literature via ChEMBL and/or GtoPdb) may surface in a minable form, even if some years after the fact. Notwithstanding, the major acceleration that ONS facilitates will ensure its expansion, including into new areas of drug discovery where commercial gaps leave unmet clinical needs, as a fitting legacy of Jean-Claude Bradley’s innovation.

As context for this invited presentation, while my day job is working for the Edinburgh GtoPdb team, I have donated a small amount of voluntary support to the Sydney OSM team since 2012 (https://www.thinkable.org/submission/2136 at 1.54 on the video). This has focused mainly on chemical structure searching, data organisation and surfacing strategies. In addition, I blog occasionally on the themes of data connectivity in general and for antimalarial leads in particular. For the record, MMV have thanked me for contributing the 28 structures. By various criteria they will not all go into the PB, but I hope to find out which are included.

Figure 1. Comparing the Gene Ontology function splits between all human proteins (left, 20,198) and the GtoPdb targets with small-molecule quantitative interactions (right, 978)

Following on from the award-winning success of the Medicines for Malaria Venture (MMV) Malaria Box of 400 compounds (http://www.mmv.org/malariabox), a Pathogen Box (PB) is in preparation for a range of Neglected Tropical Diseases (NTDs) in addition to malaria (http://pathogenbox.org/). Since I had already highlighted, in several blog posts, the vicissitudes of establishing the explicit molecular identities of published malaria leads, I extended these to 28 structures for possible inclusion in the PB (http://cdsouthan.blogspot.se/2014/06/getting-into-box-with-some-recent.html), the first page of which is shown below.

The challenges of curating leads for the PB were similar to those encountered by the GtoPdb team for human targets and their ligands on a daily basis (PMID:24234439). They were in fact somewhat worse, as reflected in the statistics of the 22 PubChem CIDs linked below: http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48358242/public/. Quirks encountered are detailed in the blog post but included:

• The 6 structures not in PubChem are de facto unfindable in open databases, although some may get Google InChIKey matches via the chemicalize.org cache

• The only systematic identifier encountered was the IUPAC name, which often had to be dug out of the supplementary data, as in the blog page on the left (i.e. neither SMILES nor InChI appeared in the papers or patents)

• No authors made direct database submissions

• The code name was often not a PubChem synonym

• ChEMBL had picked up 16, with data passed on to PubChem BioAssay

• 13 had patent-extraction matches and 11 chemical vendor matches

• The MeSH annotation had only linked two directly to PMIDs
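Whether a given lead is among the structures findable in PubChem can be checked programmatically. The following is a minimal sketch, assuming the PubChem PUG REST service and its InChIKey-to-CID lookup; it is not the procedure used for the 28 structures. The function names are hypothetical, the example key is that of aspirin (used only because it is well known, not because it is one of the leads), and the "not found" payload in the usage note is an illustrative assumption about the error response.

```python
import json
from typing import List
from urllib.parse import quote

# Base address of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(inchikey: str) -> str:
    """Build the PUG REST URL that asks for the CIDs matching an InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey)}/cids/JSON"


def parse_cids(response_text: str) -> List[int]:
    """Extract CIDs from a PUG REST JSON reply.

    Returns an empty list when the reply carries no identifier list,
    which is how this sketch treats a structure absent from PubChem.
    """
    payload = json.loads(response_text)
    return payload.get("IdentifierList", {}).get("CID", [])


# Illustrative key only (aspirin); substitute the key of the lead in question.
EXAMPLE_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(cid_lookup_url(EXAMPLE_KEY))
```

Fetching the URL is deliberately left out so that the sketch runs offline. Passing a reply such as `{"IdentifierList": {"CID": [2244]}}` to `parse_cids` gives a list of CIDs, while a reply without an identifier list gives an empty list, flagging a structure that would need manual submission.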

Out of the documents and into the Box

Introduction

RESULTS (3)

Finding structures and linking data from the Sydney University OSM team and their collaborators (http://opensourcemalaria.org/) is much easier than for the PB 28. This is primarily because of their adoption of ONS, Google Docs, other surfacing routes and direct submissions to ChEMBL. This is illustrated for MMV670437 (an OSM 44 nM lead among the 28) by simply Googling the inner InChIKey layer (PMID:23399051). Matches (including the one below left), returned in 0.35 sec, include PubChem, OSM in GitHub and my blog. The PubChem SID (below right) with the MMV code is my submission.
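The InChIKey search described above can be sketched in a few lines. This assumes that the "inner layer" is the first 14-character block of a standard 27-character InChIKey (the connectivity hash), and that the search is a quoted Google query. The function names are hypothetical, and the key shown is that of aspirin, used purely as a well-known example rather than the key of MMV670437.

```python
import re
from urllib.parse import quote_plus

# A standard InChIKey has three hyphen-separated blocks of 14, 10 and 1
# upper-case letters.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def inner_layer(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    key = inchikey.strip().upper()
    if not INCHIKEY_PATTERN.match(key):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return key.split("-")[0]


def google_query_url(inchikey: str) -> str:
    """Build a Google search URL for the quoted first block of the key."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{inner_layer(inchikey)}"')


# Illustrative key only (aspirin).
print(google_query_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Searching on the first block alone should also retrieve records that share the same connectivity but differ in stereochemistry or charge, which is useful when sources disagree on how a lead was drawn.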

Further ONS utility is exemplified by the surfacing of 250 project structures in the first link below. The second link maps 167 of these into PubChem.

https://docs.google.com/spreadsheets/d/1Rvy6OiM291d1GN_cyT6eSw_C3lSuJ1jaR7AJa8hgGsc/edit#gid=510297618

http://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/48338932/public/

Connecting up with Open Source Malaria (OSM)