Online Resources to Support Open Drug Discovery Systems

Posted on 10-May-2015


Description

This is a presentation given at the Opal Events meeting "Drug Discovery Partnerships: Filling the Pipeline". I was speaking in a session with Jean-Claude Bradley on "Pre-competitive Collaboration: Sharing Data to Increase Predictability". The presentation discussed some of the work we are doing on Open PHACTS. My thanks especially to Carole Goble, Lee Harland and Sean Ekins for their comments.

Transcript of Online Resources to Support Open Drug Discovery Systems

Online Resources to Support Open Drug Discovery Systems

Antony Williams, 3rd Annual Drug Discovery Partnership: Filling the Pipeline, October 2011

Open Drug Discovery

Pharma Companies spend >$50 billion annually on R&D

How much historical data/knowledge/information is in the public domain? And where is it?

How much generated data is truly competitive?

Pre-competitive and public domain data could deliver high value to drug discovery:
– Data mining
– Model-building
– Integrating into in-house and online systems

Internal and external content:
– Built to meet primary use-case
– Tailored indexes and GUIs
– Internal unique language & metadata
– Poor interoperability/integration
– PowerPoint, documents, Excel
– Many suppliers of systems and content in a single workflow

Literature, Patents, News, Pipeline, SAR, CSRs, Safety, In vivo, etc.

Pharma Information Tombs

What could create change?

Harvard Business Review (2010)

“One change would make a substantial difference [to drug R&D]: the creation of agreed-upon standards for digitally representing drug assets.”

It is so difficult to navigate…

– What’s the structure?
– Are they in our file?
– What’s similar?
– What’s the target?
– Pharmacology data?
– Known pathways?
– Working on now?
– Connections to disease?
– Expressed in right cell type?
– Competitors?
– IP?

Where is chemistry online?
– Encyclopedic articles (Wikipedia)
– Chemical vendor databases
– Metabolic pathway databases
– Property databases
– Patents with chemical structures
– Drug discovery data
– Scientific publications
– Compound aggregators
– Blogs/wikis and Open Notebook Science

PubChem

ChEMBL

ChemSpider

SciDBs Wiki

Pharma are accessing, processing, storing & re-processing

[Diagram: Literature, PubChem, Genbank, Patents, Databases, Downloads; Data Integration, Data Analysis; Firewalled Databases]

Public Domain Drug Discovery Data

New trend: Set Data Free on the Web

Open Algorithms, Descriptors and Closed Data – Can We Unlock It?

The Innovative Medicines Initiative: EC-funded public-private partnership for pharmaceutical research

Focus on key problems:
– Efficacy
– Safety
– Education & Training
– Knowledge Management

Open PHACTS Project
– Develop a set of robust standards…
– Implement the standards in a semantic integration hub
– Deliver services to support drug discovery programs in pharma and public domain
– 22 partners, 8 pharmaceutical companies, 3 biotechs
– 36-month project

Guiding principle is open access, open usage, open source: key to standards adoption


Open PHACTS Project Partners

Example research questions:
– Give all compounds with IC50 < xxx for target Y in species W and Z, plus assay data
– What substructures are associated with readout X (target, pathway, disease, …)?
– Give all experimental and clinical data for compound X
– Give all targets for compound X or a compound with a similarity > y%
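The first research question is, at its core, a filter over activity records. A minimal Python sketch, assuming a hypothetical in-memory record type (`Activity`, IC50 in nM) rather than any actual Open PHACTS schema or API; `TargetY`, the species set and the 100 nM cut-off stand in for the question's placeholders:

```python
from dataclasses import dataclass


# Hypothetical activity record; field names are illustrative only.
@dataclass(frozen=True)
class Activity:
    compound_id: str
    target: str
    species: str
    ic50_nm: float  # IC50 in nanomolar
    assay_id: str


def compounds_below_ic50(records, target, species, max_ic50_nm):
    """Group records for `target` in any of `species` with IC50 below the cut-off,
    keyed by compound, so the supporting assay data travels with each hit."""
    hits = {}
    for rec in records:
        if rec.target == target and rec.species in species and rec.ic50_nm < max_ic50_nm:
            hits.setdefault(rec.compound_id, []).append(rec)
    return hits


records = [
    Activity("C1", "TargetY", "human", 12.0, "A1"),
    Activity("C1", "TargetY", "rat", 40.0, "A2"),
    Activity("C2", "TargetY", "human", 900.0, "A1"),  # too weak
    Activity("C3", "TargetY", "mouse", 5.0, "A3"),    # species not requested
    Activity("C4", "TargetZ", "human", 1.0, "A4"),    # different target
]

result = compounds_below_ic50(records, "TargetY", {"human", "rat"}, 100.0)
print(sorted(result))  # ['C1']
```

Returning the matching records, not just compound IDs, keeps the "plus assay data" part of the question answerable from the same result.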

Prioritised Research Questions Analysis

Prevalent concepts: Compound, Bioassay, Target, Pathway, Disease

Prevalent data relationships: compound – target; compound – bioassay; bioassay – target; compound – target – mode of action; target – target classification; target – pathway and disease
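The prevalent relationships form a graph, so "give all targets for compound X" becomes a one-hop lookup and pathway or disease connections become two hops. A sketch using plain subject-predicate-object triples; the identifiers (`CPD-X`, `TGT-1`, `PWY-A`, ...) and predicate names are hypothetical placeholders, not Open PHACTS vocabulary:

```python
from collections import defaultdict

# Hypothetical triples mirroring compound-target-pathway-disease relationships.
triples = [
    ("CPD-X", "modulates", "TGT-1"),
    ("CPD-X", "modulates", "TGT-2"),
    ("TGT-1", "in_pathway", "PWY-A"),
    ("TGT-2", "in_pathway", "PWY-A"),
    ("TGT-2", "in_pathway", "PWY-B"),
    ("TGT-2", "associated_with", "DIS-9"),
    ("CPD-Y", "modulates", "TGT-3"),
]


def build_index(triples):
    """Index objects by (subject, predicate) so each hop is a dictionary lookup."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def follow(index, starts, predicate):
    """Return every node reachable in one hop from `starts` via `predicate`."""
    reached = set()
    for node in starts:
        reached |= index.get((node, predicate), set())
    return reached


index = build_index(triples)
targets = follow(index, {"CPD-X"}, "modulates")       # compound -> targets
pathways = follow(index, targets, "in_pathway")       # targets -> pathways
diseases = follow(index, targets, "associated_with")  # targets -> diseases
print(sorted(targets), sorted(pathways), sorted(diseases))
```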

Required cheminformatics functionality:
– Chemical substructure searching
– Chemical similarity searching
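Both searches are commonly built on molecular fingerprints. A minimal sketch, assuming fingerprints are already computed and stored as sets of on-bit indices (generating them needs a cheminformatics toolkit, not shown). Similarity uses the Tanimoto coefficient. The substructure function is only a fingerprint screen, which assumes a path-based or substructure-key fingerprint where every feature of the query also occurs in any molecule containing it; survivors must still pass an exact subgraph match. Molecule IDs and bit sets are made up:

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints stored as bit-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty prints: scored 0 by convention here


def similarity_search(query, library, threshold):
    """Molecules with Tanimoto >= threshold, best first (ties broken by ID)."""
    scored = [(mol_id, tanimoto(query, fp)) for mol_id, fp in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def substructure_screen(query, library):
    """Fingerprint screen only: every query bit must be set in the candidate.
    Survivors still need an exact atom-by-atom (subgraph) match."""
    return sorted(mol_id for mol_id, fp in library.items() if query <= fp)


library = {
    "M1": frozenset({1, 2, 3, 4}),
    "M2": frozenset({1, 2, 3, 5, 6}),
    "M3": frozenset({7, 8}),
}
query = frozenset({1, 2, 3})
print(similarity_search(query, library, 0.5))  # [('M1', 0.75), ('M2', 0.6)]
print(substructure_screen(query, library))     # ['M1', 'M2']
```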

Required bioinformatics functionality:
– Sequence and similarity searching
– Bioprofile similarity searching
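One way to compare bioprofiles is to treat each compound's activities across an assay panel as a vector and score pairs by cosine similarity. This metric is an assumed choice, not necessarily the one Open PHACTS uses; the sketch further assumes activities are already centred (for example z-scores) and compares only assays measured for both compounds. Assay and compound IDs are hypothetical:

```python
import math


def profile_cosine(p, q):
    """Cosine similarity of two activity profiles (dict: assay -> centred value),
    computed only over assays measured for both compounds."""
    shared = p.keys() & q.keys()
    dot = sum(p[a] * q[a] for a in shared)
    norm_p = math.sqrt(sum(p[a] ** 2 for a in shared))
    norm_q = math.sqrt(sum(q[a] ** 2 for a in shared))
    if norm_p == 0 or norm_q == 0:  # also covers "no shared assays"
        return 0.0
    return dot / (norm_p * norm_q)


def rank_profiles(query, profiles, top_n=3):
    """Return the top_n (compound, score) pairs, most similar first."""
    scored = [(cid, profile_cosine(query, prof)) for cid, prof in profiles.items()]
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored[:top_n]


query = {"A1": 2.0, "A2": -1.0, "A3": 0.5}
profiles = {
    "P": {"A1": 1.8, "A2": -0.9, "A3": 0.45},  # same pattern, scaled
    "Q": {"A1": -2.0, "A2": 1.0, "A4": 3.0},   # opposite pattern on shared assays
    "R": {"A5": 1.0},                          # nothing in common
}
for cid, score in rank_profiles(query, profiles):
    print(cid, round(score, 3))
```

Cosine ignores overall potency scale and rewards a shared pattern of ups and downs across assays, which is why profile "P" ranks first despite weaker values.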

Selection of prioritised data sources

Chemistry: ChEMBL, DrugBank, ChEBI, PubChem, ChemSpider, Human Metabolome DB, Wombat (commercial)

Ontologies: AmiGO (The Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes), OBI (The Ontology for Biomedical Investigations), BioAssay Ontology, EFO (Experimental Factor Ontology)

Biology: EntrezGene, HGNC, UniProt, InterPro, SCOP, WikiPathways, OMIM, IUPHAR

Linking “Flavors” of Chemistry

Improve Linked Data Access…
– Coordinate effort to clean up chemistry-related data
– Open tools require good validation studies
– Support scientists making data open
– Support companies/groups promoting software for data sharing
– Engage the community to help create what they want

Openness and Quality Issues

Williams and Ekins, DDT, 16: 747-750 (2011)

Science Translational Medicine 2011

Chemistry Databases on the Internet

Some public databases are “trusted” as primary sources

Trust is granted without investigation or understanding of the content

What do we know about some of the online resources?

PHYSPROP Database

The freely downloadable database underlying the EPI Suite prediction software

Very basic filters suggest data quality issues

The stereochemistry challenge: 12,500 chemicals with “missed” stereo
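One very basic filter of this kind flags records whose name carries explicit stereo descriptors such as (2R,3R) or (E) while the associated SMILES has no stereo markers (`@`, `/` or `\`). The heuristic is an illustration, not the method used in the PHYSPROP analysis, and it catches only one direction of the problem: spotting undefined stereocentres in a structure whose name lacks descriptors requires stereo perception from a cheminformatics toolkit. Record IDs are hypothetical; the flat dimesylate SMILES illustrates a name-structure mismatch:

```python
import re

# Parenthesised CIP-style descriptors in a name, e.g. "(2R,3R)", "(S)", "(E)".
CIP_IN_NAME = re.compile(r"\(\d*[RSEZ](?:,\s*\d*[RSEZ])*\)")
# SMILES stereo markers: tetrahedral (@, @@) and double-bond (/ and \).
SMILES_STEREO = re.compile(r"[@/\\]")


def missing_stereo(records):
    """Return IDs of records whose name states stereo but whose SMILES has none."""
    return [
        rec_id
        for rec_id, name, smiles in records
        if CIP_IN_NAME.search(name) and not SMILES_STEREO.search(smiles)
    ]


records = [
    ("rec-1", "(2R,3R)-Butanediol bis(methanesulfonate)", "CC(OS(C)(=O)=O)C(C)OS(C)(=O)=O"),
    ("rec-2", "(R)-butan-2-ol", "C[C@@H](O)CC"),
    ("rec-3", "(E)-but-2-ene", "C/C=C/C"),
    ("rec-4", "acetone", "CC(C)=O"),
]
print(missing_stereo(records))  # ['rec-1']
```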

Searches on ChemSpider

Most searches are text-based: people searching for information about known chemicals

Creating accurate name-structure dictionaries is critical
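A name-structure dictionary needs at least a normalisation step and a check for names that resolve to more than one structure. A minimal sketch, assuming structures are already reduced to identifiers (the `STRUCT-n` values are hypothetical stand-ins for, say, InChIKeys) and using case-folding plus whitespace collapsing as the only normalisation:

```python
from collections import defaultdict


def normalize(name):
    """Case-fold and collapse whitespace; real dictionaries need far more rules."""
    return " ".join(name.casefold().split())


def build_dictionary(pairs):
    """pairs: iterable of (name, structure_id).
    Returns (unique, ambiguous): names resolving to one structure vs. several."""
    index = defaultdict(set)
    for name, structure_id in pairs:
        index[normalize(name)].add(structure_id)
    unique = {n: next(iter(ids)) for n, ids in index.items() if len(ids) == 1}
    ambiguous = {n: sorted(ids) for n, ids in index.items() if len(ids) > 1}
    return unique, ambiguous


pairs = [
    ("Taxol", "STRUCT-1"),
    ("taxol", "STRUCT-1"),
    ("Paclitaxel", "STRUCT-1"),
    ("Vitamin E", "STRUCT-2"),
    ("vitamin  E", "STRUCT-3"),  # same name deposited against another structure
]
unique, ambiguous = build_dictionary(pairs)
print(unique)     # {'taxol': 'STRUCT-1', 'paclitaxel': 'STRUCT-1'}
print(ambiguous)  # {'vitamin e': ['STRUCT-2', 'STRUCT-3']}
```

Ambiguous names are the ones a text search cannot safely resolve to a single structure, so they are candidates for curation rather than automatic lookup.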

NIST Webbook

PubChem

NPC Browser http://tripod.nih.gov/npc/

Cyclic Data Sharing

Data-sharing between open databases is cyclic
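Cyclic sharing can be made explicit by modelling databases as nodes and "feeds data into" relations as directed edges; a cycle means records can flow back to where they came from, which is one way an error may be reintroduced after being corrected at one site. A depth-first-search sketch; the database names and flows are hypothetical:

```python
def find_cycle(graph):
    """Return one cycle as a node list (first node repeated at the end), or None.
    graph: dict mapping each database to the databases it feeds data into."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


flows = {
    "DB-A": ["DB-B"],
    "DB-B": ["DB-C"],
    "DB-C": ["DB-A", "DB-D"],  # DB-C feeds back into DB-A
    "DB-D": [],
}
print(find_cycle(flows))  # ['DB-A', 'DB-B', 'DB-C', 'DB-A']
```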

Synonyms on PubChem

1,3-DICHLORO-PROPAN-2-ONE

(2R,3R)-Butanediol bis(methanesulfonate)

Ethyl-1-propenyl ether, mixture of cis and trans

PSS-[2-[(Chloromethyl)phenyl]ethyl]-Heptaisobutyl substituted

1-Chlorobenzylethyl-3,5,7,9,11,13,15-heptaisobutylpentacyclo [9.5.1.1(3,9).1(5,15).1(7,13)]octasiloxane

Synonyms on PubChem

Data Proliferation

www.chemspider.com

ChemSpider…

– >26 million unique molecules
– Links together >400 internet resources
– Linking patents, publications, chemical vendors and online chemical compound databases
– Crowdsourced depositions and curations
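Linking many resources around one record per unique molecule amounts to grouping depositions by a structure key. A sketch, assuming each deposition arrives already paired with a key (such as a standard InChIKey computed upstream; the `KEY-n` values, source names and URLs here are hypothetical). It is not a description of ChemSpider's actual pipeline:

```python
from collections import defaultdict


def merge_by_structure(depositions):
    """depositions: iterable of (structure_key, source, url).
    Returns {structure_key: {source: [url, ...]}}: one entry per unique structure,
    with links back to every resource that deposited it."""
    merged = defaultdict(lambda: defaultdict(list))
    for key, source, url in depositions:
        merged[key][source].append(url)
    # Convert to plain dicts so later lookups of missing keys do not create entries.
    return {key: dict(by_source) for key, by_source in merged.items()}


depositions = [
    ("KEY-1", "SourceA", "https://a.example/1"),
    ("KEY-1", "SourceB", "https://b.example/x"),
    ("KEY-2", "SourceA", "https://a.example/2"),
    ("KEY-1", "SourceA", "https://a.example/1b"),
]
merged = merge_by_structure(depositions)
print(len(merged), sorted(merged["KEY-1"]))  # 2 ['SourceA', 'SourceB']
```

The quality of the merge depends entirely on the structure key: two depositions of the same compound with different (or missing) stereo get different keys and stay separate.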


A focus on data quality – cleaning data on the web

The structure database under Open PHACTS

Acknowledgments

Sean Ekins – Collaborations in Chemistry

RSC|ChemSpider team

Open PHACTS consortium – especially Lee Harland and Carole Goble

Data depositors and curators

Software providers – ACD/Labs, OpenEye, GGA Software Inc, Open Source Cheminformatics

Thank you

Email: williamsa@rsc.org
Twitter: ChemConnector
Blog: www.chemspider.com/blog
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams