The royal society of chemistry and its adoption of semantic web technologies for chemistry at the...

Post on 11-May-2015


Semantic web technologies have quickly penetrated all areas of traditional and new database systems and have become the de facto standard in information exchange and communication. The Royal Society of Chemistry has built a new chemistry data repository with the semantic web at the core of the system. Every module of the data repository contains a semantic web layer and is able to interact internally and externally using standard approaches and formats including RDF, appropriate ontologies, SPARQL querying and so on. In this presentation we will review the challenges associated with developing this new system based on semantic web technologies and how the approach that we have taken offers distinct advantages over the original data model designed to produce the ChemSpider database. Its advantages include extensibility, an ontological underpinning, federated integration and the adoption of modern standards rather than the constraints of a standard SQL model.


The Royal Society of Chemistry and its adoption of semantic web technologies for chemistry at the epoch of a federated world

Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov

ACS, 248th National Meeting

San Francisco, CA

August 11th 2014

Who is involved?

29 partners

Research questions

ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity

“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”

“What is the selectivity profile of known p38 inhibitors?”

“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”

Open PHACTS Explorer: web-based searching interface

explorer.openphacts.org

Discovery Platform

Open PHACTS API (dev.openphacts.org): applications can query the pharmacological data within Open PHACTS

Open PHACTS applications: external bespoke applications using the Open PHACTS API

chembionavigator.org

pharmatrek.org

• Compound-protein interactions
• Physicochemical properties
• Gene information
• Biological pathways

Workflow tools: Pipeline Pilot, KNIME, R

OpenPHACTS UI: http://explorer.openphacts.org/

ChemBioNavigator

OpenPHACTS API: https://dev.openphacts.org/

KNIME

OpenPHACTS Architecture

Micro-article

Compounds

Reaction

Analytical Data

Text and References

Technical view - unification

Chemistry Validation and Standardization Platform

DrugBank dataset (6516 records)

J. Brecher, IUPAC: Graphical Representation of stereochem. configurations, Section ST-1.1.10

DB06287

PubChem, DrugBank, ChemSpider

Imatinib

Mesylate

What Is Gleevec?

Ambiguities

How is this a semantic web problem? Why can’t people just be clear?

People may be working with faulty data.

Salts, say, may make little difference to the effects of an active ingredient.

People may assume a one-to-one mapping between a gene and the gene product (protein, ncRNA) that it codes for.

What’s in a lens?

• Identifier
• Title (dct:title)
• Description (dct:description)
• Documentation link (dcat:landingPage)
• Creator (pav:createdBy)
• Timestamp (pav:createdOn)

Equivalence rules (bdb:linksetJustification)
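The lens fields listed on this slide could be held in a small record type. The Python sketch below is illustrative only: the class name `Lens`, its attribute names, the `accepts` helper and all example values are assumptions made for this example, not part of the Open PHACTS or BridgeDB software. Only the vocabulary terms in the comments come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Lens:
    """Hypothetical record mirroring the lens metadata fields on the slide."""
    identifier: str
    title: str            # dct:title
    description: str      # dct:description
    landing_page: str     # dcat:landingPage
    created_by: str       # pav:createdBy
    created_on: datetime  # pav:createdOn
    # bdb:linksetJustification values this lens treats as equivalence rules
    justifications: frozenset = frozenset()

    def accepts(self, justification: str) -> bool:
        """Return True if a link with this justification is followed."""
        return justification in self.justifications


# Example values are placeholders, not real Open PHACTS lens definitions.
strict_lens = Lens(
    identifier="strict",
    title="Strict lens",
    description="Follow only links justified by identical InChI.",
    landing_page="https://example.org/lenses/strict",
    created_by="example curator",
    created_on=datetime(2014, 8, 11, tzinfo=timezone.utc),
    justifications=frozenset({"InChI"}),
)
```

A lens built this way can be asked whether a given link should be followed, for example `strict_lens.accepts("Drug Name")`.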

Equivalence rules

The BridgeDB vocabulary adds metadata that provides a justification for treating two URIs alike, thus allowing the researcher to determine whether their circumstances fit.

owl:sameAs ≤ skos:exactMatch ≤ skos:closeMatch ≤ rdfs:seeAlso
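Reading the chain above as running from the strictest predicate to the loosest, a link filter can be sketched by ranking the four predicates and keeping only links at least as strict as a chosen threshold. The function names and the `ex:` identifiers below are hypothetical and chosen only for illustration.

```python
# Predicates ordered from strictest to loosest, following the slide.
STRICTNESS = ["owl:sameAs", "skos:exactMatch", "skos:closeMatch", "rdfs:seeAlso"]
RANK = {predicate: position for position, predicate in enumerate(STRICTNESS)}


def at_least_as_strict(predicate: str, threshold: str) -> bool:
    """True if `predicate` is as strict as, or stricter than, `threshold`."""
    return RANK[predicate] <= RANK[threshold]


def filter_links(links, threshold):
    """Keep (subject, predicate, object) links that meet the threshold."""
    return [link for link in links if at_least_as_strict(link[1], threshold)]


# Hypothetical links between placeholder identifiers.
links = [
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:a", "skos:closeMatch", "ex:c"),
    ("ex:a", "rdfs:seeAlso", "ex:d"),
]
kept = filter_links(links, "skos:exactMatch")
```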

The ChEBI and CHEMINF ontologies provide a rich set of relations (many of which were developed for this project) to relate one molecule to another.

ChEBI (http://www.ebi.ac.uk/chebi)

• has part
• is tautomer of

CHEMINF (http://code.google.com/p/semanticchemistry/)

• has component with uncharged counterpart
• has counterpart molecular entity
• has normalized counterpart
• has OPS normalized counterpart
• has PubChem normalized counterpart
• has uncharged counterpart
• similar to
• similar to by PubChem 2D similarity algorithm
• similar to by PubChem 3D similarity algorithm
• has same connectivity as
• is isotopologue of
• is stereoisomer of
• subClassOf (standard relation in RDF)
• has isotopically unspecified parent
• has stereoundefined parent
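Relations such as these are naturally stored as subject-predicate-object triples. The following is a minimal in-memory sketch, assuming plain strings in place of real IRIs: the `ex:` identifiers are invented for the example, and the predicates are the relation labels from the slide rather than actual ChEBI or CHEMINF term identifiers. A production system would use an RDF store queried with SPARQL.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements; predicate strings are slide labels, not real IRIs.
TRIPLES: List[Triple] = [
    ("ex:imatinib_mesylate", "has component with uncharged counterpart", "ex:imatinib"),
    ("ex:imatinib_mesylate", "has OPS normalized counterpart", "ex:imatinib_mesylate_ops"),
    ("ex:compound_1", "is stereoisomer of", "ex:compound_2"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# All statements about the salt form, and all stereoisomer statements.
about_salt = list(match(TRIPLES, s="ex:imatinib_mesylate"))
stereo = list(match(TRIPLES, p="is stereoisomer of"))
```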

Link: skos:closeMatch
Reason: non-salt form

Link: skos:exactMatch
Reason: drug name

Strict Relaxed

Analysing Browsing

skos:exactMatch(InChI)

Strict Relaxed

Analysing Exploring

skos:closeMatch(Drug Name)

skos:exactMatch(InChI)
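The difference between a strict and a relaxed lens can be sketched as a grouping problem: each lens accepts a set of link justifications, and identifiers joined by accepted links fall into the same group. The example below uses a small union-find structure. The record identifiers, the link list and the function name are hypothetical, and the predicate/justification pairs follow the "skos:exactMatch(InChI)" and "skos:closeMatch(Drug Name)" labels above.

```python
def equivalence_classes(links, accepted):
    """Group identifiers connected by links whose justification is accepted."""
    parent = {}

    def find(node):
        # Register unseen nodes, then walk to the root with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target, _predicate, justification in links:
        root_source, root_target = find(source), find(target)
        if justification in accepted:
            parent[root_source] = root_target

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical links: (source, target, predicate, justification).
LINKS = [
    ("ex:chembl/imatinib", "ex:chemspider/imatinib", "skos:exactMatch", "InChI"),
    ("ex:drugbank/gleevec", "ex:chemspider/imatinib", "skos:closeMatch", "Drug Name"),
]

strict = equivalence_classes(LINKS, accepted={"InChI"})
relaxed = equivalence_classes(LINKS, accepted={"InChI", "Drug Name"})
```

Under the strict setting the drug-name link is ignored, so the third record stays in its own group. Under the relaxed setting all three records are treated as one entity.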

What does the Open PHACTS Chemistry Registration System do?

Takes in structures from ChEMBL, ChEBI, DrugBank, PDB, Thomson Reuters.

Normalizes structures according to rules based on FDA guidelines.

Generates counterpart molecules: without charge, fragments
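The fragment step can be illustrated at the string level, since disconnected components in a SMILES string are separated by dots. This toy Python sketch is not the registration system's method: the function names are invented, the "largest fragment" heuristic simply counts letters, and sodium acetate is used as an example that does not appear in the slides. Charge removal and rule-based normalization need a cheminformatics toolkit and are not attempted here.

```python
def fragments(smiles: str) -> list:
    """Split a SMILES string into its dot-separated components."""
    return [part for part in smiles.split(".") if part]


def largest_fragment(smiles: str) -> str:
    """Pick the component with the most letters.

    Crude proxy for size: two-letter element symbols count twice, so this
    is only suitable for illustration.
    """
    return max(fragments(smiles), key=lambda part: sum(ch.isalpha() for ch in part))


# Sodium acetate: an acetate anion plus a sodium cation.
SODIUM_ACETATE = "CC(=O)[O-].[Na+]"
parts = fragments(SODIUM_ACETATE)
parent_candidate = largest_fragment(SODIUM_ACETATE)
```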

Chemistry Validation and Standardization Platform

Input pipeline

Compounds domain

Navigation in chemical space

Reactions domain

Analytical data domain

Crystallography domain

Standards

Share in a “proper way”

APIs, endpoints and widgets

Handling complex content

• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?

Machine learning

Thank you

Email: tkachenkov@rsc.org

Slides: http://www.slideshare.net/valerytkachenko16