Scientific Lenses over Linked Data: An approach to support multiple integrated views
Alasdair J G Gray
alasdairjggray.co.uk
@gray_alasdair
Open PHACTS Use Case
“Let me compare MW, logP and PSA for launched inhibitors of human & mouse oxidoreductases”
Chemical Properties (Chemspider)
Launched drugs (Drugbank)
Human => Mouse (Homologene)
Protein Families (Enzyme)
Bioactivity Data (ChEMBL)
… other info (Uniprot/Entrez etc.)
16 October 2014 Scientific Lenses – A. J. G. Gray 1
Discovery Platform
Drug Discovery Platform
Apps
Domain API
Interactive responses
Production quality integration platform
Method calls
App Ecosystem: An “App Store”?
http://www.openphactsfoundation.org/apps.html
Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium
MOE Collector Cytophacts Utopia Garfield SciBite
KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna
API Hits
April 2013 – March 2014: 15.8m
April 2014 – Sept 2014: 14m
Total: 29.8 million
Linked Data API
Drug
Disease (1.4)
Pathway
Target
https://dev.openphacts.org/
Open PHACTS Data
Source          Initial Records   Triples          Properties
ChEMBL          1,481,473         304,360,749      77
DrugBank        19,628            517,584          74
UniProt         564,246           405,473,138      82
ENZYME          6,187             73,838           2
ChEBI           40,575            1,673,863        2
GeneOntology    38,137            2,447,682        26
GOA             661,232           1,765,622,393    15
ChemSpider      1,361,568         215,193,441      23
ConceptWiki     2,828,966         4,291,131        1
WikiPathways    946               1,949,074        34
14 January 2013 OPS Dataset Descriptions – A. J. G. Gray 7
Dataset Descriptions in the Open Pharmacological Space
Being replaced by W3C HCLS community profile
http://tiny.cc/hcls-datadesc-ed
OPS Discovery Platform
[Architecture diagram. Layers and components shown:]
• Apps
• Linked Data API (RDF/XML, TTL, JSON)
• Domain Specific Services
• Core Platform
• Semantic Workflow Engine
• Data Cache (Virtuoso Triple Store)
• Identity Resolution Service
• Identifier Management Service
• Chemistry Registration, Normalisation & Q/C
• Indexing
Example inputs: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
Data sources (labelled VoID, Nanopub or Db): Public Content, Commercial, Public Ontologies, User Annotations
Multiple Identities
P12047
X31045
GB:29384
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
Are these the same thing?
Gleevec®: Imatinib Mesylate
Drugbank, ChemSpider, PubChem
Imatinib
Mesylate
Imatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
Are these records the same?
It depends upon your task!
BRCA1: Chromosome 17
Breast cancer type 1 susceptibility protein
http://en.wikipedia.org/wiki/File:Protein_BRCA1_PDB_1jm7.png
http://en.wikipedia.org/wiki/File:BRCA1_en.png
Genes == Proteins?
Are these records the same?
It depends upon your task!
Example Use Cases
I need to perform an analysis, give me details of the active compound in Gleevec.
Which targets are known to interact with Gleevec?
Structure Lens
I need to perform an analysis, give me details of the active compound in Gleevec.
skos:exactMatch (InChI)
Strict (Analysing) to Relaxed (Browsing)
Name Lens
Which targets are known to interact with Gleevec?
skos:closeMatch (Drug Name), shown on two links
skos:exactMatch (InChI)
Strict (Analysing) to Relaxed (Browsing)
What is a Scientific Lens?
A lens defines a conceptual view over the data
Specifies operational equivalence conditions
Consists of:
Identifier (URI)
Title (dct:title)
Description (dct:description)
Documentation link (dcat:landingPage)
Creator (pav:createdBy)
Timestamp (pav:createdOn)
Equivalence rules (bdb:linksetJustification)
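The lens components listed above can be pictured as a small record plus a filter over links. The following is only a sketch under stated assumptions: the class names, the `links_under` helper, the `ex:` identifiers and the justification labels "inchi" and "drug-name" are all hypothetical, chosen to echo the Structure Lens and Name Lens slides; they are not the Open PHACTS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lens:
    """Lens metadata plus the equivalence rules it activates."""
    uri: str                   # Identifier (URI)
    title: str                 # dct:title
    description: str           # dct:description
    landing_page: str          # dcat:landingPage
    created_by: str            # pav:createdBy
    created_on: str            # pav:createdOn, ISO 8601 string
    justifications: frozenset  # accepted bdb:linksetJustification values


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    justification: str


def links_under(lens, links):
    """Keep only the links whose justification the lens accepts."""
    return [link for link in links if link.justification in lens.justifications]


# Hypothetical lenses: a strict one (structure only) and a relaxed one.
STRICT = Lens("http://example.org/lens/structure", "Structure Lens",
              "Match on chemical structure only", "http://example.org/docs",
              "http://example.org/people/someone", "2014-10-16",
              frozenset({"inchi"}))
RELAXED = Lens("http://example.org/lens/name", "Name Lens",
               "Match on structure or drug name", "http://example.org/docs",
               "http://example.org/people/someone", "2014-10-16",
               frozenset({"inchi", "drug-name"}))

# Hypothetical links between made-up records.
LINKS = [
    Link("ex:record-a", "ex:record-b", "inchi"),
    Link("ex:record-a", "ex:record-c", "drug-name"),
]

strict_links = links_under(STRICT, LINKS)
relaxed_links = links_under(RELAXED, LINKS)
```

Under the strict lens only the structure-based link survives; the relaxed lens also admits the name-based link, which is the behaviour the Strict to Relaxed spectrum describes.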
Lens Effects: Ibuprofen
Ibuprofen consists of two equally active stereoisomers.
• Stereoisomers not always represented in data
Users wish to retrieve information for any stereoisomer.
Records shown: CHEMBL427526, CHEMBL521, CHEMBL175
Default Lens [figure]
Stereoisomer Lens [figure]
Mapping Generation
ops:OPS437281
✔
ops:OPS380297
has_stereoundefined_parent[ci:CHEMINF_000456]
ops:OPS380292
is_stereoisomer_of[ci:CHEMINF_000461]
Other relationships
• has part
• is tautomer of
• uncharged counterpart
• isotope
…
Initial Connectivity
Datasets 37
Linksets 104
Links 7,096,712
Justifications 7
Compound Information
Proceed with Caution!
Co-reference Computation
Rules ensure
• Unrestricted transitivity within conceptual type
• Restricted crossing of conceptual types
Based on justifications
Provenance captured
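One way to picture these rules is a union-find pass that chains links freely inside a conceptual type and never chains across types. This is a minimal sketch, not the rule set used in Open PHACTS, which the slide does not spell out: the function name, the `type_of` lookup and the `ex:` identifiers are hypothetical, and justifications and provenance are left out for brevity.

```python
from collections import defaultdict


def coreference_sets(links, type_of):
    """Group URIs joined by links whose two ends share a conceptual type.

    Assumption: a link between different conceptual types is never used
    for chaining, so a compound is not merged with a protein.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for source, target in links:
        root_source, root_target = find(source), find(target)
        if type_of[source] == type_of[target]:
            parent[root_source] = root_target  # transitive within a type

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: three compound records and two protein records.
TYPE_OF = {
    "ex:compound-1": "compound",
    "ex:compound-2": "compound",
    "ex:compound-3": "compound",
    "ex:protein-1": "protein",
    "ex:protein-2": "protein",
}
LINKS = [
    ("ex:compound-1", "ex:compound-2"),
    ("ex:compound-2", "ex:compound-3"),
    ("ex:compound-3", "ex:protein-1"),  # crosses types, so not chained
    ("ex:protein-1", "ex:protein-2"),
]

sets = coreference_sets(LINKS, TYPE_OF)
```

With this data the three compounds end up in one set and the two proteins in another, even though a cross-type link connects them.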
Inferred Connectivity
Datasets 37
Linksets 883
Links 17,383,846
Justifications 7
Lenses: Under the hood
BridgeDb
Input query Q, with lens L1:
    cw:979b545d-f9a9 cheminf:logd ?logd .
The Query Expander Service passes (cw:979b545d-f9a9, L1) to the Identity Mapping Service (BridgeDB), which holds the Profiles and Mappings and returns:
    [cw:979b545d-f9a9,
     cs:2157,
     chembl:1280,
     db:db00945]
Expanded query Q’:
    GRAPH <http://rdf.chemspider.com> {
      ?iri cheminf:logd ?logd .
      FILTER (?iri = cw:979b545d-f9a9 ||
              ?iri = cs:2157 ||
              ?iri = chembl:1280 ||
              ?iri = db:db00945 )
    }
    GRAPH <http://…
• Can also be achieved through UNION
• IMS call adds overhead
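The two expansion styles mentioned here, FILTER and UNION, can be sketched as plain string builders. This is an illustration only: the function names and the in-memory `MAPPINGS` table standing in for the Identity Mapping Service are hypothetical, and the real service call is not shown on the slide.

```python
# Hypothetical stand-in for the Identity Mapping Service, keyed by (URI, lens).
MAPPINGS = {
    ("cw:979b545d-f9a9", "L1"): [
        "cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945",
    ],
}


def lookup(uri, lens):
    """Return the URIs considered equivalent to `uri` under `lens`."""
    return MAPPINGS.get((uri, lens), [uri])


def expand_with_filter(variable, predicate, obj, uris):
    """Replace the subject by a variable and constrain it with FILTER."""
    conditions = " || ".join(f"{variable} = {uri}" for uri in uris)
    return f"{variable} {predicate} {obj} .\nFILTER ({conditions})"


def expand_with_union(predicate, obj, uris):
    """Emit one UNION branch per co-referent URI."""
    branches = [f"{{ {uri} {predicate} {obj} . }}" for uri in uris]
    return "\nUNION\n".join(branches)


uris = lookup("cw:979b545d-f9a9", "L1")
filter_pattern = expand_with_filter("?iri", "cheminf:logd", "?logd", uris)
union_pattern = expand_with_union("cheminf:logd", "?logd", uris)
print(filter_pattern)
print(union_pattern)
```

Both builders produce the same matches in principle; which one a triple store evaluates faster is the question the experiment that follows sets out to measure.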
Experiment
Is it feasible to use a stand-off mapping service?
Baselines (no external call):
• “Perfect” URIs
• Linked data querying
Expansion approaches (external service call):
• FILTER by Graph
• UNION by Graph
C. Y. A. Brenninkmeijer, C. A. Goble, A. J. G. Gray, P. T. Groth, A. Loizou, S. Pettifer: Including Co-referent URIs in a SPARQL Query. COLD 2013.
http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf
“Perfect” URI Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    chembl_mol:m1280 cheminf:mw ?mw .
  }
}
Linked Data Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    ?chemblid cheminf:mw ?mw .
  }
  cs:2157 skos:exactMatch ?chemblid .
}
Queries
Drawn from Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)
Experiment Data
Data: 167,783,592 triples
Mappings: 2,114,584 triples
Lenses: 1
Average execution times [chart]
Q6: Target Pharmacology [chart]
Explorer Screenshot
Conclusions
Scientific data is complex and messy
Requires flexibility in linking
Equivalence depends upon context
Lenses provide support for operational equivalence
Chemical structures support automatic computing of links with justification
Acknowledgements
Royal Society of Chemistry
Colin Batchelor
Karen Karapetyan
Jon Steele
Valery Tkachenko
Antony Williams
University of Manchester
Christian Brenninkmeijer
Ian Dunlop
Carole Goble
Steve Pettifer
Robert Stevens
Swiss Institute of Bioinformatics
Christine Chichester
European Bioinformatics Institute
Mark Davies
Anna Gaulton
John Overington
University of Vienna
Daniela Digles
Maastricht University
Chris Evelo
Andra Waagmeester
Egon Willighagen
VU University of Amsterdam
Paul Groth
Antonis Loizou
Connected Discovery
Lee Harland
Questions
Alasdair J G Gray
alasdairjggray.co.uk
@gray_alasdair
Open PHACTS
openphacts.org
@open_phacts