2016 bmdid-mappings
![Page 1: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/1.jpg)
ISWC2016:::BMDID::Dumontier
ONTOLOGY MAPPING FOR LIFE SCIENCE LINKED DATA
Amrapali Zaveri and Michel Dumontier
Stanford Center for Biomedical Informatics Research
Stanford University
![Page 2: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/2.jpg)
Large and growing network of Linked Data
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
![Page 3: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/3.jpg)
Linked Data for the Life Sciences
Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF.
chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; animal models and phenotypes; disease, genetic markers, treatments; terminologies & publications
• 11B+ interlinked statements from 35 biomedical datasets and 400+ ontologies
• dataset description, provenance & statistics
• A growing interoperable ecosystem with the EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
![Page 4: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/4.jpg)
Biomedical Linked Data
![Page 5: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/5.jpg)
The lack of coordination around a global schema makes Linked Data chaotic and unwieldy
![Page 6: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/6.jpg)
Federated queries require intimate knowledge of each dataset schema
Get all protein catabolic processes (and more specific GO terms) in biomodels
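    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
    WHERE {
      # GO terms that are subclasses of a term labelled "protein catabolic process..."
      SERVICE <http://bioportal.bio2rdf.org/sparql> {
        ?go rdfs:label ?label .
        ?go rdfs:subClassOf+ ?tgo .
        ?tgo rdfs:label ?tlabel .
        FILTER regex(?tlabel, "^protein catabolic process")
      }
      # BioModels biochemical reactions annotated with those GO terms
      SERVICE <http://biomodels.bio2rdf.org/sparql> {
        ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
        ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
      }
    }
    GROUP BY ?go ?label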
![Page 7: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/7.jpg)
Previous work involved manual mappings between Bio2RDF types and relations and the Semanticscience Integrated Ontology (SIO)
[Figure: "is a" links connect instances to dataset types and dataset types to an ontology class. Instances: uniprot:P05067, pharmgkb:PA30917, omim:189931. Dataset types: uniprot:Protein, refseq:Protein, omim:Gene, pharmgkb:Gene. Ontology class: sio:gene. Legend: dataset, ontology, Knowledge Base.]
Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and Michel Dumontier. Bio-ontologies 2012.
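A minimal sketch of how such a mapping table lets a query written against one SIO class be expanded to the dataset-specific types behind it. The dictionary contents, the function names, and the pairing of the protein types with sio:protein are assumptions made for illustration; they are not the published mappings.

```python
# Hypothetical mapping table: dataset-specific type -> SIO class.
# The pairs are illustrative readings of the diagram labels only.
TYPE_TO_SIO = {
    "omim:Gene": "sio:gene",
    "pharmgkb:Gene": "sio:gene",
    "uniprot:Protein": "sio:protein",
    "refseq:Protein": "sio:protein",
}


def dataset_types_for(sio_class, mappings=TYPE_TO_SIO):
    """Return the dataset-specific types mapped to the given SIO class, sorted."""
    return sorted(t for t, target in mappings.items() if target == sio_class)


def values_clause(variable, sio_class):
    """Build a SPARQL VALUES clause listing every dataset type behind one SIO class."""
    types = " ".join(dataset_types_for(sio_class))
    return f"VALUES {variable} {{ {types} }}"


print(values_clause("?type", "sio:gene"))
# prints: VALUES ?type { omim:Gene pharmgkb:Gene }
```

The design choice here is to keep the mappings as data, so that a change in either the datasets or SIO only requires updating the table, not the queries.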
![Page 8: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/8.jpg)
Semanticscience Integrated Ontology (SIO)
An effective upper level ontology.
• 1500+ classes
• 207 object properties (inc. inverses)
• 1 datatype property
![Page 9: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/9.jpg)
Bio2RDF and SIO powered SPARQL federated query: Find chemicals (from CTD) and proteins (from SGD) that participate in the same process (from GOA)

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX sio: <http://semanticscience.org/resource/>

    SELECT ?chem ?prot ?proc
    FROM <http://bio2rdf.org/ctd>
    WHERE {
      # Chemicals and the GO processes they participate in
      SERVICE <http://ctd.bio2rdf.org/sparql> {
        ?chemical a sio:chemical-entity .
        ?chemical rdfs:label ?chem .
        ?chemical sio:is-participant-in ?process .
        ?process rdfs:label ?proc .
        FILTER regex(str(?process), "http://bio2rdf.org/go:")
      }
      # Proteins that participate in the same process
      SERVICE <http://sgd.bio2rdf.org/sparql> {
        ?protein a sio:protein .
        ?protein sio:is-participant-in ?process .
        ?protein rdfs:label ?prot .
      }
    }
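The query works because both SERVICE blocks bind the same ?process variable, so the federated engine joins their results on it. The sketch below mimics that join in plain Python; the rows, labels, and GO identifiers are hypothetical placeholders, not data from CTD or SGD.

```python
# Hypothetical result rows standing in for what each SERVICE block might return.
ctd_rows = [
    {"chem": "chemical A", "process": "go:0000001"},
    {"chem": "chemical B", "process": "go:0000002"},
]
sgd_rows = [
    {"prot": "protein X", "process": "go:0000001"},
    {"prot": "protein Y", "process": "go:0000003"},
]


def join_on_process(left, right):
    """Pair rows from both sources that share the same ?process binding."""
    by_process = {}
    for row in right:
        by_process.setdefault(row["process"], []).append(row)
    joined = []
    for row in left:
        for match in by_process.get(row["process"], []):
            joined.append({**row, **match})
    return joined


for row in join_on_process(ctd_rows, sgd_rows):
    print(row["chem"], row["prot"], row["process"])
# prints: chemical A protein X go:0000001
```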
![Page 10: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/10.jpg)
Many vocabularies, ontologies and community-based standards
are now available
![Page 11: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/11.jpg)
PubChem uses multiple terminologies
![Page 12: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/12.jpg)
Existing limitations with Bio2RDF mappings
• New datasets have been added
• Existing datasets have changed
• The target ontology (SIO) has changed
• The target ontology (SIO) is incomplete and there may be better ontologies to use
• These ontologies are evolving; today’s mappings may be invalid or imprecise tomorrow
• Manual process -> not easy and not reproducible -> must automate
![Page 13: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/13.jpg)
Goal
Develop a semi-automated procedure to generate high quality mappings between Bio2RDF and SIO.
![Page 14: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/14.jpg)
Approach
[Figure: mapping approaches placed on a scale from Automated to Manual: distance metrics, graph-based, instance-based, BioPortal, crowdsourcing. Legend: previous work; * Our work.]
![Page 15: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/15.jpg)
Idea: Create mappings between SIO and Bio2RDF using ontologies in BioPortal
[Diagram: Bio2RDF, NCBO Annotator/Recommender, SIO]
![Page 16: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/16.jpg)
Bio2RDF-SIO mappings via transitive closure through BioPortal ontologies
[Diagram labels: Bio2RDF, Mapped Class, Super Class, match, SIO]
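One way to read the transitive-closure idea is sketched below: start from the BioPortal class a Bio2RDF class was mapped to, and walk up its superclasses until one of them has a match in SIO. The toy data comes from the clinical-study example given later in these slides; the dictionary layout, the function name, and the single-parent hierarchy are simplifying assumptions, not the authors' implementation.

```python
# Toy data from the clinical-study example in these slides. Identifiers are
# copied as they appear on the slide; the structure is an assumption.
bio2rdf_to_bioportal = {"clinicaltrials:Clincial-Study": "edda:clinical_trial"}
superclass_of = {"edda:clinical_trial": "Edda:Study_Design"}  # one parent per class
bioportal_to_sio = {"Edda:Study_Design": "sio:001041"}


def map_via_hierarchy(bio2rdf_class):
    """Walk up from the mapped BioPortal class until a class matched to SIO is found.

    Returns (sio_class, steps_up), or None when no ancestor is matched to SIO.
    """
    current = bio2rdf_to_bioportal.get(bio2rdf_class)
    seen = set()  # guards against cycles in the hierarchy
    steps = 0
    while current is not None and current not in seen:
        seen.add(current)
        if current in bioportal_to_sio:
            return bioportal_to_sio[current], steps
        current = superclass_of.get(current)
        steps += 1
    return None


print(map_via_hierarchy("clinicaltrials:Clincial-Study"))
# prints: ('sio:001041', 1)
```

A match found one or more steps above the mapped class is broader than the original Bio2RDF class, which is consistent with the skos:broader relation shown in the slides' example.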
![Page 17: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/17.jpg)
Results
Bio2RDF: 319 (of 6093) classes kept after pruning (removed blank nodes, general resources, OWL vocabulary & non-Bio2RDF types/relations)
1. NCBO Annotator: 174 Bio2RDF classes matched directly and exactly to SIO
2. NCBO Recommender: 94 Bio2RDF classes matched to BioPortal ontologies
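The pruning step could be sketched as a simple filter over type IRIs. The slides do not say how each category was detected, so every test below is an assumption: blank nodes are recognized by the `_:` label prefix, OWL vocabulary by the OWL namespace, non-Bio2RDF types by the absence of the `http://bio2rdf.org/` prefix, and "general resources" by a local name of `Resource`.

```python
BIO2RDF_PREFIX = "http://bio2rdf.org/"
OWL_PREFIX = "http://www.w3.org/2002/07/owl#"


def keep_type(iri):
    """Return True when a type IRI survives pruning (all rules are assumptions)."""
    if iri.startswith("_:"):  # blank node label
        return False
    if iri.startswith(OWL_PREFIX):  # OWL vocabulary
        return False
    if not iri.startswith(BIO2RDF_PREFIX):  # non-Bio2RDF types/relations
        return False
    if iri.rsplit(":", 1)[-1] == "Resource":  # assumed marker of a general resource
        return False
    return True


# Hypothetical candidate list for illustration.
candidates = [
    "_:b0",
    "http://www.w3.org/2002/07/owl#Class",
    "http://xmlns.com/foaf/0.1/Document",
    "http://bio2rdf.org/drugbank_vocabulary:Resource",
    "http://bio2rdf.org/drugbank_vocabulary:Drug",
]
pruned = [iri for iri in candidates if keep_type(iri)]
print(pruned)
# prints: ['http://bio2rdf.org/drugbank_vocabulary:Drug']
```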
![Page 18: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/18.jpg)
Results
SIO: 1500 classes
3. 475 BioPortal ontologies
393 BioPortal ontologies matched to SIO
![Page 19: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/19.jpg)
Results
Bio2RDF: 319 classes
SIO: 1500 classes
393 BioPortal ontologies matched to SIO
94 Bio2RDF classes matched to BioPortal ontologies
4. Traverse hierarchy
![Page 20: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/20.jpg)
Results
Bio2RDF: 319 classes
SIO: 1500 classes
393 BioPortal ontologies matched to SIO
94 Bio2RDF classes matched to BioPortal ontologies
4. Traverse hierarchy: 71 matches
[Diagram labels: Mapped class, Super class]
![Page 21: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/21.jpg)
Results — Example
Bio2RDF class: clinicaltrials:Clincial-Study
Mapped class: edda:clinical_trial
Super class: Edda:Study_Design
SIO class: sio:001041 (study design)
Relation: skos:broader
![Page 22: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/22.jpg)
Mappings often occurred to more than one class
sider:Drug-Indication-Association mapped to:
• sio:010038 (drug)
• sio:010299 (disease)
• sio:000897 (association)
![Page 23: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/23.jpg)
Manual validation of mappings
| Bio2RDF Class | SIO Class | Annotation |
|---|---|---|
| drugbank:Biotech | - | no match |
| clinicaltrials:Organization | sio:00012 (organization) | exact |
| drugbank:toxicity | sio:001008 (toxicity) | exact |
| sgd:GlycineCount | sio:000794 (count) | partial – is-a |
| wormbase:Genetic-Interaction | sio:010035 (gene) | partial – part-of |
| clinicaltrials:Serious-Event | sio:000614 (attribute) | incorrect |
| drugbank:Source | sio:000510 (model) | incorrect |
All results available at https://goo.gl/eiijmQ
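A small sketch of how such validation annotations could be tallied. It uses only the seven example rows shown on this slide, so the counts say nothing about the full result set; the tuple layout and function name are assumptions for illustration.

```python
from collections import Counter

# The seven example rows from the validation slide: (Bio2RDF class, SIO class, annotation).
rows = [
    ("drugbank:Biotech", None, "no match"),
    ("clinicaltrials:Organization", "sio:00012", "exact"),
    ("drugbank:toxicity", "sio:001008", "exact"),
    ("sgd:GlycineCount", "sio:000794", "partial is-a"),
    ("wormbase:Genetic-Interaction", "sio:010035", "partial part-of"),
    ("clinicaltrials:Serious-Event", "sio:000614", "incorrect"),
    ("drugbank:Source", "sio:000510", "incorrect"),
]


def tally(rows):
    """Count rows per top-level annotation, folding partial subtypes together."""
    counts = Counter()
    for _bio2rdf, _sio, annotation in rows:
        kind = "partial" if annotation.startswith("partial") else annotation
        counts[kind] += 1
    return dict(counts)


print(tally(rows))
# prints: {'no match': 1, 'exact': 2, 'partial': 2, 'incorrect': 2}
```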
![Page 24: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/24.jpg)
Conclusion
• Developed a semi-automated methodology to map Bio2RDF classes to SIO via BioPortal ontologies
• 245 of 319 Bio2RDF classes matched to SIO
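The 245 figure appears to combine the two result counts reported earlier: 174 direct matches from the NCBO Annotator plus 71 matches found by traversing the hierarchy. That reading is an inference from the numbers, not something the slides state. A quick check of the arithmetic and the resulting coverage:

```python
direct_matches = 174     # NCBO Annotator, matched directly and exactly to SIO
hierarchy_matches = 71   # found by traversing the BioPortal hierarchy
total_classes = 319      # Bio2RDF classes kept after pruning

matched = direct_matches + hierarchy_matches
coverage = matched / total_classes

print(f"{matched} of {total_classes} ({coverage:.1%})")
# prints: 245 of 319 (76.8%)
```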
![Page 25: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/25.jpg)
Limitations
• Unmatched classes: neither SIO nor other ontologies have complete coverage
• Overly general concepts: Semantically incompatible classes
• Incorrect mappings: Matches to part of the class
• Mappings alone are insufficient to precisely retrieve data across different datasets
![Page 26: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/26.jpg)
Future Work
• Extend SIO to include classes that are ultimately not found
• Explore mid-level portion of SIO to eliminate root level mappings
• Scalable validation via crowdsourcing
• Pursue query rewriting
![Page 27: 2016 bmdid-mappings](https://reader036.fdocuments.in/reader036/viewer/2022062311/58793efc1a28ab23468b5c5f/html5/thumbnails/27.jpg)
Website: http://dumontierlab.com