Using SPARQL to Query BioPortal Ontologies and Metadata
Manuel Salvadores, Matthew Horridge, Paul R. Alexander, Ray W. Fergerson, Mark A. Musen, and Natasha F. Noy
ISWC 2012 Boston, US
Stanford Center for Biomedical Informatics Research (BMIR), Stanford University
sparql.bioontology.org
Tuesday, November 13, 12
The main entry points to BioPortal data are the REST APIs.
Via REST services we cannot answer queries that require fine-grained access to the data.
SPARQL offers finer-grained data access.
Challenges, opportunities and lessons learnt with sparql.bioontology.org
Our SPARQL endpoint is different from others because our data are primarily ontologies themselves and not data about individuals.
Still, the lessons learnt apply to other domains. We have to deal with:
• performance
• scalability
• heterogeneity
• query articulation
Challenges, opportunities and lessons learnt with sparql.bioontology.org
1. Retrieval of common attributes and how simple reasoning can help.
2. Complexity of query articulation when targeting complex OWL constructions.
3. Best practices in using an open shared endpoint:
   I. Selective queries work better.
   II. Be careful with large result sets. Paginate.
   III. How the client reads the results matters.
BioPortal Data
• Ontology Content
  • OBO Format
  • Rich Release Format (RRF)
  • OWL
• Ontology Metadata
• Mapping Data
All of this data is converted to RDF, loaded into a triple store and exposed through SPARQL.
RDF - Ontology Metadata
[Diagram: a meta:VirtualOntology node (ontology/1353) connected by meta:hasVersion edges to omv:Ontology version nodes (ontology/46896, ontology/46116, ontology/42122). Version attributes: name, date, format, (....). A meta:hasDataGraph edge points to <http://bioportal.bioontology.org/ontologies/SNOMED>.]
RDF - Mappings
<http://purl.bioontology.org/mapping/2767e8e0-001b-012e-749f-005056bd0010>
    maps:has_process_info <.../procinfo/2008-04-23-38138> ;
    maps:comment "Manual mappings between Mouse anatomy and NCIT." ;
    maps:relation skos:closeMatch ;
    maps:target <http://purl.org/obo/owl/MA#MA_0001096> ;
    maps:source <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Olfactory_Nerve> ;
    maps:source_ontology_id <http://bioportal.bioontology.org/ontologies/1032> ;
    maps:target_ontology_id <http://bioportal.bioontology.org/ontologies/1000> ;
    a maps:One_To_One_Mapping .

<http://purl.bioontology.org/mapping/nonloom/procinfo/2008-04-23-38138>
    maps:date "2008-04-23T19:21:45Z"^^xsd:dateTime ;
    maps:mapping_source "Organization" ;
    maps:mapping_source_contact_info "http://www.nlm.nih.gov" ;
    maps:mapping_source_name "NLM" ;
    maps:mapping_source_site <http://www.nlm.nih.gov> ;
    maps:mapping_type "Manual" ;
    maps:submitted_by 38138 .
Noy, N.F., Griffith, N., Musen, M.A.: Collecting community-based mappings in an ontology repository. In: International Semantic Web Conference. pp. 371–386 (2008)
Common Attributes in BP Ontologies
• taxonomies (accessed through rdfs:subClassOf)
• preferred labels, synonyms and definitions (accessed through rdfs:subPropertyOf)
BP Taxonomies
Almost every ontology in BioPortal uses rdfs:subClassOf to record class hierarchies.
We offer rdfs:subClassOf reasoning to collect hierarchy closures. The reasoning is backward-chained and off by default.
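To make "hierarchy closure" concrete, here is a minimal Python sketch over a toy, in-memory set of rdfs:subClassOf edges. The class names and the function are illustrative assumptions; the sketch imitates the answer that rdfs:subClassOf reasoning would return, not how the endpoint computes it.

```python
from collections import defaultdict


def superclass_closure(edges, start):
    """Return every class reachable from `start` by following subClassOf edges."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    closure, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in closure:  # also protects against cycles
                closure.add(parent)
                stack.append(parent)
    return closure


# Hypothetical toy hierarchy given as (subclass, superclass) pairs.
EDGES = [("B", "A"), ("C", "B"), ("D", "C"), ("E", "A")]

print(sorted(superclass_closure(EDGES, "D")))
```

Without reasoning, a plain query for rdfs:subClassOf on D would only return its direct parent C.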
BP Taxonomies: 2 use cases and their challenges
• partial traversal
• hierarchies with mappings
Hierarchies and mappings
With mappings one can continue browsing a taxonomy beyond the boundaries of a single ontology.
Example: "malignant hyperthermia" in the Human Disease Ontology.
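The idea of crossing ontology boundaries can be sketched in Python as a traversal that follows both subClassOf links and mappings. The term identifiers (ontA:..., ontB:...) and the dictionaries are made up for illustration; real data would come from the endpoint's ontology and mapping graphs.

```python
def reachable_across_ontologies(start, sub_class_of, mappings):
    """Collect the terms reachable from `start` via superclass links and mappings."""
    seen, queue = {start}, [start]
    while queue:
        term = queue.pop(0)
        # A term can be left either upwards (superclass) or sideways (mapping).
        neighbours = sub_class_of.get(term, []) + mappings.get(term, [])
        for nxt in neighbours:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


# Hypothetical data: ontology A stops at ontA:Parent, but a mapping
# lets the traversal continue into ontology B.
SUB_CLASS_OF = {
    "ontA:Term": ["ontA:Parent"],
    "ontB:Parent": ["ontB:Grandparent"],
}
MAPPINGS = {"ontA:Parent": ["ontB:Parent"]}

print(sorted(reachable_across_ontologies("ontA:Term", SUB_CLASS_OF, MAPPINGS)))
```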
Partial traversal
Some applications need to traverse the hierarchy for a fixed number of steps.
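One way to express a fixed number of steps is to chain that many rdfs:subClassOf triple patterns in a single query. The helper below is a hypothetical sketch that only builds the query text; the function name and the example class URI are assumptions, and the slides do not say that the authors built their queries this way.

```python
def n_step_superclass_query(class_uri, steps):
    """Build a SPARQL query for the classes exactly `steps` hops above `class_uri`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")

    patterns = []
    previous = f"<{class_uri}>"
    for i in range(1, steps + 1):
        # Each pattern climbs one level: previous -> ?c1 -> ?c2 -> ...
        patterns.append(f"  {previous} rdfs:subClassOf ?c{i} .")
        previous = f"?c{i}"

    body = "\n".join(patterns)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT ?c{steps} WHERE {{\n{body}\n}}"
    )


print(n_step_superclass_query("http://example.org/D", 2))
```

Because every hop is an explicit pattern, this form does not rely on the endpoint's reasoning being switched on.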
BP preferred labels, synonyms and definitions
34 ontologies record preferred labels, synonyms and definitions using their own predicates.
When ontology authors upload ontologies into BioPortal, they have to choose which predicates represent these attributes.
We provide uniform access to these properties by linking the different user-defined properties to the standard SKOS properties using rdfs:subPropertyOf.
We assert these links in a graph named “globals”.
[Diagram: user-defined pref. label predicates, alt. label predicates and definition predicates are linked to the SKOS properties skos:prefLabel, skos:altLabel and skos:definition; rdfs:label also appears in the diagram.]
By including the rdfs:subPropertyOf links in “globals”, we do not need to know which property NIF-RTH uses in order to retrieve its preferred labels.
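The effect of the “globals” links can be imitated in a few lines of Python: a lookup for skos:prefLabel also accepts any predicate declared as its sub-property. The predicate names ex:preferredName and ex:synonym and the sample triples are invented for illustration, and the sketch resolves only one level of rdfs:subPropertyOf.

```python
SKOS_PREF_LABEL = "skos:prefLabel"


def preferred_labels(triples, globals_links, subject):
    """Return values of skos:prefLabel or of any predicate declared as its sub-property."""
    label_predicates = {SKOS_PREF_LABEL}
    label_predicates.update(
        sub for sub, sup in globals_links if sup == SKOS_PREF_LABEL
    )
    return [o for s, p, o in triples if s == subject and p in label_predicates]


# Hypothetical "globals" graph: (sub-property, super-property) pairs.
GLOBALS = [
    ("ex:preferredName", "skos:prefLabel"),
    ("ex:synonym", "skos:altLabel"),
]

# Hypothetical ontology content that uses its own predicates.
TRIPLES = [
    ("ex:c1", "ex:preferredName", "Neuron"),
    ("ex:c1", "ex:synonym", "Nerve cell"),
]

print(preferred_labels(TRIPLES, GLOBALS, "ex:c1"))
```

The caller asks only for skos:prefLabel and never needs to know that this ontology calls it ex:preferredName.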
Complex Query Articulation
Functional Syntax:
EquivalentClasses( :x ObjectUnionOf( :Class0 :Class1 :Class2 ) )
RDF Turtle Serialization:
:x owl:equivalentClass [ a owl:Class ; owl:unionOf ( :Class0 :Class1 :Class2 ) ] .
RDF Model Representation:
[Diagram: x is linked by owl:equivalentClass to the blank node Anon0, which has rdf:type owl:Class. Anon0 is linked by owl:unionOf to Anon1. The list nodes Anon1, Anon2 and Anon3 point via rdf:first to Class0, Class1 and Class2 respectively and are chained by rdf:rest, ending in rdf:nil.]
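The reason such queries are hard to articulate is that a client must walk the rdf:first / rdf:rest chain to recover the members of the union. A minimal Python sketch over an in-memory copy of the triples from the diagram follows; the blank-node labels and function names are illustrative assumptions.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# The triples from the diagram, with blank nodes written as _:anonN.
TRIPLES = [
    (":x", "owl:equivalentClass", "_:anon0"),
    ("_:anon0", "rdf:type", "owl:Class"),
    ("_:anon0", "owl:unionOf", "_:anon1"),
    ("_:anon1", RDF_FIRST, ":Class0"),
    ("_:anon1", RDF_REST, "_:anon2"),
    ("_:anon2", RDF_FIRST, ":Class1"),
    ("_:anon2", RDF_REST, "_:anon3"),
    ("_:anon3", RDF_FIRST, ":Class2"),
    ("_:anon3", RDF_REST, RDF_NIL),
]


def list_members(index, head):
    """Walk an RDF collection from `head` until rdf:nil, collecting rdf:first values."""
    members, node = [], head
    while node != RDF_NIL:
        members.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return members


def union_members(triples, cls):
    """Return the classes in the owl:unionOf list that `cls` is equivalent to."""
    # Assumes one object per (subject, predicate), which holds for this toy graph.
    index = {(s, p): o for s, p, o in triples}
    anonymous_class = index[(cls, "owl:equivalentClass")]
    return list_members(index, index[(anonymous_class, "owl:unionOf")])


print(union_members(TRIPLES, ":x"))
```

One line of functional syntax becomes nine triples and a loop, which is the gap a SPARQL query over OWL content has to bridge.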
obo:VO_0000001 a owl:Class ;
    rdfs:label "vaccine" ;
    rdfs:seeAlso "MeSH: D014612" ;
    obo:IAO_0000115 "A vaccine is a processed (...) " ;
    obo:IAO_0000116 "Many vaccines are developed (...) " ;
    obo:IAO_0000117 "YH, BP, BS, MC, LC, XZ, RS" ;
    rdfs:subClassOf obo:OBI_0000047 ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            obo:OBI_0000047
            [ a owl:Restriction ;
              owl:onProperty obo:BFO_0000085 ;
              owl:someValuesFrom [
                  a owl:Class ;
                  owl:intersectionOf (
                      obo:VO_0000278
                      [ a owl:Restriction ;
                        owl:onProperty obo:BFO_0000054 ;
                        owl:someValuesFrom obo:VO_0000494 ]
                  )
              ]
            ]
            [ a owl:Restriction ;
              owl:onProperty obo:OBI_0000312 ;
              owl:someValuesFrom obo:VO_0000590 ]
        )
    ] .
Example of a relatively complex OWL construction from the Vaccine Ontology
Best practices in using a shared SPARQL endpoint
• selective queries work better
• control size of result sets - pagination
• how clients read matters
selective queries work better
[Slide annotation: for each ?p]
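One possible reading of the "for each ?p" annotation is to replace a single pattern with an unbound predicate by one bounded query per known predicate. The Python sketch below only builds the query strings. The helper name, the choice of predicates and the LIMIT value are assumptions, and the graph URI is reused from the metadata slide purely as an example.

```python
def per_predicate_queries(graph_uri, predicates, limit=100):
    """Build one selective query per predicate instead of one query with an open ?p."""
    queries = []
    for predicate in predicates:
        queries.append(
            f"SELECT ?s ?o FROM <{graph_uri}> "
            f"WHERE {{ ?s <{predicate}> ?o }} LIMIT {limit}"
        )
    return queries


# Example inputs; the predicates chosen here are illustrative.
GRAPH = "http://bioportal.bioontology.org/ontologies/SNOMED"
PREDICATES = [
    "http://www.w3.org/2000/01/rdf-schema#subClassOf",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
]

for query in per_predicate_queries(GRAPH, PREDICATES):
    print(query)
```

Each generated query names its graph and its predicate, so the store can use its indexes instead of scanning every triple.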
control size of result sets - pagination
Keep requesting pages while len(results) == LIMIT, advancing OFFSET += LIMIT after each page.
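The pagination loop can be written out as a small Python function. `run_query` is a hypothetical caller-supplied function that sends one SPARQL query and returns a list of rows; it stands in for whatever client library is in use. The base query should carry an ORDER BY so that pages are stable between requests.

```python
def fetch_all(run_query, base_query, limit=1000):
    """Fetch a full result set page by page using LIMIT and OFFSET."""
    offset = 0
    rows = []
    while True:
        page = run_query(f"{base_query} LIMIT {limit} OFFSET {offset}")
        rows.extend(page)
        if len(page) < limit:  # a short (or empty) page means we reached the end
            break
        offset += limit
    return rows


def fake_endpoint(query, data=list(range(25))):
    """Stand-in for a real endpoint: slices `data` according to LIMIT and OFFSET."""
    parts = query.split()
    limit = int(parts[parts.index("LIMIT") + 1])
    offset = int(parts[parts.index("OFFSET") + 1])
    return data[offset:offset + limit]


rows = fetch_all(fake_endpoint, "SELECT ?s WHERE { ?s ?p ?o } ORDER BY ?s", limit=10)
print(len(rows))
```

Small pages keep each request cheap for a shared endpoint, at the cost of more round trips.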
how clients read matters
Use libraries that parse the result set on demand
[Charts: retrieval of all preferred labels from NCBI Taxonomy (500K solutions). First chart: output size in MB for XML and JSON (axis 0 to 130). Second chart: parsing time in seconds for JSON+Python, JSON+CJSON, XML+Jena ARQ and XML+Sesame (axis 0 to 60).]
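As an illustration of parsing on demand, the following Python sketch reads a SPARQL XML result set incrementally with the standard library's `iterparse`, yielding one solution at a time instead of building the whole document in memory. The sample document and the variable name `label` are made up; this is not one of the client stacks measured on the slide.

```python
import io
import xml.etree.ElementTree as ET

SPARQL_NS = "{http://www.w3.org/2005/sparql-results#}"

SAMPLE = b"""<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="label"/></head>
  <results>
    <result><binding name="label"><literal>Bacteria</literal></binding></result>
    <result><binding name="label"><literal>Archaea</literal></binding></result>
  </results>
</sparql>"""


def iter_bindings(stream, variable):
    """Yield the value bound to `variable` in each <result>, one result at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == SPARQL_NS + "result":
            for binding in elem.findall(SPARQL_NS + "binding"):
                if binding.get("name") == variable:
                    # The single child is a <uri>, <literal> or <bnode> element.
                    yield binding[0].text
            # Release the children of the result we have just processed.
            elem.clear()


for label in iter_bindings(io.BytesIO(SAMPLE), "label"):
    print(label)
```

With a real endpoint, `stream` would be the HTTP response body, so processing can start before the download finishes.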
Using SPARQL to Query BioPortal Ontologies and Metadata
Manuel Salvadores, Matthew Horridge, Paul R. Alexander, Ray W. Fergerson, Mark A. Musen, and Natalya F. Noy
Stanford Center for Biomedical Informatics Research, Stanford University, US
{manuelso,matthew.horridge,palexander,ray.fergerson,musen,noy}@stanford.edu
Abstract. BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other languages, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF based serializations of all these ontologies and their metadata at sparql.bioontology.org. This dataset contains 203M triples, representing both content and metadata for the 300+ ontologies; and 9M mappings between terms. This endpoint can be queried with SPARQL which opens new usage scenarios for the biomedical domain. This paper presents lessons learned from having redesigned several applications that today use this SPARQL endpoint to consume ontological data.
Keywords: Ontologies, SPARQL, RDF, Biomedical, Linked Data
1 SPARQL In Use In BioPortal: Overview of Opportunities and Challenges
Ontology repositories act as a gateway for users who need to find ontologies for their applications. Ontology developers submit their ontologies to these repositories in order to promote their vocabularies and to encourage inter-operation. In biomedicine, cultural heritage, and other domains, many of the ontologies and vocabularies are extremely large, with tens of thousands of classes.
In our laboratory, we have developed BioPortal, a community-based ontology repository for biomedical ontologies [11]. Users can publish their ontologies to BioPortal, submit new versions, browse the ontologies, and access the ontologies and their components through a set of REST services. BioPortal provides search across all ontologies in its collection, a repository of automatically and manually generated mappings between classes in different ontologies, ontology reviews, new term requests, and discussions generated by the ontology users in the community. BioPortal contains metadata about each ontology and its versions as well as mappings between terms in different ontologies.
Undefined 1 (2009) 1–5, IOS Press
BioPortal as a Dataset of Linked Biomedical Ontologies and Terminologies in RDF
Manuel Salvadores, Paul R. Alexander, Mark A. Musen and Natalya F. Noy
Stanford Center for Biomedical Informatics Research, Stanford University, US
E-mail: {manuelso, palexander, musen, noy}@stanford.edu
Abstract. BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other formats, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF version of all these ontologies at http://sparql.bioontology.org. This dataset contains 190M triples, representing both metadata and content for the 300 ontologies. We use the metadata that the ontology authors provide and simple RDFS reasoning in order to provide dataset users with uniform access to key properties of the ontologies, such as lexical properties for the class names and provenance data. The dataset also contains 9.8M cross-ontology mappings of different types, generated both manually and automatically, which come with their own metadata.
Keywords: biomedical ontologies, BioPortal, RDF, linked data
1. Introduction
In our laboratory, we have developed BioPortal, a community-based ontology repository for biomedical ontologies [20,1]. Users can publish their ontologies to BioPortal, submit new versions, browse the ontologies, and access the ontologies and their components through a set of REST services, SPARQL and de-referenceable URIs.
Over the past four years, as BioPortal grew in popularity, research institutions and corporations have used our REST APIs extensively. The use of the REST services has experienced outstanding growth in 2011. The average number of hits per month grew from 3M hits in 2010 to 122M hits in 2011. Our users have incorporated these services in applications that perform drug surveillance, gene annotation, enrichment and classification of scientific literature, and other tasks. In December 2011, we released a public SPARQL endpoint, http://sparql.bioontology.org, to provide direct access to our datasets in RDF. We had numerous requests from users for the SPARQL endpoint, which would enable them to query and analyze the data in much more precise and application-specific ways than our set of REST APIs allowed.
*Corresponding author. E-mail: manuelso@stanford.edu.
This paper describes the Linked Data aspects of the BioPortal’s ecosystem and the structure of our linked datasets in RDF. In addition, we describe the process that we used to transform different ontology formats into RDF and the mappings between ontologies. We describe several issues with using the shared SPARQL endpoint elsewhere [10]. This discussion includes the details on retrieving common attributes from multiple ontologies, articulating complex queries, and the lessons that we have learned on the best practices of using a shared SPARQL endpoint.
2. Biomedical Ontologies in BioPortal
Researchers and practitioners in the Semantic Web normally deal with two types of data: (1) ontologies, vocabularies or TBoxes; and (2) instance data or simply data. It is important to clarify that BioPortal’s content is almost exclusively ontologies and related artifacts. By contrast, most other datasets of the Linked
0000-0000/09/$00.00 © 2009 – IOS Press and the authors. All rights reserved
Conclusions
• Our use of SPARQL is different from many other use cases because our data are primarily ontologies themselves and not data about individuals.
• SPARQL and a small amount of reasoning can be particularly powerful in providing easy access to common attributes.
• Exposing OWL through a SPARQL endpoint poses a number of challenges.
• There are challenges in running a shared open SPARQL endpoint. We can overcome these challenges if we encourage developers to conform to a set of simple best practices.
Thank you
Questions