Cell Behavior Ontology Workshop · 5/4/2009 · Building Ontologies Best practices, pitfalls and...
Transcript of Cell Behavior Ontology Workshop · 5/4/2009 · Building Ontologies Best practices, pitfalls and...
Building OntologiesBest practices, pitfalls and positives
Olivier Bodenreider
Lister Hill National Centerfor Biomedical Communications
Bethesda, Maryland - USA
Cell Behavior Ontology Workshop National Institutes of Health Campus, Bethesda, MD
May 4, 2009
Lister Hill National Center for Biomedical Communications 2
Outline
If ontology is the solution, what is the problem?Think use cases
Don’t try this at home!Ontologies for dummies hasn’t been written yet
Where to start?
If ontology is the solution,what is the problem?
Lister Hill National Center for Biomedical Communications 4
Uses of biomedical ontologies
Knowledge managementAnnotating data and resourcesAccessing biomedical informationMapping across biomedical ontologies
Data integration, exchange and semantic interoperabilityDecision support
Data selection and aggregationDecision supportNLP applicationsKnowledge discovery
[Bodenreider, YBMI 2008]
Lister Hill National Center for Biomedical Communications 5
Properties of biomedical ontologies
Knowledge managementAnnotating data and resourcesAccessing biomedical informationMapping across biomedical ontologies
Data integration, exchange and semantic interoperabilityDecision support
Data selection and aggregationDecision supportNLP applicationsKnowledge discovery
Controlledvocabularies
Ontologicalresources
Lister Hill National Center for Biomedical Communications 6
Ontology “spectrum”
http://www.mathiswebs.com/ontology.htm
Lister Hill National Center for Biomedical Communications 7
Cell movement in MeSH
http://www.nlm.nih.gov/mesh/2009/mesh_browser/MBrowser.html
Controlled vocabulary
Lister Hill National Center for Biomedical Communications 8
Cell movement in MeSH
Uncharacterizedhierarchicalrelations(knowledgeorganizationin a thesaurus)
Lister Hill National Center for Biomedical Communications 9
Cell movement in GO
http://amigo.geneontology.org/cgi-bin/amigo/search.cgi
Lister Hill National Center for Biomedical Communications 10
Cell movement in GO
isa and part_of relations
Lister Hill National Center for Biomedical Communications 11
Thesaurus vs. Ontology
Define use casesTypical situations in which the resource to be created is expected to contribute to the solution
Resource annotation (controlled vocabulary)– Lexical aspects (e.g., synonyms, variants) are important
Resource classification/organization (thesaurus)Inference based on the attributes of biological entities (ontology)
Competency questionsMinimal ontological commitment(just enough information for the task at hand)
Don’t try this at home!
Lister Hill National Center for Biomedical Communications 13
Ontologies vs. cars
What is the difference betweenan ontology and a car?
Lister Hill National Center for Biomedical Communications 14
Dependent continuants, existential quantification and rigidity
BFO, presented in http://www.cs.ubc.ca/~poole/aibook/html/ArtInt_313.html
Formal ontology (philosophy)
Lister Hill National Center for Biomedical Communications 15
Dependent continuants, existential quantification and rigidity
[Suntisrivaraporn, AIME 2007]
Knowledge representation
Lister Hill National Center for Biomedical Communications 16
Dependent continuants, existential quantification and rigidity
OntoClean, http://www.loa-cnr.it/Papers/GuarinoWeltyOntoCleanv3.pdf
Formalontology(philosophy)
Lister Hill National Center for Biomedical Communications 17
Instances, classes, collectives
“Lmo-2 interacts with Elf-2” One individual Lmo-2 molecule interacts with one individual Elf-2 molecule.A collective of Lmo-2 molecules interacts with one individual Elf-2 molecule.One individual Lmo-2 molecule interacts with a collective of Elf-2 molecules.A collective of Lmo-2 molecules interacts with a collective of Elf-2 molecules.[…]
[Schulz and Landsen, Applied ontology 2009]
Lister Hill National Center for Biomedical Communications 18
Top-level ontologies
Basic Formal Ontology (BFO)DOLCEBioTop
Ground domain ontologies into sound philosophical foundationsDifficult to understand for “folks form the trenches”
Lister Hill National Center for Biomedical Communications 19
A pyramid of ontologies
BioTop, http://www.imbi.uni-freiburg.de/biotop/
Lister Hill National Center for Biomedical Communications 20
Power tools for ontologies
Ontology editorsProtégéOBO-Edit
Semantic wikisIntermediate representationsCollaborative development
Too complexfor mostbiologists
Simplifiedrepresentations
Where to start?
Lister Hill National Center for Biomedical Communications 22
Collect entities from the universe of discourse
From expertsWorkshops
From textual corporaManuallyAutomatic term extraction
From existing terminologies and ontologies
Lister Hill National Center for Biomedical Communications 23
Shop around Ontology repositories
http://bioportal.bioontology.org/
Lister Hill National Center for Biomedical Communications 24
Shop around Ontology repositories
UMLS Semantic Navigator http://umlsks.nlm.nih.gov/
Lister Hill National Center for Biomedical Communications 25
Link to/Borrow from existing ontologies
AdvantagesAvoid reinventing the wheelBenefit from the experience of specialists of a given subdomain
DisadvantagesBorrow ontological commitment from these ontologiesMight align (or not) with the ontological commitment in your ontology
Lister Hill National Center for Biomedical Communications 26
Decide on standards and tools
With the help of experienced ontologists
For knowledge representatione.g., OWL
For editing ontologiese.g., Protégé
For ontological commitmente.g., top-level ontology, relation ontology
Lister Hill National Center for Biomedical Communications 27
Guidelines for ontology development
OBO Foundry principlesOpennessCommon shared syntax (OBO or OWL)Unique identifier space within the OBO FoundryVersioning mechanismClearly specified and clearly delineated contentTextual definitions for all termsUse relations from the OBO Relation OntologyThe ontology is well documentedPlurality of independent usersCollaborative development with other OBO Foundry members [Smith et al., Nature Biotechnology 2007]
http://www.obofoundry.org/crit.shtml
Lister Hill National Center for Biomedical Communications 28
Learn from others Three recent projects
BiomedGTNational Cancer InstituteSemantic wiki approachhttp://biomedgt.nci.nih.gov/wiki/index.php/Main_Page
Infectious Disease OntologyOBO Foundry approachhttp://www.infectiousdiseaseontology.org/Home.html
International Classification of Diseases (ICD11 revisions)Semantic wiki approach + Protégé backgroundhttp://www.who.int/classifications/icd/ICDRevision/en/index.html
Neuroscience Information Frameworkhttp://neuinfo.org/
Take home points
Lister Hill National Center for Biomedical Communications 30
Take home points
Start by defining use cases, not ontologiesDefine and measure success
Let the biologists be biologistsSeek the assistance of ontologists when dealing with
Top-level ontologyFormalismComplex ontology editors
Follow experience/guidelines, not gurusNIF, BiomedGT, ICD11 revisionsOntology Foundry
Think prospectivelyMaintenanceFunding (beyond short term)
MedicalOntologyResearch
Olivier Bodenreider
Lister Hill National Centerfor Biomedical CommunicationsBethesda, Maryland - USA
Contact:Web:
Lister Hill National Center for Biomedical Communications 32
Additional references
Bodenreider O. Biomedical ontologies in practiceShort course at the University of Utah, Department of Biomedical Informatics, Salt Lake City, Utah, June 9-11, 2008.http://mor.nlm.nih.gov/pubs/pres/080609-11_bioontologies_in_practice.pdfSmith B. Introduction to Biomedical Ontologies – A training course in eight lectures (video)http://ontology.buffalo.edu/smith/Ontology_Course.html
International Conference on Biomedical OntologyUniversity at Buffalo, NY · July 24-26, 2009http://icbo.buffalo.edu/