Document Navigation: Ontologies or Knowledge Organisation Systems
Simon Jupp
Bio-Health Informatics Group
University of Manchester, UK
Bio-ontologies SIG, ISMB 2007 Vienna, Austria
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases
Document navigation over a Semantic Web
Introduction
Ontologies are being developed as background knowledge to drive the Semantic Web.
Message: Formal ontologies are not the only knowledge artefact needed; artefacts with weaker semantics have their role and are the best solution in some circumstances.
COHSE (Conceptual Open Hypermedia SErvice)
Navigation via hypertext is a mainstay of the WWW
Problem: Links are typically embedded in Web pages: hard-coding, format restrictions, ownership, legacy resources, maintenance, unary targets etc.
COHSE Architecture
[Diagram. Labelled components: HTML document in; DLS Agent; Knowledge Service; Resource Service; Ontology; SKOS; Search Engine; Annotation DB; linked HTML document out.]
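The idea of adding links to a document from outside, rather than hard-coding them in the page, can be sketched roughly as follows. This is an illustrative sketch only, not COHSE's actual implementation: the function name `link_terms`, the lexicon layout (lower-case label mapped to a list of target URLs) and the `data-targets` attribute are all assumptions made for the example.

```python
import re
from html import escape


def link_terms(text, lexicon):
    """Return an HTML fragment in which every known term is wrapped in a link.

    `lexicon` is a hypothetical mapping: lower-case label -> list of target URLs.
    A term may have several targets (links need not be unary); this sketch puts
    the first one in href and records how many candidates exist.
    """
    if not lexicon:
        return escape(text)
    # Longest labels first, so multi-word terms win over their substrings.
    labels = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        targets = lexicon[match.group(1).lower()]
        parts.append(
            '<a href="%s" data-targets="%d">%s</a>'
            % (escape(targets[0], quote=True), len(targets), escape(match.group(1)))
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)
```

For instance, `link_terms("Tuberculosis is treatable.", {"tuberculosis": ["https://example.org/tb"]})` wraps the first word in an anchor and leaves the rest of the sentence untouched. The page itself never has to carry the link.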
Sealife project
www.biotec.tu-dresden.de/sealife
A realisation of a Semantic Grid Browser, which links the current Web to the emerging eScience infrastructure
Built on existing tools developed by partners:
COHSE, but also GoPubMed and others.
Applications: from cells via tissue to patients
Evidence-based medicine
Patent and literature mining
Molecular biology
What is a Sealife browser?
A Sealife browser is a browser that identifies concepts in texts and offers appropriate services based on these concepts.
Three main components:
1) an underlying “ontology” or set of “ontologies”
2) a text mining component
3) a service composition module
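The three components can be pictured with a toy pipeline. This is a sketch under stated assumptions, not the Sealife code: the vocabulary entries, the `Service` class, the `example.org` URL templates and both function names are hypothetical, and the "text mining" step here is only a token lookup.

```python
import re
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Service:
    name: str
    url_template: str  # hypothetical; "{q}" is replaced by the concept label


# 1) Background knowledge: label -> concept (toy stand-in for an "ontology").
VOCABULARY = {
    "tuberculosis": "Tuberculosis",
    "tb": "Tuberculosis",
    "poliovirus": "Poliovirus",
}

# Hypothetical services that could be offered for a concept.
SERVICES = [
    Service("Literature search", "https://example.org/literature?q={q}"),
    Service("Drug formulary", "https://example.org/formulary?q={q}"),
]


def find_concepts(text, vocabulary):
    """2) Text mining (very simplified): look up each token in the vocabulary."""
    found = []
    for token in re.findall(r"[a-z]+", text.lower()):
        concept = vocabulary.get(token)
        if concept is not None and concept not in found:
            found.append(concept)
    return found


def compose_services(concepts, services):
    """3) Service composition: build one service link per concept and service."""
    return {
        concept: [(s.name, s.url_template.format(q=quote(concept))) for s in services]
        for concept in concepts
    }
```

Calling `compose_services(find_concepts(page_text, VOCABULARY), SERVICES)` would give, for each concept recognised in the page, the list of services a browser could offer next to it.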
Sealife use case - study of disease
NeLI: National electronic Library of Infection portal. Range of users but few links.
User Group | Question | Targets
Family Doctor (GP) | Tuberculosis drugs and side effects? | British National Formulary (BNF)
Clinicians | Tuberculosis treatments guidelines? | Public Health Observatories (PHO)
Molecular Biologists | Drug resistant tuberculosis species? | PubMed
General Public | What is tuberculosis? | Health Protection Agency (HPA) or the NHS direct online website.
Given a document about Tuberculosis, where would users want to navigate to next?
http://www.neli.org.uk
Background Knowledge
What style of knowledge model is best suited for navigation?
Do we need strict semantics? Are they a help or a hindrance?
All I want to say is this concept “has something to do with” this concept - this doesn’t fit well with formal ontology
e.g. Polio has something to do with polio virus / polio disease / polio treatment - get me documents about all of them!
Sealife background knowledge
To cover molecular biology through to medicine we need a large knowledge artefact to serve as background knowledge for SeaLife. This artefact must support sensible navigation between documents on the web.
Luckily…
[Figure: existing ontologies grouped by area]
Sequence: Sequence types and features; Genetic context
Gene products: Molecule role; Molecular function; Biological process; Cellular component
Proteins: Protein covalent bond; Protein domain; UniProt taxonomy
Pathways: Pathway ontology; Event (INOH pathway ontology); Systems Biology; Protein-protein interaction
Development: Arabidopsis development; Cereal plant development; Plant growth and developmental stage; C. elegans development; Drosophila development; Human developmental anatomy, abstract version; Human developmental anatomy, timed version
Anatomy: Mosquito gross anatomy; Mouse adult gross anatomy; Mouse gross anatomy and development; C. elegans gross anatomy; Arabidopsis gross anatomy; Cereal plant gross anatomy; Drosophila gross anatomy; Dictyostelium discoideum anatomy; Fungal gross anatomy (FAO); Plant structure; Maize gross anatomy; Medaka fish anatomy and development; Zebrafish anatomy and development
Phenotype: NCI Thesaurus; Mouse pathology; Human disease; Cereal plant trait; PATO (attribute and value); Mammalian phenotype; Habronattus courtship; Loggerhead nesting; Animal natural history and life history
Other labels in the figure: Transcript; Cell type; BRENDA tissue / enzyme source; Plasmodium life cycle; eVOC (Expressed Sequence Annotation for Humans)
History of Medical Vocabularies
[Figure: timeline from 1603 to 2005. Vocabularies shown: Synopsis Nosologiae Methodicae, ICD, ICD9, OPCS, OPCS3, OPCS4, OPCS4.3, SNOP, CPT, MeSH, EmTree, ICPC, READ, CTV3, SNOMED-2, SNOMED International, SNOMED-RT, SNOMED-CT, GALEN, DM&D, UMLS, FMA.]
What do we need for navigation?
The bio-medical domain is rich in vocabularies and ontologies.
A large lexical resource, including textual definitions and synonyms.
There is a varying degree of semantics, expressivity and formality in these vocabularies (e.g. MeSH) and ontologies (e.g. FMA).
Most include some form of hierarchy.
Hierarchies are well suited for driving navigation.
Question: Do we want strict sub/super class relationships? Or do we want looser notions such as broader/narrower?
Ontology or Vocabulary?
Initial approach to COHSE and SeaLife was to represent everything in OWL.
The strict semantics of OWL do not always lend themselves to sensible navigation, and conversion from vocabularies to OWL is difficult.
It’s hard to model some things in OWL…
MeSH:
Head <-- Ear
Head <-- Nose
Accident <-- Traffic Accident
Accident <-- Accident Prevention
OBO/OWL:
Nucleus part_of Cell
Cell has_part Nucleus - Not always true
PolioDisease causedby PolioVirus
SKOS (Simple Knowledge Organisation System)
Purpose: Subject Metadata and information retrieval
RDF/XML representation
Model ideally suited for our task
e.g. This document is about tuberculosis
e.g. prefLabel, altLabel, broader, narrower, related.
http://www.w3.org/2004/02/skos/
Model for representing concept schemes, thesauri, classification systems, taxonomies etc…
A SKOS concept is an individual of the class skos:Concept; relationships are just asserted facts between concepts (not restrictions!)
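To make the "asserted facts between individuals" point concrete, here is a minimal in-memory sketch of a SKOS-like concept scheme. It is not an RDF library and not the COHSE data model: the class `ConceptScheme`, its methods and the example identifiers are assumptions for illustration. The only SKOS semantics it relies on are that broader and narrower are inverses and that related is symmetric.

```python
class ConceptScheme:
    """Toy concept scheme: labels plus plain asserted facts between concepts."""

    # skos:broader / skos:narrower are inverses; skos:related is symmetric.
    INVERSE = {"broader": "narrower", "narrower": "broader", "related": "related"}

    def __init__(self):
        self.pref_label = {}   # concept id -> preferred label
        self.alt_labels = {}   # concept id -> list of alternative labels
        self.facts = set()     # (subject, property, object) triples

    def add_concept(self, concept_id, pref_label, alt_labels=()):
        self.pref_label[concept_id] = pref_label
        self.alt_labels[concept_id] = list(alt_labels)

    def relate(self, subject, prop, obj):
        """Assert a fact and its inverse; no class restrictions are involved."""
        self.facts.add((subject, prop, obj))
        self.facts.add((obj, self.INVERSE[prop], subject))

    def neighbours(self, concept_id):
        """Everything that 'has something to do with' the given concept."""
        return sorted({o for s, _, o in self.facts if s == concept_id})


scheme = ConceptScheme()
scheme.add_concept("PolioVirus", "Poliovirus", ["Brunhilde Virus"])
scheme.add_concept("PolioDisease", "Polio disease")
scheme.add_concept("PolioTreatment", "Polio treatment")
scheme.add_concept("Enterovirus", "Enterovirus")
scheme.relate("PolioDisease", "related", "PolioVirus")
scheme.relate("PolioDisease", "related", "PolioTreatment")
# "PolioVirus broader Enterovirus" reads: Enterovirus is the broader concept.
scheme.relate("PolioVirus", "broader", "Enterovirus")
```

With these facts, `scheme.neighbours("PolioDisease")` returns the virus and the treatment without the model having to say what kind of thing each one is, which is all that navigation needs.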
Conversion to SKOS
• relationship:part_of --> skos:broader (e.g. finger part_of hand)
• relationship:contains --> skos:narrower (e.g. skull contains brain)
• relationship:causes --> skos:related (e.g. PolioVirus causes PolioDisease)
Sub properties:
obo:part_of is a sub-property of skos:broader
obo:has_part is a sub-property of skos:narrower
skos:broader is the inverse of skos:narrower
Leaves us open to migration back to OWL
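The mapping above can be sketched as a small lookup applied to relationship triples. This is an illustrative sketch, not the conversion tool used in the project: the table name `OBO_TO_SKOS`, the function `to_skos` and the triple layout are assumptions. Keeping the original relationship name next to the SKOS property is one simple way to stay open to migrating back to OWL.

```python
# Mapping taken from the slide; has_part follows the sub-property note.
OBO_TO_SKOS = {
    "part_of": "skos:broader",
    "has_part": "skos:narrower",
    "contains": "skos:narrower",
    "causes": "skos:related",
}


def to_skos(triples):
    """Convert (subject, relationship, object) triples to SKOS-style facts.

    Each result keeps the original relationship as a fourth element so the
    stricter source semantics are not lost. Relationships without a mapping
    are skipped in this sketch.
    """
    converted = []
    for subject, relationship, obj in triples:
        skos_property = OBO_TO_SKOS.get(relationship)
        if skos_property is None:
            continue
        converted.append((subject, skos_property, obj, relationship))
    return converted
```

For example, `to_skos([("finger", "part_of", "hand")])` gives `[("finger", "skos:broader", "hand", "part_of")]`.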
Conversion to SKOS
[Diagram] Sources converted to SKOS: MeSH, OBO ontologies, OWL ontologies, SNOMED, NeLI, taxonomies, concept schemas, thesauri, other..
Advantage of this approach
For a given concept e.g. “Polio Virus”, we can query multiple resources and bring related concepts together.
Source | Terms found | SKOS relation to “Poliovirus”
MeSH | Brunhilde Virus | skos:altTerm
Disease Ontology | Spinal cord disease | skos:broaderThan
Disease Ontology | Postpoliomyelitis Syndrome | skos:narrowerThan
SNOMED | Microorganism | skos:broaderThan
SNOMED | Enterovirus | skos:broaderThan
• Rapid (and cheap!) generation of knowledge artefact
• Take advantage of efforts in multiple biomedical communities
• We don’t have to make any strong ontological distinctions
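Querying several converted resources for one concept and bringing the results together can be sketched as below. The data is the Poliovirus example from this slide, with the relation names written exactly as they appear there; the dictionary layout and the function `related_terms` are assumptions made for the sketch.

```python
# Per-source index: concept label -> list of (term found, relation to the concept).
SOURCES = {
    "MeSH": {
        "Poliovirus": [("Brunhilde Virus", "skos:altTerm")],
    },
    "Disease Ontology": {
        "Poliovirus": [
            ("Spinal cord disease", "skos:broaderThan"),
            ("Postpoliomyelitis Syndrome", "skos:narrowerThan"),
        ],
    },
    "SNOMED": {
        "Poliovirus": [
            ("Microorganism", "skos:broaderThan"),
            ("Enterovirus", "skos:broaderThan"),
        ],
    },
}


def related_terms(concept, sources):
    """Merge what every source says about `concept`, grouped by relation.

    Each entry records which source contributed the term, so overlapping
    or redundant terms from different vocabularies remain visible.
    """
    merged = {}
    for source_name, index in sources.items():
        for term, relation in index.get(concept, []):
            merged.setdefault(relation, []).append((term, source_name))
    return merged
```

`related_terms("Poliovirus", SOURCES)` returns one list per relation, so a navigation tool could offer the broader, narrower and alternative terms side by side regardless of which vocabulary they came from.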
Disadvantage of this approach
Trade off: Lose the inferential power when querying a knowledge resource
Inability to do inconsistency checking
Potentially large redundancy in our knowledge base
Maintenance and scalability (>1000000 concepts) - especially for dynamic hyper-linking.
Unwanted concepts & relationships - especially from OWL conversion e.g. ‘Physical Entity’, ‘Continuant’ etc….
Linking overload!
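One simple way to limit unwanted concepts, and with them some of the link overload, is to drop a stop-list of upper-level concepts after conversion. The sketch below assumes such a stop-list exists; the set `UPPER_LEVEL`, the function `prune` and the triple layout are hypothetical, and the slides do not say which approach was actually taken.

```python
# Hypothetical stop-list of upper-level concepts that are useless as link anchors.
UPPER_LEVEL = {"physical entity", "continuant"}


def prune(concepts, facts, unwanted=UPPER_LEVEL):
    """Remove unwanted concepts and every fact that mentions one of them.

    `concepts` is a list of labels; `facts` is a list of
    (subject, property, object) triples over those labels.
    """
    kept = [c for c in concepts if c.lower() not in unwanted]
    kept_set = set(kept)
    kept_facts = [
        (s, p, o) for s, p, o in facts if s in kept_set and o in kept_set
    ]
    return kept, kept_facts
```

A stop-list like this has to be maintained by hand, so it addresses the symptom rather than the scalability issue noted above.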
Plug for Manchester’s SKOS plug-in - Protégé 4
• Instance hierarchy viewer
• OBO or OWL --> SKOS wizards
• Various rendering options
Conclusion
We need a large knowledge artefact that supports navigation between related web resources
Rapid generation and reuse of existing terminologies (cheap)
Loosening the semantics of our model enables this with an acceptable trade-off
SKOS is a suitable model to represent our knowledge artefact
We have strong use cases from the life sciences - demos available
Still some open issues ….
Acknowledgments
Manchester
Robert Stevens
Sean Bechhofer
Yeliz Yesilada
NeLI
Patty Kostkova
Other
COHSE developers
Sealife project
Thank you.