Ontologies for Semantic Normalization of Immunological Data
Yannick Pouliot (1), Atul J. Butte (1,2)
1. Division of Systems Medicine, Stanford University School of Medicine
2. Center for Pediatric Bioinformatics, Lucile Packard Children’s Hospital, Palo Alto, California
Results & Discussion
Acknowledgments
NIAID, Hewlett Packard Foundation, Butte Lab
Fifty-seven ontologies within these domains were then screened for preliminary suitability to HIPC data/metadata according to four broad criteria, each with its own parameters:

Design
• Must be integrative (ability to draw on similar/identical concepts from other ontologies)
• Must minimize overlap with other ontologies, consistent with providing terms that can be inter-related across ontologies
• Should be “relatable” to clinical applications
• Must be applicable to humans, and perhaps one animal model
• Must be an ontology, not just a controlled vocabulary (with a few exceptions)

Developmental state
• Evidence of ongoing development and maintenance
• Developed with, or accepted by, standards or professional organizations
• Adheres to standards such as the Basic Formal Ontology

Usage
• Must be released (post beta)
• Reasonably widely adopted

Content
• Must exhibit a good balance between expressiveness and usability/understandability:
    Conceptual clarity (e.g., no ambiguous classifications for dual-use organs such as reproductive/urinary organs)
    Limited redundancy of synonymous concepts
    Usable definitions of concepts describing HIPC data or metadata, including experiment design
• Completeness (frequency of missing concepts)
• Correctness (how accurately a concept is expressed)
Ontologies provided by BioPortal [1] were first selected based on their domain of application:

Domain | Description | Example ontology
Analysis | Process of data analysis | Ontology of Data Mining
Anatomy | Anatomical structures at all levels of resolution except molecular | Cell Ontology
Disease | Disease states manifested by organisms at anatomical, spatial, temporal and functional levels | Infectious Disease Ontology (IDO)
Experimental conditions | Conditions/specifications associated with a scientific or clinical protocol | Ontology of Clinical Research (OCRe)
Modeling | Process/properties/data types of modeling, computational or otherwise | Interaction Network Ontology (INO)
Molecule | Aspects of biomolecules: structure, sequence, function | ONTIE - Ontology of Immune Epitopes
Pathways | Biochemical, signaling pathways used by organisms | Pathway Ontology
Phenotype | States manifested by organisms at anatomical, spatial, temporal and functional levels. “Anatomy” and “Disease” are components of “Phenotype” but treated distinctly | Phenotypic Quality
These ontologies were then analyzed for their ability to recognize terms in text obtained from Build 1 datasets, protocols, and metadata, as well as from Stanford’s DataMt database [2] (which stores many Stanford HIPC datasets). An automated pipeline was written that uses the National Center for Biomedical Ontology’s Annotator [3], accessed through BioPortal’s Web services, to parse the text and attempt to map its terms to the reference ontologies.
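The poster does not show the pipeline itself. As a rough sketch of what one Annotator call could look like, the Python below builds a request against the present-day BioPortal REST endpoint and tallies the returned annotations per ontology. The endpoint URL, the parameter names (`text`, `ontologies`, `apikey`) and the response field `annotatedClass.links.ontology` are assumptions based on the current public service, which may differ from the interface available when this work was done; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

# Assumed endpoint: the present-day BioPortal REST service, not necessarily
# the one the poster's pipeline called.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_request_url(text, ontologies, api_key):
    """Return the GET URL asking the Annotator to tag `text`."""
    params = {
        "text": text,
        "ontologies": ",".join(ontologies),  # ontology acronyms, comma-separated
        "apikey": api_key,                   # personal BioPortal API key
    }
    return ANNOTATOR_URL + "?" + urllib.parse.urlencode(params)


def fetch_annotations(text, ontologies, api_key):
    """Call the service and return the decoded JSON (needs network and a key)."""
    url = build_request_url(text, ontologies, api_key)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def ontologies_hit(annotations):
    """Count annotated classes per ontology in a decoded response.

    Assumes each item carries annotatedClass.links.ontology, an ontology URL
    whose last path segment is the ontology acronym.
    """
    return Counter(
        item["annotatedClass"]["links"]["ontology"].rsplit("/", 1)[-1]
        for item in annotations
    )
```

Keeping URL construction and response parsing separate from the network call lets the mapping logic be checked offline against saved responses.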
The Problem
Methods
References
1. Noy et al. (2009) “BioPortal: Ontologies and Integrated Data Resources at the Click of a Mouse”, Nucleic Acids Res., 37:W170-W173.
2. Siebert, J., Munsil, D. & Maecker, H. (2011) “A Novel Approach for Integrating and Exploring Heterogeneous Translational Data”, manuscript in preparation.
3. Jonquet et al. (2009) “The Open Biomedical Annotator”, Summit on Translational Bioinformatics, 56-60.
Ontology | Build 1: (%) n | ImmPort CV: (%) n | Data Mt: (%) n
NCI Thesaurus 16.7% 7 33.3% 14 52.6% 92
Medical Subject Headings 7.1% 3 21.4% 9 22.9% 40
Molecule role 14.3% 6 0.0% 24.6% 43
SNOMED Clinical Terms 2.4% 1 11.9% 5 17.7% 31
PRotein Ontology (PRO) 16.7% 7 0.0% 10.3% 18
Cell Cycle Ontology 7.1% 3 0.0% 12.6% 22
Ontology for Biomedical Investigations 2.4% 1 14.3% 6 5.7% 10
Experimental Factor Ontology 7.1% 3 7.4% 13
SemanticScience Integrated Ontology 2.9% 5
Units of measurement 2.4% 1 2.3% 4
Phenotypic quality 2.4% 1 2.3% 4
EDAM 7.1% 3 0.6% 1
Foundational Model of Anatomy 2.3% 4
Vaccine Ontology 2.4% 1 1.1% 2
ICPC-2 PLUS 1.1% 2
MGED Ontology 2.4% 1
Measurement Method Ontology 2.4% 1
Gene Ontology 2.4% 1
Ontology of Clinical Research (OCRe) 0.6% 1
Protein-protein interaction 0.6% 1
Mammalian phenotype 0.6% 1
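A tally of this kind can be produced from the per-term mapping results. The sketch below assumes that each percentage is the number of terms an ontology recognized divided by the total number of terms submitted from that source, which the poster does not state explicitly. The function name and the toy input are hypothetical and are not data from the study.

```python
from collections import Counter


def coverage_by_ontology(term_hits):
    """Return {ontology: (n_terms, percent_of_all_terms)}, most frequent first.

    term_hits maps each input term to the set of ontologies that recognized
    it; a term recognized by several ontologies counts once for each of them.
    """
    total = len(term_hits)
    if total == 0:
        return {}
    counts = Counter(o for hits in term_hits.values() for o in hits)
    return {o: (n, round(100.0 * n / total, 1)) for o, n in counts.most_common()}


# Hypothetical toy input, not data from the poster.
toy = {
    "CD20": {"NCI Thesaurus", "PRotein Ontology (PRO)"},
    "influenza vaccine": {"NCI Thesaurus", "Vaccine Ontology"},
    "anti-CD27": set(),  # an unmapped term still counts in the denominator
    "flow cytometry": {"NCI Thesaurus"},
}
for name, (n, pct) in coverage_by_ontology(toy).items():
    print(f"{name:28s} {pct:5.1f}% {n:3d}")
```

Because one term can map to several ontologies, the percentages in a column need not sum to 100%.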
• Many mapping failures are attributable to the lack of definitions for commercial objects within the reference ontologies (e.g., the “Anti-CD27” antibody from BD)
Solution: Contact ontology owners to have them add commercial terms to their ontologies
• Many mapping failures are easily correctable
Example: Add a pre-processor able to recognize instances of the “anti-” problem (e.g., “anti-CD20” is not recognized even though “CD20” is known)
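A pre-processor of this kind could be as small as the following sketch. It assumes that the failing terms start with “anti” followed by a hyphen or space and that a set of already recognized terms is available locally. The function name and the flag it returns are hypothetical and are not part of the poster's pipeline.

```python
import re

# "anti" followed by at least one hyphen or space, so that words such as
# "antigen" are left alone. The accepted separators are an assumption.
ANTI_PREFIX = re.compile(r"^anti[-\s]+", re.IGNORECASE)


def lookup_with_anti_fallback(term, known_terms):
    """Return (matched_term, via_anti_prefix), or None if nothing matches.

    The term is tried as given first; if that fails and it carries an
    "anti-" prefix, the remainder is tried instead.
    """
    term = term.strip()
    if term in known_terms:
        return term, False
    stripped = ANTI_PREFIX.sub("", term, count=1)
    if stripped != term and stripped in known_terms:
        return stripped, True
    return None


known = {"CD20", "CD27"}
print(lookup_with_anti_fallback("anti-CD20", known))  # ('CD20', True)
print(lookup_with_anti_fallback("antigen", known))    # None
```

The boolean flag records that the match was reached through the prefix, so a downstream encoder can still distinguish an antibody against CD20 from CD20 itself.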
We conclude that ImmPort should be able to migrate toward ontology-based encodings.
Data from experiments probing the immune system are inherently complex because of the diversity of data types and assay types and the number of biological agents involved. This complexity is further increased by the multi-center nature of the data generated by HIPC. One of the goals of HIPC is to deliver a database able to support broad community access to these complex data sets. Critical to the success of this database will be its ability to provide conceptual characterizations of experiments and their results (“data and metadata encoding”). Such encodings identify data sets according to experimental properties so that users can quickly narrow their searches to the most pertinent results. Conceptual encodings that rely on “industry-standard” ontologies are the preferred way to achieve this. Since ImmPort will be the repository of HIPC data, we evaluated its use of ontologies. Upon determining that ImmPort is not ontology-compliant, we analyzed the universe of ontologies to determine the extent to which existing ontologies can be used to encode HIPC data, and ImmPort’s ability to support the application of these concepts.