ECOR: European Centre for Ontological Research
Application of Ontology in Cancer Bioinformatics
Dr. Werner Ceusters, MD, Executive Director
European Centre for Ontological Research, Saarland University
Saarbrücken, Germany
11th World Conference on Medical Informatics, San Francisco, 7-11 September 2004
• 759 papers
• 48 contain the word “bioinformatics”
• 124 contain “cancer”
• 1 contains “cancer bioinformatics”
• But: about 50 deal with cancer bioinformatics
• 89 contain “ontology”
Ontology related Cancer Bioinformatics at MEDINFO 2004
• A Log Likelihood Predictor for Genomic Classification of Oral Cancer using Principle Component Analysis for Feature Selection
• Methods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development
• A Text Mining Approach to Enable Detection of Candidate Risk Factors
• Cancer-related Complementary and Alternative Medicine Online: Factors Affecting Information Retrieval (by patients)
• Development of the ICNP based cancer nursing information system
• NCI Thesaurus: Using Science-Based Terminology to Integrate Cancer Research Results
• Extraction of Diagnosis Related Terminological Info from Discharge Summary
• Automated Clinical Annotation of Tissue Bank Specimens
• Mining OMIM for Insight into Complex Diseases
• A new parameter enhancing breast cancer detection in computer aided diagnosis of X-ray mammograms
• Tools for the Performance of Clinical Trials Research
• Formal Representation of Medical Goals for Medical Guidelines
• Using Internet Survey Among Cancer Patients
Goals of Cancer Bioinformatics
• To integrate molecular, biological and clinical knowledge about cancer with analytic methods from bioinformatics.
• The ultimate aim is to create comprehensive prognostic and predictive models as aids to diagnosis, treatment and the design of new therapeutics.
Task descriptions
• Sequence similarity searching
– Nucleic acid vs nucleic acid 28
– Protein vs protein 39
– Translated nucleic acid vs protein 6
– Unspecified sequence type 29
– Search for non-coding DNA 9
• Functional motif searching 35
• Sequence retrieval 27
• Multiple sequence alignment 21
• Restriction mapping 19
• Secondary and tertiary structure prediction 14
• Other DNA analysis including translation 14
• Primer design 12
• ORF analysis 11
• Literature searching 10
• Phylogenetic analysis 9
• Protein analysis 10
• Sequence assembly 8
• Location of expression 7
• Miscellaneous 7
• Total 315
Stevens R, Goble C, Baker P, and Brass A. A Classification of Tasks in Bioinformatics. Bioinformatics 2001: 17 (2):180-188.
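As a quick arithmetic check on the task counts above, the following minimal sketch sums them and computes the share taken by sequence similarity searching. The counts are transcribed from the slide; the variable names are illustrative only.

```python
# Task counts transcribed from the slide (Stevens et al., 2001).
similarity_searching = {
    "Nucleic acid vs nucleic acid": 28,
    "Protein vs protein": 39,
    "Translated nucleic acid vs protein": 6,
    "Unspecified sequence type": 29,
    "Search for non-coding DNA": 9,
}
other_tasks = {
    "Functional motif searching": 35,
    "Sequence retrieval": 27,
    "Multiple sequence alignment": 21,
    "Restriction mapping": 19,
    "Secondary and tertiary structure prediction": 14,
    "Other DNA analysis including translation": 14,
    "Primer design": 12,
    "ORF analysis": 11,
    "Literature searching": 10,
    "Phylogenetic analysis": 9,
    "Protein analysis": 10,
    "Sequence assembly": 8,
    "Location of expression": 7,
    "Miscellaneous": 7,
}

similarity_total = sum(similarity_searching.values())
grand_total = similarity_total + sum(other_tasks.values())
similarity_share = similarity_total / grand_total

print(similarity_total, grand_total, f"{similarity_share:.0%}")  # prints: 111 315 35%
```

The sub-items of sequence similarity searching add up to 111, and together with the other categories they reproduce the stated total of 315.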
Three major challenges
• Analyse massive amounts of data:
– E.g. high-throughput technologies based upon cDNA or oligonucleotide microarrays for analysis of gene expression, analysis of sequence polymorphisms and mutations, and sequencing
• Appropriately link clinical histories to molecular or other biomarker data generated by genomic and proteomic technologies.
• Develop user-friendly computer-based platforms
– that can be accessed and utilized by the average researcher for searching, retrieval, manipulation, and analysis of information from large-scale datasets
Words of Wisdom
• “Ontology” is too often not taken seriously, and only a few people understand that. But there is hope:
– The promise of Web Services, augmented with the Semantic Web, is to provide THE major solution for integration, the largest IT cost / sector, at $500 BN/year. The Web Services and Semantic Web trends are heading for a major failure (i.e., the most recent Silver Bullet). In reality, Web Services, as a technology, is in its infancy. ... There is no technical solution (i.e., no basis) other than fantasy for the rest of the Web Services story. Analyst claims of maturity and adoption (...) are already false. ... Verizon must understand it so as not to invest too heavily in technologies that will fail or that will not produce a reasonable ROI.
Dr. Michael L. Brodie, Chief Scientist, Verizon IT. OntoWeb Meeting, Innsbruck, Austria, December 16-18, 2002
Setup of this presentation
• Look at some popular views, statements, claims, systems, beliefs, ... about “ontology”, and indicate where and how they fail to do justice to what ontology is actually about;
• Explain the basics of the principled approach that we use and give examples of practical applications;
• Some comments on the future of ontology in Buffalo and the US.
Data Integration approaches
1. Data warehousing: Data from various data sources are converted, merged and stored in a centralized DBMS. Example: Integrated Genomic Database.
2. Hyperlinking approaches: Links are set up between related information and data sources. Examples: SRS, Entrez (NCBI).
3. Standardization: Efforts which address the need for a common metadata model for various application domains.
4. Integration systems: Systems that can gather and integrate information from multiple sources. Some of these systems have a mediator-wrapper architecture; others are language-based systems like Bio-Kleisli.
5. Federated database: Cooperating, yet autonomous, databases map their individual schemas to a single global schema. Operations are performed against the federated schema.
Steve Brady
System Integration approaches
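To make the federated-database idea concrete, here is a minimal sketch in which two autonomous sources keep their own field names and a mapping translates each record into a single global schema before a query runs. The source names, field names and records are hypothetical and chosen only for illustration; no real system's API is implied.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Two autonomous sources with their own local schemas (hypothetical data).
SOURCES: Dict[str, List[Record]] = {
    "gene_db": [
        {"gene_symbol": "TP53", "chrom": "17"},
        {"gene_symbol": "BRCA1", "chrom": "17"},
    ],
    "clone_db": [
        {"name": "BRCA2", "chromosome_no": "13"},
    ],
}

# Each source maps its local field names onto the global schema.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "gene_db": {"gene_symbol": "symbol", "chrom": "chromosome"},
    "clone_db": {"name": "symbol", "chromosome_no": "chromosome"},
}


def to_global(source: str, record: Record) -> Record:
    """Translate one local record into the global schema."""
    mapping = MAPPINGS[source]
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    out["source"] = source  # keep provenance, since the sources stay autonomous
    return out


def federated_query(predicate: Callable[[Record], bool]) -> List[Record]:
    """Run a predicate, written against the global schema, over all sources."""
    results = []
    for source, rows in SOURCES.items():
        for row in rows:
            candidate = to_global(source, row)
            if predicate(candidate):
                results.append(candidate)
    return results


on_chr17 = federated_query(lambda r: r["chromosome"] == "17")
```

Note that the mapping only aligns field names. Whether "symbol" means the same thing in both sources is exactly the kind of question a schema mapping leaves open.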
Data integration approaches
• Protein interaction databases
• Small molecule databases
• Genome databases
• Pathway databases
• Protein databases
• Enzyme databases
GeneOntology: at least, the beginnings of ...
GO deals with basic ontological notions very haphazardly
• GO’s three main term-hierarchies are:
• component, function and process
• But GO confuses functions with structures, and also with executions of functions
• and has no clear account of the relation between functions and processes
A flavour of ontology
<!-- ****************************************************************
Description of a location in a lipid bilayer membrane
Field description for BIND-membrane
– not-specified = somewhere in membrane
– outer-surface = on the outer surface of the membrane
– within = within the bilayer
– inner-surface = on the inner surface of the membrane
– lumen = in the lumen that the membrane surrounds
*************************************************************** -->
<!ELEMENT BIND-membrane %ENUM; >
<!ATTLIST BIND-membrane value ( not-specified | outer-surface | within | inner-surface | lumen ) #REQUIRED >
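The DTD fragment above amounts to a closed list of labels. A rough equivalent, sketched here as a Python enumeration with a small validator, makes that visible. The class and function names are assumptions made for this illustration and are not part of BIND.

```python
from enum import Enum


class BindMembrane(Enum):
    """The five values allowed by the ATTLIST declaration above."""

    NOT_SPECIFIED = "not-specified"
    OUTER_SURFACE = "outer-surface"
    WITHIN = "within"
    INNER_SURFACE = "inner-surface"
    LUMEN = "lumen"


def parse_membrane_location(value: str) -> BindMembrane:
    """Return the matching enum member, or raise ValueError for unknown labels."""
    try:
        return BindMembrane(value)
    except ValueError:
        allowed = ", ".join(member.value for member in BindMembrane)
        raise ValueError(f"{value!r} is not one of: {allowed}") from None


location = parse_membrane_location("lumen")
```

Such an enumeration can reject a misspelt label, but it records nothing about how the locations relate to each other, for example that the outer and inner surfaces bound the region described as "within".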
Mereo-topology
Diagram of spatial relations:
• HAS-PARTIAL-SPATIAL-OVERLAP
• IS-TOPO-INSIDE-OF
• IS-GEO-INSIDE-OF
• IS-INSIDE-CONVEX-HULL-OF
• IS-PARTLY-IN-CONVEX-HULL-OF
• IS-OUTSIDE-CONVEX-HULL-OF
• HAS-DISCONNECTED-REGION
• HAS-EXTERNAL-CONNECTING-REGION
• HAS-DISCRETED-REGION
• HAS-TANG.-SPAT.-PART
• HAS-NON-TANG.-SPAT.-PART
• IS-SPAT.-EQUIV.-OF
• IS-TANG.-SPAT.-PART-OF
• IS-NON-TANG.-SPAT.-PART-OF
• HAS-PROPER-SPATIAL-PART
• IS-PROPER-SPAT.-PART-OF
• HAS-SPATIAL-PART
• IS-SPATIAL-PART-OF
• HAS-OVERLAPPING-REGION
• HAS-CONNECTING-REGION
• HAS-SPATIAL-POINT-REFERENCE
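A few of the spatial relations named in the diagram can be illustrated with a deliberately crude model in which a region is a finite set of grid cells. This is an assumption made purely for the sketch: it captures parthood and overlap, but not the topological distinctions (tangential versus non-tangential parts, connection, convex hulls) that the diagram also names.

```python
from typing import FrozenSet, Tuple

Cell = Tuple[int, int]
Region = FrozenSet[Cell]


def region(*cells: Cell) -> Region:
    """Build a region from grid cells (a toy stand-in for real spatial extents)."""
    return frozenset(cells)


def is_spatial_part_of(a: Region, b: Region) -> bool:
    """Every cell of a is also a cell of b."""
    return a <= b


def is_proper_spatial_part_of(a: Region, b: Region) -> bool:
    """a is part of b and b has at least one cell that a lacks."""
    return a < b


def is_spatial_equivalent_of(a: Region, b: Region) -> bool:
    """a and b occupy exactly the same cells."""
    return a == b


def has_overlapping_region(a: Region, b: Region) -> bool:
    """a and b share at least one cell."""
    return bool(a & b)


def has_partial_spatial_overlap(a: Region, b: Region) -> bool:
    """a and b overlap, but neither is part of the other."""
    return bool(a & b) and not a <= b and not b <= a


organ = region((0, 0), (0, 1), (1, 0), (1, 1))
lobe = region((0, 0), (0, 1))
neighbour = region((1, 1), (2, 1))
```

In this toy model `lobe` is a proper spatial part of `organ`, while `organ` and `neighbour` only partially overlap.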
caCORE: The NCICB Cancer Informatics Infrastructure Backbone
cancer Bioinformatics Infrastructure Objects: biomedical objects to facilitate the communication and integration of information from the various initiatives supported by the NCICB
cancer Data Standards Repository: meta-data used for cancer research
NCI Enterprise Vocabulary Services: standard vocabularies for a variety of settings in the life sciences
caBIO architecture
Connectivity at programming interface level, NOT content
CoMeDIAS (France)
GenesTrace™: Biological Knowledge Discovery via Structured Terminology
But ... talking to each other does not mean understanding each other
Pray your computer isn’t Irish ...
X: “Hallo stranger, you appear to be traveling?”
Y: “Yes, I always travel when on a journey.”
X: “And pray, what might your name be?”
Y: “It might be Sam Patch, but it isn't.”
X: “Have you been long in these parts?”
Y: “Never longer than at present—5 feet 9.”
X: “Do you get anything new?”
Y: “Yes, I bought a new whetstone this morning.”
Copyright © 1996 Electronic Historical Publications
Cancer Data Standards Repository (caDSR)
• One of the problems confronting the biomedical data management community is the panoply of ways that similar or identical concepts are described.
• Amen!?
• But it would be more appropriate to say:
– THE problem confronting the biomedical data management community is that concepts are described.
Triadic models of meaning: The Semiotic/Semantic triangle
Sign: Language / Term / Symbol
Referent: Reality / Object
Reference: Concept / Sense / Model / View
“Ontology”
• In Information Science:
– “An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.”
• In Philosophy:
– “Ontology is the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.”
Diagram labels (repeated for each of the two definitions): concept, term, referent, definition
Why are concepts not enough?
• Why must our theory also address the referents in reality?
– Because referents are observable fixed points in relation to which we can work out how the concepts used by different communities relate to each other;
– Because only by looking at referents can we establish the degree to which concepts are good for their purpose.
NCI Enterprise Vocabulary Services environment
NCI Thesaurus
• a biomedical thesaurus created specifically to meet the needs of the NCI
• semantically modeled cancer-related terminology built using description logic
Why description logics are not enough
SNOMED-RT (2000)
SNOMED-CT (2003)
Underspecification
new-1
new-2
Use of description logics does not guarantee correct representations!
It’s not just a problem in Healthcare
Ontologies for Legal Information Serving and Knowledge Management. Joost Breuker, Abdullatif Elhag, Emil Petkov and Radboud Winkels
Ontology versus Description Logics
• In the Description Logic world
– terms and definitions come first,
– the job is to validate them and reason with them
• In the realist ontology world
– robust ontology (with all its reasoning power) comes first
– and terms and term-hierarchies must be subjected to the constraints of ontological coherence
Search for “cancer”
NCI Thesaurus Root concepts
• Anatomic Structure, Anatomic System, or Anatomic Substance? Or? Does the NCI not know to which category any item classified there belongs?
• Anatomic Substance? If yes, why is gene product not subsumed by it? If no, why are drugs and chemicals not subsumed by it?
Conceptual entity
• Definition: none
• Semantic type:
– Conceptual entity
– Classification
• Subconcepts:
– Action:
  • Definition: action; a thing done
– And:
  • Definition: an article which expresses the relation of connection or addition, used to conjoin a word with a word, ...
– Classification
  • Definition: the grouping of things into classes or categories
Definition of “cancer gene”
NCI Thesaurus architecture
Diagram labels: Disease, Breast, Breast neoplasm; relations ISA and Disease-has-associated-anatomy; kinds Findings-And-Disorders-Kind and Anatomy-Kind
• “Formal subsumption” or “inheritance”
• “Associative” relationships providing “differentiae”
• “Kinds” restrict the domain and range of associative relationships
What diseases have a diameter of over 3 cm?
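The statement that “Kinds” restrict the domain and range of associative relationships can be sketched as a simple check. The assignment of Breast neoplasm to Findings-And-Disorders-Kind and of Breast to Anatomy-Kind is read off the diagram labels and is an assumption here, as are all class and function names. This is not the NCI Thesaurus data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    kind: str


@dataclass(frozen=True)
class Role:
    """An associative relationship whose domain and range are limited to kinds."""

    name: str
    domain_kind: str
    range_kind: str


def is_well_kinded(subject: Concept, role: Role, obj: Concept) -> bool:
    """Accept 'subject role obj' only if both ends have the kinds the role requires."""
    return subject.kind == role.domain_kind and obj.kind == role.range_kind


breast_neoplasm = Concept("Breast neoplasm", "Findings-And-Disorders-Kind")
breast = Concept("Breast", "Anatomy-Kind")
has_anatomy = Role(
    "Disease-has-associated-anatomy",
    domain_kind="Findings-And-Disorders-Kind",
    range_kind="Anatomy-Kind",
)

accepted = is_well_kinded(breast_neoplasm, has_anatomy, breast)
rejected = is_well_kinded(breast, has_anatomy, breast_neoplasm)
```

A check of this sort only polices which kinds of concept may stand at either end of a relationship. It does not tell us whether the resulting statement is true of anything in reality.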
Problems with C - rel - C
• Ad hoc readings of statements of the type C1-relationship-C2
– Human has-part head // Human has-part finger
– California is-part-of United States // California isa name
– labial vein isa vein of head // labial vein isa vulval vein
• Concepts do not necessarily correspond to something that (will) exist(ed)
– Sorcerer, unicorn, leprechaun, ...
• Definitions set the conditions under which terms may be used, and may not be abused as conditions an entity must satisfy to be what it is
• Language can make strings of words look as if they were terms
– “Middle lobe of left lung”
NCI Metathesaurus
• based on NLM's Unified Medical Language System Metathesaurus supplemented with additional cancer-centric vocabulary
• a database of many biomedical terminologies, mapped where possible to NCI Thesaurus terms and shared conceptual meanings
NCI and Partner Data Sources
• SAGE Data (CGAP) - NCI and Duke University SAGE experiment data
• Expression Measurements (NCICB GEDP) - Probe sets
• Sequence Trace Files (GAI) - EST traces and full-length mRNA clone traces
• Genetic Annotation Initiative (GAI) - SNPs
• Sequence Verified Clones (as of caBIO version 2.0) (NCICB internal pre-processed) - Human and mouse sequence-verified clone information
• Cancer Clinical Trials (NCI CTEP and PDQ) - Trials and drug agent information
• CMAP Annotation Data (CMAP) - Drug targets, anomalies
• Cancer Vocabulary (NCI) - Cancer related terminology and concepts
External Data Sources
• Unigene (NCBI) - Human and mouse genes, sequences, map locations, clones, proteins and protein homologs
• Homologene (NCBI) - Human and mouse gene homologs
• LocusLink (NCBI) - Genes, gene ontologies, gene aliases, taxons
• RefSeq (NCBI) - Reference sequences
• EST Data (NCICB) - Tissue-specific expression level ESTs
• cDNA library information (NCICB) - cDNA libraries for disease and tissue
• Human Genome via UCSC DAS server (UCSC) - Genomic sequences, annotations, and map coordinates
• BioCarta (BioCarta) - Pathways
• Gene Ontology - Hierarchy of gene functions
Metathesaurus traps
UMLS example
IFOMIS: Institute for Formal Ontology and Medical Information Science
The Institute for Formal Ontology and Medical Information Science was founded in April 2002 as part of the Faculty of Medicine of the University of Leipzig, with a grant from the Alexander von Humboldt Foundation. It comprises an interdisciplinary research group with members from Philosophy, Computer and Information Science, Logic, Medicine, and Medical Informatics. IFOMIS established itself as a center of theoretically grounded research in both formal and applied ontology. Its goal is to develop a formal ontology that will be applied and tested in the domain of medical and biomedical information science. In August 2004 IFOMIS moved its base of operations from Leipzig to Saarland University in Saarbrücken.
IFOMIS Universität des Saarlandes Postfach 151150 D-66041 Saarbrücken Germany
Secretariat Tel.: +49 (0)681-302-64770 Fax: +49 (0)681-302-64772
IFOMIS’s long-term goal
• Build a robust high-level BFO-MedO framework
• THE WORLD’S FIRST INDUSTRIAL-STRENGTH PHILOSOPHY
• which can serve as the basis for an ontologically coherent unification of medical knowledge and terminology
IFOMIS’ research in Formal Ontology
• Formal treatment of universals, individuals, endurants, perdurants, scales, functions, collections, ...
• Universals / Concepts
• Mereology and topology
• Vagueness and granularity
• Applicability to domain ontologies, terminologies, ...
Reference Ontology
• a theory of a domain of entities in the world
• based on realizing the goals of maximal expressiveness and adequacy to reality
• sacrificing computational tractability for the sake of representational adequacy
Basic Ontological Notions
• Identity
– How are instances of a class distinguished from each other
• Unity
– How are all the parts of an instance isolated
• Essence
– Can a property change over time
• Dependence
– Can an entity exist without some others
(Simplified) Logic of classes
• primitive:
– entities: particulars versus universals
– relation inst such that:
  • all classes are universals; all instances are particulars
  • some universals are not classes, hence have no instances: pet, adult, physician
  • some particulars are not instances; e.g. some mereological sums
• subsumption defined resorting to instances:
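The formula that followed this bullet is not preserved in the transcript. One common instance-based reading, assumed here, is that A is subsumed by B when every instance of A is also an instance of B. The sketch below applies that reading to a small, invented table of instances; the class names and particulars are illustrative only.

```python
from typing import Dict, Set

# inst(x, C): which particulars instantiate which class (invented example data).
INSTANCES: Dict[str, Set[str]] = {
    "malignant neoplasm": {"tumour_1", "tumour_2", "tumour_3"},
    "lung carcinoma": {"tumour_1", "tumour_2"},
    "human": {"mr_smith"},
}


def inst(particular: str, cls: str) -> bool:
    """True if the particular is recorded as an instance of the class."""
    return particular in INSTANCES[cls]


def is_a(a: str, b: str) -> bool:
    """A is_a B under the assumed reading: every instance of A is an instance of B."""
    return all(inst(x, b) for x in INSTANCES[a])


lung_is_neoplasm = is_a("lung carcinoma", "malignant neoplasm")
neoplasm_is_lung = is_a("malignant neoplasm", "lung carcinoma")
```

A finite table can only approximate the definition, which ranges over all instances. A class with no recorded instances would also be subsumed by everything, so the check is best read as a test that can refute a claimed subsumption rather than establish one.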
Basic Formal Ontology
Basic Formal Ontology consists in a series of sub-ontologies (most properly conceived as a series of perspectives on reality), the most important of which are:
– SnapBFO, a series of snapshot ontologies (O_ti), indexed by times: continuants
– SpanBFO, a single videoscopic ontology (O_v): occurrents.
Each O_ti is an inventory of all entities existing at a time. O_v is an inventory (processory) of all processes unfolding through time.
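The two perspectives can be pictured with a toy data structure: one inventory of continuants per time index on the snapshot side, and a single list of processes with temporal extent on the span side. The entities, time indices and function names below are invented for the illustration, and the sketch makes no claim to reflect how BFO is formalised.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Process:
    """An occurrent: it unfolds over an interval of time indices (inclusive)."""

    name: str
    start: int
    end: int


# Snapshot side: for each time index, the continuants existing at that time.
snapshots: Dict[int, Set[str]] = {
    0: {"patient_1"},
    1: {"patient_1", "tumour_1"},
    2: {"patient_1", "tumour_1"},
    3: {"patient_1"},
}

# Span side: one inventory of processes unfolding through time.
processes: List[Process] = [
    Process("tumour growth", 1, 2),
    Process("surgery", 2, 3),
]


def continuants_at(t: int) -> Set[str]:
    """The inventory of entities existing at time t."""
    return snapshots.get(t, set())


def processes_unfolding_at(t: int) -> List[str]:
    """Names of the processes whose interval covers time t."""
    return [p.name for p in processes if p.start <= t <= p.end]


def times_of_existence(entity: str) -> List[int]:
    """Time indices whose snapshot contains the entity."""
    return sorted(t for t, inventory in snapshots.items() if entity in inventory)
```

A continuant such as `tumour_1` is wholly present in each snapshot in which it appears, whereas a process such as `surgery` appears in no snapshot and is found only in the span inventory.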
Occurrents and continuants
Picture by Vladimir Brajic
Levels of granularity in biomedical ontology
Granularity level | Continuants | Occurrents
Population | environment | screening
Person | Race, age, disease, symptom | ADL, working, treatment, prevention
Organ | Liver, lung, organ part, sign | Heart beat, digestion, surgery
Tissue | Elasticity, Turgor, Strength | Resorption, protection
Cell | Bone cell, Alveolar cell, Cell size, bacterium | Phagocytosis, Cell growth, Reparation, hormone production
Subcellular | Cell membrane, Protein, DNA, Oncogene, Protooncogene, Virus, oncogenic molecule | Transcription, Splicing, Mutation, Gene regulation
Missed subsumption detection in SNOMED-CT
Missing: ISA neoplasm of heart
Correction of MGED’s ontology upper part
The MGED Ontology is a top level container for the MGEDCoreOntology and the MGEDExtendedOntology. The MGED ontology describes microarray experiments and is split into the MGEDCoreOntology, which supports MAGE-OM v1.0 and is organized consistently with MAGE, and the MGEDExtendedOntology, which expands MAGE v1.0 and contains concepts and relationships which are not included in MAGE.
Diagram labels: MGEDOntology, MGEDCoreOntology, BioMaterialPackage, BioMaterialCharacteristics, OrganismPart, DiseaseLocation, Cancer Site, Primary site, Metastatic site; relations SubClassOf, InstanceOf, has_cancer_site (has-class, one-of)
Definitions shown in the diagram:
• the organism part in which additional tumors are identified remote from the primary site
• Anatomical location(s) of disease.
Text mining and classification
Diagram for the sentence “Mr. Smith has a pulmonary carcinoma”
Diagram labels: Having a healthcare phenomenon, Generalised Possession, Healthcare phenomenon, Human, Patient, Cancer patient, Malignant neoplasm, lung carcinoma; relations IS-A, Has-possessor, Has-possessed, Is-possessor-of, Has-Healthcare-phenomenon; numbered links 1, 2, 3
The near future: International Cancer Ontology Project
• Healthcare Informatics call 6th FP of EU
• Applying realist ontology to:
– Connect relevant databases for combatting cancer,
  • covering all levels of granularity (from molecules to entire patients) at deep semantic level
  • Independent of the data format (text, structured, coded, ...)
Knowledge discovery and use
Towards a US-based “X”CORs
• BCOR: Buffalo Centre for Ontological Research
• NCOR: National Centre for Ontological Research
– Involving Stanford
• Introducing realist ontology (as a sound analytical philosophical discipline) to improve ontologies (as representations).