

1

How Ontologies Create Research Communities

Barry Smith

http://ontology.buffalo.edu/smith

2

genomic medicine, molecular medicine, translational medicine, personalized medicine ... need:

methods for data integration to enable reasoning across data at multiple granularities

to identify biomedically relevant relations on the side of the entities themselves

3

4

where in the body?

what kind of disease process?

we need semantic annotations of data which human beings can understand and computers can reason with

5

6

Institute for Formal Ontology and Medical Information Science (IFOMIS)

BIRN: Biomedical Informatics Research Network

7


BIRN: Biomedical Informatics Research Network

Center for In Vivo Microscopy

Brain Imaging and Analysis Center

Neuropsychiatric Imaging Research Laboratory

Yerkes National Primate Research Center

Clinical Cognitive Neuroscience Laboratory

Mallinckrodt Institute of Radiology

Lineberger Comprehensive Cancer Center

fMRI Research Center

Surgical Planning Laboratory

Center for Magnetic Resonance Research

Multiscale Systems Immunology for Adjuvant [Vaccine] Development

Investigators

Duke Center for Computational Immunology: Thomas B. Kepler, Lindsay G. Cowell, Cliburn Chan

Duke Center for Computational Sciences, Engineering and Medicine: John Pormann, Rachael Brady, Bill Rankin

Duke Mathematics: Bill Allard

Duke Institute of Statistics and Decision Sciences: Mike West

Duke Computer Science: Jun Yang

Duke Human Vaccine Institute: Greg Sempowski, Munir Alam

Department of Pathology, Emory: Bali Pulendran

Department of Physiology & Biophysics, UC Irvine: Michael Cahalan

Department of Pediatrics, Vanderbilt: Kathryn Edwards

9

how do we make different sorts of data combinable in ways useful to the human beings who carry out research?

10

how was this problem solved in the years BC?

how did clinical researchers from different disciplines communicate?

how did they learn to communicate?

11

through the basic biomedical sciences:

anatomy, physiology, biochemistry, histology, ...

12

create ontologies corresponding to the basic biomedical sciences

clinical medicine relies on anatomy and molecular biology to provide integration across medical specialisms

13

14

where do we find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development, histology in different model organisms?

but we need more

15

16

what makes GO so wildly successful?

17

The methodology of annotations:

different model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with gene products in a coordinated way

18

A set of standardized textual descriptions of

cellular locations

molecular functions

biological processes

used to annotate the entities represented in the major biochemical databases

thereby creating integration across these databases and making them available to semantic search
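The annotation methodology also explains why annotated databases become semantically searchable. Because each term sits in an is_a hierarchy, a gene product annotated to a specific term can also be retrieved by a search for any more general term (the Gene Ontology calls this the true path rule). A minimal Python sketch, assuming a tiny hypothetical hierarchy and hypothetical gene products (the term links and the names geneA to geneC are illustrative, not an extract of GO):

```python
from collections import defaultdict

# Hypothetical, simplified is_a links (child -> parents); illustrative only,
# not an extract of the real Gene Ontology.
IS_A = {
    "osteoblast differentiation": ["cell differentiation"],
    "neuron differentiation": ["cell differentiation"],
    "cell differentiation": ["biological process"],
}

# Hypothetical curator annotations: gene product -> directly annotated terms.
ANNOTATIONS = {
    "geneA": {"osteoblast differentiation"},
    "geneB": {"neuron differentiation"},
    "geneC": {"cell differentiation"},
}


def ancestors(term: str) -> set[str]:
    """Return the term itself plus every term reachable by following is_a upward."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_index(annotations: dict[str, set[str]]) -> dict[str, set[str]]:
    """Index each gene product under its annotated terms and all their ancestors."""
    index: dict[str, set[str]] = defaultdict(set)
    for product, terms in annotations.items():
        for term in terms:
            for t in ancestors(term):
                index[t].add(product)
    return index


index = build_index(ANNOTATIONS)
print(sorted(index["cell differentiation"]))  # ['geneA', 'geneB', 'geneC']
print(sorted(index["osteoblast differentiation"]))  # ['geneA']
```

Indexing each annotation under all of its ancestors once, up front, turns every later search into a single lookup; a real GO tool would typically also follow part_of links and read the hierarchy from the ontology file.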

19

what cellular component?

what molecular function?

what biological process?

20

This process leads to a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form

21

Five bangs for your GO buck:

science base

cross-species database integration

cross-granularity database integration

through links to the things which are of biomedical relevance

semantic searchability links people to software
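Cross-species integration rests on a simple mechanism: once curators at different model organism databases annotate with the same term, records can be merged on the term itself, with no pairwise mapping between the databases. A minimal Python sketch, assuming hypothetical database names, gene products and annotation rows:

```python
from collections import defaultdict

# Hypothetical annotation rows from two model organism databases.
# Each row: (database, gene product, shared ontology term). Illustrative only.
ROWS = [
    ("mouse_db", "mGene1", "osteoblast differentiation"),
    ("zebrafish_db", "zGene7", "osteoblast differentiation"),
    ("mouse_db", "mGene2", "neuron differentiation"),
]


def integrate_by_term(rows):
    """Group gene products from every source database under the shared term they carry."""
    merged = defaultdict(list)
    for database, product, term in rows:
        merged[term].append((database, product))
    return dict(merged)


merged = integrate_by_term(ROWS)
print(merged["osteoblast differentiation"])
# [('mouse_db', 'mGene1'), ('zebrafish_db', 'zGene7')]
```

Adding a third database needs no new mapping code: its rows simply join the same groups as long as they use the same terms.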

22

but we also need to extend this methodology beyond the basic biomedical sciences, to clinical domains:

disease ontology

immunology ontology

symptom (phenotype) ontology

neuron ontology

brain (mal)function ontology ...

23

the problem

need to ensure consistency of the new clinical ontologies with the basic biomedical sciences

need to find ways to ensure clinical data is annotated in terms of these new controlled vocabularies

if we do not start now, the problem will only get worse

24

First step (2003)

a shared portal for (so far) 58 ontologies (low regimentation)

http://obo.sourceforge.net NCBO BioPortal

25

26

Second step (2004)

reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
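The osteoblast stanza is written in the OBO flat-file format, where each line is a `tag: value` pair and tags such as `relationship` may repeat. A minimal Python sketch of reading such a stanza, assuming only the simple line shapes shown on the slide; a real OBO file also has `[Term]` stanza headers, dbxref lists after definitions and trailing modifiers, none of which this sketch handles:

```python
# The osteoblast stanza from the slide, as plain tag-value lines.
STANZA = '''id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375'''


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Split each 'tag: value' line on its first colon; a tag may occur many times."""
    tags: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(":")
        tags.setdefault(tag.strip(), []).append(value.strip())
    return tags


term = parse_stanza(STANZA)
# The definition text sits between the first pair of double quotes.
definition = term["def"][0].split('"')[1]
# Each relationship value has the shape "<relation> <target id>".
relations = [tuple(v.split(maxsplit=1)) for v in term["relationship"]]
print(term["id"][0], term["name"][0])  # CL:0000062 osteoblast
print(relations)  # [('develops_from', 'CL:0000008'), ('develops_from', 'CL:0000375')]
```

Splitting on the first colon only is what keeps identifiers such as `CL:0000062` intact.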

27

Third step (2006)

The OBO Foundry http://obofoundry.org/

28

The OBO Foundry: a family of interoperable gold standard biomedical reference ontologies to serve the annotation of, inter alia,

scientific literature

model organism databases

clinical trial data

29

A prospective standard designed to guarantee interoperability of ontologies from the very start (in contrast to post hoc mapping)

established March 2006

12 initial candidate OBO ontologies – focused primarily on basic science domains

several being constructed ab initio by influential consortia who have the authority to impose their use on large parts of the relevant communities.

30

undergoing rigorous reform:

GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology

new:

CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology

31

already in good shape


Figure (Foundational Model of Anatomy): is_a and part_of links among Pleural Cavity, Interlobar Recess, Pleura (Wall of Sac), Mesothelium of Pleura, Visceral Pleura, Parietal Pleura, Mediastinal Pleura, Pleural Sac, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Organ Cavity, Organ Cavity Subdivision, Anatomical Space, Anatomical Structure, Organ, Organ Part, Organ Subdivision, Organ Component and Tissue.

33

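Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT, divided into INDEPENDENT and DEPENDENT, versus OCCURRENT).

ORGAN AND ORGANISM
  independent continuant: Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO)
  dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
  independent continuant: Cell (CL); Cellular Component (FMA, GO)
  dependent continuant: Cellular Function (GO)
  occurrent: Cellular Process (GO)

MOLECULE
  independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  dependent continuant: Molecular Function (GO)
  occurrent: Molecular Process (GO)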

34

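Building out from the original GO

Rows: GRANULARITY. Columns: RELATION TO TIME (CONTINUANT, divided into INDEPENDENT and DEPENDENT, versus OCCURRENT).

ORGAN AND ORGANISM
  independent continuant: Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO)
  dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  occurrent: Biological Process (GO), a single entry spanning this level and the cell level

CELL AND CELLULAR COMPONENT
  independent continuant: Cell (CL); Cellular Component (FMA, GO)
  dependent continuant: Cellular Function (GO)
  occurrent: Biological Process (GO), as above

MOLECULE
  independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  dependent continuant: Molecular Function (GO)
  occurrent: Molecular Process (GO)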
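The granularity-by-time matrix can also be held as a simple lookup table. A minimal Python sketch, assuming the cell contents of the version of the matrix above that lists a separate Cellular Process (GO) entry; the function name cells_covered_by is hypothetical. It reports which cells of the matrix a given ontology contributes to:

```python
# (granularity, relation to time) -> {entity type: ontologies named on the slide}.
# "NCBI Taxonomy?" keeps the question mark shown on the slide.
MATRIX = {
    ("organ and organism", "independent continuant"): {
        "Organism": ["NCBI Taxonomy?"],
        "Anatomical Entity": ["FMA", "CARO"],
    },
    ("organ and organism", "dependent continuant"): {
        "Organ Function": ["FMP", "CPRO"],
        "Phenotypic Quality": ["PaTO"],
    },
    ("organ and organism", "occurrent"): {"Organism-Level Process": ["GO"]},
    ("cell and cellular component", "independent continuant"): {
        "Cell": ["CL"],
        "Cellular Component": ["FMA", "GO"],
    },
    ("cell and cellular component", "dependent continuant"): {"Cellular Function": ["GO"]},
    ("cell and cellular component", "occurrent"): {"Cellular Process": ["GO"]},
    ("molecule", "independent continuant"): {"Molecule": ["ChEBI", "SO", "RnaO", "PrO"]},
    ("molecule", "dependent continuant"): {"Molecular Function": ["GO"]},
    ("molecule", "occurrent"): {"Molecular Process": ["GO"]},
}


def cells_covered_by(ontology: str) -> list[tuple[str, str]]:
    """Return every (granularity, relation to time) cell in which the ontology is listed."""
    return [
        cell
        for cell, entries in MATRIX.items()
        if any(ontology in sources for sources in entries.values())
    ]


print(len(cells_covered_by("GO")))  # 6
print(cells_covered_by("CL"))  # [('cell and cellular component', 'independent continuant')]
```

On this encoding GO turns up in six of the nine cells, across all three granularity levels, while CL occupies exactly one.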

35

Under consideration:

Disease Ontology (DO)

Biomedical Image Ontology (BIO)

Upper Biomedical Ontology (OBO UBO)

Environment Ontology (EnvO)

Systems Biology Ontology (SBO)

36

OBO Foundry = a subset of OBO ontologies whose developers have agreed in advance to accept a common set of principles reflecting best practice in ontology development, designed to ensure:

tight connection to the biomedical basic sciences

compatibility

interoperability, common relations

formal robustness

support for logic-based reasoning


37

CRITERIA

The ontology is OPEN

The ontology employs a COMMON FORMAL LANGUAGE.

The developers agree to COLLABORATE

UPDATE in light of scientific advance

ORTHOGONALITY: one ontology per domain
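The ORTHOGONALITY criterion can be checked mechanically once each ontology declares the domain it covers. A minimal Python sketch, assuming such declarations exist as plain strings; the domain labels and the ontology name candidate_cell_ontology are hypothetical:

```python
from collections import defaultdict

# Hypothetical domain declarations; "candidate_cell_ontology" is a made-up
# second ontology added only to show what a conflict looks like.
CLAIMS = {
    "CL": ["cell types"],
    "ChEBI": ["chemical entities"],
    "candidate_cell_ontology": ["cell types"],
}


def orthogonality_conflicts(claims: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each domain claimed by more than one ontology, with its claimants."""
    owners: dict[str, list[str]] = defaultdict(list)
    for ontology, domains in claims.items():
        for domain in domains:
            owners[domain].append(ontology)
    return {domain: names for domain, names in owners.items() if len(names) > 1}


print(orthogonality_conflicts(CLAIMS))
# {'cell types': ['CL', 'candidate_cell_ontology']}
```

An empty result means every declared domain has exactly one ontology; the hard part in practice is agreeing on the domain boundaries, which no script settles.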

38

COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*

* Smith et al., Genome Biology 2005, 6:R46
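In the cited paper, relations between classes are defined in terms of relations between instances at times. Roughly: C is_a C1 holds if every instance of C at any time t is also an instance of C1 at t; C part_of C1 holds if every instance of C at t is part of some instance of C1 at t. A minimal Python sketch of checking these all-some patterns against a toy data set; the instances, class assignments and time points are entirely hypothetical:

```python
# Toy instance-level data (all hypothetical): the classes each instance
# instantiates at a time, and which instance is part of which at a time.
INSTANCE_OF = {
    ("nucleus1", 1): {"nucleus", "organelle"},
    ("cell1", 1): {"cell"},
    ("nucleus2", 1): {"nucleus", "organelle"},
    ("cell2", 1): {"cell"},
}
PART_OF = {("nucleus1", "cell1", 1), ("nucleus2", "cell2", 1)}


def instances(cls: str) -> set[tuple[str, int]]:
    """All (instance, time) pairs at which the instance instantiates cls."""
    return {key for key, classes in INSTANCE_OF.items() if cls in classes}


def class_is_a(c: str, c1: str) -> bool:
    """C is_a C1: every instance of C at t is also an instance of C1 at t."""
    return all(c1 in INSTANCE_OF[key] for key in instances(c))


def class_part_of(c: str, c1: str) -> bool:
    """C part_of C1: every instance of C at t is part of some instance of C1 at t."""
    return all(
        any((x, y, t) in PART_OF for (y, t2) in instances(c1) if t2 == t)
        for (x, t) in instances(c)
    )


print(class_is_a("nucleus", "organelle"))  # True
print(class_part_of("nucleus", "cell"))  # True
print(class_part_of("cell", "nucleus"))  # False
```

The asymmetry in the last two lines is the point of the all-some pattern: every nucleus is part of some cell, but it does not follow that every cell has a nucleus as a part.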


39

Further criteria will be added over time in light of lessons learned in order to bring about a gradual improvement in the quality of Foundry ontologies

ALL FOUNDRY ONTOLOGIES WILL BE SUBJECT TO CONSTANT UPDATE IN LIGHT OF SCIENTIFIC ADVANCE

IT WILL GET HARDER


40

But not everyone needs to join

The Foundry is not seeking to serve as a check on flexibility or creativity

ALL FOUNDRY ONTOLOGIES WILL ENCOURAGE COMMUNITY CRITICISM, CORRECTION AND EXTENSION WITH NEW TERMS

IT WILL GET HARDER


41

GOALS

to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development

KUDOS for early adopters of high-quality ontologies / terminologies, e.g. in reporting clinical trial results

establish ONTOLOGY CHAMPIONS to create EVIDENCE-BASED TERMINOLOGY RESEARCH

42

GOALS

DATA REUSABILITY: if data schemas are formulated using a single well-integrated framework ontology system in widespread use, then this data will to this degree itself become more widely accessible and usable

43

June 2006: establishment of MICheck:

reflects growing need for prescriptive checklists specifying the key information to include when reporting experimental results (concerning methods, data, analyses and results).

expand to all areas of biomedical experimentation


44

MICheck: ‘a common resource for minimum information checklists’ analogous to OBO / NCBO BioPortal

MICheck Foundry: will create ‘a suite of self-consistent, clearly bounded, orthogonal, integrable checklist modules’ *

* Taylor CF, et al. Nature Biotech, in press
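A minimum information checklist amounts to a list of items that a report must contain, and the MICheck Foundry idea is to assemble such checklists from orthogonal, integrable modules. A minimal Python sketch, assuming hypothetical module names and checklist items (none are taken from MIAME or any other published checklist): modules are combined only if no item belongs to two of them, and a report is then checked for missing items.

```python
# Hypothetical checklist modules; the item names are illustrative and are
# not taken from MIAME or any other published checklist.
MODULES = {
    "study_design": {"organism", "experimental factors"},
    "sample_preparation": {"extraction protocol"},
    "data_processing": {"normalization method"},
}


def combine(module_names: list[str]) -> set[str]:
    """Merge modules into one checklist, rejecting any pair that shares an item."""
    combined: set[str] = set()
    for name in module_names:
        overlap = combined & MODULES[name]
        if overlap:
            raise ValueError(f"module {name!r} overlaps on {sorted(overlap)}")
        combined |= MODULES[name]
    return combined


def missing_items(report: dict[str, str], checklist: set[str]) -> list[str]:
    """Return checklist items that the report omits or leaves empty."""
    return sorted(item for item in checklist if not report.get(item))


checklist = combine(["study_design", "sample_preparation", "data_processing"])
report = {
    "organism": "mouse",
    "extraction protocol": "described in methods",
    "experimental factors": "",
}
print(missing_items(report, checklist))
# ['experimental factors', 'normalization method']
```

Rejecting overlapping modules at combination time is one way to read "clearly bounded, orthogonal": each item then has exactly one module responsible for it.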


45

MICheck/Foundry communities:

Transcriptomics (MIAME Working Group)

Proteomics (Proteomics Standards Initiative)

Metabolomics (Metabolomics Standards Initiative)

Genomics and Metagenomics (Genomic Standards Consortium)

In Situ Hybridization and Immunohistochemistry (MISFISHIE Working Group)

Phylogenetics (Phylogenetics Community)

RNA Interference (RNAi Community)

Toxicogenomics (Toxicogenomics WG)

Environmental Genomics (Environmental Genomics WG)

Nutrigenomics (Nutrigenomics WG)

Flow Cytometry (Flow Cytometry Community)

46

Fourth step (the future)

how to replicate the successes of the GO in clinical medicine:

choose two or three representative disease domains

work out reasoning challenges for those domains

work with specialists to create ontologies interoperable with OBO Foundry basic science ontologies to address these reasoning challenges

work with leaders of clinical trial initiatives to foster the collection of clinical data annotated in terms of these ontologies

Draft Ontology for Acute Respiratory Distress Syndrome

Draft Ontology for Multiple Sclerosis

what data do we have?

what data do the others have?

what data do we not have?

to apprehend what is unknown requires a complete demarcation of the relevant space of alternatives
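One way to read the three questions: if an ontology demarcates the space of alternatives (here, the kinds of data that could be collected for a disease), then what we have, what only others have, and what nobody has yet become simple set operations over that space. A minimal Python sketch with hypothetical data categories, not taken from either draft ontology; the function name gap_report is likewise hypothetical:

```python
# Hypothetical data categories; a real draft ontology would supply the space.
SPACE = {"imaging", "laboratory values", "symptom scores", "genotype"}
OURS = {"imaging", "symptom scores"}
OTHERS = {"symptom scores", "genotype"}


def gap_report(space: set[str], ours: set[str], others: set[str]) -> dict[str, list[str]]:
    """Partition a fully demarcated space of data categories by who holds data."""
    outside = (ours | others) - space
    if outside:
        # Data the ontology cannot classify means the space is not yet complete.
        raise ValueError(f"categories outside the demarcated space: {sorted(outside)}")
    return {
        "we have": sorted(ours),
        "only others have": sorted(others - ours),
        "nobody has": sorted(space - ours - others),
    }


for question, categories in gap_report(SPACE, OURS, OTHERS).items():
    print(f"{question}: {categories}")
```

Only the third entry depends on the complete space: without SPACE the first two questions can still be answered, but "what data do we not have?" cannot.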