
Post on 27-Mar-2015


http://ontologist.com

The OBO Foundry: A Gold Standard Approach to Ontology Evaluation

Barry Smith

http://ontology.buffalo.edu/smith


Two types of ontology

natural-science ontologies capture terminology-level knowledge underlying the best current science

contrasted with administrative ontologies (e.g. billing ontologies, blood bank ontologies, lab workflow ontologies) prepared for specific, local purposes


scientific ontologies have special features

Every term in a scientific ontology must be such that the developers of the ontology believe it to refer to some entity on the basis of the best current evidence

scientific ontologies are realism-based


For scientific ontologies

reusability is crucial

compatibility with neighboring scientific ontologies

it is generalizations that are important

= universals, types, kinds


An ontology is a representation of universals

We learn about universals in reality from looking at the results of scientific experiments in the form of scientific theories

experiments relate to what is particular; science describes what is general


what is the difference between an ontology and a scientific theory?

an ontology is also a terminological standardization

WHAT DOES THIS MEAN?


1st aspect: additivity

cell = def. plant cell, consisting of protoplast and cell wall; ... [Plant Ontology]

what happens when the users of the Plant Ontology need to consider bacterial pathogens in plants?


2nd aspect: calibration with reality

gold standard kilogram

the same universal is defined by reference either to some artifact or to some universal physical constant (for realists there is no problem here)


VIM: the International Vocabulary of Metrology

(i) repeated measurements always give rise to some variation in values; (ii) one can never be sure (fallibilism) that one has got the true value; hence (iii) there are no true values.

To keep happy those who dismiss the notion of the true value, the international community is agreeing to a set of terms which intentionally allow two possible interpretations

once again: bad philosophy leads to bad standards. Compare: http://ontology.buffalo.edu/medo/Wuesteria.pdf


from: The NIST Reference on Constants, Units and Uncertainty

The creation of the decimal Metric System at the time of the French Revolution and the subsequent deposition of two platinum standards representing the meter and the kilogram, on 22 June 1799, in the Archives de la République in Paris can be seen as the first step in the development of the present International System of Units.


from: The NIST Reference on Constants, Units and Uncertainty

In the 1860s Maxwell and Thomson ‘formulated the requirement for a coherent system of units with base units and derived units. In 1874 the British Association for the Advancement of Science introduced the CGS system, a three-dimensional coherent unit system based on the three mechanical units centimeter, gram and second, using prefixes ranging from micro to mega to express decimal submultiples and multiples. The following development of physics as an experimental science was largely based on this system.’


Base and Derived Units

Units based on undefined SI dimensions: meter, second, kilogram, ampere, candela, kelvin, mole.

Units based on defined SI dimensions: volume, area, velocity, acceleration, newton, joule, pascal, coulomb, farad, henry, hertz, lumen, lux, ohm, etc.

Dimensions can be multiplied and divided (meters/second).
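The point that dimensions can be multiplied and divided can be made concrete with a small sketch, assuming each dimension is encoded as a vector of integer exponents over the seven SI base dimensions: multiplying two dimensions adds their exponents, dividing subtracts them. The `Dimension` class and the `base` helper are illustrative names, not part of any units library.

```python
from dataclasses import dataclass

# The seven SI base dimensions: length, mass, time, electric current,
# thermodynamic temperature, amount of substance, luminous intensity.
BASE = ("L", "M", "T", "I", "Θ", "N", "J")


@dataclass(frozen=True)
class Dimension:
    exps: tuple  # one integer exponent per base dimension, in BASE order

    def __mul__(self, other):
        # Multiplying quantities adds the exponents of their dimensions.
        return Dimension(tuple(a + b for a, b in zip(self.exps, other.exps)))

    def __truediv__(self, other):
        # Dividing quantities subtracts the exponents.
        return Dimension(tuple(a - b for a, b in zip(self.exps, other.exps)))

    def __pow__(self, n):
        return Dimension(tuple(a * n for a in self.exps))

    def __str__(self):
        parts = [s if e == 1 else f"{s}^{e}"
                 for s, e in zip(BASE, self.exps) if e != 0]
        return "·".join(parts) or "1"  # "1" marks a dimensionless quantity


def base(symbol):
    """Dimension of one base quantity, e.g. base("L") for length."""
    return Dimension(tuple(1 if s == symbol else 0 for s in BASE))


LENGTH, MASS, TIME = base("L"), base("M"), base("T")

velocity = LENGTH / TIME        # meters/second
acceleration = velocity / TIME
force = MASS * acceleration     # dimension of the newton
energy = force * LENGTH         # dimension of the joule

print(velocity)  # L·T^-1
print(force)     # L·M·T^-2
```

Because exponents are plain integers, checking whether two expressions have the same dimension reduces to comparing tuples.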


The SI System of Units

is a qualitative ontology: it captures qualitative dimensions of reality to which quantities can be applied (it captures measurable dimensions of reality)

there is a degree of conventionality in the choice of basic vs. derived units, and in the standard [e.g. the Paris meter] that is used to define the unit in each dimension


but the dimensions themselves exist independently of our conventions

so that an ontology of these dimensions is a true representation of an independently existing reality


Quantities are Universals

Ingvar Johansson:

Many different things can simultaneously have a mass of 5 kg (length of 4 m, etc.).

Determinate quantities are universals, which means that they have many instances


Units Ontology

developed in conjunction with PATO, the Phenotypic Quality Ontology

obo.sourceforge.net/cgi-bin/detail.cgi?quality


fiat subtypes of qualities

[Diagram: is_a links among quality, spatial quality, length, weight and temperature, with fiat subtypes such as 1mm, 1cm, 1g, 1kg…]


Representation of measurements

[Diagram: a quality hierarchy (quality, spatial quality, length, weight, temperature; is_a) alongside a unit hierarchy (unit; mm, cm, kg, g), the two linked by measurement_of]
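The measurement diagram can be approximated in code as two lookup tables: `is_a` links between quality types and `measurement_of` links from units to the quality they measure. This is a hypothetical encoding; in particular, placing weight and temperature directly under quality is an assumption, since the slide layout does not preserve it.

```python
# is_a links between quality types (child -> parent).
IS_A = {
    "length": "spatial quality",
    "spatial quality": "quality",
    "weight": "quality",       # assumption: placement not recoverable from the slide
    "temperature": "quality",  # assumption, as above
}

# measurement_of links from units to the quality type they measure.
MEASUREMENT_OF = {"mm": "length", "cm": "length", "g": "weight", "kg": "weight"}


def ancestors(term):
    """Return the term together with everything above it in the is_a chain."""
    chain = [term]
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def measures(unit, quality):
    """True if a value in `unit` is a measurement of `quality` or one of its subtypes."""
    return quality in ancestors(MEASUREMENT_OF[unit])


print(measures("cm", "spatial quality"))  # True
print(measures("kg", "spatial quality"))  # False
```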


Ingvar Johansson:

(a) no object can possibly at one and the same time take two values of the same quantity dimension

(b) in case of additive quantities, only quantities of the same dimension can be added together to give rise to a sum: no material object can have two masses, and masses can only be added to other masses
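Johansson's two constraints can be enforced in a toy data model, sketched here under the assumption that every value of a given dimension is already expressed in a single unit (for example, all masses in kg). `Quantity` and `MaterialObject` are hypothetical names.

```python
class Quantity:
    """A numeric value paired with a dimension label such as 'mass' or 'length'."""

    def __init__(self, value, dimension):
        self.value = value
        self.dimension = dimension

    def __add__(self, other):
        # (b): only quantities of the same dimension can be added together.
        if self.dimension != other.dimension:
            raise TypeError(f"cannot add {other.dimension} to {self.dimension}")
        return Quantity(self.value + other.value, self.dimension)


class MaterialObject:
    def __init__(self):
        self._quantities = {}  # keyed by dimension: at most one value each

    def set_quantity(self, q):
        # (a): an object cannot take two values of the same dimension at once,
        # so a new mass replaces the old one instead of sitting beside it.
        self._quantities[q.dimension] = q

    def get(self, dimension):
        return self._quantities[dimension]


box = MaterialObject()
box.set_quantity(Quantity(5.0, "mass"))
box.set_quantity(Quantity(6.0, "mass"))  # replaces the earlier mass
print(box.get("mass").value)  # 6.0
```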


Controlled vocabulary

Each SI unit is represented by a symbol, not an abbreviation. The use of unit symbols is regulated by precise rules.

These symbols are the same in every language of the world, even though the names of the units themselves vary in spelling according to national conventions.
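Because symbols are language-independent while unit names are not, a controlled vocabulary can normalize any localized unit name to its symbol. A minimal sketch with a deliberately tiny, illustrative table of names (not an official list); the function `to_symbol` is a hypothetical helper.

```python
# A few localized names per SI symbol (illustrative, not exhaustive).
UNIT_NAMES = {
    "kg": {"en": "kilogram", "fr": "kilogramme", "de": "Kilogramm", "es": "kilogramo"},
    "m": {"en": "metre", "fr": "mètre", "de": "Meter", "es": "metro"},
    "s": {"en": "second", "fr": "seconde", "de": "Sekunde", "es": "segundo"},
}

# Reverse index from case-folded names to symbols.
_NAME_TO_SYMBOL = {
    name.casefold(): symbol
    for symbol, names in UNIT_NAMES.items()
    for name in names.values()
}


def to_symbol(label):
    """Normalize a unit symbol or a unit name in any listed language to its SI symbol."""
    if label in UNIT_NAMES:  # symbols are case-sensitive (mm is not Mm)
        return label
    try:
        return _NAME_TO_SYMBOL[label.casefold()]  # names are matched case-insensitively
    except KeyError:
        raise ValueError(f"unknown unit label: {label!r}") from None


print(to_symbol("Kilogramm"), to_symbol("mètre"), to_symbol("s"))  # kg m s
```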


The SI system of units gives you:

a gold standard controlled vocabulary for the expression of scientific results, which makes these results comparable and integrable:

my hypotheses can be checked against your data

my measuring equipment can be calibrated against your measuring equipment (because each can be calibrated against the same gold standard)

the SI system of units can serve as a gold standard because it is a true reflection of an independent reality
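The calibration argument can be illustrated numerically, assuming each instrument responds linearly, so that a gain and an offset fitted against readings of the same reference standard map its raw readings onto the reference scale. The function `calibrate` and all sample readings are hypothetical.

```python
def calibrate(raw_readings, reference_values):
    """Least-squares fit of reference = gain * raw + offset; returns a correction function."""
    n = len(raw_readings)
    mean_x = sum(raw_readings) / n
    mean_y = sum(reference_values) / n
    sxx = sum((x - mean_x) ** 2 for x in raw_readings)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(raw_readings, reference_values))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return lambda raw: gain * raw + offset


standard = [1.0, 2.0, 5.0]                           # reference masses (kg)
my_scale = calibrate([1.1, 2.1, 5.1], standard)      # reads 0.1 kg high
your_scale = calibrate([0.98, 1.96, 4.9], standard)  # reads 2 % low

# Once each is calibrated against the same standard, both report the
# same value for the same 3 kg object.
print(round(my_scale(3.1), 6), round(your_scale(2.94), 6))  # 3.0 3.0
```

Neither instrument is compared with the other directly; their agreement follows from each being tied to the shared standard.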


a system of units is a legend for measurement data

[Figure: measurement display labelled heart rate, cadence, speed, torque, power]


compare: legends for maps


Creating a system of units

is not easy; it has to match the way the measurable dimensions are interconnected in reality

it may need to be revised in light of new discoveries about how reality is structured


after Maxwell and Thomson

the subsequent development of physics as an experimental science was largely based on their system of standardized units.


analogous achievements also in chemistry

IUPAC

InChI

and in molecular biology,

for proteins, enzymes, genes, etc.

IUBMB

HUGO Gene Nomenclature Committee,

etc.


Periodic Table


the goal of realist ontology

to generalize this achievement, specifically in biology and in medicine (where forces are at work which tend to thwart standardization of vocabulary)

to move from standardizations of nouns to standardizations of sentences

gene expression data

realist ontologies are legends for data


where in the body?

what kind of disease process?

need for semantic annotation of data

in what kind of cell?


the Gene Ontology is already a de facto standard


natural language labels organized in a graph-theoretic structure, designed to make the data

cognitively accessible to human beings

algorithmically accessible to machines

linked up to other data resources because the same labels have been used
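The combination of human-readable labels and a machine-traversable graph is what supports semantic search: data annotated to a specific term is retrieved by a query for a more general one. A minimal sketch with a toy is_a chain and hypothetical gene annotations; the term labels follow GO usage, but no GO identifiers are assumed.

```python
# Toy ontology: labels for humans, is_a edges (child -> parents) for machines.
IS_A = {
    "osteoblast differentiation": ["cell differentiation"],
    "cell differentiation": ["biological process"],
}

# Hypothetical annotations of data items to ontology terms.
ANNOTATIONS = {
    "geneA": ["osteoblast differentiation"],
    "geneB": ["cell differentiation"],
    "geneC": ["biological process"],
}


def ancestors(term):
    """All terms reachable from `term` by following is_a edges, including itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(IS_A.get(t, []))
    return seen


def search(term):
    """Return the data items annotated to `term` or to any of its subtypes."""
    return sorted(item for item, terms in ANNOTATIONS.items()
                  if any(term in ancestors(t) for t in terms))


print(search("cell differentiation"))  # ['geneA', 'geneB']
```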


compare: legends for cartoons (for diagrams in scientific texts)


$x_i$ = vector of measurements of gene $i$; $k$ = the state of the gene (as “on” or “off”); $\theta_i$ = set of parameters of the Gaussian model; ...

ontologies are legends for mathematical equations


or chemistry diagrams

Prasanna, et al. Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds

PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)


annotation using common ontologies yields integration of databases

[Figure: databases MouseEcotope, GlyProt, DiabetInGene and GluChem integrated through common annotation to terms such as Holliday junction helicase complex]
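Integration through common annotation can be sketched as grouping records from independent sources by the ontology term they share. The two databases and their record identifiers below are hypothetical; only the term label is taken from the slide.

```python
from collections import defaultdict

# Two independently maintained (hypothetical) databases annotated with shared term labels.
db_a = [
    {"record": "A-001", "term": "Holliday junction helicase complex"},
    {"record": "A-002", "term": "nucleus"},
]
db_b = [
    {"record": "B-007", "term": "Holliday junction helicase complex"},
    {"record": "B-009", "term": "mitochondrion"},
]


def integrate(*databases):
    """Group records from all databases under the ontology term they are annotated with."""
    by_term = defaultdict(list)
    for db in databases:
        for row in db:
            by_term[row["term"]].append(row["record"])
    return dict(by_term)


merged = integrate(db_a, db_b)
print(merged["Holliday junction helicase complex"])  # ['A-001', 'B-007']
```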


What is mapping (1)

“Given two ontologies A and B, mapping one ontology with another means that for each concept (node) in ontology A, we try to find a corresponding concept (node), which has the same or similar semantics, in ontology B and vice versa.”

M. Ehrig and Y. Sure, Ontology mapping - an integrated approach. In Proceedings of the First European Semantic Web Symposium, ESWS 2004, volume 3053 of Lecture Notes in Computer Science, pages 76–91, Heraklion, Greece, May 2004. Springer Verlag.
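A naive reading of definition (1): for each concept label in A, take the most similar label in B, keeping the pair only above a similarity threshold. Plain string similarity (`difflib.SequenceMatcher` from the Python standard library) stands in here for "same or similar semantics"; this is an assumption of the sketch, not the method of Ehrig and Sure, and the labels are illustrative.

```python
from difflib import SequenceMatcher


def map_ontologies(labels_a, labels_b, threshold=0.8):
    """For each concept label in A, pick the most similar label in B if similar enough."""
    mapping = {}
    for a in labels_a:
        best, score = None, 0.0
        for b in labels_b:
            s = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if s > score:
                best, score = b, s
        if score >= threshold:
            mapping[a] = (best, round(score, 2))
    return mapping


ontology_a = ["lung", "lobe of lung", "osteoblast"]
ontology_b = ["Lung", "lung lobe", "bone-forming cell"]
print(map_ontologies(ontology_a, ontology_b))  # {'lung': ('Lung', 1.0)}
```

Only `lung` is matched; the synonym pairs score well below the threshold, which is one reason purely lexical matching needs an independent standard to be checked against.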


What is mapping (2)

“the task of relating the vocabulary of two ontologies in such a way that the mathematical structure of ontological signatures and their intended interpretations, as specified by the ontological axioms, are respected”.

[ontological signature = a hierarchy of concept symbols together with a set of relation symbols whose arguments are defined over the concepts of the concept hierarchy]

Y. Kalfoglou and M. Schorlemmer, Ontology mapping: the state of the art. Knowl. Eng. Rev., 18(1): 2003.


What is mapping (3)

“a formal expression that states the semantic relation between two entities belonging to different ontologies”

“Simple examples are: concept c1 in ontology O1 is equivalent to concept c2 in ontology O2; concept c1 in ontology O1 is similar to concept c2 in ontology O2; individual i1 in ontology O1 is the same as individual i2 in ontology O2”

P. Bouquet et al. KnowledgeWeb deliverable D2.2.1. Specification of a common framework for characterizing alignment.


One way to support ontology matching (and evaluation)

have experts manually prepare for each given matching problem a gold standard to which matching efforts could be compared.

– M. Ehrig and J. Euzenat, Relaxed Precision and Recall for Ontology Matching, in: Proc. K-Cap 2005 workshop on Integrating ontology, Banff (CA), p. 25-32, 2005.
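Comparing a matcher's output with an expert-prepared gold standard is usually scored with precision and recall over sets of correspondences. The sketch below computes the classic (strict) versions; the relaxed measures proposed by Ehrig and Euzenat generalize these and are not implemented here. Correspondences are modelled as `(entity_a, entity_b, relation)` triples, an assumption in the spirit of definition (3), and the sample data is illustrative.

```python
def precision_recall(found, reference):
    """Strict precision and recall of a found alignment against a gold-standard alignment."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


gold = {
    ("lung", "Lung", "="),
    ("lobe of lung", "lung lobe", "="),
    ("osteoblast", "bone-forming cell", "="),
}
found = {("lung", "Lung", "="), ("lobe of lung", "Lung", "=")}

p, r = precision_recall(found, gold)
print(p, round(r, 3))  # 0.5 0.333
```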


Gold standard methodology for ontology evaluation

is very expensive

who are the experts?

sometimes cannot be done for political reasons

• UMLS Metathesaurus

even a gold standard can contain errors


Solution: The OBO Foundry

1. some large pieces already exist (especially Gene Ontology, Foundational Model of Anatomy)

2. processes of unification and reform already in place

3. all participants aiming for additivity

4. procedures for constant update in light of scientific advance

http://obofoundry.org


science basis of the GO: trained experts curating peer-reviewed literature

RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form

Contrast: data-mining based approaches to ontology construction

The GO methodology of annotations


Systematic annotation of references to gene products in literature

• leads to improvements and extensions of the ontology
• leads to better annotations
• leads to a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself


Five bangs for your GO buck

science base

cross-species database integration

cross-granularity database integration

through links to the entities in biological reality

semantic searchability links people to software


a shared portal for (so far) 58 ontologies (low regimentation)

http://obo.sourceforge.net NCBO BioPortal

First step (2003)


Second step (2004)

reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
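The osteoblast entry is a term stanza in the OBO flat-file format, one `tag: value` pair per line. A simplified parser follows, assuming a single stanza without the `[Term]` header and with no escaped quotes inside `def`; it tolerates trailing `!` comments. Real OBO files carry more syntax (dbxref lists, qualifiers), so a dedicated OBO parser is preferable for anything beyond illustration.

```python
from collections import defaultdict

STANZA = '''id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375'''


def parse_stanza(text):
    """Parse the 'tag: value' lines of one OBO term stanza into a dict of lists."""
    term = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        tag, _, value = line.partition(": ")
        if tag == "def":
            term[tag].append(value.split('"')[1])  # keep only the quoted definition
            continue
        value = value.split(" !", 1)[0].strip()    # drop a trailing '! comment'
        if tag == "relationship":
            rel, target = value.split(maxsplit=1)  # e.g. develops_from CL:0000008
            term[rel].append(target)
        else:
            term[tag].append(value)
    return dict(term)


osteoblast = parse_stanza(STANZA)
print(osteoblast["develops_from"])  # ['CL:0000008', 'CL:0000375']
```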


The OBO Foundry

http://obofoundry.org/

Third step (2006)


A prospective standard designed to guarantee interoperability of ontologies from the very start (contrast to: post hoc mapping)

established March 2006

12 initial candidate OBO ontologies – focused primarily on basic science domains

several being constructed ab initio by influential consortia who have the authority to impose their use on large parts of the relevant communities.


undergoing rigorous reform:
GO Gene Ontology
ChEBI Chemical Ontology
CL Cell Ontology
FMA Foundational Model of Anatomy
PaTO Phenotype Quality Ontology
SO Sequence Ontology

new:
CARO Common Anatomy Reference Ontology
CTO Clinical Trial Ontology
FuGO Functional Genomics Investigation Ontology
PrO Protein Ontology
RnaO RNA Ontology
RO Relation Ontology



Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck


to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation of new ontologies by each clinical research group

REUSABILITY: if data-schemas are formulated using a single well-integrated framework ontology system in widespread use, then these data will to this degree themselves become more widely accessible and usable

GOALS



to serve as BENCHMARK FOR IMPROVEMENTS: once a system of interoperable reference ontologies is there, it will make sense to calibrate existing terminologies in its terms in order to achieve more robust alignment and greater domain coverage

GOALS



Gold standard

Two aspects:

1. an expression of practice carried out perfectly (for example, the optimal therapy for a given medical problem)

2. based on complete acceptance or consensus: everyone qualified to render a judgement would agree to what the gold standard is.

Friedman CP, Wyatt J. Evaluation Methods in Medical Informatics


Gold standards

are worth approximating. That is, “tarnished” or “fuzzy” standards are better than no standards at all. ... studies comparing the performance of information resources against imperfect standards, so long as the degree of imperfection has been estimated, represent a stronger approach than those that bypass the issue of a standard altogether.

Friedman CP, Wyatt J. Evaluation Methods in Medical Informatics


Gold standards

can also be partial: to serve ontology matching and evaluation it is enough to have ontologies comprehending even selected aspects of biomedical reality, provided the assertions contained in these ontologies are universally true

in non-closed worlds, gold standards will always be partial

in complex disciplines, gold standards will always be evolving


the constraint of universality

OBO Foundry ontologies accept only those relations between their terms which obtain universally (= for all instances)

lung is_a anatomical structure

lobe of lung part_of lung

Compare:

electrons have a negative electric charge

electrons have a negative electric charge of $1.6 \times 10^{-19}$ coulomb
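The universality constraint can be read as an all-some pattern: `A part_of B` holds only if every instance of A is part of some instance of B. A sketch that checks this against hypothetical instance data; the instance names and the helper `holds_universally` are invented for illustration.

```python
# Hypothetical instance-level data: the type of each particular, and
# part_of links between particulars.
instance_type = {
    "lobe1": "lobe of lung", "lobe2": "lobe of lung",
    "lungA": "lung", "lungB": "lung",
}
part_of = {("lobe1", "lungA"), ("lobe2", "lungB")}


def holds_universally(type_a, relation, type_b):
    """True if EVERY instance of type_a stands in `relation` to SOME instance of type_b."""
    instances_a = [i for i, t in instance_type.items() if t == type_a]
    return all(
        any((i, j) in relation for j, t in instance_type.items() if t == type_b)
        for i in instances_a
    )


print(holds_universally("lobe of lung", part_of, "lung"))  # True
```

A single lobe instance with no part_of link to any lung would make the check fail, which is exactly why only universally true relations are admitted.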


Principle of Low Hanging Fruit

Ontologies should include even absolutely trivial assertions (assertions you know to be universally true)

herpes virus is_a virus

Computers need to be led by the hand
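Why trivial assertions matter: a program can only follow the is_a edges that are actually asserted. In this sketch `herpes simplex virus` is an illustrative extra term, and `is_a_closure` a hypothetical helper; without the edge `herpes virus is_a virus`, the query "is herpes simplex virus a virus?" fails.

```python
def is_a_closure(edges, term):
    """All supertypes of `term` reachable through asserted is_a edges."""
    result, stack = set(), [term]
    while stack:
        for parent in edges.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


edges_missing = {"herpes simplex virus": ["herpes virus"]}
edges_full = {"herpes simplex virus": ["herpes virus"], "herpes virus": ["virus"]}

print("virus" in is_a_closure(edges_missing, "herpes simplex virus"))  # False
print("virus" in is_a_closure(edges_full, "herpes simplex virus"))     # True
```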


if the standard is to work, it has to simulate the achievements of the SI system of units:

• simple
• controlled vocabulary
• wide acceptance
• uncontroversial
• allows cross-disciplinary, cross-experimenter calibration
• my data can confirm or disconfirm your hypothesis