1 Part I: Biomedical Ontologies: A Critical Survey Barry Smith .

Posted: 19-Dec-2015

Page 1

Part I: Biomedical Ontologies: A Critical Survey

Barry Smith

http://ontology.buffalo.edu/smith

Page 2

I. Biomedical Ontologies: A Critical Survey

Ontologies, terminologies and thesauri are now in common use in the domain of biomedical informatics. Their goal is to support search and retrieval, but also to advance genuine reasoning about biomedical phenomena and to enable re-use of heterogeneous data through the use of common systems of annotation. We examine a representative collection of biomedical ontologies in light of these criteria, and draw (somewhat sad) conclusions as to the current state of the field.

II. The Ontology of Biomedical Reality (terminology)

Ontologies to support scientific research and clinical medicine have special characteristics, which we shall outline in terms of a distinction between three levels: (1) the level of reality; (2) the level of cognitive representations; and (3) the level of the publicly accessible concretizations of such cognitive representations, for example in ontologies. Against this background we shall clarify the relations between ontologies, terminologies, information models, databases, and similar artifacts.

III. The OBO Foundry Project: Towards Scientific Standards and Principles-Based Coordination in Biomedical Ontology Development

The OBO Foundry is a collaborative experiment, involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. The primary objective is to establish gold standard reference ontologies, one for each core domain of biomedical science. We shall describe how this objective is already being realized, and show how it can not only help solve the problems of data retrieval and re-use but also foster the development of the powerful tools that will be needed to reason with biomedical data in the future.

Page 3

Problem: how to reason with data deriving from different sources, each of which uses its own system of classification?

Page 4

Solution:

Ontology!

Page 5

Examples of current needs for ontologies in biomedicine

to enforce semantic consistency within a database

to enable data retrieval, sharing and re-use

to enable data integration (bridging across data at multiple granularities)

to allow querying

Page 6

General trend

on the part of NIH, FDA and other bodies to consolidate ontology-based standards for the communication and processing of biomedical data.

Page 7

Old approach

gather terminologies in libraries

Unified Medical Language System

National Library of Medicine

Page 8

SNOMED

DEMONS

UMLS

Page 9

New Approach

MusicBeanz

Page 10

http://www.w3.org/

Page 11

Semantic Web deposits

Pet Profile Ontology

Review Vocabulary

Band Description Vocabulary

Musical Baton Vocabulary

MusicBrainz Metadata Vocabulary

Kissology

Page 12

http://www.w3.org/

Beer Ontology

all instances of hops that have ever existed are necessarily ingredients of beer.

Page 13

Both UMLS- and OWL-type responses involve ad hoc creation of new terminologies by each separate community, and an open-door policy for admission

Many of these terminologies remain as torsos, gather dust, poison the wells, ...

Page 14

OWL’s syntactic regimentation is not enough to ensure high-quality ontologies

– the use of a common syntax and logical machinery and the careful separating out of ontologies into namespaces does not solve the problem of ontology integration

Page 15

from Ontological Engineering

location =def. a spatial point identified by a name (p. 12)

arrivalPlace =def. a journey ends at a location (p. 13)

facet = def. ternary relation that holds between a frame, a slot, and the facet (p. 51)

an example of function is Pays, which obtains the price of a room after applying a discount (p. 13)

Page 16

from Handbook of Ontology

On ‘achieving consistency from multiple sources’: if exact semantic identity is lacking, terms can be unified at a higher level, and information that is possibly related can be retrieved as well. When the application objective is to study and understand, the end-user can reject misleading records. (p. 94)

owl:InverseFunctionalProperty defines a property that for which two different objects cannot have the same value, e.g. isTheSocialSecurityNumberOf (a social number is assigned to one person only) (p. 78)
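The OWL 2 semantics make the problem with this example easy to see. A property P is functional if each subject has at most one P-value. It is inverse-functional if each value is the P-value of at most one subject. For isTheSocialSecurityNumberOf(ssn, person), the justification quoted above ("a social number is assigned to one person only") is a functionality condition.

The sketch below is a minimal Python illustration that tests both conditions over a finite list of (subject, value) pairs. It is a closed-world check: an OWL reasoner facing a violated functional property would instead infer that the two values denote the same individual. The SSNs and names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple

def is_functional(pairs: Iterable[Tuple[str, str]]) -> bool:
    """Closed-world check: every subject has at most one value."""
    values = defaultdict(set)
    for subject, value in pairs:
        values[subject].add(value)
    return all(len(v) <= 1 for v in values.values())

def is_inverse_functional(pairs: Iterable[Tuple[str, str]]) -> bool:
    """Closed-world check: every value is held by at most one subject."""
    return is_functional((value, subject) for subject, value in pairs)

# Hypothetical assertions isTheSocialSecurityNumberOf(ssn, person).
ssn_of = [("ssn-001", "alice"), ("ssn-002", "bob")]
# One SSN wrongly assigned to a second person:
bad = ssn_of + [("ssn-001", "carol")]

print(is_functional(ssn_of), is_functional(bad))                  # True False
print(is_inverse_functional(ssn_of), is_inverse_functional(bad))  # True True
```

The violation introduced in `bad` is exactly the case that the quoted justification rules out. It is caught by the functionality test, not by the inverse-functionality test.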

Page 17

SNOMED

DEMONS

UMLS

The Good, the Bad, and the UGLY

Page 18

A methodology for quality-assurance of ontologies

tested thus far in the biomedical domain on:

FMA

GO + other OBO Ontologies

FuGO

SNOMED

UMLS Semantic Network

NCI Thesaurus

ICF (International Classification of Functioning, Disability and Health)

ISO Terminology Standards

HL7-RIM

Page 19

The Good

Foundational Model of Anatomy (FMA)

Pro

Clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule

Powerful treatment of definitions, from which the entire FMA hierarchy is generated; can serve as a basis for formal reasoning

Con

Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)

Page 20

it’s better manually

Page 21

[Figure: FMA fragment around the pleura, showing is_a and part_of links among Pleural Cavity, Interlobar Recess, Mesothelium of Pleura, Pleura (Wall of Sac), Visceral Pleura, Parietal Pleura, Mediastinal Pleura, Pleural Sac, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Anatomical Structure, Anatomical Space and Tissue]

Page 22

The Foundational Model of Anatomy

Follows formal rules for ‘Aristotelian’ definitions

When A is_a B, the definition of ‘A’ takes the form:

an A =def. a B which ...

a human being =def. an animal which is rational

Page 23

FMA Example

Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus

Plasma membrane =def. a cell part that surrounds the cytoplasm

Page 24

The FMA regimentation

Each definition reflects the position in the hierarchy to which a defined term belongs.

The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.

The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
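The claim that a term's position "incorporates automatically the definitions of all the terms above it" can be made concrete with a small sketch. It stores each definition as a (genus, differentia) pair and unfolds a term by walking up its genus chain.

This is not the FMA's own tooling. The "cell" entry follows the FMA example above. "anatomical structure" is treated as an undefined root. "nucleated cell" is a hypothetical child term, added only so the chain has two levels.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Definition:
    genus: str        # B in "an A =def. a B which C"
    differentia: str  # the "which C" clause

DEFS = {
    "cell": Definition(
        "anatomical structure",
        "consists of cytoplasm surrounded by a plasma membrane "
        "with or without a cell nucleus"),
    # HYPOTHETICAL child term, for illustration only.
    "nucleated cell": Definition("cell", "has a cell nucleus"),
}

def expand(term: str) -> str:
    """Unfold a term's definition by following its genus chain to an undefined root."""
    differentiae = []
    seen = set()
    while term in DEFS:
        if term in seen:
            raise ValueError(f"cyclic genus chain at {term!r}")
        seen.add(term)
        differentiae.append(DEFS[term].differentia)
        term = DEFS[term].genus
    if not differentiae:
        return term  # undefined (primitive) term: nothing to unfold
    # Most general differentia first, so the text reads top-down.
    return f"{term} which " + " and which ".join(reversed(differentiae))

print(expand("nucleated cell"))
```

Because every definition names its genus, the unfolded text of "nucleated cell" contains the whole definition of "cell" without anyone restating it.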

Page 25

Principle

Use Aristotelian definitions

An A is a B which C’s.

Page 26

Intermediate

GALEN

Pro

Allows formal representation of clinical information

Allows multiple views of relevant detail as needed

Uses powerful Description Logic (DL)-based formal structure

Makes definitions easy to formulate

Con

Remains only partially developed

Contains errors: Vomitus contains carrot

– which DLs did not prevent

Page 27

Principle

An ontology should not remain a torso

Page 28

Principle

An ontology should have a properly personed help desk

Page 29

Principle

An ontology should have procedures for updating in light of scientific advance

Page 30

Intermediate

The Gene Ontology

Con

Poor formal architecture

Full of errors

menopause part_of death

Poor support for automatic reasoning and error-checking

Poor treatment of definitions

Not trans-granular

No relation to time or instances

Page 31

The Gene Ontology

Pro

Open Source

Cross-Species

... has recognized the need for reform, including explicit representation of granular levels

Page 32

Old GO Definitions

hemolysis =def. the causes of hemolysis

Page 33

GO is now adopting structured definitions which contain both genus and differentiae

Species =def Genus + Differentiae

neuron cell differentiation =def. differentiation by which a cell acquires features of a neuron

Page 34

Ontology alignment

One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology:

Cell types in GO → Cell types in the Cell Ontology

cone cell fate commitment → retinal_cone_cell

keratinocyte differentiation → keratinocyte

adipocyte differentiation → fat_cell

dendritic cell activation → dendritic_cell

lymphocyte proliferation → lymphocyte

T-cell homeostasis → T_lymphocyte

garland cell differentiation → garland_cell

heterocyst cell differentiation → heterocyst

Page 35

Alignment of the two ontologies will permit the generation of consistent and complete definitions

Cell type (Cell Ontology):

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
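The "GO + Cell type = New Definition" step can be sketched mechanically. The sketch reads the Cell Ontology stanza (OBO flat-file format, one `tag: value` per line) and composes a differentiation definition from the cell type's name and the first sentence of its definition.

It is a hypothetical illustration, not GO's actual pipeline. The precursor labels are passed in by hand from the slide's new definition rather than looked up from the develops_from IDs. The a/an choice is a naive vowel heuristic. The output matches the slide's new definition up to small wording differences.

```python
import re
from collections import defaultdict
from typing import Dict, List

STANZA = """\
[Term]
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""

def parse_stanza(text: str) -> Dict[str, List[str]]:
    """Collect the 'tag: value' lines of one OBO stanza (tags may repeat)."""
    tags: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(":")
        tags[tag.strip()].append(value.strip())
    return dict(tags)

def with_article(noun: str) -> str:
    """Naive a/an choice based on the first letter."""
    return ("an " if noun[0].lower() in "aeiou" else "a ") + noun

def differentiation_def(stanza: Dict[str, List[str]], precursors: List[str]) -> str:
    """Compose an 'X differentiation' definition from a cell-type stanza.
    `precursors` are labels supplied by hand (hypothetical input)."""
    name = stanza["name"][0]
    quoted = re.match(r'"([^"]*)"', stanza["def"][0])
    if quoted is None:
        raise ValueError("def: value has no quoted text")
    first_sentence = quoted.group(1).split(". ")[0].rstrip(".")
    genus_clause = first_sentence[0].lower() + first_sentence[1:]
    sources = " or ".join(with_article(p) for p in precursors)
    return (f"Processes whereby {sources} acquires the specialized features "
            f"of {with_article(name)}, {genus_clause}.")

stanza = parse_stanza(STANZA)
print(differentiation_def(stanza, ["osteoprogenitor cell", "cranial neural crest cell"]))
```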

Page 36

Other Ontologies to be aligned with GO

Chemical ontologies: 3,4-dihydroxy-2-butanone-4-phosphate synthase activity

Anatomy ontologies: metanephros development

Page 37

Principle

Exploit existing ontologies when formulating definitions

Page 38

The Bad

Reactome

Pro

Rich catalogue of biological processes

Con

Incoherent treatment of categories:

ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). Similarly CatalystActivity is a sibling of Event.

Page 39

Principle

An ontology should be in agreement with the truths of basic science (e.g. that molecules are physical entities)

Page 40

The Ugly

Disease Ontology / ICD-10

Other problems with special functions

Tuberculosis of unspecified bones and joints, tubercle bacilli not found by bacteriological or histological examination, but tuberculosis confirmed by other methods (inoculation of animals)

Other mineral salts, not elsewhere classified, causing adverse effects in therapeutic use

Other general medical examination for administrative purposes

Assault by other specified means

Page 41

The Ugly

Disease Ontology / ICD-10

Other accidental submersion or drowning in water transport accident injuring other specified person

Accident to powered aircraft, other and unspecified, injuring occupant of military aircraft, any rank

Other accidental submersion or drowning in water transport accident injuring occupant of other watercraft - crew

Page 42

The Ugly

Disease Ontology / ICD-10

Normal pregnancy

Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered

Railway accident involving collision with rolling stock and injuring pedal cyclist

Injury due to war operations by lasers

Nontraffic accident involving motor-driven snow vehicle injuring pedestrian 

Page 43

The Ugly

Disease Ontology / ICD-10

Donors of other specified organ or tissue

Fitting and adjustment of wheelchair

Hot (boiling) tap water

Training in use of lead dog for the blind

Person consulting on behalf of another person

Page 44

Principle

An ontology should have a clearly specified domain (captured by its root node)

Page 45

“Circular Hierarchical Relationships in the UMLS: Etiology, Diagnosis, Treatment, Complications and Prevention”

Olivier Bodenreider

Topographic regions: General terms

Physical anatomical entity

Anatomical spatial entity

Anatomical surface

Body regions

Topographic regions

Page 46

Principle

Avoid cycles
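Cycles in an is_a hierarchy, like the circular chain reported for the UMLS, can be found automatically. A minimal sketch, assuming the hierarchy is available as a child → parents mapping, uses the standard three-colour depth-first search and returns one offending cycle.

The node names A–D are hypothetical placeholders, not the actual UMLS links. The recursion is fine for small graphs; a very deep hierarchy would call for an iterative version.

```python
from typing import Dict, List, Optional

def find_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in an is_a graph (child -> parents), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current path / finished
    colour: Dict[str, int] = {}
    path: List[str] = []           # nodes on the current DFS path

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for parent in parents.get(node, []):
            state = colour.get(parent, WHITE)
            if state == GREY:      # back edge: parent is already on the path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(parents):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# HYPOTHETICAL hierarchies, for illustration only.
print(find_cycle({"A": ["B"], "B": ["C"]}))                          # None
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}))  # ['A', 'B', 'C', 'A']
```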

Page 47

MeSH

National Socialism is_a Political Systems

National Socialism is_a Anthropology ...

Page 48

Principle

Use singular nouns

Page 49

MeSH

National Socialism is_a MeSH Descriptor

Page 50

Plant Ontology

cell = def. structural and physiological unit of a living organism; it (i.e., plant cell) consists of protoplast and cell wall; ...

Page 51

Principle

For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings

(Don’t use ‘cell’ when you mean ‘plant cell’)

Page 52

ICNP: International Classification of Nursing Procedures

water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings.

Page 53

MORE UGLY

National Cancer Institute Thesaurus

(NCIT)

Page 54

The NCIT reflects a recognition of the need for high-quality shared ontologies and terminologies, the use of which by clinical researchers in large communities can ensure re-usability of data collected by different research groups

Page 55

NCIT

“a biomedical vocabulary that provides consistent, unambiguous codes and definitions for concepts used in cancer research”

“exhibits ontology-like properties in its construction and use”.

Page 56

Goals

to make use of current terminology “best practices” to relate relevant concepts to one another in a formal structure, so that computers as well as humans can use the Thesaurus for a variety of purposes, including the support of automatic reasoning;

to speed the introduction of new concepts and new relationships in response to the emerging needs of basic researchers, clinical trials, information services and other users.

Page 57

Formal Definitions

Of 37,261 nodes, 33,720 were stipulated to be primitive in the DL sense

Thus only a small portion of the NCIT ontology (the remaining 3,541 nodes, about 9.5%) can be used for purposes of automatic classification and error-checking by using OWL.

Page 58

Principle

Supply definitions wherever possible

(both human-understandable natural language definitions, and equivalent formal definitions)
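Why do primitive nodes block automatic classification? In description-logic terms, a primitive class states only necessary conditions. A defined class states necessary and sufficient ones, so a reasoner can infer that something falls under it.

The sketch below is a deliberately simplified structural-subsumption check written for illustration, not an OWL reasoner. The classes and their conditions are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

@dataclass(frozen=True)
class Cls:
    name: str
    conditions: FrozenSet[str]  # properties every instance must have
    defined: bool               # True: conditions are necessary AND sufficient

def inferred_superclasses(c: Cls, ontology: List[Cls]) -> Set[str]:
    """Toy structural subsumption: a *defined* class D is inferred to subsume C
    when C carries every condition in D's definition. A primitive class states
    only necessary conditions, so membership in it is never inferred."""
    return {d.name for d in ontology
            if d is not c and d.defined and d.conditions <= c.conditions}

# HYPOTHETICAL classes, for illustration only.
tb = Cls("Tuberculosis",
         frozenset({"infection", "caused by bacterium",
                    "caused by Mycobacterium tuberculosis"}),
         defined=False)
bacterial_defined = Cls("Bacterial Infection",
                        frozenset({"infection", "caused by bacterium"}), defined=True)
bacterial_primitive = Cls("Bacterial Infection",
                          frozenset({"infection", "caused by bacterium"}), defined=False)

print(inferred_superclasses(tb, [tb, bacterial_defined]))    # {'Bacterial Infection'}
print(inferred_superclasses(tb, [tb, bacterial_primitive]))  # set()
```

The two runs use identical conditions and differ only in the `defined` flag. That flag alone decides whether the subsumption can be discovered rather than hand-asserted.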

Page 59

Verbal Definitions

About half the NCIT terms are assigned verbal definitions

Unfortunately some are assigned more than one

Page 60

Disease Progression

Definition 1: Cancer that continues to grow or spread.

Definition 2: Increase in the size of a tumor or spread of cancer in the body.

Definition 3: The worsening of a disease over time. This concept is most often used for chronic and incurable diseases where the stage of the disease is an important determinant of therapy and prognosis.

Page 61

Principle

Each term should have at most one definition*

*which may have both natural-language and formal versions
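A terminology export can be screened mechanically for violations of this principle. The minimal sketch below assumes the export yields (term, natural-language definition) pairs; formal versions should be excluded, since the principle permits one of each.

The sample data are the NCIT definitions quoted on the surrounding slides, truncated to their first sentence.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def multiply_defined(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {term: definitions} for every term carrying more than one
    distinct natural-language definition (exact repeats count once)."""
    defs: Dict[str, List[str]] = defaultdict(list)
    for term, text in entries:
        if text not in defs[term]:
            defs[term].append(text)
    return {t: d for t, d in defs.items() if len(d) > 1}

entries = [
    ("Disease Progression", "Cancer that continues to grow or spread."),
    ("Disease Progression", "Increase in the size of a tumor or spread of cancer in the body."),
    ("Disease Progression", "The worsening of a disease over time."),
    ("Cancer Progression", "The worsening of a cancer over time."),
]
for term, defs in multiply_defined(entries).items():
    print(term, len(defs))  # Disease Progression 3
```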

Page 62

To make matters worse, Disease Progression has as a subclass:

Cancer Progression

Definition:

The worsening of a cancer over time. This concept is most often used for incurable cancers where the stage of the cancer is an important determinant of therapy and prognosis.

Page 63

Cancer

a process (of getting better or worse)

an object (which can grow and spread)

Page 64

Principle

Distinguish continuant entities (molecule, cell, tumor, organism) from occurrent entities (processes of growth, change, ...)

Page 65

Two kinds of entities

occurrents (processes, events, happenings)

cell division, ovulation, death

continuants (objects, qualities, ...)

cell, ovum, organism, temperature of organism, ...
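If continuant and occurrent are treated as disjoint top-level categories, the double reading of Cancer noted earlier becomes a mechanically detectable error. A minimal sketch, assuming the hierarchy is a child → parents mapping, flags every term that has ancestors in both categories.

The hierarchy below is hypothetical and only mirrors the two readings of cancer listed on the slides.

```python
from typing import Dict, List, Set

def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable upward from `term` through is_a links."""
    seen: Set[str] = set()
    todo = list(parents.get(term, []))
    while todo:
        t = todo.pop()
        if t not in seen:
            seen.add(t)
            todo.extend(parents.get(t, []))
    return seen

def straddlers(parents: Dict[str, List[str]]) -> List[str]:
    """Terms classified under both continuant and occurrent (which should be disjoint)."""
    return sorted(t for t in parents
                  if {"continuant", "occurrent"} <= ancestors(t, parents))

# HYPOTHETICAL hierarchy, for illustration only.
parents = {
    "object": ["continuant"],
    "process": ["occurrent"],
    "cell": ["object"],
    "cell division": ["process"],
    "cancer": ["object", "process"],  # the conflation the principle forbids
}
print(straddlers(parents))  # ['cancer']
```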

Page 66

NCIT confuses definitions with descriptions

Tuberculosis

Definition: A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis. Tuberculosis (TB) may affect almost any tissue or organ of the body with the lungs being the most common site of infection. The clinical stages of TB are primary or initial infection, latent or dormant infection, and recrudescent or adult-type TB. Ninety to 95% of primary TB infections may go unrecognized. Histopathologically, tissue lesions consist of granulomas which usually undergo central caseation necrosis. Local symptoms of TB vary according to the part affected; acute symptoms include hectic fever, sweats, and emaciation; serious complications include granulomatous erosion of pulmonary bronchi associated with hemoptysis. If untreated, progressive TB may be associated with a high degree of mortality. This infection is frequently observed in immunocompromised individuals with AIDS or a history of illicit IV drug use.

Page 67

Confuses definitions with descriptions

Tuberculosis

Definition: A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis. Tuberculosis (TB) may affect almost any tissue or organ of the body with the lungs being the most common site of infection. The clinical stages of TB are primary or initial infection, latent or dormant infection, and recrudescent or adult-type TB. Ninety to 95% of primary TB infections may go unrecognized. Histopathologically, tissue lesions consist of granulomas which usually undergo central caseation necrosis. Local symptoms of TB vary according to the part affected; acute symptoms include hectic fever, sweats, and emaciation; serious complications include granulomatous erosion of pulmonary bronchi associated with hemoptysis. If untreated, progressive TB may be associated with a high degree of mortality. This infection is frequently observed in immunocompromised individuals with AIDS or a history of illicit IV drug use.

Page 68

A better definition

Tuberculosis

Definition:

A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis.

IS THIS CORRECT? (An infection is not a disease)

Page 69

the use-mention confusion

Conceptual Entities =Def.

An organizational header for concepts representing mostly abstract entities.

Confuses use and mention (swimming is healthy and has eight letters)

Page 70

Principle

Don’t confuse an entity with the name of an entity

Page 71

Duratec, Lactobutyrin, Stilbene Aldehyde

are classified by the NCIT as Unclassified Drugs and Chemicals

Page 72

Problematic synonyms

Anatomic Structure, System, or Substance ~ Anatomic Structures and Systems

Does ‘anatomic’ apply only to structure or also to system and substance?

Biological Function ~ Biological Process

Some biological processes are the exercises of biological functions; others (e.g. pathological processes, side effects) are not

Genetic Abnormality ~ Molecular Abnormality (with subtype: Molecular Genetic Abnormality) (definitions not supplied)

Page 73

Three disjoint classes of plants

Vascular Plant

Non-vascular Plant

Other Plant

Page 74

Three kinds of cells

Abnormal Cell is a top-level class (thus not subsumed by Cell)

Normal Cell is a subclass of Microanatomy.

Cell is a subclass of Other Anatomic Concept (so that cells themselves are concepts)

Page 75

NCIT as now constituted will block automatic reasoning

Neither Normal Cells nor Abnormal Cells are Cells within the context of the NCIT
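The consequence for reasoning can be checked with a reachability query over the three placements described under "Three kinds of cells".

The sketch includes only those is_a links. It assumes that no further links in the full NCIT connect Normal Cell or Abnormal Cell to Cell.

```python
from typing import Dict, List

# Only the is_a links described on the "Three kinds of cells" slide.
NCIT_FRAGMENT: Dict[str, List[str]] = {
    "Normal Cell": ["Microanatomy"],
    "Cell": ["Other Anatomic Concept"],
    "Abnormal Cell": [],  # top-level class
}

def subsumed_by(child: str, ancestor: str, parents: Dict[str, List[str]]) -> bool:
    """True if `ancestor` is reachable from `child` via is_a links (reflexive)."""
    stack, seen = [child], set()
    while stack:
        t = stack.pop()
        if t == ancestor:
            return True
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return False

for kind in ("Normal Cell", "Abnormal Cell"):
    print(kind, subsumed_by(kind, "Cell", NCIT_FRAGMENT))  # both print False
```

A query for "all Cells" issued against this structure would therefore miss every Normal Cell and Abnormal Cell.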

Page 76

Some consolations

NCIT is open source

NCIT has broad coverage

NCIT has some formal structure (OWL-DL)

NCIT is much, much better than (for example) the HL7-RIM

NCIT has realized the error of its ways

Page 77

What might have been

http://www.cbd-net.com/index.php/search/show/938464

= “Review of NCI Thesaurus and Development of Plan to Achieve OBO Compliance”

Page 78

Fragment of Pre-NCIT Hierarchy

Murine Tissue Type

Body Fluids and Substances (MMHCC)

Cardiovascular System (MMHCC)

Blood Vessel (MMHCC)

Heart (MMHCC)

Digestive System (MMHCC)

Welcome to the Pre-NCIT: http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do

Page 79

More UGLY

Page 80

MeSH

MeSH Descriptors

Index Medicus Descriptor

Anthropology, Education, Sociology and Social Phenomena (MeSH Category)

Social Sciences

Political Systems

National Socialism

National Socialism is_a Political Systems

National Socialism is_a Anthropology ...

Page 81

MeSH

National Socialism is_a MeSH Descriptors

The Bodenreider Defence:

MeSH is not an ontology

Page 82

BIRNLex

Page 83

BIRNLex

The eye =def.

The eyeball and its constituent parts, e.g. retina

mouse =def.

common name for the species mus musculus

Page 84

BIRNLex

Page 85

BIRNLex

Page 86

Principle

Avoid circular definitions

(The term defined should not appear in its own definition)
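The most blatant violations, such as the old GO definition of hemolysis, can be caught lexically. A minimal sketch checks whether the defined term occurs as a whole word or phrase in its own definition, ignoring case.

It is purely string-based. It does not flag BIRNLex's definition of the eye via "the eyeball", which shows how little a lexical test can catch.

```python
import re

def is_circular(term: str, definition: str) -> bool:
    """True if `term` occurs as a whole word or phrase in its own definition
    (case-insensitive). A purely lexical check."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, definition, flags=re.IGNORECASE) is not None

print(is_circular("hemolysis", "the causes of hemolysis"))                         # True
print(is_circular("plasma membrane", "a cell part that surrounds the cytoplasm"))  # False
print(is_circular("eye", "The eyeball and its constituent parts, e.g. retina"))    # False
```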

Page 87

The UMLS Semantic Network

Page 88

More Ugly

UMLS Semantic Network

Pros

Broad coverage; no multiple inheritance

Cons

Incoherent use of ‘conceptual entities’

(e.g. the digestive system as a conceptual part of the organism)

Full of errors

Page 89

UMLS Semantic Network

Edges in the graph represent merely “possible significant (= some-some) relations”:

Bacterium causes Experimental Model of Disease

Experimental Model of Disease affects Fungus

Experimental Model of Disease is_a Pathologic Function

Page 90

UMLS Semantic Network

Unclear what the nodes of the graph are:

Drug Delivery Device contains Clinical Drug

Drug Delivery Device narrower_in_meaning_than Manufactured Object

The use-mention confusion: “Swimming is healthy and has 8 letters”

Page 91

UMLS Semantic Network

Edges in the graph represent merely “possible significant (= some-some) relations”:

Bacterium causes Experimental Model of Disease

Experimental Model of Disease affects Fungus

Experimental Model of Disease is_a Pathologic Function

Page 92

a pudding of ‘concepts’

Page 93

location_of

Fungus location_of Vitamin

Tissue location_of Mental or Behavioral Dysfunction

Page 94

Fungus location_of Vitamin

Every instance of vitamin is located in some fungus?

Some instances of vitamin are located in some fungi?

Some instances of fungi have instances of vitamin located in them?

Every instance of vitamin is located in every instance of fungus?
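The candidate readings differ in their quantifier patterns, and they come apart on concrete instance data. A minimal sketch evaluates them over a hypothetical set of vitamin and fungus instances.

The second and third readings on the slide are logically equivalent (both say that at least one vitamin instance is located in at least one fungus instance), so they share one function here.

```python
from typing import Set, Tuple

Rel = Set[Tuple[str, str]]  # (vitamin, fungus): this vitamin is located in this fungus

def all_some(rel: Rel, vitamins: Set[str], fungi: Set[str]) -> bool:
    """Every vitamin is located in some fungus."""
    return all(any((v, f) in rel for f in fungi) for v in vitamins)

def some_some(rel: Rel, vitamins: Set[str], fungi: Set[str]) -> bool:
    """Some vitamin is located in some fungus
    (equivalently: some fungus has some vitamin located in it)."""
    return any((v, f) in rel for v in vitamins for f in fungi)

def all_all(rel: Rel, vitamins: Set[str], fungi: Set[str]) -> bool:
    """Every vitamin is located in every fungus."""
    return all((v, f) in rel for v in vitamins for f in fungi)

# HYPOTHETICAL instances, for illustration only.
vitamins = {"v1", "v2", "v3"}
fungi = {"f1", "f2"}
rel = {("v1", "f1"), ("v2", "f2")}  # v3 is located in no fungus

for reading in (all_some, some_some, all_all):
    print(reading.__name__, reading(rel, vitamins, fungi))
# all_some False / some_some True / all_all False
```

An edge labelled only "location_of" does not say which of these functions it is meant to satisfy. That is the ambiguity the slide points to.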

Page 95

what are the nodes in this graph?

Page 96

Page 97

UMLS Semantic Network

Unclear what the nodes of the graph are:

Drug Delivery Device contains Clinical Drug

Drug Delivery Device narrower_in_meaning_than Manufactured Object

The use-mention confusion: “Swimming is healthy and has 8 letters”

Page 98

NCIT inherits this ontological and terminological incoherence from source vocabularies in UMLS

Conceptual Entities =def

An organizational header for concepts representing mostly abstract entities.

Includes as subtypes:

action, change, color, death, event, fluid, injection, temperature

Page 99

The UMLS

Unified Medical Language System

Metathesaurus

Semantic Network (SN)

Page 100

BIRNLex and UMLS-SN

Rest =SN Daily or Recreational Activity

Principal Investigator =SN Professional or Occupational Group

Left handedness =SN Organism Attribute

Ambidextrous =SN Finding

Brain Imaging =SN Diagnostic Procedure

Brain Mapping =SN Diagnostic Procedure & Research Activity

Healthy Adult =SN Finding

Page 101

To build a high-quality shared ontology requires hard work and staying power

You cannot cheat by borrowing from UMLS

UMLS (= the UMLS Metathesaurus) is not an ontology

Page 102

is_a (sensu UMLS)

A is_a B =def

‘A’ is narrower in meaning than ‘B’

grows out of the heritage of dictionaries, which reflect meanings, not biological reality

Page 103

Concepts, Concept Names, and their Identifiers in the UMLS

The Metathesaurus is organized by concept. One of its primary purposes is to connect different names for the same concept from many different vocabularies.

Page 104

The desperate search for ‘mappings’

A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms).

Page 105

The desperate search for ‘mappings’

This is not an exact science. ... Metathesaurus editors decide what view of synonymy to represent in the Metathesaurus concept structure. Please note that each source vocabulary’s view of synonymy is also present in the Metathesaurus, irrespective of whether it agrees or disagrees with the Metathesaurus view.

Page 106

These strange mappings between names as they appear in different source vocabularies, created for widely different purposes, can still be very useful

but the source vocabularies themselves are of variable quality (not all mappings are created equal)

and the sorts of search which the UMLS supports reflect an already outmoded technology

Page 107

is_a (sensu UMLS)

congenital absent nipple is_a nipple

surgical procedure not carried out because of patient’s decision is_a surgical procedure

cancer documentation is_a cancer

disease prevention is_a disease

living subject is_a information object representing an animal or complex organism

individual allele is_a act of observation

limb is_a tissue

108

is_a (sensu UMLS)

both testes is_a testis

plant leaves is_a plant

smoking is_a individual behavior

walking is_a social behavior
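
Why such links matter for reasoning can be shown with a toy transitive closure. The sketch assumes the ontological reading of is_a (every instance of A is an instance of B), under which anything filed under a child type is inferred to fall under every ancestor. The edges are a few of the UMLS-style links quoted above, plus one uncontroversial link added for illustration; the instance names are hypothetical.

```python
# Some is_a links quoted on the slides (child -> parents), plus "cancer is_a disease",
# which is added here only for illustration.
IS_A = {
    "disease prevention": {"disease"},
    "cancer documentation": {"cancer"},
    "both testes": {"testis"},
    "plant leaves": {"plant"},
    "cancer": {"disease"},
}


def ancestors(term, is_a=IS_A):
    """All types reachable by following is_a links upward (transitive closure)."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical instances, each asserted to instantiate exactly one type.
INSTANCES = {
    "flu_vaccination_campaign_2006": "disease prevention",
    "pathology_report_17": "cancer documentation",
}


def instances_of(type_name):
    """Instances inferred to fall under type_name, directly or via is_a."""
    return {i for i, t in INSTANCES.items()
            if t == type_name or type_name in ancestors(t)}


print(sorted(instances_of("disease")))
# ['flu_vaccination_campaign_2006', 'pathology_report_17']
```

A query for diseases thus returns a vaccination campaign and a pathology report: the inference is valid, so the fault lies in links that record narrowing of meaning rather than subtype relations in reality.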

109

Advantages of the methodology of shared coherently defined ontologies

Once the interoperable gold standard reference ontologies are there, it will make sense to reformulate parts of existing incompatible terminologies (e.g. in UMLS) in terms of the standard ontologies in order to achieve greater domain coverage and alignment of different but veridical views. Thus not everything that was done in the past turns out to be a waste.

110

is_a (sensu UMLS)

A is_a B =def

‘A’ is narrower in meaning than ‘B’

grows out of the heritage of dictionaries

(which ignore the basic distinction between universals and instances)

111

The really ugly

112

113

HL7 Marketing

HL7 V3 claims to be:

“The foundation of healthcare interoperability”

“The data standard for biomedical informatics”

from blood banks to Electronic Health Records to clinical genomics

114

HL7 Incredibly Successful

adopted by Oracle as basis for its Electronic Health Record technology; supported by IBM, GE, Sun ...

embraced as US federal standard

central part of $35 billion program to integrate all UK hospital information systems

115

The problem HL7 V3 is designed to address

In HL7 V2, the way the messaging task is realized allows each sending or receiving institution to interpret the standard ad hoc.

Result: vendor products never properly interoperable, and always require mapping software.

116

The solution to this problem (V3) is the HL7 RIM

or Reference Information Model

= a world standard for exchange of information between clinical information systems

117

The V3 solution

Remove optionality by having the RIM serve as a master model of all health information, from blood banks to Electronic Health Records to clinical genomics

118

The hype

“HL7 V3 is the standard of choice for countries and their initiatives to create national EHR and EHR data exchange standards as it provides a level of semantic interoperability unavailable with previous versions and other standards. Significant V3 national implementations exist in many countries, e.g. in the UK (e.g. the English NHS), the Netherlands, Canada, Mexico, Germany and Croatia.”

119

The reality (I asked them)

“None of the implementations have a national scope” (e.g. Stockholm City Council)

The paradigm Dutch national HL7 V3 EHR implementation uses HL7 technology exclusively for exchanging data (i.e. messaging). The EHR architectures themselves are HL7-free.

120

The Oracle Healthcare Transaction Base (HTB)

Oracle itself refers (April 2006) to three implementations of HTB described as being 'live for EHR projects':

1) Byrraju Foundation (BSRF) in India (Live)

2) Stockholm County (planned to go live by May 2006)

3) Louisiana (planned to go live by May 2006)

121

Regarding the Byrraju case, I am told that there is no V3 application running in India today and that the Byrraju Foundation is presently not using any telemedicine application that utilizes HL7.

As to the Stockholm case, the HTB was purchased and deployed in late 2004. An attempt to port a pilot system was made during the spring of 2005. This attempt was abandoned, as I understand from my Swedish colleagues, partly because of poor performance (the new application performed significantly less well than the system it was designed to replace, even though it was being run on considerably more expensive hardware), and partly because of a lack of fault tolerance, which made it inadequate as a mechanism for integrating legacy systems marked by a high degree of variation in data quality. During the spring of 2006, it seems, an attempt will be made to construct a new pilot application, this time with the more modest goal of handling referrals.

122

The hype

The RIM is “credible, clear, comprehensive, concise, and consistent”

It is “universally applicable” and “extremely stable”

123

The reality

• HL7 V3 documentation is 542,458 KB, divided into 7,573 files

• It remains subject to frequent revisions

• It is very difficult to understand

124

The reality

The decision to adopt the RIM was made as early as 1996, yet after 10 years the promised benefits of interoperability remain elusive.

HL7 has bet the farm on the RIM, while technology has moved on in these 10 years

125

RIM NORMATIVE CONTENT

126

to design a message, choose from here

127

Too many combinations

as the traffic on HL7’s own vocabulary mailing list reveals, there is no adequate mechanism for ensuring that the vast number of combinations of coded terms within actual messages can be controlled in such a way that messages will be understood in the same way by designers, senders and receivers.

128

129

These pre-defined attributes

code, class_code, mood_code, status_code, etc.

yield a combinatorial explosion:

class_code (61 values) × mood_code (13 values) × code (an estimated 200 values) × status_code (10 values) ≈ 1.59 million combinations.

Adding in the other coded attributes, this becomes 810 billion.
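
The multiplication above can be reproduced in a few lines. This sketch assumes, as the slide's arithmetic does, that each attribute varies independently of the others, so the product is an upper bound on distinct combinations; the 810 billion figure is not recomputed because the value counts for the other coded attributes are not given here.

```python
from math import prod

# Value counts quoted on the slide; the count for `code` is itself an estimate.
ATTRIBUTE_VALUE_COUNTS = {
    "class_code": 61,
    "mood_code": 13,
    "code": 200,        # estimate
    "status_code": 10,
}


def combinations(counts):
    """Distinct value assignments when every attribute varies independently."""
    return prod(counts.values())


total = combinations(ATTRIBUTE_VALUE_COUNTS)
print(f"{total:,}")  # 1,586,000
```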

130

Why does the RIM embody so many combinations?

To ensure in advance that everything can be said in conformity to the standard

131

The RIM methodology

The RIM defines a set of ‘normative’ classes (Act, Role, and so on), with which a rich stock of attributes is associated; one must select from these attributes when applying the RIM to each new domain (pharmacy, clinical genomics ...). Compare: attempting to create manufacturing software by drawing from a store containing pre-established parts (so that the store would need to have the bits needed for making every conceivable manufacturable thing, be it a lawnmower, a refrigerator, a hunting bow, and so on).

132

The RIM methodology

Are there examples where a methodology of this sort has been made to work? Does the RIM yield a coherent basis for constructing well-designed software artifacts for functions like the EHR or computerized decision support?

133

This methodology does not impede the formation of local dialects

Different teams produce different message designs for the very same topic.

In the UK, the £35 bn NHS National Program “Connecting for Health” has applied the RIM rigorously, using all the normative elements, and discovered that it needed to create dialects of its own to make the V3-based system work for its purposes (it still does not work)

134

The RIM documentation

• is subject to multiple and systematic internal inconsistencies and unclarities

• is marked by sloppy and unexplained use of terms such as ‘act’, ‘Act’, ‘Acts’, ‘action’, ‘ActClass’, ‘Act-instance’, ‘Act-object’

• has uncertain cross-referencing to other HL7 documents

• has no publicly available teaching materials (no HL7 for Dummies)

135

from HL7 email forum (do not circulate)

“I am ... frightened when I contemplate the number of potential V3ers who ... simply are turned away by the difficulty of accessing the product.

  “Some of them attend V3 tutorials which explain V3 as the hugely complex process of creating a message and are turned off. [They] simply do not have the stamina, patience, endurance, time, or brain-cells to understand enough for them to feel comfortable contributing to debates / listserves, etc., so they remain silent.”

136

Problems of scope

Only two main classes in the RIM

Act = roughly: intentional action

Entity = persons, places, organizations, material

How can the RIM deal transparently with information about, say, disease processes, drug interactions, wounds, accidents, bodily organs, documents?

137

Diseases in the RIM... are not Acts... are not Entities... are not Roles, Participations ...

So what are they?

At best: a case of pneumonia is identified as

the Act of Observation of a case of pneumonia

Note: RIM’s treatment of SNOMED codes

138

HL7 Clinical Document Architecture defines a document as an Act

HL7’s Clinical Genomics Standard Specifications

defines an individual allele as an Act of Observation

139

Why the centrality of ‘Act’

because of HL7’s roots in US hospital messaging – and thus in US hospital billing:

intentional actions are what can be billed

140

Mayo RIM discussion of the meaning of ‘Act’ as “intentional action”

Is a snake bite or bee sting an "intentional action"?

Is a knife stabbing an intentional action?

Is a car accident an intentional action?

When a child swallows the contents of a bottle of poison is that an intentional action?

141

The RIM has no coherent criteria for deciding

For this reason, too, dialects are formed – and the RIM does not do its job. One health information system might conceive snakebites and gunshots as Procedures. Another might classify them with diseases, and so treat them as Observations.

If basic categories cannot be agreed upon for common phenomena like snakebites, then the RIM is in serious trouble.
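
The effect of such local dialects on message exchange can be sketched with two classification tables that disagree. Both tables and the `receive` check are hypothetical illustrations, not HL7 artifacts; the point is only that a receiver which relies on its own table cannot interpret a conforming message from a sender that chose the other class.

```python
# Hypothetical local classification tables for two health information systems.
SYSTEM_A = {"snakebite": "Procedure", "gunshot wound": "Procedure", "pneumonia": "Observation"}
SYSTEM_B = {"snakebite": "Observation", "gunshot wound": "Observation", "pneumonia": "Observation"}


def disagreements(a, b):
    """Event types that both systems cover but place in different classes."""
    return {event: (a[event], b[event])
            for event in a.keys() & b.keys() if a[event] != b[event]}


def receive(message_class, event, receiver_table):
    """Accept a message only if the sender's class matches the receiver's own table."""
    expected = receiver_table.get(event)
    return "accepted" if expected == message_class else f"mismatch: expected {expected}"


print(sorted(disagreements(SYSTEM_A, SYSTEM_B)))          # ['gunshot wound', 'snakebite']
print(receive(SYSTEM_A["snakebite"], "snakebite", SYSTEM_B))  # mismatch: expected Observation
```

Both systems can claim conformance to the same class hierarchy; the mismatch arises because the standard supplies no criterion that would force them to the same choice.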

142

Are definitions like this a good basis for achieving semantic interoperability in the biomedical domain?:

LivingSubject Definition: A subtype of Entity representing an organism or complex animal, alive or not.

143

Person (from HL7 Glossary)

Definition: A Living Subject representing single human being [sic] who is uniquely identifiable through one or more legal documents

144

The Problem of Circularity

A Person =def. A person with documents

‘An A is an A which is B’

– useless in practical terms, since neither we nor the machine can use it to find out what ‘A’ means

– incorporates a vicious infinite regress

– has the effect of making it impossible to refer to As which are not Bs, for example to undocumented persons
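
Two of these defects can be made concrete in code. The sketch models each definition as a genus (the term being specialized) plus a differentia, and encodes the slide's paraphrase "A Person =def. A person with documents" as a definition whose genus is the term itself. The dictionary layout and function names are hypothetical; they illustrate the argument rather than any HL7 tooling.

```python
# Each term maps to (genus, differentia). 'Person' follows the slide's paraphrase,
# so its genus is the term itself; 'Entity' is left primitive.
DEFINITIONS = {
    "Person": ("Person", "has one or more legal documents"),
    "LivingSubject": ("Entity", "is an organism, alive or not"),
    "Entity": (None, "primitive, left undefined"),
}


def expand(term, definitions=DEFINITIONS):
    """Unfold genus links until a primitive term is reached; fail if a term recurs."""
    chain, current = [], term
    while current is not None:
        if current in chain:
            raise ValueError("circular definition: " + " -> ".join(chain + [current]))
        chain.append(current)
        current = definitions[current][0]
    return chain


def meets_person_definition(individual):
    """Apply the differentia: an individual without legal documents never qualifies."""
    return len(individual.get("legal_documents", [])) > 0


print(expand("LivingSubject"))  # ['LivingSubject', 'Entity']
try:
    expand("Person")
except ValueError as err:
    print(err)  # circular definition: Person -> Person
print(meets_person_definition({"name": "undocumented evacuee"}))  # False
```

Without the cycle check, unfolding 'Person' would never reach a primitive term, which is the infinite regress; and the differentia alone rules out undocumented people.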

145

Katrina

146

Katrina

147

What is the RIM about?

blood pressure measurement = an information item

blood pressure = something in reality which exists independently of any recording of information, and which the measurement measures

Q: Is the RIM about information, or about the reality to which such information relates? A: There is no difference between the two
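
Keeping the two apart has a practical payoff: only then can a system say that a measurement is wrong. The sketch below uses two separate classes, one for the quality in the patient and one for the record that measures it. Class names, field names, patient identifier and readings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BloodPressure:
    """A quality of a patient at a time; it exists whether or not anyone measures it."""
    patient_id: str
    systolic_mmhg: float   # the actual value, not directly known to the information system
    diastolic_mmhg: float


@dataclass(frozen=True)
class BloodPressureMeasurement:
    """An information item: a record produced by measuring some BloodPressure."""
    about: BloodPressure
    recorded_systolic_mmhg: float
    recorded_diastolic_mmhg: float
    device: str

    def systolic_error(self) -> float:
        """Discrepancy between what was recorded and what was there to be measured."""
        return self.recorded_systolic_mmhg - self.about.systolic_mmhg


bp = BloodPressure("patient-1", 128.0, 82.0)
cuff = BloodPressureMeasurement(bp, 135.0, 85.0, "cuff")
line = BloodPressureMeasurement(bp, 127.0, 81.0, "arterial line")
print(cuff.systolic_error(), line.systolic_error())  # 7.0 -1.0
```

Two different records point to one and the same blood pressure. If, as the RIM holds, there were no difference between the two, measurement error and disagreement between measurements could not even be expressed.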

148

RIM Philosophy

“The truth about the real world is constructed through a combination and arbitration of attributed statements ...

“As such, there is no distinction between an activity and its documentation.”

149

The RIM as an Information Model

‘a static (UML) model of health and health care information’

The scope of the RIM’s class hierarchy consists in packets of information:

the information content of invoices, statements of observations, lab reports, …

150

A good, general constraint on a theory of meaning

For each linguistic expression ‘E’

‘E’ means E

‘snow’ means snow

‘pneumonia’ means pneumonia

151

From the perspective of the RIM on the Information Model conception,

‘medication’ does not mean: medication

rather it means:

the record of medication in an information system

‘stopping a medication’ does not mean: stopping a medication

rather it means: change of state in the record of a Substance Administration Act from Active to Aborted
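
On this reading, all the system can do is move a record between statuses. The sketch assumes a simplified set of status values and a transition table invented for illustration; it is not HL7's normative Act state machine, and the drug name is a placeholder.

```python
from enum import Enum


class ActStatus(Enum):
    # Simplified status values for this sketch; not the full HL7 vocabulary.
    NEW = "new"
    ACTIVE = "active"
    COMPLETED = "completed"
    ABORTED = "aborted"


# Hypothetical allowed transitions for this sketch.
TRANSITIONS = {
    ActStatus.NEW: {ActStatus.ACTIVE, ActStatus.ABORTED},
    ActStatus.ACTIVE: {ActStatus.COMPLETED, ActStatus.ABORTED},
    ActStatus.COMPLETED: set(),
    ActStatus.ABORTED: set(),
}


class SubstanceAdministrationRecord:
    """A record in an information system, not the giving of a drug itself."""

    def __init__(self, drug: str):
        self.drug = drug
        self.status = ActStatus.ACTIVE
        self.history = [ActStatus.ACTIVE]

    def transition(self, new_status: ActStatus) -> None:
        """Change the record's status, rejecting transitions the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status
        self.history.append(new_status)


record = SubstanceAdministrationRecord("drug-x")  # placeholder drug name
record.transition(ActStatus.ABORTED)  # 'stopping a medication' on the Information Model reading
print([s.value for s in record.history])  # ['active', 'aborted']
```

Nothing in this code touches the patient. Whether the drug was in fact stopped is a fact about the world, which the status field can report correctly or incorrectly.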

152

The RIM’s Entity class

persons, places, organizations, material

153

States of Entity

• active: The state representing the fact that the Entity is currently active.

• nullified: The state representing the termination of an Entity instance that was created in error.

• inactive: The state representing the fact that an entity can no longer be an active participant in events.

• normal: The “typical” state. Excludes “nullified”, which represents the termination state of an Entity instance that was created in error

154

Persons are Entities

What do ‘active’ and ‘nullified’ mean as applied to Person?

Is there a special kind of death-through-nullification in the case of those instances of Person who were created in error?

155

HL7 Glossary

Definition of Animal: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain.

An Animal is not an animal. Rather (an) Animal represents an animal: it is an information item which represents a certain highly specific kind of animal-of-interest, namely an animal that is of interest to the Personnel Management domain.

156

Double Standards

The RIM is a confusion of two separate artifacts:

1. an “information model”, relating to names of persons, records of observations, social security numbers, etc.

2. a reference ontology, relating to persons, observations, documents, acts, etc.

157

The examples provided to illustrate the RIM’s classes

are almost always in conformity with the Reference Ontology Conception of the RIM

They involve the familiar kinds of things and processes in reality (medication, patients, devices, paper documents, surgery, diet, supply of bedding) with which healthcare messages are concerned.

158

HL7 Glossary:

Instances of Person include: John Smith, RN, Mary Jones, MD, etc.

not: information about John Smith ...

159

Some of the RIM’s definitions are in conformity with the Information Model Conception

160

Definition of Act:

A record of something that is being done, has been done, can be done, or is intended or requested to be done

An Act is the record of an Act

“There is no difference between an activity and its documentation”

HL7’s backbone ‘Act’ class

161

Acts are records: but the examples of Act given by the RIM are as follows:

“The kinds of acts that are common in health care are (1) a clinical observation, (2) an assessment of health condition (such as problems and diagnoses), (3) healthcare goals, (4) treatment services (such as medication, surgery, physical and psychological therapy), ...

162

The class Procedure (a subclass of Act)

Definition of Procedure: An Act whose immediate and primary outcome (post-condition) is the alteration of the physical condition of the subject

Examples:

chiropractic treatment, acupuncture, straightening rivers, draining swamps.

163

What is an information model ?

Is it a model of entities in reality (an ontology)?

Or of information about entities in reality (an ontology)?

The RIM is an incoherent mixture of the two

Does this matter?

164

What’s gone wrong? 

People of good will are making mistakes because of insufficient concern for clarity and consistency

Even large ontologies are built in the spirit of the amateur hobbyist

Money is wasted on megasystems that cannot be used

165

Lessons for Semantic Interoperability

Clear and easily accessible documentation – based on an intuitive ontology (understandable to all classes of users)

Business model should be such that those responsible for creating documentation do not have an incentive for it to be unclear

Centralized control of documentation, to ensure consistency (too much democracy is a bad thing)

166

Lessons for Standards for Semantic Interoperability

Create standards on the basis of thorough pilot testing

(Avoid systems like the RIM, which is imposed from the top down, on a wing and a prayer)

167

What should take the place of the RIM?

1. A Reference Ontology of the types of biomedical entity, such as thing, process, person, disease, infection, molecule, procedure, etc.

2. A Reference Ontology of the types of biomedical information entity, such as message, document, record, image, diagnosis, interpretation, etc.

Ontology 1 provides a high-level framework in terms of which the lower-level types captured in vocabularies like SNOMED CT could be coherently organized.

Ontology 2 helps to specify how information can be combined into meaningful units and used for further processing.
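
One way to picture the proposed division of labour is as two disjoint sets of types joined by an explicit aboutness link, so that a record can never be confused with what it records. This is a minimal sketch under assumptions: the type lists come from this slide, while the relation name `is_about`, the class names and the checks are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Type lists taken from the slide; the two sets are kept strictly disjoint.
BIOMEDICAL_ENTITY_TYPES = {"thing", "process", "person", "disease",
                           "infection", "molecule", "procedure"}
INFORMATION_ENTITY_TYPES = {"message", "document", "record", "image",
                            "diagnosis", "interpretation"}
assert BIOMEDICAL_ENTITY_TYPES.isdisjoint(INFORMATION_ENTITY_TYPES)


@dataclass(frozen=True)
class Particular:
    """An instance of a type from the biomedical entity ontology."""
    label: str
    kind: str


@dataclass(frozen=True)
class InformationItem:
    """An instance of a type from the information entity ontology."""
    label: str
    kind: str
    is_about: Particular  # hypothetical relation linking the two ontologies

    def __post_init__(self):
        if self.kind not in INFORMATION_ENTITY_TYPES:
            raise TypeError(f"{self.kind!r} is not an information entity type")
        if self.is_about.kind not in BIOMEDICAL_ENTITY_TYPES:
            raise TypeError(f"{self.is_about.kind!r} is not a biomedical entity type")


case = Particular("pneumonia-case-1", "disease")
dx = InformationItem("dx-1", "diagnosis", case)
print(dx.kind, "is about", dx.is_about.label)  # diagnosis is about pneumonia-case-1
```

Asking for an information item of kind "disease" raises a TypeError, which is the distinction the RIM blurs when it treats an Act as the record of an Act. A fuller design would also need to let information items be about other information items (a message about a document, for example), which this sketch deliberately leaves out.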