Principles for Building Biomedical Ontologies: A GO Perspective
Date posted: 19-Dec-2015
1
Part I: Biomedical Ontologies: A Critical Survey
Barry Smith
http://ontology.buffalo.edu/smith
2
I. Biomedical Ontologies: A Critical Survey
Ontologies, terminologies and thesauri are now in common use in the domain of biomedical informatics. Their goal is to support search and retrieval, but also to advance genuine reasoning about biomedical phenomena and to enable re-use of heterogeneous data through the use of common systems of annotations. We examine a representative collection of biomedical ontologies in light of these criteria, and draw (somewhat sad) conclusions as to the current state of the field.
II. The Ontology of Biomedical Reality (terminology)
Ontologies to support scientific research and clinical medicine have special characteristics, which we shall outline in terms of a distinction between three levels: (1) the level of reality; (2) the level of cognitive representations; and (3) the level of the publicly accessible concretizations of such cognitive representations, for example in ontologies. Against this background we shall clarify the relations between ontologies, terminologies, information models, databases, and similar artifacts.
III. The OBO Foundry Project: Towards Scientific Standards and Principles-Based Coordination in Biomedical Ontology Development
The OBO Foundry is a collaborative experiment, involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. The primary objective is to establish gold standard reference ontologies, one for each core domain of biomedical science. We shall describe how this objective is already being realized, and show how it can not only help solve the problems of data retrieval and re-use but also foster the development of the powerful tools that will be needed to reason with biomedical data in the future.
3
Problem: how to reason with data deriving from different sources, each of which uses its own system of classification?
4
Solution:
Ontology!
5
Examples of current needs for ontologies in biomedicine
to enforce semantic consistency within a database
to enable data retrieval, sharing and re-use
to enable data integration (bridging across data at multiple granularities)
to allow querying
6
General trend
on the part of NIH, FDA and other bodies to consolidate ontology-based standards for the communication and processing of biomedical data.
7
Old approach
gather terminologies in libraries
Unified Medical Language System
National Library of Medicine
8
SNOMED
DEMONS
U M L S
9
New Approach
MusicBeanz
10
http://www.w3.org/
11
Semantic Web deposits
Pet Profile Ontology
Review Vocabulary
Band Description Vocabulary
Musical Baton Vocabulary
MusicBrainz Metadata Vocabulary
Kissology
12
http://www.w3.org/
Beer Ontology
all instances of hops that have ever existed are necessarily ingredients of beer.
13
Both UMLS- and OWL-type responses involve ad hoc creation of new terminologies by each separate community, and an open-door policy for admission
Many of these terminologies remain as torsos, gather dust, poison the wells, ...
14
OWL’s syntactic regimentation is not enough to ensure high-quality
ontologies
– the use of a common syntax and logical machinery and the careful separating out of ontologies into namespaces does not solve the problem of ontology integration
15
from Ontological Engineering
location =def. a spatial point identified by a name (p. 12)
arrivalPlace =def. a journey ends at a location (p. 13)
facet = def. ternary relation that holds between a frame, a slot, and the facet (p. 51)
an example of function is Pays, which obtains the price of a room after applying a discount (p. 13)
16
from Handbook of Ontology
On ‘achieving consistency from multiple sources’: if exact semantic identity is lacking, terms can be unified at a higher level, and information that is possibly related can be retrieved as well. When the application objective is to study and understand, the end-user can reject misleading records. (p. 94)
owl:InverseFunctionalProperty defines a property that for which two different objects cannot have the same value, e.g. isTheSocialSecurityNumberOf (a social number is assigned to one person only) (p. 78)
17
SNOMED
DEMONS
U M L S
The Good, the Bad, and the UGLY
18
A methodology for quality-assurance of ontologies
tested thus far in the biomedical domain on:
FMA
GO + other OBO Ontologies
FuGO
SNOMED
UMLS Semantic Network
NCI Thesaurus
ICF (International Classification of Functioning, Disability and Health)
ISO Terminology Standards
HL7-RIM
19
The Good
Foundational Model of Anatomy (FMA)
Pro
Clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule
Powerful treatment of definitions, from which the entire FMA hierarchy is generated – can serve as basis for formal reasoning
Con
Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)
20
it’s better manually
[Figure: fragment of the FMA hierarchy around the pleural sac, with nodes linked by is_a and part_of. Node labels: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Tissue, Organ Part, Organ Component, Organ Subdivision, Anatomical Space, Organ Cavity, Organ Cavity Subdivision, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Cavity, Interlobar recess]
22
The Foundational Model of Anatomy
Follows formal rules for ‘Aristotelian’ definitions
When A is_a B, the definition of ‘A’ takes the form:
an A =def. a B which ...
a human being =def. an animal which is rational
23
FMA Example
Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus
Plasma membrane =def. a cell part that surrounds the cytoplasm
24
The FMA regimentation
Each definition reflects the position in the hierarchy to which a defined term belongs.
The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it.
The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
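A minimal sketch of how such a regimentation might look in code, assuming a toy hierarchy in which each term stores its parent and its differentia. The entries for 'cell' and 'plasma membrane' paraphrase the FMA examples above; the differentia given for 'cell part' and the function names are hypothetical, introduced only for illustration.

```python
# Toy hierarchy: term -> (parent, differentia). The root has neither.
TERMS = {
    "anatomical structure": (None, None),
    "cell": ("anatomical structure",
             "consists of cytoplasm surrounded by a plasma membrane"),
    # Hypothetical differentia, not taken from the FMA:
    "cell part": ("anatomical structure", "is part of a cell"),
    "plasma membrane": ("cell part", "surrounds the cytoplasm"),
}


def definition(term):
    """Render the Aristotelian definition 'an A =def. a B which C's'."""
    parent, differentia = TERMS[term]
    if parent is None:
        return None  # the root is left undefined in this sketch
    article = "an" if parent[0] in "aeiou" else "a"
    return f"{term} =def. {article} {parent} which {differentia}"


def inherited_differentiae(term):
    """Collect the differentiae of a term and of every term above it,
    listed from the top of the hierarchy downwards."""
    chain = []
    while term is not None:
        parent, differentia = TERMS[term]
        if differentia is not None:
            chain.append(differentia)
        term = parent
    return list(reversed(chain))


print(definition("plasma membrane"))
print(inherited_differentiae("plasma membrane"))
```

Because every definition names its parent, walking up the is_a chain recovers everything a term inherits, which is the sense in which position in the hierarchy enriches the definition.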
25
Principle
Use Aristotelian definitions
An A is a B which C’s.
26
Intermediate
GALEN
Pro
Allows formal representation of clinical information
Allows multiple views of relevant detail as needed
Uses powerful Description Logic (DL)-based formal structure
Makes definitions easy to formulate
Con
Remains only partially developed
Contains errors: Vomitus contains carrot – which DLs did not prevent
27
Principle
An ontology should not remain a torso
28
Principle
An ontology should have a properly personed help desk
29
Principle
An ontology should have procedures for up-dating in light of scientific advance
30
Intermediate
The Gene Ontology
Con
Poor formal architecture
Full of errors
menopause part_of death
Poor support for automatic reasoning and error-checking
Poor treatment of definitions
Not trans-granular
No relation to time or instances
31
The Gene Ontology
Pro
Open Source
Cross-Species
... has recognized the need for reform, including explicit representation of granular levels
32
Old GO Definitions
hemolysis =def. the causes of hemolysis
GO now adopting structured definitions which contain both genus and differentiae
Species =def. Genus + Differentiae
neuron cell differentiation =def. differentiation by which a cell acquires features of a neuron
Ontology alignment
One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology:

Cell Types in GO                      Cell Types in the Cell Ontology
cone cell fate commitment             retinal_cone_cell
keratinocyte differentiation          keratinocyte
adipocyte differentiation             fat_cell
dendritic cell activation             dendritic_cell
lymphocyte proliferation              lymphocyte
T-cell homeostasis                    T_lymphocyte
garland cell differentiation          garland_cell
heterocyst cell differentiation       heterocyst
Alignment of the two ontologies will permit the generation of consistent and complete definitions
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
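The composition of a GO definition from a Cell Ontology record can be sketched roughly as follows. This is an illustration under stated assumptions, not GO's actual tooling: the parser handles only the tags shown in the stanza above, the stanza's definition is shortened to its first sentence, and the mapping from the two develops_from identifiers to cell-type names is an assumption inferred from the resulting definition on the slide.

```python
STANZA = '''id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375'''

# Assumed names for the precursor cell types (not given in the stanza itself).
NAMES = {
    "CL:0000008": "cranial neural crest cell",
    "CL:0000375": "osteoprogenitor cell",
}


def parse_stanza(text):
    """Parse the handful of tags used above into a dictionary."""
    record = {"develops_from": []}
    for line in text.splitlines():
        tag, _, value = line.partition(": ")
        if tag == "relationship":
            relation, target = value.split()
            record.setdefault(relation, []).append(target)
        elif tag == "def":
            record["def"] = value.split('"')[1]  # text between the quotes
        else:
            record[tag] = value
    return record


def article(noun):
    return "an" if noun[0].lower() in "aeiou" else "a"


def differentiation_definition(cell):
    """Build 'X differentiation' from the genus (a process), the precursor
    cell types, and the cell type's own definition."""
    precursors = [NAMES[i] for i in cell["develops_from"]]
    sources = " or ".join(f"{article(p)} {p}" for p in precursors)
    gloss = cell["def"].rstrip(".")
    gloss = gloss[0].lower() + gloss[1:]
    name = cell["name"]
    return (f"{name} differentiation =def. processes whereby {sources} "
            f"acquires the specialized features of {article(name)} {name}, {gloss}.")


print(differentiation_definition(parse_stanza(STANZA)))
```

The point of the design is that the cell type is defined once, in the Cell Ontology, and every GO process term that mentions it reuses that definition rather than restating it.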
36
Other Ontologies to be aligned with GO
Chemical ontologies
3,4-dihydroxy-2-butanone-4-phosphate synthase activity
Anatomy ontologies
metanephros development
37
Principle
Exploit existing ontologies when formulating definitions
38
The Bad
Reactome
Pro
Rich catalogue of biological processes
Con
Incoherent treatment of categories:
ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). Similarly CatalystActivity is a sibling of Event.
39
Principle
An ontology should be in agreement with the truths of basic science (e.g. that molecules are physical entities)
40
The Ugly
Disease Ontology / ICD-10
Other problems with special functions
Tuberculosis of unspecified bones and joints, tubercle bacilli not found by bacteriological or histological examination, but tuberculosis confirmed by other methods (inoculation of animals)
Other mineral salts, not elsewhere classified, causing adverse effects in therapeutic use
Other general medical examination for administrative purposes
Assault by other specified means
41
The Ugly
Disease Ontology / ICD-10
Other accidental submersion or drowning in water transport accident injuring other specified person
Accident to powered aircraft, other and unspecified, injuring occupant of military aircraft, any rank
Other accidental submersion or drowning in water transport accident injuring occupant of other watercraft - crew
42
The Ugly
Disease Ontology / ICD-10
Normal pregnancy
Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered
Railway accident involving collision with rolling stock and injuring pedal cyclist
Injury due to war operations by lasers
Nontraffic accident involving motor-driven snow vehicle injuring pedestrian
43
The Ugly
Disease Ontology / ICD-10
Donors of other specified organ or tissue
Fitting and adjustment of wheelchair
Hot (boiling) tap water
Training in use of lead dog for the blind
Person consulting on behalf of another person
44
Principle
An ontology should have a clearly specified domain (captured by its root node)
45
“Circular Hierarchical Relationships in the UMLS:Etiology, Diagnosis, Treatment, Complications and Prevention”
Olivier Bodenreider
Topographic regions: General terms
Physical anatomical entity
Anatomical spatial entity
Anatomical surface
Body regions
Topographic regions
46
Principle
Avoid cycles
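A check of this kind is easy to automate. The sketch below assumes the hierarchy is available as a mapping from each term to its is_a parents and uses a standard depth-first search; the function name and the toy terms are hypothetical and do not come from the UMLS.

```python
def find_cycle(is_a):
    """Return one cycle as a list of terms (first == last), or None.

    `is_a` maps each term to a list of its parent terms.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for parent in is_a.get(node, ()):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(is_a):
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical toy hierarchies.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
acyclic = {"neuron": ["cell"], "cell": ["anatomical structure"]}
print(find_cycle(cyclic))
print(find_cycle(acyclic))
```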
47
MeSH
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...
48
Principle
Use singular nouns
49
MeSH
National Socialism is_a MeSH Descriptor
50
Plant Ontology
cell = def. structural and physiological unit of a living organism; it (i.e., plant cell) consists of protoplast and cell wall; ...
51
Principle
For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings
(Don’t use ‘cell’ when you mean ‘plant cell’)
52
ICNP: International Classification of Nursing Procedures
water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings.
53
More Ugly
National Cancer Institute Thesaurus (NCIT)
54
The NCIT reflects a recognition of the need
for high quality shared ontologies and terminologies the use of which by clinical researchers in large communities can ensure re-usability of data collected by different research groups
55
NCIT
“a biomedical vocabulary that provides consistent, unambiguous codes and definitions for concepts used in cancer research”
“exhibits ontology-like properties in its construction and use”.
56
Goals
to make use of current terminology “best practices” to relate relevant concepts to one another in a formal structure, so that computers as well as humans can use the Thesaurus for a variety of purposes, including the support of automatic reasoning;
to speed the introduction of new concepts and new relationships in response to the emerging needs of basic researchers, clinical trials, information services and other users.
57
Formal Definitions
of 37,261 nodes, 33,720 (about 90%) were stipulated to be primitive in the DL sense
Thus only a small portion of the NCIT ontology can be used for purposes of automatic classification and error-checking by using OWL.
58
Principle
Supply definitions wherever possible
(both human-understandable natural language definitions, and equivalent formal definitions)
59
Verbal Definitions
About half the NCIT terms are assigned verbal definitions
Unfortunately some are assigned more than one
60
Disease Progression
Definition 1: Cancer that continues to grow or spread.
Definition 2: Increase in the size of a tumor or spread of cancer in the body.
Definition 3: The worsening of a disease over time. This concept is most often used for chronic and incurable diseases where the stage of the disease is an important determinant of therapy and prognosis.
61
Principle
Each term should have at most one definition*
*which may have both natural-language and formal versions
62
To make matters worse Disease Progression has as subclass:
Cancer Progression
Definition:
The worsening of a cancer over time. This concept is most often used for incurable cancers where the stage of the cancer is an important determinant of therapy and prognosis.
63
Cancer
a process (of getting better or worse)
an object (which can grow and spread)
64
Principle
Distinguish continuant entities (molecule, cell, tumor, organism) from occurrent entities (processes of growth, change, ...)
65
Two kinds of entities
occurrents (processes, events, happenings)
cell division, ovulation, death
continuants (objects, qualities, ...)
cell, ovum, organism, temperature of organism, ...
66
NCIT confuses definitions with descriptions
Tuberculosis
Definition: A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis. Tuberculosis (TB) may affect almost any tissue or organ of the body with the lungs being the most common site of infection. The clinical stages of TB are primary or initial infection, latent or dormant infection, and recrudescent or adult-type TB. Ninety to 95% of primary TB infections may go unrecognized. Histopathologically, tissue lesions consist of granulomas which usually undergo central caseation necrosis. Local symptoms of TB vary according to the part affected; acute symptoms include hectic fever, sweats, and emaciation; serious complications include granulomatous erosion of pulmonary bronchi associated with hemoptysis. If untreated, progressive TB may be associated with a high degree of mortality. This infection is frequently observed in immunocompromised individuals with AIDS or a history of illicit IV drug use.
68
A better definition
Tuberculosis
Definition:
A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis.
IS THIS CORRECT? (An infection is not a disease)
69
the use-mention confusion
Conceptual Entities =Def.
An organizational header for concepts representing mostly abstract entities.
Confuses use and mention (swimming is healthy and has eight letters)
70
Principle
Don’t confuse an entity with the name of an entity
71
Duratec, Lactobutyrin, Stilbene Aldehyde
are classified by the NCIT as Unclassified Drugs and Chemicals
72
Problematic synonyms
Anatomic Structure, System, or Substance ~ Anatomic Structures and Systems
Does ‘anatomic’ apply only to structure or also to system and substance?
Biological Function ~ Biological Process
some biological processes are the exercises of biological functions
others (e.g. pathological processes, side effects) not
Genetic Abnormality ~ Molecular Abnormality (with subtype: Molecular Genetic Abnormality) (definitions not supplied)
73
Three disjoint classes of plants
Vascular Plant
Non-vascular Plant
Other Plant
74
Three kinds of cells
Abnormal Cell is a top-level class (thus not subsumed by Cell)
Normal Cell is a subclass of Microanatomy.
Cell is a subclass of Other Anatomic Concept (so that cells themselves are concepts)
75
NCIT as now constituted will block automatic reasoning
Neither Normal Cells nor Abnormal Cells are Cells within the context of the NCIT
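The consequence for a reasoner can be sketched with a simple subsumption check over the three is_a links just described. The label "TOP" is a hypothetical stand-in for the thesaurus root, and only the links mentioned on the slide are encoded, so this is an illustration rather than a query against the real NCIT.

```python
# is_a links as described above; "TOP" is a hypothetical root label.
IS_A = {
    "Abnormal Cell": ["TOP"],
    "Normal Cell": ["Microanatomy"],
    "Cell": ["Other Anatomic Concept"],
}


def ancestors(term, is_a):
    """All terms reachable from `term` by following is_a links upwards."""
    seen = set()
    stack = list(is_a.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return seen


def subsumed_by(child, parent, is_a):
    return parent in ancestors(child, is_a)


for kind in ("Normal Cell", "Abnormal Cell"):
    print(kind, "is_a Cell?", subsumed_by(kind, "Cell", IS_A))
```

Any rule stated for Cell (for example, that every cell has a plasma membrane) therefore cannot be inherited by either subclass, which is one concrete way in which the structure blocks automatic reasoning.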
76
Some consolations
NCIT is open source
NCIT has broad coverage
NCIT has some formal structure (OWL-DL)
NCIT is much, much better than (for example) the HL7-RIM
NCIT has realized the errors of its ways
77
What might have been
http://www.cbd-net.com/index.php/search/show/938464
= “Review of NCI Thesaurus and Development of Plan to Achieve OBO Compliance”
78
Fragment of Pre-NCIT Hierarchy
Murine Tissue Type
Body Fluids and Substances (MMHCC)
Cardiovascular System (MMHCC)
Blood Vessel (MMHCC)
Heart (MMHCC)
Digestive System (MMHCC)
Welcome to the Pre-NCIT: http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do
79
More UGLY
80
MeSH
MeSH Descriptors
Index Medicus Descriptor
Anthropology, Education, Sociology and Social Phenomena (MeSH Category)
Social Sciences
Political Systems
National Socialism
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...
81
MeSH
National Socialism is_a MeSH Descriptors
The Bodenreider Defence:
MeSH is not an ontology
82
BIRNLex
83
BIRNLex
The eye =def.
The eyeball and its constituent parts, e.g. retina
mouse =def.
common name for the species mus musculus
84
BIRNLex
85
BIRNLex
86
Principle
Avoid circular definitions
(The term defined should not appear in its own definition)
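A purely lexical version of this check might look as follows. It is only a sketch: it flags a definition when every content word of the term reappears as a whole word in the definition, so it catches cases like 'hemolysis =def. the causes of hemolysis' but not near-circularities such as defining 'eye' by way of 'eyeball'. The function name is hypothetical.

```python
import re

STOPWORDS = {"a", "an", "the"}


def is_circular(term, definition):
    """True if every content word of `term` occurs as a whole word
    in `definition` (case-insensitive)."""
    definition_words = set(re.findall(r"[a-z]+", definition.lower()))
    term_words = [w for w in re.findall(r"[a-z]+", term.lower())
                  if w not in STOPWORDS]
    return bool(term_words) and all(w in definition_words for w in term_words)


print(is_circular("hemolysis", "the causes of hemolysis"))
print(is_circular("plasma membrane", "a cell part that surrounds the cytoplasm"))
print(is_circular("The eye", "The eyeball and its constituent parts, e.g. retina"))
```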
87
The UMLS Semantic Network
88
More Ugly
UMLS Semantic Network
Pros
Broad coverage; no multiple inheritance
Cons
Incoherent use of ‘conceptual entities’
(e.g. the digestive system as a conceptual part of the organism)
Full of errors
89
UMLS Semantic Network
Edges in the graph represent merely “possible significant (= some-some) relations”:
Bacterium causes Experimental Model of Disease
Experimental Model of Disease affects Fungus
Experimental Model of Disease is_a Pathologic Function
90
UMLS Semantic Network
Unclear what the nodes of the graph are:
Drug Delivery Device contains Clinical Drug
Drug Delivery Device narrower_in_meaning_than Manufactured Object
The use-mention confusion:
“Swimming is healthy and has 8 letters”
92
a pudding of ‘concepts’
93
location_of
Fungus location_of Vitamin
Tissue location_of Mental or Behavioral Dysfunction
94
Fungus location_of Vitamin
Every instance of vitamin is located in some fungus?
Some instances of vitamin are located in some fungi?
Some instances of fungi have instances of vitamin located in them?
Every instance of vitamin is located in every instance of fungus?
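The difference between these readings becomes concrete once instances are written down. The sketch below uses hypothetical instance data (two vitamin instances, two fungus instances) and evaluates the all-some, some-some and all-all readings; the second and third questions above describe the same some-some state of affairs from the two sides of the relation.

```python
# Hypothetical instance-level data: each vitamin instance is mapped to the
# set of fungus instances in which it is located.
located_in = {
    "vitamin_1": {"fungus_1"},
    "vitamin_2": set(),          # a vitamin instance located in no fungus
}
fungi = {"fungus_1", "fungus_2"}


def all_some(located_in):
    """Every instance of vitamin is located in some fungus."""
    return all(len(hosts) > 0 for hosts in located_in.values())


def some_some(located_in):
    """Some instance of vitamin is located in some fungus."""
    return any(len(hosts) > 0 for hosts in located_in.values())


def all_all(located_in, fungi):
    """Every instance of vitamin is located in every instance of fungus."""
    return all(hosts >= fungi for hosts in located_in.values())


print("all-some: ", all_some(located_in))
print("some-some:", some_some(located_in))
print("all-all:  ", all_all(located_in, fungi))
```

On this toy data only the some-some reading holds, which shows how little an edge licensed by that reading alone lets a reasoner conclude.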
95
what are the nodes in this graph?
96
98
NCIT inherits this ontological and terminological incoherence from source vocabularies in UMLS
Conceptual Entities =def
An organizational header for concepts representing mostly abstract entities.
Includes as subtypes:
action, change, color, death, event, fluid, injection, temperature
99
The UMLS
Unified Medical Language System
Metathesaurus
Semantic Network (SN)
100
BIRNLex and UMLS-SN
Rest =SN Daily or Recreational Activity
Principal Investigator =SN Professional or Occupational Group
Left handedness =SN Organism Attribute
Ambidextrous =SN Finding
Brain Imaging =SN Diagnostic Procedure
Brain Mapping =SN Diagnostic Procedure & Research Activity
Healthy Adult =SN Finding
101
To build a high quality shared ontology requires hard work and
staying power
You cannot cheat by borrowing from UMLS
UMLS (= the UMLS Metathesaurus) is not an ontology
102
is_a (sensu UMLS)
A is_a B =def
‘A’ is narrower in meaning than ‘B’
grows out of the heritage of dictionaries, which reflect meanings, not biological reality
103
Concepts, Concept Names, and their Identifiers in the UMLS
The Metathesaurus is organized by concept. One of its primary purposes is to connect different names for the same concept from many different vocabularies.
104
The desperate search for ‘mappings’
A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms).
105
The desperate search for ‘mappings’
This is not an exact science. ... Metathesaurus editors decide what view of synonymy to represent in the Metathesaurus concept structure. Please note that each source vocabulary’s view of synonymy is also present in the Metathesaurus, irrespective of whether it agrees or disagrees with the Metathesaurus view.
106
These strange mappings between names as they appear in different source vocabularies, created for widely different purposes, can still be very useful
but the source vocabularies themselves are of variable quality (not all mappings are created equal)
and the sorts of search which the UMLS supports reflect an already outmoded technology
107
is_a (sensu UMLS)
congenital absent nipple is_a nipple
surgical procedure not carried out because of patient’s decision is_a surgical procedure
cancer documentation is_a cancer
disease prevention is_a disease
living subject is_a information object representing an animal or complex organism
individual allele is_a act of observation
limb is_a tissue
108
is_a (sensu UMLS)
both testes is_a testis
plant leaves is_a plant
smoking is_a individual behavior
walking is_a social behavior
109
Advantages of the methodology of shared coherently defined ontologies
once the interoperable gold standard reference ontologies are there, it will make sense to reformulate parts of existing incompatible terminologies (e.g. in UMLS) in terms of the standard ontologies in order to achieve greater domain coverage and alignment of different but veridical views. Thus not everything that was done in the past turns out to be a waste.
110
is_a (sensu UMLS)
A is_a B =def
‘A’ is narrower in meaning than ‘B’
grows out of the heritage of dictionaries
(which ignore the basic distinction between universals and instances)
111
The really ugly
112
113
HL7 Marketing
HL7 V3 claims to be:
“The foundation of healthcare interoperability”
“The data standard for biomedical informatics”
from blood banks to Electronic Health Records to clinical genomics
114
HL7 Incredibly Successful
adopted by Oracle as basis for its Electronic Health Record technology; supported by IBM, GE, Sun ...
embraced as US federal standard
central part of $35 billion program to integrate all UK hospital information systems
115
Problem V3 of HL7 is designed to address
in HL7 V2 the realization of the messaging task allows ad hoc interpretations of the standard by each sending or receiving institution.
Result: vendor products never properly interoperable, and always require mapping software.
116
The solution to this problem (V3) is the HL7 RIM
or Reference Information Model
= a world standard for exchange of information between clinical information systems
117
The V3 solution
Remove optionality by having the RIM serve as a master model of all health information, from blood banks to Electronic Health Records to clinical genomics
118
The hype
“HL7 V3 is the standard of choice for countries and their initiatives to create national EHR and EHR data exchange standards as it provides a level of semantic interoperability unavailable with previous versions and other standards. Significant V3 national implementations exist in many countries, e.g. in the UK (e.g. the English NHS), the Netherlands, Canada, Mexico, Germany and Croatia.”
119
The reality (I asked them)
“None of the implementations have a national scope” (e.g. Stockholm City Council)
The paradigm Dutch national HL7 V3 EHR implementation uses HL7 technology exclusively for exchanging data (i.e. messaging). The EHR architectures themselves are HL7-free.
120
The Oracle Healthcare Transaction Base (HTB)
Oracle itself refers (April 2006) to three implementations of HTB described as being 'live for EHR projects':
1) Byrraju Foundation (BSRF) in India (Live)
2) Stockholm County (planned to go live by May 2006)
3) Louisiana (planned to go live by May 2006)
121
Regarding the Byrraju case, I am told that there is no V3 application running in India today and that the Byrraju Foundation is presently not using any telemedicine application that utilizes HL7.
As to the Stockholm case, the HTB was purchased and deployed in late 2004. An attempt to port a pilot system was made during the spring of 2005. This attempt was abandoned, as I understand from my Swedish colleagues, partly because of poor performance (the new application performed significantly less well than the system it was designed to replace, even though it was being run on considerably more expensive hardware), and partly because of a lack of fault tolerance, which made it inadequate as a mechanism for integrating legacy systems marked by a high degree of variation in data quality. During the spring of 2006, it seems, an attempt will be made to construct a new pilot application, this time with the more modest goal of handling referrals.
122
The hype
The RIM is “credible, clear, comprehensive, concise, and consistent”
It is “universally applicable” and “extremely stable”
123
The reality
• HL7 V3 documentation is 542,458 KB, divided into 7,573 files
• It remains subject to frequent revisions
• It is very difficult to understand
124
The reality
The decision to adopt the RIM was made already in 1996, yet the promised benefits of interoperability still, after 10 years, remain elusive.
HL7 has bet the farm on the RIM – technology has advanced in these 10 years
125
RIM NORMATIVE CONTENT
126
to design a message, choose from here
127
Too many combinations
as the traffic on HL7’s own vocabulary mailing list reveals, there is no adequate mechanism for ensuring that the vast number of combinations of coded terms within actual messages can be controlled in such a way that messages will be understood in the same way by designers, senders and receivers.
128
129
These pre-defined attributes
code, class_code, mood_code,
status_code, etc.
yield a combinatorial explosion:
class_code (61 values) × mood_code (13 values) × code (estimate 200) × status_code (10 values) = 1,586,000 (about 1.59 million) combinations.
Adding in the other codes this becomes 810 billion.
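The first figure can be reproduced directly from the value counts quoted above. The sketch covers only the four attributes listed; the counts for the "other codes" behind the 810 billion figure are not given here, so that number is not recomputed.

```python
from math import prod

# Value counts as quoted on the slide (the count for `code` is an estimate).
value_counts = {
    "class_code": 61,
    "mood_code": 13,
    "code": 200,
    "status_code": 10,
}

# Each attribute is chosen independently, so the counts multiply.
combinations = prod(value_counts.values())
print(f"{combinations:,} combinations")
```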
130
Why does the RIM embody so many
combinations?
To ensure in advance that everything can be said in conformity to the standard
131
The RIM methodology
defines a set of ‘normative’ classes (Act, Role, and so on), with which are associated a rich stock of attributes from which one must make a selection when applying the RIM to each new domain (pharmacy, clinical genomics ...).
Compare: attempting to create manufacturing software by drawing from a store containing pre-established parts (so that the store would need to have the bits needed for making every conceivable manufacturable thing, be it a lawnmower, a refrigerator, a hunting bow, and so on).
132
The RIM methodology
are there examples where a methodology of this sort has been made to work? Does the RIM yield a coherent basis for constructing well-designed software artifacts for functions like the EHR or computerized decision support?
133
This methodology does not impede the formation of local dialects
Different teams produce different message designs for the very same topic.
In the UK, the £ 35 bn. NHS National Program “Connecting for Health” has applied the RIM rigorously, using all the normative elements, and it discovered that it needed to create dialects of its own to make the V3-based system work for its purposes (it still does not work)
134
The RIM documentation
• is subject to multiple and systematic internal inconsistencies and unclarities
• is marked by sloppy and unexplained use of terms such as ‘act’, ‘Act’, ‘Acts’, ‘action’, ‘ActClass’, ‘Act-instance’, ‘Act-object’
• has uncertain cross-referencing to other HL7 documents
• has no publicly available teaching materials (no HL7 for Dummies)
135
from HL7 email forum (do not circulate)
“I am ... frightened when I contemplate the number of potential V3ers who ... simply are turned away by the difficulty of accessing the product.
“Some of them attend V3 tutorials which explain V3 as the hugely complex process of creating a message and are turned off. [They] simply do not have the stamina, patience, endurance, time, or brain-cells to understand enough for them to feel comfortable contributing to debates / listserves, etc., so they remain silent.”
136
Problems of scope
Only two main classes in the RIM
Act = roughly: intentional action
Entity = persons, places, organizations, material
How can the RIM deal transparently with information about, say, disease processes, drug interactions, wounds, accidents, bodily organs, documents?
137
Diseases in the RIM
... are not Acts
... are not Entities
... are not Roles, Participations ...
So what are they?
At best: a case of pneumonia is identified as the Act of Observation of a case of pneumonia
Note: RIM’s treatment of SNOMED codes
138
HL7 Clinical Document Architecture
defines a document as an Act
HL7’s Clinical Genomics Standard Specifications
defines an individual allele as an Act of Observation
139
Why the centrality of ‘Act’
because of HL7’s roots in US hospital messaging – and thus in US hospital billing:
intentional actions are what can be billed
140
Mayo RIM discussion of the meaning of ‘Act’ as “intentional action”
Is a snake bite or bee sting an "intentional action"?
Is a knife stabbing an intentional action?
Is a car accident an intentional action?
When a child swallows the contents of a bottle of poison is that an intentional action?
141
The RIM has no coherent criteria for deciding
For this reason, too, dialects are formed – and the RIM does not do its job. One health information system might conceive snakebites and gunshots as Procedures. Another might classify them with diseases, and so treat them as Observations.
If basic categories cannot be agreed upon for common phenomena like snakebites, then the RIM is in serious trouble.
142
Are definitions like this a good basis for achieving semantic interoperability in the biomedical domain?:
LivingSubject Definition: A subtype of Entity representing an organism or complex animal, alive or not.
143
Person (from HL7 Glossary)
Definition: A Living Subject representing single human being [sic] who is uniquely identifiable through one or more legal documents
144
The Problem of Circularity
A Person =def. A person with documents
‘An A is an A which is B’
– useless in practical terms, since neither we nor the machine can use it to find out what ‘A’ means
– incorporates a vicious infinite regress
– has the effect of making it impossible to refer to A’s which are not Bs, for example to undocumented persons
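The circularity problem can be checked mechanically. The following is a minimal sketch, assuming a glossary is reduced to a mapping from each term to the terms used in its definition; the function name `is_circular` and the toy entries (other than the ‘Person’ paraphrase from this slide) are hypothetical and not taken from HL7.

```python
def is_circular(term, definitions):
    """Return True if `term` occurs, directly or transitively, in its own definition."""
    seen = set()
    stack = list(definitions.get(term, ()))
    while stack:
        current = stack.pop()
        if current == term:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Follow the terms used in the definition of `current`, if it has one.
        stack.extend(definitions.get(current, ()))
    return False


# Hypothetical toy glossary: each term maps to the terms its definition uses.
glossary = {
    "Person": ["Person", "LegalDocument"],  # 'A Person =def. a person with documents'
    "Pneumonia": ["Inflammation", "Lung"],  # illustrative entry only
    "Inflammation": ["Tissue"],             # illustrative entry only
}

print(is_circular("Person", glossary))
print(is_circular("Pneumonia", glossary))
```

A check of this kind only detects that the defined term reappears in its definition; it does not say how the definition should be repaired.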
145
Katrina
146
Katrina
147
What is the RIM about?
blood pressure measurement = an information item
blood pressure = something in reality which exists independently of any recording of information, and which the measurement measures
Q: Is the RIM about information, or about the reality to which such information relates?
A: There is no difference between the two
148
RIM Philosophy
“The truth about the real world is constructed through a combination and arbitration of attributed statements ...
“As such, there is no distinction between an activity and its documentation.”
149
The RIM as an Information Model
‘a static (UML) model of health and health care information’
The scope of the RIM’s class hierarchy consists in packets of information:
the information content of invoices, statements of observations, lab reports, …
150
A good, general constraint on a theory of meaning
For each linguistic expression ‘E’
‘E’ means E
‘snow’ means snow
‘pneumonia’ means pneumonia
151
From the perspective of the RIM on the Information Model conception
‘medication’ does not mean: medication
rather it means: the record of medication in an information system
‘stopping a medication’ does not mean: stopping a medication
rather it means: change of state in the record of a Substance Administration Act from Active to Aborted
152
The RIM’s Entity class
persons, places, organizations, material
153
States of Entity
• active: The state representing the fact that the Entity is currently active.
• nullified: The state representing the termination of an Entity instance that was created in error.
• inactive: The state representing the fact that an entity can no longer be an active participant in events.
• normal: The “typical” state. Excludes “nullified”, which represents the termination state of an Entity instance that was created in error
154
Persons are Entities
What do ‘active’ and ‘nullified’ mean as applied to Person?
Is there a special kind of death-through-nullification in the case of those instances of Person who were created in error?
155
HL7 Glossary
Definition of Animal: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain.
An Animal is not an animal. Rather (an) Animal represents an animal: it is an information item which represents a certain highly specific kind of animal-of-interest, namely an animal that is of interest to the Personnel Management domain.
156
Double Standards
The RIM is a confusion of two separate artifacts:
1. an “information model”, relating to names of persons, records of observations, social security numbers, etc.
2. a reference ontology, relating to persons, observations, documents, acts, etc.
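The difference between the two artifacts can be made concrete in a short sketch. It assumes two hypothetical classes, `Person` (the entity in reality) and `PersonRecord` (the information item about that entity); neither name nor the `RecordState` enum is taken from the RIM. The point shown is only that states such as ‘active’ and ‘nullified’ attach to the record, not to the person.

```python
from dataclasses import dataclass
from enum import Enum


class RecordState(Enum):
    ACTIVE = "active"
    NULLIFIED = "nullified"


@dataclass(frozen=True)
class Person:
    """A person in reality (reference ontology side). Hypothetical class."""
    name: str


@dataclass
class PersonRecord:
    """An information item about a person (information model side). Hypothetical class."""
    about: Person
    state: RecordState = RecordState.ACTIVE

    def nullify(self):
        # Only the record is terminated; the person it is about is untouched.
        self.state = RecordState.NULLIFIED


john = Person("John Smith")
record = PersonRecord(about=john)
record.nullify()  # a record created in error is withdrawn
```

Keeping the two types separate means that nullifying a record created in error says nothing about the person the record is about.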
157
The examples provided to illustrate the RIM’s classes
are almost always in conformity with the Reference Ontology Conception of the RIM
They involve the familiar kinds of things and processes in reality (medication, patients, devices, paper documents, surgery, diet, supply of bedding) with which healthcare messages are concerned.
158
HL7 Glossary:
Instances of Person include: John Smith, RN, Mary Jones, MD, etc.
not: information about John Smith ...
159
Some of the RIM’s definitions are in conformity with the
Information Model Conception
160
Definition of Act:
A record of something that is being done, has been done, can be done, or is intended or requested to be done
An Act is the record of an Act
“There is no difference between an activity and its documentation”
HL7’s backbone ‘Act’ class
161
Acts are records: but the examples of Act given by the RIM are as follows:
“The kinds of acts that are common in health care are (1) a clinical observation, (2) an assessment of health condition (such as problems and diagnoses), (3) healthcare goals, (4) treatment services (such as medication, surgery, physical and psychological therapy), ...
162
The class Procedure (a subclass of Act)
Definition of Procedure: An Act whose immediate and primary outcome (post-condition) is the alteration of the physical condition of the subject
Examples:
chiropractic treatment, acupuncture, straightening rivers, draining swamps.
163
What is an information model?
Is it a model of entities in reality (an ontology)?
Or of information about entities in reality (an ontology)?
The RIM is an incoherent mixture of the two
Does this matter?
164
What’s gone wrong?
People of good will are making mistakes because of insufficient concern for clarity and consistency
Even large ontologies are built in the spirit of the amateur hobbyist
Money is wasted on megasystems that cannot be used
165
Lessons for Semantic Interoperability
Clear and easily accessible documentation – based on an intuitive ontology (understandable to all classes of users)
Business model should be such that those responsible for creating documentation do not have an incentive for it to be unclear
Centralized control of documentation, to ensure consistency (too much democracy is a bad thing)
166
Lessons for Standards for Semantic Interoperability
Create standards on the basis of thorough pilot testing
(Avoid systems like the RIM, which is imposed from the top down, on a wing and a prayer)
167
What should take the place of the RIM?
1. A Reference Ontology of the types of biomedical entity such as thing, process, person, disease, infection, molecule, procedure, etc.
2. A Reference Ontology of the types of biomedical information entity such as message, document, record, image, diagnosis, interpretation, etc.
The first provides a high-level framework in terms of which the lower-level types captured in vocabularies like SNOMED CT could be coherently organized.
The second helps to specify how information can be combined into meaningful units and used for further processing.