Developing Medical Informatics Ontologies with Protégé Kokkinaki Alexandra Lab of Medical...

Post on 22-Dec-2015


Developing Medical Informatics Ontologies with Protégé

Kokkinaki Alexandra

Lab of Medical Informatics, alko@med.auth.gr

Tutorial materials

The Protégé application: copy the Protégé-2000 directory into your “Program Files” or “Applications” folder

The tutorial example: copy the “Wine” folder to your hard disk

Examples of Medical Informatics ontologies: copy the “Medical Informatics” examples to your disk

Slides from the tutorial (AMIA2003-Protege-Tutorial.ppt)

Outline

Ontology development basics

What is an ontology and why do we need one?

The ECG ontology and the wine ontology will be analyzed side by side

A step-by-step guide to ontology development

An overview of Protégé

Medical Informatics ontologies

Hands-on: design part of the ECG ontology in Protégé

Knowledge and ontologies

What is knowledge: a set of data with semantic content

Ontologies are used to represent knowledge

Ontology (Definition)

In philosophy: the science of being (Aristotle)

In science and artificial intelligence: an ontology consists of the explicit specification of a conceptualisation of the world (Gruber): “an explicit specification of a conceptualisation”

The formal specification of a shared conceptualisation of the world (Borst): “a formal specification of a shared conceptualisation”

What Is An Ontology

An ontology is an explicit description of a domain consisting of:

Concepts (classes): classes are the focus of most ontologies. Classes describe concepts in the domain. For example, a class of ECGs represents all ECGs. Specific ECGs are instances of this class.

Class/subclass hierarchy: a class can have subclasses that represent concepts that are more specific than the superclass. For example, we can divide the class of Heart Diseases into: Atrial abnormalities, Cardiac arrhythmia, Cardiac hypertrophy, Cardiomyopathies, ...

What Is An Ontology

An ontology also consists of:

Properties and attributes of concepts (slots): slots describe properties of classes and instances, e.g. the Patient with patientId = XXXX who is a 40-year-old male. Instances of the class Patient will have slots describing their age, address, race, sex, etc.

Constraints on properties and attributes: age < 100, race {Caucasian, Black, Oriental}

Individuals (often, but not always): PatientXXX, Amiodarone, etc.
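As a rough illustration of these ingredients, the sketch below models the Patient example in Python. The `Patient` dataclass, its field names, and the way the constraints are enforced are assumptions made for illustration only; Protégé keeps classes, slots, and constraints in its own knowledge model rather than in code like this.

```python
from dataclasses import dataclass

# Constraint values are taken from the slide; everything else is illustrative.
ALLOWED_RACES = {"Caucasian", "Black", "Oriental"}
MAX_AGE = 100  # the slide's constraint "age < 100"


@dataclass
class Patient:
    """A class (concept). Each field plays the role of a slot."""
    patient_id: str
    age: int
    sex: str
    race: str

    def __post_init__(self):
        # Constraints on slot values, checked when an individual is created.
        if not 0 <= self.age < MAX_AGE:
            raise ValueError(f"age must be below {MAX_AGE}, got {self.age}")
        if self.race not in ALLOWED_RACES:
            raise ValueError(f"race must be one of {sorted(ALLOWED_RACES)}")


# An individual (instance) of the class Patient.
patient_xxx = Patient(patient_id="XXXX", age=40, sex="Male", race="Caucasian")
print(patient_xxx)
```

Creating a patient with `age=120` would raise a `ValueError` here, which is the code-level analogue of a constraint violation.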

Ontology Examples

Taxonomies on the Web: Yahoo! Categories. The Yahoo! Directory is a catalog of sites organized into subject-based categories and sub-categories. All of the site listings in the Directory are contained in 14 main categories.

Domain-specific standard terminology: SNOMED Clinical Terms, a terminology for clinical medicine covering most areas of clinical information such as diseases, findings, procedures, microorganisms, pharmaceuticals

Ontology Examples

UMLS (Unified Medical Language System): the UMLS integrates and distributes key terminology, classification and coding standards, and associated resources to promote creation of more effective and interoperable biomedical information systems and services, including electronic health records (http://umlsks.nlm.nih.gov/uPortal/frame.jsp?umlsks-frame=http://www.nlm.nih.gov/research/umls/documentation.html)

What Is “Ontology Engineering”?

Ontology Engineering: defining terms in the domain and relations among them

Defining concepts in the domain (classes)

Arranging the concepts in a hierarchy (subclass-superclass hierarchy)

Defining which attributes and properties (slots) classes can have and constraints on their values

Defining individuals and filling in slot values

Why Develop an Ontology?

To share common understanding of the structure of information among people and among software agents: Web sites containing medical information publish the same underlying ontology of the terms they all use

To enable reuse of domain knowledge: to avoid “re-inventing the wheel”; to introduce standards to allow interoperability

More Reasons

To make domain assumptions explicit: easier to change domain assumptions (consider a genetics knowledge base); easier to understand and update legacy data

To separate domain knowledge from the operational knowledge: re-use domain and operational knowledge separately (e.g., configuration based on constraints)

An Ontology Is Often Just the Beginning

[Figure: ontologies at the center, linked to software agents, problem-solving methods, domain-independent applications, databases (“declare structure”), and knowledge bases (“provide domain description”)]

Ontology-Development Process

In this tutorial: determine scope → consider reuse → enumerate terms → define classes → define properties → define constraints → create instances

In reality - an iterative process: determine scope, consider reuse, enumerate terms, define classes, consider reuse, enumerate terms, define classes, define properties, create instances, define classes, define properties, define constraints, create instances, define classes, consider reuse, define properties, define constraints, create instances

Ontologies vs Databases

A database is a set of tables and relations

An ontology contains syntactically and semantically richer information than a database

Databases are created mainly to store information; ontologies are created to describe an entire subject domain

An ontology should have a networked architecture, because it is used for sharing information.

Preliminaries - Tools

Protégé-2000: is a graphical ontology-development tool; supports a rich knowledge model; is open-source and freely available

Some other available tools: Ontolingua and Chimaera, OntoEdit, OilEd

Determine Domain and Scope

What is the domain that the ontology will cover?

What are we going to use the ontology for?

What types of questions should the information in the ontology provide answers to (competency questions)?

Answers to these questions may change during the lifecycle


[Figure: French wines and wine regions, and California wines and wine regions, feed a shared ontology of wine and food, which is asked “Which wine should I serve with seafood today?”]

[Figure: ECGs and accompanying data (PhysioNet) and UMLS drugs & diseases are queried with “Find arrhythmia ECGs of men >40 taking Aldomet”]

Competency Questions

Which wine characteristics should I consider when choosing a wine?

Is Bordeaux a red or white wine?

Does Cabernet Sauvignon go well with seafood?

What is the best choice of wine for grilled meat?

Which characteristics of a wine affect its appropriateness for a dish?

Does the flavor or body of a specific wine change with vintage year?

What were good vintages for Napa Zinfandel?

Competency Questions I

Which parameters should I record for each of the ECG characteristics?

What characteristics should I record for the patients?

Which drugs and which diseases will I record?

Which ECG characteristics are abnormal when an arrhythmia occurs?

What are the (demographic) characteristics of patients with atrial fibrillation?

What range of diagnoses does the electrocardiogram provide?

Competency Questions II

I want the ECGs of patients with arrhythmias.

What are the characteristics of the arrhythmia on the ECG?

Which drugs have been given to patients with arrhythmias?

I want the ECGs of patients with arrhythmia who were taking antiarrhythmic drugs at the same time.

I want the ECGs of male patients aged >40 with a medical history of diabetes.
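A competency question is useful because it can eventually be run as a query over the instances. The sketch below shows one possible reading of the last question above (ECGs of male patients over 40 with a medical history of diabetes). The `Patient` and `ECG` classes, their field names, and the three records are all made up for illustration; they are not part of the tutorial's ECG ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    sex: str
    age: int
    medical_history: set = field(default_factory=set)


@dataclass
class ECG:
    ecg_id: str
    patient: Patient


def ecgs_matching(ecgs, *, sex, min_age, history):
    """Return the ECGs whose patient satisfies every criterion."""
    return [
        e for e in ecgs
        if e.patient.sex == sex
        and e.patient.age > min_age
        and history in e.patient.medical_history
    ]


# Made-up records, for illustration only.
p1 = Patient("P1", "Male", 55, {"Diabetes"})
p2 = Patient("P2", "Female", 62, {"Diabetes"})
p3 = Patient("P3", "Male", 35, {"Diabetes"})
records = [ECG("E1", p1), ECG("E2", p2), ECG("E3", p3)]

hits = ecgs_matching(records, sex="Male", min_age=40, history="Diabetes")
print([e.ecg_id for e in hits])  # ['E1']
```

If a question cannot be phrased against the classes and slots at all, that usually signals a missing term or property in the ontology's scope.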

Consider Reuse

Why reuse other ontologies? To save effort; to interact with the tools that use other ontologies; to use ontologies that have been validated through use in applications


What to Reuse?

Ontology libraries:

Protégé ontology library (protege.stanford.edu/ontologies.html)

DAML ontology library (www.daml.org/ontologies)

Ontolingua ontology library (www.ksl.stanford.edu/software/ontolingua/)

Upper ontologies:

IEEE Standard Upper Ontology (suo.ieee.org)

Cyc (www.cyc.com)

What to Reuse? (II)

General ontologies: DMOZ (www.dmoz.org); WordNet (www.cogsci.princeton.edu/~wn/)

Domain-specific ontologies: UMLS Semantic Net; GO (Gene Ontology) (www.geneontology.org); GLIF; HL7

Enumerate Important Terms

What are the terms we need to talk about? What are the properties of these terms? What do we want to say about the terms?


Enumerating Terms - The Wine Ontology

wine, grape, winery, location, wine color, wine body, wine flavor, sugar content

white wine, red wine, Bordeaux wine

food, seafood, fish, meat, vegetables, cheese

Enumerating Terms - The ECG Ontology

ECG, Patient, Drug, Disease, ECG Characteristics, Acquiring Device, Lead

Measurement, Diagnosis, Medical History, Blood Pressure, V1,V2, aVL

Antiarrhythmic, Amiodarone, Dilantin, Lorcainide, ACE-Inhibitors, Captopril

Define Classes and the Class Hierarchy

A class is a concept in the domain: a class of wines; a class of wineries; a class of red wines

A class is a collection of elements with similar properties

Instances of classes: a glass of California wine you’ll have for lunch


Class Inheritance

Classes usually constitute a taxonomic hierarchy (a subclass-superclass hierarchy)

A class hierarchy is usually an IS-A hierarchy: an instance of a subclass is an instance of a superclass

If you think of a class as a set of elements, a subclass is a subset

Class Inheritance - Example

Cardiac Drug is a subclass of Drug

Antiarrhythmic is a subclass of Cardiac Drug: every Antiarrhythmic drug is a Cardiac Drug

Amiodarone is a subclass of Antiarrhythmic: every Amiodarone drug is an Antiarrhythmic drug
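The IS-A reading of this hierarchy maps closely onto subclassing in an object-oriented language. The following minimal sketch uses Python classes as a stand-in for ontology classes; it only illustrates the subset semantics and is not how Protégé represents the hierarchy internally.

```python
class Drug:
    """Top of the drug hierarchy."""


class CardiacDrug(Drug):
    """Every cardiac drug is a drug."""


class Antiarrhythmic(CardiacDrug):
    """Every antiarrhythmic drug is a cardiac drug."""


class Amiodarone(Antiarrhythmic):
    """Every amiodarone drug is an antiarrhythmic drug."""


# An instance of the most specific class...
dose = Amiodarone()

# ...is also an instance of every superclass, because IS-A is transitive.
print(isinstance(dose, Antiarrhythmic))  # True
print(isinstance(dose, Drug))            # True

# The relation only runs one way: not every drug is amiodarone.
print(issubclass(Drug, Amiodarone))      # False
```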

Levels in the Hierarchy

[Figure: a class hierarchy annotated with its top level, middle level, and bottom level]

Modes of Development

top-down – define the most general concepts first and then specialize them

bottom-up – define the most specific concepts and then organize them in more general classes

combination – define the more salient concepts first and then generalize and specialize them

Documentation

Classes (and slots) usually have documentation: describing the class in natural language; listing domain assumptions relevant to the class definition; listing synonyms

Documenting classes and slots is as important as documenting computer code!

Define Properties of Classes – Slots

Slots in a class definition describe attributes of instances of the class and relations to other instances

Each wine will have color, sugar content, producer, etc.


Define Properties of Classes – Slots

Enumerate ECG slots:

Patient (patientID, sex, race, age, ...)

ECGCharacteristics (onset, offset, duration)

Recording Device (device type, manufacturer, serial number, ...)


Properties (Slots)

Types of properties:

“intrinsic” properties: flavor and color of wine

“extrinsic” properties: name and price of wine

parts: ingredients in a dish

relations to other objects: producer of wine (winery)

Simple and complex properties:

simple properties (attributes): contain primitive values (strings, numbers)

complex properties: contain (or point to) other objects (e.g., a winery instance)

Slots for the Class Wine

Slot and Class Inheritance

A subclass inherits all the slots from the superclass: if a wine has a name and flavor, a red wine also has a name and flavor

If a class has multiple superclasses, it inherits slots from all of them: Port is both a dessert wine and a red wine. It inherits “sugar content: high” from the former and “color: red” from the latter
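The Port example can be mimicked with multiple inheritance. In this sketch, class attributes stand in for slots and their values; the attribute names (`sugar_content`, `color`) are assumptions chosen to mirror the slide.

```python
class Wine:
    # Slots every wine has; concrete values are filled in on instances.
    name = None
    flavor = None


class RedWine(Wine):
    color = "red"


class DessertWine(Wine):
    sugar_content = "high"


class Port(DessertWine, RedWine):
    """Inherits slots from both superclasses (and from Wine through them)."""


print(Port.sugar_content)     # high  (from DessertWine)
print(Port.color)             # red   (from RedWine)
print(hasattr(Port, "name"))  # True  (from Wine)
```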

Property Constraints

Property constraints (facets) describe or limit the set of possible values for a slot

The name of a wine is a string

The wine producer is an instance of Winery

A winery has exactly one location

Race {Caucasian, Asian, Black, Unspecified}

Age < 150

The Id of a patient is a String

PatientName is an instance of Patient


Create Instances

Create an instance of a class: the class becomes a direct type of the instance; any superclass of the direct type is a type of the instance

Assign slot values for the instance frame: slot values should conform to the facet constraints; knowledge-acquisition tools often check that
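To make the idea of a tool checking slot values against facets concrete, here is a small sketch. It encodes the patient facets listed under Property Constraints as plain data and validates an instance frame against them. The `FACETS` table layout, the slot names, and `check_frame` are hypothetical and only loosely inspired by what a knowledge-acquisition tool does.

```python
# Facets as declarative data: value type, allowed values, upper bound.
FACETS = {
    "patient_id": {"type": str},
    "race": {"type": str,
             "allowed": {"Caucasian", "Asian", "Black", "Unspecified"}},
    "age": {"type": int, "below": 150},
}


def check_frame(frame):
    """Return a list of facet violations for an instance frame (a dict)."""
    problems = []
    for slot, value in frame.items():
        facet = FACETS.get(slot)
        if facet is None:
            problems.append(f"{slot}: unknown slot")
        elif not isinstance(value, facet["type"]):
            problems.append(f"{slot}: expected {facet['type'].__name__}")
        elif "allowed" in facet and value not in facet["allowed"]:
            problems.append(f"{slot}: {value!r} is not an allowed value")
        elif "below" in facet and value >= facet["below"]:
            problems.append(f"{slot}: {value} is not below {facet['below']}")
    return problems


print(check_frame({"patient_id": "XXXX", "race": "Asian", "age": 40}))
# []
print(check_frame({"patient_id": "XXXX", "race": "Asian", "age": 151}))
# ['age: 151 is not below 150']
```

Keeping the facets as data rather than as code is a design choice here: the same checker then works for any class whose facets are described in the same way.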


Creating an Instance: Example

Outline

Ontology development basics

What is an ontology and why do we need one?

A step-by-step guide to ontology development

An overview of Protégé

Advanced issues in knowledge modeling

Medical Informatics ontologies: examples and design decisions

Additional resources: Protégé plugins and applications

Where to go for help

Protégé user’s guide: http://protege.stanford.edu/doc/users_guide/index.html

Ontology Development 101 guide: http://protege.stanford.edu/publications/ontology_development/ontology101.html

FAQ: http://protege.stanford.edu/faq.html


Going Deeper

[Figure: breadth-first coverage of the steps: determine scope, consider reuse, enumerate terms, define classes, define properties, define constraints, create instances]

[Figure: depth-first coverage of the same steps: determine scope, consider reuse, enumerate terms, define classes, define properties, define constraints, create instances]

Defining Classes and a Class Hierarchy

Things to remember: there is no single correct class hierarchy, but there are some guidelines

The question to ask: “Is each instance of the subclass an instance of its superclass?”

Siblings in a Class Hierarchy

All the siblings in the class hierarchy must be at the same level of generality

Compare to section and subsections in a book

The Perfect Family Size

If a class has only one child, there may be a modeling problem

If the only Red Burgundy we have is Côtes d’Or, why introduce the subhierarchy?

Compare to bullets in a bulleted list

The Perfect Family Size (II)

If a class has more than a dozen children, additional subcategories may be necessary

However, if no natural classification exists, the long list may be more natural

Single and Plural Class Names

A “wine” is not a kind-of “wines”

A wine is an instance of the class Wines

Class names should be either all singular or all plural

[Figure: a Class and an Instance connected by an instance-of link]

Classes and Their Names

Classes represent concepts in the domain, not their names

The class name can change, but it will still refer to the same concept

Synonym names for the same concept are not different classes

Many systems allow listing synonyms as part of the class definition

A Completed Hierarchy of Wines

When to introduce a new class?

Subclasses of a class usually have: additional properties; additional slot restrictions; participation in different relationships

Subclasses of a class have: new slots; new facet values

But: in terminological hierarchies, new classes do not have to introduce new properties

A new class or a property value?

Do concepts with different slot values become restrictions for different slots?

How important is the distinction for the domain?

A class of an instance should not change often

Metaclasses: Templates For Class Definitions

Metaclasses enable us to add attributes to class definitions

By default, we have: class name, documentation, slots, …

Metaclasses (II)

Additional attributes: synonyms, UMLS CUI, Latin name, other class-level properties
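Python happens to have a metaclass mechanism that makes a loose analogy possible. In the sketch below, a hypothetical `OntologyClass` metaclass gives every class two extra class-level attributes, `latin_name` and `umls_cui`. The CUI shown is a placeholder, not a real UMLS identifier, and nothing here reflects how Protégé implements metaclasses.

```python
class OntologyClass(type):
    """Metaclass: a template that adds class-level attributes to each class."""

    def __new__(mcls, name, bases, namespace, latin_name=None, umls_cui=None):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.latin_name = latin_name
        cls.umls_cui = umls_cui
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        # Swallow the extra keywords; type.__init__ does not accept them.
        super().__init__(name, bases, namespace)


class Heart(metaclass=OntologyClass,
            latin_name="Cor",
            umls_cui="C0000000"):  # placeholder value, not a real CUI
    """Documentation for the class lives in the docstring."""


# The attributes describe the class itself, not its instances' slot values.
print(Heart.latin_name)      # Cor
print(type(Heart).__name__)  # OntologyClass
```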

Best Wineries

Back to the Slots: Allowed Values

When defining a domain or range for a slot, find the most general class or classes

Consider the produces slot for a Winery:

Range: Red wine, White wine, Rosé wine (too specific)

Range: Wine (preferred)

Consider the flavor slot:

Domain: Red wine, White wine, Rosé wine (too specific)

Domain: Wine (preferred)

[Figure: a slot links a class (DOMAIN) to its allowed values (RANGE)]

Defining Domain and Range

A class and a superclass – replace with the superclass

All subclasses of a class – replace with the superclass

Most subclasses of a class – consider replacing with the superclass
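The first two rules are mechanical enough to sketch in code. The function below applies them to a proposed domain or range, given a one-level map from a class to its direct subclasses. The `SUBCLASSES` table and the `generalize` helper are hypothetical, and the third rule (“most subclasses”) is deliberately left out because it is a judgment call.

```python
# Direct subclasses of each class (one level only, for illustration).
SUBCLASSES = {
    "Wine": {"Red wine", "White wine", "Rosé wine"},
}


def generalize(classes, subclasses=SUBCLASSES):
    """Simplify a domain/range list using the first two rules above."""
    result = set(classes)
    for parent, children in subclasses.items():
        if parent in result:
            # A class and its superclass: keep only the superclass.
            result -= children
        elif children <= result:
            # All subclasses of a class: replace them with the superclass.
            result = (result - children) | {parent}
    return result


print(generalize({"Red wine", "White wine", "Rosé wine"}))  # {'Wine'}
print(generalize({"Wine", "Red wine"}))                     # {'Wine'}
print(generalize({"Red wine"}))                             # {'Red wine'}
```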

Inverse Slots

Maker and Producer are inverse slots

Inverse Slots (II)

Inverse slots contain redundant information, but: allow acquisition of the information in either direction; enable additional verification; allow presentation of information in both directions

The actual implementation differs from system to system: Are both values stored? When are the inverse values filled in? What happens if we change the link to an inverse slot?
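One possible answer to these implementation questions is sketched below: both values are stored, the inverse is filled in immediately, and changing the link also removes the stale inverse value. The slot names `maker` (on a wine) and `produces` (on a winery) and the `set_maker` helper are assumptions for illustration; a real system may behave differently.

```python
class Winery:
    def __init__(self, name):
        self.name = name
        self.produces = set()  # slot: the wines this winery produces


class Wine:
    def __init__(self, name):
        self.name = name
        self.maker = None      # inverse slot: the winery that makes the wine


def set_maker(wine, winery):
    """Set wine.maker and keep the inverse slot winery.produces in sync."""
    if wine.maker is not None:
        # Changing the link: drop the old inverse value first.
        wine.maker.produces.discard(wine)
    wine.maker = winery
    if winery is not None:
        winery.produces.add(wine)


# Hypothetical individuals.
wine = Wine("Example Merlot")
first, second = Winery("Winery A"), Winery("Winery B")

set_maker(wine, first)
set_maker(wine, second)        # re-link to a different winery
print(wine in first.produces)  # False
print(wine in second.produces) # True
```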

Default Values

Default value – a value the slot gets when an instance is created

A default value can be changed

The default value is a common value for the slot, but is not a required value

For example, the default value for wine body can be FULL
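A default slot value behaves much like a default field value in a dataclass, as in this short sketch. The `Wine` class, the wine name, and the alternative value `LIGHT` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Wine:
    name: str
    body: str = "FULL"  # default value, assigned when the instance is created


wine = Wine("Example Riesling")  # hypothetical individual
print(wine.body)                 # FULL

# The default is only a common value, not a required one: it can be changed.
wine.body = "LIGHT"
print(wine.body)                 # LIGHT
```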

Limiting the Scope

An ontology should not contain all the possible information about the domain

No need to specialize or generalize more than the application requires

No need to include all possible properties of a class: only the most salient properties; only the properties that the applications require

Limiting the Scope (II)

An ontology of wine, food, and their pairings probably will not include: bottle size; label color; my favorite food and wine

An ontology of biological experiments will contain: Biological organism; Experimenter

Is the class Experimenter a subclass of Biological organism?

Outline

Ontology development basics

Medical Informatics ontologies: examples

Foundational Model of Anatomy (FMA)

UMLS (Unified Medical Language System)

Gene Ontology (GO)

Guideline Interchange Format (GLIF)

Foundational Model of Anatomy (FMA)

Developed at University of Washington as part of the Digital Anatomist project

Represents knowledge about human anatomy declaratively: canonical; independent of a specific viewpoint; machine-readable, symbolic representation

FMA in Protégé

Represents structures ranging from macromolecular complexes to body parts

Contains: ~70,000 distinct concepts; ~110,000 terms; 140 relations

FMA: Knowledge-Model Features

Metaclasses to define class-level properties

Attributed relations

Different types of part-whole, location, and other spatial relations

Synonyms

FME explorer: http://sig.biostr.washington.edu/projects/fm/FME/index.html

FMA: Demo

Top-level distinctions: Physical vs Conceptual entity; Material vs Non-Material Physical entity; Anatomical Structure

Structural organization

Examples: Anatomical entity, Heart

Outline

Ontology development basics

Medical Informatics ontologies: examples

Foundational Model of Anatomy (FMA)

UMLS

Gene Ontology (GO)

Guideline Interchange Format (GLIF)

Additional resources: Protégé plugins and applications

Introduction

UMLS is a compendium of many controlled vocabularies in the biomedical sciences (created 1986).

Mapping structure among vocabularies: translations among the various terminology systems

Ontology of biomedical concepts: natural language processing

It is intended to be used mainly by developers of systems in medical informatics.

Introduction

UMLS consists of three Knowledge Sources:

Metathesaurus: concepts that include the various names representing the same meaning from different source vocabularies

Semantic Network: 135 semantic types, 54 semantic relations

SPECIALIST Lexicon: dictionary of biomedical terms and common words, lexical tools and records used in natural language processing

Introduction

The Metathesaurus is a very large concept-oriented database

It holds concepts, their various names, and the relationships among them

It links alternative names and views of the same concept together and identifies useful relations between different concepts

2004AA: 1,020,866 concepts and 2.8 million terms

Introduction

All concepts are assigned to at least one semantic type: consistent categorization of all concepts at the relatively general level

The Metathesaurus must be customized to be used effectively

Metathesaurus structure

Concepts (CUI), Terms (LUI), Strings (SUI), Atoms (AUI), Relations

22/07/2004

Concept

A concept is a meaning

A meaning can have many different names

The Metathesaurus links all the names from all of the source vocabularies that mean the same thing

Each concept (meaning) has a concept unique identifier (CUI)


Concept Names and String identifiers

Each string in the concept names has a unique identifier (SUI)

Any variation in character set, upper-lower case, or punctuation is a separate string with a separate SUI

The same string in different languages has different SUIs


2002AC

~870,000 concepts (Eye, Oculus = 1)

~1,756,000 “terms” (Eye, Eyes, eye = 1)

~2,083,103 “strings”/concept names (Eye, Eyes, eye = 3)

~11,479,000 relationships between concepts

~7 million relationships between concepts and English words

>113 source vocabularies

15 different languages

Atoms

Each and every occurrence of a string in each source vocabulary is an atom

Every atom has an atom identifier (AUI)

In other words, atoms are the entries in the source vocabularies

Terms

All the variants of a string are grouped into a term

A term is the group of all strings that are lexical variants of each other

Each term has a lexical identifier (LUI)
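The four levels can be illustrated by counting them for the Eye example. In the sketch below, each tuple is one atom (one occurrence of a string in one source vocabulary). The concept label and the source names are hypothetical, and `lexical_key` is a deliberately naive stand-in for real lexical normalisation.

```python
# One tuple per atom: (concept, source vocabulary, string).
# The concept label and source names are hypothetical.
atoms = [
    ("eye-concept", "SRC_A", "Eye"),
    ("eye-concept", "SRC_B", "Eye"),
    ("eye-concept", "SRC_A", "Eyes"),
    ("eye-concept", "SRC_B", "eye"),
    ("eye-concept", "SRC_C", "Oculus"),
]


def lexical_key(string):
    """Naive normalisation: lowercase and strip a simple plural 's'."""
    s = string.lower()
    if s.endswith("s") and not s.endswith("us"):
        s = s[:-1]
    return s


concepts = {concept for concept, _, _ in atoms}  # would carry CUIs
strings = {string for _, _, string in atoms}     # would carry SUIs
terms = {lexical_key(s) for s in strings}        # would carry LUIs

# Atoms (AUIs), strings, terms, concepts.
print(len(atoms), len(strings), len(terms), len(concepts))  # 5 4 2 1
```

Eye, Eyes, and eye count as three strings but collapse into one term, while Oculus is a separate term of the same concept, which matches the counts quoted for the 2002AC release.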

Example

Semantic Network

Broad subject categories that represent the biomedical domain

2 main categories: Entity, Event

A semantic type is assigned to Metathesaurus concepts at the most specific level

UMLS Semantic Net

[Figure: UMLS Semantic Network type hierarchy, rooted at Entity and Event]

Entity types shown: Physical Object, Organism, Anatomical Structure, Manufactured Object, Substance, Conceptual Entity, Idea or Concept, Finding, Organism Attribute, Intellectual Product, Language, Occupation or Discipline, Organisation, Group Attribute, Group

Event types shown: Activity, Behaviour, Social Behaviour, Individual Behaviour, Daily or Recreational Activity, Occupational Activity, Health Care Activity, Laboratory Procedure, Diagnostic Procedure, Therapeutic Procedure, Research Activity, Governmental or Regulatory Activity, Educational Activity, Machine Activity, Phenomenon or Process, Injury or Poisoning, Human-caused Phenomenon or Process, Environmental Effect of Humans, Natural Phenomenon or Process, Biologic Function, Physiologic Function, Organism Function, Mental Process, Organ or Tissue Function, Cell Function, Molecular Function, Genetic Function, Pathologic Function, Disease or Syndrome, Mental or Behavioural Dysfunction, Neoplastic Process, Cell or Molecular Dysfunction, Experimental Model of Disease

Relations

Primary link: isa, which establishes the hierarchy

Five groups of links other than isa: physically related to; spatially related to; functionally related to; temporally related to; conceptually related to

Inheritance is supported
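Inheritance along isa means that a relation stated for a general semantic type also holds for the types below it. The sketch below illustrates the lookup with three isa links between semantic types. The single `treated_by` assertion is an illustrative assumption, not a statement copied from the Semantic Network, and the `related` helper is hypothetical.

```python
# isa links: child type -> parent type.
ISA = {
    "Neoplastic Process": "Disease or Syndrome",
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
}

# A relation asserted at a general type (illustrative assumption).
RELATIONS = {
    ("Pathologic Function", "treated_by"): {"Therapeutic Procedure"},
}


def ancestors(semantic_type):
    """Yield every type reachable by following isa upwards."""
    while semantic_type in ISA:
        semantic_type = ISA[semantic_type]
        yield semantic_type


def related(semantic_type, relation):
    """Targets of a relation, including those inherited through isa."""
    targets = set(RELATIONS.get((semantic_type, relation), set()))
    for ancestor in ancestors(semantic_type):
        targets |= RELATIONS.get((ancestor, relation), set())
    return targets


print(related("Neoplastic Process", "treated_by"))  # {'Therapeutic Procedure'}
print(related("Biologic Function", "treated_by"))   # set()
```

The relation flows down to descendants of the type where it was asserted, but not up to that type's own parents.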

Semantic Net: 54 Links

[Figure: the Semantic Network links, grouped by parent relation]

SpatiallyRelatedTo: has_location, adjacent_to, surrounded_by, traversed_by

PhysicallyRelatedTo: has_part, constitutes, contained_in, connected_to, interconnected_by, has_branch, has_tributary, has_ingredient

TemporallyRelatedTo: follows, co-occurs_with

FunctionallyRelatedTo: brought_about_by, has_manifestation, indicated_by, has_result, affected_by, managed_by, treated_by, disrupted_by, complicated_by, interacted_with, prevented_by, used_by, produced_by, caused_by, performed_by, carried_out_by, exhibited_by, practiced_by, has_occurrence, has_process

ConceptuallyRelatedTo: has_degree, diagnosed_by, has_property, has_derivative, has_developmental_form, has_measurement, measured_by, has_evaluation, has_method, has_conceptual_part, has_issue, analyzed_by, assessed_for_effect_by

Example

Where To Go From Here

Protégé web site: http://protege.stanford.edu

Documentation: User’s Guide, Tutorial

protege-discussion mailing list

Ontology library

Contribute ontologies and plugins

Open Biomedical Ontologies: http://www.obofoundry.org/