The UMLS Semantic Network
Support for semantic integration and reasoning
Workshop UMLS Semantic Network
NLM, NIH, Bethesda, 7-8 Apr 2005
Anita Burgun
Overview
• Semantic integration
– Role of the SN
– Integration of resources
– Integration of data
• Reasoning
– Reasoning with hierarchies
– Reasoning with associative relations
• Perspectives
• Illustration
– Genes, gene products, diseases
– Findings, signs, diseases
Semantic integration
1- Role of ontologies
[Figure: integration architecture. Labels: integration, DWH (data warehouse), patient files, micro-array data, gene instances, local resources; external resources (SWISSPROT, MEDLINE, GENBANK, GOA, …); mediation system; ontologies]
Integrating data in the domain of organ failure and transplantation
[Figure: local information systems (EfG, transplantation; REIN, end stage renal failure; dialysis); EfG terminology server; terms T1, T2, T3; ONTO-TERM mapping and term-term mapping; UMLS Semantic Network and Metathesaurus]
Semantic integration
2- Resource Integration
[Figure: integration architecture. Labels: integration, DWH (data warehouse), patient files, micro-array data, gene instances, local resources; external resources (SWISSPROT, MEDLINE, GENBANK, GOA, …); mediation system; ontologies]
Introduction
• Increasing need for physicians and biologists to access information on the Internet
• Biomedical sources
– Scattered
– Heterogeneous in multiple ways
– Rapid evolution and frequent creation
• Integration
<OrgName>
  <OrgName_name>
    <OrgName_name_binomial>
      <BinomialOrgName>
        <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
        <BinomialOrgName_species>sapiens</BinomialOrgName_species>
      </BinomialOrgName>
    </OrgName_name_binomial>
  </OrgName_name>
  <OrgName_lineage>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo</OrgName_lineage>
  <OrgName_gcode>1</OrgName_gcode>
  <OrgName_mgcode>2</OrgName_mgcode>
  <OrgName_div>PRI</OrgName_div>
</OrgName>

<ORGANISM>Homo sapiens</ORGANISM>
<TAXONOMY>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.</TAXONOMY>
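The two records above describe the same organism with different element names and nesting. As a minimal sketch of what a wrapper has to do to reconcile them, the Python below reads both layouts into one common structure. The lineages are abridged, and the `<record>` wrapper element and the function names are assumptions made for illustration, not part of either source.

```python
import xml.etree.ElementTree as ET

# First layout (nested OrgName elements), abridged from the slide.
RECORD_A = """
<OrgName>
  <OrgName_name>
    <OrgName_name_binomial>
      <BinomialOrgName>
        <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
        <BinomialOrgName_species>sapiens</BinomialOrgName_species>
      </BinomialOrgName>
    </OrgName_name_binomial>
  </OrgName_name>
  <OrgName_lineage>Eukaryota; Metazoa; Chordata</OrgName_lineage>
</OrgName>
"""

# Second layout (flat elements). The <record> root is hypothetical: it is
# only added so that the fragment parses as a single XML document.
RECORD_B = """
<record>
  <ORGANISM>Homo sapiens</ORGANISM>
  <TAXONOMY>Eukaryota; Metazoa; Chordata.</TAXONOMY>
</record>
"""


def split_lineage(text):
    """Turn 'A; B; C.' into ['A', 'B', 'C']."""
    return [t.strip() for t in text.strip().rstrip(".").split(";") if t.strip()]


def from_layout_a(xml_text):
    root = ET.fromstring(xml_text.strip())
    genus = root.findtext(".//BinomialOrgName_genus")
    species = root.findtext(".//BinomialOrgName_species")
    return {
        "organism": f"{genus} {species}",
        "lineage": split_lineage(root.findtext("OrgName_lineage")),
    }


def from_layout_b(xml_text):
    root = ET.fromstring(xml_text.strip())
    return {
        "organism": root.findtext("ORGANISM").strip(),
        "lineage": split_lineage(root.findtext("TAXONOMY")),
    }


record_a = from_layout_a(RECORD_A)
record_b = from_layout_b(RECORD_B)
print(record_a == record_b)
```

Each source needs its own hand-written reader here, which is the cost the schema-acquisition work aims to reduce.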
Objectives
• Overall: creating a system
– Global access
– Homogeneous and up-to-date information
• Specific: acquiring source schemas
– As automatically as possible
– Dealing with updates, adding new resources
– Generating different paths to access information
Sources schema
• Rarely available or hard to exploit
• No existing standard
• Identifying the schema of each source by exploiting its contents
– Informs on the type of information present in the source
– Extraction from its Web site
Use of UMLS
• Heterogeneity of schemas
• Need for a common vocabulary: the UMLS
• Example: finding the site of expression of a gene starting from a gene symbol
Results
• 279 distinct terms extracted from 11 sources
– 232 found in the UMLS, corresponding to 495 MTH concepts
• 318 were correct
• 177 were not
– 47 not found
• Of the 318 MTH concepts, 60 concepts are common to at least 2 distinct extracted terms (158 are specific)
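The counts above can be turned into proportions with a few lines of arithmetic. This is only a reading aid: the labels "coverage" and "precision" are assumptions made here, not terms used on the slide.

```python
# Counts as reported on the Results slide.
extracted_terms = 279
found_in_umls = 232
not_found = 47
candidate_concepts = 495
correct = 318
incorrect = 177

# The two breakdowns are internally consistent.
assert found_in_umls + not_found == extracted_terms
assert correct + incorrect == candidate_concepts

coverage = found_in_umls / extracted_terms            # terms with a UMLS match
precision = correct / candidate_concepts              # proposed concepts judged correct
concepts_per_term = candidate_concepts / found_in_umls

print(f"coverage {coverage:.1%}, precision {precision:.1%}, "
      f"{concepts_per_term:.2f} concepts per matched term")
```

This gives roughly 83.2% of extracted terms found in the UMLS, 64.2% of the proposed MTH concepts judged correct, and about 2.13 candidate concepts per matched term.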
Semantic Type | Frequency | Extracted term | MTH concept
Intellectual Product | 33 | Other Database Links | databases
Qualitative Concept | 32 | Approved Gene Symbol | Approved
Functional Concept | 28 | BIOCHEMICAL FEATURES | Biochemical
Spatial Concept | 26 | Chromosomal Location | Location
Quantitative Concept | 20 | Gross insertions & duplications | Gross
Gene or Genome | 17 | Related Genes | Genes
Nucleic Acid, Nucleoside, or Nucleotide | 14 | Additional Gene cDNA sequence | cDNA
Biologically Active Substance | 13 | Nucleotide Protein | Proteins
Idea or Concept | 10 | Previously Approved Symbols | symbol
Genetic Function | 9 | GENE FUNCTION | Gene function, NOS
Organism Attribute | 9 | relative length | Length
Temporal Concept | 9 | Aliases | ALIAS
Amino Acid, Peptide, or Protein | 6 | Name and origin of the protein | Protein
Indicator, Reagent, or Diagnostic Aid | 5 | Molecular reagents | Reagents
Occupation or Discipline | 5 | Nomenclature History | History <1>
Research Activity | 5 | CLONING | Cloning
Disease or Syndrome | 4 | Disorders & Mutations | Disease
Finding | 4 | view | view
Nucleotide Sequence | 4 | SNPs Variants | SNPs
Occupational Activity | 4 | Abstract | Abstracting
Mapping ULT to the UMLS
• General concepts
– Citation -> Organism attribute
– Description -> Research activity
– Symbol -> Idea or Concept
– Name -> Intellectual product
– History -> Finding
– Matches -> Manufactured object
– Link -> Chemical Viewed Structurally
– Association -> Mental Process / Social Behavior
General concepts
[Figure: general classes (« metaterms », WordNet??) shown against three layers: Upper Level Ontology, General Ontology, Domain Ontology. SN types shown: Idea or Concept, Intellectual Product, Attributes, Functional/Spatial/Temporal Concept]
Semantic integration
3- Data Integration
Functional genomics
• Post genomics
• Gene expression, protein function, biological process, disease
• van de Vijver MJ et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19; 347(25): 1999-2009.
• Objective: provide « medical » annotation of genes (BioMeKe)
• GeneTraces (Cantor, Lussier)
Gene, gene product, disease
• HUGO: manage heterogeneity of data
• Superoxide dismutase 1, soluble / amyotrophic lateral sclerosis 1 (adult)
• C1420306 SOD1 gene (symbol): Gene or Genome
• C0669516 SOD1 gene product (symbol): Amino acid, protein
• C0002736 ALS (previous symbol): Disease or Syndrome
• No relation in MTH
Gene, gene product, disease
• HUGO
• Aconitase 1, soluble
• C1412126 ACO1 gene (symbol): Gene or Genome
• C0378502 ACO1 protein (symbol) / IRP 1 protein (alias): Amino acid, protein
• OR relation between the two concepts in MTH
Gene, gene product, disease
• HUGO synonymous terms
[Figure: HUGO synonymous terms T1, T2, T3; Metathesaurus concepts C1, C2, C3; semantic types ST1, ST2, ST3]
Gene, gene product, disease
[Figure: semantic types Gene or Genome; AA, protein; Disease or Syndrome, linked by the SN relations produces, location_of, affects, causes]
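One way to read this slide is that, when the Metathesaurus holds no relation between two concepts (as for SOD1 and ALS), the SN relations between their semantic types still suggest candidate relations. A minimal sketch follows. The pairing of each relation label with a pair of types is an assumption, since the arrows of the figure are not recoverable from the transcript, and `candidate_relations` is a hypothetical helper.

```python
# Concept -> semantic type, as listed on the SOD1 slide.
CONCEPT_TYPE = {
    "C1420306": "Gene or Genome",                   # SOD1 gene
    "C0669516": "Amino Acid, Peptide, or Protein",  # SOD1 gene product
    "C0002736": "Disease or Syndrome",              # ALS
}

# Type-level relations. ASSUMPTION: which relation links which pair of
# types is guessed from the figure labels, not read from the SN files.
SN_RELATIONS = [
    ("Gene or Genome", "produces", "Amino Acid, Peptide, or Protein"),
    ("Gene or Genome", "location_of", "Disease or Syndrome"),
    ("Amino Acid, Peptide, or Protein", "affects", "Disease or Syndrome"),
    ("Amino Acid, Peptide, or Protein", "causes", "Disease or Syndrome"),
]


def candidate_relations(cui1, cui2):
    """Relations the SN allows between the semantic types of two concepts."""
    type1, type2 = CONCEPT_TYPE[cui1], CONCEPT_TYPE[cui2]
    return [rel for (src, rel, dst) in SN_RELATIONS if src == type1 and dst == type2]


gene_to_product = candidate_relations("C1420306", "C0669516")
product_to_disease = candidate_relations("C0669516", "C0002736")
print(gene_to_product, product_to_disease)
```

The output is a list of candidates only: a type-level relation says what may hold between two concepts, not what does hold.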
Reasoning
Reasoning
categorization
SN relations
Reasoning : relations
1- The hierarchy and the economy principle
The economy principle
• R1. Ad hoc precision
– The intent is to establish a set of semantic types, which will be useful for a variety of tasks without introducing undue complexity. The most specific semantic type in the semantic type hierarchy is assigned to the concept.
• R2. No hybrid types
– Instead of creating a lattice structure, with hybrid types inheriting from two supertypes, the SN has a single inheritance tree structure. As a consequence, a Metathesaurus concept inheriting from two STs is assigned to both types.
• R3. No category “other”
– Rather than proliferating the number of semantic types to encompass multiple additional subcategories, concepts that cannot be categorized by any sibling Semantic Type are simply assigned their common supertype.
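The three rules can be illustrated on a small fragment of the SN tree. This is a sketch under stated assumptions: the fragment covers only the Manufactured Object branch, and `assign` is a hypothetical helper, not an NLM tool.

```python
# Single-inheritance fragment of the SN tree (child -> parent).
PARENT = {
    "Medical Device": "Manufactured Object",
    "Research Device": "Manufactured Object",
    "Clinical Drug": "Manufactured Object",
    "Manufactured Object": "Physical Object",
    "Physical Object": "Entity",
    "Entity": None,
}


def ancestors(semantic_type):
    """All supertypes of a semantic type, nearest first."""
    result = []
    while PARENT[semantic_type] is not None:
        semantic_type = PARENT[semantic_type]
        result.append(semantic_type)
    return result


def assign(applicable_types):
    """Keep only the most specific of the semantic types that apply (R1).

    Unrelated types are all kept (R2). A concept that fits no subtype is
    passed in with the supertype alone and keeps it (R3).
    """
    applicable = set(applicable_types)
    redundant = {a for st in applicable for a in ancestors(st)}
    return sorted(applicable - redundant)


r1 = assign({"Manufactured Object", "Medical Device"})
r2 = assign({"Medical Device", "Research Device"})
r3 = assign({"Manufactured Object"})
print(r1, r2, r3)
```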
The economy principle and the theory
• Intensions and extensions
– Taxonomies (isa) are systems in which categories (intensions) are related to one another by means of subordination, or, in class parlance (extensions), systems in which classes are related to one another by means of class inclusion.
• Categories and classes
– When a category K has subcategories K1, K2, …, Kn, its extension, the class CK, is the union of the classes for each of its subcategories, i.e. CK1, CK2, …, CKn.
Research Device: manufactured object used primarily in carrying out scientific research or experimentation
Medical Device: manufactured object used primarily in the diagnosis, treatment, or prevention of physiologic or anatomic disorders
Clinical Drug: pharmaceutical preparation as produced by the manufacturer
Manufactured Object: physical object made by human beings
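[Figure: categories vs classes. Classes CMD (Medical Device), CRD (Research Device), CCD (Clinical Drug) and CMO (Manufactured Object). Concepts shown for Manufactured Object: 45 inch calibre bullet, magnetic tape, matches, corridor]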
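The Manufactured Object example appears to show where the economy principle departs from the theory: with no category "other", the class of the supertype is larger than the union of the classes of its subtypes. A minimal sketch with Python sets follows. The device and drug members are hypothetical placeholders; the four Manufactured Object members are the ones named on the slide.

```python
# Concepts assigned directly to each semantic type.
# ASSUMPTION: the device and drug members are invented placeholders.
DIRECT = {
    "Medical Device": {"md_concept_1", "md_concept_2"},
    "Research Device": {"rd_concept_1"},
    "Clinical Drug": {"cd_concept_1"},
    "Manufactured Object": {
        "45 inch calibre bullet", "magnetic tape", "matches", "corridor",
    },
}
CHILDREN = {
    "Manufactured Object": ["Medical Device", "Research Device", "Clinical Drug"],
}


def extension(semantic_type):
    """Class of a category: its direct members plus those of its subtypes."""
    members = set(DIRECT.get(semantic_type, ()))
    for child in CHILDREN.get(semantic_type, ()):
        members |= extension(child)
    return members


c_mo = extension("Manufactured Object")
union_of_subclasses = set().union(
    *(extension(child) for child in CHILDREN["Manufactured Object"])
)

# Under the theory CMO would equal CMD ∪ CRD ∪ CCD; under rule R3 a residue remains.
residue = c_mo - union_of_subclasses
print(sorted(residue))
```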
Reasoning : relations
2- Associative relations
Diseases and Findings
[Figure: SN hierarchy. Entity > Conceptual entity > Finding > Sign or Symptom; Event > Natural phenomenon or process > Pathologic function > Disease or syndrome]
Diseases and Findings: SN
[Figure: Finding, Sign or Symptom and Disease or syndrome, with the SN relations associated_with, evaluation_of, manifestation_of, diagnoses and is_a]
Relations SN
• Disease or Syndrome affects Disease or Syndrome
• Disease or Syndrome associated_with Disease or Syndrome
• Disease or Syndrome co-occurs_with Disease or Syndrome
• Disease or Syndrome complicates Disease or Syndrome
• Disease or Syndrome degree_of Disease or Syndrome
• Disease or Syndrome manifestation_of Disease or Syndrome
• Disease or Syndrome occurs_in Disease or Syndrome
• Disease or Syndrome precedes Disease or Syndrome
• Disease or Syndrome process_of Disease or Syndrome
• Disease or Syndrome result_of Disease or Syndrome
Relations in SNOMED CT vs SN
• Class A_SNCT = SNCT concepts assigned to the Semantic Type A
• Class DISEASES_SNCT = SNCT concepts assigned to ‘Disease or Syndrome’
[Figure: classes A, B and C within the MTH restricted to SNCT]
Relations in SNOMED CT
• MTH restricted to SNOMED CT
• Relations whose SAB = SNOMED CT
• 2,220,144 relations
• 1,392,380 associative relations (including inverse relations)
• 113 associative relationships (all have an inverse except associated_with)
• 18 relationships have fewer than 100 instances
– Has_time_aspect_of: 1
– Has_property: 77
• The most frequent:
– Has_onset: 114,173
– has_finding_site: 99,156
– has_method: 70,682
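Counts of this kind can be obtained by tallying the relationship attribute of the rows contributed by one source in the Metathesaurus relations file. The sketch below assumes the pipe-delimited MRREL.RRF layout, with REL in column 4, RELA in column 8 and SAB in column 11. The source abbreviation `"SNOMEDCT"`, the choice of what counts as hierarchical, and all sample rows are assumptions made for illustration.

```python
from collections import Counter

REL, RELA, SAB = 3, 7, 10            # zero-based column positions in MRREL.RRF
HIERARCHICAL_REL = {"PAR", "CHD"}    # ASSUMPTION: treated as non-associative
HIERARCHICAL_RELA = {"isa", "inverse_isa"}


def count_associative(lines, sab):
    """Count RELA values over the non-hierarchical rows of one source."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= SAB or fields[SAB] != sab:
            continue
        if fields[REL] in HIERARCHICAL_REL or fields[RELA] in HIERARCHICAL_RELA:
            continue
        if fields[RELA]:
            counts[fields[RELA]] += 1
    return counts


def make_row(cui1, rel, cui2, rela, sab):
    """Build a synthetic MRREL-style row (identifiers are invented)."""
    fields = [cui1, "", "", rel, cui2, "", "", rela, "", "", sab, sab, "", "", "N", ""]
    return "|".join(fields) + "|"


sample_rows = [
    make_row("C9000001", "RO", "C9000002", "has_finding_site", "SNOMEDCT"),
    make_row("C9000003", "RO", "C9000002", "has_finding_site", "SNOMEDCT"),
    make_row("C9000004", "RO", "C9000005", "has_method", "SNOMEDCT"),
    make_row("C9000001", "CHD", "C9000003", "isa", "SNOMEDCT"),
    make_row("C9000001", "RO", "C9000002", "has_finding_site", "MSH"),
]

counts = count_associative(sample_rows, "SNOMEDCT")
for rela, n in counts.most_common():
    print(rela, n)
```

On a real release the same function would be fed the lines of MRREL.RRF, with `sab` set to whatever abbreviation that release uses for SNOMED CT.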
Relations in SNOMED CT
• Focus on Diseases and Findings
• Class DISEASES_SNCT = SNCT concepts assigned to ‘Disease or Syndrome’
• Class FINDINGS_SNCT = SNCT concepts assigned to {‘Finding’ + ‘Sign or Symptom’}
[Figure: Disease or Syndrome, Sign or Symptom and Finding within the MTH restricted to SNCT]
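The two classes and the relationships holding between them could be derived along the following lines. This is a sketch, assuming the MRSTY.RRF layout (CUI in column 1, semantic type name in column 4) and the MRREL.RRF layout (CUI1, CUI2, RELA and SAB in columns 1, 5, 8 and 11). The set of SNOMED CT concepts is taken as given, all identifiers and rows are invented, and both function names are hypothetical.

```python
STY_COL = 3                            # MRSTY.RRF: CUI|TUI|STN|STY|ATUI|CVF|
CUI1, CUI2, RELA, SAB = 0, 4, 7, 10    # MRREL.RRF column positions


def build_class(mrsty_lines, semantic_types, snct_cuis):
    """CUIs that carry one of the semantic types and belong to SNOMED CT."""
    members = set()
    for line in mrsty_lines:
        fields = line.split("|")
        if fields[STY_COL] in semantic_types and fields[0] in snct_cuis:
            members.add(fields[0])
    return members


def relationships_between(mrrel_lines, class1, class2, sab):
    """RELA values on rows of one source linking a class1 concept to a class2 concept."""
    found = set()
    for line in mrrel_lines:
        f = line.split("|")
        if f[SAB] == sab and f[RELA] and f[CUI1] in class1 and f[CUI2] in class2:
            found.add(f[RELA])
    return found


# Synthetic data: every identifier and row below is invented.
mrsty = [
    "C9000001|T047||Disease or Syndrome|||",
    "C9000002|T184||Sign or Symptom|||",
    "C9000003|T033||Finding|||",
    "C9000004|T047||Disease or Syndrome|||",   # not a SNOMED CT concept
]
snct_cuis = {"C9000001", "C9000002", "C9000003"}
mrrel = [
    "C9000002|||RO|C9000001|||definitional_manifestation_of|||SNOMEDCT|SNOMEDCT|||N||",
    "C9000003|||RO|C9000001|||associated_with|||SNOMEDCT|SNOMEDCT|||N||",
    "C9000003|||RO|C9000004|||due_to|||SNOMEDCT|SNOMEDCT|||N||",
]

diseases = build_class(mrsty, {"Disease or Syndrome"}, snct_cuis)
findings = build_class(mrsty, {"Finding", "Sign or Symptom"}, snct_cuis)
print(sorted(relationships_between(mrrel, findings, diseases, "SNOMEDCT")))
```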
Diseases-Diseases relations SNCT
• due_to
• definitional_manifestation_of
• associated_with
• occurs_before
• mapped_to
• has_finding_site
• has_associated_finding
• interprets
• has_associated_morphology
Diseases-Diseases relations SNCT/SN
SNCT
• due_to
• definitional_manifestation_of
• associated_with
• occurs_before
• mapped_to
• has_finding_site
• has_associated_finding
• interprets
• has_associated_morphology
SN
• result_of
• manifestation_of
• associated_with
• precedes, occurs_in, complicates?
• co-occurs_with
• degree_of
• process_of
• affects
Findings-Diseases relations SNCT
• has_associated_finding / associated_finding_of
• has_definitional_manifestation / definitional_manifestation_of
• interprets / is_interpreted_by / has_interpretation
• occurs_after / occurs_before
• mapped_to / mapped_from
• has_associated_morphology / associated_morphology_of
• due_to / cause_of
• focus_of
• has_finding_site
• isa / inverse_isa
Diseases-Findings relations SNCT/SN
SNCT
• has_associated_finding / associated_finding_of
• has_definitional_manifestation / definitional_manifestation_of
• interprets / is_interpreted_by / has_interpretation
• occurs_after / occurs_before
• mapped_to / mapped_from
• has_associated_morphology / associated_morphology_of
• due_to / cause_of
• focus_of
• has_finding_site
• isa / inverse_isa
SN
• associated_with
• manifestation_of
• diagnoses
• evaluation_of
Diseases and Findings
[Figure: Finding, Sign or Symptom and Disease or syndrome, with the SN relations associated_with, evaluation_of, manifestation_of, diagnoses and is_a. Is_a: 5,592 instances]
Diseases and Findings
[Figure: SN hierarchy. Entity > Conceptual entity > Finding > Sign or Symptom; Event > Natural phenomenon or process > Pathologic function > Disease or syndrome]
Diseases and Findings
[Figure: Finding, Sign or Symptom, Disease or syndrome. Example: C0000727 Abdomen, acute is_a C1300028 Disorder characterized by pain]
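An is_a link of this kind can be checked against the SN: it is consistent only if a semantic type of the child is the same as, or a descendant of, a semantic type of the parent. A minimal sketch follows. The hierarchy is the abridged one drawn on the slides, the semantic types given to the two concepts are an assumption read from the figure layout, and `isa_is_consistent` is a hypothetical helper.

```python
# Abridged SN hierarchy as drawn on the slides (child -> parent).
SN_PARENT = {
    "Sign or Symptom": "Finding",
    "Finding": "Conceptual Entity",
    "Conceptual Entity": "Entity",
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Natural Phenomenon or Process",
    "Natural Phenomenon or Process": "Event",
    "Entity": None,
    "Event": None,
}


def ancestors_or_self(semantic_type):
    """The semantic type followed by all of its supertypes."""
    chain = [semantic_type]
    while SN_PARENT.get(chain[-1]) is not None:
        chain.append(SN_PARENT[chain[-1]])
    return chain


def isa_is_consistent(child_types, parent_types):
    """True if some type of the child is subsumed by some type of the parent."""
    return any(
        parent in ancestors_or_self(child)
        for child in child_types
        for parent in parent_types
    )


# ASSUMPTION: semantic types of the two concepts as suggested by the figure.
acute_abdomen = {"Sign or Symptom"}        # C0000727
disorder_with_pain = {"Disease or Syndrome"}  # C1300028

flagged = not isa_is_consistent(acute_abdomen, disorder_with_pain)
print("inconsistent with the SN" if flagged else "consistent with the SN")
```

A link flagged this way is a candidate for review, either of the is_a link itself or of the semantic type assignments.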
Diseases and Findings
[Figure: Finding, Sign or Symptom, Disease or syndrome. Example: C0008767 Scar, has_finding_site, C1300028 Endometriosis in scar of skin]
Diseases and Findings
[Figure: SN hierarchy. Entity > Conceptual entity > Finding > Sign or Symptom; Event > Natural phenomenon or process > Pathologic function > Disease or syndrome]
Formal properties
• Guarino, Welty
• Rigidity
– property that is essential to all the instances. Person (+R). Physician (not R).
• Identity
– there is a property that is both necessary and sufficient for identifying an instance. Person (+I)
• Unity
– instances are intrinsic wholes. Person (+U).
• Dependence
– for all the instances x, necessarily some instance of Z must exist, which is not a part of x, nor a constituent of x (+D). Food (+D)
Formal properties Rules
• Rules
– (not U) cannot subsume (+U), e.g., Substance cannot subsume Physical Object
– […]
• Distinction between roles and sortal types
– Roles: (Not Rigid) (+Dependent)
– Sortal types: (+Rigid) (Not Dependent)
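The subsumption rule and the role/sortal distinction above can be written as two small checks over metaproperty tags. This is a sketch, not the OntoClean tooling. Tags use `True` for (+), `False` for (not) and `None` when the slides give no value, and the value `dependent=False` for Person is an assumption added for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tagged:
    """A category with its metaproperty tags (None = not stated)."""
    name: str
    rigid: Optional[bool] = None
    identity: Optional[bool] = None
    unity: Optional[bool] = None
    dependent: Optional[bool] = None


def may_subsume(parent, child):
    """Rule from the slide: a (not U) category cannot subsume a (+U) one."""
    return not (parent.unity is False and child.unity is True)


def kind(category):
    """Role = (not R, +D); sortal type = (+R, not D); otherwise undecided."""
    if category.rigid is False and category.dependent is True:
        return "role"
    if category.rigid is True and category.dependent is False:
        return "sortal type"
    return "undecided"


substance = Tagged("Substance", unity=False)
physical_object = Tagged("Physical Object", unity=True)
# ASSUMPTION: dependent=False for Person is not stated on the slides.
person = Tagged("Person", rigid=True, identity=True, unity=True, dependent=False)
sign_or_symptom = Tagged("Sign or Symptom", rigid=False, dependent=True)

print(may_subsume(substance, physical_object))
print(kind(person), "/", kind(sign_or_symptom))
```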
Formal properties: signs
• Signs or Symptoms are Roles
• Metathesaurus concepts that are assigned only to roles with no sortal Semantic Type represent a numerous set of entities
• About 90% of the MTH concepts assigned to Findings, and Signs or Symptoms are not assigned to another Semantic Type.
Roles vs relations
• Findings?
• Sign or Symptom associated_with Disease or Syndrome
• Sign or Symptom diagnoses Disease or Syndrome
• Sign or Symptom evaluation_of Disease or Syndrome
• Sign or Symptom manifestation_of Disease or Syndrome
• Finding associated_with Disease or Syndrome
• Finding evaluation_of Disease or Syndrome
• Finding manifestation_of Disease or Syndrome
Diseases: frames
• Has_location
• Has_lesion: necrosis
• Has_process: infection
– (has_agent)
• Has_discriminating_sign_or_finding
– hematuria
• Occurs_in
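The frame above maps naturally onto a record type with one field per slot. The sketch below is an illustration only: the class name and the `filled_slots` helper are hypothetical, and the example instance reuses the three slot values shown on the slide without claiming that they describe one actual disease.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiseaseFrame:
    """One disease described by the slots listed on the slide."""
    name: str
    has_location: Optional[str] = None
    has_lesion: Optional[str] = None
    has_process: Optional[str] = None
    has_agent: Optional[str] = None          # sub-slot of has_process
    has_discriminating_sign_or_finding: List[str] = field(default_factory=list)
    occurs_in: Optional[str] = None

    def filled_slots(self):
        """Names of the slots that carry a value."""
        return [
            slot for slot, value in vars(self).items()
            if slot != "name" and value not in (None, [])
        ]


# Hypothetical instance built only from the slot values shown on the slide.
example = DiseaseFrame(
    name="example disease (hypothetical)",
    has_lesion="necrosis",
    has_process="infection",
    has_discriminating_sign_or_finding=["hematuria"],
)
print(example.filled_slots())
```

Making the slots explicit turns what the SN expresses as relations between broad types into properties attached to each disease.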
Discussion
Perspectives (1) : coverage
• Extend the SN ???
– Economy principle vs adding general concepts
• Resource integration ???
– Needs in BIOmedical
– Clarify conceptual entities
– Semantic Types corresponding to general entities
Perspectives (2) : compatibility
• Compatibility with general ontologies
– Semantic web
• Alignment with existing domain ontologies
– FMA (Zhang Medinfo 2004)
– SNOMED CT (Burgun ongoing work on SN relations)
• Rules (classification), consistency SN-MTH
– E.g. sign or symptom is-a disease
Perspectives (3): formal aspects
• Formal ontology
– Make relations and concepts more explicit, e.g. roles (ULO), relationships between genes and diseases
– Coherence, e.g. is-a relations between findings and diseases (studies and processing)
– Classification of new concepts, e.g. upper MTH concepts (Bodenreider Medinfo 2004)
– Inference, e.g. use relations between anatomical sites and diseases to suggest new relations between diseases (Burgun submitted AMIA 2005)
References
• Mougin F, Burgun A, Loreal O, Le Beux P. Towards the automatic generation of biomedical sources schema. Medinfo. 2004;2004:783-7.
• Welty C, Guarino N. Supporting ontological analysis of taxonomic relationships. Data & Knowledge Engineering. 2001. http://www.ladseb.pd.cnr.it/infor/Ontology/Papers
• Zhang S, Bodenreider O. Comparing Associative Relationships among Equivalent Concepts Across Ontologies. Medinfo. 2004;2004:459-66.