RELATIONAL SUPPORT FOR PROTEGE
Raimundo Lozano
Felipe Geva
Xavier Pastor
CSC
Sixth International Protégé Workshop
INTRODUCTION
SCOPE: “Structuring Concepts for Online Publishing Environment” (Project 22016Y1C1DMAL2)
Goal: Structuring scientific information in an ontology
Medical domain: Gastroenterology and Hepatology (G&H)
Hypothesis: Users can search and retrieve information at a higher
level of abstraction than with current keyword-based systems
Implementation: Integrated tool for building and maintaining
medical ontologies
OBJECTIVES
Related to functionality:
  Implement semantic search of contents
  Knowledge representation
  Multilingual support
Related to interface:
  Friendly interface
  Structured presentation of results
Related to technology:
  Use of standards
  Open source contribution
METHODOLOGY
Concepts:
  Medical terminology framework of reference: UMLS (NLM)
  Metathesaurus: more than 800,000 concepts
  Multilingual support
  Relational database developed
Knowledge representation system:
  RDF: relational database developed
  Protégé 2000: extended
    Articles categorization
    RDF models storage
    UMLS search capabilities
Retrieval system: on the Web
GENERAL SCHEMA
[Figure: general schema. Labels: User interface, Query, Words, Concepts, Multilinguality, Query Subsystem, UMLS, G&H, Articles, Ontology, Ontological organization, Ontological search, Output, Related articles.]
IMPLEMENTATION
UMLS relational DB
RDF relational DB
Plugins developed for Protégé:
  UMLS
  Categorization
  RDF DB
Web searching system
UMLS - Metathesaurus
The Metathesaurus is organized by concept or meaning; its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts.
[Figure: CONCEPTs (CUIs) group TERMs (LUIs), which group STRINGs (SUIs).]
UMLS - Normalization
[Figure: the original Metathesaurus files mapped to the normalized tables.]

Original Metathesaurus files:
  MRSTY      CUI A8, TUI A4, STY VA41
  MRCON      CUI A8, LAT A3, TS A1, LUI A8, STT VA3, SUI A8, STR TXT, LRL SI
  MRXW.ENG   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8
  MRXW.FRE   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8
  MRXW.GER   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8
  MRXW.ITA   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8
  MRXW.SPA   LAT A3, WD VA80, CUI A8, LUI A8, SUI A8

Normalized tables:
  concpt       cui char(8) <pk>, valid numeric(4)
  term         lui char(8) <pk>, cui char(8) <fk>, ts char(1)
  string_type  stt varchar(3) <pk>, descrp varchar(255)
  string       sui char(8) <pk>, lui char(8) <fk1>, stt varchar(3) <fk2>, long_str bit, str varchar(255), str_txt text, lat char(3), lrl int
  mrxw_eng     sui char(8) <pk,fk>, wd varchar(80) <pk>
  mrxw_fre     sui char(8) <pk,fk>, wd varchar(80) <pk>
  mrxw_ger     sui char(8) <pk,fk>, wd varchar(80) <pk>
  mrxw_ita     sui char(8) <pk,fk>, wd varchar(80) <pk>
  mrxw_spa     sui char(8) <pk,fk>, wd varchar(80) <pk>

References (all with Upd(R); Del(R)):
  term.cui = concpt.cui
  string.lui = term.lui
  string.stt = string_type.stt
  mrxw_eng.sui, mrxw_fre.sui, mrxw_ger.sui, mrxw_ita.sui, mrxw_spa.sui = string.sui
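As a rough illustration of the normalization step, the following sketch splits rows of the MRCON file into the concpt, term and string tables. It assumes the pipe-delimited layout CUI|LAT|TS|LUI|STT|SUI|STR|LRL| and uses an in-memory SQLite database in place of the Sybase or Oracle servers; the sample identifiers, the function name and the reduced column set are illustrative assumptions, not the loader actually used in the project.

```python
import sqlite3

# Two made-up rows in the assumed MRCON layout: CUI|LAT|TS|LUI|STT|SUI|STR|LRL|
MRCON_ROWS = [
    "C0000001|ENG|P|L0000001|PF|S0000001|Liver|0|",
    "C0000001|SPA|P|L0000002|PF|S0000002|Hígado|0|",
]


def normalize(rows):
    """Load MRCON rows into simplified concpt / term / string tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE concpt (cui CHAR(8) PRIMARY KEY);
        CREATE TABLE term   (lui CHAR(8) PRIMARY KEY,
                             cui CHAR(8) REFERENCES concpt(cui),
                             ts  CHAR(1));
        CREATE TABLE string (sui CHAR(8) PRIMARY KEY,
                             lui CHAR(8) REFERENCES term(lui),
                             stt VARCHAR(3),
                             str VARCHAR(255),
                             lat CHAR(3),
                             lrl INT);
    """)
    for row in rows:
        # The trailing '|' yields an empty ninth field, which is dropped.
        cui, lat, ts, lui, stt, sui, text, lrl = row.split("|")[:8]
        # Each concept and term is stored once, however many rows mention it.
        db.execute("INSERT OR IGNORE INTO concpt VALUES (?)", (cui,))
        db.execute("INSERT OR IGNORE INTO term VALUES (?, ?, ?)", (lui, cui, ts))
        db.execute("INSERT OR IGNORE INTO string VALUES (?, ?, ?, ?, ?, ?)",
                   (sui, lui, stt, text, lat, int(lrl)))
    return db
```

The two sample rows share one CUI, so they produce a single concpt row, two term rows and two string rows.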
UMLS - accessed from Protégé
UMLS - Functionality
[Figure: concept search by word in any of the supported languages.]

Word tables (CUI + WD):
  English   CUI A8, WD VA80
  French    CUI A8, WD VA80
  German    CUI A8, WD VA80
  Italian   CUI A8, WD VA80
  Spanish   CUI A8, WD VA80
Concept table (CUI + Description):
  Concept   CUI A8, Description VA255
Joins: English_Concept, French_Concept, German_Concept, Italian_Concept, Spanish_Concept

INPUT QUERY
  liver (english), leber (german), hígado (spanish), foie (french), fegato (italian)
OUTPUT QUERY
  C0023895 Disease of liver
  C0023908 Liver transplant
  C0085605 Liver function failure
  C0023899 Liver Extract
  C0019204 Carcinoma of liver cell
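The word-to-concept lookup can be sketched as a join between a per-language word table and the concept table. This is a minimal sketch assuming the CUI + WD and CUI + Description layouts shown on the slide; the SQLite backend, the lowercase table names, the sample word rows and the `search` function are illustrative assumptions.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with two word tables and a concept table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE concept (cui CHAR(8) PRIMARY KEY, description VARCHAR(255));
        CREATE TABLE english (cui CHAR(8), wd VARCHAR(80));
        CREATE TABLE spanish (cui CHAR(8), wd VARCHAR(80));
    """)
    db.executemany("INSERT INTO concept VALUES (?, ?)", [
        ("C0023895", "Disease of liver"),
        ("C0023908", "Liver transplant"),
    ])
    # Illustrative word-index rows, not taken from the real Metathesaurus files.
    db.executemany("INSERT INTO english VALUES (?, ?)", [
        ("C0023895", "liver"), ("C0023895", "disease"),
        ("C0023908", "liver"), ("C0023908", "transplant"),
    ])
    db.executemany("INSERT INTO spanish VALUES (?, ?)", [
        ("C0023895", "hígado"), ("C0023908", "hígado"),
    ])
    return db


LANGUAGE_TABLES = {"english", "spanish"}


def search(db, language, word):
    """Return (cui, description) pairs for concepts indexed under `word`."""
    if language not in LANGUAGE_TABLES:
        raise ValueError(f"unsupported language: {language}")
    # The table name comes from the whitelist above; the word is a bound parameter.
    sql = (f"SELECT c.cui, c.description FROM {language} w "
           "JOIN concept c ON c.cui = w.cui WHERE w.wd = ? ORDER BY c.cui")
    return db.execute(sql, (word.lower(),)).fetchall()
```

Because every language table points at the same CUIs, `search(db, "english", "liver")` and `search(db, "spanish", "hígado")` return the same concepts in this demo data.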
RDF
RDF “statements” consist of:
  resources (= nodes) = subject
  which have properties = predicate
  which have values (= nodes, strings) = object
predicate(subject, object)
[Figure: a resource linked by a property to a value]
The sentence “http://www.w3.org/Home/Lassila has creator Ora Lassila” would thus be diagrammed as:
[Figure: http://www.w3.org/Home/Lassila --Creator--> Ora Lassila]
From W3C RDF Model and Syntax Specification
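A statement can be modelled as a plain three-field record. The sketch below only illustrates the subject, predicate, object reading and the predicate(subject, object) notation; the `Statement` type and the helper name are assumptions made for this example.

```python
from collections import namedtuple

# One RDF statement: a resource (subject), a property (predicate), a value (object).
Statement = namedtuple("Statement", ["subject", "predicate", "object"])


def as_predicate_form(statement):
    """Render a statement in the predicate(subject, object) notation."""
    return f"{statement.predicate}({statement.subject}, {statement.object})"


example = Statement("http://www.w3.org/Home/Lassila", "creator", "Ora Lassila")
print(as_predicate_form(example))
```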
RDFS
Collection of RDF resources that can be used to describe other resources
Provides a mechanism to define vocabularies
RDFS basic elements
From W3C RDF Schema Specification
RDF STORAGE - Requirements
Wide scope, not limited to SCOPE project needs
Conceptual representation, not attached to any specific format
Portable between different DBMS:
  Sybase Adaptive Server Anywhere
  Sybase Adaptive Server Enterprise
  Oracle 8i
Efficient retrieval of concepts
RDF STORAGE
No good models proposed:
  very simple
  not efficient
Solution: design a new storage model
  taking advantage of relational capabilities
  making explicit all RDF components defined in the RDFS specification: classes, properties, literals, etc.
RDF – DB design (1)
RDF – DB design (2)
RDF – DB design (3)
RDF – DB design (4)
RDF STORAGE - Class hierarchy
Classes organized in a tree with indexes
Very fast searches of subclasses
[Figure: class tree with a pair of numeric indexes on each node. Node labels: disease, ulcer, duodenal, gastric, chronic, acute.]
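One common way to index a tree so that subclass searches become simple range comparisons is a left/right interval numbering (often called nested sets). The slide does not name the exact scheme used, so the sketch below is an assumption; the tree shape and the function names are illustrative.

```python
def assign_intervals(tree, root):
    """Number every class with a (left, right) pair in one depth-first walk."""
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        intervals[node] = (left, counter)

    visit(root)
    return intervals


def subclasses(intervals, cls):
    """All direct and indirect subclasses: intervals strictly inside cls's own."""
    left, right = intervals[cls]
    return sorted(name for name, (lo, hi) in intervals.items()
                  if left < lo and hi < right)


# Illustrative hierarchy using the labels from the figure (shape assumed).
TREE = {
    "disease": ["ulcer"],
    "ulcer": ["duodenal", "gastric"],
    "gastric": ["chronic", "acute"],
}
INTERVALS = assign_intervals(TREE, "disease")
```

In a relational table with `lft` and `rgt` columns, the same test becomes a single range condition (`lft > ? AND rgt < ?`), which needs no recursion regardless of the depth of the hierarchy.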
RDF STORAGE – Multiple inheritance
[Figure: the indexed class tree before and after adding a class with more than one parent. Node labels: disease, ulcer, duodenal, gastric, acute, Gastroduodenal chronic ulcer. In the second tree, Gastroduodenal chronic ulcer appears more than once and the indexes are renumbered.]
RDF STORAGE - Interface
Basic element: the Statement
[Figure: an RDF Statement is inserted into or removed from the RDF DB through stored procedures.]
PLUGINS – Common features
Each plugin is implemented by a class derived from AbstractTabWidget.
Access to Protégé classes:
  KnowledgeBase
  Class management -> Cls
  Properties management -> Slot
  Tree interface -> ClsesPanel
Tab presentation
Easy configuration
Database access using jdbc:odbc. The database can be chosen:
  Plugin RDF.
  Plugin Categorization.
PLUGINS - UMLS
[Figure: the plugin accesses the UMLS database through a jdbc:odbc connection.]
Concept search:
  Variable number of terms allowed
  Ordered result list with the most similar concept highlighted
Adding a concept to the ontology:
  Selection of multiple parents allowed
  Automatic addition of:
    UMLS code
    UMLS semantic type
    Semantic description
PLUGINS - Categorization
[Figure: the plugin accesses the G&H database through a jdbc:odbc connection.]
Show the list of articles:
  Title, volume, issue, abstract...
Categorization:
  Selecting an article
  Article class automatically created
  Article identifier automatically added
  Volume and issue parents automatically stated
  Other parents can be selected
PLUGINS - RDF
JENA: Java API for RDF
Model comparison and statements extraction
Stored procedures on the database
[Figure: data flow between the RDF-XML files (.RDF, .RDFS), JENA and the RDF database.]
RDF STORAGE – Integrity
Protégé needs a valid model
Statements in RDF are not ordered: validity is assumed
Automatic creation of needed resources
e.g.: (Gastritis, type, Disease): if the Disease class does not exist, it is created
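The automatic creation rule can be sketched with a small in-memory store. In the project this logic lives in stored procedures on the database; the class below, its method names and the use of the plain strings "type" and "Class" are assumptions made only to illustrate the rule.

```python
class StatementStore:
    """Hypothetical in-memory stand-in for the RDF database."""

    def __init__(self):
        self.classes = {"Class"}
        self.statements = set()

    def insert(self, subject, predicate, obj):
        if predicate == "type":
            if obj == "Class":
                # An explicit class declaration.
                self.classes.add(subject)
            elif obj not in self.classes:
                # Statements arrive in no particular order, so the class may
                # not have been declared yet: create it on the fly.
                self.classes.add(obj)
                self.statements.add((obj, "type", "Class"))
        self.statements.add((subject, predicate, obj))

    def remove(self, subject, predicate, obj):
        self.statements.discard((subject, predicate, obj))


store = StatementStore()
store.insert("Gastritis", "type", "Disease")   # Disease is created automatically
store.insert("Disease", "type", "Class")       # a later declaration changes nothing
```

Using a set for the statements means the late declaration of Disease does not produce a duplicate, so the stored model is the same whatever order the statements arrive in.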
PROBLEMS – Name modification
The common identifier between the database and Protégé is
the resource name
The user is allowed to modify the name
Changes to Protégé needed:
  List of modified elements in DefaultKnowledgeBase
  New attributes and functions in DefaultFrame
PROBLEMS – Type definition
Problems with the abbreviated format of type definitions
Protégé reads the type as a literal
RDFFrameWalker.getDirectType(Resource resource) was modified to create the class
ACKNOWLEDGEMENTS
SCOPE partners:
  Universitat Pompeu Fabra: the coordinating institution for SCOPE
  DOYMA: a branch of Havas-MediMedia
  OrbiTeam Software GmbH: a spin-off company of GMD, the German National Research Center for Information Technology
  SESI group of the University of Wales, Bangor
Other institutions:
  Stanford Medical Informatics
  National Library of Medicine