Linguistic enrichment of ontologies: a glance to the role of previously existing linguistic resources
Maria Teresa Pazienza, Armando Stellato {pazienza,stellato}@info.uniroma2.it
ART group, Dept. of Computer Science, Systems and Production
18/04/23 2
Motivation
Ontologies provide vocabularies through which agents in the Semantic Web will be able to communicate
– Every specific ontology bears its semantics, which is specified
by:
• the interpretation given by people using the ontology inside a given
framework
• the consistent use that applications make of ontological knowledge
How can we recognize if and when these constraints are considered? Or, at least…
Role of Natural Language
What information can both humans and machines rely on? …natural language
• Natural Language is the last exploitable resource…
– …to convey data semantics
• It helps humans in understanding how formal objects relate to their world knowledge
• It may help machines in harmonizing different conceptualizations
– Pros and cons:
• Pros: it offers a rich and universally accepted means of expressing meaning
• Cons: it is ambiguous; phenomena like synonymy and homonymy must be taken into consideration
• Possible exploitations for a linguistically motivated approach to ontology development:
– Provides useful linguistic anchors for improving knowledge sharing efforts
– Strengthens relationships between ontology and raw textual information (for tasks like information extraction, ontology population, etc.)
– Enhances knowledge understanding and reuse even for humans
Enriching ontologies with lexical information
• Possible scenarios for linguistic enrichment:
– Explicit Linguistic Enrichment
[Figure: an Ontology and a Linguistic Resource]
Enriching ontologies with lexical information
• Possible scenarios for linguistic enrichment:
– Producing Multilingual Ontologies
[Figure: an Ontology and a Bilingual Linguistic Resource]
Enriching ontologies with lexical information
• Possible scenarios for linguistic enrichment:
– LexicoSemantic Enrichment of Ontologies
[Figure: an Ontology and a Linguistic Resource with a semantic structure (e.g. WordNet); nodes shown include Worker, Employee, Craftsman, Academic, Technician, Administrative, Professor, Researcher, ...]
Exploiting Linguistic Resources
• Different Linguistic Resources (LRs) are available on the Web
• These resources differ in:
– Trustworthiness: from free initiatives to coordinated research projects
– Complexity: quantity and quality of detailed information, adopted model, morphology…
– Representation: no standard for representation of linguistic resources
– Implementation: available as databases, huge XML repositories, proprietary text formats, etc.
Tools for Linguistic Enrichment: Requirements
• (possibly) embedded in ontology editing applications
• Browsing different linguistic resources
• Providing functionalities for:
– Querying LRs with terms from ontology
– Enriching ontology concepts with linguistic information
– Synonyms
– Rich textual descriptions
– Translations in different languages
– Semantic Indexes from LR
– Supporting ontology development by reusing semantic
information from linguistic resources (when available)
Infrastructure
The Linguistic Watermark
– Offers a classification of different LRs
– Provides an API for accessing their content
WordNet: http://wordnet.princeton.edu/
Freelang: http://www.freelang.net/
Dict: http://www.dict.org/bin/Dict
OntoLing: a tool for semi-automatic linguistic enrichment of ontologies
• Deployed as a plug-in for the popular ontology editing tool Protégé (http://protege.stanford.edu/, then go to plugins -> OntoLing)
• Exploits the Linguistic Watermark API for accessing LRs
• Supports linguistic enrichment of ontologies and ontology development
[Figure: OntoLing architecture. A GUI Facade (Linguistic Browser, Ontology Browser) sits on the OntoLing Core, which uses the Protégé API and a Linguistic Interface (<<interface>>). Implementations shown: WordnetInterface (WordNet 1.7, 2.0, 2.1), FreeDictInterface (Italian-Hungarian, English-Italian, English-Danish, ...) and further ...Interface implementations.]
Different resources may be plugged in and recognized at run time, by inspection of their Linguistic Watermark
…synonyms……semantic pointers to the LR…Linguistic Metadata for…Concepts documentation
Search linguistic expressions inside the LR
Explore semantic relationships which characterize the LR
…and linguistic relationships
Integration between ontology and linguistic resource: search ontology terms inside the linguistic resource
Assist ontology creation by extracting portions of knowledge from the LR
Linguistic Enrichment of the Ontology
Ontology concepts bear a greater linguistic expressivity: this helps in identifying similarities with other conceptualizations.
Adaptive behaviour and Graphical User Interface
• Linguistic Resources may be loaded into OntoLing at run time
• Upon initialization they declare themselves and their specific Linguistic
Watermark
• OntoLing understands their capabilities and rearranges its Linguistic
Browser according to properties and characteristics exhibited by the LR
• Different functionalities for enriching ontologies with content from the loaded
LR are also activated depending on its watermark
• Support to semiautomatic enrichment also takes into consideration which ki
Dynamic Functionalities
• The Linguistic Watermark provides a generic interface which embraces typical LR configurations and structures
• Three methods act as service providers, in that they allow the definition of functionalities dedicated to the exploration of particular aspects of a given LR
– exploitSearchMethod
– exploreSemanticRelation
– exploreLexicalRelation
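A rough sketch of what such a generic interface could look like is given below. Only the three method names come from the slide; the Python rendering, the signatures, the `ToyLexicon` class and its data are assumptions made for illustration and do not reproduce the actual Linguistic Watermark API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class LinguisticInterface(ABC):
    """Hypothetical generic interface exposed by every wrapped LR."""

    @abstractmethod
    def exploit_search_method(self, method: str, term: str) -> list[str]:
        """Run a named search strategy and return matching semantic indexes."""

    @abstractmethod
    def explore_semantic_relation(self, relation: str, semex: str) -> list[str]:
        """Follow a semantic relation (e.g. hypernymy) from a semantic index."""

    @abstractmethod
    def explore_lexical_relation(self, relation: str, word: str) -> list[str]:
        """Follow a lexical relation (e.g. synonymy) from a word."""


class ToyLexicon(LinguisticInterface):
    """Tiny in-memory LR (invented data) used only to exercise the interface."""

    def __init__(self):
        self.senses = {"worker": ["worker#1"], "employee": ["employee#1"]}
        self.hypernyms = {"employee#1": ["worker#1"]}
        self.synonyms = {"worker": ["laborer"]}

    def exploit_search_method(self, method, term):
        if method == "exact":
            return self.senses.get(term.lower(), [])
        if method == "prefix":
            return [s for word, senses in self.senses.items()
                    if word.startswith(term.lower()) for s in senses]
        raise ValueError(f"unsupported search method: {method}")

    def explore_semantic_relation(self, relation, semex):
        return self.hypernyms.get(semex, []) if relation == "hypernym" else []

    def explore_lexical_relation(self, relation, word):
        return self.synonyms.get(word, []) if relation == "synonym" else []


lr = ToyLexicon()
print(lr.exploit_search_method("exact", "Employee"))
print(lr.explore_semantic_relation("hypernym", "employee#1"))
```

Passing the strategy or relation name as a string is one way to let a host tool discover, at run time, which explorations a given resource supports; other designs are equally possible.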
Representing Linguistic Information inside Ontologies
• Standard Protégé Model
– Use of meta-classes
• Linguistic-class
• Linguistic-slot
– A terminology slot (one for
each language) for
indicating synonyms
– Frame Documentation Slot
• Protégé-OWL
– Use of standard rdfs
properties:
• rdfs:label to indicate
synonyms (also specifying
the language)
• rdfs:comment to provide
documentation about
ontology objects
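To make the Protégé-OWL option concrete, the sketch below emits Turtle statements that attach language-tagged `rdfs:label` values and an `rdfs:comment` to a concept. The concept IRI, the labels and the helper name `label_triples` are hypothetical; only the use of the two rdfs properties comes from the slide.

```python
def label_triples(concept_iri, labels, comment=None):
    """Return Turtle statements attaching language-tagged labels to a concept.

    labels: list of (text, language_tag) pairs, e.g. ("professor", "en").
    """
    def quote(text):
        # Escape backslashes and double quotes for a Turtle string literal.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [f"<{concept_iri}> rdfs:label {quote(text)}@{lang} ."
             for text, lang in labels]
    if comment is not None:
        lines.append(f"<{concept_iri}> rdfs:comment {quote(comment)} .")
    return lines


triples = label_triples(
    "http://example.org/onto#Professor",           # hypothetical IRI
    [("professor", "en"), ("professore", "it")],   # synonyms per language
    comment="A senior member of the academic staff.",
)
print("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .")
print("\n".join(triples))
```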
Summarizing
• The attention paid to formal conceptual representation in the Semantic Web is not being matched by an equivalent interest in how this information will be made easily accessible to humans, and to machines not sharing any form of semantic commitment.
• A wider and more deeply aware adoption of Natural Language in representing knowledge could fill this gap
• We developed infrastructures and a tool for:
– describing different kinds of LRs through a general framework
– providing functionalities for accessing their content
– enriching ontologies with information from LRs
– supporting a “linguistically aware ontology development”
• Future Work:
– Integrate as many lexical resources as possible!
– Include interfaces for accessing and exploiting other kinds of linguistic resources (e.g. FrameNet)
– Establish more complex connections between lexical resources and ontologies
Automatic Lexico-Semantic Enrichment (LSE) of Ontologies
• Objective:
– identify pointers (lexico-semantic anchors) from ontological objects to
semantic entities (e.g. synsets, for WordNet) of a linguistic resource
• Through:
– Observed linguistic/semantic similarities between the ontology and the
Linguistic Resource (LR) exploited for enrichment
• Exploitable Linguistic Watermarks:
– ConceptualizedLR
– At least one from:
• TaxonomicalLR
• LRWithGlosses
Automatic Lexico-Semantic Enrichment (LSE) of Ontologies
Intuition behind the strategy:
If a semantic pointer links a frame-synset pair <F,S>
Then other frame-synset pairs (where the frame is more specific/more
generic than F and the synset is narrower/broader than S) have a good
probability of being linked through a semantic pointer
Automatic LSE of Ontologies: the Framework
• O: space of ontological objects, called Frames (classes, properties,
individuals)
• L: space of semantic indexes (semex) in the LR
• Plausibility Matrix MP (defined over a O×L space)
– MP(i,j) represents the plausibility that the ontological object i be matched with
the semantic index j
• Evidence Matrix ME (defined over a O×L space)
– contains in each element ME(i,j) the set of evidences which contribute to the
computation of element MP(i,j) in the Plausibility Matrix.
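One simple way to hold the two matrices is as sparse maps keyed by (frame, semex) pairs, as sketched below. The representation, the helper names and the numeric values are assumptions made for illustration; the frame and synset identifiers are borrowed from the baseball example used later in the deck.

```python
from collections import defaultdict

# Sparse O x L matrices keyed by (frame, semex).
# A missing cell in M_P is read as plausibility 0.
plausibility = defaultdict(float)   # M_P: (frame, semex) -> plausibility
evidences = defaultdict(list)       # M_E: (frame, semex) -> list of evidences


def add_evidence(frame, semex, kind, weight):
    """Record one evidence contributing to the cell M_P(frame, semex)."""
    evidences[(frame, semex)].append((kind, weight))


def best_semex(frame):
    """Return the semantic index with the highest plausibility for a frame."""
    candidates = {s: p for (f, s), p in plausibility.items() if f == frame}
    return max(candidates, key=candidates.get) if candidates else None


plausibility[("Run", "noun.179011")] = 0.25      # invented starting value
add_evidence("Run", "noun.179011", "GG", 1.0)    # invented evidence weight
print(best_semex("Run"))
```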
Automatic LSE of Ontologies: the Framework
• Discovery Phase
– Objective: reduce the dimension of the L space
– Process: find candidate (lexical) anchors between elements in O and elements in L, through:
• Search filtered by string similarity measures
• Exploitation of translation and/or synonym vocabularies (possibly the LR itself)
– Output:
• $L_A \subseteq L$ (all synsets bound by candidate anchors)
– Notes:
• Maximize recall
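The discovery phase can be sketched as a similarity-filtered lookup. The version below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in string similarity measure; the slides do not say which measure was used, and the threshold, the lexicon and its semex identifiers are invented for the example.

```python
from difflib import SequenceMatcher


def discover(frames, lexicon, threshold=0.8):
    """Map each frame name to the semex ids of lexical entries that resemble it.

    lexicon: dict word -> list of semex ids (a hypothetical LR index).
    A lower threshold trades precision for recall, which is the
    priority in this phase.
    """
    anchors = {}
    for frame in frames:
        hits = set()
        for word, semexes in lexicon.items():
            ratio = SequenceMatcher(None, frame.lower(), word.lower()).ratio()
            if ratio >= threshold:
                hits.update(semexes)
        anchors[frame] = hits
    return anchors


lexicon = {
    "inning": ["noun.1"],
    "innings": ["noun.2"],
    "run": ["noun.3", "verb.1"],
    "league": ["noun.4"],
}
anchors = discover(["Inning", "Run"], lexicon)
# L_A: every semex bound by at least one candidate anchor.
l_a = set().union(*anchors.values())
print(anchors, l_a)
```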
Automatic LSE of Ontologies: the Framework
Semantic Enrichment function:
$f_{se} : O \times L_A \rightarrow [0..1]$
Implemented through:
– Extraction of semantic/linguistic similarity evidences $M_E$
– Computation of $M_P$
Due to mutual dependencies between evidences for different candidate anchors:
$f_{se} = f_{se}(t)$
and:
$M_P(t) = f\left(M_E,\ M_P(t-1),\ M_P(0)\right)$
Automatic LSE of Ontologies: the Framework
Legend:
– candidate pair: < f, s > (< frame, semex >)
with: $f \in O$; $s \in L_A$
where: $p(f,s,0) \neq 0$.
– Smarter notation for plausibility:
$p(f, s, t) \overset{\mathrm{def}}{=} M_P(f, s)$ with $M_P = M_P(t)$
Implementing $f_{se}$
• Guidelines
1. reward candidate pairs characterized by positive evidences
2. penalize candidate pairs characterized by negative evidences
3. evaluate quantitative factors associated with different kinds of evidences (representing the strength, or presence, of the evidence)
4. take into account the inherent ambiguity (polysemy) of every label associated with ontology concepts
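The four guidelines can be illustrated with a small stand-in update rule. The code below is not the authors' formula: the uniform prior 1/ambiguity, the noisy-OR style combination and the two thresholds are all assumptions chosen only to show one way of satisfying the guidelines.

```python
from math import prod


def update_plausibility(ambiguity, positive, negative):
    """Stand-in plausibility update (NOT the formula from the slides).

    ambiguity: number of senses of the label (polysemy), >= 1
    positive, negative: evidence weights, each in [0, 1]
    """
    p0 = 1.0 / ambiguity                      # guideline 4: assumed uniform prior
    # Guidelines 1 and 3: each positive weight closes part of the gap to 1.
    boosted = 1.0 - (1.0 - p0) * prod(1.0 - w for w in positive)
    # Guidelines 2 and 3: each negative weight scales the result down.
    return boosted * prod(1.0 - w for w in negative)


def decide(p, confirm=0.8, discard=0.1):
    """Classify a candidate anchor against two hypothetical thresholds."""
    if p >= confirm:
        return "confirmed"
    if p <= discard:
        return "discarded"
    return "pending"


p = update_plausibility(ambiguity=4, positive=[0.5], negative=[])
print(p, decide(p))
```

With this rule a label with four senses starts at 0.25, and a single positive evidence of weight 0.5 moves it halfway to 1; the candidate stays pending until further evidences push it across one of the thresholds.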
Implementing $f_{se}$
[Equation: update rule for the plausibility p(t) of a candidate pair, with terms indexed over i = 1..n and i = 1..m; the exact expression is not recoverable from the extracted text. Annotated quantities:]
– Plausibility at time = 0
– Plausibility threshold for an anchor to be confirmed
– Plausibility threshold for an anchor to be discarded
– Ambiguity (polysemy) of the term binding synset to frame
– Plausibility at time t
Implementing $f_{se}$
[Equation: the same plausibility update rule, annotated by contribution; the exact expression is not recoverable from the extracted text. Annotated quantities:]
– Plausibility at time = 0
– Plausibility at time t
– Weight related to a single evidence at time t
– Positive evidences contribution
– Negative evidences contribution
– Normalization factor
Extracting evidences (1)
Establishing proper context for each type of frame and for each type of evidence
computeConceptualSphere(Frame frm, int DepthRange) → SET OF Frame
input
  frm: the class, property or individual which has been selected for linguistic enrichment
  DepthRange: the number of allowed hops along the IS-A relation for retrieving super concepts of frm
output
  ConceptualSphere: the conceptual sphere surrounding frm
begin
  FrameType type ← getOntoType(frm)
  SET OF Frame ConceptualSphere ← {}
  if (type = class or type = property)
    ConceptualSphere ← ConceptualSphere ∪ getSuperConcepts(frm, DepthRange)
  else // frm is an instance
    Classes ← getClasses(frm)
    for each class ∈ Classes do
      ConceptualSphere ← ConceptualSphere ∪ {class} ∪ getSuperConcepts(class, DepthRange)
    end for
  end if
  if (type = class)
    for each property p, class c | frm.hasRestriction(p,c) or c.hasRestriction(p,frm) do
      ConceptualSphere ← ConceptualSphere ∪ { c } ∪ { p }
    end for
  end if
  if (type = instance)
    for each property p ∈ ( frm.getOwnRelationalProperties() ) do
      ConceptualSphere ← ConceptualSphere ∪ { p } ∪ frm.getOwnPropertyValues(p)
    end for
  end if
  if (type = property)
    for each class c ∈ ( domain(frm) ∪ range(frm) ) do
      ConceptualSphere ← ConceptualSphere ∪ { c }
    end for
  end if
  return ConceptualSphere
end
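For readers who prefer something executable, the procedure above can be transcribed into Python over a toy dictionary-based ontology. The dictionary layout, the helper `super_concepts` and the sample data (including the `AL_East` individual and the `Organization` class) are assumptions for this sketch; only the control flow follows the pseudocode.

```python
def super_concepts(onto, frame, depth):
    """Ancestors of `frame` reachable within `depth` IS-A hops."""
    found, frontier = set(), {frame}
    for _ in range(depth):
        frontier = {p for f in frontier for p in onto["is_a"].get(f, [])} - found
        found |= frontier
    return found


def conceptual_sphere(onto, frame, depth):
    """Collect the frames forming the context of `frame`."""
    kind = onto["type"][frame]
    sphere = set()
    if kind in ("class", "property"):
        sphere |= super_concepts(onto, frame, depth)
    else:  # frame is an instance
        for cls in onto["classes_of"].get(frame, []):
            sphere |= {cls} | super_concepts(onto, cls, depth)
    if kind == "class":
        # restrictions are (property, class, class) triples
        for prop, a, b in onto["restrictions"]:
            if frame == a:
                sphere |= {prop, b}
            elif frame == b:
                sphere |= {prop, a}
    elif kind == "instance":
        for prop, value in onto["property_values"].get(frame, []):
            sphere |= {prop, value}
    else:  # property: add its domain and range classes
        sphere |= set(onto["domain"].get(frame, []))
        sphere |= set(onto["range"].get(frame, []))
    return sphere


onto = {  # invented fragment loosely inspired by the baseball example
    "type": {"League": "class", "Division": "class", "Organization": "class",
             "division": "property", "AL_East": "instance"},
    "is_a": {"League": ["Organization"], "Division": ["Organization"]},
    "classes_of": {"AL_East": ["Division"]},
    "restrictions": [("division", "League", "Division")],
    "property_values": {},
    "domain": {"division": ["League"]},
    "range": {"division": ["Division"]},
}
print(conceptual_sphere(onto, "League", 1))
```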
Extracting evidences (2)
Examined evidences
– Analysis of Taxonomical alignment
• ConceptualSphere (context) := the transitive closure of the IS-A
relationship in the ontology (and hyponymy relation for LRs)
• Requirements: TaxonomicalLR compliant Linguistic Resource
– Analysis of glosses from the LR
• ConceptualSphere := depends on frame type (see example in
previous slide)
• Requirements: LRWithGlosses compliant Linguistic Resource
Extracting evidences (3)
Evidences based on Taxonomical Alignment
Reflect alignment between the respective structures of the ontology and the
linguistic resource exploited for enrichment
Captured taxonomy patterns may have positive as well as negative influence
over the plausibility of a given < frame, semex > pair
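[Figure: taxonomy alignment patterns between the ontology (ONT) and the linguistic resource (LR), over frames FH, FL and semexes SH, SL connected by IS-A links. Positive Evidence panel: a semantic pointer and a pair candidate for a semantic pointer. Negative Evidence panel: two candidate pairs.]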
Extracting evidences (3)
Evidences based on Taxonomical Alignment
$\text{weight}_i(t) = \mathrm{sgn} \cdot \sigma_{TA} \cdot p(\mathit{frame}, \mathit{semex}, t-1)$
– $\sigma_{TA}$: weighting coefficient for Taxonomy Alignment
– $\mathrm{sgn}$: sign
– $p(\mathit{frame}, \mathit{semex}, t-1)$: plausibility at step t-1 of the frame/semex pair closing the alignment square
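A possible reading of the taxonomy alignment evidence is sketched below: for a candidate pair, every pair one IS-A level up that already has non-zero plausibility contributes a weight equal to sign times coefficient times that plausibility. Treating the sign as +1 when the LR hypernymy matches the ontology IS-A link and -1 otherwise is an interpretation, not something the slides state, and the coefficient value and sample data are invented.

```python
def taxonomy_evidences(frame, semex, onto_parents, lr_hypernyms, prev_p,
                       sigma_ta=0.5):
    """Weights contributed to <frame, semex> by pairs <FH, SH> one level up.

    onto_parents: frame -> list of direct IS-A parents in the ontology
    lr_hypernyms: semex -> list of direct hypernyms in the LR
    prev_p: (frame, semex) -> plausibility at step t-1
    sigma_ta: assumed weighting coefficient for taxonomy alignment
    """
    weights = []
    for fh in onto_parents.get(frame, []):
        for (f, sh), p in prev_p.items():
            if f != fh or p == 0:
                continue
            # Assumed sign rule: aligned structures reward, misaligned penalize.
            sign = 1 if sh in lr_hypernyms.get(semex, []) else -1
            weights.append(sign * sigma_ta * p)
    return weights


onto_parents = {"Professor": ["Academic"]}
lr_hypernyms = {"professor#1": ["academician#1"], "professor#2": []}
prev_p = {("Academic", "academician#1"): 0.8}

print(taxonomy_evidences("Professor", "professor#1",
                         onto_parents, lr_hypernyms, prev_p))
print(taxonomy_evidences("Professor", "professor#2",
                         onto_parents, lr_hypernyms, prev_p))
```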
Extracting evidences (4)
Evidences extracted through Analysis of Glosses
Glosses bear a lot of semantic information; it is not formally explicit but, once unveiled, it can provide useful hints on how to properly match ontology concepts and linguistic expressions
Gloss Analysis generates three kinds of evidences, provided by:
• glosses which contain linguistic references to concepts expressed in the ontology and which are semantically related to the concept being enriched
• glosses which contain linguistic references to concepts which at least exist in the ontology
• linguistic overlap between glosses of synsets which are candidates to enrich related concepts
Next slides: examples for enrichment of baseball ontology from:
http://www.daml.org/2001/08/baseball/baseball-ont
Glosses containing linguistic reference to semantically related concepts
Ontology: Division, League
rdf triple: League division Division
Linguistic Resource: Noun.7741947
Gloss: A league ranked by quality; ”he played baseball in class D…
Evidence: GlossRelateds,League,prop(class,domain),1
[Equation: weight of evidence i at time t for GlossRelated (GR) evidences, depending on MatchingLevel; the exact expression is not recoverable from the extracted text]
for each Frame rc ∈ ConceptualSphere do
  MtchLvl ← match(rc, gloss)
  if MtchLvl ≠ 0
    Evidences ← Evidences ∪ evd(GR, rc, MtchLvl)
  end if
end for
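The loop above can be tried out with a deliberately naive `match` function. In the sketch below, the matching level is 1 when the frame name occurs as a whole word in the gloss and 0 otherwise; this binary definition is an assumption, since the slides do not define how the matching level is computed. The gloss text is the one shown for Noun.7741947, and `Organization` is an invented extra member of the conceptual sphere.

```python
import re


def match(frame_name, gloss):
    """Hypothetical matching level: 1 if the name is a word of the gloss, else 0."""
    words = re.findall(r"[a-z]+", gloss.lower())
    return 1 if frame_name.lower() in words else 0


def gloss_related_evidences(conceptual_sphere, gloss):
    """Collect GR evidences for frames of the sphere mentioned in the gloss."""
    evidences = []
    for rc in sorted(conceptual_sphere):   # sorted only for reproducible output
        level = match(rc, gloss)
        if level != 0:
            evidences.append(("GR", rc, level))
    return evidences


gloss = 'A league ranked by quality; "he played baseball in class D..."'
result = gloss_related_evidences({"League", "Organization"}, gloss)
print(result)
```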
Glosses containing linguistic reference to concepts which exist in the ontology
Ontology: Run, Inning
Linguistic Resource: Noun.179011
Gloss: A score in baseball made by a runner touching all four bases safely; "the Yankees scored 3 runs in the bottom of the 9th"; "their first tally came in the 3rd inning"
Inning ∈ O
Evidence: GlossGeneral,Inning,1
[Equation: weight of evidence i at time t for GlossGeneral (GG) evidences, depending on MatchingLevel; the exact expression is not recoverable from the extracted text]
for each term t ∈ gloss do
  Frame rc ← find(Ontology, t, MtchLvl)
  if rc ≠ null
    Evidences ← Evidences ∪ evd(GG, rc, MtchLvl)
  end if
end for
Overlap between glosses of synsets which are candidates to enrich related concepts
Ontology: WorldSeries, home
rdf triple: WorldSeries home Team
Linguistic Resource:
Noun. 7009602: series that constitutes the playoff for the baseball championship
Noun. 3399133: (baseball) base consisting of a rubber slab where the batter stands
Evidence: GlossOverlap,baseball, home-noun.3399133,1
[Equation: weight of evidence i at time t for GlossOverlap (GO) evidences, involving MatchingLevel and the (object, synset) pair at step t-1; the exact expression is not recoverable from the extracted text]
for each Frame rf_i ∈ ConceptualSphere do
  for each synset s_ij ∈ candidateSynsets(rf_i) do
    let rfgloss[i,j] ← s_ij.getGloss()
  end for
  for each term t, t ∈ gloss and t ∈ rfgloss[i,j]
    let freq = LR.getGlossFrequency(t)
    if !filter(freq)
      Evidences ← Evidences ∪ evd(GO, rf_i, s_ij, freq)
    end if
  end for
end for
Testing our framework
Experimental setup:
Fine tuning of the evidence-typed σ-parameters was performed over a collection of several small ontologies and/or portions of them
Two ontologies used for testing, WordNet used for enrichment in both cases:
1. BASEBALL ontology ( http://www.daml.org/2001/08/baseball/baseball-ont )
– Original version in DAML+OIL, converted to OWL
– 78 classes, 26 properties and 13 individuals
– 75.3% of ambiguous concepts, average ambiguity ~9.16
– Inter-annotator agreement = 98.76% (one contrasting decision out of the whole oracle)
2. MOSES ontology about university ( http://www.mondeca.com/owl/moses/ita.owl )
– developed in the context of the EU-funded project MOSES (IST-2001-37244)
– built, in the OWL language, over a pre-existing DAML ontology, and finalized for representing the Italian university domain
– 192 classes, 122 properties
– 73.1% of ambiguous concepts, average ambiguity ~5.23
Experimental results
Detailed analysis of the test data on the first experiment revealed that,
though only 40% of the original corpus (ontology) has been correctly
enriched, another 50% contains the right choice as first (but still under
acceptance threshold), second or third in order of plausibility
Ontology        Precision   Recall
Baseball Ont    80%         39.5%
Moses Italian   81.48%      42.72%
Conclusions
• The attention paid to formal conceptual representation in the Semantic Web is not being matched by an equivalent interest in how this information will be made easily accessible to humans, and to machines not sharing any form of semantic commitment.
• A wider and more deeply aware adoption of Natural Language in representing knowledge (or, at least, in supporting knowledge representation) could fill this gap
• We defined a first framework for:
– describing LRs (from an “operational point of view”) and enriching ontologies with their content
– (semi)automatically enriching the content of ontologies with information from linguistic resources
• Future work:
– Large-scale testing on ontologies!
– Improving gloss processing (POS tagging, shallow parsing…)
– Development of new techniques for multilingual ontology enrichment (possibly exploiting more than one LR at a time)
– Embedding all these techniques inside existing frameworks for ontology editing
Thanks for your attention….
see you in Roma for Aiia07 congress
http://aiia.info.uniroma2.it