The Use of Machine-Generated Ontologies in Dynamic Information Seeking
Transcript of The Use of Machine-Generated Ontologies in Dynamic Information Seeking
CoopIS’2001, Trento, Italy
Giovanni Modica, Avigdor Gal, Hasan M. Jamil
Motivating example
Preliminaries
– Definition: An ontology is an explicit representation of a conceptualization (Gruber 1993).
– Conjecture I: Applications in a given domain base their information exchange on some (shared) underlying ontology.
– Observation: Applications in a given domain use different ontology representations.
– Conjecture II: Given an application A such that A utilizes an ontology representation OA, and an ontology O, there exists an invertible mapping fA such that fA(OA) = O.
Problem description
Given two applications A and B, such that A utilizes an ontology representation OA and B utilizes an ontology representation OB, introduce a mapping fBA such that fBA(OB) = OA.
In a perfect world:
– O is known.
– fA is known.
– fB is known.
Then OA = fA^-1(fB(OB)).
Alas:
– O is unknown. At best, an approximation of O exists, in the form of a standard.
– fA and fB are unknown: lack of documentation, the mental state of a designer, etc.
Proposed solution
Given two applications A and B, such that A utilizes an ontology representation OA and B utilizes an ontology representation OB, introduce a mapping fBA such that:
– fBA depends on the ontology representation.
– A matching is associated with a “degree of confidence” in the matching:
  fBA : OB × OA → [0,1]
  where 0 identifies non-matching terms and 1 identifies a crisp matching.
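Such a confidence-scored matching can be sketched in Python. This is only an illustrative sketch: the term-level signature and the string-similarity heuristic (`difflib`) are assumptions, not the matching algorithm of the paper.

```python
# A minimal sketch of a confidence-scored matching fBA : OB x OA -> [0, 1].
# The similarity heuristic (difflib) is an illustrative assumption.
from difflib import SequenceMatcher

def f_BA(term_b: str, term_a: str) -> float:
    """Degree of confidence that term_b (from OB) matches term_a (from OA).

    0 identifies non-matching terms; 1 identifies a crisp matching.
    """
    a, b = term_a.lower(), term_b.lower()
    if a == b:
        return 1.0                                  # crisp match
    return SequenceMatcher(None, b, a).ratio()      # partial confidence

# Case-insensitive terms match crisply; unrelated terms score near 0.
assert f_BA("Pickup", "pickup") == 1.0
```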
Ontology representation
Dynamic information seeking: HTML forms
• Labels
• Input fields
• Scripts
Assumptions:
• Labels represent terms in an ontology (e.g., Pick-up Date).
• Input fields provide constraints on the value domains (e.g., Day: {1, …, 31}).
• Scripts, among other things, suggest a precedence relationship (e.g., Pick-up Location is required before selecting a Car Type).
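The label and value-domain extraction from a form can be sketched with the standard-library HTML parser. The form snippet and the tag-handling rules here are illustrative assumptions, not the tool's actual extraction rules.

```python
# A minimal sketch of extracting ontology terms (labels) and value domains
# (select options) from an HTML form. Illustrative assumption: free text in
# the form acts as a label, <select> options constrain the value domain.
from html.parser import HTMLParser

class FormExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.terms = []        # candidate ontology terms (labels)
        self.domains = {}      # field name -> allowed values
        self._field = None
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._field = attrs.get("name")
            self.domains[self._field] = []
        elif tag == "option":
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "select":
            self._field = None
        elif tag == "option":
            self._in_option = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_option and self._field:
            self.domains[self._field].append(text)  # value-domain constraint
        elif self._field is None:
            self.terms.append(text)                 # label text

html = ('<form>Pick-up Date <select name="day">'
        '<option>1</option><option>2</option></select></form>')
p = FormExtractor()
p.feed(html)
# p.terms  -> ['Pick-up Date']
# p.domains -> {'day': ['1', '2']}
```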
Ontology representation
Conceptual modeling approach, based on Bunge:
– Terms (things)
– Values
– Composition
– Precedence
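These four primitives can be sketched as a small data structure. The field names are illustrative assumptions, not the paper's model.

```python
# A minimal sketch of the four conceptual-modeling primitives:
# terms (things), values, composition, and precedence.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Term:
    name: str                                             # the thing itself
    values: List[str] = field(default_factory=list)       # value domain
    parts: List["Term"] = field(default_factory=list)     # composition
    precedes: List["Term"] = field(default_factory=list)  # precedence

location = Term("Pick-up Location")
car_type = Term("Car Type")
location.precedes.append(car_type)  # location is chosen before car type
day = Term("Day", values=[str(d) for d in range(1, 32)])
```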
Ontology extraction and matching
[Diagram: the extraction pipeline, starting from a URL (e.g., http://www.avis.com):
– Phase 1: Parsing — the HTML is parsed into a DOM tree; HTML and FORM elements are extracted.
– Phase 2: Labeling — labels are identified from the rendered form using rules.
– Phase 3: Ontology Creation — a candidate ontology is created.
– Phase 4: Merging — the target and candidate ontologies are merged by the matching algorithms, with the help of a thesaurus, into a refined ontology; results are submitted to a knowledge base (KB).]
Phase 1: Parsing
Phase 2: Labeling
Merging
Heuristics for the ontology merging (Frakes and Baeza-Yates, 1992):
– Textual matching: Date ↔ date; Pickup ↔ pickup
– Ignorable-character removal: *Country → Country
– De-hyphenation: Pick-up → Pickup; Pickup → Pick up
– Stop-term removal: Date of Return → Return Date (stop terms: a, to, do, does, the, in, or, and, this, those, that, etc.)
– Substring matching: Pickup Location Code ↔ Pick-up location (66%)
– Content matching: Dropoff Day (1, …, 31) ↔ Return Day (1, …, 31) (100%), hence Dropoff ↔ Return
– Thesaurus matching: Dropoff Location ↔ Return Location (100%)
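Several of these heuristics can be sketched as a normalization pipeline followed by a substring score. This is an illustrative sketch under assumed rules; in particular, the substring score below is character-based and does not reproduce the 66% figure of the slide, which may use a different granularity.

```python
# A minimal sketch of the textual, ignorable-character, de-hyphenation,
# stop-term, and substring heuristics. The exact rules are assumptions.
STOP_TERMS = {"a", "to", "do", "does", "the", "in", "or",
              "and", "this", "those", "that", "of"}

def normalize(term: str) -> str:
    term = term.lower()                                   # textual matching
    term = "".join(c for c in term
                   if c.isalnum() or c.isspace())         # ignorable chars, de-hyphenation
    words = [w for w in term.split() if w not in STOP_TERMS]  # stop terms
    return " ".join(words)

def substring_confidence(a: str, b: str) -> float:
    """Fraction of the longer term covered by the shorter one (0 if disjoint)."""
    a, b = normalize(a), normalize(b)
    short, long_ = sorted((a, b), key=len)
    return len(short) / len(long_) if short and short in long_ else 0.0

assert normalize("Date of Return") == "date return"
assert normalize("Pick-up") == normalize("Pickup") == "pickup"
```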
Phase 4: Merging
Preliminary Results
Two metrics are used for performance analysis (Frakes and Baeza-Yates, 1992): recall (completeness) and precision (soundness).
Parameters:
– tr: number of terms retrieved
– tm: number of terms matched
– te: number of terms effectively matched
Recall: R = tm / tr
Precision: P = te / tm
Preliminary Results
A third metric is used to combine recall and precision. For a precision value P, a recall value R, and an importance measure b, the combined metric E is calculated as (Frakes and Baeza-Yates, 1992):

E = 1 − (1 + b²) · P · R / (b² · P + R)

Example: # of terms in Ontology1: 20; # of matches identified: 15 → Recall: 75% (15/20); # of effective matches: 10 → Precision: 66% (10/15).
Preliminary Results
[Chart: Precision vs. Recall (Avis & Hertz)]

Strategy          Recall  Precision  E
Textual           0.3     0.33       0.673913
Ign. Chars.       0.3     0.33       0.673913
De-hyph.          0.6     0.33       0.634146
StopTerms         0.6     0.33       0.634146
Substring         0.75    0.40       0.558824
Substring Names   0.75    0.67       0.318182
Content           0.65    0.92       0.148472
Thesaurus         0.65    0.92       0.148472
Preliminary Results
[Chart: E metric for Hertz vs. Alamo (b = 0.5), one series per site (Hertz, Alamo), across the strategies Textual, Ign. Chars., De-hyph., StopTerms, Substring, Substring Names, Content, Thesaurus.]
Preliminary Results
[Chart: Learning from Thesaurus — E (b = 0.5) with No Thesaurus vs. with an Improved Thesaurus; the two plotted values are 0.390 and 0.479.]
Summary and Future Work
We have introduced:
– Automatic ontology creation
– An automatic matching process
– Preliminary results
Future work is oriented towards:
– Incorporation of query facilities into the tool
– Automatic navigation of web sites for ontology extraction
– Dynamic translation of queries against the target ontology into queries against the multiple candidate ontologies