The Use of Machine-Generated Ontologies in Dynamic Information Seeking
Transcript of The Use of Machine-Generated Ontologies in Dynamic Information Seeking
CoopIS’2001, Trento, Italy
Giovanni Modica, Avigdor Gal, Hasan M. Jamil
Motivating example
Preliminaries
– Definition: An ontology is an explicit representation of a conceptualization (Gruber 1993).
– Conjecture I: Applications in a given domain base their information exchange on some (shared) underlying ontology.
– Observation: Applications in a given domain use different ontology representations.
– Conjecture II: Given an application A such that A utilizes an ontology representation OA, and an ontology O, there exists an invertible mapping fA such that fA(OA) = O.
Problem description
Given two applications A and B, such that A utilizes an ontology representation OA and B utilizes an ontology representation OB, introduce a mapping fBA such that fBA(OB) = OA.
In a perfect world:
– O is known.
– fA is known.
– fB is known.
Then OA = fA^-1(fB(OB)).
Alas:
– O is unknown. At best, an approximation of O exists, in the form of a standard.
– fA and fB are unknown: lack of documentation, the mental state of a designer, etc.
Proposed solution
Given two applications A and B, such that A utilizes an ontology representation OA and B utilizes an ontology representation OB, introduce a mapping fBA such that:
– fBA depends on the ontology representation.
– A matching is associated with a “degree of confidence” in the matching:
  fBA : OB × OA → [0,1]
  where 0 identifies non-matching terms and 1 identifies a crisp matching.
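Such a confidence-scored matching can be sketched in Python. This is only an illustrative sketch: the term-level signature and the string-similarity heuristic (`difflib`) are assumptions, not the matching algorithm of the paper.

```python
# A minimal sketch of a confidence-scored matching fBA : OB x OA -> [0, 1].
# The similarity heuristic (difflib) is an illustrative assumption.
from difflib import SequenceMatcher

def f_BA(term_b: str, term_a: str) -> float:
    """Degree of confidence that term_b (from OB) matches term_a (from OA).

    0 identifies non-matching terms; 1 identifies a crisp matching.
    """
    a, b = term_a.lower(), term_b.lower()
    if a == b:
        return 1.0                                  # crisp match
    return SequenceMatcher(None, b, a).ratio()      # partial confidence

# Case-insensitive terms match crisply; unrelated terms score near 0.
assert f_BA("Pickup", "pickup") == 1.0
```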
Ontology representation
Dynamic information seeking: HTML forms
• Labels
• Input fields
• Scripts
Assumptions:
• Labels represent terms in an ontology (e.g., Pick-up Date).
• Input fields provide constraints on the value domains (e.g., Day: {1, …, 31}).
• Scripts, among other things, suggest a precedence relationship (e.g., Pick-up Location is required before selecting a Car Type).
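The label and value-domain extraction from a form can be sketched with the standard-library HTML parser. The form snippet and the tag-handling rules here are illustrative assumptions, not the tool's actual extraction rules.

```python
# A minimal sketch of extracting ontology terms (labels) and value domains
# (select options) from an HTML form. Illustrative assumption: free text in
# the form acts as a label, <select> options constrain the value domain.
from html.parser import HTMLParser

class FormExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.terms = []        # candidate ontology terms (labels)
        self.domains = {}      # field name -> allowed values
        self._field = None
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._field = attrs.get("name")
            self.domains[self._field] = []
        elif tag == "option":
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "select":
            self._field = None
        elif tag == "option":
            self._in_option = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_option and self._field:
            self.domains[self._field].append(text)  # value-domain constraint
        elif self._field is None:
            self.terms.append(text)                 # label text

html = ('<form>Pick-up Date <select name="day">'
        '<option>1</option><option>2</option></select></form>')
p = FormExtractor()
p.feed(html)
# p.terms  -> ['Pick-up Date']
# p.domains -> {'day': ['1', '2']}
```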
Ontology representation
Conceptual modeling approach, based on Bunge:
– Terms (things)
– Values
– Composition
– Precedence
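These four primitives can be sketched as a small data structure. The field names are illustrative assumptions, not the paper's model.

```python
# A minimal sketch of the four conceptual-modeling primitives:
# terms (things), values, composition, and precedence.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Term:
    name: str                                             # the thing itself
    values: List[str] = field(default_factory=list)       # value domain
    parts: List["Term"] = field(default_factory=list)     # composition
    precedes: List["Term"] = field(default_factory=list)  # precedence

location = Term("Pick-up Location")
car_type = Term("Car Type")
location.precedes.append(car_type)  # location is chosen before car type
day = Term("Day", values=[str(d) for d in range(1, 32)])
```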
Ontology extraction and matching
[Diagram: the extraction pipeline, starting from a URL (e.g., http://www.avis.com):
– Phase 1: Parsing — the HTML is parsed into a DOM tree; HTML and FORM elements are extracted.
– Phase 2: Labeling — labels are identified from the rendered form using rules.
– Phase 3: Ontology Creation — a candidate ontology is created.
– Phase 4: Merging — the target and candidate ontologies are merged by the matching algorithms, with the help of a thesaurus, into a refined ontology; results are submitted to a knowledge base (KB).]
Phase 1: Parsing
Phase 2: Labeling
Merging
Heuristics for the ontology merging (Frakes and Baeza-Yates, 1992):
– Textual matching: Date ↔ date; Pickup ↔ pickup
– Ignorable-character removal: *Country → Country
– De-hyphenation: Pick-up → Pickup; Pickup → Pick up
– Stop-term removal: Date of Return → Return Date (stop terms: a, to, do, does, the, in, or, and, this, those, that, etc.)
– Substring matching: Pickup Location Code ↔ Pick-up location (66%)
– Content matching: Dropoff Day (1, …, 31) ↔ Return Day (1, …, 31) (100%), hence Dropoff ↔ Return
– Thesaurus matching: Dropoff Location ↔ Return Location (100%)
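Several of these heuristics can be sketched as a normalization pipeline followed by a substring score. This is an illustrative sketch under assumed rules; in particular, the substring score below is character-based and does not reproduce the 66% figure of the slide, which may use a different granularity.

```python
# A minimal sketch of the textual, ignorable-character, de-hyphenation,
# stop-term, and substring heuristics. The exact rules are assumptions.
STOP_TERMS = {"a", "to", "do", "does", "the", "in", "or",
              "and", "this", "those", "that", "of"}

def normalize(term: str) -> str:
    term = term.lower()                                   # textual matching
    term = "".join(c for c in term
                   if c.isalnum() or c.isspace())         # ignorable chars, de-hyphenation
    words = [w for w in term.split() if w not in STOP_TERMS]  # stop terms
    return " ".join(words)

def substring_confidence(a: str, b: str) -> float:
    """Fraction of the longer term covered by the shorter one (0 if disjoint)."""
    a, b = normalize(a), normalize(b)
    short, long_ = sorted((a, b), key=len)
    return len(short) / len(long_) if short and short in long_ else 0.0

assert normalize("Date of Return") == "date return"
assert normalize("Pick-up") == normalize("Pickup") == "pickup"
```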
Phase 4: Merging
Preliminary Results
Two metrics are used for performance analysis (Frakes and Baeza-Yates, 1992): recall (completeness) and precision (soundness).
Parameters:
– tr: number of terms retrieved
– tm: number of terms matched
– te: number of terms effectively matched
Recall: R = tm / tr
Precision: P = te / tm
Preliminary Results
A third metric is used to combine recall and precision. For a precision value P, a recall value R, and an importance measure b, the combined metric E is calculated as (Frakes and Baeza-Yates, 1992):

E = 1 − (1 + b²) · P · R / (b² · P + R)

Example: # of terms in Ontology1: 20; # of matches identified: 15 → Recall: 75% (15/20); # of effective matches: 10 → Precision: 66% (10/15).
Preliminary Results
[Chart: Precision vs. Recall (Avis & Hertz)]

Strategy          Recall  Precision  E
Textual           0.3     0.33       0.673913
Ign. Chars.       0.3     0.33       0.673913
De-hyph.          0.6     0.33       0.634146
StopTerms         0.6     0.33       0.634146
Substring         0.75    0.40       0.558824
Substring Names   0.75    0.67       0.318182
Content           0.65    0.92       0.148472
Thesaurus         0.65    0.92       0.148472
Preliminary Results
[Chart: E metric for Hertz vs. Alamo (b = 0.5), one series per site (Hertz, Alamo), across the strategies Textual, Ign. Chars., De-hyph., StopTerms, Substring, Substring Names, Content, Thesaurus.]
Preliminary Results
[Chart: Learning from Thesaurus — E (b = 0.5) with No Thesaurus vs. with an Improved Thesaurus; the two plotted values are 0.390 and 0.479.]
Summary and Future Work
We have introduced:
– Automatic ontology creation
– An automatic matching process
– Preliminary results
Future work is oriented towards:
– Incorporation of query facilities into the tool
– Automatic navigation of web sites for ontology extraction
– Dynamic translation of queries against the target ontology into queries against the multiple candidate ontologies