CSR : Discovering Subsumption Relations for the Alignment of Ontologies


Transcript of CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Page 1: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

CSR: Discovering Subsumption Relations for the Alignment of Ontologies

Vassilis Spiliopoulos 1, 2, Alexandros G. Valarakos 1, and George A. Vouros 1

1 AI Lab, Department of Information and Communication Systems Eng., University of the Aegean, 83200 Karlovassi, Samos, Greece

{vspiliop, georgev}@aegean.gr

Page 2: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

University of the Aegean, AI-LAB, ESWC 2008

Outline
  Introduction
  Problem Definition
  Why Subsumption Relations
  Related Work
  The Method
  Experimental Results
  Conclusions

Page 3: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Ontology

Concept features
  Properties: data type, object property (relation)
  Instances
  Comments
Concepts organized into hierarchies (subsumption relation)
Ontology languages
  OWL family
  Union, Intersection, Disjointness

[Figure: example ontology with the concepts Publication, Proceedings, Book, Edited Selection, and Monograph connected by “⊑” links; the labels Referenceof, # of pages, title, and date; the instance values “The Semantic Web 08 Proc.” and 973; and the comment “A book that is a collection of texts or articles”.]

Page 4: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Current Situation

[Figure: Engineer 1, Engineer 2, ..., Engineer N each model the Bibliographic Domain, resulting in ontology 1, ..., ontology N.]

Page 5: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Ontology Mapping

[Figure: two ontologies labelled “Agents’ Ontology” and “Conference Ontology”. One contains Publication, Proceedings, and Book with the labels Referenceof, # of pages, title, and date; the other contains Citation, Proceedings, Book, and Work with the labels # of pages, title, date, and to. A “⊑” mapping links them, and the query shown is “Find Citations in Proceedings”.]

Retrieves a superset of what he is looking for
Locates a mapping
Ontology Mapping is a process that has as input two ontologies and locates relations (i.e. mappings) between their elements
  Equivalence (≡)
  Subsumption (⊑ or ⊒)
  Intersection (⊥)

Page 6: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Why Subsumption Relations (1/2)

[Figure: two bibliographic ontologies connected by “≡”, “⊑”, and “⊥” mappings. One contains Publication, Proceedings, Book, Edited Selection, and Monograph with the labels Referenceof, # of pages, title, and date; the other contains Citation, Proceedings, Book, Edited Selection, Monograph, and Work with the labels # of pages, title, date, to, and chapters.]

Page 7: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Why Subsumption Relations (2/2)

Discover subsumption relations separately from subsumptions and equivalencies that can be deduced by a reasoning mechanism
May augment the effectiveness of current ontology mapping and merging methods
  No or few equivalences
  Web Service matchmaking
  Ontology engineering environments

Page 8: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Problem Definition

The subsumption computation problem is defined as follows:
  Given two input ontologies and, optionally, a specification of property equivalences
  Classify each pair (C1, C2) of concepts into one of two distinct classes:
    the “subsumption” (⊑) class (C1 ⊑ C2), or
    the class “R”
Class “R” denotes pairs of concepts that are not known to be related via the subsumption relation
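The problem statement above can be read as a labelling function over concept pairs. The following is a minimal sketch under stated assumptions: the name `classify_pairs`, the string labels, and the choice to take C1 from the first ontology and C2 from the second are illustrative and do not come from the slides. The predicate `is_subsumed` stands in for whatever decision procedure is used.

```python
from itertools import product


def classify_pairs(concepts_o1, concepts_o2, is_subsumed):
    """Label every pair (C1, C2) as "subsumption" or "R".

    concepts_o1, concepts_o2: iterables of concept names (hypothetical input).
    is_subsumed: callable returning True when C1 is judged to be subsumed by C2.
    """
    labels = {}
    for c1, c2 in product(concepts_o1, concepts_o2):
        # "R" only means that no subsumption is known, not that none exists.
        labels[(c1, c2)] = "subsumption" if is_subsumed(c1, c2) else "R"
    return labels
```

Any binary classifier can be plugged in as `is_subsumed`, which keeps the problem statement separate from the learning method described later.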

Page 9: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Related Work

Satisfiability Based Approaches [1]
  Transformation of the ontology mapping problem into a satisfiability problem
Exploitation of Domain Knowledge [2], [3], and [4]
  Exploit domain ontologies as an intermediate ontology for bridging the semantic gap [2], [3]
  WordNet is used for the same purpose (WordNet Description Logics) [4]
Google Based Approaches [5], [6], and [7]
  Exploit the hits returned by Google to test whether a subsumption relation holds [5], [6] or to loosen the formal constraints [7]
Machine Learning Approaches [8]
  A method based on Implication Intensity theory (Unsupervised Learning) is proposed

Page 10: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method At a Glance (1/2)

Purpose
  We try to learn patterns of features that indicate a subsumption relation between two concepts belonging to two different ontologies
How
  By exploiting supervised machine learning techniques (binary classification), and the ontology specification semantics

Page 11: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method At a Glance (2/2)

Why machine learning?
  There are no evident generic rules directly capturing the existence of a subsumption relation (e.g. labels/vicinity similarity or dissimilarity)
  Learn patterns of features not evident to the naked eye
  Self-adapting to idiosyncrasies of specific domains
  Not dependent on external resources

Page 12: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (1/12)

Input
  Two OWL-DL ontologies (the process is not language specific)
  Optionally, property equivalencies computed by the SEMA mapping tool
The method requires the existence of subsumption relations between concepts

[Figure: the CSR pipeline, with the stages Hierarchies Enhancement, Generation of Testing Pairs (Search Space Pruning), Generation of Features, Generation of Training Examples, and Train Classifier, plus the labels SEMA and R.]

Page 13: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (2/12)

Hierarchies Enhancement
  Inferring all indirect subsumption relations
  Influences the generation of training examples and feature vectors
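Inferring all indirect subsumption relations amounts to taking the transitive closure of the direct ones. The sketch below illustrates that idea only; the function name `enhance_hierarchy` and the edge-list input format are assumptions, and a real system would more likely obtain the closure from an OWL reasoner.

```python
from collections import defaultdict


def enhance_hierarchy(direct_edges):
    """Return every (sub, super) pair implied by the direct subsumption edges.

    direct_edges: iterable of (sub, super) pairs, read as "sub ⊑ super".
    """
    parents = defaultdict(set)
    for sub, sup in direct_edges:
        parents[sub].add(sup)

    closure = set()
    for start in list(parents):
        # Depth-first walk upwards, collecting every reachable ancestor.
        stack = list(parents[start])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            closure.add((start, ancestor))
            stack.extend(parents.get(ancestor, ()))
    return closure
```

For a hypothetical hierarchy in which Monograph ⊑ Book and Book ⊑ Publication, the closure also contains the indirect pair (Monograph, Publication), which then becomes available as a training example.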

Page 14: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (3/12)

Generation of Features
  CSR exploits two types of features: concepts’ properties, or words appearing in the “vicinity” of concepts

Page 15: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (4/12)

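[Figure: the distinct properties p1, p2, ..., pN of the ontologies O1 and O2 index the feature vector of a concept pair (C1, C2).]

Each concept pair (C1, C2) is represented by the vector (f1, f2, ..., fN), where fi is the i-th feature:

\[
f_i = \begin{cases}
0, & \text{if } p_i \text{ does not appear in } C_1 \text{ nor in } C_2 \\
1, & \text{if } p_i \text{ appears only in } C_1 \\
2, & \text{if } p_i \text{ appears only in } C_2 \\
3, & \text{if } p_i \text{ appears in both } C_1 \text{ and } C_2
\end{cases}
\]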
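The property-based encoding can be sketched in a few lines. This assumes each concept is given as a set of property names and that `all_props` fixes the order p1, ..., pN. The names `property_features`, `props_c1`, `props_c2`, and `all_props` are hypothetical.

```python
def property_features(props_c1, props_c2, all_props):
    """Encode a concept pair (C1, C2) as one value in {0, 1, 2, 3} per property.

    0: the property appears in neither concept, 1: only in C1,
    2: only in C2, 3: in both.
    """
    features = []
    for prop in all_props:
        in_c1 = prop in props_c1
        in_c2 = prop in props_c2
        # Presence in C1 contributes 1 and presence in C2 contributes 2.
        features.append(int(in_c1) + 2 * int(in_c2))
    return features
```

With `props_c1 = {"title", "date"}`, `props_c2 = {"title", "chapters"}`, and `all_props = ["title", "date", "chapters", "to"]`, the result is `[3, 1, 2, 0]`. The encoding is asymmetric, so swapping C1 and C2 changes the vector, which matters because subsumption is a directed relation.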

Page 16: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (5/12)

[Figure: the concepts C1, C2, ..., CM of the ontologies O1 and O2 are linked to the words w1, w2, ..., wT. Each concept Cj is represented by the frequency vector (fr^j_1, fr^j_2, ..., fr^j_T), for example (2, 0, 1, ..., 0), (0, 0, 1, ..., 2), and (0, 1, 0, ..., 1, ..., 1).]

fr_i: frequency of the i-th word
T: number of distinct words

For each concept
  Label
  Comments
  Properties
  Instances
  Related Concepts
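A rough sketch of how per-concept word frequency vectors over a shared vocabulary of T distinct words could be built. It assumes the text collected for each concept (label, comments, property names, and so on) is already available as a list of strings. The tokenisation rule, the alphabetical word order, and the name `frequency_vectors` are illustrative choices, not taken from the slides.

```python
import re
from collections import Counter


def frequency_vectors(concept_texts):
    """Build one word-frequency vector per concept.

    concept_texts: dict mapping a concept name to a list of strings.
    Returns (vocabulary, vectors); vectors[c][i] is the frequency of
    vocabulary[i] in the text gathered for concept c.
    """
    counts = {
        concept: Counter(re.findall(r"[a-z0-9]+", " ".join(texts).lower()))
        for concept, texts in concept_texts.items()
    }
    # The T distinct words across all concepts, in a fixed order.
    vocabulary = sorted(set().union(*counts.values()))
    vectors = {
        concept: [count[word] for word in vocabulary]
        for concept, count in counts.items()
    }
    return vocabulary, vectors
```

Every vector has length T, so the vectors of any two concepts can be compared position by position.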

Page 17: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (6/12)

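For a concept pair (C1, C2), the frequency vectors of the left side concept C1 and the right side concept C2,

  C1: (fr^1_1, fr^1_2, ..., fr^1_i, ..., fr^1_T)
  C2: (fr^2_1, fr^2_2, ..., fr^2_i, ..., fr^2_T)

are combined into the feature vector (f1, f2, ..., fi, ..., fT):

\[
f_i = \begin{cases}
0, & \text{if } fr^1_i = 0 \text{ and } fr^2_i = 0 \\
1, & \text{if } fr^1_i > 0 \text{ and } fr^2_i = 0 \\
2, & \text{if } fr^1_i = 0 \text{ and } fr^2_i > 0 \\
3, & \text{if } fr^1_i > 0 \text{ and } fr^2_i > 0
\end{cases}
\]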
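The word-based features of a pair can be derived from the two frequency vectors in the same way as the property-based ones. A minimal sketch, assuming that a feature value depends only on whether each frequency is zero or positive; the function name `word_features` is hypothetical.

```python
def word_features(fr_c1, fr_c2):
    """Combine the frequency vectors of C1 (left side) and C2 (right side).

    Per word: 0 if it occurs for neither concept, 1 if only for C1,
    2 if only for C2, 3 if for both.
    """
    if len(fr_c1) != len(fr_c2):
        raise ValueError("both vectors must cover the same T words")
    return [int(a > 0) + 2 * int(b > 0) for a, b in zip(fr_c1, fr_c2)]
```

For the vectors `[2, 0, 1, 0]` and `[0, 0, 1, 2]` this yields `[1, 0, 3, 2]`. The actual frequencies are discarded and only the presence pattern is kept.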

Page 18: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (7/12)

Generation of Training Examples
  Classes: “⊑” and R
  Training examples are generated by exploiting the input ontologies in isolation
  According to the semantics of specifications

Page 19: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (8/12)

Class “⊑”
  Subsumption Relation. Include all concept pairs from both input ontologies that belong to the subsumption relation (direct or indirect)
  Equivalence Relation. Any concept in a training pair can be substituted by any of its equivalent concepts
  Union Constructor. E.g. C4 ⊔ C5 ⊑ C2 => C4 ⊑ C2 and C5 ⊑ C2
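The three rules for class “⊑” can be sketched as set operations over the axioms of a single ontology. This is an illustration under assumptions: the input formats and the name `subsumption_examples` are hypothetical, the subsumption pairs are assumed to already include the indirect ones, and equivalences are applied one step only (no transitive closure over equivalence).

```python
from collections import defaultdict


def subsumption_examples(subsumptions, equivalences=(), union_axioms=()):
    """Collect positive training pairs (sub, super) for class "⊑".

    subsumptions: iterable of (sub, super) pairs, direct or indirect.
    equivalences: iterable of (a, b) pairs, read as "a ≡ b".
    union_axioms: iterable of (members, super), read as
                  "members[0] ⊔ members[1] ⊔ ... ⊑ super".
    """
    pairs = set(subsumptions)

    # Union constructor: every member of the union is subsumed by super.
    for members, sup in union_axioms:
        pairs.update((member, sup) for member in members)

    # Equivalence: either side of a pair may be swapped for an equivalent.
    equals = defaultdict(set)
    for a, b in equivalences:
        equals[a].add(b)
        equals[b].add(a)

    expanded = set()
    for sub, sup in pairs:
        for s in {sub} | equals.get(sub, set()):
            for t in {sup} | equals.get(sup, set()):
                if s != t:
                    expanded.add((s, t))
    return expanded
```

Given the axiom C4 ⊔ C5 ⊑ C2 and a hypothetical equivalence C2 ≡ C6, the sketch produces (C4, C2), (C5, C2), (C4, C6), and (C5, C6).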

Page 20: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (9/12)

Generic class “R”
  A pair of concepts belongs to “R” if there is no axiom that specifies the subsumption relation between them
Categories of class “R”
  Concepts belonging to different hierarchies
  Siblings at the same hierarchy level
  Siblings at different hierarchy levels
  Concepts related through a non-subsumption relation
  Inverse pairs of class “⊑”
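Two of the listed categories are easy to sketch: inverse pairs of class “⊑” and siblings at the same hierarchy level. The code below covers only these two and reads “siblings at the same level” as concepts sharing a direct parent, which is an assumption. The remaining categories are not covered, and the name `class_r_examples` is hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def class_r_examples(direct_edges, subsumption_pairs):
    """Generate some class "R" training pairs from a single ontology.

    direct_edges: iterable of direct (sub, super) pairs.
    subsumption_pairs: all (sub, super) pairs of class "⊑", indirect included.
    """
    positives = set(subsumption_pairs)

    # Inverse pairs: if (sub, super) is in class "⊑", (super, sub) goes to "R".
    candidates = {(sup, sub) for sub, sup in positives}

    # Siblings: concepts that share a direct parent, in both orders.
    children = defaultdict(set)
    for sub, sup in direct_edges:
        children[sup].add(sub)
    for siblings in children.values():
        candidates.update(permutations(sorted(siblings), 2))

    # A pair known to be in class "⊑" must never be labelled "R".
    return candidates - positives
```

The final set difference guards against contradictory examples, for instance when two equivalent concepts subsume each other.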

Page 21: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (10/12)

Balancing the Training Dataset
  The number of training examples for the class “⊑” is much smaller than the number for class “R”
  Dataset imbalance problem
Two balancing strategies:
  Random under-sampling variation
  Random over-sampling
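The two balancing strategies can be illustrated in their textbook form. This is a sketch of plain random over-sampling and plain random under-sampling; the slides mention a variation of under-sampling whose details are not given here, so it is not reproduced. The function names and the fixed seed are assumptions.

```python
import random


def oversample(minority, majority, seed=0):
    """Duplicate random minority examples until both classes have equal size."""
    if not minority or len(minority) > len(majority):
        raise ValueError("expected a non-empty minority no larger than the majority")
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def undersample(minority, majority, seed=0):
    """Keep a random subset of the majority class as large as the minority."""
    if len(minority) > len(majority):
        raise ValueError("expected the minority to be no larger than the majority")
    rng = random.Random(seed)
    return minority, rng.sample(majority, len(minority))
```

Here the class “⊑” examples would be passed as `minority` and the class “R” examples as `majority`. Over-sampling keeps all information but repeats examples, while under-sampling discards some of the majority class.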

Page 22: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (11/12)

[Figure: the CSR pipeline diagram, with the stages Hierarchies Enhancement, Generation of Testing Pairs (Search Space Pruning), Generation of Features, Generation of Training Examples, and Train Classifier, plus the labels SEMA and R.]

Page 23: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

The CSR Method (12/12)

[Figure: the CSR pipeline diagram (Hierarchies Enhancement, Generation of Testing Pairs (Search Space Pruning), Generation of Features, Generation of Training Examples, Train Classifier, with the labels SEMA and R), together with two example ontologies: the 1st Ontology with concepts C^1_1, C^1_2, C^1_3 and the 2nd Ontology with concepts C^2_1, C^2_2, ..., C^2_11.]

Page 24: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Experimental Settings

The testing dataset has been derived from the benchmarking series of the OAEI 2006 contest
The compiled corpus + gold standard is available at http://www.icsd.aegean.gr/incosys/csr
Classifiers used: C4.5, Knn (2 neighbors), NaiveBayes (Nb) and Svm (radial basis kernel)
We denote each type of experiment with A+B+C
  A: classifier
  B: type of features (“Props” for properties or “Terms” for words)
  C: dataset balancing method (“over” and “under” for over- and under-sampling)
Baseline: Consults the vectors of the training examples of the class “⊑”, and selects the first exact match (no generalization)
Description Logics’ Reasoner: We specify axioms concerning only properties’ equivalencies (Reasoner+Props), or alternatively, both properties’ and concepts’ equivalencies (Reasoner+Props+Con)
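The baseline described above can be sketched as an exact-match lookup. This is an illustration only: the function name `baseline_classify` and the string labels are hypothetical, and feature vectors are assumed to be sequences of integers as in the encodings discussed earlier in the deck.

```python
def baseline_classify(test_vector, subsumption_training_vectors):
    """Return "subsumption" if the test vector exactly matches a class "⊑"
    training vector, otherwise "R".

    The first exact match decides the label; nothing is generalized.
    """
    target = tuple(test_vector)
    for training_vector in subsumption_training_vectors:
        if tuple(training_vector) == target:
            return "subsumption"
    return "R"
```

Because only identical vectors match, any pair whose feature pattern was not seen during training falls into “R”. That is the behaviour a learned classifier is expected to improve on.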

Page 25: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Overall Results

All classifiers (except Svm) based on properties outperform Baseline+Props

Generalization – location of pairs not in the training dataset

[Figure: bar chart of F-measure (0 to 1) per type of experiment: the sixteen combinations of classifier (C4.5, Knn, Nb, Svm), features (Props, Terms), and balancing (Over, Under), followed by Baseline+Terms, Baseline+Props, Reasoner+Props, and Reasoner+Props+Con.]

Page 26: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Overall Results

All classifiers based on words outperform Baseline+Terms

Generalization – location of pairs not in the training dataset

[Figure: bar chart of F-measure (0 to 1) per type of experiment.]

Page 27: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Overall Results

C4.5 (using either words or properties) performs best compared to all other CSR experimental settings
  Disjunctive descriptions of cases: more than one feature may indicate whether a specific concept pair belongs to the class “⊑”
  Decision trees are very tolerant to errors in the training set, both in feature vectors and in training examples

[Figure: bar chart of F-measure (0 to 1) per type of experiment.]

Page 28: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Overall Results

CSR exploiting words requires neither property nor concept equivalencies
Reasoner exploits such equivalencies
  Depends on the mapping tool

[Figure: bar chart of F-measure (0 to 1) per type of experiment.]

Page 29: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Closer Look

A7 Category
  Different conceptualizations
  Flattened classes in target ontology + props defined in a more detailed manner
  SEMA: 74% Precision – 100% Recall (Props+Cons)
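To relate precision and recall figures such as the SEMA values above to the F-measure used in the overall results, the balanced F-measure (harmonic mean of precision and recall) can be computed as sketched below. Whether the charts use exactly this balanced variant is an assumption, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For a precision of 0.74 and a recall of 1.0 this gives 1.48 / 1.74, roughly 0.85.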

[Figure: two bar charts, Precision (0 to 1) and Recall (0 to 1), per category A1, A2, A3, A4, A5, A6, A7, F1, F2, E1, E2, R1, R2, R3, R4, for the series TERMS+OVER, BASELINE+PROPS, REASONER (PROPS), and REASONER (PROPS+CONCEPTS).]

Page 30: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Closer Look

R1-R2 Category
  Different conceptualizations
  CSR locates subsumptions that the reasoner cannot infer (R2), without using equivalencies

[Figure: two bar charts, Precision (0 to 1) and Recall (0 to 1), per category A1 to R4, for the series TERMS+OVER, BASELINE+PROPS, REASONER (PROPS), and REASONER (PROPS+CONCEPTS).]

Page 31: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

“Confused” Equivalencies

CSR does not easily “confuse” equivalence relations with subsumption ones
  Without also using them as input
  Can be used for filtering

[Figure: bar chart (vertical axis 0 to 7) per type of experiment: the sixteen combinations of classifier (C4.5, Knn, Nb, Svm), features (Props, Terms), and balancing (Over, Under).]

Page 32: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Conclusions

CSR method:
  Learns patterns of concepts’ features (properties or terms) that provide evidence for the subsumption relation among these concepts, using machine learning techniques
  Generates training datasets from the source ontologies’ specifications
  Tackles the problem of imbalanced training datasets
  Generalizes effectively over the training examples
  Does not exploit equivalence mappings (when words are used as features)
  Does not easily “confuse” equivalence mappings with subsumption ones
  Is independent of external resources

Page 33: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Questions? Comments?

Thank you!

Page 34: CSR : Discovering Subsumption Relations for the Alignment of Ontologies

Related Work

1. Giunchiglia, F., Yatskevich, M., Shvaiko, P.: Semantic Matching: Algorithms and implementation. Journal on Data Semantics, IX (2007)
2. Aleksovski, Z., Klein, M., Kate, W, Harmelen F.: Matching Unstructured Vocabularies Using a Background Ontology. In: EKAW, Podebrady, Czech Republic (2006)
3. Gracia, J., Lopez, V., D'Aquin, M., Sabou, M, Motta, E., Mena, E.: Solving Semantic Ambiguity to Improve Semantic Web based Ontology Matching. In: Ontology Matching Workshop, Busan, Korea (2007)
4. Bouquet, P., Serafini, L., Zanobini, S., Sceffer, S.: Bootstrapping semantics on the web: meaning elicitation from schemas. In: WWW, Edinburgh, Scotland (2006)
5. Cimiano P., Staab, S.: Learning by googling. In: SIGKDD Explor. Newsl., USA (2004)
6. Hage, W.R. Van, Katrenko, S., Schreiber, A.Th.: A Method to Combine Linguistic Ontology Mapping Techniques. In: ISWC, Osaka, Japan (2005)
7. Risto G., Zharko A., Warner K.: Using Google Distance to weight approximate ontology matches. In: WWW, Banff, Alberta, Canada (2007)
8. Jerome D., Fabrice G., Regis G., Henri B.: An interactive, asymmetric and extensional method for matching conceptual hierarchies. In: EMOI – INTEROP Workshop, Luxembourg (2006)