CSR: Discovering Subsumption Relations
for the Alignment of Ontologies
Vassilis Spiliopoulos, Alexandros G. Valarakos, and George A. Vouros
AI Lab, Department of Information and Communication Systems Engineering, University of the Aegean, 83200 Karlovassi, Samos, Greece
{vspiliop, georgev}@aegean.gr
University of the Aegean, AI-LAB, ESWC 2008
Outline
- Introduction
- Problem Definition
- Why Subsumption Relations
- Related Work
- The Method
- Experimental Results
- Conclusions
Ontology
- Concept features: properties (datatype properties and object properties, i.e. relations), instances, and comments
- Concepts are organized into hierarchies via the subsumption relation
- Ontology languages: the OWL family (union, intersection, disjointness constructors)
[Figure: example hierarchy with Proceedings ⊑ Publication, Book ⊑ Publication, Edited Selection ⊑ Book, Monograph ⊑ Book; properties such as "reference of", "# of pages", "title", "date"; an instance "The Semantic Web 08 Proc." with 973 pages; a comment on Edited Selection: "A book that is a collection of texts or articles"]
Current Situation
[Figure: in the bibliographic domain, Engineers 1 through N independently develop ontologies 1 through N]
Ontology Mapping
[Figure: the Conference Ontology (Publication with subconcepts Proceedings and Book; properties "reference of", "# of pages", "title", "date") mapped to the Agents' Ontology (Work with subconcepts Citation, Proceedings, Book); for the query "Find Citations in Proceedings", the agent locates a "⊑" mapping and retrieves a superset of what it is looking for]
Ontology mapping is a process that takes two ontologies as input and locates relations (i.e. mappings) between their elements: equivalence (≡), subsumption (⊑ or ⊒), or disjointness (⊥).
Why Subsumption Relations (1/2)
[Figure: mappings between the two example ontologies, Publication/Citation: equivalences (≡) between concepts such as Proceedings, Book and Edited Selection; subsumptions (⊑) involving Monograph and the top concepts; and one disjointness (⊥)]
Why Subsumption Relations (2/2)
- Discovers subsumption relations separately from the subsumptions and equivalences that can be deduced by a reasoning mechanism
- May augment the effectiveness of current ontology mapping and merging methods
- Useful when there are no or few equivalences: Web-service matchmaking, ontology engineering environments
Problem Definition
The subsumption computation problem is defined as follows:
- Given two input ontologies (and, optionally, a specification of property equivalences),
- classify each pair (C1, C2) of concepts into one of two distinct classes: the "subsumption" class ("⊑", i.e. C1 ⊑ C2), or the class "R"
- Class "R" denotes pairs of concepts that are not known to be related via the subsumption relation
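The classification task can be framed minimally as follows (a Python sketch with made-up concept names; pairing every C1 with every C2 gives the full candidate space before any pruning):

```python
from itertools import product

def candidate_pairs(concepts1, concepts2):
    """Every cross-ontology pair (C1, C2) is one classification
    instance: it must end up either in class '⊑' or in class 'R'."""
    return list(product(concepts1, concepts2))

pairs = candidate_pairs(["Publication", "Book"], ["Citation", "Work"])
# 2 x 2 = 4 candidate pairs to classify
```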
Related Work
- Satisfiability-based approaches [1]: transformation of the ontology mapping problem into a satisfiability problem
- Exploitation of domain knowledge [2], [3], [4]: domain ontologies serve as intermediate ontologies bridging the semantic gap [2], [3]; WordNet is used for the same purpose (WordNet Description Logics) [4]
- Google-based approaches [5], [6], [7]: exploit the hits returned by Google to test whether a subsumption relation holds [5], [6], or to loosen the formal constraints [7]
- Machine learning approaches [8]: a method based on Implication Intensity theory (unsupervised learning) is proposed
The CSR Method At a Glance (1/2)
- Purpose: learn patterns of features that indicate a subsumption relation between two concepts belonging to two different ontologies
- How: by exploiting supervised machine learning techniques (binary classification) and the semantics of the ontology specifications
The CSR Method At a Glance (2/2)
Why machine learning?
- There are no evident generic rules directly capturing the existence of a subsumption relation (e.g. label/vicinity similarity or dissimilarity)
- Learns patterns of features not evident to the naked eye
- Self-adapts to the idiosyncrasies of specific domains
- Does not depend on external resources
The CSR Method (1/12)
Input:
- Two OWL-DL ontologies (the process is not language-specific)
- Optionally, property equivalences computed by the SEMA mapping tool
- The method requires the existence of subsumption relations between concepts
[Pipeline: Hierarchies Enhancement; Generation of Testing Pairs (Search-Space Pruning); Generation of Features; Generation of Training Examples (with SEMA); Train Classifier; output classes "⊑" / "R"]
The CSR Method (2/12)
Hierarchies Enhancement: inferring all indirect subsumption relations. This step influences the generation of training examples and feature vectors.
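Hierarchy enhancement amounts to a transitive closure over the direct subsumption edges. A minimal sketch, assuming a simple child-to-parents dictionary (illustrative, not the paper's implementation):

```python
def enhance_hierarchy(direct_subs):
    """Compute the transitive closure of direct subsumption edges
    (concept -> set of parents), making indirect subsumptions explicit."""
    closure = {c: set(ps) for c, ps in direct_subs.items()}
    changed = True
    while changed:
        changed = False
        for parents in closure.values():
            for p in list(parents):
                for grandparent in closure.get(p, ()):
                    if grandparent not in parents:
                        parents.add(grandparent)
                        changed = True
    return closure

h = enhance_hierarchy({"Monograph": {"Book"},
                       "Book": {"Publication"},
                       "Publication": set()})
# h["Monograph"] now also contains the indirect parent "Publication"
```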
The CSR Method (3/12)
Generation of Features: CSR exploits two types of features, concepts' properties, or words appearing in the "vicinity" of concepts.
The CSR Method (4/12)
Each pair of concepts (C1, C2), with C1 from ontology O1 and C2 from O2, is encoded as a feature vector (f1, f2, ..., fN) over the union {p1, p2, ..., pN} of the properties of both ontologies, where:
fi = 0, if pi appears neither in C1 nor in C2
fi = 1, if pi appears only in C1
fi = 2, if pi appears only in C2
fi = 3, if pi appears in both C1 and C2
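The property-based pair encoding described on this slide can be sketched as follows (illustrative Python; the property names are made up):

```python
def property_features(all_props, props_c1, props_c2):
    """Encode a concept pair over the union of both ontologies' properties:
    0 = property in neither concept, 1 = only in C1,
    2 = only in C2, 3 = in both."""
    vec = []
    for p in all_props:
        in1, in2 = p in props_c1, p in props_c2
        vec.append(3 if in1 and in2 else 1 if in1 else 2 if in2 else 0)
    return vec

v = property_features(["title", "date", "pages"],
                      {"title", "pages"},   # properties of C1
                      {"title", "date"})    # properties of C2
# v == [3, 2, 1]: "title" in both, "date" only in C2, "pages" only in C1
```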
The CSR Method (5/12)
For each concept, a term-frequency vector (fr1, fr2, ..., frT) is built, where fri is the frequency of the i-th word and T is the number of distinct words in both ontologies. The words of a concept are gathered from its "vicinity": its label, comments, properties, instances, and related concepts.
[Figure: concepts C1 of O1, C2 of O2, and CM, each with its frequency vector over the shared vocabulary w1, w2, ..., wT, e.g. (2, 0, 1, ..., 0)]
Univ
ers
ity o
f th
e A
eg
ean
AI –
LAB
ES
WC
2008
The CSR Method (6/12)
Given the term-frequency vectors (fr1_1, fr1_2, ..., fr1_T) of the left-side concept C1 and (fr2_1, fr2_2, ..., fr2_T) of the right-side concept C2, the pair (C1, C2) is encoded as (f1, f2, ..., fT), where:
fi = 0, if fr1_i = 0 and fr2_i = 0
fi = 1, if fr1_i > 0 and fr2_i = 0
fi = 2, if fr1_i = 0 and fr2_i > 0
fi = 3, if fr1_i > 0 and fr2_i > 0
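The term-based pair encoding can be sketched analogously (illustrative Python; the frequency values are made up):

```python
def term_features(fr1, fr2):
    """Combine the two term-frequency vectors into one pair vector:
    0 = word in neither concept's vicinity, 1 = only in C1's,
    2 = only in C2's, 3 = in both."""
    return [3 if a > 0 and b > 0 else
            1 if a > 0 else
            2 if b > 0 else
            0
            for a, b in zip(fr1, fr2)]

v = term_features([2, 0, 1, 0],   # frequencies for C1
                  [0, 0, 1, 2])   # frequencies for C2
# v == [1, 0, 3, 2]
```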
The CSR Method (7/12)
Generation of Training Examples: classes "⊑" and "R". Training examples are generated by exploiting each input ontology in isolation, according to the semantics of the specifications.
The CSR Method (8/12): Class "⊑"
- Subsumption relation: include all concept pairs from both input ontologies that are in the subsumption relation (direct or indirect)
- Equivalence relation: any concept in a training pair can be substituted by any of its equivalents
- Union constructor: e.g. C4 ⊔ C5 ⊑ C2 implies C4 ⊑ C2 and C5 ⊑ C2
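A sketch of how class-"⊑" examples could be collected from one ontology (illustrative Python; `closure` is assumed to map each concept to all its direct and indirect subsumers, and `unions` lists axioms of the form C1 ⊔ ... ⊔ Cn ⊑ D):

```python
def positive_examples(closure, unions=()):
    """Class '⊑' pairs: every (direct or indirect) subsumption from
    the enhanced hierarchy, plus one pair per disjunct of each union
    axiom C1 ⊔ ... ⊔ Cn ⊑ D (each Ci ⊑ D)."""
    pairs = {(c, p) for c, parents in closure.items() for p in parents}
    for disjuncts, parent in unions:
        for d in disjuncts:
            pairs.add((d, parent))
    return pairs

ex = positive_examples({"Monograph": {"Book", "Publication"}},
                       unions=[(("C4", "C5"), "C2")])
# yields Monograph ⊑ Book, Monograph ⊑ Publication, C4 ⊑ C2, C5 ⊑ C2
```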
The CSR Method (9/12): Generic class "R"
A pair belongs to "R" if there is no axiom that specifies a subsumption relation between its concepts.
Categories of class "R":
- Concepts belonging to different hierarchies
- Siblings at the same hierarchy level
- Siblings at different hierarchy levels
- Concepts related through a non-subsumption relation
- Inverse pairs of class "⊑"
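A sketch of collecting class-"R" examples within one ontology (illustrative Python: it simply takes every ordered pair with no subsumption axiom, which subsumes the listed categories, including the inverse pairs of "⊑"):

```python
def r_examples(concepts, positives):
    """Class 'R': ordered concept pairs for which no subsumption is
    asserted. Note that the inverse (C2, C1) of a positive pair
    (C1, C2) lands in 'R'."""
    return {(a, b) for a in concepts for b in concepts
            if a != b and (a, b) not in positives}

neg = r_examples({"Book", "Monograph", "Proceedings"},
                 {("Monograph", "Book")})
# the inverse pair ("Book", "Monograph") belongs to class "R"
```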
The CSR Method (10/12): Balancing the Training Dataset
- The number of training examples for class "⊑" is much smaller than the number for class "R" (the dataset-imbalance problem)
- Two balancing strategies: a random under-sampling variation, and random over-sampling
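Both strategies can be sketched in a few lines (illustrative Python; the paper uses a variation of random under-sampling, not necessarily exactly this one):

```python
import random

def balance(pos, neg, strategy="under", seed=0):
    """Random under-sampling shrinks the majority ('R') class to the
    minority size; random over-sampling duplicates minority ('⊑')
    examples until the classes match."""
    rng = random.Random(seed)
    if strategy == "under":
        return pos, rng.sample(neg, len(pos))
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return pos + extra, neg

pos_bal, neg_bal = balance(["⊑-pair"] * 2, ["R-pair"] * 6)
# both classes now hold 2 examples
```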
The CSR Method (11/12)
[Pipeline diagram repeated: Hierarchies Enhancement; Generation of Testing Pairs (Search-Space Pruning); Generation of Features; Generation of Training Examples (with SEMA); Train Classifier]
The CSR Method (12/12)
[Pipeline diagram repeated, with an example of testing-pair generation: concepts C1_1 to C1_3 of the 1st ontology are paired with concepts C2_1 to C2_11 of the 2nd ontology]
Experimental Settings
- The testing dataset has been derived from the benchmarking series of the OAEI 2006 contest; the compiled corpus and gold standard are available at http://www.icsd.aegean.gr/incosys/csr
- Classifiers used: C4.5, Knn (2 neighbors), NaiveBayes (Nb), and Svm (radial basis kernel)
- We denote each type of experiment as A+B+C, where A is the classifier, B the type of features ("Props" for properties, "Terms" for words), and C the dataset-balancing method ("over" and "under" for over- and under-sampling)
- Baseline: consults the vectors of the training examples of class "⊑" and selects the first exact match (no generalization)
- Description Logics reasoner: we specify axioms concerning only property equivalences (Reasoner+Props) or, alternatively, both property and concept equivalences (Reasoner+Props+Con)
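The baseline's exact-match behaviour can be sketched as follows (illustrative Python; the feature vectors are made up):

```python
def baseline_classify(train_pos_vectors, test_vector):
    """The baseline labels a pair '⊑' only if its feature vector
    exactly matches a stored positive training vector; otherwise
    it answers 'R'. There is no generalization."""
    known = {tuple(v) for v in train_pos_vectors}
    return "⊑" if tuple(test_vector) in known else "R"

baseline_classify([[3, 2, 1]], [3, 2, 1])  # exact match: "⊑"
baseline_classify([[3, 2, 1]], [3, 2, 0])  # unseen vector: "R"
```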
Overall Results
All classifiers (except Svm) based on properties outperform Baseline+Props: they generalize, locating pairs not present in the training dataset.
[Bar chart: F-measure for each experiment type: {C4.5, Knn, Nb, Svm} × {Props, Terms} × {Over, Under}, Baseline+Terms, Baseline+Props, Reasoner+Props, Reasoner+Props+Con]
Overall Results
All classifiers based on words outperform Baseline+Terms: they generalize, locating pairs not present in the training dataset.
[Bar chart: F-measure for each experiment type: {C4.5, Knn, Nb, Svm} × {Props, Terms} × {Over, Under}, Baseline+Terms, Baseline+Props, Reasoner+Props, Reasoner+Props+Con]
Overall Results
- C4.5 (using either words or properties) performs best compared to all other CSR experimentation settings
- Disjunctive descriptions of cases: more than one feature may indicate whether a specific concept pair belongs in class "⊑"
- Decision trees are very tolerant to errors in the training set, both in feature vectors and in training examples
[Bar chart: F-measure for each experiment type: {C4.5, Knn, Nb, Svm} × {Props, Terms} × {Over, Under}, Baseline+Terms, Baseline+Props, Reasoner+Props, Reasoner+Props+Con]
Overall Results
- CSR exploiting words requires neither property nor concept equivalences
- The reasoner exploits such equivalences and therefore depends on the mapping tool
[Bar chart: F-measure for each experiment type: {C4.5, Knn, Nb, Svm} × {Props, Terms} × {Over, Under}, Baseline+Terms, Baseline+Props, Reasoner+Props, Reasoner+Props+Con]
Closer Look
A7 category (different conceptualizations): classes are flattened in the target ontology, and properties are defined in a more detailed manner. SEMA: 74% precision, 100% recall (Props+Cons).
[Bar charts: precision and recall per category (A1-A7, F1-F2, E1-E2, R1-R4) for Terms+Over, Baseline+Props, Reasoner (Props), and Reasoner (Props+Concepts)]
Closer Look
R1-R2 categories (different conceptualizations): CSR locates subsumptions that the reasoner cannot infer (R2), without using equivalences.
[Bar charts: precision and recall per category (A1-A7, F1-F2, E1-E2, R1-R4) for Terms+Over, Baseline+Props, Reasoner (Props), and Reasoner (Props+Concepts)]
"Confused" Equivalences
CSR is very tolerant of "confusing" equivalence relations as subsumption ones, even without using them as input; this behaviour can be used for filtering.
[Bar chart: number of confused equivalences for each experiment type: {C4.5, Knn, Nb, Svm} × {Props, Terms} × {Over, Under}]
Conclusions: the CSR method
- Learns, using machine learning techniques, patterns of concepts' features (properties or terms) that provide evidence for the subsumption relation among these concepts
- Generates training datasets from the source ontologies' specifications
- Tackles the problem of imbalanced training datasets
- Generalizes effectively over the training examples
- Does not exploit equivalence mappings (when words are used as features)
- Does not easily "confuse" equivalence mappings with subsumption ones
- Is independent of external resources
Questions? Comments?
Thank you!
Related Work
1. Giunchiglia, F., Yatskevich, M., Shvaiko, P.: Semantic Matching: Algorithms and Implementation. Journal on Data Semantics IX (2007)
2. Aleksovski, Z., Klein, M., ten Kate, W., van Harmelen, F.: Matching Unstructured Vocabularies Using a Background Ontology. In: EKAW, Podebrady, Czech Republic (2006)
3. Gracia, J., Lopez, V., d'Aquin, M., Sabou, M., Motta, E., Mena, E.: Solving Semantic Ambiguity to Improve Semantic Web based Ontology Matching. In: Ontology Matching Workshop, Busan, Korea (2007)
4. Bouquet, P., Serafini, L., Zanobini, S., Sceffer, S.: Bootstrapping Semantics on the Web: Meaning Elicitation from Schemas. In: WWW, Edinburgh, Scotland (2006)
5. Cimiano, P., Staab, S.: Learning by Googling. SIGKDD Explorations Newsletter, USA (2004)
6. van Hage, W.R., Katrenko, S., Schreiber, A.Th.: A Method to Combine Linguistic Ontology-Mapping Techniques. In: ISWC, Osaka, Japan (2005)
7. Gligorov, R., Aleksovski, Z., ten Kate, W.: Using Google Distance to Weight Approximate Ontology Matches. In: WWW, Banff, Alberta, Canada (2007)
8. David, J., Guillet, F., Gras, R., Briand, H.: An Interactive, Asymmetric and Extensional Method for Matching Conceptual Hierarchies. In: EMOI-INTEROP Workshop, Luxembourg (2006)