A Classification of Schema-based Matching Approaches
Pavel Shvaiko
Meaning Coordination and Negotiation Workshop, ISWC
8th November 2004, Hiroshima, Japan
Outline
Introduction
Classification of schema-based matching approaches
Matching systems
Conclusions
Future work
Introduction
Semantic Web and the Match operator
Information sources (e.g., database schemas, taxonomies or ontologies) can be viewed as graph-like structures containing terms and their inter-relationships
Match is one of the key operators for enabling the Semantic Web since it takes two graph-like structures and produces a mapping between the nodes of the graphs that “correspond” semantically to each other
Example: Two XML schemas
[Figure: two XML schemas, labelled HT and FT]
Schema matching vs Ontology alignment
Differences:
Database schemas often do not provide explicit semantics for their data
Ontologies are logical systems that themselves incorporate semantics (intuitive or formal)
E.g., ontology definitions as a set of logical axioms
Ontology data models are richer (the number of primitives is higher, and they are more complex) than schema data models
E.g., OWL allows defining new classes as unions or intersections of other classes
Commonalities:
Ontologies can be viewed as schemas for knowledge bases
Techniques developed for both problems are of mutual benefit
Matching
[Figure: Match takes schemas S1 and S2, a mapping {M}, parameters (e.g., weights, thresholds) and auxiliary information (e.g., lexicons, thesauri), and returns a mapping {M'}]
A mapping element M is a 5-tuple < ID, e1, e2, n, R >
n = {x ∈ [0,1]}
R = { =, ⊑, ⊒, ⊥, ⊓ }
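The 5-tuple above can be pictured as a small record type. The following is only a sketch under assumptions: the class names `Relation` and `MappingElement`, the relation member names, and the helper `filter_by_threshold` are hypothetical and do not come from the talk; the threshold filter merely illustrates how a parameter such as a threshold might act on n.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relations R may range over (member names are illustrative assumptions)."""
    EQUIVALENCE = "="
    LESS_GENERAL = "⊑"
    MORE_GENERAL = "⊒"
    DISJOINT = "⊥"
    OVERLAP = "⊓"


@dataclass(frozen=True)
class MappingElement:
    """One mapping element, mirroring the 5-tuple < ID, e1, e2, n, R >."""
    id: str       # unique identifier of the mapping element
    e1: str       # element of the first schema
    e2: str       # element of the second schema
    n: float      # confidence in [0, 1]
    r: Relation   # relation claimed to hold between e1 and e2

    def __post_init__(self) -> None:
        # Enforce the constraint n in [0, 1] stated on the slide.
        if not 0.0 <= self.n <= 1.0:
            raise ValueError(f"n must lie in [0, 1], got {self.n}")


def filter_by_threshold(mapping, threshold):
    """Keep only the mapping elements whose confidence reaches the threshold."""
    return [m for m in mapping if m.n >= threshold]
```

A call such as `MappingElement("m1", "Digital_Cameras", "Photo_and_Cameras", 0.8, Relation.LESS_GENERAL)` would then represent a single candidate correspondence.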
Classification of Schema-based Matching Approaches
Schema matching approaches
  Individual matchers
    Schema-based
      Element-level
        Linguistic
          • Names
          • Descriptions
        Constraint-based
          • Types
          • Keys
      Structure-level
        Constraint-based
          • Graph matching
    Instance-based
      Element-level
        Linguistic
          • IR (word frequencies, key terms)
        Constraint-based
          • Value pattern and ranges
  Combined matchers
    Hybrid
    Composite
      manual composition
      automatic composition
Taxonomy from [E. Rahm, P. Bernstein, 2001]
Semantic view on matching
What is missing in the taxonomy of schema matching approaches we have just seen?
Two new criteria:
Heuristic vs formal:
  Heuristic techniques try to guess relations which may hold between similar labels or graph structures
  Formal techniques have model-theoretic semantics which is used to justify their results
Implicit vs explicit:
  Implicit techniques are syntax-driven techniques
    E.g., techniques which consider labels as strings, analyze data types, or compare the soundex of schema/ontology elements
  Explicit techniques exploit the semantics of labels
    E.g., thesauri, ontologies
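An implicit, syntax-driven technique of the kind named on this slide treats labels purely as strings. The sketch below is an illustration under assumptions, not a method from the talk: it uses the standard-library `difflib.SequenceMatcher` as a stand-in for whatever string metric a real matcher would use, and the function names and the default threshold of 0.6 are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a label and replace underscores with spaces."""
    return label.replace("_", " ").lower().strip()


def string_similarity(label1, label2):
    """Similarity in [0, 1] computed from the label strings alone."""
    return SequenceMatcher(None, normalize(label1), normalize(label2)).ratio()


def match_labels(labels1, labels2, threshold=0.6):
    """Return (label1, label2, score) candidates at or above the threshold."""
    candidates = []
    for a in labels1:
        for b in labels2:
            score = string_similarity(a, b)
            if score >= threshold:
                candidates.append((a, b, score))
    # Best-scoring candidates first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)
```

Because no meaning is consulted, such a matcher can only guess: two labels that look alike may still denote different things, which is what places it on the heuristic, implicit side of the two criteria.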
Schema Matching Approaches
  Individual matchers
    Schema-based
      Element-level
        Linguistic
          • Names
          • Descriptions
        Constraint-based
          • Types
          • Keys
      Structure-level
        Constraint-based
          • Graph matching
Heuristic vs Formal
Implicit vs Explicit
Schema-based Matching Approaches
  Heuristic Techniques
    Element-level
      Implicit
        String-based
          - Names
          - Descriptions
        Constraint-based
          - Type similarity
          - Key properties
      Explicit
        Auxiliary Information
          - Precompiled dictionary
          - Lexicons
    Structure-level
      Implicit
        Constraint-based
          - Graph matching
          - Children
          - Leaves
      Explicit
        Constraint-based
          - Taxonomic structure
  Formal Techniques
    Element-level
      Explicit
        Ontology-based
          - OWL properties
    Structure-level
      Explicit
        Reasoner-based
          - Propositional SAT
          - Modal SAT
Heuristic Techniques
Element-level explicit techniques
  Precompiled dictionary (Cupid, COMA)
    E.g., syn key - "NKN:Nikon = syn"
  Lexicons (S-Match, CTXmatch)
    E.g., WordNet: Camera is a hypernym for Digital Camera, therefore Digital_Cameras ⊑ Photo_and_Cameras
Structure-level explicit techniques
  Taxonomic structure (Anchor-Prompt, NOM)
    E.g., given that Digital_Cameras ⊑ Photo_and_Cameras, FJFLM and FujiFilm can be found as an appropriate match
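The lexicon-based step (Camera is a hypernym for Digital Camera, so the first label is less general than the second) can be sketched with a toy lexicon. Everything here is an assumption made for illustration: the `HYPERNYM` table is a hand-written stand-in for WordNet, and `ancestors` and `lexical_relation` are hypothetical helper names, not the API of S-Match or CTXmatch.

```python
# Toy lexicon mapping a term to its direct hypernym.
# Hand-written stand-in for WordNet; the entries are illustrative only.
HYPERNYM = {
    "digital camera": "camera",
    "camera": "photographic equipment",
}


def ancestors(term, lexicon):
    """Collect all hypernyms reachable from term (guarding against cycles)."""
    seen = []
    while term in lexicon and lexicon[term] not in seen:
        term = lexicon[term]
        seen.append(term)
    return seen


def lexical_relation(term1, term2, lexicon=HYPERNYM):
    """Return '=', '⊑' or '⊒' if the lexicon supports it, otherwise None."""
    if term1 == term2:
        return "="
    if term2 in ancestors(term1, lexicon):
        return "⊑"  # term1 is less general than term2
    if term1 in ancestors(term2, lexicon):
        return "⊒"  # term1 is more general than term2
    return None
```

Unlike a string comparison, the relation returned here is justified by an external knowledge source, which is what makes the technique explicit.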
Formal Techniques
Element-level explicit techniques
  OWL properties (NOM)
    E.g., the sameClassAs constructor explicitly states that one class is equivalent to the other
    Digital_Cameras = Camera DigitalPhoto_Producer
Structure-level explicit techniques
  Propositional satisfiability (SAT) (S-Match, CTXmatch)
    The approach is to translate the matching problem, namely the two graphs (trees) and the mapping queries, into a propositional formula and then to check that formula for validity
  Modal SAT (S-Match)
    The idea is to enhance propositional logic with modal logic (or ALC DL) operators. The matching problem is therefore translated into a modal logic formula, which is then checked for validity using sound and complete satisfiability search procedures.
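The propositional approach can be illustrated with a brute-force validity check. This is a sketch under stated assumptions rather than the procedure used by S-Match or CTXmatch: a real system would hand the negated formula to a SAT solver, whereas the code below simply enumerates all truth assignments. The encoding of the two labels (Digital_Cameras as digital ∧ camera, Photo_and_Cameras as photo ∨ camera) and all function names are hypothetical.

```python
from itertools import product

# Propositional variables standing for atomic concepts (illustrative).
VARIABLES = ["digital", "camera", "photo"]


def is_valid(formula, variables):
    """A formula is valid if it is true under every truth assignment."""
    for values in product([False, True], repeat=len(variables)):
        if not formula(dict(zip(variables, values))):
            return False
    return True


def digital_cameras(v):
    """Assumed encoding of the label Digital_Cameras."""
    return v["digital"] and v["camera"]


def photo_and_cameras(v):
    """Assumed encoding of the label Photo_and_Cameras."""
    return v["photo"] or v["camera"]


def less_general(c1, c2):
    """The query c1 ⊑ c2 becomes the implication c1 -> c2."""
    return lambda v: (not c1(v)) or c2(v)


def equivalent(c1, c2):
    """The query c1 = c2 becomes the biconditional c1 <-> c2."""
    return lambda v: c1(v) == c2(v)
```

Under this encoding, `is_valid(less_general(digital_cameras, photo_and_cameras), VARIABLES)` holds, while the equivalence query does not, so the matcher would report the less-general relation for the pair.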
Matching Systems
Characteristics of state of the art matchers
Conclusions
Uses of Classification
The proposed classification provides a common conceptual basis, and hence can be used for comparing (analytically) different existing schema/ontology matching systems
It can help in designing a new matching system, or an elementary matcher, taking advantage of state-of-the-art solutions
Future Work
Provide a more detailed view on the general properties of matching algorithms
Add to the classification language-based techniques, e.g., tokenization, lemmatization, elimination
Extend classification by taking into account DL-based matchmaking solutions
Extend classification by adding newly appearing matching techniques and the systems implementing them, e.g., OLA, QOM
Compare matching systems also experimentally, with the help of benchmarks
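The language-based techniques named in the list above (tokenization, lemmatization, elimination) can be sketched as a small preprocessing pipeline for labels. This is an assumption-laden illustration: the stop-word set is hand-picked, the plural stripping is a crude stand-in for a real lemmatizer, and the function names are hypothetical.

```python
import re

# Hand-picked words to eliminate (illustrative, not a standard list).
STOP_WORDS = {"and", "or", "of", "the", "for"}


def tokenize(label):
    """Split a label on underscores, whitespace and lower-to-upper case changes."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return [t.lower() for t in re.split(r"[_\s]+", spaced) if t]


def lemmatize(token):
    """Crude plural stripping; a stand-in for a real lemmatizer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(label):
    """Tokenize, eliminate stop words, then lemmatize what remains."""
    return [lemmatize(t) for t in tokenize(label) if t not in STOP_WORDS]
```

For example, `preprocess("Photo_and_Cameras")` yields the tokens `photo` and `camera`, which a later matching step could compare or look up in a lexicon.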
References
Knowledge Web project: http://knowledgeweb.semanticweb.org/
Project website at DIT - ACCORD: http://www.dit.unitn.it/~accord/
P. Shvaiko: A classification of schema-based matching approaches. Technical Report, DIT-04-93, University of Trento, 2004.
E. Rahm, P. Bernstein: A survey of approaches to automatic schema matching. In Very Large Databases Journal, 10(4):334-350, 2001.
F. Giunchiglia, P. Shvaiko: Semantic matching. In The Knowledge Engineering Review Journal, 18(3):265-280, 2003.
P. Bouquet, L. Serafini, S. Zanobini: Semantic coordination: a new approach and an application. In Proceedings of ISWC, 130-145, 2003.
Thank you!