Reducing the Cost of Validating Mapping Compositions by
Exploiting Semantic Relationships
Eduard C. Dragut
Ramon Lawrence
University of Illinois at Chicago
University of British Columbia Okanagan
ODBASE 2006, Montpellier, France
E. Dragut and R. Lawrence -Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships Page 2
Talk Overview
Introduction
Background
  Model and mapping representation systems
Proposed Mapping Representation System
  Invert and Compose operator definitions and properties
Mappings Composition Experiment
  Estimate the quality of the proposed system
Introduction - Terminology
Models
  denote a representation of a domain in a formal language (e.g., EER, Relational, Description Logic)
  have two components [Russell et al. 2003]:
    terminological (or metadata). This is the focus of this work and talk.
    extensional (i.e., facts or instances)
Mappings
  describe how two models are related to each other
Introduction - Mappings
Ways to define mappings between models:
  binary relationships, called morphisms [Melnik et al. 2003] or inter-schema correspondences [Popa et al. 2002]
  mapping using a helper model [Bernstein et al. 2003]
  mapping as queries [Madhavan et al. 2003, Bernstein et al. 2006]
Our work falls in the class of the first two types of mappings. We call them metadata-level mappings.
They are not concerned with the instances of a model.
Introduction - Examples
Examples of models
  diagrams, interface definitions, database schemas, web site layouts, control flow, XML schemas
Applications of mappings
  mapping between XML schemas to drive message translation
  schema and database integration
  mapping between ontologies to help in the process of merging and alignment
Background - Mapping creation
The creation of mappings will rarely be completely automated.
The general strategy is to build mappings semi-automatically:
  use heuristics to generate matchings (e.g., name similarity) [Rahm and Bernstein 2001, Shvaiko and Euzenat 2005] (surveys)
  translate matches into formulas, e.g., the Clio project [Popa et al. 2002]
  generate new mappings from existing mappings
    Composition, e.g., [Madhavan et al. 2003, Bernstein et al. 2006]
    Invert, e.g., [Fagin 2006]
Semi-automatic tools can significantly speed up the process.
Background - Morphisms
Mapping:
  is just a set of binary relations between the elements of two models
  is a set of pairs <l, r>
Advantages/Disadvantages
  their expressiveness is enough for certain classes of problems, and they exhibit certain mathematical properties [Melnik et al. 2003]
  main drawback: assumes similarity to be transitive
CREATE TABLE Actor1(
  ActID int PRIMARY KEY,
  Bio varchar,
  ActorName varchar
)
<schema xmlns="...">
  <complexType name="Actor2">
    <element name="ActorID" type="xs:int"/>
    <element name="Bio" type="xs:string"/>
    <element name="FirstName" type="xs:string"/>
    <element name="LastName" type="xs:string"/>
  </complexType>
</schema>
Background - Morphisms problems
[Figure: the three models Actor2 (ActorID, Bio, FirstName, LastName), Actor (Bio, ActID, ActorName), and Actor1 (Bio, ID, MiddleName, FirstName, LastName, LastMovie, RecentMovie).]
Composition
  <ID, ActID> ○ <ActID, ActorID> = <ID, ActorID>
  follows from the transitivity assumption
Problems with this technique
  Whenever an m:1 correspondence is composed with a 1:n correspondence, the composition result is a cross-product, and many of its pairs are false positives.
  It may miss true relationships or suggest false ones.
[Figure: the composed morphism between Actor1 and Actor2. Legend: blue = correct; red = false positive or missed.]
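The cross-product problem described above can be reproduced with a small sketch. Morphism composition is treated here as a join on the shared middle element. The ID/ActID/ActorID pairs come from the slide; the name correspondences are assumptions chosen for illustration, since the lines of the original figure are not recoverable.

```python
def compose_morphisms(left, right):
    """Compose two morphisms (sets of pairs) by joining on the middle element."""
    return {(a, c) for (a, b) in left for (b2, c) in right if b == b2}


# Actor1 -> Actor: two name parts map to the single ActorName (m:1).
# The name pairs are illustrative assumptions.
actor1_to_actor = {
    ("ID", "ActID"),
    ("FirstName", "ActorName"),
    ("LastName", "ActorName"),
}

# Actor -> Actor2: ActorName maps to two name parts (1:n).
actor_to_actor2 = {
    ("ActID", "ActorID"),
    ("ActorName", "FirstName"),
    ("ActorName", "LastName"),
}

composed = compose_morphisms(actor1_to_actor, actor_to_actor2)

# Pairs that relate differently named name parts are the false positives.
false_positives = {
    (a, c) for (a, c) in composed if a != c and "Name" in a and "Name" in c
}

for pair in sorted(composed):
    print(pair, "(false positive)" if pair in false_positives else "")
```

Under these assumptions the join produces five pairs, two of which (FirstName with LastName, and LastName with FirstName) are false positives that a user would have to reject by hand.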
Background - Mapping with helper models
Algorithm (for right compose) [Bernstein et al. 2002]
  copy the right-hand-side mapping
  for each mapping element, m, on the right, i.e., in map2
    compute its Input(m)
  for each mapping element, m, on the right, i.e., in map2
    set its domain to the union of the domains of Input(m)
Example:
[Figure: the models Actor1 (Bio, ID, MiddleName, FirstName, LastName, LastMovie, RecentMovie), Actor (Bio, ActID, ActorName), and Actor2 (ActorID, Bio, FirstName, LastName), connected by the helper models map1 (mapping elements m2 to m7) and map2 (mapping elements m2 to m6).]
Background - Mapping with helper models
Composition result
[Figure: the input mappings map1 and map2 over the models Actor1, Actor, and Actor2, shown next to the composed mapping between Actor1 and Actor2, which is labeled map2 and shows the mapping elements m2 and m3. Legend: red = missed relationships.]
Problems with this technique
  It may miss true relationships or suggest false ones.
The Objectives
The driving motivation
  The need for a mapping definition that subsumes the relationship kinds that state-of-the-art matching algorithms discover with high precision.
  Investigate to what extent a set of operations over this mapping definition can be defined.
Provide a mapping representation at the metadata level that combines the advantages of morphisms and of mappings with helper models. The former have good mathematical properties. The latter are more expressive.
Provide a compose algorithm that exploits the semantic relationships within the mapping expression to produce correct semantic relationships whenever these can be determined automatically, and to isolate those instances that require human intervention.
Proposed Mappings Representation
Model
  A model has expressiveness similar to that of an EER model and is consistent with the definition of model used in previous work on model management [Bernstein et al. 2002, Pottinger and Bernstein 2003].
Mapping Representation
  A mapping consists of a set of mapping elements. Each mapping element is a directed, kinded binary relationship between a pair of elements not in the same model:
    triplets of the form <m1, type, m2>, where type ∈ {IsA, AKindOf, HasA, PartOf, =, Contains, ContainedBy, Unknown, Complex}
Comments
  Some of these types were introduced in other works, e.g., [Euzenat 2004, Giunchiglia et al. 2004, Pottinger and Bernstein 2003, Xu and Embley 2003, Wu et al. 2004].
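As a rough illustration of the representation described above, a mapping element can be held as an immutable triplet whose middle component is drawn from the nine relationship kinds. This is a minimal sketch, not the authors' implementation; the names Kind and MappingElement are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The nine relationship kinds listed on the slide."""
    IS_A = "IsA"
    A_KIND_OF = "AKindOf"
    HAS_A = "HasA"
    PART_OF = "PartOf"
    EQUAL = "="
    CONTAINS = "Contains"
    CONTAINED_BY = "ContainedBy"
    UNKNOWN = "Unknown"
    COMPLEX = "Complex"


@dataclass(frozen=True)
class MappingElement:
    """A directed, kinded relationship <source, kind, target>.

    source and target are element names taken from two different models.
    """
    source: str
    kind: Kind
    target: str

    def __str__(self) -> str:
        return f"<{self.source}, {self.kind.value}, {self.target}>"


# A mapping is a set of mapping elements. Frozen dataclasses are hashable,
# so they can be stored in a set directly.
example = MappingElement("HomePhone", Kind.IS_A, "Phone")
mapping = {example}
print(example)
```

Making the element frozen keeps it hashable, which matters because a mapping is defined as a set of elements.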
Proposed Mappings Representation
An example
[Figure: an example mapping between two purchase order models, PO and POrder, over elements such as Product, Article, ShipTo, ShipAddress, Street1, Street2, Phone, WorkPhone, and HomePhone. Legend: Equality, Contains, IsA, HasA.]
Most of the relationship kinds in the mapping representation are well known, except for Unknown and Complex:
  <a, Unknown, b> means that the relationship between concepts a and b is not precisely known.
  <a, Complex, b> means that the relationship between concepts a and b may require a functional specification: a = f(b)
    e.g., Price = PriceVat(VAT + 1)
Operators - Invert
Each of the relationship types introduced has well-defined inversion properties:
  IsA inverted is AKindOf, HasA inverted is PartOf, Contains inverted is ContainedBy
Definition [invert for mapping elements]:
  Consider m = <a, type, b>. Then its corresponding inverted mapping element, denoted m⁻¹, is given by the expression <b, type⁻¹, a>.
  Mathematical form: <a, type, b>⁻¹ = <b, type⁻¹, a>
  E.g., <a, HasA, b>⁻¹ = <b, HasA⁻¹, a> = <b, PartOf, a>
Definition [invert for mappings]:
  Given two models A and B and a mapping, map, between them, the invert of map, denoted by map⁻¹, is defined from B to A and is given by:
  map⁻¹ = {<b, type⁻¹, a> | <a, type, b> ∈ map}
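The two invert definitions translate almost directly into code. The sketch below is illustrative only: the three inverse pairs come from the slide, while treating =, Unknown, and Complex as their own inverses is an assumption that the slide does not state. The names Triple, INVERSE, invert_element, and invert_mapping are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A mapping element <left, kind, right>."""
    left: str
    kind: str
    right: str


INVERSE = {
    # Inverse pairs stated on the slide.
    "IsA": "AKindOf", "AKindOf": "IsA",
    "HasA": "PartOf", "PartOf": "HasA",
    "Contains": "ContainedBy", "ContainedBy": "Contains",
    # Assumption: these three kinds are treated as self-inverse.
    "=": "=", "Unknown": "Unknown", "Complex": "Complex",
}


def invert_element(m: Triple) -> Triple:
    """<a, type, b> inverted is <b, inverse of type, a>."""
    return Triple(m.right, INVERSE[m.kind], m.left)


def invert_mapping(mapping: set) -> set:
    """Invert every element of a mapping from A to B, giving a mapping from B to A."""
    return {invert_element(m) for m in mapping}


mapping = {Triple("a", "HasA", "b"), Triple("c", "IsA", "d")}
inverted = invert_mapping(mapping)
print(sorted(inverted))
```

Because every kind in the table maps back to itself after two lookups, inverting a mapping twice returns the original mapping.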
Operators - Compose
Composing two mappings involves defining a composition operation between the elements of the mappings (i.e., between triplets of the form <a, type, b>).
Example
  <HomePhone, IsA, Phone> ○ <Phone, =, Telephone> = <HomePhone, IsA, Telephone>
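The example above can be sketched as a join on the shared middle element followed by a lookup that combines the two relationship kinds. The rules below are deliberately partial and are assumptions, not the authors' composition table: equality is treated as an identity, a few kinds are assumed to be transitive with themselves, and every other combination falls back to Unknown. The names TRANSITIVE, compose_kinds, and compose_mappings are hypothetical.

```python
# Assumption: these kinds compose with themselves to give the same kind.
TRANSITIVE = {"IsA", "AKindOf", "Contains", "ContainedBy"}


def compose_kinds(k1: str, k2: str) -> str:
    """Combine the kinds of <a, k1, b> and <b, k2, c> into a kind for <a, ?, c>."""
    if k1 == "=":
        return k2
    if k2 == "=":
        return k1
    if k1 == k2 and k1 in TRANSITIVE:
        return k1
    # No relationship can be suggested automatically; defer to the user.
    return "Unknown"


def compose_mappings(map1: set, map2: set) -> set:
    """Compose map1 (A to B) with map2 (B to C), joining on elements of B."""
    return {
        (a, compose_kinds(k1, k2), c)
        for (a, k1, b) in map1
        for (b2, k2, c) in map2
        if b == b2
    }


map1 = {("HomePhone", "IsA", "Phone")}
map2 = {("Phone", "=", "Telephone")}
result = compose_mappings(map1, map2)
print(result)
```

Falling back to Unknown rather than guessing is what lets the result separate relationships that can be trusted from those that need human validation.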
Compose Properties
Remarks:
  Composition is closed: composing two mappings whose elements are triplets <a, type, b> yields a mapping whose elements are again such triplets.
  Mapping composition is symmetric in this framework:
    (<a, type, b> ○ <b, type, c>)⁻¹ = <c, type⁻¹, b> ○ <b, type⁻¹, a>
  Composing two mappings does not produce false correspondences between the elements of the two models, i.e., it does not suggest false directed, kinded relationships.
  The Compose operator uses the Unknown relationship to indicate when it is not possible (in general) to suggest a relationship type given only the information expressed in the two mappings.
Experiment - Setup
Experiment goal:
  Show that the composition framework is robust when applied to real-world applications and that we are able to correctly identify problematic cases.
  We compare it against mappings as morphisms.
Five real-world XML schemas in the purchase order domain: CIDR, Excel, Noris, Paragon, and Apertum, from www.biztalk.org
  They were used in other projects: [Dragut and Lawrence 2004, Madhavan et al. 2001]
A reference ontology to which each XML schema is manually mapped, both using morphisms [Dragut and Lawrence 2004] and using the new mapping definition.
Experiment - Setup
Example of XML schemas: the Excel and CIDR schemas
[Figure: tree views of the two schemas. The Excel purchase order schema groups its elements under Header, Footer, Contact, InvoiceTo, DeliverTo, Address, and Items (e.g., orderNum, orderDate, contactName, city, postalCode, partNumber, quantity, unitOfMeasure, unitPrice). The CIDR schema groups its elements under POHeader, Contact, POShipTo, and POLines (e.g., poNumber, poDate, contactName, contactEmail, city, postalCode, partno, qty, uom, unitPrice).]
Experiment - Intermediary model
[Figure: the intermediary model. Its concepts include PurchaseOrder, PurchasedItem, ItemsCollection, Organization, Supplier, Shipper, Person, Personnel, Agent, and Address, with relationships such as hasAddress, hasItems, hasItem, billTo, shipTo, suppliedBy, shippedBy, and contactPerson.]
Comments:
  The intermediary model does not contain every concept that appears in the schemas (e.g., unitOfMeasure, count, and VAT).
  The intermediary model is structurally different from the five schemas considered, and it is defined using OWL.
Experiment - Methodology
Step 1: map the five schemas to the intermediary model
  First, using morphisms
  Second, using the proposed mapping
Step 2: apply the compose operators to compute direct mappings between the schemas
  First, employing composition over morphisms
  Second, using the new compose operator
Step 3: measure the quality of the two compositions in terms of Precision, Recall, and Overall
  A new metric, User Effort, is introduced.
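The quality measures in Step 3 can be sketched as follows. The slides do not define Overall, so this sketch assumes the measure commonly used in the schema matching literature, Overall = Recall × (2 − 1/Precision), which is equivalent to (correct − incorrect) / |reference|. The function name match_quality and the sample correspondences are hypothetical.

```python
def match_quality(produced, reference):
    """Return (precision, recall, overall) for a set of produced correspondences.

    Overall is computed as (true positives - false positives) / |reference|,
    an assumed definition that equals recall * (2 - 1 / precision).
    """
    produced, reference = set(produced), set(reference)
    true_pos = len(produced & reference)
    false_pos = len(produced - reference)
    precision = true_pos / len(produced) if produced else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    overall = (true_pos - false_pos) / len(reference) if reference else 0.0
    return precision, recall, overall


# Hypothetical data: 4 produced correspondences, 3 of them correct,
# against a reference mapping of 5 correspondences.
reference = {("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")}
produced = {("a", "1"), ("b", "2"), ("c", "3"), ("x", "9")}

precision, recall, overall = match_quality(produced, reference)
print(f"precision={precision:.2f} recall={recall:.2f} overall={overall:.2f}")
```

Under this definition Overall becomes negative once more than half of the produced correspondences are wrong, reflecting that fixing the result would cost more than building the mapping by hand.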
Experiment - Stats
Overall after composition was computed
[Figure: bar chart of Overall (y-axis, 0 to 1) for each schema pair 1<->2, 1<->3, 1<->4, 1<->5, 2<->3, 2<->4, 2<->5, 3<->4, 3<->5, 4<->5 (x-axis). Series: ours, morphisms.]
CIDR, Excel, Noris, Paragon, and Apertum are assigned numbers 1, 2, 3, 4, and 5 respectively.
Experiment - Stats
User effort is the percentage of mappings that must be validated by a user.
  For morphisms, user effort is 100%, as there is no way to distinguish true from false relationships.
  In our framework, it is the ratio of the number of Unknown relationships to the number of all produced relationships. On average it is only 19%.
[Figure: stacked bar chart (y-axis, 0% to 100%) for each schema pair 1<->2, 1<->3, 1<->4, 1<->5, 2<->3, 2<->4, 2<->5, 3<->4, 3<->5, 4<->5 (x-axis). Series: % correct (semantic), % unknown (semantic), % correct (morphism), % incorrect (morphism).]
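The user effort ratio for the proposed framework is simple enough to state in a few lines. The sketch below assumes mapping elements are plain (left, kind, right) tuples; the function name user_effort and the sample mapping are hypothetical and are not taken from the experiment.

```python
def user_effort(composed):
    """Fraction of composed relationships that a user must validate.

    Only elements of kind "Unknown" need validation, so the effort is
    the number of Unknown elements divided by all produced elements.
    """
    composed = list(composed)
    if not composed:
        return 0.0
    unknown = sum(1 for (_, kind, _) in composed if kind == "Unknown")
    return unknown / len(composed)


# Hypothetical composition result: one of five relationships is Unknown.
composed = [
    ("orderNum", "=", "poNumber"),
    ("orderDate", "=", "poDate"),
    ("telephone", "IsA", "contactPhone"),
    ("street1", "PartOf", "POShipTo"),
    ("companyName", "Unknown", "attn"),
]

effort = user_effort(composed)
print(f"user effort: {effort:.0%}")
```

For a morphism-based composition the same function would not apply, because pairs carry no kind and every pair has to be checked, which is why the slide puts that effort at 100%.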
End
Thank you for your time and patience!