Model-Independent Schema and Data Translation Paolo Atzeni Università Roma Tre Based on Tutorial at...
-
Upload
emily-hines -
Category
Documents
-
view
219 -
download
3
Transcript of Model-Independent Schema and Data Translation Paolo Atzeni Università Roma Tre Based on Tutorial at...
Model-Independent Schema and Data Translation
Paolo AtzeniUniversità Roma Tre
Based onTutorial at ICDE 2006
Paper in EDBT 2006 (with P. Cappellari and P. Bernstein)
JISBD 2006 --- Sitges, 5 October 2006
P. Atzeni JISBD - October 5, 2006 2
Outline
• Introduction• Model management• Schema and data translation: the problem• A metamodel based approach
P. Atzeni JISBD - October 5, 2006 3
A ten-year goal for database research
• The “Asilomar report”(Bernstein et al. Sigmod Record 1999 www.acm.org/sigmod):– The information utility:
make it easy for everyone to store, organize, access, and analyze the majority of human information online
P. Atzeni JISBD - October 5, 2006 4
A general framework: cooperation
• "The capacity of a system to interact (effectively) with other systems, possibly managed by different organizations"
• Forms of cooperation– Process-centered cooperation:
• the systems offer services one another, by exchanging messages, or by triggering activities, without making remote data explicitly visible
– Data-centered cooperation: • the systems offer data one another; data is distributed,
heterogeneous and autonomous
P. Atzeni JISBD - October 5, 2006 5
Databases in the Internet era
• Databases before the Internet– An internal infrastructure, a precious resource, but usually
hidden, with some controlled cooperation• Internet changes the requirements
– More users (not only humans), more diverse cooperating systems (distributed, heterogeneous, autonomous), more types of data
• "Future" Internet changes more– New devices (embedded everywhere), even more users
(many “per person”), real mobility, need for personalization and adaptation
P. Atzeni JISBD - October 5, 2006 6
The most studied form of data-centered cooperation: integration
• We are interested in data-centered cooperation, often referred to as integration
“The unification of related, heterogeneous data from disparate sources, for example, to enable collaboration” (Hammer & Stonebraker 2005)
• Some "paradigms" …
P. Atzeni JISBD - October 5, 2006 7
Multidatabase
Global Manager
Local mgr
DB
Mediator
Local mgr
DB
Mediator
Local mgr
DB
Mediator
P. Atzeni JISBD - October 5, 2006 8
Data Warehousing System
Mediator
Local manager
Mediator
Local manager
Mediator
Local manager
“Integrator”
DB DB DB
Data Warehouse
DW manager
P. Atzeni JISBD - October 5, 2006 9
Intermediate solutions in practice
Mediator
Local manager
DB
Local manager
DB
Integrator
Mediator
Local manager
Mediator
Local manager
DB DB
DB
Local Manager
Application
P. Atzeni JISBD - October 5, 2006 10
Peer-based architecturePeer mgr
Local mgr
DB
Local mgr
DB
Mediator
Mediator
Peer mgr
Local mgr
DB
Local mgr
DB
Mediator
Mediator
Peer mgr
Local mgr
DB
Local mgr
DB
Mediator
Mediator
P. Atzeni JISBD - October 5, 2006 11
Data is not just in databases
• Mail messages• Web pages• Spreadsheets• Textual documents• Palmtop devices, mobile phones• Multimedia annotations (e.g., in digital photos)• XML documents
P. Atzeni JISBD - October 5, 2006 12
The same data in the same form?
• Adaptivity:– Personalization: content adapted to the user
• upon system's decision• upon user's request
– Customization: structure adapted to the user• according to the user's role• upon user's request
– Context dependence• User, Device, Network, Place, Time, Rate
P. Atzeni JISBD - October 5, 2006 13
Summarizing: a general need
• We have data at various places, and data has to be– exchanged– replicated – migrated– integrated – adapted
P. Atzeni JISBD - October 5, 2006 14
A major difficulty
• Heterogeneity– System level– Model level– Structural (different structure for similar data)– Semantic (different meaning for the same structure)
• Many efforts, but current techniques are mostly manual and ad hoc
P. Atzeni JISBD - October 5, 2006 15
A direction for the solutions
• Be general! – ad hoc solution could work in-the-small, but they
• are repetitive and time consuming • do not scale• are not maintainable
• Historical notes:– W. C. McGee: Generalization: Key to Successful Electronic
Data Processing. J. ACM 1959• Indeed, databases are the result of generalization applied to
secondary storage management!
P. Atzeni JISBD - October 5, 2006 16
Generality requires …
• … high-level descriptions of problems within the family of interest:– Metadata:
• “data about data”• (formal or informal) description of structures and
meaning
• General solutions leverage on metadata (and then operate on data as a consequence)
P. Atzeni JISBD - October 5, 2006 17
Outline
• Introduction and motivation• Model management• Schema and data translation: the problem• A metamodel based approach
P. Atzeni JISBD - October 5, 2006 18
A wider perspective
• (Generic) Model Management:– A proposal by Bernstein et al (2000 +)– Includes a set of operators on
• schemas and • mappings between schemas
P. Atzeni JISBD - October 5, 2006 19
Terminology: a warning
Model Mgmt people Traditional DB people
Meta-metamodel Metamodel
Metamodel Model
Model Schema
P. Atzeni JISBD - October 5, 2006 20
Schemas and mappings
• A simplified approach:– Schema:
• a set of elements, related in some way to one another– Mapping:
• a set of correspondences (pair of elements) or• its reification, a third schema related to the other two via
two sets of correspondences
P. Atzeni JISBD - October 5, 2006 21
Model mgmt operators, a first set
• map = Match (S1, S2) • S3 = Merge (S1, S2, map)• S2 = Diff (S1, map) • and more
– map3 = Compose (map1, map2)
– S2 = Select (S1, pred)
– Apply (S, f)
– list = Enumerate (S)
– S2 = Copy (S1)
– …
P. Atzeni JISBD - October 5, 2006 22
Match
• map = Match (S1, S2)– given
• two schemas S1, S2– returns
• a mapping between them• the “classical” initial step in data integration:
– find the common elements of two schemas and the correspondences between them
P. Atzeni JISBD - October 5, 2006 23
Merge
• S3 = Merge (S1, S2, map)– given
• two schemas and a mapping between them– returns
• a third schema (and two mappings)• the “classical” second step in data integration:
– given the correspondences, find a way to obtain one schema out of two
P. Atzeni JISBD - October 5, 2006 24
Diff
• S2 = Diff (S1, map) – given
• a schema and a mapping from it (to some other schema, not relevant)
– returns • a (sub-)schema, with the elements that do not participate
in the mapping
P. Atzeni JISBD - October 5, 2006 25
DW2
Example
(Bernstein and Rahm, ER 2000)• A database (a “source”), a data warehouse and a mapping
between the two• we get a second source, with some similarity to the first one• and we want to update the DW
DB1 DW1
DB2
P. Atzeni JISBD - October 5, 2006 26
DW2
Example, the "solution"
DB1 DW1
DB2
m1
m2m3
DB2’
m2 = Match(DB1,DB2)
m3= Compose(m2,m1)
DB2’=Diff(DB2,m3)
DW2’, m4 user defined
m5 = Match(DW1,DW2’)
DW2 = Merge(DW1,DW2’,m5)
DW2’m4
m5
P. Atzeni JISBD - October 5, 2006 27
Magic does not exist
• Operators might require human intervention:– Match is the main case
• Scripts involving operators might require human intervention as well (or at least benefit from it):– a full implementation of each operator might not always
available– a mapping might require manual specification– incomparable alternatives might exist
P. Atzeni JISBD - October 5, 2006 28
The “data level”
• The major operators have also an extended version that operates on data, and not only on schemas
• Especially apparent for– Merge
P. Atzeni JISBD - October 5, 2006 29
We also have heterogeneity
• Round trip engineering (Bernstein, CIDR 2003)– A specification, an implementation– then a change to the implementation: want to revise the
specification• We need a translation from the implementation model to the
specification one
S1
I1 I2
S2
P. Atzeni JISBD - October 5, 2006 30
Model management with heterogeneity
• The previous operators have to be “model generic” (capable of working on different models)
• We need a “translation” operator– <S2, map12> = ModelGen (S1)
P. Atzeni JISBD - October 5, 2006 31
ModelGen, an additional operator
• <S2, map12> = ModelGen (S1) – given
• a schema (in a model)– returns
• a schema (in a different data model) and a mapping between the two
• A “translation” from a model to another• I should call it “SchemaGen” …• We should better write
– <S2, map12> = ModelGen (S1,mod2)
P. Atzeni JISBD - October 5, 2006 32
S2
Round trip engineering
S1
I1
m1
I2
m2 = Match (I1,I2)m3 = Compose (m1,m2)I2’= Diff(I2,m3)<S2’,m4 > = Modelgen(I2’)… Match, Merge
m2
m3
I2’
S2’
m4
P. Atzeni JISBD - October 5, 2006 33
Outline
• Introduction• Model management• Schema and data translation: the problem• A metamodel based approach
P. Atzeni JISBD - October 5, 2006 34
Schema and data translation, a long standing issue
• Schema translation:– given schema S1 in model M1 and model M2– find a schema S2 in M2 that “corresponds” to S1
• Schema and data translation:– given also a database D1 for S1– find also a database D2 for S2 that “contains the same data”
to D1
P. Atzeni JISBD - October 5, 2006 35
Schema and data translation, a long standing issue
• Translations from a model to another have been studied since the 1970’s
• Whenever a new model is defined, techniques and tools to generate translations are studied
• However, proposals and solutions are usually model specific
P. Atzeni JISBD - October 5, 2006 36
Model specific solutions
• Given an ER schema, find the suitable relational schema that “implements” it – the original paper (Chen 1976) contains the basics– further discussions by many (e.g. Markowitz and Shoshani
1989)– illustrated in every textbook
• Similarly with – any other conceptual model and any other logical one– XML and relational (or object)
P. Atzeni JISBD - October 5, 2006 37
Another problem in the picture: data exchange
• Given a source S1 and a target schema S2 (in different models or even in the same one), find a translation, that is, a function that given a database D1 for S1 produces a database D2 for S2 that “correspond” to D1
• Often emphasized with reference to materialized solutions
P. Atzeni JISBD - October 5, 2006 38
Integration
• Given two or more sources, build an integrated schema or database
P. Atzeni JISBD - October 5, 2006 39
Schema translation
• Given a schema find another one with respect to some specific goal (better quality, another model, …)
P. Atzeni JISBD - October 5, 2006 40
Data exchange
• Given a source and a target schema, find a transformation from the former to the latter
P. Atzeni JISBD - October 5, 2006 41
Schema translation and data exchange
• Can be seen a complementary:– Data translation = schema translation + data exchange
• Given a source schema and database• Schema translation produces the target schema• Data exchange generates the target database
• In model management terms we could write– Schema translation:
• <S2, map12> = ModelGen (S1,mod2)– Data exchange:
• i2 = DataGen (S1,i1,S2,map12)
P. Atzeni JISBD - October 5, 2006 42
Outline
• Introduction• Model management• Schema and data translation: the problem• A metamodel based approach
P. Atzeni JISBD - October 5, 2006 43
The problem
• ModelGen– given two data models M1 and M2, and a schema S1 of M1
(the source schema and model), • generate a schema S2 of M2 (the target schema and
model), corresponding to S1• and, for each database D1 over S1, generate an
equivalent database D2 over S2
P. Atzeni JISBD - October 5, 2006 44
We have been doing this for a while
• Initial work more than ten years ago (Atzeni & Torlone, 1996)• Major novelty recently
– translation of both schemas and data– data-level translations generated, from schema-level ones
• Moreover:– a visible, multilevel and (in part) self-generating dictionary– high-level, visible and customizable translation rules
• in Datalog with OID-invention– mappings between elements generated as a by-product
P. Atzeni JISBD - October 5, 2006 45
Heterogeneity
• We need to handle artifacts and data in various models– Data are defined wrt to schemas– Schemas are defined wrt to models – How models can be defined?
Models
Schemas
Data
P. Atzeni JISBD - October 5, 2006 46
A metamodel approach
• The constructs in the various models are rather similar:
– can be classified into a few categories (Hull & King 1986):
• Lexical: set of printable values (domain)
• Abstract (entity, class, …)
• Aggregation: a construction based on (subsets of) cartesian products (relationship, table)
• Function (attribute, property)
• Hierarchies
• …
• We can fix a set of metaconstructs (each with variants):
– lexical, abstract, aggregation, function, ...
– the set can be extended if needed, but this will not be frequent
• A model is defined in terms of the metaconstructs it uses
P. Atzeni JISBD - October 5, 2006 47
The metamodel approach, example
• The ER model:– Abstract (called Entity)– Function from Abstract to Lexical (Attribute)– Aggregation of abstracts (Relationship) – …
• The OR model:– Abstract (Table with ID)– Function from Abstract to Lexical (value-based Attribute)– Function from Abstract to Abstract (reference Attribute)– Aggregation of lexicals (value-based Table)– Component of Aggregation of Lexicals (Column)– …
P. Atzeni JISBD - October 5, 2006 48
The supermodel
• A model that includes all the meta-constructs (in their most general forms)– Each model is subsumed by the supermodel (modulo
construct renaming) – Each schema for any model is also a schema for the
supermodel (modulo construct renaming)
P. Atzeni JISBD - October 5, 2006 49
A Multi-Level Dictionary
• Handles models, schemas and data• Has both a model specific and a model independent component
P. Atzeni JISBD - October 5, 2006 50
Multi-Level Dictionary
model
schema
modelgeneric
modelspecific
des
crip
tion
modelindependence
Supermodel description
(mSM)
Model descriptions
(mM)
Supermodel schemas
(SM)
Model specific schemas
(M)
Supermodel instances
(i-SM)
Model specific instances
(i-M)data
P. Atzeni JISBD - October 5, 2006 51
The supermodel description
MSM-Construct
OID Name IsLex
1 AggregationOfLexicals F
2 ComponentOfAggrOfLex T
3 Abstract F
4 AttributeOfAbstract T
5 BinaryAggregationOfAbstracts F
MSM-Property
OID Name Constr. Type
11 Name 1 String
12 Name 2 String
13 IsKey 2 Boolean
14 IsNullable 2 Boolean
15 Type 2 String
16 Name 3 String
17 Name 4 String
18 IsIdentifier 4 Boolean
19 IsNullable 4 Boolean
20 Type 4 String
21 IsFunct1 5 Boolean
22 IsOptional1 5 Boolean
23 Role1 5 String
24 IsFunct2 5 Boolean
25 IsOptional2 5 Boolean
26 Role2 5 String
MSM-Reference
OID Name Construct Target
30 Aggregation 2 1
31 Abstract 4 3
32 Abstract1 5 3
33 Abstract2 5 3
mSMmM
SMM
i-SMi-M
P. Atzeni JISBD - October 5, 2006 52
Schemas in the supermodel
SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
201 3 Clerks
202 3 Offices
SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302
405 1 Address F F Text 302
501 3 Code T F Int 201
… … … … … … …
MSM-Construct
OID Name IsLex
… … …
3 Abstract F
4 AttributeOfAbstract T
… … …
Supermodel schemas
mSMmM
SMM
i-SMi-M
Employees
Departments
EmpNo
Name
Name
Address
P. Atzeni JISBD - October 5, 2006 53
i-SM-Abstract
OID AbsOID InstOID
3010 301 1
3011 301 1
… … …
3010 301 2
i-SM-AttributeOfAbstract
OID AttrOfAbsOID InstOID Value i- AbsOID
4010 401 1 75432 3010
4020 402 1 John Doe 3010
…
4021 402 1 Bob White 3011
… … … … …
Instances in the supermodel
SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
201 3 Clerks
202 3 Offices
SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302
405 1 Address F F Text 302
501 3 Code T F Int 201
… … … … … … …Supermodel schemasSupermodel instances
mSMmM
SMM
i-SMi-M
P. Atzeni JISBD - October 5, 2006 54
Multi-Level Repository
model
schema
modelgeneric
modelspecific
des
crip
tion
modelindependence
Supermodel description
(mSM)
Model descriptions
(mM)
Supermodel schemas
(SM)
Model specific schemas
(M)
Supermodel instances
(i-SM)
Model specific instances
(i-M)data
P. Atzeni JISBD - October 5, 2006 55
Model descriptions
MSM-Construct
OID Name IsLex
1 AggregationOfLexicals F
2 ComponentOfAggrOfLex T
3 Abstract F
4 AttributeOfAbstract T
5 BinaryAggregationOfAbstracts F
MM-Construct
OID Model MSM-Constr Name
1 2 3 ER_Entity
2 2 4 ER_Attribute
3 2 5 ER_Relationship
4 1 1 Rel_Table
5 1 2 Rel_Column
6 3 3 OO_Class
MM-Model
OID Name
1 Relational
2 Entity-Relationship
3 Object
MSM-Property
OID Name Construct Type
… … … …
MSM-Reference
OID Name Construct …
… … … …
MM-Property
… … … …
MM-Reference
… … … …
mSMmM
SMM
i-SMi-M
P. Atzeni JISBD - October 5, 2006 56
Multi-Level Repository, generation and use
model
schema
modelgeneric
modelspecific
des
crip
tion
modelindependence
Supermodel description
(mSM)
Model descriptions
(mM)
Supermodel schemas
(SM)
Model specific schemas
(M)
Supermodel instances
(i-SM)
Model specific instances
(i-M)data
Structure fixed, content provided by tool
designers
Structure generated by the tool from the content of mSM
Structure generated by the tool from the content of mSM
Structure fixed, content provided by model
designers out of mSM
Structure generated by the tool from the content of mM
P. Atzeni JISBD - October 5, 2006 57
The metamodel approach, translations
• The constructs in the various models are rather similar: – can be classified into a few categories (“metaconstructs'')– translations can be defined on metaconstructs,
• and there are “standard”, accepted ways to deal with translations of metaconstructs
• they can be performed within the supermodel– each translation from the supermodel SM to a target model
M is also a translation from any other model to M:• given n models, we need n translations, not n2
P. Atzeni JISBD - October 5, 2006 58
Generic translation environment
1. Copy
2. Translation
3. Copy
Supermodel
Source model
Target model
Translation:composition 1,2 & 3
P. Atzeni JISBD - October 5, 2006 59
Translations within the supermodel
• We still have too many models:– Just within simple ER model versions, we have 4 or 5
constructs, and each has several independent features which give rise to variants
• for example, relationships can be– binary or N-ary– with all possible cardinalities or without many-to-
many– with or without the possibility of specifying optionality– with or without attributes– …
– Combining all these, we get hundreds of models!– The management of a specific translation for each model
would be hopeless
P. Atzeni JISBD - October 5, 2006 60
Translations, the approach
• Elementary translation steps to be combined• Each translation step handles a supermodel construct (or a
feature thereof) "to be eliminated" or "transformed"• A translation is the concatenation of elementary translation
steps
P. Atzeni JISBD - October 5, 2006 61
A complex translation, example(0,N) (0,N)
• Eliminate N-ary relationships • Eliminate attributes from relationships • Eliminate many-to-many relationships• Replace relationships with references• Eliminate generalizations
P. Atzeni JISBD - October 5, 2006 62
Complex translations
Binary ER w/o gen
N-ary ER w/ gen
Binary ERw/ gen
Relational
Bin ER w/ gen w/o attr on rel
OO w/ gen
N-ary ER w/o gen
Bin ER w/o gen w/o attr on rel
Bin ER w/o gen w/o M:N rel
Elim. N-ary relationships Elim. Relationship attr.sElim. M:N relationshipsReplace relationships with
referencesElim OO generalizations Elim ER generalizations
Bin ER w/ gen w/o M:N rel
OO w/o gen…
P. Atzeni JISBD - October 5, 2006 63
Translations
• Basic translations are written in a variant of Datalog, with OID invention– We specify them at the schema level– The tool "translates them down" to the data level– Some completion or tuning may be needed
P. Atzeni JISBD - October 5, 2006 64
A basic translation
• From (a simple) binary ER model to the relational model– a table for each entity– a column (in the table for E) for each attribute of an entity E– for each M:N relationship
• a table for the relationship• columns …
– for each 1:N and 1:1 relationship:• a column for each attribute of the identifier …
P. Atzeni JISBD - October 5, 2006 65
A basic translation application
Departments
Name Address
EmployeesEmpNo
Name
Affiliation
Departments
1,1
0,N
Name
Address
Employees
EmpNo Name Affiliation
P. Atzeni JISBD - October 5, 2006 66
A basic translation (in supermodel terms)
• From (a simple) binary ER model to the relational model– an aggregation of lexicals for each abstract– a component of the aggregation for each attribute of abstract– for each M:N aggregation of abstracts …
• …
• From (a simple) binary ER model to the relational model– a table for each entity– a column (in the table for E) for each attribute of an entity E– for each M:N relationship
• a table for the relationship• columns …
– for each 1:N and 1:1 relationship:• a column for each attribute of the identifier …
P. Atzeni JISBD - October 5, 2006 67
"An aggregation of lexicals for each abstract"
SM_AggregationOfLexicals(
OID: #aggregationOID_1(OID),
Name: name)
SM_Abstract (
OID: OID,
Name: name ) ;
P. Atzeni JISBD - October 5, 2006 68
Datalog with OID invention
• Datalog (informally):– a logic programming language with no function symbols and
predicates that correspond to relations in a database– we use a non-positional notation
• Datalog with OID invention:– an extension of Datalog that uses Skolem functions to
generate new identifiers when needed• Skolem functions:
– injective functions that generate "new" values (value that do not appear anywhere else); so different Skolem functions have disjoint ranges
P. Atzeni JISBD - October 5, 2006 69
"An aggregation of lexicals for each abstract"
SM_AggregationOfLexicals(
OID: #aggregationOID_1(OID),
Name: n)
SM_Abstract (
OID: OID,
Name: n ) ;
• the value for the attribute Name is copied (by using variable n)
• the value for OID is "invented": a new value for the function #aggregationOID_1(OID) for each different value of OID, so a different value for each value of SM_Abstract.OID
P. Atzeni JISBD - October 5, 2006 70
"An aggregation of lexicals for each abstract"
SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
… … …
SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
… … … … … … …
EmployeesEmpNoName
11
11
Departments1002
Employees1001
SM_AggregationOfLexicals(
OID: #aggregationOID_1(OID),
Name: n)
SM_Abstract (
OID: OID,
Name: n ) ;
…
3021002
……
1001
SM-AggregationOfLexicals
Schema NameOID
SM-aggregationOID_1_SK
absOIDOID
301
Employees
P. Atzeni JISBD - October 5, 2006 71
"A component of the aggregation for each attribute of abstract"
SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID),Name: name, AggrOID: #aggregationOID_1(absOID),IsNullable: isNullable, IsKey: isIdent, Type : type )
←SM_AttributeOfAbstract(
OID: attOID,Name: name,AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable , Type : type ) ;
• Skolem functions
– are functions
– are injective
– have disjoint ranges
• the first function "generates" a new value
• the second "reuses" the value generated by the first rule
P. Atzeni JISBD - October 5, 2006 72
A component of the aggregation for each attribute of abstract"
SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
… … …
EmployeesEmpNoName
11
11
Departments1002
Employees1001
…
3021002
……
1001
SM-AggregationOfLexicals
Schema NameOID
SM-aggregationOID_1_SK
absOIDOID
301
SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID),Name: name, AggrOID: #aggregationOID_1(absOID),IsNullable: isNullable, IsKey: isIdent, Type : type )
←SM_AttributeOfAbstract(
OID: attOID,Name: name,AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable , Type : type ) ;
SM-componentOID_1_SK
absOIDOID
Employees
EmpNo Name
SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
… … … … … … …
SM-ComponentOfAggregationOfLexicals
Schema Type AggrOIDisNullableisIdentNameOID
Text 1001FFName111004
1001IntFTEmpNo11
1003 401
1004 402
1003
P. Atzeni JISBD - October 5, 2006 73
Generating data-level translations
Schema translation
Data translation
• Same environment
• Same language
• High level translation specificationSupermodel description
(mSM)
Supermodel schemas
(SM)
Supermodel instances
(i-SM)
P. Atzeni JISBD - October 5, 2006 74
Translation rules, data level
i-SM_ ComponentOfAggregation… (OID: #i-componentOID_1 (i-attOID),i-AggrOID: #i-aggregationOID_1(i-absOID),ComponentOfAggregationOfLexicalsOID:
#componentOID_1(attOID),Value: Value )
←i-SM_AttributeOfAbstract(
OID: i-attOID,i-AbstractOID: i-absOID,AttributeOfAbstractOID: attOID,Value: Value ),
SM_AttributeOfAbstract(OID: attOID,AbstractOID: absOID,Name: attName,IsNullable: isNull,IsID: isIdent,Type: type )
SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), Name: name, AggrOID:
#aggregationOID_1(absOID),IsNullable: isNullable, IsKey: isIdent, Type : type )
←SM_AttributeOfAbstract(
OID: attOID,Name: name,AbstractOID: absOID, IsIdent: isIdent, IsNullable: isNullable , Type: type ) ;
P. Atzeni JISBD - October 5, 2006 75
Correctness
• Usually modelled in terms of information capacity equivalence/dominance (Hull 1986, Miller 1993, 1994)
• Mainly negative results in practical settings that are non-trivial• Probably hopeless to have correctness in general• We follow an "axiomatic" approach:
– We have to verify the correctness of the basic translations, and then infer that of complex ones
P. Atzeni JISBD - October 5, 2006 76
Experiments
• A significant set of models– ER (in many variants and extensions)– Relational– OR– XSD– UML
P. Atzeni JISBD - October 5, 2006 77
Summary
• ModelGen was studied a few years ago• New interest on it within the "Model management" framework• New approach
– Translation of schema and data– Visible (and in part self generated) dictionary– Visible and modifiable rules– Skolem functions describe mappings
P. Atzeni JISBD - October 5, 2006 78
Conclusion
• The ten-year goal of the Asilomar report:– The information utility:
make it easy for everyone to store, organize, access, and analyze the majority of human information online
• A lot of interesting work has been done but …• …integration, translation, exchange are still difficult…• … 2009 is approaching … we are late!
P. Atzeni JISBD - October 5, 2006 79
GrazzieGraciesGracias
Grazie
Thank you
P. Atzeni JISBD - October 5, 2006 80
Leftovers
P. Atzeni JISBD - October 5, 2006 82
i-SM-Abstract
OID AbsOID InstOID
3010 301 1
3011 301 1
3012 302 1
… … …
i-SM-AttributeOfAbstract
OID AttrOfAbsOID InstOID Value i- AbsOID
4010 401 1 75432 3010
4020 402 1 John Doe 3010
4021 402 1 Bob White 3011
4022 404 1 CS 3012
… … … … …
Instances in our dictionary
SM-Abstract
OID Schema Name
301 1 Employees
302 1 Departments
SM-AttributeOfAbstract
OID Schema Name isIdent isNullable Type AbstrOID
401 1 EmpNo T F Int 301
402 1 Name F F Text 301
404 1 Name T F Char 302
… … … … … … …
mSMmM
SMM
i-SMi-M
75432
John Doe
Bob White
…
CS
P. Atzeni JISBD - October 5, 2006 83
Translation of instances
75432
John Doe
Bob White
…
CS
Employees
EmpNo Name Affiliation
75432 John Doe CS
Bob White
Departments
Name
CS
Departments
Name Address
EmployeesEmpNo
Name
Affiliation
Departments
1,1
0,N
Name
Address
Employees
EmpNo Name Affiliation
P. Atzeni JISBD - October 5, 2006 84
i-SM-Abstract
OID AbsOID InstOID
3010 301 1
3011 301 1
3012 302 1
… … …
i-SM-AttributeOfAbstract
OID AttrOfAbsOID InstOID Value i- AbsOID
4010 401 1 75432 3010
4020 402 1 John Doe 3010
4022 404 1 CS 3012
… … … … …
Instances in our dictionary75432
John Doe
Bob White
…CS
Employees
EmpNo Name Affiliation
75432 John Doe CS
Departments
Name
CS
i-SM-AggregationOfLexicals
1002
1001
AggOID
115011
115010
InstOIDOID
5010CS1110054030
5010754321110034010
5010John Doe1110044020
i-SM-ComponentOfAggregationOfLexicals
i- AggOIDValueInstOIDCompOfAggOIDOID
P. Atzeni JISBD - October 5, 2006 85
i-SM-Abstract
OID AbsOID InstOID
3010 301 1
3011 301 1
3012 302 1
… … …
i-SM-AttributeOfAbstract
OID AttrOfAbsOID InstOID Value i- AbsOID
4010 401 1 75432 3010
4020 402 1 John Doe 3010
4022 404 1 CS 3012
… … … … …
Instances in our dictionary
i-SM-AggregationOfLexicals
1002
1001
AggrOID
115011
115010
InstOIDOID5010754321110036010
5010John Doe1110046020
5010CS1110056030
i-SM-ComponentOfAggregationOfLexicals
i- AggrOIDValueInstOIDCompOfAggOIDOID
i-SM_ComponentOfAggregation… (OID: #i-componentOID_1 (i-attOID),i-AggrOID: #i-aggregationOID_1(i-absOID),ComponentOfAggregationOfLexicalsOID: #componentOID_1(attOID),Value: Val )
← i-SM_AttributeOfAbstract( OID: i-attOID, i-AbstractOID: i-absOID,AttributeOfAbstractOID: attOID, Value: Val ),
SM_AttributeOfAbstract( OID: attOID, AbstractOID: absOID)
P. Atzeni JISBD - October 5, 2006 86
The problem
• ModelGen (a model management operator, [Bernstein 2003])– given two data models M1 and M2, and a schema S1 of M1
(the source schema and model), • generate a schema S2 of M2 (the target schema and
model), corresponding to S1• and, for each database D1 over S1, generate an
equivalent database D2 over S2
Generality: what are the models we consider?
Correctness: what do corresponding and equivalent mean?
P. Atzeni JISBD - October 5, 2006 87
Many artifacts and models for them
• Many notations, each with variants and conventions:
– Object diagrams
– Interface definitions
– Database schemas
– Web site layouts
– Control flow diagrams
– XML schemas
– Form definitions
– …
P. Atzeni JISBD - October 5, 2006 88
Many models just in the database world
• With different features and goals– semantic models and logical models:
• E-R, functional, (conceptual) object• relational, object-relational, object, network
– general purpose models (for all seasons) and problem oriented models (for specific contexts: DW, statistical, spatial, temporal)
• Variations of models– models with different levels of abstraction– versions within a family (e.g. many versions of the ER
model)• More models recently with the Web and XML
P. Atzeni JISBD - October 5, 2006 89
What do we need to exchange and translate
• In design:– Artifacts (schemas and other descriptions)
• In operations:– Data (in databases, files, documents, …)
P. Atzeni JISBD - October 5, 2006 90
Elementary steps
• There can be a set of predefined basic translations, for example– eliminate n-ary aggregations; replace them with binary ones
(and abstracts)– eliminate binary aggregations; replace them with functions– eliminate functions to abstracts; replace them with
aggregations– eliminate complex attributes; replace them with simple
attributes and abstracts• Assumed to be correct and so complex translations built over
them are correct by definition (an “axiomatic” approach)
P. Atzeni JISBD - October 5, 2006 91
A complex translation
• From an N-ary ER model with generalizations to a simple Object model with only single valued references and no generalizations– Eliminate N-ary relationships (replaced by binary ones and
new entities)– Eliminate attributes from relationships – Eliminate many-to-many relationships– Transform relationships to references– Eliminate generalizations
P. Atzeni JISBD - October 5, 2006 92
"An aggregation of lexicals for each M:N aggregation of abstracts"
SM_AggregationOfLexicals(
OID: #aggregationOID_2(aggOID),
Name: n )
SM_BinaryAggregationOfAbstracts(
OID: aggrOID,
Name: n,
isFun1: "False",
isFun2: "False") ;
• another Skolem function #aggregationOID_2():
– generates a "table" for a "relationship"
• the rule applies only to M:N "relationships", due to the conditions on isFun1 and isFun2
P. Atzeni JISBD - October 5, 2006 93
Translation rules, data level
i-SM_ ComponentOfAggregation… (OID: #i-componentOID_1 (i-attOID),i-AggrOID: #i-aggregationOID_1(i-absOID),
ComponentOfAggregationOfLexicalsOID: #componentOID_1(attOID),
Value: Value )←…
SM_ComponentOfAggregation… ( OID: #componentOID_1(attOID), AggrOID:
#aggregationOID_1(absOID),Name: name, IsNullable: isNullable, IsKey: isIdent, Type : type )
←…
P. Atzeni JISBD - October 5, 2006 94
Translation rules, data level
←i-SM_AttributeOfAbstract(
OID: i-attOID,i-AbstractOID: i-absOID,AttributeOfAbstractOID: attOID,Value: Value ),
SM_AttributeOfAbstract(OID: attOID,AbstractOID: absOID,Name: name,IsNullable: isNullable,IsID: isIdent,Type: type )
←SM_AttributeOfAbstract(
OID: attOID,AbstractOID: absOID, Name: name,IsIdent: isIdent, IsNullable: isNullable , Type: type ) ;
P. Atzeni JISBD - October 5, 2006 95
Management of translations
• Basic properties:– Correctness, minimality, …
• Construction of complex translations by picking basic translations in the library
P. Atzeni JISBD - October 5, 2006 96
ModelGen: the architecture
P. Atzeni JISBD - October 5, 2006 97
A simple example
• An object relational database, to be translated in a relational one
• Source: the OR-model
• Target: the relational model
System managed ids
P. Atzeni JISBD - October 5, 2006 98
Example, 2
• Does the OR model allow for keys? • Assume EmpNo and Name are keys
P. Atzeni JISBD - October 5, 2006 99
Example, 3
• Does the OR model allow for keys? • Assume no keys are specified