Francesca Bugiotti Università Roma Tre 1 17/12/2009.

Model-independent solutions to model management problems Francesca Bugiotti Università Roma Tre 1 17/12/2009


  • Slide 1

Francesca Bugiotti, Università Roma Tre, 17/12/2009

Slide 2 Model management
What it is: a systematic approach to metadata management, which handles schemas by means of a set of predefined operators.
Its goals: enhance the productivity of software developers by offering them techniques that allow for high-level specifications and abstraction over recurring tasks involving the manipulation of schemas.

Slide 3 Model management
Model management systems: handle schemas and mappings and support a wide variety of operations on them.
MIDST: we propose MIDST [1,2,3], a platform originally conceived for model-independent schema and data translation, as the basis for building a model management system. The resulting model management system aims to be model-independent and model-aware.

Slide 4 What model management addresses
Concrete needs: model management problems are a formalization of concrete and frequent database maintenance problems:
- data integration over heterogeneous databases
- data exchange between independent databases
- ETL
- wrapper generation for access to relational databases from object-oriented applications
- web site generation from databases

Slide 5 What model management addresses
Model management solutions to formalized problems:
- schema integration
- schema evolution
- forward engineering
- round-trip engineering

Slide 6 Schema integration
[Figure: schemas S1, S2 and S3, with mappings map12 and map23]

Slide 7 Forward engineering
[Figure: V1, S1, S2 and V2, with mappings map1 and map2]

Slide 8 Round-trip engineering
[Figure: S1, I1, I2 and S2, with mappings map1 and map2]

Slide 9 Model management problems solution
Solutions to model management problems are given in terms of scripts. A script is a set of model management operators that are executed according to a specific control flow.

Slide 10 Operators
The operators involved in the script specifications are:
- Match
- Diff
- Merge
- Compose
- Modelgen
- Copy

Slide 11 Match
Given two schemas S1 and S2, we define map12 = MATCH(S1, S2), where MATCH is the operator that identifies correspondences between the two schemas and hence yields a possible mapping. There are several algorithms implementing MATCH operators.

Slide 12 Match
[Figure: two schemas S1 and S2; Match(S1, S2) = ?]

Slide 13 Match
[Figure: Match(S1, S2) = map12, drawn as correspondences between S1 and S2]

Slide 14 Diff
Given two schemas S and S1, the difference diff(S, S1) is a schema S2 that contains all the schema elements of S that do not appear in S1. It can be interpreted as a set-oriented difference.

Slide 15 Example
[Figure: schemas S and S1; Diff(S, S1) = ?]

Slide 16 Example
[Figure: Diff(S, S1) = S2]

Slide 17 Merge
Given S and S1, their merge merge(S, S1) is a schema S2 that contains the schema elements that appear in at least one of S or S1, modulo equivalence. It can be interpreted as a set-oriented union.

Slide 18 Example
[Figure: schemas S and S1; Merge(S, S1) = ?]

Slide 19 Example
[Figure: Merge(S1, S2) = S3]

Slide 20 Compose
Given three schemas S1, S2 and S3 and two mappings, map12 between S1 and S2 and map23 between S2 and S3, we define map13, the composition of map12 and map23, as the mapping between S1 and S3:
Compose(S1, S2, S3, map12, map23) = map13

Slide 21 Modelgen
Given a schema S of a source model M and a target model M1, the translation modelgen(S, M1) is a schema S1 of M1 that corresponds to S.
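
The set-oriented reading of Diff and Merge, and the composition of two mappings, can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not in the slides: a schema is just a set of element names, equivalence is plain name equality, and a mapping is a set of (source, target) pairs. The element names are hypothetical, and this is not how MIDST implements the operators.

```python
from typing import FrozenSet, Set, Tuple

# Assumption: a schema is a set of element names, a mapping a set of pairs.
Schema = FrozenSet[str]
Mapping = Set[Tuple[str, str]]


def diff(s: Schema, s1: Schema) -> Schema:
    """Elements of s that do not appear in s1 (set-oriented difference)."""
    return s - s1


def merge(s: Schema, s1: Schema) -> Schema:
    """Elements that appear in at least one of s or s1 (set-oriented union)."""
    return s | s1


def compose(map12: Mapping, map23: Mapping) -> Mapping:
    """Pairs (a, c) such that map12 relates a to some b and map23 relates b to c."""
    return {(a, c) for (a, b) in map12 for (b2, c) in map23 if b == b2}


# Hypothetical schemas and mappings, only to exercise the three functions.
S = frozenset({"A", "B", "C"})
S1 = frozenset({"A", "B", "D"})
only_in_s = diff(S, S1)
union = merge(S, S1)
map13 = compose({("a1", "b1"), ("a2", "b2")}, {("b1", "c1")})
```

With real schemas, name equality would be replaced by a richer notion of equivalence between constructs, which is what the later slides on construct equivalence address.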

Slide 22 Modelgen
[Figure: M = ER model, M1 = relational model, schema S; Modelgen(S, M1) = ?]

Slide 23 Example
[Figure: Modelgen(S, M1) = S1]

Slide 24 Operators
A major goal is to provide model-independent operators that guarantee some kind of model closure property. Here we start from a simplified version of Bernstein's procedure for solving the round-trip engineering problem [4], in order to introduce the needed operators and explain how they are implemented in a model-independent fashion.

Slide 25 Round-trip engineering
One of the most meaningful model management problems. Let us take it as an example to illustrate our approach to model management problems.
- S1: a specification schema
- I1: an implementation schema obtained from S1
- I2: a modified version of the implementation I1
- S2: a new specification which corresponds to I2
[Figure: S1, I1, I2 and S2]

Slide 26 Round-trip engineering
S1 is the specification schema, which is translated into its corresponding implementation schema I1. It is a common example where the specification is expressed in ER and the implementation is relational. The translation might be performed using MIDST itself, since it was conceived as an implementation of the MODELGEN operator.
S1 (ER): entities Project (PCode, Title) and Manager (SSN, EID, Name), linked by a relationship with cardinalities (1,1) and (0,N)
I1 (relational): Project (PCode, Title, MGRSSN*), Manager (SSN, EID, Name)

Slide 27 Round-trip engineering
I2 is the implementation schema, a modified version of I1. The transformation involves a change in the key of a referenced relation. The key of Manager, which is referenced by MGRSSN of Project in I1, becomes EID in I2. As a consequence, the column MGRSSN of Project, which references SSN of Manager, has to reference EID instead. MGRID is the version of MGRSSN modified accordingly.
I1: Project (PCode, Title, MGRSSN*), Manager (SSN, EID, Name)
I2: Project (PCode, Title, MGRID*), Manager (SSN, EID, Name, Degree)

Slide 28 Round-trip engineering
I2: Project (PCode, Title, MGRID*), Manager (SSN, EID, Name, Degree)
Our goal is to generate S2, the appropriately revised version of the specification schema, such that its corresponding implementation is I2.

Slide 29 Operators in scripts
The solution provided for round-trip engineering is based on a set of model management operators: DIFF, MERGE and MODELGEN. DIFF and MERGE are used to compute the difference and the union of schemas. MODELGEN is used to translate the specification schema into the implementation and to compute the reversed differences.

Slide 30 The Round-trip solving script

Slide 31 MIDST and Modelgen
The MIDST platform was originally conceived as a framework to perform model-independent schema and data translations. MIDST was designed as a model-generic implementation of MODELGEN.

Slides 32-33 Translations
[Figure: translations among the models Entity Relationship, Relational, XSD, Object Oriented, WSM and Object Relational]

Slide 34
The constructs in the various models are rather similar: they can be classified into a few categories (metaconstructs). For example, the Entity of the ER model and the Object of the OO model can be traced back to the same abstract concept, the Abstract of our supermodel.
The metamodel approach

Slide 35 The supermodel
A model that includes all the metaconstructs (in their most general forms). Each model is subsumed by the supermodel (modulo construct renaming). Each schema for any model is also a schema for the supermodel (modulo construct renaming).

Slide 36 Translations specification
Translations can be defined on metaconstructs, and there are standard, accepted ways to deal with the translation of metaconstructs. They can be performed within the supermodel. Each translation from the supermodel SM to a target model M is also a translation from any other model to M.

Slide 37 Translation specification
Datalog is used to specify the translation. A translation script in our tool is a set of Datalog rules.

Slide 38 Datalog
A declarative language. We specify the condition for the insertion: for every set of constructs that matches the conditions in B, we create a new construct A.
    A ← B

Slide 39 Datalog rule example
We generate a new Abstract for each Aggregation:
    Abstract ( OID: SK1(oid), Name: name )
    ← Aggregation ( OID: oid, Name: name );

Slide 40 Another rule
We copy only the Lexicals of an Aggregation:
    Lexical ( OID: SK1(oid), aggregationOID: SK2(aggOID), Name: name, isIdentifier: isId, isNullable: isN, isOptional: isO, type: t )

Slide 41 Approach
Is it possible to apply the same approach to other model management operators? How can we define other operators with respect to our supermodel?
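
To make the reading of the rule on Slide 39 concrete, here is a minimal Python sketch of how such a rule could be evaluated. It is an illustration under assumptions, not MIDST code: constructs are plain dictionaries, and SK1 is modelled as a memoized function that returns a fresh identifier for each distinct argument and the same identifier when called again with the same argument. The `Skolem` class, the identifier format and the sample data are all hypothetical.

```python
from itertools import count


class Skolem:
    """Hypothetical Skolem functor: distinct arguments get distinct new OIDs,
    and repeated calls with the same arguments return the same OID."""

    def __init__(self, name):
        self.name = name
        self._ids = {}
        self._next = count(1)

    def __call__(self, *args):
        if args not in self._ids:
            self._ids[args] = f"{self.name}_{next(self._next)}"
        return self._ids[args]


SK1 = Skolem("SK1")


def aggregation_to_abstract(aggregations):
    """Abstract(OID: SK1(oid), Name: name) <- Aggregation(OID: oid, Name: name).

    For every Aggregation that matches the body, create one Abstract.
    """
    return [{"OID": SK1(a["OID"]), "Name": a["Name"]} for a in aggregations]


# Hypothetical source schema with two Aggregation constructs.
source = [{"OID": 1, "Name": "Project"}, {"OID": 2, "Name": "Manager"}]
target = aggregation_to_abstract(source)
again = aggregation_to_abstract(source)  # same OIDs: SK1 is a function
```

The memoization is what lets a second rule, such as the Lexical rule on Slide 40, refer to the construct generated for a given source OID simply by applying the same functor to the same argument.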

Slides 42-43 Construct characteristics
Every construct has:
- an identification (OID)
- a name
- a set of properties
- a set of references
Example:
    SM_Lexical ( OID: SK1(oid), aggregationOID: aggOID, Name: name, isIdentifier: isId, isNullable: isN, isOptional: isO, type: t )

Slide 44 Construct equivalence
Two constructs are equivalent if they:
- have the same name
- have the same set of properties
- refer to equivalent constructs

Slide 45 Comparison
The definition of equivalence is recursive. We can order the constructs and start the matching from the constructs without references.

Slide 46 Construct characteristics
The same characteristics (an identification OID, a name, a set of properties, a set of references) can also be found in the rules:
    SM_Lexical ( OID: SK1(oid), aggregationOID: SK2(aggOID), Name: name, isIdentifier: isId, isNullable: isN, isOptional: isO, type: t )

Slide 47 Example
An equivalence comparison may work as follows:
1. comparison of the aggregations or abstracts without any references;
2. comparison of the constructs which may refer to them.

Slide 48 Model management operators by examples
An example of a possible implementation of model management operators follows. The adopted language is Datalog. The tool is MIDST.

Slide 49 Datalog implementation of equivalence
Fundamental functional block to compare two constructs:
    EQUIV_Aggregation [DEST] ( OID1: oid1, OID2: oid2 )

Slide 50 Datalog implementation of difference - merge
Fundamental functional block used to implement a SELECTIVE COPY:
    SM_Aggregation ( OID: SK(oid), Name: name )

Slide 51 Automatic generation
These operators can be automatically generated by the MIDST application framework.
The constructs of the supermodel are used to generate the rules used for the matching. The order in which the rules are applied is important.

Slide 52 Example
S1 (ER): entities Project (PCode, Title) and Manager (SSN, EID, Name), linked by a relationship with cardinalities (1,1) and (0,N)
I1 (relational): Project (PCode, Title, MGRSSN*), Manager (SSN, EID, Name)

Slide 53 Example
I1: Project (PCode, Title, MGRSSN*), Manager (SSN, EID, Name)
I2: Project (PCode, Title, MGRID*), Manager (SSN, EID, Name, Degree)

Slide 54 Example
Step 1: difference between the implementation schemas.
1: DIFF(I1, I2) = G2-
G2-: Project (MGRSSN*), Manager (SSN, EID)

Slide 55 The Round-trip solving script

Slide 56 Example
Step 2: difference between the implementation schemas.
2: DIFF(I2, I1) = G2+
G2+: Project (MGRID*), Manager (SSN, EID, Degree)

Slide 57 The Round-trip solving script

Slide 58 Example
Step 3-4: inversion of the two semidifferences.
3: REVERSE and 4: REVERSE turn the relational schemas G2+ and G2- into the ER schemas S3+ and S3-.
G2+: Project stub (MGRID*), Manager stub (SSN, EID, Degree)
G2-: Project stub (MGRSSN*), Manager stub (SSN, EID)
S3+: Project stub and Manager stub (SSN, EID, Degree), cardinalities (1,1) and (0,N)
S3-: Project stub and Manager stub (SSN, EID), cardinalities (1,1) and (0,N)

Slide 59 The Round-trip solving script

Slide 60 Example
[Figure: 5: MERGE of S1 (Project with PCode and Title, Manager with SSN, EID and Name) and S3+ (Project stub, Manager stub with SSN, EID and Degree) yields H]
Step 5: merge of the initial specification schema with the inverted positive semidifference.
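
The semidifferences of Steps 1 and 2, and the effect of the merge, can be replayed with a small Python sketch. It deliberately simplifies the script: everything stays at the relational level, so the REVERSE steps are skipped and the merge is applied to I1 rather than to S1. A column is a (relation, name, is_key) triple, and the key flags are assumptions consistent with the key change described on Slide 27 (the slides do not list keys explicitly, and PCode is assumed to be the key of Project).

```python
from typing import NamedTuple


class Column(NamedTuple):
    relation: str
    name: str
    is_key: bool  # assumption: the only property we compare besides the name


def schema(*cols):
    """Build a schema as a set of columns."""
    return frozenset(Column(*c) for c in cols)


I1 = schema(
    ("Project", "PCode", True), ("Project", "Title", False),
    ("Project", "MGRSSN", False),
    ("Manager", "SSN", True), ("Manager", "EID", False),
    ("Manager", "Name", False),
)
I2 = schema(
    ("Project", "PCode", True), ("Project", "Title", False),
    ("Project", "MGRID", False),
    ("Manager", "SSN", False), ("Manager", "EID", True),
    ("Manager", "Name", False), ("Manager", "Degree", False),
)

g2_minus = I1 - I2   # Step 1: DIFF(I1, I2)
g2_plus = I2 - I1    # Step 2: DIFF(I2, I1)
h = I1 | g2_plus     # Step 5, done here on I1 instead of S1
revised = h - g2_minus  # final difference against the negative semidifference


def names(cols):
    """Readable view: sorted 'Relation.column' strings."""
    return sorted(f"{c.relation}.{c.name}" for c in cols)
```

In this simplified setting, `names(g2_minus)` lists Manager.EID, Manager.SSN and Project.MGRSSN, matching G2- on Slide 54: SSN and EID show up in both semidifferences because their key flag changed, not their name. Subtracting `g2_minus` from `h` gives back I2, which is the relational counterpart of the last DIFF step of the script.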
H: Project stub (PCode, Title), Manager stub (Name, SSN, EID, Degree, EID, SSN)

Slide 61 The Round-trip solving script

Slide 62 Example
[Figure: 6: DIFF of H and S3- (Project stub, Manager stub with SSN and EID) yields S2: Project (PCode, Title) and Manager (SSN, EID, Name, Degree), cardinalities (1,1) and (0,N)]
Step 6: difference between H and the inverted negative semidifference.

Slide 63 The Round-trip solving script

Slide 64 Demo

Slide 65 Properties
Model independence: MIDST handles schemas as instances of subsets of the available metaconstructs. The operators are defined as Datalog rules declaring transformations in terms of the supermodel metaconstructs. The operators are defined in such a way that they are valid for any model, by specifying comparisons between every available construct.

Slide 66 Properties
Model closure: a model management operator (except MODELGEN) applied to a set of input schemas of a model M yields output schemas of the same model M.
Model awareness: operators can be defined in such a way that they do not add metaconstructs which are not present in the source schemas.

Slide 67 References
[1] P. Atzeni, P. Cappellari and P.A. Bernstein. Modelgen: Model-independent schema translation. In ICDE Conference, pages 1111-1112, 2005.
[2] P. Atzeni, P. Cappellari and G. Gianforme. MIDST: model-independent schema and data translation. In SIGMOD, pages 1134-1136, ACM, 2007.
[3] P. Atzeni, P. Cappellari, R. Torlone, P.A. Bernstein and G. Gianforme. Model-independent schema translation. In VLDB Journal.
[4] P.A. Bernstein. Applying model management to classical meta data problems. In CIDR, pages 209-220, 2003.

Slide 68 Summary
- Model management
- Operators
- Model-generic operators
- Operators in MIDST
- Example