ANHAI DOAN ALON HALEVY ZACHARY IVES Chapter 6: General Schema Manipulation Operators PRINCIPLES OF DATA INTEGRATION

Outline

- Introduction to model management and motivation
- The merge operator
- The ModelGen operator
- The Invert operator

Model Management Operators

We saw operators for creating mappings between pairs of schemas.

But you can imagine other operators on schemas and mappings:
- Merge schemas, compose and invert mappings, translate schemas from one data model to another.

In fact, imagine an entire algebra of operators that apply to schemas and to mappings:
- Many common workflows can be formulated as a sequence of such operators [Bernstein, 2000].

Note: “model” = “schema”. More terminology coming soon.

Example of Model Management (1)

In a data integration scenario, you may proceed as follows, beginning with sources S1 and S2:
- Use a match operator to create a mapping between S1 and S2.
- Use merge to create a merged (mediated) schema of S1 and S2, together with mappings from S1 and S2 to it. Merge will create the minimal schema that includes both S1 and S2.

Example of Model Management (2)

Suppose we have another source S3, which is very similar to S1.

We could first use match to create a mapping from S1 to S3.

Then use compose, chaining that mapping with the mapping between S1 and the mediated schema G, to create a mapping from S3 to G.
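The compose step can be sketched in Python as follows. This is a toy illustration under strong assumptions, not the book's algorithm: a mapping is modeled as a plain dict of attribute correspondences, the match result is oriented from S3 to S1 so that it can be chained, and every schema and attribute name is hypothetical.

```python
# Toy model: a mapping is a dict from attribute names in one schema
# to attribute names in another. All names below are hypothetical.

def compose(first, second):
    """Compose two correspondence mappings: follow `first`, then `second`.

    Attributes with no counterpart in `second` are dropped.
    """
    return {src: second[mid] for src, mid in first.items() if mid in second}


# Assumed output of match between S3 and S1 (oriented from S3 to S1).
s3_to_s1 = {"name": "title", "yr": "year", "rating": "stars"}

# Assumed existing mapping from S1 to the mediated schema G.
s1_to_g = {"title": "movie_title", "year": "release_year"}

s3_to_g = compose(s3_to_s1, s1_to_g)
print(s3_to_g)  # {'name': 'movie_title', 'yr': 'release_year'}
```

The attribute `rating` has no image in G here, so it is silently dropped; a real compose operator works on mapping formulas rather than on simple attribute pairs.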

Operators

- Match: see previous chapters.
- Merge: create a merged schema of S1 and S2 w.r.t. a mapping M12.
- ModelGen: create an equivalent model, but in a different data model (e.g., relational → XML).
- Invert: given M12, create M21.
- Diff: find the difference between two models (see bibliography).

Some Terminology

Model: a specific description of a set of data in a given data model.

Meta-model: a data model, such as relational schema, XML DTD, Java class definitions, …

Meta-meta-model: a generic language that is independent of a particular meta-model. Usually some graph-based formalism.

The Merge Operator

Given:
- Two models, M1 and M2
- A mapping from M1 to M2

Create:
- A merged model M12 that contains only the information in M1 and M2, but does not repeat information that is in both
- Mappings from M1 and M2 to M12

Challenge to many model management operators: can you develop algorithms that are generic, i.e., not specific to particular data models?
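A minimal sketch of the input/output contract above, assuming a model is just a set of attribute names and the input mapping is a dict of one-to-one correspondences from M1 to M2. Real merge algorithms work on richer (usually graph-based) representations; the function and attribute names here are hypothetical.

```python
def merge(m1, m2, mapping):
    """Merge two toy models (sets of attribute names).

    `mapping` maps attributes of m1 to the attributes of m2 that
    represent the same information. Returns the merged model plus
    the mappings from m1 and from m2 into it.
    """
    # A matched m2 attribute is represented by its m1 counterpart.
    m2_to_m1 = {b: a for a, b in mapping.items()}

    m1_to_m12 = {a: a for a in m1}
    m2_to_m12 = {b: m2_to_m1.get(b, b) for b in m2}

    m12 = set(m1_to_m12.values()) | set(m2_to_m12.values())
    return m12, m1_to_m12, m2_to_m12


m1 = {"name", "phone"}
m2 = {"full_name", "email"}
m12, m1_to_m12, m2_to_m12 = merge(m1, m2, {"name": "full_name"})
print(sorted(m12))  # ['email', 'name', 'phone']
```

The sketch assumes that unmatched attributes with the same name in both models never mean different things; handling such clashes is part of what makes a real merge hard.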

Merge Challenges: Example

Challenge 1: different attribute representations. Resolution should be part of the input mappings.

Merge Challenges: Example

Challenge 2: merging models expressed in different data models. (What if one data model supports sub-attributes and another doesn’t?) See ModelGen.

Merge Challenges: Example

Challenge 3: “fundamental conflicts”. Zipcode is an integer in one model and a string in another. The merged model cannot have both. Solutions depend on the particular conflict and the data models involved.
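Detecting such a conflict is easier than resolving it. The sketch below assumes toy models given as dicts from attribute name to a type name, plus a dict of correspondences; it only reports conflicting pairs and does not pick a resolution. The function and attribute names are hypothetical.

```python
def find_type_conflicts(m1, m2, mapping):
    """Return matched attribute pairs whose declared types disagree.

    m1, m2: dicts from attribute name to a type name (toy models).
    mapping: dict from m1 attribute names to m2 attribute names.
    """
    return [
        (a, m1[a], b, m2[b])
        for a, b in mapping.items()
        if m1[a] != m2[b]
    ]


m1 = {"zipcode": "integer", "city": "string"}
m2 = {"zip": "string", "town": "string"}
conflicts = find_type_conflicts(m1, m2, {"zipcode": "zip", "city": "town"})
print(conflicts)  # [('zipcode', 'integer', 'zip', 'string')]
```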

The ModelGen Operator

Transform a schema from one meta-model (e.g., Java object model, relational, XML) to another meta-model.

Main challenge: features that exist in the source meta-model may not exist in the target (e.g., sub-classes and inheritance).

ModelGen is needed very often in practice, and it is used by several of the other operators.

ModelGen Example

Java classes → relational tables

No classes or inheritance in the relational model

ModelGen Strategy

Possible to design specific transformations from one meta-model to another, but we want a generic approach.

Design a super meta-model that has (almost) all features that exist in the meta-models.

The super meta-model knows which features are present in each meta-model.

The algorithm will translate a given model into the super meta-model and from there to the target meta-model.

ModelGen Algorithm

Input: a model M1 in meta-model MM1.

Output: a model M2 in meta-model MM2 that is equivalent to M1.

1. Transform M1 to the super meta-model, yielding M’.
2. While M’ includes features that are not present in MM2, apply transformations to remove these features (e.g., remove a class hierarchy by translating it to multiple vertically partitioned tables).
3. Transform M’ into M2.
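The feature-removal step can be sketched for the class-hierarchy example. This is only one possible transformation, under the assumption that the super meta-model instance is a list of simple class descriptions; `ClassDef`, `remove_inheritance`, and the `id` key column are hypothetical names, not part of any ModelGen implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassDef:
    name: str
    attributes: list
    parent: Optional[str] = None


def remove_inheritance(classes, key="id"):
    """Translate a class hierarchy into vertically partitioned tables.

    Each class becomes one table holding the shared key plus only the
    attributes the class itself declares; inherited attributes stay in
    the ancestor's table and are reached by joining on the key.
    """
    tables = {}
    for c in classes:
        tables[c.name] = {
            "columns": [key] + list(c.attributes),
            # A subclass row points at the matching row of its parent.
            "foreign_key": (key, c.parent) if c.parent else None,
        }
    return tables


classes = [
    ClassDef("Person", ["name"]),
    ClassDef("Student", ["major"], parent="Person"),
]
tables = remove_inheritance(classes)
for table_name, table in tables.items():
    print(table_name, table)
```

A Student is then reconstructed by joining the Student and Person tables on `id`, which is how the relational model stands in for the missing inheritance feature.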

The Invert Operator

Schema mappings are often directional: they map data in a source schema into a target schema.

Natural question: can we find an inverse mapping?

But what is the right definition of inverse? We’ll see a couple of failed attempts before we see a good one.

Note: the algorithms here are not generic. They are highly dependent on the meta-model.

Invert Definition: Attempt 1

Given a mapping M between a source S and a target T, M defines a relation between pairs of instances (I, J) that are consistent with each other: I is an instance of S, and J is an instance of T.

Hence, a natural definition is: M⁻¹ should define the relation containing (J, I) whenever (I, J) is in M.

However, inverses defined this way will not be expressible with tuple-generating dependencies (TGDs) or GLAV mappings.

Why? See next slide.

Attempt #1 Problem Explained

Any relation defined by TGDs is closed up on the right and closed down on the left.

Formally, if (I, J) is in M, I’ is a subset of I, and J is a subset of J’, then (I’, J’) is also in M.

However, by definition, the inverse of M would have to be closed up on the left and closed down on the right. Hence, it cannot be defined with TGDs or GLAV mappings.
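The closure argument can be checked by brute force on a tiny case. The sketch assumes a single hypothetical TGD, P(x) → Q(x), and restricts all instances to subsets of a two-element domain, so a source or target instance is just a set of values.

```python
from itertools import combinations

DOMAIN = (1, 2)


def subsets(values):
    """All subsets of `values`, as frozensets."""
    return [frozenset(c) for r in range(len(values) + 1)
            for c in combinations(values, r)]


INSTANCES = subsets(DOMAIN)

# Relation defined by the TGD P(x) -> Q(x): every P-value must be in Q.
M = {(i, j) for i in INSTANCES for j in INSTANCES if i <= j}


def closed_down_left_up_right(relation):
    """Check: (a, b) in relation, a2 <= a, b <= b2 implies (a2, b2) in relation."""
    return all((a2, b2) in relation
               for (a, b) in relation
               for a2 in INSTANCES if a2 <= a
               for b2 in INSTANCES if b <= b2)


# Attempt 1: the inverse is simply the flipped relation.
M_flipped = {(j, i) for (i, j) in M}

print(closed_down_left_up_right(M))          # True
print(closed_down_left_up_right(M_flipped))  # False
```

The flipped relation fails the property that every TGD-defined relation satisfies, which is the reason it cannot itself be expressed with TGDs.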

Invert Definition: Attempt 2

Definition by composition: M composed with M’ should be the identity mapping!

However, it can be shown that under that condition, a mapping has an inverse only if the following holds:
- If I1 and I2 are two distinct instances of S, then their targets under M should be distinct instances of T.

The above result considerably limits the mappings that have inverses. The slide’s example mappings m1 and m2 (not reproduced in this transcript) won’t have inverses.
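To see how easily the condition fails, consider a hypothetical projection mapping P(x, y) → Q(x); it is not one of the slide's m1 or m2, which this transcript does not show. The sketch enumerates target instances over a two-element domain and compares the targets of two distinct source instances.

```python
from itertools import combinations

DOMAIN = (1, 2)


def subsets(values):
    """All subsets of `values`, as frozensets."""
    return [frozenset(c) for r in range(len(values) + 1)
            for c in combinations(values, r)]


TARGETS = subsets(DOMAIN)


def solutions(source):
    """Target instances consistent with `source` under P(x, y) -> Q(x)."""
    projected = {x for (x, _y) in source}
    return {j for j in TARGETS if projected <= j}


i1 = frozenset({(1, 1)})
i2 = frozenset({(1, 2)})

print(i1 != i2)                        # True
print(solutions(i1) == solutions(i2))  # True
```

Two distinct source instances share exactly the same targets, so under the composition-based definition this projection has no inverse: the y-values are lost and cannot be recovered.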

Third Time’s a Charm: Quasi-inverses

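Define equivalence between two instances I1 and I2 of S w.r.t. M as: I1 ~ I2 if exactly the same target instances J are consistent with I1 and with I2 under M.

Define M’ to be the quasi-inverse of M if the composition of M and M’ always maps I to an instance I’ such that I ~ I’.

Example (the mappings m and m’ are not reproduced in this transcript):

So m is a quasi-inverse of m’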
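A rough intuition for quasi-inverses, as a sketch under stated assumptions rather than the slide's own example: take a hypothetical forward mapping P(x, y) → Q(x) and a candidate backward mapping Q(x) → ∃y P(x, y). Equivalence is assumed to mean "same set of consistent target instances", which for this forward mapping reduces to "same projection on x". The code checks only the canonical round trip obtained by chasing, not every instance the composition allows.

```python
from itertools import count


def chase_forward(source):
    """Apply P(x, y) -> Q(x): the canonical target is the projection on x."""
    return {x for (x, _y) in source}


def chase_backward(target):
    """Apply Q(x) -> exists y. P(x, y), inventing a fresh labeled null per y."""
    fresh = count(1)
    return {(x, f"N{next(fresh)}") for x in sorted(target)}


def equivalent(i1, i2):
    """I1 ~ I2 for this forward mapping: same projection on x."""
    return chase_forward(i1) == chase_forward(i2)


i = {(1, "a"), (2, "b"), (2, "c")}
i_round_trip = chase_backward(chase_forward(i))

print(sorted(i_round_trip))         # [(1, 'N1'), (2, 'N2')]
print(i_round_trip == i)            # False
print(equivalent(i, i_round_trip))  # True
```

The round trip does not return the original instance, so the backward mapping is not an exact inverse, but it does return an equivalent one, which is the relaxation that quasi-inverses allow.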

Summary of Chapter 6

Generic model management operators save a lot of repetitive code and can result in several forms of efficiency gains.

Employing such operators also ensures that application builders think carefully about the meaning of what they are doing.

Two main open challenges:
- Can the implementation of these operators be described in a meta-model independent fashion?
- Is model management a system in itself that should be built, or should operator implementations be individual services?