© 2001 Microsoft Corp. 1 Generic Model Management A Database Infrastructure for Schema Manipulation Philip A. Bernstein Microsoft Corporation September 6, 2001

Generic Model Management

A Database Infrastructure for

Schema Manipulation

Philip A. Bernstein

Microsoft Corporation

September 6, 2001

The Problem

• There is 30 years of DB research on meta data

• But we don’t have great infrastructure to offer
– Most design tools and web services store meta data in files, not DBs
– OODBMSs are not a huge success
– Most meta data driven tools use their own infrastructure

• Goal: generic meta data manipulation infrastructure
– Reduce the amount of programming required to build meta data driven applications.

• Proposal: Model Management
– Define an algebra to manipulate meta data in large chunks, called models and mappings.

Outline

• Overview of Model Management

• Solutions to classical meta data problems

• Recent technical results

Models and Mappings

• Model – a complex information structure
– XML schema, SQL schema, OO interface, UML model, web site map, make script, …

• Mapping – a transformation from one model into another
– Map between two XML schemas
– Map a SQL schema to an XML schema
– Map data sources to a data warehouse
– Map an ER diagram to a SQL schema
– Map a process defn to a workflow script

Representation

• A model is a directed graph with one root.

• A mapping is a model each of whose nodes connects nodes of two other models.

[Figure: map1 connects a relational schema Emp (E#, Dept#, Name) to an XSD Emp (E#, Dept#, Name with children First and Last).]
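One possible way to hold these two structures in code is sketched below. The class names `Model` and `Mapping` and their fields are illustrative assumptions, not part of the talk, and the mapping is simplified to a list of node pairs rather than a full graph of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A rooted directed graph, stored as parent -> children edges."""
    root: str
    edges: dict[str, list[str]] = field(default_factory=dict)

    def nodes(self) -> list[str]:
        """Return all nodes reachable from the root, depth-first."""
        seen, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.append(node)
                # Reverse so children are visited in declaration order.
                stack.extend(reversed(self.edges.get(node, [])))
        return seen


@dataclass
class Mapping:
    """Simplified mapping: each pair connects a node of `left` to a node of `right`."""
    left: Model
    right: Model
    pairs: list[tuple[str, str]] = field(default_factory=list)


# The two schemas from the slide (node names taken from the figure).
relational = Model("Emp", {"Emp": ["E#", "Dept#", "Name"]})
xsd = Model("Emp", {"Emp": ["E#", "Dept#", "Name"], "Name": ["First", "Last"]})
map1 = Mapping(relational, xsd,
               [("Emp", "Emp"), ("E#", "E#"), ("Dept#", "Dept#"), ("Name", "Name")])
```

Keeping the mapping as a separate object, rather than as annotations on either model, is what lets the later operators take and return mappings as values.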

Model Management Algebra

• Match

• Merge

• Compose

• Select

• Diff

• Enumerate

• ApplyFunction

• Copy

• Update operations

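map = Match(M1, M2, )

• Match(M1, M2, ) returns the best mapping between M1 and M2, w.r.t. its third argument

[Figure: map1 relates M1 = Emp (E#, Dept#, Name, Addr) to M2 = Emp (E#, Dept#, Name with children First and Last, Phone); two mapping elements are labeled "=".]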
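As a rough illustration only, the simplest conceivable matcher pairs elements whose names are equal. The function names and the dict-based model encoding below are assumptions made for this sketch; a real Match would use far richer evidence than names.

```python
def node_names(model):
    """Collect every node name in a parent -> children dict."""
    names = set(model)
    for children in model.values():
        names.update(children)
    return names


def match_by_name(m1, m2):
    """Pair nodes of m1 and m2 whose names are equal, ignoring case."""
    index = {name.lower(): name for name in node_names(m2)}
    return {(n, index[n.lower()]) for n in node_names(m1) if n.lower() in index}


# M1 and M2 from the slide.
m1 = {"Emp": ["E#", "Dept#", "Name", "Addr"]}
m2 = {"Emp": ["E#", "Dept#", "Name", "Phone"], "Name": ["First", "Last"]}
map1 = match_by_name(m1, m2)
```

Addr and Phone stay unmatched here, as do First and Last, which is the kind of case where a name-only matcher runs out of evidence.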

M3 = Merge(M1, M2, map)

• Return the union of models M1 and M2
– Use map to guide the Merge
– If elements x = y in map, then collapse them into one element

[Figure: Emp (Addr, Name) and Emp (Name, Phone), related by map, merge into Emp (Name, Phone, Addr).]
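The union-and-collapse behaviour can be sketched as follows. This is a minimal sketch under two assumptions that are not in the talk: models are parent -> children dicts keyed by element name, and unmapped elements have distinct names in the two models.

```python
def merge(m1, m2, mapping):
    """Union two models, collapsing each mapped pair into one element."""
    rename = {right: left for left, right in mapping}  # m2 name -> m1 name
    merged = {parent: list(children) for parent, children in m1.items()}
    for parent, children in m2.items():
        target = merged.setdefault(rename.get(parent, parent), [])
        for child in children:
            name = rename.get(child, child)
            if name not in target:  # mapped elements are already present
                target.append(name)
    return merged


# The two Emp models from the slide; map says Emp = Emp and Name = Name.
m1 = {"Emp": ["Addr", "Name"]}
m2 = {"Emp": ["Name", "Phone"]}
m3 = merge(m1, m2, {("Emp", "Emp"), ("Name", "Name")})
```

The result contains Addr, Name, and Phone under a single Emp, matching the merged model in the figure up to child order.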

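Left Composition ( f • )

mapC = mapA f• mapB

[Figure: mapA (elements a1, a2, a3) connects M1 to M2, and mapB (elements b1, b2, b3) connects M2 to M3; the composed mapC (elements c1, c2, c3) connects M1 directly to M3. M1 is Emp with Addr (Street, City), M2 is Emp with Street and City, and M3 is Emp with StAddr and Town; Name nodes also appear in the figure.]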
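The text of the slide does not spell out the semantics of the left-composition operator, so the sketch below shows only ordinary relational composition of two mappings held as sets of pairs. The element pairings are assumed from the node names in the figure.

```python
def compose(map_a, map_b):
    """Return (x, z) whenever (x, y) is in map_a and (y, z) is in map_b."""
    by_source = {}
    for y, z in map_b:
        by_source.setdefault(y, set()).add(z)
    return {(x, z) for x, y in map_a for z in by_source.get(y, ())}


# Assumed pairings: mapA is M1 -> M2, mapB is M2 -> M3.
map_a = {("Emp", "Emp"), ("Street", "Street"), ("City", "City")}
map_b = {("Emp", "Emp"), ("Street", "StAddr"), ("City", "Town")}
map_c = compose(map_a, map_b)
```

Elements of M1 with no counterpart in M2 simply drop out of the result, which is one of the places where a left or right variant of composition could reasonably behave differently.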

Model Management Algebra

• map = Match (M1, M2, )

• M3 = Merge (M1, M2, map)

• map3 = Compose(map1, map2)

• M2 = Select(M1, pred)

• M2 = Diff(M1, map)

• list = Enumerate(M)

• ApplyFunction(M, f )

• M2 = Copy(M1)

• Update operations

They’re generic = data model independent … well … implemented on an extended ER model with an extensibility story

Example

• Given
– map1 from SQL schema rdb1 to xsd1
– xsd2, which is similar to xsd1

• Produce
– a map between xsd2 and a relational schema.

1. map2 = Match(xsd1, xsd2)

2. map3 = map1 • map2

3. <map4, rdb2> = Copy(map3)

4. Use ApplyFunction(map4) to map each x in Diff(xsd2, map4) into rdb2

[Figure: map1 connects rdb1 to xsd1, map2 connects xsd1 to xsd2, map3 connects rdb1 to xsd2, and map4 connects rdb2 to xsd2.]
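Step 4 can be pictured with a small sketch. The data below is hypothetical, the helper `diff` is an assumed simplification of Diff that returns the nodes a mapping does not cover, and the loop stands in for the function passed to ApplyFunction.

```python
def diff(nodes, mapping):
    """Return the nodes that no pair in `mapping` refers to on its right side."""
    mapped = {right for _, right in mapping}
    return [n for n in nodes if n not in mapped]


# Hypothetical state after step 3: rdb2 is a copy of rdb1, and map4 pairs
# each rdb2 node with an xsd2 node.
xsd2 = ["Emp", "E#", "Dept#", "Name", "Phone"]
rdb2 = ["Emp", "E#", "Dept#", "Name"]
map4 = {(n, n) for n in rdb2}

# Step 4: give every unmapped xsd2 element a counterpart in rdb2.
for x in diff(xsd2, map4):
    rdb2.append(x)
    map4.add((x, x))
```

After the loop every xsd2 element participates in map4, so a second Diff returns nothing.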

Theme

• Classic meta data problems can be solved using Model Management operations
– Schema integration
– Schema evolution
– Data migration
– Reverse engineering
– Data reintegration (3-way merging)

• Published solutions to these problems help us produce generic implementations of model mgmt operations

Outline

• Overview of Model Management

• Solutions to classical meta data problems
– Schema integration
– Schema evolution
– Reverse engineering
– Data reintegration (3-way merging)
– Data migration

• Recent technical results

Schema Integration

• Given
– two view schemas, V1 and V2

• Produce
– an integrated schema, S

1. map = Match(V1, V2)

2. S = Merge(V1, V2, map)

3. ApplyFunction(S) // to resolve conflicts in S, producing S

[Figure: V1 and V2 are connected by map and merged into S.]

1. map = Match(V1, V2)

2. S = Merge(V1, V2, map)

3. Use ApplyFunction(S) to resolve conflicts, producing S

[Figure: worked example of the three steps. V1 is Emp with E#, Dept#, Addr, and Name; V2 is Emp with E#, Dept#, Phone, FirstName, and LastName; the merged schema S contains E#, Dept#, Addr, Phone, Name, FirstName, and LastName.]

Merging Knowledge Bases (Ontologies)

• Same as schema integration, but applied to ontologies

• The literature on merging ontologies focuses mostly on Match.

Schema Evolution

• Given
– mapSV from schema S to view V
– a modified version S' of S

• Produce
– a mapping mapS'V from S' to V (i.e. a view defn for V over S').

1. mapS'S = Match(S', S)

2. mapS'V = mapS'S • mapSV

3. Use ApplyFunction(V) to delete elements not derivable from S'

[Figure: S' is connected to S by mapS'S, S to V by mapSV, and S' to V by mapS'V.]

Outline

• Overview of Model Management

• Solutions to classical meta data problems
– Schema integration
– Schema evolution
– Reverse engineering
– Data reintegration (3-way merging)
– Data migration

• Recent technical results

Reverse Engineering

• Given
– Model M (e.g., an ER model)
– Model G (e.g., SQL) generated via mapMG from M
– A modified version G' of G

• Produce
– A modified version M' of M that generates G'

1. mapGG' = Match(G, G')

2. mapMG' = mapMG • mapGG'

3. <M', mapM'G'> = Copy(mapMG')

4. Use ApplyFunction(mapM'G') to reverse engineer each g in Diff(G', mapM'G') into M'

[Figure: M is connected to G by mapMG, G to G' by mapGG', M to G' by mapMG', and M' to G' by mapM'G'.]

3-Way Merge (aka Reintegration)

• Given
– a source schema S0
– two derived schemas S1 and S2

• Produce
– a schema S3 that merges the changes of S1 and S2

In the script, O, A, and B denote S0, S1, and S2.

1. MapOA = Match(O, A) (based on OIDs)
2. MapOB = Match(O, B) (based on OIDs)
3. MapOA = ApplyFunction(MapOA) such that for each e ∈ MapOA, if domain(e) = range(e), then delete e (i.e. things changed in A)
4. MapOB = ApplyFunction(MapOB) such that for each e ∈ MapOB, if domain(e) = range(e), then delete e (i.e. things changed in B)
5. ChangedA = range(MapOA)
6. ChangedB = range(MapOB)
7. MapChAChB = Match(ChangedA, ChangedB)
8. MapChBChA = invert(MapChAChB)
9. A = Diff(ChangedA, ChangedB, MapChAChB) (changed in A but not changed in B)
10. B = Diff(ChangedB, ChangedA, MapChBChA)
11. MapAB = Match(A, B) (by OIDs)
12. G = Merge(A, B, MapAB)
13. MapGA = Match(G, A)
14. GA = Merge(G, A, MapGA) with preference for A
15. MapGAB = Match(GA, B)
16. GAB = Merge(GA', B', MapGA'B') with preference for B
17. DeletedA = Diff(O, A, MapOA)
18. DeletedB = Diff(O, B, MapOB)
19. MapDeletedAChangedB = Match(DeletedA, ChangedB)
20. MapDeletedBChangedA = Match(DeletedB, ChangedA)
21. ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB)
22. ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA)
23. MapGABSDA = Match(GAB, ShouldDeleteA)
24. GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA)
25. MapGABSDASDB = Match(GABSDA, ShouldDeleteB)
26. Final result = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB)

[Figure: S1 and S2 are both derived from S0 and are merged into S3.]
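The policy behind the script can be illustrated on a much simpler representation. The sketch below is not the 26-step script: it assumes each schema is a dict from OID to element value, uses `None` for "absent", keeps a change made on one side, applies a deletion only when the other side left the element unchanged, and lets B win when both sides changed the same element differently.

```python
def three_way_merge(base, a, b):
    """Merge the changes of `a` and `b` relative to their common source `base`."""
    merged = {}
    for oid in base.keys() | a.keys() | b.keys():
        o, x, y = base.get(oid), a.get(oid), b.get(oid)
        if x == y:
            value = x  # both sides agree, including both deleting
        elif x == o:
            value = y  # only B changed (or deleted) the element
        elif y == o:
            value = x  # only A changed (or deleted) the element
        else:
            # Both changed: a change beats a deletion; otherwise prefer B.
            value = y if y is not None else x
        if value is not None:
            merged[oid] = value
    return merged


# Hypothetical schemas keyed by OID.
s0 = {1: "Emp", 2: "Name", 3: "Addr"}
s1 = {1: "Employee", 2: "Name"}                          # renames 1, deletes 3
s2 = {1: "Emp", 2: "FullName", 3: "Addr", 4: "Phone"}    # renames 2, adds 4
s3 = three_way_merge(s0, s1, s2)
```

Relying on OIDs is what makes the per-element comparison trivial here; the slide's script makes the same assumption in steps 1, 2, and 11.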

Data Migration

• Given
– a schema S and its database D
– an evolved schema S'

• Produce
– a procedure for mapping D into an S' database D'

1. mapSS' = Match(S, S')

2. Use Enum(S) to generate a data migration script

[Figure: S and S' are related by mapSS'; Enum generates a migration script, which is run on D to produce D'.]
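To make step 2 concrete, here is a minimal sketch of turning a column-level mapping into a migration statement. The function name, the table and column names, and the choice of a single SQL `INSERT ... SELECT` are all assumptions for illustration; the talk does not say what form the generated script takes.

```python
def migration_script(mapping, source_table, target_table):
    """Emit one INSERT ... SELECT copying each mapped column from S to S'."""
    pairs = sorted(mapping)  # deterministic column order
    target_cols = ", ".join(new for _, new in pairs)
    source_cols = ", ".join(old for old, _ in pairs)
    return (f"INSERT INTO {target_table} ({target_cols}) "
            f"SELECT {source_cols} FROM {source_table};")


# Hypothetical mapSS': pairs of (column in S, column in S').
map_ss = {("emp_no", "id"), ("name", "full_name")}
script = migration_script(map_ss, "emp_old", "emp_new")
```

Columns of S with no counterpart in S' are simply not copied, and new columns of S' are left to their defaults.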

Data Translation

• Like data migration, except S and S' are expressed in different data models.

Outline

• Overview of Model Management

• Solutions to classical meta data problems

• Recent technical results

Status Report

• Vision
– [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]

• Data Warehouse Examples
– [Bernstein & Rahm, ER ’00]

• Match Operation
– Survey: [Rahm & Bernstein, MSR Tech Report]
– Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01]

• Merge Operation
– coming soon …

• Theory
– [Alagić & Bernstein, DBPL ’01]

Schema Matching Approaches

• About a dozen published algorithms.

• Many good ideas, but none are robust.

[Figure: taxonomy of matching approaches]

• Individual matchers
– Schema-based, Per-Element: Linguistic (names, descriptions); Constraint-based (types, keys)
– Schema-based, Structural: Constraint-based (graph matching)
– Content-based, Per-Element: Linguistic (IR: word frequencies, key terms); Constraint-based (value pattern and ranges)

• Combined matchers
– Hybrid
– Composite: manual composition, automatic composition

The CUPID Algorithm

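• Computes linguistic similarity of element pairs

• Computes structural similarity of element pairs

• Generates a mapping

[Figure: two purchase-order schemas being matched. PO has children POShipTo and POBillTo, each with City and Street; PurchaseOrder has children DeliverTo and InvoiceTo, each with an Address containing City and Street. An "ssim++" annotation appears on the figure.]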
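The three bullets can be illustrated with a toy scorer. This is a sketch, not the CUPID implementation: the token-overlap measure for linguistic similarity, the leaf-coverage measure for structural similarity, the 0.5 threshold, and the equal weighting are all assumptions chosen to keep the example short.

```python
import re


def tokens(name):
    """Split a name such as 'POShipTo' into lowercase tokens: po, ship, to."""
    return {t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)}


def lsim(a, b):
    """Linguistic similarity: Jaccard overlap of the name tokens."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def ssim(leaves_a, leaves_b, threshold=0.5):
    """Structural similarity: share of leaves with a similar leaf on the other side."""
    if not leaves_a or not leaves_b:
        return 0.0
    hits_a = sum(any(lsim(x, y) >= threshold for y in leaves_b) for x in leaves_a)
    hits_b = sum(any(lsim(x, y) >= threshold for x in leaves_a) for y in leaves_b)
    return (hits_a + hits_b) / (len(leaves_a) + len(leaves_b))


def wsim(a, leaves_a, b, leaves_b, w_struct=0.5):
    """Weighted combination of structural and linguistic similarity."""
    return w_struct * ssim(leaves_a, leaves_b) + (1 - w_struct) * lsim(a, b)


# Two elements from the figure whose names differ but whose leaves agree.
score = wsim("POShipTo", ["City", "Street"], "DeliverTo", ["City", "Street"])
```

The names share only the token "to", so the linguistic score is low, but the identical leaves pull the combined score up. A mapping would then be generated by keeping the pairs whose combined score clears some cut-off.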

M3 = Merge(M1, map, M2)

• [Buneman, Davidson, Kosky, EDBT 92]
– Meta-model has aggregation & generalization only
– Do a union and collapse objects having the same name
– Fix-up step for inconsistencies created by merging
– Successive fixups lead to different results
– Batch them at the end, to produce a unique minimal result

[Figure: merge example over nodes X, Y, Z, and W with attribute a.]

• Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)

A Formal Semantics for Model Mgt

• Use category theory for a data-model-independent characterization of models and mappings

• Models and their DBs are categories

• Model and data transformations are morphisms

• Mappings between models & data are functors

• Utility

– Define formal semantics for Match and Merge

– Explain when Match & Merge preserve constraints.

– Check that implementation satisfies the semantics

Theory

• Goal – a mathematical semantics of MM algebra

[Figure: diagram of categories and a functor Db. Labels: schemas Sch1, Sch2, Schm, Sch12; morphisms f, g, p, q; categories Db(Sch1), Db(Sch2), Db(Sch12); operations Match and Merge.]

Implementation Vision

[Figure: architecture diagram. Components: Generic Tools (Browser, Import/export, Scripting, Editors, Catalogs); Model-Driven UI Generator; Model Manager with operations Match, Merge, Compose, Copy, Apply, …; Operation Specializations; Inferencing Engine; MM Meta-Model; OR Mapper; Object-Oriented Repository; SQL DBMS. Sample model elements shown: Order Entry, Authorize Credit, Schedule Delivery, Bill Customer, Update Marketing, Inventory, Customer, Order, Scheduled Delivery, Product, Salesperson.]

Related Work• There’s a lot of it. Apply it to model management!

• Platforms – OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)

• Inferencing on mappings – AQUV, description logic

• Transitive closure and recursive QP

• Differencing – text, trees, graphs

• Data translation – algebras, schema evolution

• Data integration – schema match, view generation

Summary

• Raise the level of abstraction of meta-data programming by using:
– models and mappings as objects
– an algebra that manipulates models and mappings on a generic meta-model

• Classical meta data problems can be expressed using this algebra

• Implementations of classic problems offer guidance on implementing the algebra

References

• http://www.research.microsoft.com/~philbe

• P. Bernstein & E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference

• P. Bernstein, A. Levy, R. Pottinger, “A Vision for Management of Complex Models”, SIGMOD Record, Dec. 2000

• E. Rahm, P. Bernstein, “On Matching Schemas Automatically,” MSR Tech Report

• J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001

• S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001