
Generic Model Management:
A Database Infrastructure for Schema Manipulation

Philip A. Bernstein
Senior Researcher
Database Research
Microsoft Corporation

Meta Data Management

Meta data = structural information
• DB schemas, interface definitions, web site maps, form definitions, …

[Figure: examples of meta data: table definitions, C++ interfaces, a UML architecture, VB interfaces, an ER diagram (Customer, Order, ScheduledDelivery, Product, Salesperson), forms, a business process (BillCustomer, UpdateMarketing, Inventory, AuthorizeCredit, OrderEntry, ScheduleDelivery), and business rules (e.g., Emp.Sal < Emp.Mgr.Sal)]

Meta Data Problems
• They all involve schemas and mappings
• E.g., data translation between data models

[Figure: a hierarchical schema PO(PO#, POdate, POLines(Prod#, PName)) and a relational schema PurchaseOrder(OrdID, OrderDate), Items(OrdID, Item#, I_Name)]

Such Problems are Pervasive
• Data translation
• Schema evolution
• XML message translation for e-commerce
• Integrate custom apps with commercial apps
• Data warehouse loading (clean & transform)
• Design tool support (DB, UML, …)
• OO or XML wrapper generation for SQL DBs

Meta Data Solutions
They strongly resemble one another, but …
• They are usually problem-specific
• They are usually language-specific (SQL, ODMG, UML, XML, RDF, …)
• They usually involve a lot of object-at-a-time programming

Goals
• Generic solutions
• “Set”-at-a-time programming

Model Management
A generic approach to meta data management
• Model management operators manipulate models and mappings as bulk objects
  - Their representation is generic
  - Operators: Match, Merge, Diff, Compose
• Avoids problem-specific and language-specific solutions
• Avoids object-at-a-time programming

Models and Mappings
A model is a directed graph with one root, which represents a complex information structure.

[Figure: a relational schema Emp(E#, Dept#, Name) and an XSD Emp(E#, Dept#, Name(First, Last)), related by the mapping map1]

A mapping represents a transformation. It could be a binary table.


Or it could be a model.
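As a rough illustration, the two bulk objects might be represented as follows. This is a minimal sketch, assuming a model is stored as a dictionary from each element to its children (a rooted directed graph) and a mapping as a set of element pairs (a binary table). The qualified names such as `Emp.Name.First` and the exact pairs in `map1` are illustrative assumptions, not taken from any prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A model as a rooted directed graph: each element maps to its child elements."""
    root: str
    edges: dict[str, list[str]] = field(default_factory=dict)

    def elements(self) -> set[str]:
        """All elements reachable from the root (iterative depth-first search)."""
        seen, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, []))
        return seen


# A mapping as a binary table: (element of the first model, element of the second model).
Mapping = set[tuple[str, str]]

relational = Model("RelationalSchema", {
    "RelationalSchema": ["Emp"],
    "Emp": ["Emp.E#", "Emp.Dept#", "Emp.Name"],
})
xsd = Model("XSD", {
    "XSD": ["Emp"],
    "Emp": ["Emp.E#", "Emp.Dept#", "Emp.Name"],
    "Emp.Name": ["Emp.Name.First", "Emp.Name.Last"],
})

# Assumed map1: the relational Name column corresponds to both parts of the XSD Name.
map1: Mapping = {
    ("Emp", "Emp"),
    ("Emp.E#", "Emp.E#"),
    ("Emp.Dept#", "Emp.Dept#"),
    ("Emp.Name", "Emp.Name.First"),
    ("Emp.Name", "Emp.Name.Last"),
}
```

Because a mapping is just a set of pairs, operators over it can work on the whole table at once rather than one object at a time.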

Model Mgmt Algebra
• map = Match(M1, M2)
• M3 = Merge(M1, M2, map)
• map3 = Compose(map1, map2)
• M2 = Diff(M1, map)
• <M2, map12> = ModelGen(M1, metamodel2)
• Apply(M, f)
• M2 = Copy(M1)
• …
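Because mappings can be binary tables, several of these operators reduce to set-at-a-time relational operations. The sketch below assumes a model is reduced to its set of elements and a mapping to a set of pairs; `compose` is ordinary relational composition, and `diff` follows the reading that Diff returns the elements of a model that do not participate in the mapping. The element names are made up, and real Match and Merge are far richer than anything shown here.

```python
Mapping = set[tuple[str, str]]


def invert(m: Mapping) -> Mapping:
    """Swap the two columns of the binary table."""
    return {(b, a) for a, b in m}


def domain(m: Mapping) -> set[str]:
    """Elements that appear on the left side of some pair."""
    return {a for a, _ in m}


def range_of(m: Mapping) -> set[str]:
    """Elements that appear on the right side of some pair."""
    return {b for _, b in m}


def compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Relational composition: (a, c) whenever (a, b) is in m1 and (b, c) is in m2."""
    return {(a, c) for a, b in m1 for b2, c in m2 if b == b2}


def diff(model: set[str], m: Mapping) -> set[str]:
    """Elements of the model that do not participate in the mapping's left side."""
    return model - domain(m)


map1 = {("Emp", "Employee"), ("Name", "FullName")}
map2 = {("Employee", "Person"), ("FullName", "PersonName"), ("Phone", "Tel")}
map3 = compose(map1, map2)                  # {("Emp", "Person"), ("Name", "PersonName")}
rest = diff({"Emp", "Name", "Addr"}, map1)  # {"Addr"}
```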

Outline
• Introduction
• How to solve a meta data problem
• The role of schema matching
• Using model management operators to solve change propagation
• Wrap-up

Meta Data Solution Template
1. Representation for models
2. Metamodels (e.g., for SQL schemas)
3. Model importer
4. Import Model1 and Model2 (e.g., from SQL definitions)

Usage Scenarios

Problem             Model1               Model2
Data translation    source schema        target schema
Msg translation     source format        target format
App integration     source interfaces    target interfaces
DW loading          source schema        DW schema

Solution Template (cont’d)
5. The user creates a mapping between Model1 and Model2
6. The user iterates
7. Use the mapping to generate a data or message translation script, an app wrapper, an ETL script, view definitions, etc.

Example – Data Translation

Meta models
• Relational: Table, Column
• Hierarchical: Record, Field, Repeating Group

Schemas (a.k.a. models)
• Relational: PurchaseOrder(OrdID, OrderDate), Items(OrdID, Item#, I_Name)
• Hierarchical: PO(PO#, POdate, POLines(Prod#, PName))

Generated data translation script:

    For each [po#, poD, poL] in PO
        Insert [po#, poD] into PurchaseOrder
        For each [prod#, pN] in poL
            Insert [prod#, po#, pN] into Items
        End
    End
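The generated script can be rendered as runnable code. This sketch assumes a PO instance is a list of nested dictionaries and that the script's positional tuples correspond to columns as prod# to Item#, po# to OrdID, and pN to I_Name; the sample data is made up.

```python
# Hypothetical instance of the hierarchical schema PO(PO#, POdate, POLines(Prod#, PName)).
po_instances = [
    {"PO#": 1, "POdate": "2003-01-15",
     "POLines": [{"Prod#": 10, "PName": "Bolt"}, {"Prod#": 11, "PName": "Nut"}]},
    {"PO#": 2, "POdate": "2003-01-16",
     "POLines": [{"Prod#": 10, "PName": "Bolt"}]},
]


def translate(pos):
    """Flatten PO records into PurchaseOrder and Items rows, following the generated script."""
    purchase_order, items = [], []
    for po in pos:  # For each [po#, poD, poL] in PO
        purchase_order.append({"OrdID": po["PO#"], "OrderDate": po["POdate"]})
        for line in po["POLines"]:  # For each [prod#, pN] in poL
            # Insert [prod#, po#, pN] into Items (column correspondence assumed)
            items.append({"Item#": line["Prod#"], "OrdID": po["PO#"], "I_Name": line["PName"]})
    return purchase_order, items


purchase_order, items = translate(po_instances)
```

The repeating group POLines becomes a separate table whose rows carry the parent's key (OrdID), which is exactly what the nested loop in the script does.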


Demonstration: BizTalk Mapper

Outline
• Introduction
• How to solve a meta data problem
• The role of schema matching
• Using model management operators to solve change propagation
• Wrap-up

Role of Schema Matching
• The main user activity in the solution template is designing a mapping
• To help automate it, we need a schema matcher
  - Good news: there are many useful ideas
  - Bad news: the problem is AI-complete

Schema Matching Approaches
Many good ideas [Rahm & Bernstein, VLDB J, Dec ’01]

Individual matchers
• Schema-based
  - Per-element
    - Linguistic: names, descriptions
    - Constraint-based: types, keys
  - Structural
    - Constraint-based: graph matching
• Content-based
  - Per-element
    - Linguistic: IR (word frequencies, key terms)
    - Constraint-based: value patterns and ranges

But none are robust, so combine ideas.
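To illustrate combining individual matchers, the sketch below averages a linguistic per-element matcher (string similarity of names, via Python's `difflib.SequenceMatcher`) with a constraint-based one (equality of declared types). The schemas, the equal weights, and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical schemas: element name -> declared data type.
s1 = {"CustName": "string", "CustID": "int", "Phone": "string"}
s2 = {"CustomerName": "string", "CustomerID": "int", "Tel": "string"}


def name_sim(a: str, b: str) -> float:
    """Linguistic per-element matcher: normalized string similarity of the names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_sim(t1: str, t2: str) -> float:
    """Constraint-based per-element matcher: 1.0 if the declared types agree, else 0.0."""
    return 1.0 if t1 == t2 else 0.0


def combined_match(src: dict, tgt: dict, threshold: float = 0.8) -> dict:
    """Average both matchers; keep each source element's best target if it clears the threshold."""
    result = {}
    for e1, t1 in src.items():
        score, best = max(
            (0.5 * name_sim(e1, e2) + 0.5 * type_sim(t1, t2), e2) for e2, t2 in tgt.items()
        )
        if score >= threshold:
            result[e1] = best
    return result


matches = combined_match(s1, s2)
```

Phone and Tel stay unmatched: neither matcher knows they are synonyms, which is one reason a thesaurus or other evidence is usually added to the combination.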

The Cupid Algorithm
• Computes linguistic similarity of element pairs
• Computes structural similarity of element pairs
• Generates a mapping

[Figure: matching PO(POBillTo(City, Street), POShipTo(City, Street)) against PurchaseOrder(InvoiceTo(Address(City, Street)), DeliverTo(Address(City, Street))); pairs whose similarity exceeds a threshold (“Sim >”) raise the similarity of their descendants (“Sim++”)]
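The following is a heavily simplified sketch in the spirit of Cupid, not the published algorithm. It assumes linguistic similarities are given as a table (Cupid derives them from name tokens and a thesaurus), computes structural similarity as the fraction of leaves with a strong link, combines the two with a weight, and then scores each leaf pair by its own linguistic similarity times the weighted similarity of its parents. The weight, threshold, and similarity values are all assumptions; the Address level of the target is flattened for brevity.

```python
W_STRUCT = 0.5   # assumed weight of structural vs. linguistic similarity
TH_ACCEPT = 0.5  # assumed threshold for a "strong" link and for an accepted match

# Hypothetical linguistic similarities (unlisted pairs count as 0.0).
LSIM = {
    ("POBillTo", "InvoiceTo"): 0.6, ("POBillTo", "DeliverTo"): 0.1,
    ("POShipTo", "InvoiceTo"): 0.1, ("POShipTo", "DeliverTo"): 0.6,
    ("City", "City"): 1.0, ("Street", "Street"): 1.0,
}
# Two-level trees: inner element -> its leaves.
SOURCE = {"POBillTo": ["City", "Street"], "POShipTo": ["City", "Street"]}
TARGET = {"InvoiceTo": ["City", "Street"], "DeliverTo": ["City", "Street"]}


def lsim(a, b):
    return LSIM.get((a, b), 0.0)


def ssim(src_leaves, tgt_leaves):
    """Structural similarity: fraction of leaves (both sides) strongly linked to the other side."""
    linked = sum(any(lsim(s, t) >= TH_ACCEPT for t in tgt_leaves) for s in src_leaves)
    linked += sum(any(lsim(s, t) >= TH_ACCEPT for s in src_leaves) for t in tgt_leaves)
    return linked / (len(src_leaves) + len(tgt_leaves))


def wsim(sp, tp):
    """Weighted similarity of two inner elements."""
    return W_STRUCT * ssim(SOURCE[sp], TARGET[tp]) + (1 - W_STRUCT) * lsim(sp, tp)


def match_leaves():
    """Map each source leaf to the target leaf with the highest context-weighted similarity."""
    mapping = {}
    for sp, s_leaves in SOURCE.items():
        for s in s_leaves:
            candidates = [(lsim(s, t) * wsim(sp, tp), (tp, t))
                          for tp, t_leaves in TARGET.items() for t in t_leaves]
            score, best = max(candidates)
            if score >= TH_ACCEPT:
                mapping[(sp, s)] = best
    return mapping
```

Every City has the same name, so linguistic similarity alone cannot tell which City matches which; the parents' weighted similarity (0.8 for POBillTo/InvoiceTo versus 0.55 for POBillTo/DeliverTo here) breaks the tie.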

Demonstration: Cupid Algorithm

Research Opportunities
• Better ways to combine algorithms
• Reuse of validated mappings
  - E.g., using machine learning [Doan et al], [He & Chang], [Madhavan et al]
• Experimentation with real schemas
• Science applications
  - E.g., matching anatomies [Mork et al]

Outline
• Introduction
• How to solve a meta data problem
• The role of schema matching
• Using model management to solve change propagation
• Wrap-up

Change Propagation
[Figure: the solution template (representation for models & metamodels, metamodels, model importer) with a mapping from Model1 to Model2]

Suppose Model1 changes, e.g., the source schema for data or message translation, or for DW loading.
• The user modifies Model1
• Goal: update Model2 (and its mapping) to reflect the change

Rondo Prototype
• An implementation of the model management algebra
• Mappings are binary relations
• A selector is a set of model elements
• New operators: Extract, Traverse, …
• Sergey Melnik, Erhard Rahm (Univ. of Leipzig)

Example – SQL to XML

Relational schema s1:
• ORDERS(OID, OrderDate, Employee, Customer, PONum, SalesTaxRate)
• PRODUCTS(PID, PName, Brand)
• O-DETAILS(DID, OID, PID, Quantity, Price, Discount)

XML schema d1:
• PurchaseOrder(OrderID, OrderDate, Customer, PONum, SalesTaxRate)
• Product(ProductID, ProductName, Brand, Quantity, Price, Discount)

s1 and d1 are related by the mapping s1_d1.

Change Propagation Goal
• Original relational schema s1 and original XML schema d1 (as above)
• Modified relational schema s2:
  - ORDERS(OID, OrderDate, Employee, Customer, PONum, SalesTaxRate, ShipDate, FreightCh, Rebate)
  - PRODUCTS(PID, PName)
  - O-DETAILS(DID, OID, PID, Quantity, Price)
• Goal: the modified XML schema d2:
  - PurchaseOrder(OrderID, OrderDate, Customer, PONum, SalesTaxRate, ShipDate, FreightCh, Rebate)
  - Product(ProductID, ProductName, Quantity, Price)

That is, s2 drops Brand and Discount and adds ShipDate, FreightCh, and Rebate to ORDERS; d2 must reflect the same changes.

Step 1: What changed?
[Figure: original relational schema s1 and modified relational schema s2, related by s1_s2]

s1_s2 = Match(s1, s2)
deleted = All(s1) - Domain(s1_s2)
added = All(s2) - Range(s1_s2)

Each of deleted and added is a selector, i.e., a set of model elements.
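Step 1 can be sketched with plain sets. The sketch assumes qualified element names (table.column) and replaces Match with a hypothetical stand-in, `match_by_name`, that pairs identically named elements; a real matcher would use the heuristics discussed earlier.

```python
s1 = {"ORDERS", "ORDERS.OID", "ORDERS.OrderDate", "ORDERS.Employee", "ORDERS.Customer",
      "ORDERS.PONum", "ORDERS.SalesTaxRate",
      "PRODUCTS", "PRODUCTS.PID", "PRODUCTS.PName", "PRODUCTS.Brand",
      "O-DETAILS", "O-DETAILS.DID", "O-DETAILS.OID", "O-DETAILS.PID",
      "O-DETAILS.Quantity", "O-DETAILS.Price", "O-DETAILS.Discount"}
s2 = (s1 - {"PRODUCTS.Brand", "O-DETAILS.Discount"}) | {
    "ORDERS.ShipDate", "ORDERS.FreightCh", "ORDERS.Rebate"}


def match_by_name(m1, m2):
    """Hypothetical stand-in for Match: pair elements with identical qualified names."""
    return {(e, e) for e in m1 & m2}


s1_s2 = match_by_name(s1, s2)
deleted = s1 - {a for a, _ in s1_s2}  # All(s1) - Domain(s1_s2)
added = s2 - {b for _, b in s1_s2}    # All(s2) - Range(s1_s2)
```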

Step 1: Schematic
[Schematic: s1 is related to d1 by s1_d1 and to s2 by s1_s2; the selectors deleted (in s1) and added (in s2) are marked; the goal is d2]

Step 2: Propagate deletion

d1, d1_d1 = Delete(d1, Traverse(deleted, s1_d1))

• Traverse(deleted, s1_d1) follows s1_d1 from the elements deleted from s1 to the corresponding elements of the original XML schema d1 (Brand and Discount)
• Delete removes them, giving d1 without the deleted elements: PurchaseOrder(OrderID, OrderDate, Customer, PONum, SalesTaxRate) and Product(ProductID, ProductName, Quantity, Price)
• d1_d1 relates the new d1 to the original d1
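A minimal sketch of Traverse and Delete over binary relations, assuming Traverse returns the image of a selector under a mapping and Delete returns the remaining elements together with an identity mapping from the new d1 to the original d1. Only a subset of d1 and s1_d1 is written out, and the `deleted` selector from Step 1 is restated so the block runs on its own.

```python
def traverse(selector, mapping):
    """Elements reached from the selected elements through the mapping (its image)."""
    return {b for a, b in mapping if a in selector}


def delete(model, selector):
    """Return the model without the selected elements, plus a new -> old identity mapping."""
    kept = model - selector
    return kept, {(e, e) for e in kept}


# Subset of the original XML schema d1 and of the mapping s1_d1.
d1 = {"PurchaseOrder", "PurchaseOrder.OrderID", "PurchaseOrder.OrderDate",
      "Product", "Product.ProductName", "Product.Brand",
      "Product.Quantity", "Product.Price", "Product.Discount"}
s1_d1 = {("ORDERS", "PurchaseOrder"), ("ORDERS.OID", "PurchaseOrder.OrderID"),
         ("ORDERS.OrderDate", "PurchaseOrder.OrderDate"),
         ("PRODUCTS.PName", "Product.ProductName"), ("PRODUCTS.Brand", "Product.Brand"),
         ("O-DETAILS.Quantity", "Product.Quantity"), ("O-DETAILS.Price", "Product.Price"),
         ("O-DETAILS.Discount", "Product.Discount")}
deleted = {"PRODUCTS.Brand", "O-DETAILS.Discount"}  # the selector from Step 1

d1_new, d1_d1 = delete(d1, traverse(deleted, s1_d1))
```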

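Step 3: Isolate additions
3a: Change meta-models

c, s2_c = ModelGen(s2, XSD)

ModelGen converts the modified relational schema s2 into an XML schema c and returns the mapping s2_c between them.
[Schematic: s1, s2, d1, and c, linked by s1_s2, s1_d1, d1_d1, and s2_c; goal: d2]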

3b: Propagate addition

c, c_c = Extract(c, Traverse(added, s2_c))

[Figure: the modified relational schema s2; its converted XML schema c (ORDERS with OID, OrderDate, Employee, Customer, PONum, SalesTaxRate, ShipDate, FreightCh, Rebate; O-DETAILS with DID, OID, PID, Quantity, Price; PRODUCTS); and the extracted XML schema c, which keeps ORDERS with ShipDate, FreightCh, and Rebate]
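Extract can be sketched in the same style. This assumes, hypothetically, that the converted schema c is stored as child-to-parent links, that ModelGen's s2_c pairs each s2 element with a same-named element of c, and that Extract keeps the selected elements plus their ancestors so the result is still a rooted model.

```python
# Hypothetical converted XML schema c (subset), as child -> parent links; None marks the root.
c_parent = {
    "c": None,
    "ORDERS": "c", "ORDERS.OID": "ORDERS", "ORDERS.OrderDate": "ORDERS",
    "ORDERS.ShipDate": "ORDERS", "ORDERS.FreightCh": "ORDERS", "ORDERS.Rebate": "ORDERS",
    "PRODUCTS": "c", "PRODUCTS.PID": "PRODUCTS", "PRODUCTS.PName": "PRODUCTS",
}
# Assumed s2_c: each s2 element corresponds to the same-named element of c.
s2_c = {(e, e) for e in c_parent if e != "c"}


def traverse(selector, mapping):
    """Image of the selector under the mapping."""
    return {b for a, b in mapping if a in selector}


def extract(parent, selector):
    """Keep the selected elements plus all their ancestors; return (sub-model, new -> old map)."""
    kept = set()
    for e in selector:
        while e is not None and e not in kept:
            kept.add(e)
            e = parent[e]
    sub = {e: parent[e] for e in kept}
    return sub, {(e, e) for e in kept}


added = {"ORDERS.ShipDate", "ORDERS.FreightCh", "ORDERS.Rebate"}  # selector from Step 1
c_new, c_c = extract(c_parent, traverse(added, s2_c))
```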

Step 4: Compose mappings
[Schematic: c is linked to d1 through c_c, s2_c, s1_s2, s1_d1, and d1_d1]

c_d1 = c_c • Invert(s2_c) • Invert(s1_s2) • s1_d1 • Invert(d1_d1)

(• denotes composition, applied left to right.)
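Treating each mapping as a binary relation, the chain can be evaluated left to right. The tiny relations below are hypothetical fragments of the example: ORDERS exists in every schema, while ShipDate exists only in s2 and c, so only ORDERS survives the composition.

```python
from functools import reduce


def invert(m):
    """Swap the two columns of a binary relation."""
    return {(b, a) for a, b in m}


def compose(m1, m2):
    """Left-to-right composition: follow m1, then m2."""
    return {(a, c) for a, b in m1 for b2, c in m2 if b == b2}


def compose_all(*maps):
    """Compose a chain of mappings from left to right."""
    return reduce(compose, maps)


c_c = {("ORDERS", "ORDERS"), ("ORDERS.ShipDate", "ORDERS.ShipDate")}   # new c -> old c
s2_c = {("ORDERS", "ORDERS"), ("ORDERS.ShipDate", "ORDERS.ShipDate")}  # s2 -> c
s1_s2 = {("ORDERS", "ORDERS")}                # ShipDate is new in s2
s1_d1 = {("ORDERS", "PurchaseOrder")}
d1_d1 = {("PurchaseOrder", "PurchaseOrder")}  # new d1 -> old d1

c_d1 = compose_all(c_c, invert(s2_c), invert(s1_s2), s1_d1, invert(d1_d1))
```

ShipDate has no image in d1, so Merge in the next step treats it as new content rather than collapsing it into an existing element.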

Step 5: Merge

d2, c_d2, d1_d2 = Merge(c, d1, c_d1);

[Schematic: merging c and d1 along c_d1 yields d2, with mappings c_d2, d1_d2, and s2_d2]

Complete Script

Operator definition:
PropagateChanges(s1, d1, s1_d1, s2, c, s2_c)
  1. s1_s2 = Match(s1, s2);
  2. d1, d1_d1 = Delete(d1, Traverse(All(s1) - Domain(s1_s2), s1_d1));
  3. c, c_c = Extract(c, Traverse(All(s2) - Range(s1_s2), s2_c));
  4. c_d1 = c_c • Invert(s2_c) • Invert(s1_s2) • s1_d1 • Invert(d1_d1);
  5. d2, c_d2, d1_d2 = Merge(c, d1, c_d1);
  6. s2_d2 = s2_c • Invert(c_c) • c_d2 + Invert(s1_s2) • s1_d1 • Invert(d1_d1) • d1_d2;
  7. return d2, s2_d2;

Operator use:
  XSD→SQL: PropagateChanges(d1, s1, Invert(s1_d1), d2, ModelGen(d2, SQL));
  SQL→XSD: PropagateChanges(s1, d1, s1_d1, s2, ModelGen(s2, XSD));

Outline
• Introduction
• How to solve a meta data problem
• The role of schema matching
• Using model management operators to solve change propagation
• Wrap-up

Status Report
• There are many prototypes for Match
• Rondo is a complete MM prototype
• Detailed design for Merge [Pottinger & Bernstein, VLDB 2003]
• There are several efforts on a mathematical semantics for the MM algebra

What Next?
• Industrial-strength schema matching
  - Better ways to reuse match results
  - Experiment with applications
• More applications. Bigger applications.
• Thorough formal semantics
• More prototypes. More operators.
• Add semantics to mappings
  - And an inferencing engine
• Improved user interfaces

Other Contributors
• Halevy, Madhavan, et al. (U. of Washington)
• Do, Melnik, & Rahm (U. of Leipzig)
• Kementsietsidis, Arenas, & Miller (U. of Toronto)
• He & Chang (U. of Illinois at Urbana-Champaign)
• Kang & Naughton (U. of Wisconsin)
• Bezivin & Valduriez (U. of Nantes)

References
• http://www.research.microsoft.com/~philbe
• Overview
  - Bernstein, CIDR 2003
  - Bernstein, Halevy, & Pottinger, SIGMOD Record, Dec. 2000
• Implementation
  - Melnik, Rahm, & Bernstein, SIGMOD 2003
• Data warehouse examples
  - Bernstein & Rahm, ER 2000
• Match operation
  - Survey: Rahm & Bernstein, VLDB J., Dec. 2001
  - Prototype: Madhavan, Bernstein, & Rahm, VLDB 2001
• Merge operation
  - Pottinger & Bernstein, VLDB 2003
• Theory
  - Alagić & Bernstein, DBPL 2001
  - Madhavan et al., AAAI 2002

© 2003 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

Backup slides

MM System Architecture
[Figure: components shown are a Model Manager (operators Match, Merge, Compose, Copy, Apply, ModelGen, …), a meta-meta-model, an OR mapper, an object-oriented repository, a SQL DBMS, an inferencing engine, a UI generator, and generic tools (browser, import/export, scripting, editors, catalogs), applied to example models such as a business process and an ER diagram]

Model Generation
[Figure: the solution template (representation for models & metamodels, metamodel, model importer) generating Schema2 from Schema1]
• ER → SQL
• OO or XML wrapper generation
• Encapsulate in a ModelGen operator

ModelGen Operator
• Input: schema in source meta-model M_S
• Output: schema in target meta-model M_T
• Meta-models are made of patterns
  a. Object has sub-objects
  b. Aggregation has attribute
  c. Aggregation has key
• Define pattern transformations as rules
  - For XSD→SQL, transform (a) into (b) + (c)
• Atzeni & Torlone, EDBT ’96

Implementing ModelGen

ModelGen algorithm
1. Import S1 into the universal meta-model M_U, which contains all patterns (a no-op)
2. Translate S1 into meta-model M_T by removing patterns that are in M_S and not in M_T

Challenges
• Define patterns for accurate translations (e.g., add nesting for SQL→XSD)
• Make patterns generic
• Avoid cycles
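As a rough illustration of one XSD→SQL rule, the sketch below rewrites the "object has sub-objects" pattern (a) into separate tables that carry attributes (b) and a key that includes the parent's key (c). The nested-dictionary encoding and the key-propagation choice are assumptions; this is not the rule language of Atzeni & Torlone. It reuses the PO schema from the data translation example.

```python
# Hypothetical nested (XSD-like) schema: element -> key, attributes, and sub-objects.
PO = {"PO": {"key": "PO#", "attrs": ["PO#", "POdate"],
             "children": {"POLines": {"key": "Prod#", "attrs": ["Prod#", "PName"],
                                      "children": {}}}}}


def xsd_to_sql(schema, parent_key=(), tables=None):
    """Turn each sub-object into its own table (aggregation with attributes) whose key
    is the parent's key followed by its own key (aggregation has key)."""
    if tables is None:
        tables = {}
    for name, obj in schema.items():
        key = list(parent_key) + [obj["key"]]
        columns = list(parent_key) + obj["attrs"]  # parent key acts as a foreign key
        tables[name] = {"columns": columns, "key": key}
        xsd_to_sql(obj["children"], tuple(key), tables)
    return tables


tables = xsd_to_sql(PO)
```

The result matches the relational side of the data translation example up to naming: a PO table and a POLines table keyed by (PO#, Prod#).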

Merge

Merge(M1, M2, map)
• Return the union of models M1 and M2
• Use map to guide the Merge
  - If elements x = y in map, then collapse them into one element

[Figure: Emp(Addr, Name) merged with Emp(Name, Phone), where map equates the two Emp and the two Name elements, yields Emp(Name, Phone, Addr)]
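A minimal sketch of this behaviour, assuming each model is a set of (parent, child) edges with element names prefixed by their model (`1.` or `2.`) to keep them distinct, and that the mapping is 1:1. Every element y of M2 that the mapping equates with an element x of M1 is renamed to x, and the edge sets are unioned; conflict resolution is left out.

```python
def merge(m1_edges, m2_edges, mapping):
    """Union two models, collapsing each mapped pair (x, y) into the M1 element x."""
    rename = {y: x for x, y in mapping}  # assumes the mapping is 1:1
    collapsed = {(rename.get(p, p), rename.get(c, c)) for p, c in m2_edges}
    return m1_edges | collapsed


m1 = {("1.Emp", "1.Addr"), ("1.Emp", "1.Name")}
m2 = {("2.Emp", "2.Name"), ("2.Emp", "2.Phone")}
mapping = {("1.Emp", "2.Emp"), ("1.Name", "2.Name")}

merged = merge(m1, m2, mapping)  # Emp with children Addr, Name, Phone
```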

[Figure: schema integration example with two views V1 and V2 of Emp, over the elements E#, Dept#, Addr, Phone, Name, FirstName, and LastName]
1. map = Match(V1, V2)
2. S = Merge(V1, V2, map)
3. S = Apply(S, crf), where crf is a conflict resolution function
[Figure: the integrated schema S for Emp; the conflict resolution function f reconciles Name with FirstName and LastName]

Merge(M1, M2, map) [Buneman, Davidson, Kosky, EDBT 92]
• Meta-model has aggregation & generalization only
• Union, and collapse objects having the same name
• Fix-up step for inconsistencies created by merging

[Figure: merging models over elements X, Y, Z with relationship a; the merged result also contains an element W]

• Successive fix-ups lead to different results
  - Batch them at the end, to get a unique minimal result
• Now enrich the meta-model (containment, complex mappings, …) & merge semantics (conflicts, deletes)

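Resolving Merge Conflicts
[Figure: Emp(Emp#, Name) and Employee(EmployeeID, FirstName, LastName), related by the mapping mapee, merge into Emp(Emp#, Name) where Name has sub-elements FirstName and LastName; numbered elements 1 to 11 illustrate conflicts at three levels: meta-meta-model conflicts, meta-model conflicts, and model conflicts]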

Matching Anatomy Ontologies
• Find differences in human anatomy ontologies
  - FMA: Univ. of Washington
  - Galen: Univ. of Manchester (UK)
• The hard part of Diff is Match
• UW Medical School: Dr. Cornelius Rosse
• UW Computer Science: Peter Mork

Aligning Representations
• FMA: Heart has generic part Cardiac valve
• Galen: Heart sensibly hasStructuralComponent ValveInHeart
[Figure: both statements drawn as graphs, with the relationships (generic part; sensibly, h-S-C) reified as nodes]

Anatomy Matching Algorithm
1. Lexical match
   • Normalize strings, look them up in the UMLS dictionary, and convert them to concept IDs from the thesaurus
2. Structure match
   • Similarity(reified nodes) = Average(neighbors)
   • Back-propagate to neighbors
3. Align super-classes
   • Super-class similarity = average similarity of children, grandchildren, and great-grandchildren
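The "average of neighbors" rule for reified nodes might look like this in its simplest form. Both the lexical scores and the pairing of neighbors by position (subject with subject, object with object) are assumptions made for illustration; back-propagation and super-class alignment are omitted.

```python
from statistics import mean

# Hypothetical lexical similarities produced by step 1 (values are illustrative).
lex = {("Heart", "Heart"): 1.0, ("Cardiac valve", "ValveInHeart"): 0.5}

# Each reified relationship node is represented by its (subject, object) neighbors.
fma_rel = ("Heart", "Cardiac valve")   # FMA: Heart has generic part Cardiac valve
galen_rel = ("Heart", "ValveInHeart")  # Galen: Heart sensibly hasStructuralComponent ValveInHeart


def reified_sim(r1, r2, sim):
    """Similarity of two reified nodes = average similarity of their corresponding neighbors."""
    return mean(sim.get(pair, 0.0) for pair in zip(r1, r2))


score = reified_sim(fma_rel, galen_rel, lex)
```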

Anatomy Matching Example
[Figure: the FMA and Galen heart-valve graphs annotated with similarity scores S (S = 1, S = 1, S = 2/3, S = 2/15)]

Semantics

Left Composition (f •)
[Figure: M1 = Emp(Addr(Street, City)), M2 = Emp(Street, City, Name), M3 = Emp(StAddr, Town, Name); mapA (a1, a2, a3) relates M1 and M2, mapB (b1, b2, b3) relates M2 and M3, and mapC (c1, c2, c3) relates M1 and M3]
mapC = mapA f• mapB

Category-Theoretic Semantics
A mapping is a model and two morphisms.

Goal: formal semantics to guide an implementation
• When can morphisms be binary relations?
• When should morphisms be 1:1?
• When to use morphisms vs. mappings?
• How to assign semantics to morphisms?
• What is the right semantics for composition?

Category-Theoretic Semantics
[Figure: a diagram over S1, S2, Match(S1, S2), AugmentedMatch(S1, S2), and Merge(S1, S2), with morphisms numbered 1 to 6]
• To ensure 5 and 6 are unique, 3 and 4 must be 1:1
• Hence 1 and 2 must be 1:1, which has implications for Match

Scenarios

Example
Given
• map1 between xsd1 and SQL schema rdb1
• xsd2, which is similar to xsd1
Produce
• rdb2 to store instances of xsd2
• a mapping between rdb2 and xsd2
1. map2 = Match(xsd1, xsd2)
2. map3 = map2 • map1
3. <map4, rdb2> = Copy(map3)
4. Use Apply(map4) to map each x in Diff(xsd2, map4) into rdb2

Schema Integration
Given
• two view schemas, V1 and V2
Produce
• an integrated schema, S
1. map = Match(V1, V2)
2. S = Merge(V1, V2, map)
3. S = Apply(S, crf), where crf is a conflict resolution function

Data Migration
Given
• a schema S and its database D
• an evolved schema S′
Produce
• a procedure for mapping D into an S′ database D′
1. map_SS′ = Match(S, S′)
2. Use Enum(S) to generate a data migration script
[Figure: the generated migration script is run on D]

Schema Evolution
Given
• map_SV from schema S to view V
• a modified version S′ of S
Produce
• a mapping map_S′V from S′ to V (i.e., a view definition for V over S′)
1. map_S′S = Match(S′, S)
2. map_S′V = map_SV • map_S′S
3. Use Apply(V) to delete elements not derivable from S′

Round-Trip Engineering
Given
• an ER model M
• a SQL schema G, generated via map_MG from M
• a modified version G′ of G
Produce
• a modified version M′ of M that generates G′
1. map_GG′ = Match(G, G′)
2. map_MG′ = map_GG′ • map_MG
3. <M′, map_G′M′> = Copy(map_MG′)
4. For each g in Diff(G′, map_MG′), reverse engineer g into M′ (a 10-step procedure)

Related Work
There’s a lot of it. Apply it to model management!
• Platforms: OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)
• Inferencing on mappings: AQUV, description logic
• Transitive closure and recursive QP
• Differencing: text, trees, graphs
• Data translation: algebras, schema evolution
• Data integration: schema match, view generation
