Sept. 15, 2003 © 2003 Microsoft Corporation 1
Generic Model Management:
A Database Infrastructure for Schema Manipulation
Philip A. Bernstein
Microsoft Research

Meta Data Management
Meta data = structural information
  • DB schema, interface defn, web site map, form defns, …
[Figure: examples of meta data: table defns, C++ interfaces, VB interfaces, UML architecture, ER diagram (Customer, Order, ScheduledDelivery, Product, Salesperson), forms, business process (OrderEntry, AuthorizeCredit, Inventory, ScheduleDelivery, BillCustomer, UpdateMarketing), business rules (Emp.Sal < Emp.Mgr.Sal)]

Meta Data Problems
They all involve schemas and mappings
  • E.g., data translation between data models
[Figure: a hierarchical schema (PO with PO#, POdate, and POLines containing Prod# and PName) beside a relational schema (PurchaseOrder with OrdID and OrderDate; Items with OrdID, Item#, and I_Name)]

Such Problems are Pervasive
  • Data translation
  • Schema evolution & data migration
  • XML message translation for e-commerce
  • Integrate custom apps with commercial apps
  • Data warehouse loading (clean & transform)
  • Design tool support (DB, UML, …)
  • OO or XML wrapper generation for SQL DB
  • Semantic web

Meta Data Solutions
Solutions strongly resemble each other, but
  • usually are problem-specific
  • usually are language-specific (SQL, ODMG, UML, XML, RDF, …)
  • usually involve a lot of object-at-a-time programming
Goals
  • Generic solutions
  • “Set”-at-a-time programming

Model Management
A generic approach to meta data mgmt
Model Mgmt operators manipulate models and mappings as bulk objects
  • Their representation is generic
  • Operators - Match, Merge, Diff, Compose
Avoids problem-specific and language-specific solutions
Avoids object-at-a-time programming

Models and Mappings
A model is a rooted directed graph, which represents a complex information structure.
A mapping is a model that represents a transformation between two models
  • Or it could be a binary table (a morphism)
[Figure: a relational schema Emp (E#, Dept#, Name) and an XSD Emp (E#, Dept#, Name with First and Last), connected by map1]
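A minimal Python sketch of the two representations described on this slide, assuming a model is stored as a root plus parent-to-children edges and a morphism as a set of element pairs. The class name `Model`, the method `elements`, and the dotted element ids (`rel.Emp`, `xsd.Emp`) are hypothetical and chosen only for illustration; which pairs map1 actually contains is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A rooted directed graph: a root element plus parent -> children edges."""
    root: str
    edges: dict = field(default_factory=dict)

    def elements(self):
        """Return every element reachable from the root."""
        seen, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, []))
        return seen


# Relational schema: Emp with E#, Dept#, Name.
rel = Model("rel.Emp", {"rel.Emp": ["rel.E#", "rel.Dept#", "rel.Name"]})

# XSD: Emp with E#, Dept#, and Name split into First and Last.
xsd = Model("xsd.Emp", {
    "xsd.Emp": ["xsd.E#", "xsd.Dept#", "xsd.Name"],
    "xsd.Name": ["xsd.First", "xsd.Last"],
})

# map1 as a morphism: a binary table of (relational element, XSD element) pairs.
# The pairs listed here are illustrative, not taken from the slide.
map1 = {
    ("rel.Emp", "xsd.Emp"),
    ("rel.E#", "xsd.E#"),
    ("rel.Dept#", "xsd.Dept#"),
    ("rel.Name", "xsd.Name"),
}
```

Storing the morphism as a plain set of pairs keeps it a bulk object that set operations can manipulate directly.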

Model Mgmt Algebra
map = Match(M1, M2)
<M3, map13, map23> = Merge(M1, M2, map)
map3 = Compose(map1, map2)
<M2, map12> = Diff(M1, map)
<M2, map12> = ModelGen(M1, metamodel2)
M2 = Copy(M1)
Apply, Insert, Delete, . . .
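When mappings are morphisms (binary tables), Compose can be read as relational composition. The following is a sketch under that assumption only; the function name `compose` and the element ids are hypothetical, and mappings that carry richer semantics would need more than this.

```python
def compose(map_ab, map_bc):
    """Relational composition: pair a with c when some b links them."""
    by_left = {}
    for b, c in map_bc:
        by_left.setdefault(b, set()).add(c)
    return {(a, c) for a, b in map_ab for c in by_left.get(b, ())}


# Hypothetical morphisms between three small models.
map1 = {("m1.Emp", "m2.Emp"), ("m1.Name", "m2.Name")}
map2 = {("m2.Emp", "m3.Emp"), ("m2.Name", "m3.Name"), ("m2.E#", "m3.E#")}

map3 = compose(map1, map2)
```

Here `m2.E#` has no counterpart in `map1`, so it contributes nothing to `map3`.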

Outline
  • Introduction to Model Management
  • Using MM to solve meta data problems
  • Matching anatomy ontologies
  • Model merging
  • Wrap-up

Categorizing Meta Data Problems
Model mapping
[Figure: M1 and M2 connected by map12]
  • Data translation
  • XML message translation for e-commerce
  • Integrate custom apps with commercial apps
  • Data warehouse loading (clean & transform)
Solution is the Match “operator”
  • Really a CAD system for mapping generation

Categorizing M D Problems (2)
Model integration
[Figure: M1 and M2 connected by map12; M3 connected to M1 by map13 and to M2 by map23]
  • View integration
  • Data integration
Solution is the Merge operator

Categorizing M D Problems (3)
Model and mapping generation
[Figure: M1 and M2 connected by map12]
  • Design tools (ER → SQL)
  • Wrapper generation (SQL → OO or XML)
Solution is the ModelGen operator
  • <M2, map12> = ModelGen(M1, metamodel2)

Categorizing M D Problems (4)
Change propagation
[Figure: two copies of M1 and M2 connected by map12]
  • Schema evolution
  • Required maintenance for all meta data problems
Solution requires the rest of MM algebra

Change Propagation
Given
  • map1 between xsd1 and SQL schema rdb1
  • xsd2, a modified version of xsd1
Produce
  • rdb2 to store instances of xsd2
  • a mapping between xsd2 and rdb2
1. map2 = Match(xsd1, xsd2)
2. map3 = map2 • map1
3. <map4, rdb3> = Copy(map3)
Now we need to merge Diff(xsd2, map4) into rdb3
[Figure: xsd1, xsd2, rdb1, rdb3, and rdb2, connected by map1, map2, map3, and map4]

Change Propagation (cont’d)
4. <xsd2, map5> = Diff(xsd2, map4)
5. <rdb4, map6> = ModelGen(xsd2, SQL)
6. map7 = map4 • map5 • map6
7. <rdb2, map8, map9> = Merge(rdb3, rdb4, map7)
[Figure: xsd1, xsd2, rdb1, rdb3, rdb4, and rdb2, connected by map1 through map9, each labeled with the step that produces it]

Complete Script in Rondo
Operator Definition:
PropagateChanges(s1, d1, s1_d1, s2, c, s2_c)
1. s1_s2 = Match(s1, s2);
2. d1, d1_d1 = Delete(d1, Traverse(All(s1) − Domain(s1_s2), s1_d1));
3. c, c_c = Extract(c, Traverse(All(s2) − Range(s1_s2), s2_c));
4. c_d1 = c_c • Invert(s2_c) • Invert(s1_s2) • s1_d1 • Invert(d1_d1);
5. d2, c_d2, d1_d2 = Merge(c, d1, c_d1);
6. s2_d2 = s2_c • Invert(c_c) • c_d2 + Invert(s1_s2) • s1_d1 • Invert(d1_d1) • d1_d2;
7. return d2, s2_d2;
Operator Use:
SQL → XSD: PropagateChanges(s1, d1, s1_d1, s2, ModelGen(s2, XSD));
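The helper operators in the script (Invert, Domain, Range) have a direct reading when a mapping is a morphism, that is, a set of pairs. The sketch below assumes that reading; it is not Rondo's implementation. The function names, the sample schemas, and the interpretation of the selector in steps 2 and 3 as a set difference between All(...) and Domain(...) or Range(...) are all assumptions.

```python
def invert(mapping):
    """Swap the two sides of every pair."""
    return {(y, x) for x, y in mapping}


def domain(mapping):
    """Elements that appear on the left side of the mapping."""
    return {x for x, _ in mapping}


def range_(mapping):
    """Elements that appear on the right side of the mapping."""
    return {y for _, y in mapping}


# Hypothetical schemas: s2 drops Fax and adds Email relative to s1.
all_s1 = {"s1.Emp", "s1.Name", "s1.Fax"}
all_s2 = {"s2.Emp", "s2.Name", "s2.Email"}
s1_s2 = {("s1.Emp", "s2.Emp"), ("s1.Name", "s2.Name")}

# Assumed reading of step 2: s1 elements with no counterpart in s2.
removed = all_s1 - domain(s1_s2)
# Assumed reading of step 3: s2 elements with no counterpart in s1.
added = all_s2 - range_(s1_s2)
```

Under this reading, `removed` selects what Delete should prune from d1 and `added` selects what Extract should pull from c.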

Status Report
Previous scenario is executable in Rondo, the first complete MM prototype
  • [Melnik et al, SIGMOD 2003]
There are many prototypes for Match
  • [Rahm & Bernstein, VLDB J., Dec. 2001]
Detailed design for Merge
  • [Pottinger & Bernstein, VLDB 2003]
There are several efforts on a formal semantics for MM operators

Schema Matching Algorithms
About a dozen published algorithms
  • Schema-based vs. content-based
  • Per-element vs. structural
  • Linguistic vs. constraint-based
  • Independently-developed schemas vs. incrementally-modified schemas
  • Hybrid vs. composite
Many good ideas, but none are robust
  • Human review and input is essential
User interface is also quite important

Matching Anatomy Ontologies
Match two human anatomy ontologies
  • FMA – Univ. of Washington
  • Galen CRM – Univ. of Manchester (UK)
  • By Peter Mork (Univ. of Washington)
  • Both models are big
Ultimate goal was finding differences
Like most match algorithms, ours calculates a similarity score for the m×n pairs of elements

Aligning Representations
[Figure: FMA: Heart, generic part, Cardiac valve. CRM: Heart, sensibly hasStructuralComponent (h-S-C), ValveInHeart.]

Anatomy Matching Algorithm
1. Lexical Match
  • Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus
  • String comparison: 306 matches
  • Adding spaces, ignoring case: 1834 matches
  • Lexical tools: 3503 matches
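The "adding spaces, ignoring case" step can be sketched as a small normalization pass before comparing labels. This is an illustration under assumptions, not the authors' code: the function names `normalize` and `lexical_matches` are hypothetical, the rule of splitting at lowercase-to-uppercase boundaries is one plausible way to add spaces, and the UMLS lookup and concept-ID conversion are not shown.

```python
import re


def normalize(label):
    """Add spaces at lowercase-to-uppercase boundaries, then ignore case."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return " ".join(spaced.lower().split())


def lexical_matches(labels_a, labels_b):
    """Pair labels whose normalized forms are identical."""
    index = {}
    for b in labels_b:
        index.setdefault(normalize(b), []).append(b)
    return [(a, b) for a in labels_a for b in index.get(normalize(a), [])]


# Hypothetical labels; plain string comparison would match none of these.
matches = lexical_matches(["Cardiac valve", "Heart"], ["heart", "CardiacValve"])
```

Indexing one side by normalized label avoids comparing every pair of labels directly.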

Anatomy Matching Algorithm
1. Lexical Match
  • Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus
2. Structure Match
  • Similarity(reified nodes) = Average(neighbors)
  • Back-propagate to neighbors
  • Adds 64 matches (to previous 3503)
  • Implies 875 reified relationship matches
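The averaging rule in step 2 can be illustrated with a few lines of Python. This is a sketch that assumes each reified relationship node has its neighbors already paired up with neighbors of the candidate node on the other side; the function name `reified_similarity` and the scores are hypothetical, and the back-propagation step is not shown.

```python
def reified_similarity(neighbor_pairs, sim):
    """Average the similarity scores of the aligned neighbor pairs."""
    scores = [sim.get(pair, 0.0) for pair in neighbor_pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical scores left over from the lexical step.
sim = {("Heart", "Heart"): 1.0, ("Cardiac valve", "ValveInHeart"): 0.5}

# Neighbors of FMA's "generic part" node paired with neighbors of CRM's h-S-C node.
pairs = [("Heart", "Heart"), ("Cardiac valve", "ValveInHeart")]
score = reified_similarity(pairs, sim)  # (1.0 + 0.5) / 2
```

Pairs with no known score count as 0.0 here, which is a design choice of this sketch rather than something the slides specify.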

Anatomy Matching Example
[Figure: the FMA graph (Heart, generic part, Cardiac valve) matched against the CRM graph (Heart, sensibly h-S-C, ValveInHeart), annotated with the scores S = 2/3, S = 2/15, S = 1, and S = 1]
S: similarity score

Anatomy Matching Algorithm
1. Lexical Match
  • Normalize string, UMLS dictionary lookup, convert to concept-ID from thesaurus
2. Structure Match
  • Similarity(reified nodes) = Average(neighbors)
  • Back-propagate to neighbors
3. Align Super-classes
  • Super-class similarity = average similarity of children, grandchildren, great-grandchildren
  • Adds 213 matches (to 3567)
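One possible reading of the super-class rule in step 3, sketched in Python. The slides do not say how descendants on the two sides are paired, so this sketch assumes each descendant takes its best score against any descendant on the other side; the function names, the class trees, and the scores are all hypothetical.

```python
def descendants(tree, node, max_depth=3):
    """Collect children, grandchildren, and great-grandchildren of node."""
    found, frontier = [], [node]
    for _ in range(max_depth):
        frontier = [child for n in frontier for child in tree.get(n, [])]
        found.extend(frontier)
    return found


def superclass_similarity(tree_a, a, tree_b, b, sim):
    """Average, over descendants of a, of their best score against descendants of b."""
    below_a = descendants(tree_a, a)
    below_b = descendants(tree_b, b)
    if not below_a or not below_b:
        return 0.0
    best = [max(sim.get((x, y), 0.0) for y in below_b) for x in below_a]
    return sum(best) / len(best)


# Hypothetical class trees and previously computed match scores.
tree_a = {"Organ": ["Heart", "Lung"]}
tree_b = {"BodyPart": ["Heart", "Kidney"]}
sim = {("Heart", "Heart"): 1.0}

score = superclass_similarity(tree_a, "Organ", tree_b, "BodyPart", sim)
```

In this example only one of Organ's two children has a match, so the pair (Organ, BodyPart) scores 0.5.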

Some Lessons
A common encoding of models is hard and involves compromises
  • Different styles of reifying relationships
  • CRM stores transitive relationships
Match needs to invent generalizations
  • In FMA, arterial supply, venous drainage, nerve supply, lymphatic drainage
  • In CRM, these all map to isServedBy
On big models, Match is expensive
  • Some steps required days to execute
  • Cross-product filled 80 GB (< 1 GB input).

Merge(M1, M2, map)
Return the union of models M1 and M2
Use map to guide the Merge
  • If elements x = y in map, then collapse them into one element
[Figure: Emp (Addr, Name) and Emp (Name, Phone), connected by map, merge into Emp (Name, Phone, Addr)]
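The union-and-collapse behavior can be sketched in Python using the Emp example from the figure. This is a minimal illustration, assuming models are sets of (parent, child) edges and map is a set of equal-element pairs; the function name `merge` and the dotted element ids are hypothetical, and conflict detection and resolution are not shown.

```python
def merge(m1, m2, mapping):
    """Union two edge sets, collapsing each mapped m2 element into its m1 twin.

    Returns the merged edges plus morphisms from each input model to the result.
    """
    rename = {y: x for x, y in mapping}
    m3 = set(m1) | {(rename.get(p, p), rename.get(c, c)) for p, c in m2}
    elems1 = {e for edge in m1 for e in edge}
    elems2 = {e for edge in m2 for e in edge}
    map13 = {(e, e) for e in elems1}
    map23 = {(e, rename.get(e, e)) for e in elems2}
    return m3, map13, map23


m1 = {("m1.Emp", "m1.Addr"), ("m1.Emp", "m1.Name")}
m2 = {("m2.Emp", "m2.Name"), ("m2.Emp", "m2.Phone")}
mapping = {("m1.Emp", "m2.Emp"), ("m1.Name", "m2.Name")}

m3, map13, map23 = merge(m1, m2, mapping)
```

The result has a single Emp with Addr, Name, and Phone, and `map23` records that `m2.Emp` and `m2.Name` were collapsed into their `m1` counterparts.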

Merge(M1, M2, map)
[Buneman, Davidson, Kosky, EDBT 92]
  • Meta-model has aggregation & generalization only
  • Union, and collapse objects having the same name
  • Fix-up step for inconsistencies created by merging
[Figure: merge example over classes X, Y, Z, and W with attribute a]
Successive fixups lead to different results
  • Batch them at the end, to get a unique minimal result
Now enrich the meta-model (containment, complex mappings, …) & merge semantics (conflicts, deletes)

Resolving Merge Conflicts
[Figure: model Emp (Emp#, Name) and model Employee (EmployeeID, FirstName, LastName) related by mapee; a merged model with Emp, Emp#, Name, FirstName, and LastName; elements numbered 1 to 11; conflicts labeled Meta-MetaModel Conflict, Model Conflict, and MetaModel Conflict]

Contributions to Merge
[Pottinger & Bernstein, VLDB 03]
  • Generic correctness criteria for Merge
  • Use of first-class input mapping (not just correspondences)
  • Taxonomy of conflicts & resolution strategies
  • Characterize when Merge can be automatic
  • A merge algorithm for an EER representation
  • Experimental evaluation

What Next?
  • Add semantics to mappings
  • Thorough formal semantics of operators
  • Industrial strength schema matching
  • More and bigger applications
  • More prototypes
  • More operators
  • Better user interfaces

References
http://www.research.microsoft.com/~philbe
Overview
  • Bernstein, CIDR 2003
  • Bernstein, Halevy, & Pottinger, SIGMOD Record, Dec. 2000
Implementation
  • Melnik, Rahm, & Bernstein, SIGMOD 2003
Data Warehouse Examples
  • Bernstein & Rahm, ER 2000
Match Operation
  • Survey: Rahm & Bernstein, VLDB J., Dec. 2001
  • Prototype: Madhavan, Bernstein, & Rahm, VLDB 2001
Merge Operation
  • Pottinger & Bernstein, VLDB 2003
Theory
  • Alagić & Bernstein, DBPL 2001
  • Madhavan et al, AAAI 2002