Generic Schema Matching with Cupid

Jayant Madhavan

Philip A. Bernstein

Erhard Rahm

Proceedings of the 27th VLDB Conference

Schema Matching

Schema Matching (Cont.)

Definition: Finding a mapping between those elements of two schemas that semantically correspond to each other

Applications
  Schema integration
  Data translation
  XML message mapping
  Data warehouse loading

Goal

Taxonomy

  Schema vs. instance based
  Element vs. structure granularity
  Linguistic based
  Constraint based
  Matching cardinality
  Auxiliary information
  Individual vs. combinational

Cupid
  Schema-based
  Automated linguistic-based matching
  Both element-based and structure-based
  Biased toward similarity of atomic elements
  Exploits internal structure
  Exploits keys, referential constraints and views
  Makes context-dependent matches of a shared type
  1:n mapping

Similarity Coefficient Computation

First Phase: Linguistic matching
  Names
  Data types
  Domains

Linguistic similarity coefficient: lsim

Second Phase: Structural matching
  Contexts
  Linguistic similarity coefficients

Structural similarity coefficient: ssim

Hybrid (wsim = w_struct * ssim + (1-w_struct) * lsim)
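The weighted combination on this slide can be sketched in a few lines of Python. The function name `weighted_similarity` and the sample values are illustrative assumptions, not taken from the slides.

```python
def weighted_similarity(ssim: float, lsim: float, w_struct: float) -> float:
    """Return wsim = w_struct * ssim + (1 - w_struct) * lsim."""
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


# Illustrative values only: equal weight on structure and names.
example_wsim = weighted_similarity(ssim=0.8, lsim=0.4, w_struct=0.5)
print(round(example_wsim, 3))
```

Setting `w_struct` to 0 reduces the result to the purely linguistic coefficient, and setting it to 1 reduces it to the purely structural one.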

Linguistic Matching

Normalization
  Tokenization
  Expansion
  Elimination

Categorization
  Data types
  Schema hierarchy
  Linguistic contents

Comparison: Linguistic Similarity Coefficient (lsim)
  Thesaurus
  Sub-string matching
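The normalization and comparison steps listed above might look roughly like the following Python sketch. Everything concrete here is an assumption made for illustration: the tiny `ABBREVIATIONS`, `STOP_WORDS` and `THESAURUS` tables, the substring scoring rule, and the averaging of best token matches in `name_similarity`. The categorization step is omitted.

```python
import re

# Hypothetical lookup tables; a real matcher would load a proper thesaurus.
ABBREVIATIONS = {"po": ["purchase", "order"], "qty": ["quantity"]}
STOP_WORDS = {"of", "the", "a", "an", "and", "to"}
THESAURUS = {frozenset(("bill", "invoice")): 0.9}


def normalize(name):
    """Tokenize a name, expand abbreviations, and eliminate stop words."""
    # Tokenization: split on punctuation, digits and camelCase boundaries.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    tokens = []
    for part in parts:
        part = part.lower()
        tokens.extend(ABBREVIATIONS.get(part, [part]))  # Expansion
    return [t for t in tokens if t not in STOP_WORDS]   # Elimination


def token_similarity(a, b):
    """Compare two tokens: exact match, thesaurus entry, then substring."""
    if a == b:
        return 1.0
    score = THESAURUS.get(frozenset((a, b)))
    if score is not None:
        return score
    short, long_ = sorted((a, b), key=len)
    if short in long_:
        return len(short) / len(long_)  # assumed substring scoring rule
    return 0.0


def name_similarity(name1, name2):
    """Average the best token match in both directions (an assumed formula)."""
    t1, t2 = normalize(name1), normalize(name2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(token_similarity(a, b) for b in t2) for a in t1)
    best2 = sum(max(token_similarity(a, b) for a in t1) for b in t2)
    return (best1 + best2) / (len(t1) + len(t2))


print(normalize("POBillTo"))
print(name_similarity("InvoiceTo", "BillTo"))
```

In this sketch the thesaurus is consulted before substring matching, so a known synonym pair is never scored by its spelling overlap.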

Structural Matching
  Bottom-up
  Mutually recursive
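One plausible reading of "bottom-up" and "mutually recursive" is sketched below in Python: inner nodes are scored from the leaves beneath them, and a confident inner-node match then feeds back into the scores of those leaves. The `Node` class, the `tree_match` name, every threshold and constant, the initial leaf score of 0.5, and the restriction to inner-node pairs are assumptions of this sketch, not details stated on the slides.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    name: str                                   # assumed unique within a schema
    children: list = field(default_factory=list)

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def post_order(self):
        for c in self.children:
            yield from c.post_order()
        yield self


def tree_match(src, tgt, lsim, w_struct=0.5, th_accept=0.5,
               th_high=0.6, th_low=0.35, c_inc=1.2, c_dec=0.9):
    """Return wsim for leaf pairs and inner-node pairs (hypothetical parameters)."""
    ssim, wsim = {}, {}

    def update_wsim(key):
        wsim[key] = w_struct * ssim[key] + (1 - w_struct) * lsim.get(key, 0.0)

    # Leaf pairs start from an assumed neutral structural score.
    for a, b in product(src.leaves(), tgt.leaves()):
        ssim[(a.name, b.name)] = 0.5
        update_wsim((a.name, b.name))

    # Bottom-up: post-order visits children before their parents.
    for s, t in product(list(src.post_order()), list(tgt.post_order())):
        if not (s.children and t.children):
            continue  # simplification: only inner nodes are compared here
        s_leaves, t_leaves = s.leaves(), t.leaves()
        # Count leaves that have a strong link to some leaf on the other side.
        strong_s = sum(any(wsim[(a.name, b.name)] > th_accept for b in t_leaves)
                       for a in s_leaves)
        strong_t = sum(any(wsim[(a.name, b.name)] > th_accept for a in s_leaves)
                       for b in t_leaves)
        key = (s.name, t.name)
        ssim[key] = (strong_s + strong_t) / (len(s_leaves) + len(t_leaves))
        update_wsim(key)
        # Mutual recursion: the ancestors' score adjusts the leaves' scores.
        if wsim[key] > th_high:
            factor = c_inc
        elif wsim[key] < th_low:
            factor = c_dec
        else:
            continue
        for a, b in product(s_leaves, t_leaves):
            leaf_key = (a.name, b.name)
            ssim[leaf_key] = min(1.0, ssim[leaf_key] * factor)
            update_wsim(leaf_key)
    return wsim


po = Node("PO", [Node("Qty"), Node("Price")])
order = Node("Order", [Node("Quantity"), Node("Cost")])
example_lsim = {("Qty", "Quantity"): 1.0, ("Price", "Cost"): 0.8, ("PO", "Order"): 0.8}
result = tree_match(po, order, example_lsim)
print(result[("PO", "Order")], result[("Qty", "Quantity")])
```

Because the feedback step rescales every leaf pair under a matched pair of ancestors, leaves that sit in similar contexts gain similarity even when their names differ.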

Example

Example (Cont.)

Example (Cont.)

Schema Graphs
  Elements
  Relationships (containment, aggregation, and IsDerivedFrom)

Matching Shared Types (context-dependent mappings)

Matching Referential Constraints

General Schemas

Matching Shared Types

Matching Referential Constraints

Other Features
  Optionality
  Views
  Initial mappings
  Lazy expansion
  Pruning leaves

Comparative Study

Algorithms
  MOMIS
  DIKE
  Cupid

Canonical Examples

Real World Example

Canonical Examples
  Identical schemas
  Atomic elements with same names, but different data types
  Atomic elements with same data types, but different names (a prefix or suffix is added)
  Different class names, but atomic elements with same names and data types
  Different nesting of the data: similar schemas with nested and flat structures
  Type substitution or context-dependent mapping

Real World Example

Experimental Conclusions
  Linguistic matching
  Thesaurus
  Linguistic similarity with no structure similarity
  Granularity of similarity computation
  Leaves
  Structure information beyond the immediate vicinity
  Context-dependent mappings
  Performance parameters

Future Work

A Truly Robust Solution
  Machine learning applied to instances
  Natural language technology
  Pattern matching to reuse known matches

Immediate Challenges
  Off-the-shelf thesaurus
  Schema annotations
  Automatic tuning of the control parameters
  Scalability analysis and testing
  More comparative analysis of algorithms