Generic Schema Matching with Cupid

Jayant Madhavan

Philip A. Bernstein

Erhard Rahm

Proceedings of the 27th VLDB Conference

Schema Matching

Schema Matching (Cont.)

Definition: Finding a mapping between those elements of two schemas that semantically correspond to each other

Applications
  Schema integration
  Data translation
  XML message mapping
  Data warehouse loading

Goal

Taxonomy

Schema vs. Instance based
Element vs. Structure granularity
Linguistic based
Constraint based
Matching cardinality
Auxiliary information
Individual vs. Combinational

Cupid

Schema-based
Automated linguistic-based matching
Both element-based and structure-based
Biased toward similarity of atomic elements
Exploits internal structure
Exploits keys, referential constraints and views
Makes context-dependent matches of a shared type
1:n mapping

Similarity Coefficient Computation

First Phase: Linguistic matching
  Names
  Data types
  Domains
  Linguistic similarity coefficient: lsim

Second Phase: Structural matching
  Contexts
  Linguistic similarity coefficients
  Structural similarity coefficient: ssim

Hybrid (wsim = w_struct * ssim + (1 - w_struct) * lsim)
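The hybrid formula above can be written as a one-line function. This is a minimal sketch: the function name `weighted_similarity`, the default `w_struct=0.5`, and the range check are illustrative assumptions, not details taken from the slides.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Blend the two coefficients: wsim = w_struct * ssim + (1 - w_struct) * lsim."""
    # Assumption: w_struct is a weight in [0, 1]; the slides do not state its range.
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


# Hypothetical pair: strong name match, weak structural match.
print(weighted_similarity(lsim=0.8, ssim=0.4, w_struct=0.5))
```

With `w_struct = 0` the result is purely linguistic, and with `w_struct = 1` it is purely structural, which makes the parameter a convenient knob for experiments.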

Linguistic Matching

Normalization
  Tokenization
  Expansion
  Elimination

Categorization
  Data types
  Schema hierarchy
  Linguistic content

Comparison: Linguistic Similarity Coefficient (lsim)
  Thesaurus
  Sub-string matching
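The normalization and comparison steps can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the lookup tables `ABBREVIATIONS`, `IGNORED`, and `SYNONYMS` stand in for a real thesaurus, the sub-string score is a simple length ratio, and the averaging rule in `name_sim` is one plausible way to combine token scores. The categorization step is omitted.

```python
import re

# Hypothetical lookup tables (assumptions for illustration only).
ABBREVIATIONS = {"qty": "quantity", "num": "number", "po": "purchase order"}
IGNORED = {"of", "the", "a", "an"}
SYNONYMS = {frozenset(("bill", "invoice")): 1.0}


def normalize(name):
    """Tokenize a schema element name, expand abbreviations, drop noise words."""
    # Tokenization: break camelCase, then split on anything non-alphanumeric.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    tokens = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]
    # Expansion: replace known abbreviations by their full form.
    expanded = []
    for tok in tokens:
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    # Elimination: discard articles and prepositions.
    return [tok for tok in expanded if tok not in IGNORED]


def token_sim(a, b):
    """Score two tokens: identity, thesaurus synonym, or sub-string overlap."""
    if a == b:
        return 1.0
    synonym_score = SYNONYMS.get(frozenset((a, b)))
    if synonym_score is not None:
        return synonym_score
    shorter, longer = sorted((a, b), key=len)
    # Sub-string matching: score by the share of the longer token covered.
    return len(shorter) / len(longer) if shorter in longer else 0.0


def name_sim(name1, name2):
    """Average, over both token sets, of each token's best match on the other side."""
    t1, t2 = normalize(name1), normalize(name2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(token_sim(a, b) for b in t2) for a in t1)
    best2 = sum(max(token_sim(a, b) for a in t1) for b in t2)
    return (best1 + best2) / (len(t1) + len(t2))


print(normalize("qtyOrdered"), name_sim("BillTo", "InvoiceTo"))
```

In this sketch `BillTo` and `InvoiceTo` match fully only because the toy thesaurus lists `bill` and `invoice` as synonyms, which shows how much the linguistic phase depends on thesaurus coverage.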

Structural Matching

Bottom-up
Mutually recursive
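A bottom-up, mutually recursive computation of this kind can be sketched as a post-order traversal in which leaf similarities determine subtree similarities, and similar or dissimilar subtrees in turn raise or lower the similarities of the leaves beneath them. The sketch below is a simplified illustration, not the authors' implementation: the `Node` class, the data-type scores (0.5 and 0.25), and all threshold and factor defaults (`th_accept`, `th_high`, `th_low`, `c_inc`, `c_dec`) are assumed values, and `lsim` is supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps nodes hashable by identity
class Node:
    name: str
    dtype: str = ""
    children: list = field(default_factory=list)


def postorder(node):
    for child in node.children:
        yield from postorder(child)
    yield node


def leaves(node):
    return [n for n in postorder(node) if not n.children]


def tree_match(s_root, t_root, lsim, w_struct=0.5, th_accept=0.5,
               th_high=0.6, th_low=0.35, c_inc=1.2, c_dec=0.9):
    ssim = {}
    # Leaf pairs start from a data-type compatibility score (assumed values).
    for s in leaves(s_root):
        for t in leaves(t_root):
            ssim[s, t] = 0.5 if s.dtype == t.dtype else 0.25

    def wsim(s, t):
        return w_struct * ssim[s, t] + (1 - w_struct) * lsim(s.name, t.name)

    def strong_link_fraction(s, t):
        # Share of leaves in both subtrees with a strong link to the other side.
        ls, lt = leaves(s), leaves(t)
        hits = sum(any(wsim(x, y) > th_accept for y in lt) for x in ls)
        hits += sum(any(wsim(x, y) > th_accept for x in ls) for y in lt)
        return hits / (len(ls) + len(lt))

    s_nodes, t_nodes = list(postorder(s_root)), list(postorder(t_root))
    for s in s_nodes:  # bottom-up: children are visited before their parents
        for t in t_nodes:
            if s.children or t.children:
                ssim[s, t] = strong_link_fraction(s, t)
            w = wsim(s, t)
            # Feedback: similar subtrees boost their leaf pairs, dissimilar ones damp them.
            factor = c_inc if w > th_high else c_dec if w < th_low else 1.0
            for x in leaves(s):
                for y in leaves(t):
                    ssim[x, y] = min(1.0, ssim[x, y] * factor)
    return {(s.name, t.name): wsim(s, t) for s in s_nodes for t in t_nodes}


def toy_lsim(a, b):
    # Hypothetical linguistic scores for the example schemas.
    table = {("PO", "Order"): 0.8, ("Price", "Cost"): 0.8}
    return 1.0 if a == b else table.get((a, b), 0.0)


source = Node("PO", children=[Node("Qty", "int"), Node("Price", "float")])
target = Node("Order", children=[Node("Qty", "int"), Node("Cost", "float")])
sims = tree_match(source, target, toy_lsim)
print(sims["Qty", "Qty"], sims["Qty", "Cost"])
```

The result is keyed by element names, so the sketch assumes names are unique within each schema. Because every node pair is compared, including a leaf against a whole subtree, some leaf pairs are damped by unrelated comparisons; a fuller implementation would likely prune such pairs.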

Example

Example (Cont.)

Example (Cont.)

General Schemas

Schema Graphs
  Elements
  Relationships (containment, aggregation, and IsDerivedFrom)

Matching Shared Types (context-dependent mappings)

Matching Referential Constraints

Matching Shared Types
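One way to obtain context-dependent mappings for a shared type is to expand the schema graph into a tree, so that a type reachable from several parents gets one copy per context. The sketch below illustrates that idea under stated assumptions: the adjacency-dict representation, the function name `expand_contexts`, and the purchase-order schema are hypothetical, and only containment edges are modelled.

```python
def expand_contexts(graph, root):
    """Return every root-to-leaf containment path as a dotted string.

    A shared type that is reachable from several parents yields one path
    per context, so each occurrence can be matched separately.
    """
    paths = []

    def walk(node, prefix):
        path = prefix + (node,)
        children = graph.get(node, [])
        if not children:
            paths.append(".".join(path))
        for child in children:
            if child in path:
                # Recursive types would expand forever; this sketch rejects them.
                raise ValueError(f"cycle through {child!r}")
            walk(child, path)

    walk(root, ())
    return paths


# Hypothetical schema: Address is shared by BillTo and ShipTo.
schema = {
    "PurchaseOrder": ["BillTo", "ShipTo"],
    "BillTo": ["Address"],
    "ShipTo": ["Address"],
    "Address": ["Street", "City"],
}

for path in expand_contexts(schema, "PurchaseOrder"):
    print(path)
```

After expansion, `Street` under `BillTo` and `Street` under `ShipTo` are distinct leaves, so a matcher can map them to different target elements.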

Matching Referential Constraints

Other Features

Optionality
Views
Initial Mappings
Lazy Expansion
Pruning Leaves

Comparative Study

Algorithms
  MOMIS
  DIKE
  Cupid

Canonical Examples

Real World Example

Canonical Examples

Identical schemas
Atomic elements with same names, but different data types
Atomic elements with same data types, but different names (a prefix or suffix is added)
Different class names, but atomic elements with same names and data types
Different nesting of the data: similar schemas with nested and flat structures
Type substitution or context-dependent mapping

Real World Example

Experimental Conclusions

Linguistic matching
Thesaurus
Linguistic similarity with no structure similarity
Granularity of similarity computation
Leaves
Structure information beyond the immediate vicinity
Context-dependent mappings
Performance parameters

Future Work

A Truly Robust Solution
  Machine learning applied to instances
  Natural language technology
  Pattern matching to reuse known matches

Immediate Challenges
  Off-the-shelf thesaurus
  Schema annotations
  Automatic tuning of the control parameters
  Scalability analysis and testing
  More comparative analysis of algorithms