Generic Schema Matching with Cupid
Jayant Madhavan, Philip A. Bernstein, Erhard Rahm
Proceedings of the 27th VLDB Conference
Schema Matching
Schema Matching (Cont.)
Definition: Finding a mapping between those elements of two schemas that semantically correspond to each other
Applications
  Schema integration
  Data translation
  XML message mapping
  Data warehouse loading
Goal
Taxonomy
  Schema vs. instance based
  Element vs. structure granularity
  Linguistic based
  Constraint based
  Matching cardinality
  Auxiliary information
  Individual vs. combinational
Cupid
  Schema-based
  Automated linguistic-based matching
  Both element-based and structure-based
  Biased toward similarity of atomic elements
  Exploits internal structure
  Exploits keys, referential constraints and views
  Makes context-dependent matches of a shared type
  1:n mapping
Similarity Coefficient Computation
First Phase: Linguistic matching
  Names
  Data types
  Domains
  Linguistic similarity coefficient: lsim
Second Phase: Structural matching
  Contexts
  Linguistic similarity coefficients
  Structural similarity coefficient: ssim
Hybrid (wsim = w_struct * ssim + (1-w_struct) * lsim)
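The hybrid formula above can be sketched in a few lines of Python. The function name and the default weight of 0.5 are illustrative assumptions, not values taken from the slides.

```python
def weighted_similarity(ssim: float, lsim: float, w_struct: float = 0.5) -> float:
    """Combine structural and linguistic similarity into wsim.

    w_struct = 0.5 is an assumed default, chosen only for illustration.
    """
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    # wsim = w_struct * ssim + (1 - w_struct) * lsim
    return w_struct * ssim + (1.0 - w_struct) * lsim
```

With `w_struct = 0` the result is purely linguistic, and with `w_struct = 1` it is purely structural.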
Linguistic Matching
Normalization
  Tokenization
  Expansion
  Elimination
Categorization
  Data types
  Schema hierarchy
  Linguistic contents
Comparison: Linguistic Similarity Coefficient (lsim)
  Thesaurus
  Sub-string matching
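A minimal sketch of the normalization and comparison steps listed above, with categorization omitted. The lookup tables, the substring score, and the way token scores are averaged into a name score are all assumptions made for illustration; the slides do not specify them.

```python
import re

# Hypothetical lookup tables; a real matcher would load these from a thesaurus.
ABBREVIATIONS = {"po": ["purchase", "order"], "qty": ["quantity"], "num": ["number"]}
STOP_WORDS = {"of", "the", "a", "an", "and"}
SYNONYMS = {frozenset(("invoice", "bill")): 0.9, frozenset(("cost", "price")): 0.8}


def tokenize(name):
    """Tokenization: split on punctuation, case changes, and digits."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
    return [t.lower() for t in tokens]


def normalize(name):
    """Expansion of abbreviations, then elimination of stop words."""
    expanded = []
    for token in tokenize(name):
        expanded += ABBREVIATIONS.get(token, [token])
    return [t for t in expanded if t not in STOP_WORDS]


def token_similarity(a, b):
    """Thesaurus lookup first, then sub-string matching as a fallback."""
    if a == b:
        return 1.0
    score = SYNONYMS.get(frozenset((a, b)))
    if score is not None:
        return score
    short, long_ = sorted((a, b), key=len)
    if len(short) >= 3 and short in long_:
        return len(short) / len(long_)  # assumed scoring rule
    return 0.0


def name_similarity(name1, name2):
    """Average of each token's best match in the other name (assumed rule)."""
    t1, t2 = normalize(name1), normalize(name2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(token_similarity(a, b) for b in t2) for a in t1)
    best2 = sum(max(token_similarity(a, b) for a in t1) for b in t2)
    return (best1 + best2) / (len(t1) + len(t2))
```

For example, `name_similarity("POLines", "Purchase_Order_Lines")` normalizes both names to the same three tokens, so the two names compare as identical.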
Structural Matching
  Bottom-up
  Mutually recursive
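One way to picture a bottom-up, mutually recursive pass is the sketch below. It visits both schema trees in post-order, scores each non-leaf pair by the fraction of its leaves that have a strong link, and then feeds that score back into the leaf pairs. The leaf-fraction rule, the thresholds, and the adjustment factors are assumptions chosen for illustration; the slide does not state them.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing so nodes can be used as dict keys
class Node:
    name: str
    children: list = field(default_factory=list)

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def post_order(self):
        for child in self.children:
            yield from child.post_order()
        yield self


def tree_match(s_root, t_root, lsim, w_struct=0.5, th_accept=0.5,
               th_high=0.6, th_low=0.3, c_inc=1.2, c_dec=0.9, leaf_init=0.5):
    """All numeric defaults are hypothetical control parameters."""
    # Leaf pairs start from a uniform structural score (an assumption).
    ssim = {(a, b): leaf_init for a in s_root.leaves() for b in t_root.leaves()}

    def wsim(a, b):
        return w_struct * ssim[(a, b)] + (1 - w_struct) * lsim(a.name, b.name)

    for a in s_root.post_order():
        for b in t_root.post_order():
            if not (a.children or b.children):
                continue  # leaf-leaf pairs keep their current score
            la, lb = a.leaves(), b.leaves()
            # Leaves count as "strongly linked" if some partner exceeds th_accept.
            strong_a = sum(any(wsim(x, y) > th_accept for y in lb) for x in la)
            strong_b = sum(any(wsim(x, y) > th_accept for x in la) for y in lb)
            ssim[(a, b)] = (strong_a + strong_b) / (len(la) + len(lb))
            # Feedback: similar ancestors raise leaf scores, dissimilar ones lower them.
            score = wsim(a, b)
            factor = c_inc if score > th_high else c_dec if score < th_low else 1.0
            for x in la:
                for y in lb:
                    ssim[(x, y)] = min(1.0, ssim[(x, y)] * factor)
    return ssim, {pair: wsim(*pair) for pair in ssim}


# Usage with two tiny hypothetical schemas and hand-picked lsim values.
qty, price = Node("Qty"), Node("Price")
po = Node("PO", [qty, price])
quantity, cost = Node("Quantity"), Node("Cost")
order = Node("Order", [quantity, cost])
LSIM = {("Qty", "Quantity"): 1.0, ("Price", "Cost"): 0.8, ("PO", "Order"): 0.7}
ssim, wsim = tree_match(po, order, lambda a, b: LSIM.get((a, b), 0.0))
```

The recursion is mutual in the sense that leaf scores determine the score of their ancestors, and the ancestors' score is then used to adjust the leaf scores.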
Example
Example (Cont.)
Example (Cont.)
Schema Graphs
  Elements
  Relationships (containment, aggregation, and IsDerivedFrom)
Matching Shared Types (context-dependent mappings)
Matching Referential Constraints
General Schemas
Matching Shared Types
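A context-dependent mapping needs a shared type to be distinguishable by where it is used. One way to illustrate this is to expand the schema graph into one path per context, as in the sketch below. The graph encoding, the function name, and the example schema are hypothetical, and the sketch assumes the graph is acyclic.

```python
def expand_paths(graph, root):
    """List every root-to-element path, so a shared type appears once per context.

    `graph` maps an element name to the names it contains (assumed acyclic).
    """
    paths = []

    def visit(node, prefix):
        path = prefix + (node,)
        paths.append("/".join(path))
        for child in graph.get(node, []):
            visit(child, path)

    visit(root, ())
    return paths


# Hypothetical schema: Address is shared by BillTo and ShipTo.
graph = {
    "PurchaseOrder": ["BillTo", "ShipTo"],
    "BillTo": ["Address"],
    "ShipTo": ["Address"],
    "Address": ["Street", "City"],
}
paths = expand_paths(graph, "PurchaseOrder")
```

After expansion, `Street` under `BillTo` and `Street` under `ShipTo` are separate paths, so each can be mapped to a different target element.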
Matching Referential Constraints
Other Features
  Optionality
  Views
  Initial Mappings
  Lazy Expansion
  Pruning Leaves
Comparative Study
Algorithms
  MOMIS
  DIKE
  Cupid
Canonical Examples
Real World Example
Canonical Examples
  Identical schemas
  Atomic elements with same names, but different data types
  Atomic elements with same data types, but different names (a prefix or suffix is added)
  Different class names, but atomic elements same names and data types
  Different nesting of the data – similar schemas with nested and flat structures
  Type substitution or context-dependent mapping
Real World Example
Experimental Conclusions
  Linguistic matching
  Thesaurus
  Linguistic similarity with no structure similarity
  Granularity of similarity computation
  Leaves
  Structure information beyond the immediate vicinity
  Context-dependent mappings
  Performance parameters
Future Work
A Truly Robust Solution
  Machine learning applied to instances
  Natural language technology
  Pattern matching to reuse known matches
Immediate Challenges
  Off-the-shelf thesaurus
  Schema annotations
  Automatic tuning of the control parameters
  Scalability analysis and testing
  More comparative analysis of algorithms