Composing Mappings among Data Sources
Jayant Madhavan, Alon Halevy
University of Washington
September 11, 2003 Composing Mappings among Data Sources 2
Mappings in data sharing architectures
Data Integration System: sources with mappings to a single mediated schema
…, [Lenzerini, PODS ’02]
[Figure: sources ACM, DBLP, CiteSeer mapped to a Mediated Schema]
Peer Data Management System: network of pair-wise mappings
[Piazza, UW], [Hyperion, Toronto], [PeerDB, Singapore], [LRM, Trento], [Edutella, Hannover], [Semantic Gossiping, EPFL], [Raccoon, Irvine], [Orchestra, Penn]
[Figure: peers ACM, DBLP, CiteSeer, UW, Humboldt]
Peer Data Management System (Piazza)
[Figure: peers ACM, DBLP, CiteSeer, UW, Humboldt]
Peer Schema: RUW: u1(x,y), …; RCiteseer: c1(x,y,z), …
Mapping: relates u1(x,y), u2(y,z) to c1(x,y,z) …
Mapping Formula: relates two queries (Q1, Q2)
Mapping Composition
[Figure: peers ACM, DBLP, CiteSeer, UW, Humboldt]
Query Answering
Iterative rewriting by chaining mappings
Transitive closure of all relevant mappings
[Figure: peers ACM, DBLP, CiteSeer, UW, Humboldt, with queries Q1, Q2, Q3, Q4, Q5 and Q’]
Eliminating redundancies from rewritings (optimization) [Piazza: ICDE’03, WWW’03, VLDBJ’03]
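The chaining step can be pictured as a reachability computation over the network of pair-wise mappings. The sketch below is only an illustration of that idea, not Piazza's reformulation algorithm. The peer names come from the slide, but the edge set in `MAPPINGS` and the function name `relevant_mapping_paths` are assumptions made for the example.

```python
from collections import deque

# Hypothetical pair-wise mappings between peers (assumed for illustration;
# the actual edges of the slide's figure are not recoverable).
MAPPINGS = {
    "UW": ["DBLP", "CiteSeer"],
    "DBLP": ["ACM"],
    "CiteSeer": ["ACM", "Humboldt"],
    "ACM": [],
    "Humboldt": [],
}

def relevant_mapping_paths(mappings, start):
    """Breadth-first closure over the mapping graph.

    For every peer reachable from `start`, return one shortest chain of
    peers whose mappings would have to be chained to reach it.
    """
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbour in mappings.get(peer, []):
            if neighbour not in paths:
                paths[neighbour] = paths[peer] + [neighbour]
                queue.append(neighbour)
    return paths

if __name__ == "__main__":
    for peer, path in relevant_mapping_paths(MAPPINGS, "UW").items():
        print(peer, "via", " -> ".join(path))
```

Each returned chain is a candidate path along which a query would be rewritten step by step; doing this per query at runtime is the cost that pre-computed compositions aim to avoid.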
Composition in a PDMS
Potential inefficiency: expensive rewriting + optimization at runtime for each query
Optimization: pre-compute or compose paths to relevant peers
But, composition must be independent of Q
Side-benefit: robustness to information loss. Dead intermediate peers will not semantically partition the network
[Figure: peers ACM, DBLP, CiteSeer, UW, Humboldt, with queries Q1, Q3, Q5 and Q’]
Composition: Meta-data operation
Mappings are integral to all data sharing architectures
Message passing, data exchange [Fagin, Kolaitis & Miller, ICDT’03], …
Composition is a natural problem in many of these
Fundamental operator to meta-data management
Model Management operators: Match, Merge, Compose, … [Bernstein, Halevy, Pottinger, SIGMOD Record ’01], [Melnik, Bernstein, Rahm, SIGMOD ’03]
Formal treatment for a particular mapping language
Problem Definition
MAC is the composition of MAB and MBC w.r.t. query language L if, for all queries Q ∈ L,
Q given MAC = Q given MAB, MBC
[Figure: schemas A, B, C with mappings MAB, MBC and MAC; a1,…,am: relational schema; query Q]
MAB: QA1 ⊇ QB1, …, QAn ⊇ QBn (GLAV formulas)
MAC: Q’A1 ⊇ QC1, …, Q’Ak ⊇ QCk
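The definition can also be written as a single condition. The notation ans(Q, M), for the answers to Q that are obtainable under a set of mapping formulas M, is introduced here only for illustration and does not appear on the slide.

```latex
M_{AC} \text{ is a composition of } M_{AB} \text{ and } M_{BC} \text{ w.r.t. } L
\iff
\forall Q \in L:\quad
\mathrm{ans}(Q,\, M_{AC}) \;=\; \mathrm{ans}(Q,\, M_{AB} \cup M_{BC})
```

The quantification over all queries in L is what makes the composition independent of any particular query Q.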
Overview of Contributions
Surprising: Composition of finite mappings can be infinite!
Good news: Composition is computable for powerful practical query languages
For CQk: the composition is finite, or infinite but encoded finitely
Composition algorithm that, on termination, computes all the formulas in the composition, and that terminates if the composition is finite, and also for many infinite compositions
Rewriting algorithm to exploit infinite formulas: extension of results from answering queries using views
Complexity results
Outline
Composition is interesting and important
Problem definition and results overview
Finite and infinite composition
Results and Composition Algorithm
Summary, current and future work
Composition Example
MAB: bbba(x,y) ⊇ b(x,t1), b(t1,t2), b(t2,y)
MBC: b(x,t), b(t,y) ⊇ bbc(x,y)
[Figure: Graph G; schemas A with bbba(x,y), B with b(x,y), C with bbc(x,y), linked by MAB and MBC]
Composition Example (2)
Q(x) :- bbba(x,x1)
Via MAB: b(x,t1), b(t1,t2), b(t2,x1)
Via MBC: bbc(x,y1), bbc(y1,y2)
Resulting formula: bbba(x,x1) ⊇ bbc(x,y1), bbc(y1,y2)
[Figure: query Q posed at A; schemas A with bbba(x,y), B with b(x,y), C with bbc(x,y), linked by MAB and MBC; path over x, y1, y2]
Composition Example (3)
MAC:
bbba(x,x1) ⊇ bbc(x,y1), bbc(y1,y2)
bbba(x1,y) ⊇ bbc(y1,y2), bbc(y2,y)
bbba(x,x1), bbba(x1,y) ⊇ bbc(x,y1), bbc(y1,y2), bbc(y2,y)
[Figure: schemas A with bbba(x,y), B with b(x,y), C with bbc(x,y), linked by MAB and MBC; path from x to y]
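The first composed formula can be checked on a small instance. The sketch below assumes that each mapping formula is a containment of its right-hand query in its left-hand query, and it uses a naive chase with labeled nulls. The function names and the sample data are hypothetical. It answers Q(x) :- bbba(x,x1) once by chaining MBC and MAB, and once by evaluating the composed right-hand side bbc(x,y1), bbc(y1,y2) directly on C.

```python
from itertools import count

def is_null(value):
    """Labeled nulls are tuples tagged 'null'; constants are plain strings."""
    return isinstance(value, tuple) and value[0] == "null"

def chase_c_to_b(bbc):
    """MBC: every bbc(x,y) fact forces a two-edge b-path x -> t -> y,
    where t is a fresh labeled null."""
    fresh = count()
    b = set()
    for x, y in sorted(bbc):
        t = ("null", next(fresh))
        b.update({(x, t), (t, y)})
    return b

def join(left, right):
    """Relational composition: (x, z) such that left(x, y) and right(y, z)."""
    return {(x, z) for x, y in left for y2, z in right if y == y2}

def derive_bbba(b):
    """MAB: every three-edge b-path from x to y yields a bbba(x,y) fact."""
    return join(join(b, b), b)

def answers_via_chain(bbc):
    """Q(x) :- bbba(x,x1), answered by chasing C -> B -> A.
    Answers that are labeled nulls are dropped."""
    bbba = derive_bbba(chase_c_to_b(bbc))
    return {x for x, _ in bbba if not is_null(x)}

def answers_via_composition(bbc):
    """Q'(x) :- bbc(x,y1), bbc(y1,y2), evaluated directly on C."""
    return {x for x, _ in join(bbc, bbc)}

if __name__ == "__main__":
    c_data = {("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")}
    print(sorted(answers_via_chain(c_data)))
    print(sorted(answers_via_composition(c_data)))
```

On this data both routes return the same answers, which is the behaviour the problem definition asks of a composition. The composed route never materializes schema B.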
Infinite composition
[Figure: Graph G; schemas A with rbba(x,y), bba(x,y); B with r(x,y), b(x,y); C with rbc(x,y), bbc(x,y); linked by MAB and MBC]
MAB:
rbba(x,y) ⊇ r(x,t1), b(t1,t2), b(t2,y)
bba(x,y) ⊇ b(x,t), b(t,y)
MBC:
r(x,t), b(t,y) ⊇ rbc(x,y)
b(x,t), b(t,y) ⊇ bbc(x,y)
Infinite Composition (2)
Q(x) :- rbba(x,x1)
Via MAB: r(x,t1), b(t1,t2), b(t2,x1)
Via MBC: rbc(x,y1), bbc(y1,y2)
Resulting formula in MAC: rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
[Figure: schemas A with rbba(x,y), bba(x,y); B with r(x,y), b(x,y); C with rbc(x,y), bbc(x,y); mappings MAB, MBC and MAC]
Infinite Composition (3)
bba(x,y) ⊇ bbc(x,y)
rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
rbba(x,x1), bba(x1,x2) ⊇ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
[Figure: paths from x; the left side covers 2n b-edges, the right side 2n+1]
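The pattern behind the growing formulas can be generated mechanically. The sketch below assumes the family continues as the two listed formulas suggest: rbba followed by n-1 bba atoms on the left, rbc followed by n bbc atoms on the right. It reads each formula as a containment written with ⊇. The helper names `chain`, `formula`, `render` and `b_edge_counts` are made up for this example.

```python
# Number of b-edges each relation stands for in schema B, read off MAB and MBC.
B_EDGES = {"rbba": 2, "bba": 2, "rbc": 1, "bbc": 2}

def chain(relations, start, prefix):
    """Render atoms rel(v0,v1), rel(v1,v2), ... where v0 = start and the
    later variables are prefix1, prefix2, ..."""
    variables = [start] + [f"{prefix}{i}" for i in range(1, len(relations) + 1)]
    return ", ".join(
        f"{rel}({variables[i]},{variables[i + 1]})"
        for i, rel in enumerate(relations)
    )

def formula(n):
    """Relation names on the left and right side of the n-th formula (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    left = ["rbba"] + ["bba"] * (n - 1)
    right = ["rbc"] + ["bbc"] * n
    return left, right

def render(n):
    """Text form of the n-th formula, in the slide's notation."""
    left, right = formula(n)
    return f"{chain(left, 'x', 'x')} ⊇ {chain(right, 'x', 'y')}"

def b_edge_counts(n):
    """b-edges covered by the left and right side, ignoring the single r-edge."""
    left, right = formula(n)
    return sum(B_EDGES[r] for r in left), sum(B_EDGES[r] for r in right)

if __name__ == "__main__":
    for n in range(1, 4):
        print(render(n), b_edge_counts(n))
```

The edge counts come out as 2n on the left and 2n+1 on the right for every n. The right side always reaches one b-edge past the left side, which suggests why no finite set of these formulas covers all the longer ones.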
Main Result
Composition computable for interesting query languages
CQk: queries with localized variable interactions
Includes most queries in practice, e.g. cyclen(x) ∈ CQ3
cyclen(x) :- b(x,y), pathn-1(y,x)
pathn-1(x,y) :- b(x,z), pathn-2(z,y)
…
path1(x,y) :- b(x,y)
Composition w.r.t. CQk is computable and is either a finite number of GLAV formulas, or a finite encoding of infinite GLAV formulas
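The nested definition of cyclen can be evaluated directly. The sketch below assumes that membership in CQ3 means each rule mentions at most three variables. It is a plain evaluation of the slide's rules over a set of edges, with hypothetical function names, and is not part of the composition algorithm.

```python
def compose(r, s):
    """Join two binary relations on the shared middle variable:
    (x, y) such that r(x, z) and s(z, y). Three variables in total."""
    return {(x, y) for x, z in r for z2, y in s if z == z2}

def path(b, n):
    """path_1(x,y) :- b(x,y)
    path_n(x,y) :- b(x,z), path_{n-1}(z,y)"""
    if n < 1:
        raise ValueError("n must be at least 1")
    result = set(b)
    for _ in range(n - 1):
        result = compose(b, result)
    return result

def cycle(b, n):
    """cycle_n(x) :- b(x,y), path_{n-1}(y,x), for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    back = path(b, n - 1)
    return {x for x, y in b if (y, x) in back}

if __name__ == "__main__":
    triangle = {(1, 2), (2, 3), (3, 1)}
    print(sorted(cycle(triangle, 3)))
```

Although a cycle of length n touches n nodes, every intermediate relation is binary and every join involves three variables, which is the localized interaction the slide refers to.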
Composition Algorithm
Minimal formulas: formulas that have to be present in the composition. Larger minimal formulas are extensions of smaller ones
Residues of minimal formulas: signatures that capture information on extensions. Isomorphic residues imply isomorphic extensions
Query Rewrite Graphs: encoding of all minimal formulas in the composition. Cycles can be used to encode an infinite number of formulas
Minimal Mapping Formulas
Formulas that cannot be constructed from smaller formulas
Theorem: Sufficient to compute all minimal formulas
rbba(x,x1), bba(x1,x2) ⊇ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
[Figure: atoms drawn as edges x–x1, x1–x2 (left side) and x–y1, y1–y2, y2–y3 (right side); labels: "Internally existential variable, not visible on right side" (u1, u2, u3) and "Join variable"]
Incremental Construction
Lemma: If QA ⊇ QC is a minimal formula, then there exists a minimal formula Q’A ⊇ Q’C such that Q’A has one atom less than QA
QA ⊇ QC: rbba(x,x1), bba(x1,x2) ⊇ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
Q’A ⊇ Q’C: rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
[Figure: QA ⊇ QC; candidates Q’A1, …, Q’Ai, …, Q’Am; Q’A1 ⊇ Q’C1; labels: "Try all one atom extensions", "Complete formulas"]
Incremental algorithm…
Residues in Formulas
Residues capture all extension information
Null residues: no extensions
rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
Residue: b(u2,y2), {u2}, {y2}, {x1u2}
[Figure: the formula drawn over variables x, x1, y1, y2, u1, u2, with an extension over x1, x2, y2, y3, u3; labels: "Potential join variable", "Internally existential variable"]
Isomorphic Residues
Isomorphic residues imply isomorphic extensions
rbba(x,x1) ⊇ rbc(x,y1), bbc(y1,y2)
rbba(x,x1), bba(x1,x2) ⊇ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
[Figure: the two formulas drawn over variables x, x1, x2, y1, y2, y3, u1, u2, u3; their residues are marked "isomorphic"]
Query Rewrite Graphs
Paths from roots encode minimal mapping formulas
Cycles encode infinite formulas
Query Nodes: Q1 bba(x,y); Q2 rbba(x,x1); Q3 bba(x1,x2)
Rewrite Nodes: R1 bbc(x,y); R2 rbc(x,y1), bbc(y1,y2); R3 bbc(y2,y1)
Theorem: QRG construction on termination encodes the composition
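How a cycle can stand for infinitely many formulas is easiest to see by unrolling one. The dictionary below is a heavily simplified stand-in for a query rewrite graph, not the paper's data structure. It pairs the query nodes Q2 and Q3 with the relations of their rewrite nodes R2 and R3, and it assumes that the cycle is a self-loop on Q3. The names `QRG` and `unroll` are hypothetical.

```python
from itertools import islice

# Simplified stand-in for a query rewrite graph (assumed shape):
#   "query"   - the A-side relation this query node adds
#   "rewrite" - the C-side relations its rewrite node adds
#   "next"    - successor query nodes
QRG = {
    "Q2": {"query": "rbba", "rewrite": ["rbc", "bbc"], "next": ["Q3"]},
    "Q3": {"query": "bba", "rewrite": ["bbc"], "next": ["Q3"]},  # self-loop
}

def unroll(graph, root):
    """Lazily enumerate the paths that start at `root`, shortest first.

    Each path yields a pair (A-side relations, C-side relations) describing
    one formula encoded by the graph. With a cycle, the generator never ends.
    """
    frontier = [(root, [], [])]
    while frontier:
        next_frontier = []
        for node, query, rewrite in frontier:
            query = query + [graph[node]["query"]]
            rewrite = rewrite + graph[node]["rewrite"]
            yield query, rewrite
            for successor in graph[node]["next"]:
                next_frontier.append((successor, query, rewrite))
        frontier = next_frontier

if __name__ == "__main__":
    for query, rewrite in islice(unroll(QRG, "Q2"), 3):
        print(query, "rewritten as", rewrite)
```

The graph has two nodes, yet `unroll` keeps producing longer formulas for as long as it is consumed. This is the sense in which a finite structure encodes an infinite composition.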
Other Results
Algorithm to exploit infinite formulas
Cyclic QRG can be represented by a pair of recursive datalog programs
Extension of earlier results in answering queries using infinite views [Levy, Rajaraman & Ullman, PODS’96]
Complexity Results
Upper-bound: composition verification is in …^p_2
Lower-bound: composition verification w.r.t. finite sized query languages is …^p_2-hard
Related Work
GLAV [Millstein, Friedman & Halevy, AAAI’99], [Lenzerini, PODS’02], [Fagin, Kolaitis & Miller, ICDT’03]
Generalization of LaV and GaV
Leads to infinite composition
Reasoning w.r.t. Query Languages
View containment [Li, Ullman & Bawa, ICDT’01]
Makes the problem hard
Summary
Mapping composition
Can be infinite for simple GLAV mappings
Can be constructed completely for interesting query languages
QRG encodes valid formulas in composition
QRG can also encode infinite formulas
Can be exploited for query answering even when infinite
[Figure: schemas A, B, C with mappings MAB, MBC and MAC, and query Q]
Current and Future Work
Composition in a PDMS
Choosing paths to pre-compute
Manipulating infinite compositions
Semi-automatic construction of mappings
Learning from a corpus of related schemas
Exploiting past mapping experience
[Halevy, Madhavan & Bernstein, DeBull’03 to appear], [Madhavan, Bernstein, Chen, Halevy & Shenoy, IIW@IJCAI ’03]
More information: http://www.cs.washington.edu/homes/jayant