Composing Mappings among Data Sources

Jayant Madhavan, Alon Halevy

University of Washington


Transcript of Composing Mappings among Data Sources

Page 1: Composing Mappings  among Data Sources

Composing Mappings among Data Sources

Jayant Madhavan, Alon Halevy

University of Washington

Page 2: Composing Mappings  among Data Sources

September 11, 2003 Composing Mappings among Data Sources 2

Mappings in data sharing architectures

Data Integration System: sources with mappings to a single mediated schema

…, [Lenzerini, PODS ’02]

[Figure: sources ACM, DBLP, and CiteSeer mapped to a Mediated Schema]

Peer Data Management System: network of pair-wise mappings

[Piazza, UW], [Hyperion, Toronto], [PeerDB, Singapore], [LRM, Trento], [Edutella, Hannover], [Semantic Gossiping, EPFL], [Raccoon, Irvine], [Orchestra, Penn]

[Figure: peers ACM, DBLP, CiteSeer, UW, and Humboldt]

Page 3: Composing Mappings  among Data Sources

Peer Data Management System (Piazza)

[Figure: peers ACM, DBLP, CiteSeer, UW, and Humboldt]

Peer Schema: RUW: u1(x,y), …; RCiteseer: c1(x,y,z), …

Mapping: u1(x,y), u2(y,z) ⊆ c1(x,y,z) …

Mapping Formula (Q1 ⊆ Q2)

Page 4: Composing Mappings  among Data Sources

Mapping Composition

[Figure: peers ACM, DBLP, CiteSeer, UW, and Humboldt]

Page 5: Composing Mappings  among Data Sources

Query Answering

Iterative rewriting by chaining mappings

Transitive closure of all relevant mappings

[Figure: peers ACM, DBLP, CiteSeer, UW, and Humboldt, with queries Q, Q1, Q2, Q3, Q4, Q5, and Q’]

Eliminating redundancies from rewritings (optimization)

[Piazza: ICDE’03, WWW’03, VLDBJ’03]
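The "transitive closure of all relevant mappings" can be pictured as reachability over the graph of peers. The following is a minimal sketch under stated assumptions: the peer names come from the slides, but the edge list is hypothetical (the transcript does not preserve which pairs of peers are mapped), and `mapping_paths` is an illustrative helper, not part of Piazza.

```python
from collections import deque

# Hypothetical pair-wise mappings. Peer names come from the slides;
# which pairs are actually connected is an assumption.
MAPPINGS = {
    "UW": ["CiteSeer", "DBLP"],
    "CiteSeer": ["ACM"],
    "DBLP": ["ACM"],
    "Humboldt": ["UW"],
    "ACM": [],
}

def mapping_paths(start, mappings):
    """Breadth-first search: for every peer reachable from `start`, return
    one shortest chain of peers along which a query could be rewritten."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in mappings.get(peer, []):
            if nxt not in paths:  # the first visit yields a shortest chain
                paths[nxt] = paths[peer] + [nxt]
                queue.append(nxt)
    return paths

paths = mapping_paths("UW", MAPPINGS)
for peer, chain in paths.items():
    print(peer, "via", " -> ".join(chain))
```

Each chain found this way is a candidate sequence of mappings to rewrite a query through; the rewriting itself is not modelled here.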

Page 6: Composing Mappings  among Data Sources

Composition in a PDMS

Potential inefficiency: expensive rewriting + optimization at runtime for each query

[Figure: peers ACM, DBLP, CiteSeer, UW, and Humboldt, with queries Q, Q1, Q3, Q5, and Q’]

Optimization: pre-compute or compose paths to relevant peers

But, composition must be independent of Q

Side-benefit: robustness to information loss. Dead intermediate peers will not semantically partition the network

Page 7: Composing Mappings  among Data Sources

Composition: Meta-data operation

Mappings are integral to all data sharing architectures

Message passing, data exchange [Fagin, Kolaitis & Miller, ICDT’03], …

Composition is a natural problem in many of these

Fundamental operator to meta-data management

Model Management operators: Match, Merge, Compose, … [Bernstein, Halevy, Pottinger, SIGMOD Record ’01], [Melnik, Bernstein, Rahm, SIGMOD ’03]

Formal treatment for a particular mapping language

Page 8: Composing Mappings  among Data Sources

Problem Definition

MAC is the composition of MAB and MBC w.r.t. query language L if, for all queries Q ∈ L,

Q given MAC = Q given MAB, MBC

[Figure: schemas A, B, and C with mappings MAB, MBC, and MAC; query Q]

Relational Schema: a1, …, am

GLAV formulas

MAB: QA1 ⊆ QB1, …, QAn ⊆ QBn

MAC: Q’A1 ⊆ QC1, …, Q’Ak ⊆ QCk

Page 9: Composing Mappings  among Data Sources

Overview of Contributions

Surprising: Composition of finite mappings can be infinite!!!

Good news: Composition computable for powerful practical query languages

CQk: finite, or infinite but encoded finitely

Composition algorithm that on termination computes all the formulas in the composition, and that terminates if the composition is finite, and also for many infinite compositions

Rewriting algorithm to exploit infinite formulas: extension of results from answering queries using views

Complexity results

Page 10: Composing Mappings  among Data Sources

Outline

Composition is interesting and important

Problem definition and results overview

Finite and infinite composition

Results and Composition Algorithm

Summary, current and future work

Page 11: Composing Mappings  among Data Sources

Composition Example

Graph G

Schemas: A: bbba(x,y); B: b(x,y); C: bbc(x,y)

MAB: bbba(x,y) ⊆ b(x,t1), b(t1,t2), b(t2,y)

MBC: b(x,t), b(t,y) ⊆ bbc(x,y)

Page 12: Composing Mappings  among Data Sources

Composition Example (2)

Q(x) :- bbba(x,x1)

Via MAB: b(x,t1), b(t1,t2), b(t2,x1)

Via MBC: bbc(x,y1), bbc(y1,y2)

bbba(x,x1) ⊆ bbc(x,y1), bbc(y1,y2)

[Figure: schemas A: bbba(x,y); B: b(x,y); C: bbc(x,y), with mappings MAB and MBC and query Q]

Page 13: Composing Mappings  among Data Sources

Composition Example (3)

bbba(x,x1) ⊆ bbc(x,y1), bbc(y1,y2)

bbba(x1,y) ⊆ bbc(y1,y2), bbc(y2,y)

bbba(x,x1), bbba(x1,y) ⊆ bbc(x,y1), bbc(y1,y2), bbc(y2,y)

[Figure: schemas A: bbba(x,y); B: b(x,y); C: bbc(x,y), with mappings MAB and MBC]

MAC
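The composed formula of this example relates bbba(x,x1), bbba(x1,y) to bbc(x,y1), bbc(y1,y2), bbc(y2,y). A minimal sketch that checks this on one small instance, assuming (for illustration only) that both mappings hold with equality, so that bbba is exactly the set of 3-step b-paths and bbc is exactly the set of 2-step b-paths. The path graph and the helper names `compose` and `power` are hypothetical.

```python
def compose(r, s):
    """Join two binary relations: {(x, z) | (x, y) in r and (y, z) in s}."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def power(rel, n):
    """Chain `rel` with itself n times (n >= 1)."""
    out = rel
    for _ in range(n - 1):
        out = compose(out, rel)
    return out

# Hypothetical instance of schema B: a directed path 0 -> 1 -> ... -> 9.
b = {(i, i + 1) for i in range(9)}

# Assumption: the mappings hold with equality on this instance.
bbba = power(b, 3)  # bbba(x,y) matches b(x,t1), b(t1,t2), b(t2,y)
bbc = power(b, 2)   # bbc(x,y) matches b(x,t), b(t,y)

lhs = compose(bbba, bbba)  # (x,y) with bbba(x,x1), bbba(x1,y)
rhs = power(bbc, 3)        # (x,y) with bbc(x,y1), bbc(y1,y2), bbc(y2,y)
two_atoms_line_up = lhs <= rhs

# A single bbba atom spans 3 edges, while chains of bbc span an even number,
# so on this instance no bbba tuple coincides with any chain of bbc atoms.
one_atom_lines_up = any(bbba & power(bbc, k) for k in range(1, 5))
print(two_atoms_line_up, one_atom_lines_up)
```

This only tests one instance; it illustrates why two bbba atoms are needed before the endpoints on both sides coincide, and is not a proof of the composition.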

Page 14: Composing Mappings  among Data Sources

Infinite composition

Graph G

Schemas: A: rbba(x,y), bba(x,y); B: r(x,y), b(x,y); C: rbc(x,y), bbc(x,y)

MAB

rbba(x,y) ⊆ r(x,t1), b(t1,t2), b(t2,y)

bba(x,y) ⊆ b(x,t), b(t,y)

MBC

r(x,t), b(t,y) ⊆ rbc(x,y)

b(x,t), b(t,y) ⊆ bbc(x,y)

Page 15: Composing Mappings  among Data Sources

Infinite Composition (2)

Q(x) :- rbba(x,x1)

Via MAB: r(x,t1), b(t1,t2), b(t2,x1)

Via MBC: rbc(x,y1), bbc(y1,y2)

rbba(x,x1) ⊆ rbc(x,y1), bbc(y1,y2)

[Figure: schemas A: rbba(x,y), bba(x,y); B: r(x,y), b(x,y); C: rbc(x,y), bbc(x,y), with mappings MAB, MBC, and MAC]

Page 16: Composing Mappings  among Data Sources

Infinite Composition (3)

bba(x,y) ⊆ bbc(x,y)

rbba(x,x1) ⊆ rbc(x,y1), bbc(y1,y2)

rbba(x,x1), bba(x1,x2) ⊆ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)

X

[Figure: two paths from x, labeled 2n and 2n+1]

Page 17: Composing Mappings  among Data Sources

Main Result

Composition computable for interesting query languages

CQk: queries with localized variable interactions

Includes most queries in practice, e.g. cycle_n(x) ∈ CQ3

cycle_n(x) :- b(x,y), path_{n-1}(y,x)

path_{n-1}(x,y) :- b(x,z), path_{n-2}(z,y)

…

path_1(x,y) :- b(x,y)

Composition w.r.t. CQk is computable and is either a finite number of GLAV formulas, or a finite encoding of infinite GLAV formulas
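The cycle and path rules above can be evaluated bottom-up, one rule at a time. A minimal sketch, assuming the edge relation b is given as a set of pairs and n ≥ 2; the function names `path` and `cycle` and the sample edges are illustrative only.

```python
def path(b, k):
    """path_k(x, y): pairs connected by exactly k b-edges (k >= 1)."""
    rel = set(b)  # path_1(x,y) :- b(x,y)
    for _ in range(k - 1):
        # path_j(x,y) :- b(x,z), path_{j-1}(z,y)
        rel = {(x, y) for (x, z) in b for (z2, y) in rel if z == z2}
    return rel

def cycle(b, n):
    """cycle_n(x) :- b(x,y), path_{n-1}(y,x), for n >= 2."""
    p = path(b, n - 1)
    return {x for (x, y) in b if (y, x) in p}

# Hypothetical edge relation: a triangle 0 -> 1 -> 2 -> 0 plus a dangling edge.
edges = {(0, 1), (1, 2), (2, 0), (2, 3)}
on_triangle = cycle(edges, 3)
on_two_cycle = cycle(edges, 2)
print(on_triangle, on_two_cycle)
```

Each rule joins b with the previously computed relation, so only the variables of a single rule are live at any step, which matches the slide's description of localized variable interactions.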

Page 18: Composing Mappings  among Data Sources

Composition Algorithm

Minimal formulas: formulas that have to be present in the composition. Larger minimal formulas are extensions of smaller ones

Residues of minimal formulas: signatures that capture information on extensions. Isomorphic residues ⇒ isomorphic extensions

Query Rewrite Graphs: encoding of all minimal formulas in the composition. Cycles can be used to encode an infinite number of formulas

Page 19: Composing Mappings  among Data Sources

Minimal Mapping Formulas

Formulas that cannot be constructed from smaller formulas

Theorem: Sufficient to compute all minimal formulas

rbba(x,x1), bba(x1,x2) ⊆ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)

Internally existential variable: not visible on right side

Join variable

[Figure: variables x, x1, x2 on the left side; x, y1, y2, y3 and u1, u2, u3 on the right side]

Page 20: Composing Mappings  among Data Sources

Incremental Construction

Lemma: If QA ⊆ QC is a minimal formula, then there is a minimal formula Q’A ⊆ Q’C such that Q’A has one atom less than QA

rbba(x,x1), bba(x1,x2) ⊆ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)

rbba(x,x1) ⊆ rbc(x,y1), bbc(y1,y2)

[Figure: QA ⊆ QC with smaller queries Q’A1, …, Q’Ai, …, Q’Am and formula Q’A1 ⊆ Q’C1; labels: Complete formulas, X]

Try all one atom extensions

Incremental algorithm…

Page 21: Composing Mappings  among Data Sources

Residues in Formulas

Residues capture all extension information

Null residues ⇒ No extensions

rbba(x,x1) ⊆ rbc(x,y1), bbc(y1,y2)

Residue: b(u2,y2), {u2}, {y2}, {x1→u2}

Potential join variable

Internally existential variable

[Figure: variables x, x1, x2 on the left side; x, y1, y2, y3 and u1, u2, u3 on the right side]

Page 22: Composing Mappings  among Data Sources

Isomorphic Residues

Isomorphic residues ⇒ Isomorphic extensions

rbba(x,x1) ⊆ rbc(x,y1), bbc(y1,y2)

rbba(x,x1), bba(x1,x2) ⊆ rbc(x,y1), bbc(y1,y2), bbc(y2,y3)

[Figure: the two formulas drawn over variables x, x1, x2 and x, y1, y2, y3, u1, u2, u3, with their residues marked isomorphic]

Page 23: Composing Mappings  among Data Sources

Query Rewrite Graphs

Paths from roots encode minimal mapping formulas

Cycles encode infinite formulas

Query Nodes

Q1: bba(x,y)

Q2: rbba(x,x1)

Q3: bba(x1,x2)

Rewrite Nodes

R1: bbc(x,y)

R2: rbc(x,y1), bbc(y1,y2)

R3: bbc(y2,y1)

Theorem: QRG construction on termination encodes the composition
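How a cycle can stand for infinitely many formulas is easiest to see by enumerating root paths up to a length bound. The sketch below is an illustration under stated assumptions, not the construction from the talk: the node names and atoms are taken from the slide, but the edges (including the self-loop on Q3) are hypothetical because the transcript does not preserve them, and variable renaming along a path is omitted.

```python
# Hypothetical successor edges between the query nodes named on the slide.
# The self-loop on Q3 is an assumption standing in for a cycle in the QRG.
EDGES = {"Q1": [], "Q2": ["Q3"], "Q3": ["Q3"]}
ATOMS = {"Q1": "bba", "Q2": "rbba", "Q3": "bba"}
ROOTS = ["Q1", "Q2"]

def paths_up_to(max_len):
    """Enumerate all root paths with at most `max_len` nodes, depth-first."""
    out = []
    stack = [[root] for root in reversed(ROOTS)]
    while stack:
        path = stack.pop()
        out.append(path)
        if len(path) < max_len:
            for nxt in reversed(EDGES[path[-1]]):
                stack.append(path + [nxt])
    return out

def left_side(path):
    """Predicate names along a path (variable renaming deliberately omitted)."""
    return [ATOMS[node] for node in path]

for p in paths_up_to(3):
    print(" -> ".join(p), left_side(p))
```

Raising the bound yields one more path per step through the loop, so a finite graph with a cycle describes an unbounded family of left-hand sides.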

Page 24: Composing Mappings  among Data Sources

Other Results

Algorithm to exploit infinite formulas: a cyclic QRG can be represented by a pair of recursive datalog programs. Extension of earlier results in answering queries using infinite views [Levy, Rajaraman & Ullman, PODS’96]

Complexity Results Upper-bound: composition verification is in Lower-bound: composition verification w.r.t. finite sized query languages is -hard

p2

p2

Page 25: Composing Mappings  among Data Sources

Related Work

GLAV: [Millstein, Friedman & Halevy, AAAI’99], [Lenzerini, PODS’02], [Fagin, Kolaitis & Miller, ICDT’03]. Generalization of LaV and GaV. Leads to infinite composition

Reasoning w.r.t. Query Languages: view containment [Li, Ullman & Bawa, ICDT’01]. Makes the problem hard

Page 26: Composing Mappings  among Data Sources

Summary

Mapping composition

Can be infinite for simple GLAV mappings

Can be constructed completely for interesting query languages

QRG encodes valid formulas in composition

QRG can also encode infinite formulas

Can be exploited for query answering even when infinite

[Figure: schemas A, B, and C with mappings MAB, MBC, and MAC; query Q]

Page 27: Composing Mappings  among Data Sources

Current and Future Work

Composition in a PDMS: choosing paths to pre-compute; manipulating infinite compositions

Semi-automatic construction of mappings: learning from a corpus of related schemas; exploiting past mapping experience [Halevy, Madhavan & Bernstein, DeBull’03 to appear], [Madhavan, Bernstein, Chen, Halevy & Shenoy, IIW@IJCAI ’03]

More information: http://www.cs.washington.edu/homes/jayant