CS 595 - Hot topics in database systems: Data Provenance...

89
CS 595 - Hot topics in database systems: Data Provenance I. Database Provenance I.1 Provenance Models and Systems Boris Glavic September 22, 2012

Transcript of CS 595 - Hot topics in database systems: Data Provenance...

Page 1: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

CS 595 - Hot topics in database systems:Data ProvenanceI. Database Provenance

I.1 Provenance Models and Systems

Boris Glavic

September 22, 2012

Page 2: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Outline

1 Where-Provenance and DBNotesIntroductionWhere-Provenance ModelInsensitive Where-ProvenanceInsensitive Where-Provenance for Queries With UnionDBNotesRecap

Page 3: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Where-Provenance

Motivation

• Annotation Propagation• Given database, annotations on attribute values• How to annotate query results?• Alterative Goal: study Where values (data) are copied from

Rationale

• From which attribute values in the input is a target attributevalue t.A in the result of q copied from.

• Annotated DBs: add annotations based on annotations inWhere-provenance

Provenance Representation

• Set of input attribute values (locations)

Slide 1 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 4: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Annotated Databases

Perquisites

• Relational Model

• Cell (R, t, a): Relation R at tuple t and attribute a

Annotated Database

• Each cell (R, t, a) is associated with a set of annotationsA(R, t, a) (Strings)

• For query results: A(Q(I ), t, a)

Example

Slide 2 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 5: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Excursion - Query Containment

Query Equivalence

• Recall: q ≡ q′ ⇔ ∀I : Q(I ) = Q ′(I )

• Queries have same results over same input

Query Containment

• q ⊆ q′: Query q contained in q′

• q ⊆ q′ ⇔ ∀I : Q(I ) ⊆ Q(I ′)

• ⇒The result of q is contained in the result of q′

• Relationship to equivalence: q ⊆ q′ ∧ q′ ⊆ q ⇔ q ≡ q′

Slide 3 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 6: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Annotation Containment

Annotation Containment

• q ⊆A q′: Query q annotation contained in q′ iff

1 q ⊆ q′: q contained in q′

2 ∀I , t,A : A(Q(I ), t,A) ⊆ A(Q ′(I ), t,A)

• ⇒q only has tuples from q′ with the same or some of theannotations

Annotation Equivalence

• q =A q′ : q ⊆A q′ ∧ q′ ⊆A q

• Direct definition ∀I , t, a:

1 Q(I ) = Q ′(I )2 A(Q(I ), t, a) = A(Q ′(I ), t, a)

Slide 4 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 7: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Annotation Example

CREATE VIEW EnzymeProduce AS

SELECT Enzyme , Id AS Gene

FROM Gene G, Enzyme E

WHERE G.Id = E.ProducedBy

PROPAGATE G.Id TO Gene , E.Enzyme TO Enzyme

EnzymeProduceEnzyme Gene

EC 1.1.1.1{a1} 4q11-q13{a4}

EC 1.97.1.6{a3} 4q11-q13{a4}

Slide 5 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 8: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Relationship Annotation Propagation andWhere-Provenance

• Where-provenance one possible way to define annotationpropagation

• ⇒Output attribute value carries annotations from all inputvalues in its provenance

• ⇒Copying values ⇒ copying annotations

• What about functions or aggregation?• E.g., sum(a)• E.g., f (a1, a2)

• Not always what user wants• Allow user to specify how to propagate?

Slide 6 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 9: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Relationship Annotation Propagation andWhere-Provenance

• Where-provenance one possible way to define annotationpropagation

• ⇒Output attribute value carries annotations from all inputvalues in its provenance

• ⇒Copying values ⇒ copying annotations

• What about functions or aggregation?• E.g., sum(a)• E.g., f (a1, a2)

• Not always what user wants• Allow user to specify how to propagate?

Slide 6 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 10: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Relationship Annotation Propagation andWhere-Provenance

• Where-provenance one possible way to define annotationpropagation

• ⇒Output attribute value carries annotations from all inputvalues in its provenance

• ⇒Copying values ⇒ copying annotations

• What about functions or aggregation?• E.g., sum(a)• E.g., f (a1, a2)

• Not always what user wants• Allow user to specify how to propagate?

Slide 6 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 11: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Introduction

Relationship Annotation Propagation andWhere-Provenance

• Where-provenance one possible way to define annotationpropagation

• ⇒Output attribute value carries annotations from all inputvalues in its provenance

• ⇒Copying values ⇒ copying annotations

• What about functions or aggregation?• E.g., sum(a)• E.g., f (a1, a2)

• Not always what user wants• Allow user to specify how to propagate?

Slide 6 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 12: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Outline

1 Where-Provenance and DBNotesIntroductionWhere-Provenance ModelInsensitive Where-ProvenanceInsensitive Where-Provenance for Queries With UnionDBNotesRecap

Page 13: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Where-Provenance

Properties

• Granularity: Attribute values

• Representation: Set of attribute locations: (R, t, a)

• Definition: Syntactic, recursive definition• Hard to come up with declarative definition!

• Sensitive to query rewrite

• ⇒Insensitive declarative version

Slide 7 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 14: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Definition (Where-Provenance)

1 Where(R, t, a) = A(R, t, a)

2 Where(σC (q), t, a) = Where(q, t, a)

3 Where(πA(q), t, b) =⋃

u.A=t Where(q, u, a)• Assuming that a is projected on b: (a→ b) ∈ A

4 Where(q1 ><C q2, t, a) =Where(q1, t.Q1, a) ∪Where(q2, t.Q2, a)

5 Where(q1 ∪ q2, t, a) = Where(q1, t, a) ∪Where(q2, t, a)

Slide 8 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 15: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Definition (Where-Provenance)

1 Where(R, t, a) = A(R, t, a)

2 Where(σC (q), t, a) = Where(q, t, a)

3 Where(πA(q), t, b) =⋃

u.A=t Where(q, u, a)• Assuming that a is projected on b: (a→ b) ∈ A

4 Where(q1 ><C q2, t, a) =Where(q1, t.Q1, a) ∪Where(q2, t.Q2, a)

5 Where(q1 ∪ q2, t, a) = Where(q1, t, a) ∪Where(q2, t, a)

Slide 8 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 16: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Definition (Where-Provenance)

1 Where(R, t, a) = A(R, t, a)

2 Where(σC (q), t, a) = Where(q, t, a)

3 Where(πA(q), t, b) =⋃

u.A=t Where(q, u, a)• Assuming that a is projected on b: (a→ b) ∈ A

4 Where(q1 ><C q2, t, a) =Where(q1, t.Q1, a) ∪Where(q2, t.Q2, a)

5 Where(q1 ∪ q2, t, a) = Where(q1, t, a) ∪Where(q2, t, a)

Slide 8 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 17: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Definition (Where-Provenance)

1 Where(R, t, a) = A(R, t, a)

2 Where(σC (q), t, a) = Where(q, t, a)

3 Where(πA(q), t, b) =⋃

u.A=t Where(q, u, a)• Assuming that a is projected on b: (a→ b) ∈ A

4 Where(q1 ><C q2, t, a) =Where(q1, t.Q1, a) ∪Where(q2, t.Q2, a)

5 Where(q1 ∪ q2, t, a) = Where(q1, t, a) ∪Where(q2, t, a)

Slide 8 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 18: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Definition (Where-Provenance)

1 Where(R, t, a) = A(R, t, a)

2 Where(σC (q), t, a) = Where(q, t, a)

3 Where(πA(q), t, b) =⋃

u.A=t Where(q, u, a)• Assuming that a is projected on b: (a→ b) ∈ A

4 Where(q1 ><C q2, t, a) =Where(q1, t.Q1, a) ∪Where(q2, t.Q2, a)

5 Where(q1 ∪ q2, t, a) = Where(q1, t, a) ∪Where(q2, t, a)

Slide 8 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 19: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Comments

• Where(q, t, a) = ∅ if q has no result attribute a

• Projection only on attribute + renaming

• Tuple t here refers to the values not the identity!• E.g., (Peter ,Chicago) and not t2

Slide 9 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 20: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Where-Provenance Example

Example

resultname street city

t1 Bob 10 W 31st Chicagot2 Alice 75 Cermak Chicagot3 Gertrud 15 Ellis Berlin

SELECT P.name , A.street , A.city

FROM person P, address A

WHERE P.addr = A.street

personname addr

p1 Bob 10 W 31stp2 Alice 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicagoa2 75 Cermak Chicagoa3 15 Ellis Berlin

Slide 10 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 21: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Where-Provenance Example

Example

resultname street city

t1 Bob 10 W 31st Chicagot2 Alice 75 Cermak Chicagot3 Gertrud 15 Ellis Berlin

q = πname,A.street,A.city (q1)

q1 = person><addr=street address

personname addr

p1 Bob 10 W 31stp2 Alice 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicagoa2 75 Cermak Chicagoa3 15 Ellis Berlin

Slide 10 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 22: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Where-Provenance Example

Example

Where(q, t2, name) = Where(q1, i2, name)

Where(q1, i2, name) = Where(person, p2, name)

∪Where(address, a2, name)

Where(person, p2, name) = {a2, a3}Where(address, a2, name) = {}

personname addr

p1 Bob 10 W 31st{a1}

p2 Alice{a2,a3} 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicago{a4}

a2 75 Cermak{a5} Chicagoa3 15 Ellis Berlin

a1 =”since 2005” a2 =”artist” a3 =”single” a4 =”south side”a5 =”construction”

Slide 10 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 23: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Where-Provenance Example

Example

resultname street city

t1 Bob 10 W 31st Chicago{a4}

t2 Alice{a2,a3} 75 Cermak{a5} Chicagot3 Gertrud 15 Ellis Berlin

personname addr

p1 Bob 10 W 31st{a1}

p2 Alice{a2,a3} 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicago{a4}

a2 75 Cermak{a5} Chicagoa3 15 Ellis Berlin

a1 =”since 2005” a2 =”artist” a3 =”single” a4 =”south side”a5 =”construction”

Slide 10 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 24: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Sensitivity to Query Rewrite

• Consider equivalent variation of example query

Example

SELECT P.name , A.street , A.city

FROM person P, address A

WHERE P.addr = A.street

resultname street city

t1 Bob 10 W 31st Chicago{a4}

t2 Alice{a2,a3} 75 Cermak{a5} Chicagot3 Gertrud 15 Ellis Berlin

Slide 11 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 25: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Where-Provenance Model

Sensitivity to Query Rewrite

• Consider equivalent variation of example query

Example

SELECT P.name , P.addr AS street , A.city

FROM person P, address A

WHERE P.addr = A.street

resultname street city

t1 Bob 10 W 31st{a1} Chicago{a4}

t2 Alice{a2,a3} 75 Cermak{a5} Chicagot3 Gertrud 15 Ellis Berlin

Slide 11 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 26: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Outline

1 Where-Provenance and DBNotesIntroductionWhere-Provenance ModelInsensitive Where-ProvenanceInsensitive Where-Provenance for Queries With UnionDBNotesRecap

Page 27: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Insensitive Where-Provenance

Idea

• For query q

• Consider all queries equivalent to q

• Annotate output with the union of the annotations producedfor each of these queries

• ⇒Gather all annotations that could have been copied

Slide 12 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 28: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Insensitive Where-Provenance

Definition (Insensitive Where-Provenance)

IWhere(q, t, a) =⋃

q′≡q Where(q′, t, a)

• Consider definition of union: Union annotations

• ⇒IWhere(q, t, a) = Where(⋃

q′≡q q′, t, a)

Problem

• There are infinitely many equivalent queries

• How to practically compute this?

Slide 13 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 29: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Insensitive Where-Provenance

Definition (Insensitive Where-Provenance)

IWhere(q, t, a) =⋃

q′≡q Where(q′, t, a)

• Consider definition of union: Union annotations

• ⇒IWhere(q, t, a) = Where(⋃

q′≡q q′, t, a)

Problem

• There are infinitely many equivalent queries

• How to practically compute this?

Slide 13 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 30: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Insensitive Where-Provenance

Definition (Insensitive Where-Provenance)

IWhere(q, t, a) =⋃

q′≡q Where(q′, t, a)

• Consider definition of union: Union annotations

• ⇒IWhere(q, t, a) = Where(⋃

q′≡q q′, t, a)

Problem

• There are infinitely many equivalent queries

• How to practically compute this?

Slide 13 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 31: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Equivalent Queries

• Even though the number of equivalent queries is infinite

• The number of queries with different annotation propagationbehaviour is finite• The number of cells in input/output is finite• For each output cells - either input cells annotations are added

or not• ⇒2IC∗OC potential annotation behaviours

• IC number of input cells• OC number of output cells

• ⇒large, but finite• ⇒unlikely that equivalent queries produce all these

combinations!

Slide 14 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 32: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Equivalent Queries

• Even though the number of equivalent queries is infinite

• The number of queries with different annotation propagationbehaviour is finite• The number of cells in input/output is finite• For each output cells - either input cells annotations are added

or not• ⇒2IC∗OC potential annotation behaviours

• IC number of input cells• OC number of output cells

• ⇒large, but finite• ⇒unlikely that equivalent queries produce all these

combinations!

Slide 14 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 33: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Equivalent Queries

• Even though the number of equivalent queries is infinite

• The number of queries with different annotation propagationbehaviour is finite• The number of cells in input/output is finite• For each output cells - either input cells annotations are added

or not• ⇒2IC∗OC potential annotation behaviours

• IC number of input cells• OC number of output cells

• ⇒large, but finite

• ⇒unlikely that equivalent queries produce all thesecombinations!

Slide 14 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 34: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Equivalent Queries

• Even though the number of equivalent queries is infinite

• The number of queries with different annotation propagationbehaviour is finite• The number of cells in input/output is finite• For each output cells - either input cells annotations are added

or not• ⇒2IC∗OC potential annotation behaviours

• IC number of input cells• OC number of output cells

• ⇒large, but finite• ⇒unlikely that equivalent queries produce all these

combinations!

Slide 14 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 35: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Query Basis

• E(q) = {q′ | q′ ≡ q}• Query Basis B(q) for q:

• ⋃q′∈B(q) q

′ =A⋃

q′∈E(q) q′

• A Query basis has same annotation behaviour as all equivalentqueries!

• How to find a query basis?

Slide 15 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 36: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Query Basis

• E(q) = {q′ | q′ ≡ q}• Query Basis B(q) for q:

• ⋃q′∈B(q) q

′ =A⋃

q′∈E(q) q′

• A Query basis has same annotation behaviour as all equivalentqueries!• How to find a query basis?

Slide 15 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 37: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Query Basis For SPJ Queries

Types of equivalent SPJ queries

1 Through attributes that are equated• Can project on either of them with different propagation

behaviour• See gene example• E.g., πa(σa=b(R)) ≡ πb(σb=a(R))

2 Through redundant joins• πA(R) ≡ πA(R ><A=A′ πA→A′(R))• Each tuple finds at least one join partner (itself)• Additional join partners can carry additional annotations

Slide 16 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 38: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Query Basis For SPJ Queries

Types of equivalent SPJ queries

1 Through attributes that are equated• Can project on either of them with different propagation

behaviour• See gene example• E.g., πa(σa=b(R)) ≡ πb(σb=a(R))

2 Through redundant joins• πA(R) ≡ πA(R ><A=A′ πA→A′(R))• Each tuple finds at least one join partner (itself)• Additional join partners can carry additional annotations

Slide 16 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 39: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Query Basis For SPJ Queries

Types of equivalent SPJ queries

1 Through attributes that are equated• Can project on either of them with different propagation

behaviour• See gene example• E.g., πa(σa=b(R)) ≡ πb(σb=a(R))

2 Through redundant joins• πA(R) ≡ πA(R ><A=A′ πA→A′(R))• Each tuple finds at least one join partner (itself)• Additional join partners can carry additional annotations

Slide 16 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 40: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Example Redundant Joins

Example

SELECT P.name , A.street , A.city

FROM person P, address A

WHERE P.addr = A.street

SELECT P.name , A.street , R.city

FROM person P, address A, address R

WHERE P.addr = A.street AND A.city = R.city

Slide 17 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 41: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Algorithm - Generate Query Basis

1 Initialize B(q) = ∅2 For each a = a′ in q with a in result schema

• Add variant of q where projection on a is replaced withprojection on a′

3 For each query in B(q) and attribute a from relation R inprojection• Add variant of q where an addition join ><a=a′ πa→a′(R) is

added and replace final projection of a with a′

Slide 18 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 42: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Example - Query Basis

Example

Step 2

q = πname,street,city (person><addr=street address)

q′ = πname,addr ,city (person><addr=street address)

Slide 19 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 43: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Example - Query Basis

Example

Step 3

q′′ =πname′,street,city (person><addr=street address

><name=name′ πname→name′(person))

q′′′ =πname,street,city ′(person><addr=street address

><city=city ′ πcity→city ′(address))

q′′′′ =πname,addr ′,city (person><addr=street address

><addr=addr ′ πaddr→addr ′(person))

q′′′′′ =πname,street′,city (person><addr=street address

><street=street′ πstreet→street′(address))

Slide 19 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 44: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Example - Query Basis

Example

resultname street city

t1 Bob 10 W 31st{a1} Chicago{a4}

t2 Alice{a2,a3} 75 Cermak{a5} Chicago{a4}

t3 Gertrud 15 Ellis Berlin

personname addr

p1 Bob 10 W 31st{a1}

p2 Alice{a2,a3} 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicago{a4}

a2 75 Cermak{a5} Chicagoa3 15 Ellis Berlin

Slide 19 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 45: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Discussion

Effect of Insensitive Where-Provenance

• Annotations from equated attributes are propagated too• Reasonable, can be seen as type of copying

• Annotations in attribute a from other tuples in the samerelation are propagated if the tuple has the same value inattribute a• E.g., city attribute in running example• debatable whether correct semantics for every use case!

Slide 20 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 46: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance

Discussion

Effect of Insensitive Where-Provenance

• Annotations from equated attributes are propagated too• Reasonable, can be seen as type of copying

• Annotations in attribute a from other tuples in the samerelation are propagated if the tuple has the same value inattribute a• E.g., city attribute in running example• debatable whether correct semantics for every use case!

Slide 20 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 47: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Outline

1 Where-Provenance and DBNotesIntroductionWhere-Provenance ModelInsensitive Where-ProvenanceInsensitive Where-Provenance for Queries With UnionDBNotesRecap

Page 48: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

From SPJ to Union queries

Observations

• Adding Union adds new types of equivalent queries

• ⇒every query that is contained in a query can be union-ed tothe query without changing the result

• ⇒this is independent of whether original query uses union!

Slide 21 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 49: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Query Containment with Union

• Relation R(a, b) and S(c)

• q = πa(R)

• q′ = (πa(R) ∪ πc(R ><a=c S))

• q ≡ q′

Slide 22 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 50: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Query Containment with Union

• Relation R(a, b) and S(c)

• q = πa(R)

• q′ = (πa(R) ∪ πc(R ><a=c S))

• q ≡ q′

Slide 22 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 51: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Query Containment with Union

• Relation R(a, b) and S(c)

• q = πa(R)

• q′ = (πa(R) ∪ πc(R ><a=c S))

• q ≡ q′

Slide 22 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 52: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Generating Query Basis - Union

Step 4

• For each attribute a in result of q and relation R withattribute b in database• Add query q ∪ q′ to B(q) with• q′ is q with additional join with R on a = b

Slide 23 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 53: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Example Query Basis

Example

Step 2

q = πname,street,city (person><addr=street address)

q′ = πname,addr ,city (person><addr=street address)

Slide 24 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 54: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Example Query Basis

Example

Step 3

q3 =πname′,street,city (person><addr=street address

><name=name′ πname→name′(person))

q′3 =πname,street,city ′(person><addr=street address

><city=city ′ πcity→city ′(address))

q′′3 =πname,addr ′,city (person><addr=street address

><addr=addr ′ πaddr→addr ′(person))

q′′′3 =πname,street′,city (person><addr=street address

><street=street′ πstreet→street′(address))

Slide 24 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 55: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Example Query Basis

Example

Step 4Joins on person.name

q4 =q ∪ πaddr ′,street,city (person><addr=street address

><name=addr ′ πaddr→addr ′(person))

q′4 =q ∪ πstreet′,street,city (person><addr=street address

><name=street′ πstreet→street′(address))

q′′4 =q ∪ πcity ′,street,city (person><addr=street address

><name=city ′ πcity→city ′(address))

Slide 24 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 56: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Example Query Basis

Example

Step 4Joins on address.street

q′′′4 =q ∪ πname,name′,city (person><addr=street address

><street=name′ πname→name′(person))

. . .

Joins on address.city

. . .

Slide 24 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 57: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Example Query Basis

Example

resultname street city

t1 Bob 10 W 31st{a1} Chicago{a4}

t2 Alice{a2,a3} 75 Cermak{a5} Chicago{a4}

t3 Gertrud 15 Ellis Berlin

personname addr

p1 Bob 10 W 31st{a1}

p2 Alice{a2,a3} 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicago{a4}

a2 75 Cermak{a5} Chicagoa3 15 Ellis Berlin

Slide 24 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 58: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Discussion

• An output cell (q, t, a) will carry all annotations from cells inthe database that have the same attribute value

• Rationale: annotations are bound to values not cells

• Still strange in a lot of cases.• E.g., numeric attributes age in person and street number in

company• Query SELECT * FROM person

• ⇒Annotations on company street numbers should carry overto the age of a person?

Slide 25 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 59: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Discussion

• An output cell (q, t, a) will carry all annotations from cells inthe database that have the same attribute value

• Rationale: annotations are bound to values not cells

• Still strange in a lot of cases.• E.g., numeric attributes age in person and street number in

company• Query SELECT * FROM person

• ⇒Annotations on company street numbers should carry overto the age of a person?

Slide 25 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 60: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Insensitive Where-Provenance for Queries With Union

Discussion

• An output cell (q, t, a) will carry all annotations from cells inthe database that have the same attribute value

• Rationale: annotations are bound to values not cells

• Still strange in a lot of cases.• E.g., numeric attributes age in person and street number in

company• Query SELECT * FROM person

• ⇒Annotations on company street numbers should carry overto the age of a person?

Slide 25 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 61: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Outline

1 Where-Provenance and DBNotesIntroductionWhere-Provenance ModelInsensitive Where-ProvenanceInsensitive Where-Provenance for Queries With UnionDBNotesRecap

Page 62: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

DBNotes

Approach

• Add annotations to DBMS

• User queries DB with pSQL:• USPJ SQL queries• plus language constructs to define how annotations are

propagated

• Implement as middleware layer over standard DBMS

• Store annotations + data in standard DBMS

• Middleware translates pSQL into SQL queries DB

• DB results are transformed into annotated relations

Slide 26 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 63: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

pSQL

pSQL query block

SELECT DISTINCT selectlistFROM fromlistWHERE wherelistPROPAGATE (DEFAULT

| DEFAULT -ALL

| R1.a1 TO b1, . . ., Rn.an TO bn)

• pSQL query is union of query blocks

• selectlist: Select attributes and rename (R1.a1 → b1)

• fromlist: List of relations with potential aliases

• wherelist: conjunction of attribute-attribute orattribute-constant equalities

Slide 27 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 64: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

pSQL - the PROPAGATE clause

• Defines how annotations are copied from input to output

• Three different options:• Custom• DEFAULT

• DEFAULT-ALL

Slide 28 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 65: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

pSQL - the PROPAGATE clause

• Defines how annotations are copied from input to output

• Three different options:• Custom• DEFAULT

• DEFAULT-ALL

Custom

• User defines annotation propagation at the query level

• R1.a1TOb1 = add annotations from input attribute R1.a1 toresult attribute b1.

Slide 28 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 66: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

pSQL - the PROPAGATE clause

• Defines how annotations are copied from input to output

• Three different options:• Custom• DEFAULT

• DEFAULT-ALL

DEFAULT

• Propagate annotations from all cells in the Where-provenance

Slide 28 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 67: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

pSQL - the PROPAGATE clause

• Defines how annotations are copied from input to output

• Three different options:• Custom• DEFAULT

• DEFAULT-ALL

DEFAULT-ALL

• Propagate annotations from all cells in the insensitiveWhere-provenance

Slide 28 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 68: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

pSQL example query

Example

SELECT P.name , A.street , A.city

FROM person P, address A

WHERE P.addr = A.street

PROPAGATE P.name TO name ,

P.addr TO street ,

P.addr TO city

personname addr

p1 Bob 10 W 31st{a1}

p2 Alice{a2,a3} 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicago{a4}

a2 75 Cermak{a5} Chicagoa3 15 Ellis Berlin

Slide 29 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 69: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

pSQL example query

Example

resultname street city

t1 Bob 10 W 31st{a1} Chicago{a1}

t2 Alice{a2,a3} 75 Cermak Chicagot3 Gertrud 15 Ellis Berlin

personname addr

p1 Bob 10 W 31st{a1}

p2 Alice{a2,a3} 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicago{a4}

a2 75 Cermak{a5} Chicagoa3 15 Ellis Berlin

Slide 29 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 70: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Storage Scheme

• Schema• For each attribute A in relation R• Add attribute Aa that stores annotations on A

• Instance• Each tuple can store one annotation on each attribute• Duplicate tuples to fit in more annotations

Example

• R(a, b) will be R(a, aa, b, ba)

Ra b

r1 1a1,a2 2a3

r2 2a4 3

R’a aa b ba

r1 1 a1 2 a3

r1 1 a2 2 -r1 2 a4 3 -

Slide 30 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 71: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

DBNotes - Approach

DBNotes

pSQL query

SQL query Result Relation

Translator

DBMS

Post Processor

Annotated Result

Slide 31 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 72: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Translator

1 Transform propagate clauses into Custom form

2 Build bins for each output attribute b• Add R.a to bin if R.a TO b in propagate clause

3 Generate intermediate queries Q1 to Qn

• While at least on bin not empty• Take one attribute a from each bin• Generate query that projects each aa to ba• Use NULL TO ba if bin is empty

4 Generate wrapper query• orderlist all attribute of pSQL query

Slide 32 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 73: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Translator

1 Transform propagate clauses into Custom form

2 Build bins for each output attribute b• Add R.a to bin if R.a TO b in propagate clause

3 Generate intermediate queries Q1 to Qn

• While at least on bin not empty• Take one attribute a from each bin• Generate query that projects each aa to ba• Use NULL TO ba if bin is empty

4 Generate wrapper query• orderlist all attribute of pSQL query

Slide 32 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 74: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Translator

1 Transform propagate clauses into Custom form

2 Build bins for each output attribute b• Add R.a to bin if R.a TO b in propagate clause

3 Generate intermediate queries Q1 to Qn

• While at least on bin not empty• Take one attribute a from each bin• Generate query that projects each aa to ba• Use NULL TO ba if bin is empty

4 Generate wrapper query• orderlist all attribute of pSQL query

SELECT DISTINCT *

FROM (Q1 UNION . . . UNION Qn)

ORDER BY orderlist

Slide 32 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 75: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Post-Processor

• Gather all annotations for each result tuple

• Works like aggregation

1 Initialize each attribute annotation sets to empty set2 For each tuple add annotations to sets3 If tuple has different attribute value

• Output annotated result tuple• Start over at (1)

Slide 33 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 76: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Example Processing

SELECT P.name , A.street , A.city

FROM person P, address A

WHERE P.addr = A.street

PROPAGATE P.name TO name ,

P.addr TO street ,

P.addr TO city

personname addr

p1 Bob 10 W 31st{a1}

p2 Alice{a2,a3} 75 Cermakp2 Gertrud 15 Ellis

addressstreet city

a1 10 W 31st Chicago{a4}

a2 75 Cermak{a5} Chicagoa3 15 Ellis Berlin

Slide 34 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 77: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Example Processing - Storage

person′

name namea addr addrap1 Bob 10 W 31st a1

p2 Alice a2 75 Cermakp2 Alice a3 75 Cermakp2 Gertrud 15 Ellis

address′

street streeta city cityaa1 10 W 31st Chicago a4

a2 75 Cermak a5 Chicagoa3 15 Ellis Berlin

Slide 34 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 78: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Example Processing - Intermediate Queries

• Bin name: {P.name}• Bin street: {P.addr}• Bin city: {P.addr}

SELECT P.name , P.namea,

A.street , P.addra AS streeta,

A.city , P.addra AS citya

FROM person ′ P, address ′ A

WHERE P.addr = A.street

Slide 34 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 79: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Example Processing - Wrapper Query

SELECT DISTINCT *

FROM (SELECT P.name , P.namea,

A.street , P.addra AS streeta,

A.city , P.addra AS citya

FROM person ′ P, address ′ A

WHERE P.addr = A.street) AS i

ORDER BY name , street , city

Slide 34 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 80: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Example Processing - Post-processing

resultname namea street streeta city citya

t1 Bob 10 W 31st a1 Chicago a1

t2 Alice a2 75 Cermak Chicagot2 Alice a3 75 Cermak Chicagot3 Gertrud 15 Ellis Berlin

Slide 34 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 81: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Example Processing - Post-processing

resultname street city

t1 Bob 10 W 31st{a1} Chicago{a1}

t2 Alice{a2,a3} 75 Cermak Chicagot3 Gertrud 15 Ellis Berlin

resultname namea street streeta city citya

t1 Bob 10 W 31st a1 Chicago a1

t2 Alice a2 75 Cermak Chicagot2 Alice a3 75 Cermak Chicagot3 Gertrud 15 Ellis Berlin

Slide 34 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 82: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Discussion

Types of Annotations

• On tuples

• On rectangular regions in relations

• Queries define what to annotate

Slide 35 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 83: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

DBNotes

Discussion

Annotation Propagation

• DBNotes propagation is schema based!

• allow conditions• over data: if a > 5 then annotate with annotations from

attribute b• over annotations: propagate if annotation contains word

“important”

• manipulation of annotations• Extract pattern• Concatenate annotations• . . .

Slide 35 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 84: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Recap

Outline

1 Where-Provenance and DBNotesIntroductionWhere-Provenance ModelInsensitive Where-ProvenanceInsensitive Where-Provenance for Queries With UnionDBNotesRecap

Page 85: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Recap

Recap

Where-Provenance

• Rationale: Models copy behaviour

• Representation: Set of cells• Cell is R, t, a

• Syntactic Definition:• For USPJ queries

• Variants:• Insensitive: Union for of provenance for equivalent queries

Slide 36 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 86: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Recap

Recap

Annotation Propagation

• One semantics is based on Where-provenance

• User defined also possible

Slide 36 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 87: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Recap

Recap

DBNotes

• Middleware implementations

• pSQL language

• Translated into SQL over relational storage schema

• Postprocessing

Slide 36 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 88: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Recap

Provenance Model Comparison

PropertyWhy Lin PI-CS Where

Representation Set of Set of Tuples List of Set of Tuples Set/Bag of List ofTuples

Sets of AttributeValue Positions

Granularity Tuple Tuple Tuple Attribute ValueLanguage Support USPJ ASPJ-Set ASPJ-Set + Nested

subqueriesU-SPJ

Semantics Set Set + Bag∗ Bag SetVariants Wit, Why, IWhy Set/Bag Influence + Copy SPJ + Insensitive +

Insensitive UnionDefinition Decl. - Synt. -

Decl./Synt.Decl. + Synt. Decl. + Synt. Synt.

Design Principles Sufficiency - No falsepositives

Sufficiency + Nofalse negatives + nofalse positives

Sufficiency + Nofalse negatives + Nofalse positives

Copying

Systems - WHIPS Perm DBNotesInsensitivity Yes - No - Yes No No No - Yes - Yes

Slide 37 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance

Page 89: CS 595 - Hot topics in database systems: Data Provenance ...cs.iit.edu/~cs595/pdfs/I_4_dbnotes.pdf · Setof inputattribute values (locations) Slide 1 of 38 Boris Glavic CS 595 - Hot

Where-Provenance

Recap

Literature

James Cheney, Laura Chiticariu, and Wang-Chiew Tan.

Provenance in Databases: Why, How, and Where.Foundations and Trends in Databases, 1(4):379–474, 2009.

Laura Chiticariu, Wang-Chiew Tan, and Gaurav Vijayvargiya.

DBNotes: a Post-it System for Relational Databases based on Provenance.In SIGMOD ’05: Proceedings of the 31th SIGMOD International Conference on Management of Data,942–944, 2005.

Deepavali Bhagwat, Laura Chiticariu, Wang-Chiew Tan, and Gaurav Vijayvargiya.

An Annotation Management System for Relational Databases.The VLDB Journal, 14(4):373–396, 2005.

Deepavali Bhagwat, Laura Chiticariu, Wang-Chiew Tan, and Gaurav Vijayvargiya.

An Annotation Management System for Relational Databases.In VLDB ’04: Proceedings of the 30th International Conference on Very Large Data Bases, 900–911, 2004.

Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan.

Why and Where: A Characterization of Data Provenance.In ICDT ’01: Proceedings of the 8th International Conference on Database Theory, 316–330, 2001.

Slide 38 of 38 Boris Glavic CS 595 - Hot topics in database systems: Data Provenance