Lineage Tracing in Data Warehouses Yingwei Cui Stanford University Database Group.

Date posted: 21-Dec-2015


Lineage Tracing in Data Warehouses

Yingwei Cui

Stanford University Database Group

2

Motivation: Data Warehousing

[Figure: Sources 1–3 (Students, Enrollments and Courses) feed a data warehouse containing the view Lucrative Fields: Databases $8800K, Theory $320K, Networks $800K. The Databases $8800K entry prompts a “Wow?!”]

3

[Figure: a lineage tracer between the data warehouse and Sources 1–3 traces Databases $8800K in Lucrative Fields back to the source tuples that derived it: “Oh, I see...”]

Courses: CS145 Databases; CS154 Theory; CS244 Networks; CS245 Databases

Enrollments: CS145 Ted; CS154 Joe; CS244 Bob; CS145 Ann; CS245 Jane; …

Students: Bob MS $1K; Jane Web $5K; Ann BS $1K; Joe BS $1K; Ted Web $5K; …

4

The Data Lineage Problem

Data warehouses integrate data from multiple sources for analysis and mining

Data lineage: given a data item o in the warehouse, which data items in the sources were used to derive o?

Sometimes called “drill-through” in industry

5

Challenges

Warehouse of relational views over relational sources
– What is a good formal definition for lineage?
– How do we trace data lineage for arbitrary views?
– How do we make it efficient?

Warehouse defined by graph of data transformations
– No fixed, well-defined relational operators
– Large transformation sequences and graphs

6

Contributions

Thesis contributions

– Basics of lineage tracing for relational views [TODS’00]

– Lineage tracing system prototype [ICDE’00 demo]

– Performance study and optimizations [ICDE’00, DMDW’00]

– Lineage tracing for general data transformations [VLDB’01]

– View update for deletions using data lineage [TechReport’01]

Other contributions (joint with others)
– Data warehousing performance issues [VLDB’00]

– Data management for wireless networks [Infocom’98, Globecom’97]

7

Outline of Talk

Part 1: Lineage tracing for relational views

Part 2: Lineage tracing for general data transformations

Part 3: View update for deletions using data lineage (time permitting)

8

Part 1: Lineage Tracing for Relational Views

Declarative definition of data lineage

Lineage tracing algorithms

Using auxiliary views for efficient lineage tracing

Experimental results (small sample)

9

Views We Consider

Relational algebra

Arbitrary use of aggregation

Set semantics

Also in thesis
– Set operators
– Bag semantics


10

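Simple Lineage Example

$V = \alpha_{Y,\mathrm{sum}(Z)}(\sigma_{X>Z}(R \bowtie S))$

R(X, Y) = {(3, a), (8, b)}
S(Y, Z) = {(a, 2), (b, 0), (b, 9), (b, 6)}
$R \bowtie S$ = {(3, a, 2), (8, b, 0), (8, b, 9), (8, b, 6)}
$\sigma_{X>Z}(R \bowtie S)$ = {(3, a, 2), (8, b, 0), (8, b, 6)}
V(Y, sum) = {(a, 2), (b, 6)}

Lineage of (b, 6): the intermediate tuples (8, b, 0) and (8, b, 6); in the sources, R* = {(8, b)} and S* = {(b, 0), (b, 6)}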

11

Lineage for Relational Operators

Unary relational operators

Lineage of t according to op is the maximal subset $R^* \subseteq R$ such that

(1) $op(R^*) = \{t\}$

(2) $\forall t^* \in R^*: op(\{t^*\}) \neq \emptyset$

12

Example 1: op = $\sigma_{X>Z}$

R(X, Y, Z) = {(3, a, 2), (8, b, 0), (8, b, 9), (8, b, 6)}

$\sigma_{X>Z}(R)$ = {(3, a, 2), (8, b, 0), (8, b, 6)}

Lineage of t = (8, b, 6): R* = {(8, b, 6)}

13

Example 2: op = $\alpha_{Y,\mathrm{sum}(Z)}$

R(X, Y, Z) = {(3, a, 2), (8, b, 0), (8, b, 6)}

$\alpha_{Y,\mathrm{sum}(Z)}(R)$ = {(a, 2), (b, 6)}

Lineage of t = (b, 6): R* = {(8, b, 0), (8, b, 6)}. The subset {(8, b, 6)} alone also satisfies (1) and (2); it is the maximality requirement that adds (8, b, 0).
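The definition can be checked mechanically on the slides' data. The Python sketch below is a brute-force reading of the unary-operator definition, not the thesis's tracing algorithm: it tries subsets of the input from largest to smallest and returns the first one satisfying conditions (1) and (2). The helper names (`select_x_gt_z`, `group_sum`, `lineage_by_definition`) are ours, and the sketch assumes, as the definition does, that the maximal subset is unique.

```python
from collections import defaultdict
from itertools import combinations


def select_x_gt_z(rel):
    """sigma_{X>Z} over (X, Y, Z) tuples."""
    return {t for t in rel if t[0] > t[2]}


def group_sum(rel):
    """alpha_{Y,sum(Z)} over (X, Y, Z) tuples; returns {(Y, sum)}."""
    sums = defaultdict(int)
    for _, y, z in rel:
        sums[y] += z
    return {(y, s) for y, s in sums.items()}


def lineage_by_definition(op, rel, t):
    """Return R* with (1) op(R*) == {t} and (2) op({r}) nonempty for every r in R*.

    Subsets are tried from largest to smallest, so the first match is the
    maximal one (assuming it is unique). Exponential in |rel|: only meant
    for checking tiny examples.
    """
    items = sorted(rel)
    for size in range(len(items), 0, -1):
        for subset in combinations(items, size):
            candidate = set(subset)
            if op(candidate) == {t} and all(op({r}) for r in candidate):
                return candidate
    return set()


R = {(3, "a", 2), (8, "b", 0), (8, "b", 9), (8, "b", 6)}
print(lineage_by_definition(select_x_gt_z, R, (8, "b", 6)))           # Example 1
print(lineage_by_definition(group_sum, select_x_gt_z(R), ("b", 6)))   # Example 2
```

In Example 1 the candidate {(8, b, 6), (8, b, 9)} satisfies (1) but fails (2), since (8, b, 9) on its own produces nothing; that is why the search does not stop there.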

14

N-ary relational operators (e.g., $\bowtie$)

Lineage of t according to op consists of the maximal subsets $R_i^* \subseteq R_i$, for i = 1..n, such that

(1) $op(R_1^*, \ldots, R_n^*) = \{t\}$

(2) $\forall t_i^* \in R_i^*: op(R_1, \ldots, \{t_i^*\}, \ldots, R_n) \neq \emptyset$

15

Lineage for Relational Views

Lineage of a tuple set is the union of the lineages of its tuples

Lineage for views is defined recursively

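Example: $V = op_1(U)$ with $U = op_2(R_1, R_2)$. Tracing t through $op_1$ gives $U^* \subseteq U$; tracing $U^*$ through $op_2$ gives $R_1^*$ and $R_2^*$.

Lineage of t is $R_1^*, R_2^*$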

16

Lineage Tracing

Convert view into a segmented normal form

Each segment is an ASPJ (aggregate-select-project-join) expression over $E_1 \bowtie \cdots \bowtie E_n$

Generate one tracing query for each segment

Apply tracing queries recursively

– # non-top + 1

Lineage result is unaffected by normalization and segment-level tracing

17

Tracing Query for One Segment

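$V = \alpha_{Y,\mathrm{sum}(Z)}(\sigma_{X>Z}(R \bowtie S))$

R(X, Y) = {(3, a), (8, b)}; S(Y, Z) = {(a, 2), (b, 0), (b, 9), (b, 6)}; V(Y, sum) = {(a, 2), (b, 6)}

Tracing t = (b, 6):

$TQ = \mathrm{Split}_{R,S}(\sigma_{Y=b}(\sigma_{X>Z}(R \bowtie S)))$

$\sigma_{Y=b}(\sigma_{X>Z}(R \bowtie S))$ = {(8, b, 0), (8, b, 6)}, which splits into R* = {(8, b)}, S* = {(b, 0), (b, 6)}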
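A tracing query is an ordinary relational query, so it can be evaluated with plain set operations. The sketch below assumes the slide's relations R(X, Y) = {(3, a), (8, b)} and S(Y, Z) = {(a, 2), (b, 0), (b, 9), (b, 6)}; `joined`, `view` and `tracing_query` are hypothetical helpers. `joined` keeps each join result as a pair of source tuples, which turns Split into simply taking the two halves.

```python
from collections import defaultdict

R = {(3, "a"), (8, "b")}                      # R(X, Y)
S = {("a", 2), ("b", 0), ("b", 9), ("b", 6)}  # S(Y, Z)


def joined(r, s):
    """R join S on Y, kept as (r_tuple, s_tuple) pairs so the result can be split."""
    return [(rt, st) for rt in r for st in s if rt[1] == st[0]]


def view(r, s):
    """V = alpha_{Y,sum(Z)}(sigma_{X>Z}(R join S))."""
    sums = defaultdict(int)
    for (x, y), (_, z) in joined(r, s):
        if x > z:
            sums[y] += z
    return {(y, total) for y, total in sums.items()}


def tracing_query(t, r, s):
    """TQ = Split_{R,S}(sigma_{Y=t.Y}(sigma_{X>Z}(R join S))) for a view tuple t = (Y, sum)."""
    r_star, s_star = set(), set()
    for rt, st in joined(r, s):
        if rt[0] > st[1] and rt[1] == t[0]:   # sigma_{X>Z}, then sigma_{Y=t.Y}
            r_star.add(rt)                    # Split: R component of the joined row
            s_star.add(st)                    # Split: S component of the joined row
    return r_star, s_star


print(tracing_query(("b", 6), R, S))
```

The selection matches only the group-by attribute Y, because the segment's top operator is an aggregation: every surviving joined row in t's group contributes to t.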

18

Recursive Tracing Procedure

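$V = \alpha_{W,\mathrm{avg}(sum)}(\alpha_{Y,\mathrm{sum}(Z)}(\sigma_{X>Z}(R \bowtie S)) \bowtie T)$

R(X, Y) = {(3, a), (8, b)}; S(Y, Z) = {(a, 2), (b, 0), (b, 9), (b, 6)}; T(Y, W) = {(a, p), (b, p), (b, q)}

Intermediate U = $\alpha_{Y,\mathrm{sum}(Z)}(\sigma_{X>Z}(R \bowtie S))$ = {(a, 2), (b, 6)}; V(W, avg) = {(p, 4), (q, 6)}

Tracing t = (q, 6):

$TQ_1 = \mathrm{Split}_{U,T}(\sigma_{W=q}(U \bowtie T))$ gives U* = {(b, 6)}, T* = {(b, q)}

$TQ_2 = \mathrm{Split}_{R,S}(\sigma_{Y=b}(\sigma_{X>Z}(R \bowtie S)))$ gives R* = {(8, b)}, S* = {(b, 0), (b, 6)}

Lineage: R* = {(8, b)}, S* = {(b, 0), (b, 6)}, T* = {(b, q)}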
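The recursive procedure can be sketched the same way: trace through the top segment with TQ1, then feed every tuple of U* to TQ2 and union the results. The sketch assumes the slide's data, including T(Y, W) = {(a, p), (b, p), (b, q)}; the function names are illustrative only.

```python
from collections import defaultdict

R = {(3, "a"), (8, "b")}                      # R(X, Y)
S = {("a", 2), ("b", 0), ("b", 9), ("b", 6)}  # S(Y, Z)
T = {("a", "p"), ("b", "p"), ("b", "q")}      # T(Y, W)


def segment1(r, s):
    """U = alpha_{Y,sum(Z)}(sigma_{X>Z}(R join S)), returned as {(Y, sum)}."""
    sums = defaultdict(int)
    for x, y in r:
        for y2, z in s:
            if y == y2 and x > z:
                sums[y] += z
    return {(y, total) for y, total in sums.items()}


def segment2(u, t):
    """V = alpha_{W,avg(sum)}(U join T), returned as {(W, avg)}."""
    groups = defaultdict(list)
    for y, total in u:
        for y2, w in t:
            if y == y2:
                groups[w].append(total)
    return {(w, sum(vals) / len(vals)) for w, vals in groups.items()}


def tq1(v_tuple, u, t):
    """TQ1 = Split_{U,T}(sigma_{W=v.W}(U join T))."""
    rows = [(ut, tt) for ut in u for tt in t
            if ut[0] == tt[0] and tt[1] == v_tuple[0]]
    return {ut for ut, _ in rows}, {tt for _, tt in rows}


def tq2(u_tuple, r, s):
    """TQ2 = Split_{R,S}(sigma_{Y=u.Y}(sigma_{X>Z}(R join S)))."""
    rows = [(rt, st) for rt in r for st in s
            if rt[1] == st[0] and rt[0] > st[1] and rt[1] == u_tuple[0]]
    return {rt for rt, _ in rows}, {st for _, st in rows}


def trace(v_tuple):
    """Apply TQ1 to the top segment, then TQ2 to every tuple of U* (union of lineages)."""
    u = segment1(R, S)                 # intermediate result needed by TQ1
    u_star, t_star = tq1(v_tuple, u, T)
    r_star, s_star = set(), set()
    for u_tuple in u_star:
        r_part, s_part = tq2(u_tuple, R, S)
        r_star |= r_part
        s_star |= s_part
    return r_star, s_star, t_star


print(trace(("q", 6.0)))
```

Note that `trace` has to recompute U from the sources before TQ1 can run; avoiding that recomputation and the source accesses is the motivation for auxiliary views.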

19

Making It Efficient

Source accesses are usually expensive or impossible

Need some intermediate results for lineage tracing

Store auxiliary views at the warehouse
– Reduce or eliminate source accesses
– Reduce recomputation of intermediate results

20

Auxiliary Views

There are many possible auxiliary views

For single-segment views
– Identified 10 possible auxiliary view schemes
– Studied performance tradeoffs

For arbitrary views
– Hard optimization problem
– Exhaustive and heuristic algorithms
– Performance study

21

+ Always improve lineage tracing

– Must be maintained when sources change

+ Can also help with maintenance of original user views

Auxiliary Views: Performance Tradeoffs

22

Auxiliary View Schemes for Single-Segment Views

Parameters:
- 3-way SPJ view
- sources: 10MB each
- disk: 1Mbps
- network: 50kbps
- 1000 operations
- q/u ratio = 4

Measurements:
- tracing time
- maintenance time

23

Auxiliary View Selection Algorithms for Arbitrary Views

24

Part 2: Transformation Graphs

Lineage definition

Tracing algorithms

Combining transformations for lineage tracing

Experimental results (tiny sample)

[Figure: Sources 1–3 feed the data warehouse through a graph of transformations T1–T6]

25

Transformation Example

[Figure: Order and Product are transformed by T1–T7 (selection, “join”, split, pivot, projection, selection, projection) into SalesJump]

Order(id, cust, date, prod-list):
1 A 2/8/99 1(10),2(10)
2 C 4/5/99 2(5),3(10)
3 D 6/1/99 1(20),2(10)
4 B 8/6/99 1(10),3(5)
5 D 10/8/99 1(5),3(10)
6 B 12/1/99 2(10),3(10)

Product(id, name, price, valid):
1 imac 1200 10/1/98-
2 vaio 2400 6/1/98-9/1/99
2 vaio 1800 9/2/99-
3 palm 500 2/1/98-7/1/98
3 palm 400 7/2/98-9/1/99
3 palm 300 9/2/99-

SalesJump(name, avg3, Q4):
palm 2K 6K

Lineage of (palm, 2K, 6K): the Product tuples (3, palm, 400, 7/2/98-9/1/99) and (3, palm, 300, 9/2/99-), and the Order tuples with id 2, 4, 5 and 6

26

Lineage for General Transformations

A transformation can be an arbitrary program

Examples of T: select … from … where …; main(int argc, char** argv) {…}; sed “s/string1/string2/g” …

Lineage through T: ??

– One extreme: relational operators
– Another extreme: we know nothing about T
– Middle ground: based on transformation properties

27

Transformation Properties

Transformation classes

Additional properties
– Transformation subclasses
– Schema information
– Provided inverse or tracing procedure

28

Transformation Classes

Dispatcher: $\forall I: T(I) = \bigcup_{i \in I} T(\{i\})$

Lineage: $T^*(o) = \{i \in I \mid o \in T(\{i\})\}$

29

Dispatcher Example

T1 splits each Order tuple into one tuple per product, producing O1(id, cust, date, pid, quant):
1 A 2/8/99 1 10
1 A 2/8/99 2 10
: : :
5 D 10/8/99 1 5
5 D 10/8/99 3 10
6 B 12/1/99 2 10
6 B 12/1/99 3 10

Lineage of the O1 tuple (5, D, 10/8/99, 3, 10): the Order tuple (5, D, 10/8/99, 1(5),3(10))
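For a dispatcher, the lineage of o is every input item that produces o on its own. The sketch below applies that rule to T1 on the Order data; encoding prod-list as (pid, quant) pairs and the names `t1` and `trace_dispatcher` are our assumptions.

```python
# Order(id, cust, date, prod-list); prod-list is kept as (pid, quant) pairs.
ORDERS = [
    (1, "A", "2/8/99", ((1, 10), (2, 10))),
    (2, "C", "4/5/99", ((2, 5), (3, 10))),
    (3, "D", "6/1/99", ((1, 20), (2, 10))),
    (4, "B", "8/6/99", ((1, 10), (3, 5))),
    (5, "D", "10/8/99", ((1, 5), (3, 10))),
    (6, "B", "12/1/99", ((2, 10), (3, 10))),
]


def t1(orders):
    """Split each order into one (id, cust, date, pid, quant) row per product."""
    return {(oid, cust, date, pid, quant)
            for oid, cust, date, items in orders
            for pid, quant in items}


def trace_dispatcher(transform, inputs, o):
    """TraceDS: T*(o) = {i in I | o in T({i})}, one call to T per input item."""
    return [i for i in inputs if o in transform([i])]


print(trace_dispatcher(t1, ORDERS, (5, "D", "10/8/99", 3, 10)))
```

`trace_dispatcher` calls T once per input item, which matches the O(|I|) cost listed for TraceDS in the tracing-procedures table.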

30

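Transformation Classes

Dispatcher: $\forall I: T(I) = \bigcup_{i \in I} T(\{i\})$; lineage $T^*(o) = \{i \in I \mid o \in T(\{i\})\}$

Aggregator: for every I with $T(I) = \{o_1, \ldots, o_n\}$ there is a unique partition $I_1, \ldots, I_n$ of I such that $T(I_k) = \{o_k\}$; lineage $T^*(o_k) = I_k$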

31

Aggregator Example

T4 pivots O3(oid, name, date, price, quant) into O4(name, Q1, Q2, Q3, Q4).

O3:
1 imac 2/8/99 1200 10
1 vaio 2/8/99 2400 10
2 vaio 4/5/99 2400 5
2 palm 4/5/99 400 10
3 imac 6/1/99 1200 20
3 vaio 6/1/99 2400 10
4 imac 8/6/99 1200 10
4 palm 8/6/99 400 5
5 imac 10/8/99 1200 5
5 palm 10/8/99 300 10
6 vaio 12/1/99 1800 10
6 palm 12/1/99 300 10

O4:
imac 12K 24K 12K 6K
vaio 24K 12K 24K 18K
palm 0K 4K 2K 6K

Lineage of (palm, 0K, 4K, 2K, 6K): the O3 tuples
2 palm 4/5/99 400 10
4 palm 8/6/99 400 5
5 palm 10/8/99 300 10
6 palm 12/1/99 300 10

32

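Transformation Classes

Dispatcher: $\forall I: T(I) = \bigcup_{i \in I} T(\{i\})$; lineage $T^*(o) = \{i \in I \mid o \in T(\{i\})\}$

Aggregator: for every I with $T(I) = \{o_1, \ldots, o_n\}$ there is a unique partition $I_1, \ldots, I_n$ of I such that $T(I_k) = \{o_k\}$; lineage $T^*(o_k) = I_k$

Black-box: all other transformations; lineage $T^*(o) = I$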

33

Transformation Classes

Most transformations are dispatchers, aggregators, or their compositions

A transformation can be both a dispatcher and an aggregator
– The two lineage definitions are then equivalent

Transformations can be relational operators
– The lineage definitions coincide with the relational definitions

34

Transformation Properties

Transformation classes

Additional properties
– Transformation subclasses
– Schema information
– Provided inverse or tracing procedure

35

Transformation Subclasses

Permit more efficient lineage tracing

Filter is a special dispatcher
– Each input data item produces itself or nothing

Context-free aggregator
– Whether two input data items are in the same partition is independent of other items

Key-preserving aggregator
– Any subset of an input partition always produces the same output key

36

Tracing Example: Aggregators

Consider $T(I) = \{o_1, \ldots, o_n\}$

Tracing the lineage of o for an aggregator
– Partition input I into $I_1, \ldots, I_n$ such that $T(I_k) = \{o_k\}$
– Return the $I_k$ such that $T(I_k) = \{o\}$

Tracing the lineage of o for a context-free aggregator
– Partition input I into $I_1, \ldots, I_n$ such that $|T(I_k)| = 1$
– Return the $I_k$ such that $T(I_k) = \{o\}$
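The context-free procedure can be sketched on the pivot T4 from the aggregator example. We assume T4 sums price × quant per product name and calendar quarter (amounts in dollars rather than the slide's $K); `pivot` and `trace_context_free` are hypothetical names. Two items belong to the same partition exactly when T applied to the pair yields a single output item.

```python
from collections import defaultdict

# O3(oid, name, date, price, quant), taken from the aggregator example slide.
O3 = [
    (1, "imac", "2/8/99", 1200, 10), (1, "vaio", "2/8/99", 2400, 10),
    (2, "vaio", "4/5/99", 2400, 5), (2, "palm", "4/5/99", 400, 10),
    (3, "imac", "6/1/99", 1200, 20), (3, "vaio", "6/1/99", 2400, 10),
    (4, "imac", "8/6/99", 1200, 10), (4, "palm", "8/6/99", 400, 5),
    (5, "imac", "10/8/99", 1200, 5), (5, "palm", "10/8/99", 300, 10),
    (6, "vaio", "12/1/99", 1800, 10), (6, "palm", "12/1/99", 300, 10),
]


def pivot(rows):
    """Assumed semantics of T4: per name, sum of price*quant in each calendar quarter."""
    sales = defaultdict(lambda: [0, 0, 0, 0])
    for _, name, date, price, quant in rows:
        quarter = (int(date.split("/")[0]) - 1) // 3   # months 1-3 -> Q1, ..., 10-12 -> Q4
        sales[name][quarter] += price * quant
    return {(name, *totals) for name, totals in sales.items()}


def trace_context_free(transform, inputs, o):
    """TraceCF sketch: items i, j share a partition iff |T({i, j})| == 1.

    Because the aggregator is context-free, comparing each item with one
    representative per group is enough; this takes O(|I|^2) calls to T.
    """
    groups = []
    for item in inputs:
        for group in groups:
            if len(transform([group[0], item])) == 1:
                group.append(item)
                break
        else:
            groups.append([item])
    for group in groups:
        if transform(group) == {o}:
            return group
    return []


print(trace_context_free(pivot, O3, ("palm", 0, 4000, 2000, 6000)))
```

Comparing against a single representative relies on the context-free property: whether two items share a partition does not depend on the rest of the input.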

37

Schema Information

Input schema $A = (A_1, \ldots, A_n)$ and key $A_{key}$

Output schema $B = (B_1, \ldots, B_n)$ and key $B_{key}$

Schema mappings: $f(A) \rightarrow B$ and $A \leftarrow g(B)$

Transformations with special schema mappings
– Forward key-map: $f(A) \rightarrow B_{key}$
– Backward key-map: $A_{key} \leftarrow g(B)$
– Backward total-map: $A \leftarrow g(B)$

38

Tracing Example: Forward Key-Maps

T4 maps O3(oid, name, date, price, quant) to O4(name, Q1, Q2, Q3, Q4), and the output key name is taken directly from the input attribute name. Tracing (palm, 0K, 4K, 2K, 6K) therefore selects the O3 tuples with name = palm:
2 palm 4/5/99 400 10
4 palm 8/6/99 400 5
5 palm 10/8/99 300 10
6 palm 12/1/99 300 10
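A forward key-map needs no calls to T at all: the output key can be computed from each input item directly. The sketch assumes that f(A) for T4 is just the product name; `name_of` and `trace_forward_key_map` are illustrative names.

```python
# O3(oid, name, date, price, quant), taken from the forward key-map slide.
O3 = [
    (1, "imac", "2/8/99", 1200, 10), (1, "vaio", "2/8/99", 2400, 10),
    (2, "vaio", "4/5/99", 2400, 5), (2, "palm", "4/5/99", 400, 10),
    (3, "imac", "6/1/99", 1200, 20), (3, "vaio", "6/1/99", 2400, 10),
    (4, "imac", "8/6/99", 1200, 10), (4, "palm", "8/6/99", 400, 5),
    (5, "imac", "10/8/99", 1200, 5), (5, "palm", "10/8/99", 300, 10),
    (6, "vaio", "12/1/99", 1800, 10), (6, "palm", "12/1/99", 300, 10),
]


def name_of(row):
    """f(A): maps an O3 row to T4's output key (the product name)."""
    return row[1]


def trace_forward_key_map(f, inputs, output_key):
    """TraceFM: keep the input items whose f-value equals o's key; no calls to T."""
    return [i for i in inputs if f(i) == output_key]


for row in trace_forward_key_map(name_of, O3, "palm"):
    print(row)
```

The linear scan over O3 is what a functional index on f(A) would turn into a lookup.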

39

Other Properties

Provided Tracing Procedure

Provided Transformation Inverse $T^{-1}$
– If T is an aggregator, then o’s lineage is $T^{-1}(\{o\})$
– Not always true for dispatchers or black-boxes

40

Tracing Procedures

Property                 Procedure    # T calls    # accesses
dispatcher               TraceDS      O(|I|)       O(|I|)
aggregator               TraceAG      O(2^|I|)     O(2^|I|)
black-box                return I;    0            O(|I|)
filter                   return o;    0            0
context-free aggr.       TraceCF      O(|I|^2)     O(|I|^2)
key-preserving aggr.     TraceKP      O(|I|)       O(|I|)
forward key-map          TraceFM      0            O(|I|)
backward key-map         TraceBM      0            O(|I|)
backward total-map       TraceTM      0            0
provided tracing-proc.   provided     ?            ?

41

Property Hierarchy

[Figure: hierarchy of transformation properties rooted at ANY, with nodes: provided tracing-proc. or inverse; black-box; aggregator; dispatcher; context-free aggr.; key-preserving aggr.; filter; forward key-map; backward key-map; total-map]

42

Summary of Our Approach for One Transformation

Properties are provided with transformations
– Specified by the transformation author
– Declared in prepackaged transformations
– Derived using recent techniques [Clio01, RB01]

The best property of a transformation is selected based on the hierarchy

The tracing procedure using the best property is called at tracing time

Indexing techniques

43

Transformation Sequences

Naive algorithm traces backwards one transformation at a time
– Needs all intermediate results
– Poor performance for long sequences

[Figure: a sequence of transformations T1, T2, T3, …, Tn from input I to output O]

44

Transformation Sequences

[Figure: the sequence T1, T2, T3, …, Tn from I to O, with several adjacent transformations combined into a single transformation T']

Combine transformations and trace as one
– Reduces number of intermediate results
– Combining judiciously reduces tracing cost and doesn’t lose accuracy

45

Overall Approach

Algorithm for deriving the properties of T, the combination of T1 and T2, from the properties of T1 and T2

Coarse-grained cost metric for a tracing sequence based on transformation properties

Greedy algorithm

46

Example of Greedy Algorithm

[Figure: greedy combination of T4, T5, T6 and T7, each annotated with its best property and cost level (fkmap(2), btmap(1), filter(1), bkmap(2)); candidate combinations that would yield blkbox(5) are shown, and the final sequence is T4' (fkmap(2)) followed by T6' (bkmap(2))]

47

Multiple-Input Example

T3 joins O1 (order items) with O2 (products) to produce O3; T3 is labeled a dispatcher with respect to each input.

O1(id, cust, date, pid, quant):
1 A 2/8/99 1 10
1 A 2/8/99 2 10
: : :
5 D 10/8/99 1 5
5 D 10/8/99 3 10
6 B 12/1/99 2 10
6 B 12/1/99 3 10

O2(id, name, price, valid):
1 imac 1200 10/1/98-
2 vaio 2400 6/1/98-9/1/99
2 vaio 1800 9/2/99-
3 palm 400 7/2/98-9/1/99
3 palm 300 9/2/99-

O3(oid, name, date, price, quant):
1 imac 2/8/99 1200 10
1 vaio 2/8/99 2400 10
: : :
5 imac 10/8/99 1200 5
5 palm 10/8/99 300 10
6 vaio 12/1/99 1800 10
6 palm 12/1/99 300 10

Lineage of the O3 tuple (5, palm, 10/8/99, 300, 10): the O1 tuple (5, D, 10/8/99, 3, 10) and the O2 tuple (3, palm, 300, 9/2/99-)
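Tracing a multiple-input transformation can reuse the single-input dispatcher rule, once per input, while holding the other input fixed. The sketch assumes T3 joins an order item with the product version whose valid interval contains the order date, and it uses only a subset of the O1 rows; `t3`, `valid_on` and `trace_join` are hypothetical names.

```python
from datetime import datetime


def parse(d):
    """Parse the slides' m/d/yy dates."""
    return datetime.strptime(d, "%m/%d/%y")


O1 = [  # subset of the order items (id, cust, date, pid, quant)
    (4, "B", "8/6/99", 1, 10), (4, "B", "8/6/99", 3, 5),
    (5, "D", "10/8/99", 1, 5), (5, "D", "10/8/99", 3, 10),
    (6, "B", "12/1/99", 2, 10), (6, "B", "12/1/99", 3, 10),
]
O2 = [  # product versions (id, name, price, valid)
    (1, "imac", 1200, "10/1/98-"),
    (2, "vaio", 2400, "6/1/98-9/1/99"), (2, "vaio", 1800, "9/2/99-"),
    (3, "palm", 400, "7/2/98-9/1/99"), (3, "palm", 300, "9/2/99-"),
]


def valid_on(valid, date):
    """True if date lies in 'start-end' (an empty end means open-ended)."""
    start, _, end = valid.partition("-")
    return parse(start) <= date and (not end or date <= parse(end))


def t3(items, products):
    """Assumed T3: join each item with the product version valid on its order date."""
    return {(oid, name, date, price, quant)
            for oid, _, date, pid, quant in items
            for prod_id, name, price, valid in products
            if pid == prod_id and valid_on(valid, parse(date))}


def trace_join(o, items, products):
    """Dispatcher rule per input: try one item of that input, keep the other input whole."""
    item_star = [i for i in items if o in t3([i], products)]
    prod_star = [p for p in products if o in t3(items, [p])]
    return item_star, prod_star


print(trace_join((5, "palm", "10/8/99", 300, 10), O1, O2))
```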

48

Transformation Graphs

[Figure: a transformation graph with inputs I1, I2 and output O]

Definition time
– Specify properties of each transformation in graph

49

Transformation Graphs

Definition time
– Specify properties of each transformation in graph
– Consider each path as a transformation sequence
– Combine transformations in each sequence

50

Transformation Graphs

Definition time

Load time
– Save intermediate results and build indices as desired

Tracing time
– Trace lineage through each sequence
– Combine results

51

Example Revisited

[Figure: the example graph (Order and Product through T1–T7 to SalesJump) annotated with each transformation's best property (dispatcher, filter, fkmap, btmap, bkmap), shown before and after combining transformations]

52

Experimental Results

Transformation graph based on a complex TPC-D query (Q12)

53

Part 3: View Update Using Data Lineage

View update: translating updates on views to updates on base tables

Obvious connection to lineage in case of view deletions

Fresh approach with improved results

54

View Update Translations: Valid and Exact

[Figure: a view tuple t in V derived from base relations R1, R2, …, Rn]


57

Our Algorithm

Uses lineage to:
– Find an exact translation whenever one exists (in linear time for many cases)
– Find a “good” translation when no exact translation exists

Fully automatic

Previous approaches
– Don’t always find an exact translation
– Often require user input
– Consider restricted classes of views

58

Related Work

Schema-level lineage tracing (annotation-based) [BB99, HQGW93, RS98]

Drill-down or drill-through on data cubes [Gray95]

“Weak inverse” for transformations [WS97]

Warehouse load resumption [LGMW00]

Data cleaning [GFSS+01]

View update [DB82, Mas84, Kel85]

59

Conclusions

Data lineage problem in two scenarios
– Warehouse defined by relational views
– Warehouse defined by general data transformations

For both scenarios, we provide:
– Formal lineage definition
– Lineage tracing algorithms
– Optimization techniques
– System prototype and performance study

Use of lineage for the view update problem

60

Some Open Problems

Lineage of “missing” view or base tuples

Deriving transformation properties

Combining with annotation-based approach

View update
– Translation ambiguity
– Base table constraints
– Multiple interacting views

61

62

Lineage Applications

On-line analytical processing (OLAP)

Scientific databases

Sensory and monitoring systems

Data cleaning

Warehouse resumption

Data security

View update

63

Lineage Tracing

Convert view definition into a segmented normal form

Generate one tracing query for each ASPJ segment

Apply tracing queries top-down through view definition

Lineage result is unaffected by normalization

64

Tracing Example

$V = \alpha_{X,\mathrm{avg}(Z)}(\sigma_{K_1<K_2}(R \bowtie S))$

$TQ = \mathrm{Split}_{R,S}(\sigma_{X=b}(\sigma_{K_1<K_2}(R \bowtie S)))$

[Figure: sample relations R(K1, X, Y) and S(K2, X, Z); V(X, avg) = {(a, 4), (b, 6)}; TQ returns the lineage of (b, 6)]

65

Split Lineage Tables (SLT)

[Figure: the tracing example with split lineage tables R' and S' stored as auxiliary views]

66

Base Table Projections (BP)

[Figure: the tracing example with base table projections R' and S' stored as auxiliary views]

67

Context-Free Aggregator Example

T4 (pivot) maps O3(oid, name, date, price, quant) to O4(name, Q1, Q2, Q3, Q4) and partitions O3 by name:
– imac: the tuples with oid 1, 3, 4, 5
– vaio: the tuples with oid 1, 2, 3, 6
– palm: the tuples with oid 2, 4, 5, 6

Lineage of (palm, 0K, 4K, 2K, 6K):
2 palm 4/5/99 400 10
4 palm 8/6/99 400 5
5 palm 10/8/99 300 10
6 palm 12/1/99 300 10

68

Tracing Example 1

Tracing procedure for context-free aggregators
– Partition input I into $I_1, \ldots, I_n$ such that $|T(I_k)| = 1$
– Return the $I_k$ such that $T(I_k) = \{o\}$

69

Lineage Equivalence

Equivalent SPJ views have equivalent lineage

Not so for ASPJ views

R(X, Y, Z) = {(3, a, 2), (8, b, 0), (8, b, 6)}

$\alpha_{Y,\mathrm{sum}(Z)}(R)$ = {(a, 2), (b, 6)}; lineage of (b, 6): {(8, b, 0), (8, b, 6)}

70

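Lineage Equivalence

Equivalent SPJ views have equivalent lineage

Not so for ASPJ views

R(X, Y, Z) = {(3, a, 2), (8, b, 0), (8, b, 6)}

$\alpha_{Y,\mathrm{sum}(Z)}(\sigma_{Z \neq 0}(R))$ = {(a, 2), (b, 6)}, the same result as $\alpha_{Y,\mathrm{sum}(Z)}(R)$, but the lineage of (b, 6) is now only {(8, b, 6)}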
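The example can be replayed in a few lines. The sketch takes the second view to be $\alpha_{Y,\mathrm{sum}(Z)}(\sigma_{Z \neq 0}(R))$, which is our reading of the slide's selection; both views return the same tuples, yet tracing (b, 6) gives different lineage. Function names are illustrative.

```python
from collections import defaultdict

R = {(3, "a", 2), (8, "b", 0), (8, "b", 6)}   # R(X, Y, Z)


def group_sum(rel):
    """alpha_{Y,sum(Z)} over (X, Y, Z) tuples."""
    sums = defaultdict(int)
    for _, y, z in rel:
        sums[y] += z
    return {(y, s) for y, s in sums.items()}


def drop_zero_z(rel):
    """sigma_{Z != 0} (assumed reading of the slide's selection predicate)."""
    return {r for r in rel if r[2] != 0}


def group_lineage(rel, t):
    """Lineage of an alpha_{Y,sum(Z)} output tuple t: every input row in t's Y-group."""
    return {r for r in rel if r[1] == t[0]}


def lineage_v1(t):
    return group_lineage(R, t)


def lineage_v2(t):
    # A selection passes rows through unchanged, so tracing through it is the identity.
    return group_lineage(drop_zero_z(R), t)


print(group_sum(R) == group_sum(drop_zero_z(R)))      # same view contents
print(lineage_v1(("b", 6)), lineage_v2(("b", 6)))     # different lineage
```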

71

Non-Context-Free Example

72

Non-Context-Free Example

73

Indices Help!

Conventional index
– On input key $A_{key}$ for a backward key-map with $A_{key} \leftarrow g(B)$

Functional index
– On f(A) for a forward key-map with $f(A) \rightarrow B_{key}$
– On T(A) for a dispatcher

Lineage index
– Mapping the key of each output data item o to the keys of the input data items in o’s lineage

74

Experimental Results

Tracing through an “SP” transformation over TPC-D table PartSupp

75

Tracing Through Sequences

Tracing cost estimation
– Divide properties into 5 groups
– T’s cost level depends on the group of its best property
– Associate a sequence with N[1..5], where N[k] records the number of transformations with cost level k

Greedy algorithm
– Pick a combination that results in the lowest N
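The cost metric and greedy loop can be sketched as follows. Several details are assumptions rather than the thesis's exact rules: N vectors are compared starting from the most expensive level, only adjacent transformations are merged, and the property-derivation step is replaced by a caller-supplied `combine_level` function backed by a made-up table (`TABLE`).

```python
def cost_vector(levels):
    """N[1..5]: number of transformations at each cost level 1..5."""
    return [levels.count(k) for k in range(1, 6)]


def rank(levels):
    """Assumed ordering of N: compare N[5] first, then N[4], ..., then N[1]."""
    return tuple(reversed(cost_vector(levels)))


def greedy_combine(levels, combine_level):
    """Repeatedly merge the adjacent pair giving the lowest N; stop when no merge helps.

    combine_level(a, b) is a hypothetical stand-in for deriving the combined
    transformation's best property and hence its cost level.
    """
    levels = list(levels)
    while len(levels) > 1:
        best = None
        for k in range(len(levels) - 1):
            merged = levels[:k] + [combine_level(levels[k], levels[k + 1])] + levels[k + 2:]
            if best is None or rank(merged) < rank(best):
                best = merged
        if rank(best) >= rank(levels):
            break
        levels = best
    return levels


# Hypothetical derivation table; any pair not listed becomes a black-box (level 5).
TABLE = {(2, 1): 2, (1, 1): 5, (1, 2): 2}
print(greedy_combine([2, 1, 1, 2], lambda a, b: TABLE.get((a, b), 5)))  # -> [2, 2]
```

With this table the sequence [2, 1, 1, 2] collapses to two level-2 transformations; merging those two would produce a black-box (level 5), which the ordering rejects.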

76

Lineage Annotation (Appendix)

[Figure: data items 1–4 passing through T1 and T2, annotated with lineage sets such as {1}, {1,2}, {2,4}, {4} and {1,2,4}, compared with tracing through T1* and T2*]

77

Multiple Inputs and Outputs

Define properties for each input and output

Trace lineage for each input/output pair using single-input single-output tracing procedures

[Figure: a transformation T with inputs I1, I2, …, Im and outputs O1, O2, …, On]

78

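View Update

Deletions on SPJ view → deletions on base database

View tuple deletion request –t and base tuple deletion $\Delta D$

$\Delta D$ is a translation for –t if $\{t\} \subseteq \Delta V = V(D) - V(D - \Delta D)$

Side-effect $E = \Delta V - \{t\}$; $\Delta D$ is exact if $E = \emptyset$

[Figure: deleting $\Delta V$ from view V, leaving V', corresponds to deleting $\Delta D$ from database D, leaving D']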

79

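Relationships to Data Lineage

For an SPJ view $V = \pi_A(\sigma_C(R_1 \bowtie \cdots \bowtie R_n))$:

$t_i \in R_i$ belongs to t’s lineage $R_i^*$ iff $\{t\} \subseteq \pi_A(\sigma_C(R_1 \bowtie \cdots \bowtie \{t_i\} \bowtie \cdots \bowtie R_n))$

$t_i$ belongs to t’s exclusive lineage $R_i^{**}$ iff $\{t\} = \pi_A(\sigma_C(R_1 \bowtie \cdots \bowtie \{t_i\} \bowtie \cdots \bowtie R_n))$

Intuition: $t_i$ contributes only to t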
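Both definitions translate directly into code: replace one relation by a singleton and re-evaluate the view. The sketch uses a small made-up SPJ view $V = \pi_{a,c}(R_1 \bowtie R_2)$ rather than data from the talk; `spj_view`, `with_single`, `lineage` and `exclusive_lineage` are illustrative names.

```python
R1 = {(1, "x"), (2, "x"), (1, "y")}      # hypothetical R1(a, b)
R2 = {("x", 10), ("x", 20), ("y", 30)}   # hypothetical R2(b, c)


def spj_view(r1, r2):
    """Hypothetical SPJ view V = pi_{a,c}(R1 join R2), joining on b."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


def with_single(rels, i, ti):
    """(R1, ..., {ti}, ..., Rn): replace the i-th relation by the singleton {ti}."""
    return [{ti} if j == i else rel for j, rel in enumerate(rels)]


def lineage(view, rels, t):
    """R_i* = {ti in R_i | t in V(R1, ..., {ti}, ..., Rn)}, for each i."""
    return [{ti for ti in rel if t in view(*with_single(rels, i, ti))}
            for i, rel in enumerate(rels)]


def exclusive_lineage(view, rels, t):
    """R_i** = {ti in R_i | V(R1, ..., {ti}, ..., Rn) == {t}}, for each i."""
    return [{ti for ti in rel if view(*with_single(rels, i, ti)) == {t}}
            for i, rel in enumerate(rels)]


print(lineage(spj_view, [R1, R2], (1, 10)))
print(exclusive_lineage(spj_view, [R1, R2], (1, 10)))
```

Here (1, 10) has lineage {(1, x)} and {(x, 10)} but no exclusive lineage, since (1, x) also yields (1, 20) and (x, 10) also yields (2, 10).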

80

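The Problem

[Figure: view update asks which database change D → D' produces a requested view change V → V'; view update for deletions deletes a view tuple t, derived from R1, R2, …, Rn, by deleting base tuples]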


83

Relationships to Data Lineage

Deleting a lineage branch $R_i^*$ of t is always a translation for –t

Deleting any subset of t’s exclusive lineage $D^{**}$ never causes a side-effect

If –t has an exact translation $\Delta D$, it must also have an exact translation within t’s lineage

84

Translating View Tuple Deletions

DELETE(t, V, D)
  compute lineage D* and exclusive lineage D**;
  IF D** is a translation THEN RETURN D**;
  IF deleting Ri* causes no side-effect for some i THEN RETURN Ri*;
  FOR each subset ΔD of D* DO
    IF ΔD is not a translation THEN prune all subsets of ΔD;
    ELSE IF ΔD causes a side-effect THEN prune all supersets of ΔD;
    ELSE RETURN ΔD;
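A direct, unoptimized rendering of DELETE for a small made-up SPJ view follows. It represents ΔD as a set of (relation index, tuple) pairs, searches subsets of D* from smallest to largest, and applies the two pruning rules from the pseudocode; unlike the full algorithm, it returns None instead of looking for a “good” non-exact translation. Names and data are hypothetical.

```python
from itertools import combinations

R1 = {(1, "x"), (2, "x"), (1, "y")}      # hypothetical R1(a, b)
R2 = {("x", 10), ("x", 20), ("y", 30)}   # hypothetical R2(b, c)


def spj_view(r1, r2):
    """Hypothetical SPJ view V = pi_{a,c}(R1 join R2)."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


def delta_v(view, rels, delta):
    """Delta V = V(D) - V(D - Delta D); delta is a set of (relation index, tuple) pairs."""
    remaining = [{ti for ti in rel if (i, ti) not in delta} for i, rel in enumerate(rels)]
    return view(*rels) - view(*remaining)


def delete(view, rels, t):
    """Sketch of DELETE(t, V, D): return an exact translation for -t, or None."""
    def alone(i, ti):  # V(R1, ..., {ti}, ..., Rn)
        return view(*[{ti} if j == i else rel for j, rel in enumerate(rels)])

    lin = [(i, ti) for i, rel in enumerate(rels) for ti in sorted(rel) if t in alone(i, ti)]
    excl = {(i, ti) for i, ti in lin if alone(i, ti) == {t}}
    if excl and t in delta_v(view, rels, excl):   # D** is a translation, with no side-effect
        return excl
    for i in range(len(rels)):                    # a lineage branch Ri* is always a translation
        branch = {(j, tj) for j, tj in lin if j == i}
        if delta_v(view, rels, branch) == {t}:
            return branch
    not_translations, with_side_effects = [], []
    for size in range(1, len(lin) + 1):           # subsets of D*, smallest first
        for cand in map(set, combinations(lin, size)):
            if any(cand <= s for s in not_translations) or any(cand >= s for s in with_side_effects):
                continue                          # pruned
            removed = delta_v(view, rels, cand)
            if t not in removed:
                not_translations.append(cand)
            elif removed != {t}:
                with_side_effects.append(cand)
            else:
                return cand
    return None


print(delete(spj_view, [R1, R2], (1, 30)))
print(delete(spj_view, [R1, R2], (1, 10)))
```

Deleting (1, 30) succeeds at the first step, because its exclusive lineage is a translation. For (1, 10), every candidate deletion also removes (1, 20) or (2, 10), so no exact translation exists.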

85

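Detailed Computations

Is $\Delta D$ a translation for –t?
– If $t \notin \pi_A(\sigma_C((R_1^* - \Delta R_1) \bowtie \cdots \bowtie (R_n^* - \Delta R_n)))$, then $\Delta D$ is a translation

Does $\Delta D$ cause a side-effect?
– Let $E = \bigcup_{i=1..n} \pi_A(\sigma_C(R_1 \bowtie \cdots \bowtie \Delta R_i \bowtie \cdots \bowtie R_n)) - \{t\}$
– If $E \subseteq \pi_A(\sigma_C((R_1 - \Delta R_1) \bowtie \cdots \bowtie (R_n - \Delta R_n)))$, then $\Delta D$ is exact

Further pruning by sizes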