Loss-Less Joins
Decompositions
Dependency-preservation property: enforce constraints on the original relation by enforcing some constraints on the resulting relations.
Lossless-join property: get the original relation back by joining the resulting relations.
Boyce-Codd normal form (BCNF): decomposition guarantees a lossless join.
Third normal form (3NF): decomposition guarantees a lossless join and dependency preservation.
Testing for Dependency Preservation
Suppose we project R, with FD set F, onto R1 with FD set F1, R2 with FD set F2, …, Rk with FD set Fk.
We say that dependencies are preserved if and only if
F+ = (F1 ∪ F2 ∪ … ∪ Fk)+
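The preservation condition can be checked without materializing each projected FD set. The following is a minimal sketch of one standard way to do it, assuming FD's are given as (left side, right side) pairs of Python sets; the function names `closure` and `preserves_dependencies` are hypothetical, chosen only for illustration.

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, schemas):
    """True if every FD in fds follows from the FDs projected onto the schemas."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                # Attributes derivable using only dependencies that live inside this schema.
                gained = closure(reach & schema, fds) & schema
                if not gained <= reach:
                    reach |= gained
                    changed = True
        if not rhs <= reach:
            return False
    return True


# R = ABCD with C->D and B->A, decomposed into AB, BC, CD.
fds = [({"C"}, {"D"}), ({"B"}, {"A"})]
schemas = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
print(preserves_dependencies(fds, schemas))
```

Each FD is tested by growing its left side using only what can be derived inside a single fragment at a time, which mirrors the idea that constraints are enforced on the resulting relations individually.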
Testing for a Lossless Join
If we project R onto R1, R2,…, Rk , can we recover R by rejoining?
Any tuple in R can be recovered from its projected fragments.
So the only question is: when we rejoin, do we ever get back something we didn’t have originally?
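Lossy Decomposition
R ⊆ R1 ⋈ R2 ⋈ ... ⋈ Rn: always true!
R ⊇ R1 ⋈ R2 ⋈ ... ⋈ Rn: have to check.

R:

| SSN | Name | Address |
| --- | --- | --- |
| 1111 | Joe | 1 Pine |
| 2222 | Alice | 2 Oak |
| 3333 | Alice | 3 Pine |

R1:

| SSN | Name |
| --- | --- |
| 1111 | Joe |
| 2222 | Alice |
| 3333 | Alice |

R2:

| Name | Address |
| --- | --- |
| Joe | 1 Pine |
| Alice | 2 Oak |
| Alice | 3 Pine |

R1 ⋈ R2 ⊈ R: the join pairs each Alice SSN with both Alice addresses, producing tuples that were never in R.
Problem: Name is not a key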
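The SSN/Name/Address example can be replayed in a few lines. This is an illustrative sketch only: relations are modeled as sets of tuples, and the helpers `project` and `natural_join` are hypothetical names written for this example.

```python
def project(schema, rows, attrs):
    """Project a set of tuples onto the listed attributes."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(schema1, rows1, schema2, rows2):
    """Natural join of two relations given as (schema, set of tuples)."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    out = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[schema1.index(a)] == r2[schema2.index(a)] for a in common):
                out.add(tuple(r1) + tuple(r2[schema2.index(a)] for a in extra))
    return list(schema1) + extra, out


schema = ["SSN", "Name", "Address"]
r = {
    ("1111", "Joe", "1 Pine"),
    ("2222", "Alice", "2 Oak"),
    ("3333", "Alice", "3 Pine"),
}
r1 = project(schema, r, ["SSN", "Name"])
r2 = project(schema, r, ["Name", "Address"])
joined_schema, joined = natural_join(["SSN", "Name"], r1, ["Name", "Address"], r2)

# Every original tuple comes back, but the join also contains extras.
spurious = joined - r
print(sorted(spurious))
```

The two spurious tuples pair each Alice with the other Alice's address, which is exactly the failure that joining on a non-key attribute produces.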
The Chase Test
Suppose tuple t comes back in the join.
Then t is the join of projections of some tuples of R, one for each Ri of the decomposition.
Can we use the given FD’s to show that one of these tuples in R must be t ?
The Chase – (2)
Start by assuming t = abc… .
For each i, there is a tuple si of R that has a, b, c,… in the attributes of Ri.
si can have any values in other attributes.
We’ll use the same letter as in t, but with a subscript, for these components.
Example: The Chase
Let R = ABCD, and the decomposition be AB, BC, and CD.
Let the given FD’s be C->D and B ->A.
Suppose the tuple t = abcd is the join of tuples projected onto AB, BC, CD.
The Tableau
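The tuples of R projected onto AB, BC, CD:

| A | B | C | D |
| --- | --- | --- | --- |
| a | b | c1 | d1 |
| a2 | b | c | d2 |
| a3 | b3 | c | d |

Use C->D: the second and third rows agree on C, so d2 becomes d.
Use B->A: the first and second rows agree on B, so a2 becomes a.
We’ve proved the second tuple must be t.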
Summary of the Chase
1. If two rows agree in the left side of a FD, make their right sides agree too.
2. Always replace a subscripted symbol by the corresponding unsubscripted one, if possible.
3. If we ever get an unsubscripted row, we know any tuple in the project-join is in the original (the join is lossless).
4. Otherwise, the final tableau is a counterexample.
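The four rules above can be sketched as a small procedure. This is a minimal illustration under stated assumptions, not a reference implementation: attributes are single letters, schemas and FD sides are strings of those letters, an unsubscripted symbol is modeled as the pair (attribute, 0), and the function name `chase_lossless` is hypothetical.

```python
def chase_lossless(attrs, schemas, fds):
    """Chase test: True if decomposing attrs into schemas is lossless under fds."""
    # One row per schema: unsubscripted (attr, 0) inside the schema,
    # subscripted (attr, i) elsewhere.
    rows = [
        {a: (a, 0) if a in schema else (a, i) for a in attrs}
        for i, schema in enumerate(schemas, start=1)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[a] != r2[a] for a in lhs):
                        continue
                    for a in rhs:
                        if r1[a] != r2[a]:
                            # Rule 2: keep the symbol with the smaller subscript,
                            # so an unsubscripted symbol always wins.
                            keep, drop = sorted([r1[a], r2[a]], key=lambda s: s[1])
                            for row in rows:
                                if row[a] == drop:
                                    row[a] = keep
                            changed = True
    # Rule 3: lossless exactly when some row is entirely unsubscripted.
    return any(all(row[a] == (a, 0) for a in attrs) for row in rows)


print(chase_lossless("ABCD", ["AB", "BC", "CD"], [("C", "D"), ("B", "A")]))
print(chase_lossless("ABCD", ["AB", "BC", "CD"], [("C", "D")]))
```

The first call uses both FD's from the example and reports a lossless join; the second drops B->A and reports a lossy one.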
Example: Lossy Join
Same relation R = ABCD and same decomposition.
But with only the FD C->D.
The Tableau
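
| A | B | C | D |
| --- | --- | --- | --- |
| a | b | c1 | d1 |
| a2 | b | c | d2 |
| a3 | b3 | c | d |

Use C->D: the second and third rows agree on C, so d2 becomes d. No other change is possible.
These three tuples are an example R that shows the join lossy. abcd is not in R, but we can project and rejoin to get abcd.
These projections rejoin to form abcd: ab from the first row, bc from the second, cd from the third.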
Attribute Closure as Chase

| A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- |
| a | b | c1 | d1 | e1 | f1 |
| a | b | c2 | d2 | e2 | f2 |

R = ABCDEF, with FD’s AB ->C, BC ->AD, D ->E, CF ->B
Compute (AB)+
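Viewed as a chase, the closure is the set of columns on which the two rows end up agreeing. The sketch below tracks only that set of columns; it assumes single-letter attributes with FD sides written as strings, and `closure_by_chase` is a hypothetical name.

```python
def closure_by_chase(start, fds):
    """Columns on which two rows agreeing on `start` are forced to agree."""
    agree = set(start)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD fires once the two rows agree on its whole left side.
            if set(lhs) <= agree and not set(rhs) <= agree:
                agree |= set(rhs)
                changed = True
    return agree


fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print("".join(sorted(closure_by_chase("AB", fds))))  # ABCDE
```

CF ->B never fires because the two rows keep different symbols in column F, so F stays outside the closure.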
Multivalued Dependencies
Fourth Normal Form
Reasoning About FD’s + MVD’s
Definition of MVD
A multivalued dependency (MVD) on R, X ->->Y, says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation.
Let Z = R - (X+Y), then for each value of X, values of Y are independent of values of Z.
Multi-valued Dependencies?

| COURSE | PROFESSOR | BOOKS |
| --- | --- | --- |
| DBSys | Zaki | LBK |
| DBSys | Zaki | O’Neil |
| DBSys | Zaki | Date |
| DBSys | Adali | LBK |
| DBSys | Adali | O’Neil |
| DBSys | Adali | Date |
| Comp Algo | Musser | CLR |
| Comp Algo | Musser | Baase |
| Comp Algo | Goldberg | CLR |
| Comp Algo | Goldberg | Baase |

•Example: The MVD Course →→ Professor holds
•MVD: if X →→ Y holds over R, then for any value x of attribute set X, the following holds (let Z = R − XY):
∏YZ (σX=x (R)) = ∏Y (σX=x (R)) × ∏Z (σX=x (R))
That is, Y and Z = R − XY are independent of each other given X.
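The product condition can be checked directly on the course table. This is a minimal sketch, assuming relations are sets of tuples with a separate schema list; the name `mvd_holds` is hypothetical.

```python
from itertools import product


def mvd_holds(schema, rows, x, y):
    """Check X ->-> Y by comparing the YZ projection with the Y-by-Z product."""
    z = [a for a in schema if a not in x and a not in y]

    def pick(row, attrs):
        return tuple(row[schema.index(a)] for a in attrs)

    for xv in {pick(r, x) for r in rows}:
        selected = [r for r in rows if pick(r, x) == xv]
        yz = {(pick(r, y), pick(r, z)) for r in selected}
        ys = {pick(r, y) for r in selected}
        zs = {pick(r, z) for r in selected}
        if yz != set(product(ys, zs)):
            return False
    return True


schema = ["COURSE", "PROFESSOR", "BOOKS"]
rows = {
    ("DBSys", "Zaki", "LBK"), ("DBSys", "Zaki", "O'Neil"), ("DBSys", "Zaki", "Date"),
    ("DBSys", "Adali", "LBK"), ("DBSys", "Adali", "O'Neil"), ("DBSys", "Adali", "Date"),
    ("Comp Algo", "Musser", "CLR"), ("Comp Algo", "Musser", "Baase"),
    ("Comp Algo", "Goldberg", "CLR"), ("Comp Algo", "Goldberg", "Baase"),
}
print(mvd_holds(schema, rows, ["COURSE"], ["PROFESSOR"]))
```

Removing any single row from the table breaks the product structure for that course, and the check then fails.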
Example: MVD
Drinkers(name, addr, phones, beersLiked)
A drinker’s phones are independent of the beers they like: name->->phones and name->->beersLiked.
Thus, each of a drinker’s phones appears with each of the beers they like in all combinations.
This repetition is unlike FD redundancy.
name->addr is the only FD.
Tuples Implied by name->->phones
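If we have tuples:

| name | addr | phones | beersLiked |
| --- | --- | --- | --- |
| sue | a | p1 | b1 |
| sue | a | p2 | b2 |

Then these tuples must also be in the relation:

| name | addr | phones | beersLiked |
| --- | --- | --- | --- |
| sue | a | p2 | b1 |
| sue | a | p1 | b2 |
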
Picture of MVD X ->->Y
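[Figure: two tuples with columns X, Y, and others. The tuples are equal on X; exchanging their Y components gives tuples that are also in the relation.]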
MVD Rules
Every FD is an MVD (promotion).
If X ->Y, then swapping Y’s between two tuples that agree on X doesn’t change the tuples (same X values imply same Y values!).
Therefore, the “new” tuples are surely in the relation, and we know X ->->Y.
Complementation: If X ->->Y, and Z is all the other attributes, then X ->->Z.
Splitting Doesn’t Hold
Unlike FD’s, we cannot split the right side; sometimes you have to leave several attributes on the right side.
Example: Multiattribute Right Sides
Drinkers(name, areaCode, phone, beersLiked, manf)
A drinker can have several phones, with the number divided between areaCode and phone (last 7 digits).
A drinker can like several beers, each with its own manufacturer.
Example Continued
Since the areaCode-phone combinations for a drinker are independent of the beersLiked-manf combinations, we expect that the following MVD’s hold:
name ->-> areaCode phone
name ->-> beersLiked manf
Example Data
Here is possible data satisfying these MVD’s:

| name | areaCode | phone | beersLiked | manf |
| --- | --- | --- | --- | --- |
| Sue | 650 | 555-1111 | Bud | A.B. |
| Sue | 650 | 555-1111 | WickedAle | Pete’s |
| Sue | 415 | 555-9999 | Bud | A.B. |
| Sue | 415 | 555-9999 | WickedAle | Pete’s |

But we cannot swap area codes or phones by themselves.
That is, neither name->->areaCode nor name->->phone holds for this relation.
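The swap definition of an MVD can be tested on this data directly. The sketch below is illustrative only; `mvd_by_swapping` is a hypothetical helper that tries every pair of tuples agreeing on the left side and checks that the swapped tuple is present.

```python
def mvd_by_swapping(schema, rows, x, y):
    """Check X ->-> Y: swapping Y between tuples that agree on X stays in the relation."""
    rows = set(rows)
    x_idx = [schema.index(a) for a in x]
    y_idx = {schema.index(a) for a in y}
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i in x_idx):
                # Take u's Y components and everything else from t.
                swapped = tuple(u[i] if i in y_idx else t[i] for i in range(len(schema)))
                if swapped not in rows:
                    return False
    return True


schema = ["name", "areaCode", "phone", "beersLiked", "manf"]
rows = {
    ("Sue", "650", "555-1111", "Bud", "A.B."),
    ("Sue", "650", "555-1111", "WickedAle", "Pete's"),
    ("Sue", "415", "555-9999", "Bud", "A.B."),
    ("Sue", "415", "555-9999", "WickedAle", "Pete's"),
}
print(mvd_by_swapping(schema, rows, ["name"], ["areaCode", "phone"]))
print(mvd_by_swapping(schema, rows, ["name"], ["areaCode"]))
```

Swapping only the area code would create a tuple such as (Sue, 415, 555-1111, ...), which is not in the relation, so the second check fails while the first succeeds.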
Fourth Normal Form
The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF.
There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.
4NF Definition
A relation R is in 4NF if: whenever X ->->Y is a nontrivial MVD, then X is a superkey.
Nontrivial MVD means that:
• Y is not a subset of X, and
• X and Y are not, together, all the attributes.
Note that the definition of “superkey” still depends on FD’s only.
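The definition translates into a short check for a single given MVD. This is a sketch under assumptions: the MVD is supplied by the caller rather than inferred, FD's are (left side, right side) pairs of sets, and `closure` and `violates_4nf` are hypothetical names.

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_4nf(schema, fds, x, y):
    """True if the MVD X ->-> Y is nontrivial and X is not a superkey."""
    schema, x, y = set(schema), set(x), set(y)
    nontrivial = not y <= x and (x | y) != schema
    # Superkey is judged from the FDs only, as the definition requires.
    is_superkey = closure(x, fds) >= schema
    return nontrivial and not is_superkey


drinkers = {"name", "addr", "phones", "beersLiked"}
fds = [({"name"}, {"addr"})]
print(violates_4nf(drinkers, fds, {"name"}, {"phones"}))
print(violates_4nf({"name", "phones"}, [], {"name"}, {"phones"}))
```

The second call returns False because name and phones together are all the attributes of that relation, so the MVD is trivial there.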
BCNF Versus 4NF
Remember that every FD X ->Y is also an MVD, X ->->Y.
Thus, if R is in 4NF, it is certainly in BCNF, because any BCNF violation is a 4NF violation (after conversion to an MVD).
But R could be in BCNF and not 4NF, because MVD’s are “invisible” to BCNF.
Decomposition and 4NF
If X ->->Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF.
1. XY is one of the decomposed relations.
2. (R – Y) U X is the other.
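One decomposition step can be written out as set arithmetic. This is a small sketch assuming attributes are plain strings; `decompose_step` is a hypothetical name, and the caller is assumed to have already identified the violating dependency.

```python
def decompose_step(r, x, y):
    """Split R on a violating dependency X ->-> Y into XY and (R - Y) U X."""
    r, x, y = set(r), set(x), set(y)
    return x | y, (r - y) | x


drinkers = {"name", "addr", "phones", "beersLiked"}

# First split on name -> addr (an FD is also an MVD).
drinkers1, drinkers2 = decompose_step(drinkers, {"name"}, {"addr"})
print(sorted(drinkers1), sorted(drinkers2))

# Then split the remainder on name ->-> phones.
drinkers3, drinkers4 = decompose_step(drinkers2, {"name"}, {"phones"})
print(sorted(drinkers3), sorted(drinkers4))
```

Both halves always share X, which is what makes the rejoin possible.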
Example: 4NF Decomposition
Drinkers(name, addr, phones, beersLiked)
FD: name -> addr
MVD’s: name ->-> phones
name ->-> beersLiked
Key is {name, phones, beersLiked}.
All dependencies violate 4NF.
Example Continued
Decompose using name -> addr:
1. Drinkers1(name, addr)
In 4NF; only dependency is name -> addr.
2. Drinkers2(name, phones, beersLiked)
Not in 4NF. MVD’s name ->-> phones and name ->-> beersLiked apply. No FD’s, so all three attributes form the key.
Example: Decompose Drinkers2
Either MVD name ->-> phones or name ->-> beersLiked tells us to decompose to:
Drinkers3(name, phones)
Drinkers4(name, beersLiked)
Reasoning About MVD’s + FD’s
Problem: given a set of MVD’s and/or FD’s that hold for a relation R, does a certain FD or MVD also hold in R ?
Solution: Use a tableau to explore all inferences from the given set, to see if you can prove the target dependency.
Why Do We Care?
1. 4NF technically requires an MVD violation.
Need to infer MVD’s from given FD’s and MVD’s that may not be violations themselves.
2. When we decompose, we need to project FD’s + MVD’s.
Example: Chasing a Tableau With MVD’s and FD’s
To apply a FD, equate symbols, as before.
To apply an MVD, generate one or both of the tuples we know must also be in the relation represented by the tableau.
We’ll prove: if A->->BC and D->C, then A->C.
The Tableau for A->C
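Goal: prove that c1 = c2, given A->->BC and D->C.

| A | B | C | D |
| --- | --- | --- | --- |
| a | b1 | c1 | d1 |
| a | b2 | c2 | d2 |
| a | b2 | c2 | d1 |

Use A->->BC (first row’s D with second row’s BC) to add the third row.
Use D->C (first and third row agree on D, therefore agree on C): c1 is replaced by c2.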
Example: Transitive Law for MVD’s
If A->->B and B->->C, then A->->C.
Obvious from the complementation rule if the schema is ABC.
But it holds no matter what the schema; we’ll assume ABCD.
The Tableau for A->->C
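Goal: derive tuple (a,b1,c2,d1), given A->->B and B->->C.

| A | B | C | D |
| --- | --- | --- | --- |
| a | b1 | c1 | d1 |
| a | b2 | c2 | d2 |
| a | b1 | c2 | d2 |
| a | b1 | c2 | d1 |

Use A->->B to swap B from the first row into the second; this adds the third row.
Use B->->C to swap C from the third row into the first; this adds the fourth row, which is the goal.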
Rules for Inferring MVD’s + FD’s
Start with a tableau of two rows.
These rows agree on the attributes of the left side of the dependency to be inferred.
And they disagree on all other attributes.
Use unsubscripted variables where they agree, subscripts where they disagree.
Inference: Applying a FD
Apply a FD X->Y by finding rows that agree on all attributes of X.
Force the rows to agree on all attributes of Y.
Replace one variable by the other.
If the replaced variable is part of the goal tuple, replace it there too.
Inference: Applying a MVD
Apply a MVD X->->Y by finding two rows that agree in X.
Add to the tableau one or both rows that are formed by swapping the Y-components of these two rows.
Inference: Goals
To test whether U->V holds, we succeed by inferring that the two variables in each column of V are actually the same.
If we are testing U->->V, we succeed if we infer in the tableau a row that is the original two rows with the components of V swapped.
Inference: Endgame
Apply all the given FD’s and MVD’s until we cannot change the tableau.
If we meet the goal, then the dependency is inferred.
If not, then the final tableau is a counterexample relation:
It satisfies all given dependencies.
The original two rows violate the target dependency.
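The whole inference procedure (two-row start, FD and MVD application, goal test, endgame) fits in one short function. This is a minimal sketch under stated assumptions: attributes are single letters, each dependency is a triple ("fd" or "mvd", left side, right side) with sides given as strings, a symbol is the pair (attribute, subscript) with subscript 0 meaning unsubscripted, and the name `infers` is hypothetical.

```python
def infers(attrs, deps, target):
    """Chase a two-row tableau to decide whether deps imply the target dependency."""
    kind, lhs, rhs = target
    pos = {a: i for i, a in enumerate(attrs)}
    first = tuple((a, 0) if a in lhs else (a, 1) for a in attrs)
    second = tuple((a, 0) if a in lhs else (a, 2) for a in attrs)
    rows = {first, second}
    rename = {}  # replaced symbol -> symbol that replaced it

    def find(sym):
        while sym in rename:
            sym = rename[sym]
        return sym

    def step():
        """Apply one dependency once; return True if the tableau changed."""
        nonlocal rows
        for dkind, x, y in deps:
            for t in rows:
                for u in rows:
                    if any(t[pos[a]] != u[pos[a]] for a in x):
                        continue
                    if dkind == "fd":
                        for a in y:
                            s1, s2 = t[pos[a]], u[pos[a]]
                            if s1 != s2:
                                keep, drop = sorted((s1, s2), key=lambda s: s[1])
                                rename[drop] = keep
                                rows = {tuple(find(s) for s in r) for r in rows}
                                return True
                    else:
                        # MVD: u's Y components combined with the rest of t.
                        new = tuple(u[i] if attrs[i] in y else t[i]
                                    for i in range(len(attrs)))
                        if new not in rows:
                            rows = rows | {new}
                            return True
        return False

    while step():
        pass

    t0 = tuple(find(s) for s in first)
    u0 = tuple(find(s) for s in second)
    if kind == "fd":
        return all(t0[pos[a]] == u0[pos[a]] for a in rhs)
    goal = tuple(u0[i] if attrs[i] in rhs else t0[i] for i in range(len(attrs)))
    return goal in rows


print(infers("ABCD", [("mvd", "A", "BC"), ("fd", "D", "C")], ("fd", "A", "C")))
print(infers("ABCD", [("mvd", "A", "B"), ("mvd", "B", "C")], ("mvd", "A", "C")))
print(infers("ABCD", [("mvd", "A", "BC")], ("mvd", "A", "B")))
```

The first two calls replay the A->C and transitive-law examples; the third illustrates that splitting the right side of an MVD is not a valid inference. The function restarts its scan after every change so that it never works with rows holding replaced symbols.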