Supporting top-k join queries in relational databases
description
Transcript of Supporting top-k join queries in relational databases
![Page 1: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/1.jpg)
Supporting top-k join queries in relational databases
Ihab Ilyas, Walid Aref, Ahmed Elmagarmid
Presented by Jing Chen
![Page 2: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/2.jpg)
1. Motivation
• Select A.1, B.2
From A, B, C
Where A.1=B.1
and B.2 = C.2
Order By (0.3*A.1 + 0.7*B.2)
Stop After 5
![Page 3: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/3.jpg)
Traditional Joins
• Nested Loop
• Merge Join
• Hash Join
![Page 4: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/4.jpg)
2. Goals/Contribution
• Propose a pipe lined Rank-Join algorithm
• Analyze the I/O cost of the algorithm
• Implement the algorithm
• Propose a optimal join strategy
• Evaluate performance
![Page 5: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/5.jpg)
3. Ripple JoinJOIN : L.A = R.A L and R are descending ordered by B
(L1(1,1,5) R1(1,3,5))L
R
![Page 6: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/6.jpg)
3. Ripple Join --contd
JOIN: L.A = R.A
(L2,R2) {(2,2,4),(2,1,4)}
L
R
![Page 7: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/7.jpg)
Ripple Join --contd
JOIN: L.A = R.A
(L2,R2) {(2,2,4),(2,1,4)}
(L2,R1) {2,2,4), (1,3,5)}
(L1,R2) {(1,1,5), (2,1,4)}
L
R
![Page 8: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/8.jpg)
3. Variation Of Ripple Join
Rectangle
Block
Hash
![Page 9: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/9.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
(1). Get a valid combination using any certain algorithm
Ripple Select (L1, R1) => No Result
![Page 10: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/10.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
(1). Get a valid combination using any certain algorithm
Select (L2, R2)
(L2, R2), (L2, R1), (L1, R2) => (L1, R2)
![Page 11: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/11.jpg)
4. Proposed Algorithm -- ContdSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
(2). Compute the score (J) for the result
J1(L1, R2) => L.B + R.B = 5 + 4 = 9
![Page 12: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/12.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
(3). Compute a water mark score (T) by
Max {(Last L).B + (First R.B), (First L).B + (Last R).B}
![Page 13: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/13.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
(3). Compute a water mark (T) score by
Max {(Last L).B + (First R.B), (First L).B + (Last R).B}
Selection (L1, R1) , (L2, R2) =>
T = Max (L2.B + R1.B, L1.B + R2.B)
=Max (4+5, 5+4) = 9
![Page 14: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/14.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
J1= 9
T = 9
J1 >= T
Report J1
Since we need top 2 (k=2), continue until k=2 and Min(J1, J2, …Jk) > T
![Page 15: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/15.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
Select (L3, R3)
(L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3)
=> (L3, R3), (L2, R3)
J2(L2, R3) = 4 + 3 = 7
J3(L3, R3) = 3 + 3= 6
![Page 16: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/16.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
Calculate T
T = Max { (Last L).B + (First R).B, (First L).B+ (Last R).B}
= Max {L3.B + R1.B , L1.B + R3.B}
= Max(3 + 5, 5 + 3)
= 8
![Page 17: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/17.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
T = 8
J1(L1,R2) = 9 reported
J2( L2, R3) = 7
J3(L3, R3) = 6
Note, J’s are in descending order
Min (J) = 6 < T Continue
Comment: Calculate T before J is more efficient. Can stop after find first Jk >= T
![Page 18: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/18.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4)
J(L4, R4) = 7, J(L2, R4) = 6, J(L3, R4) = 5
T= Max(L4.B+R1.B, L1.B + R4.B) = Max(7, 7) = 7
![Page 19: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/19.jpg)
4. Proposed AlgorithmSelect * From L, R
Where L.A = R.A
Order By L.B + R.B
Stop After 2
T= 7J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7, J3(L3, R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5Min(J1, J2) = 7 >= T (k = 2)Comment: When reach all records, T does not need to be calculated, unless, calculate T first, and compare each J(i) with T immediately
![Page 20: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/20.jpg)
4. The Algorithm
![Page 21: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/21.jpg)
5. Implementation
• Hash rank join operator (HRJN)
- Use Hash Ripple Join
- Two hash table contain the two inputs
- A queue holds ordered join results
- Ltop, Rtop, Lbottom, Rbottom are used
to calculate T
![Page 22: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/22.jpg)
5. Implementation contd
• Issues with HRJN
Buffer problem
Local Ranking Problem
![Page 23: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/23.jpg)
5. Implementation contd
• Use Block Ripple Join to Solve Local Ranking Problem. (block size = 2)
![Page 24: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/24.jpg)
5. Implementation contd• HRJN* score-guided join strategy
- How to select next (block) tuple
T1 = Ltop + Rbottom , T2 = Lbottom + Rtop
T = Max(T1, T2)
If T1 > T2, need to reduce T1. How?
![Page 25: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/25.jpg)
5. Implementation contd• HRJN* score-guided join strategy
- How to select next (block) tuple
T1 = Ltop + Rbottom , T2 = Lbottom + Rtop
T = Max(T1, T2)
If T1 > T2, need to reduce T1. How?
Reduce Rbottom and not reduce Lbottom
(descending ordered), thus more tuples should be retrieved from R to reduce T1
![Page 26: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/26.jpg)
6. Join Order
• When more than two tables join, the join order matters. (A and C have high similarity)
![Page 27: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/27.jpg)
6. Join Order -- contd
• Rank-Join order heuristic
- Get a ranked sample, top S ranked list from L and R
- Calculate the similarity using footrule
Where L(i) and R(i) are the rank of object i in L and R
![Page 28: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/28.jpg)
6. Join Order -- contd
![Page 29: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/29.jpg)
7. Generalizing the rank-join
• Using indexes
• Eliminate duplications
• Faster termination
![Page 30: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/30.jpg)
8. Performance Evaluation
Selectivity = 0.2 and m= 4
![Page 31: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/31.jpg)
8. Performance Evaluation
Selectivity = 0.2 and m= 4
![Page 32: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/32.jpg)
8. Performance Evaluation
Selectivity = 0.2 and m= 4
![Page 33: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/33.jpg)
m = 4 and K=50
![Page 34: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/34.jpg)
m = 4 and K=50
![Page 35: Supporting top-k join queries in relational databases](https://reader036.fdocuments.in/reader036/viewer/2022062520/56815a81550346895dc7ea40/html5/thumbnails/35.jpg)
m = 4 and K=50