Three way join in one round on hadoop
Three-way join in one round on Hadoop
COMP 6231, Group 7
Iraj Hedayati Somarin, Zakaria Nasereldine, Jinyang Du
Problem statement
In this part of the second project, we compute the three-way join R join S join T in a single round of a MapReduce algorithm.
[Figure: diagram of the three relations R, S, and T being joined]
Algorithm Overview
First relation: R(a, b). Second relation: S(b, c). Third relation: T(c, d).
The reducers are arranged as an imagined square matrix (for example, a 4 × 4 grid numbered 0 to 15).
The mapper hashes the join attributes: h(b) = x and h(c) = y.
Each input tuple is emitted as <KEY, VALUE> = <(x, y), (relation_name, tuple)>, where the key (x, y) is the coordinate of a reducer in the imagined matrix of reducers.
Each reducer then performs an in-memory join on the tuples it receives.
Mapping and Hashing
<KEY, VALUE> = <(X, Y), (relation_name, tuple)>
The relation name is fetched from the input file name; the value's tuple is exactly the same as the input tuple.
With h(b) = x and h(c) = y, on an 11 × 11 grid:
• Second relation, S(b, c): sent to the single reducer (h(b), h(c)).
• First relation, R(a, b): replicated across its row, with keys (h(b), 1), (h(b), 2), …, (h(b), 11).
• Third relation, T(c, d): replicated across its column, with keys (1, h(c)), (2, h(c)), …, (11, h(c)).
Reducer number = (x − 1) × √(# of reducers) + y
In-memory join algorithm
NESTED-LOOP JOIN, O(n³):
For each tuple in R
  For each tuple in S
    If R.b == S.b then
      For each tuple in T
        If S.c == T.c then
          Print (R.a, S.b, S.c, T.d)
SORT-BASED JOIN ALGORITHM, O(n log n):
1. Divide the input list into three sorted lists using a binary search algorithm.
2. Execute the in-memory join:
• WHILE R and S are not empty DO
• IF the first items in both lists are equal THEN make sure all the tuples with the same value have been joined together and remove them from the lists
• ELSE remove items from the list with the smaller front item until reaching an item equal to or greater than the front item of the other list
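The nested-loop variant above can be sketched as a small Python function. The splitting of the reducer's input by relation name and the function name `nested_loop_join` are illustrative, not the project's actual code.

```python
# Sketch of the reducer's in-memory nested-loop join.
# Each reducer receives (relation_name, tuple) values for one (x, y) key.

def nested_loop_join(values):
    """values: iterable of (relation_name, tuple); returns joined rows."""
    # Split the incoming values into the three relations.
    R = [t for name, t in values if name == "R"]  # tuples (a, b)
    S = [t for name, t in values if name == "S"]  # tuples (b, c)
    T = [t for name, t in values if name == "T"]  # tuples (c, d)
    out = []
    for (a, rb) in R:
        for (sb, sc) in S:
            if rb == sb:                  # R.b == S.b
                for (tc, d) in T:
                    if sc == tc:          # S.c == T.c
                        out.append((a, sb, sc, d))
    return out
```

The sort-based variant replaces the triple loop with merge-style scans over sorted lists, which is where the O(n log n) bound comes from.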
Number of reducers
We decided to use a square matrix of reducers. This choice constrains the number of reducers we can use: for example, with 128 reducers available, we actually use only 121 of them (an 11 × 11 grid).
On the other hand, selecting a different number of reducers in each dimension leads to data replication and inefficiency.
Number of reducers (example 1, replication problem)
[Figure: a non-square reducer grid using all 128 reducers]
# of reducers = 128. Assumption: R >> T, and both have a uniform distribution. T(R) = 1,000,000; T(T) = 1,000.
For the square (11 × 11) matrix: replicated data = 1,000,000 × 11 + 1,000 × 11 = 11,011,000
For the non-square matrix above: replicated data = 1,000,000 × 16 + 1,000 × 16 = 16,016,000
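The replication arithmetic can be sketched as below. Under the grid mapping, each R tuple is copied to every column of its row and each T tuple to every row of its column, while S tuples are sent once and add no replication. The 8 × 16 shape used for the non-square case is an assumption for illustration; the exact total for a non-square grid depends on its shape and orientation.

```python
# Sketch of the replication-cost calculation for a rows x cols reducer grid.

def replicated_data(rows, cols, n_r, n_t):
    """Replicated tuples: R copied once per column, T once per row."""
    return n_r * cols + n_t * rows

# Square 11 x 11 grid (121 of the 128 reducers), as on the slide:
square = replicated_data(11, 11, 1_000_000, 1_000)  # 11,011,000

# An illustrative non-square 8 x 16 grid uses all 128 reducers, but
# replicates the large relation R 16 times instead of 11:
rect = replicated_data(8, 16, 1_000_000, 1_000)
```

Because the large relation dominates the cost, a square grid (which minimizes the larger replication factor) keeps the total lower even though it leaves some reducers unused.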
Number of reducers (example 2, inefficiency problem)
[Figure: a non-square reducer grid with some reducers idle and others full]
# of reducers = 128. Assumption: T >> R, and T is not uniformly distributed. T(R) = 1,000; T(T) = 1,000,000.
When the hash range in one dimension is reduced, it is more likely that two values hash to the same location, so some reducers sit idle while others are overloaded.
Experimental results
The three-way join job completed in 37 seconds.
Any questions?