Data Warehousing 1 Lecture-28 Need for Speed: Join Techniques Virtual University of Pakistan Ahsan...
Data Warehousing
Lecture-28
Need for Speed: Join Techniques
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: [email protected]
Background
About Nested-Loop Join
Nested-Loop Join: Code
FOR i = 1 to N DO BEGIN   /* N rows in T1 */
  IF the ith row of T1 qualifies THEN BEGIN
    FOR j = 1 to M DO BEGIN   /* M rows in T2 */
      IF the ith row of T1 matches the jth row of T2 on join key THEN BEGIN
        IF the jth row of T2 qualifies THEN BEGIN
          produce output row
        END
      END
    END
  END
END
Nested-Loop Join: Working Example
“What is the average GPA of undergraduate male students?”
For each qualifying row of Personal table, Academic table is examined for matching rows.
[Figure: Student Personal Table and Student Academic Table, with repeated Search and Results steps; the values 298, 62 and 440 appear in the figure.]
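The nested-loop pseudocode and the GPA query above can be sketched in Python. This is only an illustration: the table contents, the column names (student_id, gender, level, gpa) and the function name nested_loop_join are hypothetical and are not taken from the lecture.

```python
def nested_loop_join(t1, t2, key, qualifies1=lambda r: True, qualifies2=lambda r: True):
    """Join t1 (outer) with t2 (inner) on a shared key column."""
    output = []
    for row1 in t1:                      # N rows in T1
        if not qualifies1(row1):         # skip non-qualifying outer rows
            continue
        for row2 in t2:                  # M rows in T2, scanned per qualifying outer row
            if row1[key] == row2[key] and qualifies2(row2):
                output.append({**row1, **row2})
    return output

# Hypothetical sample data (not from the slides).
personal = [
    {"student_id": 1, "gender": "M", "level": "UG"},
    {"student_id": 2, "gender": "F", "level": "UG"},
    {"student_id": 3, "gender": "M", "level": "PG"},
    {"student_id": 4, "gender": "M", "level": "UG"},
]
academic = [
    {"student_id": 1, "gpa": 3.0},
    {"student_id": 2, "gpa": 3.8},
    {"student_id": 3, "gpa": 3.5},
    {"student_id": 4, "gpa": 2.0},
]

# Outer-table filter: undergraduate male students.
joined = nested_loop_join(
    personal, academic, "student_id",
    qualifies1=lambda r: r["gender"] == "M" and r["level"] == "UG",
)
average_gpa = sum(r["gpa"] for r in joined) / len(joined)
print(average_gpa)
```

Filtering the outer table first keeps the number of inner scans equal to the number of qualifying outer rows, which is the quantity the cost formula depends on.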
Nested-Loop Join: Order of Tables
Nested-Loop Join: Cost Formula
Join cost = Cost of accessing Table_A + (# of qualifying rows in Table_A × Blocks of Table_B to be scanned for each qualifying row)
OR
Join cost = Blocks accessed for Table_A + (Blocks accessed for Table_A × Blocks accessed for Table_B)
Nested-Loop Join: Cost of reorder
Table_A = 500 blocks and Table_B = 700 blocks.
Qualifying blocks for Table_A QB(A) = 50
Qualifying blocks for Table_B QB(B) = 100
Join cost A&B = 500 + 50 × 700 = 35,500 I/Os
Join cost B&A = 700 + 100 × 500 = 50,700 I/Os
i.e. an increase in I/O of about 43%.
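The arithmetic above can be checked with a few lines of Python. The function name nested_loop_join_cost and its parameter names are made up for this sketch; the numbers are the ones given on the slide.

```python
def nested_loop_join_cost(outer_blocks, outer_qualifying, inner_blocks):
    """I/O cost = blocks read from the outer table
    + one scan of the inner table per qualifying outer block."""
    return outer_blocks + outer_qualifying * inner_blocks

cost_ab = nested_loop_join_cost(500, 50, 700)    # Table_A as outer table
cost_ba = nested_loop_join_cost(700, 100, 500)   # Table_B as outer table
increase_pct = (cost_ba - cost_ab) / cost_ab * 100

print(cost_ab, cost_ba, round(increase_pct))     # 35500 50700 43
```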
Nested-Loop Join: Variants
Sort-Merge Join
Sort-Merge Join: Process
Sort-Merge Join Example
Table_A: 1 1 2 2 2 4 5 5 5 6 6 6 6 6 7 8
Table_B: 1 3 3 4 4 4 5 5 6 6 6 6 7 7 7 7
[Figure: the two sorted key columns, shown in three successive frames.]
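The two sorted key columns above can be merged with a short Python sketch. The transcript does not preserve the slide text describing the process, so this follows the standard textbook sort-merge join as an assumption; the function name sort_merge_join and the row labels (A0, B0, ...) are hypothetical.

```python
def sort_merge_join(table_a, table_b, key=lambda row: row[0]):
    """Sort both inputs on the join key, then merge them in one pass."""
    a = sorted(table_a, key=key)
    b = sorted(table_b, key=key)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        ka, kb = key(a[i]), key(b[j])
        if ka < kb:
            i += 1                       # key only in A so far: advance A
        elif ka > kb:
            j += 1                       # key only in B so far: advance B
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(a) and key(a[i_end]) == ka:
                i_end += 1
            j_end = j
            while j_end < len(b) and key(b[j_end]) == ka:
                j_end += 1
            # Every row of the A run pairs with every row of the B run.
            for row_a in a[i:i_end]:
                for row_b in b[j:j_end]:
                    out.append((row_a, row_b))
            i, j = i_end, j_end
    return out

# Keys from the slide; the second field is a made-up row label.
table_a = [(int(c), f"A{n}") for n, c in enumerate("1122245556666678")]
table_b = [(int(c), f"B{n}") for n, c in enumerate("1334445566667777")]

result = sort_merge_join(table_a, table_b)
print(len(result))                       # 35 matching pairs
```

Keys 2, 3 and 8 appear in only one table and produce no output; key 6 alone contributes 5 × 4 = 20 pairs, which shows how duplicate keys dominate the output size.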
Sort-Merge Join: Note
Hash-Based Join
Hash-Based Join: Working
Hash-Based Join: Example
[Figure: Table_A (original relation on disk) passes through hash function h into main memory; Table_B stays on disk; the output is the join result. Block labels 1, 2, ..., N and M appear in the figure.]
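The figure labels suggest the usual build-and-probe arrangement: Table_A is hashed into main memory and Table_B is read from disk and probed against it. A minimal Python sketch under that assumption follows; the function name hash_join and the sample rows are hypothetical, and Python's built-in dict hashing stands in for the hash function h.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key=lambda row: row[0]):
    """Build a hash table on the smaller input, then probe it with the larger one."""
    buckets = defaultdict(list)
    for row in build_rows:               # build phase: Table_A held in memory
        buckets[key(row)].append(row)
    for row in probe_rows:               # probe phase: Table_B streamed once
        for match in buckets.get(key(row), []):
            yield (match, row)

# Made-up rows: (join key, label).
table_a = [(1, "a1"), (2, "a2"), (2, "a3")]
table_b = [(2, "b1"), (3, "b2"), (1, "b3")]

result = list(hash_join(table_a, table_b))
print(result)
```

Each table is read once, so the approach works well as long as the build table fits in main memory.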
Hash-Based Join: Large “small” Table
Hash-Based Join: Partition Skew
Hash-Based Join: Intrinsic Skew