Data Warehousing 1 Lecture-28 Need for Speed: Join Techniques Virtual University of Pakistan Ahsan...

Data Warehousing, Lecture-28: Need for Speed: Join Techniques. Virtual University of Pakistan. Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research (www.nu.edu.pk/cairindex.asp), National University of Computers & Emerging Sciences, Islamabad. Email: [email protected]


Page 1: Data Warehousing, Lecture-28

Need for Speed: Join Techniques

Virtual University of Pakistan

Ahsan Abdullah, Assoc. Prof. & Head
Center for Agro-Informatics Research, www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: [email protected]

Page 2: Need for Speed: Join Techniques

Page 3: Background

Page 4: About Nested-Loop Join

Page 5: Nested-Loop Join: Code

FOR i = 1 TO N DO BEGIN    /* N rows in T1 */
    IF the ith row of T1 qualifies THEN BEGIN
        FOR j = 1 TO M DO BEGIN    /* M rows in T2 */
            IF the ith row of T1 matches the jth row of T2 on join key THEN BEGIN
                IF the jth row of T2 qualifies THEN BEGIN
                    produce output row
                END
            END
        END
    END
END
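The nested-loop pseudocode on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not the lecture's own code: the function name `nested_loop_join`, its predicate parameters, and the sample rows are all hypothetical.

```python
def nested_loop_join(t1, t2, key, qualifies1=lambda r: True, qualifies2=lambda r: True):
    """Naive nested-loop join over two lists of dict rows (illustrative sketch)."""
    output = []
    for outer in t1:                       # N rows in T1
        if not qualifies1(outer):          # skip outer rows that do not qualify
            continue
        for inner in t2:                   # M rows in T2
            # Match on the join key, then apply the filter on the inner row.
            if outer[key] == inner[key] and qualifies2(inner):
                output.append({**outer, **inner})
    return output


# Hypothetical sample data (assumed for illustration only).
personal = [
    {"sid": 1, "gender": "M"},
    {"sid": 2, "gender": "F"},
    {"sid": 3, "gender": "M"},
]
academic = [
    {"sid": 1, "degree": "BS", "gpa": 3.0},
    {"sid": 2, "degree": "BS", "gpa": 3.5},
    {"sid": 3, "degree": "MS", "gpa": 3.8},
]

rows = nested_loop_join(
    personal, academic, "sid",
    qualifies1=lambda r: r["gender"] == "M",
    qualifies2=lambda r: r["degree"] == "BS",
)
```

Filtering the outer row before entering the inner loop mirrors the pseudocode: the inner table is scanned only for outer rows that qualify.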

Page 6: Nested-Loop Join: Working Example

“What is the average GPA of undergraduate male students?”

For each qualifying row of the Personal table, the Academic table is examined for matching rows.

[Figure: Student Personal Table and Student Academic Table. The values 298, 62 and 440 appear in the figure, each with a Search step followed by Results.]

Page 7: Nested-Loop Join: Order of Tables

Page 8: Nested-Loop Join: Cost Formula

Join cost = Cost of accessing Table_A + (# of qualifying rows in Table_A × Blocks of Table_B to be scanned for each qualifying row)

OR

Join cost = Blocks accessed for Table_A + (Blocks accessed for Table_A × Blocks accessed for Table_B)

Page 9: Nested-Loop Join: Cost of reorder

Table_A = 500 blocks and Table_B = 700 blocks.

Qualifying blocks for Table_A: QB(A) = 50
Qualifying blocks for Table_B: QB(B) = 100

Join cost A&B = 500 + 50 × 700 = 35,500 I/Os
Join cost B&A = 700 + 100 × 500 = 50,700 I/Os

i.e. an increase in I/O of about 43%.
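The arithmetic on this slide can be checked with a few lines of Python. The helper name `nested_loop_cost` and its parameter names are assumptions made for this sketch; the block counts are the ones given on the slide.

```python
def nested_loop_cost(outer_blocks, outer_qualifying, inner_blocks):
    """Blocks read for the outer table, plus one scan of the inner
    table for each qualifying block of the outer table."""
    return outer_blocks + outer_qualifying * inner_blocks


cost_ab = nested_loop_cost(500, 50, 700)    # Table_A as the outer table
cost_ba = nested_loop_cost(700, 100, 500)   # Table_B as the outer table

# Relative increase when the join order is swapped from A&B to B&A.
increase_pct = (cost_ba - cost_ab) / cost_ab * 100

print(cost_ab, cost_ba, round(increase_pct, 1))  # 35500 50700 42.8
```

The difference comes almost entirely from the product term, which is why the table with fewer qualifying blocks is the cheaper choice for the outer loop here.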

Page 10: Nested-Loop Join: Variants

Page 11: Sort-Merge Join

Page 12: Sort-Merge Join: Process

Page 13: Sort-Merge Join Example

Sorted join-key values (the same two columns are repeated three times in the figure):

Table_A: 1 1 2 2 2 4 5 5 5 6 6 6 6 6 7 8
Table_B: 1 3 3 4 4 4 5 5 6 6 6 6 7 7 7 7
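The merge phase over the two sorted columns of this example can be sketched as below. It assumes the digit runs in the figure are single-digit join keys, already sorted; the function name `merge_join` and the choice to return index pairs are illustrative, not taken from the lecture.

```python
def merge_join(a, b):
    """Merge phase of a sort-merge join over two sorted key lists.

    Returns (index_in_a, index_in_b) for every pair of equal keys.
    """
    pairs = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1                      # advance the side with the smaller key
        elif a[i] > b[j]:
            j += 1
        else:
            key = a[i]
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(a) and a[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end] == key:
                j_end += 1
            # Every row in one run joins with every row in the other run.
            for x in range(i, i_end):
                for y in range(j, j_end):
                    pairs.append((x, y))
            i, j = i_end, j_end
    return pairs


# Key columns as read from the slide (assumed to be single-digit keys).
table_a = [1, 1, 2, 2, 2, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 8]
table_b = [1, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7]

pairs = merge_join(table_a, table_b)
print(len(pairs))  # 35
```

Because both inputs are sorted, each list is traversed once; only runs of duplicate keys (such as the five 6s in Table_A against the four 6s in Table_B) require pairing rows within the run.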

Page 14: Sort-Merge Join: Note

Page 15: Hash-Based Join

Page 16: Hash-Based Join: Working

Page 17: Hash-Based Join: Example

[Figure labels: Disk, Original Relation, Table_A, hash function h, Table_A in main memory (MAIN MEMORY), Table_B on disk, Table_B, Join Result.]
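The figure's labels suggest that Table_A is hashed into main memory while Table_B is read from disk and matched against it. A minimal in-memory sketch of that build-and-probe idea follows. The function name `hash_join`, the row layout, and the sample rows are hypothetical, and Python's built-in dictionary hashing stands in for the hash function h.

```python
from collections import defaultdict


def hash_join(build_table, probe_table, key):
    """Simple hash join: build on one table, probe with the other."""
    # Build phase: hash every row of the in-memory table on the join key.
    buckets = defaultdict(list)
    for row in build_table:
        buckets[row[key]].append(row)

    # Probe phase: stream the other table and look up matching rows.
    result = []
    for row in probe_table:
        for match in buckets.get(row[key], []):
            result.append({**match, **row})
    return result


# Hypothetical rows; Table_A plays the role of the table held in main memory.
table_a = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
table_b = [{"k": 2, "b": "p"}, {"k": 2, "b": "q"}, {"k": 3, "b": "r"}]

joined = hash_join(table_a, table_b, "k")
```

Each table is read once, so the sketch only applies when the build table fits in memory; it does not attempt the partitioning that a larger build table would require.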

Page 18: Hash-Based Join: Large “small” Table

Page 19: Hash-Based Join: Partition Skew

Page 20: Hash-Based Join: Intrinsic Skew