The value of merge join and hash join in Microsoft SQL Server and relational query processing

Goetz Graefe, Microsoft SQL Server
September 22, 1999


Page 2

Why this study?
• Blasgen & Eswaran (20 years ago)
    Merge join & (index) nested loops cover all cases pretty well
• DeWitt, Sacco, others (10-15 years ago)
    Hash join is great for large unsorted inputs
• Analytical studies, simulation, experiments

Page 3

Success without merge/hash join
• Sybase & Microsoft SQL Server
    Until recently used only nested loops
    Successful for over 10 years!
    Even used in data warehousing!
• Focus on OLTP
    Sybase invented stored procedures
    Microsoft leads SMP TPC-C efficiency
• Focus on canned reports
    Perfectly possible with tuned index sets

Page 4

Are the prior studies wrong?
• Small evaluation sets
    Few tables, few queries
• Insufficient credit to index tuning
    Fixed set of indexes
• This study:
    Still limited yet non-trivial queries & tables
    Indexes tuned using a “tuning wizard” tool
      • Large set of possible indexes, integrated with query optimizer
• Next study:
    Indexes tuned specifically for available algorithms

Page 5

SQL Server 7.0 query processor
• Nested loops with stored or temporary indexes
• Merge join & hash join (incl. hash teams)
• Index intersection, union, difference, & join
• Star joins: star indexes, cross-product, & semi-join reduction
• Constraints exploited for selectivity estimation & cost calculation & query simplification
• Parallelism on SMPs
• Content queries (“contains”, “near”, “about”)
• Optimized update plans (indexes, constraints)
• Heterogeneous & distributed queries
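
The three join methods compared throughout this talk can be sketched in a few lines. This is a minimal in-memory illustration of the general techniques, not SQL Server's implementation: rows are plain tuples, the nested loops variant has no index, and the table names and sample data are invented for the example.

```python
from collections import defaultdict

def nested_loops_join(outer, inner, key_outer, key_inner):
    """Compare every outer row with every inner row (no index assumed)."""
    return [(o, i) for o in outer for i in inner if o[key_outer] == i[key_inner]]

def merge_join(left, right, key_left, key_right):
    """Sort both inputs on the join key, then advance two cursors in step."""
    left = sorted(left, key=lambda r: r[key_left])
    right = sorted(right, key=lambda r: r[key_right])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the run of equal keys on the right side ...
            j_end = j
            while j_end < len(right) and right[j_end][key_right] == lk:
                j_end += 1
            # ... and pair it with every left row carrying the same key.
            while i < len(left) and left[i][key_left] == lk:
                out.extend((left[i], right[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out

def hash_join(build, probe, key_build, key_probe):
    """Build an in-memory hash table on one input, probe it with the other."""
    table = defaultdict(list)
    for b in build:
        table[b[key_build]].append(b)
    return [(b, p) for p in probe for b in table.get(p[key_probe], [])]

# Invented sample data: (order_id, customer) and (item_id, order_id).
orders = [(1, "alice"), (2, "bob"), (3, "carol")]
items = [(10, 1), (11, 1), (12, 3), (13, 4)]

nl = nested_loops_join(orders, items, 0, 1)
mj = merge_join(orders, items, 0, 1)
hj = hash_join(orders, items, 0, 1)
```

All three return the same set of matching pairs; they differ in how much work they do on large, unsorted, or unindexed inputs, which is what the experiments measure.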

Page 6

Relevant SQL Server tools
• Graphical show plan
• Profiler
    Captures workloads & events (e.g., deadlocks)
    Filters on application, database, user, operation, elapsed time, etc.
• Index tuning wizard
    Optimizes a workload captured with the profiler
    Reconsider all indexes or only add indexes
    Increase / decrease database size
    Uses query optimizer to assess choices
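
The idea of tuning indexes for a whole workload by asking a cost estimator about each choice can be illustrated with a toy greedy loop. This is only a sketch under loud assumptions: the slides do not describe the wizard's search strategy, so the greedy selection, the `COSTS` table, the index names, and `toy_estimate` (a stand-in for a query optimizer call) are all hypothetical.

```python
def workload_cost(queries, indexes, estimate):
    """Total estimated cost of the workload under a given index set."""
    return sum(estimate(q, indexes) for q in queries)

def greedy_tune(queries, candidates, estimate, max_indexes):
    """Repeatedly add the candidate index that lowers total workload cost most."""
    chosen = frozenset()
    while len(chosen) < max_indexes:
        current = workload_cost(queries, chosen, estimate)
        best_gain, best_index = 0, None
        for index in sorted(candidates - chosen):
            gain = current - workload_cost(queries, chosen | {index}, estimate)
            if gain > best_gain:
                best_gain, best_index = gain, index
        if best_index is None:  # no remaining candidate helps the workload
            break
        chosen = chosen | {best_index}
    return chosen

# Hypothetical cost table: per query, a scan cost plus costs with a useful index.
COSTS = {
    "q1": {"scan": 100, "idx_orders_date": 10},
    "q2": {"scan": 80, "idx_lineitem_part": 20, "idx_orders_date": 60},
    "q3": {"scan": 50},
}

def toy_estimate(query, indexes):
    """Stand-in for an optimizer call: cheapest access path that is available."""
    options = COSTS[query]
    return min(c for name, c in options.items() if name == "scan" or name in indexes)

candidates = frozenset({"idx_orders_date", "idx_lineitem_part", "idx_part_name"})
picked = greedy_tune(list(COSTS), candidates, toy_estimate, max_indexes=2)
```

The loop minimizes the total workload cost, so an individual query may end up without the index that would suit it best.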

Page 7

Experimental setup
• TPC-D database
    Scale factor = 1 (1 GB raw data)
• Old & new TPC queries
    22 queries total
• Flags to disable
    Index join, merge join, hash join, hash teams
    Stream aggregation, hash aggregation
• Indexes in simple database design
    Primary keys, foreign keys, dates

Page 8

Performance with simple indexes

[Figure: bar chart “Simple indexes”. X-axis: queries & algorithms (ticks 1 to 23). Y-axis: time [% of entire NL run] (ticks 0 to 8). Series: NL, MJ, HJ, All.]

Page 9

Performance with simple indexes
• NL=MJ >> HJ=All: #1, #15
    Hashing improves performance
    Aggregation, not join, makes the difference
    Early aggregation missing in sort code
• NL=MJ=HJ=All: #2, #13, #16, #17
    No really meaningful difference
    Indexes are sufficient to select & retrieve rows
• NL > MJ > HJ=All: #3, #5, #7, #8, #9, #11
• NL >> MJ=HJ=All: #4, #14, #19
    Need some method for large unindexed inputs
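
The remark about aggregation for queries #1 and #15 can be made concrete with a small sketch. It contrasts hash aggregation, sort-based (stream) aggregation, and a sort that collapses duplicates early within each run. This illustrates the general techniques only; the function names, the run-based structure, and the sample rows are assumptions, not SQL Server's sort code.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def hash_aggregate(rows):
    """Sum values per key in one pass; state grows with the number of groups."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)

def sort_then_stream_aggregate(rows):
    """Sort every input row, then sum adjacent rows that share a key."""
    ordered = sorted(rows, key=itemgetter(0))  # sorts all rows, not just groups
    return {k: sum(v for _, v in grp)
            for k, grp in groupby(ordered, key=itemgetter(0))}

def sort_with_early_aggregation(rows, run_size):
    """Collapse duplicates inside each sort run before the runs are merged."""
    runs = []
    for start in range(0, len(rows), run_size):
        partial = hash_aggregate(rows[start:start + run_size])
        runs.append(sorted(partial.items()))  # each run holds one row per key
    merged = sorted(pair for run in runs for pair in run)  # stand-in for a merge
    return {k: sum(v for _, v in grp)
            for k, grp in groupby(merged, key=itemgetter(0))}

# Invented sample input: (group key, value) pairs with few distinct keys.
rows = [("A", 1), ("B", 2), ("A", 3), ("B", 4), ("A", 5), ("C", 6)]

by_hash = hash_aggregate(rows)
by_sort = sort_then_stream_aggregate(rows)
by_early = sort_with_early_aggregation(rows, run_size=4)
```

With few groups and many rows, the plain sort handles every input row, while the hash table and the early-aggregating sort only carry one entry per group (per run), which is consistent with hashing winning on aggregation-heavy queries.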

Page 10

Workload performance

[Figure: bar chart “Workload performance”. X-axis: entire workload. Y-axis: time [% of NL run] (ticks 0 to 100). Series: NL, MJ, HJ, All.]

Page 11

Workload performance
• NLJ alone is not competitive
    Due to simplistic index design
• Hash-based query processor performs best
• NLJ + MJ are very competitive
    40% difference from the full QP with hash join
    That’s 9 months of hardware improvements
      • Presuming 2x CPU speed in 18 months
    Poor indexing strongly favors hash join
    Blasgen & Eswaran were right all along …?
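
The conversion from a performance difference to "months of hardware improvements" follows from the stated presumption of 2x CPU speed in 18 months. The sketch below assumes the 40% difference is read as a speedup factor of 1.4; the function name is illustrative.

```python
import math

def months_of_hardware_progress(speedup, doubling_months=18.0):
    """Months of hardware improvement that would yield the given speedup,
    assuming speed doubles every `doubling_months` months."""
    return doubling_months * math.log2(speedup)

# A 40% difference, read as a factor of 1.4 (an assumption about the slide).
months = months_of_hardware_progress(1.4)
print(round(months, 1))  # about 8.7, which the slide rounds to 9 months
```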

Page 12

Tuned index set
Tuning wizard retains primary key indexes
• 7 indexes on line item, up to 7 columns
    26 indexed columns in total
• 4 indexes on orders, lots of redundant keys
• 2 indexes on part supply

Page 13

Performance with tuned indexes

[Figure: bar chart “Tuned indexes”. X-axis: queries & algorithms (ticks 1 to 23). Y-axis: time [% of NL run on simple indexes] (ticks 0 to 7). Series: NL, MJ, HJ, All.]

Page 14

Performance with tuned indexes
• Overall performance improvements
    Except queries 6, 12, 19
    Tuning wizard minimizes workload time
      • Not the time for each individual query
• More queries in these patterns
    NL > MJ=HJ=All
    NL=MJ=HJ=All

Page 15

Workload performance

[Figure: bar chart “Entire workload, tuned indexes”. Y-axis: time [% of NL run on simple indexes] (ticks 0 to 50). Series: NL, MJ, HJ, All.]

Page 16

Workload performance
• All algorithm combinations are fast
    Maximal difference 45 vs. 20, or 21 months
• Either MJ or HJ serves well
    Having both adds 20% performance (5 months)
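
As a check on the month figures on this slide, the arithmetic can be worked out under the talk's earlier presumption of 2x CPU speed in 18 months. Reading "45 vs. 20" as a speedup factor of 45/20 and "20% performance" as a factor of 1.2 is an assumption about the slide's intent.

```latex
\begin{align*}
t &= 18 \cdot \log_2(\text{speedup}) \quad \text{(months)} \\
t_{45\text{ vs. }20} &= 18 \cdot \log_2\!\left(\tfrac{45}{20}\right)
  = 18 \cdot \log_2 2.25 \approx 18 \cdot 1.170 \approx 21.1 \\
t_{20\%} &= 18 \cdot \log_2 1.2 \approx 18 \cdot 0.263 \approx 4.7
\end{align*}
```

Both values agree with the rounded 21 and 5 months quoted above.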

Page 17

Conclusions
• Either indexing or merge / hash join
• Are hash join & merge join just an excuse for poor (non-automatic) indexes?
• Next steps
    Tune & analyze for specific algorithms
    Analyze bitmap operations & star joins
    Look for orders of magnitude (multiple years)
      • Pre-computed query results: indexed views
      • Fully automatic indexing & tuning
      • Caching data & query results on desktops

Page 18

More information
• www.microsoft.com/sql
• Msdn.microsoft.com
• Technet.microsoft.com
• Research.microsoft.com