Query Processing: Joins


Page 1: Query Processing: Joins

Query Processing: Joins

Page 2: Query Processing: Joins

Computing Joins

One of the most expensive operations

Many methods have been devised to join efficiently: block nested loops join, index nested loops join, sort-merge join, and hash join.

Page 3: Query Processing: Joins

Schema for Examples

Bids: each tuple is 40 bytes long, 100 tuples per page, 1000 pages (i.e., 100,000 tuples, 4MB total)

Buyers: each tuple is 50 bytes long, 80 tuples per page, 500 pages (i.e., 40,000 tuples, 2MB total)

Buyers(id: integer, name: string, rating: integer, age: real)
Bids(bid: integer, pid: integer, day: date, product: string)

Page 4: Query Processing: Joins

Computing Joins: Equality Joins With One Join Column

In algebra: R ⋈ S. Common! Must be carefully optimized. R × S is large; so, R × S followed by a selection is inefficient.

Assume: M pages in R, pR tuples per page, N pages in S, pS tuples per page.

We will consider more complex join conditions later.

Cost metric: # of page I/Os. We will ignore output costs.

SELECT * FROM Bids R, Buyers S WHERE R.bid = S.id

Page 5: Query Processing: Joins

Simple Nested Loops Join

For each tuple in the outer relation R, we scan the entire inner relation S. Cost: M + (pR * M) * N = 1000 + 100*1000*500 = 50,001,000 I/Os!

foreach tuple tr in Bids R do
    foreach tuple ts in Buyers S do
        if tr.bid = ts.id then output (tr, ts)
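As an illustration (not part of the original slides), the pseudocode above maps directly to the following Python sketch; tuples are modeled as dicts keyed by the schema's attribute names.

def simple_nested_loops_join(bids, buyers):
    # Tuple-at-a-time nested loops: for every Bids tuple, rescan all of Buyers.
    result = []
    for tr in bids:                      # outer relation R (Bids)
        for ts in buyers:                # inner relation S (Buyers)
            if tr["bid"] == ts["id"]:    # join condition R.bid = S.id
                result.append((tr, ts))
    return result

# Tiny hypothetical example:
# simple_nested_loops_join([{"bid": 1}], [{"id": 1, "name": "Ann"}]) -> one matching pair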

Page 6: Query Processing: Joins

Page-Oriented Simple Nested Loops Join

Page-oriented Nested Loops join: For each page of R, get each page of S, and write out matching pairs of tuples <tr, ts>. Cost: M + M*N = 1000 + 1000*500 = 501,000 I/Os. If the smaller relation (S) is the outer, cost = 500 + 500*1000 = 500,500 I/Os. Better to put the smaller relation in the outer loop.

foreach page r in R do
    foreach page s in S do
        foreach tuple tr in r do
            foreach tuple ts in s do
                if tr.bid = ts.id then output (tr, ts)

Page 7: Query Processing: Joins

Block-Nested Loops Join

Cost can be reduced to M + ⌈M/(B-2)⌉ * N by using B buffer pages.

Page 8: Query Processing: Joins

Block Nested Loops Join

Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining B-2 pages to hold a "block" of the outer R. Use an in-memory hash table on the block of R for efficient matching.

[Figure: buffer layout for block nested loops join — hash table for a block of R (k < B-1 pages), input buffer for S, output buffer; matching pairs are written out as the join result.]
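A minimal sketch of this buffer discipline (mine, not the slides'): relations are lists of pages, B-2 pages of the outer R are loaded per block, a hash table is built on the block, and the inner S is streamed one page at a time.

from collections import defaultdict

def block_nested_loops_join(r_pages, s_pages, B, r_key, s_key):
    # r_pages/s_pages are lists of pages (each page is a list of dict tuples);
    # B buffer pages: B-2 for the block of R, 1 input buffer for S, 1 output buffer.
    block_size = B - 2
    result = []
    for i in range(0, len(r_pages), block_size):
        block = r_pages[i:i + block_size]          # next block of the outer R
        hash_table = defaultdict(list)             # in-memory hash table on the block
        for page in block:
            for tr in page:
                hash_table[tr[r_key]].append(tr)
        for page in s_pages:                       # one scan of the inner S per block
            for ts in page:
                for tr in hash_table.get(ts[s_key], []):
                    result.append((tr, ts))
    return result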

Page 9: Query Processing: Joins

Examples of Block Nested Loops

Cost: scan of outer + (#outer blocks * scan of inner), where #outer blocks = ⌈# pages of outer / blocksize⌉

With Bids (R) as outer, and say B-2 = 100 (blocksize): cost of scanning R is 1000 I/Os; a total of 10 blocks. Per block of R, we scan Buyers (S); 10*500 I/Os. Total cost: 1000 + 10*500 = 6000 I/Os.

With a 100-page block of Buyers (S) as outer: cost of scanning S is 500 I/Os; a total of 5 blocks. Per block of S, we scan Bids (R); 5*1000 I/Os. Total cost: 500 + 5*1000 = 5500 I/Os!

So, it is cheaper to keep the smaller relation as the outer.
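The arithmetic above can be checked with a small helper (illustrative only, not part of the slides):

import math

def bnl_cost(outer_pages, inner_pages, blocksize):
    # I/O cost of block nested loops join:
    # scan of outer + (#outer blocks * scan of inner).
    n_blocks = math.ceil(outer_pages / blocksize)
    return outer_pages + n_blocks * inner_pages

print(bnl_cost(1000, 500, 100))   # Bids as outer   -> 6000
print(bnl_cost(500, 1000, 100))   # Buyers as outer -> 5500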

Page 10: Query Processing: Joins

Index Nested Loops Join

If there is an index on the join column of one relation (say S), we can make it the inner relation and exploit the index. Cost: M + ((M*pR) * cost of finding matching S tuples).

For each R tuple, the cost of probing the S index is about 1.2 I/Os for a hash index, 2-4 for a B+ tree. The cost of then retrieving the S tuples depends on clustering: clustered index, 1 I/O (typical); unclustered, up to 1 I/O per matching S tuple.

foreach tuple tr in R do
    foreach tuple ts in S with tr.bid == ts.id (found via the index) do
        add <tr, ts> to result
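A Python sketch of index nested loops, with an ordinary dict standing in for a hash index on Buyers.id (an assumption for illustration):

def index_nested_loops_join(bids, buyers):
    # Index nested loops join with S (Buyers) as the inner relation.
    index = {}                                 # stands in for a hash index on Buyers.id
    for ts in buyers:
        index.setdefault(ts["id"], []).append(ts)
    result = []
    for tr in bids:                            # scan the outer R once
        for ts in index.get(tr["bid"], []):    # index probe instead of scanning S
            result.append((tr, ts))
    return result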

Page 11: Query Processing: Joins

Examples of Index Nested Loops

Hash index on id of Buyers (as inner): Scan Bids R: 1000 page I/Os, 100*1000 tuples. For each R tuple: 1.2 I/Os to get the data entry in the index, plus 1 I/O to get the (exactly one) matching S tuple. Total: 1000 + 100,000*2.2 = 221,000 I/Os.

Hash index on bid of Bids (as inner): Scan Buyers S: 500 page I/Os, 80*500 tuples. For each S tuple: 1.2 I/Os to find the index page with data entries, plus the cost of retrieving matching R tuples. Assuming a uniform distribution, 2.5 bids per buyer (100,000 / 40,000). The cost of retrieving them is 1 or 2.5 I/Os depending on whether the index is clustered.

Total: 500 + 40,000*2.2 = 88,500 I/Os (clustered) or 500 + 40,000*3.7 = 148,500 I/Os (unclustered).

Page 12: Query Processing: Joins

Sort-Merge Join (R ⋈ S)

Sort R and S on the join column, then scan them to do a "merge" (on the join column), and output result tuples.

Advance the scan of R until the current R tuple >= the current S tuple, then advance the scan of S until the current S tuple >= the current R tuple; repeat until the current R tuple = the current S tuple.

At this point, all R tuples with the same value in Ri (current R group) and all S tuples with the same value in Sj (current S group) match; output <tr, ts> for all pairs of such tuples.

Then resume scanning R and S. R is scanned once; each S group is scanned once per matching R tuple. (Multiple scans of an S group are likely to find the needed pages in the buffer.)

Page 13: Query Processing: Joins

Sort-Merge Join

sort Bids R on bid
sort Buyers S on id
while !eof(R) and !eof(S) do
    scan R and S concurrently until tR.bid = tS.id
    if (tR.bid = tS.id = c)
        output σ_bid=c(R) × σ_id=c(S)

[Figure: sorted R and S side by side; the current R group has tR.bid = c and the current S group has tS.id = c.]
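For concreteness, a sketch of the merge phase in Python (my illustration, assuming both inputs fit in memory once sorted); it outputs all pairs from each pair of matching groups, as described above.

def sort_merge_join(bids, buyers):
    # Sort-merge join on Bids.bid = Buyers.id.
    R = sorted(bids, key=lambda t: t["bid"])
    S = sorted(buyers, key=lambda t: t["id"])
    result, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i]["bid"] < S[j]["id"]:
            i += 1
        elif R[i]["bid"] > S[j]["id"]:
            j += 1
        else:
            c = R[i]["bid"]
            # find the current R group and S group with key c
            i_end = i
            while i_end < len(R) and R[i_end]["bid"] == c:
                i_end += 1
            j_end = j
            while j_end < len(S) and S[j_end]["id"] == c:
                j_end += 1
            for tr in R[i:i_end]:          # output all pairs from the two groups
                for ts in S[j:j_end]:
                    result.append((tr, ts))
            i, j = i_end, j_end
    return result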

Page 14: Query Processing: Joins

Sort-Merge Join

Cost of the sorts, assuming B buffers: 2M·log_{B-1}M + 2N·log_{B-1}N

The cost of the merge depends on whether matching "groups" fit in the buffers. If yes, the merge step takes M + N.

Total cost: 2M·log_{B-1}M + 2N·log_{B-1}N + (M + N). With B-1 = 100, the cost is 2*1000*1.5 + 2*500*1.35 + 1000 + 500 = 5850 I/Os (log_100 1000 = 1.5, log_100 500 ≈ 1.35)!

Page 15: Query Processing: Joins

Hash-Join

Step 1: Hash R on bid and S on id into the same set of buckets

Step 2: Since matching tuples must be in same bucket, read each bucket in turn and output the result of the join

Cost (assuming each bucket fits in memory) = 3(M + N) (not including the final output cost)

Page 16: Query Processing: Joins

Hash-Join

Partition both relations using a hash function h: R tuples in partition i will only match S tuples in partition i.

Read in a partition of R and hash it using a second hash function h2 (≠ h!). Scan the matching partition of S and search for matches.

[Figure: partitioning phase — each relation is read from disk through B main-memory buffers (one input buffer, B-1 output buffers) and hashed by h into B-1 partitions written back to disk. Probing phase — a partition Ri is loaded into an in-memory hash table built with h2 (k < B-1 pages); the matching partition of S is streamed through an input buffer and matches go to the output buffer as the join result.]
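A compact sketch of the two phases (partition with h, then build/probe with h2) follows; it keeps partitions in memory purely for illustration, whereas a real system would write them to disk and read them back.

def hash_join(bids, buyers, n_partitions=4):
    # Partitioned (Grace) hash join on Bids.bid = Buyers.id (illustrative sketch).
    def h(k):                  # partitioning hash function
        return hash(("h", k)) % n_partitions
    def h2(k):                 # different hash function for the in-memory table
        return hash(("h2", k))

    # Phase 1: partition both relations with h; matching tuples land in the
    # same partition number.
    r_parts = [[] for _ in range(n_partitions)]
    s_parts = [[] for _ in range(n_partitions)]
    for tr in bids:
        r_parts[h(tr["bid"])].append(tr)
    for ts in buyers:
        s_parts[h(ts["id"])].append(ts)

    # Phase 2: per partition, build a hash table (keyed by h2) on the R part
    # and probe it with the matching S part.
    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        table = {}
        for tr in r_part:
            table.setdefault(h2(tr["bid"]), []).append(tr)
        for ts in s_part:
            for tr in table.get(h2(ts["id"]), []):
                if tr["bid"] == ts["id"]:      # guard against h2 collisions
                    result.append((tr, ts))
    return result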

Page 17: Query Processing: Joins

Observations on Hash-Join

If we build an in-memory hash table to speed up the matching of tuples, a little more memory is needed.

If the hash function does not partition uniformly, one or more R partitions may not fit in memory. We can apply the hash-join technique recursively to join such an R partition with the corresponding S partition.

Page 18: Query Processing: Joins

Cost of Hash-Join

In the partitioning phase, read and write both relations: 2(M+N). In the matching phase, read both: M+N.

Cost: 3(M+N) = 3(1000+500) = 4500 I/Os.

Sort-Merge Join vs. Hash Join: given a minimum amount of memory, both have a cost of about 3(M+N) I/Os (assuming that log_{B-1}M = 1). Hash Join is superior if the relation sizes differ greatly. Also, Hash Join has been shown to be highly parallelizable. Sort-Merge is less sensitive to data skew, and its result is sorted!

Page 19: Query Processing: Joins

General Join Conditions

Equalities over several attributes (e.g., R.A = S.A AND R.B = S.B):
For Index NL, build an index on <A, B> (if S is inner), or use existing indexes on A or B.
For Sort-Merge and Hash Join, sort/partition on the combination of the two join columns.

Inequality conditions (e.g., R.B < S.B):
For Index NL, we need a (clustered!) B+ tree index. Range probes on the inner; the number of matches is likely to be much higher than for equality joins.
Hash Join is not applicable. Sort-Merge Join can be used, but is more expensive. Block NL is quite likely to be the best join method here.

Page 20: Query Processing: Joins

Comparison

Hash Join seems best, but is sensitive to data skew

Sort-merge join has good asymptotic behavior (i.e., as relation size increases)

Sometimes block nested loops join can be the best (depends on the relation parameters)

The choice may be affected by indexes (e.g., a clustered B-tree on one relation for sort-merge join)

Page 21: Query Processing: Joins

Set Operations

Intersection and cross-product are special cases of join. Union (Distinct) and Except are similar; we'll do union.

Sorting-based approach to union:
Sort both relations (on the combination of all attributes). Scan the sorted relations and merge them. Alternative: merge runs from Pass 0 for both relations.

Hash-based approach to union (sketched below):
Partition R and S using a hash function h. For each S-partition, build an in-memory hash table (using h2), then scan the corresponding R-partition and add its tuples to the table while discarding duplicates.
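A sketch of the hash-based union (illustrative; a per-partition Python set plays the role of the in-memory hash table built with h2):

def hash_union(r_tuples, s_tuples, n_partitions=4):
    # Hash-based, duplicate-eliminating UNION of two relations of dict tuples.
    def key(t):                                   # the "combination of all attributes"
        return frozenset(t.items())
    parts = [set() for _ in range(n_partitions)]  # each set is a per-partition hash table
    for t in list(r_tuples) + list(s_tuples):     # partition, deduplicating within each partition
        parts[hash(key(t)) % n_partitions].add(key(t))
    return [dict(k) for part in parts for k in part]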

Page 22: Query Processing: Joins

Aggregate Operations (AVG, MIN, etc.)

Without grouping:
In general, requires scanning the relation. Given an index whose search key includes all attributes in the SELECT or WHERE clauses, we can do an index-only scan.

With grouping:
Sort on the group-by attributes, then scan the relation and compute the aggregate for each group. (We can improve upon this by combining sorting and aggregate computation.) A similar approach is based on hashing on the group-by attributes (sketched below).

Given a tree index whose search key includes all attributes in the SELECT, WHERE and GROUP BY clauses, we can do an index-only scan; if the group-by attributes form a prefix of the search key, we can retrieve data entries/tuples in group-by order.
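A sketch of the hash-based grouping approach mentioned above (illustrative; the attribute names are hypothetical):

def hash_group_avg(tuples, group_attr, agg_attr):
    # Hash-based GROUP BY computing AVG(agg_attr) per value of group_attr.
    acc = {}                                   # group key -> (running sum, count)
    for t in tuples:                           # single scan of the relation
        k = t[group_attr]
        s, c = acc.get(k, (0.0, 0))
        acc[k] = (s + t[agg_attr], c + 1)
    return {k: s / c for k, (s, c) in acc.items()}

# e.g. average buyer age per rating: hash_group_avg(buyers, "rating", "age")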

Page 23: Query Processing: Joins

Impact of Buffering

If several operations are executing concurrently, estimating the number of available buffer pages is guesswork.

Repeated access patterns interact with the buffer replacement policy. E.g., the inner relation is scanned repeatedly in Simple Nested Loops Join. With enough buffer pages to hold the inner, the replacement policy does not matter. Otherwise, MRU is best and LRU is worst (sequential flooding; see the simulation sketch below).

Does replacement policy matter for Block Nested Loops?

What about Index Nested Loops? Sort-Merge Join?
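The sequential-flooding effect is easy to reproduce with a small simulation (illustrative, not from the slides): repeatedly scan N pages through a buffer of fewer than N frames and count page faults under LRU vs. MRU.

def simulate_scans(n_pages, n_frames, n_scans, policy):
    # Count page faults for repeated sequential scans under "LRU" or "MRU".
    buffer, faults = [], 0                     # buffer holds page ids, most recent last
    for _ in range(n_scans):
        for page in range(n_pages):
            if page in buffer:
                buffer.remove(page)            # refresh recency on a hit
                buffer.append(page)
            else:
                faults += 1
                if len(buffer) == n_frames:
                    # evict least (LRU) or most (MRU) recently used frame
                    buffer.pop(0 if policy == "LRU" else -1)
                buffer.append(page)
    return faults

print(simulate_scans(10, 5, 3, "LRU"))   # 30 faults: every access misses
print(simulate_scans(10, 5, 3, "MRU"))   # 20 faults: only half the accesses miss after the first scan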

Page 24: Query Processing: Joins

Summary

A virtue of relational DBMSs: queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).

There are many alternative implementation techniques for each operator; no technique is universally superior for most operators.

We must consider the available alternatives for each operation in a query and choose the best one based on system statistics, etc. This is part of the broader task of optimizing a query composed of several operators.

Page 25: Query Processing: Joins

Query Optimization

Page 26: Query Processing: Joins

Query Evaluation

Problem: An SQL query is declarative – it does not specify a query execution plan. A relational algebra expression is procedural – there is an associated query execution plan.

Solution: Convert the SQL query to an equivalent relational algebra expression and evaluate it using the associated query execution plan. But which equivalent expression is best?

Page 27: Query Processing: Joins

Naive Conversion

SELECT DISTINCT TargetList FROM R1, R2, ..., RN WHERE Condition

is equivalent to

π_TargetList(σ_Condition(R1 × R2 × ... × RN))

but this may imply a very inefficient query execution plan.

Example: π_Name(σ_Id=ProfId AND CrsCode='CS532'(Professor × Teaching))

The result can be < 100 bytes, but if each relation is 50K then we end up computing an intermediate result Professor × Teaching of size 1G before shrinking it down to just a few bytes.

Problem: Find an equivalent relational algebra expression that can be evaluated "efficiently".

Page 28: Query Processing: Joins

Query Processing Architecture

Page 29: Query Processing: Joins

Query Optimizer

Uses heuristic algorithms to evaluate relational algebra expressions. This involves:
estimating the cost of a relational algebra expression
transforming one relational algebra expression into an equivalent one
choosing access paths for evaluating the subexpressions

Query optimizers do not "optimize" – they just try to find "reasonably good" evaluation strategies. The optimal plan can be very hard to find!

Page 30: Query Processing: Joins

Highlights of System R Optimizer

Impact: Most widely used currently; works well for < 10 joins.

Cost estimation: Approximate art at best. Statistics, maintained in the system catalogs, are used to estimate the cost of operations and result sizes. Considers a combination of CPU and I/O costs.

Plan space: Too large, must be pruned. Only the space of left-deep plans is considered. Left-deep plans allow the output of each operator to be pipelined into the next operator without storing it in a temporary relation. Cartesian products are avoided.

Page 31: Query Processing: Joins

Overview of Query Optimization

Plan: A tree of relational algebra operators, with a choice of algorithm for each operator. Each operator is typically implemented using a 'pull' interface: when an operator is 'pulled' for the next output tuple, it 'pulls' on its inputs and computes them.

Two main issues: For a given query, what plans are considered? (An algorithm searches the plan space for the cheapest estimated plan.) How is the cost of a plan estimated?

Ideally: Want to find best plan. Practically: Avoid worst plans!

Page 32: Query Processing: Joins

Equivalence Preserving Transformations

To transform a relational expression into another, equivalent expression we need transformation rules that preserve equivalence.

Each transformation rule:
is provably correct (i.e., it does preserve equivalence)
has a heuristic associated with it

Page 33: Query Processing: Joins

Selection and Projection Rules

Break a complex selection into simpler ones: σ_Cond1 ∧ Cond2(R) ≡ σ_Cond1(σ_Cond2(R))

Break a projection into stages: π_attr(R) ≡ π_attr(π_attr'(R)), if attr ⊆ attr'

Commute projection and selection: π_attr(σ_Cond(R)) ≡ σ_Cond(π_attr(R)), if attr ⊇ all attributes in Cond

Page 34: Query Processing: Joins

Commutativity and Associativity of Join

(and Cartesian Product as a Special Case)

Join commutativity: R ⋈ S ≡ S ⋈ R
Used to reduce the cost of nested loops evaluation strategies (the smaller relation should be in the outer loop).

Join associativity: R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T
Used to reduce the size of intermediate relations in the computation of a multi-relational join – first compute the join that yields the smaller intermediate result.

An N-way join has T(N) · N! different evaluation plans, where T(N) is the number of parenthesized expressions and N! is the number of permutations (see the sketch below).

The query optimizer cannot look at all plans (it might take longer to find an optimal plan than to compute the query brute-force). Hence it does not necessarily produce an optimal plan.
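As a concrete check of the T(N)·N! count (taking T(N) to be the number of ways to parenthesize N relations, i.e., the (N-1)-st Catalan number), a short computation:

from math import comb, factorial

def catalan(n):
    # n-th Catalan number: number of ways to parenthesize n+1 factors.
    return comb(2 * n, n) // (n + 1)

def num_join_plans(n_relations):
    # T(N) * N!: parenthesizations times orderings of the relations.
    return catalan(n_relations - 1) * factorial(n_relations)

for n in (3, 4, 5, 10):
    print(n, num_join_plans(n))   # 3 -> 12, 4 -> 120, 5 -> 1680, 10 -> 17,643,225,600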

Page 35: Query Processing: Joins

Pushing Selections and Projections

σ_Cond(R × S) ≡ R ⋈_Cond S
If Cond relates attributes of both R and S. Reduces the size of the intermediate relation since rows can be discarded sooner.

σ_Cond(R × S) ≡ σ_Cond(R) × S
If Cond involves only the attributes of R. Reduces the size of the intermediate relation since rows of R are discarded sooner.

π_attr(R × S) ≡ π_attr(π_attr'(R) × S), if attr ∩ attributes(R) ⊆ attr' ⊆ attributes(R)
Reduces the size of an operand of the product.

Page 36: Query Processing: Joins

Equivalence Example

σ_C1 ∧ C2 ∧ C3(R × S)

≡ σ_C1(σ_C2(σ_C3(R × S)))

≡ σ_C1(σ_C2(R) × σ_C3(S))

≡ σ_C2(R) ⋈_C1 σ_C3(S)

assuming C2 involves only attributes of R, C3 involves only attributes of S, and C1 relates attributes of both R and S