CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 242 Database Systems II Query Execution.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 1

Database Systems II

Query Execution


IntroductionWe have optimized the logical query plan, applying relational algebra equivalences.In order to refine this plan into a physical query plan, we in particular need to choose one of the available algorithms to implement the basic operations (selection, join, . . . ) of the query plan.For each alternative physical query plan, we estimate its cost.The cost estimates are based on the size estimates that we discussed in the previous chapter.


IntroductionDisk I/O (read / write of a disk block) is orders of magnitude more expensive than CPU operations.Therefore, we use the number of disk I/Os to measure the cost of a physical query plan.We ignore CPU costs, timing effects, and double buffering requirements.We assume that the arguments of an operator are found on disk, but the result of the operator is left in main memory.


IntroductionWe use the following parameters (statistics) to express the cost of an operator:- B(R) = # of blocks containing R tuples,- f(R) = max # of tuples of R per block,- M = # memory blocks available in the buffer,- HT(i) = # levels in index i,- LB(i) = # of leaf blocks in index i.

M may comprise the entire main memory, but typically the main memory needs to be shared with other processes, and M is much (!) smaller.


IntroductionThe performance of relational operators depends on many parameters such as the following ones.Are the tuples of a relation stored physically contiguous (clustered)? If yes, the number of blocks to be read is greatly reduced compared to non-clustered storage.Is a relation sorted by the relevant (selection, join) attribute? Otherwise, it may need to be sorted on-the-fly.Which indexes exist? Some algorithms require the existence of a corresponding index.


IntroductionEach operator (selection, join, . . .) in a logical query plan can be implemented by one of a fairly large number of alternative algorithms .We distinguish three types of algorithms:- sorting-based algorithms,- hash-based algorithms,- index-based algorithms.Sorting, building of hash table or building of index can either have happened in advance or may happen on the fly.


IntroductionWe can also categorize algorithms according to the number of passes over the data:

- one-pass algorithms read data only once from disk,- two-pass algorithms read data once from disk, write intermediate relation back to disk and then read the intermediate relation once.- multiple-pass algorithms perform more than two passes over the data, not considered in class.


One-Pass Algorithms for Unary Operations

Consider the unary, tuple-at-a-time operations, selection and projection on relation R.Read all the blocks of R into the input buffer, one at a time. Perform the operation on each tuple and move the selected / projected tuple to the output buffer.

RInputbuffer

Outputbuffer

Unaryoperation


One-Pass Algorithms for Unary Operations

Output buffer may be input buffer of other operation and is not counted.

Thus, algorithm requires only M = 1 buffer blocks.

I/O cost is B(R).

If some index is applicable for a selection, have to read only blocks that contain qualifying tuples.


One-Pass Algorithms for Binary Operations

SR B

Binary operations: union, intersection, difference, Cartesian product, and join. Use subscripts B and S to distinguish between the set- and bag version, e.g. The bag union can be computed using a very simple one-pass algorithm: copy each tuple of R to the output, and copy each tuple of S to the output. I/O cost is B(R) + B(S), M = 1.

.and SB



)).(),(min( SBRBM

Other binary operations require the reading of the smaller of the two input relations into main memory.One buffer to read blocks of the larger relation, M-1 buffers for holding the entire smaller table.I/O cost is B(R) + B(S). In main memory, a data structure is built that efficiently supports insertions and searches.Data structure, e.g., hash table or binary balanced tree. Space overhead can be neglected.



For set union, read the smaller relation (S) into M-1 buffers, representing it in a data structure whose search key consists of all attributes. All these tuples are also copied to the output.Read all blocks of R into the M-th buffer, one at a time.For each tuple t of R, check whether t is in S. If not, copy t to the output.For set intersection, copy t to output if it also is in S.


Nested-Loop JoinsWe now consider algorithms for the join operator.The simplest one is the nested-loop join, a one-and-a-half pass algorithm. One table is read once, the other one multiple times.It is not necessary that one relation fits in main memory.Perform the join through two nested loops over the two input relations.


Nested-Loop JoinsTuple-based nested-loop join

natural join R S, join attribute C

for each r R do for each s S do

if r.C = s.C then output (r,s)

Outer relation R, inner relation S.One buffer for outer relation, one buffer for inner relation. M = 2.I/O cost is T(R) x T(S).


Nested-Loop Joins

Example

Relations not clustered

T(R1) = 10,000 T(R2) = 5,000

R1 as the outer relation

Cost for each R1 tuple t1:

read tuple t1 + read relation R2

Total I/O cost is 10,000

(1+5,000)=50,010,000


Nested-Loop JoinsCan do much better by organizing access to both relations by blocks.Use as much buffer space as possible (M-1) to store tuples of the outer relation.Block-based nested-loop join for each chunk of M-1 blocks of R do read these blocks into the buffer;

for each block b of S do read b into the buffer;

for each tuple t of b do find the tuples of R that join with t and output the join results


Nested-Loop JoinsExampleR1 as the outer relationT(R1) = 10,000, T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101, 100 buffers for R1, 1 buffer for R210 R1 chunkscost for each R1 chunk:read chunk: 1,000 IOsread R2: 5,000 IOstotal I/O cost is 10 x 6,000 = 60,000 IOs


Nested-Loop JoinsCan do even better by reversing the join order.

R2 R1T(R1) = 10,000, T(R2) = 5,000S(R1) = S(R2) = 1/10 blockM = 101, 100 buffers for R2, 1 buffer for R15 R2 chunkscost for each R2 chunk:

read chunk: 1,000 IOsread R1: 10,000 IOs

total I/O cost is 5 x 11,000 = 55,000 IOs


Nested-Loop JoinsFinally, performance is dramatically improved when input relations are clustered.With clustered relations, for each R2 chunk:

read chunk: 100 IOsread R1: 1,000 IOs

Total I/O is 5 x 1,100 = 5,500 IOs.Note that the IO cost for a one-pass join (which has the minimum IO of any join algorithm) in this example is 1,000 + 500 = 1,500 IOs.For a comparison, the one-pass join requires M=501 buffer blocks, which is optimal.


Two-Pass Algorithms Based on Sorting

If the input relations are sorted, the efficiency of duplicate elimination, set-theoretic operations and join can be greatly improved.Reserve one buffer for each of the input relations R and S and another buffer for the output.Scan both relations simultaneously in sort order, merging matching tuples.For example, for set intersection: repeatedly consider the tuple t that is least in the sort order (w.r.t. primary key) among all tuples in the input buffer. If it appears in both R and S, output t.



In the following, we present a simple sort-merge join algorithm. It is called merge-join, if step (1) can be skipped, since the input relations R1 and R2 are already sorted.

Sort-merge join (1) if R1 and R2 not sorted, sort them

(2) i 1; j 1;while (i T(R1)) (j T(R2)) do if R1{ i }.C = R2{ j }.C then

outputTuples else if R1{ i }.C > R2{ j }.C then j j+1 else if R1{ i }.C < R2{ j }.C then i i+1



Procedure outputTuples produces all pairs of tuples from R1 and R2 with C = R1{ i }.C = R2{ j }.C.In the worst case, need to match each pairs of tuples from R1 and R2 (nested-loop join).

Procedure outputTuplesWhile (R1{ i }.C = R2{ j }.C) (i T(R1)) do

[jj j; while (R1{ i }.C = R2{ jj }.C) (jj T(R2)) do

[output pair R1{ i }, R2{ jj }; jj jj+1 ]

i i+1 ]


Two-Pass Algorithms Based on SortingExample

i R1{i}.C R2{j}.C j1 10 5 12 20 20 23 20 20 34 30 30 45 40 30 5

50 6 52 7


Two-Pass Algorithms Based on SortingExampleBoth R1, R2 ordered by C; relations clustered.

Memory

R1

R2

…..

…..

R1

R2

Total cost: read R1 cost + read R2 cost= 1,000 + 500 = 1,500 IOs


Two-Pass Algorithms Based on SortingWhat if input relations are not yet in the required sort order?Do Two-Phase, Multiway Merge-Sort (2PMMS).Phase 1: Sort each block of relation R separately in main memory, write sorted sublists back to disk.Phase 2: Merge all the B(R) sorted sublists.

pointer to firstunchosen tuple

output buffer. . .

selectsmallestunchosen

input buffer (sorted)



.)(RBM

Each sorted sublist has a length of M blocks.

Number of sublists is B(R)/M.

Therefore,

This means we require

In phase 1, each tuple is read and written once. In phase 2, each tuple is read again. We ignore the cost of writing the results to disk.

Thus, the IO cost is 3B(R).

.)(i.e.,1/)( 22 MMMRBMMRB


Two-Pass Algorithms Based on SortingIO cost is 4B(R), if sorting is used as a first step of sort-join and the results must be written to the disk.

If relation R is too big, apply the idea

recursively.

Divide R into chunks of size M(M-1), use

2PMMS to sort each one, and take resulting

sorted lists as input for a third (merge)

phase.

This leads to Multi-Phase, Multiway Merge

Sort.



(i) For each 100 blk chunk of R:- read chunk

- sort in memory- write to disk

sorted

chunks

R1

R2 ...

Example M=101

Memory



(ii) Read all chunks + merge + write out

Sorted file Memory Sorted

Chunks

......



Sort cost: cach tuple is read, written, read, written

Join cost: each tuple is read

Sort cost R1: 4 x 1,000 = 4,000Sort cost R2: 4 x 500 = 2,000

Total cost = sort cost + join cost= 6,000 + 1,500 = 7,500

IOs


Two-Pass Algorithms Based on SortingNested loop join (best version discussed above) needs only 5,500 IOs, i.e. outperforms sort-join.However, the situation changes for the following scenario:

R1 = 10,000 blocks clusteredR2 = 5,000 blocks not ordered.

R1 is 1000 blocks, sorting needs M 31.62.R2 is 500 blocks, sorting needs M 22.36. I.e., need at least M=32 buffers.


Two-Pass Algorithms Based on SortingNested-loops join: 5000 x (100+10,000) = 50 x 10,100

100 = 505,000 IOs

Sort-join: 5(10,000+5,000) = 75,000 IOs

Sort-join clearly outperforms nested-loop join!


Two-Pass Algorithms Based on SortingSimple sort-join costs 5(B(R) + B(S)) IOs.It requiresIt assumes that tuples with the same join attribute value fit in M blocks.If we do not have to worry about large numbers of tuples with the same join attribute value, then we can combine the second phase of the sort with the actual join (merge).We can save the writing to disk in the sort step and the reading in the merge step.

.)(and)( SBMRBM


Two-Pass Algorithms Based on SortingThis algorithm is an advanced sort-merge join.Repeatedly find the least C-value c among the tuples in all input buffers.Instead of writing a sorted output buffer to disk, and reading it again later, identify all the tuples of both relations that have C=c.Cost is only 3(B(R) + B(S)) IOs.Since we have to simultaneously sort both input tables and keep them in memory, the memory requirements are getting larger:

.)()( SBRBM


Two-Pass Algorithms Based on HashingIf both input relations are too large to be stored in the buffer, hash all the tuples of both relations applying the same hash function to the join attribute(s).

Hash function h has domain of k hash values, i.e. k buckets.

Only tuples from R and S that fall into the same bucket i can join.

Hash first relation R, then relation S, writing the buckets to disk.


Two-Pass Algorithms Based on HashingTo hash relation R, read it block by block.Allocate one buffer block to each of the k buckets.For each tuple t, move it to the buffer of h(t).If a buffer is full, write it to disk and initialize it.Finally, write to disk all partially full buffer blocks.IO cost is B(R).Memory requirement M = k+1.


Two-Pass Algorithms Based on HashingFor each i, read the i-th bucket of R into completely memory, and read the i-th bucket of S into memory, one block at a time.

For each S tuple s in the buffer block, determine matching tuples r in R and output (r,s).

We assume that each bucket fits into main memory.


Two-Pass Algorithms Based on HashingHash join

Hash function h, range 0 . . . kBuckets for R1: G0, G1, ... GkBuckets for R2: H0, H1, ... Hk

Algorithm(1) Hash R1 tuples into G buckets(2) Hash R2 tuples into H buckets(3) For i = 0 to k do

match tuples in buckets Gi, Hi and output results


Two-Pass Algorithms Based on Hashing

Example hash function: even/odd bucketsR1 R2 Buckets2 5 Even 4 4 3 12 Odd: 5 38 139 8

1114

2 4 8 4 12 8 14

3 5 9 5 3 13 11

R1 R2



R, S clustered (un-ordered).Use 100 hash buckets of 10 blocks each.To hash R: read R, hash, and write buckets.Hash S in the same way.

R

...

...

10 blocks

100


Two-Pass Algorithms Based on HashingSuppose R is the smaller of the input relations.Read one R bucket, build memory hash table (with other hash function).Read corresponding S bucket, one block at a time, and hash probe.Repeat same procedure for all buckets.

R

S

...

R

memory...


Two-Pass Algorithms Based on HashingCost“Bucketize:” Read R + write

Read S + writeJoin: Read R, S

Total cost = 3 x [1,000+500] = 4,500 IOThis is an approximation, since buckets will vary in size, and we have to round up to full blocks.



)(i.e.,1)1/()(

)(i.e.,1)1/()(

SBMMMSB

RBMMMRB

Memory requirementsSize of R bucket = (B(R)/M-1)

k = M-1 = number of hash bucketsThis is assuming that all hash buckets of R have the same size.Same calculation for S.The buckets for the smaller input relation must fit into main memory.


Index-Based Algorithms

Index-based algorithms are especially useful for the selection operator, but also for the join operator.We distinguish clustering and non-clustering indexes.A clustering index is an index where all tuples with a given search key value appear on (roughly) as few blocks as possible.One relation can have only one clustering index, but multiple non-clustering indexes.



Index join

For each r R doX index (S, C, r.C)

for each s X do output (r,s)

index(rel, attr, value)returns the set of rel tuples with attr = value



ExampleAssume R.C index exists; 2 levels.Assume S clustered, unordered.Assume R.C index fits in memory.Cost reads of S: 500 IOs

for each S tuple:- probe index – no IO- if match, read R tuple: 1 IO.



What is expected number of matching tuples?

(a) say R.C is key, S.C is foreign keythen expect 1 match

(b) say V(R,C) = 5000, T(R) = 10,000with uniform distribution assumptionexpect 10,000/5,000 = 2 matching tuples.



(c) Say DOM(R, C)=1,000,000 T(R) = 10,000

with alternate assumptionexpect = 10,000 = 1

matches 1,000,000 100

What is expected number of matching tuples?



Total cost of index join

(a) Total cost = 500+5000(1)1 = 5,500 IO

(b) Total cost = 500+5000(2)1 = 10,500 IO

(c) Total cost = 500+5000(1/100)1=550 IO



What if index does not fit in memory?

Example: say R1.C index is 201 blocks.

Keep root + 99 leaf nodes in memory.Expected cost of each probe is

E = (0)99 + (1)101 0.5.200 200



Total cost (including probes)For case (b)= 500+5000 [probe + get records]= 500+5000 [0.5+2] uniform assumption= 500+12,500 = 13,000 IOsFor case (c):

= 500+5000[0.5 1 + (1/100) 1] = 500+2500+50 = 3,050 IOs


Summary Join Algorithms

Nested-loop join ok for “small” relations (relative to memory size).

Hash-join usually best for equi-join, where relations not sorted and no indexes exist.

Sort-merge join good for non-equi-join e.g., R.C > S.C.

If relations already sorted, use merge join.

If index exists, index-join can be efficient (depends on expected result size).

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 242 Database Systems II Query Execution.

Documents

Transcript of CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 242 Database Systems II Query Execution.