CMPT 454
description
Transcript of CMPT 454
CMPT 454Query Optimization
Query Optimization
Introduction Query Execution Unary Operations External Sorting Binary Operations
Query Compilation Steps
Parsing Construct a parse tree for a query
Generate equivalent logical query plans Convert the parse tree to a query plan in
relational algebra Transform the plan into more efficient
equivalents Generate a physical plan
Select algorithms for each of the operators in the query
Including details about how tables are to be accessed or sorted
Query Optimization Generating equivalent logical plans and their
physical plans is known as query optimization Selecting a good query plan entails deciding
Which equivalent plan leads to the most efficient query Which algorithm should be used to implement an
operation How data from one operation should be passed to the
next These choices depend on database metadata
Size of relations Number and frequency of attributes Indexing and data file organization
Query Processing
Query Compilation
query
Query Execution
result
generating, improving and estimating query
plans
algorithms to implement physical plan operators
Query Execution
Introduction
A physical query plan is made up of operators Each implementing one step of the plan and Corresponding to a relational algebra
operation Some relational algebra operations may
require lower level physical operations Such as scanning a table
Sorting and Scanning
It is sometimes necessary or useful to sort data as it is scanned To satisfy a query with an ORDER BY clause Or because an algorithm requires sorted
input There are a number of ways in which a
sort scan can be performed Main memory sorting B tree index Multi-way mergesort
Statistics There are M main memory frames available
for an operation M may or may not be the entire main memory
Information relating to a relation (table) R B(R) or B – number of blocks of R T(R) or T – number of tuples (i.e. records) of R V(R, a) – number of distinct values for attribute a
in R Assume that data is accessed one block at a
time from a disk This assumption will be discussed later
Computation Model Cost metric
The number of disk I/Os required for an operation Assume that
Input to an operation is to be read from disk The result is left in main memory (does not
include write cost) Note that the size of a result should not
influence the choice of algorithm for a particular operation Although it may influence the order of operations
I/O Cost for Scans
The cost to scan R depends on its organization If R is organized into a single file the cost of a
scan is B And if B < M the cost of a sort scan is B If R is distributed with records from other tables
the cost of a scan may be higher The use of an index does not reduce
the cost to scan R But may reduce the cost to retrieve a part of R
Unary Operations
Introduction In this section the costs of individual
relational algebra operations are considered There are often a number of algorithms that
can be used to implement the same operation Each operation is considered in isolation
How the individual operations interact is discussed in a later section
Note that the cost of a later operation may be affected by the result of an earlier operation, in terms of▪ The order of the records in the result▪ The algorithm used to achieve that result
Selection
Only records that match the selection are included in the result The size of the result depends on the number of
records that match the selection Complex selections that contain more than a
single term will be considered later
SELECT *FROM CustomerWHERE age > 50
age > 50(Customer)
Selections with No Index
If the file is not sorted on the selection attribute Perform a file scan, cost is B reads If the selection attribute is a superkey the scan
can be terminated once a match as been found (cost is B/2)
The file is sorted on the selection attribute Perform a binary search to find the first record
that matches the selection criteria Then read all records that match the selection
criteria▪ Which may entail reading additional pages
log2(B) + the number of pages of matching records - 1
Selections with B+ Tree Index Find the leaf page of the first index entry
that points to a matching record The number of disk reads is equal to the height
of the tree The cost to find a leaf page in a B+ tree index
is usually in the order of 2 or 3 disk reads Then retrieve the matching records
This cost is determined by the number of leaf pages containing matching entries, and
The number of records to be retrieved, and Whether the index is primary or secondary
Primary B+ Tree Index In a primary index the file is sorted on the
search key If the selection is small it may fit on one page
An additional disk read to access this page is required If the selection is large, additional pages may
have to be read The selection may cover a large range of records, or May be an equality selection on an attribute with few
distinct values The exact number of pages depends on the number
of records that match the selection
Secondary B+ Tree Index
In a secondary index the sort order of the file (if any) is different from the search key of the index The leaves of the tree form a dense index on
the file Adjacent matching data entries are
unlikely to point to records on the same page A disk read is required for each matching
record For large selections, a secondary index may
be less efficient than scanning the entire table! In practice it is important to recognize this, by
estimating the number of records returned by a selection
Secondary B+ Tree Range SelectionSELECT *FROM PatientWHERE lName = 'Zoobat'
lName = "Zoobat"(Patient)
……
4 patients called Zoobat
file
… …
Selections with Hash Index A hash index can be used to efficiently
retrieve equality selections The cost to retrieve the bucket containing the
data entries is typically 1 or 2 disk reads The cost to retrieve the corresponding records
depends on whether the index is primary or secondary▪ If the index is primary, matching records are likely to
reside on the same, single, file page whereas▪ If the index is secondary, matching records are likely to
reside on different pages of the file Hash indexes cannot be used for range
selections
Complex Selections
Selections may be made up of a number of terms connected by and () and or () These terms may reference different attributes It is useful to know whether or not an index can be
used to evaluate a selection To determine this the selection is first
expressed in conjunctive normal form (CNF) A collection of conjuncts (terms joined by and) Each conjunct consists either of a single term, or
multiple terms joined by or (i.e. disjunctions)▪ e.g. (A B) C D (A C D) (B C D)
Selections with no Disjunctions Hash indexes can be used if there is an equality
condition for every attribute in the search key e.g. a single hash index on {city, street, number}▪ city="London"street="Baker"number=221(Detective) matches▪ city="Los Angeles"street="Cahuenga"(Detective) does not
Tree indexes can be used if there is a selection on each of the first n attributes of the search key e.g. B+ index on {city, street, number}▪ Both city="London"street="Baker"number=221(Detective) and▪ city="Los Angeles"street="Cahuenga"(Detective) match
Selections with no Disjunctions… If an index matches a subset of the conjuncts
Use the index to return a result that contains some unwanted records
Scan this result for matches to the other conjuncts▪ city="London"street="Baker"number=221fName = "Sherlock" (Detective)
If more than one index matches a conjunct Either use the most selective index, then scan the result,
discarding records that fail to match to the other criteria Or use all indexes and retrieve the rids Take the intersection of the rids and retrieve those
records
Selections with no Disjunctions… Consider the relation and selection shown below
Detective = {id, fName, lName, age, city, street, number, author}
city="New York"author="Spillane"lName="Hammer"(Detective) With indexes
Secondary hash index, {city, street, number} Secondary B+ tree index, {lName, fName} Secondary hash index, {author}
There are two strategies: Use the most selective of the two matching indexes, and
search the results for the remaining criteria Use both indexes, take the intersection of the rid
can be usedcan’t be used
can be used
What if the B+ tree index is primary?
Selections with Disjunctions
If necessary, restate the selection in CNF If a conjunct includes a disjunction with no
index on its attribute, no index can be used for that conjunct If all the conjuncts contain such a disjunction a
file scan is necessary to satisfy the selection Otherwise, use an index that matches one
of the conjuncts and retrieve those records The remaining selection criteria can then be
applied to the results from using that index
Selections with Disjunctions … Consider a selection of this form
(a b c) (d e f)(R) Where each of a to f is an equality selection on an
attribute If each of the terms in either of the conjuncts has a
matching index Use the indexes to find the rids Take the union of the rids and retrieve those records
If there are indexes on the attributes of a, b, c, and e Use the a, b, and c indexes and take the union of the rids Retrieve the resulting records and apply the other criteria
Selections with Disjunctions … Consider the selections shown below
(author="King" age>35)(lName="Tam" id=11)(Detective) (author="King") (lName="Tam" id=11)(Detective)
Indexes on the relation Secondary B+ tree index, {lName, fName} Secondary hash index, {author}
Compare the two selections In the first selection each conjunct contains a disjunction
without an index (age, id) so a file scan is required In the second selection the index on author can be used,
and records that don't meet the other criteria removed
Projections
Only columns in the column list are retained Reducing the size of the result relation
There are two steps to implementing projections Remove the unwanted columns Eliminate any duplicate records, but only if DISTINCT has been specified▪ Duplicates can be removed either by sorting or
hashing
SELECT fName, lNameFROM Customer fName,lName(Customer)
External SortingA Digression
External Sorting
Two-way external merge sort External merge sort Replacement sort External merge sort notes
External Sorting Introduction It is often necessary to sort data in a
database To sort the results of a query To make some other operations more efficient▪ Bulk loading a tree index▪ Eliminating duplicate records▪ Joining relations
The focus on a DB sorting algorithm is to reduce the number of disk reads and writes
Internal vs. External Sorting Sorting a collection of records that fit
within main memory can be performed efficiently There are a number of sorting algorithms that
can be performed in n(log2n) time▪ That is, with n(log2n) comparisons, e.g., Mergesort,
Quicksort, Many DB tables are too large to fit into
main memory at one time So cannot be read into main memory and
sorted The focus in external sorting is to reduce the
number of disk I/Os
Merge Sort – a Brief Reminder Consider the Mergesort algorithm
Input sub-arrays are repeatedly halved Until they contain only one element
Sub-arrays are then merged into sorted sub-arrays by repeated merge operations merging two sorted sub-arrays can be performed in O(n)
mergesort(arr, start, end)if(start < end)
mid = start + end / 2mergesort(arr, start, mid)mergesort(arr, mid+1, end)merge(arr, start, mid, mid+1, end)
Naïve External Merge Sort Read the first two pages (of B pages) of the file
Sort their contents (independently of each other) Merge the two pages
Write them out to a sorted run of size 2 Repeat for the rest of the file
Producing B/2 sorted runs of size 2 Merge the first two sorted runs of size 2
Read in the first page of the first two runs into input pages
Merge to a single output page, and write it out when full When all records in an input page have been merged read
in the second page of that run
Naïve External Merge Sort …
After the first merge pass the file consists of B/2 sorted runs each of two pages
Read in the first page of each of the first two sorted runs
Leaving a third page free as an output buffer
main memory
sorted runs of size 2
disk
11
15
236 1
014
11
15
23
39
53
31
64
87
6 10
14
3 25
28
41
55
76
pages contain three records
6 10
14
11
15
23
Memory Usage
Records from the input pages are merged into the output buffer
Once the output buffer is full it's contents are written out to disk, to form the first page of the first sorted run of length 4
6 10
11
39
53
31
64
87
6 10
14
3 25
28
41
55
76
main memory
66 10
6 10
11
11
15
23
disk
6 10
14
11
15
23
Memory Usage
At this point all of the records from one of the input pages have been dealt with
The next page of that sorted run is read into the input page
And the process continues
39
53
31
64
87
6 10
14
3 25
28
41
55
76
main memory
1414
15
11
15
23
6 10
11
41
55
76
disk
Cost of Naïve Merge Sort
Assume that B = 2k, after the first merge pass there are 2k-1 sorted runs, each two pages long After the second pass there are 2k-2 sorted runs,
of length 4 After the kth pass there is one sorted run of
length B The number of passes is therefore log2 B
+1 log2B is the number of merge passes required The +1 is for the initial pass to sort each page
Each pass requires that each page be read and written back for a total cost of 2B(log2 B +1) Note that only 3 frames of main memory are
used!
First Stage Improvement In the first stage of the process each page is
read into main memory, sorted and written out The pages are then ready to be merged
Instead of sorting each page independently, read M pages into main memory, and sort them together M is the main memory frames available for the sort After the first pass there will be B/M sorted runs,
each of length M This will reduce the number of subsequent merge
passes
Merge Pass Improvement In the merge passes perform an M-1 way
merge rather than a 2 way merge, using M-1 input pages, one for each of M-1 sorted
runs and 1 page for an output buffer
The first item in each of the M-1 input partitions is compared to each other to determine the smallest Resulting in less merge passes, and less disk
I/O Each merge pass merges M-1 runs
After the first pass the runs are size (M-1)*M▪ Since M is the size of the runs from the first stage
Cost of External Merge Sort The initial pass produces B/M sorted runs
of size M Each merge pass reduces the number of
runs by a factor of M-1 The number of merge passes is logM-1B/M
Each pass requires that the entire file is read and written Total cost is therefore 2B (logM-1B/M + 1)
As M is typically relatively large this reduction is considerable
Number of Passes
M log2B+1
logb-1B/M
3 21 205 21 10
200 21 32,00
021 2
Even a large file can usually be sorted in two passes (a cost of 4B I/Os to sort and write out)
B = 1,000,000
Generating Longer Initial Runs In the first stage B/M sorted runs of
size M are produced The size of these preliminary runs
determines how many merge passes are required
The size of the initial runs can be increased by using replacement sort On average the size of the initial runs
can be increased to 2*M Using replacement sort increases
complexity
Replacement Sort
M-2 pages are used to sort the file Called the current set
One page is used for input, and
One page is used for output
The current set is filled … and sorted
current set
input
output
10
43
231 6
4873
550
194
15 8
6
1 5 101
923
354
143
506
486
87
41
26
13
1 64
87
35
50
19
12
24
94
41
5 86
10
43
23
main memory
disk
64
86
87
Replacement Sort
The next page of the file is read in to the input buffer
The smallest record from the current set (and the input buffer) is placed in the output buffer
The first location of the current set is now available for the first record from the input buffer
current set
input
output
main memory
41
26
13
1 64
87
35
50
19
12
24
94
41
5 86
10
43
23
1 5 101
923
354
143
50
12
24
94
1
12
5 10
1 5
12
24
10
1 5 10
12
24
94
disk
1 5 10
12
24
94
12
24
94
Replacement Sort
The process continues until the output buffer is full and
The contents of the input buffer are in the current set
At this point another input page can be read into the input buffer
And the output buffer can be written to disk as the first page of the first run
64
86
87
41
26
13
current set
input
output
19
23
354
143
50
1 5 10
41
26
13
1 64
87
35
50
19
12
24
94
41
5 86
10
43
23
main memory
disk
61
67
72
Replacement Sort
The current set can be periodically re-sorted
At some point a page is read that contains records with values less than the values in records in the output buffer
The rest of the current set must now be written out to complete the first run
The process then begins again
disk
41
26
13
1 64
87
35
50
19
12
24
94
41
5 86
10
43
23
… … …
… … …
… 3 …
current set
input
output
main memory
… … …
… … …… … …
Revisiting I/O Costs The cost metric used is simplistic In practice it my be advantageous to make the
input and output buffers larger than one page This reduces the number of runs that can be merged at
one time, so may increase the number of passes required
But, it allows a sequence of pages to be read or written to the buffers, decreasing the actual access time per page
We have also ignored CPU costs If double buffering is used, the CPU can process one
part of a run while the next is being loaded into main memory
Note: B+ Trees and Sorting Primary B+ tree index
The index can be used to find the first page, but Note that the file is already sorted!
Secondary B+ tree index Leaves point to data records that are not in sort order▪ In the worst case, each data entry could point to a different page
from its adjacent entries Retrieving the records in order requires reading all of the
index leaf pages, plus one disk read for each record! In practice external sort is likely to be much more
efficient than using a secondary index
Projections
Projections and Sorting Projection with duplicate removal can be performed in a
number of steps Scan the table, remove unwanted attributes, and write it back▪ cost = 2B disk I/Os
Sort the result, using all of its attributes as a compound sort key▪ cost = 4B, possibly more if the file is very large
Scan the result, removing the adjacent duplicates as they are encountered; cost = 2B▪ The result of this last stage may not have to be written out; it may be
the last operation or may be pipelined into another operations The total cost is 8B disk I/Os, but this basic process can be
improved by combining these steps
Improved Sort Projection The initial scan is performed as follows
Read M pages and remove unwanted attributes Sort the records, and remove any duplicates Write the sorted run to disk Repeat for the rest of the file, for a total cost of
2B Perform merge passes as required on the
output from the first stage Remove any duplicates as they are
encountered If only one merge pass is required the cost is ≈
1B For a total cost of 3B
Projections and Hashing
Duplicates can also be identified by using hashing
Duplicate removal by hashing has two stages Partitioning and probing
In the partitioning stage Partition into M-1 partitions using a hash
function, h▪ There is an output buffer for each partition, and▪ One input buffer
The file is read into main memory one page at a time, with each record being hashed to the appropriate buffer
Duplicates are eliminated in the next stage
Projections and Hashing … The duplicate elimination stage uses a
second hash function h2 (h2 h) to reduce main memory costs An in-memory hash table is built using h2 If two records hash to the same location they
are checked to see if they are duplicates Assuming that each partition produced in the
partitioning stage can fit in main memory the cost is Partitioning stage: 2B Duplicate elimination stage: B, for a total cost
of 3B This is the same cost as projection using
sorting
Sort, Hash Projection Comparison Sort and hash projection have the same cost
(3B) If M > (B) sorting and sort projection can be
performed in two passes The first pass produces B/M sorted runs If there are less than M-1 of them only one
merge pass is required With hash projection, if just one partition
is greater than M-1, further partitioning is required Regardless of the overall size of the file
Binary Operations
Joins
A join is defined as a Cartesian product followed by a selection Joins are used more often than Cartesian products
Cartesian products typically result in much larger tables than joins It is important to be able to efficiently implement
joins
SELECT *FROM Customer NATURAL INNER JOIN Account
C.sin = A.sin(Customer Account)
Customer Account
Cost of Joins
The naïve method of producing a join is to perform a Cartesian product then a selection This is very inefficient if the intermediate
relation is large For example, consider two relations R and
S R has 10,000 records, and S has 2,000 S has a foreign key in R▪ Implying that a record in S relates to at most one
record in R The sizes of the join and the Cartesian
product are Natural join – 2,000 (if each s in S relates to an
r in R) Cartesian product – 20,000,000 records
Join Methods
Nested loop joins These algorithms basically compute the
Cartesian product and discard records that do not meet the join condition
Index nested loop join Uses an index on one of the tables
Sort-merge join Sorts both tables on the join attribute and
merges results Hash join
Similar to the sort-merge join but uses hashing We will also consider the hybrid hash join
algorithm
Tuple Nested Loop Join This simple algorithm compares each record in the outer
table with each record in the inner relation Start with one record, r, in relation R Compare r to all the records, s, in relation S and see if the join
condition matches▪ If so concatenate the records and add to the result
Repeat with all r in R
for each record r Rfor each record s S
if ri = sj then add r,s to
result
R R.i=S.j S
Basic Nested Loop Join
…
R (disk)
S (disk)
scan all of S for each record in R
main memory frames
records
output
buffer
Nested Loop Join Cost The inner relation (S) has to be scanned once
for each record in the outer relation (R) For a cost of B(R) + (T(R) * B(S) ) The results are in the same order as R
A simple improvement is to process one block of R at a time, instead of one record at a time This reduces the cost to B(R) + (B(R) * B(S) ) If B(R) and B(S) are roughly the same size this is O(n2)▪ Where n = B(R)
Note that the result is no longer strictly ordered on R The smaller relation should be the outer relation
Improved Nested Loop Join
…
R (disk)
S (disk)
scan all of S for each page in R
main memory frames
records
output
buffer
Block Nested Loop Join The nested loop join algorithm does not make
effective use of main memory Only one page of the outer relation is processed at a time
The algorithm can be significantly improved by reading as much of the outer relation as possible Use M – 2 pages as an input buffer for the outer relation, 1 page as an input buffer for the inner relation, and 1 page as an output buffer If the smaller relation fits in M – 2 pages the cost is B(R) +
B(S)!▪ CPU costs can be reduced by building an in-memory hash table on
the inner relation
Block Nested Loop Join … What if the smaller relation is larger than M-2?
Break R, the outer relation, into blocks of M – 2 pages▪ Refer (somewhat flippantly) to these blocks as clumps
Scan S once for each clump of R ▪ Insert concatenated records r, s that match the join
condition into the output buffer▪ S is read B(R)/(M-2) times
The total cost is B(R) + (B(R)/(M-2) * B(S)) Disk seek time can be reduced by increasing
the size of the input buffer for S Which may increase the number of times that S is
scanned
Improved Nested Loop Join
…
R (disk)
S (disk)
scan all of S for each M-2 pages in R
main memory frames
records
output
buffer…
M-2 pages
Index Nested Loop Join If there is an index on the join attribute
Make the indexed relation the inner relation (S) Read the outer relation, retrieving matching records of
S using the index▪ Does not first calculate the Cartesian product as it only
retrieves records that satisfy the join condition The cost is dependent on the kind of index and
the number of matching records An index lookup is required for each record in R▪ B+ tree index: 2 – 4, hash index: 1 – 2
If multiple S records match to a single R record the total retrieval cost depends on whether the index is primary
Index Nested Loop Join
…
R (disk)
S (disk)
main memory frames
records
output
buffer
read page with
matching slook up index using join
attribute of r
Sort-Merge Join Assume that both tables to be joined are sorted
on the join attribute The tables may be joined with one pass, in a similar
way to merging two sorted runs At a cost of B(R) + B(S)
If the tables are not sorted on the join attribute they may be sorted using multi-way Mergesort The sort can typically be performed in two passes▪ At a cost of 4 B(R) + 4 B(S), including writing out the results
Joining the tables has an additional cost of B(R) + B(S)
Sort-Merge Join Process
Read in pages of R and S Assume that the join attribute is x
While xr is different from xs If xr is less than xs move to the next record from
R, else Move to the next record from S
If xr equals xs Create the joined record by concatenating r
and s, and Add the resulting record to the output buffer
Repeat the process until all records have been read
Improved Sort-Merge Join The merge phase of the external sort algorithm
can be combined with the join Produce sorted runs (of size M) of R and S Merge R, and S, then compare the results, discarding
records that do not meet the join condition requires▪ A buffer for each sorted run of R▪ A buffer for each sorted run of S,▪ One buffer for merged runs of R,▪ One buffer for the merged runs of S, and▪ One buffer for the concatenated results
The improvement requires more main memory as there must be buffers for each run of both R and S
Sort-Merge Join… 16 …… 24 …… 55 …… 63 …… 71 …
… 23 …… 29 …… 31 …… 46 …… 53 …
… 16 …… 23 …… 24 …… 29 …… 31 …
… 11 …… 21 …… 25 …… 26 …… 29 …
… 15 …… 16 …… 23 …… 27 …… 32 …
… 11 …… 15 …… 16 …… 21 …… 23 …
r 16 sr 23 s………
Read in sorted runs of R andS
Merge R
and merge S
Merge and concatenate records
Memory Requirements
Sort-merge join can be performed in two passes For a cost of 3(B(R) + B(S))
This assumes that main memory is large enough to allow an input buffer for each run of both R and S The initial pass produces sorted runs of size M Main memory must be greater than (B(R) +
B(S)) to perform the join in two passes▪ The first pass produces B(R) /M runs of R and B(S) /M
runs of S▪ If M is not greater than (B(R) + B(S) ) then (B(R) /M +
B(S) /M ) must be greater than M
Sort-Merge Join and Indexes The sort-merge join can take advantage of a
primary index The relation with the index is already sorted so no initial
sort pass is required If both relations have a primary index on the join
attribute a zig-zag join can be performed Scan the leaves of the two B+ trees in order from the left When the search key value of one index is higher, scan
the other index Only when both indexes contain the same search key
values are the records retrieved and concatenated
Hash Join The hash join algorithm has two phases
Partitioning, and Probing
Partitioning Both relations are partitioned using the same hash
function, h, on the join attribute▪ Records in one partition of R can only match to records in the same
partition of S One input buffer and M- 1 output buffer pages are used to
make M - 1 partitions for each relation If both of the largest partitions of both relations do not fit
in main memory, the relations must be further partitioned
Hash Join continued
Probing Read in one partition of R To reduce CPU costs, build an in memory hash
table using hash function h2 (h2 h) Read the corresponding partition of S into an
input buffer and join matching records using the hash table
Repeat for each partition of R Cost
If each partition of R fits in main memory the overall cost is 3(B(R) + B(S))
Memory Requirements
Hash join is performed in two phases At a cost of 3(B(R) + B(S))
Main memory must be big enough to hold the largest partition of the smaller relation, R Which requires that M - 2 is (B(R))▪ A buffer for S and for output are needed ▪ Partitioning produces B(R) - 1 partitions, of average size
M /(B(R) - 1) If M < (B(R)) then B(R) / M must be larger than M,
and the partitions are larger than main memory
Hybrid Hash Join
If main memory is large enough, retain an entire partition of R during the partitioning phase This eliminates the need to write out the
partition, and read it back in during the probing phase
In addition, the matching S records can be joined (and written out to the result) when S is partitioned
Hence the records (of both R and S) belonging to that partition are only read once
This approach can be generalized to retain more than one partition where possible
If the smaller relation fits in main memory the costs are identical The smaller relation is read once The larger relation is scanned once to join the records
If the relations are large hash join is more efficient Block nested loop join must read S once for each clump
of R Hash join matches partitions of R to partitions of S
during the probing phase The records of both R and S belonging to a particular
partition are only read once, after the partitioning phase
Hybrid Hash Join and Block Nested Loop Join
Hash Join and Sort-Merge Join A skewed distribution of data may result in the
relations being partitioned more than once Because the partitions may vary considerably in
size Sort-merge join is not sensitive to data skew
If M falls between (B(R)) and (B(S)) hash join requires less passes Hash join only requires enough memory to hold
partitions of one relation The memory requirements of sort-merge join
depend on the size of both relations
Join Method Ordering Simple nested loop join (read S for each
record) Retains the original order of R
Index nested loop join Retains the original order of R
Sort-Merge join Ordered by the join attribute
All other join methods No order
General Join Conditions The join process is more complex if the join
condition is not simple equality on one attribute For equalities over several attributes
Sort-merge and hash join must sort (or hash) over all of the attributes in the selection
An index that matches one of the equalities may be used for the index nested loop join
For inequalities (, , etc.) Hash indexes cannot be used for index nested loop joins Sort-merge and hash joins are not possible Other join algorithms are unaffected
Set Operations
For set operations, unlike other SQL operations, duplicates are removed by default Unless ALL is specified Therefore most set operations entail sorting the
relations
SELECT fName, lNameFROM PatientINTERSECTSELECT fName, lNameFROM Doctor
fName,lName(Patient) fName,lName(Doctor)
Intersection and Cartesian Product
Intersection R S A join where the condition is equality on all
attributes An in-memory search structure should be used
Cartesian product R S A special case of join where there is no join
condition All records are joined to each other
fName,lName(Patient) fName,lName(Doctor)is equivalent to
fName,lName(Patient) fName,lName(Doctor)
Union
Union using sorting Sort R and S using all fields Scan and merge the results while removing
duplicates Union using hashing
Partition R and S using a hash function h For each partition of S (if S is the smaller
relation)▪ Build an in-memory hash table (using h2) ▪ Scan the corresponding partition of R▪ For each record, r, probe the hash table, if a duplicate
record already exists discard it, otherwise add it to the table
▪ Write out the hash table and process the next partition
Set Difference
Set Difference is implemented in a similar way to union
In sorting, in the merge phase check that records of R are not in S before adding them to the result
In hashing, probe the hash table (for S), if a record of R is not in the table then add it to the result
If the query contains ALL a count of duplicates should be maintained Instead of storing duplicate records
Aggregations
The basic method for calculating an aggregation is to scan the entire table
While keeping track of running information SUM – total of the retrieved values AVG – total, and count of the retrieved values COUNT – count of the retrieved values MIN – smallest of the retrieved values MAX – largest of the retrieved values
SELECT AVG(age)FROM Patient
Grouped Aggregations
An aggregation can include a GROUP BY clause In which case the aggregation has to be
computed for each group Aggregations with groupings can be
computed in a number of ways By sorting or hashing, or By using an index
SELECT AVG(income)FROM DoctorGROUP BY speciality
Sorting and Groups In the sorting based approach to the table is sorted
on the grouping attribute The results of the sort are scanned and the result of
the aggregate operation computed These two processes can be combined in a similar way to
the sort based projection and join algorithms The cost is essentially the cost of the sorting
operations Probably 3B(R) Although the final result will be considerably smaller than
the sorted table
Hashing and Groups In the hash based approach an in-memory hash
table is build on the grouping attribute Hash table entries consist of▪ grouping-value, running-information
The table is scanned, and for each record Probe the hash table to find the entry for the group that
the record belongs to, and Update the running information for that group
Once the table has been scanned the grouped results are computed using the hash table entries If the hash table fits in main memory the cost is B(R)
Aggregations and Indexes It may be possible to satisfy an aggregate
query using just the data entries of an index The search key must include all of the
attributes required for the query▪ The data entries may be sorted or hashed, and▪ No access to the records is required
If the GROUP BY clause is a prefix of a tree index, the data entries can be retrieved in the grouping order▪ The actual records may also be retrieved in this order
This is an example of an index-only plan
SummaryMemory Requirements
One-Pass and Simple Algorithms
Operation
Algorithm
M Requiremen
t
Disk I/O
, scan 1 B, * scan B B, , , scan min(B(R),
B(S))B(R) + B(S)
nested loop
min(B(R), B(S))
B(R) + B(S)
nested loop
M 2 B(R) * B(S)/M
* = duplicate removal, = grouping
Sort-Based Algorithms
Operation
M Requiremen
t
Disk I/O Notes
, B 3B, , B(R) + B(S) 3(B(R) +
B(S))max(B(R), B(S))
5(B(R) + B(S))
B(R) + B(S) 3(B(R) + B(S))
improved sort-join
Hash-Based Algorithms
Operation
M Requiremen
t
Disk I/O Notes
, B 3B, , B(S) 3(B(R) + B(S))
B(S) 3(B(R) + B(S))B(S) 3 – 2(M/B(S)) * (B(R) +
B(S))hybrid hash-join
Assume B(S) B(R), and that B(S) M