Transcript
Page 1: Query optimisation

CS263

Query Optimisation

Page 2: Query optimisation

LECTURE PLAN

Motivation for Query Optimisation
Phases of Query Processing
Query Trees
RA Transformation Rules
Heuristic Processing Strategies
Cost Estimation for RA Operations

Page 3: Query optimisation

Motivation for Query Optimisation

List all the managers that work in the sales department.

SELECT *
FROM emp, dept
WHERE emp.deptno = dept.deptno
AND emp.job = 'Manager'
AND dept.name = 'Sales';

σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}(EMP × DEPT)

σ_{(job = 'Manager') ∧ (name = 'Sales')}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)

(σ_{job = 'Manager'}(EMP)) ⋈_{emp.deptno = dept.deptno} (σ_{name = 'Sales'}(DEPT))

There are at least three alternative ways of representing this query as a Relational Algebra expression.

Page 4: Query optimisation

Motivation for Query Optimisation

Metrics:
1000 tuples in the EMP relation
50 tuples in the DEPT relation
50 employees are Managers (one per department)
5 separate Sales departments (across the country)

Cost of processing the following query alternative:

σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}(EMP × DEPT)

Cartesian product of EMP and DEPT: (1000 + 50) record I/Os to read the relations
+ (1000 * 50) record I/Os to create an intermediate relation to store the result

Selection on result of Cartesian product: (1000 * 50) record I/Os to read tuples and compare against predicate

Total cost of the query: (1000 + 50) + 2*(1000 * 50) = 101,050 record I/Os.

Page 5: Query optimisation

Motivation for Query Optimisation

Metrics:
1000 tuples in the EMP relation
50 tuples in the DEPT relation
50 employees are Managers (one per department)
5 separate Sales departments (across the country)

Cost of processing the following query alternative:

σ_{(job = 'Manager') ∧ (name = 'Sales')}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)

Join of EMP and DEPT over deptno: (1000 + 50) record I/Os to read the relations
+ (1000) record I/Os to create an intermediate relation to store the join result

Selection on result of join: (1000) record I/Os to read each tuple and compare against predicate

Total cost of the query: (1000 + 50) + 2*(1000) = 3,050 record I/Os.

Page 6: Query optimisation

Motivation for Query Optimisation

Cost of processing the following query:

(σ_{job = 'Manager'}(EMP)) ⋈_{emp.deptno = dept.deptno} (σ_{name = 'Sales'}(DEPT))

Select ‘Managers’ in EMP: (1000) record I/Os to read the relation
+ (50) record I/Os to create an intermediate relation to store the select result

Select ‘Sales’ in DEPT: (50) record I/Os to read the relation
+ (5) record I/Os to create an intermediate relation to store the select result

Join of previous two selections over deptno: (50 + 5) record I/Os to read the relations

Total cost of the query: 1000 + 2*(50) + 5 + (50 + 5) = 1,160 record I/Os.
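The three totals can be reproduced with a few lines of Python. This is only a sketch of the slide arithmetic: the function names are illustrative, and the cost model (one record I/O per tuple read or written, a join result of 1000 tuples) is the simplified one used on the slides, not a real DBMS cost model.

```python
# Cardinalities taken from the slides.
N_EMP = 1000       # tuples in EMP
N_DEPT = 50        # tuples in DEPT
N_MANAGERS = 50    # EMP tuples with job = 'Manager'
N_SALES = 5        # DEPT tuples with name = 'Sales'


def cost_product_then_select() -> int:
    """Cartesian product, then one selection over the stored result."""
    read_inputs = N_EMP + N_DEPT
    write_product = N_EMP * N_DEPT
    read_product = N_EMP * N_DEPT
    return read_inputs + write_product + read_product


def cost_join_then_select() -> int:
    """Join on deptno, then select on job and name."""
    read_inputs = N_EMP + N_DEPT
    write_join = N_EMP      # the slides assume 1000 tuples in the join result
    read_join = N_EMP
    return read_inputs + write_join + read_join


def cost_select_then_join() -> int:
    """Select on each relation first, then join the two small results."""
    select_emp = N_EMP + N_MANAGERS     # read EMP, write the managers
    select_dept = N_DEPT + N_SALES      # read DEPT, write the Sales departments
    read_for_join = N_MANAGERS + N_SALES
    return select_emp + select_dept + read_for_join


for strategy in (cost_product_then_select, cost_join_then_select, cost_select_then_join):
    print(f"{strategy.__name__}: {strategy()} record I/Os")
```

Under these assumptions the first strategy costs roughly 87 times as much as the third, which is the motivation for optimising the query before executing it.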

Page 7: Query optimisation

Phases of Query Processing

Page 8: Query optimisation

Query Processing Stage - 1

Cast the query into internal form

This involves the conversion of the original (SQL) query into some internal representation more suitable for machine manipulation.

The internal representation typically chosen is either some kind of ‘abstract syntax tree’, or a relational algebra ‘query tree’.

Page 9: Query optimisation

Relational Algebra Query Trees

A Relational Algebra query can be represented as a ‘query tree’. For example the query to list all the managers that work in the sales department could be described as one of the following:

σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}(EMP × DEPT)

Query tree (root at the top, leaves at the bottom):

Root: σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}
Intermediate operations: ×
Leaves: EMP, DEPT

Page 10: Query optimisation

Relational Algebra Query Trees

A Relational Algebra query can be represented as a ‘query tree’. For example the query to list all the managers that work in the sales department could be described as one of the following:

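σ_{(job = 'Manager') ∧ (name = 'Sales') ∧ (emp.deptno = dept.deptno)}(EMP × DEPT)

Query tree with the join condition split into its own selection (root at the top, leaves at the bottom):

Root: σ_{(job = 'Manager') ∧ (name = 'Sales')}
Intermediate operations: σ_{emp.deptno = dept.deptno}, applied to the result of ×
Leaves: EMP, DEPT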

Page 11: Query optimisation

Relational Algebra Query Trees

Alternative ‘query tree’ for the query to list all the managers that work in the sales department:

σ_{(job = 'Manager') ∧ (name = 'Sales')}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)

Root: σ_{(job = 'Manager') ∧ (name = 'Sales')}
Intermediate operations: ⋈_{emp.deptno = dept.deptno}
Leaves: EMP, DEPT

Page 12: Query optimisation

Relational Algebra Query Trees

Alternative ‘query tree’ for the query to list all the managers that work in the sales department:

(σ_{job = 'Manager'}(EMP)) ⋈_{emp.deptno = dept.deptno} (σ_{name = 'Sales'}(DEPT))

Root: ⋈_{emp.deptno = dept.deptno}
Intermediate operations: σ_{job = 'Manager'} (above EMP), σ_{name = 'Sales'} (above DEPT)
Leaves: EMP, DEPT
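A query tree maps naturally onto a small recursive data structure. The sketch below is one hypothetical representation (the `Node` class, its field names and `to_expression` are assumptions made for illustration, not any DBMS's internal form). It builds the last of the trees above and prints it back as a relational algebra expression.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a relational algebra query tree (illustrative only)."""
    op: str                          # "relation", "select", "product" or "join"
    arg: str = ""                    # relation name or predicate text
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_expression(node: Node) -> str:
    """Render a tree as a relational algebra expression, from the root down."""
    if node.op == "relation":
        return node.arg
    if node.op == "select":
        return f"σ_{{{node.arg}}}({to_expression(node.left)})"
    if node.op == "product":
        return f"({to_expression(node.left)} × {to_expression(node.right)})"
    if node.op == "join":
        return f"({to_expression(node.left)} ⋈_{{{node.arg}}} {to_expression(node.right)})"
    raise ValueError(f"unknown operator: {node.op}")


# Leaves are the base relations; the root is the final operation.
emp = Node("relation", "EMP")
dept = Node("relation", "DEPT")
tree = Node(
    "join",
    "emp.deptno = dept.deptno",
    Node("select", "job = 'Manager'", emp),
    Node("select", "name = 'Sales'", dept),
)

print(to_expression(tree))
```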

Page 13: Query optimisation

Query Processing Stage - 2

Convert to canonical form

Find a more ‘efficient’ representation of the query by converting the internal representation into some equivalent (canonical) form through the application of a set of well-defined ‘transformation rules’.

Which transformation rules are applied is generally determined by the heuristic processing strategies of the particular DBMS.

Page 14: Query optimisation

Transformation Rules for RA Operations

1. Conjunctive selection operations can cascade into individual selection operations (and vice versa).

Sometimes referred to as cascade of selection.

σ_{p ∧ q ∧ r}(R) = σ_p(σ_q(σ_r(R)))

Example:

σ_{deptno=10 ∧ sal>1000}(Emp) = σ_{deptno=10}(σ_{sal>1000}(Emp))

Page 15: Query optimisation

Transformation Rules for RA Operations

2. Commutativity of selection

σ_p(σ_q(R)) = σ_q(σ_p(R))

Example:

σ_{sal>1000}(σ_{deptno=10}(Emp)) = σ_{deptno=10}(σ_{sal>1000}(Emp))
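Rules 1 and 2 can be checked on a toy relation. The sketch below models a relation as a list of dictionaries and selection as a filter; the `select` helper and the sample rows are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Relation = List[Row]


def select(pred: Callable[[Row], bool], rel: Relation) -> Relation:
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


# Hypothetical sample data.
EMP = [
    {"name": "Smith", "deptno": 10, "sal": 1500},
    {"name": "Jones", "deptno": 10, "sal": 900},
    {"name": "Blake", "deptno": 20, "sal": 2000},
]


def in_dept_10(row: Row) -> bool:
    return row["deptno"] == 10


def well_paid(row: Row) -> bool:
    return row["sal"] > 1000


# Rule 1: one conjunctive selection equals a cascade of selections.
conjunctive = select(lambda row: in_dept_10(row) and well_paid(row), EMP)
cascaded = select(in_dept_10, select(well_paid, EMP))

# Rule 2: the order of the individual selections does not matter.
swapped = select(well_paid, select(in_dept_10, EMP))

print(conjunctive == cascaded == swapped)
```

The results are identical, but the work differs: whichever selection runs first determines how many rows the second one has to examine.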

Page 16: Query optimisation

Transformation Rules for RA Operations

3. In a sequence of projection operations, only the last in the sequence is required.

π_L(π_M(…π_N(R))) = π_L(R)

Example:

π_{deptno}(π_{deptno, name}(Dept)) = π_{deptno}(Dept)

Page 17: Query optimisation

Transformation Rules for RA Operations

4. Commutativity of selection and projection.

π_{A1, …, Am}(σ_p(R)) = σ_p(π_{A1, …, Am}(R))

where p ∈ {A1, A2, …, Am}

Selection predicate (p) is only made up of projected attributes

Example:

π_{name, job}(σ_{name='Smith'}(Emp)) = σ_{name='Smith'}(π_{name, job}(Emp))

Page 18: Query optimisation

Transformation Rules for RA Operations

5. Commutativity of theta-join (and Cartesian product).

R ⋈_p S = S ⋈_p R

R × S = S × R

Example:

EMP ⋈_{emp.deptno = dept.deptno} DEPT = DEPT ⋈_{emp.deptno = dept.deptno} EMP

NOTE: Theta-join is a generalisation of both the equi-join and natural-join

Page 19: Query optimisation

Transformation Rules for RA Operations

6. Commutativity of selection and theta-join (or Cartesian product).

(σ_p(R)) ⋈_r S = σ_p(R ⋈_r S)

where p ∈ {A1, A2, …, Am}

Selection predicate (p) is only made up of attributes of one of the relations being joined (here R), such as its join attribute

Example:

(σ_{emp.deptno=10}(EMP)) ⋈_{emp.deptno = dept.deptno} DEPT
= σ_{emp.deptno=10}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)
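The practical value of rule 6 is that the selection can run before the join, so the join sees fewer tuples. The following sketch checks the equivalence on toy data; the `select` and `theta_join` helpers and the sample rows are hypothetical, and the join is a naive nested loop chosen for brevity.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Relation = List[Row]


def select(pred: Callable[[Row], bool], rel: Relation) -> Relation:
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def theta_join(left: Relation, right: Relation,
               pred: Callable[[Row, Row], bool]) -> Relation:
    """Naive nested-loop theta-join; matching rows are merged into one row."""
    return [{**l, **r} for l in left for r in right if pred(l, r)]


# Hypothetical sample data.
EMP = [
    {"name": "Smith", "job": "Manager", "deptno": 10},
    {"name": "Jones", "job": "Clerk", "deptno": 10},
    {"name": "Blake", "job": "Manager", "deptno": 20},
]
DEPT = [
    {"deptno": 10, "dname": "Sales"},
    {"deptno": 20, "dname": "Research"},
]


def same_dept(e: Row, d: Row) -> bool:
    return e["deptno"] == d["deptno"]


def dept_is_10(row: Row) -> bool:
    return row["deptno"] == 10


# Selection applied after the join: the join processes all of EMP.
joined = theta_join(EMP, DEPT, same_dept)
late = select(dept_is_10, joined)

# Selection applied before the join: the join processes a smaller input.
filtered_emp = select(dept_is_10, EMP)
early = theta_join(filtered_emp, DEPT, same_dept)

print(late == early, len(EMP), len(filtered_emp))
```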

Page 20: Query optimisation

Transformation Rules for RA Operations

7. Commutativity of projection and theta-join (or Cartesian product).

π_L(R ⋈_r S) = (π_{L1}(R)) ⋈_r (π_{L2}(S))

Project attributes L = L1 ∪ L2, where L1 are attributes of R, and L2 are attributes of S. L will also contain the join attributes

Example:

π_{job, location, deptno}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)
= (π_{job, deptno}(EMP)) ⋈_{emp.deptno = dept.deptno} (π_{location, deptno}(DEPT))

Page 21: Query optimisation

Transformation Rules for RA Operations

8. Commutativity of union and intersection (but not set difference).

R ∪ S = S ∪ R

R ∩ S = S ∩ R

Page 22: Query optimisation

Transformation Rules for RA Operations

9. Commutativity of selection and set operations (union, intersection, and set difference).

Union

σ_p(R ∪ S) = σ_p(S) ∪ σ_p(R)

Intersection

σ_p(R ∩ S) = σ_p(S) ∩ σ_p(R)

Set Difference

σ_p(R − S) = σ_p(R) − σ_p(S)
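Selection distributes over all three set operations, but for set difference the operands must stay in their original order because difference is not commutative. A quick check with Python sets (the relations `R` and `S` and the predicate are arbitrary illustrative choices):

```python
from typing import Callable, Set


def select(pred: Callable[[int], bool], rel: Set[int]) -> Set[int]:
    """Selection over a relation modelled as a set of values."""
    return {value for value in rel if pred(value)}


def is_even(value: int) -> bool:
    return value % 2 == 0


R = {1, 2, 3, 4, 5, 6}
S = {4, 5, 6, 7, 8}

# Union and intersection: operand order is irrelevant.
union_ok = select(is_even, R | S) == select(is_even, S) | select(is_even, R)
intersection_ok = select(is_even, R & S) == select(is_even, S) & select(is_even, R)

# Set difference: the result is only correct with R on the left.
difference = select(is_even, R - S)
same_order = select(is_even, R) - select(is_even, S)
reversed_order = select(is_even, S) - select(is_even, R)

print(union_ok, intersection_ok, difference, same_order, reversed_order)
```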

Page 23: Query optimisation

Transformation Rules for RA Operations

10. Commutativity of projection and union

π_L(R ∪ S) = π_L(S) ∪ π_L(R)

Page 24: Query optimisation

Transformation Rules for RA Operations

11. Associativity of natural join (and Cartesian product)

Natural Join

(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)

Cartesian Product

(R × S) × T = R × (S × T)

Page 25: Query optimisation

Transformation Rules for RA Operations

12. Associativity of union and intersection (but not set difference)

Union

(R ∪ S) ∪ T = S ∪ (R ∪ T)

Intersection

(R ∩ S) ∩ T = S ∩ (R ∩ T)

Page 26: Query optimisation

Heuristic Processing Strategies

Perform selection operations as early as possible.

Translate a Cartesian product and subsequent selection (whose predicate represents a join condition) into a join operation.

Use associativity of binary operations to ensure that the most restrictive selection operations are executed first.

Perform projections as early as possible.

Compute common expressions once.
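Two of these heuristics (turning a Cartesian product plus join-condition selection into a join, and performing selections early) can be sketched as a recursive rewrite over a query tree. Everything here is an illustrative assumption: the `Node` structure, the `refs` field recording which relations a predicate mentions, and the shortcut of treating any selection that mentions two relations as a join condition. A real optimiser is considerably more involved.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Node:
    op: str                               # "relation", "select", "product", "join"
    arg: str = ""                         # relation name or predicate text
    refs: FrozenSet[str] = frozenset()    # relations the predicate mentions
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def relations(node: Node) -> Set[str]:
    """Names of the base relations below a node."""
    if node.op == "relation":
        return {node.arg}
    found: Set[str] = set()
    for child in (node.left, node.right):
        if child is not None:
            found |= relations(child)
    return found


def optimise(node: Optional[Node]) -> Optional[Node]:
    if node is None or node.op == "relation":
        return node
    left, right = optimise(node.left), optimise(node.right)
    if node.op == "select":
        child = left
        # Heuristic: product followed by a join-condition selection becomes a join.
        if child.op == "product" and len(node.refs) == 2:
            return Node("join", node.arg, node.refs, child.left, child.right)
        # Heuristic: a single-relation selection moves below the join or product.
        if child.op in ("join", "product") and len(node.refs) == 1:
            if node.refs <= relations(child.left):
                pushed = optimise(Node("select", node.arg, node.refs, child.left))
                return Node(child.op, child.arg, child.refs, pushed, child.right)
            if node.refs <= relations(child.right):
                pushed = optimise(Node("select", node.arg, node.refs, child.right))
                return Node(child.op, child.arg, child.refs, child.left, pushed)
        return Node("select", node.arg, node.refs, left)
    return Node(node.op, node.arg, node.refs, left, right)


def render(node: Node) -> str:
    if node.op == "relation":
        return node.arg
    if node.op == "select":
        return f"σ_{{{node.arg}}}({render(node.left)})"
    symbol = "×" if node.op == "product" else f"⋈_{{{node.arg}}}"
    return f"({render(node.left)} {symbol} {render(node.right)})"


emp, dept = Node("relation", "EMP"), Node("relation", "DEPT")
product = Node("product", left=emp, right=dept)
join_cond = Node("select", "emp.deptno = dept.deptno", frozenset({"EMP", "DEPT"}), product)
by_name = Node("select", "name = 'Sales'", frozenset({"DEPT"}), join_cond)
initial = Node("select", "job = 'Manager'", frozenset({"EMP"}), by_name)

print(render(initial))
print(render(optimise(initial)))
```

The input tree is the cascaded form of the product-based query; the output has the join at the root with each selection directly above its base relation.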

Page 27: Query optimisation

Heuristic Processing - Example

Starting query tree (selections above a Cartesian product):

σ_{(job = 'Manager') ∧ (name = 'Sales')}(σ_{emp.deptno = dept.deptno}(EMP × DEPT))

Cartesian product and join-condition selection replaced by a join:

σ_{(job = 'Manager') ∧ (name = 'Sales')}(EMP ⋈_{emp.deptno = dept.deptno} DEPT)

Selections moved below the join (Optimised Canonical Query):

(σ_{job = 'Manager'}(EMP)) ⋈_{emp.deptno = dept.deptno} (σ_{name = 'Sales'}(DEPT))

Page 28: Query optimisation

Query Processing Stage - 3

Choose candidate low-level procedures

Consider the (optimised canonical) query as a series of low-level operations (join, restrict, etc…).

For each of these operations generate alternative execution strategies and calculate the cost of such strategies on the basis of statistical information held about the database tables (files).

Page 29: Query optimisation

Query Processing Stage - 4

Generate query plans and choose the cheapest

Construct a set of ‘candidate’ Query Execution Plans (QEPs).

Each QEP is constructed by selecting a candidate implementation procedure for each operation in the canonical query and then combining them to form a string of associated operations.

Each QEP will have an (estimated) cost associated with it – the sum of the cost of each of its operations.

Choose the QEP with the least cost.
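Stage 4 amounts to enumerating combinations of candidate procedures and taking the minimum. The sketch below does this by brute force; the operation names, candidate procedures and cost figures are all made-up illustrative values, not estimates from any real system.

```python
from itertools import product

# Hypothetical candidate procedures and estimated costs per operation.
candidates = {
    "select EMP": {"linear search": 1000, "secondary index": 60},
    "select DEPT": {"linear search": 50, "hash key": 5},
    "join": {"block nested loop": 300, "hash join": 165},
}

operations = list(candidates)
plans = []

# Each QEP picks exactly one procedure for every operation.
for choice in product(*(candidates[op].items() for op in operations)):
    plan = {op: method for op, (method, _) in zip(operations, choice)}
    cost = sum(step_cost for _, step_cost in choice)   # cost of a QEP = sum of its steps
    plans.append((cost, plan))

best_cost, best_plan = min(plans, key=lambda item: item[0])

print(len(plans), "candidate plans")
print("cheapest:", best_cost, best_plan)
```

Exhaustive enumeration grows multiplicatively with the number of operations and candidates, which is why an optimiser may cost only some of the alternatives.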

Page 30: Query optimisation

Cost Based Optimisation

Cost Based Optimisation (stages 3 & 4)

A good declarative query optimiser does not rely solely on heuristic processing strategies.

It chooses the QEP with the lowest estimated cost.

After heuristic rules are applied to a query, there still remain a number of alternative ways to execute it.

The Query Optimiser estimates the cost of executing each one (or at least a number) of these alternatives, and selects the cheapest one.

Page 31: Query optimisation

Costs associated with query execution

Secondary storage access costs: searching for data blocks on disk, reading data blocks from disk, writing data blocks to disk

Storage costs: cost of storing intermediate (temp) files

Computation costs: cost of CPU usage

Main memory usage costs: cost of buffering data

Communication costs: cost of moving data across

Page 32: Query optimisation

Database statistics used in cost estimation

Information held on each relation:

number of tuples
number of blocks
blocking factor
primary access method
primary access attributes
secondary indexes
secondary indexing attributes
number of levels for each index
number of distinct values of each attribute
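Several of these statistics are related, and others feed simple estimates. The sketch below derives the number of blocks from the tuple count and blocking factor, and estimates how many tuples match an equality predicate by assuming values are uniformly distributed. The class, the blocking factors and the distinct-value counts are assumptions; the distinct counts were chosen so that the estimates match the 50 managers and 5 Sales departments of the earlier example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RelationStats:
    """A subset of the statistics a catalogue might hold for one relation."""
    n_tuples: int
    blocking_factor: int                                        # tuples per block
    n_distinct: Dict[str, int] = field(default_factory=dict)    # per attribute

    @property
    def n_blocks(self) -> int:
        """Blocks needed to store the relation."""
        return math.ceil(self.n_tuples / self.blocking_factor)

    def equality_cardinality(self, attribute: str) -> int:
        """Estimated tuples matching attribute = value (uniform distribution assumed)."""
        return round(self.n_tuples / self.n_distinct[attribute])


# Hypothetical figures for the example relations.
emp = RelationStats(n_tuples=1000, blocking_factor=20, n_distinct={"job": 20})
dept = RelationStats(n_tuples=50, blocking_factor=10, n_distinct={"name": 10})

print(emp.n_blocks, emp.equality_cardinality("job"))
print(dept.n_blocks, dept.equality_cardinality("name"))
```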

Page 33: Query optimisation

Physical Data Structures – File Types

Heap (Sequential, Unordered)
no key columns
queries, other than appends, scan every page
rows are appended at the end
duplicate rows are allowed

Ordered
physically sorted data file with no index

Hash (Random, Direct)
data is located based on the (calculated) value of a hash field (key)

Indexed Sequential (ISAM)
sorted data file with a primary index

B+Tree
dynamic multilevel index
reuses deleted space on associated data pages

Page 34: Query optimisation

Strategies for implementing the RESTRICT operation

Different access strategies are available, dependent upon the structure of the file in which the relation is stored and whether the predicate attribute(s) have been indexed/hashed. Each uses a different cost algorithm (which refers to specific database statistics).

Linear Search (Heap)
Binary Search (Ordered)
Equality on Hash Key
Equality condition on primary key
Inequality condition on primary key
Equality condition on secondary index
Inequality condition on secondary B+Tree index

If the selection predicate is a composite (AND & OR) then there are additional cost considerations!
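To give a feel for how the cost algorithms differ, the sketch below codes commonly quoted textbook-style block-access estimates for four of these strategies. The formulas are simplifying assumptions (average-case linear search on a key, no hash overflow, one extra access to fetch the data block after traversing the index), and the block count and index depth are hypothetical.

```python
import math


def linear_search_cost(n_blocks: int, equality_on_key: bool = False) -> int:
    """Heap file: scan every block, or on average half for equality on a key."""
    return math.ceil(n_blocks / 2) if equality_on_key else n_blocks


def binary_search_cost(n_blocks: int) -> int:
    """Ordered file, equality on the ordering key."""
    return math.ceil(math.log2(n_blocks))


def hash_key_cost() -> int:
    """Equality on the hash key, assuming no overflow."""
    return 1


def primary_index_equality_cost(n_levels: int) -> int:
    """Traverse the index levels, then read one data block."""
    return n_levels + 1


N_BLOCKS = 50      # hypothetical size of the relation in blocks
N_LEVELS = 2       # hypothetical index depth

estimates = {
    "linear search (non-key)": linear_search_cost(N_BLOCKS),
    "linear search (key)": linear_search_cost(N_BLOCKS, equality_on_key=True),
    "binary search": binary_search_cost(N_BLOCKS),
    "hash key": hash_key_cost(),
    "primary index": primary_index_equality_cost(N_LEVELS),
}
cheapest = min(estimates, key=estimates.get)

for strategy, blocks in estimates.items():
    print(f"{strategy}: {blocks} block accesses")
print("cheapest:", cheapest)
```

Which strategies are actually available depends on how the file is organised; the optimiser can only cost the ones the physical structure supports.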

Page 35: Query optimisation

Strategies for implementing the JOIN operation

Different access strategies are available, dependent upon the structure of the files in which the relations to be joined are stored and whether the join attributes have been indexed/hashed. Each uses its own cost algorithm (which refers to specific database statistics).

Block nested loop join
Indexed nested loop join
Sort-merge join
Hash join
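The strategies differ in how they find matching tuples, not in what they return. As a rough in-memory illustration (ignoring blocks, buffers and disk I/O entirely), the sketch below implements a simple nested loop join and a hash join for an equi-join on one attribute; the function names and sample rows are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def nested_loop_join(outer: List[Row], inner: List[Row], key: str) -> List[Row]:
    """Compare every outer row with every inner row."""
    return [{**o, **i} for o in outer for i in inner if o[key] == i[key]]


def hash_join(build: List[Row], probe: List[Row], key: str) -> List[Row]:
    """Hash the build input once, then look up each probe row."""
    buckets: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        buckets[row[key]].append(row)
    return [{**b, **p} for p in probe for b in buckets.get(p[key], [])]


# Hypothetical sample data.
EMP = [
    {"name": "Smith", "job": "Manager", "deptno": 10},
    {"name": "Jones", "job": "Clerk", "deptno": 10},
    {"name": "Blake", "job": "Manager", "deptno": 20},
]
DEPT = [
    {"deptno": 10, "dname": "Sales"},
    {"deptno": 20, "dname": "Research"},
]


def by_name(row: Row) -> str:
    return row["name"]


loop_result = sorted(nested_loop_join(EMP, DEPT, "deptno"), key=by_name)
hash_result = sorted(hash_join(DEPT, EMP, "deptno"), key=by_name)

print(loop_result == hash_result, len(loop_result))
```

The nested loop performs len(EMP) * len(DEPT) comparisons, while the hash join touches each input row once, which is why the two have very different cost formulas.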

Page 36: Query optimisation

Query Optimisation Summary

The aims of query processing are to transform a query written in a high-level language (SQL), into a correct and efficient execution strategy expressed in a low-level language (Relational Algebra), and to execute the strategy to retrieve the required data.

There are many equivalent transformations of the same high-level query; the DBMS has to choose the one that minimises resource usage.

There are two main techniques for query optimisation. The first uses heuristic rules that order the operations in a query. The second compares different execution strategies for those operations, based on their relative costs, and selects the least resource intensive (cheapest) ones.