Query optimisation 1

Example - hospital database

Doctors (100 tuples)

Name         Office      …
L. Johnson   Sur1-left   …
P. Thomson   IC100       …
…            …           …
C. Craig     Int-100     …

Patients (2500 tuples)

Name      Disease       D_name       …
C. Reed   prk11         P. Thomson   …
M. Fox    blood press   L. Johnson   …
C. Fish   stomach-ul    L. Johnson   …
…         …             …            …
P. Wolf   kidn-fail     U. Ulrich    …

Query

Get the name of the doctors who treat patients suffering from the prk11 disease

SELECT Doctors.name
FROM Doctors, Patients
WHERE Disease = 'prk11' AND D_name = Doctors.Name

Evaluation #1

restrict Patients to those who suffer from prk11
• read: 2500 tuples; result: estimated 50 tuples; no need to write the intermediate result to disk (sufficiently small)

join the above result with Doctors
• read: 100 tuples (Doctors); result: 50 tuples; no need to write the intermediate result to disk

project the result over Doctors.name
• the desired result is in memory

estimated cost (read and write): 2500 + 100 = 2600

Evaluation #2

• suppose the internal memory allows only some 350 tuples

join Patients with Doctors
• read Patients in batches of 250 tuples (leaving room for the 100 Doctors tuples); Doctors is therefore read 10 times
• read: 2500 + 10 × 100 = 3500
• write the intermediate result (too big for memory) to disk: 2500

restrict the above result
• read: 2500; result: estimated 50 tuples

project the result over Doctors.name

estimated cost (read and write): 3500 + 2500 + 2500 = 8500
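The arithmetic behind the two evaluations can be reproduced with a short sketch. It only counts tuples read and written, as the slides do; the function and constant names are illustrative and not part of the original material.

```python
# Cardinalities and batch size taken from the example; cost = tuples read or written.
DOCTORS = 100
PATIENTS = 2500
BATCH = 250  # Patients tuples held in memory per pass in Evaluation #2


def cost_restrict_first() -> int:
    """Evaluation #1: restrict Patients first, then join the small result with Doctors."""
    read_patients = PATIENTS   # one scan of Patients for the restriction
    read_doctors = DOCTORS     # one scan of Doctors for the join
    return read_patients + read_doctors


def cost_join_first() -> int:
    """Evaluation #2: join first (in batches), write the result, then restrict it."""
    passes = -(-PATIENTS // BATCH)                # ceiling division: 10 passes
    read_for_join = PATIENTS + passes * DOCTORS   # Doctors is re-read on every pass
    write_intermediate = PATIENTS                 # join result has one tuple per patient
    read_for_restrict = PATIENTS                  # read the intermediate result back
    return read_for_join + write_intermediate + read_for_restrict


print(cost_restrict_first(), cost_join_first())
```

The sketch assumes, as the example does, that every patient matches exactly one doctor, so the join result has 2500 tuples.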

Intermediate conclusions

the evaluation strategy (procedural aspect) can lead to very big differences in computation time, for the same query

• computation time: reading from and writing to disk (the dominant component)
• processor time

the actual evaluation procedures are far more complex than in the previous introductory example

Optimisation - what

deciding upon the best strategy of evaluating a query

it is performed automatically by the optimiser of the DBMS

not just for data retrieval operations, but for updating operations as well (e.g. UPDATE)

not guaranteed to give the best result

Optimisation - how

based on statistical information about the specific database (not necessarily, though)

perform expression transformation (cast query in some internal form and convert to respective canonical form)

candidate low-level procedures selection

query plans generation and selection

statistical information - could you think of examples?
• cardinality of base relations, indexes, ...

Cast (transform) query in some internal form

internal format
• more suitable for automatic processing
• trees (syntax tree or query tree)

from a conceptual point of view it is easier to assume that the internal format is relational algebra

Convert to canonical form

the initial expression is transformed into an equivalent but more efficient form
• “efficient form” = efficient when executed
• these transformations are performed independently of actual data values and access paths

Expression transformation

examples

(A WHERE condition#1) WHERE condition#2
≡ (A WHERE condition#1 AND condition#2)

(A [projection#1]) [projection#2]
≡ A [projection#2]

(A [projection]) WHERE condition
≡ (A WHERE condition) [projection]
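These three rewrite rules can be checked on a toy relation. The sketch below models a relation as a list of dictionaries and uses a few Patients tuples from the earlier example; the helper names (`restrict`, `project`, `same`) are illustrative assumptions, not part of any DBMS.

```python
def restrict(rel, pred):
    """A WHERE condition: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    """A [attrs]: keep only the given attributes and drop duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


def same(r1, r2):
    """Compare two relations as sets of tuples (order is irrelevant)."""
    canon = lambda rel: {tuple(sorted(t.items())) for t in rel}
    return canon(r1) == canon(r2)


A = [
    {"Name": "C. Reed", "Disease": "prk11", "D_name": "P. Thomson"},
    {"Name": "M. Fox", "Disease": "blood press", "D_name": "L. Johnson"},
    {"Name": "C. Fish", "Disease": "stomach-ul", "D_name": "L. Johnson"},
    {"Name": "P. Wolf", "Disease": "kidn-fail", "D_name": "U. Ulrich"},
]
c1 = lambda t: t["D_name"] == "L. Johnson"
c2 = lambda t: t["Disease"] == "stomach-ul"

# Rule 1: two successive restrictions collapse into one with AND.
rule1 = same(restrict(restrict(A, c1), c2),
             restrict(A, lambda t: c1(t) and c2(t)))
# Rule 2: only the last projection in a sequence matters.
rule2 = same(project(project(A, ["Name", "D_name"]), ["D_name"]),
             project(A, ["D_name"]))
# Rule 3: a restriction can be moved ahead of a projection.
rule3 = same(restrict(project(A, ["Name", "D_name"]), c1),
             project(restrict(A, c1), ["Name", "D_name"]))
print(rule1, rule2, rule3)
```

Rule 2 relies on projection#2 using only attributes kept by projection#1, and rule 3 on the condition referring only to projected attributes; both hold in this example.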

Expression transformation

distributivity

commutativity and associativity

idempotence

scalar expressions

conditional expressions

semantic transformation

Set level operations

the operators of relational algebra are set level

• i.e. they manipulate sets (relations) and not individual tuples

however, these operators are implemented by internal (DBMS) procedures

• these procedures, inherently, need tuple-access (in fact, they need access to scalar values)

Choose candidate low-level procedures

the optimiser decides how to execute the query (expressed in canonical form)

• access paths are relevant at this stage

in the main, each basic operation (join, restriction, …) has a set of procedures that implement it

• e.g. RESTRICTION - (1) on candidate key; (2) on indexed key; (3) on other attributes …

• each procedure has an associated cost function (usually based on the number of disk I/O operations required); these functions are used in the next stage

Implementing JOIN - examples

R and P - two relations to be joined

J - the attribute on which the (natural) join is performed

R[i] and P[j] mean the i-th tuple of R and the j-th tuple of P, respectively

R[i].J means the value of the attribute J for the i-th tuple of the relation R

R has M and P has N tuples, respectively

Implementing JOIN - brute force

for i := 1 to M do
    for j := 1 to N do
        if R[i].J = P[j].J then
            add joined tuple R[i]*P[j] to result
        end
    end
end
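As a rough illustration, the brute-force procedure translates almost line for line into Python. Relations are modelled as lists of dictionaries, and the Doctors name attribute is renamed to D_name so that both relations share the join attribute; both choices are assumptions made for this sketch.

```python
def nested_loop_join(R, P, J):
    """Brute-force natural join of R and P on attribute J (M * N comparisons)."""
    result = []
    for r in R:                      # for i := 1 to M
        for p in P:                  # for j := 1 to N
            if r[J] == p[J]:         # if R[i].J = P[j].J
                result.append({**r, **p})  # joined tuple R[i] * P[j]
    return result


doctors = [
    {"D_name": "L. Johnson", "Office": "Sur1-left"},
    {"D_name": "P. Thomson", "Office": "IC100"},
]
patients = [
    {"Name": "C. Reed", "Disease": "prk11", "D_name": "P. Thomson"},
    {"Name": "M. Fox", "Disease": "blood press", "D_name": "L. Johnson"},
    {"Name": "C. Fish", "Disease": "stomach-ul", "D_name": "L. Johnson"},
]

joined = nested_loop_join(doctors, patients, "D_name")
for row in joined:
    print(row["Name"], "is treated by", row["D_name"], "in", row["Office"])
```

Every pair of tuples is compared, so the number of comparisons is always M × N regardless of how many tuples actually match.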

Index lookup

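index X on Patients.D_name

Patients

Name       Disease       D_name    …
C. Reed    prk11         Thomson   …
M. Fox     blood press   Johnson   …
C. Fish    stomach-ul    Johnson   …
M. Maria   ear           Thomson   …
P. Bosh    nose          Johnson   …
P. Wolf    kidn-fail     Ulrich    …

Index X (each entry points to a Patients tuple with that D_name)

D_name    Pointer
Johnson   → tuple
Johnson   → tuple
Johnson   → tuple
Thomson   → tuple
Thomson   → tuple
Ulrich    → tuple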

Implementing JOIN - index lookup

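/* index X on P.J */
for i := 1 to M do
    /* K[i] is the number of index entries in X whose value equals R[i].J */
    for j := 1 to K[i] do
        /* PK[j] represents the tuple of P that the j-th such entry, K[j], points to */
        add joined tuple (R[i] * PK[j]) to result
    end
end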
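A minimal sketch of the same idea in Python follows. It uses a dictionary from attribute value to tuple positions as a stand-in for the index X; a real DBMS index is typically a sorted or tree structure on disk, so this is an assumption made purely for illustration, as are the function names.

```python
from collections import defaultdict


def build_index(P, J):
    """Build an index on P.J: attribute value -> positions of matching tuples in P."""
    index = defaultdict(list)
    for pos, p in enumerate(P):
        index[p[J]].append(pos)
    return index


def index_lookup_join(R, P, J, index):
    """Join R with P on J, using the index on P.J instead of scanning P."""
    result = []
    for r in R:                             # for i := 1 to M
        for pos in index.get(r[J], []):     # only the entries matching R[i].J
            result.append({**r, **P[pos]})  # P[pos] is the tuple the entry points to
    return result


doctors = [
    {"D_name": "Johnson", "Office": "Sur1-left"},
    {"D_name": "Thomson", "Office": "IC100"},
    {"D_name": "Craig", "Office": "Int-100"},   # has no patients in this sample
]
patients = [
    {"Name": "C. Reed", "Disease": "prk11", "D_name": "Thomson"},
    {"Name": "M. Fox", "Disease": "blood press", "D_name": "Johnson"},
    {"Name": "C. Fish", "Disease": "stomach-ul", "D_name": "Johnson"},
]

X = build_index(patients, "D_name")
joined = index_lookup_join(doctors, patients, "D_name", X)
print(len(joined), "joined tuples")
```

Compared with the brute-force version, the inner loop visits only the matching tuples of P, so no comparisons are wasted on non-matching pairs.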

Choose the cheapest query plan

construct query plans (query evaluation plans)
• combine candidate low-level procedures
• choose the cheapest
• total cost = the sum of individual costs
• individual costs depend on the actual data values; estimates are used instead, based on statistical data
• usually not all possible evaluation procedures are generated; the search space is reduced by applying heuristics
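The selection step can be sketched as follows, reusing the estimated costs of the two evaluations of the hospital query. The plan names and the data layout are hypothetical; a real optimiser works on far richer plan descriptions.

```python
# Each candidate plan is a list of (low-level procedure, estimated cost in tuple I/Os).
plans = {
    "restrict, then join": [
        ("restrict Patients", 2500),
        ("join result with Doctors", 100),
    ],
    "join, then restrict": [
        ("join Patients with Doctors", 3500),
        ("write intermediate result", 2500),
        ("restrict intermediate result", 2500),
    ],
}


def total_cost(plan):
    """Total cost of a plan = the sum of the costs of its individual steps."""
    return sum(cost for _, cost in plan)


cheapest = min(plans, key=lambda name: total_cost(plans[name]))
for name, plan in plans.items():
    print(f"{name}: {total_cost(plan)}")
print("chosen plan:", cheapest)
```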

Database statistics - in the data dictionary

for each base table
• cardinality
• space occupied
• etc.

for each column of each base table
• number of distinct values
• maximum, minimum and average values
• histogram of values
• …

...
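As a hedged illustration of how such statistics feed a cost estimate, the sketch below estimates the result size of an equality restriction from the table cardinality and the number of distinct values, assuming values are uniformly distributed. The figure of 50 distinct diseases is an assumption chosen for this example; it is not stated in the slides.

```python
def estimate_equality_result(cardinality: int, distinct_values: int) -> float:
    """Estimated number of tuples satisfying `column = constant`,
    assuming every distinct value is equally frequent."""
    return cardinality / distinct_values


PATIENTS_CARDINALITY = 2500   # from the hospital example
DISTINCT_DISEASES = 50        # hypothetical statistic, assumed for illustration

estimate = estimate_equality_result(PATIENTS_CARDINALITY, DISTINCT_DISEASES)
print(f"estimated tuples with Disease = 'prk11': {estimate:.0f}")
```

When a histogram of values is available, the optimiser can use the recorded frequency of the specific value instead of the uniform assumption.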

An optimiser is never perfect

the following is a real-life example

suppose a Postgres definition for
• base relation: Treatment(Patient, Drug, Disease, …)

the query
• get all the drugs that are taken by patients who suffer from prk11
• (all the drugs, not only those for prk11)

SELECT DISTINCT Drug FROM Treatment
WHERE Patient IN
    (SELECT Patient FROM Treatment
     WHERE Disease = 'prk11') ;

the query is far slower than the equivalent one (next) ...

An optimiser is never perfect

/* this query is faster than the previous one, even though it appears to perform more computation - Patient is not unique! */

CREATE VIEW V_Treatment AS SELECT * FROM Treatment ;

SELECT DISTINCT Treatment.Drug
FROM Treatment, V_Treatment
WHERE Treatment.Patient = V_Treatment.Patient
AND V_Treatment.Disease = 'prk11' ;
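That the two formulations return the same drugs can be checked on a small sample. The sketch below uses Python's built-in sqlite3 module rather than Postgres, with made-up rows and drug names, and qualifies the Disease column explicitly to avoid ambiguity; it demonstrates equivalence of results only and says nothing about the timing difference observed on Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Treatment (Patient TEXT, Drug TEXT, Disease TEXT);
    -- hypothetical sample rows, for illustration only
    INSERT INTO Treatment VALUES
        ('C. Reed', 'drug-a', 'prk11'),
        ('C. Reed', 'drug-b', 'blood press'),
        ('M. Fox',  'drug-c', 'blood press'),
        ('P. Wolf', 'drug-d', 'prk11');
    CREATE VIEW V_Treatment AS SELECT * FROM Treatment;
""")

subquery_form = """
    SELECT DISTINCT Drug FROM Treatment
    WHERE Patient IN
        (SELECT Patient FROM Treatment WHERE Disease = 'prk11')
"""
join_form = """
    SELECT DISTINCT Treatment.Drug
    FROM Treatment, V_Treatment
    WHERE Treatment.Patient = V_Treatment.Patient
      AND V_Treatment.Disease = 'prk11'
"""

drugs_subquery = {row[0] for row in conn.execute(subquery_form)}
drugs_join = {row[0] for row in conn.execute(join_form)}
print(sorted(drugs_subquery))
print(drugs_subquery == drugs_join)
```

Note that drug-b is included even though it is prescribed for a different disease, because its patient also suffers from prk11.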