ECE 569 Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

29
Spring 2004 ECE569 Lecture 06.1 ECE 569 Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization Yanyong Zhang www.ece.rutgers.edu/~ yyzhang Course URL www.ece.rutgers.edu/~yyzhang/spring04

description

ECE 569 Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization. Yanyong Zhang www.ece.rutgers.edu/~yyzhang Course URL www.ece.rutgers.edu/~yyzhang/spring04. Select Operation. simple condition (C=A  V) - PowerPoint PPT Presentation

Transcript of ECE 569 Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Page 1: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.1

ECE 569 Database System Engineering

Spring 2004

Topic VIII: Query Execution and optimization

Yanyong Zhang www.ece.rutgers.edu/~yyzhang

Course URL www.ece.rutgers.edu/~yyzhang/spring04

Page 2: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.2

Select Operation

simple condition (C=A V) Selectivity of condition C for relation R = |{t  R | C(t)}| / |R|

- The number of records satisfying a condition divided by number of tuples

- If there are i distinct values of V with uniform distribution, average selectivity is 1/i.

Linear Search

- Retrieve every tuple in relation and test predicate

- Cost = N_Pages

Equality predicate with index on A

- Locate all the tuples with search key V using the index

- Cost = ??

Inequality with B+-tree on A

- Locate first tuple with t with search ky V using index

- Retrieve tuples in ascending order starting with t if is . Otherwise retrieve tuples in desending order

- Cost = ?? (selectivity)

Page 3: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.3

Select operation (cont’d)

Conjunctive conditions (C = C1 C2 ... Ck )

Use one of the access methods above to retrieve tuples satisfying Ci. For each tuple, test remaining conditions.

- Choose Ci with lowest selectivity to reduce cost

If secondary indices containing tuple pointers exist for all or some of the attributes, retrieve all pointers that satisfy individual conditions. Intersect pointers and retrieve tuples.

Disjunctive Conditions (C1 C2 ... Ck)

If there is an access path for every condition, then select records and perform union to eliminate duplicates.

Page 4: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.4

Join Operation

T = R >< S

Nested loop Algorithmwhile records remain in R do

fetch next record r from Rwhile records remain in S do

fetch next record s from Sif (r[A] == s[B]) then

insert t into T where t[R] = r and t[S] =s

endend

Analysis

- rR # of records in R

- bR # of blocks in R

- Cost = rR(bs+1)

A = B

Page 5: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.5

Join Operation (cont’d)

T = R >< S

Nested loop with multiple buffers Use one buffer to sequence through blocks of S

Use nb-2 buffers for Rwhile records remain in R do

read in nb-2 buffers of tuples from Rwhile records remain in S do

fetch next record s from Sfor every record r of R in a buffer do

if (r[A] == s[B]) then insert t into Tend for

end whileend while

- Every block of R is read only once

- Every block in S is read bR/(nb-2)

- Cost = bR + bS/(nb-2)

- Outer loop should scan smaller relation

A = B

Page 6: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.6

Join operation (cont’d)

Index method Requires an index (or hash key) for one of the join

attributes. (Assume there is an index on B of S)

Algorithmwhile records remain in R do

fetch next tuple from Ruse index to retrieve all records with key r(B) in Sfor each record s retrieved do

insert t into Tend for

end while

Analysis

- xB average # of block accesses to retrieve a record using access path for attribute B

- Cost = bR + rR xB + bT

- These disk accesses may be slower than those from nested join.

Page 7: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.7

Join operation (cont’d)

Sort-merge join Requires that records in R be ordered by their values in A

and that S be ordered according to B.

Algorithm below assumes A and B are candidate keys.fetch next record r from Rfetch next record s from Swhile (r NULL and s NULL) do

if(r(A) > s(B)) then fetch next record s from Selse if (r(A) < s(B)) then fetch next record r

from Relse /* r(A) == s(B) */

insert t into Tfetch next record r from Rfetch next record s from S

end while

Analysis

- Records of each file are scanned only once

- Cost = bR + bs + bT

Page 8: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.8

Projection operations

Projection - p(R)

P includes a candidate key for R

- No need to check for duplicates

Otherwise, one of following can be used to eliminate duplicates

- If result is hashed, check for duplicates as tuples are inserted

- Sort resulting relation and eliminate duplicates that are now adjacent.

Page 9: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.9

Set operations

R S Hash records of R and S to same file. Do not insert

duplicates

Concatenate files, sort, and remove adjacent duplicates

R S Scan smaller file, attempt to locate each record in larger

file. (If found, add tuple to result)

R – S Copy records from R to result

Hash records of S to result. If tuple is found, delete it

Page 10: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.10

Query Optimization rule of thumb

R1: selections and projections are processed on the fly and almost never generate intermediate relations. Selections are processed as relations are accessed for the first time. Projections are processed as the results of other operators are generated.

R2: Cross products are never formed, unless the query itself asks for them. Relations are always combined through joins in the query.

R3: The inner operand of each join is a database relation, never an intermediate result.

Page 11: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.11

Heuristic Optimization of Query Trees

Consider following schemacustomers (cid, cname, ccity, discnt)

products (pid, pname, pcity, pquantity, price)

agents (aid, aname, acity, percent)

orders (ordno, month, ocid, oaid, opid, quantity, oprice)

QueryR:(select pid from products)

except

(select opid

from customers, orders, agents

where ccity = “Duluth” and

acity = “New York” and

cid = ocid and

aid = oaid)

Page 12: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.12

Heuristic (cont’d)

Translate the query into algebra

R: pid (products)-

opid (city=“Duluth” acity=“New York” cid = ocid aid = oaid

((customers X orders) X agents)

Query Tree -pid opid

products city=“Duluth” acity=“New York”

cid = ocid aid = oaid

X

X

customers orders agents

Page 13: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.13

Heuristic (cont’d)

1. Replace F F’(R) with F(F’(R)) wherever possible (allow flexibility in scheduling selects)

-

products

pid opid

city=“Duluth”

acity=“New York”

cid = ocid

aid = oaid

X

X

customers orders agents

Page 14: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.14

Heuristic (cont’d)

2. Move select operations as close to leaves as possible

-

products

pid opid

city=“Duluth”

acity=“New York”

cid = ocid

aid = oaid

X

X

customers

orders agents

Page 15: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.15

Heuristic (cont’d)

3. Rearrange tree so that most restrictive select executes first. Most restrictive select produces smallest result, or is one with smallest selectivity.

Assume most restrictive select is probably acity=“New York

-

products

pid opid

city=“Duluth”

acity=“New York”

cid = ocid

aid = oaid

X

X

customersorders

agents

Page 16: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.16

Heuristic (cont’d)

4. Replace cartesian product and adjacent select with join

-

products

pid opid

city=“Duluth”

acity=“New York”

cid = ocid

aid = oaid

customersorders

agents

><

><

Page 17: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.17

Heuristic (cont’d)

5. Project out unnecessary attributes as soon as possible.

-

products

pid opid

city=“Duluth”

acity=“New York”

cid = ocid

aid = oaid

customersorders

agents

><

><

aid oaid,ocid,opid

cid

Page 18: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.18

Heuristic (cont’d)

6.Map subtrees to execution methods such as: A single selection or projection

A selection followed by a projection

A join, union, or set difference with two operands. Each input can be preceded by selections and/or projections. The output can also be followed by a selection and/or projection.

Page 19: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.19

Heuristic (cont’d)

-

products

pid opid

city=“Duluth”

acity=“New York”

cid = ocid

aid = oaid

customersorders

agents

><

><

aid oaid,ocid,opid

cid

Page 20: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.20

Example

Characteristics of DBMS Available join methods – (1) nested loop; (2) sort-merge

join

QuerySELECT emp.Name, dept.name, acnt.type

FROM emp, dept, acnt

WHERE emp.dno = dept.dno AND

dept.ano = acnt.ano AND

emp.age 50 AND

acnt.balance 10000

Page 21: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.21

Example (cont’d)

name,dname,type (emp.dno=dept.dno dept.ano=acnt.ano emp.age50

acnt.balance10000((emp x dept) x acnt))

Now decide the order of join

acnt is the third relation

- emp >< dept or dept >< emp

emp is the third relation

- dept >< acnt or acnt >< dept

emp.age >= 50

emp

><

acnt.balance >= 10000

acnt

dept

Page 22: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.22

Relations emp(name, age, sal, dno)

Pages – 20, 000 Number of tuples – 100,000 Indexes – dense clustered B+-tree on sal (3-level deep)

dept(dno, dname, floor, budget, mgr, ano) Pages – 10 Number of tuples – 100 Indexes – dense clustered hash table on dno (avg bucket length = 1.2

pages)

acnt (ano, type, balance, bno) Pages – 100 Number of tuples – 1000 Indexes – dense clustered B+-tree on balance (3-level deep)

bank (bno, bname, address) Pages – 20 Number of tuples – 200 Indexes – none

Page 23: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.23

Histograms

emp.page (assume uniform distribution)

range frequency

0< x 10 0

10< x 20 500

20< x 30 2500

30< x 40 4000

40< x 50 2000

50< x 60 800

60< x 70 200

Acnt.balance

range frequency

0< x 100 200 / 100 = 2

100< x 500 200 / 400 = 0.5

500< x 5000 200 / 4500 = 0.044

5000< x 10000 200 / 5000 = 0.04

10000< x 50000 200 / 40000 = 0.005

50000< x < 0

Page 24: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.24

Method Cost Order Result Size Comment

SCAN 20,000 none 12,000 tuples2,400 pages

Size reduced by selectivity of age <= 50

Plans for accessing relation

Plans for retrieving tuples from emp, dept and acnt.

EMP

DEPT

Method Cost Order Result Size Comment

SCAN 10 none 100 tuples10 pages

 

Page 25: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.25

Plans for accessing relation (cont’d)

Method Cost Order Result Size Comment

SCAN 100 none 200 tuples20 pages

Size corrected for selectivity of balance >= 10000

B+-tree on balance

3+20 =23 none 200 tuples20 pages

 

ACNT

Page 26: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.26

Plans for joining two relations

Method Cost Order Result Size

Comment

Nested Loop (page oriented)

20000 + 2400*10 = 44000

none 12000 tuples2400 pages

Assume that tuple size is same as EMP tuples.

Nested Loop using hash table on dno

20000 + 12000*(1 + 1.2 + 1) = 58400

none 12000 tuples2400 pages

 

EMP >< DEPT

Page 27: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.27

Plans for joining two relations (cont’d)

DEPT >< EMP

Method Cost Order Result Size

Comment

Nested Loop (page oriented)

10 + 10 * 20000 = 200010

none 12000 tuples2400 pages

 

Page 28: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.28

Method Cost Order Result Size

Comment

Nested Loop (page oriented)

10 + 10 * 100 = 10010

none 200 tuples20 pages

 

Plans for joining two relations (cont’d)

DEPT >< ACNO

ACNO >< DEPT

Method Cost Order Result Size

Comment

Nested Loop (page oriented)

23 + 20 * 10 = 223

none 200 tuples20 pages

 

Page 29: ECE 569  Database System Engineering Spring 2004 Topic VIII: Query Execution and optimization

Spring 2004ECE569 Lecture 06.29

Plans for joining the third relation to the other two

Think on your own …