Query optimisation

  • date post

    17-Dec-2014
  • Category

    Education

Transcript of Query optimisation

1. CS263 Query Optimisation

2. LECTURE PLAN

  • Motivation for Query Optimisation
  • Phases of Query Processing
  • Query Trees
  • RA Transformation Rules
  • Heuristic Processing Strategies
  • Cost Estimation for RA Operations

3. Motivation for Query Optimisation

List all the managers that work in the sales department.

SELECT * FROM emp, dept WHERE emp.deptno = dept.deptno AND emp.job = 'Manager' AND dept.name = 'Sales';

There are at least three alternative ways of representing this query as a Relational Algebra expression:

  • $\sigma_{(job = \text{'Manager'}) \land (name = \text{'Sales'}) \land (emp.deptno = dept.deptno)}(EMP \times DEPT)$
  • $\sigma_{(job = \text{'Manager'}) \land (name = \text{'Sales'})}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT)$
  • $(\sigma_{job = \text{'Manager'}}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\sigma_{name = \text{'Sales'}}(DEPT))$

4. Motivation for Query Optimisation

$\sigma_{(job = \text{'Manager'}) \land (name = \text{'Sales'}) \land (emp.deptno = dept.deptno)}(EMP \times DEPT)$

Metrics:

  • 1000 tuples in the EMP relation
  • 50 tuples in the DEPT relation
  • 50 employees are Managers (one per department)
  • 5 separate Sales departments (across the country)

Cost of processing this query alternative:

  • Cartesian product of EMP and DEPT: (1000 + 50) record I/Os to read the relations + (1000 * 50) record I/Os to create an intermediate relation to store the result
  • Selection on the result of the Cartesian product: (1000 * 50) record I/Os to read the tuples and compare them against the predicate

Total cost of the query: (1000 + 50) + 2*(1000 * 50) = 101,050 record I/Os.

5. Motivation for Query Optimisation

Metrics:

  • 1000 tuples in the EMP relation
  • 50 tuples in the DEPT relation
  • 50 employees are Managers (one per department)
  • 5 separate Sales departments (across the country)

Cost of processing the following query alternative:

  • Join of EMP and DEPT over deptno: (1000 + 50) record I/Os to read the relations + (1000) record I/Os to create an intermediate relation to store the join result
  • Selection on the result of the join: (1000) record I/Os to read each tuple and compare it against the predicate

Total cost of the query: (1000 + 50) + 2*(1000) = 3,050 record I/Os.
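The arithmetic for the Cartesian-product and join alternatives can be checked with a short sketch. The function names and the cost model (one record I/O per tuple read or written, every intermediate relation written once and read once, one join tuple per employee) are assumptions read off the slide figures, not something the lecture defines as code.

```python
# Record-I/O estimates for the first two query alternatives.
# Assumed cost model: one record I/O per tuple read or written; each
# intermediate relation is written once and then read once.

EMP_TUPLES = 1000   # tuples in the EMP relation
DEPT_TUPLES = 50    # tuples in the DEPT relation


def cost_product_then_select(emp: int, dept: int) -> int:
    """Cartesian product of EMP and DEPT, then one selection over the result."""
    read_inputs = emp + dept
    write_product = emp * dept   # store the intermediate relation
    read_product = emp * dept    # the selection reads every product tuple
    return read_inputs + write_product + read_product


def cost_join_then_select(emp: int, dept: int) -> int:
    """Join over deptno, then selection; assumes one join tuple per employee."""
    read_inputs = emp + dept
    write_join = emp             # each employee matches exactly one department
    read_join = emp              # the selection reads every join tuple
    return read_inputs + write_join + read_join


if __name__ == "__main__":
    print(cost_product_then_select(EMP_TUPLES, DEPT_TUPLES))  # 101050
    print(cost_join_then_select(EMP_TUPLES, DEPT_TUPLES))     # 3050
```

Replacing the Cartesian product with a join shrinks the intermediate relation from 50,000 tuples to 1,000, which accounts for almost all of the saving.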
Query alternative costed on slide 5: $\sigma_{(job = \text{'Manager'}) \land (name = \text{'Sales'})}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT)$

6. Motivation for Query Optimisation

Cost of processing the following query:

$(\sigma_{job = \text{'Manager'}}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\sigma_{name = \text{'Sales'}}(DEPT))$

  • Select Managers in EMP: (1000) record I/Os to read the relation + (50) record I/Os to create an intermediate relation to store the select result
  • Select Sales in DEPT: (50) record I/Os to read the relation + (5) record I/Os to create an intermediate relation to store the select result
  • Join of the previous two selections over deptno: (50 + 5) record I/Os to read the relations

Total cost of the query: (1000 + 2*(50) + 5 + (50 + 5)) = 1,160 record I/Os.

7. Phases of Query Processing

8. Query Processing Stage - 1

Cast the query into internal form

This involves the conversion of the original (SQL) query into some internal representation more suitable for machine manipulation. The internal representation typically chosen is either some kind of abstract syntax tree, or a relational algebra query tree.

9. Relational Algebra Query Trees

A Relational Algebra query can be represented as a query tree. For example, the query to list all the managers that work in the sales department could be described as one of the following:

$\sigma_{(job = \text{'Manager'}) \land (name = \text{'Sales'}) \land (emp.deptno = dept.deptno)}(EMP \times DEPT)$

[Query tree: the leaves are EMP and DEPT; the intermediate operation is the Cartesian product X; the root is the selection (job = 'Manager') AND (name = 'Sales') AND (emp.deptno = dept.deptno).]

11.
Relational Algebra Query Trees

Alternative query tree for the query to list all the managers that work in the sales department:

$\sigma_{(job = \text{'Manager'}) \land (name = \text{'Sales'})}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT)$

[Query tree: the leaves are EMP and DEPT; the intermediate operation is the join on emp.deptno = dept.deptno; the root is the selection (job = 'Manager') AND (name = 'Sales').]

12. Relational Algebra Query Trees

Alternative query tree for the query to list all the managers that work in the sales department:

$(\sigma_{job = \text{'Manager'}}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\sigma_{name = \text{'Sales'}}(DEPT))$

[Query tree: the leaves are EMP and DEPT; the selection (job = 'Manager') is applied to EMP and the selection (name = 'Sales') to DEPT; the root is the join on emp.deptno = dept.deptno.]

13. Query Processing Stage - 2

Convert to canonical form

Find a more efficient representation of the query by converting the internal representation into some equivalent (canonical) form through the application of a set of well-defined transformation rules. The set of transformation rules to apply will generally be the result of the application of specific heuristic processing strategies associated with particular DBMSs.

14. Transformation Rules for RA Operations

Rule 1. Conjunctive selection operations can cascade into individual selection operations (and vice versa). Sometimes referred to as cascade of selection.

$\sigma_{p \land q \land r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$

Example: $\sigma_{deptno = 10 \land sal > 1000}(Emp) = \sigma_{deptno = 10}(\sigma_{sal > 1000}(Emp))$

15. Transformation Rules for RA Operations

Rule 2. Commutativity of selection.

$\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$

Example: $\sigma_{sal > 1000}(\sigma_{deptno = 10}(Emp)) = \sigma_{deptno = 10}(\sigma_{sal > 1000}(Emp))$

16. Transformation Rules for RA Operations

Rule 3. In a sequence of projection operations, only the last in the sequence is required.

$\Pi_L(\Pi_M(\ldots \Pi_N(R))) = \Pi_L(R)$

Example: $\Pi_{deptno}(\Pi_{deptno, name}(Dept)) = \Pi_{deptno}(Dept)$

17. Transformation Rules for RA Operations

Rule 4. Commutativity of selection and projection.

$\Pi_{A_1, \ldots, A_m}(\sigma_p(R)) = \sigma_p(\Pi_{A_1, \ldots, A_m}(R))$, where the selection predicate $p$ is only made up of projected attributes, that is, attributes in $\{A_1, A_2, \ldots, A_m\}$.

Example: $\Pi_{name, job}(\sigma_{name = \text{'Smith'}}(Emp)) = \sigma_{name = \text{'Smith'}}(\Pi_{name, job}(Emp))$

18. Transformation Rules for RA Operations

Rule 5. Commutativity of theta-join (and Cartesian product).
$R \bowtie_p S = S \bowtie_p R$

$R \times S = S \times R$

Example: $EMP \bowtie_{emp.deptno = dept.deptno} DEPT = DEPT \bowtie_{emp.deptno = dept.deptno} EMP$

NOTE: Theta-join is a generalisation of both the equi-join and natural-join.

19. Transformation Rules for RA Operations

Rule 6. Commutativity of selection and theta-join (or Cartesian product).

$(\sigma_p(R)) \bowtie_r S = \sigma_p(R \bowtie_r S)$, where the selection predicate $p$ is only made up of attributes $\{A_1, A_2, \ldots, A_m\}$ of $R$, one of the relations being joined.

Example: $(\sigma_{emp.deptno = 10}(EMP)) \bowtie_{emp.deptno = dept.deptno} DEPT = \sigma_{emp.deptno = 10}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT)$

20. Transformation Rules for RA Operations

Rule 7. Commutativity of projection and theta-join (or Cartesian product).

$\Pi_L(R \bowtie_r S) = (\Pi_{L_1}(R)) \bowtie_r (\Pi_{L_2}(S))$

Project attributes $L = L_1 \cup L_2$, where $L_1$ are attributes of $R$ and $L_2$ are attributes of $S$. $L$ must also contain the join attributes.

Example: $\Pi_{job, location, deptno}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT) = (\Pi_{job, deptno}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\Pi_{location, deptno}(DEPT))$

21. Transformation Rules for RA Operations

Rule 8. Commutativity of union and intersection (but not set difference).

$R \cup S = S \cup R$

$R \cap S = S \cap R$

22. Transformation Rules for RA Operations

Rule 9. Commutativity of selection and set operations (union, intersection, and set difference).

  • Union: $\sigma_p(R \cup S) = \sigma_p(S) \cup \sigma_p(R)$
  • Intersection: $\sigma_p(R \cap S) = \sigma_p(S) \cap \sigma_p(R)$
  • Set difference: $\sigma_p(R - S) = \sigma_p(R) - \sigma_p(S)$

23. Transformation Rules for RA Operations

Rule 10. Commutativity of projection and union.

$\Pi_L(R \cup S) = \Pi_L(S) \cup \Pi_L(R)$

24. Transformation Rules for RA Operations

Rule 11. Associativity of natural join (and Cartesian product).

  • Natural join: $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
  • Cartesian product: $(R \times S) \times T = R \times (S \times T)$

25. Transformation Rules for RA Operations

Rule 12. Associativity of union and intersection (but not set difference).

  • Union: $(R \cup S) \cup T = S \cup (R \cup T)$
  • Intersection: $(R \cap S) \cap T = S \cap (R \cap T)$

26. Heuristic Processing Strategies

  • Perform selection operations as early as possible.
  • Translate a Cartesian product and subsequent selection (whose predicate represents a join condition) into a join operation.
  • Use associativity of binary operations to ensure that the most restrictive selection operations are executed first.
  • Perform projections as early as possible.
  • Compute common expressions once.

27. Heuristic Processing - Example

[Figure: a sequence of query trees over EMP and DEPT, built from the Cartesian product X, the join on emp.deptno = dept.deptno, and the selections (job = 'Manager') and (name = 'Sales'); one tree is labelled "Optimised Canonical Query".]

28. Query Processing Stage - 3

Choose candidate low-level procedures

Consider the (optimised canonical) query as a series of low-level operations (join, restrict, etc.). For each of these operations, generate alternative execution strategies and calculate the cost of such strategies on the basis of statistical information held about the database tables (files).

29. Query Processing Stage - 4

Generate query plans and choose the cheapest

Construct a set of candidate Query Execution Plans (QEPs). Each QEP is constructed by selecting a candidate implementation procedure for each operation in the canonical query and then combining them to form a string of associated operations. Each QEP will have an (estimated) cost associated with it: the sum of the costs of its operations. Choose the QEP with the least cost.

30. Cost Based Optimisation

Cost Based Optimisation (stages 3 & 4)

A good declarative query optimiser does not rely solely on heuristic processing strategies. It chooses the QEP with the lowest estimated cost. After heuristic rules are applied to a query, there still remain a number of alternative ways to execute it.
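The plan selection described in stages 3 and 4 can be sketched as an exhaustive search over candidate procedures. Everything named here is hypothetical: the operation names, the procedure names and the cost figures are invented for illustration and do not come from the lecture or from any particular DBMS.

```python
from itertools import product

# Hypothetical estimated costs (record I/Os) of candidate procedures
# for each operation of a canonical query. All figures are invented.
CANDIDATES = {
    "select_emp": {"linear_scan": 1000, "index_lookup": 60},
    "select_dept": {"linear_scan": 50, "index_lookup": 8},
    "join": {"nested_loop": 250, "sort_merge": 120},
}


def enumerate_plans(candidates):
    """Yield (plan, cost) pairs; a plan picks one procedure per operation."""
    operations = list(candidates)
    options = (candidates[op].items() for op in operations)
    for choice in product(*options):
        plan = {op: name for op, (name, _) in zip(operations, choice)}
        cost = sum(op_cost for _, op_cost in choice)  # QEP cost = sum of its parts
        yield plan, cost


def cheapest_plan(candidates):
    """Return the (plan, cost) pair with the lowest estimated cost."""
    return min(enumerate_plans(candidates), key=lambda pair: pair[1])


if __name__ == "__main__":
    plan, cost = cheapest_plan(CANDIDATES)
    print(plan, cost)
```

Enumerating every combination is only workable for small queries, because the number of plans is the product of the number of candidates per operation.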
The Query Optimiser estimates the cost of executing each one (or at least a number) of these alternatives, and selects the cheapest one.

31. Costs associated with query execution

Secondary s