ICOM 6005 – Database Management Systems Design
Dr. Manuel Rodríguez-Martínez
Electrical and Computer Engineering Department
Lecture 15 – Query Optimization
ICOM 6005 Dr. Manuel Rodriguez Martinez 2

Query Optimization
• Read:
  – Chapter 12, sec. 12.4
  – Chapter 15
  – SAC+79 pages
• Purpose:
  – Study different algorithms to optimize queries submitted to the DBMS

Introduction
• An SQL query gets translated into a relational algebra expression
• The relational algebra expression is represented as a tree
  – This is what the DBMS "understands" how to process
  – The expression becomes a plan once we identify access methods for each operator
• A relational algebra expression might have equivalent expressions
  – Example: R(A,B,C), S(A,D,F)

$$\pi_{R.A,\,R.B,\,R.C,\,S.D,\,S.F}\left(\sigma_{R.A=S.A}(R \times S)\right) = R \bowtie S$$

• But each expression might have a different cost
• How do we find the cheapest expression?

Relational DBMS Architecture
[Figure: relational DBMS architecture. Components shown: Client, Client API, Query Parser, Query Optimizer, Execution Engine, Relational Operators, File and Access Methods, Buffer Management, Disk Space Management, Concurrency and Recovery, DB.]

Query Optimizer
• Module in the DBMS in charge of finding the cheapest available plan to execute a query
• Building one is not easy!
• The optimizer searches for plans and compares them based on cost
• Cost can be:
  – Resource usage
  – Response time
  – Power consumption
  – Number of I/Os
  – Network transmission cost
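
Query Plans
• A query plan specifies the operations to be executed
• It is a tree of operators
  – Each operator corresponds to a relational operator
• Leaf nodes usually represent base tables
[Figure: example query plan trees: joins over base tables R, S, and T, and plans over S that apply a projection on A,B and the selection σ A>2.]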

Executing Query Plans
• Plans generated by the optimizer are fed to the execution engine
• Plans support an iterator interface
  – Open: initialize the operator
  – Next: get the next tuple from the operator
  – Close: de-allocate the operator's resources
• The execution engine invokes each method
• Each invocation triggers a cascade of calls
  – Each operator calls the corresponding methods on its child nodes
  – Example: open on a join causes a call to open on the outer table and a call to open on the inner table. The same holds for next and close.
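A minimal Python sketch of the open/next/close iterator interface. The classes Scan, Select, and NestedLoopsJoin and the helper run are hypothetical names chosen for illustration; base tables are in-memory lists of tuples, and None signals end of input. Calling a method on the root of the plan triggers the cascade of calls to the children described above.

```python
class Scan:
    """Leaf operator over an in-memory list that stands in for a base table."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0

    def next(self):
        # Return the next tuple, or None when the input is exhausted
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = None


class Select:
    """Filter operator: pulls from its child until a tuple satisfies pred."""
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def open(self):
        self.child.open()                 # open cascades to the child

    def next(self):
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None

    def close(self):
        self.child.close()                # close cascades to the child


class NestedLoopsJoin:
    """Tuple-at-a-time nested loops join; calls cascade to both inputs."""
    def __init__(self, outer, inner, pred):
        self.outer, self.inner, self.pred = outer, inner, pred
        self.cur = None

    def open(self):
        self.outer.open()
        self.inner.open()
        self.cur = self.outer.next()

    def next(self):
        while self.cur is not None:
            row = self.inner.next()
            if row is None:               # inner exhausted: rewind it, advance outer
                self.inner.close()
                self.inner.open()
                self.cur = self.outer.next()
            elif self.pred(self.cur, row):
                return self.cur + row
        return None

    def close(self):
        self.outer.close()
        self.inner.close()


def run(plan):
    """What the execution engine does: open the root, pull with next, close."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


R = [(1, "a"), (2, "b"), (3, "c")]
S = [(1, "x"), (3, "y")]
plan = NestedLoopsJoin(Select(Scan(R), lambda r: r[0] > 1), Scan(S),
                       lambda r, s: r[0] == s[0])
print(run(plan))   # [(3, 'c', 3, 'y')]
```

Rewinding the inner input by closing and reopening it is a simplification; a real engine would typically rescan a buffered or materialized inner instead.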

Pipelined vs Materialized Execution
• Pipelined
  – The output tuple from one operator immediately becomes an input tuple to its parent operator in the tree
• Materialized
  – The output tuples from one operator must first be stored on disk (into a temporary table)
  – Once the operator finishes, its parent operator can access the materialized tuples
• Most execution engines use pipelined execution
  – The savings in I/O can be substantial!
• Some operators cannot be pipelined
  – Sorting, projections with duplicate elimination
• The query optimizer must be aware of this issue!
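The difference between pipelined and blocking operators can be seen with Python generators in a small sketch. CountingScan, select, project, and sort are hypothetical helpers; a Python list stands in for the temporary table that a materialized plan would write to disk.

```python
class CountingScan:
    """Base-table scan that counts how many tuples it has produced."""
    def __init__(self, rows):
        self.rows, self.consumed = rows, 0

    def __iter__(self):
        for r in self.rows:
            self.consumed += 1
            yield r


def select(pred, child):
    # Pipelined: each qualifying tuple is passed up as soon as it is seen
    for r in child:
        if pred(r):
            yield r


def project(cols, child):
    # Pipelined projection (no duplicate elimination)
    for r in child:
        yield tuple(r[c] for c in cols)


def sort(child, key=None):
    # Blocking: must consume its entire input before emitting the first tuple
    return iter(sorted(child, key=key))


rows = [(3, 5), (1, -2), (2, 7)]

# Pipelined plan project(select(scan)): first result after reading one tuple
src = CountingScan(rows)
first = next(project([0], select(lambda r: r[1] > 0, src)))
print(first, src.consumed)        # (3,) 1

# Plan with a blocking sort: all input is read before the first output tuple
src2 = CountingScan(rows)
first2 = next(sort(select(lambda r: r[1] > 0, src2)))
print(first2, src2.consumed)      # (2, 7) 3

# Materialized alternative: the selection's whole output is stored first
temp = list(select(lambda r: r[1] > 0, CountingScan(rows)))
print(list(project([0], temp)))   # [(3,), (2,)]
```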

Generation of Query Plans
• The optimizer generates the query plan after a search finds the optimal one
  – According to some criteria
• The search is a search by construction
  – Alternative plans are built and compared
  – The cheapest one is kept
• Two major algorithms exist
  – Dynamic programming (SAC+79)
    • Exhaustive search of the plan space
    • Finds the optimal plan
  – Randomized algorithms
    • Random search of the plan space
    • Quickly finds a sub-optimal but good plan
• Optimization philosophy: find a good plan quickly, avoid bad ones!
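One simple randomized strategy, iterative improvement (random restarts plus downhill moves), can be sketched in Python as below. It is only one of several randomized approaches (simulated annealing is another), and for brevity it searches left-deep join orders only. The toy cost function (sum of intermediate result sizes from hypothetical cardinalities and pairwise selectivities) and the names order_cost and iterative_improvement are assumptions for illustration.

```python
import random
from itertools import permutations


def order_cost(order, card, sel):
    """Toy cost: sum of intermediate result sizes of a left-deep join order.
    A table pair missing from sel is treated as a cross product (selectivity 1)."""
    size = float(card[order[0]])
    total = 0.0
    for k in range(1, len(order)):
        t = order[k]
        f = 1.0
        for prev in order[:k]:
            f *= sel.get(frozenset((prev, t)), 1.0)
        size = size * card[t] * f
        total += size
    return total


def iterative_improvement(tables, card, sel, restarts=10, moves=50, seed=0):
    """Random restarts, each followed by downhill swap moves."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(restarts):
        order = list(tables)
        rng.shuffle(order)                       # random starting plan
        cost = order_cost(order, card, sel)
        for _ in range(moves):
            i, j = rng.sample(range(len(order)), 2)
            cand = order[:]
            cand[i], cand[j] = cand[j], cand[i]  # neighbor: swap two tables
            c = order_cost(cand, card, sel)
            if c < cost:                         # keep only improving moves
                order, cost = cand, c
        if cost < best_cost:
            best, best_cost = order, cost
    return best, best_cost


card = {"R": 1000, "S": 100, "T": 10, "U": 50}   # hypothetical cardinalities
sel = {frozenset(("R", "S")): 0.01,              # hypothetical selectivities
       frozenset(("S", "T")): 0.1,
       frozenset(("T", "U")): 0.02}
order, cost = iterative_improvement(list(card), card, sel)
exhaustive = min(order_cost(list(p), card, sel) for p in permutations(card))
print(order, cost, exhaustive)
```

The exhaustive minimum is computed here only for comparison; for a large number of tables, enumerating every order is exactly what a randomized optimizer avoids.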

Left-Deep Join vs Bushy Plans
• Query optimizers generate two major types of plans
  – Left-deep plans
  – Bushy plans
• Left-deep plans
  – Every join has a base table as the inner join table
  – Used in commercial systems (first in System R)
  – Good for dynamic programming
  – Good for optimizing resource usage
• Bushy plans
  – Joins might have intermediate tables as inputs
  – Good for randomized search
  – Used in research prototypes for distributed databases
  – Good for optimizing response time

Left-deep plans
[Figure: left-deep plans R ⋈ S, (R ⋈ S) ⋈ T, and ((R ⋈ S) ⋈ T) ⋈ U]
Each join always has a base table as its inner table
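The left-deep property can be checked recursively. This Python sketch assumes a hypothetical plan encoding: a base table is a string, and a join is an (outer, inner) pair with the inner input on the right.

```python
def is_left_deep(plan):
    """True if every join in the plan has a base table as its inner input."""
    if isinstance(plan, str):             # base table
        return True
    outer, inner = plan                   # join node: (outer, inner)
    return isinstance(inner, str) and is_left_deep(outer)


left_deep = ((("R", "S"), "T"), "U")      # ((R ⋈ S) ⋈ T) ⋈ U
bushy = (("R", "S"), ("U", "V"))          # (R ⋈ S) ⋈ (U ⋈ V)
print(is_left_deep(left_deep), is_left_deep(bushy))   # True False
```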
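
Bushy Plans
[Figure: example bushy plans over base tables R, S, T, U, and V, in which a join may take the output of another join as its inner input.]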

Left-Deep vs Bushy Plans
• Bushy plans
  – Enable parallelism in operator evaluation
  – Operators can execute at different rates
    • Good in distributed environments
  – More complicated to build (harder optimizer)
• Left-deep plans
  – Joins are run in sequence
    • Susceptible to a bottleneck at some operator
  – Simpler to build (easier optimizer)
  – Good for single-site systems
    • Everything runs on the same machine
• Commercial DBMS systems use left-deep plans

Cost of a plan
• The cost of a plan depends on the metric you wish to optimize
• Resource usage (CPU + I/O + network)
  – Cost is the sum of the resources used by each operator
• Response time
  – Cost of the slowest path in the tree
• Number of I/Os
  – Cost is the sum of the I/Os generated by each operator
• Network cost
  – Cost is the sum of the cost of moving data between operators
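The sum-based and slowest-path metrics can be contrasted on one plan tree with a small Python sketch. The per-operator costs are made-up numbers. The response-time rule assumes the inputs of an operator are produced in parallel and that the operator's own cost is added after its slowest input finishes, a simplification that ignores overlap from pipelining. Node, resource_cost, and response_time are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cost: int                              # hypothetical per-operator cost units
    children: list = field(default_factory=list)


def resource_cost(n):
    # Resource usage: every operator's work is paid for, so add everything up
    return n.cost + sum(resource_cost(c) for c in n.children)


def response_time(n):
    # Response time: inputs assumed to run in parallel, so only the slowest
    # path from this node down to a leaf counts
    return n.cost + max((response_time(c) for c in n.children), default=0)


plan = Node("join1", 10, [
    Node("join2", 5, [Node("scan R", 20), Node("scan S", 30)]),
    Node("join3", 4, [Node("scan U", 8), Node("scan V", 12)]),
])
print(resource_cost(plan))   # 89
print(response_time(plan))   # 45
```

resource_cost follows the same recursive pattern as the System R cost formula Cost(P) = operatorCost(P.root) + Cost(P.root.leftChild) + Cost(P.root.rightChild).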
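
Organization of an optimizer
[Figure: organization of an optimizer. An SQL query goes to the Query Parser, which produces a parse tree; the Query Optimizer (Plan Generator and Cost Estimator, using the Catalog through the Catalog Manager) turns the parse tree into a query plan, which is passed to the Query Execution Engine.]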

Generating Alternatives
• The optimizer uses relational equivalences to generate alternative expressions that compute the same result
• Selection equivalences:
  – Cascading selections

$$\sigma_{p_1 \wedge p_2 \wedge \cdots \wedge p_k}(R) = \sigma_{p_1}(\sigma_{p_2}(\cdots(\sigma_{p_k}(R))\cdots))$$

  – Commutative selections

$$\sigma_{p_1}(\sigma_{p_2}(R)) = \sigma_{p_2}(\sigma_{p_1}(R))$$
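The two selection equivalences can be spot-checked on a toy relation in Python. This is a sanity check on one made-up example, not a proof; the relation R(A, B), the predicates, and the select helper are hypothetical. The rules matter because they let the optimizer apply the most selective (or index-matching) predicate first.

```python
def select(pred, rel):
    # σ_pred(rel) over a list of tuples
    return [t for t in rel if pred(t)]


R = [(a, b) for a in range(5) for b in range(5)]   # toy relation R(A, B)
p1 = lambda t: t[0] > 1           # A > 1
p2 = lambda t: t[1] % 2 == 0      # B is even
p3 = lambda t: t[0] != t[1]       # A <> B

# Cascading: σ_{p1 ∧ p2 ∧ p3}(R) = σ_{p1}(σ_{p2}(σ_{p3}(R)))
conj = select(lambda t: p1(t) and p2(t) and p3(t), R)
cascade = select(p1, select(p2, select(p3, R)))

# Commuting: σ_{p1}(σ_{p2}(R)) = σ_{p2}(σ_{p1}(R))
ab = select(p1, select(p2, R))
ba = select(p2, select(p1, R))

print(sorted(conj) == sorted(cascade), sorted(ab) == sorted(ba))   # True True
```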
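
Equivalence Rules
• Projections
  – Cascading projections

$$\pi_{A_1}(R) \equiv \pi_{A_1}(\pi_{A_2}(\cdots(\pi_{A_k}(R))\cdots)) \quad \text{where } A_1 \subseteq A_2 \subseteq \cdots \subseteq A_k$$

• Joins
  – Commutative rule

$$R \bowtie S = S \bowtie R$$

  – Associative rule

$$R \bowtie (S \bowtie T) = (R \bowtie S) \bowtie T$$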
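
Equivalence Rules (2)
• Commute selections and projections

$$\pi_A(\sigma_P(R)) \equiv \sigma_P(\pi_A(R)) \quad \text{if } P \text{ references only attributes in } A \text{ (e.g. } P = A \ \mathit{op}\ \mathit{val})$$

• Pushing selections

$$\sigma_p(R \bowtie S) = \sigma_p(R) \bowtie S \quad \text{if } p \text{ references only attributes of } R$$

• Decomposing selections

$$\sigma_P(R) = \sigma_{p_1 \wedge p_2 \wedge \cdots \wedge p_k}(R), \quad P = p_1 \wedge p_2 \wedge \cdots \wedge p_k$$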
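A small Python spot check of the pushing-selections rule on made-up data, which also shows why the optimizer likes it: applying the selection first shrinks R before the join. The relations R(a, b) and S(a, d), the predicate, and the select/join helpers are hypothetical, and agreement on one example is not a proof.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]


def join(r, s, ri, si):
    # Naive equi-join on r[ri] == s[si]; output tuples are R's columns then S's
    return [a + b for a in r for b in s if a[ri] == b[si]]


R = [(k, k * 10) for k in range(100)]     # R(a, b)
S = [(k % 100, k) for k in range(300)]    # S(a, d): each a value appears 3 times

p = lambda t: t[1] < 50                   # references only R.b (position 1)

late = select(p, join(R, S, 0, 0))        # σ_p(R ⋈ S)
early = join(select(p, R), S, 0, 0)       # σ_p(R) ⋈ S

print(sorted(late) == sorted(early))      # True
print(len(join(R, S, 0, 0)), len(select(p, R)), len(early))   # 300 5 15
```

Both plans return the same 15 tuples, but the pushed-down plan joins 5 R tuples instead of producing a 300-tuple intermediate result.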

System R Optimizer
• Based on left-deep plans and dynamic programming
  – Most commercial systems use a System R type of optimizer
• Cost is based on resource usage
  – Cost = CPU cost + I/O cost
  – Given a plan P, the cost of P is computed as
    • Cost(P) = operatorCost(P.root) + Cost(P.root.leftChild) + Cost(P.root.rightChild)

Estimating Cost of Operators
• The key ingredients are selectivity factors (to estimate result sizes) and join costs
• Example: R ⋈ S
  – R has no index
    • |R| = 100,000, ||R|| = 5,000
  – S has an unclustered index on the join attribute
    • |S| = 70,000, ||S|| = 2,500
  – Which algorithm shall we use for the join?
    • Choose between BNLJ, INLJ, and GHJ with 20 buffers (B = 20)
  – What is the cost?
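A rough worked version of this exercise as a Python sketch, under stated assumptions: |X| is the number of tuples and ||X|| the number of pages; BNLJ (block nested loops join) uses B - 2 pages for each outer block; INLJ (index nested loops join) costs about 1.2 I/Os to reach an index entry plus 1 I/O to fetch the single matching S tuple through the unclustered index, i.e. 2.2 I/Os per R tuple; GHJ (Grace hash join) partitions into B - 1 buckets, assuming perfectly uniform partitions. These are estimates under those assumptions, not the slide's official answer, and the function names are hypothetical.

```python
import math


def bnlj_cost(outer_pages, inner_pages, buffers):
    # Read the outer once; scan the inner once per block of (buffers - 2) pages
    blocks = math.ceil(outer_pages / (buffers - 2))
    return outer_pages + blocks * inner_pages


def inlj_cost(outer_pages, outer_tuples, ios_per_probe):
    # Read the outer once, then probe the inner's index once per outer tuple
    return outer_pages + outer_tuples * ios_per_probe


def ghj_cost(r_pages, s_pages, buffers):
    # Partition until each partition of the smaller input fits in (buffers - 2)
    # pages. Each partitioning pass reads and writes both inputs: 2 * (M + N).
    # The final build/probe phase reads both inputs once: M + N.
    part = min(r_pages, s_pages)
    passes = 0
    while part > buffers - 2:
        part = math.ceil(part / (buffers - 1))
        passes += 1
    return (2 * passes + 1) * (r_pages + s_pages)


R_TUPLES, R_PAGES = 100_000, 5_000   # |R|, ||R||
S_PAGES = 2_500                      # ||S||
B = 20                               # buffer pages

estimates = {
    "BNLJ, S outer": bnlj_cost(S_PAGES, R_PAGES, B),
    "BNLJ, R outer": bnlj_cost(R_PAGES, S_PAGES, B),
    "INLJ, R outer": round(inlj_cost(R_PAGES, R_TUPLES, 2.2)),   # assumed 2.2 I/Os per probe
    "GHJ": ghj_cost(R_PAGES, S_PAGES, B),
}
for name, ios in estimates.items():
    print(f"{name}: {ios:,} I/Os")
```

Under these assumptions GHJ is the cheapest option (37,500 I/Os). Because B = 20 is below sqrt(||S||) = 50, one partitioning pass leaves partitions of about 132 pages, too large for the 18 pages available for the in-memory hash table, so a second partitioning pass is needed and the cost is 5(||R|| + ||S||) rather than the usual 3(||R|| + ||S||).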

System R Search Algorithm
• Idea:
  – Build every possible plan and keep track of
    • The cheapest plan (overall)
    • The cheapest plans that bring data in sorted order (called interesting orders)
• Dynamic programming (divide-and-conquer)
  – To find the plan for an n-way join you
    • First find single-table plans
    • Then find plans for all (n-1)-way joins, and find a plan to join the missing table with an (n-1)-way join
    • Plans for smaller joins are saved in a table

System R Search Algorithm (2)
• Process:
  – For an n-way join between tables R1, R2, …, Rn:
    • Find the access path to access each table
      – Plan the access to get R1, R2, …, Rn
        » This includes applying selections and projections for each table
    • Find the access path to compute 2-way joins
      – 2-way joins for all possible pairs of tables
    • Find the access path to compute 3-way joins
      – Add a table to a 2-way join (forms all possible 3-way joins)
    • Find the access path to compute 4-way joins
      – Add a table to a 3-way join (forms all possible 4-way joins)
    • …
    • Find the access path to compute the n-way join
      – Add a table to an (n-1)-way join (forms all possible n-way joins)

System R Search Algorithm (2)

    Plan SystemROptimizer(R1, R2, …, Rn) {
        for (int i = 1; i <= n; ++i) {                    // single-table access paths
            optPlan(Ri) = selectPlan(Ri);
        }
        for (int i = 2; i <= n; ++i) {                    // join access paths: 2 tables, then 3, …, then n
            for all S ⊆ {R1, R2, …, Rn} s.t. |S| == i {   // S is the next set to join
                bestPlan = dummy plan with infinite cost;
                for all Rj, Sj s.t. S = Sj ∪ {Rj} {       // Sj and Rj are the pieces of S
                    P = joinPlan(optPlan(Sj), optPlan(Rj));
                    if (cost(P) < cost(bestPlan)) {
                        bestPlan = P;
                    }
                }
                optPlan(S) = bestPlan;                    // remember the best plan for S
            }
        }
        S = {R1, R2, …, Rn};
        return optPlan(S);
    }
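The SystemROptimizer pseudocode can be sketched in runnable Python as follows. This is a minimal illustration under stated assumptions, not System R's actual cost model: the cost of a plan is taken to be the sum of its intermediate result sizes, result sizes come from hypothetical cardinalities and pairwise join selectivities (a missing pair means a cross product), and interesting orders are ignored, so only one best plan is kept per subset. The function name system_r_dp is hypothetical.

```python
from itertools import combinations


def system_r_dp(card, sel):
    """Left-deep join ordering by dynamic programming over table subsets.
    card: table -> estimated cardinality
    sel:  frozenset({a, b}) -> join selectivity (missing pair = 1.0)
    Returns (cost, output size, join order) for the full join."""
    tables = list(card)
    opt = {}                                    # subset -> (cost, size, order)
    for t in tables:                            # single-table access paths
        opt[frozenset([t])] = (0.0, float(card[t]), [t])
    for i in range(2, len(tables) + 1):         # 2-way, 3-way, ..., n-way joins
        for combo in combinations(tables, i):
            s = frozenset(combo)
            best = (float("inf"), 0.0, None)
            for rj in s:                        # S = Sj ∪ {Rj}; Rj is the inner base table
                sj = s - {rj}
                sub_cost, sub_size, sub_order = opt[sj]
                f = 1.0
                for t in sj:
                    f *= sel.get(frozenset((t, rj)), 1.0)
                size = sub_size * card[rj] * f
                cost = sub_cost + size          # toy cost: sum of intermediate sizes
                if cost < best[0]:
                    best = (cost, size, sub_order + [rj])
            opt[s] = best                       # saved and reused by larger subsets
    return opt[frozenset(tables)]


card = {"R": 1000, "S": 100, "T": 10}                             # hypothetical
sel = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}   # hypothetical
cost, size, order = system_r_dp(card, sel)
print(cost, order)   # cost 1100.0; S and T are joined first, R last
```

Each subset's best plan is computed once and reused, so the work grows with the number of subsets (2^n) rather than with the number of join orders (n!).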

Illustration
• How does the algorithm work for this case?
• Tables
  – R: |R| = 100,000, ||R|| = 8,000
  – S: |S| = 90,000, ||S|| = 6,000
  – T: |T| = 120,000, ||T|| = 10,000
  – U: |U| = 80,000, ||U|| = 4,000
• All tables are stored in heap files
• The DBMS has block nested loops join, hash join, and 25 free buffers
• What is the best plan for the query:
  – R ⋈ S ⋈ T ⋈ U

Illustration (2)
• How does the algorithm work for this case?
• Tables
  – R: |R| = 100,000, ||R|| = 8,000
    • R is stored on a clustered B+ tree matching the join attribute with T
  – S: |S| = 90,000, ||S|| = 6,000
    • |R| is stored on
  – T: |T| = 120,000, ||T|| = 10,000
• The DBMS has block nested loops join, indexed nested loops join, hash join, and 3 free buffers
• What is the best plan for the query:
  – R ⋈ S ⋈ T

Illustration (3)
• How does the algorithm work for this case?
• Tables
  – R: |R| = 100,000, ||R|| = 8,000
    • R is stored on a clustered B+ tree matching the join attribute with T
  – S: |S| = 90,000, ||S|| = 6,000
    • |R| is stored on
  – T: |T| = 120,000, ||T|| = 10,000
• The DBMS has block nested loops join, indexed nested loops join, hash join, and 25 free buffers
• What is the best plan for the query:
  – $\sigma_{A>3 \wedge B=\text{'NY'}}(R) \bowtie S \bowtie T$
  – If SF(A>3) = 0.10, SF(B = 'NY') = 0.05, and A>3 matches the index on R

Issues with the System R optimizer
• The algorithm performs an exhaustive search of left-deep plans
• Dynamic programming is ill-suited for optimizing response time
  – The principle of optimality does not hold
• Difficult (but not impossible) to modify for bushy plans
  – The search space is huge
  – Pruning techniques are needed to cut the number of plans stored
• Do we need exhaustive search?
  – Optimal plan vs. a sub-optimal plan that is good and quick to find
• Disaster avoidance: it is more important to avoid bad plans!!!

Alternative approaches
• Randomized query optimization
  – Use randomized algorithms to build and search plans
  – Good for bushy plans
• Rule-based query optimization
  – Use rules to guide the search and better prune the space
  – Good for applying special cases and pruning
• Parametric query optimization
  – Add run-time parameters to better capture the actual state of the system
• Multiple-query optimization
  – The optimizer takes 2 or more queries at a time for optimization