EN 600.619: Adv. Storage and TP Systems Cost-Based Query Optimization.
EN 600.619: Adv. Storage and TP Systems
The Optimization Process
• Logical query plan
  – As an expression tree
• Rewrite query plan to improve performance
• Create physical plan
  – Select algorithms to implement logical plan
An Expression Tree
SELECT title, birthdate
FROM MovieStar, StarsIn
WHERE year = 1996 AND
      gender = 'F' AND
      starName = name;
An Alternate (Better) Logical Plan
SELECT title, birthdate
FROM MovieStar, StarsIn
WHERE year = 1996 AND
      gender = 'F' AND
      starName = name;
Query Optimization Heuristics
• Push operators as far down the plan as possible
• Do selections as soon as possible
  – Reduce intermediate result sizes
• Select then project
• Perform joins as late as possible
  – They are more costly
• Group associative and commutative operators
  – Let the physical plan reorder execution
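To make the "selections early, joins late" heuristic concrete, here is a minimal Python sketch that runs the slide's query two ways over a few made-up rows. The table contents and the function names `naive_plan` and `pushed_plan` are assumptions for illustration; only the column names come from the query.

```python
# Toy rows (hypothetical); only the column names come from the slide's query.
movie_star = [
    {"name": "Ann", "gender": "F", "birthdate": "1960-01-01"},
    {"name": "Bob", "gender": "M", "birthdate": "1955-05-05"},
    {"name": "Cat", "gender": "F", "birthdate": "1970-07-07"},
]
stars_in = [
    {"title": "T1", "year": 1996, "starName": "Ann"},
    {"title": "T2", "year": 1996, "starName": "Bob"},
    {"title": "T3", "year": 1990, "starName": "Cat"},
    {"title": "T4", "year": 1996, "starName": "Cat"},
]

def naive_plan(stars, roles):
    """Cross product first, then one big selection, then the projection."""
    product = [{**s, **r} for s in stars for r in roles]
    selected = [t for t in product
                if t["year"] == 1996 and t["gender"] == "F"
                and t["starName"] == t["name"]]
    # Report the size of the largest intermediate result (the product).
    return [(t["title"], t["birthdate"]) for t in selected], len(product)

def pushed_plan(stars, roles):
    """Selections pushed below the join; the projection comes last."""
    women = [s for s in stars if s["gender"] == "F"]
    in_1996 = [r for r in roles if r["year"] == 1996]
    joined = [{**s, **r} for s in women for r in in_1996
              if r["starName"] == s["name"]]
    largest = max(len(women), len(in_1996), len(joined))
    return [(t["title"], t["birthdate"]) for t in joined], largest

naive_rows, naive_size = naive_plan(movie_star, stars_in)
pushed_rows, pushed_size = pushed_plan(movie_star, stars_in)
print(naive_rows == pushed_rows, naive_size, pushed_size)  # True 12 3
```

Both plans return the same rows, but the largest intermediate result drops from 12 tuples to 3 on this toy data.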
Improving the Plan
• Through query rewriting
• Split the selection
• Push the projection
Grouping Operators
• The physical (not logical) plan should pick the order
The Physical Plan
• Choose algorithms and estimate result sizes to generate concrete costs for a plan
• E.g. joins
  – Discipline: hash, index, sort
  – Materialize, pipeline, ripple, parallel, etc.
• Large literature on different disciplines for all operations
  – Suitable for an entire (albeit detailed) course
• Also, how to search for good plans
  – Branch and bound, hill climbing, dynamic programming, etc.
• Result size and choice of algorithm are independent
  – For relational algebra operations
Estimating Result Sizes
• Most inaccurate and difficult part of query processing
  – Cost of an operation is f(algorithm, size estimate)
  – Given exact size, costing is very accurate
• Sometimes sizing can be exact
  – Equality queries on unique attributes return 0 or 1 tuples
  – Joins on key (foreign key) fields
  – Good schema design improves query execution
• For many operations it is difficult
  – Joins: expand (cross product) or reduce (more often)
  – Range queries: produce multiple tuples
• 50% accuracy is considered good... ugh!
Problems w/ Estimating Size
• Need to know result sizes a priori
  – Known exactly only after query execution
• Techniques need to be lightweight
  – Performing I/O as part of estimation reduces query performance
• General approach
  – Statistics on underlying tables for important queries
  – Small, summary data structures (in-memory execution)
• Techniques
  – Histograms, sampling, wavelets
Histograms
SELECT Jan.day, July.day
FROM Jan, July
WHERE Jan.temp = July.temp
Join estimate per bucket = T1 × T2 / V
(product of tuple counts / bucket width)
Estimate:
5×20/10 + 10×5/10 = 15
Better than the estimate without a histogram:
245×245/100 ≈ 600
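The arithmetic above can be checked with a short Python sketch. The bucket labels `bucket_a` and `bucket_b` and both function names are hypothetical. The histograms list only the two buckets whose counts appear in the arithmetic, since the original figure is not available. Evaluated directly, 5×20/10 + 10×5/10 comes to 15.

```python
def histogram_join_estimate(hist_r, hist_s, values_per_bucket):
    """Sum T1 * T2 / V over the buckets that both histograms share."""
    shared = hist_r.keys() & hist_s.keys()
    return sum(hist_r[b] * hist_s[b] / values_per_bucket for b in shared)

def uniform_join_estimate(t_r, t_s, distinct_values):
    """Estimate with no histogram: T(R) * T(S) / V over the whole domain."""
    return t_r * t_s / distinct_values

# Tuple counts per bucket, taken from the slide's arithmetic.
jan = {"bucket_a": 5, "bucket_b": 10}
july = {"bucket_a": 20, "bucket_b": 5}

with_hist = histogram_join_estimate(jan, july, values_per_bucket=10)
without_hist = uniform_join_estimate(245, 245, distinct_values=100)
print(with_hist, without_hist)  # 15.0 600.25
```

The histogram-based estimate is roughly 40 times smaller than the uniform one, because only buckets where both relations have tuples contribute.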
On Histograms
• Workload defined
  – Keep for important fields. Similar concept to indexes.
• Data defined
  – Keep when they improve performance.
  – Don’t need a histogram for the uniform distribution
• Complications
  – Update queries invalidate statistics
  – Need to be pre-computed, often prior to witnessing workload
  – Composing histograms (for multiple attributes) leads to inaccuracies
• What the world needs is fully incremental histograms that support multi-attribute queries
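As a rough illustration of why a histogram only pays off on non-uniform data, the sketch below builds an equi-width histogram over a skewed column and compares its range estimate with the uniform-distribution estimate. The data and the helper names `build_equi_width` and `estimate_range` are assumptions for illustration, not part of the lecture.

```python
def build_equi_width(values, lo, hi, n_buckets):
    """Count values into n_buckets equal-width buckets covering [lo, hi)."""
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts, width

def estimate_range(counts, lo, width, q_lo, q_hi):
    """Estimate tuples with q_lo <= v < q_hi, assuming uniformity per bucket."""
    total = 0.0
    for i, count in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b_hi, q_hi) - max(b_lo, q_lo))
        total += count * overlap / width
    return total

# Skewed, made-up column: most values are concentrated near 1.
column = [1] * 80 + [15] * 10 + [35] * 10
counts, width = build_equi_width(column, lo=0, hi=40, n_buckets=4)

hist_est = estimate_range(counts, 0, width, q_lo=0, q_hi=10)
uniform_est = len(column) * (10 - 0) / (40 - 0)
actual = sum(1 for v in column if 0 <= v < 10)
print(counts, hist_est, uniform_est, actual)  # [80, 10, 0, 10] 80.0 25.0 80
```

On uniform data the two estimates would agree, so the histogram would add space and maintenance cost without improving accuracy.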
STHoles
Bruno, Chaudhuri, and Gravano. STHoles: A Multidimensional Workload-Aware Histogram, SIGMOD 2001.
• Generate histograms from analyzing query results
  – No examination of data sets
  – Leverage workload information and query feedback
• Supports overlapped and nested buckets
  – Multi-resolution histogram
  – Buckets allocated where they are most needed, e.g. if there are no queries to a region, no statistics are kept
Feedback-Based Optimization
Visualizing Histograms
Histogram Construction
• Start with an empty histogram
• New queries punch ‘holes’ in the histogram, creating regions of refinement
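The hole-punching idea can be sketched in one dimension. This is a heavily simplified illustration, not the algorithm from the STHoles paper: it assumes a root bucket whose total tuple count is already known, assumes each feedback range lies inside the bucket and overlaps no existing hole, and the `Bucket` class with its `drill` and `estimate` methods is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Bucket:
    lo: float
    hi: float
    freq: float                      # tuples in [lo, hi) outside all holes
    children: list = field(default_factory=list)

    def own_width(self):
        """Width of this bucket that is not covered by child holes."""
        return (self.hi - self.lo) - sum(c.hi - c.lo for c in self.children)

    def estimate(self, q_lo, q_hi):
        """Estimate tuples in [q_lo, q_hi), uniform within each region."""
        lo, hi = max(self.lo, q_lo), min(self.hi, q_hi)
        if lo >= hi:
            return 0.0
        total, child_overlap = 0.0, 0.0
        for c in self.children:
            c_lo, c_hi = max(c.lo, lo), min(c.hi, hi)
            if c_lo < c_hi:
                child_overlap += c_hi - c_lo
                total += c.estimate(q_lo, q_hi)
        own = self.own_width()
        if own > 0:
            total += self.freq * ((hi - lo) - child_overlap) / own
        return total

    def drill(self, q_lo, q_hi, observed):
        """Punch a hole for a query range whose true result size was observed."""
        hole = Bucket(q_lo, q_hi, observed)
        self.freq = max(0.0, self.freq - observed)
        self.children.append(hole)
        return hole

root = Bucket(0, 100, 1000)          # assumed: 1000 tuples, no detail yet
before = root.estimate(0, 10)        # uniform guess over the whole range
root.drill(0, 10, observed=600)      # feedback: the query returned 600 tuples
after = root.estimate(0, 10)
rest = root.estimate(10, 100)
print(before, after, rest)           # 100.0 600.0 400.0
```

After one piece of feedback, the estimate for the queried region matches the observed count, and the remaining tuples are spread over the rest of the parent bucket.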
Policies
• Identify and drill candidate holes
Policies
• Shrink regions to preserve rectangular spaces
  – Ease of description and improved accuracy
Policies
• Merge buckets (with similar densities) to improve histogram under a space budget
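A minimal sketch of merging under a space budget, again in one dimension. It uses the difference in density between adjacent buckets as a stand-in for a merge penalty. That choice, the flat list of `(lo, hi, count)` tuples, and the function name `merge_to_budget` are assumptions for illustration rather than the paper's actual procedure.

```python
def density(bucket):
    """Tuples per unit of width for a (lo, hi, count) bucket."""
    lo, hi, count = bucket
    return count / (hi - lo)

def merge_to_budget(buckets, budget):
    """Merge adjacent buckets with the closest densities until the budget is met.

    `buckets` is a list of (lo, hi, count) tuples, sorted and contiguous.
    """
    buckets = list(buckets)
    while len(buckets) > max(budget, 1):
        # Pick the adjacent pair whose densities differ the least.
        i = min(range(len(buckets) - 1),
                key=lambda k: abs(density(buckets[k]) - density(buckets[k + 1])))
        lo, _, c1 = buckets[i]
        _, hi, c2 = buckets[i + 1]
        buckets[i:i + 2] = [(lo, hi, c1 + c2)]
    return buckets

hist = [(0, 10, 100), (10, 20, 95), (20, 30, 5), (30, 40, 50)]
print(merge_to_budget(hist, budget=3))
# [(0, 20, 195), (20, 30, 5), (30, 40, 50)]
print(merge_to_budget(hist, budget=2))
# [(0, 20, 195), (20, 40, 55)]
```

Merging the two buckets with densities 10 and 9.5 loses little information, which is why they go first. The total tuple count is preserved at every step.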
STHoles Redux
• Quality histograms
• Runtime overhead (<10%)
  – Dynamic construction of histograms
  – But, no pre-processing
• Preferable in several situations
  – Frequently updated data, where the histogram must follow a changing distribution
  – Shifting workloads: STHoles can redirect attention to new regions dynamically. (This is what’s cool.)