Indexes Part 2 and Query Execution - Stanford...
Transcript of Indexes Part 2 and Query Execution - Stanford...
![Page 1: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/1.jpg)
Indexes Part 2 and Query Execution
Instructor: Matei Zahariacs245.stanford.edu
![Page 2: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/2.jpg)
From Last Time: Indexes
Conventional indexes
B-trees
Hash indexes
Multi-key indexing
CS 245 2
![Page 3: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/3.jpg)
Hash Indexes
key h(key)
record / ptr
...
Buckets(block sized)
Buckets can contain records or pointers to file
overflowbucket
CS 245 3
Chaining is used to handle bucket overflow
![Page 4: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/4.jpg)
Hash vs Tree Indexes
+ O(1) instead of O(log N) disk accesses
– Can’t efficiently do range queries
CS 245 4
![Page 5: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/5.jpg)
Challenge: Resizing
Hash tables try to keep occupancy in a fixed range (50-80%) and slow down beyond that» Too much chaining
How to resize the table when this happens?» In memory: just move everything, amortized
cost is pretty low» On disk: moving everything is expensive!
CS 245 5
![Page 6: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/6.jpg)
Extendible Hashing
Tree-like design for hash tables that allows cheap resizing while requiring 2 IOs / access
CS 245 6
![Page 7: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/7.jpg)
Extendible Hashing: 2 Ideas
(a) Use i of b bits output by hash functionb
h(K) ®
i
i will grow over time; the first i bits of each key’s hash are used to map it to a bucket
00110101
CS 245 7
![Page 8: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/8.jpg)
(b) Use a directory with pointers to buckets
h(K)[0..i] to bucket...
...
Extendible Hashing: 2 Ideas
CS 245 8
![Page 9: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/9.jpg)
Example: 4-bit h(K), 2 keys/bucket
i = 11
1
0001
10011100
CS 245 9
local depthglobal depth
0010
Insert 0010
![Page 10: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/10.jpg)
Example: 4-bit h(K), 2 keys/bucket
i = 11
1
0001
10011100
Insert 1010
CS 245 10
local depthglobal depth
![Page 11: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/11.jpg)
i = 11
1
0001
10011100
Insert 101011100
1010
Example: 4-bit h(K), 2 keys/bucket
CS 245 11
![Page 12: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/12.jpg)
i = 11
1
0001
10011100
Insert 101011100
1010
New directory
200
01
10
11
i =
2
2
Example: 4-bit h(K), 2 keys/bucket
CS 245 12
![Page 13: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/13.jpg)
10001
21001101021100
Insert:
0111
0000
00
01
10
11
2i =
Example
CS 245 13
![Page 14: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/14.jpg)
10001
21001101021100
Insert:
0111
0000
00
01
10
11
2i =
0111
0000
0111
0001
Example
CS 245 14
![Page 15: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/15.jpg)
10001
21001101021100
Insert:
0111
0000
00
01
10
11
2i =
0111
0000
0111
0001
2
2Example
CS 245 15
![Page 16: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/16.jpg)
00
01
10
11
2i =
210011010
21100
20111
200000001
Example
Note: still need chaining if values of h(K) repeat and
fill a bucket
CS 245 16
![Page 17: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/17.jpg)
Some Types of Indexes
Conventional indexes
B-trees
Hash indexes
Multi-key indexing
CS 245 17
![Page 18: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/18.jpg)
Motivation
Example: find records where
DEPT = “Toy” AND SALARY > 50k
CS 245 18
![Page 19: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/19.jpg)
Strategy I:
Use one index, say Dept.
Get all Dept = “Toy” recordsand check their salary
I1
CS 245 19
![Page 20: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/20.jpg)
Strategy II:
Use 2 indexes; intersect lists of pointers
Toy Sal> 50k
CS 245 20
![Page 21: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/21.jpg)
Strategy III:
Multi-key index
One design:
I1
I2
I3
CS 245 21
![Page 22: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/22.jpg)
h
nb
i a
co
de
g
f
m
l
kj
k-d Trees
CS 245 23
Split dimensions in any order to hold k-dimensional data
![Page 23: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/23.jpg)
h
nb
i a
co
d
10 20
10 20
e
g
f
m
l
kj
CS 245 24
k-d Trees
![Page 24: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/24.jpg)
h
nb
i a
co
d
10 20
10 20
e
g
f
m
l
kj25 15 35 20
40
30
20
10
CS 245 25
k-d Trees
![Page 25: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/25.jpg)
h
nb
i a
co
d
10 20
10 20
e
g
f
m
l
kj25 15 35 20
40
30
20
10
5
15 15
CS 245 26
k-d Trees
![Page 26: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/26.jpg)
h
nb
i a
co
d
10 20
10 20
e
g
f
m
l
kj25 15 35 20
40
30
20
10
5
15 15
h i a bcd efg
n omlj k
Efficient range queries in both
dimensionsCS 245 27
k-d Trees
![Page 27: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/27.jpg)
Storage System Examples
MySQL: transactional DBMS» Row-oriented storage with 16 KB pages» Variable length records with headers, overflow» Index types:• B-tree•Hash (in memory only)• R-tree (spatial data)• Inverted lists for full text search
» Can compress pages with Lempel-Ziv
CS 245 29
![Page 28: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/28.jpg)
Apache Parquet + Hive: analytical data lake» Column-oriented storage as set of ~1 GB files
(each file has a slice of all columns)» Various compression and encoding schemes
at the level of pages in a file• Special scheme for nested fields (Dremel)
» Header with statistics at the start of each file•Min/max of columns, nulls, Bloom filter
» Files partitioned into directories by one key
CS 245 30
Storage System Examples
![Page 29: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/29.jpg)
Query Execution
Overview
Relational operators
Execution methods
CS 245 31
![Page 30: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/30.jpg)
Query Execution Overview
Recall that one of our key principles in data intensive systems was declarative APIs» Specify what you want to compute, not how
We saw how these can translate into many storage strategies
How to execute queries in a declarative API?
CS 245 32
![Page 31: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/31.jpg)
Query Execution Overview
Query representation (e.g. SQL)
Logical query plan(e.g. relational algebra)
Optimized logical plan
Physical plan(code/operators to run)
Many execution methods: per-record exec, vectorization,
compilation
CS 245 33
![Page 32: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/32.jpg)
Plan Optimization Methods
Rule-based: systematically replace some expressions with other expressions» Replace X OR TRUE with TRUE» Replace M*A + M*B with M*(A+B) for matrices
Cost-based: propose several execution plans and pick best based on a cost model
Adaptive: update execution plan at runtime
CS 245 34
![Page 33: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/33.jpg)
Execution Methods
Interpretation: walk through query plan operators for each record
Vectorization: walk through in batches
Compilation: generate code (like System R)
CS 245 35
![Page 34: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/34.jpg)
parse
convert
apply rules
estimate result sizes
consider physical plans estimate costs
pick best
execute
{P1, P2, …}
{(P1,C1), (P2,C2), ...}
Pi
result
SQL query
parse tree
logical query plan
“improved” l.q.p
l.q.p. +sizes
statistics
Typical RDBMS Execution
CS 245 36
![Page 35: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/35.jpg)
Query Execution
Overview
Relational operators
Execution methods
CS 245 37
![Page 36: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/36.jpg)
The Relational Algebra
Collection of operators over tables (relations)» Each table has named attributes (fields)
Codd’s original RA: tables are sets of tuples(unordered and tuples cannot repeat)
SQL’s RA: tables are bags (multisets) of tuples; unordered but each tuple may repeat
CS 245 38
![Page 37: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/37.jpg)
Relational Algebra Operators
Basic set operators:
Intersection: R ∩ S
Union: R ∪ S
Difference: R – S
Cartesian Product: R ⨯ S
for tables with same schema
CS 245 39
{ (r, s) | r ∈ R, s ∈ S }
![Page 38: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/38.jpg)
Relational Algebra Operators
Basic set operators:
Intersection: R ∩ S
Union: R ∪ S
Difference: R – S
Cartesian Product: R ⨯ S
CS 245 40
consider both distinct (set union)and non-distinct (bag union)
![Page 39: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/39.jpg)
Relational Algebra Operators
Special query processing operators:
Selection: σcondition(R)
Projection: Pexpressions(R)
Natural Join: R ⨝ S
{ r ∈ R | condition(r) is true }
{ expressions(r) | r ∈ R }
{ (r, s) ∈ R ⨯ S) | r.key = s.key }where key is the common fields
CS 245 41
![Page 40: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/40.jpg)
Relational Algebra Operators
Special query processing operators:
Aggregation: keysGagg(attr)(R)
Examples: departmentGMax(salary)(Employees)
GMax(salary)(Employees)
SELECT agg(attr)FROM RGROUP BY keys
CS 245 42
![Page 41: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/41.jpg)
Algebraic Properties
Many properties about which combinations of operators are equivalent» That’s why it’s called an algebra!
CS 245 43
![Page 42: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/42.jpg)
Properties: Unions, Products and JoinsR ∪ S = S ∪ RR ∪ (S ∪ T) = (R ∪ S) ∪ T
R ⨯ S = S ⨯ R(R ⨯ S) ⨯ T = R ⨯ (S ⨯ T)
R ⨝ S = S ⨝ R(R ⨝ S) ⨝ T = R ⨝ (S ⨝ T) CS 245 44
Attribute order in a relation doesn’t matter either
Tuple order in a relation doesn’t matter (unordered)
![Page 43: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/43.jpg)
Properties: Selects
σp∧q(R) =
σp∨q(R) =
CS 245 45
![Page 44: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/44.jpg)
Properties: Selects
σp∧q(R) = σp(σq(R))
σp∨q(R) = σp(R) ∪ σq(R)
CS 245 46
careful with repeated elements
![Page 45: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/45.jpg)
Bags vs. Sets
R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
R ∪ S = ?
CS 245 47
![Page 46: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/46.jpg)
Bags vs. Sets
R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
R ∪ S = ?
CS 245 48
• Option 1: SUM of countsR ∪ S = {a,a,b,b,b,b,b,c,c,c,d}
• Option 2: MAX of countsR ∪ S = {a,a,b,b,b,c,c,d}
![Page 47: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/47.jpg)
Executive Decision
Use “SUM” option for bag unions
Some rules that work for set unions cannot be used for bags
CS 245 49
![Page 48: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/48.jpg)
Let: X = set of attributesY = set of attributes
PX∪Y (R) =
Properties: Project
CS 245 50
![Page 49: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/49.jpg)
Let: X = set of attributesY = set of attributes
PX∪Y (R) = PX(PY(R))
Properties: Project
CS 245 51
![Page 50: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/50.jpg)
Let: X = set of attributesY = set of attributes
PX∪Y (R) = PX(PY(R))
Properties: Project
CS 245 52
![Page 51: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/51.jpg)
Properties: σ + ⨝
Let p = predicate with only R attribs
q = predicate with only S attribs
m = predicate with only R, S attribs
σp(R ⨝ S) =
σq(R ⨝ S) = CS 245 53
![Page 52: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/52.jpg)
Properties: σ + ⨝
Let p = predicate with only R attribs
q = predicate with only S attribs
m = predicate with only R, S attribs
σp(R ⨝ S) = σp(R) ⨝ S
σq(R ⨝ S) = R ⨝ σq(S)CS 245 54
![Page 53: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/53.jpg)
Properties: σ + ⨝
Some rules can be derived:
σp∧q(R ⨝ S) =
σp∧q∧m(R ⨝ S) =
σp∨q(R ⨝ S) =
CS 245 55
![Page 54: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/54.jpg)
Properties: σ + ⨝
Some rules can be derived:
σp∧q(R ⨝ S) = σp(R) ⨝ σq(S)
σp∧q∧m(R ⨝ S) = σm(σp(R) ⨝ σq(S))
σp∨q(R ⨝ S) = (σp(R) ⨝ S) ∪ (R ⨝ σq(S))
CS 245 56
![Page 55: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/55.jpg)
Prove One, Others for Practice
σp∧q(R ⨝ S) = σp (σq(R ⨝ S))
= σp (R ⨝ σq(S))
= σp (R) ⨝ σq(S)
CS 245 57
![Page 56: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/56.jpg)
Properties: P + σ
Let x = subset of R attributes
z = attributes in predicate p(subset of R attributes)
Px(σp (R)) =
CS 245 58
![Page 57: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/57.jpg)
Properties: P + σ
Let x = subset of R attributes
z = attributes in predicate p(subset of R attributes)
Px(σp (R)) = σp(Px(R))
CS 245 59
![Page 58: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/58.jpg)
Properties: P + σ
Let x = subset of R attributes
z = attributes in predicate p(subset of R attributes)
Px(σp (R)) = Px(σp(Px∪z(R)))
CS 245 60
![Page 59: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/59.jpg)
Let x = subset of R attributesy = subset of S attributesz = intersection of R,S attributes
Px∪y(R ⨝ S) = Px∪y ((Px∪z (R)) ⨝ (Py∪z (S)))
CS 245 61
Properties: P + ⨝
![Page 60: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/60.jpg)
parse
convert
apply rules
estimate result sizes
consider physical plans estimate costs
pick best
execute
{P1, P2, …}
{(P1,C1), (P2,C2), ...}
Pi
result
SQL query
parse tree
logical query plan
“improved” l.q.p
l.q.p. +sizes
statistics
Typical RDBMS Execution
CS 245 62
![Page 61: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/61.jpg)
Example SQL Query
SELECT titleFROM StarsInWHERE starName IN (
SELECT nameFROM MovieStarWHERE birthdate LIKE ‘%1960’
);
(Find the movies with stars born in 1960)
CS 245 63
![Page 62: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/62.jpg)
Parse Tree <Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Tuple> IN <Query>
title StarsIn <Attribute> ( <Query> )
starName <SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Attribute> LIKE <Pattern>
name MovieStar birthDate ‘%1960’
CS 245 64
![Page 63: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/63.jpg)
Ptitle
sstarName=name
StarsIn Pname
sbirthdate LIKE ‘%1960’
MovieStar
´
Logical Query Plan
CS 245 65
![Page 64: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/64.jpg)
Improved Logical Query Plan
Ptitle
starName=name
StarsIn Pname
sbirthdate LIKE ‘%1960’
MovieStar
Question:Push Ptitleto StarsIn?
CS 245 66
![Page 65: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/65.jpg)
Need expected size
StarsIn
MovieStar
P
s
Estimate Result Sizes
CS 245 67
![Page 66: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/66.jpg)
Parameters: join order,memory size, project attributes, ...Hash join
Seq scan Index scan Parameters:select condition, ...
StarsIn MovieStar
One Physical Plan
H
CS 245 68
![Page 67: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/67.jpg)
Parameters: join order,memory size, project attributes, ...Hash join
Index scan Seq scan Parameters:select condition, ...
StarsIn MovieStar
Another Physical Plan
H
CS 245 69
![Page 68: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/68.jpg)
Sort-merge join
Seq scan Seq scan
StarsIn MovieStar
Another Physical Plan
CS 245 70
Which plan is likely to be better?
![Page 69: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/69.jpg)
Logical plan
P1 P2 … Pn
C1 C2 … Cn
Pick best!
Estimating Plan Costs
Physical plancandidates
CS 245 71
Covered in next few lectures!
![Page 70: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/70.jpg)
Query Execution
Overview
Relational operators
Execution methods
CS 245 72
![Page 71: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/71.jpg)
Now That We Have a Plan, How Do We Run it?
Several different options that trade between complexity, setup time & performance
CS 245 73
![Page 72: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/72.jpg)
Example: Simple Query
SELECT quantity * priceFROM ordersWHERE productId = 75
Pquanity*price (σproductId=75 (orders))
CS 245 74
![Page 73: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/73.jpg)
Method 1: Interpretationinterface Operator {Tuple next();
}
class TableScan: Operator {String tableName;
}
class Select: Operator {Operator parent;Expression condition;
}
class Project: Operator {Operator parent;Expression[] exprs;
}
CS 245 75
interface Expression {Value compute(Tuple in);
}
class Attribute: Expression {String name;
}
class Times: Expression {Expression left, right;
}
class Equals: Expression {Expression left, right;
}
![Page 74: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/74.jpg)
Example Expression Classes
CS 245 76
class Attribute: Expression {String name;
Value compute(Tuple in) {return in.getField(name);
}}
class Times: Expression {Expression left, right;
Value compute(Tuple in) {return left.compute(in) * right.compute(in);
}}
probably better to use anumeric field ID instead
![Page 75: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/75.jpg)
Example Operator Classes
CS 245 77
class TableScan: Operator {String tableName;
Tuple next() {// read & return next record from file
}}
class Project: Operator {Operator parent;Expression[] exprs;
Tuple next() {tuple = parent.next();fields = [expr.compute(tuple) for expr in exprs];return new Tuple(fields);
}}
![Page 76: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/76.jpg)
Running Our Query with Interpretation
CS 245 78
ops = Project(expr = Times(Attr(“quantity”), Attr(“price”)),parent = Select(expr = Equals(Attr(“productId”), Literal(75)),parent = TableScan(“orders”)
));
while(true) {Tuple t = ops.next();if (t != null) {out.write(t);
} else {break;
}}
Pros & cons of this approach?
recursively calls Operator.next()and Expression.compute()
![Page 77: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/77.jpg)
Method 2: Vectorization
Interpreting query plans one record at a time is simple, but it’s too slow» Lots of virtual function calls and branches for
each record (recall Jeff Dean’s numbers)
Keep recursive interpretation, but make Operators and Expressions run on batches
CS 245 79
![Page 78: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/78.jpg)
Implementing Vectorization
CS 245 80
class TupleBatch {// Efficient storage, e.g.// schema + column arrays
}
interface Operator {TupleBatch next();
}
class Select: Operator {Operator parent;Expression condition;
}
...
class ValueBatch {// Efficient storage
}
interface Expression {ValueBatch compute(TupleBatch in);
}
class Times: Expression {Expression left, right;
}
...
![Page 79: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/79.jpg)
Typical Implementation
Values stored in columnar arrays (e.g. int[]) with a separate bit array to mark nulls
Tuple batches fit in L1 or L2 cache
Operators use SIMD instructions to update both values and null fields without branching
CS 245 81
![Page 80: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/80.jpg)
Pros & Cons of Vectorization
+ Faster than record-at-a-time if the queryprocesses many records
+ Relatively simple to implement
– Lots of nulls in batches if query is selective
– Data travels between CPU & cache a lot
CS 245 82
![Page 81: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/81.jpg)
Method 3: Compilation
Turn the query into executable code
CS 245 83
![Page 82: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/82.jpg)
Compilation Example
Pquanity*price (σproductId=75 (orders))
class MyQuery {void run() {
Iterator<OrdersTuple> in = openTable(“orders”);for(OrdersTuple t: in) {
if (t.productId == 75) {out.write(Tuple(t.quantity * t.price));
}}
}}
CS 245 84
generated class with the rightfield types for orders table
Can also theoretically generate vectorized code
![Page 83: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/83.jpg)
Pros & Cons of Compilation
+ Potential to get fastest possible execution
+ Leverage existing work in compilers
– Complex to implement
– Compilation takes time
– Generated code may not match hand-written
CS 245 85
![Page 84: Indexes Part 2 and Query Execution - Stanford Universityweb.stanford.edu/class/cs245/slides/06-Query-Execution.pdf · Indexes Part 2 and Query Execution Instructor: Matei Zaharia](https://reader035.fdocuments.in/reader035/viewer/2022071218/604e53861aa260324e748e33/html5/thumbnails/84.jpg)
What’s Used Today?
Depends on context & other bottlenecks
Transactional databases (e.g. MySQL):mostly record-at-a-time interpretation
Analytical systems (Vertica, Spark SQL):vectorization, sometimes compilation
ML libs (TensorFlow): mostly vectorization (the records are vectors!), some compilation
CS 245 86