Query Processing and Optimization in Modern Database Systems · 2017. 3. 13. · Query Processing...
-
Query Processing and Optimization in Modern Database Systems
Viktor Leis
-
Architecture of Traditional RDBMSs
feature                technique
transaction isolation  locking (2PL)
synchronization        latching (“lock coupling”)
large data sets        buffer management
durability             ARIES-style logging
indexing               B+tree
storage                slotted pages (row-wise)
SQL                    iterator model (interpreter)
parallelization        Exchange operators
query optimization     DP, indep. assumption

• optimizing (random) disk I/O operations
-
Traditional RDBMSs on Modern Hardware
feature                technique                     overhead¹
transaction isolation  locking (2PL)                 16%
synchronization        latching (“lock coupling”)    14%
large data sets        buffer management             35%
durability             ARIES-style logging           12%
indexing               B+tree
storage                slotted pages (row-wise)
SQL                    iterator model (interpreter)
parallelization        Exchange operators
query optimization     DP, indep. assumption

¹ OLTP Through the Looking Glass (Harizopoulos et al., SIGMOD 2008)
-
Modern Database Systems
• OLAP: column stores (Vectorwise, Vertica, Microsoft Apollo, IBM BLU)
• OLTP: main-memory systems (e.g., Microsoft Hekaton, VoltDB)
• OLAP & OLTP: HANA, HyPer
-
HyPer in 2017
feature                HyPer in 2017               contributions
transaction isolation  MVCC, precision locking
synchronization        -                           Part I
large data sets        -
durability             physiological logging
indexing               Adaptive Radix Tree         Master’s [ICDE 2013]
storage                Data Blocks
SQL                    LLVM compilation
parallelization        morsel-driven parallelism   Part II
query optimization     DP, indep. assumption       Part III
-
Part I: Synchronization on Multi-Core CPUs
ICDE 2014, TKDE 2016, DaMoN 2016
-
Synchronization
• default index structure in HyPer: Adaptive Radix Tree
• latch acquisition causes cache misses
[Figure: throughput (M operations/second) vs. number of threads (5 to 20); series: no synchronization, lock coupling]
• this explains single-threaded databases (VoltDB, HyPer 2011)
-
Hardware Transactional Memory
• recent feature offered by Intel CPUs (from Haswell)
+ the easiest way to synchronize data structures
+ often very good scalability
− not yet widespread
− scalability issues can be hard to debug
-
Optimistic Lock Coupling
• idea: writers acquire latches (only on modified nodes)
• readers validate accesses using version counters (restart if necessary)
+ very general technique
+ easy to use
− may lead to restarts
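The reader/writer protocol above can be sketched for a single node as follows. This is a minimal illustration under stated assumptions, not HyPer's implementation: the names `OptimisticLatch`, `Node`, `optimistic_read`, and `locked_write` are hypothetical, the "odd version means write-locked" encoding is an assumed convention, and the traversal across tree levels (validating the parent after reading the child) is left out. CPython's global interpreter lock hides memory-ordering concerns; a native implementation would need atomic loads and stores.

```python
import threading


class OptimisticLatch:
    """Version-counter latch; an odd version means write-locked (assumed encoding)."""

    def __init__(self):
        self._version = 0
        self._writer = threading.Lock()

    def read_begin(self):
        """Return the current version, or None if a writer holds the latch."""
        v = self._version
        return None if v & 1 else v

    def read_validate(self, start_version):
        """True if no writer touched the node since read_begin()."""
        return self._version == start_version

    def write_lock(self):
        self._writer.acquire()
        self._version += 1  # odd: concurrent readers will restart

    def write_unlock(self):
        self._version += 1  # even again, but different from the old version
        self._writer.release()


class Node:
    def __init__(self, value):
        self.latch = OptimisticLatch()
        self.value = value


def optimistic_read(node, max_restarts=1000):
    """Read node.value without acquiring the latch; restart on conflict."""
    restarts = 0
    while restarts <= max_restarts:
        v = node.latch.read_begin()
        if v is not None:
            value = node.value               # speculative read
            if node.latch.read_validate(v):  # nobody wrote in between
                return value, restarts
        restarts += 1
    raise RuntimeError("too many restarts")


def locked_write(node, value):
    """Writers latch only the node they modify."""
    node.latch.write_lock()
    try:
        node.value = value
    finally:
        node.latch.write_unlock()
```

Readers never write to shared memory in this scheme, which is why they avoid the cache misses that latch acquisition causes; the price is the restart loop.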
-
Read-Optimized Write Exclusion (ROWEX)
• idea: writers acquire latches (on modified nodes)
• writers ensure that reads are always safe
+ reads always succeed
− more difficult than optimistic lock coupling (but easier than lock-free techniques)
-
Conclusions
[Figure: throughput (M operations/second) vs. number of threads (5 to 20); series: no synchronization, lock coupling, Opt. Lock Coupling, ROWEX, HTM]
• latching (does not scale), lock-free data structures (scalable but slow), and HTM (not widespread) have major problems
• Optimistic Lock Coupling and ROWEX are scalable and practical
-
Part II: Intra-Query Parallelization on Multi-Core CPUs
SIGMOD 2014, VLDB 2015
-
Motivation: Many, Many Cores
[Figure: cores per CPU (1 to 30) by year (2000 to 2016); data points: NetBurst (Foster), NetBurst (Paxville), Core (Kentsfield), Core (Lynnfield), Nehalem (Beckton), Nehalem (Westmere EX), Sandy Bridge EP, Ivy Bridge EP, Ivy Bridge EX, Haswell EP, Broadwell EP, Broadwell EX, Skylake EP]
-
Parallel Query Processing in HyPer
• break input into work units (“morsels”)
• worker threads grab morsels dynamically (“work stealing”)
• # worker threads = # hardware threads
• requires all operators to be aware of parallelism
• better scalability than Exchange operators
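A rough sketch of the scheduling idea, assuming a single shared queue of morsels (a simplification of work stealing) and a trivial aggregation as the operator. `make_morsels` and `parallel_sum` are hypothetical names, and because of CPython's global interpreter lock the example shows the dispatch pattern rather than an actual speedup.

```python
import os
import queue
import threading


def make_morsels(n_rows, morsel_size):
    """Split the row range [0, n_rows) into (start, end) work units."""
    return [(s, min(s + morsel_size, n_rows))
            for s in range(0, n_rows, morsel_size)]


def parallel_sum(data, morsel_size=1000, n_workers=None):
    """Sum `data` with workers that grab morsels dynamically."""
    # One worker per hardware thread unless the caller says otherwise.
    n_workers = n_workers or os.cpu_count() or 1
    work = queue.SimpleQueue()
    for morsel in make_morsels(len(data), morsel_size):
        work.put(morsel)

    partial = [0] * n_workers  # one slot per worker, so no locking is needed

    def worker(wid):
        while True:
            try:
                start, end = work.get_nowait()  # grab the next morsel
            except queue.Empty:
                return                          # no work left
            partial[wid] += sum(data[start:end])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

Because morsels are handed out on demand, a slow worker simply processes fewer of them, whereas a static split of the input would leave the other workers idle at the end.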
-
Example 1: Hash Join
Phase 1: process T morsel-wise and store NUMA-locally
Phase 2: scan NUMA-local storage area and insert pointers into HT
[Figure: morsels of T; storage areas of the blue, red, and green cores; each core scans its storage area and inserts the pointers into the global hash table]
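The two build phases can be sketched as below. This is an illustration under assumptions, not the HyPer code: rows are Python dicts, a "pointer" is an object reference, per-worker lists stand in for NUMA-local storage areas, and a mutex stands in for whatever synchronization the real global hash table uses. `build_hash_table` and `probe` are hypothetical names.

```python
import threading
from collections import defaultdict


def build_hash_table(T, key, n_workers=3, morsel_size=2):
    morsels = [T[i:i + morsel_size] for i in range(0, len(T), morsel_size)]
    next_morsel = iter(morsels)
    dispatch = threading.Lock()
    storage = [[] for _ in range(n_workers)]  # one storage area per worker
    table = defaultdict(list)                 # global hash table
    table_lock = threading.Lock()             # assumed stand-in for cheaper sync

    def grab():
        with dispatch:
            return next(next_morsel, None)

    def phase1(wid):
        # Process T morsel-wise and materialize into the worker's own area.
        while (morsel := grab()) is not None:
            storage[wid].extend(morsel)

    def phase2(wid):
        # Scan the local area and insert references into the global table.
        for row in storage[wid]:
            with table_lock:
                table[row[key]].append(row)

    for phase in (phase1, phase2):
        threads = [threading.Thread(target=phase, args=(w,))
                   for w in range(n_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # barrier: phase 2 starts only after phase 1 is complete
    return table, storage


def probe(table, S, key):
    """Join each row of S against the build side."""
    return [(t, s) for s in S for t in table.get(s[key], [])]
```

Separating the phases means the number of build tuples is known before any insert happens, so a real system could size the hash table once instead of growing it.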
-
Example 2: Window Functions
select a, b, rank() over (partition by a order by b) from r
1. hash partitioning (thread-local; shown for thread 1 and thread 2)
2. combine
3. sort/evaluation
3.1. inter-partition parallelism
3.2. intra-partition parallelism
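The three steps can be traced with a small sequential simulation for the query above. It is only a sketch: each input chunk stands for one thread's share of `r`, rows are `(a, b)` tuples, the function names are hypothetical, and the steps run one after another here, whereas inter-partition parallelism would evaluate different partitions concurrently.

```python
def thread_local_partition(rows, n_partitions):
    """Step 1: each thread hash-partitions its own rows by `a`."""
    parts = [[] for _ in range(n_partitions)]
    for a, b in rows:
        parts[hash(a) % n_partitions].append((a, b))
    return parts


def combine(per_thread_parts):
    """Step 2: merge the thread-local pieces of every hash partition."""
    n_partitions = len(per_thread_parts[0])
    return [[row for parts in per_thread_parts for row in parts[p]]
            for p in range(n_partitions)]


def rank_partition(rows):
    """Step 3: sort one hash partition and evaluate rank() within each `a`."""
    out = []
    sentinel = object()
    prev_a = prev_b = sentinel
    pos = rank = 0
    for a, b in sorted(rows):          # orders by a, then b
        if a != prev_a:
            pos = 0                    # new window partition
        pos += 1
        if a != prev_a or b != prev_b:
            rank = pos                 # ties keep the earlier rank
        out.append((a, b, rank))
        prev_a, prev_b = a, b
    return out


def window_rank(chunks, n_partitions=4):
    local = [thread_local_partition(c, n_partitions) for c in chunks]
    parts = combine(local)
    return [row for p in parts for row in rank_partition(p)]
```

A hash partition may hold several values of `a`, which is why `rank_partition` resets its position counter whenever `a` changes.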
-
Scalability on 32-core System (TPC-H Queries)
[Figure: one panel per TPC-H query (1 to 22); x-axis: threads (1 to 64); y-axis: speedup over HyPer (0 to 40); systems: HyPer, Vectorwise]
-
Part III: Query Optimization
VLDB 2016
-
Query Optimization
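SELECT ... FROM R, S, T WHERE ...
[Figure: the query is translated into a join tree over R, S, and T with physical join operators (HJ, INL); the optimizer components involved are cardinality estimation, cost model, and plan space enumeration]
• Do we need a new architecture for query optimizers, too?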
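How the three optimizer components interact can be illustrated with a toy optimizer. Everything here is an assumption made for the example: cardinalities come from base table sizes and join selectivities multiplied under the independence assumption, the cost of a plan is the sum of its intermediate result sizes, and dynamic programming enumerates all bushy plans including cross products. `estimate` and `best_plan` are hypothetical names, and the numbers are invented.

```python
from itertools import combinations


def estimate(rels, card, sel):
    """Cardinality estimation: treat all join predicates as independent."""
    size = 1.0
    for r in rels:
        size *= card[r]
    for (a, b), s in sel.items():
        if a in rels and b in rels:
            size *= s
    return size


def best_plan(card, sel):
    """Plan enumeration by DP over relation subsets.

    Cost model (assumed): sum of the estimated intermediate result sizes.
    Returns (cost, plan) where plan is a nested tuple of relation names.
    """
    names = sorted(card)
    best = {frozenset([r]): (0.0, r) for r in names}
    for k in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, k)):
            out = estimate(subset, card, sel)
            for i in range(1, k):
                for left in map(frozenset, combinations(sorted(subset), i)):
                    right = subset - left
                    cost = best[left][0] + best[right][0] + out
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, (best[left][1], best[right][1]))
    return best[frozenset(names)]


# Invented statistics for three relations and two join predicates.
card = {"R": 1000, "S": 100, "T": 10}
sel = {("R", "S"): 0.01, ("S", "T"): 0.1}
cost, plan = best_plan(card, sel)
```

With these numbers the cheapest plan joins S and T first, because that intermediate result is the smallest; a wrong selectivity for that join would steer the enumeration to a different plan even though the cost model and the search are unchanged.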
-
Join Order Benchmark
• Internet Movie Database data set (4 GB)
• much more challenging than synthetic benchmarks like TPC-H
• 113 queries with 3 to 16 joins
-
Cardinality Estimation: PostgreSQL
[Figure: PostgreSQL cardinality estimation error (log scale, from 1e8-fold underestimation to 1e4-fold overestimation) by number of joins (0 to 6); shown: 5th, 25th, median, 75th, and 95th percentiles]
-
Cardinality Estimation: Commercial Systems
[Figure: one panel per system (PostgreSQL, DBMS A, DBMS B, DBMS C, HyPer); cardinality estimation error (log scale, from 1e8-fold underestimation to 1e4-fold overestimation) by number of joins (0 to 6); shown: 5th, 25th, median, 75th, and 95th percentiles]
-
Conclusions
• query optimization is essential
• most (random) join orders are bad
• optimizers will find good plans for most queries
• cardinality estimation is usually the reason for bad plans
• cost model much less important (with memory-resident data)
• relative plan quality decreases when more indexes are available
• operators should not rely on estimates (if possible)
-
Future Work
feature
transaction isolation  MVCC, precision locking
synchronization        Optimistic Lock Coupling
large data sets        ?
durability             ?
indexing               Adaptive Radix Tree
storage                Data Blocks
SQL                    LLVM compilation
parallelization        morsel-driven parallelism
query optimization     index-based join sampling (CIDR 2017)