Post on 04-Mar-2015
M tDB/X100MonetDB/X100a (very) fast column-store( y)
M i Z k ki P t BMarcin Zukowski, Peter BonczSándor Héman, Niels NesSándor Héman, Niels Nes
CWI, Amsterdam, The Netherlands
Disclaimer… a different one ☺
This talk is about data-intensive applications pp
Data warehousing, analytical processing
Scientific data information retrievalScientific data, information retrieval
Transaction processing is a different story…
A lot of content… wake up!
MonetDB/X100 overview 2
Outline
Traditional database performancep
Improvements in MonetDB
MonetDB/X100MonetDB/X100
Query execution
Storage
MonetDB/X100 overview 3
Database performancep
TPC-H 1GB, Query 1, Q y
Selects 98% of fact table (6M rows), computes t i d t llnet prices and aggregates all
Performance:
C program: ?
MySQL: 26.2sMySQL: 26.2s
DBMS “X”: 28.1s
MonetDB/X100 overview 4
Database performancep
TPC-H 1GB, Query 1, Q y
Selects 98% of fact table (6M rows), computes t i d t llnet prices and aggregates all
Performance:
C program: 0.2s
MySQL: 26.2sMySQL: 26.2s
DBMS “X”: 28.1s
MonetDB/X100 overview 5
Database performance analyzedatabase pe o a ce a a y ed
Why so slow?y
Inefficient data storage format
Inefficient query processing modelInefficient query processing model
MonetDB/X100 overview 6
N-ary storage model (NSM)y g ( )
Fixed-width attributes in a record
Joe 27 Black101
Edward 21 Scissorhand103
MonetDB/X100 overview 7
Real-life NSM implementationp
“Slotted” pages, example:p g , p
27 Joe Black 101
21 Edward Scissorhand103 21 Edward Scissorhand 03
MonetDB/X100 overview 8
NSM problemsp
Poor bandwidth use
Always read all the attributes
Terrible on diskTerrible on disk
Bad in memory
Complex attribute access
Variable-length fields
Null fields
MonetDB/X100 overview 9
Column stores to the rescue!
Store attributes separatelyp y
R d l ib d bRead only attributes used by a query
MonetDB/X100 overview 10
“Traditional” column stores
Data pathp
Read columns from disk
Convert into NSMConvert into NSM
Use NSM-based processing
Examples: Sybase IQ, Vertica
Not enough!g
Only I/O problem addressed
MonetDB/X100 overview 11
How databases run a queryq y
Query
SELECT name, salary*.19 AS taxsalary .19 AS tax
FROMemployee
WHERE age > 25
MonetDB/X100 overview 12
Database operatorspTuple-at-a-time iterator interface:- open()- next(): tuple
close()- close()
next() is called:- for each operator- for each operator- for each tuple
Complex code repeatedComplex code repeated over and over
MonetDB/X100 overview 13
Primitive functions
Provide data-specific computational functionality
Called once for every operation on e er t pleoperation on every tuple.
Even worse with complex tuple representationtuple representation
Perform one operation (e.g. addition) in one call(e.g. addition) in one call
MonetDB/X100 overview 14
DBMS performance - IPTp
Lots of repeated, unnecessary codep , y
Operator logic
Function callsFunction calls
Attribute access
Most instructions NOT processing any actual data!
High instructions-per-tuple (IPT) factor
MonetDB/X100 overview 15
Modern CPUs
New CPU features over last 20 yearsy
Instruction and data cache
Deep pipeline with out of order executionDeep pipeline with out-of-order execution
Superscalar features – multiple instructions at once
SIMD instructions (SSE)
Great for e.g. multimedia processing…
… but bad for database code!
MonetDB/X100 overview 16
DBMS performance - CPIp
CPU-unfriendly codey
Complicated code
Poor use of CPU cache (both data and instructions)Poor use of CPU cache (both data and instructions)
Processing one value at a time
Compilers can’t help much
High cycles-per-instruction (CPI) factor
MonetDB/X100 overview 17
DBMS performancep
Performance factors:
High instructions-per-tuple
High cycles per instructionHigh cycles-per-instruction
Very high cycles-per-tuple (CPT)
Others can do better
Scientific computing, …
How can we?
MonetDB/X100 overview 18
MonetDB
MonetDB – 1993-now, developed at CWI , p
Peter’s PhD ☺
Column storeColumn store
Improves computational efficiency
Predecessor of MonetDB/X100
MonetDB/X100 overview 19
MonetDB: a column store
“save disk I/O when scan-intensive queries / qneed a few columns”
MonetDB/X100 overview 20
MonetDB: a column store
“save disk I/O when scan-intensive queries / qneed a few columns”
“ id i i t t t i“avoid an expression interpreter to improve computational efficiency”
MonetDB/X100 overview 21
MonetDB in actionSELECT id, name, (age-30)*50 as bonusFROM peopleWHERE age > 30WHERE age > 30
MonetDB/X100 overview 22
MonetDB in actionSELECT id, name, (age-30)*50 as bonusFROM peopleWHERE age > 30WHERE age > 30
MonetDB/X100 overview 23
MonetDB in actionSELECT id, name, (age-30)*50 as bonusFROM peopleWHERE age > 30
CPU Efficiency depends on “nice” code- out-of-order execution- few dependencies (control,data)WHERE age > 30
int select gt float( oid* res,
few dependencies (control,data)- compiler support
Compilers love simple loops over arraysl i li i
Simple hard
_g _ ( ,float* column, float val, int n)
{for(int j=0 i=0; i<n; i++)
- loop-pipelining- automatic SIMD
Simple, hard-coded semantics
in operators
for(int j=0,i=0; i<n; i++)if (column[i] >val) res[j++] = i;
return j;}
MonetDB/X100 overview 24
MonetDB: a column store
“save disk I/O when scan-intensive queries / qneed a few columns”
“ id i i t t t i“avoid an expression interpreter to improve computational efficiency”
Simple algebra – Monet Interpreter Language (MIL)
Hard-coded operator semantics – no function calls
Array-like processing
MonetDB/X100 overview 25
MonetDB problempSELECT id, name, (age-30)*50 as bonusFROM peopleWHERE age > 30WHERE age > 30
MATERIALIZED intermediate
results
MonetDB/X100 overview 26
Materialization problemp
Extra main-memory bandwidth y
Performance is sub-optimal…
but still faster than anything else (5 years ago ☺)… but still faster than anything else (5 years ago ☺)
Reduces scalability
Can’t afford writing to disk
Only effective for limited data sizes and not all query types
MonetDB/X100 overview 27
MonetDB: a Faustian Pact
You want efficiencyy
Simple hard-coded operators
I take scalabilitI take scalability
Result materializationSupports SQL and XQuery
C program: 0.2s
MonetDB: 3 7s
Open-source download:monetdb.cwi.nl
MonetDB: 3.7s
MySQL: 26.2s
DBMS “X”: 28.1s
MonetDB/X100 overview 28
MonetDB/X100/
My PhD thesisy
Motivation:
l ’ fi M DB l bililet’s fix MonetDB scalability…
… and improve the performance on the way ☺
Core ideas:
New execution modele e e u o ode
High performance column storage
MonetDB/X100 overview 29
Typical Relational DBMS Engineyp g
Query
SELECT name, salary*.19 AS taxsalary .19 AS tax
FROMemployee
WHERE age > 25
MonetDB/X100 overview 30
MonetDB/X100: “Vectors”/
MonetDB/X100 overview 31
MonetDB/X100: “Vectors”o et / 00 ecto s
Vector contains data of multiple tuples (~100-1000)
All operations work on entire vectorson entire vectors
Effect: much less operator.next() and primitive calls.
MonetDB/X100 overview 32
VectorsColumn slices as
unary arrays
NOT:NOT:Vertical is a better table storage layout than horizontal
(though we still think it often is)(though we still think it often is)
RATIONALE:- Simple array operations are p y p
well-supported by compilers- SIMD friendly layout- Assumed cache-resident
MonetDB/X100 overview 33
Vectorized Primitivesint select_lt_int_col_int_val (
int *res,t es,int *col, int val, int n)
{for(int j i 0; i<n; i++)
Most primitives take just 0.5 (!) tofor(int j=i=0; i<n; i++)
if (col[i] < val) res[j++] = i;return j;
}
take just 0.5 (!) to 10 cycles per tuple
}10-100+ times faster thant l t tituple-at-a-time
MonetDB/X100 overview 34
MonetDB/X100/
Both efficiency…y
Vectorized primitives
and scalabilit… and scalability
Pipelined query evaluation
C program: 0.2s
MonetDB/X100: 0.6s
MonetDB: 3.7s
MySQL: 26.2s
DBMS “X”: 28 1s
MonetDB/X100 overview 35
DBMS “X”: 28.1s
Memory Hierarchyy yX100 query engine
CPUhcache
ColumnBM (buffer manager)
RAM
MonetDB/X100 overview 36
(raid)Disk(s)
Optimal Vector size?pX100 query engine
All vectors together should fit the CPU cache
Depends on the query
CPUhcache
ColumnBM (buffer manager)
RAM
MonetDB/X100 overview 37
(raid)Disk(s)
Varying the Vector sizey g
Less and less operator.next() and
primitive function calls (“interpretation overhead”)( interpretation overhead )
MonetDB/X100 overview 38
Varying the Vector sizey g
Vectors start to exceed theCPU cache, causing
additional memory trafficadditional memory traffic
MonetDB/X100 overview 39
MonetDB/MIL materializes columnso et / ate a es co u sMonetDB/MILX100 query engine
CPUhcache
ColumnBM (buffer manager)
RAM
MonetDB/X100 overview 40
(raid)Disk(s)
Why is X100 so fast?yReduced interpretation overhead
100x less Function Calls100x less Function Calls
Good CPU cache useHigh locality in the primitivesg y pCache-conscious data placement
No Tuple NavigationPrimitives only see arrays
Vectorization allows algorithmic optimizationCPU and compiler-friendly function bodies
Multiple work units, loop-pipelining, SIMD…
MonetDB/X100 overview 41
Feeding the Beastg
X100 uses < 100 cycles per tuple for TPC-H Q1y p p Q
Q1 has ~30 bytes of used columns per tuple
3GHz CPU core3GHz CPU core
eats 900MB/s
No problem for RAM
But disk-based data?
MonetDB/X100 overview 42
Using Disk in the 21th centuryUsing Disk in the 21th century
Poor random disk access needs toPoor random disk access needs to be compensated with more and
more disk heads.(tens, hundreds… thousands!)
You’re better off with scanning!
MonetDB/X100 overview 43
Using Disk in Data Warehousingg gGoals:1 S b d di k * l * (f ll ti l )1. Scan-based disk access *only* (full or partial scans)2. Minimize bandwidth3. Benefit, not suffer from concurrency
Database strategies:replicate tables in multiple orders (goal 1)p p (g )clustering join-tables in foreign-key order (goal 1)keep dimension tables in RAM (goal 1&2)scan-optimized indices (goal 1&2)scan-optimized indices (goal 1&2)use a column-store (goal 2)increase disk bandwidth with lightweight compression (goal 2)
di di k ( l 2&3)
MonetDB/X100 overview 44
coordinate concurrent disk access (goals 2&3)
Feeding the Beast (1)g ( )
Two ideas pursued:p
Lightweight compression to enhance disk bandwidth
Maximizing diskMaximizing disk
scan sharing in
concurrent queries.
MonetDB/X100 overview 45
Compression to improve I/O bandwidthCompression to improve I/O bandwidth
0.9GB/s query consumption/ q y p
1/3 CPU for decompression 1.8GB/s needed
new lightweight compression schemesMonetDB/X100 overview 46
new lightweight compression schemes
Key Ingredientsy g
Compress relations on a per-column basisEasy to exploit redundancy
Keep data compressed in main-memoryKeep data compressed in main memoryMore data can be buffered
Decompress vector at a timeDecompress vector at a time Minimize main-memory overhead
Use light-weight, CPU-efficient algorithmsExploit processing power of modern CPUs
MonetDB/X100 overview 47
Results
MonetDB/X100 overview 48
TPC-H 100 GBDecent improvement
with fast disksTPC-H
query
MonetDB/X100 on 1 CPU DB2 – 8 CPUs
Compression ratio
4 disks 12 disks 142 disks
Speedup Time (s) Speedup Time (s) Time (s)Speedup Time (s) Speedup Time (s) Time (s)
01 4.33 4.41 69.6 1.29 50.9 111.9
03 3.04 3.10 11.3 1.48 6.0 15.1
04 8.15 7.58 2.4 2.67 1.8 12.5
05 3.81 3.55 15.3 1.06 16.2 84.0
06 4.39 4.50 10.7 2.35 4.6 17.1
07 1.71 1.66 72.0 0.84 40.8 86.5
MonetDB/X100 overview 49
Linear speedup with slow disks
Competes with DB2 using ~10x less resources
Feeding the Beast (2)g ( )
Two ideas pursued:p
Lightweight compression to enhance disk bandwidth
Maximizing diskMaximizing disk
scan sharing in
concurrent queries.
MonetDB/X100 overview 50
Concurrent scans
Multiple queries p qscanning the same table
Different start timesDifferent start times
Different scan ranges
Compete for disk accessCompete for disk access and buffer space
C SFCFS request scheduling: poor latency
MonetDB/X100 overview 51
“Normal” scans in real life
MonetDB/X100 overview 52
Shared scans
Observation: queries qoften do not need data in a sequential orderq
Idea: make queries “share” the scanningshare the scanning process
Two existing types:Two existing types:Attach
Elevator
MonetDB/X100 overview 53
Elevator
“Attach” in real life
MonetDB/X100 overview 54
“Elevator” in real life
MonetDB/X100 overview 55
Existing shared scansg
Benefits
Less I/O operations
Better data reuseBetter data reuse
Problems
Sharing decisions static (when a query starts)
Misses opportunities in a dynamic environment
Not sensitive to different query types
MonetDB/X100 overview 56
“Relevance” scans
Core ideas
Dynamically adapt to the current situation
Allow fully arbitrary data orderAllow fully arbitrary data order
Goals:
Maximize data sharing
Optimize latency and throughput
Work for different types of queries
MonetDB/X100 overview 57
“Relevance” in real life
MonetDB/X100 overview 58
Results
MonetDB/X100 overview 59
ConclusionPresented MonetDB/X100
A d t b k l d l d t CWIA new database kernel developed at CWI
Uses block-oriented iterator model (vectorization)
works amazingly wellworks amazingly well
So fast must reduce hunger for hard disksColumn storage specialized in sequential accessg p q
+ Lightweight compression schemes (give ~~ factor 3)
+ Cooperative Bandwidth Sharing (gives ~~ factor 2)
Good performance resultsFastest raw 100GB TPC-H performance around (** not fair)
MonetDB/X100 overview 60
Beats IR systems on Terabyte TREC
LiteratureMonetDB
P.A. Boncz. MIL Primitives for Querying a Fragmented World. VLDB Journal, 1999.
P.A. Boncz. Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications. Ph.d. thesis, 2002.
MonetDB/X100P.A. Boncz, M.Zukowski, N.Nes. MonetDB/X100: Hyper-pipelining Query Execution, CIDR 2005.
M Zukowski S Heman N Nes P A Boncz Super-Scalar RAM-CPU Cache Compression ICDE 2006M. Zukowski, S. Heman, N. Nes, P.A. Boncz. Super Scalar RAM CPU Cache Compression. ICDE 2006.
M. Zukowski, S. Heman, N.Nes, P.A. Boncz. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS, VLDB 2007.
OtherS. Padmanabhan, T. Malkemus, R. Agarwal, A. Jhingran. Block oriented processing of relational database operations in modern computer architectures. ICDE 2001.
J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. ICDE 1998.
M. Stonebraker, et al. C-Store: A Column Oriented DBMS. VLDB 2005.
D.J. Abadi, S.R. Madden, M.C. Ferreira. Integrating Compression and Execution in Column-Oriented DatabaseD.J. Abadi, S.R. Madden, M.C. Ferreira. Integrating Compression and Execution in Column Oriented Database Systems. SIGMOD 2006.
S. Harizopoulos, V. Liang, D. Abadi, S. Madden. Performance Tradeoffs in Read-Optimized Databases. VLDB 2006.
D.J. Abadi, D.S. Myers, D.J. DeWitt, S.R. Madden. Materialization Strategies in a Column-Oriented DBMS. ICDE 2007.
C A L B Bh tt h j T M lk S P d bh K W I i b ff l lit f lti l
MonetDB/X100 overview 61
C.A. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, K. Wong. Increasing buffer-locality for multiple relational table scans through grouping and throttling. ICDE 2007.
The End
Thank you!
Questions?
MonetDB/X100 overview 62