The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases
Presentation by Aaron Kabcenell
The adaptive radix tree: ARTful indexing for main-memory databases. Viktor Leis, Alfons Kemper, Thomas Neumann. International Conference on Data Engineering (ICDE), 2013
https://en.wikipedia.org/wiki/The_Starry_Night#/media/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
What is the problem?
Main Memory Indexing
Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, Alfons Kemper. ACM SIGMOD International Conference on Management of Data, 2016
Why is it important?
OLTP Workloads limited by Index Performance
Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, Alfons Kemper. ACM SIGMOD International Conference on Management of Data, 2016
Why is it hard?
Hash Tables
Fast, O(1) access time
x Point queries only
x Overflow causes periodic latency
A study of index structures for main memory database management systems. T. J. Lehman and M. J. Carey. International Conference on Very Large Databases (VLDB), 1986
Trees
Keeps elements ordered
x Not ideal for modern hardware (cache misses, pipeline stalls)
Modern B-Tree Techniques, by Goetz Graefe, Foundations and Trends in Databases, 2011
Can we get fast, fully featured indexing?
Why do existing solutions not work?
T-Trees
A study of index structures for main memory database management systems. T. J. Lehman and M. J. Carey. International Conference on Very Large Databases (VLDB), 1986
Balance of space overhead and search speed
x Significant amounts of data stored per node, but only two pointers used
x Poor cache behavior
Cache Sensitive B+-Trees
J. Rao and K. A. Ross, “Making B+ trees cache conscious in main memory”, SIGMOD, 2000.
Stores only one child pointer per node
Higher fanout, so more keys fit in each cache line
x Many comparisons cause pipeline stalls
Fast Architecture Sensitive Trees
C. Kim et al., “FAST: fast architecture sensitive tree search on modern CPUs and GPUs”, SIGMOD, 2010.
Binary Tree
SIMD Blocking
Three Level Hierarchy
Reduce comparisons by matching structure to SIMD vector size
Reduce cache misses by matching structure to cache line size
Pointer-free: nodes are stored in arrays and located by offset calculations
x Expensive updates
Radix Tree
• Two factors determine performance:
  • k: key length in bits
  • s: span, the number of key bits stored in each node
• Tree has ⌈k/s⌉ levels
• Node has 2^s pointers
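[Figure: radix tree with root A, inner nodes N and R, and leaves D, T, Y (under N) and E, T (under R), spelling AND, ANT, ANY, ARE, ART]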
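The relationship between key length, span, tree height, and fanout on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name `radix_tree_shape` is hypothetical and does not come from the paper.

```python
import math


def radix_tree_shape(k: int, s: int) -> tuple:
    """Return (levels, pointers_per_node) for k-bit keys and a span of s bits."""
    levels = math.ceil(k / s)      # each level consumes s bits of the key
    pointers_per_node = 2 ** s     # one child slot per possible s-bit value
    return levels, pointers_per_node


# A larger span gives a shallower tree but much wider nodes.
for span in (1, 4, 8):
    levels, fanout = radix_tree_shape(32, span)
    print(f"k=32, s={span}: {levels} levels, {fanout} pointers per node")
```

With 32-bit keys, a span of 8 bits gives 4 levels of 256-pointer nodes, while a span of 1 bit gives a 32-level binary trie.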
Radix Tree
Operation complexity depends on key length, not on the number of keys
Keys are ordered and stored implicitly
Tree shape is independent of insertion order, so no rebalancing is needed
x Mostly studied for character strings
x Poor space usage due to large number of null paths
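The properties listed on this slide can be illustrated with a plain, non-adaptive radix tree. The sketch below assumes a span of 8 bits (one byte per level) and a full 256-slot child array in every node; the names `Node`, `insert`, `lookup`, `keys_in_order`, and `count_slots` are illustrative only.

```python
class Node:
    """Fixed-size radix tree node with span s = 8, i.e. 256 child slots."""

    def __init__(self):
        self.children = [None] * 256
        self.value = None  # set when a stored key ends at this node


def insert(root, key: bytes, value) -> None:
    node = root
    for byte in key:                      # one level per key byte
        if node.children[byte] is None:
            node.children[byte] = Node()
        node = node.children[byte]
    node.value = value


def lookup(root, key: bytes):
    node = root
    for byte in key:
        node = node.children[byte]
        if node is None:
            return None
    return node.value


def keys_in_order(node, prefix: bytes = b""):
    """Yield stored keys in sorted order; the keys are implied by the paths."""
    if node.value is not None:
        yield prefix
    for byte, child in enumerate(node.children):
        if child is not None:
            yield from keys_in_order(child, prefix + bytes([byte]))


def count_slots(node) -> tuple:
    """Return (used, total) child slots in the subtree rooted at node."""
    used, total = 0, 256
    for child in node.children:
        if child is not None:
            sub_used, sub_total = count_slots(child)
            used += 1 + sub_used
            total += sub_total
    return used, total


root = Node()
for word in (b"ART", b"AND", b"ARE", b"ANY", b"ANT"):
    insert(root, word, word.decode())
print(list(keys_in_order(root)))
print(count_slots(root))
```

Lookup cost is bounded by the key length in bytes, traversal returns keys in sorted order regardless of insertion order, and almost all child slots stay empty, which is the space problem the slide points out.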
What is the core intuition for the solution?
Adaptive Nodes to Reduce Space Consumption
Adaptive Node Types
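A rough sketch of the adaptive-node idea follows. The capacities 4, 16, 48, and 256 are the node sizes described in the ART paper; everything else is a simplification. Python lists stand in for fixed-size arrays, binary search stands in for the paper's linear and SIMD searches in the small nodes, and the class name `AdaptiveNode` is hypothetical.

```python
import bisect


class AdaptiveNode:
    """Inner node that starts small and grows: 4 -> 16 -> 48 -> 256 children."""

    def __init__(self):
        self.capacity = 4
        self.keys = []        # sorted key bytes, used while capacity <= 16
        self.children = []    # child slots
        self.index = None     # byte -> slot table, used only at capacity 48

    def size(self) -> int:
        if self.capacity == 256:
            return sum(child is not None for child in self.children)
        return len(self.children)

    def find(self, byte: int):
        if self.capacity <= 16:           # small nodes: search the key array
            i = bisect.bisect_left(self.keys, byte)
            if i < len(self.keys) and self.keys[i] == byte:
                return self.children[i]
            return None
        if self.capacity == 48:           # indirection through a 256-entry index
            slot = self.index[byte]
            return None if slot is None else self.children[slot]
        return self.children[byte]        # capacity 256: direct array lookup

    def add(self, byte: int, child) -> None:
        """Add a child for a byte that is not yet present, growing if full."""
        if self.size() == self.capacity:
            self._grow()
        if self.capacity <= 16:
            i = bisect.bisect_left(self.keys, byte)
            self.keys.insert(i, byte)
            self.children.insert(i, child)
        elif self.capacity == 48:
            self.index[byte] = len(self.children)
            self.children.append(child)
        else:
            self.children[byte] = child

    def _grow(self) -> None:
        if self.capacity == 4:
            self.capacity = 16            # same layout, larger arrays
        elif self.capacity == 16:
            self.index = [None] * 256     # switch to the index-table layout
            for slot, byte in enumerate(self.keys):
                self.index[byte] = slot
            self.keys = []
            self.capacity = 48
        elif self.capacity == 48:
            direct = [None] * 256         # switch to one slot per byte value
            for byte, slot in enumerate(self.index):
                if slot is not None:
                    direct[byte] = self.children[slot]
            self.children = direct
            self.index = None
            self.capacity = 256


node = AdaptiveNode()
for byte in range(20):
    node.add(byte, f"child-{byte}")
print(node.capacity, node.find(7), node.find(200))
```

A sparsely populated node stays in a compact layout, and only nodes with many children pay for the full 256-slot array.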
Path Compression
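Path compression removes inner nodes that have only a single child. The sketch below is one simplified way to show the effect, not the paper's implementation: it builds an uncompressed byte-wise trie, then merges each one-child chain into a `prefix` stored on the surviving node. All names (`TrieNode`, `compress`, `contains`, `count_nodes`) are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.prefix = b""     # bytes skipped on the way into this node
        self.children = {}    # next byte -> TrieNode
        self.is_key = False   # True when a stored key ends here


def insert(root, key: bytes) -> None:
    """Plain, uncompressed insert: one node per key byte."""
    node = root
    for byte in key:
        node = node.children.setdefault(byte, TrieNode())
    node.is_key = True


def compress(node) -> None:
    """Merge every chain of one-child, non-key nodes into a stored prefix."""
    for byte, child in list(node.children.items()):
        skipped = b""
        while len(child.children) == 1 and not child.is_key:
            next_byte, next_child = next(iter(child.children.items()))
            skipped += bytes([next_byte])
            child = next_child
        child.prefix = skipped
        node.children[byte] = child
        compress(child)


def contains(root, key: bytes) -> bool:
    node, pos = root, 0
    while True:
        # The skipped bytes must match before descending further.
        if key[pos:pos + len(node.prefix)] != node.prefix:
            return False
        pos += len(node.prefix)
        if pos == len(key):
            return node.is_key
        node = node.children.get(key[pos])
        if node is None:
            return False
        pos += 1


def count_nodes(node) -> int:
    return 1 + sum(count_nodes(child) for child in node.children.values())


root = TrieNode()
for word in (b"ARTFUL", b"ARTIST", b"AREA"):
    insert(root, word)
before = count_nodes(root)
compress(root)
print(before, count_nodes(root), contains(root, b"ARTIST"))
```

Long keys with shared or unique runs of bytes no longer cost one node per byte, which matters most for string keys.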
Worst-Case Space Consumption
Binary-Comparable Keys
• Unsigned integers:
  • Binary representation already sorted
• Signed integers:
  • Flip sign bit and store as unsigned integers
• Floating-point numbers:
  • Separate into positive, negative, normalized, denormalized, NaN, Inf, or 0
  • Reorder and store as unsigned integers
• Character strings:
  • Standard libraries available
• Null:
  • Add one byte to key length to encode Null value
• Compound keys:
  • Transform attributes individually and concatenate
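Some of the transformations above can be sketched for 32-bit integers. This is an illustrative sketch only: the function names are hypothetical, the big-endian byte order is chosen so that byte-wise comparison matches numeric order, and sorting Null before all other values is an assumption the slide does not state. Floating-point and string keys are not covered here.

```python
import struct


def encode_uint32(value: int) -> bytes:
    """Unsigned: big-endian bytes already compare in numeric order."""
    return struct.pack(">I", value)


def encode_int32(value: int) -> bytes:
    """Signed: flip the sign bit, then store as a big-endian unsigned integer."""
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def encode_nullable_int32(value) -> bytes:
    """Null: one extra leading byte (assumption: 0 = Null, sorts first)."""
    if value is None:
        return b"\x00" + b"\x00" * 4
    return b"\x01" + encode_int32(value)


def encode_compound(*parts: bytes) -> bytes:
    """Compound key: concatenate the individually transformed attributes."""
    return b"".join(parts)


values = [7, -1, 0, -2**31, 2**31 - 1, -5, 1]
by_bytes = sorted(values, key=encode_int32)
print(by_bytes == sorted(values))

pairs = [(1, -3), (-2, 9), (1, -7), (-2, -9)]
by_key = sorted(pairs, key=lambda p: encode_compound(encode_int32(p[0]),
                                                     encode_int32(p[1])))
print(by_key == sorted(pairs))
```

Because the encoded keys compare correctly byte by byte, the tree can index them without knowing the original attribute types.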
What is the setup of analysis/experiments? Is it sufficient?
Micro Benchmarks
• Use 32-bit integers as keys
  • Path compression disabled for short keys
• Two different key distributions
  • Dense: keys ranging over [1, tree size]
  • Sparse: each bit equally likely to be 0 or 1
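The two key distributions can be generated along the following lines. This is a sketch under assumptions the slide does not state: the dense keys are shuffled, the sparse keys are made unique, and the function names and `seed` parameter are hypothetical.

```python
import random


def dense_keys(n: int, seed: int = 0) -> list:
    """Dense: every integer in [1, n], here in random order."""
    keys = list(range(1, n + 1))
    random.Random(seed).shuffle(keys)
    return keys


def sparse_keys(n: int, seed: int = 0) -> list:
    """Sparse: n distinct 32-bit keys, each bit 0 or 1 with equal probability."""
    rng = random.Random(seed)
    keys = set()
    while len(keys) < n:
        keys.add(rng.getrandbits(32))
    return list(keys)


dense = dense_keys(65536)
sparse = sparse_keys(65536)
# Dense keys share their leading bytes; sparse keys spread over all of them.
print(len({key >> 24 for key in dense}), len({key >> 24 for key in sparse}))
```

Dense keys fill the lower tree levels almost completely, while sparse keys leave most nodes with only a few children, which is why the two distributions stress a radix tree differently.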
Search Performance
[Figure panels: 65K, single thread; 16M, single thread; 256M, single thread; 16M, multi-threaded]
Caching Effects
Updates
TPC-C Benchmark
• OLTP benchmark describing a merchandising company
  • Includes selects, inserts, deletes
  • Write-heavy
• Integrates ART into HyPer
  • Depends heavily on index performance
TPC-C Benchmark Using HyPer
Gaps and Next steps?
Gaps and Next Steps
• Authors used their own implementations of competing data structures
• Sparse vs dense key performance
• Ideal number of node types and node sizes?
• Synchronizing concurrent updates