H-Store: A Specialized Architecture for High-Throughput OLTP Applications
Evan Jones (MIT), Andrew Pavlo (Brown)
13th Intl. Workshop on High Performance Transaction Systems, October 26, 2009
Intel Xeon E5540 (Nehalem/Core i7)
Source: Intel 64 and IA-32 Architectures Optimization Reference Manual
Distributed Clusters
Scaling OLTP on Multi-Core?
Use a distributed shared-nothing design
How to Make a Faster OLTP DBMS
Main memory storage
Replication for durability
Explicitly partition the data
Specialized concurrency control:
Stored procedures only
Single partition: execute one transaction at a time
Multiple partitions: supported but slow
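The single-partition execution model above can be sketched in a few lines. This is an illustrative Python sketch, not H-Store's actual implementation (which is written in Java and C++): each partition is owned by exactly one thread that drains a queue of stored-procedure invocations serially, so single-partition transactions need no locks or latches. The `PartitionExecutor` class and its method names are hypothetical.

```python
import queue
import threading

class PartitionExecutor:
    """Sketch of one-thread-per-partition serial execution: transactions
    queued against this partition run one at a time, so no locks or
    latches are needed for single-partition work."""

    def __init__(self):
        self.data = {}                      # in-memory partition storage
        self.pending = queue.Queue()        # queued stored-procedure calls
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            proc, args, done = self.pending.get()
            proc(self.data, *args)          # serial execution: one txn at a time
            done.set()                      # signal the caller that the txn committed

    def submit(self, proc, *args):
        """Enqueue a stored-procedure invocation; returns an Event that is
        set once the transaction has executed."""
        done = threading.Event()
        self.pending.put((proc, args, done))
        return done
```

Multi-partition transactions would require coordinating several such executors, which is why the slides note they are "supported but slow."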
OLTP: Where does the time go?
Source: S. Harizopoulos, D. J. Abadi, S. Madden, M. Stonebraker, "OLTP Under the Looking Glass," SIGMOD 2008.
Users Rely on Partitioning
Source: R. Shoup, D. Pritchett, “The eBay Architecture,” SD Forum, Nov. 2006.
What about multi-core?
Traditional approach: one database process, a thread per connection, shared memory with locks and latches.
H-Store approach: a thread per partition, with distributed transactions across partitions.
Example Microbenchmark
One table per client:
Table(id INTEGER, counter INTEGER)
Each client executes the following query:
UPDATE Table SET counter = counter + 1 WHERE id = 0;
Add clients to find maximum throughput. Data on RAM disk.
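The per-client workload above can be sketched as follows. This is a stand-in sketch, not the actual benchmark harness: it uses an in-memory SQLite database in place of an H-Store partition (the real experiment ran against H-Store with data on a RAM disk), and the table is renamed `CounterTable` because `Table` is a reserved word in SQLite.

```python
import sqlite3

def run_client(iterations=1000):
    """Simulate one microbenchmark client: repeatedly increment a single
    counter row, as in the slide's UPDATE statement."""
    conn = sqlite3.connect(":memory:")  # stand-in for one partition's storage
    conn.execute("CREATE TABLE CounterTable (id INTEGER PRIMARY KEY, counter INTEGER)")
    conn.execute("INSERT INTO CounterTable VALUES (0, 0)")
    for _ in range(iterations):
        conn.execute("UPDATE CounterTable SET counter = counter + 1 WHERE id = 0")
    (value,) = conn.execute("SELECT counter FROM CounterTable WHERE id = 0").fetchone()
    return value
```

Because each client has its own table, every transaction is single-partition by construction, which isolates the cost of threads versus partitions.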
Experimental Configuration
Partitions versus Threads
[Figure: throughput as the number of threads or partitions increases, compared against ideal linear scaling.]
Scalability Analysis
Partitions scale better than threads.
Threads: contention for shared resources [1]
Partitions: memory bottleneck causes sublinear scaling
H-Store: Not just for distributed shared-nothing clusters
[1] R. Johnson et al., "Shore-MT: A Scalable Storage Manager for the Multicore Era," EDBT 2009.
Multi-core Design Problem
How to automatically create a data placement scheme to improve multi-core throughput?
Data Partitioning: Maximize the number of single-partition transactions.
Data Placement: Maximize the number of single-node transactions.
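The partitioning objective above can be made concrete with a small sketch. This is an illustrative Python model, not the actual H-Store design tool: given a hash partitioning on a key (e.g. TPC-C's warehouse id) and a workload described as the set of key values each transaction touches, it computes the fraction of single-partition transactions, the quantity the data-partitioning step tries to maximize. The function names are hypothetical.

```python
def partition_of(key, num_partitions):
    """Hash partitioning on the chosen partitioning key."""
    return hash(key) % num_partitions

def single_partition_fraction(workload, num_partitions):
    """workload: a list of transactions, each given as the list of
    partitioning-key values it accesses. Returns the fraction of
    transactions that touch exactly one partition."""
    single = sum(
        1 for txn in workload
        if len({partition_of(k, num_partitions) for k in txn}) == 1
    )
    return single / len(workload)
```

The data-placement step optimizes the analogous node-level quantity: transactions whose partitions all land on one machine avoid distributed coordination even if they span multiple partitions.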
Database Partitioning
Select partitioning keys and construct a schema tree.
[Figure: TPC-C Schema and derived Schema Tree. WAREHOUSE is the root; DISTRICT, CUSTOMER, ORDERS, ORDER_ITEM, and STOCK descend from it and are partitioned by the warehouse key; the read-only ITEM table is replicated.]
Database Partitioning
Combine table fragments into partitions.
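The fragment-combining step can be sketched as follows. This is an illustrative Python sketch under the TPC-C schema-tree assumption described above, not H-Store's actual partitioner: every table rooted under WAREHOUSE is co-located with its warehouse, so assigning a warehouse id to a partition places all of its dependent fragments there, while the read-only ITEM table is copied into every partition. The function and the dictionary layout are hypothetical.

```python
def build_partitions(warehouse_ids, num_partitions):
    """Group TPC-C table fragments into partitions: fragments of DISTRICT,
    CUSTOMER, ORDERS, ORDER_ITEM, and STOCK follow their warehouse, and
    the ITEM table is replicated into every partition."""
    partitions = {
        p: {"warehouses": [], "replicated": ["ITEM"]}
        for p in range(num_partitions)
    }
    for w in warehouse_ids:
        # All fragments rooted under warehouse w land in the same partition.
        partitions[w % num_partitions]["warehouses"].append(w)
    return partitions
```

Co-locating a warehouse's whole subtree is what makes most TPC-C transactions single-partition, since they typically touch only one warehouse's data.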
[Figure: schema-tree fragments of DISTRICT, CUSTOMER, ORDERS, ORDER_ITEM, and STOCK grouped into partitions P1–P5, with the replicated ITEM table copied into every partition.]
Partition Affinity
[Figure: partitions mapped to hyperthreads HT1 and HT2 on cores Core1 and Core2.]
Data Placement
Assign partitions to cores on each node.
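The placement step above can be sketched as a simple slot assignment. This is an illustrative Python sketch, not H-Store's actual placement algorithm: it enumerates (node, core) slots, fills one node's cores before moving to the next, and pins each partition to a slot; when there are more partitions than cores it wraps around, which a real deployment would avoid under the thread-per-partition model. The function name and layout are hypothetical.

```python
import itertools

def place_partitions(partition_ids, nodes, cores_per_node):
    """Pin each partition to a (node, core) slot, filling each node's
    cores in order so that partitions with affinity share a machine."""
    slots = [(node, core) for node in nodes for core in range(cores_per_node)]
    # cycle() wraps around if there are more partitions than slots.
    return {p: slot for p, slot in zip(partition_ids, itertools.cycle(slots))}
```

With one partition per core, each partition's executor thread runs without contending for shared resources, matching the scalability result shown earlier.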
[Figure: partitions P1–P5, each with a replicated copy of ITEM, assigned to cores on cluster nodes Node 1 through Node n.]
H-Store’s Future
New name. New company.
Six full-time developers.
Open-source project (GPL).
Beta by end of 2009; multiple deployments in financial services.
More Information
H-Store info + papers: http://db.cs.yale.edu/hstore/
VoltDB Project Information: http://www.voltdb.org/