The Hive Think Tank: Rocking the Database World with RocksDB
-
Upload
the-hive -
Category
Technology
-
view
516 -
download
0
Transcript of The Hive Think Tank: Rocking the Database World with RocksDB
MySQL + RocksDBBetter Storage Efficiency Than InnoDB
Feb 3, 2016Siying Dong, Software EngineerDatabase Engineering Team @ Facebook
Why Another MySQL Storage Engine ?
Facebook Website Architecture
MySQLDatabases
Web serversData center
Caches
Facebook Website Architecture
MySQLDatabases
Web serversData center
Caches
SSDs
Facebook Website Architecture
▪ What do the system metrics look like?MySQL
Databases
Web serversData center
Caches
SSDs
Measurements of MySQL Hosts’ Actual Resource Usages ▪ Read IOPS: < 5% ▪ Write IOPS: < 5%▪ Peak Write Bandwidth: < 15%▪ CPU: < 20%▪ Write Endurance: last more than 3 years.▪ Space is the bottleneck
RocksDB Storage Engine in MySQL(MyRocks)
▪ https://github.com/MySQLOnRocksDB/mysql-5.6
RocksDB vs. InnoDB
DB Size Comparison
InnoDB RocksDB00.20.40.60.8
11.2
DB Size (Relative)
Write Amplification Comparison
InnoDB RocksDB00.20.40.60.8
11.2 Bytes Written (Rel-
ative)
Why is RocksDB better?
Lower Space Amplification
RowRow
RowWasted
RowRow
RowWasted
Using 8KB space on storage
Waste
4KB 4KB
InnoDB RocksDBCompressed to 5KB
Uncompressed 16KB pageRowRow
RowWasted
Level 1
Level 2
Level 3
Level 4
Target 1GB
Target 10 GB
Target 100 GB
Target 1000 GB
stale
Level 0
stalestale
stale
stale
stale
Lower Write Amplification (Worst Case)
Row
Read
Modify
Write
Row
Row
Row
Row
Row
Row
Row
Row
Row
Write Amp = Page size / row size
InnoDB RocksDB
Level 1
Level 2
Level 3
Level 4
Target 1GB
Target 10 GB
Target 100 GB
Target 1000 GB
Level 0
Merge
Merge
Merge
Merge
flush
Write amp 1
Write Amp 10
Write Amp 10
Write Amp 10
Write Amp 10
How about query performance?
Queries Per Second (LinkBench)
Reads Per Query (LinkBench)
New RocksDB Features to Support MySQL
Transactions▪ Optimistic and Pessimistic▪ MyRocks Uses Pessimistic
“Single Delete”▪ Secondary index is insert/delete only, never upate▪ Updating a column in secondary key generates one delete + one insert
▪ We can drop “single delete” tombstone as soon as it meets a value
Delete Files in Range▪ Drop Table/Index Needs to claim space fast
Delete Files in Range▪ Drop Table/Index Needs to claim space fast
L0
L1
L2
L3
Table A Table B Table C
Delete Files in Range
L0
L1
L2
L3
Table A Table B Table C
▪ Drop Table/Index Needs to claim space fast
Delete Files in Range
L0
L1
L2
L3
Table A Table B Table C
▪ Drop Table/Index Needs to claim space fast
Delete Files in Range
L0
L1
L2
L3
Table A Table C
▪ Drop Table/Index Needs to claim space fast
What we worked around?
Reverse Column order▪ RocksDB’s Prev() is much slower than Next()▪ Reverse column order to serve common query better
Optimizer Stats▪ Query plans need index statistics▪ MyRocks stores index statistics in data dictionary (similar to InnoDB)
▪ When creating data files (SST files), statistics are also added to the SST files, and data dictionary is also updated
Limitation?
MyRocks LimitationLimitation Plan to Address
MyRocks LimitationLimitation Plan to AddressNot yet support Online DDL, Foreign Key, Spatial Index, and Fulltext Index
Support them
MyRocks LimitationLimitation Plan to AddressNot yet support Online DDL, Foreign Key, Spatial Index, and Fulltext Index
Support them
No next key locking support
MyRocks LimitationLimitation Plan to AddressNot yet support Online DDL, Foreign Key, Spatial Index, and Fulltext Index
Support them
No next key locking supportOnly support replication using row-based binary logging. (durability with XA not supported)
RocksDB to support Two-Phase-Commit. MyRocks to use it to support replication using statement-based binary logging.
MyRocks LimitationLimitation Plan to AddressNot yet support Online DDL, Foreign Key, Spatial Index, and Fulltext Index
Support them
No next key locking supportOnly support replication using row-based binary logging.(durability with XA not supported)
RocksDB to support Two-Phase-Commit. MyRocks to use it to support replication using statement-based binary logging.
Either ORDER BY DESC or ASC is slower
Improve RocksDB Prev() performance, to narrow performance gap between ORDER BY DESC and ASC.
Conclusion
Conclusion▪ MyRocks vs. InnoDB
▪ Better Space and Write Amplification▪ Comparable performance
▪ Features in RocksDB to Support MyRocks:▪Transactions▪“Single Delete”▪Delete Files in Range
▪ MyRocks has some limitations and they are being addressed.
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0