MongoDB Storage Engine with RocksDB LSM Tree · Denis...
5
Contents
- What is MongoRocks?
- RocksDB overview
- MongoDB contracts for storage engines
- The most problematic operation
20
Files in levels are immutable
- Compaction writes new files; old ones are deleted once they are no longer in use
- Files are written to disk sequentially, which speeds up I/O
23
Data organization in MongoDB
- “Containers” for data and indexes each receive a unique string identifier, the ident
- Each element has an id that is unique within its container
26
Data organization in MongoRocks
<ident + id> for every container’s element
coll1 ind1_1 ind1_2 coll2 … indN_M
28
Data organization in MongoRocks
- An ident is longer than 20 characters, an extra cost for every data element
- The ident is that long because WiredTiger and mmapv1 use it as a filename
31
Data organization in MongoRocks
- Hashing the ident is a poor option: short hashes may collide
- Instead, use an auto-increment counter (called the prefix) and a map of ident → prefix
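The counter-plus-map idea can be sketched as follows. This is a minimal illustration, not MongoRocks code: the class name `PrefixAllocator`, the 4-byte prefix width, the 8-byte id width, and the sample idents are all assumptions, and a real engine would also have to persist the map.

```python
import struct


class PrefixAllocator:
    """Maps each long ident to a short, auto-incremented numeric prefix."""

    def __init__(self):
        self._next = 0
        self._by_ident = {}

    def prefix_for(self, ident: str) -> int:
        # Allocate a new prefix only the first time an ident is seen.
        if ident not in self._by_ident:
            self._by_ident[ident] = self._next
            self._next += 1
        return self._by_ident[ident]


def make_key(prefix: int, record_id: int) -> bytes:
    # Big-endian fields, so byte-wise key order matches (prefix, id) order.
    return struct.pack(">IQ", prefix, record_id)


alloc = PrefixAllocator()
# Hypothetical idents, only meant to be longer than 20 characters.
coll_prefix = alloc.prefix_for("collection-7-1234567890123456789")
index_prefix = alloc.prefix_for("index-8-1234567890123456789")
key = make_key(coll_prefix, 42)
```

With this layout every key carries 4 bytes of container identity instead of 20 or more, and no two containers can share a prefix.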
32
Data organization in MongoRocks
<prefix + id> for every container’s element
prefix_0 prefix_1 prefix_2 prefix_3 … prefix_N
34
Index format in MongoRocks
K = <prefix + value + order + id (loc)>
V = <typeof value>
Comes from MongoDB
38
Index format in MongoRocks
- The storage must support the search operations lower_bound | upper_bound
- These position a cursor on the closest value, which is then decoded
- RocksDB has iterators for this purpose
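A lower_bound seek over ordered keys can be illustrated with a sorted list and binary search. The encoding below is a simplified stand-in, assuming fixed-width unsigned integers for prefix, value, and loc and omitting the order field; the real index encoding comes from MongoDB and is more involved. The function names are hypothetical.

```python
import bisect
import struct


def index_key(prefix: int, value: int, loc: int) -> bytes:
    # Fixed-width big-endian fields: byte order equals (prefix, value, loc) order.
    return struct.pack(">IQQ", prefix, value, loc)


def decode(key: bytes):
    return struct.unpack(">IQQ", key)


def lower_bound(sorted_keys, prefix, value):
    """Return the first key >= (prefix, value, 0) within the prefix, else None."""
    target = index_key(prefix, value, 0)
    i = bisect.bisect_left(sorted_keys, target)
    if i == len(sorted_keys):
        return None
    key = sorted_keys[i]
    # Stop if the seek ran past the end of this container's prefix.
    return key if decode(key)[0] == prefix else None


keys = sorted([
    index_key(1, 10, 100),
    index_key(1, 20, 101),
    index_key(1, 30, 102),
    index_key(2, 5, 200),
])
hit = lower_bound(keys, 1, 15)  # lands on the entry with value 20
```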
41
Deleting data in MongoRocks
- Deleting an element (document, index entry) just puts a delete operation D into the LSM-tree
- As a result, the tree fills with garbage (old data and delete ops), which slows down iteration
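The effect of delete operations on iteration can be shown with a toy model. This is a sketch under strong simplifications: `ToyLSM` is a hypothetical append-only list rather than a leveled tree, but it shows why a scan has to step over shadowed versions and tombstones.

```python
TOMBSTONE = object()  # stands in for the delete operation "D"


class ToyLSM:
    """Append-only log of operations; later entries shadow earlier ones."""

    def __init__(self):
        self.ops = []  # (key, value or TOMBSTONE), oldest first

    def put(self, key, value):
        self.ops.append((key, value))

    def delete(self, key):
        # Nothing is removed in place; a tombstone is appended instead.
        self.ops.append((key, TOMBSTONE))

    def scan(self):
        """Return (live items in key order, number of entries skipped)."""
        latest = {}
        for key, value in self.ops:
            latest[key] = value
        live = [(k, v) for k, v in sorted(latest.items()) if v is not TOMBSTONE]
        return live, len(self.ops) - len(live)


db = ToyLSM()
db.put("a", 1)
db.put("b", 2)
db.put("a", 3)   # shadows the first version of "a"
db.delete("b")   # adds a tombstone, keeps the old "b" entry around
live, skipped = db.scan()
```

Here four entries are stored but only one is live, so the scan does three entries' worth of wasted work.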
45
Deleting data in MongoRocks
- Ask for the iterator’s statistics after iteration
- If too much data was skipped, run compaction for that range
- The range is always a prefix
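One way to express this heuristic is sketched below. The `IterStats` type, the threshold values, and the function names are assumptions made for illustration; the slides do not state the actual thresholds or the statistics interface.

```python
from dataclasses import dataclass


@dataclass
class IterStats:
    returned: int  # entries handed to the caller
    skipped: int   # tombstones and shadowed versions stepped over


def needs_compaction(stats, min_skipped=1000, max_ratio=10.0):
    # Hypothetical thresholds: ignore small scans, then compare skipped to useful work.
    if stats.skipped < min_skipped:
        return False
    return stats.skipped > max_ratio * max(stats.returned, 1)


def prefix_range(prefix: bytes):
    """Return [start, end) covering every key that begins with prefix."""
    end = bytearray(prefix)
    # Drop trailing 0xFF bytes, then increment the last remaining byte.
    while end and end[-1] == 0xFF:
        end.pop()
    if not end:
        return prefix, None  # no finite upper bound exists
    end[-1] += 1
    return prefix, bytes(end)


stats = IterStats(returned=10, skipped=5000)
if needs_compaction(stats):
    start, end = prefix_range(b"\x00\x00\x00\x07")  # range to hand to compaction
```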
49
- Need to iterate over all data and indexes
of collection and delete every item
- A lot of garbage created
- Doesn’t make sense compared to
engines that just drop files on disk
Deleting collections in MongoRocks
54
Deleting collections in MongoRocks
- Create a filter with the prefixes of dropped containers
- Start compaction for the prefix
- Compaction calls the filter for every item, and the filter decides whether the item is deleted
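The filter-driven drop can be sketched in a few lines. This models the idea only: RocksDB exposes a compaction filter hook for this purpose, but the functions below (`make_drop_filter`, `compact`) and the 4-byte prefix width are hypothetical stand-ins, not its API.

```python
import struct

PREFIX_LEN = 4  # assumed width of the numeric prefix at the start of each key


def make_drop_filter(dropped_prefixes):
    dropped = frozenset(dropped_prefixes)

    def should_drop(key: bytes) -> bool:
        (prefix,) = struct.unpack(">I", key[:PREFIX_LEN])
        return prefix in dropped

    return should_drop


def compact(entries, should_drop):
    """Rewrite a sorted run, keeping only entries the filter does not drop."""
    return [(k, v) for k, v in entries if not should_drop(k)]


def key(prefix: int, record_id: int) -> bytes:
    return struct.pack(">IQ", prefix, record_id)


entries = [(key(1, 1), b"a"), (key(2, 1), b"b"), (key(2, 2), b"c"), (key(3, 1), b"d")]
survivors = compact(entries, make_drop_filter({2}))  # container 2 was dropped
```

No delete operations are written for the dropped container; its entries simply do not make it into the new files.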
55
Deleting collections in MongoRocks
So that compaction can be rerun after a crash, a marker for the dropped prefix is persisted and kept until the compaction finishes
59
Deleting collections in MongoRocks
- DeleteFilesInRange deletes files whose keys all fall within the requested range
- It requires care: files are deleted immediately, even if some keys are still in use (by snapshots)
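The "fully in range" condition can be made concrete with a small model. The `SstFile` type and the selection function are hypothetical, and treating both range ends as inclusive is an assumption; the sketch only shows which files qualify, not how RocksDB performs the deletion.

```python
from typing import NamedTuple


class SstFile(NamedTuple):
    name: str
    smallest: bytes  # smallest key stored in the file
    largest: bytes   # largest key stored in the file


def files_fully_in_range(files, begin: bytes, end: bytes):
    """Files whose whole key span lies inside [begin, end]."""
    return [f for f in files if begin <= f.smallest and f.largest <= end]


files = [
    SstFile("a.sst", b"\x01\x00", b"\x01\xff"),  # only prefix 1
    SstFile("b.sst", b"\x01\xff", b"\x02\x30"),  # straddles prefixes 1 and 2
    SstFile("c.sst", b"\x02\x40", b"\x02\x80"),  # only prefix 2
]
# Drop everything under prefix 2: only c.sst can be removed outright.
removable = files_fully_in_range(files, b"\x02\x00", b"\x02\xff")
```

The straddling file survives, so keys of the dropped prefix that live in it still have to be removed by a filtering compaction.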
63
Deleting collections in MongoRocks
- MongoDB doesn’t send notifications about the logical drop of a collection or a database
- WiredTiger and mmapv1 don’t need them, as they delete files on disk
- This forces MongoRocks to compact every prefix on its own
66
Capped collections in MongoRocks
MongoDB has a specific collection type built as a circular buffer
Developed solely for the oplog (replication log)
68
Capped collections in MongoRocks
- The oplog is pretty large (5% of disk size, but no more than 50 GB by default)
- Because of frequent overwrites, the oplog is polluted with garbage, which affects the performance of the whole storage
70
Capped collections in MongoRocks
- Separate code monitors the oplog size and the number of ‘tombstones’ in it
- Oplog compaction gets higher priority in the queue of compaction operations
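A compaction queue that favors the oplog could look roughly like this. The class name, the two priority levels, and the job representation are assumptions for illustration; the slides only say that oplog compactions are prioritized.

```python
import heapq
import itertools

OPLOG_PRIORITY = 0    # lower number is served first
DEFAULT_PRIORITY = 1


class CompactionQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # keeps FIFO order within one priority

    def schedule(self, prefix, is_oplog=False):
        priority = OPLOG_PRIORITY if is_oplog else DEFAULT_PRIORITY
        heapq.heappush(self._heap, (priority, next(self._seq), prefix))

    def next_job(self):
        """Return the prefix to compact next, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = CompactionQueue()
queue.schedule(prefix=5)
queue.schedule(prefix=6)
queue.schedule(prefix=1, is_oplog=True)  # jumps ahead of the two earlier jobs
order = [queue.next_job() for _ in range(4)]
```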
73
Column families in MongoRocks
- A classic storage engine has one B-tree per “container” (data or index)
- MongoRocks has one LSM-tree for all “containers”
77
Column families in MongoRocks
- RocksDB supports a set of LSM-trees (column families) with a shared WAL to provide transactional logic
- First developed for MySQL (MyRocks project)
78
Column families in MongoRocks
- MongoRocks should have a separate LSM-tree for the oplog, maybe even a separate LSM-tree for every prefix
81
- MongoDB contracts still have some
typical details not applicable to
MongoRocks
- It’s good to order keys in a storage
somehow
83
- The problem of deleting keys may be
solved using different optimizations
- The idea of multiple LSM-trees is a step
forward