20121024 mongodb-boston (1)

Transcript of 20121024 mongodb-boston (1)

Page 1: 20121024 mongodb-boston (1)

MongoDB and Fractal Tree® Indexes

Tim Callaghan*
VP/Engineering, Tokutek

[email protected]

MongoDB Boston 2012

* not [yet] a MongoDB expert

Page 2: 20121024 mongodb-boston (1)

B-trees

Page 3: 20121024 mongodb-boston (1)

B-tree Definition

In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.

http://en.wikipedia.org/wiki/B-tree

Page 4: 20121024 mongodb-boston (1)

B-tree Overview

I will use a simple single-pivot example throughout this presentation

Page 5: 20121024 mongodb-boston (1)

Basic B-tree

Internal Nodes - Path to data

Leaf Nodes - Actual Data

Pointers

Pivots

Page 6: 20121024 mongodb-boston (1)

B-tree example

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

* Pivot Rule is >=

Page 7: 20121024 mongodb-boston (1)

B-tree - insert

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 15, 20] [22, 25] [99]

“Insert 15”

Value stored in leaf node

Page 8: 20121024 mongodb-boston (1)

B-tree - search

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

“Find 25”
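
The insert and search walkthroughs on the two slides above can be sketched in a few lines of JavaScript. This is a minimal illustration rather than MongoDB's implementation: the `leaf`, `node`, `findLeaf`, `find`, and `insert` helpers are hypothetical names, each internal node holds a single pivot as in the slides, and node splitting is left out.

```javascript
// Tree from the slides: root pivot 22, internal pivots 10 and 99.
const leaf = (...keys) => ({ keys });
const node = (pivot, left, right) => ({ pivot, left, right });

const tree = node(22,
  node(10, leaf(2, 3, 4), leaf(10, 20)),
  node(99, leaf(22, 25), leaf(99)));

// Walk from the root to the leaf that owns `key`, counting nodes visited.
// Pivot rule is >=: keys greater than or equal to the pivot go right.
function findLeaf(root, key) {
  let current = root;
  let visited = 1;
  while (current.keys === undefined) {
    current = key >= current.pivot ? current.right : current.left;
    visited += 1;
  }
  return { leaf: current, visited };
}

// A search succeeds only if the owning leaf actually stores the key.
function find(root, key) {
  return findLeaf(root, key).leaf.keys.includes(key);
}

// The value is stored in the leaf node; leaf keys are kept sorted.
function insert(root, key) {
  const { leaf: target } = findLeaf(root, key);
  if (!target.keys.includes(key)) {
    target.keys.push(key);
    target.keys.sort((a, b) => a - b);
  }
}

insert(tree, 15);
console.log(findLeaf(tree, 15).leaf.keys); // [ 10, 15, 20 ]
console.log(find(tree, 25));               // true
```

Both operations touch one node per level, which is where the logarithmic cost in the B-tree definition comes from.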

Page 9: 20121024 mongodb-boston (1)

DISK

RAM

RAM

B-tree - storage

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

Performance is IO-limited when the tree is bigger than RAM; try to fit all internal nodes and some leaf nodes in RAM.

Page 10: 20121024 mongodb-boston (1)

DISK

RAM

RAM

B-tree – serial insertions

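Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

Serial insertion workloads stay in memory because new keys land at the right-hand edge of the tree; think of MongoDB’s “_id” index.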
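
To make the contrast with random keys concrete, here is a rough model that counts how many distinct leaf pages a batch of inserts lands on. Everything in it is an assumption for illustration: the leaf capacity, the key space, the batch size, and the `leavesTouched` and `lcg` helpers are hypothetical and do not come from MongoDB.

```javascript
// Hypothetical model: a sorted key space split into fixed-size leaf pages.
const KEYS_PER_LEAF = 100; // assumed leaf capacity, for illustration only

// Count how many distinct leaf pages a batch of inserted keys lands on.
function leavesTouched(keys) {
  const pages = new Set(keys.map((k) => Math.floor(k / KEYS_PER_LEAF)));
  return pages.size;
}

// Small deterministic pseudo-random generator so the run is repeatable.
function lcg(seed) {
  let state = seed;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state;
  };
}

const BATCH = 1000;
const KEY_SPACE = 1000000;

// Serial keys (like an ever-increasing _id) fill one leaf after another.
const serialKeys = Array.from({ length: BATCH }, (_, i) => KEY_SPACE + i);

// Random keys (like a secondary index on a URI) scatter across the key space.
const next = lcg(42);
const randomKeys = Array.from({ length: BATCH }, () => next() % KEY_SPACE);

console.log("serial:", leavesTouched(serialKeys));
console.log("random:", leavesTouched(randomKeys));
```

The serial batch stays on a handful of adjacent pages that are easy to keep in RAM, while the random batch spreads over far more pages, each of which may need to be read from disk once the index outgrows memory.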

Page 11: 20121024 mongodb-boston (1)

Fractal Tree Indexes

Page 12: 20121024 mongodb-boston (1)

Fractal Tree Indexes

Similar to B-trees
- store data in leaf nodes
- use PK for ordering

All internal nodes have message buffers

Different than B-trees
- message buffer in all internal nodes
- doesn’t need to update leaf node immediately
- much larger nodes (4MB vs. 8KB*)

Page 13: 20121024 mongodb-boston (1)

Fractal Tree Indexes – “insert 15”

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]

insert(15)

No IO is required; all internal nodes usually fit in RAM.

Page 14: 20121024 mongodb-boston (1)

Fractal Tree Indexes – “find 25”

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10] [22, 25] [99]

insert(15)

insert(20) insert(25)

delete(3)

Page 15: 20121024 mongodb-boston (1)

Fractal Tree Indexes – “insert 8”

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10] [22, 25] [99]

insert(15)

Buffer is full, push messages down to next level.

insert(20) insert(25)

delete(3)

Page 16: 20121024 mongodb-boston (1)

Fractal Tree Indexes – “insert 8”

Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 4, 8] [10, 20, 25] [22, 25] [99]

insert(15)

Inserted 8, 20, 25. Deleted 3.
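
The buffering idea in the last four slides can be sketched as follows. This is a deliberately simplified model, not Tokutek's implementation: it uses a single internal node with one message buffer over four leaves, a hypothetical `BufferedTree` class, an assumed buffer capacity of four messages, and its own message sequence rather than the exact one on the slides.

```javascript
// BUFFER_CAPACITY is an assumed, tiny value so the flush is easy to see.
const BUFFER_CAPACITY = 4;

class BufferedTree {
  constructor(pivots, leaves) {
    this.pivots = pivots;   // e.g. [10, 22, 99]
    this.leaves = leaves;   // one sorted array of keys per leaf
    this.buffer = [];       // pending { op, key } messages
    this.leafWrites = 0;    // number of leaf modifications so far
  }

  // Pivot rule is >=: the leaf index is the number of pivots <= key.
  leafIndex(key) {
    return this.pivots.filter((p) => key >= p).length;
  }

  // A write only appends a message; a full buffer is pushed down first.
  send(op, key) {
    if (this.buffer.length >= BUFFER_CAPACITY) this.flush();
    this.buffer.push({ op, key });
  }

  // Apply all pending messages in order; each touched leaf is written once.
  flush() {
    const touched = new Set();
    for (const { op, key } of this.buffer) {
      const i = this.leafIndex(key);
      const keys = new Set(this.leaves[i]);
      if (op === "insert") keys.add(key);
      else keys.delete(key);
      this.leaves[i] = [...keys].sort((a, b) => a - b);
      touched.add(i);
    }
    this.leafWrites += touched.size;
    this.buffer = [];
  }

  // A read checks the leaf, then lets buffered messages override it in order.
  has(key) {
    let present = this.leaves[this.leafIndex(key)].includes(key);
    for (const { op, key: k } of this.buffer) {
      if (k === key) present = op === "insert";
    }
    return present;
  }
}

const tree = new BufferedTree([10, 22, 99], [[2, 3, 4], [10], [22, 25], [99]]);
tree.send("insert", 15);
tree.send("insert", 20);
tree.send("delete", 3);
tree.send("insert", 30);
console.log(tree.leafWrites, tree.has(15), tree.has(3)); // 0 true false

tree.send("insert", 8); // buffer is full, so pending messages are pushed down
console.log(tree.leaves); // [ [ 2, 4 ], [ 10, 15, 20 ], [ 22, 25, 30 ], [ 99 ] ]
console.log(tree.buffer); // [ { op: 'insert', key: 8 } ]
```

Four writes cost no leaf IO at all, and the flush then updates three leaves in one pass. Queries stay correct in the meantime because they consult the buffer as well as the leaf.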

Page 17: 20121024 mongodb-boston (1)

Fractal Tree Indexes – compression

•  Large node size (4MB) leads to high compression ratios.
•  Supports zlib, quicklz, and lzma compression algorithms.
•  Compression is generally 5x to 25x, similar to what gzip and 7z can do to your data.
•  Significantly less disk space needed
•  Fewer writes, bigger writes
•  Both of which are great for SSDs
•  Reads are highly compressed, more data per IO
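
One way to see why node size matters for compression is to compress the same data in small and in large independent blocks. The sketch below uses Node's built-in zlib module with made-up, document-like sample data. The field values and the `sampleData` and `compressedSize` helpers are assumptions for illustration, and the numbers it prints are not Tokutek's results. zlib's 32KB window also limits how much a very large block can gain, so treat the output as a direction rather than a measurement.

```javascript
const zlib = require("zlib");

// Build repeatable, document-like sample data (hypothetical field values).
function sampleData(count) {
  const rows = [];
  for (let i = 0; i < count; i++) {
    rows.push(JSON.stringify({
      uri: `http://example.com/items/${(i * 7919) % 10007}`,
      name: `name-${i % 50}`,
      origin: `origin-${i % 7}`,
    }));
  }
  return Buffer.from(rows.join("\n"));
}

// Compress the data in independent blocks and return the total compressed size.
function compressedSize(data, blockSize) {
  let total = 0;
  for (let offset = 0; offset < data.length; offset += blockSize) {
    total += zlib.deflateSync(data.subarray(offset, offset + blockSize)).length;
  }
  return total;
}

const data = sampleData(20000);
const small = compressedSize(data, 8 * 1024);        // 8KB blocks
const large = compressedSize(data, 4 * 1024 * 1024); // 4MB blocks
console.log({ raw: data.length, small, large });
```

Each small block restarts the compressor with an empty dictionary and pays its own header overhead, so the total for 8KB blocks comes out larger than for 4MB blocks.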

Page 18: 20121024 mongodb-boston (1)

So what does this have to do with MongoDB?

Page 19: 20121024 mongodb-boston (1)

So what does this have to do with MongoDB?

* Watch Tyler Brock’s presentation “Indexing and Query Optimization”

Page 20: 20121024 mongodb-boston (1)

MongoDB Storage

PK index (_id + pointer)
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [(2,ptr2), (4,ptr4)] [(10,ptr10)] [(25,ptr25), (98,ptr98)] [(101,ptr101)]

Secondary Index (foo + pointer)
Root: 85
Internal nodes: 40 | 120
Leaf nodes: [(2,ptr10), (35,ptr101)] [(55,ptr4)] [(90,ptr2)] [(2599,ptr98)]

db.test.insert({foo:55})
db.test.ensureIndex({foo:1})

The “pointer” tells MongoDB where to look in the data files for the actual document data.
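
The two-step lookup this implies (index first, then the data file) can be sketched with plain maps standing in for the B-trees and the data files. The pointer names and key values follow the slide; the `dataFile`, `pkIndex`, `fooIndex`, and `findByFoo` names are hypothetical.

```javascript
// Stand-in for the data files: pointer -> document.
const dataFile = new Map([
  ["ptr4", { _id: 4, foo: 55 }],
  ["ptr2", { _id: 2, foo: 90 }],
  ["ptr10", { _id: 10, foo: 2 }],
]);

// PK index: _id -> pointer.
const pkIndex = new Map([[4, "ptr4"], [2, "ptr2"], [10, "ptr10"]]);

// Secondary index: foo -> pointer.
const fooIndex = new Map([[55, "ptr4"], [90, "ptr2"], [2, "ptr10"]]);

// Step 1: search the secondary index. Step 2: follow the pointer to the data.
function findByFoo(value) {
  const pointer = fooIndex.get(value);
  return pointer === undefined ? null : dataFile.get(pointer);
}

console.log(findByFoo(55)); // { _id: 4, foo: 55 }
console.log(pkIndex.get(4) === fooIndex.get(55)); // true: both point at ptr4
```

The second step is the one that turns into a random disk read when the data files do not fit in RAM.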

Page 21: 20121024 mongodb-boston (1)

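MongoDB Storage

PK index
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [(2,ptr2), (4,ptr4)] [(10,ptr10)] [(25,ptr25), (98,ptr98)] [(101,ptr101)]

Secondary index
Root: 85
Internal nodes: 40 | 120
Leaf nodes: [(2,ptr10), (35,ptr101)] [(55,ptr4)] [(90,ptr2)] [(2599,ptr98)]

B-trees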

Page 22: 20121024 mongodb-boston (1)

Who is Tokutek and what have we done?

•  Tokutek’s Fractal Tree Index Implementations
   – MySQL Storage Engine (TokuDB)
   – BerkeleyDB API
   – File System (TokuFS)
•  Recently added Fractal Tree Indexes to MongoDB 2.2
   – Existing indexes are still supported
   – Source changes are available via our blog at www.tokutek.com/tokuview
   – This is a work in progress (see roadmap slides)

Page 23: 20121024 mongodb-boston (1)

MongoDB and Fractal Tree Indexes

as simple as

db.test.ensureIndex({foo:1}, {v:2})

Page 24: 20121024 mongodb-boston (1)

Indexing Options #1

db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Node size (blocksize), defaults to 4MB.

Page 25: 20121024 mongodb-boston (1)

Indexing Options #2

db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Basement node size (basementsize), defaults to 128K.
•  Smallest retrievable unit of a leaf node, efficient point queries
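
The two size options are byte counts, and a little arithmetic shows how they relate. The only assumption in this sketch is that a full node divides evenly into basement nodes, which the slides do not state explicitly.

```javascript
const blocksize = 4194304;   // node size from the slide, in bytes
const basementsize = 131072; // basement node size from the slide, in bytes

const KB = 1024;
const MB = 1024 * KB;

console.log(blocksize / MB);           // 4   (MB per node)
console.log(basementsize / KB);        // 128 (KB per basement node)
console.log(blocksize / basementsize); // 32  (basement nodes per full node)
```

Under that assumption a point query needs only one 128K basement node rather than the whole 4MB node, which is what keeps point queries efficient despite the large node size.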

Page 26: 20121024 mongodb-boston (1)

Indexing Options #3

db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Compression algorithm, defaults to quicklz.
•  Supports quicklz, lzma, zlib, and none.
•  LZMA provides 40% additional compression beyond quicklz, needs more CPU.
•  Decompression speeds of quicklz and lzma are similar.

Page 27: 20121024 mongodb-boston (1)

Indexing Options #4

db.test.ensureIndex({foo:1},{v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})

•  Clustering indexes store data by key and include the entire document as the payload (rather than a pointer to the document)
•  Always “cover” a query, no need to retrieve the document data
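
The difference between a pointer payload and a document payload shows up as the number of trips to the data file during a range query. The following sketch counts those trips for both layouts. The documents and the `fetchDocument`, `rangeViaPointers`, and `rangeViaClustering` helpers are hypothetical, and arrays stand in for the index structures.

```javascript
// Hypothetical documents; "fetches" counts trips to the data file.
const docs = [
  { _id: 1, URI: "a.com", name: "alpha" },
  { _id: 2, URI: "b.com", name: "beta" },
  { _id: 3, URI: "c.com", name: "gamma" },
];

let fetches = 0;
const dataFile = new Map(docs.map((d) => [d._id, d]));
const fetchDocument = (id) => {
  fetches += 1;
  return dataFile.get(id);
};

// Non-clustering index: key plus a pointer (here the _id) to the document.
const pointerIndex = docs.map((d) => ({ key: d.URI, pointer: d._id }));

// Clustering index: key plus the entire document as the payload.
const clusteringIndex = docs.map((d) => ({ key: d.URI, doc: d }));

function rangeViaPointers(minKey) {
  return pointerIndex
    .filter((entry) => entry.key >= minKey)
    .map((entry) => fetchDocument(entry.pointer));
}

function rangeViaClustering(minKey) {
  return clusteringIndex
    .filter((entry) => entry.key >= minKey)
    .map((entry) => entry.doc);
}

const viaPointers = rangeViaPointers("b.com");
const pointerFetches = fetches;

fetches = 0;
const viaClustering = rangeViaClustering("b.com");
const clusteringFetches = fetches;

console.log(pointerFetches, clusteringFetches); // 2 0
```

Both queries return the same two documents, but only the pointer-based index pays one data-file lookup per match.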

Page 28: 20121024 mongodb-boston (1)

How well does it perform?

Three Benchmarks
•  Benchmark 1 : Raw insertion performance
•  Benchmark 2 : Insertion plus queries
•  Benchmark 3 : Covered indexes vs. clustering indexes

Page 29: 20121024 mongodb-boston (1)

Benchmarks…

Race Results
•  First Place = John
•  Second Place = Tim
•  Third Place = Frank

Page 30: 20121024 mongodb-boston (1)

Benchmarks…

Race Results
•  First Place = John
•  Second Place = Tim
•  Third Place = Frank

Frank can say the following: “I finished third, but Tim was second to last.”

Page 31: 20121024 mongodb-boston (1)

Benchmarks…

Race Results
•  First Place = John
•  Second Place = Tim
•  Third Place = Frank

Frank can say the following: “I finished third, but Tim was second to last.”

Understand benchmark specifics and review all results.

Page 32: 20121024 mongodb-boston (1)

Benchmark 1 : Overview

•  Measure single threaded insertion performance
•  Document is URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp)
•  Secondary indexes on URI, name, origin, expiration
•  Machine specifics:
   – Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0
   – Ubuntu 10.04 Server (64-bit), ext4 filesystem
   – MongoDB v2.2.RC0

Page 33: 20121024 mongodb-boston (1)

Benchmark 1 : Without Journaling

Page 34: 20121024 mongodb-boston (1)

Benchmark 1 : With Journaling

Page 35: 20121024 mongodb-boston (1)

Benchmark 1 : Observations

•  Fractal Tree Indexing insertion performance is 8x better than standard MongoDB indexing with journaling, and 11x without journaling

•  Fractal Tree Indexing insertion performance reaches steady state, even at 200 million insertions. MongoDB insertion performance seems to be in continual decline at only 50 million insertions

•  B-tree performance is great until the working data set > RAM

Page 36: 20121024 mongodb-boston (1)

Benchmark 2 : Overview

•  Measure single threaded insertion performance while querying for 1000 documents with a URI greater than or equal to a randomly selected value once every 60 seconds
•  Document is same as benchmark 1
•  Secondary indexes on URI, name, origin, expiration
•  Fractal Tree Index on URI is clustering
   – clustering indexes store entire document inline
   – Compression controls disk usage
   – no need to get document data from elsewhere
   – db.tokubench.ensureIndex({URI:1}, {v:2, clustering:true})
•  Same hardware as benchmark 1

Page 37: 20121024 mongodb-boston (1)

Benchmark 2 : Insertion Performance

Page 38: 20121024 mongodb-boston (1)

Benchmark 2 : Query Latency

Page 39: 20121024 mongodb-boston (1)

Benchmark 2 : Observations

•  Fractal Tree Indexing insertion performance is 10x better than standard MongoDB indexing

•  Fractal Tree Indexing query latency is 268x better than standard MongoDB indexing

•  B-tree performance is great until the working data set > RAM

•  Random lookups are bad

...but what about MongoDB’s covered indexes?

Page 40: 20121024 mongodb-boston (1)

Benchmark 3 : Overview

•  Same workload and hardware as benchmark 2
•  Create a MongoDB covered index on URI to eliminate lookups in the data files.
   – db.tokubench.ensureIndex({URI:1,creation:1,name:1,origin:1})

Page 41: 20121024 mongodb-boston (1)

Benchmark 3 : Insertion Performance

Page 42: 20121024 mongodb-boston (1)

Benchmark 3 : Query Latency

Page 43: 20121024 mongodb-boston (1)

Benchmark 3 : Observations

•  Fractal Tree Indexing insertion performance is still 3.7x better than standard MongoDB indexing

•  Fractal Tree Indexing query latency is 3.2x better than standard MongoDB indexing (although the MongoDB performance is highly variable)

•  B-tree performance is great until the working data set > RAM

•  MongoDB’s covered indexes can help a lot
   – But what happens when I add new fields to my document?
      o Do I drop and re-create by including my new field?
      o Do I live without it?
   – Clustered Fractal Tree Indexes keep on covering your queries!
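
The new-field problem can be made concrete with a small check. This is a simplified sketch: `isCovered` is a hypothetical helper, not a MongoDB API, `tags` is an invented example of a field added later, and the check ignores MongoDB's other conditions for covered queries, such as excluding `_id` from the projection.

```javascript
// A query is treated as "covered" when every field it filters on or returns
// is part of the index, so the data files never need to be read.
function isCovered(indexFields, queryFields, returnedFields) {
  const indexed = new Set(indexFields);
  return [...queryFields, ...returnedFields].every((f) => indexed.has(f));
}

// Field list of the covered index from benchmark 3.
const coveredIndex = ["URI", "creation", "name", "origin"];

// Query on URI, returning only indexed fields.
console.log(isCovered(coveredIndex, ["URI"], ["URI", "name", "origin"])); // true

// "tags" is a hypothetical field added to the documents later.
console.log(isCovered(coveredIndex, ["URI"], ["URI", "name", "tags"]));   // false
```

A clustering index avoids the second outcome because its payload is the whole document, so any field added later is already stored alongside the key.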

Page 44: 20121024 mongodb-boston (1)

Roadmap : Continuing the Implementation

•  Optimize Indexing Insert/Update/Delete Operations
   – Each of our secondary indexes is currently creating and committing a transaction for each operation
   – A single transaction envelope will improve performance

Page 45: 20121024 mongodb-boston (1)

Roadmap : Continuing the Implementation

•  Add Support for Parallel Array Indexes
   – MongoDB does not support indexing the following two fields:
      o {a: [1, 2], b: [1, 2]}
   – “it could get out of hand”
   – Ticketed on 3/24/2010, jira.mongodb.org/browse/SERVER-826
   – Benchmark coming soon…
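
One plausible reading of “it could get out of hand” is that a compound index over two array fields would need one entry per combination of values. The sketch below assumes exactly that keying scheme, which the slides do not spell out, and `indexEntries` is a hypothetical helper rather than anything in MongoDB.

```javascript
// Build the Cartesian product of the values of each indexed field.
// Scalar fields are treated as one-element arrays.
function indexEntries(doc, fields) {
  return fields.reduce(
    (entries, field) => {
      const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
      return entries.flatMap((entry) => values.map((v) => [...entry, v]));
    },
    [[]]
  );
}

const small = indexEntries({ a: [1, 2], b: [1, 2] }, ["a", "b"]);
console.log(small); // [ [ 1, 1 ], [ 1, 2 ], [ 2, 1 ], [ 2, 2 ] ]

// Two arrays of 100 values each in a single document.
const hundred = Array.from({ length: 100 }, (_, i) => i);
const big = indexEntries({ a: hundred, b: hundred }, ["a", "b"]);
console.log(big.length); // 10000
```

The entry count is the product of the array lengths, so a single document can generate a very large number of index writes.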

Page 46: 20121024 mongodb-boston (1)

Roadmap : Continuing the Implementation

•  Add Crash Safety
   – Our implementation is not [yet] crash safe with the MongoDB PK/heap storage mechanism.
   – MongoDB journal is separate from Fractal Tree Index logs.
   – Need to create a transactional envelope around both of them

Page 47: 20121024 mongodb-boston (1)

Roadmap : Continuing the Implementation

•  Replace MongoDB data store and PK index
   – A clustering index on _id eliminates the need for two storage systems
   – Compression greatly reduces disk footprint
   – This is a large task

Page 48: 20121024 mongodb-boston (1)

We are looking for evaluators!

Email me at [email protected]

See me after the presentation

Page 49: 20121024 mongodb-boston (1)

Questions?

Tim Callaghan [email protected]

@tmcallaghan

More detailed benchmark information in my blogs at www.tokutek.com/tokuview