Indexing and Query Optimization Webinar

Indexing and Query Optimization Kevin Matulef September 6, 2012 Thursday, September 6, 12

description

MongoDB supports a wide range of indexing options to enable fast querying of your data. In this talk we'll cover how indexing works, the various indexing options, and the use cases where each is useful.

Transcript of Indexing and Query Optimization Webinar

Page 1: Indexing and Query Optimization Webinar

Indexing and Query Optimization

Kevin Matulef

September 6, 2012

Thursday, September 6, 12

Page 2: Indexing and Query Optimization Webinar

What’s in store

• What are indexes?

• Picking the right indexes.

• Creating indexes in MongoDB

• Troubleshooting


Page 3: Indexing and Query Optimization Webinar

Indexes are the single biggest tunable performance factor in MongoDB.


Page 4: Indexing and Query Optimization Webinar

Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.


Page 5: Indexing and Query Optimization Webinar

So what problem do indexes solve?


Page 7: Indexing and Query Optimization Webinar

How do you find a chicken recipe?

• An unindexed cookbook might be quite a page turner.

• Probably not what you want, though.


Page 8: Indexing and Query Optimization Webinar

I know, I’ll use an index!


Page 10: Indexing and Query Optimization Webinar

Let’s imagine a simple index

ingredient   page
aardvark     790
...          ...
beef         190, 191, 205, ...
...          ...
chicken      182, 199, 200, ...
chorizo      497, ...
...          ...
zucchini     673, 986, ...
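A minimal sketch of the single-key index on this slide, in plain JavaScript rather than MongoDB. The `recipes` array and the `buildIndex` helper are hypothetical, and a `Map` is unordered by key, unlike the alphabetical index in a cookbook. The point is only that a lookup consults the index instead of turning every page.

```javascript
// Hypothetical cookbook: each recipe sits on one page and has a main ingredient.
const recipes = [
  { page: 182, ingredient: "chicken" },
  { page: 190, ingredient: "beef" },
  { page: 199, ingredient: "chicken" },
  { page: 497, ingredient: "chorizo" },
  { page: 790, ingredient: "aardvark" },
];

// Build the index once: ingredient -> sorted list of pages.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc.page);
  }
  for (const pages of index.values()) pages.sort((a, b) => a - b);
  return index;
}

const byIngredient = buildIndex(recipes, "ingredient");

// One lookup instead of a scan over every recipe.
const chickenPages = byIngredient.get("chicken") || [];
console.log(chickenPages); // [ 182, 199 ]
```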


Page 11: Indexing and Query Optimization Webinar

How do you find a quick chicken recipe?


Page 12: Indexing and Query Optimization Webinar

Let’s imagine a compound index

ingredient   cooking time   page
...          ...            ...
chicken      15 min         182, 200
chicken      25 min         199
chicken      30 min         289, 316, 320
chicken      45 min         290, 291, 354
...          ...            ...


Page 13: Indexing and Query Optimization Webinar

Consider the ordering of index keys

Aardvark, 20 min
...
Chicken, 15 min
Chicken, 25 min
Chicken, 30 min
Chicken, 45 min
...
Zucchini, 45 min
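The ordering of compound keys can be sketched with a comparator: compare on the first field, and use the second field only to break ties. This is an illustration in plain JavaScript, not MongoDB's own comparison code; `compareKeys` and the field names are assumptions.

```javascript
// Compare compound keys field by field: ingredient first, then cooking time.
function compareKeys(a, b) {
  if (a.ingredient !== b.ingredient) return a.ingredient < b.ingredient ? -1 : 1;
  return a.minutes - b.minutes;
}

// The keys from the slide, in arbitrary order.
const keys = [
  { ingredient: "chicken", minutes: 45 },
  { ingredient: "zucchini", minutes: 45 },
  { ingredient: "chicken", minutes: 15 },
  { ingredient: "aardvark", minutes: 20 },
  { ingredient: "chicken", minutes: 30 },
  { ingredient: "chicken", minutes: 25 },
];

const sorted = [...keys].sort(compareKeys);
const labels = sorted.map((k) => `${k.ingredient}, ${k.minutes} min`);
console.log(labels.join("\n"));
```

All chicken entries end up adjacent and ordered by time, which is why "chicken, under 30 minutes" is one contiguous slice of the index.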


Page 14: Indexing and Query Optimization Webinar

How about a low-calorie chicken recipe?


Page 15: Indexing and Query Optimization Webinar

Let’s imagine a 2nd compound index

ingredient   calories   page
...          ...        ...
chicken      250        199, 316
chicken      300        289, 291
chicken      425        320
...          ...        ...


Page 16: Indexing and Query Optimization Webinar

How about a quick, low-calorie recipe?


Page 17: Indexing and Query Optimization Webinar

Let’s imagine a last compound index

calories   cooking time   page
...        ...            ...
250        25 min         199
250        30 min         316
300        25 min         289
300        45 min         291
425        30 min         320
...        ...            ...

How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes?


Page 18: Indexing and Query Optimization Webinar

Consider the ordering of index keys

250 cal, 25 min
250 cal, 30 min
300 cal, 25 min
300 cal, 45 min
425 cal, 30 min

How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes?

4 index entries will be scanned, but only 1 will match!
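The count on this slide can be reproduced with a small simulation. This is a sketch in plain JavaScript, assuming the index is a sorted array of the five entries shown; `rangeScan` is a hypothetical helper, not a MongoDB function.

```javascript
// Index entries from the slide, already sorted by (calories, minutes).
const index = [
  { calories: 250, minutes: 25, page: 199 },
  { calories: 250, minutes: 30, page: 316 },
  { calories: 300, minutes: 25, page: 289 },
  { calories: 300, minutes: 45, page: 291 },
  { calories: 425, minutes: 30, page: 320 },
];

// Walk the contiguous calorie range and filter on minutes along the way.
function rangeScan(entries, cal, min) {
  let scanned = 0;
  const matches = [];
  for (const e of entries) {
    if (e.calories < cal.lo) continue; // before the range (a real B-tree seeks here)
    if (e.calories > cal.hi) break;    // past the range: stop
    scanned++;
    if (e.minutes >= min.lo && e.minutes <= min.hi) matches.push(e.page);
  }
  return { scanned, matches };
}

const result = rangeScan(index, { lo: 250, hi: 300 }, { lo: 30, hi: 40 });
console.log(result); // { scanned: 4, matches: [ 316 ] }
```

Because the entries are ordered by calories first, the cooking-time bounds cannot narrow the walk; they only filter entries that were already scanned.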


Page 19: Indexing and Query Optimization Webinar

Range queries using an index on A, B

• A is a range :)

• A is constant, B is a range :)

• A is constant, order by B :)

• A is range, B is constant/range :|

• B is constant/range, A unspecified :(


Page 20: Indexing and Query Optimization Webinar

It’s really that straightforward.


Page 23: Indexing and Query Optimization Webinar

B-Trees (Bayer & McCreight ’72)

Queries, inserts, deletes: O(log n)
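To get a feel for O(log n), here is a sketch that counts comparisons in a binary search over one million sorted keys. Binary search on an array is a stand-in chosen for brevity, not a B-tree: a B-tree node holds many keys, so it has fewer levels, but the number of steps grows logarithmically in the same way.

```javascript
// Binary search over a sorted array of keys, counting the probes.
function search(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let steps = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    steps++;
    if (sortedKeys[mid] === target) return { found: true, steps };
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: false, steps };
}

const n = 1000000;
const keys = Array.from({ length: n }, (_, i) => i);
const { found, steps } = search(keys, 777777);

// A full scan could take up to n steps; the search needs at most ~log2(n).
console.log(found, steps, "<=", Math.ceil(Math.log2(n + 1)));
```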


Page 24: Indexing and Query Optimization Webinar

All this is relevant to MongoDB.

• MongoDB’s indexes are B-Trees, which are designed for range queries.

• Generally, the best index for your queries is going to be a compound index.

• Every additional index slows down inserts & removes, and may slow updates.


Page 25: Indexing and Query Optimization Webinar

On to MongoDB!


Page 27: Indexing and Query Optimization Webinar

Declaring Indexes

• db.foo.ensureIndex( { username : 1 } )

• db.foo.ensureIndex( { username : 1, created_at : -1 } )


Page 29: Indexing and Query Optimization Webinar

And managing them....

> db.system.indexes.find()   // or, for one collection: db.foo.getIndexes()

{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" }
{ "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" }

> db.foo.dropIndex( { username : 1} )

{ "nIndexesWas" : 2 , "ok" : 1 }


Page 33: Indexing and Query Optimization Webinar

Key info about MongoDB’s indexes

• A collection may have at most 64 indexes.

• The “_id” index is created automatically (except on capped collections before 2.2).

• A query can use only 1 index (except $or queries).

• The maximum index key size is 1024 bytes.


Page 34: Indexing and Query Optimization Webinar

Indexes get used where you’d expect

• db.foo.find({x : 42})

• db.foo.find({x : {$in : [42,52]}})

• db.foo.find({x : {$lt : 42}})

• update, findAndModify that select on x

• count, distinct

• $match in aggregation

• left-anchored regexp, e.g. /^Kev/
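Why a left-anchored regexp can use an index: a prefix such as /^Kev/ corresponds to one contiguous key range, here from "Kev" up to (but not including) "Kew". The sketch below is plain JavaScript over a sorted array of made-up names; `prefixRange` and `scanPrefix` are hypothetical helpers, and the bound calculation assumes a simple, case-sensitive prefix.

```javascript
// Sorted index keys on a hypothetical `name` field.
const names = ["Alice", "Kevin", "Kevlar", "Kyle", "Mark", "Steve"].sort();

// Bump the last character of the prefix to get an exclusive upper bound.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  return { lo: prefix, hi: prefix.slice(0, -1) + String.fromCharCode(last + 1) };
}

function scanPrefix(sortedKeys, prefix) {
  const { lo, hi } = prefixRange(prefix);
  let scanned = 0;
  const matches = [];
  for (const key of sortedKeys) {
    if (key < lo) continue; // a real B-tree would seek straight to `lo`
    if (key >= hi) break;   // past the range: stop
    scanned++;
    matches.push(key);
  }
  return { scanned, matches };
}

const anchored = scanPrefix(names, "Kev");

// An unanchored pattern such as /ev/ gives no usable bounds: every key is tested.
const unanchored = names.filter((k) => /ev/.test(k));

console.log(anchored, `${unanchored.length} matches after testing all ${names.length} keys`);
```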


Page 35: Indexing and Query Optimization Webinar

But indexes aren’t always helpful

• Most negations: $not, $nin, $ne

• Some corner cases: $mod, $where

• Matching most regular expressions, e.g. /a/ or /foo/i


Page 36: Indexing and Query Optimization Webinar

Advanced Options


Page 37: Indexing and Query Optimization Webinar

Arrays: the powerful “multiKey” index

{ title : "Chicken Noodle Soup", ingredients : ["chicken", "noodles"] }

ingredients   page
chicken       42
...           ...
noodles       42
...           ...

> db.foo.ensureIndex( { ingredients : 1 } )
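A sketch of what a multikey index stores: one entry per array element, each pointing back at the same document. This is plain JavaScript for illustration; `buildMultikeyIndex`, the `page` field as a document locator, and the second recipe are assumptions.

```javascript
const docs = [
  { page: 42, title: "Chicken Noodle Soup", ingredients: ["chicken", "noodles"] },
  { page: 57, title: "Chicken Salad", ingredients: ["chicken", "lettuce"] }, // hypothetical
];

// One index entry per array element; scalar values get a single entry.
function buildMultikeyIndex(documents, field) {
  const entries = [];
  for (const doc of documents) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) entries.push({ key: value, page: doc.page });
  }
  // Keep entries sorted by key, then page, like any other index.
  return entries.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.page - b.page
  );
}

const ingredientIndex = buildMultikeyIndex(docs, "ingredients");
const noodlePages = ingredientIndex
  .filter((e) => e.key === "noodles")
  .map((e) => e.page);

console.log(ingredientIndex.length, noodlePages); // 4 [ 42 ]
```

Two documents produce four entries, so an index on a large array field can grow much faster than the collection itself.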


Page 38: Indexing and Query Optimization Webinar

Unique Indexes

• db.foo.ensureIndex( { email : 1 } , {unique : true} )

> db.foo.insert({email : "[email protected]"})

> db.foo.insert({email : "[email protected]"})

E11000 duplicate key error ...
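The behaviour of a unique index can be sketched with a toy collection that refuses a second document with the same key. This is plain JavaScript, not MongoDB: the `UniqueCollection` class and the example address are hypothetical, and only the "E11000 duplicate key error" prefix is taken from the slide.

```javascript
// A toy collection that rejects duplicate values for one field.
class UniqueCollection {
  constructor(field) {
    this.field = field;
    this.seen = new Set(); // stands in for the unique index
    this.docs = [];
  }

  insert(doc) {
    const key = doc[this.field];
    if (this.seen.has(key)) {
      // Message text after the error code is made up for this sketch.
      throw new Error(`E11000 duplicate key error: ${this.field} "${key}"`);
    }
    this.seen.add(key);
    this.docs.push(doc);
  }
}

const users = new UniqueCollection("email");
users.insert({ email: "a@example.com" });

let rejected = false;
try {
  users.insert({ email: "a@example.com" }); // same key again
} catch (err) {
  rejected = err.message.startsWith("E11000");
}

console.log(rejected, users.docs.length); // true 1
```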


Page 39: Indexing and Query Optimization Webinar

Sparse Indexes

• db.foo.ensureIndex( { email : 1 } , {sparse : true} )

No index entries for docs without “email” field
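A sketch of the difference, in plain JavaScript with a hypothetical `buildIndex` helper: a regular index keeps an entry (with a null key) for a document that lacks the field, while a sparse index skips that document entirely.

```javascript
const docs = [
  { _id: 1, email: "a@example.com" },
  { _id: 2 },                          // no email field
  { _id: 3, email: "b@example.com" },
];

function buildIndex(documents, field, options = {}) {
  const entries = [];
  for (const doc of documents) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (options.sparse && !hasField) continue; // sparse: no entry at all
    entries.push({ key: hasField ? doc[field] : null, id: doc._id });
  }
  return entries;
}

const regularIndex = buildIndex(docs, "email");
const sparseIndex = buildIndex(docs, "email", { sparse: true });

console.log(regularIndex.length, sparseIndex.length); // 3 2
```

The sparse index is smaller, but a query answered from it alone will not see document 2.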


Page 40: Indexing and Query Optimization Webinar

Geospatial Indexes

{ name: "10gen Office", lat_long: [ 52.5184, 13.387 ] }

> db.locations.ensureIndex( { lat_long : "2d" } )

> db.locations.find( { lat_long: {$near: [52.53, 13.4] } } )
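What $near returns can be sketched as "documents sorted by distance from a point". The code below is a brute-force version in plain JavaScript using flat Euclidean distance; it is not how the 2d index finds neighbours internally, and the `near` helper and the two extra locations are hypothetical.

```javascript
const locations = [
  { name: "Far away", lat_long: [48.1, 11.6] },          // hypothetical
  { name: "10gen Office", lat_long: [52.5184, 13.387] },
  { name: "Across town", lat_long: [52.6, 13.5] },       // hypothetical
];

// Brute force: compute every distance, then sort nearest first.
function near(docs, field, point) {
  const dist = (p) => Math.hypot(p[0] - point[0], p[1] - point[1]);
  return [...docs].sort((a, b) => dist(a[field]) - dist(b[field]));
}

const nearest = near(locations, "lat_long", [52.53, 13.4]);
console.log(nearest.map((d) => d.name));
```

The brute-force version touches every document; the purpose of the geospatial index is to avoid exactly that.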


Page 41: Indexing and Query Optimization Webinar

Troubleshooting


Page 42: Indexing and Query Optimization Webinar

The Query Optimizer

• For each “type” of query, MongoDB periodically tries all useful indexes.

• Aborts as soon as one plan wins.

• Winning plan is temporarily cached.


Page 44: Indexing and Query Optimization Webinar

Which plan wins? Explain!

> db.foo.find( { t: { $lt : 40 } } ).explain( )

{
    "cursor" : "BtreeCursor t_1",
    "n" : 42,
    "nscannedObjects" : 42,
    "nscanned" : 42,
    ...
    "millis" : 0,
    ...
}

Pay attention to the ratio n/nscanned!
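The ratio is easy to compute from the explain document. A sketch in plain JavaScript: `scanEfficiency` is a hypothetical helper, the first object copies the numbers from the slide, and the second is an invented example of an unindexed scan.

```javascript
// Fraction of scanned entries that ended up in the result set.
function scanEfficiency(explain) {
  if (explain.nscanned === 0) return 1; // nothing scanned, nothing wasted
  return explain.n / explain.nscanned;
}

// Numbers from the slide: every scanned entry was a match.
const fromSlide = { cursor: "BtreeCursor t_1", n: 42, nscannedObjects: 42, nscanned: 42 };

// Hypothetical collection scan: 42 results out of 100,000 documents examined.
const wasteful = { cursor: "BasicCursor", n: 42, nscannedObjects: 100000, nscanned: 100000 };

console.log(scanEfficiency(fromSlide)); // 1
console.log(scanEfficiency(wasteful));
```

A ratio near 1 means the index bounds match the query well; a ratio near 0 means most of the work was thrown away.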


Page 45: Indexing and Query Optimization Webinar

Think you know better? Give us a hint

> db.foo.find( { t: { $lt : 40 } } ).hint( { _id : 1} )


Page 46: Indexing and Query Optimization Webinar

Recording slow queries

> db.setProfilingLevel( n , slowms )   // slowms defaults to 100 ms

n=0 profiler off

n=1 record queries longer than slowms

n=2 record all queries

> db.system.profile.find()


Page 47: Indexing and Query Optimization Webinar

Operational Tips


Page 48: Indexing and Query Optimization Webinar

Background index builds

db.foo.ensureIndex( { user : 1 } , { background : true } )

Caveats:

• still resource-intensive

• will build in foreground on secondaries


Page 49: Indexing and Query Optimization Webinar

Minimizing impact on Replica Sets

for (s in secondaries)
    s.restartAsStandalone()
    s.buildIndex()
    s.restartAsReplSetMember()
    s.waitForCatchup()

p.stepDown()
p.restartAsStandalone()
p.buildIndex()
p.restartAsReplSetMember()


Page 50: Indexing and Query Optimization Webinar

Absent or suboptimal indexes are the most common avoidable MongoDB performance problem...

...so take some time and get your indexes right!


Page 51: Indexing and Query Optimization Webinar

Thanks!

(and thanks to Richard Kreuter for the slides)
