Indexing and Query Optimization Webinar
Indexing and Query Optimization
Kevin Matulef
September 6, 2012
Thursday, September 6, 12
What’s in store
• What are indexes?
• Picking the right indexes.
• Creating indexes in MongoDB
• Troubleshooting
Indexes are the single biggest tunable performance factor in MongoDB.
Absent or suboptimal indexes are the most common avoidable
MongoDB performance problem.
So what problem do indexes solve?
How do you find a chicken recipe?
• An unindexed cookbook might be quite a page turner.
• Probably not what you want, though.
I know, I’ll use an index!
Let’s imagine a simple index
ingredient | page
aardvark | 790
... | ...
beef | 190, 191, 205, ...
... | ...
chicken | 182, 199, 200, ...
chorizo | 497, ...
... | ...
zucchini | 673, 986, ...
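The two lookups the cookbook slides contrast can be sketched in plain JavaScript (runnable in Node rather than the mongo shell). The `recipes` data, which gives each recipe a single main ingredient, and the helper names are hypothetical: a scan touches every recipe, while a prebuilt ingredient-to-pages map answers with one lookup.

```javascript
// Hypothetical cookbook: each recipe has a page number and one main ingredient.
const recipes = [
  { page: 199, ingredient: "chicken" },
  { page: 190, ingredient: "beef" },
  { page: 182, ingredient: "chicken" },
  { page: 673, ingredient: "zucchini" },
  { page: 790, ingredient: "aardvark" },
];

// No index: turn every page and check it.
function scanForIngredient(recipes, ingredient) {
  let examined = 0;
  const pages = [];
  for (const r of recipes) {
    examined++;
    if (r.ingredient === ingredient) pages.push(r.page);
  }
  return { pages, examined };
}

// Build the index once: ingredient -> ascending list of pages.
function buildIngredientIndex(recipes) {
  const index = new Map();
  for (const r of recipes) {
    if (!index.has(r.ingredient)) index.set(r.ingredient, []);
    index.get(r.ingredient).push(r.page);
  }
  for (const pages of index.values()) pages.sort((a, b) => a - b);
  return index;
}

const index = buildIngredientIndex(recipes);
console.log(scanForIngredient(recipes, "chicken")); // { pages: [ 199, 182 ], examined: 5 }
console.log(index.get("chicken"));                  // [ 182, 199 ]
```

A hash map only answers equality lookups; the ordered structures on the following slides also support ranges.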
How do you find a quick chicken recipe?
Let’s imagine a compound index
ingredient | cooking time | page
... | ... | ...
chicken | 15 min | 182, 200
chicken | 25 min | 199
chicken | 30 min | 289, 316, 320
chicken | 45 min | 290, 291, 354
... | ... | ...
Consider the ordering of index keys
Aardvark, 20 min < Chicken, 15 min < Chicken, 25 min < Chicken, 30 min < Chicken, 45 min < Zucchini, 45 min
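Because entries are ordered by ingredient first and cooking time second, all the chicken entries sit next to each other, sorted by time. The JavaScript sketch below shows that lexicographic ordering on a hypothetical in-memory array of index entries, with a binary search (`lowerBound`, a hypothetical helper) standing in for a B-tree seek. It finds chicken recipes of at most 25 minutes without touching unrelated keys.

```javascript
// Hypothetical compound index on (ingredient, minutes), initially unsorted.
const entries = [
  { ingredient: "chicken", minutes: 30, pages: [289, 316, 320] },
  { ingredient: "aardvark", minutes: 20, pages: [790] },
  { ingredient: "chicken", minutes: 15, pages: [182, 200] },
  { ingredient: "zucchini", minutes: 45, pages: [673] },
  { ingredient: "chicken", minutes: 45, pages: [290, 291, 354] },
  { ingredient: "chicken", minutes: 25, pages: [199] },
];

// Compare keys field by field: ingredient first, then cooking time.
function compareKeys(a, b) {
  if (a.ingredient !== b.ingredient) return a.ingredient < b.ingredient ? -1 : 1;
  return a.minutes - b.minutes;
}

entries.sort(compareKeys);

// Binary search for the first entry whose key is >= the probe key.
function lowerBound(sorted, key) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(sorted[mid], key) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// ingredient == given value and minutes <= maxMinutes: the matches are
// contiguous, so seek to (ingredient, -Infinity) and walk forward.
function quickRecipes(sorted, ingredient, maxMinutes) {
  const pages = [];
  const start = lowerBound(sorted, { ingredient, minutes: -Infinity });
  for (let i = start; i < sorted.length; i++) {
    const e = sorted[i];
    if (e.ingredient !== ingredient || e.minutes > maxMinutes) break; // left the range
    pages.push(...e.pages);
  }
  return pages;
}

console.log(quickRecipes(entries, "chicken", 25)); // [ 182, 200, 199 ]
```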
How about a low-calorie chicken recipe?
Let’s imagine a 2nd compound index
ingredient | calories | page
... | ... | ...
chicken | 250 | 199, 316
chicken | 300 | 289, 291
chicken | 425 | 320
... | ... | ...
How about a quick, low-calorie recipe?
Let’s imagine a final compound index
calories | cooking time | page
... | ... | ...
250 | 25 min | 199
250 | 30 min | 316
300 | 25 min | 289
300 | 45 min | 291
425 | 30 min | 320
... | ... | ...
How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes?
Consider the ordering of index keys
250 cal, 25 min < 250 cal, 30 min < 300 cal, 25 min < 300 cal, 45 min < 425 cal, 30 min
The calorie range bounds the scan from (250 cal, 25 min) through (300 cal, 45 min), but the cooking-time range can only filter entries inside those bounds:
4 index entries will be scanned, but only 1 will match!
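The scan count can be checked with a small simulation. This JavaScript sketch (hypothetical names, data copied from the calories/cooking-time table) walks the entries in key order and stops once the calorie range is exhausted. Within that range it can only filter on cooking time.

```javascript
// Hypothetical compound index on (calories, minutes), already in key order.
const index = [
  { calories: 250, minutes: 25, page: 199 },
  { calories: 250, minutes: 30, page: 316 },
  { calories: 300, minutes: 25, page: 289 },
  { calories: 300, minutes: 45, page: 291 },
  { calories: 425, minutes: 30, page: 320 },
];

// The range on the first field bounds the scan; the range on the second
// field only filters the entries inside those bounds.
function rangeScan(index, calMin, calMax, minMin, minMax) {
  let scanned = 0;
  const matches = [];
  for (const e of index) {
    if (e.calories < calMin) continue; // before the start (a B-tree seeks past these)
    if (e.calories > calMax) break;    // past the end of the first-field range
    scanned++;
    if (e.minutes >= minMin && e.minutes <= minMax) matches.push(e.page);
  }
  return { scanned, matches };
}

console.log(rangeScan(index, 250, 300, 30, 40)); // { scanned: 4, matches: [ 316 ] }
```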
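Range queries using an index on A, B
• A is a range: good
• A is constant, B is a range: good
• A is constant, order by B: good
• A is a range, B is constant/range: so-so (every entry in A's range is scanned, then filtered on B)
• B is constant/range, A unspecified: bad (keys are ordered by A first, so the index cannot narrow the scan)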
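To see why the last case fares worst, here is a JavaScript sketch over a hypothetical index on (a, b) holding every pair with a and b from 0 to 9. It counts the entries each access pattern must examine. The function names are illustrative only.

```javascript
// Hypothetical index on (a, b): every pair with a, b in 0..9, in key order.
const index = [];
for (let a = 0; a < 10; a++) {
  for (let b = 0; b < 10; b++) index.push({ a, b });
}

// A is constant, B is a range: matching keys form one contiguous run,
// so every entry examined is also returned.
function scanAConstBRange(index, a, bMin, bMax) {
  let scanned = 0;
  let matched = 0;
  for (const e of index) {
    if (e.a < a || (e.a === a && e.b < bMin)) continue; // a B-tree seeks past these
    if (e.a > a || e.b > bMax) break;                   // end of the run
    scanned++;
    matched++;
  }
  return { scanned, matched };
}

// B is a range, A unspecified: matches are scattered across every value
// of a, so the whole index has to be examined and filtered.
function scanBRangeOnly(index, bMin, bMax) {
  let scanned = 0;
  let matched = 0;
  for (const e of index) {
    scanned++;
    if (e.b >= bMin && e.b <= bMax) matched++;
  }
  return { scanned, matched };
}

console.log(scanAConstBRange(index, 4, 2, 5)); // { scanned: 4, matched: 4 }
console.log(scanBRangeOnly(index, 2, 5));      // { scanned: 100, matched: 40 }
```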
It’s really that straightforward.
B-Trees (Bayer & McCreight ’72)
Queries, Inserts, Deletes: O(log n)
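A little arithmetic shows why O(log n) is cheap in practice. A full B-tree of height h, with at most f children per node, stores up to f^h - 1 keys. The number of levels a lookup descends therefore grows only with log_f n. The sketch below assumes a fanout of 100 purely for illustration; that is not MongoDB's actual node capacity.

```javascript
// Smallest height h for which a full B-tree with at most `fanout` children
// per node (hence at most fanout - 1 keys per node) can hold n keys,
// i.e. the smallest h with fanout^h - 1 >= n.
function minBTreeHeight(n, fanout) {
  let height = 0;
  let capacity = 0; // fanout^height - 1
  while (capacity < n) {
    height++;
    // fanout^h - 1 = fanout * (fanout^(h-1) - 1) + (fanout - 1)
    capacity = capacity * fanout + (fanout - 1);
  }
  return height;
}

console.log(minBTreeHeight(1e6, 100)); // 4
console.log(minBTreeHeight(1e9, 100)); // 5
```

Growing the key count a thousandfold adds a single level. This is why queries, inserts, and deletes stay cheap as a collection grows. Real trees are not perfectly full, so actual heights can be somewhat larger.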
All this is relevant to MongoDB.
• MongoDB’s indexes are B-Trees, which are designed for range queries.
• Generally, the best index for your queries is going to be a compound index.
• Every additional index slows down inserts & removes, and may slow updates.
On to MongoDB!
Declaring Indexes
• db.foo.ensureIndex( { username : 1 } )
• db.foo.ensureIndex( { username : 1, created_at : -1 } )
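Index names in getIndexes() output, such as "username_1", follow a simple convention: each key field and its direction are joined by underscores. The JavaScript helper below is hypothetical and only reproduces that convention. The server, not this function, assigns the name, and ensureIndex also accepts an explicit name option.

```javascript
// Hypothetical helper mirroring MongoDB's default index-name convention:
// "<field>_<direction or type>" for each key, joined by "_", in key order.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, value]) => `${field}_${value}`)
    .join("_");
}

console.log(defaultIndexName({ username: 1 }));                 // username_1
console.log(defaultIndexName({ username: 1, created_at: -1 })); // username_1_created_at_-1
```

Field order in the key document matters. { username : 1, created_at : -1 } sorts by username first, then by created_at descending, which is a different index from { created_at : -1, username : 1 }.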
And managing them...
> db.system.indexes.find()   // or: db.foo.getIndexes()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" }
{ "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" }
> db.foo.dropIndex( { username : 1 } )
{ "nIndexesWas" : 2, "ok" : 1 }
Key info about MongoDB’s indexes
• A collection may have at most 64 indexes.
• The “_id” index is created automatically (except for capped collections before 2.2).
• Each query can use only one index (except $or queries, where each clause can use its own index).
• The maximum index key size is 1024 bytes.
Indexes get used where you’d expect
• db.foo.find({x : 42})
• db.foo.find({x : {$in : [42, 52]}})
• db.foo.find({x : {$lt : 42}})
• update and findAndModify that select on x
• count and distinct
• $match in aggregation
• left-anchored regular expressions, e.g. /^Kev/
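A left-anchored regular expression can use the index because every string that starts with a literal prefix lies in one contiguous key range. The JavaScript sketch below, with hypothetical helper names and data, turns the prefix "Kev" into the half-open bounds ["Kev", "Kew") and scans only that range. It assumes ASCII keys, for which JavaScript's string comparison agrees with a plain binary ordering.

```javascript
// Turn a non-empty literal prefix into half-open bounds [prefix, successor):
// the successor bumps the last character by one code unit.
function prefixBounds(prefix) {
  if (prefix.length === 0) throw new RangeError("prefix must be non-empty");
  const last = prefix.charCodeAt(prefix.length - 1);
  return [prefix, prefix.slice(0, -1) + String.fromCharCode(last + 1)];
}

// Hypothetical index on a string field, in key order.
const names = ["Kevlar", "Kim", "Karl", "Kew", "Kevin", "Kate"].sort();

// Visit only the keys inside the bounds, as an index scan would.
function prefixScan(sortedKeys, prefix) {
  const [lo, hi] = prefixBounds(prefix);
  const out = [];
  for (const k of sortedKeys) {
    if (k < lo) continue; // a B-tree would seek directly to lo
    if (k >= hi) break;   // everything from hi onward lacks the prefix
    out.push(k);
  }
  return out;
}

console.log(prefixBounds("Kev"));      // [ 'Kev', 'Kew' ]
console.log(prefixScan(names, "Kev")); // [ 'Kevin', 'Kevlar' ]
```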
But indexes aren’t always helpful
• Most negations: $not, $nin, $ne
• Some corner cases: $mod, $where
• Matching most regular expressions, e.g. /a/ or /foo/i
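Negations are a poor fit because their index bounds cover almost everything. This JavaScript sketch uses a hypothetical index of 1,000 distinct integer keys. { x : 42 } touches a single entry, while { x : { $ne : 42 } } corresponds to the two ranges below and above 42, so it touches every other key. Infinity stands in for the server's own minimum and maximum bound values.

```javascript
// Hypothetical single-field index on x with distinct keys 0..999, in order.
const keys = Array.from({ length: 1000 }, (_, i) => i);

// Is key k inside range r? Each end may be open (exclusive) or closed.
function inRange(k, r) {
  const aboveLo = r.loOpen ? k > r.lo : k >= r.lo;
  const belowHi = r.hiOpen ? k < r.hi : k <= r.hi;
  return aboveLo && belowHi;
}

// Number of index entries covered by a list of ranges.
function countInBounds(keys, ranges) {
  return keys.filter(k => ranges.some(r => inRange(k, r))).length;
}

// { x: 42 } -> the point range [42, 42]
const eqBounds = [{ lo: 42, hi: 42 }];
// { x: { $ne: 42 } } -> [-Infinity, 42) and (42, Infinity]
const neBounds = [
  { lo: -Infinity, hi: 42, hiOpen: true },
  { lo: 42, hi: Infinity, loOpen: true },
];

console.log(countInBounds(keys, eqBounds)); // 1
console.log(countInBounds(keys, neBounds)); // 999
```

An unanchored pattern such as /a/ has no bounds at all: every key must be tested against the regular expression.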
Advanced Options
Arrays: the powerful “multikey” index
{ title : "Chicken Noodle Soup", ingredients : ["chicken", "noodles"] }
ingredients | page
chicken | 42
... | ...
noodles | 42
... | ...
> db.foo.ensureIndex( { ingredients : 1 } )
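The multikey behavior can be mimicked in JavaScript. Each array element becomes its own index entry pointing back to the same document, so a lookup on any single ingredient finds the soup. The documents, field names, and helper are hypothetical, and only top-level fields are handled.

```javascript
// Hypothetical documents; _id 42 matches the page number on the slide.
const docs = [
  { _id: 42, title: "Chicken Noodle Soup", ingredients: ["chicken", "noodles"] },
  { _id: 43, title: "Beef Stew", ingredients: ["beef", "carrots", "potatoes"] },
];

// One entry per array element (or a single entry for a scalar value),
// each pointing back to its document, kept in key order.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) entries.push({ key: v, _id: doc._id });
  }
  entries.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : x._id - y._id));
  return entries;
}

const idx = buildMultikeyIndex(docs, "ingredients");
console.log(idx.length); // 5
console.log(idx.filter(e => e.key === "noodles").map(e => e._id)); // [ 42 ]
```

Two documents yield five entries, which is also why large arrays make multikey indexes, and the inserts that maintain them, more expensive.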
Unique Indexes
• db.foo.ensureIndex( { email : 1 } , {unique : true} )
> db.foo.insert({email : "[email protected]"})
> db.foo.insert({email : "[email protected]"})
E11000 duplicate key error ...
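A unique index is an ordinary index that refuses a second entry for an existing key. The toy JavaScript class below is hypothetical, and its error text only imitates the shape of the E11000 message. It shows the check that runs before each insert. The address is a made-up placeholder.

```javascript
// Toy unique index: maps each key to the _id of the one document holding it.
class UniqueIndex {
  constructor(field) {
    this.field = field;
    this.entries = new Map();
  }

  // Reject the insert if another document already has this key.
  insert(doc) {
    const key = doc[this.field];
    if (this.entries.has(key)) {
      throw new Error(`E11000 duplicate key error (imitation): ${this.field} ${JSON.stringify(key)}`);
    }
    this.entries.set(key, doc._id);
  }
}

const emailIndex = new UniqueIndex("email");
emailIndex.insert({ _id: 1, email: "someone@example.com" }); // hypothetical address
try {
  emailIndex.insert({ _id: 2, email: "someone@example.com" });
} catch (err) {
  console.log(err.message);
}
```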
Sparse Indexes
• db.foo.ensureIndex( { email : 1 } , {sparse : true} )
No index entries for docs without “email” field
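The difference from a regular index shows up for documents that lack the field. A regular index still stores an entry for them, keyed as null, while a sparse index skips them. Here is a JavaScript sketch with hypothetical documents and a hypothetical helper name:

```javascript
// Build a toy single-field index; with sparse: true, documents missing the
// field get no entry, otherwise they are indexed under null.
function buildIndex(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    if (field in doc) {
      entries.push({ key: doc[field], _id: doc._id });
    } else if (!sparse) {
      entries.push({ key: null, _id: doc._id });
    }
  }
  return entries;
}

// Hypothetical documents; _id 2 has no email field.
const docs = [
  { _id: 1, email: "a@example.com" },
  { _id: 2, name: "no email" },
  { _id: 3, email: "b@example.com" },
];

console.log(buildIndex(docs, "email").length);                   // 3
console.log(buildIndex(docs, "email", { sparse: true }).length); // 2
```

A scan of the sparse index never sees documents without the field. Combined with unique : true, this allows many documents that have no email at all.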
Geospatial Indexes
{ name: "10gen Office", lat_long: [ 52.5184, 13.387 ] }
> db.locations.ensureIndex( { lat_long : "2d" } )
> db.locations.find( { lat_long: {$near: [52.53, 13.4] } } )
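The result ordering of $near on a "2d" index is nearest points first, under flat (Euclidean) geometry. A brute-force sort can imitate it. This JavaScript sketch reuses the office coordinates from the slide, while the other two places and the helper name are hypothetical. A real 2d index avoids computing the distance to every document.

```javascript
// Brute-force stand-in for $near on a flat plane: sort by Euclidean distance
// from the query point and keep the closest `limit` documents.
function nearFlat(docs, field, [qx, qy], limit) {
  return docs
    .map(doc => {
      const [x, y] = doc[field];
      return { doc, dist: Math.hypot(x - qx, y - qy) };
    })
    .sort((a, b) => a.dist - b.dist)
    .slice(0, limit)
    .map(r => r.doc);
}

const locations = [
  { name: "10gen Office", lat_long: [52.5184, 13.387] },
  { name: "Hypothetical Park", lat_long: [52.6, 13.2] },
  { name: "Hypothetical Cafe", lat_long: [52.53, 13.41] },
];

console.log(nearFlat(locations, "lat_long", [52.53, 13.4], 2).map(d => d.name));
// [ 'Hypothetical Cafe', '10gen Office' ]
```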
Troubleshooting
The Query Optimizer
• For each “type” of query, MongoDB periodically tries all useful indexes.
• Aborts as soon as one plan wins.
• Winning plan is temporarily cached.
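The race among candidate plans can be modeled in a few lines of JavaScript. In this toy version, each candidate plan is a hypothetical generator that yields once per entry it examines. Plans run round-robin, one step at a time, the first to finish wins, and the winner is cached per query shape. The shape string and plan step counts are made up. Unlike the server, this cache is never re-evaluated.

```javascript
// Cache of winning plan names, keyed by a query-shape string.
const planCache = new Map();

// Hypothetical plan that finishes after examining `steps` entries.
function* scanPlan(steps) {
  for (let i = 0; i < steps; i++) yield i;
}

// Run all candidates round-robin; the first one to finish wins.
function pickPlan(shape, candidates) {
  if (planCache.has(shape)) return planCache.get(shape);
  if (candidates.length === 0) throw new Error("no candidate plans");
  const running = candidates.map(c => ({ name: c.name, it: c.start() }));
  for (;;) {
    for (const r of running) {
      if (r.it.next().done) {
        planCache.set(shape, r.name); // remember the winner for this shape
        return r.name;
      }
    }
  }
}

const candidates = [
  { name: "BasicCursor", start: () => scanPlan(1000) },    // full collection scan
  { name: "BtreeCursor t_1", start: () => scanPlan(42) },  // index on t
];
console.log(pickPlan("t < ?", candidates)); // BtreeCursor t_1
```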
Which plan wins? Explain!
> db.foo.find( { t: { $lt : 40 } } ).explain( )
{ "cursor" : "BtreeCursor t_1", "n" : 42, "nscannedObjects" : 42, "nscanned" : 42, ..., "millis" : 0, ... }
Pay attention to the ratio n/nscanned!
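The ratio is simple arithmetic on two fields of the explain output. Below is a hypothetical JavaScript helper, fed the values from this slide and from the earlier calorie example (1 match out of 4 entries scanned):

```javascript
// Hypothetical helper: results returned per entry scanned, from the n and
// nscanned fields of an explain() document. 1 means every entry the scan
// touched was returned; small values mean loose bounds.
function scanRatio(explain) {
  if (explain.nscanned === 0) return 1; // nothing scanned, nothing wasted
  return explain.n / explain.nscanned;
}

console.log(scanRatio({ cursor: "BtreeCursor t_1", n: 42, nscanned: 42 })); // 1
console.log(scanRatio({ n: 1, nscanned: 4 }));                            // 0.25
```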
Think you know better? Give us a hint.
> db.foo.find( { t: { $lt : 40 } } ).hint( { _id : 1 } )
Recording slow queries
> db.setProfilingLevel( n, slowms )   // slowms defaults to 100 ms
• n=0: profiler off
• n=1: record queries longer than slowms
• n=2: record all queries
> db.system.profile.find()
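The three levels amount to a simple rule. This hypothetical JavaScript helper restates it as code. Reading "longer than slowms" as strictly greater is an assumption.

```javascript
// Which operations each profiling level records, following the slide:
// 0 = profiler off, 1 = only operations slower than slowms, 2 = everything.
// slowms defaults to 100 ms, as in the slide.
function isProfiled(level, millis, slowms = 100) {
  switch (level) {
    case 0: return false;
    case 1: return millis > slowms; // assumption: strictly greater
    case 2: return true;
    default: throw new RangeError(`unknown profiling level: ${level}`);
  }
}

console.log(isProfiled(1, 250));  // true
console.log(isProfiled(1, 20));   // false
console.log(isProfiled(0, 5000)); // false
```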
Operational Tips
Background index builds
db.foo.ensureIndex( { user : 1 } , { background : true } )
Caveats:
• still resource-intensive
• will build in foreground on secondaries
Minimizing impact on Replica Sets
(pseudocode: s = each secondary, p = the primary)
for (s in secondaries) {
  s.restartAsStandalone()
  s.buildIndex()
  s.restartAsReplSetMember()
  s.waitForCatchup()
}
p.stepDown()
p.restartAsStandalone()
p.buildIndex()
p.restartAsReplSetMember()
Absent or suboptimal indexes are the most common avoidable
MongoDB performance problem...
...so take some time and get your indexes right!
Thanks!
(and thanks to Richard Kreuter for the slides)