Indexing and Query Optimization

43
Technical Support Engineer [email protected] Thomas Rückstieß Indexing and Query Optimization #MongoDBDays

Transcript of Indexing and Query Optimization

Page 1: Indexing and Query Optimization

Technical Support [email protected]

Thomas Rückstieß

Indexing and Query Optimization

#MongoDBDays

Page 2: Indexing and Query Optimization

Agenda

• What are indexes?

• Why do I need them?

• Working with indexes in MongoDB

• Optimize your queries

• Avoiding common mistakes

Page 3: Indexing and Query Optimization

What are indexes?

Page 4: Indexing and Query Optimization

KRISTINE TO INSERT IMAGE OF COOKBOOK

Find a recipe by name: “Currywurst”

Page 5: Indexing and Query Optimization

What are indexes?

• How would you find a recipe using chicken?

• How about a 300-500 calorie recipe using chicken?

Chicken88 kcal: Chicken soup356 kcal: Chicken Ceasar

Salad480 kcal: Teriyaki Chicken680 kcal: Buffalo Chicken

WingsFish

412 kcal: Tuna SandwichPork

480 kcal: Curry Wurst

88 kcalChicken: Chicken soup

356 kcal Chicken: Chicken Ceasar

Salad412 kcal

Fish: Tuna Sandwich480 kcal

Pork: Curry WurstChicken: Teriyaki Chicken

680 kcal Chicken: Buffalo Chicken

Wings

Page 6: Indexing and Query Optimization

Linked List

Page 7: Indexing and Query Optimization

Finding 7 in Linked List

Page 8: Indexing and Query Optimization

Finding 7 in Tree

Page 9: Indexing and Query Optimization

Indexes in MongoDB are B-trees

Page 10: Indexing and Query Optimization

Indexes are the single biggest tunable performance factor in MongoDB

Page 11: Indexing and Query Optimization

Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.

Page 12: Indexing and Query Optimization

Why do I need indexes?A brief story

5 min

30 sec

Page 13: Indexing and Query Optimization

Why do I need indexes?

3h

Page 14: Indexing and Query Optimization

Working with Indexes in MongoDB

Page 15: Indexing and Query Optimization

// The client remembers the index and raises no errorsdb.recipes.ensureIndex({ main_ingredient: 1 })

* 1 means ascending, -1 descending

How do I create indexes?

Page 16: Indexing and Query Optimization

// Multiple fields (compound indexes)db.recipes.ensureIndex({ main_ingredient: 1, calories: -1})

// Arrays of values (multikey indexes){ name: ’Curry Wurst mit Pommes’, ingredients : [’pork', ’curry'] }

db.recipes.ensureIndex({ ingredients: 1 })

What can be indexed?

Image: http://www.marions-kochbuch.com/

Page 17: Indexing and Query Optimization

// Subdocuments{ name : ’Curry Wurst mit Pommes', contributor: { name: ’Hans Wurst', id: ’hawu36' }}

db.recipes.ensureIndex({ 'contributor.id': 1 })

db.recipes.ensureIndex({ 'contributor': 1 })

What can be indexed?

Image: http://www.marions-kochbuch.com/

Page 18: Indexing and Query Optimization

// List a collection's indexes

db.recipes.getIndexes()

db.recipes.getIndexKeys()

// Drop a specific index

db.recipes.dropIndex({ ingredients: 1 })

// Drop all indexes and recreate them

db.recipes.reIndex()

// Default (unique) index on _id

How do I manage indexes?

Page 19: Indexing and Query Optimization

// Index creation is a blocking operation that can take a long time

// Background creation yields to other operations

db.recipes.ensureIndex(

{ ingredients: 1 },

{ background: true }

)

Background Index Builds

Page 20: Indexing and Query Optimization

Options

• Uniqueness constraints (unique, dropDups)

• Sparse Indexes

• Geospatial (2d) Indexes

• TTL Collections (expireAfterSeconds)

Page 21: Indexing and Query Optimization

// Only one recipe can have a given value for name

db.recipes.ensureIndex( { name: 1 }, { unique: true } )

// Force index on collection with duplicate recipe names – drop the duplicates

db.recipes.ensureIndex(

{ name: 1 },

{ unique: true, dropDups: true }

)

* dropDups is probably never what you want

Uniqueness Constraints

image: www.idownloadblog.com

Page 22: Indexing and Query Optimization

// Only documents with field calories will be indexed

db.recipes.ensureIndex(

{ calories: -1 },

{ sparse: true }

)

// Allow multiple documents to not have calories field

db.recipes.ensureIndex(

{ name: 1 , calories: -1 },

{ unique: true, sparse: true }

)

* Missing fields are stored as null(s) in the index

Sparse Indexes

Page 23: Indexing and Query Optimization

// Add latitude, longitude coordinates

{

name: ’Curry 36 am Mehringdamm’,

loc: [ 13.387764, 52.493442]

}

// Index the coordinates

db.locations.ensureIndex( { loc : '2d' } )

// Query for locations 'near' a particular coordinate

db.locations.find({

loc: { $near: [ 37.4, -122.3 ] }

})

Geospatial Indexes

image: NASA

Page 24: Indexing and Query Optimization

// Documents must have a BSON UTC Date field

{ ’submitted_date' : ISODate('2012-10-12T05:24:07.211Z'), … }

// Documents are removed after // 'expireAfterSeconds' seconds

db.recipes.ensureIndex(

{ submitted_date: 1 },

{ expireAfterSeconds: 3600 }

)

TTL Collections

image: taylordsdn112.wordpress.com

Page 25: Indexing and Query Optimization

Limitations

• Collections can not have > 64 indexes.

• Index keys can not be > 1024 bytes (1K).

• The name of an index, including the namespace,

must be < 128 characters.

• Queries can only use 1 index*

• Indexes have storage requirements, and impact the

performance of writes.

• In memory sort (no-index) limited to 32mb of return

data.

Page 26: Indexing and Query Optimization

Optimize Your Queries

Page 27: Indexing and Query Optimization

db.setProfilingLevel( n , slowms=100ms )

n=0 profiler off

n=1 record operations longer than slowms

n=2 record all queries

db.system.profile.find()

* The profile collection is a capped collection, and fixed in size

Profiling Slow Ops

image: http://www.speareducation.com/

Page 28: Indexing and Query Optimization

db.recipes.find( { calories:

{ $lt : 40 } }

).explain( )

{

"cursor" : "BasicCursor" ,

"n" : 42,

"nscannedObjects” : 12345

"nscanned" : 12345,

...

"millis" : 356,

...

}

* Doesn’t use cached plans, re-evals and resets cache

The Explain Plan (without Index)

Page 29: Indexing and Query Optimization

db.recipes.find( { calories:

{ $lt : 40 } }

).explain( )

{

"cursor" : "BtreeCursor calories_-1" ,

"n" : 42,

"nscannedObjects": 42

"nscanned" : 42,

...

"millis" : 0,

...

}

* Doesn’t use cached plans, re-evals and resets cache

The Explain Plan (with Index)

Page 30: Indexing and Query Optimization

The Query Optimizer

• For each "type" of query, MongoDB periodically tries all useful indexes• Aborts the rest as soon as one plan wins• The winning plan is temporarily cached for each “type” of query

Page 31: Indexing and Query Optimization

// Tell the database what index to use

db.recipes.find({

calories: { $lt: 1000 } }

).hint({ _id: 1 })

// Tell the database to NOT use an index

db.recipes.find(

{ calories: { $lt: 1000 } }

).hint({ $natural: 1 })

Manually Select Index to Use

Page 32: Indexing and Query Optimization

// Given the following indexdb.collection.ensureIndex({ a:1, b:1 , c:1, d:1 })

// The following query and sort operations can use the indexdb.collection.find( ).sort({ a:1 })db.collection.find( ).sort({ a:1, b:1 })

db.collection.find({ a:4 }).sort({ a:1, b:1 })db.collection.find({ b:5 }).sort({ a:1, b:1 })

Use Indexes to Sort Query Results

Page 33: Indexing and Query Optimization

// Given the following index

db.collection.ensureIndex({ a:1, b:1, c:1, d:1 })

// These can not sort using the index

db.collection.find( ).sort({ b: 1 })

db.collection.find({ b: 5 }).sort({ b: 1 })

Indexes that won’t work for sorting query results

Page 34: Indexing and Query Optimization

// MongoDB can return data from just the index

db.recipes.ensureIndex({ main_ingredient: 1, name: 1 })

// Return only the ingredients field

db.recipes.find(

{ main_ingredient: 'chicken’ },

{ _id: 0, name: 1 }

)

// indexOnly will be true in the explain plan

db.recipes.find(

{ main_ingredient: 'chicken' },

{ _id: 0, name: 1 }

).explain()

{

"indexOnly": true,

}

Index Covered Queries

Page 35: Indexing and Query Optimization

Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.

Page 36: Indexing and Query Optimization

Avoiding Common Mistakes

Page 37: Indexing and Query Optimization

// MongoDB can only use one index for a query

db.collection.ensureIndex({ a: 1 })

db.collection.ensureIndex({ b: 1 })

// Only one of the above indexes is used

db.collection.find({ a: 3, b: 4 })

Trying to Use Multiple Indexes

Page 38: Indexing and Query Optimization

// Compound key indexes are very effective

db.collection.ensureIndex({ a: 1, b: 1, c: 1 })

// But only if the query is a prefix of the index

// This query can't effectively use the index

db.collection.find({ c: 2 })

// …but this query can

db.collection.find({ a: 3, b: 5 })

Compound Key Mistakes

Page 39: Indexing and Query Optimization

db.collection.distinct('status’)

[ 'new', 'processed' ]

db.collection.ensureIndex({ status: 1 })

// Low selectivity indexes provide little benefit

db.collection.find({ status: 'new' })

// Better

db.collection.ensureIndex({ status: 1, created_at: -1 })

db.collection.find(

{ status: 'new' }

).sort({ created_at: -1 })

Low Selectivity Indexes

Page 40: Indexing and Query Optimization

db.users.ensureIndex({ username: 1 })

// Left anchored regex queries can use the index

db.users.find({ username: /^hans wurst/ })

// But not generic regexes

db.users.find({username: /wurst/ })

// Or case insensitive queries

db.users.find({ username: /Hans/i })

Regular Expressions

Page 41: Indexing and Query Optimization

// Indexes aren't helpful with negationsdb.things.ensureIndex({ x: 1 })

// e.g. "not equal" queries db.things.find({ x: { $ne: 3 } })

// …or "not in" queriesdb.things.find({ x: { $nin: [2, 3, 4 ] } })

// …or the $not operatordb.people.find({ name: { $not: ’Hans Wurst' } })

Negation

Page 42: Indexing and Query Optimization

Choosing the right indexes is one of the most important things you can do as a MongoDB developer so take the time to get your indexes right!

Page 43: Indexing and Query Optimization

Technical Support [email protected]

Thomas Rückstieß

Thank you

#MongoDBDays