Indexing Strategies to Help You Scale

Dmitry Baev, Senior Solutions Architect, MongoDB

Learn all about Indexing Strategies for MongoDB.

Transcript of Indexing Strategies to Help You Scale

Page 1: Indexing Strategies to Help You Scale

Indexing Strategies to Help you Scale

Senior Solutions Architect, MongoDB

Dmitry Baev

Page 2: Indexing Strategies to Help You Scale

Agenda

• What are indexes?

• Indexing Basics

• Evaluation / Tuning

• Geospatial

• Text Search

• Scaling

Page 3: Indexing Strategies to Help You Scale

What Are Indexes?

Page 4: Indexing Strategies to Help You Scale

What Are Indexes?

Imagine you're looking for a recipe in a cookbook ordered by recipe name. Looking up a recipe by name is quick and easy.

Page 5: Indexing Strategies to Help You Scale

Consult the Index

Page 6: Indexing Strategies to Help You Scale

Linked List

Page 7: Indexing Strategies to Help You Scale

Finding 7 in a Linked List

Page 8: Indexing Strategies to Help You Scale

Finding 7 In a Tree

Page 9: Indexing Strategies to Help You Scale

Indexes in MongoDB are B-Trees

Page 10: Indexing Strategies to Help You Scale

Queries, inserts and deletes: O(log(n)) time
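
As a rough illustration of why the tree lookup beats the linked list when finding 7, the sketch below counts comparisons for both approaches. It uses binary search over a sorted array as a stand-in for descending a balanced tree. The function names are made up for this example, and MongoDB's actual B-tree implementation is more involved than this.

```javascript
// Count comparisons for a front-to-back scan, as in a linked list.
function linearSearchSteps(sortedValues, target) {
  let steps = 0;
  for (const value of sortedValues) {
    steps += 1;
    if (value === target) return steps;
  }
  return steps;
}

// Count comparisons for binary search, a stand-in for descending a balanced tree.
function binarySearchSteps(sortedValues, target) {
  let low = 0;
  let high = sortedValues.length - 1;
  let steps = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    steps += 1;
    if (sortedValues[mid] === target) return steps;
    if (sortedValues[mid] < target) low = mid + 1;
    else high = mid - 1;
  }
  return steps;
}

const values = Array.from({ length: 15 }, (_, i) => i + 1); // 1..15
console.log("linear scan steps:", linearSearchSteps(values, 7));
console.log("binary search steps:", binarySearchSteps(values, 7));
```

The gap widens with size: the scan grows linearly with the number of entries, while the halving search grows with its logarithm.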

Page 11: Indexing Strategies to Help You Scale

Indexes are the single biggest tunable performance factor in MongoDB

Page 12: Indexing Strategies to Help You Scale

Indexing Basics

Page 13: Indexing Strategies to Help You Scale

Indexing Basics

• Single biggest tunable performance factor in the DB.
  – Index efficiency should be reviewed early
  – Avoid duplicates

// index on author (ascending)
> db.articles.ensureIndex( { author : 1 } )

// index on author (descending)
> db.articles.ensureIndex( { author : -1 } )

// index on arrays of values – multikey index
> db.articles.ensureIndex( { tags : 1 } )

Page 14: Indexing Strategies to Help You Scale

Sub-document indexes

• Index on sub-documents
  – Using dot notation

{
  '_id' : ObjectId(..),
  'article_id' : ObjectId(..),
  'section' : 'schema',
  'date' : ISODate(..),
  'daily' : { 'views' : 45, 'comments' : 150 },
  'hours' : {
    0 : { 'views' : 10 },
    1 : { 'views' : 2 },
    …
    23 : { 'views' : 14, 'comments' : 10 }
  }
}

> db.interactions.ensureIndex( { "daily.comments" : 1 } )

> db.interactions.find(
    { "daily.comments" : { $gte : 150 } },
    { _id : 0, "daily.comments" : 1 } )

Page 15: Indexing Strategies to Help You Scale

Compound indexes

• Indexes that use multiple values

// compound index on author, then tags
> db.articles.ensureIndex( { author : 1, tags : 1 } )

// this index supports both of these queries
> db.articles.find( { author : 'Joe D', tags : 'MongoDB' } )
// and
> db.articles.find( { author : 'Joe D' } )

// you don't need this: { author : 1 } is a prefix of the compound index
> db.articles.ensureIndex( { author : 1 } )
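
The prefix rule behind this slide can be sketched in plain JavaScript. The helper below is hypothetical (it is not a MongoDB API) and deliberately simplified: it only counts how many leading keys of a compound index are constrained by a query's fields, which is why a separate index on author adds nothing.

```javascript
// Hypothetical helper, not part of MongoDB: counts how many leading keys of a
// compound index are constrained by the fields used in a query.
function usablePrefixLength(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  let length = 0;
  for (const key of indexKeys) {
    if (!wanted.has(key)) break; // the prefix ends at the first unconstrained key
    length += 1;
  }
  return length;
}

const compoundIndex = ["author", "tags"];
console.log(usablePrefixLength(compoundIndex, ["author", "tags"])); // 2
console.log(usablePrefixLength(compoundIndex, ["author"]));         // 1
console.log(usablePrefixLength(compoundIndex, ["tags"]));           // 0
```

A result of 0 means the query skips the leading key, so this compound index would not help it.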

Page 16: Indexing Strategies to Help You Scale

Sort order

• Sort order doesn't matter on single-field indexes
  – We can read from either side of the btree
  – { attribute: 1 } or { attribute: -1 }
• Sort order matters on compound indexes
  – We'll want to query on author and sort by date in the application

// index on author ascending but date descending
> db.articles.ensureIndex( { 'author' : 1, 'date' : -1 } )
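
One way to picture why direction matters on compound indexes: an index can be walked forwards or backwards, so a sort is served when it matches the key pattern or its exact inverse. The checker below is a simplified, hypothetical sketch of that rule (it ignores cases such as equality filters on leading keys), not MongoDB's planner logic.

```javascript
// Hypothetical helper: true when the sort follows the leading index keys with
// directions that match the index exactly or are all reversed.
function indexSupportsSort(indexSpec, sortSpec) {
  const indexEntries = Object.entries(indexSpec);
  const sortEntries = Object.entries(sortSpec);
  if (sortEntries.length === 0 || sortEntries.length > indexEntries.length) {
    return false;
  }
  const sameFields = sortEntries.every(([field], i) => indexEntries[i][0] === field);
  if (!sameFields) return false;
  const forward = sortEntries.every(([, dir], i) => dir === indexEntries[i][1]);
  const backward = sortEntries.every(([, dir], i) => dir === -indexEntries[i][1]);
  return forward || backward;
}

const authorDateIndex = { author: 1, date: -1 };
console.log(indexSupportsSort(authorDateIndex, { author: 1, date: -1 })); // true
console.log(indexSupportsSort(authorDateIndex, { author: -1, date: 1 })); // true
console.log(indexSupportsSort(authorDateIndex, { author: 1, date: 1 }));  // false
```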

Page 17: Indexing Strategies to Help You Scale

Covered or Index only Queries

• Returns data from the index
  – Rather than the database files
  – Performance optimization
  – Works with compound indexes
• Invoke with a projection

> db.users.ensureIndex( { user : 1, password : 1 } )

> db.users.find( { user : "joe" }, { _id : 0, password : 1 } )

Tip: use projections anyway to reduce data sent back to the client
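
The conditions for a covered query can be sketched as a small check. The function below is a hypothetical illustration, not a MongoDB API, and it assumes only the basic rules: every queried and returned field is in the index, and _id is excluded unless it is indexed too. Real servers apply further restrictions (for example around multikey indexes).

```javascript
// Hypothetical helper: decides whether a query and projection could be
// answered from the index alone under the simplified rules described above.
function isCoveredQuery(indexSpec, query, projection) {
  const indexed = new Set(Object.keys(indexSpec));
  const queryCovered = Object.keys(query).every((field) => indexed.has(field));
  const returned = Object.keys(projection).filter((field) => projection[field] === 1);
  const projectionCovered =
    returned.length > 0 && returned.every((field) => indexed.has(field));
  // _id is returned by default, so it must be switched off or be part of the index.
  const idHandled = projection._id === 0 || indexed.has("_id");
  return queryCovered && projectionCovered && idHandled;
}

const userIndex = { user: 1, password: 1 };
console.log(isCoveredQuery(userIndex, { user: "joe" }, { _id: 0, password: 1 })); // true
console.log(isCoveredQuery(userIndex, { user: "joe" }, { password: 1 }));         // false
```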

Page 18: Indexing Strategies to Help You Scale

Options

• Uniqueness constraints (unique, dropDups)
• Sparse Indexes

// index on author must be unique
> db.articles.ensureIndex( { 'author' : 1 }, { unique : true } )

// allow multiple documents to not have likes field
> db.articles.ensureIndex( { 'author' : 1, 'likes' : 1 }, { sparse : true } )

* Missing fields are stored as null(s) in the index

Page 19: Indexing Strategies to Help You Scale

Background Index Builds

• Index creation is a blocking operation that can take a long time

• Background creation yields to other operations

• Build more than one index in background concurrently

• Restart secondaries in standalone to build index

// To build in the background
> db.articles.ensureIndex(
    { 'author' : 1, 'date' : -1 },
    { background : true }
  )

Page 20: Indexing Strategies to Help You Scale

Explain plan

• Use to evaluate operations and indexes
  – Which indexes have been used, if any
  – How many documents / objects have been scanned
  – View via the console or via code

// To view via the console
> db.articles.find( { author : 'Joe D' } ).explain()

Page 21: Indexing Strategies to Help You Scale

Explain plan output (no index)

{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 12,
  "nscannedObjects" : 25820,
  "nscanned" : 25820,
  …
  "indexOnly" : false,
  …
  "millis" : 27,
  …
}

Other Types:

• BasicCursor
  – Full collection scan
• BtreeCursor
• GeoSearchCursor
• Complex Plan
• TextCursor

Page 22: Indexing Strategies to Help You Scale

Explain plan output

{
  "cursor" : "BtreeCursor author_1_date_-1",
  "isMultiKey" : false,
  "n" : 12,
  "nscannedObjects" : 12,
  "nscanned" : 12,
  …
  "indexOnly" : false,
  …
  "millis" : 0,
  …
}
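
A quick way to read these explain outputs is to compare how many entries were scanned with how many documents came back. The helper below is a hypothetical sketch using only the n and nscanned fields shown on the slides. A ratio near 1 suggests the index is selective, while a large ratio points at a scan that an index could avoid.

```javascript
// Hypothetical helper: entries scanned per document returned, based on the
// "n" and "nscanned" fields of the explain output shown on the slides.
function scanRatio(explainOutput) {
  const { n, nscanned } = explainOutput;
  if (n === 0) return nscanned === 0 ? 1 : Infinity;
  return nscanned / n;
}

// Values copied from the two explain outputs on the slides.
const withoutIndex = { cursor: "BasicCursor", n: 12, nscanned: 25820 };
const withIndex = { cursor: "BtreeCursor author_1_date_-1", n: 12, nscanned: 12 };

console.log(scanRatio(withoutIndex).toFixed(1)); // "2151.7"
console.log(scanRatio(withIndex));               // 1
```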

Page 23: Indexing Strategies to Help You Scale

Database profiler

• Enable to see slow queries
  – (or all queries)
  – Default 100ms

// Enable database profiler on the console, 0=off 1=slow 2=all
> db.setProfilingLevel(1, 100)
{ "was" : 0, "slowms" : 100, "ok" : 1 }

// View profile with
> show profile

// or
> db.system.profile.find().pretty()

Page 24: Indexing Strategies to Help You Scale

The Query Optimizer

• For each "type" of query, MongoDB periodically tries all useful indexes

• Aborts the rest as soon as one plan wins

• The winning plan is temporarily cached for each “type” of query (used for next 1,000 times)

• MongoDB 2.6 can use the intersection of multiple indexes to fulfill queries

Page 25: Indexing Strategies to Help You Scale

Other Index Types

• Geospatial Indexes (2d Sphere)

• Text Indexes

• TTL Collections (expireAfterSeconds)

• Hashed Indexes for sharding

Page 26: Indexing Strategies to Help You Scale

Geo Spatial Indexes

Page 27: Indexing Strategies to Help You Scale

2dSphere

• Indexes on geospatial fields
  – Using GeoJSON objects
  – Geometries on spheres

// GeoJSON object structure for indexing
// (coordinates are listed as [ longitude, latitude ])
{
  name : 'MongoDB Palo Alto',
  location : {
    type : "Point",
    coordinates : [ -122.158574, 37.449157 ]
  }
}

// Index on GeoJSON objects
> db.articles.ensureIndex( { location : "2dsphere" } )

Supported GeoJSON objects:

• Point
• LineString
• Polygon
• MultiPoint
• MultiLineString
• MultiPolygon
• GeometryCollection

Page 28: Indexing Strategies to Help You Scale

Extended Articles document

• Store the location the article was posted from
• Geo location from browser

// Articles collection
> db.articles.insert({
    'text' : 'Article content…',
    'date' : ISODate(...),
    'title' : 'Intro to MongoDB',
    'author' : 'Joe D',
    'tags' : ['mongodb', 'database', 'nosql'],
    'location' : {
      'type' : 'Point',
      'coordinates' : [-122.158, 37.449]
    }
  });

// JavaScript function to get geolocation (it takes a success callback)
navigator.geolocation.getCurrentPosition(callback);

// You will need to translate the result into GeoJSON
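
The translation step might look like the sketch below. The function name toGeoJSONPoint is made up for this example. It relies on the coords.latitude and coords.longitude properties of the browser's position object and on GeoJSON listing longitude first. The sample object stands in for what a browser would pass to the callback.

```javascript
// Hypothetical helper: converts a browser Geolocation position into a GeoJSON
// Point. GeoJSON orders coordinates as [longitude, latitude].
function toGeoJSONPoint(position) {
  const { latitude, longitude } = position.coords;
  return { type: "Point", coordinates: [longitude, latitude] };
}

// Stand-in for the object a browser passes to the success callback.
const samplePosition = { coords: { latitude: 37.449, longitude: -122.158 } };
console.log(JSON.stringify(toGeoJSONPoint(samplePosition)));
```

In a browser this would be wired up as navigator.geolocation.getCurrentPosition((position) => { const location = toGeoJSONPoint(position); }), with the result stored in the article's location field.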

Page 29: Indexing Strategies to Help You Scale

Example

– Query for locations 'near' a particular coordinate ($maxDistance is in meters)

> db.articles.find( {
    location : {
      $near : {
        $geometry : { type : "Point", coordinates : [-122.158, 37.449] },
        $maxDistance : 5000
      }
    }
  } )
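
To get a feel for what a 5000 meter limit means, the sketch below estimates the distance between two [longitude, latitude] pairs with the haversine formula. This is an illustration only: the function names, the second point and the mean Earth radius of 6,371,000 m are assumptions of this example, and nothing here claims to reproduce how MongoDB computes distances internally.

```javascript
// Assumed mean Earth radius in meters, used only for this illustration.
const EARTH_RADIUS_METERS = 6371000;

// Haversine great-circle distance between two [longitude, latitude] pairs.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
}

// Hypothetical check mirroring the idea of $maxDistance.
function isWithin(center, candidate, maxDistanceMeters) {
  return haversineMeters(center, candidate) <= maxDistanceMeters;
}

const queryPoint = [-122.158, 37.449];
const nearbyPoint = [-122.158, 37.459]; // 0.01 degrees of latitude further north
console.log(Math.round(haversineMeters(queryPoint, nearbyPoint)), "meters");
console.log(isWithin(queryPoint, nearbyPoint, 5000));
```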

Page 30: Indexing Strategies to Help You Scale

Text Search

Page 31: Indexing Strategies to Help You Scale

Text Indexes

• Use text indexes to support text search of string content in documents of a collection.

• Text indexes can include any field whose value is a string or an array of string elements.

• To perform queries that access the text index, use the $text query operator.

Page 32: Indexing Strategies to Help You Scale

Text Search

• Only one text index per collection
• $** operator to index all text fields in the collection
• Use weights to change importance of fields

> db.articles.ensureIndex( { title : "text", content : "text" } )

> db.articles.ensureIndex( { "$**" : "text" }, { name : "MyTextIndex" } )

> db.articles.ensureIndex(
    { "$**" : "text" },
    { weights : { "title" : 10, "content" : 5 }, name : "MyTextIndex" } )

Operators: $text, $search, $language, $meta

Page 33: Indexing Strategies to Help You Scale

Search

• Use the $text and $search operators to query
• Now returns a cursor
• $meta for scoring results

// Search articles collection
> db.articles.find( { $text : { $search : "MongoDB" } } )

> db.articles.find(
    { $text : { $search : "MongoDB" } },
    { score : { $meta : "textScore" }, _id : 0, title : 1 } )

{ "title" : "Intro to MongoDB", "score" : 0.75 }

Page 34: Indexing Strategies to Help You Scale

Scaling

Page 35: Indexing Strategies to Help You Scale

Working Set Exceeds Physical Memory

Page 36: Indexing Strategies to Help You Scale

When to consider Scaling?

• When a specific resource becomes a bottleneck on a machine or replica set
  – RAM
  – Disk IO
  – Storage
  – Concurrency

Page 37: Indexing Strategies to Help You Scale

Vertical Scalability (Scale Up)

Page 38: Indexing Strategies to Help You Scale

Horizontal Scalability (Scale Out)

Page 39: Indexing Strategies to Help You Scale

Sharding

• User defines shard key

• Shard key defines range of data

• Data is partitioned into shards according to shard key
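
The idea that a shard key defines ranges of data can be sketched as a lookup table. Everything below is hypothetical: the chunk boundaries, shard names and findShard function are invented for illustration and assume a simple string shard key. They are not how a real cluster stores or routes its metadata.

```javascript
// Hypothetical chunk table: each chunk owns shard-key values in [min, max).
const chunks = [
  { min: -Infinity, max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: Infinity, shard: "shard2" },
];

// Finds the shard whose range contains the given shard-key value.
function findShard(chunkTable, key) {
  const chunk = chunkTable.find(
    (c) =>
      (c.min === -Infinity || key >= c.min) &&
      (c.max === Infinity || key < c.max)
  );
  return chunk ? chunk.shard : null;
}

console.log(findShard(chunks, "alice")); // shard0
console.log(findShard(chunks, "joe"));   // shard1
console.log(findShard(chunks, "zoe"));   // shard2
```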

Page 40: Indexing Strategies to Help You Scale

Scalability

Auto-Sharding

• Increase capacity as you go

• Commodity and cloud architectures

• Improved operational simplicity and cost visibility

Page 41: Indexing Strategies to Help You Scale

Thank You