NoSQL & MongoDB..Part II
Arindam Chatterjee
Indexes in MongoDB
• Indexes support the efficient resolution of queries in MongoDB.
–Without indexes, MongoDB must scan every document in a collection to select
those documents that match the query statement.
–These collection scans are inefficient and require the mongod to process a large
volume of data for each operation.
• Indexes are special data structures that store a small portion of the
collection’s data set in an easy to traverse form.
–The index stores the value of a specific field or set of fields, ordered by the value of
the field.
• Indexes in MongoDB are similar to indexes in other database systems.
• MongoDB defines indexes at the collection level and supports indexes on
any field or sub-field of the documents in a MongoDB collection.
Indexes in MongoDB..2
• The following diagram illustrates a query that selects documents using an
index.
MongoDB narrows the query by scanning the range of documents with values of score
less than 30.
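The difference between a collection scan and an index scan can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the documents, the sorted array standing in for the index, and the helper names (collectionScan, indexScan, lowerBound) are all hypothetical and are not MongoDB internals.

```javascript
// Hypothetical in-memory stand-in for a collection; not MongoDB code.
const collection = [
  { _id: 1, score: 45 },
  { _id: 2, score: 18 },
  { _id: 3, score: 75 },
  { _id: 4, score: 27 },
  { _id: 5, score: 30 },
];

// Collection scan: every document is examined.
function collectionScan(docs, maxScore) {
  let examined = 0;
  const matches = docs.filter((doc) => {
    examined += 1;
    return doc.score < maxScore;
  });
  return { matches, examined };
}

// Toy "index": (score, position) pairs kept sorted by score.
const scoreIndex = collection
  .map((doc, position) => ({ key: doc.score, position }))
  .sort((a, b) => a.key - b.key);

// Binary search for the first index entry whose key is >= maxScore.
function lowerBound(index, maxScore) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < maxScore) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Index scan: only the documents in the matching key range are fetched.
function indexScan(docs, index, maxScore) {
  const end = lowerBound(index, maxScore);
  const matches = index.slice(0, end).map((entry) => docs[entry.position]);
  return { matches, examined: end };
}

const scanResult = collectionScan(collection, 30);
const indexResult = indexScan(collection, scoreIndex, 30);
console.log(scanResult.examined, indexResult.examined); // prints: 5 2
```

Both approaches return the same two documents (scores 18 and 27), but the index-based version fetches only the documents inside the matching range.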
Indexes in MongoDB..3
• MongoDB can use indexes to return documents sorted by the index key
directly from the index without requiring an additional sort phase.
Indexes in MongoDB..4
Index Types
• Default _id
–All MongoDB collections have an index on the _id field that exists by default. If
applications do not specify a value for _id the driver or the mongod will create an
_id field with an ObjectID value.
–The _id index is unique, and prevents clients from inserting two documents with
the same value for the _id field.
• Single Field
–MongoDB supports user-defined indexes on a single field of a document.
Example: Index on the score field (ascending)
Indexes in MongoDB..5
Index Types
• Compound Index
–These are user-defined indexes on multiple fields
Example: Diagram of a compound index on the userid field (ascending) and the
score field (descending). The index sorts first by the userid field and then by the
score field.
Indexes in MongoDB..6
Index Types
• Multikey Index
–MongoDB uses multikey indexes to index the content stored in arrays.
–If we index a field that holds an array value, MongoDB creates separate index
entries for every element of the array.
–These multikey indexes allow queries to select documents that contain arrays by
matching on element or elements of the arrays.
–MongoDB automatically determines whether to create a multikey index if the
indexed field contains an array value; we do not need to explicitly specify the
multikey type.
Indexes in MongoDB..7
Index Types
• Multikey Index: Illustration
Diagram of a multikey index on the addr.zip field.
The addr field contains an array of address
documents. The address documents contain the
zip field.
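As a rough sketch of the idea, the multikey behaviour described above can be imitated in plain JavaScript. The sample documents, the zip values and the buildMultikeyIndex helper are assumptions made for illustration; MongoDB's own index structures are not shown here.

```javascript
// Hypothetical documents; addr holds an array of address subdocuments.
const people = [
  { _id: 1, addr: [{ zip: "10036" }, { zip: "94301" }] },
  { _id: 2, addr: [{ zip: "94301" }] },
  { _id: 3, addr: [{ zip: "02115" }, { zip: "10036" }] },
];

// Build one index entry per array element: zip value -> list of _id values.
function buildMultikeyIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const address of doc.addr) {
      if (!index.has(address.zip)) index.set(address.zip, []);
      const ids = index.get(address.zip);
      // A document is listed once per key, even if the value repeats in its array.
      if (!ids.includes(doc._id)) ids.push(doc._id);
    }
  }
  return index;
}

const zipIndex = buildMultikeyIndex(people);
const idsFor10036 = zipIndex.get("10036");
console.log(idsFor10036); // prints: [ 1, 3 ]
```

A document with two addresses contributes two index entries, which is why a match on any single element is enough to find it.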
Indexes in MongoDB..8
Other Index Types
• Geospatial Index
– MongoDB provides two special indexes: 2d indexes that use planar geometry
when returning results and 2dsphere indexes that use spherical geometry to
return results.
• Text Index
– MongoDB provides a beta text index type that supports searching for string
content in a collection.
– These text indexes do not store language-specific stop words (e.g. “the”, “a”,
“or”) and stem the words in a collection to only store root words.
• Hashed Index
– To support hash based sharding, MongoDB provides a hashed index type,
which indexes the hash of the value of a field. These indexes have a more
random distribution of values along their range, but only support equality
matches and cannot support range-based queries.
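Why a hashed index serves equality matches but not range queries can be sketched as follows. This is a toy model only: the FNV-1a function below is an assumption chosen for brevity and is not the hash function MongoDB uses, and the users data and findByUserid helper are hypothetical.

```javascript
// Toy 32-bit FNV-1a hash; an assumption for illustration only.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i += 1) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical documents.
const users = [
  { _id: 1, userid: "alice" },
  { _id: 2, userid: "bob" },
  { _id: 3, userid: "carol" },
];

// The "hashed index" stores documents under the hash of the field value.
const hashedIndex = new Map();
for (const doc of users) {
  const key = fnv1a(doc.userid);
  if (!hashedIndex.has(key)) hashedIndex.set(key, []);
  hashedIndex.get(key).push(doc);
}

// Equality match: hash the query value, then confirm the real value
// so that a hash collision cannot produce a false match.
function findByUserid(value) {
  const bucket = hashedIndex.get(fnv1a(value)) || [];
  return bucket.filter((doc) => doc.userid === value);
}

const found = findByUserid("bob");
const missing = findByUserid("dave");
console.log(found.length, missing.length); // prints: 1 0
```

Because neighbouring values such as "alice" and "bob" hash to unrelated keys, walking the hashed keys in order says nothing about the order of the original values, so a range condition cannot be answered from this structure.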
Indexes in MongoDB..9
Explicit creation of Index
• Using ensureIndex() from the shell (from MongoDB 3.0 onward the method is named createIndex())
– The following creates an index on the phone-number field of the people collection
• db.people.ensureIndex( { "phone-number": 1 } )
– The following operation will create an index on the item, category, and price fields of the products collection
• db.products.ensureIndex( { item: 1, category: 1, price: 1 } )
– A unique constraint prevents applications from inserting documents that have duplicate values for the indexed fields. The following example creates a unique index on the "tax-id" field of the accounts collection to prevent storing multiple account records for the same legal entity
• db.accounts.ensureIndex( { "tax-id": 1 }, { unique: true } )
– ensureIndex() only creates an index if an index of the same specification does not already exist.
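The effect of a unique index can be imitated with a small in-memory sketch. The AccountsCollection class, the sample documents and the error message are hypothetical; a real server reports a duplicate key error in its own format.

```javascript
// Hypothetical stand-in for a collection with a unique index on "tax-id".
class AccountsCollection {
  constructor() {
    this.docs = [];
    this.uniqueTaxIds = new Set(); // plays the role of the unique index
  }

  insert(doc) {
    const key = doc["tax-id"];
    if (this.uniqueTaxIds.has(key)) {
      // Reject the write instead of storing a second record for the same key.
      throw new Error(`duplicate key: tax-id ${key}`);
    }
    this.uniqueTaxIds.add(key);
    this.docs.push(doc);
  }
}

const accounts = new AccountsCollection();
accounts.insert({ name: "Acme", "tax-id": "TX-100" });

let rejected = false;
try {
  accounts.insert({ name: "Acme Ltd", "tax-id": "TX-100" });
} catch (err) {
  rejected = true;
}
console.log(rejected, accounts.docs.length); // prints: true 1
```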
Indexes in MongoDB..10
Indexing Strategies
• Create Indexes to Support Your Queries
– An index supports a query when the index contains all the fields scanned by the query. Creating indexes that support your queries greatly improves query performance.
• Use Indexes to Sort Query Results
– To support efficient queries, use the strategies here when you specify the sequential order and sort order of index fields.
• Ensure Indexes Fit in RAM
– When your index fits in RAM, the system can avoid reading the index from disk and you get the fastest processing.
• Create Queries that Ensure Selectivity
– Selectivity is the ability of a query to narrow results using the index. Selectivity allows MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
Indexes in MongoDB..11
• Indexes to Support Queries
– For commonly issued queries, create indexes. If a query searches multiple
fields, create a compound index. Scanning an index is much faster than
scanning a collection.
– Consider a posts collection containing blog posts. If we regularly issue a
query that sorts on the author_name field, we can optimize the query by
creating an index on the author_name field
• db.posts.ensureIndex( { author_name : 1 } )
– If we regularly issue a query that sorts on the timestamp field, then we can
optimize the query by creating an index on the timestamp field
• db.posts.ensureIndex( { timestamp : 1 } )
– If we want to limit the results to reduce network load, we can use limit()
• db.posts.find().sort( { timestamp : -1 } ).limit(10)
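How an ascending index on timestamp can serve a descending sort with a limit, without a separate sort phase, can be sketched like this. The posts data, the numeric timestamps and the newestFirst helper are assumptions for illustration only.

```javascript
// Hypothetical posts; timestamps are plain numbers for brevity.
// (i * 7) % 25 visits every value 0..24 once, so timestamps are unique and unordered.
const posts = [];
for (let i = 1; i <= 25; i += 1) {
  posts.push({ _id: i, timestamp: 1000 + ((i * 7) % 25) });
}

// Toy ascending index on timestamp: (key, position) pairs sorted by key.
const timestampIndex = posts
  .map((doc, position) => ({ key: doc.timestamp, position }))
  .sort((a, b) => a.key - b.key);

// Walk the ascending index backwards and stop after `limit` entries.
// No sorting happens at query time; the order comes from the index.
function newestFirst(docs, index, limit) {
  const result = [];
  for (let i = index.length - 1; i >= 0 && result.length < limit; i -= 1) {
    result.push(docs[index[i].position]);
  }
  return result;
}

const latest = newestFirst(posts, timestampIndex, 10);
console.log(latest.map((doc) => doc.timestamp));
```

Only ten index entries are visited, however large the collection is, which is the benefit of combining an indexed sort with limit().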
Indexes in MongoDB..12
• Index Administration
– Detailed information about indexes is stored in the system.indexes collection of
each database.
– system.indexes is a reserved collection, so we cannot insert documents into it
or remove documents from it. We can manipulate its documents only through
ensureIndex and the dropIndexes database command.
• Building Indexes in the Background
– Building indexes is time-consuming and resource-intensive. Using the
{"background" : true} option builds the index in the background, while handling
incoming requests.
• > db.people.ensureIndex({"username" : 1}, {"background" : true})
– If we do not include the “background” option, the database will block all other
requests while the index is being built.
– Creating indexes on existing documents is faster than creating the index first
and then inserting all of the documents.
Indexes in MongoDB..13
• Dos and Don'ts
– Create indexes only on the keys required by the queries
• Indexes create additional overhead on the database
• Insert, update and delete operations become slower with too many indexes
– Index direction is important if there is more than one key
• Indexes on {"username" : 1, "age" : -1} and {"username" : 1, "age" : 1} order their entries differently and so support different sort orders
– There is a built-in maximum of 64 indexes per collection, which is more than
almost any application should need.
– Delete Index with “dropIndexes” if it is not required
– Sometimes the most efficient solution is not to use an index at all. As a rule
of thumb, if a query returns half or more of the collection, it is more efficient
for the database to scan the whole collection than to look up the index and
then fetch the document for almost every entry.
Exercise 2
• Insert records in collection userdetail
– {"username" : "smith", "age" : 48, "user_id" : 0 }
– {"username" : "smith", "age" : 30, "user_id" : 1 }
– {"username" : "john", "age" : 36, "user_id" : 2 }
– {"username" : "john", "age" : 18, "user_id" : 3 }
– {"username" : "joe", "age" : 36, "user_id" : 4 }
– {"username" : "john", "age" : 7, "user_id" : 5 }
– {"username" : "simon", "age" : 3, "user_id" : 6 }
– {"username" : "joe", "age" : 27, "user_id" : 7 }
– {"username" : "jacob", "age" : 17, "user_id" : 8 }
– {"username" : "sally", "age" : 52, "user_id" : 9 }
– {"username" : "simon", "age" : 59, "user_id" : 10 }
• Run the ensureIndex operation
– db.userdetail.ensureIndex({"username" : 1, "age" : -1})
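The order in which this compound index would arrange the exercise records can be previewed with a small sketch in plain JavaScript. The comparator functions are illustrative stand-ins for the index ordering, not MongoDB code; only the data comes from the exercise.

```javascript
// The exercise records, as plain objects.
const userdetail = [
  { username: "smith", age: 48, user_id: 0 },
  { username: "smith", age: 30, user_id: 1 },
  { username: "john", age: 36, user_id: 2 },
  { username: "john", age: 18, user_id: 3 },
  { username: "joe", age: 36, user_id: 4 },
  { username: "john", age: 7, user_id: 5 },
  { username: "simon", age: 3, user_id: 6 },
  { username: "joe", age: 27, user_id: 7 },
  { username: "jacob", age: 17, user_id: 8 },
  { username: "sally", age: 52, user_id: 9 },
  { username: "simon", age: 59, user_id: 10 },
];

// Ordering for {"username" : 1, "age" : -1}: username ascending, then age descending.
function byUsernameAscAgeDesc(a, b) {
  if (a.username !== b.username) return a.username < b.username ? -1 : 1;
  return b.age - a.age;
}

// Ordering for {"username" : 1, "age" : 1}: username ascending, then age ascending.
function byUsernameAscAgeAsc(a, b) {
  if (a.username !== b.username) return a.username < b.username ? -1 : 1;
  return a.age - b.age;
}

const descAgeOrder = [...userdetail].sort(byUsernameAscAgeDesc).map((d) => d.user_id);
const ascAgeOrder = [...userdetail].sort(byUsernameAscAgeAsc).map((d) => d.user_id);

console.log(descAgeOrder.join(",")); // prints: 8,4,7,2,3,5,9,10,6,0,1
console.log(ascAgeOrder.join(","));  // prints: 8,7,4,5,3,2,9,6,10,1,0
```

Within each username the two orderings list ages in opposite directions, which is why the direction of the second key matters for queries that sort on both fields.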
Data Modelling in MongoDB
• MongoDB has a flexible schema. Unlike in a relational database, where a table's
schema must be declared first, we need not declare a schema before inserting data.
• MongoDB’s collections do not enforce document structure
• There are 2 ways of mapping Relationships
–References
–Embedded Documents
Example: References
• Both the “contact” and “access” documents contain a reference to the “user” document.
• These are normalized data models
Data Modelling in MongoDB..2
Example: Embedded Documents
• “contact” and “access” are subdocuments embedded in the main document. This is a “denormalized” data model
Data Modelling in MongoDB..3
References vs. Embedded Documents
References: Used when
• embedding would result in
duplication of data but would not
provide sufficient read
performance advantages to
outweigh the implications of the
duplication.
• to represent more complex many-
to-many relationships.
• to model large hierarchical data
sets.
Embedded documents: Used when
• we have “contains” relationships
between entities.
• we have one-to-many relationships
between entities. In these
relationships the “many” or child
documents always appear with or
are viewed in the context of the
“one” or parent documents.
• We need applications to store
related pieces of information in the
same database record.
Data Modelling in MongoDB..4
One to many relationships : Example where Embedding is advantageous
Using References
{
  _id: "chat",
  name: "ABC Chat"
}
{
  patron_id: "chat",
  street: "10 Simla Street",
  city: "Kolkata",
  zip: 700006
}
{
  patron_id: "chat",
  street: "132 Lanka Street",
  city: "Mumbai",
  zip: 400032
}
Issue with above: If the application frequently retrieves the address data with the name information, then your application needs to issue multiple queries to resolve the references
Using Embedded documents
{
  _id: "chat",
  name: "ABC Chat",
  addresses: [
    {
      street: "10 Simla Street",
      city: "Kolkata",
      zip: 700006
    },
    {
      street: "132 Lanka Street",
      city: "Mumbai",
      zip: 400032
    }
  ]
}
With the embedded data model, the application can retrieve the complete patron information with one query.
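The difference in the number of queries can be made concrete with a small in-memory sketch. The find helper and its query counter are hypothetical stand-ins for round trips to the database; the patron and address data are taken from the example above.

```javascript
// Counts how many "queries" the application issues (hypothetical helper).
let queryCount = 0;
function find(collection, predicate) {
  queryCount += 1;
  return collection.filter(predicate);
}

// Referenced (normalized) model: two collections.
const patrons = [{ _id: "chat", name: "ABC Chat" }];
const addresses = [
  { patron_id: "chat", street: "10 Simla Street", city: "Kolkata", zip: 700006 },
  { patron_id: "chat", street: "132 Lanka Street", city: "Mumbai", zip: 400032 },
];

function loadReferenced(id) {
  const [patron] = find(patrons, (p) => p._id === id);
  const linked = find(addresses, (a) => a.patron_id === id);
  // Drop the reference field when assembling the combined view.
  return { ...patron, addresses: linked.map(({ patron_id, ...rest }) => rest) };
}

// Embedded (denormalized) model: one collection.
const patronsEmbedded = [
  {
    _id: "chat",
    name: "ABC Chat",
    addresses: [
      { street: "10 Simla Street", city: "Kolkata", zip: 700006 },
      { street: "132 Lanka Street", city: "Mumbai", zip: 400032 },
    ],
  },
];

function loadEmbedded(id) {
  const [patron] = find(patronsEmbedded, (p) => p._id === id);
  return patron;
}

queryCount = 0;
const viaReferences = loadReferenced("chat");
const referencedQueries = queryCount;

queryCount = 0;
const viaEmbedding = loadEmbedded("chat");
const embeddedQueries = queryCount;

console.log(referencedQueries, embeddedQueries); // prints: 2 1
```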
Data Modelling in MongoDB..5
One to many relationships : Example where referencing is advantageous
Using Embedded documents
{
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    location: "CA"
  }
}
{
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    location: "CA"
  }
}
Issue with above: Embedding leads to repetition of publisher data.
Using Reference
{
  _id: "oreilly",
  name: "O'Reilly Media",
  location: "CA"
}
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}
{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher_id: "oreilly"
}
Publisher information is kept separately in the above example to avoid repetition.
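One practical consequence of the referenced model is that publisher details are changed in a single place. The sketch below imitates this in memory; the withPublisher helper and the new location value "NY" are hypothetical and only serve the illustration.

```javascript
// Referenced model from the example, reduced to the relevant fields.
const publishers = [{ _id: "oreilly", name: "O'Reilly Media", location: "CA" }];
const books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "50 Tips and Tricks for MongoDB Developer", publisher_id: "oreilly" },
];

// Resolve the reference: look up the publisher document for a book.
function withPublisher(book) {
  const publisher = publishers.find((p) => p._id === book.publisher_id);
  return { title: book.title, publisher };
}

// One update to the publisher document (hypothetical new value)...
publishers[0].location = "NY";

// ...is visible through every book that references it.
const resolved = books.map(withPublisher);
console.log(resolved.map((entry) => entry.publisher.location)); // prints: [ 'NY', 'NY' ]
```

With embedded publisher subdocuments the same change would have to be applied to every book document separately.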
Data Modelling in MongoDB..6
Tree structure with parent references
Data Modelling in MongoDB..7
• The following lines of code describe the tree structure in the previous slide
– db.categories.insert( { _id: "MongoDB", parent: "Databases" } )
– db.categories.insert( { _id: "dbm", parent: "Databases" } )
– db.categories.insert( { _id: "Databases", parent: "Programming" } )
– db.categories.insert( { _id: "Languages", parent: "Programming" } )
– db.categories.insert( { _id: "Programming", parent: "Books" } )
– db.categories.insert( { _id: "Books", parent: null } )
• The query to retrieve the parent of a node
– db.categories.findOne( { _id: "MongoDB" } ).parent;
• Query by the parent field to find its immediate children nodes
– db.categories.find( { parent: "Databases" } );
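With parent references, finding all ancestors of a node means following the parent field one step at a time. A minimal in-memory sketch, assuming the same six category documents; the ancestors and childrenOf helpers are hypothetical, and against a real database each step of the walk would be a separate query.

```javascript
// The category documents from the slide, as plain objects.
const categories = [
  { _id: "MongoDB", parent: "Databases" },
  { _id: "dbm", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null },
];

// Lookup by _id, standing in for findOne( { _id: ... } ).
const byId = new Map(categories.map((category) => [category._id, category]));

// Follow parent references until the root (parent: null) is reached.
function ancestors(id) {
  const path = [];
  let current = byId.get(id);
  while (current && current.parent !== null) {
    path.push(current.parent);
    current = byId.get(current.parent);
  }
  return path;
}

// Immediate children: all nodes whose parent field equals the given id.
function childrenOf(id) {
  return categories.filter((c) => c.parent === id).map((c) => c._id);
}

console.log(ancestors("MongoDB"));    // prints: [ 'Databases', 'Programming', 'Books' ]
console.log(childrenOf("Databases")); // prints: [ 'MongoDB', 'dbm' ]
```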
Modelling Tree structure with Parent reference
Data Modelling in MongoDB..8
• The following lines of code describe the same tree structure
– db.categories.insert( { _id: "MongoDB", children: [] } );
– db.categories.insert( { _id: "dbm", children: [] } );
– db.categories.insert( { _id: "Databases", children: [ "MongoDB", "dbm" ] } );
– db.categories.insert( { _id: "Languages", children: [] } );
– db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } );
– db.categories.insert( { _id: "Books", children: [ "Programming" ] } );
• The query to retrieve the immediate children of a node
– db.categories.findOne( { _id: "Databases" } ).children;
• Query by the children field to find the parent of a node
– db.categories.find( { children: "MongoDB" } );
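With child references, the natural traversal runs downward. The sketch below collects all descendants of a node and finds a node's parent, again in memory; the descendants and parentsOf helpers are hypothetical names introduced for illustration.

```javascript
// The category documents from the slide, as plain objects.
const categories = [
  { _id: "MongoDB", children: [] },
  { _id: "dbm", children: [] },
  { _id: "Databases", children: ["MongoDB", "dbm"] },
  { _id: "Languages", children: [] },
  { _id: "Programming", children: ["Databases", "Languages"] },
  { _id: "Books", children: ["Programming"] },
];

const byId = new Map(categories.map((category) => [category._id, category]));

// Depth-first walk over the children arrays.
function descendants(id) {
  const result = [];
  const node = byId.get(id);
  if (!node) return result;
  for (const child of node.children) {
    result.push(child, ...descendants(child));
  }
  return result;
}

// Parent lookup: nodes whose children array contains the given id,
// the in-memory analogue of find( { children: id } ).
function parentsOf(id) {
  return categories.filter((c) => c.children.includes(id)).map((c) => c._id);
}

console.log(descendants("Programming")); // prints: [ 'Databases', 'MongoDB', 'dbm', 'Languages' ]
console.log(parentsOf("MongoDB"));       // prints: [ 'Databases' ]
```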
Modelling Tree structure with Child reference
Data Modelling in MongoDB..8
• Example (Online purchase portal):
– Step I: Insert data in a collection called “book” including the number of available copies
– Step II: Check if the book is available during checkout
Code
– Step I:
db.book.insert ({
_id: 123456789,
title: "MongoDB: The Definitive Guide",
author: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly",
available: 3,
checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]
});
Data Modelling for “Atomic” operations
Data Modelling in MongoDB..9
Code
– Step II
db.book.findAndModify ( {
query: {
_id: 123456789,
available: { $gt: 0 }
},
update: {
$inc: { available: -1 },
$push: { checkout: { by: "abc", date: new Date() } }
}
} );
– In the above example, the db.collection.findAndModify() method atomically checks whether a book is available for checkout and, if so, updates the document with the new checkout information.
– Embedding the available field and the checkout field within the same document ensures that the updates to these fields stay in sync.
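The check-then-update logic can be imitated in memory to see how the available counter and the checkout list move together. This is a sketch only: checkoutBook is a hypothetical helper, it returns a boolean rather than a document, and single-threaded JavaScript makes the step trivially indivisible, whereas in MongoDB the server provides the atomicity for a single document.

```javascript
// In-memory copy of the relevant fields of the book document.
const bookCollection = [
  {
    _id: 123456789,
    available: 3,
    checkout: [{ by: "joe", date: new Date("2012-10-15") }],
  },
];

// Query part: match on _id and available > 0.
// Update part: decrement available and push a checkout entry.
// Both fields live in one document, so they change together or not at all.
function checkoutBook(collection, id, user) {
  const doc = collection.find((d) => d._id === id && d.available > 0);
  if (!doc) return false; // no copy left, nothing is modified
  doc.available -= 1;
  doc.checkout.push({ by: user, date: new Date() });
  return true;
}

const outcomes = ["abc", "def", "ghi", "jkl"].map((user) =>
  checkoutBook(bookCollection, 123456789, user)
);
console.log(outcomes);                    // prints: [ true, true, true, false ]
console.log(bookCollection[0].available); // prints: 0
```

The fourth attempt fails because the query condition on available no longer matches, so the counter never drops below zero.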
Data Modelling for “Atomic” operations
Data Modelling in MongoDB..10
Example: Perform a keyword based search in a collection “volumes”
– Step I: Insert data in a collection “volumes”
db.volumes.insert ({
title : "Moby-Dick" ,
author : "Herman Melville" ,
published : 1851 ,
ISBN : "0451526996" ,
topics : [ "whaling" , "allegory" , "revenge" , "American" ,
"novel" , "nautical" , "voyage" , "Cape Cod" ]
});
In the above example, several topics are included on which we can perform keyword search
– Step II: create a multi-key index on the topics array
db.volumes.ensureIndex( { topics: 1 } )
– Step III: Search based on keyword “voyage”
• db.volumes.findOne( { topics : "voyage" }, { title: 1 } )
Keyword based Search
Exercise
• Create a collection named product meant for albums. The album can have several
product types including Audio Album and Movie.
• Record of Audio album can be created with the following attributes
– Record 1 (music album): sku (character, unique identifier), type "Audio Album", title "Remembering Manna De", description "By Music lovers", physical_description (weight, width, height, depth), pricing (list, retail, savings, pct_savings), details (title, artist, genre ("bengali modern", "bengali film"), tracks ("birth", "childhood", "growing up", "end"))
– Record 2 (movie) with similar details and description pertaining to movie (e.g. director, writer, music director, actors)
• Assignment
– Write a query to return all products with a discount>10%
– Write a query which will return the documents for the albums of a specific genre, sorted in reverse chronological order
– Write a query which selects films that a particular actor starred in, sorted by issue date