Introduction to Sharding with MongoDB
Transcript of Introduction to Sharding with MongoDB
What is it about?
• It’s not about sharding, it’s about resharding
• What can sharding do for you
• What you must do first to obtain it
• Use case
Sharding Basics
• To maintain the impression that things look like this
[Figure: search criteria, answered either using an index or by scanning the collection]
Sharding Basics (cont)
• When they actually are like this
[Figure: search criteria, answered either using an index or by scanning the collection]
A Detail
• Partitioning a collection is relatively easy
• A bit of application logic to find a partition and that’s it
• Or is it?
The Certainty
• Things change
  – You get noticed, and your query volume grows
  – You build new functionality, and your access pattern changes
  – You buy new machines, and your fixed partitioning scheme goes out the window
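The "bit of application logic" from the previous slide is often just a modulo over the key, and the last bullet above is easy to quantify with it. A minimal sketch, assuming integer user ids and a hypothetical `partition_for` helper; this is application-side code, not anything MongoDB provides:

```python
def partition_for(user_id: int, num_partitions: int) -> int:
    """Hypothetical application-side routing: a fixed modulo scheme."""
    return user_id % num_partitions


def fraction_relocated(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose partition changes when the partition count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if partition_for(k, old_n) != partition_for(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    # Going from 4 to 5 partitions (one new machine) relocates most keys.
    print(f"{fraction_relocated(range(10_000), 4, 5):.0%} of keys move")  # prints "80% of keys move"
```

Only keys with the same remainder modulo 4 and modulo 5 keep their partition, so under this scheme adding one machine means the application has to migrate most of the data itself.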
Insurance
• Sharding is not about partitioning. It’s about repartitioning without you having to ask
  – Adding or removing shards
  – Splitting and moving chunks*
  – The logic for finding a chunk is MongoDB’s, not the application’s
* Chunk: an (arbitrary) unit of data that moves between shards as a whole
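A chunk is a range of shard-key values, so finding a key's chunk is a lookup in an ordered table of range boundaries. The toy model below assumes numeric shard keys and a hypothetical `ChunkMap` class; MongoDB's real chunk bookkeeping (kept on its config servers) is more involved. In this toy, split and move only touch the table, whereas a real move also copies the chunk's documents to the new shard:

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ChunkMap:
    """Toy chunk table: contiguous shard-key ranges, each owned by one shard.

    bounds[i] is the inclusive lower bound of chunk i, which ends where
    chunk i + 1 begins. The first bound is -inf, so every key has a chunk.
    """

    bounds: list = field(default_factory=lambda: [float("-inf")])
    owners: list = field(default_factory=lambda: ["shard0"])

    def find_chunk(self, key) -> int:
        # Index of the rightmost chunk whose lower bound is <= key.
        return bisect.bisect_right(self.bounds, key) - 1

    def shard_for(self, key) -> str:
        return self.owners[self.find_chunk(key)]

    def split(self, at) -> None:
        # Split the chunk containing `at` into [lower, at) and [at, upper).
        # Both halves stay on the same shard.
        i = self.find_chunk(at)
        if self.bounds[i] == at:
            return  # `at` is already a chunk boundary
        self.bounds.insert(i + 1, at)
        self.owners.insert(i + 1, self.owners[i])

    def move(self, key, to_shard: str) -> None:
        # Reassign the whole chunk containing `key` to another shard.
        self.owners[self.find_chunk(key)] = to_shard


if __name__ == "__main__":
    chunks = ChunkMap()
    chunks.split(1000)           # chunks: [-inf, 1000) and [1000, +inf)
    chunks.move(1500, "shard1")  # the upper chunk changes owner
    print(chunks.shard_for(42), chunks.shard_for(1000))  # prints "shard0 shard1"
```

The application only ever calls something like `shard_for`; which shard answers can change underneath it, which is the point of the slide.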
Starting to Shard
• You can load data into a sharded collection or shard an existing one*
  – Automatic range partitioning will take place
  – Data placement will be taken care of
• By default, it will be sharded over _id, but you can specify a different sharding key
  – An index will be built automatically over that key
* 1.6
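Automatic range partitioning during a load can be pictured as a collection that splits any chunk once it grows too large. A sketch under stated assumptions: unique integer keys, a document-count threshold, and a hypothetical `AutoSplittingCollection` class. MongoDB's own splitting is driven by chunk data size, and placing the resulting chunks on shards is handled separately by migrations, which this sketch leaves out:

```python
import bisect


class AutoSplittingCollection:
    """Toy model of automatic range partitioning, not MongoDB's algorithm.

    Assumes unique integer shard keys and max_docs >= 1. A chunk holding more
    than `max_docs` keys is split at its median key, so chunk sizes stay bounded.
    """

    def __init__(self, max_docs: int):
        self.max_docs = max_docs
        self.lower_bounds = [float("-inf")]  # inclusive lower bound of each chunk
        self.chunks = [[]]                   # sorted keys stored in each chunk

    def insert(self, key: int) -> None:
        i = bisect.bisect_right(self.lower_bounds, key) - 1
        bisect.insort(self.chunks[i], key)
        if len(self.chunks[i]) > self.max_docs:
            self._split(i)

    def _split(self, i: int) -> None:
        keys = self.chunks[i]
        mid = len(keys) // 2
        # Left half keeps [lower, keys[mid]); right half starts at keys[mid].
        self.chunks[i:i + 1] = [keys[:mid], keys[mid:]]
        self.lower_bounds.insert(i + 1, keys[mid])


if __name__ == "__main__":
    coll = AutoSplittingCollection(max_docs=100)
    for k in range(1000):
        coll.insert((k * 7919) % 1000)  # a shuffled load order of 0..999
    print(len(coll.chunks), max(len(c) for c in coll.chunks))
```

Nothing in the load loop mentions chunks; partitioning happens as a side effect of inserting, which is what the slide promises for a sharded collection.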
On Writes
• Write capacity becomes the sum of shards capacity
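"The sum of shards capacity" is the best case: it holds when writes are spread across shards in proportion to what each can absorb. A back-of-the-envelope model, assuming a known, fixed write capacity per shard (the numbers below are hypothetical):

```python
def aggregate_write_capacity(capacities, traffic_share):
    """Highest total write rate the cluster sustains before some shard saturates.

    capacities[i]: writes/s shard i can absorb (assumed known and fixed).
    traffic_share[i]: fraction of all writes routed to shard i (sums to 1).
    At total rate T, shard i sees traffic_share[i] * T writes/s, so T is
    capped by the tightest ratio capacities[i] / traffic_share[i].
    """
    return min(c / s for c, s in zip(capacities, traffic_share) if s > 0)


if __name__ == "__main__":
    caps = [1000, 1000, 2000]
    print(aggregate_write_capacity(caps, [0.25, 0.25, 0.5]))  # 4000.0, the sum of capacities
    print(aggregate_write_capacity(caps, [1.0, 0.0, 0.0]))    # 1000.0, one hot shard
```

When the shard key sends every write to one shard, the cluster's write capacity collapses to that shard's own.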
A digression
• A shard can actually live in a group of replicated servers
• Fault tolerance is obtained that way
• Our focus here is incremental scalability and aggregate performance
On Reads, I
• Lookup over the shard key or a prefix thereof
• Sharding at its best!
  – Search criteria can be satisfied by a single chunk
  – Lookup inside the chunk uses the index
  – May or may not need to access the collection
• Example:
  – Shard by user_id, return the user’s name
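Category I queries can be routed by comparing key ranges alone. A sketch assuming a hypothetical compound shard key (user_id, timestamp) with integer user ids and chunk lower bounds held as Python tuples; it returns every chunk a prefix lookup on user_id may need:

```python
import bisect


def chunks_for_prefix(lower_bounds, user_id: int):
    """Indexes of chunks that may hold keys (user_id, <anything>).

    lower_bounds[i] is the inclusive lower bound of chunk i, as a tuple, sorted
    ascending; the first bound is () so it sorts before every key. With integer
    user_ids, every key with this prefix lies in [(user_id,), (user_id + 1,)).
    """
    first = bisect.bisect_right(lower_bounds, (user_id,)) - 1
    last = bisect.bisect_left(lower_bounds, (user_id + 1,)) - 1
    return list(range(first, last + 1))


if __name__ == "__main__":
    # Hypothetical split points on a (user_id, timestamp) shard key.
    bounds = [(), (100, 0), (200, 0), (200, 500), (300, 0)]
    print(chunks_for_prefix(bounds, 150))  # [1]: one chunk, the best case
    print(chunks_for_prefix(bounds, 200))  # [1, 2, 3]: user 200's range crosses split points
```

Either way the answer comes from the chunk table alone, without asking any shard.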
On Reads, II
• Lookup over a secondary index
• Not bad: results from the shards are merged
• Example: { country: "UK" } with a secondary index over country
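For a query such as { country: "UK" }, the shard key says nothing about country, so every shard has to be asked and the partial answers merged. A toy sketch of that scatter-gather step, assuming each shard returns results sorted on a requested field; the function and data are hypothetical:

```python
import heapq


def scatter_gather(shards, predicate, sort_key):
    """Ask every shard, then merge the sorted partial results.

    Each shard is a list of dicts standing in for that shard's data; a real
    shard would answer the filter from its own secondary index.
    """
    partials = [sorted((d for d in shard if predicate(d)), key=sort_key) for shard in shards]
    return list(heapq.merge(*partials, key=sort_key))


if __name__ == "__main__":
    shards = [
        [{"name": "ann", "country": "UK"}, {"name": "bob", "country": "FR"}],
        [{"name": "cat", "country": "UK"}],
        [{"name": "abe", "country": "UK"}],
    ]
    uk = scatter_gather(shards, lambda d: d["country"] == "UK", lambda d: d["name"])
    print([d["name"] for d in uk])  # ['abe', 'ann', 'cat']
```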
On Reads, III
• Lookups where indexes won’t help
• Traversing shards sequentially or in parallel?*
* 1.6
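When no index helps, each shard must scan its whole portion of the collection, and the question on the slide is whether to visit shards one after another or all at once. A toy sketch with shards modeled as in-memory lists and hypothetical helper names; both strategies return the same documents in the same order:

```python
from concurrent.futures import ThreadPoolExecutor


def scan_shard(shard, predicate):
    # No usable index: examine every document on this shard.
    return [doc for doc in shard if predicate(doc)]


def scan_sequential(shards, predicate):
    results = []
    for shard in shards:
        results.extend(scan_shard(shard, predicate))
    return results


def scan_parallel(shards, predicate, max_workers=None):
    # Threads here only mirror the structure; the latency benefit comes when
    # shards are separate servers scanning concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_shard = pool.map(lambda s: scan_shard(s, predicate), shards)
        return [doc for part in per_shard for doc in part]
```

Sequential traversal costs roughly the sum of the shards' scan times; parallel traversal costs roughly the slowest shard's, at the price of loading every shard at once.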
The Sharding Key
• Choose wisely; you’re marrying it
• Often, you’re better off defining a unique key that stores data the application wants to query
• (The internally generated _id is really not it)
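One way to see why the internally generated _id is a poor sharding key for an application that queries by user: when _id follows insertion order (MongoDB's ObjectId begins with a timestamp), one user's documents end up spread over every range of _id values. A toy sketch, assuming equal-sized range placement and a hypothetical `shards_holding` helper; it counts only where a user's data lives, and a query on user_id alone would not give the router a shard key to target with anyway:

```python
def shards_holding(docs, shard_key, num_shards, field, value):
    """Number of shards storing documents where doc[field] == value.

    Toy placement: docs are sorted on shard_key and cut into num_shards
    contiguous, equal-sized ranges (ceiling division), one range per shard.
    """
    ordered = sorted(docs, key=lambda d: d[shard_key])
    per_shard = -(-len(ordered) // num_shards)  # ceiling division
    return len({i // per_shard for i, d in enumerate(ordered) if d[field] == value})


if __name__ == "__main__":
    # 12 documents inserted over time: _id grows with insertion order and
    # three users take turns creating them.
    docs = [{"_id": i, "user_id": i % 3} for i in range(12)]
    print(shards_holding(docs, "_id", 3, "user_id", 0))      # 3: user 0 is everywhere
    print(shards_holding(docs, "user_id", 3, "user_id", 0))  # 1: user 0 lives on one shard
```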
Mind Your Queries
• Sure, dynamic partitioning is automatic
• But, ultimately, the system’s response time and scalability depend on how your application queries it
• If your most important queries fall into category I, the remaining ones into category II, and few queries that matter into category III, you’ll be fine
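The three categories from the "On Reads" slides can serve as a rough checklist for a workload. A toy classifier, assuming a query is summarized by the set of fields it constrains; the function name and rule are illustrative simplifications:

```python
def query_category(query_fields, shard_key, indexed_fields):
    """Classify a query as in the "On Reads" slides (toy rule).

    I:   the query constrains the leading field of the shard key
    II:  otherwise, it constrains a field with a secondary index
         (every shard is asked and results are merged)
    III: no index helps, so every shard must be scanned
    """
    if shard_key[0] in query_fields:
        return "I"
    if any(f in indexed_fields for f in query_fields):
        return "II"
    return "III"


if __name__ == "__main__":
    key, indexes = ("user_id",), {"country"}
    for q in ({"user_id"}, {"country"}, {"name"}):
        print(sorted(q), query_category(q, key, indexes))
```

Running the important queries of an application through a check like this, before choosing the sharding key, is one way to apply the slide's advice.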
Pick Your Indexes
• MongoDB supports secondary indexes on sharded collections
• Critical queries that are not served by the sharding index can be helped by a secondary index
• Sometimes, you can’t help them all…
• Index selection is a trade-off between querying and updates/insertions/deletions
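The trade-off on this slide shows up as bookkeeping: every secondary index makes lookups on its field direct, and every insert or delete pays to keep it current. A toy in-memory collection illustrating that, with hypothetical names and no relation to MongoDB's storage engine:

```python
from collections import defaultdict


class ToyCollection:
    """Documents plus optional secondary indexes (field -> value -> set of ids)."""

    def __init__(self, indexed_fields):
        self.docs = {}
        self.indexes = {f: defaultdict(set) for f in indexed_fields}
        self.index_writes = 0  # extra work paid by inserts and deletes

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            index[doc.get(field)].add(doc_id)
            self.index_writes += 1

    def delete(self, doc_id):
        doc = self.docs.pop(doc_id)
        for field, index in self.indexes.items():
            index[doc.get(field)].discard(doc_id)
            self.index_writes += 1

    def find(self, field, value):
        # Indexed field: direct lookup. Otherwise: full scan of the collection.
        if field in self.indexes:
            return sorted(self.indexes[field].get(value, ()))
        return sorted(i for i, d in self.docs.items() if d.get(field) == value)
```

With two indexes, each insert or delete here costs two index writes on top of the document itself; dropping an index removes that cost but turns lookups on its field into scans.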
Bit.ly History
• Users create shortened URLs
• Sharding is used to store all past URLs of a user
  – Sharding key: user_id
  – Indexes: timestamp (desc)
• Queries:
  – Shortened URLs by a given user
  – Last n URLs by any user
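The two Bit.ly queries land in different categories. A toy model, assuming a hypothetical `UrlStore` with a modulo stand-in for the chunk lookup, and reading the second query as "the n most recent URLs across all users":

```python
import heapq
from collections import defaultdict


class UrlStore:
    """Toy model of the use case: (timestamp, url) entries sharded by user_id."""

    def __init__(self, num_shards: int):
        # One dict per shard: user_id -> list of (timestamp, url).
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def _shard(self, user_id: int):
        # Hypothetical stand-in for the chunk lookup on the user_id shard key.
        return self.shards[user_id % len(self.shards)]

    def add(self, user_id: int, timestamp: int, url: str) -> None:
        self._shard(user_id)[user_id].append((timestamp, url))

    def urls_of(self, user_id: int):
        # Query 1 carries the shard key, so a single shard answers it.
        entries = self._shard(user_id).get(user_id, [])
        return [url for _, url in sorted(entries, reverse=True)]

    def latest(self, n: int):
        # Query 2: each shard contributes only its own n newest entries (a
        # descending timestamp index would serve this on a real shard), then
        # the partial lists are merged and cut to n.
        partials = []
        for shard in self.shards:
            entries = (e for user_entries in shard.values() for e in user_entries)
            partials.extend(heapq.nlargest(n, entries))
        return [url for _, url in heapq.nlargest(n, partials)]
```

Under that reading, the first query is category I, and the second is a merge across shards in which each shard needs to return at most n entries.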
Take Away
• Picture to keep in mind
Questions?
www.mongodb.org