Introduction to Sharding with MongoDB

Introduction to MongoDB Sharding

Alberto Lerner, Software Engineer, 10Gen

What is it about?

• It’s not about sharding, it’s resharding
• What can sharding do for you
• What you must do first to obtain it
• Use case

Sharding Basics

• To maintain the impression that things look like this

[Figure labels: search criteria; using an index; scanning the collection]

Sharding Basics (cont)

• When they actually are like this

[Figure labels: search criteria; using an index; scanning the collection]

A Detail

• Partitioning a collection is relatively easy
• A bit of application logic to find a partition and that’s it
• Or is it?

The Certainty

• Things change
  – You get spotted, your querying volume grows
  – You build new functionality, your access pattern changes
  – You buy new machines, your fixed partitioning scheme goes out the window

Insurance

• Sharding is not about partitioning. It’s about repartitioning without you bothering to ask
  – Adding or removing shards
  – Splitting and moving chunks*
  – Logic of finding a chunk is MongoDB’s, not the application’s

* Chunk: an (arbitrary) unit that can move at once between shards
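
The bookkeeping behind splitting, moving, and finding chunks can be pictured with a small in-memory model. This is a sketch only, not MongoDB's implementation: the `ChunkTable` class, its method names, and the shard names are hypothetical, and chunks are modeled as half-open ranges of shard-key values.

```python
import bisect


class ChunkTable:
    """Toy routing table: sorted chunk lower bounds mapped to shard names.

    Hypothetical model for illustration. Chunk i covers
    [bounds[i], bounds[i + 1]); the last chunk is unbounded above.
    """

    def __init__(self, first_shard, min_key=float("-inf")):
        self.bounds = [min_key]      # lower bound of each chunk, kept sorted
        self.owners = [first_shard]  # shard that currently holds each chunk

    def _index_of(self, key):
        # Position of the chunk whose range contains `key`.
        return bisect.bisect_right(self.bounds, key) - 1

    def find(self, key):
        """Return the shard holding the chunk that contains `key`."""
        return self.owners[self._index_of(key)]

    def split(self, at):
        """Split the chunk containing `at` in two; both halves stay put."""
        i = self._index_of(at)
        if self.bounds[i] == at:
            return  # `at` is already a chunk boundary
        self.bounds.insert(i + 1, at)
        self.owners.insert(i + 1, self.owners[i])

    def move(self, key, to_shard):
        """Reassign the chunk containing `key` to another shard."""
        self.owners[self._index_of(key)] = to_shard


table = ChunkTable("shard0")
table.split(1000)            # chunks: [-inf, 1000) and [1000, +inf)
table.move(1500, "shard1")   # the upper chunk migrates to a new shard
print(table.find(42), table.find(1500))
```

The application only ever calls something like `find`; splits and moves change the table underneath it without any change to application logic.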

Starting to Shard

• You can load data into a sharded collection or shard an existing one*
  – Automatic range partitioning will take place
  – The data placement will be taken care of
• By default, it will be sharded over _id but you can specify a different sharding key
  – An index will be built automatically over that key

* 1.6
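
As a rough illustration of what "shard an existing collection" looks like as commands, the sketch below builds the two admin command documents involved. It assumes the `enableSharding` and `shardCollection` command names used by current MongoDB releases; the helper function and the database, collection, and key names are hypothetical. Nothing here contacts a server.

```python
def sharding_commands(database, collection, shard_key):
    """Build admin command documents for sharding one collection.

    `shard_key` maps field names to 1 (ascending range partitioning).
    The function name and its arguments are illustrative only.
    """
    namespace = f"{database}.{collection}"
    return [
        {"enableSharding": database},
        {"shardCollection": namespace, "key": dict(shard_key)},
    ]


commands = sharding_commands("app", "users", {"user_id": 1})
for command in commands:
    print(command)
```

With a driver such as PyMongo, each document would be passed to the `command` method of the `admin` database while connected to a `mongos` router.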

On Writes

• Write capacity becomes the sum of the shards’ capacities

A digression

• A shard can actually live in a group of replicated servers
• Fault-tolerance is obtained that way
• Our focus here is incremental scalability and aggregated performance

On Reads, I

• Lookup over the shard key or a prefix thereof
• Sharding at its best!
  – Search criteria can be satisfied by a single chunk
  – Lookup inside chunk uses index
  – May or may not need to access the collection
• Example:
  – Shard by user_id, return the user’s name
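
Why a shard-key lookup is the cheap case can be sketched with a toy routing table. The bounds, shard names, and function names below are hypothetical, and the model assumes range-partitioned chunks over `user_id`.

```python
import bisect

# Hypothetical routing table: chunk i covers [BOUNDS[i], BOUNDS[i + 1]),
# and the last chunk is unbounded above.
BOUNDS = [0, 1000, 2000, 3000]
OWNERS = ["shard0", "shard1", "shard0", "shard2"]


def shards_for_range(low, high):
    """Shards whose chunks overlap the inclusive user_id range [low, high]."""
    first = max(bisect.bisect_right(BOUNDS, low) - 1, 0)
    last = max(bisect.bisect_right(BOUNDS, high) - 1, 0)
    return sorted(set(OWNERS[first:last + 1]))


def shards_for_key(user_id):
    """An equality lookup on the shard key touches exactly one chunk."""
    return shards_for_range(user_id, user_id)


print(shards_for_key(1234))         # a single shard is contacted
print(shards_for_range(900, 2100))  # a range can span several chunks
```

Once the request reaches the one shard that owns the chunk, the lookup inside it is an ordinary index lookup.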

On Reads, II

• Lookup over secondary index
• Not bad: merges results from shards
• Example: {country : "UK"} with secondary index over country
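
The "merge results from shards" step can be sketched as a scatter-gather over per-shard indexes. The data, shard names, and helper functions are made up for illustration; the only assumption taken from the slide is that each shard can answer the `country` query from its own secondary index.

```python
from collections import defaultdict

# Hypothetical contents of two shards, partitioned by user_id.
SHARD_DOCS = {
    "shard0": [
        {"user_id": 1, "country": "UK"},
        {"user_id": 2, "country": "US"},
    ],
    "shard1": [
        {"user_id": 1001, "country": "BR"},
        {"user_id": 1002, "country": "UK"},
    ],
}


def build_country_index(docs):
    """Local secondary index: country -> documents stored on this shard."""
    index = defaultdict(list)
    for doc in docs:
        index[doc["country"]].append(doc)
    return index


LOCAL_INDEXES = {
    name: build_country_index(docs) for name, docs in SHARD_DOCS.items()
}


def find_by_country(country):
    """Ask every shard, let each use its own index, then merge the answers."""
    results = []
    for name in sorted(LOCAL_INDEXES):
        results.extend(LOCAL_INDEXES[name].get(country, []))
    return results


uk_users = find_by_country("UK")
print([doc["user_id"] for doc in uk_users])
```

Every shard is contacted because `country` says nothing about where a document lives, but no shard has to scan its data.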

On Reads, III

• Lookups where indexes won’t help
• Traversing shards sequentially or in parallel?*

* 1.6
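
The two traversal strategies named in the question can be contrasted in a few lines. This sketch does not claim which one MongoDB uses; the data and function names are hypothetical, and threads stand in for network requests to the shards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contents of three shards.
SHARD_DOCS = {
    "shard0": [{"user_id": 1, "name": "Ann"}, {"user_id": 2, "name": "Bob"}],
    "shard1": [{"user_id": 1001, "name": "Ann"}],
    "shard2": [{"user_id": 2001, "name": "Cid"}],
}


def scan_shard(name, predicate):
    """Full scan of one shard: no index can narrow the search."""
    return [doc for doc in SHARD_DOCS[name] if predicate(doc)]


def scan_sequential(predicate):
    """Visit the shards one after another."""
    results = []
    for name in sorted(SHARD_DOCS):
        results.extend(scan_shard(name, predicate))
    return results


def scan_parallel(predicate):
    """Visit all shards at once and concatenate their partial results."""
    names = sorted(SHARD_DOCS)
    results = []
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # pool.map yields results in the order of `names`.
        for part in pool.map(lambda n: scan_shard(n, predicate), names):
            results.extend(part)
    return results


def is_ann(doc):
    return doc["name"] == "Ann"


print(scan_sequential(is_ann) == scan_parallel(is_ann))
```

Both strategies return the same documents; they differ in latency and in how much load a single query places on the cluster at once.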

The Sharding Key

• Choose wisely; you’re marrying it
• Often, you’re better off defining a unique key that stores data the application wants to query
• (Internally generated _id is really not it)

Mind Your Queries

• Sure, dynamic partitioning is automatic
• But, ultimately, the system’s response time and scalability are tied to how your application queries it
• If your most important queries fall into category I, the remaining ones into II, and seldom any query that matters into III, you’ll be fine
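
One way to audit an application against the three read categories is to classify each query by the fields it constrains. The sketch below is a simplification under stated assumptions: a query counts as using a key or index if it constrains that key's leading field, and the `classify` function and field names are hypothetical.

```python
def classify(query_fields, shard_key, secondary_indexes):
    """Return the read category ("I", "II", or "III") for a query.

    query_fields      -- names of the fields the query constrains
    shard_key         -- tuple of field names forming the shard key
    secondary_indexes -- list of tuples, one per secondary index
    """
    fields = set(query_fields)
    # Category I: the query constrains a leading prefix of the shard key.
    if shard_key[0] in fields:
        return "I"
    # Category II: some secondary index can serve the query on every shard.
    if any(index[0] in fields for index in secondary_indexes):
        return "II"
    # Category III: nothing helps; shards must be scanned.
    return "III"


SHARD_KEY = ("user_id",)
INDEXES = [("country",)]

for query in (["user_id"], ["country"], ["name"]):
    print(query, classify(query, SHARD_KEY, INDEXES))
```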

Pick Your Indexes

• MongoDB allows both sharding and secondary indexes
• Critical queries that are not served by the sharding index can use the help of a secondary index
• Sometimes, you can’t help them all…
• Index selection is a trade-off between querying and updates/insertions/deletions

Bit.ly History

• Users create shortened URLs
• Sharding is used to store all past URLs of a user
  – Sharding key: user_id
  – Indexes: timestamp(desc)
• Queries:
  – Shortened URLs by a given user
  – Last n URLs by any user
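
The two queries behave differently under this design, which a toy model makes visible. Everything below is hypothetical (data, shard names, the `USER_TO_SHARD` stand-in for chunk lookup); the only assumptions taken from the slide are the `user_id` sharding key and the descending timestamp index.

```python
import heapq
from itertools import islice

# Hypothetical data: each shard lists its documents in descending
# timestamp order, as a timestamp(desc) index would return them.
SHARDS = {
    "shard0": [
        {"user_id": 1, "ts": 50, "url": "a"},
        {"user_id": 2, "ts": 30, "url": "b"},
        {"user_id": 1, "ts": 10, "url": "c"},
    ],
    "shard1": [
        {"user_id": 7, "ts": 40, "url": "d"},
        {"user_id": 7, "ts": 20, "url": "e"},
    ],
}
USER_TO_SHARD = {1: "shard0", 2: "shard0", 7: "shard1"}


def urls_by_user(user_id):
    """Query on the sharding key: only one shard is contacted."""
    shard = USER_TO_SHARD[user_id]
    return [doc for doc in SHARDS[shard] if doc["user_id"] == user_id]


def last_n_urls(n):
    """Query on the secondary index: merge every shard's sorted stream."""
    streams = [SHARDS[name] for name in sorted(SHARDS)]
    merged = heapq.merge(*streams, key=lambda doc: doc["ts"], reverse=True)
    return list(islice(merged, n))


print([doc["url"] for doc in urls_by_user(1)])
print([doc["url"] for doc in last_n_urls(3)])
```

The first query is a category I read; the second is category II, where each shard contributes an already sorted stream and only the first n merged entries are kept.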

Take Away

• Picture to keep in mind

Questions?

www.mongodb.org
