Scaling MongoDB

Percona Webinar - Wed October 18th 11:00 AM PDT
Adamo Tonete, MongoDB Senior Service Technical Service Engineer

© 2017 Percona

Me and the expected audience

● @adamotonete

● Intermediate - At least 6+ months MongoDB experience

Agenda

● Overview on MongoDB
● The adamo.com story: scaling out
● The adamo.com story: scaling down
● Review
● Q&A

Overview on MongoDB

● Fast;
● Document-oriented database;
● Easy to deploy and manage;
● Secure database;
● HA by default;
● Easy to scale.

Internals

● Document size limit is 16 MB

● Different storage engines: MMAPv1, WiredTiger, RocksDB, In-Memory

Internals

MongoDB shares features with relational databases, such as:

● Indexes;
● Query Optimizer - Explain;
● Cache Management;
● Backups;
● Restores.

The adamo.com company story

adamo.com is a startup that started using MongoDB as a single instance, running on a single machine at a cloud provider.

In this webinar, I’m going to tell you the story of how the company evolved from one instance to a sharded environment.

[Diagram: single instance, 8 GB RAM, 2 processors, WiredTiger as storage engine]

The adamo.com company story

After a few months, the database started answering queries very slowly. The company noticed high processor usage and a very active disk, so they decided to move to a larger instance type.

[Diagram: single instance running the application load and an ETL process, 16 GB RAM, 4 processors, WiredTiger as storage engine]

The adamo.com company story

Even after increasing the instance type, the company still noticed slow reads and writes. Processor usage was still high, and so was disk I/O.

To make matters worse, the database failed after a few weeks.

The adamo.com company story

[Diagram: single instance running the ETL process, 16 GB RAM, 4 processors, WiredTiger as storage engine. Memory is divided into cache and free memory, and the cache is evicted by the ETL.]

The adamo.com company story

So, why was this database still slow even after increasing the machine resources? What could be done to avoid outages?

By now, adamo.com had learned how to use replica-sets and the advantages of having multiple instances with High Availability.

HA was solved: the single instance became a replica-set with 3 members. However, the primary instance was still showing the same problem they had faced before.

The adamo.com company story

A new environment. The replica-set was working properly, but it still had issues while the ETL was running, and the application's automatic fail-over was not working properly.

[Diagram: replica-set under ETL load; memory is divided into cache and free memory]

The adamo.com company story

A few names you should know for the next steps:

● Read Preference
● Write Concern
● Replication Lag
● Oplog Window

The adamo.com company story

A few names you should know for the next steps:

● Read Preference: Most application drivers support a read preference option. In simple terms, it controls which replica-set members the application reads from.

All writes are routed to the primary, but reads can be directed with one of the read preference modes: primary, primaryPreferred, secondary, secondaryPreferred, or nearest. Any mode except primary can also be combined with tags.
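As a minimal sketch of how an application might select a read preference, the snippet below builds a connection string using the mode names as drivers spell them in the URI. The host names and the replica-set name rs0 are assumptions made up for the example.

```javascript
// Read preference modes as they are spelled in connection strings.
const READ_PREFERENCE_MODES = [
  "primary",
  "primaryPreferred",
  "secondary",
  "secondaryPreferred",
  "nearest",
];

// Builds a replica-set connection string with the requested read preference.
function connectionUri(hosts, replicaSet, readPreference) {
  if (!READ_PREFERENCE_MODES.includes(readPreference)) {
    throw new Error(`unknown read preference: ${readPreference}`);
  }
  return `mongodb://${hosts.join(",")}/?replicaSet=${replicaSet}&readPreference=${readPreference}`;
}

// Hypothetical hosts and replica-set name, for illustration only.
const uri = connectionUri(
  ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
  "rs0",
  "secondaryPreferred"
);
console.log(uri);
```

With secondaryPreferred the application reads from a secondary when one is available and falls back to the primary otherwise, which is one way to take read load off the primary.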

The adamo.com company story

A few names you should know for the next steps:

● Write Concern: Replication is asynchronous, so the secondaries are only eventually consistent with the primary. We can tune this by raising the write concern, so that more than one instance has to confirm a write. This is very useful to guarantee data consistency across the replica-set and/or datacenters.

The write concerns are: 1, 2, <n> and majority, where majority means more than half of the voting members of the replica-set, that is floor(n / 2) + 1 for n voting members.
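The majority arithmetic can be sketched in a few lines. The helper names below are my own, and the 5000 ms timeout is an arbitrary value chosen for the example; the document shape { w, wtimeout } is the standard write concern document.

```javascript
// Number of acknowledgements that "majority" implies for n voting members.
function majority(votingMembers) {
  if (!Number.isInteger(votingMembers) || votingMembers < 1) {
    throw new Error("votingMembers must be a positive integer");
  }
  return Math.floor(votingMembers / 2) + 1;
}

// Builds a write concern document; w is a number or the string "majority".
function writeConcern(w, wtimeoutMs) {
  return { w: w, wtimeout: wtimeoutMs };
}

for (const n of [3, 5, 7]) {
  console.log(`${n} voting members -> majority of ${majority(n)}`);
}

const wc = writeConcern("majority", 5000);
console.log(JSON.stringify(wc));
// In the mongo shell this could be passed as:
//   db.posts.insertOne(doc, { writeConcern: wc })
```

The wtimeout value keeps a write from blocking forever when too few members are reachable to satisfy the requested write concern.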

The adamo.com company story

A few names you should know for the next steps:

● Replication Lag: MongoDB replication between instances doesn't occur in real time. It is asynchronous. A secondary instance with fewer resources than the primary can easily fall behind when applying new writes and updates.

The most common causes are slow disks, limited bandwidth, and slow processors.

Replication lag is a critical metric when you run queries on secondaries, because a lagging secondary returns stale data.
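One way to measure replication lag is to compare the time of the last operation applied on each secondary with that of the primary. The sketch below does this on a member list shaped like the members array of rs.status() (fields name, stateStr, and optimeDate); the hosts and timestamps are invented for the example.

```javascript
// Returns the lag of every secondary, in seconds, relative to the primary.
function replicationLagSeconds(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary in the member list");
  }
  const lag = {};
  for (const m of members) {
    if (m.stateStr !== "SECONDARY") continue;
    lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
  }
  return lag;
}

// Hypothetical status snapshot, for illustration only.
const members = [
  { name: "db1.example.com:27017", stateStr: "PRIMARY", optimeDate: new Date("2017-10-18T18:00:10Z") },
  { name: "db2.example.com:27017", stateStr: "SECONDARY", optimeDate: new Date("2017-10-18T18:00:08Z") },
  { name: "db3.example.com:27017", stateStr: "SECONDARY", optimeDate: new Date("2017-10-18T17:59:40Z") },
];

console.log(replicationLagSeconds(members));
```

In this made-up snapshot the third member is half a minute behind, so an application reading from it would see data that is up to 30 seconds old.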

The adamo.com company story

A few names you should know for the next steps:

● Oplog Window: By design, replication is asynchronous, and every write applied on the primary is recorded in the oplog.rs collection of the local database.

The oplog is a fixed-size (capped) collection that stores these operations so that the secondaries can pull and apply them.

As it is a fixed-size collection, the oldest documents are replaced by new ones once it gets full.

The period of time between the first and the last operation in this collection is called the oplog window. A secondary that stays offline for longer than the oplog window can no longer catch up from the oplog.
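The oplog window is simply the time span covered by the oplog, so it can be computed from the timestamps of its first and last entries. Below is a minimal sketch with invented timestamps; in the mongo shell, rs.printReplicationInfo() reports the same kind of information for a live instance.

```javascript
const MS_PER_HOUR = 60 * 60 * 1000;

// Time span covered by the oplog, in hours.
function oplogWindowHours(firstEntryTime, lastEntryTime) {
  return (lastEntryTime.getTime() - firstEntryTime.getTime()) / MS_PER_HOUR;
}

// A secondary that was offline longer than the window has missed operations
// that are no longer in the oplog, so it cannot catch up from it.
function needsFullResync(hoursOffline, windowHours) {
  return hoursOffline > windowHours;
}

// Hypothetical first and last oplog entry times.
const firstEntry = new Date("2017-10-17T06:00:00Z");
const lastEntry = new Date("2017-10-18T18:00:00Z");
const windowHours = oplogWindowHours(firstEntry, lastEntry);

console.log(`oplog window: ${windowHours} hours`);
console.log(`offline 12h, resync needed: ${needsFullResync(12, windowHours)}`);
console.log(`offline 48h, resync needed: ${needsFullResync(48, windowHours)}`);
```

A busy ETL load shortens the window, because the same fixed-size oplog fills up faster.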

The adamo.com company story

A new environment: reads are spread across the secondaries, and the primary receives only writes.

[Diagram: replica-set with a primary (P), two secondaries (S), and the ETL process]

The adamo.com company story

The new environment seemed to be really robust, but after a while the company started facing new issues.

Understanding the concepts below is required to discuss the next issues.

● Hidden Instance
● Priority
● Votes
● Arbiters

The adamo.com company story

Whenever someone from BI ran their ETL, a few clients started noticing slowness. Why?

[Diagram: the ETL runs against a secondary that the application is reading from as well]

The adamo.com company story

[Diagram: replica-set with a primary (P), two secondaries (S), and a hidden secondary (HS) used by the ETL]

The adamo.com company story

The hidden secondary is invisible to the application: drivers do not discover it as part of the replica-set, which means that the application never connects to this instance to perform reads.

In the case above, only the ETL process reads data from this secondary.

If we need more read capacity, it is fine to add new secondaries. However, we must make sure we don't add too many of them, as issues such as replication delay may occur.
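A hidden secondary is configured per member in the replica-set configuration, with hidden: true and priority: 0. The sketch below edits a config document shaped like the one returned by rs.conf(); the helper name, the hosts, and the replica-set name are assumptions for the example.

```javascript
// Returns a copy of a replica-set config in which one member is hidden.
// Hidden members must also have priority 0 so they can never become primary.
function hideMember(config, host) {
  if (!config.members.some((m) => m.host === host)) {
    throw new Error(`no member with host ${host}`);
  }
  const members = config.members.map((m) =>
    m.host === host ? { ...m, hidden: true, priority: 0 } : m
  );
  return { ...config, members };
}

// Hypothetical config shaped like the document returned by rs.conf().
const currentConfig = {
  _id: "rs0",
  version: 3,
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "db3.example.com:27017" },
    { _id: 3, host: "etl.example.com:27017" },
  ],
};

const newConfig = hideMember(currentConfig, "etl.example.com:27017");
console.log(JSON.stringify(newConfig.members[3]));
// In the mongo shell the result would be applied with rs.reconfig(newConfig).
```

The ETL process would then connect to the hidden member directly, by host name, rather than through the replica-set connection string.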

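The adamo.com company story

[Diagram: the same replica-set with two more secondaries added: a primary (P), four secondaries (S), and the hidden secondary (HS) used by the ETL]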

The adamo.com company story

Replication delay can occur for several reasons. A replica-set may, for example, perform chained replication, in which a secondary replicates from another secondary instead of replicating from the primary.

Chained replication can be disabled in the replica-set configuration.

● https://docs.mongodb.com/manual/tutorial/manage-chained-replication/
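As a sketch of what that configuration change looks like, the helper below sets settings.chainingAllowed to false on a config document shaped like the output of rs.conf(). The helper name and the sample config are assumptions for illustration.

```javascript
// Returns a copy of the config with chained replication disabled, so that
// secondaries sync from the primary rather than from other secondaries.
function disableChaining(config) {
  return {
    ...config,
    settings: { ...(config.settings || {}), chainingAllowed: false },
  };
}

// Hypothetical config; only the fields relevant to the example are shown.
const chainedConfig = {
  _id: "rs0",
  version: 4,
  members: [{ _id: 0, host: "db1.example.com:27017" }],
  settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10 },
};

const unchainedConfig = disableChaining(chainedConfig);
console.log(JSON.stringify(unchainedConfig.settings));
// In the mongo shell the result would be applied with rs.reconfig(unchainedConfig).
```

Disabling chaining trades network efficiency for freshness: every secondary then pulls from the primary, which adds load on the primary but removes the extra replication hop.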

The adamo.com company story

[Diagram: the same replica-set; the two newest secondaries are not in sync because of replication delay]

The adamo.com company story

Not every instance is eligible to vote. A replica-set can have at most 7 voting instances, so if there are more than 7 instances in the replica-set, only 7 of them can vote.

As good practice, pick the best instances as voters and keep them close to the primary.

Sometimes we need a MongoDB instance that only checks whether the other instances are available. These instances are called arbiters. Arbiters don’t keep data; they only vote.
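The voting rules can be checked mechanically before a reconfiguration. The sketch below validates a config shaped like rs.conf() output against two rules: at most 7 voting members, and priority 0 for every non-voting member. The function name and the sample hosts are made up; members without explicit votes or priority are treated as having the default value of 1.

```javascript
const MAX_VOTING_MEMBERS = 7;

// Returns a list of problems found in the voting setup (empty when valid).
function votingProblems(config) {
  const problems = [];
  const votesOf = (m) => (m.votes === undefined ? 1 : m.votes);
  const priorityOf = (m) => (m.priority === undefined ? 1 : m.priority);

  const voters = config.members.filter((m) => votesOf(m) > 0);
  if (voters.length > MAX_VOTING_MEMBERS) {
    problems.push(`${voters.length} voting members, the limit is ${MAX_VOTING_MEMBERS}`);
  }
  for (const m of config.members) {
    if (votesOf(m) === 0 && priorityOf(m) > 0) {
      problems.push(`${m.host}: non-voting members must have priority 0`);
    }
  }
  return problems;
}

// Hypothetical 8-member replica-set in which every member votes by default.
const tooManyVoters = {
  _id: "rs0",
  members: Array.from({ length: 8 }, (_, i) => ({ _id: i, host: `db${i}.example.com:27017` })),
};
console.log(votingProblems(tooManyVoters));

// Same set after turning one member into a non-voting, priority 0 member.
const fixedVoters = {
  _id: "rs0",
  members: tooManyVoters.members.map((m) =>
    m._id === 7 ? { ...m, votes: 0, priority: 0 } : m
  ),
};
console.log(votingProblems(fixedVoters));
```

An arbiter would appear in such a config as a member with arbiterOnly: true; it counts as a voter but stores no data.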

The adamo.com company story

[Diagram: the six-member replica-set (a primary, four secondaries, and the hidden secondary used by the ETL) annotated with each member's settings: three members have Votes: 1, Priority: 1 and three members have Votes: 0, Priority: 0]

The adamo.com company story

It is time to consider sharding.

Even with the replica-set, the number of writes and the size of the working set are making it too expensive to keep everything in a single replica-set. Sharding presents several replica-sets as one logical database and splits the load between them. Each of those replica-sets is called a shard.

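A simple 2-shard cluster

[Diagram: a mongos and a config server in front of two shards; each shard is a replica-set with a primary (P) and two secondaries (S)]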

Shards - Important words

mongos is the router process that the application connects to. As the database is sharded, there is no guarantee that all the data is in just one shard, so mongos sends each operation to the shards that hold the relevant data.

Config servers store the location of the data. These special instances keep track of where every piece of data lives so that mongos can find it. This information is called the cluster metadata.

A Shard is a replica set that belongs to a cluster.

A Cluster is the combination of one or more shards with the config servers and mongos.

Shards - Important words

A Shard Key is the field (or set of fields) used to split a collection across the shards.

Chunks are contiguous ranges of shard key values, each holding a small amount of data, usually up to 64 MB. A sharded collection is made up of many chunks.

The Balancer is the process that moves chunks between shards so that the data stays evenly distributed. Chunks that grow too large are split into smaller chunks within the same shard.
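To make the relationship between shard key, chunks, and routing more concrete, here is a toy routing table. Each chunk covers a range of shard key values (lower bound inclusive, upper bound exclusive) and lives on one shard. The chunk boundaries and shard names are invented, and a real mongos keeps this metadata in a more elaborate form.

```javascript
// Hypothetical chunk metadata: each chunk covers [min, max) of the shard key.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shard1" },
  { min: 1000, max: 2000, shard: "shard2" },
  { min: 2000, max: Infinity, shard: "shard1" },
];

// Finds the shard that owns a single shard key value.
function shardFor(key, chunks) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) {
    throw new Error(`no chunk covers key ${key}`);
  }
  return chunk.shard;
}

// Finds every shard that must be asked for a range query on [from, to).
function shardsForRange(from, to, chunks) {
  const shards = chunks
    .filter((c) => c.min < to && c.max > from)
    .map((c) => c.shard);
  return [...new Set(shards)];
}

console.log(shardFor(1500, chunkTable));
console.log(shardsForRange(500, 1500, chunkTable));
console.log(shardsForRange(1000, 1500, chunkTable));
```

A query that includes the shard key can be sent to just the shards owning the matching chunks, while a query without it has to be broadcast to every shard.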

Replicaset - Moving to a cluster

adamo.com was now one of the most popular websites.

Even though the replica-set helped them handle reads, writes became a problem.

It was time to start developing a cluster strategy.

[Diagram: the six-member replica-set with its Votes and Priority settings and the ETL process, now joined by a mongos and a config server]

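Shards - 1-shard cluster

[Diagram: 1-shard cluster: a mongos and a config server in front of a single shard, which is the existing replica-set with its primary (P), secondaries (S), and hidden secondary (HS)]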

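The adamo.com company story - shards

[Diagram: 2-shard cluster: a mongos and a config server in front of two shards, each a replica-set with a primary (P), secondaries (S), and a hidden secondary (HS)]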

Shards - Adding new shards

Even after a new shard was added, the company didn’t see any improvement in write performance.

In fact, the main database was not yet configured as a sharded database, so all the reads and writes were still going to shard 1 only.

Shards - Adding new shards

[Diagram: 2-shard cluster with mongos, config server, and the ETL process; one shard is marked as used and the other as not used]

Shards - Sharding a database

It is necessary to tell the config servers that we want to shard a database. This is done by running:

sh.enableSharding('mydatabase')

After that, we need to start sharding the collections using a shard key.

The most used collection is called posts, and we shard this collection based on the hash of the _id field.

We could also shard by range, but we chose the hashed _id to spread the writes evenly across the shards and so speed them up.
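The reason a hashed shard key helps writes can be shown with a toy placement model. New documents usually get ever-increasing _id values, so with a ranged key they all fall into the last range and land on one shard, while hashing scatters them. The md5-based function below is only a stand-in chosen for the example; MongoDB uses its own hash function and maps hash values to chunks rather than directly to shards. In the mongo shell, the collection itself would be sharded with sh.shardCollection('mydatabase.posts', { _id: 'hashed' }).

```javascript
const crypto = require("crypto");

// Toy stand-in for a hashed shard key: md5 of the id reduced to a shard index.
function hashedShard(id, shardCount) {
  const digest = crypto.createHash("md5").update(String(id)).digest();
  return digest.readUInt32BE(0) % shardCount;
}

// Toy ranged placement: ids below the split point go to shard 0, the rest to shard 1.
function rangedShard(id, splitPoint) {
  return id < splitPoint ? 0 : 1;
}

// Counts how many ids each shard receives under a placement function.
function countPerShard(ids, placeFn, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const id of ids) {
    counts[placeFn(id)] += 1;
  }
  return counts;
}

// 1000 new documents with ever-increasing ids, all above the existing split point.
const newIds = Array.from({ length: 1000 }, (_, i) => 5000 + i);
const rangedCounts = countPerShard(newIds, (id) => rangedShard(id, 5000), 2);
const hashedCounts = countPerShard(newIds, (id) => hashedShard(id, 2), 2);

console.log("ranged:", rangedCounts); // every insert lands on the same shard
console.log("hashed:", hashedCounts); // inserts are divided between both shards
```

The trade-off is that range queries on a hashed key can no longer be targeted at a single shard, since neighbouring key values end up in unrelated places.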

Shards - Sharding a database

[Diagram: 2-shard cluster with mongos, config server, and the ETL process; the balancer moves data from the first shard to the second]

Shards - Balancer

After the balancer runs for a while, both shards hold data. Each shard keeps its own part of the data, and the information about where the data is stored is kept in the config database.

adamo.com noticed that the ETL was only moving "half" of the expected data, because the data was now split between the shards and the ETL was still reading from a single shard.
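The effect of the balancer can be illustrated with a toy model that only tracks chunk counts: it repeatedly moves one chunk from the most loaded shard to the least loaded one until the difference drops below a threshold. The threshold of 2 and the shard names are assumptions for the sketch; the real balancer uses its own thresholds and migrates actual data.

```javascript
// Toy balancer: evens out chunk counts and records each migration.
function balance(chunksPerShard, threshold = 2) {
  if (threshold < 2) {
    throw new Error("threshold must be at least 2, otherwise chunks would bounce forever");
  }
  const counts = { ...chunksPerShard };
  const names = Object.keys(counts);
  const migrations = [];
  if (names.length === 0) {
    return { counts, migrations };
  }
  while (true) {
    const most = names.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const least = names.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
  }
  return { counts, migrations };
}

// A freshly sharded collection: every chunk still lives on the first shard.
const balanced = balance({ shard1: 10, shard2: 0 });
console.log(balanced.counts);
console.log(`${balanced.migrations.length} chunk migrations`);
```

Once the chunks are spread out like this, any process that reads from a single shard only sees the part of the data that shard owns, which is what happened to the ETL.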

Shards - Balancer

[Diagram: 2-shard cluster with mongos, config server, balancer, and the ETL process after the data has been balanced across both shards]

The adamo.com company story - shard

The ETL could no longer read from the hidden secondaries, because all the connections were now made through the mongos, and the mongos does not route reads to hidden instances.

To work around this situation, the company decided to use tag-aware reads.

For this to work, each instance had to be given a tag describing its purpose: read or ETL.

With the tags in place, the application only reads from instances with the "read" tag and the ETL only reads from instances with the "ETL" tag.

https://docs.mongodb.com/v3.2/core/tag-aware-sharding/
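A small sketch of how tagged reads fit together: each replica-set member carries a tags document in its configuration, and the client asks for a tag set through the readPreferenceTags connection string option. The tag key use, the host names, and the helper functions are assumptions made up for the example.

```javascript
// Members as they might appear in a shard's replica-set config, with tags.
const taggedMembers = [
  { host: "s1a.example.com:27017", state: "PRIMARY", tags: { use: "read" } },
  { host: "s1b.example.com:27017", state: "SECONDARY", tags: { use: "read" } },
  { host: "s1c.example.com:27017", state: "SECONDARY", tags: { use: "read" } },
  { host: "s1d.example.com:27017", state: "SECONDARY", tags: { use: "ETL" } },
];

// Secondaries whose tags contain every key/value pair of the requested tag set.
function eligibleSecondaries(members, tagSet) {
  return members
    .filter((m) => m.state === "SECONDARY")
    .filter((m) => Object.entries(tagSet).every(([k, v]) => m.tags && m.tags[k] === v))
    .map((m) => m.host);
}

// Connection string for a client that connects through a mongos.
function taggedReadUri(mongosHost, database, tagSet) {
  const tags = Object.entries(tagSet).map(([k, v]) => `${k}:${v}`).join(",");
  return `mongodb://${mongosHost}/${database}?readPreference=secondary&readPreferenceTags=${tags}`;
}

console.log(eligibleSecondaries(taggedMembers, { use: "ETL" }));
console.log(taggedReadUri("mongos.example.com:27017", "mydatabase", { use: "ETL" }));
```

The ETL would use the ETL tag set and the application the read tag set, so the two workloads never compete for the same secondary.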

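Shards - Read Tag Aware

[Diagram: 2-shard cluster with mongos and config server; the formerly hidden secondaries are now regular secondaries, the instances in each shard carry either "Tag: read" or "Tag: ETL", and the ETL reads from the instances tagged ETL]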

The adamo.com company story

With this architecture, the example company was able to grow to 10 shards running at the same time, but unfortunately the success of the website didn't last long.

After 2 years running on a 10-shard cluster, the company decided to shrink the environment back to just 2 shards.

The adamo.com company story

The process was the inverse of sharding: the company had to remove shards from the cluster and wait until all of their data had moved to the remaining shards.

The balancer works the other way round here. Data from the shard being removed is migrated to the remaining shards, and the config servers are updated with the new data location.

All of those steps can be done online, meaning that no downtime is required.
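Removing a shard is driven by the removeShard admin command, which is issued repeatedly: the first call starts the draining and later calls report progress until the state becomes completed. The sketch below only interprets such responses; the sample responses are illustrative, and the shard name shard10 is made up.

```javascript
// Admin command document that starts (or checks on) the draining of a shard.
function removeShardCommand(shardName) {
  return { removeShard: shardName };
}

// Summarises a removeShard response: is draining finished, and what is left?
function drainingProgress(response) {
  if (response.ok !== 1) {
    throw new Error("removeShard did not succeed");
  }
  return {
    finished: response.state === "completed",
    remainingChunks: response.remaining ? response.remaining.chunks : null,
  };
}

// Illustrative responses as they could be seen over successive calls.
const sampleResponses = [
  { state: "started", shard: "shard10", ok: 1 },
  { state: "ongoing", remaining: { chunks: 42, dbs: 0 }, ok: 1 },
  { state: "completed", shard: "shard10", ok: 1 },
];

console.log(JSON.stringify(removeShardCommand("shard10")));
for (const response of sampleResponses) {
  console.log(response.state, JSON.stringify(drainingProgress(response)));
}
// In the mongo shell the command would be run against the admin database:
//   db.adminCommand({ removeShard: "shard10" })
```

The shard can be decommissioned only after the command reports the completed state, since until then it may still own chunks.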

Review

● Single instances are not recommended;
● Replica-sets must have at least 3 instances storing data;
● Hidden instances can't be read by drivers;
● A cluster can have only one shard, but there is no performance improvement;
● Adding new shards is easy, but you need to pick the right shard key to split the data among the instances;
● Use read tags to connect through the mongos in order to read from a specific instance in the shard;
● Scaling down is as easy as scaling out in MongoDB.

Some subjects, such as backups and security, were omitted from this presentation to keep it simple and concise.

Review

Resources:

https://www.percona.com/blog/2017/10/16/when-should-i-enable-mongodb-sharding/

https://docs.mongodb.com/manual/sharding/

https://www.percona.com/blog/2016/12/16/mongodb-pit-backups-in-depth/

Q&A
