Post on 07-Jun-2018
Ross Lawley
Python Engineer @ 10gen
Web developer since 1999
Passionate about open source
Agile methodology
email: ross@10gen.com
twitter: RossC0
Today's Talk
• Scaling
• Understanding mongoDB's architecture
• Schema design and usage
• Replication
• Sharding
How do you scale now?
• Optimization & Tuning
  • Schema & Index Design
  • O/S tuning
  • Hardware configuration
• Vertical scaling
  • Hardware is expensive
  • Hard to scale in cloud
[Chart: cost ($$$) against throughput]
Write scaling - add shards
[Diagram: axes labelled "read" and "write"; three shards (shard1, shard2, shard3), each with three nodes (node_a1..node_c1, node_a2..node_c2, node_a3..node_c3)]
Understanding mongoDB's architecture
http://www.flickr.com/photos/tragiclyflawed/867687742
mongod architecture
• mongod memory maps the numbered & .ns files
• Memory mapping makes in-place updates effective
• File page residency decisions left to operating system
mongod Data Files
• Fixed-size extents in data files store records and indexes
• Records contain documents
• Unused records get placed in free lists
• Indexes are B-Trees
[Diagram: record layout inside an extent. Record: Header (Size, Offset, Next, Prev), BSON Data, Padding. Deleted Record: (Size, Offset, Next).]
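A minimal sketch of why padding helps in-place updates, assuming a toy extent that only tracks record sizes. The `Extent` class, the `paddingFactor` of 1.5 and the byte sizes are hypothetical illustration values, not mongod internals.

```javascript
// Toy model of records in an extent: each record is allocated with some
// padding so that a document can grow without being moved.
class Extent {
  constructor(paddingFactor) {
    this.paddingFactor = paddingFactor; // assumption: a fixed multiplier
    this.records = new Map();           // record id -> { used, allocated }
    this.freeList = [];                 // sizes of deleted (unused) records
    this.nextId = 0;
  }

  // Allocate a record with room to grow.
  insert(docSize) {
    const allocated = Math.ceil(docSize * this.paddingFactor);
    const id = this.nextId++;
    this.records.set(id, { used: docSize, allocated });
    return id;
  }

  // Update in place when the new document still fits; otherwise move it
  // and put the old record on the free list.
  update(id, newSize) {
    const rec = this.records.get(id);
    if (newSize <= rec.allocated) {
      rec.used = newSize;
      return { id, moved: false };
    }
    this.records.delete(id);
    this.freeList.push(rec.allocated);
    return { id: this.insert(newSize), moved: true };
  }
}
```

With `new Extent(1.5)`, a 100-byte document gets a 150-byte record, so growing it to 140 bytes stays in place while growing it to 200 bytes forces a move. The sketch never reuses free-list entries; a fuller model would look there first.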
mongod Concurrency
• Readers block writers
• A writer blocks everything
• Everybody yields periodically
• When a new writer queues up, new readers block
• In v2.0 and earlier, this concurrency model is global to the mongod
• In v2.2, this model will be scoped to the database
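The blocking rules above can be sketched as a toy lock. `GlobalLock` and its method names are hypothetical, and periodic yielding is left out, so treat this as an illustration of the rules rather than mongod's implementation.

```javascript
// Toy reader/writer lock with writer priority.
class GlobalLock {
  constructor() {
    this.readers = 0;        // active readers
    this.writing = false;    // is a writer active?
    this.queuedWriters = 0;  // writers waiting for the lock
  }

  // A reader gets in only if nobody is writing and no writer is queued.
  acquireRead() {
    if (this.writing || this.queuedWriters > 0) return false;
    this.readers += 1;
    return true;
  }

  releaseRead() {
    this.readers -= 1;
  }

  // A writer announces itself first; from now on new readers block.
  enqueueWriter() {
    this.queuedWriters += 1;
  }

  // A queued writer starts only once all readers and writers are done.
  tryStartWrite() {
    if (this.queuedWriters === 0 || this.writing || this.readers > 0) return false;
    this.queuedWriters -= 1;
    this.writing = true;
    return true;
  }

  finishWrite() {
    this.writing = false;
  }
}
```

In v2.0 one such lock would cover the whole mongod; in v2.2 there would be one per database.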
Architecture Summary• Uses memory mapped files - RAM• Faster disks - more IOPS (SSDs are good!)• CPU usage low
Schema
http://www.magentocommerce.com/wiki/2_-_magento_concepts_and_architecture/magento_database_diagram
Schema
• Data model affects performance
  • Embedding versus Linking
  • Roundtrips to database
  • Disk seek time
  • Size of data to read & write
  • Partial versus full document writes
  • Partial versus full document reads
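A small sketch of the embedding versus linking trade-off in terms of roundtrips. `FakeDb` is a hypothetical in-memory stand-in that only counts calls; the `posts` and `comments` collections are made-up example data.

```javascript
// In-memory stand-in for a database that counts roundtrips.
class FakeDb {
  constructor(collections) {
    this.collections = collections;
    this.roundtrips = 0;
  }
  find(name, predicate) {
    this.roundtrips += 1; // every find() is one trip to the server
    return this.collections[name].filter(predicate);
  }
}

// Embedded: comments live inside the post document.
const embedded = new FakeDb({
  posts: [{ _id: 1, title: "Scaling", comments: [{ text: "Nice" }, { text: "Thanks" }] }],
});

// Linked: comments are separate documents that reference the post.
const linked = new FakeDb({
  posts: [{ _id: 1, title: "Scaling" }],
  comments: [{ post_id: 1, text: "Nice" }, { post_id: 1, text: "Thanks" }],
});

// One query returns the post together with its comments.
function readEmbedded(db, id) {
  const [post] = db.find("posts", (p) => p._id === id);
  return post;
}

// Two queries: one for the post, one for its comments.
function readLinked(db, id) {
  const [post] = db.find("posts", (p) => p._id === id);
  const comments = db.find("comments", (c) => c.post_id === id);
  return { ...post, comments };
}
```

Embedding saves a roundtrip here, but every read and write then carries the full comment list, which is the "size of data to read & write" side of the trade-off.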
Query for {a: 7}
[Diagram: without index, a scan of every document {...}; with index, a B-tree whose root ranges [-∞, 5) [5, 10) [10, ∞) lead to buckets, with [5, 10) subdivided into [5, 7) [7, 9) [9, 10)]
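To make the difference concrete, here is a rough sketch that counts the work for the query {a: 7} with and without an index. A binary search over a sorted array stands in for the B-tree, and the 1000 generated documents are made-up data.

```javascript
// 1000 documents whose "a" values are a permutation of 0..999.
const docs = Array.from({ length: 1000 }, (_, i) => ({ _id: i, a: (i * 7) % 1000 }));

// Without an index: examine every document.
function scan(collection, value) {
  let examined = 0;
  const hits = [];
  for (const doc of collection) {
    examined += 1;
    if (doc.a === value) hits.push(doc);
  }
  return { hits, examined };
}

// "Index": keys kept in sorted order, each pointing back at a document.
const index = docs
  .map((doc, pos) => ({ key: doc.a, pos }))
  .sort((x, y) => x.key - y.key);

// With an index: binary search for the first matching key, then walk
// forward over equal keys.
function indexLookup(idx, collection, value) {
  let lo = 0;
  let hi = idx.length;
  let probes = 0;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    probes += 1;
    if (idx[mid].key < value) lo = mid + 1;
    else hi = mid;
  }
  const hits = [];
  for (let i = lo; i < idx.length && idx[i].key === value; i++) {
    hits.push(collection[idx[i].pos]);
  }
  return { hits, probes };
}
```

The scan always examines all 1000 documents, while the lookup needs at most 10 probes for 1000 keys.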
> db.users.ensureIndex({uname: 1, first: 1, last: 1})
> db.users.find( { uname: "RossC0" }, {_id: 0, first: 1, last: 1})
Covered Indexes
Use just the index
Schema
• Schema and data usage are critical for scaling and performance
• Understand data access patterns
• Use indexes but don't over index
Replication
http://www.flickr.com/photos/10335017@N07/4570943043
Replication
• mongoDB replication is like MySQL replication
  • Asynchronous master/slave
• Replica sets
  • A cluster of N servers
  • All writes to primary
  • Reads can be to primary (default) or a secondary
  • Any (one) node can be primary
  • Consensus election of primary
  • Automatic failover
  • Automatic recovery
How mongoDB Replication works
• Election establishes the PRIMARY
• Data replication from PRIMARY to SECONDARY
[Diagram: Member 1, Member 2 (Primary), Member 3]
How mongoDB Replication works
• PRIMARY may fail
• Automatic election of new PRIMARY if majority exists
[Diagram: Member 2 is DOWN; Member 1 and Member 3 negotiate a new master]
How mongoDB Replication works
• New PRIMARY elected
• Replica Set re-established
[Diagram: Member 1, Member 2 (DOWN), Member 3 (Primary)]
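The "if majority exists" condition can be sketched as a simple count. `hasMajority` and the `up` flag are hypothetical names for illustration; a real election also weighs priorities and how up to date each member is, which this leaves out.

```javascript
// A new PRIMARY can only be elected while more than half of the
// members of the set can still see each other.
function hasMajority(members) {
  const reachable = members.filter((m) => m.up).length;
  return reachable > members.length / 2;
}

// The three-member set from the slides, with Member 2 down.
const oneDown = [
  { name: "Member 1", up: true },
  { name: "Member 2", up: false },
  { name: "Member 3", up: true },
];

// The same set with two members down: no majority, so no new PRIMARY.
const twoDown = [
  { name: "Member 1", up: true },
  { name: "Member 2", up: false },
  { name: "Member 3", up: false },
];
```

Here `hasMajority(oneDown)` is true (2 of 3) and `hasMajority(twoDown)` is false (1 of 3).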
> cfg = {
    _id : "myset",
    members : [
      { _id : 0, host : "germany1.acme.com" },
      { _id : 1, host : "germany2.acme.com" },
      { _id : 2, host : "germany3.acme.com" }
    ]
  }
> use admin
> db.runCommand( { replSetInitiate : cfg } )
Creating a Replica Set
Replica Set Member Types
• Normal {priority: 1}
• Passive {priority: 0}
  • Cannot be elected as PRIMARY
• Arbiters
  • Can vote in an election
  • Do not hold any data
• Hidden {hidden: true}
• Tagging:
  • {tags: {"dc": "germany", "rack": "r23s5"}}
Safe writes
db.runCommand({getLastError: 1, w : 1})
• ensure write is synchronous
• command returns after primary has written to memory
w: n or w: 'majority'
• n is the number of nodes data must be replicated to
• driver will always send writes to Primary
w: 'my_tag'
• Each member is "tagged" e.g. "allDCs"
• Ensure that the write is executed in each tagged "region"
j: true
• Ensures changes are flushed to the Journal
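A rough sketch of when a write counts as acknowledged under the numeric, 'majority' and journal options above. `writeSatisfied` and the shape of the `write` object are hypothetical; tag-based concerns are not modelled.

```javascript
// write:   { replicatedTo: number of nodes holding the write, journaled: bool }
// concern: { w: number or "majority", j: bool }
// setSize: number of members in the replica set
function writeSatisfied(write, concern, setSize) {
  let needed;
  if (concern.w === "majority") {
    needed = Math.floor(setSize / 2) + 1;
  } else if (concern.w === undefined) {
    needed = 1; // only the primary
  } else {
    needed = concern.w;
  }
  if (write.replicatedTo < needed) return false;
  // j: true additionally requires the journal flush.
  if (concern.j && !write.journaled) return false;
  return true;
}
```

For a three-member set, a write that has only reached the primary satisfies `{w: 1}` but not `{w: "majority"}`, which needs two nodes.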
Replication features
• Reads from Primary are always consistent
• Reads from Secondaries are eventually consistent
• Can be used to scale reads
• Automatic failover if a Primary fails
• Automatic recovery when a node joins the set
Sharding
http://www.flickr.com/photos/60218876@N08/6888405266
What is Sharding?
• Ad-hoc partitioning
• Consistent hashing
  • Amazon Dynamo
• Range based partitioning
  • Google BigTable
  • Yahoo! PNUTS
  • mongoDB
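To contrast the two approaches, here is a small sketch of a consistent-hashing ring next to a range lookup. The FNV-1a hash, the node names and the range boundaries are assumptions chosen for illustration; none of the systems listed is claimed to work exactly this way.

```javascript
// 32-bit FNV-1a hash, used to place nodes and keys on the ring.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Consistent hashing: nodes sit at points on a ring, sorted by hash.
function buildRing(nodes) {
  return nodes
    .map((node) => ({ node, point: fnv1a(node) }))
    .sort((a, b) => a.point - b.point);
}

// A key belongs to the first node at or after its hash, wrapping around.
function ringOwner(ring, key) {
  const h = fnv1a(key);
  const entry = ring.find((e) => e.point >= h) || ring[0];
  return entry.node;
}

// Range partitioning: ordered ranges [min, max), each owned by one shard.
function rangeOwner(ranges, value) {
  return ranges.find((r) => value >= r.min && value < r.max).shard;
}

const ranges = [
  { min: -Infinity, max: 41, shard: "shard1" },
  { min: 41, max: 61, shard: "shard2" },
  { min: 61, max: Infinity, shard: "shard3" },
];
```

Adding a node to the ring only takes keys away from one neighbour, while range partitioning keeps neighbouring key values together, which is what makes range queries on the key cheap.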
mongoDB Sharding
• Automatic partitioning and management
• Range based
• Convert to sharded system with no downtime
• Fully consistent
> db.runCommand({addshard: "shard1"});
> db.runCommand({shardCollection: "mydb.users", key: {age: 1}})
How mongoDB Sharding works
Range keys from -∞ to +∞
Ranges are stored as "chunks"
[Diagram: one chunk covering -∞ to +∞]
> db.users.save({age: 40})
How mongoDB Sharding works
Data is inserted
Ranges are split into more "chunks"
[Diagram: -∞ to +∞ splits into -∞ to 40 and 41 to +∞]
> db.users.save({age: 40})
> db.users.save({age: 50})
How mongoDB Sharding works
More data inserted
Ranges are split into more "chunks"
[Diagram: -∞ to +∞ splits into -∞ to 40 and 41 to +∞; 41 to +∞ splits into 41 to 50 and 51 to +∞]
> db.users.save({age: 40})
> db.users.save({age: 50})
> db.users.save({age: 60})
How mongoDB Sharding works
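[Diagram: -∞ to +∞ splits into -∞ to 40 and 41 to +∞; 41 to +∞ splits into 41 to 50 and 51 to +∞; 51 to +∞ splits into 51 to 60 and 61 to +∞. Resulting chunks: -∞ to 40, 41 to 50, 51 to 60, 61 to +∞]
> db.users.save({age: 40})
> db.users.save({age: 50})
> db.users.save({age: 60})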
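The splitting shown in these slides can be sketched as follows. The slides split at each inserted value to keep the picture simple; this sketch instead assumes a chunk splits at its median key once it holds more than a hypothetical `maxKeysPerChunk`, standing in for whatever size limit triggers a real split.

```javascript
// Each chunk covers [min, max) and remembers the keys stored in it.
function insertKey(chunks, key, maxKeysPerChunk) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  chunk.keys.push(key);
  chunk.keys.sort((a, b) => a - b);

  if (chunk.keys.length <= maxKeysPerChunk) return;

  // Split at the median key; skip if that would leave the left half empty.
  const median = chunk.keys[Math.floor(chunk.keys.length / 2)];
  if (chunk.keys[0] === median) return;

  const right = {
    min: median,
    max: chunk.max,
    keys: chunk.keys.filter((k) => k >= median),
  };
  chunk.keys = chunk.keys.filter((k) => k < median);
  chunk.max = median;
  chunks.splice(chunks.indexOf(chunk) + 1, 0, right);
}

// Start with a single chunk covering the whole key range.
function newCollection() {
  return [{ min: -Infinity, max: Infinity, keys: [] }];
}
```

Inserting ages 40, 50, 60 and 70 with `maxKeysPerChunk` set to 2 ends with three chunks whose lower bounds are -∞, 50 and 60.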
How mongoDB Sharding works
[Diagram: all four chunks (-∞ to 40, 41 to 50, 51 to 60, 61 to +∞) on shard1]
> db.users.save({age: 40})
> db.users.save({age: 50})
> db.users.save({age: 60})
How mongoDB Sharding works
[Diagram: chunks -∞ to 40, 41 to 50, 51 to 60, 61 to +∞]
> db.runCommand({addshard: "shard2"});
> db.runCommand({addshard: "shard3"});
How mongoDB Sharding works
[Diagram: chunks -∞ to 40, 41 to 50, 51 to 60, 61 to +∞, all still on shard1]
> db.runCommand({addshard: "shard2"});
> db.runCommand({addshard: "shard3"});
How mongoDB Sharding works
[Diagram: chunks -∞ to 40, 41 to 50, 51 to 60, 61 to +∞ spread across shard1, shard2 and shard3]
> db.runCommand({addshard: "shard2"});
> db.runCommand({addshard: "shard3"});
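A minimal sketch of the rebalancing step, assuming the balancer simply moves one chunk at a time from the fullest shard to the emptiest until their chunk counts differ by at most one. The `balance` function and that threshold are hypothetical simplifications.

```javascript
// shards: object mapping shard name -> array of chunk labels.
// Returns the number of chunk migrations performed.
function balance(shards) {
  const names = Object.keys(shards);
  let migrations = 0;
  for (;;) {
    // Find the shard with the most chunks and the one with the fewest.
    let fullest = names[0];
    let emptiest = names[0];
    for (const name of names) {
      if (shards[name].length > shards[fullest].length) fullest = name;
      if (shards[name].length < shards[emptiest].length) emptiest = name;
    }
    if (shards[fullest].length - shards[emptiest].length <= 1) return migrations;
    // Migrate one chunk.
    shards[emptiest].push(shards[fullest].pop());
    migrations += 1;
  }
}

// All four chunks start on shard1; shard2 and shard3 were just added.
const cluster = {
  shard1: ["-inf..40", "41..50", "51..60", "61..+inf"],
  shard2: [],
  shard3: [],
};
```

Calling `balance(cluster)` performs two migrations and leaves shard1 with two chunks and the other shards with one each.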
How mongoDB Sharding works
Sharding Features
• Shard data without downtime
• Automatic balancing as data is written
• Commands routed (switched) to correct node
  • Inserts - must have the Shard Key
  • Updates - must have the Shard Key
  • Queries
    • With Shard Key - routed to nodes
    • Without Shard Key - scatter gather
  • Indexed Queries
    • With Shard Key - routed in order
    • Without Shard Key - distributed sort merge
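The routing rule for queries can be sketched like this. `targetShards`, the chunk map and its boundaries are hypothetical, and only equality matches on the shard key are handled.

```javascript
// Chunk map for a collection sharded on "age": [min, max) -> shard.
const chunkMap = [
  { min: -Infinity, max: 41, shard: "shard1" },
  { min: 41, max: 51, shard: "shard2" },
  { min: 51, max: 61, shard: "shard3" },
  { min: 61, max: Infinity, shard: "shard1" },
];

// Decide which shards must see the query.
function targetShards(query, shardKey, chunks) {
  if (!(shardKey in query)) {
    // No shard key: scatter gather across every shard that owns a chunk.
    return [...new Set(chunks.map((c) => c.shard))];
  }
  // Shard key present: route to the one shard owning the matching chunk.
  const value = query[shardKey];
  const chunk = chunks.find((c) => value >= c.min && value < c.max);
  return [chunk.shard];
}
```

A query on `{age: 45}` goes to shard2 only, while a query on `{uname: "RossC0"}` has to visit all three shards.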
Scaling with mongoDB
• Schema & Index design
  • Simplest way to scale
• Replication
  • Provides High Availability
  • Can be used to automatically scale reads
• Sharding
  • Automatically scale writes