Agility and Scalability with MongoDB

Post on 29-Nov-2014


Description

MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.

Transcript of Agility and Scalability with MongoDB

MongoDB Scalability and Agility

Chris.Biow@MongoDB.com

2

• Now

• Secure

• All varieties

• Fast and interactive

• Scalable to “Big”

• Agile to develop and deploy operationally

• Cloud and edge

Data Challenge: “I want my data...”

iStock licensed (pixelfit)

3

Scalability with MongoDB

• Operations per Second: concurrent reads and writes per second (> 1 million per second)

• Nodes per Cluster: horizontal scale-out, distributed to multiple data centers worldwide, with high availability, using inexpensive cloud resources (> 1,000 nodes)

• Records / Documents: data objects in any number of schemas or structures (> 10 billion)

• Data Volume: total amount of data, documents × average size (> 1 petabyte = 10^15 bytes ≈ 2^50)
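The data-volume arithmetic above can be checked directly (plain Python, nothing MongoDB-specific; the comparison shows how close the decimal and binary petabyte are):

```python
# Decimal petabyte: 10^15 bytes; nearest binary power: 2^50 (a pebibyte).
pb_decimal = 10**15
pb_binary = 2**50

print(f"{pb_decimal:,}")   # 1,000,000,000,000,000
print(f"{pb_binary:,}")    # 1,125,899,906,842,624
print(round(pb_binary / pb_decimal, 3))   # 1.126, i.e. within ~13%
```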

Key Differentiation

5

Operational Database Landscape

6

Document Data Model

Relational vs. MongoDB

{
  first_name: ‘Paul’,
  surname: ‘Miller’,
  city: ‘London’,
  location: [45.123, 47.232],
  cars: [
    { model: ‘Bentley’, year: 1973, value: 100000, … },
    { model: ‘Rolls Royce’, year: 1965, value: 330000, … }
  ]
}

7

Documents are Rich Data Structures

{
  first_name: ‘Paul’,
  surname: ‘Miller’,
  cell: ‘+447557505611’,
  city: ‘London’,
  location: [45.123, 47.232],
  profession: [‘banking’, ‘finance’, ‘trader’],
  cars: [
    { model: ‘Bentley’, year: 1973, value: 100000, … },
    { model: ‘Rolls Royce’, year: 1965, value: 330000, … }
  ]
}

Annotations on the document above:

• Fields

• Typed field values: String, Number, Geo-Coordinates

• Fields can contain arrays

• Fields can contain an array of sub-documents

8

Document Model Benefits

• Agility and flexibility
  – Data model supports business change
  – Rapidly iterate to meet new requirements

• Intuitive, natural data representation
  – Eliminates the ORM layer
  – Developers are more productive

• Reduces the need for joins and disk seeks
  – Simpler programming
  – Performance delivered at scale
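As a sketch of the "eliminates ORM / reduces joins" point: a plain Python dict standing in for the BSON document from the earlier slide shows how embedded sub-documents let one read return the whole aggregate (no driver or server involved; field values are from the slide):

```python
# The customer and their cars as one embedded document, rather than
# separate 'customers' and 'cars' tables joined on a foreign key.
customer = {
    "first_name": "Paul",
    "surname": "Miller",
    "city": "London",
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# A single fetch yields the entire object: no JOIN, no ORM mapping layer.
total_value = sum(car["value"] for car in customer["cars"])
print(total_value)  # 430000
```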

11

Big Data Tech Interest Comparison

j.mp/Ssvpev

12

Enterprise Adoption Comparison

bit.ly/1vAI7rF

Architecture for Availability & Scalability

14

• Replica set: two or more copies

• Availability solution
  – High availability
  – Disaster recovery
  – Maintenance

• Deployment flexibility
  – Data locality to users
  – Workload isolation: operational & analytics

• Self-healing shard

(Diagram: the application’s driver writes to the primary, which replicates to two secondaries.)

16

Global Data Distribution

(Diagram: a primary and secondaries distributed across data centers worldwide, each region receiving real-time replication.)

17

Automatic Sharding

• Sharding types
  – Range
  – Hash
  – Tag-aware

• Elastic increase or decrease in capacity

• Automatic balancing
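A toy illustration of the elastic-capacity and automatic-balancing bullets (the chunk counts and the balancing rule are simplifications; the real balancer also weighs data size and migration cost):

```python
# 12 chunks (contiguous shard-key ranges) spread across 3 shards.
chunks = list(range(12))
shards = {0: chunks[0:4], 1: chunks[4:8], 2: chunks[8:12]}

# Elastically add an empty shard; the "balancer" migrates chunks until
# no shard holds more than one chunk above the minimum.
shards[3] = []
while max(map(len, shards.values())) - min(map(len, shards.values())) > 1:
    donor = max(shards, key=lambda s: len(shards[s]))
    recipient = min(shards, key=lambda s: len(shards[s]))
    shards[recipient].append(shards[donor].pop())

print({s: len(c) for s, c in shards.items()})  # {0: 3, 1: 3, 2: 3, 3: 3}
```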

18

Query Routing

• Multiple query optimization models

• Each sharding option appropriate for different apps

Performance

20

Drag Strip: straight ahead, quarter-mile, stop

21

Road Race: stay fast, stay agile, continuous

Nürburgring, Germany

MongoDB at Scale

24

• Large data set

CarFax

25

Baseline

• Vehicle History Database

• 11 billion records (growing at 1 billion per year)

• 30-year-old VMS-based RDBMS

• Cumbersome

• Costly

MongoDB comparison (in-depth NoSQL evaluation)

• Performance: 4x faster than baseline, 10x faster than key-value

• Scale-out using inexpensive commodity servers

• Built-in redundancy

• Flexible dynamic-schema data model

• Strong consistency

• Analytics/aggregation

Initial production

• MongoDB is the primary data store

• 50 servers: 10 shards with 5-node replica sets per shard

26

• 13+ billion documents
  – 1.5 billion documents added every year

• 1 vehicle history report is > 200 documents

• 12 shards

• 9-node replica sets

• Replicas distributed across 3 data centers

CARFAX Sharding and Replication

27

CARFAX Replication

28

29

• 50M users

• 6B check-ins to date (growing 6M per day)

• 55M points of interest / venues

• 1.7M merchants using the platform for marketing

• Operations per second: 300,000

• Documents: 5.5B (~16.5B with replication)

Foursquare

30

• 11 MongoDB clusters
  – 8 are sharded

• Largest cluster is for check-ins
  – 15 shards
  – Shard key: user_id

Foursquare clusters

31

Facebook / parse.com mobile apps

• Persistent database for 270,000 mobile applications

• 200M end-user mobile devices

• 250% annual growth in client apps

• 500% growth in requests

• 1.5M collections

• Key differentiators:
  – Document data model
  – High performance & availability
  – Geospatial query and index

• Charity Majors on operations: j.mp/X3jVRC
  – “Understand your database and your data, and build for them.”

Scalability Exercises in the Cloud with Amazon Web Services

35

• 27x hs1.8xlarge instances
  – 16x vCPU
  – 24x 2TB SATA drives, RAID0
  – 8x mongod microshards

• Modified Yahoo! Cloud Serving Benchmark (YCSB)
  – Long integer IDs (> 2B)
  – Zipfian-distributed integer fields
  – Aggregation queries

• Load direct to 216 shards, 10 days, $4K

• Resulting collection stats (commas added):
      "objects" : 7,170,648,489,
      "avgObjSize" : 147,438.99952658816,
      "dataSize" : NumberLong("1,057,240,224,818,640")

Petascale Database
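The reported stats are internally consistent, which is worth checking when transcribing numbers this large (plain Python arithmetic; byte counts from the slide):

```python
objects = 7_170_648_489
avg_obj_size = 147_438.99952658816            # bytes, as reported
data_size = 1_057_240_224_818_640             # bytes

# db.stats() computes avgObjSize as dataSize / objects; the figures
# above agree to within about a byte per object.
print(data_size / objects)        # ~147,439 bytes per document
print(data_size / 10**15)         # ~1.057 petabytes
```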

CGroup Memory Segregation

# One 48 GB memory cgroup per mongod microshard. Fixes vs. the slide:
# the stray $D references become $DB, and `sudo echo 48G >` is replaced
# with `tee` (the redirect would otherwise run without root).
for DB in `seq 0 3`; do
  sudo cgcreate \
    -a mongodb:mongodb \
    -t mongodb:mongodb \
    -g memory:mongodb$DB
  echo 48G | sudo tee \
    /sys/fs/cgroup/memory/mongodb$DB/memory.limit_in_bytes
  cgexec \
    -g memory:mongodb$DB \
    numactl --interleave=all \
    mongod --config ~/mongod$DB.conf
done

37

• Ingest 250-byte stock quotes at 2M/s

• Concurrently run 5 QPS with subsecond, indexed response on timeStamp, accountId, instrumentId, systemKey

• 5x r3.4xlarge
  – 16x vCPU, 1x 320GB SSD, 122GB RAM, 16x mongod
  – 2.1M inserts/second direct to shards

• 16x c3.8xlarge
  – 32x vCPU, 2x 320GB SSD, 60GB RAM, 16x mongod, 4x mongos
  – 2.1M inserts/second via mongos

Megawrite Ingest

38

• 2 threads on c3.8xlarge

• 264-byte (bsonsize) objects, _id index only

• coll.insert(): 15,600 inserts/sec

• coll.insert(List<DBObject>), list size = 64: 118,000 inserts/sec

• Bulk ops API, size = 64: 120,000 inserts/sec

Java API comparison

BulkWriteOperation bo = null;
for (a = 0; a < this.items && stayAlive; a++) {
    if (bo == null) {
        bo = collection.initializeUnorderedBulkOperation();
    }
    fillMap(this.m);
    bo.insert(new BasicDBObject(this.m));
    // Execute every `listsize` inserts; (a + 1) avoids flushing a
    // one-document batch at a == 0, as the slide's `a % listsize` did.
    if ((a + 1) % listsize == 0) {
        bo.execute();
        bo = null;
    }
}
// Flush any remaining documents in a partial final batch.
if (bo != null) {
    bo.execute();
}

7x Load with BulkOp

How do I Pick A Shard Key?

41

Shard Key characteristics

• A good shard key has:
  – Sufficient cardinality
  – Distributed writes
  – Targeted reads ("query isolation")

• The shard key should be in every query if possible
  – Otherwise the query is scatter-gather

• Choosing a good shard key is important!
  – It affects performance and scalability
  – Changing it later is expensive

42

Hashed shard key

• Pros:
  – Evenly distributed writes

• Cons:
  – Random data (and index) updates can be I/O intensive
  – Range-based queries turn into scatter-gather

(Diagram: mongos distributing operations across Shard 1 through Shard N.)
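The scatter-gather con can be seen in a few lines; MD5 here merely stands in for MongoDB's hashed-index function (an assumption for illustration, as is the date-string key):

```python
import hashlib

SHARDS = 4

def hash_shard(key: str) -> int:
    # Stand-in for MongoDB's hashed shard key (MD5 is an assumption).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % SHARDS

days = [f"2014-{m:02d}-{d:02d}" for m in range(1, 13) for d in range(1, 29)]

# Writes spread evenly across the shards...
writes = {hash_shard(k) for k in days}

# ...but a first-quarter range query still has to visit every shard,
# because consecutive keys hash to unrelated shards.
q1 = {hash_shard(k) for k in days if k[5:7] in ("01", "02", "03")}
print(len(writes), len(q1))
```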

43

Low cardinality shard key

• Induces "jumbo chunks"

• Example: a boolean field

(Diagram: mongos with an oversized [ a, b ) chunk confined to one shard.)

44

Ascending shard key

• Monotonically increasing shard key values cause "hot spots" on inserts

• Examples: timestamps, _id

(Diagram: every insert routed to the single shard owning the [ ISODate(…), $maxKey ) chunk.)
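The hot spot is easy to see in a toy range-partitioned setup (the split points and key values are made up for illustration):

```python
# Chunk boundaries on an ascending key; the last chunk is unbounded
# above ([3000, $maxKey)), so it owns every future key.
split_points = [1000, 2000, 3000]

def owning_chunk(key: int) -> int:
    for i, split in enumerate(split_points):
        if key < split:
            return i
    return len(split_points)          # the final, unbounded chunk

# Fresh, monotonically increasing keys (e.g. timestamps, ObjectIds)...
new_keys = range(3000, 3100)
targets = {owning_chunk(k) for k in new_keys}
print(targets)  # {3}: every insert lands on one chunk, hence one shard
```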

Ensuring Success with High Scalability

46

Success Factors

• Storage: random seeks (IOPS)

• RAM: working set based on query patterns

• Query: indexing

• Delete: most expensive operation

• Real-time vs. bulk operations

• Continuity: HA, DR, backup, restore

• Agile process: iterate by powers of 4

• Sharding: shard key and strategy

• Resources: don’t go it alone!