Cignex mongodb-sharding-mongodbdays

Page 1: Cignex mongodb-sharding-mongodbdays

CIGNEX Datamatics Confidential www.cignex.com

Scaling MongoDB with Sharding – A Case Study

Presented by: Nikhil Naib

Title: Lead Consultant – Big Data

For MongoDB and CIGNEX Datamatics Use Only

Page 2: Cignex mongodb-sharding-mongodbdays

Who We Are?

• Since 2000, delivering solutions using Open Source technologies to

– Address business goals

– Increase business velocity

– Lower the cost of doing business

– Gain competitive advantage

• Dramatically reduce Total Cost of Ownership (TCO) & deployment time of IT solutions

400+ Implementations

450+ Experts

200+ Integrations

13 Books

5000+ Community Contributions

Offices: America | India | UK | Europe | Singapore | Australia

Portal Solutions

Content Solutions

Big Data Analytics Solutions

Page 3: Cignex mongodb-sharding-mongodbdays

Our Big Data Analytics Practice

Team Size: 110+ Projects: 10+

• 20+ Big Data, 100+ Analytics & DW/BI

• Partnership – MongoDB, Cloudera, IBM

• Technical expertise – MongoDB, Hadoop, Neo4j, Solr, Pentaho, Talend, Cognos, Business Objects, Tableau, Jasper Reports

• Research & Analytics division with data scientists

• Connectors/Accelerators, Frameworks

• BIGArchive – Enterprise Scale Archival

• Liferay MongoDB Store

• Drupal MongoDB Connector

Big Data Partners

Business Intelligence Expertise

Page 4: Cignex mongodb-sharding-mongodbdays

Agenda

• Use Case & Database Requirements

• Why MongoDB?

• Solution

• To Shard Or Not To Shard

• Scaling with Sharding – Sharding Basics

– Architecture and Hardware Sizing

– Sharding – Choosing the RIGHT Shard Key

– Benchmarking with Results

• Key Takeaways

Page 5: Cignex mongodb-sharding-mongodbdays

Use Case

Ability to access the digital assets of the service provider across an array of devices registered by the user, with the facility of resuming (session shifting).

• Users: 7 million users across geographies

• Devices: 8 devices per user (home/office/anywhere)

• Load Balancer: high volume of concurrent CRUD requests routed to the DB cluster

• Database: MongoDB data storage cluster enabled with sharding, automatic replication for failover, and indexes

Page 6: Cignex mongodb-sharding-mongodbdays

Database Requirements

• Agility in Development & Deployment

• High Availability

• Flexibility in Schema

• Enterprise Level Support

• High Performance

Page 7: Cignex mongodb-sharding-mongodbdays

Why MongoDB?

• Agility in Development & Deployment – Driver Support: programming language drivers, shorter dev cycle, faster deployment

• High Availability – Replication: automatic failover, redundancy, ~100% uptime

• Flexibility in Schema – Loose Schema: easy integration, ease of schema design, document-oriented storage

• Enterprise Level Support – Strong Community: global coverage, 24x7 support, ease of maintenance

• High Performance – Indexes & Sharding: concurrent CRUD, fast updates, write distribution with sharding

Page 8: Cignex mongodb-sharding-mongodbdays

Sharding – What is it?

• Distributes a single logical database across multiple mongod nodes

• Advantages:

– Raises limits of data size beyond a single node

– Increases write capacity

– Ability to support larger working sets

– Read scaling, by targeting specific shards through routed requests over distributed data. A good amount of scatter-gather requests can also be supported if used judiciously.

Page 9: Cignex mongodb-sharding-mongodbdays

Sharding – When to use?

• Storage drive (capacity): Your data set approaches or exceeds the storage capacity of a single node in your system.

• Working set (RAM): The size of your system's active working set will soon exceed the maximum amount of RAM for your system.

• Storage drive (write activity): Your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.

Page 10: Cignex mongodb-sharding-mongodbdays

Sharding – Features

• Range-based Data Partitioning

• Automatic Data volume distribution

• Transparent query routing

• Horizontal capacity

– Additional write capacity through distribution

– Right shard key allows expansion of working set
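
Range-based partitioning and transparent routing can be pictured with a toy router. This is only a sketch of the idea, not MongoDB's implementation: the `RangeRouter` class, the split points and the shard names are all hypothetical.

```python
import bisect


class RangeRouter:
    """Toy router that maps shard-key ranges (chunks) to shards."""

    def __init__(self, split_points, shard_for_chunk):
        # split_points must be sorted. Chunk i holds keys k with
        # split_points[i-1] <= k < split_points[i].
        assert len(shard_for_chunk) == len(split_points) + 1
        self.split_points = split_points
        self.shard_for_chunk = shard_for_chunk

    def route(self, key):
        """Targeted (routed) request: exactly one shard owns the key."""
        chunk = bisect.bisect_right(self.split_points, key)
        return self.shard_for_chunk[chunk]

    def all_shards(self):
        """Scatter-gather request: without a shard key, every shard is asked."""
        return sorted(set(self.shard_for_chunk))


# Hypothetical layout: three chunks split at "g" and "p".
router = RangeRouter(["g", "p"], ["shard1", "shard2", "shard3"])
print(router.route("alice"), router.route("mike"), router.route("zoe"))
print(router.all_shards())
```

A query that carries the shard key touches one shard through `route`, while a query without it has to visit everything returned by `all_shards`.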

Page 11: Cignex mongodb-sharding-mongodbdays

Solution: Approach

• Schema: schema design; collections and field definitions

• Database Size: document size; total expected data size

• Concurrent Load: frequency of CRUD operations; read/write ratio

• Availability: replication, backup and automatic failover; right replication factor (RF); read scaling for use cases with eventual consistency

• Indexing: working set; access patterns

• Sharding: horizontal scaling; read/write scaling

• Hardware Sizing: cluster sizing; RAM and disk storage

Page 12: Cignex mongodb-sharding-mongodbdays

To Shard Or Not To Shard?

• Sharding is a very powerful technique that MongoDB provides for scaling, but it should be used only after due diligence; otherwise it proves to be overkill.

• It brings a substantial amount of overhead from an infrastructure and maintenance standpoint.

• It should be used only when you have done all possible optimizations for the single node and the write capacity of that node still proves to be a bottleneck.

• In production, a minimum of 6 server instances is required for a sharded cluster with no failover capability.

• In production we cannot afford to run without redundancy/failover. Hence a minimum replication factor (RF) of 2 is required, which also brings an arbiter node into the picture.
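
The instance counts above can be checked with a little arithmetic. The slide does not break down the six instances; the sketch below assumes one plausible reading (2 shards, 3 config servers, 1 mongos), and the function name and defaults are hypothetical.

```python
def sharded_cluster_instances(shards, data_nodes_per_shard,
                              arbiters_per_shard=0,
                              config_servers=3, mongos=1):
    """Count the server processes in a sharded cluster.

    The defaults (3 config servers, 1 mongos) are assumptions made for
    this sketch; the slides only state the total of 6.
    """
    per_shard = data_nodes_per_shard + arbiters_per_shard
    return shards * per_shard + config_servers + mongos


# No failover: each shard is a single mongod.
print(sharded_cluster_instances(shards=2, data_nodes_per_shard=1))

# RF of 2 plus an arbiter per shard, as described above.
print(sharded_cluster_instances(shards=2, data_nodes_per_shard=2,
                                arbiters_per_shard=1))
```

Under these assumptions the count grows from 6 to 10 once failover is added, which illustrates the infrastructure overhead the slide warns about.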

Page 13: Cignex mongodb-sharding-mongodbdays

To Shard Or Not To Shard?

Figure: Inserts And Updates With No Sharding

Page 14: Cignex mongodb-sharding-mongodbdays

Solution: Architecture

Figure: solution architecture

• Load Balancer

• App Tier: App Server, App Server, App Server

• mongos, mongos, mongos (routed requests from mongos to shards)

• Data Tier:

– Config Servers: mongod, mongod, mongod

– Shard 1: mongod Primary, mongod Secondary, mongod Arbiter

– Shard 2: mongod Primary, mongod Secondary, mongod Arbiter

– Shard n: mongod Primary, mongod Secondary, mongod Arbiter
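
Registering each shard's replica set with the cluster is done through MongoDB's `addShard` admin command. The sketch below only builds the command documents; the replica set names and host names are hypothetical, and nothing is sent to a server.

```python
def add_shard_command(replica_set, hosts):
    """Build the document for the addShard admin command.

    The value has the form "<replicaSetName>/<host:port>,<host:port>".
    """
    return {"addShard": f"{replica_set}/{','.join(hosts)}"}


# Hypothetical topology: the data-bearing members of each shard.
shards = {
    "shard1": ["shard1-a.example.net:27018", "shard1-b.example.net:27018"],
    "shard2": ["shard2-a.example.net:27018", "shard2-b.example.net:27018"],
}

commands = [add_shard_command(name, hosts) for name, hosts in shards.items()]
for command in commands:
    print(command)
```

With PyMongo, each document could then be passed to `client.admin.command(...)` on a connection to a mongos, assuming the cluster and hosts actually exist.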

Page 15: Cignex mongodb-sharding-mongodbdays

Shard Keys

Shard keys exist in every document in a collection. MongoDB uses the shard key to distribute documents among the shards. Just like indexes, they can be either a single field or a compound key.

• The ideal shard key:

– Has high cardinality, which makes it easy for MongoDB to split the chunks

– Has higher “randomness”

– Allows targeted queries

– May need to be computed

Page 16: Cignex mongodb-sharding-mongodbdays

Choosing the Right Shard Key

Different approaches for shard keys:

• Approach 1: Random key – UserId + AssetId

• Approach 2: Coarsely ascending key + random key – YearMonth + UserId + AssetId

• Hashed shard keys (not tested/applicable here)

– New in version 2.4.

– Hashed shard keys use a hashed index of a single field as the shard key to partition data across your sharded cluster.

– The field should have good cardinality.

– Hashed keys work well with fields that increase monotonically.
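
The two tested approaches can be written out as key documents. This is a sketch under assumptions: the field names (`yearMonth`, `userId`, `assetId`), the `YYYYMM` encoding and the namespace `assets.sessions` are hypothetical, since the slides name only the key components. `shardCollection` is the MongoDB admin command for sharding a collection.

```python
from datetime import date


def approach1_key(user_id, asset_id):
    """Approach 1: random key (UserId + AssetId)."""
    return {"userId": user_id, "assetId": asset_id}


def approach2_key(day, user_id, asset_id):
    """Approach 2: coarsely ascending YearMonth prefix + random key."""
    return {"yearMonth": day.strftime("%Y%m"),
            "userId": user_id, "assetId": asset_id}


def shard_collection_command(namespace, fields):
    """Build a shardCollection command with an ascending compound key."""
    return {"shardCollection": namespace, "key": {f: 1 for f in fields}}


key = approach2_key(date(2013, 7, 15), "u42", "a7")
command = shard_collection_command("assets.sessions", list(key))
print(key)
print(command)
```

The `yearMonth` prefix changes only once a month, so documents written around the same time share a key prefix while `userId` and `assetId` still spread them out within it.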

Page 17: Cignex mongodb-sharding-mongodbdays

Benchmarking / Load Testing Approach

Automated scripts with varied load
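
The slides do not show the scripts themselves. As a rough illustration of what a varied-load script might look like, the sketch below times a write function at several batch sizes; the function name, the document shape and the in-memory list standing in for the database are all hypothetical.

```python
import time


def measure_throughput(write_batch, batch_sizes):
    """Call write_batch(docs) once per batch size; return docs/sec per size."""
    results = {}
    for size in batch_sizes:
        docs = [{"seq": i, "payload": "x" * 64} for i in range(size)]
        start = time.perf_counter()
        write_batch(docs)
        # Guard against a zero interval on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        results[size] = size / elapsed
    return results


# Stand-in sink: an in-memory list instead of a real collection.
store = []
rates = measure_throughput(store.extend, [100, 1_000, 10_000])
for size, rate in rates.items():
    print(f"batch of {size}: {rate:,.0f} docs/sec")
```

Against a real cluster, `write_batch` could be a PyMongo collection's `insert_many`, assuming a current driver version; the absolute numbers from the in-memory stand-in mean nothing.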

Page 18: Cignex mongodb-sharding-mongodbdays

Results – INSERTS

• Approach 1: Over 80 million documents inserted with a decreasing threshold over 10 million

• Approach 2: Over 225 million documents inserted at a stable rate of 6000 documents/sec

Benchmarks done on bare-metal machines: 2.2 GHz 8-core, 32 GB, 7200 RPM spinning drives with no RAID support

Page 19: Cignex mongodb-sharding-mongodbdays

Results – UPDATES

• Approach 1: Over 50 million documents updated at avg. 400 documents/sec

• Approach 2: Over 100 million documents updated at as high as 4000 documents/sec

Page 20: Cignex mongodb-sharding-mongodbdays

Results – INSERT, UPDATE (Approach 2)

• Simultaneous INSERT: >6000 documents/second, >70 million records

• Simultaneous UPDATE: >6000 documents/second, >50 million records

Page 21: Cignex mongodb-sharding-mongodbdays

Benchmarking – Sharding vs. Non-Sharding

Operation          Sharding (YearMonth + UserId)      Non-Sharding

INSERTS            ~6000 docs/sec                     ~2900 docs/sec

UPDATES            ~4000 docs/sec                     ~620 docs/sec

INSERT & UPDATES   ~6000 docs/sec & ~6100 docs/sec    ~2000 docs/sec & ~600 docs/sec
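
The gain from sharding is easier to read as a ratio. The snippet below simply divides the sharded rate by the non-sharded rate for each row of the table; the dictionary layout is just for illustration.

```python
# (sharded docs/sec, non-sharded docs/sec), taken from the table above.
benchmarks = {
    "INSERTS": (6000, 2900),
    "UPDATES": (4000, 620),
    "INSERT (simultaneous)": (6000, 2000),
    "UPDATE (simultaneous)": (6100, 600),
}


def speedup(sharded, non_sharded):
    """How many times faster the sharded setup ran."""
    return sharded / non_sharded


for operation, (sharded, non_sharded) in benchmarks.items():
    print(f"{operation}: {speedup(sharded, non_sharded):.2f}x")
```

Updates benefit far more than inserts in these measurements, both in isolation and under the simultaneous load.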

Page 22: Cignex mongodb-sharding-mongodbdays

Key Takeaways

• MongoDB scales & shines.

– Expected: 690 million CRUD operations per day.

– Achieved: 840 million CRUD operations per day.

• Plan early for sharding.

• Sharding scales INSERTS/UPDATES compared with non-sharding.

• There is no magic recipe for finding an ideal shard key.

• DO NOT go to production without benchmarking the shard key. The shard key cannot be changed for the given configuration.

• Use MMS. It's a great tool for assessing the health of the cluster and identifying bottlenecks well in advance.

• Sharding with Approach 2 (coarsely ascending key + random key) provides sustained results and better utilization of RAM (better index locality).

Disclaimer: The suitable shard key depends on your data, so while this shard key approach delivers good results for this use case, it is not a generic approach.
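
To compare the daily targets with the docs/sec figures in the benchmarks, the daily totals can be converted to an average per-second rate. This assumes the load is spread evenly over 24 hours, which the slides do not state; real traffic would have peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(ops_per_day):
    """Average operations per second, assuming a uniform load all day."""
    return ops_per_day / SECONDS_PER_DAY


expected = per_second(690_000_000)
achieved = per_second(840_000_000)
print(f"expected: {expected:,.0f} ops/sec")
print(f"achieved: {achieved:,.0f} ops/sec")
print(f"headroom: {achieved / expected - 1:.1%}")
```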

Page 23: Cignex mongodb-sharding-mongodbdays

Key Takeaways

• Routed requests are always faster than scatter/gather requests.

• Identify the consistency requirements for read queries. Where eventual consistency is acceptable, using the secondaryPreferred read preference can help you squeeze out more performance.

• Use a different set of servers for non-sharded collections.

• Define indexes carefully. A larger number of indexes substantially brings down write throughput.

• Sharded collections should have a minimal number of indexes.
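
One way to request reads from secondaries is the `readPreference` option of the MongoDB connection string. The sketch below only assembles such a URI for a set of mongos routers; the host names are hypothetical and no connection is opened.

```python
from urllib.parse import urlencode


def mongos_uri(mongos_hosts, read_preference="secondaryPreferred"):
    """Build a connection string that points at mongos routers."""
    options = urlencode({"readPreference": read_preference})
    return f"mongodb://{','.join(mongos_hosts)}/?{options}"


# Hypothetical mongos hosts.
uri = mongos_uri(["mongos1.example.net:27017", "mongos2.example.net:27017"])
print(uri)
```

Reads issued with this preference may return slightly stale data, so it suits only the queries identified as tolerating eventual consistency.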

Page 24: Cignex mongodb-sharding-mongodbdays

Our Success Stories: At a Glance

1. Big Data Analytics for Telecom: Optimum network bandwidth management & policy configuration for telecom companies

2. Social Media Research Platform for Legal Firms: Leverage social media & unstructured data analytics for collecting supporting evidence for trials

3. US-based Advanced GPS Solutions Provider: Real-time analysis of data accumulated from 200,000 GPS-based devices

4. Global Provider of Risk Management Solutions: Collection and analysis of data from external and internal applications delivered to a dashboard

5. US-based Networking Equipment Leader: Cluster configuration of high-volume video uploads including 30 million inserts/hour

6. European Chemical Giant: Patent search – 10x increase in performance and 20x reduction in TCO

7. US-based Social Security e-Benefits System: Managing billion object repository with enterprise search and retrieval

Page 25: Cignex mongodb-sharding-mongodbdays

For queries reach out to us at [email protected]

Thank You. Any Questions?

Making Open Source Work