Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics
CIGNEX Datamatics Confidential www.cignex.com
Scaling MongoDB with Sharding – A Case Study
Presented by Yash Badiani and Rahul Nair
About CIGNEX Datamatics
A subsidiary of Datamatics Global Services Limited
Introduction of Datamatics (DGSL)
• Mission – Experts in improving Enterprise productivity through Process Engineering & Information Management Solutions
• Key Highlights
– Founded in 1975
– Publicly listed in India
– Annual consolidated revenue of US$100 Million
– Fortune 500 clients
– 4,400+ employees across 22 offices in 9 countries
Strategic Alliances
What Does CIGNEX Datamatics Do?
Since 2000, making Open Source work for
the enterprise through adoption and
integration to:
• Address business goals
• Increase business velocity
• Lower the cost of doing business
• Reduce TCO
• Gain competitive advantage
Portal Solutions
Content Solutions
Big Data Solutions
400+ implementations worldwide across industries
Where We Can Help You
SOLUTIONS
Managed Cloud Services - Develop, Deploy, Manage
VAR/Annual Product Subscription - Liferay, Alfresco, Cloudera Hadoop, MongoDB
Extended Development Center – Center of Excellence
UI, Development, Integration, Customization, Migration, Testing, Training, Support (24*7)
User eXperience Platform
Portals: Liferay, Drupal, JBoss, ZK, HTML5, MuleSoft
• Intranet • Extranet • EAI • SOA • Social Collaboration • Mobile Portals
Enterprise Content Management
Content: Alfresco, Adobe CQ, Drupal, Magento, JBoss, Moodle, Ephesoft, Liferay
• WCM • DM • RM • CMS • DAM • E-Commerce • E-learning • ERP • Imaging Solutions
SERVICES
Making Data Work
Big Data: Hadoop, MongoDB, Neo4j, Flume, Hive, Solr, Pentaho, Jaspersoft
• Analytics • Mobile • Social • Web • Real-time • DW - BI • Log Processing and Analysis • Enterprise Search
About the Presenters
• Yash Badiani is the Big Data Practice Lead at CIGNEX Datamatics and focuses on Big Data technologies including MongoDB & Hadoop. He has worked extensively on large data warehousing & business intelligence projects with tools such as Business Objects, Microsoft SQL Server, MicroStrategy, and IBM Cognos.
• Gaurav Khambhala works at CIGNEX Datamatics as Technical Lead. He is the senior member of the PHP Practice at CIGNEX Datamatics and is involved in various technology initiatives such as Big Data, where he focuses on integration of PHP with NoSQL sources like MongoDB. He has wide industry experience in software development & management in Open Source technologies such as Drupal & Moodle.
Agenda
• CIGNEX Datamatics – Introduction & Offerings
• Use Case & Database Requirements
• Challenges with Traditional Databases
• Why MongoDB?
• Solution
– Approach
– Architecture and Hardware Sizing
• Scaling with Sharding
– Sharding Basics
– Sharding – Choosing the RIGHT Shard Key
– Benchmarking with Results
• Key Takeaways
Big Data Practice At CIGNEX Datamatics
Brief Snapshot
• ~40-employee Big Data Practice focused on Hadoop, MongoDB, Neo4j, Solr
• Professionals formally trained / certified by Cloudera and 10gen
• Expertise in the Hadoop ecosystem (HBase, Pig, Hive, Flume, Sqoop, Oozie, ZooKeeper)
Technology Partnership
• Strong partnerships:
– System Integration partner with Cloudera for CDH
– Global partner with 10gen for MongoDB – multiple webinars on different solutions
Our Offerings – Big Data
Consulting
• Business Analysis • Technology Evaluation • Architecture • Design Framework • Cluster sizing • Deployment planning • Proof-of-Concept • Health Check • Performance Benchmarking
Implementation
• UI Development • Application Integration • Customization • Migration • Testing • Performance Tuning
Support & Training
• DBA Support • Application Support • Enhancements • 24*7 Production Support (Tier 1/2/3) • Trainings
Use Case
Flow: Users, Devices, Load Balancer, Database
End Users
• 7 million users
• Spread across geographies
Devices
• 8 devices / user
• Home/Office/Anywhere
App. Layer: Load Balancer
• Receives a high volume of concurrent CRUD requests
• Routes request traffic to the DB cluster
Data Storage: MongoDB cluster
• Sharding
• Replication with Automatic Failover
• Indexes
Database Requirements
Flexibility in Schema
High Performance
Availability
Agility in Development & Deployment
Enterprise Level Support
Limitations of RDBMS
An RDBMS can't manage all dimensions of data with speed and at lower cost.
• Support limited to terabytes
• Manages only structured data
• RDBMS doesn't scale inherently
• Feature rich but slow performance
• Complex to shard/partition due to maintenance of schema
• Limitations in scaling a high volume of concurrent CRUD
• Specialized hardware - expensive
• Vertical scaling is expensive and difficult
Why MongoDB?
Flexibility in Schema – Schema free
• Easy integration • Ease of schema design • Document oriented storage
High Performance – Indexes & Sharding
• Concurrent CRUD • Fast Updates • Write distribution with Sharding
Availability – Replication
• Automatic failover • Redundancy • 100% uptime
Agility in Development & Deployment – Driver Support
• Programming Language drivers • Shorter Dev cycle • Faster deployment
Enterprise Level Support – Strong Community
• Global Coverage • 24x7 Support • Ease of maintenance
Solution: Approach
Schema: • Schema Design • Collections and Field Definitions
Database Size: • Document Size • Total expected data size
Concurrent Load: • Frequency of CRUD operations • Read/Write ratio
Availability: • Automatic Failover • Replication and Backup
Indexing: • Working Set • Access Patterns
Sharding: • Horizontal Scaling • Query Performance
Hardware Sizing: • Cluster sizing • RAM and Disk storage
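The sizing steps in this approach (document size, total data size, working set, RAM and disk per node) can be sketched as back-of-the-envelope arithmetic. The user count, devices per user, and shard count come from the slides; the average document size, documents per device, index overhead, and hot-data fraction are hypothetical placeholders, not figures from the case study.

```python
# Back-of-the-envelope cluster sizing. Only USERS, DEVICES_PER_USER and SHARDS
# come from the slides; every other number is a hypothetical placeholder.

USERS = 7_000_000            # from the use case slide
DEVICES_PER_USER = 8         # from the use case slide
SHARDS = 4                   # from the architecture slide

AVG_DOC_BYTES = 2_048        # assumption: average document size
DOCS_PER_DEVICE = 1          # assumption: one document per user/device pair
INDEX_OVERHEAD = 0.20        # assumption: indexes add 20% on top of the data
WORKING_SET_FRACTION = 0.10  # assumption: 10% of data + indexes is "hot"


def size_cluster():
    """Estimate document count, data size, and per-shard disk and RAM needs."""
    docs = USERS * DEVICES_PER_USER * DOCS_PER_DEVICE
    data_bytes = docs * AVG_DOC_BYTES
    total_bytes = data_bytes * (1 + INDEX_OVERHEAD)
    working_set_bytes = total_bytes * WORKING_SET_FRACTION
    gib = 2 ** 30
    return {
        "documents": docs,
        "data_gib": data_bytes / gib,
        "total_gib_per_shard": total_bytes / SHARDS / gib,
        "working_set_gib_per_shard": working_set_bytes / SHARDS / gib,
    }


if __name__ == "__main__":
    for name, value in size_cluster().items():
        print(f"{name}: {value:,.1f}")
```

Changing the placeholder constants shows how sensitive the per-shard RAM requirement is to document size and to the share of data that is actively accessed.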
Solution: Architecture
[Architecture diagram]
App Tier: Load Balancer, six App Servers, six mongos routers
Data Tier: five Replica Sets (each a mongod Primary, mongod Secondary, and mongod Arbiter), four of them labelled Shard 1 to Shard 4; three mongod Config Servers
Legend: routed requests from mongos to shards; requests routed for non-sharded collections
Sharding – What is it?
• Distributes a single logical database across a cluster of machines
• Partitions a collection across a number of mongod instances (shards)
• Advantages:
– Increases write capacity
– Supports larger working sets
– Raises the data size limit beyond a single node
Sharding - Features
• Range-based Data Partitioning
• Automatic Data volume distribution
• Transparent query routing
• Horizontal capacity
– Additional write capacity through distribution
– Right shard key allows expansion of working set
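Range-based partitioning and transparent query routing can be illustrated with a toy model: the shard key space is cut into chunks at split points, each chunk lives on one shard, and a router looks up which chunk a key falls into. This is a simplified sketch with made-up split points and shard names, not MongoDB's internal implementation.

```python
import bisect

# Toy model of range-based partitioning. Chunk i covers the keys between
# split point i-1 (inclusive) and split point i (exclusive). All names and
# values are illustrative, not MongoDB internals.

SPLIT_POINTS = [2500, 5000, 7500]                          # chunk boundaries
CHUNK_TO_SHARD = ["shard1", "shard2", "shard3", "shard4"]  # one chunk per shard


def route(key):
    """Return the shard owning the chunk whose range contains `key`."""
    chunk_index = bisect.bisect_right(SPLIT_POINTS, key)
    return CHUNK_TO_SHARD[chunk_index]


def route_range(low, high):
    """Return the shards a range query low <= key < high has to touch."""
    first = bisect.bisect_right(SPLIT_POINTS, low)
    last = bisect.bisect_left(SPLIT_POINTS, high)
    return CHUNK_TO_SHARD[first:last + 1]


if __name__ == "__main__":
    print(route(1200))              # a single-key lookup targets one shard
    print(route_range(2400, 5000))  # a range query touches only the needed shards
```

A query that includes the shard key can be sent to just the shards holding the matching chunks; a query without it would have to be broadcast to every shard.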
Sharding – When to use?
• Storage Drive: Your data set approaches or exceeds the storage capacity of a single node in your system.
• Working Set / RAM: The size of your system's active working set will soon exceed the maximum amount of RAM available to your system.
• Storage Drive (writes): Your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.
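The three criteria can be written down as a small checklist function. This is only a sketch: the `NodeStats` fields and the `headroom` threshold (how close to capacity counts as "approaching") are hypothetical choices, not values from the presentation.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    data_gb: float               # current data set size
    disk_gb: float               # storage capacity of a single node
    working_set_gb: float        # active working set
    ram_gb: float                # maximum RAM for a single node
    write_demand_per_s: float    # writes the application needs
    write_capacity_per_s: float  # writes one instance sustains after tuning


def reasons_to_shard(stats, headroom=0.8):
    """List which of the three sharding criteria are met.

    `headroom` (hypothetical default 0.8) encodes "approaches": a resource
    counts as nearly exhausted once usage passes 80% of capacity.
    """
    reasons = []
    if stats.data_gb >= headroom * stats.disk_gb:
        reasons.append("storage")
    if stats.working_set_gb >= headroom * stats.ram_gb:
        reasons.append("working set")
    if stats.write_demand_per_s > stats.write_capacity_per_s:
        reasons.append("write throughput")
    return reasons


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    node = NodeStats(100, 500, 7, 8, 5000, 3000)
    print(reasons_to_shard(node))
```

An empty result suggests that a single replica set is still adequate and that tuning or indexing should be tried first.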
Shard Keys
• The ideal shard key:
– Is easily divisible, which makes it easy for MongoDB to distribute content among the shards
– Has high "randomness"
– Supports targeted queries
– May need to be computed
Shard key: a field that exists in every document in a collection and that MongoDB uses to distribute documents among the shards. Like indexes, shard keys can be either a single field or a compound key.
Choosing Right Shard Key
Different approaches for shard keys
• Approach 1: Random Key – UserId
• Approach 2: Coarsely ascending key + Random Key – YearMonth + UserId
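To make the two approaches concrete, the sketch below builds the `enableSharding` and `shardCollection` admin command documents for each key, plus a helper that extracts a document's shard key value. Nothing here contacts a server. The namespace `appdb.user_devices` and the field names `userId` and `yearMonth` are hypothetical stand-ins, since the slides do not give the real ones.

```python
# Builds the admin command documents that would shard a collection under each
# approach. The database, collection and field names are hypothetical.

NAMESPACE = "appdb.user_devices"   # hypothetical "<database>.<collection>"


def shard_commands(key_fields):
    """Return the enableSharding / shardCollection command documents."""
    database = NAMESPACE.split(".", 1)[0]
    key = {field: 1 for field in key_fields}   # 1 = ascending; order matters
    return [
        {"enableSharding": database},
        {"shardCollection": NAMESPACE, "key": key},
    ]


APPROACH_1 = shard_commands(["userId"])               # random key
APPROACH_2 = shard_commands(["yearMonth", "userId"])  # coarse ascending + random


def shard_key_value(doc, key_fields):
    """Extract a document's shard key value; every document needs one."""
    missing = [f for f in key_fields if f not in doc]
    if missing:
        raise ValueError(f"document lacks shard key field(s): {missing}")
    return tuple(doc[f] for f in key_fields)


if __name__ == "__main__":
    print(APPROACH_1[1])
    print(APPROACH_2[1])
    print(shard_key_value({"userId": 42, "yearMonth": 201305},
                          ["yearMonth", "userId"]))
```

With a driver such as PyMongo connected to a mongos, each command document could presumably be passed to `client.admin.command(...)`; that step is left out here so the sketch runs without a cluster.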
Benchmarking / Load Testing Approach
Automated scripts with varied load
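The slides do not show the load scripts themselves. As a rough idea of what an automated script with varied load might look like, here is a minimal harness: `write` is any callable that stores one document, and the default sink is an in-memory list so the sketch runs without a database. The document fields and sizes are hypothetical.

```python
import random
import time

# Minimal load-generation harness. Swap `write` for a real insert call to
# measure a database; field names and payload size are hypothetical.


def make_doc(rng, users=1000):
    """Generate one synthetic document."""
    return {
        "userId": rng.randrange(users),
        "yearMonth": 201301 + rng.randrange(12),
        "payload": "x" * 100,
    }


def run_load(total_docs, batch_size, write, seed=0):
    """Write `total_docs` documents and record throughput for each batch."""
    rng = random.Random(seed)
    rates = []
    written = 0
    while written < total_docs:
        n = min(batch_size, total_docs - written)
        start = time.perf_counter()
        for _ in range(n):
            write(make_doc(rng))
        elapsed = time.perf_counter() - start
        rates.append(n / elapsed if elapsed > 0 else float("inf"))
        written += n
    return written, rates


if __name__ == "__main__":
    sink = []
    written, rates = run_load(10_000, 2_500, sink.append)
    print(written, [round(r) for r in rates])
```

Recording the rate per batch rather than one overall average makes it possible to see whether throughput stays stable or degrades as the collection grows.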
Results - INSERTS
Approach 1: Over 80 million documents inserted, with the insert rate declining beyond 10 million documents
Approach 2: Over 225 million documents inserted at a stable rate of 6000 documents/sec
Benchmarks done on 8GB Test H/W Machines
Results - UPDATES
Approach 1: Over 50 million documents updated at an average of 400 documents/sec
Approach 2: Over 100 million documents updated at rates as high as 4000 documents/sec
Benchmarks done on 8GB Test H/W Machines
Results – INSERT, UPDATE
Approach 2
Simultaneous INSERT: >6000 documents/second, >70 million records
Simultaneous UPDATE: >6000 documents/second, >50 million records
Benchmarks done on 8GB Test H/W Machines
Benchmarking – Sharding Vs Non Sharding
Operation          Sharding (YearMonth + UserId)      Non-Sharding
INSERTS            ~6000 docs/sec                     ~2900 docs/sec
UPDATES            ~4000 docs/sec                     ~620 updates/sec
INSERT & UPDATES   ~6000 docs/sec & ~6100 docs/sec    ~2000 docs/sec & ~600 docs/sec
Benchmarks done on 8GB Test H/W Machines
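The comparison figures above can be turned into speedup factors with a few lines of arithmetic. The throughput numbers are copied from the slide. Reading the combined row as "INSERT rate & UPDATE rate", in that order, is an assumption.

```python
# Throughput figures (docs/sec) copied from the sharding vs non-sharding
# comparison: (sharded, non-sharded). The split of the combined row into
# insert and update rates assumes the values are listed in that order.
RESULTS = {
    "INSERTS": (6000, 2900),
    "UPDATES": (4000, 620),
    "INSERT (simultaneous)": (6000, 2000),
    "UPDATE (simultaneous)": (6100, 600),
}


def speedups(results):
    """Sharded throughput divided by non-sharded throughput, per operation."""
    return {op: sharded / plain for op, (sharded, plain) in results.items()}


if __name__ == "__main__":
    for op, factor in speedups(RESULTS).items():
        print(f"{op}: {factor:.1f}x")
```

On these figures sharding roughly doubles insert throughput and raises update throughput about sixfold, with the largest gain (about tenfold) for updates running alongside inserts.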
Key Takeaways
• Take a comprehensive approach to performance tuning
• Plan early for performance
• MongoDB scales & shines
• Sharding scales INSERTS/UPDATES compared with non-sharding
• Sharding with Approach 2 (coarsely ascending key + random key) provides sustained results & better utilization of RAM
• Use a different set of servers for non-sharded collections
• Define indexes carefully
• Keep the number of indexes on sharded collections minimal
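One plausible reading of the RAM takeaway is that a coarsely ascending prefix keeps recent writes in a small, contiguous part of the shard key index, whereas a purely random key spreads them over the whole index. The toy simulation below tests that reading by counting how many simulated index pages the most recent month's documents touch under each key. It is an illustration under made-up sizes and field names, not a reproduction of the benchmark.

```python
import random

MONTHS = [201301 + m for m in range(6)]   # six consecutive YearMonth values
DOCS_PER_MONTH = 2_000
USERS = 10_000
PAGE_SIZE = 100                           # keys per simulated index page


def approach_1(doc):
    """Random key: UserId only."""
    return (doc["userId"],)


def approach_2(doc):
    """Coarsely ascending key + random key: YearMonth + UserId."""
    return (doc["yearMonth"], doc["userId"])


def hot_page_fraction(key_of, seed=0):
    """Fraction of index pages touched by the latest month's documents."""
    rng = random.Random(seed)
    docs = [
        {"yearMonth": ym, "userId": rng.randrange(USERS), "seq": i}
        for ym in MONTHS
        for i in range(DOCS_PER_MONTH)
    ]
    # Sorted key order stands in for the shard key index.
    ordered = sorted(docs, key=lambda d: (key_of(d), d["yearMonth"], d["seq"]))
    hot_pages = {
        pos // PAGE_SIZE
        for pos, d in enumerate(ordered)
        if d["yearMonth"] == MONTHS[-1]
    }
    total_pages = -(-len(ordered) // PAGE_SIZE)   # ceiling division
    return len(hot_pages) / total_pages


if __name__ == "__main__":
    print(f"Approach 1: {hot_page_fraction(approach_1):.2f} of pages hot")
    print(f"Approach 2: {hot_page_fraction(approach_2):.2f} of pages hot")
```

In this model the random key makes nearly every page hot, while the compound key confines the current month's writes to about one sixth of the pages, and the UserId suffix still spreads those writes within the month's range.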
For queries reach out to us at [email protected]
Thank You. Any Questions?
Making Open Source Work