Webinar: Operational Best Practices

Senior Solutions Architect, 10gen

Asya Kamsky

#MongoDB


Best Practices == More Value

How to get more sleep while your MongoDB cluster hums along

The Agenda

•  Roles and responsibilities

•  Schema design and application performance

•  Hardware

•  Replication

•  Sharding

•  Monitoring


Roles and Responsibilities

•  Responsibilities:
   –  Application data needs
   –  Schema design
   –  Read and write patterns
   –  Indexing strategy
   –  Hardware: RAM, CPU, disk...
   –  Network, firewalls, security
   –  Backups
   –  Maintenance
   –  Upgrades
   –  Monitoring

•  Roles:
   –  Application Developer
   –  Data Architect
   –  DBA
   –  System Admin
   –  Network Admin


Schema Design and Application Performance

In MongoDB, correct schema design is essential for optimal application performance.

DATA != SCHEMA

Schema and Performance


Indexing is essential

•  Multiple types of indexes supported



Understanding actual performance

•  Monitoring
•  Measuring
•  Benchmarking
•  Optimizing

•  Logs
•  Query plan
•  Application
•  Ad-hoc testing

Hardware

•  Memory

•  Storage

•  CPU - speed

•  CPU - number of cores

Impact on performance in that order!


Replica Sets

[Figure: Replica Sets and Application. The client application's driver sends writes and reads to the primary; the two secondaries replicate from it.]

Replica set failover sequence (figures):

•  Replica Set – HA: Node 3 is primary; Nodes 1 and 2 are secondaries that replicate from it, and all members exchange heartbeats.
•  Replica Set – Failure: Node 3 fails; Nodes 1 and 2 detect the loss through missed heartbeats and hold a primary election.
•  Replica Set – Failover: Node 2 is elected primary, and Node 1 replicates from it.
•  Replica Set – Recovery: Node 3 comes back in a recovering state and catches up by replicating from Node 2.
•  Replica Set – Reestablished: Node 3 is a secondary again, so the set is back to one primary and two secondaries.
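In the failure and failover steps, Nodes 1 and 2 can elect a new primary because together they still hold a majority of the set's votes. The minimal Python sketch below shows only that arithmetic; the function names are hypothetical helpers (not part of any MongoDB driver), and it ignores priorities, arbiters and the other details the server weighs during an election.

```python
def majority(voting_members: int) -> int:
    """Votes a candidate needs: a strict majority of all voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable_voters: int) -> bool:
    """True if the voting members that can still reach each other form a majority."""
    return reachable_voters >= majority(voting_members)


def failures_tolerated(voting_members: int) -> int:
    """How many voting members can fail while a primary can still be elected."""
    return voting_members - majority(voting_members)


# Three-node set from the figures: Node 3 fails, Nodes 1 and 2 remain.
print(can_elect_primary(3, 2))  # True
print(can_elect_primary(3, 1))  # False: a lone survivor stays secondary

for n in (3, 4, 5):
    print(n, "voting members tolerate", failures_tolerated(n), "failure(s)")
```

Running the loop shows that a fourth voter raises the required majority to three without tolerating any extra failure, which is the reasoning behind keeping an odd number of voting members.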

Replica Sets

•  Primary purposes:
   –  High availability with automatic failover
   –  Disaster recovery
   –  No-downtime maintenance
   –  No application changes on reconfiguration
   –  Extra copies of data for "special" read workloads

•  Full benefit achieved with advance planning:
   –  Determine your SLA/HA requirements
   –  Determine your DR requirements
   –  Understand the impact of node, network, and data center failure
   –  Understand all available replica set features: priority scores, hidden, delayed, tags
   –  Monitor and proactively remedy potential problems
   –  Practice recovery from disastrous failure
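The priority, hidden and delayed options interact, so it can help to check a planned configuration before applying it. The sketch below assumes the configuration is available as a plain Python dict shaped like the replica set configuration document (what `rs.conf()` returns in the mongo shell). It encodes the documented rules that hidden and delayed members must have priority 0, plus the recommendation that delayed members also be hidden. The host names, tags and one-hour delay are made up for illustration.

```python
from typing import Any, Dict, List


def member_problems(member: Dict[str, Any]) -> List[str]:
    """Check one member document for invalid priority/hidden/delay combinations."""
    host = member["host"]
    priority = member.get("priority", 1)
    hidden = member.get("hidden", False)
    # "secondaryDelaySecs" in MongoDB 5.0+; older releases call it "slaveDelay".
    delay = member.get("secondaryDelaySecs", member.get("slaveDelay", 0))
    problems = []
    if hidden and priority != 0:
        problems.append(f"{host}: hidden members must have priority 0")
    if delay > 0 and priority != 0:
        problems.append(f"{host}: delayed members must have priority 0")
    if delay > 0 and not hidden:
        problems.append(f"{host}: delayed members should also be hidden")
    return problems


def config_problems(config: Dict[str, Any]) -> List[str]:
    """Check every member, plus the odd-number-of-voters recommendation."""
    problems = []
    for member in config["members"]:
        problems.extend(member_problems(member))
    voters = sum(1 for m in config["members"] if m.get("votes", 1) > 0)
    if voters % 2 == 0:
        problems.append(f"{voters} voting members: prefer an odd number")
    return problems


# Hypothetical hosts in two data centers (tags), plus a non-voting hidden
# member delayed by one hour as "insurance" against operator error.
example = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.net:27017", "priority": 2, "tags": {"dc": "east"}},
        {"_id": 1, "host": "db2.example.net:27017", "tags": {"dc": "east"}},
        {"_id": 2, "host": "db3.example.net:27017", "tags": {"dc": "west"}},
        {"_id": 3, "host": "db4.example.net:27017", "priority": 0, "hidden": True,
         "secondaryDelaySecs": 3600, "votes": 0},
    ],
}
print(config_problems(example))  # []
```

The higher priority on db1 simply expresses a preference for which member becomes primary when it is healthy. The checks are a pre-flight aid; the server still validates the configuration itself.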

Replica Sets

•  Best practices for configuration:
   –  Odd number of voting replica set members
   –  Size the oplog appropriately for high-volume loads
   –  Use multiple data centers/availability zones
   –  Use DNS names for node configuration
   –  Add a hidden, delayed-replication member as "insurance"
   –  All replica set nodes should have the same capacity

•  Operation:
   –  Upgrade secondaries first (primary last)
   –  Maintenance on secondaries first (primary last)
   –  Use the rs.stepDown() command to hand the primary role to a secondary before working on the primary
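Oplog sizing and maintenance on secondaries are linked: a secondary taken offline can catch up from the oplog only if every operation it missed is still there when it returns; otherwise it needs a full resync. A back-of-the-envelope Python sketch, assuming you have measured your peak oplog growth rate in GB per hour yourself (`rs.printReplicationInfo()` in the mongo shell reports the configured oplog size and the time span it currently covers). The safety factor of 2 and the example numbers are illustrative choices.

```python
def oplog_window_hours(oplog_size_gb: float, peak_oplog_gb_per_hour: float) -> float:
    """Hours of operations the fixed-size oplog can hold at the given write rate."""
    if peak_oplog_gb_per_hour <= 0:
        raise ValueError("oplog write rate must be positive")
    return oplog_size_gb / peak_oplog_gb_per_hour


def maintenance_fits(oplog_size_gb: float, peak_oplog_gb_per_hour: float,
                     downtime_hours: float, safety_factor: float = 2.0) -> bool:
    """True if a secondary offline for downtime_hours can still catch up
    from the oplog, with a safety margin on top of the planned downtime."""
    window = oplog_window_hours(oplog_size_gb, peak_oplog_gb_per_hour)
    return window >= downtime_hours * safety_factor


# Illustrative numbers: a 10 GB oplog filling at 0.5 GB/hour at peak load.
print(oplog_window_hours(10, 0.5))    # 20.0
print(maintenance_fits(10, 0.5, 4))   # True: 8 hours needed with the margin
print(maintenance_fits(10, 0.5, 12))  # False: 24 hours needed
```

Using the peak rate rather than the average matters for high-volume loads: the window shrinks exactly when write traffic is heaviest.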


Sharded Clusters

[Figure: Sharded cluster architecture. Each app server talks to its own mongos router; the mongos routers direct operations to three shards and read cluster metadata from the config servers.]

Sharding

•  Keys to successful sharding:
   –  Pick a good shard key
   –  Make config servers resilient
   –  Shard before you "have to"

Sharded Clusters

•  A good shard key is essential to achieving scaling. A good shard key:
   –  Distributes your writes across all shards
   –  Allows the majority of reads to be "targeted" (not scatter-gather)
   –  Exists in every document
   –  Has sufficiently high cardinality
   –  Allows you to take advantage of advanced features such as tag-aware balancing

•  Config servers:
   –  All three must be available to automatically balance data
   –  All three must be "in sync"; if one becomes unavailable, the others go read-only (no chunk splits or migrations)
   –  At least one must be available to avoid disaster: without the information inside the config servers, it is not possible to determine which shards contain which ranges of data!
   –  Balancing must be stopped during backup
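The first property of a good shard key, distributing writes across all shards, is easy to get wrong with a monotonically increasing key such as a timestamp. The toy Python simulation below is not MongoDB's balancer: it assumes three shards that each own a contiguous key range. It uses the first 8 bytes of MD5 purely as a stand-in hash; MongoDB offers hashed shard keys, but this sketch makes no claim to reproduce its hash function.

```python
import bisect
import hashlib
from collections import Counter
from typing import Callable, Iterable, List


def shard_for(key: int, split_points: List[int]) -> int:
    """Range partitioning: shard i owns keys from split_points[i-1] up to split_points[i]."""
    return bisect.bisect_right(split_points, key)


def hashed(key: int) -> int:
    """Stand-in hash: first 8 bytes of MD5 read as an unsigned 64-bit integer."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def distribution(keys: Iterable[int], split_points: List[int],
                 key_fn: Callable[[int], int] = lambda k: k) -> Counter:
    """Count how many inserts land on each shard."""
    return Counter(shard_for(key_fn(k), split_points) for k in keys)


# Existing data spans keys 0..999, split evenly across three shards.
range_splits = [333, 666]
# New inserts keyed by a monotonically increasing value (e.g. a timestamp).
new_keys = range(1000, 2000)
print(distribution(new_keys, range_splits))  # Counter({2: 1000}): one hot shard

# The same keys hashed first, with the 64-bit hash space split into thirds.
hash_splits = [2**64 // 3, 2 * 2**64 // 3]
print(distribution(new_keys, hash_splits, hashed))  # roughly a third per shard
```

The same range model also shows why cardinality matters: chunk boundaries can only fall between distinct key values, so a key with, say, two distinct values can never be spread over more than two chunks.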

Sharded Clusters

•  Shard before you "have to":
   –  Balancing data is an intensive process
   –  If the existing cluster is near full capacity, balancing may impact the response time of the application
   –  Planning to shard well in advance gives more time:
      •  to provision new hardware
      •  to select a good shard key
      •  to understand advanced sharding features (tagging)

Sharded Clusters

•  Other best practices:
   –  Three config servers
   –  Each shard is a replica set
   –  Test what you run: use the same topology in QA as in production
   –  Monitor:
      •  RAM
      •  disk I/O
      •  total storage
      •  MongoDB throughput


Monitoring

•  Multiple CLI and internal status commands:
   –  mongostat; mongotop; db.serverStatus()

•  MMS

•  Plug-ins for munin, Nagios, cacti, etc.

•  Integration via SNMP to other tools


MongoDB Monitoring Service (MMS)

Free, cloud-based service for monitoring and alerts

•  Charts, custom dashboards and automated alerting

•  Tracks 100+ metrics: performance, resource utilization, availability and response times

•  10,000+ users

A Picture Speaks a Thousand Words


Symptoms

•  High user CPU

•  Similar query pattern


Diagnostics - iostat

Device:  rrqm/s  wrqm/s  r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await     svctm    %util
sdp      0.00    0.00    0.50  0.00  27.86   0.00    56.00     149.58    20320.00  2010.00  100.00
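Read column by column, this row says device sdp is 100% utilized, with about 150 requests queued and an average wait (await, in milliseconds) of over 20 seconds per request, while completing only 0.5 reads per second. The small Python sketch below turns one extended `iostat -x` header and device row into named values and flags saturation. The thresholds are arbitrary illustrative choices, not recommendations. Column sets differ between sysstat versions, so the parser takes names from the header rather than fixed positions, but the checks assume the `await`, `avgqu-sz` and `%util` columns shown here.

```python
from typing import Dict, List, Tuple

HEADER = ("Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s "
          "avgrq-sz avgqu-sz await svctm %util")
ROW = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00"


def parse_iostat(header: str, row: str) -> Tuple[str, Dict[str, float]]:
    """Pair each column name with its value for one device row."""
    names = header.split()[1:]  # drop the "Device:" label
    fields = row.split()
    device, values = fields[0], [float(v) for v in fields[1:]]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return device, dict(zip(names, values))


def disk_warnings(stats: Dict[str, float], max_util: float = 90.0,
                  max_await_ms: float = 50.0, max_queue: float = 10.0) -> List[str]:
    """Flag signs of a saturated device (illustrative thresholds)."""
    warnings = []
    if stats["%util"] >= max_util:
        warnings.append(f"%util {stats['%util']:.0f}: device busy all the time")
    if stats["await"] > max_await_ms:
        warnings.append(f"await {stats['await']:.0f} ms per I/O request")
    if stats["avgqu-sz"] > max_queue:
        warnings.append(f"avgqu-sz {stats['avgqu-sz']:.1f} requests queued")
    return warnings


device, stats = parse_iostat(HEADER, ROW)
for warning in disk_warnings(stats):
    print(device, warning)
```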


Monitoring

•  mongostat

•  mongotop


Monitoring Best Practices

•  Monitor logs:
   –  Alert, escalate
   –  Correlate

•  Disk:
   –  Monitor

•  Instrument/monitor the app (including logs!)

•  Know your application and its (write) characteristics



•  Performance test/analyze system behavior

•  Load test before deployment

•  Selectively use database profiling during testing

•  Alert on abnormal states

•  High CPU is often a sign of a poorly indexed query
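High CPU from a poorly indexed query usually shows up as a collection scan, or as many documents examined per document returned. The Python sketch below inspects explain output as a dict, assuming the 3.0+ format produced by `db.collection.find(...).explain("executionStats")` in the mongo shell for an unsharded collection. The 2.x servers current at the time of this talk reported `cursor: "BasicCursor"` and `nscanned` instead. The ratio threshold of 10 and the two sample documents are hand-written for illustration, not captured server output.

```python
from typing import Any, Dict, Iterator, List


def plan_stages(stage: Dict[str, Any]) -> Iterator[str]:
    """Yield every stage name in a winningPlan tree."""
    yield stage["stage"]
    if "inputStage" in stage:
        yield from plan_stages(stage["inputStage"])
    for child in stage.get("inputStages", []):
        yield from plan_stages(child)


def index_problems(explain: Dict[str, Any], max_docs_per_result: float = 10.0) -> List[str]:
    """Report a collection scan and/or a high examined-to-returned ratio."""
    winning = explain["queryPlanner"]["winningPlan"]
    # Newer servers using the slot-based engine may nest the tree under "queryPlan".
    stages = set(plan_stages(winning.get("queryPlan", winning)))
    stats = explain["executionStats"]
    ratio = stats["totalDocsExamined"] / max(stats["nReturned"], 1)
    problems = []
    if "COLLSCAN" in stages:
        problems.append("COLLSCAN: no index used")
    if ratio > max_docs_per_result:
        problems.append(f"{ratio:.0f} documents examined per document returned")
    return problems


# Hand-written, trimmed examples (not captured server output).
unindexed = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}},
    "executionStats": {"nReturned": 3, "totalKeysExamined": 0, "totalDocsExamined": 100000},
}
indexed = {
    "queryPlanner": {"winningPlan": {"stage": "FETCH", "inputStage": {"stage": "IXSCAN"}}},
    "executionStats": {"nReturned": 3, "totalKeysExamined": 3, "totalDocsExamined": 3},
}
print(index_problems(unindexed))
print(index_problems(indexed))
```

Running this during load tests, alongside selective use of the profiler, surfaces the queries most likely to be burning CPU before they reach production.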


Best Practices Summary

Best Practices

•  Pre-deployment:
   –  Learn
   –  Plan
   –  Prototype/Benchmark
   –  Execute

•  During deployment:
   –  Monitor
   –  Continue planning
   –  Evolve


System provisioning

•  Capacity

•  Performance

•  Scale

•  Configuration


Logs

•  Review

•  Alert

•  Rotate and collect (per cluster)


Query/Index Analysis

•  Database Profiler

•  Run explain periodically (sampled)

•  Instrument code, generate metrics

•  Look for similar patterns to find the root cause
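One way to look for similar patterns is to group queries that differ only in their literal values, so that a thousand slightly different lookups show up as one frequent "shape". The Python sketch below assumes you already have the filter documents as dicts, for example the `filter` of `find` commands recorded by the database profiler in `system.profile`, or filters logged by your own instrumentation. The normalization rules (values become `"?"`, array elements collapse, field order is ignored) are choices made for this sketch, not a MongoDB feature.

```python
import json
from collections import Counter
from typing import Any, Iterable, List, Tuple


def query_shape(value: Any) -> Any:
    """Replace literal values with "?" but keep field names and operators."""
    if isinstance(value, dict):
        return {key: query_shape(v) for key, v in value.items()}
    if isinstance(value, list):
        # Collapse repeated element shapes so {"$in": [1, 2, 3]} matches {"$in": [7]}.
        unique = {json.dumps(query_shape(v), sort_keys=True) for v in value}
        return [json.loads(s) for s in sorted(unique)]
    return "?"


def top_shapes(filters: Iterable[dict], n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent query shapes, as canonical JSON strings with counts."""
    counts = Counter(json.dumps(query_shape(f), sort_keys=True) for f in filters)
    return counts.most_common(n)


# Hypothetical filters collected from the profiler or application logs.
filters = [
    {"user_id": 17, "status": "open"},
    {"status": "closed", "user_id": 42},
    {"sku": {"$in": [1, 2, 3]}},
    {"user_id": 5, "status": "open"},
    {"sku": {"$in": [9]}},
]
for shape, count in top_shapes(filters):
    print(count, shape)
```

The most frequent shape is usually the first candidate to check with explain and, if needed, a new index.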


Hardware Configuration

•  Pay attention to disk configurations

•  Load testing will find some misconfigurations

•  MongoDB depends heavily on the OS


Plan/Test Rollouts

•  Rolling upgrade for Replica Set

•  Build indexes on secondaries first

•  Use name services (DNS) and redirection
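Rolling upgrades and index builds on a replica set follow the same order: each secondary in turn, then step down the primary, then the former primary. The Python sketch below only generates that checklist. It assumes member states are given as a host-to-state mapping (the `stateStr` values reported by `rs.status()`), ignores arbiters and other states, and uses hypothetical host names. It does not connect to or change any server.

```python
from typing import Dict, List


def rolling_plan(states: Dict[str, str], action: str) -> List[str]:
    """Order one action (an upgrade, an index build) across a replica set:
    secondaries one at a time, then step down the primary, then the old primary."""
    primaries = [h for h, s in states.items() if s == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one PRIMARY")
    primary = primaries[0]
    secondaries = sorted(h for h, s in states.items() if s == "SECONDARY")
    steps = [f"{action} on {h}; wait until it is back as SECONDARY and caught up"
             for h in secondaries]
    steps.append(f"rs.stepDown() on {primary}; wait for a new PRIMARY")
    steps.append(f"{action} on {primary}")
    return steps


# States as they might be reported by rs.status(); host names are hypothetical.
states = {"node3": "PRIMARY", "node1": "SECONDARY", "node2": "SECONDARY"}
for step in rolling_plan(states, "upgrade"):
    print(step)
```

Waiting for each member to rejoin and catch up before touching the next one keeps a majority of healthy members available throughout the rollout.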


More References

•  Please take a look at http://docs.mongodb.org

•  Ask questions on the mongodb-user group

•  Use MMS or historic monitoring:
   –  Watch for trends
   –  Create alerts
   –  Forecast capacity for provisioning

•  Utilize all available resources:
   –  10gen offers paid public and on-site training & free web-based classes
   –  consulting services
   –  pre-production and production support
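Forecasting capacity from historic monitoring data can start as simply as fitting a trend line to storage usage and projecting when it reaches the provisioned capacity, which also tells you how much lead time you have to shard before you "have to". The Python sketch below assumes roughly linear growth and uses made-up weekly samples for one shard. Real growth is often not linear, so treat the result as a rough early warning rather than a schedule.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days: Sequence[float], used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Days after the last sample until the trend line reaches capacity_gb."""
    slope, intercept = linear_fit(days, used_gb)
    if slope <= 0:
        return None  # usage is flat or shrinking: no projected exhaustion date
    return (capacity_gb - intercept) / slope - days[-1]


# Hypothetical weekly storage samples for one shard (day number, GB used).
days = [0, 7, 14, 21, 28]
used_gb = [400, 414, 428, 442, 456]
print(days_until_full(days, used_gb, capacity_gb=1000))  # 272.0
```

With usage growing 2 GB per day, the 1000 GB volume is projected to fill 272 days after the last sample. Comparing that figure with hardware lead times gives a concrete trigger for provisioning.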



Thank You
