Webinar: Operational Best Practices

Asya Kamsky, Senior Solutions Architect, 10gen

Description

This webinar covers best practices for DevOps and general operations, for those already familiar with the basics of MongoDB. Topics include team roles around data model design, monitoring, hardware configuration, replication, and horizontal scaling.

Transcript of Webinar: Operational Best Practices

Page 1: Webinar: Operational Best Practices

Senior Solutions Architect, 10gen

Asya Kamsky

#MongoDB

Operational Best Practices

Page 2: Webinar: Operational Best Practices

Operational Best Practices Asya Kamsky

Best Practices == More Value

How to get more sleep while your MongoDB cluster hums along

Page 3: Webinar: Operational Best Practices

The Agenda

•  Roles and responsibilities

•  Schema design and application performance

•  Hardware

•  Replication

•  Sharding

•  Monitoring

Page 4: Webinar: Operational Best Practices

Roles and Responsibilities

Page 5: Webinar: Operational Best Practices

Roles and Responsibilities

•  Application Data needs
•  Schema Design
•  Read and Write Patterns
•  Indexing Strategy
•  Hardware: RAM, CPU, disk...
•  Network, Firewalls, Security

Page 6: Webinar: Operational Best Practices

Roles and Responsibilities

•  Application Data needs
•  Schema Design
•  Read and Write Patterns
•  Indexing Strategy
•  Hardware: RAM, CPU, disk...
•  Network, Firewalls, Security
•  Backups
•  Maintenance
•  Upgrades

MONITORING

Page 7: Webinar: Operational Best Practices

Roles and Responsibilities

•  Application Developer
•  Data Architect
•  DBA
•  System Admin
•  Network Admin

Page 8: Webinar: Operational Best Practices

Schema Design and Application Performance

Page 9: Webinar: Operational Best Practices

Schema and Performance

In MongoDB, correct schema design is essential for optimal application performance.

DATA != SCHEMA

Page 10: Webinar: Operational Best Practices

Schema and Performance

Indexing is essential

Multiple types of indexes are supported.

Page 11: Webinar: Operational Best Practices

Schema and Performance

Understanding actual performance

•  Monitoring
•  Measuring
•  Benchmarking
•  Optimizing

•  Logs
•  Query plan
•  Application
•  Ad-hoc testing

Page 12: Webinar: Operational Best Practices

Hardware

Page 13: Webinar: Operational Best Practices

Hardware

•  Memory

•  Storage

•  CPU - speed

•  CPU - number of cores

Impact on performance in that order!

Page 14: Webinar: Operational Best Practices

Replica Sets

Page 15: Webinar: Operational Best Practices

Replica Sets and Application

[Diagram: Client Application and Driver issuing Write and Read operations to a replica set of one Primary and two Secondaries]

Page 16: Webinar: Operational Best Practices

Replica Set – HA

[Diagram: Node 1 (Secondary), Node 2 (Secondary), Node 3 (Primary); links labeled Replication and Heartbeat]

Page 17: Webinar: Operational Best Practices

Replica Set – Failure

[Diagram: Node 1 (Secondary), Node 2 (Secondary), Node 3 (no role shown); links labeled Heartbeat and Primary Election]

Page 18: Webinar: Operational Best Practices

Replica Set – Failover

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (no role shown); links labeled Replication and Heartbeat]

Page 19: Webinar: Operational Best Practices

Replica Set – Recovery

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (Recovery); links labeled Replication and Heartbeat]

Page 20: Webinar: Operational Best Practices

Replica Set – Reestablished

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (Secondary); links labeled Replication and Heartbeat]

Page 21: Webinar: Operational Best Practices

Replica Sets

•  Primary purpose:
   –  High Availability with automatic failover
   –  Disaster Recovery
   –  No-down-time maintenance
   –  No application changes on reconfiguration
   –  Extra copies of data for "special" read workloads
•  Full benefit achieved with advance planning

Page 22: Webinar: Operational Best Practices

Replica Sets

•  Full benefit achieved with advance planning
   –  Determine your SLA/HA requirements
   –  Determine your DR requirements
   –  Understand impact of node, network, DC failure
   –  Understand all available RS features: priority scores, hidden, delayed, tags
   –  Monitor and proactively remedy potential problems
   –  Practice recovery from disastrous failure

Page 23: Webinar: Operational Best Practices

Replica Sets

•  Best Practices for Configuration
   –  Odd number of voting replica members
   –  Size the oplog appropriately for high volume loads
   –  Use multiple Data Centers/Availability Zones
   –  Use DNS names for node configuration
   –  Add hidden delayed-replication member as "insurance"
   –  All replica set nodes should have same capacity
•  Operation
   –  Upgrade secondaries first (primary last)
   –  Maintenance on secondaries first (primary last)
   –  Use 'rs.stepDown()' command
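
The "odd number of voting members" guideline follows from majority arithmetic: a primary can only be elected while a majority of the voting members can reach each other. The sketch below is not from the slides; the function names are illustrative, and it only shows why adding a fourth voting member does not buy any extra fault tolerance over three.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the voting members."""
    return voting_members // 2 + 1


def failures_tolerated(voting_members: int) -> int:
    """How many voting members can be lost while a majority still remains."""
    return voting_members - majority(voting_members)


# Compare an even-sized set with its odd-sized neighbours.
for n in (3, 4, 5):
    print(n, "voting members -> majority", majority(n),
          "-> tolerates", failures_tolerated(n), "failure(s)")
```

Three and four voting members both tolerate a single failure, so the fourth member adds cost without adding availability.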

Page 24: Webinar: Operational Best Practices

Sharded Clusters

Page 25: Webinar: Operational Best Practices

Sharding

[Diagram: three App Servers, each with its own Mongos; three Shards; three Config Servers]

Page 26: Webinar: Operational Best Practices

Sharded Clusters

•  Keys to successful sharding:
   –  Pick a good shard key
   –  Make config servers resilient
   –  Shard before you "have to"
•  Good shard key is essential to achieving scaling

Page 27: Webinar: Operational Best Practices

Sharded Clusters

•  Good shard key is essential to achieving scaling
   –  Distributes your writes across all shards
   –  Allows majority of reads to be "targeted" (not scatter-gather)
   –  Exists in every document
   –  Has sufficiently high cardinality
   –  Allows you to take advantage of advanced features such as tag-aware balancing
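
To make "targeted" versus "scatter-gather" concrete, here is a minimal sketch of range-based routing. It is a simplified model, not MongoDB's actual router: the chunk boundaries, shard names, and the `user_id` shard key are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical chunk map for a range-sharded collection: chunk i covers shard
# key values from CHUNK_LOWER_BOUNDS[i] up to (not including) the next bound.
CHUNK_LOWER_BOUNDS = ["", "g", "n", "t"]
CHUNK_SHARDS = ["shardA", "shardB", "shardC", "shardA"]
ALL_SHARDS = sorted(set(CHUNK_SHARDS))


def shards_for_query(query: dict, shard_key: str = "user_id") -> list:
    """Return the shards that must be contacted for an equality query."""
    if shard_key not in query:
        # No shard key in the query: every shard has to be asked (scatter-gather).
        return ALL_SHARDS
    # Shard key present: locate the single chunk whose range contains the value.
    index = bisect_right(CHUNK_LOWER_BOUNDS, query[shard_key]) - 1
    return [CHUNK_SHARDS[index]]


print(shards_for_query({"user_id": "kate"}))         # targeted: one shard
print(shards_for_query({"email": "k@example.com"}))  # scatter-gather: all shards
```

A query that includes the shard key touches one shard, while a query on any other field fans out to all of them, which is why the most common read pattern should drive the choice of key.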

Page 28: Webinar: Operational Best Practices

Sharded Clusters

•  Config Servers
   –  Three must be available to automatically balance data
   –  All three must be "in sync"
      •  if one becomes unavailable others go read-only
   –  At least one must be available to avoid disaster
      •  without information inside config server it's not possible to determine which shards contain which ranges of data!
•  Must stop balancing during backup

Page 29: Webinar: Operational Best Practices

Sharded Clusters

•  Shard before you "have to"
   –  Balancing data is an intensive process
   –  If the existing cluster is near full capacity, balancing may impact the response time of the application
   –  Planning to shard well in advance gives more time
      •  to provision new hardware
      •  to select a good shard key
      •  to understand advanced sharding features (tagging)

Page 30: Webinar: Operational Best Practices

Sharded Clusters

•  Other best practices
   –  Three config servers
   –  Each shard is a replica set
   –  Test what you run
      •  use the same topology in QA as in production
   –  Monitor
      •  RAM
      •  disk I/O
      •  total storage
      •  MongoDB throughput

Page 31: Webinar: Operational Best Practices

Monitoring

Page 32: Webinar: Operational Best Practices

Monitoring

•  Multiple CLI and internal status commands
   –  mongostat; mongotop; db.serverStatus()
•  MMS
•  Plug-ins for munin, Nagios, cacti, etc.
•  Integration via SNMP to other tools

Page 33: Webinar: Operational Best Practices

MongoDB Monitoring Service (MMS)

Free, cloud-based service for monitoring and alerts

Page 34: Webinar: Operational Best Practices

MongoDB Monitoring Service (MMS)

Free, cloud-based service for monitoring and alerts

•  Charts, custom dashboards and automated alerting
•  Tracks 100+ metrics – performance, resource utilization, availability and response times
•  10,000+ users

Page 35: Webinar: Operational Best Practices

A Picture Speaks a Thousand Words

Page 36: Webinar: Operational Best Practices

Symptoms

•  High CPU Use
•  Similar Query Pattern

Page 37: Webinar: Operational Best Practices

Diagnostics - iostat

Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await    svctm   %util
sdp        0.00    0.00  0.50  0.00   27.86    0.00     56.00    149.58  20320.00  2010.00  100.00
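
The iostat sample on this slide shows a device that is 100% utilized with an average wait of about 20 seconds while serving only 0.5 reads per second. As a rough sketch of how such a line could be checked automatically, the snippet below parses the slide's own numbers. The function names and the two thresholds are illustrative assumptions, not recommended limits.

```python
HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"
SAMPLE = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00"


def parse_iostat_line(header: str, line: str) -> dict:
    """Pair each column name after 'Device:' with the matching numeric value."""
    names = header.split()[1:]
    device, *values = line.split()
    stats = {name: float(value) for name, value in zip(names, values)}
    stats["device"] = device
    return stats


def looks_saturated(stats: dict, util_limit: float = 90.0,
                    await_limit_ms: float = 100.0) -> bool:
    """Illustrative thresholds only: flag a device that is both busy and slow."""
    return stats["%util"] >= util_limit and stats["await"] >= await_limit_ms


stats = parse_iostat_line(HEADER, SAMPLE)
print(stats["device"], stats["%util"], stats["await"], looks_saturated(stats))
```

In practice the thresholds would be tuned against the normal baseline of the specific storage in use.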

Page 38: Webinar: Operational Best Practices

Monitoring

• mongostat

Page 39: Webinar: Operational Best Practices

Monitoring

• mongotop

Page 40: Webinar: Operational Best Practices

Monitoring Best Practices

•  Monitor Logs
   –  Alert, escalate
   –  Correlate
•  Disk
   –  Monitor
•  Instrument/Monitor App (including logs!)
•  Know your application and its (write) characteristics

Page 41: Webinar: Operational Best Practices

Monitoring Best Practices

•  Performance test/analyze system behavior

•  Load test before deployment

•  Selectively use database profiling during testing

•  Alert on abnormal states

•  High CPU is a sign of a poorly indexed query

Page 42: Webinar: Operational Best Practices

Best Practices Summary

Page 43: Webinar: Operational Best Practices

Best Practices

•  Pre-deployment
   –  Learn
   –  Plan
   –  Prototype/Benchmark
   –  Execute
•  During deployment
   –  Monitor
   –  Continue planning
   –  Evolve

Page 44: Webinar: Operational Best Practices

System provisioning

•  Capacity

•  Performance

•  Scale

•  Configuration

Page 45: Webinar: Operational Best Practices

Logs

•  Review

•  Alert

•  Rotate and collect (per cluster)

Page 46: Webinar: Operational Best Practices

Query/Index Analysis

•  Database Profiler

•  Run explain periodically (sampled)

•  Instrument code, generate metrics

•  Look for similar patterns to find root cause
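
One simple signal to extract from sampled explain output is how many documents a query examines for each document it returns. The sketch below assumes those two counts have already been read from the explain result; the exact field names depend on the server version, so the functions take plain numbers. The function names, sample figures, and the ratio threshold are all hypothetical.

```python
def scan_ratio(docs_examined: int, docs_returned: int) -> float:
    """Documents examined per document returned; 1.0 is the ideal."""
    return docs_examined / max(docs_returned, 1)


def needs_index_review(docs_examined: int, docs_returned: int,
                       max_ratio: float = 10.0) -> bool:
    """Flag queries that examine far more documents than they return."""
    return scan_ratio(docs_examined, docs_returned) > max_ratio


# Hypothetical sampled queries: (label, documents examined, documents returned)
samples = [
    ("lookup by indexed field", 20, 20),
    ("filter on unindexed field", 500_000, 12),
]
for label, examined, returned in samples:
    print(label, round(scan_ratio(examined, returned), 1),
          needs_index_review(examined, returned))
```

Tracking this ratio over time as a metric makes it easier to spot the recurring query patterns the slide refers to.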

Page 47: Webinar: Operational Best Practices

Hardware Configuration

•  Pay attention to disk configurations

•  Load testing will find some misconfigurations

•  MongoDB depends on the OS a lot

Page 48: Webinar: Operational Best Practices

Plan/Test Rollouts

•  Rolling upgrade for Replica Set

•  Generate indexes on secondaries first

•  Name services, use redirection

Page 49: Webinar: Operational Best Practices

More References

•  Please take a look at http://docs.mongodb.org
•  Ask questions on the mongodb-user group
•  Use MMS or historic monitoring
   –  Watch for trends
   –  Create alerts
   –  Forecast capacity for provisioning
•  Utilize all available resources
   –  10gen offers paid public and on-site training & free web-based classes
   –  consulting services
   –  pre-production and production support

Page 50: Webinar: Operational Best Practices

Senior Solutions Architect, 10gen

Asya Kamsky

#MongoSV

Thank You