Using the MongoDB Monitoring Service (MMS)

Post on 17-Dec-2014

description

This talk covers MMS, the MongoDB Monitoring Service. MMS is a free MongoDB monitoring SaaS solution built by 10gen and used by many MongoDB users. Monitoring is necessary for any production database system in order to detect upcoming or ongoing issues. It also gives insight into the vitals of your system and can help detect bottlenecks and inefficiencies so that performance can be improved. This talk will focus on:

- what MMS is and how to get started
- understanding each metric and graph
- what the signs of trouble are, and when to take action or panic
- what the signs are that your hardware resources are not properly used
- how we built MMS, the high-performance time series system

Transcript of Using the MongoDB Monitoring Service (MMS)

Engineer, 10gen

Mark Hillick - @markofu

#mongosv

Using the MongoDB Monitoring Service (MMS)

What, where, numbers?

What is MMS?

• MongoDB monitoring SaaS solution with:

– Per minute granularity

– Alerting: host up / down, metrics etc

– Event tracking (server restart, step down, …)

• Host management (auto discover)

• Profiling

• Hardware stats also

Why use MMS? (1)

• Overview – Bird’s Eye

– Macro

• Drill down (minute by minute)

– Micro

Why use MMS? (2)

• Haz all teh things

• Tailored specifically for MongoDB

• Incredibly helpful for 10gen Support when troubleshooting

A few numbers …

• Monitors over 19k database servers

• 40k writes per second

• 400 metrics per ping packet

• 9 billion metrics recorded per day
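As a rough sanity check on these figures, the arithmetic below multiplies the quoted server count by the metrics per ping and the number of minutes in a day. It assumes one ping per server per minute (taken from the per-minute granularity mentioned earlier) and that every server reports every metric, so it is an upper bound under those assumptions rather than a figure from the talk.

```python
# Back-of-envelope estimate using the numbers quoted on the slide.
# Assumption (not stated in the talk): every server sends one ping per minute
# and every ping carries the full 400 metrics.
servers = 19_000
metrics_per_ping = 400
pings_per_day = 24 * 60  # one ping per minute

upper_bound_per_day = servers * metrics_per_ping * pings_per_day
print(f"{upper_bound_per_day:,} metrics per day at most")
```

The result lands in the same order of magnitude as the 9 billion quoted; the gap is plausible if not every host reports every metric on every ping.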

How?

Set up MMS – it’s easy

• Go to http://mms.10gen.com

– Create a new account or sign in with your JIRA user

– Pick an explicit company name

– Download and run the agent

– From MMS dashboard, add a host to monitor

The MMS client (agent)

• Small Python app

• A single agent process

– Failover – multiple agents

• Connect to mms.10gen.com (SSL over TCP 443)

Host

Operational Stats

Alerting

Alerts - Config

All good

Alerts - Closed

Events

Security

Security

• Purely stats (metadata)

– Log transfer has to be turned on

• HTTPS & connections are outbound only (from the agent)

• If profiling is enabled in both the database and MMS, profiling data is sent

On-premise MMS

• Locally Hosted in Customer Infrastructure

• PCI, HIPAA etc

• Enterprise Customers (2.4)

Measure me!!!

Metrics

• Source : http://www.kaushik.net/avinash/wp-content/uploads/2007/10/metrics.jpg

opcounters

• Count of every operation per second

• getMore – each batch of a query
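Operation counters are cumulative, so a per-second rate comes from the difference between two samples. The sketch below shows that calculation on plain dictionaries; the key names follow the opcounters section of `serverStatus`, while the function name and the sample values are hypothetical.

```python
def opcounter_rates(prev, curr, interval_s):
    """Turn two cumulative opcounter samples into operations per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for op, count in curr.items():
        delta = count - prev.get(op, 0)
        # A negative delta means the counters were reset (for example by a
        # server restart), so no meaningful rate exists for this interval.
        rates[op] = None if delta < 0 else delta / interval_s
    return rates


# Hypothetical samples taken 60 seconds apart.
before = {"insert": 1000, "query": 5000, "update": 200,
          "delete": 10, "getmore": 300, "command": 900}
after = {"insert": 1600, "query": 8000, "update": 260,
         "delete": 10, "getmore": 420, "command": 1500}

rates = opcounter_rates(before, after, 60)
```

Here `rates["getmore"]` counts batches fetched by cursors, not distinct queries, which is why it can move independently of `rates["query"]`.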

memory

• Mapped: sum of files on disk

• Virtual memory: roughly 2 x mapped (with journaling enabled) + process overhead

• Resident memory: data in RAM actively used
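The relationship between the three memory figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a sample shaped like the `mem` section of `serverStatus` (values in megabytes); the function name and the numbers are made up for illustration.

```python
def memory_summary(mem, journaling=True):
    """Relate mapped, virtual and resident memory (all values in MB)."""
    mapped = mem["mapped"]
    # With journaling the data files are mapped twice, hence the factor of 2.
    expected_virtual = 2 * mapped if journaling else mapped
    return {
        # Whatever virtual memory is left over is process overhead
        # (connections, heap, and so on).
        "overhead_mb": mem["virtual"] - expected_virtual,
        # Share of the mapped data that is currently resident in RAM.
        "resident_fraction": mem["resident"] / mapped if mapped else None,
    }


# Hypothetical sample.
summary = memory_summary({"mapped": 8000, "virtual": 16500, "resident": 3000})
```

An overhead far larger than expected, or a resident fraction that keeps shrinking while the data set grows, are the kinds of patterns the memory graph is meant to surface.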

Lock %

• Amount of time spent in the write lock

• From 2.2: each database has its own lock
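Lock percentage is the share of elapsed time spent holding the write lock between two samples. The sketch below assumes each sample exposes two cumulative microsecond counters, here called `totalTime` and `lockTime`; those names are modeled on the global lock counters of older `serverStatus` output and should be treated as an assumption, since the exact fields vary by server version.

```python
def lock_percent(prev, curr):
    """Percentage of the sampling interval spent in the write lock."""
    elapsed_us = curr["totalTime"] - prev["totalTime"]
    locked_us = curr["lockTime"] - prev["lockTime"]
    if elapsed_us <= 0 or locked_us < 0:
        # Counters went backwards (restart) or no time passed.
        return None
    return 100.0 * locked_us / elapsed_us


# Hypothetical samples one minute apart (values in microseconds).
pct = lock_percent(
    {"totalTime": 0, "lockTime": 1_000_000},
    {"totalTime": 60_000_000, "lockTime": 4_000_000},
)
```

With per-database locks the same calculation would be applied to each database's counters separately.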

Background flush

• Flush every 60 seconds

• Watch: if flush time gets close to sync delay
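The "watch" rule above can be expressed as a simple threshold check: compare how long a flush takes with the interval between flushes. This is a sketch only; the 50% warning threshold and the function name are assumptions, not values from the talk.

```python
def flush_status(last_flush_ms, sync_delay_s=60, warn_fraction=0.5):
    """Classify a background flush time relative to the sync delay."""
    sync_delay_ms = sync_delay_s * 1000
    ratio = last_flush_ms / sync_delay_ms
    if ratio >= 1.0:
        # The flush takes longer than the interval between flushes,
        # so the disk is effectively flushing all the time.
        return "critical"
    if ratio >= warn_fraction:
        return "warning"
    return "ok"
```

For example, `flush_status(1200)` is comfortably "ok", while a 45-second flush against the default 60-second sync delay is flagged as "warning".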

Page faults

• Disk IO

• Readahead

Replication

• On primary: amount of time in oplog

• On secondary: replication delay to primary
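Both replication figures are time differences. The sketch below computes them from timestamps passed in as `datetime` values; how those timestamps are obtained (for example from the first and last oplog entries, or from replica set status) is left out, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def oplog_window(first_op_time, last_op_time):
    """Time span covered by the oplog on the primary."""
    return last_op_time - first_op_time


def replication_lag(primary_optime, secondary_optime):
    """How far a secondary trails the primary, never below zero."""
    return max(primary_optime - secondary_optime, timedelta(0))


# Hypothetical timestamps.
window = oplog_window(datetime(2012, 12, 4, 0, 0), datetime(2012, 12, 5, 6, 0))
lag = replication_lag(datetime(2012, 12, 4, 12, 0, 30),
                      datetime(2012, 12, 4, 12, 0, 10))
```

A secondary whose lag approaches the oplog window is at risk of falling off the end of the oplog and needing a full resync, which is why the two numbers are worth reading together.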

Metrics that we discussed

• Opcounters

• Lock %

• Background Flush

• Page Faults

• Replication

Metrics for performance

• Resident memory: how much data in RAM?

• Page Faults: paging to disk? Readahead?

• Journal commits in write lock: separate journal

• High background flush: reduce sync delay to smooth

Documentation

Docs? Where?

• Manual : https://mms.10gen.com/help/

– Web

– PDF

• FAQ : https://mms.10gen.com/docs/faq

• Blah

Futures

Feature Request

• JIRA Ticket - MMSSUPPORT

Coming up…

• Data visualization, e.g. shard distribution (Q1 2013)???

• Move from Python to Java

• Blah – Ryan???

Conclusion

Conclusion

• Easy to use

• Macro & micro

• Detailed monitoring features

• Aids 10gen Support immensely

Questions?