Webinar: Deployment Best Practices

Sandeep Parikh, Solutions Architect, 10gen

Description

The last bugs are fixed, testing is complete, and the business is ready. What do you do next? In this talk we will cover the topics that will help you prepare for a successful launch of your MongoDB-based product, including:

- Key counters and metrics: Page faulting? IO bound? What's my working set? How do I know?
- Load testing and capacity planning: How many resources is my MongoDB deployment going to use? When do I need to add replicas and shards?
- Monitoring: What should I be watching, and how do I know whether things are running correctly?

We will map the theory to practice by illustrating with real-world examples.

Transcript of Webinar: Deployment Best Practices

Page 1: Webinar: Deployment Best Practices

Deployment Best Practices

Sandeep Parikh, Solutions Architect, 10gen

The Cycle of Deployment Prep

Prototype > Test > Monitor > Scale > Script

Prototype Your Deployment

•  You have to start somewhere

•  Development is complete, deployment is next

•  Sketch out some initial deployment parameters –  Hardware sizing –  Operating system –  Disk setup –  Storage layout: data vs. journal vs. log

Prototyping Considerations

•  Additional considerations –  Horizontal vs. vertical scale options –  Multiple datacenters

•  Start thinking about data growth –  Do you know how your data will evolve? –  Does your data live in multiple collections/databases? –  Read-centric, write-centric, or both?

•  The earlier you start thinking about it, the better

Test, Test, Test

•  Generate a lot of data –  Write tests to measure bulk loading throughput –  Scaffolding can be reused for staging and validation

•  Build your indexes –  All in the beginning –  On the fly

•  Script your app –  Can you simulate “expected” usage?
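
The bulk-loading test can start as a small harness. The Python sketch below assumes you pass in an `insert_batch` callable (for example a pymongo collection's `insert_many`); the document fields, batch size, and function names are hypothetical, and the measured time includes generating the documents, not just inserting them.

```python
import random
import string
import time
from typing import Callable, Dict, List

Doc = Dict[str, object]


def make_doc(i: int, rng: random.Random) -> Doc:
    """Build one synthetic document (hypothetical field names)."""
    return {
        "_id": i,
        "user": "".join(rng.choices(string.ascii_lowercase, k=8)),
        "score": rng.randint(0, 1000),
        "tags": rng.sample(["red", "green", "blue", "gray"], k=2),
    }


def measure_bulk_load(
    insert_batch: Callable[[List[Doc]], object],
    total_docs: int,
    batch_size: int = 1000,
    seed: int = 0,
) -> float:
    """Insert total_docs synthetic documents in batches and return docs/second.

    The elapsed time covers generating the documents as well as inserting them.
    """
    rng = random.Random(seed)  # seeded so repeated runs load identical data
    batch: List[Doc] = []
    start = time.perf_counter()
    for i in range(total_docs):
        batch.append(make_doc(i, rng))
        if len(batch) == batch_size:
            insert_batch(batch)
            batch = []
    if batch:  # flush the final partial batch
        insert_batch(batch)
    elapsed = time.perf_counter() - start
    return total_docs / elapsed if elapsed > 0 else float("inf")
```

Running the same load once with all indexes created up front and once with indexes added afterwards gives a rough comparison of the two index-building options. Against a live deployment this might look like `measure_bulk_load(db.events.insert_many, 1_000_000)`, where `db.events` is a placeholder collection.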

Monitor Your Resources

•  Watch everything

•  The goal is to understand the numbers before deploying

•  Monitor using –  SNMP, Munin, Nagios –  mongostat, mongotop, iostat, cpustat –  MongoDB Monitoring Service (MMS)

•  Other stats –  Database-level and collection-level statistics

Monitoring Key Metrics

•  Op counters –  Inserts, updates, deletes, reads (more is generally better) –  Some differences between primary and secondary ops

•  Resident memory –  Want this lower than available physical memory –  Correlated with page faults and index misses

•  Queues –  Readers and writers
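
Op counters in `serverStatus` are cumulative since `mongod` started, so a rate needs two samples taken some seconds apart, which is what `mongostat` does for you. A minimal Python sketch, assuming you have already fetched the `opcounters`, `mem.resident` and `globalLock.currentQueue` values as plain dicts (field names as reported by MongoDB 2.x-era servers; verify them against your version). On secondaries, replicated writes are counted separately under `opcountersRepl`, one of the primary vs. secondary differences noted above. The sample numbers in the usage line are made up.

```python
from typing import Dict, Mapping

# Operation types that serverStatus reports under "opcounters".
OPS = ("insert", "query", "update", "delete", "getmore", "command")


def op_rates(prev: Mapping[str, int], curr: Mapping[str, int],
             interval_s: float) -> Dict[str, float]:
    """Convert two cumulative opcounters snapshots into per-second rates."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {op: (curr.get(op, 0) - prev.get(op, 0)) / interval_s for op in OPS}


def resident_share(resident_mb: float, physical_mb: float) -> float:
    """Fraction of physical RAM held resident by mongod (mem.resident is in MB)."""
    return resident_mb / physical_mb


def queued(current_queue: Mapping[str, int]) -> int:
    """Readers plus writers waiting, from globalLock.currentQueue."""
    return current_queue.get("readers", 0) + current_queue.get("writers", 0)
```

For example, `op_rates({"insert": 1000}, {"insert": 1600}, 10.0)["insert"]` is `60.0` inserts per second over that 10-second window.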

Monitoring Key Metrics

•  Page faults and B-tree misses –  How often are you having to hit the disk? –  Persistently non-zero? Your working set might not fit in memory.

•  Lock percentage –  If high and the queues are filling up, you are hitting write capacity

•  IO and CPU stats –  Sustained or fluctuating high IO => IO bound –  CPU time spent in IOWAIT
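
“Persistently non-zero” is easier to act on once it is written down as a rule. The sketch below treats a cumulative counter (for example `extra_info.page_faults`, which `serverStatus` reports on Linux) as persistent if it grows in at least 80% of the sampled intervals, and flags write capacity when the lock percentage and the write queue are both high. All thresholds and function names are hypothetical placeholders to tune against your own baseline.

```python
from typing import List, Sequence


def interval_deltas(cumulative: Sequence[int]) -> List[int]:
    """Per-interval increments of a cumulative counter such as page faults."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def persistently_nonzero(cumulative: Sequence[int], min_share: float = 0.8) -> bool:
    """True if the counter grew in at least min_share of the sampled intervals."""
    deltas = interval_deltas(cumulative)
    if not deltas:
        return False
    return sum(1 for d in deltas if d > 0) / len(deltas) >= min_share


def hitting_write_capacity(lock_pct: float, queued_writers: int,
                           lock_limit: float = 70.0, queue_limit: int = 10) -> bool:
    """High lock percentage combined with a filling write queue."""
    return lock_pct >= lock_limit and queued_writers >= queue_limit
```

Page-fault samples of `[0, 5, 12, 12, 20, 31]` grow in four of five intervals and count as persistent; `[0, 0, 3, 3, 3, 3]` is a one-off spike.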

Scale Your Setup

•  Monitor those metrics while testing

•  They should tell you where to add capacity –  CPU, RAM, disks

•  Storage configuration –  RAID levels –  Filesystem selection –  Block sizing –  Readahead setting
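
`blockdev --getra` and `--setra` express readahead in 512-byte sectors, which makes the setting easy to misread. A small Python sketch converts the value and bounds how much extra data one random read can pull in when it only needs a single document; the function names and the 2 KiB document size in the example are hypothetical.

```python
SECTOR_BYTES = 512  # blockdev --getra/--setra count in 512-byte sectors


def readahead_kib(sectors: int) -> float:
    """Convert a blockdev readahead value (in sectors) to KiB."""
    return sectors * SECTOR_BYTES / 1024


def extra_read_kib(sectors: int, doc_bytes: int) -> float:
    """Upper bound on data read beyond one document by a single random read."""
    return max(readahead_kib(sectors) - doc_bytes / 1024, 0.0)
```

For example, 256 sectors is 128 KiB, so fetching a 2 KiB document can read up to 126 KiB that the query never uses, and that data competes with your working set for memory.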

Script Your Plays

•  Backups

•  Restores (backups are not enough)

•  Maintenance and Upgrades

•  Replica Set operations –  Stepping primaries down, adding new secondaries

•  Sharding operations –  Consistent backups, balancer operations
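
Backups and restores are the easiest plays to script. A minimal Python sketch, assuming a single replica set: it builds `mongodump`/`mongorestore` command lines and only executes them when `dry_run=False`, so the play can be rehearsed first. Hostnames, the `/backups` layout and the helper names are hypothetical; `--oplog` applies to full dumps taken from a replica set member, and a sharded cluster needs the extra coordination around the balancer noted above. Check the flags against the tool version you run.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def backup_dir(root: Path, now: Optional[datetime] = None) -> Path:
    """Timestamped directory for one dump, e.g. root/20120601T030000Z."""
    now = now or datetime.now(timezone.utc)
    return root / now.strftime("%Y%m%dT%H%M%SZ")


def dump_cmd(host: str, out_dir: Path) -> List[str]:
    # --oplog also captures writes made while the dump is running
    return ["mongodump", "--host", host, "--oplog", "--out", str(out_dir)]


def restore_cmd(host: str, dump_dir: Path) -> List[str]:
    # --oplogReplay applies the captured oplog; --drop replaces existing collections
    return ["mongorestore", "--host", host, "--oplogReplay", "--drop", str(dump_dir)]


def run(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=True).returncode
```

Restoring each new dump onto a scratch host on a schedule is a simple way to prove the backups are actually usable.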

Lather, Rinse, Repeat

Prototype > Test > Monitor > Scale > Script

Perfect. I know what to do. How Do I Do It?

Balancing Priorities

Diagram: Product Development weighed against Infrastructure Development (labels: Integration, QA, Code, Operations, Monitoring)

The Scale Tips To One Side

•  Product development is the priority –  As it should be, but…

•  Infrastructure development can’t be overlooked

•  Know the downsides of not being prepared –  Downtime –  Data safety

•  Disaster will strike in one way or another

Integrate With The Dev Cycle

•  Why are ops typically skipped over until it’s too late? –  Planning can alleviate this issue

•  Make operations development a part of the dev cycle –  Put it into the schedule –  Make it a development milestone

•  Use it to your advantage –  Script deployment of dev and test systems

That’s all well and good, but we are already deployed

Let’s Avoid This Situation

Start The Cycle Again

Prototype > Test > Monitor > Scale > Script

Start With Monitoring

•  Monitor your deployment –  Munin, Nagios –  MMS

•  Instrument your app –  Know your queries –  Read/write/update/delete behaviors –  Index utilization

•  Database and collection stats
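
Instrumenting the app can start with counting and timing each kind of operation at the data-access layer. A Python sketch using a decorator; `OpStats`, `find_user` and the operation labels are hypothetical, and a real `find_user` would query MongoDB instead of returning a stub.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Any, Callable, Dict


class OpStats:
    """Counts calls and cumulative latency per operation type."""

    def __init__(self) -> None:
        self.calls: Dict[str, int] = defaultdict(int)
        self.seconds: Dict[str, float] = defaultdict(float)

    def track(self, op: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            @wraps(fn)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:  # failed calls are recorded too
                    self.calls[op] += 1
                    self.seconds[op] += time.perf_counter() - start
            return wrapper
        return decorator

    def mean_ms(self, op: str) -> float:
        """Average latency in milliseconds, or 0.0 if op was never called."""
        return 1000 * self.seconds[op] / self.calls[op] if self.calls[op] else 0.0


stats = OpStats()


@stats.track("read")
def find_user(user_id: int) -> Dict[str, Any]:
    # Hypothetical data-access function; a real one would query MongoDB here.
    return {"_id": user_id}
```

Exporting `stats.calls` and `stats.mean_ms(...)` alongside the database-side numbers makes it easier to tell whether a slowdown starts in the app or in MongoDB.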

Scaling Deployment

•  The numbers don’t lie –  But individual measurements don’t always tell the whole story

•  Are you hardware bound? –  Memory, Disks, CPU

•  Is your app the problem?

•  What about system settings? –  Low resident memory > readahead > page faults (an overly large readahead setting can show up as low resident memory and extra page faults)

Basic Solutions

•  Low opcounters + high page faults –  More memory

•  High paddingFactor and fragmentation –  Data model changes

•  Balancer running a lot, chunks always migrating –  Better shard key

•  Persistent b-tree misses, high page faults –  Queries aren’t hitting the indexes or aren’t using them
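
The symptom/fix pairs above can be encoded as a first-pass checklist. A Python sketch with hypothetical field names; deciding what counts as “low” or “high” is left to the caller, and the output is only a starting point for investigation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptoms:
    low_opcounters: bool = False
    high_page_faults: bool = False
    high_padding_factor: bool = False
    frequent_migrations: bool = False
    persistent_btree_misses: bool = False


def suggest(s: Symptoms) -> List[str]:
    """Map observed symptoms to the first fix worth investigating."""
    fixes: List[str] = []
    if s.low_opcounters and s.high_page_faults:
        fixes.append("add memory")
    if s.high_padding_factor:
        fixes.append("change the data model")
    if s.frequent_migrations:
        fixes.append("choose a better shard key")
    if s.persistent_btree_misses and s.high_page_faults:
        fixes.append("check that queries use indexes")
    return fixes
```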

Continue Through the Cycle

•  Script your setup –  This will save time as you iterate

•  Prototype the fixes –  Evaluate queries, how documents change, expected usage

•  Test the new setup –  Scripts to build the deployment and model usage

Deployment Is About Not Being Surprised

Problem > Diagnosis > Solution

Problem 1: Streaming Events

•  Suboptimal write throughput

•  Where is the bottleneck? –  Check the metrics

Diagnosis 1

•  Are opcounters reasonably accurate?

•  Check the queues

•  Examine lock percentages

•  How does resident memory look?

•  How large are your indexes?

Solution 1

•  Opcounters aren’t as high as you’d expect but memory is saturated

•  Correlated with high page faults

•  You might need more memory

•  MongoDB performs best when your working set fits in memory
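
A back-of-the-envelope check for Solution 1: estimate the working set as total index size plus the share of the data you expect to touch regularly, then compare it with RAM. `totalIndexSize` and `dataSize` from `db.stats()` can supply the first two inputs; `hot_fraction` is a guess you have to make, and the 80% headroom and function names are hypothetical.

```python
GIB = 1024 ** 3


def estimated_working_set(index_bytes: int, data_bytes: int, hot_fraction: float) -> float:
    """Indexes plus the share of data touched regularly (hot_fraction is a guess)."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return index_bytes + data_bytes * hot_fraction


def fits_in_ram(working_set_bytes: float, ram_bytes: int, headroom: float = 0.8) -> bool:
    """Leave headroom for the OS, connections and other processes."""
    return working_set_bytes <= ram_bytes * headroom
```

With 6 GiB of indexes, 100 GiB of data and 20% of it hot, the estimate is about 26 GiB: too much for a 32 GiB machine with 80% headroom (25.6 GiB), comfortable on 64 GiB.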

Problem 2: Tracking FB Friends

•  Update-heavy workload is slow

•  Document paddingFactor is increasing

Diagnosis 2

•  High paddingFactor –  Fragmentation!

•  More memory/disk is taken up by new documents –  Inefficient space usage

•  Documents have to be relocated regularly

Solution 2

•  Check your queries –  Are your documents growing because of arrays or added fields?

•  Pre-create required document structure or…

•  Move growing elements into individual documents in a separate collection –  Requires data model and app changes
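
Both options can be sketched on plain Python dicts. The schema (a user document with an unbounded `friends` array) and the helper names are hypothetical: `precreate` fills in fields a document will gain later so later `$set` updates do not grow it (this only helps if the placeholder values are the same size as the final ones), and `split_friends` turns the growing array into small, fixed-size documents for a separate collection.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def precreate(doc: Doc, defaults: Doc) -> Doc:
    """Add placeholder values for fields the document will gain later."""
    out = dict(doc)  # keep the document's own fields first
    for key, value in defaults.items():
        out.setdefault(key, value)
    return out


def split_friends(user: Doc) -> Tuple[Doc, List[Doc]]:
    """Return the user without its friends array, plus one small doc per friendship."""
    slim_user = {k: v for k, v in user.items() if k != "friends"}
    edges = [{"user_id": user["_id"], "friend_id": f} for f in user.get("friends", [])]
    return slim_user, edges
```

New friendships then become inserts into the separate collection (which would want an index on `user_id`) rather than `$push` updates that keep growing the user document.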

Problem 3: Status Updates

•  Write-heavy sharded deployment –  Is one shard getting burned? –  Balancer lock held all the time

•  Balancer is constantly migrating chunks

Diagnosis 3

•  Check the mongos logs –  How often is migration occurring? –  Are chunks constantly moving from one shard to the next?

•  Shard key distribution –  Sequential keys? –  One shard always getting new writes?

Solution 3

•  Consider using a hash, byte swapping, etc. if there is no “natural” key that distributes well –  Avoids the “hot” shard problem

•  High writes and high balancer lock –  Manage the balancer window –  Run it during periods of low utilization
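
To see why sequential keys create a hot shard, the Python sketch below uses an idealized model in which the 64-bit key space is split evenly across shards by range (a hypothetical simplification; real chunk boundaries come from MongoDB's splitting and balancing). It compares raw sequential ids with two application-side transforms: byte swapping, and the first 8 bytes of an MD5 digest chosen here purely as an example hash.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Iterable

KEY_BITS = 64


def byte_swap(n: int) -> int:
    """Reverse the byte order of a 64-bit unsigned integer."""
    return int.from_bytes(n.to_bytes(8, "big"), "little")


def md5_key(n: int) -> int:
    """First 8 bytes of the MD5 digest of the id's decimal string."""
    return int.from_bytes(hashlib.md5(str(n).encode()).digest()[:8], "big")


def range_shard(key: int, num_shards: int) -> int:
    """Idealized range partitioning: [0, 2**64) split evenly across shards."""
    return (key * num_shards) >> KEY_BITS


def distribution(ids: Iterable[int], transform: Callable[[int], int],
                 num_shards: int) -> Dict[int, int]:
    """How many of the given ids each shard would receive."""
    return dict(Counter(range_shard(transform(i), num_shards) for i in ids))
```

With ids 1 to 1000 and four shards, the raw ids all land on shard 0, while byte swapping spreads them roughly evenly (255, 256, 256, 233). The trade-off is that consecutive ids no longer sit together, so range queries on the original id fan out to every shard.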

Problem 4: File Sharing

•  Storing files in GridFS

•  Uploads are taking too long

Diagnosis 4

•  Check CPU and IO stats

•  Is the CPU stuck in IOWAIT?

•  High sustained IO operations

•  Lots of queued operations

•  IO bound workload
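
The diagnosis combines three signals, and the Python sketch below only calls a workload IO bound when they show up together in most samples. The inputs are assumed to come from tools such as `iostat -x` (`%util`) and `serverStatus` (`globalLock.currentQueue.total`); the `Sample` type and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    iowait_pct: float     # CPU time in IOWAIT, e.g. from iostat or vmstat
    disk_util_pct: float  # device utilization, e.g. %util from iostat -x
    queued_ops: int       # globalLock.currentQueue.total from serverStatus


def looks_io_bound(samples: Sequence[Sample], iowait_min: float = 20.0,
                   util_min: float = 90.0, queue_min: int = 5,
                   min_share: float = 0.8) -> bool:
    """True when most samples show high IOWAIT, a busy disk and queued ops together."""
    if not samples:
        return False
    hits = sum(
        1 for s in samples
        if s.iowait_pct >= iowait_min and s.disk_util_pct >= util_min
        and s.queued_ops >= queue_min
    )
    return hits / len(samples) >= min_share
```

Requiring the signals to coincide and persist avoids blaming storage for a single burst, such as a one-off large upload.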

Solution 4

•  Ensure storage is in good health –  RAID status –  SAN or NAS devices functioning properly –  Virtualized disks

•  Consider separating data and journal –  --directoryperdb –  Symlink journal to another location

•  Ensure other processes aren’t hitting storage
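
The symlink option can be scripted so it is repeatable across hosts. A minimal Python sketch, assuming a dbpath with the default `journal` subdirectory and a hypothetical mount point on a second device; `relocate_journal` is a made-up name, and `mongod` must be shut down before it runs. (`--directoryperdb` takes a similar approach for data, giving each database its own directory that can live on its own device.)

```python
import shutil
from pathlib import Path


def relocate_journal(dbpath: Path, new_location: Path) -> Path:
    """Move <dbpath>/journal to new_location and leave a symlink in its place.

    Run only while mongod is shut down; new_location must not exist yet.
    """
    journal = dbpath / "journal"
    if journal.is_symlink():
        raise ValueError(f"{journal} is already a symlink")
    if new_location.exists():
        raise FileExistsError(new_location)
    new_location.parent.mkdir(parents=True, exist_ok=True)
    if journal.exists():
        shutil.move(str(journal), str(new_location))  # keep existing journal files
    else:
        new_location.mkdir()
    # Absolute target so the link does not depend on the current directory.
    journal.symlink_to(new_location.resolve(), target_is_directory=True)
    return journal
```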

Problem 5: Reading Logs

•  Indexes are underperforming

•  Queries are using indexes but yielding quite a bit

Diagnosis 5

•  Use .explain() and .hint() with your queries

•  Check out the b-tree metrics –  Persistent non-zero misses? –  Correlated with memory, page faults, IO stats

•  B-trees are best for range queries over a single dimension –  Range queries on {A} if the index is {A,B} could be suboptimal
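
One concrete case of the last point: a range on A combined with an equality condition on B. The Python sketch models an index as a sorted list of key tuples (a toy, not MongoDB's actual B-tree) and counts how many entries each index layout has to examine; the function names and data are hypothetical. In the `explain()` output of this era, the same effect appears as `nscanned` far exceeding `n`, and `.hint()` lets you force each candidate index to compare them.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Entry = Tuple[int, int]
INF = float("inf")


def query_on_ab(index_ab: List[Entry], a_lo: int, a_hi: int, b: int) -> Tuple[List[Entry], int]:
    """Index {A,B}: walk every entry with a_lo <= A <= a_hi, then filter on B."""
    start = bisect_left(index_ab, (a_lo, -INF))
    end = bisect_right(index_ab, (a_hi, INF))
    examined = index_ab[start:end]
    return [e for e in examined if e[1] == b], len(examined)


def query_on_ba(index_ba: List[Entry], a_lo: int, a_hi: int, b: int) -> Tuple[List[Entry], int]:
    """Index {B,A}: equality on B first makes the A range one contiguous run."""
    start = bisect_left(index_ba, (b, a_lo))
    end = bisect_right(index_ba, (b, a_hi))
    examined = index_ba[start:end]
    return [(a, bb) for bb, a in examined], len(examined)


rows = [(a, b) for a in range(100) for b in range(10)]
index_ab = sorted(rows)
index_ba = sorted((b, a) for a, b in rows)
```

For `10 <= A <= 19` and `B == 3`, both layouts return the same 10 documents, but `{A,B}` examines 100 index entries while `{B,A}` examines 10.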

Solution 5

•  Revisit your indexing strategy

•  Consider data model changes to optimize queries and indexes

•  Some operators can’t use an index efficiently –  $where JavaScript clauses –  $mod, $not, $ne –  Complex regular expressions

Miscellaneous Deployment Notes

•  Warm the cache –  Use touch via db.runCommand()

•  Dynamically change log levels

•  Synchronize all clocks to the same NTP server
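
Cache warming and log-level changes are both plain commands, so they are easy to fold into scripted plays. A Python sketch that builds the command documents and passes them to any `run_command` callable; with pymongo that could be `client["app"].command` for `touch` and `client.admin.command` for `setParameter`, where `app` and the collection names are hypothetical. `touch` belongs to the MMAPv1-era servers this talk targets and is not available on recent versions.

```python
from typing import Any, Callable, Dict, List

Command = Dict[str, Any]


def touch_cmd(collection: str, data: bool = True, index: bool = True) -> Command:
    """touch loads a collection's data and/or indexes into memory."""
    return {"touch": collection, "data": data, "index": index}


def log_level_cmd(level: int) -> Command:
    """setParameter logLevel changes verbosity at runtime (run against admin)."""
    if not 0 <= level <= 5:
        raise ValueError("logLevel must be between 0 and 5")
    return {"setParameter": 1, "logLevel": level}


def warm_cache(run_command: Callable[[Command], Any], collections: List[str]) -> None:
    """Touch each collection in turn, e.g. before sending traffic to a new member."""
    for name in collections:
        run_command(touch_cmd(name))
```

The command name must be the first key of each document, which a plain `dict` preserves on Python 3.7 and later.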

Questions?

How To Get Help

•  Refer to our docs: docs.mongodb.org –  (hint: they’re very helpful!)

•  Other places we monitor –  mongodb-user Google group –  Stack Overflow

•  Found a bug? Submit a ticket