Running MongoDB in Production, Part I
Tim Vaillancourt
Sr Technical Operations Architect, Percona
{
  name: "tim",
  lastname: "vaillancourt",
  employer: "percona",
  techs: [
    "mongodb", "mysql", "cassandra", "redis", "rabbitmq", "solr",
    "mesos", "kafka", "couch*", "python", "golang"
  ]
}
`whoami`
Agenda
● Backups
  ○ Logical vs Binary
  ○ Architecture
  ○ Percona-Lab/mongodb_consistent_backup
● Security
  ○ MongoDB Authorization
  ○ System, Network and Filesystem best practices
  ○ Connection and data encryption
● Monitoring
  ○ Methodology
  ○ Important Metrics
  ○ Percona Monitoring and Management
● Data
  ○ Document: single *SON (JSON/BSON) object, often nested
  ○ Field: single field in a document
  ○ Collection: grouping of documents
  ○ Database: grouping of collections
  ○ Capped Collection: A fixed-size FIFO collection
● Replication
  ○ Oplog: A special capped collection for replication
  ○ Primary: A replica set node that can receive writes
  ○ Secondary: A replica of the Primary that is read-only
Terminology
● Replication
  ○ Election: The process to determine a new Primary member
  ○ Voting: The process of a single node voting in an election
  ○ Hidden-Secondary: A replica that cannot become Primary
  ○ Majority: “Most” of the members are available or have acknowledged a change
    ■ 3 node replica set = 2 nodes required for majority
    ■ 5 node replica set = 3 nodes required for majority
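The majority rule above can be written as a one-line calculation. This is a minimal sketch; the function name `majority` is made up for illustration and is not a MongoDB API.

```javascript
// Smallest number of members that forms a majority of a replica set.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// Print the majority needed for a few common replica set sizes.
for (const n of [3, 4, 5, 7]) {
  console.log(n + " node replica set = " + majority(n) + " nodes required for majority");
}
```

Note that a 4 node set needs 3 nodes for a majority, so it tolerates no more failures than a 3 node set does.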
Terminology
● Sharding
  ○ Shard: a replica set or single node containing a piece of the cluster data
  ○ Shard Key: the document key used to partition data
  ○ Chunk: a range of the shard key
  ○ Partitioned Collection: a collection distributed amongst shards
  ○ Config Server: a MongoDB server dedicated to storing the sharding metadata
Terminology
Backups
“An admin is only worth the backups they keep” ~ Unknown
Backups: Logical
● ‘mongodump’ tool from mongo-tools project
● Supports
  ○ Multi-threaded dumping in 3.2+
  ○ Optional inline gzip compression of data
  ○ Optional dumping of oplog for single-node consistency
  ○ Replica set awareness (via readPreference)
    ■ i.e.: primary, primaryPreferred, secondary, secondaryPreferred, nearest
● Process
  ○ Tool issues .find() with $snapshot query
  ○ Stores BSON data in a file per collection
  ○ Stores BSON oplog data in “oplog.bson”, even when compressed
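As a rough sketch of how the options above map onto a mongodump command line, the helper below assembles an argument list. The host string, output directory and thread count are placeholders, and `mongodumpArgs` is a hypothetical helper, not part of mongo-tools.

```javascript
// Build an argument list for mongodump. All values passed in are placeholders.
function mongodumpArgs(opts) {
  return [
    "--host", opts.host,
    "--readPreference", "secondary",                       // replica set awareness
    "--oplog",                                             // oplog for single-node consistency
    "--gzip",                                              // inline gzip compression
    "--numParallelCollections", String(opts.threads || 4), // multi-threaded dumping (3.2+)
    "--out", opts.outDir,
  ];
}

const args = mongodumpArgs({ host: "rs0/db1.example.com:27017", outDir: "/data/backup" });
console.log(["mongodump"].concat(args).join(" "));
```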
Backups: Logical
● Useful for...
  ○ upgrades of very old systems, eg: 2.6 -> 3.4 upgrade
  ○ protection from binary-level/storage-engine corruption
  ○ export/import to different CPU architecture
● Limitations
  ○ Index metadata only in backup
    ■ Indexes are rebuilt entirely, in serial!!
    ■ Often indexing process takes longer than restoring the data!
    ■ Expect hours or days of restore time
  ○ Not Sharding aware
    ■ Sharded backups are not Point-in-Time consistent
Backups: Logical
● Limitations
  ○ Fetch from storage-engine, serialization, networking, etc is very inefficient
  ○ Oplogs fetched in batch at end / oplog must be as long as the backup run-time
  ○ Wire Protocol Compression (added in 3.4+) not supported yet: https://jira.mongodb.org/browse/TOOLS-1668 (Please vote/watch Issue!)
Backups: Binary
● Options
  ○ Cold Backup
  ○ LVM Snapshot
  ○ Hot Backup
    ■ Percona Server for MongoDB (FREE!)
    ■ MongoDB Enterprise Hot Backup (non-free)
    ■ NOTE: MMAPv1 not supported
● Benefits
  ○ Indexes are backed up == faster restore!
  ○ Storage-engine format backed up == faster backup AND restore!
Backups: Binary
● Limitations
  ○ Increased backup storage requirements
  ○ Compression is storage-engine dependent
  ○ CPU Architecture limitations (64-bit vs 32-bit)
  ○ Cascading corruption
  ○ Batteries not included
    ■ Not Sharding aware
    ■ Not Replica Set aware
● Process
  ○ Cold Backup
    ■ Stop a mongod SECONDARY, copy/archive dbPath
Backups: Binary
● Process
  ○ LVM Snapshot
    ■ Optionally call ‘db.fsyncLock()’ (not required in 3.2+ with Journaling)
    ■ Create LVM snapshot of the dbPath
    ■ Copy/Archive dbPath
    ■ Remove LVM snapshot (as quickly as possible!)
    ■ NOTE: LVM snapshots can cause up to 30%* write latency impact to disk (due to COW)
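When the optional lock is used, it matters that the node is always unlocked again, even if the snapshot step fails. A minimal sketch of that pattern follows; `withFsyncLock` is a hypothetical wrapper, and the snapshot itself is assumed to be taken by external tooling while the callback runs.

```javascript
// Run `work` while writes are flushed and blocked, then always unlock.
function withFsyncLock(database, work) {
  database.fsyncLock();
  try {
    return work();
  } finally {
    database.fsyncUnlock(); // runs even when `work` throws
  }
}

// In the mongo shell `db` and `print` are provided; outside the shell this is skipped.
if (typeof db !== "undefined") {
  withFsyncLock(db, function () {
    print("create the LVM snapshot of the dbPath now");
  });
}
```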
Backups: Binary
● Process
  ○ Hot Backup (PSMDB or MongoDB Enterprise)
    ■ Pay $$$ for MongoDB Enterprise or download PSMDB for free(!)
    ■ db.adminCommand({
        createBackup: 1,
        backupDir: "/data/mongodb/backup"
      })
    ■ Copy/archive the output path
    ■ Delete the backup output path
    ■ NOTE: RocksDB-based createBackup creates filesystem hardlinks whenever possible!
    ■ NOTE: Delete RocksDB backupDir as soon as possible to reduce bloom filter overhead!
Backups: Architecture
● Risks
  ○ Dynamic nature of Replica Set
  ○ Impact of backup on live nodes
● Example: Cheap Disaster-Recovery
  ○ Place a ‘hidden: true’ SECONDARY in another location
  ○ Optionally use cloud object store (AWS S3, Google Cloud Storage, etc)
Backups: Architecture
● Example: Replica Set Tags
  ○ “tags” allow fine-grained server selection with key/value pairs
  ○ Use key/value pair to fence various application workflows
  ○ Example:
    ■ { "role": "backup" } == Backup Node
    ■ { "role": "application" } == App Node
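A sketch of applying such a tag to one member is shown below. `tagMember` is a hypothetical helper and the host name is a placeholder; `rs.conf()` and `rs.reconfig()` are the standard mongo shell helpers and are only called when the script runs inside the shell on the PRIMARY.

```javascript
// Return a copy of a replica set config with tags merged into one member.
function tagMember(config, host, tags) {
  const members = config.members.map(function (m) {
    if (m.host !== host) return m;
    return Object.assign({}, m, { tags: Object.assign({}, m.tags, tags) });
  });
  return Object.assign({}, config, { members: members });
}

// In the mongo shell, connected to the PRIMARY (placeholder host name):
if (typeof rs !== "undefined") {
  rs.reconfig(tagMember(rs.conf(), "db3.example.com:27017", { role: "backup" }));
}
```

Copying the config rather than editing it in place keeps the original available for comparison if the reconfiguration is rejected.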
Backups: mongodb_consistent_backup
● Python project by Percona-Lab for consistent backups
● URL: https://github.com/Percona-Lab/mongodb_consistent_backup
● Best-effort support, not a “Percona Product”
● Created to solve limitations in MongoDB backup tools:
  ○ Replica Set and Sharded Cluster awareness
  ○ Cluster-wide Point-in-time consistency
  ○ In-line Oplog backup (vs post-backup)
  ○ Notifications of success / failure
Backups: mongodb_consistent_backup
● Extra Features
  ○ Remote Upload (AWS S3, Google Cloud Storage and Rsync)
  ○ Archiving (Tar or ZBackup deduplication and optional AES-at-rest)
  ○ CentOS/RHEL7 RPMs and Docker-based releases (.deb soon!)
  ○ Single Python PEX binary
  ○ Multithreaded / Concurrent
  ○ Auto-scales to available CPUs
Backups: mongodb_consistent_backup
● Low-Impact
  ○ Tool focuses on low impact
  ○ Uses Secondary nodes only
  ○ Considers (Scoring)
    ■ Replication Lag
    ■ Replication Priority
    ■ Replication Health / State
    ■ Hidden-Secondary State (preferred by tool)
    ■ Fails if chosen Secondary becomes Primary (on purpose)
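To make the idea of scoring concrete, here is a purely illustrative sketch. The weights, field names and functions are assumptions made up for this example; they are not the scoring used by mongodb_consistent_backup.

```javascript
// Illustrative only: weights and field names are assumptions, not the tool's algorithm.
function scoreSecondary(member, maxLagSecs) {
  if (member.state !== "SECONDARY" || !member.healthy) return 0; // unusable node
  if (member.lagSecs > maxLagSecs) return 0;                     // too far behind
  let score = 100 - member.lagSecs;   // less replication lag is better
  score -= member.priority * 10;      // lower priority is less likely to become PRIMARY
  if (member.hidden) score += 50;     // hidden secondaries are preferred
  return Math.max(score, 1);
}

// Pick the highest-scoring member, or null when none is usable.
function pickBackupNode(members, maxLagSecs) {
  let best = null;
  let bestScore = 0;
  for (const m of members) {
    const s = scoreSecondary(m, maxLagSecs);
    if (s > bestScore) { best = m; bestScore = s; }
  }
  return best;
}
```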
Backups: mongodb_consistent_backup
● Future
  ○ Incremental Backups
  ○ Binary-level Backups (Hot Backup, Cold Backup, LVM, Cloud-based, etc)
  ○ More Notification Methods (PagerDuty, Email, etc)
  ○ Restore Helper Tool
  ○ Instrumentation / Metrics
  ○ <YOUR AWESOME IDEA HERE> we take GitHub PRs (and it’s Python)!
Backups: mongodb_consistent_backup
● Simple Restore
  ○ Seamless restore: “mongorestore --oplogReplay --gzip --dir /path/to/backup”
● Restore an Entire Cluster
  ○ Run mongorestore on the config server backups
    ■ If restoring old/SCCC config servers, restore to every node
    ■ If restoring replica-set config servers
      ● Ensure Replica Set is initiated (rs.initiate() / rs.config())
      ● Ensure SECONDARY members are added (via PRIMARY)
      ● Restore to PRIMARY only
  ○ Update “config.shards” documents if shard hosts/ports changed
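Updating a “config.shards” document could look roughly like the sketch below. The shard id, replica set name and host names are placeholders, `updateShardHost` is a hypothetical helper, and the example assumes the shard is a replica set whose host string has the form "setName/host1:port,host2:port".

```javascript
// Point one config.shards document at a new set of hosts.
function updateShardHost(configDb, shardId, replSetName, hosts) {
  const host = replSetName + "/" + hosts.join(",");
  return configDb.shards.updateOne({ _id: shardId }, { $set: { host: host } });
}

// In the mongo shell, connected to the config server PRIMARY (placeholder values):
if (typeof db !== "undefined") {
  updateShardHost(db.getSiblingDB("config"), "shard0", "rs0",
    ["db1.example.com:27017", "db2.example.com:27017"]);
}
```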
Backups: mongodb_consistent_backup
● Restore an Entire Cluster
  ○ Mongorestore each shard from backup subdirectory (matches shard name)
  ○ Start mongos process and test / QA
    ■ Tip: stopping the balancer may simplify troubleshooting any problems
Security
“Think of the network like a public place”
Security: Authorization
● Always enable auth on Production Installs!
● Built-in Roles
  ○ Database User: Read or Write data from collections
    ■ “All Databases” or Single-database
  ○ Database Admin: Non-RW commands (create/drop/list/etc)
  ○ Backup and Restore: the built-in ‘backup’ and ‘restore’ roles
  ○ Cluster Admin: Add/Drop/List shards
  ○ Superuser/Root: All capabilities
Security: Authorization
● User-Defined Roles
  ○ Exact Resource+Action specification
  ○ Very fine-grained ACLs
    ■ DB + Collection specific
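A user-defined role pairs a resource (database + collection) with a list of actions. The sketch below builds such a role document; the role, database and collection names are made up for illustration, and `readOnlyCollectionRole` is a hypothetical helper around the shell's `createRole()`.

```javascript
// Build a role document that allows only "find" on a single collection.
function readOnlyCollectionRole(roleName, dbName, collName) {
  return {
    role: roleName,
    privileges: [
      { resource: { db: dbName, collection: collName }, actions: ["find"] },
    ],
    roles: [], // inherit nothing from other roles
  };
}

// In the mongo shell, as a user allowed to create roles (placeholder names):
if (typeof db !== "undefined") {
  db.getSiblingDB("admin").createRole(readOnlyCollectionRole("ordersReader", "shop", "orders"));
}
```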
Security: Filesystem Access
● Use a service user+group
  ○ ‘mongod’ or ‘mongodb’ on most systems
  ○ Ensure data path, log file and key file(s) are owned by this user+group
● Data Path
  ○ Mode: 0750
● Log File
  ○ Mode: 0640
  ○ Contains real queries and their fields!!!
    ■ See Log Redaction for PSMDB (or MongoDB Enterprise) to remove these fields
Security: Filesystem Access
● Key File(s)
  ○ Files Include: keyFile and SSL certificates or keys
  ○ Mode: 0600
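One way to audit these modes is to check that a file grants no permission bits beyond the recommended maximum. The helper below is a hypothetical sketch; under Node.js the `mode` argument could come from `fs.statSync(path).mode`.

```javascript
// True when `mode` grants no permission bits beyond `allowed`.
function withinMode(mode, allowed) {
  return ((mode & 0o777) & ~allowed) === 0;
}

console.log(withinMode(0o640, 0o640)); // log file at the recommended mode
console.log(withinMode(0o644, 0o640)); // world-readable log file
console.log(withinMode(0o600, 0o600)); // key file at the recommended mode
```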
Security: Network Access
● Firewall
  ○ Single TCP port
    ■ MongoDB Client API
    ■ MongoDB Replication API
    ■ MongoDB Sharding API
  ○ Sharding
    ■ Only the ‘mongos’ process needs access to shards
    ■ Client driver does not need to reach shards directly
  ○ Replication
    ■ All nodes must be accessible to the driver
Security: Network Access
● Internal Authentication: Use a key file to authenticate inter-node replication/sharding traffic
● Creating a dedicated network segment for Databases is recommended!
● At all costs, DO NOT allow MongoDB to talk to the internet!!!
Security: System Access
● Recommended to restrict system access to Database Administrators
● A “shell” on a system can be enough to take the system over!
● SELinux
  ○ Linux Kernel Built-in Security mechanism
  ○ Massively reduces the attack possibilities on a system by using ACLs/policies
  ○ Modes
    ■ Enforcing: Do not allow policy violations
    ■ Permissive: Log and allow policy violations
    ■ Disabled: I really don’t like security!
  ○ Enforcing mode supported with Percona Server for MongoDB when using CentOS / RHEL 7+ RPMs
    ■ SELinux NOT supported by MongoDB Community or Enterprise binaries!!
Security: External Authentication
● LDAP Authentication
  ○ Supported in PSMDB and MongoDB Enterprise
  ○ The following components are necessary for external authentication to work
    ■ LDAP Server
    ■ SASL Daemon
    ■ SASL Library
Security: External Authentication
● LDAP Authentication
  ○ Creating a User:
      db.getSiblingDB("$external").createUser({
        user: "christian",
        pwd: "secret",
        roles: [
          { role: "read", db: "test" }
        ]
      });
  ○ Authenticating as a User:
      db.getSiblingDB("$external").auth({
        mechanism: "PLAIN",
        user: "christian",
        pwd: "secret",
        digestPassword: false
      })
  ○ Other auth methods possible with MongoDB Enterprise
Security: SSL Connections and Auth
● SSL / TLS Connections
  ○ Supported since MongoDB 2.6.x
    ■ May need to compile it in yourself on older binaries
    ■ Supported 100% in Percona Server for MongoDB
  ○ Minimum of 128-bit key length for security
  ○ Relaxed and strict (requireSSL) modes
  ○ System (default) or Custom Certificate Authorities are accepted
Security: SSL Connections and Auth
● SSL Client Authentication (x509)
  ○ MongoDB supports x.509 certificate authentication for use with a secure TLS/SSL connection as of 2.6.x.
  ○ The x.509 client authentication allows clients to authenticate to servers with certificates rather than with a username and password.
  ○ Enabled for inter-node (cluster member) authentication with: security.clusterAuthMode: x509
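For client certificates, the user is created in the "$external" database with the certificate's subject as the user name and no password. The sketch below assumes a made-up subject DN; `x509User` is a hypothetical helper, while the "MONGODB-X509" mechanism name is the one MongoDB uses for this kind of login.

```javascript
// Build the createUser document for an x.509 user (no password is stored).
function x509User(subjectDn, roles) {
  return { user: subjectDn, roles: roles };
}

// Placeholder subject DN, for illustration only.
const subject = "CN=app1,OU=clients,O=Example,C=US";

if (typeof db !== "undefined") {
  const external = db.getSiblingDB("$external");
  // Run once as an administrator:
  external.createUser(x509User(subject, [{ role: "read", db: "test" }]));
  // Later, from a client connected with that certificate:
  // external.auth({ mechanism: "MONGODB-X509", user: subject });
}
```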
Security: Encryption at Rest
● MongoDB Enterprise
  ○ Encryption supported in Enterprise binaries ($$$)
● Percona Server for MongoDB
  ○ Use CryptFS/LUKS block device for encryption of data volume
  ○ Documentation published (or coming soon)
  ○ Completely open-source / Free
Security: Encryption at Rest
● Application-Level
  ○ Selectively encrypt only required fields in application
  ○ Benefits
    ■ The data is only readable by the application (reduced touch points)
    ■ The resource cost of encryption is lower when it’s applied selectively
    ■ Offloading of encryption overhead from database
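A minimal sketch of selective field encryption in a Node.js application, using the built-in crypto module with AES-256-GCM. The field names and the randomly generated key are placeholders; real key management is out of scope here and is assumed to exist elsewhere.

```javascript
const crypto = require("crypto");

// Encrypt one field value; returns base64 strings that can be stored in a document.
function encryptField(key, plaintext) {
  const iv = crypto.randomBytes(12); // fresh nonce for every value
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(String(plaintext), "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypt a value produced by encryptField; throws if the data was tampered with.
function decryptField(key, enc) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, Buffer.from(enc.iv, "base64"));
  decipher.setAuthTag(Buffer.from(enc.tag, "base64"));
  const plain = Buffer.concat([decipher.update(Buffer.from(enc.data, "base64")), decipher.final()]);
  return plain.toString("utf8");
}

const key = crypto.randomBytes(32); // placeholder key, normally loaded from a secret store
const doc = { name: "tim", secretNote: encryptField(key, "only the app can read this") };
console.log(decryptField(key, doc.secretNote));
```

Only `secretNote` is encrypted; `name` stays queryable and indexable by the database.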
Security: Network Firewall
● MongoDB only requires a single TCP port to be reachable (to all nodes)
  ○ Default port 27017
  ○ This does not include monitoring tools, etc
    ■ Percona PMM requires inbound connectivity to 1-2 TCP ports
● Restrict TCP port access to nodes that require it!
● Sharded Cluster
  ○ Application servers only need access to ‘mongos’
  ○ Block direct TCP access from application -> shard/mongod instances
    ■ Unless ‘mongos’ is bound to localhost!
Security: Network Firewall
● Advanced
  ○ Move inter-node replication to its own network fabric, VLAN, etc
  ○ Accept client connections on a Public interface
MongoDB: Source IP Restrictions
● “authenticationRestrictions” added to db.createUser() in MongoDB 3.6
● Allows access restriction by client source IP(s) and/or IP range(s)
● Example:
    db.createUser({
      user: "admin",
      pwd: "insertSecurePasswordHere",
      roles: [
        { db: "admin", role: "root" }
      ],
      authenticationRestrictions: [
        { clientSource: [ "127.0.0.1", "10.10.19.0/24" ] }
      ]
    })
Monitoring
"If a tree falls in a forest and no one is around to hear it, does it make a sound?"
Monitoring: Methodology
● Monitor often
  ○ Polling every 60 - 300 seconds is not often enough!
  ○ Problems can begin/end in seconds
● Correlate Database and Operating System together!
● Monitor a lot
  ○ Store more than you graph
  ○ Example: PMM gathers 700-900 metrics per poll
Monitoring: Methodology
● Process
  ○ Add monitoring
  ○ Use monitoring to troubleshoot Production events / incidents
  ○ Iterate and Improve monitoring
    ■ Add graphing for whatever made you SSH to a host
  ○ Review with someone unfamiliar with the problem
    ■ If reviewer can’t see the problem, start over!
Monitoring: Important Metrics
● Database
  ○ Operation counters
  ○ Cache Traffic and Capacity
  ○ Checkpoint / Compaction Performance
  ○ Concurrency Tickets (WiredTiger and RocksDB)
  ○ Document and Index scanning
  ○ Various engine-specific details
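Operation counters are cumulative, so a monitoring system turns two samples into a rate. The sketch below shows that calculation; `opRates` is a hypothetical helper, and the shell portion assumes the counters in `db.serverStatus().opcounters` convert cleanly to plain numbers.

```javascript
// Per-second rates between two samples of cumulative operation counters.
function opRates(prev, curr, seconds) {
  const rates = {};
  for (const op of Object.keys(curr)) {
    rates[op] = (Number(curr[op]) - Number(prev[op] || 0)) / seconds;
  }
  return rates;
}

// In the mongo shell: sample twice, one second apart (sleep takes milliseconds).
if (typeof db !== "undefined") {
  const first = db.serverStatus().opcounters;
  sleep(1000);
  printjson(opRates(first, db.serverStatus().opcounters, 1));
}
```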
Monitoring: Important Metrics
● Operating System
  ○ CPU
    ■ User
      ● Compression (WiredTiger and RocksDB)
      ● Encryption
      ● Sorting, aggregations, groupings, etc
    ■ System
      ● Connections
      ● IO Scheduling
  ○ Disk
    ■ Bandwidth / Util
    ■ Average Wait Time
  ○ Memory and Network
Monitoring: Percona PMM
● Open-source monitoring from Percona!
● Based on open-source technology
  ○ Prometheus
  ○ Grafana
  ○ Go Language
● Simple deployment
● Examples in this demo are from PMM!
● Correlation of OS and DB Metrics
● 800+ metrics per ping
To be continued...
April 26th, 2018 May 3rd, 2018
Questions?