PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Charlie Sharpsteen,...
Keeping an Eye on the PE Stack: An Introduction to Measuring and Tuning PE Performance
Charlie Sharpsteen, Puppet Inc.
Overview
• How do I measure PE performance? What sources of data are available?
• What numbers are actually important?
• What settings can I adjust when important metrics start showing unhealthy trends?
Gathering Data From PE Services: JVM Logging and Metrics
PE Server Components
• TrapperKeeper JVM: Puppet Server, PuppetDB, Console Services, Orchestration Services
• JVM: ActiveMQ
• Other: PostgreSQL, NGINX
Mostly Java based with shared logging and metrics interfaces.
TrapperKeeper Logging
• Configuration for main logs can be found in: /etc/puppetlabs/<service name>/logback.xml
• Controls output destinations, log levels and message formatting.
• Ship to a log aggregator to provide context for investigations.
• Default log pattern is: Date Level [Java Namespace] message
• Puppet Server also includes thread ID: Date Level [thread] [Java Namespace] message
• Thread ID is useful for grouping activity related to a single request.
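As a rough sketch of how the thread ID can be used, the following Python groups Puppet Server log messages by thread. It assumes the "Date Level [thread] [Java Namespace] message" layout described above, with the date printed as two whitespace-separated tokens; the sample lines and thread names are invented for illustration.

```python
import re
from collections import defaultdict

# Assumed layout: Date Level [thread] [Java Namespace] message
LINE_RE = re.compile(
    r"^(?P<date>\S+ \S+)\s+(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+\[(?P<namespace>[^\]]+)\]\s+(?P<message>.*)$"
)

def group_by_thread(lines):
    """Collect log messages per thread ID, skipping lines that do not match."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            groups[match.group("thread")].append(match.group("message"))
    return dict(groups)

# Hypothetical sample lines, not taken from a real server.
SAMPLE = [
    "2016-10-20 13:30:01,123 INFO  [qtp-1] [puppet-server] Compiling catalog for agent1",
    "2016-10-20 13:30:01,456 INFO  [qtp-2] [puppet-server] Compiling catalog for agent2",
    "2016-10-20 13:30:03,789 INFO  [qtp-1] [puppet-server] Compiled catalog for agent1",
]

for thread, messages in group_by_thread(SAMPLE).items():
    print(thread, messages)
```

Lines that do not match the pattern, such as stack trace continuations, are silently skipped in this sketch.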
TrapperKeeper Logging
• Configuration for HTTP access logs can be found in: /etc/puppetlabs/<service name>/request-logging.xml
• Default format is Apache Combined Log + request duration
• Easily parsed by most log processors.
• Can add additional bits of information such as request headers.
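To illustrate how easily this format can be parsed, here is a minimal Python sketch that pulls the slowest requests out of access log lines. It assumes the Apache Combined Log fields followed by a single trailing integer for the request duration; the exact field order and duration units depend on the pattern configured in request-logging.xml, and the sample lines are invented.

```python
import re

# Assumed layout: Apache Combined Log fields plus a trailing request duration.
ACCESS_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "[^"]*" (?P<duration>\d+)$'
)

def slowest_requests(lines, limit=5):
    """Return (duration, method, path) tuples, slowest first."""
    parsed = []
    for line in lines:
        match = ACCESS_RE.match(line)
        if match:
            parsed.append(
                (int(match.group("duration")), match.group("method"), match.group("path"))
            )
    return sorted(parsed, reverse=True)[:limit]

# Hypothetical sample lines, not taken from a real server.
SAMPLE = [
    '10.0.0.5 - - [20/Oct/2016:13:30:01 +0000] "POST /puppet/v3/catalog/agent1 HTTP/1.1" 200 12345 "-" "Puppet/4.7.0" 1532',
    '10.0.0.6 - - [20/Oct/2016:13:30:02 +0000] "GET /puppet/v3/node/agent2 HTTP/1.1" 200 2048 "-" "Puppet/4.7.0" 87',
    '10.0.0.7 - - [20/Oct/2016:13:30:03 +0000] "PUT /puppet/v3/report/agent3 HTTP/1.1" 200 9 "-" "Puppet/4.7.0" 410',
]

for duration, method, path in slowest_requests(SAMPLE, limit=2):
    print(duration, method, path)
```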
TrapperKeeper Metrics
• Metrics are recorded using JMX MBeans.
• Metrics that measure activity over time are weighted to represent the last 5 minutes.
• Metrics can be retrieved via the JMX protocol.
• Full access to all available metrics and all available measurements.
• Can attach tools such as JConsole and JVisualVM.
• Requires additional ports to be opened, configuration can be complex. Java tools only.
• Metrics can be retrieved as JSON over HTTP:
• For a curated set of common metrics: status/v1?level=debug
• For access to all available metrics: metrics/v1/mbeans
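The HTTP route can be sketched in Python as follows. This is an illustration under assumptions rather than a reference client: the full path /status/v1/services, port 8140, and the response shape (a JSON object keyed by service name, each with a "state" field) are assumptions about the status endpoint, and the sample response is invented.

```python
import json
import ssl
import urllib.request

def status_url(host, port=8140, level="debug"):
    """Build the status URL (path and port are assumptions, see lead-in)."""
    return f"https://{host}:{port}/status/v1/services?level={level}"

def fetch_json(url, ca_file):
    """Fetch a URL over HTTPS, verifying the server against the given CA file."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return json.load(response)

def service_states(status):
    """Reduce a status response to a {service name: state} mapping."""
    return {name: body.get("state") for name, body in status.items()}

# Hypothetical response fragment, used here instead of a live request.
SAMPLE_STATUS = {
    "pe-jruby-metrics": {"state": "running", "status": {}},
    "pe-master": {"state": "running", "status": {}},
}

print(status_url("puppet.example.com"))
print(service_states(SAMPLE_STATUS))
```

Against a live server, `fetch_json(status_url(host), ca_file)` would supply the dictionary passed to `service_states`; the CA file location varies by installation.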
TrapperKeeper Configuration
• Configuration files are stored under: /etc/puppetlabs/<service name>/conf.d
• Most important settings are managed by puppet_enterprise::profile classes and are tunable via the Console and Hiera.
• JVM settings are specified in /etc/sysconfig or /etc/default
• The JVM memory limit, -Xmx, is the primary tunable setting. Enable the G1 garbage collector when using limits higher than 10 GB: -XX:+UseG1GC
• These flags are configurable via the java_args parameter on profile classes.
Puppet Server: It’s all about the JRubies.
Puppet Server Metrics Overview
● JVM resource usage: status-service
● JMX namespace: java.lang:*
● HTTP request times per endpoint: pe-master
● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.http.*
● Catalog Compilation metrics: pe-puppet-profiler
● JMX namespaces: puppetserver:name=puppetlabs.<fqdn>.compiler.*, puppetserver:name=puppetlabs.<fqdn>.functions.*, puppetserver:name=puppetlabs.<fqdn>.puppetdb.*
● JRuby Metrics: pe-jruby-metrics
● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.jruby.*
New PE 2016.4.0 Features
● The metrics/v1/mbeans endpoint has been added to Puppet Server. Must be enabled via Hiera: puppet_enterprise::master::puppetserver::metrics_webservice_enabled: true
● The Graphite metrics reporter has been optimized and extended:
● Only a subset of available metrics is reported by default.
● Reported metrics can be customized using the metrics_puppetserver_metrics_allowed parameter of the puppet_enterprise::profile::master class.
JRuby Metrics
● Almost all Puppet Server requests must be handled by a JRuby instance — this makes JRuby availability the primary performance bottleneck.
● num-free-jrubies
● Measures spare capacity for incoming requests.
● average-wait-time
● Should never grow to a significant fraction of HTTP request times.
● Impacted by agent checkin distribution, resource availability, Puppet plugins and code.
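A simple health check built on these two metrics might look like the Python sketch below. The 25% threshold is an arbitrary illustrative choice, not a recommendation from the talk, and the sketch assumes average-wait-time and the mean HTTP request time are expressed in the same units.

```python
def jruby_warnings(metrics, mean_request_time, max_wait_fraction=0.25):
    """Return warning strings based on free JRubies and wait time.

    max_wait_fraction is a hypothetical threshold chosen for illustration.
    """
    warnings = []
    if metrics["num-free-jrubies"] == 0:
        warnings.append("no free JRubies: incoming requests will queue")
    wait = metrics["average-wait-time"]
    if mean_request_time > 0 and wait / mean_request_time > max_wait_fraction:
        warnings.append("average-wait-time is a large fraction of request time")
    return warnings

# Hypothetical metric values.
print(jruby_warnings({"num-free-jrubies": 0, "average-wait-time": 900}, 2000))
print(jruby_warnings({"num-free-jrubies": 3, "average-wait-time": 5}, 2000))
```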
Agent Checkin Activity
● Agents check in one runinterval after the start of their last run, which can lead to pile-ups or “thundering herds”. Be careful of:
● Starting or re-starting a group of agents without the splay setting enabled.
● Triggering a group of agent runs via: mco puppet runonce
● Monitor average-requested-jrubies and Puppet Server access logs for spikes in agent activity.
● Use PostgreSQL to pull a histogram of agent start times from report data:
sudo su - pe-postgres -s /bin/bash -c "psql -d pe-puppetdb"
SELECT date_part('minute', start_time), count(*)
FROM reports
WHERE start_time BETWEEN '2016-10-20 13:30:00' AND '2016-10-20 14:30:00'
GROUP BY date_part('minute', start_time)
ORDER BY date_part('minute', start_time) ASC;
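The same minute-of-hour histogram can be computed outside the database, for example from start times exported from reports. The Python sketch below mirrors the grouping done by date_part('minute', ...); the timestamps are invented sample data.

```python
from collections import Counter
from datetime import datetime

def starts_per_minute(start_times):
    """Count agent run starts per minute of the hour, ordered by minute."""
    counts = Counter(ts.minute for ts in start_times)
    return sorted(counts.items())

# Hypothetical report start times.
SAMPLE = [
    datetime(2016, 10, 20, 13, 30, 5),
    datetime(2016, 10, 20, 13, 30, 40),
    datetime(2016, 10, 20, 13, 31, 10),
    datetime(2016, 10, 20, 14, 5, 0),
]

for minute, count in starts_per_minute(SAMPLE):
    print(f"{minute:02d} {'#' * count}")
```

A flat histogram suggests well-distributed checkins; tall spikes at a few minutes point to a thundering herd.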
Re-balancing Agent Checkins
● Use MCollective to orchestrate a batched re-start:
su - peadmin -c "mco rpc service stop service=puppet"
su - peadmin -c "mco rpc service start service=puppet --batch 1 --batch-sleep <runinterval in seconds / #nodes>"
● Batching is not necessary if the agents have splay enabled.
● For a stable distribution that isn’t affected by re-starts, puppet agent -t can be run on a schedule determined by the fqdn_rand() function instead of using the service.
● Load due to agent activity can be cut dramatically by shifting to the Direct Puppet workflow where Orchestrator or MCollective are used to push catalog updates.
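Two of the calculations above can be sketched in Python: the --batch-sleep value (runinterval in seconds divided by the number of nodes) and a stable per-node schedule offset. The offset function only imitates the idea behind fqdn_rand(), a value derived deterministically from the node name; it does not reproduce Puppet's actual algorithm, and the hostnames are made up.

```python
import hashlib

def batch_sleep_seconds(runinterval_seconds, node_count):
    """Sleep between single-node batches so restarts span one runinterval."""
    return runinterval_seconds / node_count

def stable_minute(fqdn, period_minutes=30):
    """Deterministic minute offset for a node (illustrative, not fqdn_rand itself)."""
    digest = hashlib.sha256(fqdn.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % period_minutes

# 600 nodes on a 30 minute runinterval: start one node every 3 seconds.
print(batch_sleep_seconds(1800, 600))

for host in ("web01.example.com", "db01.example.com"):
    print(host, stable_minute(host))
```

Because the offset depends only on the node name, it survives service restarts and reboots, which is what keeps the distribution stable.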
Adding More JRuby Capacity
● JRuby count is set via jruby_max_active_instances, constrained by available CPU and RAM:
● Compile masters tend to top out around NCPU - 1. Monolithic masters need to share with PuppetDB and tend more towards (NCPU / 2 - 1).
● RAM requirements are 512 MB per JRuby, but may need to be increased if catalog compilation uses large datasets or dozens of environments are in use.
● The environment_timeout setting can be used to reduce the CPU requirements of catalog compilation. Set to 0 globally and unlimited for long-lived environments with lots of agents.
● Each environment using an unlimited timeout will add to the per-JRuby RAM requirements. Monitor memory usage of pre-2016.4.0 installations closely when using unlimited timeouts.
● Code Manager should be enabled when an unlimited timeout is used so that caches are flushed when new code is deployed.
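The sizing rules of thumb above translate into a short calculation. The Python sketch below applies NCPU - 1 for compile masters, NCPU / 2 - 1 for monolithic masters, and 512 MB per JRuby; the floor of one JRuby and the function names are illustrative additions, and the result is a starting point rather than a tuned value.

```python
def estimate_jrubies(ncpu, monolithic=False):
    """Starting point for jruby_max_active_instances from CPU count."""
    count = ncpu // 2 - 1 if monolithic else ncpu - 1
    return max(count, 1)  # never go below one JRuby

def jruby_heap_mb(jruby_count, mb_per_jruby=512):
    """RAM needed for the JRubies alone; large datasets or many environments need more."""
    return jruby_count * mb_per_jruby

for ncpu, monolithic in ((8, False), (8, True), (16, True)):
    count = estimate_jrubies(ncpu, monolithic)
    print(ncpu, "cores, monolithic" if monolithic else "cores, compile master",
          "->", count, "JRubies,", jruby_heap_mb(count), "MB")
```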
Investigating Compile Times
● PE Puppet Server tracks compilation time on several different levels: per-node, per-environment, per-resource, per-function, and more.
● Top 10 resources and functions are available via the status API and Puppet Server performance dashboard: https://<puppetmaster>:8140/puppet/experimental/dashboard.html
● Full access available through JMX and the metrics API.
● Detailed timing on catalog compilation can be obtained by setting the Puppet Server log level to DEBUG and running puppet agent -t --profile on nodes of interest.
Investigating Agent Run Times
● Agent run summaries are stored at: /opt/puppetlabs/puppet/cache/state/last_run_summary.yaml
● Summaries are also stored by PuppetDB and can be viewed from the PE Console, or queried: reports[metrics] { latest_report? = true and certname = '<node name>' }
● The time section shows amount of time taken per resource type along with config_retrieval measuring the amount of time it took to receive a catalog.
● Per-resource timing can be logged by running: puppet agent -t --evaltrace
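Once the time section has been loaded (from the YAML summary or from a PuppetDB query result), ranking its entries is straightforward. The Python sketch below works on an already-parsed dictionary; the sample values are invented, and treating "total" and "last_run" as bookkeeping keys to skip is an assumption about the summary layout.

```python
def slowest_types(time_section, limit=5):
    """Rank entries of the time section by seconds spent, slowest first."""
    skip = {"total", "last_run"}  # assumed bookkeeping keys, not resource types
    timings = [(name, secs) for name, secs in time_section.items() if name not in skip]
    return sorted(timings, key=lambda item: item[1], reverse=True)[:limit]

# Hypothetical time section from a run summary.
SAMPLE_TIME = {
    "config_retrieval": 4.2,
    "file": 1.5,
    "package": 9.8,
    "service": 0.7,
    "total": 16.2,
    "last_run": 1476970200,
}

for name, seconds in slowest_types(SAMPLE_TIME, limit=3):
    print(f"{name}: {seconds}s")
```

A large config_retrieval value points at the master (catalog compilation or JRuby wait), while large per-type values point at work done on the agent.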
PuppetDB: Processing Time and Storage Space
PuppetDB Storage Usage
● Monitor disk space!
/opt/puppetlabs/server/data/postgresql/
/opt/puppetlabs/server/data/puppetdb/
● If disk space runs out, there are two options for returning space to the operating system:
● The existing volume can be enlarged so that a VACUUM FULL can be run.
● Alternately, a new volume can be attached for a database backup and restore.
● The primary source of disk usage is report storage; this can be tuned by setting: report-ttl
● For infrastructure with high node turnover, consider setting node-purge-ttl to remove data related to decommissioned nodes.
PuppetDB Command Processing
● Every PuppetDB operation, aside from queries, is executed by an asynchronous command processing queue. This queue is managed by an internal ActiveMQ server:
org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Queue,destinationName=puppetlabs.puppetdb.commands
● Important metrics:
● Backlog of commands waiting for processing: QueueSize
● Largest command seen: MaxMessageSize
● Available memory for in-flight commands: MemoryPercentUsage
● Increase PuppetDB heap size along with the command-processing.memory-usage setting if the percentage spikes close to 100%. This will prevent ActiveMQ from paging commands to disk.
PuppetDB Command Processing
● Command processing rates:
puppetlabs.puppetdb.mq:name=global.processing-time
puppetlabs.puppetdb.storage:name=replace-facts-time
puppetlabs.puppetdb.storage:name=replace-catalog-time
puppetlabs.puppetdb.storage:name=store-report-time
● Additional processing threads can be added using the command-processing.threads setting.
● On a monolithic install, PuppetDB processing threads must be balanced against Puppet Server JRubies and the number of CPU cores available.
PostgreSQL Query Performance
● PostgreSQL configuration can be found in: /opt/puppetlabs/server/data/postgresql/9.4/data/postgresql.conf
● Add settings to improve logging around slow queries:
log_min_duration_statement = 3000ms
log_temp_files = 0
● If a temp file shows up in the logs, Postgres had to perform an operation outside of RAM, which is slow. Consider increasing the work_mem setting to be greater than the size of the temp files used.
● If query performance has been dropping over time, a database VACUUM may be needed: su - pe-postgres -s /bin/bash -c "vacuumdb --analyze --verbose --all"
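To turn logged temp files into a work_mem candidate, the log can be scanned for temp file sizes. The Python sketch below assumes PostgreSQL's usual message form, 'temporary file: path "...", size N' with N in bytes; the log prefix and sample lines are invented, and the result is only the smallest whole-megabyte value larger than the biggest temp file seen.

```python
import re

# Assumed message form written when log_temp_files is enabled.
TEMP_FILE_RE = re.compile(r'temporary file: path "(?P<path>[^"]+)", size (?P<size>\d+)')

def largest_temp_file_bytes(log_lines):
    """Return the largest temp file size found in the log, or 0 if none."""
    sizes = []
    for line in log_lines:
        match = TEMP_FILE_RE.search(line)
        if match:
            sizes.append(int(match.group("size")))
    return max(sizes, default=0)

def work_mem_candidate_mb(size_bytes):
    """Smallest whole number of megabytes strictly greater than size_bytes."""
    return size_bytes // (1024 * 1024) + 1

# Hypothetical log excerpt.
SAMPLE_LOG = [
    '2016-10-20 13:30:01 UTC LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp24068.0", size 223412224',
    '2016-10-20 13:30:02 UTC LOG:  duration: 3512.331 ms  statement: SELECT 1',
    '2016-10-20 13:31:10 UTC LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp24070.0", size 1048576',
]

largest = largest_temp_file_bytes(SAMPLE_LOG)
print(largest, "bytes ->", work_mem_candidate_mb(largest), "MB")
```

Since work_mem applies per operation and per connection, any candidate value should be weighed against the total RAM available to PostgreSQL.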
Resources
Logging:
• Directing Output: http://logback.qos.ch/manual/appenders.html
• Formatting Main Logs: http://logback.qos.ch/manual/layouts.html
• Formatting Access Logs: http://logback.qos.ch/manual/layouts.html#logback-access
JMX:
• Configuration: https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html
• Metric Polling Tool: https://github.com/jmxtrans/jmxtrans
Resources
Puppet Server:
• Metrics Reference: https://docs.puppet.com/pe/2016.4/puppet_server_metrics.html
• Configuration Reference: https://docs.puppet.com/puppetserver/2.6/configuration.html
• Direct Puppet Workflow: https://docs.puppet.com/pe/2016.4/direct_puppet_workflow.html
PuppetDB:
• Metrics Reference: https://docs.puppet.com/puppetdb/4.2/api/metrics/v1/mbeans.html
• Configuration Reference: https://docs.puppet.com/puppetdb/4.2/configure.html
• Backup Procedures: https://docs.puppet.com/pe/2016.4/maintain_console-db.html
• PostgreSQL Maintenance: https://github.com/npwalker/pe_databases