Metrics with Ganglia
gareth rushgrove | morethanseven.net
Collecting Metrics With Ganglia and Friends
Cambridge Geek Night 28th March 2011
http://www.flickr.com/photos/memestate/45986749
Gareth Rushgrove
Work at FreeAgent
freeagentcentral.com
Blog at morethanseven.net
Curate devopsweekly.com
Covering (Business Version)
- Capacity planning metrics
- Metrics for your application
- Business analytics
- Having everything in one place
Covering (Tech Version)
- Ganglia: store metrics and view graphs
- Logster: get log files into Ganglia
- Gmetric: get anything into Ganglia
- Syslog: using Loggly to view individual log items
Everyone Uses Something Like...
Use Something Like This Too
What is Ganglia?
“Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.”
ganglia.sourceforge.net
Example: vagrantbox.es
Load Averages
CPU
Aggregate Graphs
Across Entire Cluster
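The aggregate graphs above are built from per-host values. Ganglia's gmond daemon can report all of these values as one XML document. The sketch below sums one metric across every host in such a document, which is roughly what a cluster-wide graph shows. It makes two assumptions: gmond's default TCP port 8649, and the GANGLIA_XML / CLUSTER / HOST / METRIC element layout. SAMPLE_XML is a hand-written, heavily trimmed, hypothetical example rather than real gmond output, and the helper names are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed gmond-style XML (real output has many more attributes and metrics)
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="example">
<HOST NAME="web1" IP="10.0.0.1">
<METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=" "/>
</HOST>
<HOST NAME="web2" IP="10.0.0.2">
<METRIC NAME="load_one" VAL="1.25" TYPE="float" UNITS=" "/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond writes to a client on its TCP port (8649 assumed)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # gmond closes the connection after the dump
                break
            chunks.append(chunk)
    return b"".join(chunks)


def metric_by_host(xml_doc, metric):
    """Map host name -> numeric value of one metric; xml_doc may be str or bytes."""
    root = ET.fromstring(xml_doc)
    values = {}
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                values[host.get("NAME")] = float(m.get("VAL"))
    return values


def cluster_summary(xml_doc, metric):
    """Sum and mean of one metric across all hosts, like an aggregate graph."""
    values = metric_by_host(xml_doc, metric)
    total = sum(values.values())
    mean = total / len(values) if values else 0.0
    return {"hosts": len(values), "sum": total, "mean": mean}


if __name__ == "__main__":
    print(cluster_summary(SAMPLE_XML, "load_one"))
```

Against a running gmond, the equivalent call would be `cluster_summary(fetch_gmond_xml("localhost"), "load_one")`.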
Predicting When Your System Will Fail
“A strategy for anticipating future workloads of your computers, with the aim of creating a computing environment that can handle future workload.”
IBM
Disk Space
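Capacity planning often comes down to extrapolating a graph like the disk space one. One simple approach among many is to fit a least-squares straight line through recent (time, space used) samples, then solve for the point where that line reaches the disk's capacity. The function names and the sample numbers in the usage note are hypothetical. A straight line is itself an assumption, and it breaks down for bursty or seasonal growth.

```python
def fit_line(samples):
    """Ordinary least-squares fit of used = intercept + slope * t to (t, used) pairs."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover at least two distinct times")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = cov / var_t
    return mean_u - slope * mean_t, slope


def time_until_full(samples, capacity):
    """Time after the latest sample until the fitted line reaches capacity.

    Returns None when usage is flat or shrinking (no predicted exhaustion).
    """
    intercept, slope = fit_line(samples)
    if slope <= 0:
        return None
    t_full = (capacity - intercept) / slope
    last_t = max(t for t, _ in samples)
    return max(0.0, t_full - last_t)


if __name__ == "__main__":
    # Hypothetical daily samples: (day, GB used) on a 100 GB disk
    samples = [(0, 50), (1, 52), (2, 54), (3, 56)]
    print(time_until_full(samples, 100))  # → 22.0 (days)
```

The units are whatever the samples use. Here the samples grow by 2 GB per day, so the fitted line reaches 100 GB on day 25, which is 22 days after the last sample.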
Monitoring Your Application
86.26.7.33 - - [26/Mar/2011:20:39:52 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.1" 200 2081 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-us) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
Web Server Logs
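Each line in the web server log above uses the common "combined" format. The fields are client address, identity, user, timestamp, request line, status, response size, referrer and user agent. A line has to be split into those fields before any of it can become a metric. The regular expression below is a sketch that handles lines shaped like the ones shown. It is not a complete parser for every log format option, and the function names are hypothetical.

```python
import re
from collections import Counter

# "combined" log format: host ident user [time] "request" status size "referer" "agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Servers log "-" as the size when no body was sent
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


def requests_by_agent(lines):
    """Count parsed requests per user agent, skipping lines that don't match."""
    return Counter(e["agent"] for e in map(parse_line, lines) if e is not None)
```

Applied to the log above, `requests_by_agent` separates the FunkLoad load-test traffic from the single Safari request.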
Logster from Etsy
Tail a log file and filter each line to generate metrics that can be sent to common monitoring packages.
Options:
  -p METRIC_PREFIX, --metric-prefix=METRIC_PREFIX
      Add prefix to all published metrics. This is for people that may multiple instances of same service on same host.
  --gmetric-options=GMETRIC_OPTIONS
      Options to pass to gmetric such as -d 180 -c /etc/ganglia/gmond.conf (default). These are passed directly to gmetric.
  --graphite-host=GRAPHITE_HOST
      Hostname and port for Graphite collector, e.g. graphite.example.com:2003
  -s STATE_DIR, --state-dir=STATE_DIR
      Where to store the logtail state file. Default location /var/run
  -d, --dry-run
      Parse the log file but send stats to standard output.
  -D, --debug
      Provide more verbose logging for debugging.
Logster
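Logster does not reread the whole log on every run. It keeps a logtail state file (see the --state-dir option above) recording how far it got, and it parses only the lines added since then. The sketch below shows the idea. Its state file is hypothetical: a JSON file holding the log's inode and byte offset, which is not logster's or logtail's actual format. Restarting from zero when the inode changes or the file shrinks is a simple way to cope with log rotation. The trade-off is that any lines written to the old file after the last run are lost.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call."""
    st = os.stat(log_path)
    state = {"inode": None, "offset": 0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    offset = state["offset"]
    # A new inode (rotated) or a smaller file (truncated) means start again
    if state["inode"] != st.st_ino or st.st_size < offset:
        offset = 0

    # Binary mode so byte offsets from seek/read are exact
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Only consume complete lines; a partial last line waits for the next run
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    with open(state_path, "w") as f:
        json.dump({"inode": st.st_ino, "offset": offset + end}, f)
    return lines
```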
logster SampleGangliaLogster /../access.log
Logster Command Line
HTTP Responses with a 2xx Status Code
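A graph like this one needs a parser that classifies each new log line by status code and reports a rate. The class below is a standalone sketch of that idea. It is loosely modelled on the parse_line / get_state(duration) methods of logster's sample parsers, but it does not import or subclass logster; check the logster source for its real base class and return types. The class name and the metric names (http_1xx to http_5xx) are illustrative.

```python
import re


class StatusCodeRates:
    """Count responses per status class and report them as per-second rates."""

    # The status code follows the closing quote of the request line, e.g. HTTP/1.0" 200
    STATUS_RE = re.compile(r'HTTP/1\.\d" (?P<code>\d{3}) ')

    def __init__(self):
        self.counts = {"http_%dxx" % c: 0 for c in range(1, 6)}

    def parse_line(self, line):
        match = self.STATUS_RE.search(line)
        if match is None:
            return  # not a request line we understand; ignore it
        key = "http_%sxx" % match.group("code")[0]
        if key in self.counts:
            self.counts[key] += 1

    def get_state(self, duration):
        """Responses per second for each status class over `duration` seconds."""
        return {name: count / duration for name, count in self.counts.items()}
```

Dividing by the time since the previous run turns raw counts into a rate. Ganglia can then graph that rate directly, whatever interval the parser happens to run at.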
The Ganglia Metric Client (gmetric) announces a metric on the list of defined send channels defined in a configuration file
Usage: gmetric [OPTIONS]...
  -V, --version         Print version and exit
  -c, --conf=STRING     The configuration file to use for finding send channels (default='/etc/ganglia/gmond.conf')
  -n, --name=STRING     Name of the metric
  -v, --value=STRING    Value of the metric
  -t, --type=STRING     Either string|int8|uint8|int16|uint16|int32|uint32|float|double
  -u, --units=STRING    Unit of measure for the value e.g. Kilobytes, Celcius (default='')
  -s, --slope=STRING    Either zero|positive|negative|both (default='both')
  -x, --tmax=INT        The maximum time in seconds between gmetric calls (default='60')
  -d, --dmax=INT        The lifetime in seconds of this metric (default='0')
  -S, --spoof=STRING    IP address and name of host/device (colon separated) we are spoofing (default='')
  -H, --heartbeat       spoof a heartbeat message (use with spoof option)
Gmetric
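When calling gmetric from a script, check the --type and --slope values against the choices in the usage text above before running it. The helper below builds an argument list using the long option names from that usage text, and it only runs the command when asked. The function names are hypothetical. Leaving out --units when it is empty is a choice made for this sketch, not something gmetric requires.

```python
import subprocess

# Allowed values, taken from the gmetric usage text
GMETRIC_TYPES = {"string", "int8", "uint8", "int16", "uint16",
                 "int32", "uint32", "float", "double"}
GMETRIC_SLOPES = {"zero", "positive", "negative", "both"}


def gmetric_args(name, value, metric_type="float", units="", slope="both",
                 tmax=60, dmax=0, conf=None):
    """Build a gmetric command line as a list, safe to run without a shell."""
    if metric_type not in GMETRIC_TYPES:
        raise ValueError("unknown gmetric type: %r" % metric_type)
    if slope not in GMETRIC_SLOPES:
        raise ValueError("unknown gmetric slope: %r" % slope)
    args = ["gmetric", "--name=%s" % name, "--value=%s" % value,
            "--type=%s" % metric_type, "--slope=%s" % slope,
            "--tmax=%d" % tmax, "--dmax=%d" % dmax]
    if units:
        args.append("--units=%s" % units)
    if conf:
        args.append("--conf=%s" % conf)
    return args


def send(args, dry_run=False):
    """Print the command in dry-run mode; otherwise run it and raise on failure."""
    if dry_run:
        print(" ".join(args))
        return
    subprocess.run(args, check=True)
```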
Gmetric Scripts for Common Applications
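Most gmetric scripts follow the same pattern: measure something Ganglia does not collect by default, then hand the value to gmetric. As an illustration, the sketch below uses Python's shutil.disk_usage to report how full one filesystem is. It passes the value with gmetric's short -n, -v, -t and -u options. The metric naming scheme (disk_used_percent_ followed by a label) and the function names are hypothetical.

```python
import shutil
import subprocess


def disk_used_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_metric_args(path="/", label="root"):
    """gmetric arguments for a hypothetical disk_used_percent_<label> metric."""
    value = "%.1f" % disk_used_percent(path)
    return ["gmetric", "-n", "disk_used_percent_%s" % label,
            "-v", value, "-t", "float", "-u", "%"]


if __name__ == "__main__":
    # Typically run from cron; needs gmetric on the PATH
    subprocess.run(disk_metric_args("/", "root"), check=True)
```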
gmetric -n sales -v 200 -t float
Gmetric Command Line
Our Custom Metric in Ganglia
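import subprocess
from bottle import route, abort, default_app

# GET /<name>/<value>, e.g. /sales/200, records the value as a float metric
@route('/<name>/<value>')
def index(name, value):
    # An argument list (no shell=True) stops URL input being run as shell syntax
    cmd = ['gmetric', '-n', name, '-v', value, '-t', 'float']
    try:
        subprocess.check_call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return "Success: %s" % ' '.join(cmd)
    except (OSError, subprocess.CalledProcessError):
        # gmetric missing (OSError) or exited non-zero (CalledProcessError)
        abort(500, "Error")

# WSGI application object, for mounting under a web server
app = default_app()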
Gmetric HTTP Interface
http://../sales/200
Gmetric URL
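With the HTTP interface in place, any program that can make an HTTP request can record a metric, for example a signup handler or a deploy script. A minimal client sketch using only the standard library follows. BASE_URL is a hypothetical placeholder for wherever the bottle app above is mounted, and the function names are made up. Both path parts are URL-escaped, so a space in a name becomes %20 rather than breaking the request.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical address of the gmetric HTTP interface
BASE_URL = "http://metrics.example.com"


def metric_url(name, value, base_url=BASE_URL):
    """Build <base_url>/<name>/<value> with both parts percent-escaped."""
    return "%s/%s/%s" % (base_url.rstrip("/"),
                         quote(str(name), safe=""),
                         quote(str(value), safe=""))


def record_metric(name, value, base_url=BASE_URL, timeout=5):
    """GET the metric URL; urlopen raises HTTPError if the app answers 500."""
    with urlopen(metric_url(name, value, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```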
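import subprocess
import socketserver  # named SocketServer in Python 2

class GmetricTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Expect one line of the form "<name> <value>", e.g. "sales 200"
        data = self.request.recv(1024).decode('utf-8', errors='replace').strip()
        items = data.split()
        try:
            cmd = ['gmetric', '-n', items[0], '-v', items[1], '-t', 'float']
            subprocess.check_call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            reply = "Success: %s" % ' '.join(cmd)
        except (IndexError, OSError, subprocess.CalledProcessError):
            reply = "Error"
        # handle()'s return value is discarded, so write the reply to the client
        self.request.sendall((reply + "\n").encode('utf-8'))

if __name__ == "__main__":
    HOST, PORT = "0.0.0.0", 8001
    server = socketserver.TCPServer((HOST, PORT), GmetricTCPHandler)
    server.serve_forever()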
Gmetric TCP Interface
sales 200
Gmetric TCP
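The TCP interface is even simpler to call. A client opens a connection to port 8001, writes a line such as "sales 200" and reads whatever comes back. The sketch below assumes a server that writes its reply and then closes the connection. If the handler only returns a string without writing it, the reply will be empty. Because the protocol is space-separated, the sketch rejects names and values that contain whitespace. The function names are hypothetical.

```python
import socket


def format_metric_line(name, value):
    """Encode one "<name> <value>" line for the TCP interface."""
    name, value = str(name), str(value)
    if not name or not value or any(c.isspace() for c in name + value):
        raise ValueError("name and value must be non-empty and contain no whitespace")
    return ("%s %s\n" % (name, value)).encode("utf-8")


def send_metric(name, value, host="localhost", port=8001, timeout=5.0):
    """Send one metric and return the server's reply (possibly empty)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(format_metric_line(name, value))
        sock.shutdown(socket.SHUT_WR)  # tell the server nothing more is coming
        chunks = []
        while True:
            chunk = sock.recv(1024)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace").strip()
```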
Syslog
“Syslog is a standard for logging program messages. It allows separation of the software that generates messages from the system that stores them and the software that reports and analyzes them.”
Wikipedia
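For an application, that separation means it only has to emit syslog messages. Where they are stored or forwarded (a local syslog daemon, or a hosted service such as Loggly) is configured elsewhere. Python's standard library can send them with logging.handlers.SysLogHandler. The sketch below assumes a syslog receiver listening on UDP at the given address, with port 514 as the conventional default, and it uses the local0 facility. Both are choices to adapt, not requirements. The helper name is hypothetical.

```python
import logging
import socket
from logging.handlers import SysLogHandler


def make_syslog_logger(name, address=("localhost", 514),
                       facility=SysLogHandler.LOG_LOCAL0):
    """Return a logger whose records are sent to a syslog receiver over UDP."""
    handler = SysLogHandler(address=address, facility=facility,
                            socktype=socket.SOCK_DGRAM)
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)  # the default WARNING level would drop info()
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicating records into root handlers
    return logger
```

Each record arrives as a datagram prefixed with the syslog priority. For facility local0 (16) at level info (6), that prefix is <134>, because the priority is 16 × 8 + 6.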
Loggly - Logging as a Service
View logs
Logstash
Graylog2
Other Things You Could Monitor
- Database table sizes
- Cache hits
- Time taken for test runs
- Codebase size
- Signups, sales, subscriptions
- Twitter followers
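Most of these need only a short script and a cron entry. Taking codebase size as an example, the sketch below counts lines in source files under a directory and builds the matching gmetric arguments. Hidden directories such as .git are skipped. The codebase_lines metric name, the .py-only default and the function names are all hypothetical choices.

```python
import os
import subprocess


def count_lines(root, suffixes=(".py",)):
    """Total lines in files under root whose names end with one of `suffixes`."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (.git, .venv, ...) in place so os.walk skips them
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for filename in filenames:
            if filename.endswith(suffixes):
                with open(os.path.join(dirpath, filename), "rb") as f:
                    total += sum(1 for _ in f)
    return total


def codebase_metric_args(root, suffixes=(".py",)):
    """gmetric arguments for a hypothetical codebase_lines metric."""
    return ["gmetric", "-n", "codebase_lines",
            "-v", str(count_lines(root, suffixes)),
            "-t", "uint32", "-u", "lines"]


if __name__ == "__main__":
    subprocess.run(codebase_metric_args("."), check=True)
```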
What Next?
- Wikipedia: http://ganglia.wikimedia.org/
- Install Ganglia: deb and rpm packages are available
- Add system metrics: web servers, databases
- Add business metrics: users, sales, tweets
- Try Loggly or at least investigate syslog
Reading
CBGN11
2 months free on FreeAgent
Questions?
http://flickr.com/photos/psd/102332391/