© Blackboard, Inc. All rights reserved.
Monitoring Blackboard
Presenter: Volker Kleinschmidt
Blackboard Client Support
Session Abstract
» Monitoring Blackboard
» As your Blackboard system becomes more and more mission critical, the need to monitor its availability and performance increases. Collecting such data over time allows easier troubleshooting and problem isolation. This session will present several approaches for monitoring whether and how well key areas of Blackboard are performing.
About Forward-Looking Statements
» We may make statements regarding our product development and service offering initiatives, including the content of future product upgrades, updates or functionality in development. While such statements represent our current intentions, they may be modified, delayed or abandoned without prior notice and there is no assurance that such offering, upgrades, updates or functionality will become available unless and until they have been made generally available to our customers.
Quote
"It isn't a service if it isn't monitored. If there is no monitoring then you're just running software." -- Tom Limoncelli
Types of Monitoring
» Liveness Monitoring
» Status Monitoring
» Time Series Monitoring
» Performance Monitoring
» Predictive Analysis
Liveness Monitoring
Is the server or service up?
Most basic and simple form of monitoring
» ping to network interface
» database: tnsping (best done from appserver)
» webserver: GET /nodatabase.html
» tomcat: GET /webapps/login
» modperl/PerlEx: GET /bin/button_gallery.pl
» collab: telnet to ports 8010, 8011, 8443 (if ssl)
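The web-facing checks in this list can be scripted with curl. The sketch below is one possible shape, not the presenter's script: the host name bb.example.edu is a placeholder, and treating any 2xx or 3xx answer as "up" is an assumption you may want to tighten.

```shell
#!/bin/sh
# Liveness probe sketch. BB_HOST is a placeholder, not a real server.
BB_HOST="${BB_HOST:-bb.example.edu}"

# Print the HTTP status code for a URL; curl prints 000 if no response arrives.
http_code() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Map a status code to UP or DOWN (assumption: 2xx and 3xx both count as UP).
classify() {
    case "$1" in
        2??|3??) echo UP ;;
        *)       echo DOWN ;;
    esac
}

# Probe one path and print "path status".
probe() {
    printf '%s %s\n' "$1" "$(classify "$(http_code "http://$BB_HOST$1")")"
}

# Probe the three HTTP paths named in the list above.
probe_all() {
    for path in /nodatabase.html /webapps/login /bin/button_gallery.pl; do
        probe "$path"
    done
}
```

Calling probe_all from cron or from a monitoring host gives one line per component, which makes it easy to see whether the webserver, tomcat or the Perl layer is the part that stopped answering.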
Nagios status page for BB Client Support Test Lab
Status Monitoring
Is the system functioning normally?
Current values of common performance parameters
» system load
» CPU
» memory free/used/swap
» disk I/O
» network I/O
» disk free space
Send alerts when thresholds are exceeded
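A threshold alert of this kind can be sketched as a small shell function. The version below follows the Nagios plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL); the function names and the example thresholds are illustrative assumptions, and it only handles values where higher means worse.

```shell
#!/bin/sh
# Threshold check sketch using the Nagios plugin exit-code convention
# (0 = OK, 1 = WARNING, 2 = CRITICAL). Only for "higher is worse" values.
# usage: check_threshold LABEL VALUE WARN CRIT
check_threshold() {
    awk -v label="$1" -v v="$2" -v warn="$3" -v crit="$4" 'BEGIN {
        if (v + 0 >= crit + 0) { print "CRITICAL - " label "=" v; exit 2 }
        if (v + 0 >= warn + 0) { print "WARNING - " label "=" v; exit 1 }
        print "OK - " label "=" v
    }'
}

# Percentage of disk space used on a mount point, as a bare number.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, check_threshold disk_root "$(disk_used_pct /)" 80 90 warns at 80% and goes critical at 90% usage of the root file system. A parameter such as free memory, where lower is worse, would need the comparisons reversed.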
UIC (Illinois Chicago) internal system monitor
Status Monitoring
Is the application functioning normally?
Can users log in successfully?
» POST to /webapps/login/ (use known credentials)
Do typical tasks take reasonable time?
» execute timed test script via cmdline browser
» load portal for a known user
» visit and browse through a known course
» visit typical applications: forums, quiz, gradebook
» setup can take significant work – share scripts and test course in user community to distribute effort
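The login check can be timed with curl's built-in timer. This is a minimal sketch under stated assumptions: the base URL is a placeholder, the form field names user_id and password are assumed rather than taken from the slides, and the 2xx/3xx rule for "the request worked" is a simplification.

```shell
#!/bin/sh
# Timed login probe sketch. URL, account and form field names are assumptions.
BB_URL="${BB_URL:-https://bb.example.edu}"

# POST the credentials and print "<http_code> <seconds>".
timed_login() {
    curl -s -o /dev/null --max-time 30 \
         -w '%{http_code} %{time_total}\n' \
         --data-urlencode "user_id=$1" \
         --data-urlencode "password=$2" \
         "$BB_URL/webapps/login/"
}

# Turn a code and a duration into a verdict. usage: judge CODE SECONDS LIMIT
judge() {
    awk -v code="$1" -v secs="$2" -v limit="$3" 'BEGIN {
        if (code + 0 < 200 || code + 0 >= 400) print "FAIL"
        else if (secs + 0 > limit + 0)         print "SLOW"
        else                                   print "OK"
    }'
}
```

A caller could run set -- $(timed_login monitor_user secret) and then judge "$1" "$2" 5 to flag logins slower than five seconds. An HTTP status alone does not prove the login succeeded, since a rejected login may still return a normal page, so a real check would also look for a known marker in the response body.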
Time Series Monitoring
Trend: how well is server handling its load?
» Domain of graphing tools such as RRDtool
» measure various usage and load factors
» can measure resulting performance
» determine system usage + server health at once
» provides basis for administrative usage reports
» allows after-the-fact analysis of problems
» should be combined with status monitoring efforts for threshold-based warnings
UAA (Alaska Anchorage) BB Dashboard
Clemson’s Ganglia Dashboard
Status vs. Performance
» Status Monitoring has qualitative focus
» Can certain operations be performed within reasonable time, below warning threshold?
» If not, trigger alert in monitor tool
» Performance Monitoring has quantitative focus
» Just how long did operation X take?
» Based on time series monitoring
» Rarely necessary for ongoing operations
Predictive Analysis
Will the server still handle its load next term?
» Requires availability of historic data
» Must use grapher tool + keep snapshots
» Identify and plan for worst-case scenarios
» Historic load patterns allow predicting future demand – but factor in changes in policy, adoption, usage patterns
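As a rough illustration of predicting from historic data, the sketch below fits a straight line (ordinary least squares) through one peak value per term and extrapolates one term ahead. The function name and the sample numbers are made up, and a linear trend is an assumption that ignores exactly the policy and adoption changes mentioned above.

```shell
#!/bin/sh
# Naive trend forecast sketch: least-squares line through past values.
# Input: one number per line, oldest first (e.g. peak sessions per term).
# Output: the fitted value for the next period, with one decimal place.
forecast_next() {
    awk '{ n++; sx += n; sy += $1; sxx += n * n; sxy += n * $1 }
         END {
             if (n < 2) { print "need at least two data points"; exit 1 }
             slope     = (n * sxy - sx * sy) / (n * sxx - sx * sx)
             intercept = (sy - slope * sx) / n
             printf "%.1f\n", intercept + slope * (n + 1)
         }'
}
```

Feeding it peaks of 100, 120 and 140 sessions, one per line, yields 160.0 for the following term. Treat such a figure as a starting point for capacity planning, then adjust it for known changes in usage.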
Monitoring Tools
Diagram labels: Monitoring Host, Grapher, Data provider/gatherer, Host applications
» Wealth of commercial and free offerings
» SNMP is king for liveness/status monitors
» But reportable data is quite restricted
» Graphers rely on any type of numeric data, provided on time interval basis by an SNMP agent or by a data gathering script in a file
» Many/most graphers based on free RRDtool
Monitoring Platforms
Popular liveness & status monitoring hosts:
» Big Brother (& Big Sister)
» mon
» Nagios
» HP OpenView
» SysOrb
» WhatsUp
Slide labels: free, open source; commercial
Graphing tools
» Based on RRDtool (Tobias Oetiker, ETH)
» Round Robin Database stores time-series data, e.g. current CPU load average, measured every 5 minutes – this is a “data source”
» MRTG, cricket, orca, cacti, Bronc, Munin
» Zabbix, Airwave, Big Sister, Torrus, NISCA
Munin
Orca
Similarities... RRDtool under the hood
The Differences
» Setup and Configuration - hard or terrible?
» Web Interface quality, navigability
» Configurability (e.g. time intervals)
» SNMP support built-in?
» What pollers / data gatherers provided?
» Ease of writing plugins / gatherers
Data Gathering
» service parameters
» database parameters
» system usage parameters
Getting the Data to Report
» SNMP defines a set of MIBs – agents are pre-compiled to report these
» mon etc. come with their own set of things to monitor (e.g. vmstat output)
» everything else needs to be gathered by scripts you write
» nobody said this was easy
The Concept of Data Sources
» Each data source is a numeric entity with its own range of possible values, units, critical values, name, description, color...
» Some data sources come from SNMP agents, others can be put into NFS-mounted files by remote jobs
» Object tree: machine > service > data
Data Collector
» Single cron job scheduled every 5min
» Fires off various data collection tasks, e.g. vmstat, apache status, df
» Simple numbers (e.g. apache total hits) require post-processing to be useful
» e.g. keep two files (.last, .prev), calculate and report difference to get interval value
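The .last/.prev technique can be sketched as follows. The function name, the default directory /var/tmp/bbmon and the handling of a counter reset (report the new total as the interval value) are assumptions made for illustration, not details from the slides.

```shell
#!/bin/sh
# Interval value from a cumulative counter, using NAME.last and NAME.prev.
# usage: report_interval NAME CURRENT_TOTAL [DIR]
report_interval() {
    name=$1
    current=$2
    dir=${3:-/var/tmp/bbmon}          # hypothetical default location
    last="$dir/$name.last"
    prev="$dir/$name.prev"

    mkdir -p "$dir" || return 1
    # Rotate: the previous run's reading becomes .prev, the new one .last.
    if [ -f "$last" ]; then
        mv "$last" "$prev"
    fi
    printf '%s\n' "$current" > "$last"

    # First run: nothing to compare against yet, so report nothing.
    [ -f "$prev" ] || return 0

    old=$(cat "$prev")
    if [ "$current" -ge "$old" ]; then
        echo $((current - old))
    else
        # Counter went backwards (e.g. server restart): assume it restarted from zero.
        echo "$current"
    fi
}
```

Called from the five-minute cron job with the current Apache hit total, the function prints the number of hits since the previous run, which is the value a grapher can plot directly.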
Sample Service Parameters
» HTTP requests (hits) – cumulative/current
» bytes transferred
» number of threads/processes
» number of active processes
» apache reports all this and more via server-status?auto (KB 181-2560)
» IIS has Perfmon counters for these
Example: tomcat thread count
» BBDIR=/usr/local/blackboard
» TCPID=`cat $BBDIR/logs/pid-files/tomcat.pid`
» Solaris: ps -Lp $TCPID | wc -l | sed 's/ //g'
» Linux: ps -eo pid,ppid | grep $TCPID | ...
» or: pstree -p $TCPID | ...
» RHEL3: ps -emo pid,ppid | grep $TCPID | ...
» Windows: pv java* -l"*tmpdir*" -o "%t" (find TCPID via pv also: pv java* -l"*tmpdir*" -o"%i")
» pv = freely downloadable cmdline Process Viewer (use pv -h to find out about invocation details)
Multiple data sources, one file
curl http://localhost/server-status?auto 2>/dev/null | \
head -9 | cut -d: -f2 | cut -b2- >apachestats
» 15
» 20
» .00847458
» 354
» .0423729
» 57.8531
» 1365.33
» 1
» 9
Need to configure your graphs with legends, correct intervals etc. to know what these all mean
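An alternative to bare numbers is to keep each label next to its value, so the file documents itself. The sketch below is an assumption about how one might do that, not the presenter's approach; the field names in the example (Total Accesses, BusyWorkers and so on) are those Apache 2.x mod_status emits with ExtendedStatus on, and they are consistent with the sample values above (15 hits over 354 seconds is about 0.042 requests per second).

```shell
#!/bin/sh
# Convert "Key Name: value" lines from server-status?auto into KEY=value.
# Spaces in the key become underscores; lines without ": " are skipped.
label_status() {
    awk -F': ' 'NF == 2 { gsub(/ /, "_", $1); print $1 "=" $2 }'
}
```

Used as curl -s "http://localhost/server-status?auto" | head -9 | label_status > apachestats, this writes lines such as Total_Accesses=15, which a gatherer script can pick out by name instead of by position.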
Sample database parameters
» free space per tablespace
» number of current DB sessions
» number of executions of top V$SQL items
» too many reportable things to list
» some of these parameters change rarely, so don’t query them that often
Sample system usage parameters
» # current authenticated sessions (from DB) (Seneca WhosOnline auto-script)
» # logins in last 5 mins (from webserver log)
» # current chat participants & courses
» # currently active quiz attempts (not!)
» number of courses with >10 forum posts (expensive queries can be run once daily)
A word of warning
» Beware the Heisenberg principle: measuring the system changes it
» Avoid resource-intensive measurements (don’t count hits in activity_accumulator)
» Avoid over-monitoring – reports that nobody reads are a great waste
» Fully document/label your reports, or they are useless to anyone but you
Community contributions?
» Ideally we could build a collection of user-contributed monitoring scripts and tools
» Listserv is too ephemeral
» Blackboard Community site?
» Lots of work to be done!
» Interest in a possible consulting offering?
Further Info
» The mother of monitoring links:
» http://slac.stanford.edu/xorg/nmtf/nmtf-tools.html
» John Sellens’ Monitoring page:
» http://www.generalconcepts.com/resources/monitoring/