EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Simply monitor a grid site with Nagios
J. Casey, CERN
E. Imamagic, SRCE
ISGC 2008
Overview
• Nagios
• Nagios-based grid monitoring
• Site monitoring prototype
• Demo
• Current status
• Future work
• Conclusions
Nagios
• Open source monitoring framework
– widely used & actively developed
• Host and service problem detection and recovery
• Provides a wide set of basic sensors
– easy to develop custom sensors
• Centralized vs. distributed deployment
• High configurability
– service dependencies, fine-grained notification options
• Web interface
– status view, administration
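As an illustration of how small a custom sensor can be, here is a minimal sketch of the standard Nagios plugin convention: the exit code reports the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output is the summary shown in the web interface. The `EXAMPLE` label, the function names, and the "larger is worse" threshold logic are assumptions made for this sketch, not part of the presentation.

```python
# Nagios plugin convention: the exit code carries the state,
# and the first line of stdout carries the human-readable summary.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a state; larger values are treated as worse."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run(argv):
    """Return (exit_code, output_line) for: <script> <value> <warn> <crit>."""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: the plugin cannot decide.
        return UNKNOWN, "EXAMPLE UNKNOWN - usage: <value> <warn> <crit>"
    state = evaluate(value, warn, crit)
    # Text after "|" is performance data in label=value;warn;crit form.
    line = (
        f"EXAMPLE {STATE_NAMES[state]} - value={value:g} "
        f"| value={value:g};{warn:g};{crit:g}"
    )
    return state, line
```

A real plugin would finish by printing the line and calling `sys.exit()` with the code, so that Nagios can read both.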
Nagios-based Grid Monitoring
• Monitoring CRO-GRID Infrastructure (2004-2006)
– Globus Toolkit Pre-WS & WS, UNICORE, other services
– active recovery of services
– http://www.cro-ngi.hr
• Monitoring EGEE resources in Central Europe (CE)
– core services since mid 2006
– all CE sites for 1st line support since September 2006
– http://nagios.ce-egee.org
• Grid Services Monitoring (GSM) WG
– site monitoring prototype, mid 2007
– http://crnjak.srce.hr/nagios (egee.srce.hr)
– https://pps-monitoring.cern.ch/nagios (CERN-PPS)
Site Monitoring Prototype
[Architecture diagram]
• Components shown: Monitoring server, Nagios, Nagios configuration, Nagios web interface, NCG, NPM, Remote gatherers, Standard probes, Probe wrapper, Credential refresh, Publisher
• Site nodes: Site BDII, CE, SE, LFC, …
• External services: MyProxy, VOMS, SAM, …
• Consumers: External applications, Site admins
• Labelled flows: Long-lived MyProxy certificate; Refresh proxy; Get VOMS proxy; VOMS proxy certificate; Probe descriptions; Get site's & nodes information; Get nodes information; Live node checks; Service checks; Get remote results; Get Nagios results; Get site status; Issue alarms
Grid Probes
• Provided by SRCE, CERN, OSG
• Security facilities & services
– CA distribution, Certificate lifetime, MyProxy
• Monitoring & information services
– R-GMA, BDII, MDS, GridICE
• Job management services
– Globus Gatekeeper, RB, WMS, WMProxy, Job matching
• File management services
– GridFTP, SRM, DPNS, LFC, FTS
Standard Components
• Specifications defined by GSM WG
• Probe wrapper
– enables integration of standardized probes
– Grid Monitoring Probes Specification
– https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringProbeSpecification
• Publisher & remote gatherers
– integration with other tools
– Grid Monitoring Data Exchange Standard
– https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringDataExchangeStandard
Nagios Config Generator
• Uses multiple information sources
– SAM, BDII, active heuristic checks
• Modular approach
– plugging in additional information sources
– integration with other monitoring systems (e.g. LEMON)
• User-defined rules
– configuration tuning for non-standard grid sites
• Standalone configuration
– integration with existing Nagios server
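The idea behind the config generator can be sketched as follows: merge what several information sources report about the site, apply user-defined rules, and emit Nagios object definitions. This is only an illustration and not the NCG implementation. The data layout, the function names, the exclusion rule, the check command names, and the reliance on the `generic-host` / `generic-service` templates from the Nagios sample configuration are all assumptions.

```python
def merge_sources(*sources):
    """Union the service types each source reports per host.

    Each source is a dict: hostname -> set of service types.
    """
    merged = {}
    for source in sources:
        for host, services in source.items():
            merged.setdefault(host, set()).update(services)
    return merged


def apply_rules(topology, exclude=()):
    """Hypothetical user-defined rule: drop hosts the admin excludes."""
    return {host: svcs for host, svcs in topology.items() if host not in exclude}


def render(topology, check_commands):
    """Render Nagios host and service object definitions as text."""
    blocks = []
    for host in sorted(topology):
        blocks.append(
            "define host {\n"
            "    use        generic-host\n"
            f"    host_name  {host}\n"
            f"    address    {host}\n"
            "}"
        )
        for service in sorted(topology[host]):
            command = check_commands.get(service)
            if command is None:
                continue  # no probe known for this service type
            blocks.append(
                "define service {\n"
                "    use                  generic-service\n"
                f"    host_name            {host}\n"
                f"    service_description  {service}\n"
                f"    check_command        {command}\n"
                "}"
            )
    return "\n\n".join(blocks) + "\n"
```

Writing the rendered text to a separate file that the main Nagios configuration includes is one way to keep generated objects apart from hand-written ones on an existing server.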
Remote gLite UI
• Avoid installation of grid middleware on Nagios server
– execute grid probes on existing gLite UI
– use Nagios Remote Plugin Executor (NRPE)
[Diagram: Nagios on the Nagios server issues service checks through NRPE to the gLite UI, where the standard probes run against the site nodes (Site BDII, CE, SE, LFC, …)]
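A rough sketch of the two halves of an NRPE setup: the `command[...]` line that the NRPE daemon on the gLite UI reads from `nrpe.cfg`, and the `check_nrpe` invocation made from the Nagios server. The `check_nrpe` path, the probe path and name, the host names, and the 60-second timeout are assumptions for illustration. The `-H`, `-p`, `-t` and `-c` options and the default port 5666 are standard NRPE.

```python
import subprocess

# Assumed install location; it varies by distribution.
CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"


def nrpe_command_definition(name, probe_path, probe_args=()):
    """Line for nrpe.cfg on the gLite UI: maps a command name to a probe."""
    command_line = " ".join([probe_path, *probe_args])
    return f"command[{name}]={command_line}"


def check_nrpe_argv(ui_host, name, port=5666, timeout=60):
    """Argument vector the Nagios server uses to trigger the remote probe."""
    return [CHECK_NRPE, "-H", ui_host, "-p", str(port), "-t", str(timeout), "-c", name]


def run_remote_check(ui_host, name):
    """Run the probe through NRPE; returns (exit_code, first output line)."""
    result = subprocess.run(
        check_nrpe_argv(ui_host, name), capture_output=True, text=True
    )
    first_line = result.stdout.splitlines()[0] if result.stdout else ""
    return result.returncode, first_line
```

Because `check_nrpe` passes the remote probe's exit code and output back to the caller, Nagios treats the remote probe like any local plugin.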
[Figure labels: SAM, Standard probes, NPM]
Current Status
• Three sets of standard probes integrated
– SRCE, CERN, OSG
• Two external monitoring systems
– SAM, ENOC DownCollector
• Several deployments
– CERN-PPS, SRCE, NIKHEF, PIC, IN2P3, ScotGrid
• RPMs in apt and yum repository
• Installation and configuration manual
• More info
– https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo
Future Work
• NCG development
– providing configuration for multiple sites (regional monitoring)
– providing configuration for multiple VOs
• Integration with global monitoring systems
– ActiveMQ messaging system
– Operations Automation Team mandate
• Enabling “on-host” checks via NRPE
– processes, logs, ports, files, etc.
• Definition of probe description & site topology databases
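To make the "on-host" idea concrete, here is a minimal sketch of one such check, a port test that an NRPE daemon could run locally on a grid node. The function name, the `PORT` label, and the choice to report any connection failure as CRITICAL are assumptions. Only the exit-code convention (0 OK, 2 CRITICAL) comes from Nagios.

```python
import socket

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def check_port(host, port, timeout=5.0):
    """Return (exit_code, output_line) for a TCP connect test."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"PORT OK - {host}:{port} accepts connections"
    except OSError as exc:
        return CRITICAL, f"PORT CRITICAL - {host}:{port} unreachable ({exc})"
```

Process, log and file checks would follow the same shape: gather one local fact, map it to a state, and print a one-line summary.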
Conclusions
• Nagios
– highly configurable monitoring framework with notifications, service dependencies, …
– widely used by site admins
• Grid extensions
– integration with existing infrastructure (user certificates, VOMS, GOCDB, SAM)
– probes for key grid services
• Implementation of GSM WG specifications
– probe wrapper, publisher & remote gatherers
– easy integration with existing probes and monitoring systems
Thank You!
Questions?