Monitoring and performance measurement in Production Grid Environments David Wallom.
Overview
• Who uses monitoring?
• Aspects of performance measurement
• Tools for monitoring
• Adding a new service into a monitoring framework
Who are the consumers of monitoring?
• Grid/VO management
  – Responsible for designing & maintaining requirements
  – Verify fulfillment of SLAs by resource providers
• System administrators
  – Notified of problems
  – Enough information to understand context of problem
• End users
  – View results and compare to problems they are having
  – Debug user account/environment issues
  – Advanced users: feedback to Grid/VO
Monitoring from a user perspective
• Things that need to work for the Grid:
  – Can I log in?
  – Is my application(s) available on connected systems?
  – Can I get to my input data?
  – What credentials do I need?
  – Can I get the input data to the application?
  – How long will my application take to run?
  – …
Performance Measurement
• Depends on monitoring of:
  – Availability
  – Usage
Measuring Availability
• Test the following grid functionality
  – User authorization
  – System information publishing
  – Data transfer to and from system
  – Submission of tasks onto the system
• Measurement of other functionality
  – Type of system
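The availability tests listed above could be combined into a single figure along the following lines. This is a minimal sketch, assuming each test is a callable that returns True on success; the check names and the stand-in functions are hypothetical placeholders and do not come from any of the tools named in these slides.

```python
from typing import Callable, Dict


def run_availability_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each named check; a check that raises is recorded as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def availability_fraction(results: Dict[str, bool]) -> float:
    """Fraction of checks that passed (0.0 when no checks were run)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def _failing_submission() -> bool:
    # Hypothetical failure: the job manager did not answer in time.
    raise TimeoutError("no response from job manager")


# Hypothetical stand-ins for the four functional tests on the slide.
example_checks = {
    "user_authorization": lambda: True,
    "information_publishing": lambda: True,
    "data_transfer": lambda: True,
    "job_submission": _failing_submission,
}

example_results = run_availability_checks(example_checks)
```

Treating an exception as a failed check keeps one broken test from hiding the results of the others, which matters when the same run covers many systems.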
Measuring Usage
• Within each system we need to know:
  – Current load
    • e.g. queue lengths, number of running processes on an SMP system
  – Knowledge of network connectivity
  – Total throughput rate for a submitted user job
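One way to record these usage quantities is sketched below. The field names are hypothetical, and "throughput" is interpreted here as work completed per second of wall-clock time between submission and completion; the slides do not define the term, so that interpretation is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UsageSample:
    """One usage observation for a single system (hypothetical fields)."""
    system: str
    queued_jobs: int          # current queue length
    running_processes: int    # e.g. processes running on an SMP system
    cpu_cores: int

    def load_per_core(self) -> float:
        """Running processes divided by available cores."""
        return self.running_processes / self.cpu_cores


def job_throughput(work_units: float, submitted_at: float, finished_at: float) -> float:
    """Work units completed per second between submission and completion."""
    elapsed = finished_at - submitted_at
    if elapsed <= 0:
        raise ValueError("finished_at must be later than submitted_at")
    return work_units / elapsed
```

Measuring from submission rather than from job start means time spent waiting in the queue lowers the reported throughput, which is the figure a user actually experiences.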
Tools for monitoring availability
• Systems status
• Grid status
• All system and grid status monitoring
Ganglia
• Developed out of the HPC community
• Monitors worker nodes as well as system head nodes
• Sub-nodes can report to a master node to provide grid-wide monitoring
• Example:
  – http://oxgrid-vom.ierc.ox.ac.uk/ganglia/
Big Brother
• Designed to monitor individual systems
• Simple interface giving immediate feedback on overall system status
• Different providers can be added for additional services, such as different processes to be monitored
• Can be difficult to look at historical trends, though
• Example:
  – http://cerb-mds.bris.ac.uk/bb/bb.html
Grid Interoperability Test Scripts
• Developed by Southampton e-Science Centre
• Tests each of the standard grid functionalities in series for a specified node
• Wrapper to test many systems in parallel
• Example of the results:
  – http://www.ngs.ac.uk/ops/gits/oxford/NationalGridService.html
INCA
• Developed by SDSC and TeraGrid
• Extensible framework for monitoring
• Tests the following as standard
  – Static system information
  – Installed software versions
  – Network performance
  – Load on both the head node and the queue system, if available
• Additionally, the UK NGS has developed a plug-in for the GITS tests.
• Example:
  – http://inca.grid-support.ac.uk/
Testing the behaviour of a Grid
1. Define a set of concrete requirements for connected systems
2. Write tests to verify requirements
3. Periodically run tests and collect data across all of the systems
4. Publish data and archive for reporting
5. Automate steps 3 and 4 to provide real-time system status information
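The run, publish and automate steps above can be sketched as a small loop. This is an illustrative outline only: the function names, the record layout and the in-memory archive are assumptions, and a production deployment would more likely rely on a scheduler and persistent storage.

```python
import time
from typing import Callable, Dict, List


def run_tests_once(systems: List[str],
                   tests: Dict[str, Callable[[str], bool]],
                   now: Callable[[], float] = time.time) -> List[dict]:
    """Run every test against every system and collect one record per result."""
    records = []
    for system in systems:
        for test_name, test in tests.items():
            try:
                passed = bool(test(system))
            except Exception:
                passed = False  # a crashing test counts as a failure
            records.append({"timestamp": now(), "system": system,
                            "test": test_name, "passed": passed})
    return records


def publish(records: List[dict], archive: List[dict]) -> Dict[str, bool]:
    """Archive the records and return the current pass/fail status per system."""
    archive.extend(records)
    status: Dict[str, bool] = {}
    for record in records:
        previous = status.get(record["system"], True)
        status[record["system"]] = previous and record["passed"]
    return status


def monitor(systems: List[str],
            tests: Dict[str, Callable[[str], bool]],
            archive: List[dict],
            cycles: int,
            interval_seconds: float = 0.0,
            sleep: Callable[[float], None] = time.sleep) -> Dict[str, bool]:
    """Repeat the run-and-publish steps for a fixed number of cycles."""
    status: Dict[str, bool] = {}
    for cycle in range(cycles):
        status = publish(run_tests_once(systems, tests), archive)
        if cycle + 1 < cycles:
            sleep(interval_seconds)
    return status
```

Keeping the archive separate from the returned status mirrors the split between historical reporting and real-time system status.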
Connecting to existing production systems
• Determine monitoring requirements for systems to be connected
• Write independent tests for the service being provided
• Write information providers to fit tests into existing monitoring frameworks
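As a rough illustration of the last point, an independent test can be wrapped in a thin adapter that reports its result in whatever format the monitoring framework expects. The JSON record below is a made-up format chosen for the sketch; Ganglia, Big Brother and INCA each define their own reporting formats, which are not reproduced here, and the function and field names are hypothetical.

```python
import json
import time
from typing import Callable


def make_information_provider(service: str,
                              test: Callable[[], bool],
                              clock: Callable[[], float] = time.time) -> Callable[[], str]:
    """Wrap an independent service test so that it emits a uniform status record."""
    def provider() -> str:
        started = clock()
        try:
            ok = bool(test())
            detail = ""
        except Exception as exc:
            ok = False
            detail = str(exc)
        record = {
            "service": service,
            "status": "ok" if ok else "fail",
            "detail": detail,
            "duration_seconds": clock() - started,
        }
        # Hypothetical output format: one JSON object per test run.
        return json.dumps(record, sort_keys=True)
    return provider
```

Because the test itself knows nothing about the output format, the same test can be reused under a different framework by swapping only the adapter.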
Conclusions
• Monitoring must be based on a well-known set of requirements for admins (both VO and systems) & users
• There are several products available to provide monitoring frameworks; each can be extended beyond its initial capabilities
• Life would be made a lot simpler if there were a standard monitoring schema which could then be used to plug grid and system information into all monitoring frameworks!