GridView - A Monitoring & Visualization tool for LCG

GridView - A Monitoring & Visualization tool for LCG Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav B.A.R.C. BARC-CERN/LCG Meeting 15.09.2006


Page 1: GridView  - A Monitoring & Visualization tool for LCG

GridView - A Monitoring & Visualization tool for LCG

Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav

B.A.R.C.

BARC-CERN/LCG Meeting 15.09.2006

Page 2: GridView  - A Monitoring & Visualization tool for LCG

GridView : New Developments (27th April to 15th September)

• Enhancements to Gridftp file transfer monitoring

• Development of summarization and presentation modules for

– Job Monitoring

– Service Availability Monitoring

• Deployment of all the new developments to production system

Page 3: GridView  - A Monitoring & Visualization tool for LCG

File Transfer Monitoring

• Enhanced Gridftp summarization and presentation modules for

– VO-wise distribution of overall data transfers

– VO-wise distribution of data transfers per Site

– Site-wise distribution of data transfers per VO

• Developed graphs and reports for data transfers from all sites to a given site (Hourly, Daily reports)
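
The VO-wise and site-wise distributions listed on this slide amount to grouping transfer records by one field and computing each group's share of the total. A minimal sketch, assuming a hypothetical record layout of (VO, source site, destination site, bytes); the field names and sample values are illustrative and are not taken from GridView's actual schema:

```python
from collections import defaultdict

# Hypothetical transfer records: (vo, source_site, destination_site, bytes).
# Layout and values are assumptions made for illustration only.
transfers = [
    ("atlas", "CERN", "RAL", 400),
    ("cms", "CERN", "RAL", 100),
    ("atlas", "CERN", "FNAL", 300),
    ("cms", "CERN", "FNAL", 200),
]


def distribution(records, key):
    """Sum bytes per group and return each group's share of the total."""
    totals = defaultdict(int)
    for record in records:
        totals[key(record)] += record[3]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: volume / grand_total for group, volume in totals.items()}


# VO-wise distribution of overall data transfers
vo_share = distribution(transfers, key=lambda r: r[0])

# VO-wise distribution of data transfers for one destination site
ral_vo_share = distribution(
    [r for r in transfers if r[2] == "RAL"], key=lambda r: r[0]
)

# Site-wise distribution of data transfers for one VO
atlas_site_share = distribution(
    [r for r in transfers if r[0] == "atlas"], key=lambda r: r[2]
)
```

The same helper covers the "all sites to a given site" reports by filtering on the destination site and grouping on the source site.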

Page 4: GridView  - A Monitoring & Visualization tool for LCG

File Transfer Monitoring : Overall VO-wise Details

Page 5: GridView  - A Monitoring & Visualization tool for LCG

File Transfer Monitoring : Site-wise details for a particular VO

Page 6: GridView  - A Monitoring & Visualization tool for LCG

Job Monitoring

• Developed summarization module for computation of job statistics

• Developed presentation module to display periodic Graphs and Reports for

– Job Status (Total Number of Jobs in various States)

– Job Success Rate

– Job Resource Utilization (Elapsed time, CPU, Memory)

– Average Job Turnaround time (RB Waiting, Site Waiting, Execution Time)

– Site, VO and RB-wise distribution

– Hourly, Daily, Weekly and Monthly reports
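
The job success rate and the three turnaround components named on this slide can be illustrated with a short sketch. The record fields below are hypothetical, and the definitions used (RB waiting as submission to arrival at the site, site waiting as arrival to start, execution as start to finish) are assumptions; the slides do not define them.

```python
from statistics import mean

# Hypothetical job records; times are seconds from an arbitrary origin.
# Field names are assumptions, not the R-GMA JobMonitor schema.
jobs = [
    {"vo": "atlas", "site": "RAL", "status": "success",
     "submitted": 0, "at_site": 10, "started": 40, "finished": 100},
    {"vo": "atlas", "site": "RAL", "status": "failed",
     "submitted": 0, "at_site": 20, "started": 30, "finished": 50},
    {"vo": "cms", "site": "FNAL", "status": "success",
     "submitted": 0, "at_site": 30, "started": 50, "finished": 150},
]


def success_rate(records):
    """Fraction of completed jobs that succeeded, or None if none completed."""
    done = [j for j in records if j["status"] in ("success", "failed")]
    if not done:
        return None
    return sum(j["status"] == "success" for j in done) / len(done)


def average_turnaround(records):
    """Average of each turnaround component over the given jobs."""
    return {
        "rb_waiting": mean(j["at_site"] - j["submitted"] for j in records),
        "site_waiting": mean(j["started"] - j["at_site"] for j in records),
        "execution": mean(j["finished"] - j["started"] for j in records),
    }


overall_rate = success_rate(jobs)
atlas_rate = success_rate([j for j in jobs if j["vo"] == "atlas"])
turnaround = average_turnaround(jobs)
```

Filtering the record list before calling either function gives the Site, VO or RB-wise view of the same statistic.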

Page 7: GridView  - A Monitoring & Visualization tool for LCG

Job Monitoring (Cont…)

• Developed periodic Graphs and Reports for

– Overall Summary

• sites with high/low job execution rate

• sites with high/low job success rate

• VOs running more/less jobs etc.

– Possible to view job statistics for any user-selected combination of VO, Site and RB

Page 8: GridView  - A Monitoring & Visualization tool for LCG

Job Status : State-wise Distribution

Page 9: GridView  - A Monitoring & Visualization tool for LCG

Job Status : VO-wise Distribution

Page 10: GridView  - A Monitoring & Visualization tool for LCG

Job Status : RB-wise Distribution

Page 11: GridView  - A Monitoring & Visualization tool for LCG

Job Status : Site-wise Distribution

Page 12: GridView  - A Monitoring & Visualization tool for LCG

Job Monitoring : Job Success Rate

Page 13: GridView  - A Monitoring & Visualization tool for LCG

Job Monitoring : Average Job Turnaround time

Page 14: GridView  - A Monitoring & Visualization tool for LCG

Service Availability Monitoring

• Developed summarization module for computation of Service Availability

– based on SAM Test Results

– AND (critical services) of OR (redundant services)

• Developed presentation module to display periodic Graphs and Reports for

– Central Service Availability (FTS, LFC, RB)

– Aggregate tier-1 site Availability

– Site-wise availability for individual tier-1 sites

– Site-wise service availability of tier-2 sites (grouped by associated VOs)

– Detailed availability of various services (CE, SE, SRM) and their individual instances running at a particular site
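
The "AND of OR" rule can be read as follows: a site counts as available when every critical service type has at least one instance passing its tests. A minimal sketch of that reading, with hypothetical instance names and an assumed input shape (one pass/fail flag per service instance):

```python
def site_available(instance_status, critical_services):
    """AND over critical service types of OR over their redundant instances.

    instance_status maps a service type to {instance_name: passed_tests}.
    A critical service with no known instances counts as unavailable.
    """
    return all(
        any(instance_status.get(service, {}).values())
        for service in critical_services
    )


# Hypothetical test outcomes for one site; instance names are made up.
status = {
    "CE": {"ce01": False, "ce02": True},   # one redundant CE is still up
    "SE": {"se01": True},
    "SRM": {"srm01": False},               # the only SRM instance is down
}

ce_se_ok = site_available(status, ["CE", "SE"])
all_three_ok = site_available(status, ["CE", "SE", "SRM"])
```

With CE and SE as the critical services the site is available, because the failing ce01 is covered by ce02; adding SRM to the critical set makes the site unavailable.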

Page 15: GridView  - A Monitoring & Visualization tool for LCG

Service Availability Monitoring (Cont…)

• Reports on Hourly, Daily, Weekly and Monthly basis

• Traceability from Aggregate Availability to Individual Service Instance Availability

• Provision for saving user preferences based on certificates
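
One way hourly results could be rolled up into daily reports is to take the fraction of hourly samples in which the site or service was available. This is an assumed aggregation rule for illustration only; the slides do not state how GridView derives daily, weekly or monthly figures.

```python
from collections import defaultdict
from datetime import datetime


def daily_availability(hourly_samples):
    """Roll hourly (timestamp, available) samples up to a per-day fraction."""
    up = defaultdict(int)
    total = defaultdict(int)
    for timestamp, available in hourly_samples:
        day = timestamp.date()
        total[day] += 1
        up[day] += int(available)
    return {day: up[day] / total[day] for day in total}


# Hypothetical hourly samples for a single day.
samples = [
    (datetime(2006, 9, 14, 0), True),
    (datetime(2006, 9, 14, 1), True),
    (datetime(2006, 9, 14, 2), False),
    (datetime(2006, 9, 14, 3), True),
]

daily = daily_availability(samples)
```

Grouping on the ISO week or on (year, month) instead of the date would give weekly or monthly figures under the same assumption.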

Page 16: GridView  - A Monitoring & Visualization tool for LCG

Service Availability Monitoring : Central Service Availability

Page 17: GridView  - A Monitoring & Visualization tool for LCG

Service Availability Monitoring : FTS Instance Availability

Page 18: GridView  - A Monitoring & Visualization tool for LCG

Service Availability Monitoring : Aggregate T1 Site Availability

Page 19: GridView  - A Monitoring & Visualization tool for LCG

Service Availability Monitoring : Tier-1 Site Availability

Page 20: GridView  - A Monitoring & Visualization tool for LCG

Service Availability Monitoring : Site Detail Availability

Page 21: GridView  - A Monitoring & Visualization tool for LCG

On-going Work

• Presentation of Detailed SAM test results for traceability from Availability Graphs to corresponding tests

• Development of Weekly and Monthly reports for All to Given site data transfers

• Modification to Gridftp file transfer GUI and Reports in order to enable Multiple site selection (new request)

Page 22: GridView  - A Monitoring & Visualization tool for LCG

Future Work

• Visualization of FTS Statistics

• Archival of Job data for jobs submitted directly to CE

• Interfacing GridView with Information System (Top level BDII) for Resource Availability

– Compute nodes (WNs), Storage etc.

Page 23: GridView  - A Monitoring & Visualization tool for LCG

Future Work : Visualization of FTS Statistics

• Currently GridView visualizes gridftp data transfer rates across the sites.

• FTS statistics to be visualized include

– Successful transfers

– Failure rates

– VO-wise, FTS server-wise and Channel-wise details of data transfers

Page 24: GridView  - A Monitoring & Visualization tool for LCG

Problems

• No data has been published to the R-GMA table JobMonitor for the past 2 months (in spite of repeated reminders)

• Gridview Availability Depends on

– R-GMA Service

– Oracle Database Service

– SAM/SFT tests

• Instabilities in Gridview service caused by

– R-GMA Instabilities

• Registry failures, Monbox failures, Data loss etc.

– Occasional Oracle downtime

– Unannounced software upgrades on production machines leading to broken code

• Subsequently, Gridview address added to cern-quattor-announce mailing list and upgrades done manually by Gridview team

Page 25: GridView  - A Monitoring & Visualization tool for LCG

Thank You

Your comments and suggestions please