A lightweight Monitoring and Accounting system for LHCb DC'04 production

Post on 31-Dec-2015


V. Garonne, R. Graciani Díaz, J. J. Saborido Silva, M. Sánchez García, R. Vizcaya Carrillo

2

Outline

Manifesto
Monitoring
  Web interface
  Internals
Accounting
  Web interface
  Internals
Outlook
URLs

3

Manifesto

Monitoring and Accounting are tasks in DIRAC
  DIRAC is a production grid for LHCb
The Monitoring reports the status of jobs while in the WMS (Workload Management System)
  Instantaneous snapshot of the system
  No historic records
The Accounting records the status of jobs after leaving the WMS
  Provides historic record, accumulated statistics and evolution of recorded variables with time
Main users: production and site managers

4

Design choices

Monitoring
  Job information stored centrally in the WMS
  Information provided directly by the job and the WMS
  Passive services: no pushing of information
  No need for a common consumer API
  Job and application state stored together
Accounting
  Separate infrastructure from the monitoring
  A job is never in the Accounting and the Monitoring at the same time
  Domain specific: LHCb production jobs

5

Information Flow

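[Figure: information-flow diagram for DIRAC. Labels recoverable from the slide: Users; Web interface (one for Monitoring, one for Accounting); Monitoring (Read / Write) on the Job Database; Accounting (Write / Read) on the Accounting Database; Cleaner Agent; WMS; Job; Job Heart-beat; Backend Services & Agents.]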

6

Monitoring Web Interface 1
  Interface to query the monitoring service
  Clicking a JobId pops up a window with the job details

7

Monitoring Web Interface 2

The overview shows predefined plots on the production
  Generated every few minutes
PyChart used as graphics engine
  100% Python
  Supports SVG
[Plot: running jobs by site]

8

Monitoring Web Interface 3
  Job status by site and production id

9

Monitoring Internals

It consists of an XML-RPC service exposing whatever parameters are known to DIRAC
Job parameters stored internally by DIRAC
  Primary parameters
    Execution site, job status, job owner, etc.
    Fixed, centrally defined: fast access
    Can query on them
  Secondary parameters
    Number of steps, internal job state, etc.
    Defined by the production job itself
    Stored as key-value pairs
    Slower access; cannot query on them
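The split between primary and secondary parameters can be illustrated with a small storage sketch. This is only a sketch under assumptions: the slides do not describe the actual storage layout, so the use of `sqlite3`, the table and column names, the helper names `select_jobs` and `get_job_parameter`, and the sample rows are all hypothetical. It shows why fixed, indexed columns can be queried quickly while key-value pairs can only be looked up for a job that is already known.

```python
import sqlite3

# Hypothetical schema: primary parameters are fixed, indexed columns;
# secondary parameters live in a generic key-value table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (
        job_id INTEGER PRIMARY KEY,
        site   TEXT,
        status TEXT,
        owner  TEXT
    );
    CREATE INDEX idx_jobs_site_status ON jobs (site, status);
    CREATE TABLE job_parameters (
        job_id INTEGER REFERENCES jobs (job_id),
        name   TEXT,
        value  TEXT,
        PRIMARY KEY (job_id, name)
    );
""")

# Made-up sample data for illustration only.
conn.execute("INSERT INTO jobs VALUES (1, 'DIRAC.CERN.ch', 'running', 'alice')")
conn.execute("INSERT INTO jobs VALUES (2, 'DIRAC.CERN.ch', 'done', 'bob')")
conn.execute("INSERT INTO job_parameters VALUES (1, 'LocalBatchId', '4711')")
conn.execute("INSERT INTO job_parameters VALUES (1, 'NumberOfSteps', '3')")


def select_jobs(conditions):
    """Select job ids by primary parameters only (the fast, queryable path)."""
    allowed = {"Site": "site", "Status": "status", "Owner": "owner"}
    clauses, values = [], []
    for key, value in conditions.items():
        if key not in allowed:
            raise KeyError("cannot query on secondary parameter: " + key)
        clauses.append(allowed[key] + " = ?")
        values.append(value)
    where = " AND ".join(clauses) if clauses else "1"
    rows = conn.execute(
        "SELECT job_id FROM jobs WHERE " + where + " ORDER BY job_id", values
    )
    return [row[0] for row in rows]


def get_job_parameter(job_id, name):
    """Look up one secondary parameter of an already known job by its key."""
    row = conn.execute(
        "SELECT value FROM job_parameters WHERE job_id = ? AND name = ?",
        (job_id, name),
    ).fetchone()
    return row[0] if row else None
```

In this sketch, selecting on `Site` and `Status` uses the index, whereas a secondary parameter such as `LocalBatchId` is only reachable once the job id is known.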

10

JMS basic API example

from xmlrpclib import ServerProxy
server = ServerProxy(monitoring_url)

# Retrieve list of jobs verifying some conditions
conditions = {'Status': 'running', 'Site': 'DIRAC.CERN.ch'}
jobreq = server.getJobs(conditions)

# Print some parameters for each job
if jobreq['Status']:
    for jobid in jobreq['Value']:
        print server.getJobSite(jobid)
        print server.getJobParameter(jobid, 'LocalBatchId')

# Bulk operations
sum = server.getJobsPrimarySummary(jobreq['Value'])

~3 s to select 95 out of 50k jobs

~0.7 s

~40 s
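The difference between per-job calls and a bulk call in the example can be made concrete by counting round trips. The sketch below does not talk to the real service: `FakeMonitoring` is a hypothetical in-memory stand-in that borrows the method names from the slide, and the return shapes of `getJobSite` and `getJobsPrimarySummary` are assumptions. Only the `{'Status': ..., 'Value': ...}` shape of `getJobs` is taken from the slide. In Python 3 the real client class is `xmlrpc.client.ServerProxy`.

```python
class FakeMonitoring:
    """In-memory stand-in for the monitoring service (not the real server)."""

    def __init__(self, jobs):
        self._jobs = jobs   # job id -> dict of primary parameters
        self.calls = 0      # number of simulated round trips

    def getJobs(self, conditions):
        self.calls += 1
        ids = [jid for jid, params in sorted(self._jobs.items())
               if all(params.get(k) == v for k, v in conditions.items())]
        return {"Status": True, "Value": ids}

    def getJobSite(self, jobid):
        self.calls += 1
        return self._jobs[jobid]["Site"]

    def getJobsPrimarySummary(self, jobids):
        # Assumed return shape: one dict of primary parameters per job id.
        self.calls += 1
        return {jid: dict(self._jobs[jid]) for jid in jobids}


# Made-up jobs for illustration only.
server = FakeMonitoring({
    1: {"Status": "running", "Site": "DIRAC.CERN.ch"},
    2: {"Status": "done", "Site": "DIRAC.CERN.ch"},
    3: {"Status": "running", "Site": "DIRAC.CERN.ch"},
})

jobreq = server.getJobs({"Status": "running", "Site": "DIRAC.CERN.ch"})

# One round trip per selected job.
server.calls = 0
per_job_sites = [server.getJobSite(jid) for jid in jobreq["Value"]]
calls_per_job = server.calls

# One round trip for the whole selection.
server.calls = 0
summary = server.getJobsPrimarySummary(jobreq["Value"])
calls_bulk = server.calls
```

The per-job loop costs one request for each selected job, while the bulk call costs a single request regardless of how many jobs are selected.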

11

Accounting Web Interface 1
  GUI for querying the Accounting
  Shows results
    As graphics
    As table
    As Excel sheet
  Several types of report
    Only a few shown here

12

Accounting Web Interface 2

Used resources by site

13

Accounting Web Interface 3

Used resources by event type
  Mb/job
  CPU/job
  Failed jobs
  CPU vs. Exec time
  Input and Output data vs. CPU

14

Accounting Web Interface 4

Produced data by production ID
  Rates
  Cumulative
  Number of events
  Gb of output

15

Accounting Web Interface 5

WMS statistics on DIRAC's performance
Plots
  Job execution time vs. WMS waiting time
  Job execution time vs. WMS matching time
Granularity
  Per site
  Per production
  Integral
Allows assessment of DIRAC's performance
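The three granularities (per site, per production, integral) amount to grouping the same timing records by a different key. The following is a minimal sketch, assuming each accounting record carries a site, a production id, a WMS waiting time and an execution time. The field names, the sample values and the "waiting fraction" summary are illustrative choices, not quantities defined in the slides.

```python
from collections import defaultdict


def group_timings(records, key=None):
    """Group (wms_waiting, execution) pairs per site, per production, or all together."""
    groups = defaultdict(list)
    for rec in records:
        label = rec[key] if key else "all"   # key=None gives the integral view
        groups[label].append((rec["wms_waiting_s"], rec["execution_s"]))
    return dict(groups)


def mean_waiting_fraction(pairs):
    """Average share of waiting in (waiting + execution) for one group."""
    fractions = [w / (w + e) for w, e in pairs if w + e > 0]
    return sum(fractions) / len(fractions) if fractions else 0.0


# Made-up records for illustration only.
records = [
    {"site": "A", "production": 1, "wms_waiting_s": 100, "execution_s": 900},
    {"site": "A", "production": 2, "wms_waiting_s": 300, "execution_s": 700},
    {"site": "B", "production": 1, "wms_waiting_s": 500, "execution_s": 500},
]

by_site = group_timings(records, "site")
by_production = group_timings(records, "production")
integral = group_timings(records)
```

Each group is a list of points that could be drawn as an execution-time versus waiting-time scatter plot, one plot per site, per production, or for the whole data set.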

16

Accounting Internals

Job and DIRAC statistics kept in a database
  Site contribution
  Data produced and used by jobs and steps
  Timing for jobs, steps and DIRAC internals
Separate XML-RPC interfaces to populate and query the accounting tables
  Both interfaces have restricted access
Jobs are moved to the accounting system by a cleaner agent after being validated
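The role of the cleaner agent can be sketched as a move between two stores. This is a sketch under assumptions, not the DIRAC implementation: plain dictionaries stand in for the job and accounting databases, and the validation rule, the state names and the field names are hypothetical.

```python
FINAL_STATES = {"done", "failed"}   # assumed names for terminal job states


def validate(job):
    """Hypothetical validation: the job is finished and has the fields we record."""
    return job.get("Status") in FINAL_STATES and "Site" in job and "CPUTime" in job


def clean(job_db, accounting_db):
    """Move validated jobs from the job store to the accounting store.

    Returns the ids that were moved. Each job is written to accounting
    before it is deleted from the job store.
    """
    moved = []
    for job_id in sorted(job_db):   # sorted() copies the keys, so deleting is safe
        job = job_db[job_id]
        if validate(job):
            accounting_db[job_id] = dict(job)
            del job_db[job_id]
            moved.append(job_id)
    return moved


# Made-up jobs for illustration only.
job_db = {
    1: {"Status": "done", "Site": "DIRAC.CERN.ch", "CPUTime": 1200},
    2: {"Status": "running", "Site": "DIRAC.CERN.ch", "CPUTime": 300},
    3: {"Status": "failed", "Site": "DIRAC.CERN.ch"},   # incomplete record
}
accounting_db = {}
moved = clean(job_db, accounting_db)
```

After a pass, every job is in exactly one of the two stores: running or unvalidated jobs stay with the monitoring side, and validated ones are only visible through accounting.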

17

Accounting Usage
  About 10 hits per day
  Time to generate daily static reports: 8 min
    60-70% of the time querying the database
    30-40% of the time in the drawing package
  Server load < 0.2
  Total: 169 kjobs
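The percentages quoted for the 8 min report run translate into absolute times as follows: roughly 4.8 to 5.6 min in database queries and 2.4 to 3.2 min in the drawing package. The helper name `share_minutes` is only illustrative.

```python
TOTAL_MIN = 8.0   # time to generate the daily static reports, from the slide


def share_minutes(low_pct, high_pct, total=TOTAL_MIN):
    """Convert a percentage range of the total run time into minutes."""
    return (round(total * low_pct / 100, 1), round(total * high_pct / 100, 1))


db_range = share_minutes(60, 70)     # time spent querying the database
draw_range = share_minutes(30, 40)   # time spent in the drawing package
```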

18

Outlook

Monitoring page
  Transactions in monitoring updates
  Further optimisation (bulk operations...)
  Search for a faster rendering package
  Make the web page dynamic: fewer reloads
Accounting
  New report types
    Normalized CPU
    Contribution by country
    Rate by site, country, etc.

19

URLs

Monitoring page
  http://fpegaes1.usc.es/dmon/DC04/joblist.html
  Mirror on: http://lhcb02.usc.cesga.es/dmon/DC04/joblist.html
  Direct link to overview pages: http://lhcb.ecm.ub.es/DC04/Monitoring
Accounting page
  http://lhcb.ecm.ub.es/DC04/Accounting/