LNL CMS: Farm monitoring
Massimo Biasotto - LNL
Padova, 24 April 2002


Local Farm Monitoring

LNL experiences with “local” farm monitoring

July 2001: we first started with MRTG, and ran into a lot of problems:
– heavy footprint on the server
– unreliable (processes hanging on unreachable hosts)
– not scalable

November 2001: remstats brought improvements:
– lighter and more robust than MRTG
– more flexibility in graph display and alarm management
– still scalability problems (it polls the hosts sequentially)
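The two problems above (probes hanging on unreachable hosts, and sequential polling) can be illustrated with a minimal Python sketch. It is not how MRTG or remstats work internally: a plain TCP connect stands in for the real SNMP query, and the function names and default timeout are hypothetical. Bounding every probe with a timeout stops an unreachable host from blocking the cycle indefinitely; even so, a sequential cycle still takes up to hosts × timeout in the worst case, while a thread pool brings that down to roughly ceil(hosts / workers) × timeout.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical stand-in for a real metric query (e.g. SNMP)."""
    try:
        # create_connection raises OSError (timeouts included) on any failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll_sequential(targets, timeout=2.0):
    # Worst case per cycle: len(targets) * timeout seconds
    return {t: probe(t[0], t[1], timeout) for t in targets}


def poll_parallel(targets, timeout=2.0, workers=32):
    # Worst case per cycle: about ceil(len(targets) / workers) * timeout seconds
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: probe(t[0], t[1], timeout), targets))
    return dict(zip(targets, results))
```

For example, `poll_parallel([("node01", 22), ("node02", 22)], timeout=1.0)` (hostnames hypothetical) returns a dict mapping each (host, port) pair to True or False. The timeout addresses reliability; the pool addresses scalability.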

Page 3: LNL CMS M.Biasotto, Padova, 24 aprile 2002 1 Farm monitoring Massimo Biasotto - LNL.

M.Biasotto, Padova, 24 aprile 2002M.Biasotto, Padova, 24 aprile 2002 33

LNL

CMS

Remstats example


Remstats example


Remstats vs MRTG


Ganglia

March 2002: ganglia, with many advantages:
– much greater resolution: metrics sampled every 15 s instead of every 5 min
– scalability: based on a distributed architecture, with data exchanged over a multicast channel
– single-host metrics are easily aggregated into “cumulative” overview graphs
– the tool still needs some customization (adding more metrics, customizing web pages, etc.)
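The “cumulative” overview graphs come from aggregating per-host values. Below is a minimal sketch of that step, assuming a hypothetical in-memory record per host; the field names echo Ganglia metric names such as load_one, but this is not Ganglia's code or data format. Loads and traffic are summed across hosts, while CPU percentages are averaged, since a sum of percentages would be meaningless. The helper samples_per_day shows what the resolution change means: 15 s sampling gives 5760 points per day, against 288 at 5 min.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

SECONDS_PER_DAY = 86_400


@dataclass
class HostSample:
    """Hypothetical per-host snapshot of a few metrics."""
    host: str
    load_one: float  # 1-minute load average
    cpu_user: float  # percent of CPU time in user mode
    bytes_in: float  # network bytes/s received


def cluster_summary(samples: Sequence[HostSample]) -> dict:
    """Aggregate per-host samples into cluster-wide values for an overview graph."""
    if not samples:
        raise ValueError("no samples to aggregate")
    return {
        "hosts": len(samples),
        "load_one_sum": sum(s.load_one for s in samples),   # additive quantity
        "cpu_user_avg": mean(s.cpu_user for s in samples),  # percentages are averaged
        "bytes_in_sum": sum(s.bytes_in for s in samples),   # additive quantity
    }


def samples_per_day(interval_s: int) -> int:
    """Number of samples stored per day at a given sampling interval."""
    return SECONDS_PER_DAY // interval_s
```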


Ganglia example


Ganglia example


Ganglia example


Netsaint

During our survey of existing monitoring tools, Netsaint was considered and discarded:
– main reason: it didn’t monitor host performance metrics such as % CPU, load, network traffic, etc. (at least, not without heavy customization); perhaps the necessary plugins have been added since then
– it had no database to record historical data
It monitors the status of the hosts (up or down) and of some network services.
It provides a log of all relevant events (hosts/services going up or down, etc.).
It probably has other features, but I’ve never investigated the tool in depth.
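An event log like the one described above records state transitions rather than every individual check. A minimal sketch of that idea follows: it keeps the last known state per (host, service) pair and appends an event only when the state changes. The class, its fields and the event tuple layout are hypothetical and do not reflect Netsaint's own implementation or log format; the first observation of a pair is also logged, with None as the previous state.

```python
from datetime import datetime, timezone


class StatusLog:
    """Hypothetical up/down tracker that logs only state changes."""

    def __init__(self):
        self._state = {}   # (host, service) -> "UP" or "DOWN"
        self.events = []   # (timestamp, host, service, old_state, new_state)

    def update(self, host, service, is_up, when=None):
        new = "UP" if is_up else "DOWN"
        key = (host, service)
        old = self._state.get(key)  # None the first time a pair is seen
        if old != new:
            if when is None:
                when = datetime.now(timezone.utc)
            self.events.append((when, host, service, old, new))
            self._state[key] = new
        return new
```

Repeated checks with an unchanged result add nothing to the log, so it stays small even when checks run frequently.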


Grid monitoring

Grid monitoring is different from “local” farm monitoring:
– over a WAN you cannot monitor all the performance metrics of all the farm nodes (and you probably don’t want to)
Currently, Netsaint is used on the DataGrid Testbed to monitor the status of the testbed nodes and their grid services:
http://infngrid.ct.infn.it/index-orig.html (infn-tb/guest)
Is this useful for CMS?
Can other useful features be added?
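A back-of-the-envelope sketch of why forwarding every metric of every node over the WAN is unattractive, compared with forwarding only a few status checks. The function just multiplies out the message rate; all inputs below are placeholders chosen for illustration, not measurements from LNL or the DataGrid Testbed.

```python
def wan_rate(nodes: int, items_per_node: int, bytes_per_item: int, interval_s: float) -> float:
    """Bytes per second needed to forward `items_per_node` values per node every `interval_s` seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return nodes * items_per_node * bytes_per_item / interval_s


# Placeholder inputs (assumptions): 100 nodes per farm, 100-byte messages.
# Every metric at a Ganglia-like 15 s interval:
full_metrics = wan_rate(nodes=100, items_per_node=30, bytes_per_item=100, interval_s=15)
# A handful of up/down checks every 5 minutes:
status_only = wan_rate(nodes=100, items_per_node=5, bytes_per_item=100, interval_s=300)
```

With these made-up inputs the full stream needs 20000 bytes/s per farm against about 167 bytes/s for status only, and both grow linearly with the number of farms; the point is the scaling, not the specific numbers.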


Netsaint example


Adapting CMS monitoring to Grid

What are the CMS requirements for “Grid monitoring”?
What do we want to monitor, and why?
Once these questions have been answered, we can decide whether Netsaint fulfills the requirements.
Integrating Netsaint into existing CMS farms shouldn’t be difficult:
– the main issue is probably the setup (and maintenance) of the central repository
But it should be done only if there is a real need, not just for its own sake.