Run Control and Monitor System for the CMS Experiment
Michele Gulmini, CERN/EP – INFN Legnaro, on behalf of the CMS DAQ collaboration
CHEP 2003, San Diego USA, March 2003

Page 1: Title slide

Run Control and Monitor System for the CMS Experiment

Michele Gulmini, CERN/EP – INFN Legnaro

On behalf of the CMS DAQ collaboration

CHEP 2003, San Diego USA, March 2003

Page 2: Outline

Run Control and Monitor System: RCMS

• RCMS Architecture
  – Session Managers
  – Subsystem Controllers
  – Services
• RCMS Prototypes
  – RCMS for Small DAQ Systems
  – RCMS Demonstrators: Performance and Scalability Tests
• Plans
• Summary

Page 3: Run Control and Monitor System

[Figure: user interfaces (UI) connected to RCMS through the Internet/intranet.]

• The Run Control and Monitor System (RCMS) is the collection of hardware and software components responsible for controlling and monitoring the CMS experiment during data taking.

• RCMS lets users access and control the experiment from anywhere in the world, providing a “virtual counting room” where physicists and operators can effectively take shifts from a distance.

• RCMS views the experiment as a set of partitions, where a partition is a grouping of entities that can be operated independently.

• Main operations are configuration, monitoring, error handling, logging and synchronization with other subsystems.

[Figure: RCMS Context. RCMS sits between the user interfaces (UI) and the Trigger, Event Builder, Event Filter, DCS and Computing Services.]

Page 4: Partitions Example

[Figure: two Session Managers (A and B), each with its own UIs and a connection to the Services. Sub-system controllers shown: CSCtrl, TRGCtrl, DCSCtrl, EVFCtrl, EVBCtrl, FED-BCtrl, RU-BCtrl. Sub-systems shown: CS, DCS, EVB, TRG, EVF, FED Builder, RU Builder. Further labels in the figure: Glbl, Mu, Cal.]

Page 5: RCMS Logical Layout

• The execution of the RCMS is organized on the basis of “Sessions”.

• A Session is the allocation of the hardware and software of a CMS partition needed to perform data-taking.

• Multiple Sessions may coexist and operate concurrently.

• Each Session is associated with a Session Manager (SMR), which coordinates all the actions.
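
The Session idea above can be sketched in a few lines. This is only an illustration in Python (the prototype itself is Java based), and it assumes that concurrent Sessions must hold disjoint sets of resources; the slides say partitions can be operated independently but do not spell out this rule. The class name SessionRegistry and the resource names are hypothetical.

```python
class SessionRegistry:
    """Tracks which resources are allocated to which open Session.

    Assumption (not stated in the slides): two concurrent Sessions
    may not share a resource.
    """

    def __init__(self):
        self._sessions = {}  # session name -> frozenset of resource names

    def open_session(self, name, partition):
        """Allocate the resources of `partition` to a new Session."""
        if name in self._sessions:
            raise ValueError(f"session {name!r} is already open")
        wanted = frozenset(partition)
        for other, held in self._sessions.items():
            clash = wanted & held
            if clash:
                raise ValueError(
                    f"{sorted(clash)} already allocated to session {other!r}"
                )
        self._sessions[name] = wanted

    def close_session(self, name):
        """Release everything held by the named Session."""
        del self._sessions[name]

    def active(self):
        """Return the names of the open Sessions, sorted."""
        return sorted(self._sessions)
```

Opening Session "A" on a trigger and event builder partition and Session "B" on a DCS partition succeeds; a third Session asking for the event builder is refused until "A" is closed.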

Page 6: Sub-System Controller (SSC)

• An SSC consists of a Function Manager (FM) and a local database (DB) service.

• There is one FM per partition that receives requests from a Session Manager (SMR) and transforms them into the corresponding requests for actions that are sent to the sub-system.

• The local DB service can be used as a proxy to the services.
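
A minimal sketch of the request translation described above, in Python for brevity. The request names, action names and the `send` callback are hypothetical; the slides do not define the actual command vocabulary or transport.

```python
class FunctionManager:
    """Translates Session Manager requests into sub-system actions."""

    def __init__(self, subsystem, translations, send):
        self.subsystem = subsystem
        self._translations = translations  # request -> list of actions
        self._send = send                  # callable(subsystem, action)

    def handle(self, request):
        """Send every action that corresponds to `request`; return the count."""
        actions = self._translations.get(request)
        if actions is None:
            raise KeyError(f"{self.subsystem}: unsupported request {request!r}")
        for action in actions:
            self._send(self.subsystem, action)
        return len(actions)


class SessionManager:
    """Coordinates a Session by forwarding each request to every FM."""

    def __init__(self, function_managers):
        self._fms = list(function_managers)

    def request(self, name):
        """Forward `name` to all FMs; return the total number of actions sent."""
        return sum(fm.handle(name) for fm in self._fms)
```

Keeping the translation table inside each FM means the Session Manager only needs the generic request names, while each sub-system keeps its own action list.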

Page 7: Basic RCMS Services

– SECURITY SERVICE
  • Login and user account management.
– RESOURCE SERVICE (RS)
  • Information about DAQ resources and partitions.
– INFORMATION AND MONITOR SERVICE (IMS)
  • Collects messages and monitor data; distributes them to the subscribers.
– JOB CONTROL
  • Starts, monitors and stops the software elements of RCMS, including the DAQ components.
– PROBLEM SOLVER
  • Uses information from the RS and IMS to identify malfunctions and attempts to provide automatic recovery procedures where applicable.

Page 8: Resource Service Block Diagram

• The Resource Service (RS) handles all the hardware and software components of the DAQ system including its partitions.

[Figure: RCMS block diagram. Components shown: UIs, Session Manager, Services Connection, SS, RS, IMS, Job Ctrl, PS, SSC, and the databases UserDB, ConfDB and LogDB.]

Page 9: Information and Monitor Service Block Diagram

• The Information and Monitor Service (IMS) collects information (logs, warnings, errors, monitoring data, etc.) from the sub-systems and provides it to the subscribers.

[Figure: RCMS block diagram. Components shown: UIs, Session Manager, Services Connection, SS, RS, IMS, Job Ctrl, PS, SSC, and the databases UserDB, ConfDB and LogDB.]

Page 10: Time Requirements

– Configuration and setup of the system: minutes

– Control (state change, execution of commands): seconds

– Monitoring: depending on the amount of data required

Information and Monitor Service:
• Tens of subscribers
• Peak: about 2000 messages (status change, log)
• Average: tens to a few hundred messages/s

Page 11: RCMS Prototypes

• RCMS for small DAQ Systems

– Fully functional RCMS systems targeted to small DAQs (Production systems, Testbeam DAQ systems)

– Real-life examples used to check the RCMS functionality.

• RCMS demonstrators

– Partially functional RCMS systems aimed at studying scalability issues.

– Test bed systems used to emulate slices or parts of the hierarchical structure of the final DAQ.

– Help to confirm the architecture and to evaluate and eventually select the technologies to be used in the final system.

Page 12: RCMS for small DAQs

• Current Running Prototype:
  – Designed to work together with the XDAQ CMS online software framework (XDAQ: see the CHEP 2003 talk by J. Gutleber, “Using XDAQ in Application Scenarios of the CMS Experiment”)
  – Available services:
    » Resource Service (RS)
    » Information and Monitor Service (IMS)
    » SubSystem Controllers (Function Managers)
    » Session Managers
    » GUIs

• Technologies and tools:
  – Java Servlets (Apache Tomcat)
  – Sun “Java Web Services Developer Pack” (JWSDP): JAXP, JAXM, XPath, ...
  – SOAP communication protocol
  – Databases:
    » XMLDB interface (eXist native XML database)
    » mySQL

Page 13: RCMS for Small DAQs – Current Applications

• CMS Muon Drift Tubes
  – Chamber Production DAQ (Legnaro, Italy)
  – Testbeam (CERN, next May)
• CMS Tracker
  – “ROD System Tests” (CERN)
  – Testbeam (CERN, next May)
• CMS TriDAS (CERN)
  – DAQ Column
  – TDR Demonstrator

Page 14: Session and Function Manager Prototype

[Figure: RCMS block diagram (UIs, Session Manager, Services Connection, SS, RS, IMS, Job Ctrl, PS, SSC, UserDB, ConfDB, LogDB) next to an SM/FM servlet whose FSM consists of an XML definition and a Java implementation.]

• The Function Managers and the Session Manager have a built-in Finite State Machine (FSM) to command the controlled components and to track their state.

• The FSM is composed of an XML definition and a Java class implementation representing the actions to be performed.

• The definition and the implementation of the FSMs are managed by the Resource Service.

• The Session Manager and Function Managers are launched when a new “Session” is opened, and can have a hierarchical structure.
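
The split between an XML definition and an implementation class can be illustrated as follows. This is a sketch in Python rather than the Java of the prototype, and everything specific in it is an assumption: the XML element and attribute names, the state names (Halted, Configured, Enabled) and the Actions class are hypothetical, since the slides do not show the real FSM schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical FSM definition; the real RCMS schema is not shown in the slides.
FSM_XML = """
<fsm initial="Halted">
  <transition input="configure" from="Halted" to="Configured"/>
  <transition input="enable" from="Configured" to="Enabled"/>
  <transition input="halt" from="Configured" to="Halted"/>
  <transition input="halt" from="Enabled" to="Halted"/>
</fsm>
"""


class Actions:
    """Hypothetical implementation class: one method per FSM input."""

    def __init__(self):
        self.log = []

    def configure(self):
        self.log.append("configure")

    def enable(self):
        self.log.append("enable")

    def halt(self):
        self.log.append("halt")


class StateMachine:
    """Loads the transition table from XML and delegates actions to an object."""

    def __init__(self, xml_text, actions):
        root = ET.fromstring(xml_text)
        self.state = root.get("initial")
        self._table = {
            (t.get("from"), t.get("input")): t.get("to")
            for t in root.findall("transition")
        }
        self._actions = actions

    def fire(self, name):
        """Run the action for input `name` and move to the next state."""
        key = (self.state, name)
        if key not in self._table:
            raise ValueError(f"input {name!r} not allowed in state {self.state!r}")
        getattr(self._actions, name)()  # perform the action first
        self.state = self._table[key]   # then record the new state
        return self.state
```

Because the table lives in XML and the actions in a separate class, either part can be stored and served independently, which is consistent with the Resource Service managing both.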

Page 15: RS and IMS Prototype

[Figure: IMS prototype. Java publishers send XML messages via JAXM (PUBLISH) to the IMS, which runs in a Tomcat servlet container with an XPath filter engine and subscription info (JDOM FS). Java subscribers (JAXM, in Tomcat/Jetty) SUBSCRIBE and receive NOTIFY messages. XDAQ applications communicate via XOAP over SOAP. Storage goes through XMLDB to a DB (eXist, file, mySQL).]

[Figure: RS prototype. The Resource Service is a Java servlet in a servlet container (Tomcat). A Java client (Java objects, Castor XML parser) and a C++ client (XML parser) exchange XML with it over SOAP. An XMLDB interface connects it to an XML:DB database and a relational DB.]
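
The PUBLISH / SUBSCRIBE / NOTIFY flow with an XPath filter can be sketched as below. This is an in-process Python illustration only: the prototype uses JAXM and SOAP between servlets, which is not modelled here, and the message layout (`message`, `log`, `monitor` elements) is hypothetical. The filters use the limited XPath subset supported by Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


class InformationService:
    """Notifies each subscriber whose XPath filter matches a published message."""

    def __init__(self):
        self._subscribers = []  # list of (xpath, callback) pairs

    def subscribe(self, xpath, callback):
        """Register `callback` for messages matched by `xpath`."""
        self._subscribers.append((xpath, callback))

    def publish(self, xml_text):
        """Parse one XML message, notify matching subscribers, return their count."""
        message = ET.fromstring(xml_text)
        delivered = 0
        for xpath, callback in self._subscribers:
            # The path is evaluated relative to the message root element.
            if message.findall(xpath):
                callback(message)
                delivered += 1
        return delivered
```

A subscriber interested only in errors could register the filter `./log[@level='error']`, while a monitoring display could register `./monitor`; messages that match no filter are simply not forwarded.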

Page 16: RCMS GUIs

• Generic GUI:
  – Insertion and retrieval of resources (PCs, software, partitions, etc.)
  – Ability to command, set and retrieve parameters from XDAQ applications
  – Scripting facility
  – Customisation facilities (plugins)
• Muon DT TestBeam GUI

Page 17: RCMS Demonstrators

Legnaro T2 CMS farm: 136 P3 1-1.2 GHz processors

Page 18: Demonstrator 1

• Exploring the ability to command a set of XDAQ executives running “empty” applications

• The time measured represents the time required to perform a state change of the entire cluster

[Figure: test layouts. Function Managers (FM), each on a PC, send SOAP commands to rows of PCs running XDAQ; one layout places intermediate FMs between a top FM and the XDAQ nodes.]

Page 19: Demonstrator 1

[Plot: time (ms, 0 to 400) versus number of nodes (0 to 130) for three configurations: Sequential FM, FM with Threads, and 2 intermediate FMs. An annotation marks 120 nodes at 100 ms.]
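
The three configurations compared in this test can be sketched as follows. This is a Python illustration under stated assumptions: `change_state` is a stand-in that only sleeps, not a real SOAP call to a XDAQ executive, and the intermediate FMs are assumed to command their own nodes sequentially, which the slides do not specify.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def change_state(node, delay=0.01):
    """Stand-in for one state-change request to a node (assumption: fixed delay)."""
    time.sleep(delay)
    return (node, "Enabled")


def sequential_fm(nodes):
    """One FM commands the nodes one after another."""
    return [change_state(n) for n in nodes]


def threaded_fm(nodes, workers=16):
    """One FM issues the requests from a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(change_state, nodes))


def two_level_fm(nodes, groups=2):
    """A top FM delegates to `groups` intermediate FMs working in parallel."""
    slices = [nodes[i::groups] for i in range(groups)]
    with ThreadPoolExecutor(max_workers=groups) as pool:
        parts = list(pool.map(sequential_fm, slices))
    return [result for part in parts for result in part]


def timed(fn, nodes):
    """Return (result, elapsed seconds) for one full state change."""
    start = time.perf_counter()
    result = fn(nodes)
    return result, time.perf_counter() - start
```

Calling `timed` on each function with the same node list gives the time for a state change of the whole cluster, which is the quantity plotted in the test. The simulated delays say nothing about the real measurements.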

Page 20: Demonstrator 2

[Plot: total number of messages/s received by the services (0 to 4000) versus number of Web Services (0 to 5).]

• Simplified version of a log message service based on Web Services technologies (Glue platform)

• 15 clients and a variable number of Web Services used

• The performance scales linearly with the number of instances of the service available

Page 21: IMS Prototype Test (1)

[Plot: IMS Prototype. Total number of messages/s (0 to 400) versus number of publishers (0 to 72) for three cases: no persistency, persistency on file, persistency on DB (mySQL).]

[Figure: publishers PUBLISH to the IMS, which is backed by a mySQL DB.]

• Persistency on the eXist native XML DB is not plotted: very slow

• Between 200 and 300 SOAP messages/s handled by the IMS prototype

Page 22: IMS Prototype Test (2)

[Figure: publishers PUBLISH over SOAP to several IMS instances backed by a mySQL DB; subscribers SUBSCRIBE and receive NOTIFY messages.]

• Performance improves as the number of service instances increases

• Notification mechanism not optimized

• Test to be completed

[Plots: 4 publishers, persistency on DB. First plot: total number of messages/s (0 to 600) versus number of subscribers (0 to 3) for 1, 2 and 4 IMS instances. Second plot: total number of messages/s (0 to 600) versus number of IMS servlets (1 to 4).]

Page 23: IMS hierarchical structure

– Performance tests done with the present prototypes:
  • Commanding a cluster of DAQ applications fits the requirements
  • The Information and Monitor Service prototype needs further investigation:
    » Notification architecture
    » Hierarchical structure

[Figure: IMS hierarchical structure. A central IMS sits above several IMS proxies, each of which serves a group of XDAQ applications.]
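
One way such a hierarchy could reduce the load on the central IMS is sketched below. This is speculative: the slides only show the proxy layer and do not say what the proxies do. The sketch assumes each proxy batches messages from its XDAQ applications before forwarding them; the class names and the batch size are hypothetical.

```python
class CentralIMS:
    """Top-level service; counts how many requests it has to handle."""

    def __init__(self):
        self.requests = 0
        self.messages = []

    def receive(self, batch):
        """Accept one request carrying a list of messages."""
        self.requests += 1
        self.messages.extend(batch)


class IMSProxy:
    """Collects messages locally and forwards them upstream in batches.

    Assumption: batching is the proxy's job; the slides do not confirm this.
    """

    def __init__(self, upstream, batch_size=10):
        self._upstream = upstream
        self._batch_size = batch_size
        self._buffer = []

    def publish(self, message):
        """Buffer one message; forward the buffer once it is full."""
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Forward any buffered messages as a single upstream request."""
        if self._buffer:
            self._upstream.receive(self._buffer)
            self._buffer = []
```

With this arrangement the central service sees one request per batch instead of one per message, at the cost of some added latency for messages waiting in a proxy buffer.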

Page 24: Future – OGSA???

• RCMS architecture is service and web oriented

• Web services development tools (Apache Axis, Glue) may help to deploy reliable services quickly

• Open Grid Services Architecture (OGSA) (http://www.globus.org/ogsa) is Web Services based

• An alpha release of the framework is now available

• First official release foreseen in a few months' time

• OGSA could be adopted for the RCMS services, providing several advantages:

• RCMS open to the Grid world

• Well supported and reliable framework

• Useful built-in services

• OGSA is under evaluation:

• The RCMS Resource Service has been successfully ported (Globus 3.0 alpha release)

• Functionality and performance tests in progress

Page 25: Summary and Plans

• RCMS architecture defined

• Prototypes developed with two aims:
  – Control of small DAQs to be used in Testbeam applications:
    » Next May's Testbeams (CMS Tracker and Muon DT) will provide important feedback on its functionality
  – Demonstrators aimed at validating the architecture in terms of performance and scalability:
    » Further investigation needed, mainly on the IMS

• Open Grid Services Architecture (OGSA) under evaluation

• Problem Solver development in progress:
  – Error detection and recovery

• Database studies and evaluation foreseen