December 2005 Scaling Up PVSS Phase II Test Results Paul Burkimsher IT-CO.

Aim of the Scaling Up Project

Investigate functionality and performance of large PVSS systems

In Phase 1 we reassured ourselves that PVSS scales to support large systems

Provided detail rather than bland reassurances

Phase 2: WYSIWYAF

Began with a questionnaire to you to establish your concerns

Eclectic list of “hot topics of the moment”:

– Oracle Archiving
– Alerts
– Regular reconfiguration of channels (alerts and setpoints)
– Backup and restore
– Configuring all channels at startup

Your requests (cont.)

– OPC performance
– Local DB cache
– Central Panel Repository
– Windows/Linux lurking limits
– System startup time (DPT distribution)
– Task Allocation

From these requests, we initially picked out four for investigation:

– Task Allocation
– Backup of a running system
– Alerts
– Panel Repository

Task Allocation

Recall that PVSS is manager based, and any manager can be scattered to another machine (not just UIs).

[Diagram: PVSS manager architecture, showing the Event Manager (EV), Database Manager (DB), Drivers (D), Control Managers (CTRL), API Managers (API) and User Interface managers (UI, in Runtime and Editor modes).]

Task Allocation

More than 20 different tests conducted to investigate the effect of moving managers around.

Results have been available on the web for some time (URLs at the end)

Results were surprising and went against our (& ETM’s!) assumptions of what would be “better”…

What we measured…

A task allocation was deemed “better” if it supported a higher number of datapoint changes per second (“throughput”) than a system running entirely on a single processor.

We observed the number of changes per second that the system could support before one of the following became overloaded:

– CPU usage
– Memory usage
– Network traffic
– Disk traffic
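The measurement procedure above can be sketched roughly as follows. This is only an illustration, not the test harness actually used: the overload limits, the `measure` callback and the `toy_machine` model are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical overload limits (fraction of capacity); not taken from the test reports.
LIMITS = {"cpu": 0.95, "memory": 0.90, "network": 0.80, "disk": 0.80}


def first_overloaded(usage: Dict[str, float]) -> Optional[str]:
    """Return the first resource whose usage exceeds its limit, else None."""
    for name, limit in LIMITS.items():
        if usage[name] > limit:
            return name
    return None


def max_throughput(measure: Callable[[int], Dict[str, float]],
                   start: int = 100, step: int = 100,
                   ceiling: int = 100_000) -> Tuple[int, Optional[str]]:
    """Raise the change rate step by step until a resource overloads.

    Returns the highest rate that ran without overload, and the name of the
    resource that overloaded at the next step (None if the ceiling was reached).
    """
    best = 0
    rate = start
    while rate <= ceiling:
        bottleneck = first_overloaded(measure(rate))
        if bottleneck is not None:
            return best, bottleneck
        best = rate
        rate += step
    return best, None


def toy_machine(rate: int) -> Dict[str, float]:
    """Made-up linear model of a CPU-bound machine, for demonstration only."""
    return {"cpu": rate / 2050, "memory": 0.30 + rate / 20000,
            "network": rate / 10000, "disk": rate / 5000}
```

Calling `max_throughput(toy_machine)` reports both the sustainable rate and which resource gave out first; comparing that rate between two task allocations is the sense in which one allocation is “better”.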

What we saw…

As throughput increases on a typical PVSS system, the machine first becomes CPU bound.

The Event Manager (EM) is the task most in need of CPU.

We expected that scattering the EM away from the Data Manager (DM) would cause slow-down because of the high traffic between these tasks. WRONG!

Scattering the EM

Despite the overhead of sending traffic between the EM and the DM over the external network, scattering the EM significantly increased throughput (+75%).

The Alert-Event Screen (AES) is CPU-hungry.

Runs in a UI task which can be scattered.

Beware: Each additional AES not only increases the load on its own machine, but also increases the load on the EM to which it is connected.

Recommendation

Execute as few AESs as possible outside the main control room.

When you are not actually looking at the AES, leave it in “stopped” mode. (Screen is not updated.)

Scattering other managers

Can improve throughput, but not as spectacularly as when scattering the EM.

Moving the DM is useful, but more delicate (i.e. many Value Archive (VA) connections?)

Absolute Performance

The average number of “changes per second” that can be supported depends on the nature of the traffic.

A steady data flow is easier to cope with.

Irregular bursts of rapid traffic tend to overflow the queues between the managers. (Queue lengths are configurable.)
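A toy model shows why bursts hurt even when the average rate is unchanged. It is a minimal sketch, assuming a single bounded queue drained at a fixed rate per tick; the numbers are invented and `queue_limit` is merely a stand-in for a configurable queue length, not a PVSS setting name.

```python
def dropped_messages(arrivals, service_per_tick, queue_limit):
    """Count messages lost when a bounded queue overflows.

    arrivals: number of messages arriving in each tick.
    service_per_tick: messages the consumer can process per tick.
    queue_limit: maximum queue length (hypothetical stand-in for a
    configurable inter-manager queue length).
    """
    queued = 0
    dropped = 0
    for count in arrivals:
        queued += count
        if queued > queue_limit:
            # Anything beyond the limit is lost.
            dropped += queued - queue_limit
            queued = queue_limit
        # The consumer drains the queue at a fixed rate.
        queued = max(0, queued - service_per_tick)
    return dropped


steady = [100] * 10         # 1000 messages, evenly spread over 10 ticks
bursty = [1000] + [0] * 9   # the same 1000 messages in a single burst
```

With `service_per_tick=120` and `queue_limit=500`, the steady flow loses nothing while the burst loses 500 messages; raising `queue_limit` to 1000 lets the same burst through intact, at the cost of a longer backlog.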

Load Management

PVSS implements several Load Management schemes, e.g.

Alert screen update pauses during a brief avalanche

Alert screen switches into Stopped mode if the sustained number of alerts arriving is crazy

Load Management - II

Load Shedding, where EM will cut the umbilical to rogue managers rather than be brought down itself.

I recommend that shift operators be taught to recognise the symptoms when they occur
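The idea behind load shedding can be illustrated with a short sketch. This is not how PVSS implements it: the `ManagerLink` type, the `backlog` counter and the threshold are all hypothetical, chosen only to show the principle of cutting off a rogue manager instead of letting it bring the EM down.

```python
from dataclasses import dataclass


@dataclass
class ManagerLink:
    """Hypothetical view of one manager connected to the Event Manager."""
    name: str
    backlog: int = 0        # messages queued for this manager
    connected: bool = True


def shed_load(links, backlog_limit):
    """Disconnect every connected manager whose backlog exceeds backlog_limit.

    Returns the names of the managers that were cut off, so that the
    event can be logged or shown to the shift operator.
    """
    shed = []
    for link in links:
        if link.connected and link.backlog > backlog_limit:
            link.connected = False
            link.backlog = 0
            shed.append(link.name)
    return shed
```

The symptom an operator sees is the returned list: a manager that was connected a moment ago is suddenly gone, while the rest of the system keeps running.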

Multiple CPUs

An alternative to scattering: Buy a dual processor!

2 CPUs are generally enough to satisfy even the hungry Event Manager

Our dual-CPUs became disk bound when we pushed them.

A tribute to the well-balanced design of modern PCs!

Look how much memory you are using.

Buy enough of it. If you are worried about performance, paging is wasted effort!!

Task summary

Give plenty of CPU capacity to the EM by:

– Buying a fast machine
– Scattering the EM
– Buying a dual CPU machine

Backup

In the development systems nobody did backup.

PVSS backup is somewhat intricate.

Need for a set of recipes of backup instructions

18-page Report

– What needs backing up
– What this means in PVSS
– How to back it up
– How to restore (rather important!)

Handout

Four Parts

1) Executive Summary
2) Recipes
3) Detailed Background Description
4) Frequently Asked Questions about Backup

(I’m not going to go through them, just let you know that they exist.)

Alerts

PVSS 3.5 (due in 200x) will contain new functionality for summary alerts and alert provocation during ramping.

I did not do in-depth performance measurements on the existing system, beyond those I described to you in Phase 1 of S.U.P.

At the request of one experiment though, we did investigate

“What is the load of an alert definition on a PVSS system?”

Results on the web (Test 38).

Loads of Alert Definitions

We showed that it is safe to declare any number of alerts and even to activate them provided that the data values stay in range.

It is provocation of the warnings and alerts that incurs a significant CPU load.

Memory load

Test 39 looked at memory usage of Alerts.

Requirement of 2.5 KB per DPE alert.
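As a quick sizing aid, the 2.5 KB figure can be turned into an estimate for a whole system. The sketch below assumes memory grows linearly with the number of DPE alerts and that 1 MB = 1024 KB; the alert count in the usage note is an invented example, not a measured configuration.

```python
KB_PER_DPE_ALERT = 2.5  # figure reported by Test 39


def alert_memory_mb(n_alerts: int) -> float:
    """Estimated memory for n_alerts DPE alerts, in MB (1 MB = 1024 KB).

    Assumes the per-alert cost stays constant as the count grows.
    """
    return n_alerts * KB_PER_DPE_ALERT / 1024
```

For example, `alert_memory_mb(100_000)` gives roughly 244 MB, which is worth checking against the memory installed in the machine.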

Panel Repository

Owing to staffing changes in the section, it was not possible to address this topic.

On the subject of panels…

During the tests I would have found it helpful to have a ready display of the interconnection status of the distributed systems.

I recommend that there is something showing this on the top-level display panel. (Even just a grid of red/green pixels showing connection status.) Lost connections should raise an alert.
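The kind of display suggested above could be as simple as the following sketch. It is plain Python rather than a PVSS panel, and the system names and the `status` dictionary are hypothetical; in a real project the status would come from the distributed-system connection information.

```python
def render_status_grid(status, columns=4):
    """Render connection status as rows of 'G' (connected) or 'R' (lost).

    status: dict mapping system name to True (connected) or False (lost).
    Systems are shown in name order, `columns` cells per row.
    """
    cells = ["G" if status[name] else "R" for name in sorted(status)]
    rows = [cells[i:i + columns] for i in range(0, len(cells), columns)]
    return "\n".join(" ".join(row) for row in rows)


def lost_connections(status):
    """Names of systems whose connection is lost; each should raise an alert."""
    return sorted(name for name, ok in status.items() if not ok)
```

`render_status_grid` gives the at-a-glance grid, and `lost_connections` gives the list from which alerts would be raised.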
