TCS Events, the Data Dictionary, and Alarms (continued) Michele, Doug, and Chris Version 2B.


TCS Events, the Data Dictionary, and Alarms (continued)

Michele, Doug, and Chris

Version 2B


Points for Discussion

1) Logging, events, and alarm handler: strategy for clean-up and reconciliation.

2) Clean up unused portions of the XML files. Is the large number of files for events and data dictionary values a problem?

3) Re-evaluate use of SYSLOG and event logging.

4) Enforce bookend events (start & complete/warning/failed).

5) Event correlation (time-of-event issue)? TG to expound.

Software Group Presentation, 02 May 2014


Item 2) XML Files

2) Clean up unused portions of the XML files. Is the large number of files for events and data dictionary values a problem?

3968 Data Dictionary files
2979 Event files: 1103 error, 268 warning, and 1608 other
GCS and AOS generate their events on-the-fly
  GCS has ~320 error and ~50 warning events
  AOS has its own event priority scheme; not sure how many events are generated

A. Clean up the unused portion of the files; clean up the code which handles the files.

B. Should all events be in XML? (requiring GCS and AOS changes)

C. General event clean-up needed for AOS to be consistent with the rest of the TCS.

D. The XML files have not posed problems for maintenance thus far. Concerns?


Item 3) SYSLOG and Event Logging

3) Re-evaluate use of SYSLOG and event logging.

SYSLOG was designed for temporary, free-form debug statements for the TCS. Once an algorithm is debugged, its messages should be removed from the SYSLOG. Some exceptions:

John uses messages from various subsystems for his diagnostic plots.
GCS generates lots of guiding and wfs’ing messages which would be awkward for events.

SYSLOG is accessible for quickly adding messages.

SYSLOG reports the execution time (bookends here too), the associated command handle (the ultimate bookend), and “who” invoked the command.


Item 3) SYSLOG and Event Logging

In contrast…

Events include the calendar date/time as well as MJD UTC, which can be used conveniently for correlation with telemetry.
Events are presented in a coherent and predictable structure.
The event log does not have the extra messaging found in the SYSLOG which can be a distraction (network, rpc, and gshm server issues), nor the other free-form messages which are the strength of the SYSLOG.

As a programmer who uses these files heavily for debugging, this discussion combined with Item 4 (next slide) makes me want to keep the two logs separate.
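As an illustration of the two time representations mentioned for events, here is a minimal Python sketch that stamps a record with both a calendar UTC time and MJD UTC. The `make_event` record layout and its field names are hypothetical and are not the TCS event format; only the MJD conversion (MJD 40587 is 1970-01-01 UTC) is standard. Like POSIX time, the conversion ignores leap seconds.

```python
from datetime import datetime, timezone

# MJD of the Unix epoch, 1970-01-01T00:00:00 UTC.
MJD_UNIX_EPOCH = 40587.0
SECONDS_PER_DAY = 86400.0


def mjd_utc(when: datetime) -> float:
    """Return the Modified Julian Date for a timezone-aware datetime."""
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return MJD_UNIX_EPOCH + when.timestamp() / SECONDS_PER_DAY


def make_event(name: str, when: datetime) -> dict:
    """Build a hypothetical event record carrying both time stamps."""
    utc = when.astimezone(timezone.utc)
    return {
        "name": name,               # assumed field name
        "utc": utc.isoformat(),     # calendar date/time
        "mjd_utc": mjd_utc(utc),    # for correlation with telemetry
    }
```

Because telemetry and events would then share the MJD axis, correlating them reduces to comparing floating-point day numbers.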


Item 4) Bookend Events

4) Enforce bookend events (start & complete/warning/failed).

Bookend events: enforce by convention
  Verification of input parameters (observers do not always send in the TCS commands what they think they sent)
  Eases capture of system actions during the processing of the particular command, particularly asynchronous actions
  Eases diagnosis of seemingly “stuck” commands
  Can be used to improve efficiency of TCS “super” commands (Preset, Offset) by pin-pointing unnecessary wait states
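One way to make the bookend convention hard to forget is to wrap each command in a helper that always emits the opening event and exactly one closing event. The sketch below is an assumption about how that could look in Python; `bookend`, the `emit` callback signature, and the `outcome` dictionary are all hypothetical names, not part of the TCS.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def bookend(emit: Callable[[str, str, dict], None],
            command: str, **params) -> Iterator[dict]:
    """Emit a 'start' event, then exactly one closing event.

    The start event carries the parameters as received, so the log shows
    what the command was actually sent.
    """
    emit(command, "start", dict(params))
    # The command body may change status to "warning" and add detail.
    outcome = {"status": "complete", "detail": {}}
    try:
        yield outcome
    except Exception as exc:
        # Any exception closes the bookend with 'failed', then propagates.
        emit(command, "failed", {"reason": str(exc)})
        raise
    else:
        emit(command, outcome["status"], outcome["detail"])


if __name__ == "__main__":
    log = []
    with bookend(lambda c, p, d: log.append((c, p, d)),
                 "Offset", ra=1.5, dec=-0.2) as outcome:
        outcome["status"] = "warning"
        outcome["detail"] = {"note": "example only"}
    for entry in log:
        print(entry)
```

A command whose start event has no matching closing event in the log is then a candidate for the “stuck” diagnosis, and the interval between the two bookends exposes wait states.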


Item 1) Reconcile Events with Severity Codes

There are a significant number of Level 3 events which need to be re-defined (373 in XML, 3 in GCS on-the-fly) to the new Level 4.

OSS Level 4 events created on-the-fly actually mean OK, so they will stay.
GCS: 62 Level 4 events created on-the-fly will have to be examined.
20 Level 4 in XML (GCS, IIF, and MCS) will have to be examined.

Priority  Event Color  Event Meaning            New Event Color  New Event Meaning       Severity Flag (Data Dictionary)
0         white        de facto initialization
1         red          error                    red              error                   error
2         yellow       warning                  yellow           warning                 warning
3         green        Ok?                      cyan             intentional             intentional
4         cyan         Ok/Info?                 green            OK                      OK
5         white        Ok?                      gray             debug (start/complete)  debug
6         white        unknown
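The new levels 1 through 5 and the re-definition described on this slide can be written down as a small lookup. This is only a sketch: `Severity`, `NEW_COLOR`, and `migrate_priority` are hypothetical names, and the function encodes just what the slide states (1 and 2 unchanged, old 3 becomes new 4); every other old level is returned as `None`, meaning it needs to be examined by hand.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """New event levels 1-5 as listed in the table."""
    ERROR = 1
    WARNING = 2
    INTENTIONAL = 3
    OK = 4
    DEBUG = 5


NEW_COLOR = {
    Severity.ERROR: "red",
    Severity.WARNING: "yellow",
    Severity.INTENTIONAL: "cyan",
    Severity.OK: "green",
    Severity.DEBUG: "gray",
}


def migrate_priority(old_priority: int) -> Optional[Severity]:
    """Map an old priority to the new scheme, or None if review is needed."""
    if old_priority in (1, 2):
        return Severity(old_priority)   # error and warning keep their level
    if old_priority == 3:
        return Severity.OK              # old "Ok?" level moves to new Level 4
    return None                         # 0, 4, 5, 6: examine case by case
```

Returning `None` instead of guessing keeps the 62 on-the-fly GCS events and the 20 XML Level 4 events on a review list.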


Item 1) Reconcile Events with Severity Codes

The intentional category is to accommodate systems where something has deliberately been disabled, that is, essentially put into a non-functional state.

This state does not require attention and should NOT be propagated to an alarm handler.

The intentional state has to be defined by a strict set of rules (for ECS, when the breaker = OPEN and mode = LOCAL) or provided by an authority on the issue.

Ideally, the rules would be encapsulated in a configuration file (versus hard-coded) which would be re-read at regular intervals.
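A minimal sketch of such a rule file and its periodic re-read, assuming a JSON file that maps a subsystem name to the state values that together mean “intentional”. The file layout, the class name `IntentionalRules`, and the state keys are assumptions for illustration; only the ECS rule itself (breaker = OPEN and mode = LOCAL) comes from the slide.

```python
import json
import os


class IntentionalRules:
    """Rules read from a JSON file and reloaded when the file changes.

    Assumed file content, for example:
        {"ecs": {"breaker": "OPEN", "mode": "LOCAL"}}
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime_ns = None
        self._rules = {}

    def refresh(self) -> bool:
        """Re-read the file if its modification time changed.

        Intended to be called at regular intervals. Returns True when the
        rules were (re)loaded.
        """
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        with open(self.path, encoding="utf-8") as handle:
            self._rules = json.load(handle)
        self._mtime_ns = mtime_ns
        return True

    def is_intentional(self, subsystem: str, state: dict) -> bool:
        """True only if every condition listed for the subsystem matches."""
        conditions = self._rules.get(subsystem)
        if not conditions:
            return False  # no rule defined: never intentional
        return all(state.get(key) == want for key, want in conditions.items())
```

A subsystem with no rule is never reported as intentional, so a missing or empty file fails toward raising alarms, not toward hiding them.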


ECS as an Example 1

ecs.severity = 1 (FACSUM)
ecs.mv.severity = 2 (ECSGUI eyebrow)
ecs.mv.heatExchangers.severity = 2 (ECSGUI breadcrumb)
ecs.mv.heatExchangers.hx0401.severity = 2 (AH & ECSGUI)
ecs.mv.heatExchangers.hx0402.severity = 2 (AH & ECSGUI)
ecs.mv.heatExchangers.hx0403.severity = 4 (AH & ECSGUI)
ecs.mv.heatExchangers.hx0404.severity = 4 (AH & ECSGUI)
ecs.mv.NCValves.severity = 4 (ECSGUI breadcrumb)
…
ecs.dampers.severity = 1 (ECSGUI eyebrow)
ecs.dampers.dp0405.severity = 1 (AH & ECSGUI)
ecs.dampers.dp0406.severity = 4 (AH & ECSGUI)
ecs.dampers.dp0407.severity = 2 (AH & ECSGUI)
ecs.dampers.dp0408.severity = 3 (ECSGUI)


ECSGUI Eyebrow and Breadcrumbs

[Figure: ECSGUI display with the eyebrow and the breadcrumb labeled.]

PE0412 is the device. It is the “leaf” in the system.


ECS as an Example 2

ecs.severity = 4 (FACSUM)
ecs.mv.severity = 3 (ECSGUI eyebrow)
ecs.mv.heatExchangers.severity = 3 (ECSGUI breadcrumb)
ecs.mv.heatExchangers.hx0401.severity = 3 (ECSGUI)
ecs.mv.heatExchangers.hx0402.severity = 4 (AH & ECSGUI)
ecs.mv.heatExchangers.hx0403.severity = 4 (AH & ECSGUI)
ecs.mv.heatExchangers.hx0404.severity = 4 (AH & ECSGUI)
ecs.mv.NCValves.severity = 4 (ECSGUI breadcrumb)
…
ecs.dampers.severity = 4 (ECSGUI eyebrow)
ecs.dampers.dp0405.severity = 4 (AH & ECSGUI)
ecs.dampers.dp0406.severity = 4 (AH & ECSGUI)
ecs.dampers.dp0407.severity = 4 (AH & ECSGUI)
ecs.dampers.dp0408.severity = 4 (AH & ECSGUI)
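The slides do not state the roll-up rule, but both ECS examples are consistent with the following reading: inside the ECSGUI a parent takes the lowest-numbered (worst) severity among its children, and at the FACSUM level an intentional (3) result is shown as OK (4). The Python sketch below encodes that inferred rule for severities 1 through 4 only; `rollup` and `facsum` are hypothetical names, and NCValves is treated as a leaf because its children are elided in the examples.

```python
from typing import Union

# A node is either a leaf severity (1-4) or a dict of named children.
Node = Union[int, dict]


def rollup(node: Node) -> int:
    """Worst severity under a node: 1 error < 2 warning < 3 intentional < 4 OK."""
    if isinstance(node, int):
        return node
    return min(rollup(child) for child in node.values())


def facsum(tree: Node) -> int:
    """Top-level summary: intentional (3) is presented as OK (4)."""
    worst = rollup(tree)
    return 4 if worst == 3 else worst


# Leaf values copied from "ECS as an Example 2".
EXAMPLE_2 = {
    "mv": {
        "heatExchangers": {"hx0401": 3, "hx0402": 4, "hx0403": 4, "hx0404": 4},
        "NCValves": 4,
    },
    "dampers": {"dp0405": 4, "dp0406": 4, "dp0407": 4, "dp0408": 4},
}

if __name__ == "__main__":
    print(rollup(EXAMPLE_2["mv"]))       # eyebrow severity for mv
    print(rollup(EXAMPLE_2["dampers"]))  # eyebrow severity for dampers
    print(facsum(EXAMPLE_2))             # FACSUM severity for ecs
```

With the Example 2 leaves this gives 3 for the mv eyebrow, 4 for the dampers eyebrow, and 4 for FACSUM, matching the values listed above.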


Reconcile Events with Severity Codes

This scheme accommodates binary data (alarm and not alarm) very nicely. Alarm stands for error/warning in AH jargon. In a few slides there is more discussion about this.

What about analog data and limit checking? Should the checking be done in the TCS or the AH? It could depend on the type of limit checking which must be done for any particular value.


Analog Data and Limits

Analog Data Dictionary items can have:

Static limits: These can be defined and used from the XML definition (e.g., 0 ≤ RA < 24). This may cover the majority of the cases.

Dynamic limits: These would override the values in the XML. They can be defined in a TCS configuration file which is read at regular intervals. This means someone could change the limits without re-compilation or re-start, just as is currently done for some subsystems' configuration files (e.g., new characteristics for a new device). If these values are in a configuration file, there could be a way to feed this into the AH.

Complex limits: These are values which require some computation to be performed or accommodation of values inserted into the system. This potentially requires significant data from the TCS, so the computation/knowledge should be kept in the TCS.
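The static and dynamic cases can be sketched as a default table plus an override table. In this Python sketch the `Limits` class, the item name `"ra"`, and the shape of the override dictionary are assumptions; in practice the static values would come from the XML definitions and the overrides from the TCS configuration file. Complex limits are deliberately left out, since they need TCS-side computation.

```python
from dataclasses import dataclass, replace
from typing import Dict, Mapping


@dataclass(frozen=True)
class Limits:
    low: float
    high: float
    low_inclusive: bool = True
    high_inclusive: bool = True

    def contains(self, value: float) -> bool:
        """Check a value against the limits, honoring open or closed ends."""
        above = value >= self.low if self.low_inclusive else value > self.low
        below = value <= self.high if self.high_inclusive else value < self.high
        return above and below


# Static limit from the slide: 0 <= RA < 24 (item name "ra" is assumed).
STATIC_LIMITS: Dict[str, Limits] = {
    "ra": Limits(0.0, 24.0, high_inclusive=False),
}


def effective_limits(name: str,
                     static: Mapping[str, Limits],
                     dynamic: Mapping[str, dict]) -> Limits:
    """Dynamic overrides replace individual fields of the static limits."""
    return replace(static[name], **dynamic.get(name, {}))
```

For example, `effective_limits("ra", STATIC_LIMITS, {"ra": {"high": 12.0}})` keeps the static lower bound and the open upper end, and changes only the upper value.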


Analog Data and Limits

If all the limit checking were done by the TCS subsystems:

There would be no need for synchronization between TCS and AH in terms of limits, as we would be keeping all the intelligence in one place.
We would not be fully using functionality available in AH.
TCS would only send error (1) and warning (2) flags to AH, not only for binary data but also for analog data major (HIHI, LOLO) and minor (HIGH, LOW) alarms. We would not be using the HIHI and HIGH fields.

  HIHI = >HIGH
  HIGH = >2
  LOW = 2 (minor) (LOLO < value ≤ LOW)
  LOLO = 1 (major) (value ≤ LOLO)
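A sketch of this first option in Python, under stated assumptions. `severity_flag` is a hypothetical TCS-side check that reduces an analog value to the error (1), warning (2), or OK (4) flag; the low-side comparisons follow the slide, while the high-side comparisons (≥) are assumed by symmetry. `ah_alarm_from_flag` then shows how an AH configured with LOLO = 1 and LOW = 2 would read that flag; the returned alarm names are placeholders, not AH vocabulary from the slides.

```python
def severity_flag(value: float, lolo: float, low: float,
                  high: float, hihi: float) -> int:
    """TCS-side limit check returning the flag sent to the AH.

    1 = error (major), 2 = warning (minor), 4 = OK.
    """
    if value <= lolo or value >= hihi:   # high side assumed symmetric
        return 1
    if value <= low or value >= high:
        return 2
    return 4


def ah_alarm_from_flag(flag: int, lolo: int = 1, low: int = 2) -> str:
    """AH-side reading of the flag using only the LOLO and LOW fields."""
    if flag <= lolo:
        return "MAJOR"      # value <= LOLO
    if flag <= low:
        return "MINOR"      # LOLO < value <= LOW
    return "NO_ALARM"       # HIGH and HIHI fields are not used


if __name__ == "__main__":
    for reading in (0.0, 1.5, 5.0, 9.0, 10.0):
        flag = severity_flag(reading, lolo=0.0, low=2.0, high=8.0, hihi=10.0)
        print(reading, flag, ah_alarm_from_flag(flag))
```

The AH never sees the analog limits in this arrangement, which is what removes the need to synchronize limits between the TCS and the AH.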

An alternative is that we allow static checking to be done on our analog values by the AH, and keep the dynamic/complex limit checking in TCS.


Hot Off the Presses

Explore using our event system and the DDS ability to export events to supply the AH, either for a certain class of problems or if implementing the “severity flag” concept is going to take too long.
