Piotr Golonka, Manuel Gonzalez-Berges, Josef Hofer, Axel ...

1
Piotr Golonka, Manuel Gonzalez-Berges, Josef Hofer, Axel Voitier Piotr Golonka, Manuel Gonzalez-Berges, Josef Hofer, Axel Voitier CERN European Organization for Nuclear Research, Geneva, Switzerland CERN European Organization for Nuclear Research, Geneva, Switzerland Archiving architecture after re-design Archiving architecture after re-design Migration Migration Poster: MOPGF021 Track: “Control System Upgrades” Logging DB long-term data store Archiving architecture prior to re-design Archiving architecture prior to re-design Logging DB long-term data store XML Data Loader Single high-performance RDB Archive Manager writes data to central Oracle DB Data requiring long-term storage is transferred over a DBLink to the LoggingDB Standard database transfer job mechanism, used by other LoggingDB clients, serves all applications and provides diagnostic and statistical information Data stored in the LoggingDB can be made available in WinCC OA transparently Several WinCC OA Value Archive managers write historical data to local disk files Custom WinCCOA LoggingManager programs query data, transform to XML and send to a dedicated data loader application of the LoggingDB service The architecture would not scale to meet the new requirements for LHC Run II Index organized tables Index organized tables Row size reduction Row size reduction Column Type ELEMENT_ID NUMBER (25) TS TIMESTAMP(9) VALUE_NUMBER BINARY_DOUBE STATUS NUMBER (20) MANAGER NUMBER (20) USER_ NUMBER (5) SYS_ID NUMBER (20) BASE NUMBER (1) TEXT VARCHAR2 (4000) VALUE_STRING VARCHAR2 (4000) VALUE_TIMESTAMP TIMESTAMP(9) CORRVALUE_NUMBER BINARY_DOUBLE OLVALUE_NUMBER BINARY_DOUBLE CORRVALUE_STRING VARCHAR2 (4000) OLVALUE_STRING VARCHAR2 (4000) CORRVALUE_TIMESTAMP TIMESTAMP(9) OLVALUE_TIMESTAMP TIMESTAMP(9) Column Type ELEMENT_ID NUMBER (25) TS TIMESTAMP(9) VALUE_NUMBER BINARY_DOUBE BASE NUMBER (1) Typically NULL values 110 B (90B of data + 20 B indexes) Standard used schema: e.g. ATLAS 60 B Data and indexes merged together due to usage of IOT 30 B New schema with reduced column set, IOT and index compression Used in QPS Highly redundant Time based partitioning Time based partitioning I/O bottleneck in the database due to Oracle's REDU log size (data-recovery stream) Option to eliminate redundant/unnecessary data Data recovery performance test Data recovery performance test Motivation Motivation Conclusion Conclusion EVENTHISTORY_X EVENTHISTORY_0 EH_X <T n ;T n-1 > EH_X <T n-1 ;T n-2 > EH_X <T n-2 ;T n-3 > ... ... EH_1 <T 1 ;T 2 > EH_1 <T 2 ;T 3 > EH_1 <T 0 ;T 1 > EVENTHISTORY_0 EVENTHISTORY_1 ... EVENTHISTORY_X EVENTHISTORY EVENTHISTORY Key ingredient for optimization of readout path Access by index identified as the primary data-access pattern (transfers, trend plots) Query performance increased by a large factor Storage-space consumption reduced up to 50% Growing IOT leads to degraded query performance Time-based partitioning (partition pruning) Queries on views with pushed-predicate hints Data retention policy (3 months in daily partitions) Automated partition-managent job for 200 schemas WinCC OA System WinCC OA System WinCC OA System archive trend WinCC OA System WinCC OA System WinCC OA System Value Archive Value Archive Value Archive Logging Manager Logging Manager DB Link archive trend A challenging task, not only from a technical standpoint, but also from a coordination point of view. EVENTHISTORY_1 QPSHISTORY_1 EVENTHISTORY_2 QPSHISTORY_2 EVENTHISTORY_X QPSHISTORY_X INDEX INDEX HEAP ORGANIZED INDEX ORGANIZED Standard Schema IOT Schema Compressed Index 0 10 20 30 40 50 60 Disk Space Utilisation Disk Space [GB] Simulate a database-outage scenario in a set-up mimicking the LHC magnet protection system (QPS) WinCC OA Projects 50 Archived Signals 150 000 Data Input Rate [rows/s] 200 000 Outage duration [h] 8 Recovery Time [h] 2 Peak Recovery Rate [rows/s] 1 000 000 Migration completed in time for Run II of the LHC Single consistent archiving technology CERN-wide Satisfactory performance and reliability Optimisations available to all WinCC OA users JDBC HTTP Nominal Conditions Nominal Conditions Database Failure Database Failure Recovery Recovery Nominal Conditions Nominal Conditions Data buffered by RDB Archivers during DB outage Buffers inserted at maximal rate once DB becomes available Even in degraded mode (1 DB node) complete recovery in acceptable time. Logging Manager RDB Archive Manager Standard Schema IOT Schema Compressed Index 0 10 20 30 40 50 60 70 80 Query Time Query processing time [s] WinCC OA Projects Registered Signals [K] Transferred Rows [M/h] Databases in production SCADAR QPSR 140 48 2000 135 20 400 Migration 200 Systems and DB schemas Minimum downtime Human supervision Debugging and tuning RDB Archive Transfer Logic Eventhistory Eventhistory DB Schemas with data tables WinCC OA File Archiver: * Used by 200 WinCC OA systems for LHC and technical infrastructure + Stable, proven technology - Performance scalability issues - Complicated maintenance on many SCADA nodes WinCC OA Oracle RDB Archiver: + Already used in a few hundred controls systems at CERN + Proven scalability and performance for large systems + Centralized architecture eases management and handling of the SCADA nodes, provides better tools, and enables data analytics New requirement for the 13 TeV run of the LHC: the magnet protection system (QPS) needs data archived at a rate of 200.000 values/s

Transcript of Piotr Golonka, Manuel Gonzalez-Berges, Josef Hofer, Axel ...

Piotr Golonka, Manuel Gonzalez-Berges, Josef Hofer, Axel VoitierPiotr Golonka, Manuel Gonzalez-Berges, Josef Hofer, Axel VoitierCERN European Organization for Nuclear Research, Geneva, SwitzerlandCERN European Organization for Nuclear Research, Geneva, Switzerland

Archiving architecture after re-designArchiving architecture after re-design

MigrationMigration

Poster: MOPGF021Track: “Control System Upgrades”

Logging DB

long-termdata store

Archiving architecture prior to re-designArchiving architecture prior to re-design

Logging DB

long-termdata storeX

ML

Da

ta L

oa

der

● Single high-performance RDB Archive Manager writes data to central Oracle DB● Data requiring long-term storage is transferred over a DBLink to the LoggingDB● Standard database transfer job mechanism, used by other LoggingDB clients,

serves all applications and provides diagnostic and statistical information ● Data stored in the LoggingDB can be made available in WinCC OA transparently

● Several WinCC OA Value Archive managers write historical data to local disk files● Custom WinCCOA LoggingManager programs query data, transform to XML

and send to a dedicated data loader application of the LoggingDB service● The architecture would not scale to meet the new requirements for LHC Run II

Index organized tablesIndex organized tables Row size reductionRow size reductionColumn Type

ELEMENT_ID NUMBER (25)

TS TIMESTAMP(9)

VALUE_NUMBER BINARY_DOUBE

STATUS NUMBER (20)

MANAGER NUMBER (20)

USER_ NUMBER (5)

SYS_ID NUMBER (20)

BASE NUMBER (1)

TEXT VARCHAR2 (4000)

VALUE_STRING VARCHAR2 (4000)

VALUE_TIMESTAMP TIMESTAMP(9)

CORRVALUE_NUMBER BINARY_DOUBLE

OLVALUE_NUMBER BINARY_DOUBLE

CORRVALUE_STRING VARCHAR2 (4000)

OLVALUE_STRING VARCHAR2 (4000)

CORRVALUE_TIMESTAMP TIMESTAMP(9)

OLVALUE_TIMESTAMP TIMESTAMP(9)

Column Type

ELEMENT_ID NUMBER (25)

TS TIMESTAMP(9)

VALUE_NUMBER BINARY_DOUBE

BASE NUMBER (1)

Typically NULL values

110 B (90B of data + 20 B indexes) • Standard used schema: e.g. ATLAS

60 B • Data and indexes merged together due to

usage of IOT

30 B • New schema with reduced column set, IOT

and index compression • Used in QPS Highly redundant

Time based partitioningTime based partitioning

● I/O bottleneck in the database due to Oracle's REDU log size (data-recovery stream)

● Option to eliminate redundant/unnecessary data

Data recovery performance testData recovery performance test

MotivationMotivation

ConclusionConclusion

EVEN

THIS

TORY

_XEV

ENTH

ISTO

RY_0

EH_X <Tn;T

n-1>

EH_X <Tn-1

;Tn-2

>

EH_X <Tn-2

;Tn-3

>

...

...

EH_1 <T1;T

2>

EH_1 <T2;T

3>

EH_1 <T0;T

1>

EVENTHISTORY_0

EVENTHISTORY_1

...

EVENTHISTORY_X

EVENTHISTORY EVENTHISTORY

● Key ingredient for optimization of readout path● Access by index identified as the primary

data-access pattern (transfers, trend plots)● Query performance increased by a large factor● Storage-space consumption reduced up to 50%

● Growing IOT leads to degraded query performance● Time-based partitioning (partition pruning)● Queries on views with pushed-predicate hints

● Data retention policy (3 months in daily partitions)● Automated partition-managent job for 200 schemas

WinCC OA SystemWinCC OA SystemWinCC OA System

archive

trend

WinCC OA SystemWinCC OA SystemWinCC OA System

Value ArchiveValue ArchiveValue ArchiveLoggingManagerLoggingManager

DB Link

archive

trend

A challenging task, not only from a technical standpoint, but also from a coordination point of view.

EVENTHISTORY_1 QPSHISTORY_1

EVENTHISTORY_2 QPSHISTORY_2

EVENTHISTORY_X QPSHISTORY_X

INDE

X IN

DEX

HEAP ORGANIZED INDEX ORGANIZED

Standard SchemaIOT Schema

Compressed Index

0

10

20

30

40

50

60

Disk Space Utilisation

Dis

k S

pa

ce [G

B]

Simulate a database-outage scenario in a set-up mimicking the LHC magnet protection system (QPS)

WinCC OA Projects 50

Archived Signals 150 000

Data Input Rate [rows/s] 200 000

Outage duration [h] 8

Recovery Time [h] 2

Peak Recovery Rate [rows/s] 1 000 000

● Migration completed in time for Run II of the LHC● Single consistent archiving technology CERN-wide● Satisfactory performance and reliability● Optimisations available to all WinCC OA users

JDBCHTTP

Nominal ConditionsNominal Conditions

Database FailureDatabase Failure

RecoveryRecovery

Nominal ConditionsNominal Conditions

● Data buffered by RDB Archivers during DB outage

● Buffers inserted at maximal rate once DB becomes available

● Even in degraded mode (1 DB node) complete recovery in acceptable time.

LoggingManager

RDB Archive Manager

Standard SchemaIOT Schema

Compressed Index

01020304050607080

Query Time

Qu

ery

pro

ces

sin

g ti

me

[s]

WinCC OA Projects Registered Signals [K] Transferred Rows [M/h]

Databases in production

SCADAR QPSR

140

48

2000

135 20

400

Migration

200 Systemsand

DB schemas

Minimum downtime

Human supervision

Debugging and tuning

RDB Archive

Transfer Logic

EventhistoryEventhistory

DB Schemaswith

data tables

WinCC OA File Archiver:

* Used by 200 WinCC OA systems for LHC and technical infrastructure + Stable, proven technology - Performance scalability issues - Complicated maintenance on many SCADA nodes

WinCC OA Oracle RDB Archiver:

+ Already used in a few hundred controls systems at CERN+ Proven scalability and performance for large systems+ Centralized architecture eases management and handling of the SCADA nodes, provides better tools, and enables data analytics

New requirement for the 13 TeV run of the LHC: the magnet protection system (QPS) needs data archived at a rate of 200.000 values/s