Transcript of a presentation on "LCCI (Large-scale Complex Critical Infrastructures)" — Object Management Group
LCCI (Large-scale Complex Critical Infrastructures)
• LCCIs are Internet-scale constellations of heterogeneous systems glued together into a federated, open system by a data distribution middleware.
• The shift towards the Internet is considered a necessary step to overcome the limitations of the monolithic, closed architectures traditionally used to build critical systems (e.g., SCADA architectures).
• A real-world example is the novel framework for Air Traffic Management (ATM) that EUROCONTROL is developing within the SESAR EU Joint Undertaking.
• New challenges arise from LCCIs that push the frontiers of current technologies.
• The data distribution task becomes crucial and has to be:
  • Reliable: deliveries have to be guaranteed despite the failures that may happen;
  • Timely: messages must reach their destinations at the right time, without breaking temporal constraints;
  • Scalable: performance is affected neither by time nor by the LCCI size.
• The challenge is to find the best data distribution paradigm able to meet the aforementioned requirements.
Outline of SWIM concept
• SWIM (System Wide Information Management) aims to establish seamless interoperability among heterogeneous ATM stakeholders:
  • a common data representation;
  • a coherent view of current ATM information (e.g., Flight Data, Aeronautical Data, Weather).
• It may be seen as a common data/service bus to which the systems that have to interoperate are "connected".
• It is close in spirit to a middleware solution for LCCIs.
• The prototype (named "SWIM-BOX") has been conceived as a sort of "Gateway/Mediator" across legacy applications:
  • completely distributed architecture;
  • designed using a domain-based approach (Flight, Surveillance, etc.);
  • implemented using a standards-based approach;
  • well-known data and information models (e.g., ICOG2);
  • standard technologies (Web Services, EJB, DDS);
  • DDS-compliant middleware for sharing data.
SWIM prototype
[Architecture diagram: two legacy sites, each hosting a Legacy application (Legacy A, Legacy B) connected through an Adapter to a SWIM-BOX; the SWIM-BOXes communicate over the SWIM Network on top of a Common Infrastructure.]
• How do the subsystems (e.g., COTS components) involved in an LCCI impact its dependability?
• What are the effects on an LCCI if the DDS-compliant middleware is invoked with erroneous inputs?
• Robustness testing provides answers to these questions:
  • it helps vendors evaluate their implementations;
  • it helps clients select among several solutions.
• Test cost reduction → automating the test procedure.
• Automating the classification of test results.
Some challenges
Our goal
• Assessing the robustness of DDS-compliant middleware.
• What does robustness mean?
• Robustness testing features:
  • only the system interface has to be known;
  • source code is not needed (black-box approach);
  • exceptional inputs are injected through the API;
  • internal data and structures are not altered;
  • inputs and stressful conditions are selected carefully, so as to activate faults representative of actual situations.
“Dependability with respect to external faults, which characterizes a system reaction to a specific class of faults” [Avizienis 04].
"The degree to which a system operates correctly in the presence of exceptional inputs or stressful environmental conditions" [IEEE Std 610.12-1990].
• Robustness testing: stressing the public interface of the application/system/API with invalid and exceptional values:
  • from the Application to the System Under Test (top-down);
  • from the OS to the System Under Test (bottom-up).
Robustness Testing Approaches
[Diagram: stack of Application, DDS Middleware and Operating System; top-down tests call the API with exceptional values, while bottom-up tests make OS syscalls return exceptional values.]
• The workload is a set of valid calls; it is needed to stress each operation of the device under test.
• The fault model is a set of rules applied at the API to expose robustness problems.
• The failure mode classification characterizes the behavior of the system under test while executing the workload in the presence of the fault model.
Injection library
Fault Injection: the W-W-W dilemma (What, Where, When)
• What to inject?
  • Fault model → Fault list.
• Where to inject?
  • At the API interface level;
  • methods with the highest occurrences (Method list).
• When to inject?
  • At only one invocation of the methods (Trigger list).
• The Fault, Method and Trigger lists define our Injection library.
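The combination of the three lists can be sketched as follows; the class and record names are illustrative, not JFIT's actual API, and they simply enumerate one experiment per (method, fault value, trigger) triple:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how the Fault, Method and Trigger lists combine into the
// Injection library: one experiment per (method, fault, trigger) triple.
// Names are hypothetical, not taken from the real tool.
public class InjectionLibrary {
    public record Experiment(String method, Object faultValue, int triggerInvocation) {}

    public static List<Experiment> build(List<String> methods,
                                         List<Object> faults,
                                         List<Integer> triggers) {
        List<Experiment> lib = new ArrayList<>();
        for (String m : methods)
            for (Object f : faults)
                for (int t : triggers)
                    lib.add(new Experiment(m, f, t));
        return lib;
    }
}
```

With two methods, two fault values and one trigger, the library holds four experiments, each executed in its own run.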
Fault list
• The list of rules applied during the API invocation:
  • each method input is tested with all the robustness values, one at a time;
  • e.g., void replace(int a, String b).
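A minimal sketch of such a fault list is shown below; the specific values are hypothetical, modeled on the boundary values commonly used in robustness-testing suites, and the class is not part of the real tool:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical fault list: exceptional values substituted, one at a time,
// for each parameter of the method under test.
public class FaultList {
    public static final List<Integer> INT_FAULTS = List.of(
            0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE);
    public static final List<String> STRING_FAULTS = Arrays.asList(
            null, "", " ", "\0", "x".repeat(65536)); // null and a very large string

    // For a signature like replace(int a, String b), each parameter is
    // replaced in turn, so the experiment count is the sum of the list sizes.
    public static int experimentsFor(int intParams, int stringParams) {
        return intParams * INT_FAULTS.size() + stringParams * STRING_FAULTS.size();
    }
}
```

For void replace(int a, String b) this yields one experiment per robustness value per parameter, keeping the "one fault at a time" rule.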
Method list
• Profiling of different applications using the DDS-compliant middleware product:
  • Ping-pong application;
  • Touchstone: a benchmarking framework for evaluating the performance of OMG DDS-compliant implementations;
  • SWIM-BOX.
• The method occurrences have been measured for each application:
  • only a limited core set of all the available methods is invoked;
  • the same occurrence distribution is observed across all the applications.
• The Method list comprises the methods with the highest occurrences.
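Occurrence counting of this kind can be sketched with a dynamic proxy that tallies invocations; the Writer interface below is a stand-in for the middleware API, not the real DDS one:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Profiling sketch: wrap a middleware-like API in a dynamic proxy that
// counts how often each method is invoked, to derive the "Method list".
// The Writer interface is illustrative, not the actual DDS API.
public class MethodProfiler {
    public interface Writer { void write(String sample); void dispose(String sample); }

    public static final Map<String, Integer> COUNTS = new ConcurrentHashMap<>();

    public static Writer profiled(Writer target) {
        InvocationHandler h = (proxy, method, args) -> {
            COUNTS.merge(method.getName(), 1, Integer::sum); // tally the call
            return method.invoke(target, args);
        };
        return (Writer) Proxy.newProxyInstance(
                Writer.class.getClassLoader(), new Class<?>[]{Writer.class}, h);
    }
}
```

Running the workloads against the proxied interface and sorting COUNTS would surface the small core of frequently invoked methods described above.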
• The CRASH scale has been utilized to classify the robustness problems:
  • Catastrophic: the node crashes or the OS hangs; the DDS provider does not deliver messages correctly.
  • Restart: the DDS provider becomes unresponsive and must be terminated by force.
  • Abort: abnormal termination when invoking the API.
  • Silent: the faulty submitted value does not raise any exception, regardless of whether the message is transmitted.
  • Hindering: the returned error code is incorrect.
• A further suitable level has been added:
  • Non-conformity: the fault is not indicated as it should be.
• An analysis of the DDS API has been performed for results classification.
• A golden run has been executed for each injected value to understand the system behavior.
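The extended CRASH scale above can be captured as a small classifier; the symptom flags and their priority order are our reading of the scale, not the actual tool code:

```java
// Sketch of the extended CRASH failure-mode scale as an enum plus a
// classifier that checks the most severe symptoms first. The flags are
// hypothetical observables, not real monitor outputs.
public class CrashScale {
    public enum Mode {
        CATASTROPHIC,   // node crash / OS hang
        RESTART,        // provider unresponsive, terminated by force
        ABORT,          // abnormal termination when invoking the API
        SILENT,         // faulty value accepted without any exception
        HINDERING,      // incorrect error code returned
        NON_CONFORMITY, // fault not indicated as it should be
        PASS
    }

    public static Mode classify(boolean osCrashedOrHung, boolean unresponsive,
                                boolean abnormalTermination,
                                boolean exceptionRaised, boolean wrongErrorCode) {
        if (osCrashedOrHung) return Mode.CATASTROPHIC;
        if (unresponsive) return Mode.RESTART;
        if (abnormalTermination) return Mode.ABORT;
        if (!exceptionRaised) return Mode.SILENT;
        if (wrongErrorCode) return Mode.HINDERING;
        return Mode.PASS;
    }
}
```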
Failure mode classification
Test automation: JFault Injection Tool (JFIT)
• Pros:
  • Java-based implementation;
  • no knowledge about the SUT is required;
  • run-time method interception and value mutation, exploiting Java reflection;
  • monitoring of the status and output of the SUT.
• Cons:
  • only methods with primitive-like parameter types (e.g., String, int, …) are taken into account;
  • off-line, manual classification of the results.
High-level architecture of JFIT
• All robustness tests are carried out according to the Injection library;
• the Controller is in charge of test management and runs the tests through the Activator;
• the Interceptor catches the method invocations directed at the SUT and injects, via the Injector, the faults one at a time;
• the Monitor records the output at the Pub and Sub sides.
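Run-time interception and value mutation of this kind can be sketched with a reflection-based proxy that replaces one argument on one chosen invocation; the interface and names below are illustrative, not JFIT's real classes:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Sketch of the Interceptor/Injector pair: forward every call to the real
// target, but on the trigger-th invocation of the chosen method replace the
// first argument with the fault value ("one fault at a time").
// The Writer interface is a hypothetical stand-in for the SUT's API.
public class Interceptor {
    public interface Writer { String write(String sample); }

    public static Writer inject(Writer target, String methodName,
                                Object faultValue, int trigger) {
        final int[] calls = {0};
        InvocationHandler h = (proxy, m, args) -> {
            if (m.getName().equals(methodName) && ++calls[0] == trigger) {
                args[0] = faultValue; // mutate the input value at injection time
            }
            return m.invoke(target, args);
        };
        return (Writer) Proxy.newProxyInstance(
                Writer.class.getClassLoader(), new Class<?>[]{Writer.class}, h);
    }
}
```

Because the proxy is built purely from the interface, no knowledge of the SUT's internals (and no source code) is needed, matching the black-box approach above.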
[Diagram: the Controller drives the Activator; the Interceptor, together with the Injector, sits between the Activator and the System Under Test; Monitors observe the Systems Under Test.]
Test execution stages
• Preliminary execution of the workload without faults:
  • to understand the normal behavior.
• Then robustness testing starts.
Stages: DDS initialization → Workload execution → Injection phase → Monitoring & Logging.
Golden run: no faults are injected. Injection runs: one fault at a time.
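The stages above can be sketched as a harness loop: one fault-free golden run establishes the reference behavior, then the workload is re-run once per fault value. DDS initialization and monitoring are abstracted behind a hypothetical Workload hook:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the execution stages: a golden run with no fault injected,
// followed by one workload execution per fault value. The Workload
// interface is a hypothetical hook, not part of the real harness.
public class Harness {
    public interface Workload { String run(Object injectedFault); } // null = no fault

    public static List<String> execute(Workload workload, List<Object> faults) {
        List<String> log = new ArrayList<>();
        log.add("golden:" + workload.run(null));  // golden run, no fault injected
        for (Object f : faults)                   // one fault at a time
            log.add("fault:" + workload.run(f));
        return log;
    }
}
```

Comparing each faulted run's log entry against the golden-run entry is what allows the failure modes to be classified afterwards.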
Test Results
• DDS middleware: OpenSplice® implementation;
• no QoS features have been defined (Best Effort).
• According to the failure mode classification, the achieved results are as follows:
  • no Catastrophic, Abort or Hindering problems have been evidenced:
    • neither node crashes nor OS hangs;
    • no abnormal termination when invoking the API;
    • no erroneous returned error codes.
  • 13% of the robustness tests have shown Restart problems:
    • the experiment does not respond and must be terminated by force.
  • 45% of the robustness tests have raised Silent problems:
    • no exception has been thrown by the DDS middleware.
Test Results
• Fault distribution between Silent and Restart failures.
[Charts: distribution of int fault types and String fault types across the Silent and Restart outcomes.]
Conclusions
• Our approach can automatically test the core set of DDS methods;
• a significant fraction of the tests shows robustness issues raised when exceptional values are submitted to the OpenSplice® APIs (e.g., large strings or big integers);
• the ability to reach a consistent system state before performing fault injection makes us confident of the results.
Ongoing activities
• Testing all parameter types, not only primitive ones;
• automating the classification of results;
• running the tests in the presence of quality-of-service mechanisms;
• carrying out the same tests with other DDS-compliant middleware.
References
[Avizienis 04] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr. "Basic Concepts and Taxonomy of Dependable and Secure Computing." IEEE Trans. on Dependable and Secure Computing, 2004.
[Koopman 02] P. Koopman. "What's Wrong With Fault Injection As A Benchmarking Tool?" In Proc. DSN 2002 Workshop on Dependability Benchmarking, pp. F-31–36, Washington, D.C., USA, 2002.
[Koopman 99] P. Koopman, J. DeVale. "Comparing the Robustness of POSIX Operating Systems." In Proc. of the 29th Annual International Symposium on Fault-Tolerant Computing, 1999.
[Johansson 07] A. Johansson, N. Suri, B. Murphy. "On the Selection of Error Models for OS Robustness Evaluation." In Proc. of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2007.
[Miller 95] B.P. Miller et al. "Fuzz Revisited: A Re-examination of the Reliability of UNIX Utilities." Technical report, 1995.
Test Scenario
Further details
• DDS middleware: OpenSplice® implementation
• No QoS features have been defined (Best Effort)
[Diagram: test scenario with a JFIT API interceptor and API injector on the transmitter side, and JFIT monitoring on both the transmitter and receiver sides.]
• A Receiver is waiting for messages.
• The Transmitter sends bursts of messages for a while, then terminates.
• Pub/Sub proves effective at federating heterogeneous systems:
  • space, time and synchronization decoupling enforce scalability;
  • asynchronous multi-point communication is well suited to devising cooperating systems.
• Among the plethora of Pub/Sub alternatives, DDS exhibits better performance, higher scalability and a larger set of offered QoS policies:
  • it is widely used in large-scope initiatives addressing wide-area scenarios;
  • e.g., it is being investigated as the data distribution system in the SESAR project through the SWIM middleware infrastructure.
[Diagram: Pub/Sub alternatives, including CORBA NS, JMS, SIENA, GREEN, JEDI, HERALD, DREAM and HERMES.]
Pub/Sub paradigm