Transcript of a presentation on "LCCI (Large-scale Complex Critical Infrastructures)" — Object Management Group
LCCI (Large-scale Complex Critical Infrastructures)
• LCCIs are Internet-scale constellations of heterogeneous systems glued together into a federated, open system by a data distribution middleware.
• The shift towards the Internet is considered a necessary step to overcome the limitations of the monolithic, closed architectures traditionally used to build critical systems (e.g., SCADA architectures).
• A real-world example is the novel framework for Air Traffic Management (ATM) that EUROCONTROL is developing within the SESAR EU Joint Undertaking.
• New challenges arise from LCCIs that push the frontiers of current technologies.
• The data distribution task becomes crucial and has to be:
  • Reliable: deliveries have to be guaranteed despite the failures that may happen;
  • Timely: messages must reach their destinations at the right time, without breaking temporal constraints;
  • Scalable: performance is affected neither by time nor by the LCCI size.
• The challenge is to find the best data distribution paradigm able to meet the aforementioned requirements.
Outline of SWIM concept
• SWIM (System Wide Information Management) aims to establish seamless interoperability among heterogeneous ATM stakeholders:
  • a common data representation;
  • a coherent view of current ATM information (e.g., Flight Data, Aeronautical Data, Weather).
• It may be seen as a common data/service bus to which the systems that have to interoperate are "connected".
• It is close in spirit to a middleware solution for LCCIs.
• The prototype (named "SWIM-BOX") has been conceived as a sort of "Gateway/Mediator" across legacy applications:
  • completely distributed architecture;
  • designed using a domain-based approach (Flight, Surveillance, etc.);
  • implemented using a standards-based approach;
  • well-known data and information models (e.g., ICOG2);
  • standard technologies (Web Services, EJB, DDS);
  • DDS-compliant middleware for sharing data.
SWIM prototype
[Architecture diagram: two legacy sites, each hosting a Legacy application (Legacy A, Legacy B) connected through an Adapter to a SWIM-BOX; the SWIM-BOXes communicate over the SWIM Network on top of a Common Infrastructure.]
• How do the subsystems (e.g., COTS components) involved in an LCCI impact its dependability?
• What are the effects on an LCCI if the DDS-compliant middleware is invoked with erroneous inputs?
• Robustness testing provides answers to these questions:
  • it helps vendors evaluate their implementations;
  • it helps clients select among several solutions.
• Test cost reduction → automating the test procedure.
• Automating the classification of test results.
Some challenges
Our goal
• Assessing the robustness of DDS-compliant middleware.
• What does robustness mean?
• Robustness testing features:
  • only the system interface has to be known;
  • source code is not needed (black-box approach);
  • exceptional inputs are injected through the API;
  • internal data and structures are not altered;
  • inputs and stressful conditions are selected carefully, so as to activate faults representative of actual situations.
“Dependability with respect to external faults, which characterizes a system reaction to a specific class of faults” [Avizienis 04].
"The degree to which a system operates correctly in the presence of exceptional inputs or stressful environmental conditions" [IEEE Std 610.12-1990].
• Robustness testing: stressing the public interface of the application/system/API with invalid and exceptional values:
  • from the Application to the System Under Test (top-down);
  • from the OS to the System Under Test (bottom-up).
Robustness Testing Approaches
[Diagram: stack of Application, DDS Middleware and Operating System; top-down tests call the API with exceptional values, while bottom-up tests make OS syscalls return exceptional values.]
• The workload is a set of valid calls; it is needed to stress each operation of the device under test.
• The fault model is a set of rules applied at the API to expose robustness problems.
• The failure mode classification characterizes the behavior of the system under test while executing the workload in the presence of the fault model.
Injection library
Fault Injection: the W-W-W dilemma (What, Where, When)
• What to inject?
  • Fault model → Fault list.
• Where to inject?
  • At the API interface level;
  • methods with the highest occurrences (Method list).
• When to inject?
  • At only one invocation of the methods (Trigger list).
• The Fault, Method and Trigger lists define our Injection library.
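The combination of the three lists can be sketched as follows; the class and record names are illustrative, not JFIT's actual API, and they simply enumerate one experiment per (method, fault value, trigger) triple:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how the Fault, Method and Trigger lists combine into the
// Injection library: one experiment per (method, fault, trigger) triple.
// Names are hypothetical, not taken from the real tool.
public class InjectionLibrary {
    public record Experiment(String method, Object faultValue, int triggerInvocation) {}

    public static List<Experiment> build(List<String> methods,
                                         List<Object> faults,
                                         List<Integer> triggers) {
        List<Experiment> lib = new ArrayList<>();
        for (String m : methods)
            for (Object f : faults)
                for (int t : triggers)
                    lib.add(new Experiment(m, f, t));
        return lib;
    }
}
```

With two methods, two fault values and one trigger, the library holds four experiments, each executed in its own run.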
Fault list
• The list of rules applied during the API invocation:
  • each method input is tested with all the robustness values, one at a time;
  • e.g., void replace(int a, String b).
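A minimal sketch of such a fault list is shown below; the specific values are hypothetical, modeled on the boundary values commonly used in robustness-testing suites, and the class is not part of the real tool:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical fault list: exceptional values substituted, one at a time,
// for each parameter of the method under test.
public class FaultList {
    public static final List<Integer> INT_FAULTS = List.of(
            0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE);
    public static final List<String> STRING_FAULTS = Arrays.asList(
            null, "", " ", "\0", "x".repeat(65536)); // null and a very large string

    // For a signature like replace(int a, String b), each parameter is
    // replaced in turn, so the experiment count is the sum of the list sizes.
    public static int experimentsFor(int intParams, int stringParams) {
        return intParams * INT_FAULTS.size() + stringParams * STRING_FAULTS.size();
    }
}
```

For void replace(int a, String b) this yields one experiment per robustness value per parameter, keeping the "one fault at a time" rule.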
Method list
• Profiling of different applications using the DDS-compliant middleware product:
  • Ping-pong application;
  • Touchstone: a benchmarking framework for evaluating the performance of OMG DDS-compliant implementations;
  • SWIM-BOX.
• The method occurrences have been measured for each application:
  • only a limited core set of all the available methods is invoked;
  • the same occurrence distribution is observed across all the applications.
• The Method list comprises the methods with the highest occurrences.
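Occurrence counting of this kind can be sketched with a dynamic proxy that tallies invocations; the Writer interface below is a stand-in for the middleware API, not the real DDS one:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Profiling sketch: wrap a middleware-like API in a dynamic proxy that
// counts how often each method is invoked, to derive the "Method list".
// The Writer interface is illustrative, not the actual DDS API.
public class MethodProfiler {
    public interface Writer { void write(String sample); void dispose(String sample); }

    public static final Map<String, Integer> COUNTS = new ConcurrentHashMap<>();

    public static Writer profiled(Writer target) {
        InvocationHandler h = (proxy, method, args) -> {
            COUNTS.merge(method.getName(), 1, Integer::sum); // tally the call
            return method.invoke(target, args);
        };
        return (Writer) Proxy.newProxyInstance(
                Writer.class.getClassLoader(), new Class<?>[]{Writer.class}, h);
    }
}
```

Running the workloads against the proxied interface and sorting COUNTS would surface the small core of frequently invoked methods described above.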
• The CRASH scale has been utilized to classify the robustness problems:
  • Catastrophic: the node crashes or the OS hangs; the DDS provider does not deliver messages correctly.
  • Restart: the DDS provider becomes unresponsive and must be terminated by force.
  • Abort: abnormal termination when invoking the API.
  • Silent: the faulty submitted value does not raise any exception, regardless of whether the message is transmitted.
  • Hindering: the returned error code is incorrect.
• A further suitable level has been added:
  • Non-conformity: the fault is not indicated as it should be.
• An analysis of the DDS API has been performed for results classification.
• A golden run has been executed for each injected value to understand the system behavior.
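The extended CRASH scale above can be captured as a small classifier; the symptom flags and their priority order are our reading of the scale, not the actual tool code:

```java
// Sketch of the extended CRASH failure-mode scale as an enum plus a
// classifier that checks the most severe symptoms first. The flags are
// hypothetical observables, not real monitor outputs.
public class CrashScale {
    public enum Mode {
        CATASTROPHIC,   // node crash / OS hang
        RESTART,        // provider unresponsive, terminated by force
        ABORT,          // abnormal termination when invoking the API
        SILENT,         // faulty value accepted without any exception
        HINDERING,      // incorrect error code returned
        NON_CONFORMITY, // fault not indicated as it should be
        PASS
    }

    public static Mode classify(boolean osCrashedOrHung, boolean unresponsive,
                                boolean abnormalTermination,
                                boolean exceptionRaised, boolean wrongErrorCode) {
        if (osCrashedOrHung) return Mode.CATASTROPHIC;
        if (unresponsive) return Mode.RESTART;
        if (abnormalTermination) return Mode.ABORT;
        if (!exceptionRaised) return Mode.SILENT;
        if (wrongErrorCode) return Mode.HINDERING;
        return Mode.PASS;
    }
}
```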
Failure mode classification
Test automation: JFault Injection Tool (JFIT)
• Pros:
  • Java-based implementation;
  • no knowledge about the SUT is required;
  • run-time method interception and value mutation, exploiting Java reflection;
  • monitoring of the status and output of the SUT.
• Cons:
  • only methods with primitive-like parameter types (e.g., String, int, …) are taken into account;
  • off-line, manual classification of the results.
High-level architecture of JFIT
• All robustness tests are carried out according to the Injection library;
• the Controller is in charge of test management and runs the tests through the Activator;
• the Interceptor catches the method invocations directed at the SUT and injects, via the Injector, the faults one at a time;
• the Monitor records the output at the Pub and Sub sides.
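Run-time interception and value mutation of this kind can be sketched with a reflection-based proxy that replaces one argument on one chosen invocation; the interface and names below are illustrative, not JFIT's real classes:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Sketch of the Interceptor/Injector pair: forward every call to the real
// target, but on the trigger-th invocation of the chosen method replace the
// first argument with the fault value ("one fault at a time").
// The Writer interface is a hypothetical stand-in for the SUT's API.
public class Interceptor {
    public interface Writer { String write(String sample); }

    public static Writer inject(Writer target, String methodName,
                                Object faultValue, int trigger) {
        final int[] calls = {0};
        InvocationHandler h = (proxy, m, args) -> {
            if (m.getName().equals(methodName) && ++calls[0] == trigger) {
                args[0] = faultValue; // mutate the input value at injection time
            }
            return m.invoke(target, args);
        };
        return (Writer) Proxy.newProxyInstance(
                Writer.class.getClassLoader(), new Class<?>[]{Writer.class}, h);
    }
}
```

Because the proxy is built purely from the interface, no knowledge of the SUT's internals (and no source code) is needed, matching the black-box approach above.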
[Diagram: the Controller drives the Activator; the Interceptor, together with the Injector, sits between the Activator and the System Under Test; Monitors observe the Systems Under Test.]
Test execution stages
• Preliminary execution of the workload without faults:
  • to understand the normal behavior.
• Then robustness testing starts.
Stages: DDS initialization → Workload execution → Injection phase → Monitoring & Logging.
Golden run: no faults are injected. Injection runs: one fault at a time.
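The stages above can be sketched as a harness loop: one fault-free golden run establishes the reference behavior, then the workload is re-run once per fault value. DDS initialization and monitoring are abstracted behind a hypothetical Workload hook:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the execution stages: a golden run with no fault injected,
// followed by one workload execution per fault value. The Workload
// interface is a hypothetical hook, not part of the real harness.
public class Harness {
    public interface Workload { String run(Object injectedFault); } // null = no fault

    public static List<String> execute(Workload workload, List<Object> faults) {
        List<String> log = new ArrayList<>();
        log.add("golden:" + workload.run(null));  // golden run, no fault injected
        for (Object f : faults)                   // one fault at a time
            log.add("fault:" + workload.run(f));
        return log;
    }
}
```

Comparing each faulted run's log entry against the golden-run entry is what allows the failure modes to be classified afterwards.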
Test Results
• DDS middleware: OpenSplice® implementation;
• no QoS features have been defined (Best Effort).
• According to the failure mode classification, the achieved results are as follows:
  • no Catastrophic, Abort or Hindering problems have been evidenced:
    • neither node crashes nor OS hangs;
    • no abnormal termination when invoking the API;
    • no erroneous returned error codes.
  • 13% of the robustness tests have shown Restart problems:
    • the experiment does not respond and must be terminated by force.
  • 45% of the robustness tests have raised Silent problems:
    • no exception has been thrown by the DDS middleware.
Test Results
• Fault distribution between Silent and Restart failures.
[Charts: distribution of int fault types and String fault types across the Silent and Restart outcomes.]
Conclusions
• Our approach can automatically test the core set of DDS methods;
• a significant fraction of the tests shows robustness issues raised when exceptional values are submitted to the OpenSplice® APIs (e.g., large strings or big integers);
• the ability to reach a consistent system state before performing fault injection makes us confident of the results.
Ongoing activities
• Testing all parameter types, not only primitive ones;
• automating the classification of results;
• running the tests in the presence of quality-of-service mechanisms;
• carrying out the same tests with other DDS-compliant middleware.
References
[Avizienis 04] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr. "Basic Concepts and Taxonomy of Dependable and Secure Computing." IEEE Trans. on Dependable and Secure Computing, 2004.
[Koopman 02] P. Koopman. "What's Wrong With Fault Injection As A Benchmarking Tool?" In Proc. DSN 2002 Workshop on Dependability Benchmarking, pp. F-31–36, Washington, D.C., USA, 2002.
[Koopman 99] P. Koopman, J. DeVale. "Comparing the Robustness of POSIX Operating Systems." In Proc. of the 29th Annual International Symposium on Fault-Tolerant Computing, 1999.
[Johansson 07] A. Johansson, N. Suri, B. Murphy. "On the Selection of Error Models for OS Robustness Evaluation." In Proc. of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2007.
[Miller 95] B.P. Miller et al. "Fuzz Revisited: A Re-examination of the Reliability of UNIX Utilities." Technical report, 1995.
Test Scenario
Further details
• DDS middleware: OpenSplice® implementation
• No QoS features have been defined (Best Effort)
[Diagram: test scenario with a JFIT API interceptor and API injector on the transmitter side, and JFIT monitoring on both the transmitter and receiver sides.]
• A Receiver is waiting for messages.
• The Transmitter sends bursts of messages for a while, then terminates.
• Pub/Sub proves effective at federating heterogeneous systems:
  • space, time and synchronization decoupling enforce scalability;
  • asynchronous multi-point communication is well suited to devising cooperating systems.
• Among the plethora of Pub/Sub alternatives, DDS exhibits better performance, higher scalability and a larger set of offered QoS policies:
  • it is widely used in large-scope initiatives addressing wide-area scenarios;
  • e.g., it is being investigated as the data distribution system in the SESAR project through the SWIM middleware infrastructure.
[Diagram: Pub/Sub alternatives, including CORBA NS, JMS, SIENA, GREEN, JEDI, HERALD, DREAM and HERMES.]
Pub/Sub paradigm