
Dependability Benchmarking: Where Are We Standing?

Jean Arlat [[email protected]]

Benchmarking Expert Meeting — Stuttgart, Germany — September 19-20, 2007

2

197 Researchers and Faculty
• 80 CNRS, 101 Faculty

16 Visiting researchers

33 Post-docs

248 PhD students

110 Engineers, Technicians and Administrative staff
• 70 eng. & tech., 40 admin.

Four Research Clusters

• MINAS: Micro and Nano Systems

• MOCOSY: Modelling, Optimization & System Control

• RIA: Robotics and Artificial Intelligence

• SINC: Critical Information Systems

• Research unit of CNRS (French National Organization for Scientific Research)

• Associated with 3 academic institutions: UPS, INPT, INSA

(Map of France marking CNRS laboratory locations: Lille, Paris, Toulouse, Grenoble, and other cities)

3

Cluster on Critical Information Systems

SINC:
• TSF (18 + 18): Dependable Computing and Fault Tolerance
• OLC (29 + 26): Software Tools for Communication
• ISI (7 + 6): System Engineering and Integration

Other clusters:
• MOCOSY — DISCO: Qualitative Diagnosis & Supervisory Control; MRS: Modeling and Control of Networks and Signals
• RIA — RIS: Robotics and InteractionS
• MINAS — MIS: Microsystems and System Integration

4

DEPENDABLE COMPUTING AND FAULT TOLERANCE

5

DEPENDABLE COMPUTING AND FAULT TOLERANCE

Post-Doc
• Benjamin Lussier

PhD Students (15)
• Éric Alata • Amine Baina • Étienne Baudin • Ludovic Courtès • Ossama Hamouda • Youssef Laarouchi • Éric Lacombe • Caroline Lu • Minh Nguyen • Thomas Pareaud • Thomas Robert • Ana Elena Rugina • Manel Sghairi • Géraldine Vache • Piotr Zając

CNAM Doctorate
• Frédéric Sorbet

Senior Researchers (18)
• Jean Arlat (DR2 CNRS) (group leader)
• Agnan de Bonneval (MC UPS)
• Jacques Collet (DR2 CNRS)
• Alain Costes (Prof INPT - ENSEEIHT)
• Yves Crouzet (CR1 CNRS, HdR)
• Yves Deswarte (DR2 CNRS)
• Jean-Charles Fabre (Prof INPT - ENSEEIHT)
• Jérémie Guiochet (MC IUT)
• Mohamed Kaâniche (CR1 CNRS, HdR)
• Karama Kanoun (DR2 CNRS)
• Marc-Olivier Killijian (CR2 CNRS)
• Jean-Claude Laprie (DRCE1 CNRS)
• Vincent Nicomette (MC INSA)
• David Powell (DR1 CNRS)
• Nicolas Rivière (MC UPS)
• Matthieu Roy (CR2 CNRS)
• Pascale Thévenod (DR2 CNRS)
• Hélène Waeselynck (CR1 CNRS)

Research Engineer
• Christophe Zanon

6


Fault classes:
• Non-malicious interaction faults
• Physical faults
• Non-malicious design faults
• Malicious design faults
• Malicious interaction faults ("intrusions")

Dependable Computing Concerns
• Attributes: Availability, Safety, Confidentiality, Reliability, Integrity, Maintainability
• Security: Availability, Confidentiality, Integrity

Dependability Methods
• Prevention, Tolerance, Removal, Forecasting

7

Applications
• Energy
• Medicine and health
• Space
• Military
• Transportation
• Finance
• Critical Infrastructures
• Web services
• Telecommunications

Attributes: Availability, Safety, Confidentiality, Reliability, Integrity, Maintainability

8

Basic Concepts

Fault Classes
• Hardware Faults
• Software Faults
• Intrusions

Fault Prevention / Fault Removal
• Software Testing: Proof-based Testing, Testing of Mobile Systems, Testing of Aspect-Oriented Programs

Fault Forecasting
• Analytical: AADL-based Dependability Modeling, Evaluation of Security, Assessment of Dependencies in Critical Infrastructures
• Experimental: Measurement of Critical Execution Times, Characterization of Intrusions (Honeypots), Benchmarking wrt Accidental Faults and Intrusions

Fault Tolerance
• Algorithms & Architectures: Mobile Systems, Service Oriented Architectures, On-line Adaptation & Reflexive Computing, Multi-level Integrity, Safety-critical Autonomous Systems, Resilient Nano-architectures
• Security / Intrusion Tolerance: Security Policies, Access Control for Privacy, Distributed Authorization Schemes, Securing OTS OSs

A. Avižienis, J.-C. Laprie, B. Randell, C. Landwehr, "Basic Concepts and Taxonomy of Dependable and Secure Computing", IEEE Trans. on Dependable and Secure Computing (TDSC), vol. 1, no. 1, pp. 11-33, Jan.-March 2004.

9

ReSIST NoE — Resilience for Survivability in IST

Rationale: continuous complexity growth — large, networked, evolving applications running on open systems, fixed or mobile → Scalability of Dependability

(Reasonably) known: high dependability and security for safety-critical or availability-critical systems
• Avionics, railway signalling, nuclear control, etc.
• Transaction processing, back-end servers, etc.

Logic: beyond rigorous functional design, provision of Resilience for Survivability wrt accidental and malicious threats

Partners: QinetiQ, Eurécom, Vytautas Magnus U., Pisa U., IRIT, DeepBlue, Ulm U., Newcastle U., IRISA, Darmstadt U., Southampton U., Lisbon U., IBM Zurich, City U., Roma-La Sapienza U., LAAS-CNRS (Coord.), France Telecom R&D, Budapest U.

[http://www.resist-noe.org]

10

Dependability vs. Fault Tolerance Coverage

(State diagram: 2 active units → 1 active unit on a covered 1st failure; repair returns the system to 2 active units; an uncovered 1st failure, or a 2nd failure before repair, leads to System Failure.)

(Plot: MTTF System / MTTF Unit, from 1 to 10^4, versus MTTR / MTTF Unit, from 10^-4 to 10^-2, for coverage values c = .9, .95, .99, .995, .999, 1.)
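The curves summarized above can be approximated with a simple three-state Markov model of the duplex system in the state diagram. The model below is an assumption consistent with that diagram (unit failure rate `lam`, repair rate `mu`, coverage `c`), not code from the talk:

```python
# MTTF of a duplex system as a function of fault tolerance coverage c.
# States: 2 = both units active, 1 = one unit active (covered failure),
# F = system failure (absorbing).
#   from state 2: total rate 2*lam; covered (prob. c) -> state 1,
#                 not covered (prob. 1-c) -> F
#   from state 1: rate lam -> F (2nd failure), rate mu -> state 2 (repair)

def duplex_mttf(lam, mu, c):
    # Expected times to absorption T2, T1 satisfy:
    #   T2 = 1/(2*lam) + c*T1
    #   T1 = 1/(lam + mu) + (mu/(lam + mu))*T2
    denom = 1 - c * mu / (lam + mu)
    return (1/(2*lam) + c/(lam + mu)) / denom

lam = 1e-4   # unit failure rate (per hour), so MTTF_unit = 1e4 h
mu = 1e-1    # repair rate (per hour), so MTTR/MTTF_unit = 1e-3

for c in (0.9, 0.95, 0.99, 0.995, 0.999, 1.0):
    gain = duplex_mttf(lam, mu, c) * lam   # MTTF_system / MTTF_unit
    print(f"c = {c:5.3f}: MTTF_system / MTTF_unit = {gain:8.1f}")
```

Even a small coverage deficiency dominates: at MTTR/MTTF_unit = 1e-3, moving c from 1 to 0.9 costs roughly two orders of magnitude of system MTTF, which is the point the plot makes.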

11

From Fault Injection-based Experiments to Dependability Benchmarking

Robustness Assessment: Explicit Characterization of Faulty Behaviors and Failure Modes
—> Dependability Benchmarking Frame for "Off-The-Shelf" Software Executives
• Targets: Microkernels, OSs, Middleware
• Injection locations: Memory segments (code, data), API, DPI (Driver Programming Interface)

(Diagram: Target System with Input — Activity (Workload) and Faults (Faultload); internal Error Processing distinguishing Valid/Invalid behavior; Output — Readouts -> Measures.)

Experimental Validation of Fault-Tolerant Systems (FI techniques: pin-level, simulation, code mutation, SWIFI)
—> Estimation of the Efficiency (Coverage, Latency) of the Fault Tolerance Mechanisms
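The workload/faultload/readouts scheme can be illustrated with a toy software-implemented fault injection (SWIFI) loop that flips one input bit per experiment. The target routine and outcome classes below are hypothetical examples, not taken from the tools mentioned in the talk:

```python
import random

def target(x: int) -> int:
    # "Workload": a routine that checks its precondition and only
    # uses the low 16 bits of its input.
    if x < 0:
        raise ValueError("invalid input")
    return (x & 0xFFFF) * 2

def run_campaign(inputs, n_bits=32, seed=0):
    rng = random.Random(seed)
    readouts = {"no_effect": 0, "error_detected": 0, "wrong_output": 0}
    for x in inputs:
        bit = rng.randrange(n_bits)
        faulty = x ^ (1 << bit)                  # faultload: single bit flip
        if faulty >= 1 << (n_bits - 1):
            faulty -= 1 << n_bits                # reinterpret as signed word
        try:
            out = target(faulty)
        except ValueError:
            readouts["error_detected"] += 1      # fault detected and signalled
        else:
            if out == target(x):
                readouts["no_effect"] += 1       # flip hit an unused bit
            else:
                readouts["wrong_output"] += 1    # silent failure
    return readouts

print(run_campaign(range(1000)))
```

The readouts aggregate into the failure-mode proportions (detected vs. silent failures) that a robustness benchmark reports.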

12

Benchmarking Software Kernels

API = Application Programming Interface
HI = Hardware Interface
DPI = Device Programming Interface

13

Examples of Results

• Bit-flips into code segment (SYNC functional component)
• Bit-flips into inter-object messages
• System call parameter corruption at API
• Restart duration (MAFALDA)
• System call parameter corruption at DPI — network card drivers, two Linux releases
• No observed deficiencies (ROCADE)
• Linux vs. Windows

14

Dependability Benchmarking Framework

(Diagram: Activity (Workload) and Faults (Faultload) feed two paths.
Modeling path: Model -> Processing -> Analytical Measures.
Experimentation path: Prototype / Target system -> Readouts -> Processing -> Experimental Measures.
The two paths combine into the Benchmark Measures.)

Agreement/Contract: Representativeness, Reproducibility, Portability, Low-intrusiveness, Cost Effectiveness (Time, Money, Reward), Scalability, etc.
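On the experimentation path, turning raw readouts into benchmark measures typically means estimating outcome proportions together with a statistical confidence bound. A minimal sketch — the outcome labels and counts are illustrative, not results from the talk:

```python
from math import sqrt

def benchmark_measures(outcomes, z=1.96):
    """Proportion of experiments per failure mode, with a
    normal-approximation 95% confidence half-width."""
    n = len(outcomes)
    measures = {}
    for mode in set(outcomes):
        k = outcomes.count(mode)
        p = k / n
        half = z * sqrt(p * (1 - p) / n)   # CI half-width
        measures[mode] = (p, half)
    return measures

# Hypothetical readouts from a 1000-experiment campaign:
outcomes = (["no_effect"] * 620 + ["error_detected"] * 290 +
            ["crash"] * 60 + ["silent_failure"] * 30)

for mode, (p, half) in sorted(benchmark_measures(outcomes).items()):
    print(f"{mode:15s}: {100*p:5.1f}% +/- {100*half:.1f}%")
```

Reporting the half-width alongside each proportion makes reproducibility claims checkable, which matters for the agreement/contract properties listed above.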

15

Design and Evaluation of a FT Planner for Critical Robots*

Workload mimics possible activities of a space rover:
• take scientific photos
• communicate
• return to initial location
Four Missions (scheduling of a set of activities), Four Worlds (obstacles in the environment)

Faultload (SESAME tool): mutation of source code
• Substitution of numerical values, variables, language attribute values and operators
• Removal of constraint relations

Each experiment is executed 3 times (asynchrony in the computing layer) -> 48 exp. / mutation

Readouts and Measures:
• Subset of goals achieved
• Mission execution time
• Distance covered by robot

Assessment:
• Evaluate performance impact of the FT planner (e.g., model switching, etc.)
• Evaluate the efficacy of FTP

The "LAAS Architecture" — Fault Tolerant Planner: diversified planning, serial/parallel execution

* B. Lussier et al., DSN-2007
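Operator substitution, one of the mutation classes in the faultload above, can be sketched for a Python fragment. The fragment is a hypothetical example — SESAME itself applies such operators to other languages and to declarative planner models:

```python
import ast

class AddToSub(ast.NodeTransformer):
    """Create one mutant by replacing the first '+' with '-'."""
    def __init__(self):
        self.mutated = False

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if not self.mutated and isinstance(node.op, ast.Add):
            node.op = ast.Sub()    # operator substitution: + becomes -
            self.mutated = True
        return node

original = "def total(a, b):\n    return a + b\n"
tree = AddToSub().visit(ast.parse(original))
mutant = ast.unparse(ast.fix_missing_locations(tree))
print(mutant)   # the mutant returns a - b instead of a + b
```

Each mutant is then run against the test workload; a mutant whose behavior is never distinguished from the original is "not killed", pointing at a weakness in the fault tolerance mechanisms or in the workload itself.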

16

On Fault Tolerance Efficacy

Configurations: M1-M4 / W1-W3 (Robot1/2); M1-M4 / W1-W4 (Robot1)

39 mutants executed (over 3500 experiments; over 120 hours); 11 trivial mutants removed.
28 mutants considered (over 2500 experiments): 3 on attribute values, 6 on variables, 9 on numerical values, 4 on operators, 6 constraint removals.

Improvement (reduction in failure rates):

               % missions failed   % returns failed   % comms failed   % photos failed
Excluding W4         41%                 80%                70%              62%
Including W4         29%                 58%                64%              50%

17

Additional References and Useful Links

P. Koopman, J. DeVale, "Comparing the Robustness of POSIX Operating Systems", 29th IEEE Int. Symp. on Fault-Tolerant Computing (FTCS-29), (Madison, WI, USA), pp. 30-37, IEEE CS Press, 1999.

J. Arlat, J.-C. Fabre, M. Rodríguez, F. Salles, "Dependability of COTS Microkernel-Based Systems", IEEE Trans. on Computers, 51 (2), pp. 138-163, February 2002.

E. Marsden, J.-C. Fabre, J. Arlat, "Dependability of CORBA Systems: Service Characterization by Fault Injection", 21st IEEE Int. Symp. on Reliable Distributed Systems (SRDS-2002), (Osaka, Japan), pp. 276-285, IEEE CS Press, 2002.

K. Kanoun, H. Madeira, M. Dal Cin, F. Moreira, J. C. Ruiz Garcia, "DBench (Dependability Benchmarking)", 5th European Dependable Computing Conference (EDCC-5) - Project Track, (Budapest, Hungary), 2005. Also LAAS Report 05-197 -- http://www.laas.fr/Dbench.

K. Kanoun, Y. Crouzet, "Dependability Benchmarks for Operating Systems", Int. Journal of Performability Engineering, 2 (3), pp. 275-287, July 2006.

Y. Crouzet, H. Waeselynck, B. Lussier, D. Powell, "The SESAME Experience: From Assembly Languages to Declarative Models", 2nd Workshop on Mutation Analysis (Mutation 2006), (Raleigh, NC, USA), 2006 -- http://www.irisa.fr/manifestations/2006/Mutation2006/

A. Albinet, J. Arlat, J.-C. Fabre, "Robustness of the Device Driver-Kernel Interface: Application to the Linux Kernel", in Dependability Benchmarking (K. Kanoun, L. Spainhower, Eds.), IEEE CS Press, 2007 (to appear). Also LAAS Report 06-351.

B. Lussier, M. Gallien, J. Guiochet, F. Ingrand, M.-O. Killijian, D. Powell, "Experiments with Diversified Models for Fault-Tolerant Planning", 5th IARP/IEEE-RAS/EURON Workshop on Technical Challenges for Dependable Robots in Human Environments (IARP/DRHE'07), 2007.

B. Lussier, M. Gallien, J. Guiochet, F. Ingrand, M.-O. Killijian, D. Powell, "Fault Tolerant Planning for Critical Robots", 37th Annual IEEE/IFIP Int. Conf. on Dependable Systems and Networks (DSN'07), pp. 144-153, 2007.

B. Lussier, M. Gallien, J. Guiochet, F. Ingrand, M.-O. Killijian, D. Powell, "Planning with Diversified Models for Fault-Tolerant Robots", 17th Int. Conf. on Automated Planning and Scheduling, 2007.

IFIP WG 10.4 SIG on Dependability Benchmarking -- http://www.laas.fr/~kanoun/ifip_wg_10_4_sigdeb/index.html

18

Benchmarking Dependability / Robustness

(Diagram: the benchmarking scheme applied at two levels. Generic level: a Target System with Input — Activity (Workload) and Faults (Faultload); internal Error Handling distinguishing Valid/Invalid behavior; Output — Readouts -> Measures. Applied level: a Service Robot whose components include a Computer / Control System; activity comes from the Process / User; perturbations include Faults, Errors, Changes, and Disturbances.)

NB: This slide was sketched to illustrate the issues related to the level at which benchmarking is applied.

It was briefly presented during the discussion on Sept. 20.