Developing Safety Critical Software: Fact and Fiction – John A McDermid


Page 1: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Developing Safety Critical Software: Fact and Fiction

John A McDermid

Page 2: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Overview

Fact – costs and distributions
Fiction – get the requirements right
Fiction – get the functionality right
Fiction – abstraction is the solution
Fiction – safety critical code must be “bug free”
Some key messages

Page 3: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Part 1

Fact – costs and distributions

Fiction – get the requirements right

Page 4: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Overview

Fact – costs and distributions
Fiction – get the requirements right
Fiction – get the functionality right
Fiction – abstraction is the solution
Fiction – safety critical code must be “bug free”
Some key messages

Page 5: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Costs and Distributions

Examples of industrial experience
– Specific example
– Some more general observations

Example covers
– Cost by phase
– Where errors are introduced
– Where errors are detected
and their relationships

Page 6: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Process Phases – from system specification, via software engineering, to system integration

[Pie chart: effort/cost by phase – System Specification 25%; System Integration 17%; Low Level Software Test 17%; Software Implementation 10%; Management 8%; Reviews and Inspections 8%; Software Integration Test 7%; Software Design 3%; Other Software 3%; Software Static Analysis 1%; Hardware/Software Integration 1%]

Page 7: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Error Introduction

[Chart: errors raised, by document traceability from user requirements and system requirements through to hardware and software; errors classified as FE, Min FE and No FE]

FE = Functional Effect; Min FE is typically a data change

Page 8: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Finding Requirements Errors

[Chart: errors raised against process phase – the phases on the pie chart, plus system validation]

Requirements testing tends to find requirements errors

Page 9: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Result - High Development Cost

[Chart: errors raised (FE, Min FE, No FE) by phase – errors are introduced here…]

Page 10: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Result - High Development Cost

[Chart: errors raised by phase, classified as requirement errors with functional effect, minor functional effect or no functional effect – errors introduced here… are not found until here]

Page 11: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Result - High Development Cost

[Chart: as before – errors raised by phase, classified as requirement errors with functional effect, minor functional effect or no functional effect; errors introduced here… are not found until here, even after following a safety critical development process]

Page 12: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Software and Money

Typical productivity
– 5 Lines of Code (LoC) per person day (about 1 kLoC per person year)
– Requirements to end of module test

Typical avionics “box”
– 100 kLoC
– 100 person years of effort
– Circa £10M for software, so £500M on a modern aircraft?
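A back-of-envelope sketch of this arithmetic, in Python (the 200 working days per year, the cost per person year, and the 50 boxes per aircraft are illustrative assumptions inferred from the slide's figures, not stated in the talk):

```python
# Back-of-envelope cost model for safety critical avionics software.
# Productivity and box size are the slide's illustrative figures; the
# working days per year and cost per person year are assumptions.

LOC_PER_PERSON_DAY = 5
WORKING_DAYS_PER_YEAR = 200           # assumed
COST_PER_PERSON_YEAR_GBP = 100_000    # implied by GBP 10M for 100 person years

def effort_person_years(kloc: float) -> float:
    """Person years needed for `kloc` thousand lines of code."""
    loc_per_year = LOC_PER_PERSON_DAY * WORKING_DAYS_PER_YEAR  # ~1 kLoC/year
    return kloc * 1000 / loc_per_year

def cost_gbp(kloc: float) -> float:
    return effort_person_years(kloc) * COST_PER_PERSON_YEAR_GBP

if __name__ == "__main__":
    box_kloc = 100  # a typical avionics "box"
    print(f"Effort: {effort_person_years(box_kloc):.0f} person years")
    print(f"Cost:   {cost_gbp(box_kloc)/1e6:.0f} M GBP per box")
    # 50 boxes at ~GBP 10M each gives the slide's ~GBP 500M per aircraft
    print(f"Aircraft with 50 such boxes: {50*cost_gbp(box_kloc)/1e6:.0f} M GBP")
```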

Page 13: Developing Safety Critical Software: Fact and Fiction John A McDermid.

US Aircraft Software Dependence

[Chart: % of functions performed by software against in-service year (1960-2000), rising steadily from the F4 (1960) through the A-7, F-111, F-15, F-16 and B-2 towards the F-22 (2000)]

DoD Defense Science Board Task Force on Defense Software, November 2000

Page 14: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Increasing Dependence

Software often determinant of function
Software operates autonomously
– Without opportunity for human intervention, e.g. Mercedes Brake Assist
Software affected by other changes
– e.g. new weapons fit on EuroFighter
Software has high levels of authority

Page 15: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Inappropriate CofG control in fuel system can reduce fatigue life of wings

Page 16: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Growing Dependency

Problem is growing
– Now about a third of aircraft development costs
– Increasing proportion of car development
Around 25% of capital cost of new cars in electronics
– Problem made more visible by rate of improvements in tools for “mainstream” software development

Page 17: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Growth of Airborne Software

[Chart: code size (kLoC, log scale, 1 to 100,000) against in-service date, for aircraft entering service from 1980 to 2014]

Approx £1.5B at current productivity and costs

Page 18: Developing Safety Critical Software: Fact and Fiction John A McDermid.

The Problem - Size matters

[Chart: size in function points (up to 12,000) against probability of the software project being cancelled (roughly 5% to 50%)]

Capers Jones, Becoming Best In Class, Software Productivity Research, 1995 briefing

1 function point = 80 SLOC of Ada; 1 function point = 128 SLOC of C
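As a rough illustration of the function point conversion (a sketch only; the 100 kLoC Ada example reuses the earlier avionics “box” figure, the rest is simple ratio arithmetic):

```python
# Rough conversion between function points and SLOC, using the
# slide's ratios (80 SLOC of Ada or 128 SLOC of C per function point).
SLOC_PER_FP = {"Ada": 80, "C": 128}

def function_points(sloc: int, language: str) -> float:
    return sloc / SLOC_PER_FP[language]

if __name__ == "__main__":
    # A typical 100 kLoC avionics "box" written in Ada:
    fp = function_points(100_000, "Ada")
    print(f"100 kLoC of Ada ~= {fp:.0f} function points")
    # The same functionality expressed in C would be roughly:
    print(f"Equivalent C size ~= {fp * SLOC_PER_FP['C'] / 1000:.0f} kLoC")
```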

Page 19: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Is Software Safety an Issue?

Software has a good track record
– A few high profile accidents
Therac 25
Ariane 501
Cali (strictly data, not software)
– Analysis of 1,100 “computer related deaths”
Only 34 attributed to software

Page 20: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Chinook - Mull of Kintyre

Was this caused by FADEC software?

Page 21: Developing Safety Critical Software: Fact and Fiction John A McDermid.

But Don’t be Complacent

Many instances of “pilot error” are system assisted
Software failures typically leave no trace
Increasing software complexity and authority
Can’t measure software safety (no agreement)
Unreliability of commercial software
Cost of safety critical software

Page 22: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Summary

Safety critical software a growing issue
– Software-based systems are dominant source of product differentiation
– Starting to become a major cost driver
– Starting to become the drive (drag) on product development
Can’t cancel, have to keep on spending!!!
– Not major contributor to fatal accidents
Although many incidents

Page 23: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Overview

Fact – costs and distributions
Fiction – get the requirements right
Fiction – get the functionality right
Fiction – abstraction is the solution
Fiction – safety critical code must be “bug free”
Some key messages

Page 24: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Requirements Fiction

Fiction stated
– Get the requirements right, and the development will be easy

Facts
– Getting requirements right is difficult
– Requirements are the biggest source of errors
– Requirements change
– Errors occur at organisational boundaries

Page 25: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Embedded Systems

Computer system embedded in a larger engineering system

Requirements come from
– “Flow down” from the system
– Design decisions (commitments)
– Safety and reliability analyses
Derived safety requirements (DSRs)
– Fault management/accommodation
As much as 80% for control applications

Page 26: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Almost Everything on One Picture

NB Based on Parnas’ four variable model

Page 27: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Almost Everything on One Picture

[Diagram: the platform and the control system & software, connected through sensors S1-S3 and actuator A1 at the control interface (IN and OUT). Physical decomposition of the system into sensors and actuators plus the controller. REQ is a restriction on NAT: control loops, high level modes, end to end response times, etc. SOFTREQ specifies what the control software must do, so that REQ = IN ∘ SOFTREQ ∘ OUT]
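A minimal sketch of the composition REQ = IN ∘ SOFTREQ ∘ OUT, treating the relations as plain functions for simplicity (the pressure scaling reuses the P0 example that appears later in the talk; names and values are illustrative, not from the diagram):

```python
# Sketch of the Parnas four-variable composition REQ = IN . SOFTREQ . OUT,
# with each relation simplified to a function. Values are illustrative.

def IN(monitored_pressure_psia: float) -> int:
    """Sensor/ADC model: monitored quantity -> input variable (raw counts)."""
    return round((monitored_pressure_psia - 2.0) / 0.055)

def SOFTREQ(raw_counts: int) -> int:
    """What the control software must do: input variable -> output variable."""
    return round((raw_counts * 0.055 + 2.0) / 0.00025)

def OUT(output_counts: int) -> float:
    """Output interface model: output variable -> controlled quantity (psia)."""
    return output_counts * 0.00025

def REQ(monitored_pressure_psia: float) -> float:
    """End-to-end requirement, obtained by composing IN, SOFTREQ and OUT."""
    return OUT(SOFTREQ(IN(monitored_pressure_psia)))

if __name__ == "__main__":
    print(REQ(14.7))   # close to 14.7, within quantisation error
```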

Page 28: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Almost Everything on One Picture

[Diagram: the same picture, refined. Within the control system & software, an input function (including signal validation) and an output function (including loop closing) sit either side of SPEC, above the HAL and the control interface. SOFTREQ is redefined to allow for digitisation noise, sensor management and actuator dynamics. Functional decomposition of the software, and mapping of control functions to a generic architecture, give SOFTREQ = I/P ∘ SPEC ∘ O/P]

Page 29: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Almost Everything on One Picture

[Diagram: the same picture, refined again. An application layer sits above the input and output functions, with data selection between them; the controller structure (physical decomposition of the controller) defines the FMAA structure]

Page 30: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Types of Layer

Some layers have design meaning
– Abstraction from computing hardware
Time in ms from a reference, or ...
– Not interrupts or bit patterns from clock hardware
– The “System” HAL
“Raw” sensed values, e.g. pressure in psia
– Not bit patterns from analogue to digital converters
– FMAA to Application
Validated values of platform properties
– May also have computational meaning
e.g. a call to the HAL forces a scheduling action

Page 31: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Commitments

Development proceeds via a series of commitments
– A design decision which can only be revoked at significant cost
– Often associated with an architectural decision or choice of component
Use of triplex redundancy, choice of pump, power supply, etc.
– Commitments can be functional or physical
Most common to make physical commitments

Page 32: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Derived Requirements

Commitments introduce derived requirements (DRs)
– Choice of pump gives DRs for control algorithm, iteration rate, also requirements for initialisation, etc.
– Also get derived safety requirements (DSRs), e.g. detection and management of sensor failure for safety

Page 33: Developing Safety Critical Software: Fact and Fiction John A McDermid.

System Level Requirements

Allocated requirements
– System level requirements which come from the platform
– May be a (slight) modification due to design commitments, e.g.
Platform – control engine thrust to within ± 0.5% of demanded
System – control EPR or N1 to within ± 0.5% of demanded

Page 34: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Stakeholder Requirements

Direct requirements from stakeholders, e.g.
– The radar shall be able to detect targets travelling up to Mach 2.5 at 200 nautical miles, with 98% probability
– In principle allocated from the platform
In practice often stated in system terms
– Need to distinguish legitimate requirements from “solutioneering”
Legitimacy depends on the stakeholder, e.g. CESG and cryptos

Page 35: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Requirements Types

Main requirements types
– Invariants, e.g.
Forward and reverse thrust will not be commanded at the same time
– Functional – transform inputs to outputs, e.g.
Thrust demand from thrust-lever resolver angle
– Event response – action on event, e.g.
Active ATP on passing signal at danger
– Non-functional (NFR) – constraints, e.g.
Timing, resource usage, availability

Page 36: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Changes to Types

Note requirements types can change – NFR to functional
– System – achieve < 10⁻⁵ per hour unsafe failures
– Software – detect failure modes x, y and z of the pressure sensor P30 with 99% coverage, and mitigate by …

Requirements notations/methods must be able to reflect requirements types

Page 37: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Requirements Challenges

Even if systems requirements are clear, software requirements
– Must deal with quantisation (sensors)
– Must deal with temporal constraints (iteration rates, jitter)
– Must deal with failures

Systems requirements often tricky
– Open-loop control under failure
– Incomplete understanding of physics

Page 38: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Requirements Errors

Project data suggests
– Typically more than 70% of errors found post unit test are requirements errors
– F22 (and other data sets) put requirements errors at 85%
– Finding errors drives change
The later they are found, the greater the cost
Some data, e.g. F22, suggest writing 3 LoC for every one delivered

Page 39: Developing Safety Critical Software: Fact and Fiction John A McDermid.

The Certainty of Change

Change mainly due to requirements errors
– high cost due to reverification in the presence of dependencies

[Chart: cumulative % change by module (0-300%). The majority of modules are stable; around 20% change heavily – may verify all code 3 times!]

Page 40: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Requirements and Organisations

Requirements errors are often based on misinterpretations (“it’s obvious that …”)
– Thus errors (more likely to) happen at organisational/cultural boundaries
Systems to software, safety to software …
– Study at NASA by Robyn Lutz
85% of requirements errors arose at organisational boundaries

Page 41: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Summary

Getting requirements right is a major challenge
– Software is deeply embedded
Discretisation, timing etc. an issue
– Physics not always understood

Requirements (genuinely) change
– Notion that one can get the requirements right is simplistic
Notion of “correct by construction” optimistic

Page 42: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Part 2

Fiction – get the functionality right
Fiction – abstraction is the solution
Fiction – safety critical code must be “bug free”
Some key messages

Page 43: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Overview

Fact – costs and distributions
Fiction – get the requirements right
Fiction – get the functionality right
Fiction – abstraction is the solution
Fiction – safety critical code must be “bug free”
Some key messages

Page 44: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Functionality Fiction

Fiction stated
– Get the functionality right, and the rest is easy

Facts
– Functionality doesn’t drive design
Non-Functional Requirements (NFRs) are critical
Functionality isn’t independent of NFRs
– Fault management is a major aspect of complexity

Page 45: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Functionality and Design

Functionality
– System functions allocated to software
– Elements of REQ which end up in SOFTREQ
NB, most of them
– At software level, requirements have to allow for properties of sensors, etc.

Consider an aero engine example

Page 46: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Engine Pressure Block

Page 47: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Engine Pressure Sensor

Aero engine measures P0
– Atmospheric pressure
– A key input to fuel control, etc.

Example input P0Sens
– Byte from A/D converter
– Resolution – 1 bit ≈ 0.055 psia
– Base = 2, 0 = low (high value ≈ 16)
– Update rate = 50 ms

Page 48: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Pressure Sensing Example

Simple requirement
– Provide validated P0 value to other functions and the aircraft

Output data item
– P0Val
16 bits
Resolution – 1 bit ≈ 0.00025 psia
Base = 0, 0 = low (high value ≈ 16.4)

Page 49: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Example Requirements

Simple functional requirement
– RS1: P0Val shall be provided within 0.03 bar of the sensed value
– R1: P0Val = P0Sens [± 0.03] (software level)
– Note: simple algorithm
P0Val = (P0Sens * 0.055 + 2) / 0.00025
P0Sens = 0 → P0Val = 8000 = 0001 1111 0100 0000 binary
P0Sens = 1111 1111 = 16.025 → P0Val = 64100 = 1111 1010 0110 0100 binary
– Does R1 meet RS1? Does the algorithm meet R1?
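A quick check of the algorithm’s arithmetic (a sketch; the scaling constants are those on the slide, the helper names are illustrative):

```python
# Check of the P0 scaling algorithm on the slide: raw 8-bit sensor counts
# (1 bit ~ 0.055 psia, base 2 psia) converted to 16-bit validated counts
# (1 bit ~ 0.00025 psia, base 0 psia).

def p0_sens_to_psia(counts: int) -> float:
    return counts * 0.055 + 2.0

def p0_val_counts(sens_counts: int) -> int:
    return round(p0_sens_to_psia(sens_counts) / 0.00025)

for sens in (0, 0xFF):
    val = p0_val_counts(sens)
    print(f"P0Sens={sens:3d} -> {p0_sens_to_psia(sens):.3f} psia -> "
          f"P0Val={val} = {val:016b}")
# P0Sens=  0 ->  2.000 psia -> P0Val=8000  = 0001111101000000
# P0Sens=255 -> 16.025 psia -> P0Val=64100 = 1111101001100100
```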

Page 50: Developing Safety Critical Software: Fact and Fiction John A McDermid.

A Non-Functional Requirement

Assume duplex sensors
– P0Sens1 and P0Sens2

System level
– RS2: no single point of failure shall lead to loss of function (assume P0Val is covered by this requirement)
This will be a safety or availability requirement
NB in practice there may be different sensors wired to different channels, and cross channel comms

Page 51: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Software Level NFR

Software level
– R2: If | P0Sens1 - P0Sens2 | < 0.06
then P0Val = (P0Sens1 + P0Sens2) / 2
else P0Val = 0
– Is R2 a valid requirement?
In other words, have we stated the right thing?
– Does R2 satisfy RS2?
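A literal, executable reading of R2 (a sketch only; the 0.06 threshold and the convention that 0 denotes “no valid value” are taken from the slide, the function name is illustrative). Whether returning 0 is actually safe is exactly the validity question the slide asks:

```python
# Sketch of requirement R2: average the duplex P0 sensors if they agree
# to within 0.06 psia, otherwise report 0 (i.e. "no valid value").

def p0_val_r2(p0_sens1: float, p0_sens2: float) -> float:
    if abs(p0_sens1 - p0_sens2) < 0.06:
        return (p0_sens1 + p0_sens2) / 2.0
    return 0.0   # per R2 -- arguably unsafe if 0 is a plausible pressure

if __name__ == "__main__":
    print(p0_val_r2(14.70, 14.72))   # sensors agree -> 14.71
    print(p0_val_r2(14.70, 15.10))   # disagreement  -> 0.0
```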

Page 52: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Temporal Requirements

Timing is often an important system property
– It may be a safety property, e.g. sequencing in weapons release

System level
– RS3: the validated pressure value shall never lag the sensed value by more than 100 ms
NB not uncommon, to ensure quality of control

Page 53: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Software Level Timing

Software level requirement, assuming scheduling on 50 ms cycles
– R3: P0Val(t) = P0Sens(t-2) [± 0.03]
– If t is quantised in units of 50 ms, representing cycles
– Is R3 a valid requirement?
– Does R3 satisfy RS3?
NB need data on processor timing to validate

Page 54: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Timing and Safety

Software level
– R4: If | P0Sens1(t) - P0Sens2(t) | < 0.06
then P0Val(t+1) = (P0Sens1(t) + P0Sens2(t)) / 2
else if | P0Sens1(t) - P0Sens1(t-1) | < | P0Sens2(t) - P0Sens2(t-1) |
then P0Val(t+1) = P0Sens1(t)
else P0Val(t+1) = P0Sens2(t)
– What does R4 respond to (can you think of an RS4)?
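A direct transliteration of R4 into executable form (a sketch; the handling of previous samples as explicit arguments is one obvious reading of the requirement, not the talk’s design):

```python
# Sketch of requirement R4: average the sensors when they agree; when they
# disagree, prefer the sensor whose value changed least since the last
# 50 ms cycle (the more plausible one), rather than dropping to 0.

def p0_val_r4(s1_t: float, s1_prev: float,
              s2_t: float, s2_prev: float) -> float:
    if abs(s1_t - s2_t) < 0.06:
        return (s1_t + s2_t) / 2.0
    if abs(s1_t - s1_prev) < abs(s2_t - s2_prev):
        return s1_t
    return s2_t

if __name__ == "__main__":
    # Sensor 2 jumps by 1 psia in one cycle; sensor 1 is steady,
    # so R4 selects sensor 1.
    print(p0_val_r4(14.70, 14.69, 15.80, 14.80))   # -> 14.7
```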

Page 55: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Requirements Validation

Is R4 a valid requirement?
– Is R4 “safe” in the system context (assume that misleading values of P0 could lead to a hazard, e.g. a thrust roll-back on take off)?

Does R4 satisfy RS3?
Does R4 satisfy RS2?
Does R4 satisfy RS1?

Page 56: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Real Requirements

Example still somewhat simplistic
– Need to store sensor state, i.e. knowledge of what has failed

Typically timing, safety, etc. drive the detailed design
– Aspects of requirements, e.g. error bands, depend on timing of the code
– Requirements involve trade-offs between, say, safety and availability

Page 57: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Requirements and Architecture

NFRs also drive the architecture
– Failure rate 10⁻⁶ per hour
Probably just duplex (especially if fail stop)
Functions for cross comms and channel change
– Failure rate 10⁻⁹ per hour
Probably triplex or quadruplex
Changes in redundancy management
NB a change in failure rate affects low level functions

Page 58: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Quantification

The “system level” functionality is in the minority
– Typically over half is fault management
– EuroFighter example
FCS ≈ 1/3 MLoC
Control laws ≈ 18 kLoC

Note, very hard to validate
– 777 flight incident in Australia due to an error in fault management, and a software change

Page 59: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Boeing 777 Incident near Perth

Problem caused by the Air Data Inertial Reference Unit (ADIRU)
– Software contained a latent fault which was revealed by a change

June 2001: accelerometer #5 fails with erroneous high output values; the ADIRU discards its output values
A power cycle of the ADIRU occurs each time the aircraft electrical system is restarted
Aug 2006: accelerometer #6 fails; a latent software error allows use of the previously failed accelerometer #5

Page 60: Developing Safety Critical Software: Fact and Fiction John A McDermid.
Page 61: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Summary

Functionality is important
– But not the primary driver of design

Key drivers of design
– Safety and availability
Turns into fault management at software level
– Timing behaviour

Functionality not independent of NFRs
– Requirements change to reflect NFRs

Page 62: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Overview

Fact – costs and distributions
Fiction – get the requirements right
Fiction – get the functionality right
Fiction – abstraction is the solution
Fiction – safety critical code must be “bug free”
Some key messages

Page 63: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Abstraction Fiction

Fiction stated
– Careful use of abstraction will address problems of requirements etc.

Fact
– Most forms of abstraction don’t work in embedded control systems
State abstraction is of some use

The devil is in the detail

Page 64: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Data Abstraction

Most data is simple
– Boolean, integer, floating point
– Complex data structures are rare
May exist in a maintenance subsystem (e.g. records of fault events)
– Systems engineers work in low-level terms, e.g. pressures, temperatures, etc.
Hence requirements are in these terms

Page 65: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Control Models are Low Level

Page 66: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Looseness

A key objective is to ensure that requirements are complete
– Specify behaviour under all conditions
– Normal behaviour (everything working)
– Fault conditions
Single faults, and combinations
– Impossible conditions
So the design is robust against incompletely understood requirements/environment

Page 67: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Despatch Requirements

Can despatch (use) the system “carrying” failures
– Despatch analysis based on a Markov model
– Evaluate probability of being in a non-despatchable state, e.g. only one failure from a hazard
– Link between safety/availability process and software design
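A minimal sketch of the kind of Markov despatch analysis referred to above (the three-state model, failure and repair rates are illustrative assumptions, not figures from the talk):

```python
# Illustrative despatch analysis: a tiny Markov model with states
# "no failure", "one failure carried" (still despatchable) and
# "two failures" (non-despatchable, one failure from the hazard).
# Rates are per flight hour and purely illustrative.
import numpy as np

LAMBDA = 1e-4    # failure rate of each redundant lane (assumed)
MU = 1e-2        # repair rate, i.e. ~100 h mean time to repair (assumed)

# Generator matrix Q for states [0 failures, 1 failure, 2 failures]
Q = np.array([
    [-2 * LAMBDA,      2 * LAMBDA,     0.0],
    [         MU, -(MU + LAMBDA),   LAMBDA],
    [         MU,             0.0,     -MU],
])

# Steady-state distribution: solve pi Q = 0 subject to sum(pi) = 1.
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(f"P(non-despatchable state) = {pi[2]:.2e}")
```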

Page 68: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Fault Management Logic

Fault-accommodation requirements may use a four valued logic
– Working (w), undetected (u), detected (d) and confirmed (c)
– The table illustrates “logical and” ([.])
– Used for analysis

[.]  w  u  d  c
 w   w  u  d  c
 u   u  u  d  c
 d   d  d  d  c
 c   c  c  c  c
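The four valued “logical and” is easy to make executable; it is simply the worse of the two operands in the order w < u < d < c (a sketch; the enum and function names are illustrative):

```python
# Four-valued "logical and" from the fault management table:
# w = working, u = undetected, d = detected, c = confirmed.
# The result is the "worst" of the two operands in the ordering
# w < u < d < c, which is exactly what the table encodes.
from enum import IntEnum

class Fault(IntEnum):
    W = 0   # working
    U = 1   # undetected
    D = 2   # detected
    C = 3   # confirmed

def fm_and(a: Fault, b: Fault) -> Fault:
    return max(a, b)

if __name__ == "__main__":
    assert fm_and(Fault.W, Fault.U) == Fault.U
    assert fm_and(Fault.U, Fault.D) == Fault.D
    assert fm_and(Fault.D, Fault.C) == Fault.C
    print("table reproduced")
```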

Page 69: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Example Implementation

[.]  w  d  c
 w   w  d  c
 d   d  d  c
 c   c  c  c

(As the analysis table, but without the undetected value u)

Page 70: Developing Safety Critical Software: Fact and Fiction John A McDermid.

State Abstraction

Some state abstraction is possible
– Mainly low-level state to operational modes

Aero engine control
– Want to produce thrust proportional to demand (thrust lever angle in cockpit)
– Can’t measure thrust directly
– Can use various “surrogates” for thrust
Work with best value, but reversionary models

Page 71: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Thrust Control

Engine pressure ratio (EPR) – between atmosphere & the exhaust pressures
– Best approximation to thrust
– Depends on P0
Low level state modelling the “health” of the P0 sensor
– If P0 fails, revert to use N1 (fan speed)
– Have control modes
EPR, N1, etc., which abstract away from details of sensor fault state

Page 72: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Summary

Opportunity for abstraction is much more limited than in “IT” systems
– Hinders many classical approaches

Abstraction is of some value
– Mainly state abstraction, relating low-level state information, e.g. sensor “health”, to system level control modes

NB formal refinement, a la B, is helped by this, as there is little data refinement

Page 73: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Overview

Fact – costs and distributions
Fiction – get the requirements right
Fiction – get the functionality right
Fiction – abstraction is the solution
Fiction – safety critical code must be “bug free”
Some key messages

Page 74: Developing Safety Critical Software: Fact and Fiction John A McDermid.

“Bug Free” Fiction

Fiction stated
– Safety critical code must be “bug free”

Facts
– It is hard to correlate fault density and failure rate
– <1 fault per kLoC is pretty good!
– Being “bug free” is unrealistic, and there is a need to “sentence” faults

Page 75: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Close to Fault Free?

DO 178A Level 1 software (engine controller) – now would be DAL A
– Natural language specifications and macro-assembler
– Over 20,000,000 hours without hazardous failure
– But on version 192 (last time I knew)
Changes “trims” to reflect hardware properties

Page 76: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Pretty Buggy

DO 178B Level A software (aircraft system)
– Natural language, control diagrams and high level language
– 118 “bugs” found in first 18 months, 20% critical
– Flight incidents but no accidents
– Informally “less safe” than the other example, but still flying, still no accidents

Page 77: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Fault Density

So far as one can get data
– <1 flaw per kLoC for SC is pretty good
– Commercial much worse, may be as high as 30 faults per kLoC
– Some “extreme” cases
Space Shuttle – 0.1 per kLoC
Praxis system – 0.04 per kLoC
– But will a hazardous situation arise?

Page 78: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Faults and Failures

Why doesn’t software “crash” more often?
– Paths miss “bugs” as they don’t get critical data
– Testing “cleans up” common paths
– Also “subtle faults” which don’t cause a crash
NB IBM OS
– 1/3 of failures were “3,000 year events”

[Diagram: program execution space, program paths and bugs]

Page 79: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Commercial Software

Examples of data dependent faults?
– Loss of availability is acceptable
– Most SCS have to operate through faults
Can’t “fail stop” – even reactor protection software needs to run circa 24 hours for heat removal

Pictures © 3BP.com
Page 80: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Retrospective Analysis

Retrospective analysis of a US civil product for UK military use
– Analysis of over 500 kLoC, in several languages
– Found 23 faults per kLoC, 3% safety critical
– Vast majority not safety critical
NB most of the 3% related to assumptions, i.e. were requirements issues

Page 81: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Find and Fix

If a fault is found it may not be fixed
– First it will be “sentenced”
If not critical, it probably won’t be fixed
– Potentially critical faults will be analysed
Can it give rise to a problem in practice?
If the decision is not to change, document the reasons
– Note: changes may bring (unknown) faults
e.g. Boeing 777 near Perth

Page 82: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Perils of Change

[Diagram: modules and their dependencies]

Page 83: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Summary

Probably no safety critical software is fault free
– Less than 1 fault per kLoC is good
– Hard to correlate fault density with failure rate (especially unsafe failures)

In practice
– Sentence faults, and change if net benefit

Need to show presence of faults
– To decide if need to remove them

Page 84: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Overview

Fact – costs and distributions
Fiction – get the requirements right
Fiction – get the functionality right
Fiction – abstraction is the solution
Fiction – safety critical code must be “bug free”
Some key messages

Page 85: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Summary of the Summaries

Safety critical software
– Has a good track record
– Increased dependency, complexity, etc. mean that this may not continue

Much of the difficulty is in requirements
– Partly a systems engineering issue
– Many of the problems arise from errors in communication
– Classical CS approaches limited utility

Page 86: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Research Directions (1)

Advances may come at architecture
– Improve notations to work at architecture and implement via code generation
– Develop approaches, e.g. good interfaces, product lines, to ease change
– Focus on V&V, recognising that the aim is fault-finding
AADL an interesting development

Page 87: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Research Directions (2)

Advances may come at requirements
– Work with systems engineering notations
Improve to address issues needed for software design and assessment, NB PFS
Produce better ways of mapping to architecture
Try to find ways of modularising, to bound impact of change, e.g. contracts
– Focus on V&V, e.g. simulation
Developments of Parnas/Jackson ideas?

Page 88: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Research Directions (3)

Work on automation, especially for V&V
– Design remains creative
– V&V is 50% of life-cycle cost, and can be automated
– Examples include
Auto-generation of test data and test oracles
Model-checking consistency/completeness

The best way to apply “classical” CS?

Page 89: Developing Safety Critical Software: Fact and Fiction John A McDermid.

Coda

Safety critical software research
– Always “playing catch up”
– Aspirations for applications growing fast

To be successful
– Focus on the “right problems”, i.e. where the difficulties arise in practice
– If possible work with industry – to try to provide solutions to their problems