Title Slide
“Forensic Software Engineering: avoiding the avoidable and living with the rest”
by
Professor Les Hatton
CIS, University of [email protected], [email protected]
Version 1.1: 23/Sep/2004
SURVIVAL AND AVOIDANCE STRATEGIES FOR SOFTWARE FAILURE
Overview
• Principles
• Scope
• Conclusions
Principles
An observation
• All the evidence suggests that many if not most failures exhibited by software controlled systems could have been avoided by techniques we already know how to apply.
Principles
The two engineering obligations
• When a system fails (and it will), it should be designed in such a way as to minimise deleterious effects on its users, by means of built-in redundancy or otherwise.
• When a system fails, the diagnostic system should always be able to provide an efficient means for finding the corresponding fault or faults so they can be corrected.
How to get it wrong
An Airbus having a bad day
A Tarom airlines Airbus which performed an uncontrolled dive,
climb, roll and spin near Orly in 1995 due to ‘a fault in the automatic pilot’.
The plane landed safely, a tribute to the pilots’ skill.
How to get it wrong:
Flight management system
An example from real life, Airbus A320 AF319, 25/8/88, (Mellor (1994)):-
• MAN PITCH TRIM ONLY, followed in quick succession by ...
• Fault in right main landing gear
• Fault in electrical flight control system computer 2
• Fault in alternate ground spoilers 1-2-3-5
• Fault in left pitch control green hydraulic circuit
• Loss of attitude protection
• Fault in Air Data System 2
• Autopilot 2 shown as engaged when it was disengaged
• LAVATORY SMOKE
Overview
• Principles
• Scope
• Conclusions
Scope
Forensic Software Engineering encompasses:-
• Forensic Process Analysis
• Project failures
• Forensic Product Analysis
• Implementation failures (e.g. linguistic), test failures …
• Forensic Systems Analysis
• OS reliability, security, environment failures (e.g. arithmetic), compiler quality, implications for design …
In each area we are trying to answer the question “Why?” to avoid future occurrences
“Planning is an unnatural process. It's much more fun to get on with
it. The real benefit of not planning is that failure comes as a
complete surprise and is not preceded by months of worry.”
Sir John Harvey-Jones
Forensic Process Analysis
When the train of ambition pulls away from the platform of reality
Planning data from a grand ‘unified’ programming project.
(Produced after the project seemed to be struggling.)
Note that unify appears next to unintelligible in the OCD.
[Chart: Unsuccessful project (abandoned). Predicted days to completion (0-90) plotted against day of prediction (days 1-74).]
Ruthlessly controlling tasks
Project restarted with (far) less ambitious goals and tracked weekly with
results published on staff notice board.
[Chart: Successful project (about 10% overrun). Predicted days to completion (0-160) plotted against day of prediction (days 1-176).]
Forensic Process Analysis:
results so far
The following are necessary (but may not be sufficient) for satisfactory project planning:-
• No sub-task with software systems should be longer than 1 week
• Projects should be tracked weekly with progress published
• Programmers underestimate the time taken to do things by about 50%
Forensic Product Analysis:
Here we are essentially analysing the software product itself to understand the nature of its failures.
The T-experiments
• Multi-industry study using static inspection, 1990-1992
  (Earth Science, Aerospace, ...)
• Single-industry study using N-version techniques, 1990-1993
  (Earth Science, Nuclear Control)
Stages
• Observed many repeating faults in development of SKS
• Developed F77 parsing engine to study other packages, 1988-1992
• Developed C parsing engine to study similar problems in a different language, 1990-1994
• Measured around 100 major systems, 1988-1997
• Developed a more advanced C parsing engine, 1996-2000; restarted experiments on embedded control systems
1988-1997: The T1 Fault experiments
Fault frequencies in C applications
[Chart: weighted faults per 1000 lines (0-25) by application area: Graphics, General, Elec-eng, Design, System, Control, Database, Parsing, Insurance, Utilities, Comms. Average of 8.]
Fault frequencies in Fortran 77 applications
[Chart: weighted faults per 1000 lines (0-25) by application area: general, elc-eng, EarthSci, parsing, CadCam, ChemMod, fld-eng, mch-eng, nuc-eng, oper-rs, the-phys, Geodesy, Aerospace. Average of 12. Note: same application area, one at 140/KLOC and one at 0/KLOC.]
Data derived from CAA CDIS
[Chart: residual serious static faults per KLOC (0-4) under average dynamic testing versus thorough dynamic testing.]
Where and how do faults fail historically ?
This study shows that statically detectable faults do in fact fail
during the life-cycle of the software.
Stages
An observation: failure experiments are REALLY expensive compared with fault experiments.
“T2” experiment, 1990-1993:
• Funded by Enterprise Oil plc in the UK
• Compared the output of 9 packages, all in Fortran 77, developed independently
• Carried out with a colleague, Andy Roberts
1990-1996: Failure experiments
T2 details
• 9 independently developed commercial versions of the same ~750,000 line F77 package of signal processing algorithms.
• Same input data tapes.
• Same processing parameters (46 page monitored specification document).
• All algorithms published with precise specification (e.g. FFT, deconvolution, finite-difference wave-equation solutions, tridiagonal matrix inversions and so on).
• All companies had detailed QA and testing procedures.
Basic goals of T2 experiment
Overall goals were:
• To estimate the magnitude of disagreement.
• To see what form disagreement took.
• To identify poorly implemented processes.
• To attempt to improve agreement by feedback confirming the nature of the fault.
• To preserve complete confidentiality.
Data analysis
Analysis goals were:
• Analyse at 14 "primary" calibration points and 20 "secondary" calibration points.
• Analyse data in multiple windows.
• Use two sets of independently developed analysis software to improve confidence.
Similarity v. coordinate: No feedback
Defect example 1: feedback detail
Similarity v. coordinate: Feedback to company 8
Defect example 2: feedback detail
Similarity v. coordinate: Feedback to company 3
The end product: 9 subtly different views of the geology
T2 Results
The accompanying slides illustrate:
• Only 1-2 significant figures agreement after processing.
• Disagreement is non-random and alternate views seem equally plausible.
• Feedback of anomalies along with other evidence confirms the source of disagreement as software failure.
A summary of 10 years of failure experiments
Seismic processing software environment                    Significant figures agreement
32 bit floating point arithmetic                           6
Same software on different platforms, same data            4
Same software on same platform, 5:1 lossy compression      3-4
Same software subjected to continual 'enhancement'         1-2
T2: different software, same specs, same data,
same language, same parameters                             1
Forensic Product Analysis:
results so far
The following seem well supported
• Modern programming languages are riddled with poorly defined behaviour which programmers regularly fall prey to
• Even in high-quality environments, scientific software results degrade quite rapidly with the amount of computation, even though the results may look plausible
• The choice of programming language is irrelevant; what matters most is the fluency of the programmers in that language.
Forensic Systems Analysis:
Here we are essentially analysing the systems environment in which software functions to understand the nature of its failures.
Forensic Systems Analysis
We can identify at least the following areas:
• OS reliability
• Security
• Arithmetic environment
• Compiler quality
OS Reliability
Mean Time Between Failures of various operating systems
[Chart: MTBF in hours, log scale 0.1-10,000, for Windows 95, Macintosh 7.5-8.1, NT 4.0, Linux and Sparc 4.1.3c; annotations mention Windows 2000/XP and a figure of > 50,000 hours.]
Security
A very big subject which includes:-
• Monolithic v. modular design
• The use of binary format files
• How permissions and users are defined
• Software failures (many security breaks are due to buffer overflow caused by programmers using inappropriate functions, e.g. strcpy instead of strncpy)
Arithmetic environment
Even in 2004, computers still get arithmetic wrong:-
• Embedded System Paranoia extends the venerable Paranoia arithmetic test suite to embedded control systems, with similar results:-
http://www.leshatton.org/ESP_903.html
Compiler Quality
Note the following:-
• In April 2000, NIST formally stopped validating compilers in any language
• Most compilers fail the existing validation suites in some way or another – these departures are not documented, you assume the risk
• It seems very likely that the situation will get worse as languages get more complicated
Forensic Systems Analysis:
results so far
The following seem well supported
• If you need a reliable OS environment (MTBF > 500 hours) do not use Windows
• If you need a secure OS environment do not use Windows or binary file formats
• Test your computer arithmetic, it will probably have inconvenient flaws and may have major failures
• Do not take your compiler quality for granted. Seek written assurances from the supplier if possible.
Overview
• Principles
• Scope
• Conclusions
Conclusions
Forensic Software Engineering seeks:
• To analyse failures in various categories and to disseminate this information in a searchable way to allow developers and scientists to avoid future occurrences of these failures.
All the evidence suggests that relatively simple use of avoidance strategies can lead to extraordinarily reliable applications
Reference site
For more information, downloadable papers and software, see:-
http://www.leshatton.org/