Title Slide
“Forensic Software Engineering: avoiding the avoidable and living with the rest”
by
Professor Les Hatton
CIS, University of [email protected], [email protected]
Version 1.1: 23/Sep/2004
SURVIVAL AND AVOIDANCE STRATEGIES FOR SOFTWARE FAILURE
Overview
• Principles
• Scope
• Conclusions
Principles
An observation
• All the evidence suggests that many if not most failures exhibited by software controlled systems could have been avoided by techniques we already know how to apply.
Principles
The two engineering obligations
• When a system fails (and it will), it should be designed in such a way as to minimise deleterious effects on its users, by means of built-in redundancy or otherwise.
• When a system fails, the diagnostic system should always be able to provide an efficient means for finding the corresponding fault or faults so they can be corrected.
How to get it wrong
An Airbus having a bad day
A Tarom airlines Airbus which performed an uncontrolled dive,
climb, roll and spin near Orly in 1995 due to ‘a fault in the automatic pilot’.
The plane landed safely, a tribute to the pilots’ skill.
How to get it wrong:
Flight management system
An example from real life, Airbus A320 AF319, 25/8/88, (Mellor (1994)):-
• MAN PITCH TRIM ONLY, followed in quick succession by ...
• Fault in right main landing gear
• Fault in electrical flight control system computer 2
• Fault in alternate ground spoilers 1-2-3-5
• Fault in left pitch control green hydraulic circuit
• Loss of attitude protection
• Fault in Air Data System 2
• Autopilot 2 shown as engaged when it was disengaged
• LAVATORY SMOKE
Overview
• Principles
• Scope
• Conclusions
Scope
Forensic Software Engineering encompasses:-
• Forensic Process Analysis
• Project failures
• Forensic Product Analysis
• Implementation failures (e.g. linguistic), test failures …
• Forensic Systems Analysis
• OS reliability, security, environment failures (e.g. arithmetic), compiler quality, implications for design …
In each area we are trying to answer the question “Why?” to avoid future occurrences
“Planning is an unnatural process. It's much more fun to get on with
it. The real benefit of not planning is that failure comes as a
complete surprise and is not preceded by months of worry.”
Sir John Harvey-Jones
Forensic Process Analysis
When the train of ambition pulls away from the platform of reality
Planning data from a grand ‘unified’ programming project.
(Produced after the project seemed to be struggling.)
Note that unify appears next to unintelligible in the OCD.
[Chart: Unsuccessful project (abandoned). Predicted days to completion (0-90) plotted against day of prediction (days 1-74).]
Ruthlessly controlling tasks
Project restarted with (far) less ambitious goals and tracked weekly with
results published on staff notice board.
[Chart: Successful project (about 10% overrun). Predicted days to completion (0-160) plotted against day of prediction (days 1-176).]
Forensic Process Analysis:
results so far
The following are necessary (but may not be sufficient) for satisfactory project planning:-
• No sub-task with software systems should be longer than 1 week
• Projects should be tracked weekly with progress published
• Programmers underestimate the time taken to do things by about 50%
Forensic Product Analysis:
Here we are essentially analysing the software product itself to understand the nature of its failures.
The T-experiments
• Multi-industry study using static inspection, 1990-1992
  (Earth Science, Aerospace, ...)
• Single-industry study using N-version techniques, 1990-1993
  (Earth Science, Nuclear Control)
Stages
• Observed many repeating faults in development of SKS
• Developed F77 parsing engine to study other packages, 1988-1992
• Developed C parsing engine to study similar problems in a different language, 1990-1994
• Measured around 100 major systems, 1988-1997
• Developed a more advanced C parsing engine, 1996-2000; restarted experiments on embedded control systems
1988-1997: The T1 Fault experiments
Fault frequencies in C applications
[Chart: weighted faults per 1000 lines (0-25) by application area: Graphics, General, Elec-eng, Design, System, Control, Database, Parsing, Insurance, Utilities, Comms. Average of 8.]
Fault frequencies in Fortran 77 applications
[Chart: weighted faults per 1000 lines (0-25) by application area: general, elc-eng, EarthSci, parsing, CadCam, ChemMod, fld-eng, mch-eng, nuc-eng, oper-rs, the-phys, Geodesy, Aerospace. Average of 12. Note: same application area, one at 140/KLOC and one at 0/KLOC.]
Data derived from CAA CDIS
[Chart: residual serious static faults per KLOC (0-4) under average dynamic testing versus thorough dynamic testing.]
Where and how do faults fail historically ?
This study shows that statically detectable faults do in fact fail
during the life-cycle of the software.
Stages
An observation: failure experiments are REALLY expensive compared with fault experiments.
“T2” experiment, 1990-1993:
• Funded by Enterprise Oil plc in the UK
• Compared the output of 9 packages, all in Fortran 77, developed independently
• Carried out with a colleague, Andy Roberts
1990-1996: Failure experiments
T2 details
• 9 independently developed commercial versions of the same ~750,000 line F77 package of signal processing algorithms.
• Same input data tapes.
• Same processing parameters (46 page monitored specification document).
• All algorithms published with precise specification (e.g. FFT, deconvolution, finite-difference wave-equation solutions, tridiagonal matrix inversions and so on).
• All companies had detailed QA and testing procedures.
Basic goals of T2 experiment
Overall goals were:
• To estimate the magnitude of disagreement.
• To see what form disagreement took.
• To identify poorly implemented processes.
• To attempt to improve agreement by feedback confirming the nature of the fault.
• To preserve complete confidentiality.
Data analysis
Analysis goals were:
• Analyse at 14 "primary" calibration points and 20 "secondary" calibration points.
• Analyse data in multiple windows.
• Use two sets of independently developed analysis software to improve confidence.
Similarity v. coordinate: No feedback
Defect example 1: feedback detail
Similarity v. coordinate: Feedback to company 8
Defect example 2: feedback detail
Similarity v. coordinate: Feedback to company 3
The end product: 9 subtly different views of the geology
T2 Results
The accompanying slides illustrate:
• Only 1-2 significant figures agreement after processing.
• Disagreement is non-random and alternate views seem equally plausible.
• Feedback of anomalies along with other evidence confirms the source of disagreement as software failure.
A summary of 10 years of failure experiments
Seismic processing software environment                    Significant figures agreement
32 bit floating point arithmetic                           6
Same software on different platforms, same data            4
Same software on same platform, 5:1 lossy compression      3-4
Same software subjected to continual 'enhancement'         1-2
T2: different software, same specs, same data,
same language, same parameters                             1
Forensic Product Analysis:
results so far
The following seem well supported
• Modern programming languages are riddled with poorly defined behaviour which programmers regularly fall prey to
• Even in high-quality environments, scientific software results degrade quite rapidly with the amount of computation, even though the results may look plausible
• The choice of programming language is irrelevant; what matters most is the fluency of the programmers in that language.
Forensic Systems Analysis:
Here we are essentially analysing the systems environment in which software functions to understand the nature of its failures.
Forensic Systems Analysis
We can identify at least the following areas:
• OS reliability
• Security
• Arithmetic environment
• Compiler quality
OS Reliability
Mean Time Between Failures of various operating systems
[Chart: MTBF in hours, log scale 0.1-10,000, for Windows 95, Macintosh 7.5-8.1, NT 4.0, Linux and Sparc 4.1.3c; annotations mention Windows 2000/XP and a figure of > 50,000 hours.]
Security
A very big subject which includes:-
• Monolithic v. modular design
• The use of binary format files
• How permissions and users are defined
• Software failures (many security breaks are due to buffer overflow caused by programmers using inappropriate functions, e.g. strcpy instead of strncpy)
Arithmetic environment
Even in 2004, computers still get arithmetic wrong:-
• Embedded System Paranoia extends the venerable Paranoia arithmetic test suite to embedded control systems, with similar results:-
http://www.leshatton.org/ESP_903.html
Compiler Quality
Note the following:-
• In April 2000, NIST formally stopped validating compilers in any language
• Most compilers fail the existing validation suites in some way or another – these departures are not documented, you assume the risk
• It seems very likely that the situation will get worse as languages get more complicated
Forensic Systems Analysis:
results so far
The following seem well supported
• If you need a reliable OS environment (MTBF > 500 hours) do not use Windows
• If you need a secure OS environment do not use Windows or binary file formats
• Test your computer arithmetic, it will probably have inconvenient flaws and may have major failures
• Do not take your compiler quality for granted. Seek written assurances from the supplier if possible.
Overview
• Principles
• Scope
• Conclusions
Conclusions
Forensic Software Engineering seeks:
• To analyse failures in various categories and to disseminate this information in a searchable way to allow developers and scientists to avoid future occurrences of these failures.
All the evidence suggests that relatively simple use of avoidance strategies can lead to extraordinarily reliable applications
Reference site
For more information, downloadable papers and software, see:-
http://www.leshatton.org/