8/12/2019 03 - Reliability Software
http://slidepdf.com/reader/full/03-reliability-software 1/56
Reliability
Glen Dobson
http://www.comp.lancs.ac.uk/~dobsong/teaching/dependability
Recapping
• Overview of dependability
“the property of a system such that we can justifiably place our reliance on the service it delivers”
• Key dependability attributes
Reliability, Availability, Safety, Security
• Relationship between attributes
Effect of primary attributes on each other as well as the effect of auxiliary attributes
• Criticality and conflict
Different attributes are critical to different systems
Improving one attribute may be detrimental to another
• Dependability requirements
Use measurable criteria
High dependability => Hard (& expensive) to test
• Availability
“Readiness for correct service”
Problems – inherent vs. operational availability, what availability does not tell you
Overview
• Definition of reliability
• Reliability metrics
• System failure
• Preventing failure
• Testing for failures
• Group Discussion
Definition
• Laprie
Reliability is the continuity of correct service
(a service is correct when it implements the system function)
• More pragmatically
In a given time period, for a given usage pattern, how likely is a system to fail?
Failure = deviating from the system specification
• For some systems
Failure may mean deviating from expectations
Assessment qualifiers
• Assessment of reliability depends on:
Intended system usage
Intended operational profile
Context and environment of use
Time and period of use
Load and intensity of use
• Reliability is a function of these factors
• If any of these factors changes, we must reassess
Reliability measures
• POFOD - Probability of failure on demand
• ROCOF - Rate of occurrence of failure
• MTTF - Mean time to failure
• Each is suitable for different systems
• Suitable time units should be chosen
Physical or logical
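The three measures can be estimated from an observation log. The following is a minimal sketch, not a definitive method: the function names and the sample figures are made up for illustration, and the MTTF estimate assumes repair time is negligible so that each interval between failures is pure operating time. The time unit is whatever suits the system (transactions, minutes, requests).

```python
from statistics import mean


def pofod(failed_demands: int, total_demands: int) -> float:
    """Probability of failure on demand: fraction of demands that failed."""
    if total_demands <= 0:
        raise ValueError("need at least one demand")
    return failed_demands / total_demands


def rocof(failure_count: int, observed_time: float) -> float:
    """Rate of occurrence of failure: failures per unit of observed time."""
    if observed_time <= 0:
        raise ValueError("observed time must be positive")
    return failure_count / observed_time


def mttf(failure_times: list[float]) -> float:
    """Mean interval between successive failures, measured from time 0.

    Assumption: repair is instantaneous, so every interval is up-time.
    """
    if not failure_times:
        raise ValueError("need at least one failure to estimate MTTF")
    ordered = sorted(failure_times)
    intervals = [later - earlier
                 for earlier, later in zip([0.0] + ordered, ordered)]
    return mean(intervals)


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    print("POFOD:", pofod(failed_demands=4, total_demands=2000))
    print("ROCOF:", rocof(failure_count=3, observed_time=600.0))
    print("MTTF: ", mttf([100.0, 250.0, 450.0]))
```

POFOD suits systems that are called on occasionally (e.g. a shutdown system), ROCOF suits systems handling a steady stream of requests, and MTTF suits long continuous sessions.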
Suitable time units???
• Determine suitable time units for:
ATM cash withdrawal - Transaction
Editing with a word processor - Minute
Web server providing pages - Request
General practitioner diagnosis - Patient illness
Nuclear reactor core shutdown - Alert
Bath tub curve
[Figure: bath tub curve, failures plotted against time]
Effects:
• Hardware (degradation)
• Software (evolution)
• People (mental faculties)
Software & the bath tub…
• The bath tub is a sketch of hardware reliability
• Tends not to apply so well to software
Burn in period is similar
Upgrades cause a sudden decrease in reliability
Usually followed by another burn in period
Ideally the upgrade/burn in effect decreases over time
Once the software is no longer upgraded, the reliability becomes constant
• Our bath tub will certainly have a bumpy bottom
Software Reliability Curve
[Figure: failures plotted against time, with regions labelled Initial Development, v1.0, v2.0, v3.0 and Software no longer actively maintained]
But these are only sketches
• Systems are made up of software, hardware…
• …and what about people?
What does their failure curve look like?
Burn in period = training/familiarisation
May forget/pick up bad habits over time
Organisational changes will have an effect
• e.g. high workload/stress will affect mental capacity
Personnel changes will have an effect
• So the bath tub is likely to have bumps all over the place.
Failure manifestation
Fault ⇒ Error ⇒ Failure
• Fault
The adjudged or hypothesised cause of an error. Typically a mistake or lack in the preparation of a component.
• Error
The initial deviation in system state which may eventually lead to failure. This is usually unintended/unexpected behaviour.
• Failure
A deviation from correct service (i.e. from the specification or system function)
Examples
• Fault
Programming mistake
Poor training
Joined pins on chip
• Error
Incorrect floating point calculation
Misfiling of documents
No actuator signal
• Failure
Mis-navigation
Treatment not given to patient
Burglar alarm not sounding
Fault classification
• Can classify along many axes:
Phase of creation/occurrence
Development Faults/Operation Faults
System Boundaries
Internal Faults/External Faults
Phenomenological Causes
Natural Faults/Human-made Faults
Dimension
Hardware Faults/Software Faults
Objective
Malicious Faults/Non-Malicious Faults
Fault classification (2)
• Can classify along many axes:
Intent
Deliberate Faults/Non-deliberate Faults
Capability
Accidental Faults/Incompetence Faults
Persistence
Permanent Faults/Transient Faults
Fault-Error-Failure
• A common source of confusion results from the perspective taken on the “system”, e.g.
mental human fault⇒
programming error⇒
software fault⇒
software error⇒
system failure
The general case
[Figure: several Fault ⇒ Error ⇒ Failure chains linked together]
Latencies
[Figure: Fault ⇒ Error ⇒ Failure, annotated with fault latency and failure latency]
• Faults may go undetected for a long time
• Faults may be dormant (i.e. never lead to an error) or active
• Internal errors may never reach the system’s external state
Automated statistical testing
• Capture of operational profiles
• Auto generation of minimal test sets
• Used to assess the reliability of system
• Still unrealistic for VHR systems
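A rough calculation shows why statistical testing runs out of steam at very low failure probabilities. The sketch below uses a standard result, not something stated on the slide: if demands are independent and the true POFOD is p, then n failure-free demands occur with probability (1 - p)^n, so claiming POFOD ≤ p with confidence C needs n ≥ ln(1 - C) / ln(1 - p). The function name is hypothetical.

```python
import math


def failure_free_demands_needed(target_pofod: float, confidence: float) -> int:
    """Number of consecutive failure-free demands needed to claim
    POFOD <= target_pofod at the given confidence level.

    Assumes independent demands drawn from the real operational profile.
    """
    if not 0.0 < target_pofod < 1.0:
        raise ValueError("target_pofod must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # log1p(-p) computes ln(1 - p) accurately even for very small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target_pofod))


if __name__ == "__main__":
    for target in (1e-3, 1e-6, 1e-9):
        n = failure_free_demands_needed(target, confidence=0.99)
        print(f"POFOD <= {target:g} at 99% confidence: {n:,} failure-free demands")
```

The required number of tests grows roughly in proportion to 1/p, which is why the approach stops being practical as the reliability target rises.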
Failure prevention
[Figure: Fault ⇒ Error ⇒ Failure chain annotated with fault avoidance, fault removal and fault tolerance]
Fault avoidance - Use tarmac instead
Fault removal - Council fix slab, walk around slab
Fault tolerance - Trip, but regain balance
It is better to avoid faults than tolerate them (prevention is better than the cure)
We might not regain our balance!
Fault avoidance
• Prevent inclusion of new faults
Managed development
Formal methods
Quality culture
Managed development
• Mature lifecycle (Certified Process?)
• Management and control of:
Requirements and designs
Evolution
Testing
Configuration
Documentation
• Traceability
• Accountability
• Audit and review
Formal methods
• Specify system using formal language
• Precise vocabulary, syntax and semantics
• Based on maths, set theory, logic etc.
• Spec. can then be processed formally
Benefits of formal methods
• Reduce ambiguity & misunderstanding
• Automatically analyse for:
Consistency
Correctness
Completeness
• Specs can be emulated or simulated
• Verify produced system using proofs
• Prove various properties of system
• Transformation to construct system
Predicate calculus
• Extends propositional calculus
• Much more powerful
• Allows the use of variables
• “for all” (universal) quantifier
• “there exists” (existential) quantifier
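Over a finite domain the two quantifiers can be tried out directly in code. This is only an illustrative sketch: the sensor readings and limits are invented, and Python's `all` and `any` stand in for "for all" and "there exists" over a finite list, which is far weaker than proving a property over every possible value.

```python
# Hypothetical sensor readings and limits, for illustration only.
readings = [12, 7, 30, 18]
UPPER_LIMIT = 50
WARNING_LEVEL = 25

# "for all" (universal): every reading r satisfies r <= UPPER_LIMIT
all_within_limit = all(r <= UPPER_LIMIT for r in readings)

# "there exists" (existential): some reading r satisfies r > WARNING_LEVEL
some_above_warning = any(r > WARNING_LEVEL for r in readings)

# The quantifiers are duals: "not for all r, P(r)" is the same claim as
# "there exists r such that not P(r)".
duality_holds = (not all(r <= WARNING_LEVEL for r in readings)) == any(
    r > WARNING_LEVEL for r in readings
)

if __name__ == "__main__":
    print(all_within_limit, some_above_warning, duality_holds)
```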
Formal method problems
• Time consuming and expensive
• Hard to understand (not “fun”)
• Domain experts stand little chance
• Problems concealed by formality
• Transformation slow and difficult
• Tool support is patchy
• How do we know specification is right?
• No single language suitable for everything
Applying formal methods
• Useful for specific sub-systems
• Useful for specific sub-problems (e.g. safety)
• Cost effective if used appropriately
• Limited use in industry
• Has yet to deliver on a large scale
Fault removal
• Detect and remove existing faults
Testing and debugging
Reviews/inspections
Static analysis
Testing
• Alpha/Beta - Acceptance/real operational use
• Black/White box - opaque/transparent components
• Functional/Structural - (as above)
• Defect/Statistical - explicit search/normal usage
• Unit/integration - component/whole system
• Regression - repeat test set after each repair
• Stress - push upper bounds, try to break system
Reviews and inspections
• Focus on artifacts produced
• No operational system required
• Expert judgement - cross discipline
• Examine & critique produced artifacts
• Requires knowledge of artifacts and domain
• Often cheaper than testing
• Not all problems are identified
• Used to assess non testable attributes
Fault tolerance
[Figure: Fault ⇒ Error ⇒ Failure chain annotated with fault avoidance, fault removal and fault tolerance]
Assertions
• Run time checks
• Performed periodically
• Ensure the system is in a safe state
• Are we safe before we continue?
• Manually coded or auto-generated
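A manually coded run-time check might look like the sketch below. Everything in it is an assumption made for illustration: the heater scenario, the thresholds and the names `check_safe_state` and `control_step` are invented. Explicit exceptions are used rather than Python's `assert` statement because `assert` is skipped when the interpreter runs with `-O`.

```python
class UnsafeStateError(RuntimeError):
    """Raised when a run-time check finds the system outside its safe envelope."""


def check_safe_state(temperature_c: float, pump_running: bool) -> None:
    """Run-time assertion: raise if the (hypothetical) heater state is unsafe."""
    if not -20.0 <= temperature_c <= 120.0:
        raise UnsafeStateError(f"implausible temperature reading: {temperature_c}")
    if temperature_c > 90.0 and not pump_running:
        raise UnsafeStateError("hot tank with pump stopped")


def control_step(temperature_c: float, pump_running: bool) -> str:
    """One control cycle; the check runs before any action is chosen."""
    check_safe_state(temperature_c, pump_running)  # are we safe before we continue?
    return "heater_off" if temperature_c > 80.0 else "heater_on"


if __name__ == "__main__":
    print(control_step(50.0, pump_running=True))
    try:
        control_step(95.0, pump_running=False)
    except UnsafeStateError as exc:
        print("check failed:", exc)
```

Calling the check at the start of every control cycle is one way of performing it periodically; what to do once it fails (shut down, degrade, retry) is a separate design decision.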
Failure likelihood
• All modules failing together unlikely
• Provided modules are independent !!!
• Scaled up to N-version systems
• Automatic module repair/replacement
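The independence caveat can be made concrete with a little arithmetic. The sketch below assumes each module fails on a given demand with the same probability p, independently of the others; the function names and the voter are hypothetical illustrations rather than a prescribed design. If modules share a fault, these figures are far too optimistic.

```python
from collections import Counter
from math import comb


def prob_all_fail(p: float, n: int) -> float:
    """Probability that all n independent modules fail on the same demand."""
    return p ** n


def prob_majority_wrong(p: float, n: int) -> float:
    """Probability that more than half of n independent modules fail,
    which is when a majority voter can no longer mask the failures."""
    smallest_majority = n // 2 + 1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(smallest_majority, n + 1))


class NoMajorityError(RuntimeError):
    """Raised when the module outputs do not contain a strict majority."""


def majority_vote(outputs: list):
    """Return the output produced by a strict majority of modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise NoMajorityError(f"no majority among {outputs!r}")
    return value


if __name__ == "__main__":
    p = 0.01  # hypothetical per-module failure probability
    print("one module fails:       ", p)
    print("all three fail together:", prob_all_fail(p, 3))
    print("two or more of three:   ", prob_majority_wrong(p, 3))
    print("voted output:           ", majority_vote([4, 4, 5]))
```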
Recovery Blocks
• Redundant components used in series
• Acceptance test used to assess results
• If one component fails, try the next
• Roll-back state before retry
• Try until success or no more left
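The steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not a reference implementation: the sorting task, the seeded fault in the primary and all function names are invented, and state is rolled back simply by giving each alternate a fresh deep copy of the checkpoint.

```python
import copy
from collections import Counter


class RecoveryBlockFailure(RuntimeError):
    """Raised when every alternate has been tried and rejected."""


def run_recovery_block(state, alternates, acceptance_test):
    """Try each alternate in series until one passes the acceptance test."""
    checkpoint = copy.deepcopy(state)          # saved state for roll-back
    for alternate in alternates:
        working = copy.deepcopy(checkpoint)    # roll back before each attempt
        try:
            result = alternate(working)
        except Exception:
            continue                           # a crash counts as a failed attempt
        if acceptance_test(checkpoint, result):
            return result
    raise RecoveryBlockFailure("no alternate passed the acceptance test")


def fast_but_faulty_sort(items):
    """Hypothetical primary with a seeded fault: it loses an element."""
    items.pop()
    return sorted(items)


def simple_sort(items):
    """Hypothetical secondary: slower in spirit, but correct."""
    return sorted(items)


def sorted_permutation(original, result):
    """Acceptance test: result is ordered and holds the same elements."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and Counter(result) == Counter(original)


if __name__ == "__main__":
    data = [3, 1, 2]
    print(run_recovery_block(data, [fast_but_faulty_sort, simple_sort],
                             sorted_permutation))
```

The acceptance test here is easy to write because a sorted result is cheap to verify; for many real functions, checking the answer is nearly as hard as computing it.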
Comparing approaches
• Modular redundancy less efficient
• Since all modules MUST be executed
• Recovery blocks good (with no failure)
• But how to write the acceptance test ?
Component diversity
• Diversity essential for these methods
• Both in design and implementation
• Each component should use different:
System specifications
Design paradigms
Programming languages
Development environments
Algorithms
Backgrounds and cultures
Problems with redundancy
• Duplicate faults can still exist !!!
• People still make the same mistakes
• Hard to think of different ways to work
• Added complexity can hide faults
• Can’t do acceptance test for everything
• What happens if components don’t agree?
• Big efficiency hit (problem for RT systems)
• Can be very expensive (three times the cost)
? Reliability of this course ?
Trying to avoid the failure of this course:
• Used Ian’s book - Fault avoidance (almost formal methods ;o)
• Also used alternative sources - Fault avoidance; Redundancy, Diversity
• Used spell checker to make slides - Fault removal
• 5th year we have done this course - Fault removal; Testing
• Both Mark and Glen are lecturing - Redundancy
• Checked each other’s slides - Fault removal (review)
• Mark (social sci.) Glen (comp sci.) - Diversity
• Your comments in lectures - Fault tolerance
Fault injection
• Assess system or sub-component
• Test harness to assess fault tolerance
• Artificially create and introduce faults
• Can be performed on: Simulation of component
Actual component under test load
Actual component in actual use
• Each has its own pros and cons
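A small run-time injection campaign against a simulated component could be sketched like this. All names are hypothetical and the set-up is deliberately tiny: a wrapper injects transient faults into a simulated sensor read, a retry loop plays the part of the fault tolerance mechanism under assessment, and a dictionary collects the outcomes. A seeded random generator keeps each campaign repeatable.

```python
import random


class InjectedFault(IOError):
    """Artificial transient fault introduced by the injector."""


def make_faulty(component, fault_rate: float, rng: random.Random):
    """Fault injector: wrap a component so calls fail with probability fault_rate."""
    def wrapper(*args):
        if rng.random() < fault_rate:
            raise InjectedFault("injected transient fault")
        return component(*args)
    return wrapper


def read_sensor(channel: int) -> float:
    """Simulated target component."""
    return 20.0 + channel


def tolerant_read(read, channel: int, attempts: int = 3) -> float:
    """Fault tolerance mechanism under test: retry on transient I/O faults."""
    for _ in range(attempts):
        try:
            return read(channel)
        except IOError:
            continue
    raise RuntimeError("sensor unavailable after retries")


def run_campaign(fault_rate: float, demands: int, seed: int = 0) -> dict:
    """Workload generator plus data collector for one injection campaign."""
    rng = random.Random(seed)
    faulty_read = make_faulty(read_sensor, fault_rate, rng)
    outcomes = {"ok": 0, "failed": 0}
    for i in range(demands):
        try:
            tolerant_read(faulty_read, i % 4)
            outcomes["ok"] += 1
        except RuntimeError:
            outcomes["failed"] += 1
    return outcomes


if __name__ == "__main__":
    for rate in (0.0, 0.3, 0.9):
        print(rate, run_campaign(rate, demands=1000))
```

Because the faults are injected into a simulation, the campaign is cheap and safe to repeat, but its results are only as realistic as the simulated component and the chosen fault rate.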
Fault injection
[Figure: fault injection environment, with the following components]
• Controller
• Fault Injector (Fault Library)
• Workload Generator (Workload Library)
• Data Collector (Collected Data)
• Target System
Types of injection
• Compile time injection
• Run time injection
• Interactive injection
• Addition of new
• Alteration of existing
• Removal of old
Problems with fault injection
• Unrealistic operational profiles
• Time consuming for complex systems
• Impractical for VHR systems
• Instruments interfere with operation
• Limited to S/W and H/W components
Or is it ?
Summary
Group discussion
Dave works in an office. He uses a desktop PC with an off-the-shelf
OS. He does most of his work using a standard office suite (word processor, spreadsheet, etc.). Dave browses the web and e-mails
his friends when he is bored. His dog is called Caruthers.
Dave finds using his computer very unreliable and regularly loses
work. What reasons could there be for this unreliability? What could
be done to improve reliability?
Consider both social and technical perspectives. Do not limit your
thinking to only those topics covered in the lecture material.
Thoughts - solutions
• Hardware checking and repair
• Install latest s/w updates
• Proactive behaviour by Dave (regular saving)
• Backing up procedures
• Avoiding unreliable features (e.g. tables)
• Redundancy - Davina replicates work
• Better testing, reviews, walkthroughs etc
• Formal modelling of Dave’s office (NOT)
• Ethno study of Dave’s office (insight ?)
• Not cost effective to provide high reliability!
Further questions
• What effect would the publishing of
portions of the OS on the web have?
• What effect would a new competitor for the
office suite have?
• What effect would the installation of a new
version of the OS have?