Concepts in Software Safety

Principles of Software Safety Sanders, Ver 4 Copyright 2002 Guidelines For Clinical Information Systems And The Electronic Medical Record (EMR)

description

An overview of software safety concepts and applicability to electronic medical records.

Transcript of Concepts in Software Safety

Page 1: Concepts in Software Safety

Principles of Software Safety

Sanders, Ver 4 Copyright 2002

Guidelines For Clinical Information Systems And The Electronic Medical Record (EMR)

Page 2: Concepts in Software Safety

Overview

• Premise
• Background
• Brief history of software safety
  – Examples of software safety incidents
  – Analysis concepts
• Characteristics of a “safe software” environment
• Principles for clinical information systems and the Electronic Medical Record (EMR)

Page 3: Concepts in Software Safety


Premise

1. Clinical Information Systems and Electronic Medical Records (EMR) are safety-critical, software-based systems

– Lives are at stake if the software is “unsafe”

2. These systems should be designed, developed, and maintained in similar fashion to other safety-critical software systems

– Transportation, medical devices, nuclear power, weapon systems

Page 4: Concepts in Software Safety

My Background: I’ve given this topic some thought…

• Air Force Nuclear Weapons Safety Program: 1989-1995
  – Peacekeeper and Minuteman Intercontinental Ballistic Missile (ICBM) launch and guidance control software safety analysis
• TRW Independent Research and Development (IR&D): 1992-1995
  – “Assessing Software Safety and Risk in Commercially Developed Software”
  – “Predicting Software Safety and Reliability Using Expert Systems”
• Army Tactical Nuclear Weapons Safety Office: 1993-1994
  – Pershing Missile Software Safety Assessment for Unauthorized Launch
• National Security Agency (NSA): 1993-1995
  – U.S. Nuclear Command and Control Risk Assessment
• Food and Drug Administration: 1995
  – “Applying 510k Principles to Computerized Patient Records”
• Intermountain Health Care: 1997-2004

Page 5: Concepts in Software Safety

Characteristics of Safe Software

• Software is “safe” if…
  – It has features and procedures which ensure that it performs predictably under normal and abnormal conditions
  – The likelihood of an undesirable event occurring in the execution of that software is minimized
  – If an undesirable event does occur, the consequences are controlled and contained

“Software Safety and Reliability” -- D. Herrmann

Page 6: Concepts in Software Safety

Key Concepts

• Not all software should be developed and tested using the same methodology
  – An EMR vs. a web page for posting minutes
  – Adjust your methodology according to your risks
• No software-based, safety-critical system is 100% “safe”
  – The question is: How safe is safe enough?
• Software safety is a component of system safety
  – Software safety must be evaluated within the context of the system it operates, including the humans that interact with that system

Page 7: Concepts in Software Safety

Drivers of Software Safety History

• 1960s
  – Nuclear Intercontinental Ballistic Missile Program
  – Apollo Space Program
• 1970s
  – Space Transportation System (Space Shuttle)
  – Food and Drug Administration
  – Department of Transportation
  – Department of Energy
• 1980s and 1990s
  – Rapid increase in software dependence within all the above
  – Control of safety critical computer systems moves from hardware-based logic to software-based logic
  – Complexity in all of these environments increases

Page 8: Concepts in Software Safety

A Few Notable Examples

• Patriot Missile system in Gulf War
  – Abnormal condition: Continuous operation, no “reboot”
• Chrysler Automotive Jeep
  – Sudden acceleration transitioning from Park to Drive
• Therac-25 Radiation Therapy
  – Dose calculation algorithm error
• Washington D.C. Metro
  – Central control system failure
• Radiology report failed to display
  – Missed diagnosis, delayed treatment for cancer

Page 9: Concepts in Software Safety

The Role of Risk Assessment

Risk = Frequency x Consequence

• Frequency
  – How often is this bad event likely to occur?
    • Probability of an event occurring during a given time frame
• Consequence
  – The business impact of that bad event
  – If possible, it should be measured in dollars
  – Not always possible
    • Could be measured in lives, customers lost, etc.
• Risk
  – Ideally, expressed as dollars lost per unit of time
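As a rough illustration of Risk = Frequency x Consequence, the sketch below ranks a couple of hazards by expected dollars lost per year. The `Hazard` class, the hazard names, and every number are made-up assumptions for illustration; they are not figures from this presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    """One undesirable event; all values here are invented for illustration."""
    name: str
    events_per_year: float      # frequency: expected occurrences per year
    dollars_per_event: float    # consequence: estimated cost of one occurrence


def annual_risk(hazard: Hazard) -> float:
    """Risk = Frequency x Consequence, in dollars lost per year."""
    return hazard.events_per_year * hazard.dollars_per_event


hazards = [
    Hazard("duplicate order entry", events_per_year=12, dollars_per_event=1_500),
    Hazard("report fails to display", events_per_year=0.5, dollars_per_event=200_000),
]

# Rank hazards so the largest expected annual loss comes first.
ranked = sorted(hazards, key=annual_risk, reverse=True)
for h in ranked:
    print(f"{h.name}: ${annual_risk(h):,.0f} per year")
```

A rare but costly event can outrank a frequent but cheap one, which is the reason for multiplying the two factors rather than looking at either alone.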

Page 10: Concepts in Software Safety

Risk Prevention: Software Safety Analysis

• Two basic types of safety analysis techniques
  – Event based: “If we made a mistake in this software, could it lead to a patient safety incident? If so, how so and how severe?”
  – Consequence based: “What would happen if we failed to retrieve all the radiology reports for this patient?”

Page 11: Concepts in Software Safety

Contributing Factors to Safety Risks

[Figure: contributing factors to safety risks. Source: FAA]

Page 12: Concepts in Software Safety

Risk and Software Safety

• Frequency
  – How often will we experience a patient-safety-related event that is attributable to a software error?
  – It is a subset of your Software Defect Rate, assuming you are tracking the number of “bugs” found in your software over a given period of time
• Consequence
  – The clinical impact of that software error
  – If possible, it should be measured in dollars
  – If not dollars, some other meaningful unit of consequence
    • Lawsuits, readmissions, length of stay (LOS), sentinel events, etc.
• Risk
  – Ideally, expressed as dollars lost per unit of time
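The frequency term above can be estimated from a defect tracking log, treating patient-safety-related defects as a subset of all defects found in a period. This is only a sketch: the `Defect` class and its `patient_safety_related` flag are hypothetical, and it assumes each defect is flagged during triage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    defect_id: str                 # hypothetical tracking-system identifier
    patient_safety_related: bool   # hypothetical flag set during triage


def defect_rate(defects: list, months_tracked: float) -> float:
    """All defects found per month (the overall Software Defect Rate)."""
    if months_tracked <= 0:
        raise ValueError("months_tracked must be positive")
    return len(defects) / months_tracked


def safety_event_rate(defects: list, months_tracked: float) -> float:
    """Patient-safety-related defects per month: a subset of the defect rate."""
    safety_only = [d for d in defects if d.patient_safety_related]
    return defect_rate(safety_only, months_tracked)


# Made-up log: six defects found over twelve months, three flagged as safety related.
log = [
    Defect("D-1", False), Defect("D-2", True), Defect("D-3", False),
    Defect("D-4", True), Defect("D-5", False), Defect("D-6", True),
]
print(defect_rate(log, 12), safety_event_rate(log, 12))
```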

Page 13: Concepts in Software Safety

Root Causes of Software Safety Violations

• Requirements specification and communication
  – Single largest source of errors
  – Software executes “correctly” according to the understanding of the requirement, but the requirement was wrong within the scope of the system
  – The requirement was simply misunderstood by the programmer
• Design and coding errors
  – Second most common source of errors
  – Poorly structured code
  – Timing errors, incorrect queries, syntax errors, algorithm errors, results display errors, lack of self-tests, failed error handling

Page 14: Concepts in Software Safety

Root Causes of Software Safety Violations (cont)

• Computer hardware induced errors
  – Not as common, but possible
  – Hardware logic errors caused by overheating, power transients, radiation, magnetic fields
• Software change control process
  – Changes to software introduce unanticipated errors
  – Can be traced back to requirements and programming errors
  – Failure of the configuration control process
• Inadequate testing
  – Software functions properly in unit testing
  – Software passes systems and integration testing, but should not because safety-critical test coverage is inadequate
  – Can be traced back to requirements and programming errors

Page 15: Concepts in Software Safety

Specific Techniques for Software Safety Analysis

• All with their roots in hardware-based systems
  – But they can be applied effectively to software
• Failure Modes and Effects Analysis
  – “If this software fails to return the correct lab value, what is the impact?”
• Fault Tree Analysis
  – “What are all the events that could cause this software to incorrectly display lab values?”
• Fault Hazard Analysis
  – Typically uses a Fault Tree Analysis
  – Also considers human factors and operational procedures
• Common Cause Analysis
  – Looks across fault trees for common roots
• Sneak Circuit Analysis
  – “Stray” code that inhibits desired functions or causes undesired functions to occur
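To make the Fault Tree Analysis question concrete, here is a minimal sketch of a fault tree for "lab value displayed incorrectly" built from AND and OR gates. The event names and probabilities are invented, and the arithmetic assumes the basic events are statistically independent, which a real analysis would have to justify.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class BasicEvent:
    name: str
    probability: float   # made-up probability of this event occurring


@dataclass(frozen=True)
class Gate:
    name: str
    kind: str            # "AND": all inputs must occur; "OR": any input suffices
    inputs: tuple        # child BasicEvent or Gate nodes


def probability(node) -> float:
    """Probability of a node, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    child_probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return prod(child_probs)
    if node.kind == "OR":
        # 1 minus the probability that none of the inputs occur
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


top = Gate("lab value displayed incorrectly", "OR", (
    BasicEvent("query returns wrong row", 0.01),
    Gate("bad conversion reaches the screen", "AND", (
        BasicEvent("unit conversion bug", 0.1),
        BasicEvent("range check missing", 0.5),
    )),
))

print(f"P({top.name}) = {probability(top):.4f}")
```

Common Cause Analysis would then look for basic events that show up in more than one such tree.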

Page 16: Concepts in Software Safety

General Software Safety Scenarios

• Software fails to perform a required function
  – Function not executed or answer never returned
• Software performs a function that is not required or intended
  – Wrong answer returned or wrong control instruction issued
• Software performs the right function, but at the wrong time or under inappropriate conditions
• Software timing or sequencing failure
  – Parallel executions fail
  – Synchronous or time-dependent executions fail
• Software fails to recognize a hazardous condition and react accordingly
  – Or, the software recognizes the condition but reacts improperly

Page 17: Concepts in Software Safety

Data Safety Areas

• Validity checks fail (or do not exist) before acting upon safety critical data
  – Illegal or out of range parameters

• Failure during initializing, clearing, or resetting critical data

• Validation failure of data addresses, pointers, indices, and variables

• Incorrect relationships established between files and records

• Detecting, handling, and/or correcting errors during data transfers

• Protecting data from being deleted or inadvertently overwritten
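The first item, validity checks before acting on safety critical data, might look like the sketch below. The observation codes and plausibility limits are hypothetical placeholders; real limits would have to come from clinical review.

```python
# Hypothetical plausibility limits, for illustration only.
PLAUSIBLE_RANGES = {
    "potassium_mmol_per_l": (1.0, 10.0),
    "heart_rate_bpm": (0.0, 350.0),
}


def validate_observation(code: str, value) -> float:
    """Return the value as a float, or raise ValueError if it must not be used."""
    if code not in PLAUSIBLE_RANGES:
        raise ValueError(f"unknown observation code: {code!r}")
    # Reject strings, None, and booleans rather than silently coercing them.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{code}: value {value!r} is not a number")
    low, high = PLAUSIBLE_RANGES[code]
    # NaN fails every comparison, so it is rejected here along with
    # infinities and out-of-range numbers.
    if not (low <= value <= high):
        raise ValueError(f"{code}: value {value!r} is outside {low}..{high}")
    return float(value)


print(validate_observation("potassium_mmol_per_l", 4.2))
```

Raising an error instead of clamping or defaulting keeps an illegal value from being acted upon quietly; the caller has to decide what to do with it.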

Page 18: Concepts in Software Safety

Creating a “Safe Software” Environment

What would an auditor from the FAA look for at Boeing?

Page 19: Concepts in Software Safety


Auditing for Software Safety

1. Is at least a high-level risk assessment conducted for software safety during the requirements and design phase?

2. Is the software testing and quality assurance process risk adjusted?

3. Are the test and development environments adequate for identifying safety risks before they appear in production?

4. Is there an emphasis on human computer interfaces (HCI) and their relationship to safety risks?

5. Is there a well-documented Safety Event Response process when software safety defects are discovered?

6. Is there a robust root cause discovery and communication process after a safety event has occurred?

7. Is there a software safety defect reporting and tracking system?

8. Are there similar principles but different safety risk analysis processes for software developed internally vs. purchased?

Page 20: Concepts in Software Safety


More Audit Areas

1. Is there an understanding and appreciation among the software development staff for safety risks?

2. Is there a clear nomenclature for characterizing software safety risk scenarios?

3. Is there a nomenclature for categorizing software defects based on safety risk?

4. Is there a software safety governance and oversight body?

5. Is there a well-documented software engineering process for safety critical applications?

6. Is independent validation and verification of software part of the development methodology?

7. Is safety critical software more tightly controlled for versioning and configuration?

8. Is there a certification program for software engineers who are allowed to develop and work on safety critical software?

Page 21: Concepts in Software Safety

Applying This To Clinical Information Systems and EMRs

Page 22: Concepts in Software Safety


Relating Patient Safety To EMR Software Safety

• Step 1: Define the general categories of patient safety risk scenarios, regardless of cause

• Step 2: Define the relationship between these general risk scenarios and the ability for the EMR or clinical information system to contribute to these scenarios

• Step 3: Use a software testing and safety tracking system to measure against these risk scenarios
  – “This function or module of software could contribute to a Moderate patient safety risk scenario. We should design and test accordingly.”
  – “This is a Severe software defect. It must be repaired immediately.”

Page 23: Concepts in Software Safety

Potential Categories of Patient Safety Risk Scenarios

Type 1: Catastrophic

Patient life is in grave danger. The probability that humans will recognize the event and intervene to mitigate it is very low or non-existent. Intervention is required within seconds to prevent the loss of life.

Type 2: Severe

Patient health is in immediate danger. The probability that humans will recognize the event and intervene to mitigate it is low, but intervention is possible. Intervention is required within minutes to prevent serious injury or degradation of patient health that could lead to the loss of life.

Type 3: Moderate

Patient health is at risk. However, humans will probably recognize the event and intervene to mitigate it. Intervention is required within hours or a few days to prevent a moderate degradation in patient health.

Type 4: Minor

Patient health is minimally at risk. The probability that humans will recognize the event and intervene to mitigate it is high. Corrective action should occur within days or weeks to avoid any degradation in patient health.
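One way a defect tracking system could carry these four categories is sketched here. The `SafetyDefect` class and the defect identifiers are hypothetical; only the category names and intervention windows come from the definitions above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means more severe, matching Type 1 through Type 4."""
    CATASTROPHIC = 1
    SEVERE = 2
    MODERATE = 3
    MINOR = 4


# Time available for intervention, taken from the category descriptions.
INTERVENTION_WINDOW = {
    Severity.CATASTROPHIC: "seconds",
    Severity.SEVERE: "minutes",
    Severity.MODERATE: "hours or a few days",
    Severity.MINOR: "days or weeks",
}


@dataclass(frozen=True)
class SafetyDefect:
    defect_id: str       # hypothetical tracking-system identifier
    severity: Severity


def triage(defects):
    """Order defects so the most severe are handled first."""
    return sorted(defects, key=lambda d: d.severity)


backlog = [
    SafetyDefect("D-103", Severity.MINOR),
    SafetyDefect("D-101", Severity.SEVERE),
    SafetyDefect("D-102", Severity.MODERATE),
]
for defect in triage(backlog):
    window = INTERVENTION_WINDOW[defect.severity]
    print(defect.defect_id, defect.severity.name, "intervention window:", window)
```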

Page 24: Concepts in Software Safety

Specific EMR Safety Risk Scenarios

1. Errors in computerized protocols and decision support tools
2. Invalid data posted to a patient record
3. Valid data that is accidentally deleted
4. Valid data that is not posted or not available
  – Incomplete record
5. Clinical data posted to the wrong patient record
  – Right data, wrong patient
6. Data that appears current and timely, but is not

Their severity depends on the nature of the specific data or decision making context
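Scenarios 5 and 6 lend themselves to simple guards at the point where data is posted. The sketch below is an assumption-heavy illustration: `LabResult`, `check_before_posting`, and the `max_age` staleness threshold are hypothetical names and parameters, not part of any real EMR interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class LabResult:
    patient_id: str
    value: float
    collected_at: datetime   # timezone-aware collection time


def check_before_posting(result: LabResult, chart_patient_id: str,
                         now: datetime, max_age: timedelta) -> list:
    """Return a list of problems; an empty list means the result may be posted."""
    problems = []
    # Scenario 5: right data, wrong patient.
    if result.patient_id != chart_patient_id:
        problems.append("wrong patient")
    # Scenario 6: data that appears current and timely, but is not.
    if result.collected_at > now:
        problems.append("collected in the future")
    elif now - result.collected_at > max_age:
        problems.append("stale")
    return problems


now = datetime(2004, 1, 2, 12, 0, tzinfo=timezone.utc)
result = LabResult("patient-A", 4.2, collected_at=now - timedelta(days=3))
print(check_before_posting(result, "patient-B", now, max_age=timedelta(hours=24)))
```

Returning every problem found, rather than stopping at the first, gives the safety tracking system a complete picture of what went wrong.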

Page 25: Concepts in Software Safety

When Developing and Testing…

Software staff must ask: Does this software control or affect…

• Computerized protocols and decision support tools?
• Data that is posted to the EMR?
  – Valid data that is not posted
    • Incomplete record
  – Clinical data posted to the wrong patient record
    • Right data, wrong patient
  – Timeliness of EMR data
• The deletion of EMR data?
• The performance or availability of the overall EMR?

If so, the rigor of the software engineering process must increase accordingly.

Page 26: Concepts in Software Safety

Software Control vs. Safety Risk

Software control area                              | Catastrophic | Severe | Moderate | Minor
Computerized protocols and decision support tools  |              |        |          |
Creating or updating data to the EMR               |              |        |          |
Deleting data from the EMR                         |              |        |          |
Performance or availability of the overall EMR     |              |        |          |

Does my software control any of these?

If so, what is the probability that a defect could cause one of these scenarios?

High Risk = Rigorous Design and Testing
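Filling in the control-versus-risk matrix for one module could be sketched as follows. The assessment values are invented, and treating Catastrophic or Severe as the trigger for the more rigorous process is an assumed threshold, not one stated in the slides.

```python
# Categories ordered from most to least severe.
SEVERITIES = ("Catastrophic", "Severe", "Moderate", "Minor")

# Hypothetical assessment of one module: for each control area, the worst
# scenario a defect could plausibly cause, or None if the module does not
# control that area.
assessment = {
    "Computerized protocols and decision support tools": "Severe",
    "Creating or updating data to the EMR": "Moderate",
    "Deleting data from the EMR": None,
    "Performance or availability of the overall EMR": "Minor",
}


def worst_case(module_assessment: dict):
    """Most severe category across all control areas, or None if none apply."""
    rated = [s for s in module_assessment.values() if s is not None]
    if not rated:
        return None
    return min(rated, key=SEVERITIES.index)


def needs_rigorous_process(module_assessment: dict) -> bool:
    """Assumed rule: Catastrophic or Severe exposure calls for the rigorous process."""
    return worst_case(module_assessment) in ("Catastrophic", "Severe")


print(worst_case(assessment), needs_rigorous_process(assessment))
```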

Page 27: Concepts in Software Safety

Most to Least Safety Critical?

For purposes of illustration…

[Figure: systems placed along an axis of increasing safety criticality and software engineering rigor. Systems shown: GroupWise, Transfusion Management, HELP, HELP2, Mysis]

Not necessarily the same as “Business Criticality”…

Page 28: Concepts in Software Safety

For Illustration, Again…

Information System | Business Criticality | Data Sensitivity | Safety Criticality
Accudose           | 1                    | 1                | 1
AGFA               | 1                    | 1                | 1
Amicus             | 1                    | 1                | 1
AS/400 Financial   | 1                    | 1                | 4
Audit Log          | 2                    | 4                | 4

Software safety processes don’t apply

Page 29: Concepts in Software Safety

In Conclusion

• There is a growing need for software safety awareness in clinical information systems and EMRs
• There are significant lessons learned from other industries
  – We don’t have to reinvent the wheel
• To get started…
  – Think like an FAA Software Safety Auditor
  – Think like a patient
  – Think like a physician

Page 30: Concepts in Software Safety

Acknowledgements

• Commercial aviation
  – RTCA/DO-178B, Software Considerations in Airborne Systems and Equipment Certification
• European Committee for Electrotechnical Standardization
  – EN 50128, Software for Railway Control and Protection Systems
• Society of Automotive Engineers
  – JA 1002 Software Safety and Reliability Program Standard
• U.S. FDA Center for Devices and Radiological Health
  – Premarket Notification Submissions (510(k))
    • “General Principles of Software Validation”
• U.S. FAA System Safety Handbook
  – Appendix J: Software Safety
• “Software Safety and Reliability”
  – Debra S. Herrmann
• “Safeware: System Safety and Computers”
  – Nancy G. Leveson

Page 31: Concepts in Software Safety

Thank You

• Please contact me if you have any questions
  – Dale Sanders
  – Intermountain Health Care www.ihc.com
  – 801-408-2121
  – [email protected]