Usability of Paper-Based Industrial Operating Procedures
by
Mario Iannuzzi
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Mechanical & Industrial Engineering
University of Toronto
© Copyright by Mario Iannuzzi 2014
Abstract

Procedures are standardized lists of instructions that designate the safe and accepted way of
accomplishing a task. This study intended to develop and compare the usability of paper-based
industrial operating procedures. Two procedures at a plant were redesigned with evidence-based
guidelines and human factors input. 16 operators of varying experience were asked to read through
and assess the new and old procedures. The new procedures were rated significantly or moderately
better than their predecessors for efficiency, effectiveness, and subjective satisfaction. On average,
inexperienced operators reported fewer inaccuracies, more confusion, and higher workload ratings than
their experienced counterparts, regardless of procedure type or area. For satisfaction, experienced and
inexperienced operators reported similar ratings across both procedure types and areas. Future studies
should attempt to discern which particular change in the procedures contributed the most to increased
usability, and whether operator experience significantly correlates with usability ratings.
Acknowledgements

My first and deepest thanks go to my supervisor, Dr. Greg Jamieson. Throughout the period of this
thesis, he worked tirelessly to instill in me a deep sense of professionalism, and continuously guided
me through critiques of my work. His incredible patience throughout the process, despite his other
responsibilities, astounds me, and I’m truly glad I had the experience of being supervised by him.
My gratitude also goes out to the operators, managers, and engineers who I worked with at the plant in
order to get this work completed. Their participation and input was invaluable, and this work would truly
have been impossible without them.
I would also like to thank MITACS for their funding, as this thesis grew out of an internship program that
they sponsored.
I would like to express my sincere appreciation to my committee members, Dr. Alison Smiley and Dr.
Olivier St. Cyr, for their willingness to be on my committee and for their time; this thesis was greatly
improved by their input.
My sincere thanks also go to all of my lab mates in the Cognitive Engineering Lab and the other human
factors labs; I’d particularly like to thank Patrick Stahl and Lisa Min for constantly being there to trade
ideas, troubleshoot problems, and provide support.
Finally, I would like to express my appreciation for all of the constant morale upkeep, motivation and
encouragement given to me by my mom, Cristina Iannuzzi; my sister, Nadia; my brother-in-law, Tony
Criminisi; my brother, Enzo Iannuzzi; and my sister-in-law, Kim Iannuzzi. Most of all, I am forever
indebted to my girlfriend, Terri Mattucci, who has supported me every step of the way, and never let me
give up.
Table of Contents

Abstract
Acknowledgements
1 Introduction
2 Literature Review
  2.1 Introduction to procedures
  2.2 Procedure design: Challenges and guidelines
  2.3 Human factors and procedures
  2.4 Procedure culture, following and adherence
  2.5 Paper- and computer-based procedures
  2.6 Checklists
  2.7 Usability studies of other paper-based documents
  2.8 Summary
3 Procedure Redesign
  3.1 Motivation for Procedure Redesign
  3.2 Redesign Process and Results
4 Method
  4.1 Test design
  4.2 Independent Variables
  4.3 Dependent Variables
  4.4 Hypotheses
  4.5 Testing protocol
5 Results
  5.1 Demographics
  5.2 Effectiveness
  5.3 Efficiency
  5.4 Satisfaction
  5.5 Statistical Test Summary
  5.6 Stated Preferences
6 Discussion
  6.1 Effect of New Procedures
  6.2 Effect of Operator Experience
  6.3 Limitations
7 Conclusion and Future Work
References
List of Tables

Table 3-1 - Comparison of old procedure and new procedure features
Table 4-1 - Participating operator breakdown
Table 5-1 - Demographics Summary
Table 5-2 - Cronbach's Alpha for New Procedures
Table 5-3 - Cronbach's Alpha for Old Procedures
Table 5-4 - Summary of p-values for all usability measures
List of Figures

Figure 3-1 - Page from old cylinder filling procedure
Figure 3-2 - Part of the WDA for the cell room
Figure 3-3 - Task analysis of revised cell-dipping procedure
Figure 3-4 - Page from the new cylinder filling procedure
Figure 3-5 - Old procedure - equipment name only
Figure 3-6 - New procedure - Equipment name and ID
Figure 3-7 - Old procedure - Step-detail format
Figure 3-8 - New procedure - Task hierarchy
Figure 3-9 - Old procedure - Paragraphs
Figure 3-10 - New procedure - Individual steps with warnings and cautions
Figure 3-11 - Old procedure - Inconsistent step structure
Figure 3-12 - New procedure - Action-object step structure
Figure 3-13 - Old procedure - Rarely contained explanations
Figure 3-14 - Explanation embedded in warning and caution
Figure 5-1 - Mean Counts of Confusion
Figure 5-2 - Mean Counts of Confusion by Experience
Figure 5-3 - Mean Counts of Inaccuracies
Figure 5-5 - Mean Subjective Workload
Figure 5-6 - Mean Subjective Workload by Experience
Figure 5-7 - Mean Perception of Organization
Figure 5-8 - Mean Perception of Ease of Reading
Figure 5-9 - Mean Perception of Accuracy of Procedure
Figure 5-10 - Mean Perception of Ability to Complete Procedure Safely
Figure 5-11 - Mean Perception of Appropriate Amount of Detail
Figure 5-12 - Mean Subjective Satisfaction
Figure 5-13 - Mean Total Score from Operator Satisfaction Questionnaire
Figure 5-14 - Mean Score from Operator Satisfaction Questionnaire by Experience
1 Introduction

After the accident at Three Mile Island, procedural improvement became a higher priority in the nuclear
industry; the accident sparked a realization amongst industry experts that these action lists that prescribe the
standardized methods of achieving objectives are more complicated than they seem, and require
design consideration and analysis from many perspectives (Carvalho, Dos Santos, & Vidal, 2006; Dien,
Llory, & Montmayeul, 1992; Orendi, Petras, Lipner, Oft, & Fanto, 1988; Theureau, Jeffroy, & Vermersch,
2000; D. R. Wieringa & Farkas, 1991). The issues surrounding procedures are particularly complex in
this industry and other safety-critical domains; procedures can have disastrous consequences if
executed incorrectly, involve physical actions and phenomena that are difficult to describe, and must be written
with an appropriate amount of detail. These concerns are not minor ones, as it has been reported that
procedural faults are a factor in 69% of nuclear plant incidents (Goodman & DiPalo, 1991).
Several guidelines for the design of procedures in the nuclear industry have been developed in an
attempt to alleviate some of these concerns. They include items such as having multidisciplinary writing
teams, being willing to rely on operators’ knowledge, and using writing guides to keep procedures, their
format, and their language consistent (Bullemer & Hajdukiewicz, 2004; D. R. Wieringa & Farkas, 1991).
Additionally, human factors-specific guidelines have been suggested to improve the usability of written
procedures, such as treating each page as a display, keeping information in small blocks, and using the
learned expectations of the operators as the template for the procedure (Luna, Sturdivant, & McKay,
1988).
ISO-9241 defines usability as the “extent to which a product can be used by specified users to achieve
specified goals with effectiveness, efficiency, and satisfaction” (ISO, 1998). If operators perceive that
their procedures are inefficient, ineffective, or unsatisfactory, they are less likely to use them.
Presumably, deliberately not following an incorrect procedure would be judged a logical action; however,
the culture that exists in some organizations in the industry causes operators to be concerned
about the consequences that they may face for such a purposeful violation.
Experts in the procedure-writing field must be convinced of the beneficial impact of usability on
their products. For this to happen, these experts must have solid evidence that human factors
interventions are a necessary part of the procedure development process, and that they should be
integrated into it. This is particularly true for procedures that are used under normal conditions; as they
are used frequently, errors in their implementation or use might cause abnormal operating conditions
(Carnio, 1980).
Although some nuclear plants may be migrating their procedures to computer-based procedures, most
of them still use the traditional paper ones (Oxstrand, Le Blanc, & Hays, 2012). Thus, it seems peculiar
that despite the existence of human factors guidelines for procedure design, and despite the fact that usability
comparisons have been performed between computer-based procedures and paper-based procedures,
no usability comparison of the design features of paper operating procedures for normal conditions has
been completed. Consequently, this thesis aims to be an exploratory study in this area.
This thesis directly compares the usability of two existing paper industrial operating procedures with
versions that were redesigned with the input of evidence-based document design principles and human
factors processes. The findings illustrate that the new procedures received equal or higher ratings for
effectiveness and efficiency and elicit higher satisfaction from the operators.
2 Literature Review

The majority of the existing procedure literature revolves around the topics of procedure design
challenges, computer-based procedures, and human factors design guidelines. While the focus of this
thesis is procedure usability, a cursory exploration of these other topics is necessary.
Although there are a number of articles that focus on the topics of procedural adherence, operator
autonomy, and checklist design, most of this subject matter lies outside the scope of this thesis.
2.1 Introduction to procedures

Two definitions of procedure from the literature are as follows:
- “[Procedures are] Prescribed action lists to help operators remember and follow mandatory
steps that guarantee safety, workload and performance criteria.” (Boy & De Brito, 2000)
- “Procedures indicate to the human operator the manner in which operational management
intends to have various tasks performed. The intent is to provide guidance…to ensure a logical,
efficient, safe, and predictable (standardized) means of carrying out…objectives.” (Degani &
Wiener, 1994)
The definition of a safe behaviour may help to clarify and reinforce the above definitions:
- “[A safe behaviour is] … one consisting in avoiding any action that might be detrimental to the
plant safety” (Dien et al., 1992, p. 178)
Most sources in the literature point to the Three Mile Island incident as being the inciting event that
brought procedural improvement to the forefront of the nuclear industry, particularly in the emergency
operating procedure (EOP) area (Carvalho et al., 2006; Dien et al., 1992; Orendi et al., 1988; Theureau
et al., 2000; D. R. Wieringa & Farkas, 1991).
2.2 Procedure design: Challenges and guidelines

There are several instances in the literature of studies that quantify the extent and impact of procedural
problems. For example, a field study was conducted at five refining and chemical sites to understand
factors that impact the success of procedural operations and to develop recommendations to increase
that success (Bullemer & Hajdukiewicz, 2004). The study noted that up to 30% of all reports of failed
operations had procedural operations listed as a cause, and suggested that this same cause was
behind up to 8% of reported financial losses.
Ockerman and Pritchett (2000) cited two studies about procedural problems. The first, a study by the
Institute of Nuclear Power Operations (INPO) (1986), reported that of the 48% of incidents initially
assigned to “failures of the human factor,” almost 65% involved a deficiency in a procedure. The
second study looked at several hundred incidents and concluded that procedural faults were involved in
69% (Goodman & DiPalo, 1991). Though the exact causes were not listed, some of the general
possibilities included: the operator having a different intention than the procedure they were using; the
range of the procedure differing from the environment or the capabilities of the operator; and the
set of actions included in the procedure being inaccurate.
What design challenges lead to these deficiencies in procedures? In order to communicate, share, and
remember procedures, they must be written down (Ockerman & Pritchett, 2000). However, the people
who are tasked with writing these procedures in nuclear power plants (NPPs) may face a litany of difficulties (D. R.
Wieringa & Farkas, 1991). The system that they are attempting to document is very complex, consisting
of many parts and intricate subsystems. They must grapple with unique interface challenges such as
describing, in writing, physical actions or nuances particular to their specific plant. For example, a
procedure writer in a chemical plant may have to describe at what speed an operator should close a
valve. The procedures that they are writing will sometimes be used in adverse conditions, or perhaps
by a team that is coordinating a complex task. Additionally, a procedure writer must attempt to design
procedures whose comprehensiveness, accuracy, and detail match the needs of the worker that will be
using them (Ockerman & Pritchett, 2000). In particular, a balance between having too much detail and
being too general is very difficult to maintain (Dien, 1998; Boy & De Brito, 2000). This difficulty can be
exacerbated by variability in the competency levels of operators, and the complexity of the system.
To combat these multiple difficulties, several approaches to procedure development have been
discussed (D. R. Wieringa & Farkas, 1991). These include writing procedures in multidisciplinary
teams; keeping a rigorously consistent format; fostering an understanding that the designer will have to
rely on the operator’s prior knowledge; supporting group tasks with customized procedures that provide
different versions for different operators working on the same task; and validating through
user testing with simulators and cognitive walkthroughs.
To foster and maintain the consistent format mentioned above, a writer’s guide should be compiled to
lay out guidelines and maintain procedure quality (Bullemer & Hajdukiewicz, 2004). It should be noted
that one must take caution when putting these writer’s guides together. They are, in essence, a
procedure on how to write procedures. As such, they should still be written with the operator in mind,
and not just the procedure writer (Caccamise & Mecherikoff, 1993); to wit, the end language should
contain terms that the operators are familiar with and have been trained on (Carvalho et al., 2006).
Part of maintaining a consistent format is selecting a layout for the procedures. Wieringa and Farkas
(1991) illustrated and discussed some common examples of procedural layouts, including a two-column
text format that distinguishes between general and highly detailed instruction, and a graphical flowchart
format. While the two-column text format reduces reliance on operator knowledge, operators may become
“lost” in the procedure while attempting to troubleshoot or diagnose issues. The graphical flowchart
format prevents operators from becoming lost because of its helpful flow-lines, but takes up more space,
is difficult to follow over multiple pages, has less space for text, and cannot easily show the hierarchical
information that is usually necessary in procedures.
2.3 Human factors and procedures

While procedures are necessarily designed with consideration of the constraints of the system itself, they
sometimes overlook the cognitive characteristics of the operators (Dien, 1998). There are some articles
that explicitly deal with human factors guidelines of procedures in their different forms. Overall, the
primary goals of applying human factors to procedures are as follows:
o to ensure that operators can carry out the procedures without overloading themselves;
and,
o to ensure that procedures are structured in a way that is “easy to understand and
follow” (Niwa, Hollnagel, & Green, 1996).
To improve the usability of written procedures, human factors principles should be taken into
consideration (Luna et al., 1988). Luna et al. list these principles:
• each page should be treated as a display;
• information should be kept in small blocks;
• information should be presented consistently;
• the expectations that the operator has learned while working in the physical plant should serve
as the template for the procedure; and,
• the procedure should match the physical features of the plant.
Human factors considerations have also been proposed for writing emergency response
guidelines that might allow operators to diagnose and recover from beyond design-basis or
low-probability events (Orendi et al., 1988):
• Steps should be concise;
• any extraneous information should be minimized;
• font emphasis should be used consistently;
• steps should be simplified and standardized; and,
• the vocabulary should be kept specific and consistent.
Sharit discussed a contextual modeling approach to human and system reliability analysis (HRA), and
its usefulness for procedural design applications (Sharit, 1998). He noted that in HRAs, the traditional
way of dealing with procedures has been to treat them as performance-shaping factors. He then argued
that HRA should instead be applied to the details of written procedures in order to determine the
contexts of work that might be linked to human error. With this in mind, he suggested several general
guidelines for the writing of procedures from an HRA perspective. At their highest level, these
guidelines consisted of the following:
• explore possibilities for errors in the form of slips and lapses due to factors occurring at the
skill-based level of performance;
• explore possibilities for mistakes due to factors occurring at the rule-based level of performance;
• explore errors that could arise during contingency operations related to performing work
procedures;
• explore errors that may arise from shift changeover; and,
• explore errors that may arise from activities involving communication.
These guidelines were subject to several caveats: the guidelines will be more useful to procedure
designers who are knowledgeable in HRA; the way the guidelines are used will be dependent on
whether they are being applied to procedures that are in use or procedures that are being planned; and
the guidelines should and could be expanded and cross-linked with industry-specific knowledge.
Burks and Peres attempted to design a procedure classification rubric by determining elements that
contribute to cognitive complexity in procedures (Burks & Peres, 2011). The rubric was based on
Campbell’s (1988) objective complexity framework, and on the assumption that the attributes of a
procedure are what determine its complexity (as opposed to how many steps it contains). Elements that
were thought to contribute to complexity included decision points and concurrent procedural operations.
Though the rubric depicted what contributed to cognitive complexity, it could not be practically used;
every procedure that was reviewed had three or more decision points, and thus all would be labeled at
the highest level of the rubric.
2.4 Procedure culture, following and adherence

In addition to the above-mentioned difficulties, there are several more nuanced factors that affect the
use of procedures that designers must take into account. After a study completed at four petrochemical
refineries in North America, several issues with procedural culture came to light (Jamieson & Miller,
2000). Procedures were rarely seen in use for several reasons: operators only checked procedures
during complex or infrequent tasks; they thought they knew them well enough to ignore them; and
sometimes they were too difficult to understand or retrieve. Also, there was a perception among the
operators that the procedures were likely out of date.
Even if operators think the procedures are too difficult to understand or that they are out of date, they
must think about the consequences that may be handed down from management if they attempt to
deviate from a procedure, particularly if their efforts fail. Some operators have said that due to potential
consequences, they would rather follow a procedure that they know is incorrect or inefficient rather than
disregard it (Dien et al., 1992). Clearly, there is a difficult balance that operators must maintain: they
must follow procedures in order to maintain safety, reliability, and consistency (and to avoid negative
judgment and consequences from management); at the same time, they must stay critical of the
procedure for the same reasons (Theureau et al., 2000).
The concept of “controlled initiative” dictates that non-adherence to a procedure should not inherently
be judged as incorrect or punishable; rather, its motive and results should be examined and weighed
(Dien, 1998). This would hopefully lead to an atmosphere that would allow operators to overcome the
fear expressed above. However, this approach is not without its dangers; the more that operators
deviate from procedures, the higher the probability that they will commit an error.
Procedure writers and designers must maintain consistency in their work; if errors and inconsistencies
are present in procedures, the procedures may not make sense to operators, and this could lead to
both a lack of confidence in the designer and lower procedure adherence overall (Sharit, 1998).
However, if procedures seem logical to the operators, they are more likely to adhere to them. Thus,
procedures should be written in such a way as to help their users understand the rationale behind the
steps and criteria (Boy & De Brito, 2000; Carvalho et al., 2006; Kontogiannis, 1999).
2.5 Paper- and computer-based procedures

Some nuclear and petrochemical plants are moving away from paper-based procedures (PBPs) and
switching to computer-based procedures (CBPs), as CBPs can be seen as a solution to some of the
drawbacks of PBPs (Bullemer & Hajdukiewicz, 2004). For example, PBPs can become very large as
the complexity of the procedure grows; thus, they become very difficult to move due to their volume,
especially on control desks (Kontogiannis, 1999; Niwa et al., 1996; Ockerman & Pritchett, 2000). During
a special inspection program for emergency operating procedures (sponsored by the US Nuclear
Regulatory Commission), it was found that PBPs were inadequate at instruction presentation and
cross-referencing to other procedures (Lapinsky, 1989). They are also highly inflexible, insofar as they
are very difficult to adapt to dynamic situations (Niwa et al., 1996). For example, with
CBPs, operators can “collapse” or “hide” sections of a page that are not relevant to their current situation.
Moreover, in PBPs, information is fixed in a sequential form, and the cautions/warnings they present
may not always be accurate for the current plant state (Boy & De Brito, 2000; O’Hara, Higgins, &
Stubler, 2000). Navigation through PBPs may not be simple, especially if sequential steps are not in the
same list, or if they must be cross-referenced (Ockerman & Pritchett, 2000). Finally, PBPs almost never
anticipate interruptions in their design. A study showed that when pilots are interrupted, it usually leads
to omissions in the procedure that they were carrying out (Boy & De Brito, 2000).
Niwa et al. (1996) discussed the development of human factors guidelines for computerized emergency
operating procedures. Recommendations for CBP implementation included formats with fixed,
well-defined fields; the avoidance of overlapping windows; the representation of individual sub-steps; and the
minimization of the navigation required of the operator. Additionally, notes and cautions in CBPs
should make use of borders and colours to remain distinct from steps, and should also be kept near
them; this helps to make them salient and to keep the operator from overlooking them (Kontogiannis,
1999).
There have been several studies that attempted to compare CBPs and PBPs. A study of the
computerization of NPP procedures showed a balance of advantages between CBPs and PBPs
(O’Hara et al., 2000). Although CBPs exhibited easier data retrieval and faster completion times than
their paper-based counterparts, they also had higher complexity and attentional demands. One
important conclusion that was drawn from this study was that even if CBPs are implemented to
enhance the operator experience, PBP backup systems must be in place in case of any technological
failure.
Kontogiannis (1999) reviewed several studies that evaluated the usability of computerized emergency
operating procedures. They were all pilot studies that only tested a few crews in a small number of
emergency scenarios. Though the small sizes of these studies prohibited statistical analysis, it was
observed that crews that used CBPs made fewer errors, but, in contrast to the study discussed above,
took longer to bring the plant back to normal operational status than crews using the PBPs. Though it is
difficult to say for certain whether these values can be directly compared, the percentage loss in
completion time in these studies was much smaller than the percentage gain in accuracy from using the
CBPs. Based on operator comments, it would seem that the main issues with the CBPs that led to the
slower completion times were the design of the interface, the navigation, the formatting of the on-
screen procedures, and the small amount of information on each screen relative to a full page in a
traditional PBP. On the other hand, another study showed that computer-based cockpit procedures
were rated higher and were performed more quickly and more accurately than their paper-based
counterparts (Shamo, Dror, & Degani, 1998).
What if the appropriate guidelines are applied to both media? A battery of studies indicated that if
guidelines on information readability, content, and organization are followed and applied to both the
PBP and CBP version of procedures, the CBP versions were rated only slightly higher (Patel, Drury, &
Lofgren, 1994).
Theureau et al. (2000) performed an empirical study of operator activity in different accident scenarios
on a simulator. It was discovered that the step-by-step nature of the CBP used in
that case reduced the operator’s awareness of the overall evolution of the procedure. This, in turn,
made it difficult for the operator to carry out spontaneous adjustments. Additionally, it was difficult for
the operators to estimate the overall effect of the spontaneous adjustments because of this reduced
awareness. The operators found it easier to read ahead with the PBPs to gain an overall view of the
procedure. An older study discovered a similar issue (Elm & Woods, 1985); it also noted that when the
CBP was redesigned with this in mind using an “electronic book” metaphor, the effect of the step-by-
step nature on the operator’s awareness of the overall view of the procedure was successfully mitigated.
A study performed in a control room simulator explored the effectiveness and usability of CBPs
(Converse, 1994). It involved eight pairs of operators controlling a scaled pressurized water reactor
facility. Each team performed both normal and accident scenarios with both the CBPs and traditional
PBPs with the order of CBP and PBP and scenario types counterbalanced. The number of errors
committed, times to initiate and complete procedures, and subjective workload estimates (using the
NASA-TLX) were recorded. Under normal operating conditions, the procedure type did not significantly
affect the performance measures (errors or times). However, the individual NASA-TLX dimensions
showed that operators were significantly more confident of their appropriate accomplishment of the task
goals with the CBPs than the PBPs. They attributed this to the CBPs structuring their responses more
rigidly, making it less likely for them to skip a step. In the accident scenarios, there was a significant
effect of procedure type on response initiation time, but no effect was found on completion time; the
PBPs had a faster initiation time than the CBPs, but this made no difference in how long the procedures
actually took to complete. Additionally, the use of PBPs resulted in four times as many errors as the
use of CBPs. This led to a significant interaction between procedure type and task type; there were
many more errors for PBPs as compared to CBPs in accident scenarios, but not in normal scenarios.
Thus, future evaluations of CBPs should be completed with a variety of scenario types.
Now that I have reviewed multiple studies about the usability of CBPs, and about usability comparisons
between PBPs and CBPs, the remaining gap becomes clear: there are no usability comparisons of
two PBPs.
2.6 Checklists

The purpose and use of checklists are similar enough to those of procedures that some of the literature
on checklists and their usability pertains to procedures as well. Like procedures, they are used to
help operators remember and follow steps in processes. However, while procedures are detailed
instructions to follow to achieve a goal, checklists are shorter lists intended to
prevent the forgetting of an intended action, and usually require a confirmation of each action taken
(Lehto & Salvendy, 1995).
Degani & Wiener began their study of checklists by concentrating on the human factors of a paper
checklist as a display (Degani & Wiener, 1990). They used field studies, interviews with pilots, accident
reports, interviews with government agencies, and information from aircraft manufacturers and the
general literature in order to understand the way that flight crews use checklists. A key finding was that
pilots did not strictly use the checklists as their designers intended. Pilots “short-cut”
the checklists by calling several challenges out in one chunk and waiting for the other pilot to reply with
their “chunk” of responses. The intended use of the checklist was to have step-by-step challenges and
responses. In addition to this, they also discussed issues relevant to checklist design such as task
analysis, operational logic, sequencing, and duplication.
To compare which components in a checklist most effectively contributed to detection of medicine
errors at a patient’s bedside, two different checklists (an old and a new) were compared in a high-
fidelity usability lab (White et al., 2010). Human factors engineers designed new checklists after
observing nurses use the old checklist; the new checklists were meant to mitigate issues that the
engineers noted while observing the nurses. Ten nurses were each asked to check fourteen infusion
pumps in a within-subjects design that counterbalanced errors between participants and the order of
checklist used. The use of the new checklists resulted in the discovery of significantly more errors than
the old checklists. Specific instructions were found to be more effective in aiding the detection of errors
than general reminders. While this was a comparison of two paper documents, they were checklists,
and not procedures.
Much like PBPs, paper checklists have known shortcomings. A study involving four four-person flight
crews that compared electronic and paper checklists outlined these imperfections (Palmer & Degani,
1991). The errors included operators losing track of which step they were currently performing, skipping
steps due to interruptions or distractions, forgetting to return to a skipped item, and incorrectly marking
items as completed.
The performance of different types of electronic checklists has been compared against each other and
against performance with a paper checklist (Mosier, Palmer, & Degani, 1992). Twelve two-person crews
flew a full mission simulation while being randomly assigned to three different groups: automatically-
sensed checklists, manually-sensed checklists, or paper checklists. Performance from memory was
also compared with performance from the checklists. The major dependent variable that was recorded
and analyzed was the handling of engine problems that were introduced in an emergency condition.
The small sample size (due to the between-subjects design) made statistical analysis inappropriate, but
trends in the data suggested that crews that erroneously shut down an engine tended to be those that
were initiating engine shutdown from memory, and that were using electronic checklists (5 of 8); these
pilots also recorded the highest workload on their subjective workload assessments. The crews that
took the correct course of action were the ones with the “less automated” checklists; that is, they did not
initiate any of the checklist items from memory, and they were using paper checklists (3 of 4).
Presumably, the crews that had to take the time to retrieve paper checklists had more time to discuss
and process the cues that were presented and were less likely to respond incorrectly.
2.7 Usability studies of other paper-based documents

If usability studies and evaluations of paper-based documents other than procedures have been
completed, why haven’t any been completed for PBPs? Though the scope of this project is limited to
paper-based procedure usability, usability studies of other paper-based documents across a variety of
domains are briefly discussed below.
A one-group pretest-posttest design was used to compare personal digital assistant (PDA) and
paper-and-pencil tests, measuring quiz scores, mental workload, completion time, and user
satisfaction (Segall, Doolen, & Porter, 2005). The study found that students completed the tests more
quickly with the PDA, but no differences were found in quiz scores, subjective workloads, or satisfaction.
While this study was not concerned with procedures, it used the same categories for usability that were
used in the current study: effectiveness, efficiency, and satisfaction.
In the healthcare domain, a usability study was conducted to compare the effect of evidence-based
document design principles on patient information leaflets (Pander Maat & Lentz, 2010). A total of 154
users tested three leaflets before their revision, and 164 tested them after. Dependent measures
included the success rate and time necessary for users to find particular information, comprehension,
and perception of usability. After correcting for literacy differences between the groups, it was shown
that the revisions led to significantly better localization performance, comprehension, and improved
appreciation of the material. A follow-up study was performed that determined that the gains listed
above are independent of the A4 paper formats that were originally tested; thus, it is possible that these
results could be generalized over many document formats, including procedures.
2.8 Summary

I have reviewed some of the literature surrounding procedures, including design challenges and
guidelines; human factors issues; culture and adherence; medium of transcription and use; checklists;
and usability studies of other paper documents.
Several gaps exist in the procedure usability literature that has been reviewed here. Though guidelines
for procedure design, writing and formatting have been briefly discussed, few of them are evidence-
based. Although a larger discussion of this is warranted, it falls outside the scope of this thesis.
Additionally, several studies have been completed comparing the usability of CBPs and PBPs, but none
that compared the usability of PBPs against other PBPs based on their design features. Many of these
studies focused on emergency operating procedures, whereas only a few focused on procedures used
under normal conditions. Finally, though there has been one usability study of a paper-based checklist
before and after human factors intervention, a similar study of a PBP does not exist.
In order to fill the gap described above, this thesis aims to complete a usability comparison of two
existing paper industrial operating procedures with versions that were redesigned with the input of
evidence-based document design principles and human factors processes.
3 Procedure Redesign

I performed a comparison of paper industrial operating procedures at a plant that refines low-level
radioactive materials. Due to the nature of the work, the facility falls under the governance of the
Canadian Nuclear Safety Commission (CNSC), and follows similar rules to NPPs. Though the plant
does contain a control room, the majority of the operations that are carried out are field operations.
In the fall of 2009, several reportable incidents occurred at this plant. Upon examination of these
incidents, operating procedures were cited as contributing causal factors. This prompted management
to request an internal audit to review the status of operating procedures. The audit resulted in several
recommendations, including the creation of a procedure style guide and the updating of the plant’s
procedures.
I was one of three consultants who were asked to redesign the procedures at this plant from a human
factors perspective. At the time, each of us was responsible for procedures in a different area. The
plant had more than 500 individual procedures, in addition to short reminder job aids that were posted
at key points throughout the plant. At the time of writing of this thesis, only a few procedures had been
rewritten. As will be discussed in more detail below, we created a new style guide and procedure format,
and conducted an operating experience review, function analysis, and task analysis for each procedure.
3.1 Motivation for Procedure Redesign

The existing format of the procedures was a two-column action-detail (or, in this case, step-comment)
format. An example of a page from the old cylinder filling procedure can be seen below, in Figure 3-1.
The image is modified to protect proprietary information.
Figure 3-1 - Page from old cylinder filling procedure

Although the existing procedures were familiar to the operators and relatively short (e.g., the example
procedure above was only three pages in length), the internal audit revealed several weaknesses that
may have contributed to the reportable incidents, including the following:
A) The procedure format was inconsistent;
B) Procedures and/or job aids were not being used on a daily basis;
C) Relevant information relating to the execution of the procedures was missing;
D) Procedures were not easily accessible by operators when performing tasks.
From the above list, the first three points were the ones most impacted by this human factors project.
We proceeded to review incident reports, review several of the procedures, and interview several
operators to start our analysis of the procedures. This is discussed in further detail in the section below.
We discovered issues similar to the ones discussed in the audit:
A) The procedures were inconsistent in their vocabulary; and,
B) Operators felt steps that were necessary and typically performed were missing from the written
procedures; thus, the procedures were incomplete.
3.2 Redesign Process and Results

A review of the literature revealed that there were few clear evidence-based guidelines indicating how
to construct an improved, consistent procedure template. In order to fulfill this goal and to create a new
style guide and format for the procedures, we consulted the sources that did offer evidence-based
guidelines: an industry-standard book on procedure writing (D. Wieringa, Moore, & Barnes, 1998), the
Nuclear Energy Institute’s procedure writers’ manual (Nuclear Energy Institute, 2006), and the Institute
of Nuclear Power Operations (INPO) guide to procedure use and adherence (Institute of Nuclear Power
Operations, 2009). Additionally, we consulted industry professionals, applied human factors heuristics
(including, among others, consistency, standardization, and flexibility and efficiency of use) and
professional judgment, and attempted to incorporate, as a guideline, whatever the operators responded
to positively.
Next, a human factors process to redesign the content of the procedures had to be completed, tested
and implemented. The process was based on that found in the United States Nuclear Regulatory
standard, NUREG-0711 (O’Hara, J.M. & Higgins, J.C., 2004). First, we conducted an operating
experience review (E. Davey, personal communication, May 13, 2013) that included interviews with
operators and a review of incident reports. As the area that I worked on procedures in was the cell
room, my interviews included interviews with operators from that area of the plant, and a review of cell
room incidents. Next, we completed a function analysis; this was a hierarchical representation of
system functions that allowed us to understand the system capabilities. For the cell room, I chose to
use work-domain analysis (WDA), and an abstraction hierarchy (AH) as the tool for the function
analysis. This allowed me to gain an understanding of what pieces of equipment comprised the cell
room, and why and how each of the pieces and processes were connected in order to achieve the goal
of the cell room. Part of the result of the WDA can be seen in Figure 3-2, below.
Figure 3-2 - Part of the WDA for the cell room

[Figure: part of the work-domain analysis, linking the Port Hope Conversion Facility to the cell room and its F2 and H2 subsystems, together with supporting equipment such as the rectifiers and rectifier cooling system, AHF storage and receiving, surge tanks, filters, and the carbon anode and carbon steel cathode.]
After that was completed, we performed a task analysis. To inform the task analysis, we performed
heuristic and experienced-based reviews of existing procedures in order to identify issues in operation.
Additionally, we observed operators completing each procedure at least twice. These were naturalistic
observations in which we avoided interrupting or asking questions of the operators; while the operators
were aware of our presence, we assured them that we were merely trying to gain an understanding of
the procedures, and not to judge their performance. Additionally, an interview with the system engineer
was completed. These activities allowed us to ensure that we had an essential understanding of the
plant, the procedures, and their purpose. In my case, they allowed me to better understand what pieces
of equipment were being referred to in the cell room. Additionally, observing the operators while I was
reading the procedures began to give me a sense of where the existing procedures may have been
missing some actions.
This was followed by at least two operational walkthroughs of the task with operators on different crews.
While the previous observations were naturalistic in nature, these walkthroughs were more intrusive:
we asked the operators to speak aloud and explain to us why they were performing particular actions.
Additionally, we asked them to tell us when they came upon a point in the procedure where they
thought a step was missing or inaccurate. Their practices were recorded in detail, along with the
rationale for the way certain things were performed. For the cell room procedures, this allowed me to fill
in many of the actions that were missing from the old procedures, and helped me to further understand
why the operators were performing the actions that they were. Additionally, the rationale served to fill
some of the details, notes, cautions and warnings that were placed in the new procedure.
We placed all of this information in task analysis tables, and held follow-up discussions with operating
staff and area engineers to clarify any remaining issues. For example, when different operators (or
different crews) were observed performing steps or actions differently, we consulted with shift
supervisors and area engineers to determine which of these methods would be the safest and most
resistant to potential errors. Once the steps were placed into the task analysis tables, it became easy to
see the hierarchy and the natural groupings of actions; see Figure 3-3 below for an early version of the
hierarchical task analysis of the cell room procedure that I worked on redesigning. The highlighted cells
represent areas where different operators gave conflicting accounts of how their crews performed
certain actions.
Figure 3-3 – Task analysis of revised cell-dipping procedure
Next, we organized the task analysis into the proposed procedure sequence and discussed it with the
shift supervisors and area engineers to determine whether any important or critical steps were missing,
placed in the wrong order, or had excessive (or insufficient) detail. We then took the task analysis and
created a draft procedure in the new format. The draft was distributed to all of the plant’s crews for
review and comment for two iterations. The iterative process helped the teams reach consensus with
each other and with area engineers on points of contention. Finally, we took the draft procedure to the
management team for review and signoff acceptance. The procedure was then introduced into service
and the operators and training staff were informed of the new procedure. Any new training needs were
identified and communicated to the Training Department.
An example of a page of the new cylinder filling procedure (one of the procedures used in the usability
testing) can be seen in Figure 3-4 below. The new procedures contain more steps that operators
perform (as identified through the task analyses); are structured in local groups of hierarchical actions
that were driven by the task analysis; are based on a style guide that enforces consistent vocabulary
and formatting; and contain new, relevant information in the form of clearly demarcated details, notes,
warnings, and cautions. The length of the procedures also grew (e.g., the example cylinder filling
procedure expanded from 3 pages to 14 pages).
Figure 3-4 - Page from the new cylinder filling procedure
Some direct comparisons can be made to illustrate the differences between the old and the new
procedures. For example, Figure 3-5 below shows how, in the old procedures, equipment was
identified by functional name only; this would have made it difficult to find a piece of equipment for
an operator who knew its identifier number but not its appearance. In the new procedures, both
functional names and device identifiers were included (Figure 3-6).
Figure 3-5 – Old procedure - equipment name only
Figure 3-6 - New procedure - Equipment name and ID
Figure 3-7 below shows an example of how the old procedures were written in a two-column, step-
detail format. Though the intention here was to always have steps in the left column and details in the
right column, they were sometimes mixed up. On the other hand, as can be seen in Figure 3-8 below,
the new procedures were revised with the output from the task analysis in order to create a task
hierarchy, numbering scheme, and local groupings of actions.
Figure 3-7 - Old procedure - Step-detail format
Figure 3-8 - New procedure - Task hierarchy
Whereas the old procedures were written in paragraphs (Figure 3-9), which could have sometimes
made it difficult to locate and isolate steps in the block of text, the new procedures eliminated
paragraphs and wrote out individual steps instead (Figure 3-10). Additionally, important warnings and
cautions were highlighted according to ANSI guidelines.
Figure 3-9 - Old procedure - Paragraphs
Figure 3-10 - New procedure - Individual steps with warnings and cautions
While the old procedures sometimes displayed an inconsistent step structure that (as previously
mentioned) sometimes had comments in the step column (or multiple steps written together) (Figure
3-11), the new procedures consistently followed an action-object step structure, with the lowest-level
actions capitalized (Figure 3-12).
Figure 3-11 – Old procedure - Inconsistent step structure
Figure 3-12 - New procedure - Action-object step structure
Finally, while the old procedures rarely contained explanations for the actions contained within, the new
procedures embedded these explanations for actions directly within highlighted details, notes, cautions,
and warnings. In the example illustrated below, while the old procedure mentions the mandatory use of
the automatic valve closer (Figure 3-13), it does not detail why it should be used, or what might happen
if it is not used correctly. These explanations can be seen in the new procedure (Figure 3-14).
Figure 3-13 - Old procedure - Rarely contained explanations
Figure 3-14 - Explanation embedded in warning and caution
Table 3-1, below, summarizes some of the differences in the characteristics of the old procedures and
the new procedures.

Table 3-1 - Comparison of old procedure and new procedure features

Old procedures: Equipment is identified by functional name only, not by name and device identity.
New procedures: Where possible, both functional names and device identifiers were included in the revised procedures.

Old procedures: Two-column, step-detail format; not sub-divided into logical groupings of actions.
New procedures: The task analysis output a task hierarchy, which was used to create a numbering scheme and local groupings of actions.

Old procedures: Written in paragraphs.
New procedures: Paragraphs eliminated in favour of individual steps; warnings and cautions highlighted consistent with ANSI guidelines.

Old procedures: Inconsistent step structure.
New procedures: Action-object step structure, with lowest-level actions in capital letters.

Old procedures: Rarely contained explanations for actions.
New procedures: Explanations for actions embedded in cautions and warnings.
4 Method

I wanted to evaluate whether the operators would rate the usability of the procedures based on the new
style guide, format, and development process higher than that of the old procedures.
In the plant where this research was performed there are several different working areas, each with its
own set of procedures. I wanted to ensure that the results obtained were not biased by one particularly
well- or poorly-written procedure, or by one relatively difficult or simple area. Thus, I decided that it
would be best to test at least two different revised procedures from at least two different areas of the
plant. For the same reason, I chose two procedures that had been created and updated by two
different human factors engineers. I avoided the area in which I performed the procedure redesign
(the cell room) in an attempt to avoid any bias on my part.
The two procedure areas that I selected were the flame reactor and cold trap area (FR), and the
cylinder filling (CF) area. These areas were selected because they involved the least amount of
equipment and would require the operators to move around the least. This would make it easier for the
experimenter to track the actions and movements of the operators. Thus, in total, four procedures were
used during the tests: two procedures from CF (one old and one new), and two procedures from FR
(one old and one new). The old CF procedure was three pages long, while the new procedure is 14
pages long; the old FR procedure was two pages long, while the new FR procedure is eight pages long.
The operators are classified by qualification levels between 0 and 3 in each area of the plant based on
the number of supervised hours of work in the area and the amount of training they’ve undergone (D.
Perry, personal communication, July 16, 2013). The definitions of the levels are given below for
clarification:
Level 0 (L0) – new operator, yet to complete common and area-specific training components (though
they have started training in the area)
Level 1 (L1) – operator is not authorized to operate equipment or perform job tasks/procedures unless
under the direct supervision of a qualified person until the operator has successfully completed the
performance evaluation for the given task/procedure
Level 2 (L2) – able to perform all tasks and procedures independently under normal supervision but
with a qualified operator available for guidance
Level 3 (L3) – qualified and able to run the given area independently under normal supervision.
The plant has four different crews that work on a shift schedule. Sixteen operators were selected for the
study based on a range of experience and qualifications. I wanted half of the operators to be
experienced, and half to be inexperienced. See Table 4-1 below for an illustrative breakdown of the
operator selection. Four of these operators were qualified as L3 operators in the CF area and at least
two other areas (with the exclusion of FR), four were L3 operators in FR and at least two other areas
(with the exclusion of CF), and eight operators were less than or equivalent to L0 in the CF and FR
areas (and L3 in one other area or less). For reference, I will refer to the first group as “Experienced
C/F”, the second group as “Experienced F/R”, and the third group as “Inexperienced”.
Table 4-1 - Participating operator breakdown

Operators 1-4 (Experienced C/F): Cylinder Filling L3; Flame Reactor < L0; L3 in ≥ 2 other areas
Operators 5-8 (Experienced F/R): Cylinder Filling < L0; Flame Reactor L3; L3 in ≥ 2 other areas
Operators 9-16 (Inexperienced): Cylinder Filling ≤ L0; Flame Reactor ≤ L0; L3 in ≤ 1 other area
4.1 Test design

I used a within-subjects counterbalanced design for this study. I chose a within-subjects design to
accommodate the potentially low number of participants; it was expected that only a limited number of
operators would have the time available to participate in the study, and fewer still of those willing to
participate would fit the selection criteria. This was an attempt to avoid the pitfall of not having enough
subjects for a statistical comparison in a between-subjects design, as experienced in a 1992
comparative study of checklist performance (Mosier et al., 1992). Additionally, the within-subjects
design should control for individual variability between operators. To counter any learning or order
effects, I counterbalanced the study in several ways: the order in which the two versions of each
procedure were presented, and the order in which the two areas were presented, were counterbalanced
across crews and across qualification-level categories.
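Crossing the two order factors yields four presentation sequences, and rotating participants through them is one simple way to realize this kind of counterbalancing. The sketch below is purely illustrative (the rotation scheme and labels are assumptions, not the exact assignment used in the study):

```python
from itertools import product

# Two counterbalanced factors: which area is presented first, and
# whether the old or new version of each procedure is read first.
area_orders = [("CF", "FR"), ("FR", "CF")]
version_orders = [("old", "new"), ("new", "old")]

# Crossing the factors gives four presentation sequences.
conditions = list(product(area_orders, version_orders))

def presentation_order(participant):
    """Rotate participants through the four sequences (illustrative)."""
    areas, versions = conditions[participant % len(conditions)]
    return [f"{area} ({version})" for area in areas for version in versions]

# Participant 0 would read CF (old), CF (new), FR (old), FR (new).
print(presentation_order(0))
```

With 16 participants, each of the four sequences would be seen by four operators, balancing any learning or order effects across conditions.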
4.2 Independent Variables

The independent variables in this study were the procedure type (new or old), the procedure area (CF
or FR), and operator experience level (experienced or inexperienced).
4.3 Dependent Variables

ISO-9241 defines usability as the “extent to which a product can be used by specified users to achieve
specified goals with effectiveness, efficiency, and satisfaction” (ISO, 1998). A 2006 review of usability
measurement methods examined usability measures from 180 studies published in HCI journals and
classified them according to the three ISO-9241 categories presented above (Hornbæk, 2006). The
following definitions of the categories were cited from the ISO standard:
Effectiveness: “accuracy and completeness with which users achieve specified goals”
Efficiency: “resources expended in relation to the accuracy and completeness with which users achieve
goals”
Satisfaction: “freedom from discomfort, and positive attitudes towards the use of the product”.
In this study, I aimed to have at least one measure (as reviewed in Hornbæk, 2006) from each of
these categories. Due to limited operator availability, limited equipment availability, and time
constraints, few measures were deemed suitable matches. Despite this limited number of matches, this
study still contains one measure from each category. Hornbæk noted that of the studies that he
reviewed, 22% had no effectiveness measure, 19% had no efficiency measure, and 38% had no
satisfaction measure.
To fill the category of “effectiveness”, I used a user assessment; for efficiency, a modified
NASA Task Load Index (NASA TLX) questionnaire (Appendix A); and for satisfaction, a questionnaire
(Appendix B) that included several Likert-type scale questions and open-ended questions.
The “users” referred to above are the operators. I asked the operators to read through the procedures
and note any inaccuracies or confusing points that they found in the procedures. I recorded and tallied
these items to form a count of inaccuracies and confusing points for each procedure. In an attempt to
remain as objective as possible, I asked the operators to count only the inaccuracies that, if the
procedure was followed precisely, would result in damage to equipment, themselves, others, or an
inability to complete the task.
The NASA Task Load Index is a “multi-dimensional rating procedure that provides an overall workload
score based on a weighted average of ratings on six subscales” (NASA, 1986). The published version
contained the six dimensions of mental demands, physical demands, temporal demands, own
performance, effort, and frustration. However, given that the items being tested here were
documents, I deemed the subscale for “physical demands” to be irrelevant.
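As a rough illustration of how the modified instrument would be scored, the sketch below applies the standard NASA-TLX weighted-average rule to the five retained subscales. The rating scale and all of the numbers here are assumptions for illustration, not the study's data.

```python
# Hedged sketch of NASA-TLX weighted scoring, reduced to the five subscales
# retained in this study (physical demand dropped). Subscale names follow the
# text; the example ratings and weights are invented for illustration.
SUBSCALES = ["mental", "temporal", "performance", "effort", "frustration"]

def weighted_tlx(ratings, weights):
    """Weighted workload = sum(weight_i * rating_i) / total weight.

    With 5 subscales there are C(5,2) = 10 pairwise comparisons, so the
    integer weights sum to 10.
    """
    assert set(ratings) == set(weights) == set(SUBSCALES)
    assert sum(weights.values()) == 10
    return sum(weights[s] * ratings[s] for s in SUBSCALES) / 10.0

score = weighted_tlx(
    ratings={"mental": 8, "temporal": 4, "performance": 6,
             "effort": 7, "frustration": 3},
    weights={"mental": 4, "temporal": 1, "performance": 2,
             "effort": 2, "frustration": 1},
)
```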
In order to create an operator questionnaire that suited this particular analysis, I adapted two instruments: the
Questionnaire for User Interface Satisfaction (QUIS) (Chin, Diehl, & Norman, 1988), and a
questionnaire based on the consumer information rating form that measured the perception of usability
(Pander Maat & Lentz, 2010).
The Likert questions that were included in the operator questionnaire asked the operator about the
following qualities for each of the procedures that they read:
• Well-organized
• Ease-of-reading
• Written accurately enough in order to be able to achieve the specified goal completely
• Written in a way that lets you achieve the specified goal safely
• Written with the appropriate amount of detail
• Overall satisfaction with the procedure
Each question used a 5-point Likert scale that ranged from strongly disagree (1) to strongly
agree (5). Additionally, an open-ended question on the amount of detail allowed the operator to
comment if they disagreed with the amount of detail included in the procedure, explaining whether
they thought there was too much or too little detail, and why. At the end of the testing protocol,
I also asked the operator whether they preferred the new or the old procedures overall, and why.
4.4 Hypotheses

I expected that, in general, the new procedures would have a positive usability impact. The type of
procedure (old versus new) was predicted to directly affect all usability factors: effectiveness, efficiency,
and satisfaction. The inaccuracy and confusion counts of the new, redesigned procedures were
predicted to be lower than those of the old procedures; the NASA TLX scores of the new procedures
were predicted to be lower than those of the old procedures; and the operators were predicted to give
higher subjective ratings of satisfaction in the survey to the new procedures than to the old procedures.
I also predicted that these positive impacts of the new procedures would be true across both of the
procedure areas for all dependent variables.
With regard to operator experience, I predicted that there would be very little difference between the
ratings given by the experienced operators and those given by the inexperienced operators in all of
the usability factor categories.
Finally, I predicted that the majority of the operators would state that they prefer the new procedures to
the old ones.
4.5 Testing Protocol

For each operator that participated in the study, I used the following test protocol:
First, I contacted each individual operator, explained the study, and obtained their informed consent
(Appendix C). A time was agreed upon when the operator would be free to participate in the study.
The tests were run in a meeting room behind the control room that contained a table and several chairs.
During the test, the operators were seated at the table, with copies of all of the procedures in front of
them. Before the study commenced, I explained the NASA-TLX workload assessment to the operator,
including the definitions of each of the 5 subscales. Subsequently, the operator was allowed one
practice run with the NASA-TLX to ensure that the scales were understood.
I then explained to the operator that they were to read the procedure that was presented to them. While
they were reading, if the operator came across anything that confused them or that seemed inaccurate,
I told them to advise me about it. Similarly, if they had any comments that they thought of while reading
the procedure, I encouraged them to dictate them to me. While the operator read through the procedure,
I tallied (and recorded the reason for) any confusing occurrences or inaccuracies that were reported by
the operator.
After the operator finished reading each procedure, I asked them to fill out the operator questionnaire. I
then asked them to fill out the modified NASA-TLX workload assessments in regards to each procedure.
After the first and third procedure, the operator was given a 5-minute break. After the second procedure
(when switching between different areas in the plant), the operator was given a 20-minute break.
After the operator completed the read-through of the two procedures in the first area (and their
accompanying questionnaires and NASA-TLX scales), I asked them to complete their subjective
weightings of the NASA-TLX factors. Finally, after all four of the procedures had been read through, I
asked the operator whether they had an overall preference for the new procedures or the old ones, and
whether they had any overall comments; the operators dictated these comments directly to me, and I
recorded them in a notebook. Finally, the operators were debriefed.
5 Results

For all of the following graphs, “N” refers to the new procedure, and “O” to the old one. All of the data
were tested for normality before statistical tests were run; where the assumption of normality was
violated, an appropriate non-parametric test was used.
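The test-selection rule stated above (non-parametric test where normality fails) can be illustrated with a minimal sketch. This is not the thesis's analysis code: it hand-implements the Wilcoxon signed-rank statistic with the normal approximation, whereas in practice a statistics package would be used, preceded by a normality check such as Shapiro-Wilk.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (z, two-sided p) for paired samples x and y (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0                # average rank for tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mu) / sigma
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

The paired counts of confusion or inaccuracy for each operator (new vs. old procedure) would be passed in as `x` and `y`; for the small sample here (n = 16), a package's exact-distribution option would be preferable to the normal approximation.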
5.1 Demographics

Below is a summary of the demographic data collected from the operator participants. Most of the
participants were between the ages of 31 and 50, and most had been working at that particular plant for
between 5 and 10 years.
Table 5-1 - Demographics Summary

                  Years as operator
Ages         | 10 years or more | 5-10 years  | 2-5 years   | Less than 2 years | Grand Total
18-30        | 0.00%            | 0.00%       | 0.00%       | 12.50% (2)        | 12.50% (2)
31-50        | 12.50% (2)       | 25.00% (4)  | 6.25% (1)   | 18.75% (3)        | 62.50% (10)
51 or older  | 6.25% (1)        | 12.50% (2)  | 6.25% (1)   | 0.00%             | 25.00% (4)
Grand Total  | 18.75% (3)       | 37.50% (6)  | 12.50% (2)  | 31.25% (5)        | 100.00% (16)
5.2 Effectiveness

Confusion
I asked the operators to indicate when they were confused by anything mentioned in the procedure; I
then tallied up the number of times this occurred per procedure. The results can be seen in Figure 5-1,
below.
Figure 5-1 - Mean Counts of Confusion

A Wilcoxon signed-rank test showed no evidence of a difference in the counts of confusion for the new
CF procedure (M=0.94) as compared to the old CF procedure (M=1.06, p=0.751, z=-1.78). A second
Wilcoxon signed-rank test showed strong evidence of a difference in the counts of confusion for the
new FR procedure (M=0.69) as compared to the old FR procedure (M=1.81, p=0.027, z=-2.212).
A third Wilcoxon signed-rank test indicated that there was strong evidence of a difference between the
new procedures and the old procedures (p=0.037, z=-2.083). This significant difference may indicate
less operator confusion with the new procedures.
The mean number of times the experienced and inexperienced operators were confused while reading
each procedure can be seen in Figure 5-2, below.
Figure 5-2 - Mean Counts of Confusion by Experience

The new CF procedure had a mean confusion count of 0.625 for experienced operators and 1.25 for
inexperienced operators, while the old one had a mean of 0.875 for experienced operators and 1.25 for
inexperienced operators. The new FR procedure had a mean of 0.25 for experienced operators and
1.125 for inexperienced operators, while the old one had a mean of 1 for experienced operators and
2.625 for inexperienced operators.
Inaccuracies
I also asked the operators to let me know when they thought they spotted something inaccurate or
missing from the procedures, and tallied up this count as well. These results can be seen in Figure 5-3
below.
Figure 5-3 - Mean Counts of Inaccuracies

A Wilcoxon signed-rank test showed strong evidence of a difference in the counts of inaccuracies for
the new CF procedure (M=0.5) as compared to the old CF procedure (M=2.94, p=0.005, z=-2.807). A
second Wilcoxon signed-rank test showed moderate evidence of an effect in the counts of inaccuracies
for the new FR procedure (M=0.75) as compared to the old FR procedure (M=1.69, p=0.07, z=-1.812).
A third Wilcoxon signed-rank test indicated that there was strong evidence of a difference between the
new procedures and the old procedures (p<0.001, z=-3.291).
The mean number of times the experienced and inexperienced operators found inaccuracies while
reading each procedure can be seen in Figure 5-4, below.
Figure 5-4 - Mean Counts of Inaccuracies by Experience

The new CF procedure had a mean inaccuracy count of 1 for experienced operators and 0 for
inexperienced operators, while the old one had a mean of 4.5 for experienced operators and 1.375 for
inexperienced operators. The new FR procedure had a mean of 1 for experienced operators and 0.5 for
inexperienced operators, while the old one had a mean of 2.375 for experienced operators and 1 for
inexperienced operators.
5.3 Efficiency

Subjective Workload Assessment
The mean subjective weighted workload ratings were calculated per procedure and can be seen in
Figure 5-5, below.
Figure 5-5 - Mean Subjective Workload

A paired t-test (one-tailed) showed strong evidence that the mean workload ratings for the new CF
procedure (M=7.42) are significantly lower than the workload ratings for the old CF procedure (M=10.05,
p=0.02, z=-2.05). A second paired t-test (one-tailed) showed no evidence of a significant difference in
the workload ratings between the new (M=8.58) and old FR procedures (M=9.35, p=0.27, z=-1.93).
A 2-tailed paired t-test showed moderately strong evidence of an effect between the new procedures
and the old procedures for subjective workload (p = 0.055, z=-1.92).
The mean subjective workload ratings were also broken down by procedure and experience, and can
be seen in Figure 5-6, below.
Figure 5-6 - Mean Subjective Workload by Experience

The new CF procedure had a mean workload rating of 6.35 for experienced operators and 9.425 for
inexperienced operators, while the old one had a mean of 8.83 for experienced operators and 12.52 for
inexperienced operators. The new FR procedure had a mean of 8.07 for experienced operators and
10.17 for inexperienced operators, while the old one had a mean of 8.43 for experienced operators and
11.45 for inexperienced operators.
5.4 Satisfaction

Internal consistency is an important feature of any scale; it determines whether several questions that
claim to measure the same variable result in similar scores. In this particular case, the variable in
question would be satisfaction. Cronbach’s alpha coefficient is a measure of internal consistency (Bland
& Altman, 1997). The generally acceptable values for alpha range from 0.70 to 0.95; however, a
maximum value of 0.90 has been recommended (Tavakol & Dennick, 2011). If alpha is higher than 0.90,
some of the items on the scale may be redundant. As access to operators was limited, assessing the
internal consistency of the survey questions had to be performed post-hoc.
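For reference, Cronbach's alpha as described above can be computed in a few lines. The example responses below are invented for illustration and are not the study's data.

```python
from statistics import pvariance

# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total scores))
def cronbach_alpha(scores):
    """scores[i][j] = respondent i's rating on Likert item j."""
    k = len(scores[0])                      # number of items (6 in this study)
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Invented example responses (4 respondents x 6 items), NOT the study's data:
example = [
    [4, 4, 5, 4, 4, 5],
    [3, 3, 3, 2, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [2, 2, 3, 2, 2, 2],
]
alpha = cronbach_alpha(example)
```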
The surveys for the new procedures had a Cronbach’s alpha of 0.894, and the surveys for the old
procedures had a Cronbach’s alpha of 0.915, as can be seen in Table 5-2 and Table 5-3 below. Thus,
it can be said that the operator questionnaire questions were relatively internally consistent and all
measured the same variable.
Table 5-2 - Cronbach's Alpha for New Procedures

Reliability Statistics
Cronbach's Alpha | Cronbach's Alpha Based on Standardized Items | N of Items
.894             | .895                                         | 6
Table 5-3 - Cronbach's Alpha for Old Procedures

Reliability Statistics
Cronbach's Alpha | Cronbach's Alpha Based on Standardized Items | N of Items
.915             | .915                                         | 6
Likert Items
Each quality that was measured by a Likert question can be seen in the figures below:
• Well-organized (Figure 5-7);
• Easy to read (Figure 5-8);
• Written accurately enough in order to be able to achieve the specified goal completely (Figure
5-9);
• Written in a way that lets you achieve the specified goal safely (Figure 5-10);
• Written with the appropriate amount of detail (Figure 5-11); and,
• Overall satisfaction with the procedure (Figure 5-12).
Figure 5-13 illustrates the mean of the sum of these scores from the operator questionnaires.
Figure 5-7 – Mean Perception of Organization

The new CF procedure had a mean perception of organization of 4.19, while the old one had a mean of
3.12. The new FR procedure had a mean of 4.25, while the old one had a mean of 2.94.
Figure 5-8 - Mean Perception of Ease of Reading

The new CF procedure had a mean perception of ease of reading of 3.81, while the old one had a
mean of 3.44. The new FR procedure had a mean of 3.94, while the old one had a mean of 3.19.
Figure 5-9 – Mean Perception of Accuracy of Procedure

The new CF procedure had a mean perception of accuracy of 4.25, while the old one had a mean of
2.89. The new FR procedure had a mean of 3.81, while the old one had a mean of 2.81.
Figure 5-10 – Mean Perception of Ability to Complete Procedure Safely

The new CF procedure had a mean perception of the ability to complete it safely of 4.25, while the old
one had a mean of 2.69. The new FR procedure had a mean of 3.81, while the old one had a mean of
2.75.
Figure 5-11 – Mean Perception of Appropriate Amount of Detail

The new CF procedure had a mean perception of appropriate amount of detail of 4.06, while the old
one had a mean of 2.81. The new FR procedure had a mean of 3.81, while the old one had a mean of
2.75.
Figure 5-12 – Mean Subjective Satisfaction

The new CF procedure had a mean subjective satisfaction of 4, while the old one had a mean of 2.875.
The new FR procedure had a mean of 3.88, while the old one had a mean of 2.81.
The mean of the sum of the scores from the operator satisfaction questionnaire was calculated per
procedure and can be seen in Figure 5-13, below.
Figure 5-13 - Mean Total Score from Operator Satisfaction Questionnaire
A Wilcoxon signed-rank test for the CF area showed strong evidence of a significant difference in the
overall satisfaction questionnaire score between the new procedure (M=24.56) and the old procedure
(M=17.81, p=0.003, z=-2.97). A second Wilcoxon signed-rank test for the FR area showed strong
evidence of a significant difference in the overall satisfaction questionnaire score between the new
procedure (M=23.5) and the old procedure (M=17.25, p=0.007, z=-2.7).
A Wilcoxon signed rank test indicated that there was strong evidence of a significant difference in
satisfaction between the new procedures and the old procedures (p<0.001, z=-5.99).
The mean total scores from the operator satisfaction questionnaire were also broken down by
procedure and experience, and can be seen in Figure 5-14, below.
Figure 5-14 - Mean Score from Operator Satisfaction Questionnaire by Experience

The new CF procedure had a mean operator satisfaction questionnaire score of 26.88 for experienced
operators and 22.25 for inexperienced operators, while the old one had a mean of 18.25 for
experienced operators and 17.38 for inexperienced operators. The new FR procedure had a mean of
23.63 for experienced operators and 23.38 for inexperienced operators, while the old one had a mean
of 16.75 for experienced operators and 17.75 for inexperienced operators.
5.5 Statistical Test Summary

Below is a summary of the statistical test results. Values that indicate moderate or moderately strong
evidence of an effect are marked with an asterisk.
Table 5-4 - Summary of p-values for all usability measures

Measure                      | New vs. Old | New C/F vs. Old C/F | New F/R vs. Old F/R
Confusion (Wilcoxon)         | 0.037       | 0.751               | 0.027
Inaccuracy (Wilcoxon)        | 0.001       | 0.005               | 0.07*
Subjective Workload (t-test) | 0.055*      | 0.02                | 0.27
Satisfaction (Wilcoxon)      | <0.001      | 0.003               | 0.007
5.6 Stated Preferences

When asked after the testing protocol which procedure they preferred, 13 of 16 operators stated a
preference for the new procedures. The majority of cited reasons and comments for this preference
claimed that the new procedures have more detail and information. Of the three that did not prefer the
new procedures, all of them stated that they thought the old procedures were more direct and had
fewer steps.
Of the three operators that did not prefer the new procedures, 1 of them was an experienced operator,
and 2 were inexperienced.
6 Discussion

6.1 Effect of New Procedures

I found that paper-based procedures redesigned with human factors input and evidence-based
guidelines have higher usability ratings than their original versions. As predicted by my hypotheses,
strong or moderate evidence of differences between the new and the old procedures was shown in the
results across all of the usability categories tested. Of the nine metrics that were collected for each
procedure (two for effectiveness, one for efficiency, and six for satisfaction), all of them displayed better
mean values for the new procedures than for the old procedures.
Specifically, in regards to the cylinder filling procedures, I found that the new CF procedures were
reported to be more effective, more efficient, and had higher subjective ratings of satisfaction than the
old ones. Of the four statistical tests that were run between the new and old CF procedures, all but one
of them (confusion count) showed strong evidence of statistically significant differences; the new CF
procedures were shown to be significantly better than the old. Similarly, in regards to the flame reactor
procedures, I found that the new FR procedures were reported to be more effective, more efficient, and
had higher subjective ratings of satisfaction than their old counterparts. Of the four statistical tests that
were run between the new and old FR procedures, two of them showed strong evidence of a
statistically significant difference, one showed moderate evidence of an effect (inaccuracy count), and
one showed no evidence of a significant effect (subjective workload); all were in favour of the new FR
procedure.
While the confusion count for the CF procedures did not show evidence of a significant difference, the
confusion count for the new CF procedure is similar to that for the new FR procedure, whereas the
difference in mean confusion counts between the old procedures appears larger. Perhaps the old CF
procedure was simply not as confusing as the old FR procedure. In other words, the operators were no
more confused by the new CF procedure than by the old one, and significantly less confused by the
new FR procedure than by the old one.
The lack of evidence for a significant difference in the subjective workload category between the FR
procedures seems to be mainly due to a couple of operators who gave unexpectedly low subjective
workload ratings to the old FR procedure. One factor that could have influenced their ratings is simply
their familiarity with the old procedure. It is also possible that even though the old
procedure was incomplete and missing several steps, since it was so short (less than two pages), it had
a very low subjective workload for some of these operators. Despite this, the majority of the evidence
as discussed above still supports the conclusion that the new FR procedures had higher usability. This
can be considered non-trivial given the difference in familiarity level discussed above, which favours
the old procedures.
Finally, I found that the new procedures were rated with a higher overall subjective satisfaction, and
were preferred, over the old ones. The operators surveyed found the selected new procedures to be
better organized, more accurate, more conducive to safe work practices, more appropriately detailed,
and more satisfactory than the old procedures. For the “ease of reading” question, the difference in
the mean rating between the new and the old CF procedure was much less noticeable. Perhaps, like the
subjective workload ratings, this ease of reading question showed only a small difference between the
new and old CF procedures due to the length of the new procedure; three operators commented that it
was longer and less direct than the old procedure, and that this made it more difficult to get through.
However, this did not affect the overall conclusion, as 13 of the 16 operators tested (approximately
81%) stated a preference for the new procedures.
6.2 Effect of Operator Experience

To avoid inflating Type I error rates, I did not perform additional inferential statistical tests; instead,
I compared the means of the confusion counts, inaccuracy counts, workload ratings, and total
satisfaction questionnaire scores to determine whether there was any difference between the ratings
given by experienced operators and those given by inexperienced operators.
Generally, inexperienced operators reported more instances of confusion, fewer inaccuracies, and
higher workload scores than their experienced counterparts, independent of whether the procedure was
new or old. These differences may be due to the differences in the amount of training that these
operators have received when compared to their experienced counterparts. If the inexperienced
operators received the same amount of exposure to the areas and the training as the experienced
operators, they may have been less confused, found more inaccuracies, and reported lower workload
scores for these procedures.
In contrast, upon examination of the means of the total scores reported from the operator satisfaction
questionnaire, there appeared to be very little difference between the ratings reported by the
experienced operators and those reported by the inexperienced operators.
6.3 Limitations

A key limitation of the study was the need to evaluate the procedures in the context of a real process
environment. Multiple anticipated and unanticipated shutdowns constrained the selection of my
research methods. These shutdowns prevented me from walking through the plant with the
operators in order to follow and observe them while they actually performed the procedures in the
process environment. As the tests were performed with the operators reading the procedures in a
meeting room, rather than actually performing (or walking through) them, the absence of the
environmental cues that they were used to experiencing while using these procedures may have
affected their ability to count inaccuracies or to accurately judge their subjective workload.
Additionally, though many of the studies reviewed were conducted in simulators, I had no access to
such a facility. With access to a simulator, and more time, it would have been possible to have the
operators perform the procedures on a simulated process, while more objective performance metrics
(such as error rates or product yields) were determined and compared from the simulator data.
The limited time and the limited number and availability of operators at the plant also forced me to
adapt the testing protocol.
Another potential limitation of the study was that, at the end of the testing protocol, I asked the
operators to dictate their preference between the new and the old procedures to me. Although I did
not design these particular procedures, the operators may have believed that I had some vested
interest in them. As a result, there is a minor chance that they biased their verbal responses to align
with my interests. Asking them to record their preference in writing as part of the operator
questionnaire, rather than reporting it verbally, might have minimized this bias.
Finally, though it would have been insightful to determine which of the particular changes in format or
content contributed the most to usability, this was an additional item that could not be carried out due to
the previously mentioned time constraints.
This study was carried out with an industrial operator population with mixed experience and gender
using paper-based field operation procedures. Generalizability to other populations is unknown. The
literature review indicates that computer-based and paper-based procedures seem to share some
design guidelines. Although the sample size for this study was small, the results may be generalizable
to other paper-based procedures and computer-based procedures in other plants with similar operator
populations.
7 Conclusion and Future Work

This is the first study to perform a usability comparison of paper-based industrial operating
procedures for normal conditions. Despite multiple limitations, the study was carried
out with two procedures that had been redesigned based on evidence-based procedure-writing
guidelines and human factors input. As predicted, the new procedures (as compared to the old
procedures) were rated moderately or significantly higher across the categories of efficiency,
effectiveness, and satisfaction. More specifically, the new cylinder filling procedure was rated
significantly better than its predecessor across counts of inaccuracy, subjective workload, and
satisfaction. The new flame reactor procedure was rated significantly (or at least moderately) better
than its predecessor across counts of confusion, counts of inaccuracy, and satisfaction. Of the four
measures where ratings were compared across the different operator experience groups, the only
measure where experienced and inexperienced operators gave similar ratings was in the category of
satisfaction.
Ideally, an immediate follow-up to this study should attempt to identify which particular change in the
procedures contributed most to the differences in the usability ratings. Hopefully, future studies of
procedure usability in settings such as this would either have access to a simulator or the actual plant
(for walkthroughs) so that a more objective measure could be used to capture the effectiveness metric
(e.g., number of errors committed, time to completion). Additionally, although I could not test the
hypothesis with inferential statistics here, a valuable study would be to determine whether
inexperienced operators and experienced operators would rate the usability of the procedures
significantly differently.
References

Bland, J. M., & Altman, D. G. (1997). Statistics notes: Cronbach’s alpha. BMJ, 314(7080), 572.
doi:10.1136/bmj.314.7080.572
Boy, G. A., & De Brito, G. (2000). Toward a categorization of factors related to procedure following and
situation awareness. In Proceedings of the HCI-Aero 2000 Conference. In Cooperation with
ACM-SIGCHI, Cepadues, Toulouse, France. Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.6279&rep=rep1&type=pdf
Bullemer, P. T., & Hajdukiewicz, J. R. (2004). A Study of Effective Procedural Practices in Refining and
Chemical Operations. In Proceedings of the Human Factors and Ergonomics Society Annual
Meeting (Vol. 48, pp. 2401–2405). Retrieved from
http://pro.sagepub.com/content/48/20/2401.short
Burks, R., & Peres, S. C. (2011). Procedure Classification Putting Campbell’s Objective Complexity
Framework to Work for a Petrochemical Company. In Proceedings of the Human Factors and
Ergonomics Society Annual Meeting (Vol. 55, pp. 1472–1475). Retrieved from
http://pro.sagepub.com/content/55/1/1472.short
Caccamise, D. J., & Mecherikoff, M. (1993). Human factoring the procedures element in a complex
manufacturing system. In Proceedings of the Human Factors and Ergonomics Society Annual
Meeting (Vol. 37, pp. 1046–1050). Retrieved from
http://pro.sagepub.com/content/37/16/1046.short
Carnio, A. (1980). Improvement of Operating Procedures in a Nuclear Power Plant. Trans. Am. Nucl.
Soc.; (United States), 35. Retrieved from
http://www.osti.gov/energycitations/product.biblio.jsp?osti_id=5092041
Carvalho, P. V., Dos Santos, I. L., & Vidal, M. C. (2006). Safety implications of cultural and cognitive
issues in nuclear power plant operation. Applied Ergonomics, 37(2), 211–223.
Chin, J. P., Diehl, V. A., & Norman, K. L. (1988). Development of an instrument measuring user
satisfaction of the human-computer interface. In Proceedings of the SIGCHI conference on
Human factors in computing systems (pp. 213–218). Retrieved from
http://dl.acm.org/citation.cfm?id=57203
Converse, S. A. (1994). Operating procedures: do they reduce operator errors? In Proceedings of the
Human Factors and Ergonomics Society Annual Meeting (Vol. 38, pp. 205–209). Retrieved from
http://pro.sagepub.com/content/38/4/205.short
Degani, A., & Wiener, E. (1994). On the design of flight-deck procedures.
Degani, A., & Wiener, E. L. (1990). Human factors of flight-deck checklists: the normal checklist.
Dien, Y. (1998). Safety and application of procedures, or “how do they have to use operating
procedures in nuclear power plants?” Safety Science, 29(3), 179–187.
Dien, Y., Llory, M., & Montmayeul, R. (1992). Operator’s knowledge, skill and know-how during the use of
emergency procedures: design, training and cultural aspects. In Human Factors and Power
Plants, 1992., Conference Record for 1992 IEEE Fifth Conference on (pp. 178–181). Retrieved
from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=283413
Elm, W. C., & Woods, D. D. (1985). Getting lost: a case study in interface design. In Proceedings of the
Human Factors and Ergonomics Society Annual Meeting (Vol. 29, pp. 927–929). Retrieved from
http://pro.sagepub.com/content/29/10/927.short
Goodman, P. C., & DiPalo, C. A. (1991). Human Factors Information System: A Tool to Assess Error
Related to Human Performance in U.S. Nuclear Power Plants. Proceedings of the Human
Factors and Ergonomics Society Annual Meeting, 35(10), 662–665.
doi:10.1177/154193129103501015
Hornbæk, K. (2006). Current practice in measuring usability: Challenges to usability studies and
research. International Journal of Human-Computer Studies, 64, 79–102.
doi:10.1016/j.ijhcs.2005.06.002
Institute of Nuclear Power Operations. (2009). Procedure Use & Adherence.
ISO. (1998). ISO 9241-11: Ergonomic requirements for office work with visual display terminals
(VDTs) – Part 11: Guidance on usability.
Jamieson, G., & Miller, C. (2000). Exploring the “culture of procedures” (pp. 141–145).
Kontogiannis, T. (1999). Applying information technology to the presentation of emergency operating
procedures: implications for usability criteria. Behaviour & Information Technology,
18, 261–276.
Lapinsky, G. (1989). Lessons Learned from the Special Inspection Program for Emergency Operating
Procedures. Division of Licensee Performance and Quality Evaluation, Office of Nuclear
Reactor Regulation, US Nuclear Regulatory Commission.
Lehto, M., & Salvendy, G. (1995). Warnings: a supplement not a substitute for other approaches to
safety. Ergonomics, 38(11), 2155–2163. doi:10.1080/00140139508925259
Luna, S. F., Sturdivant, M. H., & McKay, R. C. (1988). Factoring humans into procedures. In Human
Factors and Power Plants, 1988., Conference Record for 1988 IEEE Fourth Conference on (pp.
201–207). Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=27503
Mosier, K. L., Palmer, E. A., & Degani, A. (1992). Electronic checklists: Implications for decision making.
In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 36, pp. 7–
11). Retrieved from http://pro.sagepub.com/content/36/1/7.short
NASA. (1986). NASA Task Load Index (TLX) Manual.
Niwa, Y., Hollnagel, E., & Green, M. (1996). Guidelines for computerized presentation of emergency
operating procedures. Nuclear Engineering and Design, 167, 113–127. doi:10.1016/s0029-
5493(96)01297-6
Nuclear Energy Institute. (2006). Procedure Writers’ Manual (No. NEI AP-907-005). Nuclear Energy
Institute.
O’Hara, J. M., Higgins, J., & Stubler, W. (2000). Computerization of Nuclear Power Plant Emergency
Operating Procedures. In Proceedings of the Human Factors and Ergonomics Society Annual
Meeting (Vol. 44, pp. 819–822). Retrieved from http://pro.sagepub.com/content/44/22/819.short
O’Hara, J.M., & Higgins, J.C. (2004). NUREG-0711: Human Factors Engineering Programme Review
Model. Washington, DC: US Nuclear Regulatory Commission.
Ockerman, J., & Pritchett, A. (2000). A Review and Reappraisal of Task Guidance: Aiding Workers in
Procedure Following. International Journal of Cognitive Ergonomics, 4(3), 191–212.
doi:10.1207/S15327566IJCE0403_2
Orendi, R. G., Petras, D. S., Lipner, M. H., Oft, R. R., & Fanto, S. V. (1988). Human-factors
considerations in emergency procedure implementation. In Human Factors and Power Plants,
1988., Conference Record for 1988 IEEE Fourth Conference on (pp. 214–221). Retrieved from
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=27505
Oxstrand, J., Le Blanc, K., & Hays, S. (2012). Evaluation of Computer-Based Procedure System
Prototype. Idaho National Laboratory External Report. Retrieved from
http://www.inl.gov/technicalpublications/Documents/5581215.pdf
Palmer, E., & Degani, A. (1991). Electronic checklists: Evaluation of two levels of automation. In
Proceedings of the Sixth Symposium on Aviation Psychology (pp. 178–183). Retrieved from
http://ti.arc.nasa.gov/m/profile/adegani/Electronic%20checklist%20eval.pdf
Pander Maat, H., & Lentz, L. (2010). Improving the usability of patient information leaflets. Patient
Education and Counseling, 80, 113–119. doi:10.1016/j.pec.2009.09.030
Patel, S., Drury, C. G., & Lofgren, J. (1994). Design of workcards for aircraft inspection. Applied
Ergonomics, 25(5), 283–293. doi:10.1016/0003-6870(94)90042-6
Segall, N., Doolen, T. L., & Porter, J. D. (2005). A usability comparison of PDA-based quizzes and
paper-and-pencil quizzes. Computers & Education, 45(4), 417–432.
Shamo, M. K., Dror, R., & Degani, A. (1998). Evaluation of a New Cockpit Device: The Integrated
Electronic Information System. Proceedings of the Human Factors and Ergonomics Society
Annual Meeting, 42(1), 138–142. doi:10.1177/154193129804200131
Sharit, J. (1998). Applying human and system reliability analysis to the design and analysis of written
procedures in high-risk industries. Human Factors and Ergonomics in Manufacturing & Service
Industries, 8(3), 265–281.
Tavakol, M., & Dennick, R. (2011). Making sense of Cronbach’s alpha. International Journal of Medical
Education, 2, 53–55.
Theureau, J., Jeffroy, F., & Vermersch, P. (2000). Controlling a nuclear reactor in accidental situations
with symptom-based computerized procedures: a semiological & phenomenological analysis.
CSEPC 2000 Proceedings, 22–25.
White, R. E., Trbovich, P. L., Easty, A. C., Savage, P., Trip, K., & Hyland, S. (2010). Checking it twice:
an evaluation of checklists for detecting medication errors at the bedside using a chemotherapy
model. Quality and Safety in Health Care, 19, 562–567.
Wieringa, D., Moore, C., & Barnes, V. (1998). Procedure writing: principles and practices. Battelle
Press Columbus, OH. Retrieved from http://www.getcited.org/pub/100322096
Wieringa, D. R., & Farkas, D. K. (1991). Procedure writing across domains: Nuclear power plant
procedures and computer documentation. In Proceedings of the 9th annual international
conference on Systems documentation (pp. 49–58). Retrieved from
http://dl.acm.org/citation.cfm?id=122787
Appendix A: MODIFIED NASA TASK LOAD INDEX QUESTIONNAIRE
Name: ____________ Task: ____________ Date: ____________

Mental Demand – How mentally demanding was the task? (Very Low – Very High)
Physical Demand – How physically demanding was the task? (Very Low – Very High)
Temporal Demand – How hurried or rushed was the pace of the task? (Very Low – Very High)
Performance – How successful were you in accomplishing what you were asked to do? (Perfect – Failure)
Effort – How hard did you have to work to accomplish your level of performance? (Very Low – Very High)
Frustration – How insecure, discouraged, irritated, stressed, and annoyed were you? (Very Low – Very High)

Figure 8.6. NASA Task Load Index. Hart and Staveland’s NASA Task Load Index (TLX) method assesses work load on five 7-point scales. Increments of high, medium and low estimates for each point result in 21 gradations on the scales.
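For analysis, ratings collected with a form like this are commonly reduced to a single workload score. The sketch below follows the unweighted "raw TLX" (RTLX) convention of averaging the six subscale ratings; this convention, the 0–20 coding of the 21 gradations, and the function name `raw_tlx` are illustrative assumptions rather than the thesis's actual scoring procedure.

```python
# Illustrative raw-TLX scoring sketch (assumed RTLX convention: unweighted
# average of the six subscales, each coded 0-20 per the 21 gradations above).

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings):
    """Return the mean of the six subscale ratings (each 0-20)."""
    for scale in SUBSCALES:
        if scale not in ratings:
            raise ValueError(f"missing subscale: {scale}")
        if not 0 <= ratings[scale] <= 20:
            raise ValueError(f"{scale} rating must be within 0-20")
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

# Example: one participant's ratings for a moderately demanding task
print(raw_tlx({"mental": 14, "physical": 4, "temporal": 10,
               "performance": 6, "effort": 12, "frustration": 8}))  # 9.0
```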
Appendix B: OPERATOR QUESTIONNAIRE
Subject number: ____ Presentation procedure number: ____ Area number: ____

Overview

I would like to thank you for your interest and participation in this study. This study is being conducted in order to collect data about the usability of different procedures. The data collected through this study will be analyzed as part of a Master’s-level research thesis. If you have any questions following completion of the questionnaire, please feel free to contact the researcher at [email protected].

Research Ethics & Confidentiality

In keeping with research ethics practices at the University of Toronto, all completed questionnaires will be kept confidential and viewed only by the researcher. If at any time you feel unable or unwilling to answer a question, please feel free to leave the question blank or, alternatively, enter ‘no response’. Your participation in this exercise can be halted at any time. For more information on research ethics at the University of Toronto, please visit www.research.utoronto.ca.

Please answer questions 1 to 3 by reading each question carefully and responding according to the provided instructions.
1) Qualification – Are you a qualified operator in this area? Yes | No

2) Experience – How long have you worked at this plant as an operator? Check the box that matches your answer.
Less than 2 years | 2 – 5 years | 5 – 10 years | 10 years or more

3) Age – Check the box to the immediate right of your intended answer. 18–30 | 31–50 | 51 or Older
Please indicate how strongly you agree or disagree with the statements below by checking the box that corresponds to your level of agreement:
I thought that this procedure was…
1) …well-organized.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

2) …easy to read.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

3) …written accurately enough to be able to achieve the specified goal completely.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

4) …written in a way that lets you achieve the specified goal safely.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

5) …written with the appropriate amount of detail.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

If you checked “disagree” or “strongly disagree” for question #5 above, please explain why.
______________________________________________________________________________
______________________________________________________________________________

6) Overall, I was satisfied with the quality of this procedure.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree
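Multi-item agreement scales like the one above are commonly checked for internal consistency with Cronbach’s alpha (see Tavakol & Dennick, 2011, in the references). The following is a minimal, illustrative sketch; it assumes responses are coded numerically (e.g. Strongly Disagree = 1 through Strongly Agree = 5) and stored as one list per questionnaire item, and the sample data are invented purely for demonstration.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)     # number of items
    n = len(items[0])  # number of respondents

    def variance(xs):
        # Population variance; the n vs. n-1 divisor cancels in the ratio,
        # so sample variance would yield the same alpha.
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    # Each respondent's total score across all items
    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Three items rated 1-5 by four respondents (illustrative data only)
print(cronbach_alpha([[4, 5, 3, 4],
                      [4, 4, 3, 5],
                      [5, 5, 2, 4]]))  # ~0.818
```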
Appendix C: INFORMED CONSENT FORM
Informed Consent Form

Date:
Study Name: Usability Comparison of Paper Procedures
Researchers: Mario Iannuzzi
Supervisor: Dr. Greg Jamieson

About the Research: Thank you for considering participation in this study! The purpose of this study will be to develop a method to compare the usability of paper procedures. In total, the time commitment for you will be roughly 5 hours. It will include:
• an introduction and a debriefing session;
• a walkthrough of four procedures in the plant where you will be asked questions about the procedure;
• two questionnaires;
• and a brief closing interview.
Potential Risks and Discomforts: You will be exposed to the plant environment; however, this should not bring you any more discomfort than you experience during your regular working periods.

Benefits of the Research and/or Benefits to You: This methodology will be among the first of its kind to compare the usability levels of paper procedures. It will benefit both the scientific human factors community and industries that are attempting to modify their procedures. As an operator at a corporation that has recently changed how the procedures are written, this research will benefit you by assuring you that the new procedures are indeed more usable.

Voluntary Participation and Withdrawal from the Study: Your participation in the study is completely voluntary and you may choose to stop participating at any time. Should you decide not to volunteer or to withdraw, this will not influence the nature of your relationship with the researchers or with the University of Toronto, either now or in the future. You can stop participating in the study at any time, for any reason, if you so decide. In that event, the data you will have produced up to that point shall be discarded at your request.

Confidentiality: All performance data and information you supply on the questionnaire during the research will be held in confidence; in other words, your name will not appear in any report or publication of the research. Your data will be safely stored in a locked facility and only research staff will have access to this information. Confidentiality will be provided to the fullest extent possible by law.

Questions About the Research? If you have questions about the research in general or about your role in the study, please feel free to contact Mario Iannuzzi either by telephone at (647) 986-6719 or by e-mail ([email protected]). You may also contact Dr. Greg Jamieson ([email protected]). This research has been reviewed by the University of Toronto’s Office of Research Ethics (ORE). If you have any questions about this process, or about your rights as a participant in the study, please contact the Office of Research Ethics at McMurrich Building, 2nd floor, 12 Queen’s Park Crescent West, Toronto, ON M5S 1S8, Fax: (416) 946-5763.

Signature: I, ____________________________, consent to participate in the experiment described above, conducted by Mario Iannuzzi. I have understood the nature of the project and wish to participate. I am not waiving any of my legal rights by signing this form, and I have been given a copy of it for future reference. My signature below indicates my consent.

Participant’s Name: ____________________________
Participant Signature __________________________________ Date _____________________________
Investigator’s Signature __________________________________