Usability of Paper-Based Industrial Operating Procedures
by
Mario Iannuzzi
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Mechanical & Industrial Engineering
University of Toronto
© Copyright by Mario Iannuzzi 2014
Abstract

Procedures are standardized lists of instructions that designate the safe and accepted way of
accomplishing a task. This study intended to develop and compare the usability of paper-based
industrial operating procedures. Two procedures at a plant were redesigned with evidence-based
guidelines and human factors input. 16 operators of varying experience were asked to read through
and assess the new and old procedures. The new procedures were rated significantly or moderately
better than their predecessors for efficiency, effectiveness, and subjective satisfaction. On average,
inexperienced operators reported fewer inaccuracies, more confusion, and higher workload ratings than
their experienced counterparts, regardless of procedure type or area. For satisfaction, experienced and
inexperienced operators reported similar ratings across both procedure types and areas. Future studies
should attempt to discern which particular change in the procedures contributed the most to increased
usability, and whether operator experience significantly correlates with usability ratings.
Acknowledgements

My first and deepest thanks go to my supervisor, Dr. Greg Jamieson. Throughout the period of this
thesis, he worked tirelessly to instill in me a deep sense of professionalism, and continuously guided
me through critiques of my work. His incredible patience throughout the process, despite his other
responsibilities, astounds me, and I’m truly glad I had the experience of being supervised by him.
My gratitude also goes out to the operators, managers, and engineers who I worked with at the plant in
order to get this work completed. Their participation and input was invaluable, and this work would truly
have been impossible without them.
I would also like to thank MITACS for their funding, as this thesis grew out of an internship program that
they sponsored.
I would like to express my sincere appreciation to my committee members, Dr. Alison Smiley and Dr.
Olivier St. Cyr, for their willingness to be on my committee and for their time; this thesis was greatly
improved by their input.
My sincere thanks also go to all of my lab mates in the Cognitive Engineering Lab and the other human
factors labs; I’d particularly like to thank Patrick Stahl and Lisa Min for constantly being there to trade
ideas, troubleshoot problems, and provide support.
Finally, I would like to express my appreciation for all of the constant morale upkeep, motivation and
encouragement given to me by my mom, Cristina Iannuzzi; my sister, Nadia; my brother-in-law, Tony
Criminisi; my brother, Enzo Iannuzzi; and my sister-in-law, Kim Iannuzzi. Most of all, I am forever
indebted to my girlfriend, Terri Mattucci, who has supported me every step of the way, and never let me
give up.
Table of Contents

Abstract
Acknowledgements
1 Introduction
2 Literature Review
  2.1 Introduction to procedures
  2.2 Procedure design: Challenges and guidelines
  2.3 Human factors and procedures
  2.4 Procedure culture, following and adherence
  2.5 Paper- and computer-based procedures
  2.6 Checklists
  2.7 Usability studies of other paper-based documents
  2.8 Summary
3 Procedure Redesign
  3.1 Motivation for Procedure Redesign
  3.2 Redesign Process and Results
4 Method
  4.1 Test design
  4.2 Independent Variables
  4.3 Dependent Variables
  4.4 Hypotheses
  4.5 Testing protocol
5 Results
  5.1 Demographics
  5.2 Effectiveness
  5.3 Efficiency
  5.4 Satisfaction
  5.5 Statistical Test Summary
  5.6 Stated Preferences
6 Discussion
  6.1 Effect of New Procedures
  6.2 Effect of Operator Experience
  6.3 Limitations
7 Conclusion and Future Work
References
List of Tables

Table 3-1 - Comparison of old procedure and new procedure features
Table 4-1 - Participating operator breakdown
Table 5-1 - Demographics Summary
Table 5-2 - Cronbach's Alpha for New Procedures
Table 5-3 - Cronbach's Alpha for Old Procedures
Table 5-4 - Summary of p-values for all usability measures
List of Figures

Figure 3-1 - Page from old cylinder filling procedure
Figure 3-2 - Part of the WDA for the cell room
Figure 3-3 - Task analysis of revised cell-dipping procedure
Figure 3-4 - Page from the new cylinder filling procedure
Figure 3-5 - Old procedure - equipment name only
Figure 3-6 - New procedure - Equipment name and ID
Figure 3-7 - Old procedure - Step-detail format
Figure 3-8 - New procedure - Task hierarchy
Figure 3-9 - Old procedure - Paragraphs
Figure 3-10 - New procedure - Individual steps with warnings and cautions
Figure 3-11 - Old procedure - Inconsistent step structure
Figure 3-12 - New procedure - Action-object step structure
Figure 3-13 - Old procedure - Rarely contained explanations
Figure 3-14 - Explanation embedded in warning and caution
Figure 5-1 - Mean Counts of Confusion
Figure 5-2 - Mean Counts of Confusion by Experience
Figure 5-3 - Mean Counts of Inaccuracies
Figure 5-5 - Mean Subjective Workload
Figure 5-6 - Mean Subjective Workload by Experience
Figure 5-7 - Mean Perception of Organization
Figure 5-8 - Mean Perception of Ease of Reading
Figure 5-9 - Mean Perception of Accuracy of Procedure
Figure 5-10 - Mean Perception of Ability to Complete Procedure Safely
Figure 5-11 - Mean Perception of Appropriate Amount of Detail
Figure 5-12 - Mean Subjective Satisfaction
Figure 5-13 - Mean Total Score from Operator Satisfaction Questionnaire
Figure 5-14 - Mean Score from Operator Satisfaction Questionnaire by Experience
1 Introduction

After the accident at Three Mile Island, procedural improvement became a higher priority in the nuclear
industry; the accident sparked a realization amongst industry experts that these action lists that prescribe the
standardized methods of achieving objectives are more complicated than they seem, and require
design consideration and analysis from many perspectives (Carvalho, Dos Santos, & Vidal, 2006; Dien,
Llory, & Montmayeul, 1992; Orendi, Petras, Lipner, Oft, & Fanto, 1988; Theureau, Jeffroy, & Vermersch,
2000; D. R. Wieringa & Farkas, 1991). The issues surrounding procedures are particularly complex in
this industry and other safety-critical domains; procedures can have disastrous consequences if
executed incorrectly, involve physical actions and phenomena that are difficult to describe, and must be written
with an appropriate amount of detail. These concerns are not minor ones, as it has been reported that
procedural faults are a factor in 69% of nuclear plant incidents (Goodman & DiPalo, 1991).
Several guidelines for the design of procedures in the nuclear industry have been developed in an
attempt to alleviate some of these concerns. They include items such as having multidisciplinary writing
teams, being willing to rely on operators’ knowledge, and using writing guides to keep procedures, their
format, and their language consistent (Bullemer & Hajdukiewicz, 2004; D. R. Wieringa & Farkas, 1991).
Additionally, human factors-specific guidelines have been suggested to improve the usability of written
procedures, such as treating each page as a display, keeping information in small blocks, and using the
learned expectations of the operators as the template for the procedure (Luna, Sturdivant, & McKay,
1988).
ISO-9241 defines usability as the “extent to which a product can be used by specified users to achieve
specified goals with effectiveness, efficiency, and satisfaction” (ISO, 1998). If operators perceive that
their procedures are inefficient, ineffective, or unsatisfactory, they are less likely to use them.
Presumably, deliberately not following an incorrect procedure would be judged a logical action; however,
the culture that exists in some organizations in the industry causes operators to be concerned
about the consequences that they may face for such a purposeful violation.
Experts in the procedure-writing field must be convinced of the beneficial impact of usability on
their products. For this to happen, these experts must have solid evidence that human factors
interventions are a necessary part of the procedure development process, and that they should be
integrated into it. This is particularly true for procedures that are used under normal conditions; as they
are used frequently, errors in their implementation or use might cause abnormal operating conditions
(Carnio, 1980).
Although some nuclear plants may be migrating their procedures to computer-based procedures, most
of them still use the traditional paper ones (Oxstrand, Le Blanc, & Hays, 2012). Thus, it seems peculiar
that despite the existence of human factors guidelines for procedure design, and despite the fact that usability
comparisons have been performed between computer-based procedures and paper-based procedures,
no usability comparison of the design features of paper operating procedures for normal conditions has
been completed. Consequently, this thesis aims to be an exploratory study in this area.
This thesis directly compares the usability of two existing paper industrial operating procedures with
versions that were redesigned with the input of evidence-based document design principles and human
factors processes. The findings illustrate that the new procedures received equal or higher ratings for
effectiveness and efficiency and elicit higher satisfaction from the operators.
2 Literature Review

The majority of the existing procedure literature revolves around the topics of procedure design
challenges, computer-based procedures, and human factors design guidelines. While the focus of this
thesis is procedure usability, a cursory exploration of these other topics is necessary.
Although there are a number of articles that focus on the topics of procedural adherence, operator
autonomy, and checklist design, most of this subject matter lies outside the scope of this thesis.
2.1 Introduction to procedures

Two definitions of procedure from the literature are as follows:
- “[Procedures are] Prescribed action lists to help operators remember and follow mandatory
steps that guarantee safety, workload and performance criteria.” (Boy & De Brito, 2000)
- “Procedures indicate to the human operator the manner in which operational management
intends to have various tasks performed. The intent is to provide guidance…to ensure a logical,
efficient, safe, and predictable (standardized) means of carrying out…objectives.” (Degani &
Wiener, 1994)
The definition of a safe behaviour may help to clarify and reinforce the above definitions:
- “[A safe behaviour is] … one consisting in avoiding any action that might be detrimental to the
plant safety” (Dien et al., 1992, p. 178)
Most sources in the literature point to the Three Mile Island incident as being the inciting event that
brought procedural improvement to the forefront of the nuclear industry, particularly in the emergency
operating procedure (EOP) area (Carvalho et al., 2006; Dien et al., 1992; Orendi et al., 1988; Theureau
et al., 2000; D. R. Wieringa & Farkas, 1991).
2.2 Procedure design: Challenges and guidelines

There are several instances in the literature of studies that quantify the extent and impact of procedural
problems. For example, a field study was conducted at five refining and chemical sites to understand
factors that impact the success of procedural operations and to develop recommendations to increase
that success (Bullemer & Hajdukiewicz, 2004). The study noted that up to 30% of all reports of failed
operations had procedural operations listed as a cause, and suggested that this same cause was
behind up to 8% of reported financial losses.
Ockerman and Pritchett (2000) cited two studies about procedural problems. The first, a study by the
Institute of Nuclear Power Operations (INPO) (1986), reported that of the 48% of incidents initially
assigned to “failures of the human factor,” almost 65% involved a deficiency in a procedure. The
second study looked at several hundred incidents and concluded that procedural faults were involved in
69% (Goodman & DiPalo, 1991). Though the exact causes were not listed, some of the general
possibilities included: the operator having a different intention than the procedure they were using; the
range of the procedure differing from the environment or the capabilities of the operator; and the
set of actions included in the procedure being inaccurate.
What design challenges lead to these deficiencies in procedures? In order to communicate, share, and
remember procedures, they must be written down (Ockerman & Pritchett, 2000). However, the people
who are tasked with writing these procedures in nuclear power plants (NPPs) may face a litany of difficulties (D. R.
Wieringa & Farkas, 1991). The system that they are attempting to document is very complex, consisting
of many parts and intricate subsystems. They must grapple with unique interface challenges such as
describing, in writing, physical actions or nuances particular to their specific plant. For example, a
procedure writer in a chemical plant may have to describe at what speed an operator should close a
valve. The procedures that they are writing will sometimes be used in adverse conditions, or perhaps
by a team that is coordinating a complex task. Additionally, a procedure writer must attempt to design
procedures whose comprehensiveness, accuracy, and detail match the needs of the worker that will be
using them (Ockerman & Pritchett, 2000). In particular, a balance between having too much detail and
being too general is very difficult to maintain (Dien, 1998; Boy & De Brito, 2000). This difficulty can be
exacerbated by variability in the competency levels of operators, and the complexity of the system.
To combat these multiple difficulties, several approaches to procedure development have been
discussed (D. R. Wieringa & Farkas, 1991). These include writing procedures in multidisciplinary
teams; keeping a rigorously consistent format; fostering an understanding that the designer will have to
rely on the operator’s prior knowledge; supporting group tasks with customized procedures that provide
different versions for different operators working on the same task; and validating through
user testing with simulators and cognitive walkthroughs.
To foster and maintain the consistent format mentioned above, a writer’s guide should be compiled to
lay out guidelines and maintain procedure quality (Bullemer & Hajdukiewicz, 2004). It should be noted
that one must take caution when putting these writer’s guides together. They are, in essence, a
procedure on how to write procedures. As such, they should still be written with the operator in mind,
and not just the procedure writer (Caccamise & Mecherikoff, 1993); to wit, the end language should
contain terms that the operators are familiar with and have been trained on (Carvalho et al., 2006).
Part of maintaining a consistent format is selecting a layout for the procedures. Wieringa and Farkas
(1991) illustrated and discussed some common examples of procedural layouts, including a two-column
text format that distinguishes between general and highly detailed instruction, and a graphical flowchart
format. While the two-column text format reduces reliance on operator knowledge, operators may become
“lost” in the procedure while attempting to troubleshoot or diagnose issues. The graphical flowchart
format prevents operators from becoming lost because of its helpful flow-lines, but takes up more space,
is difficult to follow over multiple pages, has less space for text, and cannot easily show the hierarchical
information that is usually necessary in procedures.
2.3 Human factors and procedures

While procedures are necessarily designed with consideration of the constraints of the system itself, they
sometimes overlook the cognitive characteristics of the operators (Dien, 1998). There are some articles
that explicitly deal with human factors guidelines of procedures in their different forms. Overall, the
primary goals of applying human factors to procedures are as follows:
o to ensure that operators can carry out the procedures without overloading themselves;
and,
o to ensure that procedures are structured in a way that is “easy to understand and
follow” (Niwa, Hollnagel, & Green, 1996).
To improve the usability of written procedures, human factors principles should be taken into
consideration (Luna et al., 1988). Luna et al. list these principles:
• each page should be treated as a display;
• information should be kept in small blocks;
• information should be presented consistently;
• the expectations that the operator has learned while working in the physical plant should serve
as the template for the procedure; and,
• the procedure should match the physical features of the plant.
Human factors considerations have also been proposed for writing emergency response
guidelines that might allow operators to diagnose and recover from beyond design-basis or
low-probability events (Orendi et al., 1988):
• Steps should be concise;
• any extraneous information should be minimized;
• font emphasis should be used consistently;
• steps should be simplified and standardized; and,
• the vocabulary should be kept specific and consistent.
Sharit discussed a contextual modeling approach to human and system reliability analysis (HRA), and
its usefulness for procedural design applications (Sharit, 1998). He noted that in HRAs, the traditional
way of dealing with procedures has been to treat them as performance-shaping factors. He then argued
that HRA should instead be applied to the details of written procedures in order to determine the
contexts of work that might be linked to human error. With this in mind, he suggested several general
guidelines for the writing of procedures from an HRA perspective. At their highest level, these
guidelines consisted of the following:
• explore possibilities for errors in the form of slips and lapses due to factors occurring at the
skill-based level of performance;
• explore possibilities for mistakes due to factors occurring at the rule-based level of performance;
• explore errors that could arise during contingency operations related to performing work
procedures;
• explore errors that may arise from shift changeover; and,
• explore errors that may arise from activities involving communication.
These guidelines were subject to several caveats: the guidelines will be more useful to procedure
designers who are knowledgeable in HRA; the way the guidelines are used will be dependent on
whether they are being applied to procedures that are in use or procedures that are being planned; and
the guidelines should and could be expanded and cross-linked with industry-specific knowledge.
Burks and Peres attempted to design a procedure classification rubric by determining elements that
contribute to cognitive complexity in procedures (Burks & Peres, 2011). The rubric was based on
Campbell’s (1988) objective complexity framework, and on the assumption that the attributes of a
procedure are what determine its complexity (as opposed to how many steps it contains). Elements that
were thought to contribute to complexity included decision points and concurrent procedural operations.
Though the rubric depicted what contributed to cognitive complexity, it could not be practically used;
every procedure that was reviewed had three or more decision points, and thus all would be labeled at
the highest level of the rubric.
2.4 Procedure culture, following and adherence

In addition to the above-mentioned difficulties, there are several more nuanced factors that affect the
use of procedures that designers must take into account. After a study completed at four petrochemical
refineries in North America, several issues with procedural culture came to light (Jamieson & Miller,
2000). Procedures were rarely seen in use for several reasons: operators only checked procedures
during complex or infrequent tasks; they thought they knew them well enough to ignore them; and
sometimes they were too difficult to understand or retrieve. Also, there was a perception among the
operators that the procedures were likely out of date.
Even if operators think the procedures are too difficult to understand or that they are out of date, they
must think about the consequences that may be handed down from management if they attempt to
deviate from a procedure, particularly if their efforts fail. Some operators have said that due to potential
consequences, they would rather follow a procedure that they know is incorrect or inefficient rather than
disregard it (Dien et al., 1992). Clearly, there is a difficult balance that operators must maintain: they
must follow procedures in order to maintain safety, reliability, and consistency (and to avoid negative
judgment and consequences from management); at the same time, they must stay critical of the
procedure for the same reasons (Theureau et al., 2000).
The concept of “controlled initiative” dictates that non-adherence to a procedure should not inherently
be judged as incorrect or punishable; rather, its motive and results should be examined and weighed
(Dien, 1998). This would hopefully lead to an atmosphere that would allow operators to overcome the
fear expressed above. However, this approach is not without its dangers; the more that operators
deviate from procedures, the higher the probability that they will commit an error.
Procedure writers and designers must maintain consistency in their work; if errors and inconsistencies
are present in procedures, the procedures may not make sense to operators, and this could lead to
both a lack of confidence in the designer and lower procedure adherence overall (Sharit, 1998).
However, if procedures seem logical to the operators, they are more likely to adhere to them. Thus,
procedures should be written in such a way as to help their users understand the rationale behind the
steps and criteria (Boy & De Brito, 2000; Carvalho et al., 2006; Kontogiannis, 1999).
2.5 Paper- and computer-based procedures

Some nuclear and petrochemical plants are moving away from paper-based procedures (PBPs) and
switching to computer-based procedures (CBPs), as CBPs can be seen as a solution to some of the
drawbacks of PBPs (Bullemer & Hajdukiewicz, 2004). For example, PBPs can become very large as
the complexity of the procedure grows; thus, they become very difficult to move due to their volume,
especially on control desks (Kontogiannis, 1999; Niwa et al., 1996; Ockerman & Pritchett, 2000). During
a special inspection program for emergency operating procedures (sponsored by the US Nuclear
Regulatory Commission), it was found that PBPs were inadequate at instruction presentation and
cross-referencing to other procedures (Lapinsky, 1989). They are also highly inflexible, insofar as they
are very difficult to adapt to dynamic situations (Niwa et al., 1996). For example, with
CBPs, operators can “collapse” or “hide” sections of a page that are not relevant to their current situation.
Moreover, in PBPs, information is fixed in a sequential form, and the cautions/warnings they present
may not always be accurate for the current plant state (Boy & De Brito, 2000; O’Hara, Higgins, &
Stubler, 2000). Navigation through PBPs may not be simple, especially if sequential steps are not in the
same list, or if they must be cross-referenced (Ockerman & Pritchett, 2000). Finally, PBPs almost never
anticipate interruptions in their design. A study showed that when pilots are interrupted, it usually leads
to omissions in the procedure that they were carrying out (Boy & De Brito, 2000).
Niwa et al. (1996) discussed the development of human factors guidelines for computerized emergency
operating procedures. Recommendations for CBP implementation included formats with fixed,
well-defined fields; the avoidance of overlapping windows; the representation of individual sub-steps; and the
minimization of the navigation required of the operator. Additionally, notes and cautions in CBPs
should make use of borders and colours to remain distinct from steps, and should also be kept near
them; this helps to make them salient and to keep the operator from overlooking them (Kontogiannis,
1999).
There have been several studies that attempted to compare CBPs and PBPs. A study of the
computerization of NPP procedures showed a balance of advantages between CBPs and PBPs
(O’Hara et al., 2000). Although CBPs exhibited easier data retrieval and faster completion times than
their paper-based counterparts, they also had higher complexity and attentional demands. One
important conclusion that was drawn from this study was that even if CBPs are implemented to
enhance the operator experience, PBP backup systems must be in place in case of any technological
failure.
Kontogiannis (1999) reviewed several studies that evaluated the usability of computerized emergency
operating procedures. They were all pilot studies that only tested a few crews in a small number of
emergency scenarios. Though the small sizes of these studies prohibited statistical analysis, it was
observed that crews that used CBPs made fewer errors, but, in contrast to the study discussed above,
took longer to bring the plant back to normal operational status than crews using the PBPs. Though it is
difficult to say for certain whether these values can be directly compared, the percentage loss in
completion time in these studies was much smaller than the percentage gain in accuracy from using the
CBPs. Based on operator comments, it would seem that the main issues with the CBPs that led to the
slower completion times were the design of the interface, the navigation, the formatting of the on-
screen procedures, and the small amount of information on each screen relative to a full page in a
traditional PBP. On the other hand, another study showed that computer-based cockpit procedures
were rated higher and were performed more quickly and more accurately than their paper-based
counterparts (Shamo, Dror, & Degani, 1998).
What if the appropriate guidelines are applied to both media? A battery of studies indicated that if
guidelines on information readability, content, and organization are followed and applied to both the
PBP and CBP version of procedures, the CBP versions were rated only slightly higher (Patel, Drury, &
Lofgren, 1994).
Theureau et al. (2000) performed an empirical study of operator activity in different accident scenarios
on a simulator. It was discovered that the step-by-step nature of the CBP used in
that case reduced the operator’s awareness of the overall evolution of the procedure. This, in turn,
made it difficult for the operator to carry out spontaneous adjustments. Additionally, it was difficult for
the operators to estimate the overall effect of the spontaneous adjustments because of this reduced
awareness. The operators found it easier to read ahead with the PBPs to gain an overall view of the
procedure. An older study discovered a similar issue (Elm & Woods, 1985); it also noted that when the
CBP was redesigned with this in mind using an “electronic book” metaphor, the effect of the step-by-
step nature on the operator’s awareness of the overall view of the procedure was successfully mitigated.
A study performed in a control room simulator explored the effectiveness and usability of CBPs
(Converse, 1994). It involved eight pairs of operators controlling a scaled pressurized water reactor
facility. Each team performed both normal and accident scenarios with both the CBPs and traditional
PBPs with the order of CBP and PBP and scenario types counterbalanced. The number of errors
committed, times to initiate and complete procedures, and subjective workload estimates (using the
NASA-TLX) were recorded. Under normal operating conditions, the procedure type did not significantly
affect the performance measures (errors or times). However, the individual NASA-TLX dimensions
showed that operators were significantly more confident of their appropriate accomplishment of the task
goals with the CBPs than the PBPs. They attributed this to the CBPs structuring their responses more
rigidly, making it less likely for them to skip a step. In the accident scenarios, there was a significant
effect of procedure type on response initiation time, but no effect was found on completion time; the
PBPs had a faster initiation time than the CBPs, but this made no difference in how long the procedures
actually took to complete. Additionally, the use of PBPs resulted in four times as many errors as the
use of CBPs. This led to a significant interaction between procedure type and task type; there were
many more errors for PBPs as compared to CBPs in accident scenarios, but not in normal scenarios.
Thus, future evaluations of CBPs should be completed with a variety of scenario types.
Now that I have reviewed multiple studies about the usability of CBPs, and about usability comparisons
between PBPs and CBPs, the remaining gap becomes clear: there are no usability comparisons of
two PBPs.
2.6 Checklists

The purpose and use of checklists are similar enough to those of procedures that some of the literature
on checklists and their usability pertains to procedures as well. Like procedures, they are used to
help operators remember and follow steps in processes. However, while procedures are detailed
instructions to follow to achieve a goal, checklists are shorter lists intended to
prevent the forgetting of an intended action, and usually require a confirmation of each action taken
(Lehto & Salvendy, 1995).
Degani & Wiener began their study of checklists by concentrating on the human factors of a paper
checklist as a display (Degani & Wiener, 1990). They used field studies, interviews with pilots, accident
reports, interviews with government agencies, and information from aircraft manufacturers and the
general literature in order to understand the way that flight crews use checklists. A key finding was that
pilots did not strictly use the checklists as their designers intended. Pilots “short-cut”
the checklists by calling several challenges out in one chunk and waiting for the other pilot to reply with
their “chunk” of responses. The intended use of the checklist was to have step-by-step challenges and
responses. In addition to this, they also discussed issues relevant to checklist design such as task
analysis, operational logic, sequencing, and duplication.
To compare which components in a checklist most effectively contributed to detection of medicine
errors at a patient’s bedside, two different checklists (an old and a new) were compared in a high-
fidelity usability lab (White et al., 2010). Human factors engineers designed new checklists after
observing nurses use the old checklist; the new checklists were meant to mitigate issues that the
engineers noted while observing the nurses. Ten nurses were each asked to check fourteen infusion
pumps in a within-subjects design that counterbalanced errors between participants and the order of
checklist used. The use of the new checklists resulted in the discovery of significantly more errors than
the old checklists. Specific instructions were found to be more effective in aiding the detection of errors
than general reminders. While this was a comparison of two paper documents, they were checklists,
and not procedures.
Much like PBPs, paper checklists have known shortcomings. A study involving four four-person flight
crews that compared electronic and paper checklists outlined these imperfections (Palmer & Degani,
1991). The errors included operators losing track of which step they were currently performing, skipping
steps due to interruptions or distractions, forgetting to return to a skipped item, and incorrectly marking
items as completed.
The performance of different types of electronic checklists has been compared against each other and
against performance with a paper checklist (Mosier, Palmer, & Degani, 1992). Twelve two-person crews
flew a full mission simulation while being randomly assigned to three different groups: automatically-
sensed checklists, manually-sensed checklists, or paper checklists. Performance from memory was
also compared with performance from the checklists. The major dependent variable that was recorded
and analyzed was the handling of engine problems that were introduced in an emergency condition.
The small sample size (due to the between-subjects design) made statistical analysis inappropriate, but
trends in the data suggested that crews that erroneously shut down an engine tended to be those that
were initiating engine shutdown from memory, and that were using electronic checklists (5 of 8); these
pilots also recorded the highest workload on their subjective workload assessments. The crews that
took the correct course of action were the ones with the “less automated” checklists; that is, they did not
initiate any of the checklist items from memory, and they were using paper checklists (3 of 4).
Presumably, the crews that had to take the time to retrieve paper checklists had more time to discuss
and process the cues that were presented and were less likely to respond incorrectly.
2.7 Usability studies of other paper-based documents

If usability studies and evaluations of paper-based documents other than procedures have been
completed, why haven’t any been completed for PBPs? Though the scope of this project is limited to
paper-based procedure usability, usability studies of other paper-based documents across a variety of
domains are briefly discussed below.
A one-group pretest-posttest design was used to compare personal digital assistant (PDA) and
paper-and-pencil tests, measuring quiz scores, mental workload, completion time, and user
satisfaction (Segall, Doolen, & Porter, 2005). The study found that students completed the tests more
quickly with the PDA, but no differences were found in quiz scores, subjective workloads, or satisfaction.
While this study was not concerned with procedures, it used the same categories for usability that were
used in the current study: effectiveness, efficiency, and satisfaction.
In the healthcare domain, a usability study was conducted to compare the effect of evidence-based
document design principles on patient information leaflets (Pander Maat & Lentz, 2010). A total of 154
users tested three leaflets before their revision, and 164 tested them after. Dependent measures
included the success rate and time necessary for users to find particular information, comprehension,
and perception of usability. After correcting for literacy differences between the groups, it was shown
that the revisions led to significantly better localization performance, comprehension, and improved
appreciation of the material. A follow-up study was performed that determined that the gains listed
above are independent of the A4 paper formats that were originally tested; thus, it is possible that these
results could be generalized over many document formats, including procedures.
2.8 Summary

I have reviewed some of the literature surrounding procedures, including design challenges and
guidelines; human factors issues; culture and adherence; medium of transcription and use; checklists;
and usability studies of other paper documents.
Several gaps exist in the procedure usability literature that has been reviewed here. Though guidelines
for procedure design, writing and formatting have been briefly discussed, few of them are evidence-
based. Although a larger discussion of this is warranted, it falls outside the scope of this thesis.
Additionally, several studies have been completed comparing the usability of CBPs and PBPs, but none
that compared the usability of PBPs against other PBPs based on their design features. Many of these
studies focused on emergency operating procedures, whereas only a few focused on procedures used
under normal conditions. Finally, though there has been one usability study of a paper-based checklist
before and after human factors intervention, a similar study of a PBP does not exist.
In order to fill the gap described above, this thesis aims to complete a usability comparison of two
existing paper industrial operating procedures with versions that were redesigned with the input of
evidence-based document design principles and human factors processes.
3 Procedure Redesign

I performed a comparison of paper industrial operating procedures at a plant that refines low-level
radioactive materials. Due to the nature of the work, the facility falls under the governance of the
Canadian Nuclear Safety Commission (CNSC), and follows similar rules to NPPs. Though the plant
does contain a control room, the majority of the operations that are carried out are field operations.
In the fall of 2009, several reportable incidents occurred at this plant. Upon examination of these
incidents, operating procedures were cited as contributing causal factors. This prompted management
to request an internal audit to review the status of operating procedures. The audit resulted in several
recommendations, including the creation of a procedure style guide and the updating of the plant’s
procedures.
I was one of three consultants who were asked to redesign the procedures at this plant from a human
factors perspective. At the time, each of us was responsible for procedures in a different area. The
plant had more than 500 individual procedures, in addition to short reminder job aids that were posted
at key points throughout the plant. At the time of writing of this thesis, only a few procedures had been
rewritten. As will be discussed in more detail below, we created a new style guide and procedure format,
and conducted an operating experience review, function analysis, and task analysis for each procedure.
3.1 Motivation for Procedure Redesign

The existing format of the procedures was a two-column action-detail (or, in this case, step-comment)
format. An example of a page from the old cylinder filling procedure can be seen below, in Figure 3-1.
The image is modified to protect proprietary information.
Figure 3-1 - Page from old cylinder filling procedure

Although the existing procedures were familiar to the operators and relatively short (e.g., the example
procedure above was only three pages in length), the internal audit revealed several weaknesses that
may have contributed to the reportable incidents, including the following:
A) The procedure format was inconsistent;
B) Procedures and/or job aids were not being used on a daily basis;
C) Relevant information relating to the execution of the procedures was missing;
D) Procedures were not easily accessible by operators when performing tasks.
From the above list, the first three points were the ones most impacted by this human factors project.
We proceeded to review incident reports, review several of the procedures, and interview several
operators to start our analysis of the procedures. This is discussed in further detail in the section below.
We discovered issues similar to the ones discussed in the audit:
A) The procedures were inconsistent in their vocabulary; and,
B) Operators felt steps that were necessary and typically performed were missing from the written
procedures; thus, the procedures were incomplete.
3.2 Redesign Process and Results

A review of the literature revealed that there were few clear evidence-based guidelines indicating how
to construct an improved, consistent procedure template. In order to fulfill this goal and to create a new
style guide and format for the procedures, we consulted the sources that did offer evidence-based
guidelines: an industry-standard book on procedure writing (D. Wieringa, Moore, & Barnes, 1998), the
Nuclear Energy Institute’s procedure writers’ manual (Nuclear Energy Institute, 2006), and the Institute
of Nuclear Power Operations (INPO) guide to procedure use and adherence (Institute of Nuclear Power
Operations, 2009). Additionally, we consulted industry professionals, applied human factors heuristics
(including, among others, consistency, standardization, and flexibility and efficiency of use) and
professional judgment, and attempted to incorporate, as a guideline, whatever the operators responded
to positively.
Next, a human factors process to redesign the content of the procedures had to be completed, tested
and implemented. The process was based on that found in the United States Nuclear Regulatory
standard, NUREG-0711 (O’Hara, J.M. & Higgins, J.C., 2004). First, we conducted an operating
experience review (E. Davey, personal communication, May 13, 2013) that included interviews with
operators and a review of incident reports. As the area that I worked on procedures in was the cell
room, my interviews included interviews with operators from that area of the plant, and a review of cell
room incidents. Next, we completed a function analysis; this was a hierarchical representation of
system functions that allowed us to understand the system capabilities. For the cell room, I chose to
use work-domain analysis (WDA), and an abstraction hierarchy (AH) as the tool for the function
analysis. This allowed me to gain an understanding of what pieces of equipment comprised the cell
room, and why and how each of the pieces and processes were connected in order to achieve the goal
of the cell room. Part of the result of the WDA can be seen in Figure 3-2, below.
Figure 3-2 - Part of the WDA for the cell room

[Figure: part of the work-domain analysis, linking the Port Hope Conversion Facility to the cell room and its F2 and H2 subsystems, together with supporting equipment such as the rectifiers and rectifier cooling system, AHF storage and receiving, surge tanks, filters, and the carbon anode and carbon steel cathode.]
After that was completed, we performed a task analysis. To inform the task analysis, we performed
heuristic and experienced-based reviews of existing procedures in order to identify issues in operation.
Additionally, we observed operators completing each procedure at least twice. These were naturalistic
observations in which we avoided interrupting or asking questions of the operators; while the operators
were aware of our presence, we assured them that we were merely trying to gain an understanding of
the procedures, and not to judge their performance. Additionally, an interview with the system engineer
was completed. These activities allowed us to ensure that we had an essential understanding of the
plant, the procedures, and their purpose. In my case, they allowed me to better understand what pieces
of equipment were being referred to in the cell room. Additionally, observing the operators while I was
reading the procedures began to give me a sense of where the existing procedures may have been
missing some actions.
This was followed by at least two operational walkthroughs of the task with operators on different crews.
While the previous observations were naturalistic in nature, these walkthroughs were more intrusive:
we asked the operators to speak aloud and explain to us why they were performing particular actions.
Additionally, we asked them to tell us when they came upon a point in the procedure where they
thought a step was missing or inaccurate. Their practices were recorded in detail, along with the
rationale for the way certain things were performed. For the cell room procedures, this allowed me to fill
in many of the actions that were missing from the old procedures, and helped me to further understand
why the operators were performing the actions that they were. Additionally, the rationale served to fill
some of the details, notes, cautions and warnings that were placed in the new procedure.
We placed all of this information in task analysis tables, and held follow-up discussions with operating
staff and area engineers to clarify any remaining issues. For example, when different operators (or
different crews) were observed performing steps or actions differently, we consulted with shift
supervisors and area engineers to determine which of these methods would be the safest and most
resistant to potential errors. Once the steps were placed into the task analysis tables, it became easy to
see the hierarchy and the natural groupings of actions; see Figure 3-3 below for an early version of the
hierarchical task analysis of the cell room procedure that I worked on redesigning. The highlighted cells
represent areas where different operators gave conflicting accounts of how their crews performed
certain actions.
Figure 3-3 – Task analysis of revised cell-dipping procedure
Next, we organized the task analysis into the proposed procedure sequence and discussed it with the
shift supervisors and area engineers to determine whether any important or critical steps were missing,
placed in the wrong order, or had excessive (or insufficient) detail. We then took the task analysis and
created a draft procedure in the new format. The draft was distributed to all of the plant’s crews for
review and comment for two iterations. The iterative process helped the teams reach consensus with
each other and with area engineers on points of contention. Finally, we took the draft procedure to the
management team for review and signoff acceptance. The procedure was then introduced into service
and the operators and training staff were informed of the new procedure. Any new training needs were
identified and communicated to the Training Department.
An example of a page of the new cylinder filling procedure (one of the procedures used in the usability
testing) can be seen in Figure 3-4 below. The new procedures contain more steps that operators
perform (as identified through the task analyses); are structured in local groups of hierarchical actions
that were driven by the task analysis; are based on a style guide that enforces consistent vocabulary
and formatting; and contain new, relevant information in the form of clearly demarcated details, notes,
warnings, and cautions. The length of the procedures also grew (e.g., the example cylinder filling
procedure expanded from 3 pages to 14 pages).
Figure 3-4 - Page from the new cylinder filling procedure
Some direct comparisons can be made to illustrate the differences between the old and the new
procedures. For example, Figure 3-5 below shows how, in the old procedures, equipment was
identified by functional name only; this would have made it difficult to find a piece of equipment for
an operator who knew its identifier number but not its appearance. In the new procedures, both
functional names and device identifiers were included (Figure 3-6).
Figure 3-5 – Old procedure - equipment name only
Figure 3-6 - New procedure - Equipment name and ID
Figure 3-7 below shows an example of how the old procedures were written in a two-column, step-
detail format. Though the intention here was to always have steps in the left column and details in the
right column, they were sometimes mixed up. On the other hand, as can be seen in Figure 3-8 below,
the new procedures were revised with the output from the task analysis in order to create a task
hierarchy, numbering scheme, and local groupings of actions.
Figure 3-7 - Old procedure - Step-detail format
Figure 3-8 - New procedure - Task hierarchy
Whereas the old procedures were written in paragraphs (Figure 3-9), which could have sometimes
made it difficult to locate and isolate steps in the block of text, the new procedures eliminated
paragraphs and wrote out individual steps instead (Figure 3-10). Additionally, important warnings and
cautions were highlighted according to ANSI guidelines.
Figure 3-9 - Old procedure - Paragraphs
Figure 3-10 - New procedure - Individual steps with warnings and cautions
While the old procedures sometimes displayed an inconsistent step structure that (as previously
mentioned) sometimes had comments in the step column (or multiple steps written together) (Figure
3-11), the new procedures consistently followed an action-object step structure, with the lowest-level
actions capitalized (Figure 3-12).
Figure 3-11 – Old procedure - Inconsistent step structure
Figure 3-12 - New procedure - Action-object step structure
Finally, while the old procedures rarely contained explanations for the actions contained within, the new
procedures embedded these explanations for actions directly within highlighted details, notes, cautions,
and warnings. In the example illustrated below, while the old procedure mentions the mandatory use of
the automatic valve closer (Figure 3-13), it does not detail why it should be used, or what might happen
if it is not used correctly. These explanations can be seen in the new procedure (Figure 3-14).
Figure 3-13 - Old procedure - Rarely contained explanations
Figure 3-14 - Explanation embedded in warning and caution
Table 3-1, below, summarizes some of the differences in the characteristics of the old procedures and
the new procedures.

Table 3-1 - Comparison of old procedure and new procedure features

Old procedures: Equipment is identified by functional name only, not by name and device identity.
New procedures: Where possible, both functional names and device identifiers were included in the revised procedures.

Old procedures: Two-column, step-detail format; not sub-divided into logical groupings of actions.
New procedures: The task analysis output a task hierarchy, which was used to create a numbering scheme and local groupings of actions.

Old procedures: Written in paragraphs.
New procedures: Paragraphs eliminated in favour of individual steps; warnings and cautions highlighted consistent with ANSI guidelines.

Old procedures: Inconsistent step structure.
New procedures: Action-object step structure, with lowest-level actions in capital letters.

Old procedures: Rarely contained explanations for actions.
New procedures: Explanations for actions embedded in cautions and warnings.
4 Method

I wanted to evaluate whether the operators would rate the usability of the procedures based on the new
style guide, format, and development process higher than that of the old procedures.
In the plant where this research was performed there are several different working areas, each with its
own set of procedures. I wanted to ensure that the results obtained were not biased by one particularly
well- or poorly-written procedure, or by one relatively difficult or simple area. Thus, I decided that it
would be best to test at least two different revised procedures from at least two different areas of the
plant. For the same reason, I chose two procedures that had been created and updated by two
different human factors engineers. I avoided the area in which I performed the procedure redesign
(the cell room) in an attempt to avoid any bias on my part.
The two procedure areas that I selected were the flame reactor and cold trap area (FR), and the
cylinder filling (CF) area. These areas were selected because they involved the least amount of
equipment and would require the operators to move around the least. This would make it easier for the
experimenter to track the actions and movements of the operators. Thus, in total, four procedures were
used during the tests: two procedures from CF (one old and one new), and two procedures from FR
(one old and one new). The old CF procedure was three pages long, while the new procedure is 14
pages long; the old FR procedure was two pages long, while the new FR procedure is eight pages long.
The operators are classified by qualification levels between 0 and 3 in each area of the plant based on
the number of supervised hours of work in the area and the amount of training they’ve undergone (D.
Perry, personal communication, July 16, 2013). The definitions of the levels are given below for
clarification:
Level 0 (L0) – new operator, yet to complete common and area-specific training components (though
they have started training in the area)
Level 1 (L1) – operator is not authorized to operate equipment or perform job tasks/procedures unless
under the direct supervision of a qualified person until the operator has successfully completed the
performance evaluation for the given task/procedure
Level 2 (L2) – able to perform all tasks and procedures independently under normal supervision but
with a qualified operator available for guidance
Level 3 (L3) – qualified and able to run the given area independently under normal supervision.
The plant has four different crews that work on a shift schedule. Sixteen operators were selected for the
study based on a range of experience and qualifications. I wanted half of the operators to be
experienced, and half to be inexperienced. See Table 4-1 below for an illustrative breakdown of the
operator selection. Four of these operators were qualified as L3 operators in the CF area and at least
two other areas (with the exclusion of FR), four were L3 operators in FR and at least two other areas
(with the exclusion of CF), and eight operators were less than or equivalent to L0 in the CF and FR
areas (and L3 in one other area or less). For reference, I will refer to the first group as “Experienced
C/F”, the second group as “Experienced F/R”, and the third group as “Inexperienced”.
Table 4-1 - Participating operator breakdown

Operators 1-4 (Experienced C/F): Cylinder Filling L3; Flame Reactor < L0; L3 in ≥ 2 other areas
Operators 5-8 (Experienced F/R): Cylinder Filling < L0; Flame Reactor L3; L3 in ≥ 2 other areas
Operators 9-16 (Inexperienced): Cylinder Filling ≤ L0; Flame Reactor ≤ L0; L3 in ≤ 1 other area
4.1 Test design

I used a within-subjects counterbalanced design for this study. I chose a within-subjects design to
accommodate the potentially low number of participants; it was expected that only a limited number of
operators would have the time available to participate in the study, and fewer still of those willing to
participate would fit the selection criteria. This was an attempt to avoid the pitfall of not having enough
subjects for a statistical comparison in a between-subjects design, as experienced in a 1992
comparative study of checklist performance (Mosier et al., 1992). Additionally, the within-subjects
design should control for individual variability between operators. To counter any learning or order
effects, I counterbalanced the study in several ways: the order in which the two versions of each
procedure were presented, and the order in which the two areas were presented, were counterbalanced
across crews and across qualification-level categories.
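Crossing the two order factors yields four presentation sequences, and rotating participants through them is one simple way to realize this kind of counterbalancing. The sketch below is purely illustrative (the rotation scheme and labels are assumptions, not the exact assignment used in the study):

```python
from itertools import product

# Two counterbalanced factors: which area is presented first, and
# whether the old or new version of each procedure is read first.
area_orders = [("CF", "FR"), ("FR", "CF")]
version_orders = [("old", "new"), ("new", "old")]

# Crossing the factors gives four presentation sequences.
conditions = list(product(area_orders, version_orders))

def presentation_order(participant):
    """Rotate participants through the four sequences (illustrative)."""
    areas, versions = conditions[participant % len(conditions)]
    return [f"{area} ({version})" for area in areas for version in versions]

# Participant 0 would read CF (old), CF (new), FR (old), FR (new).
print(presentation_order(0))
```

With 16 participants, each of the four sequences would be seen by four operators, balancing any learning or order effects across conditions.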
4.2 Independent Variables

The independent variables in this study were the procedure type (new or old), the procedure area (CF
or FR), and operator experience level (experienced or inexperienced).
4.3 Dependent Variables

ISO-9241 defines usability as the “extent to which a product can be used by specified users to achieve
specified goals with effectiveness, efficiency, and satisfaction” (ISO, 1998). A 2006 review of usability
measurement methods examined usability measures from 180 studies published in HCI journals and
classified them according to the three ISO-9241 categories presented above (Hornbæk, 2006). The
following definitions of the categories were cited from the ISO standard:
Effectiveness: “accuracy and completeness with which users achieve specified goals”
Efficiency: “resources expended in relation to the accuracy and completeness with which users achieve
goals”
Satisfaction: “freedom from discomfort, and positive attitudes towards the use of the product”.
In this study, I aimed to have at least one measure (as reviewed in Hornbæk, 2006) from each of
these categories. Due to limited operator availability, limited equipment availability, and time
constraints, few measures were deemed suitable matches. Despite this limited number of matches, this
study still contains one measure from each category. Hornbæk noted that of the studies that he
reviewed, 22% had no effectiveness measure, 19% had no efficiency measure, and 38% had no
satisfaction measure.
To fill the category of “effectiveness”, I used a user assessment; for efficiency, a modified
NASA Task Load Index (NASA TLX) questionnaire (Appendix A); and for satisfaction, a questionnaire
(Appendix B) that included several Likert-type scale questions and open-ended questions.
The “users” referred to above are the operators. I asked the operators to read through the procedures
and note any inaccuracies or confusing points that they found in the procedures. I recorded and tallied
these items to form a count of inaccuracies and confusing points for each procedure. In an attempt to
remain as objective as possible, I asked the operators to count only the inaccuracies that, if the
procedure was followed precisely, would result in damage to equipment, themselves, others, or an
inability to complete the task.
The NASA Task Load Index is a “multi-dimensional rating procedure that provides an overall workload
score based on a weighted average of ratings on six subscales” (NASA, 1986). The published version
contained the six dimensions of mental demands, physical demands, temporal demands, own
performance, effort, and frustration. However, given that the items being tested here were
documents, I deemed the subscale for “physical demands” to be irrelevant.
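As a rough illustration of how the modified instrument would be scored, the sketch below applies the standard NASA-TLX weighted-average rule to the five retained subscales. The rating scale and all of the numbers here are assumptions for illustration, not the study's data.

```python
# Hedged sketch of NASA-TLX weighted scoring, reduced to the five subscales
# retained in this study (physical demand dropped). Subscale names follow the
# text; the example ratings and weights are invented for illustration.
SUBSCALES = ["mental", "temporal", "performance", "effort", "frustration"]

def weighted_tlx(ratings, weights):
    """Weighted workload = sum(weight_i * rating_i) / total weight.

    With 5 subscales there are C(5,2) = 10 pairwise comparisons, so the
    integer weights sum to 10.
    """
    assert set(ratings) == set(weights) == set(SUBSCALES)
    assert sum(weights.values()) == 10
    return sum(weights[s] * ratings[s] for s in SUBSCALES) / 10.0

score = weighted_tlx(
    ratings={"mental": 8, "temporal": 4, "performance": 6,
             "effort": 7, "frustration": 3},
    weights={"mental": 4, "temporal": 1, "performance": 2,
             "effort": 2, "frustration": 1},
)
```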
In order to create an operator questionnaire that suited this particular analysis, I adapted two instruments: the
Questionnaire for User Interface Satisfaction (QUIS) (Chin, Diehl, & Norman, 1988), and a
questionnaire based on the consumer information rating form that measured the perception of usability
(Pander Maat & Lentz, 2010).
The Likert questions that were included in the operator questionnaire asked the operator about the
following qualities for each of the procedures that they read:
• Well-organized
• Ease-of-reading
• Written accurately enough in order to be able to achieve the specified goal completely
• Written in a way that lets you achieve the specified goal safely
• Written with the appropriate amount of detail
• Overall satisfaction with the procedure
Each question used a 5-point Likert scale that ranged from strongly disagree (1) to strongly
agree (5). Additionally, an open-ended question on the amount of detail allowed the operator to
comment if they disagreed with the amount of detail included in the procedure, explaining whether
they thought there was too much or too little detail, and why. At the end of the testing protocol,
I also asked the operator whether they preferred the new or the old procedures overall, and why.
4.4 Hypotheses

I expected that, in general, the new procedures would have a positive usability impact. The type of
procedure (old versus new) was predicted to directly affect all usability factors: effectiveness, efficiency,
and satisfaction. The inaccuracy and confusion counts of the new, redesigned procedures were
predicted to be lower than those of the old procedures; the NASA TLX scores of the new procedures
were predicted to be lower than those of the old procedures; and the operators were predicted to give
higher subjective ratings of satisfaction in the survey to the new procedures than to the old procedures.
I also predicted that these positive impacts of the new procedures would be true across both of the
procedure areas for all dependent variables.
With regard to operator experience, I predicted that there would be very little difference between the
ratings given by the experienced operators and those given by the inexperienced operators in all of
the usability factor categories.
Finally, I predicted that the majority of the operators would state that they prefer the new procedures to
the old ones.
4.5 Testing Protocol

For each operator that participated in the study, I used the following test protocol:
First, I contacted each individual operator, explained the study, and obtained their informed consent
(Appendix C). A time was agreed upon when the operator would be free to participate in the study.
The tests were run in a meeting room behind the control room that contained a table and several chairs.
During the test, the operators were seated at the table, with copies of all of the procedures in front of
them. Before the study commenced, I explained the NASA-TLX workload assessment to the operator,
including the definitions of each of the 5 subscales. Subsequently, the operator was allowed one
practice run with the NASA-TLX to ensure that the scales were understood.
I then explained to the operator that they were to read the procedure that was presented to them. While
they were reading, if the operator came across anything that confused them or that seemed inaccurate,
I told them to advise me about it. Similarly, if they had any comments that they thought of while reading
the procedure, I encouraged them to dictate them to me. While the operator read through the procedure,
I tallied (and recorded the reason for) any confusing occurrences or inaccuracies that were reported by
the operator.
After the operator finished reading each procedure, I asked them to fill out the operator questionnaire. I
then asked them to fill out the modified NASA-TLX workload assessments in regards to each procedure.
After the first and third procedure, the operator was given a 5-minute break. After the second procedure
(when switching between different areas in the plant), the operator was given a 20-minute break.
After the operator completed the read-through of the two procedures in the first area (and their
accompanying questionnaires and NASA-TLX scales), I asked them to complete their subjective
weightings of the NASA-TLX factors. Finally, after all four of the procedures had been read through, I
asked the operator whether they had an overall preference for the new procedures or the old ones, and
whether they had any overall comments; the operators dictated these comments directly to me, and I
recorded them in a notebook. Finally, the operators were debriefed.
5 Results

For all of the following graphs, “N” refers to the new procedure, and “O” to the old one. All of the data
were tested for normality before statistical tests were run; where the assumption of normality was
violated, an appropriate non-parametric test was used.
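The test-selection rule stated above (non-parametric test where normality fails) can be illustrated with a minimal sketch. This is not the thesis's analysis code: it hand-implements the Wilcoxon signed-rank statistic with the normal approximation, whereas in practice a statistics package would be used, preceded by a normality check such as Shapiro-Wilk.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (z, two-sided p) for paired samples x and y (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0                # average rank for tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mu) / sigma
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

The paired counts of confusion or inaccuracy for each operator (new vs. old procedure) would be passed in as `x` and `y`; for the small sample here (n = 16), a package's exact-distribution option would be preferable to the normal approximation.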
5.1 Demographics

Below is a summary of the demographic data collected from the operator participants. Most of the
participants were between the ages of 31 and 50, and most had been working at that particular plant for
between 5 and 10 years.
Table 5-1 - Demographics Summary

                  Years as operator
Ages         | 10 years or more | 5-10 years  | 2-5 years   | Less than 2 years | Grand Total
18-30        | 0.00%            | 0.00%       | 0.00%       | 12.50% (2)        | 12.50% (2)
31-50        | 12.50% (2)       | 25.00% (4)  | 6.25% (1)   | 18.75% (3)        | 62.50% (10)
51 or older  | 6.25% (1)        | 12.50% (2)  | 6.25% (1)   | 0.00%             | 25.00% (4)
Grand Total  | 18.75% (3)       | 37.50% (6)  | 12.50% (2)  | 31.25% (5)        | 100.00% (16)
5.2 Effectiveness

Confusion
I asked the operators to indicate when they were confused by anything mentioned in the procedure; I
then tallied up the number of times this occurred per procedure. The results can be seen in Figure 5-1,
below.
Figure 5-1 - Mean Counts of Confusion

A Wilcoxon signed-rank test showed no evidence of a difference in the counts of confusion for the new
CF procedure (M=0.94) as compared to the old CF procedure (M=1.06, p=0.751, z=-1.78). A second
Wilcoxon signed-rank test showed strong evidence of a difference in the counts of confusion for the
new FR procedure (M=0.69) as compared to the old FR procedure (M=1.81, p=0.027, z=-2.212).
A third Wilcoxon signed-rank test indicated that there was strong evidence of a difference between the
new procedures and the old procedures (p=0.037, z=-2.083). This significant difference may indicate
less operator confusion with the new procedures.
The mean number of times the experienced and inexperienced operators were confused while reading
each procedure can be seen in Figure 5-2, below.
Figure 5-2 - Mean Counts of Confusion by Experience

The new CF procedure had a mean confusion count of 0.625 for experienced operators and 1.25 for
inexperienced operators, while the old one had a mean of 0.875 for experienced operators and 1.25 for
inexperienced operators. The new FR procedure had a mean of 0.25 for experienced operators and
1.125 for inexperienced operators, while the old one had a mean of 1 for experienced operators and
2.625 for inexperienced operators.
Inaccuracies
I also asked the operators to let me know when they thought they spotted something inaccurate or
missing from the procedures, and tallied up this count as well. These results can be seen in Figure 5-3
below.
Figure 5-3 - Mean Counts of Inaccuracies

A Wilcoxon signed-rank test showed strong evidence of a difference in the counts of inaccuracies for
the new CF procedure (M=0.5) as compared to the old CF procedure (M=2.94, p=0.005, z=-2.807). A
second Wilcoxon signed-rank test showed moderate evidence of an effect in the counts of inaccuracies
for the new FR procedure (M=0.75) as compared to the old FR procedure (M=1.69, p=0.07, z=-1.812).
A third Wilcoxon signed-rank test indicated that there was strong evidence of a difference between the
new procedures and the old procedures (p<0.001, z=-3.291).
The mean number of times the experienced and inexperienced operators found inaccuracies while
reading each procedure can be seen in Figure 5-4, below.
Figure 5-4 - Mean Counts of Inaccuracies by Experience

The new CF procedure had a mean inaccuracy count of 1 for experienced operators and 0 for
inexperienced operators, while the old one had a mean of 4.5 for experienced operators and 1.375 for
inexperienced operators. The new FR procedure had a mean of 1 for experienced operators and 0.5 for
inexperienced operators, while the old one had a mean of 2.375 for experienced operators and 1 for
inexperienced operators.
5.3 Efficiency

Subjective Workload Assessment
The mean subjective weighted workload ratings were calculated per procedure and can be seen in
Figure 5-5, below.
Figure 5-5 - Mean Subjective Workload

A paired t-test (one-tailed) showed strong evidence that the mean workload ratings for the new CF
procedure (M=7.42) are significantly lower than the workload ratings for the old CF procedure (M=10.05,
p=0.02, z=-2.05). A second paired t-test (one-tailed) showed no evidence of a significant difference in
the workload ratings between the new (M=8.58) and old FR procedures (M=9.35, p=0.27, z=-1.93).
A 2-tailed paired t-test showed moderately strong evidence of an effect between the new procedures
and the old procedures for subjective workload (p = 0.055, z=-1.92).
The mean subjective workload ratings were also broken down by procedure and experience, and can
be seen in Figure 5-6, below.
Figure 5-6 - Mean Subjective Workload by Experience

The new CF procedure had a mean workload rating of 6.35 for experienced operators and 9.425 for
inexperienced operators, while the old one had a mean of 8.83 for experienced operators and 12.52 for
inexperienced operators. The new FR procedure had a mean of 8.07 for experienced operators and
10.17 for inexperienced operators, while the old one had a mean of 8.43 for experienced operators and
11.45 for inexperienced operators.
5.4 Satisfaction

Internal consistency is an important feature of any scale; it determines whether several questions that
claim to measure the same variable result in similar scores. In this particular case, the variable in
question would be satisfaction. Cronbach’s alpha coefficient is a measure of internal consistency (Bland
& Altman, 1997). The generally acceptable values for alpha range from 0.70 to 0.95; however, a
maximum value of 0.90 has been recommended (Tavakol & Dennick, 2011). If alpha is higher than 0.90,
some of the items on the scale may be redundant. As access to operators was limited, assessing the
internal consistency of the survey questions had to be performed post-hoc.
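For reference, Cronbach's alpha as described above can be computed in a few lines. The example responses below are invented for illustration and are not the study's data.

```python
from statistics import pvariance

# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total scores))
def cronbach_alpha(scores):
    """scores[i][j] = respondent i's rating on Likert item j."""
    k = len(scores[0])                      # number of items (6 in this study)
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Invented example responses (4 respondents x 6 items), NOT the study's data:
example = [
    [4, 4, 5, 4, 4, 5],
    [3, 3, 3, 2, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [2, 2, 3, 2, 2, 2],
]
alpha = cronbach_alpha(example)
```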
The surveys for the new procedures had a Cronbach’s alpha of 0.894, and the surveys for the old
procedures had a Cronbach’s alpha of 0.915, as can be seen in Table 5-2 and Table 5-3 below. Thus,
it can be said that the operator questionnaire questions were relatively internally consistent and all
measured the same variable.
Table 5-2 - Cronbach's Alpha for New Procedures

Reliability Statistics
Cronbach's Alpha | Cronbach's Alpha Based on Standardized Items | N of Items
.894             | .895                                         | 6
Table 5-3 - Cronbach's Alpha for Old Procedures

Reliability Statistics
Cronbach's Alpha | Cronbach's Alpha Based on Standardized Items | N of Items
.915             | .915                                         | 6
Likert Items
Each quality that was measured by a Likert question can be seen in the figures below:
• Well-organized (Figure 5-7);
• Easy to read (Figure 5-8);
• Written accurately enough in order to be able to achieve the specified goal completely (Figure
5-9);
• Written in a way that lets you achieve the specified goal safely (Figure 5-10);
• Written with the appropriate amount of detail (Figure 5-11); and,
• Overall satisfaction with the procedure (Figure 5-12).
Figure 5-13 illustrates the mean of the sum of these scores from the operator questionnaires.
Figure 5-7 – Mean Perception of Organization

The new CF procedure had a mean perception of organization of 4.19, while the old one had a mean of
3.12. The new FR procedure had a mean of 4.25, while the old one had a mean of 2.94.
Figure 5-8 - Mean Perception of Ease of Reading

The new CF procedure had a mean perception of ease of reading of 3.81, while the old one had a
mean of 3.44. The new FR procedure had a mean of 3.94, while the old one had a mean of 3.19.
Figure 5-9 – Mean Perception of Accuracy of Procedure

The new CF procedure had a mean perception of accuracy of 4.25, while the old one had a mean of
2.89. The new FR procedure had a mean of 3.81, while the old one had a mean of 2.81.
Figure 5-10 – Mean Perception of Ability to Complete Procedure Safely

The new CF procedure had a mean perception of the ability to complete it safely of 4.25, while the old
one had a mean of 2.69. The new FR procedure had a mean of 3.81, while the old one had a mean of
2.75.
Figure 5-11 – Mean Perception of Appropriate Amount of Detail

The new CF procedure had a mean perception of appropriate amount of detail of 4.06, while the old
one had a mean of 2.81. The new FR procedure had a mean of 3.81, while the old one had a mean of
2.75.
Figure 5-12 – Mean Subjective Satisfaction

The new CF procedure had a mean subjective satisfaction of 4, while the old one had a mean of 2.875.
The new FR procedure had a mean of 3.88, while the old one had a mean of 2.81.
The mean of the sum of the scores from the operator satisfaction questionnaire was calculated per
procedure and can be seen in Figure 5-13, below.
Figure 5-13 - Mean Total Score from Operator Satisfaction Questionnaire
A Wilcoxon signed-rank test for the CF area showed strong evidence of a significant difference in the
overall satisfaction questionnaire score between the new procedure (M=24.56) and the old procedure
(M=17.81, p=0.003, z=-2.97). A second Wilcoxon signed-rank test for the FR area showed strong
evidence of a significant difference in the overall satisfaction questionnaire score between the new
procedure (M=23.5) and the old procedure (M=17.25, p=0.007, z=-2.7).
A Wilcoxon signed rank test indicated that there was strong evidence of a significant difference in
satisfaction between the new procedures and the old procedures (p<0.001, z=-5.99).
The mean total scores from the operator satisfaction questionnaire were also broken down by
procedure and experience, and can be seen in Figure 5-14, below.
Figure 5-14 - Mean Score from Operator Satisfaction Questionnaire by Experience

The new CF procedure had a mean operator satisfaction questionnaire score of 26.88 for experienced
operators and 22.25 for inexperienced operators, while the old one had a mean of 18.25 for
experienced operators and 17.38 for inexperienced operators. The new FR procedure had a mean of
23.63 for experienced operators and 23.38 for inexperienced operators, while the old one had a mean
of 16.75 for experienced operators and 17.75 for inexperienced operators.
5.5 Statistical Test Summary

Below is a summary of the statistical test results. Values that indicate moderate or moderately strong
evidence of an effect are marked with an asterisk.
Table 5-4 - Summary of p-values for all usability measures

Measure                      | New vs. Old | New C/F vs. Old C/F | New F/R vs. Old F/R
Confusion (Wilcoxon)         | 0.037       | 0.751               | 0.027
Inaccuracy (Wilcoxon)        | 0.001       | 0.005               | 0.07*
Subjective Workload (t-test) | 0.055*      | 0.02                | 0.27
Satisfaction (Wilcoxon)      | <0.001      | 0.003               | 0.007
5.6 Stated Preferences

When asked after the testing protocol which procedure they preferred, 13 of 16 operators stated a
preference for the new procedures. The majority of cited reasons and comments for this preference
claimed that the new procedures have more detail and information. Of the three that did not prefer the
new procedures, all of them stated that they thought the old procedures were more direct and had
fewer steps.
Of the three operators that did not prefer the new procedures, 1 of them was an experienced operator,
and 2 were inexperienced.
6 Discussion

6.1 Effect of New Procedures

I found that paper-based procedures redesigned with human factors input and evidence-based
guidelines have higher usability ratings than their original versions. As predicted by my hypotheses,
strong or moderate evidence of differences between the new and the old procedures was shown in the
results across all of the usability categories tested. Of the nine metrics that were collected for each
procedure (two for effectiveness, one for efficiency, and six for satisfaction), all of them displayed better
mean values for the new procedures than for the old procedures.
Specifically, in regards to the cylinder filling procedures, I found that the new CF procedures were
reported to be more effective, more efficient, and had higher subjective ratings of satisfaction than the
old ones. Of the four statistical tests that were run between the new and old CF procedures, all but one
of them (confusion count) showed strong evidence of statistically significant differences; the new CF
procedures were shown to be significantly better than the old. Similarly, in regards to the flame reactor
procedures, I found that the new FR procedures were reported to be more effective, more efficient, and
had higher subjective ratings of satisfaction than their old counterparts. Of the four statistical tests that
were run between the new and old FR procedures, two of them showed strong evidence of a
statistically significant difference, one showed moderate evidence of an effect (inaccuracy count), and
one showed no evidence of a significant effect (subjective workload); all were in favour of the new FR
procedure.
While the confusion count for the CF procedures did not show evidence of a significant difference, the
confusion count for the new CF procedure is similar to that for the new FR procedure, whereas the
difference in mean confusion counts between the old procedures appears larger. Perhaps the old CF
procedure was simply not as confusing as the old FR procedure. In other words, the operators were no
more confused by the new CF procedure than by the old one, and significantly less confused by the
new FR procedure than by the old one.
The lack of evidence for a significant difference in the subjective workload category between the FR
procedures seems to be mainly due to a couple of operators who gave unexpectedly low subjective
workload ratings to the old FR procedure. One factor that could have influenced their ratings is simply
their familiarity with the old procedure. It is also possible that even though the old
procedure was incomplete and missing several steps, since it was so short (less than two pages), it had
a very low subjective workload for some of these operators. Despite this, the majority of the evidence
as discussed above still supports the conclusion that the new FR procedures had higher usability. This
can be considered non-trivial given the difference in familiarity level discussed above, which favours
the old procedures.
Finally, I found that the new procedures were rated with a higher overall subjective satisfaction, and
were preferred, over the old ones. The operators surveyed found the selected new procedures to be
better organized, more accurate, more conducive to safe work practices, more appropriately detailed,
and more satisfactory than the old procedures. For the “ease of reading” question, the difference in
the mean rating between the new and the old CF procedure was much less noticeable. Perhaps, like the
subjective workload ratings, this ease of reading question showed only a small difference between the
new and old CF procedures due to the length of the new procedure; three operators commented that it
was longer and less direct than the old procedure, and that this made it more difficult to get through.
However, this did not affect the overall conclusion, as 13 of the 16 operators tested (approximately
81%) stated a preference for the new procedures.
6.2 Effect of Operator Experience

To avoid inflating Type I error rates, I did not perform additional inferential statistical tests; instead,
I compared the means of the confusion counts, inaccuracy counts, workload ratings, and total
satisfaction questionnaire scores to determine whether there was any difference between the ratings
given by experienced operators and those given by inexperienced operators.
Generally, inexperienced operators reported more instances of confusion, fewer inaccuracies, and
higher workload scores than their experienced counterparts, independent of whether the procedure was
new or old. These differences may be due to the differences in the amount of training that these
operators have received when compared to their experienced counterparts. If the inexperienced
operators received the same amount of exposure to the areas and the training as the experienced
operators, they may have been less confused, found more inaccuracies, and reported lower workload
scores for these procedures.
In contrast, upon examination of the means of the total scores reported from the operator satisfaction
questionnaire, there appeared to be very little difference between the ratings reported by the
experienced operators and those reported by the inexperienced operators.
6.3 Limitations

A key limitation of the study was the need to evaluate the procedures in the context of a real process
environment. Multiple anticipated and unanticipated shutdowns constrained the selection of my
research methods. These shutdowns prevented me from walking through the plant with the
operators in order to follow and observe them while they actually performed the procedures in the
process environment. As the tests were performed with the operators reading the procedures in a
meeting room, rather than actually performing (or walking through) them, the absence of the
environmental cues that they were used to experiencing while using these procedures may have
affected their ability to count inaccuracies or to accurately judge their subjective workload.
Additionally, though many of the studies reviewed were conducted in simulators, I had no access to
such a facility. With access to a simulator, and more time, it would have been possible to have the
operators perform the procedures on a simulated process, while more objective performance metrics
(such as error rates or product yields) were determined and compared from the simulator data.
The limited time and the limited number and availability of operators at the plant also forced me to
adapt the testing protocol.
Another potential limitation of the study was that, at the end of the testing protocol, I asked the
operators to dictate their preference between the new and the old procedures to me. Although I did
not design these particular procedures, the operators may have believed that I had some vested
interest in them. As a result, there is a minor chance that they biased their verbal responses to align
with my interests. Asking them to record their preference in writing as part of the operator
questionnaire, rather than reporting it verbally, might have minimized this bias.
Finally, though it would have been insightful to determine which of the particular changes in format or
content contributed the most to usability, this was an additional item that could not be carried out due to
the previously mentioned time constraints.
This study was carried out with an industrial operator population with mixed experience and gender
using paper-based field operation procedures. Generalizability to other populations is unknown. The
literature review indicates that computer-based and paper-based procedures seem to share some
design guidelines. Although the sample size for this study was small, the results may be generalizable
to other paper-based procedures and computer-based procedures in other plants with similar operator
populations.
7 Conclusion and Future Work

This is the first study to perform a usability comparison of paper-based industrial operating
procedures for normal conditions. Despite multiple limitations, the study was carried
out with two procedures that had been redesigned based on evidence-based procedure-writing
guidelines and human factors input. As predicted, the new procedures (as compared to the old
procedures) were rated moderately or significantly higher across the categories of efficiency,
effectiveness, and satisfaction. More specifically, the new cylinder filling procedure was rated
significantly better than its predecessor across counts of inaccuracy, subjective workload, and
satisfaction. The new flame reactor procedure was rated significantly (or at least moderately) better
than its predecessor across counts of confusion, counts of inaccuracy, and satisfaction. Of the four
measures where ratings were compared across the different operator experience groups, the only
measure where experienced and inexperienced operators gave similar ratings was in the category of
satisfaction.
Ideally, an immediate follow-up to this study should attempt to identify which particular change in the
procedures contributed most to the differences in the usability ratings. Hopefully, future studies of
procedure usability in settings such as this would either have access to a simulator or the actual plant
(for walkthroughs) so that a more objective measure could be used to capture the effectiveness metric
(e.g., number of errors committed, time to completion). Additionally, although I could not test the
hypothesis with inferential statistics here, a valuable study would be to determine whether
inexperienced operators and experienced operators would rate the usability of the procedures
significantly differently.
References

Bland, J. M., & Altman, D. G. (1997). Statistics notes: Cronbach’s alpha. BMJ, 314(7080), 572.
doi:10.1136/bmj.314.7080.572
Boy, G. A., & De Brito, G. (2000). Toward a categorization of factors related to procedure following and
situation awareness. In Proceedings of the HCI-Aero 2000 Conference. In Cooperation with
ACM-SIGCHI, Cepadues, Toulouse, France. Retrieved from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.6279&rep=rep1&type=pdf
Bullemer, P. T., & Hajdukiewicz, J. R. (2004). A Study of Effective Procedural Practices in Refining and
Chemical Operations. In Proceedings of the Human Factors and Ergonomics Society Annual
Meeting (Vol. 48, pp. 2401–2405). Retrieved from
http://pro.sagepub.com/content/48/20/2401.short
Burks, R., & Peres, S. C. (2011). Procedure Classification Putting Campbell’s Objective Complexity
Framework to Work for a Petrochemical Company. In Proceedings of the Human Factors and
Ergonomics Society Annual Meeting (Vol. 55, pp. 1472–1475). Retrieved from
http://pro.sagepub.com/content/55/1/1472.short
Caccamise, D. J., & Mecherikoff, M. (1993). Human factoring the procedures element in a complex
manufacturing system. In Proceedings of the Human Factors and Ergonomics Society Annual
Meeting (Vol. 37, pp. 1046–1050). Retrieved from
http://pro.sagepub.com/content/37/16/1046.short
Carnio, A. (1980). Improvement of Operating Procedures in a Nuclear Power Plant. Trans. Am. Nucl.
Soc.; (United States), 35. Retrieved from
http://www.osti.gov/energycitations/product.biblio.jsp?osti_id=5092041
Carvalho, P. V., Dos Santos, I. L., & Vidal, M. C. (2006). Safety implications of cultural and cognitive
issues in nuclear power plant operation. Applied Ergonomics, 37(2), 211–223.
Chin, J. P., Diehl, V. A., & Norman, K. L. (1988). Development of an instrument measuring user
satisfaction of the human-computer interface. In Proceedings of the SIGCHI conference on
Human factors in computing systems (pp. 213–218). Retrieved from
http://dl.acm.org/citation.cfm?id=57203
Converse, S. A. (1994). Operating procedures: do they reduce operator errors? In Proceedings of the
Human Factors and Ergonomics Society Annual Meeting (Vol. 38, pp. 205–209). Retrieved from
http://pro.sagepub.com/content/38/4/205.short
Degani, A., & Wiener, E. (1994). On the design of flight-deck procedures.
Degani, A., & Wiener, E. L. (1990). Human factors of flight-deck checklists: the normal checklist.
Dien, Y. (1998). Safety and application of procedures, or “how do they have to use operating
procedures in nuclear power plants?” Safety Science, 29(3), 179–187.
Dien, Y., Llory, M., & Montmayeul, R. (1992). Operator’s knowledge, skill and know-how during the use of
emergency procedures: design, training and cultural aspects. In Human Factors and Power
Plants, 1992., Conference Record for 1992 IEEE Fifth Conference on (pp. 178–181). Retrieved
from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=283413
Elm, W. C., & Woods, D. D. (1985). Getting lost: a case study in interface design. In Proceedings of the
Human Factors and Ergonomics Society Annual Meeting (Vol. 29, pp. 927–929). Retrieved from
http://pro.sagepub.com/content/29/10/927.short
Goodman, P. C., & DiPalo, C. A. (1991). Human Factors Information System: A Tool to Assess Error
Related to Human Performance in U.S. Nuclear Power Plants. Proceedings of the Human
Factors and Ergonomics Society Annual Meeting, 35(10), 662–665.
doi:10.1177/154193129103501015
Hornbæk, K. (2006). Current practice in measuring usability: Challenges to usability studies and
research. International Journal of Human-Computer Studies, 64, 79–102.
doi:10.1016/j.ijhcs.2005.06.002
Institute of Nuclear Power Operations. (2009). Procedure Use & Adherence.
ISO. (1998). ISO 9241-11: Ergonomic requirements for office work with visual display terminals
(VDTs) – Part 11: Guidance on usability.
Jamieson, G., & Miller, C. (2000). Exploring the “culture of procedures” (pp. 141–145).
Kontogiannis, T. (1999). Applying information technology to the presentation of emergency operating
procedures: implications for usability criteria. Behaviour & Information Technology,
18, 261–276.
Lapinsky, G. (1989). Lessons Learned from the Special Inspection Program for Emergency Operating
Procedures. Division of Licensee Performance and Quality Evaluation, Office of Nuclear
Reactor Regulation, US Nuclear Regulatory Commission.
Lehto, M., & Salvendy, G. (1995). Warnings: a supplement not a substitute for other approaches to
safety. Ergonomics, 38(11), 2155–2163. doi:10.1080/00140139508925259
Luna, S. F., Sturdivant, M. H., & McKay, R. C. (1988). Factoring humans into procedures. In Human
Factors and Power Plants, 1988., Conference Record for 1988 IEEE Fourth Conference on (pp.
201–207). Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=27503
Mosier, K. L., Palmer, E. A., & Degani, A. (1992). Electronic checklists: Implications for decision making.
In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 36, pp. 7–
11). Retrieved from http://pro.sagepub.com/content/36/1/7.short
NASA. (1986). NASA Task Load Index (TLX) Manual.
Niwa, Y., Hollnagel, E., & Green, M. (1996). Guidelines for computerized presentation of emergency
operating procedures. Nuclear Engineering and Design, 167, 113–127. doi:10.1016/s0029-
5493(96)01297-6
Nuclear Energy Institute. (2006). Procedure Writers’ Manual (No. NEI AP-907-005). Nuclear Energy
Institute.
O’Hara, J. M., Higgins, J., & Stubler, W. (2000). Computerization of Nuclear Power Plant Emergency
Operating Procedures. In Proceedings of the Human Factors and Ergonomics Society Annual
Meeting (Vol. 44, pp. 819–822). Retrieved from http://pro.sagepub.com/content/44/22/819.short
O’Hara, J.M., & Higgins, J.C. (2004). NUREG-0711: Human Factors Engineering Programme Review
Model. Washington, DC: US Nuclear Regulatory Commission.
Ockerman, J., & Pritchett, A. (2000). A Review and Reappraisal of Task Guidance: Aiding Workers in
Procedure Following. International Journal of Cognitive Ergonomics, 4(3), 191–212.
doi:10.1207/S15327566IJCE0403_2
Orendi, R. G., Petras, D. S., Lipner, M. H., Oft, R. R., & Fanto, S. V. (1988). Human-factors
considerations in emergency procedure implementation. In Human Factors and Power Plants,
1988., Conference Record for 1988 IEEE Fourth Conference on (pp. 214–221). Retrieved from
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=27505
Oxstrand, J., Le Blanc, K., & Hays, S. (2012). Evaluation of Computer-Based Procedure System
Prototype. Idaho National Laboratory External Report. Retrieved from
http://www.inl.gov/technicalpublications/Documents/5581215.pdf
Palmer, E., & Degani, A. (1991). Electronic checklists: Evaluation of two levels of automation. In
Proceedings of the Sixth Symposium on Aviation Psychology (pp. 178–183). Retrieved from
http://ti.arc.nasa.gov/m/profile/adegani/Electronic%20checklist%20eval.pdf
Pander Maat, H., & Lentz, L. (2010). Improving the usability of patient information leaflets. Patient
Education and Counseling, 80, 113–119. doi:10.1016/j.pec.2009.09.030
Patel, S., Drury, C. G., & Lofgren, J. (1994). Design of workcards for aircraft inspection. Applied
Ergonomics, 25(5), 283–293. doi:10.1016/0003-6870(94)90042-6
Segall, N., Doolen, T. L., & Porter, J. D. (2005). A usability comparison of PDA-based quizzes and
paper-and-pencil quizzes. Computers & Education, 45(4), 417–432.
Shamo, M. K., Dror, R., & Degani, A. (1998). Evaluation of a New Cockpit Device: The Integrated
Electronic Information System. Proceedings of the Human Factors and Ergonomics Society
Annual Meeting, 42(1), 138–142. doi:10.1177/154193129804200131
Sharit, J. (1998). Applying human and system reliability analysis to the design and analysis of written
procedures in high-risk industries. Human Factors and Ergonomics in Manufacturing & Service
Industries, 8(3), 265–281.
Tavakol, M., & Dennick, R. (2011). Making sense of Cronbach’s alpha. International Journal of Medical
Education, 2, 53–55.
Theureau, J., Jeffroy, F., & Vermersch, P. (2000). Controlling a nuclear reactor in accidental situations
with symptom-based computerized procedures: a semiological & phenomenological analysis.
CSEPC 2000 Proceedings, 22–25.
White, R. E., Trbovich, P. L., Easty, A. C., Savage, P., Trip, K., & Hyland, S. (2010). Checking it twice:
an evaluation of checklists for detecting medication errors at the bedside using a chemotherapy
model. Quality and Safety in Health Care, 19, 562–567.
Wieringa, D., Moore, C., & Barnes, V. (1998). Procedure writing: principles and practices. Battelle
Press Columbus, OH. Retrieved from http://www.getcited.org/pub/100322096
Wieringa, D. R., & Farkas, D. K. (1991). Procedure writing across domains: Nuclear power plant
procedures and computer documentation. In Proceedings of the 9th annual international
conference on Systems documentation (pp. 49–58). Retrieved from
http://dl.acm.org/citation.cfm?id=122787
Appendix A: MODIFIED NASA TASK LOAD INDEX QUESTIONNAIRE
Name: ____________ Task: ____________ Date: ____________

Mental Demand – How mentally demanding was the task? (Very Low – Very High)
Physical Demand – How physically demanding was the task? (Very Low – Very High)
Temporal Demand – How hurried or rushed was the pace of the task? (Very Low – Very High)
Performance – How successful were you in accomplishing what you were asked to do? (Perfect – Failure)
Effort – How hard did you have to work to accomplish your level of performance? (Very Low – Very High)
Frustration – How insecure, discouraged, irritated, stressed, and annoyed were you? (Very Low – Very High)

Figure 8.6. NASA Task Load Index. Hart and Staveland’s NASA Task Load Index (TLX) method assesses work load on five 7-point scales. Increments of high, medium and low estimates for each point result in 21 gradations on the scales.
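For analysis, ratings collected with a form like this are commonly reduced to a single workload score. The sketch below follows the unweighted "raw TLX" (RTLX) convention of averaging the six subscale ratings; this convention, the 0–20 coding of the 21 gradations, and the function name `raw_tlx` are illustrative assumptions rather than the thesis's actual scoring procedure.

```python
# Illustrative raw-TLX scoring sketch (assumed RTLX convention: unweighted
# average of the six subscales, each coded 0-20 per the 21 gradations above).

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def raw_tlx(ratings):
    """Return the mean of the six subscale ratings (each 0-20)."""
    for scale in SUBSCALES:
        if scale not in ratings:
            raise ValueError(f"missing subscale: {scale}")
        if not 0 <= ratings[scale] <= 20:
            raise ValueError(f"{scale} rating must be within 0-20")
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)

# Example: one participant's ratings for a moderately demanding task
print(raw_tlx({"mental": 14, "physical": 4, "temporal": 10,
               "performance": 6, "effort": 12, "frustration": 8}))  # 9.0
```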
Appendix B: OPERATOR QUESTIONNAIRE
Subject number: ____ Presentation procedure number: ____ Area number: ____

Overview

I would like to thank you for your interest and participation in this study. This study is being conducted in order to collect data about the usability of different procedures. The data collected through this study will be analyzed as part of a Master’s-level research thesis. If you have any questions following completion of the questionnaire, please feel free to contact the researcher at [email protected].

Research Ethics & Confidentiality

In keeping with research ethics practices at the University of Toronto, all completed questionnaires will be kept confidential and viewed only by the researcher. If at any time you feel unable or unwilling to answer a question, please feel free to leave the question blank or, alternatively, enter ‘no response’. Your participation in this exercise can be halted at any time. For more information on research ethics at the University of Toronto, please visit www.research.utoronto.ca.

Please answer questions 1 to 3 by reading each question carefully and responding according to the provided instructions.
1) Qualification – Are you a qualified operator in this area? Yes | No

2) Experience – How long have you worked at this plant as an operator? Check the box that matches your answer.
Less than 2 years | 2 – 5 years | 5 – 10 years | 10 years or more

3) Age – Check the box to the immediate right of your intended answer. 18–30 | 31–50 | 51 or Older
Please indicate how strongly you agree or disagree with the statements below by checking the box that corresponds to your level of agreement:
I thought that this procedure was…
1) …well-organized.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

2) …easy to read.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

3) …written accurately enough to be able to achieve the specified goal completely.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

4) …written in a way that lets you achieve the specified goal safely.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

5) …written with the appropriate amount of detail.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree

If you checked “disagree” or “strongly disagree” for question #5 above, please explain why.
______________________________________________________________________________
______________________________________________________________________________

6) Overall, I was satisfied with the quality of this procedure.
Strongly Disagree | Disagree | Undecided | Agree | Strongly Agree
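Multi-item agreement scales like the one above are commonly checked for internal consistency with Cronbach’s alpha (see Tavakol & Dennick, 2011, in the references). The following is a minimal, illustrative sketch; it assumes responses are coded numerically (e.g. Strongly Disagree = 1 through Strongly Agree = 5) and stored as one list per questionnaire item, and the sample data are invented purely for demonstration.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)     # number of items
    n = len(items[0])  # number of respondents

    def variance(xs):
        # Population variance; the n vs. n-1 divisor cancels in the ratio,
        # so sample variance would yield the same alpha.
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    # Each respondent's total score across all items
    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

# Three items rated 1-5 by four respondents (illustrative data only)
print(cronbach_alpha([[4, 5, 3, 4],
                      [4, 4, 3, 5],
                      [5, 5, 2, 4]]))  # ~0.818
```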
Appendix C: INFORMED CONSENT FORM
Informed Consent Form

Date:
Study Name: Usability Comparison of Paper Procedures
Researchers: Mario Iannuzzi
Supervisor: Dr. Greg Jamieson

About the Research: Thank you for considering participation in this study! The purpose of this study will be to develop a method to compare the usability of paper procedures. In total, the time commitment for you will be roughly 5 hours. It will include:
• an introduction and a debriefing session;
• a walkthrough of four procedures in the plant where you will be asked questions about the procedure;
• two questionnaires;
• and a brief closing interview.
Potential Risks and Discomforts: You will be exposed to the plant environment; however, this should not bring you any more discomfort than you experience during your regular working periods.

Benefits of the Research and/or Benefits to You: This methodology will be among the first of its kind to compare the usability levels of paper procedures. It will benefit both the scientific human factors community and industries that are attempting to modify their procedures. As an operator at a corporation that has recently changed how the procedures are written, this research will benefit you by assuring you that the new procedures are indeed more usable.

Voluntary Participation and Withdrawal from the Study: Your participation in the study is completely voluntary and you may choose to stop participating at any time. Should you decide not to volunteer or to withdraw, this will not influence the nature of your relationship with the researchers or with the University of Toronto, either now or in the future. You can stop participating in the study at any time, for any reason, if you so decide. In that event, the data you will have produced up to that point shall be discarded at your request.

Confidentiality: All performance data and information you supply on the questionnaire during the research will be held in confidence; in other words, your name will not appear in any report or publication of the research. Your data will be safely stored in a locked facility and only research staff will have access to this information. Confidentiality will be provided to the fullest extent possible by law.

Questions About the Research? If you have questions about the research in general or about your role in the study, please feel free to contact Mario Iannuzzi either by telephone at (647) 986-6719 or by e-mail ([email protected]). You may also contact Dr. Greg Jamieson ([email protected]). This research has been reviewed by the University of Toronto’s Office of Research Ethics (ORE). If you have any questions about this process, or about your rights as a participant in the study, please contact the Office of Research Ethics at McMurrich Building, 2nd floor, 12 Queen’s Park Crescent West, Toronto, ON M5S 1S8, Fax: (416) 946-5763.

Signature: I, ____________________________, consent to participate in the experiment described above, conducted by Mario Iannuzzi. I have understood the nature of the project and wish to participate. I am not waiving any of my legal rights by signing this form, and I have been given a copy of it for future reference. My signature below indicates my consent.

Participant’s Name: ____________________________
Participant Signature __________________________________ Date _____________________________
Investigator’s Signature __________________________________