
IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 53, NO. 3, SEPTEMBER 2010

Assessing Concurrent Think-Aloud Protocol as a Usability Test Method: A Technical Communication Approach

—LYNNE COOKE

Abstract—Concurrent think-aloud protocol (CTA) is often used in usability test settings to gain insight into participants' thoughts during their task performances. This study adds to a growing body of research within technical communication that addresses the use of think-aloud protocols in usability test settings. The eye movements and verbalizations of 10 participants were recorded as they searched for information on a website. The analysis of transcripts and real-time eye movement showed that CTA is an accurate data-collection method. The researcher found that the majority of user verbalizations in the study included words, phrases, and sentences that users read from the screen. Silence and verbal fillers that occurred during CTA enabled users to assess and process information during their searches. This study demonstrates the value technical communicators add to the study of usability test methods, and the paper recommends future avenues of research.

Index Terms—Cognitive psychology, communication, technical communication.

Manuscript received June 07, 2009; revised October 04, 2009; accepted October 05, 2009. Date of current version August 25, 2010. The author is with the English Department, West Chester University of Pennsylvania, West Chester, PA 19383 USA (email: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

DOI: 10.1109/TPC.2010.2052859

How do usability specialists with backgrounds in technical communication differ from usability specialists with backgrounds in cognitive science and human-computer interaction (HCI)? While all usability specialists share the common goal of improving the user experience, their backgrounds—to a certain extent—determine how "user experience" is defined regarding usability. Usability specialists with backgrounds in cognitive science are trained in rigorous experimental research methods that are put to use in laboratory settings where variables are defined and tightly controlled. In these settings, cognitive models are tested or developed. Consequently, a cognitive science approach to usability typically focuses on the human brain and how users process information [1]–[3].

Usability specialists with backgrounds in HCI, by contrast, tend to have education and experience in psychology, social science, and computer science. An HCI approach to usability focuses on the technology—the internet, computers, and mobile devices, for example—or the context in which the technology is used [4], [5]. Since the study of human behavior is still rooted in the sciences, an HCI approach to usability typically concerns quantitative data that can be statistically analyzed—mouse clicks, error rates, eye fixations—or user activity that can be used to build models of human behavior based on task and use scenarios [6].

Usability specialists with backgrounds in technical communication are apt to have education and experience in the humanities, which typically means that they are familiar with many different quantitative and qualitative methods such as rhetorical analysis, statistical analysis, content analysis, discourse analysis, and visual analysis. Moreover, as Ramey [7], [8]; Redish [9]; and Sullivan [10], [11] pointed out more than 20 years ago, technical communicators are trained in audience analysis, which makes them highly conscious of the rhetorical aspects of usability testing. Therefore, technical communicators are more likely than cognitive scientists to consider the following questions when developing usability tests and analyzing their results: Who are the intended users of the test object? How much knowledge and experience do they have with similar test objects? How and in what context will they use the test object? It is this focus on the audience/user that motivates technical communicators-turned-usability specialists to examine the user experience through the use of multiple data-collection methods and multiple modes of analysis. This is not to say that technical communicators are better suited than cognitive scientists or HCI researchers for the study or practice of usability testing; however, technical communicators do think about usability from different perspectives than cognitive science and HCI.

To understand the different approaches that cognitive scientists, HCI researchers, and technical communicators take to studying usability, this paper first examines how a popular usability test method—concurrent think-aloud protocol (CTA)—has been treated by researchers in these disciplines. In usability testing, CTA is a process in which users verbalize their thoughts as they perform tasks. The objective of CTA is to gain insight into user behavior that would be difficult to obtain from observation alone.

Think-aloud protocols have their roots in cognitive psychology. Here, researchers "test theories about the human information processing system" [12, p. 10]. While they were not the first researchers to employ this method, Ericsson and Simon popularized CTA in the 1980s [12], [13]. These researchers established strict guidelines for administering CTA by categorizing users' verbalizations. Across the cognitive psychology discipline at the time, CTA was primarily used to gain access to users' direct reports of their task performance. These behavioral data, in turn, served as the theoretical foundation for cognitive models. Since CTA was used in tightly controlled experimental conditions and because results narrowly applied to cognitive science, Ericsson and Simon defined three levels of verbalizations [12]. Level 1 verbalizations are the most reliable because they are direct reports of user behavior from short-term memory. In usability testing, reading text off the screen would be categorized as a Level 1 verbalization. Level 2 verbalizations are subject to an intermediary process in which the user must transform images or abstract concepts into words. In usability testing, translating percentages from a pie chart or other graph into words would be categorized as a Level 2 verbalization. Level 2 verbalizations are reliable, but less so than Level 1 verbalizations. Level 3 verbalizations, by contrast, are not considered to be reliable by Ericsson and Simon. These thoughts require additional cognitive processing that comes from long-term memory before they can be verbalized. Verbalizations that require inference, categorization, or filtering would be considered Level 3 verbalizations. In usability testing, any verbalizations prompted by the test facilitator (such as "go on" or "I see") would be categorized as Level 3 verbalizations. Other verbalizations such as value judgments, stream of consciousness, and daydreams would not be considered legitimate verbalizations because of their subjective content.
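The three-level scheme can be restated compactly. The sketch below is a purely illustrative paraphrase of the paragraph above; the enum and the example phrases are hypothetical, not Ericsson and Simon's own coding materials or data from this study:

```python
from enum import IntEnum

class VerbalizationLevel(IntEnum):
    """Reliability levels for think-aloud verbalizations, per Ericsson and Simon [12]."""
    LEVEL_1 = 1  # direct report from short-term memory, e.g., reading text off the screen
    LEVEL_2 = 2  # images or abstract concepts transformed into words, e.g., describing a pie chart
    LEVEL_3 = 3  # requires inference, categorization, or filtering from long-term memory

# Hypothetical examples keyed to the levels described above.
EXAMPLES = {
    "reads the heading 'Motorcycle permits' aloud": VerbalizationLevel.LEVEL_1,
    "translates a pie-chart slice into 'about forty percent'": VerbalizationLevel.LEVEL_2,
    "infers 'this link is probably for businesses'": VerbalizationLevel.LEVEL_3,
}

for behavior, level in EXAMPLES.items():
    print(f"Level {level.value}: {behavior}")
```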

CTA was controversial within cognitive psychology, with researchers arguing the following points:

• CTA did not represent unconscious or automatic processes [14], [15].

• CTA changed the sequence of thought processes [16].

• The act of thinking aloud interfered with cognitive processing [17], [18].

Moreover, as Boren and Ramey pointed out in one of the first published articles in the technical communication discipline that critically assessed the use of Ericsson and Simon's method in usability testing, adhering to the strict guidelines for administering CTA and evaluating CTA verbalizations were impractical and "out of synch" with usability practice [19, p. 263].

Despite criticisms of CTA, the method was used in composition studies during the 1980s for gaining insight into students' writing processes. Unlike in cognitive psychology, however, CTA was applied in ways that better suited the humanities. Verbalizations were not categorized according to levels, and all verbalizations (including introspective and inferential thoughts) were considered to be valid. Flower and Hayes, for example, instructed students to "verbalize everything that [went] through their minds as they [wrote], including stray notions, false starts, and incomplete or fragmentary thought" [20, p. 368]. As a result of using CTA, they found that writer-generated goal setting is one of the most important aspects of writing and revision because writers continuously form goals as they write. Flower and Hayes argue that goals are what encourage writers to invent and explore ideas, which are essential to being a successful communicator. Critics of Flower and Hayes's research, however, argued against CTA, contending that cognitive processes are not visible enough to be examined scientifically [21] and questioning the validity of a writing-process model developed from these verbalizations [22]. Steinberg, however, persuasively countered these arguments and other criticisms about the use of CTA in composition studies [23].

CTA AND USABILITY TESTING

In the early 1990s, usability engineers, HCI researchers, usability researchers and consultants, and usability practitioners all began to advocate CTA [24]–[27]. The goal of applying CTA during usability testing was similar to that of cognitive scientists: to gain access to users' thoughts during task performance. The process for administering CTA, however, remained inconsistent. Consequently, it was criticized for its lack of methodological rigor and systematic application. These two problems had the potential to undermine the validity of results. To address these problems, Boren and Ramey, two usability researchers with backgrounds in technical communication, developed a CTA model based on Speech Communication Theory instead of cognitive science [19]. A subsequent comparison of the cognitive science and speech communication CTA models revealed that the Boren and Ramey model was better suited for usability testing because it resulted in a higher task-completion rate [28].

Besides Boren and Ramey, other researchers such as van den Haak, de Jong, and Schellens [29]–[32] have systematically studied the "merits and restrictions" of different think-aloud protocol methods in usability test settings [29, p. 340]. By comparing concurrent, constructive interaction, and retrospective think-aloud protocols (RTAs), the team found that all three methods were equally effective for detecting usability problems. Each protocol, however, offered specific advantages related to the usability specialist's priorities. CTA, for example, was effective at identifying usability problems in a time-effective manner, whereas RTA offered a more "truthful representation of task performance" [32, p. 70]. Moreover, these researchers are situated in the communication studies and technical communication disciplines. Researchers in these two disciplines often seek to contextualize the results of their studies through the use of multiple research methods.

Gaps still exist in our understanding of the use of CTA in usability test settings. One key issue is assessing CTA's accuracy, since doing so requires a second measure against which verbalizations can be compared. Rhenius and Deffner, for instance, used eye tracking to compare sequences of verbalization to sequences of eye fixations on a computer monitor [33]. The researchers reported an 87% to 98% accuracy level between eye fixations and verbalizations. However, they did not report the unit of analysis used in their study or the procedure used to calculate the eye-movement and verbalization variables. As a more recent example of this problem, Hertzum, Hansen, and Andersen collected CTA and eye-movement data in a usability test setting [34]; however, the two measures were not compared for verbalization accuracy.

In addition to accuracy, little information exists regarding the types of verbalizations that CTA yields. According to Bowers and Snyder, users verbalized mostly descriptions of their onscreen actions [35]. Van den Haak, de Jong, and Schellens's comparison of CTA and RTA yielded similar results [29]. However, since the purpose of their study was to determine usability problem detection, reactivity, and the protocols' effect on users' experience, the CTA data were not analyzed by content categories.

Equally important as CTA task-related verbal content are verbal fillers (e.g., "um" and "ah") and silences that inevitably occur as users perform a task. Both of these communication constructs reveal important information about task performance. For example, cognitive psychologists have reported that during CTA, users lapse into silence while completing cognitively complex tasks. This occurs because users' brains prioritize cognitive processing over verbalization. Thus, silences are more likely to occur when users are required to think abstractly or to engage in complicated computational-processing activities [12], [36]. Still, little is known about how silences and verbal fillers function in CTA. This lack of information arises, in part, because a second research method, such as eye tracking, is necessary to gain insight into users' onscreen activity during periods of nonverbalization. The use of multiple methods increases the concurrent validity of the scholarship and helps reveal whether there are common onscreen elements that contribute to cognitive-processing difficulty.

This study aims to add to the growing body of literature about how CTA functions within usability test settings. As such, the study addresses three areas that have yet to be fully explored: verbalization accuracy, verbalized content, and silence/verbal fillers. Three research questions (RQs) guide the study:

RQ1. To what degree are the statements users produce during CTA accurate?
RQ2. What content categories do CTA verbalizations include?
RQ3. What do users' eye movements reveal about their behavior when they are silent or using verbal fillers?

This paper is structured as follows: First, the study methods and an analysis of annotated transcripts of participants' verbalizations and eye movements are reported. Then, the results are discussed as they relate to the use of CTA in usability testing. The study of user behavior demonstrates the strengths of using a technical communication approach to assessing usability test methods.

Fig. 1. Washington State's Department of Licensing website.

METHODS

The current study was conducted over four days at the University of Washington's Laboratory for Usability Testing and Evaluation. Ten participants verbalized their thoughts as they completed four tasks on one website. During this time, participants' eye movements, onscreen movements, and verbalizations were recorded. Transcripts annotated with eye and mouse movement were then analyzed for verbalization accuracy and content.

Participants Ten people—five women and five men—affiliated with the University of Washington participated in the study. Participants' ages ranged from 22 to 41, with a mean age of 27. All participants reported that they were regular users of the web. Due to limitations of the eye-tracking equipment, only participants who did not wear glasses or contacts were recruited for the study. Four subjects had participated in prior eye-tracking studies, and all participants had been subjects in at least one usability study that included CTA.

IRB approval was obtained from the University of North Texas and the University of Washington. All subjects gave their written informed consent to participate in the study.

Website and Tasks The Washington State Department of Licensing website was selected for this study for several reasons. First, the site needed to be free of animation, pop-ups, advertisements, and other visual distracters that might cause rapid shifts in eye movement not related to task performance. Meeting this condition was also necessary due to limitations in the eye-tracking equipment. Second, the site needed a simple, consistent design since the objective of the study was to explore the relationship between eye movement and CTA. The Department of Licensing website followed a four-panel design: a left navigation panel, a top identification panel, a right "Quick Links" column, and a main content panel in the center of the page (see Fig. 1).

Participants performed the following four search tasks, in the same order, on the website:

(1) Find the webpage that gives you information about getting a motorcycle permit.

(2) Find the webpage that tells you how to register a trade name for your business.

(3) Find the webpage that tells how many days a person must wait before retaking the exam to become a private eye.

(4) Find the webpage that tells you how to get a personalized license plate for your car.

Most tasks were completed in four minutes or less, although participants were told that they could take as much time as they needed to complete the tasks.

Eye-Tracking Equipment Eye-movement data were collected using the ERICA (Eye-gaze Response Interface Computer Aid) remote eye-tracking system and GazeTracker software, both of which were developed by Eye Response Technologies. The eye tracker works by measuring the corneal reflection of an infrared light-emitting diode (LED), which illuminates and generates a reflection off the surface of the eye. This action causes the pupil to appear as a bright disk in contrast to the surrounding iris and creates a small glint underneath the pupil. It is this glint that the eye-tracking system uses for calibration and tracking. In this study, ERICA and the GazeTracker software captured video images of the mouse movement and the glint (eye) movement at a standard sampling rate of 30 times per second. On playback, mouse movements appeared as arrows or cursors, and eye movement appeared as a moving red "X" overlaid on the website. Since this video did not include audio, participants' verbalizations were captured on a separate video track. A discussion of how data from these two sources were prepared for analysis is covered in the Processing the Data section of this paper.

Using eye tracking as a data-collection method posed some limitations on the study. Since eye movement was captured from a single remote input—the camera mounted underneath the computer monitor—it was necessary for participants to keep their heads very still as they completed the tasks. This lack of movement was important because head movement could cause participants to lose calibration with the eye tracker and invalidate the eye-movement data. In the current study, loss of calibration was the most common reason that eye-movement data had to be removed from the data set before analysis. Since the current study required participants to verbalize during task performance, it was not possible to use a chin rest for head stability. In addition, the eye tracker may have increased participants' awareness of their search strategies and, as a result, added to the artificiality of the study. (For additional information about the history and use of eye tracking in usability, see [37]–[39].)

Procedure The CTA study was conducted over four days at the University of Washington's Laboratory for Usability Testing and Evaluation. Each participant was run individually through the study in a session that lasted approximately 30 minutes. To ensure the reliability and validity of the study, the researcher read from an orientation script that explained the test procedures and informed participants that they would be videotaped while their eye and mouse movements were tracked. Participants were not, however, told that the intent of the study was to measure verbalizations against eye movements, nor were they told that their verbalizations would be analyzed for content, since doing so would likely affect users' verbalization quantity and content. Instead, participants were told that the study was a usability test of the website.

After the orientation, participants—while thinking aloud—performed a practice task on an unrelated website. Participants were calibrated to the eye-tracker system using GazeTracker. To maintain calibration, the tasks were presented on screen in an open Microsoft Word document in a window separate from the website. Participants first read the task aloud and then clicked on the webpage, which signaled the start of the task. Since all participants had experience with CTA, prompting was rarely necessary. After participants completed the fourth task, they were thanked and given a small gift for taking part in the study.

Processing the Data After all participants were run through the study, several stages of data processing were required before data analysis could take place (see Fig. 2). Two video tracks were used: one that captured participants' on-screen actions and verbalizations and another that captured their eye movements. These two files were synchronized according to mouse movements and mouse clicks. Each time a mouse click was heard on the audio tape, the video tape was aligned so that the on-screen mouse movement corresponded to the mouse-click sound. The two tracks were usually off by no more than one or two seconds. Still, this data-processing step was necessary since this study reports the relationship between eye movements and verbalizations.
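The alignment step can be sketched in a few lines. Assuming each recording yields a list of timestamps for the same mouse-click events (one list from the audio track, one from the eye-movement video), a constant offset can be estimated and applied. The timestamps and the median heuristic below are illustrative assumptions, not the study's actual manual tape-alignment procedure:

```python
import statistics

def estimate_offset(audio_clicks, video_clicks):
    """Estimate the constant lag between two recordings from paired mouse-click times.

    audio_clicks: seconds at which each click is heard on the audio track.
    video_clicks: seconds at which the same clicks appear in the screen/eye video.
    """
    return statistics.median(a - v for a, v in zip(audio_clicks, video_clicks))

# Hypothetical click times: the video runs roughly 1.5 s behind the audio.
audio = [12.0, 47.3, 80.1, 115.6]
video = [10.5, 45.9, 78.5, 114.2]

offset = estimate_offset(audio, video)
aligned_video = [t + offset for t in video]  # shift video events onto the audio clock
print(round(offset, 2))  # 1.45
```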


Fig. 2. Procedure for preparing the eye-tracking data and transcribed verbalization data for analysis.

Video files were then created for each individual task and used to prepare time-stamped transcripts for each individual session. Time stamps were inserted at what Eveland and Dunwoody refer to as "thought units" in their analysis of think-aloud protocol content [40, p. 229]. Such units included sentences, clauses, phrases, single words, verbal fillers (e.g., "um" and "ah"), and gaps of silence of two seconds or more. Since thought units typically ended with short pauses, the video could be stopped and rewound so that the thought-unit start time could be noted on the transcript. Each time-stamped thought unit was then annotated with descriptions of eye movements and mouse movements.
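To make the annotation scheme concrete, here is a minimal sketch of a time-stamped thought-unit record together with the two-second silence rule mentioned above. The field names and the sample fragment (modeled loosely on the transcript excerpts later in this paper) are hypothetical, not the study's actual data format:

```python
from dataclasses import dataclass

@dataclass
class ThoughtUnit:
    start: float  # seconds from task start
    end: float
    text: str     # sentence, clause, phrase, single word, or verbal filler
    eye: str      # annotated description of concurrent eye movement
    mouse: str    # annotated description of concurrent mouse movement

def silence_gaps(units, min_gap=2.0):
    """Return (start, duration) pairs for gaps of silence of min_gap seconds or more."""
    return [(prev.end, curr.start - prev.end)
            for prev, curr in zip(units, units[1:])
            if curr.start - prev.end >= min_gap]

units = [
    ThoughtUnit(0.0, 1.8, "About business licensing", "reads link", "mouse on scrollbar"),
    ThoughtUnit(1.8, 4.1, "That'll probably have something on registering a trade name",
                "fixates on link", "mouse on scrollbar"),
    ThoughtUnit(8.2, 10.0, "There's a lot of choices here", "scans main panel", "mouse not in view"),
]
print(silence_gaps(units))  # one roughly four-second silence starting at 4.1 s
```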

At this point in the data processing, the transcripts were ready for the first coding stage. This stage corresponds with RQ1: To what degree are the statements users produce during CTA accurate? The researcher chose time instead of the thought unit as a unit of analysis so that a direct comparison could be made between the transcripts and the real-time eye movements. To ensure reliability, the researcher and a trained graduate student independently coded the descriptive transcripts. The transcripts were coded to determine when eye movements verified task-related verbalizations. Coders compared each task-related verbalization with the eye movements that occurred during that verbalization. When it was difficult to determine from eye-movement description alone whether the eye movement verified the verbalization, the video and/or the website was consulted to contextualize the data [40]. In one case (noted in the transcript excerpt in Fig. 3), for instance, it was difficult to determine what the word "there" referred to. In the brackets, the verbalized thought unit is followed first by a description of the eye movement and then by a description of the mouse movement.

Fig. 3. Excerpt from the annotated transcript that includes verbalization, eye movement, and mouse movement.

TABLE I
CATEGORIES USED IN CODING TRANSCRIBED THOUGHT UNITS

In this case, subsequent analysis of the video and the website revealed that "there" corresponded to the hyperlink "more about motorcycles." (This hyperlink was embedded in the second paragraph located in the main content panel.) In this first coding stage, the intercoder reliability for verbalization/eye-movement verification, as measured by Scott's pi [41], was 0.81. (Scott's pi corrects for chance agreement among coders.) Intercoder reliability was acceptable.
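Scott's pi has a simple closed form: pi = (Po − Pe) / (1 − Pe), where Po is the observed proportion of agreement and Pe is the chance agreement expected from the pooled category proportions across both coders. The sketch below implements that standard formula on made-up codes; it is not the study's data or tooling:

```python
from collections import Counter

def scotts_pi(coder_a, coder_b):
    """Scott's pi for two coders assigning nominal categories to the same items [41]."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    pooled = Counter(coder_a) + Counter(coder_b)  # joint distribution over 2n judgments
    p_expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: two coders agree on 4 of 5 hypothetical verification judgments.
a = ["verified", "verified", "unverified", "verified", "unverified"]
b = ["verified", "verified", "verified", "verified", "unverified"]
print(round(scotts_pi(a, b), 2))  # 0.52
```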

In the next coding stage, RQ2 was addressed: What content categories do concurrent think-aloud verbalizations include? The content categories included reading, procedure, observation, explanation, and other. Three of these categories—reading, procedure, and explanation—were based a priori on Bowers and Snyder's study of concurrent and retrospective think-aloud protocols [35]. The researcher added observation as a category based on an examination of the transcript data for users' verbalization patterns. Reading was defined as participants' reading of words, phrases, or sentences directly off the screen. Procedure included participants' descriptions of their current or future actions. Observation was defined as participants' remarks about the webpage or their behavior. Explanation included motivations for participants' behavior. The "other" category included content that did not fit into the other four categories. Table I includes transcript excerpts that illustrate the contents of each category. The researcher and the same graduate student who coded the descriptive transcripts for verbalization accuracy coded the transcripts for verbalization content. Scott's pi across the researcher and the other coder was 0.90.


The following verbalizations—task-prompt reading, silences, and verbal fillers—were removed from the analysis in order for the researcher to address the first and second research questions. Including these verbalizations in the data set would have skewed the findings. If the task-prompt reading time and content had been included, the accuracy times would have increased since verbalizations would have been verified by eye movement when participants read the prompts aloud. These verbalizations also would have artificially raised the number of thought units in the reading category. Verbal fillers and silences were not included in the analysis since these actions do not contain information about the participants' activities that could be verified against their eye movements.

RQ3—What do users’ eye movements reveal abouttheir behavior when they are silent or using verbalfillers?—was exploratory since verbal fillers andsilences have not yet been addressed by usabilityresearchers. Time stamps noted on the transcriptsduring verbal fillers and silences, and the videocontaining participants’ real-time eye movementsduring these times were examined together to gaininsight into participants’ behavior.

RESULTS AND DISCUSSION

The total task-performance time was 40 minutes and 4 seconds. Since RQ1 and RQ2 do not include verbal fillers and silences, 34 minutes and 2 seconds—which included 532 thought units—were analyzed to answer these two research questions. Since RQ3 addresses verbal fillers and silences, the total task-performance time of 40 minutes and 4 seconds was analyzed to answer this research question. In this section, the verbalization-accuracy results are presented first, followed by the analysis of the CTA content. Finally, the information about participants' behavior during verbal fillers and silences is discussed.

To What Degree Are the Statements Users Produce During CTA Accurate? RQ1 concerned the percentage of task-performance time that CTA verbalizations were verified by eye movement. To determine that data from the four tasks could be combined to report an aggregate percentage, an ANOVA test was run. Since the raw data represent percentages, it was necessary to convert these data to arcsine values. (Percentage scores only range from 0% to 100%. To use percentage scores would therefore violate the assumption that the data are normally distributed [42].) Using the converted arcsine scores—and assuming a Type 1 error set at 0.05—the ANOVA test showed that, across the four tasks, there was no significant difference in the percentage of time that verbalizations were verified by eye movements (F(3, 23) = 0.351, p = 0.79). The small sample size used in the current study limits the statistical power of this test. Still, it is reasonable to analyze the verbalizations from different task transcripts together.
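The transformation described above is the standard variance-stabilizing arcsine square-root transform, y = arcsin(√p). A minimal sketch follows, with hypothetical per-participant proportions (the paper does not publish the raw data) and SciPy's one-way ANOVA standing in for whatever statistics package was actually used:

```python
import math
from scipy.stats import f_oneway

def arcsine_transform(proportions):
    """Variance-stabilizing transform for proportion data: y = arcsin(sqrt(p)) [42]."""
    return [math.asin(math.sqrt(p)) for p in proportions]

# Hypothetical verification proportions for five participants on each of four tasks.
tasks = [
    [0.78, 0.85, 0.80, 0.74, 0.82],
    [0.81, 0.79, 0.83, 0.77, 0.80],
    [0.76, 0.84, 0.79, 0.81, 0.78],
    [0.80, 0.82, 0.75, 0.83, 0.79],
]
stat, p = f_oneway(*[arcsine_transform(t) for t in tasks])
print(f"F = {stat:.3f}, p = {p:.3f}")  # nonsignificant task effect on these made-up data
```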

Verbalizations were verified by eye movement 80% of the time (27 minutes and 2 seconds). This finding is consistent with Rhenius and Deffner's eye-tracking study of CTA [33]. These researchers reported that during an individual participant's task performance, verbalizations were accurate 87% to 90% of the time. Rhenius and Deffner did not report the total number of participants; consequently, the mean accuracy percentage could not be calculated from their findings. Nevertheless, the results of that research combined with the current study indicate that CTA, overall, is accurate. Indeed, the results from the present study are more relevant to technical communicators who use CTA since the word and geometric puzzles used in cognitive-psychology experiments are less likely to be performed in usability test settings.
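As a quick arithmetic check, the 80% figure is consistent with the reported durations, taking the 34-minute-2-second analyzed total from the opening of this section as the denominator:

```latex
\frac{27\ \text{min}\ 2\ \text{s}}{34\ \text{min}\ 2\ \text{s}}
  = \frac{27 \times 60 + 2}{34 \times 60 + 2}
  = \frac{1622}{2042} \approx 0.794 \approx 80\%
```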

Across the remaining 20% of the task time, participants' verbalizations could not be verified by eye movements. Participants reported looking at one part of the screen while their eye movements revealed that they were engaged in other activities. It is unlikely that these participants purposely omitted information about their on-screen actions. Instead, the eye-movement/verbalization discrepancies could be explained by users' ability to visually process information at a faster rate than their ability to verbalize this information [43]. Although the video tracks were synchronized to account for this eye/voice span, examination of the transcripts and real-time eye-movement video revealed that participants' verbalized accounts of their actions occurred at a slower rate than their visual searches of the webpage.

What Content Categories Do CTA Verbalizations Include? RQ2 concerned the content of users' verbalizations. A total of 532 time-stamped thought units were coded for content analysis. Thought units categorized as reading accounted for the largest percentage of verbalizations, while thought units categorized as explanation accounted for the smallest percentage of verbalizations. Fig. 4 shows the distribution of thought units according to content category.


Fig. 4. Thought units by category.

Of the 532 thought units, the majority (58%) included words, phrases, and sentences that participants read off the screen. This high percentage of thought units categorized as reading is not surprising since prior research has found that reading is not a cognitively demanding activity. Ericsson's review of the literature in this area revealed that when text is well written and easily understandable, participants will, in most cases, read the text aloud without verbalizing any additional information [44]. Interestingly, most of the thought units categorized as reading included section headings and link names—items users would likely seek out during search tasks. This finding confirms website designers' advice to create informative and visually distinct headings. Further, links should be clearly identifiable, since it appears that users seek these features out as a way to navigate through a webpage and, by extension, a website [45]–[47].

The second-highest-ranking category, procedure, accounted for 19% of the thought units. These verbalizations consisted of participants' descriptions of their current or future activity. Procedure statements are useful because they represent a direct report of that participant's behavior. For example, the statement, "I'm looking at the 'Quick Links,'" describes the participant's action. This information would not be readily accessible by observation alone. Procedure verbalizations also provide information about the processes by which participants achieve their working goals. According to Flower and Hayes, "People rapidly forget many of their own working goals once those goals have been satisfied" [20, p. 377]. Procedural verbalizations, then, most likely reveal insight about the subgoals that participants form in order to complete their tasks. Since these actions are transitory, this information would be less likely to be included in RTA.

Together, reading and procedure thought units accounted for 77% of CTA content. This result supports Bowers and Snyder's earlier finding that a majority of verbalizations fall into these two content categories [35]. The difference between the findings from the current study and Bowers and Snyder's research is how the results are distributed between the two content categories. In the Bowers and Snyder study, more verbalizations were categorized as procedure than reading. In the current study, the reverse is true: More verbalizations were categorized as reading than procedure. The reversal of these findings could be due to several things. First, different test objects were used in the two studies. The test object used in Bowers and Snyder's study contained little onscreen text and, therefore, presented more opportunities for participants to verbalize procedure statements. The test object used in the current study contained large amounts of text. This presented participants with more opportunities to read off the screen. Second, since participants were asked to search for specific information—instead of carrying out an action—the act of reading may also be considered a verbal representation of participants' working goals.

Observation thought units accounted for 10% of the verbalizations. These verbalizations consisted of participants' observations about the website or about their own behavior. Real-time eye-movement data revealed that participants used the opportunity to orient themselves visually to the webpage before verbalizing their thoughts or to explore other items on the page to make their next navigational move. For example, the participant who made the statement, "This [paragraph] doesn't say anything about renewing," was no longer looking at the content that was referenced in the verbalization. Instead, the participant was scanning the "Quick Links" column on the right side of the page. Since observation was a content category that the researcher added based on a review of the transcripts, it cannot be compared to Bowers and Snyder's findings [35].

Participants’ explanations for the rationale and/ormotivation for their actions accounted for 5% ofthe thought units analyzed. These verbalizationsare illustrated by the following transcript excerpts:

• An endorsement is like a permit, so I’ll click here.• I’m checking out the links here because there

might be one for trade names.• This should take me where I want to go since a

private investigator is a profession.

The small percentage of explanation verbalizations corresponds to Bowers and Snyder's findings. Verbalizations categorized as explanation were usually verified by eye movement since participants tended to look at the object on screen during verbalization. In the statement, "An endorsement is like a permit, so I'll click here," for example, the participant was focused on the "Endorsement" link in the main content panel. In other cases, however, eye-movement data were essential to define referents. In regard to the statement, "I'm checking out the links here because there might be one for trade names," it would have been impossible to determine what "here" referred to since there were links on both the left and right sides of the screen. In this case, the participant was looking at the links on the left side of the screen when this verbalization occurred, even though the participant ultimately chose a link on the right side of the screen.

Thought units categorized as "other" accounted for 8% of the total number of thought units. Other verbalizations were typically sentence fragments that did not fit into the existing four categories. Frequently, participants started to express a thought and then, while continuing to work on the task, they trailed off into silence. As with observation thought units, other verbalizations provided few clues about a participant's on-screen activity. Without the eye-tracking data, it would have been challenging to determine from the verbalization "Is that? Well, I see…" that the participant, like several other participants, was confused by the link name wording. Since no one specifically verbalized their confusion about the link name, it would have been difficult to tell from the CTA data alone that this particular link was the cause of usability problems. Similarly, the verbalization "Okay, if I…" did not reveal that a different participant started the search process by looking at the top three links on the left navigation menu, especially since the next verbalization ("I'll click here and see what happens") was procedural, accompanying the action of choosing a link in a different part of the screen.

The data set used to answer RQ1 and RQ2 dealt with verbalizations verified by eye movement and verbalized content; these data excluded the verbal fillers and periods of silence that inevitably occur during a CTA session. RQ3 includes these items and is discussed in the next section of this paper.

What Do Users’ Eye Movements Reveal aboutTheir Behavior When They Are Silent or UsingVerbal Fillers? RQ3 concerned silences and theuse of verbal fillers during CTA. Using the convertedarcsin scores, the ANOVA test showed that, acrossthe four tasks, there was no significant difference inthe percentage of time that verbal fillers were used

3, 23 1.15 0.352 . The researcher alsofound no significant difference in the percentageof time that silences occurred across the fourtasks 3, 23 0.815 0.50). Analysis of thetranscripts shows that during the 40 minutes and4 seconds of task-performance time, silences andverbal fillers occurred 16% of the time (6 minutesand 4 seconds).

Within a usability setting, researchers have not measured verbal fillers and silences in CTA. As such, it is difficult to determine from the present study alone the typical or atypical nature of these findings. As with any usability test, it is possible that the tasks may have affected the findings; however, participants in the current study did not verbally report difficulties in completing the tasks. This indicates that the tasks were not cognitively complex.


While the tasks may not have been complex, the act of searching for information onscreen may have taken cognitive priority over verbalization. In the following transcript excerpt, for instance, the eye-movement description during silence shows that the participant continued the search process while the silence occurred.

About business licensing [Reads the link "About business licensing" at the top of the main content panel. Mouse on scrollbar.]

That'll probably have something on registering a trade name. [Fixates on the "About business licensing" link. Mouse on scrollbar.]

[Silence. Quickly scans down the main content panel; quickly looks at the top of the "Quick Links" column. Mouse not in view.]

There's a lot of choices here. [Slowly scans down the main content panel. Mouse not in view.]

Notice that the four-second silence provided the participant the opportunity to rapidly check out and assess the options located in the "Quick Links" column. Searching for information, as eye-tracking researchers have demonstrated, is a rapid process that requires users to simultaneously process text and images [48]–[50]. While doing so, users must quickly compare the information that they are processing with the search goal. Then, users can assess which path to take to reach that goal. Users may lapse into these brief periods of silence while they make these assessments. In the above example, the silence precipitated a comment about usability: "There's a lot of choices here." Potentially, this indicates that the participant found the screen to be overloaded with links. (This particular screen had 24 links.) Gaps of silence, then, are not necessarily a result of a participant's lack of experience or comfort with the method. Instead, it appears that silences are signs of cognitive-processing difficulty and, as such, should be treated as opportunities to solicit information from participants about their immediate thoughts.

Like silence, verbal fillers in CTA seem to serve an assessment function. A participant's use of a verbal filler is illustrated in the following transcript excerpt:

Okay, back to the links over here. [Slowly scans the top part of the navigation bar. Mouse moves across the page to the left navigation menu.]

dah dah dah [Continues scanning the navigation bar. Mouse hovers at the top of the navigation menu.]

I think cars. [Fixates on the "Cars" link on the navigation menu. Mouse moves down to the "Cars" link on the navigation menu and clicks on the "Cars" link.]

In this case, the eye-movement description shows that the four-second verbal filler occurred as the participant continued assessing the navigation-bar link options before making a choice. According to speech communication researchers, people use verbal fillers to quickly assess situations before taking action [51]. Since verbal fillers also provide "audible evidence that [a person] is engaged in speech production labor" [52, p. 293], people may use verbal fillers in CTA situations to provide the "audible evidence" of continued verbalization. In sum, verbal fillers enable people to visually explore and mentally process information during an on-screen search.

CONCLUSION

In this study, verbalizations provided by users during usability testing were accurate 80% of the time during CTA. It is unlikely that users purposely omit accounts of their on-screen behavior; however, verbalizations alone do not provide a complete picture of their experience. When participants' verbalizations did not provide behavioral information—or when participants used verbal fillers or lapsed into silence—their eye movements revealed that they were still actively engaged in the task. Paradoxically, it may be at these (seemingly insignificant) times of nonverbalization that researchers can glean the most about the usability of an object, since these instances may indicate points when cognitive processing slows down. In this study, for instance, pages overloaded with hyperlinks were often a usability problem since participants were likely to lapse into short periods of silence or use verbal fillers when they reached these screens. Instead of disregarding or discounting verbal fillers and silences that occur during CTA, usability test facilitators should carefully take note of these instances and follow up with participants after they have completed their tasks.

This study also shows that CTA content consists primarily of verbalizations categorized as reading or procedure. These low-level verbalizations are a useful account of user on-screen behavior. These verbalizations also indicate that search is rhetorical. Like the writing model Flower and Hayes describe [20], users must first define the rhetorical problem—what is the object they hope to discover or find during the search process? In usability testing, like composition classrooms, the problem is already defined: Users are expected to find information that satisfies the task requirements. While in composition classes students solve this rhetorical problem by moving through a nonlinear, iterative writing process, in usability testing users solve this problem by developing search strategies: "How is information categorized? Will this link take me to information that will help me complete the task?" In both cases, the writer or user will draw upon long-term memory. In writing, authors draw upon their "knowledge of topic, audience, and writing plans" [20, p. 370]. In search, users draw upon their experience of performing similar tasks on similar websites. The low-level verbalizations provide insight into how users form and apply their search strategies.

The explanation verbalizations, which are typically described as richer sources of information, accounted for only 5% of the total verbalizations produced by participants in the study. Even when participants' verbalizations were categorized as explanation, the content was limited in terms of what it revealed regarding usability. Participants rarely said more than a few words about their underlying motivation for making a particular link choice before they quickly moved on with the task. Since CTA yields a low percentage of explanation verbalizations, it is not the optimum method to obtain this type of usability information.

Future research in this area should test whether the findings in this study can be generalized across different website genres and designs. In addition, the number of participants could be increased, which would increase the number of thought units available for analysis and the statistical power of the results. Qualitative data analysis with larger participant pools, however, is time consuming; data preparation in the present study, for instance, had to be done manually. More useful would be additional research that continues to empirically test usability evaluation methods. De Jong and Schellens have recommended research that addresses the effects of sample composition and size on the type and relevance of information collected using different usability evaluation methods [53]. They have also suggested research that addresses the implementation and revision phases of usability evaluation. As the current study and previous research demonstrate, technical communicators have the education, knowledge, and expertise to create new theoretical foundations and guidelines for using CTA protocols in usability testing.

ACKNOWLEDGMENTS

The author would like to thank the following people for their help with the project: Drs. J. Ramey, E. Cuddihy, R. DeSantis, and Z. Guan from the University of Washington's Laboratory for Usability Testing and Evaluation. The author would also like to thank Drs. J. Downing and T. Shadid. This research project was supported by grants from the University of North Texas and West Chester University of Pennsylvania.

REFERENCES

[1] A. J. Hornof, "Visual search and mouse-pointing in labeled versus unlabeled two-dimensional visual hierarchies," ACM Trans. Comput.-Human Interact., vol. 8, no. 3, pp. 171–197, 2001.
[2] M. D. Byrne, "Cognitive architecture," in The Human-Computer Interaction Handbook, J. A. Jacko and A. Sears, Eds. New York: Lawrence Erlbaum Associates, 2003, pp. 97–117.
[3] J. R. Anderson, The Rules of the Mind. Hillsdale, NJ: Lawrence Erlbaum Associates, 1993.
[4] G. M. Olson and J. S. Olson, "Human-computer interaction: Psychological aspects of the human use of computing," Annu. Rev. Psychol., vol. 54, pp. 491–516, 2003.
[5] J. M. Carroll, "Human-computer interaction: Psychology as a science of design," Annu. Rev. Psychol., vol. 48, pp. 61–83, 1997.
[6] K. Holtzblatt, "Contextual design," in The Human-Computer Interaction Handbook, J. A. Jacko and A. Sears, Eds. New York: Lawrence Erlbaum Associates, 2003, pp. 941–963.
[7] J. Ramey, "Usability testing: Conducting the test procedure itself," in Proc. Int. Professional Communication Conf., 1987, pp. 131–134.
[8] J. Ramey, "A self-reporting methodology for rapid data analysis in usability testing," in Proc. Int. Professional Communication Conf., 1988, pp. 147–150.
[9] J. Redish, "Reading to learn to do," IEEE Trans. Prof. Commun., vol. 32, no. 4, pp. 289–293, Dec. 1989.
[10] P. Sullivan, "Beyond a narrow conception of usability testing," IEEE Trans. Prof. Commun., vol. 32, no. 4, pp. 256–264, Dec. 1989.
[11] P. Sullivan, "User protocols: Tools for building an understanding of users," in Proc. Int. Professional Communication Conf., 1988, pp. 259–263.
[12] K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data, revised ed. Cambridge, MA: MIT Press, 1993.
[13] K. A. Ericsson and H. A. Simon, Protocol Analysis. Cambridge, MA: MIT Press, 1980.
[14] K. S. Bowers, G. Regehr, C. Balthazard, and K. Parker, "Intuition in the context of discovery," Cognit. Psychol., vol. 22, no. 1, pp. 72–110, 1990.
[15] L. L. Jacoby, S. D. Lindsay, and J. P. Toth, "Unconscious influences revealed: Attention, awareness, and control," Amer. Psychol., vol. 47, no. 6, pp. 802–809, 1992.
[16] T. D. Wilson and J. W. Schooler, "Thinking too much: Introspection can reduce the quality of preferences and decisions," J. Personal. Social Psychol., vol. 60, no. 2, pp. 181–192, 1991.
[17] J. W. Schooler, S. Ohlsson, and K. Brooks, "Thoughts beyond words: When language overshadows insight," J. Exper. Psychol., vol. 122, no. 2, pp. 166–183, 1993.
[18] J. E. Russo, E. J. Johnson, and D. L. Stephens, "The validity of verbal protocols," Memory & Cognit., vol. 17, no. 6, pp. 759–769, 1989.
[19] M. T. Boren and J. Ramey, "Thinking aloud: Reconciling theory and practice," IEEE Trans. Prof. Commun., vol. 43, no. 3, pp. 261–278, Sep. 2000.
[20] L. Flower and J. R. Hayes, "A cognitive process theory of writing," Coll. Compos. Commun., vol. 32, no. 4, pp. 365–387, 1981.
[21] R. J. Connors, "Composition studies and science," Coll. English, vol. 44, no. 8, pp. 1–20, 1982.
[22] M. Cooper and M. Holzman, "Talking about protocols," Coll. Compos. Commun., vol. 34, no. 3, pp. 284–293, 1983.
[23] E. R. Steinberg, "Protocols, retrospective reports, and the stream of consciousness," Coll. English, vol. 48, no. 7, pp. 697–712, 1986.
[24] J. Nielsen, Usability Engineering. London, UK: Academic Press, 1993.
[25] B. Shneiderman, Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd ed. Reading, MA: Addison-Wesley, 1998.
[26] J. S. Dumas and J. C. Redish, A Practical Guide to Usability Testing. New York: Ablex, 1994.
[27] J. Rubin, Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. New York: Wiley, 1994.
[28] E. Krahmer and N. Ummelen, "Thinking about thinking aloud: A comparison of verbal protocols for usability testing," IEEE Trans. Prof. Commun., vol. 47, no. 3, pp. 105–117, Sep. 2004.
[29] M. J. van den Haak, M. D. T. de Jong, and P. J. Schellens, "Retrospective versus concurrent think-aloud protocols: Testing the usability of an online library catalogue," Behav. Inf. Technol., vol. 22, no. 5, pp. 339–351, 2003.
[30] M. J. van den Haak, M. D. T. de Jong, and P. J. Schellens, "Employing think-aloud protocols and constructive interaction to test the usability of an online library," Interact. Comput., vol. 16, no. 6, pp. 1153–1170, 2004.
[31] M. J. van den Haak, M. D. T. de Jong, and P. J. Schellens, "Constructive interaction: An analysis of verbal interaction in a usability setting," IEEE Trans. Prof. Commun., vol. 49, no. 4, pp. 311–324, Dec. 2006.
[32] M. J. van den Haak, M. D. T. de Jong, and P. J. Schellens, "Evaluation of an informational Web site: Three variants of the think-aloud method compared," Tech. Commun., vol. 54, no. 1, pp. 58–71, 2007.
[33] D. Rhenius and G. Deffner, "Evaluation of concurrent thinking aloud using eye-tracking data," in Proc. 34th Human Factors Soc. Annu. Meet., 1990, pp. 1265–1269.
[34] M. Hertzum, K. D. Hansen, and H. H. K. Andersen, "Scrutinising usability evaluation: Does thinking aloud affect behaviour and mental workload?," Behav. Inf. Technol., vol. 28, no. 2, pp. 165–181, 2009.
[35] V. A. Bowers and H. L. Snyder, "Concurrent versus retrospective verbal protocol for comparing Windows usability," in Proc. 34th Human Factors Soc. Annu. Meet., 1990, pp. 1270–1274.
[36] L. R. Peterson, "Concurrent verbal activity," Psychol. Rev., vol. 76, no. 4, pp. 376–386, 1969.
[37] R. J. K. Jacob and K. S. Karn, "Eye tracking in human-computer interaction and usability research: Ready to deliver the promises," in Cognitive and Applied Aspects of Eye Movement Research, J. Hyönä, R. Radach, and H. Deubel, Eds. Amsterdam, The Netherlands: Elsevier, 2003, pp. 573–605.
[38] L. Cooke, "Eye tracking: How it works and how it relates to usability," Tech. Commun., vol. 52, no. 4, pp. 456–463, 2005.
[39] L. Cooke, "How do users search web home pages? An eye-tracking study of multiple navigation menus," Tech. Commun., vol. 55, no. 2, pp. 176–194, 2008.
[40] W. P. Eveland and S. Dunwoody, "Examining information processing on the World Wide Web using think aloud protocols," Media Psychol., vol. 2, no. 3, pp. 219–244, 2000.
[41] W. A. Scott, "Reliability of content analysis: The case of nominal scale coding," Public Opin. Quart., vol. 19, no. 3, pp. 321–325, 1955.
[42] G. O. Wesolowsky, Multiple Regression and Analysis of Variance. New York: Wiley, 1976.
[43] K. Rayner, "Eye movements in reading and information processing: 20 years of research," Psychol. Bull., vol. 124, no. 3, pp. 372–422, 1998.
[44] K. A. Ericsson, "Concurrent verbal reports on reading and text comprehension," Text, vol. 8, no. 4, pp. 295–325, 1988.
[45] J. Redish, Letting Go of the Words: Writing Web Content That Works. Amsterdam, The Netherlands: Elsevier, 2007.
[46] S. Krug, Don't Make Me Think: A Common Sense Approach to Web Usability, 2nd ed. Berkeley, CA: New Riders, 2005.
[47] J. Nielsen and H. Loranger, Prioritizing Web Usability. Berkeley, CA: New Riders, 2006.
[48] F. Engel, "Visual conspicuity, visual search and fixation tendencies of the eye," Vis. Res., vol. 17, no. 1, pp. 95–108, 1977.
[49] J. Findlay, "Visual search: Eye movements and peripheral vision," Optomet. Vis. Sci., vol. 72, no. 7, pp. 461–466, 1995.
[50] J. M. Henderson and A. Hollingworth, "Eye movements during scene viewing: An overview," in Eye Guidance in Reading and Scene Perception, G. Underwood, Ed. Amsterdam, The Netherlands: Elsevier, 1998, pp. 269–293.
[51] H. H. Clark and J. E. Fox Tree, "Using uh and um in spontaneous speaking," Cognition, vol. 84, no. 1, pp. 73–111, 2002.
[52] E. Goffman, "Radio talk," in Forms of Talk. Philadelphia, PA: Univ. of Pennsylvania Press, 1981, pp. 197–327.
[53] M. de Jong and P. J. Schellens, "Toward a document evaluation methodology: What does research tell us about the validity and reliability of evaluation methods?," IEEE Trans. Prof. Commun., vol. 43, no. 3, pp. 242–260, Sep. 2000.

Lynne Cooke received the Ph.D. degree from Rensselaer Polytechnic Institute, Troy, NY. Currently, she is an assistant professor in the English Department at West Chester University of Pennsylvania, West Chester, PA, where she teaches courses in technical communication, visual design, and usability test methods. Her research explores relationships between technical communication and cognitive psychology, human-computer interaction, and information design.