Evaluating Interfaces
Goals of evaluation
Laboratory versus field-based evaluation
Evaluation methods:
Design-oriented
Implementation-oriented
The goals of evaluation
To ensure that the interface behaves as we expect and meets user needs
Assess the extent of its functionality
Assess its impact on the user
Identify specific problems
Assess the ‘usability’ of the interface
Laboratory studies versus field studies
Laboratory studies
The user comes to the evaluator
A well-equipped laboratory may contain sophisticated recording facilities, two-way mirrors, instrumented computers, etc.
Can control or deliberately manipulate the context of use
The only option for some dangerous or extreme interfaces
But cannot reproduce the natural working context of a user’s environment, especially social interaction and contingencies
Difficult to evaluate long-term use
Field studies
The evaluator goes to the user
Captures the actual context
Captures real working practice and social interaction
Not possible for some applications
It can also be difficult to capture data
Cannot ‘prove’ specific hypotheses
Different kinds of evaluation are appropriate at different stages of design
[Diagram: an iterative design lifecycle – requirements, design, test, implement, maintain – situated in the context of the user, the task, computers and the environment]
Early on: formative evaluation of the design – may only involve designers and other experts
Later on: evaluation of the implementation – detailed, rigorous and with end users
Evaluation Methods
Design-oriented evaluation methods:
Cognitive walkthrough
Heuristic/expert inspections
Theory and literature review
Implementation-oriented methods:
Observation
Controlled experiments
Query techniques – interviews and surveys
Cognitive Walkthrough
A predictive technique in which designers and possibly experts simulate the user’s problem-solving process at each step of the human-computer dialogue
Originated in ‘code walkthrough’ from software engineering
Used mainly to consider ‘ease of learning’ issues – especially how users might learn by exploring the interface
Cognitive Walkthrough – The Stages
Begins with:
A detailed description of the prototype (e.g., menu layouts)
A description of typical tasks the user will perform
A written list of the actions required to complete the tasks with the prototype
An indication of who the users are and what kind of experience and knowledge they may have
For each task, evaluators step through the necessary action sequences, imagining that they are a new user and asking the following questions:
Will the user know what to do next?
Can the user see how to do it?
Will they know that they have done the right thing?
It is vital to document the walkthrough (one possible structure is sketched below):
Who did what and when
Problems that arose and severity ratings
Possible solutions
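As a hedged illustration, here is one way such documentation might be structured in code; the field names and the severity scale are assumptions made for this sketch, not a standard schema:

```python
# A minimal sketch of a walkthrough record; fields are illustrative.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class WalkthroughFinding:
    step: str                  # the action being evaluated
    question: str              # which of the three questions failed
    problem: str               # description of the problem that arose
    severity: int              # e.g., 1 (cosmetic) to 4 (catastrophic)
    possible_solution: str = ""

@dataclass
class WalkthroughRecord:
    task: str
    evaluators: list[str]      # who did the walkthrough
    session_date: date         # when it was done
    findings: list[WalkthroughFinding] = field(default_factory=list)

record = WalkthroughRecord(
    task="Copy a single page",
    evaluators=["A. Designer", "B. Expert"],
    session_date=date(2024, 1, 15),
)
record.findings.append(WalkthroughFinding(
    step="Turn on the power",
    question="Will the user know what to do next?",
    problem="No 'power on' indicator; users may assume the machine is on",
    severity=3,
    possible_solution="Add a visible power indicator light",
))
```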
A short fragment of cognitive walkthrough
Evaluating the interface to a personal desktop photocopier
A design sketch shows a numeric keypad, a "Copy" button, and a push button on the back to turn on the power
The specification says the machine automatically turns itself off after 5 minutes of inactivity
The task is to copy a single page, and the user could be any office worker
The actions the user needs to perform are to turn on the power, put the original on the machine, and press the "Copy" button
Now tell a believable story about the user's motivation and interaction at each action …
From Philip Craiger's page at http://istsvr03.unomaha.edu/gui/cognitiv.htm
The user wants to make a copy and knows that the machine has to be turned on. So they push the power button. Then they go on to the next action. But this story isn't very believable.
We can agree that the user's general knowledge of office machines will make them think the machine needs to be turned on, just as they will know it should be plugged in. But why shouldn't they assume that the machine is already on? The interface description didn't specify a "power on" indicator. And the user's background knowledge is likely to suggest that the machine is normally on, like it is in most offices.
Even if the user figures out that the machine is off, can they find the power switch? It's on the back, and if the machine is on the user's desk, they can't see it without getting up. The switch doesn't have any label, and it's not the kind of switch that usually turns on office equipment (a rocker switch is more common). The conclusion of this single-action story leaves something to be desired as well. Once the button is pushed, how does the user know the machine is on? Does a fan start up that they can hear? If nothing happens, they may decide this isn't the power switch and look for one somewhere else.
Heuristic/Expert Inspections
Experts assess the usability of an interface guided by usability principles and guidelines (heuristics)
Jakob Nielsen suggests that 5 experts may be enough to uncover 75% of usability problems
Best suited to early design and when there is some kind of representation of the system – e.g., storyboard
It’s only as good as the experts – you need expertise in both the problem domain and usability
The Process of Heuristic/Expert Inspections
Briefing session
Experts are all given an identical description of the product, its context of use and the goals of the evaluation
Evaluation period
Each expert spends several hours independently critiquing the interface
At least two passes through the interface: one for overall appreciation and others for detailed assessment
Debriefing session
Experts meet to compare findings, prioritise problems and propose solutions (see the sketch below)
They report/present their findings to decision makers and other stakeholders
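To make the debriefing concrete, here is a hedged sketch of how findings from several experts might be pooled and prioritised; the data values and the mean-severity ranking rule are illustrative assumptions, not part of any standard method:

```python
# A minimal sketch: pool each expert's findings and rank problems by
# mean severity, breaking ties by how many experts reported them.
from collections import defaultdict
from statistics import mean

# (problem, severity) pairs as reported independently by each expert
expert_findings = {
    "Expert A": [("No power indicator", 3), ("Unlabelled switch", 2)],
    "Expert B": [("No power indicator", 4), ("Auto-off too aggressive", 2)],
    "Expert C": [("Unlabelled switch", 3)],
}

ratings = defaultdict(list)
for findings in expert_findings.values():
    for problem, severity in findings:
        ratings[problem].append(severity)

prioritised = sorted(
    ratings.items(),
    key=lambda item: (mean(item[1]), len(item[1])),
    reverse=True,
)
for problem, scores in prioritised:
    print(f"{problem}: mean severity {mean(scores):.1f} "
          f"(reported by {len(scores)} expert(s))")
```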
Theory and literature review
We have seen before that you can apply existing theory to evaluate a design:
The Keystroke-Level Model
Fitts' law (see the sketch below)
HCI and experimental psychology already contain a wealth of knowledge about how people interact with computers
Scour the literature (ACM Digital Library, Google, CiteSeer and others)
But think carefully about whether the results transfer
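For instance, Fitts' law predicts the time to move to a target from its distance and width. The sketch below uses the common Shannon formulation; the constants a and b are illustrative assumptions that would normally be fitted empirically for a given device and user population:

```python
# A minimal sketch of Fitts' law (Shannon formulation):
#   MT = a + b * log2(D / W + 1)
# a and b below are invented for illustration; fit them from data.
from math import log2

def movement_time(distance: float, width: float,
                  a: float = 0.1, b: float = 0.15) -> float:
    """Predicted time (seconds) to hit a target of the given width
    at the given distance, both in the same units."""
    index_of_difficulty = log2(distance / width + 1)
    return a + b * index_of_difficulty

# Compare a small far-away target with a large nearby one
print(movement_time(distance=800, width=20))   # harder target
print(movement_time(distance=100, width=80))   # easier target
```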
Observation
Observe users interacting with the interface – in the laboratory or field
Record interactions using:
Pen and paper
Audio
Video
Computer logging (a minimal logging sketch follows this slide)
User notebooks and diaries
Think-aloud techniques
Analysis
Illustrative fragments
Detailed transcription and coding
Post-task walkthroughs
Specialised analysis software can replay video alongside system data and help the analyst synchronise notes and data
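As a hedged illustration of computer logging, the sketch below timestamps user-interface events relative to the session start so they can later be synchronised with video; the event names and CSV format are assumptions made for the example:

```python
# A minimal sketch of timestamped interaction logging.
import csv
import time

class InteractionLogger:
    def __init__(self, path: str):
        self.path = path
        self.t0 = time.monotonic()  # session start, for video sync
        self.rows: list[tuple[float, str, str]] = []

    def log(self, event: str, detail: str = "") -> None:
        self.rows.append((time.monotonic() - self.t0, event, detail))

    def save(self) -> None:
        with open(self.path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["seconds_from_start", "event", "detail"])
            writer.writerows(self.rows)

logger = InteractionLogger("session01.csv")
logger.log("button_press", "Copy")
logger.log("error_dialog", "Paper jam")
logger.save()
```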
Savannah
An educational game for six players at a time
A virtual savannah is overlaid on an empty school playing field
Studying Savannah
Six trials over three days
Two video recordings from the field
Game replay interface
Impala Sequence
The Impala Sequence Revealed
Elsa suddenly stops
Circular formation
Counting aloud
Nala and Elsa cannot see the impala
Replay shows them stopped on edge of locale
GPS drift carries them over the boundary
The boy who passed through attacked first
Controlled Experiments
Query Techniques
Elicit the user’s view of the system
Can address large numbers of users
Key techniques are:
Interviews
Surveys
Relatively simple and cheap to administer
But not so detailed or good for exploring alternative designs
DECIDE: A Framework to Guide Evaluation (Preece et al., 2001)
Determine the overall goals that the evaluation addresses
Explore the specific questions to be answered
Choose the evaluation approach and specific techniques to answer these questions
Identify the practical issues that must be addressed
Decide how to deal with ethical issues
Evaluate, interpret and present the data
Evaluation through questionnaires
A fixed set of written questions, usually with written answers
Advantages:
Gives the user’s point of view – good for evaluating satisfaction
Quick and cost-effective to administer and score, so can deal with large numbers of users
The user doesn’t have to be present
Disadvantages:
Only tells you how the user perceives the system
Not good for some kinds of information:
Things that are hard to remember (e.g., times and frequencies)
Things that involve status or are sensitive to disclose
Usually not very detailed
May suffer from bias
Questions
Three types of questions:
Factual – ask about observable information
Opinion – what the user thinks about something (outward facing)
Attitude – how the user feels about something (inward facing), e.g., whether they feel efficient, like the system or feel in control
Two general styles of question:
Closed – the user chooses from among a set number of options – quick to complete and easy to summarise with statistics
Open – the user gives free-form answers – captures more information but is slower to complete and harder to summarise statistically (may require coding)
Most questionnaires mix open and closed questions
Options for closed questions
The number of options should match the number of possible responses
Likert scales capture strength of opinion:
Use an odd number of options on a preference scale when there is the possibility of a neutral response
The granularity of the scale depends upon respondents’ expertise
Example scale: strongly agree | agree | neutral | disagree | strongly disagree
Questionnaire Analysis
Closed questions:
Subject to statistics
Graphical representations (bar charts, pie charts)
Averages and measures of spread
Always look at the raw data
You can only make statistical inferences from carefully designed questionnaires
Open questions:
Give a general sense of feedback
May be coded and then statistically analysed (see the sketch below)
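As a hedged illustration, the sketch below codes Likert responses numerically and computes an average and a measure of spread for one closed question, then prints the raw distribution; the responses are invented for the example:

```python
# A minimal sketch of closed-question analysis: code Likert responses
# as numbers, then compute an average and a measure of spread.
from statistics import mean, stdev
from collections import Counter

SCALE = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
         "agree": 4, "strongly agree": 5}

responses = ["agree", "strongly agree", "neutral", "agree",
             "disagree", "agree", "strongly agree"]

scores = [SCALE[r] for r in responses]
print(f"mean = {mean(scores):.2f}, sd = {stdev(scores):.2f}")

# Always look at the raw data, not just the summary statistics
for option, count in Counter(responses).most_common():
    print(f"{option:>17}: {'#' * count}")
```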
Deploying questionnaires
Post
Interview
Email
As part of the interface
Web – brings important advantages:
Ease of deployment
Reliable data collection
Semi-automated analysis
Designing questionnaires
What makes a good or bad questionnaire?
Reliability – the ability to give the same results when filled out by like-minded people in similar circumstances
Validity – the degree to which the questionnaire is actually measuring or collecting data about what you think it should
Clarity, length and difficulty
Designing a good questionnaire is surprisingly difficult – pilot, pilot, pilot!!
Statistically valid questionnaire design is a very specialised skill – use an existing one
What to ask
Background questions on the users:
Name, age, gender
Experience with computers in general and this kind of interface in particular
Job responsibilities and other relevant information
Availability for further contact, such as an interview
Interface-specific questions
System Usability Scale (SUS)
1. I think I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system
Calculating a rating from SUS
For odd-numbered questions, score = scale position - 1
For even-numbered questions, score = 5 - scale position
Multiply all scores by 2.5 (so each question contributes up to 10 points)
Final score for an individual = sum of multiplied scores for all questions (out of 100)
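A minimal sketch of this calculation, assuming responses are recorded as scale positions 1–5 in question order:

```python
# SUS scoring. `responses` holds the scale position
# (1 = strongly disagree ... 5 = strongly agree) for questions 1-10.
def sus_score(responses: list[int]) -> float:
    assert len(responses) == 10
    total = 0
    for i, position in enumerate(responses, start=1):
        if i % 2 == 1:                 # odd-numbered question
            total += position - 1
        else:                          # even-numbered question
            total += 5 - position
    return total * 2.5                 # final score out of 100

# Example: a fairly positive respondent (values invented for illustration)
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```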
Evaluation during active use
System refinement based on experience or in response to changes in users:
Interviews and focus group discussions
Continuous user-performance data logging:
Frequent and infrequent error messages
Analyse sequences of actions to suggest improvements or new actions
BUT respect people’s rights and consult them first!
User feedback mechanisms:
On-line forms, email and bulletin boards
Workshops and conferences
Choosing participants
Usually we generalise findings based on a sample of participants
Need to select carefully to avoid sampling bias
Watch out for self-selection and selection based on convenience
Random sampling – every member of the target population has an equal chance to be selected
But how do you get access? Often you need to advertise
Describe your sample in your write-up
How many users? 5–12 as a rough rule of thumb
Nielsen and Landauer (1993); www.useit.com/alertbox/20000319.html (see the sketch below)
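Nielsen and Landauer model the proportion of usability problems found by n evaluators as 1 - (1 - λ)^n. The sketch below uses the commonly cited average λ ≈ 0.31, which is an average across their studies rather than a universal constant:

```python
# A sketch of the Nielsen-Landauer model: the expected proportion of
# usability problems found by n evaluators is 1 - (1 - lam)^n.
# lam = 0.31 is their reported average, not a universal constant.
def proportion_found(n: int, lam: float = 0.31) -> float:
    return 1 - (1 - lam) ** n

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users: {proportion_found(n):.0%} of problems found")
# Five users already find roughly 85% under this model, which is the
# basis of the 5-12 rule of thumb.
```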
Ethical Issues
Explain the purpose of the evaluation to participants, including how their data will be used and stored
Get their consent, preferably in writing. Get parental consent for kids
Anonymise data:
As stored – use anonymous IDs, apart from in one name-to-ID mapping table (see the sketch below)
As reported – in text and also in images
Do not include quotes that reveal identity
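As a hedged illustration, the sketch below assigns anonymous IDs and keeps the only name-to-ID mapping in a single separate table (a dict standing in for a file kept under restricted access); the details are assumptions made for the example:

```python
# A minimal sketch of anonymisation: participants get anonymous IDs,
# and the only name-to-ID mapping lives in one separate table.
import itertools

_counter = itertools.count(1)
name_to_id: dict[str, str] = {}   # the single mapping table, kept apart

def anonymous_id(name: str) -> str:
    if name not in name_to_id:
        name_to_id[name] = f"P{next(_counter):02d}"
    return name_to_id[name]

# Data is stored and reported against IDs only, never names
observation = {"participant": anonymous_id("Ada Smith"),
               "task": "copy a single page",
               "time_seconds": 42}
print(observation)   # {'participant': 'P01', ...}
```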
Gain approval from your ethics committee and/or professional body
Example consent form
“I state that I am {specific requirements} and wish to participate in a study being conducted by {name/s of researchers/evaluators} at the {organisation name}.
The purpose of the study is to {general study aims}. The procedures involve {generally what will happen}. I understand that I will be asked to {specific tasks being given}. I understand that all information collected in the study is confidential, and that my name will not be identified at any time. I understand that I am free to withdraw from participation at any time without penalty”
Signature of participant and date
Good practice
Inform users that it is the system under test, not them
Put users at ease
Do not criticise their performance/opinions
Ideally, you should reward or pay participants
It may be polite and a good motivator to make results available to participants
Which method to choose?
Design or implementation?
Laboratory or field studies?
Subjective or objective?
Qualitative or quantitative?
Performance or satisfaction?
Level of information provided?
Immediacy of response?
Intrusiveness?
Resources and cost?