Evaluating Interfaces
Goals of evaluation
Laboratory versus field-based evaluation
Evaluation methods:
Design-oriented
Implementation-oriented
The goals of evaluation
To ensure that the interface behaves as we expect and meets user needs
Assess the extent of its functionality
Assess its impact on the user
Identify specific problems
Assess the ‘usability’ of the interface
Laboratory studies versus field studies
Laboratory studies
The user comes to the evaluator
A well-equipped laboratory may contain sophisticated recording facilities, two-way mirrors, instrumented computers, etc.
Can control or deliberately manipulate the context of use
The only option for some dangerous or extreme interfaces
But cannot reproduce the natural working context of a user’s environment, especially social interaction and contingencies
Difficult to evaluate long-term use
Field studies
The evaluator goes to the user
Captures the actual context
Captures real working practice and social interaction
Not possible for some applications
It can also be difficult to capture data
Cannot ‘prove’ specific hypotheses
Different kinds of evaluation are appropriate at different stages of design
[Diagram: an iterative design lifecycle – requirements, design, test, implement, maintain – situated in the context of the user, the task, computers and the environment]
Early on: formative evaluation of the design – may only involve designers and other experts
Later on: evaluation of the implementation – detailed, rigorous and with end users
Evaluation Methods
Design-oriented evaluation methods:
Cognitive walkthrough
Heuristic/expert inspections
Theory and literature review
Implementation-oriented methods:
Observation
Controlled experiments
Query techniques – interviews and surveys
Cognitive Walkthrough
A predictive technique in which designers and possibly experts simulate the user’s problem-solving process at each step of the human-computer dialogue
Originated in ‘code walkthrough’ from software engineering
Used mainly to consider ‘ease of learning’ issues – especially how users might learn by exploring the interface
Cognitive Walkthrough – The Stages
Begins with:
A detailed description of the prototype (e.g., menu layouts)
A description of typical tasks the user will perform
A written list of the actions required to complete the tasks with the prototype
An indication of who the users are and what kind of experience and knowledge they may have
For each task, evaluators step through the necessary action sequences, imagining that they are a new user and asking the following questions:
Will the user know what to do next?
Can the user see how to do it?
Will they know that they have done the right thing?
It is vital to document the walkthrough (one possible structure is sketched below):
Who did what and when
Problems that arose and severity ratings
Possible solutions
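As a hedged illustration, here is one way such documentation might be structured in code; the field names and the severity scale are assumptions made for this sketch, not a standard schema:

```python
# A minimal sketch of a walkthrough record; fields are illustrative.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class WalkthroughFinding:
    step: str                  # the action being evaluated
    question: str              # which of the three questions failed
    problem: str               # description of the problem that arose
    severity: int              # e.g., 1 (cosmetic) to 4 (catastrophic)
    possible_solution: str = ""

@dataclass
class WalkthroughRecord:
    task: str
    evaluators: list[str]      # who did the walkthrough
    session_date: date         # when it was done
    findings: list[WalkthroughFinding] = field(default_factory=list)

record = WalkthroughRecord(
    task="Copy a single page",
    evaluators=["A. Designer", "B. Expert"],
    session_date=date(2024, 1, 15),
)
record.findings.append(WalkthroughFinding(
    step="Turn on the power",
    question="Will the user know what to do next?",
    problem="No 'power on' indicator; users may assume the machine is on",
    severity=3,
    possible_solution="Add a visible power indicator light",
))
```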
A short fragment of cognitive walkthrough
Evaluating the interface to a personal desktop photocopier
A design sketch shows a numeric keypad, a "Copy" button, and a push button on the back to turn on the power
The specification says the machine automatically turns itself off after 5 minutes of inactivity
The task is to copy a single page, and the user could be any office worker
The actions the user needs to perform are to turn on the power, put the original on the machine, and press the "Copy" button
Now tell a believable story about the user's motivation and interaction at each action …
From Philip Craiger's page at http://istsvr03.unomaha.edu/gui/cognitiv.htm
The user wants to make a copy and knows that the machine has to be turned on. So they push the power button. Then they go on to the next action. But this story isn't very believable.
We can agree that the user's general knowledge of office machines will make them think the machine needs to be turned on, just as they will know it should be plugged in. But why shouldn't they assume that the machine is already on? The interface description didn't specify a "power on" indicator. And the user's background knowledge is likely to suggest that the machine is normally on, like it is in most offices.
Even if the user figures out that the machine is off, can they find the power switch? It's on the back, and if the machine is on the user's desk, they can't see it without getting up. The switch doesn't have any label, and it's not the kind of switch that usually turns on office equipment (a rocker switch is more common). The conclusion of this single-action story leaves something to be desired as well. Once the button is pushed, how does the user know the machine is on? Does a fan start up that they can hear? If nothing happens, they may decide this isn't the power switch and look for one somewhere else.
Heuristic/Expert Inspections
Experts assess the usability of an interface guided by usability principles and guidelines (heuristics)
Jakob Nielsen suggests that 5 experts may be enough to uncover 75% of usability problems
Best suited to early design and when there is some kind of representation of the system – e.g., storyboard
It’s only as good as the experts – you need expertise in both the problem domain and usability
The Process of Heuristic/Expert Inspections
Briefing session
Experts are all given an identical description of the product, its context of use and the goals of the evaluation
Evaluation period
Each expert spends several hours independently critiquing the interface
At least two passes through the interface: one for overall appreciation and others for detailed assessment
Debriefing session
Experts meet to compare findings, prioritise problems and propose solutions (see the sketch below)
They report/present their findings to decision makers and other stakeholders
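To make the debriefing concrete, here is a hedged sketch of how findings from several experts might be pooled and prioritised; the data values and the mean-severity ranking rule are illustrative assumptions, not part of any standard method:

```python
# A minimal sketch: pool each expert's findings and rank problems by
# mean severity, breaking ties by how many experts reported them.
from collections import defaultdict
from statistics import mean

# (problem, severity) pairs as reported independently by each expert
expert_findings = {
    "Expert A": [("No power indicator", 3), ("Unlabelled switch", 2)],
    "Expert B": [("No power indicator", 4), ("Auto-off too aggressive", 2)],
    "Expert C": [("Unlabelled switch", 3)],
}

ratings = defaultdict(list)
for findings in expert_findings.values():
    for problem, severity in findings:
        ratings[problem].append(severity)

prioritised = sorted(
    ratings.items(),
    key=lambda item: (mean(item[1]), len(item[1])),
    reverse=True,
)
for problem, scores in prioritised:
    print(f"{problem}: mean severity {mean(scores):.1f} "
          f"(reported by {len(scores)} expert(s))")
```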
Theory and literature review
We have seen before that you can apply existing theory to evaluate a design:
The Keystroke-Level Model
Fitts' law (see the sketch below)
HCI and experimental psychology already contain a wealth of knowledge about how people interact with computers
Scour the literature (ACM Digital Library, Google, CiteSeer and others)
But think carefully about whether the results transfer
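For instance, Fitts' law predicts the time to move to a target from its distance and width. The sketch below uses the common Shannon formulation; the constants a and b are illustrative assumptions that would normally be fitted empirically for a given device and user population:

```python
# A minimal sketch of Fitts' law (Shannon formulation):
#   MT = a + b * log2(D / W + 1)
# a and b below are invented for illustration; fit them from data.
from math import log2

def movement_time(distance: float, width: float,
                  a: float = 0.1, b: float = 0.15) -> float:
    """Predicted time (seconds) to hit a target of the given width
    at the given distance, both in the same units."""
    index_of_difficulty = log2(distance / width + 1)
    return a + b * index_of_difficulty

# Compare a small far-away target with a large nearby one
print(movement_time(distance=800, width=20))   # harder target
print(movement_time(distance=100, width=80))   # easier target
```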
Observation
Observe users interacting with the interface – in the laboratory or field
Record interactions using:
Pen and paper
Audio
Video
Computer logging (a minimal logging sketch follows this slide)
User notebooks and diaries
Think-aloud techniques
Analysis
Illustrative fragments
Detailed transcription and coding
Post-task walkthroughs
Specialised analysis software can replay video alongside system data and help the analyst synchronise notes and data
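As a hedged illustration of computer logging, the sketch below timestamps user-interface events relative to the session start so they can later be synchronised with video; the event names and CSV format are assumptions made for the example:

```python
# A minimal sketch of timestamped interaction logging.
import csv
import time

class InteractionLogger:
    def __init__(self, path: str):
        self.path = path
        self.t0 = time.monotonic()  # session start, for video sync
        self.rows: list[tuple[float, str, str]] = []

    def log(self, event: str, detail: str = "") -> None:
        self.rows.append((time.monotonic() - self.t0, event, detail))

    def save(self) -> None:
        with open(self.path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["seconds_from_start", "event", "detail"])
            writer.writerows(self.rows)

logger = InteractionLogger("session01.csv")
logger.log("button_press", "Copy")
logger.log("error_dialog", "Paper jam")
logger.save()
```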
Savannah
An educational game for six players at a time
A virtual savannah is overlaid on an empty school playing field
Studying Savannah
Six trials over three days
Two video recordings from the field
Game replay interface
Impala Sequence
The Impala Sequence Revealed
Elsa suddenly stops
Circular formation
Counting aloud
Nala and Elsa cannot see the impala
Replay shows them stopped on edge of locale
GPS drift carries them over the boundary
The boy who passed through attacked first
Controlled Experiments
Query Techniques
Elicit the user’s view of the system
Can address large numbers of users
Key techniques are:
Interviews
Surveys
Relatively simple and cheap to administer
But not so detailed or good for exploring alternative designs
DECIDE: A Framework to Guide Evaluation (Preece et al., 2001)
Determine the overall goals that the evaluation addresses
Explore the specific questions to be answered
Choose the evaluation approach and specific techniques to answer these questions
Identify the practical issues that must be addressed
Decide how to deal with ethical issues
Evaluate, interpret and present the data
Evaluation through questionnaires
A fixed set of written questions, usually with written answers
Advantages:
Gives the user’s point of view – good for evaluating satisfaction
Quick and cost-effective to administer and score, so can deal with large numbers of users
The user doesn’t have to be present
Disadvantages:
Only tells you how the user perceives the system
Not good for some kinds of information:
Things that are hard to remember (e.g., times and frequencies)
Things that involve status or are sensitive to disclose
Usually not very detailed
May suffer from bias
Questions
Three types of questions:
Factual – ask about observable information
Opinion – what the user thinks about something (outward facing)
Attitude – how the user feels about something (inward facing), e.g., whether they feel efficient, like the system or feel in control
Two general styles of question:
Closed – the user chooses from among a set number of options – quick to complete and easy to summarise with statistics
Open – the user gives free-form answers – captures more information but is slower to complete and harder to summarise statistically (may require coding)
Most questionnaires mix open and closed questions
Options for closed questions
The number of options should match the number of possible responses
Likert scales capture strength of opinion:
Use an odd number of options on a preference scale when there is the possibility of a neutral response
The granularity of the scale depends upon respondents’ expertise
Example scale: strongly agree | agree | neutral | disagree | strongly disagree
Questionnaire Analysis
Closed questions:
Subject to statistics
Graphical representations (bar charts, pie charts)
Averages and measures of spread
Always look at the raw data
You can only make statistical inferences from carefully designed questionnaires
Open questions:
Give a general sense of feedback
May be coded and then statistically analysed (see the sketch below)
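As a hedged illustration, the sketch below codes Likert responses numerically and computes an average and a measure of spread for one closed question, then prints the raw distribution; the responses are invented for the example:

```python
# A minimal sketch of closed-question analysis: code Likert responses
# as numbers, then compute an average and a measure of spread.
from statistics import mean, stdev
from collections import Counter

SCALE = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
         "agree": 4, "strongly agree": 5}

responses = ["agree", "strongly agree", "neutral", "agree",
             "disagree", "agree", "strongly agree"]

scores = [SCALE[r] for r in responses]
print(f"mean = {mean(scores):.2f}, sd = {stdev(scores):.2f}")

# Always look at the raw data, not just the summary statistics
for option, count in Counter(responses).most_common():
    print(f"{option:>17}: {'#' * count}")
```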
Deploying questionnaires
Post
Interview
Email
As part of the interface
Web – brings important advantages:
Ease of deployment
Reliable data collection
Semi-automated analysis
Designing questionnaires
What makes a good or bad questionnaire?
Reliability – the ability to give the same results when filled out by like-minded people in similar circumstances
Validity – the degree to which the questionnaire is actually measuring or collecting data about what you think it should
Clarity, length and difficulty
Designing a good questionnaire is surprisingly difficult – pilot, pilot, pilot!!
Statistically valid questionnaire design is a very specialised skill – use an existing one
What to ask
Background questions on the users:
Name, age, gender
Experience with computers in general and this kind of interface in particular
Job responsibilities and other relevant information
Availability for further contact, such as an interview
Interface-specific questions
System Usability Scale (SUS)
1. I think I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system
Calculating a rating from SUS
For odd-numbered questions, score = scale position - 1
For even-numbered questions, score = 5 - scale position
Multiply all scores by 2.5 (so each question contributes up to 10 points)
Final score for an individual = sum of multiplied scores for all questions (out of 100)
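A minimal sketch of this calculation, assuming responses are recorded as scale positions 1–5 in question order:

```python
# SUS scoring. `responses` holds the scale position
# (1 = strongly disagree ... 5 = strongly agree) for questions 1-10.
def sus_score(responses: list[int]) -> float:
    assert len(responses) == 10
    total = 0
    for i, position in enumerate(responses, start=1):
        if i % 2 == 1:                 # odd-numbered question
            total += position - 1
        else:                          # even-numbered question
            total += 5 - position
    return total * 2.5                 # final score out of 100

# Example: a fairly positive respondent (values invented for illustration)
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```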
Evaluation during active use
System refinement based on experience or in response to changes in users:
Interviews and focus group discussions
Continuous user-performance data logging:
Frequent and infrequent error messages
Analyse sequences of actions to suggest improvements or new actions
BUT respect people’s rights and consult them first!
User feedback mechanisms:
On-line forms, email and bulletin boards
Workshops and conferences
Choosing participants
Usually we generalise findings based on a sample of participants
Need to select carefully to avoid sampling bias
Watch out for self-selection and selection based on convenience
Random sampling – every member of the target population has an equal chance to be selected
But how do you get access? Often you need to advertise
Describe your sample in your write-up
How many users? 5–12 as a rough rule of thumb
Nielsen and Landauer (1993); www.useit.com/alertbox/20000319.html (see the sketch below)
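Nielsen and Landauer model the proportion of usability problems found by n evaluators as 1 - (1 - λ)^n. The sketch below uses the commonly cited average λ ≈ 0.31, which is an average across their studies rather than a universal constant:

```python
# A sketch of the Nielsen-Landauer model: the expected proportion of
# usability problems found by n evaluators is 1 - (1 - lam)^n.
# lam = 0.31 is their reported average, not a universal constant.
def proportion_found(n: int, lam: float = 0.31) -> float:
    return 1 - (1 - lam) ** n

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users: {proportion_found(n):.0%} of problems found")
# Five users already find roughly 85% under this model, which is the
# basis of the 5-12 rule of thumb.
```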
Ethical Issues
Explain the purpose of the evaluation to participants, including how their data will be used and stored
Get their consent, preferably in writing. Get parental consent for kids
Anonymise data:
As stored – use anonymous IDs, apart from in one name-to-ID mapping table (see the sketch below)
As reported – in text and also in images
Do not include quotes that reveal identity
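As a hedged illustration, the sketch below assigns anonymous IDs and keeps the only name-to-ID mapping in a single separate table (a dict standing in for a file kept under restricted access); the details are assumptions made for the example:

```python
# A minimal sketch of anonymisation: participants get anonymous IDs,
# and the only name-to-ID mapping lives in one separate table.
import itertools

_counter = itertools.count(1)
name_to_id: dict[str, str] = {}   # the single mapping table, kept apart

def anonymous_id(name: str) -> str:
    if name not in name_to_id:
        name_to_id[name] = f"P{next(_counter):02d}"
    return name_to_id[name]

# Data is stored and reported against IDs only, never names
observation = {"participant": anonymous_id("Ada Smith"),
               "task": "copy a single page",
               "time_seconds": 42}
print(observation)   # {'participant': 'P01', ...}
```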
Gain approval from your ethics committee and/or professional body
Example consent form
“I state that I am {specific requirements} and wish to participate in a study being conducted by {name/s of researchers/evaluators} at the {organisation name}.
The purpose of the study is to {general study aims}. The procedures involve {generally what will happen}. I understand that I will be asked to {specific tasks being given}. I understand that all information collected in the study is confidential, and that my name will not be identified at any time. I understand that I am free to withdraw from participation at any time without penalty”
Signature of participant and date
Good practice
Inform users that it is the system under test, not them
Put users at ease
Do not criticise their performance/opinions
Ideally, you should reward or pay participants
It may be polite and a good motivator to make results available to participants
Which method to choose?
Design or implementation?
Laboratory or field studies?
Subjective or objective?
Qualitative or quantitative?
Performance or satisfaction?
Level of information provided?
Immediacy of response?
Intrusiveness?
Resources and cost?