Usability evaluation methods (part 2) and performance metrics
CN5111 – Week 4: Usability evaluation methods (part 2) and performance metrics
Dr. Andres Baravalle
Lecture content
• Usability testing (review & scenarios)
• Usability inspection
• Usability inquiry
• Performance metrics
2
Usability testing (review and scenarios)
3
Usability testing
• When: common for comparison of products or prototypes
• Tasks & questions focus on how well users perform tasks with the product
– Focus is on time to complete the task and the number & type of errors
• Data collected by video & interaction logging
• Experiments are central in usability testing
– Usability inquiry tends to use questionnaires & interviews
4
Testing conditions
• Usability lab or other controlled space
• Emphasis on:
– Selecting representative users
– Developing representative tasks
• Small sample (5-10 users) typically selected
• Tasks usually last no longer than 30 minutes
• The test conditions should be the same for every participant
5
Some types of data
• Time to complete a task
• Time to complete a task after a specified time away from the product
• Number and type of errors per task
• Number of errors per unit of time
• Number of navigations to online help or manuals
• Number of users making a particular error
• Number of users completing the task successfully
6
How many participants is enough for user testing?
• The number is a practical issue
• Depends on:
– Schedule for testing
– Availability of participants
– Cost of running tests
• Typically 5-10 participants
– Some experts argue that testing should continue with additional users until no new insights are gained
7
Examples
• The next slides describe 2 experiments: the one behind the book Prioritizing Web Usability and a fictional one on OpenSMSDroid
• Both use Thinking Aloud and video/screen recording for data collection
8
Prioritizing Web Usability
• Prioritizing Web Usability (Nielsen and Loranger, 2006) used the Thinking Aloud method to collect insight on user behaviour:
– 69 users, all with at least one year of experience in using the web
– Broad range of job backgrounds and web experience, but no one working in IT or marketing
– 25 web sites tested with specific tasks
– Windows desktops with 1024x768 resolution running Internet Explorer
– Recordings of monitor and upper body for each session
– Broadband speed between 1 and 3 Mbps
9
Prioritizing Web Usability (2)
• The tasks that the users were asked to perform included:
– Go to ups.com and find how much it costs to send a postcard to China
– You want to visit the Getty Museum this weekend. Go to getty.edu and find opening times/prices
– Go to nestle.com and find a snack to eat during workouts
– Go to bankone.com and find the best savings account if you have a $1,000 balance
10
Prioritizing Web Usability (3)
• The result of the research is presented as a book:
– Organising the findings in categories (including searching, navigation, typography and writing style)
– Using plenty of examples and screenshots to demonstrate the usability issues that were identified
11
Prioritizing Web Usability: findings
• People succeed 66% of the time when working on “single site” activities and 60% of the time when having to browse through the internet for information
12
Prioritizing Web Usability: findings (2)
• Experienced users spend about 25 seconds on a homepage and 45 on an interior page (35 and 60 for inexperienced users)
• Only 23% of users scroll on their first visit to a homepage
– The number decreases after the first visit
– The average scroll for a first visit is 0.8 of a screen
13
Prioritizing Web Usability: findings (3)
• 88% of users go to search engines to find information
• Font face and size: different font faces for print and screen
– Different font size depending on target audience
• More in the book…
14
OpenSmsDroid evaluation
• You have been tasked to evaluate the usability of a new (fictional) Android application to write short text messages, OpenSMSDroid
• You have decided to set up an experiment
– The next experiment is (loosely) adapted from “Experimental Evaluation of Techniques for Usability Testing of Mobile Systems in a Laboratory Setting” (Beck, Christiansen, Kjeldskov, Kolbe and Stage, 2003)
15
OpenSmsDroid evaluation
• Your test users will perform a set of tasks in specific configurations using the thinking aloud method for data collection
– A limit of 5 minutes has been set for each of the tasks
– The usability researcher will record the session and take notes
16
OpenSmsDroid evaluation: testing configurations
• Configurations for the test (tentative list):
– Sitting on a chair at a table
– Walking on a treadmill at constant speed
– Walking on a treadmill at varying speed
– Walking on an 8-shaped course that is changing as obstructions are being moved, within 2 meters of a person who walks at constant speed
– Walking on an 8-shaped course that is changing as obstructions are being moved, within 2 meters of a person who walks at varying speed
– Walking in Westfield Stratford at 16:00 on Saturday
17
OpenSmsDroid evaluation: testing configurations (2)
• For practical reasons and after reviewing the literature, these settings have been selected for this evaluation:
– Sitting on a chair at a table
– Walking on a treadmill at constant speed
– Walking in Westfield Stratford at 16:00 on Saturday
18
OpenSmsDroid evaluation: tasks
• Writing a new SMS containing the phrase “The quick brown fox jumps over the lazy dog” repeated 2 times to an existing contact (without using predictive text features)
• Writing a new SMS containing the phrase “The quick brown fox jumps over the lazy dog” repeated 2 times to an existing contact (using predictive text features)
• Taking a picture and sending it to an existing contact
• Taking a short 1 minute video and sending it to an existing contact
19
OpenSmsDroid evaluation: tasks (2)
• In each test, you can collect:
– Quantitative data: time needed to perform the task, and whether the task has been completed
– Qualitative data: asking the user to think aloud while interacting with the device and recording the interaction
20
OpenSmsDroid evaluation: data analysis
• The evaluation will analyse the data collected and report on any findings, highlighting any difference in performance and suggesting possible changes to the interface
– An experiment can also generate further hypotheses, which will be used in further experiments
21
OpenSmsDroid experiment: what's missing?
• Something to compare to!
– Otherwise you cannot know if the interface is better or not
22
Usability inspections
23
Usability inspection methods
• Heuristic evaluation and walkthroughs are the most common usability inspection methods
– We'll also see several other methods
24
Usability inspections and heuristics (2)
• Usability inspection methods are based on having evaluators inspect a user interface
• Usability inspection methods aim to examine usability-related aspects of a user interface, even if the interface has not yet been developed
– Can be used to perform usability evaluations in the initial stages of development
25
Heuristic evaluations
• Heuristic evaluation is a method that requires usability specialists to judge whether each element of a user interface follows established usability principles and guidelines
– E.g. Jakob Nielsen’s heuristics
• Heuristics are being developed for mobile devices, wearables, virtual worlds, etc.
26
27
Nielsen’s heuristics: discount evaluations
• A heuristic evaluation is referred to as a discount evaluation when 5 evaluators are used
– Empirical evidence suggests that on average 5 evaluators identify 75-80% of usability problems on generalist web sites
28
Heuristic evaluations: stages
• Briefing session to tell experts what to do
• Evaluation period of 1-2 hours in which:
– Each expert works separately
– Take one pass to get a feel for the product
– Take a second pass to focus on specific features
• Debriefing session in which experts work together to prioritize and categorise the problems
Relevant standards and guidelines
• Relevant standards and guidelines include:
– W3C HTML and CSS standards
– W3C WAI guidelines
– Mobile Web Best Practices guidelines
– ISO 9241
29
Heuristic evaluations: advantages and problems
• Few ethical & practical issues to consider because users are not involved
– Can be difficult & expensive to find experts
– Experts should have knowledge of the application domain & of the evaluation method used
• Critical points:
– Important problems may get missed
– Focus can be lost on trivial problems
– Experts have biases
30
Cognitive walkthroughs
• Focus on ease of learning
• Designer presents an aspect of the design & usage scenarios
• Expert is told the assumptions about user population, context of use, task details
• One or more experts walk through the design prototype with the scenario
31
Cognitive walkthroughs (2)
• Experts are guided by 3 questions:
– Will the correct action be sufficiently evident to the user?
– Will the user notice that the correct action is available?
– Will the user associate and interpret the response from the action correctly?
• As the experts work through the scenario they note problems
32
Pluralistic walkthrough
• Variation on the cognitive walkthrough theme
– Performed by a team
• The panel of experts begins by working separately
• Then there is a managed discussion that leads to agreed decisions
• The approach lends itself well to participatory design
33
Feature inspection
• Feature inspection is a technique that focuses on the features of a product or of a web site
– A group of inspectors is given some use cases and asked to analyse each feature of the web site with regard to availability, understandability, and other aspects of usability
– This technique works best in the middle stages of development, when features are known but the artefact cannot yet be evaluated with methods such as lab experiments
34
Standards inspection
• Standards inspection is a technique used to ensure the compliance of a web site with some standard
• A usability professional with extensive knowledge of the relevant standards inspects a web site for compliance
• Different standards inspections can be run on the same artefact
– Nielsen’s heuristics include standards inspection
35
Usability inquiry
36
Truckers and mobile devices
[...] a major mobile device company was trying to understand why there were so many data entry errors on a mobile device for long-haul truck drivers. Many people in the company tended to blame the truckers, whom they assumed were uneducated. None of them had ever actually met a trucker, but they figured it couldn’t be too hard to type in a word or two. One winter, a senior user interface (UI) designer decided to see for himself. The designer spent a week at a truck stop watching truck drivers use the device and talking to them about it. He quickly discovered that the truckers could spell perfectly. Instead, the problem was the device. The truckers tended to be big men, with big fingers. To make matters worse, they often wore bulky gloves in the winter. The device had tiny buttons, making typing with big fingers in warm gloves frustrating. (Observing the User's Experience; Goodman, Kuniavsky and Moed 2012)
37
Truckers and mobile devices (2)
• The team realized it had been basing important design decisions on faulty assumptions
• The team redesigned the UI so that it required less typing and increased the size of buttons
– The error rates dropped dramatically
38
Truckers and mobile devices (3)
• That's an example of how usability inquiry methods work
39
Usability inquiry
• Usability inquiry methods focus (to different degrees) on analysing an artefact either from “the native point of view” or looking for “the native point of view”
– Used to obtain information about users' likes, dislikes, needs, and understanding of the system
40
Usability inquiry (2)
• They may use one or more of these techniques:
– Talking to users
– Observing users using a system in a real working situation
– Letting the users answer questions (verbally or in written form)
41
Data collection & analysis
• Data collection:
– Observation & interviews (e.g. contextual inquiry)
– Notes, pictures, recordings, diaries
– Video
– Logging
• Analysis:
– Categorizing the findings
– Existing categories can be provided by pre-existing research
42
Usability inquiry methods
• The next slides will cover some popular usability inquiry methods:
– Diary
– Contextual inquiry
– Interviews and focus groups
– Surveys
43
Diary method
• The diary method requires users to keep a diary of their interactions
• Diaries can be free form or structured
– The diary method is best used when the researcher does not have the time, the resources or the possibility to use user monitoring methods, or when the level of detail provided by user monitoring methods is not needed
44
Contextual inquiry
• Contextual inquiry is a structured field interviewing method which typically evaluates:
– User opinions
– User experience
– Motivation
– Context
• It is a study based on dialogue and interaction between interviewer and user, and it is one of the best methods to use when researchers need to understand the users' work context
45
Interviews and focus groups
• Interviews and focus groups are research methods based on interaction between researchers and users
– The researcher facilitates the discussion about the issues raised by the questions
– In focus groups (multiple users present), the interaction among the users may raise additional issues, or identify common problems that many people experience
46
Surveys
• Surveys are a quantitative research method, where a set list of questions is asked and the users' responses are recorded
– When the questions are administered by a researcher, the survey is called a structured interview
– When the questions are administered by the respondent, the survey is referred to as a questionnaire
47
And now…
• You have had an overview of a wide selection of usability evaluation methods
– And you are ready to use them in your assignment
48
Measuring the User Experience
• The next slides are based on the core textbook for this module, "Measuring the User Experience"
49
Performance metrics
50
Types of performance metrics
• Task success
• Level of success
• Errors
• Efficiency
• Learnability
51
Task success
• Binary success
• Level of success
52
Task success
• It measures how effectively users are able to complete a given set of tasks
• To measure task success, each task that users are asked to perform must have a clear end state or goal
53
Task success (2)
• You can:
– Ask users to articulate the answer verbally
– Ask users to provide an answer in a structured way (e.g. using an online tool or paper)
– Use proxy measures (e.g. when the correct solution is not easily identifiable, as it may depend on individual circumstances)
54
Task failure
• There are many different ways in which a participant might fail
– Giving up: participants indicate that they would not continue with the task if they were doing this on their own
– Moderator “calls” it because the participant is not making any progress
– Too long: the participant completed the task but not within a predefined time
– Wrong: participants thought that they completed the task successfully, but they actually did not
55
Binary success
• Binary success is the simplest and most common way of measuring task success
– Each time users perform a task, they should be given a “success” or “failure” score (1 or 0)
– Users either completed a task successfully or they didn’t
– That's the case, for example, for e-commerce
– Sometimes, you should measure perceived success rather than factual success (why?)
56
Binary success: analysing and presenting data
• The most common way to analyse and present binary success rates is by task
• This involves simply presenting the percentage of participants who completed each task successfully
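A minimal Python sketch of the per-task calculation described above. The participant IDs, task names and scores are hypothetical, and `success_rate_by_task` is an illustrative helper, not something taken from the core textbook.

```python
from collections import defaultdict


def success_rate_by_task(results):
    """Return {task: fraction of participants who succeeded}.

    `results` is a list of (participant, task, score) tuples,
    where score is 1 for success and 0 for failure.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for _participant, task, score in results:
        totals[task] += 1
        successes[task] += score
    return {task: successes[task] / totals[task] for task in totals}


# Hypothetical data: 4 participants, 2 tasks.
results = [
    ("P1", "Task 1", 1), ("P2", "Task 1", 1),
    ("P3", "Task 1", 0), ("P4", "Task 1", 1),
    ("P1", "Task 2", 0), ("P2", "Task 2", 1),
    ("P3", "Task 2", 0), ("P4", "Task 2", 0),
]
rates = success_rate_by_task(results)
for task, rate in rates.items():
    print(f"{task}: {rate:.0%}")
```

With only 5-10 participants, each percentage rests on very few observations, so it is worth reporting the raw counts alongside it.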
57
Binary success: analysing and presenting data by task
58
Binary success: presenting data by user
• Frequency of use (infrequent users versus frequent users)
• Previous experience using the product
• Domain expertise
• Age group
59
Levels of success
• Identifying levels of success is useful when there are reasonable shades of grey associated with task success
60
Levels of success: complete, partial and failure
• Complete success
– With assistance
– Without assistance
• Partial success
– With assistance
– Without assistance
• Failure
– User thought it was complete, but it wasn’t
– User gave up
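One way to summarise these levels is to report the share of attempts that fall into each one. The sketch below assumes each attempt has already been coded by the evaluator with one of six labels mirroring the list above; the label strings and the sample data are hypothetical.

```python
from collections import Counter

# Labels mirror the levels listed above; the exact strings are an assumption.
LEVELS = [
    "complete, without assistance",
    "complete, with assistance",
    "partial, without assistance",
    "partial, with assistance",
    "failure, wrongly thought complete",
    "failure, gave up",
]


def level_distribution(observations):
    """Return {level: share of attempts coded with that level}."""
    if not observations:
        raise ValueError("no observations to summarise")
    counts = Counter(observations)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    total = len(observations)
    return {level: counts[level] / total for level in LEVELS}


# Hypothetical coding of 10 attempts at one task.
observations = (
    ["complete, without assistance"] * 4
    + ["complete, with assistance"] * 2
    + ["partial, without assistance"] * 2
    + ["failure, wrongly thought complete"]
    + ["failure, gave up"]
)
distribution = level_distribution(observations)
for level, share in distribution.items():
    print(f"{level}: {share:.0%}")
```

Levels with no observations are kept with a share of 0, so every task can be presented against the same set of categories.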
61
Levels of success: analysing and presenting data
62
Time on task
• In most situations, the faster a user can complete a task, the better the experience
– Time on task is particularly important for products where tasks are performed repeatedly by the user
63
Time on task: analysing and presenting data
• The most common way is to look at the average amount of time spent on any particular task
– You should always report a confidence interval to show the variability in the time data
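A rough Python sketch of a mean with a confidence interval. It uses a normal approximation from the standard library; this is a simplifying assumption, since with the 5-10 participants typical of usability tests an interval based on the t-distribution would be somewhat wider. The times are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(times, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation.

    Assumption: the normal critical value is used for simplicity;
    for small samples a t-based interval would be wider.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(times)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(times) / sqrt(n)
    return centre, centre - half_width, centre + half_width


# Hypothetical times on task, in seconds, for six participants.
times = [34.0, 41.5, 29.0, 52.0, 38.5, 45.0]
centre, lower, upper = mean_with_ci(times)
print(f"mean {centre:.1f} s, 95% CI [{lower:.1f}, {upper:.1f}] s")
```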
64
Time on task: range and threshold
• A variation is to create ranges and report the frequency of users who fall into each interval
• Another useful way to analyse task time data is by using a threshold
– In many situations, the only thing that matters is whether users can complete certain tasks within an acceptable amount of time
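Both variations can be sketched in a few lines of Python. The 10-second bin width and the 45-second threshold below are arbitrary illustrative choices, and the times are hypothetical.

```python
from collections import Counter


def frequency_by_range(times, bin_width):
    """Count how many times fall into each [start, start + bin_width) range.

    Keys of the returned dict are the start of each range, in seconds.
    """
    counts = Counter(int(t // bin_width) * bin_width for t in times)
    return dict(sorted(counts.items()))


def share_within_threshold(times, threshold):
    """Return the fraction of users who finished within `threshold` seconds."""
    return sum(1 for t in times if t <= threshold) / len(times)


# Hypothetical times on task, in seconds.
times = [34.0, 41.5, 29.0, 52.0, 38.5, 45.0]
bins = frequency_by_range(times, bin_width=10)
share = share_within_threshold(times, threshold=45)

for start, count in bins.items():
    print(f"{start}-{start + 10} s: {count} user(s)")
print(f"within 45 s: {share:.0%}")
```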
65
Errors
• Errors reflect the mistakes made during a task. Errors can be useful in pointing out particularly confusing or misleading parts of an interface.
66
Errors (2)
• Examples of errors include:
– Entering incorrect data into a form field
– Making the wrong choice in a menu or drop-down list
– Taking an incorrect sequence of actions
– Failing to take a key action
67
Errors: analysing and presenting data
• The most common approach is to look at average error rates per task or per participant
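As an illustration, the same list of observations can be averaged either per task or per participant. The record layout (`participant`, `task`, `errors`) and the data are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def mean_errors(observations, group_by):
    """Average the error counts, grouped by "task" or "participant"."""
    groups = defaultdict(list)
    for row in observations:
        groups[row[group_by]].append(row["errors"])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical error counts: 3 participants, 2 tasks.
observations = [
    {"participant": "P1", "task": "Task 1", "errors": 0},
    {"participant": "P2", "task": "Task 1", "errors": 2},
    {"participant": "P3", "task": "Task 1", "errors": 1},
    {"participant": "P1", "task": "Task 2", "errors": 3},
    {"participant": "P2", "task": "Task 2", "errors": 4},
    {"participant": "P3", "task": "Task 2", "errors": 2},
]
per_task = mean_errors(observations, "task")
per_participant = mean_errors(observations, "participant")
print(per_task)
print(per_participant)
```

A task with a noticeably higher average than the others is a candidate for closer inspection of the interface at that point.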
68
Efficiency
• Efficiency can be assessed by examining the amount of effort a user expends to complete a task, such as the number of clicks in a website or the number of button presses on a mobile phone
69
Efficiency (2)
• Efficiency typically measures the number of actions or steps that users took in performing each task
– Identify the action(s) to be measured: for websites, it's typically mouse clicks or page views
– Define the start and end of an action
– Count the actions
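The three steps above can be sketched against an interaction log. The event names (`task_start`, `task_end`, `click`, `page_view`) are hypothetical; a real logging tool will use its own vocabulary.

```python
def count_actions(events, action="click", start="task_start", end="task_end"):
    """Count occurrences of `action` between the start and end markers.

    Events logged before the start marker or after the end marker are ignored.
    """
    counting = False
    count = 0
    for event in events:
        if event == start:
            counting = True
        elif event == end:
            break
        elif counting and event == action:
            count += 1
    return count


# Hypothetical interaction log for one participant and one task.
events = [
    "click",        # before the task starts: ignored
    "task_start",
    "click",
    "page_view",
    "click",
    "click",
    "task_end",
    "click",        # after the task ends: ignored
]
clicks = count_actions(events)
page_views = count_actions(events, action="page_view")
print(f"clicks: {clicks}, page views: {page_views}")
```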
70
Learnability
• Learnability is a way to measure how performance improves or fails to improve over time
71
Learnability (2)
• Learnability is normally measured using performance metrics: time on task, errors, number of steps, or task success per minute
72
Learnability: analysing and presenting data
• The most common way to analyse and present learnability data is by examining a specific performance metric (such as time on task, number of steps, or number of errors) by trial for each task or aggregated across all tasks
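A small sketch of the by-trial view for a single task, using time on task as the metric. The data are hypothetical, and the layout (one list of times per participant, one entry per trial) is an assumption.

```python
from statistics import mean


def mean_by_trial(times_by_participant):
    """Return the mean of the metric for each trial, in trial order.

    `times_by_participant` maps a participant to a list with one value
    per trial. If the lists differ in length, zip() stops at the shortest.
    """
    trials = zip(*times_by_participant.values())
    return [mean(trial) for trial in trials]


# Hypothetical time on task (seconds) over three trials of the same task.
times_by_participant = {
    "P1": [60.0, 45.0, 40.0],
    "P2": [80.0, 55.0, 50.0],
    "P3": [70.0, 50.0, 45.0],
}
trial_means = mean_by_trial(times_by_participant)
for number, value in enumerate(trial_means, start=1):
    print(f"trial {number}: {value:.1f} s")
```

Plotting these means against the trial number shows how quickly the curve flattens, i.e. how many trials users need before their performance stops improving.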
73
Learnability: analysing and presenting data (2)
74
References
• Beck, E., Christiansen, M., Kjeldskov, J., Kolbe, N. and Stage, J. (2003). ‘Experimental Evaluation of Techniques for Usability Testing of Mobile Systems in a Laboratory Setting’, OzCHI 2003.
• Nielsen, J. and Loranger, H. (2006). Prioritizing Web Usability.
75