Evaluation, cont’d


Page 1: Evaluation, cont’d

Evaluation, cont’d

Page 2: Evaluation, cont’d

Two main types of evaluation

Formative evaluation is done at different stages of development to check that the product meets users’ needs.

Summative evaluation assesses the quality of a finished product.

Our focus is on formative evaluation

Page 3: Evaluation, cont’d

What to evaluate

Iterative design & evaluation is a continuous process that examines:
– Early ideas for the conceptual model
– Early prototypes of the new system
– Later, more complete prototypes

Designers need to check that they understand users’ requirements.

Page 4: Evaluation, cont’d

Tog says …

“Iterative design, with its repeating cycle of design and testing, is the only validated methodology in existence that will consistently produce successful results. If you don’t have user-testing as an integral part of your design process you are going to throw buckets of money down the drain.”

Page 5: Evaluation, cont’d

When to evaluate

Throughout design: from the first descriptions, sketches, etc. of users’ needs through to the final product

Design proceeds through iterative cycles of ‘design-test-redesign’

Evaluation is a key ingredient for a successful design.

Page 6: Evaluation, cont’d

Another example - development of “HutchWorld”

Many informal meetings with patients, carers & medical staff early in design

Early prototype informally tested on site – designers learned a lot:

• language of designers & users was different
• asynchronous communication was also needed

Redesigned to produce the portal version

Page 7: Evaluation, cont’d

Usability testing

User tasks investigated:
- how users’ identity was represented
- communication
- information searching
- entertainment

User satisfaction questionnaire

Triangulation to get different perspectives

Page 8: Evaluation, cont’d

Findings from the usability test

• The back button didn’t always work

• Users didn’t pay attention to navigation buttons

• Users expected all objects in the 3-D view to be clickable.

• Users did not realize that there could be others in the 3-D world with whom to chat.

• Users tried to chat to the participant list.

Page 9: Evaluation, cont’d

Key points

Evaluation & design are closely integrated in user-centered design.

Some of the same techniques are used in evaluation & requirements, but they are used differently (e.g., interviews & questionnaires)

Triangulation involves using a combination of techniques to gain different perspectives

Dealing with constraints is an important skill for evaluators to develop.

Page 10: Evaluation, cont’d

A case in point …

“The Butterfly Ballot: Anatomy of disaster”. See http://www.asktog.com/columns/042ButterflyBallot.html

Page 11: Evaluation, cont’d

An evaluation framework

Page 12: Evaluation, cont’d

The aims

Explain key evaluation concepts & terms.
Describe the evaluation paradigms & techniques used in interaction design.
Discuss the conceptual, practical and ethical issues that must be considered when planning evaluations.

Introduce the DECIDE framework.

Page 13: Evaluation, cont’d

Evaluation paradigm

Any kind of evaluation is guided explicitly or implicitly by a set of beliefs, which are often underpinned by theory. These beliefs and the methods associated with them are known as an ‘evaluation paradigm’.

Page 14: Evaluation, cont’d

User studies

User studies involve looking at how people behave in their natural environments, or in the laboratory, both with old technologies and with new ones.

Page 15: Evaluation, cont’d

Four evaluation paradigms

‘quick and dirty’
usability testing
field studies
predictive evaluation

Page 16: Evaluation, cont’d

Quick and dirty

‘quick & dirty’ evaluation describes the common practice in which designers informally get feedback from users or consultants to confirm that their ideas are in line with users’ needs and are liked.

Quick & dirty evaluations are done any time. The emphasis is on fast input to the design process rather than carefully documented findings.

Page 17: Evaluation, cont’d

Usability testing

Usability testing involves recording typical users’ performance on typical tasks in controlled settings. Field observations may also be used.

As the users perform these tasks they are watched & recorded on video & their key presses are logged.

This data is used to calculate performance times, identify errors & help explain why the users did what they did.

User satisfaction questionnaires & interviews are used to elicit users’ opinions.

Page 18: Evaluation, cont’d

Field studies

Field studies are done in natural settings.

The aim is to understand what users do naturally and how technology impacts them.

In product design, field studies can be used to:
- identify opportunities for new technology
- determine design requirements
- decide how best to introduce new technology
- evaluate technology in use.

Page 19: Evaluation, cont’d

Predictive evaluation

Experts apply their knowledge of typical users, often guided by heuristics, to predict usability problems.

Another approach involves theoretically based models.

A key feature of predictive evaluation is that users need not be present

Relatively quick & inexpensive

Page 20: Evaluation, cont’d

Overview of techniques

observing users
asking users their opinions
asking experts their opinions
testing users’ performance
modeling users’ task performance

Page 21: Evaluation, cont’d

DECIDE: A framework to guide evaluation

Determine the goals the evaluation addresses.
Explore the specific questions to be answered.
Choose the evaluation paradigm and techniques to answer the questions.
Identify the practical issues.
Decide how to deal with the ethical issues.
Evaluate, interpret and present the data.

Page 22: Evaluation, cont’d

Determine the goals

What are the high-level goals of the evaluation?

Who wants it and why?

The goals influence the paradigm for the study

Some examples of goals:
- Identify the best metaphor on which to base the design.
- Check to ensure that the final interface is consistent.
- Investigate how technology affects working practices.
- Improve the usability of an existing product.

Page 23: Evaluation, cont’d

Explore the questions

All evaluations need goals & questions to guide them so time is not wasted on ill-defined studies.

For example, the goal of finding out why many customers prefer to purchase paper airline tickets rather than e-tickets can be broken down into sub-questions:
- What are customers’ attitudes to these new tickets?
- Are they concerned about security?
- Is the interface for obtaining them poor?

What questions might you ask about the design of a cell phone?

Page 24: Evaluation, cont’d

Choose the evaluation paradigm & techniques

The evaluation paradigm strongly influences the techniques used, and how data is analyzed and presented.

E.g. field studies do not involve testing or modeling

Page 25: Evaluation, cont’d

Identify practical issues

For example, how to:

• select users

• stay on budget

• stay on schedule

• find evaluators

• select equipment

Page 26: Evaluation, cont’d

Decide on ethical issues

Develop an informed consent form

Participants have a right to:
- know the goals of the study
- know what will happen to the findings
- privacy of personal information
- not be quoted without their agreement
- leave when they wish
- be treated politely

Page 27: Evaluation, cont’d

Evaluate, interpret & present data

How data is analyzed & presented depends on the paradigm and techniques used.

The following also need to be considered:
- Reliability: can the study be replicated?
- Validity: is it measuring what you thought?
- Biases: is the process creating biases?
- Scope: can the findings be generalized?
- Ecological validity: is the environment of the study influencing it? (e.g., the Hawthorne effect)

Page 28: Evaluation, cont’d

Pilot studies

A small trial run of the main study. The aim is to make sure your plan is viable.

Pilot studies check:
- that you can conduct the procedure
- that interview scripts, questionnaires, experiments, etc. work appropriately

It’s worth doing several to iron out problems before doing the main study.

Ask colleagues if you can’t spare real users.

Page 29: Evaluation, cont’d

Key points

An evaluation paradigm is an approach that is influenced by particular theories and philosophies.

Five categories of techniques were identified: observing users, asking users, asking experts, user testing, modeling users.

The DECIDE framework has six parts:
- Determine the overall goals
- Explore the questions that satisfy the goals
- Choose the paradigm and techniques
- Identify the practical issues
- Decide on the ethical issues
- Evaluate ways to analyze & present data

Do a pilot study

Page 30: Evaluation, cont’d

Observing users

Page 31: Evaluation, cont’d

The aims

Discuss the benefits & challenges of different types of observation.
Describe how to observe as an on-looker, a participant, & an ethnographer.
Discuss how to collect, analyze & present observational data.
Examine think-aloud, diary studies & logging.
Provide you with experience in doing observation and critiquing observation studies.

Page 32: Evaluation, cont’d

What and when to observe

Goals & questions determine the paradigms and techniques used.

Observation is valuable any time during design.

Quick & dirty observations can be done early in design.

Observation can be done in the field (i.e., field studies) and in controlled environments (i.e., usability studies).

Observers can be:
- outsiders looking on
- participants, i.e., participant observers
- ethnographers

Page 33: Evaluation, cont’d

Frameworks to guide observation

- The person. Who?
- The place. Where?
- The thing. What?

The Goetz and LeCompte (1984) framework:
- Who is present?
- What is their role?
- What is happening?
- When does the activity occur?
- Where is it happening?
- Why is it happening?
- How is the activity organized?

Page 34: Evaluation, cont’d

The Robinson (1993) framework

Space. What is the physical space like?
Actors. Who is involved?
Activities. What are they doing?
Objects. What objects are present?
Acts. What are individuals doing?
Events. What kind of event is it?
Goals. What do they want to accomplish?
Feelings. What is the mood of the group and of individuals?

Page 35: Evaluation, cont’d

You need to consider

Goals & questions
Which framework & techniques to use
How to collect data
Which equipment to use
How to gain acceptance
How to handle sensitive issues
Whether and how to involve informants
How to analyze the data
Whether to triangulate

Page 36: Evaluation, cont’d

Observing as an outsider

As in usability testing
More objective than participant observation
In a usability lab, equipment is in place
Recording is continuous
Analysis & observation are almost simultaneous
Care needed to avoid drowning in data
Analysis can be coarse or fine grained
Video clips can be powerful for telling the story

Page 37: Evaluation, cont’d

Participant observation & ethnography

Debate about differences
Participant observation is a key component of ethnography
Must get co-operation of people observed
Informants are useful
Data analysis is continuous
Interpretivist technique
Questions get refined as understanding grows
Reports usually contain examples

Page 38: Evaluation, cont’d

Data collection techniques

Notes & still camera
Audio & still camera
Video
Tracking users:
- diaries
- interaction logging
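As a rough illustration of interaction logging (not taken from the slides), a prototype could append timestamped user events to a log file for later analysis; the event names and file path below are invented examples:

```python
# A minimal sketch of interaction logging for a prototype: each user event
# is appended to a file with a timestamp so it can be analyzed later
# (e.g., for task times or error counts). Event names are invented examples.
import json
import time

LOG_PATH = "interaction_log.jsonl"   # assumed location for the log file

def log_event(event: str, **details) -> None:
    """Append one timestamped event record as a JSON line."""
    record = {"t": time.time(), "event": event, **details}
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example usage while a participant works through a task:
log_event("task_start", task="find_flight")
log_event("button_click", widget="search")
log_event("error", kind="empty_destination_field")
log_event("task_end", task="find_flight")
```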

Page 39: Evaluation, cont’d

Data analysis

Qualitative data - interpreted & used to tell the ‘story’ about what was observed.

Qualitative data - categorized using techniques such as content analysis.

Quantitative data - collected from interaction & video logs. Presented as values, tables, charts, graphs and treated statistically.

Page 40: Evaluation, cont’d

Interpretive data analysis

Look for key events that drive the group’s activity
Look for patterns of behavior
Test data sources against each other - triangulate
Report findings in a convincing and honest way
Produce ‘rich’ or ‘thick’ descriptions
Include quotes, pictures, and anecdotes
Software tools can be useful, e.g., NUDIST, Ethnograph (URLs will be provided)

Page 41: Evaluation, cont’d

Looking for patterns

Critical incident analysis
Content analysis
Discourse analysis
Quantitative analysis - i.e., statistics

Page 42: Evaluation, cont’d

Key points

Observe from outside or as a participant.
Analyzing video and data logs can be time-consuming.
In participant observation, collections of comments, incidents, and artifacts are made.
Ethnography is a philosophy with a set of techniques that include participant observation and interviews.

Ethnographers immerse themselves in the culture that they study.

Page 43: Evaluation, cont’d

Asking users & experts

Page 44: Evaluation, cont’d

The aims

Discuss the role of interviews & questionnaires in evaluation.

Teach basic questionnaire design.
Describe how to do interviews, heuristic evaluation & walkthroughs.
Describe how to collect, analyze & present data.
Discuss strengths & limitations of these techniques.

Page 45: Evaluation, cont’d

Interviews

Unstructured - are not directed by a script. Rich but not replicable.

Structured - are tightly scripted, often like a questionnaire. Replicable but may lack richness.

Semi-structured - guided by a script but interesting issues can be explored in more depth. Can provide a good balance between richness and replicability.

Page 46: Evaluation, cont’d

Basics of interviewing

Remember the DECIDE framework
Goals and questions guide all interviews
Two types of questions:
- ‘closed questions’ have a predetermined answer format, e.g., ‘yes’ or ‘no’
- ‘open questions’ do not have a predetermined format

Closed questions are quicker and easier to analyze

Page 47: Evaluation, cont’d

Things to avoid when preparing interview questions

Long questions
Compound sentences - split them into two
Jargon & language that the interviewee may not understand
Leading questions that make assumptions, e.g., why do you like …?
Unconscious biases, e.g., gender stereotypes

Page 48: Evaluation, cont’d

Components of an interview

Introduction - introduce yourself, explain the goals of the interview, reassure about the ethical issues, ask to record, present an informed consent form.

Warm-up - make first questions easy & non-threatening.

Main body – present questions in a logical order

A cool-off period - include a few easy questions to defuse tension at the end

Closure - thank the interviewee, signal the end, e.g., switch the recorder off.

Page 49: Evaluation, cont’d

The interview process

Use the DECIDE framework for guidance
Dress in a similar way to participants
Check recording equipment in advance
Devise a system for coding names of participants to preserve confidentiality
Be pleasant
Ask participants to complete an informed consent form

Page 50: Evaluation, cont’d

Probes and prompts

Probes - devices for getting more information, e.g., ‘would you like to add anything?’

Prompts - devices to help interviewee, e.g., help with remembering a name

Remember that probing and prompting should not create bias.

Too much can encourage participants to try to guess the answer.

Page 51: Evaluation, cont’d

Group interviews

Also known as ‘focus groups’
Typically 3-10 participants
Provide a diverse range of opinions
Need to be managed to:
- ensure everyone contributes
- ensure discussion isn’t dominated by one person
- ensure the agenda of topics is covered

Page 52: Evaluation, cont’d

Analyzing interview data

Depends on the type of interview
Structured interviews can be analyzed like questionnaires
Unstructured interviews generate data like that from participant observation

It is best to analyze unstructured interviews as soon as possible to identify topics and themes from the data

Page 53: Evaluation, cont’d

Questionnaires

Questions can be closed or open
Closed questions are easiest to analyze, and analysis may be done by computer
Can be administered to large populations
Paper, email & the web are used for dissemination
An advantage of electronic questionnaires is that data goes into a database & is easy to analyze

Sampling can be a problem when the size of a population is unknown as is common online

Page 54: Evaluation, cont’d

Questionnaire style

Varies according to goal, so use the DECIDE framework for guidance
Questionnaire format can include:
- ‘yes’/‘no’ checkboxes
- checkboxes that offer many options
- Likert rating scales
- semantic scales
- open-ended responses

Likert scales have a range of points; 3, 5, 7 & 9 point scales are common
There is debate about which is best

Page 55: Evaluation, cont’d

Developing a questionnaire

Provide a clear statement of purpose & guarantee participants anonymity
Plan questions - if developing a web-based questionnaire, design it off-line first
Decide on whether phrases will all be positive, all negative or mixed
Pilot test questions - are they clear, is there sufficient space for responses?
Decide how data will be analyzed & consult a statistician if necessary

Page 56: Evaluation, cont’d

Encouraging a good response

Make sure the purpose of the study is clear
Promise anonymity
Ensure the questionnaire is well designed
Offer a short version for those who do not have time to complete a long questionnaire
If mailed, include a stamped addressed envelope
Follow up with emails, phone calls, letters
Provide an incentive
A 40% response rate is high; 20% is often acceptable

Page 57: Evaluation, cont’d

Advantages of online questionnaires

Responses are usually received quickly
No copying and postage costs
Data can be collected in a database for analysis
Time required for data analysis is reduced
Errors can be corrected easily
Disadvantage - sampling is problematic if the population size is unknown
Disadvantage - preventing individuals from responding more than once

Page 58: Evaluation, cont’d

Problems with online questionnaires

Sampling is problematic if population size is unknown

Preventing individuals from responding more than once

Individuals have also been known to change questions in email questionnaires

Page 59: Evaluation, cont’d

Questionnaire data analysis & presentation

Present results clearly - tables may help
Simple statistics can say a lot, e.g., mean, median, mode, standard deviation
Percentages are useful, but also give the population size
Bar graphs show categorical data well
More advanced statistics can be used if needed
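For example, a minimal sketch of the simple statistics listed above, applied to made-up 5-point Likert responses (the ratings are invented purely for illustration):

```python
# A minimal sketch of simple descriptive statistics for questionnaire data.
# The ratings are invented 5-point Likert responses, used only to illustrate
# mean, median, mode, standard deviation, and percentages.
import statistics
from collections import Counter

ratings = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]   # hypothetical responses

print("n      =", len(ratings))
print("mean   =", statistics.mean(ratings))
print("median =", statistics.median(ratings))
print("mode   =", statistics.mode(ratings))
print("stdev  =", round(statistics.stdev(ratings), 2))

# Percentages per response category (always report the sample size too)
counts = Counter(ratings)
for value in sorted(counts):
    pct = 100 * counts[value] / len(ratings)
    print(f"rated {value}: {counts[value]} responses ({pct:.0f}%)")
```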

Page 60: Evaluation, cont’d

Well-known forms

SUMI
MUMMS
QUIS
-- see the Perlman site

Page 61: Evaluation, cont’d

Asking experts

Experts use their knowledge of users & technology to review software usability

Expert critiques (crits) can be formal or informal reports

Heuristic evaluation is a review guided by a set of heuristics

Walkthroughs involve stepping through a pre-planned scenario noting potential problems

Page 62: Evaluation, cont’d

Heuristic evaluation

Developed by Jakob Nielsen in the early 1990s

Based on heuristics distilled from an empirical analysis of 249 usability problems

These heuristics have been revised for current technology, e.g., HOMERUN for web

Heuristics still needed for mobile devices, wearables, virtual worlds, etc.

Design guidelines form a basis for developing heuristics

Page 63: Evaluation, cont’d

Nielsen’s heuristics Visibility of system status Match between system and real world User control and freedom Consistency and standards Help users recognize, diagnose, recover from

errors Error prevention Recognition rather than recall Flexibility and efficiency of use Aesthetic and minimalist design Help and documentation

Page 64: Evaluation, cont’d

Discount evaluation

Heuristic evaluation is referred to as discount evaluation when 5 evaluators are used.

Empirical evidence suggests that on average 5 evaluators identify 75-80% of usability problems.
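A rough sketch of the reasoning behind this figure, assuming the commonly cited Nielsen-Landauer style model in which each evaluator independently finds a fixed proportion of the problems; the 31% per-evaluator rate is an assumption based on Nielsen's published estimates, not a number from these slides:

```python
# A rough sketch, assuming a Nielsen-Landauer style model: if each evaluator
# independently finds a fixed proportion p of the usability problems, the
# expected proportion found by i evaluators is 1 - (1 - p)**i. The 31%
# per-evaluator rate is an assumed figure, not taken from these slides.
p_single = 0.31

for i in range(1, 11):
    found = 1 - (1 - p_single) ** i
    print(f"{i:2d} evaluators -> ~{found:.0%} of problems found")
# With p = 0.31, five evaluators come out around 84%, in the same ballpark
# as the 75-80% figure quoted on the slide.
```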

Page 65: Evaluation, cont’d

3 stages for doing heuristic evaluation

Briefing session to tell experts what to do
Evaluation period of 1-2 hours in which:
- each expert works separately
- they take one pass to get a feel for the product
- they take a second pass to focus on specific features

Debriefing session in which experts work together to prioritize problems
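As a rough illustration of the debriefing step (not something prescribed by the slides), findings from individual experts could be merged and ranked by severity; the problems, heuristics, and severity ratings below are invented placeholders:

```python
# A rough sketch of aggregating heuristic-evaluation findings at debriefing:
# problems reported by individual experts are merged and prioritized by
# average severity. Problem descriptions, heuristic names, and severity
# ratings (0-4) are invented placeholders.
from collections import defaultdict

reports = [
    ("expert1", "no feedback after saving", "visibility of system status", 3),
    ("expert2", "no feedback after saving", "visibility of system status", 4),
    ("expert1", "jargon in error dialogs", "speak the user's language", 2),
    ("expert3", "no undo for delete", "user control and freedom", 4),
]

merged = defaultdict(list)
for _, problem, heuristic, severity in reports:
    merged[(problem, heuristic)].append(severity)

# Rank problems by mean severity, highest first
ranked = sorted(merged.items(), key=lambda kv: -sum(kv[1]) / len(kv[1]))
for (problem, heuristic), severities in ranked:
    mean_sev = sum(severities) / len(severities)
    print(f"[{mean_sev:.1f}] {problem} (heuristic: {heuristic}, "
          f"reported by {len(severities)} expert(s))")
```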

Page 66: Evaluation, cont’d

Advantages and problems

Few ethical & practical issues to consider
Can be difficult & expensive to find experts
The best experts have knowledge of the application domain & users
Biggest problems:
- important problems may get missed
- many trivial problems are often identified

Page 67: Evaluation, cont’d

Cognitive walkthroughs

Focus on ease of learning
The designer presents an aspect of the design & usage scenarios
One or more experts walk through the design prototype with the scenario
The experts are told the assumptions about the user population, context of use, and task details

Experts are guided by 3 questions

Page 68: Evaluation, cont’d

The 3 questions

Will the correct action be sufficiently evident to the user?
Will the user notice that the correct action is available?
Will the user associate and interpret the response from the action correctly?

As the experts work through the scenario they note problems

Page 69: Evaluation, cont’d

Pluralistic walkthrough

A variation on the cognitive walkthrough theme
Performed by a carefully managed team
The panel of experts begins by working separately
Then there is managed discussion that leads to agreed decisions
The approach lends itself well to participatory design

Page 70: Evaluation, cont’d

Key points

Structured, unstructured, and semi-structured interviews, focus groups & questionnaires
Closed questions are easiest to analyze & can be replicated
Open questions are richer
Check boxes, Likert & semantic scales
Expert evaluation: heuristic evaluation & walkthroughs
Relatively inexpensive because no users are needed
Heuristic evaluation is relatively easy to learn
May miss key problems & identify false ones

Page 71: Evaluation, cont’d

A project for you …

Activeworlds.com
Questionnaire to test reactions with friends:
http://www.acm.org/~perlman/question.html
http://www.ifsm.umbc.edu/djenni1/osg/
Develop heuristics to evaluate usability and sociability aspects

Page 72: Evaluation, cont’d

A project for you …

http://www.id-book.com/catherb/

- provides heuristics and a template so that you can evaluate different kinds of systems. More information about this is provided in the interactivities section of the id-book.com website.

Page 73: Evaluation, cont’d

A project for you …

Go to the Pew Internet & American Life Survey, www.pewinternet.org/ (or to another survey of your choice)

Critique one of the recent online surveys

Critique a recent survey report

Page 74: Evaluation, cont’d

Interpretive Evaluation

Contextual inquiry
Cooperative and participative evaluation
Ethnography

Rather than emphasizing statements of goals, objective tests, and research reports, interpretive evaluation emphasizes the usefulness of findings to the people concerned.

Good for feasibility studies, design feedback, and post-implementation reviews.

Page 75: Evaluation, cont’d

Contextual Inquiry

Users and researchers participate to identify and understand usability problems within the normal working environment of the user

Differences from other methods include:
– work context -- larger tasks
– time context -- longer times
– motivational context -- more user control
– social context -- social support included that is normally lacking in experiments

Page 76: Evaluation, cont’d

Why use contextual inquiry?

Usability issues are located that go undetected in laboratory testing:
– line counting in word processing
– unpacking and setting up equipment

Issues identified by users or by user/evaluator

Page 77: Evaluation, cont’d

Contextual interview: topics of interest

Structure and language used in the work
Individual and group actions and intentions
Culture affecting the work
Explicit and implicit aspects of the work

Page 78: Evaluation, cont’d

Cooperative evaluation

A technique to improve a user interface specification by detecting the possible usability problems in an early prototype or partial simulation

Low cost, little training needed
Think-aloud protocols collected during evaluation

Page 79: Evaluation, cont’d

Cooperative Evaluation

Typical user(s) recruited
Representative tasks selected
User verbalizes problems / evaluator makes notes
Debriefing sessions held
Summarize and report back to the design team

Page 80: Evaluation, cont’d

Participative Evaluation

More open than cooperative evaluation
Subject to greater control by users
Cooperative prototyping, facilitated by:
– focus groups
– designers working with users to prepare prototypes
– stable prototypes provided, which users evaluate
– a tight feedback loop with designers

Page 81: Evaluation, cont’d

Ethnography

Standard practice in anthropology
Researchers strive to immerse themselves in the situation they want to learn about
Goal: understand the ‘real’ work situation
Typically applies video - videos are viewed, reviewed, logged, analyzed; collections made, often placed in databases, retrieved, visualized …

Page 82: Evaluation, cont’d

Predictive Evaluation

Predict aspects of usage rather than observe and measure

Doesn’t involve users
Cheaper

Page 83: Evaluation, cont’d

Predictive Evaluation Methods

Inspection methods:
– Standards inspections
– Consistency inspections
– Heuristic evaluation
– “Discount” usability evaluation
– Walkthroughs

Modelling: the keystroke-level model

Page 84: Evaluation, cont’d

Standards inspections

Standards experts inspect the interface for compliance with specified standards

relatively little task knowledge required

Page 85: Evaluation, cont’d

Consistency inspections

Teams of designers inspect a set of interfaces for a family of products
– usually one designer from each project

Page 86: Evaluation, cont’d

Usage simulations

Aka - “expert review”, “expert simulation”

Experts simulate behavior of less-experienced users, try to anticipate usability problems

More efficient than user trials
Prescriptive feedback

Page 87: Evaluation, cont’d

Heuristic evaluation

Usage simulation in which the system is evaluated against a list of “heuristics” (e.g., the sample heuristics on a later slide)

Two passes: per screen, and flow from screen to screen

Study: 5 evaluators found 75% of problems

Page 88: Evaluation, cont’d

Sample heuristics

Use simple and natural dialogue
Speak the user’s language
Minimize user memory load
Be consistent
Provide feedback
Provide clearly marked exits
Provide shortcuts
Provide good error messages
Prevent errors

Page 89: Evaluation, cont’d

Discount usability engineering

Phase 1: usability testing + scenario construction (1-3 users)

Phase 2: scenarios refined + heuristic evaluation

“Discount” features:
– small scenarios, paper mockups
– informal think-aloud (no psychologists)
– scenarios + think-aloud + heuristic evaluation
– small number of heuristics (see previous slide)
– 2-3 testers sufficient

Page 90: Evaluation, cont’d

Walkthroughs

Goal - detect and remove problems early on
Construct carefully designed tasks from a system specification or screen mockup
Walk through the activities required, predict how users would likely behave, and determine the problems they will encounter
-- see the checklist for cognitive walkthrough

Page 91: Evaluation, cont’d

Modeling: keystroke level model

Goal: calculate task performance times for experienced users

Requires:
– specification of system functionality
– task analysis, breakdown of each task into its components

Page 92: Evaluation, cont’d

Keystroke-level modeling

Time to execute is the sum of:
– Tk - keystroking (0.35 sec)
– Tp - pointing (1.10 sec)
– Td - drawing (problem-dependent)
– Tm - mental preparation (1.35 sec)
– Th - homing (0.40 sec)
– Tr - system response (1.20 sec)

Page 93: Evaluation, cont’d

KLM: example

Save a file with a new name in a word processor that uses a mouse and pulldown menus:

(1) Initial homing (Th)
(2) Move cursor to the file menu at the top of the screen (Tp + Tm)
(3) Select ‘save as’ in the file menu - click on the file menu, move down the menu, click on ‘save as’ (Tm + Tk + Tp + Tk)
(4) Word processor prompts for a new file name; user types the filename (Tr + Tm + Tk(filename) + Tk)
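A minimal sketch of the calculation, using the operator times from the previous slide and the four steps above; the 8-character filename and the closing Enter keystroke are assumptions for illustration:

```python
# A minimal keystroke-level model calculation for the 'save as' example.
# Operator times follow the values on the earlier slide (Td is
# problem-dependent and unused here); the filename length is assumed.
T = {"k": 0.35, "p": 1.10, "m": 1.35, "h": 0.40, "r": 1.20}  # seconds

def klm_time(operators):
    """Sum the operator times for a sequence such as ['h', 'p', 'm', ...]."""
    return sum(T[op] for op in operators)

filename_keystrokes = ["k"] * 8      # assume an 8-character filename

steps = (
    ["h"]                      # (1) initial homing on the mouse
    + ["p", "m"]               # (2) point to the file menu
    + ["m", "k", "p", "k"]     # (3) open menu, move to 'save as', click
    + ["r", "m"]               # (4) system prompts, user prepares
    + filename_keystrokes      #     types the filename
    + ["k"]                    #     final keystroke (assumed Enter)
)

print(f"Predicted expert task time: {klm_time(steps):.2f} s")
```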

Page 94: Evaluation, cont’d

Experiments and Benchmarking

Traditional experiments
Usability engineering

Page 95: Evaluation, cont’d

Traditional Experiments

Typically narrowly defined; evaluate particular aspects such as:
– menu depth v. context
– icon design
– tickers v. fade_boxes v. replace_boxes

Usually not practical to include in design process

Page 96: Evaluation, cont’d

Example: Star Workstation, text selection

Goal: evaluate methods for selecting text, using 1-3 mouse buttons

Operations:
– Point (between characters; target of move, copy, or insert)
– Select text (character, word, sentence, par, doc)
– Extend selection to include more text

Page 97: Evaluation, cont’d

Selection Schemes

[Table of selection schemes A-G: for each scheme, the actions assigned to mouse buttons 1-3 - Point, selection granularity (character, word, sentence, paragraph, document), draw-through, and Adjust.]

Page 98: Evaluation, cont’d

Methodology

Between-subjects paradigm
Six groups, 4 subjects per group
In each group: 2 experienced w/mouse, 2 not
Each subject first trained in use of the mouse and in editing techniques in the Star word-processing system
Assigned scheme taught
Each subject performs 10 text-editing tasks, 6 times each

Page 99: Evaluation, cont’d

Results: selection time

Time:
Scheme A: 12.25 s
Scheme B: 15.19 s
Scheme C: 13.41 s
Scheme D: 13.44 s
Scheme E: 12.85 s
Scheme F: 9.89 s (p < 0.001)

Page 100: Evaluation, cont’d

Results: Selection Errors

Average: 1 selection error per four tasks
65% of errors were drawthrough errors, the same across all selection schemes
20% of errors were “too many clicks”; schemes with less clicking were better
15% of errors were “clicked wrong mouse button”; schemes with fewer buttons were better

Page 101: Evaluation, cont’d

Selection scheme: test 2

Results of test 1 led to the conclusion to avoid:
– drawthroughs
– three buttons
– multiple clicking

Scheme “G” introduced -- avoids drawthrough, uses only 2 buttons

A new test was run, but the test groups had a 3:1 ratio of subjects experienced with the mouse to inexperienced

Page 102: Evaluation, cont’d

Results of test 2

Mean selection time: 7.96 s for scheme G; frequency of “too many clicks” stayed about the same

Conclusion: scheme G acceptable
– selection time shorter
– advantage of quick selection balances the moderate error rate of multi-clicking

Page 103: Evaluation, cont’d

Experimental design - concerns

What to change? What to keep constant? What to measure?

Hypothesis, stated in a way that can be tested.

Statistical tests: which ones, why?

Page 104: Evaluation, cont’d

Variables

Independent variable - the one the experimenter manipulates (input)

Dependent variable - affected by the independent variable (output)

Experimental effect - changes in the dependent variable caused by changes in the independent variable

Confounding - when the dependent variable changes because of other variables (task order, learning, fatigue, etc.)
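For instance, a minimal sketch of analyzing one independent variable at two levels in an independent-subjects design, assuming SciPy is available; the timings are made-up illustrative values, not data from the Star study:

```python
# A minimal sketch of an independent-subjects comparison: the selection
# scheme is the independent variable, task time (seconds) is the dependent
# variable. The timings are made-up illustrative values. Assumes SciPy.
from scipy import stats

scheme_a_times = [12.1, 13.4, 11.8, 12.9]   # hypothetical group 1
scheme_f_times = [9.7, 10.2, 9.5, 10.4]     # hypothetical group 2

# Two-sample t-test: is the difference in mean times statistically significant?
result = stats.ttest_ind(scheme_a_times, scheme_f_times)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```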

Page 105: Evaluation, cont’d

Selecting subjects - avoiding bias

Age bias -- cover the target age range
Gender bias -- equal numbers of males and females
Experience bias -- similar level of experience with computers
etc. ...

Page 106: Evaluation, cont’d

Experimental Designs

Independent subject design
– a single group of subjects is allocated randomly to each of the experimental conditions

Matched subject design
– subjects are matched in pairs; pairs are allocated randomly to each of the experimental conditions

Repeated measures design
– all subjects appear in all experimental conditions
– concerns: order of tasks, learning effects

Single subject design
– in-depth experiments on just one subject

Page 107: Evaluation, cont’d

Critical review of experimental procedure

User preparation
– adequate instructions and training?

Impact of variables
– how do changes in independent variables affect users?

Structure of the tasks
– were tasks complex enough? did users know the aim?

Time taken
– fatigue or boredom?

Page 108: Evaluation, cont’d

Critical review of experimental results

Size of effect
– statistically significant? practically significant?

Alternative interpretations
– other possible causes for the results found?

Consistency between dependent variables
– task completion and error scores versus user preferences and learning scores

Generalization of results
– to other tasks, users, working environments?

Page 109: Evaluation, cont’d

Usability Engineering

Usability of product specified quantitatively, and in advance

As product is built, it can be demonstrated that it does or does not reach required levels of usability

Page 110: Evaluation, cont’d

Usability Engineering

Define usability goals through metrics
Set planned levels of usability that need to be achieved
Analyze the impact of various design solutions
Incorporate user-defined feedback in product design
Iterate through the design-evaluate-design loop until planned levels are achieved

Page 111: Evaluation, cont’d

Metrics

Include:
– time to complete a particular task
– number of errors
– attitude ratings by users
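As a rough illustration (not taken from the slides), the worst/planned/best levels of the following example table could be captured in a small data structure and checked against measured values; the attribute names and numbers below are placeholders, not values from the conferencing-system example:

```python
# A rough sketch of a usability specification: each metric records the
# worst-acceptable, planned, and best-case levels, and measured values are
# checked against them. Attribute names and numbers are placeholders.
from dataclasses import dataclass

@dataclass
class UsabilityMetric:
    attribute: str
    measuring_method: str
    worst_case: float
    planned_level: float
    best_case: float
    higher_is_better: bool = True

    def assess(self, measured: float) -> str:
        meets_plan = (measured >= self.planned_level if self.higher_is_better
                      else measured <= self.planned_level)
        acceptable = (measured >= self.worst_case if self.higher_is_better
                      else measured <= self.worst_case)
        if meets_plan:
            return "meets the planned level"
        return "acceptable (within worst case)" if acceptable else "below the worst case"

spec = [
    UsabilityMetric("Initial use", "successful interactions / 30 min", 2, 4, 9),
    UsabilityMetric("Error rate", "% errors on benchmark tasks", 20, 10, 0,
                    higher_is_better=False),
]

measured = {"Initial use": 5, "Error rate": 12}   # hypothetical test results
for m in spec:
    print(f"{m.attribute}: {m.assess(measured[m.attribute])}")
```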

Page 112: Evaluation, cont’d

Metrics - example, conferencing system

Usability specification for a conferencing system. Columns: attribute; measuring concept; measuring method; worst case; planned level; best case; now level.

- Initial use; conferencing task; successful interactions per 30 min; worst case 1-2; planned 3-4; best 8-10; now ?
- Infrequent use; tasks after 1-2 weeks of disuse; % of errors; worst case equal to product Z; planned 50% better; best 0 errors; now ?
- Learning rate; task; 1st-half vs. 2nd-half score; worst case two halves equal; planned second half better; best ‘much’ better; now ?
- Preference over product Z; questionnaire score; ratio of scores; worst case same as Z; best none prefer Z; now ?
- Preference over product A; questionnaire score; ratio of scores; worst case same as Q; best none prefer Q; now ?
- Error recovery; critical incident analysis; % of incidents accounted for; worst case 10%; planned 50%; best 100%; now ?
- Initial evaluation; attitude questionnaire; semantic differential score; worst case 0 (neutral); planned 1 (somewhat positive); best 2 (highly positive); now ?
- Casual evaluation; attitude questionnaire; semantic differential score; worst case 0 (neutral); planned 1 (somewhat positive); best 2 (highly positive); now ?
- Mastery evaluation; attitude questionnaire; semantic differential score; worst case 0 (neutral); planned 1 (somewhat positive); best 2 (highly positive); now ?

Page 113: Evaluation, cont’d

Benchmark tasks

Carefully constructed standard tests used to monitor users’ performance in usability testing

Typically use multiple videos and keyboard logging

Controlled testing -- a specified set of users, well-specified tasks, a controlled environment

Tasks are longer than in scientific experiments, shorter than in “real life”

Page 114: Evaluation, cont’d

Making tradeoffs

Impact analysis - used to establish priorities among usability attributes. It is a listing of attributes and proposed design decisions, and the % impact of each decision on each attribute.
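As a rough sketch of what such a listing might look like in practice (the attributes, design decisions, and percentages below are invented placeholders, not figures from the slides):

```python
# A rough sketch of an impact analysis table: each proposed design decision
# is listed with its estimated % impact on each usability attribute, and
# decisions are ranked by total estimated impact. All names and percentages
# are invented placeholders for illustration.
impact = {
    "larger on-screen buttons":     {"task time": 15, "error rate": 25},
    "undo for destructive actions": {"task time": 5,  "error rate": 40},
    "keyboard shortcuts":           {"task time": 30, "error rate": 0},
}

for decision, effects in sorted(impact.items(),
                                key=lambda kv: -sum(kv[1].values())):
    total = sum(effects.values())
    detail = ", ".join(f"{attr}: {pct}%" for attr, pct in effects.items())
    print(f"{decision}: total estimated impact {total}% ({detail})")
```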

Usability engineering is reported to produce a measurable improvement in usability of about 30%.