
Evaluation

• How do we test the interaction design?

• Several Dimensions
  – Qualitative vs. Quantitative assessments
  – Conceptual vs. Physical Design

Why Evaluate

• Five good reasons:
  – Problems fixed before product is shipped
  – Team can concentrate on real (not imaginary) problems
  – Engineers can develop code instead of debating about their personal preferences
  – Time to market is sharply reduced
  – Solid, tested design will sell better

When to Evaluate

• Formative Evaluations
  – Conducted during requirements specification and design
  – Consider alternatives

• Summative Evaluations
  – Assess the success of a finished product
  – Determine whether the product satisfies requirements

What to Evaluate

• A huge variety of user interaction features can (and should) be evaluated, such as:
  – Sequence of links in a web search
  – Enjoyment experienced by game users
  – System response time
  – Signal detection performance in data analysis

• Gould’s principles:
  – Focus on users and their tasks
  – Observe, measure, and analyze user performance
  – Design iteratively

Qualitative Assessment

• Informal
  – Simply ask users how they like the system
  – Listen to “hallway” conversations about systems

• Formal
  – Develop survey instruments to ask specific questions (a scoring sketch follows below), e.g.
    • How long did it take you to become comfortable?
    • Which task is the most difficult to accomplish?
  – Hold focus group discussions about the system
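
One widely used formal instrument (not named on this slide, shown only as an illustration) is the System Usability Scale (SUS): ten Likert items on a 1–5 scale, scored to a single 0–100 figure. A minimal scoring sketch in Python, assuming the standard SUS scoring rule:

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    Odd-numbered items are positively worded (contribution = response - 1),
    even-numbered items are negatively worded (contribution = 5 - response).
    The summed contributions are scaled by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses on a 1-5 scale")
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))  # 0-based index: even i = odd-numbered item
    return total * 2.5

# Hypothetical responses from one participant to items 1..10
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```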

Quantitative Assessment

• Identify Usability Criteria (from Requirements) to test

• Design human performance experiments to test these, e.g.
  – Measure response time or time to complete a task
  – Measure error rate or incidence of “dead ends”

• This can be used during the design process to compare alternative designs (see the sketch below)
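
As an illustration of such a measurement, here is a minimal sketch that summarizes time to complete and error rate for two alternative designs; the data, design names, and the summarize helper are hypothetical:

```python
from statistics import mean

# Hypothetical trials: (seconds to complete the task, trial ended in an error?)
design_a = [(41.2, False), (38.5, True), (52.0, False), (44.7, False)]
design_b = [(33.1, False), (35.8, False), (30.4, True), (29.9, False)]

def summarize(trials):
    """Return mean completion time and error rate for a set of trials."""
    times = [t for t, _ in trials]
    errors = sum(1 for _, err in trials if err)
    return mean(times), errors / len(trials)

for name, trials in (("A", design_a), ("B", design_b)):
    avg_time, error_rate = summarize(trials)
    print(f"Design {name}: mean completion {avg_time:.1f}s, error rate {error_rate:.0%}")
```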

An Evaluation Framework

• Evaluation must be an intentional, planned process
  – Ad hoc evaluations are of very little value

• The details of the particular framework can vary from team to team

• What is important is that the framework be crafted in advance, and that all team members understand the framework

Evaluation Paradigms

• Evaluation Paradigms are the beliefs and practices (perhaps underpinned by theory) that guide a user study

• We’ll discuss four core evaluation paradigms:
  – Quick and Dirty evaluation
  – Usability Testing
  – Field Studies
  – Predictive Evaluation

Quick and Dirty

• Informal feedback from users

• Can be conducted at any stage

• Emphasis is on speed, not quality

• Often consultants are used as surrogate users

Usability Testing

• Measuring typical users’ performance on carefully prepared tasks that are typical for the system

• Metrics can include such things as
  – Error rate and time to completion
  – Observations/recordings/logs of interaction
  – Questionnaires

• Strongly controlled by the evaluator

What is usability?

• An Operational Definition
  – Efficient
  – Effective
  – Safe
  – Easy
    • To learn
    • To remember
    • To use
  – Productive

• As well as
  – Satisfying
  – Enjoyable
  – Pleasing
  – Motivating
  – Fulfilling

Field Studies

• Done in natural settings

• Try to learn what users do and how

• Artifacts are collected
  – Video, notes, sketches, &c.

• Two approaches:
  – As an outsider looking on
    • Qualitative techniques used to gather data
    • Which may be analyzed qualitatively or quantitatively
  – As an insider
    • Easier to capture the role of the social environment

Predictive Evaluation

• Uses models of typical users
  – Heuristic or theoretical

• Users themselves need not be present
  – Cheaper, faster

• Tried and true heuristics can be useful
  – E.g. speak the users’ language
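
One well-known theoretical model of this kind is the Keystroke-Level Model (KLM), which predicts an expert user’s task time by summing standard operator times. KLM is not named on the slide, so treat this as an illustrative sketch only; the operator times below are the commonly cited estimates, and the task is hypothetical:

```python
# Commonly cited KLM operator estimates (seconds); exact values vary by source.
KLM = {
    "K": 0.2,   # press a key (skilled typist)
    "P": 1.1,   # point at a target with the mouse
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def predict_time(operators):
    """Sum operator times for a sequence such as 'M P H K K K K K'."""
    return sum(KLM[op] for op in operators.split())

# Hypothetical task: point at a field, home to the keyboard, type a 5-digit code.
print(f"{predict_time('M P H K K K K K'):.2f} s")  # 1.35 + 1.1 + 0.4 + 5*0.2 = 3.85 s
```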

Evaluation Techniques

• Observing users

• Asking users their opinions

• Asking experts their opinions

• Testing users’ performance

• Modeling users’ task performance to predict efficacy of the interface

Techniques vs. Paradigms

• Observing users
  – Quick and Dirty: see how users behave
  – Usability Testing: video and logging
  – Field Studies: the central technique
  – Predictive: N/A

• Asking users
  – Quick and Dirty: discussions, focus groups
  – Usability Testing: questionnaires and interviews
  – Field Studies: may interview
  – Predictive: N/A

• Asking experts
  – Quick and Dirty: provide critiques
  – Usability Testing: N/A
  – Field Studies: N/A
  – Predictive: heuristics early in design

• User testing
  – Quick and Dirty: N/A
  – Usability Testing: test typical users on typical tasks
  – Field Studies: can measure performance, but difficult
  – Predictive: N/A

• Modeling users’ performance
  – Quick and Dirty: N/A
  – Usability Testing: N/A
  – Field Studies: N/A
  – Predictive: models used to predict efficacy

DECIDE

• Determine the overall goals that the evaluation addresses

• Explore the specific questions to be answered

• Choose the evaluation paradigm and techniques

• Identify the practical issues that must be addressed

• Decide how to deal with the ethical issues

• Evaluate, interpret, and present the data

Determine the goals

• What are the high-level goals of the evaluation?

• Who wants the evaluation and why?

• The goals should guide the evaluation, e.g.:
  – Check that evaluators understood users’ needs
  – Identify the metaphor underlying the design
  – Ensure that the interface is consistent
  – Investigate the degree to which technology influences working practices
  – Identify how the interface of an existing product could be engineered to improve its usability

Explore the questions

• This amounts to hierarchical question development:
  – “Is the user interface good?”
    • “Is the system easy to learn?”
      – “Are functions easy to find?”
      – “Is the terminology confusing?”
    • “Is response time too slow?”
      – “Is login time too slow?”
      – “Is calculation time too slow?”
      – …

Choose the evaluation paradigm and techniques

• Choose one or more evaluation paradigms
  – Can use different paradigms in different stages
  – Can use multiple paradigms in a single stage

• Combinations of techniques can be used to obtain different perspectives

Identify the practical issues

• Practical issues must be considered BEFORE beginning any evaluation
  – Users
    • An adequate number of representative users must be found
  – Facilities and equipment
    • How many cameras? Where? Film?
  – Schedule and budget
    • Both are always less than would be ideal
  – Expertise
    • Assemble the correct evaluation team

Decide how to deal with ethical issues

• Experiments involving humans must be conducted within strict ethical guidelines
  – Tell participants the goals of the study and what will happen
  – Explain that personal information is confidential
  – Make clear that they are free to stop at any time
  – Pay subjects when possible; payment makes the relationship formal
  – Avoid using quotes that reveal identity
  – Ask users’ permission to quote them, and show them the report

• Example: the Milgram “shock” experiments at Yale, 1961–62

Evaluate, interpret and present the data

• Decide what data to collect and how to analyze them

• Questions that need to be asked:
  – Reliability: is the result reproducible?
  – Validity: does it measure what it’s supposed to measure?
  – Biases: do biases distort the results?
  – Scope: how generalizable are the findings?
  – Ecological validity: does the evaluation environment match the real environment of interest?


Observing Users

• Ethnography – observing the social environment and recording observations which help to understand the function and needs of the people in it

• Users can be observed in controlled laboratory conditions or in natural environments in which the products are used – i.e. the field

Goals, questions and paradigms

• Goals and questions should guide all evaluation studies

• Ideally, these are written down

• Goals help to guide the observation because there is always so much going on

What and when to observe

• Insider or outsider?

• Laboratory or field?
  – Control vs. realism

• Which times are critical (especially for field observations)?

Approaches to observation

• Quick and dirty
  – Informal
  – Insider or outsider

• Observation in usability testing
  – Formal
  – Video, interaction logs, performance data
  – Outsider

• Observation in field studies
  – Outsider, participant, or ethnographer (participant or not?)

How to observe

• Techniques of observation and data gathering vary

In controlled environments

• Decide on the location
  – A temporary lab in the user’s environment?
  – A remote laboratory?

• Equipment

• It is hard to know what the user is thinking
  – The “Think Aloud” technique can help
  – But speaking can alter the interaction
  – Having two subjects work together can help

In the field

• Who is present?
  – What are their roles?

• What is happening?
  – Include body language and tone

• When does the activity occur?

• Where is it happening?

• Why is it happening?

• How is the activity organized?

Participant observation and ethnography

• In this case, the observer/evaluator must be accepted into the group

• Honesty about purpose is important both ethically and to gain trust

• Disagreement in the field about the distinction between ethnography and participant observation
  – Do ethnographers begin with any assumptions?

Data collection

• Notes plus still camera

• Audio recording plus still camera

• Video

Indirect observation: tracking users’ activities

• Diaries

• Interaction logging
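
A minimal sketch of what interaction logging can look like in code; the event names, the log-file name, and the log_event helper are hypothetical, chosen only to show that each user action is timestamped and written out for later analysis:

```python
import json
import time

LOG_PATH = "interaction_log.jsonl"  # hypothetical log file, one JSON event per line

def log_event(event_type, **details):
    """Append a timestamped interaction event to the log."""
    record = {"t": time.time(), "event": event_type, **details}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

# Hypothetical events emitted from an application's UI handlers:
log_event("click", target="search_button")
log_event("key", key="Enter")
log_event("page_view", url="/results?q=usability")
```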

Analyzing, interpreting, and presenting data

• Observation produces large quantities of data of various types

• How to analyze and interpret depends on the research questions first developed

Qualitative analysis to tell a story

• The ensemble of data (notes, video, diaries, &c.) is used to help designers, as a team, understand the users

• There is much room for evaluator bias in these techniques

Qualitative analysis for categorization

• A taxonomy can be developed into which users’ behaviors can be placed

• This can be done by different observers, with the discrepancies used as a measure of observer bias
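
A standard way to turn those discrepancies into a number is Cohen’s kappa, which measures agreement between two observers after correcting for chance agreement; kappa is not named on the slide, so treat this as an illustrative sketch with hypothetical codings:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two observers, corrected for chance agreement."""
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codings of ten observed behaviors by two observers:
obs_1 = ["search", "browse", "search", "error", "browse",
         "search", "browse", "error", "search", "browse"]
obs_2 = ["search", "browse", "browse", "error", "browse",
         "search", "browse", "error", "search", "search"]
print(f"kappa = {cohens_kappa(obs_1, obs_2):.2f}")  # agreement well above chance
```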

Quantitative data analysis

• Observations, interaction logs, and results are gathered and quantified
  – Counted
  – Measured

• Statistical reasoning can then be used to draw conclusions
  – What is statistical significance?
  – What is a t-test? (see the sketch below)
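
In brief: a t-test asks whether the difference between two sample means is larger than would plausibly arise by chance, and the result is called statistically significant when the p-value falls below a chosen threshold (commonly 0.05). A minimal sketch with hypothetical completion-time data, using SciPy’s independent-samples t-test:

```python
from scipy import stats

# Hypothetical task-completion times (seconds) for two alternative designs.
design_a = [41.2, 38.5, 52.0, 44.7, 47.3, 43.8, 50.1, 39.9]
design_b = [33.1, 35.8, 30.4, 29.9, 36.5, 31.2, 34.0, 32.7]

# Independent-samples t-test: are the mean times plausibly equal?
t_stat, p_value = stats.ttest_ind(design_a, design_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 0.05 level")
```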

Case Study: Evaluating a Data Representation

Feeding the findings back into design

• Ideally, the design team will participate in post-evaluation discussions of qualitative data

• Reports to designers should include artifacts, such as quotes, anecdotes, pictures, video clips

• Depending on the design team, quantitative data may or may not be compelling