IAT 334 Experimental Evaluation
SCHOOL OF INTERACTIVE ARTS + TECHNOLOGY [SIAT] | WWW.SIAT.SFU.CA
March 10, 2011 IAT 334 2
Evaluation
• Evaluation styles
• Subjective data
  – Questionnaires, Interviews
• Objective data
  – Observing Users
    • Techniques, Recording
• Usability Specifications
  – Why, How
Our goal?
Evaluation
• Earlier:
  – Interpretive and Predictive
    • Heuristic evaluation, walkthroughs, ethnography…
• Now:
  – User involved
    • Usage observations, experiments, interviews...
Evaluation Forms
• Summative
  – After a system has been finished. Make judgments about the final item.
• Formative
  – As the project is forming. All through the lifecycle. Early, continuous.
Evaluation Data Gathering
• Design the experiment to collect the data to test the hypothesis to evaluate the interface to refine the design
• Information we gather about an interface can be subjective or objective
• Information also can be qualitative or quantitative
  – Which are tougher to measure?
Subjective Data
• Satisfaction is an important factor in performance over time
• Learning what people prefer is valuable data to gather
Methods
• Ways of gathering subjective data
  – Questionnaires
  – Interviews
  – Booths (e.g., trade show)
  – Call-in product hot-line
  – Field support workers
Questionnaires
• Preparation is expensive, but administration is cheap
• Oral vs. written
  – Oral advantages: can ask follow-up questions
  – Oral disadvantages: costly, time-consuming
• Forms can provide better quantitative data
Questionnaires
• Issues
  – Only as good as the questions you ask
  – Establish the purpose of the questionnaire
  – Don’t ask things that you will not use
  – Who is your audience?
  – How do you deliver and collect the questionnaire?
Questionnaire Topics
• Can gather demographic data and data about the interface being studied
• Demographic data:
  – Age, gender
  – Task expertise
  – Motivation
  – Frequency of use
  – Education/literacy
Interface Data
• Can gather data about
  – screen
  – graphic design
  – terminology
  – capabilities
  – learning
  – overall impression
  – ...
Question Format
• Closed format
  – Answer restricted to a set of choices
  Characters on screen
  hard to read   1   2   3   4   5   6   7   easy to read
Closed Format
• Likert Scale
  – Typical scale uses 5, 7 or 9 choices
  – More choices than that are hard to discern
  – Using an odd number gives a neutral choice in the middle
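Closed-format answers are easy to quantify. As a minimal sketch, assuming each answer to one item is stored as an integer from 1 to `points` (a hypothetical format; the slides do not specify one), the responses could be summarized as below. The median is reported rather than the mean because Likert answers are ordinal, and a neutral point exists only when the scale has an odd number of choices.

```python
from collections import Counter
from statistics import median


def summarize_likert(responses, points=7):
    """Summarize answers to one closed-format item on a 1..points scale."""
    for r in responses:
        if not 1 <= r <= points:
            raise ValueError(f"response {r} is outside 1..{points}")
    counts = Counter(responses)
    return {
        # Only an odd-length scale has a single neutral middle choice.
        "neutral": (points + 1) // 2 if points % 2 == 1 else None,
        # Counter returns 0 for choices nobody picked.
        "counts": {c: counts[c] for c in range(1, points + 1)},
        # Answers are ordinal, so report the median rather than the mean.
        "median": median(responses),
    }


# "Characters on screen": 1 = hard to read ... 7 = easy to read
print(summarize_likert([5, 6, 4, 7, 6]))
```

For these five made-up answers the neutral point is 4 and the median answer is 6.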
Closed Format
• Advantages
  – Clarifies alternatives
  – Easily quantifiable
  – Eliminates useless answers
• Disadvantages
  – Must cover the whole range
  – All choices should be equally likely
  – Don’t get interesting, “different” reactions
Issues
• Question specificity
  – “Do you have a computer?” (at home? at work? one you own?)
• Language
  – Beware terminology, jargon
• Clarity
• Leading questions
  – Can be phrased either positively or negatively
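Because question wording can push answers in either direction, a questionnaire may mix positively and negatively worded items. Before combining them, negatively worded items are usually reverse-scored so that a higher number always means a better rating. The sketch below assumes 1..7 ratings and made-up item names; averaging Likert items treats them as interval data, which is a common simplification rather than a rule.

```python
def reverse_score(value, points=7):
    """Flip a 1..points rating so that higher always means 'better'."""
    return points + 1 - value


def scale_score(answers, negative_items, points=7):
    """Mean rating across items after reverse-scoring negatively worded ones.

    answers: dict mapping a (hypothetical) item id to its 1..points rating.
    negative_items: set of item ids whose wording is negative.
    """
    adjusted = [
        reverse_score(v, points) if item in negative_items else v
        for item, v in answers.items()
    ]
    return sum(adjusted) / len(adjusted)


answers = {
    "easy_to_learn": 6,    # "The system was easy to learn" (positive wording)
    "confusing_menus": 2,  # "The menus were confusing" (negative wording)
}
print(scale_score(answers, negative_items={"confusing_menus"}))  # 6.0
```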
Issues
• Prestige bias
  – People answer a certain way because they want you to think of them that way
• Embarrassing questions
• Hypothetical questions
• “Halo effect”
  – When the estimate of one feature affects the estimate of another (e.g., intelligence/looks)
Deployment
• Steps
  – Discuss questions among team
  – Administer verbally or in writing to a few people (pilot). Ask them verbally what they thought of the questions
  – Administer final test
Open-ended Questions
• Asks for unprompted opinions
• Good for general, subjective information, but difficult to analyze rigorously
• May help with design ideas
  – “Can you suggest improvements to this interface?”
Ethics
• People can be sensitive about this process and issues
• Make sure they know you are testing software, not them
• Attribution theory
  – Studies whether people attribute their success or failure to themselves or to outside factors (gender, age differences)
• Can quit anytime
Objective Data
• Users interact with interface
  – You observe, monitor, calculate, examine, measure, …
• Objective, scientific data gathering
• Comparison to interpretive/predictive evaluation
Observing Users
• Not as easy as you think
• One of the best ways to gather feedback about your interface
• Watch, listen and learn as a person interacts with your system
Observation
• Direct
  – In same room
  – Can be intrusive
  – Users aware of your presence
  – Only see it one time
  – May use semitransparent mirror to reduce intrusiveness
• Indirect
  – Video recording
  – Reduces intrusiveness, but doesn’t eliminate it
  – Cameras focused on screen, face & keyboard
  – Gives archival record, but can spend a lot of time reviewing it
Location
• Observations may be
  – In lab: maybe a specially built usability lab
    • Easier to control
    • Can have user complete set of tasks
  – In field
    • Watch their everyday actions
    • More realistic
    • Harder to control other factors
Challenge
• In simple observation, you observe actions but don’t know what’s going on in their head
• Often utilize some form of verbal protocol where users describe their thoughts
Verbal Protocol
• One technique: Think-aloud
  – User describes verbally what s/he is thinking and doing
    • What they believe is happening
    • Why they take an action
    • What they are trying to do
Think Aloud
• Very widely used, useful technique
• Allows you to understand user’s thought processes better
• Potential problems:
  – Can be awkward for participant
  – Thinking aloud can modify the way the user performs the task
Teams
• Another technique: Co-discovery learning
  – Join pairs of participants to work together
  – Use think aloud
  – Perhaps have one person be semi-expert (coach) and one be novice
  – More natural (like conversation), so removes some awkwardness of individual think aloud
Alternative
• What if thinking aloud during session will be too disruptive?
• Can use post-event protocol
  – User performs session, then watches video afterwards and describes what s/he was thinking
  – Sometimes difficult to recall
Historical Record
• In observing users, how do you capture events in the session for later analysis?
Capturing a Session
• 1. Paper & pencil
  – Can be slow
  – May miss things
  – Is definitely cheap and easy
  Time   10:00    10:03    10:08    10:22
  Task   Task 1   Task 2   Task 3   …
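A hand-written time log like the one above can be turned into time-on-task figures. This is a minimal sketch assuming each time marks when the task written beneath it began, so a task ends when the next one starts; the label for the 10:22 entry is a placeholder for the elided next task, and the last entry gets no duration because its end is unknown.

```python
from datetime import datetime


def task_durations(log):
    """Minutes spent on each task, from (time "HH:MM", label) pairs in session order.

    Assumes each time marks the start of its task. Labels should be unique,
    otherwise a later entry overwrites an earlier one.
    """
    stamps = [(datetime.strptime(t, "%H:%M"), label) for t, label in log]
    durations = {}
    # Pair each entry with the next one: the next start is this task's end.
    for (start, label), (end, _) in zip(stamps, stamps[1:]):
        durations[label] = (end - start).total_seconds() / 60
    return durations


log = [("10:00", "Task 1"), ("10:03", "Task 2"), ("10:08", "Task 3"), ("10:22", "(next task)")]
print(task_durations(log))  # {'Task 1': 3.0, 'Task 2': 5.0, 'Task 3': 14.0}
```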
Capturing a Session
• 2. Audio tape
  – Good for think-aloud
  – Hard to tie to interface
• 3. Video tape
  – Multiple cameras probably needed
  – Good record
  – Can be intrusive
Capturing a Session
• 4. Software logging
  – Modify software to log user actions
  – Can give time-stamped key-press or mouse events
  – Two problems:
    • Too low-level; want higher-level events
    • Massive amount of data; need analysis tools
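One way to address the "too low-level" problem is to post-process the log into higher-level events. The sketch below assumes a hypothetical log of (time, kind, detail) records and only collapses runs of consecutive key presses into a single "typed" event; a real analysis tool would need richer rules (menus, dialogs, task boundaries).

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float     # seconds since session start
    kind: str    # hypothetical event kinds: "key" or "click"
    detail: str  # key pressed, or widget clicked


def to_high_level(events):
    """Collapse runs of consecutive key presses into one 'typed' event.

    Other events pass through unchanged. Returns (start, end, kind, detail) tuples.
    """
    out = []
    for e in events:
        if e.kind == "key" and out and out[-1][2] == "typed":
            # Extend the current typing run: keep its start, move its end.
            start, _, _, text = out[-1]
            out[-1] = (start, e.t, "typed", text + e.detail)
        elif e.kind == "key":
            out.append((e.t, e.t, "typed", e.detail))
        else:
            out.append((e.t, e.t, e.kind, e.detail))
    return out


raw = [
    Event(12.0, "click", "New Appointment"),
    Event(13.1, "key", "S"), Event(13.3, "key", "m"), Event(13.4, "key", "i"),
    Event(13.6, "key", "t"), Event(13.7, "key", "h"),
    Event(15.2, "click", "Save"),
]
for row in to_high_level(raw):
    print(row)
```

Seven raw events become three: a click, one "typed" event for "Smith" spanning 13.1 to 13.7 seconds, and a second click.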
Assessing Usability
• Usability Specifications
  – Quantitative usability goals, used as a guide for knowing when the interface is “good enough”
  – Should be established as early as possible in the development process
Measurement Process
• “If you can’t measure it, you can’t manage it”
• Need to keep gathering data on each iterative refinement
What to Measure?
• Usability attributes
  – Initial performance
  – Long-term performance
  – Learnability
  – Retainability
  – Advanced feature usage
  – First impression
  – Long-term user satisfaction
How to Measure?
• Benchmark Task
  – Specific, clearly stated task for users to carry out
• Example: Calendar manager
  – “Schedule an appointment with Prof. Smith for next Thursday at 3pm.”
• Users perform these under a variety of conditions and you measure performance
Assessment Technique
Usability attribute | Measuring instrument | Value to be measured | Current level | Worst acceptable level | Planned target level | Best possible level | Observed results
Initial performance | Benchmark task | Length of time to successfully add appointment on first trial | 15 secs (manual) | 30 secs | 20 secs | 10 secs |
First impression | Questionnaire | Rating on a -2..2 scale | ?? | 0 | 0.75 | 1.5 |
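Once observed results come in, each specification row can be checked mechanically. This is a sketch under stated assumptions: the observed values in the usage below are hypothetical (the table leaves that column blank), and the function flips signs for "lower is better" measures such as task time so that one set of comparisons works for both rows.

```python
def assess(observed, worst, target, best, higher_is_better):
    """Place an observed value against one row of a usability specification."""
    # Flip signs so that "larger is better" holds for every row.
    sign = 1 if higher_is_better else -1
    o, w, t, b = (sign * x for x in (observed, worst, target, best))
    if o < w:
        return "below worst acceptable level"
    if o < t:
        return "acceptable, but short of planned target"
    if o < b:
        return "meets planned target"
    return "at or beyond best possible level"


# Initial performance: seconds to add an appointment (lower is better).
print(assess(18, 30, 20, 10, higher_is_better=False))
# First impression: questionnaire rating on -2..2 (higher is better).
print(assess(0.5, 0, 0.75, 1.5, higher_is_better=True))
```

With these made-up observations, 18 seconds meets the planned target of 20, while a first-impression rating of 0.5 is acceptable but short of the planned 0.75.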
Summary
• Measuring Instrument
  – Questionnaires
  – Benchmark tasks
Summary
• Value to be measured
  – Time to complete task
  – Number or percentage of errors
  – Percent of task completed in given time
  – Ratio of successes to failures
  – Number of commands used
  – Frequency of help usage
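Several of these values can be computed directly from per-trial records of a benchmark task. The `Trial` fields below are an assumed recording format, not one the slides prescribe, and the numbers are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    seconds: float      # time spent on the benchmark task
    succeeded: bool     # was the task completed correctly?
    errors: int         # errors observed during the trial
    commands: int       # number of commands used
    help_requests: int  # times help was consulted


def summarize(trials):
    """Compute a few usability-specification values from a list of trials."""
    done = [t for t in trials if t.succeeded]
    failures = len(trials) - len(done)
    return {
        "mean_time_to_complete": mean(t.seconds for t in done) if done else None,
        "errors_per_trial": mean(t.errors for t in trials),
        "success_failure_ratio": len(done) / failures if failures else float("inf"),
        "mean_commands": mean(t.commands for t in trials),
        "help_uses_per_trial": mean(t.help_requests for t in trials),
    }


trials = [
    Trial(18.0, True, 0, 6, 0),
    Trial(26.0, True, 2, 9, 1),
    Trial(40.0, False, 3, 12, 2),
]
print(summarize(trials))
```

Only successful trials count toward time to complete, so the failed 40-second trial shows up in the success-to-failure ratio (2 to 1) instead of inflating the mean time (22 seconds).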
Summary
• Target level
  – Often established by comparison with a competing system or a non-computer-based task
Ethics
• Testing can be arduous
• Each participant should consent to be in the experiment (informally or formally)
  – Know what the experiment involves, what to expect, what the potential risks are
• Must be able to stop without danger or penalty
• All participants must be treated with respect
Nov 2, 2009 IAT 334 42
Consent
• Why important?
  – People can be sensitive about this process and issues
  – Errors will likely be made; participant may feel inadequate
  – May be mentally or physically strenuous
• What are the potential risks (there are always risks)?
  – Examples?
• “Vulnerable” populations need special care & consideration (& IRB review)
  – Children; disabled; pregnant; students (why?)
Before Study
• Be well prepared so the participant’s time is not wasted
• Make sure they know you are testing software, not them
  – (Usability testing, not User testing)
• Maintain privacy
• Explain procedures without compromising results
• Can quit anytime
• Administer signed consent form
During Study
• Make sure participant is comfortable
• Session should not be too long
• Maintain relaxed atmosphere
• Never indicate displeasure or anger
After Study
• State how the session will help you improve the system
• Show participant how to perform failed tasks
• Don’t compromise privacy (never identify people; only show videos with explicit permission)
• Data to be stored anonymously and securely, and/or destroyed
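One common way to avoid storing names alongside study data is to replace them with random participant IDs and keep the name-to-ID key separately, or destroy it. This is a minimal sketch with hypothetical field names; pseudonymization alone is not full anonymization, since other fields (or faces and voices in video) can still identify someone.

```python
import secrets


def pseudonymize(records, name_field="name"):
    """Replace each participant's name with a random ID such as 'P-3f9a1c'.

    Returns (cleaned_records, key). The key maps names to IDs; store it
    separately and securely, or destroy it once it is no longer needed.
    """
    key = {}
    cleaned = []
    for rec in records:
        name = rec[name_field]
        if name not in key:
            pid = f"P-{secrets.token_hex(3)}"
            while pid in key.values():  # avoid the rare random collision
                pid = f"P-{secrets.token_hex(3)}"
            key[name] = pid
        # Copy every field except the name, then attach the random ID.
        row = {k: v for k, v in rec.items() if k != name_field}
        row["participant"] = key[name]
        cleaned.append(row)
    return cleaned, key


records = [
    {"name": "Alice", "task_secs": 18},  # hypothetical participants
    {"name": "Bob", "task_secs": 26},
    {"name": "Alice", "task_secs": 15},
]
cleaned, key = pseudonymize(records)
print(cleaned)
```

The same person keeps the same ID across sessions, so their trials can still be analyzed together without their name appearing in the data.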
One Model