HCI460: Week 10 Lecture
November 11, 2009


Project 3: Questions?

Recap: Step-by-step stats for A-B comparisons
– For within-subjects design
– For between-subjects design

Summative testing
– Case study & exercise: Pick best-in-class MP3 player

Other usability evaluation methods
– Longitudinal
– Focus groups
– Remote unmoderated

Evaluating expert user interfaces

Review of Project 2

Grades

Outline


Project 3


Deliverable:
– Final Report only

Due date:
– In-class students: Nov 13th, 11:59pm (Friday)
– Distance learning: Nov 17th, 11:59pm (Tuesday)

Report for Project 3


Executive Summary
– Study background: what, when, how, etc.
– Summary of findings

Introduction (incl. objectives)

Method
– Participants
– Materials (or Stimuli)
– Procedure

Findings
– Quantitative results (with statistics) + “story” (i.e., interpretation of what the results mean)

Recommendations (if applicable)

Report Sections


This is a good time to ask.

Questions about Project 3?


Recap: Step-by-Step Stats for A-B Comparisons


Summary

For time on task, ratings, and number of errors data:
– If you have a between-subjects design, use an unpaired t-test (aka “independent samples t-test”)
  • DF (degrees of freedom) = number of participants in the two groups combined, minus 2
– If you have a within-subjects design, use a paired t-test (aka “dependent samples t-test” or “repeated measures t-test”)
  • DF (degrees of freedom) = number of participants, minus 1

For task completion (success/fail) data:
– Use Fisher’s Exact Test (a code sketch of all three tests follows)
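For anyone running these in code instead of the Stats Usability Pak spreadsheet, here is a minimal sketch in Python with scipy.stats; all data values are invented for illustration, not from the study:

    # Minimal sketch of the three tests above using scipy.stats.
    from scipy import stats

    # Time on task (seconds) for versions A and B (invented data).
    time_a = [31, 28, 40, 35, 29]
    time_b = [24, 22, 30, 27, 21]

    # Between-subjects design: two independent groups -> unpaired t-test.
    # DF = n_A + n_B - 2
    t_unpaired, p_unpaired = stats.ttest_ind(time_a, time_b)

    # Within-subjects design: the same participants saw both versions
    # -> paired t-test. DF = number of participants - 1
    t_paired, p_paired = stats.ttest_rel(time_a, time_b)

    # Task completion (success/fail): 2x2 counts -> Fisher's Exact Test.
    table = [[20, 9],   # version A: 20 succeeded, 9 failed (invented)
             [25, 4]]   # version B: 25 succeeded, 4 failed (invented)
    odds_ratio, p_fisher = stats.fisher_exact(table)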


Recap: Step-by-Step Stats for A-B Comparisons (Within-Subjects Design)


New vs. old package inserts

Search tasks, e.g.:
– Question: How many drops of control are used in the sample cup?
– Correct answer: 3 drops

Study design
– Within-subjects (all participants saw both versions)

Measures:
– Task completion (success/fail)
– Time on task (per task)
– Ease of use ratings (per task)

Within-Subjects Design: Study Background

[Images: the old insert (left) and the new insert (right)]


Within-Subjects Design: The Data

These findings now need to make it to your report.

What are your next steps?


Within-Subjects Design: Next Steps

1. ?

2. ?

3. ?

4. ?

5. ?

6. ?

7. ?

8. ?

9. ?

10. ?

Form groups of 2 – 4 people

Write down your steps (plan of action) – “What are you going to do with these findings before you hand them in to the stakeholders (in a report)?”

No need to compute anything at this point

You have 5 - 7 minutes



Within-Subjects Design: Step 1

1. Calculate the averages to get a sense for what the data is telling you.
– But…


Within-Subjects Design: Step 2

2. Delete time on task and ease of use data for participants who failed.
– If within-subjects design, delete all data for participants who failed A and/or B (at least one).
– [If between-subjects design, keep all correct data.]


Within-Subjects Design: Steps 3 and 4

3. Open Stats Usability Pak spreadsheet (or another statistical package).

4. Decide on the alpha level (before running any statistics).
– .01?
– .05? (most common)
– .1?


Within-Subjects Design: Step 5

5. Select an appropriate test for task completion (success rate).
– Run Fisher’s Exact Test.
– Write down the important number(s).
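A minimal sketch of this step with scipy.stats.fisher_exact, assuming hypothetical counts of 20/29 and 25/29 picked only to match the 69% and 86% rates shown on the slide:

    from scipy.stats import fisher_exact

    #            success  fail
    table = [[20, 9],   # old insert: 20/29 ~ 69% completion (hypothetical counts)
             [25, 4]]   # new insert: 25/29 ~ 86% completion (hypothetical counts)
    odds_ratio, p_value = fisher_exact(table)
    print(f"Fisher's Exact Test: p = {p_value:.3f}")  # compare against alpha = .05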


Within-Subjects Design: Step 6

6. Select an appropriate test for the time on task measure.
– Run a paired t-test.
– Write down the important number(s).

Why is the sample size only 17?

Why do we need this number?

Degrees of freedom!
DF for within-subjects design = n – 1
DF = 17 – 1 = 16
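A sketch of this step with scipy.stats.ttest_rel; the time values below are invented, but there are 17 per condition so that DF = 17 – 1 = 16 as above:

    from scipy.stats import ttest_rel

    # One time-on-task value per participant, per insert version (invented data).
    old_times = [33, 41, 25, 38, 29, 47, 31, 36, 22, 40, 35, 28, 44, 30, 26, 39, 32]
    new_times = [24, 30, 19, 27, 21, 35, 22, 26, 17, 29, 25, 20, 33, 22, 18, 28, 23]

    t_stat, p_value = ttest_rel(old_times, new_times)
    df = len(old_times) - 1  # paired t-test: DF = n - 1 = 16
    print(f"t({df}) = {t_stat:.2f}, p = {p_value:.3f}")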


Within-Subjects Design: Step 7

7. Select an appropriate test for the ease of use ratings.
– Run a paired t-test.
– Write down the important number(s).


DF = 16


Within-Subjects Design: Step 8

8. Describe your findings for task completion (success/fail).
– We conducted Fisher’s Exact Test to compare the task completion rate with the old insert (69%) to the task completion rate with the new insert (86%) and found no statistically significant difference at alpha level .05.

But what if there was a difference?
• We conducted Fisher’s Exact Test to compare the task completion rate with the old insert (69%) to the task completion rate with the new insert (93%). The completion rates were significantly different (p < .05), such that more participants successfully completed the task when using the new insert than when using the old insert.


Within-Subjects Design: Step 9

9. Describe your findings for time on task.
– We conducted a paired t-test to compare the time on task between the two insert versions. We found a significant difference (t(16) = 2.19, p < .05), such that participants took longer to complete the task when using the old insert (M = 33.1 s, SD = 12.8 s) than when using the new insert (M = 23.6 s, SD = 12.5 s).


Within-Subjects Design: Step 10

10. Describe your findings for ease of use ratings.
– We also conducted a paired t-test to compare the ease of use ratings participants assigned to the two insert versions. There was no significant difference between the ease of use ratings for the old insert (M = 5.6, SD = 1.2) and the new insert (M = 6.0, SD = 0.9) at alpha level .05.


Recap: Step-by-Step Stats for A-B Comparisons (Between-Subjects Design)


Between-Subjects Design: The Data

Now let’s pretend that we had a between-subjects design:
– One group of participants used the old insert and another group used the new insert:


Between-Subjects Design: Step 1

1. Calculate the averages to get a sense for what the data is telling you.
– But…


Between-Subjects Design: Step 2

2. Delete time on task and ease of use data for participants who failed.


Between-Subjects Design: Steps 3 and 4

3. Open Stats Usability Pak spreadsheet (or another statistical package).

4. Decide on the alpha level (before running any statistics).
– .01?
– .05? (most common)
– .1?


Between-Subjects Design: Step 5

5. Select an appropriate test for task completion (success rate).
– Run Fisher’s Exact Test.
– Write down the important number(s).


Between-Subjects Design: Step 6

6. Select an appropriate test for the time on task measure.
– Run an independent samples t-test.
– Write down the important number(s).


DF = (20 + 25) – 2 = 43
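The same comparison done between-subjects with scipy.stats.ttest_ind; the data here are synthesized with numpy to mirror the slide's group sizes (20 and 25), which is what yields DF = 43:

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    # Invented data sized like the example: 20 old-insert users, 25 new-insert users.
    old_times = rng.normal(loc=33.5, scale=12.6, size=20)
    new_times = rng.normal(loc=24.3, scale=11.7, size=25)

    t_stat, p_value = ttest_ind(old_times, new_times)
    df = len(old_times) + len(new_times) - 2  # (20 + 25) - 2 = 43
    print(f"t({df}) = {t_stat:.2f}, p = {p_value:.3f}")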


Between-Subjects Design: Step 7


7. Select an appropriate test for the ease of use ratings.
– Run an independent samples t-test.
– Write down the important number(s).


DF = (20 + 25) – 2 = 43


Between-Subjects Design: Step 8

8. Describe your findings for task completion (success/fail).
– We conducted Fisher’s Exact Test to compare the task completion rate with the old insert (69%) to the task completion rate with the new insert (86%) and found no statistically significant difference at alpha level .05.

But what if there was a difference?
• We conducted Fisher’s Exact Test to compare the task completion rate with the old insert (69%) to the task completion rate with the new insert (93%). The completion rates were significantly different (p < .05), such that more participants successfully completed the task when using the new insert than when using the old insert.


Between-Subjects Design: Step 9

9. Describe your findings for time on task.
– We conducted an independent samples t-test to compare the time on task between the two insert versions. We found a significant difference (t(43) = -2.50, p < .05), such that participants who used the old insert took longer to complete the task (M = 33.5 s, SD = 12.6 s) than participants who used the new insert (M = 24.3 s, SD = 11.7 s).


Between-Subjects Design: Step 10

10. Describe your findings for ease of use ratings.
– We also conducted an independent samples t-test to compare the ease of use ratings participants assigned to the two insert versions. There was no significant difference between the ease of use ratings for the old insert (M = 5.6, SD = 1.1) and the new insert (M = 6.0, SD = 0.8) at alpha level .05.


Case Study: Pick Best-In-Class MP3 Player


Pick a Device, Any Device


Understand the user experience related to popular MP3 devices

Identify usability issues

Recommend possible solutions to improve the UI

Any concerns?

Seemed reasonable to us.

Research Objectives


What should we do?

What are we measuring?

Who are the participants?

How long is each session?

How many participants?

What Makes Sense?


What should we do?
– Formative research, but on several competitive products
– Iteratively usability test a UI

What are we measuring?
– Success/fail, ease of use, usability issues, etc.

Who are the participants?
– Depends on market demographics

How long is each session?
– Depends on what we test

How many participants?
– Probably small samples, because this is formative

Our Thoughts…


On market at the time…

Five MP3 Players


21 Tasks of Interest were selected
– High Frequency of Use (“Play Song”)
– Priority (“Create Playlist”)

5 User Interfaces
– Four Competitors
– One Client Design (Echo)

Stakeholder Had More Information…

Alpha Bravo Charlie Delta Echo


Task x Device Matrix

What do you make of this?


Ensure that the new design will be best-of-breed relative to the competition

Oh, we failed to mention:
– This is extremely high profile
– Data will drive strategy for entire organization and all products
– Report will go directly to C-level executives…

And oh, did I mention that we’re gonna need the results to be statistically significant…?

During Kickoff Meeting, More Was Revealed…


21 Tasks x 5 Designs = 105 Combinations to test

Can we run a within-subjects design methodology?
– Why or why not?
  • Learning
  • Fatigue

Can we run a between-subjects design methodology?

What Concerns You?


Found out that not all tasks were possible on each device!!!

When We Looked Deeper…


They analyzed the task flows and wanted to add another variant UI

Goal Is Best of Breed, So One More Change


What Else Can I Say?


Our Approach

Now, we did convince the client that:
– UC would design the study such that it would be sensitive enough to detect statistical significance if it did indeed exist

Thus, there would be NO a priori assurances of finding significant differences!

We were charged with:
– Research activities
– Methodology that would provide data to justify design direction

A device was going to be built…


Core Elements to Test

Core elements to test
– Access points (navigating to a feature is easy)
– Feature task flows (completing task may be hard)
– Design look and feel
– Iconography
– Verbiage


Research Program Involved

Testing functional task flows

Testing verbiage (navigation and features)

Testing access

Testing iconography

Testing graphic design treatments

Re-test of updated user interface (all elements)


Focusing on Testing Functional Task Flows

Thoughts on considerations for 21 Tasks x 6 Designs?

Areas for bias?
– Order
  • Task
  • Device
  • Combination
– Device hardware


Minimize Bias

Realistically, participants could
– Only interact with 2-3 designs effectively (not 5-7)
– Complete roughly 6 tasks each (our assumption)

Create prototypes on a computer
– Level the “playing field” to core task flow elements

Usability testing / quantitative data collection
– Recruited target demographic (incl. high schools)
– Needed simultaneous test teams
– In the end, we really needed to know the “story”


Approach: Create Blocks

Sheer size of all possible combinations required blocking into groups
– Designs x Tasks combinations

Within each block…

Order of task presentation
– Tasks were systematically counterbalanced to reduce learning and order effects

Order of device presentation
– For each participant, devices were randomized within each task to reduce learning and order effects (a sketch of this follows)
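The deck does not show how the counterbalancing and randomization were implemented; the following is a hypothetical Python sketch of the idea, with placeholder task IDs and device names:

    import random

    tasks = ["4", "6", "11", "15", "17"]     # one block's tasks (placeholder IDs)
    devices = ["Alpha", "Bravo", "Foxtrot"]  # this participant's three devices

    def schedule_for(participant_index):
        # Rotate the task list so each participant starts at a different task
        # (one simple counterbalancing scheme; the study's actual scheme may differ).
        offset = participant_index % len(tasks)
        ordered_tasks = tasks[offset:] + tasks[:offset]
        plan = []
        for task in ordered_tasks:
            order = devices[:]
            random.shuffle(order)  # device order re-randomized within each task
            plan.extend((task, device) for device in order)
        return plan

Calling schedule_for(0) would list that participant's 15 task x device trials (5 tasks x 3 devices) in presentation order.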


Use of Blocks

Participants assigned to blocks
– Individual participant device x task combinations were
  • Randomized
  • Assigned to participants to reduce learning effects
– Constraint: Familiarity biases were avoided
  • In this sanitized example, iPod owners did not test their iPods

Blocks of four to six tasks were formed to roughly control for total number of steps to complete


Experimental Design

Design
– Each participant received five or six (of the 21) tasks by three (of the six) devices
  • Six devices (A/B/C/D/E/F) taken three at a time yields a total of 20 unique device combinations per task (e.g., ABC, ABD, ABE … DEF)
  • A participant was matched to each unique three-device combination
  • Hence, to complete a full block, 20 participants were needed (see the sketch below)

Each device by task cell was represented with at least 10 participants (N = 80 to have four blocks of 20)
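The 20-combination figure is simply 6 choose 3; a short itertools sketch both verifies the count and enumerates one block's participant-to-combination assignments (device names follow the deck's aliases):

    from itertools import combinations

    devices = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"]

    # Six devices taken three at a time: C(6, 3) = 20 unique combinations,
    # i.e., one per participant in a full block of 20.
    combos = list(combinations(devices, 3))
    assert len(combos) == 20

    for participant, combo in enumerate(combos, start=1):
        print(participant, combo)  # e.g., 1 ('Alpha', 'Bravo', 'Charlie')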


Example of Participants #1 and #2

Participant #1: Julian (16-24 years)    Participant #2: Tracy O. (25-44 years)
task 11, device Bravo                   task 16, device Bravo
task 11, device Foxtrot                 task 16, device Delta
(n/a)                                   task 16, device Charlie
task 6, device Alpha                    task 13, device Charlie
task 6, device Foxtrot                  task 13, device Delta
task 6, device Bravo                    task 13, device Bravo
task 4, device Bravo                    task 8, device Delta
task 4, device Foxtrot                  task 8, device Bravo
task 4, device Alpha                    task 8, device Charlie
task 15, device Foxtrot                 task 18, device Bravo
task 15, device Bravo                   task 18, device Delta
(n/a)                                   (n/a)
task 17, device Foxtrot                 task 9, device Charlie
task 17, device Bravo                   task 9, device Bravo
task 17, device Alpha                   task 9, device Delta


Measures

Time-on-Task

Efficiency (Deviation from Optimal Path)
– Total screens viewed / optimal path for the task (see the sketch below)
  • More incorrect “steps” increase this metric

Success
– % of participants in each cell (device x task) who successfully completed the task

Preference
– Pair-wise device preferences for a particular task with a magnitude judgment
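A minimal sketch of the efficiency and success measures as defined above; the function names and inputs are illustrative, not the study's actual logging:

    def efficiency(screens_viewed, optimal_path_screens):
        # Deviation from optimal path: 1.0 means the participant took the
        # optimal path; incorrect "steps" push the ratio above 1.0.
        return screens_viewed / optimal_path_screens

    def success_rate(outcomes):
        # % of participants in a device x task cell who completed the task,
        # where outcomes is a list of booleans (True = success).
        return 100 * sum(outcomes) / len(outcomes)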


Task-by-Task Analysis


Need More Than Simply Quant

We anticipated that the results would not be so clear cut as to identify a single design and declare a winner

On a particular task, we might find that Device Foxtrot was #1
– But, what made Device Echo #2?

We need to know the “why”
– Partitioned participants (N = 16) for qualitative UT
– Other participants (N = 80) were sufficient for quantitative


Qualitative Data: User Comments


Interesting Points

Start with the high frequency / high priority tasks…
– Why did the Foxtrot design win?
– Why did the Alpha design lose?
– Compare quantitative with qualitative

Complex tasks
– The fastest time was not the best
– More clicks, fewer errors, high satisfaction

Sometimes winners would emerge for different reasons…
– How do you weigh different UI conventions?


Lessons Learned

“Know the story”
– Benefit of qualitative data
– Absolute “must have”
– Asked for N = 100, and pushed some to qualitative

Learning
– Counterbalancing was sufficient
– But, these are not walk-up-and-use devices…

Avoid “Frankenstein” design
– Do not simply pick the winning task flow and implement
– Consistency matters!


Other Usability Evaluation Methods


FGs are structured group interviews that quickly and inexpensively reveal a target audience’s desires, experiences, reactions, attitudes...
– Mostly used in market research.
– Capture opinions rather than behavior.

However, FGs can sometimes be used for usability evaluation.
– But the procedure and questions should be modified.

Focus Groups (FGs)


If it is a new product:
– Participants interact with a product and then discuss their difficulties as a group.
  • You can have a few simultaneous one-on-one sessions followed by a FG.
  • You can also have activities during FGs.

Benefits:
– Participants are not just watching the moderator use the product (as they do in traditional FGs) but get to experience it first hand.
– Group discussion can provide more feedback and richer feedback in less time than several individual UTs.

Focus Groups


If it is an existing product:
– Participants who have experience interacting with the product are invited to discuss their difficulties in a FG.
  • E.g., pharmacists come to discuss their issues with particular types of blister packs. They also get to interact with the packs (and a few prototype packs) during the session.

Benefits:
– Group discussion can provide more feedback and richer feedback in less time than several individual UTs.

Focus Groups


Usability testing assesses walk-up-and-use usability, not actual use
– Assess learning, usage, and satisfaction over time

“Drop and Soak” studies
– Provide device and let soak
– Transition from initial use to familiar/experienced use

Repeat user testing studies
– Bring users back multiple times to assess evolution
– Multiple usability tests with same participants
– Iterations and evolution

User group
– Rinse, wash, repeat

Sidebar: Why longitudinal research is rarely seen

Longitudinal


Gap Between Web Analytics and Surveys

Web Analytics: what customers do

Surveys: who customers are; what customers say

Automated Testing: combines analytics and surveys


Traditional online surveys measure satisfaction.
– Collect quantitative measures.
– Obtain attitudinal scores.

But, there is no connection to the experience.

Interactive surveys (remote unmoderated testing)
– Task-based research using online panels (Keynote, UserZoom, etc.)
– True-intent research using actual site visitors (LEOtrace)

Remote Unmoderated Testing


Task-Based Web Survey


Heatmaps of Clicks


Scorecards

Navigation menu through the modules defined

The homepage highlights the most important information from each module

Different colors illustrate different results

Comparative analysis


Evaluating Expert User Interfaces


The user of the interface is an expert
– User has extensive experience with the application
– User uses the application a lot
  • One could argue MS Word is an example

Exercise
– Break into groups
– Evaluate usability of this expert interface
– Customer relationship management (CRM) system

Not Walk Up and Use


Customer care information
– Details on a single customer for residential service

Exercise: Review This Interface…


As with any customer, knowing the “history” can help
– User clicks Notes on left to open Interaction Notes

Example: History


From Interaction Notes, individual notes can be reviewed

Example: Notes


Call centers
– Approx. 100 calls per day
– Typical environment involves cubicle farms
– Stand and ask questions
– Multi-tasking
– Time pressure
– Possible sales incentives in effect
– Rapid consumption of screens

Does this change anything?!?!

Does the evaluation differ?

Example: Context


Quite a unique user group

Demonstrate over-learned behaviors
– High number of transactions, high volume of calls
– Rote memorization of commands and actions

Emphasis is all about their workflow
– They make transactions quickly across multiple systems
– In most cases, they do not need to look at the entire screen
  • Consider adding a new service: the first few screens may be irrelevant, as users are observed clicking for navigation and not for information
– Introduction of any new system can have grave impacts

Reality:
– Traditional walk-up-and-use methods may be insufficient for this audience and this type of interface

Expert Users


Take Another 5 Minutes…


Findings…?

Main Screen


Findings…?

Interaction Notes


Emphasis on pretty and “easily trainable”
– But, this is not a walk-up-and-use application

Design issues
– Fitts’ Law

Workflow
– More effort to get the same information

What is the impact?
– Will users even click?
– Poor knowledge of user history
  • Impact?

Core Issues


Change can be hard
– Especially if this represents a paradigm shift from accessing features and sections to a single view of the customer
  • It seems logical that a single point of view is better, but this assumes a thorough understanding of the workflow across a day
  • However, theoretical savings on a spreadsheet do not often equate to benefits in the real world
– Users multi-task

Adoption and usage are key drivers…
– How should you test these interfaces?!?!

Introduction of a New Application


Usability tests uncovered issues on a page-by-page and task-level basis
– Areas such as time, content comprehension, navigation, layout, labeling, and functionality
– UT is exceptional at targeting page-level details

However, usage adoption requires more than just page-level data
– How this system fits into workflow, or “I won’t use it”
– UT participants were so task-focused
– No mention of high-level impacts to their job and daily tasks

Focus groups for interactive applications lead to “spoon feeding”

Try two-person UTs
– Workflow becomes an issue as one watches the other work…

Discussion of Methods


Review of Project 2


Executive summary
– Context + main themes / key findings
– Don’t forget to mention the positives
– Issues: too much detail or too high-level

Objectives, Methods, Findings, Recommendations
– “Dangling objectives” (no findings address them)
– “Dangling methods” (no corresponding findings or objectives)
– “Dangling findings” (no corresponding objectives or method)

Describing findings
– Description of a participant difficulty/error (task/user-oriented).
– Description of the source of the difficulty (product-oriented).
  • What in the interface caused the difficulty?

Themes


Within-subjects design?

“Participants” vs. “users”

“Formative study”

Visual references
– Video clips

Themes


Scores

[Chart: score distribution across grades A, B, B-, C+, C, D, F]


Grades


15% Project 1: Expert evaluation

25% Project 2: Formative usability study

20% Project 3: Quantitative comparison study

10% Take-home midterm quiz

20% Final exam

10% Individual contribution to projects

Grading


Project 1

[Chart: Project 1 grade distribution across A, A-, B, B-, C+, C, D, F]


Project 2

[Chart: Project 2 grade distribution across A, B, B-, C+, C, D, F]


A: 10 and 9.5
A-: 9
B+: 8.5
B: 8
B-: 7.5
C+: 7
C: 6.5
C-: 6
D+: 5.5
D: 5

Midterm