1 CENG 394 Introduction to Human-Computer Interaction CENG 394 HCI Evaluation.
User Study Evaluation Human-Computer Interaction.
![Page 1: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/1.jpg)
User Study Evaluation
Human-Computer Interaction
![Page 2: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/2.jpg)
Hypothesis
• A statement of prediction
• Describes what you expect will happen in your study
• Alternative hypothesis (H1) – your prediction, i.e. a claim of difference in the population
  • e.g. Participants will commit more errors with interface A than with interface B
• Null hypothesis (H0) – no difference or no effect
  • e.g. Participants will commit the same number of errors with interface A and interface B, or (because this H1 is directional) participants will commit more errors with interface B than with interface A
![Page 3: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/3.jpg)
Hypothesis – one or two tailed?
• Alternative hypothesis
  • One-tailed: Participants will commit more errors with interface A than with interface B (i.e. directional)
  • Two-tailed: There will be a significant difference between the number of errors participants commit with interface A and with interface B
    • but I don’t know if there will be more or fewer (i.e. non-directional)
• Can’t prove the alternative hypothesis, can only reject the null hypothesis
• If the data support your prediction, reject the null hypothesis
• Not rejecting the null hypothesis ≠ accepting it
![Page 4: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/4.jpg)
Metrics
• What you are measuring
• Some types of metrics
  • Objective – facts of an event
    • Time to complete task (continuous)
    • Errors (discrete, i.e. distinct and separate, can be counted)
  • Subjective – a person’s opinion
    • Satisfaction
![Page 5: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/5.jpg)
Metrics
• Types of metrics
  • Objective – facts of an event
  • Subjective – a person’s opinion
  • *Both* are important
• How to measure
  • Instrumentation – record data within your system
  • Questionnaires / Surveys
    • Scales
    • Free-response
• Let’s discuss appropriateness of each
• Let’s look at a very popular survey (SUS)
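As a rough sketch of how a scale questionnaire becomes a single number, the snippet below scores the System Usability Scale (SUS) with its standard rule: ten items rated 1 to 5, odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name `sus_score` and the sample ratings are illustrative assumptions, not material from the slides.

```python
def sus_score(ratings):
    """Score one participant's SUS questionnaire (ten ratings, each 1-5)."""
    if len(ratings) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5            # rescale the 0-40 sum to 0-100


# Hypothetical responses from one participant
example = [4, 2, 5, 1, 4, 2, 5, 1, 4, 2]
print(sus_score(example))
```

Note that the even-numbered items are reverse-scored, which is why a neutral participant (all 3s) ends up at the midpoint of the scale.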
![Page 6: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/6.jpg)
Analysis
• Most of what we do involves:
  • Normally distributed results
  • Independent testing
  • Homogeneous population
• Recall, we test the hypothesis by trying to reject the null hypothesis
![Page 7: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/7.jpg)
Analysis
• 3 main steps for analysis
  • Data Preparation: Cleaning and organizing the data for analysis
    • Checking the data for accuracy
    • Transforming data (e.g. reverse coding survey data)
  • Descriptive Statistics: Describing the data
    • Provide simple summaries about the sample and the measures
    • Simply describing what is, what the data shows
  • Inferential Statistics: Testing Hypotheses and Models
    • Try to infer from the sample data what the population thinks
    • Make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance
![Page 8: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/8.jpg)
Data preparation
• Checking data for accuracy
  • Are the responses legible/readable?
  • Are all important questions answered?
  • Are the responses complete?
  • Is all relevant contextual information included (e.g., date, time, place, researcher)?
![Page 9: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/9.jpg)
Data preparation
• Data transformations
  • Missing values
    • Depending on the program, you may need to designate specific values to represent missing values, e.g. -99
  • Scale totals
    • Add or average across individual items
  • Item reversals
    • Likert scale – sometimes the ratings for items need to be reversed
    • 1 (strongly disagree) – 5 (strongly agree)
      • “I generally feel good about myself.”
      • “Sometimes I feel like I'm not worth much as a person.”
      • What does a 5 mean in each case?
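The transformations above can be sketched in a few lines. This is only an illustration under stated assumptions: the `-99` sentinel follows the example on the slide, while the helper names `reverse_code` and `scale_average`, the choice to skip missing answers when averaging, and the sample ratings are hypothetical.

```python
MISSING = -99  # sentinel for a missing answer (assumed convention)


def reverse_code(rating, low=1, high=5):
    """Flip a Likert rating so that a high number always points the same way."""
    return low + high - rating


def scale_average(ratings, reversed_items=(), missing=MISSING):
    """Average one scale, reversing flagged items and skipping missing ones."""
    values = []
    for index, rating in enumerate(ratings):
        if rating == missing:
            continue                  # leave missing answers out of the average
        if index in reversed_items:
            rating = reverse_code(rating)
        values.append(rating)
    if not values:
        return None                   # every item was missing
    return sum(values) / len(values)


# Hypothetical two-item self-esteem scale: item 0 is positively worded,
# item 1 ("...not worth much as a person") is negatively worded.
print(scale_average([5, 1], reversed_items={1}))
print(scale_average([4, MISSING], reversed_items={1}))
```

After reversal, a 5 on the first item and a 1 on the second both indicate high self-esteem, so they can be averaged meaningfully.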
![Page 10: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/10.jpg)
Descriptive statistics
• Simple summaries of sample and measures, i.e. data
• Describing what is or what the data shows
• Central tendency – estimate of the “center” of a distribution of values
  • Mean – average across a set of values
    • 15 + 15 + 18 + 25 + 33 = 106
    • µ = 106/5 = 21.2
  • Median – score found in the middle of a sorted set of values
    • 15, 15, 18, 25, 33 → 18
  • Mode – most frequently occurring value
    • 15, 15, 18, 25, 33 → 15
• Describe the data with a number and a graph
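The three measures of central tendency can be checked with Python's standard `statistics` module. The values are the ones from the slide; the variable names are arbitrary.

```python
import statistics

scores = [15, 15, 18, 25, 33]        # the values from the slide

mean = statistics.mean(scores)       # sum of the values divided by their count
median = statistics.median(scores)   # middle value of the sorted list
mode = statistics.mode(scores)       # most frequently occurring value

print(mean, median, mode)
```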
![Page 11: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/11.jpg)
Inferential statistics
• Try to reach conclusions that go beyond the immediate data – draw inferences
  • e.g. want to compare the average performance of 2 groups to see if there’s a difference
• t-test: statistical test used to determine whether two observed means are statistically different
![Page 12: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/12.jpg)
t-test
• What does it mean to say that the averages for two groups are statistically different?
![Page 13: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/13.jpg)
t-test
• Variability is the noise that may make it harder to see the group difference
• Variance: measure of variability around the mean
• Standard deviation: square root of the variance
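A minimal sketch of these two definitions, reusing the five example scores from the descriptive statistics slide. It assumes the sample variance (dividing by n − 1), which is the form normally used in a t-test; the slide itself does not say which form it intends.

```python
import math
import statistics

scores = [15, 15, 18, 25, 33]

mean = sum(scores) / len(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]

# Sample variance: average squared distance from the mean, using n - 1
variance = sum(squared_deviations) / (len(scores) - 1)

# Standard deviation: square root of the variance, in the original units
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
```

The hand computation should agree with `statistics.variance(scores)` and `statistics.stdev(scores)`, which use the same n − 1 convention.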
![Page 14: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/14.jpg)
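t-test
• (rule of thumb) Good values of t: |t| > 1.96, i.e. more than 1.96 standard deviations from the mean of the sampling distribution, which corresponds to the .05 level (two-tailed) for large samples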
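The formula shown on the original slide is not preserved in this transcript. One common way to write the t statistic for two groups A and B, as the difference between the means (signal) divided by the variability of that difference (noise), is given below; the notation ($\bar{x}$ for a group mean, $s^2$ for a group variance, $n$ for a group size) is an assumption.

$$
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
$$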
![Page 15: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/15.jpg)
t-test
• Once computed, look up the t-value in a table of critical values to see whether the ratio is large enough to say that the difference between the groups is not likely to have been a chance finding.
• To test the significance, you need to set a risk level (called the alpha level). The accepted standard is an alpha level of .05.
  • 5 times out of 100 you would find a statistically significant difference between the means even if there was none (i.e., by "chance").
• Degrees of freedom (df). For the independent-samples t-test, df = the number of persons in both groups minus 2.
• Given the alpha level and the df, look up the critical value and check whether the computed t-value is large enough to be significant.
• If yes, conclude that the means of the 2 groups are different (even given the variability) and reject the null hypothesis.
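The procedure above can be sketched end to end. This is an illustration, not the course's own method: it assumes the pooled-variance form of the independent-samples t-test (the form that goes with df = n1 + n2 − 2), the error counts are made up, and `CRITICAL_T_05` holds only a handful of two-tailed critical values copied from a standard t table.

```python
import math
import statistics

# Two-tailed critical t values at alpha = .05 for a few df (standard t table)
CRITICAL_T_05 = {8: 2.306, 10: 2.228, 18: 2.101, 20: 2.086, 30: 2.042}


def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df


# Hypothetical error counts for five participants per interface
errors_a = [6, 7, 5, 8, 9]
errors_b = [3, 4, 2, 5, 6]

t, df = independent_t(errors_a, errors_b)
significant = abs(t) > CRITICAL_T_05[df]   # compare with the looked-up critical value
print(f"t = {t:.2f}, df = {df}, reject H0: {significant}")
```

In practice a statistics package would report an exact p-value instead of relying on a lookup table; the table is used here only to mirror the steps on the slide.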
![Page 16: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/16.jpg)
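α and p values
• α value – the probability of making a Type I error (rejecting the null hypothesis when it is really true) that you are willing to accept
• p value – the probability of finding an effect at least as large as the one observed if the null hypothesis were true, i.e. by chance alone. The lower the p value, the stronger the evidence against the null hypothesis (the higher the statistical significance)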
![Page 17: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/17.jpg)
Relationship between α and p values
• Once the alpha level has been set, a statistic (like t) is computed.
• Each statistic has an associated probability value called a p-value: the likelihood of a statistic at least as extreme as the observed one occurring by chance, given the sampling distribution under the null hypothesis.
• Alpha sets the standard for how extreme the data must be before we can reject the null hypothesis. The p-value indicates how extreme the data are.
• Compare the p-value with alpha: if p ≤ alpha, the observed data are statistically significantly different from what the null hypothesis predicts
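The comparison between p and alpha can be sketched as a small decision rule. To stay within the standard library, this example assumes the normal approximation for the p-value, which is reasonable only for large samples; for small samples the t distribution with the appropriate df should be used instead. The function names `two_tailed_p` and `decide` are hypothetical.

```python
from statistics import NormalDist

ALPHA = 0.05  # risk level chosen before looking at the data


def two_tailed_p(statistic):
    """Two-tailed p-value for a test statistic, using the normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(statistic)))


def decide(p_value, alpha=ALPHA):
    """Reject H0 only when the p-value is at or below alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for statistic in (1.0, 1.96, 3.0):
    p = two_tailed_p(statistic)
    print(f"statistic = {statistic}, p = {p:.3f}: {decide(p)}")
```

The wording "fail to reject H0" is deliberate: a large p-value does not show that the null hypothesis is true.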
![Page 18: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/18.jpg)
Kinds of t-tests
Formula is slightly different for each:
• Single-sample:
  • tests whether a sample mean is significantly different from a pre-existing value (e.g. norms)
• Paired-samples:
  • compares the means of 2 linked samples, e.g. means obtained in 2 conditions by a single group of participants
• Independent-samples:
  • compares the means of 2 independent groups
• Which test fits your situation?
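To make the difference between the variants concrete, here is a sketch of the single-sample and paired-samples statistics. It relies on the fact that a paired-samples t-test is a single-sample test on the per-participant differences against 0. The function names and the task times are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, reference_value):
    """Return (t, df) for a sample mean against a fixed reference value."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - reference_value) / standard_error
    return t, n - 1


def paired_t(condition_1, condition_2):
    """Return (t, df) for two linked samples via their pairwise differences."""
    differences = [a - b for a, b in zip(condition_1, condition_2)]
    return one_sample_t(differences, 0)


# Hypothetical task times (seconds) for the same four participants on two interfaces
times_a = [30, 28, 35, 31]
times_b = [26, 27, 30, 29]

t, df = paired_t(times_a, times_b)
print(f"paired t = {t:.2f}, df = {df}")
```

A within-subjects study (every participant uses both interfaces) calls for the paired version; a between-subjects study (separate groups per interface) calls for the independent-samples version.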
![Page 19: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/19.jpg)
t and alpha values
![Page 20: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/20.jpg)
Independent samples t-test
• Example: social presence questionnaire
  • “I perceived I was in the presence of a patient in the room with me.”
• http://www.vassarstats.net/tu.html
![Page 21: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/21.jpg)
Correlations
Correlations – relationship between two variables
Pearson’s product-moment correlation coefficient – r
http://bdaugherty.tripod.com/KeySkills/lineGraphs.html
![Page 22: User Study Evaluation Human-Computer Interaction.](https://reader035.fdocuments.in/reader035/viewer/2022062802/56649ea15503460f94ba42b6/html5/thumbnails/22.jpg)
Correlations
Pearson’s product-moment correlation coefficient – r
http://www.socscistatistics.com/tests/pearson/Default2.aspx
http://en.wikipedia.org/wiki/Correlation_and_dependence
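As a sketch of what the coefficient measures, Pearson's r can be computed directly from its definition: the co-variation of the two variables divided by the product of their individual spreads. The function name `pearson_r` and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data: hours of practice vs. errors made on a task
practice_hours = [1, 2, 3, 4, 5]
errors = [9, 7, 6, 4, 2]
print(round(pearson_r(practice_hours, errors), 3))
```

r ranges from −1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship); here more practice goes with fewer errors, so r is strongly negative. The function is undefined when either variable is constant, since its spread is then zero.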