Evaluation
Eyal Ophir, CS 376, 4/28/09
Readings
• Methodology Matters (McGrath, 1994)
• Practical Guide to Controlled Experiments on the Web (Kohavi et al., 2007)
Methodology Matters
Methods for Research in the Behavioral and Social Sciences
• Different methods have strengths and weaknesses
• Tradeoff between generalizability, precision, and realism
• Credibility requires consistency and convergence across methods
Study Design
• Find baserates, correlations, or differences
• Randomization of selection, assignment to conditions
• Statistical significance
• Validity (internal, statistical, construct, external)
Measures
• Self-report
• Trace measures
• Observation (by a visible or hidden observer)
• Archival records (public or private)
Case Study: Multitasking UI
Users play two simultaneous instantiations of a game
Does making the two instantiations visually different make it easier to switch back and forth?
Case Study
• Tradeoffs: Generalizability, Precision, Realism
• Design: baserates, correlations, differences
• Random selection, assignment
• Validity: internal, statistical, construct, external
• Measures: self-report, trace measures, observation, archival records
• Manipulation: selection, intervention, induction
Web Experiments
Hypothesis testing and sample size
• Confidence, power
• Reducing the standard error
  • Sufficiently large sample size
  • OEC (Overall Evaluation Criterion) with inherently low variability
  • Reduce variability by excluding irrelevant cases
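To make the sample-size point concrete, here is a minimal sketch of a common rule of thumb, n = 16σ²/Δ² users per variant (roughly 95% confidence and 80% power), which also appears in the Kohavi et al. reading. The function name and the example numbers (a 5% baseline conversion rate and a 5% relative change to detect) are illustrative assumptions, not values from the slides.

```python
import math


def users_per_variant(sigma: float, delta: float) -> int:
    """Rule-of-thumb sample size per variant: n = 16 * sigma^2 / delta^2.

    sigma: standard deviation of the OEC for a single user
    delta: smallest absolute difference in the OEC worth detecting
    """
    if sigma < 0 or delta <= 0:
        raise ValueError("sigma must be >= 0 and delta must be > 0")
    return math.ceil(16 * sigma ** 2 / delta ** 2)


# Assumed example: the OEC is conversion (0 or 1 per user), baseline 5%.
baseline = 0.05
sigma = math.sqrt(baseline * (1 - baseline))  # std. dev. of a Bernoulli outcome
delta = baseline * 0.05                       # detect a 5% relative change

# Roughly 120,000 users per variant under these assumptions.
print(users_per_variant(sigma, delta))

# Halving sigma (a lower-variability OEC) cuts the required sample by 4x.
print(users_per_variant(sigma / 2, delta))
```

Because n grows with σ², choosing an OEC with lower variability or excluding irrelevant cases shrinks the required sample much faster than it might first appear.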
Web Experiments
Limitations of web experiments
• No explanation of mechanism
• Focus on short-term effects
• Primacy/newness effects
• Must implement treatments
Web Experiments
Implementation
• Randomization
  • Pseudorandom with caching
  • Hash and partition
• Assignment
  • Traffic splitting
  • Server-side
  • Client-side
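The hash-and-partition approach can be sketched as follows: hash the user ID together with an experiment name, map the hash to one of 100 buckets, and assign buckets to variants. The function name, the choice of SHA-256, the bucket count, and the experiment name are assumptions made for this illustration; the slides do not specify them.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_percent: int = 50) -> str:
    """Deterministically assign a user to 'treatment' or 'control'.

    The same (user_id, experiment) pair always lands in the same bucket,
    so no per-user assignment needs to be stored or cached.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # partition the hash space into 100 buckets
    return "treatment" if bucket < treatment_percent else "control"


# Hypothetical experiment name and user IDs, for illustration only.
users = [f"user-{i}" for i in range(10_000)]
in_treatment = sum(assign_variant(u, "distinct-game-skins") == "treatment" for u in users)
print(f"{in_treatment / len(users):.1%} of users assigned to treatment")
```

Including the experiment name in the hashed key keeps assignments in one experiment independent of assignments in another, which matters when several experiments run at once.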
Lessons Learned (i.e., tips for the researcher)
Analysis
• Mine the data
• Time matters
• Multi-factor experiments
Lessons Learned
Trust and Execution
• Run A/A tests (test your system)
• Ramp-up and abort
• Correct sample size
• Assign 50% to treatment
• Beware day-of-week effects
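An A/A test splits users into two groups that receive the same experience, so any "significant" difference is a false positive. The simulation below is a sketch of that idea under assumed numbers (500 runs, 500 users per group, a 10% conversion rate, a two-proportion z-test at α = 0.05); none of these values come from the slides. A healthy pipeline should flag a difference in roughly 5% of runs.

```python
import math
import random


def two_proportion_p_value(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (x_a / n_a - x_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rate(runs: int = 500, users: int = 500, rate: float = 0.1,
                           alpha: float = 0.05, seed: int = 0) -> float:
    """Simulate repeated A/A tests and return the share flagged as significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        # Both groups are drawn from the same conversion rate.
        a = sum(rng.random() < rate for _ in range(users))
        b = sum(rng.random() < rate for _ in range(users))
        if two_proportion_p_value(a, users, b, users) < alpha:
            false_positives += 1
    return false_positives / runs


print(f"Share of A/A runs flagged significant: {aa_false_positive_rate():.1%}")
```

If the flagged share on real traffic is far from α, that points to a problem in randomization, logging, or the analysis itself rather than to a real effect.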
Lessons Learned
Culture and Business
• Agree on OEC upfront
• Beware “harmless” features
• Weigh performance vs. maintenance cost
• Data-driven (vs. opinion-driven) culture
Extended Case Study
Assume the game UI from the first case study was an actual gaming site
The website is interested in promoting multiple simultaneous games between users, but users complain that it’s difficult to manage multiple games
Design a web-based study informed by the reading to test the new design
Case Study
• OEC
• Sample size, reducing error
• Ramp-up, automation
• Mechanism explanation, short vs. long-term effects, primacy/newness
• Randomization/assignment
• Mine the data, multi-factor experiments
• A/A tests, sample size, day of week effects