Answering Questions Statistically
ENVS 407 – Prevention of Tobacco Addiction
Key statistical ideas
• Clarify your questions
• Construct contrasts
• Know the procedure
• Control as much as you can, then leave the rest to chance!
Clarify your questions
• Initial Questions:
▫ Why are people buying cigarettes?
▫ Where are people getting their cigarettes?
• Problems:
▫ “Why” is super hard to answer.
• Solution:
▫ Break question into smaller, easier questions
▫ Try to think of questions that can be written as “how much/many” or “is this more than that”
Clarify your questions (cont’d)
• Break the question into parts:
▫ Availability: what stores sell the most types of tobacco?
▫ Availability: what stores carry the most brands of cigarettes?
▫ Advertising: which modes of advertising are most popular?
▫ Advertising: do different locations have different size advertisements?
Construct contrasts
• Statistics is about comparing one set of things to another set of things… and figuring out if they’re different
Construct contrasts – hypothesis
• What do you want to disprove?
▫ Null hypothesis
• What do you want to prove beyond a reasonable doubt?
▫ Alternative hypothesis
Construct contrasts – EXAMPLE 1
• Null hypothesis: The average number of tobacco advertisements at grocery stores is the same as the average number at liquor stores.
• Alternative hypothesis: The average number of tobacco advertisements at grocery stores is less than the average number at liquor stores.
Construct contrasts – EXAMPLE 2
• Null hypothesis: The average percent of “sexy” ads at grocery stores is the same as at liquor stores.
• Alternative hypothesis: They are not the same.
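The two examples can also be written in symbols. This is only a sketch: the subscripts G (grocery) and L (liquor) are labels introduced here for illustration, with μ standing for a true mean number of ads and p for a true percentage of “sexy” ads.

```latex
% Example 1: one-sided contrast on means
H_0:\ \mu_G = \mu_L \qquad \text{vs.} \qquad H_a:\ \mu_G < \mu_L

% Example 2: two-sided contrast on percentages
H_0:\ p_G = p_L \qquad \text{vs.} \qquad H_a:\ p_G \neq p_L
```

Example 1 is a “less than” question, so only one direction counts as evidence; Example 2 is a “merely different” question, so either direction counts.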
Know the procedure – collecting data
• Once you translate the questions into statistics…
▫ Design the study: means vs. percentages?
▫ Randomly select places to sample the data: careful of bias!
▫ Collect the data: harder than you think!!
▫ Analyze the data: easier than you think!!
▫ Interpret the data
Know the procedure – testing
• Things aren’t ever perfectly like the null…
• …but how different is too different?
Know the procedure – counts/means
• Greek vs. Roman
▫ μ = true mean (a.k.a. “average”)
▫ σ = true standard deviation
▫ x̄ = sample mean (i.e., comes from the data)
▫ s = sample standard deviation (i.e., from the data)
▫ n = sample size (sometimes have n1 and n2)
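As a small illustration of the “Roman” quantities, the sketch below computes n, the sample mean, and the sample standard deviation s using Python's standard library. The ad counts are made-up numbers for six hypothetical grocery stores, not data from the course.

```python
import statistics

# Hypothetical data: number of tobacco ads counted at six grocery stores.
grocery_ads = [2, 3, 1, 4, 2, 3]

n = len(grocery_ads)                  # sample size
x_bar = statistics.mean(grocery_ads)  # sample mean
s = statistics.stdev(grocery_ads)     # sample standard deviation (divides by n - 1)

print(f"n = {n}, sample mean = {x_bar:.2f}, s = {s:.2f}")
```

The true values μ and σ stay unknown; the sample mean and s are the estimates of them that the data provide.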
Know the procedure – counts/means
• t-statistic
▫ But is this “big”?
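The formula shown on this slide did not survive in the text, so the following is a sketch of one common choice rather than necessarily the exact version presented: the two-sample t-statistic for unequal sample sizes and unequal variances (the variant recommended in the resources at the end). The function name `welch_t` and both data sets are hypothetical.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Two-sample t-statistic without assuming equal variances.

    Returns the t-statistic and the approximate (Welch-Satterthwaite)
    degrees of freedom.
    """
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)

    # Standard error of the difference between the two sample means.
    se = math.sqrt(var1 / n1 + var2 / n2)
    t = (mean1 - mean2) / se

    # Approximate degrees of freedom for the unequal-variance case.
    df = (var1 / n1 + var2 / n2) ** 2 / (
        (var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1)
    )
    return t, df

# Hypothetical ad counts, for illustration only.
grocery_ads = [2, 3, 1, 4, 2, 3]
liquor_ads = [5, 7, 6, 4, 8, 6]

t_stat, df = welch_t(grocery_ads, liquor_ads)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A negative t here means the grocery stores averaged fewer ads than the liquor stores. Whether the value is “big” is decided by comparing it with a critical value from the t distribution.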
Tables for the t distribution
• If we want a 100·C% confidence level for the test, we need to find the value t* so that we have a probability of C between -t* and t* in a t distribution with n-1 degrees of freedom
• Example: a 95% confidence level when n = 15 (so df = 14) means that we need a tail probability of 0.025, so t* = 2.15
[Figure: t distribution with df = 14; area 0.95 between -t* and t*, area 0.025 in the tail beyond t*]
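Instead of reading a printed table, t* can be computed. The sketch below uses only the standard library: it integrates the t density numerically and searches for the cutoff by bisection. The function names are hypothetical, and in practice a statistics library (for example `scipy.stats.t.ppf`) would do the same job more accurately.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Find t* so that the area between -t* and t* equals `confidence`."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 100.0
    for _ in range(100):  # bisection on the cumulative probability
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_star = t_critical(0.95, df=14)
print(f"t* = {t_star:.3f}")
```

For 14 degrees of freedom at 95% confidence this lands on the table value of about 2.145. A computed t-statistic whose absolute value exceeds t* is “too different” from the null at that confidence level.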
Know the procedure – percentages
• The symbols
▫ p = true percentage
▫ Y = observed outcome (e.g. count of successes)
▫ n = sample size
▫ p̂ = sample percentage (i.e., Y/n)
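The test used for percentages is not preserved in the text. One standard option, offered here as an assumption rather than as the slide's own method, is the two-proportion z-test with a pooled percentage. The function name and the counts below are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z(y1, n1, y2, n2):
    """Two-sided z-test for equality of two true percentages."""
    p1_hat = y1 / n1                 # sample percentage, group 1
    p2_hat = y2 / n2                 # sample percentage, group 2
    pooled = (y1 + y2) / (n1 + n2)   # combined estimate under the null

    # Standard error of the difference, assuming the null is true.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se

    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 12 of 40 grocery-store ads and 21 of 42
# liquor-store ads were classified as "sexy".
z, p_value = two_proportion_z(12, 40, 21, 42)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

The normal approximation behind this test is only reasonable when each group has a fair number of both successes and failures; with very small counts an exact test is the safer choice.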
Control as much as you can…
• Make sure you don’t “stack the deck”
▫ Don’t pick all your grocery stores from Center City and all of your liquor stores from University City
• Standardize definitions of “size of advertisement” and “theme of ad”
▫ It’s surprising how much opinions differ
• Think carefully about all variables which are important… but aren’t the one you’re most interested in. CONTROL THEM!!!
…leave the rest to chance!
• Randomize once you’ve controlled for the important variables
▫ Get a list of well “controlled” stores, and then randomly pick which you’ll visit
▫ Picking the easiest to go to will introduce selection bias!!
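One way to put “control, then randomize” into practice is to group the candidate stores by the variables being controlled and then draw the same number at random from every group. The sketch below assumes store type and neighborhood are the controlled variables; the store list, the function name `pick_stores`, and the seed are all made up for illustration.

```python
import random

# Hypothetical list of candidate stores: 5 per (type, area) combination.
stores = [
    {"name": f"{kind} {area} #{i}", "type": kind, "area": area}
    for kind in ("grocery", "liquor")
    for area in ("Center City", "University City")
    for i in range(1, 6)
]

def pick_stores(stores, per_group, seed=None):
    """Randomly pick `per_group` stores from each (type, area) group."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    groups = {}
    for store in stores:
        groups.setdefault((store["type"], store["area"]), []).append(store)

    chosen = []
    for key in sorted(groups):
        # rng.sample draws without replacement, so no store is picked twice.
        chosen.extend(rng.sample(groups[key], per_group))
    return chosen

visits = pick_stores(stores, per_group=2, seed=407)
for store in visits:
    print(store["name"])
```

Because every type and neighborhood combination contributes the same number of stores, the deck is not stacked, and because the draw is random, convenience plays no part in which stores get visited.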
Key statistical ideas
• Clarify your questions
▫ Bigger -> smaller
▫ Intangible -> quantifiable
• Construct contrasts
▫ Compare two things: greater than? less than? merely different?
• Know the procedure
▫ Or know someone who knows the procedure…
• Control as much as you can, leave the rest to chance!
Websites and resources
• Quick reference
▫ http://en.wikipedia.org/wiki/Student's_t-test
▫ Use “unequal sample sizes, unequal variance”
• Simple t-test calculator
▫ http://www.graphpad.com/quickcalcs/ttest1.cfm
• Wharton StatLab
▫ http://www-stat.wharton.upenn.edu/~sivana/statlab.html
• My page
▫ http://stat.wharton.upenn.edu/~mbaiocch/
▫ The slides I used today
▫ Spreadsheet
▫ My contact info