Confidence Intervals and Hypothesis Tests

30
Section 4.5 Confidence Intervals and Hypothesis Tests

description

Section 4.5. Confidence Intervals and Hypothesis Tests. Bootstrap and Randomization Distributions. Big difference : a randomization distribution assumes H 0 is true, while a bootstrap distribution does not. Which Distribution?. - PowerPoint PPT Presentation

Transcript of Confidence Intervals and Hypothesis Tests

Page 1: Confidence Intervals and  Hypothesis Tests

Section 4.5

Confidence Intervals and Hypothesis Tests

Page 2: Confidence Intervals and  Hypothesis Tests

Bootstrap and Randomization Distributions

Bootstrap Distribution Randomization Distribution

Estimates the distribution of sample statistics

Estimates the distribution of sample statistics, if H0 true

Centered around the observed sample statistic

Centered around the null hypothesized value

Simulate sampling from the population by resampling from the original sample

Simulate samples assuming H0 were true

Big difference: a randomization distribution assumes H0 is true, while a bootstrap distribution does not

Page 3: Confidence Intervals and  Hypothesis Tests

Which Distribution? Let be the average amount of sleep college students get per

night. Data was collected on a sample of students, and for this sample hours.

A bootstrap distribution is generated to create a confidence interval for , and a randomization distribution is generated to see if the data provide evidence that > 7.

Which distribution below is the bootstrap distribution?

(a) is centered around the sample statistic, 6.7

Page 4: Confidence Intervals and  Hypothesis Tests

Which Distribution? Intro stat students are surveyed, and we find that 152 out

of 218 are female. Let p be the proportion of intro stat students at that university who are female.

A bootstrap distribution is generated for a confidence interval for p, and a randomization distribution is generated to see if the data provide evidence that p > 1/2.

Which distribution is the randomization distribution?

(a) is centered around the null value, 1/2

Page 5: Confidence Intervals and  Hypothesis Tests

Body TemperatureWe created a bootstrap distribution for average body temperature by resampling with replacement from the original sample (

Page 6: Confidence Intervals and  Hypothesis Tests

Body TemperatureWe also created a randomization distribution to see if average body temperature differs from 98.6F by adding 0.34 to every value to make the null true, and then resampling with replacement from this modified sample:

Page 7: Confidence Intervals and  Hypothesis Tests

Body TemperatureThese two distributions are identical (up to random

variation from simulation to simulation) except for the center

The bootstrap distribution is centered around the sample statistic, 98.26, while the randomization distribution is centered around the null hypothesized value, 98.6

The randomization distribution is equivalent to the bootstrap distribution, but shifted over

Page 8: Confidence Intervals and  Hypothesis Tests

Body Temperature

Bootstrap Distribution

Randomization DistributionH0: = 98.6Ha: ≠ 98.6

98.26 98.6

Page 9: Confidence Intervals and  Hypothesis Tests

Body Temperature

Bootstrap Distribution

98.26 98.4

Randomization DistributionH0: = 98.4Ha: ≠ 98.4

Page 10: Confidence Intervals and  Hypothesis Tests

Intervals and Tests

If a 95% CI misses the parameter in H0, then a two-tailed test should reject H0 at a 5%

significance level.

If a 95% CI contains the parameter in H0, then a two-tailed test should not reject H0 at a 5%

significance level.

Page 11: Confidence Intervals and  Hypothesis Tests

Intervals and Tests

A confidence interval represents plausible values for the population parameter

If the null hypothesized value IS NOT within the CI, it is not plausible and should be rejected

If the null hypothesized value IS within the CI, it is plausible and should not be rejected

Page 12: Confidence Intervals and  Hypothesis Tests

• Using bootstrapping, we found a 95% confidence interval for the mean body temperature to be (98.05, 98.47)

• This does not contain 98.6, so at α = 0.05 we would reject H0 for the hypotheses

H0 : = 98.6Ha : ≠ 98.6

Body Temperatures

Page 13: Confidence Intervals and  Hypothesis Tests

Both Father and Mother• “Does a child need both a father and a mother to grow

up happily?”

• Let p be the proportion of adults aged 18-29 in 2010 who say yes. A 95% CI for p is (0.487, 0.573).

• Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, do we reject H0 or not reject H0?

Do not reject H0

0.5 is within the CI, so is a plausible value for p.

http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1

Page 14: Confidence Intervals and  Hypothesis Tests

Both Father and Mother• “Does a child need both a father and a mother to grow

up happily?”

• Let p be the proportion of adults aged 18-29 in 1997 who say yes. A 95% CI for p is (0.533, 0.607).

• Testing H0: p = 0.5 vs Ha: p ≠ 0.5 with α = 0.05, we reject H0 or do not reject H0?

Reject H0

0.5 is not within the CI, so is not a plausible value for p.

http://www.pewsocialtrends.org/2011/03/09/for-millennials-parenthood-trumps-marriage/#fn-7199-1

Page 15: Confidence Intervals and  Hypothesis Tests

Intervals and TestsConfidence intervals are most useful when you

want to estimate population parameters

Hypothesis tests and p-values are most useful when you want to test hypotheses about population parameters

Confidence intervals give you a range of plausible values; p-values quantify the strength of evidence against the null hypothesis

Page 16: Confidence Intervals and  Hypothesis Tests

Interval, Test, or Neither?

Are the following questions best assessed using a confidence interval, a hypothesis test, or is statistical inference not relevant?

• Do a majority of adults riding a bicycle wear a helmet?

• On average, how much more do adults who played sports in high school exercise than adults who did not play sports in high school?

• On average, were the 23 players on the 2010 Canadian Olympic hockey team older than the 23 players on the 2010 US Olympic hockey team?

Page 17: Confidence Intervals and  Hypothesis Tests

• With small sample sizes, even large differences or effects may not be significant

• With large sample sizes, even a very small difference or effect can be significant

• A statistically significant result is not always practically significant, especially with large sample sizes

Cautions About Significance

Page 18: Confidence Intervals and  Hypothesis Tests

Example: Suppose a weight loss program recruits 10,000 people for a randomized experiment.

• A difference in average weight loss of only 0.5 lbs could be found to be statistically significant

• Suppose the experiment lasted for a year. Is a loss of ½ a pound practically significant?

Statistical vs Practical Significance

Page 19: Confidence Intervals and  Hypothesis Tests

Diet and Sex of Baby

Are certain foods in your diet associated with whether or not you conceive a boy or a girl?

To study this, researchers asked women about their eating habits, including asking whether or not they ate 133 different foods regularly.

For each of the 133 foods studied, a hypothesis test was conducted for a difference between mothers who conceived boys and girls in the proportion who consume each food.

http://www.newscientist.com/article/dn13754-breakfast-cereals-boost-chances-of-conceiving-boys.html

Page 20: Confidence Intervals and  Hypothesis Tests

Hypothesis TestsState the null and alternative hypotheses

If there are NO differences (all null hypotheses are true), about how many significant differences would be found using α = 0.05?

A significant difference was found for breakfast cereal (mothers of boys eat more), prompting the headline “Breakfast Cereal Boosts Chances of Conceiving Boys”. How might you explain this?

pb: proportion of mothers who have boys that consume the food regularlypg: proportion of mothers who have girls that consume the food regularly

H0: pb = pg

Ha: pb ≠ pg

133 0.05 = 6.65

Random chance; several tests (about 6 or 7) are going to be significant, even if no differences exist

Page 21: Confidence Intervals and  Hypothesis Tests

Multiple Testing

When multiple hypothesis tests are conducted, the chance that at least one test

incorrectly rejects a true null hypothesis increases with the number of tests.

If the null hypotheses are all true, α of the tests will yield statistically significant results

just by random chance.

Page 22: Confidence Intervals and  Hypothesis Tests

Multiple Comparisons• Consider a topic that is being

investigated by research teams all over the world

Using α = 0.05, 5% of teams are going to find something significant, even if the null hypothesis is true

Page 23: Confidence Intervals and  Hypothesis Tests

Multiple Comparisons

• Consider a research team/company doing many hypothesis tests

Using α = 0.05, 5% of tests are going to be significant, even if the null hypotheses are all true

Page 24: Confidence Intervals and  Hypothesis Tests

This is a serious problem

• The most important thing is to be aware of this issue, and not to trust claims that are obviously one of many tests (unless they specifically mention an adjustment for multiple testing)

• There are ways to account for this (e.g. Bonferroni’s Correction), but these are beyond the scope of this class

Multiple Comparisons

Page 25: Confidence Intervals and  Hypothesis Tests

Publication Bias

Publication bias refers to the fact that usually only the significant results get published

• The one study that turns out significant gets published, and no one knows about all the insignificant results

• This combined with the problem of multiple comparisons, can yield very misleading results

Page 26: Confidence Intervals and  Hypothesis Tests

http://xkcd.com/882/

Jelly Beans Cause Acne!

Page 27: Confidence Intervals and  Hypothesis Tests
Page 28: Confidence Intervals and  Hypothesis Tests
Page 29: Confidence Intervals and  Hypothesis Tests

http://xkcd.com/882/

Page 30: Confidence Intervals and  Hypothesis Tests

SummaryIf a null hypothesized value lies inside a 95% CI, a

two-tailed test using α = 0.05 would not reject H0

If a null hypothesized value lies outside a 95% CI, a two-tailed test using α = 0.05 would reject H0

Statistical significance is not always the same as practical significance

Using α = 0.05, 5% of all hypothesis tests will lead to rejecting the null, even if all the null hypotheses are true