Last time we discussed t-tests: how to use sample means of quantitative variables to make inferences...

Last time we discussed t-tests: how to use sample means of quantitative variables to make inferences about parameters.

Today we’ll use the very same principles, but the tests will use sample proportions of categorical variables to make inferences about parameters.

E.g., a random sample of households in a poor neighborhood in Lima, Peru, finds that 38% of the households are headed by single women.

Is this neighborhood’s proportion representative of the population proportion (for Lima’s poor neighborhoods), or is it different enough to be statistically significant?

E.g., since a non-government organization began providing enterprise training & subsidies to women in a poor neighborhood in Buenos Aires, a random sample finds that the neighborhood’s proportion of income-earning adult women has increased from 34% to 39%.

Is this after vs. before difference in proportions due to sampling variability? Or is it large enough to be statistically significant?

The same random sample finds that 41% of the sampled women who participated in the program & 36% of the sampled women who didn’t participate are now engaged in income-earning activities.

Is this difference in proportions due to sampling variability? Or is it large enough to be statistically significant?

Sample Proportion

Sample proportion: a binomial count within a sample divided by the sample size n.

It is computed from a categorical (binary) variable (e.g., yes vs. no; lived vs. died).

Premises

Random sample of independent observations.

Binomial (i.e. ‘success’/‘failure’) count.

The population must be at least 10 times larger than the sample.

There must be at least 10 observations in the ‘success’ category (n·p) & at least 10 observations in the ‘failure’ category (n·(1 – p)).
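
As a rough illustration, the count-based premises can be checked mechanically. The sketch below is only a rule-of-thumb checker: the function name `check_premises` and the population size of 5000 are hypothetical, while the counts (49 successes out of 200) are those of the hmath example used in these notes. Random sampling and independence cannot be checked from counts and still have to be argued from the study design.

```python
def check_premises(successes, n, population_size=None):
    """Rule-of-thumb checks for a large-sample proportion procedure."""
    failures = n - successes
    checks = {
        # at least 10 observations in each category
        "at_least_10_successes": successes >= 10,
        "at_least_10_failures": failures >= 10,
    }
    if population_size is not None:
        # population at least 10 times larger than the sample
        checks["population_10x_sample"] = population_size >= 10 * n
    return checks


# 49 of 200 students have hmath == 1; the population size is a made-up value.
result = check_premises(successes=49, n=200, population_size=5000)
print(result)
```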

If these sample assumptions are met, then the standardized difference between the two proportions being compared (e.g., observed proportion vs. benchmark; two-sample proportions; after vs. before proportion), i.e. the difference divided by its standard error, is approximately standard normal in distribution.

Moore/McCabe use the ‘Wilson estimate,’ a confidence-interval estimate for proportions that is more accurate than the ‘traditional’ one.

The Stata command is:

. ci binaryvar, binomial wilson

As we’ll later discuss, there are other options besides ‘wilson’.

Wilson estimate of the population proportion based on sample data:

$$\tilde{p} = \frac{X + 2}{n + 4}$$

Standard error of the proportion (i.e. based on sample data):

$$\mathrm{se}_{\tilde{p}} = \sqrt{\frac{\tilde{p}\,(1 - \tilde{p})}{n + 4}}$$
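
A minimal sketch of this arithmetic, assuming the ‘add two successes and two failures’ reading of the Wilson estimate: the function name `plus_four_ci` is hypothetical, and the default `z_star=1.96` is the usual critical value for an approximate 95% interval of the form estimate ± z*·se. The counts are those of the hmath example (49 successes out of 200).

```python
import math


def plus_four_ci(successes, n, z_star=1.96):
    """Wilson ('plus four') estimate, its standard error, and an approximate CI."""
    p_tilde = (successes + 2) / (n + 4)                 # add 2 successes, 2 failures
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))   # standard error of p_tilde
    margin = z_star * se
    return p_tilde, se, (p_tilde - margin, p_tilde + margin)


p_tilde, se, (lo, hi) = plus_four_ci(successes=49, n=200)
print(f"estimate = {p_tilde:.4f}, se = {se:.4f}, CI = ({lo:.4f}, {hi:.4f})")
```

With these counts the estimate is 51/204 = 0.25, slightly above the raw sample proportion of 0.245, because the adjustment pulls the estimate toward 0.5.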

Approximate level C confidence interval for the proportion:

$$\tilde{p} \pm z^{*}\,\mathrm{se}_{\tilde{p}}$$

Large-sample (n > 5) significance test for a population proportion

The Stata command to find a ‘Wilson’ confidence interval for a population proportion based on sample data:

. ci hmath, binomial wilson

. ci hmath, b w level(90)

. ci hmath, b w l(99)

Other binomial options: exact, agresti, jeffreys.

To Repeat: The Steps

Step 1: Ask if the binomial assumptions are fulfilled (including that both the expected #failures & the expected #successes are at least 10).

Step 2: Do a frequency table or bar graph of the binary variable & display the variable’s sample proportion.

Step 3: If all checks out okay, state the null hypothesis & the alternative hypothesis.

Step 4: Conduct the hypothesis test.

Example

. use hsb2, clear

. gen hmath=math>=60 & math<.

. la var hmath "Honors math (>=60)"

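. tab hmath

Honors math |
     (>=60) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        151       75.50       75.50
          1 |         49       24.50      100.00
------------+-----------------------------------
      Total |        200      100.00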

A Two-Sided CI Hypothesis Test

. ci hmath, binomial wilson

Variable |   Obs   Mean   Std. Err.   [95% Conf. Interval]
---------+--------------------------------------------------
   hmath |   200   .245   .0304118    .1905687   .3090424

Ho: hmath=.265; Ha: hmath~=.265.

Is hmath significantly different from .265 (two-sided test, i.e. does the hypothesized value .265 fall outside the confidence interval)?

Does using other command options (or no option except ‘binomial’) make a difference?
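
The interval printed by `ci hmath, binomial wilson` can be reproduced by hand. The sketch below assumes the standard Wilson score formula, which matches the printed limits for these data to the digits shown; the function name `wilson_score_ci` is hypothetical and the critical value 1.959964 is the usual 95% normal quantile. The last lines apply the confidence-interval test of Ho: proportion = .265.

```python
import math


def wilson_score_ci(successes, n, z_star=1.959964):
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    z2 = z_star ** 2
    denom = 1 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom
    half = z_star * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return center - half, center + half


lo, hi = wilson_score_ci(successes=49, n=200)
hypothesized = 0.265
# Two-sided CI test: reject Ho only if the hypothesized value lies outside the interval.
reject = not (lo <= hypothesized <= hi)
print(f"95% CI = ({lo:.7f}, {hi:.7f}), reject Ho: {reject}")
```

Because .265 lies inside the interval, the two-sided test at the 5% level does not reject Ho.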

How to Conduct a Large-Sample Hypothesis Test for a Population Proportion: prtest

‘prtest’ allows testing one- or two-sided hypotheses.

Check the premises & data.

Test the hypothesis.

. prtest hmath = .265

One-sample test of proportion                  hmath: Number of obs =      200

Variable |       Mean   Std. Err.         z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
   hmath |       .245   .0304118   8.05609   0.0000      .1853941   .3046059

Ho: proportion(hmath) = .265

 Ha: hmath < .265        Ha: hmath ~= .265        Ha: hmath > .265
   z = -0.641               z = -0.641               z = -0.641
 P < z = 0.2608          P > |z| = 0.5216         P > z = 0.7392

Conclusion: Fail to reject Ho.
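
To see where z = -0.641 comes from, the test statistic can be recomputed from the summary numbers. This is a sketch under the assumption that the standard error is evaluated at the hypothesized proportion (.265), which reproduces the printed z and p-values; the function names are hypothetical and the normal CDF is built from the standard-library `math.erfc`.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def one_sample_prop_ztest(p_hat, n, p0):
    """z statistic and p-values for Ho: proportion = p0."""
    se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error under Ho
    z = (p_hat - p0) / se0
    p_lower = normal_cdf(z)              # Ha: proportion < p0
    p_upper = 1 - p_lower                # Ha: proportion > p0
    p_two = 2 * min(p_lower, p_upper)    # Ha: proportion != p0
    return z, p_lower, p_two, p_upper


z, p_lower, p_two, p_upper = one_sample_prop_ztest(p_hat=0.245, n=200, p0=0.265)
print(f"z = {z:.3f}, P<z = {p_lower:.4f}, P>|z| = {p_two:.4f}, P>z = {p_upper:.4f}")
```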

Recall that conclusions are always uncertain.

How to Conduct a Test Comparing Two Proportions

Ho: female hmath = male hmath

Ha: female hmath ~= male hmath

Check the premises & the data.

. tab female hmath

  female=1 |       math>=60
    male=0 |         0          1 |     Total
-----------+----------------------+----------
      male |        68         23 |        91
    female |        83         26 |       109
-----------+----------------------+----------
     Total |       151         49 |       200

State the hypotheses.

Ho: female hmath = male hmath

Ha: female hmath ~= male hmath

Step 3: test the hypothesis.

. prtest hmath, by(female)

Two-sample test of proportion                   male: Number of obs =       91
                                              female: Number of obs =      109

Variable |       Mean   Std. Err.         z    P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |   .2527473   .0455571   5.54792   0.0000      .1634569   .3420376
  female |   .2385321   .0408212   5.84334   0.0000       .158524   .3185402
---------+--------------------------------------------------------------------
    diff |   .0142151   .0611704                        -.1056767    .134107
         |  under Ho:   .0610714   .232763   0.8159

Ho: proportion(male) - proportion(female) = diff = 0

 Ha: diff < 0           Ha: diff ~= 0              Ha: diff > 0
  z = 0.233              z = 0.233                  z = 0.233
 P < z = 0.5920         P > |z| = 0.8159           P > z = 0.4080

Conclusion: Fail to reject Ho.
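
The two standard errors in this output play different roles, which a short recomputation makes visible. The sketch assumes the usual two-sample z procedure: the ‘under Ho’ standard error pools the two groups (49 honors students out of 200), while the confidence interval for the difference uses the unpooled standard error. Function names are hypothetical; the counts come from the male/female table above.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def two_sample_prop_ztest(x1, n1, x2, n2, z_star=1.959964):
    """Two-sample z test for Ho: p1 - p2 = 0, plus a CI for the difference."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    pooled = (x1 + x2) / (n1 + n2)                       # common proportion under Ho
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_two = 2 * (1 - normal_cdf(abs(z)))                 # Ha: diff != 0
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled, for the CI
    ci = (diff - z_star * se_diff, diff + z_star * se_diff)
    return diff, se_pooled, z, p_two, ci


# male: 23 of 91 in honors math; female: 26 of 109
diff, se_pooled, z, p_two, ci = two_sample_prop_ztest(23, 91, 26, 109)
print(f"diff = {diff:.7f}, se under Ho = {se_pooled:.7f}, z = {z:.3f}, P>|z| = {p_two:.4f}")
print(f"95% CI for diff = ({ci[0]:.7f}, {ci[1]:.7f})")
```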

Substantive conclusions? Next research steps?

Here’s an after vs. before example.

Does a summer math course significantly increase the proportion of students who qualify for honors math?

Check the sample premises.

Display the sample proportions.

Test the hypothesis:

Ho: post-test honors proportion = pre-test honors proportion (i.e. difference = 0)

Ha: post-test honors proportion > pre-test honors proportion (i.e. difference > 0)

. prtesti 191 .271 200 .245

Two-sample test of proportion                      x: Number of obs =      191
                                                   y: Number of obs =      200
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |       .271   .0321612                      .2079653    .3340347
           y |       .245   .0304118                      .1853941    .3046059
-------------+----------------------------------------------------------------
        diff |       .026    .044263                     -.0607539    .1127539
             |  under Ho:   .0442491     0.59   0.557
------------------------------------------------------------------------------
        Ho: proportion(x) - proportion(y) = diff = 0

    Ha: diff < 0           Ha: diff != 0              Ha: diff > 0
     z =  0.588              z =  0.588                z =  0.588
  P < z = 0.7216         P > |z| = 0.5568          P > z = 0.2784
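
Since the alternative here is one-sided (difference > 0), the relevant p-value is the right-hand one. The sketch below recomputes it from the four summary numbers passed to `prtesti`, assuming the pooled standard error is formed directly from the proportions, which reproduces the printed .0442491. The function names are hypothetical, and the 5% significance level is an assumption chosen only for illustration; the notes do not fix a level.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def prop_ztest_from_summary(n1, p1, n2, p2):
    """One-sided (Ha: p1 - p2 > 0) two-sample z test from sample sizes and proportions."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_upper = 1 - normal_cdf(z)
    return se_pooled, z, p_upper


# x = post-test (n = 191, proportion .271); y = pre-test (n = 200, proportion .245)
se_pooled, z, p_upper = prop_ztest_from_summary(191, 0.271, 200, 0.245)
alpha = 0.05  # assumed significance level, not given in the notes
reject = p_upper < alpha
print(f"se under Ho = {se_pooled:.7f}, z = {z:.3f}, P>z = {p_upper:.4f}, reject Ho: {reject}")
```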

Test conclusion?

Results are always uncertain.