Last time we discussed t-tests: how to use sample means of quantitative variables to make inferences...

Last time we discussed t-tests: how to use sample means of quantitative variables to make inferences about parameters.

Today we’ll apply the very same principles, but to tests that use sample proportions of categorical variables to make inferences about population parameters.

E.g., a random sample of households in a poor neighborhood in Lima, Peru, finds that 38% of the households are headed by single women.

Is this neighborhood’s proportion representative of the population proportion for Lima’s poor neighborhoods, or is it different enough to be statistically significant?

E.g., since a non-government organization began providing enterprise training & subsidies to women in a poor neighborhood in Buenos Aires, a random sample finds that the neighborhood’s proportion of income-earning adult women has increased from 34% to 39%.

Is this after vs. before difference in proportions due to sampling variability? Or is it large enough to be statistically significant?

The same random sample finds that 41% of the sampled women who participated in the program & 36% of the sampled women who didn’t participate are now engaged in income-earning activities.

Is this difference in proportions due to sampling variability? Or is it large enough to be statistically significant?

Sample proportion: a binomial count within a sample divided by the sample size n.

It summarizes a binary categorical variable (e.g., yes vs. no; lived vs. died).

Sample Proportion

Premises

Random sample of independent observations.

Binomial (i.e. ‘success’/‘failure’) count.

The population must be at least 10 times larger than the sample.

There must be at least 10 successes (the count behind p) & at least 10 failures (the count behind 1 – p).

If these sample assumptions are met, then the standardized difference between the two proportions being compared (e.g., observed proportion vs. benchmark; two-sample proportions; after vs. before proportion), i.e. the difference divided by its standard error, is approximately standard normal in distribution.

Moore/McCabe use an estimate for the confidence interval of a proportion that is more accurate than the ‘traditional’ one; it is called the ‘Wilson estimate.’

The Stata command is:

. ci binaryvar, binomial wilson

As we’ll later discuss, there are other options besides ‘wilson’.

Wilson estimate of the population proportion based on sample data:

$$\tilde{p} = \frac{X + 2}{n + 4}$$

Standard error of the proportion (i.e. based on sample data):

$$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}\,(1 - \tilde{p})}{n + 4}}$$

Approximate level C confidence interval for the proportion:

$$\tilde{p} \pm z^{*} \, SE_{\tilde{p}}$$

Large-sample (n >5) significance test for a population proportion

The Stata commands to find a ‘Wilson’ confidence interval for a population proportion from sample data (at the default 95% level, then at 90% and 99%):

. ci hmath, binomial wilson

. ci hmath, b w level(90)

. ci hmath, b w l(99)

Other binomial options: exact, agresti, jeffreys.
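The formulas above can be checked by hand with Stata's scalar arithmetic. This is only a sketch: it assumes X = 49 successes in n = 200 observations (the hmath counts used in the example later in these notes), and the scalar names are made up for illustration. The result need not match the output of the ‘wilson’ option digit for digit, since Stata may compute that interval with a different formula than the textbook one.

```stata
* Hand calculation of the textbook estimate, its standard error, and a 95% CI.
* Assumption: 49 successes in 200 observations (the hmath counts).
* The scalar names below are illustrative, not part of any Stata command.
scalar x_succ   = 49
scalar n_obs    = 200
scalar p_tilde  = (x_succ + 2)/(n_obs + 4)
scalar se_tilde = sqrt(p_tilde*(1 - p_tilde)/(n_obs + 4))
* z* for a 95% interval is the .975 quantile of the standard normal
scalar z_star   = invnormal(0.975)
display "p-tilde     = " p_tilde
display "SE          = " se_tilde
display "lower limit = " (p_tilde - z_star*se_tilde)
display "upper limit = " (p_tilde + z_star*se_tilde)
```

With these counts the estimate is 51/204 = .25, and the interval runs from roughly .19 to .31.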

To Repeat: The Steps

Step 1: Ask if the binomial assumptions are fulfilled (including that both the expected number of failures & the expected number of successes are at least 10).

Step 2: Do a frequency table or bar graph of the binary variable & display the variable’s sample proportion.

Step 3: If all checks out okay, state the null hypothesis & the alternative hypothesis.

Step 4: Conduct the hypothesis test.

. use hsb2, clear

. gen hmath=math>=60 & math<.

. la var hmath "Honors math (>=60)"

Example

. tab hmath

Honors math (>=60) | Freq. Percent Cum.

0 | 151 75.50 75.50

1 | 49 24.50 100.00

Total | 200 100.00

. ci hmath, binomial wilson

Variable | Obs Mean Std. Err. [95% Conf. Interval]

hmath | 200 .245 .0304118 .1905687 .3090424

Ho: hmath=.265; Ha: hmath~=.265.

Is the proportion of hmath significantly different from .265 (two-sided test, i.e. does the hypothesized value .265 fall outside the confidence interval)?

Does using other command options (or no option except ‘binomial’) make a difference?
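One way to explore this question is to loop over the interval types and compare the limits. The sketch below assumes hmath has been created as above and that the older ci syntax used in these notes is accepted by your version of Stata; the list of option names (including ‘wald’) is taken from memory of the ci documentation and should be checked against your release.

```stata
* Sketch: request each binomial interval type for the same variable.
* Assumes hmath exists and the pre-Stata 14 syntax of these notes is available.
foreach opt in exact wald wilson agresti jeffreys {
    display _newline "Interval type: `opt'"
    ci hmath, binomial `opt'
}
* In Stata 14 and later the documented form is, for example:
*     ci proportions hmath, wilson
```

For each interval type, the two-sided question is the same: does the hypothesized value .265 lie inside or outside the limits?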

A Two-Sided CI Hypothesis Test

How to Conduct a Large-Sample Hypothesis Test for a Population Proportion: prtest

‘prtest’ allows testing one- or two-sided hypotheses.

Check the premises & data.

Test the hypothesis.

. prtest hmath = .265

One-sample test of proportion hmath: Number of obs = 200

Variable | Mean Std. Err. z P>z [95% Conf. Interval]

hmath | .245 .0304118 8.05609 0.0000 .1853941 .3046059

Ho: proportion(hmath) = .265

Ha: hmath < .265 Ha: hmath ~= .265 Ha: hmath > .265

z = -0.641 z = -0.641 z = -0.641

P < z = 0.2608 P > |z| = 0.5216 P > z = 0.7392

Conclusion: Fail to reject Ho.

Recall that conclusions are always uncertain.
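The z statistic and p-values in the prtest output can be reproduced by hand, which also shows where the numbers come from. This is a sketch with illustrative scalar names, assuming 49 successes in 200 observations and the null value .265. Note that the test statistic uses the standard error computed under Ho (from .265), not the .0304118 in the table, which is based on the sample proportion .245.

```stata
* Hand check of the one-sample z test: Ho: p = .265, 49 successes in 200 obs.
* Scalar names are illustrative only.
scalar p_hat   = 49/200
scalar p_null  = 0.265
* Premise check: expected successes and failures under Ho
display "expected successes under Ho = " 200*p_null
display "expected failures under Ho  = " 200*(1 - p_null)
* Standard error under Ho, then the z statistic
scalar se_null = sqrt(p_null*(1 - p_null)/200)
scalar z_one   = (p_hat - p_null)/se_null
display "z            = " %6.3f z_one
display "P(Z < z)     = " %6.4f normal(z_one)
display "P(|Z| > |z|) = " %6.4f 2*normal(-abs(z_one))
display "P(Z > z)     = " %6.4f normal(-z_one)
```

These lines should reproduce z = -0.641 and the three p-values shown above (0.2608, 0.5216, 0.7392).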

How to Conduct a Test Comparing Two Proportions

Ho: female hmath = male hmath

Ha: female hmath ~= male hmath

Check the premises & the data:

. tab female hmath

female=1, male=0 | math>=60: 0 1 | Total

male | 68 23 | 91

female | 83 26 | 109

Total | 151 49 | 200

State the hypotheses.

Ho: female hmath = male hmath

Ha: female hmath ~= male hmath

Step 3: test the hypothesis.

. prtest hmath, by(female)

Two-sample test of proportion

male: Number of obs = 91

female: Number of obs = 109

Variable Mean Std. Err. z P>z [95% Conf. Interval]

male .2527473 .0455571 5.54792 0.0000 .1634569 .3420376

female .2385321 .0408212 5.84334 0.0000 .158524 .3185402

diff .0142151 .0611704 -.1056767 .134107

under Ho: .0610714 .232763 0.8159

Ho: proportion(male) - proportion(female) = diff = 0

Ha: diff < 0 Ha: diff ~= 0 Ha: diff > 0

z = 0.233 z = 0.233 z = 0.233

P < z = 0.5920 P > |z| = 0.8159 P > z = 0.4080

Conclusion: Fail to reject Ho.

Substantive conclusions? Next research steps?
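The two-sample result can also be traced by hand from the cross-tabulated counts (23 of 91 males and 26 of 109 females in honors math). The sketch below, with illustrative scalar names, pools the two groups to estimate the common proportion under Ho, which is how the ‘under Ho’ standard error in the output arises.

```stata
* Hand check of the two-sample z test using the cross-tabulated counts.
* Scalar names are illustrative only.
scalar p_male   = 23/91
scalar p_female = 26/109
* Pooled proportion: all successes over all observations (valid under Ho)
scalar p_pool   = (23 + 26)/(91 + 109)
scalar se_pool  = sqrt(p_pool*(1 - p_pool)*(1/91 + 1/109))
scalar z_two    = (p_male - p_female)/se_pool
display "difference   = " %9.7f (p_male - p_female)
display "SE under Ho  = " %9.7f se_pool
display "z            = " %6.3f z_two
display "P(|Z| > |z|) = " %6.4f 2*normal(-abs(z_two))
```

The displayed values should agree with the prtest output: a difference of .0142151, a standard error under Ho of .0610714, z = 0.233, and a two-sided p-value of 0.8159.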

Here’s an after vs. before example.

Does a summer math course significantly increase the proportion of students who qualify for honors math?

Check the sample premises.

Display the sample proportions.

Test the hypothesis:

Ho: post-test honors proportion = pre-test honors proportion (i.e. difference = 0)

Ha: post-test honors proportion > pre-test honors proportion (i.e. difference > 0)

. prtesti 191 .271 200 .245

Two-sample test of proportion

x: Number of obs = 191

y: Number of obs = 200

------------------------------------------------------------------------------

Variable | Mean Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

x | .271 .0321612 .2079653 .3340347

y | .245 .0304118 .1853941 .3046059

-------------+----------------------------------------------------------------

diff | .026 .044263 -.0607539 .1127539

| under Ho: .0442491 0.59 0.557

------------------------------------------------------------------------------

Ho: proportion(x) - proportion(y) = diff = 0

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0

z = 0.588 z = 0.588 z = 0.588

P < z = 0.7216 P > |z| = 0.5568 P > z = 0.2784

Test conclusion?

Results are always uncertain.
